Runs a machine-learning pipeline that trains logistic-regression models in parallel,
with support for stratified, leave-one-out (LOO), and cross-testing configurations,
providing flexible model training across different experimental designs. When
ml_parameters.json is available, split, seed, and n_fold are resolved from it via
.resolveSplitParams().
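The precedence rule described above can be sketched as follows. This is a hypothetical helper, not the package's .resolveSplitParams(): it assumes ml_parameters.json sits directly in `path`, that it stores fields named split, seed, and n_fold, and that any saved value overrides the corresponding function argument.

```r
# Hypothetical sketch; NOT the package's .resolveSplitParams().
# Assumes ml_parameters.json lives in `path` with fields split, seed, n_fold.
read_saved_params <- function(path) {
  json_file <- file.path(path, "ml_parameters.json")
  if (file.exists(json_file) && requireNamespace("jsonlite", quietly = TRUE)) {
    jsonlite::fromJSON(json_file)
  } else {
    NULL
  }
}

resolve_split_params <- function(saved = NULL, split = c(0.8, 0),
                                 seed = 5280, n_fold = 5,
                                 use_saved_split = TRUE) {
  params <- list(split = split, seed = seed, n_fold = n_fold)
  if (!use_saved_split || is.null(saved)) {
    return(params)  # fall back to the function arguments
  }
  # Any field present in the saved parameters overrides the argument value
  for (key in names(params)) {
    if (!is.null(saved[[key]])) {
      params[[key]] <- saved[[key]]
    }
  }
  params
}

# Usage with an in-memory list standing in for the parsed JSON file
saved <- list(split = c(0.7, 0.1), seed = 42)
str(resolve_split_params(saved))  # split and seed from `saved`, n_fold stays 5
```

With use_saved_split = FALSE the same call ignores `saved` and returns the argument defaults.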
runMLmodels(
path,
stratify_by = NULL,
LOO = FALSE,
cross_test = FALSE,
threads = 16,
split = c(0.8, 0),
n_fold = 5,
prop_vi_top_feats = c(0, 1),
pca_threshold = 0.99,
verbose = TRUE,
return_tune_res = TRUE,
return_fit = TRUE,
return_pred = TRUE,
use_saved_split = TRUE,
shuffle_labels = FALSE,
use_pca = FALSE
)
path: Character scalar. Base directory containing matrix files.
stratify_by: Character scalar or NULL. One of "year", "country", or NULL (no stratification).
LOO: Logical. Perform leave-one-out (LOO) analysis. Default is FALSE.
cross_test: Logical. Perform cross-testing between groups. Default is FALSE.
threads: Integer. Number of parallel workers for model training. Default is 16.
split: Numeric vector of length 2. Train/validation split proportions.
n_fold: Integer. Number of cross-validation folds. Default is 5.
prop_vi_top_feats: Numeric vector of length 2. Proportion range for variable-importance selection.
pca_threshold: Numeric. PCA variance threshold. Default is 0.99.
verbose: Logical. Print progress messages during model training. Default is TRUE.
return_tune_res: Logical. Return tuning results from cross-validation. Default is TRUE.
return_fit: Logical. Return fitted model objects. Default is TRUE.
return_pred: Logical. Return prediction results. Default is TRUE.
use_saved_split: Logical. Whether to inherit split/seed/n_fold from ml_parameters.json. Default is TRUE.
shuffle_labels: Logical. Randomly shuffle labels for baseline runs. Default is FALSE.
use_pca: Logical. Use PCA on predictors. Default is FALSE.
NULL (invisible). Called for side effects (model training and result saving).
This function supports multiple analysis configurations:
Standard mode (stratify_by = NULL, LOO = FALSE, cross_test = FALSE):
Trains models using train/test split from the same dataset
Saves results to ML_* directories
Cross-test without stratification (stratify_by = NULL, cross_test = TRUE):
Trains on one drug/class, tests on another drug/class
Pairs different drugs within the same feature type
Saves results to cross_test_ML_* directories
Cross-test with stratification (stratify_by != NULL, cross_test = TRUE):
Trains on one stratum (year/country), tests on another stratum
Uses the same drug/class across different stratification groups
Saves results to cross_test_ML_year_* or cross_test_ML_country_* directories
LOO with cross-test (LOO = TRUE, cross_test = TRUE):
Trains on the leave-out dataset (one stratum excluded)
Tests on the full dataset including the left-out stratum
Saves results to LOO_cross_test_ML_year_* or LOO_cross_test_ML_country_* directories
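One way to enumerate the train/test pairings for the three cross-test designs above is sketched below. The drug and stratum names are made-up placeholders, the helper names are hypothetical, and the sketch assumes every ordered pair is run (A tested on B and B tested on A); the package may pair groups differently.

```r
# Hypothetical sketch of train/test pairings; names are placeholders.

# Cross-test: every ordered (train, test) pair of distinct groups.
# Groups are drugs/classes without stratification, strata with it.
ordered_pairs <- function(groups) {
  grid <- expand.grid(train = groups, test = groups,
                      stringsAsFactors = FALSE)
  grid <- grid[grid$train != grid$test, , drop = FALSE]
  rownames(grid) <- NULL
  grid
}

# LOO with cross-test: train without one stratum, test on all strata
loo_splits <- function(strata) {
  lapply(strata, function(left_out) {
    list(left_out = left_out,
         train = setdiff(strata, left_out),
         test = strata)
  })
}

ordered_pairs(c("drugA", "drugB", "drugC"))       # 6 ordered pairs
loo_splits(c("2019", "2020", "2021"))[[1]]$train  # "2020" "2021"
```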
Model configuration:
Algorithm: Logistic regression with elastic-net regularization
Penalty values: 10^seq(-4, -1, length.out = 10)
Mixture (alpha): 0, 0.2, 0.4, 0.6, 0.8, 1.0 (ridge to lasso)
Selection metric: Matthews Correlation Coefficient (MCC)
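Random seed: 5280 by default (for reproducibility); may be overridden by ml_parameters.json when use_saved_split = TRUE
PCA: Disabled by default (enable with use_pca = TRUE; see pca_threshold)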
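The tuning grid and selection metric listed above can be reproduced in a few lines of base R. The grid columns are named after the penalty and mixture terms used in the text; mcc() is the standard binary Matthews correlation coefficient, written here for illustration rather than taken from the package. Returning 0 when a confusion-matrix margin is empty is a common convention but an assumption about how the package handles that case.

```r
# Elastic-net grid from the configuration above: 10 x 6 = 60 candidates
tune_grid <- expand.grid(
  penalty = 10^seq(-4, -1, length.out = 10),
  mixture = seq(0, 1, by = 0.2)  # 0 = ridge, 1 = lasso
)
nrow(tune_grid)  # 60

# Binary Matthews correlation coefficient from confusion-matrix counts
mcc <- function(tp, tn, fp, fn) {
  # Work in doubles so large counts cannot overflow integer arithmetic
  tp <- as.numeric(tp); tn <- as.numeric(tn)
  fp <- as.numeric(fp); fn <- as.numeric(fn)
  den <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (den == 0) return(0)  # assumed convention for an empty margin
  (tp * tn - fp * fn) / den
}

mcc(tp = 10, tn = 10, fp = 0, fn = 0)   # 1 (perfect classifier)
mcc(tp = 40, tn = 45, fp = 5, fn = 10)  # about 0.70
```

MCC ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction), which makes it less sensitive to class imbalance than accuracy.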
Output file naming: Files are saved with prefixes and suffixes indicating the configuration:
LOO: Prefixed with "LOO_"
Cross-test: Prefixed with "cross_test_"
Stratification: "_country" or "_year" appended after "ML"
For example: "LOO_cross_test_ML_year_performance.tsv"
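The naming rules above compose in a fixed order. The helper below is a hypothetical illustration of that order, assuming the base token is "ML" and that the stratum is appended directly after it, as in the example file name; it is not a function exported by the package.

```r
# Hypothetical helper: build the result prefix from the run configuration
result_prefix <- function(LOO = FALSE, cross_test = FALSE,
                          stratify_by = NULL) {
  prefix <- "ML"
  if (!is.null(stratify_by)) {
    # Only "year" and "country" are valid strata
    prefix <- paste0(prefix, "_", match.arg(stratify_by, c("year", "country")))
  }
  if (cross_test) prefix <- paste0("cross_test_", prefix)
  if (LOO) prefix <- paste0("LOO_", prefix)
  prefix
}

paste0(result_prefix(LOO = TRUE, cross_test = TRUE, stratify_by = "year"),
       "_performance.tsv")
# "LOO_cross_test_ML_year_performance.tsv"
```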
This function requires the following packages:
future - for parallel processing backend
future.apply - for parallel lapply
readr - for reading/writing TSV files
dplyr, purrr, stringr, tibble - for data manipulation
Ensure that loadMLInputTibble(), runMLPipeline(), and
createMLinputList() are available in your environment before calling this function.
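A quick pre-flight check can catch a missing dependency before a long run starts. The helper name is hypothetical; it only reports which of the listed packages can be loaded and which of the required functions are visible from the global environment or the attached search path.

```r
# Hypothetical pre-flight check before calling runMLmodels()
check_ml_prereqs <- function(
    pkgs = c("future", "future.apply", "readr", "dplyr", "purrr",
             "stringr", "tibble"),
    funs = c("loadMLInputTibble", "runMLPipeline", "createMLinputList")) {
  # TRUE for each package whose namespace can be loaded
  has_pkg <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  # TRUE for each helper function visible from the global environment
  has_fun <- vapply(funs, exists, logical(1),
                    mode = "function", envir = globalenv())
  c(has_pkg, has_fun)
}

status <- check_ml_prereqs()
names(status)[!status]  # anything listed here is missing
```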
See also: createMLinputList for generating input file lists,
runMDRmodels for MDR-specific model execution,
createMLResultDir for directory structure creation
Examples:
if (FALSE) { # \dontrun{
# Standard ML models (no stratification)
runMLmodels("/path/to/results")
# Cross-test between drugs (no stratification)
runMLmodels("/path/to/results", cross_test = TRUE)
# Stratified by year
runMLmodels("/path/to/results", stratify_by = "year")
# Cross-test with year stratification
runMLmodels("/path/to/results",
stratify_by = "year",
cross_test = TRUE,
threads = 32)
# LOO analysis stratified by country with cross-testing
runMLmodels("/path/to/results",
stratify_by = "country",
LOO = TRUE,
cross_test = TRUE,
verbose = TRUE)
# Run without saving model fits (saves disk space)
runMLmodels("/path/to/results",
stratify_by = "year",
return_fit = FALSE)
} # }