Parses the Parquet file names in the matrix directory and returns a tibble that maps each input file to its output paths. Supports several analysis modes: standard, cross-test, Leave-One-Out (LOO), and Multi-Drug Resistance (MDR).

createMLinputList(
  path,
  stratify_by = NULL,
  LOO = FALSE,
  MDR = FALSE,
  cross_test = FALSE
)

Arguments

path

Character scalar. Base directory path containing matrix subdirectories.

stratify_by

Character scalar or NULL. Stratification variable: "country" or "year". NULL (the default) disables stratification.

LOO

Logical. Whether to perform Leave-One-Out analysis. Requires a non-NULL stratify_by.

MDR

Logical. Whether to perform Multi-Drug Resistance analysis.

cross_test

Logical. Whether to perform cross-testing between groups.

Value

A tibble with columns:

ref_file

Path to reference/training parquet file

test_file

Path to the test Parquet file (NA when not cross-testing)

output_prefix

Prefix for output files

matrix_path

Directory containing matrix files

out_perf

Directory for performance output

out_top

Directory for top features output

out_models

Directory for model objects

out_pred

Directory for predictions
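
As a rough illustration of the shape described above, the sketch below builds a two-row mock with the documented column names. It is not output from createMLinputList(): every path, file name, and prefix is invented, the output subdirectory names are assumptions, and a base data.frame stands in for the tibble so the snippet runs without extra packages.

```r
# Mock of the documented columns. Every path and prefix here is invented for
# illustration; a base data.frame stands in for the tibble.
mock_inputs <- data.frame(
  ref_file      = c("results/matrix/a.parquet", "results/matrix/a.parquet"),
  test_file     = c(NA, "results/matrix/b.parquet"),
  output_prefix = c("a", "a_vs_b"),
  matrix_path   = "results/matrix",
  out_perf      = "results/performance",
  out_top       = "results/top_features",
  out_models    = "results/models",
  out_pred      = "results/predictions",
  stringsAsFactors = FALSE
)

# Rows with a non-NA test_file carry a separate test set (cross-test);
# rows with NA do not.
cross_rows    <- mock_inputs[!is.na(mock_inputs$test_file), ]
standard_rows <- mock_inputs[is.na(mock_inputs$test_file), ]
```

Checking is.na(test_file) is one simple way to tell the two kinds of rows apart when a downstream step needs to treat them differently.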

Examples

if (FALSE) { # \dontrun{
# Standard ML input list
inputs <- createMLinputList("/path/to/results")

# Cross-test with year stratification
inputs_ct <- createMLinputList("/path/to/results",
                               stratify_by = "year",
                               cross_test = TRUE)

# MDR analysis
inputs_mdr <- createMLinputList("/path/to/results", MDR = TRUE)
} # }
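
A possible follow-up step, sketched under assumptions: before running models, a caller might make sure the four output directories named in the result exist. prepare_output_dirs() is a hypothetical helper, not part of the package, and the mock input below uses a temporary directory and invented subdirectory names instead of real output from createMLinputList(). Whether the function already creates these directories itself is not stated here.

```r
# Hypothetical helper (not part of the package): create every output
# directory listed in the documented out_* columns.
prepare_output_dirs <- function(inputs) {
  dir_cols <- c("out_perf", "out_top", "out_models", "out_pred")
  dirs <- unique(unlist(inputs[dir_cols], use.names = FALSE))
  for (d in dirs) {
    # recursive = TRUE also creates missing parent directories
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(dirs)
}

# Mock input with invented paths under a temporary directory.
root <- file.path(tempdir(), "ml_demo")
mock_inputs <- data.frame(
  out_perf   = file.path(root, "performance"),
  out_top    = file.path(root, "top_features"),
  out_models = file.path(root, "models"),
  out_pred   = file.path(root, "predictions"),
  stringsAsFactors = FALSE
)

created <- prepare_output_dirs(mock_inputs)
```

Taking unique() over the directory columns keeps the helper cheap when many rows share the same output locations.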