Determines whether a drug-bug combination should be skipped because it has too few samples, or too imbalanced a class distribution, to support cross-validation or train/validation/test splits.
genome_ids: The subsetted genome IDs for a given matrix combination
phenotype_counts: The count of AMR observations for that combination
n_fold: Number of CV folds
split: For CV: c(1, 0); for classical splits: c(train_prop, val_prop)
min_total_obs: Minimum total observations required before attempting CV (default 40)
verbosity: "minimal" or "debug"; "debug" prints more information for each processed matrix
Logical; TRUE if matrix should be skipped, FALSE if matrix should be generated
if (FALSE) { # \dontrun{
# Check if a drug-bug combination should be skipped
genome_ids <- c("genome1", "genome2", "genome3")
phenotype_counts <- data.frame(
  phenotype = c("Resistant", "Susceptible"),
  count = c(2, 1)
)
# For 5-fold CV
skip <- skipImbalancedMatrix(
  genome_ids = genome_ids,
  phenotype_counts = phenotype_counts,
  n_fold = 5,
  split = c(1, 0),
  min_total_obs = 40,
  verbosity = "minimal"
)
} # }
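The kind of check described above can be illustrated with a minimal sketch. This is not the implementation of skipImbalancedMatrix: the helper name should_skip_sketch is hypothetical, and the specific rules (at least one minority-class observation per CV fold, and at least one per train/val/test partition) are assumptions chosen for illustration. Only the min_total_obs threshold and the meaning of split come from the documentation above.

```r
# Hypothetical sketch; the real skipImbalancedMatrix() may use different rules.
should_skip_sketch <- function(phenotype_counts, n_fold = 5,
                               split = c(1, 0), min_total_obs = 40) {
  total <- sum(phenotype_counts$count)
  minority <- min(phenotype_counts$count)

  # Too few observations overall (threshold taken from min_total_obs).
  if (total < min_total_obs) return(TRUE)

  # A single class, or an empty class, cannot be stratified.
  if (nrow(phenotype_counts) < 2 || minority == 0) return(TRUE)

  if (isTRUE(all.equal(split, c(1, 0)))) {
    # CV case (assumed rule): every fold needs at least one minority sample.
    return(minority < n_fold)
  }

  # Classical split (assumed rule): the test proportion is whatever remains
  # after train and val, and each partition needs one minority sample.
  props <- c(split, 1 - sum(split))
  any(floor(minority * props) < 1)
}

example_counts <- data.frame(
  phenotype = c("Resistant", "Susceptible"),
  count = c(2, 1)
)
should_skip_sketch(example_counts, n_fold = 5, split = c(1, 0))
```

With the three observations used in the example above, the sketch returns TRUE because the total falls below min_total_obs = 40, so the later class-balance checks are never reached.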