Splits an ML-ready tibble into training and testing sets and, in some cases, a validation set.

splitMLInputTibble(ml_input_tibble, split = c(0.6, 0.2), seed = NULL)

Arguments

ml_input_tibble

An ML-ready tibble generated by loadMLInputTibble(). It must contain exactly one of two target variable columns: genome_drug.resistant_phenotype ("Resistant" or "Susceptible" classification for one bug/drug combination) or resistant_classes (multi-class classification of the drug classes to which each genome is resistant).

split

num Vector of length 2 giving the proportions of the data assigned to the training and validation sets, respectively. Defaults to c(0.6, 0.2).

seed

num Optional. If supplied, the split is seeded (and the caller's RNG state restored afterward) for standalone reproducibility. When NULL (the default, as used by runMLPipeline()), the split inherits the ambient RNG stream so it can share one seed with downstream tuning and fitting.
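The seed-and-restore behaviour described for seed can be illustrated in base R. This is only a sketch of the general technique, not the package's implementation: with_local_seed() is a hypothetical helper introduced here for illustration. It runs an expression under a fixed seed and then restores the caller's RNG state, and it draws from the ambient stream when seed is NULL.

```r
# Hypothetical helper (not part of the package): evaluate `expr` under a
# fixed seed, then restore the caller's RNG state.
with_local_seed <- function(seed, expr) {
  if (is.null(seed)) {
    return(expr)  # no seed: use the ambient RNG stream as-is
  }
  # Remember whether an RNG state existed and, if so, what it was.
  had_state <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_state) {
    old_state <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
  }
  # Restore (or remove) the state when the function exits.
  on.exit({
    if (had_state) {
      assign(".Random.seed", old_state, envir = globalenv())
    } else {
      rm(".Random.seed", envir = globalenv())
    }
  }, add = TRUE)
  set.seed(seed)
  expr  # lazily evaluated here, i.e. after set.seed()
}

# Two seeded calls give the same draw ...
a <- with_local_seed(42, sample(10))
b <- with_local_seed(42, sample(10))

# ... and a seeded call leaves the caller's stream untouched.
set.seed(1)
x1 <- runif(1)
set.seed(1)
invisible(with_local_seed(42, sample(10)))
x2 <- runif(1)
```

Here a and b are identical, and x1 equals x2 because the seeded call in between did not advance the caller's stream.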

Value

An rsplit object, as defined by the rsample package.
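An rsplit object stores the row assignments rather than separate data frames, so the partitions are extracted with rsample's accessor functions. The sketch below assumes the rsample package is installed and uses rsample::initial_split() on a toy data frame as a stand-in for the return value of splitMLInputTibble(); the toy data and the 0.8 proportion are illustrative assumptions.

```r
library(rsample)

# Toy stand-in data (assumption): 20 genomes with a binary phenotype.
toy <- data.frame(
  genome_id = paste0("g", 1:20),
  genome_drug.resistant_phenotype = rep(c("Resistant", "Susceptible"), each = 10)
)

# Stand-in for the object returned by splitMLInputTibble().
set.seed(42)
sp <- initial_split(toy, prop = 0.8)

# Extract the partitions from the rsplit object.
train <- training(sp)
test <- testing(sp)

# Every row lands in exactly one partition.
nrow(train) + nrow(test) == nrow(toy)
length(intersect(train$genome_id, test$genome_id)) == 0
```

If the returned object also carries a validation partition, rsample provides a validation() accessor for three-way splits; whether it applies here depends on how the function builds its result.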

Examples

ml <- tibble::tibble(
  genome_id = paste0("g", 1:20),
  genome_drug.resistant_phenotype = rep(c("Resistant", "Susceptible"), each = 10),
  feat_a = rep(c(0L, 1L), 10),
  feat_b = rep(c(1L, 0L), 10)
)
splitMLInputTibble(ml, split = c(1, 0), seed = 42)
#> <Training/Testing/Total>
#> <16/4/20>