amRml is the machine learning (ML) engine of the amR suite (https://github.com/JRaviLab/amR) for antimicrobial resistance prediction. It trains interpretable ML models to predict antimicrobial resistance (AMR) from the multi-scale genomic features prepared by amRdata. The package includes streamlined pipelines for preparing ML-ready inputs (generateMLInputs()), training models per drug and feature scale (runMLmodels()), and modeling multi-drug resistance (runMDRmodels()), all with reproducible, parallelized model execution.
amRml provides functions to produce ML-ready tibbles, train logistic regression models, and export clean performance summaries, feature rankings, predictions, and tuned model objects.
The models are optimized for reproducibility, interpretability, and benchmarking across species and antibiotic classes. See the introductory article (https://jravilab.github.io/amRml/articles/intro.html) for full examples.
# Install from GitHub
if (!requireNamespace("remotes", quietly = TRUE))
  install.packages("remotes")
remotes::install_github("JRaviLab/amRml")
generateMLInputs() converts the Parquet-backed DuckDB file created by amRdata into a standard set of ready-to-model matrices.
library(amRml)
generateMLInputs(
  parquet_duckdb_path = "data/Shigella_flexneri/Sfl_parquet.duckdb",
  out_path = "results/Shigella_flexneri/",
  n_fold = 5,
  verbosity = "minimal"
)
This analyzes the Parquet tables attached to the {Bug}_parquet.duckdb database created by the amRdata::cleanData() step at the end of the amRdata::runDataProcessing() workflow. For each drug or drug class, it counts how many isolates have paired genotype and AMR phenotype data, and produces ML-ready matrix subsets for each feature scale.
The result is every modeling matrix that can be built from the dataset, ready for the ML modeling workflow described next.
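The pairing step described above can be illustrated with a small base R sketch. This is not how generateMLInputs() is implemented; the toy tables and the column names (isolate_id, drug, phenotype, gene_a, gene_b) are hypothetical and do not reflect the actual amRdata schema.

```r
# Toy genotype table: one row per isolate (hypothetical schema)
genotypes <- data.frame(
  isolate_id = c("iso1", "iso2", "iso3", "iso4"),
  gene_a     = c(1, 0, 1, 1),
  gene_b     = c(0, 0, 1, 0)
)

# Toy phenotype table: one row per isolate-drug test result (hypothetical schema)
phenotypes <- data.frame(
  isolate_id = c("iso1", "iso2", "iso3", "iso5", "iso1"),
  drug       = c("ampicillin", "ampicillin", "ampicillin",
                 "ampicillin", "ciprofloxacin"),
  phenotype  = c("R", "S", "R", "S", "S")
)

# Keep only isolates that have both genotype and phenotype data
paired <- merge(genotypes, phenotypes, by = "isolate_id")

# Count paired isolates per drug
paired_counts <- table(paired$drug)

# Split into one modeling table per drug
per_drug <- split(paired, paired$drug)
```

Here iso4 (no phenotype) and iso5 (no genotype) drop out of the merge, so only isolates with both data types contribute to a drug's matrix.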
runMLmodels() executes logistic regression ML models across all prepared matrices.
runMLmodels(
  path = "results/Shigella_flexneri/",
  threads = 16
)
This produces 4 output directories:
- performance: Containing performance metrics per bug-drug-feature_scale model
- pred: Containing the specific predictions made per model
- top_features: Containing which specific features drove model predictions per model
- models: Containing the model fits themselves in .Rds format
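After a run, it can be handy to confirm that the four output directories exist and to locate the saved model files. The base R sketch below assumes the directories sit directly under the path passed to runMLmodels(); the helpers check_ml_outputs() and list_model_files() are hypothetical and not part of amRml.

```r
# Output subdirectories named in this README
expected_dirs <- c("performance", "pred", "top_features", "models")

# Hypothetical helper: report which expected subdirectories exist under `path`
check_ml_outputs <- function(path) {
  present <- dir.exists(file.path(path, expected_dirs))
  names(present) <- expected_dirs
  present
}

# Hypothetical helper: list saved model files (.Rds) under `path`/models
list_model_files <- function(path) {
  list.files(file.path(path, "models"),
             pattern = "\\.[Rr]ds$", full.names = TRUE)
}
```

For example, check_ml_outputs("results/Shigella_flexneri/") returns a named logical vector, and any file returned by list_model_files() can be loaded with readRDS().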
These models can also be adjusted with parameters such as shuffle_labels = TRUE (trains random baseline models to compare against), use_pca = TRUE (reduces the feature space by using principal components as features), and stratify_by = "country" (stratifies samples by country of origin to identify regional trends in AMR and to assess how well models generalize), among other options.
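The idea behind a shuffled-label baseline can be sketched in base R with simulated data. This is a conceptual illustration only, not amRml's implementation: the toy data, the fit_accuracy() helper, and the 0.5 threshold are all assumptions made for the example.

```r
set.seed(42)

# Simulated data: one informative feature and a binary resistance label
n <- 200
feature <- rnorm(n)
resistant <- rbinom(n, size = 1, prob = plogis(2 * feature))
toy <- data.frame(feature = feature, resistant = resistant)

# Hypothetical helper: fit a logistic regression and return training
# accuracy at a 0.5 probability threshold
fit_accuracy <- function(data) {
  fit <- glm(resistant ~ feature, data = data, family = binomial())
  predicted <- as.integer(predict(fit, type = "response") > 0.5)
  mean(predicted == data$resistant)
}

# Baseline: permute the labels to break any feature-label relationship
shuffled <- toy
shuffled$resistant <- sample(toy$resistant)

acc_real <- fit_accuracy(toy)
acc_shuffled <- fit_accuracy(shuffled)
```

Because shuffling preserves the class balance but removes the signal, a model that genuinely learns from the features should generally score above its shuffled counterpart; a real comparison would use held-out data rather than training accuracy.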
runMDRmodels() trains models that predict aggregated multi‑drug resistance phenotypes.
runMDRmodels(
  path = "results/Shigella_flexneri/",
  threads = 16
)
This uses MDR-specific matrices to test whether ML models can predict resistance against multiple drug classes, and to identify features associated with MDR.
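To make the notion of an aggregated MDR phenotype concrete, here is a minimal base R sketch that derives a binary MDR label from per-class resistance calls. The threshold of three or more resistant drug classes is an assumption chosen for illustration; this README does not state how amRml defines MDR, and the data and column names are hypothetical.

```r
# Toy per-isolate, per-class resistance calls (hypothetical data)
calls <- data.frame(
  isolate_id = rep(c("iso1", "iso2", "iso3"), each = 4),
  drug_class = rep(c("beta-lactam", "quinolone",
                     "aminoglycoside", "tetracycline"), times = 3),
  resistant  = c(1, 1, 1, 0,   # iso1: resistant to 3 classes
                 1, 0, 0, 0,   # iso2: resistant to 1 class
                 0, 0, 0, 0)   # iso3: resistant to none
)

# Assumed threshold: resistance to at least 3 classes counts as MDR
mdr_threshold <- 3

# Count resistant classes per isolate, then derive a binary MDR label
n_resistant <- tapply(calls$resistant, calls$isolate_id, sum)
mdr <- data.frame(
  isolate_id = names(n_resistant),
  n_classes  = as.integer(n_resistant),
  mdr        = as.integer(n_resistant >= mdr_threshold)
)
```

With this toy input only iso1 is labeled MDR; changing mdr_threshold shows how sensitive the aggregated label is to the chosen definition.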
See the package vignette for detailed usage.
amRml is designed to work seamlessly with other amR packages:
library(amRdata)
library(amRml)
library(amRshiny)
# 1. Curate data
prepareGenomes("Shigella flexneri")
runDataProcessing("amRdata/data/Shigella_flexneri/Sfl.duckdb")
# 2. Train models
runMLmodels("amRdata/data/Shigella_flexneri/Sfl_parquet.duckdb")
# 3. Visualize
launchAMRDashboard()
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Report bugs and request features at: https://github.com/JRaviLab/amRml/issues
BSD 3-Clause License. See LICENSE for details.
Corresponding author: Janani Ravi (janani.ravi@cuanschutz.edu)
Lab website: https://jravilab.github.io
Please note that amRml is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.