R/data_curation.R
prepareGenomes.Rd
This wrapper takes the user_bacs input (species names and/or taxon IDs),
retrieves the corresponding filtered genome set from BV-BRC, downloads all
required genome files (.fna, .PATRIC.faa, .PATRIC.gff), and produces the
Panaroo input table via genomeList(). These outputs are then used by the
data_processing.R script.
prepareGenomes(
user_bacs,
base_dir = ".",
method = c("ftp", "cli"),
overwrite = FALSE,
verbose = TRUE
)
user_bacs Character vector. Species names and/or taxon IDs (e.g.
c("Shigella flexneri", "623")).
base_dir Character. Project root directory. Default ".".
method Character. Download method passed to retrieveGenomes():
"ftp" (default) or "cli".
overwrite Logical. Passed to metadata filtering and DuckDB creation. Default FALSE.
verbose Logical. Print progress messages. Default TRUE.
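The method default is written as a vector of choices, c("ftp", "cli"). A minimal sketch of how such a default usually resolves in R is given here, assuming the function validates it with match.arg(). That is the conventional pattern for this signature style, but it is not confirmed by this page. resolve_method() is a hypothetical stand-in, not part of the package.

```r
# Hypothetical stand-in with the same `method` default as prepareGenomes().
# Assumption: the real function resolves `method` with match.arg().
resolve_method <- function(method = c("ftp", "cli")) {
  match.arg(method)
}

resolve_method()       # nothing supplied: the first choice is used
resolve_method("cli")  # an explicit, valid choice is returned unchanged
```

Under this pattern, any value other than one of the listed choices raises an error instead of silently falling back to the default.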
A list (the output of genomeList()), containing:
duckdbConnection Active DBI connection to the per-selection (per-bug) DuckDB
table_name Name of the table written by genomeList(), "files"
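The returned list can be inspected with standard DBI calls. The sketch below assumes only the two elements documented above (duckdbConnection and table_name). preview_files() is a hypothetical helper written for illustration, and the DBI package must be installed for the query to run.

```r
# Sketch only: `res` is assumed to be the list returned by prepareGenomes().
# preview_files() is a hypothetical helper, not part of the package.
preview_files <- function(res, n = 5L) {
  stopifnot(
    is.list(res),
    !is.null(res$duckdbConnection),
    !is.null(res$table_name)
  )
  # Quote the table name for the backend, then fetch the first n rows.
  tbl <- DBI::dbQuoteIdentifier(res$duckdbConnection, res$table_name)
  sql <- paste("SELECT * FROM", tbl, "LIMIT", as.integer(n))
  DBI::dbGetQuery(res$duckdbConnection, sql)
}

# Usage, assuming the package that provides prepareGenomes() is loaded:
# res <- prepareGenomes(c("Shigella flexneri", "623"), base_dir = ".", method = "ftp")
# preview_files(res)
# DBI::dbDisconnect(res$duckdbConnection)  # close the connection when finished
```

Because the connection is returned active, closing it is left to the caller once the "files" table is no longer needed.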
Internally, this runs:
retrieveGenomes() – filters BV-BRC metadata, selects genomes, downloads files.
genomeList() – scans downloaded files and writes a "files" table
in the per-selection DuckDB.
The per-selection DuckDB is located automatically under:
<base_dir>/data/<bug_dir>/
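Since the name of <bug_dir> is chosen by the package, it is safer to discover it than to construct it by hand. The sketch below lists the candidate directories under <base_dir>/data/. list_selection_dirs() and the folder name example_bug_dir are hypothetical and used only for demonstration.

```r
# Sketch: list candidate per-selection directories under <base_dir>/data/.
# list_selection_dirs() is a hypothetical helper, not part of the package.
list_selection_dirs <- function(base_dir = ".") {
  data_dir <- file.path(base_dir, "data")
  if (!dir.exists(data_dir)) {
    return(character(0))  # nothing has been prepared under this base_dir yet
  }
  list.dirs(data_dir, full.names = TRUE, recursive = FALSE)
}

# Demonstration on a throwaway directory tree (the folder name is made up).
demo_root <- file.path(tempdir(), "prepareGenomes_demo")
dir.create(file.path(demo_root, "data", "example_bug_dir"),
           recursive = TRUE, showWarnings = FALSE)
basename(list_selection_dirs(demo_root))
```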