Download and prepare all files for a chosen bacterial species or TaxID

This wrapper takes user_bacs (species names and/or taxon IDs), retrieves the corresponding filtered genome set from BV-BRC, downloads the required genome files (.fna, .PATRIC.faa, .PATRIC.gff) for each genome, and builds the Panaroo input table via genomeList(). The data_processing.R script, which is run next, consumes these outputs.

prepareGenomes(
  user_bacs,
  base_dir = ".",
  method = c("ftp", "cli"),
  overwrite = FALSE,
  verbose = TRUE
)

Arguments

user_bacs: Character vector. Species and/or taxon IDs (e.g. c("Shigella flexneri", "623")).
base_dir: Character. Project root directory. Default ".".
method: Character. Download method passed to retrieveGenomes(). "ftp" (default) or "cli".
overwrite: Logical. Passed through to the metadata-filtering and DuckDB-creation steps. Default FALSE.
verbose: Logical. Print progress messages. Default TRUE.
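
The method = c("ftp", "cli") default follows the common R idiom in which the first element of the vector is used when the argument is omitted. The sketch below illustrates that idiom with match.arg(); resolve_method() is a hypothetical stand-in, and the package may validate method differently.

```r
# Hypothetical stand-in showing how a vector default is usually resolved in R.
# This is NOT the package's code; prepareGenomes() may check `method` another way.
resolve_method <- function(method = c("ftp", "cli")) {
  # match.arg() returns the first choice when `method` is left at its default
  # and raises an error for values outside the allowed set.
  match.arg(method)
}

resolve_method()        # argument omitted: the first choice is selected
resolve_method("cli")   # explicit choice is returned as given
```

Passing a value outside the two documented choices would raise an error in this sketch, which is the usual reason for declaring the default as a vector.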

Value

A list (the output of genomeList()), containing:

duckdbConnection: Active DBI connection to the per-selection DuckDB.
table_name: Character. Name of the table holding the file listing ("files").
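
As a rough illustration of how the returned list might be used, the sketch below previews the first rows of the file table through the DBI connection. peek_files() is a hypothetical helper, not part of the package, and it assumes only the two list elements documented above; the table's columns are not documented here, so the query selects all of them.

```r
# Sketch only: `peek_files` is a hypothetical helper, not part of the package.
# `res` is assumed to be the list returned by prepareGenomes() / genomeList().
peek_files <- function(res, n = 5L) {
  con <- res$duckdbConnection
  # Quote the table name so it is safe to splice into the SQL string.
  tbl <- DBI::dbQuoteIdentifier(con, res$table_name)
  DBI::dbGetQuery(con, paste0("SELECT * FROM ", tbl, " LIMIT ", as.integer(n)))
}

# Example call (requires the package, network access, and the DBI package):
# res <- prepareGenomes(c("Shigella flexneri", "623"), base_dir = ".")
# peek_files(res)
# DBI::dbDisconnect(res$duckdbConnection)
```

Because the connection is returned open, closing it with DBI::dbDisconnect() once the table is no longer needed is a reasonable habit.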

Details

Internally, the wrapper calls two functions in order:

retrieveGenomes() – filters BV-BRC metadata, selects genomes, downloads files.
genomeList() – scans downloaded files and writes a "files" table in the per-selection DuckDB.

The per-selection DuckDB is located automatically under: <base_dir>/data/<bug_dir>/.duckdb