This wrapper takes the user_bacs input (species names and/or taxon IDs), retrieves the corresponding filtered genome set from BV-BRC, downloads the required genome files (.fna, .PATRIC.faa, .PATRIC.gff), and builds the Panaroo input table via genomeList(). These outputs are consumed next by the data_processing.R script.

prepareGenomes(
  user_bacs,
  base_dir = ".",
  method = c("ftp", "cli"),
  overwrite = FALSE,
  verbose = TRUE
)
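
A call might look like the sketch below. It assumes the package that defines prepareGenomes() is already loaded and that BV-BRC is reachable; the exists() guard is only there so the snippet still runs when the function is not available. The input values are the ones given as an example in the argument description.

```r
# Species name and taxon ID, mixed in one character vector.
user_bacs <- c("Shigella flexneri", "623")

# Assumption: prepareGenomes() is provided by the package documented here.
# The guard skips the call if the function is not available in this session.
if (exists("prepareGenomes", mode = "function")) {
  res <- prepareGenomes(
    user_bacs,
    base_dir  = ".",     # project root
    method    = "ftp",   # or "cli"
    overwrite = FALSE,
    verbose   = TRUE
  )
}
```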

Arguments

user_bacs

Character vector. Species names and/or taxon IDs (e.g. c("Shigella flexneri", "623")).

base_dir

Character. Project root directory. Default ".".

method

Character. Download method passed to retrieveGenomes(). "ftp" (default) or "cli".

overwrite

Logical. Whether to overwrite existing results; passed to the metadata filtering and DuckDB creation steps. Default FALSE.

verbose

Logical. Print progress messages. Default TRUE.
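
The method = c("ftp", "cli") signature follows the common R idiom in which the first element of the choice vector acts as the default. The sketch below shows how such an argument typically resolves, assuming match.arg() is used internally, which this page does not confirm. resolve_method() is a hypothetical stand-in, not a function of the package.

```r
# Hypothetical stand-in that mirrors only the `method` argument.
resolve_method <- function(method = c("ftp", "cli")) {
  # With the argument left at its default vector, match.arg() returns the
  # first choice; otherwise it validates the supplied value.
  match.arg(method)
}

resolve_method()        # "ftp"
resolve_method("cli")   # "cli"
# resolve_method("http") would stop with an error: not one of the choices.
```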

Value

A list (the output of genomeList()), containing:

  • duckdbConnection: active DBI connection to the per-selection DuckDB

  • table_name: name of the table in that database, "files"
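
The returned list can be inspected with ordinary DBI calls. The sketch below uses a stand-in for the return value (an in-memory DuckDB with a toy "files" table) so that it runs without downloading anything. It assumes the DBI and duckdb packages are installed; the column names are hypothetical and do not describe the real table.

```r
library(DBI)

# Stand-in for the list returned by prepareGenomes(): an in-memory DuckDB
# with a toy "files" table. The column names are hypothetical.
con <- dbConnect(duckdb::duckdb())
dbWriteTable(con, "files", data.frame(
  genome_id = c("g1", "g2"),
  fna_path  = c("g1.fna", "g2.fna")
))
res <- list(duckdbConnection = con, table_name = "files")

# Inspect the table through the returned connection and table name.
stopifnot(dbExistsTable(res$duckdbConnection, res$table_name))
fields <- dbListFields(res$duckdbConnection, res$table_name)
files  <- dbReadTable(res$duckdbConnection, res$table_name)
n_rows <- dbGetQuery(
  res$duckdbConnection,
  paste0("SELECT COUNT(*) AS n FROM ",
         dbQuoteIdentifier(res$duckdbConnection, res$table_name))
)$n

# Close the connection when finished; shutdown = TRUE also stops the
# embedded DuckDB instance.
dbDisconnect(res$duckdbConnection, shutdown = TRUE)
```

Because the connection is returned open, the caller is responsible for closing it once the table is no longer needed.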

Details

Internally, this runs:

  1. retrieveGenomes() – filters BV-BRC metadata, selects genomes, downloads files.

  2. genomeList() – scans downloaded files and writes a "files" table in the per-selection DuckDB.

The per-selection DuckDB is located automatically under: <base_dir>/data/<bug_dir>/.duckdb
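
To check where the database for a given selection ends up, the directory can be assembled with file.path() and searched for DuckDB files. This is only a sketch: the bug_dir value is a hypothetical placeholder, since the actual directory name is chosen by the package.

```r
base_dir <- "."
# Hypothetical directory name; the real <bug_dir> is chosen by the package.
bug_dir  <- "Shigella_flexneri"

selection_dir <- file.path(base_dir, "data", bug_dir)

# List any DuckDB files in that directory. all.files = TRUE is needed so
# that names starting with a dot are included; a missing directory simply
# yields an empty character vector.
db_files <- list.files(selection_dir, pattern = "\\.duckdb$",
                       all.files = TRUE, full.names = TRUE)
```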