Runs Panaroo inside a Docker container on the genome annotation files prepared by genomeList(). The function optionally splits the input genomes into batches, runs Panaroo on each batch with strict cleaning and clustering options, and returns the result of each batch execution.

.runPanaroo(
  duckdb_path = "data/{Bug}/{Bug}.duckdb",
  output_path = "data/{Bug}/",
  core_threshold = 0.9,
  len_dif_percent = 0.95,
  cluster_threshold = 0.95,
  family_seq_identity = 0.5,
  threads = 8,
  split_jobs = FALSE
)

Arguments

duckdb_path

Character scalar. Path to the DuckDB database containing the "files" table.

output_path

Character scalar. Base directory for Panaroo outputs and temporary files.

core_threshold

Numeric. Core-genome sample threshold passed to Panaroo (--core_threshold): the proportion of isolates in which a gene must be present to count as core. Default 0.9.

len_dif_percent

Numeric. Length difference cutoff, given as a proportion (--len_dif_percent). Default 0.95.

cluster_threshold

Numeric. Sequence identity threshold (--threshold). Default 0.95.

family_seq_identity

Numeric. Gene family clustering identity (-f). Default 0.5.

threads

Integer. Number of threads for Panaroo and parallel execution. Default 8.

split_jobs

Logical. If TRUE, the isolates are split into multiple smaller pangenome jobs whose results can be merged with .mergePanaroo(). If FALSE (the default), all isolates are processed in a single run.
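
The batching that split_jobs = TRUE implies can be illustrated with a minimal base R sketch. The helper name split_into_batches, the batch_size argument, and the example file names are all hypothetical; the documentation does not state how .runPanaroo() chooses its batches, so this only shows one plausible way to partition a vector of annotation files.

```r
# Hypothetical helper (not part of the package): split annotation file paths
# into consecutive batches of at most `batch_size` files.
split_into_batches <- function(files, batch_size = 100L) {
  stopifnot(batch_size >= 1L)
  if (length(files) == 0L) {
    return(list())
  }
  # Files 1..batch_size get id 1, the next batch_size get id 2, and so on.
  batch_id <- ceiling(seq_along(files) / batch_size)
  unname(split(files, batch_id))
}

# Made-up input: 250 GFF file names.
gff_files <- sprintf("genome_%03d.gff", 1:250)
batches <- split_into_batches(gff_files, batch_size = 100L)

# Batch sizes: 100, 100, 50
print(lengths(batches))
```

Each element of batches would then correspond to one Panaroo run, and the per-batch outputs would be the inputs to a later merge step.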

Value

A list with one element per Panaroo batch, containing the result of that batch's execution. Each batch writes its files to its own output directory.

Details

  • Panaroo is run with the options --clean-mode strict, --merge_paralogs, and --remove-invalid-genes.

  • Temporary genome file lists are created in output_path.

  • Output directories are named panaroo_out_<timestamp> under output_path.
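
As a rough illustration of how the arguments map onto Panaroo's command line, the sketch below assembles an argument vector and a timestamped output directory in base R. The helper build_panaroo_args, the file name genome_list.txt, the example output_path, and the timestamp format are assumptions made for illustration; only the Panaroo flags themselves and the panaroo_out_ prefix come from the documentation above. The Docker wrapping is omitted because the image and mount points are not documented here.

```r
# Hypothetical helper (not part of the package): translate the documented
# arguments into a Panaroo argument vector.
build_panaroo_args <- function(input_list, out_dir,
                               core_threshold = 0.9,
                               len_dif_percent = 0.95,
                               cluster_threshold = 0.95,
                               family_seq_identity = 0.5,
                               threads = 8) {
  c(
    "-i", input_list,             # text file listing one GFF path per line
    "-o", out_dir,
    "--clean-mode", "strict",
    "--merge_paralogs",
    "--remove-invalid-genes",
    "--core_threshold", format(core_threshold),
    "--len_dif_percent", format(len_dif_percent),
    "--threshold", format(cluster_threshold),
    "-f", format(family_seq_identity),
    "-t", format(threads)
  )
}

# Assumed values: the base directory and the timestamp format are illustrative.
output_path <- "data/example/"
stamp <- format(Sys.time(), "%Y%m%d_%H%M%S")
out_dir <- file.path(sub("/+$", "", output_path), paste0("panaroo_out_", stamp))

args <- build_panaroo_args(
  input_list = file.path(sub("/+$", "", output_path), "genome_list.txt"),
  out_dir = out_dir
)

# Show the command as it would appear on a shell prompt.
cat(paste(c("panaroo", args), collapse = " "), "\n")
```

Keeping the arguments as a character vector, rather than one pasted string, lets them be handed to system2() or to a container runner without extra shell quoting.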