R/data_processing.R
dot-runPanaroo.Rd
Executes Panaroo inside a Docker container on genome annotation
files prepared by genomeList(). The function optionally splits input genomes
into batches, runs Panaroo with strict cleaning and clustering options, and
returns the results of each batch execution.
.runPanaroo(
duckdb_path = "data/{Bug}/{Bug}.duckdb",
output_path = "data/{Bug}/",
core_threshold = 0.9,
len_dif_percent = 0.95,
cluster_threshold = 0.95,
family_seq_identity = 0.5,
threads = 8,
split_jobs = FALSE
)
duckdb_path: Character scalar. Path to the DuckDB database containing the "files" table.
output_path: Character scalar. Base directory for Panaroo outputs and temporary files.
core_threshold: Numeric. Core genome threshold for Panaroo (--core_threshold). Default 0.9.
len_dif_percent: Numeric. Length difference percentage (--len_dif_percent). Default 0.95.
cluster_threshold: Numeric. Sequence identity threshold (--threshold). Default 0.95.
family_seq_identity: Numeric. Gene family clustering identity (-f). Default 0.5.
threads: Integer. Number of threads for Panaroo and parallel execution. Default 8.
split_jobs: Logical. If TRUE, split the isolates into multiple smaller pangenome
generation jobs whose outputs can be merged by .mergePanaroo(). If FALSE, run all isolates in one job. Default FALSE.
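The batching behind split_jobs = TRUE is not specified here. As a rough sketch only, assuming isolates are grouped into consecutive fixed-size batches, the split could look like the following. split_into_batches() and batch_size are hypothetical names used for illustration; they are not part of .runPanaroo().

```r
# Hypothetical helper: split a vector of annotation file paths into batches.
# `batch_size` is an illustrative parameter, not an argument of .runPanaroo().
split_into_batches <- function(files, batch_size = 100L) {
  stopifnot(is.character(files), batch_size >= 1L)
  if (length(files) == 0L) return(list())
  # Assign consecutive files to the same batch index: 1, 1, ..., 2, 2, ...
  batch_id <- ceiling(seq_along(files) / batch_size)
  unname(split(files, batch_id))
}

# Made-up file names, for demonstration only.
files <- sprintf("isolate_%03d.gff", 1:250)
batches <- split_into_batches(files, batch_size = 100L)
lengths(batches)
```

Each element of batches would then correspond to one Panaroo run whose output could later be passed to .mergePanaroo().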
A list with one element per Panaroo batch, each holding the result of running that batch in its output directory.
Panaroo uses: --clean-mode strict, --merge_paralogs, --remove-invalid-genes.
Temporary genome file lists are created in output_path.
Output directories are named panaroo_out_<timestamp> under output_path.
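To make the fixed options and the naming scheme concrete, here is a minimal sketch of how the Panaroo argument vector and a timestamped output directory might be assembled for one batch. build_panaroo_args(), the example paths, and the timestamp format are assumptions made for illustration; the Docker invocation and the package's actual internals are not shown.

```r
# Hypothetical helper: assemble Panaroo command-line arguments for one batch.
# Flag names follow the arguments and details documented on this page;
# -i, -o and -t are Panaroo's input, output and thread options.
build_panaroo_args <- function(input_list, out_dir,
                               core_threshold = 0.9,
                               len_dif_percent = 0.95,
                               cluster_threshold = 0.95,
                               family_seq_identity = 0.5,
                               threads = 8L) {
  c(
    "-i", input_list,
    "-o", out_dir,
    "--clean-mode", "strict",
    "--merge_paralogs",
    "--remove-invalid-genes",
    "--core_threshold", as.character(core_threshold),
    "--len_dif_percent", as.character(len_dif_percent),
    "--threshold", as.character(cluster_threshold),
    "-f", as.character(family_seq_identity),
    "-t", as.character(threads)
  )
}

# Assumed base directory and timestamp format, for illustration only.
output_path <- "data/example"
out_dir <- file.path(
  output_path,
  paste0("panaroo_out_", format(Sys.time(), "%Y%m%d_%H%M%S"))
)
# Assumed name for the temporary genome file list created in output_path.
input_list <- file.path(output_path, "genomes_batch_1.txt")

args <- build_panaroo_args(input_list, out_dir)
cat("panaroo", args, "\n")
```

Keeping the arguments as a character vector rather than one pasted string makes them straightforward to hand to system2() or to a Docker wrapper without extra quoting.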