Concatenates .faa files, executes CD-HIT in a Docker container,
and returns paths to the cluster output files.
.runCDHIT(
  duckdb_path,
  output_path,
  output_prefix = "cdhit_out",
  identity = 0.9,
  word_length = 5,
  threads = 0,
  memory = 0,
  extra_args = c("-g", "1")
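)

duckdb_path: Path to the DuckDB database containing the files table.
output_path: Directory to write the concatenated FASTA and CD-HIT results to.
output_prefix: String used to prefix CD-HIT output files.
identity: CD-HIT sequence identity threshold (-c).
word_length: CD-HIT word size (-n).
threads: Integer number of threads. In CD-HIT itself, 0 means all available CPUs.
memory: Integer memory limit in MB (-M). In CD-HIT itself, 0 means unlimited.
extra_args: Character vector of additional CD-HIT arguments.
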
Returns a list containing the paths to the concatenated FASTA and the cluster FASTA.
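
The argument descriptions map most parameters onto CD-HIT command-line flags. As a rough illustration of that mapping, the sketch below assembles a CD-HIT argument vector from the same defaults. `build_cdhit_args()` and the file names are hypothetical and are not part of this package, and passing `threads` as `-T` is an assumption based on CD-HIT's own options. How `.runCDHIT()` actually builds and runs its Docker command may differ.

```r
# Hypothetical helper, for illustration only: shows how the documented
# parameters could translate into CD-HIT command-line arguments.
build_cdhit_args <- function(input_fasta, output_file,
                             identity = 0.9, word_length = 5,
                             threads = 0, memory = 0,
                             extra_args = c("-g", "1")) {
  # CD-HIT expects the identity threshold as a fraction in (0, 1].
  stopifnot(identity > 0, identity <= 1)
  c(
    "-i", input_fasta,               # concatenated .faa input
    "-o", output_file,               # output file name (prefix)
    "-c", as.character(identity),    # sequence identity threshold
    "-n", as.character(word_length), # word size
    "-T", as.character(threads),     # threads (assumed flag; 0 = all CPUs)
    "-M", as.character(memory),      # memory limit in MB (0 = unlimited)
    extra_args                       # e.g. "-g 1" for the slower, more accurate mode
  )
}

# Assumed file names, chosen only for the example.
args <- build_cdhit_args("all_proteins.faa", "cdhit_out")
cat(paste(c("cd-hit", args), collapse = " "), "\n")
```

Keeping the arguments as a character vector rather than a single pasted string makes them safe to hand to `system2()` or a similar process-launching function without extra quoting.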