Concatenates .faa files, runs CD-HIT on the combined FASTA in a Docker container, and returns the paths to the concatenated FASTA and the CD-HIT cluster output.

.runCDHIT(
  duckdb_path,
  output_path,
  output_prefix = "cdhit_out",
  identity = 0.9,
  word_length = 5,
  threads = 0,
  memory = 0,
  extra_args = c("-g", "1")
)

Arguments

duckdb_path

Path to the DuckDB database file that contains the files table.

output_path

Directory in which to write the concatenated FASTA and the CD-HIT results.

output_prefix

String used to prefix CD-HIT output files.

identity

CD-HIT sequence identity threshold (-c), given as a fraction; the default of 0.9 means 90% identity.

word_length

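CD-HIT word size (-n). It should suit the identity threshold: for protein sequences the CD-HIT user guide recommends 5 for thresholds of 0.7 to 1.0, 4 for 0.6 to 0.7, 3 for 0.5 to 0.6, and 2 for 0.4 to 0.5.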

threads

Integer number of threads. If the value is passed through as CD-HIT's -T option, 0 means all available CPUs are used.

memory

Integer memory limit in MB (-M). In CD-HIT, 0 means no limit.

extra_args

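Character vector of additional CD-HIT arguments. The default, -g 1, selects CD-HIT's slower but more accurate mode, which assigns each sequence to its most similar cluster rather than to the first cluster that meets the threshold.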

Value

A list containing the paths to the concatenated FASTA file and the cluster FASTA file.
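
Examples

The following is a minimal sketch, not the package's implementation, of how the arguments above might map onto a CD-HIT command line. The helper names suggest_word_length and build_cdhit_args, the input file name all_proteins.faa, and the results directory are hypothetical. Passing threads as -T is an assumption, since the argument description does not name a flag. The word-size ranges follow the CD-HIT user guide for protein sequences.

```r
# Hypothetical helper: pick a CD-HIT word size (-n) that suits the
# identity threshold (-c), using the protein ranges from the CD-HIT guide.
suggest_word_length <- function(identity) {
  stopifnot(is.numeric(identity), length(identity) == 1,
            identity >= 0.4, identity <= 1)
  if (identity >= 0.7) {
    5L
  } else if (identity >= 0.6) {
    4L
  } else if (identity >= 0.5) {
    3L
  } else {
    2L
  }
}

# Hypothetical helper: assemble the CD-HIT argument vector.
# Assumption: threads maps to -T; the other flags are named in the
# argument descriptions (-c, -n, -M) or are standard CD-HIT I/O flags (-i, -o).
build_cdhit_args <- function(input_fasta, output_path,
                             output_prefix = "cdhit_out",
                             identity = 0.9, word_length = 5,
                             threads = 0, memory = 0,
                             extra_args = c("-g", "1")) {
  c("-i", input_fasta,
    "-o", file.path(output_path, output_prefix),
    "-c", as.character(identity),
    "-n", as.character(word_length),
    "-T", as.character(threads),
    "-M", as.character(memory),
    extra_args)
}

# Build the arguments for a 90% identity clustering run and print the command.
cdhit_args <- build_cdhit_args("all_proteins.faa", "results",
                               word_length = suggest_word_length(0.9))
cat("cd-hit", cdhit_args, "\n")
```

Deriving word_length from identity in this way avoids pairing a low threshold with a word size that CD-HIT would reject. The real function may validate or pass these values differently.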