Cleanup cluster file

Cleans a cluster file by removing rows that do not contain the query in the cluster.

This function removes rows whose ClustName value does not contain the query protein, and returns the cleaned-up data frame.
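
As a rough illustration of the row filter described above, the following base R sketch keeps only the rows whose ClustName contains that row's Query value. It is not the package's implementation: the toy data, the helper name `keep_query_rows`, and the choice of fixed (non-regex) matching against the Query column are all assumptions made for this example.

```r
# Toy input with the two required columns (values are made up).
prot <- data.frame(
  Query = c("P1", "P2", "P3"),
  ClustName = c("P1+DomA", "DomB+DomC", "DomA+P3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows where ClustName contains the row's Query.
keep_query_rows <- function(df) {
  hit <- vapply(
    seq_len(nrow(df)),
    function(i) grepl(df$Query[i], df$ClustName[i], fixed = TRUE),
    logical(1)
  )
  df[hit, , drop = FALSE]
}

cleaned <- keep_query_rows(prot)
```

Here the second row is dropped because "P2" does not occur in its ClustName; the real function may match differently.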

Usage

cleanup_clust(
  prot,
  domains_rename,
  domains_keep,
  repeat2s = TRUE,
  remove_tails = FALSE,
  remove_empty = FALSE
)

Arguments

prot

A data frame that must contain columns Query and ClustName.

domains_rename

A data frame containing the domain names to be replaced in a column 'old' and the corresponding replacement values in a column 'new'.

domains_keep

A data frame containing the domain names to be retained.

repeat2s

Boolean. If TRUE, repeated domains in 'ClustName' are condensed. Default is TRUE.

remove_tails

Boolean. If TRUE, 'ClustName' will be filtered based on domains to keep/remove. Default is FALSE.

remove_empty

Boolean. If TRUE, rows with empty/unnecessary values in 'ClustName' are removed. Default is FALSE.

Value

The cleaned-up data frame.

Examples

if (FALSE) { # \dontrun{
cleanup_clust(prot, domains_rename, domains_keep, repeat2s = TRUE, remove_tails = FALSE)
} # }
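
The inputs for such a call might be built as in the sketch below. The 'old' and 'new' columns of `domains_rename` follow the argument description; the column name `domains` in `domains_keep` and all values are assumptions for illustration only. The call itself is left unevaluated, since it needs the package to be loaded.

```r
# Toy query/cluster table (values are made up).
prot <- data.frame(
  Query = c("P1", "P2"),
  ClustName = c("P1+DomA_1", "DomB_like+P2"),
  stringsAsFactors = FALSE
)

# Renaming table: 'old' holds names to replace, 'new' their replacements.
domains_rename <- data.frame(
  old = c("DomA_1", "DomB_like"),
  new = c("DomA", "DomB"),
  stringsAsFactors = FALSE
)

# Domains to retain; the column name 'domains' is an assumption.
domains_keep <- data.frame(
  domains = c("DomA", "DomB"),
  stringsAsFactors = FALSE
)

if (FALSE) {
  # Named arguments avoid mixing up the two data frames and the flags.
  cleaned <- cleanup_clust(
    prot,
    domains_rename = domains_rename,
    domains_keep = domains_keep,
    repeat2s = TRUE,
    remove_tails = FALSE,
    remove_empty = FALSE
  )
}
```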