Cleanup cluster file
Cleans a cluster file by removing rows that do not contain the query in the cluster.
This function removes rows in which the ClustName column does not contain the query protein, and returns the cleaned-up data frame.
Usage
cleanup_clust(
prot,
domains_rename,
domains_keep,
repeat2s = TRUE,
remove_tails = FALSE,
remove_empty = FALSE
)
Arguments
- prot
A data frame that must contain the columns 'Query' and 'ClustName'.
- domains_rename
A data frame containing the domain names to be replaced in a column 'old' and the corresponding replacement values in a column 'new'.
- domains_keep
A data frame containing the domain names to be retained.
- repeat2s
Boolean. If TRUE, repeated domains in 'ClustName' are condensed. Default is TRUE.
- remove_tails
Boolean. If TRUE, 'ClustName' will be filtered based on domains to keep/remove. Default is FALSE.
- remove_empty
Boolean. If TRUE, rows with empty/unnecessary values in 'ClustName' are removed. Default is FALSE.
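Examples
The following is a minimal base-R sketch of two steps described on this page: dropping rows whose 'ClustName' does not contain the query, and renaming domains using the 'old' and 'new' columns. It does not call cleanup_clust() itself and may not match its internals. The protein and domain names are made up for illustration, and literal (non-regex) matching is an assumption.

```r
# Toy input with the two required columns (values are illustrative only).
prot <- data.frame(
  Query     = c("PspA", "PspA", "PspB"),
  ClustName = c("PspA+Snf7", "DUF1700", "PspB+PspC"),
  stringsAsFactors = FALSE
)

# Hypothetical rename table using the documented 'old' and 'new' columns.
domains_rename <- data.frame(
  old = "Snf7",
  new = "Snf7_fam",
  stringsAsFactors = FALSE
)

# Keep only rows whose ClustName contains that row's Query.
# Assumption: literal substring matching; the real function may differ.
has_query <- mapply(
  function(query, clust) grepl(query, clust, fixed = TRUE),
  prot$Query, prot$ClustName,
  USE.NAMES = FALSE
)
cleaned <- prot[has_query, , drop = FALSE]

# Apply each old -> new replacement to the remaining cluster names.
for (i in seq_len(nrow(domains_rename))) {
  cleaned$ClustName <- gsub(
    domains_rename$old[i], domains_rename$new[i],
    cleaned$ClustName,
    fixed = TRUE
  )
}

print(cleaned)
```

With the real function, the same 'prot' and 'domains_rename' objects would be passed as the first two arguments, together with a 'domains_keep' data frame.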