Clean up cluster file
Cleans a cluster file by removing rows that do not contain the query in the cluster.
This function removes rows in which the ClustName column does not contain the query protein given in the Query column. It returns the cleaned data frame.
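The row filter described above can be sketched in base R. This is an illustrative approximation, not the package's implementation: the toy values in prot are invented, and the sketch assumes that "contains the query" means a literal substring match of Query within ClustName.

```r
# Toy input; the values are invented for illustration only.
prot <- data.frame(
  Query = c("PspA", "PspA", "PspB"),
  ClustName = c("PspA+Snf7", "Toast_rack", "PspB+PspC"),
  stringsAsFactors = FALSE
)

# Keep a row only if its Query string occurs literally in its ClustName.
# mapply() compares the two columns row by row; fixed = TRUE turns off
# regular-expression matching so the query is treated as plain text.
has_query <- unname(mapply(
  function(query, clust) grepl(query, clust, fixed = TRUE),
  prot$Query, prot$ClustName
))

cleaned <- prot[has_query, , drop = FALSE]
print(cleaned)
```

The second row is dropped because "Toast_rack" does not contain "PspA". The real function may match differently (for example, case-insensitively), so treat this only as a picture of the idea.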
Usage
cleanClusters(
prot,
domains_rename,
domains_keep,
condenseRepeatedDomains = TRUE,
removeTails = FALSE,
removeEmptyRows = FALSE
)
Arguments
- prot
A data frame that must contain columns Query and ClustName.
- domains_rename
A data frame containing the domain names to be replaced in a column 'old' and the corresponding replacement values in a column 'new'.
- domains_keep
A data frame containing the domain names to be retained.
- condenseRepeatedDomains
Boolean. If TRUE, repeated domains in 'ClustName' are condensed. Default is TRUE.
- removeTails
Boolean. If TRUE, 'ClustName' is filtered based on the domains to keep or remove. Default is FALSE.
- removeEmptyRows
Boolean. If TRUE, rows with empty/unnecessary values in 'ClustName' are removed. Default is FALSE.