Cleanup Domain Architectures
Cleans the DomArch column by replacing/removing certain domains
This function cleans the DomArch column of a data frame by renaming certain domains according to a second data frame (domains_rename). Domains can also be removed according to an additional data frame (domains_ignore). The original data frame is returned with the cleaned domain architectures in the column named by new (default 'DomArch') and the original values in the column named by old (default 'DomArch.orig').
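The rename-and-remove idea described above can be sketched in base R. This is an illustration only, not the implementation behind cleanup_domarch(): the helper rename_domains(), the "+" separator between domains, and the domain names (DomA_v1, DomB, DomX) are all assumptions made for the example.

```r
# Illustrative sketch only; NOT the code behind cleanup_domarch().
# Assumptions: domains within an architecture are joined by "+",
# and all domain names below are hypothetical.
prot <- data.frame(
  DomArch = c("DomA_v1+DomX", "DomB+DomA_v1"),
  stringsAsFactors = FALSE
)
domains_rename <- data.frame(old = "DomA_v1", new = "DomA",
                             stringsAsFactors = FALSE)
domains_ignore <- data.frame(domains = "DomX", stringsAsFactors = FALSE)

# Hypothetical helper: rename domains via an old/new lookup table,
# then drop every domain listed in ignore$domains.
rename_domains <- function(domarch, rename, ignore, sep = "+") {
  vapply(strsplit(domarch, sep, fixed = TRUE), function(doms) {
    hit <- match(doms, rename$old)            # position in lookup, NA if absent
    doms[!is.na(hit)] <- rename$new[hit[!is.na(hit)]]
    doms <- doms[!doms %in% ignore$domains]   # remove ignored domains
    paste(doms, collapse = sep)
  }, character(1))
}

# Keep the original values, then overwrite with the cleaned ones
prot$DomArch.orig <- prot$DomArch
prot$DomArch <- rename_domains(prot$DomArch.orig, domains_rename, domains_ignore)
```

Keeping the untouched values in a separate column mirrors the behaviour described above, where both the cleaned and the original architectures are returned.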
Usage
cleanup_domarch(
  prot,
  old = "DomArch.orig",
  new = "DomArch",
  domains_keep,
  domains_rename,
  repeat2s = TRUE,
  remove_tails = FALSE,
  remove_empty = FALSE,
  domains_ignore = NULL
)
Arguments
- prot
A data frame containing a 'DomArch' column
- domains_keep
A data frame containing the domain names to be retained.
- domains_rename
A data frame containing the domain names to be replaced in a column 'old' and the corresponding replacement values in a column 'new'.
- repeat2s
Boolean. If TRUE, repeated domains in 'DomArch' are condensed. Default is TRUE.
- remove_tails
Boolean. If TRUE, 'ClustName' will be filtered based on domains to keep/remove. Default is FALSE.
- remove_empty
Boolean. If TRUE, rows with empty/unnecessary values in 'DomArch' are removed. Default is FALSE.
- domains_ignore
A data frame listing the domain names to be removed, in a column named 'domains'.
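A minimal sketch of how the input data frames described above might be assembled before a call. The domain names are hypothetical, and the 'domains' column used for domains_keep is an assumption, since the argument description does not name its column. The call itself is left commented out because it needs the package that provides cleanup_domarch().

```r
# Hypothetical inputs shaped after the argument descriptions above
prot <- data.frame(
  DomArch = c("DomA_v1+DomA_v1+DomB", "DomX"),
  stringsAsFactors = FALSE
)
domains_rename <- data.frame(old = "DomA_v1", new = "DomA",
                             stringsAsFactors = FALSE)
domains_ignore <- data.frame(domains = "DomX", stringsAsFactors = FALSE)
# Assumption: domains_keep uses a 'domains' column as well
domains_keep <- data.frame(domains = c("DomA", "DomB"),
                           stringsAsFactors = FALSE)

# Check the documented column names before calling the function
inputs_ok <- "DomArch" %in% names(prot) &&
  all(c("old", "new") %in% names(domains_rename)) &&
  "domains" %in% names(domains_ignore)

# With the package loaded, a call could then look like this:
# cleaned <- cleanup_domarch(
#   prot,
#   domains_keep = domains_keep,
#   domains_rename = domains_rename,
#   remove_empty = TRUE,
#   domains_ignore = domains_ignore
# )
```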