Filter the signature genes of a specific cell type against other subsets.

Identify the signature of the subset matching 'target_group' against all other subsets. When the input is a list of datasets, the per-dataset signatures are integrated into one according to 'comb', which can be union, intersect or "RRA".

filter_subset_sig(
  data,
  group_col,
  target_group,
  markers = NULL,
  normalize = TRUE,
  dir = "UP",
  gene_id = "SYMBOL",
  feature_selection = c("auto", "rankproduct", "none"),
  comb = union,
  filter = c(10, 10),
  s_thres = 0.05,
  ...
)

# S4 method for list
filter_subset_sig(
  data,
  group_col,
  target_group,
  markers = NULL,
  normalize = TRUE,
  dir = "UP",
  gene_id = "SYMBOL",
  feature_selection = c("auto", "rankproduct", "none"),
  comb = union,
  filter = c(10, 10),
  s_thres = 0.05,
  slot = "counts",
  batch = NULL,
  ...
)

# S4 method for DGEList
filter_subset_sig(
  data,
  group_col,
  target_group,
  markers = NULL,
  normalize = TRUE,
  dir = "UP",
  gene_id = "SYMBOL",
  feature_selection = c("auto", "rankproduct", "none"),
  comb = union,
  filter = c(10, 10),
  s_thres = 0.05,
  ...
)

# S4 method for ANY
filter_subset_sig(
  data,
  group_col,
  target_group,
  markers = NULL,
  normalize = TRUE,
  dir = "UP",
  gene_id = "SYMBOL",
  feature_selection = c("auto", "rankproduct", "none"),
  comb = union,
  filter = c(10, 10),
  s_thres = 0.05,
  ...
)

Arguments

data: an expression data object or a list of expression data objects
group_col: vector or character, the group factor itself or the column name of coldata that holds it, used for DE comparisons
target_group: pattern, matching the group of interest, e.g. 'NK'
markers: vector of gene symbols to be kept regardless of filtration. Default 'NULL' means no genes are kept by force.
normalize: logical, whether the expression in data is raw counts that need to be normalized
dir: character, 'UP' or 'DOWN', to use only up- or down-regulated genes
gene_id: character, specify the gene ID type of the rownames of the expression data when markers is not NULL, could be one of 'ENSEMBL', 'SYMBOL', 'ENTREZ'..., default 'SYMBOL'
feature_selection: one of "auto" (default), "rankproduct" or "none", whether to use rank product to select DEGs from the multiple comparisons of the DE analysis. 'auto' uses 'rankproduct' but switches to 'none' if the final number of genes is < 5 for both UP and DOWN
comb: 'RRA' or a function for combining signatures from multiple datasets, e.g. to keep all passing genes or only the intersected genes. Can be union, intersect, setdiff or a custom function, or 'RRA' to use the Robust Rank Aggregation method for integrating multiple lists of signatures, default 'union'
filter: (list of) vector of 2 numbers, the filter condition for removing lowly expressed genes: the 1st is min.counts (if normalize = TRUE) or the CPM/TPM cutoff (if normalize = FALSE), the 2nd is the sample size 'large.n'
s_thres: numeric, score threshold used when comb = 'RRA'
...: other params for get_degs()
slot: character, which slot to use, only applies to DGEList, sce or Seurat objects, optional, default 'counts'
batch: vector of character, column name(s) of coldata to be treated as batch effect factor, default NULL
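
The effect of passing a set function to 'comb' can be sketched with base R alone. This is a minimal illustration, not the package's internal code: the three signature vectors are made up for the example, and Reduce() is assumed here as one plausible way to fold a set function over per-dataset signatures. The helper name combine_sigs is hypothetical.

```r
# Hypothetical per-dataset signatures (gene symbols chosen only for illustration)
sig_list <- list(
  A = c("NCAM1", "KLRD1", "GNLY", "PRF1"),
  B = c("KLRD1", "GNLY", "NKG7"),
  C = c("GNLY", "KLRD1", "PRF1", "KLRF1")
)

# Fold a binary set function over the list, as a stand-in for `comb`
# (assumption: this mirrors the idea, not necessarily the implementation)
combine_sigs <- function(sigs, comb = union) Reduce(comb, sigs)

sig_union <- combine_sigs(sig_list, union)      # genes passing in any dataset
sig_inter <- combine_sigs(sig_list, intersect)  # genes passing in every dataset
sig_diff  <- combine_sigs(sig_list, setdiff)    # genes in A absent from B and C

print(sig_union)
print(sig_inter)
print(sig_diff)
```

union gives the most permissive signature and intersect the strictest. setdiff depends on the order of the list, so the first dataset acts as the reference.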

Value

a vector of gene symbols

Examples

data("im_data_6", "nk_markers")
sigs <- filter_subset_sig(im_data_6, "celltype:ch1", "NK",
  markers = nk_markers$HGNC_Symbol,
  gene_id = "ENSEMBL"
)
#> 'select()' returned 1:many mapping between keys and columns
#>        NK-Neutrophils NK-Monocytes NK-B.cells NK-CD4 NK-CD8
#> Down             4011         3946       3145   2696   2154
#> NotSig           1489         2688       4420   4995   6192
#> Up               4935         3801       2870   2744   2089
#> 'select()' returned 1:many mapping between keys and columns
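
The summary table printed above appears to hold one column per comparison of NK against another subset, with rows counting the genes called down-regulated, not significant and up-regulated. As a quick sanity check, the sketch below re-enters the printed numbers by hand (the matrix is copied from the output, not produced by the package) and compares the comparisons with each other.

```r
# Counts copied by hand from the printed summary table;
# matrix() fills column by column, so each triple is one comparison
de_summary <- matrix(
  c(4011, 1489, 4935,
    3946, 2688, 3801,
    3145, 4420, 2870,
    2696, 4995, 2744,
    2154, 6192, 2089),
  nrow = 3,
  dimnames = list(
    c("Down", "NotSig", "Up"),
    c("NK-Neutrophils", "NK-Monocytes", "NK-B.cells", "NK-CD4", "NK-CD8")
  )
)

# Number of genes covered by each comparison
genes_tested <- colSums(de_summary)
print(genes_tested)

# Share of genes called differentially expressed in each comparison
frac_de <- 1 - de_summary["NotSig", ] / genes_tested
print(round(frac_de, 2))
```

Every column sums to 10435 genes, so all comparisons cover the same gene set. The share of differentially expressed genes is highest against Neutrophils and lowest against CD8.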