Identify the signature of the subset matching 'target_group' against all other subsets. When the input is a list of datasets, "union", "intersect" or "RRA" can be specified to integrate the per-dataset signatures into one.

filter_subset_sig(
  data,
  group_col,
  target_group,
  markers = NULL,
  normalize = TRUE,
  dir = "UP",
  gene_id = "SYMBOL",
  feature_selection = c("auto", "rankproduct", "none"),
  comb = union,
  filter = c(10, 10),
  s_thres = 0.05,
  ...
)

# S4 method for list
filter_subset_sig(
  data,
  group_col,
  target_group,
  markers = NULL,
  normalize = TRUE,
  dir = "UP",
  gene_id = "SYMBOL",
  feature_selection = c("auto", "rankproduct", "none"),
  comb = union,
  filter = c(10, 10),
  s_thres = 0.05,
  slot = "counts",
  batch = NULL,
  ...
)

# S4 method for DGEList
filter_subset_sig(
  data,
  group_col,
  target_group,
  markers = NULL,
  normalize = TRUE,
  dir = "UP",
  gene_id = "SYMBOL",
  feature_selection = c("auto", "rankproduct", "none"),
  comb = union,
  filter = c(10, 10),
  s_thres = 0.05,
  ...
)

# S4 method for ANY
filter_subset_sig(
  data,
  group_col,
  target_group,
  markers = NULL,
  normalize = TRUE,
  dir = "UP",
  gene_id = "SYMBOL",
  feature_selection = c("auto", "rankproduct", "none"),
  comb = union,
  filter = c(10, 10),
  s_thres = 0.05,
  ...
)

Arguments

data

An expression data object or a list of expression data objects

group_col

vector or character, the group factor itself or the column name of coldata that defines the groups for DE comparisons

target_group

pattern, specify the group of interest, e.g. NK

markers

vector of gene names, listing the gene symbols to be kept regardless of filtration. The default 'NULL' means no genes are specially kept.

normalize

logical, whether the expression data in data are raw counts that need to be normalized

dir

character, could be 'UP' or 'DOWN' to use only up- or down-expressed genes

gene_id

character, specify the gene ID type of the rownames of the expression data when markers is not NULL, could be one of 'ENSEMBL', 'SYMBOL', 'ENTREZ'..., default 'SYMBOL'

feature_selection

one of "auto" (default), "rankproduct" or "none"; chooses whether to use rank product to select DEGs from the multiple comparisons of the DE analysis. 'auto' uses 'rankproduct' but falls back to 'none' if the final genes < 5 for both UP and DOWN

comb

'RRA' or a function for combining signatures from multiple datasets. A function such as union, intersect, setdiff or a customized function decides whether all passing genes or only the intersected genes are kept; 'RRA' uses the Robust Rank Aggregation method to integrate the multiple lists of signatures. Default is union

filter

(list of) vector of 2 numbers, the filter condition for removing lowly expressed genes: the 1st is min.counts (if normalize = TRUE) or the CPM/TPM cutoff (if normalize = FALSE), the 2nd is the sample size 'large.n'

s_thres

numeric, score threshold used when comb = 'RRA'

...

other params for get_degs()

slot

character, specify which slot to use; only applies to DGEList, sce or Seurat objects; optional, default 'counts'

batch

vector of character, column name(s) of coldata to be treated as batch effect factor, default NULL
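
The markers argument is described above as keeping genes regardless of filtration. A minimal base R sketch of that idea follows. It is not the package's actual filtering code: the count matrix, the gene names and the rule "any sample reaches min_count" are all made up for illustration, and the second element of filter ('large.n') is not modeled.

```r
# Toy count matrix: 4 genes x 3 samples (made-up numbers).
counts <- matrix(
  c(50, 60, 55,
     2,  1,  0,
    12, 30,  8,
     0,  3,  1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"), paste0("s", 1:3))
)

markers <- c("geneD") # hypothetical genes to keep regardless of expression
min_count <- 10       # stand-in for the first element of `filter`

# Simplified rule (an assumption): a gene passes if any sample reaches min_count.
passes <- apply(counts, 1, function(x) any(x >= min_count))

# Markers are retained even when they fail the expression filter.
keep <- passes | rownames(counts) %in% markers
kept_genes <- rownames(counts)[keep]
kept_genes
```

Here geneD fails the toy expression filter but is still kept because it is listed in markers.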

Value

a vector of gene symbols

Examples

data("im_data_6", "nk_markers")
sigs <- filter_subset_sig(im_data_6, "celltype:ch1", "NK",
  markers = nk_markers$HGNC_Symbol,
  gene_id = "ENSEMBL"
)
#> 'select()' returned 1:many mapping between keys and columns
#>        NK-Neutrophils NK-Monocytes NK-B.cells NK-CD4 NK-CD8
#> Down             4011         3946       3145   2696   2154
#> NotSig           1489         2688       4420   4995   6192
#> Up               4935         3801       2870   2744   2089
#> 'select()' returned 1:many mapping between keys and columns
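
When data is a list of datasets, comb decides how the per-dataset signatures are merged. The sketch below only illustrates how a set function such as union or intersect behaves when folded over several signature vectors. It is not the package's internal code: combine_sigs is a hypothetical helper, and the gene symbols are example values rather than package output.

```r
# Hypothetical per-dataset signatures (example gene symbols, not real results).
sig_list <- list(
  dataset_A = c("NCAM1", "KLRD1", "GNLY", "PRF1"),
  dataset_B = c("KLRD1", "GNLY", "NKG7"),
  dataset_C = c("GNLY", "KLRD1", "PRF1", "KLRF1")
)

# Hypothetical helper: fold a binary set function over the list,
# as a stand-in for what a function-valued `comb` would do.
combine_sigs <- function(sigs, comb = union) Reduce(comb, sigs)

sig_union <- combine_sigs(sig_list, union)         # genes found in any dataset
sig_intersect <- combine_sigs(sig_list, intersect) # genes found in every dataset

sig_union
sig_intersect
```

With these toy inputs, union returns all six distinct genes, while intersect returns only KLRD1 and GNLY, the two genes present in all three vectors.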