Compute word frequencies for a single MSigDB collection

computeMsigWordFreq(
  msigGsc,
  weight = NULL,
  measure = c("tfidf", "tf"),
  version = msigdb::getMsigdbVersions(),
  org = c("auto", "hs", "mm"),
  rmwords = getMsigBlacklist()
)

Arguments

msigGsc

a GeneSetCollection object, containing gene sets from the MSigDB. The GSEABase::getBroadSets() function can be used to parse XML files downloaded from MSigDB.

weight

a named numeric vector, containing weights to apply to each gene set. This can be -log10(FDR), -log10(p-value) or an enrichment score (ideally unsigned).

measure

a character, specifying how frequencies should be computed. "tf" uses term frequencies and "tfidf" (default) applies inverse document frequency weights to term frequencies.

version

a character, specifying the version of msigdb to use (see msigdb::getMsigdbVersions()).

org

a character, specifying the organism to use. This can be one of "auto" (default), "hs" or "mm".

rmwords

a character vector, containing a blacklist of words to discard from the analysis.
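
To show how the weight, measure and rmwords arguments interact, the following base R sketch computes weighted term frequencies and a textbook tf-idf on three toy gene set names. It is an illustration only: the toy names, the FDR values, the underscore tokenisation and the idf formula log(N / df) are all assumptions made for this example, and the package's internal computation may differ.

```r
# Toy gene set names standing in for an MSigDB collection (hypothetical data).
set_names <- c(
  "HALLMARK_INTERFERON_ALPHA_RESPONSE",
  "HALLMARK_INTERFERON_GAMMA_RESPONSE",
  "HALLMARK_DNA_REPAIR"
)

# Hypothetical FDR values converted to weights, named by gene set.
fdr <- c(0.001, 0.01, 0.1)
weight <- setNames(-log10(fdr), set_names)

# Split names into lower-case words and drop blacklisted words
# (assumed tokenisation, not necessarily what the package does).
rmwords <- c("hallmark")
tokens <- strsplit(tolower(set_names), "_", fixed = TRUE)
tokens <- lapply(tokens, function(x) setdiff(x, rmwords))

# Binary document-term matrix: rows are gene sets, columns are words.
vocab <- sort(unique(unlist(tokens)))
dtm <- t(vapply(tokens, function(x) as.numeric(vocab %in% x),
                numeric(length(vocab))))
dimnames(dtm) <- list(set_names, vocab)

# "tf": sum of gene set weights over the sets containing each word.
tf <- colSums(dtm * weight[rownames(dtm)])

# "tfidf": down-weight words that occur in many gene sets
# (textbook idf = log(N / document frequency), an assumption here).
idf <- log(nrow(dtm) / colSums(dtm > 0))
tfidf <- tf * idf

sort(tf, decreasing = TRUE)
sort(tfidf, decreasing = TRUE)
```

In this toy case the word shared by the two highest-weighted sets ranks first under "tf", while "tfidf" favours a word specific to the single highest-weighted set, which is the usual motivation for the inverse document frequency adjustment.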

Value

a list, containing two data.frames summarising the results of the frequency analysis on gene set names and short descriptions.

Examples

data(hgsc)
freq <- computeMsigWordFreq(hgsc, measure = 'tfidf')
#> Warning: Assuming the organism to be human.