This function generates a number of random gene sets, each containing the same number of genes as the scored gene set. It scores each random gene set and returns a matrix of scores for all samples. These empirical scores are used to calculate empirical p-values and to plot the null distribution. The implementation uses BiocParallel::bplapply() for easy access to parallel backends. Note that the same values should be passed to the upSet, downSet, centerScore and knownDirection arguments as were passed to simpleScore(); otherwise the generated null distribution will not match the observed scores.
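The permutation idea can be sketched in base R. This is a simplified, hypothetical illustration rather than singscore's implementation: `score_set()` and `sketch_null()` are names invented here, only an up-set is scored, and the normalisation bounds are an assumption modelled on the singscore score. The output has one row per random gene set and one column per sample.

```r
# Hypothetical helper (not part of singscore): normalised mean rank of a gene
# set within one sample, scaled to [0, 1] and optionally centred at 0.
# The theoretical bounds below are an assumption modelled on singscore.
score_set <- function(ranks, genes, centre = TRUE) {
  n <- length(genes)
  N <- length(ranks)
  low <- (n + 1) / 2           # mean rank if the set held the n lowest ranks
  high <- (2 * N - n + 1) / 2  # mean rank if the set held the n highest ranks
  s <- (mean(ranks[genes]) - low) / (high - low)
  if (centre) s - 0.5 else s
}

# Hypothetical sketch of the null: B random gene sets of the same size as the
# real set, each scored in every sample. Returns a B x samples matrix.
sketch_null <- function(rankData, setSize, B = 1000, seed = 1) {
  set.seed(seed)
  allGenes <- rownames(rankData)
  do.call(rbind, lapply(seq_len(B), function(b) {
    randomSet <- sample(allGenes, setSize)
    apply(rankData, 2, score_set, genes = randomSet)
  }))
}

# Toy ranked data: 200 genes x 3 samples, ranked within each sample
set.seed(42)
expr <- matrix(rnorm(200 * 3), nrow = 200,
               dimnames = list(paste0("g", 1:200), paste0("S", 1:3)))
toyRanks <- apply(expr, 2, rank)
nullMat <- sketch_null(toyRanks, setSize = 20, B = 50)
dim(nullMat)  # 50 3
```

Because every random set has the same size as the real one, its score lives on the same scale, so the rows of the matrix form a directly comparable reference distribution for each sample.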

generateNull(
  upSet,
  downSet = NULL,
  rankData,
  subSamples = NULL,
  centerScore = TRUE,
  knownDirection = TRUE,
  B = 1000,
  ncores = 1,
  seed = sample.int(1e+06, 1),
  useBPPARAM = NULL
)

# S4 method for vector,ANY
generateNull(
  upSet,
  downSet = NULL,
  rankData,
  subSamples = NULL,
  centerScore = TRUE,
  knownDirection = TRUE,
  B = 1000,
  ncores = 1,
  seed = sample.int(1e+06, 1),
  useBPPARAM = NULL
)

# S4 method for GeneSet,ANY
generateNull(
  upSet,
  downSet = NULL,
  rankData,
  subSamples = NULL,
  centerScore = TRUE,
  knownDirection = TRUE,
  B = 1000,
  ncores = 1,
  seed = sample.int(1e+06, 1),
  useBPPARAM = NULL
)

# S4 method for vector,vector
generateNull(
  upSet,
  downSet = NULL,
  rankData,
  subSamples = NULL,
  centerScore = TRUE,
  knownDirection = TRUE,
  B = 1000,
  ncores = 1,
  seed = sample.int(1e+06, 1),
  useBPPARAM = NULL
)

# S4 method for GeneSet,GeneSet
generateNull(
  upSet,
  downSet = NULL,
  rankData,
  subSamples = NULL,
  centerScore = TRUE,
  knownDirection = TRUE,
  B = 1000,
  ncores = 1,
  seed = sample.int(1e+06, 1),
  useBPPARAM = NULL
)

Arguments

upSet

A GeneSet object or a character vector of gene IDs of the up-regulated gene set, or of a gene set whose genes' direction of regulation is not known

downSet

A GeneSet object or a character vector of gene IDs of the down-regulated gene set, or NULL when only a single gene set is provided

rankData

A matrix object: the ranked gene expression matrix generated by the rankGenes() function (make sure this matrix is not modified; see Details)
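rankGenes() should be used to produce this matrix. The base-R sketch below only illustrates the layout a ranked matrix is assumed to have: genes in rows, samples in columns, and each column holding the within-sample ranks of that sample's expression values. Giving tied values the minimum rank is a choice made for this sketch.

```r
# Toy expression matrix: 5 genes (rows) x 2 samples (columns)
expr <- matrix(c(5.1, 0.3, 2.2, 8.7, 2.2,
                 1.0, 4.4, 3.3, 0.5, 9.9),
               nrow = 5,
               dimnames = list(paste0("gene", 1:5), c("S1", "S2")))

# Rank genes within each sample (column); tied values share the minimum rank
rankedToy <- apply(expr, 2, rank, ties.method = "min")
rankedToy
```

Editing such a matrix by hand (for example, dropping genes) would break the relationship between a column and the full ranking, which is why the matrix should be passed on unmodified.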

subSamples

A vector of sample labels or indices used to subset the rankData matrix. All samples are scored if not provided

centerScore

A Boolean specifying whether scores should be centered around 0; defaults to TRUE. Note: scores are never centered if knownDirection = FALSE
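A small worked example of centering, assuming singscore-style bounds in which a normalised score runs from 0 (the set occupies the lowest possible ranks) to 1 (the highest possible ranks); centering subtracts 0.5 so the range becomes -0.5 to 0.5. The numbers are made up for illustration.

```r
# Hypothetical illustration (assumed bounds): N genes ranked 1..N and a gene
# set of size n whose genes occupy the two highest ranks.
N <- 10
setRanks <- c(9, 10)
n <- length(setRanks)
low  <- (n + 1) / 2          # 1.5: mean rank if the set held ranks 1..n
high <- (2 * N - n + 1) / 2  # 9.5: mean rank if the set held ranks (N-n+1)..N
uncentred <- (mean(setRanks) - low) / (high - low)
centred <- uncentred - 0.5
# uncentred score is 1 (the maximum); centred score is 0.5
c(uncentred = uncentred, centred = centred)
```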

knownDirection

A Boolean determining whether the gene set should be considered directional. A gene set is directional if the type of its genes is known, i.e. up- or down-regulated. This should be set to FALSE if a single gene set mixes up- AND down-regulated genes without distinguishing them. Defaults to TRUE. This parameter becomes irrelevant when both upSet and downSet are provided.
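To see why direction matters, consider a set with one strongly down-regulated and one strongly up-regulated gene: its mean rank sits in the middle of the ranking and looks unremarkable. The sketch below folds ranks about the centre so that both extremes count as extreme. This transform is a hypothetical illustration of direction-agnostic scoring, not a description of how singscore handles knownDirection = FALSE internally.

```r
# Hypothetical direction-agnostic transform (illustration only): distance of
# each rank from the centre of the ranking, so that genes near either end
# (strongly up or strongly down) receive large values.
fold_ranks <- function(ranks) abs(ranks - (length(ranks) + 1) / 2)

ranks <- setNames(1:10, paste0("g", 1:10))
folded <- fold_ranks(ranks)

mixedSet <- c("g1", "g10")  # one gene at the bottom, one at the top
mean(ranks[mixedSet])       # 5.5: looks average when direction is assumed
mean(folded[mixedSet])      # 4.5: the maximum possible after folding
```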

B

Integer, the number of permutations, i.e. the number of random gene sets to generate; defaults to 1000
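Because the null is a finite sample of B scores per sample, B controls the resolution of any empirical p-value. The hypothetical helper `empirical_p()` below uses the common (count + 1) / (B + 1) convention, which is not necessarily what singscore's own p-value function does; under it the smallest attainable p-value is 1 / (B + 1). The null matrix is simulated for illustration.

```r
# Hypothetical helper: one-sided empirical p-value per sample (column),
# using (count + 1) / (B + 1) so that p is never exactly 0.
empirical_p <- function(nullMat, observed) {
  B <- nrow(nullMat)
  vapply(seq_along(observed), function(j) {
    (sum(nullMat[, j] >= observed[j]) + 1) / (B + 1)
  }, numeric(1))
}

# Simulated null: B = 999 scores for each of two samples
set.seed(1)
nullMat <- matrix(rnorm(999 * 2, sd = 0.01), ncol = 2)
observed <- c(0.20, 0.00)  # far in the tail vs. right at the null centre
p <- empirical_p(nullMat, observed)
p
```

The first sample reaches the floor of 1/1000, so a larger B would be needed to tell apart scores that are even more extreme.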

ncores

Integer, the number of CPU cores the function can use

seed

Integer, the seed used for randomisation
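The default `seed = sample.int(1e+06, 1)` draws a fresh seed on each call, so repeated calls give different null distributions unless `seed` is fixed (or the session's RNG state is set beforehand). The serial base-R sketch below shows only the principle; `draw_sets()` is a hypothetical name.

```r
# Hypothetical illustration: fixing the seed reproduces the same random sets
genes <- paste0("g", 1:100)

draw_sets <- function(seed, B = 3, size = 5) {
  set.seed(seed)  # fix the RNG state before sampling
  lapply(seq_len(B), function(b) sample(genes, size))
}

identical(draw_sets(1), draw_sets(1))  # TRUE: same seed, same gene sets
identical(draw_sets(1), draw_sets(2))  # different seeds give different sets
```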

useBPPARAM

The parallel backend to use. If NULL, the function uses the default backend, which is the first entry returned by BiocParallel::registered(), i.e. BiocParallel::registered()[[1]] on your machine. A different backend can be chosen by passing a BPPARAM object explicitly

Value

A matrix of empirical scores for all samples

See also

Post about BiocParallel; the BiocParallel vignettes: browseVignettes("BiocParallel")

Author

Ruqian Lyu

Examples

ranked <- rankGenes(toy_expr_se)
scoredf <- simpleScore(ranked, upSet = toy_gs_up, downSet = toy_gs_dn)

# find out what backends can be registered on your machine
BiocParallel::registered()
#> $MulticoreParam
#> class: MulticoreParam
#>   bpisup: FALSE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB
#>   bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
#>   bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
#>   bpexportglobals: TRUE; bpexportvariables: FALSE; bpforceGC: FALSE
#>   bpfallback: TRUE
#>   bplogdir: NA
#>   bpresultdir: NA
#>   cluster type: FORK
#> 
#> $SnowParam
#> class: SnowParam
#>   bpisup: FALSE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB
#>   bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
#>   bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
#>   bpexportglobals: TRUE; bpexportvariables: TRUE; bpforceGC: FALSE
#>   bpfallback: TRUE
#>   bplogdir: NA
#>   bpresultdir: NA
#>   cluster type: SOCK
#> 
#> $SerialParam
#> class: SerialParam
#>   bpisup: FALSE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
#>   bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
#>   bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
#>   bpexportglobals: FALSE; bpexportvariables: FALSE; bpforceGC: FALSE
#>   bpfallback: FALSE
#>   bplogdir: NA
#>   bpresultdir: NA
#> 
# the first one is the default backend
# to use more cores, e.g. ncores <- parallel::detectCores() - 2
permuteResult <- generateNull(upSet = toy_gs_up, downSet = toy_gs_dn,
                              rankData = ranked, centerScore = TRUE,
                              B = 10, seed = 1, ncores = 1)
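Once the null is available, the observed scores can be compared against it. The block below repeats the example end to end and then calls singscore's getPvals(); it assumes singscore and its bundled toy datasets are installed. With B = 10 the p-values are very coarse, so a real analysis would keep the default B = 1000 or use more. singscore also provides plotNull() for plotting the null distribution (see its help page for arguments).

```r
library(singscore)

ranked <- rankGenes(toy_expr_se)
scoredf <- simpleScore(ranked, upSet = toy_gs_up, downSet = toy_gs_dn)

# Same upSet, downSet and centerScore as used for simpleScore()
permuteResult <- generateNull(upSet = toy_gs_up, downSet = toy_gs_dn,
                              rankData = ranked, centerScore = TRUE,
                              B = 10, seed = 1, ncores = 1)

# Empirical p-value of each sample's observed score against its null
pvals <- getPvals(permuteResult, scoredf)
pvals
```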