Abstract
The ExperimentList package defines S4 classes for handling data from
multiple experiments or studies, combining the features of a list with
those of a single concatenated experiment. Individual experiments can be
SummarizedExperiment, RangedSummarizedExperiment, SingleCellExperiment,
or SpatialExperiment objects. Annotations specific to each experiment
are stored alongside the data, providing a unified interface for working
with data from multiple studies. Specialised functions are implemented
to access experiment data and to apply functions across experiments.
Existing functions implemented for each individual experiment (e.g.,
SingleCellExperiment::reducedDim()) can be readily applied across the
entire list of experiments.
The advent of high-throughput molecular measurement technologies has
resulted in the generation of vast amounts of data. The
SummarizedExperiment object and its derivatives have assisted in hosting
data from these technologies. The SingleCellExperiment and
SpatialExperiment objects store higher-resolution single-cell and
spatial transcriptomics measurements, respectively, from a single
biological sample. Reduced costs have enabled the generation of these
data from multiple biological samples. Such data are not easily stored
and manipulated within a single object. Concatenating the objects
partially resolves this, since the data can then be analysed in unison;
however, it hinders sample-wise analysis. Maintaining a list of objects
would allow object-wise analysis, but would hinder collective analysis.
The ExperimentList object is designed to fill this gap and allows
storage and manipulation of multiple SpatialExperiment,
SingleCellExperiment, RangedSummarizedExperiment or SummarizedExperiment
objects. It provides both list-like and object-like functionality,
giving flexible access to the data as needs arise. For example, when
analysing multiple spatial transcriptomic datasets, users may wish to
compute reduced dimensions (e.g., PCA) on each individual object before
dataset integration and compute a combined reduced dimension afterwards.
In such scenarios, a hybrid interface to the data is beneficial.
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("ExperimentList")
ExperimentList
The ExperimentList object borrows its structure from the object it
hosts. When dealing with a list of SpatialExperiment objects, all slots
from that class are made available: the rows contain feature (e.g.,
gene/transcript) information, the columns represent individual
observations (e.g., Visium spots), and multiple assays can be stored.
The individual objects (different shades in the schematic) are
concatenated to create a single SpatialExperiment object.
The mapping between the original samples and the concatenated columns is
maintained using an internal slot (experimentIndex).
Sample-specific (NOT spot-specific) annotations are held in a DataFrame
and are linked to the columns of the new ExperimentList object.
The following schematic demonstrates the structure of the ExperimentList object and lists some common accessors to interact with the object.
Under the hood, a separate class is used to hold the list of objects as described in the table below.
Object class | Matched ExperimentList class |
---|---|
SummarizedExperiment | SummarizedExperimentList |
RangedSummarizedExperiment | RangedSummarizedExperimentList |
SingleCellExperiment | SingleCellExperimentList |
SpatialExperiment | SpatialExperimentList |
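
The idea of concatenating experiments while keeping a per-column index can be sketched with plain matrices in base R. This is a toy illustration only, not the package's implementation: the matrices stand in for experiment objects, and `experiment_index` is a hypothetical stand-in for the internal experimentIndex slot.

```r
# Toy illustration only: plain matrices stand in for experiment objects.
# `experiment_index` is a hypothetical stand-in for the experimentIndex slot.
mats <- list(
  expA = matrix(1:6, nrow = 2,
                dimnames = list(c("g1", "g2"), c("c1", "c2", "c3"))),
  expB = matrix(7:10, nrow = 2,
                dimnames = list(c("g1", "g2"), c("c1", "c2")))
)

# Prefix column names with the experiment name so they stay unique
prefixed <- Map(function(m, nm) {
  colnames(m) <- paste(nm, colnames(m), sep = ".")
  m
}, mats, names(mats))

# Object-like view: one concatenated matrix
combined <- do.call(cbind, unname(prefixed))

# Record which experiment each concatenated column came from
experiment_index <- rep(names(mats), times = vapply(mats, ncol, integer(1)))

# List-like view: recover the per-experiment pieces from the index
idx <- split(seq_len(ncol(combined)),
             factor(experiment_index, levels = names(mats)))
recovered <- lapply(idx, function(i) combined[, i, drop = FALSE])

dim(combined)
colnames(recovered$expB)
```

Keeping the index next to the concatenated data is what lets the same object answer both "all columns at once" and "columns of one experiment" queries without storing the data twice.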
ExperimentList
The TENxVisiumData package contains 10X Visium data from various human
and mouse tissues. We will use the breast cancer IDC and ILC datasets
from this package to demonstrate the ExperimentList container. These
data contain measurements of 36601 transcripts measured across 7785 and
4325 spots respectively.
library(ExperimentList)
library(TENxVisiumData)
#download data
spe1 = TENxVisiumData::HumanBreastCancerIDC()
spe2 = TENxVisiumData::HumanBreastCancerILC()
#remove alternative experiments - these must match across experiments
#(as must the rownames)
altExps(spe2) = list()
#create a named list of objects
spe_list = list(
  'HumanBreastCancerIDC' = spe1,
  'HumanBreastCancerILC' = spe2
)
spe_list
#> $HumanBreastCancerIDC
#> class: SpatialExperiment
#> dim: 36601 7785
#> metadata(0):
#> assays(1): counts
#> rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
#> ENSG00000277196
#> rowData names(1): symbol
#> colnames(7785): AAACAAGTATCTCCCA-1 AAACACCAATAACTGC-1 ...
#> TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
#> colData names(1): sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor
#>
#> $HumanBreastCancerILC
#> class: SpatialExperiment
#> dim: 36601 4325
#> metadata(0):
#> assays(1): counts
#> rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
#> ENSG00000277196
#> rowData names(1): symbol
#> colnames(4325): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
#> TTGTTTCCATACAACT-1 TTGTTTGTGTAAATTC-1
#> colData names(1): sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor
Given the individual objects, we first create a sample-specific
annotation table and subsequently use it to create an ExperimentList
object. With the data prepared, the object can be constructed using the
ExperimentList() function.
#create some artificial experiment annotations
experimentAnnotation = data.frame(
  sex = c('Female', 'Female'),
  age = c(65, 68),
  row.names = c('HumanBreastCancerIDC', 'HumanBreastCancerILC')
)
experimentAnnotation
#> sex age
#> HumanBreastCancerIDC Female 65
#> HumanBreastCancerILC Female 68
#create the ExperimentList object
el = ExperimentList(experiments = spe_list, experimentData = experimentAnnotation)
el
#> ExperimentList with 2 SpatialExperiments
#> class: SpatialExperimentList
#> dim: 36601 12110
#> metadata(0):
#> assays(1): counts
#> rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
#> ENSG00000277196
#> rowData names(1): symbol
#> colnames(12110): HumanBreastCancerIDC.AAACAAGTATCTCCCA-1
#> HumanBreastCancerIDC.AAACACCAATAACTGC-1 ...
#> HumanBreastCancerILC.TTGTTTCCATACAACT-1
#> HumanBreastCancerILC.TTGTTTGTGTAAATTC-1
#> colData names(1): sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor
#> experiments: 2
#> experimentNames (2): HumanBreastCancerIDC HumanBreastCancerILC
#> experimentData names (2): sex age
The ExperimentList object can be created without experimentData and
with an unnamed list. When using a named list, the names of the list
should match the rownames of experimentData.
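
The name-matching requirement can be checked before calling the constructor. The sketch below is a hypothetical base R helper (`align_annotation()` is not part of ExperimentList) that verifies every experiment in a named list has an annotation row and returns the rows in list order. The toy list and table are stand-ins for the real objects. Whether the constructor reorders rows itself is not stated here, so aligning up front is simply a conservative choice.

```r
# Hypothetical helper, not part of ExperimentList: checks that a named list
# and an annotation table refer to the same experiments, and returns the
# annotation rows reordered to follow the list.
align_annotation <- function(experiments, annotation) {
  nms <- names(experiments)
  if (is.null(nms) || anyNA(nms) || any(nms == "")) {
    stop("all experiments must be named")
  }
  if (anyDuplicated(nms)) {
    stop("experiment names must be unique")
  }
  missing_rows <- setdiff(nms, rownames(annotation))
  if (length(missing_rows) > 0) {
    stop("no annotation for: ", paste(missing_rows, collapse = ", "))
  }
  annotation[nms, , drop = FALSE]
}

# Toy stand-ins for the list of experiments and its annotation table;
# the annotation rows are deliberately in a different order from the list
toy_list <- list(HumanBreastCancerIDC = 1, HumanBreastCancerILC = 2)
toy_annotation <- data.frame(
  sex = c("Female", "Female"),
  age = c(68, 65),
  row.names = c("HumanBreastCancerILC", "HumanBreastCancerIDC")
)

aligned <- align_annotation(toy_list, toy_annotation)
rownames(aligned)
```

With the real data, the same call would take `spe_list` and `experimentAnnotation` and its result could be passed as the `experimentData` argument.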
ExperimentList
Subsetting can be performed in the same way as for the parent
SummarizedExperiment class, using the [ function.
#subset the first five features and first three columns (spots)
el[1:5, 1:3]
#> ExperimentList with 2 SpatialExperiments
#> class: SpatialExperimentList
#> dim: 5 3
#> metadata(0):
#> assays(1): counts
#> rownames(5): ENSG00000243485 ENSG00000237613 ENSG00000186092
#> ENSG00000238009 ENSG00000239945
#> rowData names(1): symbol
#> colnames(3): HumanBreastCancerIDC.AAACAAGTATCTCCCA-1
#> HumanBreastCancerIDC.AAACACCAATAACTGC-1
#> HumanBreastCancerIDC.AAACAGAGCGACTCCT-1
#> colData names(1): sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor
#> experiments: 2
#> experimentNames (2): HumanBreastCancerIDC HumanBreastCancerILC
#> experimentData names (2): sex age
In addition, entire experiments can be subset using the exp argument of
the [ function, as below.
#subset the first five features and all columns from the second experiment
el[1:5, , exp = 2]
#> ExperimentList with 1 SpatialExperiments
#> class: SpatialExperimentList
#> dim: 5 4325
#> metadata(0):
#> assays(1): counts
#> rownames(5): ENSG00000243485 ENSG00000237613 ENSG00000186092
#> ENSG00000238009 ENSG00000239945
#> rowData names(1): symbol
#> colnames(4325): HumanBreastCancerILC.AAACAACGAATAGTTC-1
#> HumanBreastCancerILC.AAACAAGTATCTCCCA-1 ...
#> HumanBreastCancerILC.TTGTTTCCATACAACT-1
#> HumanBreastCancerILC.TTGTTTGTGTAAATTC-1
#> colData names(1): sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor
#> experiments: 1
#> experimentNames (1): HumanBreastCancerILC
#> experimentData names (2): sex age
The functions below can be used to access and set data in the object.
#number of experiments
nexp(el)
#> [1] 2
#names of experiments
experimentNames(el)
#> [1] "HumanBreastCancerIDC" "HumanBreastCancerILC"
#get a list of individual experiments
experiments(el)
#> $HumanBreastCancerIDC
#> class: SpatialExperiment
#> dim: 36601 7785
#> metadata(0):
#> assays(1): counts
#> rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
#> ENSG00000277196
#> rowData names(1): symbol
#> colnames(7785): AAACAAGTATCTCCCA-1 AAACACCAATAACTGC-1 ...
#> TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
#> colData names(1): sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor
#>
#> $HumanBreastCancerILC
#> class: SpatialExperiment
#> dim: 36601 4325
#> metadata(0):
#> assays(1): counts
#> rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
#> ENSG00000277196
#> rowData names(1): symbol
#> colnames(4325): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
#> TTGTTTCCATACAACT-1 TTGTTTGTGTAAATTC-1
#> colData names(1): sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor
#get experiment annotations
experimentData(el)
#> DataFrame with 2 rows and 2 columns
#> sex age
#> <character> <numeric>
#> HumanBreastCancerIDC Female 65
#> HumanBreastCancerILC Female 68
#get column annotations
head(colData(el))
#> DataFrame with 6 rows and 1 column
#> sample_id
#> <character>
#> HumanBreastCancerIDC.AAACAAGTATCTCCCA-1 HumanBreastCancerIDC1
#> HumanBreastCancerIDC.AAACACCAATAACTGC-1 HumanBreastCancerIDC1
#> HumanBreastCancerIDC.AAACAGAGCGACTCCT-1 HumanBreastCancerIDC1
#> HumanBreastCancerIDC.AAACAGGGTCTATATT-1 HumanBreastCancerIDC1
#> HumanBreastCancerIDC.AAACAGTGTTCCTGGG-1 HumanBreastCancerIDC1
#> HumanBreastCancerIDC.AAACATTTCCCGGATT-1 HumanBreastCancerIDC1
#get column annotations merged with experiment annotations
head(colData(el, experimentData = TRUE))
#> DataFrame with 6 rows and 3 columns
#> sample_id sex
#> <character> <character>
#> HumanBreastCancerIDC.AAACAAGTATCTCCCA-1 HumanBreastCancerIDC1 Female
#> HumanBreastCancerIDC.AAACACCAATAACTGC-1 HumanBreastCancerIDC1 Female
#> HumanBreastCancerIDC.AAACAGAGCGACTCCT-1 HumanBreastCancerIDC1 Female
#> HumanBreastCancerIDC.AAACAGGGTCTATATT-1 HumanBreastCancerIDC1 Female
#> HumanBreastCancerIDC.AAACAGTGTTCCTGGG-1 HumanBreastCancerIDC1 Female
#> HumanBreastCancerIDC.AAACATTTCCCGGATT-1 HumanBreastCancerIDC1 Female
#> age
#> <numeric>
#> HumanBreastCancerIDC.AAACAAGTATCTCCCA-1 65
#> HumanBreastCancerIDC.AAACACCAATAACTGC-1 65
#> HumanBreastCancerIDC.AAACAGAGCGACTCCT-1 65
#> HumanBreastCancerIDC.AAACAGGGTCTATATT-1 65
#> HumanBreastCancerIDC.AAACAGTGTTCCTGGG-1 65
#> HumanBreastCancerIDC.AAACATTTCCCGGATT-1 65
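
The merged view above repeats each experiment's annotation once for every column that experiment contributed. A minimal base R sketch of that expansion is given below. It uses toy data frames rather than the package's classes, and `experiment_index` is a hypothetical name for the column-to-experiment mapping.

```r
# Toy stand-ins; `experiment_index` is a hypothetical name for the mapping
# from concatenated columns back to their experiment of origin
col_annot <- data.frame(
  sample_id = c("s1", "s1", "s2"),
  row.names = c("expA.c1", "expA.c2", "expB.c1")
)
exp_annot <- data.frame(
  sex = c("Female", "Female"),
  age = c(65, 68),
  row.names = c("expA", "expB")
)
experiment_index <- c("expA", "expA", "expB")

# Repeat each experiment's annotation row once per column it contributed
expanded <- exp_annot[experiment_index, , drop = FALSE]
rownames(expanded) <- rownames(col_annot)

# Column-level annotations followed by the experiment-level ones
merged <- cbind(col_annot, expanded)
merged
```

Storing the experiment-level table once and expanding it on request avoids duplicating the same values across thousands of columns.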
The elapply() function can be used to apply functions to individual
objects.
#apply function
elapply(el, dim)
#> $HumanBreastCancerIDC
#> [1] 36601 7785
#>
#> $HumanBreastCancerILC
#> [1] 36601 4325
If the return type is the same as the type of the individual experiment
objects, the results will be combined into an ExperimentList object.
#get the first 100 spots
elapply(el, function(x) x[, 1:100])
#> ExperimentList with 2 SpatialExperiments
#> class: SpatialExperimentList
#> dim: 36601 200
#> metadata(0):
#> assays(1): counts
#> rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
#> ENSG00000277196
#> rowData names(1): symbol
#> colnames(200): HumanBreastCancerIDC.AAACAAGTATCTCCCA-1
#> HumanBreastCancerIDC.AAACACCAATAACTGC-1 ...
#> HumanBreastCancerILC.AACCCGAGCAGAATCG-1
#> HumanBreastCancerILC.AACCCTACTGTCAATA-1
#> colData names(1): sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor
#> experiments: 2
#> experimentNames (2): HumanBreastCancerIDC HumanBreastCancerILC
#> experimentData names (2): sex age
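
The two return behaviours shown above (a plain list for `dim`, a recombined object for column subsetting) can be mimicked with a toy function in base R. This is a hypothetical sketch, not the package's elapply(): plain matrices stand in for experiment objects and `toy_elapply()` is an invented name.

```r
# Hypothetical toy, not the package's elapply(): plain matrices stand in
# for experiment objects.
toy_elapply <- function(parts, FUN, ...) {
  res <- lapply(parts, FUN, ...)
  same_type <- all(vapply(res, is.matrix, logical(1)))
  if (same_type) {
    # every result is still an "experiment": combine them again
    do.call(cbind, unname(res))
  } else {
    # otherwise keep the results as a plain named list
    res
  }
}

parts <- list(
  expA = matrix(1:6, nrow = 2),
  expB = matrix(7:14, nrow = 2)
)

# dim() returns an integer vector, so the results stay a list
dims <- toy_elapply(parts, dim)

# column subsetting returns matrices, so the results are recombined
first_two <- toy_elapply(parts, function(m) m[, 1:2, drop = FALSE])
dim(first_two)
```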
The elapply() function allows the ExperimentList object to be treated
as a list. Because the object also inherits from the class of the
experiments it holds (here SpatialExperiment), functions that work on
that class also work directly on the ExperimentList object.
#extract image data for each object separately
elapply(el, imgData)
#> $HumanBreastCancerIDC
#> DataFrame with 2 rows and 4 columns
#> sample_id image_id data scaleFactor
#> <character> <character> <list> <numeric>
#> 1 HumanBreastCancerIDC1 lowres #### 0.0247525
#> 2 HumanBreastCancerIDC2 lowres #### 0.0247525
#>
#> $HumanBreastCancerILC
#> DataFrame with 1 row and 4 columns
#> sample_id image_id data scaleFactor
#> <character> <character> <list> <numeric>
#> 1 HumanBreastCancerILC.. lowres #### 0.0247525
#extract image data collectively
imgData(el)
#> DataFrame with 3 rows and 4 columns
#> sample_id image_id data scaleFactor
#> <character> <character> <list> <numeric>
#> 1 HumanBreastCancerIDC1 lowres #### 0.0247525
#> 2 HumanBreastCancerIDC2 lowres #### 0.0247525
#> 3 HumanBreastCancerILC.. lowres #### 0.0247525
ExperimentList objects can be coerced to their parent classes or to the
ExperimentList versions of their parent classes.
#convert to SpatialExperiment
as(el, 'SpatialExperiment')
#> class: SpatialExperiment
#> dim: 36601 12110
#> metadata(0):
#> assays(1): counts
#> rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
#> ENSG00000277196
#> rowData names(1): symbol
#> colnames(12110): HumanBreastCancerIDC.AAACAAGTATCTCCCA-1
#> HumanBreastCancerIDC.AAACACCAATAACTGC-1 ...
#> HumanBreastCancerILC.TTGTTTCCATACAACT-1
#> HumanBreastCancerILC.TTGTTTGTGTAAATTC-1
#> colData names(1): sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor
#convert to SummarizedExperiment
as(el, 'SummarizedExperiment')
#> class: SummarizedExperiment
#> dim: 36601 12110
#> metadata(0):
#> assays(1): counts
#> rownames: NULL
#> rowData names(0):
#> colnames(12110): HumanBreastCancerIDC.AAACAAGTATCTCCCA-1
#> HumanBreastCancerIDC.AAACACCAATAACTGC-1 ...
#> HumanBreastCancerILC.TTGTTTCCATACAACT-1
#> HumanBreastCancerILC.TTGTTTGTGTAAATTC-1
#> colData names(1): sample_id
#convert to SummarizedExperimentList
as(el, 'SummarizedExperimentList')
#> ExperimentList with 2 SummarizedExperiments
#> class: SummarizedExperimentList
#> dim: 36601 12110
#> metadata(0):
#> assays(1): counts
#> rownames: NULL
#> rowData names(0):
#> colnames(12110): HumanBreastCancerIDC.AAACAAGTATCTCCCA-1
#> HumanBreastCancerIDC.AAACACCAATAACTGC-1 ...
#> HumanBreastCancerILC.TTGTTTCCATACAACT-1
#> HumanBreastCancerILC.TTGTTTGTGTAAATTC-1
#> colData names(1): sample_id
#> experiments: 2
#> experimentNames (2): HumanBreastCancerIDC HumanBreastCancerILC
#> experimentData names (2): sex age
The full coercion hierarchy can be explored using the is() function.
is(el)
#> [1] "SpatialExperimentList" "SpatialExperiment"
#> [3] "ExperimentList" "SingleCellExperiment"
#> [5] "RangedSummarizedExperiment" "SummarizedExperiment"
#> [7] "RectangularData" "Vector"
#> [9] "Annotated" "vector_OR_Vector"
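
A hierarchy of this shape, where one class extends both an experiment class and a list-style class, can be reproduced with toy S4 classes. The class names below are hypothetical and the sketch makes no claim about how the package defines its classes. It only shows how is() and as() behave under multiple inheritance.

```r
library(methods)

# Hypothetical toy classes, unrelated to the package's real definitions
setClass("ToyExperiment", representation(values = "numeric"))
setClass("ToyList", representation(index = "character"))

# A class that inherits from both, giving it object-like and list-like roles
setClass("ToyExperimentList", contains = c("ToyExperiment", "ToyList"))

obj <- new("ToyExperimentList",
           values = c(1, 2, 3),
           index = c("a", "a", "b"))

# The class itself and everything it extends
is(obj)

# Coercing to a parent class keeps only that parent's slots
parent <- as(obj, "ToyExperiment")
is(parent, "ToyList")
```

Coercion to a parent drops whatever the parent does not define, which matches the loss of the experiment-level information when el is converted to a SpatialExperiment above.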
sessionInfo()
#> R Under development (unstable) (2022-03-10 r81874)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.4 LTS
#>
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] TENxVisiumData_1.3.0 ExperimentHub_2.3.5
#> [3] AnnotationHub_3.3.9 BiocFileCache_2.3.4
#> [5] dbplyr_2.1.1 ExperimentList_0.99.0
#> [7] SpatialExperiment_1.5.4 SingleCellExperiment_1.17.2
#> [9] SummarizedExperiment_1.25.3 Biobase_2.55.0
#> [11] GenomicRanges_1.47.6 GenomeInfoDb_1.31.4
#> [13] IRanges_2.29.1 MatrixGenerics_1.7.0
#> [15] matrixStats_0.61.0 S4Vectors_0.33.10
#> [17] BiocGenerics_0.41.2
#>
#> loaded via a namespace (and not attached):
#> [1] rjson_0.2.21 ellipsis_0.3.2
#> [3] rprojroot_2.0.2 scuttle_1.5.0
#> [5] XVector_0.35.0 fs_1.5.2
#> [7] bit64_4.0.5 interactiveDisplayBase_1.33.0
#> [9] AnnotationDbi_1.57.1 fansi_1.0.2
#> [11] R.methodsS3_1.8.1 sparseMatrixStats_1.7.0
#> [13] cachem_1.0.6 knitr_1.37
#> [15] jsonlite_1.8.0 png_0.1-7
#> [17] R.oo_1.24.0 shiny_1.7.1
#> [19] HDF5Array_1.23.2 BiocManager_1.30.16
#> [21] compiler_4.2.0 httr_1.4.2
#> [23] dqrng_0.3.0 assertthat_0.2.1
#> [25] Matrix_1.4-0 fastmap_1.1.0
#> [27] limma_3.51.5 cli_3.2.0
#> [29] later_1.3.0 htmltools_0.5.2
#> [31] tools_4.2.0 glue_1.6.2
#> [33] GenomeInfoDbData_1.2.7 dplyr_1.0.8
#> [35] rappdirs_0.3.3 Rcpp_1.0.8.2
#> [37] jquerylib_0.1.4 pkgdown_2.0.2.9000
#> [39] vctrs_0.3.8 Biostrings_2.63.1
#> [41] rhdf5filters_1.7.0 DelayedMatrixStats_1.17.0
#> [43] xfun_0.30 stringr_1.4.0
#> [45] beachmat_2.11.0 mime_0.12
#> [47] lifecycle_1.0.1 edgeR_3.37.0
#> [49] zlibbioc_1.41.0 BiocStyle_2.23.1
#> [51] ragg_1.2.2 promises_1.2.0.1
#> [53] parallel_4.2.0 rhdf5_2.39.6
#> [55] yaml_2.3.5 curl_4.3.2
#> [57] memoise_2.0.1 sass_0.4.0
#> [59] stringi_1.7.6 RSQLite_2.2.10
#> [61] BiocVersion_3.15.0 desc_1.4.1
#> [63] filelock_1.0.2 BiocParallel_1.29.17
#> [65] rlang_1.0.2 pkgconfig_2.0.3
#> [67] systemfonts_1.0.4 bitops_1.0-7
#> [69] evaluate_0.15 lattice_0.20-45
#> [71] purrr_0.3.4 Rhdf5lib_1.17.3
#> [73] bit_4.0.4 tidyselect_1.1.2
#> [75] magrittr_2.0.2 R6_2.5.1
#> [77] magick_2.7.3 generics_0.1.2
#> [79] DelayedArray_0.21.2 DBI_1.1.2
#> [81] withr_2.5.0 pillar_1.7.0
#> [83] prettydoc_0.4.1 KEGGREST_1.35.0
#> [85] RCurl_1.98-1.6 tibble_3.1.6
#> [87] crayon_1.5.0 DropletUtils_1.15.2
#> [89] utf8_1.2.2 rmarkdown_2.13
#> [91] locfit_1.5-9.5 grid_4.2.0
#> [93] blob_1.2.2 digest_0.6.29
#> [95] xtable_1.8-4 httpuv_1.6.5
#> [97] R.utils_2.11.0 textshaping_0.3.6
#> [99] bslib_0.3.1