Bioconductor Code: tidySpatialExperiment


# tidySpatialExperiment - part of *tidyomics* <img src="man/figures/logo.png" id="tidySpatialExperiment_logo" align="right" width="125"/>

[![Lifecycle:experimental](https://img.shields.io/badge/lifecycle-experimental-blue.svg)](https://www.tidyverse.org/lifecycle/#experimental) [![R build status](https://github.com/william-hutchison/tidySpatialExperiment/workflows/rworkflows/badge.svg)](https://github.com/william-hutchison/tidySpatialExperiment/actions)

# Introduction

tidySpatialExperiment provides a bridge between the [SpatialExperiment](https://github.com/drighelli/SpatialExperiment) package and the [*tidyverse*](https://www.tidyverse.org) ecosystem. It creates an invisible layer that allows you to interact with a `SpatialExperiment` object as if it were a tibble, enabling the use of functions from [dplyr](https://github.com/tidyverse/dplyr), [tidyr](https://github.com/tidyverse/tidyr), [ggplot2](https://github.com/tidyverse/ggplot2) and [plotly](https://github.com/plotly/plotly.R). Underneath, your data remains a `SpatialExperiment` object.

tidySpatialExperiment also provides several additional utility functions.

## Resources

If you would like to learn more about tidySpatialExperiment and *tidyomics*, the following links are a good place to start:

- [The tidySpatialExperiment website](http://william-hutchison.github.io/tidySpatialExperiment/)
- [The tidyomics website](https://github.com/tidyomics)

The *tidyomics* ecosystem also includes packages for:

- Working with genomic features:
    - [plyranges](https://github.com/sa-lee/plyranges), for tidy manipulation of genomic range data.
    - [nullranges](https://github.com/nullranges/nullranges), for tidy generation of genomic ranges representing the null hypothesis.
    - [plyinteractions](https://github.com/tidyomics/plyinteractions), for tidy manipulation of genomic interaction data.
- Working with transcriptomic features:
    - [tidySummarizedExperiment](https://github.com/stemangiola/tidySummarizedExperiment), for tidy manipulation of `SummarizedExperiment` objects.
    - [tidySingleCellExperiment](https://github.com/stemangiola/tidySingleCellExperiment), for tidy manipulation of `SingleCellExperiment` objects.
    - [tidyseurat](https://github.com/stemangiola/tidyseurat), for tidy manipulation of `Seurat` objects.
    - [tidybulk](https://github.com/stemangiola/tidybulk), for bulk RNA-seq analysis.
- Working with cytometry features:
    - [tidytof](https://github.com/keyes-timothy/tidytof), for tidy manipulation of high-dimensional cytometry data.
- And a few associated packages:
    - [tidygate](https://github.com/stemangiola/tidygate), for manual gating of points in space.
    - [tidyHeatmap](https://github.com/stemangiola/tidyHeatmap/), for modular heatmap construction.

## Functions and utilities

| Package             | Functions available |
|---------------------|---------------------|
| `SpatialExperiment` | All |
| `dplyr`             | `arrange`, `bind_rows`, `bind_cols`, `distinct`, `filter`, `group_by`, `summarise`, `select`, `mutate`, `rename`, `left_join`, `right_join`, `inner_join`, `slice`, `sample_n`, `sample_frac`, `count`, `add_count` |
| `tidyr`             | `nest`, `unnest`, `unite`, `separate`, `extract`, `pivot_longer` |
| `ggplot2`           | `ggplot` |
| `plotly`            | `plot_ly` |

| Utility             | Description |
|---------------------|-------------|
| `as_tibble`         | Convert cell data to a `tbl_df` |
| `join_features`     | Append feature data to cell data |
| `aggregate_cells`   | Aggregate cell-feature abundance into a pseudobulk `SummarizedExperiment` object |
| `rectangle`         | Select cells in a rectangular region of space |
| `ellipse`           | Select cells in an elliptical region of space |
| `gate_spatial`      | |
| `gate_programmatic` | |

## Installation

You can install the stable version of tidySpatialExperiment from Bioconductor.

``` r
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("tidySpatialExperiment")
```

Or, you can install the development version of tidySpatialExperiment from GitHub.

``` r
if (!requireNamespace("pak", quietly = TRUE))
    install.packages("pak")

pak::pak("william-hutchison/tidySpatialExperiment")
```

## Load data

Here, we attach tidySpatialExperiment and an example `SpatialExperiment` object.

``` r
# Load example SpatialExperiment object
library(tidySpatialExperiment)
example(read10xVisium)
```

## SpatialExperiment-tibble abstraction

A `SpatialExperiment` object represents assay-feature values as rows and cells as columns. Additional information about the cells is stored in the `reducedDims`, `colData` and `spatialCoords` slots.

tidySpatialExperiment provides a SpatialExperiment-tibble abstraction, representing cells as rows and cell data as columns, in accordance with the tidy observation-variable convention. The cell data is made up of information stored in the `colData` and `spatialCoords` slots.

With tidySpatialExperiment attached, printing the object shows the SpatialExperiment-tibble abstraction by default.

``` r
spe
# # A SpatialExperiment-tibble abstraction: 50 × 7
# # Features = 50 | Cells = 50 | Assays = counts
#   .cell              in_tissue array_row array_col sample_id pxl_col_in_fullres
#   <chr>              <lgl>         <int>     <int> <chr>                  <int>
# 1 AAACAACGAATAGTTC-1 FALSE             0        16 section1                2312
# 2 AAACAAGTATCTCCCA-1 TRUE             50       102 section1                8230
# 3 AAACAATCTACTAGCA-1 TRUE              3        43 section1                4170
# 4 AAACACCAATAACTGC-1 TRUE             59        19 section1                2519
# 5 AAACAGAGCGACTCCT-1 TRUE             14        94 section1                7679
# # ℹ 45 more rows
# # ℹ 1 more variable: pxl_row_in_fullres <int>
```

However, our data remains a `SpatialExperiment` object, so all `SpatialExperiment` functions are still available.
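To make the abstraction concrete, here is a minimal base R sketch of how a cell-wise table could be assembled from `colData`-like and `spatialCoords`-like inputs. It is not the package's implementation: `cell_view()`, `col_data` and `spatial_coords` are hypothetical names introduced for illustration, and the values are copied from the first three cells printed above.

``` r
# Hypothetical stand-ins for the colData and spatialCoords slots,
# filled with the first three cells of the printed example.
col_data <- data.frame(
  in_tissue = c(FALSE, TRUE, TRUE),
  array_row = c(0L, 50L, 3L),
  array_col = c(16L, 102L, 43L),
  sample_id = "section1",
  row.names = c("AAACAACGAATAGTTC-1", "AAACAAGTATCTCCCA-1", "AAACAATCTACTAGCA-1")
)
spatial_coords <- matrix(
  c(2312L, 8230L, 4170L, 1252L, 7237L, 1611L),
  ncol = 2,
  dimnames = list(
    rownames(col_data),
    c("pxl_col_in_fullres", "pxl_row_in_fullres")
  )
)

# Hypothetical helper: one row per cell, one column per cell-level variable.
cell_view <- function(col_data, spatial_coords) {
  # Both inputs must describe the same cells in the same order
  stopifnot(identical(rownames(col_data), rownames(spatial_coords)))
  data.frame(
    .cell = rownames(col_data),
    col_data,
    as.data.frame(spatial_coords),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

cells <- cell_view(col_data, spatial_coords)
names(cells)
```

The sketch yields the same seven columns as the printed abstraction. On the real object, the underlying slots stay reachable through the usual accessors: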
``` r
spe |>
    colData() |>
    head()
# DataFrame with 6 rows and 4 columns
#                    in_tissue array_row array_col   sample_id
#                    <logical> <integer> <integer> <character>
# AAACAACGAATAGTTC-1     FALSE         0        16    section1
# AAACAAGTATCTCCCA-1      TRUE        50       102    section1
# AAACAATCTACTAGCA-1      TRUE         3        43    section1
# AAACACCAATAACTGC-1      TRUE        59        19    section1
# AAACAGAGCGACTCCT-1      TRUE        14        94    section1
# AAACAGCTTTCAGAAG-1     FALSE        43         9    section1

spe |>
    spatialCoords() |>
    head()
#                    pxl_col_in_fullres pxl_row_in_fullres
# AAACAACGAATAGTTC-1               2312               1252
# AAACAAGTATCTCCCA-1               8230               7237
# AAACAATCTACTAGCA-1               4170               1611
# AAACACCAATAACTGC-1               2519               8315
# AAACAGAGCGACTCCT-1               7679               2927
# AAACAGCTTTCAGAAG-1               1831               6400

spe |>
    imgData()
# DataFrame with 1 row and 4 columns
#     sample_id    image_id   data scaleFactor
#   <character> <character> <list>   <numeric>
# 1    section1      lowres   ####   0.0510334
```

# Integration with the *tidyverse* ecosystem

## Manipulate with dplyr

Most functions from dplyr are available for use with the SpatialExperiment-tibble abstraction. For example, `filter()` can be used to filter cells by a variable of interest.

``` r
spe |>
    filter(array_col < 5)
# # A SpatialExperiment-tibble abstraction: 3 × 7
# # Features = 50 | Cells = 3 | Assays = counts
#   .cell              in_tissue array_row array_col sample_id pxl_col_in_fullres
#   <chr>              <lgl>         <int>     <int> <chr>                  <int>
# 1 AAACATGGTGAGAGGA-1 FALSE            62         0 section1                1212
# 2 AAACGAAGATGGAGTA-1 FALSE            58         4 section1                1487
# 3 AAAGAATGACCTTAGA-1 FALSE            64         2 section1                1349
# # ℹ 1 more variable: pxl_row_in_fullres <int>
```

And `mutate()` can be used to add new variables, or modify the value of an existing variable.
``` r
spe |>
    mutate(in_region = c(in_tissue & array_row < 10))
# # A SpatialExperiment-tibble abstraction: 50 × 8
# # Features = 50 | Cells = 50 | Assays = counts
#   .cell     in_tissue array_row array_col sample_id in_region pxl_col_in_fullres
#   <chr>     <lgl>         <int>     <int> <chr>     <lgl>                  <int>
# 1 AAACAACG… FALSE             0        16 section1  FALSE                   2312
# 2 AAACAAGT… TRUE             50       102 section1  FALSE                   8230
# 3 AAACAATC… TRUE              3        43 section1  TRUE                    4170
# 4 AAACACCA… TRUE             59        19 section1  FALSE                   2519
# 5 AAACAGAG… TRUE             14        94 section1  FALSE                   7679
# # ℹ 45 more rows
# # ℹ 1 more variable: pxl_row_in_fullres <int>
```

## Tidy with tidyr

Most functions from tidyr are also available. Here, `nest()` is used to group the data by `sample_id`, and `unnest()` is used to ungroup the data.

``` r
# Nest the SpatialExperiment object by sample_id
spe_nested <-
    spe |>
    nest(data = -sample_id)

# View the nested SpatialExperiment object
spe_nested
# # A tibble: 1 × 2
#   sample_id data
#   <chr>     <list>
# 1 section1  <SptlExpr[,50]>

# Unnest the nested SpatialExperiment objects
spe_nested |>
    unnest(data)
# # A SpatialExperiment-tibble abstraction: 50 × 7
# # Features = 50 | Cells = 50 | Assays = counts
#   .cell              in_tissue array_row array_col sample_id pxl_col_in_fullres
#   <chr>              <lgl>         <int>     <int> <chr>                  <int>
# 1 AAACAACGAATAGTTC-1 FALSE             0        16 section1                2312
# 2 AAACAAGTATCTCCCA-1 TRUE             50       102 section1                8230
# 3 AAACAATCTACTAGCA-1 TRUE              3        43 section1                4170
# 4 AAACACCAATAACTGC-1 TRUE             59        19 section1                2519
# 5 AAACAGAGCGACTCCT-1 TRUE             14        94 section1                7679
# # ℹ 45 more rows
# # ℹ 1 more variable: pxl_row_in_fullres <int>
```

## Plot with ggplot2

The `ggplot()` function can be used to create a plot directly from a `SpatialExperiment` object. This example also demonstrates how tidy operations can be combined to build up more complex analysis.
``` r
spe |>
    filter(sample_id == "section1" & in_tissue) |>

    # Add a column with the sum of feature counts per cell
    mutate(count_sum = purrr::map_int(.cell, ~
        spe[, .x] |>
            counts() |>
            sum()
    )) |>

    # Plot with tidySpatialExperiment and ggplot2
    ggplot(aes(x = reorder(.cell, count_sum), y = count_sum)) +
    geom_point() +
    coord_flip()
```

![](man/figures/unnamed-chunk-11-1.png)

## Plot with plotly

The `plot_ly()` function can also be used to create a plot from a `SpatialExperiment` object.

``` r
spe |>
    filter(sample_id == "section1") |>
    plot_ly(
        x = ~ array_col,
        y = ~ array_row,
        color = ~ in_tissue,
        type = "scatter"
    )
```

![](man/figures/plotly_demo.png)

# Utilities

## Append feature data to cell data

The *tidyomics* ecosystem places an emphasis on interacting with cell data. To interact with feature data, the `join_features()` function can be used to append assay-feature values to cell data.

``` r
# Join feature data in wide format, preserving the SpatialExperiment object
spe |>
    join_features(features = c("ENSMUSG00000025915", "ENSMUSG00000042501"), shape = "wide") |>
    head()
# # A SpatialExperiment-tibble abstraction: 50 × 9
# # Features = 6 | Cells = 50 | Assays = counts
#   .cell              in_tissue array_row array_col sample_id ENSMUSG00000025915
#   <chr>              <lgl>         <int>     <int> <chr>                  <dbl>
# 1 AAACAACGAATAGTTC-1 FALSE             0        16 section1                   0
# 2 AAACAAGTATCTCCCA-1 TRUE             50       102 section1                   0
# 3 AAACAATCTACTAGCA-1 TRUE              3        43 section1                   0
# 4 AAACACCAATAACTGC-1 TRUE             59        19 section1                   0
# 5 AAACAGAGCGACTCCT-1 TRUE             14        94 section1                   0
# # ℹ 45 more rows
# # ℹ 3 more variables: ENSMUSG00000042501 <dbl>, pxl_col_in_fullres <int>,
# #   pxl_row_in_fullres <int>

# Join feature data in long format, discarding the SpatialExperiment object
spe |>
    join_features(features = c("ENSMUSG00000025915", "ENSMUSG00000042501"), shape = "long") |>
    head()
# tidySpatialExperiment says: A data frame is returned for independent data
# analysis.
# # A tibble: 6 × 7
#   .cell       in_tissue array_row array_col sample_id .feature .abundance_counts
#   <chr>       <lgl>         <int>     <int> <chr>     <chr>                <dbl>
# 1 AAACAACGAA… FALSE             0        16 section1  ENSMUSG…                 0
# 2 AAACAACGAA… FALSE             0        16 section1  ENSMUSG…                 0
# 3 AAACAAGTAT… TRUE             50       102 section1  ENSMUSG…                 0
# 4 AAACAAGTAT… TRUE             50       102 section1  ENSMUSG…                 1
# 5 AAACAATCTA… TRUE              3        43 section1  ENSMUSG…                 0
# # ℹ 1 more row
```

## Aggregate cells

Sometimes, it is necessary to aggregate the gene-transcript abundance from a group of cells into a single value, for example when comparing groups of cells across different samples with fixed-effect models.

The `aggregate_cells()` function can be used to aggregate cells by a specified variable and assay, returning a `SummarizedExperiment` object.

``` r
spe |>
    aggregate_cells(in_tissue, assays = "counts")
# class: SummarizedExperiment
# dim: 50 2
# metadata(0):
# assays(1): counts
# rownames(50): ENSMUSG00000002459 ENSMUSG00000005886 ...
#   ENSMUSG00000104217 ENSMUSG00000104328
# rowData names(1): feature
# colnames(2): FALSE TRUE
# colData names(3): in_tissue .aggregated_cells sample_id
```

## Elliptical and rectangular region selection

The `ellipse()` and `rectangle()` functions can be used to select cells by their position in space.

``` r
spe |>
    filter(sample_id == "section1") |>
    mutate(in_ellipse = ellipse(array_col, array_row, c(20, 40), c(20, 20))) |>
    ggplot(aes(x = array_col, y = array_row, colour = in_ellipse)) +
    geom_point()
```

![](man/figures/unnamed-chunk-15-1.png)

## Interactive gating

For the interactive selection of cells in space, tidySpatialExperiment provides `gate()`. This function uses [tidygate](https://github.com/stemangiola/tidygate), shiny and plotly to launch an interactive plot overlaying cells in position with image data. Additional parameters can be used to specify point colour, shape, size and alpha, either with a column in the SpatialExperiment object or a constant value.
``` r
spe_gated <-
    spe |>
    gate(colour = "in_tissue", alpha = 0.8)
```

![](man/figures/gate_interactive_demo.gif)

A record of which points appear in which gates is appended to the SpatialExperiment object in the `.gated` column. To select cells which appear within any gates, filter for non-NA values. To select cells which appear within a specific gate, string pattern matching can be used.

``` r
# Select cells within any gate
spe_gated |>
    filter(!is.na(.gated))
# # A SpatialExperiment-tibble abstraction: 4 × 8
# # Features = 50 | Cells = 4 | Assays = counts
#   .cell        in_tissue array_row array_col sample_id .gated pxl_col_in_fullres
#   <chr>        <lgl>         <int>     <int> <chr>     <chr>               <int>
# 1 AAACGAGACGG… TRUE             35        79 section1  2                    6647
# 2 AAACTGCTGGC… TRUE             45        67 section1  2                    5821
# 3 AAAGGGATGTA… TRUE             24        62 section1  1,2                  5477
# 4 AAAGGGCAGCT… TRUE             24        26 section1  1                    3000
# # ℹ 1 more variable: pxl_row_in_fullres <int>

# Select cells within gate 2
# (the bare pattern "2" would also match a gate numbered 12 or 20,
# so use a stricter pattern when ten or more gates are drawn)
spe_gated |>
    filter(stringr::str_detect(.gated, "2"))
# # A SpatialExperiment-tibble abstraction: 3 × 8
# # Features = 50 | Cells = 3 | Assays = counts
#   .cell        in_tissue array_row array_col sample_id .gated pxl_col_in_fullres
#   <chr>        <lgl>         <int>     <int> <chr>     <chr>               <int>
# 1 AAACGAGACGG… TRUE             35        79 section1  2                    6647
# 2 AAACTGCTGGC… TRUE             45        67 section1  2                    5821
# 3 AAAGGGATGTA… TRUE             24        62 section1  1,2                  5477
# # ℹ 1 more variable: pxl_row_in_fullres <int>
```

Details of the interactively drawn gates are saved to `tidygate_env$gates`. This variable is overwritten each time interactive gates are drawn, so save it right away if you would like to access it later.

``` r
# Inspect previously drawn gates
tidygate_env$gates |>
    head()
# # A tibble: 6 × 3
#       x     y .gate
#   <dbl> <dbl> <dbl>
# 1 4310. 3125.     1
# 2 3734. 3161.     1
# 3 2942. 3521.     1
# 4 2834. 3665.     1
# 5 2834. 4385.     1
# # ℹ 1 more row
```

``` r
# Save if needed
tidygate_env$gates |>
    readr::write_rds("important_gates.rds")
```

If previously drawn gates are supplied to the `programmatic_gates` argument, cells will be gated programmatically. This feature allows the reproduction of previously drawn interactive gates.

``` r
important_gates <-
    readr::read_rds("important_gates.rds")

spe |>
    gate(programmatic_gates = important_gates) |>
    filter(!is.na(.gated))
# # A SpatialExperiment-tibble abstraction: 4 × 8
# # Features = 50 | Cells = 4 | Assays = counts
#   .cell        in_tissue array_row array_col sample_id .gated pxl_col_in_fullres
#   <chr>        <lgl>         <int>     <int> <chr>     <chr>               <int>
# 1 AAACGAGACGG… TRUE             35        79 section1  2                    6647
# 2 AAACTGCTGGC… TRUE             45        67 section1  2                    5821
# 3 AAAGGGATGTA… TRUE             24        62 section1  1,2                  5477
# 4 AAAGGGCAGCT… TRUE             24        26 section1  1                    3000
# # ℹ 1 more variable: pxl_row_in_fullres <int>
```

# Special column behaviour

Removing the `.cell` column will return a tibble. This is consistent with the behaviour in other *tidyomics* packages.

``` r
spe |>
    select(-.cell) |>
    head()
# tidySpatialExperiment says: Key columns are missing. A data frame is
# returned for independent data analysis.
# # A tibble: 6 × 4
#   in_tissue array_row array_col sample_id
#   <lgl>         <int>     <int> <chr>
# 1 FALSE             0        16 section1
# 2 TRUE             50       102 section1
# 3 TRUE              3        43 section1
# 4 TRUE             59        19 section1
# 5 TRUE             14        94 section1
# # ℹ 1 more row
```

The `sample_id` column cannot be removed with *tidyverse* functions, and can only be modified if the changes are accepted by SpatialExperiment’s `colData()` function.
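One way to read "accepted by `colData()`" is that old and new `sample_id` values must map one-to-one, so that samples are neither merged nor split. The base R sketch below expresses that reading only; `maps_one_to_one()` is a hypothetical helper that belongs to neither SpatialExperiment nor tidySpatialExperiment, the two-sample `old_ids` vector is made up for illustration, and the validation actually performed by the package may differ.

``` r
# Hypothetical helper: TRUE when every old id maps to exactly one new id
# and every new id comes from exactly one old id.
maps_one_to_one <- function(old, new) {
  stopifnot(length(old) == length(new))
  # Keep each distinct (old, new) pairing once
  pairs <- unique(data.frame(old = old, new = new))
  !anyDuplicated(pairs$old) && !anyDuplicated(pairs$new)
}

# Made-up ids for three cells drawn from two samples
old_ids <- c("section1", "section1", "section2")

# Renaming every sample keeps the samples separate
renamed_ok <- maps_one_to_one(old_ids, paste0(old_ids, "_modified"))

# Giving every cell the same id merges the two samples
merged_ok <- maps_one_to_one(old_ids, rep("new_sample", 3))
```

Under this reading, `renamed_ok` is `TRUE` and `merged_ok` is `FALSE`. The behaviour of the package itself on the single-sample example object is shown next.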
``` r
# sample_id is not removed, despite the user's request
spe |>
    select(-sample_id)
# # A SpatialExperiment-tibble abstraction: 50 × 7
# # Features = 50 | Cells = 50 | Assays = counts
#   .cell              in_tissue array_row array_col sample_id pxl_col_in_fullres
#   <chr>              <lgl>         <int>     <int> <chr>                  <int>
# 1 AAACAACGAATAGTTC-1 FALSE             0        16 section1                2312
# 2 AAACAAGTATCTCCCA-1 TRUE             50       102 section1                8230
# 3 AAACAATCTACTAGCA-1 TRUE              3        43 section1                4170
# 4 AAACACCAATAACTGC-1 TRUE             59        19 section1                2519
# 5 AAACAGAGCGACTCCT-1 TRUE             14        94 section1                7679
# # ℹ 45 more rows
# # ℹ 1 more variable: pxl_row_in_fullres <int>

# This change maintains separation of sample_ids and is permitted
spe |>
    mutate(sample_id = stringr::str_c(sample_id, "_modified")) |>
    head()
# # A SpatialExperiment-tibble abstraction: 50 × 7
# # Features = 6 | Cells = 50 | Assays = counts
#   .cell              in_tissue array_row array_col sample_id  pxl_col_in_fullres
#   <chr>              <lgl>         <int>     <int> <chr>                   <int>
# 1 AAACAACGAATAGTTC-1 FALSE             0        16 section1_…               2312
# 2 AAACAAGTATCTCCCA-1 TRUE             50       102 section1_…               8230
# 3 AAACAATCTACTAGCA-1 TRUE              3        43 section1_…               4170
# 4 AAACACCAATAACTGC-1 TRUE             59        19 section1_…               2519
# 5 AAACAGAGCGACTCCT-1 TRUE             14        94 section1_…               7679
# # ℹ 45 more rows
# # ℹ 1 more variable: pxl_row_in_fullres <int>

# Assigning one value to every cell would merge sample_ids when several
# samples are present, which is not permitted. With the single sample in
# this example, the change is accepted, as the output shows
spe |>
    mutate(sample_id = "new_sample")
# # A SpatialExperiment-tibble abstraction: 50 × 7
# # Features = 50 | Cells = 50 | Assays = counts
#   .cell              in_tissue array_row array_col sample_id  pxl_col_in_fullres
#   <chr>              <lgl>         <int>     <int> <chr>                   <int>
# 1 AAACAACGAATAGTTC-1 FALSE             0        16 new_sample               2312
# 2 AAACAAGTATCTCCCA-1 TRUE             50       102 new_sample               8230
# 3 AAACAATCTACTAGCA-1 TRUE              3        43 new_sample               4170
# 4 AAACACCAATAACTGC-1 TRUE             59        19 new_sample               2519
# 5 AAACAGAGCGACTCCT-1 TRUE             14        94 new_sample               7679
# # ℹ 45 more rows
# # ℹ 1 more variable: pxl_row_in_fullres <int>
```

The `pxl_col_in_fullres` and `pxl_row_in_fullres` columns cannot be removed or modified with *tidyverse* functions. This is consistent with the behaviour of dimension reduction data in other *tidyomics* packages.

``` r
# Attempting to remove pxl_col_in_fullres produces an error
spe |>
    select(-pxl_col_in_fullres)
# Error in `select_helper()`:
# ! Can't select columns that don't exist.
# ✖ Column `pxl_col_in_fullres` doesn't exist.

# Attempting to modify pxl_col_in_fullres produces an error
spe |>
    mutate(pxl_col_in_fullres)
# Error in `dplyr::mutate()`:
# ℹ In argument: `pxl_col_in_fullres`.
# Caused by error:
# ! object 'pxl_col_in_fullres' not found
```

# Citation

If you use tidySpatialExperiment in published research, please cite [The tidyomics ecosystem: enhancing omic data analyses](https://doi.org/10.1038/s41592-024-02299-2).