# SEtools

**Pierre-Luc Germain, 14.01.2020**

*D-HEST Institute for Neurosciences, ETH Zürich & Laboratory of Statistical Bioinformatics, University Zürich*

***

The *SEtools* package is a set of convenience functions for the _Bioconductor_ class *SummarizedExperiment*. It facilitates merging, melting, and plotting `SummarizedExperiment` objects.

**NOTE that the heatmap-related functions have been moved to a standalone package, sechm, and have been deprecated from this package.**

<br/><br/>

# Getting started

## Package installation

```r
# Install BiocManager first if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("SEtools")
```

Or, to install the latest development version:

```r
BiocManager::install("plger/SEtools")
```

## Example data

To showcase the main functions, we will use an example object which contains (a subset of) whole-hippocampus RNAseq of mice after different stressors:

```r
suppressPackageStartupMessages({
  library(SummarizedExperiment)
  library(SEtools)
})
data("SE", package="SEtools")
SE
```

```
## class: SummarizedExperiment 
## dim: 100 20 
## metadata(0):
## assays(2): counts logcpm
## rownames(100): Egr1 Nr4a1 ... CH36-200G6.4 Bhlhe22
## rowData names(2): meanCPM meanTPM
## colnames(20): HC.Homecage.1 HC.Homecage.2 ... HC.Swim.4 HC.Swim.5
## colData names(2): Region Condition
```

This is taken from Floriou-Servou et al., Biol Psychiatry 2018.

<br/><br/>

## Merging and aggregating SEs

```r
# Split the example object into two halves, then merge them back together
se1 <- SE[,1:10]
se2 <- SE[,11:20]
se3 <- mergeSEs( list(se1=se1, se2=se2) )
se3
```

```
## class: SummarizedExperiment 
## dim: 100 20 
## metadata(3): se1 se2 anno_colors
## assays(2): counts logcpm
## rownames(100): AC139063.2 Actr6 ... Zfp667 Zfp930
## rowData names(3): meanCPM meanTPM cluster
## colnames(20): se1.HC.Homecage.1 se1.HC.Homecage.2 ...
##   se2.HC.Swim.4 se2.HC.Swim.5
## colData names(3): Dataset Region Condition
```

All assays were merged, along with rowData and colData slots.

By default, row z-scores are calculated for each object when merging. This can be prevented with:

```r
se3 <- mergeSEs( list(se1=se1, se2=se2), do.scale=FALSE)
```

If more than one assay is present, one can specify a different scaling behavior for each assay:

```r
# do.scale is given in the same order as use.assays
se3 <- mergeSEs( list(se1=se1, se2=se2), use.assays=c("counts", "logcpm"), do.scale=c(FALSE, TRUE))
```

### Merging by rowData columns

It is also possible to merge by rowData columns, which are specified through the `mergeBy` argument.
This allows one-to-many and many-to-many mappings between objects, for which two behaviors are possible:

* By default, all combinations will be reported, which means that the same feature of one object might appear multiple times in the output because it matches multiple features of another object.
* If a function is passed through `aggFun`, the features of each object will be aggregated by `mergeBy` using this function before merging.

```r
# Assign a random letter to each feature to serve as the merging key
rowData(se1)$metafeature <- sample(LETTERS,nrow(se1),replace = TRUE)
rowData(se2)$metafeature <- sample(LETTERS,nrow(se2),replace = TRUE)
se3 <- mergeSEs( list(se1=se1, se2=se2), do.scale=FALSE, mergeBy="metafeature", aggFun=median)
```

```
## Aggregating the objects by metafeature
## Merging...
```

```r
sehm(se3)
```

![](README_files/figure-html/merging-1.png)<!-- -->

<br/><br/>

### Aggregating an SE

A single SE can also be aggregated by using the `aggSE` function:

```r
se1b <- aggSE(se1, by = "metafeature")
```

```
## Aggregation methods for each assay:
## counts: sum; logcpm: expsum
```

```r
se1b
```

```
## class: SummarizedExperiment 
## dim: 26 10 
## metadata(0):
## assays(2): counts logcpm
## rownames(26): A B ... Y Z
## rowData names(4): meanCPM meanTPM cluster metafeature
## colnames(10): HC.Homecage.1 HC.Homecage.2 ... HC.Handling.4
##   HC.Handling.5
## colData names(2): Region Condition
```

If the aggregation functions are not specified, `aggSE` will try to guess suitable ones from the assay names (here, `sum` for `counts` and `expsum` for `logcpm`).

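
As a rough illustration of what aggregating features by a grouping column amounts to, the following base-R sketch collapses the rows of a toy matrix that share the same label. The matrix, the labels, and the use of `rowsum()` and `aggregate()` are assumptions made for illustration only; this is not how `aggSE` or `mergeSEs` are implemented internally.

```r
# Toy count matrix: 4 features x 2 samples (hypothetical data)
counts <- matrix(c(1, 2, 3, 4,
                   10, 20, 30, 40),
                 nrow = 4,
                 dimnames = list(c("f1", "f2", "f3", "f4"), c("s1", "s2")))

# Hypothetical grouping variable, playing the role of rowData(se)$metafeature
metafeature <- c("A", "B", "A", "B")

# Sum the rows that share a label; the result has one row per unique label
agg <- rowsum(counts, group = metafeature)
agg

# The same idea with another summary function (here the median)
agg_med <- aggregate(counts, by = list(metafeature = metafeature), FUN = median)
agg_med
```

With sums, group `A` combines `f1` and `f3`, and group `B` combines `f2` and `f4`. Whether a sum, a median, or another function is appropriate depends on the scale of the assay, which is presumably why different defaults are reported for `counts` and `logcpm` above.
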
<br/>

***

<br/>

## Melting SE

To facilitate plotting features with *ggplot2*, the `meltSE` function combines assay values along with row/column data:

```r
# Features to extract (the four genes shown in the output below)
g <- c("Egr1", "Nr4a1", "Fos", "Egr2")
d <- meltSE(SE, genes=g)
head(d)
```

```
##   feature        sample Region Condition counts    logcpm
## 1    Egr1 HC.Homecage.1     HC  Homecage 1581.0 4.4284969
## 2   Nr4a1 HC.Homecage.1     HC  Homecage  750.0 3.6958917
## 3     Fos HC.Homecage.1     HC  Homecage   91.4 1.7556317
## 4    Egr2 HC.Homecage.1     HC  Homecage   15.1 0.5826999
## 5    Egr1 HC.Homecage.2     HC  Homecage 1423.0 4.4415828
## 6   Nr4a1 HC.Homecage.2     HC  Homecage  841.0 3.9237691
```

```r
suppressPackageStartupMessages(library(ggplot2))
# One violin per condition, with a separate panel (and free axes) per feature
ggplot(d, aes(Condition, counts, fill=Condition)) + geom_violin() + 
    facet_wrap(~feature, scales="free")
```

![An example ggplot created from a melted SE.](README_files/figure-html/unnamed-chunk-13-1.png)
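
To make the long format more concrete, here is a minimal base-R sketch of the kind of reshaping involved, using a toy matrix and a toy sample annotation. All object names and values are hypothetical, and the sketch only mimics the general shape of the `meltSE` output shown above; it does not reproduce the function itself.

```r
# Toy assay: 2 features x 3 samples (hypothetical values)
logcpm <- matrix(c(4.4, 3.7, 4.5, 3.9, 5.1, 4.2),
                 nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))

# Toy sample annotation, playing the role of colData
coldata <- data.frame(Condition = c("Homecage", "Homecage", "Swim"),
                      row.names = c("s1", "s2", "s3"))

# One row per feature/sample pair; as.vector() unrolls the matrix column by column
long <- data.frame(feature = rep(rownames(logcpm), times = ncol(logcpm)),
                   sample  = rep(colnames(logcpm), each = nrow(logcpm)),
                   logcpm  = as.vector(logcpm))

# Attach the sample annotation by matching on the sample name
long$Condition <- coldata[long$sample, "Condition"]
long
```

Each row of `long` now carries a single value together with its feature, sample, and sample-level annotation, which is the layout that `ggplot2` aesthetics and facets expect.
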