# SEtools
**Pierre-Luc Germain, 14.01.2020**
*D-HEST Institute for Neurosciences, ETH Zürich & Laboratory of Statistical Bioinformatics, University Zürich*
***
The *SEtools* package is a set of convenience functions for the _Bioconductor_ class *[SummarizedExperiment](https://bioconductor.org/packages/3.10/SummarizedExperiment)*. It facilitates merging, melting, and plotting `SummarizedExperiment` objects.
**NOTE that the heatmap-related functions habe been moved to a standalone package, [sechm](https://github.com/plger/sechm), and have been deprecated from this package.**
<br/><br/>
# Getting started
## Package installation
```r
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("SEtools")
```
Or, to install the latest development version:
```r
BiocManager::install("plger/SEtools")
```
## Example data
To showcase the main functions, we will use an example object which contains (a subset of) whole-hippocampus RNAseq of mice after different stressors:
```r
suppressPackageStartupMessages({
library(SummarizedExperiment)
library(SEtools)
})
data("SE", package="SEtools")
SE
```
```
## class: SummarizedExperiment
## dim: 100 20
## metadata(0):
## assays(2): counts logcpm
## rownames(100): Egr1 Nr4a1 ... CH36-200G6.4 Bhlhe22
## rowData names(2): meanCPM meanTPM
## colnames(20): HC.Homecage.1 HC.Homecage.2 ... HC.Swim.4 HC.Swim.5
## colData names(2): Region Condition
```
This is taken from [Floriou-Servou et al., Biol Psychiatry 2018](https://doi.org/10.1016/j.biopsych.2018.02.003).
<br/><br/>
<br/>
## Merging and aggregating SEs
```r
se1 <- SE[,1:10]
se2 <- SE[,11:20]
se3 <- mergeSEs( list(se1=se1, se2=se2) )
se3
```
```
## class: SummarizedExperiment
## dim: 100 20
## metadata(3): se1 se2 anno_colors
## assays(2): counts logcpm
## rownames(100): AC139063.2 Actr6 ... Zfp667 Zfp930
## rowData names(3): meanCPM meanTPM cluster
## colnames(20): se1.HC.Homecage.1 se1.HC.Homecage.2 ...
## se2.HC.Swim.4 se2.HC.Swim.5
## colData names(3): Dataset Region Condition
```
All assays were merged, along with rowData and colData slots.
By default, row z-scores are calculated for each object when merging. This can be prevented with:
```r
se3 <- mergeSEs( list(se1=se1, se2=se2), do.scale=FALSE)
```
If more than one assay is present, one can specify a different scaling behavior for each assay:
```r
se3 <- mergeSEs( list(se1=se1, se2=se2), use.assays=c("counts", "logcpm"), do.scale=c(FALSE, TRUE))
```
### Merging by rowData columns
It is also possible to merge by rowData columns, which are specified through the `mergeBy` argument.
In this case, one can have one-to-many and many-to-many mappings, in which case two behaviors are possible:
* By default, all combinations will be reported, which means that the same feature of one object might appear multiple times in the output because it matches multiple features of another object.
* If a function is passed through `aggFun`, the features of each object will by aggregated by `mergeBy` using this function before merging.
```r
rowData(se1)$metafeature <- sample(LETTERS,nrow(se1),replace = TRUE)
rowData(se2)$metafeature <- sample(LETTERS,nrow(se2),replace = TRUE)
se3 <- mergeSEs( list(se1=se1, se2=se2), do.scale=FALSE, mergeBy="metafeature", aggFun=median)
```
```
## Aggregating the objects by metafeature
## Merging...
```
```r
sehm(se3)
```
![](README_files/figure-html/merging-1.png)<!-- -->
<br/><br/>
### Aggregating a SE
A single SE can also be aggregated by using the `aggSE` function:
```r
se1b <- aggSE(se1, by = "metafeature")
```
```
## Aggregation methods for each assay:
## counts: sum; logcpm: expsum
```
```r
se1b
```
```
## class: SummarizedExperiment
## dim: 26 10
## metadata(0):
## assays(2): counts logcpm
## rownames(26): A B ... Y Z
## rowData names(4): meanCPM meanTPM cluster metafeature
## colnames(10): HC.Homecage.1 HC.Homecage.2 ... HC.Handling.4
## HC.Handling.5
## colData names(2): Region Condition
```
If the aggregation function(s) are not specified, `aggSE` will try to guess decent aggregation functions from the assay names.
<br/>
***
<br/>
## Melting SE
To facilitate plotting features with *[ggplot2](https://CRAN.R-project.org/package=ggplot2)*, the `meltSE` function combines assay values along with row/column data:
```r
d <- meltSE(SE, genes=g[1:4])
head(d)
```
```
## feature sample Region Condition counts logcpm
## 1 Egr1 HC.Homecage.1 HC Homecage 1581.0 4.4284969
## 2 Nr4a1 HC.Homecage.1 HC Homecage 750.0 3.6958917
## 3 Fos HC.Homecage.1 HC Homecage 91.4 1.7556317
## 4 Egr2 HC.Homecage.1 HC Homecage 15.1 0.5826999
## 5 Egr1 HC.Homecage.2 HC Homecage 1423.0 4.4415828
## 6 Nr4a1 HC.Homecage.2 HC Homecage 841.0 3.9237691
```
```r
suppressPackageStartupMessages(library(ggplot2))
ggplot(d, aes(Condition, counts, fill=Condition)) + geom_violin() +
facet_wrap(~feature, scale="free")
```
![An example ggplot created from a melted SE.](README_files/figure-html/unnamed-chunk-13-1.png)