...
|
...
|
@@ -33,12 +33,30 @@ The package can be loaded using the `library` command.
|
33
|
33
|
library(celda)
|
34
|
34
|
```
|
35
|
35
|
|
36
|
|
-DecontX can take either `SingleCellExperiment` object from package [SingleCellExperiment package](https://bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html) or a single counts matrix as input. `decontX` will attempt to convert any input matrix to class `dgCMatrix` from package [Matrix](https://cran.r-project.org/web/packages/Matrix/index.html) before beginning any analyses.
|
|
36
|
+# Importing data
|
37
|
37
|
|
|
38
|
+DecontX can take either a [SingleCellExperiment](https://bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html) object or a counts matrix as input. `decontX` will attempt to convert any input matrix to class `dgCMatrix` from package [Matrix](https://cran.r-project.org/web/packages/Matrix/index.html) before starting the analysis.
|
|
39
|
+
|
|
40
|
+To import datasets directly into an SCE object, the [singleCellTK](https://bioconductor.org/packages/release/bioc/html/singleCellTK.html) package has several importing functions for different preprocessing tools including CellRanger, STARsolo, BUStools, Optimus, DropEST, SEQC, and Alevin/Salmon. For example, the following code can be used as a template to read in the filtered and raw matrices for multiple samples processed with CellRanger:
|
|
41
|
+
|
|
42
|
+```{r sce_import, eval = FALSE}
|
|
43
|
+library(singleCellTK)
|
|
44
|
+sce <- importCellRanger(sampleDirs = c("path/to/sample1/", "path/to/sample2/"))
|
|
45
|
+```
|
|
46
|
+
|
|
47
|
+Within each sample directory, there should be subfolders called `"outs/filtered_feature_bc_matrix/"` or `"outs/raw_feature_bc_matrix/"` with files called `matrix.mtx.gz`, `features.tsv.gz` and `barcodes.tsv.gz`. If these files are in different subdirectories, the `importCellRangerV3Sample` function can be used to import data from a different directory instead.
|
|
48
|
+
|
|
49
|
+Optionally, the "raw" or "droplet" matrix can also be easily imported by setting the `dataType` argument to "raw":
|
|
50
|
+
|
|
51
|
+```{r sce_import_raw, eval = FALSE}
|
|
52
|
+sce.raw <- importCellRanger(sampleDirs = c("path/to/sample1/", "path/to/sample2/"), dataType = "raw")
|
|
53
|
+```
|
|
54
|
+
|
|
55
|
+The raw matrix can be passed to the `background` parameter in `decontX` as described below. If using Seurat, go to the [Working with Seurat](#seurat) section for details on how to convert between SCE and Seurat objects.
|
38
|
56
|
|
39
|
57
|
# Load PBMC4k data from 10X
|
40
|
58
|
|
41
|
|
-We will utlize the 10X PBMC 4K dataset as an example. This can be easily retrieved from the package [TENxPBMCData](http://bioconductor.org/packages/release/data/experiment/html/TENxPBMCData.html). Make sure the the column names are set before running decontX.
|
|
59
|
+We will utilize the 10X PBMC 4K dataset as an example in this vignette. This data can be easily retrieved from the package [TENxPBMCData](http://bioconductor.org/packages/release/data/experiment/html/TENxPBMCData.html). Make sure the the column names are set before running decontX.
|
42
|
60
|
|
43
|
61
|
```{r load_10X, eval=TRUE, message=FALSE}
|
44
|
62
|
# Install TENxPBMCData if is it not already
|
...
|
...
|
@@ -59,23 +77,28 @@ rownames(sce) <- rowData(sce)$Symbol_TENx
|
59
|
77
|
|
60
|
78
|
# Running decontX
|
61
|
79
|
|
62
|
|
-A SingleCellExperiment (SCE) object or a sparse matrix containing the counts for filtered cells can be passed to decontX via the `x` parameter. There are two major ways to run decontX: with and without the raw/droplet matrix containing empty droplets. The raw/droplet matrix can be used to empirically estimate the distribution of ambient RNA, which is especially useful when cells that contributed to the ambient RNA are not accurately represented in the filtered count matrix containing the cells. For example, cells that were removed via flow cytometry or that were more sensitive to lysis during dissociation may have contributed to the ambient RNA but were not measured in the filtered/cell matrix. The raw/droplet matrix can be input as an SCE object or a sparse matrix using the `background` parameter:
|
|
80
|
+A SingleCellExperiment (SCE) object or a sparse matrix containing the counts for filtered cells can be passed to decontX via the `x` parameter. There are two major ways to run decontX: with and without the raw/droplet matrix containing empty droplets.
|
63
|
81
|
|
64
|
|
-```{r decontX_background, eval=FALSE, message=FALSE}
|
65
|
|
-sce <- decontX(sce, background = raw)
|
66
|
|
-```
|
67
|
82
|
|
68
|
|
-We would like to stress that `background` input was designed to contain only empty droplets. In case the `background` input contains both cell and empty droplets, for example the raw output from 10X Genomics, the software will try to look up for the cell/column names in the raw matrix (`background`) that are also found in the filtered counts matrix (`x`), and exclude them from the raw matrix. When cell/column names are not available for the input objects, the software will treat the entire `background` input as empty droplets. This will render incorrect estimation of the ambient RNA profile.
|
|
83
|
+## Without the Background
|
69
|
84
|
|
70
|
|
-If the raw matrix is not available, then `decontX` will estimate the contamination distribution for each cell cluster based on the profiles of the other cell clusters in the filtered dataset:
|
|
85
|
+If the raw matrix is supplied via the `background` parameter, then `decontX` will estimate the contamination distribution for each cell cluster based on the profiles of the other cell clusters in the filtered dataset:
|
71
|
86
|
|
72
|
87
|
```{r decontX, eval=TRUE, message=FALSE}
|
73
|
88
|
sce <- decontX(sce)
|
74
|
89
|
```
|
75
|
90
|
|
76
|
|
-Note that in this case `decontX` will perform heuristic clustering to quickly define major cell clusters. However if you have your own cell cluster labels, they can be specified with the `z` parameter.
|
|
91
|
+The estimated contamination results can be found in the `colData(sce)$decontX_contamination` and the decontaminated counts can be accessed with `decontXcounts(sce)`. If the input object was a matrix, make sure to save the output into a variable with a different name (e.g. result). The result object will be a list with contamination in `result$contamination` and the decontaminated counts in `result$decontXcounts`. `decontX` will perform heuristic clustering to quickly define major cell clusters. However if you have your own cell cluster labels, they can be specified with the `z` parameter. These results will be used throughout the rest of the vignette.
|
|
92
|
+
|
|
93
|
+## With the Background
|
|
94
|
+
|
|
95
|
+The raw/droplet matrix can be used to empirically estimate the distribution of ambient RNA, which is especially useful when cells that contributed to the ambient RNA are not accurately represented in the filtered count matrix containing the cells. For example, cells that were removed via flow cytometry or that were more sensitive to lysis during dissociation may have contributed to the ambient RNA but were not measured in the filtered/cell matrix. The raw/droplet matrix can be input as an SCE object or a sparse matrix using the `background` parameter:
|
|
96
|
+
|
|
97
|
+```{r decontX_background, eval=FALSE, message=FALSE}
|
|
98
|
+sce <- decontX(sce, background = sce.raw)
|
|
99
|
+```
|
77
|
100
|
|
78
|
|
-The contamination can be found in the `colData(sce)$decontX_contamination` and the decontaminated counts can be accessed with `decontXcounts(sce)`. If the input object was a matrix, make sure to save the output into a variable with a different name (e.g. result). The result object will be a list with contamination in `result$contamination` and the decontaminated counts in `result$decontXcounts`.
|
|
101
|
+Only empty droplets in the background matrix should be used to estimate the ambient RNA. If any cell ids (i.e. `colnames`) in the raw/droplet matrix supplied to the `background` parameter are also found in the filtered counts matrix (`x`), decontX will automatically remove them from the raw matrix. However, if the cell ids are not available for the input matrices, decontX will treat the entire `background` input as empty droplets. All of the outputs are the same as when running decontX without setting the `background` parameter.
|
79
|
102
|
|
80
|
103
|
# Plotting DecontX results
|
81
|
104
|
|
...
|
...
|
@@ -147,7 +170,7 @@ plotDecontXMarkerPercentage(sce,
|
147
|
170
|
Some helpful hints when using `plotDecontXMarkerPercentage`:
|
148
|
171
|
|
149
|
172
|
1. Cell clusters can be renamed and re-grouped using the `groupCluster` parameter, which also needs to be a named list. If `groupCluster` is used, cell clusters not included in the list will be excluded in the barplot. For example, if we wanted to group T-cells and NK-cells together, we could set `cellTypeMappings <- list(NK_Tcells = c(2,6), Bcells = 5, Monocytes = 1)`
|
150
|
|
-2. The level a gene needs to be expressed to be considered detected in a cell can be adjusted using the `threshold` parameter.
|
|
173
|
+2. The level a gene that needs to be expressed to be considered detected in a cell can be adjusted using the `threshold` parameter.
|
151
|
174
|
3. If you are not using a `SingleCellExperiment`, then you will need to supply the original counts matrix or the decontaminated counts matrix as the first argument to generate the barplots.
|
152
|
175
|
|
153
|
176
|
## Violin plot to compare the distributions of original and decontaminated counts
|
...
|
...
|
@@ -168,16 +191,17 @@ Some helpful hints when using `plotDecontXMarkerExpression`:
|
168
|
191
|
4. This function can plot any assay in a `SingleCellExperiment`. Therefore you could also examine normalized expression of the original and decontaminated counts. For example:
|
169
|
192
|
|
170
|
193
|
|
171
|
|
-```{r plot_norm_counts, eval = FALSE}
|
172
|
|
-sce <- scater::logNormCounts(sce,
|
|
194
|
+```{r plot_norm_counts, eval = TRUE}
|
|
195
|
+library(scater)
|
|
196
|
+sce <- logNormCounts(sce,
|
173
|
197
|
exprs_values = "decontXcounts",
|
174
|
|
- name = "dlogcounts")
|
|
198
|
+ name = "decontXlogcounts")
|
175
|
199
|
|
176
|
200
|
plotDecontXMarkerExpression(sce,
|
177
|
201
|
markers = markers[["Monocyte_Markers"]],
|
178
|
202
|
groupClusters = cellTypeMappings,
|
179
|
203
|
ncol = 3,
|
180
|
|
- assayName = c("logcounts", "dlogcounts"))
|
|
204
|
+ assayName = c("logcounts", "decontXlogcounts"))
|
181
|
205
|
```
|
182
|
206
|
|
183
|
207
|
|
...
|
...
|
@@ -204,40 +228,37 @@ plot(sce$decontX_contamination, sce.delta$decontX_contamination,
|
204
|
228
|
abline(0, 1, col = "red", lwd = 2)
|
205
|
229
|
```
|
206
|
230
|
|
207
|
|
-## Integration with packages such as Seurat and singleCellTK
|
208
|
|
-You can integrate decontX into your scRNA-seq analysis pipelines, such as the one provided by [Seurat](https://cran.r-project.org/web/packages/Seurat/index.html). Both decontX and Seurat takes input count matrix, although the decontaminated matrix of decontX consists of floating point numbers. As heuristics, you can round the decontaminated matrix to integers before applying it to your Seurat pipeline.
|
|
231
|
+## Working with Seurat {#seurat}
|
209
|
232
|
|
210
|
|
-```{r seuratIntegration, eval=FALSE}
|
211
|
|
-library(Seurat)
|
|
233
|
+If you are using the [Seurat](https://cran.r-project.org/web/packages/Seurat/index.html) package for downstream analysis, the following code can be used to read in a matrix and convert between Seurat and SCE objects:
|
212
|
234
|
|
213
|
|
-counts <- Read10X("path/to/file")
|
|
235
|
+```{r seurat_create, eval = FALSE}
|
|
236
|
+# Read counts from CellRanger output
|
|
237
|
+library(Seurat)
|
|
238
|
+counts <- Read10X("sample/outs/filtered_feature_bc_matrix/")
|
214
|
239
|
|
215
|
|
-# Convert count matrix to SingleCellExperiment to run on decontX
|
|
240
|
+# Create a SingleCellExperiment object and run decontX
|
216
|
241
|
sce <- SingleCellExperiment(list(counts = counts))
|
217
|
242
|
sce <- decontX(sce)
|
218
|
243
|
|
219
|
|
-# Retrieve decontaminated matrix, round to integer, and convert to Seurat object
|
220
|
|
-decontaminated.matrix <- decontXcounts(sce)
|
221
|
|
-decontaminated.counts <- round(decontaminated.matrix)
|
222
|
|
-seuratObject <- CreateSeuratObject(decontaminated.counts)
|
|
244
|
+# Create a Seurat object from a SCE with decontX results
|
|
245
|
+seuratObject <- CreateSeuratObject(round(decontXcounts(sce)))
|
223
|
246
|
```
|
224
|
247
|
|
225
|
|
-Conversely, if you have a Seurat object containing raw count matrix and would like to run decontX, simply retrieve the count matrix, convert to SingleCellExperiment, and run on decontX.
|
|
248
|
+Optionally, the "raw" matrix can be also be imported and used as the background:
|
226
|
249
|
|
227
|
|
-```{r seuratIntegration2, eval=FALSE}
|
228
|
|
-counts <- GetAssayData(object = seuratObject, slot = "counts")
|
229
|
|
-sce <- SingleCellExperiment(list(counts = counts))
|
230
|
|
-sce <- decontX(sce)
|
|
250
|
+```{r seurat_raw, eval = FALSE}
|
|
251
|
+counts.raw <- Read10X("sample/outs/raw_feature_bc_matrix/")
|
|
252
|
+sce.raw <- SingleCellExperiment(list(counts = counts.raw))
|
|
253
|
+sce <- decontX(sce, background = sce.raw)
|
231
|
254
|
```
|
232
|
255
|
|
|
256
|
+Note that the decontaminated matrix of decontX consists of floating point numbers and must be rounded to integers before adding it to a Seurat object. If you already have a Seurat object containing the counts matrix and would like to run decontX, you can retrieve the count matrix, create a SCE object, and run decontX, and then add it back to the Seurat object:
|
233
|
257
|
|
234
|
|
-To import datasets into SingleCellExperiment object, the [singleCellTK](https://bioconductor.org/packages/release/bioc/html/singleCellTK.html) package has several importing functions for different preprocessing tools including CellRanger, STARsolo, BUStools, Optimus, DropEST, SEQC, and Alevin/Salmon. For example, the following code can be used as a template to read in the filtered and raw matrices for multiple samples processed with CellRanger:
|
235
|
|
-
|
236
|
|
-```{r singleCellTKIntegration, eval=FALSE}
|
237
|
|
-library(singleCellTK)
|
238
|
|
-
|
239
|
|
-sce <- importCellRanger(sampleDirs = c("path/to/sample1/", "path/to/sample2/"))
|
240
|
|
-sce.raw <- importCellRanger(sampleDirs = c("path/to/sample1/", "path/to/sample2/"), dataType = "raw")
|
|
258
|
+```{r seurat_create2, eval = FALSE}
|
|
259
|
+counts <- GetAssayData(object = seuratObject, slot = "counts")
|
|
260
|
+sce <- SingleCellExperiment(list(counts = counts))
|
|
261
|
+sce <- decontX(sce)
|
241
|
262
|
```
|
242
|
263
|
|
243
|
264
|
|