Browse code

Updates to decontX vignette and minor change to a decontX plotting function

Joshua D. Campbell authored on 29/03/2022 15:05:16
Showing 3 changed files

... ...
@@ -371,7 +371,7 @@ plotDecontXMarkerExpression <- function(x, markers, groupClusters = NULL,
371 371
       df.list[[i]] <- cbind(df, assay = assayName[i])
372 372
     }
373 373
     df <- do.call(rbind, df.list)
374
-    assay <- as.factor(df$assay)
374
+    assay <- factor(df$assay, levels = assayName)
375 375
     if (length(assayName) > 1) {
376 376
       legend <- "right"
377 377
     }
... ...
@@ -397,7 +397,7 @@ plotDecontXMarkerExpression <- function(x, markers, groupClusters = NULL,
397 397
   }
398 398
   Expression <- df$Expression
399 399
   Marker <- df$Marker
400
-  Assay <- df$assay
400
+  Assay <- factor(df$assay, levels = assayName)
401 401
   Cluster <- df$Cluster
402 402
   if (!is.null(groupClusters)) {
403 403
     df <- cbind(df, Cell_Type = names(groupClusters)[Cluster])
... ...
@@ -6,11 +6,6 @@
6 6
 
7 7
 using namespace Rcpp;
8 8
 
9
-#ifdef RCPP_USE_GLOBAL_ROSTREAM
10
-Rcpp::Rostream<true>&  Rcpp::Rcout = Rcpp::Rcpp_cout_get();
11
-Rcpp::Rostream<false>& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get();
12
-#endif
13
-
14 9
 // decontXEM
15 10
 Rcpp::List decontXEM(const Eigen::MappedSparseMatrix<double>& counts, const NumericVector& counts_colsums, const NumericVector& theta, const bool& estimate_eta, const NumericMatrix& eta, const NumericMatrix& phi, const IntegerVector& z, const bool& estimate_delta, const NumericVector& delta, const double& pseudocount);
16 11
 RcppExport SEXP _celda_decontXEM(SEXP countsSEXP, SEXP counts_colsumsSEXP, SEXP thetaSEXP, SEXP estimate_etaSEXP, SEXP etaSEXP, SEXP phiSEXP, SEXP zSEXP, SEXP estimate_deltaSEXP, SEXP deltaSEXP, SEXP pseudocountSEXP) {
... ...
@@ -33,12 +33,30 @@ The package can be loaded using the `library` command.
33 33
 library(celda)
34 34
 ```
35 35
 
36
-DecontX can take either `SingleCellExperiment` object from package [SingleCellExperiment package](https://bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html) or a single counts matrix as input. `decontX` will attempt to convert any input matrix to class `dgCMatrix` from package [Matrix](https://cran.r-project.org/web/packages/Matrix/index.html) before beginning any analyses.
36
+# Importing data
37 37
 
38
+DecontX can take either a [SingleCellExperiment](https://bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html) object or a counts matrix as input. `decontX` will attempt to convert any input matrix to class `dgCMatrix` from package [Matrix](https://cran.r-project.org/web/packages/Matrix/index.html) before starting the analysis.
39
+
40
+To import datasets directly into an SCE object, the [singleCellTK](https://bioconductor.org/packages/release/bioc/html/singleCellTK.html) package has several importing functions for different preprocessing tools including CellRanger, STARsolo, BUStools, Optimus, DropEST, SEQC, and Alevin/Salmon. For example, the following code can be used as a template to read in the filtered and raw matrices for multiple samples processed with CellRanger:
41
+
42
+```{r sce_import, eval = FALSE}
43
+library(singleCellTK)
44
+sce <- importCellRanger(sampleDirs = c("path/to/sample1/", "path/to/sample2/"))
45
+```
46
+
47
+Within each sample directory, there should be subfolders called `"outs/filtered_feature_bc_matrix/"` or `"outs/raw_feature_bc_matrix/"` with files called `matrix.mtx.gz`, `features.tsv.gz` and `barcodes.tsv.gz`. If these files are in different subdirectories, the `importCellRangerV3Sample` function can be used to import data from a different directory instead. 
48
+
49
+Optionally, the "raw" or "droplet" matrix can also be easily imported by setting the `dataType` argument to "raw":
50
+
51
+```{r sce_import_raw, eval = FALSE}
52
+sce.raw <- importCellRanger(sampleDirs = c("path/to/sample1/", "path/to/sample2/"), dataType = "raw")
53
+```
54
+
55
+The raw matrix can be passed to the `background` parameter in `decontX` as described below. If using Seurat, go to the [Working with Seurat](#seurat) section for details on how to convert between SCE and Seurat objects.
38 56
 
39 57
 # Load PBMC4k data from 10X
40 58
 
41
-We will utlize the 10X PBMC 4K dataset as an example. This can be easily retrieved from the package [TENxPBMCData](http://bioconductor.org/packages/release/data/experiment/html/TENxPBMCData.html). Make sure the the column names are set before running decontX.
59
+We will utilize the 10X PBMC 4K dataset as an example in this vignette. This data can be easily retrieved from the package [TENxPBMCData](http://bioconductor.org/packages/release/data/experiment/html/TENxPBMCData.html). Make sure the the column names are set before running decontX.
42 60
 
43 61
 ```{r load_10X, eval=TRUE, message=FALSE}
44 62
 # Install TENxPBMCData if is it not already
... ...
@@ -59,23 +77,28 @@ rownames(sce) <- rowData(sce)$Symbol_TENx
59 77
 
60 78
 # Running decontX
61 79
 
62
-A SingleCellExperiment (SCE) object or a sparse matrix containing the counts for filtered cells can be passed to decontX via the `x` parameter. There are two major ways to run decontX: with and without the raw/droplet matrix containing empty droplets. The raw/droplet matrix can be used to empirically estimate the distribution of ambient RNA, which is especially useful when cells that contributed to the ambient RNA are not accurately represented in the filtered count matrix containing the cells. For example, cells that were removed via flow cytometry or that were more sensitive to lysis during dissociation may have contributed to the ambient RNA but were not measured in the filtered/cell matrix. The raw/droplet matrix can be input as an SCE object or a sparse matrix using the `background` parameter:
80
+A SingleCellExperiment (SCE) object or a sparse matrix containing the counts for filtered cells can be passed to decontX via the `x` parameter. There are two major ways to run decontX: with and without the raw/droplet matrix containing empty droplets.
63 81
 
64
-```{r decontX_background, eval=FALSE, message=FALSE}
65
-sce <- decontX(sce, background = raw)
66
-```
67 82
 
68
-We would like to stress that `background` input was designed to contain only empty droplets. In case the `background` input contains both cell and empty droplets, for example the raw output from 10X Genomics, the software will try to look up for the cell/column names in the raw matrix (`background`) that are also found in the filtered counts matrix (`x`), and exclude them from the raw matrix. When cell/column names are not available for the input objects, the software will treat the entire `background` input as empty droplets. This will render incorrect estimation of the ambient RNA profile.
83
+## Without the Background
69 84
 
70
-If the raw matrix is not available, then `decontX` will estimate the contamination distribution for each cell cluster based on the profiles of the other cell clusters in the filtered dataset:
85
+If the raw matrix is supplied via the `background` parameter, then `decontX` will estimate the contamination distribution for each cell cluster based on the profiles of the other cell clusters in the filtered dataset:
71 86
 
72 87
 ```{r decontX, eval=TRUE, message=FALSE}
73 88
 sce <- decontX(sce)
74 89
 ```
75 90
 
76
-Note that in this case `decontX` will perform heuristic clustering to quickly define major cell clusters. However if you have your own cell cluster labels, they can be specified with the `z` parameter.
91
+The estimated contamination results can be found in the `colData(sce)$decontX_contamination` and the decontaminated counts can be accessed with `decontXcounts(sce)`. If the input object was a matrix, make sure to save the output into a variable with a different name (e.g. result). The result object will be a list with contamination in `result$contamination` and the decontaminated counts in `result$decontXcounts`. `decontX` will perform heuristic clustering to quickly define major cell clusters. However if you have your own cell cluster labels, they can be specified with the `z` parameter. These results will be used throughout the rest of the vignette. 
92
+
93
+## With the Background
94
+
95
+The raw/droplet matrix can be used to empirically estimate the distribution of ambient RNA, which is especially useful when cells that contributed to the ambient RNA are not accurately represented in the filtered count matrix containing the cells. For example, cells that were removed via flow cytometry or that were more sensitive to lysis during dissociation may have contributed to the ambient RNA but were not measured in the filtered/cell matrix. The raw/droplet matrix can be input as an SCE object or a sparse matrix using the `background` parameter:
96
+
97
+```{r decontX_background, eval=FALSE, message=FALSE}
98
+sce <- decontX(sce, background = sce.raw)
99
+```
77 100
 
78
-The contamination can be found in the `colData(sce)$decontX_contamination` and the decontaminated counts can be accessed with `decontXcounts(sce)`. If the input object was a matrix, make sure to save the output into a variable with a different name (e.g. result). The result object will be a list with contamination in `result$contamination` and the decontaminated counts in `result$decontXcounts`. 
101
+Only empty droplets in the background matrix should be used to estimate the ambient RNA. If any cell ids (i.e. `colnames`) in the raw/droplet matrix supplied to the `background` parameter are also found in the filtered counts matrix (`x`), decontX will automatically remove them from the raw matrix. However, if the cell ids are not available for the input matrices, decontX will treat the entire `background` input as empty droplets. All of the outputs are the same as when running decontX without setting the `background` parameter.
79 102
 
80 103
 # Plotting DecontX results
81 104
 
... ...
@@ -147,7 +170,7 @@ plotDecontXMarkerPercentage(sce,
147 170
 Some helpful hints when using `plotDecontXMarkerPercentage`:
148 171
 
149 172
 1. Cell clusters can be renamed and re-grouped using the `groupCluster` parameter, which also needs to be a named list. If `groupCluster` is used, cell clusters not included in the list will be excluded in the barplot. For example, if we wanted to group T-cells and NK-cells together, we could set `cellTypeMappings <- list(NK_Tcells = c(2,6), Bcells = 5, Monocytes = 1)`
150
-2. The level a gene needs to be expressed to be considered detected in a cell can be adjusted using the `threshold` parameter.
173
+2. The level a gene that needs to be expressed to be considered detected in a cell can be adjusted using the `threshold` parameter.
151 174
 3. If you are not using a `SingleCellExperiment`, then you will need to supply the original counts matrix or the decontaminated counts matrix as the first argument to generate the barplots. 
152 175
 
153 176
 ## Violin plot to compare the distributions of original and decontaminated counts
... ...
@@ -168,16 +191,17 @@ Some helpful hints when using `plotDecontXMarkerExpression`:
168 191
 4. This function can plot any assay in a `SingleCellExperiment`. Therefore you could also examine normalized expression of the original and decontaminated counts. For example:
169 192
 
170 193
 
171
-```{r plot_norm_counts, eval = FALSE}
172
-sce <- scater::logNormCounts(sce,
194
+```{r plot_norm_counts, eval = TRUE}
195
+library(scater)
196
+sce <- logNormCounts(sce,
173 197
     exprs_values = "decontXcounts",
174
-    name = "dlogcounts")
198
+    name = "decontXlogcounts")
175 199
 
176 200
 plotDecontXMarkerExpression(sce,
177 201
     markers = markers[["Monocyte_Markers"]],
178 202
     groupClusters = cellTypeMappings,
179 203
     ncol = 3,
180
-    assayName = c("logcounts", "dlogcounts"))
204
+    assayName = c("logcounts", "decontXlogcounts"))
181 205
 ```
182 206
 
183 207
 
... ...
@@ -204,40 +228,37 @@ plot(sce$decontX_contamination, sce.delta$decontX_contamination,
204 228
 abline(0, 1, col = "red", lwd = 2)
205 229
 ```
206 230
 
207
-## Integration with packages such as Seurat and singleCellTK
208
-You can integrate decontX into your scRNA-seq analysis pipelines, such as the one provided by  [Seurat](https://cran.r-project.org/web/packages/Seurat/index.html). Both decontX and Seurat takes input count matrix, although the decontaminated matrix of decontX consists of floating point numbers. As heuristics, you can round the decontaminated matrix to integers before applying it to your Seurat pipeline. 
231
+## Working with Seurat {#seurat}
209 232
 
210
-```{r seuratIntegration, eval=FALSE}
211
-library(Seurat)
233
+If you are using the [Seurat](https://cran.r-project.org/web/packages/Seurat/index.html) package for downstream analysis, the following code can be used to read in a matrix and convert between Seurat and SCE objects:
212 234
 
213
-counts <- Read10X("path/to/file")
235
+```{r seurat_create, eval = FALSE}
236
+# Read counts from CellRanger output
237
+library(Seurat)
238
+counts <- Read10X("sample/outs/filtered_feature_bc_matrix/")
214 239
 
215
-# Convert count matrix to SingleCellExperiment to run on decontX
240
+# Create a SingleCellExperiment object and run decontX
216 241
 sce <- SingleCellExperiment(list(counts = counts))
217 242
 sce <- decontX(sce)
218 243
 
219
-# Retrieve decontaminated matrix, round to integer, and convert to Seurat object
220
-decontaminated.matrix <- decontXcounts(sce)
221
-decontaminated.counts <- round(decontaminated.matrix)
222
-seuratObject <- CreateSeuratObject(decontaminated.counts)
244
+# Create a Seurat object from a SCE with decontX results
245
+seuratObject <- CreateSeuratObject(round(decontXcounts(sce)))
223 246
 ```
224 247
 
225
-Conversely, if you have a Seurat object containing raw count matrix and would like to run decontX, simply retrieve the count matrix, convert to SingleCellExperiment, and run on decontX.
248
+Optionally, the "raw" matrix can be also be imported and used as the background:
226 249
 
227
-```{r seuratIntegration2, eval=FALSE}
228
-counts <- GetAssayData(object = seuratObject, slot = "counts")
229
-sce <- SingleCellExperiment(list(counts = counts))
230
-sce <- decontX(sce)
250
+```{r seurat_raw, eval = FALSE}
251
+counts.raw <- Read10X("sample/outs/raw_feature_bc_matrix/")
252
+sce.raw <- SingleCellExperiment(list(counts = counts.raw))
253
+sce <- decontX(sce, background = sce.raw)
231 254
 ```
232 255
 
256
+Note that the decontaminated matrix of decontX consists of floating point numbers and must be rounded to integers before adding it to a Seurat object. If you already have a Seurat object containing the counts matrix and would like to run decontX, you can retrieve the count matrix, create a SCE object, and run decontX, and then add it back to the Seurat object:
233 257
 
234
-To import datasets into SingleCellExperiment object, the [singleCellTK](https://bioconductor.org/packages/release/bioc/html/singleCellTK.html) package has several importing functions for different preprocessing tools including CellRanger, STARsolo, BUStools, Optimus, DropEST, SEQC, and Alevin/Salmon. For example, the following code can be used as a template to read in the filtered and raw matrices for multiple samples processed with CellRanger:
235
-
236
-```{r singleCellTKIntegration, eval=FALSE}
237
-library(singleCellTK)
238
-
239
-sce <- importCellRanger(sampleDirs = c("path/to/sample1/", "path/to/sample2/"))
240
-sce.raw <- importCellRanger(sampleDirs = c("path/to/sample1/", "path/to/sample2/"), dataType = "raw")
258
+```{r seurat_create2, eval = FALSE}
259
+counts <- GetAssayData(object = seuratObject, slot = "counts")
260
+sce <- SingleCellExperiment(list(counts = counts))
261
+sce <- decontX(sce)
241 262
 ```
242 263
 
243 264