... | ... |
@@ -314,13 +314,13 @@ DecontXoneBatch = function(counts, z=NULL, batch=NULL, max.iter=200, beta=1e-6, |
314 | 314 |
if ( !is.null(batch) ) { logMessages("batch: ", batch, logfile=logfile, append=TRUE, verbose=verbose) } |
315 | 315 |
logMessages("----------------------------------------------------------------------", logfile=logfile, append=TRUE, verbose=verbose) |
316 | 316 |
|
317 |
- run.params = list("beta"=beta, "delta.init"=delta.init, "iteration"=iter-1L, "seed"=seed) |
|
317 |
+ run.params = list("beta.init"=beta, "delta.init"=delta.init, "iteration"=iter-1L, "seed"=seed) |
|
318 | 318 |
|
319 |
- res.list = list("logLikelihood" = ll, "est.rmat"=next.decon$est.rmat , "est.conp"= res.conp, "theta"=theta , "delta"=delta) |
|
320 |
- if( decon.method=="clustering" ) { |
|
321 |
- posterior.params = list( "est.GeneDist"=phi, "est.ConDist"=eta ) |
|
322 |
- res.list = append( res.list , posterior.params ) |
|
323 |
- } |
|
319 |
+ res.list = list("logLikelihood" = ll, "est.nativeCounts"=next.decon$est.rmat , "est.conp"= res.conp, "theta"=theta , "delta"=delta) |
|
320 |
+ #if( decon.method=="clustering" ) { |
|
321 |
+ # posterior.params = list( "est.GeneDist"=phi, "est.ConDist"=eta ) |
|
322 |
+ # res.list = append( res.list , posterior.params ) |
|
323 |
+ #} |
|
324 | 324 |
|
325 | 325 |
return(list("run.params"=run.params, "res.list"=res.list, "method"=decon.method )) |
326 | 326 |
} |
327 | 327 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,78 @@ |
1 |
+--- |
|
2 |
+title: "Estimate and remove cross-contamination from ambient RNA for scRNA-seq data with DecontX" |
|
3 |
+author: "Shiyi Yang, Sean Corbett, Yusuke Koga, Zhe Wang, W. Evan Johnson, Masanao Yajima, Joshua D. Campbell" |
|
4 |
+date: "`r Sys.Date()`" |
|
5 |
+output: rmarkdown::html_vignette |
|
6 |
+vignette: > |
|
7 |
+ %\VignetteIndexEntry{Estimate and remove cross-contamination from ambient RNA for scRNA-seq data with DecontX} |
|
8 |
+ %\VignetteEngine{knitr::rmarkdown} |
|
9 |
+ %\VignetteEncoding{UTF-8} |
|
10 |
+--- |
|
11 |
+ |
|
12 |
+```{r setup, include = FALSE} |
|
13 |
+knitr::opts_chunk$set( |
|
14 |
+ collapse = TRUE, |
|
15 |
+ comment = "#>" |
|
16 |
+) |
|
17 |
+``` |
|
18 |
+ |
|
19 |
+# Introduction |
|
20 |
+DecontX is a Bayesian hierarchical model that estimates and removes cross-contamination from ambient RNA in single-cell RNA-seq count data generated by droplet-based sequencing devices. DecontX takes a count matrix, with or without cell labels, estimates the contamination level of each cell, and returns a decontaminated count matrix for downstream analysis. |
|
21 |
+ |
|
22 |
+In this vignette we will demonstrate how to use DecontX to estimate and remove contamination. |
|
23 |
+ |
|
24 |
+ |
|
25 |
+The package can be loaded using the `library` command. |
|
26 |
+ |
|
27 |
+```{r, eval=TRUE, warning = FALSE, echo = TRUE, message = FALSE} |
|
28 |
+library(celda) |
|
29 |
+``` |
|
30 |
+To see the latest updates and releases or to post a bug, see our GitHub page at https://github.com/compbiomed/celda. To ask questions about running Celda, visit our Google group at https://groups.google.com/forum/#!forum/celda-list. |
|
31 |
+ |
|
32 |
+ |
|
33 |
+# Generation of a cross-contaminated dataset |
|
34 |
+DecontX takes a matrix of counts (referred to as the observed counts) where each row is a feature, each column is a cell, and each entry is the number of counts of that feature in that cell. To illustrate the utility of DecontX, we will apply it to a simulated dataset. |
|
35 |
+ |
|
36 |
+In the function `simulateContaminatedMatrix`, the K parameter designates the number of cell clusters, the C parameter determines the number of cells, and the G parameter determines the number of genes in the simulated dataset. |
|
37 |
+ |
|
38 |
+```{r} |
|
39 |
+sim_counts = simulateContaminatedMatrix( G = 300, C = 100, K = 3 ) |
|
40 |
+``` |
|
41 |
+ |
|
42 |
+`nativeCounts` is the matrix of natively expressed counts, and `observedCounts` is the observed counts matrix, which contains both contaminating and natively expressed transcripts. `N.by.C` is the total number of observed transcripts per cell. The matrix containing only the contaminating transcripts can be obtained by subtracting the native counts matrix from the observed counts matrix. |
|
43 |
+ |
|
44 |
+```{r} |
|
45 |
+contamination = sim_counts$observedCounts - sim_counts$nativeCounts |
|
46 |
+``` |
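
The relationship between the three matrices can be sketched with a small toy example in base R. The values below are made up for illustration and are not produced by `simulateContaminatedMatrix`; only the element names mirror those described above, and `ambient` and `true.conp` are hypothetical names.

```r
# Toy example (hypothetical values, not output from celda):
# 4 genes (rows) x 3 cells (columns) of natively expressed counts.
nativeCounts <- matrix(c(10, 0, 5, 2,
                          0, 8, 1, 6,
                          3, 3, 9, 0),
                       nrow = 4, ncol = 3)

# Hypothetical ambient contamination added on top of the native counts.
ambient <- matrix(c(1, 2, 0, 0,
                    2, 0, 1, 1,
                    0, 1, 0, 3),
                  nrow = 4, ncol = 3)

# Observed counts contain both native and contaminating transcripts.
observedCounts <- nativeCounts + ambient

# Recover the contamination by subtracting native from observed counts.
contamination <- observedCounts - nativeCounts

# Total observed transcripts per cell, and the contaminated fraction of each cell.
N.by.C <- colSums(observedCounts)
true.conp <- colSums(contamination) / N.by.C
true.conp
```

The per-cell fraction computed in the last step is the quantity that the estimated contamination proportion is compared against later in the vignette.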
|
47 |
+ |
|
48 |
+The `z` variable contains the population label for each cell. |
|
49 |
+``` |
|
50 |
+table( sim_counts$z) |
|
51 |
+``` |
|
52 |
+ |
|
53 |
+The `phi` and `eta` variables contain the expression distributions and contamination distributions for each population, respectively. Each column corresponds to a population and each row to a gene, so each column sums to 1. |
|
54 |
+``` |
|
55 |
+colSums( sim_counts$phi ) |
|
56 |
+colSums( sim_counts$eta ) |
|
57 |
+``` |
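
The column-sum property can be illustrated without celda. This is a minimal sketch, assuming a made-up weight matrix; `weights` and `phi.toy` are hypothetical names and are not part of the simulated object.

```r
set.seed(1)  # arbitrary seed, only to make the toy values reproducible

# Hypothetical unnormalized weights: 5 genes (rows) x 3 populations (columns).
weights <- matrix(rgamma(5 * 3, shape = 1), nrow = 5, ncol = 3)

# Divide every column by its total so each population's distribution sums to 1.
phi.toy <- sweep(weights, 2, colSums(weights), "/")

colSums(phi.toy)  # each entry equals 1 up to floating-point error
rowSums(phi.toy)  # row sums are generally not 1
```

Because each column is a probability distribution over genes, `colSums` is the appropriate check; `rowSums` has no such constraint.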
|
58 |
+ |
|
59 |
+ |
|
60 |
+# Decontamination using DecontX |
|
61 |
+DecontX uses a Bayesian method to estimate and remove contamination via variational inference. |
|
62 |
+```{r, warning = FALSE, message = FALSE} |
|
63 |
+decontx.model = DecontX( counts = sim_counts$observedCounts, z = sim_counts$z ) |
|
64 |
+``` |
|
65 |
+ |
|
66 |
+## Check convergence |
|
67 |
+Use the log-likelihood to check convergence. |
|
68 |
+```{r, eval = TRUE, fig.width = 5, fig.height = 5} |
|
69 |
+plot( decontx.model$res.list$logLikelihood ) |
|
70 |
+``` |
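
Besides inspecting the plot, one can look at the change in log-likelihood between consecutive iterations. The sketch below uses a made-up log-likelihood trace and a hypothetical tolerance `tol`; it is not the stopping rule that DecontX applies internally.

```r
# Hypothetical log-likelihood trace (made-up values for illustration).
ll <- c(-1500, -1320, -1275, -1262, -1260.5, -1260.25, -1260.24)

# Change in log-likelihood between consecutive iterations.
ll.change <- diff(ll)

# Hypothetical tolerance; choose a value appropriate for your data.
tol <- 0.1

# First iteration at which the change drops below the tolerance.
converged.at <- which(abs(ll.change) < tol)[1] + 1
converged.at
```

A trace that keeps changing by more than the tolerance at the last iteration suggests that more iterations may be needed.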
|
71 |
+## Evaluate model performance |
|
72 |
+`DecontX` estimates a contamination proportion for each cell. We compare the estimated contamination proportion with the real contamination proportion. |
|
73 |
+```{r, eval = TRUE, fig.width = 5, fig.height = 5} |
|
74 |
+plot( decontx.model$res.list$est.conp, colSums(contamination) / sim_counts$N.by.C, col=sim_counts$z) |
|
75 |
+abline( 0, 1) |
|
76 |
+``` |
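
In addition to the scatter plot, agreement between estimated and true contamination proportions can be summarized numerically. The sketch below uses made-up vectors in place of the model estimates and the true proportions; the names `est` and `truth` are hypothetical.

```r
# Hypothetical per-cell contamination proportions (made-up values).
truth <- c(0.05, 0.10, 0.20, 0.15, 0.30)
est   <- c(0.06, 0.09, 0.22, 0.13, 0.31)

# Root-mean-square error: typical size of the estimation error.
rmse <- sqrt(mean((est - truth)^2))

# Pearson correlation: how well the estimates track the truth across cells.
r <- cor(est, truth)

c(rmse = rmse, correlation = r)
```

A small RMSE together with a correlation near 1 corresponds to points lying close to the identity line drawn by `abline( 0, 1)`.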
|
77 |
+ |
|
78 |
+ |