decontx vignettes, DecontX output modified

Irisapo authored on 24/03/2019 21:07:28
Showing 3 changed files

... ...
@@ -30,6 +30,6 @@ src/*.o
30 30
 src/*.dll
31 31
 src/*.so
32 32
 etc/*
33
-
34 33
 # Celda log files with default prefix
35 34
 Celda_chain.*log.txt
35
+inst/doc
... ...
@@ -314,13 +314,13 @@ DecontXoneBatch = function(counts, z=NULL, batch=NULL, max.iter=200, beta=1e-6,
314 314
     if ( !is.null(batch) ) {  logMessages("batch: ",  batch, logfile=logfile, append=TRUE, verbose=verbose)    }
315 315
     logMessages("----------------------------------------------------------------------", logfile=logfile, append=TRUE, verbose=verbose) 
316 316
 
317
-    run.params = list("beta"=beta, "delta.init"=delta.init, "iteration"=iter-1L, "seed"=seed)
317
+    run.params = list("beta.init"=beta, "delta.init"=delta.init, "iteration"=iter-1L, "seed"=seed)
318 318
 
319
-    res.list = list("logLikelihood" = ll, "est.rmat"=next.decon$est.rmat , "est.conp"= res.conp, "theta"=theta , "delta"=delta)
320
-    if( decon.method=="clustering" ) {
321
-        posterior.params = list( "est.GeneDist"=phi,  "est.ConDist"=eta  ) 
322
-        res.list = append( res.list , posterior.params ) 
323
-  }
319
+    res.list = list("logLikelihood" = ll, "est.nativeCounts"=next.decon$est.rmat , "est.conp"= res.conp, "theta"=theta , "delta"=delta)
320
+    #if( decon.method=="clustering" ) {
321
+    #    posterior.params = list( "est.GeneDist"=phi,  "est.ConDist"=eta  ) 
322
+    #    res.list = append( res.list , posterior.params ) 
323
+    #}
324 324
   
325 325
     return(list("run.params"=run.params, "res.list"=res.list, "method"=decon.method  ))
326 326
 }
327 327
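
The renames in this hunk affect any caller that reads the returned list by name. The sketch below uses a hand-built list that only mimics the new element names. `mock.result` and all of its values are hypothetical placeholders, not DecontX output. It shows how base R's `$` and `[[` behave when an element is looked up under an old or mis-cased name.

```r
# Hypothetical stand-in for the list returned after this change.
# Only the element names follow the diff; every value is invented.
mock.result <- list(
  run.params = list(beta.init = 1e-6, delta.init = 10, iteration = 25L, seed = 1234),
  res.list = list(
    logLikelihood = c(-1200.5, -1100.2, -1080.9),
    est.nativeCounts = matrix(c(5, 0, 2, 7), nrow = 2),
    est.conp = c(0.1, 0.2)
  ),
  method = "clustering"
)

# The old element name is gone, so `$` returns NULL.
old.counts <- mock.result$res.list$est.rmat
is.null(old.counts)    # TRUE

# `$` matching is case-sensitive: "loglikelihood" does not find "logLikelihood".
wrong.case <- mock.result$res.list$loglikelihood
is.null(wrong.case)    # TRUE

# `$` does partial matching on lists, so `$beta` still resolves to `beta.init`.
partial.beta <- mock.result$run.params$beta

# `[[` matches exactly by default, so the old name returns NULL here.
exact.beta <- mock.result$run.params[["beta"]]
is.null(exact.beta)    # TRUE
```

Using `[[` with the full new names (`"beta.init"`, `"est.nativeCounts"`) avoids relying on partial matching.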
new file mode 100644
... ...
@@ -0,0 +1,78 @@
1
+---
2
+title: "Estimate and remove cross-contamination from ambient RNA for scRNA-seq data with DecontX"
3
+author: "Shiyi Yang, Sean Corbett, Yusuke Koga, Zhe Wang, W. Evan Johnson, Masanao Yajima, Joshua D. Campbell"
4
+date: "`r Sys.Date()`"
5
+output: rmarkdown::html_vignette
6
+vignette: >
7
+  %\VignetteIndexEntry{Estimate and remove cross-contamination from ambient RNA for scRNA-seq data with DecontX}
8
+  %\VignetteEngine{knitr::rmarkdown}
9
+  %\VignetteEncoding{UTF-8}
10
+---
11
+
12
+```{r setup, include = FALSE}
13
+knitr::opts_chunk$set(
14
+  collapse = TRUE,
15
+  comment = "#>"
16
+)
17
+```
18
+
19
+# Introduction 
20
+DecontX is a Bayesian hierarchical model that estimates and removes cross-contamination from ambient RNA in single-cell RNA-seq count data generated by droplet-based sequencing devices. DecontX takes a count matrix, with or without cell labels, estimates the contamination level in each cell, and returns a decontaminated count matrix for downstream analysis. 
21
+
22
+In this vignette we will demonstrate how to use DecontX to estimate and remove contamination.  
23
+
24
+
25
+The package can be loaded using the `library` command.
26
+
27
+```{r, eval=TRUE, warning = FALSE, echo = FALSE, message = FALSE}
28
+library(celda)
29
+```
30
+To see the latest updates and releases or to report a bug, see our GitHub page at https://github.com/compbiomed/celda. To ask questions about running Celda, visit our Google group at https://groups.google.com/forum/#!forum/celda-list.
31
+
32
+
33
+# Generation of a cross-contaminated dataset 
34
+DecontX takes a matrix of counts (referred to as the observed counts) where each row is a feature, each column is a cell, and each entry is the number of counts of that feature in that cell. To illustrate the utility of DecontX, we will apply it to a simulated dataset.
35
+
36
+In the function `simulateContaminatedMatrix`, the K parameter sets the number of cell clusters, the C parameter sets the number of cells, and the G parameter sets the number of genes in the simulated dataset.
37
+
38
+```
39
+sim_counts = simulateContaminatedMatrix( G = 300, C = 100, K = 3 ) 
40
+```
41
+
42
+The `nativeCounts` element is the natively expressed counts matrix, and `observedCounts` is the observed counts matrix, which contains both contaminating and natively expressed transcripts. The `N.by.C` element is the total number of observed transcripts per cell. The counts matrix that contains only the contaminating transcripts can be obtained by subtracting the native counts matrix from the observed counts matrix. 
43
+
44
+```
45
+contamination = sim_counts$observedCounts - sim_counts$nativeCounts 
46
+```
47
+
48
+The `z` variable contains the population label for each cell.
49
+```
50
+table( sim_counts$z) 
51
+```
52
+
53
+The `phi` and `eta` variables contain the expression distributions and contamination distributions for each population, respectively. Each column corresponds to a population and each row represents a gene, so each column sums to 1. 
54
+```
55
+colSums( sim_counts$phi ) 
56
+colSums( sim_counts$eta )
57
+```
58
+
59
+
60
+# Decontamination using DecontX
61
+DecontX uses a Bayesian method to estimate and remove contamination via variational inference. 
62
+```{r, warning = FALSE, message = FALSE}
63
+decontx.model = DecontX( counts = sim_counts$observedCounts, z = sim_counts$z ) 
64
+```
65
+
66
+## Check convergence
67
+Use the log-likelihood at each iteration to check convergence. 
68
+```{r, eval = TRUE, fig.width = 5, fig.height = 5}
69
+plot( decontx.model$res.list$logLikelihood ) 
70
+```
71
+## Evaluate model performance 
72
+`DecontX` estimates a contamination proportion for each cell. We compare the estimated contamination proportion with the real contamination proportion.
73
+```{r, eval = TRUE, fig.width = 5, fig.height = 5}
74
+plot( decontx.model$res.list$est.conp, colSums(contamination) / sim_counts$N.by.C,  col=sim_counts$z) 
75
+abline( 0, 1) 
76
+```
77
+
78
+
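
The true contamination proportion that the vignette plots against `est.conp` can be worked through by hand on a tiny example. The sketch below uses invented 3-gene by 2-cell matrices in base R. The names `native`, `contam`, `observed`, and `true.conp` are illustrative and are not part of the celda API.

```r
# Toy data: 3 genes (rows) x 2 cells (columns); all counts are invented.
native <- matrix(c(8, 0, 2,    # cell1
                   1, 6, 3),   # cell2
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:2)))
contam <- matrix(c(1, 1, 0,    # cell1
                   0, 2, 3),   # cell2
                 nrow = 3, dimnames = dimnames(native))

# Observed counts are native counts plus contaminating counts.
observed <- native + contam

# Recover the contamination matrix the same way the vignette does:
# observed counts minus native counts.
contamination <- observed - native

# Total observed transcripts per cell (the role played by N.by.C).
N.by.C <- colSums(observed)

# True contamination proportion per cell.
true.conp <- colSums(contamination) / N.by.C
true.conp
```

Here cell1 has 2 contaminating counts out of 12 observed and cell2 has 5 out of 15, so the proportions are 1/6 and 1/3. In the vignette, these are the values on the y-axis that a well-fitting model should place close to the `abline(0, 1)` diagonal.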