Browse code

Updates to the vignettes in inst/doc

The main updates to the copynumber and illumina_copynumber vignettes are

- replace source("helperFunctions.R") by crlmm:::myfunction
- removed last figure in illumina_copynumber that plots CN after outlier-removal

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/branches/RELEASE_2_6/madman/Rpacks/crlmm@49114 bc3139a8-67e5-0310-9ffc-ced21a209358

Rob Scharp authored on 30/08/2010 15:45:51
Showing7 changed files

... ...
@@ -2,7 +2,7 @@ Package: crlmm
2 2
 Type: Package
3 3
 Title: Genotype Calling (CRLMM) and Copy Number Analysis tool for Affymetrix SNP 5.0 and 6.0 and Illumina arrays.
4 4
 Version: 1.6.5
5
-Date: 2010-04-16
5
+Date: 2010-08-30
6 6
 Author: Rafael A Irizarry, Benilton S Carvalho <carvalho@bclab.org>, Robert Scharpf <rscharpf@jhsph.edu>, Matt Ritchie <mritchie@wehi.edu.au>
7 7
 Maintainer: Benilton S Carvalho <carvalho@bclab.org>, Robert Scharpf <rscharpf@jhsph.edu>, Matt Ritchie <mritchie@wehi.EDU.AU>
8 8
 Description: Faster implementation of CRLMM specific to SNP 5.0 and 6.0 arrays, as well as a copy number tool specific to 5.0, 6.0, and Illumina platforms
9 9
Binary files a/inst/doc/copynumber.pdf and b/inst/doc/copynumber.pdf differ
10 10
Binary files a/inst/doc/illumina_copynumber.pdf and b/inst/doc/illumina_copynumber.pdf differ
... ...
@@ -155,40 +155,42 @@ class(calls(cnSet2))
155 155
 dims(cnSet2)
156 156
 print(object.size(cnSet), units="Mb")
157 157
 print(object.size(cnSet2), units="Mb")
158
+snp.index <- which(isSnp(cnSet))
158 159
 if(FALSE){
159 160
 	replicate(5, system.time(gt1 <- calls(cnSet)[, 1:50])[[1]])
160 161
 	open(calls(cnSet2))
161 162
 	replicate(5, system.time(gt2 <- calls(cnSet2)[, 1:50])[[1]])
162 163
 }
163
-gt1 <- calls(cnSet)[, 1:50]
164
-gt2 <- calls(cnSet2)[, 1:50]
164
+gt1 <- calls(cnSet)[snp.index, 1:50]
165
+gt2 <- calls(cnSet2)[snp.index, 1:50]
165 166
 all.equal(gt1, gt2)
166 167
 @
167 168
 
168
-\noindent With \Robject{ff} objects the additional I/O time required
169
+\noindent Note that the genotype calls and call probabilities may
170
+differ slightly as the two approaches used a different set of
171
+samples. With \Robject{ff} objects the additional I/O time required
169 172
 for extracting data is substantial and will increase for larger
170 173
 datasets.  Patience is required.
171 174
 
172
-Note that for the Affymetrix 6.0 platform the assay data elements each
173
-have a row dimension corresponding to the total number of polymorphic
174
-and nonpolymorphic markers interrogated by the Affymetrix 6.0
175
-platform.  A consequence of keeping the rows of the assay data
176
-elements the same for all of the statistical summaries is that the
177
-matrix used to store genotype calls is larger than necessary.  Also,
178
-note the additional overhead of some operations when using
179
-\Robject{ff} objects.  For instance, the posterior probabilities for
180
-the CRLMM genotype calls are represented as integers. The accessor
181
-\Robject{snpCallProbability} can be used to access these confidence
182
-scores.  When stored as matrices, converting the integer
183
-representation back to the probability scale is straightforward as
184
-shown below.  However, for the \Robject{ff} objects we must first
185
-bring the data into memory. To bring all the scores into memory, one
186
-could use the function \Rfunction{[,]} but this could be slow and
187
-require a lot of RAM depending on the size of the dataset.  Instead,
188
-we suggest pulling only the needed rows and columns from memory. In
189
-the following example, we convert the integer scores to probabilities
190
-for the CEPH samples.  As genotype confidence scores are not
191
-applicable to the nonpolymorphic markers, we extract only the
175
+For the Affymetrix 6.0 platform the assay data elements each have a
176
+row dimension corresponding to the total number of polymorphic and
177
+nonpolymorphic markers interrogated by the Affymetrix 6.0 platform.  A
178
+consequence of keeping the rows of the assay data elements the same
179
+for all of the statistical summaries is that the matrix used to store
180
+genotype calls is larger than necessary.  Also, note the additional
181
+overhead of some operations when using \Robject{ff} objects.  For
182
+instance, the posterior probabilities for the CRLMM genotype calls are
183
+represented as integers. The accessor \Robject{snpCallProbability} can
184
+be used to access these confidence scores.  When stored as matrices,
185
+converting the integer representation back to the probability scale is
186
+straightforward as shown below.  However, for the \Robject{ff} objects
187
+we must first bring the data into memory. To bring all the scores into
188
+memory, one could use the function \Rfunction{[,]} but this could be
189
+slow and require a lot of RAM depending on the size of the dataset.
190
+Instead, we suggest pulling only the needed rows and columns from
191
+memory. In the following example, we convert the integer scores to
192
+probabilities for the CEPH samples.  As genotype confidence scores are
193
+not applicable to the nonpolymorphic markers, we extract only the
192 194
 polymorphic markers using the \Rfunction{isSnp} function.  We observe
193 195
 a slight difference in the posterior probabilities.  The difference
194 196
 arises because \Robject{cnSet2} was processed with a larger number of
... ...
@@ -202,9 +204,9 @@ cols <- 1:50
202 204
 posterior.prob <- tryCatch(integerScoreToProbability(snpCallProbability),
203 205
 			    error=function(e) print("This will not work for an ff object."))
204 206
 posterior.prob1 <- integerScoreToProbability(snpCallProbability(cnSet)[rows, cols])
205
-open(snpCallProbability(cnSet2))
207
+invisible(open(snpCallProbability(cnSet2)))
206 208
 posterior.prob2 <- integerScoreToProbability(snpCallProbability(cnSet2)[rows, cols])
207
-close(snpCallProbability(cnSet2))
209
+invisible(close(snpCallProbability(cnSet2)))
208 210
 all.equal(posterior.prob1, posterior.prob2)
209 211
 @
210 212
 
211 213
Binary files a/inst/scripts/copynumber.pdf and b/inst/scripts/copynumber.pdf differ
... ...
@@ -138,7 +138,7 @@ cnAB <- get("cnAB")
138 138
 load(file.path(outdir, "crlmmResult.rda"))
139 139
 @
140 140
 
141
-<<loadIntermediate, eval=TRUE, echo=FALSE>>=
141
+<<loadIntermediate, echo=FALSE>>=
142 142
 if(!exists("res")){
143 143
 	load(file.path(outdir, "snpFile.rda"))
144 144
 	res <- get("res")
... ...
@@ -150,30 +150,13 @@ if(!exists("res")){
150 150
 
151 151
 After running the crlmm algorithm, we construct a container for
152 152
 storing the quantile normalized intensities, genotype calls, and
153
-allele-specific copy number estimates.  A few helper functions for
154
-facilitating the construction of this container have been added to the
155
-inst/scripts directory of this package and can be sourced as follows.
156
-Documentation for these helper functions will be available in the
157
-devel version of this package.
158
-
159
-<<constructContainer, eval=FALSE>>=
160
-##path <- system.file("scripts", package="crlmm")
161
-##source(file.path(path, "helperFunctions.R"))
162
-fD <- crlmm:::constructFeatureData(c(res$gns, cnAB$gns), cdfName="human370v1c")
163
-new.order <- order(fD$chromosome, fD$position)
164
-fD <- fD[new.order, ]
165
-aD <- crlmm:::constructAssayData(cnAB, res, crlmmResult, order.index=new.order)
166
-protocolData(crlmmResult)$batch <- vector("integer", ncol(crlmmResult))
167
-container <- new("CNSetLM",
168
-		 assayData=aD,
169
-		 phenoData=phenoData(crlmmResult),
170
-		 protocolData=protocolData(crlmmResult),
171
-		 featureData=fD,
172
-		 annotation="human370v1c")
173
-@
153
+allele-specific copy number estimates.  As not all of these functions
154
+have been exported in this release, please use the \verb@:::@ operator
155
+as indicated.  Documentation of these functions will be available in
156
+future versions of \Rpackage{crlmm}.
174 157
 
175
-<<container2,echo=FALSE>>=
176
-if(!file.exists(file.path(outdir, "cnSet"))){
158
+<<constructContainer>>=
159
+if(!exists(file.path(outdir, "cnSet.rda"))){
177 160
 	##path <- system.file("scripts", package="crlmm")
178 161
 	##source(file.path(path, "helperFunctions.R"))
179 162
 	fD <- crlmm:::constructFeatureData(c(res$gns, cnAB$gns), cdfName="human370v1c")
... ...
@@ -191,7 +174,6 @@ if(!file.exists(file.path(outdir, "cnSet"))){
191 174
 @
192 175
 
193 176
 
194
-
195 177
 As described in the \texttt{copynumber} vignette, two \R{} functions
196 178
 for copy number estimation are available: \Rfunction{crlmmCopynumber}
197 179
 and \Rfunction{crlmmCopynumber2}.  The latter requires that the assay
... ...
@@ -261,8 +243,8 @@ nonpolymorphic loci, the total copy number is the number of copies for
261 243
 the A allele.  The helper function \Rfunction{totalCopyNumber} can be
262 244
 used to extract the total copy number for all polymorphic and
263 245
 nonpolymorphic markers.  Documentation of the
264
-\Rfunction{totalCopyNumber} will be available in the devel version of
265
-the \Rpackage{oligoClasses}.
246
+\Rfunction{totalCopyNumber} will be available in a future version of
247
+\Rpackage{crlmm}.
266 248
 
267 249
 <<totalCopyNumber>>=
268 250
 cn.total <- ca+cb
... ...
@@ -287,6 +269,7 @@ ratios.
287 269
 hist(cnSet$SNR, breaks=15)
288 270
 @
289 271
 
272
+\begin{figure}[tbh]
290 273
 <<firstSample, fig=TRUE>>=
291 274
 low.snr <- which(cnSet$SNR == min(cnSet$SNR))
292 275
 high.snr <- which(cnSet$SNR == max(cnSet$SNR))
... ...
@@ -303,35 +286,26 @@ for(j in c(low.snr, high.snr)){
303 286
 axis(1, at=pretty(x), labels=pretty(x/1e6))
304 287
 mtext("Mb", 1, outer=TRUE, line=2)
305 288
 @
289
+\caption{Copy number plotted against physical position for a sample
290
+  with high SNR (top) and a sample with low SNR (bottom).}
291
+\end{figure}
306 292
 
307 293
 Here's a very simple approach to handle outliers by applying a running
308 294
 median using a window of size 3.  Following outlier removal, we
309 295
 suggest applying a wave correction to adjust for more global waves
310 296
 followed by a segmentation or hidden markov model.
311 297
 
312
-<<removeOutliers, fig=TRUE>>=
313
-par(mfrow=c(2,1), las=1, mar=c(0.5, 4, 0.5, 0.5), oma=c(4, 1, 1,1))
314
-for(j in c(low.snr, high.snr)){
315
-	cn <- cn.total[, j]
316
-	x <- x[!is.na(cn)]
317
-	cn <- cn[!is.na(cn)]
318
-	y <- as.numeric(runmed(cn, k=3))
319
-	plot(x,
320
-	     y,
321
-	     pch=".", ylab="copy number", xaxt="n", log="y", ylim=c(0.5,5))
322
-	legend("topright", bty="n", legend=paste("SNR =", round(cnSet$SNR[j], 1)))
323
-	abline(h=2, col="grey70")
324
-}
325
-axis(1, at=pretty(x), labels=pretty(x/1e6))
326
-mtext("Mb", 1, outer=TRUE, line=2)
298
+<<removeOutliers>>=
299
+cn <- cn[!is.na(cn)]
300
+y <- as.numeric(runmed(cn, k=3))
327 301
 @
328 302
 
329 303
 \section{Session information}
304
+
330 305
 <<sessionInfo, results=tex>>=
331 306
 toLatex(sessionInfo())
332 307
 @
333 308
 
334
-
335 309
 \end{document}
336 310
 
337 311
 
338 312
Binary files a/inst/scripts/illumina_copynumber.pdf and b/inst/scripts/illumina_copynumber.pdf differ