Browse code

Commit made by the Bioconductor Git-SVN bridge. Consists of 1 commit.

Commit information:

Commit id: 736588e3d47fd58358f61ce2241bb2c85b4d9a0c

update to vignette to process previously unevaluated codechunks

Committed by: Rob Scharpf
Author Name: Rob Scharpf
Commit date: 2014-09-19 10:14:04 -0400
Author date: 2014-09-19 10:14:04 -0400


git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@94285 bc3139a8-67e5-0310-9ffc-ced21a209358

Rob Scharp authored on 19/09/2014 14:22:04
Showing 1 changed files
... ...
@@ -200,8 +200,7 @@ Estimation of log R ratios and B allele frequencies from an object of class \Rcl
200 200
 crlmmCopynumber(cnSet, fit.linearModel=FALSE)
201 201
 @
202 202
 
203
-<<oligoSnpSet,eval=FALSE>>=
204
-library(VanillaICE)
203
+<<oligoSnpSet>>=
205 204
 open(cnSet)
206 205
 oligoSetList <- BafLrrSetList(cnSet)
207 206
 close(cnSet)
... ...
@@ -214,7 +213,7 @@ oligoSetList[[1]]
214 213
 \noindent Log R ratios and B allele frequences can be retrieved by the
215 214
 accessors \Rfunction{lrr} and \Rfunction{baf}, respectively.
216 215
 
217
-<<testEqual,eval=FALSE>>=
216
+<<testEqual>>=
218 217
 lrrList <- lrr(oligoSetList)
219 218
 class(lrrList)
220 219
 dim(lrrList[[1]]) ## log R ratios for chromosome 1.
Browse code

Commit made by the Bioconductor Git-SVN bridge. Consists of 1 commit.

Commit information:

Commit id: 7a7b52509b1e09bf8d3f517244f90f9045d102ca
Commit message: Set enough.men and enough.women to FALSE by default in fit.lm3 function.
Committed by: Rob Scharpf
Author Name: Rob Scharpf
Commit date: 2014-08-25 21:44:19 -0400
Author date: 2014-08-25 21:44:19 -0400



git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@93599 bc3139a8-67e5-0310-9ffc-ced21a209358

Rob Scharp authored on 26/08/2014 01:45:03
Showing 1 changed files
... ...
@@ -225,4 +225,3 @@ dim(bafList[[1]]) ## B allele frequencies for chromosome 1
225 225
 If the \Rfunction{crlmmCopynumber} function is not run prior to the
226 226
 \Rfunction{BafLrrSetList} construction, the log R ratios and BAFs will
227 227
 be initialized as NAs.
228
-
Browse code

merging from collab

* collab:
add warning in vignette about NAs with BafLrrSetList function
Added Human Omni Express Exome 8 v1.1b as a supported chip
updated version number of pacakge and man pages to reflect these changes
skeleton for krlmm capability added. genotype.Illumina() can now take and XY object as input
update copynumber.Rnw to use BafLrrSetList
updates to vignettes
update namespace

# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@79138 bc3139a8-67e5-0310-9ffc-ced21a209358

Rob Scharp authored on 31/07/2013 01:37:34
Showing 1 changed files
... ...
@@ -53,15 +53,24 @@ print(histogram(~snr,
53 53
 
54 54
 \section{Copy number estimation}
55 55
 
56
-As described in \cite{Scharpf2011}, the CRLMM-CopyNumber algorithm
57
-fits a linear model to the normalized intensities stratified by the
58
-diallic genotype call.  The intercept and slope from the linear model
59
-are both SNP- and batch-specific.  The implementation in the \crlmm{}
60
-package is encapsulated by the function \Rfunction{crlmmCopynumber}
61
-that, using the default settings, can be called by passing a single
62
-object of class \Rclass{CNSet}.  See the appropriate
63
-preprocessing/genotyping vignette for the construction of an object of
64
-class \Rclass{CNSet}.
56
+There are two ways to obtain marker-level estimates of copy number
57
+that are supported by \Rpackage{crlmm}. One approach is to fit a
58
+linear model to the normalized intensities stratified by the diallelic
59
+genotype call at each SNP, as described in \cite{Scharpf2011}.
60
+Another alternative is to compute log R ratio and B allele
61
+frequencies. The latter is often better supported by downstream hidden
62
+Markov models such as those in the \Rpackage{VanillaICE} package. We
63
+describe each approach in the following two sections.
64
+
65
+\subsection{Linear model for normalized intensities}
66
+
67
+In this section, we fit a linear model to the normalized intensities
68
+stratified by the diallic genotype call.  The intercept and slope from
69
+the linear model are both SNP- and batch-specific.  The implementation
70
+in the \crlmm{} package is encapsulated by the function
71
+\Rfunction{crlmmCopynumber} that, using the default settings, can be
72
+called by passing a single object of class \Rclass{CNSet}.
73
+
65 74
 <<LDS_copynumber,cache=TRUE>>=
66 75
 crlmmCopynumber(cnSet)
67 76
 @
... ...
@@ -105,12 +114,7 @@ described in the \R{} package \Rpackage{ff}. The value returned by
105 114
 \Rfunction{crlmmCopynumber} is \verb+TRUE+, indicating that the files
106 115
 on disk have been successfully updated.  Note that while the
107 116
 \Robject{cnSet} object is unchanged, the values on disk are different.
108
-On the other hand, subsetting the \Robject{cnSet} with the `[' method
109
-coerces all of the elements to class \Rclass{matrix}. The
110
-batch-specific summaries are now ordinary matrices stored in RAM. The
111
-object returned by \Robject{crlmmCopynumber} is an object of class
112
-\Rclass{CNSet} with the matrices in the \verb+batchStatistics+ slot
113
-updated.
117
+
114 118
 
115 119
 <<chr1index>>=
116 120
 chr1.index <- which(chromosome(cnSet) == 1)
... ...
@@ -127,20 +131,10 @@ for(i in seq_along(nms)) cls[i] <- class(batchStatistics(cnSet2)[[nms[i]]])[1]
127 131
 all(cls == "matrix")
128 132
 @
129 133
 
130
-<<matrix_copynumber,cache=TRUE>>=
131
-cnSet3 <- crlmmCopynumber(cnSet2)
132
-class(cnSet3)
133
-@
134
-
135 134
 <<clean, echo=FALSE, results=hide>>=
136 135
 rm(cnSet2); gc()
137 136
 @
138 137
 
139
-\subsection{Marker-specific estimates}
140
-%\subsection{Raw copy number}
141
-
142
-%\paragraph{Log R ratios and B allele frequencies.}
143
-
144 138
 \paragraph{Raw total copy number.}
145 139
 Several functions are available that will compute relatively quickly
146 140
 the allele-specific, \emph{raw} copy number estimates. At allele $k$,
... ...
@@ -186,38 +180,49 @@ cb <- CB(cnSet, i=snp.index, j=1:2)
186 180
 
187 181
 
188 182
 \subsection{Container for log R ratios and B allele frequencies}
183
+\label{sec:lrrBaf}
189 184
 
190 185
 A useful container for storing the \crlmm{} genotypes, genotype
191 186
 confidence scores, and the total or relative copy number at each
192
-marker is the \Rclass{oligoSetList} class.  Coercion of a
193
-\Rclass{CNSet} object to a \Rfunction{oligoSnpSet} object can be
194
-acheived by using the function \Rfunction{constructOligoSetFrom} as
187
+marker is the \Rclass{BafLrrSetList} class.  Coercion of a
188
+\Rclass{CNSet} object to a \Rfunction{BafLrrSetList} object can be
189
+acheived by the function \Rfunction{BafLrrSetList} as
195 190
 illustrated below. Users should note that if the \verb+assayData+
196 191
 elements in the \Rclass{CNSet} instance are \Rpackage{ff} objects, the
197
-\verb+assayData+ elements of each element in the \Rclass{oligoSetList}
192
+\verb+assayData+ elements of each element in the \Rclass{BafLrrSetList}
198 193
 object will be \Rpackage{ff}-dervied objects (a new
199 194
 \verb+total_cn*.ff+ file will be created in the \verb+ldPath()+
200
-directory).
195
+directory). The following code-chunk is not evalutated.
196
+
197
+Estimation of log R ratios and B allele frequencies from an object of class \Rclass{CNSet} instantiated from genotyping Affymetrix CEL files or Illumina iDat files requires running the \R{} function \Rfunction{crlmmCopynumber} as a preliminary step.  Specifying \texttt{fit.linearModel=FALSE} will speed up computation as this function will by default compute summary statistics unnecessary for computing BAFs and log R ratios.
201 198
 
202
-<<oligoSnpSet>>=
199
+<<callingCrlmmCopynumber,eval=FALSE>>=
200
+crlmmCopynumber(cnSet, fit.linearModel=FALSE)
201
+@
202
+
203
+<<oligoSnpSet,eval=FALSE>>=
203 204
 library(VanillaICE)
204
-open(cnSet3)
205
-oligoSetList <- constructOligoSetListFrom(cnSet3, batch.name=batch(cnSet3)[1])
206
-close(cnSet3)
205
+open(cnSet)
206
+oligoSetList <- BafLrrSetList(cnSet)
207
+close(cnSet)
207 208
 show(oligoSetList)
208 209
 class(oligoSetList)
209 210
 ## oligoSnpSet of first chromosome
210 211
 oligoSetList[[1]]
211 212
 @
212 213
 
213
-\noindent Note that log R ratios stored in the \Robject{oligoSnpSet}
214
-object can be retrieved by the \Rfunction{copyNumber} accessor.  B
215
-allele frequences are retrieved by the \Rfunction{baf} accessor.
214
+\noindent Log R ratios and B allele frequences can be retrieved by the
215
+accessors \Rfunction{lrr} and \Rfunction{baf}, respectively.
216 216
 
217
-<<testEqual>>=
218
-lrrList <- copyNumber(oligoSetList)
217
+<<testEqual,eval=FALSE>>=
218
+lrrList <- lrr(oligoSetList)
219 219
 class(lrrList)
220 220
 dim(lrrList[[1]]) ## log R ratios for chromosome 1.
221 221
 bafList <- baf(oligoSetList)
222 222
 dim(bafList[[1]]) ## B allele frequencies for chromosome 1
223 223
 @
224
+
225
+If the \Rfunction{crlmmCopynumber} function is not run prior to the
226
+\Rfunction{BafLrrSetList} construction, the log R ratios and BAFs will
227
+be initialized as NAs.
228
+
Browse code

merge with collab branch containing fix for dqrls and bug-fix for computeRBaf that can misalign sample index with batch index (when batch is not in alphabetical order)

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@71295 bc3139a8-67e5-0310-9ffc-ced21a209358

Rob Scharp authored on 17/11/2012 15:24:46
Showing 1 changed files
... ...
@@ -2,7 +2,7 @@
2 2
   Affymetrix 5.0 and 6.0 platforms, as well as several Illumina
3 3
   platforms.  This vignette assumes that the arrays have already been
4 4
   successfully preprocessed and genotyped as per the instructions in
5
-  the \verb+AffymetrixPreprocessCN+ and \verb+IlluminaPreprocessCN+
5
+  the \verb+AffyGW+ and \verb+IlluminaPreprocessCN+
6 6
   vignettes for the Affymetrix and Illumina platforms,
7 7
   respectively. While this vignette uses Affymetrix 6.0 arrays for
8 8
   illustration, the steps at this point are identical for both
Browse code

Merge branch 'collab'

* collab:
Update AffyGW.pdf and IlluminaPreprocessCN.pdf
crlmmIlluminaV2 calls crlmmGT. comment call to crlmmGT2
Update AffyGW vignette to step through construction of CNSet, normalization, and genotyping
v 1.15.27: Fix bug in crlmmGT2. clean up comments in crlmmIllumina.
update getFeatureData

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@69561 bc3139a8-67e5-0310-9ffc-ced21a209358

Rob Scharp authored on 19/09/2012 12:31:51
Showing 1 changed files
... ...
@@ -63,7 +63,7 @@ object of class \Rclass{CNSet}.  See the appropriate
63 63
 preprocessing/genotyping vignette for the construction of an object of
64 64
 class \Rclass{CNSet}.
65 65
 <<LDS_copynumber,cache=TRUE>>=
66
-(cnSet.updated <- crlmmCopynumber(cnSet))
66
+crlmmCopynumber(cnSet)
67 67
 @
68 68
 
69 69
 The following steps were performed by the \Rfunction{crlmmCopynumber}
Browse code

Merge branch 'collab'

* collab: (34 commits)
revert change to IlluminaPreprocessCN
fix bug in isValidCdfName
print warning when all features in a batch of probes are flagged, but allow processing to continue
add utility cleancdfnames
Add validCdfNames.Rd
export validCdfNames
imputeGender fix when chromosome Y not available
Use splitIndicesByLength(index, ocSamples/getDoParWorkers())
Can not allocate vector of size XG with genotype.Illumina. Use splitIndicesByNode() only if the length of the list is greater than the split from splitIndicesByLength(). Otherwise, split by length using ocSamples()
update .gitignore
Add make.unique for sampleSheet$Sample_ID in readIdatFiles
bug in description
ensure sample ids stored in samplesheet are unique when constructing cnSet object
update oligoClasses dependency
update unit test for genotype.Illumina
revert change in constructInf call from genotype.Illumina
Update genotype.Rd
edit ACN function
1.15.6 use make.unique(basename(arrayNames)) to allow processing of Illumina samples with duplicated barcodes
check that sample identifies are unique in crlmm function
...

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@67435 bc3139a8-67e5-0310-9ffc-ced21a209358

Rob Scharp authored on 08/07/2012 19:00:03
Showing 1 changed files
... ...
@@ -154,7 +154,7 @@ formally, \newcommand{\A}{A} \newcommand{\B}{B}
154 154
     \phi}_{k,ip}}\left(I_{k,ijp}-{\hat \nu}_{k,ip}\right), ~0\right\}
155 155
 \mbox{~for~} k \in \{\A, \B\}.
156 156
 \end{eqnarray}
157
-\noindent See \cite{Scharpf2010} for details.
157
+\noindent See \cite{Scharpf2011} for details.
158 158
 
159 159
 The function \Rfunction{totalCopynumber} translates the normalized
160 160
 intensities to an estimate of raw copy number by adding the
... ...
@@ -175,42 +175,49 @@ dim(tmp2)
175 175
 
176 176
 Alternatively, the functions \Rfunction{CA} and \Rfunction{CB} compute
177 177
 the allele-specific copy number.  For instance, the following code
178
-chunk computes the allele-specific summaries at all polymorphic loci.
178
+chunk computes the allele-specific summaries at all polymorphic loci
179
+for the first 2 samples.
179 180
 
180 181
 <<ca>>=
181 182
 snp.index <- which(isSnp(cnSet) & !is.na(chromosome(cnSet)))
182
-ca <- CA(cnSet, i=snp.index, j=seq_len(ncol(cnSet)))
183
-cb <- CB(cnSet, i=snp.index, j=seq_len(ncol(cnSet)))
183
+ca <- CA(cnSet, i=snp.index, j=1:2)
184
+cb <- CB(cnSet, i=snp.index, j=1:2)
184 185
 @
185 186
 
186 187
 
187
-\subsection{Container for raw copy number}
188
+\subsection{Container for log R ratios and B allele frequencies}
188 189
 
189 190
 A useful container for storing the \crlmm{} genotypes, genotype
190
-confidence scores, and the total copy number at each marker is the
191
-\Rclass{oligoSnpSet} class.  Coercion of a \Rclass{CNSet} object to a
192
-\Rfunction{oligoSnpSet} object can be acheived by using the method
193
-\Rfunction{as} (as illustrated below). Users should note that if the
194
-\verb+assayData+ elements in the \Rclass{CNSet} instance are
195
-\Rpackage{ff} objects, the \verb+assayData+ elements of the
196
-instantiated \Rclass{oligoSnpSet} will also be \Rpackage{ff}-dervied
197
-objects (a new \verb+total_cn*.ff+ file will be created in the
198
-\verb+ldPath()+ directory).
191
+confidence scores, and the total or relative copy number at each
192
+marker is the \Rclass{oligoSetList} class.  Coercion of a
193
+\Rclass{CNSet} object to a \Rfunction{oligoSnpSet} object can be
194
+acheived by using the function \Rfunction{constructOligoSetFrom} as
195
+illustrated below. Users should note that if the \verb+assayData+
196
+elements in the \Rclass{CNSet} instance are \Rpackage{ff} objects, the
197
+\verb+assayData+ elements of each element in the \Rclass{oligoSetList}
198
+object will be \Rpackage{ff}-dervied objects (a new
199
+\verb+total_cn*.ff+ file will be created in the \verb+ldPath()+
200
+directory).
199 201
 
200 202
 <<oligoSnpSet>>=
203
+library(VanillaICE)
201 204
 open(cnSet3)
202
-oligoSet <- as(cnSet3, "oligoSnpSet")
205
+oligoSetList <- constructOligoSetListFrom(cnSet3, batch.name=batch(cnSet3)[1])
203 206
 close(cnSet3)
204
-class(copyNumber(oligoSet))
207
+show(oligoSetList)
208
+class(oligoSetList)
209
+## oligoSnpSet of first chromosome
210
+oligoSetList[[1]]
205 211
 @
206 212
 
207
-\noindent Note that the raw copy number estimates stored in the
208
-\Robject{oligoSnpSet} object can be retrieved by the
209
-\Rfunction{copyNumber} accessor and is equivalent to that returned by
210
-the \Rfunction{totalCopynumber} function defined over the same row and
211
-column indices.
213
+\noindent Note that log R ratios stored in the \Robject{oligoSnpSet}
214
+object can be retrieved by the \Rfunction{copyNumber} accessor.  B
215
+allele frequences are retrieved by the \Rfunction{baf} accessor.
212 216
 
213 217
 <<testEqual>>=
214
-total.cn3 <- totalCopynumber(cnSet3, i=1:nrow(cnSet3), j=seq_len(ncol(cnSet3)))
215
-all.equal(copyNumber(oligoSet), total.cn3)
216
-@
217 218
\ No newline at end of file
219
+lrrList <- copyNumber(oligoSetList)
220
+class(lrrList)
221
+dim(lrrList[[1]]) ## log R ratios for chromosome 1.
222
+bafList <- baf(oligoSetList)
223
+dim(bafList[[1]]) ## B allele frequencies for chromosome 1
224
+@
Browse code

Merge branch 'mymac'

* mymac:
add AffyGW.pdf
update vignettes in inst/scripts
Change argument of validCEL to celfiles
Update constructInf to accommodate GenomeDataFrame class for featureData
bump version to 1.13.7
Add doRUnit.R
Add celfile-utils.Rd
Streamlne some of the Rd files
add validCEL function that checks whether all celfiles can be read
getFeatureData returns GenomeAnnotatedDataFrame
Remove imports from methods. Remove pdf of illumina_copynumber.pdf (large file) and copynumber.pdf
getFeatureDAta returns GenomeAnnotatedDataFrame
Remove separate vignette for copy number in inst/scripts. Include copynumber section in both affy and illumina pipelines.
update documentation files for genotype.Illumina, preprocessInf, and genotypeInf (cdfName added as argument. Indicate that 'batch' should be a character string)
pass cdfName to genotypeInf and preprocessInf
add unitTests and cn-functions for 'simple usage'
Combine AffyPreprocess and copynumber. Combine IlluminaPreprocess and copynumber
remove depency on ff to allow installation on my mac

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@62108 bc3139a8-67e5-0310-9ffc-ced21a209358

Rob Scharp authored on 17/01/2012 19:13:44
Showing 1 changed files
... ...
@@ -1,35 +1,3 @@
1
-%\VignetteIndexEntry{crlmm copy number Vignette}
2
-%\VignetteDepends{crlmm, genomewidesnp6Crlmm}
3
-%\VignetteKeywords{crlmm, SNP 6}
4
-%\VignettePackage{crlmm}
5
-\documentclass{article}
6
-\usepackage{graphicx}
7
-\usepackage{natbib}
8
-\usepackage{url}
9
-\usepackage{amsmath,amssymb}
10
-\newcommand{\Rfunction}[1]{{\texttt{#1}}}
11
-\newcommand{\Rmethod}[1]{{\texttt{#1}}}
12
-\newcommand{\Rcode}[1]{{\texttt{#1}}}
13
-\newcommand{\Robject}[1]{{\texttt{#1}}}
14
-\newcommand{\Rpackage}[1]{{\textsf{#1}}}
15
-\newcommand{\Rclass}[1]{{\textit{#1}}}
16
-\newcommand{\oligo}{\Rpackage{oligo }}
17
-\newcommand{\R}{\textsf{R}}
18
-\newcommand{\crlmm}{\Rpackage{crlmm}}
19
-\usepackage[margin=1in]{geometry}
20
-
21
-\begin{document}
22
-\title{Copy number estimation with \Rpackage{crlmm}}
23
-\date{\today}
24
-\author{Rob Scharpf}
25
-\maketitle
26
-
27
-<<setup, echo=FALSE, results=hide>>=
28
-options(continue=" ", width=70)
29
-@
30
-
31
-\begin{abstract}
32
-
33 1
   Copy number routines in the \crlmm{} package are available for
34 2
   Affymetrix 5.0 and 6.0 platforms, as well as several Illumina
35 3
   platforms.  This vignette assumes that the arrays have already been
... ...
@@ -44,35 +12,6 @@ options(continue=" ", width=70)
44 12
   \crlmm{} package is available from the author's website:
45 13
   \url{http://www.biostat.jhsph.edu/~rscharpf/crlmmCompendium/index.html}.
46 14
 
47
-\end{abstract}
48
-
49
-\section{Set up}
50
-
51
-<<libraries,results=hide>>=
52
-library(ff)
53
-library(crlmm)
54
-library(lattice)
55
-library(cacheSweave)
56
-if(getRversion() < "2.13.0"){
57
-	rpath <- getRversion()
58
-} else rpath <- "trunk"
59
-outdir <- paste("/thumper/ctsa/snpmicroarray/rs/ProcessedData/crlmm/", rpath, "/copynumber_vignette", sep="")
60
-@
61
-
62
-<<ram>>=
63
-ldPath(outdir)
64
-setCacheDir(outdir)
65
-ocProbesets(150e3)
66
-ocSamples(200)
67
-@
68
-
69
-We begin by loading the \Robject{cnSet} object created by the
70
-\verb+AffymetrixPreprocessCN+ vignette.
71
-
72
-<<loadcnset>>=
73
-if(!exists("cnSet")) load(file.path(outdir, "cnSet.rda"))
74
-@
75
-
76 15
 
77 16
 \textbf{Limitations:} While a minimum number of samples is not
78 17
 required for preprocessing and genotyping, copy number estimation in
... ...
@@ -100,24 +39,16 @@ quality.  The following code chunk makes a histogram of the SNR values
100 39
 for the HapMap samples.
101 40
 
102 41
 <<snr,fig=TRUE,include=FALSE,width=6, height=4>>=
42
+library(lattice)
103 43
 invisible(open(cnSet$SNR))
104 44
 snr <- cnSet$SNR[]
105 45
 close(cnSet$SNR)
106
-print(histogram(~snr, panel=function(...){
107
-	panel.histogram(...)
108
-	panel.abline(v=5, col="grey",lty=2)
109
-},
110
-		breaks=25,xlim=c(4.5,10), xlab="SNR"))
46
+print(histogram(~snr,
47
+		panel=function(...){
48
+			panel.histogram(...)},
49
+		breaks=25, xlim=range(snr), xlab="SNR"))
111 50
 @
112
-\begin{figure}[t!]
113
-  \begin{center}
114
-  \includegraphics[width=0.6\textwidth]{copynumber-snr.pdf}
115
-  \caption{The signal to noise ratio (SNR) for 180 HapMap samples. For
116
-    Affymetrix platforms, SNR values below 5 can indicate possible
117
-    problems with sample quality.  In some circumstances, it may be
118
-    more helpful to exclude samples with poor DNA quality.}
119
-\end{center}
120
-\end{figure}
51
+
121 52
 
122 53
 
123 54
 \section{Copy number estimation}
... ...
@@ -149,29 +80,17 @@ computed for subsets of the markers to reduce the required RAM. Note
149 80
 that the value returned by the \Rfunction{crlmmCopynumber} function in
150 81
 the above example is \verb+TRUE+.  The reason the function returns
151 82
 \verb+TRUE+ in the above example is that the elements of the
152
-\verb+batchStatistics+ slot have the class \Rclass{ff\_matrix}.  Rather
153
-than keep the statistical summaries in memory, the summaries are
154
-written to files on disk using protocols described in the
83
+\verb+batchStatistics+ slot have the class \Rclass{ff\_matrix}.
84
+Rather than keep the statistical summaries in memory, the summaries
85
+are written to files on disk using protocols described in the
155 86
 \Rpackage{ff} package. Hence, while the \Robject{cnSet} object itself
156 87
 is unchanged as a result of the \Rfunction{crlmmCopynumber} function,
157 88
 the data on disk is updated accordingly.  Users that are interested in
158 89
 accessing these low-level summaries can refer to the
159
-\verb+Infrastructure+ vignette.  Computation of the raw copy number
160
-estimates for each allele is described in the following section.
161
-
162
-For users that are interested in the analysis of a specific chromosome
163
-(subset of markers) or a s
164
-
165
-pointers to files on disk, are stored in the \verb+batchStatistics+
166
-slot of the class \Rclass{CNSet}.  Using the default settings for the
167
-\Rfunction{crlmmCopynumber} function, only an object of class
168
-\Rclass{CNSet} is required.  %(See \verb+AffyPreprocessCN+ or
169
-%\verb+IlluminaPreprocessCN+ vignettes for details.)
170
-
171
-Note that depends on whether the elements of the \verb+batchStatistics+
172
-slot are \Robject{ff} objects or ordinary matrices.  In this example,
173
-the elements of \verb+batchStatistics+ have the class
174
-\Rclass{ff\_matrix}.
90
+\verb+Infrastructure+ vignette.  Note that the data structure depends
91
+on whether the elements of the \verb+batchStatistics+ slot are
92
+\Robject{ff} objects or ordinary matrices.  In this example, the
93
+elements of \verb+batchStatistics+ have the class \Rclass{ff\_matrix}.
175 94
 
176 95
 <<classes>>=
177 96
 nms <- ls(batchStatistics(cnSet))
... ...
@@ -186,10 +105,8 @@ described in the \R{} package \Rpackage{ff}. The value returned by
186 105
 \Rfunction{crlmmCopynumber} is \verb+TRUE+, indicating that the files
187 106
 on disk have been successfully updated.  Note that while the
188 107
 \Robject{cnSet} object is unchanged, the values on disk are different.
189
-
190
-
191
-\noindent On the other hand, subsetting the \Robject{cnSet} with the
192
-`[' method coerces all of the elements to class \Rclass{matrix}. The
108
+On the other hand, subsetting the \Robject{cnSet} with the `[' method
109
+coerces all of the elements to class \Rclass{matrix}. The
193 110
 batch-specific summaries are now ordinary matrices stored in RAM. The
194 111
 object returned by \Robject{crlmmCopynumber} is an object of class
195 112
 \Rclass{CNSet} with the matrices in the \verb+batchStatistics+ slot
... ...
@@ -219,14 +136,18 @@ class(cnSet3)
219 136
 rm(cnSet2); gc()
220 137
 @
221 138
 
222
-\subsection{Raw copy number}
139
+\subsection{Marker-specific estimates}
140
+%\subsection{Raw copy number}
141
+
142
+%\paragraph{Log R ratios and B allele frequencies.}
223 143
 
144
+\paragraph{Raw total copy number.}
224 145
 Several functions are available that will compute relatively quickly
225 146
 the allele-specific, \emph{raw} copy number estimates. At allele $k$,
226 147
 marker $i$, sample $j$, and batch $p$, the estimate of allele-specific
227 148
 copy number is computed by subtracting the estimated background from
228
-the normalized intensity and scaling by the slope coefficient. More formally,
229
-\newcommand{\A}{A} \newcommand{\B}{B}
149
+the normalized intensity and scaling by the slope coefficient. More
150
+formally, \newcommand{\A}{A} \newcommand{\B}{B}
230 151
 \begin{eqnarray}
231 152
   \label{eq:cnK}
232 153
 {\hat c}_{k,ijp} &=& \mbox{max}\left\{\frac{1}{{\hat
... ...
@@ -246,9 +167,9 @@ at all markers for the first 2 samples, and the total copy number for
246 167
 chromosome 20 for the first 50 samples.
247 168
 
248 169
 <<totalCopynumber>>=
249
-tmp <- totalCopynumber(cnSet, i=1:nrow(cnSet), j=1:2)
170
+tmp <- totalCopynumber(cnSet, i=seq_len(nrow(cnSet)), j=1:2)
250 171
 dim(tmp)
251
-tmp2 <- totalCopynumber(cnSet, i=which(chromosome(cnSet) == 20), j=1:50)
172
+tmp2 <- totalCopynumber(cnSet, i=which(chromosome(cnSet) == 20), j=seq_len(ncol(cnSet)))
252 173
 dim(tmp2)
253 174
 @
254 175
 
... ...
@@ -258,98 +179,12 @@ chunk computes the allele-specific summaries at all polymorphic loci.
258 179
 
259 180
 <<ca>>=
260 181
 snp.index <- which(isSnp(cnSet) & !is.na(chromosome(cnSet)))
261
-ca <- CA(cnSet, i=snp.index, j=1:5)
262
-cb <- CB(cnSet, i=snp.index, j=1:5)
263
-@
264
-
265
-\noindent Note the equivalence of the following calculations.
266
-
267
-<<totalCopynumber>>=
268
-ct <- ca+cb
269
-ct2 <- totalCopynumber(cnSet, i=snp.index, j=1:5)
270
-stopifnot(all.equal(ct, ct2))
271
-@
272
-
273
-At nonpolymorphic loci, \Rfunction{CA} function returns the total copy
274
-number and, by construction, the \Rfunction{CB} function returns 0.
275
-
276
-<<nonpolymorphicAutosomes>>=
277
-marker.index <- which(!isSnp(cnSet))
278
-ct <- CA(cnSet, i=marker.index, j=1:5)
279
-## all zeros
280
-stopifnot(all(CB(cnSet, i=marker.index, j=1:5) == 0))
281
-ct2 <- totalCopynumber(cnSet, i=marker.index, j=1:5)
282
-stopifnot(all.equal(ct, ct2))
283
-@
284
-
285
-In the following code chunk, we extract estimates of the total copy
286
-number at nonpolymorphic markers on chromosome X.
287
-
288
-<<npx>>=
289
-set.seed(123)
290
-npx.index <- which(chromosome(cnSet)==23 & !isSnp(cnSet))
291
-M <- sample(which(cnSet$gender[]==1), 5)
292
-F <- sample(which(cnSet$gender[]==2), 5)
293
-cn.M <- CA(cnSet, i=npx.index, j=M)
294
-cn.F <- CA(cnSet, i=npx.index, j=F)
295
-@
296
-
297
-\noindent Again, the function \Rfunction{totalCopynumber} is
298
-equivalent.
299
-
300
-<<npx2>>=
301
-cnX <- cbind(cn.M, cn.F)
302
-cnX2 <- totalCopynumber(cnSet, i=npx.index, j=c(M, F))
303
-stopifnot(all.equal(cnX, cnX2))
182
+ca <- CA(cnSet, i=snp.index, j=seq_len(ncol(cnSet)))
183
+cb <- CB(cnSet, i=snp.index, j=seq_len(ncol(cnSet)))
304 184
 @
305 185
 
306
-<<nonpolymorphicX,fig=TRUE,include=FALSE, echo=FALSE, eval=FALSE>>=
307
-df <- data.frame(cn=as.numeric(cnX), id=factor(rep(sampleNames(cnSet)[c(M,F)], each=nrow(cnX)), levels=sampleNames(cnSet)[c(M,F)], ordered=T))
308
-library(RColorBrewer)
309
-cols <- brewer.pal(8, "Accent")[c(1, 5)]
310
-print(bwplot(cn~id,df, panel=function(x,y,...){
311
-	panel.grid(v=-10,h=0)
312
-	panel.bwplot(x,y, ...)
313
-	panel.abline(h=1:2, col="grey70",lwd=2)
314
-},
315
-	scales=list(x=list(rot=90)), cex=0.5, ylab="total copy number", main="Chr X SNPs",
316
-       fill=cols[cnSet$gender[c(M,F)]], key=mykey))
317
-@
318
-
319
-%\begin{figure}[t!]
320
-% \centering
321
-% \includegraphics[width=0.5\textwidth]{copynumber-nonpolymorphicX}
322
-% \caption{Copy number estimates for nonpolymorphic loci on chromosome
323
-%   X (5 men, 5 women).  We assume that the median copy number across
324
-%   samples at a given marker on X is 1 for men and 2 for women.}
325
-%\end{figure}
326
-
327
-Polymorphic markers on chromosome X:
328 186
 
329
-<<polymorphicX,fig=TRUE,include=FALSE>>=
330
-library(RColorBrewer)
331
-cols <- brewer.pal(8, "Accent")[c(1, 5)]
332
-X.markers <- which(isSnp(cnSet) & chromosome(cnSet) == 23)
333
-cnX <- totalCopynumber(cnSet, i=X.markers, j=c(M,F))
334
-df <- data.frame(cn=as.numeric(cnX), id=factor(rep(sampleNames(cnSet)[c(M,F)], each=length(X.markers)), levels=sampleNames(cnSet)[c(M,F)], ordered=T))
335
-mykey <- simpleKey(c("male", "female"), points=FALSE,col=cols)
336
-print(bwplot(cn~id,df, panel=function(x,y,...){
337
-	panel.grid(v=-10,h=0)
338
-	panel.bwplot(x,y, ...)
339
-	panel.abline(h=1:2, col="grey70",lwd=2)
340
-},
341
-	scales=list(x=list(rot=90)), cex=0.5, ylab="total copy number", main="Chr X SNPs",
342
-       fill=cols[cnSet$gender[c(M,F)]], key=mykey))
343
-@
344
-\begin{figure}[t!]
345
- \centering
346
- \includegraphics[width=0.5\textwidth]{copynumber-polymorphicX}
347
- \caption{Copy number estimates for polymorphic markers on chromosome
348
-   X. crlmm assumes that the median copy number across samples at a
349
-   given marker on X is 1 for men and 2 for women.}
350
-\end{figure}
351
-
352
-\subsection{A container for raw copy number}
187
+\subsection{Container for raw copy number}
353 188
 
354 189
 A useful container for storing the \crlmm{} genotypes, genotype
355 190
 confidence scores, and the total copy number at each marker is the
... ...
@@ -362,28 +197,6 @@ instantiated \Rclass{oligoSnpSet} will also be \Rpackage{ff}-dervied
362 197
 objects (a new \verb+total_cn*.ff+ file will be created in the
363 198
 \verb+ldPath()+ directory).
364 199
 
365
-%The advantage of this approach is that the raw copy number estimates
366
-%can be computed for all markers and all samples simultaneously without
367
-%excessive RAM.  The disadvantage is that such a step creates
368
-%additional I/O for reading and writing the raw copy number estimates
369
-%from disk.  For very large data sets (1000+ samples), the time
370
-%required for I/O may outweigh the benefits.  The alternative is to
371
-%perform downstream processing (such as smoothing) on the total copy
372
-%number for subsets of markers and/or samples.  As indicated
373
-%previously, the subset `[' method applied to an object of class
374
-%\Robject{CNSet} will return an object of the same class with simple
375
-%matrices instead of \Rpackage{ff} objects.  Therefore, one can
376
-%instantiate a \Rpackage{oligoSnpSet} object from a \Robject{CNSet}
377
-%object without the I/O overhead for storing the total copy number
378
-%estimates by first subsetting the \Rclass{CNSet} object.  One could
379
-%then smooth the raw copy number estimates and store a summary copy
380
-%number for the interval, thereby substantially reducing the
381
-%dimensionality of the copy number estimates.  Examples of smoothing
382
-%applications such as circular binary segmentation implemented in the
383
-%\Rpackage{DNAcopy} package or a hidden Markov model implemented in the
384
-%\Rpackage{VanillaICE} package are provided in the \verb+SmoothingCN+
385
-%vignette.
386
-
387 200
 <<oligoSnpSet>>=
388 201
 open(cnSet3)
389 202
 oligoSet <- as(cnSet3, "oligoSnpSet")
... ...
@@ -398,19 +211,6 @@ the \Rfunction{totalCopynumber} function defined over the same row and
398 211
 column indices.
399 212
 
400 213
 <<testEqual>>=
401
-total.cn3 <- totalCopynumber(cnSet3, i=1:nrow(cnSet3), j=1:ncol(cnSet3))
214
+total.cn3 <- totalCopynumber(cnSet3, i=1:nrow(cnSet3), j=seq_len(ncol(cnSet3)))
402 215
 all.equal(copyNumber(oligoSet), total.cn3)
403
-@
404
-
405
-
406
-\section{Session information}
407
-<<sessionInfo, results=tex>>=
408
-toLatex(sessionInfo())
409
-@
410
-
411
-%\begin{bibliography}
412
-  \bibliographystyle{plain}
413
-  \bibliography{refs}
414
-%\end{bibliography}
415
-
416
-\end{document}
216
+@
417 217
\ No newline at end of file
Browse code

Update copy number vignettes. Pass cdfName to cnrmaAffy function.

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@54327 bc3139a8-67e5-0310-9ffc-ced21a209358

Rob Scharp authored on 31/03/2011 16:52:11
Showing 1 changed files
... ...
@@ -38,7 +38,7 @@ options(continue=" ", width=70)
38 38
   vignettes for the Affymetrix and Illumina platforms,
39 39
   respectively. While this vignette uses Affymetrix 6.0 arrays for
40 40
   illustration, the steps at this point are identical for both
41
-  platforms.  See \citep{Scharpf2010} for details regarding the
41
+  platforms.  See \citep{Scharpf2011} for details regarding the
42 42
   methodology implemented in \crlmm{} for copy number analysis.  In
43 43
   addition, a compendium describing copy number analysis using the
44 44
   \crlmm{} package is available from the author's website:
... ...
@@ -80,7 +80,9 @@ the \crlmm{} package currently requires at least 10 samples per batch.
80 80
 The parameter estimates for copy number and the corresponding
81 81
 estimates of raw copy number will tend to be more noisy for batches
82 82
 with small sample sizes (e.g., $<$ 50).  Chemistry plate or scan date
83
-are often useful surrogates for batch.
83
+are often useful surrogates for batch.  Samples that were processed at
84
+similar times (e.g., in the same month) can be grouped together in the
85
+same batch.
84 86
 
85 87
 \section{Quality control}
86 88
 
... ...
@@ -98,7 +100,7 @@ quality.  The following code chunk makes a histogram of the SNR values
98 100
 for the HapMap samples.
99 101
 
100 102
 <<snr,fig=TRUE,include=FALSE,width=6, height=4>>=
101
-open(cnSet$SNR)
103
+invisible(open(cnSet$SNR))
102 104
 snr <- cnSet$SNR[]
103 105
 close(cnSet$SNR)
104 106
 print(histogram(~snr, panel=function(...){
... ...
@@ -120,13 +122,13 @@ print(histogram(~snr, panel=function(...){
120 122
 
121 123
 \section{Copy number estimation}
122 124
 
123
-As described in \cite{Scharpf2010}, the CRLMM-CopyNumber algorithm
125
+As described in \cite{Scharpf2011}, the CRLMM-CopyNumber algorithm
124 126
 fits a linear model to the normalized intensities stratified by the
125 127
 diallic genotype call.  The intercept and slope from the linear model
126 128
 are both SNP- and batch-specific.  The implementation in the \crlmm{}
127 129
 package is encapsulated by the function \Rfunction{crlmmCopynumber}
128 130
 that, using the default settings, can be called by passing a single
129
-object of class \code{CNSet}.  See the appropriate
131
+object of class \Rclass{CNSet}.  See the appropriate
130 132
 preprocessing/genotyping vignette for the construction of an object of
131 133
 class \Rclass{CNSet}.
132 134
 <<LDS_copynumber,cache=TRUE>>=
... ...
@@ -147,7 +149,7 @@ computed for subsets of the markers to reduce the required RAM. Note
147 149
 that the value returned by the \Rfunction{crlmmCopynumber} function in
148 150
 the above example is \verb+TRUE+.  The reason the function returns
149 151
 \verb+TRUE+ in the above example is that the elements of the
150
-\verb+batchStatistics+ slot have the class \Rclass{ff_matrix}.  Rather
152
+\verb+batchStatistics+ slot have the class \Rclass{ff\_matrix}.  Rather
151 153
 than keep the statistical summaries in memory, the summaries are
152 154
 written to files on disk using protocols described in the
153 155
 \Rpackage{ff} package. Hence, while the \Robject{cnSet} object itself
... ...
@@ -169,9 +171,9 @@ slot of the class \Rclass{CNSet}.  Using the default settings for the
169 171
 Note that depends on whether the elements of the \verb+batchStatistics+
170 172
 slot are \Robject{ff} objects or ordinary matrices.  In this example,
171 173
 the elements of \verb+batchStatistics+ have the class
172
-\Rclass{ff_matrix}.
174
+\Rclass{ff\_matrix}.
173 175
 
174
-<<LDS_copynumber,cache=TRUE>>=
176
+<<classes>>=
175 177
 nms <- ls(batchStatistics(cnSet))
176 178
 cls <- rep(NA, length(nms))
177 179
 for(i in seq_along(nms)) cls[i] <- class(batchStatistics(cnSet)[[nms[i]]])[1]
... ...
@@ -284,6 +286,7 @@ In the following code chunk, we extract estimates of the total copy
284 286
 number at nonpolymorphic markers on chromosome X.
285 287
 
286 288
 <<npx>>=
289
+set.seed(123)
287 290
 npx.index <- which(chromosome(cnSet)==23 & !isSnp(cnSet))
288 291
 M <- sample(which(cnSet$gender[]==1), 5)
289 292
 F <- sample(which(cnSet$gender[]==2), 5)
... ...
@@ -341,9 +344,9 @@ print(bwplot(cn~id,df, panel=function(x,y,...){
341 344
 \begin{figure}[t!]
342 345
  \centering
343 346
  \includegraphics[width=0.5\textwidth]{copynumber-polymorphicX}
344
-\caption{Copy number estimates for polymorphic markers on chromosome
345
-  X. crlmm assumes that the median copy number across samples at a
346
-  given marker on X is 1 for men and 2 for women.}
347
+ \caption{Copy number estimates for polymorphic markers on chromosome
348
+   X. crlmm assumes that the median copy number across samples at a
349
+   given marker on X is 1 for men and 2 for women.}
347 350
 \end{figure}
348 351
 
349 352
 \subsection{A container for raw copy number}
... ...
@@ -355,29 +358,31 @@ confidence scores, and the total copy number at each marker is the
355 358
 \Rfunction{as} (as illustrated below). Users should note that if the
356 359
 \verb+assayData+ elements in the \Rclass{CNSet} instance are
357 360
 \Rpackage{ff} objects, the \verb+assayData+ elements of the
358
-instantiated \Rclass{oligoSnpSet} will also be \Rpackage{ff} object (a
359
-new \verb+total_cn*.ff+ file will be created in the \verb+ldPath()+
360
-directory). The advantage of this approach is that the raw copy number
361
-estimates can be computed for all markers and all samples
362
-simultaneously without excessive RAM.  The disadvantage is that such a
363
-step creates additional I/O for reading and writing the raw copy
364
-number estimates from disk.  For very large data sets (1000+ samples),
365
-the time required for I/O may outweigh the benefits.  The alternative
366
-is to perform downstream processing (such as smoothing) on the total
367
-copy number for subsets of markers and/or samples.  As indicated
368
-previously, the subset `[' method applied to an object of class
369
-\Robject{CNSet} will return an object of the same class with simple
370
-matrices instead of \Rpackage{ff} objects.  Therefore, one can
371
-instantiate a \Rpackage{oligoSnpSet} object from a \Robject{CNSet}
372
-object without the I/O overhead for storing the total copy number
373
-estimates by first subsetting the \Rclass{CNSet} object.  One could
374
-then smooth the raw copy number estimates and store a summary copy
375
-number for the interval, thereby substantially reducing the
376
-dimensionality of the copy number estimates.  Examples of smoothing
377
-applications such as circular binary segmentation implemented in the
378
-\Rpackage{DNAcopy} package or a hidden Markov model implemented in the
379
-\Rpackage{VanillaICE} package are provided in the \verb+SmoothingCN+
380
-vignette.
361
+instantiated \Rclass{oligoSnpSet} will also be \Rpackage{ff}-dervied
362
+objects (a new \verb+total_cn*.ff+ file will be created in the
363
+\verb+ldPath()+ directory).
364
+
365
+%The advantage of this approach is that the raw copy number estimates
366
+%can be computed for all markers and all samples simultaneously without
367
+%excessive RAM.  The disadvantage is that such a step creates
368
+%additional I/O for reading and writing the raw copy number estimates
369
+%from disk.  For very large data sets (1000+ samples), the time
370
+%required for I/O may outweigh the benefits.  The alternative is to
371
+%perform downstream processing (such as smoothing) on the total copy
372
+%number for subsets of markers and/or samples.  As indicated
373
+%previously, the subset `[' method applied to an object of class
374
+%\Robject{CNSet} will return an object of the same class with simple
375
+%matrices instead of \Rpackage{ff} objects.  Therefore, one can
376
+%instantiate a \Rpackage{oligoSnpSet} object from a \Robject{CNSet}
377
+%object without the I/O overhead for storing the total copy number
378
+%estimates by first subsetting the \Rclass{CNSet} object.  One could
379
+%then smooth the raw copy number estimates and store a summary copy
380
+%number for the interval, thereby substantially reducing the
381
+%dimensionality of the copy number estimates.  Examples of smoothing
382
+%applications such as circular binary segmentation implemented in the
383
+%\Rpackage{DNAcopy} package or a hidden Markov model implemented in the
384
+%\Rpackage{VanillaICE} package are provided in the \verb+SmoothingCN+
385
+%vignette.
381 386
 
382 387
 <<oligoSnpSet>>=
383 388
 open(cnSet3)
... ...
@@ -388,7 +393,7 @@ class(copyNumber(oligoSet))
388 393
 
389 394
 \noindent Note that the raw copy number estimates stored in the
390 395
 \Robject{oligoSnpSet} object can be retrieved by the
391
-\Rfunction{copyNumber} accessor and is equivalanet to that returned by
396
+\Rfunction{copyNumber} accessor and is equivalent to that returned by
392 397
 the \Rfunction{totalCopynumber} function defined over the same row and
393 398
 column indices.
394 399
 
Browse code

Add vignettes for copy number analysis. Add crlmmGT2.R file

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@54171 bc3139a8-67e5-0310-9ffc-ced21a209358

Rob Scharp authored on 30/03/2011 02:41:00
Showing 1 changed files
... ...
@@ -6,6 +6,7 @@
6 6
 \usepackage{graphicx}
7 7
 \usepackage{natbib}
8 8
 \usepackage{url}
9
+\usepackage{amsmath,amssymb}
9 10
 \newcommand{\Rfunction}[1]{{\texttt{#1}}}
10 11
 \newcommand{\Rmethod}[1]{{\texttt{#1}}}
11 12
 \newcommand{\Rcode}[1]{{\texttt{#1}}}
... ...
@@ -18,7 +19,7 @@
18 19
 \usepackage[margin=1in]{geometry}
19 20
 
20 21
 \begin{document}
21
-\title{Copy number estimation and genotype calling with \Rpackage{crlmm}}
22
+\title{Copy number estimation with \Rpackage{crlmm}}
22 23
 \date{\today}
23 24
 \author{Rob Scharpf}
24 25
 \maketitle
... ...
@@ -27,178 +28,202 @@
27 28
 options(continue=" ", width=70)
28 29
 @
29 30
 
30
-%\section{Estimating copy number}
31
-
32
-%At present, software for copy number estimation is provided only for the
33
-%Affymetrix 6.0 platform.
34
-
35 31
 \begin{abstract}
36
-  This vignette estimates copy number for HapMap samples on the
37
-  Affymetrix 6.0 platform.  See \citep{Scharpf2010} for details
38
-  regarding the methodology implemented in \crlmm{}.  In addition, a
39
-  compendium describing copy number analysis using the \crlmm{}
40
-  package is available from the author's website:
32
+
33
+  Copy number routines in the \crlmm{} package are available for
34
+  Affymetrix 5.0 and 6.0 platforms, as well as several Illumina
35
+  platforms.  This vignette assumes that the arrays have already been
36
+  successfully preprocessed and genotyped as per the instructions in
37
+  the \verb+AffymetrixPreprocessCN+ and \verb+IlluminaPreprocessCN+
38
+  vignettes for the Affymetrix and Illumina platforms,
39
+  respectively. While this vignette uses Affymetrix 6.0 arrays for
40
+  illustration, the steps at this point are identical for both
41
+  platforms.  See \citep{Scharpf2010} for details regarding the
42
+  methodology implemented in \crlmm{} for copy number analysis.  In
43
+  addition, a compendium describing copy number analysis using the
44
+  \crlmm{} package is available from the author's website:
41 45
   \url{http://www.biostat.jhsph.edu/~rscharpf/crlmmCompendium/index.html}.
42 46
 
43 47
 \end{abstract}
44 48
 
45
-\section{Copy number estimation}
46
-
47
-\subsection{Set up}
49
+\section{Set up}
48 50
 
49
-<<cdfname>>=
51
+<<libraries,results=hide>>=
52
+library(ff)
50 53
 library(crlmm)
54
+library(lattice)
51 55
 library(cacheSweave)
52
-@
53
-
54
-Several genotyping platforms are currently supported.  Supported
55
-platforms must have a corresponding annotation package (installed
56
-separately) that are listed below.
57
-
58
-<<annotationPackages>>=
59
-pkgs <- annotationPackages()
60
-pkgs <- pkgs[grep("Crlmm", pkgs)]
61
-pkgs
62
-@
63
-
64
-Note that 'pd.genomewidesnp6' and 'genomewidesnp6Crlmm' are both
65
-annotation packages for the Affymetrix 6.0 platform available on
66
-Bioconductor, but the 'genomewidesnp6Crlmm' annotation package must be
67
-used for copy number estimation.  The annotation package is specified
68
-through the 'cdfName' -- the identifier without the 'Crlmm' postfix.
69
-In the following code, we specify the cdf name for Affymetrix 6.0,
70
-provide the complete path to the CEL files, and indicate where
71
-intermediate files from the copy number estimation are to be saved.
72
-
73
-<<setup>>=
74
-cdfName <- "genomewidesnp6"
75
-pathToCels <- "/thumper/ctsa/snpmicroarray/hapmap/raw/affy/1m"
76 56
 if(getRversion() < "2.13.0"){
77 57
 	rpath <- getRversion()
78 58
 } else rpath <- "trunk"
79 59
 outdir <- paste("/thumper/ctsa/snpmicroarray/rs/ProcessedData/crlmm/", rpath, "/copynumber_vignette", sep="")
80
-dir.create(outdir, recursive=TRUE, showWarnings=FALSE)
60
+@
61
+
62
+<<ram>>=
63
+ldPath(outdir)
81 64
 setCacheDir(outdir)
65
+ocProbesets(150e3)
66
+ocSamples(200)
82 67
 @
83 68
 
84
-All long computations are saved in the output directory
85
-\Robject{outdir}.  Users should change these variables as appropriate.
86
-The following code chunk should fail unless the above arguments have
87
-been set appropriately.
69
+We begin by loading the \Robject{cnSet} object created by the
70
+\verb+AffymetrixPreprocessCN+ vignette.
88 71
 
89
-<<checkSetup>>=
90
-if(!file.exists(outdir)) stop("Please specify a valid directory for storing output")
91
-if(!file.exists(pathToCels)) stop("Please specify the correct path to the CEL files")
72
+<<loadcnset>>=
73
+if(!exists("cnSet")) load(file.path(outdir, "cnSet.rda"))
92 74
 @
93 75
 
94
-Processed data from codechunks requiring long computations are saved
95
-to disk by wrapping function calls in the \Robject{checkExists}
96
-function.  After running this vignette as a batch job, subsequent
97
-calls to \verb@Sweave@ will load the saved computations from disk. See
98
-the \Rfunction{checkExists} help file for additional details.
99
-
100
-\subsection{Preprocessing and genotyping.}
101
-
102
-In the following code chunk, we provide the complete path to the
103
-Affymetrix CEL files and define a 'batch' variable. The
104
-\Robject{batch} variable will be used to initialize a container for
105
-storing the normalized intensities, the genotype calls, and the
106
-parameter estimates for copy number.  Often the chemistry plate or the
107
-scan date of the array is a useful surrogate for batch. For the HapMap
108
-CEL files in our analysis, the CEPH (C) and Yoruban (Y) samples were
109
-prepared on separate chemistry plates.  In the following code chunk,
110
-we extract the population identifier from the CEL file names and
111
-assign these identifiers to the variable \Robject{batch}.
112
-
113
-<<celfiles>>=
114
-celFiles <- list.celfiles(pathToCels, full.names=TRUE, pattern=".CEL")
115
-celFiles <- celFiles[substr(basename(celFiles), 13, 13) %in% c("C", "Y")]
116
-plates <- substr(basename(celFiles), 13, 13)
117
-@
118 76
 
119
-The preprocessing steps for copy number estimation includes quantile
120
-normalization of the raw intensities for each probe and a step that
121
-summarizes the intensities of multiple probes at a single locus.  For
122
-example, the Affymetrix 6.0 platform has 3 or 4 identical probes at
123