Browse code

Commit made by the Bioconductor Git-SVN bridge. Consists of 1 commit.

Commit information:

Commit id: 736588e3d47fd58358f61ce2241bb2c85b4d9a0c

update to vignette to process previously unevaluated codechunks

Committed by: Rob Scharpf
Author Name: Rob Scharpf
Commit date: 2014-09-19 10:14:04 -0400
Author date: 2014-09-19 10:14:04 -0400


git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@94285 bc3139a8-67e5-0310-9ffc-ced21a209358

Rob Scharp authored on 19/09/2014 14:22:04
Showing1 changed files
... ...
@@ -77,7 +77,8 @@ the path indicated by \verb+outdir+.
77 77
 
78 78
 <<setup>>=
79 79
 pathToCels <- "/thumper/ctsa/snpmicroarray/hapmap/raw/affy/1m"
80
-outdir <- paste("/local_data/r00/crlmm/", getRversion(), "/affy_vignette", sep="")
80
+v <- paste0("crlmm_v", gsub("\\.", "_", packageDescription("crlmm")$Version))
81
+outdir <- file.path("/thumper/ctsa/snpmicroarray/rs/ProcessedData", v)
81 82
 dir.create(outdir, recursive=TRUE, showWarnings=FALSE)
82 83
 @
83 84
 
Browse code

merging from collab

* collab:
add warning in vignette about NAs with BafLrrSetList function
Added Human Omni Express Exome 8 v1.1b as a supported chip
updated version number of pacakge and man pages to reflect these changes
skeleton for krlmm capability added. genotype.Illumina() can now take and XY object as input
update copynumber.Rnw to use BafLrrSetList
updates to vignettes
update namespace

# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@79138 bc3139a8-67e5-0310-9ffc-ced21a209358

Rob Scharp authored on 31/07/2013 01:37:34
Showing1 changed files
... ...
@@ -194,16 +194,6 @@ table(c("male", "female")[cnSet$gender[]])
194 194
 if(any(is.na(cnSet$gender[]))) stop("missing genders")
195 195
 @
196 196
 
197
-%Parameters for determining log R ratios and B allele frequencies are
198
-%estimated using the function \Rfunction{crlmmCopynumber}:
199
-%
200
-%<<crlmmCopynumber>>=
201
-%crlmmCopynumber(cnSet, fit.linearModel=FALSE)
202
-%@
203
-%<<LDS_genotype, cache=TRUE>>=
204
-%cnSet <- genotype(celFiles, batch=plates, cdfName=cdfName, genome="hg19")
205
-%@
206
-
207 197
 The normalized intensities, genotype calls, and confidence scores are
208 198
 stored as \verb+ff+ objects in the \verb+assayData+ slot.  A concise
209 199
 summary of this object can be obtained throught the \Rfunction{print}
... ...
@@ -220,10 +210,6 @@ genotype calls are stored on disk rather than in active memory.
220 210
 print(object.size(cnSet), units="Mb")
221 211
 @
222 212
 
223
-%Users can proceed to the \verb+copynumber+ vignette for copy number
224
-%analyses.  See the \verb+Infrastructure+ vignette for additional
225
-%details on the \Robject{CNSet} class, including an overview of the
226
-%available accessors.
227 213
 \SweaveInput{copynumber}
228 214
 
229 215
 A sample-specific estimate of the signal to noise ratio (SNR)
Browse code

merge with collab branch containing fix for dqrls and bug-fix for computeRBaf that can misalign sample index with batch index (when batch is not in alphabetical order)

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@71295 bc3139a8-67e5-0310-9ffc-ced21a209358

Rob Scharp authored on 17/11/2012 15:24:46
Showing1 changed files
... ...
@@ -110,7 +110,7 @@ the purposes of this vignette, we only analyze the CEPH ('C') and
110 110
 Yoruban ('Y') samples.
111 111
 
112 112
 <<celfiles>>=
113
-celFiles <- list.celfiles(pathToCels, full.names=TRUE, pattern=".CEL")
113
+celFiles <- list.celfiles(pathToCels, full.names=TRUE, pattern=".CEL")[1:10]
114 114
 celFiles <- celFiles[substr(basename(celFiles), 13, 13) %in% c("C", "Y")]
115 115
 if(exists("file.index")){
116 116
 	celFiles <- celFiles[file.index]
Browse code

Merge branch 'collab'

* collab:
Update AffyGW.pdf and IlluminaPreprocessCN.pdf
crlmmIlluminaV2 calls crlmmGT. comment call to crlmmGT2
Update AffyGW vignette to step through construction of CNSet, normalization, and genotyping
v 1.15.27: Fix bug in crlmmGT2. clean up comments in crlmmIllumina.
update getFeatureData

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@69561 bc3139a8-67e5-0310-9ffc-ced21a209358

Rob Scharp authored on 19/09/2012 12:31:51
Showing1 changed files
... ...
@@ -115,7 +115,6 @@ celFiles <- celFiles[substr(basename(celFiles), 13, 13) %in% c("C", "Y")]
115 115
 if(exists("file.index")){
116 116
 	celFiles <- celFiles[file.index]
117 117
 }
118
-##celFiles <- celFiles[substr(basename(celFiles), 13, 13) %in% "C"]
119 118
 @
120 119
 
121 120
 Finally, copy number analyses using \crlmm{} require specification of
... ...
@@ -145,27 +144,70 @@ by a median.  For the nonpolymorphic markers on Affymetrix 6.0, only
145 144
 one probe per locus is available and the summarization step is not
146 145
 needed.  After preprocessing the arrays, the \crlmm{} package
147 146
 estimates the genotype using the CRLMM algorithm and provides a
148
-confidence score for the genotype calls.  The function
149
-\Rfunction{genotype} performs both the preprocessing and genotyping.
147
+confidence score for the genotype calls.   To begin, we initialize a
148
+container for the normalized intensities:
150 149
 
151
-<<LDS_genotype, cache=TRUE>>=
152
-cnSet <- genotype(celFiles, batch=plates, cdfName=cdfName, genome="hg19")
150
+<<constructCNSet, cache=TRUE>>=
151
+cnSet <- constructAffyCNSet(celFiles, batch=plates,
152
+			    cdfName="genomewidesnp6",
153
+			    genome="hg19")
153 154
 @
154 155
 
155
-Segment faults that occur with the above step can often be traced to a
156
-corrupt cel file. To check if any of the files are corrupt, try
157
-reading the files in one at a time:
156
+
157
+We quantile normalize the SNPs and nonpolymorphic markers separately.
158
+Since the normalized intensities are ff objects, the functions
159
+\Rfunction{cnrmaAffy} and \Rfunction{snprmaAffy} write the normalized
160
+intensities to disk and nothing is returned.
161
+
162
+<<cnrmaAffy, cache=TRUE>>=
163
+cnrmaAffy(cnSet)
164
+@
165
+
166
+Any segment fault that occurs during the normalization can often be
167
+traced to a corrupt cel file. To check if any of the files are
168
+corrupt, one can use the function \Rfunction{validCEL} that tries to
169
+read each files as in the following unevaluated codechunk:
158 170
 
159 171
 <<checkcorrupt,eval=FALSE>>=
160 172
 validCEL(celFiles)
161 173
 @
162 174
 
175
+<<snprmaAffy, cache=TRUE>>=
176
+snprmaAffy(cnSet)
177
+@
178
+
179
+The function \Rfunction{genotypeAffy} performs performs the genotyping.
180
+
181
+<<genotypeAffy, cache=TRUE>>=
182
+genotypeAffy(cnSet, gender=NULL)
183
+@
184
+
185
+The above function also imputes the gender from the chromosome X and Y
186
+intensities when the argument gender is \texttt{NULL}. The imputed
187
+genders are
163 188
 
164
-The value returned by genotype is an instance of the class
165
-\Robject{CNSet}.  The normalized intensities, genotype calls, and
166
-confidence scores are stored as \verb+ff+ objects in the
167
-\verb+assayData+ slot.  A concise summary of this object can be
168
-obtained throught the \Rfunction{print} or \Rfunction{show} methods.
189
+<<gender>>=
190
+table(c("male", "female")[cnSet$gender[]])
191
+@
192
+
193
+<<genderCheck,echo=FALSE>>=
194
+if(any(is.na(cnSet$gender[]))) stop("missing genders")
195
+@
196
+
197
+%Parameters for determining log R ratios and B allele frequencies are
198
+%estimated using the function \Rfunction{crlmmCopynumber}:
199
+%
200
+%<<crlmmCopynumber>>=
201
+%crlmmCopynumber(cnSet, fit.linearModel=FALSE)
202
+%@
203
+%<<LDS_genotype, cache=TRUE>>=
204
+%cnSet <- genotype(celFiles, batch=plates, cdfName=cdfName, genome="hg19")
205
+%@
206
+
207
+The normalized intensities, genotype calls, and confidence scores are
208
+stored as \verb+ff+ objects in the \verb+assayData+ slot.  A concise
209
+summary of this object can be obtained throught the \Rfunction{print}
210
+or \Rfunction{show} methods.
169 211
 
170 212
 <<show>>=
171 213
 print(cnSet)
... ...
@@ -184,6 +226,14 @@ print(object.size(cnSet), units="Mb")
184 226
 %available accessors.
185 227
 \SweaveInput{copynumber}
186 228
 
229
+A sample-specific estimate of the signal to noise ratio (SNR)
230
+measuring the overall separation of the genotypes provides a measure
231
+of sample quality.  Samples with SNRs below 5 typically indicate poor
232
+quality, and typically have genotypes with lower confidence scores and
233
+noisier copy number estimates.  The SNR is stored in the
234
+\Robject{phenoData} slot of the \Rclass{CNSet} class and can be
235
+accessed using the ``\$" operator.
236
+
187 237
 
188 238
 \section{Session information}
189 239
 <<sessionInfo, results=tex>>=
... ...
@@ -196,15 +246,14 @@ toLatex(sessionInfo())
196 246
   \includegraphics[width=0.6\textwidth]{AffyGW-snr.pdf}
197 247
   \caption{The signal to noise ratio (SNR) for 180 HapMap samples. For
198 248
     Affymetrix platforms, SNR values below 5 can indicate possible
199
-    problems with sample quality.  In some circumstances, it may be
200
-    more helpful to exclude samples with poor DNA quality.}
249
+    problems with sample quality.  }
201 250
 \end{center}
202 251
 \end{figure}
203 252
 
204 253
 
205
-\begin{bibliography}
254
+%\begin{bibliography}
206 255
   \bibliographystyle{plain}
207 256
   \bibliography{refs}
208
-\end{bibliography}
257
+%\end{bibliography}
209 258
 
210 259
 \end{document}
Browse code

Merge branch 'collab'

* collab: (34 commits)
revert change to IlluminaPreprocessCN
fix bug in isValidCdfName
print warning when all features in a batch of probes are flagged, but allow processing to continue
add utility cleancdfnames
Add validCdfNames.Rd
export validCdfNames
imputeGender fix when chromosome Y not available
Use splitIndicesByLength(index, ocSamples/getDoParWorkers())
Can not allocate vector of size XG with genotype.Illumina. Use splitIndicesByNode() only if the length of the list is greater than the split from splitIndicesByLength(). Otherwise, split by length using ocSamples()
update .gitignore
Add make.unique for sampleSheet$Sample_ID in readIdatFiles
bug in description
ensure sample ids stored in samplesheet are unique when constructing cnSet object
update oligoClasses dependency
update unit test for genotype.Illumina
revert change in constructInf call from genotype.Illumina
Update genotype.Rd
edit ACN function
1.15.6 use make.unique(basename(arrayNames)) to allow processing of Illumina samples with duplicated barcodes
check that sample identifies are unique in crlmm function
...

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@67435 bc3139a8-67e5-0310-9ffc-ced21a209358

Rob Scharp authored on 08/07/2012 19:00:03
Showing1 changed files
... ...
@@ -47,9 +47,11 @@ options(continue=" ", width=70)
47 47
 \section{Set up}
48 48
 
49 49
 <<cdfname, results=hide>>=
50
-library(crlmm)
51
-library(ff)
52
-library(cacheSweave)
50
+library(oligoClasses)
51
+library2(crlmm)
52
+library2(ff)
53
+if(!exists("useCache")) useCache <- TRUE
54
+if(useCache) library2(cacheSweave)
53 55
 @
54 56
 
55 57
 This vignette analyzes HapMap samples assayed on the Affymetrix 6.0
... ...
@@ -67,8 +69,11 @@ cdfName <- "genomewidesnp6"
67 69
 
68 70
 The HapMap CEL files are stored in a local directory assigned to
69 71
 \verb+pathToCels+ in the following code. The genotyping step will
70
-create several files with \verb+ff+ extensions. We will store these
71
-files to the path indicated by \verb+outdir+.
72
+create several files with \verb+ff+ extensions. The ff objects contain
73
+the low-level, normalized intensities as well as parameters used to
74
+subsequently estimate copy number and B allele frequencies. These
75
+files should not be deleted or moved.  We will store these files to
76
+the path indicated by \verb+outdir+.
72 77
 
73 78
 <<setup>>=
74 79
 pathToCels <- "/thumper/ctsa/snpmicroarray/hapmap/raw/affy/1m"
... ...
@@ -86,7 +91,7 @@ ldPath(outdir)
86 91
 
87 92
 % only needed if cacheing
88 93
 <<cachedir, echo=FALSE>>=
89
-setCacheDir(outdir)
94
+if(useCache) setCacheDir(outdir)
90 95
 @
91 96
 
92 97
 The \R{} functions \Rfunction{ocProbesets} and \Rfunction{ocSamples}
... ...
@@ -107,6 +112,10 @@ Yoruban ('Y') samples.
107 112
 <<celfiles>>=
108 113
 celFiles <- list.celfiles(pathToCels, full.names=TRUE, pattern=".CEL")
109 114
 celFiles <- celFiles[substr(basename(celFiles), 13, 13) %in% c("C", "Y")]
115
+if(exists("file.index")){
116
+	celFiles <- celFiles[file.index]
117
+}
118
+##celFiles <- celFiles[substr(basename(celFiles), 13, 13) %in% "C"]
110 119
 @
111 120
 
112 121
 Finally, copy number analyses using \crlmm{} require specification of
... ...
@@ -140,7 +149,7 @@ confidence score for the genotype calls.  The function
140 149
 \Rfunction{genotype} performs both the preprocessing and genotyping.
141 150
 
142 151
 <<LDS_genotype, cache=TRUE>>=
143
-cnSet <- genotype(celFiles, batch=plates, cdfName=cdfName)
152
+cnSet <- genotype(celFiles, batch=plates, cdfName=cdfName, genome="hg19")
144 153
 @
145 154
 
146 155
 Segment faults that occur with the above step can often be traced to a
... ...
@@ -169,22 +178,10 @@ genotype calls are stored on disk rather than in active memory.
169 178
 print(object.size(cnSet), units="Mb")
170 179
 @
171 180
 
172
-We save the \Robject{cnSet} object in a local directory for subsequent
173
-copy number analysis in the \verb+copynumber+ vignette.
174
-
175
-<<save,cache=TRUE>>=
176
-saveObject <- function(outdir, cnSet) {
177
-	save(cnSet, file=file.path(outdir, "cnSet.rda"))
178
-	TRUE
179
-}
180
-(cnset.saved <- saveObject(outdir, cnSet))
181
-@
182
-
183 181
 %Users can proceed to the \verb+copynumber+ vignette for copy number
184 182
 %analyses.  See the \verb+Infrastructure+ vignette for additional
185 183
 %details on the \Robject{CNSet} class, including an overview of the
186 184
 %available accessors.
187
-
188 185
 \SweaveInput{copynumber}
189 186
 
190 187
 
Browse code

Merge branch 'collab'

* collab:
replace splitIndicesByLength with splitIndicesByNode throughout cnrma-functions.R (check that snow is loaded and getCluster is not null)
add an example to genotype.Illumina that indicates how parallelization would be enabled. The example requires local data and is not run.
change outdir in IlluminaPreprocessCN and AffyGW
Update R/crlmm-illumina.R
bump version for parallelization of genotype.Illumina
Update R/crlmm-illumina.R

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@64151 bc3139a8-67e5-0310-9ffc-ced21a209358

Rob Scharp authored on 20/03/2012 13:55:50
Showing1 changed files
... ...
@@ -72,10 +72,7 @@ files to the path indicated by \verb+outdir+.
72 72
 
73 73
 <<setup>>=
74 74
 pathToCels <- "/thumper/ctsa/snpmicroarray/hapmap/raw/affy/1m"
75
-if(getRversion() < "2.13.0"){
76
-	rpath <- getRversion()
77
-} else rpath <- "trunk"
78
-outdir <- paste("/amber1/archive/oncbb/rscharpf/crlmm_vignette/", rpath, "/gw6/", sep="")
75
+outdir <- paste("/local_data/r00/crlmm/", getRversion(), "/affy_vignette", sep="")
79 76
 dir.create(outdir, recursive=TRUE, showWarnings=FALSE)
80 77
 @
81 78
 
Browse code

Merge branch 'mymac'

* mymac:
add AffyGW.pdf
update vignettes in inst/scripts
Change argument of validCEL to celfiles
Update constructInf to accommodate GenomeDataFrame class for featureData
bump version to 1.13.7
Add doRUnit.R
Add celfile-utils.Rd
Streamlne some of the Rd files
add validCEL function that checks whether all celfiles can be read
getFeatureData returns GenomeAnnotatedDataFrame
Remove imports from methods. Remove pdf of illumina_copynumber.pdf (large file) and copynumber.pdf
getFeatureDAta returns GenomeAnnotatedDataFrame
Remove separate vignette for copy number in inst/scripts. Include copynumber section in both affy and illumina pipelines.
update documentation files for genotype.Illumina, preprocessInf, and genotypeInf (cdfName added as argument. Indicate that 'batch' should be a character string)
pass cdfName to genotypeInf and preprocessInf
add unitTests and cn-functions for 'simple usage'
Combine AffyPreprocess and copynumber. Combine IlluminaPreprocess and copynumber
remove depency on ff to allow installation on my mac

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@62108 bc3139a8-67e5-0310-9ffc-ced21a209358

Rob Scharp authored on 17/01/2012 19:13:44
Showing1 changed files
1 1
new file mode 100644
... ...
@@ -0,0 +1,216 @@
1
+%\VignetteIndexEntry{Preprocessing and genotyping Affymetrix arrays
2
+%for copy number analysis}
3
+%\VignetteDepends{crlmm, genomewidesnp6Crlmm, cacheSweave, ff}
4
+%\VignetteKeywords{crlmm, SNP 6, copy number, SNP}
5
+%\VignettePackage{crlmm}
6
+\documentclass{article}
7
+\usepackage{graphicx}
8
+\usepackage{natbib}
9
+\usepackage{amsmath}
10
+\usepackage{url}
11
+\newcommand{\Rfunction}[1]{{\texttt{#1}}}
12
+\newcommand{\Rmethod}[1]{{\texttt{#1}}}
13
+\newcommand{\Rcode}[1]{{\texttt{#1}}}
14
+\newcommand{\Robject}[1]{{\texttt{#1}}}
15
+\newcommand{\Rpackage}[1]{{\textsf{#1}}}
16
+\newcommand{\Rclass}[1]{{\textit{#1}}}
17
+\newcommand{\oligo}{\Rpackage{oligo }}
18
+\newcommand{\R}{\textsf{R}}
19
+\newcommand{\crlmm}{\Rpackage{crlmm}}
20
+\usepackage[margin=1in]{geometry}
21
+
22
+\begin{document}
23
+\title{Preprocessing \& Genotyping Affymetrix Arrays for Copy Number Analysis}
24
+\date{\today}
25
+\author{Rob Scharpf}
26
+\maketitle
27
+
28
+<<setup, echo=FALSE, results=hide>>=
29
+options(continue=" ", width=70)
30
+@
31
+
32
+%\section{Estimating copy number}
33
+
34
+%At present, software for copy number estimation is provided only for the
35
+%Affymetrix 6.0 platform.
36
+
37
+\begin{abstract}
38
+
39
+  This vignette describes the setup needed to analyze Affymetrix 6.0
40
+  (or 5.0) CEL files and the steps for preprocessing and
41
+  genotyping. These steps must be completed prior to copy number
42
+  analyses in \crlmm{}.  After completing these steps, users can refer
43
+  to the \verb+copynumber+ vignette.
44
+
45
+\end{abstract}
46
+
47
+\section{Set up}
48
+
49
+<<cdfname, results=hide>>=
50
+library(crlmm)
51
+library(ff)
52
+library(cacheSweave)
53
+@
54
+
55
+This vignette analyzes HapMap samples assayed on the Affymetrix 6.0
56
+platform.  The annotation package for this platform is
57
+\Rpackage{genomewidesnp6Crlmm}.  We assign the name of the annotation
58
+package without the \verb+Crlmm+ postfix to the name \verb+cdfName+.
59
+We use the \R{} package \Rpackage{cacheSweave} to cache long
60
+computations in this vignette.  Users should refer to the
61
+\Rpackage{cacheSweave} package for additional details regarding
62
+cacheing.
63
+
64
+<<cdfname>>=
65
+cdfName <- "genomewidesnp6"
66
+@
67
+
68
+The HapMap CEL files are stored in a local directory assigned to
69
+\verb+pathToCels+ in the following code. The genotyping step will
70
+create several files with \verb+ff+ extensions. We will store these
71
+files to the path indicated by \verb+outdir+.
72
+
73
+<<setup>>=
74
+pathToCels <- "/thumper/ctsa/snpmicroarray/hapmap/raw/affy/1m"
75
+if(getRversion() < "2.13.0"){
76
+	rpath <- getRversion()
77
+} else rpath <- "trunk"
78
+outdir <- paste("/amber1/archive/oncbb/rscharpf/crlmm_vignette/", rpath, "/gw6/", sep="")
79
+dir.create(outdir, recursive=TRUE, showWarnings=FALSE)
80
+@
81
+
82
+By providing the path in \verb+outdir+ as an argument to the \R{}
83
+function \Rfunction{ldPath}, all of the \verb+ff+ files created during
84
+the genotyping step will be stored in \verb+outdir+.
85
+
86
+<<ldpath>>=
87
+ldPath(outdir)
88
+@
89
+
90
+% only needed if cacheing
91
+<<cachedir, echo=FALSE>>=
92
+setCacheDir(outdir)
93
+@
94
+
95
+The \R{} functions \Rfunction{ocProbesets} and \Rfunction{ocSamples}
96
+manage the RAM required for our analysis. See the documentation for
97
+these functions and the \verb+CopyNumberOverview+ vignette for
98
+additional details.
99
+
100
+<<ram>>=
101
+ocProbesets(100000)
102
+ocSamples(200)
103
+@
104
+
105
+
106
+Next we indicate the local directory that contains the CEL files. For
107
+the purposes of this vignette, we only analyze the CEPH ('C') and
108
+Yoruban ('Y') samples.
109
+
110
+<<celfiles>>=
111
+celFiles <- list.celfiles(pathToCels, full.names=TRUE, pattern=".CEL")
112
+celFiles <- celFiles[substr(basename(celFiles), 13, 13) %in% c("C", "Y")]
113
+@
114
+
115
+Finally, copy number analyses using \crlmm{} require specification of
116
+a batch variable that is used to indicate which samples were processed
117
+together.  For example, if some of the samples were processed in April
118
+and another set of samples were processed in June, we could name the
119
+batches 'April' and 'June', respectively.  A useful surrogate for
120
+batch is often the chemistry plate or the scan date of the array. For
121
+the HapMap CEL files analyzed in this vignette, the CEPH (C) and
122
+Yoruban (Y) samples were prepared on separate chemistry plates.  In
123
+the following code chunk, we extract the population identifier from
124
+the CEL file names and assign these identifiers to the variable
125
+\Robject{plate}.
126
+
127
+<<plates>>=
128
+plates <- substr(basename(celFiles), 13, 13)
129
+@
130
+
131
+\section{Preprocessing and genotyping.}
132
+
133
+The preprocessing steps for copy number estimation includes quantile
134
+normalization of the raw intensities for each probe and a step that
135
+summarizes the intensities of multiple probes at a single locus.  For
136
+example, the Affymetrix 6.0 platform has 3 or 4 identical probes at
137
+each polymorphic locus and the normalized intensities are summarized
138
+by a median.  For the nonpolymorphic markers on Affymetrix 6.0, only
139
+one probe per locus is available and the summarization step is not
140
+needed.  After preprocessing the arrays, the \crlmm{} package
141
+estimates the genotype using the CRLMM algorithm and provides a
142
+confidence score for the genotype calls.  The function
143
+\Rfunction{genotype} performs both the preprocessing and genotyping.
144
+
145
+<<LDS_genotype, cache=TRUE>>=
146
+cnSet <- genotype(celFiles, batch=plates, cdfName=cdfName)
147
+@
148
+
149
+Segment faults that occur with the above step can often be traced to a
150
+corrupt cel file. To check if any of the files are corrupt, try
151
+reading the files in one at a time:
152
+
153
+<<checkcorrupt,eval=FALSE>>=
154
+validCEL(celFiles)
155
+@
156
+
157
+
158
+The value returned by genotype is an instance of the class
159
+\Robject{CNSet}.  The normalized intensities, genotype calls, and
160
+confidence scores are stored as \verb+ff+ objects in the
161
+\verb+assayData+ slot.  A concise summary of this object can be
162
+obtained throught the \Rfunction{print} or \Rfunction{show} methods.
163
+
164
+<<show>>=
165
+print(cnSet)
166
+@
167
+
168
+Note that the object is relatively small as the intensities and
169
+genotype calls are stored on disk rather than in active memory.
170
+
171
+<<objectsize>>=
172
+print(object.size(cnSet), units="Mb")
173
+@
174
+
175
+We save the \Robject{cnSet} object in a local directory for subsequent
176
+copy number analysis in the \verb+copynumber+ vignette.
177
+
178
+<<save,cache=TRUE>>=
179
+saveObject <- function(outdir, cnSet) {
180
+	save(cnSet, file=file.path(outdir, "cnSet.rda"))
181
+	TRUE
182
+}
183
+(cnset.saved <- saveObject(outdir, cnSet))
184
+@
185
+
186
+%Users can proceed to the \verb+copynumber+ vignette for copy number
187
+%analyses.  See the \verb+Infrastructure+ vignette for additional
188
+%details on the \Robject{CNSet} class, including an overview of the
189
+%available accessors.
190
+
191
+\SweaveInput{copynumber}
192
+
193
+
194
+\section{Session information}
195
+<<sessionInfo, results=tex>>=
196
+toLatex(sessionInfo())
197
+@
198
+
199
+
200
+\begin{figure}[f]
201
+  \begin{center}
202
+  \includegraphics[width=0.6\textwidth]{AffyGW-snr.pdf}
203
+  \caption{The signal to noise ratio (SNR) for 180 HapMap samples. For
204
+    Affymetrix platforms, SNR values below 5 can indicate possible
205
+    problems with sample quality.  In some circumstances, it may be
206
+    more helpful to exclude samples with poor DNA quality.}
207
+\end{center}
208
+\end{figure}
209
+
210
+
211
+\begin{bibliography}
212
+  \bibliographystyle{plain}
213
+  \bibliography{refs}
214
+\end{bibliography}
215
+
216
+\end{document}