Commit information:
Commit id: 736588e3d47fd58358f61ce2241bb2c85b4d9a0c
update to vignette to process previously unevaluated codechunks
Committed by: Rob Scharpf
Author Name: Rob Scharpf
Commit date: 2014-09-19 10:14:04 -0400
Author date: 2014-09-19 10:14:04 -0400
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@94285 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -77,7 +77,8 @@ the path indicated by \verb+outdir+. |
77 | 77 |
|
78 | 78 |
<<setup>>= |
79 | 79 |
pathToCels <- "/thumper/ctsa/snpmicroarray/hapmap/raw/affy/1m" |
80 |
-outdir <- paste("/local_data/r00/crlmm/", getRversion(), "/affy_vignette", sep="") |
|
80 |
+v <- paste0("crlmm_v", gsub("\\.", "_", packageDescription("crlmm")$Version)) |
|
81 |
+outdir <- file.path("/thumper/ctsa/snpmicroarray/rs/ProcessedData", v) |
|
81 | 82 |
dir.create(outdir, recursive=TRUE, showWarnings=FALSE) |
82 | 83 |
@ |
83 | 84 |
|
* collab:
add warning in vignette about NAs with BafLrrSetList function
Added Human Omni Express Exome 8 v1.1b as a supported chip
updated version number of pacakge and man pages to reflect these changes
skeleton for krlmm capability added. genotype.Illumina() can now take and XY object as input
update copynumber.Rnw to use BafLrrSetList
updates to vignettes
update namespace
# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@79138 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -194,16 +194,6 @@ table(c("male", "female")[cnSet$gender[]]) |
194 | 194 |
if(any(is.na(cnSet$gender[]))) stop("missing genders") |
195 | 195 |
@ |
196 | 196 |
|
197 |
-%Parameters for determining log R ratios and B allele frequencies are |
|
198 |
-%estimated using the function \Rfunction{crlmmCopynumber}: |
|
199 |
-% |
|
200 |
-%<<crlmmCopynumber>>= |
|
201 |
-%crlmmCopynumber(cnSet, fit.linearModel=FALSE) |
|
202 |
-%@ |
|
203 |
-%<<LDS_genotype, cache=TRUE>>= |
|
204 |
-%cnSet <- genotype(celFiles, batch=plates, cdfName=cdfName, genome="hg19") |
|
205 |
-%@ |
|
206 |
- |
|
207 | 197 |
The normalized intensities, genotype calls, and confidence scores are |
208 | 198 |
stored as \verb+ff+ objects in the \verb+assayData+ slot. A concise |
209 | 199 |
summary of this object can be obtained throught the \Rfunction{print} |
... | ... |
@@ -220,10 +210,6 @@ genotype calls are stored on disk rather than in active memory. |
220 | 210 |
print(object.size(cnSet), units="Mb") |
221 | 211 |
@ |
222 | 212 |
|
223 |
-%Users can proceed to the \verb+copynumber+ vignette for copy number |
|
224 |
-%analyses. See the \verb+Infrastructure+ vignette for additional |
|
225 |
-%details on the \Robject{CNSet} class, including an overview of the |
|
226 |
-%available accessors. |
|
227 | 213 |
\SweaveInput{copynumber} |
228 | 214 |
|
229 | 215 |
A sample-specific estimate of the signal to noise ratio (SNR) |
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@71295 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -110,7 +110,7 @@ the purposes of this vignette, we only analyze the CEPH ('C') and |
110 | 110 |
Yoruban ('Y') samples. |
111 | 111 |
|
112 | 112 |
<<celfiles>>= |
113 |
-celFiles <- list.celfiles(pathToCels, full.names=TRUE, pattern=".CEL") |
|
113 |
+celFiles <- list.celfiles(pathToCels, full.names=TRUE, pattern=".CEL")[1:10] |
|
114 | 114 |
celFiles <- celFiles[substr(basename(celFiles), 13, 13) %in% c("C", "Y")] |
115 | 115 |
if(exists("file.index")){ |
116 | 116 |
celFiles <- celFiles[file.index] |
* collab:
Update AffyGW.pdf and IlluminaPreprocessCN.pdf
crlmmIlluminaV2 calls crlmmGT. comment call to crlmmGT2
Update AffyGW vignette to step through construction of CNSet, normalization, and genotyping
v 1.15.27: Fix bug in crlmmGT2. clean up comments in crlmmIllumina.
update getFeatureData
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@69561 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -115,7 +115,6 @@ celFiles <- celFiles[substr(basename(celFiles), 13, 13) %in% c("C", "Y")] |
115 | 115 |
if(exists("file.index")){ |
116 | 116 |
celFiles <- celFiles[file.index] |
117 | 117 |
} |
118 |
-##celFiles <- celFiles[substr(basename(celFiles), 13, 13) %in% "C"] |
|
119 | 118 |
@ |
120 | 119 |
|
121 | 120 |
Finally, copy number analyses using \crlmm{} require specification of |
... | ... |
@@ -145,27 +144,70 @@ by a median. For the nonpolymorphic markers on Affymetrix 6.0, only |
145 | 144 |
one probe per locus is available and the summarization step is not |
146 | 145 |
needed. After preprocessing the arrays, the \crlmm{} package |
147 | 146 |
estimates the genotype using the CRLMM algorithm and provides a |
148 |
-confidence score for the genotype calls. The function |
|
149 |
-\Rfunction{genotype} performs both the preprocessing and genotyping. |
|
147 |
+confidence score for the genotype calls. To begin, we initialize a |
|
148 |
+container for the normalized intensities: |
|
150 | 149 |
|
151 |
-<<LDS_genotype, cache=TRUE>>= |
|
152 |
-cnSet <- genotype(celFiles, batch=plates, cdfName=cdfName, genome="hg19") |
|
150 |
+<<constructCNSet, cache=TRUE>>= |
|
151 |
+cnSet <- constructAffyCNSet(celFiles, batch=plates, |
|
152 |
+ cdfName="genomewidesnp6", |
|
153 |
+ genome="hg19") |
|
153 | 154 |
@ |
154 | 155 |
|
155 |
-Segment faults that occur with the above step can often be traced to a |
|
156 |
-corrupt cel file. To check if any of the files are corrupt, try |
|
157 |
-reading the files in one at a time: |
|
156 |
+ |
|
157 |
+We quantile normalize the SNPs and nonpolymorphic markers separately. |
|
158 |
+Since the normalized intensities are ff objects, the functions |
|
159 |
+\Rfunction{cnrmaAffy} and \Rfunction{snprmaAffy} write the normalized |
|
160 |
+intensities to disk and nothing is returned. |
|
161 |
+ |
|
162 |
+<<cnrmaAffy, cache=TRUE>>= |
|
163 |
+cnrmaAffy(cnSet) |
|
164 |
+@ |
|
165 |
+ |
|
166 |
+Any segment fault that occurs during the normalization can often be |
|
167 |
+traced to a corrupt cel file. To check if any of the files are |
|
168 |
+corrupt, one can use the function \Rfunction{validCEL} that tries to |
|
169 |
+read each files as in the following unevaluated codechunk: |
|
158 | 170 |
|
159 | 171 |
<<checkcorrupt,eval=FALSE>>= |
160 | 172 |
validCEL(celFiles) |
161 | 173 |
@ |
162 | 174 |
|
175 |
+<<snprmaAffy, cache=TRUE>>= |
|
176 |
+snprmaAffy(cnSet) |
|
177 |
+@ |
|
178 |
+ |
|
179 |
+The function \Rfunction{genotypeAffy} performs performs the genotyping. |
|
180 |
+ |
|
181 |
+<<genotypeAffy, cache=TRUE>>= |
|
182 |
+genotypeAffy(cnSet, gender=NULL) |
|
183 |
+@ |
|
184 |
+ |
|
185 |
+The above function also imputes the gender from the chromosome X and Y |
|
186 |
+intensities when the argument gender is \texttt{NULL}. The imputed |
|
187 |
+genders are |
|
163 | 188 |
|
164 |
-The value returned by genotype is an instance of the class |
|
165 |
-\Robject{CNSet}. The normalized intensities, genotype calls, and |
|
166 |
-confidence scores are stored as \verb+ff+ objects in the |
|
167 |
-\verb+assayData+ slot. A concise summary of this object can be |
|
168 |
-obtained throught the \Rfunction{print} or \Rfunction{show} methods. |
|
189 |
+<<gender>>= |
|
190 |
+table(c("male", "female")[cnSet$gender[]]) |
|
191 |
+@ |
|
192 |
+ |
|
193 |
+<<genderCheck,echo=FALSE>>= |
|
194 |
+if(any(is.na(cnSet$gender[]))) stop("missing genders") |
|
195 |
+@ |
|
196 |
+ |
|
197 |
+%Parameters for determining log R ratios and B allele frequencies are |
|
198 |
+%estimated using the function \Rfunction{crlmmCopynumber}: |
|
199 |
+% |
|
200 |
+%<<crlmmCopynumber>>= |
|
201 |
+%crlmmCopynumber(cnSet, fit.linearModel=FALSE) |
|
202 |
+%@ |
|
203 |
+%<<LDS_genotype, cache=TRUE>>= |
|
204 |
+%cnSet <- genotype(celFiles, batch=plates, cdfName=cdfName, genome="hg19") |
|
205 |
+%@ |
|
206 |
+ |
|
207 |
+The normalized intensities, genotype calls, and confidence scores are |
|
208 |
+stored as \verb+ff+ objects in the \verb+assayData+ slot. A concise |
|
209 |
+summary of this object can be obtained throught the \Rfunction{print} |
|
210 |
+or \Rfunction{show} methods. |
|
169 | 211 |
|
170 | 212 |
<<show>>= |
171 | 213 |
print(cnSet) |
... | ... |
@@ -184,6 +226,14 @@ print(object.size(cnSet), units="Mb") |
184 | 226 |
%available accessors. |
185 | 227 |
\SweaveInput{copynumber} |
186 | 228 |
|
229 |
+A sample-specific estimate of the signal to noise ratio (SNR) |
|
230 |
+measuring the overall separation of the genotypes provides a measure |
|
231 |
+of sample quality. Samples with SNRs below 5 typically indicate poor |
|
232 |
+quality, and typically have genotypes with lower confidence scores and |
|
233 |
+noisier copy number estimates. The SNR is stored in the |
|
234 |
+\Robject{phenoData} slot of the \Rclass{CNSet} class and can be |
|
235 |
+accessed using the ``\$" operator. |
|
236 |
+ |
|
187 | 237 |
|
188 | 238 |
\section{Session information} |
189 | 239 |
<<sessionInfo, results=tex>>= |
... | ... |
@@ -196,15 +246,14 @@ toLatex(sessionInfo()) |
196 | 246 |
\includegraphics[width=0.6\textwidth]{AffyGW-snr.pdf} |
197 | 247 |
\caption{The signal to noise ratio (SNR) for 180 HapMap samples. For |
198 | 248 |
Affymetrix platforms, SNR values below 5 can indicate possible |
199 |
- problems with sample quality. In some circumstances, it may be |
|
200 |
- more helpful to exclude samples with poor DNA quality.} |
|
249 |
+ problems with sample quality. } |
|
201 | 250 |
\end{center} |
202 | 251 |
\end{figure} |
203 | 252 |
|
204 | 253 |
|
205 |
-\begin{bibliography} |
|
254 |
+%\begin{bibliography} |
|
206 | 255 |
\bibliographystyle{plain} |
207 | 256 |
\bibliography{refs} |
208 |
-\end{bibliography} |
|
257 |
+%\end{bibliography} |
|
209 | 258 |
|
210 | 259 |
\end{document} |
* collab: (34 commits)
revert change to IlluminaPreprocessCN
fix bug in isValidCdfName
print warning when all features in a batch of probes are flagged, but allow processing to continue
add utility cleancdfnames
Add validCdfNames.Rd
export validCdfNames
imputeGender fix when chromosome Y not available
Use splitIndicesByLength(index, ocSamples/getDoParWorkers())
Can not allocate vector of size XG with genotype.Illumina. Use splitIndicesByNode() only if the length of the list is greater than the split from splitIndicesByLength(). Otherwise, split by length using ocSamples()
update .gitignore
Add make.unique for sampleSheet$Sample_ID in readIdatFiles
bug in description
ensure sample ids stored in samplesheet are unique when constructing cnSet object
update oligoClasses dependency
update unit test for genotype.Illumina
revert change in constructInf call from genotype.Illumina
Update genotype.Rd
edit ACN function
1.15.6 use make.unique(basename(arrayNames)) to allow processing of Illumina samples with duplicated barcodes
check that sample identifies are unique in crlmm function
...
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@67435 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -47,9 +47,11 @@ options(continue=" ", width=70) |
47 | 47 |
\section{Set up} |
48 | 48 |
|
49 | 49 |
<<cdfname, results=hide>>= |
50 |
-library(crlmm) |
|
51 |
-library(ff) |
|
52 |
-library(cacheSweave) |
|
50 |
+library(oligoClasses) |
|
51 |
+library2(crlmm) |
|
52 |
+library2(ff) |
|
53 |
+if(!exists("useCache")) useCache <- TRUE |
|
54 |
+if(useCache) library2(cacheSweave) |
|
53 | 55 |
@ |
54 | 56 |
|
55 | 57 |
This vignette analyzes HapMap samples assayed on the Affymetrix 6.0 |
... | ... |
@@ -67,8 +69,11 @@ cdfName <- "genomewidesnp6" |
67 | 69 |
|
68 | 70 |
The HapMap CEL files are stored in a local directory assigned to |
69 | 71 |
\verb+pathToCels+ in the following code. The genotyping step will |
70 |
-create several files with \verb+ff+ extensions. We will store these |
|
71 |
-files to the path indicated by \verb+outdir+. |
|
72 |
+create several files with \verb+ff+ extensions. The ff objects contain |
|
73 |
+the low-level, normalized intensities as well as parameters used to |
|
74 |
+subsequently estimate copy number and B allele frequencies. These |
|
75 |
+files should not be deleted or moved. We will store these files to |
|
76 |
+the path indicated by \verb+outdir+. |
|
72 | 77 |
|
73 | 78 |
<<setup>>= |
74 | 79 |
pathToCels <- "/thumper/ctsa/snpmicroarray/hapmap/raw/affy/1m" |
... | ... |
@@ -86,7 +91,7 @@ ldPath(outdir) |
86 | 91 |
|
87 | 92 |
% only needed if cacheing |
88 | 93 |
<<cachedir, echo=FALSE>>= |
89 |
-setCacheDir(outdir) |
|
94 |
+if(useCache) setCacheDir(outdir) |
|
90 | 95 |
@ |
91 | 96 |
|
92 | 97 |
The \R{} functions \Rfunction{ocProbesets} and \Rfunction{ocSamples} |
... | ... |
@@ -107,6 +112,10 @@ Yoruban ('Y') samples. |
107 | 112 |
<<celfiles>>= |
108 | 113 |
celFiles <- list.celfiles(pathToCels, full.names=TRUE, pattern=".CEL") |
109 | 114 |
celFiles <- celFiles[substr(basename(celFiles), 13, 13) %in% c("C", "Y")] |
115 |
+if(exists("file.index")){ |
|
116 |
+ celFiles <- celFiles[file.index] |
|
117 |
+} |
|
118 |
+##celFiles <- celFiles[substr(basename(celFiles), 13, 13) %in% "C"] |
|
110 | 119 |
@ |
111 | 120 |
|
112 | 121 |
Finally, copy number analyses using \crlmm{} require specification of |
... | ... |
@@ -140,7 +149,7 @@ confidence score for the genotype calls. The function |
140 | 149 |
\Rfunction{genotype} performs both the preprocessing and genotyping. |
141 | 150 |
|
142 | 151 |
<<LDS_genotype, cache=TRUE>>= |
143 |
-cnSet <- genotype(celFiles, batch=plates, cdfName=cdfName) |
|
152 |
+cnSet <- genotype(celFiles, batch=plates, cdfName=cdfName, genome="hg19") |
|
144 | 153 |
@ |
145 | 154 |
|
146 | 155 |
Segment faults that occur with the above step can often be traced to a |
... | ... |
@@ -169,22 +178,10 @@ genotype calls are stored on disk rather than in active memory. |
169 | 178 |
print(object.size(cnSet), units="Mb") |
170 | 179 |
@ |
171 | 180 |
|
172 |
-We save the \Robject{cnSet} object in a local directory for subsequent |
|
173 |
-copy number analysis in the \verb+copynumber+ vignette. |
|
174 |
- |
|
175 |
-<<save,cache=TRUE>>= |
|
176 |
-saveObject <- function(outdir, cnSet) { |
|
177 |
- save(cnSet, file=file.path(outdir, "cnSet.rda")) |
|
178 |
- TRUE |
|
179 |
-} |
|
180 |
-(cnset.saved <- saveObject(outdir, cnSet)) |
|
181 |
-@ |
|
182 |
- |
|
183 | 181 |
%Users can proceed to the \verb+copynumber+ vignette for copy number |
184 | 182 |
%analyses. See the \verb+Infrastructure+ vignette for additional |
185 | 183 |
%details on the \Robject{CNSet} class, including an overview of the |
186 | 184 |
%available accessors. |
187 |
- |
|
188 | 185 |
\SweaveInput{copynumber} |
189 | 186 |
|
190 | 187 |
|
* collab:
replace splitIndicesByLength with splitIndicesByNode throughout cnrma-functions.R (check that snow is loaded and getCluster is not null)
add an example to genotype.Illumina that indicates how parallelization would be enabled. The example requires local data and is not run.
change outdir in IlluminaPreprocessCN and AffyGW
Update R/crlmm-illumina.R
bump version for parallelization of genotype.Illumina
Update R/crlmm-illumina.R
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@64151 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -72,10 +72,7 @@ files to the path indicated by \verb+outdir+. |
72 | 72 |
|
73 | 73 |
<<setup>>= |
74 | 74 |
pathToCels <- "/thumper/ctsa/snpmicroarray/hapmap/raw/affy/1m" |
75 |
-if(getRversion() < "2.13.0"){ |
|
76 |
- rpath <- getRversion() |
|
77 |
-} else rpath <- "trunk" |
|
78 |
-outdir <- paste("/amber1/archive/oncbb/rscharpf/crlmm_vignette/", rpath, "/gw6/", sep="") |
|
75 |
+outdir <- paste("/local_data/r00/crlmm/", getRversion(), "/affy_vignette", sep="") |
|
79 | 76 |
dir.create(outdir, recursive=TRUE, showWarnings=FALSE) |
80 | 77 |
@ |
81 | 78 |
|
* mymac:
add AffyGW.pdf
update vignettes in inst/scripts
Change argument of validCEL to celfiles
Update constructInf to accommodate GenomeDataFrame class for featureData
bump version to 1.13.7
Add doRUnit.R
Add celfile-utils.Rd
Streamlne some of the Rd files
add validCEL function that checks whether all celfiles can be read
getFeatureData returns GenomeAnnotatedDataFrame
Remove imports from methods. Remove pdf of illumina_copynumber.pdf (large file) and copynumber.pdf
getFeatureDAta returns GenomeAnnotatedDataFrame
Remove separate vignette for copy number in inst/scripts. Include copynumber section in both affy and illumina pipelines.
update documentation files for genotype.Illumina, preprocessInf, and genotypeInf (cdfName added as argument. Indicate that 'batch' should be a character string)
pass cdfName to genotypeInf and preprocessInf
add unitTests and cn-functions for 'simple usage'
Combine AffyPreprocess and copynumber. Combine IlluminaPreprocess and copynumber
remove depency on ff to allow installation on my mac
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/crlmm@62108 bc3139a8-67e5-0310-9ffc-ced21a209358
1 | 1 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,216 @@ |
1 |
+%\VignetteIndexEntry{Preprocessing and genotyping Affymetrix arrays |
|
2 |
+%for copy number analysis} |
|
3 |
+%\VignetteDepends{crlmm, genomewidesnp6Crlmm, cacheSweave, ff} |
|
4 |
+%\VignetteKeywords{crlmm, SNP 6, copy number, SNP} |
|
5 |
+%\VignettePackage{crlmm} |
|
6 |
+\documentclass{article} |
|
7 |
+\usepackage{graphicx} |
|
8 |
+\usepackage{natbib} |
|
9 |
+\usepackage{amsmath} |
|
10 |
+\usepackage{url} |
|
11 |
+\newcommand{\Rfunction}[1]{{\texttt{#1}}} |
|
12 |
+\newcommand{\Rmethod}[1]{{\texttt{#1}}} |
|
13 |
+\newcommand{\Rcode}[1]{{\texttt{#1}}} |
|
14 |
+\newcommand{\Robject}[1]{{\texttt{#1}}} |
|
15 |
+\newcommand{\Rpackage}[1]{{\textsf{#1}}} |
|
16 |
+\newcommand{\Rclass}[1]{{\textit{#1}}} |
|
17 |
+\newcommand{\oligo}{\Rpackage{oligo }} |
|
18 |
+\newcommand{\R}{\textsf{R}} |
|
19 |
+\newcommand{\crlmm}{\Rpackage{crlmm}} |
|
20 |
+\usepackage[margin=1in]{geometry} |
|
21 |
+ |
|
22 |
+\begin{document} |
|
23 |
+\title{Preprocessing \& Genotyping Affymetrix Arrays for Copy Number Analysis} |
|
24 |
+\date{\today} |
|
25 |
+\author{Rob Scharpf} |
|
26 |
+\maketitle |
|
27 |
+ |
|
28 |
+<<setup, echo=FALSE, results=hide>>= |
|
29 |
+options(continue=" ", width=70) |
|
30 |
+@ |
|
31 |
+ |
|
32 |
+%\section{Estimating copy number} |
|
33 |
+ |
|
34 |
+%At present, software for copy number estimation is provided only for the |
|
35 |
+%Affymetrix 6.0 platform. |
|
36 |
+ |
|
37 |
+\begin{abstract} |
|
38 |
+ |
|
39 |
+ This vignette describes the setup needed to analyze Affymetrix 6.0 |
|
40 |
+ (or 5.0) CEL files and the steps for preprocessing and |
|
41 |
+ genotyping. These steps must be completed prior to copy number |
|
42 |
+ analyses in \crlmm{}. After completing these steps, users can refer |
|
43 |
+ to the \verb+copynumber+ vignette. |
|
44 |
+ |
|
45 |
+\end{abstract} |
|
46 |
+ |
|
47 |
+\section{Set up} |
|
48 |
+ |
|
49 |
+<<cdfname, results=hide>>= |
|
50 |
+library(crlmm) |
|
51 |
+library(ff) |
|
52 |
+library(cacheSweave) |
|
53 |
+@ |
|
54 |
+ |
|
55 |
+This vignette analyzes HapMap samples assayed on the Affymetrix 6.0 |
|
56 |
+platform. The annotation package for this platform is |
|
57 |
+\Rpackage{genomewidesnp6Crlmm}. We assign the name of the annotation |
|
58 |
+package without the \verb+Crlmm+ postfix to the name \verb+cdfName+. |
|
59 |
+We use the \R{} package \Rpackage{cacheSweave} to cache long |
|
60 |
+computations in this vignette. Users should refer to the |
|
61 |
+\Rpackage{cacheSweave} package for additional details regarding |
|
62 |
+cacheing. |
|
63 |
+ |
|
64 |
+<<cdfname>>= |
|
65 |
+cdfName <- "genomewidesnp6" |
|
66 |
+@ |
|
67 |
+ |
|
68 |
+The HapMap CEL files are stored in a local directory assigned to |
|
69 |
+\verb+pathToCels+ in the following code. The genotyping step will |
|
70 |
+create several files with \verb+ff+ extensions. We will store these |
|
71 |
+files to the path indicated by \verb+outdir+. |
|
72 |
+ |
|
73 |
+<<setup>>= |
|
74 |
+pathToCels <- "/thumper/ctsa/snpmicroarray/hapmap/raw/affy/1m" |
|
75 |
+if(getRversion() < "2.13.0"){ |
|
76 |
+ rpath <- getRversion() |
|
77 |
+} else rpath <- "trunk" |
|
78 |
+outdir <- paste("/amber1/archive/oncbb/rscharpf/crlmm_vignette/", rpath, "/gw6/", sep="") |
|
79 |
+dir.create(outdir, recursive=TRUE, showWarnings=FALSE) |
|
80 |
+@ |
|
81 |
+ |
|
82 |
+By providing the path in \verb+outdir+ as an argument to the \R{} |
|
83 |
+function \Rfunction{ldPath}, all of the \verb+ff+ files created during |
|
84 |
+the genotyping step will be stored in \verb+outdir+. |
|
85 |
+ |
|
86 |
+<<ldpath>>= |
|
87 |
+ldPath(outdir) |
|
88 |
+@ |
|
89 |
+ |
|
90 |
+% only needed if cacheing |
|
91 |
+<<cachedir, echo=FALSE>>= |
|
92 |
+setCacheDir(outdir) |
|
93 |
+@ |
|
94 |
+ |
|
95 |
+The \R{} functions \Rfunction{ocProbesets} and \Rfunction{ocSamples} |
|
96 |
+manage the RAM required for our analysis. See the documentation for |
|
97 |
+these functions and the \verb+CopyNumberOverview+ vignette for |
|
98 |
+additional details. |
|
99 |
+ |
|
100 |
+<<ram>>= |
|
101 |
+ocProbesets(100000) |
|
102 |
+ocSamples(200) |
|
103 |
+@ |
|
104 |
+ |
|
105 |
+ |
|
106 |
+Next we indicate the local directory that contains the CEL files. For |
|
107 |
+the purposes of this vignette, we only analyze the CEPH ('C') and |
|
108 |
+Yoruban ('Y') samples. |
|
109 |
+ |
|
110 |
+<<celfiles>>= |
|
111 |
+celFiles <- list.celfiles(pathToCels, full.names=TRUE, pattern=".CEL") |
|
112 |
+celFiles <- celFiles[substr(basename(celFiles), 13, 13) %in% c("C", "Y")] |
|
113 |
+@ |
|
114 |
+ |
|
115 |
+Finally, copy number analyses using \crlmm{} require specification of |
|
116 |
+a batch variable that is used to indicate which samples were processed |
|
117 |
+together. For example, if some of the samples were processed in April |
|
118 |
+and another set of samples were processed in June, we could name the |
|
119 |
+batches 'April' and 'June', respectively. A useful surrogate for |
|
120 |
+batch is often the chemistry plate or the scan date of the array. For |
|
121 |
+the HapMap CEL files analyzed in this vignette, the CEPH (C) and |
|
122 |
+Yoruban (Y) samples were prepared on separate chemistry plates. In |
|
123 |
+the following code chunk, we extract the population identifier from |
|
124 |
+the CEL file names and assign these identifiers to the variable |
|
125 |
+\Robject{plate}. |
|
126 |
+ |
|
127 |
+<<plates>>= |
|
128 |
+plates <- substr(basename(celFiles), 13, 13) |
|
129 |
+@ |
|
130 |
+ |
|
131 |
+\section{Preprocessing and genotyping.} |
|
132 |
+ |
|
133 |
+The preprocessing steps for copy number estimation includes quantile |
|
134 |
+normalization of the raw intensities for each probe and a step that |
|
135 |
+summarizes the intensities of multiple probes at a single locus. For |
|
136 |
+example, the Affymetrix 6.0 platform has 3 or 4 identical probes at |
|
137 |
+each polymorphic locus and the normalized intensities are summarized |
|
138 |
+by a median. For the nonpolymorphic markers on Affymetrix 6.0, only |
|
139 |
+one probe per locus is available and the summarization step is not |
|
140 |
+needed. After preprocessing the arrays, the \crlmm{} package |
|
141 |
+estimates the genotype using the CRLMM algorithm and provides a |
|
142 |
+confidence score for the genotype calls. The function |
|
143 |
+\Rfunction{genotype} performs both the preprocessing and genotyping. |
|
144 |
+ |
|
145 |
+<<LDS_genotype, cache=TRUE>>= |
|
146 |
+cnSet <- genotype(celFiles, batch=plates, cdfName=cdfName) |
|
147 |
+@ |
|
148 |
+ |
|
149 |
+Segment faults that occur with the above step can often be traced to a |
|
150 |
+corrupt cel file. To check if any of the files are corrupt, try |
|
151 |
+reading the files in one at a time: |
|
152 |
+ |
|
153 |
+<<checkcorrupt,eval=FALSE>>= |
|
154 |
+validCEL(celFiles) |
|
155 |
+@ |
|
156 |
+ |
|
157 |
+ |
|
158 |
+The value returned by genotype is an instance of the class |
|
159 |
+\Robject{CNSet}. The normalized intensities, genotype calls, and |
|
160 |
+confidence scores are stored as \verb+ff+ objects in the |
|
161 |
+\verb+assayData+ slot. A concise summary of this object can be |
|
162 |
+obtained throught the \Rfunction{print} or \Rfunction{show} methods. |
|
163 |
+ |
|
164 |
+<<show>>= |
|
165 |
+print(cnSet) |
|
166 |
+@ |
|
167 |
+ |
|
168 |
+Note that the object is relatively small as the intensities and |
|
169 |
+genotype calls are stored on disk rather than in active memory. |
|
170 |
+ |
|
171 |
+<<objectsize>>= |
|
172 |
+print(object.size(cnSet), units="Mb") |
|
173 |
+@ |
|
174 |
+ |
|
175 |
+We save the \Robject{cnSet} object in a local directory for subsequent |
|
176 |
+copy number analysis in the \verb+copynumber+ vignette. |
|
177 |
+ |
|
178 |
+<<save,cache=TRUE>>= |
|
179 |
+saveObject <- function(outdir, cnSet) { |
|
180 |
+ save(cnSet, file=file.path(outdir, "cnSet.rda")) |
|
181 |
+ TRUE |
|
182 |
+} |
|
183 |
+(cnset.saved <- saveObject(outdir, cnSet)) |
|
184 |
+@ |
|
185 |
+ |
|
186 |
+%Users can proceed to the \verb+copynumber+ vignette for copy number |
|
187 |
+%analyses. See the \verb+Infrastructure+ vignette for additional |
|
188 |
+%details on the \Robject{CNSet} class, including an overview of the |
|
189 |
+%available accessors. |
|
190 |
+ |
|
191 |
+\SweaveInput{copynumber} |
|
192 |
+ |
|
193 |
+ |
|
194 |
+\section{Session information} |
|
195 |
+<<sessionInfo, results=tex>>= |
|
196 |
+toLatex(sessionInfo()) |
|
197 |
+@ |
|
198 |
+ |
|
199 |
+ |
|
200 |
+\begin{figure}[f] |
|
201 |
+ \begin{center} |
|
202 |
+ \includegraphics[width=0.6\textwidth]{AffyGW-snr.pdf} |
|
203 |
+ \caption{The signal to noise ratio (SNR) for 180 HapMap samples. For |
|
204 |
+ Affymetrix platforms, SNR values below 5 can indicate possible |
|
205 |
+ problems with sample quality. In some circumstances, it may be |
|
206 |
+ more helpful to exclude samples with poor DNA quality.} |
|
207 |
+\end{center} |
|
208 |
+\end{figure} |
|
209 |
+ |
|
210 |
+ |
|
211 |
+\begin{bibliography} |
|
212 |
+ \bibliographystyle{plain} |
|
213 |
+ \bibliography{refs} |
|
214 |
+\end{bibliography} |
|
215 |
+ |
|
216 |
+\end{document} |