%\VignetteIndexEntry{crlmm copy number Vignette}
%\VignetteDepends{crlmm, genomewidesnp6Crlmm}
%\VignetteKeywords{crlmm, SNP 6}
%\VignettePackage{crlmm}
\documentclass{article}
\usepackage{graphicx}
\usepackage{natbib}
\usepackage{url}
\usepackage{amsmath,amssymb}
\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rcode}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textsf{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\oligo}{\Rpackage{oligo }}
\newcommand{\R}{\textsf{R}}
\newcommand{\crlmm}{\Rpackage{crlmm}}
\usepackage[margin=1in]{geometry}

\begin{document}
\title{Copy number estimation with \Rpackage{crlmm}}
\date{\today}
\author{Rob Scharpf}
\maketitle

<<setup, echo=FALSE, results=hide>>=
options(continue=" ", width=70)
@

\begin{abstract}

Copy number routines in the \crlmm{} package are available for
Affymetrix 5.0 and 6.0 platforms, as well as several Illumina
platforms.  This vignette assumes that the arrays have already been
successfully preprocessed and genotyped as per the instructions in
the \verb+AffymetrixPreprocessCN+ and \verb+IlluminaPreprocessCN+
vignettes for the Affymetrix and Illumina platforms,
respectively. While this vignette uses Affymetrix 6.0 arrays for
illustration, the steps at this point are identical for both
platforms.  See \citep{Scharpf2011} for details regarding the
methodology implemented in \crlmm{} for copy number analysis.  In
addition, a compendium describing copy number analysis using the
\crlmm{} package is available from the author's website:
\url{http://www.biostat.jhsph.edu/~rscharpf/crlmmCompendium/index.html}.

\end{abstract}

\section{Set up}

<<libraries,results=hide>>=
library(ff)
library(crlmm)
library(lattice)
library(cacheSweave)
if(getRversion() < "2.13.0"){
rpath <- getRversion()
} else rpath <- "trunk"
outdir <- paste("/thumper/ctsa/snpmicroarray/rs/ProcessedData/crlmm/", rpath, "/copynumber_vignette", sep="")
@

<<ram>>=
ldPath(outdir)
setCacheDir(outdir)
ocProbesets(150e3)
ocSamples(200)
@

\verb+AffymetrixPreprocessCN+ vignette.

@

\textbf{Limitations:} While a minimum number of samples is not
required for preprocessing and genotyping, copy number estimation in
the \crlmm{} package currently requires at least 10 samples per batch.
The parameter estimates for copy number and the corresponding
estimates of raw copy number will tend to be more noisy for batches
with small sample sizes (e.g., $<$ 50).  Chemistry plate or scan date
are often useful surrogates for batch.  Samples that were processed at
similar times (e.g., in the same month) can be grouped together in the
same batch.

\section{Quality control}

The signal to noise ratio (SNR) estimated by the CRLMM genotyping
algorithm is an overall measure of the separation of the diallelic
genotype clusters at polymorphic loci and can be a useful measure of
array quality.  Small SNR values can indicate possible problems with
the DNA.  Depending on the size of the dataset and the number of
samples with low SNR, users may wish to rerun the preprocessing and
genotyping steps after excluding samples with low SNR.  The SNR is
stored in the \verb+phenoData+ slot of the \Robject{CNSet} object and
is available after preprocessing and genotyping. SNR values below 5
for Affymetrix or below 25 for Illumina may indicate poor sample
quality.  The following code chunk makes a histogram of the SNR values
for the HapMap samples.

<<snr,fig=TRUE,include=FALSE,width=6, height=4>>=
invisible(open(cnSet$SNR)) snr <- cnSet$SNR[]
F <- sample(which(cnSet$gender[]==2), 5) cn.M <- CA(cnSet, i=npx.index, j=M) cn.F <- CA(cnSet, i=npx.index, j=F) @ \noindent Again, the function \Rfunction{totalCopynumber} is equivalent. <<npx2>>= cnX <- cbind(cn.M, cn.F) cnX2 <- totalCopynumber(cnSet, i=npx.index, j=c(M, F)) stopifnot(all.equal(cnX, cnX2)) @ <<nonpolymorphicX,fig=TRUE,include=FALSE, echo=FALSE, eval=FALSE>>= df <- data.frame(cn=as.numeric(cnX), id=factor(rep(sampleNames(cnSet)[c(M,F)], each=nrow(cnX)), levels=sampleNames(cnSet)[c(M,F)], ordered=T)) library(RColorBrewer) cols <- brewer.pal(8, "Accent")[c(1, 5)] print(bwplot(cn~id,df, panel=function(x,y,...){ panel.grid(v=-10,h=0) panel.bwplot(x,y, ...) panel.abline(h=1:2, col="grey70",lwd=2) }, scales=list(x=list(rot=90)), cex=0.5, ylab="total copy number", main="Chr X SNPs", fill=cols[cnSet$gender[c(M,F)]], key=mykey))
@

%\begin{figure}[t!]
% \centering
% \includegraphics[width=0.5\textwidth]{copynumber-nonpolymorphicX}
% \caption{Copy number estimates for nonpolymorphic loci on chromosome
%   X (5 men, 5 women).  We assume that the median copy number across
%   samples at a given marker on X is 1 for men and 2 for women.}
%\end{figure}

Polymorphic markers on chromosome X:

<<polymorphicX,fig=TRUE,include=FALSE>>=
library(RColorBrewer)
cols <- brewer.pal(8, "Accent")[c(1, 5)]
X.markers <- which(isSnp(cnSet) & chromosome(cnSet) == 23)
cnX <- totalCopynumber(cnSet, i=X.markers, j=c(M,F))
df <- data.frame(cn=as.numeric(cnX), id=factor(rep(sampleNames(cnSet)[c(M,F)], each=length(X.markers)), levels=sampleNames(cnSet)[c(M,F)], ordered=T))
mykey <- simpleKey(c("male", "female"), points=FALSE,col=cols)
print(bwplot(cn~id,df, panel=function(x,y,...){
panel.grid(v=-10,h=0)
panel.bwplot(x,y, ...)
panel.abline(h=1:2, col="grey70",lwd=2)
},
scales=list(x=list(rot=90)), cex=0.5, ylab="total copy number", main="Chr X SNPs",
fill=cols[cnSet\$gender[c(M,F)]], key=mykey))
@
\begin{figure}[t!]
\centering
\includegraphics[width=0.5\textwidth]{copynumber-polymorphicX}
\caption{Copy number estimates for polymorphic markers on chromosome
X. crlmm assumes that the median copy number across samples at a
given marker on X is 1 for men and 2 for women.}
\end{figure}

\subsection{A container for raw copy number}

A useful container for storing the \crlmm{} genotypes, genotype
confidence scores, and the total copy number at each marker is the
\Rclass{oligoSnpSet} class.  Coercion of a \Rclass{CNSet} object to a
\Rfunction{oligoSnpSet} object can be acheived by using the method
\Rfunction{as} (as illustrated below). Users should note that if the
\verb+assayData+ elements in the \Rclass{CNSet} instance are
\Rpackage{ff} objects, the \verb+assayData+ elements of the
instantiated \Rclass{oligoSnpSet} will also be \Rpackage{ff}-dervied
objects (a new \verb+total_cn*.ff+ file will be created in the
\verb+ldPath()+ directory).

%The advantage of this approach is that the raw copy number estimates
%can be computed for all markers and all samples simultaneously without
%excessive RAM.  The disadvantage is that such a step creates
%from disk.  For very large data sets (1000+ samples), the time
%required for I/O may outweigh the benefits.  The alternative is to
%perform downstream processing (such as smoothing) on the total copy
%number for subsets of markers and/or samples.  As indicated
%previously, the subset [' method applied to an object of class
%\Robject{CNSet} will return an object of the same class with simple
%matrices instead of \Rpackage{ff} objects.  Therefore, one can
%instantiate a \Rpackage{oligoSnpSet} object from a \Robject{CNSet}
%object without the I/O overhead for storing the total copy number
%estimates by first subsetting the \Rclass{CNSet} object.  One could
%then smooth the raw copy number estimates and store a summary copy
%number for the interval, thereby substantially reducing the
%dimensionality of the copy number estimates.  Examples of smoothing
%applications such as circular binary segmentation implemented in the
%\Rpackage{DNAcopy} package or a hidden Markov model implemented in the
%\Rpackage{VanillaICE} package are provided in the \verb+SmoothingCN+
%vignette.

<<oligoSnpSet>>=
open(cnSet3)
oligoSet <- as(cnSet3, "oligoSnpSet")
close(cnSet3)
class(copyNumber(oligoSet))
@

\noindent Note that the raw copy number estimates stored in the
\Robject{oligoSnpSet} object can be retrieved by the
\Rfunction{copyNumber} accessor and is equivalent to that returned by
the \Rfunction{totalCopynumber} function defined over the same row and
column indices.

<<testEqual>>=
total.cn3 <- totalCopynumber(cnSet3, i=1:nrow(cnSet3), j=1:ncol(cnSet3))
all.equal(copyNumber(oligoSet), total.cn3)
@

\section{Session information}
<<sessionInfo, results=tex>>=
toLatex(sessionInfo())
@

%\begin{bibliography}
\bibliographystyle{plain}
\bibliography{refs}
%\end{bibliography}

\end{document}