%\VignetteIndexEntry{crlmm Vignette - Illumina 370k chip} %\VignetteKeywords{genotype, crlmm, Illumina} %\VignettePackage{crlmm} \documentclass[12pt]{article} \usepackage{amsmath,pstricks} \usepackage[authoryear,round]{natbib} \usepackage{hyperref} \usepackage{Sweave} \textwidth=6.2in \textheight=8.5in %\parskip=.3cm \oddsidemargin=.1in \evensidemargin=.1in \headheight=-.3in \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\Rmethod}[1]{{\texttt{#1}}} \newcommand{\Rfunarg}[1]{{\texttt{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \textwidth=6.2in \bibliographystyle{plainnat} \begin{document} %\setkeys{Gin}{width=0.55\textwidth} \title{Using \Rpackage{crlmm} to genotype data from Illumina's Infinium BeadChips} \author{Matt Ritchie} \maketitle \section{Getting started} In this user guide we read in and genotype data from 40 HapMap samples which have been analyzed using Illumina's 370k Duo BeadChips. This data is available in the \Rpackage{hapmap370k} package. Additional chip-specific model parameters and basic SNP annotation information used by CRLMM is stored in the \Rpackage{human370v1cCrlmm} package. The required packages can be installed in the usual way using the \Rfunction{biocLite} function. <<echo=TRUE, results=hide, eval=FALSE>>= source("http://www.bioconductor.org/biocLite.R") biocLite(c("crlmm", "hapmap370k", "human370v1cCrlmm")) @ \section{Reading in data} The function \Rfunction{readIdatFiles} extracts the Red and Green intensities from the binary {\tt idat} files output by Illumina's scanning device. The file {\tt samples370k.csv} contains information about each sample. <<echo=FALSE, results=hide, eval=TRUE>>= options(width=70) @ <<read, results=hide, eval=TRUE>>= library(crlmm) library(ff) library(hapmap370k) data.dir = system.file("idatFiles", package="hapmap370k") # Read in sample annotation info samples = read.csv(file.path(data.dir, "samples370k.csv"), as.is=TRUE) samples[1:5,] @ \section{Genotyping} Next we use the function \Rfunction{crlmmIllumina} which performs preprocessing followed by genotyping using the CRLMM algorithm. It reads in data from idat files and stores results in a \Rclass{CNSet} object. <<genotype, results=hide, cache=TRUE>>= crlmmResult = crlmmIllumina(samples, path=data.dir, arrayInfoColNames=list(barcode=NULL,position="SentrixPosition"), saveDate=TRUE, cdfName="human370v1c", verbose=FALSE) class(crlmmResult) dim(crlmmResult) slotNames(crlmmResult) calls(crlmmResult)[1:10, 1:5] i2p(confs(crlmmResult)[1:10,1:5]) @ If GenCall output is available instead of idat files, the function \Rfunction{readGenCallOutput} can be used to read in the data. This function assumes the GenCall output is formatted to have samples listed one below the other, and that the columns 'X Raw' and 'Y Raw' are available in the file. The resulting \Robject{NChannelSet} from this function can be used as input to \Rfunction{crlmmIllumina} (or equivalently \code{genotype.Illumina}) via the \Robject{XY} argument (instead of the usual \Rfunction{RG} argument used when the data has been read in from idat files). Plotting the {\it SNR} reveals no obvious batch effects in this data set (different symbols are used for arrays scanned on different days). <<snr, fig=TRUE, width=8, height=6>>= plot(crlmmResult[["SNR"]], pch=scanbatch, xlab="Array", ylab="SNR", main="Signal-to-noise ratio per array",las=2) @ <<plotsamples, fig=TRUE, width=8, height=8>>= par(mfrow=c(2,2)) plotSamples(crlmmResult, col=1:4) @ <<plotsnps, fig=TRUE, width=8, height=8>>= par(mfrow=c(2,2)) plotSNPs(crlmmResult, row=1:4) @ \section{System information} This analysis was carried out on a linux machine with 32GB of RAM using the following packages: <<session>>= sessionInfo() @ \end{document}