%\VignetteIndexEntry{crlmm Vignette - Illumina 370k chip} %\VignetteKeywords{genotype, crlmm, Illumina} %\VignettePackage{crlmm} \documentclass[12pt]{article} \usepackage{amsmath,pstricks} \usepackage[authoryear,round]{natbib} \usepackage{hyperref} \usepackage{Sweave} \textwidth=6.2in \textheight=8.5in %\parskip=.3cm \oddsidemargin=.1in \evensidemargin=.1in \headheight=-.3in \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\Rmethod}[1]{{\texttt{#1}}} \newcommand{\Rfunarg}[1]{{\texttt{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \textwidth=6.2in \bibliographystyle{plainnat} \begin{document} %\setkeys{Gin}{width=0.55\textwidth} \title{Using \Rpackage{crlmm} to genotype data from Illumina's Infinium BeadChips} \author{Matt Ritchie} \maketitle \section{Getting started} In this user guide we read in and genotype data from 40 HapMap samples which have been analyzed using Illumina's 370k Duo BeadChips. This data is available in the \Rpackage{hapmap370k} package. Additional chip-specific model parameters and basic SNP annotation information used by CRLMM is stored in the \Rpackage{human370v1c} package. These can be downloaded from \href{http://rafalab.jhsph.edu/software.html}{http://rafalab.jhsph.edu/software.html} and must be installed for the following code to work. \section{Reading in data} The function \Rfunction{readIdatFiles} extracts the Red and Green intensities from the binary {\tt idat} files output by Illumina's scanning device. The file {\tt samples370k.csv} contains information about each sample. <<<echo=FALSE, results=hide, eval=TRUE>>= options(width=50) @ <<read>>= library(Biobase) library(crlmm) library(hapmap370k) data.dir = system.file("idatFiles", package="hapmap370k") # Read in sample annotation info samples = read.csv(file.path(data.dir, "samples370k.csv"), as.is=TRUE) samples[1:5,] @ <<read2, results=hide, cache=TRUE>>= # Read in .idats using sampleSheet information RG = readIdatFiles(samples, path=data.dir, arrayInfoColNames=list(barcode=NULL, position="SentrixPosition"), saveDate=TRUE) @ Reading in this data takes approximately 90 seconds and peak memory usage was 1.2 GB of RAM on our linux system. The \Robject{RG} object is an \Rclass{NChannelSet} which stores the Red and Green intensities, the number of beads and standard errors for each bead-type. The scanning date of each array is stored in \Robject{protocolData}. <<explore>>= class(RG) dim(RG) slotNames(RG) channelNames(RG) exprs(channel(RG, "R"))[1:5,1:5] exprs(channel(RG, "G"))[1:5,1:5] pd = pData(RG) pd[1:5,] scandatetime = strptime(protocolData(RG)[["ScanDate"]], "%m/%d/%Y %H:%M:%S %p") datescanned = substr(scandatetime, 1, 10) scanbatch = factor(datescanned) levels(scanbatch) = 1:16 scanbatch = as.numeric(scanbatch) @ Plots of the summarised data can be easily generated to check for arrays with poor signal. <<boxplots, fig=TRUE, width=8, height=8>>= par(mfrow=c(2,1), mai=c(0.4,0.4,0.4,0.1), oma=c(1,1,0,0)) boxplot(log2(exprs(channel(RG, "R"))), xlab="Array", ylab="", names=1:40, main="Red channel", outline=FALSE, las=2) boxplot(log2(exprs(channel(RG, "G"))), xlab="Array", ylab="", names=1:40, main="Green channel", outline=FALSE, las=2) mtext(expression(log[2](intensity)), side=2, outer=TRUE) mtext("Array", side=1, outer=TRUE) @ \section{Genotyping} Next we use the function \Rfunction{crlmmIllumina} which performs preprocessing followed by genotyping using the CRLMM algorithm. <<genotype, results=hide, cache=TRUE>>= crlmmResult = crlmmIllumina(RG=RG, cdfName="human370v1c", sns=pData(RG)$ID, returnParams=TRUE) @ This analysis took 470 seconds to complete and peak memory usage was 3.3 GB on our system. The output stored in \Robject{crlmmResult} is a \Rclass{SnpSet} object. <<explore2>>= class(crlmmResult) dim(crlmmResult) slotNames(crlmmResult) calls(crlmmResult)[1:10, 1:5] @ Plotting the {\it SNR} reveals no obvious batch effects in this data set (different symbols are used for arrays scanned on different days). <<snr, fig=TRUE, width=8, height=6>>= plot(crlmmResult[["SNR"]], pch=scanbatch, xlab="Array", ylab="SNR", main="Signal-to-noise ratio per array", las=2) @ \section{System information} This analysis was carried out on a linux machine with 32GB of RAM using the following packages: <<session>>= sessionInfo() @ \end{document}