
Updating DESCRIPTION, ucscannot.R, and GMRP.Rnw files after syncing my GitHub and Bioconductor repos

y.tan authored on 12/05/2018 17:57:17
Showing 4 changed files

@@ -12,7 +12,7 @@ Description: Perform Mendelian randomization analysis of multiple SNPs
 License: GPL (>= 2)
 Depends: R(>= 3.3.0),stats,utils,graphics, grDevices, diagram, plotrix,
         base,GenomicRanges
-Suggests: BiocStyle, BiocGenerics, VariantAnnotation
+Suggests: BiocStyle, BiocGenerics
 LazyLoad: yes
 biocViews: Sequencing, Regression, SNP
 NeedsCompilation: no
@@ -40,13 +40,18 @@ stop("No Symbol found in the data")
  annot<-c("exon","intron","intergene","5'UTR","5'upstream","3'UTR","3'downstream")
  lbls <- paste(annot, "\n", round(res2*100,1),"%", sep="")
 # oldcexmain<-par(cex.main=A)
- pie3D(res2,labelcex=B,labels=lbls,labelcol=par("fg"), explode=0.1,labelrad=C,main=main)
- }else if(method==2){
-  annot<-c("intron","exon","3'downstream","3'UTR","5'upstream","5'UTR","intergene")
-  lbls <- paste(round(res2*100,1),"%", sep="")
- # cols=c("gold2","red","magenta","blue","cornflowerblue","limegreen","mediumseagreen")
- pie3D(res2,labelcex=(B+0.5),labels=lbls,labelcol=par("fg"), explode=0.08,labelrad=C,main=main)
- if (B>=1.8){
+bisectors<-pie3D(res2,explode=0.1,main=main)
+ pie3D.labels(radialpos=bisectors,radius=1,height=0.1,theta=pi/6,
+       labels=lbls,labelcol=par("fg"),labelcex=B,labelrad=C,minsep=0.3)
+  }else if(method==2){
+        annot<-c("intron","exon","3'downstream","3'UTR","5'upstream","5'UTR","intergene")
+          lbls <- paste(round(res2*100,1),"%", sep="")
+           # cols=c("gold2","red","magenta","blue","cornflowerblue","limegreen","mediumseagreen")
+            bisectors<-pie3D(res2,explode=0.08,main=main)
+            pie3D.labels(radialpos=bisectors,radius=1,height=0.1,theta=pi/6,
+                  labels=lbls,labelcol=par("fg"),labelcex=(B+0.5),labelrad=C,minsep=0.3)
+
+if (B>=1.8){
  D<-1.8
  }else{
  D<-B}
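The updated ucscannot.R separates drawing from labelling. In plotrix, pie3D() returns the bisectors of the pie sectors in radians, and pie3D.labels() places one label per bisector. The minsep argument keeps neighbouring labels apart. The standalone sketch below reproduces that pattern outside the package. Some assumptions apply: the proportions in res2 and the title are invented for illustration, and plotting is skipped when plotrix is not installed. The expression min(B, 1.8) is an equivalent form of the if (B>=1.8) clamp shown in the hunk.

```r
# Invented proportions of SNPs per gene element (illustration only; sum to 1)
res2 <- c(0.05, 0.45, 0.30, 0.03, 0.07, 0.06, 0.04)
annot <- c("exon", "intron", "intergene", "5'UTR", "5'upstream", "3'UTR", "3'downstream")
# Labels built as in method = 1: element name, newline, percentage
lbls <- paste(annot, "\n", round(res2 * 100, 1), "%", sep = "")

B <- 1.5   # label size (ucscannot default)
C <- 1.3   # labelrad distance (ucscannot default after this commit)
# Same result as: if (B >= 1.8) D <- 1.8 else D <- B
D <- min(B, 1.8)

if (requireNamespace("plotrix", quietly = TRUE)) {
  # pie3D() draws the pie and returns the sector bisectors (radians)
  bisectors <- plotrix::pie3D(res2, explode = 0.1, main = "SNP gene elements")
  # pie3D.labels() places one label per bisector; radius, height and theta
  # should match the geometry used by pie3D() (these are its defaults)
  plotrix::pie3D.labels(radialpos = bisectors, radius = 1, height = 0.1,
                        theta = pi / 6, labels = lbls, labelcol = par("fg"),
                        labelcex = B, labelrad = C, minsep = 0.3)
}
```

Keeping the label call separate means label size, distance and spacing can be tuned without redrawing the pie with different arguments.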
@@ -3,9 +3,10 @@
 ###################################################
 ### code chunk number 1: <style-Sweave
 ###################################################
-BiocStyle::latex()
+#BiocStyle::latex()
+BiocStyle::latex(use.unsrturl=FALSE)
 
-#library("knitr")
+#library(knitr)
 #opts_chunk$set(tidy=FALSE,dev="pdf",fig.show="hide",
  #              fig.width=4,fig.height=4.5,
   #             dpi=300,# increase dpi to avoid pixelised pngs
@@ -4,22 +4,29 @@
 % To compile this document
 % library('cacheSweave');rm(list=ls());Sweave('GMRP.Rnw',driver=cacheSweaveDriver());system("pdflatex GMRP")
 
-\documentclass[a4paper]{article}
+\documentclass{article}
+
+%\usepackage[authoryear,round]{natbib}
+ 
+<<style, echo=FALSE, results=tex>>=
+BiocStyle::latex(use.unsrturl=FALSE)
+@
 
 \title{GWAS-based Mendelian Randomization Path Analysis}
-\author{Yuan-De Tan \ \
+\author{Yuan-De Tan \\
 \texttt{tanyuande@gmail.com}}
 
-<<<style-Sweave, eval=TRUE, echo=FALSE, results=tex>>=
-BiocStyle::latex()
-@ 
+%<<<style-Sweave, eval=TRUE, echo=FALSE, results=tex>>=
+%BiocStyle::latex()
+%@ 
 
 \begin{document}
 
 \maketitle
 
 \begin{abstract}
-\Rpackage{GMRP} can perform analyses of Mendelian randomization (\emph{MR}),correlation, path of causal variables onto disease of interest and \emph{SNP} annotation analysis. \emph{MR} includes \emph{SNP} selection with given criteria and regression analysis of causal variables on the disease to generate beta values of causal variables on the disease. Using the beta vectors, \Rpackage{GMRP} performs correlation and path analyses to construct  path diagrams of causal variables to the disease. \Rpackage{GMRP} consists of 8 $R$ functions: \Rfunction{chrp},\Rfunction{fmerge},\Rfunction{mktable},\Rfunction{pathdiagram},\Rfunction{pathdiagram2},\Rfunction{path},\Rfunction{snpPositAnnot},\Rfunction{ucscannot} and 5 datasets: \Robject{beta.data,cad.data,lpd.data,SNP358.data,SNP368annot.data}. \Rfunction{chrp} is used to separate string vector \texttt{hg19} into two numeric vectors: chromosome number and \emph{SNP} chromosome position. Function \Rfunction{fmerge} is used to merge two \emph{GWAS} result datasets into one dataset. Function \Rfunction{mktable} performs \emph{SNP} selection and creates a standard beta table for function \Rfunction{path} to do \emph{MR} and path analyses. Function \Rfunction{pathdiagram} is used to create a path diagram of causal variables onto a given disease or onto outcome. Function \Rfunction{pathdiagram2} can merge two-level \emph{pathdiagrams} into one nested \emph{pathdiagram} where inner \emph{pathdiagram} is a \emph{pathdiagram} of causal variables contributing to outcome and the outside \emph{pathdiagram} is a path diagram of causal variables including outcome onto the disease. The five datasets provide examples for running these functions. \Robject{lpd.data} and \Robject{cad.data} provide an example to create a standard beta dataset for path function to do path analysis and \emph{SNP} data for \emph{SNP} annotation analysis by performing \Rfunction{mktable} and \Rfunction{fmerge}. 
\Robject{beta.data} are a standard beta dataset for path analysis. \Robject{SNP358.data} provide an example for \Rfunction{snpPositAnnot} to do \emph{SNP} position annotation analysis and \Robject{SNP368annot.data} are for \Rfunction{ucscannot} to perform \emph{SNP} function annotation analysis.
+\Rpackage{GMRP} can perform analyses of Mendelian randomization (\emph{MR}),correlation, path of causal variables onto disease of interest and \emph{SNP} annotation analysis. \emph{MR} includes \emph{SNP} selection with given criteria and regression analysis of causal variables on the disease to generate beta values of causal variables on the disease. Using the beta vectors, \Rpackage{GMRP} performs correlation and path analyses to construct  path diagrams of causal variables to the
+disease. \Rpackage{GMRP} consists of 8 \emph{R} functions: \Rfunction{chrp},\Rfunction{fmerge},\Rfunction{mktable},\Rfunction{pathdiagram},\Rfunction{pathdiagram2},\Rfunction{path},\Rfunction{snpPositAnnot},\Rfunction{ucscannot} and 5 datasets: \Robject{beta.data,cad.data,lpd.data,SNP358.data,SNP368annot.data}. \Rfunction{chrp} is used to separate string vector \texttt{hg19} into two numeric vectors: chromosome number and \emph{SNP} chromosome position. Function \Rfunction{fmerge} is used to merge two \emph{GWAS} result datasets into one dataset. Function \Rfunction{mktable} performs \emph{SNP} selection and creates a standard beta table for function \Rfunction{path} to do \emph{MR} and path analyses. Function \Rfunction{pathdiagram} is used to create a path diagram of causal variables onto a given disease or onto outcome. Function \Rfunction{pathdiagram2} can merge two-level \emph{pathdiagrams} into one nested \emph{pathdiagram} where inner \emph{pathdiagram} is a \emph{pathdiagram} of causal variables contributing to outcome and the outside \emph{pathdiagram} is a path diagram of causal variables including outcome onto the disease. The five datasets provide examples for running these functions. \Robject{lpd.data} and \Robject{cad.data} provide an example to create a standard beta dataset for path function to do path analysis and \emph{SNP} data for \emph{SNP} annotation analysis by performing \Rfunction{mktable} and \Rfunction{fmerge}. \Robject{beta.data} are a standard beta dataset for path analysis. \Robject{SNP358.data} provide an example for \Rfunction{snpPositAnnot} to do \emph{SNP} position annotation analysis and \Robject{SNP368annot.data} are for \Rfunction{ucscannot} to perform \emph{SNP} function annotation analysis.
 \end{abstract}
 
 \tableofcontents
@@ -29,15 +36,16 @@ As an example of human disease, coronary artery disease (\emph{CAD}) is one of t
 
 \emph{MR} analysis can perfectly exclude confounding factors associated with disease.  However, when we expand one causal variable to many, \emph{MR} analysis becomes challenged and complicated because the genetic variant would have additional effects on the other risk factors, which violate assumption of no pleiotropy. An unknown genetic variant in \emph{MR} analysis possibly provides a false instrument for causal effect assessment of risk factors on the disease. The reason is that if this genetic variant is in \emph{LD} with another gene that is not used but has effect on the disease of study~\cite{Sheehan2007, Sheehan2010}. It then violates the third assumption.  These two problems can be addressed by using multiple instrumental variables.  For this reason, Do \emph{et al} (2013)developed statistic approach to address this issue~\cite{Do2013}. However, method of Do \emph{et al} ~\cite{Do2013} cannot disentangle correlation effects among the multiple undefined risk factors on the disease of study.  The beta values obtained from regression analyses are not direct causal effects because their effects are entangled with correlations among these undefined risk factors. 
 
-The best way to address the entanglement of multiple causal effects is path analysis that was developed by Wright~\cite{Wright1921, Wright1934}. This is because path analysis can dissect beta values into direct and indirect effects of causal variables on the disease. However, path analysis has not broadly been applied to diseases because diseases are usually binary variable. The method of Do ~\cite{Do2013} makes it possible to apply path analysis to disentangle causal effects of undefined risk factors on diseases. For doing so, we here provide \textbf {R package} \Rpackage{GMRP} (\emph{GWAS}-based \emph{MR} and \emph{path analysis}) to solve the above issues.
+The best way to address the entanglement of multiple causal effects is path analysis that was developed by Wright~\cite{Wright1921, Wright1934}. This is because path analysis can dissect beta values into direct and indirect effects of causal variables on the disease. However, path analysis has not broadly been applied to diseases because diseases are usually binary variable. The method of Do ~\cite{Do2013} makes it possible to apply path analysis to disentangle causal effects of undefined risk factors on diseases. For doing so, we here provide \textbf{R package} \Rpackage{GMRP} (\emph{GWAS}-based \emph{MR} and \emph{path analysis}) to solve the above issues.
 
 This vignette is intended to give a rapid introduction to the commands used in implementing \emph{MR} analysis, regression analysis, and path analysis, including \emph{SNP} annotation and chromosomal position analysis by means of the \Rpackage{GMRP} package. 
 
-We assume that user has the \emph{GWAS} result data from \emph{GWAS} analysis or \emph{GWAS} meta analysis of \emph{SNP}s associated with risk or confounding factors and a disease of study. If all studied causal variables of \emph{GWAS} data are separately saved in different sheet files, then files are assumed to have the same sheet format and they are required to be merged by using function \Rfunction{fmerg} into one sheet file without disease \emph{GWAS} data. After a standard beta table is created with \Rfunction{mktable}, user can use function \Rfunction{path} to perform \emph{RM} and path analyses.  Using the result of path analysis, user can draw path \Rfunction{plot}\textit{(pathdiagram)} with functions \Rfunction{pathdiagram} and \Rfunction{pathdiagram2}. These will be introduced in detail in the following examples.
+We assume that user has the \emph{GWAS} result data from \emph{GWAS} analysis or \emph{GWAS} meta analysis of \emph{SNP}s associated with risk or confounding factors and a disease of study. If all studied causal variables of \emph{GWAS} data are separately saved in different sheet files, then files are assumed to have the same sheet format and they are required to be merged by using function \Rfunction{fmerge} into one sheet file without disease \emph{GWAS} data. After a standard beta table is created with \Rfunction{mktable}, user can use function \Rfunction{path} to perform \emph{RM} and path analyses.  Using the result of path analysis, user can draw path \Rfunction{plot}\textit{(pathdiagram)} with functions \Rfunction{pathdiagram} and \Rfunction{pathdiagram2}. These will be introduced in detail in the following examples.
 
 We begin by loading the \Rpackage{GMRP} package.
 
 <<echo = false, results = hide>>=
+#library(knitr)
 set.seed(102)
 options(width = 90)
 @
@@ -48,9 +56,11 @@ library(GMRP)
 
 
 \section{Loading Data}
-\Rpackage{GMRP} provides five data files:   \Robject{beta.data}, \Robject{cad.data}, \Robject{lpd.data}, \Robject{SNP358.data} and \Robject{SNP368annot.data} where
+\Rpackage{GMRP} provides five data files:\Robject{beta.data}, \Robject{cad.data}, \Robject{lpd.data}, \Robject{SNP358.data} and \Robject{SNP368annot.data} where
 
-\Robject{lpd.data} was a subset (1069 SNPs) of four GWAS result datasets for \emph{LDL}, \emph{HDL}, \emph{TG} and \emph{TC}. These \emph{GWAS} result data sheets were downloaded from the website\footnote{\url{http://csg.sph.umich.edu//abecasis/public/lipids2013/} } where there are 120165 SNPs scattered on 23 chromosomes and 40 variables. Four GWAS result datasets for \emph{LDL}, \emph{HDL}, \emph{TG} and \emph{TC} were merged into one data sheet by using \Rfunction{fmerg}\textit{(fl1,fl2,ID1,ID2,A,B,method)} where \textit{fl1} and \textit{fl2} are two \emph{GWAS} result data sheets. $ID1$ and $ID2$ are key $id$ in files \textit{fl1} and \textit{fl2}, respectively, and required. $A$ and $B$ are postfix for \textit{fl1} and \textit{fl2}. Default values are $A$="" and $B$="". \textit{method} is method for merging . In the current version, there are four methods: \textit{method}="No" or "no" or "NO" or "N" or "n" means that  the data with unmatched \emph{SNP}s in \textit{file1} and \textit{file 2} are not saved in the merged file; \textit{method}="ALL" or "All" or "all" or "A" or "a" indicates that the data with all unmatched \emph{SNP}s in \textit{file 1} and \textit{file 2} are saved in the unpaired way in the merged data file; if \textit{method}="\textit{file1}", then those with unmatched \emph{SNP}s only from file1 are saved or if \textit{method}="\textit{file2}", \Rfunction{fmerg} will save the data with unmatched \emph{SNP}s only from \textit{file2}". Here is a simple example:
+\Robject{lpd.data} was a subset (1069 SNPs) of four GWAS result datasets for \emph{LDL}, \emph{HDL}, \emph{TG} and \emph{TC}. These \emph{GWAS} result data sheets were downloaded from the website\footnote{\url{http://csg.sph.umich.edu//abecasis/public/lipids2013/}} where there are 120165 SNPs on 23 chromosomes and 40 variables. Four GWAS result datasets for \emph{LDL}, \emph{HDL}, \emph{TG} and \emph{TC} were merged into one data sheet by
+using\Rfunction{fmerge}\textit{(fl1,fl2,ID1,ID2,A,B,method)} where \textit{fl1} and \textit{fl2} are two \emph{GWAS} result data sheets. $ID1$ and $ID2$ are key $id$ in files \textit{fl1} and \textit{fl2}, respectively, and required. $A$ and $B$ are respectily postfix for \textit{fl1} and \textit{fl2}. Default values are $A$="" and $B$="". \textit{method} is method for merging . In the current version, there are four methods: \textit{method}="No" or "no" or "NO" or "N" or "n" means that
+the data with unmatched \emph{SNP}s in \textit{file1} and \textit{file 2} are not saved in the merged file; \textit{method}="ALL" or "All" or "all" or "A" or "a" indicates that the data with all unmatched \emph{SNP}s in \textit{file 1} and \textit{file 2} are saved in the unpaired way in the merged data file; If \textit{method}=\textit{"file1"}, then those with unmatched \emph{SNP}s only from file1 are saved or if \textit{method}=\textit{"file2"}, \Rfunction{fmerge} will save the data with unmatched \emph{SNP}s only from \textit{file2}". Here is a simple example:
 
 <<fmerge,keep.source=TRUE, eval=FALSE>>=
 data1 <- matrix(NA, 20, 4)
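The four method values described for fmerge map onto the standard join types. The sketch below uses base R merge() on two invented three-SNP sheets to show which SNPs each option keeps. It is an analogy to the documented behaviour, not GMRP's fmerge code. The suffixes c(".A", ".B") stand in for the A and B postfixes, which fmerge sets to empty strings by default.

```r
# Two invented GWAS result sheets keyed by SNP ID (illustration only)
fl1 <- data.frame(SNP = c("rs1", "rs2", "rs3"), beta = c(0.10, -0.20, 0.30),
                  stringsAsFactors = FALSE)
fl2 <- data.frame(SNP = c("rs2", "rs3", "rs4"), beta = c(0.05, 0.40, -0.10),
                  stringsAsFactors = FALSE)

# method = "No": keep only SNPs matched in both sheets (inner join)
m_no <- merge(fl1, fl2, by.x = "SNP", by.y = "SNP", suffixes = c(".A", ".B"))
# method = "ALL": keep unmatched SNPs from both sheets, padded with NA (full join)
m_all <- merge(fl1, fl2, by.x = "SNP", by.y = "SNP", all = TRUE,
               suffixes = c(".A", ".B"))
# method = "file1": keep unmatched SNPs only from the first sheet (left join)
m_file1 <- merge(fl1, fl2, by.x = "SNP", by.y = "SNP", all.x = TRUE,
                 suffixes = c(".A", ".B"))
# method = "file2": keep unmatched SNPs only from the second sheet (right join)
m_file2 <- merge(fl1, fl2, by.x = "SNP", by.y = "SNP", all.y = TRUE,
                 suffixes = c(".A", ".B"))

m_all
```

For the lipid example, the inner-join case corresponds to method="No". In that case, only SNPs present in every merged sheet reach the later beta table.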
@@ -78,13 +88,13 @@ User can take the following approach to merge all four lipid files into a data s
 
 \textit{lpd<-fmerge(fl1=LDL\underline{}HDL,fl2=TG\underline{}TC,ID1="SNP",ID2="SNP",A="",B="",method="No")}
 
-\Robject{cad.data} was also a subset (1069 SNPs) of original GWAS meta-analyzed dataset that was downloaded from the website\footnote{\url{http://www.cardiogramplusc4d.org/downloads/} } and contains 2420360 \emph{SNP}s and 12 variables. 
+\Robject{cad.data} was also a subset (1069 SNPs) of original GWAS meta-analyzed dataset that was downloaded from the website\footnote{\url{http://www.cardiogramplusc4d.org/downloads/}} and contains 2420360 \emph{SNP}s and 12 variables. 
 
-\Robject{beta.data}  that was created by using function \Rfunction{mktable} and \Rfunction{fmerg} from \Robject{lpd.data} and \Robject{cad.data} is a standard beta table for \emph{MR} and path analyses.
+\Robject{beta.data}  that was created by using function \Rfunction{mktable} and \Rfunction{fmerge} from \Robject{lpd.data} and \Robject{cad.data} is a standard beta table for \emph{MR} and path analyses.
  
 \Robject{SNP358.data} contains 358 \emph{SNP}s selected by \Rfunction{mktable} for \emph{SNP} position annotation analysis.
 
-\Robject{SNP368annot.data} is the data obtained from function analysis with \href{http://snp-nexus.org/index.html}{\emph{SNP} Annotation Tool} and provides example of performing function \Rfunction{ucscanno} to draw a \texttt{3D} pie and output the results of proportions of \emph{SNP}s coming from gene function various elements. 
+\Robject{SNP368annot.data} is the data obtained from function analysis with \url{http://snp-nexus.org/index.html}{\emph{SNP} Annotation Tool} and provides example of performing function \Rfunction{ucscanno} to draw a \texttt{3D} pie and output the results of proportions of \emph{SNP}s coming from gene function various elements. 
   
 <<>>=
 data(cad.data)
@@ -184,7 +194,7 @@ sd.TC <- rep(42.74, length(pvj))
 sd <- cbind(sd.LDL, sd.HDL, sd.TG, sd.TC)
 @
 
-Step7:  \emph{SNPID} and position:
+Step7:  \emph{SNPID} and position are retrieved from \Robject{lpd} data:
 <<Step7, keep.source=TRUE, eval=FALSE>>=
 hg19 <- lpd$SNP_hg19.HDL
 rsid <- lpd$rsid.HDL
@@ -202,7 +212,7 @@ newdata<-cbind(chr,rsid,alle1,as.data.frame(newdata))
 dim(newdata)
 @
  
-Step10: retrieve data from \Robject{cad} and calculate $pdj$ and frequency of coronary artery disease: \textit{freq.case} in population:
+Step10: retrieve data from \Robject{cad} and calculate $pdj$ and frequency of coronary artery disease\textit{cad}, \textit{freq.case} in case population:
 <<Step10,keep.source=TRUE, eval=FALSE>>=
 hg18.d <- cad$chr_pos_b36
 SNP.d <- cad$SNP #SNPID
@@ -386,7 +396,7 @@ Note that in the current version, \Rpackage{GMRP} can just create two-level nest
 
 <<>>=
 data(SNP358.data)
-SNP358 <- DataFrame(SNP358.data)
+SNP358 <- as.data.frame(SNP358.data)
 head(SNP358)
 @
 
@@ -395,18 +405,15 @@ head(SNP358)
 library(graphics)
 @
 
-With SNP data \Robject{SNP358}, we can perform \emph{SNP} position annotation using function \Rfunction{snpposit}\textit{(SNPdata,SNP\underline{}hg19,LG,main,maxd)} where 
+With SNP data \Robject{SNP358}, we can perform \emph{SNP} position annotation using function \Rfunction{snpPositAnnot} \textit{(SNPdata,SNP\underline{}hg19,main)} where 
 
 \textit{SNPdata} is R object that may be \emph{hg19} that is a string vector(\textit{chr}\#\#.\#\#\#\#\#\#\#\#) or two numeric vectors (chromosome number and \emph{SNP} position).
 
-\textit{SNP\underline{}hg19} is a string parameter. It may be "\textit{hg19}" or "\textit{chr}". If \textit{SNP\underline{}hg19}="\textit{hg19}",then \textit{SNPdata} contains a string vector of \textit{hg19} or if \textit{SNP\underline{}hg19}="\textit{chr}", then \textit{SNPdata} consists of at lest two numeric columns: \textit{chr} and \textit{posit}. \textit{chr} is chromosome number and \textit{posit} is \emph{SNP} physical position on chromosomes. Note that "\textit{chr}" and "\textit{posit}" are required column names in \textit{SNPdata} if \textit{SNP\underline{}hg19} ="\textit{chr}".
-
-\emph{LG} is a numeric parameter that gives maximum permissible distance between positions. Its default is 10.
+\textit{SNP\underline{}hg19} is a string parameter. It may be \textit{"hg19"} or \textit{"chr"}. If \textit{SNP\underline{}hg19}=\textit{"hg19"},then \textit{SNPdata} contains a string vector of \textit{hg19} or if \textit{SNP\underline{}hg19}=\textit{"chr"}, then \textit{SNPdata} consists of at lest two numeric columns: \textit{chr} and \textit{posit}. \textit{chr} is chromosome number and \textit{posit} is \emph{SNP} physical position on chromosomes.
+Note that \textit{"chr"} and \textit{"posit"} are required column names in \textit{SNPdata} if \textit{SNP\underline{}hg19} =\textit{"chr"}.
 
 \textit{main} is a string which is title of graph. If no title is given, then man="". Its default is "A".
 
-\textit{maxd} is a numeric parameter that is maximum distance for truncating chromosome columns. If there are not big differences among 23 chromosomes, then \textit{maxd} can be set to be larger than 2000$kbp$. Its default is  2000$kbp$.
-
 <<fig=FALSE,keep.source=TRUE, label=ChromHistogram>>=
 snpPositAnnot(SNPdata=SNP358,SNP_hg19="chr",main="A")
 @
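When SNP_hg19="chr", SNPdata needs numeric columns named exactly chr and posit. The helper split_hg19() below is hypothetical. GMRP provides chrp for this job, and this sketch shows one way to derive that layout from hg19-style strings. It assumes the form chr<number>:<position> or chr<number>.<position>. The positions are invented. Non-numeric chromosome labels such as X would become NA with a warning.

```r
# Hypothetical helper (not GMRP's chrp): split hg19-style position strings
# such as "chr1:1000500" into numeric columns named chr and posit
split_hg19 <- function(hg19) {
  body  <- sub("^chr", "", hg19)                    # drop the "chr" prefix
  parts <- do.call(rbind, strsplit(body, "[:.]"))  # split at ':' or '.'
  data.frame(chr   = as.numeric(parts[, 1]),        # chromosome number
             posit = as.numeric(parts[, 2]))        # physical position (bp)
}

SNPdata <- split_hg19(c("chr1:1000500", "chr19:2500000", "chr2.750000"))
str(SNPdata)
# With GMRP attached, this layout matches the SNP_hg19 = "chr" case:
# snpPositAnnot(SNPdata = SNPdata, SNP_hg19 = "chr", main = "A")
```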
@@ -415,23 +422,23 @@ snpPositAnnot(SNPdata=SNP358,SNP_hg19="chr",main="A")
 <<label=figChromHistogram, fig=TRUE,echo=FALSE>>=
 <<ChromHistogram>>
 @
-\caption{ Chromosomal histogram of 358 selected \emph{SNP}s. Averaged lengths of \emph{SNP} intervals on chromosome mean that the \emph{SNP}s on a chromosome have their averaged lengths of intervals between them.  All averaged lengths over 2000kb on chromosomes were truncated, the \emph{SNP}s on these chromosomes have at least more than 2000$kbp$ averaged length of interval. Numbers above \textit{chr} columns are numbers of \emph{SNP} distributed on the chromosomes}
+\caption{ Chromosomal histogram of 358 selected \emph{SNP}s. Averaged lengths of \emph{SNP} intervals on chromosome mean that the \emph{SNP}s on a chromosome have their averaged lengths of intervals between them.  All averaged lengths over 2000kb on chromosomes were truncated, the \emph{SNP}s on these chromosomes have at least 2000$kbp$ length of interval. Numbers above \textit{chr} columns are numbers of \emph{SNP} distributed on the chromosomes}
 \label{figure4}
 \end{center}
 \end{figure}
 
 SNP function annotation analysis has two steps:
 
-Step 1: copy \emph{SNP ID}s selected to \textbf{Batch Query} box in\href{http://snp-nexus.org/index.html}{\emph{SNP} Annotation Tool}. After setting parameters and running by clicking \emph{run button}, SNP annotation result will be obtained after running for a while. Choose consequence sheet of \emph{UCSC} and copy the results to excel sheet,"\emph{Predicted function}" column name is changed to "\emph{function\underline{}unit}" name and save it as \textit{csv} format. 
+Step 1: copy \emph{SNP ID}s selected to \textbf{Batch Query} box in\href{http://snp-nexus.org/index.html}{\emph{SNP} Annotation Tool}. After setting parameters and running by clicking \emph{run button}, SNP annotation result will be obtained after running for a while. Choose consequence sheet of \emph{UCSC} and copy the results to excel sheet,\emph{"Predicted function"} column name is changed to \emph{"function\underline{}unit"} name and save it as \textit{csv} format. 
 
 Step2: input the \textit{csv} file into \emph{R Console} using \textbf{R} function \Rfunction{read.csv}. In \Rpackage{GMRP} package, we have provided data for \emph{SNP} function annotation analysis.
 <<>>=
 data(SNP368annot.data)
-SNP368<-DataFrame(SNP368annot.data)
+SNP368<-as.data.frame(SNP368annot.data)
 SNP368[1:10, ]
 @
 
-We perform function \Rfunction{ucscannot} to summarize proportions of SNPs coming from gene various elements such as code region, introns, etc, and then create 3D pie with \Rfunction{pie3D} of \Rpackage{plotrix}. 
+We perform function \Rfunction{ucscannot} to summarize proportions of SNPs coming from gene various elements such as code region, introns, etc, and then create 3D pie using \Rfunction{pie3D} of \Rpackage{plotrix}. 
 <<>>=
 library(plotrix)
 @
@@ -446,7 +453,7 @@ $A$ is numeric parameter for title size, default=2.5.
 
 $B$ is numeric parameter for label size, default=1.5.
 
-$C$ is numeric parameter for \emph{labelrad} distance,default=0.1.
+$C$ is numeric parameter for \emph{labelrad} distance,default=1.3.
 
 \textit{method} is numeric parameter for choosing figure output methods. It has two options: method=1 has no legend but color and pie components are labeled with gene elements, method=2 has legend over pie. The default = 1.