Updated vignette, manual pages, minor changes

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/metagenomeSeq@84540 bc3139a8-67e5-0310-9ffc-ced21a209358

Joseph Paulson authored on 14/12/2013 00:02:50
Showing 10 changed files

... ...
@@ -1,7 +1,7 @@
1 1
 Package: metagenomeSeq
2 2
 Title: Statistical analysis for sparse high-throughput sequencing
3
-Version: 1.5.27
4
-Date: 2013-11-20
3
+Version: 1.5.28
4
+Date: 2013-12-13
5 5
 Author: Joseph Nathaniel Paulson, Mihai Pop,  Hector Corrada Bravo
6 6
 Maintainer: Joseph N. Paulson <jpaulson@umiacs.umd.edu>
7 7
 Description: metagenomeSeq is designed to determine features (be it Operational
... ...
@@ -3,12 +3,12 @@
3 3
 #' 
4 4
 #' Using the featureData information in the MRexperiment, calling aggregateByTaxonomy on a
5 5
 #' MRexperiment and a particular featureData column (i.e. 'genus') will aggregate counts
6
-#' to the desired level by with the aggfun function (default colSums). Possible aggfun alternatives
6
+#' to the desired level using the aggfun function (default colSums). Possible aggfun alternatives
7 7
 #' include colMeans and colMedians.
8 8
 #' 
9 9
 #' @param obj A MRexperiment object.
10 10
 #' @param lvl featureData column name from the MRexperiment object.
11
-#' @param alternatelabel Use the rowname for undefined OTUs instead of aggregating to others.
11
+#' @param alternate Use the rowname for undefined OTUs instead of aggregating to "no_match".
12 12
 #' @param norm Whether to aggregate normalized counts or not.
13 13
 #' @param aggfun Aggregation function.
14 14
 #' @return An aggregated count matrix.
... ...
@@ -2,7 +2,7 @@
2 2
 #' 
3 3
 #' Calculates the number of estimated effective samples per feature from the output
4 4
 #' of a fitZig run. The estimated effective samples per feature is calculated as the
5
-#' \sum_1^n (n = number of samples) 1-z_i where z_i is the posterior probability a feature
5
+#' sum_1^n (n = number of samples) 1-z_i where z_i is the posterior probability a feature
6 6
 #' belongs to the technical distribution.
7 7
 #' 
8 8
 #' @param obj The output of fitZig run on a MRexperiment object.
... ...
@@ -6,7 +6,7 @@
6 6
 #' @aliases load_meta metagenomicLoader
7 7
 #' @param file Path and filename of the actual data file.
8 8
 #' @param sep File delimiter.
9
-#' @return An object of count data.
9
+#' @return A list with objects 'counts' and 'taxa'.
10 10
 #' @seealso \code{\link{load_phenoData}}
11 11
 #' @examples
12 12
 #' 
... ...
@@ -5,7 +5,7 @@
5 5
 #' 
6 6
 #' @aliases load_metaQ qiimeLoader
7 7
 #' @param file Path and filename of the actual data file.
8
-#' @return An object of count data.
8
+#' @return A list with 'counts' containing the count data, 'taxa' containing the otu annotation, and 'otus'.
9 9
 #' @seealso \code{\link{load_meta}} \code{\link{load_phenoData}}
10 10
 #' @examples
11 11
 #' 
... ...
@@ -5,7 +5,7 @@
5 5
 
6 6
 Using the featureData information in the MRexperiment, calling aggregateByTaxonomy on a
7 7
 MRexperiment and a particular featureData column (i.e. 'genus') will aggregate counts
8
-to the desired level by with the aggfun function (default colSums). Possible aggfun alternatives
8
+to the desired level using the aggfun function (default colSums). Possible aggfun alternatives
9 9
 include colMeans and colMedians.}
10 10
 \usage{
11 11
 aggregateByTaxonomy(obj, lvl, alternate = FALSE, norm = TRUE,
... ...
@@ -19,8 +19,8 @@ aggTax(obj, lvl, alternate = FALSE, norm = TRUE, aggfun = colSums)
19 19
   \item{lvl}{featureData column name from the MRexperiment
20 20
   object.}
21 21
 
22
-  \item{alternatelabel}{Use the rowname for undefined OTUs
23
-  instead of aggregating to others.}
22
+  \item{alternate}{Use the rowname for undefined OTUs
23
+  instead of aggregating to "no_match".}
24 24
 
25 25
   \item{norm}{Whether to aggregate normalized counts or
26 26
   not.}
... ...
@@ -37,7 +37,7 @@ level.
37 37
 Using the featureData information in the MRexperiment,
38 38
 calling aggregateByTaxonomy on a MRexperiment and a
39 39
 particular featureData column (i.e. 'genus') will aggregate
40
-counts to the desired level by with the aggfun function
40
+counts to the desired level using the aggfun function
41 41
 (default colSums). Possible aggfun alternatives include
42 42
 colMeans and colMedians.
43 43
 }
... ...
@@ -14,7 +14,7 @@ A list of the estimated effective samples per feature.
14 14
 \description{
15 15
 Calculates the number of estimated effective samples per
16 16
 feature from the output of a fitZig run. The estimated
17
-effective samples per feature is calculated as the \sum_1^n
17
+effective samples per feature is calculated as the sum_1^n
18 18
 (n = number of samples) 1-z_i where z_i is the posterior
19 19
 probability a feature belongs to the technical
20 20
 distribution.
... ...
@@ -11,7 +11,7 @@ load_meta(file, sep = "\\t")
11 11
   \item{sep}{File delimiter.}
12 12
 }
13 13
 \value{
14
-An object of count data.
14
+A list with objects 'counts' and 'taxa'.
15 15
 }
16 16
 \description{
17 17
 Load a matrix of OTUs in a tab delimited format
... ...
@@ -9,7 +9,8 @@ load_metaQ(file)
9 9
   \item{file}{Path and filename of the actual data file.}
10 10
 }
11 11
 \value{
12
-An object of count data.
12
+A list with 'counts' containing the count data, 'taxa'
13
+containing the otu annotation, and 'otus'.
13 14
 }
14 15
 \description{
15 16
 Load a matrix of OTUs in Qiime's format
... ...
@@ -12,7 +12,6 @@
12 12
 \bibliographystyle{unsrt}
13 13
 
14 14
 \begin{document}
15
-\SweaveOpts{concordance=TRUE}
16 15
 <<include=FALSE>>=
17 16
 require(knitr)
18 17
 opts_chunk$set(concordance=TRUE,tidy=TRUE)
... ...
@@ -20,7 +19,7 @@ opts_chunk$set(concordance=TRUE,tidy=TRUE)
20 19
 
21 20
 \title{{\textbf{\texttt{metagenomeSeq}: Statistical analysis for sparse high-throughput sequencing}}}
22 21
 \author{Joseph Nathaniel Paulson\\[1em]\\ Applied Mathematics $\&$ Statistics, and Scientific Computation\\ Center for Bioinformatics and Computational Biology\\ University of Maryland, College Park\\[1em]\\ \texttt{jpaulson@umiacs.umd.edu}}
23
-\date{Modified: November 19, 2013. Compiled: \today}
22
+\date{Modified: December 13, 2013. Compiled: \today}
24 23
 \maketitle
25 24
 \tableofcontents
26 25
 
... ...
@@ -34,6 +33,7 @@ set.seed(42)
34 33
 @
35 34
 
36 35
 \section{Introduction}
36
+\textbf{This is a vignette for pieces of an association study pipeline. For a full list of functions available in the package: help(package=metagenomeSeq). For more information about a particular function call: ?function.}
37 37
 
38 38
 Metagenomics is the study of genetic material targeted directly from an environmental community. 
39 39
 Originally focused on exploratory and validation projects, these studies now focus on understanding the differences in microbial communities caused by phenotypic differences. 
... ...
@@ -130,7 +130,7 @@ otu  = read.delim(file.path(dataDirectory,"CHK_otus.taxonomy.csv"),stringsAsFact
130 130
 
131 131
 As our OTUs appear to be in order with the count matrix we loaded earlier, the next step is to load phenodata. 
132 132
 
133
-Warning: features need to have the same names as the rows of the count matrix when we create the MRexperiment object for provenance purposes. 
133
+\textbf{Warning}: features need to have the same names as the rows of the count matrix when we create the MRexperiment object for provenance purposes. 
134 134
 
135 135
 \subsection{Loading metadata}
136 136
 Phenotype data can be optionally loaded into \texttt{R} with \texttt{load\_phenoData}. This function loads the data as a list.
... ...
@@ -143,7 +143,7 @@ head(clin[1:2,])
143 143
 @
144 144
 
145 145
 
146
-Warning: phenotypes must have the same names as the columns on the count matrix when we create the MRexperiment object for provenance purposes. 
146
+\textbf{Warning}: phenotypes must have the same names as the columns on the count matrix when we create the MRexperiment object for provenance purposes. 
147 147
 
148 148
 \subsection{Creating a \texttt{MRexperiment} object}
149 149
 
... ...
@@ -244,8 +244,10 @@ instance). Our linear model methodology can easily incorporate these
244 244
 confounding covariates in a straightforward manner. \texttt{fitZig} output includes weighted fits for each of the $m$ features. Results can be filtered and saved using \texttt{MRcoefs} or \texttt{MRtable}.
245 245
 
246 246
 \subsection{Example using fitZig for differential abundance testing}
247
-\textbf{Warning: The user should restrict significant features to those with a minimum number of positive samples. What this means is that one should not claim features are significant unless the effective number of samples is above a particular percentage. For example, fold-change estimates might be unreliable if an entire group does not have a positive count for the feature in question.}
248
-\textbf{We recommend the user remove features based on the number of estimated effective samples. We recommend removing features with less than the average number of effective samples in all features. See exporting fits (MRfulltable) on how to do this.}
247
+\textbf{Warning}: The user should restrict significant features to those with a minimum number of positive samples. What this means is that one should not claim features are significant unless the effective number of samples is above a particular percentage. For example, fold-change estimates might be unreliable if an entire group does not have a positive count for the feature in question.
248
+
249
+
250
+We recommend the user remove features based on the number of estimated effective samples; please see \texttt{calculateEffectiveSamples}. We recommend removing features with fewer than the average number of effective samples in all features. In essence, this means setting eff = .5 when using \texttt{MRcoefs}, \texttt{MRfulltable}, or \texttt{MRtable}.
249 251
 
250 252
 In our analysis of the lung microbiome data, we can remove features that are not present in many samples, controls, and calculate the normalization factors. The user needs to decide which metadata should be included in the linear model.
251 253
 
... ...
@@ -304,6 +306,14 @@ res = fitPA(mouseData[1:5,],cl=classes)
304 306
 head(res)
305 307
 @
306 308
 
309
+\newpage
310
+\section{Aggregating features}
311
+Normalization is recommended at the OTU level. However, functions are in place to aggregate 
312
+the count matrix (normalized or not), based on a particular user defined level. Using the
313
+featureData information in the MRexperiment object, calling \texttt{aggregateByTaxonomy} or \texttt{aggTax} on a MRexperiment object and declaring a particular featureData column name (i.e.
314
+'genus') will aggregate counts to the desired level with the aggfun function (default colSums). Possible aggfun alternatives include colMeans and colMedians.
315
+
316
+
307 317
 \newpage
308 318
 \section{Visualization of features}
309 319