Updated vignette.

[rcastelo] authored on 13/05/2021 16:53:11
Showing 2 changed files

... ...
@@ -1,5 +1,5 @@
 Package: GSVA
-Version: 1.39.29
+Version: 1.39.30
 Title: Gene Set Variation Analysis for microarray and RNA-seq data
 Authors@R: c(person("Justin", "Guinney", role=c("aut", "cre"), email="justin.guinney@sagebase.org"),
              person("Robert", "Castelo", role="aut", email="robert.castelo@upf.edu"),
... ...
@@ -63,8 +63,9 @@ Once `r Biocpkg("GSVA")` is installed, it can be loaded with the following comma
 library(GSVA)
 ```
 
-Given a gene expression data matrix with rows corresponding to genes and columns
-to samples, such as this one simulated from random Gaussian data:
+Given a gene expression data matrix, which we shall call `X`, with rows
+corresponding to genes and columns to samples, such as this one simulated from
+random Gaussian data:
 
 ```{r}
 p <- 10000 ## number of genes
... ...
@@ -75,8 +76,9 @@ X <- matrix(rnorm(p*n), nrow=p,
 X[1:5, 1:5]
 ```
 
-Given a collection of gene sets stored, for instance, in a `list` object such as
-this one with genes sampled uniformly at random without replacement into the gene sets:
+Given a collection of gene sets stored, for instance, in a `list` object, which
+we shall call `gs`, with genes sampled uniformly at random without replacement
+into 100 different gene sets:
 
 ```{r}
 ## sample gene set sizes
... ...
@@ -95,24 +97,26 @@ dim(gsva.es)
 gsva.es[1:5, 1:5]
 ```
 
-So, the first argument to the `gsva()` function is the gene expression data matrix
-and the second the collection of gene sets. The `gsva()` function can take the input
-expression data and gene sets using different specialized containers that facilitate
-the access and manipulation of molecular and phenotype data, as well as their associated
-metadata. Another advanced features include the use of on-disk and parallel backends to
-enable using GSVA on large molecular data sets and speed up computing time. You will
-find information on all these features in this vignette.
+The first argument to the `gsva()` function is the gene expression data matrix
+and the second the collection of gene sets. The `gsva()` function can take the
+input expression data and gene sets using different specialized containers that
+facilitate the access and manipulation of molecular and phenotype data, as well
+as their associated metadata. Another advanced features include the use of
+on-disk and parallel backends to enable, respectively, using GSVA on large
+molecular data sets and speed up computing time. You will find information on
+these features in this vignette.
 
 # Introduction
 
 Gene set variation analysis (GSVA) provides an estimate of pathway activity
-by transforming an input gene-by-sample expression data matrix
-into a gene-set-by-sample one. This resulting expression data matrix can be
-then used with classical analytical methods such as differential expression,
-classification, survival analysis, clustering or correlation analysis in a
-pathway-centric manner. One can also perform sample-wise comparisons between
-pathways and other molecular data types such as microRNA expression or binding
-data, copy-number variation (CNV) data or single nucleotide polymorphisms (SNPs).
+by transforming an input gene-by-sample expression data matrix into a
+corresponding gene-set-by-sample expression data matrix. This resulting
+expression data matrix can be then used with classical analytical methods such
+as differential expression, classification, survival analysis, clustering or
+correlation analysis in a pathway-centric manner. One can also perform
+sample-wise comparisons between pathways and other molecular data types such
+as microRNA expression or binding data, copy-number variation (CNV) data or
+single nucleotide polymorphisms (SNPs).
 
 The GSVA package provides an implementation of this approach for the following
 methods:
... ...
@@ -151,7 +155,7 @@ methods:
 
 The interested user may find full technical details about how these methods
 work in their corresponding articles cited above. If you use any of them in a
-publication, please cite it with the given bibliographic reference.
+publication, please cite them with the given bibliographic reference.
 
 # Overview of the GSVA functionality
 
... ...
@@ -197,13 +201,14 @@ the following filters:
    minimum and maximum size, which by default is one for the minimum size and
    has no limit for the maximum size.
 
-If, as a result of this filter, either no genes or gene sets are left, the
-`gsva()` function will prompt an error. A common cause for an error at this
-stage is that gene identifiers between the expression data matrix and the gene
-sets do not belong to the same standard nomenclature and could not be mapped,
-because either the input data were not provided using some of the specialized
-containers described above or the necessary metadata in those containers to
-successfully map gene identifiers is missing.
+If, as a result of applying these three filters, either no genes or gene sets
+are left, the `gsva()` function will prompt an error. A common cause for such
+an error at this stage is that gene identifiers between the expression data
+matrix and the gene sets do not belong to the same standard nomenclature and
+could not be mapped. This may happen because either the input data were not
+provided using some of the specialized containers described above or the
+necessary metadata in those containers that allows the software to successfully
+map gene identifiers, is missing.
 
 By default, the `gsva()` function employs the method described by
 @haenzelmann_castelo_guinney_2013 but this can be changed using the argument
... ...
@@ -253,12 +258,12 @@ continuous expression values.
253 258
 
254 259
 # Gene set definitions and gene identifier mapping
255 260
 
256
-Gene sets constitute a simple, yet useful, way to define pathways, essentially
257
-because we use pathway membership definitions only, neglecting the information
258
-on molecular interactions. Gene set definitions are a crucial input to any gene
259
-set enrichment analysis because if our gene sets do not capture the biological
261
+Gene sets constitute a simple, yet useful, way to define pathways because we
262
+use pathway membership definitions only, neglecting the information on molecular
263
+interactions. Gene set definitions are a crucial input to any gene set
264
+enrichment analysis because if our gene sets do not capture the biological
260 265
 processes we are studying, we will likely not find any relevant insights in our
261
-data.
266
+data from an analysis based on these gene sets.
262 267
 
263 268
 There are multiple sources of gene sets, the most popular ones being
264 269
 [The Gene Ontology (GO) project](http://geneontology.org) and