
Updated vignette and added GHA.

Robert Castelo authored on 30/03/2021 17:42:22
Showing 4 changed files

1 1
new file mode 100644
... ...
@@ -0,0 +1 @@
1
+^\.github$
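The pattern added in this first hunk appears to be an R build-ignore entry (the file name is not shown in this view). R CMD build treats each such line as a Perl-like regular expression, matched case-insensitively against file and directory paths relative to the package root, and leaves matching entries out of the source tarball. So `^\.github$` drops the top-level `.github` directory together with everything under it. The sketch below uses `grep -E` only as a stand-in for R's own matching; the function name and the example paths are hypothetical.

```shell
# Print the paths that match the build-ignore pattern.
# grep -E is only a stand-in for the regex matching done by R CMD build.
matches_ignore() {
  printf '%s\n' "$@" | grep -E '^\.github$'
}

# Hypothetical package paths, relative to the package root.
matches_ignore .github .github/workflows/check-bioc.yml R/gsva.R
```

Only `.github` itself is printed. The workflow file does not match on its own, but it is still excluded because its parent directory is.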
0 2
new file mode 100644
... ...
@@ -0,0 +1,275 @@
1
+## Read more about GitHub Actions and the features of this GitHub Actions workflow
2
+## at https://lcolladotor.github.io/biocthis/articles/biocthis.html#use_bioc_github_action
3
+##
4
+## For more details, check the biocthis developer notes vignette at
5
+## https://lcolladotor.github.io/biocthis/articles/biocthis_dev_notes.html
6
+##
7
+## You can add this workflow to other packages using:
8
+## > biocthis::use_bioc_github_action()
9
+## or
10
+## > usethis::use_github_action("check-bioc", "https://bit.ly/biocthis_gha", "check-bioc.yml")
11
+## without having to install biocthis.
12
+##
13
+## Using GitHub Actions exposes you to many details about how R packages are
14
+## compiled and installed on several operating systems.
15
+## If you need help, please follow the steps listed at
16
+## https://github.com/r-lib/actions#where-to-find-help
17
+##
18
+## If you find an issue specific to biocthis's GHA workflow, please report it
19
+## with the information that will make it easier for others to help you.
20
+## Thank you!
21
+
22
+## Acronyms:
23
+## * GHA: GitHub Action
24
+## * OS: operating system
25
+
26
+on:
27
+  push:
28
+    branches:
29
+      - master
30
+      - 'RELEASE_*'
31
+  pull_request:
32
+    branches:
33
+      - master
34
+      - 'RELEASE_*'
35
+
36
+name: R-CMD-check-bioc
37
+
38
+## These environment variables control whether to run GHA code later on that is
39
+## specific to testthat, RUnit, covr, and pkgdown.
40
+##
41
+## If you need to clear the cache of packages, update the number inside
42
+## cache-version as discussed at https://github.com/r-lib/actions/issues/86.
43
+## Note that you can always run a GHA test without the cache by using the word
44
+## "/nocache" in the commit message.
45
+env:
46
+  has_testthat: 'false'
47
+  run_covr: 'true'
48
+  run_pkgdown: 'false'
49
+  has_RUnit: 'true'
50
+  cache-version: 'cache-v1'
51
+
52
+jobs:
53
+  ## This first job uses the GitHub repository branch name to infer what
54
+  ## version of Bioconductor we will be working on.
55
+  define-docker-info:
56
+    runs-on: ubuntu-latest
57
+    outputs:
58
+      imagename: ${{ steps.findinfo.outputs.imagename }}
59
+      biocversion: ${{ steps.findinfo.outputs.biocversion }}
60
+    steps:
61
+      - id: findinfo
62
+        run: |
63
+          ## Find what Bioconductor RELEASE branch we are working on
64
+          ## otherwise, assume we are working on bioc-devel.
65
+          if echo "$GITHUB_REF" | grep -q "RELEASE_"; then
66
+            biocversion="$(basename -- $GITHUB_REF | tr '[:upper:]' '[:lower:]')"
67
+          else
68
+            biocversion="devel"
69
+          fi
70
+          ## Define the image name and print the information
71
+          imagename="bioconductor/bioconductor_docker:${biocversion}"
72
+          echo $imagename
73
+          echo $biocversion
74
+
75
+          ## Save the information for the next job
76
+          echo "::set-output name=imagename::${imagename}"
77
+          echo "::set-output name=biocversion::${biocversion}"
78
+
79
+  R-CMD-check-bioc:
80
+    ## This job then checks the R package using the Bioconductor docker that
81
+    ## was defined by the previous job. This job will determine what version of
82
+    ## R to use for the macOS and Windows builds on the next job.
83
+    runs-on: ubuntu-latest
84
+    needs: define-docker-info
85
+
86
+    ## Name shown on the GHA log
87
+    name: ubuntu-latest (r-biocdocker bioc-${{ needs.define-docker-info.outputs.biocversion }})
88
+
89
+    ## Information used by the next job that will run on macOS and Windows
90
+    outputs: 
91
+      rversion: ${{ steps.findrversion.outputs.rversion }}
92
+      biocversionnum: ${{ steps.findrversion.outputs.biocversionnum }}
93
+
94
+    ## Environment variables unique to this job.
95
+    env:
96
+      R_REMOTES_NO_ERRORS_FROM_WARNINGS: true
97
+      TZ: UTC
98
+      NOT_CRAN: true
99
+      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
100
+      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
101
+
102
+    ## The docker container to use. Note that we link a directory on the GHA
103
+    ## runner to a docker directory, such that we can then cache the linked
104
+    ## directory. This directory will contain the R packages used.
105
+    container:
106
+      image: ${{ needs.define-docker-info.outputs.imagename }}
107
+      volumes:
108
+        - /home/runner/work/_temp/Library:/usr/local/lib/R/host-site-library
109
+    steps:
110
+
111
+      - name: Install latest git
112
+        run: |
113
+          ## git version provided
114
+          git --version
115
+          ## to be able to install software properties
116
+          sudo apt-get update -y
117
+          ## to be able to use add-apt-repository
118
+          sudo apt-get install software-properties-common -y
119
+          ## to use stable releases of git that are already in a PPA at
120
+          ## https://launchpad.net/~git-core/+archive/ubuntu/candidate
121
+          sudo add-apt-repository ppa:git-core/candidate -y
122
+          ## Update
123
+          sudo apt-get update -y
124
+          ## Upgrade git and other tools
125
+          sudo apt-get upgrade -y
126
+          ## latest git version
127
+          git --version
128
+        shell: bash {0}
129
+
130
+      ## Most of these steps are the same as the ones in
131
+      ## https://github.com/r-lib/actions/blob/master/examples/check-standard.yaml
132
+      ## If they update their steps, we will also need to update ours.
133
+      - uses: actions/checkout@v2
134
+
135
+      - name: Query dependencies
136
+        run: |
137
+          install.packages('remotes')
138
+          saveRDS(remotes::dev_package_deps(dependencies = TRUE), ".github/depends.Rds", version = 2)
139
+          message(paste('****', Sys.time(), 'installing BiocManager ****'))
140
+          remotes::install_cran("BiocManager")
141
+        shell: Rscript {0}
142
+
143
+      ## Find the corresponding R version based on the Bioconductor version
144
+      ## to use for the macOS and Windows checks by the next GHA job
145
+      - id: findrversion
146
+        name: Find Bioc and R versions
147
+        run: |
148
+          ## Find what branch we are working on
149
+          if echo "$GITHUB_REF" | grep -q "master"; then
150
+              biocversion="devel"
151
+          elif echo "$GITHUB_REF" | grep -q "RELEASE_"; then
152
+              biocversion="release"
153
+          fi
154
+
155
+          ## Define the R and Bioconductor version numbers
156
+          biocversionnum=$(Rscript -e "info <- BiocManager:::.version_map_get_online('https://bioconductor.org/config.yaml'); res <- subset(info, BiocStatus == '${biocversion}')[, 'Bioc']; cat(as.character(res))")
157
+          rversion=$(Rscript -e "info <- BiocManager:::.version_map_get_online('https://bioconductor.org/config.yaml'); res <- subset(info, BiocStatus == '${biocversion}')[, 'R']; cat(as.character(res))")
158
+
159
+          ## Print the results
160
+          echo $biocversion
161
+          echo $biocversionnum
162
+          echo $rversion
163
+
164
+          ## Save the info for the next job
165
+          echo "::set-output name=rversion::${rversion}"
166
+          echo "::set-output name=biocversionnum::${biocversionnum}"
167
+        shell:
168
+          bash {0}
169
+
170
+      - name: Cache R packages
171
+        if: "!contains(github.event.head_commit.message, '/nocache')"
172
+        uses: actions/cache@v1
173
+        with:
174
+          path: /home/runner/work/_temp/Library
175
+          key: ${{ env.cache-version }}-${{ runner.os }}-biocdocker-biocbranch-${{ needs.define-docker-info.outputs.biocversion }}-r-${{ steps.findrversion.outputs.rversion }}-bioc-${{ steps.findrversion.outputs.biocversionnum }}-${{ hashFiles('.github/depends.Rds') }}
176
+          restore-keys: ${{ env.cache-version }}-${{ runner.os }}-biocdocker-biocbranch-${{ needs.define-docker-info.outputs.biocversion }}-r-${{ steps.findrversion.outputs.rversion }}-bioc-${{ steps.findrversion.outputs.biocversionnum }}-
177
+
178
+      - name: Install dependencies
179
+        run: |
180
+          ## Try installing the package dependencies in steps. First the local
181
+          ## dependencies, then any remaining dependencies to avoid the
182
+          ## issues described at
183
+          ## https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016675.html
184
+          ## https://github.com/r-lib/remotes/issues/296
185
+          ## Ideally, all dependencies should get installed in the first pass.
186
+
187
+          ## Pass #1 at installing dependencies
188
+          message(paste('****', Sys.time(), 'pass number 1 at installing dependencies: local dependencies ****'))
189
+          local_deps <- remotes::local_package_deps(dependencies = TRUE)
190
+          deps <- remotes::dev_package_deps(dependencies = TRUE, repos = BiocManager::repositories())
191
+          BiocManager::install(local_deps[local_deps %in% deps$package[deps$diff != 0]])
192
+
193
+          ## Pass #2 at installing dependencies
194
+          message(paste('****', Sys.time(), 'pass number 2 at installing dependencies: any remaining dependencies ****'))
195
+          deps <- remotes::dev_package_deps(dependencies = TRUE, repos = BiocManager::repositories())
196
+          BiocManager::install(deps$package[deps$diff != 0])
197
+
198
+          ## For running the checks
199
+          message(paste('****', Sys.time(), 'installing rcmdcheck ****'))
200
+          remotes::install_cran("rcmdcheck")
201
+        shell: Rscript {0}
202
+
203
+      - name: Session info
204
+        run: |
205
+          options(width=100)
206
+          pkgs <- installed.packages()[, "Package"]
207
+          sessioninfo::session_info(pkgs, include_base=TRUE)
208
+        shell: Rscript {0}
209
+
210
+      - name: Check
211
+        env:
212
+          _R_CHECK_CRAN_INCOMING_: false
213
+        run: |
214
+          rcmdcheck::rcmdcheck(
215
+              args = c("--no-build-vignettes", "--no-manual", "--timings"),
216
+              build_args = c("--no-manual", "--no-resave-data"),
217
+              error_on = "error",
218
+              check_dir = "check"
219
+          )
220
+        shell: Rscript {0}
221
+
222
+      - name: Reveal testthat details
223
+        if:  env.has_testthat == 'true'
224
+        run: find . -name testthat.Rout -exec cat '{}' ';'
225
+
226
+      - name: Run RUnit tests
227
+        if:  env.has_RUnit == 'true'
228
+        run: |
229
+          ## Install BiocGenerics
230
+          BiocManager::install("BiocGenerics")
231
+          ## Install the package itself, otherwise BiocGenerics:::testPackage() doesn't find it
232
+          install.packages(".", repos=NULL)
233
+          BiocGenerics:::testPackage()
234
+        shell: Rscript {0}
235
+
236
+      - name: Install covr
237
+        if: github.ref == 'refs/heads/master' && env.run_covr == 'true'
238
+        run: |
239
+          remotes::install_cran("covr")
240
+        shell: Rscript {0}
241
+
242
+      - name: Test coverage
243
+        if: github.ref == 'refs/heads/master' && env.run_covr == 'true'
244
+        run: |
245
+          covr::codecov()
246
+        shell: Rscript {0}
247
+
248
+      - name: Install pkgdown
249
+        if: github.ref == 'refs/heads/master' && env.run_pkgdown == 'true'
250
+        run: |
251
+          remotes::install_github("r-lib/pkgdown")
252
+        shell: Rscript {0}
253
+
254
+      - name: Install package
255
+        if: github.ref == 'refs/heads/master' && env.run_pkgdown == 'true'
256
+        run: R CMD INSTALL .
257
+
258
+      - name: Deploy package
259
+        if: github.ref == 'refs/heads/master' && env.run_pkgdown == 'true'
260
+        run: |
261
+          git config --local user.email "action@github.com"
262
+          git config --local user.name "GitHub Action"
263
+          Rscript -e "pkgdown::deploy_to_branch(new_process = FALSE)"
264
+        shell: bash {0}
265
+        ## Note that you need to run pkgdown::deploy_to_branch(new_process = FALSE)
266
+        ## at least once locally before this will work. This creates the gh-pages
267
+        ## branch (erasing anything you haven't version controlled!) and
268
+        ## makes the git history recognizable by pkgdown.
269
+
270
+      - name: Upload check results
271
+        if: failure()
272
+        uses: actions/upload-artifact@master
273
+        with:
274
+          name: ${{ runner.os }}-biocdocker-biocbranch-${{ needs.define-docker-info.outputs.biocversion }}-r-${{ steps.findrversion.outputs.rversion }}-bioc-${{ steps.findrversion.outputs.biocversionnum }}-results
275
+          path: check
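Both jobs in this workflow derive a Bioconductor branch from `GITHUB_REF`, but in slightly different ways: the `findinfo` step picks the docker image tag, and the `findrversion` step picks the `BiocStatus` value used to look up the R and Bioconductor version numbers. The shell sketch below copies the two `if` blocks into functions so they can be compared on a few refs; the function names and the `RELEASE_3_12` branch are illustrative only and are not part of the workflow.

```shell
# Copy of the logic in the `findinfo` step of the define-docker-info job:
# RELEASE_* refs map to the lower-cased branch name, anything else to devel.
docker_biocversion() {
  ref="$1"
  if echo "$ref" | grep -q "RELEASE_"; then
    basename -- "$ref" | tr '[:upper:]' '[:lower:]'
  else
    echo "devel"
  fi
}

# Copy of the branch logic in the `findrversion` step: master maps to
# "devel", RELEASE_* to "release", and any other ref yields nothing.
status_biocversion() {
  ref="$1"
  if echo "$ref" | grep -q "master"; then
    echo "devel"
  elif echo "$ref" | grep -q "RELEASE_"; then
    echo "release"
  fi
}

# Compare both mappings on a push to master, a push to a (hypothetical)
# release branch, and a pull request ref.
for ref in refs/heads/master refs/heads/RELEASE_3_12 refs/pull/1/merge; do
  tag=$(docker_biocversion "$ref")
  status=$(status_biocversion "$ref")
  echo "$ref -> bioconductor/bioconductor_docker:$tag, BiocStatus='$status'"
done
```

For a pull request, GitHub sets `GITHUB_REF` to a ref of the form `refs/pull/<number>/merge`. In that case the first mapping falls back to the devel image, while the second returns an empty string, so, as far as we can tell from the code, the version numbers looked up in the `findrversion` step would come out empty for pull requests.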
... ...
@@ -1,5 +1,5 @@
1 1
 Package: GSVA
2
-Version: 1.39.19
2
+Version: 1.39.20
3 3
 Title: Gene Set Variation Analysis for microarray and RNA-seq data
4 4
 Authors@R: c(person("Justin", "Guinney", role=c("aut", "cre"), email="justin.guinney@sagebase.org"),
5 5
              person("Robert", "Castelo", role="aut", email="robert.castelo@upf.edu"),
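This hunk bumps the `Version:` field from 1.39.19 to 1.39.20. Bioconductor package versions follow an x.y.z scheme in which y is odd in the devel branch (1.39 here), and changes pushed to Bioconductor only propagate to users after the version is bumped. A minimal sketch of such a patch-level bump in the shell; `bump_patch` is a hypothetical helper, not a Bioconductor tool.

```shell
# Increment the last (z) component of an x.y.z version string,
# leaving x and y unchanged.
bump_patch() {
  echo "$1" | awk -F. '{ printf "%s.%s.%d\n", $1, $2, $3 + 1 }'
}

bump_patch 1.39.19
```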
... ...
@@ -1,5 +1,5 @@
1 1
 ---
2
-title: "GSVA: gene set variation analysis for molecular profiling data"
2
+title: "GSVA: gene set variation analysis"
3 3
 author:
4 4
 - name: Robert Castelo
5 5
   affiliation:
... ...
@@ -13,19 +13,19 @@ author:
13 13
   - Sage Bionetworks
14 14
   email: justin.guinney@sagebase.org
15 15
 abstract: >
16
-  The GSVA package provides the implementation of four single-sample gene set
17
-  enrichment methods, concretely _zscore_, _plage_, _ssGSEA_ and its own called
18
-  _GSVA_. These methods transform an input gene-by-sample expression data matrix
19
-  into a gene-set-by-sample expression data matrix. Thereby enabling the
20
-  estimation of pathway activity for each sample and facilitating pathway-centric
21
-  analyses of gene expression data. While this methodology was initially developed
22
-  for gene expression data, it is readily aplicable to other types of molecular
23
-  profiling data. In this vignette we illustrate how to use
16
+  Gene set variation analysis (GSVA) is a particular type of gene set enrichment
17
+  method that works on single samples and enables pathway-centric analyses of
18
+  molecular data by performing a conceptually simple but powerful change in the
19
+  functional unit of analysis, from genes to gene sets. The GSVA package provides
20
+  the implementation of four single-sample gene set enrichment methods, namely
21
+  _zscore_, _plage_, _ssGSEA_ and its own, called _GSVA_. While this methodology
22
+  was initially developed for gene expression data, it can be applied to other
23
+  types of molecular profiling data. In this vignette we illustrate how to use
24 24
   the GSVA package with bulk microarray and RNA-seq expression data.
25 25
 date: "`r BiocStyle::doc_date()`"
26 26
 package: "`r pkg_ver('GSVA')`"
27 27
 vignette: >
28
-  %\VignetteIndexEntry{GSVA for bulk expression data}
28
+  %\VignetteIndexEntry{Gene set variation analysis}
29 29
   %\VignetteEngine{knitr::rmarkdown}
30 30
   %\VignetteEncoding{UTF-8}
31 31
   %\VignetteKeywords{GeneExpression, Microarray, RNAseq, GeneSetEnrichment, Pathway}
... ...
@@ -250,7 +250,44 @@ In general, the default values for the previous parameters are suitable for
250 250
 most analysis settings, which usually consist of some kind of normalized
251 251
 continuous expression values.
252 252
 
253
-# Gene sets definitions and mapping to gene identifiers
253
+# Gene set definitions and gene identifier mapping
254
+
255
+Gene sets constitute a simple, yet useful, way to define pathways, essentially
256
+because we use pathway membership definitions only, neglecting the information
257
+on molecular interactions. Gene set definitions are a crucial input to any gene
258
+set enrichment analysis because if our gene sets do not capture the biological
259
+processes we are studying, we will likely not find any relevant insights in our
260
+data.
261
+
262
+There are multiple sources of gene sets, the most popular ones being
263
+[The Gene Ontology (GO) project](http://geneontology.org) and
264
+[The Molecular Signatures Database (MSigDB)](https://www.gsea-msigdb.org/gsea/msigdb).
265
+Sometimes gene set databases will not include the ones we need. In such a case
266
+we should either curate our own gene sets or use techniques to infer them from
267
+data.
268
+
269
+The most basic data container for gene sets in R is the `list` class of objects,
270
+as illustrated earlier in the quick start section, where we defined a toy collection
271
+of three gene sets stored in a list object called `gs`:
272
+
273
+```{r}
274
+gs
275
+```
276
+
277
+Using a Bioconductor organism-level package such as
278
+`r Biocpkg("org.Hs.eg.db")`, we can easily build a list object containing a
279
+collection of gene sets defined as GO terms with annotated Entrez gene
280
+identifiers, as follows:
281
+
282
+```{r}
283
+library(org.Hs.eg.db)
284
+
285
+goannot <- select(org.Hs.eg.db, keys=keys(org.Hs.eg.db), columns="GO")
286
+head(goannot)
287
+genesbygo <- split(goannot$ENTREZID, goannot$GO)
288
+length(genesbygo)
289
+head(genesbygo)
290
+```
254 291
 
255 292
 # Example applications
256 293
 
... ...
@@ -329,11 +366,6 @@ mtext("Gene sets", side=4, line=0, cex=1.5)
329 366
 mtext("Samples          ", side=1, line=4, cex=1.5)
330 367
 ```
331 368
 
332
-
333
-# Parallel calculations
334
-
335
-# Frequently asked questions
336
-
337 369
 # Session information {.unnumbered}
338 370
 
339 371
 Here is the output of `sessionInfo()` on the system on which this document was