Updating vignette and citations

Pourya Naderi authored on 17/08/2022 19:29:13
Showing 2 changed files

... ...
@@ -19,9 +19,25 @@ knitr::opts_chunk$set(
19 19
 
20 20
 # Introduction
21 21
 
22
-PanomiR is a package for pathway and microRNA Analysis of gene expression data.
23
-This document provides details about how to install and utilize various 
24
-functionality in PanomiR.
22
+MicroRNAs (miRNAs) can target co-expressed genes to coordinate multiple 
23
+pathways. “Pathway networks of miRNA Regulation” (PanomiR) is a
24
+framework to support the discovery of miRNA regulators based on their targeting
25
+of coordinated pathways. It analyzes and prioritizes multi-pathway dynamics
26
+of miRNA-orchestrated regulation, as opposed to investigating isolated
27
+miRNA-pathway interaction events. PanomiR uses predefined pathways, their 
28
+co-activation, gene expression, and annotated miRNA-mRNA interactions to extract
29
+miRNA-pathway targeting events. This vignette describes PanomiR’s functions
30
+and analysis tools to derive these multi-pathway targeting events. 
31
+
32
+If you use PanomiR for your research, please cite PanomiR's manuscript
33
+[@yeganeh2022panomir]. Please send any questions/suggestions you may have to
34
+`pnaderiy [at] bidmc [dot] harvard [dot] edu` or submit GitHub issues at
35
+<https://github.com/pouryany/PanomiR>.
36
+
37
+Naderi Yeganeh, Pourya, Yue Yang Teo, Dimitra Karagkouni,
38
+Yered Pita-Juarez, Sarah L. Morgan, Ioannis S. Vlachos, and Winston Hide.
39
+"PanomiR: A systems biology framework for analysis of multi-pathway targeting
40
+by miRNAs." bioRxiv (2022). doi: <https://doi.org/10.1101/2022.07.12.499819>.
25 41
 
26 42
 # Installation
27 43
 
... ...
@@ -43,28 +59,28 @@ devtools::install_github("pouryany/PanomiR")
43 59
 
44 60
 # Overview
45 61
 
46
-PanomiR is a pipeline to prioritize disease-associated miRNAs based on activity 
62
+PanomiR is a framework to prioritize disease-associated miRNAs using activity 
47 63
 of disease-associated pathways. The input datasets for PanomiR are (a) a gene 
48
-expression disease dataset along with covariates, (b) a background collection
49
-of pathways/genesets, and (c) a collection of miRNAs containing gene targets.
64
+expression dataset along with covariates such as disease-state and batch,
65
+(b) a background collection of pathways/genesets, and (c) a collection of
66
+miRNAs and their gene targets.
50 67
 
51
-The general workflow of PanomiR is (a) generation of pathway summary statistics 
52
-from gene expression data, (b) detection of differentially activated pathways,
53
-(c) finding coherent groups, or clusters, of differentially activated pathways,
54
-and (d) detecting miRNAs targeting each group of pathways. 
68
+The workflow of PanomiR includes (a) generation of pathway summary
69
+statistics from gene expression data, (b) detection of differentially activated
70
+pathways, (c) finding coherent groups, or clusters, of differentially activated
71
+pathways, and (d) detecting miRNAs that target each group of pathways. 
55 72
 
56
-Individual steps of the workflow can be used in isolation to carry out different
57
-analyses. The following sections outline each step and material needed to
73
+Individual steps of the workflow can be used in isolation to carry out specific
74
+analyses. The following sections outline each step and the material needed to
58 75
 execute PanomiR. 
59 76
 
60 77
 # Pathway summarization
61 78
 
62
-PanomiR can generate pathway activity profiles given a gene expression dataset
63
-and a list of pathways.Pathway summaries are numbers that represent the overall
64
-activity of genes that 
65
-belong to each pathway. These numbers are calculated based on a methodology
66
-previously described in part in  [@altschuler2013pathprinting;
67
-@joachim2018relative].
79
+PanomiR generates pathway activity summary profiles from gene expression data
80
+and a list of pathways. Pathway summaries are numbers that represent the overall
81
+activity of genes that belong to each pathway. These numbers are calculated
82
+based on a methodology previously described in part by Altschuler et al.
83
+[@altschuler2013pathprinting;@joachim2018relative].
68 84
 Briefly, genes in each sample are ranked by their expression values and then
69 85
 pathway summaries are calculated as the average rank-squared of genes within a 
70 86
 pathway. The summaries are then centered and scaled (zNormalized) across samples.
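
The three steps described above (rank, average squared rank, z-normalize) can be illustrated with a minimal base-R sketch. This is not PanomiR's implementation; the toy matrix `expr`, the `pathways` list, and all gene and pathway names are hypothetical values chosen for illustration.

```r
# Toy expression matrix: 6 genes (rows) x 4 samples (columns); values are made up.
expr <- matrix(
  c(1, 2, 3, 4, 5, 6,
    6, 5, 4, 3, 2, 1,
    1, 6, 2, 5, 3, 4,
    2, 1, 6, 3, 5, 4),
  nrow = 6,
  dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4))
)

# Hypothetical pathway membership lists.
pathways <- list(
  pathA = c("gene1", "gene2", "gene3"),
  pathB = c("gene4", "gene5", "gene6")
)

# Step 1: rank genes within each sample (column).
ranks <- apply(expr, 2, rank)

# Step 2: average squared rank of the member genes, per pathway and sample.
rawSummary <- t(sapply(pathways, function(genes) {
  colMeans(ranks[genes, , drop = FALSE]^2)
}))

# Step 3: center and scale each pathway across samples.
# scale() operates on columns, so transpose before and after.
zSummary <- t(scale(t(rawSummary)))

zSummary
```

After the last step, each row of `zSummary` (one pathway) has mean 0 and standard deviation 1 across samples, so pathways become comparable regardless of their size or typical rank.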
... ...
@@ -80,10 +96,10 @@ this manual.
80 96
 
81 97
 This section uses a reduced example dataset from The Cancer Genome Atlas (TCGA)
82 98
 Liver Hepatocellular Carcinoma (LIHC) dataset to generate
83
-Pathway summary statistics [@ally2017comprehensive]. **Note:** Make sure that you
84
-select gene representation type that matches the rownames of your expression 
85
-data. The type can be modified using the `id` argument in the function below.
86
-The default value for this argument is `ENSEMBL`. 
99
+pathway summary statistics [@ally2017comprehensive]. **Note:** Make sure that
100
+you select a gene representation type that matches the rownames of your
101
+expression data. The type can be modified using the `id` argument in the
102
+function below. The default value for this argument is `ENSEMBL`. 
87 103
 
88 104
 ```{r load_package}
89 105
 library(PanomiR)
... ...
@@ -106,17 +122,18 @@ head(summaries)[,1:2]
106 122
 # Differential Pathway activation
107 123
 
108 124
 Once you generate the pathway activity profiles, as discussed in the last
109
-section, there are several analysis that you can perform. We have bundled some 
110
-of the most important ones into standalone functions. Here, we describe
111
-differential pathway activation profiling, which is examining differences in
112
-pathway activity profiles in user-determined conditions.
125
+section, there are several analyses that you can perform. We have
126
+bundled some of the most important ones into standalone functions. Here, we
127
+describe differential pathway activity profiling to determine dysregulated
128
+pathways. This function analyzes differences in pathway activity profiles
129
+in user-determined conditions.
113 130
 
114 131
 At this stage you need to provide a pathway-gene association table, an
115
-expression dataset, and a covariates table. You need to specity what covariates
132
+expression dataset, and a covariates table. You need to specify covariates that
116 133
 you would like to contrast. You also need to provide a contrast, as formatted in
117
-limma. If the contrast is not provided, the function assumes the first two 
118
-levels of the provided contrast covariate. **Note:** make sure the contrast 
119
-covariate is formatted as factor.
134
+limma [@ritchie2015limma]. If the contrast is not provided, the function assumes
135
+the first two levels of the provided covariate are to be contrasted.
136
+**Note:** make sure the contrast covariate is formatted as a factor.
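
As a minimal base-R sketch of the note above, the snippet below converts a character covariate to a factor and shows how the factor levels determine the design-matrix column names that a limma-style contrast string refers to. The data frame `covariates`, its `condition` column, and the level names are hypothetical; how PanomiR builds its design internally is not shown here.

```r
# Hypothetical covariates table; column and level names are made up.
covariates <- data.frame(
  sample = paste0("s", 1:4),
  condition = c("disease", "control", "disease", "control"),
  stringsAsFactors = FALSE
)

# Format the contrast covariate as a factor with an explicit level order.
covariates$condition <- factor(covariates$condition,
                               levels = c("control", "disease"))

# A no-intercept design matrix gets one column per factor level,
# named <covariate><level>.
design <- model.matrix(~ 0 + condition, data = covariates)
colnames(design)

# A limma-style contrast is written in terms of those column names.
contrastString <- "conditiondisease-conditioncontrol"
```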
120 137
 
121 138
 
122 139
 ```{r differential}
... ...
@@ -183,39 +200,42 @@ PanomiR identifies miRNAs that target clusters of pathways, as defined in the
183 200
 last section. In order to do this, you need a reference table of
184 201
 miRNA-Pathway association scores (enrichment). We recommend using a customized
185 202
 miRNA-Pathway association table, tailored to your experimental data.
186
-This section provides an overview of prioritization process. Readers interested
187
-in knowing more about the technical details of PanomiR are refered to
188
-accompaniying publication (Work under preparation).
203
+This section provides an overview of the prioritization process. Readers who
204
+are interested in the technical details of PanomiR can access
205
+PanomiR's accompanying publication [@yeganeh2022panomir].
189 206
 
190 207
 ## Enrichment reference
191
-Here, we provide a preprocessed small example table of miRNA-pathway enrichment
208
+Here, we provide a pre-processed small example table of miRNA-pathway enrichment
192 209
 in the `miniTestsPanomiR$miniEnrich` object. This table contains enrichment analysis
193 210
 results using Fisher's Exact Test between MSigDB pathways and TargetScan miRNA
194
-targets. The individual components are  accessible via `data(msigdb_c2)` and
211
+targets. The individual components are accessible via `data(msigdb_c2)` and
195 212
 `data(targetScan_03)` [@agarwal2015predicting; @liberzon2011molecular]. This
196
-example table is contains only a full subset of the full pairwise enrichment. 
197
-You can refer to [section 5](#geneset) of this manual on how to create full 
198
-tables and how to customize them to your specific gene expression data.
213
+example table contains only a subset of the full pairwise enrichment. 
214
+You can refer to [section 5](#geneset) of this manual to learn how to create
215
+enrichment tables and how to customize them to your specific gene expression
216
+data.
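
To make the enrichment idea concrete, here is a minimal sketch of a Fisher's Exact Test between one pathway and one miRNA's target set using `stats::fisher.test`. The gene universe, the pathway members, and the miRNA targets are hypothetical, and the one-sided alternative is an assumption rather than a statement about how PanomiR computes its reference table.

```r
# Hypothetical gene universe, pathway members, and miRNA targets.
universe     <- paste0("gene", 1:200)
pathwayGenes <- universe[1:40]
mirnaTargets <- universe[21:70]

# 2 x 2 contingency table over the universe of expressed genes.
inPathway <- factor(universe %in% pathwayGenes, levels = c(TRUE, FALSE))
isTarget  <- factor(universe %in% mirnaTargets, levels = c(TRUE, FALSE))
contingency <- table(inPathway = inPathway, isTarget = isTarget)

# One-sided test: are miRNA targets over-represented in the pathway?
result <- fisher.test(contingency, alternative = "greater")

contingency
result$p.value
```

Restricting `universe` to the genes expressed in a given experiment is what makes such a table tissue- or experiment-specific.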
199 217
 
200 218
 ## Generating targeting scores
201
-PanomiR generates a score for individual miRNAs targeting a group of pathways.
202
-These scores are generated based on the reference enrichment table.
203
-We are interested in knowing to what extent each miRNA targets pathway clusters
204
-identified in the last step (see previous section). 
205
-PanomiR constructs a null distribution of this targeting score for each miRNA.
206
-The significance of observed scores from a given group of pathways (clusters
207
-in this case) is contrasted against the null distribution to generate a
208
-targeting p-value. These p-values are used to rank miRNAs per cluster.
219
+PanomiR generates a score for each miRNA that quantifies its targeting
220
+of a group of pathways. These scores are generated based on the reference
221
+enrichment table described in the previous section. We are interested in knowing
222
+to what extent each miRNA targets clusters of pathways identified in the last
223
+step (see previous section). 
224
+
225
+PanomiR constructs a null distribution of the targeting score for each miRNA.
226
+It then contrasts observed scores from a given group of pathways (clusters)
227
+against the null distribution in order to generate a targeting p-value.
228
+These p-values are used to rank miRNAs per cluster.
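
The null-distribution idea can be sketched in base R as a generic empirical p-value. This is an illustration under stated assumptions, not PanomiR's scoring method: the per-pathway `scores`, the use of a sum as the cluster score, and the names `cluster` and `nSamples` are all hypothetical.

```r
set.seed(42)

# Hypothetical per-pathway association scores for a single miRNA.
scores <- setNames(runif(100), paste0("pathway", 1:100))

# Hypothetical cluster of pathways whose targeting we want to assess.
cluster <- paste0("pathway", 1:5)

# Observed score: here simply the sum of scores in the cluster (an assumption).
observed <- sum(scores[cluster])

# Null distribution: scores of randomly drawn pathway sets of the same size.
nSamples <- 500
nullScores <- replicate(nSamples, sum(sample(scores, length(cluster))))

# Empirical p-value with a +1 correction so that it is never exactly zero.
pValue <- (1 + sum(nullScores >= observed)) / (1 + nSamples)
pValue
```

A larger number of repeats gives a finer-grained null distribution, at the cost of run time.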
209 229
 
210 230
 ## Sampling parameter
211
-The above described process requires repeated sampling to empirically obtain the
231
+The process described above requires repeated sampling to empirically obtain the
212 232
 null distribution. The argument `sampRate` denotes the number of repeats in the
213 233
 process. Note that in the example below, we use a sampling rate of 50; the
214
-recommended rate is between 500-1000. Also, we set the saveSampling argument to
215
-FALSE. This argument, if set TRUE, ensures that the null distribution is obtain
216
-only once. This argument should be set to TRUE if you wish to save your sampling
217
-and check for different outputs from the clustering algorithms or pathway
218
-thresholds.
234
+recommended rate is between 500 and 1000. Also, we set the `saveSampling` argument
235
+to `FALSE`. This argument, when set to `TRUE`, ensures that the null distribution
236
+is obtained only once. This argument should be set to `TRUE` if you wish to save
237
+your sampling and check for different outputs from the clustering algorithms or
238
+pathway thresholds.
219 239
 
220 240
 
221 241
 ```{r miRNA}
... ...
@@ -242,11 +262,11 @@ head(output2$Cluster1)
242 262
 
243 263
 # miRNA-Pathway enrichment tables 
244 264
 
245
-PanomiR best performs on tissue/experiment-customized datasets. In order to do
246
-this, you need to create a customized enrichment table. You can simply do so by
247
-using the pathway and miRNA list that we have provided as a part of the package.
248
-simply, plug in the name of the genes present (expressed) in your experiment in
249
-the following code
265
+We recommend using PanomiR with tissue/experiment-customized datasets.
266
+In order to do this, you need to create a customized enrichment table.
267
+You can simply do so by using the pathway and miRNA list that we have provided
268
+as a part of the package. Simply plug in the names of the genes that are present
269
+(expressed) in your experiment in the following code:
250 270
 
251 271
 
252 272
 
... ...
@@ -293,9 +313,9 @@ PanomiR can integrate genesets and pathways from external sources including
293 313
 those annotated in MSigDB. In order to do so, you need to provide a 
294 314
 `GeneSetCollection` object as defined in the `GSEABase` package. 
295 315
 
296
-The example below illustrates how to use external sources to create your 
297
-own customized pathway-gene association table. This customized can then
298
-replaced `path_gene_table` input in functions described in sections 1,2, and 5
316
+The example below illustrates using external sources to create your 
317
+own customized pathway-gene association table. This customized table can
318
+replace the `path_gene_table` input of the functions described in sections 1, 2, and 5
299 319
 of this manual.
300 320
 
301 321
 ```{r customized_gsc}
... ...
@@ -1,3 +1,11 @@
1
+@article{yeganeh2022panomir,
2
+  title={PanomiR: A systems biology framework for analysis of multi-pathway targeting by miRNAs},
3
+  author={Naderi Yeganeh, Pourya and Teo, Yue Yang and Karagkouni, Dimitra and Pita-Juarez, Yered and Morgan, Sarah L and Vlachos, Ioannis S and Hide, Winston},
4
+  journal={bioRxiv},
5
+  year={2022},
6
+  publisher={Cold Spring Harbor Laboratory}
7
+}
8
+
1 9
 @article{pita2018pathway,
2 10
   title={The pathway Coexpression network: revealing pathway relationships},
3 11
   author={Pita-Ju{\'a}rez, Yered and Altschuler, Gabriel and Kariotis, Sokratis and Wei, Wenbin and Koler, Katju{\v{s}}a and Green, Claire and Tanzi, Rudolph E and Hide, Winston},
... ...
@@ -62,6 +70,17 @@
62 70
   publisher={eLife Sciences Publications Limited}
63 71
 }
64 72
 
73
+@article{ritchie2015limma,
74
+  title={limma powers differential expression analyses for RNA-sequencing and microarray studies},
75
+  author={Ritchie, Matthew E and Phipson, Belinda and Wu, Di and Hu, Yifang and Law, Charity W and Shi, Wei and Smyth, Gordon K},
76
+  journal={Nucleic acids research},
77
+  volume={43},
78
+  number={7},
79
+  pages={e47--e47},
80
+  year={2015},
81
+  publisher={Oxford Academic}
82
+}
83
+
65 84
 @article{subramanian2005gene,
66 85
   title={Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles},
67 86
   author={Subramanian, Aravind and Tamayo, Pablo and Mootha, Vamsi K and Mukherjee, Sayan and Ebert, Benjamin L and Gillette, Michael A and Paulovich, Amanda and Pomeroy, Scott L and Golub, Todd R and Lander, Eric S and others},