@@ -19,9 +19,25 @@ knitr::opts_chunk$set(
 
 # Introduction
 
-PanomiR is a package for pathway and microRNA Analysis of gene expression data.
-This document provides details about how to install and utilize various
-functionality in PanomiR.
+MicroRNAs (miRNAs) can target co-expressed genes to coordinate multiple
+pathways. “Pathway networks of miRNA Regulation” (PanomiR) is a
+framework to support the discovery of miRNA regulators based on their targeting
+of coordinated pathways. It analyzes and prioritizes multi-pathway dynamics
+of miRNA-orchestrated regulation, as opposed to investigating isolated
+miRNA-pathway interaction events. PanomiR uses predefined pathways, their
+co-activation, gene expression, and annotated miRNA-mRNA interactions to extract
+miRNA-pathway targeting events. This vignette describes PanomiR’s functions
+and analysis tools to derive these multi-pathway targeting events.
+
+If you use PanomiR for your research, please cite PanomiR's manuscript
+[@yeganeh2022panomir]. Please send any questions or suggestions to
+`pnaderiy [at] bidmc [dot] harvard [dot] edu` or submit GitHub issues at
+<https://github.com/pouryany/PanomiR>.
+
+Naderi Yeganeh, Pourya, Yue Yang Teo, Dimitra Karagkouni,
+Yered Pita-Juarez, Sarah L. Morgan, Ioannis S. Vlachos, and Winston Hide.
+"PanomiR: A systems biology framework for analysis of multi-pathway targeting
+by miRNAs." bioRxiv (2022). doi: <https://doi.org/10.1101/2022.07.12.499819>.
 
 # Installation
 
@@ -43,28 +59,28 @@ devtools::install_github("pouryany/PanomiR")
 
 # Overview
 
-PanomiR is a pipeline to prioritize disease-associated miRNAs based on activity
+PanomiR is a framework to prioritize disease-associated miRNAs using activity
 of disease-associated pathways. The input datasets for PanomiR are (a) a gene
-expression disease dataset along with covariates, (b) a background collection
-of pathways/genesets, and (c) a collection of miRNAs containing gene targets.
+expression dataset along with covariates such as disease state and batch,
+(b) a background collection of pathways/genesets, and (c) a collection of
+miRNAs and their gene targets.
 
-The general workflow of PanomiR is (a) generation of pathway summary statistics
-from gene expression data, (b) detection of differentially activated pathways,
-(c) finding coherent groups, or clusters, of differentially activated pathways,
-and (d) detecting miRNAs targeting each group of pathways.
+The workflow of PanomiR includes (a) generation of pathway summary
+statistics from gene expression data, (b) detection of differentially activated
+pathways, (c) finding coherent groups, or clusters, of differentially activated
+pathways, and (d) detecting miRNAs that target each group of pathways.
 
-Individual steps of the workflow can be used in isolation to carry out different
-analyses. The following sections outline each step and material needed to
+Individual steps of the workflow can be used in isolation to carry out specific
+analyses. The following sections outline each step and the material needed to
 execute PanomiR.
 
 # Pathway summarization
 
-PanomiR can generate pathway activity profiles given a gene expression dataset
-and a list of pathways.Pathway summaries are numbers that represent the overall
-activity of genes that
-belong to each pathway. These numbers are calculated based on a methodology
-previously described in part in [@altschuler2013pathprinting;
-@joachim2018relative].
+PanomiR generates pathway activity summary profiles from gene expression data
+and a list of pathways. Pathway summaries are numbers that represent the overall
+activity of genes that belong to each pathway. These numbers are calculated
+based on a methodology previously described in part by Altschuler et al.
+[@altschuler2013pathprinting;@joachim2018relative].
 Briefly, genes in each sample are ranked by their expression values and then
 pathway summaries are calculated as the average rank-squared of genes within a
 pathway. The summaries are then centered and scaled (zNormalized) across samples.
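Reviewer note: the summarization described in this hunk (rank genes within each sample, average the squared ranks per pathway, then center and scale) can be sketched in base R. This is a minimal illustration under stated assumptions, not PanomiR's implementation: `expr`, `pathways`, and `summarize_pathways` are hypothetical names, higher expression is assumed to receive a higher rank, and "across samples" is read here as normalizing each pathway over the samples.

```r
# Toy inputs (hypothetical, not PanomiR data): 20 genes x 5 samples.
set.seed(1)
expr <- matrix(rnorm(100), nrow = 20,
               dimnames = list(paste0("g", 1:20), paste0("s", 1:5)))
pathways <- list(pathA = paste0("g", 1:8), pathB = paste0("g", 9:20))

# Hypothetical helper, not a PanomiR function.
summarize_pathways <- function(expr, pathways) {
  # Rank genes within each sample; higher expression gets a higher rank (assumed).
  ranks <- apply(expr, 2, rank)
  rownames(ranks) <- rownames(expr)
  # Average squared rank over the genes of each pathway, per sample.
  raw <- t(sapply(pathways, function(genes) {
    genes <- intersect(genes, rownames(ranks))
    colMeans(ranks[genes, , drop = FALSE]^2)
  }))
  # Center and scale each pathway over the samples (assumed reading of the text).
  t(scale(t(raw)))
}

z <- summarize_pathways(expr, pathways)
round(z, 2)
```

The result is a pathways-by-samples matrix in which every row has mean 0 and standard deviation 1, which is the shape a downstream differential analysis would consume.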
@@ -80,10 +96,10 @@ this manual.
 
 This section uses a reduced example dataset from The Cancer Genome Atlas (TCGA)
 Liver Hepatocellular Carcinoma (LIHC) dataset to generate
-Pathway summary statistics [@ally2017comprehensive]. **Note:** Make sure that you
-select gene representation type that matches the rownames of your expression
-data. The type can be modified using the `id` argument in the function below.
-The default value for this argument is `ENSEMBL`.
+pathway summary statistics [@ally2017comprehensive]. **Note:** Make sure that
+you select a gene representation type that matches the rownames of your
+expression data. The type can be modified using the `id` argument in the
+function below. The default value for this argument is `ENSEMBL`.
 
 ```{r load_package}
 library(PanomiR)
@@ -106,17 +122,18 @@ head(summaries)[,1:2]
 # Differential Pathway activation
 
 Once you generate the pathway activity profiles, as discussed in the last
-section, there are several analysis that you can perform. We have bundled some
-of the most important ones into standalone functions. Here, we describe
-differential pathway activation profiling, which is examining differences in
-pathway activity profiles in user-determined conditions.
+section, there are several possible analyses that you can perform. We have
+bundled some of the most important ones into standalone functions. Here, we
+describe differential pathway activity profiling to determine dysregulated
+pathways. This function analyzes differences in pathway activity profiles
+in user-determined conditions.
 
 At this stage you need to provide a pathway-gene association table, an
-expression dataset, and a covariates table. You need to specity what covariates
+expression dataset, and a covariates table. You need to specify covariates that
 you would like to contrast. You also need to provide a contrast, as formatted in
-limma. If the contrast is not provided, the function assumes the first two
-levels of the provided contrast covariate. **Note:** make sure the contrast
-covariate is formatted as factor.
+limma [@ritchie2015limma]. If the contrast is not provided, the function assumes
+the first two levels of the provided covariate are to be contrasted.
+**Note:** make sure the contrast covariate is formatted as a factor.
 
 
 ```{r differential}
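Reviewer note: the requirement that the contrast covariate be a factor, and the idea of a limma-formatted contrast, can be illustrated with base R alone. This is a sketch under assumptions: the `covariates` table, the `condition` column, and its levels are hypothetical, and the contrast string follows the column naming that `model.matrix()` produces. Whether PanomiR expects exactly this string form is not confirmed by the text.

```r
# Hypothetical covariates table; column and level names are illustrative.
covariates <- data.frame(
  sample    = paste0("s", 1:6),
  condition = c("disease", "control", "disease",
                "control", "disease", "control"),
  stringsAsFactors = FALSE
)

# Convert to a factor with an explicit level order, so that
# "the first two levels" is unambiguous.
covariates$condition <- factor(covariates$condition,
                               levels = c("control", "disease"))

# Design matrix without an intercept: one column per factor level.
design <- model.matrix(~ 0 + condition, data = covariates)
colnames(design)

# limma-style contrast string built from the design column names
# (second level minus first level).
contrast <- paste(colnames(design)[2], colnames(design)[1], sep = "-")
contrast
```

Setting the levels explicitly avoids relying on alphabetical ordering, which would otherwise decide the direction of a default contrast.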
@@ -183,39 +200,42 @@ PanomiR identifies miRNAs that target clusters of pathways, as defined in the
 last section. In order to do this, you need a reference table of
 miRNA-Pathway association score (enrichment). We recommend using a customized
 miRNA-Pathway association table, tailored to your experimental data.
-This section provides an overview of prioritization process. Readers interested
-in knowing more about the technical details of PanomiR are refered to
-accompaniying publication (Work under preparation).
+This section provides an overview of the prioritization process. Readers
+interested in knowing more about the technical details of PanomiR can access
+PanomiR's accompanying publication [@yeganeh2022panomir].
 
 ## Enrichment reference
-Here, we provide a preprocessed small example table of miRNA-pathway enrichment
+Here, we provide a pre-processed small example table of miRNA-pathway enrichment
 in `miniTestsPanomiR$miniEnrich` object. This table contains enrichment analysis
 results using Fisher's Exact Test between MSigDB pathways and TargetScan miRNA
-targets. The individual components are accessible via `data(msigdb_c2)` and
+targets. The individual components are accessible via `data(msigdb_c2)` and
 `data(targetScan_03)` [@agarwal2015predicting; @liberzon2011molecular]. This
-example table is contains only a full subset of the full pairwise enrichment.
-You can refer to [section 5](#geneset) of this manual on how to create full
-tables and how to customize them to your specific gene expression data.
+example table contains only a subset of the full pairwise enrichment.
+You can refer to [section 5](#geneset) of this manual to learn how to create
+enrichment tables and how to customize them to your specific gene expression
+data.
 
 ## Generating targeting scores
-PanomiR generates a score for individual miRNAs targeting a group of pathways.
-These scores are generated based on the reference enrichment table.
-We are interested in knowing to what extent each miRNA targets pathway clusters
-identified in the last step (see previous section).
-PanomiR constructs a null distribution of this targeting score for each miRNA.
-The significance of observed scores from a given group of pathways (clusters
-in this case) is contrasted against the null distribution to generate a
-targeting p-value. These p-values are used to rank miRNAs per cluster.
+PanomiR generates a score for each miRNA that quantifies its
+targeting of a group of pathways. These scores are generated based on the reference
+enrichment table described in the previous section. We are interested in knowing
+to what extent each miRNA targets clusters of pathways identified in the last
+step (see previous section).
+
+PanomiR constructs a null distribution of the targeting score for each miRNA.
+It then contrasts observed scores from a given group of pathways (clusters)
+against the null distribution in order to generate a targeting p-value.
+These p-values are used to rank miRNAs per cluster.
 
 ## Sampling parameter
-The above described process requires repeated sampling to empirically obtain the
+The process described above requires repeated sampling to empirically obtain the
 null distribution. The argument `sampRate` denotes the number of repeats in the
 process. Note that in the example below, we use a sampling rate of 50; the
-recommended rate is between 500-1000. Also, we set the saveSampling argument to
-FALSE. This argument, if set TRUE, ensures that the null distribution is obtain
-only once. This argument should be set to TRUE if you wish to save your sampling
-and check for different outputs from the clustering algorithms or pathway
-thresholds.
+recommended rate is between 500 and 1000. Also, we set the `saveSampling` argument
+to `FALSE`. This argument, when set to `TRUE`, ensures that the null distribution
+is obtained only once. This argument should be set to `TRUE` if you wish to save
+your sampling and check for different outputs from the clustering algorithms or
+pathway thresholds.
 
 
 ```{r miRNA}
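Reviewer note: the sampling-based p-value described in this hunk can be illustrated with a small base R simulation. This is a sketch, not PanomiR's scoring: the score used here (mean of -log10 enrichment p-values over a pathway set) is an assumption chosen for illustration, and `enrich_p`, `cluster`, `target_score`, and `samp_rate` are hypothetical names. Only the general idea (compare an observed cluster score against scores of randomly drawn pathway sets of the same size) follows the text.

```r
set.seed(42)

# Hypothetical enrichment p-values of one miRNA against 200 pathways.
enrich_p <- setNames(runif(200), paste0("path", 1:200))

# Hypothetical pathway cluster, made to look targeted for the demo.
cluster <- paste0("path", 1:10)
enrich_p[cluster] <- enrich_p[cluster] / 50

# Assumed score: mean -log10 enrichment p-value over a pathway set.
target_score <- function(paths) mean(-log10(enrich_p[paths]))

# Number of random draws; plays the role that `sampRate` plays in the text.
samp_rate <- 500

# Null distribution: scores of random pathway sets with the cluster's size.
null_scores <- replicate(
  samp_rate,
  target_score(sample(names(enrich_p), length(cluster)))
)

observed <- target_score(cluster)

# Empirical p-value with a +1 correction so that it is never exactly zero.
p_emp <- (1 + sum(null_scores >= observed)) / (1 + samp_rate)
p_emp
```

With only 50 draws the smallest attainable p-value would be 1/51, which is one way to see why the text recommends a sampling rate of 500 to 1000 for real analyses.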
@@ -242,11 +262,11 @@ head(output2$Cluster1)
 
 # miRNA-Pathway enrichment tables
 
-PanomiR best performs on tissue/experiment-customized datasets. In order to do
-this, you need to create a customized enrichment table. You can simply do so by
-using the pathway and miRNA list that we have provided as a part of the package.
-simply, plug in the name of the genes present (expressed) in your experiment in
-the following code
+We recommend using PanomiR with tissue/experiment-customized datasets.
+In order to do this, you need to create a customized enrichment table.
+You can do so by using the pathway and miRNA lists that we have provided
+as a part of the package. Simply plug in the names of the genes that are present
+(expressed) in your experiment in the following code:
 
 
 
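Reviewer note: the vignette states that the enrichment tables come from Fisher's Exact Test between pathway gene sets and miRNA target sets, and this hunk asks users to restrict the analysis to genes expressed in their experiment. A single cell of such a table can be sketched with base R's `fisher.test()`. The gene identifiers and set sizes below are hypothetical, and the one-sided alternative is an assumption rather than a documented PanomiR setting.

```r
# Hypothetical inputs: expressed genes (the universe), one pathway,
# and the predicted targets of one miRNA.
universe <- paste0("g", 1:1000)
pathway  <- paste0("g", 1:50)
targets  <- paste0("g", c(1:20, 501:580))

# Restrict both sets to genes expressed in the experiment.
pathway <- intersect(pathway, universe)
targets <- intersect(targets, universe)

# 2x2 contingency table over the universe: pathway membership vs. targeting.
in_path   <- factor(universe %in% pathway, levels = c(TRUE, FALSE))
in_target <- factor(universe %in% targets, levels = c(TRUE, FALSE))
tab <- table(in_path, in_target)

# One-sided test for over-representation of targets in the pathway (assumed).
res <- fisher.test(tab, alternative = "greater")
res$p.value
```

Changing `universe` changes every cell of the table, which is why an enrichment table built on the expressed genes of a specific tissue can differ from a generic one.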
@@ -293,9 +313,9 @@ PanomiR can integrate genesets and pathways from external sources including
 those annotated in MSigDB. In order to do so, you need to provide a
 `GeneSetCollection` object as defined in the `GSEABase` package.
 
-The example below illustrates how to use external sources to create your
-own customized pathway-gene association table. This customized can then
-replaced `path_gene_table` input in functions described in sections 1,2, and 5
+The example below illustrates using external sources to create your
+own customized pathway-gene association table. This customized table can
+replace the `path_gene_table` input in sections 1, 2, and 5
 of this manual.
 
 ```{r customized_gsc}