Browse code

Version 0.99.12: this version includes several improvements, including a completely revised vignette where all chunks are evaluated, minor tweaks to default values for differential expression analysis, and some bug fixes to the error bars in the `plotDE()` function. Other issues, such as artifacts in show methods, lacks in documentation, and dependence in DESCRIPTION, have been addressed too.

jacopo-ronchi authored on 12/02/2024 13:37:25
Showing 15 changed files

... ...
@@ -1,6 +1,6 @@
1 1
 Package: MIRit
2 2
 Title: Integrate microRNA and gene expression to decipher pathway complexity
3
-Version: 0.99.11
3
+Version: 0.99.12
4 4
 Date: 2023-11-23
5 5
 Authors@R: c(
6 6
     person("Jacopo", "Ronchi", email = "jacopo.ronchi@unimib.it",
... ...
@@ -18,7 +18,7 @@ Description: MIRit is an R package that provides several methods for
18 18
     characterization.
19 19
 License: GPL (>= 3)
20 20
 URL: https://github.com/jacopo-ronchi/MIRit
21
-BugReports: https://support.bioconductor.org/tag/MIRit
21
+BugReports: https://github.com/jacopo-ronchi/MIRit/issues
22 22
 biocViews: Software, GeneRegulation, NetworkEnrichment, NetworkInference, Epigenetics, FunctionalGenomics, SystemsBiology, Network, Pathways, GeneExpression, DifferentialExpression
23 23
 Encoding: UTF-8
24 24
 Roxygen: list(markdown = TRUE)
... ...
@@ -41,7 +41,6 @@ Imports:
41 41
     httr,
42 42
     limma,
43 43
     methods,
44
-    MultiAssayExperiment,
45 44
     Rcpp,
46 45
     readxl,
47 46
     Rgraphviz (>= 2.44.0),
... ...
@@ -79,6 +78,7 @@ Suggests:
79 78
     rmarkdown,
80 79
     testthat (>= 3.0.0)
81 80
 Depends: 
81
+    MultiAssayExperiment,
82 82
     R (>= 4.4.0)
83 83
 LazyData: false
84 84
 VignetteBuilder: knitr
... ...
@@ -78,7 +78,6 @@ importFrom(BiocParallel,bpprogressbar)
78 78
 importFrom(BiocParallel,bptasks)
79 79
 importFrom(MultiAssayExperiment,MultiAssayExperiment)
80 80
 importFrom(Rcpp,sourceCpp)
81
-importFrom(ggpubr,mean_sd)
82 81
 importFrom(grDevices,col2rgb)
83 82
 importFrom(grDevices,colorRampPalette)
84 83
 importFrom(graphics,arrows)
... ...
@@ -1,3 +1,11 @@
1
+# MIRit 0.99.12
2
+
3
+This version includes several improvements, including a completely revised
4
+vignette where all chunks are evaluated, minor tweaks to default values for
5
+differential expression analysis, and some bug fixes to the error bars in the
6
+`plotDE()` function. Other issues, such as artifacts in show methods, lacks in
7
+documentation, and dependence in DESCRIPTION, have been addressed too.
8
+
1 9
 # MIRit 0.99.11
2 10
 
3 11
 This new version introduces the possibility of limiting validated targets
... ...
@@ -140,8 +140,8 @@ searchDisease <- function(diseaseName) {
140 140
 #'
141 141
 #' \donttest{
142 142
 #' # search disease
143
-#' searchDisease("Alzheimer disease")
144
-#' disId <- "Alzheimer disease"
143
+#' searchDisease("response to antidepressant")
144
+#' disId <- "response to antidepressant"
145 145
 #'
146 146
 #' # retrieve associated SNPs
147 147
 #' association <- findMirnaSNPs(obj, disId)
... ...
@@ -3,10 +3,11 @@
3 3
 #' `performMirnaDE()` and `performGeneDE()` are two functions provided by MIRit
4 4
 #' to conduct miRNA and gene differential expression analysis, respectively.
5 5
 #' In particular, these functions allow the user to compute differential
6
-#' expression through different methods, namely `edgeR`, `DESeq2`, `limma-voom`
7
-#' and `limma`. Data deriving from NGS experiments and microarray technology
8
-#' are all suitable for these functions. For precise indications about how to
9
-#' use these functions, please refer to the *details* section.
6
+#' expression through different methods, namely `edgeR` (Quasi-Likelihood
7
+#' framework), `DESeq2`, `limma-voom` and `limma`. Data deriving from NGS
8
+#' experiments and microarray technology are all suitable for these functions.
9
+#' For precise indications about how to use these functions, please refer to
10
+#' the *details* section.
10 11
 #'
11 12
 #' @details
12 13
 #' When performing differential expression for NGS experiments, count matrices
... ...
@@ -69,7 +70,7 @@
69 70
 #' `DESeq2`, and `voom` (for limma-voom). Instead, for microarray data, only
70 71
 #' `limma` can be used
71 72
 #' @param logFC The minimum log2 fold change required to consider a gene as
72
-#' differentially expressed. Default is 1, to retain only two-fold differences
73
+#' differentially expressed. Optional, default is 0
73 74
 #' @param pCutoff The adjusted p-value cutoff to use for statistical
74 75
 #' significance. The default value is `0.05`
75 76
 #' @param pAdjustment The p-value correction method for multiple testing. It
... ...
@@ -188,7 +189,7 @@ performMirnaDE <- function(
188 189
         contrast,
189 190
         design,
190 191
         method = "edgeR",
191
-        logFC = 1,
192
+        logFC = 0,
192 193
         pCutoff = 0.05,
193 194
         pAdjustment = "fdr",
194 195
         filterByExpr.args = list(),
... ...
@@ -254,7 +255,7 @@ performGeneDE <- function(
254 255
         contrast,
255 256
         design,
256 257
         method = "edgeR",
257
-        logFC = 1,
258
+        logFC = 0,
258 259
         pCutoff = 0.05,
259 260
         pAdjustment = "fdr",
260 261
         filterByExpr.args = list(),
... ...
@@ -319,7 +320,7 @@ performDE <- function(assay,
319 320
     contrast,
320 321
     design,
321 322
     method = "edgeR",
322
-    logFC = 1,
323
+    logFC = 0,
323 324
     pCutoff = 0.05,
324 325
     pAdjustment = "fdr",
325 326
     filterByExpr.args = list(),
... ...
@@ -391,7 +392,7 @@ performDE <- function(assay,
391 392
         length(logFC) != 1 |
392 393
         logFC < 0) {
393 394
         stop("'logFC' must be a non-neagtive number that specifies the ",
394
-            "minimum absolute significant fold change (default is 1)",
395
+            "minimum absolute significant fold change (default is 0)",
395 396
             call. = FALSE
396 397
         )
397 398
     }
... ...
@@ -732,8 +733,8 @@ edgeR.DE <- function(counts,
732 733
     deRes <- identifyColNames(deRes)
733 734
 
734 735
     ## select significant features
735
-    sig <- rownames(deRes[abs(deRes$logFC) > logFC &
736
-        deRes$adj.P.Val < pCutoff, ])
736
+    sig <- rownames(deRes[abs(deRes$logFC) >= logFC &
737
+        deRes$adj.P.Val <= pCutoff, ])
737 738
 
738 739
     ## create a list with DE results
739 740
     deList <- list(
... ...
@@ -807,8 +808,8 @@ DESeq2.DE <- function(counts,
807 808
     deRes <- identifyColNames(deRes)
808 809
 
809 810
     ## select significant features
810
-    sig <- rownames(deRes[abs(deRes$logFC) > logFC &
811
-        deRes$adj.P.Val < pCutoff, ])
811
+    sig <- rownames(deRes[abs(deRes$logFC) >= logFC &
812
+        deRes$adj.P.Val <= pCutoff, ])
812 813
 
813 814
     ## create a list with DE results
814 815
     deList <- list(
... ...
@@ -943,8 +944,8 @@ voom.DE <- function(counts,
943 944
     deRes <- identifyColNames(deRes)
944 945
 
945 946
     ## select significant features
946
-    sig <- rownames(deRes[abs(deRes$logFC) > logFC &
947
-        deRes$adj.P.Val < pCutoff, ])
947
+    sig <- rownames(deRes[abs(deRes$logFC) >= logFC &
948
+        deRes$adj.P.Val <= pCutoff, ])
948 949
 
949 950
     ## create a list with DE results
950 951
     deList <- list(
... ...
@@ -1079,8 +1080,8 @@ limma.DE <- function(expr,
1079 1080
     deRes <- identifyColNames(deRes)
1080 1081
 
1081 1082
     ## select significant features
1082
-    sig <- rownames(deRes[abs(deRes$logFC) > logFC &
1083
-        deRes$adj.P.Val < pCutoff, ])
1083
+    sig <- rownames(deRes[abs(deRes$logFC) >= logFC &
1084
+        deRes$adj.P.Val <= pCutoff, ])
1084 1085
 
1085 1086
     ## create a list with DE results
1086 1087
     deList <- list(
... ...
@@ -1166,15 +1167,14 @@ limma.DE <- function(expr,
1166 1167
 #' expression analysis. Check the *details* section to see the required format.
1167 1168
 #' Default is NULL not to add gene differential expression results
1168 1169
 #' @param mirna.logFC The minimum log2 fold change required to consider a miRNA
1169
-#' as differentially expressed. Default is 1, to retain only two-fold
1170
-#' differences
1170
+#' as differentially expressed. Optional, default is 0
1171 1171
 #' @param mirna.pCutoff The adjusted p-value cutoff to use for miRNA statistical
1172 1172
 #' significance. The default value is `0.05`
1173 1173
 #' @param mirna.pAdjustment The p-value correction method for miRNA multiple
1174 1174
 #' testing. It must be one of: `fdr` (default), `BH`, `none`, `holm`,
1175 1175
 #' `hochberg`, `hommel`, `bonferroni`, `BY`
1176 1176
 #' @param gene.logFC The minimum log2 fold change required to consider a gene as
1177
-#' differentially expressed. Default is 1, to retain only two-fold differences
1177
+#' differentially expressed. Optional, default is 0
1178 1178
 #' @param gene.pCutoff The adjusted p-value cutoff to use for gene statistical
1179 1179
 #' significance. The default value is `0.05`
1180 1180
 #' @param gene.pAdjustment The p-value correction method for gene multiple
... ...
@@ -1211,8 +1211,8 @@ limma.DE <- function(expr,
1211 1211
 #'
1212 1212
 #' # add DE results to MirnaExperiment object
1213 1213
 #' obj <- addDifferentialExpression(obj, de_m, de_g,
1214
-#'     mirna.logFC = 1, mirna.pCutoff = 0.05,
1215
-#'     gene.logFC = 1, gene.pCutoff = 0.05
1214
+#'     mirna.pCutoff = 0.05,
1215
+#'     gene.pCutoff = 0.05
1216 1216
 #' )
1217 1217
 #'
1218 1218
 #' @author
... ...
@@ -1222,10 +1222,10 @@ limma.DE <- function(expr,
1222 1222
 addDifferentialExpression <- function(mirnaObj,
1223 1223
     mirnaDE = NULL,
1224 1224
     geneDE = NULL,
1225
-    mirna.logFC = 1,
1225
+    mirna.logFC = 0,
1226 1226
     mirna.pCutoff = 0.05,
1227 1227
     mirna.pAdjustment = "fdr",
1228
-    gene.logFC = 1,
1228
+    gene.logFC = 0,
1229 1229
     gene.pCutoff = 0.05,
1230 1230
     gene.pAdjustment = "fdr") {
1231 1231
     ## check inputs
... ...
@@ -1239,7 +1239,7 @@ addDifferentialExpression <- function(mirnaObj,
1239 1239
         length(mirna.logFC) != 1 |
1240 1240
         mirna.logFC < 0) {
1241 1241
         stop("'mirna.logFC' must be a non-neagtive number that specifies the ",
1242
-            "minimum absolute significant fold change (default is 1)",
1242
+            "minimum absolute significant fold change (default is 0)",
1243 1243
             call. = FALSE
1244 1244
         )
1245 1245
     }
... ...
@@ -1268,7 +1268,7 @@ addDifferentialExpression <- function(mirnaObj,
1268 1268
         length(gene.logFC) != 1 |
1269 1269
         gene.logFC < 0) {
1270 1270
         stop("'gene.logFC' must be a non-neagtive number that specifies the ",
1271
-            "minimum absolute significant fold change (default is 1)",
1271
+            "minimum absolute significant fold change (default is 0)",
1272 1272
             call. = FALSE
1273 1273
         )
1274 1274
     }
... ...
@@ -1317,8 +1317,8 @@ addDifferentialExpression <- function(mirnaObj,
1317 1317
         }
1318 1318
 
1319 1319
         ## define significantly differentially expressed miRNAs
1320
-        significantMirnas <- mirnaDE$ID[abs(mirnaDE$logFC) > mirna.logFC &
1321
-            mirnaDE$adj.P.Val < mirna.pCutoff]
1320
+        significantMirnas <- mirnaDE$ID[abs(mirnaDE$logFC) >= mirna.logFC &
1321
+            mirnaDE$adj.P.Val <= mirna.pCutoff]
1322 1322
 
1323 1323
         ## add miRNA differential expression results to mirnaObj
1324 1324
         mirnaDE(mirnaObj) <- list(
... ...
@@ -1358,8 +1358,8 @@ addDifferentialExpression <- function(mirnaObj,
1358 1358
         }
1359 1359
 
1360 1360
         ## define significantly differentially expressed genes
1361
-        significantGenes <- geneDE$ID[abs(geneDE$logFC) > gene.logFC &
1362
-            geneDE$adj.P.Val < gene.pCutoff]
1361
+        significantGenes <- geneDE$ID[abs(geneDE$logFC) >= gene.logFC &
1362
+            geneDE$adj.P.Val <= gene.pCutoff]
1363 1363
 
1364 1364
         ## add gene differential expression results to mirnaObj
1365 1365
         geneDE(mirnaObj) <- list(
... ...
@@ -845,8 +845,7 @@ fryMirnaTargets <- function(mirnaObj,
845 845
             y = expr,
846 846
             index = tgList,
847 847
             design = des,
848
-            contrast = con,
849
-            adjust.method = pAdjustment
848
+            contrast = con
850 849
         )
851 850
     } else {
852 851
         ## perform miRNA-target integration through 'fry'
... ...
@@ -854,10 +853,12 @@ fryMirnaTargets <- function(mirnaObj,
854 853
             y = expr,
855 854
             index = tgList,
856 855
             design = des,
857
-            contrast = con,
858
-            adjust.method = pAdjustment
856
+            contrast = con
859 857
         )
860 858
     }
859
+    
860
+    ## correct p-values for multiple testing
861
+    rs$adj.P.Val <- p.adjust(rs$PValue, method = pAdjustment)
861 862
 
862 863
     ## retain effects in the right direction
863 864
     dem <- mirnaDE(mirnaObj)
... ...
@@ -867,7 +868,7 @@ fryMirnaTargets <- function(mirnaObj,
867 868
         (rownames(rs) %in% downDem & rs$Direction == "Up"), ]
868 869
 
869 870
     ## maintain interactions under the specified cutoff
870
-    res <- rs[rs$FDR <= pCutoff, ]
871
+    res <- rs[rs$adj.P.Val <= pCutoff, ]
871 872
 
872 873
     ## print integration results
873 874
     if (nrow(res) == 0) {
... ...
@@ -893,7 +894,7 @@ fryMirnaTargets <- function(mirnaObj,
893 894
         }, res$microRNA, res$mirna.direction)
894 895
         res$DE_targets <- deTarg[1, ]
895 896
         res$DE <- deTarg[2, ]
896
-        res <- res[, c(7, 8, 2, 10, 1, 3, 4, 9)]
897
+        res <- res[, c(8, 9, 2, 11, 1, 3, 7, 10)]
897 898
         colnames(res) <- c(
898 899
             "microRNA", "mirna.direction", "gene.direction",
899 900
             "DE", "targets", "P.Val", "adj.P.Val", "DE.targets"
... ...
@@ -8,15 +8,15 @@
8 8
 #' @export
9 9
 #' @importMethodsFrom MultiAssayExperiment show
10 10
 setMethod("show", "MirnaExperiment", function(object) {
11
-    cat("An object of class MirnaExperiment, which extends",
11
+    cat("An object of class MirnaExperiment, which extends ",
12 12
         "MultiAssayExperiment class and contains:\n\n",
13 13
         "\t- microRNA expression values: ",
14
-        class(experiments(object)[["microRNA"]]), " with ",
14
+        class(experiments(object)[["microRNA"]])[1], " with ",
15 15
         nrow(experiments(object)[["microRNA"]]),
16 16
         " rows and ",
17 17
         ncol(experiments(object)[["microRNA"]]),
18 18
         " columns\n", "\t- gene expression values: ",
19
-        class(experiments(object)[["genes"]]), " with ",
19
+        class(experiments(object)[["genes"]])[1], " with ",
20 20
         nrow(experiments(object)[["genes"]]), " rows and ",
21 21
         ncol(experiments(object)[["genes"]]), " columns\n",
22 22
         "\t- samples metadata: ", class(colData(object)),
... ...
@@ -61,13 +61,13 @@ setMethod("show", "MirnaExperiment", function(object) {
61 61
 #' @export
62 62
 setMethod("show", "FunctionalEnrichment", function(object) {
63 63
     cat("Object of class FunctionalEnrichment containing:\n\n",
64
-        "\t- over-representation analysis results: ",
64
+        "\t- functional enrichment results: ",
65 65
         class(enrichmentResults(object)), " with ",
66 66
         nrow(enrichmentResults(object)), " rows and ",
67 67
         ncol(enrichmentResults(object)), " columns\n",
68
-        "\t- functional enrichment analysis: ", object@method, "\n",
68
+        "\t- enrichment method: ", object@method, "\n",
69 69
         "\t- organism: ", object@organism, "\n",
70
-        "\t- gene sets database: ", enrichmentDatabase(object), "\n",
70
+        "\t- gene-sets database: ", enrichmentDatabase(object), "\n",
71 71
         "\t- p-value cutoff used: ", object@pCutoff, "\n",
72 72
         "\t- p-value adjustment method: ", object@pAdjustment, "\n",
73 73
         "\t- features used for the enrichment: ", class(object@features),
... ...
@@ -2338,7 +2338,6 @@ plotCorrelation <- function(mirnaObj,
2338 2338
 #' @author
2339 2339
 #' Jacopo Ronchi, \email{jacopo.ronchi@@unimib.it}
2340 2340
 #'
2341
-#' @importFrom ggpubr mean_sd
2342 2341
 #' @export
2343 2342
 plotDE <- function(mirnaObj,
2344 2343
     features,
... ...
@@ -2621,16 +2620,30 @@ plotDE <- function(mirnaObj,
2621 2620
             y = "Expression", fill = "Condition"
2622 2621
         )
2623 2622
     } else if (graph == "barplot") {
2623
+        ## calculate standard error intervals
2624
+        interval <- vapply(seq(nrow(exprDf)), function(x) {
2625
+            g <- exprDf[x, "Gene"]
2626
+            cG <- exprDf[x, "Condition"]
2627
+            expCond <- exprDf[exprDf$Gene == g & exprDf$Condition == cG, ]
2628
+            se <- sd(expCond$Expression)/sqrt(nrow(expCond))
2629
+            c(mean(expCond$Expression) + se,
2630
+              mean(expCond$Expression - se))
2631
+        }, FUN.VALUE = numeric(2))
2632
+        exprDf$upperCi <- interval[1, ]
2633
+        exprDf$lowerCi <- interval[2, ]
2634
+        
2624 2635
         ## create a grouped barplot
2625 2636
         dePlot <- ggpubr::ggbarplot(
2626 2637
             data = exprDf,
2627 2638
             x = "Gene",
2628 2639
             y = "Expression",
2629 2640
             fill = "Condition",
2630
-            position = ggplot2::position_dodge(0.8),
2631
-            add = "mean_sd",
2632
-            error.plot = "upper_errorbar"
2633
-        )
2641
+            merge = TRUE,
2642
+            add = "mean"
2643
+        ) + geom_errorbar(aes(group = Condition, ymax = upperCi,
2644
+                              ymin = lowerCi),
2645
+                          position = position_dodge(width = 0.8),
2646
+                          width = 0.25)
2634 2647
     } else if (graph == "violinplot") {
2635 2648
         ## create a grouped violinplot
2636 2649
         dePlot <- ggpubr::ggviolin(
... ...
@@ -2976,10 +2989,10 @@ plotVolcano <- function(mirnaObj,
2976 2989
 
2977 2990
     ## determine Up and Downregulated features
2978 2991
     featDE$Change <- "Stable"
2979
-    featDE$Change[which(featDE$logFC > lCut &
2980
-        featDE$adj.P.Val < pCut)] <- "Up"
2981
-    featDE$Change[which(featDE$logFC < -lCut &
2982
-        featDE$adj.P.Val < pCut)] <- "Down"
2992
+    featDE$Change[which(featDE$logFC >= lCut &
2993
+        featDE$adj.P.Val <= pCut)] <- "Up"
2994
+    featDE$Change[which(featDE$logFC <= -lCut &
2995
+        featDE$adj.P.Val <= pCut)] <- "Down"
2983 2996
 
2984 2997
     ## determine the labels to show
2985 2998
     if (is.character(labels)) {
... ...
@@ -2991,7 +3004,7 @@ plotVolcano <- function(mirnaObj,
2991 3004
             featDE$ID[which(!featDE$ID %in% labels)] <- ""
2992 3005
         }
2993 3006
     } else if (is.numeric(labels)) {
2994
-        fcFeat <- featDE[abs(featDE$logFC) > lCut, ]
3007
+        fcFeat <- featDE[abs(featDE$logFC) >= lCut, ]
2995 3008
         featDE$ID[which(!featDE$ID %in% fcFeat$ID[seq(labels)])] <- ""
2996 3009
     }
2997 3010
 
... ...
@@ -3007,18 +3020,28 @@ plotVolcano <- function(mirnaObj,
3007 3020
     ) +
3008 3021
         ggplot2::geom_point(alpha = pointAlpha, size = pointSize) +
3009 3022
         ggplot2::scale_color_manual(values = colorScale) +
3010
-        ggplot2::geom_vline(
3011
-            xintercept = c(-lCut, lCut), lty = interceptType,
3012
-            col = interceptColor, lwd = interceptWidth
3013
-        ) +
3014
-        ggplot2::geom_hline(
3015
-            yintercept = -log10(pCutoff), lty = interceptType,
3016
-            col = interceptColor, lwd = interceptWidth
3017
-        ) +
3018 3023
         ggplot2::labs(
3019 3024
             x = "log2(fold change)",
3020 3025
             y = "-log10 (p-value)"
3021 3026
         )
3027
+    
3028
+    ## add logFC cutoff lines
3029
+    if (lCut != 0) {
3030
+        pVol <- pVol +
3031
+            ggplot2::geom_vline(
3032
+                xintercept = c(-lCut, lCut), lty = interceptType,
3033
+                col = interceptColor, lwd = interceptWidth
3034
+            )
3035
+    }
3036
+    
3037
+    ## add p-value cutoff line
3038
+    if (pCutoff != 1) {
3039
+        pVol <- pVol +
3040
+            ggplot2::geom_hline(
3041
+                yintercept = -log10(pCutoff), lty = interceptType,
3042
+                col = interceptColor, lwd = interceptWidth
3043
+            )
3044
+    }
3022 3045
 
3023 3046
     ## apply MIRit ggplot2 theme
3024 3047
     pVol <- pVol +
... ...
@@ -16,7 +16,7 @@ knitr::opts_chunk$set(
16 16
 # MIRit <img src="man/figures/logo.svg" align="right" height="139" alt="" />
17 17
 
18 18
 <!-- badges: start -->
19
-[![Devel version](https://img.shields.io/badge/devel%20version-0.99.11-blue.svg)](https://github.com/jacopo-ronchi/MIRit)
19
+[![Devel version](https://img.shields.io/badge/devel%20version-0.99.12-blue.svg)](https://github.com/jacopo-ronchi/MIRit)
20 20
 [![GitHub issues](https://img.shields.io/github/issues/jacopo-ronchi/MIRit)](https://github.com/jacopo-ronchi/MIRit/issues)
21 21
 [![GitHub pulls](https://img.shields.io/github/issues-pr/jacopo-ronchi/MIRit)](https://github.com/jacopo-ronchi/MIRit/pulls)
22 22
 [![Last commit](https://img.shields.io/github/last-commit/jacopo-ronchi/MIRit.svg)](https://github.com/jacopo-ronchi/MIRit/commits/devel)
... ...
@@ -6,7 +6,7 @@
6 6
 <!-- badges: start -->
7 7
 
8 8
 [![Devel
9
-version](https://img.shields.io/badge/devel%20version-0.99.11-blue.svg)](https://github.com/jacopo-ronchi/MIRit)
9
+version](https://img.shields.io/badge/devel%20version-0.99.12-blue.svg)](https://github.com/jacopo-ronchi/MIRit)
10 10
 [![GitHub
11 11
 issues](https://img.shields.io/github/issues/jacopo-ronchi/MIRit)](https://github.com/jacopo-ronchi/MIRit/issues)
12 12
 [![GitHub
... ...
@@ -14,7 +14,7 @@ MIRit is an R package that provides several methods for investigating the relati
14 14
 Useful links:
15 15
 \itemize{
16 16
   \item \url{https://github.com/jacopo-ronchi/MIRit}
17
-  \item Report bugs at \url{https://support.bioconductor.org/tag/MIRit}
17
+  \item Report bugs at \url{https://github.com/jacopo-ronchi/MIRit/issues}
18 18
 }
19 19
 
20 20
 }
... ...
@@ -8,10 +8,10 @@ addDifferentialExpression(
8 8
   mirnaObj,
9 9
   mirnaDE = NULL,
10 10
   geneDE = NULL,
11
-  mirna.logFC = 1,
11
+  mirna.logFC = 0,
12 12
   mirna.pCutoff = 0.05,
13 13
   mirna.pAdjustment = "fdr",
14
-  gene.logFC = 1,
14
+  gene.logFC = 0,
15 15
   gene.pCutoff = 0.05,
16 16
   gene.pAdjustment = "fdr"
17 17
 )
... ...
@@ -29,8 +29,7 @@ expression analysis. Check the \emph{details} section to see the required format
29 29
 Default is NULL not to add gene differential expression results}
30 30
 
31 31
 \item{mirna.logFC}{The minimum log2 fold change required to consider a miRNA
32
-as differentially expressed. Default is 1, to retain only two-fold
33
-differences}
32
+as differentially expressed. Optional, default is 0}
34 33
 
35 34
 \item{mirna.pCutoff}{The adjusted p-value cutoff to use for miRNA statistical
36 35
 significance. The default value is \code{0.05}}
... ...
@@ -40,7 +39,7 @@ testing. It must be one of: \code{fdr} (default), \code{BH}, \code{none}, \code{
40 39
 \code{hochberg}, \code{hommel}, \code{bonferroni}, \code{BY}}
41 40
 
42 41
 \item{gene.logFC}{The minimum log2 fold change required to consider a gene as
43
-differentially expressed. Default is 1, to retain only two-fold differences}
42
+differentially expressed. Optional, default is 0}
44 43
 
45 44
 \item{gene.pCutoff}{The adjusted p-value cutoff to use for gene statistical
46 45
 significance. The default value is \code{0.05}}
... ...
@@ -132,8 +131,8 @@ de_g <- geneDE(object = loadExamples(), onlySignificant = FALSE)
132 131
 
133 132
 # add DE results to MirnaExperiment object
134 133
 obj <- addDifferentialExpression(obj, de_m, de_g,
135
-    mirna.logFC = 1, mirna.pCutoff = 0.05,
136
-    gene.logFC = 1, gene.pCutoff = 0.05
134
+    mirna.pCutoff = 0.05,
135
+    gene.pCutoff = 0.05
137 136
 )
138 137
 
139 138
 }
... ...
@@ -12,7 +12,7 @@ performMirnaDE(
12 12
   contrast,
13 13
   design,
14 14
   method = "edgeR",
15
-  logFC = 1,
15
+  logFC = 0,
16 16
   pCutoff = 0.05,
17 17
   pAdjustment = "fdr",
18 18
   filterByExpr.args = list(),
... ...
@@ -40,7 +40,7 @@ performGeneDE(
40 40
   contrast,
41 41
   design,
42 42
   method = "edgeR",
43
-  logFC = 1,
43
+  logFC = 0,
44 44
   pCutoff = 0.05,
45 45
   pAdjustment = "fdr",
46 46
   filterByExpr.args = list(),
... ...
@@ -87,7 +87,7 @@ expression. For NGS experiments, it must be one of \code{edgeR} (default),
87 87
 \code{limma} can be used}
88 88
 
89 89
 \item{logFC}{The minimum log2 fold change required to consider a gene as
90
-differentially expressed. Default is 1, to retain only two-fold differences}
90
+differentially expressed. Optional, default is 0}
91 91
 
92 92
 \item{pCutoff}{The adjusted p-value cutoff to use for statistical
93 93
 significance. The default value is \code{0.05}}
... ...
@@ -170,10 +170,11 @@ expression results. To access these results, the user may run the
170 170
 \code{performMirnaDE()} and \code{performGeneDE()} are two functions provided by MIRit
171 171
 to conduct miRNA and gene differential expression analysis, respectively.
172 172
 In particular, these functions allow the user to compute differential
173
-expression through different methods, namely \code{edgeR}, \code{DESeq2}, \code{limma-voom}
174
-and \code{limma}. Data deriving from NGS experiments and microarray technology
175
-are all suitable for these functions. For precise indications about how to
176
-use these functions, please refer to the \emph{details} section.
173
+expression through different methods, namely \code{edgeR} (Quasi-Likelihood
174
+framework), \code{DESeq2}, \code{limma-voom} and \code{limma}. Data deriving from NGS
175
+experiments and microarray technology are all suitable for these functions.
176
+For precise indications about how to use these functions, please refer to
177
+the \emph{details} section.
177 178
 }
178 179
 \details{
179 180
 When performing differential expression for NGS experiments, count matrices
... ...
@@ -62,8 +62,8 @@ obj <- loadExamples()
62 62
 
63 63
 \donttest{
64 64
 # search disease
65
-searchDisease("Alzheimer disease")
66
-disId <- "Alzheimer disease"
65
+searchDisease("response to antidepressant")
66
+disId <- "response to antidepressant"
67 67
 
68 68
 # retrieve associated SNPs
69 69
 association <- findMirnaSNPs(obj, disId)
... ...
@@ -19,66 +19,101 @@ package: "`r pkg_ver('MIRit')`"
19 19
 link-citations: true
20 20
 bibliography: references.bib
21 21
 abstract: |
22
-  In this vignette, we are going to see how to use MIRit for investigating the compromised miRNA-gene regulatory networks in thyroid cancer. In particular, an RNA-Seq experiment will be used as an example to demonstrate how to perform an integrative analysis with MIRit, including differential expression analysis, functional enrichment and characterization, correlation analysis and, lastly, the construction and visualization of the impaired miRNAs regulatory axes within biological pathways.
22
+  In this vignette, we are going to see how to use MIRit for investigating the
23
+  compromised miRNA-gene regulatory networks in thyroid cancer. In particular,
24
+  an RNA-Seq experiment will be used as an example to demonstrate how to perform
25
+  an integrative analysis with MIRit, including differential expression
26
+  analysis, functional enrichment and characterization, correlation analysis
27
+  and, lastly, the construction and visualization of the impaired miRNAs
28
+  regulatory axes within biological pathways.
23 29
 vignette: |
24 30
   %\VignetteIndexEntry{Integrate miRNA and gene expression data with MIRit}
25 31
   %\VignetteEncoding{UTF-8}
26 32
   %\VignetteEngine{knitr::rmarkdown}
27
-editor_options: 
28
-  markdown: 
29
-    wrap: 80
30 33
 ---
31 34
 
32 35
 ```{r setup, include = FALSE}
33 36
 knitr::opts_chunk$set(
34
-    collapse = TRUE,
35
-    comment = "#>",
36
-    crop = NULL
37
+  collapse = TRUE,
38
+  comment = "#>",
39
+  crop = NULL
37 40
 )
38 41
 ```
39 42
 
40 43
 
41 44
 # Introduction
42 45
 
43
-<img src="../man/figures/logo.svg" style="float:right; padding:20px" height="192" alt=""/>
46
+<img src="../man/figures/logo.svg" style="float:right;
47
+padding:20px" height="192" alt=""/>
44 48
 
45 49
 ## What is MIRit
46 50
 
47
-`MIRit` (miRNA integration tool) is an open-source R package that aims to facilitate the comprehension of microRNA (miRNA) biology through the integrative analysis of gene and miRNA expression data deriving from different platforms, including microarrays, RNA-Seq, miRNA-Seq, proteomics and single-cell transcriptomics. Given their regulatory importance, a complete characterization of miRNA dysregulations results crucial to explore the molecular networks that may lead to the insurgence of complex diseases. Unfortunately, there are no currently available options for thoroughly interpreting the biological consequences of miRNA dysregulations, thus limiting the ability to identify the affected pathways and reconstruct the compromised molecular networks. To tackle this limitation, we developed MIRit, an all-in-one framework that provides flexible and powerful methods for performing integrative miRNA-mRNA multi-omic analyses from start to finish. In particular, MIRit includes multiple modules that allow to perform:
48
-
49
-1. **Differential expression analysis**, which identifies miRNAs and genes that vary across biological conditions for both RNA-Seq and microarray experiments (even though other technologies can also be used);
50
-2. **Functional enrichment analysis**, which allows to understand the consequences of gene dysregulations through different strategies, including over-representation analysis (ORA), gene-set enrichment analysis (GSEA) and Correlation Adjusted MEan RAnk gene set test (CAMERA);
51
-3. **SNP association**, that links miRNA dysregulations to disease-associated SNPs occurring at miRNA gene loci;
52
-4. **Target identification**, which retrieves predicted and validated miRNA-target interactions through innovative state-of-the-art approaches that limit false discoveries;
53
-5. **Expression levels integration**, which integrates the expression levels of miRNAs and mRNAs for both paired data, through correlation analyses, and unpaired data, through association tests and rotation gene-set tests;
54
-6. **Topological analysis**, which implements a novel approach called *Topologically-Aware Integrative Pathway Analysis (TAIPA)* for identifying the impaired molecular networks that affect biological pathways retrieved from KEGG, Reactome and WikiPathways.
51
+`MIRit` (miRNA integration tool) is an open source R package that aims to
52
+facilitate the comprehension of microRNA (miRNA) biology through the integrative
53
+analysis of gene and miRNA expression data deriving from different platforms,
54
+including microarrays, RNA-Seq, miRNA-Seq, proteomics and single-cell
55
+transcriptomics. Given their regulatory importance, a complete characterization
56
+of miRNA dysregulations results crucial to explore the molecular networks that
57
+may lead to the insurgence of complex diseases. Unfortunately, there are no
58
+currently available options for thoroughly interpreting the biological
59
+consequences of miRNA dysregulations, thus limiting the ability to identify
60
+the affected pathways and reconstruct the compromised molecular networks. To
61
+tackle this limitation, we developed MIRit, an all-in-one framework that
62
+provides flexible and powerful methods for performing integrative miRNA-mRNA
63
+multi-omic analyses from start to finish. In particular, MIRit includes multiple
64
+modules that allow to perform:
65
+
66
+1. **Differential expression analysis**, which identifies miRNAs and genes that
67
+vary across biological conditions for both RNA-Seq and microarray experiments
68
+(even though other technologies can also be used);
69
+2. **Functional enrichment analysis**, which allows to understand the
70
+consequences of gene dysregulations through different strategies, including 
71
+over-representation analysis (ORA), gene-set enrichment analysis (GSEA) and
72
+Correlation Adjusted MEan RAnk gene set test (CAMERA);
73
+3. **SNP association**, that links miRNA dysregulations to disease-associated
74
+SNPs occurring at miRNA gene loci;
75
+4. **Target identification**, which retrieves predicted and validated
76
+miRNA-target interactions through innovative state-of-the-art approaches that
77
+limit false discoveries;
78
+5. **Expression levels integration**, which integrates the expression levels of
79
+miRNAs and mRNAs for both paired data, through correlation analyses, and
80
+unpaired data, through association tests and rotation gene-set tests;
81
+6. **Topological analysis**, which implements a novel approach called
82
+*Topology-Aware Integrative Pathway Analysis (TAIPA)* for identifying the
83
+impaired molecular networks that affect biological pathways retrieved from KEGG,
84
+Reactome and WikiPathways.
55 85
 
56 86
 ## How to cite MIRit
57 87
 
58
-If you use MIRit in published research, please cite:
88
+If you use MIRit in published research, please cite the following paper:
59 89
 
60
-> Ronchi J and Foti M. 'MIRit: an integrative R framework for the identification of impaired miRNA-mRNA regulatory networks in complex diseases'. bioRxiv (2023). doi:10.1101/2023.11.24.568528
90
+> Ronchi J and Foti M. 'MIRit: an integrative R framework for the identification
91
+of impaired miRNA-mRNA regulatory networks in complex diseases'. bioRxiv (2023).
92
+doi:10.1101/2023.11.24.568528
61 93
 
62
-This package internally uses different R/Bioconductor packages, remember to cite the appropriate publication.
94
+This package internally uses different R/Bioconductor packages, remember to cite
95
+the appropriate publications.
63 96
 
64 97
 ## Installation
65 98
 
66
-Before starting, MIRit must be installed on your system. You can do this through Bioconductor.
99
+Before starting, MIRit must be installed on your system. You can do this through
100
+Bioconductor.
67 101
 
68 102
 ```{r getPackage, eval=FALSE}
69 103
 ## install MIRit from Bioconductor
70 104
 if (!requireNamespace("BiocManager", quietly = TRUE)) {
71
-    install.packages("BiocManager")
105
+  install.packages("BiocManager")
72 106
 }
73 107
 BiocManager::install("MIRit")
74 108
 ```
75 109
 
76
-If needed, you could also install the development version of MIRit directly from GitHub:
110
+If needed, you could also install the development version of MIRit directly from
111
+GitHub:
77 112
 
78 113
 ```{r getPackageDevel, eval=FALSE}
79 114
 ## install the development version from GitHub
80 115
 if (!requireNamespace("BiocManager", quietly = TRUE)) {
81
-    install.packages("BiocManager")
116
+  install.packages("BiocManager")
82 117
 }
83 118
 BiocManager::install("jacopo-ronchi/MIRit")
84 119
 ```
... ...
@@ -95,9 +130,15 @@ library(MIRit)
95 130
 
96 131
 ## Load example data
97 132
 
98
-To demonstrate the capabilities of MIRit we will use RNA-Seq data from @riesco-eizaguirre_mir-146b-3ppax8nis_2015. This experiment collected samples from 8 papillary thyroid carcinoma tumors and contralateral normal thyroid tissue from the same patients. These samples were profiled for gene expression through RNA-Seq, and for miRNA expression through miRNA-Seq. To provide an easy access to the user, raw count matrices have been retrieved from GEO and included in this package.
133
+To demonstrate the capabilities of MIRit we will use RNA-Seq data from
134
+@riesco-eizaguirre_mir-146b-3ppax8nis_2015. This experiment collected samples
135
+from 8 papillary thyroid carcinoma tumors and contralateral normal thyroid
136
+tissue from the same patients. These samples were profiled for gene expression
137
+through RNA-Seq, and for miRNA expression through miRNA-Seq. To provide easy
138
+access to the user, raw count matrices have been retrieved from GEO and included
139
+in this package.
99 140
 
100
-To load the example data, we can simply use the `data()` function for both gene and miRNA count matrices.
141
+To load example data, we can simply use the `data()` function:
101 142
 
102 143
 ```{r example}
103 144
 ## load count matrix for genes
... ...
@@ -109,17 +150,38 @@ data(mirnaCounts, package = "MIRit")
109 150
 
110 151
 ## Paired vs unpaired data
111 152
 
112
-When using MIRit, we must specify whether miRNA and gene expression values derive from the same individuals or not. As already mentioned, **paired data** are those where individuals used to measure gene expression are the same subjects used to measure miRNA expression. On the other hand, **unpaired data** are those where gene expression and miRNA expression derive from different cohorts of donors. Importantly, MIRit considers as paired samples those data sets where paired measurements are available for at least some samples.
153
+When using MIRit, we must specify whether miRNA and gene expression values
154
+derive from the same individuals or not. As already mentioned, **paired data**
155
+are those where individuals used to measure gene expression are the same
156
+subjects used to measure miRNA expression. On the other hand, **unpaired data**
157
+are those where gene expression and miRNA expression derive from different
158
+cohorts of donors. Importantly, MIRit considers as paired samples those data
159
+sets where paired measurements are available for at least some samples.
113 160
 
114
-In our case, miRNA and gene expression data originate from the same subjects, and therefore we will conduct a *paired samples* analysis.
161
+In our case, miRNA and gene expression data originate from the same subjects,
162
+and therefore we will conduct a *paired samples* analysis.
115 163
 
116 164
 ## Set up expression matrices
117 165
 
118
-As input data, MIRit requires miRNA and gene expression measurements as matrices with samples as columns, and genes/miRNAs as rows. Further, the row names of miRNA expression matrix should contain miRNA names according to miRBase nomenclature (e.g. hsa-miR-151a-5p, hsa-miR-21-5p), whereas for gene expression matrix, row names must contain gene symbols according to HGNC (e.g. TYK2, BDNF, NTRK2).
119
-
120
-These matrices may handle different types of values deriving from multiple technologies, including microarrays, RNA-Seq and proteomics. The only requirement is that, for microarray studies, expression matrices must be normalized and log2 transformed. This can be achieved by applying the RMA algorithm implemented in the `r Biocpkg("oligo")` [@carvalho_framework_2010] package, or by applying other quantile normalization strategies. On the contrary, for RNA-Seq and miRNA-Seq experiments, the simple count matrix must be supplied.
121
-
122
-Eventually, expression matrices required by MIRit should appear as those in `mirnaCounts` and `geneCounts`, which are displayed in Tables \@ref(tab:geneExpr) and \@ref(tab:mirnaExpr).
166
+As input data, MIRit requires miRNA and gene expression measurements as
167
+matrices with samples as columns, and genes/miRNAs as rows. Further, the row
168
+names of miRNA expression matrix should contain miRNA names according to
169
+miRBase nomenclature (e.g. hsa-miR-151a-5p, hsa-miR-21-5p), whereas for gene
170
+expression matrix, row names must contain gene symbols according to HGNC
171
+(e.g. TYK2, BDNF, NTRK2).
172
+
173
+These matrices may handle different types of values deriving from multiple
174
+technologies, including microarrays, RNA-Seq and proteomics. The only
175
+requirement is that, for microarray studies, expression matrices must be
176
+normalized and log2 transformed. This can be achieved by applying the RMA
177
+algorithm implemented in the `r Biocpkg("oligo")` [@carvalho_framework_2010]
178
+package, or by applying other quantile normalization strategies. On the
179
+contrary, for RNA-Seq and miRNA-Seq experiments, the simple count matrix must
180
+be supplied.
181
+
182
+Eventually, expression matrices required by MIRit should appear as those in
183
+`mirnaCounts` and `geneCounts`, which are displayed in Tables
184
+\@ref(tab:geneExpr) and \@ref(tab:mirnaExpr).
123 185
 
124 186
 ```{r geneExpr, echo=FALSE}
125 187
 ## print a table for gene expression matrix
... ...
@@ -133,15 +195,22 @@ knitr::kable(mirnaCounts[seq(5), seq(5)], digits = 2, caption = "MiRNA expressio
133 195
 
134 196
 ## Define sample metadata {#meta}
135 197
 
136
-Once we have expression matrices in the proper format, we need to inform MIRit about the samples in study and the biological conditions of interest. To do so, we need to create a `data.frame` that must contain:
198
+Once we have expression matrices in the proper format, we need to inform MIRit
199
+about the samples in study and the biological conditions of interest. To do so,
200
+we need to create a `data.frame` that must contain:
137 201
 
138
-- a column named `primary`, specifying a unique identifier for each different subject;
139
-- a column named `mirnaCol`, matching the column name used for each sample in the miRNA expression matrix;
140
-- a column named `geneCol`, matching the column name used for each sample in the gene expression matrix;
202
+- a column named `primary`, specifying a unique identifier for each different
203
+subject;
204
+- a column named `mirnaCol`, matching the column name used for each sample in
205
+the miRNA expression matrix;
206
+- a column named `geneCol`, matching the column name used for each sample in
207
+the gene expression matrix;
141 208
 - a column that defines the biological condition of interest;
142
-- other optional columns that store specific sample metadata, such as age, sex and so on...
209
+- other optional columns that store specific sample metadata, such as age, sex
210
+and so on...
143 211
 
144
-Firstly, let's take a look at the column names used for miRNA and gene expression matrices.
212
+Firstly, let's take a look at the column names used for miRNA and gene
213
+expression matrices.
145 214
 
146 215
 ```{r colnames}
147 216
 ## print sample names in geneCounts
... ...
@@ -154,9 +223,16 @@ colnames(mirnaCounts)
154 223
 identical(colnames(geneCounts), colnames(mirnaCounts))
155 224
 ```
156 225
 
157
-In our case, we see that both expression matrices have the same column names, and therefore `mirnaCol` and `geneCol` will contain the same values. However, note that is not always the case, especially for unpaired data, where miRNA and gene expression values derive from different subjects. In these cases, `mirnaCol` and `geneCol` must map each column of miRNA and gene expression matrices to the relative subjects indicated in the `primary` column. Notably, for unpaired data, NAs can be used for missing entries in `mirnaCol`/`geneCol`.
226
+In our case, we see that both expression matrices have the same column names,
227
+and therefore `mirnaCol` and `geneCol` will be identical. However, note that is
228
+not always the case, especially for unpaired data, where miRNA and gene
229
+expression values derive from different subjects. In these cases, `mirnaCol` and
230
+`geneCol` must map each column of miRNA and gene expression matrices to the
231
+relative subjects indicated in the `primary` column. Notably, for unpaired data,
232
+`NAs` can be used for missing entries in `mirnaCol`/`geneCol`.
158 233
 
159
-That said, we can proceed to create the `data.frame` with sample metadata as follows.
234
+That said, we can proceed to create the `data.frame` with sample metadata as
235
+follows.
160 236
 
161 237
 ```{r metadata}
162 238
 ## create a data.frame containing sample metadata
... ...
@@ -169,9 +245,16 @@ meta <- data.frame(primary = colnames(mirnaCounts),
169 245
 
170 246
 ## Create a `MirnaExperiment` object
171 247
 
172
-At this point, after setting up expression matrices, and after defining sample metadata, we need to create an object of class `MirnaExperiment`, which is the main class used in MIRit to store all the data that are necessary for the integrative miRNA-mRNA analysis. In particular, this class extends the `MultiAssayExperiment` class from the homonym package [@ramos_software_2017] to store expression levels of both miRNAs and genes, differential expression results, miRNA-target pairs and integrative miRNA-gene analysis.
248
+At this point, we need to create an object of class `MirnaExperiment`, which is
249
+the main class used in MIRit for integrative miRNA-mRNA analyses. In particular,
250
+this class extends the `MultiAssayExperiment` class from the homonym package
251
+[@ramos_software_2017] to store expression levels of both miRNAs and genes,
252
+differential expression results, miRNA-target pairs and integrative miRNA-gene
253
+analysis.
173 254
 
174
-The easiest way to create a valid `MirnaExperiment` object is to use the `MirnaExperiment()` function, which automatically handles the formatting of input data and the creation of the object.
255
+The easiest way to create a valid `MirnaExperiment` object is to use the
256
+`MirnaExperiment()` function, which automatically handles the formatting of
257
+input data and the creation of the object.
175 258
 
176 259
 ```{r mirnaObj}
177 260
 ## create the MirnaExperiment object
... ...
@@ -181,15 +264,26 @@ experiment <- MirnaExperiment(mirnaExpr = mirnaCounts,
181 264
                               pairedSamples = TRUE)
182 265
 ```
183 266
 
267
+
184 268
 # Differential expression analysis
185 269
 
186
-Now that the `MirnaExperiment` object has been created, we can move to differential expression analysis, which aims to define differentially expressed features across biological conditions.
270
+Now that the `MirnaExperiment` object has been created, we can move to
271
+differential expression analysis, which aims to define differentially expressed
272
+features across biological conditions.
187 273
 
188 274
 ## Visualize expression variability
189 275
 
190
-Firstly, before doing anything else, it is useful to explore miRNA and gene expression variability through dimensionality reduction techniques. This is useful because it allows us to visualize the main drivers of expression variability. In this regard, MIRit offers the `plotDimensions()` function, which enables to visualize both miRNA and gene expression in the multidimensional space (MDS plots). Moreover, it is possible to color samples based on specific variables, hence allowing to explore miRNA/gene expression variation between distinct biological groups.
276
+Firstly, before doing anything else, it is useful to explore expression
277
+variability through dimensionality reduction techniques. This is useful because
278
+it allows us to visualize the main drivers of expression differences. In this
279
+regard, MIRit offers the `plotDimensions()` function, which enables to visualize
280
+both miRNA and gene expression in the multidimensional space (MDS plots).
281
+Moreover, it is possible to color samples based on specific variables, hence
282
+allowing to explore specific patterns between distinct biological groups.
191 283
 
192
-In our example, let's examine expression variability for both miRNAs and genes, and let's color the samples based on "disease", a variable included in the previously defined metadata.
284
+In our example, let's examine expression variability for both miRNAs and genes,
285
+and let's color the samples based on "disease", a variable included in the
286
+previously defined metadata.
193 287
 
194 288
 ```{r mds, fig.wide=TRUE, fig.cap="MDS plots for miRNAs and genes. Both plots show that the main source of variability is given by the disease condition."}
195 289
 geneMDS <- plotDimensions(experiment,
... ...
@@ -206,70 +300,123 @@ ggpubr::ggarrange(geneMDS, mirnaMDS,
206 300
                   nrow = 1, labels = "AUTO", common.legend = TRUE)
207 301
 ```
208 302
 
209
-As we can see from Figures \@ref(fig:mds)A and \@ref(fig:mds)B, the samples are very well separated on the basis of disease condition, thus suggesting that this is the major factor that influences expression variability. This is exactly what we want, since we aim to evaluate the differences between cancer and normal tissue. If this wasn't the case, we should identify the confounding variables, and include them in the model used for differential expression analysis (see Section \@ref(model)).
303
+As we can see from Figures \@ref(fig:mds)A and \@ref(fig:mds)B, the samples are
304
+very well separated on the basis of disease condition, thus suggesting that this
305
+is the major factor that influences expression variability. This is exactly what
306
+we want, since we aim to evaluate the differences between cancer and normal
307
+tissue. If this wasn't the case, we should identify the confounding variables
308
+and include them in the model used for differential expression analysis
309
+(see Section \@ref(model)).
210 310
 
211 311
 ## Perform miRNA and gene differential expression
212 312
 
213 313
 ### Available methods for RNA-Seq and microarrays {#deMethods}
214 314
 
215
-Now, we are ready to perform differential expression analysis. In this concern, MIRit provides different options based on the technology used for generating expression data. Indeed, when expression measurements derive from microarrays, MIRit calculates differentially expressed features through the pipeline implemented in the `r Biocpkg("limma")` package. On the other hand, when expression values derive from RNA-Seq experiments, MIRit allows to choose between different approaches, including:
315
+Now, we are ready to perform differential expression analysis. In this concern,
316
+MIRit provides different options based on the technology used. Indeed, when
317
+expression measurements derive from microarrays, MIRit calculates differentially
318
+expressed features through the pipeline implemented in the `r Biocpkg("limma")`
319
+package. On the other hand, when expression values derive from RNA-Seq
320
+experiments, MIRit allows to choose between different approaches, including:
216 321
 
217
-- the approach defined in the `r Biocpkg("edgeR")` package;
322
+- the Quasi-Likelihood framework defined in the `r Biocpkg("edgeR")` package;
218 323
 - the approach defined in the `r Biocpkg("DESeq2")` package;
219 324
 - the `limma-voom` approach defined in the `r Biocpkg("limma")` package.
220 325
 
221
-Moreover, MIRit gives the possibility of fully customizing the parameters used for differential expression analysis, thus allowing a finer control that makes it easy to adopt strategies that differ from the standard pipelines proposed in these packages. For additional information, see Section \@ref(param).
326
+Moreover, MIRit gives the possibility of fully customizing the parameters used
327
+for differential expression analysis, thus allowing a finer control that makes
328
+it possible to adopt strategies that differ from the standard pipelines proposed
329
+in these packages. For additional information, see Section \@ref(param).
222 330
 
223 331
 ### Model design {#model}
224 332
 
225
-After identifying the variable of interest and the confounding factors, we must indicate the experimental model used for fitting expression values. Notably, MIRit will automatically take care of model fitting, so that we only need to indicate a formula with the appropriate variables.
333
+After identifying the variable of interest and the confounding factors, we must
334
+indicate the experimental model used for fitting expression values. Notably,
335
+MIRit will automatically take care of model fitting, so that we only need to
336
+indicate a formula with the appropriate variables.
226 337
 
227
-In our case, we want to evaluate the differences between cancer and normal thyroid. Therefore, disease condition is our variable of interest. However, in this experimental design, each individual has been assayed twice, one time for cancer tissue, and one time for healthy contralateral thyroid. Thus, we also need to include the patient ID as a covariate in order to prevent the individual differences between subjects from confounding the differences due to the disease.
338
+In our case, we want to evaluate the differences between cancer and normal
339
+thyroid. Therefore, disease condition is our variable of interest. However,
340
+in this experimental design, each individual has been assayed twice, one time
341
+for cancer tissue, and one time for healthy contralateral thyroid. Thus, we also
342
+need to include the patient ID as a covariate in order to prevent the individual
343
+differences between subjects from confounding the differences due to disease.
228 344
 
229 345
 ```{r model}
230 346
 ## design the linear model for both genes and miRNAs
231 347
 model <- ~ disease + patient
232 348
 ```
233 349
 
234
-If other variables affecting miRNA and gene expression are observed, they should be included in this formula.
350
+If other variables affecting miRNA and gene expression are observed, they should
351
+be included in this formula.
235 352
 
236 353
 ### The `performMirnaDE()` and `performGeneDE()` functions
237 354
 
238
-Once the linear model has been defined, we can perform the differential expression analysis through the `performMirnaDE()` and `performGeneDE()` functions. Indeed, these two functions take as input a `MirnaExperiment` object, and compute differential expression for miRNAs and genes.
239
-
240
-Additionally, when we run these functions, we must define different arguments, namely:
241
-
242
-- `group`, which corresponds to the name of the variable of interest as specified in Section \@ref(meta). In our case, we are interested in studying the differences between cancer tissue and normal tissue, and therefore our variable of interest is "*disease*". 
243
-- `contrast`, which indicates the levels of the variable of interest to be compared. In particular, this parameter takes as input a string where the levels are separated by a dash, and where the second level corresponds to the reference group. In our example, we want to compare samples from papillary thyroid cancer (PTC) against normal thyroid tissue (NTH), thus we set `contrast` to "*PTC-NTH*".
244
-- `design`, which specifies the linear model with the variable of interest and eventual covariates. To do so, we pass to this parameter the R formula that we defined in Section \@ref(model).
245
-- `method`, which tells MIRit which pipeline we want to use for computing differentially expressed features. As stated in Section \@ref(deMethods), for microarray studies the only option available is `limma`, while for RNA-Seq experiments, the user can choose between `edgeR`, `DESeq2`, and `voom` (for limma-voom). In our case we are going to perform differential expression analysis through the pipeline implemented in the `r Biocpkg("edgeR")` package.
246
-
247
-Following our example, let's calculate differentially expressed genes and differentially expressed miRNAs in thyroid cancer.
248
-
249
-```{r diffexp, eval=FALSE}
355
+Once the linear model has been defined, we can perform differential expression
356
+analysis through the `performMirnaDE()` and `performGeneDE()` functions, which
357
+take as input a `MirnaExperiment` object, and compute differential expression
358
+for miRNAs and genes, respectively.
359
+
360
+Additionally, we must define multiple arguments, namely:
361
+
362
+- `group`, which corresponds to the name of the variable of interest, as
363
+specified in Section \@ref(meta). In our case, we are interested in studying the
364
+differences between cancer tissue and normal tissue. Therefore our variable of
365
+interest is "*disease*". 
366
+- `contrast`, which indicates the levels of the variable of interest to be
367
+compared. In particular, this parameter takes as input a string where the
368
+levels are separated by a dash, and where the second level corresponds to the
369
+reference group. In our example, we want to compare samples from papillary
370
+thyroid cancer (PTC) against normal thyroid tissue (NTH), thus we set `contrast`
371
+to "*PTC-NTH*".
372
+- `design`, which specifies the linear model with the variable of interest and
373
+eventual covariates. To do so, we pass to this parameter the R formula that we
374
+defined in Section \@ref(model).
375
+- `method`, which tells MIRit which pipeline we want to use for computing
376
+differentially expressed features. As stated in Section \@ref(deMethods), for
377
+microarray studies the only option available is `limma`, while for RNA-Seq
378
+experiments, the user can choose between `edgeR`, `DESeq2`, and `voom` (for
379
+limma-voom). In our case we are going to perform differential expression
380
+analysis through the Quasi-Likelihood pipeline implemented in the
381
+`r Biocpkg("edgeR")` package.
382
+
383
+Following our example, let's calculate differentially expressed genes and
384
+differentially expressed miRNAs in thyroid cancer.
385
+
386
+```{r diffexp}
250 387
 ## perform differential expression for genes
251 388
 experiment <- performGeneDE(experiment,
252 389
                             group = "disease",
253 390
                             contrast = "PTC-NTH",
254
-                            design = model)
391
+                            design = model,
392
+                            pCutoff = 0.01)
255 393
 
256 394
 ## perform differential expression for miRNAs
257 395
 experiment <- performMirnaDE(experiment,
258 396
                              group = "disease",
259 397
                              contrast = "PTC-NTH",
260
-                             design = model)
261
-```
262
-
263
-```{r, echo=FALSE}
264
-## load the example object
265
-experiment <- loadExamples()
398
+                             design = model,
399
+                             pCutoff = 0.01)
266 400
 ```
267 401
 
268
-If not specified, the `performMirnaDE()` and `performGeneDE()` functions will define differentially expressed genes/miRNAs as those having an adjusted p-value lower than 0.05, and an absolute log2 fold change higher than 1 (FC > 2). However, this behavior can be changed by tweaking the `pCutoff` parameter, that specifies the statistical significance threshold; the `logFC` parameter, which indicates the minimum log2 fold change that features must display for being considered as differentially expressed; and the `pAdjustment` parameter, which specifies the approach used for multiple testing correction (default is `fdr` to use the Benjamini-Hochberg method).
402
+If not specified, the `performMirnaDE()` and `performGeneDE()` functions will
403
+define differentially expressed genes/miRNAs as those having an adjusted p-value
404
+lower than 0.05. However, this behavior can be changed by tweaking the `pCutoff`
405
+parameter, that specifies the statistical significance threshold; and the
406
+`pAdjustment` option, which specifies the approach used for multiple testing
407
+correction (default is `fdr` to use the Benjamini-Hochberg method). Optionally,
408
+it is possible to set the `logFC` parameter, which indicates the minimum log2
409
+fold change that features must display for being considered as differentially
410
+expressed. Please note that the simultaneous use of adjusted p-value and logFC
411
+cutoffs is discouraged and not recommended.
269 412
 
270 413
 ### Advanced parameters {#param}
271 414
 
272
-In addition to the above mentioned settings, other parameters can be passed to the `performMirnaDE()` and `performGeneDE()` functions. Specifically, depending on the method adopted for differential expression analysis, the user can finely control the arguments passed to each function involved in the pipeline. In particular, these two functions include the following advanced parameters:
415
+In addition to the above mentioned settings, other options can be passed to
416
+the `performMirnaDE()` and `performGeneDE()` functions. Specifically, depending
417
+on the method adopted for differential expression analysis, the user can finely
418
+control the arguments passed to each function involved in the pipeline. In
419
+particular, the following advanced parameters can be set:
273 420
 
274 421
 - `filterByExpr.args`,
275 422
 - `calcNormFactors.args`,
... ...
@@ -289,9 +436,23 @@ In addition to the above mentioned settings, other parameters can be passed to t
289 436
 - `correlationBlockVariable`
290 437
 - `duplicateCorrelation.args`
291 438
 
292
-In this regard, when using limma-voom strategy, the `useVoomWithQualityWeights` parameter tells MIRit whether to use `voomWithWualityWeights()` instead of the standard `voom()` function. In the same way, for microarray studies, the `useArrayWeights` specifies whether to consider array quality weights during the `limma` pipeline. Similarly, `useWsva` can be set to TRUE to include a weighted surrogate variable analysis for batch effect correction. Moreover, `useDuplicateCorrelation` can be set to TRUE if you want to consider the effect of correlated samples through the `duplicateCorrelation()` function in `limma`. In this concern, the `correlationBlockVariable` specifies the blocking variable to use. All the other parameters ending with "*.args*", accept a `list` object with additional parameters to be passed to the relative functions. In this way, the user has **full control** over the strategy used for differential expression analysis.
293
-
294
-For a complete reference on the usage of these parameters, check the help page of these functions. Instead, for further instructions on how to use these tools, please refer to their original manuals, which represent exceptional resources for learning the basics of differential expression analysis:
439
+In detail, when using limma-voom strategy, the `useVoomWithQualityWeights`
440
+parameter tells MIRit whether to use `voomWithWualityWeights()` instead of the
441
+standard `voom()` function. In the same way, for microarray studies, the
442
+`useArrayWeights` specifies whether to consider array quality weights during
443
+the `limma` pipeline. Similarly, `useWsva` can be set to TRUE to include a
444
+weighted surrogate variable analysis for batch effect correction. Moreover,
445
+`useDuplicateCorrelation` can be set to TRUE if you want to consider the effect
446
+of correlated samples through the `duplicateCorrelation()` function in `limma`.
447
+In this concern, the `correlationBlockVariable` specifies the blocking variable
448
+to use. All the other parameters ending with "*.args*", accept a `list` object
449
+with additional parameters to be passed to the relative functions. In this way,
450
+the user has **full control** over the strategy used.
451
+
452
+For a complete reference on the usage of these parameters, check the help page
453
+of these functions. Instead, for further instructions on how to use these tools,
454
+please refer to their original manuals, which represent exceptional resources
455
+for learning the basics of differential expression analysis:
295 456
 
296 457
 - limma User's Guide,
297 458
 - edgeR User's Guide,
... ...
@@ -299,23 +460,46 @@ For a complete reference on the usage of these parameters, check the help page o
299 460
 
300 461
 ### Add differential expression results from other technologies
301 462
 
302
-Even though MIRit implements all the most commonly used strategies for differential expression analyses, these methods may not be suitable for all kind of experiments. For instance, expression data deriving from technologies different from microarrays and RNA-Seq can't be processed through `performGeneDE()` and `performMirnaDE()` functions. Therefore, MIRit grants the possibility to perform differential expression analysis with every method of choice, and then add the results to an existing `MirnaExperiment` object. This is particularly valuable for proteomic studies, where different normalization strategies are used in differential expression pipelines. In this way, MIRit fully supports the use of proteomic data for conducting miRNA integrative analyses.
303
-
304
-To do so, we can make use of the `addDifferentialExpression()` function, which allows to manually add the results of the analysis. This function takes as input a `MirnaExperiment` object, and a table containing the differential expression results for all miRNAs/genes analyzed, not just for statistically significant species. If we want to manually set differential expression results for both miRNAs and genes, two different tables must be supplied. These tables must include:
305
-
306
-- One column containing miRNA/gene names (according to miRBase/HGNC nomenclature). Accepted column names are: `ID`, `Symbol`, `Gene_Symbol`, `Mirna`, `mir`, `Gene`, `gene.symbol`, `Gene.symbol`.
307
-- One column with log2 fold changes. Accepted column names are: `logFC`, `log2FoldChange`, `FC`, `lFC`.
308
-- One column with average expression. Accepted column names are: `AveExpr`, `baseMean`, `logCPM`.
309
-- One column with the p-values resulting from the differential expression analysis. Accepted column names are: `P.Value`, `pvalue`, `PValue`, `Pvalue`.
310
-- One column containing p-values adjusted for multiple testing. Accepted column names are: `adj.P.Val`, `padj`, `FDR`, `fdr`, `adj`, `adj.p`, `adjp`.
311
-
312
-Further, we must specify the cutoff levels used to consider miRNAs/genes as significantly differentially expressed. This can be done through the `mirna.logFC`, `mirna.pCutoff`, `mirna.pAdjustment`, `gene.logFC`, `gene.pCutoff`, `gene.pAdjustment` parameters.
463
+Even though MIRit implements all the most commonly used strategies for
464
+differential expression analyses, these methods may not be suitable for all kind
465
+of experiments. For instance, expression data deriving from technologies
466
+different from microarrays and RNA-Seq can't be processed through
467
+`performGeneDE()` and `performMirnaDE()` functions. Therefore, MIRit grants the
468
+possibility to perform differential expression analysis with every method of
469
+choice, and then add the results to an existing `MirnaExperiment` object. This
470
+is particularly valuable for proteomic studies, where different normalization
471
+strategies are used. In this way, MIRit fully supports the use of proteomic data
472
+for conducting miRNA integrative analyses.
473
+
474
+To do so, we can make use of the `addDifferentialExpression()` function, which
475
+takes as input a `MirnaExperiment` object, and a table containing the
476
+differential expression results for all miRNAs/genes analyzed (not just for
477
+statistically significant species). If we want to manually set differential
478
+expression results for both miRNAs and genes, two different tables must be
479
+supplied. These tables must include:
480
+
481
+- One column containing miRNA/gene names (according to miRBase/HGNC
482
+nomenclature). Accepted column names are: `ID`, `Symbol`, `Gene_Symbol`,
483
+`Mirna`, `mir`, `Gene`, `gene.symbol`, `Gene.symbol`.
484
+- One column with log2 fold changes. Accepted column names are: `logFC`,
485
+`log2FoldChange`, `FC`, `lFC`.
486
+- One column with average expression. Accepted column names are: `AveExpr`,
487
+`baseMean`, `logCPM`.
488
+- One column with the p-values resulting from the differential expression
489
+analysis. Accepted column names are: `P.Value`, `pvalue`, `PValue`, `Pvalue`.
490
+- One column containing p-values adjusted for multiple testing. Accepted column
491
+names are: `adj.P.Val`, `padj`, `FDR`, `fdr`, `adj`, `adj.p`, `adjp`.
492
+
493
+Further, we must specify the cutoff levels used to consider miRNAs/genes as
494
+significantly differentially expressed.
313 495
 
314 496
 ## Visualize differentially expressed features
315 497
 
316 498
 ### Access differential expression tables
317 499
 
318
-Once differential expression analysis has been performed, we can use the `mirnaDE()` and `geneDE()` functions to access a table with differentially expressed features. Therefore, let's access the results of differential expression analysis in thyroid cancer for both miRNAs and genes.
500
+Once differential expression analysis has been performed, we can use the
501
+`mirnaDE()` and `geneDE()` functions to access a table with differentially
502
+expressed features.
319 503
 
320 504
 ```{r accessDe}
321 505
 ## access DE results for genes
... ...
@@ -327,11 +511,12 @@ deMirnas <- mirnaDE(experiment)
327 511
 
328 512
 ### Create a volcano plot for miRNAs and genes
329 513
 
330
-In addition to differential expression tables, we can also generate a graphical overview of differential expression through volcano plots, which are extremely useful for visualizing features changing across biological conditions. To produce volcano plots, MIRit offers the `plotVolcano()` function.
331
-
332
-In our example, let's create volcano plots for both miRNA and gene differential expression.
514
+In addition to tables, we can also generate a graphical overview of
515
+differential expression through volcano plots, which are extremely useful for
516
+visualizing features changing across biological conditions. To produce volcano
517
+plots, MIRit offers the `plotVolcano()` function.
333 518
 
334
-```{r volcano, fig.wide=TRUE, fig.cap="Volcano plots of gene and miRNA differential expression."}
519
+```{r volcano, fig.wide=TRUE, fig.cap="Volcano plots of gene and miRNA differential expression. (A) shows the differentially expressed genes, while (B) displays differentially expressed miRNAs."}
335 520
 ## create a volcano plot for genes
336 521
 geneVolcano <- plotVolcano(experiment,
337 522
                            assay = "genes",
... ...
@@ -349,27 +534,45 @@ ggpubr::ggarrange(geneVolcano, mirnaVolcano,
349 534
 
350 535
 ### Produce differential expression bar plots
351 536
 
352
-Finally, if we are interested in specific genes/miRNAs, MIRit implements the `plotDE()` function that allows to represent expression changes of specific features as box plots, bar plots, or violin plots. In our example, we can use this function to visualize expression changes of different genes involved in the normal functioning of thyroid gland. Note that we use the `linear = FALSE` option to plot data in log scale (useful when multiple genes have very different expression levels).
537
+Finally, if we are interested in specific genes/miRNAs, MIRit implements the
538
+`plotDE()` function that allows to represent expression changes of specific
539
+features as box plots, bar plots, or violin plots. In our example, we can use
540
+this function to visualize expression changes of different genes involved in
541
+the normal functioning of thyroid gland. Note that we use the `linear = FALSE`
542
+option to plot data in log scale (useful when multiple genes have very different
543
+expression levels).
353 544
 
354 545
 ```{r thyroidBars, fig.wide=FALSE, fig.cap="Differential expression bar plots for different thyroid genes. Differential expression analysis demonstrated how TG, TPO, DIO2 and PAX8 result downregulated in thyroid cancer."}
355 546
 ## create a bar plot for all thyroid features
356
-thyrBar <- plotDE(experiment,
357
-                  features = c("TG", "TPO", "DIO2", "PAX8"),
358
-                  graph = "barplot", linear = FALSE)
359
-
360
-## show the resulting plot
361
-thyrBar
547
+plotDE(experiment,
548
+       features = c("TG", "TPO", "DIO2", "PAX8"),
549
+       graph = "barplot", linear = FALSE)
362 550
 ```
363 551
 
552
+
364 553
 # Functional enrichment analysis
365 554
 
366
-After finding differentially expressed genes, we usually end up having long lists of features whose expression changes between biological conditions. However, this is usually not very informative, and we seek to understand which biological functions result impaired in our experiments. In this regard, different methods exist for determining which cellular processes are dysregulated in our conditions.
555
+After finding differentially expressed genes, we usually end up having long
556
+lists of features whose expression changes between conditions. However, this is
557
+usually not very informative, and we seek to understand which functions result
558
+impaired in our experiments. In this regard, different methods exist for
559
+determining which cellular processes are dysregulated in our samples.
367 560
 
368 561
 ## Available approaches: ORA, GSEA and CAMERA
369 562
 
370
-In particular, MIRit supports different strategies for functional enrichment of genes, including over-representation analysis (ORA), gene-set enrichment analysis (GSEA), and Correlation Adjusted MEan RAnk gene set test (CAMERA). In this way, the user can infer compromised biological functions according to the approach of choice.
563
+MIRit supports different strategies for functional enrichment analysis of genes,
564
+including over-representation analysis (ORA), gene-set enrichment analysis
565
+(GSEA), and Correlation Adjusted MEan RAnk gene set test (CAMERA). In this way,
566
+the user can infer compromised biological functions according to the approach
567
+of choice.
371 568
 
372
-Among these methods, the first one that has been developed is named **over-representation analysis** [@boyle_gotermfinderopen_2004], often abbreviated as **ORA**. This analysis aims to define whether genes annotated to specific gene sets (such as ontological terms or biological pathways) are differentially expressed more than would be expected by chance. To do this, a p-value is calculated by the hypergeometric distribution for each gene set as in Equation \@ref(eq:hyper).
569
+Among these methods, the first one that has been developed is
570
+**over-representation analysis** [@boyle_gotermfinderopen_2004], often
571
+abbreviated as **ORA**. This analysis aims to define whether genes annotated to
572
+specific gene-sets (such as ontological terms or biological pathways) are
573
+differentially expressed more than would be expected by chance. To do this, a
574
+p-value is calculated by the hypergeometric distribution as in Equation
575
+\@ref(eq:hyper).
373 576
 
374 577
 \begin{equation}
375 578
   p = 1 - \sum_{i = 0}^{k - 1}{\frac{\binom{M}{i}\binom{N - M}
... ...
@@ -377,19 +580,53 @@ Among these methods, the first one that has been developed is named **over-repre
377 580
   (\#eq:hyper)
378 581
 \end{equation}
379 582
 
380
-Here, $N$ is the total number of genes tested, $M$ is the number of genes that are annotated to a particular gene set, $n$ is the number of differentially expressed genes, and $k$ is the number of differentially expressed genes that are annotated to the gene set.
381
-
382
-Additionally, another available approach is the **gene set enrichment analysis** [@subramanian_gene_2005], often refereed to with the acronym **GSEA**, which is suitable to find categories whose genes change in a small but coordinated way. The GSEA algorithm takes as input a list of genes ranked with a particular criterion, and then walks down the list to evaluate whether members of a given gene set are normally distributed or are mainly present at the top or at the bottom of the list. To check this out, the algorithm uses a running-sum that increases when finding a gene belonging to a given category, and decreases when a gene not contained in that specific set is found. The maximum distance from zero occurred in the running score is defined as the enrichment score (ES). To estimate the statistical significance of enrichment scores, a permutation test is performed by swapping gene labels annotated to a gene set.
383
-
384
-Even though GSEA is arguably the most commonly used approach for functional enrichment, @wu_camera_2012 demonstrated that inter-gene correlations might affect the reliability of functional enrichment analyses. To overcome this issue, they developed **Correlation Adjusted MEan RAnk gene set test (CAMERA)**, which is another competitive test used for functional enrichment analysis of genes. The main advantage of this method is that it adjusts the gene set test statistic according to inter-gene correlations.
583
+Here, $N$ is the total number of genes tested, $M$ is the number of genes that
584
+are annotated to a particular gene-set, $n$ is the number of differentially
585
+expressed genes, and $k$ is the number of differentially expressed genes that
586
+are annotated to the gene set.
587
+
588
+Additionally, another available approach is the **gene set enrichment analysis**
589
+[@subramanian_gene_2005], often refereed to with the acronym **GSEA**, which is
590
+suitable to find categories whose genes change in a small but coordinated way.
591
+The GSEA algorithm takes as input a list of genes ranked with a particular
592
+criterion, and then walks down the list to evaluate whether members of a given
593
+gene set are normally distributed or are mainly present at the top or at the
594
+bottom of the list. To check this out, the algorithm uses a running-sum that
595
+increases when finding a gene belonging to a given category, and decreases when
596
+a gene not contained in that specific set is found. The maximum distance from
597
+zero occurred in the running score is defined as the enrichment score (ES).
598
+To estimate the statistical significance of enrichment scores, a permutation
599
+test is performed by swapping gene labels annotated to a gene set.
600
+
601
+Even though GSEA is arguably the most commonly used approach for functional
602
+enrichment, @wu_camera_2012 demonstrated that inter-gene correlations might
603
+affect its reliability. To overcome this issue, they developed the
604
+**Correlation Adjusted MEan RAnk gene set test (CAMERA)**. The main advantage
605
+of this method is that it adjusts the gene set test statistic according to
606
+inter-gene correlations.
385 607
 
386 608
 ## Available databases and categories {#categories}
387 609
 
388
-As described above, functional enrichment analysis relies on gene sets, which consist in collections of genes that are annotated to specific functions or terms. Independently from the strategy used for the analysis, functional enrichment methods need access to these properly curated collections of genes. In the effort of providing access to a vast number of these resources, MIRit uses the `r Biocpkg("geneset")` package to support multiple databases, including Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), MsigDB, WikiPathways, Reactome, Enrichr, Disease Ontology (DO), Network of Cancer Genes (NCG), DisGeNET, and COVID-19. However, the majority of these databases contain multiple categories. To see the complete list of available gene sets for each database refer to the documentation of the `enrichGenes()` function.
610
+As described above, functional enrichment analysis relies on gene-sets, which
611
+consist in collections of genes that are annotated to specific functions or
612
+terms. Independently from the strategy used for the analysis, functional
613
+enrichment methods need access to these properly curated collections of genes.
614
+In the effort of providing access to a vast number of these resources, MIRit
615
+uses the `r CRANpkg("geneset")` package to support multiple databases,
616
+including Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG),
617
+MsigDB, WikiPathways, Reactome, Enrichr, Disease Ontology (DO), Network of
618
+Cancer Genes (NCG), DisGeNET, and COVID-19. However, the majority of these
619
+databases contain multiple categories. To see the complete list of available
620
+gene-sets for each database refer to the documentation of the `enrichGenes()`
621
+function.
389 622
 
390 623
 ## Supported species
391 624
 
392
-The above-mentioned collections have their own lists of supported species. To check the available species for a given database, MIRit provides a practical helper function named `supportedOrganisms()`. For example, to retrieve the species supported by Reactome database, we can simply run the following piece of code.
625
+The above-mentioned collections have their own lists of supported species.
626
+To check the available species for a given database, MIRit provides a practical
627
+helper function named `supportedOrganisms()`. For example, to retrieve the
628
+species supported by Reactome database, we can simply run the following piece
629
+of code.
393 630
 
394 631
 ```{r species}
395 632
 ## list available species for Reactome database
... ...
@@ -398,15 +635,23 @@ supportedOrganisms("Reactome")
398 635
 
399 636
 ## Perform functional enrichment with the `enrichGenes()` function {#enrichment}
400 637
 
401
-The main function in MIRit for the functional enrichment analysis of genes is `enrichGenes()`, which requires as input:
638
+The main function in MIRit for the functional enrichment analysis of genes is
639
+`enrichGenes()`, which requires as input:
402 640
 
403
-- the `MirnaExperiment` object that we get after running differential expression analysis;
404
-- `method`, which specifies the desired approach among `ORA`, `GSEA`, and `CAMERA`;
405
-- `database` and `category`, which define the gene set that you want to use for the enrichment (see Section \@ref(categories) for a complete reference of available databases and categories);
406
-- `organism`, which indicates the specie under investigation (defaults to "Homo sapiens");
407
-- `pCutoff` and `pAdjustment`, which specify the cutoff for statistical significance and the multiple testing correction, respectively.
641
+- the `MirnaExperiment` object that we get after running differential
642
+expression analysis;
643
+- `method`, which specifies the desired approach among `ORA`, `GSEA`, and
644
+`CAMERA`;
645
+- `database` and `category`, which define the gene-set that you want to use
646
+(see Section \@ref(categories) for a complete reference of available databases
647
+and categories);
648
+- `organism`, which indicates the specie under investigation (defaults to
649
+"Homo sapiens");
650
+- `pCutoff` and `pAdjustment`, which specify the cutoff for statistical
651
+significance and the multiple testing correction, respectively.
408 652
 
409
-In our example, we are going to perform the enrichment analysis by using ORA with GO database (biological processes).
653
+In our example, we are going to perform the enrichment analysis by using ORA
654
+with GO database (biological processes).
410 655
 
411 656
 ```{r ora}
412 657
 ## perform over-representation analysis with GO
... ...
@@ -416,9 +661,14 @@ ora <- enrichGenes(experiment,
416 661
                    organism = "Homo sapiens")
417 662
 ```
418 663
 
419
-MIRit performs ORA separately for upregulated and downregulated genes, as it has been shown that this is more powerful compared to enriching all DE genes [@hong_separate_2014-1]. Therefore, when we use ORA, `enrichGenes()` returns a `list` containing two objects of class `FunctionalEnrichment`, one storing enrichment results for upregulated genes, and one for downregulated genes.
664
+MIRit performs ORA separately for upregulated and downregulated genes, as it
665
+has been shown to be more powerful compared to enriching all DE genes
666
+[@hong_separate_2014-1]. Therefore, when we use ORA, `enrichGenes()` returns a
667
+`list` containing two objects of class `FunctionalEnrichment`, one storing
668
+enrichment results for upregulated genes, and one for downregulated genes.
420 669
 
421
-Before exploring the results of the analysis, we will also demonstrate the capabilities of MIRit by performing GSEA with the gene sets provided by the KEGG pathway database.
670
+Before exploring the results of the analysis, we will also demonstrate the use
671
+of GSEA with the gene-sets provided by the KEGG pathway database.
422 672
 
423 673
 ```{r gsea}
424 674
 ## set seed for reproducible results
... ...
@@ -431,13 +681,18 @@ gse <- enrichGenes(experiment,
431 681
                    organism = "Homo sapiens")
432 682
 ```
433 683
 
434
-In this case, the `enrichGenes()` function returns a single object of class `FunctionalEnrichment`, containing GSEA results.
684
+In this case, the `enrichGenes()` function returns a single object of class
685
+`FunctionalEnrichment`, containing GSEA results.
435 686
 
436 687
 ## Visualize enriched sets
437 688
 
438 689
 ### Access results table
439 690
 
440
-After running the `enrichGenes()` function, we get `FunctionalEnrichment` objects holding the results of enrichment analyses. To access the full table containing significantly affected biological functions, we can use the `enrichmentResults()` function. In our case, we can check the results of our GSEA analysis to investigate the human pathways that result affected in thyroid cancer. In Table \@ref(tab:gseaTab) we can see the output of `enrichmentResults(gse)`.
691
+After running the `enrichGenes()` function, we get `FunctionalEnrichment`
692
+objects holding the results of enrichment analyses. To access the full table
693
+containing significantly affected functions, we can use the
694
+`enrichmentResults()` function. In our case, we can check the results of GSEA
695
+(Table \@ref(tab:gseaTab)).
441 696
 
442 697
 ```{r gseaTab, echo=FALSE}
443 698
 ## display the results of GSEA
... ...
@@ -446,9 +701,12 @@ knitr::kable(enrichmentResults(gse), digits = 2, caption = "GSEA results. A tabl
446 701
 
447 702
 ### Enrichment dot plots and bar plots
448 703
 
449
-Further, in addition to exploring results table, MIRit offers several options for the visualization of enrichment analyses, including dot plots and bar plots. These plots are available for every `FunctionalEnrichment` object independently from the method used.
704
+Further, MIRit offers several options for the visualization of enrichment
705
+analyses, including dot plots and bar plots. These plots are available for
706
+every `FunctionalEnrichment` object independently from the method used.
450 707
 
451
-Following our example, we can visualize the results of the ORA that we performed in Section \@ref(enrichment) through a simple dot plot.
708
+Following our example, we can visualize ORA results that we obtained in Section
709
+\@ref(enrichment) through a simple dot plot.
452 710
 
453 711
 ```{r oraDot, fig.wide=FALSE, fig.cap="ORA results for downregulated genes. The enrichment of downregulated genes through the gene sets provided by GO database."}
454 712
 ## create a dot plot for ORA
... ...
@@ -457,121 +715,274 @@ enrichmentDotplot(ora$downregulated, title = "Depleted functions")
457 715
 
458 716
 ### Other plots for GSEA
459 717
 
460
-Additionally, MIRit provides specific visualization methods that are exclusive for GSEA, including ridge plots and GSEA plots. For instance, after running GSEA through `enrichGenes()`, we can produce a GSEA-style plot through the `gseaPlot()` function. In our case, we are going to use this plotting method for the "Thyroid hormone synthesis" pathway.
718
+Additionally, MIRit provides specific visualization methods that are exclusive
719
+for GSEA, including ridge plots and GSEA plots. For instance, after running GSEA
720
+through `enrichGenes()`, we can produce a GSEA-style plot through the
721
+`gseaPlot()` function. In our case, we are going to produce this plot for the
722
+"Thyroid hormone synthesis" pathway.
461 723
 
462 724
 ```{r gsePlot, fig.wide=FALSE, fig.cap="GSEA-style plot for Thyroid hormone synthesis. This type of plot shows the running sum that GSEA uses to determinate the enrichment score for each pathway."}
463 725
 ## create a GSEA plot
464 726
 gseaPlot(gse, "Thyroid hormone synthesis", rankingMetric = TRUE)
465 727
 ```
466 728
 
729
+
467 730
 # Associate miRNAs with disease-SNPs
468 731
 
469
-Interestingly, MIRit enables to explore the presence of disease-associated SNPs occurring at loci of differentially expressed miRNAs. In this concern, SNPs occurring within miRNAs may have important effects on the biological function of these transcripts, as they might alter its expression or the spectrum of miRNA targets. To verify the presence of disease-SNPs within miRNA loci, MIRit directly queries the NHGRI-EBI Catalog of published genome-wide association studies through the `r CRANpkg("gwasrapidd")` package, and then retains only SNPs that affect DE-miRNA genes or their relative host genes (for intragenic miRNAs).
732
+Interestingly, MIRit enables to explore the presence of disease-associated SNPs
733
+located in differentially expressed miRNAs. In this concern, SNPs occurring
734
+within miRNA loci may have important effects on the biological function of these
735
+transcripts, as they might alter their expression or the spectrum of targets.
736
+To verify the presence of disease-SNPs within miRNA loci, MIRit directly queries
737
+the NHGRI-EBI Catalog of published genome-wide association studies through the
738
+`r CRANpkg("gwasrapidd")` package, and then retains only SNPs that affect
739
+DE-miRNA genes or their relative host genes (for intragenic miRNAs).
470 740
 
471
-In our case, there are no SNPs associated with thyroid cancer that occur witin DE-miRNA loci. Therefore, we will explain how this function works with reference to the analysis on Alzheimer's disease reported in the MIRit paper.
741
+In our case, there are no SNPs associated with thyroid cancer that occur within
742
+differentially expressed miRNAs. Therefore, we will demonstrate the use of this
743
+function with SNPs associated with the response to antidepressant drugs.
472 744
 
473 745
 ## Search disease-related SNPs
474 746
 
475
-First, we need to identify the Experimental Factor Ontology (EFO) identifier of a given disease of interest. To do so, MIRit provides the `searchDisease()` function. For example, to identify the relevant EFO ID for Alzheimer disease we can use the following code chunk.
747
+First, we need to identify the Experimental Factor Ontology (EFO) identifier of
748
+a given phenotype. To do so, MIRit provides the `searchDisease()` function.
749
+For example, to identify the relevant EFO ID for antidepressant response we can
750
+use the following code chunk.
476 751
 
477
-```{r seach_disease, eval=FALSE}
478
-## identify the EFO ID corresponding to Alzheimer's disease
479
-searchDisease("alzheimer")
752
+```{r seach_disease}
753
+## identify the EFO ID corresponding to antidepressant response
754
+searchDisease("antidepressant")
480 755
 ```
481 756
 
482
-The relevant EFO trait is "Alzheimer disease".
757
+The relevant EFO trait is "response to antidepressant".
483 758
 
484 759
 ## Identify miRNA genes overlapping with disease-SNPs
485 760
 
486
-Now, we can use the `findMirnaSNPs()` function to identify the disease-related SNPs that affect miRNA loci. 
761
+Now, we can use the `findMirnaSNPs()` function to identify the disease-related
762
+SNPs that affect miRNA loci. 
487 763
 
488
-```{r snp_association, eval=FALSE}
489
-## detect disease-SNPs occuring at DE-miRNAs loci
490
-association <- findMirnaSNPs(experiment, "Alzheimer disease")
764
+```{r snp_association}
765
+## detect SNPs occuring at DE-miRNAs loci
766
+association <- findMirnaSNPs(experiment, "response to antidepressant")
491 767
 ```
492 768
 
493 769
 ## Build a track plot to display miRNA-SNP associations
494 770
 
495
-Finally, if any disease-related SNPs is present within DE-miRNA loci, we can use the `mirVariantPlot()` function to graphically build a track plot displaying the polymorphism along with the relevant genomic context, including the corresponding miRNA locus.
771
+If any disease-related SNP is present within DE-miRNA loci, we can use the
772
+`mirVariantPlot()` function to graphically build a track plot that displays the
773
+polymorphism along with the relevant genomic context, including the
774
+corresponding miRNA locus.
496 775
 
497
-```{r track_plot, eval=FALSE}
776
+```{r trackPlot, fig.dim=c(7, 3.5), fig.cap="Track plot for miRNA-SNPs. This trackplot shows the proximity of rs2402960 with the locus that encodes for miR-182."}
498 777
 ## create a track plot to represent disease-SNPs at DE-miRNA loci
499
-mirVariantPlot("rs2632516", association, showContext = TRUE)
778
+mirVariantPlot("rs2402960", association, showSequence = FALSE)
500 779
 ```
501 780
 
781
+## Explore the evidence supporting SNPs association
782
+
783
+Finally, to review the literature supporting the association between SNPs and
784
+specific traits, and possibly with differentially expressed miRNAs, MIRit
785
+provides the `getEvidence()` function, which returns a data.frame reporting
786
+some details of the studies where the association is supported.
787
+
788
+For example, we can see the list of experiments where the association between
789
+rs2402960 and the response to antidepressants was observed.
790
+
791
+```{r snpEvidence}
792
+## retrieve the evidence supporting SNP-trait association
793
+snpEvidence <- getEvidence(variant = "rs2402960",
794
+                           diseaseEFO = "response to antidepressant")
795
+
796
+## take a look at the evidence table
797
+head(snpEvidence)
798
+```
799
+
800
+
502 801
 # Retrieve miRNA targets
503 802
 
504
-Before performing integrative miRNA-mRNA analyses, we need to identify the targets of differentially expressed miRNAs, so that we can test whether they really affect the levels of their targets or not.
803
+Before performing integrative miRNA-mRNA analyses, we need to identify the
804
+targets of differentially expressed miRNAs, so that we can test whether they
805
+really affect the levels of their targets or not.
505 806
 
506 807
 ## Databases with miRNA-mRNA interactions
507 808
 
508
-Different resources have been developed over the years to predict and collect miRNA-target interactions, and we can categorize them in two main types:
809
+Different resources have been developed over the years to predict and collect
810
+miRNA-target interactions, and we can categorize them in two main types:
509 811
 
510
-- **Prediction databases**, that contain information about computationally determined miRNA-target interactions;
511
-- **Validated databases**, which only contain interactions that have been proven through biomolecular experiments.
812
+- **Prediction databases**, that contain information about computationally
813
+determined miRNA-target interactions; and
814
+- **Validated databases**, which only contain interactions that have been proven
815
+through biomolecular experiments.
512 816
 
513
-The choice of which type of resources to use for identifying miRNA targets drastically influences the outcome of the analysis. In this regard, some researchers tend to give the priority to validated interactions, even though they are usually fewer than predicted ones. On the other hand, predicted pairs are much more numerous, but they exhibit a high number of false positive hits.
817
+The choice of which type of resources to use for identifying miRNA targets
818
+drastically influences the outcome of the analysis. In this regard, some
819
+researchers tend to give the priority to validated interactions, even though
820
+they are usually fewer than predicted ones. On the other hand, predicted pairs
821
+are much more numerous, but they exhibit a high number of false positive hits.
514 822
 
515 823
 ## The mirDIP approach
516 824
 
517
-The downside of miRNA target prediction algorithms is also the scarce extend of overlap existing between different tools. To address this issue, several ensemble methods have been developed, trying to aggregate the predictions obtained by different algorithms. Initially, several researchers determined as significant miRNA-target pairs those predicted by more than one tool (intersection method). However, this method is not able to capture an important number of meaningful interactions. Alternatively, other strategies used to merge predictions from several algorithms (union method). Despite identifying more true relationships, the union method leads to a higher proportion of false discoveries. Therefore, other ensemble methods started using other statistics to rank miRNA-target predictions obtained by multiple algorithms. Among these newly developed ensemble methods, one of the best performing one is the **microRNA Data Integration Portal (mirDIP)** database, which aggregates miRNA target predictions from 24 different resources by using an integrated score inferred from different prediction metrics. In this way, mirDIP reports more accurate predictions compared to those of individual tools. For additional information on mirDIP database and its ranking metric refer to @tokar_mirdip_2018 and @hauschild_mirdip_2023.
825
+The downside of miRNA target prediction algorithms is also the scarce extend of
826
+overlap existing between different tools. To address this issue, several
827
+ensemble methods have been developed, trying to aggregate the predictions
828
+obtained by different algorithms. Initially, several researchers determined as
829
+significant miRNA-target pairs those predicted by more than one tool
830
+(intersection method). However, this method is not able to capture an important
831
+number of meaningful interactions. Alternatively, other strategies used to merge
832
+predictions from several algorithms (union method). Despite identifying more
833
+true relationships, the union method leads to a higher proportion of false
834
+discoveries. Therefore, other ensemble methods started using other statistics to
835
+rank miRNA-target predictions obtained by multiple algorithms. Among these newly
836
+developed ensemble methods, one of the best performing one is the
837
+**microRNA Data Integration Portal (mirDIP)** database, which aggregates miRNA
838
+target predictions from 24 different resources by using an integrated score
839
+inferred from different prediction metrics. In this way, mirDIP reports more
840
+accurate predictions compared to those of individual tools. For additional
841
+information on mirDIP database and its ranking metric refer to
842
+@tokar_mirdip_2018 and @hauschild_mirdip_2023.
518 843
 
519 844
 ## Download predicted and validated interactions with `getTargets()`
520 845
 
521
-Given the above, MIRit allows the prediction of miRNA-target interactions via the **mirDIP** database (version 5.2). In addition, in order to raise the number of true interactions, MIRit combines the miRNA-target pairs returned by mirDIP with the experimentally validated interactions contained in the **miRTarBase** database (version 9) [@huang_mirtarbase_2022]. In practice, to identify miRNA targets MIRit implements the `getTargets()` function, which allows to download both type of interactions. Specifically, this function also includes a parameter called `score` that determines the degree of confidence required for the targets predicted by mirDIP. The value of this parameter must be one of `Very High`, `High` (default), `Medium`, and `Low`, which correspond to ranks among top 1%, top 5% (excluding top 1%), top 1/3 (excluding top 5%) and remaining predictions, respectively. Moreover, the `includeValidated` parameter tells MIRit whether to include experimentally validate interactions deriving from miRTarBase (default is TRUE). Please note that mirDIP database is only available for human miRNAs; thus, for species other than Homo sapiens, only validated interactions contained in miRTarBase are used.
522
-
523
-In our example, we are going to retrieve both predicted and validated interactions by using default settings.
524
-
525
-```{r targets, eval=FALSE}
846
+Given the above, MIRit allows the prediction of miRNA-target interactions via
847
+the **mirDIP** database (version 5.2). In addition, in order to raise the number
848
+of true interactions, MIRit combines the miRNA-target pairs returned by mirDIP
849
+with the experimentally validated interactions contained in **miRTarBase**
850
+(version 9) [@huang_mirtarbase_2022]. In practice, to identify miRNA targets,
851
+MIRit implements the `getTargets()` function. Specifically, this function
852
+includes a parameter called `score` that determines the degree of confidence
853
+required for the targets predicted by mirDIP. The value of this parameter must
854
+be one of `Very High`, `High` (default), `Medium`, and `Low`, which correspond
855
+to ranks among top 1%, top 5%, top 1/3, and remaining predictions, respectively.
856
+Moreover, the `includeValidated` parameter tells MIRit whether to include
857
+experimentally validated interactions deriving from miRTarBase. It is also
858
+possible (with the `evidence` parameter) to consider all interactions in
859
+miRTarBase, or just limiting the retrieval to those interactions with strong
860
+experimental evidence. Please note that mirDIP database is only available for
861
+human miRNAs; thus, for species other than Homo sapiens, only validated
862
+interactions contained in miRTarBase are used.
863
+
864
+In our example, we are going to retrieve both predicted and validated
865
+interactions by using default settings.
866
+
867
+```{r targets, results='hide'}
526 868
 ## retrieve miRNA target genes
527 869
 experiment <- getTargets(experiment)
528 870
 ```
529 871
 
530
-After running this function, we obtain a `MirnaExperiment` object containing miRNA-target interactions in its `targets` slot. The user can access a `data.frame` detailing these interactions through the `mirnaTargets()` function.
872
+After running this function, we obtain a `MirnaExperiment` object containing
873
+miRNA-target interactions in its `targets` slot. The user can access a
874
+`data.frame` detailing these interactions through the `mirnaTargets()` function.
531 875
 
532
-# Investigate the effects of miRNA expression changes on target genes
533 876
 
534
-Now that we have defined the targets of differentially expressed miRNAs, we can continue with the integrative analysis of miRNA and gene expression levels. This analysis is useful as it allows to only consider miNA-target pairs where an inverse relationship is observed. As already mentioned, MIRit can work with both paired and unpaired data by using different statistical approaches, including:
877
+# Assess the effects of miRNAs on target genes
535 878
 
536
-- **Correlation analysis**, which is the recommended method when samples are paired;
537
-- **Association tests**, like Fisher's exact test and Boschloo's exact test;
538
-- **Rotation gene-set tests**, as implemented in the `fry` function from `r Biocpkg("limma")` package.
879
+Now that we have defined the targets of differentially expressed miRNAs, we can
880
+continue with the integrative analysis of miRNA and gene expression levels. The
881
+purpose of this analysis is to only consider miNA-target pairs where an inverse
882
+relationship is observed.
539 883
 
540
-For unpaired data, only association tests and rotation gene-set tests are available, whereas correlation analysis is the best performing strategy for paired data. The integrative analysis, either performed through correlation, association tests, or rotation gene-set tests, is implemented in the `mirnaIntegration()` function. When using the default option `test = "auto"`, MIRit automatically performs the appropriate test for paired and unpaired samples. If only some samples of the data set have paired measurements, a correlation analysis will be carried out only for those subjects.
884
+As already mentioned, MIRit can work with both paired and unpaired data by using
885
+different statistical approaches, including:
886
+
887
+- **Correlation analysis**, which is the recommended method when samples are
888
+paired;
889
+- **Association tests**, like Fisher's exact test and Boschloo's exact test;
890
+- **Rotation gene-set tests**, as implemented in the `fry` function from
891
+`r Biocpkg("limma")` package.
892
+
893
+For unpaired data, only association tests and rotation gene-set tests are
894
+available, whereas correlation analysis is the best performing strategy for
895
+paired data. The integrative analysis, either performed through correlation,
896
+association tests, or rotation gene-set tests, is implemented in the
897
+`mirnaIntegration()` function. When using the default option `test = "auto"`,
898
+MIRit automatically performs the appropriate test for paired and unpaired
899
+samples. If only some samples of the dataset have paired measurements, a
900
+correlation analysis will be carried out only for those subjects.
541 901
 
542 902
 ## Correlation analysis for paired data
543 903
 
544
-When both miRNA and gene expression measurements are available for the same samples, a correlation analysis is the recommended procedure. In statistics, correlation is a measure that expresses the extent to which two random variables are dependent. In our case, we want to assess whether a statistical relationship is present between the expression of a miRNA and the expression of its targets.
904
+When both miRNA and gene expression measurements are available for the same
905
+samples, a correlation analysis is the recommended procedure. In statistics,
906
+correlation is a measure that expresses the extent to which two random variables
907
+are dependent. In our case, we want to assess whether a statistical relationship
908
+is present between the expression of a miRNA and the expression of its targets.
545 909
 
546 910
 ### Statistical correlation coefficients
547 911
 
548
-Several statistical coefficients can be used to weigh the degree of a correlation. Among them, the most commonly used are *Pearson's correlation coefficient* $r$, *Spearman's correlation coefficient* $\rho$, and *Kendall's Tau-b correlation coefficient* $\tau_b$. Pearson's $r$ is probably the most diffused for determining the association between miRNA and gene expression. However, it assumes that the relationships between miRNA and gene expression values is linear. This is typically not true for miRNAs, whose interactions with their targets are characterized by imperfect complementarity. Additionally, miRNAs can target multiple genes with different binding sites, and this implies that a simple linear relationship may not be sufficient to properly describe the complexity of these interactions. In contrast, Spearman's and Kendall's Tau-b correlation coefficients result more suitable for representing the interplay between miRNAs and target genes, because they are robust to non-linear relationships and outliers. However, Kendall's correlation just relies on the number of concordant and discordant pairs, and is less sensitive then Spearman's correlation; so, when many ties are present or when the sample size is small, it may have a lower detection power. This is the reason why **Spearman's correlation coefficient** is the default coefficient used in the `mirnaIntegration()` function to measure the correlation between miRNA and gene expression. Moreover, since miRNAs mainly act as negative regulators, only negatively correlated miRNA-target pairs are considered, and statistical significance is estimated through a one-tailed t-test.
912
+Several statistical coefficients can be used to weigh the degree of a
913
+correlation. Among them, the most commonly used are
914
+*Pearson's correlation coefficient* $r$, *Spearman's correlation coefficient*
915
+$\rho$, and *Kendall's Tau-b correlation coefficient* $\tau_b$. Pearson's $r$ is
916
+probably the most diffused for determining the association between miRNA and
917
+gene expression. However, it assumes that the relationship between miRNA and
918
+gene expression values is linear. This is typically not true for miRNAs, whose
919
+interactions with their targets are characterized by imperfect complementarity.
920
+Additionally, miRNAs can target multiple genes with different binding sites, and
921
+this implies that a simple linear relationship may not be sufficient to properly
922
+model the complexity of these interactions. In contrast, Spearman's and
923
+Kendall's Tau-b correlation coefficients result more suitable for representing
924
+the interplay between miRNAs and targets, because they are robust to non-linear
925
+relationships and outliers. However, Kendall's correlation just relies on the
926
+number of concordant and discordant pairs, and is less sensitive then Spearman's
927
+correlation; so, when many ties are present or when the sample size is small, it
928
+may have a lower detection power. This is the reason why
929
+**Spearman's correlation coefficient** is the default used in the
930
+`mirnaIntegration()` function. Moreover, since miRNAs mainly act as negative
931
+regulators, only negatively correlated miRNA-target pairs are considered, and
932
+statistical significance is estimated through a one-tailed t-test.
549 933
 
550 934
 ### Perform a correlation analysis in MIRit {#correlation}
551 935
 
552
-To sum up the steps that MIRit follows when evaluating the correlation between miRNAs and genes, what the `mirnaIntegration()` function does during a correlation analysis is to:
936
+To sum up the steps that MIRit follows when evaluating the correlation between
937
+miRNAs and genes, what the `mirnaIntegration()` function does during a
938
+correlation analysis is to:
553 939
 
554
-1. consider the miRNA-target interactions retrieved with the `getTargets()` function;
555
-2. calculate the correlation coefficient for each miRNA-target pair based on their expression values;
940
+1. consider the miRNA-target interactions retrieved with the `getTargets()`
941
+function;
942
+2. calculate the correlation coefficient for each miRNA-target pair based on
943
+their expression values;
556 944
 3. compute the statistical significance of all miRNA-target pairs;
557 945
 4. adjust p-values for multiple testing before reporting significant results.
558 946
 
559
-In our thyroid cancer example, we want to find the miRNA-target pairs that exhibit a negative correlation with a Spearman's coefficient lower than -0.5 and with an adjusted p-value lower than 0.05.
947
+In our thyroid cancer example, we want to find the miRNA-target pairs that
948
+exhibit a negative correlation with a Spearman's coefficient lower than -0.5 and
949
+with an adjusted p-value less than 0.05.
560 950
 
561 951
 ```{r correlation}
562 952
 ## perform a correlation analysis
563 953
 experiment <- mirnaIntegration(experiment, test = "correlation")
564 954
 ```
565 955
 
566
-Please note that all the parameters used for the correlation analysis are customizable. For instance, the user can change the significance threshold and the multiple testing correction method by setting the `pCutoff` and `pAdjustment` parameters, respectively. Further, it is also possible to change the correlation coefficient used, by editing the `corMethod` option, and the minimum required value of the correlation coefficient, by changing the `corCutoff` setting.
956
+Please note that all the parameters used for the correlation analysis are
957
+customizable. For instance, the user can change the significance threshold and
958
+the multiple testing correction method by setting the `pCutoff` and
959
+`pAdjustment` parameters, respectively. Further, it is also possible to change
960
+the correlation coefficient used, by editing the `corMethod` option, and the
961
+minimum required value of the correlation coefficient, by changing the
962
+`corCutoff` setting.
567 963
 
568 964
 ### Account for batch effects prior to correlation analysis
569 965
 
570
-Sometimes, when exploring expression variability through MDS plots, as we do with the `plotDimensions()` function, we notice the presence of batch effects that prevent a clear separation of our biological groups. Indeed, batch effects consist in unwanted sources of technical variation that confound expression variability and limit downstream analyses. Since the reliability of biological conclusions of integrative miRNA-mRNA analyses depends on the correlation between miRNA and gene expression levels, it is pivotal to ensure that expression measurements are not affected by technical variation. In this regard, if batch effects are noticed in the data, MIRit provides the `batchCorrection()` function, which removes batch effects from expression data before moving to correlation analysis. Please note that this procedure can only be used prior to correlation analysis, because for differential expression analysis it is more appropriate to include batch variables in the linear model, as specified in Section \@ref(model). For additional information, please refer to the manual of the `batchCorrection()` function.
966
+Sometimes, when exploring expression variability through MDS plots, as we do
967
+with the `plotDimensions()` function, we notice the presence of batch effects
968
+that prevent a clear separation of our biological groups. Batch effects
969
+consist in unwanted sources of technical variation that confound expression
970
+variability and limit downstream analyses. Since the reliability of biological
971
+conclusions depends on the correlation between miRNAs and genes,
972
+it is pivotal to ensure that expression measurements are scarcely affected by
973
+technical artifacts. In this regard, if strong batch effects are noticed in the
974
+data, MIRit provides the `batchCorrection()` function, which removes batch
975
+effects prior to correlation analysis. Please note that this procedure cannot be
976
+used before differential expression testing, because for that purpose it is more appropriate to include batch variables in the linear model, as specified in
977
+Section \@ref(model). For additional information, please refer to the manual of
978
+the `batchCorrection()` function.
571 979
 
572 980
 ### Explore the succesfully integrated miRNA-target pairs
573 981
 
574
-Before moving on to the identification of the altered miRNAs regulatory networks, we can explore correlated miRNA-target pairs thanks to the `integration()` function, which returns a `data.frame` object with comprehensive details about the computed correlations.
982
+Before moving to the identification of the altered miRNAs regulatory networks,
983
+we can explore correlated miRNA-target pairs thanks to the `integration()`
984
+function, which returns a `data.frame` object with comprehensive details about
985
+the computed interactions.
575 986
 
576 987
 ```{r correlationResults}
577 988
 ## extract correlation results
... ...
@@ -583,7 +994,13 @@ head(integrationResults)
583 994
 
584 995
 ### Visualize the correlation between miRNAs and genes
585 996
 
586
-Additionally, for correlation analyses, MIRit allows to graphically represent inverse correlations through a scatter plot. Indeed, we can make use of the `plotCorrelation()` function to display the correlation between specific miRNA-target pairs. For example, the correlation analysis performed in Section \@ref(correlation) revealed how miR-146b-5p, the most upregulated miRNA, is inversely correlated with the expression of DIO2, which is crucial for thyroid hormone functioning. Furthermore, it has also emerged that miR-146b-3p results negatively correlated with PAX8, which directly induces TG transcription.
997
+Additionally, MIRit allows to graphically represent inverse correlations through
998
+a scatter plot. To do so, we can use the `plotCorrelation()` function to display
999
+the correlation between specific miRNA-target pairs. For example, we can plot
1000
+the existing correlation between miR-146b-5p and DIO2, which is crucial for
1001
+thyroid hormone functioning. Furthermore, we can also show how the upregulation
1002
+of miR-146b-3p is associated with the downregulation of PAX8, which directly
1003
+induces TG transcription.
587 1004
 
588 1005
 ```{r corPlot, fig.wide=TRUE, fig.cap="Correlation between miRNAs and key thyroid genes. These plots suggest that the upregulation of miR-146b-5p and miR-146b-3p may be responsible for the downregulation of DIO2 and PAX8, respectively."}
589 1006
 ## plot the correlation between miR-146b-5p and DIO2
... ...
@@ -605,74 +1022,121 @@ ggpubr::ggarrange(cor1, cor2, nrow = 1,
605 1022
 
606 1023
 ## Association tests for unpaired data
607 1024
 
608
-For unpaired data, we cannot directly quantify the influence of miRNA expression on the levels of their targets, because we do not have any sample correspondence between miRNA and gene measurements. However, **one-sided association tests** can be applied in these cases to evaluate if targets of downregulated miRNAs are statistically enriched in upregulated genes, and, conversely, if targets of upregulated miRNAs are statistically enriched in downregulated genes. In this regard, to estimate the effects of differentially expressed miRNAs on their target genes, MIRit can use two different one-sided association tests, namely:
1025
+For unpaired data, we cannot directly quantify the influence of miRNA expression
1026
+on the levels of their targets, because we do not have any sample correspondence
1027
+between miRNA and gene measurements. However, **one-sided association tests**
1028
+can be applied in these cases to evaluate if targets of downregulated miRNAs are
1029
+statistically enriched in upregulated genes, and, conversely, if targets of
1030
+upregulated miRNAs are statistically enriched in downregulated genes. In this
1031
+regard, to estimate the effects of differentially expressed miRNAs on their
1032
+targets, MIRit can use two different one-sided association tests, namely:
609 1033
 
610 1034
 - **Fisher's exact test**,
611 1035
 - **Boschloo's exact test** (default).
612 1036
 
613
-Both these tests consist in a statistical procedure that estimates the association between two dichotomous categorical variables. In our case, for each miRNA, we want to evaluate whether the proportion of targets within the differentially expressed genes significantly differs from the proportion of targets in non-differentially expressed genes. To do this, a 2x2 contingency table is built as shown in Table \@ref(tab:contingency).
1037
+Both these tests consist in a statistical procedure that estimates the
1038
+association between two dichotomous categorical variables. In our case, for each
1039
+miRNA, we want to evaluate whether the proportion of targets within the
1040
+differentially expressed genes significantly differs from the proportion of
1041
+targets in non-differentially expressed genes. To do this, a 2x2 contingency
1042
+table is built as shown in Table \@ref(tab:contingency).
614 1043
 
615
-|                                  | Target genes | Non target genes |      Row total      |
616
-|--------------------:|:------------------:|:------------------:|:------------------:|
617
-|     **Differentially expressed** |     $a$      |       $b$        |       $a + b$       |
618
-| **Non differentially expressed** |     $c$      |       $d$        |       $c + d$       |
619
-|                 **Column total** |   $a + c$    |     $b + d$      | $a + b + c + d = n$ |
1044
+| | Target genes | Non target genes | Row total |
1045
+|---:|:---:|:---:|:---:|
1046
+| **Differentially expressed** | $a$ | $b$ | $a + b$ |
1047
+| **Non differentially expressed** | $c$ | $d$ | $c + d$ |
1048
+| **Column total** | $a + c$ | $b + d$ | $a + b + c + d = n$ |
620 1049
 
621
-: (#tab:contingency) The 2x2 contingency table that MIRit uses for one-sided association tests. This is the table that the `mirnaIntegration()` function creates to determine if differentially expressed genes are enriched in miRNA targets.
1050
+: (#tab:contingency) The 2x2 contingency table that MIRit uses for one-sided
1051
+association tests. This is the table that the `mirnaIntegration()` function
1052
+creates to determine if differentially expressed genes are enriched in miRNA
1053
+targets.
622 1054
 
623 1055
 ### Fisher's exact test
624 1056
 
625
-After that the contingency table has been defined, Fisher's exact test p-value can be calculated through Equation \@ref(eq:fisher).
1057
+When the contingency table has been defined, Fisher's exact test p-value can be
1058
+calculated through Equation \@ref(eq:fisher).
626 1059
 
627 1060
 \begin{equation}
628 1061
   p = \frac{(a + b)!\ (c + d)!\ (a + c)!\ (b + d)!}{a!\ b!\ c!\ d!\ n!}
629 1062
   (\#eq:fisher)
630 1063
 \end{equation}
631 1064
 
632
-Additionally, it is also possible to compute Fisher's p-values with **Lancaster's mid-p adjustment**, since it has been proven that it increases statistical power while retaining Type I error rates.
1065
+Additionally, it is also possible to compute Fisher's p-values with
1066
+**Lancaster's mid-p adjustment**, since it has been proven that it increases
1067
+statistical power while retaining Type I error rates.
633 1068
 
634 1069
 ### Boschloo's exact test
635 1070
 
636
-In contrast to Fisher's exact test, a more appropriate method for the integrative analysis between miRNAs and genes is Boschloo's exact test. Indeed, the major drawback of the Fisher's exact test is that it consists in a conditional test that requires the sum of both rows and columns of a contingency table to be fixed. Notably, this is not true for genomic data because it is likely that different data sets may lead to a different number of DEGs. Therefore, the **default** behavior in MIRit is to use a variant of Barnard's exact test, named **Boschloo's exact test**, that is suitable when group sizes of contingency tables are variable. Moreover, it is possible to demonstrate that Boschloo's test is uniformly more powerful compared to Fisher's exact test. However, keep in mind that Boschloo's test is much more computational intensive compared to Fisher's exact test, and it may require some time.
1071
+The major drawback of the Fisher's exact test is that it consists in a
1072
+conditional test that requires the sum of both rows and columns of a contingency
1073
+table to be fixed. Notably, this is not true for genomic data because it is
1074
+likely that different datasets may lead to a different number of DEGs.
1075
+Therefore, the **default** behavior in MIRit is to use a variant of Barnard's
1076
+exact test, named **Boschloo's exact test**, that is suitable when group sizes
1077
+of contingency tables are variable. Moreover, it is possible to demonstrate that
1078
+Boschloo's exact test is uniformly more powerful compared to Fisher's one.
1079
+However, keep in mind that Boschloo's test is much more computational intensive
1080
+compared to Fisher's exact test, and it may require some time, even though
1081
+parallel computing is employed.
637 1082
 
638 1083
 ### Perform one-sideded association tests in MIRit
639 1084
 
640
-In MIRit, the `mirnaIntegration()` function automatically performs association tests for unpaired data when `test = "auto"`. Moreover, the type of association test to use can be specified through the `associationMethod` parameter, which can be set to:
1085
+In MIRit, the `mirnaIntegration()` function automatically performs association
1086
+tests for unpaired data when `test = "auto"`. Moreover, the type of association
1087
+test to use can be specified through the `associationMethod` parameter, which
1088
+can be set to:
641 1089
 
642 1090
 - `fisher`, to perform a simple one-sided Fisher's exact test;
643
-- `fisher-midp`, to perform a one-sided Fisher's exact test with Lancaster's mid-p correction;
1091
+- `fisher-midp`, to perform a one-sided Fisher's exact test with Lancaster's
1092
+mid-p correction; and
644 1093
 - `boschloo`, to perform a one-sided Boschloo's exact test (*default option*).
645 1094
 
646
-For example, we could use the Boschloo's exact to evaluate the inverse association between miRNA and gene expression values through a simple call to `mirnaIntegration()` function.
1095
+For example, we could use Fisher's exact test with mid-p correction to evaluate
1096
+the inverse association between miRNA and gene expression.
647 1097
 
648
-```{r association, eval=FALSE}
1098
+```{r association}
649 1099
 ## perform a one-sided inverse association
650 1100
 exp.association <- mirnaIntegration(experiment,
651 1101
                                     test = "association",
652
-                                    associationMethod = "boschloo",
1102
+                                    associationMethod = "fisher-midp",
1103
+                                    pCutoff = 0.2,
653 1104
                                     pAdjustment = "none")
654 1105
 ```
655 1106
 
656
-Finally, after performing the association analysis, results can be accessed through the `integration()` function in the same way as we can do for correlation analyses.
1107
+In the end, results can be accessed through the `integration()` function in the
1108
+same way as we can do for correlation analyses.
657 1109
 
658 1110
 ## Rotation gene-set tests for unpaired data
659 1111
 
660
-Lastly, for unpaired data, the effect of DE-miRNAs on the expression of target genes can be estimated through rotation gene-set tests. In this approach, we want to evaluate for each miRNA whether its target genes tend to be differentially expressed in the opposite direction. In particular, a fast approximation to rotation gene-set testing called `fry`, implemented in the `r Biocpkg("limma")` package, can be used to statistically quantify the influence of miRNAs on expression changes of their target genes.
1112
+Lastly, for unpaired data, the effect of DE-miRNAs on the expression of target
1113
+genes can be estimated through rotation gene-set tests. In this approach, we
1114
+want to evaluate for each miRNA whether its target genes tend to be
1115
+differentially expressed in the opposite direction. In particular, a fast
1116
+approximation to rotation gene-set testing called `fry`, implemented in the
1117
+`r Biocpkg("limma")` package, can be used to statistically quantify the
1118
+impact of miRNAs on expression changes of their targets.
661 1119
 
662
-To perform the integrative analysis through rotation gene-set tests, we must simply set `test = "fry"` when calling `mirnaIntegration()` function.
1120
+To perform the integrative analysis through rotation gene-set tests, we must
1121
+simply set `test = "fry"` when calling the `mirnaIntegration()` function.
663 1122
 
664
-```{r fry, eval=FALSE}
1123
+```{r fry}
665 1124
 ## perform the integrative analysis through 'fry' method
666 1125
 exp.fry <- mirnaIntegration(experiment,
667 1126
                             test = "fry",
668
-                            pCutoff = 0.1)
1127
+                            pAdjustment = "none")
669 1128
 ```
670 1129
 
671 1130
 ## Functional enrichment of integrated target genes
672 1131
 
673
-Additionally, after finding miRNA-target pairs that appear to have an inverse association, we can try to identify the impaired biological functions as a result of miRNA dysregulations through ORA. To do this, MIRit provides the `enrichTargets()` function, which automatically performs ORA for the functional enrichment of target genes that result associated with differentially expressed miRNAs.
1132
+After finding influential miRNA-target pairs, we can try to identify the
1133
+consequences of miRNomic alterations through ORA. To do this, MIRit provides the
1134
+`enrichTargets()` function, which automatically performs ORA for target genes
1135
+that result associated with differentially expressed miRNAs.
674 1136
 
675
-In our example, we are going to enrich the significantly anti-correlated targets that we have found in Section \@ref(correlation) through the Disease Ontology database.
1137
+In our example, we are going to enrich the significantly anti-correlated targets
1138
+that we have found in Section \@ref(correlation) through the Disease Ontology
1139
+database.
676 1140
 
677 1141
 ```{r intTargEnr, fig.cap="Functional enrichment of integrated targets. This dot plot shows the enriched diseases for downregulated genes."}
678 1142
 ## enrichment of integrated targets
... ...
@@ -685,15 +1149,34 @@ enrichmentDotplot(oraTarg$downregulated,
685 1149
                   title = "Depleted diseases")
686 1150
 ```
687 1151
 
688
-In Figure \@ref(fig:intTargEnr), we appreciate the depletion of diseases where thyroid gland is overly active, such as goiter and hyperthyroidism, therefore suggesting the involvement of miRNAs in thyroid malfunctioning.
1152
+In Figure \@ref(fig:intTargEnr), we appreciate the depletion of diseases where
1153
+thyroid gland is overly active, such as goiter and hyperthyroidism, therefore
1154
+suggesting the involvement of miRNAs in thyroid malfunctioning.
689 1155
 
690
-# Identify the impaired miRNA-mRNA regulatory networks
691 1156
 
692
-Once the dysregulated miRNA-mRNA regulatory networks have been identified, the typical goal is to infer altered cellular processes and functions. To do so, MIRit introduces a novel approach named **Topology-Aware Integrative Pathway Analysis (TAIPA)**, which specifically focuses on detecting altered molecular networks in miRNA-mRNA multi-omic analyses by considering the topology of biological pathways and miRNA-mRNA interactions.
693
-
694
-## Topologically-Aware Integrative Pathway Analysis (TAIPA)
1157
+# Identify the impaired miRNA-mRNA regulatory networks
695 1158
 
696
-This analysis aims to identify the biological pathways that result affected by miRNA and mRNA dysregulations. In this analysis, biological pathways are retrieved from a pathway database such as KEGG, and the interplay between miRNAs and genes is then added to the networks. Each network is defined as a graph $G(V, E)$, where $V$ represents nodes, and $E$ represents the relationships between nodes. Then, nodes that are not significantly differentially expressed are assigned a weight $w_i = 1$, whereas differentially expressed nodes are assigned a weight $w_i = \left| \Delta E_i \right|$, where $\Delta E_i$ is the linear fold change of the node. Moreover, to consider the biological interaction between two nodes, namely $i$ and $j$, we define an interaction parameter $\beta_{i \rightarrow j} = 1$ for activation interactions and $\beta_{i \rightarrow j} = -1$ for repression interactions. Subsequently, the concordance coefficient $\gamma_{i \rightarrow j}$ is defined as in Equation \@ref(eq:gamma):
1159
+Once the dysregulated miRNA-mRNA pairs have been identified, the typical goal is
1160
+to infer altered cellular processes and networks. To do so, MIRit introduces a
1161
+novel approach named **Topology-Aware Integrative Pathway Analysis (TAIPA)**,
1162
+which specifically focuses on detecting compromised molecular networks in
1163
+miRNA-mRNA multi-omic analyses by considering the topology of biological
1164
+pathways and miRNA-interactions interactions.
1165
+
1166
+## Topology-Aware Integrative Pathway Analysis (TAIPA)
1167
+
1168
+In this analysis, biological pathways are retrieved from a pathway database such
1169
+as KEGG, and the interplay between miRNAs and genes is then added to the
1170
+networks. Each network is defined as a graph $G(V, E)$, where $V$ represents
1171
+nodes, and $E$ represents the relationships between nodes. Then, nodes that are
1172
+not significantly differentially expressed are assigned a weight $w_i = 1$,
1173
+whereas differentially expressed nodes are assigned a weight
1174
+$w_i = \left| \Delta E_i \right|$, where $\Delta E_i$ is the linear fold change
1175
+of the node. Moreover, to consider the biological interaction between two nodes,
1176
+namely $i$ and $j$, we define an interaction parameter
1177
+$\beta_{i \rightarrow j} = 1$ for activation interactions and
1178
+$\beta_{i \rightarrow j} = -1$ for repression interactions. Subsequently, the
1179
+concordance coefficient $\gamma_{i \rightarrow j}$ is defined as in Equation \@ref(eq:gamma):
697 1180
 
698 1181
 \begin{equation}
699 1182
   \gamma_{i \rightarrow j} = \begin{cases} \beta_{i \rightarrow j}
... ...
@@ -702,14 +1185,20 @@ This analysis aims to identify the biological pathways that result affected by m
702 1185
   (\#eq:gamma)
703 1186
 \end{equation}
704 1187
 
705
-Later in the process, a breadth-first search (BFS) algorithm is applied to topologically sort pathway nodes so that each individual node occurs after all its upstream nodes. Nodes within cycles are considered leaf nodes. At this point, a node score $\phi$ is calculated for each pathway node $i$ as in Equation \@ref(eq:phi):
1188
+Later in the process, a breadth-first search (BFS) algorithm is applied to
1189
+topologically sort pathway nodes so that each individual node occurs after all
1190
+its upstream nodes. Nodes within cycles are considered leaf nodes. At this
1191
+point, a node score $\phi$ is calculated for each pathway node $i$ as in
1192
+Equation \@ref(eq:phi):
706 1193
 
707 1194
 \begin{equation}
708 1195
   \phi_i = w_i + \sum_{j=1}^{U} \gamma_{i \rightarrow j} \cdot k_j\,,
709 1196
   (\#eq:phi)
710 1197
 \end{equation}
711 1198
 
712
-where $U$ represents the number of upstream nodes, $\gamma_{i \rightarrow j}$ denotes the concordance coefficient, and $k_j$ is a propagation factor defined as in Equation \@ref(eq:k):
1199
+where $U$ represents the number of upstream nodes, $\gamma_{i \rightarrow j}$
1200
+denotes the concordance coefficient, and $k_j$ is a propagation factor defined
1201
+as in Equation \@ref(eq:k):
713 1202
 
714 1203
 \begin{equation}
715 1204
   k_j = \begin{cases} w_j &\text{if } \phi_j = 0 \\ \phi_j &\text{if }
... ...
@@ -724,9 +1213,17 @@ Finally, the pathway score $\Psi$ is calculated as in Equation \@ref(eq:Psi):
724 1213
   (\#eq:Psi)
725 1214
 \end{equation}
726 1215
 
727
-where $M$ represents the proportion of miRNAs in the pathway, and $N$ represents the total number of nodes in the network. Then, to compute the statistical significance of each pathway score, a permutation procedure is applied. Later, both observed pathway scores and permuted scores are standardized by subtracting the mean score of the permuted sets $\mu_{\Psi_P}$ and then dividing by the standard deviation of the permuted scores $\sigma_{\Psi_P}$.
1216
+where $M$ represents the proportion of miRNAs in the pathway, and $N$ represents
1217
+the total number of nodes in the network. Then, to compute the statistical
1218
+significance of each pathway score, a permutation procedure is applied. Later,
1219
+both observed pathway scores and permuted scores are standardized by subtracting
1220
+the mean score of the permuted sets $\mu_{\Psi_P}$ and then dividing by the
1221
+standard deviation of the permuted scores $\sigma_{\Psi_P}$.
728 1222
 
729
-Finally, the p-value is defined based on the fraction of permutations that reported a higher normalized pathway score than the observed one. However, to prevent p-values equal to zero, we define p-values as in Equation \@ref(eq:pval):
1223
+Finally, the p-value is defined based on the fraction of permutations that
1224
+reported a higher normalized pathway score than the observed one. However, to
1225
+prevent p-values equal to zero, we define p-values as in
1226
+Equation \@ref(eq:pval):
730 1227
 
731 1228
 \begin{equation}
732 1229
   p = \frac{\sum_{n=1}^{N_p} \left[ \Psi_{P_N} \ge \Psi_N \right] + 1}
... ...
@@ -734,49 +1231,82 @@ Finally, the p-value is defined based on the fraction of permutations that repor
734 1231
   (\#eq:pval)
735 1232
 \end{equation}
736 1233
 
737
-In the end, either p-values are corrected for multiple testing through the max-T procedure (default option) which is particularly suited for permutation tests, or through the standard multiple testing approaches.
1234
+In the end, either p-values are corrected for multiple testing through the max-T 
1235
+procedure (default option) which is particularly suited for permutation tests,
1236
+or through the standard multiple testing approaches.
738 1237
 
739 1238
 ## Perform TAIPA in MIRit
740 1239
 
741
-Before performing TAIPA, we need to create miRNA-augmented networks. To do so, MIRit implements the `preparePathways()` function, which automatically uses the `r Biocpkg("graphite")` R package to download biological networks from multiple pathway databases, namely `KEGG`, `WikiPathways` and `Reactome`. Then, each pathway is converted to a `graph` object and significant miRNA-mRNA pairs are added to the network. Further, edge weights are included according to interaction type. After running this function, we obtain a `list` containing all the miRNA-augmented nwtworks as `graph` objects.
1240
+Before performing TAIPA, we need to create miRNA-augmented networks. To do so,
1241
+MIRit implements the `preparePathways()` function, which automatically uses the
1242
+`r Biocpkg("graphite")` R package to download biological networks from multiple
1243
+pathway databases, namely `KEGG`, `WikiPathways` and `Reactome`. Then, each
1244
+pathway is converted to a `graph` object and significant miRNA-mRNA pairs are
1245
+added to the network. Further, edge weights are included according to
1246
+interaction type. After running this function, we obtain a `list` containing
1247
+all the miRNA-augmented networks as `graph` objects.
742 1248
 
743
-In our example, we want to use the significant miRNA-target pairs that we identified in Section \@ref(correlation) to augment biological pathways retrieved from the KEGG database.
1249
+In our example, we want to use the significant miRNA-target pairs that we
1250
+identified in Section \@ref(correlation) to augment biological pathways
1251
+retrieved from the KEGG database.
744 1252
 
745
-```{r augmented_networks, eval=FALSE}
1253
+```{r augmented_networks}
746 1254
 ## create miRNA-augmented networks using KEGG pathways
747 1255
 networks <- preparePathways(experiment,
748 1256
                             database = "KEGG",
749
-                            organism = "Homo sapiens")
1257
+                            organism = "Homo sapiens",
1258
+                            minPc = 20)
750 1259
 ```
751 1260
 
752
-After running this function, pathways with less than 10% of nodes with expression measurements are removed. This option can be changed by specifying the `minPc` parameter.
1261
+After running this function, pathways with less than 20% of nodes with
1262
+expression measurements are removed. This option can be changed by specifying
1263
+the `minPc` parameter (default is 10%).
753 1264
 
754
-Now, we are ready to perform TAIPA through the `topologicalAnalysis()` function, which is used to calculate pathway scores for all the augmented networks and to evaluate their statistical significance through permutation tests.
1265
+Now, we are ready to perform TAIPA through the `topologicalAnalysis()` function,
1266
+which is used to calculate pathway scores for all the augmented networks and
1267
+evaluate their statistical significance through permutation testing. For
1268
+demonstration purposes, we only considered a smaller subset of augmented
1269
+pathways.
1270
+
1271
+```{r taipa}
1272
+## only consider a smaller set of augmented networks
1273
+networks <- networks[seq(15, 30)]
755 1274
 
756
-```{r taipa, eval=FALSE}
757 1275
 ## set seed for reproducible results
758 1276
 set.seed(1234)
759 1277
 
760
-## perform TAIPA
1278
+## perform TAIPA with 1000 permutations
761 1279
 taipa <- topologicalAnalysis(experiment,
762 1280
                              pathways = networks,
763
-                             nPerm = 10000)
1281
+                             nPerm = 1000)
764 1282
 ```
765 1283
 
766
-As a result of the analysis, an object of class `IntegrativePathwayAnalysis` storing the results of TAIPA is returned. Notably, the user can change the behavior of the `topologicalAnalysis()` in several ways. For example, the `pCutoff` and `pAdjustment` parameters can be used to change the significance threshold and the multiple testing correction method, respectively. Moreover, the `nPerm` parameter can be tweaked to change the number of permutations to use for evaluating statistical significance. In this regard, we recommend using at least 10000 permutations, with no less than 1000.
767
-
768
-```{r, echo=FALSE}
769
-## perform TAIPA
770
-taipa <- loadExamples("IntegrativePathwayAnalysis")
771
-```
1284
+As a result of the analysis, an object of class `IntegrativePathwayAnalysis`
1285
+storing the results of TAIPA is returned. Notably, the user can change the
1286
+behavior of `topologicalAnalysis()` in several ways. For example, the `pCutoff`
1287
+and `pAdjustment` parameters can be used to change the significance threshold
1288
+and the multiple testing correction method, respectively. Moreover, the `nPerm`
1289
+parameter can be tweaked to change the number of permutations used for
1290
+evaluating statistical significance. In this regard, we recommend using at least
1291
+10000 permutations, with no less than 1000.
772 1292
 
773 1293
 ## C++ code and parallel computing with `BiocParallel`
774 1294
 
775
-For computational efficiency, pathway score computation has been implemented in C++ language. Furthermore, since computing pathway score for 10000 networks for each pathway is computationally intensive, parallel computing has been employed to reduce running time. The user can modify the parallel computing behavior by specifying the `BPPARAM` parameter. See the help page of the `r Biocpkg("BiocParallel")` package for further details. Both the `preparePathways()` and the `topologicalAnalysis()` functions accept the `BPPARAM` option.
1295
+For computational efficiency, pathway score computation has been implemented in
1296
+C++ language. Furthermore, since computing pathway score for 10000 networks for
1297
+each pathway is computationally intensive, parallel computing has been employed
1298
+to reduce running time. The user can modify the parallel computing behavior by
1299
+specifying the `BPPARAM` parameter. See the help page of the
1300
+`r Biocpkg("BiocParallel")` package for further details. Both the
1301
+`preparePathways()` and the `topologicalAnalysis()` functions accept the
1302
+`BPPARAM` option.
776 1303
 
777 1304
 ## Visualize the significantly affected pathways
778 1305
 
779
-After running the `topologicalAnalysis()` function, we can inspect the significantly perturbed pathways contained in the `IntegrativePathwayAnalysis` object by using the `integratedPathways()` function, which returns a `data.frame` reporting the results of TAIPA.
1306
+After running the `topologicalAnalysis()` function, we can inspect the
1307
+significantly perturbed pathways contained in the `IntegrativePathwayAnalysis`
1308
+object by using the `integratedPathways()` function, which returns a
1309
+`data.frame` reporting the results.
780 1310
 
781 1311
 ```{r integratedPathways}
782 1312
 ## extract the results of TAIPA
... ...
@@ -785,17 +1315,23 @@ perturbedNetworks <- integratedPathways(taipa)
785 1315
 
786 1316
 ## Visualize the impaired networks within biological pathways
787 1317
 
788
-As with functional enrichment analyses, we can plot perturbed miRNA-mRNA networks as dot plots. To do so, the `integrationDotplot()` function can be used. In particular, we will graphically represent the most perturbed pathways in thyroid cancer.
1318
+As with functional enrichment analyses, we can plot perturbed miRNA-mRNA
1319
+networks as dotplots. To do so, the `integrationDotplot()` function can be used.
789 1320
 
790
-```{r integrationDot, fig.wide=FALSE, fig.cap="The perturbation of miRNA-mRNA networks in thyroid cancer. This dot plot display the impairment of the biological processes involved in the production of thyroid hormone, further highlighting the disruption of this mechanism in this disease."}
1321
+```{r integrationDot, fig.wide=FALSE, fig.cap="The perturbation of miRNA-mRNA networks in thyroid cancer. This dot plot display the impairment of thyroid hormone production."}
791 1322
 ## produce a dotplot that shows the most affected networks
792
-intDot <- integrationDotplot(taipa)
793
-intDot
1323
+integrationDotplot(taipa)
794 1324
 ```
795 1325
 
796
-Finally, after identifying the impaired molecular networks, MIRit provides the possibility of exploring the molecular perturbations. In this concern, the `visualizeNetworks()` function can be used to visually represent the compromised pathways along with expression changes of both miRNAs and genes, so that users can easily interpret the functional consequences of miRNA and gene dysregulations. For example, we can explore the perturbed molecular events that are responsible for diminished production of thyroid hormone in thyroid cancer.
1326
+Finally, MIRit provides the possibility of exploring the molecular
1327
+perturbations. In this concern, the `visualizeNetworks()` function can be used
1328
+to visually reconstruct the compromised pathways along with expression changes
1329
+of both miRNAs and genes, so that users can easily interpret the functional
1330
+consequences of miRNA and gene dysregulations. For example, we can explore the
1331
+perturbed molecular events that are responsible for diminished production of
1332
+thyroid hormone in thyroid cancer.
797 1333
 
798
-```{r thyroidNetwork, fig.wide=TRUE, fig.cap="Impaired miRNA-mRNA regulatory network prevents thyroid hormone synthesis in thyroid cancer. The network created by MIRit suggests that the upregulation of miR-146b-5p and miR-146b-3p may be responsible for diminished expression of PAX8, which in turn causes reduced transcription of thyroid hormone."}
1334
+```{r thyroidNetwork, fig.wide=TRUE, fig.cap="Impaired network involved in thyroid hormone synthesis. The network created by MIRit suggests that the upregulation of miR-146b-5p and miR-146b-3p may be responsible for diminished expression of PAX8, which in turn causes reduced transcription of thyroid hormone."}
799 1335
 ## plot the impaired network responsible for reduced TG synthesis
800 1336
 visualizeNetwork(taipa, "Thyroid hormone synthesis")
801 1337
 ```
... ...
@@ -807,4 +1343,3 @@ sessionInfo()
807 1343
 ```
808 1344
 
809 1345
 # References {.unnumbered}
810
-