Browse code

Version update

Christian Arnold authored on 23/09/2022 14:42:39
Showing19 changed files

... ...
@@ -1,6 +1,6 @@
1 1
 Package: GRaNIE
2 2
 Title: GRaNIE: Reconstruction cell type specific gene regulatory networks including enhancers using chromatin accessibility and RNA-seq data
3
-Version: 1.1.12
3
+Version: 1.1.13
4 4
 Encoding: UTF-8
5 5
 Authors@R: c(person("Christian", "Arnold", email =
6 6
         "chrarnold@web.de", role = c("cre","aut")),
... ...
@@ -81,7 +81,9 @@ Suggests:
81 81
     csaw,
82 82
     BiocParallel,
83 83
     topGO,
84
-    robust
84
+    robust,
85
+    variancePartition,
86
+    purrr
85 87
 VignetteBuilder: knitr
86 88
 biocViews: Software, GeneExpression, GeneRegulation, NetworkInference, GeneSetEnrichment, BiomedicalInformatics, Genetics, Transcriptomics, ATACSeq, RNASeq, GraphAndNetwork, Regression, Transcription, ChIPSeq
87 89
 License: Artistic-2.0
... ...
@@ -6,6 +6,7 @@ export(addConnections_peak_gene)
6 6
 export(addData)
7 7
 export(addTFBS)
8 8
 export(add_TF_gene_correlation)
9
+export(add_featureVariation)
9 10
 export(build_eGRN_graph)
10 11
 export(calculateCommunitiesEnrichment)
11 12
 export(calculateCommunitiesStats)
... ...
@@ -1,4 +1,6 @@
1
-# GRaNIE 1.1.12 (2022-09-13)
1
+
2
+
3
+# GRaNIE 1.1.12 and 1.1.13 (2022-09-13)
2 4
 
3 5
 ## Major changes
4 6
 
... ...
@@ -7,9 +9,12 @@
7 9
 - the eGRN graph structure as built by `build_eGRN_graph()` in the `GRaNIE` object is now reset whenever the function `filterGRNAndConnectGenes()` is successfully executed to make sure that enrichment functions etc are not using an outdated graph structure. 
8 10
 - the landing page of the website has been extended and overhauled
9 11
 - removed some dependency packages and moved others into `Suggests` to lower the installation burden of the package. In addition, removed `topGO` from the `Depends` section (now in `Suggests`) and removed `tidyverse` altogether (before in `Depends`). Detailed explanations when and how the packages listed under `Suggests` are needed can now be found in the new [Package Details Vignette](https://grp-zaugg.embl-community.io/GRaNIE/articles/GRaNIE_packageDetails.html#installation) and are clearly given to the user when executing the respective functions
12
+- major updates to the function `getGRNConnections`, which now has more arguments allowing a more fine-tuned and rich retrieval of eGRN connections, features and feature metadata
13
+- a new function `add_featureVariation` to quantify and interpret multiple sources of biological and technical variation for features (TFs, peaks, and genes) in a GRN object, see the R help for more information 
14
+- `filterGRNAndConnectGenes` now doesnt include feature metadata columns to save space in the result data frame that is created. The help has been updated to make clear that `getGRNConnections` includes these features now.
10 15
 
11 16
 ## Minor changes
12
-
17
+- small changes in the GRN object structure, moved `GRN@data$TFs@translationTable` to `GRN@annotation@TFs`. All exported functions run automatically a small helper function to make this change for any GRN object automatically to adapt to the new structure
13 18
 - many small changes in the code, updated argument checking, and preparing rigorous unit test inclusion
14 19
 - internally renaming the (recently changed / renamed) gene type `lncRNA` from `biomaRt` to `lincRNA` to be compatible with older versions of `GRaNIE`
15 20
 
... ...
@@ -165,8 +165,8 @@ addData <- function(GRN, counts_peaks, normalization_peaks = "DESeq_sizeFactor",
165 165
   start = Sys.time()
166 166
   
167 167
   checkmate::assertClass(GRN, "GRN")
168
-  GRN = .addFunctionLogToObject(GRN)      
169
-  
168
+  GRN = .addFunctionLogToObject(GRN)    
169
+
170 170
   checkmate::assertDataFrame(counts_peaks, min.rows = 1, min.cols = 2)
171 171
   checkmate::assertDataFrame(counts_rna, min.rows = 1, min.cols = 2)
172 172
   checkmate::assertCharacter(idColumn_peaks, min.chars = 1, len = 1)
... ...
@@ -224,7 +224,9 @@ addData <- function(GRN, counts_peaks, normalization_peaks = "DESeq_sizeFactor",
224 224
     # Check uniqueness of IDs
225 225
     nDuplicateRows = nrow(counts_rna) - length(unique(counts_rna$ENSEMBL))
226 226
     if (nDuplicateRows > 0) {
227
-      futile.logger::flog.warn(paste0(" Found ", nDuplicateRows, " duplicate rows in RNA-Seq data, consolidating them by summing them up."))
227
+      message = paste0(" Found ", nDuplicateRows, " duplicate rows in RNA-Seq data, consolidating them by summing them up.")
228
+      .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE)
229
+        
228 230
       counts_rna = counts_rna %>%
229 231
         dplyr::group_by(.data$ENSEMBL) %>%
230 232
         dplyr::summarise_if(is.numeric, sum) 
... ...
@@ -266,7 +268,9 @@ addData <- function(GRN, counts_peaks, normalization_peaks = "DESeq_sizeFactor",
266 268
       
267 269
       # Force the first column to be the ID column
268 270
       if ("sampleID" %in% colnames(GRN@data$metadata)) {
269
-        futile.logger::flog.warn("Renaming the first column to sampleID. However, this column already exists, it will be renamed accordingly.")
271
+        message = paste0("Renaming the first column to sampleID. However, this column already exists, it will be renamed accordingly.")
272
+        .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE)  
273
+          
270 274
         colnames(GRN@data$metadata)[which(colnames(GRN@data$metadata) == "sampleID")] = "sampleID_original"
271 275
         
272 276
       } 
... ...
@@ -544,15 +548,15 @@ addData <- function(GRN, counts_peaks, normalization_peaks = "DESeq_sizeFactor",
544 548
                                                                      GENENAME = as.factor(.data$GENENAME),
545 549
                                                                      SYMBOL = as.factor(.data$SYMBOL)),
546 550
                                                      by = c("peak.ID" = "peakID")) %>%
547
-      dplyr::rename(peak.gene.chr = .data$geneChr,
548
-                    peak.gene.start = .data$geneStart, 
549
-                    peak.gene.end = .data$geneEnd, 
550
-                    peak.gene.length = .data$geneLength, 
551
-                    peak.gene.strand = .data$geneStrand, 
552
-                    peak.gene.name = .data$GENENAME,
553
-                    peak.gene.distanceToTSS = .data$distanceToTSS,
554
-                    peak.gene.ENSEMBL = .data$ENSEMBL,
555
-                    peak.gene.symbol = .data$SYMBOL,
551
+      dplyr::rename(peak.nearestGene.chr = .data$geneChr,
552
+                    peak.nearestGene.start = .data$geneStart, 
553
+                    peak.nearestGene.end = .data$geneEnd, 
554
+                    peak.nearestGene.length = .data$geneLength, 
555
+                    peak.nearestGene.strand = .data$geneStrand, 
556
+                    peak.nearestGene.name = .data$GENENAME,
557
+                    peak.nearestGene.distanceToTSS = .data$distanceToTSS,
558
+                    peak.nearestGene.ENSEMBL = .data$ENSEMBL,
559
+                    peak.nearestGene.symbol = .data$SYMBOL,
556 560
                     peak.annotation = .data$annotation
557 561
       )
558 562
     
... ...
@@ -606,7 +610,7 @@ addData <- function(GRN, counts_peaks, normalization_peaks = "DESeq_sizeFactor",
606 610
 #' @importFrom rlang .data `:=`
607 611
 .calcGCContentPeaks <- function(GRN) {
608 612
   
609
-  futile.logger::flog.info(paste0("Calculate GC-content for peaks... "))
613
+  futile.logger::flog.info(paste0("Calculate GC-content for peaks. This may take a while"))
610 614
   start = Sys.time()
611 615
   genomeAssembly = GRN@config$parameters$genomeAssembly
612 616
   #TODO: GC content for all peaks
... ...
@@ -655,6 +659,11 @@ addData <- function(GRN, counts_peaks, normalization_peaks = "DESeq_sizeFactor",
655 659
 
656 660
 #' Filter RNA-seq and/or peak data from a \code{\linkS4class{GRN}} object
657 661
 #' 
662
+#' This function marks genes and/or peaks as \code{filtered} depending on the chosen filtering criteria. Filtered genes / peaks will then be 
663
+#' disregarded when adding connections in subsequent steps via \code{\link{addConnections_TF_peak}} and  \code{\link{addConnections_peak_gene}}. \strong{This function does NOT (re)filter existing connections when the \code{\linkS4class{GRN}} object already contains connections. Thus, upon re-execution of this function with different filtering criteria, all downstream steps have to be re-run.}
664
+#' 
665
+#' All this function does is setting (or modifying) the filtering flag in \code{GRN@data$peaks$counts_norm} and \code{GRN@data$RNA$counts_norm.l}, respectively.
666
+#' 
658 667
 #' @template GRN 
659 668
 #' @param minNormalizedMean_peaks Numeric[0,] or \code{NULL}. Default 5. Minimum mean across all samples for a peak to be retained for the normalized counts table. Set to \code{NULL} for not applying the filter.
660 669
 #' @param maxNormalizedMean_peaks Numeric[0,] or \code{NULL}. Default \code{NULL}. Maximum mean across all samples for a peak to be retained for the normalized counts table. Set to \code{NULL} for not applying the filter.
... ...
@@ -687,6 +696,8 @@ filterData <- function (GRN,
687 696
     
688 697
   checkmate::assertClass(GRN, "GRN")
689 698
   GRN = .addFunctionLogToObject(GRN) 
699
+  
700
+  GRN = .makeObjectCompatible(GRN)
690 701
 
691 702
   checkmate::assertNumber(minNormalizedMean_peaks, lower = 0, null.ok = TRUE)
692 703
   checkmate::assertNumber(minNormalizedMeanRNA, lower = 0, null.ok = TRUE)
... ...
@@ -936,7 +947,7 @@ filterData <- function (GRN,
936 947
 #' @param fileEnding Character. Default \code{".bed"}. File ending for the files from the motif folder.
937 948
 #' @param nTFMax \code{NULL} or Integer[1,]. Default \code{NULL}. Maximal number of TFs to import. Can be used for testing purposes, e.g., setting to 5 only imports 5 TFs even though the whole \code{motifFolder} has many more TFs defined.
938 949
 #' @template forceRerun
939
-#' @return An updated \code{\linkS4class{GRN}} object, with additional information added from this function (\code{GRN@data$TFs$translationTable} in particular)
950
+#' @return An updated \code{\linkS4class{GRN}} object, with additional information added from this function (\code{GRN@annotation$TFs} in particular)
940 951
 #' @examples 
941 952
 #' # See the Workflow vignette on the GRaNIE website for examples
942 953
 #' @export
... ...
@@ -945,6 +956,8 @@ addTFBS <- function(GRN, motifFolder, TFs = "all", nTFMax = NULL, filesTFBSPatte
945 956
   start = Sys.time()
946 957
   checkmate::assertClass(GRN, "GRN")
947 958
   GRN = .addFunctionLogToObject(GRN)
959
+  
960
+  GRN = .makeObjectCompatible(GRN)
948 961
 
949 962
   checkmate::assertDirectoryExists(motifFolder)
950 963
   checkmate::assertCharacter(TFs, min.len = 1)
... ...
@@ -953,19 +966,20 @@ addTFBS <- function(GRN, motifFolder, TFs = "all", nTFMax = NULL, filesTFBSPatte
953 966
   checkmate::assertCharacter(fileEnding, len = 1, min.chars = 1)
954 967
   checkmate::assertFlag(forceRerun)
955 968
   
956
-  if (is.null(GRN@data$TFs$translationTable) | is.null(GRN@data$TFs$translationTable) | is.null(GRN@config$allTF)  | is.null(GRN@config$directories$motifFolder) | forceRerun) {
969
+  if (is.null(GRN@annotation$TFs) | is.null(GRN@annotation$TFs) | is.null(GRN@config$allTF)  | is.null(GRN@config$directories$motifFolder) | forceRerun) {
957 970
     
958 971
     GRN@config$TFBS_fileEnding  = fileEnding
959 972
     GRN@config$TFBS_filePattern = filesTFBSPattern
960
-    GRN@data$TFs$translationTable = .getFinalListOfTFs(motifFolder, filesTFBSPattern, fileEnding, TFs, nTFMax, getCounts(GRN, type = "rna", norm = TRUE, permuted = FALSE))
973
+    GRN@annotation$TFs = .getFinalListOfTFs(motifFolder, filesTFBSPattern, fileEnding, TFs, nTFMax, getCounts(GRN, type = "rna", norm = TRUE, permuted = FALSE))
961 974
     
962 975
     
963
-    GRN@data$TFs$translationTable = GRN@data$TFs$translationTable %>%
964
-      dplyr::select(c("HOCOID", "ENSEMBL")) %>%
965
-      dplyr::mutate(TF.name = .data$HOCOID, TF.ENSEMBL = .data$ENSEMBL) 
976
+    GRN@annotation$TFs = GRN@annotation$TFs %>%
977
+      dplyr::rename(TF.ENSEMBL = .data$ENSEMBL, TF.HOCOID = .data$HOCOID)  %>% 
978
+      dplyr::mutate(TF.name = .data$TF.HOCOID)  %>%
979
+      dplyr::select(.data$TF.name, .data$TF.ENSEMBL, .data$TF.HOCOID)
966 980
     
967 981
     # TODO: Change here and make it more logical what to put where
968
-    GRN@config$allTF = GRN@data$TFs$translationTable$TF.name
982
+    GRN@config$allTF = GRN@annotation$TFs$TF.name
969 983
     
970 984
     #Store all data-dependent TF information
971 985
     # GRN@config$TF_list = list()
... ...
@@ -1071,6 +1085,8 @@ overlapPeaksAndTFBS <- function(GRN, nCores = 2, forceRerun = FALSE) {
1071 1085
   checkmate::assertClass(GRN, "GRN")
1072 1086
   GRN = .addFunctionLogToObject(GRN)
1073 1087
   
1088
+  GRN = .makeObjectCompatible(GRN)
1089
+  
1074 1090
   checkmate::assertIntegerish(nCores, lower = 1)
1075 1091
   checkmate::assertFlag(forceRerun)
1076 1092
   
... ...
@@ -1098,9 +1114,10 @@ overlapPeaksAndTFBS <- function(GRN, nCores = 2, forceRerun = FALSE) {
1098 1114
       tbl_discarded = table(annotation_discared$chr)
1099 1115
       tbl_discarded = tbl_discarded[which(tbl_discarded > 0)]
1100 1116
       
1101
-      futile.logger::flog.warn(paste0("Found ", sum(tbl_discarded), " regions from chromosomes without a reference length. ", 
1102
-                                      "Typically, these are random fragments from known or unknown chromosomes. The following regions will be discarded: \n",
1103
-                                      paste0(names(tbl_discarded), " (", tbl_discarded, ")", collapse = ",")))
1117
+      message = paste0("Found ", sum(tbl_discarded), " regions from chromosomes without a reference length. ", 
1118
+                       "Typically, these are random fragments from known or unknown chromosomes. The following regions will be discarded: \n",
1119
+                       paste0(names(tbl_discarded), " (", tbl_discarded, ")", collapse = ","))
1120
+      .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE)  
1104 1121
       
1105 1122
       GRN@data$peaks$consensusPeaks = dplyr::filter(GRN@data$peaks$consensusPeaks, .data$chr %in% names(seqlengths))
1106 1123
     }
... ...
@@ -1218,7 +1235,9 @@ overlapPeaksAndTFBS <- function(GRN, nCores = 2, forceRerun = FALSE) {
1218 1235
   } else if (ncol(TFBS.df) >= 6) {
1219 1236
     
1220 1237
     if (ncol(TFBS.df) > 6) {
1221
-      futile.logger::flog.warn(paste0(" File ", file_tfbs_in, " had more than 6 columns, only the first 6 will be used."))
1238
+      message = paste0(" File ", file_tfbs_in, " had more than 6 columns, only the first 6 will be used.")
1239
+      .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE)  
1240
+        
1222 1241
       TFBS.df = TFBS.df[,seq_len(6)]
1223 1242
     }
1224 1243
     colnames(TFBS.df) = c("chr", "start", "end", "annotation", "score", "strand")
... ...
@@ -1350,9 +1369,9 @@ addData_TFActivity <- function(GRN, normalization = "cyclicLoess", name = "TF_ac
1350 1369
     stopifnot(identical(nrow(countsPeaks), nrow(GRN@data$TFs$TF_peak_overlap)))
1351 1370
     
1352 1371
     #Select a maximum set of TFs to run this for
1353
-    allTF = GRN@data$TFs$translationTable$TF.name
1372
+    allTF = GRN@annotation$TFs$TF.name
1354 1373
     
1355
-    # rownamesTFs = GRN@data$TFs$translationTable$ENSEMBL[match(allTF, GRN@data$TFs$translationTable$HOCOID)] 
1374
+    # rownamesTFs = GRN@annotation$TFs$ENSEMBL[match(allTF, GRN@annotation$TFs$HOCOID)] 
1356 1375
     
1357 1376
     # Calculating TF activity is done for all TF that are available
1358 1377
     TF.activity.m = matrix(NA, nrow = length(allTF), ncol = length(GRN@config$sharedSamples), 
... ...
@@ -1385,7 +1404,7 @@ addData_TFActivity <- function(GRN, normalization = "cyclicLoess", name = "TF_ac
1385 1404
       as.data.frame() %>% 
1386 1405
       tibble::rownames_to_column("TF.name") %>%
1387 1406
       tibble::as_tibble() %>%
1388
-      dplyr::left_join(GRN@data$TFs$translationTable, by = "TF.name") %>%
1407
+      dplyr::left_join(GRN@annotation$TFs, by = "TF.name") %>%
1389 1408
       dplyr::select(.data$ENSEMBL, .data$TF.name, tidyselect::all_of(GRN@config$sharedSamples))
1390 1409
     
1391 1410
     # Update available connection types
... ...
@@ -1509,7 +1528,7 @@ importTFData <- function(GRN, data, name, idColumn = "ENSEMBL", nameColumn = "TF
1509 1528
     futile.logger::flog.info(paste0("Importing external TF data under the name ", name)) 
1510 1529
     
1511 1530
     # Check whether TF have been added already
1512
-    if (is.null(GRN@data$TFs$translationTable)) {
1531
+    if (is.null(GRN@annotation$TFs)) {
1513 1532
       message = paste0("No TFBS afound in the object. Make sure to run the function addTFBS first.")
1514 1533
       .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
1515 1534
     }
... ...
@@ -1535,10 +1554,10 @@ importTFData <- function(GRN, data, name, idColumn = "ENSEMBL", nameColumn = "TF
1535 1554
     }
1536 1555
     
1537 1556
     # Make sure the column is reset
1538
-    GRN@data$TFs$translationTable[[paste0("TF.name.", name)]] = NULL
1557
+    GRN@annotation$TFs[[paste0("TF.name.", name)]] = NULL
1539 1558
     
1540 1559
     # TODO: Repeated execution results in more and more rows
1541
-    GRN@data$TFs$translationTable = dplyr::left_join(GRN@data$TFs$translationTable, data[, idColumns], 
1560
+    GRN@annotation$TFs = dplyr::left_join(GRN@annotation$TFs, data[, idColumns], 
1542 1561
                                                      by = "TF.name", suffix = c("", paste0(".", name)))
1543 1562
     
1544 1563
     # data = dplyr::select(data, -tidyselect::one_of(nameColumn))
... ...
@@ -1550,7 +1569,7 @@ importTFData <- function(GRN, data, name, idColumn = "ENSEMBL", nameColumn = "TF
1550 1569
     countsNorm$ENSEMBL = data$ENSEMBL
1551 1570
     
1552 1571
     nRowBefore = nrow(countsNorm)
1553
-    countsNorm.subset = dplyr::filter(countsNorm, .data$ENSEMBL %in% GRN@data$TFs$translationTable$ENSEMBL)
1572
+    countsNorm.subset = dplyr::filter(countsNorm, .data$ENSEMBL %in% GRN@annotation$TFs$TF.ENSEMBL)
1554 1573
     nRowAfter = nrow(countsNorm.subset)
1555 1574
     if (nRowAfter == 0) {
1556 1575
       message = "No rows overlapping with translation table, check ENSEMBL IDs."
... ...
@@ -1627,6 +1646,7 @@ AR_classification_wrapper<- function (GRN, significanceThreshold_Wilcoxon = 0.05
1627 1646
   checkmate::assertClass(GRN, "GRN")
1628 1647
   GRN = .addFunctionLogToObject(GRN)
1629 1648
   
1649
+  GRN = .makeObjectCompatible(GRN)
1630 1650
   
1631 1651
   checkmate::assertNumber(significanceThreshold_Wilcoxon, lower = 0, upper = 1)
1632 1652
   checkmate::assertNumber(plot_minNoTFBS_heatmap, lower = 1)
... ...
@@ -1637,7 +1657,7 @@ AR_classification_wrapper<- function (GRN, significanceThreshold_Wilcoxon = 0.05
1637 1657
   
1638 1658
   outputFolder = .checkOutputFolder(GRN, outputFolder)
1639 1659
   
1640
-  GRN@data$TFs$classification$TF.translation.orig = GRN@data$TFs$translationTable %>%
1660
+  GRN@data$TFs$classification$TF.translation.orig = GRN@annotation$TFs %>%
1641 1661
     dplyr::mutate(TF.name = .data$HOCOID)
1642 1662
   
1643 1663
   if (is.null(GRN@data$TFs$TF_peak_overlap)) {
... ...
@@ -1695,7 +1715,7 @@ AR_classification_wrapper<- function (GRN, significanceThreshold_Wilcoxon = 0.05
1695 1715
         
1696 1716
         TF_peak_cor = .correlateMatrices(matrix1      = counts1, 
1697 1717
                                          matrix_peaks = counts_peaks, 
1698
-                                         GRN@data$TFs$translationTable, corMethod)
1718
+                                         GRN@annotation$TFs, corMethod)
1699 1719
         
1700 1720
         peak_TF_overlapCur.df = .filterSortAndShuffle_peakTF_overlapTable(GRN, permutationCur, TF_peak_cor)
1701 1721
         res.l = .computeForegroundAndBackgroundMatrices(peak_TF_overlapCur.df, TF_peak_cor)
... ...
@@ -1766,7 +1786,7 @@ AR_classification_wrapper<- function (GRN, significanceThreshold_Wilcoxon = 0.05
1766 1786
           TF_peak_cor = GRN@data$TFs$classification[[permIndex]] [[connectionTypeCur]]$TF_peak_cor
1767 1787
           peak_TF_overlapCur.df = .filterSortAndShuffle_peakTF_overlapTable(GRN, permutationCur, TF_peak_cor)
1768 1788
           .plot_heatmapAR(TF.peakMatrix.df = peak_TF_overlapCur.df, 
1769
-                          HOCOMOCO_mapping.df.exp = GRN@data$TFs$translationTable %>% dplyr::mutate(TF = .data$TF.name), 
1789
+                          HOCOMOCO_mapping.df.exp = GRN@annotation$TFs %>% dplyr::mutate(TF = .data$TF.name), 
1770 1790
                           sort.cor.m = TF_peak_cor, 
1771 1791
                           par.l = GRN@config$parameters, 
1772 1792
                           corMethod = corMethod,
... ...
@@ -1860,6 +1880,8 @@ addConnections_TF_peak <- function (GRN, plotDiagnosticPlots = TRUE, plotDetails
1860 1880
 
1861 1881
   checkmate::assertClass(GRN, "GRN")
1862 1882
   GRN = .addFunctionLogToObject(GRN)
1883
+  
1884
+  GRN = .makeObjectCompatible(GRN)
1863 1885
 
1864 1886
   checkmate::assertFlag(plotDiagnosticPlots)
1865 1887
   checkmate::assertFlag(plotDetails)
... ...
@@ -1981,7 +2003,7 @@ addConnections_TF_peak <- function (GRN, plotDiagnosticPlots = TRUE, plotDetails
1981 2003
     # Filtering of the matrices happens automatically within the next function
1982 2004
     peaksCor.m = .correlateMatrices( matrix1= counts_connectionTypeCur, 
1983 2005
                                      matrix_peaks = counts_peaks, 
1984
-                                     GRN@data$TFs$translationTable, 
2006
+                                     GRN@annotation$TFs, 
1985 2007
                                      corMethod,
1986 2008
                                      whitespacePrefix = "  ")
1987 2009
     
... ...
@@ -2419,6 +2441,7 @@ addConnections_peak_gene <- function(GRN, overlapTypeGene = "TSS", corMethod = "
2419 2441
   checkmate::assertClass(GRN, "GRN")
2420 2442
   GRN = .addFunctionLogToObject(GRN)
2421 2443
   
2444
+  GRN = .makeObjectCompatible(GRN)
2422 2445
   
2423 2446
   checkmate::assertChoice(overlapTypeGene, c("TSS", "full"))
2424 2447
   checkmate::assertChoice(corMethod, c("pearson", "spearman"))
... ...
@@ -2688,7 +2711,8 @@ addConnections_peak_gene <- function(GRN, overlapTypeGene = "TSS", corMethod = "
2688 2711
     # Check whether TAD boundaries overlap and print a warning if so
2689 2712
     nMultipleOverlaps = .checkSelfOverlap(subject)
2690 2713
     if (nMultipleOverlaps > 0) {
2691
-      futile.logger::flog.warn(paste0(nMultipleOverlaps, " out of ", length(subject), " TADs overlap with at least one other TAD. Please verify whether this is intended or a mistake. Particularly 1bp overlaps may not resembl the truth."))
2714
+        message =paste0(nMultipleOverlaps, " out of ", length(subject), " TADs overlap with at least one other TAD. Please verify whether this is intended or a mistake. Particularly 1bp overlaps may not resembl the truth.")
2715
+        .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE)
2692 2716
     }
2693 2717
     
2694 2718
     
... ...
@@ -2937,6 +2961,7 @@ addConnections_peak_gene <- function(GRN, overlapTypeGene = "TSS", corMethod = "
2937 2961
 #' This is one of the main integrative functions of the \code{GRaNIE} package. It has two main functions: 
2938 2962
 #' First, filtering both TF-peak and peak-gene connections according to different criteria such as FDR and other properties 
2939 2963
 #' Second, joining the three major elements that an eGRN consist of (TFs, peaks, genes) into one data frame, with one row per unique TF-peak-gene connection.
2964
+#' \strong{After successful execution, the connections (along with additional feature metadata) can be retrieved with the function \code{\link{getGRNConnections}}.}
2940 2965
 #' \strong{Note that the eGRN graph is reset upon successful execution of this function, and re-running the function \code{\link{build_eGRN_graph}} 
2941 2966
 #' is necessary when any of the network functions of the package shall be executed.}
2942 2967
 
... ...
@@ -2968,7 +2993,8 @@ addConnections_peak_gene <- function(GRN, overlapTypeGene = "TSS", corMethod = "
2968 2993
 #' @param silent \code{TRUE} or \code{FALSE}.  Default \code{FALSE}. Print progress messages and filter statistics.
2969 2994
 #' @param filterLoops  \code{TRUE} or \code{FALSE}. Default \code{TRUE}. If a TF regulates itself (i.e., the TF and the gene are the same entity), should such loops be filtered from the GRN?
2970 2995
 #' @template outputFolder
2971
-#' @return An updated \code{\linkS4class{GRN}} object, with additional information added from this function. The filtered and merged TF-peak and peak-gene connections in the slot \code{GRN@connections$all.filtered}.
2996
+#' @return An updated \code{\linkS4class{GRN}} object, with additional information added from this function. 
2997
+#' The filtered and merged TF-peak and peak-gene connections in the slot \code{GRN@connections$all.filtered} and can be retrieved (along with other feature metadata) using the function \code{\link{getGRNConnections}}.
2972 2998
 #' @examples 
2973 2999
 #' # See the Workflow vignette on the GRaNIE website for examples
2974 3000
 #' GRN = loadExampleObject()
... ...
@@ -2977,6 +3003,7 @@ addConnections_peak_gene <- function(GRN, overlapTypeGene = "TSS", corMethod = "
2977 3003
 #' @seealso \code{\link{addConnections_TF_peak}} 
2978 3004
 #' @seealso \code{\link{addConnections_peak_gene}} 
2979 3005
 #' @seealso \code{\link{build_eGRN_graph}} 
3006
+#' @seealso \code{\link{getGRNConnections}} 
2980 3007
 #' @importFrom rlang .data
2981 3008
 #' @importFrom magrittr `%>%`
2982 3009
 #' @export
... ...
@@ -3004,6 +3031,8 @@ filterGRNAndConnectGenes <- function(GRN,
3004 3031
   checkmate::assertClass(GRN, "GRN")
3005 3032
   GRN = .addFunctionLogToObject(GRN)
3006 3033
   
3034
+  GRN = .makeObjectCompatible(GRN)
3035
+  
3007 3036
   
3008 3037
   checkmate::assertCharacter(TF_peak.connectionTypes, min.len = 1, any.missing = FALSE)
3009 3038
   checkmate::assert(checkmate::checkNull(peak_gene.p_raw.threshold), checkmate::checkNumber(peak_gene.p_raw.threshold, lower = 0, upper = 1))
... ...
@@ -3015,7 +3044,7 @@ filterGRNAndConnectGenes <- function(GRN,
3015 3044
   checkmate::assertChoice(peak_gene.fdr.method, c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none", "IHW"))
3016 3045
   checkmate::assert(checkmate::checkNull(peak_gene.IHW.covariate), checkmate::checkCharacter(peak_gene.IHW.covariate, min.chars = 1, len = 1))
3017 3046
   
3018
-  checkmate::assert(checkIntegerish(peak_gene.IHW.nbins, lower = 1), checkSubset(peak_gene.IHW.nbins, "auto"))
3047
+  checkmate::assert(checkmate::checkIntegerish(peak_gene.IHW.nbins, lower = 1), checkmate::checkSubset(peak_gene.IHW.nbins, "auto"))
3019 3048
   
3020 3049
   checkmate::assertFlag(silent)
3021 3050
   checkmate::assertChoice(peak_gene.selection, c("all", "closest"))
... ...
@@ -3056,16 +3085,22 @@ filterGRNAndConnectGenes <- function(GRN,
3056 3085
       .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
3057 3086
     }
3058 3087
     
3088
+    # Only select the absolute necessary here, no additional metadata
3089
+    ann.gene.red = GRN@annotation$genes %>%
3090
+        dplyr::mutate(gene.ENSEMBL = as.character(.data$gene.ENSEMBL)) %>%
3091
+        dplyr::select(.data$gene.ENSEMBL, .data$gene.name, .data$gene.type)
3092
+    
3059 3093
     peakGeneCorrelations = GRN@connections$peak_genes[[permIndex]] %>%
3060 3094
       dplyr::mutate(gene.ENSEMBL = as.character(.data$gene.ENSEMBL)) %>%
3061
-      dplyr::left_join(dplyr::mutate(GRN@annotation$genes, gene.ENSEMBL = as.character(.data$gene.ENSEMBL)), by = "gene.ENSEMBL")
3062
-    
3095
+      dplyr::left_join(ann.gene.red, by = "gene.ENSEMBL")
3096
+
3097
+
3063 3098
     
3064 3099
     # Add TF Ensembl IDs
3065 3100
     grn.filt = GRN@connections$TF_peaks[[permIndex]]$main  %>% 
3066 3101
       tibble::as_tibble() %>%
3067
-      dplyr::left_join(GRN@data$TFs$translationTable, by = c("TF.name")) %>%
3068
-      dplyr::select(-.data$HOCOID, -.data$ENSEMBL) %>%
3102
+      dplyr::left_join(GRN@annotation$TFs %>% dplyr::select(.data$TF.name, .data$TF.ENSEMBL), by = c("TF.name")) %>%
3103
+      dplyr::select(-.data$TF_peak.fdr_orig) %>%
3069 3104
       dplyr::mutate(TF.ENSEMBL = as.factor(.data$TF.ENSEMBL))
3070 3105
     
3071 3106
     if (is.null(grn.filt)) {
... ...
@@ -3172,20 +3207,28 @@ filterGRNAndConnectGenes <- function(GRN,
3172 3207
     if (!is.null(filterTFs)) {
3173 3208
       futile.logger::flog.info(paste0(" Filter network to the following TF: ", paste0(filterTFs, collapse = ",")))
3174 3209
       futile.logger::flog.info(paste0("  Number of TF-peak rows before filtering TFs: ", nrow(grn.filt)))
3175
-      grn.filt = dplyr::filter(grn.filt, .data$TF %in% filterTFs)
3210
+      grn.filt = dplyr::filter(grn.filt, .data$TF.name %in% filterTFs)
3176 3211
       futile.logger::flog.info(paste0("  Number of TF-peak rows after filtering TFs: ", nrow(grn.filt)))
3177 3212
     }
3178 3213
     
3179 3214
     if (!is.null(filterPeaks)) {
3180 3215
       futile.logger::flog.info(paste0(" Filter network to the following peaks: ", paste0(filterPeaks, collapse = ",")))
3181 3216
       futile.logger::flog.info(paste0("  Number of TF-peak rows before filtering peaks: ", nrow(grn.filt)))
3182
-      grn.filt = dplyr::filter(grn.filt, .data$peakID %in% filterPeaks)
3217
+      grn.filt = dplyr::filter(grn.filt, .data$peak.ID %in% filterPeaks)
3183 3218
       futile.logger::flog.info(paste0("  Number of TF-peak rows after filtering peaks: ", nrow(grn.filt)))
3184 3219
     }
3185 3220
     
3186 3221
     # Filters on peak-genes
3187 3222
     
3188 3223
     futile.logger::flog.info("2. Filter peak-gene connections")
3224
+    
3225
+    if (!is.null(filterGenes)) {
3226
+        futile.logger::flog.info(paste0(" Filter peak-gene connections for the following genes: ", paste0(filterGenes, collapse = ",")))
3227
+        futile.logger::flog.info(paste0("  Number of rows before filtering genes: ", nrow(peakGeneCorrelations)))
3228
+        peakGeneCorrelations = dplyr::filter(peakGeneCorrelations, .data$gene.ENSEMBL %in% filterGenes)
3229
+        futile.logger::flog.info(paste0("  Number of rows after filtering genes: ", nrow(peakGeneCorrelations)))
3230
+    }
3231
+    
3189 3232
     if (!is.null(peak_gene.maxDistance)) {
3190 3233
       checkmate::assertIntegerish(peak_gene.maxDistance, lower = 0)
3191 3234
       futile.logger::flog.info(paste0(" Filter peak-gene connections for their distance and keep only connections with a maximum distance of  ", peak_gene.maxDistance))
... ...
@@ -3201,6 +3244,8 @@ filterGRNAndConnectGenes <- function(GRN,
3201 3244
       futile.logger::flog.info(paste0("  Number of peak-gene rows after filtering by gene type: ", nrow(peakGeneCorrelations)))
3202 3245
     }
3203 3246
     
3247
+   
3248
+    
3204 3249
     
3205 3250
     futile.logger::flog.info(paste0("3. Merging TF-peak with peak-gene connections and filter the combined table..."))
3206 3251
     # Now we need the connected genes. All fitters that are independent of that have been done
... ...
@@ -3213,7 +3258,7 @@ filterGRNAndConnectGenes <- function(GRN,
3213 3258
     }
3214 3259
     
3215 3260
     
3216
-    futile.logger::flog.info(paste0("Inital number of rows left before all filtering steps: ", nrow(grn.filt)))
3261
+    futile.logger::flog.info(paste0("Inital number of rows left before filtering steps: ", nrow(grn.filt)))
3217 3262
     
3218 3263
     if (filterLoops) {
3219 3264
       futile.logger::flog.info(paste0(" Filter TF-TF self-loops"))
... ...
@@ -3227,12 +3272,7 @@ filterGRNAndConnectGenes <- function(GRN,
3227 3272
       futile.logger::flog.info(paste0("  Number of rows after filtering genes: ", nrow(grn.filt)))
3228 3273
     }
3229 3274
     
3230
-    if (!is.null(filterGenes)) {
3231
-      futile.logger::flog.info(paste0(" Filter network to the following genes: ", paste0(filterGenes, collapse = ",")))
3232
-      futile.logger::flog.info(paste0("  Number of rows before filtering genes: ", nrow(grn.filt)))
3233
-      grn.filt = dplyr::filter(grn.filt, .data$gene.ENSEMBL %in% filterGenes)
3234
-      futile.logger::flog.info(paste0("  Number of rows after filtering genes: ", nrow(grn.filt)))
3235
-    }
3275
+    
3236 3276
     
3237 3277
     if (allowMissingGenes) {
3238 3278
       # Nothing to do here
... ...
@@ -3359,13 +3399,12 @@ filterGRNAndConnectGenes <- function(GRN,
3359 3399
     
3360 3400
     
3361 3401
     grn.filt = grn.filt %>%
3362
-      dplyr::left_join(GRN@annotation$consensusPeaks, by = "peak.ID") %>%
3363
-      dplyr::select(dplyr::starts_with("TF."), 
3364
-                    dplyr::starts_with("TF_peak."), 
3365
-                    dplyr::starts_with("peak."), 
3366
-                    dplyr::starts_with("peak_gene."),
3367
-                    dplyr::starts_with("gene."),
3368
-                    -dplyr::starts_with("peak.gene.")) %>% # these are the annotation columns by ChipSeeker that are confusing
3402
+      dplyr::select(tidyselect::starts_with("TF."), 
3403
+                    tidyselect::starts_with("TF_peak."), 
3404
+                    tidyselect::starts_with("peak."), 
3405
+                    tidyselect::starts_with("peak_gene."),
3406
+                    tidyselect::starts_with("gene."),
3407
+                    tidyselect::everything()) %>%
3369 3408
       dplyr::mutate(peak.ID      = as.factor(.data$peak.ID),
3370 3409
                     gene.ENSEMBL = as.factor(.data$gene.ENSEMBL),
3371 3410
                     TF.name      = as.factor(.data$TF.name))
... ...
@@ -3461,7 +3500,7 @@ filterGRNAndConnectGenes <- function(GRN,
3461 3500
     
3462 3501
     # We can compare this to the result of applying the method of Benjamini and Hochberg to the p-values only:
3463 3502
     
3464
-    padj_bh <- stats::p.adjust(pvalues, method = "BH")
3503
+    padj_bh = stats::p.adjust(pvalues, method = "BH")
3465 3504
     rejectionsBH = sum(padj_bh <= alpha, na.rm = TRUE)
3466 3505
     rejectionsIHW = IHW::rejections(ihw_res)
3467 3506
     
... ...
@@ -3580,6 +3619,9 @@ add_TF_gene_correlation <- function(GRN, corMethod = "pearson", addRobustRegress
3580 3619
   
3581 3620
   checkmate::assertClass(GRN, "GRN")  
3582 3621
   GRN = .addFunctionLogToObject(GRN)    
3622
+  
3623
+  GRN = .makeObjectCompatible(GRN)
3624
+  
3583 3625
   checkmate::assertChoice(corMethod, c("pearson", "spearman"))
3584 3626
   checkmate::assertFlag(addRobustRegression)
3585 3627
   checkmate::assertIntegerish(nCores, lower = 1)
... ...
@@ -3594,10 +3636,7 @@ add_TF_gene_correlation <- function(GRN, corMethod = "pearson", addRobustRegress
3594 3636
     
3595 3637
     GRN@connections$TF_genes.filtered = list()
3596 3638
     
3597
-    if (is.null(GRN@connections$all.filtered)) {
3598
-      message = "Could not find merged and filtered connections. Run the function filterGRNAndConnectGenes first."
3599
-      .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
3600
-    }
3639
+    .checkExistanceFilteredConnections(GRN)
3601 3640
     
3602 3641
     futile.logger::flog.info(paste0("Calculate correlations for TF and genes from the filtered set of connections"))
3603 3642
     
... ...
@@ -3611,7 +3650,7 @@ add_TF_gene_correlation <- function(GRN, corMethod = "pearson", addRobustRegress
3611 3650
         dplyr::filter(!is.na(.data$gene.ENSEMBL)) %>%
3612 3651
         dplyr::select(.data$TF.name, .data$gene.ENSEMBL) %>%
3613 3652
         dplyr::distinct() %>%
3614
-        dplyr::left_join(GRN@data$TFs$translationTable, by = c("TF.name"), suffix = c("", ".transl")) # %>%
3653
+        dplyr::left_join(GRN@annotation$TFs, by = c("TF.name"), suffix = c("", ".transl")) # %>%
3615 3654
       # dplyr::distinct(ENSEMBL, ENSEMBL.transl)
3616 3655
       # TODO: Improve: Only loop over distinct ENSMBL_TF and ENSEMBL_gene pairs
3617 3656
       
... ...
@@ -3658,7 +3697,7 @@ add_TF_gene_correlation <- function(GRN, corMethod = "pearson", addRobustRegress
3658 3697
                                     dplyr::mutate(TF.ENSEMBL   = getCounts(GRN, type = "rna", norm = TRUE, permuted = as.logical(permutationCur))$ENSEMBL[map_TF],
3659 3698
                                                   gene.ENSEMBL = getCounts(GRN, type = "rna", norm = TRUE, permuted = as.logical(permutationCur))$ENSEMBL[map_gene]) %>%
3660 3699
                                     dplyr::filter(!is.na(.data$gene.ENSEMBL), !is.na(.data$TF.ENSEMBL)) %>%  # For some peak-gene combinations, no RNA-Seq data was available, these NAs are filtered
3661
-                                    dplyr::left_join(GRN@data$TFs$translationTable, by = c("TF.ENSEMBL")) %>%
3700
+                                    dplyr::left_join(GRN@annotation$TFs, by = c("TF.ENSEMBL")) %>%
3662 3701
                                     dplyr::select(tidyselect::all_of(selectColumns))) %>%
3663 3702
           dplyr::mutate(gene.ENSEMBL = as.factor(.data$gene.ENSEMBL), 
3664 3703
                         TF.ENSEMBL   = as.factor(.data$TF.ENSEMBL),
... ...
@@ -3677,8 +3716,9 @@ add_TF_gene_correlation <- function(GRN, corMethod = "pearson", addRobustRegress
3677 3716
         }
3678 3717
         
3679 3718
       } else {
3680
-        futile.logger::flog.warn(paste0(" Nothing to do, skip."))
3681
-        
3719
+        message =paste0(" Nothing to do, skip.")
3720
+        .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE)
3721
+          
3682 3722
         if (addRobustRegression) {
3683 3723
           res.df = tibble::tribble(~TF.name, ~TF.ENSEMBL, ~gene.ENSEMBL, ~TF_gene.r, ~TF_gene.p_raw, ~TF_gene.p_raw.robust, 
3684 3724
                                    ~TF_gene.r_robust, ~TF_gene.bias_M_p.raw, ~TF_gene.bias_LS_p.raw)
... ...
@@ -3732,7 +3772,8 @@ addSNPOverlap <- function(grn, SNPData, col_chr = "chr", col_pos = "pos", col_pe
3732 3772
   if (genomeAssembly_SNP %in% c("hg19", "hg38")) {
3733 3773
     index_chr23 = which(grepl("chr23", SNPData[, col_chr]))
3734 3774
     if (length(index_chr23) > 0) {
3735
-      futile.logger::flog.warn(paste0(" Chr23 found in SNP data. Renaming to chrX"))
3775
+      message =paste0(" Chr23 found in SNP data. Renaming to chrX")
3776
+      .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE)
3736 3777
       SNPData[index_chr23, col_chr] = stringr::str_replace(SNPData[index_chr23, col_chr], "chr23", "chrX")
3737 3778
     }
3738 3779
   }
... ...
@@ -3857,6 +3898,8 @@ generateStatsSummary <- function(GRN,
3857 3898
   
3858 3899
   checkmate::assertClass(GRN, "GRN")
3859 3900
   GRN = .addFunctionLogToObject(GRN)
3901
+  
3902
+  GRN = .makeObjectCompatible(GRN)
3860 3903
 
3861 3904
   checkmate::assertNumeric(TF_peak.fdr, lower = 0, upper = 1, min.len = 1)
3862 3905
   checkmate::assertSubset(TF_peak.connectionTypes, c("all", GRN@config$TF_peak_connectionTypes), empty.ok = FALSE)
... ...
@@ -4172,8 +4215,7 @@ loadExampleObject <- function(forceDownload = FALSE, fileURL = "https://www.embl
4172 4215
   GRN@config$directories$motifFolder = "."
4173 4216
   GRN@config$files$output_log = "GRN.log"
4174 4217
   
4175
-  # Temporary fix: Replace lincRNA with lncRNA due to a recent change in biomaRt until we update the object directly
4176
-  GRN@annotation$genes = dplyr::mutate(GRN@annotation$genes, gene.type = dplyr::recode_factor(.data$gene.type, lncRNA = "lincRNA")) 
4218
+  GRN = .makeObjectCompatible(GRN)
4177 4219
   
4178 4220
   GRN
4179 4221
   
... ...
@@ -4200,7 +4242,9 @@ loadExampleObject <- function(forceDownload = FALSE, fileURL = "https://www.embl
4200 4242
 getCounts <- function(GRN, type, norm, permuted = FALSE) {
4201 4243
   
4202 4244
   checkmate::assertClass(GRN, "GRN")
4203
-  GRN = .addFunctionLogToObject(GRN)      
4245
+  GRN = .addFunctionLogToObject(GRN)     
4246
+  
4247
+  GRN = .makeObjectCompatible(GRN)
4204 4248
   
4205 4249
   checkmate::assertChoice(type, c("peaks", "rna"))
4206 4250
   checkmate::assertFlag(norm)#
... ...
@@ -4248,7 +4292,7 @@ getCounts <- function(GRN, type, norm, permuted = FALSE) {
4248 4292
   } else { # permutation 1
4249 4293
     
4250 4294
     if (type == "peaks") {
4251
-      message = "Could not find permuted peak counts in GRN object. Peaks are not stored as pemruted, set permuted = FALSE for type = \"peaks\""
4295
+      message = "Could not find permuted peak counts in GRN object. Peaks are not stored as permuted, set permuted = FALSE for type = \"peaks\""
4252 4296
       .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
4253 4297
     } else {
4254 4298
       
... ...
@@ -4283,52 +4327,123 @@ getCounts <- function(GRN, type, norm, permuted = FALSE) {
4283 4327
 #' @export
4284 4328
 #' @template GRN 
4285 4329
 #' @template permuted
4286
-#' @param type Character. One of \code{TF_peaks}, \code{peak_genes}, \code{all.filtered}. Default \code{all.filtered}. The type of connections to retrieve.
4287
-#' @param include_TF_gene_correlations Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should TFs and gene correlations be returned as well? If set to \code{TRUE}, they must have been computed beforehand with \code{\link{add_TF_gene_correlation}}.
4330
+#' @param type Character. One of \code{TF_peaks}, \code{peak_genes}, \code{TF_genes} or \code{all.filtered}. Default \code{all.filtered}. The type of connections to retrieve.
4331
+#' @param include_TF_gene_correlations Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should TFs and gene correlations be returned as well? If set to \code{TRUE}, they must have been computed beforehand with \code{\link{add_TF_gene_correlation}}. Only relevant for type = "all.filtered"
4332
+#' @param include_TFMetadata Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should TF metadata be returned as well? Only relevant for type = "all.filtered"
4333
+#' @param include_peakMetadata Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should peak metadata be returned as well?  Only relevant for type = "all.filtered"
4334
+#' @param include_geneMetadata Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should gene metadata be returned as well?  Only relevant for type = "all.filtered"
4335
+#' @param include_variancePartitionResults Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. 
4336
+#' Should the results from the function \code{\link{add_featureVariation}} be included? 
4337
+#' If set to \code{TRUE}, they must have been computed beforehand with \code{\link{add_featureVariation}}; otherwise, an error is thrown.
4338
+#' Only relevant for type = "all.filtered"
4288 4339
 #' @return A data frame with the requested connections. This function does **NOT** return a \code{\linkS4class{GRN}} object.
4289 4340
 #' @seealso \code{\link{filterGRNAndConnectGenes}}
4341
+#' @seealso \code{\link{add_featureVariation}}
4342
+#' @seealso \code{\link{add_TF_gene_correlation}}
4290 4343
 #' @examples 
4291 4344
 #' # See the Workflow vignette on the GRaNIE website for examples
4292 4345
 #' GRN = loadExampleObject()
4293 4346
 #' GRN_con.all.df = getGRNConnections(GRN)
4294
-getGRNConnections <- function(GRN, type = "all.filtered",  permuted = FALSE, include_TF_gene_correlations = FALSE) {
4347
+getGRNConnections <- function(GRN, type = "all.filtered",  permuted = FALSE, 
4348
+                              include_TF_gene_correlations = FALSE, 
4349
+                              include_TFMetadata = FALSE,
4350
+                              include_peakMetadata = FALSE,
4351
+                              include_geneMetadata = FALSE,
4352
+                              include_variancePartitionResults = FALSE) {
4295 4353
   
4296 4354
   checkmate::assertClass(GRN, "GRN")  
4297 4355
   GRN = .addFunctionLogToObject(GRN)
4298 4356
   
4357
+  GRN = .makeObjectCompatible(GRN)
4299 4358
    
4300
-  checkmate::assertChoice(type, c("TF_peaks", "peak_genes", "all.filtered"))
4359
+  checkmate::assertChoice(type, c("TF_peaks", "peak_genes", "TF_genes", "all.filtered"))
4301 4360
   checkmate::assertFlag(permuted)
4302 4361
   #checkmate::assertIntegerish(permutation, lower = 0, upper = .getMaxPermutation(GRN))
4303 4362
   checkmate::assertFlag(include_TF_gene_correlations)
4363
+  checkmate::assertFlag(include_variancePartitionResults)
4364
+  checkmate::assertFlag(include_TFMetadata)
4365
+  checkmate::assertFlag(include_peakMetadata)
4366
+  checkmate::assertFlag(include_geneMetadata)
4304 4367
   
4305 4368
   permIndex = dplyr::if_else(permuted, "1", "0")
4306 4369
   
4307 4370
   if (type == "all.filtered") {
4371
+      
4372
+    merged.df = GRN@connections$all.filtered[[permIndex]]
4373
+
4374
+    if (include_TF_gene_correlations) {
4375
+        
4376
+        if (is.null(GRN@connections$TF_genes.filtered)) {
4377
+            message = "Please run the function add_TF_gene_correlation first. "
4378
+            .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
4379
+        }
4380
+        
4381
+        # Merge with TF-gene table
4382
+        merged.df = merged.df %>%
4383
+            dplyr::left_join(GRN@connections$TF_genes.filtered[[permIndex]], 
4384
+                             by = c("TF.name", "TF.ENSEMBL", "gene.ENSEMBL")) 
4385
+    }
4308 4386
     
4387
+    if (include_variancePartitionResults) {
4388
+        
4389
+        if (ncol(GRN@annotation$genes %>% dplyr::select(tidyselect::starts_with("variancePartition"))) == 0 |
4390
+            ncol(GRN@annotation$consensusPeaks %>% dplyr::select(tidyselect::starts_with("variancePartition"))) == 0) {
4391
+            message = "Please run the function add_featureVariation first. "
4392
+            .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
4393
+        }
4394
+        
4395
+        merged.df = merged.df %>%
4396
+            dplyr::left_join(GRN@annotation$TFs %>% 
4397
+                                 dplyr::select(.data$TF.ENSEMBL, tidyselect::starts_with("TF.variancePartition")), 
4398
+                             by = "TF.ENSEMBL")  %>%
4399
+            dplyr::left_join(GRN@annotation$genes %>% 
4400
+                                 dplyr::select(.data$gene.ENSEMBL, tidyselect::starts_with("gene.variancePartition")), 
4401
+                             by = "gene.ENSEMBL")  %>%
4402
+            dplyr::left_join(GRN@annotation$consensusPeaks %>% 
4403
+                                 dplyr::select(.data$peak.ID, tidyselect::starts_with("peak.variancePartition")), 
4404
+                             by = "peak.ID")
4405
+        
4406
+    }
4309 4407
     
4310
-    if (include_TF_gene_correlations) {
4311
-      
4312
-      if (is.null(GRN@connections$TF_genes.filtered)) {
4313
-        message = "Please run the function add_TF_gene_correlation first. "
4314
-        .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
4315
-      }
4316
-      
4317
-      # Merge with TF-gene table
4318
-      merged.df = dplyr::left_join(GRN@connections$all.filtered[[permIndex]], 
4319
-                                   GRN@connections$TF_genes.filtered[[permIndex]], 
4320
-                                   by = c("TF.name", "TF.ENSEMBL", "gene.ENSEMBL")) %>%
4321
-        dplyr::select(tidyselect::starts_with("TF."), tidyselect::everything()) %>%
4322
-        tibble::as_tibble()
4323
-      
4324
-      
4325
-      return(merged.df)
4326
-      
4327
-    } else {
4328
-      return(GRN@connections$all.filtered[[permIndex]])
4408
+    if (include_TFMetadata) {
4409
+        
4410
+        colsMissing = setdiff(colnames(GRN@annotation$TFs), colnames(merged.df))
4411
+        if (length(colsMissing) > 0) {
4412
+            merged.df = merged.df %>%
4413
+                dplyr::left_join(GRN@annotation$TFs, by = c("TF.name", "TF.ENSEMBL"))
4414
+        }
4329 4415
     }
4330 4416
     
4417
+    if (include_peakMetadata) {
4418
+        
4419
+        colsMissing = setdiff(colnames(GRN@annotation$consensusPeaks), colnames(merged.df))
4420
+        if (length(colsMissing) > 0) {
4421
+            merged.df = merged.df %>%
4422
+                dplyr::left_join(GRN@annotation$consensusPeaks, by = "peak.ID")
4423
+        }
4424
+    }
4425
+    
4426
+    if (include_geneMetadata) {
4427
+        
4428
+        colsMissing = setdiff(colnames(GRN@annotation$genes), colnames(merged.df))
4429
+        if (length(colsMissing) > 0) {
4430
+            merged.df = merged.df %>%
4431
+                dplyr::left_join(GRN@annotation$genes, by = "gene.ENSEMBL")
4432
+        }
4433
+    }
4434
+
4331 4435
     
4436
+    merged.df = merged.df %>%
4437
+        dplyr::select(tidyselect::starts_with("TF."), 
4438
+                      tidyselect::starts_with("peak."), 
4439
+                      tidyselect::starts_with("TF_peak."), 
4440
+                      tidyselect::starts_with("gene."), 
4441
+                      tidyselect::starts_with("peak_gene."), 
4442
+                      tidyselect::starts_with("TF_gene."), 
4443
+                      tidyselect::everything()) %>%
4444
+        tibble::as_tibble()
4445
+
4446
+    return(merged.df)
4332 4447
     
4333 4448
   } else if (type == "TF_peaks") {
4334 4449
     
... ...
@@ -4338,6 +4453,15 @@ getGRNConnections <- function(GRN, type = "all.filtered",  permuted = FALSE, inc
4338 4453
     
4339 4454
     return(tibble::as_tibble(GRN@connections$peak_genes[[permIndex]]))
4340 4455
     
4456
+  } else if (type == "TF_genes") {
4457
+      
4458
+    if (is.null(GRN@connections$TF_genes.filtered)) {
4459
+      message = "Please run the function add_TF_gene_correlation first. "
4460
+      .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
4461
+    }
4462
+      
4463
+    return(tibble::as_tibble(GRN@connections$TF_genes.filtered[[permIndex]]))
4464
+      
4341 4465
   } 
4342 4466
   
4343 4467
 }
... ...
@@ -4374,6 +4498,10 @@ getGRNConnections <- function(GRN, type = "all.filtered",  permuted = FALSE, inc
4374 4498
     } else {
4375 4499
       GRN@config$directories$output_plots = gsub("\\", "/", GRN@config$directories$output_plots, fixed = TRUE)
4376 4500
     }
4501
+      
4502
+    if (!endsWith(GRN@config$directories$output_plots, .Platform$file.sep)) {
4503
+        GRN@config$directories$output_plots = paste0(GRN@config$directories$output_plots, .Platform$file.sep)
4504
+    }
4377 4505
     
4378 4506
     if (!dir.exists(GRN@config$directories$output_plots)) {
4379 4507
       dir.create(GRN@config$directories$output_plots, recursive = TRUE)
... ...
@@ -4382,6 +4510,8 @@ getGRNConnections <- function(GRN, type = "all.filtered",  permuted = FALSE, inc
4382 4510
   
4383 4511
   dplyr::if_else(is.null(outputFolder), GRN@config$directories$output_plots, outputFolder)
4384 4512
   
4513
+  
4514
+  
4385 4515
 }
4386 4516
 
4387 4517
 
... ...
@@ -4488,6 +4618,10 @@ getGRNConnections <- function(GRN, type = "all.filtered",  permuted = FALSE, inc
4488 4618
 getParameters <- function (GRN, type = "parameter", name = "all") {
4489 4619
   
4490 4620
   checkmate::assertClass(GRN, "GRN")
4621
+  GRN = .addFunctionLogToObject(GRN)
4622
+    
4623
+  GRN = .makeObjectCompatible(GRN)
4624
+    
4491 4625
   checkmate::assertChoice(type, c("function", "parameter"))
4492 4626
   
4493 4627
   if (type == "function") {
... ...
@@ -4542,6 +4676,7 @@ deleteIntermediateData <- function(GRN) {
4542 4676
   checkmate::assertClass(GRN, "GRN")
4543 4677
   GRN = .addFunctionLogToObject(GRN)
4544 4678
   
4679
+  GRN = .makeObjectCompatible(GRN)
4545 4680
   
4546 4681
   for (permutationCur in 0:.getMaxPermutation(GRN)) {
4547 4682
     
... ...
@@ -4589,8 +4724,8 @@ getBasic_metadata_visualization <- function(GRN, forceRerun = FALSE) {
4589 4724
       dplyr::mutate(ENSEMBL_ID = gsub("\\..+", "", .data$ENSEMBL_ID, perl = TRUE),
4590 4725
                     baseMean_log = log2(baseMean + 0.01))
4591 4726
     
4592
-    expression_TF.df = dplyr::filter(expression.df, .data$ENSEMBL_ID %in% GRN@data$TFs$translationTable$ENSEMBL) %>%
4593
-      dplyr::left_join(GRN@data$TFs$translationTable, by = c("ENSEMBL_ID" = "ENSEMBL"))
4727
+    expression_TF.df = dplyr::filter(expression.df, .data$ENSEMBL_ID %in% GRN@annotation$TFs$TF.ENSEMBL) %>%
4728
+      dplyr::left_join(GRN@annotation$TFs, by = c("ENSEMBL_ID" = "TF.ENSEMBL"))
4594 4729
     
4595 4730
     meanPeaks.df = tibble::tibble(peakID = getCounts(GRN, type = "peaks", norm = TRUE, permuted = FALSE)$peakID, 
4596 4731
                                   mean = rowMeans(dplyr::select(getCounts(GRN, type = "peaks", norm = TRUE, permuted = FALSE), -.data$peakID))) %>%
... ...
@@ -4617,14 +4752,23 @@ getBasic_metadata_visualization <- function(GRN, forceRerun = FALSE) {
4617 4752
 #' GRN = changeOutputDirectory(GRN, outputDirectory = ".")
4618 4753
 changeOutputDirectory <- function(GRN, outputDirectory = ".") {
4619 4754
   
4620
-  GRN@config$directories$outputRoot   =  outputDirectory
4621
-  GRN@config$directories$output_plots = paste0(outputDirectory, .Platform$file.sep, "plots", .Platform$file.sep)
4622
-  GRN@config$files$output_log         = paste0(outputDirectory, .Platform$file.sep, "GRN.log")
4755
+    start = Sys.time()  
4623 4756
     
4624
-  futile.logger::flog.info(paste0("Output directory changed in the object to " , outputDirectory))
4625
-  
4626
-  
4627
-  GRN
4757
+    checkmate::assertClass(GRN, "GRN")
4758
+    GRN = .addFunctionLogToObject(GRN)
4759
+    
4760
+    GRN = .makeObjectCompatible(GRN)
4761
+    
4762
+    checkmate::assertCharacter(outputDirectory, len = 1, min.chars = 1)
4763
+    
4764
+    GRN@config$directories$outputRoot   = outputDirectory
4765
+    GRN@config$directories$output_plots = paste0(outputDirectory, .Platform$file.sep, "plots", .Platform$file.sep)
4766
+    GRN@config$files$output_log         = paste0(outputDirectory, .Platform$file.sep, "GRN.log")
4767
+    
4768
+    futile.logger::flog.info(paste0("Output directory changed in the object to " , outputDirectory))
4769
+    
4770
+    
4771
+    GRN
4628 4772
   
4629 4773
 }
4630 4774
 
... ...
@@ -4735,3 +4879,249 @@ changeOutputDirectory <- function(GRN, outputDirectory = ".") {
4735 4879
     paste0(names(tbl_freq), " (", tbl_freq, ")")
4736 4880
 }
4737 4881
 
4882
+######## Quantify source of variation ########
4883
+
4884
+#' Quantify and interpret multiple sources of biological and technical variation for features (TFs, peaks, and genes) in a \code{\linkS4class{GRN}} object
4885
+#' 
4886
+#' Runs the main function \code{fitExtractVarPartModel} of the package \code{variancePartition}: Fits a linear (mixed) model to estimate contribution of multiple sources of variation while simultaneously correcting for all other variables for the features in a GRN object (TFs, peaks, and genes) given particular metadata. The function reports the fraction of variance attributable to each metadata variable.
4887
+#' \strong{Note: The results are not added to \code{GRN@connections$all.filtered}, rerun the function \code{\link{getGRNConnections}} and set \code{include_variancePartitionResults} to \code{TRUE} to do so}.
4888
+#' The results object is stored in \code{GRN@stats$variancePartition} and can be used for the various diagnostic and plotting functions from \code{variancePartition}.
4889
+#' 
4890
+#' The normalized count matrices are used as input for \code{fitExtractVarPartModel}. 
4891
+#' 
4892
+#' @template GRN 
4893
+#' @param formula Character(1). Either \code{auto} or a manually defined formula to be used for the model fitting. Default \code{auto}. Must include only terms that are part of the metadata as specified with the \code{metadata} parameter. If set to \code{auto}, the formula will be build automatically based on all metadata variables as specified with the \code{metadata} parameter. By default, numerical variables will be modeled as fixed effects, while variables that are defined as factors or can be converted to factors (characters and logical variables) are modeled as random effects as recommended by the \code{variancePartition} package.
4894
+#' @param metadata Character vector. Default \code{all}. Vector of column names from the metadata data frame that was provided when using the function 
4895
+#' \code{\link{addData}}. Must either contain the special keyword \code{all} to denote that all (!) metadata columns from \code{GRN@data$metadata} are taken
4896
+#' or if not, a subset of the column names from \code{GRN@data$metadata}to include in the model fitting for \code{fitExtractVarPartModel}..
4897
+#' @param features Character(1). Either \code{all_filtered} or \code{all}. Default \code{all_filtered}. Should \code{variancePartition} only be run for the features (TFs, peaks and genes) from the filtered set of connections (the result of \code{\link{filterGRNAndConnectGenes}}) or for all genes that are defined in the object? If set to \code{all}, the running time is greatly increased.
4898
+#' @template nCores
4899
+#' @template forceRerun
4900
+#' @param ... Additional parameters passed on to \code{variancePartition::fitExtractVarPartModel} beyond \code{exprObj}, \code{formula} and \code{data}. See the function help for more information
4901
+# #' @seealso \code{\link{plotDiagnosticPlots_featureVariation}}
4902
+#' @seealso \code{\link{addData}}
4903
+#' @seealso \code{\link{getGRNConnections}}
4904
+#' @return An updated \code{\linkS4class{GRN}} object, with additional information added from this function to \code{GRN@stats$variancePartition} as well as the elements \code{genes}, \code{consensusPeaks} and \code{TFs} within \code{GRN@annotation}. 
4905
+#' As noted above, the results are not added to \code{GRN@connections$all.filtered}; rerun the function \code{\link{getGRNConnections}} and set \code{include_variancePartitionResults} to \code{TRUE} to include the results in the eGRN output table.
4906
+#' @examples 
4907
+#' # See the Workflow vignette on the GRaNIE website for examples
4908
+#' GRN = loadExampleObject()
4909
+#' GRN = add_featureVariation(GRN, metadata = c("mt_frac"), forceRerun = TRUE)
4910
+#' @export
4911
+add_featureVariation <- function (GRN, 
4912
+                                    formula = "auto", 
4913
+                                    metadata = c("all"),
4914
+                                    features = "all_filtered", 
4915
+                                    nCores = 1,
4916
+                                    forceRerun = FALSE, 
4917
+                                    ...) {
4918
+    
4919
+    start = Sys.time()  
4920
+    
4921
+    checkmate::assertClass(GRN, "GRN")
4922
+    GRN = .addFunctionLogToObject(GRN)
4923
+    
4924
+    GRN = .makeObjectCompatible(GRN)
4925
+    
4926
+    checkmate::assertCharacter(formula, min.chars = 2, len = 1)
4927
+    
4928
+    packageMessage = paste0("The package variancePartition is not installed but required for this function.")
4929
+    .checkPackageInstallation("variancePartition", packageMessage, isWarning = FALSE)
4930
+
4931
+    if (is.null(GRN@data$metadata)) {
4932
+        message = paste0("No metadata was found (GRN@data$metadata is NULL), cannot run variancePartition without metadata. Reren the addData function and provide metadata.")
4933
+        
4934
+        .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)    
4935
+    }
4936
+    checkmate::assert(checkmate::checkChoice(metadata, "all"), checkmate::checkSubset(metadata, colnames(GRN@data$metadata)))
4937
+
4938
+    checkmate::assertChoice(features, c("all_filtered", "all"))
4939
+    checkmate::assertIntegerish(nCores, lower = 1)
4940
+    checkmate::assertFlag(forceRerun)
4941
+    
4942
+    if (is.null(GRN@stats$variancePartition$RNA) | is.null(GRN@stats$variancePartition$peaks) | forceRerun) {
4943
+        
4944
+        if (features == "all_filtered") {
4945
+            .checkExistanceFilteredConnections(GRN)
4946
+        }
4947
+        
4948
+        # Prepare the metadata
4949
+        if ("all" %in% metadata) {
4950
+            columnsToSelect = colnames(GRN@data$metadata)
4951
+        } else {
4952
+            columnsToSelect = unique(c(metadata, "has_both"))
4953
+        }
4954
+        meta <- GRN@data$metadata %>% 
4955
+            dplyr::filter(.data$has_both == TRUE) %>%
4956
+            dplyr::select(tidyselect::one_of(columnsToSelect)) %>%
4957
+            dplyr::mutate_if(is.character, as.factor) %>%
4958
+            dplyr::mutate_if(is.logical, as.factor) %>%
4959
+            dplyr::select(-.data$has_both)
4960
+        
4961
+        # Remove factors with only one level
4962
+        coltypes = meta %>% dplyr::summarise_all(class)
4963
+        factorVariables  = which(coltypes[1,] == "factor")
4964
+        numericVariables = which(coltypes[1,] == "numeric")
4965
+        factorVariablesNames = colnames(coltypes)[factorVariables]
4966
+        numericVariablesNames = colnames(coltypes)[numericVariables]
4967
+
4968
+        nLevels = sapply(factorVariables, function(x) {nlevels(dplyr::pull(meta, colnames(coltypes)[x]))})
4969
+        nLevelOne = which(nLevels == 1)
4970
+        if (length(nLevelOne) > 0) {
4971
+            meta = dplyr::select(meta, -tidyselect::one_of(factorVariablesNames[nLevelOne]))
4972
+            factorVariablesNames = intersect(colnames(meta), factorVariablesNames)
4973
+        }
4974
+        
4975
+        if (formula == "auto") {
4976
+            
4977
+            futile.logger::flog.info(paste0("Due to formula = \"auto\", all factors will be modeled as random effects and all numerical variables as fixed effects, as recommended in the variancePartition vignette."))
4978
+            
4979
+            # See https://bioconductor.org/packages/release/bioc/vignettes/variancePartition/inst/doc/variancePartition.pdf for details
4980
+            # Factor variables are modeled as random effect
4981
+            if (length(factorVariablesNames) > 0) {
4982
+                formulaElems = paste0("(1|", factorVariablesNames, ")")
4983
+            } else {
4984
+                formulaElems  =""
4985
+            }
4986
+            
4987
+            if (length(numericVariablesNames) > 0) {
4988
+                if (length(factorVariablesNames) > 0) {
4989
+                    separator = " + " 
4990
+                } else {
4991
+                    separator = ""
4992
+                }
4993
+                # Numeric variables are modeled as fixed effect
4994
+                formulaElems2 = paste0(numericVariablesNames)
4995
+                
4996
+                message = paste0("Make sure that all variables from the metadata that are numeric are indeed really numeric and not (hidden) factors. The following variables will be treated as numeric: ", paste0(numericVariablesNames, collapse = ",", ". If one of these is in fact a factor, change the type in GRN@data$metadata and re-run, or provide the design formula manually"))
4997
+                .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE)
4998
+                
4999
+            } else {
5000
+                separator = ""
5001
+                formulaElems2 = c()
5002
+            }
5003
+            
5004
+            designFormula <- stats::as.formula(paste0("~ ", paste0(formulaElems, collapse = " + "), separator, paste0(formulaElems2, collapse = " + "))) 
5005
+            
5006
+        } else {
5007
+            
5008
+            message = paste0("A custom formula has been provides. This is currently not being checked for correctness and therefore may reuslt in errors when running variancePartition. In that case, make sure the provided formula is correct.")
5009
+            .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE)
5010
+            designFormula = formula
5011
+        }
5012
+        
5013
+        futile.logger::flog.info(paste0("The following formula will be used for model fitting: \"", deparse1(designFormula), "\". Make sure this is appropriate."))
5014
+        
5015
+        
5016
+        # RNA and ATAC norm counts
5017
+        if (features == "all_filtered") {
5018
+            genesToInclude = unique(c(GRN@connections$all.filtered[["0"]]$TF.ENSEMBL, GRN@connections$all.filtered[["0"]]$gene.ENSEMBL))
5019
+            peaksToInclude = unique(GRN@connections$all.filtered[["0"]]$peak.ID)
5020
+            
5021
+            futile.logger::flog.info(paste0("Using ", length(peaksToInclude), " peaks and ", length(genesToInclude), " TF/genes from the filtered connections."))
5022
+            
5023
+        } else if (features == "all") {
5024
+            genesToInclude = GRN@data$RNA$counts_norm.l[["0"]]$ENSEMBL
5025
+            peaksToInclude = GRN@data$peaks$counts_norm$peakID
5026
+            
5027
+            futile.logger::flog.info(paste0("Using all ", length(peaksToInclude), " peaks and ", length(genesToInclude), " TF/genes. This may take a long time. Consider setting features = \"all_filtered\"."))
5028
+        }
5029
+        
5030
+        data.l = list()
5031
+        data.l[["RNA"]] <- GRN@data$RNA$counts_norm.l[["0"]] %>% 
5032
+            dplyr::filter(!.data$isFiltered, .data$ENSEMBL %in% genesToInclude) %>%
5033
+            dplyr::select(-.data$isFiltered) %>% 
5034
+            tibble::column_to_rownames(var = "ENSEMBL")
5035
+        
5036
+        data.l[["peaks"]] <- GRN@data$peaks$counts_norm %>% 
5037
+            dplyr::filter(!.data$isFiltered, .data$peakID %in% peaksToInclude) %>%
5038
+            dplyr::select(-.data$isFiltered) %>%
5039
+            tibble::column_to_rownames(var = "peakID")  
5040
+        
5041
+        
5042
+        for (dataType in c("RNA", "peaks")) {
5043
+            
5044
+            # row_order <- matrixStats::rowVars(as.matrix(data.l[[dataType]]) ) %>% order(decreasing = T)
5045
+            # data_set <- data.l[[dataType]][row_order,]
5046
+            
5047
+            # fit model
5048
+            futile.logger::flog.info(paste0("Running variancePartition and fit models for data type ", dataType, " using ", nCores, " core(s). This may take a while."))
5049
+            varPart <- variancePartition::fitExtractVarPartModel(exprObj = data.l[[dataType]], 
5050
+                                                                 formula = designFormula, 
5051
+                                                                 data = meta, 
5052
+                                                                 BPPARAM = .initBiocParallel(nCores),
5053
+                                                                 ...)
5054
+            GRN@stats$variancePartition[[dataType]]  = varPart
5055
+            
5056
+            res = as.data.frame(GRN@stats$variancePartition[[dataType]]) %>%
5057
+                dplyr::rename_all( .funs = ~ paste0("variancePartition_", .x)) %>%
5058
+                tibble::as_tibble(rownames = "ID")
5059
+            
5060
+            # Update annotation slots
5061
+            if (dataType == "RNA") {
5062
+                
5063
+                colnames(res)[2:ncol(res)] = paste0("gene.", colnames(res)[2:ncol(res)])
5064
+                
5065
+                GRN@annotation$genes = GRN@annotation$genes %>%
5066
+                    dplyr::select(-tidyselect::starts_with("gene.variancePart")) %>%
5067
+                    dplyr::left_join(res, by = c("gene.ENSEMBL" = "ID"))
5068
+                
5069
+                colnames(res) = gsub("gene.", "TF.", colnames(res))
5070
+                
5071
+                GRN@annotation$TFs = GRN@annotation$TFs %>%
5072
+                    dplyr::select(-tidyselect::starts_with("TF.variancePart")) %>%
5073
+                    dplyr::left_join(res, by = c("TF.ENSEMBL" = "ID"))
5074
+                
5075
+                
5076
+            } else {
5077
+                
5078
+                colnames(res)[2:ncol(res)] = paste0("peak.", colnames(res)[2:ncol(res)])
5079
+                
5080
+                GRN@annotation$consensusPeaks = GRN@annotation$consensusPeaks %>%
5081
+                    dplyr::select(-tidyselect::starts_with("variancePart")) %>%
5082
+                    dplyr::left_join(res, by = c("peak.ID" = "ID"))
5083
+            }
5084
+            
5085
+        }
5086
+        
5087
+        futile.logger::flog.info(paste0("The result objects have been stored in GRN@stats$variancePartition for both RNA and peaks."))
5088
+        
5089
+    } else {
5090
+        futile.logger::flog.info(paste0("Data already exists in object, nothing to do due to forceRerun = FALSE"))
5091
+    }
5092
+    
5093
+    
5094
+    .printExecutionTime(start, prefix = "")
5095
+    
5096
+    GRN
5097
+    
5098
+}
5099
+
5100
+
5101
+.checkExistanceFilteredConnections <- function(GRN) {
5102
+    
5103
+    if (is.null(GRN@connections$all.filtered[["0"]])) {
5104
+        message = "Could not find filtered connections (the slot GRN@connections$all.filtered[[\"0\"]] is NULL). Run the function filterGRNAndConnectGenes."
5105
+        .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
5106
+    }
5107
+}
5108
+
5109
+.makeObjectCompatible <- function(GRN) {
5110
+    
5111
+    if (is.null(GRN@annotation$TFs)) {
5112
+        GRN@annotation$TFs = GRN@data$TFs$translationTable
5113
+        GRN@data$TFs$translationTable = NULL
5114
+    }
5115
+    
5116
+    
5117
+    # Temporary fix: Replace lincRNA with lncRNA due to a recent change in biomaRt until we update the object directly
5118
+    if ("lncRNA" %in% levels(GRN@annotation$genes$gene.type)) {
5119
+        GRN@annotation$genes = dplyr::mutate(GRN@annotation$genes, gene.type = dplyr::recode_factor(.data$gene.type, lncRNA = "lincRNA")) 
5120
+    }
5121
+   
5122
+    
5123
+    
5124
+    GRN
5125
+}
5126
+
5127
+  
4738 5128
\ No newline at end of file
... ...
@@ -21,6 +21,9 @@ build_eGRN_graph <- function(GRN, model_TF_gene_nodes_separately = FALSE,
21 21
   start = Sys.time()  
22 22
   checkmate::assertClass(GRN, "GRN")
23 23
   GRN = .addFunctionLogToObject(GRN)
24
+  
25
+  GRN = .makeObjectCompatible(GRN)
26
+  
24 27
   checkmate::assertFlag(model_TF_gene_nodes_separately)
25 28
   checkmate::assertFlag(allowLoops)
26 29
   checkmate::assertFlag(removeMultiple)
... ...
@@ -234,8 +237,12 @@ performAllNetworkAnalyses <- function(GRN, ontology = c("GO_BP", "GO_MF"),
234 237
                                       forceRerun = FALSE) {
235 238
   
236 239
   start = Sys.time()
240
+  
241
+  checkmate::assertClass(GRN, "GRN")
237 242
   GRN = .addFunctionLogToObject(GRN)
238 243
   
244
+  GRN = .makeObjectCompatible(GRN)
245
+  
239 246
   GRN = build_eGRN_graph(GRN, model_TF_gene_nodes_separately = FALSE, allowLoops = FALSE, directed = FALSE, removeMultiple = FALSE)
240 247
   
241 248
   GRN = plotGeneralGraphStats(GRN, outputFolder = outputFolder, forceRerun = forceRerun) 
... ...
@@ -362,6 +369,8 @@ calculateGeneralEnrichment <- function(GRN, ontology = c("GO_BP", "GO_MF"),
362 369
   checkmate::assertClass(GRN, "GRN")
363 370
   GRN = .addFunctionLogToObject(GRN)
364 371
   
372
+  GRN = .makeObjectCompatible(GRN)
373
+  
365 374
   checkmate::assertSubset(ontology , c("GO_BP", "GO_MF", "GO_CC", "KEGG", "DO", "Reactome"), empty.ok = FALSE)
366 375
   
367 376
   .checkPackage_topGO_and_arguments(ontology, algorithm, statistic)
... ...
@@ -490,7 +499,7 @@ calculateGeneralEnrichment <- function(GRN, ontology = c("GO_BP", "GO_MF"),
490 499
 }
491 500
 
492 501
 #' @importFrom biomaRt useEnsembl getBM
493
-.runEnrichment <- function(GRN, foreground, background, backgroundStr, database = "GO", ontology, 
502
+.runEnrichment <- function(GRN, foreground, background, backgroundStr, ontology, 
494 503
                            description = "Enrichment Analysis",
495 504
                            algorithm="weight01", statistic = "fisher", mapping, pAdjustMethod = "BH", minGSSize = 0, maxGSSize = 5000){
496 505
   
... ...
@@ -509,7 +518,7 @@ calculateGeneralEnrichment <- function(GRN, ontology = c("GO_BP", "GO_MF"),
509 518
   
510 519
   
511 520
   if (ontology %in% c("KEGG", "DO", "Reactome")) {
512
-    # the ENSEMBL IDs will need to be mapped to Enntrez IDs for these ontologies
521
+    # the ENSEMBL IDs will need to be mapped to Entrez IDs for these ontologies
513 522
     
514 523
     if (statistic != "fisher") {
515 524
       statistic = "fisher"
... ...
@@ -759,7 +768,8 @@ calculateGeneralEnrichment <- function(GRN, ontology = c("GO_BP", "GO_MF"),
759 768
 #' @param clustering Character. Default \code{louvain}. One of: \code{louvain}, \code{leiden}, \code{leading_eigen}, \code{fast_greedy}, \code{optimal}, \code{walktrap}. The community detection algorithm to be used. Please bear in mind the robustness and time consumption of the algorithms when opting for an alternative to the default. 
760 769
 #' @param ... Additional parameters for the used clustering method, see the \code{igraph::cluster_*} methods for details on the specific parameters and what they do. For \code{leiden} clustering, for example, you may add a \code{resolution_parameter} to control the granularity of the community detection or \code{n_iterations} to modify the number of iterations.
761 770
 #' @template forceRerun
762
-#' @return An updated \code{\linkS4class{GRN}} object, with a table that consists of the connections clustered into communities stored in the \code{stats$communities} slot.
771
+#' @return An updated \code{\linkS4class{GRN}} object, with a table that consists of the connections clustered into communities stored in the 
772
+#' \code{GRN@graph$TF_gene$clusterGraph} slot as well as within the \code{igraph} object in \code{GRN@graph$TF_gene$graph} (retrievable via \code{igraph} using \code{igraph::vertex.attributes(GRN@graph$TF_gene$graph)$community}, for example.)
763 773
 #' @seealso \code{\link{plotCommunitiesStats}}
764 774
 #' @seealso \code{\link{calculateCommunitiesEnrichment}}
765 775
 #' @import patchwork
... ...
@@ -774,6 +784,7 @@ calculateCommunitiesStats <- function(GRN, clustering = "louvain", forceRerun =
774 784
   checkmate::assertClass(GRN, "GRN")
775 785
   GRN = .addFunctionLogToObject(GRN)
776 786
   
787
+  GRN = .makeObjectCompatible(GRN)
777 788
   
778 789
   checkmate::assertChoice(clustering, c("louvain", "leading_eigen", "fast_greedy", "optimal", "walktrap", "leiden"))
779 790
   checkmate::assertFlag(forceRerun)
... ...
@@ -850,7 +861,8 @@ calculateCommunitiesStats <- function(GRN, clustering = "louvain", forceRerun =
850 861
 #' 
851 862
 #' @inheritParams calculateGeneralEnrichment
852 863
 #' @param selection Character. Default \code{"byRank"}. One of: \code{"byRank"}, \code{"byLabel"}. Specify whether the communities enrichment will by calculated based on their rank, where the largest community (with most vertices) would have a rank of 1, or by their label. Note that the label is independent of the rank.
853
-#' @param communities Numeric vector. Default \code{c(1:10)}. Depending on what was specified in the \code{display} parameter, this parameter would indicate either the rank or the label of the communities to be plotted. i.e. for \code{communities = c(1,4)}, if \code{display = "byRank"} the GO enrichment for the first and fourth largest communities will be calculated if \code{display = "byLabel"}, the results for the communities labeled "1", and "4" will be plotted.
864
+#' @param communities Numeric vector. Default \code{c(1:10)}. Depending on what was specified in the \code{display} parameter, 
865
+#' this parameter would indicate either the rank or the label of the communities to be plotted. i.e. for \code{communities = c(1,4)}, if \code{display = "byRank"} the GO enrichment for the first and fourth largest communities will be calculated if \code{display = "byLabel"}, the results for the communities labeled "1", and "4" will be plotted.
854 866
 #' @return An updated \code{\linkS4class{GRN}} object, with the enrichment results stored in the \code{stats$Enrichment$byCommunity} slot.
855 867
 #' @seealso \code{\link{plotCommunitiesEnrichment}}
856 868
 #' @seealso \code{\link{plotGeneralEnrichment}}
... ...
@@ -874,6 +886,7 @@ calculateCommunitiesEnrichment <- function(GRN,
874 886
   checkmate::assertClass(GRN, "GRN")
875 887
   GRN = .addFunctionLogToObject(GRN)
876 888
   
889
+  GRN = .makeObjectCompatible(GRN)
877 890
 
878 891
   checkmate::assertSubset(ontology , c("GO_BP", "GO_MF", "GO_CC", "KEGG", "DO", "Reactome"), empty.ok = FALSE)
879 892
  
... ...
@@ -981,7 +994,7 @@ calculateCommunitiesEnrichment <- function(GRN,
981 994
   
982 995
   if (length(selCommunities) == 0) {
983 996
       existingCommunities = unique(df$community)
984
-      message = paste0("None of the requested communities (", paste0(communities, collapse = ",", ") were found. Only the following communities are available: ", paste0(existingCommunities, collapse = ",")))
997
+      message = paste0("None of the requested communities (", paste0(communities, collapse = ","), ") were found. Only the following communities are available: ", paste0(existingCommunities, collapse = ","))
985 998
       .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
986 999
   }
987 1000
   
... ...
@@ -1012,6 +1025,9 @@ getTopNodes <- function(GRN, nodeType, rankType, n = 0.1, use_TF_gene_network =
1012 1025
   start = Sys.time()
1013 1026
   checkmate::assertClass(GRN, "GRN")
1014 1027
   GRN = .addFunctionLogToObject(GRN)
1028
+  
1029
+  GRN = .makeObjectCompatible(GRN)
1030
+  
1015 1031
   checkmate::assertChoice(nodeType, c("gene", "TF"))
1016 1032
   checkmate::assertChoice(rankType, c("degree", "EV"))
1017 1033
   checkmate::assertFlag(use_TF_gene_network)
... ...
@@ -1117,6 +1133,8 @@ calculateTFEnrichment <- function(GRN, rankType = "degree", n = 3, TF.names = NU
1117 1133
   checkmate::assertClass(GRN, "GRN")
1118 1134
   GRN = .addFunctionLogToObject(GRN)
1119 1135
   
1136
+  GRN = .makeObjectCompatible(GRN)
1137
+  
1120 1138
   checkmate::assertChoice(rankType, c("degree", "EV", "custom"))
1121 1139
   checkmate::assert(checkmate::checkNumeric(n, lower = 0.0001, upper = 0.999999), checkmate::checkIntegerish(n, lower = 1))
1122 1140
   checkmate::assertSubset(ontology , c("GO_BP", "GO_MF", "GO_CC", "KEGG", "Reactome", "DO"), empty.ok = FALSE)
... ...
@@ -1133,10 +1151,14 @@ calculateTFEnrichment <- function(GRN, rankType = "degree", n = 3, TF.names = NU
1133 1151
       futile.logger::flog.error("To calculate the GO enrichment for a custom set of TFs, you must provide the TF names in the 'TF.names' parameter.")
1134 1152
     }
1135 1153
     wrongTFs = setdiff(TF.names, unique(GRN@connections$all.filtered$`0`$TF.name))
1136
-    if (length(wrongTFs)>0){
1137
-      futile.logger::flog.warn(paste0("The following TFs are not in the filtered GRN and will be ommited from the analysis: ",  paste0(wrongTFs, collapse = ", ")))
1138
-    }
1154
+
1139 1155
     TFset = setdiff(TF.names, wrongTFs) 
1156
+    
1157
+    if (length(wrongTFs) > 0 & length(TFset) > 0){
1158
+        message = paste0("The following TFs are not in the filtered GRN and will be ommited from the analysis: ",  paste0(wrongTFs, collapse = ", "))
1159
+        .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE)
1160
+    }
1161
+    
1140 1162
   } else{
1141 1163
     
1142 1164
     # TF.name is always there, irrespective of whether ENSEMBL ID or TF name is used as primary ID type
... ...
@@ -1158,7 +1180,7 @@ calculateTFEnrichment <- function(GRN, rankType = "degree", n = 3, TF.names = NU
1158 1180
     
1159 1181
   } else {
1160 1182
     message = paste0("No TF fulfills the chosen criteria. Try increasing the value of the parameter n")
1161
-    .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE)
1183
+    .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
1162 1184
   }
1163 1185
   
1164 1186
   for (TF in as.character(TFset)){
... ...
@@ -33,6 +33,8 @@ plotPCA_all <- function(GRN, outputFolder = NULL, basenameOutput = NULL,
33 33
   checkmate::assertClass(GRN, "GRN")
34 34
   GRN = .addFunctionLogToObject(GRN)
35 35
   
36
+  GRN = .makeObjectCompatible(GRN)
37
+  
36 38
   checkmate::assertSubset(data , c("rna", "peaks", "all"), empty.ok = FALSE)
37 39
   checkmate::assertSubset(type , c("raw", "normalized", "all"), empty.ok = FALSE)
38 40
   checkmate::assertFlag(plotAsPDF)
... ...
@@ -337,7 +339,7 @@ plotPCA_all <- function(GRN, outputFolder = NULL, basenameOutput = NULL,
337 339
     
338 340
   }
339 341
   
340
-  .printWarningPageNumber(pages, pageCounter)
342
+  .checkPageNumberValidity(pages, pageCounter)
341 343
   if (plotAsPDF) grDevices::dev.off()
342 344
   
343 345
 }
... ...
@@ -516,6 +518,8 @@ plotDiagnosticPlots_TFPeaks <- function(GRN,
516 518
   checkmate::assertClass(GRN, "GRN")
517 519
   GRN = .addFunctionLogToObject(GRN)
518 520
   
521
+  GRN = .makeObjectCompatible(GRN)
522
+  
519 523
   checkmate::assert(checkmate::checkNull(outputFolder), checkmate::checkCharacter(outputFolder, min.chars = 1))
520 524
   checkmate::assert(checkmate::checkNull(basenameOutput), checkmate::checkCharacter(basenameOutput, len = 1, min.chars = 1, any.missing = FALSE))
521 525
   checkmate::assertFlag(plotDetails)
... ...
@@ -857,7 +861,7 @@ plotDiagnosticPlots_TFPeaks <- function(GRN,
857 861
     
858 862
   } # end allTF
859 863
   
860
-  .printWarningPageNumber(pages, pageCounter)
864
+  .checkPageNumberValidity(pages, pageCounter)
861 865
   
862 866
   if (!is.null(file)) dev.off()
863 867
   
... ...
@@ -908,6 +912,8 @@ plotDiagnosticPlots_peakGene <- function(GRN,
908 912
   checkmate::assertClass(GRN, "GRN")
909 913
   GRN = .addFunctionLogToObject(GRN)
910 914
   
915
+  GRN = .makeObjectCompatible(GRN)
916
+  
911 917
   checkmate::assert(checkmate::checkNull(outputFolder), checkmate::checkCharacter(outputFolder, min.chars = 1))
912 918
   checkmate::assert(checkmate::checkNull(basenameOutput), checkmate::checkCharacter(basenameOutput, len = 1, min.chars = 1, any.missing = FALSE))
913 919
   checkmate::assertList(gene.types, any.missing = FALSE, min.len = 1, types = "character")
... ...
@@ -1553,7 +1559,7 @@ plotDiagnosticPlots_peakGene <- function(GRN,
1553 1559
       }
1554 1560
       pageCounter = pageCounter + 1
1555 1561
       
1556
-      .printWarningPageNumber(pages, pageCounter)
1562
+      .checkPageNumberValidity(pages, pageCounter)
1557 1563
       if (!is.null(fileBase)) {
1558 1564
         grDevices::dev.off()
1559 1565
       }
... ...
@@ -1612,6 +1618,8 @@ plot_stats_connectionSummary <- function(GRN, type = "heatmap",
1612 1618
   checkmate::assertClass(GRN, "GRN")
1613 1619
   GRN = .addFunctionLogToObject(GRN)
1614 1620
   
1621
+  GRN = .makeObjectCompatible(GRN)
1622
+  
1615 1623
   
1616 1624
   checkmate::assertChoice(type, c("heatmap", "boxplot", "density"))
1617 1625
   checkmate::assertFlag(plotAsPDF)
... ...
@@ -1764,7 +1772,7 @@ plot_stats_connectionSummary <- function(GRN, type = "heatmap",
1764 1772
       }
1765 1773
     }
1766 1774
     
1767
-    .printWarningPageNumber(pages, pageCounter)
1775
+    .checkPageNumberValidity(pages, pageCounter)
1768 1776
     if (!is.null(file)) {
1769 1777
       grDevices::dev.off()
1770 1778
     }
... ...
@@ -1911,7 +1919,7 @@ plot_stats_connectionSummary <- function(GRN, type = "heatmap",
1911 1919
       
1912 1920
     } # end for TF_peak.fdr_cur
1913 1921
     
1914
-    .printWarningPageNumber(pages, pageCounter)
1922
+    .checkPageNumberValidity(pages, pageCounter)
1915 1923
     if (!is.null(file)) {
1916 1924
       grDevices::dev.off()
1917 1925
     }
... ...
@@ -1954,6 +1962,8 @@ plotGeneralGraphStats <- function(GRN, outputFolder = NULL, basenameOutput = NUL
1954 1962
   checkmate::assertClass(GRN, "GRN")
1955 1963
   GRN = .addFunctionLogToObject(GRN)
1956 1964
   
1965
+  GRN = .makeObjectCompatible(GRN)
1966
+  
1957 1967
   checkmate::assertFlag(plotAsPDF)
1958 1968
   checkmate::assertNumeric(pdf_width, lower = 5, upper = 99)
1959 1969
   checkmate::assertNumeric(pdf_height, lower = 5, upper = 99)
... ...
@@ -2113,7 +2123,7 @@ plotGeneralGraphStats <- function(GRN, outputFolder = NULL, basenameOutput = NUL
2113 2123
       
2114 2124
     }
2115 2125
     
2116
-    .printWarningPageNumber(pages, pageCounter)
2126
+    .checkPageNumberValidity(pages, pageCounter)
2117 2127
     
2118 2128
     if (plotAsPDF) {grDevices::dev.off()}
2119 2129
     
... ...
@@ -2127,11 +2137,11 @@ plotGeneralGraphStats <- function(GRN, outputFolder = NULL, basenameOutput = NUL
2127 2137
   
2128 2138
 }
2129 2139
 
2130
-.printWarningPageNumber <- function(pages, pageCounter) {
2140
+.checkPageNumberValidity <- function(pages, pageCounter) {
2131 2141
   
2132 2142
   if(any(pages > pageCounter)) {
2133
-    message = paste0("At least one page could not be plotted because the total number of plots is only ", pageCounter, " while a larger page number has been requested")
2134
-    .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE)
2143
+    message = paste0("At least one page could not be plotted because the total number of plots is only ", pageCounter, " while a larger page number has been requested. To fix this, re-adjust the page number(s) and execute the function again.")
2144
+    .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
2135 2145
   }
2136 2146
 }
2137 2147
 
... ...
@@ -2172,6 +2182,8 @@ plotGeneralEnrichment <- function(GRN, outputFolder = NULL, basenameOutput = NUL
2172 2182
   checkmate::assertClass(GRN, "GRN")
2173 2183
   GRN = .addFunctionLogToObject(GRN)
2174 2184
   
2185
+  GRN = .makeObjectCompatible(GRN)
2186
+  
2175 2187
   
2176 2188
   ontologiesFound = names(GRN@stats$Enrichment$general)
2177 2189
   
... ...
@@ -2230,7 +2242,7 @@ plotGeneralEnrichment <- function(GRN, outputFolder = NULL, basenameOutput = NUL
2230 2242
       
2231 2243
     }
2232 2244
     
2233
-    .printWarningPageNumber(pages, pageCounter)
2245
+    .checkPageNumberValidity(pages, pageCounter)
2234 2246
     if (plotAsPDF) grDevices::dev.off()
2235 2247
     
2236 2248
   } else {
... ...
@@ -2349,6 +2361,7 @@ plotCommunitiesStats <- function(GRN, outputFolder = NULL, basenameOutput = NULL
2349 2361
   checkmate::assertClass(GRN, "GRN")
2350 2362
   GRN = .addFunctionLogToObject(GRN)
2351 2363
   
2364
+  GRN = .makeObjectCompatible(GRN)
2352 2365
   
2353 2366
   checkmate::assert(checkmate::checkNull(outputFolder), checkmate::checkCharacter(outputFolder, min.chars = 1))
2354 2367
   checkmate::assert(checkmate::checkNull(basenameOutput), checkmate::checkCharacter(basenameOutput, len = 1, min.chars = 1, any.missing = FALSE))
... ...
@@ -2505,7 +2518,7 @@ plotCommunitiesStats <- function(GRN, outputFolder = NULL, basenameOutput = NULL
2505 2518
       
2506 2519
     }
2507 2520
     
2508
-    .printWarningPageNumber(pages, pageCounter)
2521
+    .checkPageNumberValidity(pages, pageCounter)
2509 2522
     if (plotAsPDF) grDevices::dev.off()
2510 2523
     
2511 2524
   } else {
... ...
@@ -2522,13 +2535,13 @@ plotCommunitiesStats <- function(GRN, outputFolder = NULL, basenameOutput = NULL
2522 2535
 #' Similarly to \code{\link{plotGeneralEnrichment}} and \code{\link{plotTFEnrichment}}, the results of the community-based enrichment analysis are plotted.
2523 2536
 #' This function produces multiple plots. First, one plot per community to summarize the community-specific enrichment.
2524 2537
 #' Second, a summary heatmap of all significantly enriched terms across all communities and for the whole eGRN. The latter allows to compare the results with the general network enrichment.
2525
-#' #' Third, a subset of the aforementioned heatmap, showing only the top most significantly enriched terms per community and for the whole eGRN (as specified by \code{nID}) for improved visibility 
2538
+#' Third, a subset of the aforementioned heatmap, showing only the top most significantly enriched terms per community and for the whole eGRN (as specified by \code{nID}) for improved visibility 
2526 2539
 #' 
2527 2540
 #' @inheritParams plotGeneralEnrichment
2528 2541
 #' @param display Character. Default \code{"byRank"}. One of: \code{"byRank"}, \code{"byLabel"}. Specify whether the communities will be displayed based on their rank, where the largest community (with most vertices) would have a rank of 1, or by their label. Note that the label is independent of the rank.
2529 2542
 #' @param communities \code{NULL} or numeric vector. Default \code{NULL}. If set to \code{NULL}, the default, all communities enrichments that have been calculated before are plotted. If a numeric vector is specified: Depending on what was specified in the \code{display} parameter, this parameter indicates either the rank or the label of the communities to be plotted. i.e. for \code{communities = c(1,4)}, if \code{display = "byRank"} the results for the first and fourth largest communities are plotted. if \code{display = "byLabel"}, the results for the communities labeled \code{"1"}, and \code{"4"} are plotted. 
2530
-#' @param nSignificant Numeric. Default 3. Threshold to filter out an ontology term with less than \code{nSignificant} overlapping genes. 
2531
-#' @param nID Numeric. Default 10. For the reduced summary heatmap, number of top terms to select per community / for the general enrichment.
2543
+#' @param nSignificant Numeric >0. Default 3. Threshold to filter out an ontology term with less than \code{nSignificant} overlapping genes. 
2544
+#' @param nID Numeric >0. Default 10. For the reduced summary heatmap, number of top terms to select per community / for the general enrichment.
2532 2545
 #' @template maxWidth_nchar_plot
2533 2546
 #' @return  The same \code{\linkS4class{GRN}} object, without modifications.
2534 2547
 #' @seealso \code{\link{plotGeneralEnrichment}}
... ...
@@ -2551,6 +2564,8 @@ plotCommunitiesEnrichment <- function(GRN, outputFolder = NULL, basenameOutput =
2551 2564
   checkmate::assertClass(GRN, "GRN")
2552 2565
   GRN = .addFunctionLogToObject(GRN)
2553 2566
   
2567
+  GRN = .makeObjectCompatible(GRN)
2568
+  
2554 2569
   checkmate::assertChoice(display , c("byRank", "byLabel"))
2555 2570
   checkmate::assert(checkmate::checkNull(communities), checkmate::checkNumeric(communities, lower = 1, any.missing = FALSE, min.len = 1))
2556 2571
   checkmate::assertNumeric(p, lower = 0, upper = 1)
... ...
@@ -2791,7 +2806,7 @@ plotCommunitiesEnrichment <- function(GRN, outputFolder = NULL, basenameOutput =
2791 2806
     }
2792 2807
     
2793 2808
     
2794
-    .printWarningPageNumber(pages, pageCounter)
2809
+    .checkPageNumberValidity(pages, pageCounter)
2795 2810
     if (plotAsPDF) grDevices::dev.off()
2796 2811
     
2797 2812
   } else {
... ...
@@ -2895,6 +2910,8 @@ plotTFEnrichment <- function(GRN, rankType = "degree", n = NULL, TF.names = NULL
2895 2910
   start = Sys.time()
2896 2911
   checkmate::assertClass(GRN, "GRN")
2897 2912
   GRN = .addFunctionLogToObject(GRN)
2913
+  
2914
+  GRN = .makeObjectCompatible(GRN)
2898 2915
 
2899 2916
   checkmate::assertChoice(rankType, c("degree", "EV", "custom"))
2900 2917
   checkmate::assert(checkmate::checkNull(n), checkmate::checkNumeric(n))
... ...
@@ -2958,10 +2975,10 @@ plotTFEnrichment <- function(GRN, rankType = "degree", n = NULL, TF.names = NULL
2958 2975
     vertexMetadata = as.data.frame(igraph::vertex.attributes(GRN@graph$TF_gene$graph))
2959 2976
     nodeDegree = igraph::degree(GRN@graph$TF_gene$graph)
2960 2977
     
2961
-    # nodeDegree_TFset = sort(nodeDegree[GRN@data$TFs$translationTable %>% dplyr::filter(TF.name %in% as.character(TFset)) %>% 
2978
+    # nodeDegree_TFset = sort(nodeDegree[GRN@annotation$TFs %>% dplyr::filter(TF.name %in% as.character(TFset)) %>% 
2962 2979
     #                                      dplyr::pull(TF.ENSEMBL)], decreasing = TRUE)
2963 2980
     
2964
-    nodeDegree_TFset = dplyr::left_join(GRN@data$TFs$translationTable, 
2981
+    nodeDegree_TFset = dplyr::left_join(GRN@annotation$TFs, 
2965 2982
                                         as.data.frame(nodeDegree) %>% tibble::rownames_to_column("ENSEMBL"), by = "ENSEMBL") %>%
2966 2983
       dplyr::filter(.data$TF.name %in% as.character(TFset))
2967 2984
     
... ...
@@ -2981,7 +2998,7 @@ plotTFEnrichment <- function(GRN, rankType = "degree", n = NULL, TF.names = NULL
2981 2998
         
2982 2999
         if (is.null(pages) | (!is.null(pages) && pageCounter %in% pages)) {
2983 3000
           
2984
-          TF.ENSEMBL = GRN@data$TFs$translationTable %>% dplyr::filter(.data$TF.name == TFCur) %>% dplyr::pull(TF.ENSEMBL)
3001
+          TF.ENSEMBL = GRN@annotation$TFs %>% dplyr::filter(.data$TF.name == TFCur) %>% dplyr::pull(TF.ENSEMBL)
2985 3002
           
2986 3003
           TF.name.full = paste0(TFCur, " (", TF.ENSEMBL, ")")
2987 3004
           
... ...
@@ -3115,7 +3132,7 @@ plotTFEnrichment <- function(GRN, rankType = "degree", n = NULL, TF.names = NULL
3115 3132
       
3116 3133
     }
3117 3134
     
3118
-    .printWarningPageNumber(pages, pageCounter)
3135
+    .checkPageNumberValidity(pages, pageCounter)
3119 3136
     if (plotAsPDF) grDevices::dev.off()
3120 3137
     
3121 3138
   } else {
... ...
@@ -3304,9 +3321,9 @@ plotTFEnrichment <- function(GRN, rankType = "degree", n = NULL, TF.names = NULL
3304 3321
 
3305 3322
 ############### GRN visualization ###############
3306 3323
 
3307
-#' Visualize a filtered GRN in a flexible manner. 
3324
+#' Visualize a filtered eGRN in a flexible manner. 
3308 3325
 #' 
3309
-#' This function requires a filtered set of connections in the \code{\linkS4class{GRN}} object as generated by \code{\link{filterGRNAndConnectGenes}}
3326
+#' This function can visualize a filtered eGRN in a very flexible manner and requires a filtered set of connections in the \code{\linkS4class{GRN}} object as generated by \code{\link{filterGRNAndConnectGenes}}. 
3310 3327
 #'
3311 3328
 #' @template GRN 
3312 3329
 #' @template outputFolder
... ...
@@ -3314,11 +3331,11 @@ plotTFEnrichment <- function(GRN, rankType = "degree", n = NULL, TF.names = NULL
3314 3331
 #' @template plotAsPDF
3315 3332
 #' @template pdf_width
3316 3333
 #' @template pdf_height
3317
-#' @param title NULL or Character. Default NULL. Title to be assigned to the plot.
3318
-#' @param maxRowsToPlot Numeric. Default 500. Refers to the maximum number of connections to be plotted. If the network size is above this limit, nothing will be drawn. In such a case, it may help to either increase the value of this parameter or set the filtering criteria for the network to be more stringent, so that the network becomes smaller.
3334
+#' @param title \code{NULL} or Character. Default \code{NULL}. Title to be assigned to the plot.
3335
+#' @param maxRowsToPlot Integer > 0. Default 500. Refers to the maximum number of connections to be plotted. If the network size is above this limit, nothing will be drawn. In such a case, it may help to either increase the value of this parameter or set the filtering criteria for the network to be more stringent, so that the network becomes smaller.
3319 3336
 #' @param nCommunitiesMax Integer > 0. Default 8. Maximum number of communities that get a distinct coloring. All additional communities will be colored with the same (gray) color.
3320 3337
 #' @param graph Character. Default \code{TF-gene}. One of: \code{TF-gene}, \code{TF-peak-gene}. Whether to plot a graph with links from TFs to peaks to gene, or the graph with the inferred TF to gene connections.
3321
-#' @param colorby Character. Default \code{type}. One of \code{type}, code \code{community}. Color the vertices by either type (TF/peak/gene) or community. See \code{\link{calculateCommunitiesStats}}
3338
+#' @param colorby Character. Default \code{type}. Either \code{type} or \code{community}. Color the vertices by either type (TF/peak/gene) or community. See \code{\link{calculateCommunitiesStats}}
3322 3339
 #' @param layout Character. Default \code{fr}. One of \code{star}, \code{fr}, \code{sugiyama}, \code{kk}, \code{lgl}, \code{graphopt}, \code{mds}, \code{sphere}
3323 3340
 #' @param vertice_color_TFs Named list. Default \code{list(h = 10, c = 85, l = c(25, 95))}. The list must specify the color in hcl format (hue, chroma, luminence). See the \code{colorspace} package for more details and examples
3324 3341
 #' @param vertice_color_peaks Named list. Default \code{list(h = 135, c = 45, l = c(35, 95))}.
... ...
@@ -3343,10 +3360,12 @@ visualizeGRN <- function(GRN, outputFolder = NULL,  basenameOutput = NULL, plotA
3343 3360
     checkmate::assertClass(GRN, "GRN")
3344 3361
     GRN = .addFunctionLogToObject(GRN)
3345 3362
     
3363
+    GRN = .makeObjectCompatible(GRN)
3364
+    
3346 3365
     checkmate::assertFlag(plotAsPDF)
3347 3366
     checkmate::assertNumeric(pdf_width, lower = 5, upper = 99)
3348 3367
     checkmate::assertNumeric(pdf_height, lower = 5, upper = 99)
3349
-    checkmate::assertNumeric(maxRowsToPlot)
3368
+    checkmate::assertIntegerish(maxRowsToPlot, lower = 1)
3350 3369
     checkmate::assertIntegerish(nCommunitiesMax,lower = 1)
3351 3370
     checkmate::assertChoice(graph, c("TF-gene", "TF-peak-gene"))
3352 3371
     checkmate::assertChoice(colorby, c("type", "community"))
... ...
@@ -3354,10 +3373,13 @@ visualizeGRN <- function(GRN, outputFolder = NULL,  basenameOutput = NULL, plotA
3354 3373
     checkmate::assertChoice(layout, c("star", "fr", "sugiyama", "kk", "lgl", "graphopt", "mds", "sphere"))
3355 3374
     checkmate::assertList(vertice_color_TFs)
3356 3375
     checkmate::assertNames(names(vertice_color_TFs), must.include = c("h", "c", "l"), subset.of = c("h", "c", "l"))
3376
+    checkmate::assertIntegerish(unlist(vertice_color_TFs), lower = 0, upper = 360)
3357 3377
     checkmate::assertList(vertice_color_peaks)
3358 3378
     checkmate::assertNames(names(vertice_color_peaks), must.include = c("h", "c", "l"), subset.of = c("h", "c", "l"))
3379
+    checkmate::assertIntegerish(unlist(vertice_color_peaks), lower = 0, upper = 360)
3359 3380
     checkmate::assertList(vertice_color_genes)
3360 3381
     checkmate::assertNames(names(vertice_color_genes), must.include = c("h", "c", "l"), subset.of = c("h", "c", "l"))
3382
+    checkmate::assertIntegerish(unlist(vertice_color_genes), lower = 0, upper = 360)
3361 3383
     checkmate::assertNumeric(vertexLabel_cex)
3362 3384
     checkmate::assertNumeric(vertexLabel_dist)
3363 3385
     checkmate::assertFlag(forceRerun)
... ...
@@ -3390,10 +3412,9 @@ visualizeGRN <- function(GRN, outputFolder = NULL,  basenameOutput = NULL, plotA
3390 3412
         grn.merged$V1[!is.na(grn.merged$TF.name)] = as.character(grn.merged$TF.name[!is.na(grn.merged$TF.name)]) # replace TF ensembl with TF name
3391 3413
         
3392 3414
         edges_final = grn.merged %>%
3393
-            dplyr::mutate(R = as.vector(stats::na.omit(c(.data$TF_peak.r, .data$peak_gene.r))),
3394
-                          weight = as.vector(stats::na.omit(c(1- .data$TF_peak.fdr, .data$peak_gene.r))),
3415
+            dplyr::mutate(weight = .data$r,
3395 3416
                           linetype = "solid") %>%
3396
-            dplyr::rename(from = .data$V1, to = .data$V2) 
3417
+            dplyr::rename(from = .data$V1, to = .data$V2, R =  .data$r) 
3397 3418
         
3398 3419
     }
3399 3420
     
... ...
@@ -3730,7 +3751,7 @@ visualizeGRN <- function(GRN, outputFolder = NULL,  basenameOutput = NULL, plotA
3730 3751
                 community_colors = rbind(community_colors, fillercolors)
3731 3752
                 
3732 3753
             }else{
3733
-                community_colors = data.frame(community = names(sort(table(GRN@graph$TF_gene$clusterGraph$membership), decreasing = TRUE)[seq_len(nCommunitiesMax)]),
3754
+                community_colors = data.frame(community = names(sort(table(GRN@graph$TF_gene$clusterGraph$membership), decreasing = TRUE)[seq_len(ncommunities)]),
3734 3755
                                               color = grDevices::rainbow(ncommunities))
3735 3756
             }
3736 3757
             
... ...
@@ -3958,3 +3979,36 @@ visualizeGRN <- function(GRN, outputFolder = NULL,  basenameOutput = NULL, plotA
3958 3979
   checkmate::assertDataFrame(vertice_color_list[[1]])
3959 3980
   checkmate::assertSubset(c(vertice_color_list[[2]], vertice_color_list[[3]]), colnames(vertice_color_list[[1]]), empty.ok = FALSE)
3960 3981
 }
3982
+
3983
+
3984
+
3985
+
3986
+# **dist**
3987
+#     ```{r}
3988
+# # sort cols by median fraction of the variance explained (default) and plot
3989
+# plotVarPart( sortCols(varPart) )
3990
+# ```
3991
+# 
3992
+# **sex**
3993
+#     ```{r}
3994
+# # plot top 10
3995
+# head(df_varPart[order( dplyr::pull(df_varPart, c("sex")), decreasing = T),],10) %>% 
3996
+#     plotPercentBars()
3997
+# ```
3998
+# 
3999
+# **cell_subset**
4000
+#     ```{r}
4001
+# # plot top 10
4002
+# head(df_varPart[order( dplyr::pull(df_varPart, c("cell_subset")), decreasing = T),],10) %>% 
4003
+#     plotPercentBars()
4004
+# ```
4005
+# 
4006
+# **sample_loc**
4007
+#     ```{r}
4008
+# # plot top 10 for sample_loc
4009
+# head(df_varPart[order( dplyr::pull(df_varPart, c("sample_loc")), decreasing = T),],10) %>% 
4010
+#     plotPercentBars()
4011
+# ```
4012
+# 
4013
+# 
4014
+# }
... ...
@@ -30,7 +30,7 @@ addTFBS(
30 30
 \item{forceRerun}{\code{TRUE} or \code{FALSE}. Default \code{FALSE}. Force execution, even if the GRN object already contains the result. Overwrites the old results.}
31 31
 }
32 32
 \value{
33
-An updated \code{\linkS4class{GRN}} object, with additional information added from this function (\code{GRN@data$TFs$translationTable} in particular)
33
+An updated \code{\linkS4class{GRN}} object, with additional information added from this function (\code{GRN@annotation$TFs} in particular)
34 34
 }
35 35
 \description{
36 36
 Add TFBS to a \code{\linkS4class{GRN}} object
37 37
new file mode 100644
... ...
@@ -0,0 +1,56 @@
1
+% Generated by roxygen2: do not edit by hand
2
+% Please edit documentation in R/core.R
3
+\name{add_featureVariation}
4
+\alias{add_featureVariation}
5
+\title{Quantify and interpret multiple sources of biological and technical variation for features (TFs, peaks, and genes) in a \code{\linkS4class{GRN}} object}
6
+\usage{
7
+add_featureVariation(
8
+  GRN,
9
+  formula = "auto",
10
+  metadata = c("all"),
11
+  features = "all_filtered",
12
+  nCores = 1,
13
+  forceRerun = FALSE,
14
+  ...
15
+)
16
+}
17
+\arguments{
18
+\item{GRN}{Object of class \code{\linkS4class{GRN}}}
19
+
20
+\item{formula}{Character(1). Either \code{auto} or a manually defined formula to be used for the model fitting. Default \code{auto}. Must include only terms that are part of the metadata as specified with the \code{metadata} parameter. If set to \code{auto}, the formula will be build automatically based on all metadata variables as specified with the \code{metadata} parameter. By default, numerical variables will be modeled as fixed effects, while variables that are defined as factors or can be converted to factors (characters and logical variables) are modeled as random effects as recommended by the \code{variancePartition} package.}
21
+
22
+\item{metadata}{Character vector. Default \code{all}. Vector of column names from the metadata data frame that was provided when using the function 
23
+\code{\link{addData}}. Must either contain the special keyword \code{all} to denote that all (!) metadata columns from \code{GRN@data$metadata} are taken
24
+or if not, a subset of the column names from \code{GRN@data$metadata}to include in the model fitting for \code{fitExtractVarPartModel}..}
25
+
26
+\item{features}{Character(1). Either \code{all_filtered} or \code{all}. Default \code{all_filtered}. Should \code{variancePartition} only be run for the features (TFs, peaks and genes) from the filtered set of connections (the result of \code{\link{filterGRNAndConnectGenes}}) or for all genes that are defined in the object? If set to \code{all}, the running time is greatly increased.}
27
+
28
+\item{nCores}{Integer >0. Default 1. Number of cores to use. 
29
+A value >1 requires the \code{BiocParallel} package (as it is listed under \code{Suggests}, it may not be installed yet).}
30
+
31
+\item{forceRerun}{\code{TRUE} or \code{FALSE}. Default \code{FALSE}. Force execution, even if the GRN object already contains the result. Overwrites the old results.}
32
+
33
+\item{...}{Additional parameters passed on to \code{variancePartition::fitExtractVarPartModel} beyond \code{exprObj}, \code{formula} and \code{data}. See the function help for more information}
34
+}
35
+\value{
36
+An updated \code{\linkS4class{GRN}} object, with additional information added from this function to \code{GRN@stats$variancePartition} as well as the elements \code{genes}, \code{consensusPeaks} and \code{TFs} within \code{GRN@annotation}. 
37
+As noted above, the results are not added to \code{GRN@connections$all.filtered}; rerun the function \code{\link{getGRNConnections}} and set \code{include_variancePartitionResults} to \code{TRUE} to include the results in the eGRN output table.
38
+}
39
+\description{
40
+Runs the main function \code{fitExtractVarPartModel} of the package \code{variancePartition}: Fits a linear (mixed) model to estimate contribution of multiple sources of variation while simultaneously correcting for all other variables for the features in a GRN object (TFs, peaks, and genes) given particular metadata. The function reports the fraction of variance attributable to each metadata variable.
41
+\strong{Note: The results are not added to \code{GRN@connections$all.filtered}, rerun the function \code{\link{getGRNConnections}} and set \code{include_variancePartitionResults} to \code{TRUE} to do so}.
42
+The results object is stored in \code{GRN@stats$variancePartition} and can be used for the various diagnostic and plotting functions from \code{variancePartition}.
43
+}
44
+\details{
45
+The normalized count matrices are used as input for \code{fitExtractVarPartModel}.
46
+}
47
+\examples{
48
+# See the Workflow vignette on the GRaNIE website for examples
49
+GRN = loadExampleObject()
50
+GRN = add_featureVariation(GRN, metadata = c("mt_frac"), forceRerun = TRUE)
51
+}
52
+\seealso{
53
+\code{\link{addData}}
54
+
55
+\code{\link{getGRNConnections}}
56
+}
... ...
@@ -39,7 +39,8 @@ The special keyword \code{"all"} means no filter on gene type.}
39 39
 
40 40
 \item{selection}{Character. Default \code{"byRank"}. One of: \code{"byRank"}, \code{"byLabel"}. Specify whether the communities enrichment will by calculated based on their rank, where the largest community (with most vertices) would have a rank of 1, or by their label. Note that the label is independent of the rank.}
41 41
 
42
-\item{communities}{Numeric vector. Default \code{c(1:10)}. Depending on what was specified in the \code{display} parameter, this parameter would indicate either the rank or the label of the communities to be plotted. i.e. for \code{communities = c(1,4)}, if \code{display = "byRank"} the GO enrichment for the first and fourth largest communities will be calculated if \code{display = "byLabel"}, the results for the communities labeled "1", and "4" will be plotted.}
42
+\item{communities}{Numeric vector. Default \code{c(1:10)}. Depending on what was specified in the \code{display} parameter, 
43
+this parameter would indicate either the rank or the label of the communities to be plotted. i.e. for \code{communities = c(1,4)}, if \code{display = "byRank"} the GO enrichment for the first and fourth largest communities will be calculated if \code{display = "byLabel"}, the results for the communities labeled "1", and "4" will be plotted.}
43 44
 
44 45
 \item{pAdjustMethod}{Character. Default \code{"BH"}. One of: \code{"holm"}, \code{"hochberg"}, \code{"hommel"}, \code{"bonferroni"}, \code{"BH"}, \code{"BY"}, \code{"fdr"}. This parameter is only relevant for the following ontologies: KEGG, DO, Reactome. For the other ontologies, the algorithm serves as an adjustment.}
45 46
 
... ...
@@ -16,7 +16,8 @@ calculateCommunitiesStats(GRN, clustering = "louvain", forceRerun = FALSE, ...)
16 16
 \item{...}{Additional parameters for the used clustering method, see the \code{igraph::cluster_*} methods for details on the specific parameters and what they do. For \code{leiden} clustering, for example, you may add a \code{resolution_parameter} to control the granularity of the community detection or \code{n_iterations} to modify the number of iterations.}
17 17
 }
18 18
 \value{
19
-An updated \code{\linkS4class{GRN}} object, with a table that consists of the connections clustered into communities stored in the \code{stats$communities} slot.
19
+An updated \code{\linkS4class{GRN}} object, with a table that consists of the connections clustered into communities stored in the 
20
+\code{GRN@graph$TF_gene$clusterGraph} slot as well as within the \code{igraph} object in \code{GRN@graph$TF_gene$graph} (retrievable via \code{igraph} using \code{igraph::vertex.attributes(GRN@graph$TF_gene$graph)$community}, for example.)
20 21
 }
21 22
 \description{
22 23
 The results can subsequently be visualized with the function \code{\link{plotCommunitiesStats}}
... ...
@@ -51,7 +51,11 @@ filterData(
51 51
 An updated \code{\linkS4class{GRN}} object, with added data from this function.
52 52
 }
53 53
 \description{
54
-Filter RNA-seq and/or peak data from a \code{\linkS4class{GRN}} object
54
+This function marks genes and/or peaks as \code{filtered} depending on the chosen filtering criteria. Filtered genes / peaks will then be 
55
+disregarded when adding connections in subsequent steps via \code{\link{addConnections_TF_peak}} and  \code{\link{addConnections_peak_gene}}. \strong{This function does NOT (re)filter existing connections when the \code{\linkS4class{GRN}} object already contains connections. Thus, upon re-execution of this function with different filtering criteria, all downstream steps have to be re-run.}
56
+}
57
+\details{
58
+All this function does is setting (or modifying) the filtering flag in \code{GRN@data$peaks$counts_norm} and \code{GRN@data$RNA$counts_norm.l}, respectively.
55 59
 }
56 60
 \examples{
57 61
 # See the Workflow vignette on the GRaNIE website for examples
... ...
@@ -80,12 +80,14 @@ keeping backwards compatibility with \code{\linkS4class{GRN}} objects.}
80 80
 \item{silent}{\code{TRUE} or \code{FALSE}.  Default \code{FALSE}. Print progress messages and filter statistics.}
81 81
 }
82 82
 \value{
83
-An updated \code{\linkS4class{GRN}} object, with additional information added from this function. The filtered and merged TF-peak and peak-gene connections in the slot \code{GRN@connections$all.filtered}.
83
+An updated \code{\linkS4class{GRN}} object, with additional information added from this function. 
84
+The filtered and merged TF-peak and peak-gene connections in the slot \code{GRN@connections$all.filtered} and can be retrieved (along with other feature metadata) using the function \code{\link{getGRNConnections}}.
84 85
 }
85 86
 \description{
86 87
 This is one of the main integrative functions of the \code{GRaNIE} package. It has two main functions: 
87 88
 First, filtering both TF-peak and peak-gene connections according to different criteria such as FDR and other properties 
88 89
 Second, joining the three major elements that an eGRN consist of (TFs, peaks, genes) into one data frame, with one row per unique TF-peak-gene connection.
90
+\strong{After successful execution, the connections (along with additional feature metadata) can be retrieved with the function \code{\link{getGRNConnections}}.}
89 91
 \strong{Note that the eGRN graph is reset upon successful execution of this function, and re-running the function \code{\link{build_eGRN_graph}} 
90 92
 is necessary when any of the network functions of the package shall be executed.}
91 93
 Internally, before joining them, both TF-peak links and peak-gene connections are filtered separately for reasons of memory and computational efficacy:
... ...
@@ -106,4 +108,6 @@ GRN = filterGRNAndConnectGenes(GRN)
106 108
 \code{\link{addConnections_peak_gene}}
107 109
 
108 110
 \code{\link{build_eGRN_graph}}
111
+
112
+\code{\link{getGRNConnections}}
109 113
 }
... ...
@@ -8,17 +8,32 @@ getGRNConnections(
8 8
   GRN,
9 9
   type = "all.filtered",
10 10
   permuted = FALSE,
11
-  include_TF_gene_correlations = FALSE
11
+  include_TF_gene_correlations = FALSE,
12
+  include_TFMetadata = FALSE,
13
+  include_peakMetadata = FALSE,
14
+  include_geneMetadata = FALSE,
15
+  include_variancePartitionResults = FALSE
12 16
 )
13 17
 }
14 18
 \arguments{
15 19
 \item{GRN}{Object of class \code{\linkS4class{GRN}}}
16 20
 
17
-\item{type}{Character. One of \code{TF_peaks}, \code{peak_genes}, \code{all.filtered}. Default \code{all.filtered}. The type of connections to retrieve.}
21
+\item{type}{Character. One of \code{TF_peaks}, \code{peak_genes}, \code{TF_genes} or \code{all.filtered}. Default \code{all.filtered}. The type of connections to retrieve.}
18 22
 
19 23
 \item{permuted}{\code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should the permuted data be taken (\code{TRUE}) or the non-permuted, original one (\code{FALSE})?}
20 24
 
21
-\item{include_TF_gene_correlations}{Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should TFs and gene correlations be returned as well? If set to \code{TRUE}, they must have been computed beforehand with \code{\link{add_TF_gene_correlation}}.}
25
+\item{include_TF_gene_correlations}{Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should TFs and gene correlations be returned as well? If set to \code{TRUE}, they must have been computed beforehand with \code{\link{add_TF_gene_correlation}}. Only relevant for type = "all.filtered"}
26
+
27
+\item{include_TFMetadata}{Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should TF metadata be returned as well? Only relevant for type = "all.filtered"}
28
+
29
+\item{include_peakMetadata}{Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should peak metadata be returned as well?  Only relevant for type = "all.filtered"}
30
+
31
+\item{include_geneMetadata}{Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should gene metadata be returned as well?  Only relevant for type = "all.filtered"}
32
+
33
+\item{include_variancePartitionResults}{Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. 
34
+Should the results from the function \code{\link{add_featureVariation}} be included? 
35
+If set to \code{TRUE}, they must have been computed beforehand with \code{\link{add_featureVariation}}; otherwise, an error is thrown.
36
+Only relevant for type = "all.filtered"}
22 37
 }
23 38
 \value{
24 39
 A data frame with the requested connections. This function does **NOT** return a \code{\linkS4class{GRN}} object.
... ...
@@ -34,4 +49,8 @@ GRN_con.all.df = getGRNConnections(GRN)
34 49
 }
35 50
 \seealso{
36 51
 \code{\link{filterGRNAndConnectGenes}}
52
+
53
+\code{\link{add_featureVariation}}
54
+
55
+\code{\link{add_TF_gene_correlation}}
37 56
 }
... ...
@@ -38,9 +38,9 @@ plotCommunitiesEnrichment(
38 38
 
39 39
 \item{p}{Numeric. Default 0.05. p-value threshold to determine significance.}
40 40
 
41
-\item{nSignificant}{Numeric. Default 3. Threshold to filter out an ontology term with less than \code{nSignificant} overlapping genes.}
41
+\item{nSignificant}{Numeric >0. Default 3. Threshold to filter out an ontology term with less than \code{nSignificant} overlapping genes.}
42 42
 
43
-\item{nID}{Numeric. Default 10. For the reduced summary heatmap, number of top terms to select per community / for the general enrichment.}
43
+\item{nID}{Numeric >0. Default 10. For the reduced summary heatmap, number of top terms to select per community / for the general enrichment.}
44 44
 
45 45
 \item{maxWidth_nchar_plot}{Integer (>=10). Default 50. Maximum number of characters for a term before it is truncated.}
46 46
 
... ...
@@ -63,7 +63,7 @@ The same \code{\linkS4class{GRN}} object, without modifications.
63 63
 Similarly to \code{\link{plotGeneralEnrichment}} and \code{\link{plotTFEnrichment}}, the results of the community-based enrichment analysis are plotted.
64 64
 This function produces multiple plots. First, one plot per community to summarize the community-specific enrichment.
65 65
 Second, a summary heatmap of all significantly enriched terms across all communities and for the whole eGRN. The latter allows to compare the results with the general network enrichment.
66
-#' Third, a subset of the aforementioned heatmap, showing only the top most significantly enriched terms per community and for the whole eGRN (as specified by \code{nID}) for improved visibility
66
+Third, a subset of the aforementioned heatmap, showing only the top most significantly enriched terms per community and for the whole eGRN (as specified by \code{nID}) for improved visibility
67 67
 }
68 68
 \examples{
69 69
 # See the Workflow vignette on the GRaNIE website for examples
... ...
@@ -37,9 +37,9 @@ plotTFEnrichment(
37 37
 
38 38
 \item{p}{Numeric. Default 0.05. p-value threshold to determine significance.}
39 39
 
40
-\item{nSignificant}{Numeric. Default 3. Threshold to filter out an ontology term with less than \code{nSignificant} overlapping genes.}
40
+\item{nSignificant}{Numeric >0. Default 3. Threshold to filter out an ontology term with less than \code{nSignificant} overlapping genes.}
41 41
 
42
-\item{nID}{Numeric. Default 10. For the reduced summary heatmap, number of top terms to select per community / for the general enrichment.}
42
+\item{nID}{Numeric >0. Default 10. For the reduced summary heatmap, number of top terms to select per community / for the general enrichment.}
43 43
 
44 44
 \item{display_pAdj}{\code{TRUE} or \code{FALSE}. Default \code{FALSE}. Is the p-value being displayed in the plots the adjusted p-value? This parameter is relevant for KEGG, Disease Ontology, and Reactome enrichments, and does not affect GO enrichments.}
45 45
 
... ...
@@ -2,7 +2,7 @@
2 2
 % Please edit documentation in R/plot.R
3 3
 \name{visualizeGRN}
4 4
 \alias{visualizeGRN}
5
-\title{Visualize a filtered GRN in a flexible manner.}
5
+\title{Visualize a filtered eGRN in a flexible manner.}
6 6
 \usage{
7 7
 visualizeGRN(
8 8
   GRN,
... ...
@@ -38,15 +38,15 @@ visualizeGRN(
38 38
 
39 39
 \item{pdf_height}{Number >0. Default 12. Height of the PDF, in cm.}
40 40
 
41
-\item{title}{NULL or Character. Default NULL. Title to be assigned to the plot.}
41
+\item{title}{\code{NULL} or Character. Default \code{NULL}. Title to be assigned to the plot.}
42 42
 
43
-\item{maxRowsToPlot}{Numeric. Default 500. Refers to the maximum number of connections to be plotted. If the network size is above this limit, nothing will be drawn. In such a case, it may help to either increase the value of this parameter or set the filtering criteria for the network to be more stringent, so that the network becomes smaller.}
43
+\item{maxRowsToPlot}{Integer > 0. Default 500. Refers to the maximum number of connections to be plotted. If the network size is above this limit, nothing will be drawn. In such a case, it may help to either increase the value of this parameter or set the filtering criteria for the network to be more stringent, so that the network becomes smaller.}
44 44
 
45 45
 \item{nCommunitiesMax}{Integer > 0. Default 8. Maximum number of communities that get a distinct coloring. All additional communities will be colored with the same (gray) color.}
46 46
 
47 47
 \item{graph}{Character. Default \code{TF-gene}. One of: \code{TF-gene}, \code{TF-peak-gene}. Whether to plot a graph with links from TFs to peaks to gene, or the graph with the inferred TF to gene connections.}
48 48
 
49
-\item{colorby}{Character. Default \code{type}. One of \code{type}, code \code{community}. Color the vertices by either type (TF/peak/gene) or community. See \code{\link{calculateCommunitiesStats}}}
49
+\item{colorby}{Character. Default \code{type}. Either \code{type} or \code{community}. Color the vertices by either type (TF/peak/gene) or community. See \code{\link{calculateCommunitiesStats}}}
50 50
 
51 51
 \item{layout}{Character. Default \code{fr}. One of \code{star}, \code{fr}, \code{sugiyama}, \code{kk}, \code{lgl}, \code{graphopt}, \code{mds}, \code{sphere}}
52 52
 
... ...
@@ -66,7 +66,7 @@ visualizeGRN(
66 66
 The same \code{\linkS4class{GRN}} object, without modifications.
67 67
 }
68 68
 \description{
69
-This function requires a filtered set of connections in the \code{\linkS4class{GRN}} object as generated by \code{\link{filterGRNAndConnectGenes}}
69
+This function can visualize a filtered eGRN in a very flexible manner and requires a filtered set of connections in the \code{\linkS4class{GRN}} object as generated by \code{\link{filterGRNAndConnectGenes}}.
70 70
 }
71 71
 \examples{
72 72
 GRN = loadExampleObject()
73 73
deleted file mode 100644
... ...
@@ -1,4 +0,0 @@
1
-library(testthat)
2
-library(GRaNIE)
3
-
4
-test_check("GRaNIE")
... ...
@@ -34,13 +34,6 @@ opts_chunk$set(
34 34
 # Motivation and Necessity  {#motivation}
35 35
 
36 36
 
37
-<center>
38
-```{r logo, echo=FALSE, fig.cap="<i>Macrophages eGRN as constructed by the `GRaNIE` package</i>", out.width = '50%', fig.align="center"}
39
-knitr::include_graphics("../man/figures/logo.png")
40
-```
41
-</center>
42
-
43
-
44 37
 Genetic variants associated with diseases often affect non-coding regions, thus likely having a regulatory role. To understand the effects of genetic variants in these regulatory regions, identifying genes that are modulated by specific regulatory elements (REs) is crucial. The effect of gene regulatory elements, such as enhancers, is often cell-type specific, likely because the combinations of transcription factors (TFs) that are regulating a given enhancer have cell-type specific activity. This TF activity can be quantified with existing tools such as `diffTF` and captures differences in binding of a TF in open chromatin regions. Collectively, this forms a enhancer-mediated gene regulatory network (*eGRN*) with cell-type and data-specific TF-RE and RE-gene links. Here, we reconstruct such a *eGRN* using bulk RNA-seq and open chromatin (e.g., using ATAC-seq or ChIP-seq for open chromatin marks) and optionally TF activity data. Our network contains different types of links, connecting TFs to regulatory elements, the latter of which is connected to genes in the vicinity or within the same chromatin domain (*TAD*). We use a statistical framework to assign empirical FDRs and weights to all links using a permutation-based approach.
45 38
 
46 39
 In summary, we present a framework to reconstruct predictive enhancer-mediated regulatory network models that are based on integrating of expression and chromatin accessibility/activity pattern across individuals, and provide a comprehensive resource of cell-type specific gene regulatory networks for particular cell types.
... ...
@@ -68,6 +61,7 @@ The basic installation installs all required packages but not necessarily those
68 61
     - `IHW`: Needed only for the function `filterGRNAndConnectGenes` for p-value adjustment of the raw peak-gene p-values when using the `IHW` framework (parameter `peak_gene.fdr.method`)
69 62
     - `robust`: Needed only when setting `addRobustRegression = TRUE` in the functions `addConnections_peak_gene()` and `add_TF_gene_correlation()` for enabling robust regression (an experimental, largely undocumented feature so far). 
70 63
     - `csaw`: Needed only for a *cyclic LOESS* normalization, which is the default normalization currently for adding TF activity data via `addData_TFActivity()`
64
+    - `variancePartition`:  Needed only for the function `add_featureVariation` to quantify multiple sources of biological and technical variation for features (TFs, peaks, and genes)
71 65
 - **Enrichment-associated packages**
72 66
     - `topGO`, `ReactomePA`, `DOSE`, `clusterProfiler`: Needed only for all enrichment functions for `GO`, `Reactome`, `DO`, and `KEGG` respectively
73 67
 - **Computational efficacy**
... ...
@@ -425,15 +425,19 @@ saveRDS(GRN, GRN_file_outputRDS)
425 425
         
426 426
 ## Retrieve filtered connections
427 427
 
428
-We are now ready to retrieve the connections and the additional data we added to them. This can be done with the helper function `getGRNConnections()` that retrieves a data frame from a `GRaNIE` object from a particular slot. Here, we specify `all.filtered`, as we want to retrieve all filtered connections. For more parameter details, see the R help (`getGRNConnections`). Note that the first time, we assign a different variable to the return of the function (i.e., `GRN_connections.all` and NOT `GRN` as before). Importantly, we have to select a new variable as we would otherwise overwrite our `GRN` object altogether! All `get` functions from the `GRaNIE` package return an element from within the object and NOT the object itself, so please keep that in mind and always check what the functions returns before running it. You can simply do so in the R help (`?getGRNConnections`). 
428
+We are now ready to retrieve the filtered connections, along with adding various additional metadata (optional). This can be done with the helper function `getGRNConnections()` that retrieves the filtered eGRN as a data frame from a `GRaNIE` object. Here, we specify `all.filtered`, as we want to retrieve all filtered connections (i.e., the eGRN). For more parameter details, see the R help (`getGRNConnections`). 
429
+
430
+Here, for example, we add various additional information to the resulting data frame: TF-gene correlations, and gene metadata. We could also add TF and peak metadata as well as the results from running the `variancePartition` package, but this is not done here and we leave this as an exercise to the reader!
431
+
432
+Note that the first time, we assign a different variable to the return of the function (i.e., `GRN_connections.all` and NOT `GRN` as before). Importantly, we have to select a new variable as we would otherwise overwrite our `GRN` object altogether! All `get` functions from the `GRaNIE` package return an element from within the object and NOT the object itself, so please keep that in mind and always check what the functions returns before running it. You can simply do so in the R help (`?getGRNConnections`). 
429 433
 
430 434
 ```{r, include=TRUE, eval = TRUE, class.output="scroll-200"}
431
-GRN_connections.all = getGRNConnections(GRN, type = "all.filtered", include_TF_gene_correlations = TRUE)
435
+GRN_connections.all = getGRNConnections(GRN, type = "all.filtered", include_TF_gene_correlations = TRUE, include_geneMetadata = TRUE)
432 436
 
433 437
 GRN_connections.all
434 438
 ```
435 439
 
436
-The table contains many columns, and the prefix of each column name indicates the part of the *eGRN* network that the column refers to (e.g., TFs, TF-enhancers, enhancers, enhancer-genes or genes, or TF-gene if the function `add_TF_gene_correlation()` has been run before). Data are stored in a format that minimizes the memory footprint (e.g., each character column is stored as a factor). This table can now be used for any downstream analysis, as it is just a normal data frame.
440
+The table contains many columns, and the prefix of each column name indicates the part of the *eGRN* network that the column refers to (e.g., TFs, TF-peaks, peaks, peaks-genes or genes, or TF-gene if the function `add_TF_gene_correlation()` has been run before). Data are stored in a format that minimizes the memory footprint (e.g., each character column is stored as a factor). This table can now be used for any downstream analysis, as it is just a normal data frame.
437 441
 
438 442
 
439 443
 ## Construct the eGRN graph