... | ... |
@@ -1,6 +1,6 @@ |
1 | 1 |
Package: GRaNIE |
2 | 2 |
Title: GRaNIE: Reconstruction cell type specific gene regulatory networks including enhancers using chromatin accessibility and RNA-seq data |
3 |
-Version: 1.1.12 |
|
3 |
+Version: 1.1.13 |
|
4 | 4 |
Encoding: UTF-8 |
5 | 5 |
Authors@R: c(person("Christian", "Arnold", email = |
6 | 6 |
"chrarnold@web.de", role = c("cre","aut")), |
... | ... |
@@ -81,7 +81,9 @@ Suggests: |
81 | 81 |
csaw, |
82 | 82 |
BiocParallel, |
83 | 83 |
topGO, |
84 |
- robust |
|
84 |
+ robust, |
|
85 |
+ variancePartition, |
|
86 |
+ purrr |
|
85 | 87 |
VignetteBuilder: knitr |
86 | 88 |
biocViews: Software, GeneExpression, GeneRegulation, NetworkInference, GeneSetEnrichment, BiomedicalInformatics, Genetics, Transcriptomics, ATACSeq, RNASeq, GraphAndNetwork, Regression, Transcription, ChIPSeq |
87 | 89 |
License: Artistic-2.0 |
... | ... |
@@ -1,4 +1,6 @@ |
1 |
-# GRaNIE 1.1.12 (2022-09-13) |
|
1 |
+ |
|
2 |
+ |
|
3 |
+# GRaNIE 1.1.12 and 1.1.13 (2022-09-13) |
|
2 | 4 |
|
3 | 5 |
## Major changes |
4 | 6 |
|
... | ... |
@@ -7,9 +9,12 @@ |
7 | 9 |
- the eGRN graph structure as built by `build_eGRN_graph()` in the `GRaNIE` object is now reset whenever the function `filterGRNAndConnectGenes()` is successfully executed to make sure that enrichment functions etc are not using an outdated graph structure. |
8 | 10 |
- the landing page of the website has been extended and overhauled |
9 | 11 |
- removed some dependency packages and moved others into `Suggests` to lower the installation burden of the package. In addition, removed `topGO` from the `Depends` section (now in `Suggests`) and removed `tidyverse` altogether (before in `Depends`). Detailed explanations when and how the packages listed under `Suggests` are needed can now be found in the new [Package Details Vignette](https://grp-zaugg.embl-community.io/GRaNIE/articles/GRaNIE_packageDetails.html#installation) and are clearly given to the user when executing the respective functions |
12 |
+- major updates to the function `getGRNConnections`, which now has more arguments allowing a more fine-tuned and rich retrieval of eGRN connections, features and feature metadata |
|
13 |
+- a new function `add_featureVariation` to quantify and interpret multiple sources of biological and technical variation for features (TFs, peaks, and genes) in a GRN object, see the R help for more information |
|
14 |
+- `filterGRNAndConnectGenes` now doesnt include feature metadata columns to save space in the result data frame that is created. The help has been updated to make clear that `getGRNConnections` includes these features now. |
|
10 | 15 |
|
11 | 16 |
## Minor changes |
12 |
- |
|
17 |
+- small changes in the GRN object structure, moved `GRN@data$TFs@translationTable` to `GRN@annotation@TFs`. All exported functions run automatically a small helper function to make this change for any GRN object automatically to adapt to the new structure |
|
13 | 18 |
- many small changes in the code, updated argument checking, and preparing rigorous unit test inclusion |
14 | 19 |
- internally renaming the (recently changed / renamed) gene type `lncRNA` from `biomaRt` to `lincRNA` to be compatible with older versions of `GRaNIE` |
15 | 20 |
|
... | ... |
@@ -165,8 +165,8 @@ addData <- function(GRN, counts_peaks, normalization_peaks = "DESeq_sizeFactor", |
165 | 165 |
start = Sys.time() |
166 | 166 |
|
167 | 167 |
checkmate::assertClass(GRN, "GRN") |
168 |
- GRN = .addFunctionLogToObject(GRN) |
|
169 |
- |
|
168 |
+ GRN = .addFunctionLogToObject(GRN) |
|
169 |
+ |
|
170 | 170 |
checkmate::assertDataFrame(counts_peaks, min.rows = 1, min.cols = 2) |
171 | 171 |
checkmate::assertDataFrame(counts_rna, min.rows = 1, min.cols = 2) |
172 | 172 |
checkmate::assertCharacter(idColumn_peaks, min.chars = 1, len = 1) |
... | ... |
@@ -224,7 +224,9 @@ addData <- function(GRN, counts_peaks, normalization_peaks = "DESeq_sizeFactor", |
224 | 224 |
# Check uniqueness of IDs |
225 | 225 |
nDuplicateRows = nrow(counts_rna) - length(unique(counts_rna$ENSEMBL)) |
226 | 226 |
if (nDuplicateRows > 0) { |
227 |
- futile.logger::flog.warn(paste0(" Found ", nDuplicateRows, " duplicate rows in RNA-Seq data, consolidating them by summing them up.")) |
|
227 |
+ message = paste0(" Found ", nDuplicateRows, " duplicate rows in RNA-Seq data, consolidating them by summing them up.") |
|
228 |
+ .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE) |
|
229 |
+ |
|
228 | 230 |
counts_rna = counts_rna %>% |
229 | 231 |
dplyr::group_by(.data$ENSEMBL) %>% |
230 | 232 |
dplyr::summarise_if(is.numeric, sum) |
... | ... |
@@ -266,7 +268,9 @@ addData <- function(GRN, counts_peaks, normalization_peaks = "DESeq_sizeFactor", |
266 | 268 |
|
267 | 269 |
# Force the first column to be the ID column |
268 | 270 |
if ("sampleID" %in% colnames(GRN@data$metadata)) { |
269 |
- futile.logger::flog.warn("Renaming the first column to sampleID. However, this column already exists, it will be renamed accordingly.") |
|
271 |
+ message = paste0("Renaming the first column to sampleID. However, this column already exists, it will be renamed accordingly.") |
|
272 |
+ .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE) |
|
273 |
+ |
|
270 | 274 |
colnames(GRN@data$metadata)[which(colnames(GRN@data$metadata) == "sampleID")] = "sampleID_original" |
271 | 275 |
|
272 | 276 |
} |
... | ... |
@@ -544,15 +548,15 @@ addData <- function(GRN, counts_peaks, normalization_peaks = "DESeq_sizeFactor", |
544 | 548 |
GENENAME = as.factor(.data$GENENAME), |
545 | 549 |
SYMBOL = as.factor(.data$SYMBOL)), |
546 | 550 |
by = c("peak.ID" = "peakID")) %>% |
547 |
- dplyr::rename(peak.gene.chr = .data$geneChr, |
|
548 |
- peak.gene.start = .data$geneStart, |
|
549 |
- peak.gene.end = .data$geneEnd, |
|
550 |
- peak.gene.length = .data$geneLength, |
|
551 |
- peak.gene.strand = .data$geneStrand, |
|
552 |
- peak.gene.name = .data$GENENAME, |
|
553 |
- peak.gene.distanceToTSS = .data$distanceToTSS, |
|
554 |
- peak.gene.ENSEMBL = .data$ENSEMBL, |
|
555 |
- peak.gene.symbol = .data$SYMBOL, |
|
551 |
+ dplyr::rename(peak.nearestGene.chr = .data$geneChr, |
|
552 |
+ peak.nearestGene.start = .data$geneStart, |
|
553 |
+ peak.nearestGene.end = .data$geneEnd, |
|
554 |
+ peak.nearestGene.length = .data$geneLength, |
|
555 |
+ peak.nearestGene.strand = .data$geneStrand, |
|
556 |
+ peak.nearestGene.name = .data$GENENAME, |
|
557 |
+ peak.nearestGene.distanceToTSS = .data$distanceToTSS, |
|
558 |
+ peak.nearestGene.ENSEMBL = .data$ENSEMBL, |
|
559 |
+ peak.nearestGene.symbol = .data$SYMBOL, |
|
556 | 560 |
peak.annotation = .data$annotation |
557 | 561 |
) |
558 | 562 |
|
... | ... |
@@ -606,7 +610,7 @@ addData <- function(GRN, counts_peaks, normalization_peaks = "DESeq_sizeFactor", |
606 | 610 |
#' @importFrom rlang .data `:=` |
607 | 611 |
.calcGCContentPeaks <- function(GRN) { |
608 | 612 |
|
609 |
- futile.logger::flog.info(paste0("Calculate GC-content for peaks... ")) |
|
613 |
+ futile.logger::flog.info(paste0("Calculate GC-content for peaks. This may take a while")) |
|
610 | 614 |
start = Sys.time() |
611 | 615 |
genomeAssembly = GRN@config$parameters$genomeAssembly |
612 | 616 |
#TODO: GC content for all peaks |
... | ... |
@@ -655,6 +659,11 @@ addData <- function(GRN, counts_peaks, normalization_peaks = "DESeq_sizeFactor", |
655 | 659 |
|
656 | 660 |
#' Filter RNA-seq and/or peak data from a \code{\linkS4class{GRN}} object |
657 | 661 |
#' |
662 |
+#' This function marks genes and/or peaks as \code{filtered} depending on the chosen filtering criteria. Filtered genes / peaks will then be |
|
663 |
+#' disregarded when adding connections in subsequent steps via \code{\link{addConnections_TF_peak}} and \code{\link{addConnections_peak_gene}}. \strong{This function does NOT (re)filter existing connections when the \code{\linkS4class{GRN}} object already contains connections. Thus, upon re-execution of this function with different filtering criteria, all downstream steps have to be re-run.} |
|
664 |
+#' |
|
665 |
+#' All this function does is setting (or modifying) the filtering flag in \code{GRN@data$peaks$counts_norm} and \code{GRN@data$RNA$counts_norm.l}, respectively. |
|
666 |
+#' |
|
658 | 667 |
#' @template GRN |
659 | 668 |
#' @param minNormalizedMean_peaks Numeric[0,] or \code{NULL}. Default 5. Minimum mean across all samples for a peak to be retained for the normalized counts table. Set to \code{NULL} for not applying the filter. |
660 | 669 |
#' @param maxNormalizedMean_peaks Numeric[0,] or \code{NULL}. Default \code{NULL}. Maximum mean across all samples for a peak to be retained for the normalized counts table. Set to \code{NULL} for not applying the filter. |
... | ... |
@@ -687,6 +696,8 @@ filterData <- function (GRN, |
687 | 696 |
|
688 | 697 |
checkmate::assertClass(GRN, "GRN") |
689 | 698 |
GRN = .addFunctionLogToObject(GRN) |
699 |
+ |
|
700 |
+ GRN = .makeObjectCompatible(GRN) |
|
690 | 701 |
|
691 | 702 |
checkmate::assertNumber(minNormalizedMean_peaks, lower = 0, null.ok = TRUE) |
692 | 703 |
checkmate::assertNumber(minNormalizedMeanRNA, lower = 0, null.ok = TRUE) |
... | ... |
@@ -936,7 +947,7 @@ filterData <- function (GRN, |
936 | 947 |
#' @param fileEnding Character. Default \code{".bed"}. File ending for the files from the motif folder. |
937 | 948 |
#' @param nTFMax \code{NULL} or Integer[1,]. Default \code{NULL}. Maximal number of TFs to import. Can be used for testing purposes, e.g., setting to 5 only imports 5 TFs even though the whole \code{motifFolder} has many more TFs defined. |
938 | 949 |
#' @template forceRerun |
939 |
-#' @return An updated \code{\linkS4class{GRN}} object, with additional information added from this function (\code{GRN@data$TFs$translationTable} in particular) |
|
950 |
+#' @return An updated \code{\linkS4class{GRN}} object, with additional information added from this function (\code{GRN@annotation$TFs} in particular) |
|
940 | 951 |
#' @examples |
941 | 952 |
#' # See the Workflow vignette on the GRaNIE website for examples |
942 | 953 |
#' @export |
... | ... |
@@ -945,6 +956,8 @@ addTFBS <- function(GRN, motifFolder, TFs = "all", nTFMax = NULL, filesTFBSPatte |
945 | 956 |
start = Sys.time() |
946 | 957 |
checkmate::assertClass(GRN, "GRN") |
947 | 958 |
GRN = .addFunctionLogToObject(GRN) |
959 |
+ |
|
960 |
+ GRN = .makeObjectCompatible(GRN) |
|
948 | 961 |
|
949 | 962 |
checkmate::assertDirectoryExists(motifFolder) |
950 | 963 |
checkmate::assertCharacter(TFs, min.len = 1) |
... | ... |
@@ -953,19 +966,20 @@ addTFBS <- function(GRN, motifFolder, TFs = "all", nTFMax = NULL, filesTFBSPatte |
953 | 966 |
checkmate::assertCharacter(fileEnding, len = 1, min.chars = 1) |
954 | 967 |
checkmate::assertFlag(forceRerun) |
955 | 968 |
|
956 |
- if (is.null(GRN@data$TFs$translationTable) | is.null(GRN@data$TFs$translationTable) | is.null(GRN@config$allTF) | is.null(GRN@config$directories$motifFolder) | forceRerun) { |
|
969 |
+ if (is.null(GRN@annotation$TFs) | is.null(GRN@annotation$TFs) | is.null(GRN@config$allTF) | is.null(GRN@config$directories$motifFolder) | forceRerun) { |
|
957 | 970 |
|
958 | 971 |
GRN@config$TFBS_fileEnding = fileEnding |
959 | 972 |
GRN@config$TFBS_filePattern = filesTFBSPattern |
960 |
- GRN@data$TFs$translationTable = .getFinalListOfTFs(motifFolder, filesTFBSPattern, fileEnding, TFs, nTFMax, getCounts(GRN, type = "rna", norm = TRUE, permuted = FALSE)) |
|
973 |
+ GRN@annotation$TFs = .getFinalListOfTFs(motifFolder, filesTFBSPattern, fileEnding, TFs, nTFMax, getCounts(GRN, type = "rna", norm = TRUE, permuted = FALSE)) |
|
961 | 974 |
|
962 | 975 |
|
963 |
- GRN@data$TFs$translationTable = GRN@data$TFs$translationTable %>% |
|
964 |
- dplyr::select(c("HOCOID", "ENSEMBL")) %>% |
|
965 |
- dplyr::mutate(TF.name = .data$HOCOID, TF.ENSEMBL = .data$ENSEMBL) |
|
976 |
+ GRN@annotation$TFs = GRN@annotation$TFs %>% |
|
977 |
+ dplyr::rename(TF.ENSEMBL = .data$ENSEMBL, TF.HOCOID = .data$HOCOID) %>% |
|
978 |
+ dplyr::mutate(TF.name = .data$TF.HOCOID) %>% |
|
979 |
+ dplyr::select(.data$TF.name, .data$TF.ENSEMBL, .data$TF.HOCOID) |
|
966 | 980 |
|
967 | 981 |
# TODO: Change here and make it more logical what to put where |
968 |
- GRN@config$allTF = GRN@data$TFs$translationTable$TF.name |
|
982 |
+ GRN@config$allTF = GRN@annotation$TFs$TF.name |
|
969 | 983 |
|
970 | 984 |
#Store all data-dependent TF information |
971 | 985 |
# GRN@config$TF_list = list() |
... | ... |
@@ -1071,6 +1085,8 @@ overlapPeaksAndTFBS <- function(GRN, nCores = 2, forceRerun = FALSE) { |
1071 | 1085 |
checkmate::assertClass(GRN, "GRN") |
1072 | 1086 |
GRN = .addFunctionLogToObject(GRN) |
1073 | 1087 |
|
1088 |
+ GRN = .makeObjectCompatible(GRN) |
|
1089 |
+ |
|
1074 | 1090 |
checkmate::assertIntegerish(nCores, lower = 1) |
1075 | 1091 |
checkmate::assertFlag(forceRerun) |
1076 | 1092 |
|
... | ... |
@@ -1098,9 +1114,10 @@ overlapPeaksAndTFBS <- function(GRN, nCores = 2, forceRerun = FALSE) { |
1098 | 1114 |
tbl_discarded = table(annotation_discared$chr) |
1099 | 1115 |
tbl_discarded = tbl_discarded[which(tbl_discarded > 0)] |
1100 | 1116 |
|
1101 |
- futile.logger::flog.warn(paste0("Found ", sum(tbl_discarded), " regions from chromosomes without a reference length. ", |
|
1102 |
- "Typically, these are random fragments from known or unknown chromosomes. The following regions will be discarded: \n", |
|
1103 |
- paste0(names(tbl_discarded), " (", tbl_discarded, ")", collapse = ","))) |
|
1117 |
+ message = paste0("Found ", sum(tbl_discarded), " regions from chromosomes without a reference length. ", |
|
1118 |
+ "Typically, these are random fragments from known or unknown chromosomes. The following regions will be discarded: \n", |
|
1119 |
+ paste0(names(tbl_discarded), " (", tbl_discarded, ")", collapse = ",")) |
|
1120 |
+ .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE) |
|
1104 | 1121 |
|
1105 | 1122 |
GRN@data$peaks$consensusPeaks = dplyr::filter(GRN@data$peaks$consensusPeaks, .data$chr %in% names(seqlengths)) |
1106 | 1123 |
} |
... | ... |
@@ -1218,7 +1235,9 @@ overlapPeaksAndTFBS <- function(GRN, nCores = 2, forceRerun = FALSE) { |
1218 | 1235 |
} else if (ncol(TFBS.df) >= 6) { |
1219 | 1236 |
|
1220 | 1237 |
if (ncol(TFBS.df) > 6) { |
1221 |
- futile.logger::flog.warn(paste0(" File ", file_tfbs_in, " had more than 6 columns, only the first 6 will be used.")) |
|
1238 |
+ message = paste0(" File ", file_tfbs_in, " had more than 6 columns, only the first 6 will be used.") |
|
1239 |
+ .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE) |
|
1240 |
+ |
|
1222 | 1241 |
TFBS.df = TFBS.df[,seq_len(6)] |
1223 | 1242 |
} |
1224 | 1243 |
colnames(TFBS.df) = c("chr", "start", "end", "annotation", "score", "strand") |
... | ... |
@@ -1350,9 +1369,9 @@ addData_TFActivity <- function(GRN, normalization = "cyclicLoess", name = "TF_ac |
1350 | 1369 |
stopifnot(identical(nrow(countsPeaks), nrow(GRN@data$TFs$TF_peak_overlap))) |
1351 | 1370 |
|
1352 | 1371 |
#Select a maximum set of TFs to run this for |
1353 |
- allTF = GRN@data$TFs$translationTable$TF.name |
|
1372 |
+ allTF = GRN@annotation$TFs$TF.name |
|
1354 | 1373 |
|
1355 |
- # rownamesTFs = GRN@data$TFs$translationTable$ENSEMBL[match(allTF, GRN@data$TFs$translationTable$HOCOID)] |
|
1374 |
+ # rownamesTFs = GRN@annotation$TFs$ENSEMBL[match(allTF, GRN@annotation$TFs$HOCOID)] |
|
1356 | 1375 |
|
1357 | 1376 |
# Calculating TF activity is done for all TF that are available |
1358 | 1377 |
TF.activity.m = matrix(NA, nrow = length(allTF), ncol = length(GRN@config$sharedSamples), |
... | ... |
@@ -1385,7 +1404,7 @@ addData_TFActivity <- function(GRN, normalization = "cyclicLoess", name = "TF_ac |
1385 | 1404 |
as.data.frame() %>% |
1386 | 1405 |
tibble::rownames_to_column("TF.name") %>% |
1387 | 1406 |
tibble::as_tibble() %>% |
1388 |
- dplyr::left_join(GRN@data$TFs$translationTable, by = "TF.name") %>% |
|
1407 |
+ dplyr::left_join(GRN@annotation$TFs, by = "TF.name") %>% |
|
1389 | 1408 |
dplyr::select(.data$ENSEMBL, .data$TF.name, tidyselect::all_of(GRN@config$sharedSamples)) |
1390 | 1409 |
|
1391 | 1410 |
# Update available connection types |
... | ... |
@@ -1509,7 +1528,7 @@ importTFData <- function(GRN, data, name, idColumn = "ENSEMBL", nameColumn = "TF |
1509 | 1528 |
futile.logger::flog.info(paste0("Importing external TF data under the name ", name)) |
1510 | 1529 |
|
1511 | 1530 |
# Check whether TF have been added already |
1512 |
- if (is.null(GRN@data$TFs$translationTable)) { |
|
1531 |
+ if (is.null(GRN@annotation$TFs)) { |
|
1513 | 1532 |
message = paste0("No TFBS afound in the object. Make sure to run the function addTFBS first.") |
1514 | 1533 |
.checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE) |
1515 | 1534 |
} |
... | ... |
@@ -1535,10 +1554,10 @@ importTFData <- function(GRN, data, name, idColumn = "ENSEMBL", nameColumn = "TF |
1535 | 1554 |
} |
1536 | 1555 |
|
1537 | 1556 |
# Make sure the column is reset |
1538 |
- GRN@data$TFs$translationTable[[paste0("TF.name.", name)]] = NULL |
|
1557 |
+ GRN@annotation$TFs[[paste0("TF.name.", name)]] = NULL |
|
1539 | 1558 |
|
1540 | 1559 |
# TODO: Repeated execution results in more and more rows |
1541 |
- GRN@data$TFs$translationTable = dplyr::left_join(GRN@data$TFs$translationTable, data[, idColumns], |
|
1560 |
+ GRN@annotation$TFs = dplyr::left_join(GRN@annotation$TFs, data[, idColumns], |
|
1542 | 1561 |
by = "TF.name", suffix = c("", paste0(".", name))) |
1543 | 1562 |
|
1544 | 1563 |
# data = dplyr::select(data, -tidyselect::one_of(nameColumn)) |
... | ... |
@@ -1550,7 +1569,7 @@ importTFData <- function(GRN, data, name, idColumn = "ENSEMBL", nameColumn = "TF |
1550 | 1569 |
countsNorm$ENSEMBL = data$ENSEMBL |
1551 | 1570 |
|
1552 | 1571 |
nRowBefore = nrow(countsNorm) |
1553 |
- countsNorm.subset = dplyr::filter(countsNorm, .data$ENSEMBL %in% GRN@data$TFs$translationTable$ENSEMBL) |
|
1572 |
+ countsNorm.subset = dplyr::filter(countsNorm, .data$ENSEMBL %in% GRN@annotation$TFs$TF.ENSEMBL) |
|
1554 | 1573 |
nRowAfter = nrow(countsNorm.subset) |
1555 | 1574 |
if (nRowAfter == 0) { |
1556 | 1575 |
message = "No rows overlapping with translation table, check ENSEMBL IDs." |
... | ... |
@@ -1627,6 +1646,7 @@ AR_classification_wrapper<- function (GRN, significanceThreshold_Wilcoxon = 0.05 |
1627 | 1646 |
checkmate::assertClass(GRN, "GRN") |
1628 | 1647 |
GRN = .addFunctionLogToObject(GRN) |
1629 | 1648 |
|
1649 |
+ GRN = .makeObjectCompatible(GRN) |
|
1630 | 1650 |
|
1631 | 1651 |
checkmate::assertNumber(significanceThreshold_Wilcoxon, lower = 0, upper = 1) |
1632 | 1652 |
checkmate::assertNumber(plot_minNoTFBS_heatmap, lower = 1) |
... | ... |
@@ -1637,7 +1657,7 @@ AR_classification_wrapper<- function (GRN, significanceThreshold_Wilcoxon = 0.05 |
1637 | 1657 |
|
1638 | 1658 |
outputFolder = .checkOutputFolder(GRN, outputFolder) |
1639 | 1659 |
|
1640 |
- GRN@data$TFs$classification$TF.translation.orig = GRN@data$TFs$translationTable %>% |
|
1660 |
+ GRN@data$TFs$classification$TF.translation.orig = GRN@annotation$TFs %>% |
|
1641 | 1661 |
dplyr::mutate(TF.name = .data$HOCOID) |
1642 | 1662 |
|
1643 | 1663 |
if (is.null(GRN@data$TFs$TF_peak_overlap)) { |
... | ... |
@@ -1695,7 +1715,7 @@ AR_classification_wrapper<- function (GRN, significanceThreshold_Wilcoxon = 0.05 |
1695 | 1715 |
|
1696 | 1716 |
TF_peak_cor = .correlateMatrices(matrix1 = counts1, |
1697 | 1717 |
matrix_peaks = counts_peaks, |
1698 |
- GRN@data$TFs$translationTable, corMethod) |
|
1718 |
+ GRN@annotation$TFs, corMethod) |
|
1699 | 1719 |
|
1700 | 1720 |
peak_TF_overlapCur.df = .filterSortAndShuffle_peakTF_overlapTable(GRN, permutationCur, TF_peak_cor) |
1701 | 1721 |
res.l = .computeForegroundAndBackgroundMatrices(peak_TF_overlapCur.df, TF_peak_cor) |
... | ... |
@@ -1766,7 +1786,7 @@ AR_classification_wrapper<- function (GRN, significanceThreshold_Wilcoxon = 0.05 |
1766 | 1786 |
TF_peak_cor = GRN@data$TFs$classification[[permIndex]] [[connectionTypeCur]]$TF_peak_cor |
1767 | 1787 |
peak_TF_overlapCur.df = .filterSortAndShuffle_peakTF_overlapTable(GRN, permutationCur, TF_peak_cor) |
1768 | 1788 |
.plot_heatmapAR(TF.peakMatrix.df = peak_TF_overlapCur.df, |
1769 |
- HOCOMOCO_mapping.df.exp = GRN@data$TFs$translationTable %>% dplyr::mutate(TF = .data$TF.name), |
|
1789 |
+ HOCOMOCO_mapping.df.exp = GRN@annotation$TFs %>% dplyr::mutate(TF = .data$TF.name), |
|
1770 | 1790 |
sort.cor.m = TF_peak_cor, |
1771 | 1791 |
par.l = GRN@config$parameters, |
1772 | 1792 |
corMethod = corMethod, |
... | ... |
@@ -1860,6 +1880,8 @@ addConnections_TF_peak <- function (GRN, plotDiagnosticPlots = TRUE, plotDetails |
1860 | 1880 |
|
1861 | 1881 |
checkmate::assertClass(GRN, "GRN") |
1862 | 1882 |
GRN = .addFunctionLogToObject(GRN) |
1883 |
+ |
|
1884 |
+ GRN = .makeObjectCompatible(GRN) |
|
1863 | 1885 |
|
1864 | 1886 |
checkmate::assertFlag(plotDiagnosticPlots) |
1865 | 1887 |
checkmate::assertFlag(plotDetails) |
... | ... |
@@ -1981,7 +2003,7 @@ addConnections_TF_peak <- function (GRN, plotDiagnosticPlots = TRUE, plotDetails |
1981 | 2003 |
# Filtering of the matrices happens automatically within the next function |
1982 | 2004 |
peaksCor.m = .correlateMatrices( matrix1= counts_connectionTypeCur, |
1983 | 2005 |
matrix_peaks = counts_peaks, |
1984 |
- GRN@data$TFs$translationTable, |
|
2006 |
+ GRN@annotation$TFs, |
|
1985 | 2007 |
corMethod, |
1986 | 2008 |
whitespacePrefix = " ") |
1987 | 2009 |
|
... | ... |
@@ -2419,6 +2441,7 @@ addConnections_peak_gene <- function(GRN, overlapTypeGene = "TSS", corMethod = " |
2419 | 2441 |
checkmate::assertClass(GRN, "GRN") |
2420 | 2442 |
GRN = .addFunctionLogToObject(GRN) |
2421 | 2443 |
|
2444 |
+ GRN = .makeObjectCompatible(GRN) |
|
2422 | 2445 |
|
2423 | 2446 |
checkmate::assertChoice(overlapTypeGene, c("TSS", "full")) |
2424 | 2447 |
checkmate::assertChoice(corMethod, c("pearson", "spearman")) |
... | ... |
@@ -2688,7 +2711,8 @@ addConnections_peak_gene <- function(GRN, overlapTypeGene = "TSS", corMethod = " |
2688 | 2711 |
# Check whether TAD boundaries overlap and print a warning if so |
2689 | 2712 |
nMultipleOverlaps = .checkSelfOverlap(subject) |
2690 | 2713 |
if (nMultipleOverlaps > 0) { |
2691 |
- futile.logger::flog.warn(paste0(nMultipleOverlaps, " out of ", length(subject), " TADs overlap with at least one other TAD. Please verify whether this is intended or a mistake. Particularly 1bp overlaps may not resembl the truth.")) |
|
2714 |
+ message =paste0(nMultipleOverlaps, " out of ", length(subject), " TADs overlap with at least one other TAD. Please verify whether this is intended or a mistake. Particularly 1bp overlaps may not resembl the truth.") |
|
2715 |
+ .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE) |
|
2692 | 2716 |
} |
2693 | 2717 |
|
2694 | 2718 |
|
... | ... |
@@ -2937,6 +2961,7 @@ addConnections_peak_gene <- function(GRN, overlapTypeGene = "TSS", corMethod = " |
2937 | 2961 |
#' This is one of the main integrative functions of the \code{GRaNIE} package. It has two main functions: |
2938 | 2962 |
#' First, filtering both TF-peak and peak-gene connections according to different criteria such as FDR and other properties |
2939 | 2963 |
#' Second, joining the three major elements that an eGRN consist of (TFs, peaks, genes) into one data frame, with one row per unique TF-peak-gene connection. |
2964 |
+#' \strong{After successful execution, the connections (along with additional feature metadata) can be retrieved with the function \code{\link{getGRNConnections}}.} |
|
2940 | 2965 |
#' \strong{Note that the eGRN graph is reset upon successful execution of this function, and re-running the function \code{\link{build_eGRN_graph}} |
2941 | 2966 |
#' is necessary when any of the network functions of the package shall be executed.} |
2942 | 2967 |
|
... | ... |
@@ -2968,7 +2993,8 @@ addConnections_peak_gene <- function(GRN, overlapTypeGene = "TSS", corMethod = " |
2968 | 2993 |
#' @param silent \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Print progress messages and filter statistics. |
2969 | 2994 |
#' @param filterLoops \code{TRUE} or \code{FALSE}. Default \code{TRUE}. If a TF regulates itself (i.e., the TF and the gene are the same entity), should such loops be filtered from the GRN? |
2970 | 2995 |
#' @template outputFolder |
2971 |
-#' @return An updated \code{\linkS4class{GRN}} object, with additional information added from this function. The filtered and merged TF-peak and peak-gene connections in the slot \code{GRN@connections$all.filtered}. |
|
2996 |
+#' @return An updated \code{\linkS4class{GRN}} object, with additional information added from this function. |
|
2997 |
+#' The filtered and merged TF-peak and peak-gene connections in the slot \code{GRN@connections$all.filtered} and can be retrieved (along with other feature metadata) using the function \code{\link{getGRNConnections}}. |
|
2972 | 2998 |
#' @examples |
2973 | 2999 |
#' # See the Workflow vignette on the GRaNIE website for examples |
2974 | 3000 |
#' GRN = loadExampleObject() |
... | ... |
@@ -2977,6 +3003,7 @@ addConnections_peak_gene <- function(GRN, overlapTypeGene = "TSS", corMethod = " |
2977 | 3003 |
#' @seealso \code{\link{addConnections_TF_peak}} |
2978 | 3004 |
#' @seealso \code{\link{addConnections_peak_gene}} |
2979 | 3005 |
#' @seealso \code{\link{build_eGRN_graph}} |
3006 |
+#' @seealso \code{\link{getGRNConnections}} |
|
2980 | 3007 |
#' @importFrom rlang .data |
2981 | 3008 |
#' @importFrom magrittr `%>%` |
2982 | 3009 |
#' @export |
... | ... |
@@ -3004,6 +3031,8 @@ filterGRNAndConnectGenes <- function(GRN, |
3004 | 3031 |
checkmate::assertClass(GRN, "GRN") |
3005 | 3032 |
GRN = .addFunctionLogToObject(GRN) |
3006 | 3033 |
|
3034 |
+ GRN = .makeObjectCompatible(GRN) |
|
3035 |
+ |
|
3007 | 3036 |
|
3008 | 3037 |
checkmate::assertCharacter(TF_peak.connectionTypes, min.len = 1, any.missing = FALSE) |
3009 | 3038 |
checkmate::assert(checkmate::checkNull(peak_gene.p_raw.threshold), checkmate::checkNumber(peak_gene.p_raw.threshold, lower = 0, upper = 1)) |
... | ... |
@@ -3015,7 +3044,7 @@ filterGRNAndConnectGenes <- function(GRN, |
3015 | 3044 |
checkmate::assertChoice(peak_gene.fdr.method, c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none", "IHW")) |
3016 | 3045 |
checkmate::assert(checkmate::checkNull(peak_gene.IHW.covariate), checkmate::checkCharacter(peak_gene.IHW.covariate, min.chars = 1, len = 1)) |
3017 | 3046 |
|
3018 |
- checkmate::assert(checkIntegerish(peak_gene.IHW.nbins, lower = 1), checkSubset(peak_gene.IHW.nbins, "auto")) |
|
3047 |
+ checkmate::assert(checkmate::checkIntegerish(peak_gene.IHW.nbins, lower = 1), checkmate::checkSubset(peak_gene.IHW.nbins, "auto")) |
|
3019 | 3048 |
|
3020 | 3049 |
checkmate::assertFlag(silent) |
3021 | 3050 |
checkmate::assertChoice(peak_gene.selection, c("all", "closest")) |
... | ... |
@@ -3056,16 +3085,22 @@ filterGRNAndConnectGenes <- function(GRN, |
3056 | 3085 |
.checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE) |
3057 | 3086 |
} |
3058 | 3087 |
|
3088 |
+ # Only select the absolute necessary here, no additional metadata |
|
3089 |
+ ann.gene.red = GRN@annotation$genes %>% |
|
3090 |
+ dplyr::mutate(gene.ENSEMBL = as.character(.data$gene.ENSEMBL)) %>% |
|
3091 |
+ dplyr::select(.data$gene.ENSEMBL, .data$gene.name, .data$gene.type) |
|
3092 |
+ |
|
3059 | 3093 |
peakGeneCorrelations = GRN@connections$peak_genes[[permIndex]] %>% |
3060 | 3094 |
dplyr::mutate(gene.ENSEMBL = as.character(.data$gene.ENSEMBL)) %>% |
3061 |
- dplyr::left_join(dplyr::mutate(GRN@annotation$genes, gene.ENSEMBL = as.character(.data$gene.ENSEMBL)), by = "gene.ENSEMBL") |
|
3062 |
- |
|
3095 |
+ dplyr::left_join(ann.gene.red, by = "gene.ENSEMBL") |
|
3096 |
+ |
|
3097 |
+ |
|
3063 | 3098 |
|
3064 | 3099 |
# Add TF Ensembl IDs |
3065 | 3100 |
grn.filt = GRN@connections$TF_peaks[[permIndex]]$main %>% |
3066 | 3101 |
tibble::as_tibble() %>% |
3067 |
- dplyr::left_join(GRN@data$TFs$translationTable, by = c("TF.name")) %>% |
|
3068 |
- dplyr::select(-.data$HOCOID, -.data$ENSEMBL) %>% |
|
3102 |
+ dplyr::left_join(GRN@annotation$TFs %>% dplyr::select(.data$TF.name, .data$TF.ENSEMBL), by = c("TF.name")) %>% |
|
3103 |
+ dplyr::select(-.data$TF_peak.fdr_orig) %>% |
|
3069 | 3104 |
dplyr::mutate(TF.ENSEMBL = as.factor(.data$TF.ENSEMBL)) |
3070 | 3105 |
|
3071 | 3106 |
if (is.null(grn.filt)) { |
... | ... |
@@ -3172,20 +3207,28 @@ filterGRNAndConnectGenes <- function(GRN, |
3172 | 3207 |
if (!is.null(filterTFs)) { |
3173 | 3208 |
futile.logger::flog.info(paste0(" Filter network to the following TF: ", paste0(filterTFs, collapse = ","))) |
3174 | 3209 |
futile.logger::flog.info(paste0(" Number of TF-peak rows before filtering TFs: ", nrow(grn.filt))) |
3175 |
- grn.filt = dplyr::filter(grn.filt, .data$TF %in% filterTFs) |
|
3210 |
+ grn.filt = dplyr::filter(grn.filt, .data$TF.name %in% filterTFs) |
|
3176 | 3211 |
futile.logger::flog.info(paste0(" Number of TF-peak rows after filtering TFs: ", nrow(grn.filt))) |
3177 | 3212 |
} |
3178 | 3213 |
|
3179 | 3214 |
if (!is.null(filterPeaks)) { |
3180 | 3215 |
futile.logger::flog.info(paste0(" Filter network to the following peaks: ", paste0(filterPeaks, collapse = ","))) |
3181 | 3216 |
futile.logger::flog.info(paste0(" Number of TF-peak rows before filtering peaks: ", nrow(grn.filt))) |
3182 |
- grn.filt = dplyr::filter(grn.filt, .data$peakID %in% filterPeaks) |
|
3217 |
+ grn.filt = dplyr::filter(grn.filt, .data$peak.ID %in% filterPeaks) |
|
3183 | 3218 |
futile.logger::flog.info(paste0(" Number of TF-peak rows after filtering peaks: ", nrow(grn.filt))) |
3184 | 3219 |
} |
3185 | 3220 |
|
3186 | 3221 |
# Filters on peak-genes |
3187 | 3222 |
|
3188 | 3223 |
futile.logger::flog.info("2. Filter peak-gene connections") |
3224 |
+ |
|
3225 |
+ if (!is.null(filterGenes)) { |
|
3226 |
+ futile.logger::flog.info(paste0(" Filter peak-gene connections for the following genes: ", paste0(filterGenes, collapse = ","))) |
|
3227 |
+ futile.logger::flog.info(paste0(" Number of rows before filtering genes: ", nrow(peakGeneCorrelations))) |
|
3228 |
+ peakGeneCorrelations = dplyr::filter(peakGeneCorrelations, .data$gene.ENSEMBL %in% filterGenes) |
|
3229 |
+ futile.logger::flog.info(paste0(" Number of rows after filtering genes: ", nrow(peakGeneCorrelations))) |
|
3230 |
+ } |
|
3231 |
+ |
|
3189 | 3232 |
if (!is.null(peak_gene.maxDistance)) { |
3190 | 3233 |
checkmate::assertIntegerish(peak_gene.maxDistance, lower = 0) |
3191 | 3234 |
futile.logger::flog.info(paste0(" Filter peak-gene connections for their distance and keep only connections with a maximum distance of ", peak_gene.maxDistance)) |
... | ... |
@@ -3201,6 +3244,8 @@ filterGRNAndConnectGenes <- function(GRN, |
3201 | 3244 |
futile.logger::flog.info(paste0(" Number of peak-gene rows after filtering by gene type: ", nrow(peakGeneCorrelations))) |
3202 | 3245 |
} |
3203 | 3246 |
|
3247 |
+ |
|
3248 |
+ |
|
3204 | 3249 |
|
3205 | 3250 |
futile.logger::flog.info(paste0("3. Merging TF-peak with peak-gene connections and filter the combined table...")) |
3206 | 3251 |
# Now we need the connected genes. All fitters that are independent of that have been done |
... | ... |
@@ -3213,7 +3258,7 @@ filterGRNAndConnectGenes <- function(GRN, |
3213 | 3258 |
} |
3214 | 3259 |
|
3215 | 3260 |
|
3216 |
- futile.logger::flog.info(paste0("Inital number of rows left before all filtering steps: ", nrow(grn.filt))) |
|
3261 |
+ futile.logger::flog.info(paste0("Inital number of rows left before filtering steps: ", nrow(grn.filt))) |
|
3217 | 3262 |
|
3218 | 3263 |
if (filterLoops) { |
3219 | 3264 |
futile.logger::flog.info(paste0(" Filter TF-TF self-loops")) |
... | ... |
@@ -3227,12 +3272,7 @@ filterGRNAndConnectGenes <- function(GRN, |
3227 | 3272 |
futile.logger::flog.info(paste0(" Number of rows after filtering genes: ", nrow(grn.filt))) |
3228 | 3273 |
} |
3229 | 3274 |
|
3230 |
- if (!is.null(filterGenes)) { |
|
3231 |
- futile.logger::flog.info(paste0(" Filter network to the following genes: ", paste0(filterGenes, collapse = ","))) |
|
3232 |
- futile.logger::flog.info(paste0(" Number of rows before filtering genes: ", nrow(grn.filt))) |
|
3233 |
- grn.filt = dplyr::filter(grn.filt, .data$gene.ENSEMBL %in% filterGenes) |
|
3234 |
- futile.logger::flog.info(paste0(" Number of rows after filtering genes: ", nrow(grn.filt))) |
|
3235 |
- } |
|
3275 |
+ |
|
3236 | 3276 |
|
3237 | 3277 |
if (allowMissingGenes) { |
3238 | 3278 |
# Nothing to do here |
... | ... |
@@ -3359,13 +3399,12 @@ filterGRNAndConnectGenes <- function(GRN, |
3359 | 3399 |
|
3360 | 3400 |
|
3361 | 3401 |
grn.filt = grn.filt %>% |
3362 |
- dplyr::left_join(GRN@annotation$consensusPeaks, by = "peak.ID") %>% |
|
3363 |
- dplyr::select(dplyr::starts_with("TF."), |
|
3364 |
- dplyr::starts_with("TF_peak."), |
|
3365 |
- dplyr::starts_with("peak."), |
|
3366 |
- dplyr::starts_with("peak_gene."), |
|
3367 |
- dplyr::starts_with("gene."), |
|
3368 |
- -dplyr::starts_with("peak.gene.")) %>% # these are the annotation columns by ChipSeeker that are confusing |
|
3402 |
+ dplyr::select(tidyselect::starts_with("TF."), |
|
3403 |
+ tidyselect::starts_with("TF_peak."), |
|
3404 |
+ tidyselect::starts_with("peak."), |
|
3405 |
+ tidyselect::starts_with("peak_gene."), |
|
3406 |
+ tidyselect::starts_with("gene."), |
|
3407 |
+ tidyselect::everything()) %>% |
|
3369 | 3408 |
dplyr::mutate(peak.ID = as.factor(.data$peak.ID), |
3370 | 3409 |
gene.ENSEMBL = as.factor(.data$gene.ENSEMBL), |
3371 | 3410 |
TF.name = as.factor(.data$TF.name)) |
... | ... |
@@ -3461,7 +3500,7 @@ filterGRNAndConnectGenes <- function(GRN, |
3461 | 3500 |
|
3462 | 3501 |
# We can compare this to the result of applying the method of Benjamini and Hochberg to the p-values only: |
3463 | 3502 |
|
3464 |
- padj_bh <- stats::p.adjust(pvalues, method = "BH") |
|
3503 |
+ padj_bh = stats::p.adjust(pvalues, method = "BH") |
|
3465 | 3504 |
rejectionsBH = sum(padj_bh <= alpha, na.rm = TRUE) |
3466 | 3505 |
rejectionsIHW = IHW::rejections(ihw_res) |
3467 | 3506 |
|
... | ... |
@@ -3580,6 +3619,9 @@ add_TF_gene_correlation <- function(GRN, corMethod = "pearson", addRobustRegress |
3580 | 3619 |
|
3581 | 3620 |
checkmate::assertClass(GRN, "GRN") |
3582 | 3621 |
GRN = .addFunctionLogToObject(GRN) |
3622 |
+ |
|
3623 |
+ GRN = .makeObjectCompatible(GRN) |
|
3624 |
+ |
|
3583 | 3625 |
checkmate::assertChoice(corMethod, c("pearson", "spearman")) |
3584 | 3626 |
checkmate::assertFlag(addRobustRegression) |
3585 | 3627 |
checkmate::assertIntegerish(nCores, lower = 1) |
... | ... |
@@ -3594,10 +3636,7 @@ add_TF_gene_correlation <- function(GRN, corMethod = "pearson", addRobustRegress |
3594 | 3636 |
|
3595 | 3637 |
GRN@connections$TF_genes.filtered = list() |
3596 | 3638 |
|
3597 |
- if (is.null(GRN@connections$all.filtered)) { |
|
3598 |
- message = "Could not find merged and filtered connections. Run the function filterGRNAndConnectGenes first." |
|
3599 |
- .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE) |
|
3600 |
- } |
|
3639 |
+ .checkExistanceFilteredConnections(GRN) |
|
3601 | 3640 |
|
3602 | 3641 |
futile.logger::flog.info(paste0("Calculate correlations for TF and genes from the filtered set of connections")) |
3603 | 3642 |
|
... | ... |
@@ -3611,7 +3650,7 @@ add_TF_gene_correlation <- function(GRN, corMethod = "pearson", addRobustRegress |
3611 | 3650 |
dplyr::filter(!is.na(.data$gene.ENSEMBL)) %>% |
3612 | 3651 |
dplyr::select(.data$TF.name, .data$gene.ENSEMBL) %>% |
3613 | 3652 |
dplyr::distinct() %>% |
3614 |
- dplyr::left_join(GRN@data$TFs$translationTable, by = c("TF.name"), suffix = c("", ".transl")) # %>% |
|
3653 |
+ dplyr::left_join(GRN@annotation$TFs, by = c("TF.name"), suffix = c("", ".transl")) # %>% |
|
3615 | 3654 |
# dplyr::distinct(ENSEMBL, ENSEMBL.transl) |
3616 | 3655 |
# TODO: Improve: Only loop over distinct ENSMBL_TF and ENSEMBL_gene pairs |
3617 | 3656 |
|
... | ... |
@@ -3658,7 +3697,7 @@ add_TF_gene_correlation <- function(GRN, corMethod = "pearson", addRobustRegress |
3658 | 3697 |
dplyr::mutate(TF.ENSEMBL = getCounts(GRN, type = "rna", norm = TRUE, permuted = as.logical(permutationCur))$ENSEMBL[map_TF], |
3659 | 3698 |
gene.ENSEMBL = getCounts(GRN, type = "rna", norm = TRUE, permuted = as.logical(permutationCur))$ENSEMBL[map_gene]) %>% |
3660 | 3699 |
dplyr::filter(!is.na(.data$gene.ENSEMBL), !is.na(.data$TF.ENSEMBL)) %>% # For some peak-gene combinations, no RNA-Seq data was available, these NAs are filtered |
3661 |
- dplyr::left_join(GRN@data$TFs$translationTable, by = c("TF.ENSEMBL")) %>% |
|
3700 |
+ dplyr::left_join(GRN@annotation$TFs, by = c("TF.ENSEMBL")) %>% |
|
3662 | 3701 |
dplyr::select(tidyselect::all_of(selectColumns))) %>% |
3663 | 3702 |
dplyr::mutate(gene.ENSEMBL = as.factor(.data$gene.ENSEMBL), |
3664 | 3703 |
TF.ENSEMBL = as.factor(.data$TF.ENSEMBL), |
... | ... |
@@ -3677,8 +3716,9 @@ add_TF_gene_correlation <- function(GRN, corMethod = "pearson", addRobustRegress |
3677 | 3716 |
} |
3678 | 3717 |
|
3679 | 3718 |
} else { |
3680 |
- futile.logger::flog.warn(paste0(" Nothing to do, skip.")) |
|
3681 |
- |
|
3719 |
+ message =paste0(" Nothing to do, skip.") |
|
3720 |
+ .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE) |
|
3721 |
+ |
|
3682 | 3722 |
if (addRobustRegression) { |
3683 | 3723 |
res.df = tibble::tribble(~TF.name, ~TF.ENSEMBL, ~gene.ENSEMBL, ~TF_gene.r, ~TF_gene.p_raw, ~TF_gene.p_raw.robust, |
3684 | 3724 |
~TF_gene.r_robust, ~TF_gene.bias_M_p.raw, ~TF_gene.bias_LS_p.raw) |
... | ... |
@@ -3732,7 +3772,8 @@ addSNPOverlap <- function(grn, SNPData, col_chr = "chr", col_pos = "pos", col_pe |
3732 | 3772 |
if (genomeAssembly_SNP %in% c("hg19", "hg38")) { |
3733 | 3773 |
index_chr23 = which(grepl("chr23", SNPData[, col_chr])) |
3734 | 3774 |
if (length(index_chr23) > 0) { |
3735 |
- futile.logger::flog.warn(paste0(" Chr23 found in SNP data. Renaming to chrX")) |
|
3775 |
+ message =paste0(" Chr23 found in SNP data. Renaming to chrX") |
|
3776 |
+ .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE) |
|
3736 | 3777 |
SNPData[index_chr23, col_chr] = stringr::str_replace(SNPData[index_chr23, col_chr], "chr23", "chrX") |
3737 | 3778 |
} |
3738 | 3779 |
} |
... | ... |
@@ -3857,6 +3898,8 @@ generateStatsSummary <- function(GRN, |
3857 | 3898 |
|
3858 | 3899 |
checkmate::assertClass(GRN, "GRN") |
3859 | 3900 |
GRN = .addFunctionLogToObject(GRN) |
3901 |
+ |
|
3902 |
+ GRN = .makeObjectCompatible(GRN) |
|
3860 | 3903 |
|
3861 | 3904 |
checkmate::assertNumeric(TF_peak.fdr, lower = 0, upper = 1, min.len = 1) |
3862 | 3905 |
checkmate::assertSubset(TF_peak.connectionTypes, c("all", GRN@config$TF_peak_connectionTypes), empty.ok = FALSE) |
... | ... |
@@ -4172,8 +4215,7 @@ loadExampleObject <- function(forceDownload = FALSE, fileURL = "https://www.embl |
4172 | 4215 |
GRN@config$directories$motifFolder = "." |
4173 | 4216 |
GRN@config$files$output_log = "GRN.log" |
4174 | 4217 |
|
4175 |
- # Temporary fix: Replace lincRNA with lncRNA due to a recent change in biomaRt until we update the object directly |
|
4176 |
- GRN@annotation$genes = dplyr::mutate(GRN@annotation$genes, gene.type = dplyr::recode_factor(.data$gene.type, lncRNA = "lincRNA")) |
|
4218 |
+ GRN = .makeObjectCompatible(GRN) |
|
4177 | 4219 |
|
4178 | 4220 |
GRN |
4179 | 4221 |
|
... | ... |
@@ -4200,7 +4242,9 @@ loadExampleObject <- function(forceDownload = FALSE, fileURL = "https://www.embl |
4200 | 4242 |
getCounts <- function(GRN, type, norm, permuted = FALSE) { |
4201 | 4243 |
|
4202 | 4244 |
checkmate::assertClass(GRN, "GRN") |
4203 |
- GRN = .addFunctionLogToObject(GRN) |
|
4245 |
+ GRN = .addFunctionLogToObject(GRN) |
|
4246 |
+ |
|
4247 |
+ GRN = .makeObjectCompatible(GRN) |
|
4204 | 4248 |
|
4205 | 4249 |
checkmate::assertChoice(type, c("peaks", "rna")) |
4206 | 4250 |
checkmate::assertFlag(norm)# |
... | ... |
@@ -4248,7 +4292,7 @@ getCounts <- function(GRN, type, norm, permuted = FALSE) { |
4248 | 4292 |
} else { # permutation 1 |
4249 | 4293 |
|
4250 | 4294 |
if (type == "peaks") { |
4251 |
- message = "Could not find permuted peak counts in GRN object. Peaks are not stored as pemruted, set permuted = FALSE for type = \"peaks\"" |
|
4295 |
+ message = "Could not find permuted peak counts in GRN object. Peaks are not stored as permuted, set permuted = FALSE for type = \"peaks\"" |
|
4252 | 4296 |
.checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE) |
4253 | 4297 |
} else { |
4254 | 4298 |
|
... | ... |
@@ -4283,52 +4327,123 @@ getCounts <- function(GRN, type, norm, permuted = FALSE) { |
4283 | 4327 |
#' @export |
4284 | 4328 |
#' @template GRN |
4285 | 4329 |
#' @template permuted |
4286 |
-#' @param type Character. One of \code{TF_peaks}, \code{peak_genes}, \code{all.filtered}. Default \code{all.filtered}. The type of connections to retrieve. |
|
4287 |
-#' @param include_TF_gene_correlations Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should TFs and gene correlations be returned as well? If set to \code{TRUE}, they must have been computed beforehand with \code{\link{add_TF_gene_correlation}}. |
|
4330 |
+#' @param type Character. One of \code{TF_peaks}, \code{peak_genes}, \code{TF_genes} or \code{all.filtered}. Default \code{all.filtered}. The type of connections to retrieve. |
|
4331 |
+#' @param include_TF_gene_correlations Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should TFs and gene correlations be returned as well? If set to \code{TRUE}, they must have been computed beforehand with \code{\link{add_TF_gene_correlation}}. Only relevant for type = "all.filtered" |
|
4332 |
+#' @param include_TFMetadata Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should TF metadata be returned as well? Only relevant for type = "all.filtered" |
|
4333 |
+#' @param include_peakMetadata Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should peak metadata be returned as well? Only relevant for type = "all.filtered" |
|
4334 |
+#' @param include_geneMetadata Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should gene metadata be returned as well? Only relevant for type = "all.filtered" |
|
4335 |
+#' @param include_variancePartitionResults Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. |
|
4336 |
+#' Should the results from the function \code{\link{add_featureVariation}} be included? |
|
4337 |
+#' If set to \code{TRUE}, they must have been computed beforehand with \code{\link{add_featureVariation}}; otherwise, an error is thrown. |
|
4338 |
+#' Only relevant for type = "all.filtered" |
|
4288 | 4339 |
#' @return A data frame with the requested connections. This function does **NOT** return a \code{\linkS4class{GRN}} object. |
4289 | 4340 |
#' @seealso \code{\link{filterGRNAndConnectGenes}} |
4341 |
+#' @seealso \code{\link{add_featureVariation}} |
|
4342 |
+#' @seealso \code{\link{add_TF_gene_correlation}} |
|
4290 | 4343 |
#' @examples |
4291 | 4344 |
#' # See the Workflow vignette on the GRaNIE website for examples |
4292 | 4345 |
#' GRN = loadExampleObject() |
4293 | 4346 |
#' GRN_con.all.df = getGRNConnections(GRN) |
4294 |
-getGRNConnections <- function(GRN, type = "all.filtered", permuted = FALSE, include_TF_gene_correlations = FALSE) { |
|
4347 |
+getGRNConnections <- function(GRN, type = "all.filtered", permuted = FALSE, |
|
4348 |
+ include_TF_gene_correlations = FALSE, |
|
4349 |
+ include_TFMetadata = FALSE, |
|
4350 |
+ include_peakMetadata = FALSE, |
|
4351 |
+ include_geneMetadata = FALSE, |
|
4352 |
+ include_variancePartitionResults = FALSE) { |
|
4295 | 4353 |
|
4296 | 4354 |
checkmate::assertClass(GRN, "GRN") |
4297 | 4355 |
GRN = .addFunctionLogToObject(GRN) |
4298 | 4356 |
|
4357 |
+ GRN = .makeObjectCompatible(GRN) |
|
4299 | 4358 |
|
4300 |
- checkmate::assertChoice(type, c("TF_peaks", "peak_genes", "all.filtered")) |
|
4359 |
+ checkmate::assertChoice(type, c("TF_peaks", "peak_genes", "TF_genes", "all.filtered")) |
|
4301 | 4360 |
checkmate::assertFlag(permuted) |
4302 | 4361 |
#checkmate::assertIntegerish(permutation, lower = 0, upper = .getMaxPermutation(GRN)) |
4303 | 4362 |
checkmate::assertFlag(include_TF_gene_correlations) |
4363 |
+ checkmate::assertFlag(include_variancePartitionResults) |
|
4364 |
+ checkmate::assertFlag(include_TFMetadata) |
|
4365 |
+ checkmate::assertFlag(include_peakMetadata) |
|
4366 |
+ checkmate::assertFlag(include_geneMetadata) |
|
4304 | 4367 |
|
4305 | 4368 |
permIndex = dplyr::if_else(permuted, "1", "0") |
4306 | 4369 |
|
4307 | 4370 |
if (type == "all.filtered") { |
4371 |
+ |
|
4372 |
+ merged.df = GRN@connections$all.filtered[[permIndex]] |
|
4373 |
+ |
|
4374 |
+ if (include_TF_gene_correlations) { |
|
4375 |
+ |
|
4376 |
+ if (is.null(GRN@connections$TF_genes.filtered)) { |
|
4377 |
+ message = "Please run the function add_TF_gene_correlation first. " |
|
4378 |
+ .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE) |
|
4379 |
+ } |
|
4380 |
+ |
|
4381 |
+ # Merge with TF-gene table |
|
4382 |
+ merged.df = merged.df %>% |
|
4383 |
+ dplyr::left_join(GRN@connections$TF_genes.filtered[[permIndex]], |
|
4384 |
+ by = c("TF.name", "TF.ENSEMBL", "gene.ENSEMBL")) |
|
4385 |
+ } |
|
4308 | 4386 |
|
4387 |
+ if (include_variancePartitionResults) { |
|
4388 |
+ |
|
4389 |
+ if (ncol(GRN@annotation$genes %>% dplyr::select(tidyselect::starts_with("variancePartition"))) == 0 | |
|
4390 |
+ ncol(GRN@annotation$consensusPeaks %>% dplyr::select(tidyselect::starts_with("variancePartition"))) == 0) { |
|
4391 |
+ message = "Please run the function add_featureVariation first. " |
|
4392 |
+ .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE) |
|
4393 |
+ } |
|
4394 |
+ |
|
4395 |
+ merged.df = merged.df %>% |
|
4396 |
+ dplyr::left_join(GRN@annotation$TFs %>% |
|
4397 |
+ dplyr::select(.data$TF.ENSEMBL, tidyselect::starts_with("TF.variancePartition")), |
|
4398 |
+ by = "TF.ENSEMBL") %>% |
|
4399 |
+ dplyr::left_join(GRN@annotation$genes %>% |
|
4400 |
+ dplyr::select(.data$gene.ENSEMBL, tidyselect::starts_with("gene.variancePartition")), |
|
4401 |
+ by = "gene.ENSEMBL") %>% |
|
4402 |
+ dplyr::left_join(GRN@annotation$consensusPeaks %>% |
|
4403 |
+ dplyr::select(.data$peak.ID, tidyselect::starts_with("peak.variancePartition")), |
|
4404 |
+ by = "peak.ID") |
|
4405 |
+ |
|
4406 |
+ } |
|
4309 | 4407 |
|
4310 |
- if (include_TF_gene_correlations) { |
|
4311 |
- |
|
4312 |
- if (is.null(GRN@connections$TF_genes.filtered)) { |
|
4313 |
- message = "Please run the function add_TF_gene_correlation first. " |
|
4314 |
- .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE) |
|
4315 |
- } |
|
4316 |
- |
|
4317 |
- # Merge with TF-gene table |
|
4318 |
- merged.df = dplyr::left_join(GRN@connections$all.filtered[[permIndex]], |
|
4319 |
- GRN@connections$TF_genes.filtered[[permIndex]], |
|
4320 |
- by = c("TF.name", "TF.ENSEMBL", "gene.ENSEMBL")) %>% |
|
4321 |
- dplyr::select(tidyselect::starts_with("TF."), tidyselect::everything()) %>% |
|
4322 |
- tibble::as_tibble() |
|
4323 |
- |
|
4324 |
- |
|
4325 |
- return(merged.df) |
|
4326 |
- |
|
4327 |
- } else { |
|
4328 |
- return(GRN@connections$all.filtered[[permIndex]]) |
|
4408 |
+ if (include_TFMetadata) { |
|
4409 |
+ |
|
4410 |
+ colsMissing = setdiff(colnames(GRN@annotation$TFs), colnames(merged.df)) |
|
4411 |
+ if (length(colsMissing) > 0) { |
|
4412 |
+ merged.df = merged.df %>% |
|
4413 |
+ dplyr::left_join(GRN@annotation$TFs, by = c("TF.name", "TF.ENSEMBL")) |
|
4414 |
+ } |
|
4329 | 4415 |
} |
4330 | 4416 |
|
4417 |
+ if (include_peakMetadata) { |
|
4418 |
+ |
|
4419 |
+ colsMissing = setdiff(colnames(GRN@annotation$consensusPeaks), colnames(merged.df)) |
|
4420 |
+ if (length(colsMissing) > 0) { |
|
4421 |
+ merged.df = merged.df %>% |
|
4422 |
+ dplyr::left_join(GRN@annotation$consensusPeaks, by = "peak.ID") |
|
4423 |
+ } |
|
4424 |
+ } |
|
4425 |
+ |
|
4426 |
+ if (include_geneMetadata) { |
|
4427 |
+ |
|
4428 |
+ colsMissing = setdiff(colnames(GRN@annotation$genes), colnames(merged.df)) |
|
4429 |
+ if (length(colsMissing) > 0) { |
|
4430 |
+ merged.df = merged.df %>% |
|
4431 |
+ dplyr::left_join(GRN@annotation$genes, by = "gene.ENSEMBL") |
|
4432 |
+ } |
|
4433 |
+ } |
|
4434 |
+ |
|
4331 | 4435 |
|
4436 |
+ merged.df = merged.df %>% |
|
4437 |
+ dplyr::select(tidyselect::starts_with("TF."), |
|
4438 |
+ tidyselect::starts_with("peak."), |
|
4439 |
+ tidyselect::starts_with("TF_peak."), |
|
4440 |
+ tidyselect::starts_with("gene."), |
|
4441 |
+ tidyselect::starts_with("peak_gene."), |
|
4442 |
+ tidyselect::starts_with("TF_gene."), |
|
4443 |
+ tidyselect::everything()) %>% |
|
4444 |
+ tibble::as_tibble() |
|
4445 |
+ |
|
4446 |
+ return(merged.df) |
|
4332 | 4447 |
|
4333 | 4448 |
} else if (type == "TF_peaks") { |
4334 | 4449 |
|
... | ... |
@@ -4338,6 +4453,15 @@ getGRNConnections <- function(GRN, type = "all.filtered", permuted = FALSE, inc |
4338 | 4453 |
|
4339 | 4454 |
return(tibble::as_tibble(GRN@connections$peak_genes[[permIndex]])) |
4340 | 4455 |
|
4456 |
+ } else if (type == "TF_genes") { |
|
4457 |
+ |
|
4458 |
+ if (is.null(GRN@connections$TF_genes.filtered)) { |
|
4459 |
+ message = "Please run the function add_TF_gene_correlation first. " |
|
4460 |
+ .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE) |
|
4461 |
+ } |
|
4462 |
+ |
|
4463 |
+ return(tibble::as_tibble(GRN@connections$TF_genes.filtered[[permIndex]])) |
|
4464 |
+ |
|
4341 | 4465 |
} |
4342 | 4466 |
|
4343 | 4467 |
} |
... | ... |
@@ -4374,6 +4498,10 @@ getGRNConnections <- function(GRN, type = "all.filtered", permuted = FALSE, inc |
4374 | 4498 |
} else { |
4375 | 4499 |
GRN@config$directories$output_plots = gsub("\\", "/", GRN@config$directories$output_plots, fixed = TRUE) |
4376 | 4500 |
} |
4501 |
+ |
|
4502 |
+ if (!endsWith(GRN@config$directories$output_plots, .Platform$file.sep)) { |
|
4503 |
+ GRN@config$directories$output_plots = paste0(GRN@config$directories$output_plots, .Platform$file.sep) |
|
4504 |
+ } |
|
4377 | 4505 |
|
4378 | 4506 |
if (!dir.exists(GRN@config$directories$output_plots)) { |
4379 | 4507 |
dir.create(GRN@config$directories$output_plots, recursive = TRUE) |
... | ... |
@@ -4382,6 +4510,8 @@ getGRNConnections <- function(GRN, type = "all.filtered", permuted = FALSE, inc |
4382 | 4510 |
|
4383 | 4511 |
dplyr::if_else(is.null(outputFolder), GRN@config$directories$output_plots, outputFolder) |
4384 | 4512 |
|
4513 |
+ |
|
4514 |
+ |
|
4385 | 4515 |
} |
4386 | 4516 |
|
4387 | 4517 |
|
... | ... |
@@ -4488,6 +4618,10 @@ getGRNConnections <- function(GRN, type = "all.filtered", permuted = FALSE, inc |
4488 | 4618 |
getParameters <- function (GRN, type = "parameter", name = "all") { |
4489 | 4619 |
|
4490 | 4620 |
checkmate::assertClass(GRN, "GRN") |
4621 |
+ GRN = .addFunctionLogToObject(GRN) |
|
4622 |
+ |
|
4623 |
+ GRN = .makeObjectCompatible(GRN) |
|
4624 |
+ |
|
4491 | 4625 |
checkmate::assertChoice(type, c("function", "parameter")) |
4492 | 4626 |
|
4493 | 4627 |
if (type == "function") { |
... | ... |
@@ -4542,6 +4676,7 @@ deleteIntermediateData <- function(GRN) { |
4542 | 4676 |
checkmate::assertClass(GRN, "GRN") |
4543 | 4677 |
GRN = .addFunctionLogToObject(GRN) |
4544 | 4678 |
|
4679 |
+ GRN = .makeObjectCompatible(GRN) |
|
4545 | 4680 |
|
4546 | 4681 |
for (permutationCur in 0:.getMaxPermutation(GRN)) { |
4547 | 4682 |
|
... | ... |
@@ -4589,8 +4724,8 @@ getBasic_metadata_visualization <- function(GRN, forceRerun = FALSE) { |
4589 | 4724 |
dplyr::mutate(ENSEMBL_ID = gsub("\\..+", "", .data$ENSEMBL_ID, perl = TRUE), |
4590 | 4725 |
baseMean_log = log2(baseMean + 0.01)) |
4591 | 4726 |
|
4592 |
- expression_TF.df = dplyr::filter(expression.df, .data$ENSEMBL_ID %in% GRN@data$TFs$translationTable$ENSEMBL) %>% |
|
4593 |
- dplyr::left_join(GRN@data$TFs$translationTable, by = c("ENSEMBL_ID" = "ENSEMBL")) |
|
4727 |
+ expression_TF.df = dplyr::filter(expression.df, .data$ENSEMBL_ID %in% GRN@annotation$TFs$TF.ENSEMBL) %>% |
|
4728 |
+ dplyr::left_join(GRN@annotation$TFs, by = c("ENSEMBL_ID" = "TF.ENSEMBL")) |
|
4594 | 4729 |
|
4595 | 4730 |
meanPeaks.df = tibble::tibble(peakID = getCounts(GRN, type = "peaks", norm = TRUE, permuted = FALSE)$peakID, |
4596 | 4731 |
mean = rowMeans(dplyr::select(getCounts(GRN, type = "peaks", norm = TRUE, permuted = FALSE), -.data$peakID))) %>% |
... | ... |
@@ -4617,14 +4752,23 @@ getBasic_metadata_visualization <- function(GRN, forceRerun = FALSE) { |
4617 | 4752 |
#' GRN = changeOutputDirectory(GRN, outputDirectory = ".") |
4618 | 4753 |
changeOutputDirectory <- function(GRN, outputDirectory = ".") { |
4619 | 4754 |
|
4620 |
- GRN@config$directories$outputRoot = outputDirectory |
|
4621 |
- GRN@config$directories$output_plots = paste0(outputDirectory, .Platform$file.sep, "plots", .Platform$file.sep) |
|
4622 |
- GRN@config$files$output_log = paste0(outputDirectory, .Platform$file.sep, "GRN.log") |
|
4755 |
+ start = Sys.time() |
|
4623 | 4756 |
|
4624 |
- futile.logger::flog.info(paste0("Output directory changed in the object to " , outputDirectory)) |
|
4625 |
- |
|
4626 |
- |
|
4627 |
- GRN |
|
4757 |
+ checkmate::assertClass(GRN, "GRN") |
|
4758 |
+ GRN = .addFunctionLogToObject(GRN) |
|
4759 |
+ |
|
4760 |
+ GRN = .makeObjectCompatible(GRN) |
|
4761 |
+ |
|
4762 |
+ checkmate::assertCharacter(outputDirectory, len = 1, min.chars = 1) |
|
4763 |
+ |
|
4764 |
+ GRN@config$directories$outputRoot = outputDirectory |
|
4765 |
+ GRN@config$directories$output_plots = paste0(outputDirectory, .Platform$file.sep, "plots", .Platform$file.sep) |
|
4766 |
+ GRN@config$files$output_log = paste0(outputDirectory, .Platform$file.sep, "GRN.log") |
|
4767 |
+ |
|
4768 |
+ futile.logger::flog.info(paste0("Output directory changed in the object to " , outputDirectory)) |
|
4769 |
+ |
|
4770 |
+ |
|
4771 |
+ GRN |
|
4628 | 4772 |
|
4629 | 4773 |
} |
4630 | 4774 |
|
... | ... |
@@ -4735,3 +4879,249 @@ changeOutputDirectory <- function(GRN, outputDirectory = ".") { |
4735 | 4879 |
paste0(names(tbl_freq), " (", tbl_freq, ")") |
4736 | 4880 |
} |
4737 | 4881 |
|
4882 |
+######## Quantify source of variation ######## |
|
4883 |
+ |
|
4884 |
+#' Quantify and interpret multiple sources of biological and technical variation for features (TFs, peaks, and genes) in a \code{\linkS4class{GRN}} object |
|
4885 |
+#' |
|
4886 |
+#' Runs the main function \code{fitExtractVarPartModel} of the package \code{variancePartition}: Fits a linear (mixed) model to estimate contribution of multiple sources of variation while simultaneously correcting for all other variables for the features in a GRN object (TFs, peaks, and genes) given particular metadata. The function reports the fraction of variance attributable to each metadata variable. |
|
4887 |
+#' \strong{Note: The results are not added to \code{GRN@connections$all.filtered}, rerun the function \code{\link{getGRNConnections}} and set \code{include_variancePartitionResults} to \code{TRUE} to do so}. |
|
4888 |
+#' The results object is stored in \code{GRN@stats$variancePartition} and can be used for the various diagnostic and plotting functions from \code{variancePartition}. |
|
4889 |
+#' |
|
4890 |
+#' The normalized count matrices are used as input for \code{fitExtractVarPartModel}. |
|
4891 |
+#' |
|
4892 |
+#' @template GRN |
|
4893 |
+#' @param formula Character(1). Either \code{auto} or a manually defined formula to be used for the model fitting. Default \code{auto}. Must include only terms that are part of the metadata as specified with the \code{metadata} parameter. If set to \code{auto}, the formula will be build automatically based on all metadata variables as specified with the \code{metadata} parameter. By default, numerical variables will be modeled as fixed effects, while variables that are defined as factors or can be converted to factors (characters and logical variables) are modeled as random effects as recommended by the \code{variancePartition} package. |
|
4894 |
+#' @param metadata Character vector. Default \code{all}. Vector of column names from the metadata data frame that was provided when using the function |
|
4895 |
+#' \code{\link{addData}}. Must either contain the special keyword \code{all} to denote that all (!) metadata columns from \code{GRN@data$metadata} are taken |
|
4896 |
+#' or if not, a subset of the column names from \code{GRN@data$metadata}to include in the model fitting for \code{fitExtractVarPartModel}.. |
|
4897 |
+#' @param features Character(1). Either \code{all_filtered} or \code{all}. Default \code{all_filtered}. Should \code{variancePartition} only be run for the features (TFs, peaks and genes) from the filtered set of connections (the result of \code{\link{filterGRNAndConnectGenes}}) or for all genes that are defined in the object? If set to \code{all}, the running time is greatly increased. |
|
4898 |
+#' @template nCores |
|
4899 |
+#' @template forceRerun |
|
4900 |
+#' @param ... Additional parameters passed on to \code{variancePartition::fitExtractVarPartModel} beyond \code{exprObj}, \code{formula} and \code{data}. See the function help for more information |
|
4901 |
+# #' @seealso \code{\link{plotDiagnosticPlots_featureVariation}} |
|
4902 |
+#' @seealso \code{\link{addData}} |
|
4903 |
+#' @seealso \code{\link{getGRNConnections}} |
|
4904 |
+#' @return An updated \code{\linkS4class{GRN}} object, with additional information added from this function to \code{GRN@stats$variancePartition} as well as the elements \code{genes}, \code{consensusPeaks} and \code{TFs} within \code{GRN@annotation}. |
|
4905 |
+#' As noted above, the results are not added to \code{GRN@connections$all.filtered}; rerun the function \code{\link{getGRNConnections}} and set \code{include_variancePartitionResults} to \code{TRUE} to include the results in the eGRN output table. |
|
4906 |
+#' @examples |
|
4907 |
+#' # See the Workflow vignette on the GRaNIE website for examples |
|
4908 |
+#' GRN = loadExampleObject() |
|
4909 |
+#' GRN = add_featureVariation(GRN, metadata = c("mt_frac"), forceRerun = TRUE) |
|
4910 |
+#' @export |
|
4911 |
+add_featureVariation <- function (GRN, |
|
4912 |
+ formula = "auto", |
|
4913 |
+ metadata = c("all"), |
|
4914 |
+ features = "all_filtered", |
|
4915 |
+ nCores = 1, |
|
4916 |
+ forceRerun = FALSE, |
|
4917 |
+ ...) { |
|
4918 |
+ |
|
4919 |
+ start = Sys.time() |
|
4920 |
+ |
|
4921 |
+ checkmate::assertClass(GRN, "GRN") |
|
4922 |
+ GRN = .addFunctionLogToObject(GRN) |
|
4923 |
+ |
|
4924 |
+ GRN = .makeObjectCompatible(GRN) |
|
4925 |
+ |
|
4926 |
+ checkmate::assertCharacter(formula, min.chars = 2, len = 1) |
|
4927 |
+ |
|
4928 |
+ packageMessage = paste0("The package variancePartition is not installed but required for this function.") |
|
4929 |
+ .checkPackageInstallation("variancePartition", packageMessage, isWarning = FALSE) |
|
4930 |
+ |
|
4931 |
+ if (is.null(GRN@data$metadata)) { |
|
4932 |
+ message = paste0("No metadata was found (GRN@data$metadata is NULL), cannot run variancePartition without metadata. Reren the addData function and provide metadata.") |
|
4933 |
+ |
|
4934 |
+ .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE) |
|
4935 |
+ } |
|
4936 |
+ checkmate::assert(checkmate::checkChoice(metadata, "all"), checkmate::checkSubset(metadata, colnames(GRN@data$metadata))) |
|
4937 |
+ |
|
4938 |
+ checkmate::assertChoice(features, c("all_filtered", "all")) |
|
4939 |
+ checkmate::assertIntegerish(nCores, lower = 1) |
|
4940 |
+ checkmate::assertFlag(forceRerun) |
|
4941 |
+ |
|
4942 |
+ if (is.null(GRN@stats$variancePartition$RNA) | is.null(GRN@stats$variancePartition$peaks) | forceRerun) { |
|
4943 |
+ |
|
4944 |
+ if (features == "all_filtered") { |
|
4945 |
+ .checkExistanceFilteredConnections(GRN) |
|
4946 |
+ } |
|
4947 |
+ |
|
4948 |
+ # Prepare the metadata |
|
4949 |
+ if ("all" %in% metadata) { |
|
4950 |
+ columnsToSelect = colnames(GRN@data$metadata) |
|
4951 |
+ } else { |
|
4952 |
+ columnsToSelect = unique(c(metadata, "has_both")) |
|
4953 |
+ } |
|
4954 |
+ meta <- GRN@data$metadata %>% |
|
4955 |
+ dplyr::filter(.data$has_both == TRUE) %>% |
|
4956 |
+ dplyr::select(tidyselect::one_of(columnsToSelect)) %>% |
|
4957 |
+ dplyr::mutate_if(is.character, as.factor) %>% |
|
4958 |
+ dplyr::mutate_if(is.logical, as.factor) %>% |
|
4959 |
+ dplyr::select(-.data$has_both) |
|
4960 |
+ |
|
4961 |
+ # Remove factors with only one level |
|
4962 |
+ coltypes = meta %>% dplyr::summarise_all(class) |
|
4963 |
+ factorVariables = which(coltypes[1,] == "factor") |
|
4964 |
+ numericVariables = which(coltypes[1,] == "numeric") |
|
4965 |
+ factorVariablesNames = colnames(coltypes)[factorVariables] |
|
4966 |
+ numericVariablesNames = colnames(coltypes)[numericVariables] |
|
4967 |
+ |
|
4968 |
+ nLevels = sapply(factorVariables, function(x) {nlevels(dplyr::pull(meta, colnames(coltypes)[x]))}) |
|
4969 |
+ nLevelOne = which(nLevels == 1) |
|
4970 |
+ if (length(nLevelOne) > 0) { |
|
4971 |
+ meta = dplyr::select(meta, -tidyselect::one_of(factorVariablesNames[nLevelOne])) |
|
4972 |
+ factorVariablesNames = intersect(colnames(meta), factorVariablesNames) |
|
4973 |
+ } |
|
4974 |
+ |
|
4975 |
+ if (formula == "auto") { |
|
4976 |
+ |
|
4977 |
+ futile.logger::flog.info(paste0("Due to formula = \"auto\", all factors will be modeled as random effects and all numerical variables as fixed effects, as recommended in the variancePartition vignette.")) |
|
4978 |
+ |
|
4979 |
+ # See https://bioconductor.org/packages/release/bioc/vignettes/variancePartition/inst/doc/variancePartition.pdf for details |
|
4980 |
+ # Factor variables are modeled as random effect |
|
4981 |
+ if (length(factorVariablesNames) > 0) { |
|
4982 |
+ formulaElems = paste0("(1|", factorVariablesNames, ")") |
|
4983 |
+ } else { |
|
4984 |
+ formulaElems ="" |
|
4985 |
+ } |
|
4986 |
+ |
|
4987 |
+ if (length(numericVariablesNames) > 0) { |
|
4988 |
+ if (length(factorVariablesNames) > 0) { |
|
4989 |
+ separator = " + " |
|
4990 |
+ } else { |
|
4991 |
+ separator = "" |
|
4992 |
+ } |
|
4993 |
+ # Numeric variables are modeled as fixed effect |
|
4994 |
+ formulaElems2 = paste0(numericVariablesNames) |
|
4995 |
+ |
|
4996 |
+ message = paste0("Make sure that all variables from the metadata that are numeric are indeed really numeric and not (hidden) factors. The following variables will be treated as numeric: ", paste0(numericVariablesNames, collapse = ",", ". If one of these is in fact a factor, change the type in GRN@data$metadata and re-run, or provide the design formula manually")) |
|
4997 |
+ .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE) |
|
4998 |
+ |
|
4999 |
+ } else { |
|
5000 |
+ separator = "" |
|
5001 |
+ formulaElems2 = c() |
|
5002 |
+ } |
|
5003 |
+ |
|
5004 |
+ designFormula <- stats::as.formula(paste0("~ ", paste0(formulaElems, collapse = " + "), separator, paste0(formulaElems2, collapse = " + "))) |
|
5005 |
+ |
|
5006 |
+ } else { |
|
5007 |
+ |
|
5008 |
+ message = paste0("A custom formula has been provides. This is currently not being checked for correctness and therefore may reuslt in errors when running variancePartition. In that case, make sure the provided formula is correct.") |
|
5009 |
+ .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE) |
|
5010 |
+ designFormula = formula |
|
5011 |
+ } |
|
5012 |
+ |
|
5013 |
+ futile.logger::flog.info(paste0("The following formula will be used for model fitting: \"", deparse1(designFormula), "\". Make sure this is appropriate.")) |
|
5014 |
+ |
|
5015 |
+ |
|
5016 |
+ # RNA and ATAC norm counts |
|
5017 |
+ if (features == "all_filtered") { |
|
5018 |
+ genesToInclude = unique(c(GRN@connections$all.filtered[["0"]]$TF.ENSEMBL, GRN@connections$all.filtered[["0"]]$gene.ENSEMBL)) |
|
5019 |
+ peaksToInclude = unique(GRN@connections$all.filtered[["0"]]$peak.ID) |
|
5020 |
+ |
|
5021 |
+ futile.logger::flog.info(paste0("Using ", length(peaksToInclude), " peaks and ", length(genesToInclude), " TF/genes from the filtered connections.")) |
|
5022 |
+ |
|
5023 |
+ } else if (features == "all") { |
|
5024 |
+ genesToInclude = GRN@data$RNA$counts_norm.l[["0"]]$ENSEMBL |
|
5025 |
+ peaksToInclude = GRN@data$peaks$counts_norm$peakID |
|
5026 |
+ |
|
5027 |
+ futile.logger::flog.info(paste0("Using all ", length(peaksToInclude), " peaks and ", length(genesToInclude), " TF/genes. This may take a long time. Consider setting features = \"all_filtered\".")) |
|
5028 |
+ } |
|
5029 |
+ |
|
5030 |
+ data.l = list() |
|
5031 |
+ data.l[["RNA"]] <- GRN@data$RNA$counts_norm.l[["0"]] %>% |
|
5032 |
+ dplyr::filter(!.data$isFiltered, .data$ENSEMBL %in% genesToInclude) %>% |
|
5033 |
+ dplyr::select(-.data$isFiltered) %>% |
|
5034 |
+ tibble::column_to_rownames(var = "ENSEMBL") |
|
5035 |
+ |
|
5036 |
+ data.l[["peaks"]] <- GRN@data$peaks$counts_norm %>% |
|
5037 |
+ dplyr::filter(!.data$isFiltered, .data$peakID %in% peaksToInclude) %>% |
|
5038 |
+ dplyr::select(-.data$isFiltered) %>% |
|
5039 |
+ tibble::column_to_rownames(var = "peakID") |
|
5040 |
+ |
|
5041 |
+ |
|
5042 |
+ for (dataType in c("RNA", "peaks")) { |
|
5043 |
+ |
|
5044 |
+ # row_order <- matrixStats::rowVars(as.matrix(data.l[[dataType]]) ) %>% order(decreasing = T) |
|
5045 |
+ # data_set <- data.l[[dataType]][row_order,] |
|
5046 |
+ |
|
5047 |
+ # fit model |
|
5048 |
+ futile.logger::flog.info(paste0("Running variancePartition and fit models for data type ", dataType, " using ", nCores, " core(s). This may take a while.")) |
|
5049 |
+ varPart <- variancePartition::fitExtractVarPartModel(exprObj = data.l[[dataType]], |
|
5050 |
+ formula = designFormula, |
|
5051 |
+ data = meta, |
|
5052 |
+ BPPARAM = .initBiocParallel(nCores), |
|
5053 |
+ ...) |
|
5054 |
+ GRN@stats$variancePartition[[dataType]] = varPart |
|
5055 |
+ |
|
5056 |
+ res = as.data.frame(GRN@stats$variancePartition[[dataType]]) %>% |
|
5057 |
+ dplyr::rename_all( .funs = ~ paste0("variancePartition_", .x)) %>% |
|
5058 |
+ tibble::as_tibble(rownames = "ID") |
|
5059 |
+ |
|
5060 |
+ # Update annotation slots |
|
5061 |
+ if (dataType == "RNA") { |
|
5062 |
+ |
|
5063 |
+ colnames(res)[2:ncol(res)] = paste0("gene.", colnames(res)[2:ncol(res)]) |
|
5064 |
+ |
|
5065 |
+ GRN@annotation$genes = GRN@annotation$genes %>% |
|
5066 |
+ dplyr::select(-tidyselect::starts_with("gene.variancePart")) %>% |
|
5067 |
+ dplyr::left_join(res, by = c("gene.ENSEMBL" = "ID")) |
|
5068 |
+ |
|
5069 |
+ colnames(res) = gsub("gene.", "TF.", colnames(res)) |
|
5070 |
+ |
|
5071 |
+ GRN@annotation$TFs = GRN@annotation$TFs %>% |
|
5072 |
+ dplyr::select(-tidyselect::starts_with("TF.variancePart")) %>% |
|
5073 |
+ dplyr::left_join(res, by = c("TF.ENSEMBL" = "ID")) |
|
5074 |
+ |
|
5075 |
+ |
|
5076 |
+ } else { |
|
5077 |
+ |
|
5078 |
+ colnames(res)[2:ncol(res)] = paste0("peak.", colnames(res)[2:ncol(res)]) |
|
5079 |
+ |
|
5080 |
+ GRN@annotation$consensusPeaks = GRN@annotation$consensusPeaks %>% |
|
5081 |
+ dplyr::select(-tidyselect::starts_with("variancePart")) %>% |
|
5082 |
+ dplyr::left_join(res, by = c("peak.ID" = "ID")) |
|
5083 |
+ } |
|
5084 |
+ |
|
5085 |
+ } |
|
5086 |
+ |
|
5087 |
+ futile.logger::flog.info(paste0("The result objects have been stored in GRN@stats$variancePartition for both RNA and peaks.")) |
|
5088 |
+ |
|
5089 |
+ } else { |
|
5090 |
+ futile.logger::flog.info(paste0("Data already exists in object, nothing to do due to forceRerun = FALSE")) |
|
5091 |
+ } |
|
5092 |
+ |
|
5093 |
+ |
|
5094 |
+ .printExecutionTime(start, prefix = "") |
|
5095 |
+ |
|
5096 |
+ GRN |
|
5097 |
+ |
|
5098 |
+} |
|
5099 |
+ |
|
5100 |
+ |
|
5101 |
+.checkExistanceFilteredConnections <- function(GRN) { |
|
5102 |
+ |
|
5103 |
+ if (is.null(GRN@connections$all.filtered[["0"]])) { |
|
5104 |
+ message = "Could not find filtered connections (the slot GRN@connections$all.filtered[[\"0\"]] is NULL). Run the function filterGRNAndConnectGenes." |
|
5105 |
+ .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE) |
|
5106 |
+ } |
|
5107 |
+} |
|
5108 |
+ |
|
5109 |
+.makeObjectCompatible <- function(GRN) { |
|
5110 |
+ |
|
5111 |
+ if (is.null(GRN@annotation$TFs)) { |
|
5112 |
+ GRN@annotation$TFs = GRN@data$TFs$translationTable |
|
5113 |
+ GRN@data$TFs$translationTable = NULL |
|
5114 |
+ } |
|
5115 |
+ |
|
5116 |
+ |
|
5117 |
+ # Temporary fix: Replace lincRNA with lncRNA due to a recent change in biomaRt until we update the object directly |
|
5118 |
+ if ("lncRNA" %in% levels(GRN@annotation$genes$gene.type)) { |
|
5119 |
+ GRN@annotation$genes = dplyr::mutate(GRN@annotation$genes, gene.type = dplyr::recode_factor(.data$gene.type, lncRNA = "lincRNA")) |
|
5120 |
+ } |
|
5121 |
+ |
|
5122 |
+ |
|
5123 |
+ |
|
5124 |
+ GRN |
|
5125 |
+} |
|
5126 |
+ |
|
5127 |
+ |
|
4738 | 5128 |
\ No newline at end of file |
... | ... |
@@ -21,6 +21,9 @@ build_eGRN_graph <- function(GRN, model_TF_gene_nodes_separately = FALSE, |
21 | 21 |
start = Sys.time() |
22 | 22 |
checkmate::assertClass(GRN, "GRN") |
23 | 23 |
GRN = .addFunctionLogToObject(GRN) |
24 |
+ |
|
25 |
+ GRN = .makeObjectCompatible(GRN) |
|
26 |
+ |
|
24 | 27 |
checkmate::assertFlag(model_TF_gene_nodes_separately) |
25 | 28 |
checkmate::assertFlag(allowLoops) |
26 | 29 |
checkmate::assertFlag(removeMultiple) |
... | ... |
@@ -234,8 +237,12 @@ performAllNetworkAnalyses <- function(GRN, ontology = c("GO_BP", "GO_MF"), |
234 | 237 |
forceRerun = FALSE) { |
235 | 238 |
|
236 | 239 |
start = Sys.time() |
240 |
+ |
|
241 |
+ checkmate::assertClass(GRN, "GRN") |
|
237 | 242 |
GRN = .addFunctionLogToObject(GRN) |
238 | 243 |
|
244 |
+ GRN = .makeObjectCompatible(GRN) |
|
245 |
+ |
|
239 | 246 |
GRN = build_eGRN_graph(GRN, model_TF_gene_nodes_separately = FALSE, allowLoops = FALSE, directed = FALSE, removeMultiple = FALSE) |
240 | 247 |
|
241 | 248 |
GRN = plotGeneralGraphStats(GRN, outputFolder = outputFolder, forceRerun = forceRerun) |
... | ... |
@@ -362,6 +369,8 @@ calculateGeneralEnrichment <- function(GRN, ontology = c("GO_BP", "GO_MF"), |
362 | 369 |
checkmate::assertClass(GRN, "GRN") |
363 | 370 |
GRN = .addFunctionLogToObject(GRN) |
364 | 371 |
|
372 |
+ GRN = .makeObjectCompatible(GRN) |
|
373 |
+ |
|
365 | 374 |
checkmate::assertSubset(ontology , c("GO_BP", "GO_MF", "GO_CC", "KEGG", "DO", "Reactome"), empty.ok = FALSE) |
366 | 375 |
|
367 | 376 |
.checkPackage_topGO_and_arguments(ontology, algorithm, statistic) |
... | ... |
@@ -490,7 +499,7 @@ calculateGeneralEnrichment <- function(GRN, ontology = c("GO_BP", "GO_MF"), |
490 | 499 |
} |
491 | 500 |
|
492 | 501 |
#' @importFrom biomaRt useEnsembl getBM |
493 |
-.runEnrichment <- function(GRN, foreground, background, backgroundStr, database = "GO", ontology, |
|
502 |
+.runEnrichment <- function(GRN, foreground, background, backgroundStr, ontology, |
|
494 | 503 |
description = "Enrichment Analysis", |
495 | 504 |
algorithm="weight01", statistic = "fisher", mapping, pAdjustMethod = "BH", minGSSize = 0, maxGSSize = 5000){ |
496 | 505 |
|
... | ... |
@@ -509,7 +518,7 @@ calculateGeneralEnrichment <- function(GRN, ontology = c("GO_BP", "GO_MF"), |
509 | 518 |
|
510 | 519 |
|
511 | 520 |
if (ontology %in% c("KEGG", "DO", "Reactome")) { |
512 |
- # the ENSEMBL IDs will need to be mapped to Enntrez IDs for these ontologies |
|
521 |
+ # the ENSEMBL IDs will need to be mapped to Entrez IDs for these ontologies |
|
513 | 522 |
|
514 | 523 |
if (statistic != "fisher") { |
515 | 524 |
statistic = "fisher" |
... | ... |
@@ -759,7 +768,8 @@ calculateGeneralEnrichment <- function(GRN, ontology = c("GO_BP", "GO_MF"), |
759 | 768 |
#' @param clustering Character. Default \code{louvain}. One of: \code{louvain}, \code{leiden}, \code{leading_eigen}, \code{fast_greedy}, \code{optimal}, \code{walktrap}. The community detection algorithm to be used. Please bear in mind the robustness and time consumption of the algorithms when opting for an alternative to the default. |
760 | 769 |
#' @param ... Additional parameters for the used clustering method, see the \code{igraph::cluster_*} methods for details on the specific parameters and what they do. For \code{leiden} clustering, for example, you may add a \code{resolution_parameter} to control the granularity of the community detection or \code{n_iterations} to modify the number of iterations. |
761 | 770 |
#' @template forceRerun |
762 |
-#' @return An updated \code{\linkS4class{GRN}} object, with a table that consists of the connections clustered into communities stored in the \code{stats$communities} slot. |
|
771 |
+#' @return An updated \code{\linkS4class{GRN}} object, with a table that consists of the connections clustered into communities stored in the |
|
772 |
+#' \code{GRN@graph$TF_gene$clusterGraph} slot as well as within the \code{igraph} object in \code{GRN@graph$TF_gene$graph} (retrievable via \code{igraph} using \code{igraph::vertex.attributes(GRN@graph$TF_gene$graph)$community}, for example.) |
|
763 | 773 |
#' @seealso \code{\link{plotCommunitiesStats}} |
764 | 774 |
#' @seealso \code{\link{calculateCommunitiesEnrichment}} |
765 | 775 |
#' @import patchwork |
... | ... |
@@ -774,6 +784,7 @@ calculateCommunitiesStats <- function(GRN, clustering = "louvain", forceRerun = |
774 | 784 |
checkmate::assertClass(GRN, "GRN") |
775 | 785 |
GRN = .addFunctionLogToObject(GRN) |
776 | 786 |
|
787 |
+ GRN = .makeObjectCompatible(GRN) |
|
777 | 788 |
|
778 | 789 |
checkmate::assertChoice(clustering, c("louvain", "leading_eigen", "fast_greedy", "optimal", "walktrap", "leiden")) |
779 | 790 |
checkmate::assertFlag(forceRerun) |
... | ... |
@@ -850,7 +861,8 @@ calculateCommunitiesStats <- function(GRN, clustering = "louvain", forceRerun = |
850 | 861 |
#' |
851 | 862 |
#' @inheritParams calculateGeneralEnrichment |
852 | 863 |
#' @param selection Character. Default \code{"byRank"}. One of: \code{"byRank"}, \code{"byLabel"}. Specify whether the communities enrichment will by calculated based on their rank, where the largest community (with most vertices) would have a rank of 1, or by their label. Note that the label is independent of the rank. |
853 |
-#' @param communities Numeric vector. Default \code{c(1:10)}. Depending on what was specified in the \code{display} parameter, this parameter would indicate either the rank or the label of the communities to be plotted. i.e. for \code{communities = c(1,4)}, if \code{display = "byRank"} the GO enrichment for the first and fourth largest communities will be calculated if \code{display = "byLabel"}, the results for the communities labeled "1", and "4" will be plotted. |
|
864 |
+#' @param communities Numeric vector. Default \code{c(1:10)}. Depending on what was specified in the \code{display} parameter, |
|
865 |
+#' this parameter would indicate either the rank or the label of the communities to be plotted. i.e. for \code{communities = c(1,4)}, if \code{display = "byRank"} the GO enrichment for the first and fourth largest communities will be calculated if \code{display = "byLabel"}, the results for the communities labeled "1", and "4" will be plotted. |
|
854 | 866 |
#' @return An updated \code{\linkS4class{GRN}} object, with the enrichment results stored in the \code{stats$Enrichment$byCommunity} slot. |
855 | 867 |
#' @seealso \code{\link{plotCommunitiesEnrichment}} |
856 | 868 |
#' @seealso \code{\link{plotGeneralEnrichment}} |
... | ... |
@@ -874,6 +886,7 @@ calculateCommunitiesEnrichment <- function(GRN, |
874 | 886 |
checkmate::assertClass(GRN, "GRN") |
875 | 887 |
GRN = .addFunctionLogToObject(GRN) |
876 | 888 |
|
889 |
+ GRN = .makeObjectCompatible(GRN) |
|
877 | 890 |
|
878 | 891 |
checkmate::assertSubset(ontology , c("GO_BP", "GO_MF", "GO_CC", "KEGG", "DO", "Reactome"), empty.ok = FALSE) |
879 | 892 |
|
... | ... |
@@ -981,7 +994,7 @@ calculateCommunitiesEnrichment <- function(GRN, |
981 | 994 |
|
982 | 995 |
if (length(selCommunities) == 0) { |
983 | 996 |
existingCommunities = unique(df$community) |
984 |
- message = paste0("None of the requested communities (", paste0(communities, collapse = ",", ") were found. Only the following communities are available: ", paste0(existingCommunities, collapse = ","))) |
|
997 |
+ message = paste0("None of the requested communities (", paste0(communities, collapse = ","), ") were found. Only the following communities are available: ", paste0(existingCommunities, collapse = ",")) |
|
985 | 998 |
.checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE) |
986 | 999 |
} |
987 | 1000 |
|
... | ... |
@@ -1012,6 +1025,9 @@ getTopNodes <- function(GRN, nodeType, rankType, n = 0.1, use_TF_gene_network = |
1012 | 1025 |
start = Sys.time() |
1013 | 1026 |
checkmate::assertClass(GRN, "GRN") |
1014 | 1027 |
GRN = .addFunctionLogToObject(GRN) |
1028 |
+ |
|
1029 |
+ GRN = .makeObjectCompatible(GRN) |
|
1030 |
+ |
|
1015 | 1031 |
checkmate::assertChoice(nodeType, c("gene", "TF")) |
1016 | 1032 |
checkmate::assertChoice(rankType, c("degree", "EV")) |
1017 | 1033 |
checkmate::assertFlag(use_TF_gene_network) |
... | ... |
@@ -1117,6 +1133,8 @@ calculateTFEnrichment <- function(GRN, rankType = "degree", n = 3, TF.names = NU |
1117 | 1133 |
checkmate::assertClass(GRN, "GRN") |
1118 | 1134 |
GRN = .addFunctionLogToObject(GRN) |
1119 | 1135 |
|
1136 |
+ GRN = .makeObjectCompatible(GRN) |
|
1137 |
+ |
|
1120 | 1138 |
checkmate::assertChoice(rankType, c("degree", "EV", "custom")) |
1121 | 1139 |
checkmate::assert(checkmate::checkNumeric(n, lower = 0.0001, upper = 0.999999), checkmate::checkIntegerish(n, lower = 1)) |
1122 | 1140 |
checkmate::assertSubset(ontology , c("GO_BP", "GO_MF", "GO_CC", "KEGG", "Reactome", "DO"), empty.ok = FALSE) |
... | ... |
@@ -1133,10 +1151,14 @@ calculateTFEnrichment <- function(GRN, rankType = "degree", n = 3, TF.names = NU |
1133 | 1151 |
futile.logger::flog.error("To calculate the GO enrichment for a custom set of TFs, you must provide the TF names in the 'TF.names' parameter.") |
1134 | 1152 |
} |
1135 | 1153 |
wrongTFs = setdiff(TF.names, unique(GRN@connections$all.filtered$`0`$TF.name)) |
1136 |
- if (length(wrongTFs)>0){ |
|
1137 |
- futile.logger::flog.warn(paste0("The following TFs are not in the filtered GRN and will be ommited from the analysis: ", paste0(wrongTFs, collapse = ", "))) |
|
1138 |
- } |
|
1154 |
+ |
|
1139 | 1155 |
TFset = setdiff(TF.names, wrongTFs) |
1156 |
+ |
|
1157 |
+ if (length(wrongTFs) > 0 & length(TFset) > 0){ |
|
1158 |
+ message = paste0("The following TFs are not in the filtered GRN and will be ommited from the analysis: ", paste0(wrongTFs, collapse = ", ")) |
|
1159 |
+ .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE) |
|
1160 |
+ } |
|
1161 |
+ |
|
1140 | 1162 |
} else{ |
1141 | 1163 |
|
1142 | 1164 |
# TF.name is always there, irrespective of whether ENSEMBL ID or TF name is used as primary ID type |
... | ... |
@@ -1158,7 +1180,7 @@ calculateTFEnrichment <- function(GRN, rankType = "degree", n = 3, TF.names = NU |
1158 | 1180 |
|
1159 | 1181 |
} else { |
1160 | 1182 |
message = paste0("No TF fulfills the chosen criteria. Try increasing the value of the parameter n") |
1161 |
- .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE) |
|
1183 |
+ .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE) |
|
1162 | 1184 |
} |
1163 | 1185 |
|
1164 | 1186 |
for (TF in as.character(TFset)){ |
... | ... |
@@ -33,6 +33,8 @@ plotPCA_all <- function(GRN, outputFolder = NULL, basenameOutput = NULL, |
33 | 33 |
checkmate::assertClass(GRN, "GRN") |
34 | 34 |
GRN = .addFunctionLogToObject(GRN) |
35 | 35 |
|
36 |
+ GRN = .makeObjectCompatible(GRN) |
|
37 |
+ |
|
36 | 38 |
checkmate::assertSubset(data , c("rna", "peaks", "all"), empty.ok = FALSE) |
37 | 39 |
checkmate::assertSubset(type , c("raw", "normalized", "all"), empty.ok = FALSE) |
38 | 40 |
checkmate::assertFlag(plotAsPDF) |
... | ... |
@@ -337,7 +339,7 @@ plotPCA_all <- function(GRN, outputFolder = NULL, basenameOutput = NULL, |
337 | 339 |
|
338 | 340 |
} |
339 | 341 |
|
340 |
- .printWarningPageNumber(pages, pageCounter) |
|
342 |
+ .checkPageNumberValidity(pages, pageCounter) |
|
341 | 343 |
if (plotAsPDF) grDevices::dev.off() |
342 | 344 |
|
343 | 345 |
} |
... | ... |
@@ -516,6 +518,8 @@ plotDiagnosticPlots_TFPeaks <- function(GRN, |
516 | 518 |
checkmate::assertClass(GRN, "GRN") |
517 | 519 |
GRN = .addFunctionLogToObject(GRN) |
518 | 520 |
|
521 |
+ GRN = .makeObjectCompatible(GRN) |
|
522 |
+ |
|
519 | 523 |
checkmate::assert(checkmate::checkNull(outputFolder), checkmate::checkCharacter(outputFolder, min.chars = 1)) |
520 | 524 |
checkmate::assert(checkmate::checkNull(basenameOutput), checkmate::checkCharacter(basenameOutput, len = 1, min.chars = 1, any.missing = FALSE)) |
521 | 525 |
checkmate::assertFlag(plotDetails) |
... | ... |
@@ -857,7 +861,7 @@ plotDiagnosticPlots_TFPeaks <- function(GRN, |
857 | 861 |
|
858 | 862 |
} # end allTF |
859 | 863 |
|
860 |
- .printWarningPageNumber(pages, pageCounter) |
|
864 |
+ .checkPageNumberValidity(pages, pageCounter) |
|
861 | 865 |
|
862 | 866 |
if (!is.null(file)) dev.off() |
863 | 867 |
|
... | ... |
@@ -908,6 +912,8 @@ plotDiagnosticPlots_peakGene <- function(GRN, |
908 | 912 |
checkmate::assertClass(GRN, "GRN") |
909 | 913 |
GRN = .addFunctionLogToObject(GRN) |
910 | 914 |
|
915 |
+ GRN = .makeObjectCompatible(GRN) |
|
916 |
+ |
|
911 | 917 |
checkmate::assert(checkmate::checkNull(outputFolder), checkmate::checkCharacter(outputFolder, min.chars = 1)) |
912 | 918 |
checkmate::assert(checkmate::checkNull(basenameOutput), checkmate::checkCharacter(basenameOutput, len = 1, min.chars = 1, any.missing = FALSE)) |
913 | 919 |
checkmate::assertList(gene.types, any.missing = FALSE, min.len = 1, types = "character") |
... | ... |
@@ -1553,7 +1559,7 @@ plotDiagnosticPlots_peakGene <- function(GRN, |
1553 | 1559 |
} |
1554 | 1560 |
pageCounter = pageCounter + 1 |
1555 | 1561 |
|
1556 |
- .printWarningPageNumber(pages, pageCounter) |
|
1562 |
+ .checkPageNumberValidity(pages, pageCounter) |
|
1557 | 1563 |
if (!is.null(fileBase)) { |
1558 | 1564 |
grDevices::dev.off() |
1559 | 1565 |
} |
... | ... |
@@ -1612,6 +1618,8 @@ plot_stats_connectionSummary <- function(GRN, type = "heatmap", |
1612 | 1618 |
checkmate::assertClass(GRN, "GRN") |
1613 | 1619 |
GRN = .addFunctionLogToObject(GRN) |
1614 | 1620 |
|
1621 |
+ GRN = .makeObjectCompatible(GRN) |
|
1622 |
+ |
|
1615 | 1623 |
|
1616 | 1624 |
checkmate::assertChoice(type, c("heatmap", "boxplot", "density")) |
1617 | 1625 |
checkmate::assertFlag(plotAsPDF) |
... | ... |
@@ -1764,7 +1772,7 @@ plot_stats_connectionSummary <- function(GRN, type = "heatmap", |
1764 | 1772 |
} |
1765 | 1773 |
} |
1766 | 1774 |
|
1767 |
- .printWarningPageNumber(pages, pageCounter) |
|
1775 |
+ .checkPageNumberValidity(pages, pageCounter) |
|
1768 | 1776 |
if (!is.null(file)) { |
1769 | 1777 |
grDevices::dev.off() |
1770 | 1778 |
} |
... | ... |
@@ -1911,7 +1919,7 @@ plot_stats_connectionSummary <- function(GRN, type = "heatmap", |
1911 | 1919 |
|
1912 | 1920 |
} # end for TF_peak.fdr_cur |
1913 | 1921 |
|
1914 |
- .printWarningPageNumber(pages, pageCounter) |
|
1922 |
+ .checkPageNumberValidity(pages, pageCounter) |
|
1915 | 1923 |
if (!is.null(file)) { |
1916 | 1924 |
grDevices::dev.off() |
1917 | 1925 |
} |
... | ... |
@@ -1954,6 +1962,8 @@ plotGeneralGraphStats <- function(GRN, outputFolder = NULL, basenameOutput = NUL |
1954 | 1962 |
checkmate::assertClass(GRN, "GRN") |
1955 | 1963 |
GRN = .addFunctionLogToObject(GRN) |
1956 | 1964 |
|
1965 |
+ GRN = .makeObjectCompatible(GRN) |
|
1966 |
+ |
|
1957 | 1967 |
checkmate::assertFlag(plotAsPDF) |
1958 | 1968 |
checkmate::assertNumeric(pdf_width, lower = 5, upper = 99) |
1959 | 1969 |
checkmate::assertNumeric(pdf_height, lower = 5, upper = 99) |
... | ... |
@@ -2113,7 +2123,7 @@ plotGeneralGraphStats <- function(GRN, outputFolder = NULL, basenameOutput = NUL |
2113 | 2123 |
|
2114 | 2124 |
} |
2115 | 2125 |
|
2116 |
- .printWarningPageNumber(pages, pageCounter) |
|
2126 |
+ .checkPageNumberValidity(pages, pageCounter) |
|
2117 | 2127 |
|
2118 | 2128 |
if (plotAsPDF) {grDevices::dev.off()} |
2119 | 2129 |
|
... | ... |
@@ -2127,11 +2137,11 @@ plotGeneralGraphStats <- function(GRN, outputFolder = NULL, basenameOutput = NUL |
2127 | 2137 |
|
2128 | 2138 |
} |
2129 | 2139 |
|
2130 |
-.printWarningPageNumber <- function(pages, pageCounter) { |
|
2140 |
+.checkPageNumberValidity <- function(pages, pageCounter) { |
|
2131 | 2141 |
|
2132 | 2142 |
if(any(pages > pageCounter)) { |
2133 |
- message = paste0("At least one page could not be plotted because the total number of plots is only ", pageCounter, " while a larger page number has been requested") |
|
2134 |
- .checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE) |
|
2143 |
+ message = paste0("At least one page could not be plotted because the total number of plots is only ", pageCounter, " while a larger page number has been requested. To fix this, re-adjust the page number(s) and execute the function again.") |
|
2144 |
+ .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE) |
|
2135 | 2145 |
} |
2136 | 2146 |
} |
2137 | 2147 |
|
... | ... |
@@ -2172,6 +2182,8 @@ plotGeneralEnrichment <- function(GRN, outputFolder = NULL, basenameOutput = NUL |
2172 | 2182 |
checkmate::assertClass(GRN, "GRN") |
2173 | 2183 |
GRN = .addFunctionLogToObject(GRN) |
2174 | 2184 |
|
2185 |
+ GRN = .makeObjectCompatible(GRN) |
|
2186 |
+ |
|
2175 | 2187 |
|
2176 | 2188 |
ontologiesFound = names(GRN@stats$Enrichment$general) |
2177 | 2189 |
|
... | ... |
@@ -2230,7 +2242,7 @@ plotGeneralEnrichment <- function(GRN, outputFolder = NULL, basenameOutput = NUL |
2230 | 2242 |
|
2231 | 2243 |
} |
2232 | 2244 |
|
2233 |
- .printWarningPageNumber(pages, pageCounter) |
|
2245 |
+ .checkPageNumberValidity(pages, pageCounter) |
|
2234 | 2246 |
if (plotAsPDF) grDevices::dev.off() |
2235 | 2247 |
|
2236 | 2248 |
} else { |
... | ... |
@@ -2349,6 +2361,7 @@ plotCommunitiesStats <- function(GRN, outputFolder = NULL, basenameOutput = NULL |
2349 | 2361 |
checkmate::assertClass(GRN, "GRN") |
2350 | 2362 |
GRN = .addFunctionLogToObject(GRN) |
2351 | 2363 |
|
2364 |
+ GRN = .makeObjectCompatible(GRN) |
|
2352 | 2365 |
|
2353 | 2366 |
checkmate::assert(checkmate::checkNull(outputFolder), checkmate::checkCharacter(outputFolder, min.chars = 1)) |
2354 | 2367 |
checkmate::assert(checkmate::checkNull(basenameOutput), checkmate::checkCharacter(basenameOutput, len = 1, min.chars = 1, any.missing = FALSE)) |
... | ... |
@@ -2505,7 +2518,7 @@ plotCommunitiesStats <- function(GRN, outputFolder = NULL, basenameOutput = NULL |
2505 | 2518 |
|
2506 | 2519 |
} |
2507 | 2520 |
|
2508 |
- .printWarningPageNumber(pages, pageCounter) |
|
2521 |
+ .checkPageNumberValidity(pages, pageCounter) |
|
2509 | 2522 |
if (plotAsPDF) grDevices::dev.off() |
2510 | 2523 |
|
2511 | 2524 |
} else { |
... | ... |
@@ -2522,13 +2535,13 @@ plotCommunitiesStats <- function(GRN, outputFolder = NULL, basenameOutput = NULL |
2522 | 2535 |
#' Similarly to \code{\link{plotGeneralEnrichment}} and \code{\link{plotTFEnrichment}}, the results of the community-based enrichment analysis are plotted. |
2523 | 2536 |
#' This function produces multiple plots. First, one plot per community to summarize the community-specific enrichment. |
2524 | 2537 |
#' Second, a summary heatmap of all significantly enriched terms across all communities and for the whole eGRN. The latter allows to compare the results with the general network enrichment. |
2525 |
-#' #' Third, a subset of the aforementioned heatmap, showing only the top most significantly enriched terms per community and for the whole eGRN (as specified by \code{nID}) for improved visibility |
|
2538 |
+#' Third, a subset of the aforementioned heatmap, showing only the top most significantly enriched terms per community and for the whole eGRN (as specified by \code{nID}) for improved visibility |
|
2526 | 2539 |
#' |
2527 | 2540 |
#' @inheritParams plotGeneralEnrichment |
2528 | 2541 |
#' @param display Character. Default \code{"byRank"}. One of: \code{"byRank"}, \code{"byLabel"}. Specify whether the communities will be displayed based on their rank, where the largest community (with most vertices) would have a rank of 1, or by their label. Note that the label is independent of the rank. |
2529 | 2542 |
#' @param communities \code{NULL} or numeric vector. Default \code{NULL}. If set to \code{NULL}, the default, all communities enrichments that have been calculated before are plotted. If a numeric vector is specified: Depending on what was specified in the \code{display} parameter, this parameter indicates either the rank or the label of the communities to be plotted. i.e. for \code{communities = c(1,4)}, if \code{display = "byRank"} the results for the first and fourth largest communities are plotted. if \code{display = "byLabel"}, the results for the communities labeled \code{"1"}, and \code{"4"} are plotted. |
2530 |
-#' @param nSignificant Numeric. Default 3. Threshold to filter out an ontology term with less than \code{nSignificant} overlapping genes. |
|
2531 |
-#' @param nID Numeric. Default 10. For the reduced summary heatmap, number of top terms to select per community / for the general enrichment. |
|
2543 |
+#' @param nSignificant Numeric >0. Default 3. Threshold to filter out an ontology term with less than \code{nSignificant} overlapping genes. |
|
2544 |
+#' @param nID Numeric >0. Default 10. For the reduced summary heatmap, number of top terms to select per community / for the general enrichment. |
|
2532 | 2545 |
#' @template maxWidth_nchar_plot |
2533 | 2546 |
#' @return The same \code{\linkS4class{GRN}} object, without modifications. |
2534 | 2547 |
#' @seealso \code{\link{plotGeneralEnrichment}} |
... | ... |
@@ -2551,6 +2564,8 @@ plotCommunitiesEnrichment <- function(GRN, outputFolder = NULL, basenameOutput = |
2551 | 2564 |
checkmate::assertClass(GRN, "GRN") |
2552 | 2565 |
GRN = .addFunctionLogToObject(GRN) |
2553 | 2566 |
|
2567 |
+ GRN = .makeObjectCompatible(GRN) |
|
2568 |
+ |
|
2554 | 2569 |
checkmate::assertChoice(display , c("byRank", "byLabel")) |
2555 | 2570 |
checkmate::assert(checkmate::checkNull(communities), checkmate::checkNumeric(communities, lower = 1, any.missing = FALSE, min.len = 1)) |
2556 | 2571 |
checkmate::assertNumeric(p, lower = 0, upper = 1) |
... | ... |
@@ -2791,7 +2806,7 @@ plotCommunitiesEnrichment <- function(GRN, outputFolder = NULL, basenameOutput = |
2791 | 2806 |
} |
2792 | 2807 |
|
2793 | 2808 |
|
2794 |
- .printWarningPageNumber(pages, pageCounter) |
|
2809 |
+ .checkPageNumberValidity(pages, pageCounter) |
|
2795 | 2810 |
if (plotAsPDF) grDevices::dev.off() |
2796 | 2811 |
|
2797 | 2812 |
} else { |
... | ... |
@@ -2895,6 +2910,8 @@ plotTFEnrichment <- function(GRN, rankType = "degree", n = NULL, TF.names = NULL |
2895 | 2910 |
start = Sys.time() |
2896 | 2911 |
checkmate::assertClass(GRN, "GRN") |
2897 | 2912 |
GRN = .addFunctionLogToObject(GRN) |
2913 |
+ |
|
2914 |
+ GRN = .makeObjectCompatible(GRN) |
|
2898 | 2915 |
|
2899 | 2916 |
checkmate::assertChoice(rankType, c("degree", "EV", "custom")) |
2900 | 2917 |
checkmate::assert(checkmate::checkNull(n), checkmate::checkNumeric(n)) |
... | ... |
@@ -2958,10 +2975,10 @@ plotTFEnrichment <- function(GRN, rankType = "degree", n = NULL, TF.names = NULL |
2958 | 2975 |
vertexMetadata = as.data.frame(igraph::vertex.attributes(GRN@graph$TF_gene$graph)) |
2959 | 2976 |
nodeDegree = igraph::degree(GRN@graph$TF_gene$graph) |
2960 | 2977 |
|
2961 |
- # nodeDegree_TFset = sort(nodeDegree[GRN@data$TFs$translationTable %>% dplyr::filter(TF.name %in% as.character(TFset)) %>% |
|
2978 |
+ # nodeDegree_TFset = sort(nodeDegree[GRN@annotation$TFs %>% dplyr::filter(TF.name %in% as.character(TFset)) %>% |
|
2962 | 2979 |
# dplyr::pull(TF.ENSEMBL)], decreasing = TRUE) |
2963 | 2980 |
|
2964 |
- nodeDegree_TFset = dplyr::left_join(GRN@data$TFs$translationTable, |
|
2981 |
+ nodeDegree_TFset = dplyr::left_join(GRN@annotation$TFs, |
|
2965 | 2982 |
as.data.frame(nodeDegree) %>% tibble::rownames_to_column("ENSEMBL"), by = "ENSEMBL") %>% |
2966 | 2983 |
dplyr::filter(.data$TF.name %in% as.character(TFset)) |
2967 | 2984 |
|
... | ... |
@@ -2981,7 +2998,7 @@ plotTFEnrichment <- function(GRN, rankType = "degree", n = NULL, TF.names = NULL |
2981 | 2998 |
|
2982 | 2999 |
if (is.null(pages) | (!is.null(pages) && pageCounter %in% pages)) { |
2983 | 3000 |
|
2984 |
- TF.ENSEMBL = GRN@data$TFs$translationTable %>% dplyr::filter(.data$TF.name == TFCur) %>% dplyr::pull(TF.ENSEMBL) |
|
3001 |
+ TF.ENSEMBL = GRN@annotation$TFs %>% dplyr::filter(.data$TF.name == TFCur) %>% dplyr::pull(TF.ENSEMBL) |
|
2985 | 3002 |
|
2986 | 3003 |
TF.name.full = paste0(TFCur, " (", TF.ENSEMBL, ")") |
2987 | 3004 |
|
... | ... |
@@ -3115,7 +3132,7 @@ plotTFEnrichment <- function(GRN, rankType = "degree", n = NULL, TF.names = NULL |
3115 | 3132 |
|
3116 | 3133 |
} |
3117 | 3134 |
|
3118 |
- .printWarningPageNumber(pages, pageCounter) |
|
3135 |
+ .checkPageNumberValidity(pages, pageCounter) |
|
3119 | 3136 |
if (plotAsPDF) grDevices::dev.off() |
3120 | 3137 |
|
3121 | 3138 |
} else { |
... | ... |
@@ -3304,9 +3321,9 @@ plotTFEnrichment <- function(GRN, rankType = "degree", n = NULL, TF.names = NULL |
3304 | 3321 |
|
3305 | 3322 |
############### GRN visualization ############### |
3306 | 3323 |
|
3307 |
-#' Visualize a filtered GRN in a flexible manner. |
|
3324 |
+#' Visualize a filtered eGRN in a flexible manner. |
|
3308 | 3325 |
#' |
3309 |
-#' This function requires a filtered set of connections in the \code{\linkS4class{GRN}} object as generated by \code{\link{filterGRNAndConnectGenes}} |
|
3326 |
+#' This function can visualize a filtered eGRN in a very flexible manner and requires a filtered set of connections in the \code{\linkS4class{GRN}} object as generated by \code{\link{filterGRNAndConnectGenes}}. |
|
3310 | 3327 |
#' |
3311 | 3328 |
#' @template GRN |
3312 | 3329 |
#' @template outputFolder |
... | ... |
@@ -3314,11 +3331,11 @@ plotTFEnrichment <- function(GRN, rankType = "degree", n = NULL, TF.names = NULL |
3314 | 3331 |
#' @template plotAsPDF |
3315 | 3332 |
#' @template pdf_width |
3316 | 3333 |
#' @template pdf_height |
3317 |
-#' @param title NULL or Character. Default NULL. Title to be assigned to the plot. |
|
3318 |
-#' @param maxRowsToPlot Numeric. Default 500. Refers to the maximum number of connections to be plotted. If the network size is above this limit, nothing will be drawn. In such a case, it may help to either increase the value of this parameter or set the filtering criteria for the network to be more stringent, so that the network becomes smaller. |
|
3334 |
+#' @param title \code{NULL} or Character. Default \code{NULL}. Title to be assigned to the plot. |
|
3335 |
+#' @param maxRowsToPlot Integer > 0. Default 500. Refers to the maximum number of connections to be plotted. If the network size is above this limit, nothing will be drawn. In such a case, it may help to either increase the value of this parameter or set the filtering criteria for the network to be more stringent, so that the network becomes smaller. |
|
3319 | 3336 |
#' @param nCommunitiesMax Integer > 0. Default 8. Maximum number of communities that get a distinct coloring. All additional communities will be colored with the same (gray) color. |
3320 | 3337 |
#' @param graph Character. Default \code{TF-gene}. One of: \code{TF-gene}, \code{TF-peak-gene}. Whether to plot a graph with links from TFs to peaks to gene, or the graph with the inferred TF to gene connections. |
3321 |
-#' @param colorby Character. Default \code{type}. One of \code{type}, code \code{community}. Color the vertices by either type (TF/peak/gene) or community. See \code{\link{calculateCommunitiesStats}} |
|
3338 |
+#' @param colorby Character. Default \code{type}. Either \code{type} or \code{community}. Color the vertices by either type (TF/peak/gene) or community. See \code{\link{calculateCommunitiesStats}} |
|
3322 | 3339 |
#' @param layout Character. Default \code{fr}. One of \code{star}, \code{fr}, \code{sugiyama}, \code{kk}, \code{lgl}, \code{graphopt}, \code{mds}, \code{sphere} |
3323 | 3340 |
#' @param vertice_color_TFs Named list. Default \code{list(h = 10, c = 85, l = c(25, 95))}. The list must specify the color in hcl format (hue, chroma, luminence). See the \code{colorspace} package for more details and examples |
3324 | 3341 |
#' @param vertice_color_peaks Named list. Default \code{list(h = 135, c = 45, l = c(35, 95))}. |
... | ... |
@@ -3343,10 +3360,12 @@ visualizeGRN <- function(GRN, outputFolder = NULL, basenameOutput = NULL, plotA |
3343 | 3360 |
checkmate::assertClass(GRN, "GRN") |
3344 | 3361 |
GRN = .addFunctionLogToObject(GRN) |
3345 | 3362 |
|
3363 |
+ GRN = .makeObjectCompatible(GRN) |
|
3364 |
+ |
|
3346 | 3365 |
checkmate::assertFlag(plotAsPDF) |
3347 | 3366 |
checkmate::assertNumeric(pdf_width, lower = 5, upper = 99) |
3348 | 3367 |
checkmate::assertNumeric(pdf_height, lower = 5, upper = 99) |
3349 |
- checkmate::assertNumeric(maxRowsToPlot) |
|
3368 |
+ checkmate::assertIntegerish(maxRowsToPlot, lower = 1) |
|
3350 | 3369 |
checkmate::assertIntegerish(nCommunitiesMax,lower = 1) |
3351 | 3370 |
checkmate::assertChoice(graph, c("TF-gene", "TF-peak-gene")) |
3352 | 3371 |
checkmate::assertChoice(colorby, c("type", "community")) |
... | ... |
@@ -3354,10 +3373,13 @@ visualizeGRN <- function(GRN, outputFolder = NULL, basenameOutput = NULL, plotA |
3354 | 3373 |
checkmate::assertChoice(layout, c("star", "fr", "sugiyama", "kk", "lgl", "graphopt", "mds", "sphere")) |
3355 | 3374 |
checkmate::assertList(vertice_color_TFs) |
3356 | 3375 |
checkmate::assertNames(names(vertice_color_TFs), must.include = c("h", "c", "l"), subset.of = c("h", "c", "l")) |
3376 |
+ checkmate::assertIntegerish(unlist(vertice_color_TFs), lower = 0, upper = 360) |
|
3357 | 3377 |
checkmate::assertList(vertice_color_peaks) |
3358 | 3378 |
checkmate::assertNames(names(vertice_color_peaks), must.include = c("h", "c", "l"), subset.of = c("h", "c", "l")) |
3379 |
+ checkmate::assertIntegerish(unlist(vertice_color_peaks), lower = 0, upper = 360) |
|
3359 | 3380 |
checkmate::assertList(vertice_color_genes) |
3360 | 3381 |
checkmate::assertNames(names(vertice_color_genes), must.include = c("h", "c", "l"), subset.of = c("h", "c", "l")) |
3382 |
+ checkmate::assertIntegerish(unlist(vertice_color_genes), lower = 0, upper = 360) |
|
3361 | 3383 |
checkmate::assertNumeric(vertexLabel_cex) |
3362 | 3384 |
checkmate::assertNumeric(vertexLabel_dist) |
3363 | 3385 |
checkmate::assertFlag(forceRerun) |
... | ... |
@@ -3390,10 +3412,9 @@ visualizeGRN <- function(GRN, outputFolder = NULL, basenameOutput = NULL, plotA |
3390 | 3412 |
grn.merged$V1[!is.na(grn.merged$TF.name)] = as.character(grn.merged$TF.name[!is.na(grn.merged$TF.name)]) # replace TF ensembl with TF name |
3391 | 3413 |
|
3392 | 3414 |
edges_final = grn.merged %>% |
3393 |
- dplyr::mutate(R = as.vector(stats::na.omit(c(.data$TF_peak.r, .data$peak_gene.r))), |
|
3394 |
- weight = as.vector(stats::na.omit(c(1- .data$TF_peak.fdr, .data$peak_gene.r))), |
|
3415 |
+ dplyr::mutate(weight = .data$r, |
|
3395 | 3416 |
linetype = "solid") %>% |
3396 |
- dplyr::rename(from = .data$V1, to = .data$V2) |
|
3417 |
+ dplyr::rename(from = .data$V1, to = .data$V2, R = .data$r) |
|
3397 | 3418 |
|
3398 | 3419 |
} |
3399 | 3420 |
|
... | ... |
@@ -3730,7 +3751,7 @@ visualizeGRN <- function(GRN, outputFolder = NULL, basenameOutput = NULL, plotA |
3730 | 3751 |
community_colors = rbind(community_colors, fillercolors) |
3731 | 3752 |
|
3732 | 3753 |
}else{ |
3733 |
- community_colors = data.frame(community = names(sort(table(GRN@graph$TF_gene$clusterGraph$membership), decreasing = TRUE)[seq_len(nCommunitiesMax)]), |
|
3754 |
+ community_colors = data.frame(community = names(sort(table(GRN@graph$TF_gene$clusterGraph$membership), decreasing = TRUE)[seq_len(ncommunities)]), |
|
3734 | 3755 |
color = grDevices::rainbow(ncommunities)) |
3735 | 3756 |
} |
3736 | 3757 |
|
... | ... |
@@ -3958,3 +3979,36 @@ visualizeGRN <- function(GRN, outputFolder = NULL, basenameOutput = NULL, plotA |
3958 | 3979 |
checkmate::assertDataFrame(vertice_color_list[[1]]) |
3959 | 3980 |
checkmate::assertSubset(c(vertice_color_list[[2]], vertice_color_list[[3]]), colnames(vertice_color_list[[1]]), empty.ok = FALSE) |
3960 | 3981 |
} |
3982 |
+ |
|
3983 |
+ |
|
3984 |
+ |
|
3985 |
+ |
|
3986 |
+# **dist** |
|
3987 |
+# ```{r} |
|
3988 |
+# # sort cols by median fraction of the variance explained (default) and plot |
|
3989 |
+# plotVarPart( sortCols(varPart) ) |
|
3990 |
+# ``` |
|
3991 |
+# |
|
3992 |
+# **sex** |
|
3993 |
+# ```{r} |
|
3994 |
+# # plot top 10 |
|
3995 |
+# head(df_varPart[order( dplyr::pull(df_varPart, c("sex")), decreasing = T),],10) %>% |
|
3996 |
+# plotPercentBars() |
|
3997 |
+# ``` |
|
3998 |
+# |
|
3999 |
+# **cell_subset** |
|
4000 |
+# ```{r} |
|
4001 |
+# # plot top 10 |
|
4002 |
+# head(df_varPart[order( dplyr::pull(df_varPart, c("cell_subset")), decreasing = T),],10) %>% |
|
4003 |
+# plotPercentBars() |
|
4004 |
+# ``` |
|
4005 |
+# |
|
4006 |
+# **sample_loc** |
|
4007 |
+# ```{r} |
|
4008 |
+# # plot top 10 for sample_loc |
|
4009 |
+# head(df_varPart[order( dplyr::pull(df_varPart, c("sample_loc")), decreasing = T),],10) %>% |
|
4010 |
+# plotPercentBars() |
|
4011 |
+# ``` |
|
4012 |
+# |
|
4013 |
+# |
|
4014 |
+# } |
... | ... |
@@ -30,7 +30,7 @@ addTFBS( |
30 | 30 |
\item{forceRerun}{\code{TRUE} or \code{FALSE}. Default \code{FALSE}. Force execution, even if the GRN object already contains the result. Overwrites the old results.} |
31 | 31 |
} |
32 | 32 |
\value{ |
33 |
-An updated \code{\linkS4class{GRN}} object, with additional information added from this function (\code{GRN@data$TFs$translationTable} in particular) |
|
33 |
+An updated \code{\linkS4class{GRN}} object, with additional information added from this function (\code{GRN@annotation$TFs} in particular) |
|
34 | 34 |
} |
35 | 35 |
\description{ |
36 | 36 |
Add TFBS to a \code{\linkS4class{GRN}} object |
37 | 37 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,56 @@ |
1 |
+% Generated by roxygen2: do not edit by hand |
|
2 |
+% Please edit documentation in R/core.R |
|
3 |
+\name{add_featureVariation} |
|
4 |
+\alias{add_featureVariation} |
|
5 |
+\title{Quantify and interpret multiple sources of biological and technical variation for features (TFs, peaks, and genes) in a \code{\linkS4class{GRN}} object} |
|
6 |
+\usage{ |
|
7 |
+add_featureVariation( |
|
8 |
+ GRN, |
|
9 |
+ formula = "auto", |
|
10 |
+ metadata = c("all"), |
|
11 |
+ features = "all_filtered", |
|
12 |
+ nCores = 1, |
|
13 |
+ forceRerun = FALSE, |
|
14 |
+ ... |
|
15 |
+) |
|
16 |
+} |
|
17 |
+\arguments{ |
|
18 |
+\item{GRN}{Object of class \code{\linkS4class{GRN}}} |
|
19 |
+ |
|
20 |
+\item{formula}{Character(1). Either \code{auto} or a manually defined formula to be used for the model fitting. Default \code{auto}. Must include only terms that are part of the metadata as specified with the \code{metadata} parameter. If set to \code{auto}, the formula will be build automatically based on all metadata variables as specified with the \code{metadata} parameter. By default, numerical variables will be modeled as fixed effects, while variables that are defined as factors or can be converted to factors (characters and logical variables) are modeled as random effects as recommended by the \code{variancePartition} package.} |
|
21 |
+ |
|
22 |
+\item{metadata}{Character vector. Default \code{all}. Vector of column names from the metadata data frame that was provided when using the function |
|
23 |
+\code{\link{addData}}. Must either contain the special keyword \code{all} to denote that all (!) metadata columns from \code{GRN@data$metadata} are taken |
|
24 |
+or if not, a subset of the column names from \code{GRN@data$metadata}to include in the model fitting for \code{fitExtractVarPartModel}..} |
|
25 |
+ |
|
26 |
+\item{features}{Character(1). Either \code{all_filtered} or \code{all}. Default \code{all_filtered}. Should \code{variancePartition} only be run for the features (TFs, peaks and genes) from the filtered set of connections (the result of \code{\link{filterGRNAndConnectGenes}}) or for all genes that are defined in the object? If set to \code{all}, the running time is greatly increased.} |
|
27 |
+ |
|
28 |
+\item{nCores}{Integer >0. Default 1. Number of cores to use. |
|
29 |
+A value >1 requires the \code{BiocParallel} package (as it is listed under \code{Suggests}, it may not be installed yet).} |
|
30 |
+ |
|
31 |
+\item{forceRerun}{\code{TRUE} or \code{FALSE}. Default \code{FALSE}. Force execution, even if the GRN object already contains the result. Overwrites the old results.} |
|
32 |
+ |
|
33 |
+\item{...}{Additional parameters passed on to \code{variancePartition::fitExtractVarPartModel} beyond \code{exprObj}, \code{formula} and \code{data}. See the function help for more information} |
|
34 |
+} |
|
35 |
+\value{ |
|
36 |
+An updated \code{\linkS4class{GRN}} object, with additional information added from this function to \code{GRN@stats$variancePartition} as well as the elements \code{genes}, \code{consensusPeaks} and \code{TFs} within \code{GRN@annotation}. |
|
37 |
+As noted above, the results are not added to \code{GRN@connections$all.filtered}; rerun the function \code{\link{getGRNConnections}} and set \code{include_variancePartitionResults} to \code{TRUE} to include the results in the eGRN output table. |
|
38 |
+} |
|
39 |
+\description{ |
|
40 |
+Runs the main function \code{fitExtractVarPartModel} of the package \code{variancePartition}: Fits a linear (mixed) model to estimate contribution of multiple sources of variation while simultaneously correcting for all other variables for the features in a GRN object (TFs, peaks, and genes) given particular metadata. The function reports the fraction of variance attributable to each metadata variable. |
|
41 |
+\strong{Note: The results are not added to \code{GRN@connections$all.filtered}, rerun the function \code{\link{getGRNConnections}} and set \code{include_variancePartitionResults} to \code{TRUE} to do so}. |
|
42 |
+The results object is stored in \code{GRN@stats$variancePartition} and can be used for the various diagnostic and plotting functions from \code{variancePartition}. |
|
43 |
+} |
|
44 |
+\details{ |
|
45 |
+The normalized count matrices are used as input for \code{fitExtractVarPartModel}. |
|
46 |
+} |
|
47 |
+\examples{ |
|
48 |
+# See the Workflow vignette on the GRaNIE website for examples |
|
49 |
+GRN = loadExampleObject() |
|
50 |
+GRN = add_featureVariation(GRN, metadata = c("mt_frac"), forceRerun = TRUE) |
|
51 |
+} |
|
52 |
+\seealso{ |
|
53 |
+\code{\link{addData}} |
|
54 |
+ |
|
55 |
+\code{\link{getGRNConnections}} |
|
56 |
+} |
... | ... |
@@ -39,7 +39,8 @@ The special keyword \code{"all"} means no filter on gene type.} |
39 | 39 |
|
40 | 40 |
\item{selection}{Character. Default \code{"byRank"}. One of: \code{"byRank"}, \code{"byLabel"}. Specify whether the communities enrichment will by calculated based on their rank, where the largest community (with most vertices) would have a rank of 1, or by their label. Note that the label is independent of the rank.} |
41 | 41 |
|
42 |
-\item{communities}{Numeric vector. Default \code{c(1:10)}. Depending on what was specified in the \code{display} parameter, this parameter would indicate either the rank or the label of the communities to be plotted. i.e. for \code{communities = c(1,4)}, if \code{display = "byRank"} the GO enrichment for the first and fourth largest communities will be calculated if \code{display = "byLabel"}, the results for the communities labeled "1", and "4" will be plotted.} |
|
42 |
+\item{communities}{Numeric vector. Default \code{c(1:10)}. Depending on what was specified in the \code{display} parameter, |
|
43 |
+this parameter would indicate either the rank or the label of the communities to be plotted. i.e. for \code{communities = c(1,4)}, if \code{display = "byRank"} the GO enrichment for the first and fourth largest communities will be calculated if \code{display = "byLabel"}, the results for the communities labeled "1", and "4" will be plotted.} |
|
43 | 44 |
|
44 | 45 |
\item{pAdjustMethod}{Character. Default \code{"BH"}. One of: \code{"holm"}, \code{"hochberg"}, \code{"hommel"}, \code{"bonferroni"}, \code{"BH"}, \code{"BY"}, \code{"fdr"}. This parameter is only relevant for the following ontologies: KEGG, DO, Reactome. For the other ontologies, the algorithm serves as an adjustment.} |
45 | 46 |
|
... | ... |
@@ -16,7 +16,8 @@ calculateCommunitiesStats(GRN, clustering = "louvain", forceRerun = FALSE, ...) |
16 | 16 |
\item{...}{Additional parameters for the used clustering method, see the \code{igraph::cluster_*} methods for details on the specific parameters and what they do. For \code{leiden} clustering, for example, you may add a \code{resolution_parameter} to control the granularity of the community detection or \code{n_iterations} to modify the number of iterations.} |
17 | 17 |
} |
18 | 18 |
\value{ |
19 |
-An updated \code{\linkS4class{GRN}} object, with a table that consists of the connections clustered into communities stored in the \code{stats$communities} slot. |
|
19 |
+An updated \code{\linkS4class{GRN}} object, with a table that consists of the connections clustered into communities stored in the |
|
20 |
+\code{GRN@graph$TF_gene$clusterGraph} slot as well as within the \code{igraph} object in \code{GRN@graph$TF_gene$graph} (retrievable via \code{igraph} using \code{igraph::vertex.attributes(GRN@graph$TF_gene$graph)$community}, for example.) |
|
20 | 21 |
} |
21 | 22 |
\description{ |
22 | 23 |
The results can subsequently be visualized with the function \code{\link{plotCommunitiesStats}} |
... | ... |
@@ -51,7 +51,11 @@ filterData( |
51 | 51 |
An updated \code{\linkS4class{GRN}} object, with added data from this function. |
52 | 52 |
} |
53 | 53 |
\description{ |
54 |
-Filter RNA-seq and/or peak data from a \code{\linkS4class{GRN}} object |
|
54 |
+This function marks genes and/or peaks as \code{filtered} depending on the chosen filtering criteria. Filtered genes / peaks will then be |
|
55 |
+disregarded when adding connections in subsequent steps via \code{\link{addConnections_TF_peak}} and \code{\link{addConnections_peak_gene}}. \strong{This function does NOT (re)filter existing connections when the \code{\linkS4class{GRN}} object already contains connections. Thus, upon re-execution of this function with different filtering criteria, all downstream steps have to be re-run.} |
|
56 |
+} |
|
57 |
+\details{ |
|
58 |
+All this function does is setting (or modifying) the filtering flag in \code{GRN@data$peaks$counts_norm} and \code{GRN@data$RNA$counts_norm.l}, respectively. |
|
55 | 59 |
} |
56 | 60 |
\examples{ |
57 | 61 |
# See the Workflow vignette on the GRaNIE website for examples |
... | ... |
@@ -80,12 +80,14 @@ keeping backwards compatibility with \code{\linkS4class{GRN}} objects.} |
80 | 80 |
\item{silent}{\code{TRUE} or \code{FALSE}. Default \code{FALSE}. Print progress messages and filter statistics.} |
81 | 81 |
} |
82 | 82 |
\value{ |
83 |
-An updated \code{\linkS4class{GRN}} object, with additional information added from this function. The filtered and merged TF-peak and peak-gene connections in the slot \code{GRN@connections$all.filtered}. |
|
83 |
+An updated \code{\linkS4class{GRN}} object, with additional information added from this function. |
|
84 |
+The filtered and merged TF-peak and peak-gene connections in the slot \code{GRN@connections$all.filtered} and can be retrieved (along with other feature metadata) using the function \code{\link{getGRNConnections}}. |
|
84 | 85 |
} |
85 | 86 |
\description{ |
86 | 87 |
This is one of the main integrative functions of the \code{GRaNIE} package. It has two main functions: |
87 | 88 |
First, filtering both TF-peak and peak-gene connections according to different criteria such as FDR and other properties |
88 | 89 |
Second, joining the three major elements that an eGRN consist of (TFs, peaks, genes) into one data frame, with one row per unique TF-peak-gene connection. |
90 |
+\strong{After successful execution, the connections (along with additional feature metadata) can be retrieved with the function \code{\link{getGRNConnections}}.} |
|
89 | 91 |
\strong{Note that the eGRN graph is reset upon successful execution of this function, and re-running the function \code{\link{build_eGRN_graph}} |
90 | 92 |
is necessary when any of the network functions of the package shall be executed.} |
91 | 93 |
Internally, before joining them, both TF-peak links and peak-gene connections are filtered separately for reasons of memory and computational efficacy: |
... | ... |
@@ -106,4 +108,6 @@ GRN = filterGRNAndConnectGenes(GRN) |
106 | 108 |
\code{\link{addConnections_peak_gene}} |
107 | 109 |
|
108 | 110 |
\code{\link{build_eGRN_graph}} |
111 |
+ |
|
112 |
+\code{\link{getGRNConnections}} |
|
109 | 113 |
} |
... | ... |
@@ -8,17 +8,32 @@ getGRNConnections( |
8 | 8 |
GRN, |
9 | 9 |
type = "all.filtered", |
10 | 10 |
permuted = FALSE, |
11 |
- include_TF_gene_correlations = FALSE |
|
11 |
+ include_TF_gene_correlations = FALSE, |
|
12 |
+ include_TFMetadata = FALSE, |
|
13 |
+ include_peakMetadata = FALSE, |
|
14 |
+ include_geneMetadata = FALSE, |
|
15 |
+ include_variancePartitionResults = FALSE |
|
12 | 16 |
) |
13 | 17 |
} |
14 | 18 |
\arguments{ |
15 | 19 |
\item{GRN}{Object of class \code{\linkS4class{GRN}}} |
16 | 20 |
|
17 |
-\item{type}{Character. One of \code{TF_peaks}, \code{peak_genes}, \code{all.filtered}. Default \code{all.filtered}. The type of connections to retrieve.} |
|
21 |
+\item{type}{Character. One of \code{TF_peaks}, \code{peak_genes}, \code{TF_genes} or \code{all.filtered}. Default \code{all.filtered}. The type of connections to retrieve.} |
|
18 | 22 |
|
19 | 23 |
\item{permuted}{\code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should the permuted data be taken (\code{TRUE}) or the non-permuted, original one (\code{FALSE})?} |
20 | 24 |
|
21 |
-\item{include_TF_gene_correlations}{Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should TFs and gene correlations be returned as well? If set to \code{TRUE}, they must have been computed beforehand with \code{\link{add_TF_gene_correlation}}.} |
|
25 |
+\item{include_TF_gene_correlations}{Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should TFs and gene correlations be returned as well? If set to \code{TRUE}, they must have been computed beforehand with \code{\link{add_TF_gene_correlation}}. Only relevant for type = "all.filtered"} |
|
26 |
+ |
|
27 |
+\item{include_TFMetadata}{Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should TF metadata be returned as well? Only relevant for type = "all.filtered"} |
|
28 |
+ |
|
29 |
+\item{include_peakMetadata}{Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should peak metadata be returned as well? Only relevant for type = "all.filtered"} |
|
30 |
+ |
|
31 |
+\item{include_geneMetadata}{Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. Should gene metadata be returned as well? Only relevant for type = "all.filtered"} |
|
32 |
+ |
|
33 |
+\item{include_variancePartitionResults}{Logical. \code{TRUE} or \code{FALSE}. Default \code{FALSE}. |
|
34 |
+Should the results from the function \code{\link{add_featureVariation}} be included? |
|
35 |
+If set to \code{TRUE}, they must have been computed beforehand with \code{\link{add_featureVariation}}; otherwise, an error is thrown. |
|
36 |
+Only relevant for type = "all.filtered"} |
|
22 | 37 |
} |
23 | 38 |
\value{ |
24 | 39 |
A data frame with the requested connections. This function does **NOT** return a \code{\linkS4class{GRN}} object. |
... | ... |
@@ -34,4 +49,8 @@ GRN_con.all.df = getGRNConnections(GRN) |
34 | 49 |
} |
35 | 50 |
\seealso{ |
36 | 51 |
\code{\link{filterGRNAndConnectGenes}} |
52 |
+ |
|
53 |
+\code{\link{add_featureVariation}} |
|
54 |
+ |
|
55 |
+\code{\link{add_TF_gene_correlation}} |
|
37 | 56 |
} |
... | ... |
@@ -38,9 +38,9 @@ plotCommunitiesEnrichment( |
38 | 38 |
|
39 | 39 |
\item{p}{Numeric. Default 0.05. p-value threshold to determine significance.} |
40 | 40 |
|
41 |
-\item{nSignificant}{Numeric. Default 3. Threshold to filter out an ontology term with less than \code{nSignificant} overlapping genes.} |
|
41 |
+\item{nSignificant}{Numeric >0. Default 3. Threshold to filter out an ontology term with less than \code{nSignificant} overlapping genes.} |
|
42 | 42 |
|
43 |
-\item{nID}{Numeric. Default 10. For the reduced summary heatmap, number of top terms to select per community / for the general enrichment.} |
|
43 |
+\item{nID}{Numeric >0. Default 10. For the reduced summary heatmap, number of top terms to select per community / for the general enrichment.} |
|
44 | 44 |
|
45 | 45 |
\item{maxWidth_nchar_plot}{Integer (>=10). Default 50. Maximum number of characters for a term before it is truncated.} |
46 | 46 |
|
... | ... |
@@ -63,7 +63,7 @@ The same \code{\linkS4class{GRN}} object, without modifications. |
63 | 63 |
Similarly to \code{\link{plotGeneralEnrichment}} and \code{\link{plotTFEnrichment}}, the results of the community-based enrichment analysis are plotted. |
64 | 64 |
This function produces multiple plots. First, one plot per community to summarize the community-specific enrichment. |
65 | 65 |
Second, a summary heatmap of all significantly enriched terms across all communities and for the whole eGRN. The latter allows to compare the results with the general network enrichment. |
66 |
-#' Third, a subset of the aforementioned heatmap, showing only the top most significantly enriched terms per community and for the whole eGRN (as specified by \code{nID}) for improved visibility |
|
66 |
+Third, a subset of the aforementioned heatmap, showing only the top most significantly enriched terms per community and for the whole eGRN (as specified by \code{nID}) for improved visibility |
|
67 | 67 |
} |
68 | 68 |
\examples{ |
69 | 69 |
# See the Workflow vignette on the GRaNIE website for examples |
... | ... |
@@ -37,9 +37,9 @@ plotTFEnrichment( |
37 | 37 |
|
38 | 38 |
\item{p}{Numeric. Default 0.05. p-value threshold to determine significance.} |
39 | 39 |
|
40 |
-\item{nSignificant}{Numeric. Default 3. Threshold to filter out an ontology term with less than \code{nSignificant} overlapping genes.} |
|
40 |
+\item{nSignificant}{Numeric >0. Default 3. Threshold to filter out an ontology term with less than \code{nSignificant} overlapping genes.} |
|
41 | 41 |
|
42 |
-\item{nID}{Numeric. Default 10. For the reduced summary heatmap, number of top terms to select per community / for the general enrichment.} |
|
42 |
+\item{nID}{Numeric >0. Default 10. For the reduced summary heatmap, number of top terms to select per community / for the general enrichment.} |
|
43 | 43 |
|
44 | 44 |
\item{display_pAdj}{\code{TRUE} or \code{FALSE}. Default \code{FALSE}. Is the p-value being displayed in the plots the adjusted p-value? This parameter is relevant for KEGG, Disease Ontology, and Reactome enrichments, and does not affect GO enrichments.} |
45 | 45 |
|
... | ... |
@@ -2,7 +2,7 @@ |
2 | 2 |
% Please edit documentation in R/plot.R |
3 | 3 |
\name{visualizeGRN} |
4 | 4 |
\alias{visualizeGRN} |
5 |
-\title{Visualize a filtered GRN in a flexible manner.} |
|
5 |
+\title{Visualize a filtered eGRN in a flexible manner.} |
|
6 | 6 |
\usage{ |
7 | 7 |
visualizeGRN( |
8 | 8 |
GRN, |
... | ... |
@@ -38,15 +38,15 @@ visualizeGRN( |
38 | 38 |
|
39 | 39 |
\item{pdf_height}{Number >0. Default 12. Height of the PDF, in cm.} |
40 | 40 |
|
41 |
-\item{title}{NULL or Character. Default NULL. Title to be assigned to the plot.} |
|
41 |
+\item{title}{\code{NULL} or Character. Default \code{NULL}. Title to be assigned to the plot.} |
|
42 | 42 |
|
43 |
-\item{maxRowsToPlot}{Numeric. Default 500. Refers to the maximum number of connections to be plotted. If the network size is above this limit, nothing will be drawn. In such a case, it may help to either increase the value of this parameter or set the filtering criteria for the network to be more stringent, so that the network becomes smaller.} |
|
43 |
+\item{maxRowsToPlot}{Integer > 0. Default 500. Refers to the maximum number of connections to be plotted. If the network size is above this limit, nothing will be drawn. In such a case, it may help to either increase the value of this parameter or set the filtering criteria for the network to be more stringent, so that the network becomes smaller.} |
|
44 | 44 |
|
45 | 45 |
\item{nCommunitiesMax}{Integer > 0. Default 8. Maximum number of communities that get a distinct coloring. All additional communities will be colored with the same (gray) color.} |
46 | 46 |
|
47 | 47 |
\item{graph}{Character. Default \code{TF-gene}. One of: \code{TF-gene}, \code{TF-peak-gene}. Whether to plot a graph with links from TFs to peaks to gene, or the graph with the inferred TF to gene connections.} |
48 | 48 |
|
49 |
-\item{colorby}{Character. Default \code{type}. One of \code{type}, code \code{community}. Color the vertices by either type (TF/peak/gene) or community. See \code{\link{calculateCommunitiesStats}}} |
|
49 |
+\item{colorby}{Character. Default \code{type}. Either \code{type} or \code{community}. Color the vertices by either type (TF/peak/gene) or community. See \code{\link{calculateCommunitiesStats}}} |
|
50 | 50 |
|
51 | 51 |
\item{layout}{Character. Default \code{fr}. One of \code{star}, \code{fr}, \code{sugiyama}, \code{kk}, \code{lgl}, \code{graphopt}, \code{mds}, \code{sphere}} |
52 | 52 |
|
... | ... |
@@ -66,7 +66,7 @@ visualizeGRN( |
66 | 66 |
The same \code{\linkS4class{GRN}} object, without modifications. |
67 | 67 |
} |
68 | 68 |
\description{ |
69 |
-This function requires a filtered set of connections in the \code{\linkS4class{GRN}} object as generated by \code{\link{filterGRNAndConnectGenes}} |
|
69 |
+This function can visualize a filtered eGRN in a very flexible manner and requires a filtered set of connections in the \code{\linkS4class{GRN}} object as generated by \code{\link{filterGRNAndConnectGenes}}. |
|
70 | 70 |
} |
71 | 71 |
\examples{ |
72 | 72 |
GRN = loadExampleObject() |
... | ... |
@@ -34,13 +34,6 @@ opts_chunk$set( |
34 | 34 |
# Motivation and Necessity {#motivation} |
35 | 35 |
|
36 | 36 |
|
37 |
-<center> |
|
38 |
-```{r logo, echo=FALSE, fig.cap="<i>Macrophages eGRN as constructed by the `GRaNIE` package</i>", out.width = '50%', fig.align="center"} |
|
39 |
-knitr::include_graphics("../man/figures/logo.png") |
|
40 |
-``` |
|
41 |
-</center> |
|
42 |
- |
|
43 |
- |
|
44 | 37 |
Genetic variants associated with diseases often affect non-coding regions, thus likely having a regulatory role. To understand the effects of genetic variants in these regulatory regions, identifying genes that are modulated by specific regulatory elements (REs) is crucial. The effect of gene regulatory elements, such as enhancers, is often cell-type specific, likely because the combinations of transcription factors (TFs) that are regulating a given enhancer have cell-type specific activity. This TF activity can be quantified with existing tools such as `diffTF` and captures differences in binding of a TF in open chromatin regions. Collectively, this forms a enhancer-mediated gene regulatory network (*eGRN*) with cell-type and data-specific TF-RE and RE-gene links. Here, we reconstruct such a *eGRN* using bulk RNA-seq and open chromatin (e.g., using ATAC-seq or ChIP-seq for open chromatin marks) and optionally TF activity data. Our network contains different types of links, connecting TFs to regulatory elements, the latter of which is connected to genes in the vicinity or within the same chromatin domain (*TAD*). We use a statistical framework to assign empirical FDRs and weights to all links using a permutation-based approach. |
45 | 38 |
|
46 | 39 |
In summary, we present a framework to reconstruct predictive enhancer-mediated regulatory network models that are based on integrating of expression and chromatin accessibility/activity pattern across individuals, and provide a comprehensive resource of cell-type specific gene regulatory networks for particular cell types. |
... | ... |
@@ -68,6 +61,7 @@ The basic installation installs all required packages but not necessarily those |
68 | 61 |
- `IHW`: Needed only for the function `filterGRNAndConnectGenes` for p-value adjustment of the raw peak-gene p-values when using the `IHW` framework (parameter `peak_gene.fdr.method`) |
69 | 62 |
- `robust`: Needed only when setting `addRobustRegression = TRUE` in the functions `addConnections_peak_gene()` and `add_TF_gene_correlation()` for enabling robust regression (an experimental, largely undocumented feature so far). |
70 | 63 |
- `csaw`: Needed only for a *cyclic LOESS* normalization, which is the default normalization currently for adding TF activity data via `addData_TFActivity()` |
64 |
+ - `variancePartition`: Needed only for the function `add_featureVariation` to quantify multiple sources of biological and technical variation for features (TFs, peaks, and genes) |
|
71 | 65 |
- **Enrichment-associated packages** |
72 | 66 |
- `topGO`, `ReactomePA`, `DOSE`, `clusterProfiler`: Needed only for all enrichment functions for `GO`, `Reactome`, `DO`, and `KEGG` respectively |
73 | 67 |
- **Computational efficacy** |
... | ... |
@@ -425,15 +425,19 @@ saveRDS(GRN, GRN_file_outputRDS) |
425 | 425 |
|
426 | 426 |
## Retrieve filtered connections |
427 | 427 |
|
428 |
-We are now ready to retrieve the connections and the additional data we added to them. This can be done with the helper function `getGRNConnections()` that retrieves a data frame from a `GRaNIE` object from a particular slot. Here, we specify `all.filtered`, as we want to retrieve all filtered connections. For more parameter details, see the R help (`getGRNConnections`). Note that the first time, we assign a different variable to the return of the function (i.e., `GRN_connections.all` and NOT `GRN` as before). Importantly, we have to select a new variable as we would otherwise overwrite our `GRN` object altogether! All `get` functions from the `GRaNIE` package return an element from within the object and NOT the object itself, so please keep that in mind and always check what the functions returns before running it. You can simply do so in the R help (`?getGRNConnections`). |
|
428 |
+We are now ready to retrieve the filtered connections, along with adding various additional metadata (optional). This can be done with the helper function `getGRNConnections()` that retrieves the filtered eGRN as a data frame from a `GRaNIE` object. Here, we specify `all.filtered`, as we want to retrieve all filtered connections (i.e., the eGRN). For more parameter details, see the R help (`getGRNConnections`). |
|
429 |
+ |
|
430 |
+Here, for example, we add various additional information to the resulting data frame: TF-gene correlations, and gene metadata. We could also add TF and peak metadata as well as the results from running the `variancePartition` package, but this is not done here and we leave this as an exercise to the reader! |
|
431 |
+ |
|
432 |
+Note that the first time, we assign a different variable to the return of the function (i.e., `GRN_connections.all` and NOT `GRN` as before). Importantly, we have to select a new variable as we would otherwise overwrite our `GRN` object altogether! All `get` functions from the `GRaNIE` package return an element from within the object and NOT the object itself, so please keep that in mind and always check what the functions returns before running it. You can simply do so in the R help (`?getGRNConnections`). |
|
429 | 433 |
|
430 | 434 |
```{r, include=TRUE, eval = TRUE, class.output="scroll-200"} |
431 |
-GRN_connections.all = getGRNConnections(GRN, type = "all.filtered", include_TF_gene_correlations = TRUE) |
|
435 |
+GRN_connections.all = getGRNConnections(GRN, type = "all.filtered", include_TF_gene_correlations = TRUE, include_geneMetadata = TRUE) |
|
432 | 436 |
|
433 | 437 |
GRN_connections.all |
434 | 438 |
``` |
435 | 439 |
|
436 |
-The table contains many columns, and the prefix of each column name indicates the part of the *eGRN* network that the column refers to (e.g., TFs, TF-enhancers, enhancers, enhancer-genes or genes, or TF-gene if the function `add_TF_gene_correlation()` has been run before). Data are stored in a format that minimizes the memory footprint (e.g., each character column is stored as a factor). This table can now be used for any downstream analysis, as it is just a normal data frame. |
|
440 |
+The table contains many columns, and the prefix of each column name indicates the part of the *eGRN* network that the column refers to (e.g., TFs, TF-peaks, peaks, peaks-genes or genes, or TF-gene if the function `add_TF_gene_correlation()` has been run before). Data are stored in a format that minimizes the memory footprint (e.g., each character column is stored as a factor). This table can now be used for any downstream analysis, as it is just a normal data frame. |
|
437 | 441 |
|
438 | 442 |
|
439 | 443 |
## Construct the eGRN graph |