# treeclimbR

<!-- badges: start -->
[![R-CMD-check](](
<!-- badges: end -->

`treeclimbR` is an algorithm to pinpoint the optimal data-dependent resolution for interpreting hierarchical hypotheses. The algorithm is described in more detail in the following paper:

Huang R, Soneson C, Germain PL, Schmidt TSB, Mering CV, Robinson MD: [treeclimbR pinpoints the data-dependent resolution of hierarchical hypotheses]( _Genome Biology_ 22(1):157 (2021).

## Installation

`treeclimbR` can be installed from Bioconductor (release 3.19 onwards) via

``` r
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("treeclimbR")
```

## Usage

A detailed vignette outlining the main functionality is available [here](

## Basic example (more details [here](

In this example application, we first generate a random tree with 100 leaves, with each leaf representing an entity (e.g., a microbial species). Then we sample counts (abundances) of these entities in each sample from a multinomial distribution. In total, there are 40 samples (20 assigned to group A and the other 20 to group B). 18 entities are randomly selected to display differential abundance between the groups (and will have their counts multiplied by 2 in group A).

Before running `treeclimbR`, we expand the leaf-level count matrix to also encompass internal nodes, by summing the counts for all the corresponding leaves. Next, we perform a Wilcoxon rank sum test on each node in the tree to obtain P-values and directions of change. Details of the analysis are available [here](

The question is now whether it is beneficial to interpret parts of the tree on a level higher up in the tree than the leaves - this would be the case if there are subtrees where all nodes and leaves change consistently in the same direction. In this situation, we can summarize the signal on the root level of that subtree, which will reduce the length of the result list and improve the interpretability.
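
The data preparation described above (simulating counts, aggregating them to internal nodes, and testing every node) can be sketched in base R as follows. This is only an illustrative approximation, not the code used to produce the figures in this README: the random tree is obtained by clustering noise with `hclust()`, the library size of 10000 is arbitrary, the direction of change is taken as the sign of the difference in group means, and all object names (`descendants`, `all_counts`, `res`, ...) are made up for this sketch.

``` r
set.seed(1)

n_leaves <- 100     # entities (leaves of the tree)
n_per_group <- 20   # samples per group
n_samples <- 2 * n_per_group
group <- rep(c("A", "B"), each = n_per_group)

## Random binary tree over the leaves (assumption: a stand-in built by
## hierarchical clustering of random points, giving 99 internal nodes)
hc <- hclust(dist(matrix(rnorm(2 * n_leaves), nrow = n_leaves)))

## Leaves below each internal node, read off the merge matrix:
## negative entries are leaves, positive entries refer to earlier merges
descendants <- vector("list", nrow(hc$merge))
for (i in seq_len(nrow(hc$merge))) {
  descendants[[i]] <- unlist(lapply(hc$merge[i, ], function(ch) {
    if (ch < 0) -ch else descendants[[ch]]
  }))
}

## Leaf-level counts from a multinomial distribution (entities x samples)
leaf_counts <- rmultinom(n_samples, size = 10000, prob = runif(n_leaves))

## 18 randomly selected entities get their counts doubled in group A
da_leaves <- sample(n_leaves, 18)
leaf_counts[da_leaves, group == "A"] <- 2 * leaf_counts[da_leaves, group == "A"]

## Expand to internal nodes by summing the counts of the descendant leaves
node_counts <- t(vapply(descendants, function(idx) {
  colSums(leaf_counts[idx, , drop = FALSE])
}, numeric(n_samples)))
all_counts <- rbind(leaf_counts, node_counts)  # rows 1-100 leaves, 101-199 nodes

## Wilcoxon rank sum test per node: P-value and direction of change
## (assumption: sign of mean(B) - mean(A) is used as the direction)
res <- t(apply(all_counts, 1, function(x) {
  a <- x[group == "A"]
  b <- x[group == "B"]
  c(pvalue = wilcox.test(a, b, exact = FALSE)$p.value,
    sign = sign(mean(b) - mean(a)))
}))

head(res)
```

Each row of `res` corresponds to one node of the tree (leaves first, then internal nodes), which is the kind of node-level summary that the aggregation step discussed next starts from.
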

The optimal aggregation level can vary across the tree, and will be inferred from the data. Thus, `treeclimbR` will propose a range of aggregation 'candidates', corresponding to different values of a threshold parameter `t`. These candidates are shown in the animation below. Note that as `t` increases, the aggregation happens further up the tree. Orange branches correspond to branches with a true differential signal, and blue circles indicate the aggregated nodes for a given threshold value. The heatmap shows the abundances of entities (rows) across the samples (columns) split by group.

<p align="center">
<img src="">
</p>

The trees below compare the nodes that are identified as significantly differentially abundant with `treeclimbR` for the optimal value of `t` (in red) to the leaves that are found to be significant (at an adjusted p-value threshold of 0.05) after applying a multiple hypothesis testing correction using the Benjamini-Hochberg method to the leaf-level results only (in blue).

<p align="center">
<img src="">
</p>

## Other simulation scenarios (more details [here](

The animations below show the identified candidates under other simulation settings. Orange branches represent 'positive' signal (higher abundance in group B compared to group A), and blue branches represent 'negative' signal (lower abundance in group B compared to group A).

<p align="center">
<img src="">
</p>

<p align="center">
<img src="">
</p>

## Learn more

Additional examples of applying `treeclimbR` to different types of data can be found [here](