⚖<code>EpiCompare</code>⚖<br>QC and Benchmarking of Epigenomic Datasets
================

License: GPL-3

<h4>
Authors: <i>Sera Choi, Brian Schilder, Leyla Abbasova, Alan Murphy, Nathan Skene</i>
</h4>
<h5>
<i>Updated</i>: Mar-08-2023
</h5>

# Introduction

`EpiCompare` is an R package for comparing multiple epigenomic datasets for quality control and benchmarking purposes. Its main function, `EpiCompare()`, outputs a report in HTML format consisting of three sections:

1. General Metrics: Metrics on peaks (percentage of blacklisted and non-standard peaks, and peak widths) and fragments (duplication rate) of samples.
2. Peak Overlap: Frequency, percentage and statistical significance of overlapping and non-overlapping peaks. This also includes Upset, precision-recall and correlation plots.
3. Functional Annotation: Functional annotation (ChromHMM, ChIPseeker and enrichment analysis) of peaks. Also includes peak enrichment around the Transcription Start Site.

*Note*: Peaks located in blacklisted regions and non-standard chromosomes are removed from the files prior to analysis.

# Installation

## Standard

To install `EpiCompare` use:

``` r
if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("EpiCompare")
```

## All dependencies

<details>
<summary>
👈 <strong>Details</strong>
</summary>

Installing all *Imports* and *Suggests* will allow you to use the full functionality of `EpiCompare` right away, without having to stop and install extra dependencies later on.
To install these packages as well, use:

``` r
BiocManager::install("EpiCompare", dependencies=TRUE)
```

Note that this will increase installation time, but it means that you won’t have to worry about installing any R packages when using functions with certain suggested dependencies.

</details>

## Development

<details>
<summary>
👈 <strong>Details</strong>
</summary>

To install the development version of `EpiCompare`, use:

``` r
if (!require("remotes")) install.packages("remotes")
remotes::install_github("neurogenomics/EpiCompare")
```

</details>

## Citation

If you use `EpiCompare`, please cite:

<!-- Modify this by editing the file: inst/CITATION -->

> EpiCompare: R package for the comparison and quality control of
> epigenomic peak files (2022) Sera Choi, Brian M. Schilder, Leyla
> Abbasova, Alan E. Murphy, Nathan G. Skene, bioRxiv, 2022.07.22.501149;
> doi: <>

# Documentation

## EpiCompare website

## Docker/Singularity container

## Bioconductor page

### :warning: Note on documentation versioning

The documentation in this README and the GitHub Pages website pertains to the *development* version of `EpiCompare`. Older versions of `EpiCompare` may have slightly different documentation (e.g. available functions, parameters). For documentation in older versions of `EpiCompare`, please see the **Documentation** section of the relevant version on Bioconductor.

# Usage

Load package and example datasets.

``` r
library(EpiCompare)
data("encode_H3K27ac") # example peakfile
data("CnT_H3K27ac") # example peakfile
data("CnR_H3K27ac") # example peakfile
data("CnT_H3K27ac_picard") # example Picard summary output
data("CnR_H3K27ac_picard") # example Picard summary output
```

Prepare input files:

``` r
# create named list of peakfiles
peakfiles <- list("CnT"=CnT_H3K27ac,
                  "CnR"=CnR_H3K27ac)
# set ref file and name
reference <- list("ENCODE_H3K27ac" = encode_H3K27ac)
# create named list of Picard summary
picard_files <- list("CnT"=CnT_H3K27ac_picard,
                     "CnR"=CnR_H3K27ac_picard)
```

<details>
<summary>
<strong>👈 Tips on importing user-supplied files</strong>
</summary>

`EpiCompare::gather_files` is helpful for identifying and importing peak or Picard files.

``` r
# To import BED files as GRanges object
peakfiles <- EpiCompare::gather_files(dir = "path/to/peaks/",
                                      type = "peaks.stringent")
# EpiCompare alternatively accepts paths (to BED files) as input
peakfiles <- list(sample1="/path/to/peaks/file1_peaks.stringent.bed",
                  sample2="/path/to/peaks/file2_peaks.stringent.bed")
# To import Picard summary output txt file as data frame
picard_files <- EpiCompare::gather_files(dir = "path/to/peaks",
                                         type = "picard")
```

</details>

Run `EpiCompare()`:

``` r
EpiCompare::EpiCompare(peakfiles = peakfiles,
                       genome_build = list(peakfiles="hg19",
                                           reference="hg38"),
                       genome_build_output = "hg19",
                       picard_files = picard_files,
                       reference = reference,
                       run_all = TRUE,
                       output_dir = tempdir())
```

#### Required Inputs

These input parameters must be provided:

<details>
<summary>
👈 <strong>Details</strong>
</summary>

- `peakfiles` : Peakfiles you want to analyse. EpiCompare accepts peakfiles as GRanges objects and/or as paths to BED files. Files must be listed and named using `list()`. E.g. `list("name1"=peakfile1, "name2"=peakfile2)`.
- `genome_build` : A named list indicating the human genome build used to generate each of the following inputs:
  - `peakfiles` : Genome build for the `peakfiles` input.
    Assumes genome build is the same for each element in the `peakfiles` list.
  - `reference` : Genome build for the `reference` input.
  - `blacklist` : Genome build for the `blacklist` input.

  <br> E.g. `genome_build = list(peakfiles="hg38", reference="hg19", blacklist="hg19")`
- `genome_build_output` : Genome build to standardise all inputs to. Liftovers will be performed automatically as needed. Default is `"hg19"`.
- `blacklist` : Peakfile as a GRanges object specifying genomic regions that have anomalous and/or unstructured signals independent of the cell-line or experiment. For the human hg19 and hg38 genomes, use the built-in data `data(hg19_blacklist)` and `data(hg38_blacklist)` respectively. For the mouse mm10 genome, use the built-in data `data(mm10_blacklist)`.
- `output_dir` : Path to the directory where all `EpiCompare` outputs will be saved.

</details>

#### Optional Inputs

The following input files are optional:

<details>
<summary>
👈 <strong>Details</strong>
</summary>

- `picard_files` : A list of summary metrics output from Picard. *Picard MarkDuplicates* can be used to identify the duplicate reads in the alignment. This tool generates a summary output, normally with the ending *.markdup.MarkDuplicates.metrics.txt*. If this input is provided, metrics on fragments (e.g. mapped fragments and duplication rate) will be included in the report. Files must be in data.frame format, listed using `list()` and named using `names()`. To import Picard duplication metrics (.txt file) into R as a data frame, use `picard <- read.table("/path/to/picard/output", header = TRUE, fill = TRUE)`.
- `reference` : Reference peak file(s) used in `stat_plot` and `chromHMM_plot`. Each file must be a `GRanges` object, listed and named using `list("reference_name" = GRanges_object)`. If more than one reference is specified, `EpiCompare` outputs an individual report for each reference. However, please note that this can take a while.
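
As a minimal sketch of the `picard_files` format described above, the snippet below reads several summary files into a named list of data frames. The toy files, their contents and the `toy_dir` directory are made up for illustration only; real Picard output contains many more columns, and the sample names here are simply derived from the file names.

``` r
# Toy stand-ins for two Picard MarkDuplicates summary files.
# (Hypothetical contents: only three of the real metric columns are mimicked.)
toy_dir <- file.path(tempdir(), "picard_toy")
dir.create(toy_dir, showWarnings = FALSE)
toy_lines <- c("## METRICS CLASS picard.sam.DuplicationMetrics",
               "LIBRARY\tREAD_PAIRS_EXAMINED\tPERCENT_DUPLICATION",
               "lib1\t1000\t0.05")
for (nm in c("CnT", "CnR")) {
  writeLines(toy_lines,
             file.path(toy_dir,
                       paste0(nm, ".markdup.MarkDuplicates.metrics.txt")))
}

# Read every summary file as a data frame; lines starting with "#" are
# skipped because read.table() treats "#" as a comment character by default.
paths <- list.files(toy_dir, pattern = "\\.metrics\\.txt$", full.names = TRUE)
picard_files <- lapply(paths, read.table, header = TRUE, fill = TRUE)

# Name each list element after its sample (file name up to ".markdup").
names(picard_files) <- sub("\\.markdup.*$", "", basename(paths))
```

The resulting `picard_files` object has the same shape (a named list of data frames) as the list built from the example datasets in the Usage section.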
</details>

#### Optional Plots

By default, these plots will not be included in the report unless set to `TRUE`. To turn on all features at once, simply use the `run_all=TRUE` argument:

<details>
<summary>
👈 <strong>Details</strong>
</summary>

- `upset_plot` : Upset plot of overlapping peaks between samples.
- `stat_plot` : Included only if a `reference` dataset is provided. The plot shows statistical significance (p/q-values) of sample peaks that are overlapping/non-overlapping with the `reference` dataset.
- `chromHMM_plot` : ChromHMM annotation of peaks. If a `reference` dataset is provided, ChromHMM annotation of overlapping and non-overlapping peaks with the `reference` is also included in the report.
- `chipseeker_plot` : ChIPseeker annotation of peaks.
- `enrichment_plot` : KEGG pathway and GO enrichment analysis of peaks.
- `tss_plot` : Peak frequency around (+/- 3000bp) transcriptional start site. Note that it may take a while to generate this plot for large sample sizes.
- `precision_recall_plot` : Plot showing the precision-recall score across the peak calling stringency thresholds.
- `corr_plot` : Plot showing the correlation between the quantiles when the genome is binned at a set size. These quantiles are based on the intensity of the peak, dependent on the peak caller used (q-value for MACS2).

</details>

#### Other Options

<details>
<summary>
👈 <strong>Details</strong>
</summary>

- `chromHMM_annotation` : Cell-line annotation for ChromHMM. Default is K562. Options are:
  - `"K562"` = K-562 cells
  - `"Gm12878"` = Cellosaurus cell-line GM12878
  - `"H1hesc"` = H1 Human Embryonic Stem Cell
  - `"Hepg2"` = Hep G2 cell
  - `"Hmec"` = Human Mammary Epithelial Cell
  - `"Hsmm"` = Human Skeletal Muscle Myoblasts
  - `"Huvec"` = Human Umbilical Vein Endothelial Cells
  - `"Nhek"` = Normal Human Epidermal Keratinocytes
  - `"Nhlf"` = Normal Human Lung Fibroblasts
- `interact` : By default, all heatmaps (percentage overlap and ChromHMM heatmaps) in the report will be interactive.
  If set to `FALSE`, all heatmaps will be static. N.B. If `interact=TRUE`, interactive heatmaps will be saved as html files, which may take time for larger sample sizes.
- `output_filename` : By default, the report is named *EpiCompare.html*. You can specify the file name of the report here.
- `output_timestamp` : By default `FALSE`. If `TRUE`, the filename of the report includes the date.

</details>

#### Outputs

`EpiCompare` outputs the following:

1. **HTML report**: A summary of all analyses, saved in the specified `output_dir`.
2. **EpiCompare_file**: If `save_output=TRUE`, all plots generated by `EpiCompare` will be saved in the *EpiCompare_file* directory, also in the specified `output_dir`.

An example report comparing ATAC-seq and DNase-seq is available online.

## Datasets

`EpiCompare` includes several built-in datasets:

<details>
<summary>
👈 <strong>Details</strong>
</summary>

- `encode_H3K27ac`: Human H3K27ac peak file generated with ChIP-seq using the K562 cell-line. Taken from the ENCODE project. For more information, run `?encode_H3K27ac`.
- `CnT_H3K27ac`: Human H3K27ac peak file generated with CUT&Tag using the K562 cell-line from Kaya-Okur et al. (2019). For more information, run `?CnT_H3K27ac`.
- `CnR_H3K27ac`: Human H3K27ac peak file generated with CUT&Run using the K562 cell-line from Meers et al. (2019). For more details, run `?CnR_H3K27ac`.

</details>

## Session Info

<details>
<summary>
👈 <strong>Details</strong>
</summary>

``` r
utils::sessionInfo()
```

    ## R version 4.2.1 (2022-06-23)
    ## Platform: x86_64-apple-darwin17.0 (64-bit)
    ## Running under: macOS Big Sur ... 10.16
    ## 
    ## Matrix products: default
    ## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
    ## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
    ## 
    ## locale:
    ## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
    ## 
    ## attached base packages:
    ## [1] stats     graphics  grDevices utils     datasets  methods   base     
    ## 
    ## loaded via a namespace (and not attached):
    ##  [1] pillar_1.8.1        compiler_4.2.1      RColorBrewer_1.1-3 
    ##  [4] BiocManager_1.30.20 bitops_1.0-7        yulab.utils_0.0.6  
    ##  [7] tools_4.2.1         digest_0.6.31       jsonlite_1.8.4     
    ## [10] evaluate_0.20       lifecycle_1.0.3     tibble_3.1.8       
    ## [13] gtable_0.3.1        pkgconfig_2.0.3     rlang_1.0.6        
    ## [16] graph_1.76.0        cli_3.6.0           rstudioapi_0.14    
    ## [19] rvcheck_0.2.1       yaml_2.3.7          xfun_0.37          
    ## [22] fastmap_1.1.0       dplyr_1.1.0         knitr_1.42         
    ## [25] generics_0.1.3      desc_1.4.2          vctrs_0.5.2        
    ## [28] dlstats_0.1.6       stats4_4.2.1        rprojroot_2.0.3    
    ## [31] grid_4.2.1          tidyselect_1.2.0    here_1.0.1         
    ## [34] Biobase_2.58.0      glue_1.6.2          R6_2.5.1           
    ## [37] fansi_1.0.4         XML_3.99-0.13       RBGL_1.74.0        
    ## [40] rmarkdown_2.20.1    ggplot2_3.4.1       badger_0.2.3       
    ## [43] magrittr_2.0.3      BiocGenerics_0.44.0 biocViews_1.66.2   
    ## [46] scales_1.2.1        htmltools_0.5.4     rworkflows_0.99.7  
    ## [49] RUnit_0.4.32        colorspace_2.1-0    renv_0.17.0        
    ## [52] utf8_1.2.3          RCurl_1.98-1.10     munsell_0.5.0

</details>

## Contact

### Neurogenomics Lab

UK Dementia Research Institute<br>
Department of Brain Sciences<br>
Faculty of Medicine<br>
Imperial College London<br>
GitHub<br>
DockerHub

<br>