[![Build Status](https://travis-ci.org/pliu55/pram.svg)](https://travis-ci.org/pliu55/pram) [![bioc](http://www.bioconductor.org/shields/years-in-bioc/pram.svg)](http://bioconductor.org/packages/devel/bioc/html/pram.html) PRAM: Pooling RNA-seq and Assembling Models =========================================== Table of Contents ----------------- * [Introduction](#Introduction) * [Installation](#Installation) * [Reference](#Reference) * [Contact](#Contact) * [License](#License) * * * ## <a name='Introduction'></a> Introduction Pooling RNA-seq and Assembling Models (__PRAM__) is an __Bioconductor__ __R__ package that utilizes multiple RNA-seq datasets to predict transcript models. The workflow of PRAM contains four steps, which is shown in the figure below with function names and associated key parameters. PRAM has a [vignette](https://bioconductor.org/packages/devel/bioc/vignettes/pram/inst/doc/pram.pdf) that describes each function in details. <p align='center'> <img src="vignettes/workflow_noScreen.jpg" width="400" height="407"> </p> ## <a name='Installation'></a> Installation ### From GitHub Start __R__ and enter: ```r devtools::install_github('pliu55/pram') ``` ### From Bioconductor Start __R__ and enter: ```r if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("pram") ``` <!-- - Cufflinks v2.2.1 macOS binary have some issues - it will report segmentation fault for the same bam file, which Linux Cufflinks runs ok - Have to use Cufflinks v2.1.1 for macOS instead --> <!-- ## <a name='Quick-start'></a>Quick start PRAM provides a function `runPRAM()` to let you run through the whole workflow. ### <a name='predict-only'></a> Predict transcript models only For a given gene annotation and RNA-seq alignments, you can predict transcript models in intergenic genomic regions: ```R runPRAM(in_gtf, in_bamv, out_gtf) ``` - `in_gtf`: an input GTF file defining genomic coordinates of existing genes. Required to have an attribute of __gene_id__ in the ninth column. - `in_bamv`: a vector of input BAM file(s) containing RNA-seq alignments. Currently, PRAM only supports strand-specific paired-end RNA-seq with the first mate on the right-most of transcript coordinate, i.e., 'fr-firststrand' by Cufflinks definition. - `out_gtf`: an output GTF file of predicted transcript models ### <a name='predict-screen'></a> Predict transcript models and screen them by ChIP-seq If you are interested to predict intergenic transcripts for a particular cell or tissue type, you can use epigenetic ChIP-seq data together with known transcripts and their expression levels to further screen intergenic transcript models: ``` runPRAM(in_gtf, in_bamv, out_gtf, in_bedv, training_tpms, training_gtf) ``` - `in_gtf`, `in_bamv`, and `out_gtf` are the same as described above - `in_bedv`: A vector of BED file(s) containing ChIP-seq alignments. - `training_tpms`: A vector of RSEM quantification results for known transcripts - `training_gtf`: A GTF file defining genomic coordinates of known transcripts ### <a name='Examples'></a> Examples PRAM has included input examples files in its `extdata/demo/` folder. The table below provides a quick summary of all the example files. | input argument | file name(s) | |:--------------:|:------------:| | `in_gtf` | [in.gtf](inst/extdata/demo/in.gtf) | | `in_bamv` | [SZP.bam](inst/extdata/demo/SZP.bam), [TLC.bam](inst/extdata/demo/TLC.bam) | | `in_bedv` | H3K79me2.bed.gz, POLR2.bed.gz | | `training_tpms`| AED1.isoforms.results, AED2.isoforms.results | | `training_gtf` | training.gtf | You can access example files by `system.file()` in __R__, e.g. for the argument `in_gtf`, you can access its example file by ```R system.file('extdata/demo/in.gtf', package='pram') ``` Below shows usage of `runPRAM()` with example input files: ## ## Predict transcript models only ## ```R in_gtf = system.file('extdata/demo/in.gtf', package='pram') in_bamv = c( system.file('extdata/demo/SZP.bam', package='pram'), system.file('extdata/demo/TLC.bam', package='pram') ) pred_out_gtf = tempfile(fileext='.gtf') runPRAM(in_gtf, in_bamv, pred_out_gtf) ``` ## ## Predict transcript models and screen them by ChIP-seq data ## in_bedv = c( system.file('extdata/demo/H3K79me2.bed.gz', package='pram'), system.file('extdata/demo/POLR2.bed.gz', package='pram') ) training_tpms = c( system.file('extdata/demo/AED1.isoforms.results', package='pram'), system.file('extdata/demo/AED2.isoforms.results', package='pram') ) training_gtf = system.file('extdata/demo/training.gtf', package='pram') screen_out_gtf = tempfile(fileext='.gtf') runPRAM(in_gtf, in_bamv, screen_out_gtf, in_bedv, training_tpms, training_gtf) --> ## <a name="Reference"></a> Reference __PRAM: a novel pooling approach for discovering intergenic transcripts from large-scale RNA sequencing experiments__. Peng Liu, Alexandra A. Soukup, Emery H. Bresnick, Colin N. Dewey, and Sündüz Keleş. _bioRxiv_, 2019. __doi__: https://doi.org/10.1101/636282 For key results reported in the PRAM manuscript and scripts for reproducibility, please check out [this GitHub repository](https://github.com/pliu55/pram_paper). ## <a name="Contact"></a> Contact Got a question? Please report it at the [issues tab](https://github.com/pliu55/pram/issues) in this repository. ## <a name="License"></a> License PRAM is licensed under the [GNU General Public License v3](LICENSE).