# DOTSeq

`DOTSeq` is an R package for identifying **differentially translated open reading frames (ORFs)** from ribosome profiling (Ribo-seq) and matched RNA-seq datasets. Unlike traditional gene-level approaches, `DOTSeq` performs analysis at the **ORF level**, enabling detection of:

- **Differential ORF Usage (DOU)**: changes in ORF usage within the same gene across conditions.
- **Differential Translation Efficiency (DTE)**: changes in ribosome loading relative to RNA level across conditions.

`DOTSeq` models Ribo-seq and RNA-seq read counts using a **beta-binomial generalised linear model (GLM)** implemented via [`glmmTMB`](https://CRAN.R-project.org/package=glmmTMB). It supports experimental designs with multiple conditions, and uses an interaction term (`condition:strategy`) to isolate translation-specific effects. Post hoc contrasts are computed using [`emmeans`](https://CRAN.R-project.org/package=emmeans), and empirical Bayes shrinkage is applied via [`ashr`](https://CRAN.R-project.org/package=ashr).

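
As a rough illustration of the modelling idea described above (and not `DOTSeq`'s actual implementation), the sketch below fits a beta-binomial GLM with a `condition:strategy` interaction to simulated counts for a single hypothetical ORF using `glmmTMB`, then computes the interaction contrast with `emmeans`. The simulated data frame, its column names (`orf_count`, `other_count`, `gene_count`), and the choice of response (reads on the ORF versus the remaining reads of its gene) are assumptions made for this example. Shrinkage with `ashr` is not shown.

```r
# Illustrative sketch only: simulated counts for ONE hypothetical ORF.
# Column names and the response construction are assumptions, not DOTSeq's API.
library(glmmTMB)
library(emmeans)

set.seed(1)

# Four replicates per condition and sequencing strategy (RNA-seq vs Ribo-seq)
design <- expand.grid(
    replicate = 1:4,
    condition = c("control", "treated"),
    strategy  = c("rna", "ribo")
)
design$condition <- factor(design$condition, levels = c("control", "treated"))
design$strategy  <- factor(design$strategy, levels = c("rna", "ribo"))

# Total reads assigned to the gene in each sample
design$gene_count <- rpois(nrow(design), lambda = 500)

# The ORF's expected share of gene reads rises only in treated Ribo-seq samples
mu <- ifelse(design$condition == "treated" & design$strategy == "ribo", 0.5, 0.3)

# Draw per-sample proportions from a beta distribution to add overdispersion
phi <- 50
p <- rbeta(nrow(design), mu * phi, (1 - mu) * phi)
design$orf_count   <- rbinom(nrow(design), size = design$gene_count, prob = p)
design$other_count <- design$gene_count - design$orf_count

# Beta-binomial GLM; condition * strategy expands to both main effects
# plus the condition:strategy interaction
fit <- glmmTMB(
    cbind(orf_count, other_count) ~ condition * strategy,
    family = betabinomial(link = "logit"),
    data = design
)

# Coefficient table; the interaction row captures the translation-specific effect
print(summary(fit)$coefficients$cond)

# Post hoc interaction contrast on the logit scale:
# (treated - control) in Ribo-seq minus (treated - control) in RNA-seq
emm <- emmeans(fit, ~ condition * strategy)
print(contrast(emm, interaction = "revpairwise"))
```

In this toy setup the interaction term is the quantity of interest: a change that appears equally in RNA-seq and Ribo-seq is absorbed by the `condition` main effect, whereas a change specific to Ribo-seq shows up in `condition:strategy`.
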
## DEPENDENCIES

* R (>= 4.5.0)
* biomaRt (>= 2.65.0)
* SummarizedExperiment (>= 1.39.1)
* Bioc.gff (>= 0.99.17)
* DESeq2 (>= 1.49.4)
* GenomicRanges (>= 1.61.5)
* IRanges (>= 2.43.5)
* S4Vectors (0.47.4)
* ashr (>= 2.2-63)
* DHARMa (>= 0.4.7)
* emmeans (>= 1.11.2-8)
* glmmTMB (>= 1.1.12)
* eulerr (>= 7.0.4)
* pbapply (>= 1.7-4)

## INSTALLATION

The following steps set up a package library, point R at Bioconductor 3.22, and install `DOTSeq` together with the dependencies listed above:

```r
# Create a directory for R packages if it does not already exist
package_dir <- file.path(Sys.getenv("HOME"), "R/4.5")
dir.create(package_dir, showWarnings = TRUE, recursive = TRUE)
.libPaths(c(package_dir, .libPaths()))

# Install BiocManager if not already available
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager", lib = package_dir)

# Initialise usage of Bioconductor 3.22
options(repos = BiocManager::repositories(version = "3.22"))

# Install DOTSeq and required packages without prompting for package updates
BiocManager::install("compgenom/DOTSeq", lib = package_dir, ask = FALSE)
```

## DOCUMENTATION

TL;DR: To learn how to use `DOTSeq` without going through the preprocessing steps, please refer to the [vignettes](https://github.com/compgenom/DOTSeq/tree/main/vignettes).

### Preprocessing Steps Required Before Running `DOTSeq`:

#### Step 1. Align Ribo-seq and RNA-seq reads

We use a publicly available HeLa cell cycle dataset from [Ly 2024](https://pubmed.ncbi.nlm.nih.gov/39443796/).

```shell
# Clone DOTSeq repository
git clone https://github.com/compgenom/DOTSeq.git

# Assume FASTQ files are downloaded via SRA Toolkit and stored in:
# DOTSeq/inst/extdata/ly_2024
# STAR index will be generated in:
# DOTSeq/inst/extdata/hg38_star_index

# Download GENCODE annotation and transcript FASTA files
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_47/gencode.v47.annotation.gtf.gz \
    -O DOTSeq/inst/extdata/gencode.v47.annotation.gtf.gz
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_47/gencode.v47.pc_transcripts.fa.gz \
    -O DOTSeq/inst/extdata/gencode.v47.pc_transcripts.fa.gz

# Assume reference genome FASTA is downloaded to DOTSeq/inst/extdata/hg38.fa

# STAR expects an uncompressed GTF, so decompress a copy
# (the .gz file is kept for the later steps)
gunzip -c DOTSeq/inst/extdata/gencode.v47.annotation.gtf.gz \
    > DOTSeq/inst/extdata/gencode.v47.annotation.gtf

# Generate STAR index
STAR --runMode genomeGenerate \
    --runThreadN 32 \
    --genomeFastaFiles DOTSeq/inst/extdata/hg38.fa \
    --sjdbGTFfile DOTSeq/inst/extdata/gencode.v47.annotation.gtf \
    --genomeDir DOTSeq/inst/extdata/hg38_star_index

# Trim and align reads
for i in DOTSeq/inst/extdata/ly_2024/*/*/*.fastq.gz; do
    # Path of the input file without the .fastq.gz suffix
    prefix="$(dirname "$i")/$(basename "$i" .fastq.gz)"

    cutadapt -j 16 -m 15 -u 8 -e 0.1 --match-read-wildcards \
        -a TCGTATGCCGTCTTCTGCTTG -O 1 \
        -o "${prefix}.trimmed.fasta.gz" "$i"

    STAR --runMode alignReads \
        --runThreadN 32 \
        --outFilterType BySJout --outFilterMismatchNmax 2 \
        --genomeDir DOTSeq/inst/extdata/hg38_star_index \
        --readFilesIn "${prefix}.trimmed.fasta.gz" \
        --readFilesCommand zcat \
        --outFileNamePrefix "${prefix}" \
        --outSAMtype BAM SortedByCoordinate \
        --quantMode TranscriptomeSAM GeneCounts \
        --outFilterMultimapNmax 1 --outFilterMatchNmin 16 \
        --alignEndsType EndToEnd --outSAMattributes All
done
```

#### Step 2. Prepare ORF-level GTF and BED files

Step-by-step instructions for preparing ORF-level annotation are available in the [vignettes](https://github.com/compgenom/DOTSeq/tree/main/vignettes). GTF files from GENCODE, Ensembl, or Araport should be used as input.
Alternatively, `DOTSeq` accepts flattened annotation files generated using the [`RIBOSS`](https://github.com/lcscs12345/riboss) engine.

```shell
# Generate ORF-level GTF using DOTSeq's Python script
python DOTSeq/inst/scripts/orf_to_gtf.py \
    --gtf DOTSeq/inst/extdata/gencode.v47.annotation.gtf.gz \
    --transcripts DOTSeq/inst/extdata/gencode.v47.pc_transcripts.fa.gz \
    --output DOTSeq/inst/extdata/dotseq
```

#### Step 3. Count reads

```shell
featureCounts -f -O -s 0 -T 16 \
    -F GTF -a DOTSeq/inst/extdata/dotseq.gtf \
    -o DOTSeq/inst/extdata/featureCounts.dotseq.out \
    DOTSeq/inst/extdata/ly_2024/*/*/*Aligned.sortedByCoord.out.bam
```

#### Step 4. Run `DOTSeq`

Follow the analysis workflow as demonstrated in the [vignettes](https://github.com/compgenom/DOTSeq/tree/main/vignettes).

## CONTRIBUTING

We welcome contributions from the community! Whether it's fixing bugs, improving documentation, or suggesting new features, your input is valuable.

By participating in this project, you agree to abide by the terms outlined in this [Contributor Code of Conduct](https://www.contributor-covenant.org/version/1/0/0/code-of-conduct/).

To get started:

- Fork the repository
- Create a new branch for your feature or fix
- Submit a pull request with a clear description of your changes

If you have questions or ideas, feel free to open an issue or start a discussion.

## CONTACTS AND BUG REPORTS

- Chun Shen Lim: chunshen [dot] lim [at] otago [dot] ac [dot] nz
- Gabrielle Chieng: gabrielle [dot] chieng [at] postgrad [dot] otago [dot] ac [dot] nz