<!-- README.md is generated from README.Rmd. Please edit that file -->

# TrIdent

**TrIdent - Transduction Identification** <a href="https://jlmaier12.github.io/TrIdent/"><img src="man/figures/logo.png" align="right" height="139" alt="TrIdent website" /></a>

**TrIdent automates the analysis of transductomics data by detecting, classifying, and characterizing read coverage patterns associated with potential transduction events.**

Transductomics, developed by Kleiner et al. (2020), is a DNA sequencing-based method for detecting and characterizing transduction events in pure cultures and complex communities. Transductomics relies on mapping sequencing reads from the viral-like particle (VLP) fraction of a sample to contigs assembled from the metagenome (whole community) of the same sample. Reads from bacterial DNA carried by VLPs will map back to the bacterial contigs they originated from, creating read coverage patterns indicative of ongoing transduction.

**The read coverage patterns detected represent DNA being actively carried or transduced by VLPs. They do not represent complete transduction events (i.e., integration of transduced DNA into new bacterial chromosomes).**

To obtain the data needed for transductomics, a microbiome sample of interest is split into two sub-sample types:

- Whole-community: represents the ‘whole community’ (all bacteria, fungi, viruses, etc.) in the microbiome of interest.
- VLP-fraction: represents only the viruses and ‘viral-like particles’ associated with the microbiome of interest.
  - The VLP-fraction must be obtained with an ultra-purification protocol appropriate for your sample type, to remove bacterial cells and contaminating free bacterial DNA.

With transductomics and TrIdent, a researcher can obtain information about the phage-host pairs involved in transduction, the types of transduction occurring, and the region of the host genome that is potentially transduced, which allows exploration of the transferred genes.
**Reference:** Kleiner, M., Bushnell, B., Sanderson, K.E. et al. Transductomics: sequencing-based detection and analysis of transduced DNA in pure cultures and microbial communities. Microbiome 8, 158 (2020). <https://doi.org/10.1186/s40168-020-00935-5>

### Input files

TrIdent detects read coverage patterns using a pattern-matching algorithm that operates on pileup files. A pileup file is a file format in which each row summarizes the ‘pileup’ of reads at a specific genomic location. Pileup files can be used to generate a rolling mean of read coverage, with the associated base-pair positions, across a metagenome assembly; this reduces data size while preserving read coverage patterns. **TrIdent requires that input pileup files be generated using a 100 bp window/bin size.**

Some read mappers, like [BBMap](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbmap-guide/), can generate pileup files directly with the [`bbmap.sh`](https://github.com/BioInfoTools/BBMap/blob/master/sh/bbmap.sh) command by using the `bincov` output with the `covbinsize=100` parameter. **Otherwise, BBMap’s [`pileup.sh`](https://github.com/BioInfoTools/BBMap/blob/master/sh/pileup.sh) can convert .bam files produced by any read mapper into TrIdent-compatible pileup files by using the `bincov` output with `binsize=100`.**

TrIdent requires two pileup files from a transductomics dataset as input:

- A VLP-fraction pileup: sequencing reads from a sample’s ultra-purified VLP-fraction mapped to the whole-community metagenome assembly from the same sample.
- A whole-community pileup: sequencing reads from a sample’s whole-community mapped to the whole-community metagenome assembly from the same sample.

**The data used for each pileup file must originate from the same sample. Pileup files must use a 100 bp window/bin size for the rolling mean.**

Transductomics sample preparation, sequencing procedures, and analysis methods are detailed in [Kleiner et al.
(2020)](https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-020-00935-5).

## Installation

Install TrIdent with BiocManager:

``` r
if (!require("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("TrIdent")

library(TrIdent)
```

Install the development version of TrIdent from GitHub with BiocManager:

``` r
BiocManager::install("jlmaier12/TrIdent")

library(TrIdent)
```

## Quick Start

``` r
## Load TrIdent
library(TrIdent)

## Load sample datasets
data("VLPFractionSamplePileup")
data("WholeCommunitySamplePileup")

## Run TrIdent:
## Run first:
TrIdentOutput <- TrIdentClassifier(
    VLPpileup = VLPFractionSamplePileup,
    WCpileup = WholeCommunitySamplePileup
)
#> Reformatting pileup files
#> Starting pattern-matching...
#> A quarter of the way done with pattern-matching
#> Half of the way done with pattern-matching
#> Almost done with pattern-matching!
#> Determining sizes (bp) of pattern matches
#> Identifying highly active/abundant or heterogenously integrated
#> Prophage-like elements
#> Finalizing output
#> Execution time: 14.56secs
#> 1 contigs were filtered out based on low read coverage
#> 0 contigs were filtered out based on length
#> 
#> HighCovNoPattern        NoPattern    Prophage-like          Sloping 
#>                1                1                4                3 
#> 3 of the prophage-like classifications are highly active or abundant
#> 1 of the prophage-like classifications are mixed, i.e. heterogenously
#> integrated into their bacterial host population

## Run second:
plotTrIdentResults(
    VLPpileup = VLPFractionSamplePileup,
    WCpileup = WholeCommunitySamplePileup,
    TrIdentResults = TrIdentOutput
)
#> $NODE_62
```

<img src="man/figures/README-example-1.png" width="100%" />

    #> 
    #> $NODE_135

<img src="man/figures/README-example-2.png" width="100%" />

    #> 
    #> $NODE_1088

<img src="man/figures/README-example-3.png" width="100%" />

    #> 
    #> $NODE_352

<img src="man/figures/README-example-4.png" width="100%" />

    #> 
    #> $NODE_368

<img src="man/figures/README-example-5.png" width="100%" />

    #> 
    #> $NODE_560

<img src="man/figures/README-example-6.png" width="100%" />

    #> 
    #> $NODE_617

<img src="man/figures/README-example-7.png" width="100%" />

    #> 
    #> $NODE_2060

<img src="man/figures/README-example-8.png" width="100%" />

``` r
## Run third:
specializedTransductionID(
    VLPpileup = VLPFractionSamplePileup,
    TrIdentResults = TrIdentOutput
)
#> 2 contigs have potential specialized transduction
#> We recommend that you also view the results of this search with
#> logScale=TRUE
#> $summaryTable
#>   contigName specTransduc left right lengthLeft lengthRight
#> 1    NODE_62          yes  yes    no      45400        <NA>
#> 2   NODE_135           no   no    no       <NA>        <NA>
#> 3   NODE_368           no   no    no       <NA>        <NA>
#> 4   NODE_617          yes  yes   yes      33300        9800
#> 
#> $Plots
#> $Plots$NODE_62
```

<img src="man/figures/README-example-9.png" width="100%" />

    #> 
    #> $Plots$NODE_135

<img src="man/figures/README-example-10.png" width="100%" />

    #> 
    #> $Plots$NODE_368

<img src="man/figures/README-example-11.png" width="100%" />

    #> 
    #> $Plots$NODE_617

<img src="man/figures/README-example-12.png" width="100%" />
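The `summaryTable` element returned by `specializedTransductionID()` prints like an ordinary R data frame, so base R should be enough to pull out the contigs flagged for potential specialized transduction. The sketch below is a minimal example under that assumption. It rebuilds the printed table by hand (values copied from the Quick Start output) so it runs without TrIdent installed, and it assumes, from the printed `<NA>` values, that `lengthLeft` and `lengthRight` are stored as character and hold lengths in bp. The object name `specTransducOutput` in the comments is hypothetical.

``` r
## Rebuild the summaryTable printed above; values are copied from the Quick Start output.
## With real results, store the return value of specializedTransductionID() in an object
## (hypothetical name: specTransducOutput) and use specTransducOutput$summaryTable instead.
summaryTable <- data.frame(
    contigName   = c("NODE_62", "NODE_135", "NODE_368", "NODE_617"),
    specTransduc = c("yes", "no", "no", "yes"),
    left         = c("yes", "no", "no", "yes"),
    right        = c("no", "no", "no", "yes"),
    lengthLeft   = c("45400", NA, NA, "33300"), # assumed: character, lengths in bp
    lengthRight  = c(NA, NA, NA, "9800"),       # assumed: character, lengths in bp
    stringsAsFactors = FALSE
)

## Keep only contigs flagged for potential specialized transduction
flagged <- summaryTable[summaryTable$specTransduc == "yes", ]

## Convert the length columns to numbers and add them per contig;
## na.rm = TRUE treats a missing (<NA>) side as 0 bp
flagged$totalLength <- rowSums(
    cbind(as.numeric(flagged$lengthLeft), as.numeric(flagged$lengthRight)),
    na.rm = TRUE
)

flagged[, c("contigName", "left", "right", "totalLength")]
```

Because `rowSums(..., na.rm = TRUE)` treats a side reported as `<NA>` as 0 bp, NODE_62 keeps only its left-side length while NODE_617 combines both sides.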