Bioconductor Code: rhinotypeR

Name	Mode	Size
R	040000
inst	040000
man	040000
tests	040000
vignettes	040000
.gitignore	100644	0 kb
DESCRIPTION	100644	2 kb
LICENSE	100644	1 kb
NAMESPACE	100644	0 kb
NEWS	100644	0 kb
README.Rmd	100644	7 kb
README.md	100644	8 kb

README.md

# rhinotypeR

## Table of Contents

1. [Background](#background)
2. [Test-Data](#test-data)
3. [Workflow](#workflow)
4. [Package](#package)
5. [Contributors](#contributors-citation)

## Background

Rhinoviruses (RV), common respiratory pathogens, are positive-sense, single-stranded RNA viruses characterized by high antigenic diversity and a high mutation rate. These viruses are classified into 169 types across three species: RV-A, RV-B, and RV-C. Genotype assignment, a critical aspect of RV research, is based on pairwise genetic distances and phylogenetic clustering with prototype strains, a process that is currently carried out manually and laboriously. We address this by presenting rhinotypeR, an R package that streamlines the classification of rhinoviruses, ensuring rapid, reproducible results.

## Test-Data

The project uses VP4/2 sequences available in the public domain from GenBank and reference prototype strains from www.picornaviridae.com. The input datasets (target, reference and prototype) are FASTA files. Here’s an example of a FASTA file:

![](inst/figures/example_fasta_file.png)

## Workflow

![](inst/figures/workflow.png)

RhinotypeR workflow. The user downloads prototype strains using the `getPrototypeSeqs()` function, combines these with their newly generated VP4/2 sequences, then aligns and manually curates the alignment.
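Distance-based genotype assignment compares sequences site by site, which is why the prototypes and new sequences must be aligned first. The base-R sketch that follows illustrates the general idea of nearest-prototype assignment; it is not rhinotypeR's implementation. It assumes the simplest distance model (uncorrected p-distance, skipping gap sites), and the helper names `pDistance()` and `assignNearest()`, the toy sequences, and the 0.105 cut-off are hypothetical values chosen for illustration only.

```r
# Minimal base-R sketch of nearest-prototype genotype assignment.
# NOT rhinotypeR's implementation: uses uncorrected p-distance and an
# illustrative threshold; all names and data here are hypothetical.

# Proportion of differing sites between two aligned sequences,
# skipping sites where either sequence has a gap ("-")
pDistance <- function(a, b) {
  x <- strsplit(toupper(a), "")[[1]]
  y <- strsplit(toupper(b), "")[[1]]
  if (length(x) != length(y)) stop("sequences are not aligned (unequal lengths)")
  keep <- x != "-" & y != "-"
  if (!any(keep)) return(NA_real_)
  mean(x[keep] != y[keep])
}

# For each query, find the closest prototype; call the query "unassigned"
# when even the closest prototype is farther away than `threshold`
assignNearest <- function(queries, prototypes, threshold = 0.105) {
  rows <- lapply(names(queries), function(q) {
    d <- vapply(prototypes, pDistance, numeric(1), b = queries[[q]])
    if (all(is.na(d))) {
      return(data.frame(query = q, assignedType = "unassigned", distance = NA_real_))
    }
    best <- which.min(d)
    type <- if (d[[best]] <= threshold) names(prototypes)[best] else "unassigned"
    data.frame(query = q, assignedType = type, distance = d[[best]])
  })
  do.call(rbind, rows)
}

# Toy aligned sequences (hypothetical)
prototypes <- c(A1 = "ACGTACGTAC", B3 = "TTGTACGAAC")
queries    <- c(q1 = "ACGTACGTAA", q2 = "GGGGGGGGGG")
res <- assignNearest(queries, prototypes)
print(res)
```

Here `q1` differs from prototype `A1` at one of ten sites (p-distance 0.1) and is assigned `A1`, while `q2` is 0.8 away from every prototype and is reported as unassigned. In the package itself, genotype assignment is handled by `assignTypes()`.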
The user then reads the curated alignment into R using the `Biostrings::readDNAStringSet()` function. The resulting DNAStringSet object can then be passed to all the second-level functions, including `assignTypes()`, which assigns the sequences to genotypes, filters out the prototype sequences and returns the genotype assignments of the new sequences. This output can be used to visualise the frequency of assigned genotypes. The distance matrix returned by `pairwiseDistances()` can be used to create a phylogenetic tree or a heatmap to visualise the genetic relatedness of sequences.

## Package

Our package aims to streamline RV genotype assignment, helping genomic scientists genotype RV infections efficiently.

### Installation

You can install rhinotypeR from Bioconductor using:

    if (!require("BiocManager", quietly = TRUE))
        install.packages("BiocManager")

    BiocManager::install("rhinotypeR")

You can also install the development version of rhinotypeR from GitHub with:

    devtools::install_github("omicscodeathon/rhinotypeR")

##### Load Library

    library("rhinotypeR")

## Functions

The package includes functions to compute genetic distances, perform phylogenetic clustering, and compare sequences against RV prototype strains. These functions are designed to be user-friendly and adaptable to various research needs. The package (summarized in Table 1) does the following:

- Assigns genotypes to query sequences
- Computes pairwise distances among query sequences
- Calculates pairwise distances between query and prototype sequences
- Calculates the overall mean genetic distance of query sequences

#### Table 1. A summary of the functions

<table>
<colgroup>
<col style="width: 15%" />
<col style="width: 35%" />
<col style="width: 23%" />
<col style="width: 25%" />
</colgroup>
<thead>
<tr class="header">
<th>Function</th>
<th>Role</th>
<th>Input</th>
<th>Output</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>getPrototypeSeqs()</code></td>
<td>Downloads rhinovirus prototype strains into a local directory. These sequences should be combined and aligned with newly generated sequences before being imported into R for genotype assignment.</td>
<td>Destination path</td>
<td>RV prototype sequences downloaded to the local machine</td>
</tr>
<tr class="even">
<td><code>SNPeek()</code></td>
<td>Visualizes single nucleotide polymorphisms (SNPs) relative to a specified reference sequence. To specify the reference, move it to the bottom of the alignment. Substitutions are color-coded by nucleotide: A = green, T = red, C = blue, G = yellow.</td>
<td>FASTA file</td>
<td>A plot highlighting SNPs per sequence</td>
</tr>
<tr class="odd">
<td><code>plotAA()</code></td>
<td>Plots amino acid substitutions relative to a specified reference sequence. The input is an amino acid FASTA file (translated DNA sequences). To specify the reference sequence, move it to the bottom of the alignment. Changes are color-coded by amino acid class: red = positively charged, blue = negatively charged, green = polar, yellow = non-polar.</td>
<td>Amino acid FASTA file</td>
<td>A plot highlighting amino acid substitutions per sequence</td>
</tr>
<tr class="even">
<td><code>assignTypes()</code></td>
<td>Rapidly assigns genotypes to input sequences. The input FASTA file should include the prototype strains, which can be downloaded using <code>getPrototypeSeqs()</code>.</td>
<td>FASTA file</td>
<td>CSV file with three columns: sequence header, assigned type, and genetic distance</td>
</tr>
<tr class="odd">
<td><code>pairwiseDistances()</code></td>
<td>Estimates pairwise distances across input sequences using a specified evolutionary model</td>
<td>FASTA file</td>
<td>A dense distance matrix</td>
</tr>
<tr class="even">
<td><code>overallMeanDistance()</code></td>
<td>Estimates the overall mean distance of input sequences</td>
<td>FASTA file</td>
<td>A single numeric value</td>
</tr>
<tr class="odd">
<td><code>countSNPs()</code></td>
<td>Counts single nucleotide polymorphisms across input sequences</td>
<td>FASTA file</td>
<td>A dense matrix</td>
</tr>
<tr class="even">
<td><code>plotFrequency()</code></td>
<td>Plots the frequency of assigned genotypes. This function uses the output of <code>assignTypes()</code> as input.</td>
<td>Output of <code>assignTypes()</code></td>
<td>Barplot</td>
</tr>
<tr class="odd">
<td><code>plotDistances()</code></td>
<td>Visualizes pairwise genetic distances in a heatmap. This function uses the output of <code>pairwiseDistances()</code> as input.</td>
<td>Distance matrix from <code>pairwiseDistances()</code></td>
<td>Heatmap</td>
</tr>
<tr class="even">
<td><code>plotTree()</code></td>
<td>Plots a simple phylogenetic tree using the genetic distances estimated by <code>pairwiseDistances()</code></td>
<td>Output of <code>pairwiseDistances()</code></td>
<td>A simple phylogenetic tree</td>
</tr>
</tbody>
</table>

## Contributors/ Citation

Martha M. Luka<sup>1,#</sup>, Ruth Nanjala<sup>2</sup>, Wafaa M. Rashed<sup>3,4</sup>, Winfred Gatua<sup>5,6</sup>, Olaitan I. Awe<sup>7</sup>

\# <m.luka.1@research.gla.ac.uk>

Authors’ Affiliations

1. School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, Glasgow, G12 8QQ, UK.
2. Kennedy Institute of Rheumatology, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, UK.
3. Pharmacy Practice Department, Faculty of Pharmacy, Ahram Canadian University, Egypt.
4. Computational Systems Biology Laboratory, USP, Brazil.
5. MRC Integrative Epidemiology Unit, University of Bristol, UK.
6. Population Health Sciences, Bristol Medical School, UK.
7. African Society for Bioinformatics and Computational Biology, Cape Town, South Africa.