Name Mode Size
R 040000
inst 040000
man 040000
tests 040000
vignettes 040000
.Rbuildignore 100644 0 kb
.gitignore 100644 0 kb
DESCRIPTION 100644 1 kb
INSTALL 100644 2 kb
LICENSE 100644 34 kb
NAMESPACE 100644 0 kb
NEWS 100644 0 kb
README.Rmd 100644 2 kb
README.md 100644 4 kb
README.md
<img src="https://bioconductor.org/images/logo/svg/Logo.svg" align="right" />

# R package rBLAST - R Interface for the Basic Local Alignment Search Tool

[![r-universe status](https://mhahsler.r-universe.dev/badges/rBLAST)](https://mhahsler.r-universe.dev/rBLAST)
[![Package on Bioc](https://img.shields.io/badge/Bioconductor-blue)](https://bioconductor.org/packages/rBLAST)

Interfaces the Basic Local Alignment Search Tool (BLAST) to search genetic sequence databases with the Bioconductor infrastructure. This includes interfaces to `blastn`, `blastp`, `blastx`, and `makeblastdb`. The BLAST software needs to be downloaded and installed separately.

Other R interfaces for bioinformatics are also available:

- [rRDP](https://bioconductor.org/packages/rRDP): Interface to the RDP Classifier
- [rMSA](https://mhahsler.r-universe.dev/rMSA): Interface for Popular Multiple Sequence Alignment Tools including ClustalW, MAFFT, MUSCLE, and Kalign

## Installation

1.  Install the BLAST software by following the instructions in the [INSTALL](https://github.com/mhahsler/rBLAST/blob/devel/INSTALL) file.

2.  Install the latest version of the R package

    ``` r
    if (!require("BiocManager", quietly = TRUE)) {
        install.packages("BiocManager")
    }

    # The following initializes usage of Bioc devel
    BiocManager::install(version = "devel")

    BiocManager::install("rBLAST")
    ```

## Usage

``` r
library(rBLAST)
```

Download the prebuilt 16S Microbial database from NCBI’s FTP server at: <https://ftp.ncbi.nlm.nih.gov/blast/db/>

``` r
tgz_file <- blast_db_get("16S_ribosomal_RNA.tar.gz")
untar(tgz_file, exdir = "16S_rRNA_DB")
```

Load the downloaded BLAST database.

``` r
bl <- blast(db = "./16S_rRNA_DB/16S_ribosomal_RNA")
bl
```

    ## BLAST Database
    ## Location: /home/hahsler/baR/rBLAST/16S_rRNA_DB/16S_ribosomal_RNA
    ## BLAST Type: blastn
    ## Database: 16S ribosomal RNA (Bacteria and Archaea type strains)
    ##  27,138 sequences; 39,323,968 total bases
    ##
    ## Date: Apr 9, 2024  5:36 AM   Longest sequence: 3,600 bases
    ##
    ## BLASTDB Version: 5
    ##
    ## Volumes:
    ##  /home/hahsler/baR/rBLAST/16S_rRNA_DB/16S_ribosomal_RNA

Load some test sequences shipped with the package.

``` r
seq <- readRNAStringSet(system.file("examples/RNA_example.fasta",
    package = "rBLAST"
))
seq
```

    ## RNAStringSet object of length 5:
    ##     width seq                                               names
    ## [1]  1481 AGAGUUUGAUCCUGGCUCAGAAC...GGUGAAGUCGUAACAAGGUAACC 1675 AB015560.1 d...
    ## [2]  1404 GCUGGCGGCAGGCCUAACACAUG...CACGGUAAGGUCAGCGACUGGGG 4399 D14432.1 Rho...
    ## [3]  1426 GGAAUGCUNAACACAUGCAAGUC...AACAAGGUAGCCGUAGGGGAACC 4403 X72908.1 Ros...
    ## [4]  1362 GCUGGCGGAAUGCUUAACACAUG...UACCUUAGGUGUCUAGGCUAACC 4404 AF173825.1 A...
    ## [5]  1458 AGAGUUUGAUUAUGGCUCAGAGC...UGAAGUCGUAACAAGGUAACCGU 4411 Y07647.2 Dre...

Query the BLAST database to find matches for the first test sequence with a percent identity of 99% or higher.

``` r
cl <- predict(bl, seq[1, ], BLAST_args = "-perc_identity 99")
cl
```

    ##   qseqid      sseqid pident length mismatch gapopen qstart qend sstart send
    ## 1   1675 NR_151899.1    100     40        0       0     22   61      2   41
    ## 2   1675 NR_041235.1    100     37        0       0     22   58      2   38
    ## 3   1675 NR_173526.1    100     35        0       0     32   66      1   35
    ## 4   1675 NR_117153.1    100     32        0       0     27   58      1   32
    ##    evalue bitscore
    ## 1 1.5e-12       75
    ## 2 7.0e-11       69
    ## 3 9.0e-10       66
    ## 4 4.2e-08       60

## Citation Request

To cite package ‘rBLAST’ in publications use:

> Hahsler M, Nagar A (2024). “rBLAST: R Interface for the Basic Local
> Alignment Search Tool.” Bioconductor version: Release (3.19).
> <doi:10.18129/B9.bioc.rBLAST>
> <https://doi.org/10.18129/B9.bioc.rBLAST>, R package version 0.99.4.


    @Misc{,
      title = {{rBLAST:} {R} Interface for the Basic Local Alignment Search Tool},
      author = {Michael Hahsler and Annurag Nagar},
      year = {2024},
      doi = {10.18129/B9.bioc.rBLAST},
      note = {R package version 0.99.4},
      howpublished = {Bioconductor version: Release (3.19)},
    }

## Acknowledgments

This work was partially supported by grant no. R21HG005912 from the [National Human Genome Research Institute](https://www.genome.gov/).
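
## Appendix: Filtering BLAST Hits

The hit table printed in the Usage section contains only short alignments (32 to 40 bases), so in practice it may be useful to filter hits further after the search. The following is a minimal sketch in base R, assuming the result of `predict()` behaves like an ordinary `data.frame`, as its printed form suggests. The table is re-entered by hand from the output above so that the example runs without BLAST installed. The thresholds `min_length` and `max_evalue` are hypothetical values chosen for illustration, not recommendations from the package.

``` r
# Hit table copied from the predict() output shown in the Usage section.
cl <- data.frame(
    qseqid   = rep("1675", 4),
    sseqid   = c("NR_151899.1", "NR_041235.1", "NR_173526.1", "NR_117153.1"),
    pident   = c(100, 100, 100, 100),
    length   = c(40, 37, 35, 32),
    mismatch = 0,
    gapopen  = 0,
    qstart   = c(22, 22, 32, 27),
    qend     = c(61, 58, 66, 58),
    sstart   = c(2, 2, 1, 1),
    send     = c(41, 38, 35, 32),
    evalue   = c(1.5e-12, 7.0e-11, 9.0e-10, 4.2e-08),
    bitscore = c(75, 69, 66, 60),
    stringsAsFactors = FALSE
)

# Hypothetical thresholds, chosen only for illustration.
min_length <- 35
max_evalue <- 1e-10

# Keep hits that are long enough and significant enough.
hits <- cl[cl$length >= min_length & cl$evalue <= max_evalue, ]

# Sort by bit score, best first, and take the top subject sequence ID.
hits <- hits[order(-hits$bitscore), ]
best_hit <- hits$sseqid[1]

print(hits[, c("sseqid", "length", "evalue", "bitscore")])
print(best_hit)
```

With a live search, the same filtering would be applied directly to the object returned by `predict()`, instead of the hand-entered table.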