Name Mode Size
R 040000
inst 040000
man 040000
tests 040000
vignettes 040000
.BBSoptions 100644 0 kb
.Rbuildignore 100644 0 kb
.gitignore 100644 0 kb
DESCRIPTION 100644 1 kb
LICENSE 100644 1 kb
NAMESPACE 100644 1 kb
README.Rmd 100644 5 kb
README.md 100644 12 kb
README.md
crisprBwa: alignment of gRNA spacer sequences using BWA ================ - <a href="#overview-of-crisprbwa" id="toc-overview-of-crisprbwa">Overview of crisprBwa</a> - <a href="#installation-and-getting-started" id="toc-installation-and-getting-started">Installation and getting started</a> - <a href="#software-requirements" id="toc-software-requirements">Software requirements</a> - <a href="#os-requirements" id="toc-os-requirements">OS Requirements</a> - <a href="#installation-from-bioconductor" id="toc-installation-from-bioconductor">Installation from Bioconductor</a> - <a href="#building-a-bwa-index" id="toc-building-a-bwa-index">Building a bwa index</a> - <a href="#alignment-using-runcrisprbwa" id="toc-alignment-using-runcrisprbwa">Alignment using <code>runCrisprBwa</code></a> - <a href="#applications-beyond-crispr" id="toc-applications-beyond-crispr">Applications beyond CRISPR</a> - <a href="#example-using-rnai-sirna-design" id="toc-example-using-rnai-sirna-design">Example using RNAi (siRNA design)</a> - <a href="#reproducibility" id="toc-reproducibility">Reproducibility</a> - <a href="#references" id="toc-references">References</a> Authors: Jean-Philippe Fortin Date: July 13, 2022 # Overview of crisprBwa `crisprBwa` provides two main functions to align short DNA sequences to a reference genome using the short read aligner BWA-backtrack (Li and Durbin 2009) and return the alignments as R objects: `runBwa` and `runCrisprBwa`. It utilizes the Bioconductor package `Rbwa` to access the BWA program in a platform-independent manner. This means that users do not need to install BWA prior to using `crisprBwa`. The latter function (`runCrisprBwa`) is specifically designed to map and annotate CRISPR guide RNA (gRNA) spacer sequences using CRISPR nuclease objects and CRISPR genomic arithmetics defined in the Bioconductor package [crisprBase](https://github.com/crisprVerse/crisprBase). This enables a fast and accurate on-target and off-target search of gRNA spacer sequences for virtually any type of CRISPR nucleases. It also provides an off-target search engine for our main gRNA design package [crisprDesign](https://github.com/crisprVerse/crisprDesign) of the [crisprVerse](https://github.com/crisprVerse) ecosystem. See the `addSpacerAlignments` function in `crisprDesign` for more details. # Installation and getting started ## Software requirements ### OS Requirements This package is supported for macOS and Linux only. Package was developed and tested on R version 4.2.1. ## Installation from Bioconductor `crisprBwa` can be installed from from the Bioconductor devel branch using the following commands in a fresh R session: ``` r if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install(version="devel") BiocManager::install("crisprBwa") ``` # Building a bwa index To use `runBwa` or `runCrisprBwa`, users need to first build a BWA genome index. For a given genome, this step has to be done only once. The `Rbwa` package conveniently provides the function `bwa_build_index` to build a BWA index from any custom genome from a FASTA file. As an example, we build a BWA index for a small portion of the human chromosome 12 (`chr12.fa` file provided in the `crisprBwa` package) and save the index file as `myIndex` to a temporary directory: ``` r library(Rbwa) fasta <- system.file(package="crisprBwa", "example/chr12.fa") outdir <- tempdir() index <- file.path(outdir, "chr12") Rbwa::bwa_build_index(fasta, index_prefix=index) ``` To learn how to create a BWA index for a complete genome or transcriptome, please visit our [tutorial page](https://github.com/crisprVerse/Tutorials/tree/master/Building_Genome_Indices). # Alignment using `runCrisprBwa` As an example, we align 5 spacer sequences (of length 20bp) to the custom genome built above, allowing a maximum of 3 mismatches between the spacer and protospacer sequences. We specify that the search is for the wildtype Cas9 (SpCas9) nuclease by providing the `CrisprNuclease` object `SpCas9` available through the `crisprBase` package. The argument `canonical=FALSE` specifies that non-canonical PAM sequences are also considered (NAG and NGA for SpCas9). The function `getAvailableCrisprNucleases` in `crisprBase` returns a character vector of available `crisprNuclease` objects found in `crisprBase`. We also need to provide a `BSgenome` object corresponding to the reference genome used for alignment to extract protospacer and PAM sequences of the target sequences. ``` r library(crisprBwa) ``` ## Warning: multiple methods tables found for 'aperm' ## Warning: replacing previous import 'BiocGenerics::aperm' by ## 'DelayedArray::aperm' when loading 'SummarizedExperiment' ``` r library(BSgenome.Hsapiens.UCSC.hg38) ``` ## Loading required package: BSgenome ## Loading required package: BiocGenerics ## ## Attaching package: 'BiocGenerics' ## The following objects are masked from 'package:stats': ## ## IQR, mad, sd, var, xtabs ## The following objects are masked from 'package:base': ## ## anyDuplicated, aperm, append, as.data.frame, basename, cbind, ## colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find, ## get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, ## match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, ## Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, ## table, tapply, union, unique, unsplit, which.max, which.min ## Loading required package: S4Vectors ## Loading required package: stats4 ## ## Attaching package: 'S4Vectors' ## The following objects are masked from 'package:base': ## ## expand.grid, I, unname ## Loading required package: IRanges ## Loading required package: GenomeInfoDb ## Loading required package: GenomicRanges ## Loading required package: Biostrings ## Loading required package: XVector ## ## Attaching package: 'Biostrings' ## The following object is masked from 'package:base': ## ## strsplit ## Loading required package: rtracklayer ``` r data(SpCas9, package="crisprBase") crisprNuclease <- SpCas9 bsgenome <- BSgenome.Hsapiens.UCSC.hg38 spacers <- c("AGCTGTCCGTGGGGGTCCGC", "CCCCTGCTGCTGTGCCAGGC", "ACGAACTGTAAAAGGCTTGG", "ACGAACTGTAACAGGCTTGG", "AAGGCCCTCAGAGTAATTAC") runCrisprBwa(spacers, bsgenome=bsgenome, crisprNuclease=crisprNuclease, n_mismatches=3, canonical=FALSE, bwa_index=index) ``` ## [runCrisprBwa] Using BSgenome.Hsapiens.UCSC.hg38 ## [runCrisprBwa] Searching for SpCas9 protospacers ## spacer protospacer pam chr pam_site strand ## 1 AAGGCCCTCAGAGTAATTAC AAGGCCCTCAGAGTAATTAC AGA chr12 170636 + ## 2 ACGAACTGTAAAAGGCTTGG ACGAACTGTAAAAGGCTTGG AGG chr12 170815 - ## 3 ACGAACTGTAACAGGCTTGG ACGAACTGTAAAAGGCTTGG AGG chr12 170815 - ## 4 AGCTGTCCGTGGGGGTCCGC AGCTGTCCGTGGGGGTCCGC AGG chr12 170585 + ## 5 CCCCTGCTGCTGTGCCAGGC CCCCTGCTGCTGTGCCAGGC CGG chr12 170609 + ## n_mismatches canonical ## 1 0 FALSE ## 2 0 TRUE ## 3 1 TRUE ## 4 0 TRUE ## 5 0 TRUE # Applications beyond CRISPR The function `runBwa` is similar to `runCrisprBwa`, but does not impose constraints on PAM sequences. It can be used to search for any short read sequence in a genome. ## Example using RNAi (siRNA design) Seed-related off-targets caused by mismatch tolerance outside of the seed region is a well-studied and characterized problem observed in RNA interference (RNAi) experiments. `runBWa` can be used to map shRNA/siRNA seed sequences to reference genomes to predict putative off-targets: ``` r seeds <- c("GTAAGCGGAGTGT", "AACGGGGAGATTG") runBwa(seeds, n_mismatches=2, bwa_index=index) ``` ## query chr pos strand n_mismatches ## 1 AACGGGGAGATTG chr12 68337 - 2 ## 2 AACGGGGAGATTG chr12 1666 - 2 ## 3 AACGGGGAGATTG chr12 123863 + 2 ## 4 AACGGGGAGATTG chr12 151731 - 2 ## 5 AACGGGGAGATTG chr12 110901 + 2 ## 6 GTAAGCGGAGTGT chr12 101550 - 2 # Reproducibility ``` r sessionInfo() ``` ## R version 4.2.1 (2022-06-23) ## Platform: x86_64-apple-darwin17.0 (64-bit) ## Running under: macOS Catalina 10.15.7 ## ## Matrix products: default ## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib ## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib ## ## locale: ## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 ## ## attached base packages: ## [1] stats4 stats graphics grDevices utils datasets methods ## [8] base ## ## other attached packages: ## [1] BSgenome.Hsapiens.UCSC.hg38_1.4.4 BSgenome_1.65.2 ## [3] rtracklayer_1.57.0 Biostrings_2.65.3 ## [5] XVector_0.37.1 GenomicRanges_1.49.1 ## [7] GenomeInfoDb_1.33.7 IRanges_2.31.2 ## [9] S4Vectors_0.35.3 BiocGenerics_0.43.4 ## [11] crisprBwa_1.1.3 Rbwa_1.1.0 ## ## loaded via a namespace (and not attached): ## [1] SummarizedExperiment_1.27.2 tidyselect_1.1.2 ## [3] xfun_0.32 purrr_0.3.4 ## [5] lattice_0.20-45 vctrs_0.4.1 ## [7] htmltools_0.5.3 yaml_2.3.5 ## [9] utf8_1.2.2 XML_3.99-0.10 ## [11] rlang_1.0.5 pillar_1.8.1 ## [13] glue_1.6.2 BiocParallel_1.31.12 ## [15] bit64_4.0.5 matrixStats_0.62.0 ## [17] GenomeInfoDbData_1.2.8 lifecycle_1.0.1 ## [19] stringr_1.4.1 zlibbioc_1.43.0 ## [21] MatrixGenerics_1.9.1 codetools_0.2-18 ## [23] evaluate_0.16 restfulr_0.0.15 ## [25] Biobase_2.57.1 knitr_1.40 ## [27] tzdb_0.3.0 fastmap_1.1.0 ## [29] parallel_4.2.1 fansi_1.0.3 ## [31] crisprBase_1.1.8 readr_2.1.2 ## [33] DelayedArray_0.23.1 vroom_1.5.7 ## [35] bit_4.0.4 Rsamtools_2.13.4 ## [37] rjson_0.2.21 hms_1.1.2 ## [39] digest_0.6.29 stringi_1.7.8 ## [41] BiocIO_1.7.1 grid_4.2.1 ## [43] cli_3.4.0 tools_4.2.1 ## [45] bitops_1.0-7 magrittr_2.0.3 ## [47] RCurl_1.98-1.8 tibble_3.1.8 ## [49] crayon_1.5.1 pkgconfig_2.0.3 ## [51] ellipsis_0.3.2 Matrix_1.4-1 ## [53] rmarkdown_2.16 rstudioapi_0.14 ## [55] R6_2.5.1 GenomicAlignments_1.33.1 ## [57] compiler_4.2.1 # References <div id="refs" class="references csl-bib-body hanging-indent"> <div id="ref-bwa" class="csl-entry"> Li, Heng, and Richard Durbin. 2009. “Fast and Accurate Short Read Alignment with Burrows–Wheeler Transform.” *Bioinformatics* 25 (14): 1754–60. </div> </div>