... | ... |
@@ -5,7 +5,10 @@ Version: 0.1 |
5 | 5 |
Date: 2018-02-13 |
6 | 6 |
Author: August Guang |
7 | 7 |
Maintainer: August Guang <august_guang@brown.edu> |
8 |
-Description: This package contains headers from the SeqAn C++ library for easy of usage in R. The probably usage will be with Rcpp. |
|
9 |
-License: BSD |
|
8 |
+Description: This package contains headers from the SeqAn C++ library for ease of use in R. |
|
9 |
+License: MIT + file LICENSE |
|
10 | 10 |
BugReports: https://github.com/compbiocore/RSeqAn/issues |
11 |
-RoxygenNote: 6.0.1 |
|
12 | 11 |
\ No newline at end of file |
12 |
+RoxygenNote: 6.0.1 |
|
13 |
+Suggests: knitr, |
|
14 |
+ rmarkdown |
|
15 |
+VignetteBuilder: knitr |
... | ... |
@@ -1,26 +1,21 @@ |
1 |
-Copyright (c) 2006-2018, Knut Reinert, FU Berlin |
|
2 |
-All rights reserved. |
|
1 |
+MIT License |
|
3 | 2 |
|
4 |
-Redistribution and use in source and binary forms, with or without |
|
5 |
-modification, are permitted provided that the following conditions are met: |
|
3 |
+Copyright (c) 2018 August Guang |
|
6 | 4 |
|
7 |
- * Redistributions of source code must retain the above copyright |
|
8 |
- notice, this list of conditions and the following disclaimer. |
|
9 |
- * Redistributions in binary form must reproduce the above copyright |
|
10 |
- notice, this list of conditions and the following disclaimer in the |
|
11 |
- documentation and/or other materials provided with the distribution. |
|
12 |
- * Neither the name of Knut Reinert or the FU Berlin nor the names of |
|
13 |
- its contributors may be used to endorse or promote products derived |
|
14 |
- from this software without specific prior written permission. |
|
5 |
+Permission is hereby granted, free of charge, to any person obtaining a copy |
|
6 |
+of this software and associated documentation files (the "Software"), to deal |
|
7 |
+in the Software without restriction, including without limitation the rights |
|
8 |
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
|
9 |
+copies of the Software, and to permit persons to whom the Software is |
|
10 |
+furnished to do so, subject to the following conditions: |
|
15 | 11 |
|
16 |
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" |
|
17 |
-AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE |
|
18 |
-IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE |
|
19 |
-ARE DISCLAIMED. IN NO EVENT SHALL KNUT REINERT OR THE FU BERLIN BE LIABLE |
|
20 |
-FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL |
|
21 |
-DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR |
|
22 |
-SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER |
|
23 |
-CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT |
|
24 |
-LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY |
|
25 |
-OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH |
|
26 |
-DAMAGE. |
|
12 |
+The above copyright notice and this permission notice shall be included in all |
|
13 |
+copies or substantial portions of the Software. |
|
14 |
+ |
|
15 |
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
|
16 |
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
|
17 |
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
|
18 |
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
|
19 |
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
|
20 |
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE |
|
21 |
+SOFTWARE. |
|
27 | 22 |
\ No newline at end of file |
... | ... |
@@ -1,4 +1,4 @@ |
1 |
-[](https://travis-ci.org/compbiocore/RSeqAn) [](https://opensource.org/licenses/BSD-3-Clause) |
|
1 |
+[](https://travis-ci.org/compbiocore/RSeqAn) [](https://opensource.org/licenses/MIT) |
|
2 | 2 |
|
3 | 3 |
# RSeqAn |
4 | 4 |
SeqAn Headers for R |
... | ... |
@@ -11,8 +11,4 @@ RSeqAn can be used via the `LinkingTo:` field in the `DESCRIPTION` field of an R |
11 | 11 |
|
12 | 12 |
## Author |
13 | 13 |
|
14 |
-August Guang |
|
15 |
- |
|
16 |
-## License |
|
17 |
- |
|
18 |
-The license provided is the same as for SeqAn and is unaltered. |
|
19 | 14 |
\ No newline at end of file |
15 |
+August Guang |
|
20 | 16 |
\ No newline at end of file |
21 | 17 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,91 @@ |
1 |
+--- |
|
2 |
+title: "A First Example" |
|
3 |
+author: "August Guang" |
|
4 |
+date: "`r Sys.Date()`" |
|
5 |
+output: rmarkdown::html_vignette |
|
6 |
+vignette: > |
|
7 |
+ %\VignetteIndexEntry{A First Example} |
|
8 |
+ %\VignetteEngine{knitr::rmarkdown} |
|
9 |
+ %\VignetteEncoding{UTF-8} |
|
10 |
+--- |
|
11 |
+ |
|
12 |
+```{r setup, include = FALSE} |
|
13 |
+knitr::opts_chunk$set( |
|
14 |
+ collapse = TRUE, |
|
15 |
+ comment = "#>" |
|
16 |
+) |
|
17 |
+Sys.setenv("PKG_CXXFLAGS"="-std=c++14") |
|
18 |
+``` |
|
19 |
+ |
|
20 |
+## Introduction |
|
21 |
+ |
|
22 |
+RSeqAn was created to make it easy to integrate the SeqAn C++ library for biological sequence analysis into R packages. While R is an excellent language for many applications, it is not fast enough for reading and writing files on the scale of next-generation sequencing output. This is where a well-developed and mature library like SeqAn comes in. |
|
23 |
+ |
|
24 |
+This vignette goes through only the first example in the [A First Example](http://seqan.readthedocs.io/en/master/Tutorial/GettingStarted/AFirstExample.html#tutorial-getting-started-first-steps-in-seqan) section of the [Getting Started](http://seqan.readthedocs.io/en/master/Tutorial/GettingStarted/) part of the [SeqAn](http://seqan.readthedocs.io/en/master/index.html) docs. We have modified the function slightly to make it work in R, and will go through how and why we did so. The purpose of this example is to give the user an idea of how to move between SeqAn and R. To take full advantage of SeqAn, though, the user will need to read through SeqAn's documentation. |
|
25 |
+ |
|
26 |
+Besides that, the user is expected to have some experience with both C++ and Rcpp, although it need not be extensive. After all, that is what RSeqAn is for. |
|
27 |
+ |
|
28 |
+## Template functions and template classes |
|
29 |
+ |
|
30 |
+The simple example in `pattern_search` does, as you might expect, a pattern search of a short query sequence (pattern) in a long subject sequence (text). For each position of the text, it returns a score equal to the number of characters that match when the pattern is aligned at that position. |
|
31 |
+ |
|
32 |
+```{r pattern_search, engine='Rcpp'} |
|
33 |
+// [[Rcpp::depends(RSeqAn)]] |
|
34 |
+ |
|
35 |
+#include <iostream> |
|
36 |
+#include <seqan/file.h> |
|
37 |
+#include <seqan/sequence.h> |
|
38 |
+#include <Rcpp.h> |
|
39 |
+using namespace Rcpp; |
|
40 |
+using namespace seqan; |
|
41 |
+using namespace std; |
|
42 |
+ |
|
43 |
+// [[Rcpp::export]] |
|
44 |
+IntegerVector pattern_search(std::string t, std::string p) { |
|
45 |
+ |
|
46 |
+ seqan::String<char> text = t; |
|
47 |
+ seqan::String<char> pattern = p; |
|
48 |
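+ if (length(pattern) > length(text)) return IntegerVector(); // guard: the unsigned subtraction below would wrap around |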
|
49 |
+ String<int> score; |
|
50 |
+ resize(score, length(text) - length(pattern) + 1); |
|
51 |
+ |
|
52 |
+ // Computation of the similarities |
|
53 |
+ // Iteration over the text (outer loop) |
|
54 |
+ for (unsigned i = 0; i < length(text) - length(pattern) + 1; ++i) |
|
55 |
+ { |
|
56 |
+ int localScore = 0; |
|
57 |
+ // Iteration over the pattern for character comparison |
|
58 |
+ for (unsigned j = 0; j < length(pattern); ++j) |
|
59 |
+ { |
|
60 |
+ if (text[i + j] == pattern[j]) |
|
61 |
+ ++localScore; |
|
62 |
+ } |
|
63 |
+ score[i] = localScore; |
|
64 |
+ } |
|
65 |
+ |
|
66 |
+ // Returning the result |
|
67 |
+ IntegerVector s(length(score)); |
|
68 |
+ for (unsigned i = 0; i < length(score); ++i) |
|
69 |
+ s[i] = score[i]; |
|
70 |
+ |
|
71 |
+ return s; |
|
72 |
+} |
|
73 |
+``` |
|
74 |
+ |
|
75 |
+The results are shown in `ps_r`. The first position has a score of 1 because, with the pattern aligned at the start of the text, only the `i` in `tutorial` lines up with a matching character, the `i` in `is`. |
|
76 |
+ |
|
77 |
+```{r ps_r} |
|
78 |
+pattern_search("This is an awesome tutorial to get to know SeqAn!", "tutorial") |
|
79 |
+``` |
|
80 |
+ |
|
81 |
+### A more detailed look at the program |
|
82 |
+ |
|
83 |
+As we can see, writing a C++ function that uses SeqAn inside R is quite easy with Rcpp. We included `<seqan/file.h>` as well as `<seqan/sequence.h>`, as those are the modules that provide the SeqAn [String class](http://docs.seqan.de/seqan/master/?p=String). This is one of the most fundamental classes in SeqAn. |
|
84 |
+ |
|
85 |
+However, the function we wrote does look slightly different from the one in the [A First Example](http://seqan.readthedocs.io/en/master/Tutorial/GettingStarted/AFirstExample.html#tutorial-getting-started-first-steps-in-seqan) section. First, instead of the function `int main()`, we have written the function `IntegerVector pattern_search(std::string t, std::string p)`. (Note: we already declared `using namespace std`, but the `std::` prefix was left in the signature for clarity.) Next, instead of printing the score to stdout, we return it as an `IntegerVector`. |
|
86 |
+ |
|
87 |
+The reason is that for any function using SeqAn to be useful in R, it usually needs to take input from R and return something to R. This means that the input and output object types need to be translatable between R and C++. SeqAn uses its own **template functions** and **template classes**, such as the String class, which makes sense since SeqAn is all about analyzing sequences. However, the String class has no direct translation to R. If you try to take `String<char> text` as an argument or return `String<int> score`, you will end up with loads of errors from the compiler. So, how do we deal with this? |
|
88 |
+ |
|
89 |
+One way to do this is by writing conversion functions so that R and C++ both understand what the data type you are using (such as String) means. Rcpp provides a nice way to do this through `Rcpp::as<T>(obj)` to convert from R to C++ and `Rcpp::wrap(obj)` to convert from C++ to R. More of this is covered in the Rcpp vignette [Extending Rcpp](https://cran.r-project.org/web/packages/Rcpp/vignettes/Rcpp-extending.pdf). Once these functions are written, users can simply call `Rcpp::wrap` and `Rcpp::as<T>` as needed. This has not been implemented in RSeqAn yet, though, so for now the user will have to pay attention to how to convert between classes in SeqAn and objects in R for each function that is written. |
|
90 |
+ |
|
91 |
+Rcpp has its own [data types](https://teuder.github.io/rcpp4everyone_en/070_data_types.html) for going between R and C++, and the `IntegerVector` we declare here is one of them. Since `score` is essentially a vector of class `String` with element type `int`, instead of iterating through `score` and printing to stdout, we create an `IntegerVector s` with the same length as `score` and copy the values of `score` into `s` so that we can return them. Similarly, Rcpp already converts R character strings to C++ `std::string`, and a `std::string` can be assigned to a SeqAn `String<char>`, which is what lets us call `pattern_search` from R. |
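
For readers who want to check the scoring logic on its own, here is a minimal sketch of the same loop as `pattern_search`, written with only the C++ standard library so it can be compiled without SeqAn or Rcpp. The helper name `score_positions` is hypothetical and not part of RSeqAn. The sketch uses `std::string` and `std::vector<int>` in place of `seqan::String<char>` and `IntegerVector`, with an explicit guard for a pattern longer than the text.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of RSeqAn, SeqAn, or Rcpp): for each
// alignment of `pattern` against `text`, count the matching characters.
std::vector<int> score_positions(const std::string& text,
                                 const std::string& pattern) {
    // Guard first: text.size() - pattern.size() is unsigned and would wrap
    // around if the pattern were longer than the text.
    if (pattern.size() > text.size()) {
        return {};
    }

    // One score per position at which the whole pattern still fits.
    std::vector<int> score(text.size() - pattern.size() + 1, 0);
    for (std::size_t i = 0; i < score.size(); ++i) {
        for (std::size_t j = 0; j < pattern.size(); ++j) {
            if (text[i + j] == pattern[j]) {
                ++score[i];
            }
        }
    }
    return score;
}
```

With the vignette's input, this returns 42 scores: 1 at index 0, and 8 at index 19, where `tutorial` occurs in the text.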
|
0 | 92 |
\ No newline at end of file |