... | ... |
@@ -171,7 +171,7 @@ which allows for pretty-printing multiple alignments using the \LaTeX\ package |
171 | 171 |
\TeXshade. As an example, the following \R\ code creates a PDF file |
172 | 172 |
\verb+myfirstAlignment.pdf+ which is shown in |
173 | 173 |
Figure~\ref{fig:myFirstAlignment}: |
174 |
-<<IntegratePDF2>>= |
|
174 |
+<<IntegratePDF2,eval=FALSE>>= |
|
175 | 175 |
msaPrettyPrint(myFirstAlignment, output="pdf", showNames="none", |
176 | 176 |
showLogo="none", askForOverwrite=FALSE, verbose=FALSE) |
177 | 177 |
@ |
... | ... |
@@ -6,12 +6,12 @@ |
6 | 6 |
\hypersetup{colorlinks=false, |
7 | 7 |
pdfborder=0 0 0, |
8 | 8 |
pdftitle={msa - An R Package for Multiple Sequence Alignment}, |
9 |
- pdfauthor={Enrico Bonatesta, Christoph Horejs-Kainrath, and Ulrich Bodenhofer}} |
|
9 |
+ pdfauthor={Enrico Bonatesta, Christoph Kainrath, and Ulrich Bodenhofer}} |
|
10 | 10 |
|
11 | 11 |
\usepackage[OT1]{fontenc} |
12 | 12 |
|
13 | 13 |
\title{{\Huge msa}\\[5mm] An R Package for Multiple Sequence Alignment} |
14 |
-\author{Enrico Bonatesta, Christoph Horej\v{s}-Kainrath, and Ulrich Bodenhofer} |
|
14 |
+\author{Enrico Bonatesta, Christoph Kainrath, and Ulrich Bodenhofer} |
|
15 | 15 |
\affiliation{Institute of Bioinformatics, Johannes Kepler University |
16 | 16 |
Linz\\Altenberger Str. 69, 4040 Linz, Austria\\ |
17 | 17 |
\email{msa@bioinf.jku.at}} |
... | ... |
@@ -887,126 +887,6 @@ Moreover, we insist that, any time you cite the package, you also cite |
887 | 887 |
the original paper in which the original algorithm has been introduced (see |
888 | 888 |
bibliography below). |
889 | 889 |
|
890 |
-\section{Change Log} |
|
891 |
- |
|
892 |
-\begin{description} |
|
893 |
-\item[Version 1.18.0:] release as part of Bioconductor 3.10 |
|
894 |
-\item[Version 1.17.1:] \mbox{ } \begin{itemize} |
|
895 |
- \item fixed regular expression to comply with PCRE2 |
|
896 |
- \item fixed Windows makefile for gc lib |
|
897 |
- \item fixed Windows cleanup script |
|
898 |
- \item fixed\verb+ src/Makevars.win+ |
|
899 |
- \end{itemize} |
|
900 |
-\item[Version 1.17.0:] new branch for Bioconductor 3.10 devel |
|
901 |
-\item[Version 1.16.0:] release as part of Bioconductor 3.9 |
|
902 |
-\item[Version 1.15.0:] new branch for Bioconductor 3.9 devel |
|
903 |
-\item[Version 1.14.0:] release as part of Bioconductor 3.8 |
|
904 |
-\item[Version 1.13.0:] new branch for Bioconductor 3.8 devel |
|
905 |
-\item[Version 1.12.0:] release as part of Bioconductor 3.7 |
|
906 |
-\item[Version 1.11.2:] \mbox{ } \begin{itemize} |
|
907 |
- \item minor fix in ClustalW |
|
908 |
- \end{itemize} |
|
909 |
-\item[Version 1.11.1:] \mbox{ } \begin{itemize} |
|
910 |
- \item fix of code for using custom substitution matrices in ClustalW |
|
911 |
- \end{itemize} |
|
912 |
-\item[Version 1.11.0:] new branch for Bioconductor 3.7 devel |
|
913 |
-\item[Version 1.10.0:] release as part of Bioconductor 3.6 |
|
914 |
-\item[Version 1.9.0:] new branch for Bioconductor 3.6 devel |
|
915 |
-\item[Version 1.8.0:] release as part of Bioconductor 3.5 |
|
916 |
-\item[Version 1.7.2:] \mbox{ } \begin{itemize} |
|
917 |
- \item fix for new \verb+clang+ 4 compiler on Mac OS |
|
918 |
- \end{itemize} |
|
919 |
-\item[Version 1.7.1:] \mbox{ } \begin{itemize} |
|
920 |
- \item additional conversions implemented for \verb+msaConvert()+ function |
|
921 |
- \item added a new method \verb+msaConsensusSequence()+ that extends the |
|
922 |
- functionality provided by \verb+Biostrings+' \verb+consensusString()+ method |
|
923 |
- \item added a new method \verb+msaConservationScore()+ |
|
924 |
- \item \verb+print()+ method extended such that it now also allows for |
|
925 |
- customization of the consensus sequence (via the new |
|
926 |
- \verb+msaConsensusSequence()+ method) |
|
927 |
- \item package now depends on \verb+Biostrings+ version $\geq$2.40.0 in order |
|
928 |
- to make sure that \verb+consensusMatrix()+ also works correctly |
|
929 |
- for masked alignments |
|
930 |
- \item corresponding changes in documentation and vignette |
|
931 |
- \end{itemize} |
|
932 |
-\item[Version 1.7.0:] new branch for Bioconductor 3.5 devel |
|
933 |
-\item[Version 1.6.0:] release as part of Bioconductor 3.4 |
|
934 |
-\item[Version 1.5.5:] \mbox{ } \begin{itemize} |
|
935 |
- \item fixes in ClustalOmega source code to ensure Windows compatibility of |
|
936 |
- GCC6 compatibility fix |
|
937 |
- \end{itemize} |
|
938 |
-\item[Version 1.5.4:] \mbox{ } \begin{itemize} |
|
939 |
- \item bug fix in \verb+msaClustalW()+: unsupported parameter `\verb+tree+' deactivated |
|
940 |
- \item fixes in ClustalOmega source code to ensure GCC 6 compatibility |
|
941 |
- \item fix in \verb+msaConvert()+ function to improve safety of call to suggested |
|
942 |
- package \verb+phangorn+ |
|
943 |
- \end{itemize} |
|
944 |
-\item[Version 1.5.3:] \mbox{ } \begin{itemize} |
|
945 |
- \item additional conversions implemented for \verb+msaConvert()+ function |
|
946 |
- \item corresponding changes in documentation |
|
947 |
- \end{itemize} |
|
948 |
-\item[Versions 1.5.1 and 1.5.2:] version number bumps for technical reasons |
|
949 |
- related to Bioconductor build servers |
|
950 |
-\item[Version 1.5.0:] new branch for Bioconductor 3.4 devel |
|
951 |
-\item[Version 1.4.0:] release as part of Bioconductor 3.3 |
|
952 |
-\item[Version 1.3.7:] \mbox{ } \begin{itemize} |
|
953 |
- \item fixes in \verb+msaPrettyPrint()+ function |
|
954 |
- \end{itemize} |
|
955 |
-\item[Version 1.3.6:] \mbox) { } \begin{itemize} |
|
956 |
- \item \verb+msaPrettyPrint()+ now also accepts dashes in file names |
|
957 |
- \item added section about pretty-printing wide alignments to package vignette |
|
958 |
- \end{itemize} |
|
959 |
-\item[Version 1.3.5:] \mbox{ } \begin{itemize} |
|
960 |
- \item adaptation of displaying help text by \verb+msa()+ function |
|
961 |
- \end{itemize} |
|
962 |
-\item[Version 1.3.4:] \mbox{ } \begin{itemize} |
|
963 |
- \item added function for checking and fixing sequence names for |
|
964 |
- possibly problematic characters that could lead to \LaTeX\ errors |
|
965 |
- when using \verb+msaPrettyPrint()+ |
|
966 |
- \item corresponding changes in documentation |
|
967 |
- \item minor namespace fix |
|
968 |
- \end{itemize} |
|
969 |
-\item[Version 1.3.3:] \mbox{ } \begin{itemize} |
|
970 |
- \item added function for converting multiple sequence alignments for |
|
971 |
- use with other sequence alignment packages |
|
972 |
- \item corresponding changes in documentation |
|
973 |
- \end{itemize} |
|
974 |
-\item[Version 1.3.2:] \mbox{ } \begin{itemize} |
|
975 |
- \item further fixes in Makefiles and Makevars files to account for changes in build system |
|
976 |
- \item update of citation information |
|
977 |
- \end{itemize} |
|
978 |
-\item[Version 1.3.1:] \mbox{ } \begin{itemize} |
|
979 |
- \item fixes in Makefiles and Makevars files to account for changes in build system |
|
980 |
- \end{itemize} |
|
981 |
-\item[Version 1.3.0:] new branch for Bioconductor 3.3 devel |
|
982 |
-\item[Version 1.2.0:] release as part of Bioconductor 3.2 |
|
983 |
-\item[Version 1.1.3:] \mbox{ } \begin{itemize} |
|
984 |
- \item bug fix related to custom substitution matrices |
|
985 |
- in the MUSCLE interface |
|
986 |
- \item corrections and updates of documentation |
|
987 |
- \end{itemize} |
|
988 |
-\item[Version 1.1.2:] \mbox{ } \begin{itemize} |
|
989 |
- \item new \verb+print()+ function for multiple alignments that also |
|
990 |
- allows for displaying alignments in their entirety (plus additional |
|
991 |
- customizations) |
|
992 |
- \item strongly improved handling of custom substitution matrices by |
|
993 |
- \verb+msaClustalW()+: now custom matrices can also be supplied for nucleotide |
|
994 |
- sequences which can also be passed via the \verb+substitutionMatrix+ argument. |
|
995 |
- The \verb+dnamatrix+ argument is still available for the sake of backwards |
|
996 |
- compatibility. |
|
997 |
- \item strongly improved handling of custom substitution matrices by |
|
998 |
- \verb+msaMuscle()+ |
|
999 |
- \item fix of improperly aligned sequence logos produced by |
|
1000 |
- \verb+msaPrettyPrint()+ |
|
1001 |
- \item updated citation information |
|
1002 |
- \end{itemize} |
|
1003 |
-\item[Version 1.1.1:] \mbox{ } \begin{itemize} |
|
1004 |
- \item fix of \verb+msa()+ function |
|
1005 |
- \end{itemize} |
|
1006 |
-\item[Version 1.1.0:] new branch for Bioconductor 3.2 devel |
|
1007 |
-\item[Version 1.0.0:] first official release as part of Bioconductor 3.1 |
|
1008 |
-\end{description} |
|
1009 |
- |
|
1010 | 890 |
\bibliographystyle{plain} |
1011 | 891 |
\bibliography{lit} |
1012 | 892 |
|
... | ... |
@@ -890,6 +890,18 @@ bibliography below). |
890 | 890 |
\section{Change Log} |
891 | 891 |
|
892 | 892 |
\begin{description} |
893 |
+\item[Version 1.18.0:] release as part of Bioconductor 3.10 |
|
894 |
+\item[Version 1.17.1:] \mbox{ } \begin{itemize} |
|
895 |
+ \item fixed regular expression to comply with PCRE2 |
|
896 |
+ \item fixed Windows makefile for gc lib |
|
897 |
+ \item fixed Windows cleanup script |
|
898 |
+ \item fixed\verb+ src/Makevars.win+ |
|
899 |
+ \end{itemize} |
|
900 |
+\item[Version 1.17.0:] new branch for Bioconductor 3.10 devel |
|
901 |
+\item[Version 1.16.0:] release as part of Bioconductor 3.9 |
|
902 |
+\item[Version 1.15.0:] new branch for Bioconductor 3.9 devel |
|
903 |
+\item[Version 1.14.0:] release as part of Bioconductor 3.8 |
|
904 |
+\item[Version 1.13.0:] new branch for Bioconductor 3.8 devel |
|
893 | 905 |
\item[Version 1.12.0:] release as part of Bioconductor 3.7 |
894 | 906 |
\item[Version 1.11.2:] \mbox{ } \begin{itemize} |
895 | 907 |
\item minor fix in ClustalW |
... | ... |
@@ -111,8 +111,9 @@ available via Bioconductor. The simplest way to install the package |
111 | 111 |
is the following: |
112 | 112 |
|
113 | 113 |
<<InstallMSA,eval=FALSE>>= |
114 |
-source("http://www.bioconductor.org/biocLite.R") |
|
115 |
-biocLite("msa") |
|
114 |
+if (!requireNamespace("BiocManager", quietly=TRUE)) |
|
115 |
+ install.packages("BiocManager") |
|
116 |
+BiocManager::install("msa") |
|
116 | 117 |
@ |
117 | 118 |
|
118 | 119 |
To test the installation of the \MSA\ package, enter |
... | ... |
@@ -890,6 +890,9 @@ bibliography below). |
890 | 890 |
|
891 | 891 |
\begin{description} |
892 | 892 |
\item[Version 1.12.0:] release as part of Bioconductor 3.7 |
893 |
+\item[Version 1.11.2:] \mbox{ } \begin{itemize} |
|
894 |
+ \item minor fix in ClustalW |
|
895 |
+ \end{itemize} |
|
893 | 896 |
\item[Version 1.11.1:] \mbox{ } \begin{itemize} |
894 | 897 |
\item fix of code for using custom substitution matrices in ClustalW |
895 | 898 |
\end{itemize} |
... | ... |
@@ -889,6 +889,14 @@ bibliography below). |
889 | 889 |
\section{Change Log} |
890 | 890 |
|
891 | 891 |
\begin{description} |
892 |
+\item[Version 1.12.0:] release as part of Bioconductor 3.7 |
|
893 |
+\item[Version 1.11.1:] \mbox{ } \begin{itemize} |
|
894 |
+ \item fix of code for using custom substitution matrices in ClustalW |
|
895 |
+ \end{itemize} |
|
896 |
+\item[Version 1.11.0:] new branch for Bioconductor 3.7 devel |
|
897 |
+\item[Version 1.10.0:] release as part of Bioconductor 3.6 |
|
898 |
+\item[Version 1.9.0:] new branch for Bioconductor 3.6 devel |
|
899 |
+\item[Version 1.8.0:] release as part of Bioconductor 3.5 |
|
892 | 900 |
\item[Version 1.7.2:] \mbox{ } \begin{itemize} |
893 | 901 |
\item fix for new \verb+clang+ 4 compiler on Mac OS |
894 | 902 |
\end{itemize} |
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/msa@128803 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -889,6 +889,9 @@ bibliography below). |
889 | 889 |
\section{Change Log} |
890 | 890 |
|
891 | 891 |
\begin{description} |
892 |
+\item[Version 1.7.2:] \mbox{ } \begin{itemize} |
|
893 |
+ \item fix for new \verb+clang+ 4 compiler on Mac OS |
|
894 |
+ \end{itemize} |
|
892 | 895 |
\item[Version 1.7.1:] \mbox{ } \begin{itemize} |
893 | 896 |
\item additional conversions implemented for \verb+msaConvert()+ function |
894 | 897 |
\item added a new method \verb+msaConsensusSequence()+ that extends the |
... | ... |
@@ -925,7 +928,7 @@ bibliography below). |
925 | 928 |
\item[Version 1.3.7:] \mbox{ } \begin{itemize} |
926 | 929 |
\item fixes in \verb+msaPrettyPrint()+ function |
927 | 930 |
\end{itemize} |
928 |
-\item[Version 1.3.6:] \mbox{ } \begin{itemize} |
|
931 |
+\item[Version 1.3.6:] \mbox) { } \begin{itemize} |
|
929 | 932 |
\item \verb+msaPrettyPrint()+ now also accepts dashes in file names |
930 | 933 |
\item added section about pretty-printing wide alignments to package vignette |
931 | 934 |
\end{itemize} |
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/msa@127901 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -382,8 +382,8 @@ both as setter and getter functions. To set row or column masks, an |
382 | 382 |
\verb+IRanges+ object must be supplied: |
383 | 383 |
<<maskExample>>= |
384 | 384 |
myMaskedAlignment <- myFirstAlignment |
385 |
-rowM <- IRanges(start=1, end=2) |
|
386 |
-rowmask(myMaskedAlignment) <- rowM |
|
385 |
+colM <- IRanges(start=1, end=100) |
|
386 |
+colmask(myMaskedAlignment) <- colM |
|
387 | 387 |
myMaskedAlignment |
388 | 388 |
@ |
389 | 389 |
|
... | ... |
@@ -400,16 +400,26 @@ conMat <- consensusMatrix(myFirstAlignment) |
400 | 400 |
dim(conMat) |
401 | 401 |
conMat[, 101:110] |
402 | 402 |
@ |
403 |
-Note that \verb+consensusMatrix()+ cannot handle |
|
404 |
-alignments with active masks. So, the masks in multiple alignment objects must |
|
405 |
-must be removed prior to the computation of the consensus matrix: |
|
403 |
+If called on a masked alignment, \verb+consensusMatrix()+ only uses those |
|
404 |
+sequences/rows that are not masked. If there are masked columns, |
|
405 |
+the matrix contains \verb+NA+'s in those columns: |
|
406 | 406 |
<<consensusExample2>>= |
407 |
-conMat <- consensusMatrix(unmasked(myMaskedAlignment)) |
|
407 |
+conMat <- consensusMatrix(myMaskedAlignment) |
|
408 |
+conMat[, 95:104] |
|
408 | 409 |
@ |
409 | 410 |
|
410 |
-Consensus strings can be computed from consensus matrices: |
|
411 |
+Multiple alignments also inherit the \verb+consensusString()+ method from |
|
412 |
+the \verb+Biostrings+ package. However, for more flexibility and consistency, |
|
413 |
+we rather advise users to use the \verb+msaConsensusSequence()+ |
|
414 |
+method (see below). |
|
415 |
+ |
|
416 |
+\subsection{Consensus Sequences and Conservation Scores} |
|
417 |
+ |
|
418 |
+With version 1.7.1 of \MSA, new methods have been provided that allow for |
|
419 |
+the computation of consensus sequences and conservation scores. |
|
420 |
+By default, the \verb+msaConsensusSequence()+ method is a wrapper around the |
|
421 |
+\verb+consensusString()+ method from the \verb+Biostrings+ package: |
|
411 | 422 |
<<consensusExample3>>= |
412 |
-## auxiliary function for splitting a string into displayable portions |
|
413 | 423 |
printSplitString <- function(x, width=getOption("width") - 1) |
414 | 424 |
{ |
415 | 425 |
starts <- seq(from=1, to=nchar(x), by=width) |
... | ... |
@@ -418,20 +428,55 @@ printSplitString <- function(x, width=getOption("width") - 1) |
418 | 428 |
cat(substr(x, starts[i], starts[i] + width - 1), "\n") |
419 | 429 |
} |
420 | 430 |
|
421 |
-printSplitString(consensusString(conMat)) |
|
431 |
+printSplitString(msaConsensusSequence(myFirstAlignment)) |
|
422 | 432 |
@ |
423 |
-\noindent Consensus sequences can also be computed directly without computing |
|
424 |
-intermediate consensus matrices. However, the \verb+consensusString()+ |
|
425 |
-function cannot handle the |
|
426 |
-masks contained in the multiple alignment objects (no matter whether |
|
427 |
-there are active masks or not). Therefore, it is necessary to remove |
|
428 |
-the masks beforehand: |
|
433 |
+However, there is also a second method for computing consensus sequences that |
|
434 |
+has been implemented in line with a consensus sequence method implemented |
|
435 |
+in \TeXshade\ that allows for specifying an upper and a lower conservation threshold |
|
436 |
+(see example below). This method can be accessed via the argument |
|
437 |
+\verb+type="upperlower"+. Additional customizations are available, too: |
|
429 | 438 |
<<consensusExample4>>= |
430 |
-printSplitString(consensusString(unmasked(myFirstAlignment))) |
|
431 |
-printSplitString(consensusString(unmasked(myMaskedAlignment))) |
|
439 |
+printSplitString(msaConsensusSequence(myFirstAlignment, type="upperlower", |
|
440 |
+ thresh=c(40, 20))) |
|
441 |
+@ |
|
442 |
+ |
|
443 |
+Regardless of which method is used, masks are taken into account: masked |
|
444 |
+rows/sequences are neglected and masked columns are shown as ``\verb+#+'' in |
|
445 |
+the consensus sequence: |
|
446 |
+<<consensusExample5>>= |
|
447 |
+printSplitString(msaConsensusSequence(myMaskedAlignment, type="upperlower", |
|
448 |
+ thresh=c(40, 20))) |
|
449 |
+@ |
|
450 |
+ |
|
451 |
+The main purpose of consensus sequences is to get an impression of conservation |
|
452 |
+at individual positions/columns of a multiple alignment. The \MSA\ package also |
|
453 |
+provides another means of analyzing conservation: the method |
|
454 |
+\verb+msaConservationScore()+ computes sums of pairwise scores for a given |
|
455 |
+substitution/scoring matrix. Thereby, conservation can also be analyzed in a |
|
456 |
+more sensible way than by only taking relative frequencies of letters into |
|
457 |
+account as \verb+msaConsensusSequence()+ does. |
|
458 |
+<<conservationExample1>>= |
|
459 |
+data(BLOSUM62) |
|
460 |
+msaConservationScore(myFirstAlignment, BLOSUM62) |
|
461 |
+@ |
|
462 |
+As the above example shows, a substitution matrix must be provided. The result |
|
463 |
+is a vector with one entry per alignment column. The entries of the |
|
464 |
+vector are labeled by the consensus sequence. The way the consensus sequence is |
|
465 |
+computed can be customized: |
|
466 |
+<<conservationExample2>>= |
|
467 |
+msaConservationScore(myFirstAlignment, BLOSUM62, gapVsGap=0, |
|
468 |
+ type="upperlower", thresh=c(40, 20)) |
|
469 |
+@ |
|
470 |
+The additional argument \verb+gapVsGap+ allows for controlling how pairs of |
|
471 |
+gaps are taken into account when computing pairwise scores (see |
|
472 |
+\verb+?msaConservationScore+ for more details). |
|
473 |
+ |
|
474 |
+Conservation scores can also be computed from masked alignments. For masked |
|
475 |
+columns, \verb+NA+'s are returned: |
|
476 |
+<<conservationExample3>>= |
|
477 |
+msaConservationScore(myMaskedAlignment, BLOSUM62, gapVsGap=0, |
|
478 |
+ type="upperlower", thresh=c(40, 20)) |
|
432 | 479 |
@ |
433 |
-\noindent Actually, the \verb+print()+ method (see Section~\ref{sec:msaPrint} above) |
|
434 |
-uses this function to compute the consensus sequence. |
|
435 | 480 |
|
436 | 481 |
\subsection{Interfacing to Other Packages} |
437 | 482 |
|
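The hunk above describes msaConservationScore() as computing sums of pairwise scores for a given substitution matrix and labeling the result by a consensus sequence. The following base R sketch illustrates that idea on a toy alignment. It is not the package's implementation: the scoring matrix values, the helper names sumOfPairs() and upperLower(), the choice to count each unordered pair once, and the exact threshold rule (uppercase at or above the upper threshold, lowercase at or above the lower one, a dot otherwise) are assumptions made for illustration only.

```r
## Toy alignment: three sequences, four columns ("-" denotes a gap).
aln <- c("ACG-", "ACGA", "AG-A")
cols <- do.call(rbind, strsplit(aln, ""))   # 3 x 4 character matrix

## Hypothetical scoring matrix (NOT BLOSUM62):
## match 2, mismatch -1, gap versus gap 0.
alphabet <- c("A", "C", "G", "-")
S <- matrix(-1, nrow=4, ncol=4, dimnames=list(alphabet, alphabet))
diag(S) <- 2
S["-", "-"] <- 0

## Sum of pairwise scores for one column (assumption: each unordered
## pair of sequences is counted exactly once).
sumOfPairs <- function(column, S)
{
    pairs <- combn(column, 2)               # 2 x choose(n, 2) matrix
    sum(S[cbind(pairs[1, ], pairs[2, ])])   # look up scores by dimnames
}

## Consensus letter for one column with an upper and a lower threshold
## (in percent); gaps are treated like any other letter here.
upperLower <- function(column, thresh=c(80, 50))
{
    freq <- 100 * table(column) / length(column)
    top <- names(freq)[which.max(freq)]

    if (max(freq) >= thresh[1])
        toupper(top)
    else if (max(freq) >= thresh[2])
        tolower(top)
    else
        "."
}

## One score per alignment column, labeled by the consensus letters.
scores <- apply(cols, 2, sumOfPairs, S=S)
names(scores) <- apply(cols, 2, upperLower)
scores
```

Changing S["-", "-"] in this sketch plays a role comparable to the gapVsGap argument mentioned in the hunk; the package's actual handling of gaps and masks is documented in ?msaConservationScore.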
... | ... |
@@ -795,6 +840,16 @@ source package tarball, untar it, comment/uncomment the corresponding line in |
795 | 840 |
\verb+msa/src/ClustalOmega/msaMakefile+ (see first six lines), and |
796 | 841 |
build/install the package from source. |
797 | 842 |
|
843 |
+\subsubsection*{Build/installation issues} |
|
844 |
+ |
|
845 |
+Some users have reported compiler and linker errors when building \MSA\ from |
|
846 |
+source on Linux systems. In almost all cases, these could be traced back |
|
847 |
+to issues with the \R\ setup on those systems (e.g.\ a \verb+Rprofile.site+ file |
|
848 |
+that makes changes to the \R\ environment that are not compatible with \MSA's |
|
849 |
+Makefiles).\footnote{See, e.g., \url{https://support.bioconductor.org/p/90735/}} |
|
850 |
+In most cases, these issues can be avoided by installing \MSA\ in a ``vanilla \R\ session'', |
|
851 |
+i.e.\ starting \R\ with option \verb+--vanilla+ when installing \MSA. |
|
852 |
+ |
|
798 | 853 |
\section{Future Extensions}\label{sec:future} |
799 | 854 |
|
800 | 855 |
We envision the following changes/extensions in future versions of the package: |
... | ... |
@@ -834,6 +889,21 @@ bibliography below). |
834 | 889 |
\section{Change Log} |
835 | 890 |
|
836 | 891 |
\begin{description} |
892 |
+\item[Version 1.7.1:] \mbox{ } \begin{itemize} |
|
893 |
+ \item additional conversions implemented for \verb+msaConvert()+ function |
|
894 |
+ \item added a new method \verb+msaConsensusSequence()+ that extends the |
|
895 |
+ functionality provided by \verb+Biostrings+' \verb+consensusString()+ method |
|
896 |
+ \item added a new method \verb+msaConservationScore()+ |
|
897 |
+ \item \verb+print()+ method extended such that it now also allows for |
|
898 |
+ customization of the consensus sequence (via the new |
|
899 |
+ \verb+msaConsensusSequence()+ method) |
|
900 |
+ \item package now depends on \verb+Biostrings+ version $\geq$2.40.0 in order |
|
901 |
+ to make sure that \verb+consensusMatrix()+ also works correctly |
|
902 |
+ for masked alignments |
|
903 |
+ \item corresponding changes in documentation and vignette |
|
904 |
+ \end{itemize} |
|
905 |
+\item[Version 1.7.0:] new branch for Bioconductor 3.5 devel |
|
906 |
+\item[Version 1.6.0:] release as part of Bioconductor 3.4 |
|
837 | 907 |
\item[Version 1.5.5:] \mbox{ } \begin{itemize} |
838 | 908 |
\item fixes in ClustalOmega source code to ensure Windows compatibility of |
839 | 909 |
GCC6 compatibility fix |
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/msa@119986 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -834,6 +834,10 @@ bibliography below). |
834 | 834 |
\section{Change Log} |
835 | 835 |
|
836 | 836 |
\begin{description} |
837 |
+\item[Version 1.5.5:] \mbox{ } \begin{itemize} |
|
838 |
+ \item fixes in ClustalOmega source code to ensure Windows compatibility of |
|
839 |
+ GCC6 compatibility fix |
|
840 |
+ \end{itemize} |
|
837 | 841 |
\item[Version 1.5.4:] \mbox{ } \begin{itemize} |
838 | 842 |
\item bug fix in \verb+msaClustalW()+: unsupported parameter `\verb+tree+' deactivated |
839 | 843 |
\item fixes in ClustalOmega source code to ensure GCC 6 compatibility |
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/msa@119484 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -834,6 +834,12 @@ bibliography below). |
834 | 834 |
\section{Change Log} |
835 | 835 |
|
836 | 836 |
\begin{description} |
837 |
+\item[Version 1.5.4:] \mbox{ } \begin{itemize} |
|
838 |
+ \item bug fix in \verb+msaClustalW()+: unsupported parameter `\verb+tree+' deactivated |
|
839 |
+ \item fixes in ClustalOmega source code to ensure GCC 6 compatibility |
|
840 |
+ \item fix in \verb+msaConvert()+ function to improve safety of call to suggested |
|
841 |
+ package \verb+phangorn+ |
|
842 |
+ \end{itemize} |
|
837 | 843 |
\item[Version 1.5.3:] \mbox{ } \begin{itemize} |
838 | 844 |
\item additional conversions implemented for \verb+msaConvert()+ function |
839 | 845 |
\item corresponding changes in documentation |
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/msa@118902 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -436,14 +436,16 @@ uses this function to compute the consensus sequence. |
436 | 436 |
\subsection{Interfacing to Other Packages} |
437 | 437 |
|
438 | 438 |
There are also other sequence analysis packages that make use of multiple |
439 |
-sequence alignments. The \msa\ package does not directly interface to any of these packages |
|
439 |
+sequence alignments. The \msa\ package does not directly interface to these packages |
|
440 | 440 |
in order to avoid dependencies and possible incompatibilities. However, \msa\ provides |
441 | 441 |
a function \verb+msaConvert()+ that allows for converting multiple sequence alignment |
442 |
-objects to other types/classes. Currently, two such conversions are available, namely to |
|
443 |
-objects of class \verb+alignment+ (as defined and used by the \verb+seqinr+ package) and |
|
444 |
-to objects of class \verb+align+ (as defined and used by the \verb+bios2mds+ package). |
|
445 |
-Note that the conversion is performed without loading or depending on the respective |
|
446 |
-packages. |
|
442 |
+objects to other types/classes. Currently, five such conversions are available, namely to |
|
443 |
+the classes \verb+alignment+ (\verb+seqinr+ package \cite{CharifLobry2007}), |
|
444 |
+\verb+align+ (\verb+bios2mds+ package \cite{PeleBecuAbdiChabbert2012}), |
|
445 |
+\verb+AAbin+/\verb+DNAbin+ (\verb+ape+ package \cite{ParadisClaudeStrimmer2004}), |
|
446 |
+and \verb+phyDat+ (\verb+phangorn+ package \cite{Schliep2011}). Except for the |
|
447 |
+conversion to the class \verb+phyDat+, these conversions are performed without loading |
|
448 |
+or depending on the respective packages. |
|
447 | 449 |
|
448 | 450 |
In the following example, we perform a multiple alignment of Hemoglobin alpha |
449 | 451 |
example sequences and convert the result for later processing with the \verb+seqinr+ |
... | ... |
@@ -461,14 +463,15 @@ the \verb+seqinr+ package: |
461 | 463 |
library(seqinr) |
462 | 464 |
|
463 | 465 |
d <- dist.alignment(hemoAln2, "identity") |
464 |
-as.matrix(d)[3:4, 3:4] |
|
466 |
+as.matrix(d)[2:5, "HBA1_Homo_sapiens", drop=FALSE] |
|
465 | 467 |
@ |
466 |
-Now we can construct a draft phylogenetic tree using the \verb+hclust()+ function from |
|
467 |
-the \verb+stats+ package: |
|
468 |
-<<HemoglobinTree,output.width='0.8\\textwidth',output.height='0.5\\textwidth'>>= |
|
469 |
-hemoTree <- hclust(d) |
|
470 |
-plot(hemoTree, main="Phylogenetic Tree of Hemoglobin Alpha Sequences", |
|
471 |
- xlab="", sub="") |
|
468 |
+Now we can construct a phylogenetic tree with the neighbor joining algorithm using the |
|
469 |
+\verb+nj()+ function from the \verb+ape+ package: |
|
470 |
+<<HemoglobinTree,output.width='0.8\\textwidth',output.height='0.5\\textwidth',message=FALSE,results='hide'>>= |
|
471 |
+library(ape) |
|
472 |
+ |
|
473 |
+hemoTree <- nj(d) |
|
474 |
+plot(hemoTree, main="Phylogenetic Tree of Hemoglobin Alpha Sequences") |
|
472 | 475 |
@ |
473 | 476 |
|
474 | 477 |
The following example shows how to convert a multiple alignment object into an object of |
... | ... |
@@ -478,6 +481,20 @@ hemoAln3 <- msaConvert(hemoAln, type="bios2mds::align") |
478 | 481 |
str(hemoAln3) |
479 | 482 |
@ |
480 | 483 |
|
484 |
+The conversions to the standard \verb+Biostrings+ classes are straightforward using |
|
485 |
+standard \verb+as()+ methods and are therefore not provided by the \verb+msaConvert()+ function. |
|
486 |
+The following example converts a multiple alignment object to class \verb+BStringSet+ |
|
487 |
+(e.g.\ the \verb+msaplot()+ function from the \verb+ggtree+ package |
|
488 |
+\cite{YuSmithZhuGuanLam2016} accepts \verb+BStringSet+ objects): |
|
489 |
+<<Hemoglobin4>>= |
|
490 |
+hemoAln4 <- as(hemoAln, "BStringSet") |
|
491 |
+hemoAln4 |
|
492 |
+@ |
|
493 |
+ |
|
494 |
+\notebox{The \texttt{msaConvert()} function has been introduced in version 1.3.3 of the |
|
495 |
+ \MSA\ package. So, to have this function available, at least Bioconductor 3.3 |
|
496 |
+ is required, which requires at least R 3.3.0.} |
|
497 |
+ |
|
481 | 498 |
\section{Pretty-Printing Multiple Sequence Alignments}\label{sec:msaPrettyPrint} |
482 | 499 |
|
483 | 500 |
As already mentioned above, the \MSA\ package offers the function |
... | ... |
@@ -817,6 +834,13 @@ bibliography below). |
817 | 834 |
\section{Change Log} |
818 | 835 |
|
819 | 836 |
\begin{description} |
837 |
+\item[Version 1.5.3:] \mbox{ } \begin{itemize} |
|
838 |
+ \item additional conversions implemented for \verb+msaConvert()+ function |
|
839 |
+ \item corresponding changes in documentation |
|
840 |
+ \end{itemize} |
|
841 |
+\item[Versions 1.5.1 and 1.5.2:] version number bumps for technical reasons |
|
842 |
+ related to Bioconductor build servers |
|
843 |
+\item[Version 1.5.0:] new branch for Bioconductor 3.4 devel |
|
820 | 844 |
\item[Version 1.4.0:] release as part of Bioconductor 3.3 |
821 | 845 |
\item[Version 1.3.7:] \mbox{ } \begin{itemize} |
822 | 846 |
\item fixes in \verb+msaPrettyPrint()+ function |
... | ... |
@@ -847,7 +871,7 @@ bibliography below). |
847 | 871 |
\item[Version 1.3.1:] \mbox{ } \begin{itemize} |
848 | 872 |
\item fixes in Makefiles and Makevars files to account for changes in build system |
849 | 873 |
\end{itemize} |
850 |
-\item[Version 1.3.0:] devel branch created from version 1.2.0 |
|
874 |
+\item[Version 1.3.0:] new branch for Bioconductor 3.3 devel |
|
851 | 875 |
\item[Version 1.2.0:] release as part of Bioconductor 3.2 |
852 | 876 |
\item[Version 1.1.3:] \mbox{ } \begin{itemize} |
853 | 877 |
\item bug fix related to custom substitution matrices |
... | ... |
@@ -872,84 +896,11 @@ bibliography below). |
872 | 896 |
\item[Version 1.1.1:] \mbox{ } \begin{itemize} |
873 | 897 |
\item fix of \verb+msa()+ function |
874 | 898 |
\end{itemize} |
875 |
-\item[Version 1.1.0:] devel branch created from version 1.0.0 |
|
899 |
+\item[Version 1.1.0:] new branch for Bioconductor 3.2 devel |
|
876 | 900 |
\item[Version 1.0.0:] first official release as part of Bioconductor 3.1 |
877 | 901 |
\end{description} |
878 | 902 |
|
879 |
-%\bibliographystyle{plain} |
|
880 |
-%\bibliography{lit} |
|
881 |
- |
|
882 |
-\begin{thebibliography}{10} |
|
883 |
- |
|
884 |
-\bibitem{Beitz2000} |
|
885 |
-E.~Beitz. |
|
886 |
-\newblock {\TeX shade}: shading and labeling of multiple sequence alignments |
|
887 |
- using {\LaTeX2e}. |
|
888 |
-\newblock {\em Bioinformatics}, 16(2):135--139, 2000. |
|
889 |
- |
|
890 |
-\bibitem{Edgar2004b} |
|
891 |
-R.~C. Edgar. |
|
892 |
-\newblock {MUSCLE}: a multiple sequence alignment method with reduced time and |
|
893 |
- space complexity. |
|
894 |
-\newblock {\em BMC Bioinformatics}, 5(5):113, 2004. |
|
895 |
- |
|
896 |
-\bibitem{Edgar2004a} |
|
897 |
-R.~C. Edgar. |
|
898 |
-\newblock {MUSCLE:} multiple sequence alignment with high accuracy and high |
|
899 |
- throughput. |
|
900 |
-\newblock {\em Nucleic Acids Res.}, 32(5):1792--1797, 2004. |
|
901 |
- |
|
902 |
-\bibitem{Lamport1999} |
|
903 |
-L.~Lamport. |
|
904 |
-\newblock {\em {\LaTeX} --- A Document Preparation System. User's Guide and |
|
905 |
- Reference Manual}. |
|
906 |
-\newblock Addison-Wesley Longman, Amsterdam, 1999. |
|
907 |
- |
|
908 |
-\bibitem{Leisch2002} |
|
909 |
-F.~Leisch. |
|
910 |
-\newblock Sweave: dynamic generation of statistical reports using literate data |
|
911 |
- analysis. |
|
912 |
-\newblock In W.~H\"ardle and B.~R\"onz, editors, {\em Compstat 2002 --- |
|
913 |
- Proceedings in Computational Statistics}, pages 575--580, Heidelberg, 2002. |
|
914 |
- Physica-Verlag. |
|
915 |
- |
|
916 |
-\bibitem{Morgenstern1999} |
|
917 |
-B.~Morgenstern. |
|
918 |
-\newblock {DIALIGN 2}: improvement of the segment-to-segment approach to |
|
919 |
- multiple sequence alignment. |
|
920 |
-\newblock {\em Bioinformatics}, 15(3):211--218, 1999. |
|
921 |
- |
|
922 |
-\bibitem{Nethercote2007} |
|
923 |
-N.~Nethercote and J.~Seward. |
|
924 |
-\newblock Valgrind: A framework for heavyweight dynamic binary instrumentation. |
|
925 |
-\newblock In {\em Proc. of the ACM SIGPLAN 2007 Conf. on Programming Language |
|
926 |
- Design and Implementation}, San Diego, CA, 2007. |
|
927 |
- |
|
928 |
-\bibitem{Notredame2000} |
|
929 |
-C.~Notredame, D.~G. Higgins, and J.~Heringa. |
|
930 |
-\newblock {T-Coffee}: A novel method for fast and accurate multiple sequence |
|
931 |
- alignment. |
|
932 |
-\newblock {\em J. Mol. Biol.}, 302(1):205--217, 2000. |
|
933 |
- |
|
934 |
-\bibitem{Sievers2011} |
|
935 |
-F.~Sievers, A.~Wilm, D.~Dineen, T.~J. Gibson, K.~Karplus, W.~Li, R.~Lopez, |
|
936 |
- H.~McWilliam, M.~Remmert, J.~S\"oding, J.~D. Thompson, and D.~G. Higgins. |
|
937 |
-\newblock Fast, scalable generation of high-quality protein multiple sequence |
|
938 |
- alignments using {Clustal Omega}. |
|
939 |
-\newblock {\em Mol. Syst. Biol.}, 7:539, 2011. |
|
940 |
- |
|
941 |
-\bibitem{Thompson1994} |
|
942 |
-J.~D. Thompson, D.~G. Higgins, and T.~J. Gibson. |
|
943 |
-\newblock {CLUSTAL W}: improving the sensitivity of progressive multiple |
|
944 |
- sequence alignment through sequence weighting, position-specific gap |
|
945 |
- penalties and weight matrix choice. |
|
946 |
-\newblock {\em Nucleic Acids Res.}, 22(22):4673--4680, 1994. |
|
947 |
- |
|
948 |
-\bibitem{Xie2014} |
|
949 |
-Y.~Xie. |
|
950 |
-\newblock {\em Dynamic Documents with R and knitr}. |
|
951 |
-\newblock Chapman \&\ Hall/CRC, 2014. |
|
952 |
- |
|
953 |
-\end{thebibliography} |
|
903 |
+\bibliographystyle{plain} |
|
904 |
+\bibliography{lit} |
|
954 | 905 |
|
955 | 906 |
\end{document} |
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/msa@116969 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -817,6 +817,10 @@ bibliography below). |
817 | 817 |
\section{Change Log} |
818 | 818 |
|
819 | 819 |
\begin{description} |
820 |
+\item[Version 1.4.0:] release as part of Bioconductor 3.3 |
|
821 |
+\item[Version 1.3.7:] \mbox{ } \begin{itemize} |
|
822 |
+ \item fixes in \verb+msaPrettyPrint()+ function |
|
823 |
+ \end{itemize} |
|
820 | 824 |
\item[Version 1.3.6:] \mbox{ } \begin{itemize} |
821 | 825 |
\item \verb+msaPrettyPrint()+ now also accepts dashes in file names |
822 | 826 |
\item added section about pretty-printing wide alignments to package vignette |
... | ... |
@@ -843,9 +847,8 @@ bibliography below). |
843 | 847 |
\item[Version 1.3.1:] \mbox{ } \begin{itemize} |
844 | 848 |
\item fixes in Makefiles and Makevars files to account for changes in build system |
845 | 849 |
\end{itemize} |
846 |
-\item[Version 1.3.0:] \mbox{ } \begin{itemize} |
|
847 |
- \item new branch for Bioconductor 3.3 devel |
|
848 |
- \end{itemize} |
|
850 |
+\item[Version 1.3.0:] devel branch created from version 1.2.0 |
|
851 |
+\item[Version 1.2.0:] release as part of Bioconductor 3.2 |
|
849 | 852 |
\item[Version 1.1.3:] \mbox{ } \begin{itemize} |
850 | 853 |
\item bug fix related to custom substitution matrices |
851 | 854 |
in the MUSCLE interface |
... | ... |
@@ -866,8 +869,10 @@ bibliography below). |
866 | 869 |
\verb+msaPrettyPrint()+ |
867 | 870 |
\item updated citation information |
868 | 871 |
\end{itemize} |
869 |
-\item[Version 1.1.1:] fix of \verb+msa()+ function |
|
870 |
-\item[Version 1.1.0:] new branch for Bioconductor 3.2 devel |
|
872 |
+\item[Version 1.1.1:] \mbox{ } \begin{itemize} |
|
873 |
+ \item fix of \verb+msa()+ function |
|
874 |
+ \end{itemize} |
|
875 |
+\item[Version 1.1.0:] devel branch created from version 1.0.0 |
|
871 | 876 |
\item[Version 1.0.0:] first official release as part of Bioconductor 3.1 |
872 | 877 |
\end{description} |
873 | 878 |
|
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/msa@116456 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -684,6 +684,41 @@ is to check sequence names carefully and to avoid problematic sequence names fro |
684 | 684 |
Note, moreover, that too long sequence names will lead to less appealing outputs, |
685 | 685 |
so users are generally advised to consider sequence names carefully. |
686 | 686 |
|
687 |
+\subsection{Pretty-Printing Wide Alignments} |
|
688 |
+ |
|
689 |
+If the alignment to be printed with \verb+msaPrettyPrint()+ is wide |
|
690 |
+(thousands of columns or wider), \LaTeX\ may terminate prematurely because of |
|
691 |
+exceeded \TeX\ capacity. Unfortunately, this problem remains opaque to the |
|
692 |
+user, since \verb+texi2dvi()+ and \verb+texi2pdf()+ do not convey many details |
|
693 |
+about \LaTeX\ problems when typesetting a document. We recommend the following |
|
694 |
+if a user encounters problems with running \verb+msaPrettyPrint()+'s output |
|
695 |
+with \verb+texi2dvi()+ or \verb+texi2pdf()+: |
|
696 |
+\begin{enumerate} |
|
697 |
+\item Run \verb+pdflatex+ on the generated \verb+.tex+ file to see |
|
698 |
+ whether it is actually a problem with \TeX\ capacity. |
|
699 |
+\item If so, split the alignment into multiple chunks and run |
|
700 |
+ \verb+msaPrettyPrint()+ on each chunk separately. |
|
701 |
+\end{enumerate} |
|
702 |
+ |
|
703 |
+The following example |
|
704 |
+demonstrates this approach for a multiple alignment object `\verb+aln+': |
|
705 |
+<<SplitAlignmentIntoJunks,eval=FALSE>>= |
|
706 |
+chunkSize <- 300 ## how much fits on one page depends on the length of |
|
707 |
+ ## names and the number of sequences; |
|
708 |
+ ## change to what suits your needs |
|
709 |
+ |
|
710 |
+for (start in seq(1, ncol(aln), by=chunkSize)) |
|
711 |
+{ |
|
712 |
+ end <- min(start + chunkSize - 1, ncol(aln)) |
|
713 |
+ alnPart <- DNAMultipleAlignment(subseq(unmasked(aln), start, end)) |
|
714 |
+ |
|
715 |
+ msaPrettyPrint(x=alnPart, output="pdf", subset=NULL, |
|
716 |
+ file=paste0("aln_", start, "-", end, ".pdf")) |
|
717 |
+} |
|
718 |
+@ |
|
719 |
+\noindent This creates multiple PDF files, each of which shows one part of the alignment. |
|
720 |
+Please note, however, that the numbering of columns is restarted for each chunk. |
|
721 |
+ |
|
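Because the column numbering restarts in every chunk, it can help to keep a small table that maps each output file back to its original column range. The following is a minimal sketch in base R that only assumes the total number of alignment columns is known. The helper \verb+chunkRanges()+ is hypothetical and not part of the \verb+msa+ package; the file name pattern simply mirrors the loop above.

```r
## Hypothetical helper (not part of msa): compute the column ranges that
## result from splitting an alignment with nCols columns into chunks
chunkRanges <- function(nCols, chunkSize=300)
{
    stopifnot(nCols >= 1, chunkSize >= 1)

    start <- seq(1, nCols, by=chunkSize)
    end <- pmin(start + chunkSize - 1, nCols)  ## last chunk may be shorter

    ## file names follow the same pattern as in the loop above
    data.frame(start=start, end=end,
               file=paste0("aln_", start, "-", end, ".pdf"),
               stringsAsFactors=FALSE)
}

## example: an alignment with 1000 columns (an assumed value)
ranges <- chunkRanges(1000, chunkSize=300)
ranges
```

Column \verb+j+ in the file listed in row \verb+i+ then corresponds to column \verb|ranges$start[i] + j - 1| of the original alignment.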
687 | 722 |
\subsection{Further Caveats} |
688 | 723 |
|
689 | 724 |
\begin{itemize} |
... | ... |
@@ -782,6 +817,10 @@ bibliography below). |
782 | 817 |
\section{Change Log} |
783 | 818 |
|
784 | 819 |
\begin{description} |
820 |
+\item[Version 1.3.6:] \mbox{ } \begin{itemize} |
|
821 |
+ \item \verb+msaPrettyPrint()+ now also accepts dashes in file names |
|
822 |
+ \item added section about pretty-printing wide alignments to package vignette |
|
823 |
+ \end{itemize} |
|
785 | 824 |
\item[Version 1.3.5:] \mbox{ } \begin{itemize} |
786 | 825 |
\item adaptation of displaying help text by \verb+msa()+ function |
787 | 826 |
\end{itemize} |
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/msa@116078 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -782,6 +782,9 @@ bibliography below). |
782 | 782 |
\section{Change Log} |
783 | 783 |
|
784 | 784 |
\begin{description} |
785 |
+\item[Version 1.3.5:] \mbox{ } \begin{itemize} |
|
786 |
+ \item adaptation of displaying help text by \verb+msa()+ function |
|
787 |
+ \end{itemize} |
|
785 | 788 |
\item[Version 1.3.4:] \mbox{ } \begin{itemize} |
786 | 789 |
\item added function for checking and fixing sequence names for |
787 | 790 |
possibly problematic characters that could lead to \LaTeX\ errors |
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/msa@115881 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -11,7 +11,7 @@ |
11 | 11 |
\usepackage[OT1]{fontenc} |
12 | 12 |
|
13 | 13 |
\title{{\Huge msa}\\[5mm] An R Package for Multiple Sequence Alignment} |
14 |
-\author{Enrico Bonatesta, Christoph Horejs-Kainrath, and Ulrich Bodenhofer} |
|
14 |
+\author{Enrico Bonatesta, Christoph Horej\v{s}-Kainrath, and Ulrich Bodenhofer} |
|
15 | 15 |
\affiliation{Institute of Bioinformatics, Johannes Kepler University |
16 | 16 |
Linz\\Altenberger Str. 69, 4040 Linz, Austria\\ |
17 | 17 |
\email{msa@bioinf.jku.at}} |
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/msa@115880 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -666,10 +666,28 @@ the \shade\ package must be loaded in the preamble: |
666 | 666 |
\end{verbatim} |
667 | 667 |
\end{quote} |
668 | 668 |
|
669 |
+\subsection{Sequence Names} |
|
670 |
+ |
|
671 |
+The \verb+Biostrings+ package does not impose any restrictions on the names of |
|
672 |
+sequences. Consequently, \MSA\ also allows all possible ASCII strings as |
|
673 |
+sequence (row) names in multiple alignments. As soon as \verb+msaPrettyPrint()+ |
|
674 |
+is used for pretty-printing multiple sequence alignments, however, the sequence |
|
675 |
+names are interpreted as plain \LaTeX\ source code. Consequently, \LaTeX\ errors |
|
676 |
+may arise because of characters or words in the sequence names that \LaTeX\ |
|
677 |
+does not or cannot interpret as plain text correctly. This particularly includes |
|
678 |
+appearances of special characters and backslash characters in the sequence names. |
|
679 |
+ |
|
680 |
+The \MSA\ package offers a function \verb+msaCheckNames()+ which allows for finding |
|
681 |
+and replacing potentially problematic characters in the sequence names of |
|
682 |
+multiple alignment objects (see \verb+?msaCheckNames+). However, the best solution |
|
683 |
+is to check sequence names carefully and to avoid problematic sequence names from the beginning. |
|
684 |
+Note, moreover, that overly long sequence names will lead to less appealing outputs, |
|
685 |
+so users are generally advised to consider sequence names carefully. |
|
686 |
+ |
|
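As a rough illustration of what checking sequence names can look like, the following base R sketch flags names that contain characters with a special meaning in \LaTeX. The helper \verb+hasLatexSpecials()+, the chosen character set, and the example names are assumptions made for this sketch; this is not the implementation of \verb+msaCheckNames()+, whose actual behavior is documented in \verb+?msaCheckNames+.

```r
## Hypothetical helper (not part of msa): flag sequence names containing
## characters that have a special meaning in LaTeX; the character set
## below is an assumption and may be incomplete
latexSpecials <- "[\\\\#$%&_{}^~]"

hasLatexSpecials <- function(seqNames)
{
    grepl(latexSpecials, seqNames, perl=TRUE)
}

## made-up example names
seqNames <- c("HBA_HUMAN", "seq1", "sp|P69905", "50%id")
hasLatexSpecials(seqNames)

## one possible fix: replace every offending character by a dot
fixedNames <- gsub(latexSpecials, ".", seqNames, perl=TRUE)
fixedNames
```

Replacing characters can make names ambiguous, so it is worth checking the result for duplicates (for example with \verb+anyDuplicated()+) before printing.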
669 | 687 |
\subsection{Further Caveats} |
670 | 688 |
|
671 | 689 |
\begin{itemize} |
672 |
- \item Note that \verb+texi2dvi()+ and \verb+ttexi2pdf()+ always |
|
690 |
+ \item Note that \verb+texi2dvi()+ and \verb+texi2pdf()+ always |
|
673 | 691 |
save the resulting DVI/PDF files to the current working directory, |
674 | 692 |
even if the \LaTeX\ source file is in a different directory. |
675 | 693 |
That is also the reason why the temporary file is created in the |
... | ... |
@@ -764,6 +782,13 @@ bibliography below). |
764 | 782 |
\section{Change Log} |
765 | 783 |
|
766 | 784 |
\begin{description} |
785 |
+\item[Version 1.3.4:] \mbox{ } \begin{itemize} |
|
786 |
+ \item added function for checking and fixing sequence names for |
|
787 |
+ possibly problematic characters that could lead to \LaTeX\ errors |
|
788 |
+ when using \verb+msaPrettyPrint()+ |
|
789 |
+ \item corresponding changes in documentation |
|
790 |
+ \item minor namespace fix |
|
791 |
+ \end{itemize} |
|
767 | 792 |
\item[Version 1.3.3:] \mbox{ } \begin{itemize} |
768 | 793 |
\item added function for converting multiple sequence alignments for |
769 | 794 |
use with other sequence alignment packages |
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/msa@114012 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -39,6 +39,7 @@ Linz\\Altenberger Str. 69, 4040 Linz, Austria\\ |
39 | 39 |
options(width=65) |
40 | 40 |
set.seed(0) |
41 | 41 |
library(msa) |
42 |
+library(seqinr) |
|
42 | 43 |
msaVersion <- packageDescription("msa")$Version |
43 | 44 |
msaDateRaw <- packageDescription("msa")$Date |
44 | 45 |
msaDateYear <- as.numeric(substr(msaDateRaw, 1, 4)) |
... | ... |
@@ -366,6 +367,8 @@ print(myFirstAlignment, showNames=FALSE, show="complete") |
366 | 367 |
|
367 | 368 |
\section{Processing Multiple Alignments}\label{sec:msaProc} |
368 | 369 |
|
370 |
+\subsection{Methods Inherited From {\tt Biostrings}} |
|
371 |
+ |
|
369 | 372 |
The classes defined by the \MSA\ package for storing multiple alignment results |
370 | 373 |
have been derived from the corresponding classes defined by the |
371 | 374 |
\verb+Biostrings+ package. Therefore, all methods for processing |
... | ... |
@@ -430,6 +433,51 @@ printSplitString(consensusString(unmasked(myMaskedAlignment))) |
430 | 433 |
\noindent Actually, the \verb+print()+ method (see Section~\ref{sec:msaPrint} above) |
431 | 434 |
uses this function to compute the consensus sequence. |
432 | 435 |
|
436 |
+\subsection{Interfacing to Other Packages} |
|
437 |
+ |
|
438 |
+There are also other sequence analysis packages that make use of multiple |
|
439 |
+sequence alignments. The \msa\ package does not directly interface to any of these packages |
|
440 |
+in order to avoid dependencies and possible incompatibilities. However, \msa\ provides |
|
441 |
+a function \verb+msaConvert()+ that allows for converting multiple sequence alignment |
|
442 |
+objects to other types/classes. Currently, two such conversions are available, namely to |
|
443 |
+objects of class \verb+alignment+ (as defined and used by the \verb+seqinr+ package) and |
|
444 |
+to objects of class \verb+align+ (as defined and used by the \verb+bios2mds+ package). |
|
445 |
+Note that the conversion is performed without loading or depending on the respective |
|
446 |
+packages. |
|
447 |
+ |
|
448 |
+In the following example, we perform a multiple alignment of Hemoglobin alpha |
|
449 |
+example sequences and convert the result for later processing with the \verb+seqinr+ |
|
450 |
+package: |
|
451 |
+<<Hemoglobin1>>= |
|
452 |
+hemoSeq <- readAAStringSet(system.file("examples/HemoglobinAA.fasta", |
|
453 |
+ package="msa")) |
|
454 |
+hemoAln <- msa(hemoSeq) |
|
455 |
+hemoAln |
|
456 |
+hemoAln2 <- msaConvert(hemoAln, type="seqinr::alignment") |
|
457 |
+@ |
|
458 |
+Now we compute a distance matrix using the \verb+dist.alignment()+ function from |
|
459 |
+the \verb+seqinr+ package: |
|
460 |
+<<Hemoglobin2>>= |
|
461 |
+library(seqinr) |
|
462 |
+ |
|
463 |
+d <- dist.alignment(hemoAln2, "identity") |
|
464 |
+as.matrix(d)[3:4, 3:4] |
|
465 |
+@ |
|
466 |
+Now we can construct a draft phylogenetic tree using the \verb+hclust()+ function from |
|
467 |
+the \verb+stats+ package: |
|
468 |
+<<HemoglobinTree,output.width='0.8\\textwidth',output.height='0.5\\textwidth'>>= |
|
469 |
+hemoTree <- hclust(d) |
|
470 |
+plot(hemoTree, main="Phylogenetic Tree of Hemoglobin Alpha Sequences", |
|
471 |
+ xlab="", sub="") |
|
472 |
+@ |
|
473 |
+ |
|
474 |
+The following example shows how to convert a multiple alignment object into an object of |
|
475 |
+class \verb+align+ as defined by the \verb+bios2mds+ package: |
|
476 |
+<<Hemoglobin3>>= |
|
477 |
+hemoAln3 <- msaConvert(hemoAln, type="bios2mds::align") |
|
478 |
+str(hemoAln3) |
|
479 |
+@ |
|
480 |
+ |
|
433 | 481 |
\section{Pretty-Printing Multiple Sequence Alignments}\label{sec:msaPrettyPrint} |
434 | 482 |
|
435 | 483 |
As already mentioned above, the \MSA\ package offers the function |
... | ... |
@@ -716,6 +764,11 @@ bibliography below). |
716 | 764 |
\section{Change Log} |
717 | 765 |
|
718 | 766 |
\begin{description} |
767 |
+\item[Version 1.3.3:] \mbox{ } \begin{itemize} |
|
768 |
+ \item added function for converting multiple sequence alignments for |
|
769 |
+ use with other sequence alignment packages |
|
770 |
+ \item corresponding changes in documentation |
|
771 |
+ \end{itemize} |
|
719 | 772 |
\item[Version 1.3.2:] \mbox{ } \begin{itemize} |
720 | 773 |
\item further fixes in Makefiles and Makevars files to account for changes in build system |
721 | 774 |
\item update of citation information |
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/msa@111693 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -699,8 +699,9 @@ We envision the following changes/extensions in future versions of the package: |
699 | 699 |
If you use this package for research that is published later, you are kindly |
700 | 700 |
asked to cite it as follows: |
701 | 701 |
\begin{quotation} |
702 |
-\noindent E.~Bonatesta, C.~Horejs-Kainrath, and U.~Bodenhofer, (2015). |
|
703 |
-msa: An R Package for Multiple Sequence Alignment. |
|
702 |
+\noindent U.~Bodenhofer, E.~Bonatesta, C.~Horej\v{s}-Kainrath, and S.~Hochreiter (2015). |
|
703 |
+msa: an R package for multiple sequence alignment. {\em Bioinformatics} {\bf 31}(24):3997--3999. |
|
704 |
+DOI: \href{http://dx.doi.org/10.1093/bioinformatics/btv494}{bioinformatics/btv494}. |
|
704 | 705 |
\end{quotation} |
705 | 706 |
To obtain Bib\TeX\ entries of the reference, enter the |
706 | 707 |
following into your R session: |
... | ... |
@@ -715,8 +716,15 @@ bibliography below). |
715 | 716 |
\section{Change Log} |
716 | 717 |
|
717 | 718 |
\begin{description} |
718 |
-\item[Version 1.2.0:] \mbox{ } \begin{itemize} |
|
719 |
- \item new branch for Bioconductor 3.2 release |
|
719 |
+\item[Version 1.3.2:] \mbox{ } \begin{itemize} |
|
720 |
+ \item further fixes in Makefiles and Makevars files to account for changes in build system |
|
721 |
+ \item update of citation information |
|
722 |
+ \end{itemize} |
|
723 |
+\item[Version 1.3.1:] \mbox{ } \begin{itemize} |
|
724 |
+ \item fixes in Makefiles and Makevars files to account for changes in build system |
|
725 |
+ \end{itemize} |
|
726 |
+\item[Version 1.3.0:] \mbox{ } \begin{itemize} |
|
727 |
+ \item new branch for Bioconductor 3.3 devel |
|
720 | 728 |
\end{itemize} |
721 | 729 |
\item[Version 1.1.3:] \mbox{ } \begin{itemize} |
722 | 730 |
\item bug fix related to custom substitution matrices |
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/msa@109581 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -209,7 +209,7 @@ The example in Section~\ref{sec:impatient} above simply called |
209 | 209 |
the function \verb+msa()+ without any additional arguments. |
210 | 210 |
We mentioned already that, in this case, ClustalW is called with default |
211 | 211 |
parameters. We can also explicitly request ClustalW or one of the two |
212 |
-other algorithms ClustalOmega or Muscle: |
|
212 |
+other algorithms ClustalOmega or MUSCLE: |
|
213 | 213 |
<<OtherAlgorithms,>>= |
214 | 214 |
myClustalWAlignment <- msa(mySequences, "ClustalW") |
215 | 215 |
myClustalWAlignment |
... | ... |
@@ -677,12 +677,6 @@ source package tarball, untar it, comment/uncomment the corresponding line in |
677 | 677 |
\verb+msa/src/ClustalOmega/msaMakefile+ (see first six lines), and |
678 | 678 |
build/install the package from source. |
679 | 679 |
|
680 |
-\subsubsection*{MUSCLE with Custom Substitution Matrices} |
|
681 |
- |
|
682 |
-We are aware the that our MUSCLE interface is rather picky in terms of the |
|
683 |
-format in which substitution matrices are passed to the \verb+msaMuscle()+ |
|
684 |
-function. This interface will be improved in future versions. |
|
685 |
- |
|
686 | 680 |
\section{Future Extensions}\label{sec:future} |
687 | 681 |
|
688 | 682 |
We envision the following changes/extensions in future versions of the package: |
... | ... |
@@ -721,10 +715,25 @@ bibliography below). |
721 | 715 |
\section{Change Log} |
722 | 716 |
|
723 | 717 |
\begin{description} |
724 |
-\item[Version 1.0.2:] \mbox{ } \begin{itemize} |
|
718 |
+\item[Version 1.2.0:] \mbox{ } \begin{itemize} |
|
719 |
+ \item new branch for Bioconductor 3.2 release |
|
720 |
+ \end{itemize} |
|
721 |
+\item[Version 1.1.3:] \mbox{ } \begin{itemize} |
|
722 |
+ \item bug fix related to custom substitution matrices |
|
723 |
+ in the MUSCLE interface |
|
724 |
+ \item corrections and updates of documentation |
|
725 |
+ \end{itemize} |
|
726 |
+\item[Version 1.1.2:] \mbox{ } \begin{itemize} |
|
725 | 727 |
\item new \verb+print()+ function for multiple alignments that also |
726 | 728 |
allows for displaying alignments in their entirety (plus additional |
727 | 729 |
customizations) |