FoldIndexR addition
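
The score this change adds can be sketched outside the package as follows. This is a minimal illustration, not the package code: `foldIndexScore` and `foldIndexCall` are hypothetical helper names, the inputs are assumed to be a window's mean scaled hydropathy and mean net charge per residue, and labelling a score of exactly 0 as "ordered" is an assumption, since the cutoff is only described for scores above or below 0.

```r
# Hypothetical helper (not part of idpr): FoldIndex score for one or more
# windows, given mean scaled hydropathy and mean net charge per residue.
foldIndexScore <- function(meanHydropathy, meanNetCharge) {
    2.785 * meanHydropathy - abs(meanNetCharge) - 1.151
}

# Hypothetical helper: label each window by the sign of its score.
# Assumption: a score of exactly 0 is labelled "ordered".
foldIndexCall <- function(score) {
    ifelse(score < 0, "disordered", "ordered")
}

# Two made-up windows: one hydrophilic and charged, one hydrophobic and neutral.
scores <- foldIndexScore(meanHydropathy = c(0.35, 0.55),
                         meanNetCharge = c(-0.10, 0.02))
calls <- foldIndexCall(scores)
print(data.frame(score = scores, call = calls))
```

The absolute value makes the score symmetric in charge, so strongly negative and strongly positive windows are penalized equally.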
... | ... |
@@ -1,7 +1,7 @@ |
1 | 1 |
Package: idpr |
2 | 2 |
Type: Package |
3 | 3 |
Title: Profiling and Analyzing Intrinsically Disordered Proteins in R |
4 |
-Version: 1.0.007 |
|
4 |
+Version: 1.6.1 |
|
5 | 5 |
Authors@R: c(person(c("William", "M."), "McFadden", |
6 | 6 |
email = "wmm27@pitt.edu", |
7 | 7 |
role = c("cre", "aut")), |
... | ... |
@@ -23,9 +23,9 @@ License: LGPL-3 |
23 | 23 |
Encoding: UTF-8 |
24 | 24 |
LazyData: true |
25 | 25 |
biocViews: StructuralPrediction, Proteomics, CellBiology |
26 |
-RoxygenNote: 7.1.1 |
|
26 |
+RoxygenNote: 7.1.2 |
|
27 | 27 |
Depends: |
28 |
- R (>= 4.0.0) |
|
28 |
+ R (>= 4.1.3) |
|
29 | 29 |
Imports: |
30 | 30 |
ggplot2 (>= 3.3.0), |
31 | 31 |
magrittr (>= 1.5), |
9 | 10 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,131 @@ |
1 |
+#' Prediction of Intrinsic Disorder with FoldIndex method in R |
|
2 |
+#' |
|
3 |
+#' This is used to calculate the prediction of intrinsic disorder based on |
|
4 |
+#' the scaled hydropathy and absolute net charge of an amino acid |
|
5 |
+#' sequence using a sliding window. Prilusky, Felder, et al. (2005) described |
|
6 |
+#' this relationship and implemented it graphically as FoldIndex, |
|
7 |
+#' and this tool has been implemented |
|
8 |
+#' into multiple disorder prediction programs. When windows have a negative |
|
9 |
+#' score (<0) sequences are predicted as disordered. |
|
10 |
+#' When windows have a positive score (>0) sequences are predicted as |
|
11 |
+#' ordered. Graphically, this cutoff is displayed by the dashed |
|
12 |
+#' line at y = 0. Calculations are at pH 7.0 based on the described method and |
|
13 |
+#' the default is a sliding window of size 51. |
|
14 |
+#' |
|
15 |
+#' The output is either a data frame or graph |
|
16 |
+#' showing the calculated scores for each window along the sequence. |
|
17 |
+#' The equation used was originally described in Uversky et al. (2000)\cr |
|
18 |
+#' \url{https://doi.org/10.1002/1097-0134(20001115)41:3<415::AID-PROT130>3.0.CO;2-7} |
|
19 |
+#' . \cr |
|
20 |
+#' |
|
21 |
+#' The FoldIndex method of using a sliding window and utilizing the Uversky |
|
22 |
+#' equation is described in Prilusky, J., Felder, C. E., et al. (2005). \cr |
|
23 |
+#' FoldIndex: a simple tool to predict whether a given protein sequence \cr |
|
24 |
+#' is intrinsically unfolded. Bioinformatics, 21(16), 3435-3438. \cr |
|
25 |
+#' |
|
26 |
+#' |
|
27 |
+#' @inheritParams sequenceCheck |
|
28 |
+#' @inheritParams chargeCalculationLocal |
|
29 |
+#' @param window a positive, odd integer. 51 by default. |
|
30 |
+#' Sets the size of sliding window, must be an odd number. |
|
31 |
+#' The window determines the number of residues to be analyzed and averaged |
|
32 |
+#' for each position along the sequence. |
|
33 |
+#' @param plotResults logical value, TRUE by default. |
|
34 |
+#' If \code{plotResults = TRUE} a plot will be the output. |
|
35 |
+#' If \code{plotResults = FALSE} the output is a data frame with scores for |
|
36 |
+#' each window analyzed. |
|
37 |
+#' @param proteinName character string with length = 1. |
|
38 |
+#' optional setting to replace the name of the plot if plotResults = TRUE. |
|
39 |
+#' @param ... any additional parameters, especially those for plotting. |
|
40 |
+#' @return a ggplot if plotResults = TRUE, otherwise a data frame of scores. |
|
41 |
+#' @family scaled hydropathy functions |
|
42 |
+#' @seealso \code{\link{KDNorm}} for residue hydropathy values. |
|
43 |
+#' See \code{\link{pKaData}} for residue pKa values and citations. See |
|
44 |
+#' \code{\link{hendersonHasselbalch}} for charge calculations. |
|
45 |
+#' @references Kyte, J., & Doolittle, R. F. (1982). A simple method for |
|
46 |
+#' displaying the hydropathic character of a protein. |
|
47 |
+#' Journal of molecular biology, 157(1), 105-132. |
|
48 |
+#' @export |
|
49 |
+#' @section Plot Colors: |
|
50 |
+#' For users who wish to keep a common aesthetic, the following colors are |
|
51 |
+#' used when plotResults = TRUE. \cr |
|
52 |
+#' \itemize{ |
|
53 |
+#' \item Dynamic line colors: \itemize{ |
|
54 |
+#' \item Close to -1 = "#9672E6" |
|
55 |
+#' \item Close to 1 = "#D1A63F" |
|
56 |
+#' \item Close to midpoint = "grey65" or "#A6A6A6"}} |
|
57 |
+#' |
|
58 |
+#' @references |
|
59 |
+#' Kozlowski, L. P. (2016). IPC – Isoelectric Point Calculator. Biology |
|
60 |
+#' Direct, 11(1), 55. \url{https://doi.org/10.1186/s13062-016-0159-9} \cr |
|
61 |
+#' Kyte, J., & Doolittle, R. F. (1982). A simple method for |
|
62 |
+#' displaying the hydropathic character of a protein. |
|
63 |
+#' Journal of molecular biology, 157(1), 105-132. \cr |
|
64 |
+#' Prilusky, J., Felder, C. E., et al. (2005). \cr |
|
65 |
+#' FoldIndex: a simple tool to predict whether a given protein sequence \cr |
|
66 |
+#' is intrinsically unfolded. Bioinformatics, 21(16), 3435-3438. \cr |
|
67 |
+#' Uversky, V. N., Gillespie, J. R., & Fink, A. L. (2000). |
|
68 |
+#' Why are “natively unfolded” proteins unstructured under physiologic |
|
69 |
+#' conditions?. Proteins: structure, function, and bioinformatics, 41(3), |
|
70 |
+#' 415-427. |
|
71 |
+#' \url{https://doi.org/10.1002/1097-0134(20001115)41:3<415::AID-PROT130>3.0.CO;2-7} |
|
72 |
+#' @examples |
|
73 |
+#' #Amino acid sequences can be character strings |
|
74 |
+#' aaString <- "ACDEFGHIKLMNPQRSTVWY" |
|
75 |
+#' #Amino acid sequences can also be character vectors |
|
76 |
+#' aaVector <- c("A", "C", "D", "E", "F", |
|
77 |
+#' "G", "H", "I", "K", "L", |
|
78 |
+#' "M", "N", "P", "Q", "R", |
|
79 |
+#' "S", "T", "V", "W", "Y") |
|
80 |
+#' #Alternatively, .fasta files can also be used by providing |
|
81 |
+#' ##The path to the file as a character string. |
|
82 |
+#' |
|
83 |
+#' |
|
84 |
+#' foldIndexR(aaVector) |
|
85 |
+#' |
|
86 |
+#' exampleDF <- |
|
87 |
+#' foldIndexR(aaString, |
|
88 |
+#' plotResults = FALSE) |
|
89 |
+#' head(exampleDF) |
|
90 |
+#' |
|
91 |
+ |
|
92 |
+foldIndexR <- function(sequence, |
|
93 |
+ window = 51, |
|
94 |
+ proteinName = NA, |
|
95 |
+ pKaSet = "IPC_protein", |
|
96 |
+ plotResults = TRUE) { |
|
97 |
+ |
|
98 |
+ chargeDF <- |
|
99 |
+ chargeCalculationLocal(sequence = sequence, window = window, |
|
100 |
+ pH = 7.0, pKaSet = pKaSet, |
|
101 |
+ plotResults = FALSE) |
|
102 |
+ chargeDF$scaledWindowCharge <- chargeDF$windowCharge / window |
|
103 |
+ hydropDF <- scaledHydropathyLocal(sequence = sequence, |
|
104 |
+ window = window, |
|
105 |
+ plotResults = FALSE) |
|
106 |
+ mergeDF <- merge(hydropDF, chargeDF) |
|
107 |
+ |
|
108 |
+ mergeDF$foldIndex <- |
|
109 |
+ mergeDF$WindowHydropathy * 2.785 - |
|
110 |
+ abs(mergeDF$scaledWindowCharge) - 1.151 |
|
111 |
+ |
|
112 |
+ if (plotResults) { |
|
113 |
+ plotTitle <- "FoldIndex Prediction of Intrinsic Disorder" |
|
114 |
+ if (!is.na(proteinName)) { |
|
115 |
+ plotTitle <- |
|
116 |
+ paste0("FoldIndex Prediction of Intrinsic Disorder in ", |
|
117 |
+ proteinName) |
|
118 |
+ } |
|
119 |
+ |
|
120 |
+ gg <- sequencePlot(position = mergeDF$Position, |
|
121 |
+ property = mergeDF$foldIndex, |
|
122 |
+ hline = 0, dynamicColor = mergeDF$foldIndex, |
|
123 |
+ customColors = c("#9672E6", "#D1A63F", "grey65"), |
|
124 |
+ customTitle = NA, propertyLimits = c(-1, 1)) |
|
125 |
+ gg <- gg + ggplot2::labs(title = plotTitle, y = "Score") |
|
126 |
+ return(gg) |
|
127 |
+ } else { |
|
128 |
+ return(mergeDF) |
|
129 |
+ } |
|
130 |
+ |
|
131 |
+} |
... | ... |
@@ -7,6 +7,7 @@ |
7 | 7 |
#' \code{\link{chargeCalculationLocal}}\cr |
8 | 8 |
#' \code{\link{scaledHydropathyLocal}}\cr |
9 | 9 |
#' \code{\link{structuralTendencyPlot}}\cr |
10 |
+#' \code{\link{foldIndexR}}\cr |
|
10 | 11 |
#' All of the above linked functions only require the sequence argument |
11 | 12 |
#' to output plots of characteristics associated with IDPs. The function also |
12 | 13 |
#' includes options for IUPred functions. The function does one of the |
... | ... |
@@ -24,7 +25,11 @@ |
24 | 25 |
#' @param uniprotAccession character string specifying the UniProt Accession of |
25 | 26 |
#' the protein of interest. Used to fetch predictions from IUPreds REST API. |
26 | 27 |
#' Default is NA. Keep as NA if you do not have a UniProt Accession. |
27 |
-#' |
|
28 |
+#' @param window a positive, odd integer. 51 by default. |
|
29 |
+#' Sets the size of sliding window, must be an odd number. |
|
30 |
+#' The window determines the number of residues to be analyzed and averaged |
|
31 |
+#' for each position along the sequence. 51 is default for |
|
32 |
+#' \code{\link{foldIndexR}}. |
|
28 | 33 |
#' @param proteinName character string, optional. |
29 | 34 |
#' Used to add protein name to the title in ggplot. |
30 | 35 |
#' @inheritParams chargeCalculationLocal |
... | ... |
@@ -66,6 +71,7 @@ |
66 | 71 |
#' \code{\link{chargeCalculationLocal}}\cr |
67 | 72 |
#' \code{\link{scaledHydropathyLocal}}\cr |
68 | 73 |
#' \code{\link{structuralTendencyPlot}}\cr |
74 |
+#' \code{\link{foldIndexR}}\cr |
|
69 | 75 |
#' \code{\link{iupred}}\cr |
70 | 76 |
#' \code{\link{iupredAnchor}}\cr |
71 | 77 |
#' \code{\link{iupredRedox}} |
... | ... |
@@ -112,6 +118,19 @@ |
112 | 118 |
#' Protein Science, 22(6), 693-724. |
113 | 119 |
#' doi:10.1002/pro.2261 } |
114 | 120 |
#' } |
121 |
+#' \item \code{\link{foldIndexR}} |
|
122 |
+#' \itemize{ |
|
123 |
+#' \item{Prilusky, J., Felder, C. E., et al. (2005). |
|
124 |
+#' FoldIndex: a simple tool to predict whether |
|
125 |
+#' a given protein sequence is intrinsically unfolded. |
|
126 |
+#' Bioinformatics, 21(16), 3435-3438.} |
|
127 |
+#' \item{Uversky, V. N., Gillespie, J. R., & Fink, A. L. (2000). |
|
128 |
+#' Why are “natively unfolded” proteins unstructured under |
|
129 |
+#' physiologic conditions?. Proteins: structure, function, |
|
130 |
+#' and bioinformatics, 41(3), 415-427. |
|
131 |
+#' https://doi.org/10.1002/1097-0134(20001115)41:3<415::AID-PROT130>3.0.CO;2-7} |
|
132 |
+#' \item{Also see citations for hydropathy and charge plots above} |
|
133 |
+#' } |
|
115 | 134 |
#' \item \code{\link{iupred}}, |
116 | 135 |
#' \code{\link{iupredAnchor}}, |
117 | 136 |
#' \code{\link{iupredRedox}} |
... | ... |
@@ -155,7 +174,7 @@ idprofile <- function( |
155 | 174 |
uniprotAccession = NA, |
156 | 175 |
proteinName = NA, |
157 | 176 |
iupredType = "long", |
158 |
- window = 9, |
|
177 |
+ window = 51, |
|
159 | 178 |
pH = 7.2, |
160 | 179 |
pKaSet = "IPC_protein", |
161 | 180 |
structuralTendencyType = "bar", |
... | ... |
@@ -180,6 +199,12 @@ idprofile <- function( |
180 | 199 |
plotResults = TRUE, |
181 | 200 |
pKaSet = pKaSet, |
182 | 201 |
proteinName = proteinName) |
202 |
+ hydropPlot <- scaledHydropathyLocal( |
|
203 |
+ sequence = sequence, |
|
204 |
+ window = window, |
|
205 |
+ plotResults = TRUE, |
|
206 |
+ pKaSet = pKaSet, |
|
207 |
+ proteinName = proteinName) |
|
183 | 208 |
tendencyPlot <- structuralTendencyPlot( |
184 | 209 |
sequence = sequence, |
185 | 210 |
graphType = structuralTendencyType, |
... | ... |
@@ -188,6 +213,10 @@ idprofile <- function( |
188 | 213 |
disorderNeutral = disorderNeutral, |
189 | 214 |
orderPromoting = orderPromoting, |
190 | 215 |
proteinName = proteinName) |
216 |
+ foldIndexPlot <- foldIndexR(sequence = sequence, |
|
217 |
+ window = window, |
|
218 |
+ proteinName = proteinName, |
|
219 |
+ pKaSet = pKaSet) |
|
191 | 220 |
|
192 | 221 |
#-------- Adding IUPred Plot based on which type |
193 | 222 |
if (!is.na(uniprotAccession)) { |
... | ... |
@@ -216,6 +245,7 @@ idprofile <- function( |
216 | 245 |
label = "No Uniprot Accession provided...IUPred plot skipped") + |
217 | 246 |
ggplot2::theme_void() |
218 | 247 |
} |
219 |
- plotList <- list(rhPlot, tendencyPlot, chargePlot, hydropPlot, iupredPlot) |
|
248 |
+ plotList <- list(rhPlot, tendencyPlot, chargePlot, hydropPlot, |
|
249 |
+ foldIndexPlot, iupredPlot) |
|
220 | 250 |
return(plotList) |
221 | 251 |
} |
... | ... |
@@ -89,6 +89,11 @@ idprofile(sequence = P53_HUMAN, #Generates the Profile |
89 | 89 |
|
90 | 90 |
<img src="man/figures/README-example-5.png" width="75%" /> |
91 | 91 |
|
92 |
+ #> |
|
93 |
+ #> [[6]] |
|
94 |
+ |
|
95 |
+<img src="man/figures/README-example-6.png" width="75%" /> |
|
96 |
+ |
|
92 | 97 |
**Please Refer to idpr-vignette.Rmd file for a detailed introduction to |
93 | 98 |
the** **idpr package.** [Link to the Vignette |
94 | 99 |
(here)](https://bioconductor.org/packages/release/bioc/vignettes/idpr/inst/doc/idpr-vignette.html) |
... | ... |
@@ -104,7 +109,7 @@ citation("idpr") |
104 | 109 |
#> |
105 | 110 |
#> William M. McFadden and Judith L. Yanowitz (2020). idpr: Profiling |
106 | 111 |
#> and Analyzing Intrinsically Disordered Proteins in R. R package |
107 |
-#> version 1.0.005. |
|
112 |
+#> version 1.6.1. |
|
108 | 113 |
#> |
109 | 114 |
#> A BibTeX entry for LaTeX users is |
110 | 115 |
#> |
... | ... |
@@ -112,7 +117,7 @@ citation("idpr") |
112 | 117 |
#> title = {idpr: Profiling and Analyzing Intrinsically Disordered Proteins in R}, |
113 | 118 |
#> author = {William M. McFadden and Judith L. Yanowitz}, |
114 | 119 |
#> year = {2020}, |
115 |
-#> note = {R package version 1.0.005}, |
|
120 |
+#> note = {R package version 1.6.1}, |
|
116 | 121 |
#> } |
117 | 122 |
``` |
118 | 123 |
|
... | ... |
@@ -120,9 +125,9 @@ citation("idpr") |
120 | 125 |
|
121 | 126 |
``` r |
122 | 127 |
Sys.time() |
123 |
-#> [1] "2020-12-23 14:07:28 EST" |
|
128 |
+#> [1] "2022-03-11 02:31:26 EST" |
|
124 | 129 |
Sys.Date() |
125 |
-#> [1] "2020-12-23" |
|
130 |
+#> [1] "2022-03-11" |
|
126 | 131 |
R.version |
127 | 132 |
#> _ |
128 | 133 |
#> platform x86_64-apple-darwin17.0 |
... | ... |
@@ -131,12 +136,12 @@ R.version |
131 | 136 |
#> system x86_64, darwin17.0 |
132 | 137 |
#> status |
133 | 138 |
#> major 4 |
134 |
-#> minor 0.3 |
|
135 |
-#> year 2020 |
|
136 |
-#> month 10 |
|
139 |
+#> minor 1.3 |
|
140 |
+#> year 2022 |
|
141 |
+#> month 03 |
|
137 | 142 |
#> day 10 |
138 |
-#> svn rev 79318 |
|
143 |
+#> svn rev 81868 |
|
139 | 144 |
#> language R |
140 |
-#> version.string R version 4.0.3 (2020-10-10) |
|
141 |
-#> nickname Bunny-Wunnies Freak Out |
|
145 |
+#> version.string R version 4.1.3 (2022-03-10) |
|
146 |
+#> nickname One Push-Up |
|
142 | 147 |
``` |
... | ... |
@@ -28,6 +28,7 @@ A dataset containing a measure of hydropathy for each amino acid residue |
28 | 28 |
} |
29 | 29 |
\seealso{ |
30 | 30 |
Other scaled hydropathy functions: |
31 |
+\code{\link{foldIndexR}()}, |
|
31 | 32 |
\code{\link{meanScaledHydropathy}()}, |
32 | 33 |
\code{\link{scaledHydropathyGlobal}()}, |
33 | 34 |
\code{\link{scaledHydropathyLocal}()} |
39 | 40 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,135 @@ |
1 |
+% Generated by roxygen2: do not edit by hand |
|
2 |
+% Please edit documentation in R/foldIndexR.R |
|
3 |
+\name{foldIndexR} |
|
4 |
+\alias{foldIndexR} |
|
5 |
+\title{Prediction of Intrinsic Disorder with FoldIndex method in R} |
|
6 |
+\usage{ |
|
7 |
+foldIndexR( |
|
8 |
+ sequence, |
|
9 |
+ window = 51, |
|
10 |
+ proteinName = NA, |
|
11 |
+ pKaSet = "IPC_protein", |
|
12 |
+ plotResults = TRUE |
|
13 |
+) |
|
14 |
+} |
|
15 |
+\arguments{ |
|
16 |
+\item{sequence}{amino acid sequence as a single character string, |
|
17 |
+a vector of single characters, or an AAString object. |
|
18 |
+It also supports a single character string that specifies |
|
19 |
+the path to a .fasta or .fa file.} |
|
20 |
+ |
|
21 |
+\item{window}{a positive, odd integer. 51 by default. |
|
22 |
+Sets the size of sliding window, must be an odd number. |
|
23 |
+The window determines the number of residues to be analyzed and averaged |
|
24 |
+for each position along the sequence.} |
|
25 |
+ |
|
26 |
+\item{proteinName}{character string with length = 1. |
|
27 |
+optional setting to replace the name of the plot if plotResults = TRUE.} |
|
28 |
+ |
|
29 |
+\item{pKaSet}{A character string or data frame. "IPC_protein" by default. |
|
30 |
+Character string to load specific, preloaded pKa sets. |
|
31 |
+ c("EMBOSS", "DTASelect", "Solomons", "Sillero", "Rodwell", |
|
32 |
+ "Lehninger", "Toseland", "Thurlkill", "Nozaki", "Dawson", |
|
33 |
+ "Bjellqvist", "ProMoST", "Vollhardt", "IPC_protein", "IPC_peptide") |
|
34 |
+ Alternatively, the user may supply a custom pKa dataset. |
|
35 |
+ The format must be a data frame where: |
|
36 |
+ Column 1 must be a character vector of residues named "AA" AND |
|
37 |
+ Column 2 must be a numeric vector of pKa values.} |
|
38 |
+ |
|
39 |
+\item{plotResults}{logical value, TRUE by default. |
|
40 |
+If \code{plotResults = TRUE} a plot will be the output. |
|
41 |
+If \code{plotResults = FALSE} the output is a data frame with scores for |
|
42 |
+each window analyzed.} |
|
43 |
+ |
|
44 |
+\item{...}{any additional parameters, especially those for plotting.} |
|
45 |
+} |
|
46 |
+\value{ |
|
47 |
+a ggplot if plotResults = TRUE, otherwise a data frame of scores. |
|
48 |
+} |
|
49 |
+\description{ |
|
50 |
+This is used to calculate the prediction of intrinsic disorder based on |
|
51 |
+ the scaled hydropathy and absolute net charge of an amino acid |
|
52 |
+ sequence using a sliding window. Prilusky, Felder, et al. (2005) described |
|
53 |
+ this relationship and implemented it graphically as FoldIndex, |
|
54 |
+ and this tool has been implemented |
|
55 |
+ into multiple disorder prediction programs. When windows have a negative |
|
56 |
+ score (<0) sequences are predicted as disordered. |
|
57 |
+ When windows have a positive score (>0) sequences are predicted as |
|
58 |
+ ordered. Graphically, this cutoff is displayed by the dashed |
|
59 |
+ line at y = 0. Calculations are at pH 7.0 based on the described method and |
|
60 |
+ the default is a sliding window of size 51. |
|
61 |
+ |
|
62 |
+ The output is either a data frame or graph |
|
63 |
+ showing the calculated scores for each window along the sequence. |
|
64 |
+ The equation used was originally described in Uversky et al. (2000)\cr |
|
65 |
+ \url{https://doi.org/10.1002/1097-0134(20001115)41:3<415::AID-PROT130>3.0.CO;2-7} |
|
66 |
+ . \cr |
|
67 |
+ |
|
68 |
+ The FoldIndex method of using a sliding window and utilizing the Uversky |
|
69 |
+ equation is described in Prilusky, J., Felder, C. E., et al. (2005). \cr |
|
70 |
+ FoldIndex: a simple tool to predict whether a given protein sequence \cr |
|
71 |
+ is intrinsically unfolded. Bioinformatics, 21(16), 3435-3438. \cr |
|
72 |
+} |
|
73 |
+\section{Plot Colors}{ |
|
74 |
+ |
|
75 |
+ For users who wish to keep a common aesthetic, the following colors are |
|
76 |
+ used when plotResults = TRUE. \cr |
|
77 |
+ \itemize{ |
|
78 |
+ \item Dynamic line colors: \itemize{ |
|
79 |
+ \item Close to -1 = "#9672E6" |
|
80 |
+ \item Close to 1 = "#D1A63F" |
|
81 |
+ \item Close to midpoint = "grey65" or "#A6A6A6"}} |
|
82 |
+ |
|
83 |
+ @references |
|
84 |
+ Kozlowski, L. P. (2016). IPC – Isoelectric Point Calculator. Biology |
|
85 |
+ Direct, 11(1), 55. \url{https://doi.org/10.1186/s13062-016-0159-9} \cr |
|
86 |
+ Kyte, J., & Doolittle, R. F. (1982). A simple method for |
|
87 |
+ displaying the hydropathic character of a protein. |
|
88 |
+ Journal of molecular biology, 157(1), 105-132. \cr |
|
89 |
+ Prilusky, J., Felder, C. E., et al. (2005). \cr |
|
90 |
+ FoldIndex: a simple tool to predict whether a given protein sequence \cr |
|
91 |
+ is intrinsically unfolded. Bioinformatics, 21(16), 3435-3438. \cr |
|
92 |
+ Uversky, V. N., Gillespie, J. R., & Fink, A. L. (2000). |
|
93 |
+ Why are “natively unfolded” proteins unstructured under physiologic |
|
94 |
+ conditions?. Proteins: structure, function, and bioinformatics, 41(3), |
|
95 |
+ 415-427. |
|
96 |
+ \url{https://doi.org/10.1002/1097-0134(20001115)41:3<415::AID-PROT130>3.0.CO;2-7} |
|
97 |
+} |
|
98 |
+ |
|
99 |
+\examples{ |
|
100 |
+#Amino acid sequences can be character strings |
|
101 |
+aaString <- "ACDEFGHIKLMNPQRSTVWY" |
|
102 |
+#Amino acid sequences can also be character vectors |
|
103 |
+aaVector <- c("A", "C", "D", "E", "F", |
|
104 |
+ "G", "H", "I", "K", "L", |
|
105 |
+ "M", "N", "P", "Q", "R", |
|
106 |
+ "S", "T", "V", "W", "Y") |
|
107 |
+#Alternatively, .fasta files can also be used by providing |
|
108 |
+ ##The path to the file as a character string. |
|
109 |
+ |
|
110 |
+ |
|
111 |
+foldIndexR(aaVector) |
|
112 |
+ |
|
113 |
+exampleDF <- |
|
114 |
+ foldIndexR(aaString, |
|
115 |
+ plotResults = FALSE) |
|
116 |
+head(exampleDF) |
|
117 |
+ |
|
118 |
+} |
|
119 |
+\references{ |
|
120 |
+Kyte, J., & Doolittle, R. F. (1982). A simple method for |
|
121 |
+ displaying the hydropathic character of a protein. |
|
122 |
+ Journal of molecular biology, 157(1), 105-132. |
|
123 |
+} |
|
124 |
+\seealso{ |
|
125 |
+\code{\link{KDNorm}} for residue hydropathy values. |
|
126 |
+ See \code{\link{pKaData}} for residue pKa values and citations. See |
|
127 |
+ \code{\link{hendersonHasselbalch}} for charge calculations. |
|
128 |
+ |
|
129 |
+Other scaled hydropathy functions: |
|
130 |
+\code{\link{KDNorm}}, |
|
131 |
+\code{\link{meanScaledHydropathy}()}, |
|
132 |
+\code{\link{scaledHydropathyGlobal}()}, |
|
133 |
+\code{\link{scaledHydropathyLocal}()} |
|
134 |
+} |
|
135 |
+\concept{scaled hydropathy functions} |
... | ... |
@@ -43,10 +43,11 @@ disorder based on environmental conditions. Regions of predicted |
43 | 43 |
environmental sensitivity are highlighted. See the respective functions |
44 | 44 |
for more details. This is skipped if uniprotAccession = NA.} |
45 | 45 |
|
46 |
-\item{window}{a positive, odd integer. 7 by default. |
|
46 |
+\item{window}{a positive, odd integer. 51 by default. |
|
47 | 47 |
Sets the size of sliding window, must be an odd number. |
48 | 48 |
The window determines the number of residues to be analyzed and averaged |
49 |
-for each position along the sequence.} |
|
49 |
+for each position along the sequence. 51 is default for |
|
50 |
+\code{\link{foldIndexR}}.} |
|
50 | 51 |
|
51 | 52 |
\item{pH}{numeric value, 7.0 by default. |
52 | 53 |
The environmental pH used to calculate residue charge.} |
... | ... |
@@ -94,6 +95,7 @@ The IDPRofile is a summation of many features of the idpr package, |
94 | 95 |
\code{\link{chargeCalculationLocal}}\cr |
95 | 96 |
\code{\link{scaledHydropathyLocal}}\cr |
96 | 97 |
\code{\link{structuralTendencyPlot}}\cr |
98 |
+ \code{\link{foldIndexR}}\cr |
|
97 | 99 |
All of the above linked functions only require the sequence argument |
98 | 100 |
to output plots of characteristics associated with IDPs. The function also |
99 | 101 |
includes options for IUPred functions. The function does one of the |
... | ... |
@@ -149,6 +151,19 @@ The IDPRofile is a summation of many features of the idpr package, |
149 | 151 |
Protein Science, 22(6), 693-724. |
150 | 152 |
doi:10.1002/pro.2261 } |
151 | 153 |
} |
154 |
+ \item \code{\link{foldIndexR}} |
|
155 |
+ \itemize{ |
|
156 |
+ \item{Prilusky, J., Felder, C. E., et al. (2005). |
|
157 |
+ FoldIndex: a simple tool to predict whether |
|
158 |
+ a given protein sequence is intrinsically unfolded. |
|
159 |
+ Bioinformatics, 21(16), 3435-3438.} |
|
160 |
+ \item{Uversky, V. N., Gillespie, J. R., & Fink, A. L. (2000). |
|
161 |
+ Why are “natively unfolded” proteins unstructured under |
|
162 |
+ physiologic conditions?. Proteins: structure, function, |
|
163 |
+ and bioinformatics, 41(3), 415-427. |
|
164 |
+ https://doi.org/10.1002/1097-0134(20001115)41:3<415::AID-PROT130>3.0.CO;2-7} |
|
165 |
+ \item{Also see citations for hydropathy and charge plots above} |
|
166 |
+ } |
|
152 | 167 |
\item \code{\link{iupred}}, |
153 | 168 |
\code{\link{iupredAnchor}}, |
154 | 169 |
\code{\link{iupredRedox}} |
... | ... |
@@ -194,6 +209,7 @@ idprofile( |
194 | 209 |
\code{\link{chargeCalculationLocal}}\cr |
195 | 210 |
\code{\link{scaledHydropathyLocal}}\cr |
196 | 211 |
\code{\link{structuralTendencyPlot}}\cr |
212 |
+ \code{\link{foldIndexR}}\cr |
|
197 | 213 |
\code{\link{iupred}}\cr |
198 | 214 |
\code{\link{iupredAnchor}}\cr |
199 | 215 |
\code{\link{iupredRedox}} |
... | ... |
@@ -105,6 +105,7 @@ Kyte, J., & Doolittle, R. F. (1982). A simple method for |
105 | 105 |
|
106 | 106 |
Other scaled hydropathy functions: |
107 | 107 |
\code{\link{KDNorm}}, |
108 |
+\code{\link{foldIndexR}()}, |
|
108 | 109 |
\code{\link{meanScaledHydropathy}()}, |
109 | 110 |
\code{\link{scaledHydropathyGlobal}()} |
110 | 111 |
} |
... | ... |
@@ -63,7 +63,15 @@ $$<R> = - 2.785 <H> + 1.151 $$ |
63 | 63 |
|
64 | 64 |
This plot allows a distinction between |
65 | 65 |
negative and positive proteins while preserving the information of the |
66 |
-charge-hydropathy plot. |
|
66 |
+charge-hydropathy plot. |
|
67 |
+ |
|
68 |
+Further, this equation can be used to identify folded regions of a protein. |
|
69 |
+FoldIndex rearranges the equation so that the boundary lies at 0 and applies it |
|
70 |
+over a sliding window; the resulting scores identify regions predicted as folded or unfolded. |
|
71 |
+$$ Score = 2.785 <H> - \lvert<R>\rvert -1.151 $$ |
|
72 |
+When windows have a negative score (<0) sequences are predicted as disordered. |
|
73 |
+When windows have a positive score (>0) sequences are predicted as ordered. |
|
74 |
+This was described in Prilusky, J., Felder, C. E., et al. (2005). |
|
67 | 75 |
|
68 | 76 |
## Installation |
69 | 77 |
|
... | ... |
@@ -196,6 +204,15 @@ chargeHydropathyPlot( |
196 | 204 |
``` |
197 | 205 |
|
198 | 206 |
|
207 |
+## Using foldIndexR to predict folded and unfolded windows |
|
208 |
+ |
|
209 |
+```{r} |
|
210 |
+foldIndexR(sequence = HUMAN_P53, |
|
211 |
+ plotResults = TRUE) |
|
212 |
+``` |
|
213 |
+ |
|
214 |
+Prilusky, J., Felder, C. E., et al. (2005). |
|
215 |
+ |
|
199 | 216 |
## Calculating Scaled Hydropathy |
200 | 217 |
|
201 | 218 |
### Mean Scaled Hydropathy |
... | ... |
@@ -521,6 +538,11 @@ biology, 157(1), 105-132. |
521 | 538 |
Po, H. N., & Senozan, N. (2001). The Henderson-Hasselbalch equation: |
522 | 539 |
its history and limitations. Journal of Chemical Education, 78(11), 1499. |
523 | 540 |
|
541 |
+Prilusky, J., Felder, C. E., et al. (2005). |
|
542 |
+FoldIndex: a simple tool to predict whether a given protein sequence |
|
543 |
+is intrinsically unfolded. Bioinformatics, 21(16), 3435-3438. |
|
544 |
+ |
|
545 |
+ |
|
524 | 546 |
Proteinogenic amino acid. (n.d.). In Wikipedia. Retrieved July 12th, 2020. |
525 | 547 |
https://en.wikipedia.org/wiki/Proteinogenic_amino_acid#Chemical_properties |
526 | 548 |
|
... | ... |
@@ -153,12 +153,13 @@ idprofile(sequence = P53_HUMAN, |
153 | 153 |
``` |
154 | 154 |
|
155 | 155 |
|
156 |
-idprofile returns 4-5 plots: |
|
156 |
+idprofile returns 5-6 plots: |
|
157 | 157 |
|
158 | 158 |
* Charge-Hydropathy Plot^\*^ |
159 | 159 |
* Plot of Amino Acid Composition and Structural Tendency^†^ |
160 | 160 |
* Calculations of Local Charge Along a Protein Sequence^\*^ |
161 | 161 |
* Local, Scaled Hydropathy Along a Protein Sequence^\*^ |
162 |
+ * A prediction of intrinsic disorder by FoldIndex^\*^ |
|
162 | 163 |
* A prediction of intrinsic disorder by IUPred2 (only with a uniprotAccession)^‡^ |
163 | 164 |
|
164 | 165 |
*Detailed descriptions of each plot can be found in specific vignettes.* |
... | ... |
@@ -173,7 +174,7 @@ idprofile returns 4-5 plots: |
173 | 174 |
A brief explanation of each plot is given below: |
174 | 175 |
|
175 | 176 |
|
176 |
-### Charge-Hydropathy Plot |
|
177 |
+### Charge-Hydropathy Plot and FoldIndex |
|
177 | 178 |
|
178 | 179 |
Uversky, Gillespie, & Fink (2000) showed that both high net charge and |
179 | 180 |
low mean hydropathy are properties of IDPs (15). One explanation is that a high |
... | ... |
@@ -185,7 +186,11 @@ graphic can be used to distinguish proteins that are extended or compact under |
185 | 186 |
native conditions. However, it is important to note that IDPs can have the |
186 | 187 |
characteristics of a collapsed protein or an extended protein. Therefore a |
187 | 188 |
protein within the “collapsed protein” field does not necessary mean that it |
188 |
-lacks intrinsic disorder under native conditions (15, 31). |
|
189 |
+lacks intrinsic disorder under native conditions (15, 31). This equation was |
|
190 |
+later applied to a method of predicting unfolded peptides using a sliding window |
|
191 |
+of charge and hydropathy in FoldIndex (44). When scores are negative, a region |
|
192 |
+is predicted as unfolded; when scores are positive, a region is predicted as |
|
193 |
+folded. |
|
189 | 194 |
|
190 | 195 |
|
191 | 196 |
**For further theory and details, please refer to idpr's ** |
... | ... |
@@ -387,6 +392,7 @@ et al. (2001) (25). |
387 | 392 |
41. Johnson M, Zaretskaya I, Raytselis Y, Merezhuk Y, McGinnis S, Madden TL. NCBI BLAST: a better web interface. Nucleic Acids Research. 2008;36(suppl_2):W5-W9. |
388 | 393 |
42. Madeira F, Park YM, Lee J, Buso N, Gur T, Madhusoodanan N, et al. The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic acids research. 2019;47(W1):W636-W41. |
389 | 394 |
43. Pagès H, Aboyoun P, Gentleman R, DebRoy S. Biostrings: Efficient manipulation of biological strings. R package version. 2020;2(0). |
395 |
+44. Prilusky J, Felder C, Zeev-Ben-Mordehai T, Rydberg E, Man O, Beckmann J, Silman I, & Sussman J. FoldIndex©: a simple tool to predict whether a given protein sequence is intrinsically unfolded. Bioinformatics 21, no. 16 (2005): 3435-3438. |
|
390 | 396 |
|
391 | 397 |
|
392 | 398 |
|