Adding iupred functions

Three functions based on IUPred2A have been added to the idpr package.
These are iupred(), iupredAnchor(), and iupredRedox(). These functions do not actively predict disorder; they fetch data from the IUPred2A REST API based on a UniProt accession ID.
IUPred2A citation: Mészáros et al. (2018)

WilliamMc authored on 26/05/2020 20:03:36
Showing 3 changed files
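
The commit message says the functions fetch data by UniProt accession rather than computing predictions locally. As a rough sketch of that lookup, the request URL can be assembled as below. `buildIupredURL` is a hypothetical helper that is not part of idpr, the accession `P04637` is only an illustrative input, and no network request is made here.

```r
# Hypothetical helper (not part of idpr): builds the request URL in the same
# way as the functions in this commit, using only base R.
buildIupredURL <- function(uniprotAccession, analysis = "long") {
  # Analysis names taken from the diff: the three iupredType values plus the
  # "anchor" and "redox" endpoints used by iupredAnchor() and iupredRedox().
  analysis <- match.arg(analysis,
                        c("long", "short", "glob", "anchor", "redox"))
  paste0("https://iupred2a.elte.hu/iupred2a/",
         analysis, "/", uniprotAccession, ".json")
}

# Illustrative accession; any UniProt accession string is handled the same way.
exampleURL <- buildIupredURL("P04637", "short")
print(exampleURL)
```

Validating the analysis name before building the URL is a design choice of this sketch; the committed functions pass `iupredType` through unchecked.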

... ...
@@ -4,6 +4,9 @@ export(chargeCalculationGlobal)
4 4
 export(chargeCalculationLocal)
5 5
 export(chargeHydropathyPlot)
6 6
 export(hendersonHasselbalch)
7
+export(iupred)
8
+export(iupredAnchor)
9
+export(iupredRedox)
7 10
 export(meanScaledHydropathy)
8 11
 export(netCharge)
9 12
 export(scaledHydropathyGlobal)
10 13
new file mode 100644
... ...
@@ -0,0 +1,315 @@
1
+#' Prediction of Intrinsic Disorder with IUPred2A
2
+#'
3
+#' This function makes a connection to the IUPred2A REST API based on the type
4
+#'   of analysis and UniProt accession number. This requires the user to know
5
+#'   the accession number of their protein and a connection to the internet.
6
+#'   The results are then formatted to match output in the idpr package. \cr \cr
7
+#'   Predictions are made on a scale of 0-1, where any residues with a score
8
+#'   over 0.5 are predicted to be disordered, and any residue scoring below 0.5
9
+#'   is predicted to be ordered (when using "long" and "short" predictions).\cr
10
+#'   The output is either a graph (ggplot) or data frame of predictions.
11
+#'   \cr\cr
12
+#'   \strong{iupred()} is used for standard predictions of intrinsic disorder
13
+#'   of an amino acid sequence. This is the core of predictions.
14
+#'   Predictions vary by iupredType (details below).
15
+#'   The results are either a ggplot or data frame of the fetched IUPred2
16
+#'   predictions.
17
+#'   \cr
18
+#'   \strong{iupredAnchor()} is used to combine the output of IUPred2 long with
19
+#'   ANCHOR2 predictions. ANCHOR2 is a context-dependent predictor of binding
20
+#'   regions for protein-protein interactions. The results are either a ggplot
21
+#'   with 2 lines, one for IUPred2 long and another for ANCHOR predictions, or
22
+#'   a data frame with both IUPred2 long and ANCHOR Predictions. Values are
23
+#'   fetched by the IUPred2A REST API.
24
+#'   \cr
25
+#'   \strong{iupredRedox()} is used to predict redox-sensitive regions that may
26
+#'   experience induced folding upon changing environments.
27
+#'   This is a context-dependent predictor of disordered regions depending on
28
+#'   a reducing (plus) or oxidizing (minus) environment. The results can be
29
+#'   a ggplot with two IUPred2 long predictions, one for plus and another for
30
+#'   minus environments, with redox-sensitive regions shaded (if predicted).
31
+#'   Alternatively, the results can be a data frame with both IUPred2 long plus
32
+#'   and minus predictions as well as a column of logical values where a residue
33
+#'   that is TRUE is predicted to be in a redox-sensitive region. Values are
34
+#'   fetched by the IUPred2A REST API.
35
+#'   \cr \cr
36
+#'   IUPred2 website is located at \url{https://iupred2a.elte.hu/}.
37
+#'   For detailed information on using IUPred2A, please refer to
38
+#'   \href{https://doi.org/10.1002/cpbi.99}{Erdős & Dosztányi (2020)}
39
+#'   Analyzing protein disorder with IUPred2A.
40
+#'   Current Protocols in Bioinformatics, 70, e99.
41
+#'   Additionally, please see
42
+#'   \href{https://doi.org/10.1093/nar/gky384}{Mészáros et al. (2018)}
43
+#'   for further information, theory, and applications of IUPred2A.
44
+#'   \cr \cr
45
+#'   \strong{Please cite these articles if you use any iupred function.}
46
+#'
47
+#' @param uniprotAccession character string specifying the UniProt Accession
48
+#'   number of the protein used to fetch IUPred predictions.
49
+#' @param iupredType character string. "long" by default. accepted types are
50
+#'   c("long", "short", "glob"). See "Prediction Type" information below.
51
+#' @param proteinName character string, optional. Used to add protein name
52
+#'   to the title in ggplot. Ignored if \code{plotResults = FALSE}.
53
+#' @param plotResults logical value. TRUE by default.
54
+#'   If \code{plotResults = TRUE}, a ggplot of IUPred predictions is returned.
55
+#'   If \code{plotResults = FALSE}, a dataframe of predictions is returned.
56
+#' @return see plotResults argument.
57
+#' @section Prediction Type:
58
+#'   Information from \url{https://iupred2a.elte.hu/help_new} on 5.22.20
59
+#'   Additionally, see the sources for further details and source information.
60
+#'   This is only relevant for iupred(). iupredAnchor() and iupredRedox()
61
+#'   always utilize "long" for data in the REST API.
62
+#'   \itemize{
63
+#'     \item Long predictions of disorder (Default)
64
+#'        \itemize{
65
+#'          \item when iupredType = "long"
66
+#'          \item Optimized for global predictions of disorder, specifically
67
+#'            disordered regions over 30 amino acids in length.
68
+#'          \item "long" is always used for iupredAnchor() and iupredRedox().
69
+#'        }
70
+#'      \item Short predictions of disorder
71
+#'        \itemize{
72
+#'          \item when iupredType = "short"
73
+#'          \item Best for predicting small regions of disorder, especially
74
+#'            in mostly structured proteins.
75
+#'          \item Has adjustments for termini, since sequence ends are often
76
+#'            disordered.
77
+#'        }
78
+#'      \item Structured predictions
79
+#'        \itemize{
80
+#'          \item when iupredType = "glob"
81
+#'          \item Used to predict regions of globular folding.
82
+#'          \item please see
83
+#'            \href{https://doi.org/10.1002/cpbi.99}{Erdős & Dosztányi (2020)}
84
+#'            for further information on interpreting these results.
85
+#'        }
86
+#'    }
87
+#' @source Bálint Mészáros, Gábor Erdős, Zsuzsanna Dosztányi,
88
+#'   IUPred2A: context-dependent prediction of protein disorder as a function of
89
+#'   redox state and protein binding, Nucleic Acids Research, Volume 46, Issue
90
+#'   W1, 2 July 2018, Pages W329–W337, \url{https://doi.org/10.1093/nar/gky384}
91
+#'   \cr\cr
92
+#'   Erdős, G., & Dosztányi, Z. (2020). Analyzing protein disorder with
93
+#'   IUPred2A. Current Protocols in Bioinformatics, 70, e99.
94
+#'   \url{https://doi.org/10.1002/cpbi.99}
95
+#' @export
96
+
97
+#----
98
+iupred <- function(
99
+  uniprotAccession,
100
+  iupredType = "long",
101
+  plotResults = TRUE,
102
+  proteinName = NA) {
103
+
104
+  #------
105
+  #Connecting to IUPred2A REST API
106
+  iupredURL <- paste("https://iupred2a.elte.hu/iupred2a/",
107
+                     iupredType,
108
+                     "/",
109
+                     uniprotAccession,
110
+                     ".json",
111
+                     sep = "")
112
+  iupredJson <- jsonlite::fromJSON(iupredURL)
113
+  #-----
114
+  #Reformatting data to be consistent in formatting across idpr
115
+  iupredPrediction <- iupredJson$iupred2
116
+  iupredSequence <- unlist(strsplit(iupredJson$sequence, ""))
117
+  iupredSequence <- unlist(iupredSequence)
118
+  seqLength <- length(iupredSequence)
119
+  iupredDF <- data.frame(Position = 1:seqLength,
120
+                         AA = iupredSequence,
121
+                         IUPred2 = iupredPrediction)
122
+  #------
123
+  #Returning
124
+  if (plotResults) {
125
+    if (!is.na(proteinName)) {
126
+      plotTitle <- paste("Prediction of Intrinsic Disorder in ",
127
+                         proteinName,
128
+                         sep = "")
129
+    } else {
130
+      plotTitle <- "Prediction of Intrinsic Disorder"
131
+    }
132
+    jsonType <- iupredJson$type
133
+    plotSubtitle <- paste("By IUPred2A ",
134
+                          jsonType,
135
+                          sep = "")
136
+
137
+    gg <-  sequencePlot(
138
+      position = iupredDF$Position,
139
+      property = iupredDF$IUPred2,
140
+      hline = 0.5,
141
+      dynamicColor = iupredDF$IUPred2,
142
+      customColors = c("darkolivegreen3", "darkorchid1", "grey65"),
143
+      customTitle = NA,
144
+      propertyLimits = c(0, 1))
145
+
146
+    gg <- gg + ggplot2::labs(title = plotTitle,
147
+                             subtitle = plotSubtitle)
148
+    return(gg)
149
+  } else {
150
+    return(iupredDF)
151
+  }
152
+
153
+}
154
+
155
+
156
+#' @rdname iupred
157
+#' @export
158
+#----
159
+iupredAnchor <- function(
160
+  uniprotAccession,
161
+  plotResults = TRUE,
162
+  proteinName = NA) {
163
+
164
+  #------
165
+  #Connecting to IUPred2A REST API
166
+  iupredURL <- paste("https://iupred2a.elte.hu/iupred2a/",
167
+                     "anchor",
168
+                     "/",
169
+                     uniprotAccession,
170
+                     ".json",
171
+                     sep = "")
172
+  iupredJson <- jsonlite::fromJSON(iupredURL)
173
+  #-----
174
+  #Reformatting data to be consistent in formatting across idpr
175
+  iupredPrediction <- iupredJson$iupred2
176
+  anchorPrediction <- iupredJson$anchor2
177
+  iupredSequence <- unlist(strsplit(iupredJson$sequence, ""))
178
+  iupredSequence <- unlist(iupredSequence)
179
+  seqLength <- length(iupredSequence)
180
+  iupredDF <- data.frame(Position = 1:seqLength,
181
+                         AA = iupredSequence,
182
+                         IUPred2 = iupredPrediction,
183
+                         ANCHOR2 = anchorPrediction)
184
+  #------
185
+  #Returning
186
+  if (plotResults) {
187
+    if (!is.na(proteinName)) {
188
+      plotTitle <- paste("Prediction of Intrinsic Disorder in ",
189
+                         proteinName,
190
+                         sep = "")
191
+    } else {
192
+      plotTitle <- "Prediction of Intrinsic Disorder"
193
+    }
194
+    jsonType <- iupredJson$type
195
+    plotSubtitle <- paste("By IUPred2A ",
196
+                          jsonType,
197
+                          " and ANCHOR2",
198
+                          sep = "")
199
+
200
+    gg <- sequencePlot(
201
+      position = iupredDF$Position,
202
+      property = iupredDF$IUPred2,
203
+      hline = 0.5,
204
+      dynamicColor = iupredDF$IUPred2,
205
+      customColors = c("darkolivegreen3", "darkorchid1", "grey65"),
206
+      customTitle = NA,
207
+      propertyLimits = c(0, 1))
208
+    gg <- gg + ggplot2::geom_line(data = iupredDF,
209
+                                  ggplot2::aes_(x = ~ Position,
210
+                                               y = ~ ANCHOR2),
211
+                                  color = "#92140C",
212
+                                  inherit.aes = FALSE)
213
+    gg <- gg + ggplot2::labs(title = plotTitle,
214
+                             subtitle = plotSubtitle)
215
+    return(gg)
216
+  } else {
217
+    return(iupredDF)
218
+  }
219
+
220
+}
221
+
222
+#' @rdname iupred
223
+#' @export
224
+iupredRedox <- function(
225
+  uniprotAccession,
226
+  plotResults = TRUE,
227
+  proteinName = NA) {
228
+
229
+  #------
230
+  #Connecting to IUPred2A REST API
231
+  iupredURL <- paste("https://iupred2a.elte.hu/iupred2a/",
232
+                     "redox",
233
+                     "/",
234
+                     uniprotAccession,
235
+                     ".json",
236
+                     sep = "")
237
+  iupredJson <- jsonlite::fromJSON(iupredURL)
238
+  #-----
239
+  #Reformatting data to be consistent in formatting across idpr
240
+  iupredPlus <- iupredJson$iupred2_redox_plus
241
+  iupredMinus <- iupredJson$iupred2_redox_minus
242
+  redoxSenstitiveMat <- iupredJson$redox_sensitive_regions
243
+  redoxSenstitiveDF <- as.data.frame(redoxSenstitiveMat)
244
+  iupredSequence <- unlist(strsplit(iupredJson$sequence, ""))
245
+  iupredSequence <- unlist(iupredSequence)
246
+  seqLength <- length(iupredSequence)
247
+  iupredDF <- data.frame(Position = 1:seqLength,
248
+                         AA = iupredSequence,
249
+                         iupredPlus = iupredPlus,
250
+                         iupredMinus = iupredMinus)
251
+  #------
252
+  #Returning
253
+  if (plotResults) {
254
+    if (!is.na(proteinName)) {
255
+      plotTitle <- paste("Prediction of Intrinsic Disorder in ",
256
+                         proteinName,
257
+                         sep = "")
258
+    } else {
259
+      plotTitle <- "Prediction of Intrinsic Disorder"
260
+    }
261
+    jsonType <- iupredJson$type
262
+    plotSubtitle <- paste("By IUPred2 ",
263
+                          jsonType,
264
+                          "|Based on Environmental Redox State",
265
+                          sep = "")
266
+    gg <- ggplot2::ggplot(iupredDF,
267
+                          ggplot2::aes(x = Position))
268
+
269
+    if (!is.null(redoxSenstitiveDF[1, 1])) {
270
+      gg <- gg + ggplot2::geom_rect(inherit.aes = FALSE,
271
+                                    data = redoxSenstitiveDF,
272
+                                    ggplot2::aes_(xmin = ~ V1,
273
+                                                 xmax = ~ V2,
274
+                                                 ymin = 0,
275
+                                                 ymax = 1),
276
+                                    alpha = 0.5,
277
+                                    fill = "#5DD39E")
278
+    }
279
+    legendTitle <- "Redox-Sensitive\nDisorder Prediction"
280
+    gg <- gg + ggplot2::geom_hline(yintercept = 0.5,
281
+                                   linetype = "dotdash",
282
+                                   color = "gray13",
283
+                                   size = 1,
284
+                                   alpha = 0.5)
285
+    gg <- gg + ggplot2::geom_line(ggplot2::aes(y = iupredMinus,
286
+                                               color = "iupredMin"),
287
+                                  linetype = "solid") +
288
+      ggplot2::geom_line(ggplot2::aes(y = iupredPlus,
289
+                                      color = "iupredPlus"),
290
+                         linetype = "solid") +
291
+      ggplot2::scale_color_manual(values = c("iupredPlus" = "#BF3EFF",
292
+                                             "iupredMin" = "#348AA7"),
293
+                                  breaks = c("iupredPlus", "iupredMin"),
294
+                                  labels = c("Plus", "Minus"),
295
+                                  name = legendTitle)
296
+    gg <- gg + ggplot2::labs(title = plotTitle,
297
+                             subtitle = plotSubtitle,
298
+                             x = "Residue",
299
+                             y = "Score") +
300
+      ggplot2::theme_minimal() +
301
+      ggplot2::geom_hline(yintercept = c(0, 1), color = "gray2")
302
+    return(gg)
303
+  } else {
304
+    if (!is.null(redoxSenstitiveDF[1, 1])) {
305
+      senstitiveRegions <- unlist(Map(":",
306
+                                      redoxSenstitiveDF$V1,
307
+                                      redoxSenstitiveDF$V2))
308
+      senstitivePositions <- 1:seqLength %in% unlist(senstitiveRegions)
309
+      iupredDF$redoxSensitive <- senstitivePositions
310
+    } else {
311
+      iupredDF$redoxSensitive <- rep(FALSE, nrow(iupredDF))
312
+    }
313
+    return(iupredDF)
314
+  }
315
+}
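
The `plotResults = FALSE` branch of iupredRedox() above turns the start/end pairs of the redox-sensitive regions into one logical value per residue. A minimal base R sketch of that expansion follows, assuming the regions arrive as a two-column table of start and end positions. The region values and sequence length are invented for illustration and do not come from the API.

```r
# Mock of the redox_sensitive_regions field: one row per region, with start
# and end positions in columns V1 and V2 (values invented for illustration).
redoxRegions <- as.data.frame(matrix(c(3, 5,
                                       9, 10),
                                     ncol = 2, byrow = TRUE))
seqLength <- 12  # assumed sequence length for this sketch

# Expand each start:end pair into its positions, then flag matching residues.
sensitivePositions <- unlist(Map(":", redoxRegions$V1, redoxRegions$V2))
redoxSensitive <- seq_len(seqLength) %in% sensitivePositions

# Positions flagged as redox-sensitive.
print(which(redoxSensitive))
```

`seq_len(seqLength)` is used here in place of `1:seqLength` so that a zero-length sequence yields an empty vector rather than `c(1, 0)`.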
0 316
new file mode 100644
... ...
@@ -0,0 +1,124 @@
1
+% Generated by roxygen2: do not edit by hand
2
+% Please edit documentation in R/iupred.R
3
+\name{iupred}
4
+\alias{iupred}
5
+\alias{iupredAnchor}
6
+\alias{iupredRedox}
7
+\title{Prediction of Intrinsic Disorder with IUPred2A}
8
+\source{
9
+Bálint Mészáros, Gábor Erdős, Zsuzsanna Dosztányi,
10
+  IUPred2A: context-dependent prediction of protein disorder as a function of
11
+  redox state and protein binding, Nucleic Acids Research, Volume 46, Issue
12
+  W1, 2 July 2018, Pages W329–W337, \url{https://doi.org/10.1093/nar/gky384}
13
+  \cr\cr
14
+  Erdős, G., & Dosztányi, Z. (2020). Analyzing protein disorder with
15
+  IUPred2A. Current Protocols in Bioinformatics, 70, e99.
16
+  \url{https://doi.org/10.1002/cpbi.99}
17
+}
18
+\usage{
19
+iupred(
20
+  uniprotAccession,
21
+  iupredType = "long",
22
+  plotResults = TRUE,
23
+  proteinName = NA
24
+)
25
+
26
+iupredAnchor(uniprotAccession, plotResults = TRUE, proteinName = NA)
27
+
28
+iupredRedox(uniprotAccession, plotResults = TRUE, proteinName = NA)
29
+}
30
+\arguments{
31
+\item{uniprotAccession}{character string specifying the UniProt Accession
32
+number of the protein used to fetch IUPred predictions.}
33
+
34
+\item{iupredType}{character string. "long" by default. accepted types are
35
+c("long", "short", "glob"). See "Prediction Type" information below.}
36
+
37
+\item{plotResults}{logical value. TRUE by default.
38
+If \code{plotResults = TRUE}, a ggplot of IUPred predictions is returned.
39
+If \code{plotResults = FALSE}, a dataframe of predictions is returned.}
40
+
41
+\item{proteinName}{character string, optional. Used to add protein name
42
+to the title in ggplot. Ignored if \code{plotResults = FALSE}.}
43
+}
44
+\value{
45
+see plotResults argument.
46
+}
47
+\description{
48
+This function makes a connection to the IUPred2A REST API based on the type
49
+  of analysis and UniProt accession number. This requires the user to know
50
+  the accession number of their protein and a connection to the internet.
51
+  The results are then formatted to match output in the idpr package. \cr \cr
52
+  Predictions are made on a scale of 0-1, where any residues with a score
53
+  over 0.5 are predicted to be disordered, and any residue scoring below 0.5
54
+  is predicted to be ordered (when using "long" and "short" predictions).\cr
55
+  The output is either a graph (ggplot) or data frame of predictions.
56
+  \cr\cr
57
+  \strong{iupred()} is used for standard predictions of intrinsic disorder
58
+  of an amino acid sequence. This is the core of predictions.
59
+  Predictions vary by iupredType (details below).
60
+  The results are either a ggplot or data frame of the fetched IUPred2
61
+  predictions.
62
+  \cr
63
+  \strong{iupredAnchor()} is used to combine the output of IUPred2 long with
64
+  ANCHOR2 predictions. ANCHOR2 is a context-dependent predictor of binding
65
+  regions for protein-protein interactions. The results are either a ggplot
66
+  with 2 lines, one for IUPred2 long and another for ANCHOR predictions, or
67
+  a data frame with both IUPred2 long and ANCHOR Predictions. Values are
68
+  fetched by the IUPred2A REST API.
69
+  \cr
70
+  \strong{iupredRedox()} is used to predict redox-sensitive regions that may
71
+  experience induced folding upon changing environments.
72
+  This is a context-dependent predictor of disordered regions depending on
73
+  a reducing (plus) or oxidizing (minus) environment. The results can be
74
+  a ggplot with two IUPred2 long predictions, one for plus and another for
75
+  minus environments, with redox-sensitive regions shaded (if predicted).
76
+  Alternatively, the results can be a data frame with both IUPred2 long plus
77
+  and minus predictions as well as a column of logical values where a residue
78
+  that is TRUE is predicted to be in a redox-sensitive region. Values are
79
+  fetched by the IUPred2A REST API.
80
+  \cr \cr
81
+  IUPred2 website is located at \url{https://iupred2a.elte.hu/}.
82
+  For detailed information on using IUPred2A, please refer to
83
+  \href{https://doi.org/10.1002/cpbi.99}{Erdős & Dosztányi (2020)}
84
+  Analyzing protein disorder with IUPred2A.
85
+  Current Protocols in Bioinformatics, 70, e99.
86
+  Additionally, please see
87
+  \href{https://doi.org/10.1093/nar/gky384}{Mészáros et al. (2018)}
88
+  for further information, theory, and applications of IUPred2A.
89
+  \cr \cr
90
+  \strong{Please cite these articles if you use any iupred function.}
91
+}
92
+\section{Prediction Type}{
93
+
94
+  Information from \url{https://iupred2a.elte.hu/help_new} on 5.22.20
95
+  Additionally, see the sources for further details and source information.
96
+  This is only relevant for iupred(). iupredAnchor() and iupredRedox()
97
+  always utilize "long" for data in the REST API.
98
+  \itemize{
99
+    \item Long predictions of disorder (Default)
100
+       \itemize{
101
+         \item when iupredType = "long"
102
+         \item Optimized for global predictions of disorder, specifically
103
+           disordered regions over 30 amino acids in length.
104
+         \item "long" is always used for iupredAnchor() and iupredRedox().
105
+       }
106
+     \item Short predictions of disorder
107
+       \itemize{
108
+         \item when iupredType = "short"
109
+         \item Best for predicting small regions of disorder, especially
110
+           in mostly structured proteins.
111
+         \item Has adjustments for termini, since sequence ends are often
112
+           disordered.
113
+       }
114
+     \item Structured predictions
115
+       \itemize{
116
+         \item when iupredType = "glob"
117
+         \item Used to predict regions of globular folding.
118
+         \item please see
119
+           \href{https://doi.org/10.1002/cpbi.99}{Erdős & Dosztányi (2020)}
120
+           for further information on interpreting these results.
121
+       }
122
+   }
123
+}
124
+
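
The documentation above states that, for "long" and "short" predictions, residues scoring over 0.5 are predicted to be disordered. As a small sketch of applying that cutoff to a data frame shaped like the `plotResults = FALSE` output of iupred(): the scores below are invented rather than fetched, and treating a score of exactly 0.5 as ordered is an assumption of this sketch.

```r
# Mock data frame with the same columns as the iupred() data frame output;
# the sequence and scores are invented for illustration.
mockDF <- data.frame(Position = 1:6,
                     AA = c("M", "E", "E", "P", "Q", "S"),
                     IUPred2 = c(0.82, 0.61, 0.47, 0.12, 0.50, 0.93))

# Scores over 0.5 are read as disordered; a score of exactly 0.5 is treated
# as ordered here (assumption).
mockDF$Disordered <- mockDF$IUPred2 > 0.5

# Fraction of residues predicted to be disordered under this cutoff.
fractionDisordered <- mean(mockDF$Disordered)
print(mockDF)
print(fractionDisordered)
```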