WilliamMc authored on 17/10/2020 19:21:16
Showing 8 changed files

... ...
@@ -114,7 +114,7 @@
114 114
 #'                                 ggplot2::aes(label = AA,
115 115
 #'                                              y = Charge + 0.1))
116 116
 #'   plot(gg)
117
-#' #alternativly, you can pass the data frame to sequenceMap()
117
+#' #alternatively, you can pass the data frame to sequenceMap()
118 118
 #' sequenceMap(sequence = exampleDF$AA,
119 119
 #'             property = exampleDF$Charge)
120 120
 
... ...
@@ -25,16 +25,33 @@ packages.
25 25
 
26 26
 **Please Refer to idpr-vignette.Rmd file for a detailed introduction to the**
27 27
 **idpr package.**
28
+Links to the vignettes can be found at the 
29
+[Bioconductor landing page](https://doi.org/doi:10.18129/B9.bioc.idpr) 
30
+
28 31
 
29 32
 ## Installation
30 33
 
31
-You can install the development version from [GitHub](https://github.com/) with:
34
+You can install the development version from 
35
+[Bioconductor](https://doi.org/doi:10.18129/B9.bioc.idpr) with:
36
+``` r
37
+if (!requireNamespace("BiocManager", quietly = TRUE))
38
+    install.packages("BiocManager")
39
+
40
+# The following initializes usage of Bioc devel
41
+BiocManager::install(version='devel')
42
+
43
+BiocManager::install("idpr")
44
+```
45
+
46
+Or you can install the development version from 
47
+[GitHub](https://github.com/wmm27/idpr) with:
32 48
 
33 49
 ``` r
34 50
 # install.packages("devtools") #if not already installed
35 51
 devtools::install_github("wmm27/idpr")
36 52
 ```
37 53
 
54
+
38 55
 ## Example
39 56
 This is a basic example to quickly profile your protein of interest:
40 57
 
... ...
@@ -51,3 +68,20 @@ idprofile(sequence = P53_HUMAN, #Generates the Profi
51 68
 ```
52 69
 
53 70
 
71
+
72
+**Please Refer to idpr-vignette.Rmd file for a detailed introduction to the**
73
+**idpr package.**
74
+
75
+## Appendix
76
+
77
+### Package citation
78
+```{r}
79
+citation("idpr")
80
+```
81
+
82
+### Additional Information
83
+```{r}
84
+Sys.time()
85
+Sys.Date()
86
+R.version
87
+```
54 88
\ No newline at end of file
... ...
@@ -11,12 +11,26 @@ also includes tools for IDP-based sequence analysis to be used in
11 11
 conjunction with other R packages.
12 12
 
13 13
 **Please Refer to idpr-vignette.Rmd file for a detailed introduction to
14
-the** **idpr package.**
14
+the** **idpr package.** Links to the vignettes can be found at the
15
+[Bioconductor landing page](https://doi.org/doi:10.18129/B9.bioc.idpr)
15 16
 
16 17
 ## Installation
17 18
 
18 19
 You can install the development version from
19
-[GitHub](https://github.com/) with:
20
+[Bioconductor](https://doi.org/doi:10.18129/B9.bioc.idpr) with:
21
+
22
+``` r
23
+if (!requireNamespace("BiocManager", quietly = TRUE))
24
+    install.packages("BiocManager")
25
+
26
+# The following initializes usage of Bioc devel
27
+BiocManager::install(version='devel')
28
+
29
+BiocManager::install("idpr")
30
+```
31
+
32
+Or you can install the development version from
33
+[GitHub](https://github.com/wmm27/idpr) with:
20 34
 
21 35
 ``` r
22 36
 # install.packages("devtools") #if not already installed
... ...
@@ -63,3 +77,54 @@ idprofile(sequence = P53_HUMAN, #Generates the Profi
63 77
     #> [[5]]
64 78
 
65 79
 <img src="man/figures/README-example-5.png" width="75%" />
80
+
81
+**Please Refer to idpr-vignette.Rmd file for a detailed introduction to
82
+the** **idpr package.**
83
+
84
+## Appendix
85
+
86
+### Package citation
87
+
88
+``` r
89
+citation("idpr")
90
+#> 
91
+#> To cite package 'idpr' in publications use:
92
+#> 
93
+#>   William McFadden and Judith Yanowitz (2020). idpr: Profiling and
94
+#>   Analyzing Intrinsically Disordered Proteins in R. R package version
95
+#>   0.99.25.
96
+#> 
97
+#> A BibTeX entry for LaTeX users is
98
+#> 
99
+#>   @Manual{,
100
+#>     title = {idpr: Profiling and Analyzing Intrinsically Disordered Proteins in R},
101
+#>     author = {William McFadden and Judith Yanowitz},
102
+#>     year = {2020},
103
+#>     note = {R package version 0.99.25},
104
+#>   }
105
+```
106
+
107
+### Additional Information
108
+
109
+``` r
110
+Sys.time()
111
+#> [1] "2020-10-17 14:40:21 EDT"
112
+Sys.Date()
113
+#> [1] "2020-10-17"
114
+R.version
115
+#>                _                           
116
+#> platform       x86_64-apple-darwin17.0     
117
+#> arch           x86_64                      
118
+#> os             darwin17.0                  
119
+#> system         x86_64, darwin17.0          
120
+#> status                                     
121
+#> major          4                           
122
+#> minor          0.2                         
123
+#> year           2020                        
124
+#> month          06                          
125
+#> day            22                          
126
+#> svn rev        78730                       
127
+#> language       R                           
128
+#> version.string R version 4.0.2 (2020-06-22)
129
+#> nickname       Taking Off Again
130
+```
... ...
@@ -328,7 +328,7 @@ netCharge(HUMAN_P53,
328 328
 There are also many pKa sets that are preloaded in **idpr**. 
329 329
 pKa datasets used within this vignette are cited. See the documentation for
330 330
 netCharge or pKaData within **idpr** for additional information and citations 
331
-for avaliable pKa sets. 
331
+for available pKa sets.
332 332
 Additionally, see Kozlowski (2016) for further details on pKa data sets. 
333 333
 
334 334
 * "EMBOSS" -  (Rice, Longden, & Bleasby, 2000)
... ...
@@ -359,7 +359,7 @@ netCharge(HUMAN_P53,
359 359
 ```
360 360
 
361 361
 
362
-Alternativly, the user may supply a custom pKa dataset.
362
+Alternatively, the user may supply a custom pKa dataset.
363 363
 The format must be a data frame where: Column 1 must be a 
364 364
 character vector of residues AND Column 2 must be a numeric vector of pKa 
365 365
 values. This can be helpful if there is a data set the user prefers or if
... ...
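The custom pKa format described in the hunk above (column 1 a character vector of residues, column 2 a numeric vector of pKa values) could be sketched as follows. The column names, the residue set, and the pKa values are illustrative placeholders and are not taken from idpr or from any published pKa set; whether entries for the termini are also required is not stated in this hunk.

```r
# Hypothetical custom pKa set; names and values are illustrative only.
custom_pKa <- data.frame(
  AA  = c("C", "D", "E", "H", "K", "R", "Y"),     # column 1: residues
  pKa = c(8.3, 3.9, 4.3, 6.0, 10.5, 12.5, 10.1),  # column 2: pKa values
  stringsAsFactors = FALSE
)

# Check the two structural requirements before passing the set on.
is_valid_pKa_set <- function(df) {
  is.data.frame(df) &&
    ncol(df) >= 2 &&
    is.character(df[[1]]) &&
    is.numeric(df[[2]])
}

is_valid_pKa_set(custom_pKa)
```

A data frame that passes this check would then be supplied to netCharge in place of a preloaded set name; see the netCharge documentation for the exact argument.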
@@ -389,7 +389,7 @@ netCharge(HUMAN_P53,
389 389
 ### Global Charge Distribution
390 390
 
391 391
 chargeCalculationGlobal is a function used to calculate the charge of
392
-each residue, indepenent of other amino acids, within a sequence. 
392
+each residue, independent of other amino acids, within a sequence. 
393 393
 The results are returned as a data frame (default) or a plot.
394 394
 
395 395
 chargeCalculationGlobal accepts the same pKa and pH arguments as netCharge. 
... ...
@@ -406,14 +406,14 @@ P53_ccg <- chargeCalculationGlobal(HUMAN_P53)
406 406
 head(P53_ccg)
407 407
 ```
408 408
 
409
-The results can return a ggplot visalizing the charge distribution.
409
+The results can return a ggplot visualizing the charge distribution.
410 410
 ```{r}
411 411
 chargeCalculationGlobal(HUMAN_P53,
412 412
                         plotResults = TRUE)
413 413
 ```
414 414
 
415 415
 (This is not the most aesthetically pleasing plot, so a sequenceMap from 
416
-**idpr** is reccomended in this case for visualizations.)
416
+**idpr** is recommended in this case for visualizations.)
417 417
 
418 418
 ```{r}
419 419
 P53_ccg <- chargeCalculationGlobal(HUMAN_P53) #repeating from above
... ...
@@ -425,7 +425,7 @@ sequenceMap(sequence = P53_ccg$AA,
425 425
 
426 426
 The C-terminus here has a charge of ~ -2 since the function aggregates the
427 427
 termini values with residue charges by default. If you wish to calculate
428
-the termini as seperate values, use sumTermini = FALSE. This will add 2 residues
428
+the termini as separate values, use sumTermini = FALSE. This will add 2 residues
429 429
 to the data frame as "NH3" and "COO".
430 430
 
431 431
 ```{r}
... ...
@@ -435,7 +435,7 @@ head(P53_ccg)
435 435
 ```
436 436
 
437 437
 
438
-If you wish to completly ignore the termini for calculation, set includeTermini
438
+If you wish to completely ignore the termini for calculation, set includeTermini
439 439
 = FALSE. 
440 440
 
441 441
 ```{r}
... ...
@@ -472,7 +472,7 @@ P53_cgl <- chargeCalculationLocal(HUMAN_P53)
472 472
 head(P53_cgl)
473 473
 ```
474 474
 
475
-Alternativly, results can be returned as a plot of each window's charge.
475
+Alternatively, results can be returned as a plot of each window's charge.
476 476
 ```{r}
477 477
 chargeCalculationLocal(HUMAN_P53,
478 478
                        plotResults = TRUE)
... ...
@@ -554,7 +554,7 @@ R Version
554 554
 R.version.string
555 555
 ```
556 556
 
557
-System Infomation
557
+System Information
558 558
 ```{r}
559 559
 as.data.frame(Sys.info())
560 560
 ```
... ...
@@ -103,7 +103,7 @@ containing a sequence of interest. All forms are handled automatically without
103 103
 user specification, and fasta files will be loaded using the ‘Bioconductor’ 
104 104
 package. Additionally, all visualizations generated by 
105 105
 ‘idpr’ are made using the ‘ggplot2’ package (30). This is to allow further 
106
-customizations on returned graphics.  
106
+customization on returned graphics.  
107 107
 
108 108
 Overall, ‘idpr’ aims to integrate tools for the computational analysis of 
109 109
 intrinsically disordered proteins within R. This package is used to identify
... ...
@@ -184,10 +184,14 @@ are plotted, extended IDPs occupy a unique area on the plot. Therefore, this
184 184
 graphic can be used to distinguish proteins that are extended or compact under
185 185
 native conditions. However, it is important to note that IDPs can have the 
186 186
 characteristics of a collapsed protein or an extended protein. Therefore a 
187
-protein within the “collapsed protein” field does not necessarly mean that it 
187
+protein within the “collapsed protein” field does not necessarily mean that it 
188 188
 lacks intrinsic disorder under native conditions (15, 31). 
189 189
 
190 190
 
191
+**For further theory and details, please refer to idpr's**
192
+**"Charge and Hydropathy Vignette" file.**
193
+
194
+
191 195
 ### Structural Tendency Plot
192 196
 
193 197
 The composition of amino acids and the overall chemistry of IDPs are distinctly
... ...
@@ -204,6 +208,9 @@ Disorder-promoting residues are P, E, S, Q, K, A, and G;
204 208
 order-promoting residues are M, N, V, H, L, F, Y, I, W, and C; 
205 209
 disorder‐neutral residues are D, T, and R (32). 
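The residue grouping listed above can be sketched in base R as a lookup table. This is only an illustration of the classification itself: the group labels, the function name `classify_tendency`, and the toy input sequence are assumptions and may not match what structuralTendency() in idpr returns.

```r
# Residue groups as listed in the text above (reference 32).
tendency_groups <- list(
  "Disorder Promoting" = c("P", "E", "S", "Q", "K", "A", "G"),
  "Order Promoting"    = c("M", "N", "V", "H", "L", "F", "Y", "I", "W", "C"),
  "Disorder Neutral"   = c("D", "T", "R")
)

# Build a residue -> group lookup vector named by one-letter code.
tendency_lookup <- unlist(lapply(names(tendency_groups), function(group) {
  stats::setNames(rep(group, length(tendency_groups[[group]])),
                  tendency_groups[[group]])
}))

# Classify each residue of a sequence supplied as a single string.
classify_tendency <- function(sequence) {
  residues <- strsplit(toupper(sequence), "")[[1]]
  unname(tendency_lookup[residues])
}

# Fraction of residues in each group for a toy sequence (hypothetical input).
prop.table(table(classify_tendency("MEEPQSDPSV")))
```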
206 210
 
211
+**For further theory and details, please refer to idpr's**
212
+**"Structural Tendency Vignette" file.**
213
+
207 214
 
208 215
 ### Local Charge Calculations
209 216
 
... ...
@@ -211,11 +218,15 @@ As stated, IDPs are enriched in charged residues. Residues of similar charge
211 218
 tend to repel one another which can prevent protein packing and promote an 
212 219
 unstructured protein configuration under native conditions (15). There are many
213 220
 pKa data sets; we utilize the IPC pKa data set for calculations (33). Beyond the
214
-use of IDP predictions, local charge is an impotant biochemical measurement with
215
-many applications. Charges are calculated using a sliding window to help 
221
+use of IDP predictions, local charge is an important biochemical measurement
222
+with many applications. Charges are calculated using a sliding window to help 
216 223
 identify regions of extreme charge. The resulting figure is similar to ProtScale
217 224
 from ExPASy (34).
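The sliding-window idea can be sketched in base R. The per-residue charges below are a crude assumption for illustration (+1 for K and R, -1 for D and E, 0 otherwise) rather than the pKa-based values idpr computes, and the centered window, the function names, and the toy sequence are likewise hypothetical.

```r
# Crude per-residue charges near neutral pH (an assumption for this sketch).
toy_charge <- function(sequence) {
  residues <- strsplit(toupper(sequence), "")[[1]]
  charges <- c(K = 1, R = 1, D = -1, E = -1)[residues]
  charges[is.na(charges)] <- 0   # every other residue is treated as neutral
  unname(charges)
}

# Mean over a centered window of odd width; positions too close to either
# end to hold a full window are dropped.
sliding_mean <- function(values, window = 9) {
  stopifnot(window %% 2 == 1, length(values) >= window)
  half <- (window - 1) / 2
  centers <- (half + 1):(length(values) - half)
  data.frame(
    Position   = centers,
    WindowMean = vapply(centers,
                        function(i) mean(values[(i - half):(i + half)]),
                        numeric(1))
  )
}

sliding_mean(toy_charge("MEEPQSDPSVEPPLSQETFSDLWKLLPEN"), window = 7)
```

Windows with a strongly positive or negative mean mark the regions of extreme charge that the paragraph refers to.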
218 225
 
226
+**For further theory and details, please refer to idpr's**
227
+**"Charge and Hydropathy Vignette" file.**
228
+
229
+
219 230
 ### Local Hydropathy
220 231
 
221 232
 As stated, hydrophobic residues are disfavored in IDPs (15). The hydrophobic 
... ...
@@ -229,6 +240,10 @@ The resulting figure is similar to ProtScale from ExPASy (34).
229 240
 Scaled hydropathy is averaged locally along the protein using a 
230 241
 sliding window to identify regions devoid of hydropathic characteristics.
231 242
 
243
+**For further theory and details, please refer to idpr's**
244
+**"Charge and Hydropathy Vignette" file.**
245
+
246
+
232 247
 ### IUPred
233 248
 
234 249
 IUPred2 analyzes an amino acid sequence and returns a score of intrinsic 
... ...
@@ -255,9 +270,12 @@ iupredAnchor(P53_ID) #IUPred2 long + ANCHOR2 prediction of scaffolding
255 270
 
256 271
 Redox-sensitive regions are shaded with a green background.
257 272
 ```{r}
258
-iupredRedox(P53_ID) #IUPred2 long with enviornmental context
273
+iupredRedox(P53_ID) #IUPred2 long with environmental context
259 274
 ```
260 275
 
276
+**For further theory, use, and details, please refer to idpr's**
277
+**"IUPred Vignette" file.**
278
+
261 279
 ***
262 280
 
263 281
 ## Visualizing Discrete Values
... ...
@@ -326,8 +344,6 @@ et al. (2001) (25).
326 344
 ## References
327 345
 
328 346
 
329
-References
330
-
331 347
 1. Dunker AK, Lawson JD, Brown CJ, Williams RM, Romero P, Oh JS, et al. Intrinsically disordered protein. Journal of Molecular Graphics and Modelling. 2001;19(1):26-59.
332 348
 2. Tompa P. Intrinsically unstructured proteins. Trends in biochemical sciences. 2002;27(10):527-33.
333 349
 3. Uversky VN. Intrinsically disordered proteins from A to Z. The International Journal of Biochemistry & Cell Biology. 2011;43(8):1090-103.
... ...
@@ -371,3 +387,28 @@ References
371 387
 41. Johnson M, Zaretskaya I, Raytselis Y, Merezhuk Y, McGinnis S, Madden TL. NCBI BLAST: a better web interface. Nucleic Acids Research. 2008;36(suppl_2):W5-W9.
372 388
 42. Madeira F, Park YM, Lee J, Buso N, Gur T, Madhusoodanan N, et al. The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic acids research. 2019;47(W1):W636-W41.
373 389
 43. Pagès H, Aboyoun P, Gentleman R, DebRoy S. Biostrings: Efficient manipulation of biological strings. R package version. 2020;2(0).
390
+
391
+
392
+
393
+
394
+### Additional Information
395
+R Version
396
+```{r}
397
+R.version.string
398
+```
399
+
400
+System Information
401
+```{r}
402
+as.data.frame(Sys.info())
403
+```
404
+
405
+```{r}
406
+sessionInfo()
407
+```
408
+
409
+```{r, results="asis"}
410
+citation()
411
+```
412
+
413
+
414
+
... ...
@@ -146,7 +146,9 @@ head(iupredLongDF)
146 146
 ### iupredType = "short"
147 147
 iupredType =  “short” is the setting to predict small regions of intrinsic
148 148
 disorder in proteins, optimized for missing regions of protein structures saved 
149
-to the Protein Databank (PDB). It is important to note that this tends to favor
149
+to the Protein Data Bank (PDB). Its goal is to predict 
150
+regions that are not represented in crystallographic experiments.
151
+It is important to note that this tends to favor
150 152
 disorder at the N- and C- terminus (Dosztányi, 2018). 
151 153
 
152 154
 ```{r}
... ...
@@ -166,8 +168,7 @@ head(iupredShortDF)
166 168
 ### iupredType = "glob"
167 169
 iupredType =  “glob” is the setting that is to help reduce the noise of small 
168 170
 disordered regions in otherwise ordered regions and to help identify sequences 
169
-that are likely to have a specific and rigid fold. Its goal is to predict 
170
-regions that are not represented in crystallographic experiments 
171
+that are likely to have a specific and rigid fold. 
171 172
 (Dosztányi, 2018). 
172 173
 ```{r}
173 174
 p53_ID <- "P04637"
... ...
@@ -233,9 +234,9 @@ cystine residues to serine when simulating a reducing or
233 234
 This eliminates any structural stabilization by disulfide bonds 
234 235
 (Mészáros et al., 2018).
235 236
 
236
-Redox-plus predictions are shown in blue, Redox-minus predications are shown
237
-in purple. Any region identified as "Redox Senstitive" will be highlighted in
238
-light green (does not appear if there are no senstitive regions predicted).
237
+Redox-plus predictions are shown in blue, Redox-minus predictions are shown
238
+in purple. Any region identified as "Redox Sensitive" will be highlighted in
239
+light green (does not appear if there are no sensitive regions predicted).
239 240
 ```{r}
240 241
 p53_ID <- "P04637"
241 242
 iupredRedox(p53_ID,
... ...
@@ -258,10 +259,10 @@ head(iupredRedoxDF)
258 259
 
259 260
 
260 261
 While the aesthetics of the plots above are meant to represent a middle ground of
261
-the graphics avaliable on 
262
+the graphics available on 
262 263
 and the other plots generated by **idpr**, a user may wish to use the data
263 264
 frames for data analysis or unique graphics. Another way to represent the data
264
-is using the sequenceMap() funciton.
265
+is using the sequenceMap() function.
265 266
 
266 267
 ```{r}
267 268
 iupredLongDF <- iupred(p53_ID,
... ...
@@ -280,6 +281,10 @@ sequenceMap(sequence = iupredLongDF$AA,
280 281
 ```
281 282
 
282 283
 
284
+**For further details, please refer to idpr's**
285
+**"Sequence Map Vignette" file.**
286
+
287
+
283 288
 ## Getting the UniProt Accession
284 289
 
285 290
 To make a connection to the IUPred2A REST API, a UniProt Accession ID is 
... ...
@@ -290,6 +295,28 @@ If a user does not have the protein name or info to search, a BLAST search on
290 295
 UniProt may be helpful at https://www.uniprot.org/blast/ 
291 296
 (UniProt Consortium, 2019).
292 297
 
298
+## Use
299
+Please note that these functions are only meant to access the IUPred2A REST API. 
300
+The functions within **idpr** are **not** designed by the IUPred2A developers. 
301
+The authors of **idpr** do not control, manage, or maintain any 
302
+aspect of IUPred2A. Therefore, **idpr** is unable to guarantee access to 
303
+the API.
304
+
305
+
306
+The user MUST follow the IUPred2A Terms of Use in addition to the terms
307
+for use of **idpr**.
308
+
309
+When publishing or using any data generated with IUPred2A, the user must cite the 
310
+appropriate publication(s) for the IUPred2A service. This may change as the
311
+program updates or improves. **idpr** does not control updates to IUPred2A.
312
+
313
+
314
+The current website (as of 10/15/20) for IUPred2A is found here:
315
+[https://iupred2a.elte.hu/](https://iupred2a.elte.hu/). 
316
+The authors of **idpr** strongly recommend visiting this page to follow any
317
+updates and changes as well as confirming appropriate use per the IUPred2A
318
+terms of use. 
319
+
293 320
 
294 321
 ## References
295 322
 
... ...
@@ -344,7 +371,7 @@ R Version
344 371
 R.version.string
345 372
 ```
346 373
 
347
-System Infomation
374
+System Information
348 375
 ```{r}
349 376
 as.data.frame(Sys.info())
350 377
 ```
... ...
@@ -105,7 +105,7 @@ sequenceMap(
105 105
 There are multiple customization options to allow for improved graphing. 
106 106
 One is the organization of the labels. 
107 107
 You are able to represent the sequence with both amino acid residues and their 
108
-location in the sequence, but you can choose one or the other (or nethier).
108
+location in the sequence, but you can choose one or the other (or neither).
109 109
 This is specified by the 'labelType' argument.
110 110
 
111 111
 ```{r}
... ...
@@ -138,7 +138,7 @@ sequenceMap(
138 138
 ```
139 139
 
140 140
 The text can also be rotated, via the 'rotationAngle' argument, for ease of 
141
-reading. This is espeically helpful for larger sequences with dense graphics.
141
+reading. This is especially helpful for larger sequences with dense graphics.
142 142
 ```{r}
143 143
 sequenceMap(
144 144
   sequence = tendencyDF$AA,
... ...
@@ -217,7 +217,7 @@ sequenceMap(
217 217
 
218 218
 Since the output is a ggplot, the visualization is able to be assigned to an
219 219
 object and additional features can be added and annotated. The example below 
220
-will annotate a metal binding residue and the region that the P53 protien binds 
220
+will annotate a metal binding residue and the region that the P53 protein binds 
221 221
 DNA. These annotations and locations were retrieved from UniProt 
222 222
 (UniProt Consortium 2019).
223 223
 ```{r}
... ...
@@ -299,10 +299,10 @@ of the residues on the map.
299 299
 
300 300
 To solve this, sequenceMapCoordinates() will return the row (y value) and the 
301 301
 column (x value) as a data frame for each residue visualized with sequenceMap(). 
302
-The puporse of this is to make adding annotations easier and customizable.
302
+The purpose of this is to make adding annotations easier and customizable.
303 303
 
304 304
 
305
-As shown before, nbResidues deterines how many residues will be on each
305
+As shown before, nbResidues determines how many residues will be on each
306 306
 row. Make sure nbResidues is equal to the value used in sequenceMap(). 
307 307
 
308 308
 To get the coordinates, the amino acid sequence must be supplied. The output
... ...
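How a residue index maps to a row and column for a given nbResidues can be sketched as below. This shows one possible convention (rows filled left to right and numbered from the first residue onward); the function name, column names, and default value are hypothetical, and sequenceMapCoordinates() may orient or label its output differently.

```r
# One possible index -> (row, col) mapping; not idpr's implementation.
grid_coordinates <- function(sequence, nbResidues = 30) {
  residues <- strsplit(sequence, "")[[1]]
  index <- seq_along(residues)
  data.frame(
    AA       = residues,
    Position = index,
    row      = (index - 1) %/% nbResidues + 1,  # which line of the map
    col      = (index - 1) %% nbResidues + 1,   # position within that line
    stringsAsFactors = FALSE
  )
}

grid_coordinates("MEEPQSDPSV", nbResidues = 4)
```

Using a different nbResidues here than in the map itself would shift every coordinate, which is why the two values need to match.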
@@ -319,7 +319,7 @@ head(coord_DF)
319 319
 
320 320
 ## Sequence Plot
321 321
 
322
-The funtions for calculating charge and scaled hydropathy and the iupred
322
+The functions for calculating charge and scaled hydropathy and the iupred
323 323
 functions all have plotting options. The plotting for these are done with the
324 324
 sequencePlot() function to have a uniform aesthetic. If you wish to make a plot
325 325
 with custom values, sequencePlot() can still be used.
... ...
@@ -399,7 +399,7 @@ R Version
399 399
 R.version.string
400 400
 ```
401 401
 
402
-System Infomation
402
+System Information
403 403
 ```{r}
404 404
 as.data.frame(Sys.info())
405 405
 ```
... ...
@@ -78,7 +78,7 @@ head(tendencyDF)
78 78
 ```
79 79
 
80 80
 
81
-For convient plotting, use structuralTendencyPlot().
81
+For convenient plotting, use structuralTendencyPlot().
82 82
 Results can be shown as a pie chart or bar plot. 
83 83
 
84 84
 ```{r}
... ...
@@ -188,7 +188,7 @@ structuralTendencyPlot(P53_MOUSE,
188 188
 In addition to the compositional profile of each residue, a summary of the 
189 189
 profile focused only on the structural tendency can be given by setting
190 190
 summarize = TRUE. This shifts the focus from amino acid identity to the general 
191
-composition. The graphType is preseved.
191
+composition. The graphType is preserved.
192 192
 
193 193
 ```{r}
194 194
 structuralTendencyPlot(P53_MOUSE,
... ...
@@ -246,7 +246,7 @@ R Version
246 246
 R.version.string
247 247
 ```
248 248
 
249
-System Infomation
249
+System Information
250 250
 ```{r}
251 251
 as.data.frame(Sys.info())
252 252
 ```