Browse code

aggiornamenti

Emanuel Soda authored on 30/05/2022 12:43:00
Showing2 changed files

... ...
@@ -50,7 +50,7 @@ biocViews: Software, AssayDomain, GeneExpression
50 50
 BugReports: https://github.com/EmanuelSoda/ScreenR/issues
51 51
 URL: https://emanuelsoda.github.io/ScreenR/
52 52
 Collate: 
53
-    'ScreenR-pacakage.R'
53
+    'ScreenR-package.R'
54 54
     'annotation_table.R'
55 55
     'barcode_lost.R'
56 56
     'camera_method.R'
... ...
@@ -1,12 +1,14 @@
1 1
 ---
2 2
 title: "ScreenR Example Analysis"
3
-output: BiocStyle::html_document
3
+output:
4
+    BiocStyle::html_document:
5
+      toc: true
4 6
 author:
7
+  
5 8
 - name: "Emanuel Michele Soda"
6 9
   affiliation: Istituto Europeo Oncologia (IEO),
7 10
                Milano, Italy
8 11
   email: emanuelmichele@ieo.it
9
-  
10 12
 - name: "Elena Ceccacci"
11 13
   affiliation: Istituto Europeo Oncologia (IEO),
12 14
                Milano, Italy
... ...
@@ -18,17 +20,21 @@ vignette: >
18 20
   %\VignetteEngine{knitr::knitr}
19 21
 editor_options: 
20 22
   chunk_output_type: console
23
+  markdown: 
24
+    wrap: 80
21 25
 ---
22 26
 
23 27
 # Importing Package
24
-```{r packages, message=FALSE, warning=TRUE}
28
+
29
+```{r packages, message=TRUE, warning=TRUE}
25 30
 knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
26 31
 library(ggplot2)
27 32
 library(dplyr)
28 33
 library(tidyr)
29 34
 ```
30 35
 
31
-# Introduction 
36
+# Introduction
37
+
32 38
 The R package **ScreenR** has been developed to perform analysis of
33 39
 High-throughput RNA interference screening using pooled shRNA libraries and next
34 40
 generation sequencing.. Nowadays several short hairpin RNA (shRNA) libraries are
... ...
@@ -37,7 +43,7 @@ analysis, often called barcode screening, has greatly increased for their
37 43
 benefits both from a time-consuming point of view and for the possibility of
38 44
 carrying out screening on a large number of genes simultaneously. However, the
39 45
 bioinformatic analysis of this type of screening still lacks a golden standard.
40
-Here, screenR allows the user to carry out a preliminary quality check of the
46
+Here, ScreenR allows the user to carry out a preliminary quality check of the
41 47
 experiment, visually inspect the data and finally identify the most significant
42 48
 hits of the experiment through a series of plots and cross-statistical analyses.
43 49
 
... ...
@@ -45,62 +51,39 @@ hits of the experiment through a series of plots and cross-statistical analyses.
45 51
 
46 52
 ## Bioconductor
47 53
 
48
-**ScreenR** requires several CRAN and Bioconductor R packages to be
49
-installed. Dependencies are usually handled automatically, when installing the
50
-package using the following commands:
51
-
52
-```{r ScreenR install Bioc, eval=FALSE, message=FALSE, warning=TRUE}
53
-install.packages("BiocManager")
54
-BiocManager::install("ScreenR")
55
-```
54
+**ScreenR** requires several CRAN and Bioconductor R packages to be installed.
55
+Dependencies are usually handled automatically, when installing the package
56
+using the following commands:
56 57
 
57
-## Manual installation
58
-
59
-In the unlikely case that a manual installation is required, e.g., if you do
60
-not install **ScreenR** via Bioconductor (which is highly recommended),
61
-you can install CRAN packages using: 
62
-
63
-```{r pkg install, eval=FALSE, message=FALSE, warning=TRUE}
64
-install.packages("<package_name>")
65
-```
66
-
67
-While Bioconductor packages can be installed from R using the following command:
68
-
69
-```{r pkg bioc install, eval=FALSE, message=FALSE, warning=TRUE}
70
-BiocManager::install("<package_name>")
71
-```
58
+```{r ScreenR install Bioc, eval=FALSE, message=TRUE, warning=TRUE}
59
+if (!require("BiocManager", quietly = TRUE))
60
+    install.packages("BiocManager")
72 61
 
73
-Sometimes, it may also be useful to update Bioconductor:
74
-
75
-```{r bioc update, eval=FALSE, message=FALSE, warning=TRUE}
76
-BiocManager::install()
62
+BiocManager::install("ScreenR")
77 63
 ```
78 64
 
79
-Finally, the manual installation of **ScreenR** can, for example, be
80
-done from the command line ...
65
+## Github
81 66
 
82
-```{r manual install, eval=FALSE, message=FALSE, warning=TRUE}
83
-R CMD INSTALL ScreenR_<version>.tar.gz
84
-```
67
+The newest version can directly be installed from GitHub using the CRAN package
68
+devtools:
85 69
 
86
-... or the newest version can directly be installed from GitHub using the
87
-CRAN package devtools:
70
+```{r github install, eval=FALSE, message=TRUE, warning=TRUE}
71
+if (!require("devtools", quietly = TRUE))
72
+   install.packages("devtools")
88 73
 
89
-```{r github install, eval=FALSE, message=FALSE, warning=TRUE}
90
-# install.packages("devtools")
91 74
 devtools::install_github("EmanuelSoda/ScreenR")
92 75
 ```
93 76
 
94
-
95
-# Analysis 
77
+# Analysis
96 78
 
97 79
 Here is reported the ScreenR pipeline
98
-<img src="/Users/ieo5571/Documents/IEO/Stage/ScreenR/man/figures/Pipeline.png" align="top">
80
+<img src="/Users/ieo5571/Documents/IEO/Stage/ScreenR/man/figures/Pipeline.png" align="top"/>
99 81
 
100 82
 ## Loading the package
101 83
 
102 84
 After installation, loading the package is simple as :
103
-```{r, message=FALSE, warning=TRUE}
85
+
86
+```{r, message=TRUE, warning=TRUE}
104 87
 library(ScreenR)
105 88
 ```
106 89
 
... ...
@@ -108,17 +91,18 @@ library(ScreenR)
108 91
 
109 92
 The input of ScreenR is a count table obtained from barcode alignment to the
110 93
 reference genome/library. A count table is usually the starting point of an
111
-RNA-seq differentially expressed genes analysis and consists of a matrix
112
-containing reads count organized with: 
94
+RNA-seq deferentially expressed genes analysis and consists of a matrix
95
+containing reads count organized with:
113 96
 
114
-* Genes on the rows
115
-* Samples on the columns
97
+-   Genes on the rows
98
+-   Samples on the columns
116 99
 
117
-For this vignette we will use as an example a Loss of Function Chemical 
100
+For this vignette we will use as an example a Loss of Function Chemical
118 101
 lethality Genetic Screening performed using shRNA libraries where each gene is
119
-represented by ten slightly different shRNA each labeled with a univoce barcode.
102
+represented by ten slightly different shRNA each labeled with a unique barcode.
120 103
 First of all the data has to be read.
121
-```{r read_data, message=FALSE, warning=TRUE}
104
+
105
+```{r read_data, message=TRUE, warning=TRUE}
122 106
 data(count_table)
123 107
 data(annotation_table)
124 108
 
... ...
@@ -137,15 +121,15 @@ data <- data %>%
137 121
 total_Annotation <- annotation_table
138 122
 ```
139 123
 
140
-## Object Creation 
124
+## Object Creation
125
+
141 126
 The second needed step is to create a **ScreenR object** from the count table.
142 127
 The ScreenR object is created using the function **create_screenr_object()** and
143 128
 will be used to store the most important information to perform the analysis.
144 129
 Most of the ScreenR function takes as main input the ScreenR object to perform
145 130
 the needed operation and return a result.
146 131
 
147
-
148
-```{r Create_Object, message=FALSE, warning=TRUE}
132
+```{r Create_Object, message=TRUE, warning=TRUE}
149 133
 groups <- factor(c(
150 134
     "T1/T2", "T1/T2",
151 135
     "Time3_TRT", "Time3_TRT", "Time3_TRT",
... ...
@@ -167,43 +151,41 @@ object <- create_screenr_object(
167 151
 
168 152
 ## Removing all zero rows
169 153
 
170
-```{r Removing all zero rows, message=FALSE, warning=TRUE}
154
+```{r Removing all zero rows, message=TRUE, warning=TRUE}
171 155
 object <- remove_all_zero_row(object)
172 156
 ```
173 157
 
174
-
175 158
 ## Computing the needed tables
176 159
 
177 160
 Once the object is created, the data must be normalized to start the analysis.
178
-ScreenR uses *Counts Per Million* (**CPM**)  normalization  which has the
161
+ScreenR uses *Counts Per Million* (**CPM**) normalization which has the
179 162
 following mathematical expression:
180 163
 
181 164
 $$CPM = \frac{Number \; of \; mapped \; reads \; to \; a \; barcode} 
182
-             { \sum_{sample}{Number\; of \;mapped \; reads}} *10^{6}$$ 
183
-                   
184
-The number of reads mapped for each Barcode in a sample are normalized by the 
165
+             { \sum_{sample}{Number\; of \;mapped \; reads}} *10^{6}$$
166
+
167
+The number of reads mapped for each barcode in a sample are normalized by the
185 168
 number of reads in that sample and multiplied by one million.
186 169
 
187
-This information is store in a *data table* which is a tidy version of the 
170
+This information is store in a *data table* which is a tidy version of the
188 171
 original *count table* and will be used throughout the analysis.
189 172
 
190
-
191
-```{r normalization, message=FALSE, warning=TRUE}
173
+```{r normalization, message=TRUE, warning=TRUE}
192 174
 object <- normalize_data(object)
193 175
 object <- compute_data_table(object)
194 176
 ```
195 177
 
196
-
197 178
 ## Quality Check
198 179
 
199 180
 The first step to perform when dealing with sequencing data is to check the
200
-quality of the samples. In ScreenR this can be done using several methods. 
181
+quality of the samples. In ScreenR this can be done using several methods.
201 182
 
202 183
 ### Mapped Reads
203
-The total number of mapped reads can be displayed with a barplot with the 
184
+
185
+The total number of mapped reads can be displayed with a barplot with the
204 186
 formula.
205 187
 
206
-```{r plot_mapped_reads, message=FALSE, warning=TRUE}
188
+```{r plot_mapped_reads, message=TRUE, warning=TRUE}
207 189
 plot <- plot_mapped_reads(object, palette) +
208 190
     ggplot2::coord_flip() +
209 191
     ggplot2::scale_y_continuous(labels = scales::comma) +
... ...
@@ -213,11 +195,11 @@ plot <- plot_mapped_reads(object, palette) +
213 195
 plot
214 196
 ```
215 197
 
216
-
217 198
 For example the distribution can be seen using both boxplots or density plots.
218
-#### Boxplot Mapped Reads
219 199
 
220
-```{r  plot_mapped_reads_distribution_boxplot, message=FALSE, warning=TRUE}
200
+\#### Boxplot Mapped Reads
201
+
202
+```{r  plot_mapped_reads_distribution_boxplot, message=TRUE, warning=TRUE}
221 203
 plot <- plot_mapped_reads_distribution(
222 204
     object, palette,
223 205
     alpha = 0.8,
... ...
@@ -229,10 +211,9 @@ plot <- plot_mapped_reads_distribution(
229 211
 plot
230 212
 ```
231 213
 
232
-
233 214
 #### Density plot
234 215
 
235
-```{r  plot_mapped_reads_distribution_density, message=FALSE, warning=TRUE}
216
+```{r  plot_mapped_reads_distribution_density, message=TRUE, warning=TRUE}
236 217
 plot <- plot_mapped_reads_distribution(
237 218
     object, palette,
238 219
     alpha = 0.5,
... ...
@@ -243,27 +224,26 @@ plot <- plot_mapped_reads_distribution(
243 224
 plot
244 225
 ```
245 226
 
246
-
247 227
 ### Barcode Lost
248 228
 
249 229
 Another very important quality check when a Genetic Screening is performed is to
250 230
 check the barcode lost during the experiment, meaning the barcode that after
251
-different time points or treatments results in reads count equal to zero. 
252
-ScreenR implements a function able to compute and plot the number of barcodes 
231
+different time points or treatments results in reads count equal to zero.
232
+ScreenR implements a function able to compute and plot the number of barcodes
253 233
 lost for each samples.
254 234
 
255
-```{r  plot_barcode_lost, message=FALSE, warning=TRUE}
235
+```{r  plot_barcode_lost, message=TRUE, warning=TRUE}
256 236
 plot <- plot_barcode_lost(screenR_Object = object, palette = palette) +
257 237
     ggplot2::coord_flip()
258 238
 plot
259 239
 ```
260 240
 
261
-Moreover it is important to check if the lost barcodes  in a sample 
262
-all belong to the same gene, in order to verify that an adequate number of
263
-barcodes per gene are still present. This can be done by visualizing the number
264
-of barcode lost in a sample by gene.
241
+Moreover it is important to check if the lost barcodes in a sample all belong to
242
+the same gene, in order to verify that an adequate number of barcodes per gene
243
+are still present. This can be done by visualizing the number of barcode lost in
244
+a sample by gene.
265 245
 
266
-```{r  plot_barcode_lost_for_gene, message=FALSE, warning=TRUE}
246
+```{r  plot_barcode_lost_for_gene, message=TRUE, warning=TRUE}
267 247
 plot <- plot_barcode_lost_for_gene(object,
268 248
     samples = c("Time4_TRT_C", "Time4_CTRL_C")
269 249
 )
... ...
@@ -272,21 +252,22 @@ plot
272 252
 ```
273 253
 
274 254
 ### Plot MDS {.tabset}
275
-In order to see the samples clusterization an initial MDS analysis can be 
276
-conducted. In ScreenR this can be done using the *plot_mds* function and the 
277
-user can decide the color code of the graph in order to highlight the trend of 
278
-the samples based on replicates, treatment or timepoints simply by modifying the 
279
-field levels in the plot_mds function. 
280 255
 
256
+In order to see the samples clusterization an initial MDS analysis can be
257
+conducted. In ScreenR this can be done using the *plot_mds* function and the
258
+user can decide the color code of the graph in order to highlight the trend of
259
+the samples based on replicates, treatment or timepoints simply by modifying the
260
+field levels in the plot_mds function.
281 261
 
282 262
 #### For Sample
283
-```{r  Plot_MDS_Sample, message=FALSE, warning=TRUE}
263
+
264
+```{r  Plot_MDS_Sample, message=TRUE, warning=TRUE}
284 265
 plot_mds(screenR_Object = object)
285 266
 ```
286 267
 
287
-
288 268
 #### For Treatment
289
-```{r  Plot_MDS_Treatment, message=FALSE, warning=TRUE}
269
+
270
+```{r  Plot_MDS_Treatment, message=TRUE, warning=TRUE}
290 271
 group_table <- get_data_table(object)   %>%
291 272
     select(Sample, Day, Treatment) %>%
292 273
     distinct()
... ...
@@ -300,7 +281,8 @@ plot_mds(
300 281
 ```
301 282
 
302 283
 #### For Day
303
-```{r  Plot_MDS_Day, message=FALSE, warning=TRUE}
284
+
285
+```{r  Plot_MDS_Day, message=TRUE, warning=TRUE}
304 286
 group_day <- group_table$Day
305 287
 
306 288
 plot_mds(
... ...
@@ -310,24 +292,26 @@ plot_mds(
310 292
 ```
311 293
 
312 294
 ## Statistical Analysis
313
-Once the various steps of the quality check have been passed, the actual 
314
-statistical analysis can begin. The statistical Analysis of ScreenR is based 
315
-on three different methods:
316 295
 
317
-* [Z-score filtering](https://pubmed.ncbi.nlm.nih.gov/21515799/)
318
-* [CAMERA filtering](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3458527/)
319
-* [ROAST filtering](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2922896/)
296
+Once the various steps of the quality check have been passed, the actual
297
+statistical analysis can begin. The statistical Analysis of ScreenR is based on
298
+three different methods:
299
+
300
+-   [Z-score filtering](https://pubmed.ncbi.nlm.nih.gov/21515799/)
301
+-   [CAMERA filtering](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3458527/)
302
+-   [ROAST filtering](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2922896/)
320 303
 
321 304
 ### Z-score hit
322
-In order to compute the Z-score, first a list of metrics has to be computed. 
323
-In particular a *Log2FC* is computed for the treated vs control samples in the 
305
+
306
+In order to compute the Z-score, first a list of metrics has to be computed. In
307
+particular a *Log2FC* is computed for the treated vs control samples in the
324 308
 different conditions. This can be done using the function *compute_metrics()*.
325
-Here is reported an example of treated vs control in different day. 
309
+Here is reported an example of treated vs control in different day.
326 310
 
327
-Then the different distribution of the Z-score can be plotted 
328
-using the *plot_zscore_distribution* function. 
311
+Then the different distribution of the Z-score can be plotted using the
312
+*plot_zscore_distribution* function.
329 313
 
330
-```{r  compute_metrics, message=FALSE, warning=TRUE}
314
+```{r  compute_metrics, message=TRUE, warning=TRUE}
331 315
 # 2DG
332 316
 data_with_measure_TRT <- list(
333 317
     Time3 = compute_metrics(
... ...
@@ -345,13 +329,12 @@ data_with_measure_TRT <- list(
345 329
 plot_zscore_distribution(data_with_measure_TRT, alpha = 0.8) 
346 330
 ```
347 331
 
348
-
349 332
 ### Z-score hit
333
+
350 334
 Based on these metrics the Z-score hit identification can be computed using the
351 335
 *find_zscore_hit* function.
352 336
 
353
-
354
-```{r  Z_score_hit, message=FALSE, warning=TRUE}
337
+```{r  Z_score_hit, message=TRUE, warning=TRUE}
355 338
 zscore_hit_TRT <- list(
356 339
     Time3 = find_zscore_hit(
357 340
         table_treate_vs_control = data_with_measure_TRT$Time3,
... ...
@@ -363,12 +346,13 @@ zscore_hit_TRT <- list(
363 346
     )
364 347
 )
365 348
 zscore_hit_TRT
366
-```   
367
-
349
+```
368 350
 
369 351
 ### CAMERA
352
+
370 353
 The same can be done with the CAMERA hit using the function *find_camera_hit*.
371
-```{r  CAMERA, message=FALSE, warning=TRUE}
354
+
355
+```{r  CAMERA, message=TRUE, warning=TRUE}
372 356
 matrix_model <- model.matrix(~ groups)
373 357
 colnames(matrix_model) <- unique(groups)
374 358
 
... ...
@@ -388,9 +372,11 @@ camera_hit_TRT
388 372
 ```
389 373
 
390 374
 ### ROAST
391
-Last but not least this is done also for the ROAST hit using the function  
375
+
376
+Last but not least this is done also for the ROAST hit using the function\
392 377
 *find_roast_hit*.
393
-```{r ROAST, message=FALSE, warning=TRUE}
378
+
379
+```{r ROAST, message=TRUE, warning=TRUE}
394 380
 roast_hit_TRT <- list(
395 381
     Time3 = find_roast_hit(
396 382
         screenR_Object = object, matrix_model = matrix_model,
... ...
@@ -405,24 +391,22 @@ roast_hit_TRT <- list(
405 391
 roast_hit_TRT
406 392
 ```
407 393
 
394
+### Find Common Hit
408 395
 
409
-### Find Common Hit 
410
-
411
-ScreenR considers as final hit only the one result as candidate hit in all three 
412
-statistical methods. However this is a particularly stringent method and in 
413
-some cases leads to a small number of results. For this reason the user can 
414
-also decide to opt for a less stringent method that considers only the hits 
415
-present in at least two of the statistical methods. The two different strategies 
416
-can be computed with the function::
396
+ScreenR considers as final hit only the one result as candidate hit in all three
397
+statistical methods. However this is a particularly stringent method and in some
398
+cases leads to a small number of results. For this reason the user can also
399
+decide to opt for a less stringent method that considers only the hits present
400
+in at least two of the statistical methods. The two different strategies can be
401
+computed with the function::
417 402
 
418
-* **common_hit_TRT_at_least_2**: considering candidate Hits the one present in 
419
-                            at least two of the three methods (less stringent)
403
+-   **common_hit_TRT_at_least_2**: considering candidate Hits the one present in
404
+    at least two of the three methods (less stringent)
420 405
 
421
-* **common_hit_TRT_at_least_3**: considering candidate Hits the one present in 
422
-                                 all of the three methods
406
+-   **common_hit_TRT_at_least_3**: considering candidate Hits the one present in
407
+    all of the three methods
423 408
 
424
-
425
-```{r Common_Hit,  message=FALSE, warning=TRUE}
409
+```{r Common_Hit,  message=TRUE, warning=TRUE}
426 410
 common_hit_TRT_at_least_2 <- list(
427 411
     Time3 = find_common_hit(
428 412
         zscore_hit_TRT$Time3, camera_hit_TRT$Time3, roast_hit_TRT$Day3,
... ...
@@ -448,24 +432,22 @@ common_hit_TRT_at_least_3 <- list(
448 432
 
449 433
 ### Plot common hit
450 434
 
451
-The intersection of the hits found by the three statistical methods can be 
435
+The intersection of the hits found by the three statistical methods can be
452 436
 easily visualized using the *plot_common_hit* function.
453 437
 
454
-```{r Venn_diagram_in_at_least_2}
438
+```{r Venn_diagram_in_at_least_2,  message=TRUE, warning=TRUE}
455 439
 plot_common_hit(
456 440
     hit_zscore = zscore_hit_TRT$Time4, hit_camera = camera_hit_TRT$Time4,
457 441
     roast_hit_TRT$Time4, show_elements = FALSE, show_percentage = TRUE
458 442
 )
459 443
 ```
460 444
 
445
+As we all know, when we deal with statistical methods the is the possibility of
446
+*type I* error also known as "false positive". For this reason is important to
447
+visualize the results obtained. This can be done by visualizing the trend of the
448
+candidate hits obtained using the function **plot_trend**.
461 449
 
462
-As we all know, when we deal with statistical methods the is the possibility of 
463
-*type I* error also known as "false positive". For this reason is important to 
464
-visualize the results obtained. This can be done by visualizing the trend 
465
-of the candidate hits obtained using the function **plot_trend**.
466
-
467
-
468
-```{r plot_trend}
450
+```{r plot_trend,  message=TRUE, warning=TRUE}
469 451
 candidate_hits <- common_hit_TRT_at_least_2$Time3
470 452
 
471 453
 plot_trend(screenR_Object = object, 
... ...
@@ -479,6 +461,6 @@ plot_trend(screenR_Object = object,
479 461
            group_var = c("Time1", "Time2", "TRT"))
480 462
 ```
481 463
 
482
-```{r}
464
+```{r,  message=TRUE, warning=TRUE}
483 465
 sessionInfo()
484 466
 ```