Bioconductor Code: ppcseq

Name	Mode	Size
.github	040000
R	040000
README_files	040000
data	040000
inst	040000
man	040000
src	040000
tests	040000
vignettes	040000
.Rbuildignore	100644	0 kb
.gitignore	100644	0 kb
.travis.yml	100644	1 kb
DESCRIPTION	100644	3 kb
NAMESPACE	100644	2 kb
README.Rmd	100644	2 kb
README.md	100644	4 kb
configure	100755	0 kb
configure.win	100755	0 kb

README.md

Posterior predictive check for bulk RNA sequencing data
================

The input data set is a tidy representation of a differential gene transcript abundance analysis.

To install:

Before installing, on Linux systems, to exploit multi-threading, run the following from R:

``` r
# Create ~/.R if it does not exist yet
dir.create(file.path("~/", ".R"), showWarnings = FALSE)

# Write the compiler flags that enable Stan threading (this overwrites any existing ~/.R/Makevars)
fileConn <- file("~/.R/Makevars")
writeLines(c("CXX14FLAGS += -O3", "CXX14FLAGS += -DSTAN_THREADS", "CXX14FLAGS += -pthread"), fileConn)
close(fileConn)
```

Then install with:

``` r
devtools::install_github("stemangiola/ppcseq")
```

You can get the test dataset with:

``` r
data("counts")
counts
```

    ## # A tibble: 394,821 x 9
    ##    sample  symbol   logCPM    LR   PValue        FDR value       W Label
    ##    <chr>   <chr>     <dbl> <dbl>    <dbl>      <dbl> <int>   <dbl> <chr>
    ##  1 10922PP SLC16A12   1.39  41.1 1.46e-10 0.00000274   160 -0.129  High
    ##  2 10935PP SLC16A12   1.39  41.1 1.46e-10 0.00000274   150 -0.127  High
    ##  3 10973PP SLC16A12   1.39  41.1 1.46e-10 0.00000274   146 -0.426  High
    ##  4 10976PP SLC16A12   1.39  41.1 1.46e-10 0.00000274   347 -0.0164 High
    ##  5 10985PP SLC16A12   1.39  41.1 1.46e-10 0.00000274   175 -0.135  High
    ##  6 11026PP SLC16A12   1.39  41.1 1.46e-10 0.00000274   244  0.125  High
    ##  7 11045PP SLC16A12   1.39  41.1 1.46e-10 0.00000274   399 -0.0892 High
    ##  8 11082PP SLC16A12   1.39  41.1 1.46e-10 0.00000274   100  0.261  Neoadjuvant
    ##  9 11086PP SLC16A12   1.39  41.1 1.46e-10 0.00000274    37 -0.132  Neoadjuvant
    ## 10 11103PP SLC16A12   1.39  41.1 1.46e-10 0.00000274    73  0.146  Neoadjuvant
    ## # … with 394,811 more rows

You can identify artefactual calls from your differential transcript abundance analysis that are due to outliers.

``` r
# Import libraries
library(dplyr)
library(ppcseq)

counts.ppc =
  counts |>
  mutate(is_significant = FDR < 0.01) |>
  identify_outliers(
    formula = ~ Label,
    .sample = sample,
    .transcript = symbol,
    .abundance = value,
    .significance = PValue,
    .do_check = is_significant,
    percent_false_positive_genes = 5
  )
```

The results of the posterior predictive check are returned as a nested data frame, with one row per checked gene:

``` r
counts.ppc
```

    ## # A tibble: 15 x 4
    ##    symbol   sample_wise_data   ppc_samples_failed tot_deleterious_outliers
    ##    <chr>    <list>                          <int>                    <int>
    ##  1 SLC16A12 <tibble [21 × 12]>                  0                        0
    ##  2 CYP1A1   <tibble [21 × 12]>                  1                        1
    ##  3 ART3     <tibble [21 × 12]>                  0                        0
    ##  4 DIO2     <tibble [21 × 12]>                  0                        0
    ##  5 OR51E2   <tibble [21 × 12]>                  0                        0
    ##  6 MUC16    <tibble [21 × 12]>                  0                        0
    ##  7 CCNA1    <tibble [21 × 12]>                  0                        0
    ##  8 LYZ      <tibble [21 × 12]>                  1                        1
    ##  9 PPM1H    <tibble [21 × 12]>                  0                        0
    ## 10 SUSD5    <tibble [21 × 12]>                  0                        0
    ## 11 TPRG1    <tibble [21 × 12]>                  0                        0
    ## 12 EPB42    <tibble [21 × 12]>                  0                        0
    ## 13 LRRC38   <tibble [21 × 12]>                  0                        0
    ## 14 SUSD4    <tibble [21 × 12]>                  0                        0
    ## 15 MMP8     <tibble [21 × 12]>                  0                        0

We can visualise the credible intervals for the checked genes. The data frame returned by `plot_credible_intervals()` contains a plot for each gene:

``` r
counts.ppc_plots = counts.ppc |> plot_credible_intervals()
```

Here we display the plots for the first two genes:

``` r
counts.ppc_plots |>
  pull(plot) |>
  head(2)
```

    ## [[1]]

![](man/figures/unnamed-chunk-9-1.png)

    ##
    ## [[2]]

![](man/figures/unnamed-chunk-9-2.png)
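
To list only the genes for which the check reported outliers, the summary table can be filtered on `tot_deleterious_outliers`. The sketch below is a minimal illustration and does not call ppcseq: `ppc_summary` is a hypothetical stand-in built by hand from four rows of the `counts.ppc` output printed above, and it uses base R only so that it runs without the package installed.

``` r
# Hypothetical stand-in for `counts.ppc`: four rows copied from the printed output,
# without the nested `sample_wise_data` column.
ppc_summary <- data.frame(
  symbol = c("SLC16A12", "CYP1A1", "ART3", "LYZ"),
  ppc_samples_failed = c(0L, 1L, 0L, 1L),
  tot_deleterious_outliers = c(0L, 1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Keep only the genes with at least one deleterious outlier.
flagged <- subset(ppc_summary, tot_deleterious_outliers > 0)

print(flagged$symbol)
```

With the real `counts.ppc` object, the same selection could presumably be written in the pipe style used above, for example `counts.ppc |> filter(tot_deleterious_outliers > 0)` with dplyr loaded.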