Name Mode Size
.github 040000
R 040000
README_files 040000
data 040000
inst 040000
man 040000
src 040000
tests 040000
vignettes 040000
.Rbuildignore 100644 0 kb
.gitignore 100644 0 kb
.travis.yml 100644 1 kb
DESCRIPTION 100644 3 kb
NAMESPACE 100644 2 kb
README.Rmd 100644 2 kb
README.md 100644 4 kb
configure 100755 0 kb
configure.win 100755 0 kb
README.md
Posterior predictive check for bulk RNA sequencing data ================ The input data set is a tidy representation of a differential gene transcript abundance analysis To install: Before install, for linux systems, in order to exploit multi-threading, from R write: ``` r dir.create(file.path("~/", ".R"), showWarnings = FALSE) fileConn<-file("~/.R/Makevars") writeLines(c( "CXX14FLAGS += -O3","CXX14FLAGS += -DSTAN_THREADS", "CXX14FLAGS += -pthread"), fileConn) close(fileConn) ``` Then, install with ``` r devtools::install_github("stemangiola/ppcseq") ``` You can get the test dataset with ``` r data("counts") counts ``` ## # A tibble: 394,821 x 9 ## sample symbol logCPM LR PValue FDR value W Label ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <chr> ## 1 10922PP SLC16A12 1.39 41.1 1.46e-10 0.00000274 160 -0.129 High ## 2 10935PP SLC16A12 1.39 41.1 1.46e-10 0.00000274 150 -0.127 High ## 3 10973PP SLC16A12 1.39 41.1 1.46e-10 0.00000274 146 -0.426 High ## 4 10976PP SLC16A12 1.39 41.1 1.46e-10 0.00000274 347 -0.0164 High ## 5 10985PP SLC16A12 1.39 41.1 1.46e-10 0.00000274 175 -0.135 High ## 6 11026PP SLC16A12 1.39 41.1 1.46e-10 0.00000274 244 0.125 High ## 7 11045PP SLC16A12 1.39 41.1 1.46e-10 0.00000274 399 -0.0892 High ## 8 11082PP SLC16A12 1.39 41.1 1.46e-10 0.00000274 100 0.261 Neoadjuvant ## 9 11086PP SLC16A12 1.39 41.1 1.46e-10 0.00000274 37 -0.132 Neoadjuvant ## 10 11103PP SLC16A12 1.39 41.1 1.46e-10 0.00000274 73 0.146 Neoadjuvant ## # … with 394,811 more rows You can identify anrtefactual calls from your differential transcribt anundance analysis, due to outliers. ``` r # Import libraries counts.ppc = counts |> mutate(is_significant = FDR < 0.01) |> identify_outliers( formula = ~ Label, .sample = sample, .transcript = symbol, .abundance = value, .significance = PValue, .do_check = is_significant, percent_false_positive_genes = 5 ) ``` The new posterior predictive check has been added to the original data frame ``` r counts.ppc ``` ## # A tibble: 15 x 4 ## symbol sample_wise_data ppc_samples_failed tot_deleterious_outliers ## <chr> <list> <int> <int> ## 1 SLC16A12 <tibble [21 × 12]> 0 0 ## 2 CYP1A1 <tibble [21 × 12]> 1 1 ## 3 ART3 <tibble [21 × 12]> 0 0 ## 4 DIO2 <tibble [21 × 12]> 0 0 ## 5 OR51E2 <tibble [21 × 12]> 0 0 ## 6 MUC16 <tibble [21 × 12]> 0 0 ## 7 CCNA1 <tibble [21 × 12]> 0 0 ## 8 LYZ <tibble [21 × 12]> 1 1 ## 9 PPM1H <tibble [21 × 12]> 0 0 ## 10 SUSD5 <tibble [21 × 12]> 0 0 ## 11 TPRG1 <tibble [21 × 12]> 0 0 ## 12 EPB42 <tibble [21 × 12]> 0 0 ## 13 LRRC38 <tibble [21 × 12]> 0 0 ## 14 SUSD4 <tibble [21 × 12]> 0 0 ## 15 MMP8 <tibble [21 × 12]> 0 0 The new data frame contains plots for each gene We can visualise the top five differentially transcribed genes ``` r counts.ppc_plots = counts.ppc |> plot_credible_intervals() ``` ``` r counts.ppc_plots |> pull(plot) |> head(2) ``` ## [[1]] ![](man/figures/unnamed-chunk-9-1.png)<!-- --> ## ## [[2]] ![](man/figures/unnamed-chunk-9-2.png)<!-- -->