Name Mode Size
..
CITATION 100644 1 kb
README.md
[![Travis-CI Build Status](https://travis-ci.com/jhsiao999/peco.svg?branch=master)](https://travis-ci.com/jhsiao999/peco) [![AppVeyor build status](https://ci.appveyor.com/api/projects/status/github/jhsiao999/peco?branch=master&svg=true)](https://ci.appveyor.com/project/jhsiao999/peco) [![CircleCI build status](https://circleci.com/gh/jhsiao999/peco.svg?style=svg)](https://circleci.com/gh/jhsiao999/peco) [![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](http://www.gnu.org/licenses/gpl-3.0) # peco **peco** is an R package for **P**r**E**dicting **C**ell cycle pr**O**gression in a continuum using scRNA-seq data. ## Installation To install and load the package, run: ``` r install.packages("devtools") library(devtools) install_github("jhsiao999/peco") ``` `peco` uses `SingleCellExperiment` class objects. ``` r library(peco) library(SingleCellExperiment) ``` ## Loading required package: SummarizedExperiment ## Loading required package: GenomicRanges ## Loading required package: stats4 ## Loading required package: BiocGenerics ## Loading required package: parallel ## ## Attaching package: 'BiocGenerics' ## The following objects are masked from 'package:parallel': ## ## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, ## clusterExport, clusterMap, parApply, parCapply, parLapply, ## parLapplyLB, parRapply, parSapply, parSapplyLB ## The following objects are masked from 'package:stats': ## ## IQR, mad, sd, var, xtabs ## The following objects are masked from 'package:base': ## ## anyDuplicated, append, as.data.frame, basename, cbind, ## colMeans, colnames, colSums, dirname, do.call, duplicated, ## eval, evalq, Filter, Find, get, grep, grepl, intersect, ## is.unsorted, lapply, lengths, Map, mapply, match, mget, order, ## paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, ## Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort, ## table, tapply, union, unique, unsplit, which, which.max, ## which.min ## Loading required package: S4Vectors ## ## Attaching package: 'S4Vectors' ## The following object is masked from 'package:base': ## ## expand.grid ## Loading required package: IRanges ## Loading required package: GenomeInfoDb ## Loading required package: Biobase ## Welcome to Bioconductor ## ## Vignettes contain introductory material; view with ## 'browseVignettes()'. To cite Bioconductor, see ## 'citation("Biobase")', and for packages 'citation("pkgname")'. ## Loading required package: DelayedArray ## Loading required package: matrixStats ## ## Attaching package: 'matrixStats' ## The following objects are masked from 'package:Biobase': ## ## anyMissing, rowMedians ## Loading required package: BiocParallel ## ## Attaching package: 'DelayedArray' ## The following objects are masked from 'package:matrixStats': ## ## colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges ## The following objects are masked from 'package:base': ## ## aperm, apply ``` r library(doParallel) ``` ## Loading required package: foreach ## Loading required package: iterators ``` r library(foreach) ``` ## Overview `peco` is a supervised approach for PrEdicting cell cycle phase in a COntinuum using single-cell RNA sequencing data. The R package provides functions to build training dataset and also functions to use existing training data to predict cell cycle on a continuum. Our work demonstrated that peco is able to predict continuous cell cylce phase using a small set of cylcic genes: *CDK1*, *UBE2C*, *TOP2A*, *HISTH1E*, and *HISTH1C* (identified as cell cycle marker genes in studies of yeast ([Spellman et al., 1998](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC25624)) and HeLa cells ([Whitfield et al., 2002](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC117619))). Below we provide two use cases. Vignette 1 shows how to use the built-training dataset to predict continuous cell cycle. Vignette 2 shows how to make a training datast and build a predictor using training data. Users can also view the vigenettes via `browseVignettes("peco")`. ## About the training dataset `training_human` stores built-in training data of 101 significant cyclic genes. Below are the slots contained in `training_human`: - `predict.yy`: a gene by sample matrix (101 by 888) that stores predict cyclic expression values. - `cellcycle_peco_reordered`: cell cycle phase in a unit circle (angle), ordered from 0 to 2\(pi\) - `cellcycle_function`: lists of 101 function corresponding to the top 101 cyclic genes identified in our dataset - `sigma`: standard error associated with cyclic trends of gene expression - `pve`: proportion of variance explained by the cyclic trend <!-- end list --> ``` r data("training_human") ``` ## Predict cell cycle phase using gene expression data `peco` is integrated with `SingleCellExperiment` object in Bioconductor. Below shows an example of inputting `SingleCellExperiment` object to perform cell cycle phase prediction. `sce_top101genes` includes 101 genes and 888 single-cell samples and one assay slot of `counts`. ``` r data("sce_top101genes") assays(sce_top101genes) ``` ## List of length 1 ## names(1): counts Transform the expression values to quantile-normalizesd counts-per-million values. `peco` uses the `cpm_quantNormed` slot as input data for predictions. ``` r sce_top101genes <- data_transform_quantile(sce_top101genes) ``` ## computing on 2 cores ``` r assays(sce_top101genes) ``` ## List of length 3 ## names(3): counts cpm cpm_quantNormed Apply the prediction model using function `cycle_npreg_outsample` and generate prediction results contained in a list object `pred_top101genes`. ``` r pred_top101genes <- cycle_npreg_outsample( Y_test=sce_top101genes, sigma_est=training_human$sigma[rownames(sce_top101genes),], funs_est=training_human$cellcycle_function[rownames(sce_top101genes)], method.trend="trendfilter", ncores=1, get_trend_estimates=FALSE) ``` The `pred_top101genes$Y` contains a SingleCellExperiment object with the predict cell cycle phase in the `colData` slot. ``` r head(colData(pred_top101genes$Y)$cellcycle_peco) ``` ## 20170905-A01 20170905-A02 20170905-A03 20170905-A06 20170905-A07 ## 1.099557 4.555309 2.481858 4.303982 4.052655 ## 20170905-A08 ## 1.413717 Visualize results of prediction for one gene. Below we choose CDK1 (“ENSG00000170312”). Because CDK1 is a known cell cycle gene, this visualization serves as a sanity check for the results of fitting. The fitted function `training_human$cellcycle_function[[1]]` was obtained from our training data. ``` r plot(y=assay(pred_top101genes$Y,"cpm_quantNormed")["ENSG00000170312",], x=colData(pred_top101genes$Y)$theta_shifted, main = "CDK1", ylab = "quantile normalized expression") points(y=training_human$cellcycle_function[["ENSG00000170312"]](seq(0,2*pi, length.out=100)), x=seq(0,2*pi, length.out=100), col = "blue", pch =16) ``` ![](README_files/figure-gfm/unnamed-chunk-7-1.png)<!-- --> ## Visualize cyclic expression trend based on predicted phase Visualize results of prediction for the top 10 genesone genes. Use `fit_cyclical_many` to estimate cyclic function based on the input data. ``` r # predicted cell time in the input data theta_predict = colData(pred_top101genes$Y)$cellcycle_peco names(theta_predict) = rownames(colData(pred_top101genes$Y)) # expression values of 10 genes in the input data yy_input = assay(pred_top101genes$Y,"cpm_quantNormed")[1:6,] # apply trendfilter to estimate cyclic gene expression trend fit_cyclic <- fit_cyclical_many(Y=yy_input, theta=theta_predict) ``` ## computing on 2 cores ``` r gene_symbols = rowData(pred_top101genes$Y)$hgnc[rownames(yy_input)] par(mfrow=c(2,3)) for (i in 1:6) { plot(y=yy_input[i,], x=fit_cyclic$cellcycle_peco_ordered, main = gene_symbols[i], ylab = "quantile normalized expression") points(y=fit_cyclic$cellcycle_function[[i]](seq(0,2*pi, length.out=100)), x=seq(0,2*pi, length.out=100), col = "blue", pch =16) } ``` ![](README_files/figure-gfm/unnamed-chunk-8-1.png)<!-- --> ## Session information ``` r sessionInfo() ``` ## R version 3.5.1 (2018-07-02) ## Platform: x86_64-pc-linux-gnu (64-bit) ## Running under: Scientific Linux 7.4 (Nitrogen) ## ## Matrix products: default ## BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so ## ## locale: ## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C ## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 ## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 ## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C ## [9] LC_ADDRESS=C LC_TELEPHONE=C ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C ## ## attached base packages: ## [1] parallel stats4 stats graphics grDevices utils datasets ## [8] methods base ## ## other attached packages: ## [1] doParallel_1.0.14 iterators_1.0.12 ## [3] foreach_1.4.4 SingleCellExperiment_1.4.1 ## [5] SummarizedExperiment_1.12.0 DelayedArray_0.8.0 ## [7] BiocParallel_1.16.0 matrixStats_0.55.0 ## [9] Biobase_2.42.0 GenomicRanges_1.34.0 ## [11] GenomeInfoDb_1.18.1 IRanges_2.16.0 ## [13] S4Vectors_0.20.1 BiocGenerics_0.28.0 ## [15] peco_0.99.10 ## ## loaded via a namespace (and not attached): ## [1] Rcpp_1.0.3 mvtnorm_1.0-11 ## [3] lattice_0.20-38 assertthat_0.2.1 ## [5] rprojroot_1.3-2 digest_0.6.20 ## [7] plyr_1.8.4 R6_2.4.0 ## [9] backports_1.1.2 genlasso_1.4 ## [11] evaluate_0.12 pracma_2.2.9 ## [13] ggplot2_3.2.1 pillar_1.3.1 ## [15] zlibbioc_1.28.0 rlang_0.4.0 ## [17] lazyeval_0.2.1 Matrix_1.2-17 ## [19] rmarkdown_1.10 geigen_2.3 ## [21] stringr_1.3.1 igraph_1.2.2 ## [23] RCurl_1.95-4.11 munsell_0.5.0 ## [25] HDF5Array_1.10.1 vipor_0.4.5 ## [27] compiler_3.5.1 pkgconfig_2.0.3 ## [29] ggbeeswarm_0.6.0 htmltools_0.3.6 ## [31] tidyselect_0.2.5 tibble_2.1.1 ## [33] gridExtra_2.3 GenomeInfoDbData_1.2.0 ## [35] codetools_0.2-15 viridisLite_0.3.0 ## [37] crayon_1.3.4 dplyr_0.8.0.1 ## [39] MASS_7.3-51.1 bitops_1.0-6 ## [41] grid_3.5.1 gtable_0.2.0 ## [43] magrittr_1.5 scales_1.0.0 ## [45] stringi_1.2.4 reshape2_1.4.3 ## [47] XVector_0.22.0 viridis_0.5.1 ## [49] scater_1.10.1 DelayedMatrixStats_1.4.0 ## [51] boot_1.3-20 Rhdf5lib_1.4.3 ## [53] tools_3.5.1 beeswarm_0.2.3 ## [55] glue_1.3.0 purrr_0.3.2 ## [57] yaml_2.2.0 colorspace_1.3-2 ## [59] rhdf5_2.26.2 circular_0.4-93 ## [61] conicfit_1.0.4 knitr_1.20 ----- #### Contact Please contact me at [joyce.hsiao1@gmail.com](joyce.hsiao1@gmail.com) for questions on the package or the methods. #### How to cite > Hsiao, C. J., Tung, P., Blischak, J. D., Burnett, J., Dey, K. K., > Barr, A. K., Stephens, M., and Gilad, Y. (2018). [Characterizing and > inferring quantitative cell-cycle phase in single-cell RNA-seq data > analysis.](https://doi.org/10.1101/526848) bioRxiv > <doi:10.1101/526848> #### License Copyright (c) 2019-2020, Joyce Hsiao. All source code and software in this repository are made available under the terms of the [GNU General Public License](https://www.gnu.org/licenses/gpl-3.0.en.html). See file [LICENSE](LICENSE) for the full text of the license.