Name Mode Size
..
figures 040000
CORE_bagging.Rd 100644 2 kb
CORE_clustering.Rd 100644 2 kb
CORE_subcluster.Rd 100644 1 kb
PCA.Rd 100644 1 kb
PrinComp_cpp.Rd 100644 1 kb
add_import.Rd 100644 0 kb
annotate_clusters.Rd 100644 1 kb
bootstrap_parallel.Rd 100644 2 kb
bootstrap_prediction.Rd 100644 3 kb
calcDist.Rd 100644 0 kb
calcDistArma.Rd 100644 0 kb
clustering.Rd 100644 2 kb
clustering_bagging.Rd 100644 2 kb
day_2_cardio_cell_sample.Rd 100644 1 kb
day_5_cardio_cell_sample.Rd 100644 1 kb
distvec.Rd 100644 0 kb
find_markers.Rd 100644 2 kb
find_optimal_stability.Rd 100644 2 kb
find_stability.Rd 100644 1 kb
mean_cpp.Rd 100644 0 kb
new_scGPS_object.Rd 100644 2 kb
new_summarized_scGPS_object.Rd 100644 2 kb
plot_CORE.Rd 100644 1 kb
plot_optimal_CORE.Rd 100644 2 kb
plot_reduced.Rd 100644 2 kb
predicting.Rd 100644 2 kb
rand_index.Rd 100644 1 kb
rcpp_Eucl_distance_NotPar.Rd 100644 1 kb
rcpp_parallel_distance.Rd 100644 1 kb
reformat_LASSO.Rd 100644 2 kb
sub_clustering.Rd 100644 1 kb
subset_cpp.Rd 100644 1 kb
summary_accuracy.Rd 100644 1 kb
summary_deviance.Rd 100644 2 kb
summary_prediction_lasso.Rd 100644 2 kb
summary_prediction_lda.Rd 100644 2 kb
tSNE.Rd 100644 1 kb
top_var.Rd 100644 1 kb
tp_cpp.Rd 100644 0 kb
training.Rd 100644 3 kb
training_gene_sample.Rd 100644 1 kb
var_cpp.Rd 100644 0 kb
README.md
# _scGPS_ - Single Cell Global fate Potential of Subpopulations <p align="center"> <img src="man/figures/scGPSlogo.png" width="200px"> </p> The _scGPS_ package website is available at: https://imb-computational-genomics-lab.github.io/scGPS/index.html The usage instruction can be found at: https://imb-computational-genomics-lab.github.io/scGPS/articles/vignette.html ## _scGPS_ general description _scGPS_ is a complete single cell RNA analysis framework from decomposing a mixed population into clusters (_SCORE_) to analysing the relationship between clusters (_scGPS_). _scGPS_ also performs unsupervised selection of predictive genes defining a subpopulation and/or driving transition between subpopulations. The package implements two new algorithms _SCORE_ and _scGPS_. Key features of the _SCORE_ clustering algorithm - Unsupervised (no prior number of clusters), stable (with automated selection of stability and resolution parameters through scanning a range of search windows for each run, together with a boostrapping aggregation approach to determine stable clusters), fast (with Rcpp implementation) - _SCORE_ first builds a reference cluster (the highest resolution) and then runs iterative clustering through 40 windows (or more) in the dendrogram - Resolution is quantified as the divergence from reference by applying adjusted Rand index - Stability is the proportional to the number of executive runs without Rand index change while changing the cluster search space - Optimal resolution is the combination of: stable and high resolution - Bagging algorithm (bootstrap aggregation) can detect a rare subpopulation, which appears multiple times during different decision tree runs Key features of the _scGPS_ algorithm - Estimates transition scores between any two subpopulations - _scGPS_ prediction model is based on Elastic Net procedure, which enables to select predictive genes and train interpretable models to predict each subpopulation - Genes identified by _scGPS_ perform better than known gene markers in predicting cell subpopulations - Transition scores are percents of target cells classified as the same class to the original subpopulation - For cell subtype comparision, transition scores are similarity between two subpopulations - The scores are average values from 100 bootstrap runs - For comparison, a non-shrinkage procedure with linear discriminant analysis (LDA) is used ## _scGPS_ workflow _scGPS_ takes scRNA expression dataset(s) from one or more unknown sample(s) to find subpopulations and relationship between these subpopulations. The input dataset(s) contains mixed, heterogeous cells. _scGPS_ first uses _SCORE_ (or _CORE_ V2.0) to identify homogenous subpopulations. _scGPS_ contains a number of functions to verify the subpopulations identified by _SCORE_ (e.g. functions to compare with results from PCA, tSNE and the imputation method CIDR). _scGPS_ also has options to find gene markers that distinguish a subpopulation from the remaining cells and performs pathway enrichment analysis to annotate subpopulation. In the second stage, _scGPS_ applies a machine learning procedure to select optimal gene predictors and to build prediction models that can estimate between-subpopulation transition scores, which are the probability of cells from one subpopulation that can likely transition to the other subpopulation. <p align="center"> <img src="man/figures/packagePlan.png" width="450px"> <br> Figure 1. scGPS workflow. Yellow boxes show inputs, and green boxes show main scGPS analysis. </p>