Bioconductor Code: M3C

Name	Mode	Size
R	040000
data	040000
man	040000
vignettes	040000
.Rbuildignore	100755	0 kb
.gitignore	100755	0 kb
DESCRIPTION	100644	1 kb
NAMESPACE	100755	0 kb
NEWS	100755
README.md	100755	2 kb

README.md

# M3C: Monte Carlo Consensus Clustering

Genome-wide data is used to stratify patients into classes using class discovery algorithms. However, we have observed systematic bias in current state-of-the-art methods. This bias arises from not considering reference distributions when selecting the number of classes (K). As a solution, we developed a consensus clustering-based algorithm with a hypothesis testing framework, called Monte Carlo consensus clustering (M3C). M3C uses a multi-core enabled Monte Carlo simulation to generate null distributions along the range of K, which are used to calculate p values for selecting K. P values beyond the limits of the simulation are estimated using a beta distribution. M3C can quantify structural relationships between clusters and uses spectral clustering to deal with non-Gaussian and imbalanced structures.

Details:

- M3C calculates the consensus rate, a measure of the stability of samples, which is quantified for each K using the PAC score
- Generation of a reference PAC distribution using a multi-core Monte Carlo simulation
- Reference generation preserves the gene-gene correlation structure of the data
- The relative cluster stability index (RCSI) and empirical p values are used instead of delta K
- Extrapolated p values are calculated by fitting a beta distribution
- Increased accuracy compared with other methods, verified using simulations
- Controls for the null hypothesis K = 1
- Removes systematic bias
- Ability to investigate structural relationships using hierarchical clustering of medoids and sigclust
- Inner algorithms are PAM, K-means, and spectral clustering
- Automatic reordering of the expression matrix and annotation data to help users do their analysis faster
- Plotting code using ggplot2 for publication-quality outputs
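
To make the PAC score and the empirical p value more concrete, here is a minimal Python sketch. It is not the package's R code. It assumes the commonly used definition of PAC (the fraction of sample pairs whose consensus index falls in an ambiguous window, here 0.1 to 0.9) and a simple Monte Carlo p value with a +1 correction. The function names, the thresholds, and the toy numbers are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def pac_score(consensus, lower=0.1, upper=0.9):
    """Fraction of sample pairs with an ambiguous consensus index.

    `consensus` is a symmetric matrix (list of lists) where entry [i][j]
    is the share of resampling runs in which samples i and j clustered
    together. The (lower, upper) window is an assumed default.
    """
    pairs = list(combinations(range(len(consensus)), 2))
    ambiguous = sum(1 for i, j in pairs if lower < consensus[i][j] < upper)
    return ambiguous / len(pairs)


def empirical_p_value(real_pac, reference_pacs):
    """Share of reference PAC scores that are at least as low as the real one.

    A lower PAC means a more stable clustering, so the test is one-sided.
    The +1 terms keep the estimate away from exactly zero.
    """
    as_extreme = sum(1 for ref in reference_pacs if ref <= real_pac)
    return (as_extreme + 1) / (len(reference_pacs) + 1)


# Toy consensus matrix for four samples at one value of K (made-up numbers).
consensus = [
    [1.00, 0.95, 0.05, 0.50],
    [0.95, 1.00, 0.00, 0.92],
    [0.05, 0.00, 1.00, 0.40],
    [0.50, 0.92, 0.40, 1.00],
]

# Hypothetical PAC scores from clustering simulated reference data sets.
reference_pacs = [0.6, 0.5, 0.7, 0.3, 0.8, 0.9, 0.65, 0.75, 0.55]

real_pac = pac_score(consensus)
p_value = empirical_p_value(real_pac, reference_pacs)
print(f"PAC = {real_pac:.3f}, empirical p = {p_value:.3f}")
```

In this sketch the p value cannot fall below 1 / (number of simulations + 1), which is the resolution limit that the README's beta-distribution extrapolation is meant to get past.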