Name Mode Size
R 040000
data 040000
inst 040000
man 040000
tests 040000
vignettes 040000
.Rbuildignore 100644 0 kb
.gitignore 100644 0 kb
.travis.yml 100644 0 kb
DESCRIPTION 100644 2 kb
NAMESPACE 100755 1 kb
NEWS 100755 2 kb
README.Rmd 100644 7 kb
README.md 100644 7 kb
<!-- README.md is generated from README.Rmd. Please edit that file -->

<a href=""><img border="0" src="" title="How long since the package was first in a released Bioconductor version (or is it in devel only)."></a>
<a href=""><img border="0" src="" title="Percentile (top 5/20/50% or 'available') of downloads over last 6 full months. Comparison is done across all package categories (software, annotation, experiment)."></a>
<a href=""><img border="0" src="" title="Support site activity, last 6 months: tagged questions/avg. answers per question/avg. comments per question/accepted answers, or 0 if no tagged posts."></a>
<a href=""><img border="0" src="" title="average Subversion commits (to the devel branch) per month for the last 6 months"></a>

Status: Travis CI [![Build Status]()]()

Bioc-release <a href=""><img border="0" src="" title="Whether the package is available on all platforms; click for details."></a> <a href=""><img border="0" src="" title="build results; click for full report"></a>

Bioc-devel <a href=""><img border="0" src="" title="Whether the package is available on all platforms; click for details."></a> <a href=""><img border="0" src="" title="build results; click for full report"></a>

Codecov [![codecov]()]()

# ClusterSignificance

The ClusterSignificance package is written in [R]() and is hosted in the [Bioconductor]() repository, where it can be found via the links below.

- [release]()
- [devel]()

## Introduction

The ClusterSignificance package provides tools to assess whether clusters, e.g. in a principal component analysis (PCA), are separated differently from what is seen in random or permuted data. This is accomplished in a three-step process: *projection*, *classification*, and *permutation*. To compare cluster separations, each separation must be given a score. First, all data points in each cluster are projected onto a line (*projection*), after which the separation of two groups at a time is scored (*classification*).
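
As a rough illustration of the first two steps, the sketch below uses base R only. It is not the package's implementation: the toy data, the use of the first principal component as the projection line, and the scoring rule (the number of correctly separated points) are all assumptions made for this example.

``` r
## Simplified illustration only; this is NOT how ClusterSignificance
## computes its projections or scores.
set.seed(1)

## Two hypothetical groups of 20 points in two dimensions (toy data).
groupA <- matrix(rnorm(40, mean = 0), ncol = 2)
groupB <- matrix(rnorm(40, mean = 3), ncol = 2)
pts <- rbind(groupA, groupB)
grp <- rep(c("A", "B"), each = 20)

## Projection: reduce every point to its coordinate along the first
## principal component, i.e. its position on a single line.
projected <- prcomp(pts)$x[, 1]

## Classification: place a threshold between each pair of neighbouring
## points and score it by the number of correctly separated points
## (an assumed scoring rule, chosen for simplicity).
sorted <- sort(projected)
thresholds <- (head(sorted, -1) + tail(sorted, -1)) / 2
scoreThreshold <- function(threshold) {
  left <- grp[projected < threshold]
  right <- grp[projected >= threshold]
  ## Either group may sit on either side of the threshold.
  max(sum(left == "A") + sum(right == "B"),
      sum(left == "B") + sum(right == "A"))
}
scores <- vapply(thresholds, scoreThreshold, integer(1))
maxScore <- max(scores)
```

Here `maxScore` plays the role of the best separation score for the pair of groups; the package's own scoring is described in its vignette.
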
Furthermore, to obtain a p-value for the separation, we compare the separation score for the real data to the separation scores for permuted data (*permutation*).

## Installation

The release version of ClusterSignificance can be installed in R from [Bioconductor]() as follows:

``` r
install.packages("BiocManager")
BiocManager::install("ClusterSignificance")
```

To install the development version use:

``` r
install.packages("devtools")
devtools::install_github("jasonserviss/ClusterSignificance")
```

## Quick Start

While we recommend reading the [vignette](), the instructions that follow will allow you to quickly get a feel for how ClusterSignificance works and what it is capable of. Here we utilize the example data included in the ClusterSignificance package for the Pcp method.

### Projection

We start by projecting the points into one dimension using the Pcp method. Each step of the projection can be visualized by plotting the results, as shown below.

``` r
library(ClusterSignificance)

## Use the row names of the example matrix as class labels.
classes <- rownames(pcpMatrix)

## Project and plot.
prj <- pcp(pcpMatrix, classes)
plot(prj)
```

<img src="man/figures/pcpPrj.png" align="center" />

### Classification

Now that the points are in one dimension, we can score each possible separation and deduce the max separation score. This is accomplished with the `classify` function (again, we can plot the results afterwards). The vertical lines in the plot represent the separation score for each possible separation.

``` r
## Classify and plot.
cl <- classify(prj)
plot(cl)
```

<img src="man/figures/pcpCl.png" align="center" />

### Permutation

Finally, having determined the max separation score, we can permute the data to examine how many permuted max scores exceed our real max score and, thus, calculate a p-value for our separation. Plotting the permutation results shows a histogram of the permuted max scores, with the red line representing the real score.

``` r
## Set the seed and number of iterations.
set.seed(3)
iterations <- 100

## Permute and plot.
pe <- permute(
  mat = pcpMatrix,
  iter = iterations,
  classes = classes,
  projmethod = "pcp"
)
```

    ## initializing permutation analysis
    ## 100 iterations were sucessfully completed for comparison class1 vs class2
    ## 100 iterations were sucessfully completed for comparison class1 vs class3
    ## 100 iterations were sucessfully completed for comparison class2 vs class3

``` r
plot(pe)
```

<img src="man/figures/pcpPerm.png" align="center" />

To calculate the p-value for each comparison we call `pvalue(pe)`, which returns:

    ## class1 vs class2 class1 vs class3 class2 vs class3
    ##             0.01             0.15             0.01

## Bug Reports and Issues

The Bioconductor support site for the ClusterSignificance package is located [here](). Issues and bugs can be reported via GitHub at: [ClusterSignificance]()

## Citation

Jason T. Serviss, Jesper R. Gådin, Per Eriksson, Lasse Folkersen, Dan Grandér; ClusterSignificance: a bioconductor package facilitating statistical analysis of class cluster separations in dimensionality reduced data, Bioinformatics, Volume 33, Issue 19, 1 October 2017, Pages 3126–3128, <>

Citation information can be found in R using:

``` r
library(ClusterSignificance)
citation("ClusterSignificance")
```

## License

[GPL-3]()