... | ... |
@@ -17,6 +17,7 @@ export(celda_G) |
17 | 17 |
export(clusterProbability) |
18 | 18 |
export(clusters) |
19 | 19 |
export(compareCountMatrix) |
20 |
+export(curveElbow) |
|
20 | 21 |
export(differentialExpression) |
21 | 22 |
export(distinct_colors) |
22 | 23 |
export(factorizeMatrix) |
... | ... |
@@ -43,6 +44,8 @@ export(plotGridSearchPerplexity.celda_G) |
43 | 44 |
export(plotHeatmap) |
44 | 45 |
export(recodeClusterY) |
45 | 46 |
export(recodeClusterZ) |
47 |
+export(recursiveSplitCell) |
|
48 |
+export(recursiveSplitModule) |
|
46 | 49 |
export(resList) |
47 | 50 |
export(resamplePerplexity) |
48 | 51 |
export(runParams) |
... | ... |
@@ -13,9 +13,9 @@ |
13 | 13 |
#' @param split.on.last Logical. After `stop.iter` iterations have been performed without improvement, a heuristic will be applied to determine if a cell population should be reassigned and another cell population should be split into two clusters. If a split occurs, then `stop.iter` will be reset. Default TRUE. |
14 | 14 |
#' @param seed Integer. Passed to `set.seed()`. Default 12345. If NULL, no calls to `set.seed()` are made. |
15 | 15 |
#' @param nchains Integer. Number of random cluster initializations. Default 3. |
16 |
-#' @param initialize Chararacter. One of 'random' or 'split'. With 'random', cells are randomly assigned to a clusters. With 'split' cell clusters will be recurssively split into two clusters using `celda_C` until the specified K is reached. Default 'random'. |
|
17 |
-#' @param count.checksum "Character. An MD5 checksum for the `counts` matrix. Default NULL. |
|
16 |
+#' @param z.initialize Character. One of 'random', 'split', or 'predefined'. With 'random', cells are randomly assigned to populations. With 'split', cells will be split into sqrt(K) populations and then each population will subsequently be split into another sqrt(K) populations. With 'predefined', values in `z.init` will be used to initialize `z`. Default 'split'. |
|
18 | 17 |
#' @param z.init Integer vector. Sets initial starting values of z. `z.init` is only used when `z.initialize = 'predefined'`. Default NULL. |
18 |
+#' @param count.checksum Character. An MD5 checksum for the `counts` matrix. Default NULL. |
|
19 | 19 |
#' @param logfile Character. Messages will be redirected to a file named `logfile`. If NULL, messages will be printed to stdout. Default NULL. |
20 | 20 |
#' @param verbose Logical. Whether to print log messages. Default TRUE. |
21 | 21 |
#' @return An object of class `celda_C` with the cell population clusters stored in `z`. |
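For the 'predefined' option, `z.init` has to supply one label per cell. The following is a minimal sketch of such a check, assuming labels must be whole numbers in `1:K`; `validateInit()` is a hypothetical helper for illustration and is not part of celda's API.

```r
# Hypothetical helper (not part of celda): check that a predefined label
# vector has one entry per cell (or feature) and only uses labels in 1..K.
validateInit <- function(init, n, K) {
  if (length(init) != n) {
    stop("init must have length ", n, ", not ", length(init))
  }
  if (anyNA(init) || any(init != round(init))) {
    stop("init must contain whole numbers without NA")
  }
  if (any(init < 1 | init > K)) {
    stop("init labels must lie in 1:", K)
  }
  invisible(as.integer(init))
}

# Four cells assigned to K = 3 populations
z <- validateInit(c(1, 2, 2, 3), n = 4, K = 3)
```

The same check applies to `y.init` with `n` set to the number of features and `K` replaced by `L`.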
... | ... |
@@ -17,10 +17,11 @@ |
17 | 17 |
#' @param split.on.last Logical. After `stop.iter` iterations have been performed without improvement, a heuristic will be applied to determine if a cell population or feature module should be reassigned and another cell population or feature module should be split into two clusters. If a split occurs, then `stop.iter` will be reset. Default TRUE. |
18 | 18 |
#' @param seed Integer. Passed to `set.seed()`. Default 12345. If NULL, no calls to `set.seed()` are made. |
19 | 19 |
#' @param nchains Integer. Number of random cluster initializations. Default 3. |
20 |
-#' @param initialize Chararacter. One of 'random' or 'split'. With 'random', cells and features are randomly assigned to a clusters. With 'split' cell and feature clusters will be recurssively split into two clusters using `celda_C` and `celda_G`, respectively, until the specified K and L is reached. Default 'random'. |
|
21 |
-#' @param count.checksum Character. An MD5 checksum for the `counts` matrix. Default NULL. |
|
20 |
+#' @param z.initialize Character. One of 'random', 'split', or 'predefined'. With 'random', cells are randomly assigned to populations. With 'split', cells will be split into sqrt(K) populations and then each population will subsequently be split into another sqrt(K) populations. With 'predefined', values in `z.init` will be used to initialize `z`. Default 'split'. |
|
21 |
+#' @param y.initialize Character. One of 'random', 'split', or 'predefined'. With 'random', features are randomly assigned to modules. With 'split', features will be split into sqrt(L) modules and then each module will subsequently be split into another sqrt(L) modules. With 'predefined', values in `y.init` will be used to initialize `y`. Default 'split'. |
|
22 | 22 |
#' @param z.init Integer vector. Sets initial starting values of z. `z.init` is only used when `z.initialize = 'predefined'`. Default NULL. |
23 | 23 |
#' @param y.init Integer vector. Sets initial starting values of y. `y.init` is only used when `y.initialize = 'predefined'`. Default NULL. |
24 |
+#' @param count.checksum Character. An MD5 checksum for the `counts` matrix. Default NULL. |
|
24 | 25 |
#' @param logfile Character. Messages will be redirected to a file named `logfile`. If NULL, messages will be printed to stdout. Default NULL. |
25 | 26 |
#' @param verbose Logical. Whether to print log messages. Default TRUE. |
26 | 27 |
#' @return An object of class `celda_CG` with the cell population clusters stored in `z` and feature module clusters stored in `y`. |
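The two-stage 'split' initialization described for `z.initialize` and `y.initialize` can be illustrated with a small R sketch that only computes population counts. The `ceiling(sqrt(K))` rounding rule and the helper name `splitPlan()` are assumptions for illustration, not celda's internal implementation.

```r
# Hypothetical helper (not part of celda): plan a two-stage split.
# Stage 1 creates ceiling(sqrt(K)) populations; stage 2 splits each of
# those into up to ceiling(sqrt(K)) sub-populations. Because
# ceiling(sqrt(K))^2 >= K, at least K candidate populations are available.
splitPlan <- function(K) {
  stopifnot(is.numeric(K), length(K) == 1, K >= 1)
  first <- ceiling(sqrt(K))    # populations after stage 1
  second <- ceiling(sqrt(K))   # sub-populations per stage-1 population
  list(first = first,
       second = second,
       candidates = first * second)
}

plan <- splitPlan(10)
# plan$first is 4 and plan$candidates is 16, which is >= K = 10
```

The same arithmetic applies to feature modules with `L` in place of `K`.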
... | ... |
@@ -13,9 +13,9 @@ |
13 | 13 |
#' @param split.on.last Logical. After `stop.iter` iterations have been performed without improvement, a heuristic will be applied to determine if a feature module should be reassigned and another feature module should be split into two modules. If a split occurs, then `stop.iter` will be reset. Default TRUE. |
14 | 14 |
#' @param seed Integer. Passed to `set.seed()`. Default 12345. If NULL, no calls to `set.seed()` are made. |
15 | 15 |
#' @param nchains Integer. Number of random cluster initializations. Default 3. |
16 |
-#' @param initialize Chararacter. One of 'random' or 'split'. With 'random', features are randomly assigned to a clusters. With 'split' cell and feature clusters will be recurssively split into two clusters using `celda_G()` until the specified L is reached. Default 'random'. |
|
17 |
-#' @param count.checksum Character. An MD5 checksum for the `counts` matrix. Default NULL. |
|
16 |
+#' @param y.initialize Character. One of 'random', 'split', or 'predefined'. With 'random', features are randomly assigned to modules. With 'split', features will be split into sqrt(L) modules and then each module will subsequently be split into another sqrt(L) modules. With 'predefined', values in `y.init` will be used to initialize `y`. Default 'split'. |
|
18 | 17 |
#' @param y.init Integer vector. Sets initial starting values of y. `y.init` is only used when `y.initialize = 'predefined'`. Default NULL. |
18 |
+#' @param count.checksum Character. An MD5 checksum for the `counts` matrix. Default NULL. |
|
19 | 19 |
#' @param logfile Character. Messages will be redirected to a file named `logfile`. If NULL, messages will be printed to stdout. Default NULL. |
20 | 20 |
#' @param verbose Logical. Whether to print log messages. Default TRUE. |
21 | 21 |
#' @return An object of class `celda_G` with the feature module clusters stored in `y`. |
... | ... |
@@ -7,7 +7,7 @@ |
7 | 7 |
celda_C(counts, sample.label = NULL, K, alpha = 1, beta = 1, |
8 | 8 |
algorithm = c("EM", "Gibbs"), stop.iter = 10, max.iter = 200, |
9 | 9 |
split.on.iter = 10, split.on.last = TRUE, seed = 12345, |
10 |
- nchains = 3, initialize = c("random", "split"), |
|
10 |
+ nchains = 3, z.initialize = c("split", "random", "predefined"), |
|
11 | 11 |
count.checksum = NULL, z.init = NULL, logfile = NULL, |
12 | 12 |
verbose = TRUE) |
13 | 13 |
} |
... | ... |
@@ -36,7 +36,7 @@ celda_C(counts, sample.label = NULL, K, alpha = 1, beta = 1, |
36 | 36 |
|
37 | 37 |
\item{nchains}{Integer. Number of random cluster initializations. Default 3.} |
38 | 38 |
|
39 |
-\item{initialize}{Chararacter. One of 'random' or 'split'. With 'random', cells are randomly assigned to a clusters. With 'split' cell clusters will be recurssively split into two clusters using `celda_C` until the specified K is reached. Default 'random'.} |
|
39 |
+\item{z.initialize}{Character. One of 'random', 'split', or 'predefined'. With 'random', cells are randomly assigned to populations. With 'split', cells will be split into sqrt(K) populations and then each population will subsequently be split into another sqrt(K) populations. With 'predefined', values in `z.init` will be used to initialize `z`. Default 'split'.} |
|
40 | 40 |
|
41 | 41 |
\item{count.checksum}{Character. An MD5 checksum for the `counts` matrix. Default NULL.} |
42 | 42 |
|
... | ... |
@@ -8,8 +8,10 @@ celda_CG(counts, sample.label = NULL, K, L, alpha = 1, beta = 1, |
8 | 8 |
delta = 1, gamma = 1, algorithm = c("EM", "Gibbs"), |
9 | 9 |
stop.iter = 10, max.iter = 200, split.on.iter = 10, |
10 | 10 |
split.on.last = TRUE, seed = 12345, nchains = 3, |
11 |
- initialize = c("random", "split"), count.checksum = NULL, |
|
12 |
- z.init = NULL, y.init = NULL, logfile = NULL, verbose = TRUE) |
|
11 |
+ z.initialize = c("split", "random", "predefined"), |
|
12 |
+ y.initialize = c("split", "random", "predefined"), |
|
13 |
+ count.checksum = NULL, z.init = NULL, y.init = NULL, |
|
14 |
+ logfile = NULL, verbose = TRUE) |
|
13 | 15 |
} |
14 | 16 |
\arguments{ |
15 | 17 |
\item{counts}{Integer matrix. Rows represent features and columns represent cells.} |
... | ... |
@@ -42,7 +44,9 @@ celda_CG(counts, sample.label = NULL, K, L, alpha = 1, beta = 1, |
42 | 44 |
|
43 | 45 |
\item{nchains}{Integer. Number of random cluster initializations. Default 3.} |
44 | 46 |
|
45 |
-\item{initialize}{Chararacter. One of 'random' or 'split'. With 'random', cells and features are randomly assigned to a clusters. With 'split' cell and feature clusters will be recurssively split into two clusters using `celda_C` and `celda_G`, respectively, until the specified K and L is reached. Default 'random'.} |
|
47 |
+\item{z.initialize}{Character. One of 'random', 'split', or 'predefined'. With 'random', cells are randomly assigned to populations. With 'split', cells will be split into sqrt(K) populations and then each population will subsequently be split into another sqrt(K) populations. With 'predefined', values in `z.init` will be used to initialize `z`. Default 'split'.} |
|
48 |
+ |
|
49 |
+\item{y.initialize}{Character. One of 'random', 'split', or 'predefined'. With 'random', features are randomly assigned to modules. With 'split', features will be split into sqrt(L) modules and then each module will subsequently be split into another sqrt(L) modules. With 'predefined', values in `y.init` will be used to initialize `y`. Default 'split'.} |
|
46 | 50 |
|
47 | 51 |
\item{count.checksum}{Character. An MD5 checksum for the `counts` matrix. Default NULL.} |
48 | 52 |
|
... | ... |
@@ -6,7 +6,7 @@ |
6 | 6 |
\usage{ |
7 | 7 |
celda_G(counts, L, beta = 1, delta = 1, gamma = 1, stop.iter = 10, |
8 | 8 |
max.iter = 200, split.on.iter = 10, split.on.last = TRUE, |
9 |
- seed = 12345, nchains = 3, initialize = c("random", "split"), |
|
9 |
+ seed = 12345, nchains = 3, y.initialize = c("split", "random"), |
|
10 | 10 |
count.checksum = NULL, y.init = NULL, logfile = NULL, |
11 | 11 |
verbose = TRUE) |
12 | 12 |
} |
... | ... |
@@ -33,7 +33,7 @@ celda_G(counts, L, beta = 1, delta = 1, gamma = 1, stop.iter = 10, |
33 | 33 |
|
34 | 34 |
\item{nchains}{Integer. Number of random cluster initializations. Default 3.} |
35 | 35 |
|
36 |
-\item{initialize}{Chararacter. One of 'random' or 'split'. With 'random', features are randomly assigned to a clusters. With 'split' cell and feature clusters will be recurssively split into two clusters using `celda_G()` until the specified L is reached. Default 'random'.} |
|
36 |
+\item{y.initialize}{Character. One of 'random', 'split', or 'predefined'. With 'random', features are randomly assigned to modules. With 'split', features will be split into sqrt(L) modules and then each module will subsequently be split into another sqrt(L) modules. With 'predefined', values in `y.init` will be used to initialize `y`. Default 'split'.} |
|
37 | 37 |
|
38 | 38 |
\item{count.checksum}{Character. An MD5 checksum for the `counts` matrix. Default NULL.} |
39 | 39 |
|
40 | 40 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,62 @@ |
1 |
+% Generated by roxygen2: do not edit by hand |
|
2 |
+% Please edit documentation in R/recursiveSplit.R |
|
3 |
+\name{recursiveSplitCell} |
|
4 |
+\alias{recursiveSplitCell} |
|
5 |
+\title{Recursive cell splitting} |
|
6 |
+\usage{ |
|
7 |
+recursiveSplitCell(counts, sample.label = NULL, initial.K = 5, |
|
8 |
+ max.K = 25, y.init = NULL, alpha = 1, beta = 1, delta = 1, |
|
9 |
+ gamma = 1, min.cell = 3, perplexity = TRUE, seed = 12345, |
|
10 |
+ logfile = NULL, verbose = TRUE) |
|
11 |
+} |
|
12 |
+\arguments{ |
|
13 |
+\item{counts}{Integer matrix. Rows represent features and columns represent cells.} |
|
14 |
+ |
|
15 |
+\item{sample.label}{Vector or factor. Denotes the sample label for each cell (column) in the count matrix.} |
|
16 |
+ |
|
17 |
+\item{initial.K}{Integer. Minimum number of cell populations to try. Default 5.} |
|
18 |
+ |
|
19 |
+\item{max.K}{Integer. Maximum number of cell populations to try. Default 25.} |
|
20 |
+ |
|
21 |
+\item{y.init}{Integer vector. Module labels for features. Cells will be clustered using the `celda_CG` model based on the modules specified in `y.init` rather than the counts of individual features. While the feature module labels will be initialized to `y.init`, they will be allowed to move within each model with a new K. Default NULL.} |
|
22 |
+ |
|
23 |
+\item{alpha}{Numeric. Concentration parameter for Theta. Adds a pseudocount to each cell population in each sample. Default 1.} |
|
24 |
+ |
|
25 |
+\item{beta}{Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature in each cell (if `y.init` is not used) or to each module in each cell population (if `y.init` is set). Default 1.} |
|
26 |
+ |
|
27 |
+\item{delta}{Numeric. Concentration parameter for Psi. Adds a pseudocount to each feature in each module. Only used if `y.init` is set. Default 1.} |
|
28 |
+ |
|
29 |
+\item{gamma}{Numeric. Concentration parameter for Eta. Adds a pseudocount to the number of features in each module. Only used if `y.init` is set. Default 1.} |
|
30 |
+ |
|
31 |
+\item{min.cell}{Integer. Only attempt to split cell populations with at least this many cells. Default 3.} |
|
32 |
+ |
|
33 |
+\item{perplexity}{Logical. Whether to calculate perplexity for each model. If FALSE, then perplexity can be calculated later with `resamplePerplexity()`. Default TRUE.} |
|
34 |
+ |
|
35 |
+\item{seed}{Integer. Passed to `set.seed()`. Default 12345. If NULL, no calls to `set.seed()` are made.} |
|
36 |
+ |
|
37 |
+\item{logfile}{Character. Messages will be redirected to a file named `logfile`. If NULL, messages will be printed to stdout. Default NULL.} |
|
38 |
+ |
|
39 |
+\item{verbose}{Logical. Whether to print log messages. Default TRUE.} |
|
40 |
+} |
|
41 |
+\value{ |
|
42 |
+Object of class `celda_list`, which contains results for all model parameter combinations and summaries of the run parameters. |
|
43 |
+} |
|
44 |
+\description{ |
|
45 |
+Uses the `celda_C` model to cluster cells into populations for a range of possible values of K. The cell population labels of the previous "K-1" model are used as the initial values in the current model with K cell populations. The best split of an existing cell population is found to create the K-th cluster. This procedure is much faster than randomly initializing each model with a different K. If module labels for each feature are given in `y.init`, the `celda_CG` model will be used to split cell populations based on those modules instead of individual features. Module labels will also be updated during sampling and thus may end up slightly different from `y.init`. |
|
46 |
+} |
|
47 |
+\examples{ |
|
48 |
+## Create models that range from K=3 to K=10 by recursively splitting cell populations into two to produce `celda_C` cell clustering models |
|
49 |
+testZ = recursiveSplitCell(celda.C.sim$counts, initial.K = 3, max.K=10) |
|
50 |
+ |
|
51 |
+## Alternatively, first identify feature modules using `recursiveSplitModule()` |
|
52 |
+module.split = recursiveSplitModule(celda.CG.sim$counts, initial.L = 3, max.L=20) |
|
53 |
+plotGridSearchPerplexity(module.split) |
|
54 |
+module.split.select = subsetCeldaList(module.split, list(L=10)) |
|
55 |
+## Then, use module labels for initialization in `recursiveSplitCell()` to produce `celda_CG` bi-clustering models |
|
56 |
+cell.split = recursiveSplitCell(celda.CG.sim$counts, initial.K = 3, max.K=20, y.init = clusters(module.split.select)$y) |
|
57 |
+plotGridSearchPerplexity(cell.split) |
|
58 |
+celda.mod = subsetCeldaList(cell.split, list(K=5, L=10)) |
|
59 |
+} |
|
60 |
+\seealso{ |
|
61 |
+`recursiveSplitModule()` for recursive splitting of feature modules. |
|
62 |
+} |
0 | 63 |
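With `y.init`, `recursiveSplitCell()` clusters cells on module-level rather than feature-level counts. One way to picture that collapsing step, offered as an assumption about the idea rather than celda's internal code, is to sum the rows of `counts` that share a module label with base R's `rowsum()`; `collapseFeatures()` is a hypothetical helper.

```r
# Hypothetical helper (not part of celda): collapse a feature-by-cell count
# matrix into a module-by-cell matrix by summing rows that share a module
# label in y. base::rowsum() returns one row per label, in sorted order.
collapseFeatures <- function(counts, y) {
  stopifnot(is.matrix(counts), length(y) == nrow(counts))
  rowsum(counts, group = y)
}

counts <- matrix(c(1L, 0L, 2L,
                   3L, 1L, 0L,
                   0L, 4L, 1L,
                   2L, 2L, 2L),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))
y <- c(1, 2, 1, 2)   # module label for each feature (row)
modules <- collapseFeatures(counts, y)
```

Here `modules` has two rows (modules 1 and 2) and three columns (cells), and its total matches `sum(counts)`.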
new file mode 100644 |
... | ... |
@@ -0,0 +1,57 @@ |
1 |
+% Generated by roxygen2: do not edit by hand |
|
2 |
+% Please edit documentation in R/recursiveSplit.R |
|
3 |
+\name{recursiveSplitModule} |
|
4 |
+\alias{recursiveSplitModule} |
|
5 |
+\title{Recursive module splitting} |
|
6 |
+\usage{ |
|
7 |
+recursiveSplitModule(counts, initial.L = 10, max.L = 100, |
|
8 |
+ temp.K = 100, z.init = NULL, beta = 1, delta = 1, gamma = 1, |
|
9 |
+ min.feature = 3, perplexity = TRUE, seed = 12345, verbose = TRUE, |
|
10 |
+ logfile = NULL) |
|
11 |
+} |
|
12 |
+\arguments{ |
|
13 |
+\item{counts}{Integer matrix. Rows represent features and columns represent cells.} |
|
14 |
+ |
|
15 |
+\item{initial.L}{Integer. Minimum number of modules to try. Default 10.} |
|
16 |
+ |
|
17 |
+\item{max.L}{Integer. Maximum number of modules to try. Default 100.} |
|
18 |
+ |
|
19 |
+\item{z.init}{Integer vector. Collapse cells to cell populations based on labels in `z.init` and then perform module splitting. If NULL, no collapsing will be performed unless `temp.K` is specified. Default NULL.} |
|
20 |
+ |
|
21 |
+\item{beta}{Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature module in each cell. Default 1.} |
|
22 |
+ |
|
23 |
+\item{delta}{Numeric. Concentration parameter for Psi. Adds a pseudocount to each feature in each module. Default 1.} |
|
24 |
+ |
|
25 |
+\item{gamma}{Numeric. Concentration parameter for Eta. Adds a pseudocount to the number of features in each module. Default 1.} |
|
26 |
+ |
|
27 |
+\item{min.feature}{Integer. Only attempt to split modules with at least this many features. Default 3.} |
|
28 |
+ |
|
29 |
+\item{perplexity}{Logical. Whether to calculate perplexity for each model. If FALSE, then perplexity can be calculated later with `resamplePerplexity()`. Default TRUE.} |
|
30 |
+ |
|
31 |
+\item{seed}{Integer. Passed to `set.seed()`. Default 12345. If NULL, no calls to `set.seed()` are made.} |
|
32 |
+ |
|
33 |
+\item{verbose}{Logical. Whether to print log messages. Default TRUE.} |
|
34 |
+ |
|
35 |
+\item{logfile}{Character. Messages will be redirected to a file named `logfile`. If NULL, messages will be printed to stdout. Default NULL.} |
|
36 |
+ |
|
37 |
+\item{temp.K}{Integer. Number of temporary cell populations to identify and use in module splitting. Only used if `z.init = NULL`. Collapsing cells to a relatively smaller number of cell populations will increase the speed of module clustering and tend to produce better modules. This number should be larger than the number of true cell populations expected in the dataset. Default 100.} |
|
38 |
+} |
|
39 |
+\value{ |
|
40 |
+Object of class `celda_list`, which contains results for all model parameter combinations and summaries of the run parameters. |
|
41 |
+} |
|
42 |
+\description{ |
|
43 |
+Uses the `celda_G` model to cluster features into modules for a range of possible L's. The module labels of the previous "L-1" model are used as the initial values in the current model with L modules. The best split of an existing module is found to create the L-th module. This procedure is much faster than randomly initializing each model with a different L. |
|
44 |
+} |
|
45 |
+\examples{ |
|
46 |
+## Create models that range from L=3 to L=20 by recursively splitting modules into two |
|
47 |
+module.split = recursiveSplitModule(celda.CG.sim$counts, initial.L = 3, max.L=20) |
|
48 |
+ |
|
49 |
+## Example results with perplexity |
|
50 |
+plotGridSearchPerplexity(module.split) |
|
51 |
+ |
|
52 |
+## Select model for downstream analysis |
|
53 |
+celda.mod = subsetCeldaList(module.split, list(L=10)) |
|
54 |
+} |
|
55 |
+\seealso{ |
|
56 |
+`recursiveSplitCell()` for recursive splitting of cell populations. |
|
57 |
+} |
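`z.init` and `temp.K` describe collapsing cells into populations before module splitting. A sketch of that step, assuming a simple per-population sum of counts (`collapseCells()` is a hypothetical helper and celda's implementation may differ), transposes the matrix so that `rowsum()` can group the columns:

```r
# Hypothetical helper (not part of celda): collapse a feature-by-cell count
# matrix into a feature-by-population matrix by summing the columns (cells)
# that share a population label in z. rowsum() groups rows, so the matrix
# is transposed before and after the call.
collapseCells <- function(counts, z) {
  stopifnot(is.matrix(counts), length(z) == ncol(counts))
  t(rowsum(t(counts), group = z))
}

counts <- matrix(c(1L, 0L, 2L,
                   3L, 1L, 0L,
                   0L, 4L, 1L,
                   2L, 2L, 2L),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))
z <- c(1, 1, 2)   # population label for each cell (column)
populations <- collapseCells(counts, z)
```

Module splitting then works on a matrix with one column per population instead of one per cell, which is where the speed-up described for `temp.K` would come from.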