@@ -3,8 +3,8 @@ Type: Package
 Title: A framework for cross-validated classification problems, with
        applications to differential variability and differential
        distribution testing
-Version: 3.3.10
-Date: 2022-12-12
+Version: 3.3.11
+Date: 2023-02-10
 Authors@R:
     c(
     person(given = "Dario", family = "Strbenac", email = "dario.strbenac@sydney.edu.au", role = c("aut", "cre")),
@@ -25,7 +25,7 @@ Suggests: limma, edgeR, car, Rmixmod, gridExtra (>= 2.0.0), cowplot,
     BiocStyle, pamr, PoiClaClu, parathyroidSE, knitr, htmltools, gtable,
     scales, e1071, rmarkdown, IRanges, robustbase, glmnet, class, randomForestSRC,
     MatrixModels, xgboost
-Description: The software formalises a framework for classification and survival model evaluatio
+Description: The software formalises a framework for classification and survival model evaluation
  in R. There are four stages; Data transformation, feature selection, model training,
  and prediction. The requirements of variable types and variable order are
  fixed, but specialised variables for functions can also be provided.
@@ -35,7 +35,7 @@ Description: The software formalises a framework for classification and survival
 may be developed by the user, by creating an interface to the framework.
 License: GPL-3
 Packaged: 2014-10-18 11:16:55 UTC; dario
-RoxygenNote: 7.2.2
+RoxygenNote: 7.2.3
 NeedsCompilation: yes
 Collate:
     'ROCplot.R'
@@ -1,11 +1,13 @@
 #' Cross-validation to evaluate classification performance.
 #'
 #' This function has been designed to facilitate the comparison of classification
-#' methods using cross-validation. A selection of typical comparisons are implemented. The \code{train} function
-#' is a convenience method for training on one data set and predicting on an independent validation data set.
+#' methods using cross-validation, particularly when there are multiple assays per biological unit.
+#' A selection of typical comparisons are implemented. The \code{train} function
+#' is a convenience method for training on one data set and likewise \code{predict} for predicting on an
+#' independent validation data set.
 #'
 #' @param measurements Either a \code{\link{DataFrame}}, \code{\link{data.frame}}, \code{\link{matrix}}, \code{\link{MultiAssayExperiment}}
-#' or a list of these objects containing the data.
+#' or a list of the basic tabular objects containing the data.
 #' @param x Same as \code{measurements} but only training samples.
 #' @param outcome A vector of class labels of class \code{\link{factor}} of the
 #' same length as the number of samples in \code{measurements} or a character vector of length 1 containing the
@@ -17,13 +19,15 @@
 #' a character string, or vector of such strings, containing column name(s) of column(s)
 #' containing either classes or time and event information about survival. If column names
 #' of survival information, time must be in first column and event status in the second.
-#' @param ... Parameters passed into \code{\link{prepareData}} which control subsetting and filtering of input data.
+#' @param extraParams A list of parameters that will be used to overwrite default settings of transformation, selection, or model-building functions, or
+#' parameters which will be passed into the data cleaning function. The names of the list must be one of \code{"prepare"},
+#' \code{"select"}, \code{"train"}, \code{"predict"}.
 #' @param nFeatures The number of features to be used for classification. If this is a single number, the same number of features will be used for all comparisons
 #' or assays. If a numeric vector these will be optimised over using \code{selectionOptimisation}. If a named vector with the same names of multiple assays,
 #' a different number of features will be used for each assay. If a named list of vectors, the respective number of features will be optimised over.
 #' Set to NULL or "all" if all features should be used.
 #' @param selectionMethod Default: \code{"auto"}. A character vector of feature selection methods to compare. If a named character vector with names corresponding to different assays,
-#' and performing multiview classification, the respective classification methods will be used on each assay. If \code{"auto"}, t-test (two categories) / F-test (three or more categories) ranking
+#' and performing multiview classification, the respective selection methods will be used on each assay. If \code{"auto"}, t-test (two categories) / F-test (three or more categories) ranking
 #' and top \code{nFeatures} optimisation is done. Otherwise, the ranking method is per-feature Cox proportional hazards p-value.
 #' @param selectionOptimisation A character of "Resubstitution", "Nested CV" or "none" specifying the approach used to optimise \code{nFeatures}.
 #' @param performanceType Default: \code{"auto"}. If \code{"auto"}, then balanced accuracy for classification or C-index for survival. Otherwise, any one of the
@@ -31,13 +35,14 @@
 #' @param classifier Default: \code{"auto"}. A character vector of classification methods to compare. If a named character vector with names corresponding to different assays,
 #' and performing multiview classification, the respective classification methods will be used on each assay. If \code{"auto"}, then a random forest is used for a classification
 #' task or Cox proportional hazards model for a survival task.
-#' @param multiViewMethod A character vector specifying the multiview method or data integration approach to use.
+#' @param multiViewMethod Default: \code{"none"}. A character vector specifying the multiview method or data integration approach to use. See \code{available("multiViewMethod")} for possibilities.
 #' @param assayCombinations A character vector or list of character vectors proposing the assays or, in the case of a list, combination of assays to use
 #' with each element being a vector of assays to combine. Special value \code{"all"} means all possible subsets of assays.
 #' @param nFolds A numeric specifying the number of folds to use for cross-validation.
 #' @param nRepeats A numeric specifying the the number of repeats or permutations to use for cross-validation.
 #' @param nCores A numeric specifying the number of cores used if the user wants to use parallelisation.
 #' @param characteristicsLabel A character specifying an additional label for the cross-validation run.
+#' @param ... For \code{train} and \code{predict} functions, parameters not used by the non-DataFrame signature functions but passed into the DataFrame signature function.
 #' @param object A trained model to predict with.
 #' @param newData The data to use to make predictions with.
 #'
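For orientation, a minimal sketch of how the new extraParams argument documented above is intended to be supplied. The top-level names select a stage; following the handling added in generateModellingParams further below, a length-one element overrides a default setting while a multi-value element is tuned over. The inner argument names here are placeholders, not guaranteed ClassifyR parameters.

extraParams <- list(
  prepare = list(),                          # extra arguments for the data cleaning step (prepareData)
  select = list(someSelectSetting = 0.05),   # single value: overrides a default of the selection function
  train = list(someTrainSetting = c(1, 5)),  # two candidate values: tuned over during training
  predict = list(someTestSetting = TRUE)     # single value: passed to the prediction function
)
# result <- crossValidate(measurements, outcome, extraParams = extraParams)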
@@ -100,11 +105,14 @@ setMethod("crossValidate", "DataFrame",
                       nFolds = 5,
                       nRepeats = 20,
                       nCores = 1,
-                      characteristicsLabel = NULL, ...)
+                      characteristicsLabel = NULL, extraParams = NULL)

 {
     # Check that data is in the right format, if not already done for MultiAssayExperiment input.
-    measurementsAndOutcome <- prepareData(measurements, outcome, ...)
+    prepParams <- list(measurements, outcome)
+    if("prepare" %in% names(extraParams))
+        prepParams <- c(prepParams, extraParams[["prepare"]])
+    measurementsAndOutcome <- do.call(prepareData, prepParams)
     measurements <- measurementsAndOutcome[["measurements"]]
     outcome <- measurementsAndOutcome[["outcome"]]

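The prepare-stage handling above simply splices any extraParams[["prepare"]] entries into the prepareData() call. A generic, self-contained illustration of that do.call() pattern, with round() standing in for prepareData():

prepParams <- list(pi)             # positional arguments collected first
extraPrepare <- list(digits = 3)   # stands in for extraParams[["prepare"]]
prepParams <- c(prepParams, extraPrepare)
do.call(round, prepParams)         # identical to round(pi, digits = 3), i.e. 3.142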
@@ -177,7 +185,8 @@ setMethod("crossValidate", "DataFrame",
                           nFolds = nFolds,
                           nRepeats = nRepeats,
                           nCores = nCores,
-                          characteristicsLabel = characteristicsLabel
+                          characteristicsLabel = characteristicsLabel,
+                          extraParams = extraParams
                           )
              },
              simplify = FALSE)
@@ -213,7 +222,8 @@ setMethod("crossValidate", "DataFrame",
                           nFolds = nFolds,
                           nRepeats = nRepeats,
                           nCores = nCores,
-                          characteristicsLabel = characteristicsLabel)
+                          characteristicsLabel = characteristicsLabel,
+                          extraParams = extraParams)
              }, simplify = FALSE)

 }
@@ -246,7 +256,8 @@ setMethod("crossValidate", "DataFrame",
                           nFolds = nFolds,
                           nRepeats = nRepeats,
                           nCores = nCores,
-                          characteristicsLabel = characteristicsLabel)
+                          characteristicsLabel = characteristicsLabel,
+                          extraParams = extraParams)
              }, simplify = FALSE)

 }
@@ -280,7 +291,8 @@ setMethod("crossValidate", "DataFrame",
                           nFolds = nFolds,
                           nRepeats = nRepeats,
                           nCores = nCores,
-                          characteristicsLabel = characteristicsLabel)
+                          characteristicsLabel = characteristicsLabel,
+                          extraParams = extraParams)
              }, simplify = FALSE)

 }
@@ -306,9 +318,13 @@ setMethod("crossValidate", "MultiAssayExperiment",
                       nFolds = 5,
                       nRepeats = 20,
                       nCores = 1,
-                      characteristicsLabel = NULL, ...)
+                      characteristicsLabel = NULL, extraParams = NULL)
 {
-    measurementsAndOutcome <- prepareData(measurements, outcome, ...)
+    # Check that data is in the right format, if not already done for MultiAssayExperiment input.
+    prepParams <- list(measurements, outcome)
+    if("prepare" %in% names(extraParams))
+        prepParams <- c(prepParams, extraParams[["prepare"]])
+    measurementsAndOutcome <- do.call(prepareData, prepParams)

     crossValidate(measurements = measurementsAndOutcome[["measurements"]],
                   outcome = measurementsAndOutcome[["outcome"]],
@@ -322,7 +338,8 @@ setMethod("crossValidate", "MultiAssayExperiment",
                   nFolds = nFolds,
                   nRepeats = nRepeats,
                   nCores = nCores,
-                  characteristicsLabel = characteristicsLabel)
+                  characteristicsLabel = characteristicsLabel,
+                  extraParams = extraParams)
 })

 #' @rdname crossValidate
@@ -340,7 +357,7 @@ setMethod("crossValidate", "data.frame", # data.frame of numeric measurements.
                       nFolds = 5,
                       nRepeats = 20,
                       nCores = 1,
-                      characteristicsLabel = NULL, ...)
+                      characteristicsLabel = NULL, extraParams = NULL)
 {
     measurements <- S4Vectors::DataFrame(measurements, check.names = FALSE)
     crossValidate(measurements = measurements,
@@ -355,7 +372,7 @@ setMethod("crossValidate", "data.frame", # data.frame of numeric measurements.
                   nFolds = nFolds,
                   nRepeats = nRepeats,
                   nCores = nCores,
-                  characteristicsLabel = characteristicsLabel, ...) # ... for prepareData.
+                  characteristicsLabel = characteristicsLabel, extraParams = extraParams)
 })

 #' @rdname crossValidate
@@ -373,7 +390,7 @@ setMethod("crossValidate", "matrix", # Matrix of numeric measurements.
                       nFolds = 5,
                       nRepeats = 20,
                       nCores = 1,
-                      characteristicsLabel = NULL, ...)
+                      characteristicsLabel = NULL, extraParams = NULL)
 {
     measurements <- S4Vectors::DataFrame(measurements, check.names = FALSE)
     crossValidate(measurements = measurements,
@@ -388,7 +405,7 @@ setMethod("crossValidate", "matrix", # Matrix of numeric measurements.
                   nFolds = nFolds,
                   nRepeats = nRepeats,
                   nCores = nCores,
-                  characteristicsLabel = characteristicsLabel, ...) # ... for prepareData.
+                  characteristicsLabel = characteristicsLabel, extraParams = extraParams)
 })

 # This expects that each table is about the same set of samples and thus
@@ -408,7 +425,7 @@ setMethod("crossValidate", "list",
                       nFolds = 5,
                       nRepeats = 20,
                       nCores = 1,
-                      characteristicsLabel = NULL, ...)
+                      characteristicsLabel = NULL, extraParams = NULL)
 {
     # Check data type is valid
     if (!(all(sapply(measurements, class) %in% c("data.frame", "DataFrame", "matrix")))) {
@@ -456,7 +473,7 @@ setMethod("crossValidate", "list",
                   nFolds = nFolds,
                   nRepeats = nRepeats,
                   nCores = nCores,
-                  characteristicsLabel = characteristicsLabel, ...)
+                  characteristicsLabel = characteristicsLabel, extraParams = extraParams)
 })


@@ -544,6 +561,8 @@ generateCrossValParams <- function(nRepeats, nFolds, nCores, selectionOptimisati
     CrossValParams(permutations = nRepeats, folds = nFolds, parallelParams = BPparam, tuneMode = tuneMode)
 }

+
+# Returns a ModellingParams object.
 generateModellingParams <- function(assayIDs,
                                     measurements,
                                     nFeatures,
@@ -551,7 +570,8 @@ generateModellingParams <- function(assayIDs,
                                     selectionOptimisation,
                                     performanceType = "auto",
                                     classifier,
-                                    multiViewMethod = "none"
+                                    multiViewMethod = "none",
+                                    extraParams
 ){
     if(multiViewMethod != "none") {
         params <- generateMultiviewParams(assayIDs,
@@ -559,15 +579,13 @@
                                           nFeatures,
                                           selectionMethod,
                                           selectionOptimisation,
-                                          performanceType = performanceType,
+                                          performanceType,
                                           classifier,
-                                          multiViewMethod)
+                                          multiViewMethod, extraParams)
         return(params)
     }


-
-
     if(length(assayIDs) > 1) obsFeatures <- sum(S4Vectors::mcols(measurements)[, "assay"] %in% assayIDs)
     else obsFeatures <- ncol(measurements)

@@ -576,8 +594,7 @@ generateModellingParams <- function(assayIDs,

     if(max(nFeatures) > obsFeatures) {

-        warning("nFeatures greater than the max number of features in data.
-                Setting to max")
+        warning("nFeatures greater than the max number of features in data. Setting to max")
         nFeatures <- pmin(nFeatures, obsFeatures)
     }

@@ -585,21 +602,72 @@

     # Check classifier
     knownClassifiers <- .ClassifyRenvir[["classifyKeywords"]][, "classifier Keyword"]
-    if(!classifier %in% knownClassifiers)
-        stop(paste("Classifier must exactly match of these (be careful of case):", paste(knownClassifiers, collapse = ", ")))
-
+    if(any(!classifier %in% knownClassifiers))
+        stop(paste("classifier must exactly match these options (be careful of case):", paste(knownClassifiers, collapse = ", ")))
+
+    # Always return a list for ease of processing. Unbox at end if just one.
     classifierParams <- .classifierKeywordToParams(classifier)
+
+    # Modify the parameters with performanceType addition and any other to overwrite.
     if(!is.null(classifierParams$trainParams@tuneParams))
         classifierParams$trainParams@tuneParams <- c(classifierParams$trainParams@tuneParams, performanceType = performanceType)

+    if(!is.null(extraParams) && "train" %in% names(extraParams))
+    {
+        for(paramIndex in seq_along(extraParams[["train"]]))
+        {
+            parameter <- extraParams[["train"]][[paramIndex]]
+            parameterName <- names(extraParams[["train"]])[paramIndex]
+            if(length(parameter) == 1)
+            {
+                if(is.null(classifierParams$trainParams@otherParams)) classifierParams$trainParams@otherParams <- extraParams[["train"]][paramIndex]
+                else classifierParams$trainParams@otherParams[parameterName] <- parameter
+            } else {
+                if(is.null(classifierParams$trainParams@tuneParams)) classifierParams$trainParams@tuneParams <- extraParams[["train"]][paramIndex]
+                else classifierParams$trainParams@tuneParams[[parameterName]] <- parameter # Multiple values, so tune them.
+            }
+        }
+    }
+    if(!is.null(extraParams) && "predict" %in% names(extraParams))
+    {
+        for(paramIndex in seq_along(extraParams[["predict"]]))
+        {
+            parameter <- extraParams[["predict"]][[paramIndex]]
+            parameterName <- names(extraParams[["predict"]])[paramIndex]
+            if(length(parameter) == 1)
+            {
+                if(is.null(classifierParams$predictParams@otherParams)) classifierParams$predictParams@otherParams <- extraParams[["predict"]][paramIndex]
+                else classifierParams$predictParams@otherParams[parameterName] <- parameter
+            } else {
+                if(is.null(classifierParams$predictParams@tuneParams)) classifierParams$predictParams@tuneParams <- extraParams[["predict"]][paramIndex]
+                else classifierParams$predictParams@tuneParams[[parameterName]] <- parameter # Multiple values, so tune them.
+            }
+        }
+    }
+
     selectionMethod <- unlist(selectionMethod)
-
-    selectionMethod <- ifelse(is.null(selectionMethod), "none", selectionMethod)
+    if(is.null(selectionMethod)) selectionMethod <- "none"

     if(selectionMethod != "none")
-        selectParams <- SelectParams(selectionMethod,
-                                     tuneParams = list(nFeatures = nFeatures, performanceType = performanceType))
-    else selectParams <- NULL
+    {
+        selectParams <- SelectParams(selectionMethod, tuneParams = list(nFeatures = nFeatures, performanceType = performanceType))
+        if(!is.null(extraParams) && "select" %in% names(extraParams))
+        {
+            for(paramIndex in seq_along(extraParams[["select"]]))
+            {
+                parameter <- extraParams[["select"]][[paramIndex]]
+                parameterName <- names(extraParams[["select"]])[paramIndex]
+                if(length(parameter) == 1)
+                {
+                    if(is.null(selectParams@otherParams)) selectParams@otherParams <- extraParams[["select"]][paramIndex]
+                    else selectParams@otherParams[parameterName] <- parameter
+                } else {
+                    if(is.null(selectParams@tuneParams)) selectParams@tuneParams <- extraParams[["select"]][paramIndex]
+                    else selectParams@tuneParams[[parameterName]] <- parameter # Multiple values, so tune them.
+                }
+            }
+        }
+    } else {selectParams <- NULL}

     params <- ModellingParams(
         balancing = "none",
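The loops above route each user-supplied setting either into otherParams (a fixed value) or tuneParams (several candidate values to be tuned over). A standalone sketch of that routing rule on plain lists; the function name and the cost/gamma settings are illustrative only, not ClassifyR API:

routeExtraParams <- function(extras) {
  otherParams <- list(); tuneParams <- list()
  for(parameterName in names(extras)) {
    parameter <- extras[[parameterName]]
    if(length(parameter) == 1) otherParams[[parameterName]] <- parameter  # fixed setting
    else tuneParams[[parameterName]] <- parameter                         # multiple values, so tune them
  }
  list(otherParams = otherParams, tuneParams = tuneParams)
}
routeExtraParams(list(cost = 1, gamma = c(0.1, 1)))
# $otherParams$cost is 1; $tuneParams$gamma is c(0.1, 1)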
@@ -617,7 +685,6 @@
     # }
     #

-
     params

 }
@@ -632,7 +699,7 @@ generateMultiviewParams <- function(assayIDs,
                                     selectionOptimisation,
                                     performanceType,
                                     classifier,
-                                    multiViewMethod){
+                                    multiViewMethod, extraParams){

     if(multiViewMethod == "merge"){

@@ -651,18 +718,21 @@
                                    selectionOptimisation = selectionOptimisation,
                                    performanceType = performanceType,
                                    classifier = classifier,
-                                   multiViewMethod = "none"),
+                                   multiViewMethod = "none",
+                                   extraParams = extraParams),
                                SIMPLIFY = FALSE)

-        # Generate some params for merged model.
+        # Generate some params for merged model. Which ones?
+        # Reconsider how to do this well later.
         params <- generateModellingParams(assayIDs = assayIDs,
                                           measurements = measurements,
                                           nFeatures = nFeatures,
-                                          selectionMethod = selectionMethod,
+                                          selectionMethod = selectionMethod[[1]],
                                           selectionOptimisation = "none",
                                           performanceType = performanceType,
-                                          classifier = classifier,
-                                          multiViewMethod = "none")
+                                          classifier = classifier[[1]],
+                                          multiViewMethod = "none",
+                                          extraParams = extraParams)

         # Update selectParams to use
         params@selectParams <- SelectParams("selectMulti",
@@ -690,36 +760,8 @@
                                MoreArgs = list(
                                    selectionOptimisation = selectionOptimisation,
                                    performanceType = performanceType,
-                                   multiViewMethod = "none"),
-                               SIMPLIFY = FALSE)
-
-
-        params <- ModellingParams(
-            balancing = "none",
-            selectParams = NULL,
-            trainParams = TrainParams(prevalTrainInterface, params = paramsAssays, characteristics = paramsAssays$clinical@trainParams@characteristics),
-            predictParams = PredictParams(prevalPredictInterface, characteristics = paramsAssays$clinical@predictParams@characteristics)
-        )
-
-        return(params)
-    }
-
-    if(multiViewMethod == "prevalidation"){
-
-        # Split measurements up by assay.
-        assayTrain <- sapply(assayIDs, function(assayID) measurements[, S4Vectors::mcols(measurements)[["assay"]] %in% assayID], simplify = FALSE)
-
-        # Generate params for each assay. This could be extended to have different selectionMethods for each type
-        paramsAssays <- mapply(generateModellingParams,
-                               nFeatures = nFeatures[assayIDs],
-                               selectionMethod = selectionMethod[assayIDs],
-                               assayIDs = assayIDs,
-                               measurements = assayTrain[assayIDs],
-                               classifier = classifier[assayIDs],
-                               MoreArgs = list(
-                                   selectionOptimisation = selectionOptimisation,
-                                   performanceType = performanceType,
-                                   multiViewMethod = "none"),
+                                   multiViewMethod = "none",
+                                   extraParams = extraParams),
                                SIMPLIFY = FALSE)


@@ -733,7 +775,6 @@ generateMultiviewParams <- function(assayIDs,
         return(params)
     }

-
     if(multiViewMethod == "PCA"){

         # Split measurements up by assay.
@@ -748,7 +789,8 @@ generateMultiviewParams <- function(assayIDs,
                                           classifier = classifier["clinical"],
                                           selectionOptimisation = selectionOptimisation,
                                           performanceType = performanceType,
-                                          multiViewMethod = "none"))
+                                          multiViewMethod = "none",
+                                          extraParams = extraParams))


         params <- ModellingParams(
@@ -775,7 +817,7 @@ CV <- function(measurements, outcome, x, outcomeTrain, measurementsTest, outcome
                nFolds,
                nRepeats,
                nCores,
-               characteristicsLabel)
+               characteristicsLabel, extraParams)

 {
     # Which data-types or data-views are present?
@@ -797,7 +839,7 @@ CV <- function(measurements, outcome, x, outcomeTrain, measurementsTest, outcome
                                               selectionOptimisation = selectionOptimisation,
                                               performanceType = performanceType,
                                               classifier = classifier,
-                                              multiViewMethod = multiViewMethod)
+                                              multiViewMethod = multiViewMethod, extraParams = extraParams)

     if(length(assayIDs) > 1 || length(assayIDs) == 1 && assayIDs != 1) assayText <- assayIDs else assayText <- NULL
     characteristics <- S4Vectors::DataFrame(characteristic = c(if(!is.null(assayText)) "Assay Name" else NULL, "Classifier Name", "Selection Name", "multiViewMethod", "characteristicsLabel"), value = c(if(!is.null(assayText)) paste(assayText, collapse = ", ") else NULL, paste(classifier, collapse = ", "), paste(selectionMethod, collapse = ", "), multiViewMethod, characteristicsLabel))
@@ -848,16 +890,12 @@ train.data.frame <- function(x, outcomeTrain, ...)
 #' @method train DataFrame
 #' @export
 train.DataFrame <- function(x, outcomeTrain, selectionMethod = "auto", nFeatures = 20, classifier = "auto", performanceType = "auto",
-                            multiViewMethod = "none", assayIDs = "all", ...) # ... for prepareData.
+                            multiViewMethod = "none", assayIDs = "all", extraParams = NULL)
 {
-    prepArgs <- list(x, outcomeTrain)
-    extraInputs <- list(...)
-    prepExtras <- numeric()
-    if(length(extraInputs) > 0)
-        prepExtras <- which(names(extraInputs) %in% .ClassifyRenvir[["prepareDataFormals"]])
-    if(length(prepExtras) > 0)
-        prepArgs <- append(prepArgs, extraInputs[prepExtras])
-    measurementsAndOutcome <- do.call(prepareData, prepArgs)
+    prepParams <- list(x, outcomeTrain)
+    if(!is.null(extraParams) && "prepare" %in% names(extraParams))
+        prepParams <- c(prepParams, extraParams[["prepare"]])
+    measurementsAndOutcome <- do.call(prepareData, prepParams)

     # Ensure performance type is one of the ones that can be calculated by the package.
     if(!performanceType %in% c("auto", .ClassifyRenvir[["performanceTypes"]]))
@@ -895,13 +933,46 @@ train.DataFrame <- function(x, outcomeTrain, selectionMethod = "auto", nFeatures

     modellingParams <- generateModellingParams(assayIDs = assayIDs, measurements = measurements, nFeatures = nFeatures,
                                                selectionMethod = selectionMethod, selectionOptimisation = "Resubstitution", performanceType = performanceType,
-                                               classifier = classifier, multiViewMethod = "none")
+                                               classifier = classifier, multiViewMethod = "none", extraParams = extraParams)
     topFeatures <- .doSelection(measurementsUse, outcomeTrain, CrossValParams(), modellingParams, verbose = 0)
     selectedFeaturesIndices <- topFeatures[[2]] # Extract for subsetting.
     tuneDetailsSelect <- topFeatures[[3]]
     measurementsUse <- measurementsUse[, selectedFeaturesIndices]

     classifierParams <- .classifierKeywordToParams(classifierForAssay)
+    if(!is.null(extraParams) && "train" %in% names(extraParams))
+    {
+        for(paramIndex in seq_along(extraParams[["train"]]))
+        {
+            parameter <- extraParams[["train"]][[paramIndex]]
+            parameterName <- names(extraParams[["train"]])[paramIndex]
+            if(length(parameter) == 1)
+            {
+                if(is.null(classifierParams$trainParams@otherParams)) classifierParams$trainParams@otherParams <- extraParams[["train"]][paramIndex]
+                else classifierParams$trainParams@otherParams[parameterName] <- parameter
+            } else {
+                if(is.null(classifierParams$trainParams@tuneParams)) classifierParams$trainParams@tuneParams <- extraParams[["train"]][paramIndex]
+                else classifierParams$trainParams@tuneParams[[parameterName]] <- parameter # Multiple values, so tune them.
+            }
+        }
+    }
+    if(!is.null(extraParams) && "predict" %in% names(extraParams))
+    {
+        for(paramIndex in seq_along(extraParams[["predict"]]))
+        {
+            parameter <- extraParams[["predict"]][[paramIndex]]
+            parameterName <- names(extraParams[["predict"]])[paramIndex]
+            if(length(parameter) == 1)
+            {
+                if(is.null(classifierParams$predictParams@otherParams)) classifierParams$predictParams@otherParams <- extraParams[["predict"]][paramIndex]
+                else classifierParams$predictParams@otherParams[parameterName] <- parameter
+            } else {
+                if(is.null(classifierParams$predictParams@tuneParams)) classifierParams$predictParams@tuneParams <- extraParams[["predict"]][paramIndex]
+                else classifierParams$predictParams@tuneParams[[parameterName]] <- parameter # Multiple values, so tune them.
+            }
+        }
+    }
+
     modellingParams <- ModellingParams(balancing = "none", selectParams = NULL,
                                        trainParams = classifierParams$trainParams, predictParams = classifierParams$predictParams)
     if(!is.null(tuneDetailsSelect))
@@ -1079,6 +1150,6 @@ predict.trainedByClassifyR <- function(object, newData, ...)
     predictFunctionUse <- attr(object, "predictFunction")
     class(object) <- rev(class(object)) # Now want the predict method of the specific model to be picked, so put model class first.
     if (is(object, "listOfModels"))
-        mapply(function(model, assay) predictFunctionUse(model, assay), object, newData, SIMPLIFY = FALSE)
-    else predictFunctionUse(object, newData) # Object is itself a trained model and it is assumed that a predict method is defined for it.
+        mapply(function(model, assay, ...) predictFunctionUse(model, assay, ...), object, newData, MoreArgs = list(...), SIMPLIFY = FALSE)
+    else do.call(predictFunctionUse, list(object, newData, ...)) # Object is itself a trained model and it is assumed that a predict method is defined for it.
 }
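The change above forwards any extra arguments from predict.trainedByClassifyR() to the stored prediction function through do.call() and MoreArgs. A generic, self-contained illustration of that forwarding, with a base-R model standing in for a ClassifyR one:

predictFunctionUse <- stats::predict            # stands in for attr(object, "predictFunction")
object <- lm(dist ~ speed, data = cars)         # stands in for a trained ClassifyR model
extras <- list(interval = "confidence")         # arguments arriving through ...
do.call(predictFunctionUse, c(list(object, head(cars)), extras))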
@@ -656,3 +656,26 @@ predict.dlda <- function(object, newdata, ...) { # Remove once sparsediscrim is
 .dmvnorm_diag <- function(x, mean, sigma) { # Remove once sparsediscrim is reinstated to CRAN.
   exp(sum(dnorm(x, mean=mean, sd=sqrt(sigma), log=TRUE)))
 }
+
+# Function to create permutations of a vector, with the possibility to restrict values at certain positions.
+# The fixed parameter is a data frame with the position in the first column and the required value in the second.
+.permutations <- function(data, fixed = NULL)
+{
+  items <- length(data)
+  multipliedTo1 <- factorial(items)
+  if(items > 1)
+    permutations <- structure(vapply(seq_along(data), function(index)
+                              rbind(data[index], .permutations(data[-index])),
+                              data[rep(1L, multipliedTo1)]), dim = c(items, multipliedTo1))
+  else permutations <- data
+
+  if(!is.null(fixed))
+  {
+    for(rowIndex in seq_len(nrow(fixed)))
+    {
+      keepColumns <- permutations[fixed[rowIndex, 1], ] == fixed[rowIndex, 2]
+      permutations <- permutations[, keepColumns]
+    }
+  }
+  permutations
+}
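A usage sketch for the internal helper added above, assuming it is called from within the package namespace; each column of the returned matrix is one permutation:

.permutations(1:3)
# A 3 x 6 matrix; each column is one ordering of 1, 2 and 3.
.permutations(1:3, fixed = data.frame(position = 1, value = 2))
# A 3 x 2 matrix; only the orderings whose first element is 2 remain.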
@@ -32,7 +32,7 @@ crossValidate(measurements, outcome, ...)
   nRepeats = 20,
   nCores = 1,
   characteristicsLabel = NULL,
-  ...
+  extraParams = NULL
 )

 \S4method{crossValidate}{MultiAssayExperiment}(
@@ -49,7 +49,7 @@ crossValidate(measurements, outcome, ...)
   nRepeats = 20,
   nCores = 1,
   characteristicsLabel = NULL,
-  ...
+  extraParams = NULL
 )

 \S4method{crossValidate}{data.frame}(
@@ -66,7 +66,7 @@ crossValidate(measurements, outcome, ...)
   nRepeats = 20,
   nCores = 1,
   characteristicsLabel = NULL,
-  ...
+  extraParams = NULL
 )

 \S4method{crossValidate}{matrix}(
@@ -83,7 +83,7 @@ crossValidate(measurements, outcome, ...)
   nRepeats = 20,
   nCores = 1,
   characteristicsLabel = NULL,
-  ...
+  extraParams = NULL
 )

 \S4method{crossValidate}{list}(
@@ -100,7 +100,7 @@ crossValidate(measurements, outcome, ...)
   nRepeats = 20,
   nCores = 1,
   characteristicsLabel = NULL,
-  ...
+  extraParams = NULL
 )

 \method{train}{matrix}(x, outcomeTrain, ...)
@@ -116,7 +116,7 @@ crossValidate(measurements, outcome, ...)
   performanceType = "auto",
   multiViewMethod = "none",
   assayIDs = "all",
-  ...
+  extraParams = NULL
 )

 \method{train}{list}(x, outcomeTrain, ...)
@@ -127,7 +127,7 @@ crossValidate(measurements, outcome, ...)
 }
 \arguments{
 \item{measurements}{Either a \code{\link{DataFrame}}, \code{\link{data.frame}}, \code{\link{matrix}}, \code{\link{MultiAssayExperiment}}
-or a list of these objects containing the data.}
+or a list of the basic tabular objects containing the data.}

 \item{outcome}{A vector of class labels of class \code{\link{factor}} of the
 same length as the number of samples in \code{measurements} or a character vector of length 1 containing the
@@ -136,7 +136,7 @@ length 2 or 3 specifying the time and event columns in \code{measurements} for s
 \code{\link{MultiAssayExperiment}}, the column name(s) in \code{colData(measurements)} representing the outcome. If column names
 of survival information, time must be in first column and event status in the second.}

-\item{...}{Parameters passed into \code{\link{prepareData}} which control subsetting and filtering of input data.}
+\item{...}{For \code{train} and \code{predict} functions, parameters not used by the non-DataFrame signature functions but passed into the DataFrame signature function.}

 \item{nFeatures}{The number of features to be used for classification. If this is a single number, the same number of features will be used for all comparisons
 or assays. If a numeric vector these will be optimised over using \code{selectionOptimisation}. If a named vector with the same names of multiple assays,
@@ -144,7 +144,7 @@ a different number of features will be used for each assay. If a named list of v
 Set to NULL or "all" if all features should be used.}

 \item{selectionMethod}{Default: \code{"auto"}. A character vector of feature selection methods to compare. If a named character vector with names corresponding to different assays,
-and performing multiview classification, the respective classification methods will be used on each assay. If \code{"auto"}, t-test (two categories) / F-test (three or more categories) ranking
+and performing multiview classification, the respective selection methods will be used on each assay. If \code{"auto"}, t-test (two categories) / F-test (three or more categories) ranking
 and top \code{nFeatures} optimisation is done. Otherwise, the ranking method is per-feature Cox proportional hazards p-value.}

 \item{selectionOptimisation}{A character of "Resubstitution", "Nested CV" or "none" specifying the approach used to optimise \code{nFeatures}.}
@@ -155,7 +155,7 @@ and top \code{nFeatures} optimisation is done. Otherwise, the ranking method is
 and performing multiview classification, the respective classification methods will be used on each assay. If \code{"auto"}, then a random forest is used for a classification
 task or Cox proportional hazards model for a survival task.}

-\item{multiViewMethod}{A character vector specifying the multiview method or data integration approach to use.}
+\item{multiViewMethod}{Default: \code{"none"}. A character vector specifying the multiview method or data integration approach to use. See \code{available("multiViewMethod")} for possibilities.}

 \item{assayCombinations}{A character vector or list of character vectors proposing the assays or, in the case of a list, combination of assays to use
 with each element being a vector of assays to combine. Special value \code{"all"} means all possible subsets of assays.}
@@ -168,6 +168,10 @@ with each element being a vector of assays to combine. Special value \code{"all"

 \item{characteristicsLabel}{A character specifying an additional label for the cross-validation run.}

+\item{extraParams}{A list of parameters that will be used to overwrite default settings of transformation, selection, or model-building functions, or
+parameters which will be passed into the data cleaning function. The names of the list must be one of \code{"prepare"},
+\code{"select"}, \code{"train"}, \code{"predict"}.}
+
 \item{x}{Same as \code{measurements} but only training samples.}

 \item{outcomeTrain}{For the \code{train} function, either a factor vector of classes, a \code{\link{Surv}} object, or
@@ -190,8 +194,10 @@ An object of class \code{\link{ClassifyResult}}
 }
 \description{
 This function has been designed to facilitate the comparison of classification
-methods using cross-validation. A selection of typical comparisons are implemented. The \code{train} function
-is a convenience method for training on one data set and predicting on an independent validation data set.
+methods using cross-validation, particularly when there are multiple assays per biological unit.
+A selection of typical comparisons are implemented. The \code{train} function
+is a convenience method for training on one data set and likewise \code{predict} for predicting on an
+independent validation data set.
 }
 \details{
 \code{classifier} can be any a keyword for any of the implemented approaches as shown by \code{available()}.