
- extraParams parameter of crossValidate allows any parameter in a predefined parameter set (in simpleParams.R) to be overwritten or unused ones to be set by the user.
- .permutations private utility function which allows any position(s) to be fixed to a certain value.

Dario Strbenac authored on 10/02/2023 02:10:18
Showing 4 changed files
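
The new `extraParams` argument described in the commit message can be sketched as below; the data objects and the parameter names inside each stage's list are illustrative assumptions, not values taken from this commit.

```r
# Illustrative sketch only: measurements/outcome and the stage-level parameter
# names are assumptions, not from the commit itself.
library(ClassifyR)

result <- crossValidate(measurements, outcome,
                        classifier = "randomForest",
                        extraParams = list(
                          # Elements named "prepare" are passed into the data cleaning function.
                          prepare = list(maxMissingProp = 0),
                          # Elements named "train" overwrite defaults of the predefined
                          # parameter set in simpleParams.R; a multi-valued entry
                          # would instead be tuned over.
                          train = list(mTryProportion = 0.5)
                        ))
```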

@@ -3,8 +3,8 @@ Type: Package
 Title: A framework for cross-validated classification problems, with
        applications to differential variability and differential
        distribution testing
-Version: 3.3.10
-Date: 2022-12-12
+Version: 3.3.11
+Date: 2023-02-10
 Authors@R:
     c(
     person(given = "Dario", family = "Strbenac", email = "dario.strbenac@sydney.edu.au", role = c("aut", "cre")),
@@ -25,7 +25,7 @@ Suggests: limma, edgeR, car, Rmixmod, gridExtra (>= 2.0.0), cowplot,
         BiocStyle, pamr, PoiClaClu, parathyroidSE, knitr, htmltools, gtable,
         scales, e1071, rmarkdown, IRanges, robustbase, glmnet, class, randomForestSRC,
         MatrixModels, xgboost
-Description: The software formalises a framework for classification and survival model evaluatio
+Description: The software formalises a framework for classification and survival model evaluation
              in R. There are four stages; Data transformation, feature selection, model training,
              and prediction. The requirements of variable types and variable order are
              fixed, but specialised variables for functions can also be provided.
@@ -35,7 +35,7 @@ Description: The software formalises a framework for classification and survival
              may be developed by the user, by creating an interface to the framework.
 License: GPL-3
 Packaged: 2014-10-18 11:16:55 UTC; dario
-RoxygenNote: 7.2.2
+RoxygenNote: 7.2.3
 NeedsCompilation: yes
 Collate:
     'ROCplot.R'
@@ -1,11 +1,13 @@
 #' Cross-validation to evaluate classification performance.
 #' 
 #' This function has been designed to facilitate the comparison of classification
-#' methods using cross-validation. A selection of typical comparisons are implemented. The \code{train} function
-#' is a convenience method for training on one data set and predicting on an independent validation data set.
+#' methods using cross-validation, particularly when there are multiple assays per biological unit.
+#' A selection of typical comparisons are implemented. The \code{train} function
+#' is a convenience method for training on one data set and likewise \code{predict} for predicting on an
+#' independent validation data set.
 #'
 #' @param measurements Either a \code{\link{DataFrame}}, \code{\link{data.frame}}, \code{\link{matrix}}, \code{\link{MultiAssayExperiment}} 
-#' or a list of these objects containing the data.
+#' or a list of the basic tabular objects containing the data.
 #' @param x Same as \code{measurements} but only training samples.
 #' @param outcome A vector of class labels of class \code{\link{factor}} of the
 #' same length as the number of samples in \code{measurements} or a character vector of length 1 containing the
@@ -17,13 +19,15 @@
 #' a character string, or vector of such strings, containing column name(s) of column(s)
 #' containing either classes or time and event information about survival. If column names
 #' of survival information, time must be in first column and event status in the second.
-#' @param ... Parameters passed into \code{\link{prepareData}} which control subsetting and filtering of input data.
+#' @param extraParams A list of parameters that will be used to overwrite default settings of transformation, selection, or model-building functions, or
+#' parameters which will be passed into the data cleaning function. The names of the list elements must be one of \code{"prepare"},
+#' \code{"select"}, \code{"train"}, \code{"predict"}.
 #' @param nFeatures The number of features to be used for classification. If this is a single number, the same number of features will be used for all comparisons
 #' or assays. If a numeric vector these will be optimised over using \code{selectionOptimisation}. If a named vector with the same names of multiple assays, 
 #' a different number of features will be used for each assay. If a named list of vectors, the respective number of features will be optimised over. 
 #' Set to NULL or "all" if all features should be used.
 #' @param selectionMethod Default: \code{"auto"}. A character vector of feature selection methods to compare. If a named character vector with names corresponding to different assays, 
-#' and performing multiview classification, the respective classification methods will be used on each assay. If \code{"auto"}, t-test (two categories) / F-test (three or more categories) ranking
+#' and performing multiview classification, the respective selection methods will be used on each assay. If \code{"auto"}, t-test (two categories) / F-test (three or more categories) ranking
 #' and top \code{nFeatures} optimisation is done. Otherwise, the ranking method is per-feature Cox proportional hazards p-value.
 #' @param selectionOptimisation A character of "Resubstitution", "Nested CV" or "none" specifying the approach used to optimise \code{nFeatures}.
 #' @param performanceType Default: \code{"auto"}. If \code{"auto"}, then balanced accuracy for classification or C-index for survival. Otherwise, any one of the
@@ -31,13 +35,14 @@
 #' @param classifier Default: \code{"auto"}. A character vector of classification methods to compare. If a named character vector with names corresponding to different assays, 
 #' and performing multiview classification, the respective classification methods will be used on each assay. If \code{"auto"}, then a random forest is used for a classification
 #' task or Cox proportional hazards model for a survival task.
-#' @param multiViewMethod A character vector specifying the multiview method or data integration approach to use.
+#' @param multiViewMethod Default: \code{"none"}. A character vector specifying the multiview method or data integration approach to use. See \code{available("multiViewMethod")} for possibilities.
 #' @param assayCombinations A character vector or list of character vectors proposing the assays or, in the case of a list, combination of assays to use
 #' with each element being a vector of assays to combine. Special value \code{"all"} means all possible subsets of assays.
 #' @param nFolds A numeric specifying the number of folds to use for cross-validation.
 #' @param nRepeats A numeric specifying the number of repeats or permutations to use for cross-validation.
 #' @param nCores A numeric specifying the number of cores used if the user wants to use parallelisation. 
 #' @param characteristicsLabel A character specifying an additional label for the cross-validation run.
+#' @param ... For \code{train} and \code{predict} functions, parameters not used by the non-DataFrame signature functions but passed into the DataFrame signature function.
 #' @param object A trained model to predict with.
 #' @param newData The data to use to make predictions with.
 #'
@@ -100,11 +105,14 @@ setMethod("crossValidate", "DataFrame",
                    nFolds = 5,
                    nRepeats = 20,
                    nCores = 1,
-                   characteristicsLabel = NULL, ...)
+                   characteristicsLabel = NULL, extraParams = NULL)
 
           {
               # Check that data is in the right format, if not already done for MultiAssayExperiment input.
-              measurementsAndOutcome <- prepareData(measurements, outcome, ...)
+              prepParams <- list(measurements, outcome)
+              if("prepare" %in% names(extraParams))
+                prepParams <- c(prepParams, extraParams[["prepare"]])
+              measurementsAndOutcome <- do.call(prepareData, prepParams)
               measurements <- measurementsAndOutcome[["measurements"]]
               outcome <- measurementsAndOutcome[["outcome"]]
               
@@ -177,7 +185,8 @@ setMethod("crossValidate", "DataFrame",
                                       nFolds = nFolds,
                                       nRepeats = nRepeats,
                                       nCores = nCores,
-                                      characteristicsLabel = characteristicsLabel
+                                      characteristicsLabel = characteristicsLabel,
+                                      extraParams = extraParams
                                   )
                               },
                               simplify = FALSE)
@@ -213,7 +222,8 @@ setMethod("crossValidate", "DataFrame",
                          nFolds = nFolds,
                          nRepeats = nRepeats,
                          nCores = nCores,
-                         characteristicsLabel = characteristicsLabel)
+                         characteristicsLabel = characteristicsLabel,
+                         extraParams = extraParams)
                   }, simplify = FALSE)
 
               }
@@ -246,7 +256,8 @@ setMethod("crossValidate", "DataFrame",
                          nFolds = nFolds,
                          nRepeats = nRepeats,
                          nCores = nCores,
-                         characteristicsLabel = characteristicsLabel)
+                         characteristicsLabel = characteristicsLabel,
+                         extraParams = extraParams)
                   }, simplify = FALSE)
 
               }
@@ -280,7 +291,8 @@ setMethod("crossValidate", "DataFrame",
                          nFolds = nFolds,
                          nRepeats = nRepeats,
                          nCores = nCores,
-                         characteristicsLabel = characteristicsLabel)
+                         characteristicsLabel = characteristicsLabel,
+                         extraParams = extraParams)
                   }, simplify = FALSE)
 
               }
@@ -306,9 +318,13 @@ setMethod("crossValidate", "MultiAssayExperiment",
                    nFolds = 5,
                    nRepeats = 20,
                    nCores = 1,
-                   characteristicsLabel = NULL, ...)
+                   characteristicsLabel = NULL, extraParams = NULL)
           {
-              measurementsAndOutcome <- prepareData(measurements, outcome, ...)
+              # Check that data is in the right format, if not already done for MultiAssayExperiment input.
+              prepParams <- list(measurements, outcome)
+              if("prepare" %in% names(extraParams))
+                prepParams <- c(prepParams, extraParams[["prepare"]])
+              measurementsAndOutcome <- do.call(prepareData, prepParams)
 
               crossValidate(measurements = measurementsAndOutcome[["measurements"]],
                             outcome = measurementsAndOutcome[["outcome"]], 
@@ -322,7 +338,8 @@ setMethod("crossValidate", "MultiAssayExperiment",
                             nFolds = nFolds,
                             nRepeats = nRepeats,
                             nCores = nCores,
-                            characteristicsLabel = characteristicsLabel)
+                            characteristicsLabel = characteristicsLabel,
+                            extraParams = extraParams)
           })
 
 #' @rdname crossValidate
@@ -340,7 +357,7 @@ setMethod("crossValidate", "data.frame", # data.frame of numeric measurements.
                    nFolds = 5,
                    nRepeats = 20,
                    nCores = 1,
-                   characteristicsLabel = NULL, ...)
+                   characteristicsLabel = NULL, extraParams = NULL)
           {
               measurements <- S4Vectors::DataFrame(measurements, check.names = FALSE)
               crossValidate(measurements = measurements,
@@ -355,7 +372,7 @@ setMethod("crossValidate", "data.frame", # data.frame of numeric measurements.
                             nFolds = nFolds,
                             nRepeats = nRepeats,
                             nCores = nCores,
-                            characteristicsLabel = characteristicsLabel, ...) # ... for prepareData.
+                            characteristicsLabel = characteristicsLabel, extraParams = extraParams)
           })
 
 #' @rdname crossValidate
@@ -373,7 +390,7 @@ setMethod("crossValidate", "matrix", # Matrix of numeric measurements.
                    nFolds = 5,
                    nRepeats = 20,
                    nCores = 1,
-                   characteristicsLabel = NULL, ...)
+                   characteristicsLabel = NULL, extraParams = NULL)
           {
               measurements <- S4Vectors::DataFrame(measurements, check.names = FALSE)
               crossValidate(measurements = measurements,
@@ -388,7 +405,7 @@ setMethod("crossValidate", "matrix", # Matrix of numeric measurements.
                             nFolds = nFolds,
                             nRepeats = nRepeats,
                             nCores = nCores,
-                            characteristicsLabel = characteristicsLabel, ...) # ... for prepareData.
+                            characteristicsLabel = characteristicsLabel, extraParams = extraParams)
           })
 
 # This expects that each table is about the same set of samples and thus
@@ -408,7 +425,7 @@ setMethod("crossValidate", "list",
                    nFolds = 5,
                    nRepeats = 20,
                    nCores = 1,
-                   characteristicsLabel = NULL, ...)
+                   characteristicsLabel = NULL, extraParams = NULL)
           {
               # Check data type is valid
               if (!(all(sapply(measurements, class) %in% c("data.frame", "DataFrame", "matrix")))) {
@@ -456,7 +473,7 @@ setMethod("crossValidate", "list",
                             nFolds = nFolds,
                             nRepeats = nRepeats,
                             nCores = nCores,
-                            characteristicsLabel = characteristicsLabel, ...)
+                            characteristicsLabel = characteristicsLabel, extraParams = extraParams)
           })
 
 
@@ -544,6 +561,8 @@ generateCrossValParams <- function(nRepeats, nFolds, nCores, selectionOptimisati
     CrossValParams(permutations = nRepeats, folds = nFolds, parallelParams = BPparam, tuneMode = tuneMode)
 }
 
+
+# Returns a ModellingParams object.
 generateModellingParams <- function(assayIDs,
                                     measurements,
                                     nFeatures,
@@ -551,7 +570,8 @@ generateModellingParams <- function(assayIDs,
                                     selectionOptimisation,
                                     performanceType = "auto",
                                     classifier,
-                                    multiViewMethod = "none"
+                                    multiViewMethod = "none",
+                                    extraParams
 ){
     if(multiViewMethod != "none") {
         params <- generateMultiviewParams(assayIDs,
@@ -559,15 +579,13 @@ generateModellingParams <- function(assayIDs,
                                           nFeatures,
                                           selectionMethod,
                                           selectionOptimisation,
-                                          performanceType = performanceType,
+                                          performanceType,
                                           classifier,
-                                          multiViewMethod)
+                                          multiViewMethod, extraParams)
         return(params)
     }
 
 
-
-
     if(length(assayIDs) > 1) obsFeatures <- sum(S4Vectors::mcols(measurements)[, "assay"] %in% assayIDs)
     else obsFeatures <- ncol(measurements)
 
@@ -576,8 +594,7 @@ generateModellingParams <- function(assayIDs,
 
     if(max(nFeatures) > obsFeatures) {
 
-        warning("nFeatures greater than the max number of features in data.
-                                                 Setting to max")
+        warning("nFeatures greater than the max number of features in data. Setting to max")
         nFeatures <- pmin(nFeatures, obsFeatures)
     }
 
@@ -585,21 +602,72 @@ generateModellingParams <- function(assayIDs,
     
     # Check classifier
     knownClassifiers <- .ClassifyRenvir[["classifyKeywords"]][, "classifier Keyword"]
-    if(!classifier %in% knownClassifiers)
-        stop(paste("Classifier must exactly match of these (be careful of case):", paste(knownClassifiers, collapse = ", ")))
-
+    if(any(!classifier %in% knownClassifiers))
+        stop(paste("classifier must exactly match one of these options (be careful of case):", paste(knownClassifiers, collapse = ", ")))
+    
+    # Always return a list for ease of processing. Unbox at end if just one.
     classifierParams <- .classifierKeywordToParams(classifier)
+
+    # Modify the parameters with performanceType addition and any other to overwrite.
     if(!is.null(classifierParams$trainParams@tuneParams))
       classifierParams$trainParams@tuneParams <- c(classifierParams$trainParams@tuneParams, performanceType = performanceType)
 
+    if(!is.null(extraParams) && "train" %in% names(extraParams))
+    {
+      for(paramIndex in seq_along(extraParams[["train"]]))
+      {
+        parameter <- extraParams[["train"]][[paramIndex]]
+        parameterName <- names(extraParams[["train"]])[paramIndex]
+        if(length(parameter) == 1)
+        {
+          if(is.null(classifierParams$trainParams@otherParams)) classifierParams$trainParams@otherParams <- extraParams[["train"]][paramIndex]
+          else classifierParams$trainParams@otherParams[parameterName] <- parameter
+        } else {
+          if(is.null(classifierParams$trainParams@tuneParams)) classifierParams$trainParams@tuneParams <- extraParams[["train"]][paramIndex]
+          else classifierParams$trainParams@tuneParams[parameterName] <- parameter # Multiple values, so tune them.
+        }
+      }
+    }
+    if(!is.null(extraParams) && "predict" %in% names(extraParams))
+    {
+      for(paramIndex in seq_along(extraParams[["predict"]]))
+      {
+        parameter <- extraParams[["predict"]][[paramIndex]]
+        parameterName <- names(extraParams[["predict"]])[paramIndex]
+        if(length(parameter) == 1)
+        {
+          if(is.null(classifierParams$predictParams@otherParams)) classifierParams$predictParams@otherParams <- extraParams[["predict"]][paramIndex]
+          else classifierParams$predictParams@otherParams[parameterName] <- parameter
+        } else {
+          if(is.null(classifierParams$predictParams@tuneParams)) classifierParams$predictParams@tuneParams <- extraParams[["predict"]][paramIndex]
+          else classifierParams$predictParams@tuneParams[parameterName] <- parameter # Multiple values, so tune them.
+        }
+      }
+    }
+    
     selectionMethod <- unlist(selectionMethod)
-
-    selectionMethod <- ifelse(is.null(selectionMethod), "none", selectionMethod)
+    if(is.null(selectionMethod)) selectionMethod <- "none"
 
     if(selectionMethod != "none")
-        selectParams <- SelectParams(selectionMethod,
-                        tuneParams = list(nFeatures = nFeatures, performanceType = performanceType))
-    else selectParams <- NULL
+    {
+      selectParams <- SelectParams(selectionMethod, tuneParams = list(nFeatures = nFeatures, performanceType = performanceType))
+      if(!is.null(extraParams) && "select" %in% names(extraParams))
+      {
+        for(paramIndex in seq_along(extraParams[["select"]]))
+        {
+          parameter <- extraParams[["select"]][[paramIndex]]
+          parameterName <- names(extraParams[["select"]])[paramIndex]
+          if(length(parameter) == 1)
+          {
+            if(is.null(selectParams@otherParams)) selectParams@otherParams <- extraParams[["select"]][paramIndex]
+            else selectParams@otherParams[parameterName] <- parameter
+          } else {
+            if(is.null(selectParams@tuneParams)) selectParams@tuneParams <- extraParams[["select"]][paramIndex]
+            else selectParams@tuneParams[parameterName] <- parameter # Multiple values, so tune them.
+          }
+        }
+      }
+    } else {selectParams <- NULL}
 
     params <- ModellingParams(
         balancing = "none",
@@ -617,7 +685,6 @@ generateModellingParams <- function(assayIDs,
     # }
     #
 
-
     params
 
 }
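
The overwrite-or-tune routing applied to each `extraParams` stage in generateModellingParams can be demonstrated with a self-contained base-R sketch (the helper name is hypothetical, not a ClassifyR function): a length-1 list element becomes a fixed parameter destined for `otherParams`, while a multi-valued element becomes a tuning grid destined for `tuneParams`.

```r
# Plain-R sketch of the routing rule; routeStageParams is a hypothetical
# helper for illustration only.
routeStageParams <- function(stageParams) {
  list(otherParams = stageParams[lengths(stageParams) == 1],  # fixed values
       tuneParams  = stageParams[lengths(stageParams) > 1])   # candidate values to tune over
}

str(routeStageParams(list(mTry = 100, minNodeSize = c(5, 10, 20))))
# mTry is fixed; minNodeSize supplies a grid of values to tune.
```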
@@ -632,7 +699,7 @@ generateMultiviewParams <- function(assayIDs,
                                     selectionOptimisation,
                                     performanceType,
                                     classifier,
-                                    multiViewMethod){
+                                    multiViewMethod, extraParams){
 
     if(multiViewMethod == "merge"){
 
@@ -651,18 +718,21 @@ generateMultiviewParams <- function(assayIDs,
                                      selectionOptimisation = selectionOptimisation,
                                      performanceType = performanceType,
                                      classifier = classifier,
-                                     multiViewMethod = "none"),
+                                     multiViewMethod = "none",
+                                     extraParams = extraParams),
                                  SIMPLIFY = FALSE)
 
-        # Generate some params for merged model.
+        # Generate some params for merged model. Which ones?
+        # Reconsider how to do this well later. 
         params <- generateModellingParams(assayIDs = assayIDs,
                                           measurements = measurements,
                                           nFeatures = nFeatures,
-                                          selectionMethod = selectionMethod,
+                                          selectionMethod = selectionMethod[[1]],
                                           selectionOptimisation = "none",
                                           performanceType = performanceType,
-                                          classifier = classifier,
-                                          multiViewMethod = "none")
+                                          classifier = classifier[[1]],
+                                          multiViewMethod = "none",
+                                          extraParams = extraParams)
 
         # Update selectParams to use
         params@selectParams <- SelectParams("selectMulti",
@@ -690,36 +760,8 @@ generateMultiviewParams <- function(assayIDs,
                                  MoreArgs = list(
                                      selectionOptimisation = selectionOptimisation,
                                      performanceType = performanceType,
-                                     multiViewMethod = "none"),
-                                 SIMPLIFY = FALSE)
-
-
-        params <- ModellingParams(
-            balancing = "none",
-            selectParams = NULL,
-            trainParams = TrainParams(prevalTrainInterface, params = paramsAssays, characteristics = paramsAssays$clinical@trainParams@characteristics),
-            predictParams = PredictParams(prevalPredictInterface, characteristics = paramsAssays$clinical@predictParams@characteristics)
-        )
-
-        return(params)
-    }
-
-    if(multiViewMethod == "prevalidation"){
-
-        # Split measurements up by assay.
-        assayTrain <- sapply(assayIDs, function(assayID) measurements[, S4Vectors::mcols(measurements)[["assay"]] %in% assayID], simplify = FALSE)
-
-        # Generate params for each assay. This could be extended to have different selectionMethods for each type
-        paramsAssays <- mapply(generateModellingParams,
-                                 nFeatures = nFeatures[assayIDs],
-                                 selectionMethod = selectionMethod[assayIDs],
-                                 assayIDs = assayIDs,
-                                 measurements = assayTrain[assayIDs],
-                                 classifier = classifier[assayIDs],
-                                 MoreArgs = list(
-                                     selectionOptimisation = selectionOptimisation,
-                                     performanceType = performanceType,
-                                     multiViewMethod = "none"),
+                                     multiViewMethod = "none",
+                                     extraParams = extraParams),
                                  SIMPLIFY = FALSE)
 
 
@@ -733,7 +775,6 @@ generateMultiviewParams <- function(assayIDs,
         return(params)
     }
 
-
     if(multiViewMethod == "PCA"){
 
         # Split measurements up by assay.
@@ -748,7 +789,8 @@ generateMultiviewParams <- function(assayIDs,
                                  classifier = classifier["clinical"],
                                  selectionOptimisation = selectionOptimisation,
                                  performanceType = performanceType,
-                                 multiViewMethod = "none"))
+                                 multiViewMethod = "none",
+                                 extraParams = extraParams))
 
 
         params <- ModellingParams(
@@ -775,7 +817,7 @@ CV <- function(measurements, outcome, x, outcomeTrain, measurementsTest, outcome
                nFolds,
                nRepeats,
                nCores,
-               characteristicsLabel)
+               characteristicsLabel, extraParams)
 
 {
     # Which data-types or data-views are present?
@@ -797,7 +839,7 @@ CV <- function(measurements, outcome, x, outcomeTrain, measurementsTest, outcome
                                                selectionOptimisation = selectionOptimisation,
                                                performanceType = performanceType,
                                                classifier = classifier,
-                                               multiViewMethod = multiViewMethod)
+                                               multiViewMethod = multiViewMethod, extraParams = extraParams)
     
     if(length(assayIDs) > 1 || length(assayIDs) == 1 && assayIDs != 1) assayText <- assayIDs else assayText <- NULL
     characteristics <- S4Vectors::DataFrame(characteristic = c(if(!is.null(assayText)) "Assay Name" else NULL, "Classifier Name", "Selection Name", "multiViewMethod", "characteristicsLabel"), value = c(if(!is.null(assayText)) paste(assayText, collapse = ", ") else NULL, paste(classifier, collapse = ", "),  paste(selectionMethod, collapse = ", "), multiViewMethod, characteristicsLabel))
@@ -848,16 +890,12 @@ train.data.frame <- function(x, outcomeTrain, ...)
 #' @method train DataFrame
 #' @export
 train.DataFrame <- function(x, outcomeTrain, selectionMethod = "auto", nFeatures = 20, classifier = "auto", performanceType = "auto",
-                            multiViewMethod = "none", assayIDs = "all", ...) # ... for prepareData.
+                            multiViewMethod = "none", assayIDs = "all", extraParams = NULL)
                    {
-              prepArgs <- list(x, outcomeTrain)
-              extraInputs <- list(...)
-              prepExtras <- numeric()
-              if(length(extraInputs) > 0)
-                prepExtras <- which(names(extraInputs) %in% .ClassifyRenvir[["prepareDataFormals"]])
-              if(length(prepExtras) > 0)
-                prepArgs <- append(prepArgs, extraInputs[prepExtras])
-              measurementsAndOutcome <- do.call(prepareData, prepArgs)
+              prepParams <- list(x, outcomeTrain)
+              if(!is.null(extraParams) && "prepare" %in% names(extraParams))
+                prepParams <- c(prepParams, extraParams[["prepare"]])
+              measurementsAndOutcome <- do.call(prepareData, prepParams)
               
               # Ensure performance type is one of the ones that can be calculated by the package.
               if(!performanceType %in% c("auto", .ClassifyRenvir[["performanceTypes"]]))
... ...
@@ -895,13 +933,46 @@ train.DataFrame <- function(x, outcomeTrain, selectionMethod = "auto", nFeatures
                                   
                                   modellingParams <- generateModellingParams(assayIDs = assayIDs, measurements = measurements, nFeatures = nFeatures,
                                                      selectionMethod = selectionMethod, selectionOptimisation = "Resubstitution", performanceType = performanceType,
-                                                     classifier = classifier, multiViewMethod = "none")
+                                                     classifier = classifier, multiViewMethod = "none", extraParams = extraParams)
                                   topFeatures <- .doSelection(measurementsUse, outcomeTrain, CrossValParams(), modellingParams, verbose = 0)
                                   selectedFeaturesIndices <- topFeatures[[2]] # Extract for subsetting.
                                   tuneDetailsSelect <- topFeatures[[3]]
                                   measurementsUse <- measurementsUse[, selectedFeaturesIndices]
 
                                   classifierParams <- .classifierKeywordToParams(classifierForAssay)
+                                  if(!is.null(extraParams) && "train" %in% names(extraParams))
+                                  {
+                                    for(paramIndex in seq_along(extraParams[["train"]]))
+                                    {
+                                      parameter <- extraParams[["train"]][[paramIndex]]
+                                      parameterName <- names(extraParams[["train"]])[paramIndex]
+                                      if(length(parameter) == 1)
+                                      {
+                                        if(is.null(classifierParams$trainParams@otherParams)) classifierParams$trainParams@otherParams <- extraParams[["train"]][paramIndex]
+                                        else classifierParams$trainParams@otherParams[parameterName] <- parameter
+                                      } else {
+                                        if(is.null(classifierParams$trainParams@tuneParams)) classifierParams$trainParams@tuneParams <- extraParams[["train"]][paramIndex]
+                                        else classifierParams$trainParams@tuneParams[parameterName] <- parameter # Multiple values, so tune them.
+                                      }
+                                    }
+                                  }
+                                  if(!is.null(extraParams) && "predict" %in% names(extraParams))
+                                  {
+                                    for(paramIndex in seq_along(extraParams[["predict"]]))
+                                    {
+                                      parameter <- extraParams[["predict"]][[paramIndex]]
+                                      parameterName <- names(extraParams[["predict"]])[paramIndex]
+                                      if(length(parameter) == 1)
+                                      {
+                                        if(is.null(classifierParams$predictParams@otherParams)) classifierParams$predictParams@otherParams <- extraParams[["predict"]][paramIndex]
+                                        else classifierParams$predictParams@otherParams[parameterName] <- parameter
+                                      } else {
+                                        if(is.null(classifierParams$predictParams@tuneParams)) classifierParams$predictParams@tuneParams <- extraParams[["predict"]][paramIndex]
+                                        else classifierParams$predictParams@tuneParams[parameterName] <- parameter # Multiple values, so tune them.
+                                      }
+                                    }
+                                  }
+                                  
                                   modellingParams <- ModellingParams(balancing = "none", selectParams = NULL,
                                                                      trainParams = classifierParams$trainParams, predictParams = classifierParams$predictParams)
                                   if(!is.null(tuneDetailsSelect))
... ...
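The two code paths above route the elements of extraParams either into prepareData (the "prepare" element) or into the classifier's train and predict parameter sets, tuning over any element that holds more than one value. A hedged sketch of what a caller might pass; the parameter names inside each element are hypothetical placeholders, since valid names depend on the chosen classifier and on prepareData:

```r
# Hypothetical extraParams list; treat the inner parameter names as placeholders.
extraParams <- list(
  prepare = list(maxMissingProp = 0.1),          # forwarded to prepareData
  train = list(mTryProportion = 0.5,             # single value: overrides the default
               num.trees = c(100, 500, 1000)),   # multiple values: tuned over
  predict = list(returnType = "both")            # single value: overrides the default
)
# model <- train(measurements, classes, classifier = "randomForest", extraParams = extraParams)
```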
@@ -1079,6 +1150,6 @@ predict.trainedByClassifyR <- function(object, newData, ...)
     predictFunctionUse <- attr(object, "predictFunction")
     class(object) <- rev(class(object)) # Now want the predict method of the specific model to be picked, so put model class first.
     if (is(object, "listOfModels")) 
-         mapply(function(model, assay) predictFunctionUse(model, assay), object, newData, SIMPLIFY = FALSE)
-    else predictFunctionUse(object, newData) # Object is itself a trained model and it is assumed that a predict method is defined for it.
+         mapply(function(model, assay, ...) predictFunctionUse(model, assay, ...), object, newData, MoreArgs = list(...), SIMPLIFY = FALSE)
+    else do.call(predictFunctionUse, list(object, newData, ...)) # Object is itself a trained model and it is assumed that a predict method is defined for it.
 }
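The updated dispatch forwards any arguments given to predict through to each underlying model's predict function. A minimal sketch of that forwarding pattern, using a hypothetical stand-in for the function stored in attr(object, "predictFunction") and toy model objects (none of these are the package's real predict functions):

```r
# Hypothetical stand-in predict function; real ones come from the trained model.
predictFunctionUse <- function(model, data, returnType = "class")
  rep(model$majorityClass, nrow(data))

models <- list(clinical = list(majorityClass = "Poor"),
               RNA = list(majorityClass = "Good"))
newData <- list(clinical = data.frame(age = c(55, 62)),
                RNA = data.frame(geneA = c(1.2, 3.4)))

# Each model is paired with its own assay; shared arguments such as returnType
# are forwarded to every call via MoreArgs, mirroring the mapply call above.
mapply(function(model, assay, ...) predictFunctionUse(model, assay, ...),
       models, newData, MoreArgs = list(returnType = "class"), SIMPLIFY = FALSE)
```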
... ...
@@ -656,3 +656,26 @@ predict.dlda <- function(object, newdata, ...) { # Remove once sparsediscrim is
 .dmvnorm_diag <- function(x, mean, sigma) { # Remove once sparsediscrim is reinstated to CRAN.
   exp(sum(dnorm(x, mean=mean, sd=sqrt(sigma), log=TRUE)))
 }
+
+# Function to create permutations of a vector, with the possibility to restrict values at certain positions.
+# The fixed parameter is a data frame whose first column is a position and whose second column is the value required at that position.
+.permutations <- function(data, fixed = NULL)
+{
+  items <- length(data)
+  multipliedTo1 <- factorial(items)
+  if(items > 1) 
+    permutations <- structure(vapply(seq_along(data), function(index)
+                     rbind(data[index], .permutations(data[-index])), 
+                     data[rep(1L, multipliedTo1)]), dim = c(items, multipliedTo1))
+  else permutations <- data
+  
+  if(!is.null(fixed))
+  {
+    for(rowIndex in seq_len(nrow(fixed)))
+    {
+      keepColumns <- permutations[fixed[rowIndex, 1], ] == fixed[rowIndex, 2]
+      permutations <- permutations[, keepColumns, drop = FALSE] # drop = FALSE keeps matrix structure if one permutation remains.
+    }
+  }
+  permutations
+}
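A short sketch of how the new private .permutations utility might be called, assuming the function above has been loaded into the session:

```r
# All permutations of three items: one permutation per column of the result.
.permutations(1:3)

# Restrict position 1 to the value 2; only permutations starting with 2 are kept.
.permutations(1:3, fixed = data.frame(position = 1, value = 2))
```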
... ...
@@ -32,7 +32,7 @@ crossValidate(measurements, outcome, ...)
   nRepeats = 20,
   nCores = 1,
   characteristicsLabel = NULL,
-  ...
+  extraParams = NULL
 )
 
 \S4method{crossValidate}{MultiAssayExperiment}(
... ...
@@ -49,7 +49,7 @@ crossValidate(measurements, outcome, ...)
   nRepeats = 20,
   nCores = 1,
   characteristicsLabel = NULL,
-  ...
+  extraParams = NULL
 )
 
 \S4method{crossValidate}{data.frame}(
... ...
@@ -66,7 +66,7 @@ crossValidate(measurements, outcome, ...)
   nRepeats = 20,
   nCores = 1,
   characteristicsLabel = NULL,
-  ...
+  extraParams = NULL
 )
 
 \S4method{crossValidate}{matrix}(
... ...
@@ -83,7 +83,7 @@ crossValidate(measurements, outcome, ...)
   nRepeats = 20,
   nCores = 1,
   characteristicsLabel = NULL,
-  ...
+  extraParams = NULL
 )
 
 \S4method{crossValidate}{list}(
... ...
@@ -100,7 +100,7 @@ crossValidate(measurements, outcome, ...)
   nRepeats = 20,
   nCores = 1,
   characteristicsLabel = NULL,
-  ...
+  extraParams = NULL
 )
 
 \method{train}{matrix}(x, outcomeTrain, ...)
... ...
@@ -116,7 +116,7 @@ crossValidate(measurements, outcome, ...)
   performanceType = "auto",
   multiViewMethod = "none",
   assayIDs = "all",
-  ...
+  extraParams = NULL
 )
 
 \method{train}{list}(x, outcomeTrain, ...)
... ...
@@ -127,7 +127,7 @@ crossValidate(measurements, outcome, ...)
 }
 \arguments{
 \item{measurements}{Either a \code{\link{DataFrame}}, \code{\link{data.frame}}, \code{\link{matrix}}, \code{\link{MultiAssayExperiment}} 
-or a list of these objects containing the data.}
+or a list of the basic tabular objects containing the data.}
 
 \item{outcome}{A vector of class labels of class \code{\link{factor}} of the
 same length as the number of samples in \code{measurements} or a character vector of length 1 containing the
... ...
@@ -136,7 +136,7 @@ length 2 or 3 specifying the time and event columns in \code{measurements} for s
 \code{\link{MultiAssayExperiment}}, the column name(s) in \code{colData(measurements)} representing the outcome.  If column names
 of survival information, time must be in first column and event status in the second.}
 
-\item{...}{Parameters passed into \code{\link{prepareData}} which control subsetting and filtering of input data.}
+\item{...}{For the \code{train} and \code{predict} functions, parameters not used by the non-DataFrame methods but passed into the DataFrame method.}
 
 \item{nFeatures}{The number of features to be used for classification. If this is a single number, the same number of features will be used for all comparisons
 or assays. If a numeric vector these will be optimised over using \code{selectionOptimisation}. If a named vector with the same names of multiple assays, 
... ...
@@ -144,7 +144,7 @@ a different number of features will be used for each assay. If a named list of v
 Set to NULL or "all" if all features should be used.}
 
 \item{selectionMethod}{Default: \code{"auto"}. A character vector of feature selection methods to compare. If a named character vector with names corresponding to different assays, 
-and performing multiview classification, the respective classification methods will be used on each assay. If \code{"auto"}, t-test (two categories) / F-test (three or more categories) ranking
+and performing multiview classification, the respective selection methods will be used on each assay. If \code{"auto"}, t-test (two categories) / F-test (three or more categories) ranking
 and top \code{nFeatures} optimisation is done. Otherwise, the ranking method is per-feature Cox proportional hazards p-value.}
 
 \item{selectionOptimisation}{A character of "Resubstitution", "Nested CV" or "none" specifying the approach used to optimise \code{nFeatures}.}
... ...
@@ -155,7 +155,7 @@ and top \code{nFeatures} optimisation is done. Otherwise, the ranking method is
 and performing multiview classification, the respective classification methods will be used on each assay. If \code{"auto"}, then a random forest is used for a classification
 task or Cox proportional hazards model for a survival task.}
 
-\item{multiViewMethod}{A character vector specifying the multiview method or data integration approach to use.}
+\item{multiViewMethod}{Default: \code{"none"}. A character vector specifying the multiview method or data integration approach to use. See \code{available("multiViewMethod")} for possibilities.}
 
 \item{assayCombinations}{A character vector or list of character vectors proposing the assays or, in the case of a list, combination of assays to use
 with each element being a vector of assays to combine. Special value \code{"all"} means all possible subsets of assays.}
... ...
@@ -168,6 +168,10 @@ with each element being a vector of assays to combine. Special value \code{"all"
 
 \item{characteristicsLabel}{A character specifying an additional label for the cross-validation run.}
 
+\item{extraParams}{A list of parameters that will be used to overwrite default settings of the transformation, selection, or model-building functions, or
+parameters which will be passed into the data cleaning function. The name of each list element must be one of \code{"prepare"},
+\code{"select"}, \code{"train"} or \code{"predict"}.}
+
 \item{x}{Same as \code{measurements} but only training samples.}
 
 \item{outcomeTrain}{For the \code{train} function, either a factor vector of classes, a \code{\link{Surv}} object, or
... ...
@@ -190,8 +194,10 @@ An object of class \code{\link{ClassifyResult}}
 }
 \description{
 This function has been designed to facilitate the comparison of classification
-methods using cross-validation. A selection of typical comparisons are implemented. The \code{train} function
-is a convenience method for training on one data set and predicting on an independent validation data set.
+methods using cross-validation, particularly when there are multiple assays per biological unit.
+A selection of typical comparisons is implemented. The \code{train} function
+is a convenience method for training on one data set and, likewise, \code{predict} for predicting on an
+independent validation data set.
 }
 \details{
 \code{classifier} can be a keyword for any of the implemented approaches, as shown by \code{available()}.