@@ -1,8 +1,24 @@
 <!DOCTYPE html>
-<!-- Generated by pkgdown: do not edit by hand --><html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"><meta name="description" content="ClassifyR"><title>An Introduction to **ClassifyR** • ClassifyR</title><script src="../deps/jquery-3.6.0/jquery-3.6.0.min.js"></script><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"><link href="../deps/bootstrap-5.1.3/bootstrap.min.css" rel="stylesheet"><script src="../deps/bootstrap-5.1.3/bootstrap.bundle.min.js"></script><!-- Font Awesome icons --><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.12.1/css/all.min.css" integrity="sha256-mmgLkCYLUQbXn0B1SRqzHar6dCnv9oZFPEC1g1cwlkk=" crossorigin="anonymous"><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.12.1/css/v4-shims.min.css" integrity="sha256-wZjR52fzng1pJHwx4aV2AO3yyTOXrcDW7jBpJtTwVxw=" crossorigin="anonymous"><!-- bootstrap-toc --><script src="https://cdn.rawgit.com/afeld/bootstrap-toc/v1.0.1/dist/bootstrap-toc.min.js"></script><!-- headroom.js --><script src="https://cdnjs.cloudflare.com/ajax/libs/headroom/0.11.0/headroom.min.js" integrity="sha256-AsUX4SJE1+yuDu5+mAVzJbuYNPHj/WroHuZ8Ir/CkE0=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/headroom/0.11.0/jQuery.headroom.min.js" integrity="sha256-ZX/yNShbjqsohH1k95liqY9Gd8uOiE1S4vZc+9KQ1K4=" crossorigin="anonymous"></script><!-- clipboard.js --><script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.6/clipboard.min.js" integrity="sha256-inc5kl9MA1hkeYUt+EC3BhlIgyp/2jDIyBLS6k3UxPI=" crossorigin="anonymous"></script><!-- search --><script src="https://cdnjs.cloudflare.com/ajax/libs/fuse.js/6.4.6/fuse.js" integrity="sha512-zv6Ywkjyktsohkbp9bb45V6tEMoWhzFzXis+LrMehmJZZSys19Yxf1dopHx7WzIKxr5tK2dVcYmaCk2uqdjF4A==" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/autocomplete.js/0.38.0/autocomplete.jquery.min.js" integrity="sha512-GU9ayf+66Xx2TmpxqJpliWbT5PiGYxpaG8rfnBEk1LL8l1KGkRShhngwdXK1UgqhAzWpZHSiYPc09/NwDQIGyg==" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/mark.js/8.11.1/mark.min.js" integrity="sha512-5CYOlHXGh6QpOFA/TeTylKLWfB3ftPsde7AnmhuitiTX4K5SqCLBeKro6sPS8ilsz1Q4NRx3v8Ko2IBiszzdww==" crossorigin="anonymous"></script><!-- pkgdown --><script src="../pkgdown.js"></script><meta property="og:title" content="An Introduction to **ClassifyR**"><meta property="og:description" content="ClassifyR"><!-- mathjax --><script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js" integrity="sha256-nvJJv9wWKEm88qvoQl9ekL2J+k/RWIsaSScxxlsrv8k=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/config/TeX-AMS-MML_HTMLorMML.js" integrity="sha256-84DKXVJXs0/F8OTMzX4UR909+jtl4G7SPypPavF+GfA=" crossorigin="anonymous"></script><!--[if lt IE 9]>
+<!-- Generated by pkgdown: do not edit by hand --><html lang="en">
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+<meta charset="utf-8">
+<meta http-equiv="X-UA-Compatible" content="IE=edge">
+<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
+<meta name="description" content="ClassifyR">
+<title>An Introduction to ClassifyR</title>
+<script src="../deps/jquery-3.6.0/jquery-3.6.0.min.js"></script><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
+<link href="../deps/bootstrap-5.1.3/bootstrap.min.css" rel="stylesheet">
+<script src="../deps/bootstrap-5.1.3/bootstrap.bundle.min.js"></script><!-- Font Awesome icons --><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.12.1/css/all.min.css" integrity="sha256-mmgLkCYLUQbXn0B1SRqzHar6dCnv9oZFPEC1g1cwlkk=" crossorigin="anonymous">
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.12.1/css/v4-shims.min.css" integrity="sha256-wZjR52fzng1pJHwx4aV2AO3yyTOXrcDW7jBpJtTwVxw=" crossorigin="anonymous">
+<!-- bootstrap-toc --><script src="https://cdn.rawgit.com/afeld/bootstrap-toc/v1.0.1/dist/bootstrap-toc.min.js"></script><!-- headroom.js --><script src="https://cdnjs.cloudflare.com/ajax/libs/headroom/0.11.0/headroom.min.js" integrity="sha256-AsUX4SJE1+yuDu5+mAVzJbuYNPHj/WroHuZ8Ir/CkE0=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/headroom/0.11.0/jQuery.headroom.min.js" integrity="sha256-ZX/yNShbjqsohH1k95liqY9Gd8uOiE1S4vZc+9KQ1K4=" crossorigin="anonymous"></script><!-- clipboard.js --><script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.6/clipboard.min.js" integrity="sha256-inc5kl9MA1hkeYUt+EC3BhlIgyp/2jDIyBLS6k3UxPI=" crossorigin="anonymous"></script><!-- search --><script src="https://cdnjs.cloudflare.com/ajax/libs/fuse.js/6.4.6/fuse.js" integrity="sha512-zv6Ywkjyktsohkbp9bb45V6tEMoWhzFzXis+LrMehmJZZSys19Yxf1dopHx7WzIKxr5tK2dVcYmaCk2uqdjF4A==" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/autocomplete.js/0.38.0/autocomplete.jquery.min.js" integrity="sha512-GU9ayf+66Xx2TmpxqJpliWbT5PiGYxpaG8rfnBEk1LL8l1KGkRShhngwdXK1UgqhAzWpZHSiYPc09/NwDQIGyg==" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/mark.js/8.11.1/mark.min.js" integrity="sha512-5CYOlHXGh6QpOFA/TeTylKLWfB3ftPsde7AnmhuitiTX4K5SqCLBeKro6sPS8ilsz1Q4NRx3v8Ko2IBiszzdww==" crossorigin="anonymous"></script><!-- pkgdown --><script src="../pkgdown.js"></script><meta property="og:title" content="An Introduction to ClassifyR">
+<meta property="og:description" content="ClassifyR">
+<!-- mathjax --><script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js" integrity="sha256-nvJJv9wWKEm88qvoQl9ekL2J+k/RWIsaSScxxlsrv8k=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/config/TeX-AMS-MML_HTMLorMML.js" integrity="sha256-84DKXVJXs0/F8OTMzX4UR909+jtl4G7SPypPavF+GfA=" crossorigin="anonymous"></script><!--[if lt IE 9]>
 <script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
 <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
-<![endif]--></head><body>
+<![endif]-->
+</head>
+<body>
 <a href="#main" class="visually-hidden-focusable">Skip to contents</a>
 
 
@@ -10,7 +26,7 @@
 
 <a class="navbar-brand me-2" href="../index.html">ClassifyR</a>
 
- <small class="nav-text text-muted me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="">3.3.1</small>
+ <small class="nav-text text-muted me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="">3.3.2</small>
 
 
 <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
@@ -18,26 +34,23 @@
 </button>
 
 <div id="navbar" class="collapse navbar-collapse ms-3">
- <ul class="navbar-nav me-auto"><li class="active nav-item">
+ <ul class="navbar-nav me-auto">
+<li class="active nav-item">
 <a class="nav-link" href="../articles/ClassifyR.html">Get started</a>
 </li>
 <li class="nav-item">
 <a class="nav-link" href="../reference/index.html">Reference</a>
 </li>
-<li class="nav-item dropdown">
- <a href="#" class="nav-link dropdown-toggle" data-bs-toggle="dropdown" role="button" aria-expanded="false" aria-haspopup="true" id="dropdown-articles">Articles</a>
- <div class="dropdown-menu" aria-labelledby="dropdown-articles">
- <a class="dropdown-item" href="../articles/DevelopersGuide.html">**ClassifyR** Developer's Guide</a>
- <a class="dropdown-item" href="../articles/incorporateNew.html">Creating a Wrapper for New Functionality and Registering It</a>
- <a class="dropdown-item" href="../articles/introduction.html">Introduction to the Concepts of ClassifyR</a>
- <a class="dropdown-item" href="../articles/multiViewMethods.html">Multi-view Methods for Modelling of Multiple Data Views</a>
- <a class="dropdown-item" href="../articles/performanceEvaluation.html">Performance Evaluation of Fitted Models</a>
- </div>
+<li class="nav-item">
+ <a class="nav-link" href="../articles/index.html">Articles</a>
 </li>
- </ul><form class="form-inline my-2 my-lg-0" role="search">
- <input type="search" class="form-control me-sm-2" aria-label="Toggle navigation" name="search-input" data-search-index="../search.json" id="search-input" placeholder="Search for" autocomplete="off"></form>
+ </ul>
+<form class="form-inline my-2 my-lg-0" role="search">
+ <input type="search" class="form-control me-sm-2" aria-label="Toggle navigation" name="search-input" data-search-index="../search.json" id="search-input" placeholder="Search for" autocomplete="off">
+</form>
 
- <ul class="navbar-nav"></ul></div>
+ <ul class="navbar-nav"></ul>
+</div>
 
 
 </div>
@@ -45,13 +58,10 @@
 
 
 
-
-<div class="row">
+<script src="ClassifyR_files/accessible-code-block-0.0.1/empty-anchor.js"></script><div class="row">
 <main id="main" class="col-md-9"><div class="page-header">
- <img src="" class="logo" alt=""><h1>An Introduction to **ClassifyR**</h1>
- <h4 data-toc-skip class="author">Dario Strbenac,
-Ellis Patrick, Graham Mann, Jean Yang, John Ormerod <br> The University
-of Sydney, Australia.</h4>
+ <img src="" class="logo" alt=""><h1>An Introduction to ClassifyR</h1>
+ <h4 data-toc-skip class="author">Dario Strbenac, Ellis Patrick, Graham Mann, Jean Yang, John Ormerod <br> The University of Sydney, Australia.</h4>
 
 
 
@@ -60,254 +70,166 @@ of Sydney, Australia.</h4>
 
 
 
-<div id="installation" class="section level2">
-<h2>Installation</h2>
-<p>Typically, each feature selection method or classifier originates
-from a different R package, which <strong>ClassifyR</strong> provides a
-wrapper around. By default, only high-performance t-test/F-test and
-random forest are installed. If you intend to compare between numerous
-different modelling methods, you should install all suggested packages
-at once by using the command
-<code>BiocManager::install("ClassifyR", dependencies = TRUE)</code>.
-This will take a few minutes, particularly on Linux, because each
-package will be compiled from source code.</p>
+<div class="section level2">
+<h2 id="installation">Installation<a class="anchor" aria-label="anchor" href="#installation"></a>
+</h2>
+<p>Typically, each feature selection method or classifier originates from a different R package, which <strong>ClassifyR</strong> provides a wrapper around. By default, only high-performance t-test/F-test and random forest are installed. If you intend to compare between numerous different modelling methods, you should install all suggested packages at once by using the command <code>BiocManager::install("ClassifyR", dependencies = TRUE)</code>. This will take a few minutes, particularly on Linux, because each package will be compiled from source code.</p>
 </div>
-<div id="overview" class="section level2">
-<h2>Overview</h2>
-<p><strong>ClassifyR</strong> provides a structured pipeline for
-cross-validated classification. Classification is viewed in terms of
-four stages, data transformation, feature selection, classifier
-training, and prediction. The driver functions <em>crossValidate</em>
-and <em>runTests</em> implements varieties of cross-validation. They
-are:</p>
+<div class="section level2">
+<h2 id="overview">Overview<a class="anchor" aria-label="anchor" href="#overview"></a>
+</h2>
+<p><strong>ClassifyR</strong> provides a structured pipeline for cross-validated classification. Classification is viewed in terms of four stages, data transformation, feature selection, classifier training, and prediction. The driver functions <em>crossValidate</em> and <em>runTests</em> implements varieties of cross-validation. They are:</p>
 <ul>
-<li>Permutation of the order of samples followed by k-fold
-cross-validation (runTests only)</li>
+<li>Permutation of the order of samples followed by k-fold cross-validation (runTests only)</li>
 <li>Repeated x% test set cross-validation</li>
 <li>leave-k-out cross-validation</li>
 </ul>
-<p>Driver functions can use parallel processing capabilities in R to
-speed up cross-validations when many CPUs are available. The output of
-the driver functions is a <em>ClassifyResult</em> object which can be
-directly used by the performance evaluation functions. The process of
-classification is summarised by a flowchart.</p>
-<img src="" style="margin-left: auto;margin-right: auto"/>
-<p>Importantly, ClassifyR implements a number of methods for
-classification using different kinds of changes in measurements between
-classes. Most classifiers work with features where the means are
-different. In addition to changes in means (DM),
-<strong>ClassifyR</strong> also allows for classification using
-differential variability (DV; changes in scale) and differential
-distribution (DD; changes in location and/or scale).</p>
-<div id="case-study-diagnosing-asthma" class="section level3">
-<h3>Case Study: Diagnosing Asthma</h3>
-<p>To demonstrate some key features of ClassifyR, a data set consisting
-of the 2000 most variably expressed genes and 190 people will be used to
-quickly obtain results. The journal article corresponding to the data
-set was published in <em>Scientific Reports</em> in 2018 and is titled
-<a href="http://www.nature.com/articles/s41598-018-27189-4">A Nasal
-Brush-based Classifier of Asthma Identified by Machine Learning Analysis
-of Nasal RNA Sequence Data</a>.</p>
+<p>Driver functions can use parallel processing capabilities in R to speed up cross-validations when many CPUs are available. The output of the driver functions is a <em>ClassifyResult</em> object which can be directly used by the performance evaluation functions. The process of classification is summarised by a flowchart.</p>
+<img src="" style="margin-left: auto;margin-right: auto"><p>Importantly, ClassifyR implements a number of methods for classification using different kinds of changes in measurements between classes. Most classifiers work with features where the means are different. In addition to changes in means (DM), <strong>ClassifyR</strong> also allows for classification using differential variability (DV; changes in scale) and differential distribution (DD; changes in location and/or scale).</p>
+<div class="section level3">
+<h3 id="case-study-diagnosing-asthma">Case Study: Diagnosing Asthma<a class="anchor" aria-label="anchor" href="#case-study-diagnosing-asthma"></a>
+</h3>
+<p>To demonstrate some key features of ClassifyR, a data set consisting of the 2000 most variably expressed genes and 190 people will be used to quickly obtain results. The journal article corresponding to the data set was published in <em>Scientific Reports</em> in 2018 and is titled <a href="http://www.nature.com/articles/s41598-018-27189-4" class="external-link">A Nasal Brush-based Classifier of Asthma Identified by Machine Learning Analysis of Nasal RNA Sequence Data</a>.</p>
 <p>Load the package.</p>
-<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(ClassifyR)</span></code></pre></div>
-<pre><code>## Warning: multiple methods tables found for 'aperm'</code></pre>
-<pre><code>## Warning: replacing previous import 'BiocGenerics::aperm' by 'DelayedArray::aperm' when loading 'SummarizedExperiment'</code></pre>
+<div class="sourceCode" id="cb1"><pre class="downlit sourceCode r">
+<code class="sourceCode R"><span><span class="kw"><a href="https://rdrr.io/r/base/library.html" class="external-link">library</a></span><span class="op">(</span><span class="va"><a href="https://sydneybiox.github.io/ClassifyR/">ClassifyR</a></span><span class="op">)</span></span></code></pre></div>
+<pre><code><span><span class="co">## Warning: multiple methods tables found for 'aperm'</span></span></code></pre>
+<pre><code><span><span class="co">## Warning: replacing previous import 'BiocGenerics::aperm' by 'DelayedArray::aperm' when loading 'SummarizedExperiment'</span></span></code></pre>
 <p>A glimpse at the RNA measurements and sample classes.</p>
-<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="fu">data</span>(asthma) <span class="co"># Contains measurements and classes variables.</span></span>
-<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>measurements[<span class="dv">1</span><span class="sc">:</span><span class="dv">5</span>, <span class="dv">1</span><span class="sc">:</span><span class="dv">5</span>]</span></code></pre></div>
-<pre><code>## HBB BPIFA1 XIST FCGR3B HBA2
-## Sample 1 9.72 14.06 12.28 11.42 7.83
-## Sample 2 11.98 13.89 6.35 13.25 9.42
-## Sample 3 12.15 17.44 10.21 7.87 9.68
-## Sample 4 10.60 11.87 6.27 14.75 8.96
-## Sample 5 8.18 15.01 11.21 6.77 6.43</code></pre>
-<div class="sourceCode" id="cb6"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="fu">head</span>(classes)</span></code></pre></div>
-<pre><code>## [1] No No No No Yes No
-## Levels: No Yes</code></pre>
-<p>The numeric matrix variable <em>measurements</em> stores the
-normalised values of the RNA gene abundances for each sample and the
-factor vector <em>classes</em> identifies which class the samples belong
-to. The measurements were normalised using <strong>DESeq2</strong>’s
-<em>varianceStabilizingTransformation</em> function, which produces
-<span class="math inline">\(log_2\)</span>-like data.</p>
-<p>For more complex data sets with multiple kinds of experiments
-(e.g. DNA methylation, copy number, gene expression on the same set of
-samples) a <a
-href="https://bioconductor.org/packages/release/bioc/html/MultiAssayExperiment.html"><em>MultiAssayExperiment</em></a>
-is recommended for data storage and supported by
-<strong>ClassifyR</strong>’s methods.</p>
+<div class="sourceCode" id="cb4"><pre class="downlit sourceCode r">
+<code class="sourceCode R"><span><span class="fu"><a href="https://rdrr.io/r/utils/data.html" class="external-link">data</a></span><span class="op">(</span><span class="va">asthma</span><span class="op">)</span> <span class="co"># Contains measurements and classes variables.</span></span>
+<span><span class="va">measurements</span><span class="op">[</span><span class="fl">1</span><span class="op">:</span><span class="fl">5</span>, <span class="fl">1</span><span class="op">:</span><span class="fl">5</span><span class="op">]</span></span></code></pre></div>
+<pre><code><span><span class="co">## HBB BPIFA1 XIST FCGR3B HBA2</span></span>
+<span><span class="co">## Sample 1 9.72 14.06 12.28 11.42 7.83</span></span>
+<span><span class="co">## Sample 2 11.98 13.89 6.35 13.25 9.42</span></span>
+<span><span class="co">## Sample 3 12.15 17.44 10.21 7.87 9.68</span></span>
+<span><span class="co">## Sample 4 10.60 11.87 6.27 14.75 8.96</span></span>
+<span><span class="co">## Sample 5 8.18 15.01 11.21 6.77 6.43</span></span></code></pre>
+<div class="sourceCode" id="cb6"><pre class="downlit sourceCode r">
+<code class="sourceCode R"><span><span class="fu"><a href="https://rdrr.io/r/utils/head.html" class="external-link">head</a></span><span class="op">(</span><span class="va">classes</span><span class="op">)</span></span></code></pre></div>
+<pre><code><span><span class="co">## [1] No No No No Yes No </span></span>
+<span><span class="co">## Levels: No Yes</span></span></code></pre>
+<p>The numeric matrix variable <em>measurements</em> stores the normalised values of the RNA gene abundances for each sample and the factor vector <em>classes</em> identifies which class the samples belong to. The measurements were normalised using <strong>DESeq2</strong>’s <em>varianceStabilizingTransformation</em> function, which produces <span class="math inline">\(log_2\)</span>-like data.</p>
+<p>For more complex data sets with multiple kinds of experiments (e.g. DNA methylation, copy number, gene expression on the same set of samples) a <a href="https://bioconductor.org/packages/release/bioc/html/MultiAssayExperiment.html" class="external-link"><em>MultiAssayExperiment</em></a> is recommended for data storage and supported by <strong>ClassifyR</strong>’s methods.</p>
 </div>
 </div>
-<div id="quick-start-crossvalidate-function" class="section level2">
-<h2>Quick Start: <em>crossValidate</em> Function</h2>
-<p>The <em>crossValidate</em> function offers a quick and simple way to
-start analysing a dataset in ClassifyR. It is a wrapper for
-<em>runTests</em>, the core model building and testing function of
-ClassifyR. <em>crossValidate</em> must be supplied with
-<em>measurements</em>, a simple tabular data container or a list-like
-structure of such related tabular data on common samples. The classes of
-it may be <em>matrix</em>, <em>data.frame</em>, <em>DataFrame</em>,
-<em>MultiAssayExperiment</em> or <em>list</em> of <em>data.frames</em>.
-For a dataset with <span class="math inline">\(n\)</span> observations
-and <span class="math inline">\(p\)</span> variables, the
-<em>crossValidate</em> function will accept inputs of the following
-shapes:</p>
-<table>
-<colgroup>
-<col width="25%" />
-<col width="37%" />
-<col width="37%" />
-</colgroup>
-<thead>
-<tr class="header">
+<div class="section level2">
+<h2 id="quick-start-crossvalidate-function">Quick Start: <em>crossValidate</em> Function<a class="anchor" aria-label="anchor" href="#quick-start-crossvalidate-function"></a>
+</h2>
+<p>The <em>crossValidate</em> function offers a quick and simple way to start analysing a dataset in ClassifyR. It is a wrapper for <em>runTests</em>, the core model building and testing function of ClassifyR. <em>crossValidate</em> must be supplied with <em>measurements</em>, a simple tabular data container or a list-like structure of such related tabular data on common samples. The classes of it may be <em>matrix</em>, <em>data.frame</em>, <em>DataFrame</em>, <em>MultiAssayExperiment</em> or <em>list</em> of <em>data.frames</em>. For a dataset with <span class="math inline">\(n\)</span> observations and <span class="math inline">\(p\)</span> variables, the <em>crossValidate</em> function will accept inputs of the following shapes:</p>
+<table class="table">
+<thead><tr class="header">
 <th>Data Type</th>
 <th align="center"><span class="math inline">\(n \times p\)</span></th>
 <th align="center"><span class="math inline">\(p \times n\)</span></th>
-</tr>
-</thead>
+</tr></thead>
 <tbody>
 <tr class="odd">
-<td><span
-style="font-family: 'Courier New', monospace;">matrix</span></td>
+<td><span style="font-family: 'Courier New', monospace;">matrix</span></td>
 <td align="center">✔</td>
 <td align="center"></td>
 </tr>
 <tr class="even">
-<td><span
-style="font-family: 'Courier New', monospace;">data.frame</span></td>
+<td><span style="font-family: 'Courier New', monospace;">data.frame</span></td>
 <td align="center">✔</td>
 <td align="center"></td>
 </tr>
 <tr class="odd">
-<td><span
-style="font-family: 'Courier New', monospace;">DataFrame</span></td>
+<td><span style="font-family: 'Courier New', monospace;">DataFrame</span></td>
 <td align="center">✔</td>
 <td align="center"></td>
 </tr>
 <tr class="even">
-<td><span
-style="font-family: 'Courier New', monospace;">MultiAssayExperiment</span></td>
+<td><span style="font-family: 'Courier New', monospace;">MultiAssayExperiment</span></td>
 <td align="center"></td>
 <td align="center">✔</td>
 </tr>
 <tr class="odd">
-<td><span
-style="font-family: 'Courier New', monospace;">list</span> of
-<span
-style="font-family: 'Courier New', monospace;">data.frame</span>s</td>
+<td>
+<span style="font-family: 'Courier New', monospace;">list</span> of <span style="font-family: 'Courier New', monospace;">data.frame</span>s</td>
 <td align="center">✔</td>
 <td align="center"></td>
 </tr>
 </tbody>
 </table>
-<p><em>crossValidate</em> must also be supplied with <em>outcome</em>,
-which represents the prediction to be made in a variety of possible
-ways.</p>
+<p><em>crossValidate</em> must also be supplied with <em>outcome</em>, which represents the prediction to be made in a variety of possible ways.</p>
 <ul>
-<li>A <em>factor</em> that contains the class label for each
-observation. <em>classes</em> must be of length <span
-class="math inline">\(n\)</span>.</li>
-<li>A <em>character</em> of length 1 that matches a column name in a
-data frame which holds the classes. The classes will automatically be
-removed before training is done.</li>
-<li>A <em>Surv</em> object of the same length as the number of samples
-in the data which contains information about the time and censoring of
-the samples.</li>
-<li>A <em>character</em> vector of length 2 or 3 that each match a
-column name in a data frame which holds information about the time and
-censoring of the samples. The time-to-event columns will automatically
-be removed before training is done.</li>
+<li>A <em>factor</em> that contains the class label for each observation. <em>classes</em> must be of length <span class="math inline">\(n\)</span>.</li>
+<li>A <em>character</em> of length 1 that matches a column name in a data frame which holds the classes. The classes will automatically be removed before training is done.</li>
+<li>A <em>Surv</em> object of the same length as the number of samples in the data which contains information about the time and censoring of the samples.</li>
+<li>A <em>character</em> vector of length 2 or 3 that each match a column name in a data frame which holds information about the time and censoring of the samples. The time-to-event columns will automatically be removed before training is done.</li>
 </ul>
-<p>The type of classifier used can be changed with the
-<em>classifier</em> argument. The default is a random forest, which
-seamlessly handles categorical and numerical data. A full list of
-classifiers can be seen by running <em>?crossValidate</em>. A feature
-selection step can be performed before classification using
-<em>nFeatures</em> and <em>selectionMethod</em>, which is a t-test by
-default. Similarly, the number of folds and number of repeats for cross
-validation can be changed with the <em>nFolds</em> and <em>nRepeats</em>
-arguments. If wanted, <em>nCores</em> can be specified to run the cross
-validation in parallel. To perform 5-fold cross-validation of a Support
-Vector Machine with 2 repeats:</p>
-<div class="sourceCode" id="cb8"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>result <span class="ot"><-</span> <span class="fu">crossValidate</span>(measurements, classes, <span class="at">classifier =</span> <span class="st">"SVM"</span>,</span>
-<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a> <span class="at">nFeatures =</span> <span class="dv">20</span>, <span class="at">nFolds =</span> <span class="dv">5</span>, <span class="at">nRepeats =</span> <span class="dv">2</span>, <span class="at">nCores =</span> <span class="dv">1</span>)</span></code></pre></div>
-<pre><code>## Processing sample set 10.</code></pre>
-<div class="sourceCode" id="cb10"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="fu">performancePlot</span>(result)</span></code></pre></div>
-<pre><code>## Warning in .local(results, ...): Balanced Accuracy not found in all elements of results. Calculating it now.</code></pre>
-<p><img src="ClassifyR_files/figure-html/unnamed-chunk-5-1.png" width="700" /></p>
-<div id="data-integration-with-crossvalidate" class="section level3">
-<h3>Data Integration with crossValidate</h3>
-<p><em>crossValidate</em> also allows data from multiple sources to be
-integrated into a single model. The integration method can be specified
-with <em>multiViewMethod</em> argument. In this example, suppose the
-first 10 variables in the asthma data set are from a certain source and
-the remaining 1990 variables are from a second source. To integrate
-multiple data sets, each variable must be labeled with the data set it
-came from. This is done in a different manner depending on the data type
-of <em>measurements</em>.</p>
-<p>If using Bioconductor’s <em>DataFrame</em>, this can be specified
-using <em>mcols</em>. In the column metadata, each feature must have an
-<em>assay</em> and a <em>feature</em> name.</p>
-<div class="sourceCode" id="cb12"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a>measurementsDF <span class="ot"><-</span> <span class="fu">DataFrame</span>(measurements)</span>
|
252
|
|
-<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="fu">mcols</span>(measurementsDF) <span class="ot"><-</span> <span class="fu">data.frame</span>(</span>
|
253
|
|
-<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a> <span class="at">assay =</span> <span class="fu">rep</span>(<span class="fu">c</span>(<span class="st">"assay_1"</span>, <span class="st">"assay_2"</span>), <span class="at">times =</span> <span class="fu">c</span>(<span class="dv">10</span>, <span class="dv">1990</span>)),</span>
|
254
|
|
-<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a> <span class="at">feature =</span> <span class="fu">colnames</span>(measurementsDF)</span>
|
255
|
|
-<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a>)</span>
|
256
|
|
-<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a></span>
|
257
|
|
-<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a>result <span class="ot"><-</span> <span class="fu">crossValidate</span>(measurementsDF, classes, <span class="at">classifier =</span> <span class="st">"SVM"</span>, <span class="at">nFolds =</span> <span class="dv">5</span>,</span>
|
258
|
|
-<span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a> <span class="at">nRepeats =</span> <span class="dv">3</span>, <span class="at">multiViewMethod =</span> <span class="st">"merge"</span>)</span></code></pre></div>
|
259
|
|
-<pre><code>## Processing sample set 10.
|
260
|
|
-## Processing sample set 10.
|
261
|
|
-## Processing sample set 10.</code></pre>
|
262
|
|
-<div class="sourceCode" id="cb14"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="fu">performancePlot</span>(result, <span class="at">characteristicsList =</span> <span class="fu">list</span>(<span class="at">x =</span> <span class="st">"Assay Name"</span>))</span></code></pre></div>
|
263
|
|
-<pre><code>## Warning in .local(results, ...): Balanced Accuracy not found in all elements of results. Calculating it now.</code></pre>
|
264
|
|
-<p><img src="ClassifyR_files/figure-html/unnamed-chunk-6-1.png" width="700" /></p>
|
265
|
|
-<p>If using a list of <em>data.frame</em>s, the name of each element in
|
266
|
|
-the list will be used as the assay name.</p>
|
267
|
|
-<div class="sourceCode" id="cb16"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Assigns first 10 variables to dataset_1, and the rest to dataset_2</span></span>
|
268
|
|
-<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>measurementsList <span class="ot"><-</span> <span class="fu">list</span>(</span>
|
269
|
|
-<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a> (measurements <span class="sc">|></span> <span class="fu">as.data.frame</span>())[<span class="dv">1</span><span class="sc">:</span><span class="dv">10</span>],</span>
|
270
|
|
-<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a> (measurements <span class="sc">|></span> <span class="fu">as.data.frame</span>())[<span class="dv">11</span><span class="sc">:</span><span class="dv">2000</span>]</span>
|
271
|
|
-<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a>)</span>
|
272
|
|
-<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a><span class="fu">names</span>(measurementsList) <span class="ot"><-</span> <span class="fu">c</span>(<span class="st">"assay_1"</span>, <span class="st">"assay_2"</span>)</span>
|
273
|
|
-<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a></span>
|
274
|
|
-<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a>result <span class="ot"><-</span> <span class="fu">crossValidate</span>(measurementsList, classes, <span class="at">classifier =</span> <span class="st">"SVM"</span>, <span class="at">nFolds =</span> <span class="dv">5</span>,</span>
|
275
|
|
-<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a> <span class="at">nRepeats =</span> <span class="dv">3</span>, <span class="at">multiViewMethod =</span> <span class="st">"merge"</span>)</span></code></pre></div>
|
276
|
|
-<pre><code>## Processing sample set 10.
|
277
|
|
-## Processing sample set 10.
|
278
|
|
-## Processing sample set 10.</code></pre>
|
279
|
|
-<div class="sourceCode" id="cb18"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="fu">performancePlot</span>(result, <span class="at">characteristicsList =</span> <span class="fu">list</span>(<span class="at">x =</span> <span class="st">"Assay Name"</span>))</span></code></pre></div>
|
280
|
|
-<pre><code>## Warning in .local(results, ...): Balanced Accuracy not found in all elements of results. Calculating it now.</code></pre>
|
281
|
|
-<p><img src="ClassifyR_files/figure-html/unnamed-chunk-7-1.png" width="700" /></p>
|
|
162
|
+<p>The type of classifier used can be changed with the <em>classifier</em> argument. The default is a random forest, which handles both categorical and numerical data. A full list of classifiers can be seen by running <em>?crossValidate</em>. A feature selection step can be performed before classification using <em>nFeatures</em> and <em>selectionMethod</em>, which is a t-test by default. Similarly, the number of folds and the number of repeats for cross-validation can be changed with the <em>nFolds</em> and <em>nRepeats</em> arguments. Optionally, <em>nCores</em> can be specified to run the cross-validation in parallel. To perform 5-fold cross-validation of a Support Vector Machine with 2 repeats, keeping the top 20 features:</p>
|
|
163
|
+<div class="sourceCode" id="cb8"><pre class="downlit sourceCode r">
|
|
164
|
+<code class="sourceCode R"><span><span class="va">result</span> <span class="op"><-</span> <span class="fu"><a href="../reference/crossValidate.html">crossValidate</a></span><span class="op">(</span><span class="va">measurements</span>, <span class="va">classes</span>, classifier <span class="op">=</span> <span class="st">"SVM"</span>,</span>
|
|
165
|
+<span> nFeatures <span class="op">=</span> <span class="fl">20</span>, nFolds <span class="op">=</span> <span class="fl">5</span>, nRepeats <span class="op">=</span> <span class="fl">2</span>, nCores <span class="op">=</span> <span class="fl">1</span><span class="op">)</span></span></code></pre></div>
|
|
166
|
+<pre><code><span><span class="co">## Processing sample set 10.</span></span></code></pre>
|
|
167
|
+<div class="sourceCode" id="cb10"><pre class="downlit sourceCode r">
|
|
168
|
+<code class="sourceCode R"><span><span class="fu"><a href="../reference/performancePlot.html">performancePlot</a></span><span class="op">(</span><span class="va">result</span><span class="op">)</span></span></code></pre></div>
|
|
169
|
+<pre><code><span><span class="co">## Warning in .local(results, ...): Balanced Accuracy not found in all elements of results. Calculating it now.</span></span></code></pre>
|
|
170
|
+<p><img src="ClassifyR_files/figure-html/unnamed-chunk-5-1.png" width="700"></p>
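<p>To make the <em>nFolds</em> and <em>nRepeats</em> arguments concrete, the sketch below shows one common way to build repeated k-fold splits in base R. The helper <code>makeRepeatedFolds</code> is hypothetical and is not part of <strong>ClassifyR</strong>. Its sampling scheme (for example, whether folds are stratified by class) may differ from what <em>crossValidate</em> does internally. With 5 folds and 2 repeats there are 5 × 2 = 10 train/test splits, which appears consistent with the "Processing sample set 10." message above.</p>

```r
# Hypothetical helper (not part of ClassifyR): assign nSamples samples to
# nFolds folds, reshuffling independently for each of nRepeats repetitions.
makeRepeatedFolds <- function(nSamples, nFolds, nRepeats) {
  lapply(seq_len(nRepeats), function(repetition) {
    # Shuffle the sample indices, then deal them out to the folds in turn,
    # so that fold sizes differ by at most one.
    shuffled <- sample(seq_len(nSamples))
    split(shuffled, rep(seq_len(nFolds), length.out = nSamples))
  })
}

set.seed(42)
folds <- makeRepeatedFolds(nSamples = 23, nFolds = 5, nRepeats = 2)

# 2 repeats x 5 folds = 10 train/test splits in total.
length(unlist(folds, recursive = FALSE))  # 10

# Fold sizes within one repetition: 5 5 5 4 4
sapply(folds[[1]], length)

# For the first split, one fold is the test set and the rest form the training set.
testIdx <- folds[[1]][[1]]
trainIdx <- setdiff(seq_len(23), testIdx)
```

<p>Each fold serves as the test set exactly once per repeat, and a classifier is trained on the remaining samples. Reshuffling for every repeat reduces the dependence of the performance estimate on one particular partition of the samples.</p>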
|
|
171
|
+<div class="section level3">
|
|
172
|
+<h3 id="data-integration-with-crossvalidate">Data Integration with crossValidate<a class="anchor" aria-label="anchor" href="#data-integration-with-crossvalidate"></a>
|
|
173
|
+</h3>
|
|
174
|
+<p><em>crossValidate</em> also allows data from multiple sources to be integrated into a single model. The integration method can be specified with the <em>multiViewMethod</em> argument. In this example, suppose the first 10 variables in the asthma data set come from one source and the remaining 1990 variables come from a second source. To integrate multiple data sets, each variable must be labeled with the data set it came from. How this is done depends on the data type of <em>measurements</em>.</p>
|
|
175
|
+<p>If using Bioconductor’s <em>DataFrame</em>, this can be specified using <em>mcols</em>. In the column metadata, each feature must have an <em>assay</em> and a <em>feature</em> name.</p>
|
|
176
|
+<div class="sourceCode" id="cb12"><pre class="downlit sourceCode r">
|
|
177
|
+<code class="sourceCode R"><span><span class="va">measurementsDF</span> <span class="op"><-</span> <span class="fu"><a href="https://rdrr.io/pkg/S4Vectors/man/DataFrame-class.html" class="external-link">DataFrame</a></span><span class="op">(</span><span class="va">measurements</span><span class="op">)</span></span>
|
|
178
|
+<span><span class="fu"><a href="https://rdrr.io/pkg/S4Vectors/man/Vector-class.html" class="external-link">mcols</a></span><span class="op">(</span><span class="va">measurementsDF</span><span class="op">)</span> <span class="op"><-</span> <span class="fu"><a href="https://rdrr.io/r/base/data.frame.html" class="external-link">data.frame</a></span><span class="op">(</span></span>
|
|
179
|
+<span> assay <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/rep.html" class="external-link">rep</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/c.html" class="external-link">c</a></span><span class="op">(</span><span class="st">"assay_1"</span>, <span class="st">"assay_2"</span><span class="op">)</span>, times <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html" class="external-link">c</a></span><span class="op">(</span><span class="fl">10</span>, <span class="fl">1990</span><span class="op">)</span><span class="op">)</span>,</span>
|
|
180
|
+<span> feature <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/colnames.html" class="external-link">colnames</a></span><span class="op">(</span><span class="va">measurementsDF</span><span class="op">)</span></span>
|
|
181
|
+<span><span class="op">)</span></span>
|
|
182
|
+<span></span>
|
|
183
|
+<span><span class="va">result</span> <span class="op"><-</span> <span class="fu"><a href="../reference/crossValidate.html">crossValidate</a></span><span class="op">(</span><span class="va">measurementsDF</span>, <span class="va">classes</span>, classifier <span class="op">=</span> <span class="st">"SVM"</span>, nFolds <span class="op">=</span> <span class="fl">5</span>,</span>
|
|
184
|
+<span> nRepeats <span class="op">=</span> <span class="fl">3</span>, multiViewMethod <span class="op">=</span> <span class="st">"merge"</span><span class="op">)</span></span></code></pre></div>
|
|
185
|
+<pre><code><span><span class="co">## Processing sample set 10.</span></span>
|
|
186
|
+<span><span class="co">## Processing sample set 10.</span></span>
|
|
187
|
+<span><span class="co">## Processing sample set 10.</span></span></code></pre>
|
|
188
|
+<div class="sourceCode" id="cb14"><pre class="downlit sourceCode r">
|
|
189
|
+<code class="sourceCode R"><span><span class="fu"><a href="../reference/performancePlot.html">performancePlot</a></span><span class="op">(</span><span class="va">result</span>, characteristicsList <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/list.html" class="external-link">list</a></span><span class="op">(</span>x <span class="op">=</span> <span class="st">"Assay Name"</span><span class="op">)</span><span class="op">)</span></span></code></pre></div>
|
|
190
|
+<pre><code><span><span class="co">## Warning in .local(results, ...): Balanced Accuracy not found in all elements of results. Calculating it now.</span></span></code></pre>
|
|
191
|
+<p><img src="ClassifyR_files/figure-html/unnamed-chunk-6-1.png" width="700"></p>
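<p>Before calling <em>crossValidate</em>, it can help to check that every feature has an assay label and that the labels follow the column order of the measurements. The sketch below uses a plain <em>data.frame</em>, named <code>featureInfo</code> purely for illustration, as a stand-in for the column metadata so that it runs without Bioconductor. With the real object, <code>mcols(measurementsDF)</code> should be usable in its place.</p>

```r
# Stand-in for the column metadata set with mcols() above (a plain
# data.frame, so no Bioconductor packages are needed for this check).
featureNames <- paste0("feature", 1:2000)
featureInfo <- data.frame(
  assay = rep(c("assay_1", "assay_2"), times = c(10, 1990)),
  feature = featureNames,
  stringsAsFactors = FALSE
)

# Every column needs an assay label, in the same order as the measurements.
stopifnot(nrow(featureInfo) == length(featureNames),
          !anyNA(featureInfo$assay),
          identical(featureInfo$feature, featureNames))

# Number of features attributed to each assay (assay_1: 10, assay_2: 1990).
counts <- lengths(split(featureInfo$feature, featureInfo$assay))
counts
```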
|
|
192
|
+<p>If using a list of <em>data.frame</em>s, the name of each element in the list will be used as the assay name.</p>
|
|
193
|
+<div class="sourceCode" id="cb16"><pre class="downlit sourceCode r">
|
|
194
|
+<code class="sourceCode R"><span><span class="co"># Assigns the first 10 variables to assay_1 and the rest to assay_2</span></span>
|
|
195
|
+<span><span class="va">measurementsList</span> <span class="op"><-</span> <span class="fu"><a href="https://rdrr.io/r/base/list.html" class="external-link">list</a></span><span class="op">(</span></span>
|
|
196
|
+<span> <span class="op">(</span><span class="va">measurements</span> <span class="op">|></span> <span class="fu"><a href="https://rdrr.io/r/base/as.data.frame.html" class="external-link">as.data.frame</a></span><span class="op">(</span><span class="op">)</span><span class="op">)</span><span class="op">[</span><span class="fl">1</span><span class="op">:</span><span class="fl">10</span><span class="op">]</span>,</span>
|
|
197
|
+<span> <span class="op">(</span><span class="va">measurements</span> <span class="op">|></span> <span class="fu"><a href="https://rdrr.io/r/base/as.data.frame.html" class="external-link">as.data.frame</a></span><span class="op">(</span><span class="op">)</span><span class="op">)</span><span class="op">[</span><span class="fl">11</span><span class="op">:</span><span class="fl">2000</span><span class="op">]</span></span>
|
|
198
|
+<span><span class="op">)</span></span>
|
|
199
|
+<span><span class="fu"><a href="https://rdrr.io/r/base/names.html" class="external-link">names</a></span><span class="op">(</span><span class="va">measurementsList</span><span class="op">)</span> <span class="op"><-</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html" class="external-link">c</a></span><span class="op">(</span><span class="st">"assay_1"</span>, <span class="st">"assay_2"</span><span class="op">)</span></span>
|
|
200
|
+<span></span>
|
|
201
|
+<span><span class="va">result</span> <span class="op"><-</span> <span class="fu"><a href="../reference/crossValidate.html">crossValidate</a></span><span class="op">(</span><span class="va">measurementsList</span>, <span class="va">classes</span>, classifier <span class="op">=</span> <span class="st">"SVM"</span>, nFolds <span class="op">=</span> <span class="fl">5</span>,</span>
|
|
202
|
+<span> nRepeats <span class="op">=</span> <span class="fl">3</span>, multiViewMethod <span class="op">=</span> <span class="st">"merge"</span><span class="op">)</span></span></code></pre></div>
|
|
203
|
+<pre><code><span><span class="co">## Processing sample set 10.</span></span>
|
|
204
|
+<span><span class="co">## Processing sample set 10.</span></span>
|
|
205
|
+<span><span class="co">## Processing sample set 10.</span></span></code></pre>
|
|
206
|
+<div class="sourceCode" id="cb18"><pre class="downlit sourceCode r">
|
|
207
|
+<code class="sourceCode R"><span><span class="fu"><a href="../reference/performancePlot.html">performancePlot</a></span><span class="op">(</span><span class="va">result</span>, characteristicsList <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/list.html" class="external-link">list</a></span><span class="op">(</span>x <span class="op">=</span> <span class="st">"Assay Name"</span><span class="op">)</span><span class="op">)</span></span></code></pre></div>
|
|
208
|
+<pre><code><span><span class="co">## Warning in .local(results, ...): Balanced Accuracy not found in all elements of results. Calculating it now.</span></span></code></pre>
|
|
209
|
+<p><img src="ClassifyR_files/figure-html/unnamed-chunk-7-1.png" width="700"></p>
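<p>The two input forms carry the same information. As an illustration only (this is not how <em>crossValidate</em> processes its input internally), the sketch below derives per-feature assay and feature labels from a named list of <em>data.frame</em>s, mirroring the <em>mcols</em> approach, and then recombines the columns. The objects <code>toyMeasurements</code> and <code>toyList</code> are made up for this example.</p>

```r
# Made-up toy data: 6 samples and 5 features.
set.seed(7)
toyMeasurements <- as.data.frame(matrix(rnorm(30), nrow = 6,
                                        dimnames = list(NULL, paste0("gene", 1:5))))

# Named list input: the element names act as the assay names.
toyList <- list(assay_1 = toyMeasurements[1:2], assay_2 = toyMeasurements[3:5])

# Equivalent per-feature labels, one row per column of the original data.
assayLabels <- rep(names(toyList), times = vapply(toyList, ncol, integer(1)))
featureLabels <- unlist(lapply(toyList, colnames), use.names = FALSE)
data.frame(assay = assayLabels, feature = featureLabels)

# Recombining the views restores the original columns; unname() stops
# cbind() from prefixing the column names with the list element names.
recombined <- do.call(cbind, unname(toyList))
identical(colnames(recombined), colnames(toyMeasurements))  # TRUE
```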
|
282
|
210
|
</div>
|
283
|
211
|
</div>
|
284
|
|
-<div id="a-more-detailed-look-at-classifyr" class="section level2">
|
285
|
|
-<h2>A More Detailed Look at ClassifyR</h2>
|
286
|
|
-<p>In the following sections, some of the most useful functions provided
|
287
|
|
-in <strong>ClassifyR</strong> will be demonstrated. However, a user
|
288
|
|
-could wrap any feature selection, training, or prediction function to
|
289
|
|
-the classification framework, as long as it meets some simple rules
|
290
|
|
-about the input and return parameters. See the appendix section of this
|
291
|
|
-guide titled “Rules for New Functions” for a description of these.</p>
|
292
|
|
-<div id="comparison-to-existing-classification-frameworks"
|
293
|
|
-class="section level3">
|
294
|
|
-<h3>Comparison to Existing Classification Frameworks</h3>
|
295
|
|
-<p>There are a few other frameworks for classification in R. The table
|
296
|
|
-below provides a comparison of which features they offer.</p>
|
297
|
|
-<table>
|
|
212
|
+<div class="section level2">
|
|
213
|
+<h2 id="a-more-detailed-look-at-classifyr">A More Detailed Look at ClassifyR<a class="anchor" aria-label="anchor" href="#a-more-detailed-look-at-classifyr"></a>
|
|
214
|
+</h2>
|
|
215
|
+<p>In the following sections, some of the most useful functions provided in <strong>ClassifyR</strong> will be demonstrated. However, a user could wrap any feature selection, training, or prediction function into the classification framework, as long as it meets some simple rules about its inputs and return values. See the appendix section of this guide titled “Rules for New Functions” for a description of these.</p>
|
|
216
|
+<div class="section level3">
|
|
217
|
+<h3 id="comparison-to-existing-classification-frameworks">Comparison to Existing Classification Frameworks<a class="anchor" aria-label="anchor" href="#comparison-to-existing-classification-frameworks"></a>
|
|
218
|
+</h3>
|
|
219
|
+<p>There are a few other frameworks for classification in R. The table below provides a comparison of which features they offer.</p>
|
|
220
|
+<table class="table">
|
298
|
221
|
<colgroup>
|
299
|
|
-<col width="8%" />
|
300
|
|
-<col width="10%" />
|
301
|
|
-<col width="8%" />
|
302
|
|
-<col width="10%" />
|
303
|
|
-<col width="10%" />
|
304
|
|
-<col width="11%" />
|
305
|
|
-<col width="14%" />
|
306
|
|
-<col width="12%" />
|
307
|
|
-<col width="12%" />
|
|
222
|
+<col width="8%">
|
|
223
|
+<col width="10%">
|
|
224
|
+<col width="8%">
|
|
225
|
+<col width="10%">
|
|
226
|
+<col width="10%">
|
|
227
|
+<col width="11%">
|
|
228
|
+<col width="14%">
|
|
229
|
+<col width="12%">
|
|
230
|
+<col width="12%">
|
308
|
231
|
</colgroup>
|
309
|
|
-<thead>
|
310
|
|
-<tr class="header">
|
|
232
|
+<thead><tr class="header">
|
311
|
233
|
<th>Package</th>
|
312
|
234
|
<th>Run User-defined Classifiers</th>
|
313
|
235
|
<th>Parallel Execution on any OS</th>
|
...
|
...
|
@@ -317,8 +239,7 @@ below provides a comparison of which features they offer.</p>
|
317
|
239
|
<th>Class Distribution Plot</th>
|
318
|
240
|
<th>Sample-wise Error Heatmap</th>
|
319
|
241
|
<th>Direct Support for MultiAssayExperiment Input</th>
|
320
|
|
-</tr>
|
321
|
|
-</thead>
|
|
242
|
+</tr></thead>
|
322
|
243
|
<tbody>
|
323
|
244
|
<tr class="odd">
|
324
|
245
|
<td><strong>ClassifyR</strong></td>
|
...
|
...
|
@@ -378,109 +299,89 @@ below provides a comparison of which features they offer.</p>
|
378
|
299
|
</tbody>
|
379
|
300
|
</table>
|
380
|
301
|
</div>
|
381
|
|
-<div id="provided-functionality" class="section level3">
|
382
|
|
-<h3>Provided Functionality</h3>
|
383
|
|
-<p>Although being a cross-validation framework, a number of popular
|
384
|
|
-feature selection and classification functions are provided by the
|
385
|
|
-package which meet the requirements of functions to be used by it (see
|
386
|
|
-the last section).</p>
|
387
|
|
-<div id="provided-methods-for-feature-selection-and-classification"
|
388
|
|
-class="section level4">
|
389
|
|
-<h4>Provided Methods for Feature Selection and Classification</h4>
|
390
|
|
-<p>In the following tables, a function that is used when no function is
|
391
|
|
-explicitly specified by the user is shown as <span
|
392
|
|
-style="padding:4px; border:2px dashed #e64626;">functionName</span>.</p>
|
393
|
|
-<p>The functions below produce a ranking, of which different size
|
394
|
|
-subsets are tried and the classifier performance evaluated, to select a
|
395
|
|
-best subset of features, based on a criterion such as balanced accuracy
|
396
|
|
-rate, for example.</p>
|
397
|
|
-<table style="width:100%;">
|
|
302
|
+<div class="section level3">
|
|
303
|
+<h3 id="provided-functionality">Provided Functionality<a class="anchor" aria-label="anchor" href="#provided-functionality"></a>
|
|
304
|
+</h3>
|
|
305
|
+<p>Although <strong>ClassifyR</strong> is primarily a cross-validation framework, it also provides a number of popular feature selection and classification functions that meet the requirements for functions used within it (see the last section).</p>
|
|
306
|
+<div class="section level4">
|
|
307
|
+<h4 id="provided-methods-for-feature-selection-and-classification">Provided Methods for Feature Selection and Classification<a class="anchor" aria-label="anchor" href="#provided-methods-for-feature-selection-and-classification"></a>
|
|
308
|
+</h4>
|
|
309
|
+<p>In the following tables, a function that is used when no function is explicitly specified by the user is shown as <span style="padding:4px; border:2px dashed #e64626;">functionName</span>.</p>
|
|
310
|
+<p>The functions below produce a ranking of features. Subsets of the top-ranked features of different sizes are then tried and the classifier performance of each is evaluated, so that the best subset can be selected based on a criterion such as the balanced accuracy rate.</p>
|
|
311
|
+<table style="width:100%;" class="table">
|
398
|
312
|
<colgroup>
|
399
|
|
-<col width="9%" />
|
400
|
|
-<col width="62%" />
|
401
|
|
-<col width="9%" />
|
402
|
|
-<col width="9%" />
|
403
|
|
-<col width="9%" />
|
|
313
|
+<col width="9%">
|
|
314
|
+<col width="62%">
|
|
315
|
+<col width="9%">
|
|
316
|
+<col width="9%">
|
|
317
|
+<col width="9%">
|
404
|
318
|
</colgroup>
|
405
|
|
-<thead>
|
406
|
|
-<tr class="header">
|
|
319
|
+<thead><tr class="header">
|
407
|
320
|
<th>Function</th>
|
408
|
321
|
<th>Description</th>
|
409
|
322
|
<th>DM</th>
|
410
|
323
|
<th>DV</th>
|
411
|
324
|
<th>DD</th>
|
412
|
|
-</tr>
|
413
|
|
-</thead>
|
|
325
|
+</tr></thead>
|
414
|
326
|
<tbody>
|
415
|
327
|
<tr class="odd">
|
416
|
|
-<td><span
|
417
|
|
-style="padding:4px; border:2px dashed #e64626; font-family: 'Courier New', monospace;">differentMeansRanking</span></td>
|
|
328
|
+<td><span style="padding:4px; border:2px dashed #e64626; font-family: 'Courier New', monospace;">differentMeansRanking</span></td>
|
418
|
329
|
<td>t-test ranking if two classes, F-test ranking if three or more</td>
|
419
|
330
|
<td>✔</td>
|
420
|
331
|
<td></td>
|
421
|
332
|
<td></td>
|
422
|
333
|
</tr>
|
423
|
334
|
<tr class="even">
|
424
|
|
-<td><span
|
425
|
|
-style="font-family: 'Courier New', monospace;">limmaRanking</span></td>
|
|
335
|
+<td><span style="font-family: 'Courier New', monospace;">limmaRanking</span></td>
|
426
|
336
|
<td>Moderated t-test ranking using variance shrinkage</td>
|
427
|
337
|
<td>✔</td>
|
428
|
338
|
<td></td>
|
429
|
339
|
<td></td>
|
430
|
340
|
</tr>
|
431
|
341
|
<tr class="odd">
|
432
|
|
-<td><span
|
433
|
|
-style="font-family: 'Courier New', monospace;">edgeRranking</span></td>
|
|
342
|
+<td><span style="font-family: 'Courier New', monospace;">edgeRranking</span></td>
|
434
|
343
|
<td>Likelihood ratio test for count data ranking</td>
|
435
|
344
|
<td>✔</td>
|
436
|
345
|
<td></td>
|
437
|
346
|
<td></td>
|
438
|
347
|
</tr>
|
439
|
348
|
<tr class="even">
|
440
|
|
-<td><span
|
441
|
|
-style="font-family: 'Courier New', monospace;">bartlettRanking</span></td>
|
|
349
|
+<td><span style="font-family: 'Courier New', monospace;">bartlettRanking</span></td>
|
442
|
350
|
<td>Bartlett’s test non-robust ranking</td>
|
443
|
351
|
<td></td>
|
444
|
352
|
<td>✔</td>
|
445
|
353
|
<td></td>
|
446
|
354
|
</tr>
|
447
|
355
|
<tr class="odd">
|
448
|
|
-<td><span
|
449
|
|
-style="font-family: 'Courier New', monospace;">leveneRanking</span></td>
|
|
356
|
+<td><span style="font-family: 'Courier New', monospace;">leveneRanking</span></td>
|
450
|
357
|
<td>Levene’s test robust ranking</td>
|
451
|
358
|
<td></td>
|
452
|
359
|
<td>✔</td>
|
453
|
360
|
<td></td>
|
454
|
361
|
</tr>
|
455
|
362
|
<tr class="even">
|
456
|
|
-<td><span
|
457
|
|
-style="font-family: 'Courier New', monospace;">DMDranking</span></td>
|
458
|
|
-<td><span style="white-space: nowrap">Difference in location
|
459
|
|
-(mean/median) and/or scale (SD, MAD, <span
|
460
|
|
-class="math inline">\(Q_n\)</span>)</span></td>
|
|
363
|
+<td><span style="font-family: 'Courier New', monospace;">DMDranking</span></td>
|
|
364
|
+<td><span style="white-space: nowrap">Difference in location (mean/median) and/or scale (SD, MAD, <span class="math inline">\(Q_n\)</span>)</span></td>
|
461
|
365
|
<td>✔</td>
|
462
|
366
|
<td>✔</td>
|
463
|
367
|
<td>✔</td>
|
464
|
368
|
</tr>
|
465
|
369
|
<tr class="odd">
|
466
|
|
-<td><span
|
467
|
|
-style="font-family: 'Courier New', monospace;">likelihoodRatioRanking</span></td>
|
|
370
|
+<td><span style="font-family: 'Courier New', monospace;">likelihoodRatioRanking</span></td>
|
468
|
371
|
<td>Likelihood ratio (normal distribution) ranking</td>
|
469
|
372
|
<td>✔</td>
|
470
|
373
|
<td>✔</td>
|
471
|
374
|
<td>✔</td>
|
472
|
375
|
</tr>
|
473
|
376
|
<tr class="even">
|
474
|
|
-<td><span
|
475
|
|
-style="font-family: 'Courier New', monospace;">KolmogorovSmirnovRanking</span></td>
|
|
377
|
+<td><span style="font-family: 'Courier New', monospace;">KolmogorovSmirnovRanking</span></td>
|
476
|
378
|
<td>Kolmogorov-Smirnov distance between distributions ranking</td>
|
477
|
379
|
<td>✔</td>
|
478
|
380
|
<td>✔</td>
|
479
|
381
|
<td>✔</td>
|
480
|
382
|
</tr>
|
481
|
383
|
<tr class="odd">
|
482
|
|
-<td><span
|
483
|
|
-style="font-family: 'Courier New', monospace;">KullbackLeiblerRanking</span></td>
|
|
384
|
+<td><span style="font-family: 'Courier New', monospace;">KullbackLeiblerRanking</span></td>
|
484
|
385
|
<td>Kullback-Leibler distance between distributions ranking</td>
|
485
|
386
|
<td>✔</td>
|
486
|
387
|
<td>✔</td>
|
...
|
...
|
@@ -489,213 +390,164 @@ style="font-family: 'Courier New', monospace;">KullbackLeiblerRanking</s
|
489
|
390
|
</tbody>
|
490
|
391
|
</table>
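<p>The ranking-then-subset procedure described above can be sketched in base R. This toy version is an illustration under stated assumptions, not <strong>ClassifyR</strong>’s implementation: it ranks features by the absolute Welch t-statistic from <code>t.test</code>, uses a simple nearest-centroid rule (the hypothetical helper <code>nearestCentroid</code>) as a stand-in classifier, and evaluates each top-k subset on a single held-out split. In practice this selection should happen inside the training portion of each cross-validation fold, so that the reported performance is not optimistically biased.</p>

```r
# Made-up toy data: 60 samples, 50 variables, two classes of 30 each.
set.seed(1)
nSamples <- 60
nVariables <- 50
x <- matrix(rnorm(nSamples * nVariables), nrow = nSamples,
            dimnames = list(NULL, paste0("feature", seq_len(nVariables))))
y <- factor(rep(c("A", "B"), each = nSamples / 2))
# Shift the first 5 variables in class B so they are truly informative.
x[y == "B", 1:5] <- x[y == "B", 1:5] + 1.5

# Balanced accuracy: the mean of the per-class recall.
balancedAccuracy <- function(truth, predicted) {
  mean(sapply(levels(truth), function(cl) mean(predicted[truth == cl] == cl)))
}

# Hypothetical stand-in classifier: assign each test sample to the nearest
# class centroid (squared Euclidean distance). Needs at least 2 features.
nearestCentroid <- function(xTrain, yTrain, xTest) {
  # Class centroids: one column per class, one row per feature.
  centroids <- sapply(levels(yTrain), function(cl)
    colMeans(xTrain[yTrain == cl, , drop = FALSE]))
  # Distances: one row per test sample, one column per class.
  dists <- apply(centroids, 2, function(centre) colSums((t(xTest) - centre)^2))
  factor(levels(yTrain)[max.col(-dists, ties.method = "first")],
         levels = levels(yTrain))
}

# Hold out 10 samples per class; rank and train on the rest only.
trainIdx <- c(1:20, 31:50)
testIdx <- setdiff(seq_len(nSamples), trainIdx)

# Rank variables by absolute Welch t-statistic on the training samples.
tStats <- apply(x[trainIdx, ], 2, function(f) t.test(f ~ y[trainIdx])$statistic)
ranking <- order(abs(tStats), decreasing = TRUE)

# Try several subset sizes and keep the one with the best balanced accuracy.
sizes <- c(2, 5, 10, 25, 50)
accuracies <- sapply(sizes, function(k) {
  chosen <- ranking[seq_len(k)]
  predicted <- nearestCentroid(x[trainIdx, chosen, drop = FALSE], y[trainIdx],
                               x[testIdx, chosen, drop = FALSE])
  balancedAccuracy(y[testIdx], predicted)
})
names(accuracies) <- sizes
bestSize <- sizes[which.max(accuracies)]
accuracies
bestSize
```

<p>Any of the ranking functions in the table above plays the role of the t-statistic step here, and any classifier can replace the nearest-centroid rule.</p>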
|
491
|
392
|
<p>A variety of classifiers is also provided.</p>
|
492
|
|
-<table>
|
|
393
|
+<table class="table">
|
493
|
394
|
<colgroup>
|
494
|
|
-<col width="9%" />
|
495
|
|
-<col width="61%" />
|
496
|
|
-<col width="9%" />
|
497
|
|
-<col width="9%" />
|
498
|
|
-<col width="9%" />
|
|
395
|
+<col width="9%">
|
|
396
|
+<col width="61%">
|
|
397
|
+<col width="9%">
|
|
398
|
+<col width="9%">
|
|
399
|
+<col width="9%">
|
499
|
400
|
</colgroup>
|
500
|
|
-<thead>
|
501
|
|
-<tr class="header">
|
|
401
|
+<thead><tr class="header">
|
502
|
402
|
<th>Function(s)</th>
|
503
|
403
|
<th>Description</th>
|
504
|
404
|
<th>DM</th>
|
505
|
405
|
<th>DV</th>
|
506
|
406
|
<th>DD</th>
|
507
|
|
-</tr>
|
508
|
|
-</thead>
|
|
407
|
+</tr></thead>
|
509
|
408
|
<tbody>
|
510
|
409
|
<tr class="odd">
|
511
|
|
-<td><span
|
512
|
|
-style="padding:1px; border:2px dashed #e64626; display:inline-block; margin-bottom: 3px; font-family: 'Courier New', monospace;">DLDAtrainInterface</span>,<br><span
|
513
|
|
-style="padding:1px; border:2px dashed #e64626; display:inline-block; font-family: 'Courier New', monospace;">DLDApredictInterface</span></td>
|
514
|
|
-<td>Wrappers for sparsediscrim’s functions <span
|
515
|
|
-style="font-family: 'Courier New', monospace;">dlda</span> and
|
516
|
|
-<span
|
517
|
|
-style="font-family: 'Courier New', monospace;">predict.dlda</span>
|
518
|
|
-functions</td>
|
|
410
|
+<td>
|
|
411
|
+<span style="padding:1px; border:2px dashed #e64626; display:inline-block; margin-bottom: 3px; font-family: 'Courier New', monospace;">DLDAtrainInterface</span>,<br><span style="padding:1px; border:2px dashed #e64626; display:inline-block; font-family: 'Courier New', monospace;">DLDApredictInterface</span>
|
|
412
|
+</td>
|
|
413
|
+<td>Wrappers for sparsediscrim’s <span style="font-family: 'Courier New', monospace;">dlda</span> and <span style="font-family: 'Courier New', monospace;">predict.dlda</span> functions</td>
|
519
|
414
|
<td>✔</td>
|
520
|
415
|
<td></td>
|
521
|
416
|
<td></td>
|
522
|
417
|
</tr>
|
523
|
418
|
<tr class="even">
|
524
|
|
-<td><span
|
525
|
|
-style="font-family: 'Courier New', monospace;">classifyInterface</span></td>
|
526
|
|
-<td>Wrapper for PoiClaClu’s Poisson LDA function <span
|
527
|
|
-style="font-family: 'Courier New', monospace;">classify</span></td>
|
|
419
|
+<td><span style="font-family: 'Courier New', monospace;">classifyInterface</span></td>
|
|
420
|
+<td>Wrapper for PoiClaClu’s Poisson LDA function <span style="font-family: 'Courier New', monospace;">classify</span>
|
|
421
|
+</td>
|
528
|
422
|
<td>✔</td>
|
529
|
423
|
<td></td>
|
530
|
424
|
<td></td>
|
531
|
425
|
</tr>
|
532
|
426
|
<tr class="odd">
|
533
|
|
-<td><span
|
534
|
|
-style="font-family: 'Courier New', monospace;">elasticNetGLMtrainInterface</span>,
|
535
|
|
-<span
|
536
|
|
-style="font-family: 'Courier New', monospace;">elasticNetGLMpredictInterface</span></td>
|
537
|
|
-<td>Wrappers for glmnet’s elastic net GLM functions <span
|
538
|
|
-style="font-family: 'Courier New', monospace;">glmnet</span> and
|
539
|
|
-<span
|
540
|
|
-style="font-family: 'Courier New', monospace;">predict.glmnet</span></td>
|
|
427
|
+<td>
|
|
428
|
+<span style="font-family: 'Courier New', monospace;">elasticNetGLMtrainInterface</span>, <span style="font-family: 'Courier New', monospace;">elasticNetGLMpredictInterface</span>
|
|
429
|
+</td>
|
|
430
|
+<td>Wrappers for glmnet’s elastic net GLM functions <span style="font-family: 'Courier New', monospace;">glmnet</span> and <span style="font-family: 'Courier New', monospace;">predict.glmnet</span>
|
|
431
|
+</td>
|
541
|
432
|
<td>✔</td>
|
542
|
433
|
<td></td>
|
543
|
434
|
<td></td>
|
544
|
435
|
</tr>
|
545
|
436
|
<tr class="even">
|
546
|
|
-<td><span
|
547
|
|
-style="font-family: 'Courier New', monospace;">NSCtrainInterface</span>,
|
548
|
|
-<span
|
549
|
|
-style="font-family: 'Courier New', monospace;">NSCpredictInterface</span></td>
|
550
|
|
-<td>Wrappers for pamr’s Nearest Shrunken Centroid functions <span
|
551
|
|
-style="font-family: 'Courier New', monospace;">pamr.train</span>
|
552
|
|
-and <span
|
553
|
|
-style="font-family: 'Courier New', monospace;">pamr.predict</span></td>
|
|
437
|
+<td>
|
|
438
|
+<span style="font-family: 'Courier New', monospace;">NSCtrainInterface</span>, <span style="font-family: 'Courier New', monospace;">NSCpredictInterface</span>
|
|
439
|
+</td>
|
|
440
|
+<td>Wrappers for pamr’s Nearest Shrunken Centroid functions <span style="font-family: 'Courier New', monospace;">pamr.train</span> and <span style="font-family: 'Courier New', monospace;">pamr.predict</span>
|
|
441
|
+</td>
|
554
|
442
|
<td>✔</td>
|
555
|
443
|
<td></td>
|
556
|
444
|
<td></td>
|
557
|
445
|
</tr>
|
558
|
446
|
<tr class="odd">
|
559
|
|
-<td><span
|
560
|
|
-style="font-family: 'Courier New', monospace;">fisherDiscriminant</span></td>
|
|
447
|
+<td><span style="font-family: 'Courier New', monospace;">fisherDiscriminant</span></td>
|
561
|
448
|
<td>Implementation of Fisher’s LDA for departures from normality</td>
|
562
|
449
|
<td>✔</td>
|
563
|
450
|
<td>✔*</td>
|
564
|
451
|
<td></td>
|
565
|
452
|
</tr>
<tr class="even">
-<td><span
-style="font-family: 'Courier New', monospace;">mixModelsTrain</span>,
-<span
-style="font-family: 'Courier New', monospace;">mixModelsPredict</span></td>
+<td>
+<span style="font-family: 'Courier New', monospace;">mixModelsTrain</span>, <span style="font-family: 'Courier New', monospace;">mixModelsPredict</span>
+</td>
<td>Feature-wise mixtures of normals and voting</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
</tr>
<tr class="odd">
-<td><span
-style="font-family: 'Courier New', monospace;">naiveBayesKernel</span></td>
+<td><span style="font-family: 'Courier New', monospace;">naiveBayesKernel</span></td>
<td>Feature-wise kernel density estimation and voting</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
</tr>
<tr class="even">
-<td><span
-style="font-family: 'Courier New', monospace;">randomForestTrainInterface</span>,
-<span
-style="font-family: 'Courier New', monospace;">randomForestPredictInterface</span></td>
-<td>Wrapper for ranger’s functions <span
-style="font-family: 'Courier New', monospace;">ranger</span> and
-<span
-style="font-family: 'Courier New', monospace;">predict</span></td>
+<td>
+<span style="font-family: 'Courier New', monospace;">randomForestTrainInterface</span>, <span style="font-family: 'Courier New', monospace;">randomForestPredictInterface</span>
+</td>
+<td>Wrapper for ranger’s functions <span style="font-family: 'Courier New', monospace;">ranger</span> and <span style="font-family: 'Courier New', monospace;">predict</span>
+</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
</tr>
<tr class="odd">
-<td><span
-style="font-family: 'Courier New', monospace;">extremeGradientBoostingTrainInterface</span>,
-<span
-style="font-family: 'Courier New', monospace;">extremeGradientBoostingPredictInterface</span></td>
-<td>Wrapper for xgboost’s functions <span
-style="font-family: 'Courier New', monospace;">xgboost</span>
-and <span
-style="font-family: 'Courier New', monospace;">predict</span></td>
+<td>
+<span style="font-family: 'Courier New', monospace;">extremeGradientBoostingTrainInterface</span>, <span style="font-family: 'Courier New', monospace;">extremeGradientBoostingPredictInterface</span>
+</td>
+<td>Wrapper for xgboost’s functions <span style="font-family: 'Courier New', monospace;">xgboost</span> and <span style="font-family: 'Courier New', monospace;">predict</span>
+</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
</tr>
<tr class="even">
-<td><span
-style="font-family: 'Courier New', monospace;">kNNinterface</span></td>
-<td>Wrapper for class’s function <span
-style="font-family: 'Courier New', monospace;">knn</span></td>
+<td><span style="font-family: 'Courier New', monospace;">kNNinterface</span></td>
+<td>Wrapper for class’s function <span style="font-family: 'Courier New', monospace;">knn</span>
+</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
</tr>
<tr class="odd">
-<td><span
-style="font-family: 'Courier New', monospace;">SVMtrainInterface</span>,
-<span
-style="font-family: 'Courier New', monospace;">SVMpredictInterface</span></td>
-<td>Wrapper for e1071’s functions <span
-style="font-family: 'Courier New', monospace;">svm</span> and
-<span
-style="font-family: 'Courier New', monospace;">predict.svm</span></td>
+<td>
+<span style="font-family: 'Courier New', monospace;">SVMtrainInterface</span>, <span style="font-family: 'Courier New', monospace;">SVMpredictInterface</span>
+</td>
+<td>Wrapper for e1071’s functions <span style="font-family: 'Courier New', monospace;">svm</span> and <span style="font-family: 'Courier New', monospace;">predict.svm</span>
+</td>
<td>✔</td>
<td>✔ †</td>
<td>✔ †</td>
</tr>
</tbody>
</table>
-<p>* If ordinary numeric measurements have been transformed to absolute
-deviations using <span
-style="font-family: 'Courier New', monospace;">subtractFromLocation</span>.<br>
-† If the value of <span
-style="font-family: 'Courier New', monospace;">kernel</span> is
-not <span
-style="font-family: 'Courier New', monospace;">“linear”</span>.</p>
-<p>If a desired selection or classification method is not already
-implemented, rules for writing functions to work with
-<strong>ClassifyR</strong> are outlined in the wrapper vignette. Please
-visit it for more information.</p>
+<p>* If ordinary numeric measurements have been transformed to absolute deviations using <span style="font-family: 'Courier New', monospace;">subtractFromLocation</span>.<br> † If the value of <span style="font-family: 'Courier New', monospace;">kernel</span> is not <span style="font-family: 'Courier New', monospace;">“linear”</span>.</p>
+<p>If a desired selection or classification method is not already implemented, rules for writing functions to work with <strong>ClassifyR</strong> are outlined in the wrapper vignette. Please visit it for more information.</p>
</div>
-<div id="provided-meta-feature-methods" class="section level4">
-<h4>Provided Meta-feature Methods</h4>
-<p>A number of methods are provided for users to enable classification
-in a feature-set-centric or interactor-centric way. The meta-feature
-creation functions should be used before cross-validation is done.</p>
-<table>
+<div class="section level4">
+<h4 id="provided-meta-feature-methods">Provided Meta-feature Methods<a class="anchor" aria-label="anchor" href="#provided-meta-feature-methods"></a>
+</h4>
+<p>A number of methods are provided for users to enable classification in a feature-set-centric or interactor-centric way. The meta-feature creation functions should be used before cross-validation is done.</p>
+<table class="table">
<colgroup>
-<col width="9%" />
-<col width="61%" />
-<col width="14%" />
-<col width="14%" />
+<col width="9%">
+<col width="61%">
+<col width="14%">
+<col width="14%">
</colgroup>
-<thead>
-<tr class="header">
+<thead><tr class="header">
<th>Function</th>
<th>Description</th>
<th align="center">Before CV</th>
<th align="center">During CV</th>
-</tr>
-</thead>
+</tr></thead>
<tbody>
<tr class="odd">
-<td><span
-style="font-family: 'Courier New', monospace;">edgesToHubNetworks</span></td>
-<td>Takes a two-column <span
-style="font-family: 'Courier New', monospace;">matrix</span> or
-<span
-style="font-family: 'Courier New', monospace;">DataFrame</span>
-and finds all nodes with at least a minimum number of interactions</td>
+<td><span style="font-family: 'Courier New', monospace;">edgesToHubNetworks</span></td>
+<td>Takes a two-column <span style="font-family: 'Courier New', monospace;">matrix</span> or <span style="font-family: 'Courier New', monospace;">DataFrame</span> and finds all nodes with at least a minimum number of interactions</td>
<td align="center">✔</td>
<td align="center"></td>
</tr>
<tr class="even">
-<td><span
-style="font-family: 'Courier New', monospace;">featureSetSummary</span></td>
-<td><span style="white-space: nowrap">Considers sets of features and
-calculates their mean or median</span></td>
+<td><span style="font-family: 'Courier New', monospace;">featureSetSummary</span></td>
+<td><span style="white-space: nowrap">Considers sets of features and calculates their mean or median</span></td>
<td align="center">✔</td>
<td align="center"></td>
</tr>
<tr class="odd">
-<td><span
-style="font-family: 'Courier New', monospace;">pairsDifferencesSelection</span></td>
-<td>Finds a set of pairs of features whose measurement inequalities can
-be used for predicting with</td>
+<td><span style="font-family: 'Courier New', monospace;">pairsDifferencesSelection</span></td>
+<td>Finds a set of pairs of features whose measurement inequalities can be used for prediction</td>
<td align="center"></td>
<td align="center">✔</td>
</tr>
<tr class="even">
-<td><span
-style="font-family: 'Courier New', monospace;">kTSPclassifier</span></td>
-<td>Voting classifier that uses inequalities between pairs of features
-to vote for one of two classes</td>
+<td><span style="font-family: 'Courier New', monospace;">kTSPclassifier</span></td>
+<td>Voting classifier that uses inequalities between pairs of features to vote for one of two classes</td>
<td align="center"></td>
<td align="center">✔</td>
</tr>
@@ -703,590 +555,459 @@ to vote for one of two classes</td>
</table>
</div>
</div>
-<div id="fine-grained-cross-validation-and-modelling-using-runtests"
-class="section level3">
-<h3>Fine-grained Cross-validation and Modelling Using
-<em>runTests</em></h3>
-<p>For more control over the finer aspects of cross-validation of a
-single data set, <em>runTests</em> may be employed in place of
-<em>crossValidate</em>. For the variety of cross-validation, the
-parameters are specified by a <em>CrossValParams</em> object. The
-default setting is for 100 permutations and five folds and parameter
-tuning is done by resubstitution. It is also recommended to specify a
-<em>parallelParams</em> setting. On Linux and MacOS operating systems,
-it should be <em>MulticoreParam</em> and on Windows computers it should
-be <em>SnowParam</em>. Note that each of these have an option
-<em>RNGseed</em> and this <strong>needs to be set by the user</strong>
-because some classifiers or feature selection functions will have some
-element of randomisation. One example that works on all operating
-systems, but is best-suited to Windows is:</p>
-<div class="sourceCode" id="cb20"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a>CVparams <span class="ot"><-</span> <span class="fu">CrossValParams</span>(<span class="at">parallelParams =</span> <span class="fu">SnowParam</span>(<span class="dv">16</span>, <span class="at">RNGseed =</span> <span class="dv">123</span>))</span>
-<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a>CVparams</span></code></pre></div>
-<p>For the actual operations to do to the data to build a model of it,
-each of the stages should be specified by an object of class
-<em>ModellingParams</em>. This controls how class imbalance is handled
-(default is to downsample to the smallest class), any transformation
-that needs to be done inside of cross-validation (i.e. involving a
-computed value from the training set), any feature selection and the
-training and prediction functions to be used. The default is to do an
-ordinary t-test (two groups) or ANOVA (three or more groups) and
-classification using diagonal LDA.</p>
-<div class="sourceCode" id="cb21"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="fu">ModellingParams</span>()</span></code></pre></div>
-<pre><code>## An object of class "ModellingParams"
-## Slot "balancing":
-## [1] "downsample"
-##
-## Slot "transformParams":
-## NULL
-##
-## Slot "selectParams":
-## An object of class 'SelectParams'.
-## Selection Name: Difference in Means.
-##
-## Slot "trainParams":
-## An object of class 'TrainParams'.
-## Classifier Name: Diagonal LDA.
-##
-## Slot "predictParams":
-## An object of class 'PredictParams'.
-##
-## Slot "doImportance":
-## [1] FALSE</code></pre>
+<div class="section level3">
+<h3 id="fine-grained-cross-validation-and-modelling-using-runtests">Fine-grained Cross-validation and Modelling Using <em>runTests</em><a class="anchor" aria-label="anchor" href="#fine-grained-cross-validation-and-modelling-using-runtests"></a>
+</h3>
+<p>For finer control over the cross-validation of a single data set, <em>runTests</em> may be used in place of <em>crossValidate</em>. The cross-validation scheme is specified by a <em>CrossValParams</em> object. The default is 100 permutations of five-fold cross-validation, with parameter tuning done by resubstitution. It is also recommended to specify a <em>parallelParams</em> setting: <em>MulticoreParam</em> on Linux and macOS, and <em>SnowParam</em> on Windows. Each of these has an <em>RNGseed</em> option, which <strong>needs to be set by the user</strong> because some classifiers and feature selection functions involve an element of randomisation. One example that works on all operating systems, but is best suited to Windows, is:</p>
+<div class="sourceCode" id="cb20"><pre class="downlit sourceCode r">
+<code class="sourceCode R"><span><span class="va">CVparams</span> <span class="op"><-</span> <span class="fu"><a href="../reference/CrossValParams-class.html">CrossValParams</a></span><span class="op">(</span>parallelParams <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/pkg/BiocParallel/man/SnowParam-class.html" class="external-link">SnowParam</a></span><span class="op">(</span><span class="fl">16</span>, RNGseed <span class="op">=</span> <span class="fl">123</span><span class="op">)</span><span class="op">)</span></span>
+<span><span class="va">CVparams</span></span></code></pre></div>
+<p>The modelling stages applied to the data are specified by an object of class <em>ModellingParams</em>. It controls how class imbalance is handled (the default is to downsample every class to the size of the smallest class), any transformation that must be done inside cross-validation (i.e. one involving a value computed from the training set), any feature selection, and the training and prediction functions to be used. The default is feature selection with an ordinary t-test (two groups) or ANOVA (three or more groups), followed by classification using diagonal LDA.</p>
+<div class="sourceCode" id="cb21"><pre class="downlit sourceCode r">
+<code class="sourceCode R"><span><span class="fu"><a href="../reference/ModellingParams-class.html">ModellingParams</a></span><span class="op">(</span><span class="op">)</span></span></code></pre></div>
+<pre><code><span><span class="co">## An object of class "ModellingParams"</span></span>
+<span><span class="co">## Slot "balancing":</span></span>
+<span><span class="co">## [1] "downsample"</span></span>
+<span><span class="co">## </span></span>
+<span><span class="co">## Slot "transformParams":</span></span>
+<span><span class="co">## NULL</span></span>
+<span><span class="co">## </span></span>
+<span><span class="co">## Slot "selectParams":</span></span>
+<span><span class="co">## An object of class 'SelectParams'.</span></span>
+<span><span class="co">## Selection Name: Difference in Means.</span></span>
+<span><span class="co">## </span></span>
+<span><span class="co">## Slot "trainParams":</span></span>
+<span><span class="co">## An object of class 'TrainParams'.</span></span>
+<span><span class="co">## Classifier Name: Diagonal LDA.</span></span>
+<span><span class="co">## </span></span>
+<span><span class="co">## Slot "predictParams":</span></span>