Use pkgdown to make website

Andrew McDavid authored on 19/06/2019 14:15:03
Showing 51 changed files

... ...
@@ -6,3 +6,6 @@ extdata/refdata-cellranger-vdj-GRCh38-alts-ensembl-2.0.0/
6 6
 ^doc$
7 7
 ^Meta$
8 8
 manuscript/
9
+^_pkgdown\.yml$
10
+^docs$
11
+^pkgdown$
... ...
@@ -9,3 +9,8 @@ inst/doc
9 9
 doc
10 10
 Meta
11 11
 *.dll
12
+*.Rproj
13
+.Rhistory
14
+.Renviron
15
+.RData
16
+
... ...
@@ -55,3 +55,5 @@ Encoding: UTF-8
55 55
 LazyData: true
56 56
 NeedsCompilation: yes
57 57
 RoxygenNote: 6.1.1
58
+URL: https://github.com/amcdavid/CellaRepertorium
59
+BugReports: https://github.com/amcdavid/CellaRepertorium/issues
... ...
@@ -1,10 +1,34 @@
1 1
 # CellaRepertorium
2 2
 
3
-This package contains methods for clustering and analyzing single cell RepSeq data, especially as generated by [10X genomics VDJ solution](https://support.10xgenomics.com/single-cell-vdj).  
3
+This package contains methods for clustering and analyzing single cell RepSeq data, especially as generated by [10X genomics VDJ solution](https://support.10xgenomics.com/single-cell-vdj).
4
+
5
+## Installation
6
+
7
+```
8
+devtools::install_github('amcdavid/CellaRepertorium')
9
+```
10
+
11
+Requires R>=3.5.
12
+
13
+## Data requirements and package structure
14
+
15
+The fundamental unit is the **contig**, which is a section of contiguously stitched reads from a single **cell**.  Each contig belongs to one (and only one) cell; however, a cell can generate multiple contigs.  Contigs can also belong to a **cluster**.  Because of these two many-to-one mappings, these data can be thought of as a series of ragged arrays.  The links between them mean they are relational data.
16
+
17
+[A schematic of contigs and cells should go here]
18
+
19
+A `ContigCellDB` object wraps each of these objects as a sequence of three `data.frame`s (well, `tibble`s, actually).   `ContigCellDB` also tracks columns (keys) that uniquely identify each row in each of these tables.  The `contig_tbl` is the `tibble` containing **contigs**, the `cell_tbl` contains the **cells**, and the `cluster_tbl` contains the **clusters**.  The `contig_pk`, `cell_pk` and `cluster_pk` name the key columns that identify a contig, cell and cluster, respectively, and must be unique in each of the respective tables.
20
+The tables are kept in sync so that subsetting the contigs will subset the cells and clusters, and vice versa.
21
+
22
+[A schematic showing table relations should go here]
23
+
24
+Of course, each of these tables can contain many other columns that will serve as covariates for various analyses, such as the CDR3 sequence, or the identity of the V, D and J regions.  Various derived quantities that describe cells and clusters, such as the medoid of a cluster, can also be calculated and added to these tables.
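
The relational layout described in the README text can be sketched with plain base R data frames. This is only an illustration of the idea, not the package's implementation: the column names (`contig_id`, `barcode`, `cluster`) and the toy values are assumptions made up for this example, and the real `ContigCellDB` class handles the synchronisation itself.

```r
# Hypothetical toy data; column names and values are illustrative only.
contig_tbl <- data.frame(
  contig_id = c("c1", "c2", "c3", "c4"),
  barcode   = c("cellA", "cellA", "cellB", "cellC"),  # many contigs -> one cell
  cluster   = c(1L, 2L, 1L, 2L),                      # many contigs -> one cluster
  cdr3      = c("CASSL", "CAVRD", "CASSL", "CAVRN"),
  stringsAsFactors = FALSE
)

# Cell and cluster tables hold one row per distinct key value.
cell_tbl    <- unique(contig_tbl["barcode"])
cluster_tbl <- unique(contig_tbl["cluster"])

# Primary keys must be unique within their own table.
stopifnot(
  !anyDuplicated(contig_tbl$contig_id),
  !anyDuplicated(cell_tbl$barcode),
  !anyDuplicated(cluster_tbl$cluster)
)

# Subsetting the contigs, then propagating the subset to the other tables
# by keeping only the keys that are still referenced.
contig_sub  <- contig_tbl[contig_tbl$cluster == 1L, ]
cell_sub    <- cell_tbl[cell_tbl$barcode %in% contig_sub$barcode, , drop = FALSE]
cluster_sub <- cluster_tbl[cluster_tbl$cluster %in% contig_sub$cluster, , drop = FALSE]
```

Here `cell_sub` keeps only the cells that still own a contig, which is the behaviour the README attributes to the synchronised tables.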
4 25
 
5 26
 ## Functions
6 27
 
28
+[a screencap of something interesting?]
29
+
7 30
 *  `cdhit`: An R interface to CDhit, which was originally ported by Thomas Lin Pedersen.
8 31
 *  `fine_cluster`: clustering CDR3 by edit distances (possibly using empirical amino acid substitution matrices)
9 32
 *  `cluster_permute_test`: permutation tests of cluster statistics
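
As a rough illustration of what clustering CDR3 sequences by edit distance involves, the sketch below uses only base R (`utils::adist`, `stats::hclust`, `stats::cutree`). It is not the package's `fine_cluster` code: the sequences are invented, the distance is plain Levenshtein rather than a substitution-matrix score, and the cut height of 3 is an arbitrary assumption.

```r
# Invented CDR3 amino acid sequences, for illustration only.
cdr3 <- c("CASSLGQYF", "CASSLGQFF", "CASSLGAYF", "CAVRDNYGQ", "CAVRDNYGN")

# Pairwise Levenshtein (edit) distances between all sequences.
d <- utils::adist(cdr3)
dimnames(d) <- list(cdr3, cdr3)

# Complete-linkage hierarchical clustering, cut at an assumed height of 3.
hc     <- hclust(as.dist(d), method = "complete")
groups <- cutree(hc, h = 3)

# Medoid of each cluster: the member with the smallest total distance
# to the other members (ties resolve to the first member).
medoids <- sapply(split(cdr3, groups), function(members) {
  sub <- d[members, members, drop = FALSE]
  members[which.min(rowSums(sub))]
})
```

The `medoids` vector holds one representative sequence per cluster, the kind of derived quantity that can be stored in a cluster-level table.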
10 33
 
34
+
11 35
new file mode 100644
... ...
@@ -0,0 +1 @@
1
+destination: docs
0 2
new file mode 100644
... ...
@@ -0,0 +1,412 @@
1
+<!DOCTYPE html>
2
+<!-- Generated by pkgdown: do not edit by hand --><html lang="en">
3
+<head>
4
+<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
5
+<meta charset="utf-8">
6
+<meta http-equiv="X-UA-Compatible" content="IE=edge">
7
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
8
+<title>Clustering repertoire via CDR3 sequences • CellaRepertorium</title>
9
+<!-- jquery --><script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script><!-- Bootstrap --><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha256-916EbMg70RQy9LHiGkXzG8hSg9EdNy97GazNG/aiY1w=" crossorigin="anonymous">
10
+<script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha256-U5ZEeKfGNOja007MMD3YBI0A3OSZOQbeG6z2f2Y0hu8=" crossorigin="anonymous"></script><!-- Font Awesome icons --><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css" integrity="sha256-eZrrJcwDc/3uDhsdt61sL2oOBY362qM3lon1gyExkL0=" crossorigin="anonymous">
11
+<!-- clipboard.js --><script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.4/clipboard.min.js" integrity="sha256-FiZwavyI2V6+EXO1U+xzLG3IKldpiTFf3153ea9zikQ=" crossorigin="anonymous"></script><!-- sticky kit --><script src="https://cdnjs.cloudflare.com/ajax/libs/sticky-kit/1.1.3/sticky-kit.min.js" integrity="sha256-c4Rlo1ZozqTPE2RLuvbusY3+SU1pQaJC0TjuhygMipw=" crossorigin="anonymous"></script><!-- pkgdown --><link href="../pkgdown.css" rel="stylesheet">
12
+<script src="../pkgdown.js"></script><meta property="og:title" content="Clustering repertoire via CDR3 sequences">
13
+<meta property="og:description" content="">
14
+<meta name="twitter:card" content="summary">
15
+<!-- mathjax --><script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js" integrity="sha256-nvJJv9wWKEm88qvoQl9ekL2J+k/RWIsaSScxxlsrv8k=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/config/TeX-AMS-MML_HTMLorMML.js" integrity="sha256-84DKXVJXs0/F8OTMzX4UR909+jtl4G7SPypPavF+GfA=" crossorigin="anonymous"></script><!--[if lt IE 9]>
16
+<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
17
+<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
18
+<![endif]-->
19
+</head>
20
+<body>
21
+    <div class="container template-article">
22
+      <header><div class="navbar navbar-default navbar-fixed-top" role="navigation">
23
+  <div class="container">
24
+    <div class="navbar-header">
25
+      <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false">
26
+        <span class="sr-only">Toggle navigation</span>
27
+        <span class="icon-bar"></span>
28
+        <span class="icon-bar"></span>
29
+        <span class="icon-bar"></span>
30
+      </button>
31
+      <span class="navbar-brand">
32
+        <a class="navbar-link" href="../index.html">CellaRepertorium</a>
33
+        <span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="Released version">0.3.1</span>
34
+      </span>
35
+    </div>
36
+
37
+    <div id="navbar" class="navbar-collapse collapse">
38
+      <ul class="nav navbar-nav">
39
+<li>
40
+  <a href="../index.html">
41
+    <span class="fa fa-home fa-lg"></span>
42
+     
43
+  </a>
44
+</li>
45
+<li>
46
+  <a href="../reference/index.html">Reference</a>
47
+</li>
48
+<li class="dropdown">
49
+  <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">
50
+    Articles
51
+     
52
+    <span class="caret"></span>
53
+  </a>
54
+  <ul class="dropdown-menu" role="menu">
55
+<li>
56
+      <a href="../articles/cdr3_clustering.html">Clustering repertoire via CDR3 sequences</a>
57
+    </li>
58
+    <li>
59
+      <a href="../articles/mouse_tcell_qc.html">Quality control and Exploration of UMI-based repertoire data</a>
60
+    </li>
61
+  </ul>
62
+</li>
63
+      </ul>
64
+<ul class="nav navbar-nav navbar-right">
65
+<li>
66
+  <a href="https://github.com/amcdavid/CellaRepertorium">
67
+    <span class="fa fa-github fa-lg"></span>
68
+     
69
+  </a>
70
+</li>
71
+      </ul>
72
+</div>
73
+<!--/.nav-collapse -->
74
+  </div>
75
+<!--/.container -->
76
+</div>
77
+<!--/.navbar -->
78
+
79
+      
80
+      </header><div class="row">
81
+  <div class="col-md-9 contents">
82
+    <div class="page-header toc-ignore">
83
+      <h1>Clustering repertoire via CDR3 sequences</h1>
84
+            
85
+      
86
+      <small class="dont-index">Source: <a href="https://github.com/amcdavid/CellaRepertorium/blob/master/vignettes/cdr3_clustering.Rmd"><code>vignettes/cdr3_clustering.Rmd</code></a></small>
87
+      <div class="hidden name"><code>cdr3_clustering.Rmd</code></div>
88
+
89
+    </div>
90
+
91
+    
92
+    
93
+<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb1-1" data-line-number="1"><span class="co">#load_all()</span></a>
94
+<a class="sourceLine" id="cb1-2" data-line-number="2"><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/library">library</a></span>(CellaRepertorium)</a>
95
+<a class="sourceLine" id="cb1-3" data-line-number="3"><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/library">library</a></span>(dplyr)</a>
96
+<a class="sourceLine" id="cb1-4" data-line-number="4"><span class="co">#&gt; </span></a>
97
+<a class="sourceLine" id="cb1-5" data-line-number="5"><span class="co">#&gt; Attaching package: 'dplyr'</span></a>
98
+<a class="sourceLine" id="cb1-6" data-line-number="6"><span class="co">#&gt; The following objects are masked from 'package:stats':</span></a>
99
+<a class="sourceLine" id="cb1-7" data-line-number="7"><span class="co">#&gt; </span></a>
100
+<a class="sourceLine" id="cb1-8" data-line-number="8"><span class="co">#&gt;     filter, lag</span></a>
101
+<a class="sourceLine" id="cb1-9" data-line-number="9"><span class="co">#&gt; The following objects are masked from 'package:base':</span></a>
102
+<a class="sourceLine" id="cb1-10" data-line-number="10"><span class="co">#&gt; </span></a>
103
+<a class="sourceLine" id="cb1-11" data-line-number="11"><span class="co">#&gt;     intersect, setdiff, setequal, union</span></a>
104
+<a class="sourceLine" id="cb1-12" data-line-number="12"><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/library">library</a></span>(ggplot2)</a>
105
+<a class="sourceLine" id="cb1-13" data-line-number="13"><span class="co">#&gt; Registered S3 methods overwritten by 'ggplot2':</span></a>
106
+<a class="sourceLine" id="cb1-14" data-line-number="14"><span class="co">#&gt;   method         from </span></a>
107
+<a class="sourceLine" id="cb1-15" data-line-number="15"><span class="co">#&gt;   [.quosures     rlang</span></a>
108
+<a class="sourceLine" id="cb1-16" data-line-number="16"><span class="co">#&gt;   c.quosures     rlang</span></a>
109
+<a class="sourceLine" id="cb1-17" data-line-number="17"><span class="co">#&gt;   print.quosures rlang</span></a>
110
+<a class="sourceLine" id="cb1-18" data-line-number="18"><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/library">library</a></span>(readr)</a>
111
+<a class="sourceLine" id="cb1-19" data-line-number="19"><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/library">library</a></span>(tidyr)</a>
112
+<a class="sourceLine" id="cb1-20" data-line-number="20"><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/library">library</a></span>(stringr)</a>
113
+<a class="sourceLine" id="cb1-21" data-line-number="21"><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/library">library</a></span>(purrr)</a></code></pre></div>
114
+<div id="load-filtered-contig-files" class="section level1">
115
+<h1 class="hasAnchor">
116
+<a href="#load-filtered-contig-files" class="anchor"></a>Load filtered contig files</h1>
117
+<div class="sourceCode" id="cb2"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb2-1" data-line-number="1"><span class="kw"><a href="https://www.rdocumentation.org/packages/utils/topics/data">data</a></span>(contigs_qc)</a>
118
+<a class="sourceLine" id="cb2-2" data-line-number="2">MIN_CDR3_AA =<span class="st"> </span><span class="dv">6</span></a>
119
+<a class="sourceLine" id="cb2-3" data-line-number="3"></a>
120
+<a class="sourceLine" id="cb2-4" data-line-number="4"></a>
121
+<a class="sourceLine" id="cb2-5" data-line-number="5">cdb =<span class="st"> </span><span class="kw"><a href="../reference/ContigCellDB-fun.html">ContigCellDB_10XVDJ</a></span>(contigs_qc, <span class="dt">contig_pk =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/c">c</a></span>(<span class="st">'barcode'</span>, <span class="st">'pop'</span>, <span class="st">'sample'</span>, <span class="st">'contig_id'</span>), <span class="dt">cell_pk =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/c">c</a></span>(<span class="st">'barcode'</span>, <span class="st">'pop'</span>, <span class="st">'sample'</span>))</a>
122
+<a class="sourceLine" id="cb2-6" data-line-number="6"></a>
123
+<a class="sourceLine" id="cb2-7" data-line-number="7">cdb<span class="op">$</span>contig_tbl =<span class="st"> </span>dplyr<span class="op">::</span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span>(cdb<span class="op">$</span>contig_tbl, full_length, productive <span class="op">==</span><span class="st"> 'True'</span>, high_confidence, chain <span class="op">!=</span><span class="st"> 'Multi'</span>, <span class="kw"><a href="https://stringr.tidyverse.org/reference/str_length.html">str_length</a></span>(cdr3) <span class="op">&gt;</span><span class="st"> </span>MIN_CDR3_AA) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate</a></span>( <span class="dt">fancy_name =</span> <span class="kw"><a href="../reference/fancy_name_contigs.html">fancy_name_contigs</a></span>(., <span class="kw"><a href="https://stringr.tidyverse.org/reference/str_c.html">str_c</a></span>(pop, <span class="st">'_'</span>, sample)))</a></code></pre></div>
124
+<p>We keep only good chains (either TRA or TRB); each cell can appear more than once.</p>
125
+</div>
126
+<div id="chain-pairings" class="section level1">
127
+<h1 class="hasAnchor">
128
+<a href="#chain-pairings" class="anchor"></a>Chain pairings</h1>
129
+<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb3-1" data-line-number="1">paired_chain =<span class="st"> </span><span class="kw"><a href="../reference/enumerate_pairing.html">enumerate_pairing</a></span>(cdb, <span class="dt">chain_recode_fun =</span> <span class="st">'guess'</span>)</a>
130
+<a class="sourceLine" id="cb3-2" data-line-number="2"></a>
131
+<a class="sourceLine" id="cb3-3" data-line-number="3"><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot</a></span>(paired_chain, <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">x =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/interaction">interaction</a></span>(sample, pop), <span class="dt">fill =</span> pairing)) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/geom_bar.html">geom_bar</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/facet_wrap.html">facet_wrap</a></span>(<span class="op">~</span>canonical, <span class="dt">scale =</span> <span class="st">'free_x'</span>) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/coord_flip.html">coord_flip</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggtheme.html">theme_minimal</a></span>()</a></code></pre></div>
132
+<p><img src="cdr3_clustering_files/figure-html/unnamed-chunk-3-1.png" width="700"></p>
133
+</div>
134
+<div id="cluster-cdr3-protein-sequences" class="section level1">
135
+<h1 class="hasAnchor">
136
+<a href="#cluster-cdr3-protein-sequences" class="anchor"></a>Cluster CDR3 protein sequences</h1>
137
+<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb4-1" data-line-number="1"></a>
138
+<a class="sourceLine" id="cb4-2" data-line-number="2">aa80 =<span class="st"> </span>CellaRepertorium<span class="op">:::</span><span class="kw"><a href="../reference/cdhit.html">cdhit_ccdb</a></span>(cdb, <span class="st">'cdr3'</span>, <span class="dt">type =</span> <span class="st">'AA'</span>, <span class="dt">cluster_name =</span> <span class="st">'aa80'</span>, <span class="dt">identity =</span> <span class="fl">.8</span>)</a>
139
+<a class="sourceLine" id="cb4-3" data-line-number="3">aa80 =<span class="st"> </span><span class="kw"><a href="../reference/fine_clustering.html">fine_clustering</a></span>(aa80, <span class="dt">sequence_key =</span> <span class="st">'cdr3'</span>, <span class="dt">type =</span> <span class="st">'AA'</span>, <span class="dt">keep_clustering_details =</span> <span class="ot">TRUE</span>)</a>
140
+<a class="sourceLine" id="cb4-4" data-line-number="4"><span class="co">#&gt; Calculating intradistances on 988 clusters.</span></a>
141
+<a class="sourceLine" id="cb4-5" data-line-number="5"><span class="co">#&gt; Summarizing</span></a>
142
+<a class="sourceLine" id="cb4-6" data-line-number="6"></a>
143
+<a class="sourceLine" id="cb4-7" data-line-number="7"></a>
144
+<a class="sourceLine" id="cb4-8" data-line-number="8"><span class="co"># This should maybe be turned into a function </span></a>
145
+<a class="sourceLine" id="cb4-9" data-line-number="9"><span class="co"># Other plots should be considered:</span></a>
146
+<a class="sourceLine" id="cb4-10" data-line-number="10"><span class="co"># That show how clusters are split between samples, chains, etc</span></a>
147
+<a class="sourceLine" id="cb4-11" data-line-number="11"><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot</a></span>(aa80<span class="op">$</span>cluster_tbl <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span>(n_cluster<span class="op">&gt;</span><span class="dv">1</span>) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://tidyr.tidyverse.org/reference/gather.html">gather</a></span>(key, value, <span class="op">-</span>aa80, <span class="op">-</span>fc) , <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">x =</span> value))<span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/facet_wrap.html">facet_wrap</a></span>(<span class="op">~</span>key, <span class="dt">scales =</span> <span class="st">'free'</span>) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/geom_histogram.html">geom_histogram</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/scale_continuous.html">scale_y_sqrt</a></span>()</a>
148
+<a class="sourceLine" id="cb4-12" data-line-number="12"><span class="co">#&gt; `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.</span></a></code></pre></div>
149
+<p><img src="cdr3_clustering_files/figure-html/unnamed-chunk-4-1.png" width="700"></p>
150
+<p>We cluster the CDR3 translated amino acid residues with the program <a href="http://weizhongli-lab.org/cdhit_suite/cgi-bin/index.cgi?cmd=cd-hit">CD-HIT</a>. A sequence is included in a cluster if it matches by 100% similarity and has the same CDR3 length. Note that this can and should be relaxed – especially in the beta chain we see “near clones” that only differ by a residue or two, seemingly in stylized places.</p>
151
+</div>
152
+<div id="cluster-cdr3-dna-sequences" class="section level1">
153
+<h1 class="hasAnchor">
154
+<a href="#cluster-cdr3-dna-sequences" class="anchor"></a>Cluster CDR3 DNA sequences</h1>
155
+<div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb5-1" data-line-number="1">cdb =<span class="st"> </span>CellaRepertorium<span class="op">:::</span><span class="kw"><a href="../reference/cdhit.html">cdhit_ccdb</a></span>(cdb, <span class="st">'cdr3_nt'</span>, <span class="dt">type =</span> <span class="st">'DNA'</span>, <span class="dt">cluster_name =</span> <span class="st">'DNA97'</span>, <span class="dt">identity =</span> <span class="fl">.965</span>, <span class="dt">min_length =</span> MIN_CDR3_AA<span class="op">*</span><span class="dv">3-1</span>, <span class="dt">G =</span> <span class="dv">1</span>)</a>
156
+<a class="sourceLine" id="cb5-2" data-line-number="2">cdb =<span class="st"> </span><span class="kw"><a href="../reference/fine_clustering.html">fine_clustering</a></span>(cdb, <span class="dt">sequence_key =</span> <span class="st">'cdr3_nt'</span>, <span class="dt">type =</span> <span class="st">'DNA'</span>)</a>
157
+<a class="sourceLine" id="cb5-3" data-line-number="3"><span class="co">#&gt; Calculating intradistances on 1342 clusters.</span></a>
158
+<a class="sourceLine" id="cb5-4" data-line-number="4"><span class="co">#&gt; Summarizing</span></a>
159
+<a class="sourceLine" id="cb5-5" data-line-number="5"><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot</a></span>(cdb<span class="op">$</span>cluster_tbl <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span>(n_cluster<span class="op">&gt;</span><span class="dv">1</span>) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://tidyr.tidyverse.org/reference/gather.html">gather</a></span>(key, value, <span class="op">-</span>DNA97) , <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">x =</span> value))<span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/facet_wrap.html">facet_wrap</a></span>(<span class="op">~</span>key, <span class="dt">scales =</span> <span class="st">'free'</span>) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/geom_histogram.html">geom_histogram</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/scale_continuous.html">scale_y_sqrt</a></span>()</a>
160
+<a class="sourceLine" id="cb5-6" data-line-number="6"><span class="co">#&gt; `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.</span></a></code></pre></div>
161
+<p><img src="cdr3_clustering_files/figure-html/unnamed-chunk-5-1.png" width="700"></p>
162
+<p>We can also cluster by DNA identity.</p>
163
+<div class="sourceCode" id="cb6"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb6-1" data-line-number="1">germline_cluster =<span class="st"> </span>CellaRepertorium<span class="op">:::</span><span class="kw">cluster_germline</span>(cdb, <span class="dt">segment_keys =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/c">c</a></span>(<span class="st">'v_gene'</span>, <span class="st">'j_gene'</span>, <span class="st">'chain'</span>), <span class="dt">cluster_name =</span> <span class="st">'segment_idx'</span>)</a>
164
+<a class="sourceLine" id="cb6-2" data-line-number="2"><span class="co">#&gt; Warning in replace_cluster_tbl(ccdb, cluster_tbl, cl_con_tbl, cluster_pk =</span></a>
165
+<a class="sourceLine" id="cb6-3" data-line-number="3"><span class="co">#&gt; cluster_name): Replacing `cluster_tbl` with key ccdb$cluster_pk</span></a></code></pre></div>
166
+<p>And by other features of the contigs. Here we cluster each contig based on the chain and V-J genes. This gives us the set of observed V-J pairings:</p>
167
+<div class="sourceCode" id="cb7"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb7-1" data-line-number="1">germline_cluster =<span class="st"> </span><span class="kw"><a href="../reference/fine_clustering.html">fine_clustering</a></span>(germline_cluster, <span class="dt">sequence_key =</span> <span class="st">'cdr3_nt'</span>, <span class="dt">type =</span> <span class="st">'DNA'</span>)</a>
168
+<a class="sourceLine" id="cb7-2" data-line-number="2"><span class="co">#&gt; Calculating intradistances on 700 clusters.</span></a>
169
+<a class="sourceLine" id="cb7-3" data-line-number="3"><span class="co">#&gt; Summarizing</span></a>
170
+<a class="sourceLine" id="cb7-4" data-line-number="4"><span class="co">#&gt; Warning in left_join_warn(d_medoid, contig_tbl, by = ccdb$contig_pk,</span></a>
171
+<a class="sourceLine" id="cb7-5" data-line-number="5"><span class="co">#&gt; overwrite = TRUE): Overwriting fields d(medoid), is_medoid in table</span></a>
172
+<a class="sourceLine" id="cb7-6" data-line-number="6"><span class="co">#&gt; contig_tbl</span></a>
173
+<a class="sourceLine" id="cb7-7" data-line-number="7"><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot</a></span>(germline_cluster<span class="op">$</span>cluster_tbl <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span>(chain <span class="op">==</span><span class="st"> 'TRB'</span>), <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">x =</span> v_gene, <span class="dt">y =</span> j_gene, <span class="dt">fill =</span> n_cluster)) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/geom_tile.html">geom_tile</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/theme.html">theme</a></span>(<span class="dt">axis.text.x =</span> <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/element.html">element_text</a></span>(<span class="dt">angle =</span> <span class="dv">90</span>))</a></code></pre></div>
174
+<p><img src="cdr3_clustering_files/figure-html/unnamed-chunk-7-1.png" width="700"></p>
175
+<p>Number of pairs</p>
176
+<div class="sourceCode" id="cb8"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb8-1" data-line-number="1"><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot</a></span>(germline_cluster<span class="op">$</span>cluster_tbl <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span>(chain <span class="op">==</span><span class="st"> 'TRB'</span>), <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">x =</span> v_gene, <span class="dt">y =</span> j_gene, <span class="dt">fill =</span> avg_distance)) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/geom_tile.html">geom_tile</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/theme.html">theme</a></span>(<span class="dt">axis.text.x =</span> <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/element.html">element_text</a></span>(<span class="dt">angle =</span> <span class="dv">90</span>))</a></code></pre></div>
177
+<p><img src="cdr3_clustering_files/figure-html/unnamed-chunk-8-1.png" width="700"></p>
178
+<p>Average Levenshtein distance of CDR3 within each pair</p>
179
+<div class="sourceCode" id="cb9"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb9-1" data-line-number="1"><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/library">library</a></span>(ggdendro)</a>
180
+<a class="sourceLine" id="cb9-2" data-line-number="2"></a>
181
+<a class="sourceLine" id="cb9-3" data-line-number="3"><span class="co"># This should be turned into a function in the package somehow</span></a>
182
+<a class="sourceLine" id="cb9-4" data-line-number="4"><span class="co"># But plot arguments will be super-variable</span></a>
183
+<a class="sourceLine" id="cb9-5" data-line-number="5"><span class="co"># Maybe just return the `hc` object?</span></a>
184
+<a class="sourceLine" id="cb9-6" data-line-number="6">dendro_plot =<span class="st"> </span><span class="cf">function</span>(ccdb, idx, <span class="dt">method =</span> <span class="st">'complete'</span>){</a>
185
+<a class="sourceLine" id="cb9-7" data-line-number="7">    h =<span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span>(ccdb<span class="op">$</span>cluster_tbl, <span class="op">!!</span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/tidyeval.html">sym</a></span>(ccdb<span class="op">$</span>cluster_pk) <span class="op">==</span><span class="st"> </span>idx) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/pull.html">pull</a></span>(fc) <span class="op">%&gt;%</span><span class="st"> </span>.[[<span class="dv">1</span>]]</a>
186
+<a class="sourceLine" id="cb9-8" data-line-number="8">    quer =<span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span>(ccdb<span class="op">$</span>contig_tbl, <span class="op">!!</span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/tidyeval.html">sym</a></span>(ccdb<span class="op">$</span>cluster_pk) <span class="op">==</span><span class="st"> </span>idx)</a>
187
+<a class="sourceLine" id="cb9-9" data-line-number="9">    hc =<span class="st"> </span><span class="kw"><a href="https://www.rdocumentation.org/packages/stats/topics/hclust">hclust</a></span>(<span class="kw"><a href="https://www.rdocumentation.org/packages/stats/topics/dist">as.dist</a></span>(h<span class="op">$</span>distance_mat), <span class="dt">method =</span> method) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://www.rdocumentation.org/packages/ggdendro/topics/dendro_data">dendro_data</a></span>(<span class="dt">type =</span> <span class="st">"rectangle"</span>)</a>
188
+<a class="sourceLine" id="cb9-10" data-line-number="10">    hc<span class="op">$</span>labels =<span class="st"> </span><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/cbind">cbind</a></span>(hc<span class="op">$</span>labels, quer)</a>
189
+<a class="sourceLine" id="cb9-11" data-line-number="11">   <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot</a></span>(hc<span class="op">$</span>segments, <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">x=</span>x, <span class="dt">y=</span>y)) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/geom_segment.html">geom_segment</a></span>(<span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">xend=</span>xend, <span class="dt">yend=</span>yend)) <span class="op">+</span><span class="st"> </span></a>
190
+<a class="sourceLine" id="cb9-12" data-line-number="12"><span class="st">  </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggtheme.html">theme_classic</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/geom_text.html">geom_text</a></span>(<span class="dt">data =</span> hc<span class="op">$</span>labels, <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">color =</span> sample, <span class="dt">label =</span> fancy_name), <span class="dt">size =</span> <span class="dv">3</span>, <span class="dt">angle =</span> <span class="dv">60</span>) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/scale_continuous.html">scale_x_continuous</a></span>(<span class="dt">breaks =</span> <span class="ot">NULL</span>) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/labs.html">ylab</a></span>(<span class="st">'AA Distance'</span>) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/labs.html">xlab</a></span>(<span class="st">''</span>)</a>
191
+<a class="sourceLine" id="cb9-13" data-line-number="13">}</a>
192
+<a class="sourceLine" id="cb9-14" data-line-number="14"></a>
193
+<a class="sourceLine" id="cb9-15" data-line-number="15">MIN_OLIGO =<span class="st"> </span><span class="dv">7</span></a>
194
+<a class="sourceLine" id="cb9-16" data-line-number="16">to_plot =<span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span>(aa80<span class="op">$</span>cluster_tbl, n_cluster <span class="op">&gt;=</span><span class="st"> </span>MIN_OLIGO)</a>
195
+<a class="sourceLine" id="cb9-17" data-line-number="17"></a>
196
+<a class="sourceLine" id="cb9-18" data-line-number="18"><span class="kw"><a href="https://purrr.tidyverse.org/reference/map.html">map</a></span>(to_plot<span class="op">$</span>aa80, <span class="op">~</span><span class="st"> </span><span class="kw">dendro_plot</span>(aa80, .))</a>
197
+<a class="sourceLine" id="cb9-19" data-line-number="19"><span class="co">#&gt; [[1]]</span></a></code></pre></div>
198
+<p><img src="cdr3_clustering_files/figure-html/unnamed-chunk-9-1.png" width="700"></p>
199
+<pre><code>#&gt; 
200
+#&gt; [[2]]</code></pre>
201
+<p><img src="cdr3_clustering_files/figure-html/unnamed-chunk-9-2.png" width="700"></p>
202
+<pre><code>#&gt; 
203
+#&gt; [[3]]</code></pre>
204
+<p><img src="cdr3_clustering_files/figure-html/unnamed-chunk-9-3.png" width="700"></p>
205
+<pre><code>#&gt; 
206
+#&gt; [[4]]</code></pre>
207
+<p><img src="cdr3_clustering_files/figure-html/unnamed-chunk-9-4.png" width="700"></p>
208
+<div class="sourceCode" id="cb13"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb13-1" data-line-number="1">aa80 =<span class="st"> </span>CellaRepertorium<span class="op">:::</span><span class="kw">canonicalize_cluster</span>(aa80, <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/c">c</a></span>(<span class="st">'cdr3'</span>), <span class="dt">contig_fields =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/c">c</a></span>(<span class="st">'cdr3'</span>, <span class="st">'cdr3_nt'</span>, <span class="st">'chain'</span>, <span class="st">'v_gene'</span>, <span class="st">'d_gene'</span>, <span class="st">'j_gene'</span>))</a></code></pre></div>
209
+<p>Pull the fields listed in <code>contig_fields</code> into the <code>cluster_tbl</code>, using the values found in the medoid contig.</p>
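+<p>For readers unfamiliar with the term, a medoid is the member of a cluster with the smallest total distance to the other members. The sketch below illustrates that idea on made-up numeric values; it is an assumption about the general concept, not a copy of how <code>canonicalize_cluster</code> computes it:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical one-dimensional "cluster" of five members
+x = c(1, 2, 3, 4, 10)
+
+# Pairwise distances between all members, as a full square matrix
+d = as.matrix(dist(x))
+
+# The medoid is the member whose summed distance to all others is smallest
+medoid = unname(which.min(rowSums(d)))
+x[medoid]</code></pre></div>
+<p>Unlike a mean, the medoid is always an observed member, which is why its field values can be copied directly into a per-cluster table.</p>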
210
+<div class="sourceCode" id="cb14"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb14-1" data-line-number="1">oligo_clusters =<span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span>(aa80<span class="op">$</span>cluster_tbl, n_cluster <span class="op">&gt;=</span><span class="st"> </span>MIN_OLIGO)</a>
211
+<a class="sourceLine" id="cb14-2" data-line-number="2">oligo_contigs =<span class="st"> </span>aa80</a>
212
+<a class="sourceLine" id="cb14-3" data-line-number="3">oligo_contigs<span class="op">$</span>contig_tbl =<span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/join.html">semi_join</a></span>(oligo_contigs<span class="op">$</span>contig_tbl, oligo_clusters, <span class="dt">by =</span> <span class="st">'aa80'</span>)</a>
213
+<a class="sourceLine" id="cb14-4" data-line-number="4">oligo_contigs</a>
214
+<a class="sourceLine" id="cb14-5" data-line-number="5"><span class="co">#&gt; ContigCellDB of 54 contigs; 54 cells; and 4 clusters.</span></a>
215
+<a class="sourceLine" id="cb14-6" data-line-number="6"><span class="co">#&gt; Contigs keyed by barcode, pop, sample, contig_id; cells keyed by barcode, pop, sample.</span></a></code></pre></div>
216
+<p>Keep the contigs, cells and clusters that belong to clusters containing at least 7 contigs (<code>MIN_OLIGO</code>). Note that replacing <code>contig_tbl</code> with the subset selected by the <code>semi_join</code> also automatically subsets the <code>cell_tbl</code> and <code>cluster_tbl</code>.</p>
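+<p>As a minimal sketch of the filtering step on its own, assuming a toy <code>contigs</code> table and made-up cluster sizes (neither comes from the package data), a <code>semi_join</code> keeps only the rows whose cluster survives the size filter:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)
+
+# Hypothetical toy data: six contigs assigned to three clusters
+contigs = tibble(contig_id = 1:6, aa80 = c(1, 1, 1, 2, 2, 3))
+
+# Count contigs per cluster, then keep clusters with at least 2 contigs
+clusters = contigs %&gt;% group_by(aa80) %&gt;% summarize(n_cluster = n())
+big_clusters = filter(clusters, n_cluster &gt;= 2)
+
+# semi_join() keeps rows of contigs that have a match in big_clusters and adds no columns
+kept = semi_join(contigs, big_clusters, by = 'aa80')</code></pre></div>
+<p>Here <code>kept</code> holds the five contigs from clusters 1 and 2; the lone contig in cluster 3 is dropped.</p>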
217
+<div class="sourceCode" id="cb15"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb15-1" data-line-number="1">oligo_clusters =<span class="st"> </span>oligo_contigs<span class="op">$</span>contig_tbl <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by</a></span>(aa80) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize</a></span>(<span class="st">`</span><span class="dt">n subjects observed</span><span class="st">`</span> =<span class="st"> </span><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/length">length</a></span>(<span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/unique">unique</a></span>(sample))) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/join.html">left_join</a></span>(oligo_clusters)</a>
218
+<a class="sourceLine" id="cb15-2" data-line-number="2"><span class="co">#&gt; Joining, by = "aa80"</span></a>
219
+<a class="sourceLine" id="cb15-3" data-line-number="3"></a>
220
+<a class="sourceLine" id="cb15-4" data-line-number="4">knitr<span class="op">::</span><span class="kw"><a href="https://www.rdocumentation.org/packages/knitr/topics/kable">kable</a></span>(oligo_clusters <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/select.html">select</a></span>(aa80<span class="op">:</span>cdr3, chain<span class="op">:</span>j_gene, avg_distance, n_cluster))</a></code></pre></div>
221
+<table class="table">
222
+<thead><tr class="header">
223
+<th align="right">aa80</th>
224
+<th align="right">n subjects observed</th>
225
+<th align="left">cdr3</th>
226
+<th align="left">chain</th>
227
+<th align="left">v_gene</th>
228
+<th align="left">d_gene</th>
229
+<th align="left">j_gene</th>
230
+<th align="right">avg_distance</th>
231
+<th align="right">n_cluster</th>
232
+</tr></thead>
233
+<tbody>
234
+<tr class="odd">
235
+<td align="right">111</td>
236
+<td align="right">6</td>
237
+<td align="left">CVVGDRGSALGRLHF</td>
238
+<td align="left">TRA</td>
239
+<td align="left">TRAV11</td>
240
+<td align="left">None</td>
241
+<td align="left">TRAJ18</td>
242
+<td align="right">0.6071429</td>
243
+<td align="right">28</td>
244
+</tr>
245
+<tr class="even">
246
+<td align="right">172</td>
247
+<td align="right">5</td>
248
+<td align="left">CAVSRASSGSWQLIF</td>
249
+<td align="left">TRA</td>
250
+<td align="left">TRAV9N-3</td>
251
+<td align="left">None</td>
252
+<td align="left">TRAJ22</td>
253
+<td align="right">2.1111111</td>
254
+<td align="right">9</td>
255
+</tr>
256
+<tr class="odd">
257
+<td align="right">296</td>
258
+<td align="right">6</td>
259
+<td align="left">CAASASSGSWQLIF</td>
260
+<td align="left">TRA</td>
261
+<td align="left">TRAV14D-2</td>
262
+<td align="left">None</td>
263
+<td align="left">TRAJ22</td>
264
+<td align="right">1.5000000</td>
265
+<td align="right">8</td>
266
+</tr>
267
+<tr class="even">
268
+<td align="right">808</td>
269
+<td align="right">4</td>
270
+<td align="left">CATGNYAQGLTF</td>
271
+<td align="left">TRA</td>
272
+<td align="left">TRAV8D-2</td>
273
+<td align="left">None</td>
274
+<td align="left">TRAJ26</td>
275
+<td align="right">1.3333333</td>
276
+<td align="right">9</td>
277
+</tr>
278
+</tbody>
279
+</table>
280
+<p>Report some statistics about these expanded clusters.</p>
281
+</div>
282
+<div id="oligo-clusters" class="section level1">
283
+<h1 class="hasAnchor">
284
+<a href="#oligo-clusters" class="anchor"></a>Oligo clusters</h1>
285
+<div class="sourceCode" id="cb16"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb16-1" data-line-number="1">oligo_plot =<span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot</a></span>(oligo_contigs<span class="op">$</span>contig_tbl, <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">x =</span> representative, <span class="dt">fill =</span> chain)) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/geom_bar.html">geom_bar</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/coord_flip.html">coord_flip</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/scale_brewer.html">scale_fill_brewer</a></span>(<span class="dt">type =</span> <span class="st">'qual'</span>) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggtheme.html">theme_minimal</a></span>()</a>
286
+<a class="sourceLine" id="cb16-2" data-line-number="2">oligo_plot</a></code></pre></div>
287
+<p><img src="cdr3_clustering_files/figure-html/unnamed-chunk-13-1.png" width="700"></p>
288
+<p>These always come from a single chain.</p>
289
+<div class="sourceCode" id="cb17"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb17-1" data-line-number="1">oligo_plot <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">fill =</span>   sample) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/facet_wrap.html">facet_wrap</a></span>(<span class="op">~</span>pop)</a></code></pre></div>
290
+<p><img src="cdr3_clustering_files/figure-html/unnamed-chunk-14-1.png" width="700"></p>
291
+<p>However, they come from multiple populations and samples.</p>
292
+</div>
293
+<div id="formal-testing-for-frequency-differences" class="section level1">
294
+<h1 class="hasAnchor">
295
+<a href="#formal-testing-for-frequency-differences" class="anchor"></a>Formal testing for frequency differences</h1>
296
+<div class="sourceCode" id="cb18"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb18-1" data-line-number="1"><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/library">library</a></span>(lme4)</a>
297
+<a class="sourceLine" id="cb18-2" data-line-number="2"><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/library">library</a></span>(broom)</a>
298
+<a class="sourceLine" id="cb18-3" data-line-number="3">per_chain_sample =<span class="st"> </span>good_cluster_cells <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by</a></span>(sample, pop, chain) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize</a></span>(<span class="dt">total_cells =</span> <span class="kw"><a href="https://dplyr.tidyverse.org/reference/n.html">n</a></span>(), <span class="dt">weeks_premature =</span> weeks_premature[<span class="dv">1</span>])</a>
299
+<a class="sourceLine" id="cb18-4" data-line-number="4"></a>
300
+<a class="sourceLine" id="cb18-5" data-line-number="5">oligo_cluster_stat =<span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/join.html">semi_join</a></span>(oligo_clusters, good_cluster_cells <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/select.html">select</a></span>(dataset, contig_id)) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by</a></span>(sample, pop, chain, cluster_idx) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize</a></span>(<span class="dt">n_cluster =</span> <span class="kw"><a href="https://dplyr.tidyverse.org/reference/n.html">n</a></span>())<span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/group_by.html">ungroup</a></span>() <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://tidyr.tidyverse.org/reference/complete.html">complete</a></span>(sample, pop, <span class="kw"><a href="https://tidyr.tidyverse.org/reference/expand.html">nesting</a></span>(cluster_idx, chain), <span class="dt">fill =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/list">list</a></span>(<span class="dt">n_cluster =</span> <span class="dv">0</span>))</a>
301
+<a class="sourceLine" id="cb18-6" data-line-number="6"></a>
302
+<a class="sourceLine" id="cb18-7" data-line-number="7">oligo_cluster_stat =<span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/join.html">left_join</a></span>(oligo_cluster_stat, per_chain_sample, <span class="dt">by =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/c">c</a></span>(<span class="st">'sample'</span>, <span class="st">'pop'</span>, <span class="st">'chain'</span>)) </a>
303
+<a class="sourceLine" id="cb18-8" data-line-number="8"></a>
304
+<a class="sourceLine" id="cb18-9" data-line-number="9"><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/stopifnot">stopifnot</a></span>( <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/all">all</a></span>(<span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/colSums">colSums</a></span>(<span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/with">with</a></span>(oligo_cluster_stat, <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/table">table</a></span>(chain, cluster_idx)) <span class="op">&gt;</span><span class="st"> </span><span class="dv">0</span>) <span class="op">==</span><span class="st"> </span><span class="dv">1</span>))</a>
305
+<a class="sourceLine" id="cb18-10" data-line-number="10"></a>
306
+<a class="sourceLine" id="cb18-11" data-line-number="11">mm_out =<span class="st"> </span><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/warning">suppressWarnings</a></span>(oligo_cluster_stat <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by</a></span>(cluster_idx, chain) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/do.html">do</a></span>( <span class="kw"><a href="https://www.rdocumentation.org/packages/lme4/topics/glmer">glmer</a></span>(<span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/cbind">cbind</a></span>(n_cluster, total_cells <span class="op">-</span><span class="st"> </span>n_cluster) <span class="op">~</span><span class="st"> </span>pop <span class="op">+</span><span class="st"> </span>weeks_premature <span class="op">+</span><span class="st"> </span>(<span class="dv">1</span><span class="op">|</span>sample), <span class="dt">data =</span> ., <span class="dt">family =</span> <span class="st">'binomial'</span>) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://www.rdocumentation.org/packages/broom/topics/reexports">tidy</a></span>(<span class="dt">conf.int =</span> <span class="ot">TRUE</span>)))</a></code></pre></div>
307
+<div class="sourceCode" id="cb19"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb19-1" data-line-number="1">mm_outj =<span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span>(<span class="kw"><a href="https://dplyr.tidyverse.org/reference/join.html">left_join</a></span>(<span class="kw"><a href="https://dplyr.tidyverse.org/reference/group_by.html">ungroup</a></span>(mm_out), <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/unique">unique</a></span>(oligo_clusters_all <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/select.html">select</a></span>(cdr3_representative, cluster_idx))), term <span class="op">%in%</span><span class="st"> </span><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/c">c</a></span>(<span class="st">'popCD31Pos'</span>, <span class="st">'weeks_premature'</span>)) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate</a></span>(<span class="dt">ci_lo =</span> AMmisc<span class="op">::</span><span class="kw"><a href="https://www.rdocumentation.org/packages/AMmisc/topics/clamp">clamp</a></span>(conf.low), <span class="dt">ci_hi =</span> AMmisc<span class="op">::</span><span class="kw"><a href="https://www.rdocumentation.org/packages/AMmisc/topics/clamp">clamp</a></span>(conf.high)) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/arrange.html">arrange</a></span>(<span class="kw"><a href="https://dplyr.tidyverse.org/reference/desc.html">desc</a></span>(cdr3_representative))</a>
308
+<a class="sourceLine" id="cb19-2" data-line-number="2"></a>
309
+<a class="sourceLine" id="cb19-3" data-line-number="3"><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot</a></span>(mm_outj, <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">x =</span> cdr3_representative, <span class="dt">ymin =</span> ci_lo, <span class="dt">ymax =</span> ci_hi, <span class="dt">y =</span> <span class="kw">clamp</span>(estimate))) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/geom_linerange.html">geom_pointrange</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/facet_wrap.html">facet_wrap</a></span>(<span class="op">~</span>term, <span class="dt">scales =</span> <span class="st">'free'</span>) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/coord_flip.html">coord_flip</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggtheme.html">theme_minimal</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/geom_abline.html">geom_hline</a></span>(<span class="dt">yintercept =</span> <span class="dv">0</span>, <span class="dt">lty =</span> <span class="dv">2</span>) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/labs.html">xlab</a></span>(<span class="st">"Isomorph"</span>) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/labs.html">ylab</a></span>(<span class="st">"log odds of isomorph"</span>)</a></code></pre></div>
310
+<p>For each clone, we test whether the binomial rate of clone expression differs between CD31+ and CD31- populations, or with term (<code>weeks_premature</code>).</p>
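+<p>As a hedged illustration of the binomial response used in this kind of model, the sketch below fits a plain <code>glm</code> (not the <code>glmer</code> mixed model above) to made-up counts; the <code>toy</code> data frame and its values are assumptions for illustration only. A two-column binomial response gives successes first and failures second:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical counts: cells in one clone (n_cluster) out of all cells (total_cells)
+toy = data.frame(
+  pop = rep(c('CD31Neg', 'CD31Pos'), each = 3),
+  n_cluster = c(1, 2, 0, 4, 5, 3),
+  total_cells = c(100, 120, 90, 110, 100, 95)
+)
+
+# Response is cbind(successes, failures); failures are the cells outside the clone
+fit = glm(cbind(n_cluster, total_cells - n_cluster) ~ pop, data = toy, family = 'binomial')
+
+# The popCD31Pos coefficient is the log odds ratio of clone membership between populations
+coef(summary(fit))</code></pre></div>
+<p>Adding a per-sample random intercept, as <code>(1|sample)</code> does in the mixed model, would additionally account for repeated measurements within a sample.</p>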
311
+</div>
312
+<div id="clonal-pairs" class="section level1">
313
+<h1 class="hasAnchor">
314
+<a href="#clonal-pairs" class="anchor"></a>Clonal pairs</h1>
315
+<div class="sourceCode" id="cb20"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb20-1" data-line-number="1">class_colors =<span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/reexports.html">data_frame</a></span>(<span class="dt">chain =</span>  <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/unique">unique</a></span>(aa80<span class="op">$</span>cluster_tbl<span class="op">$</span>chain)) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate</a></span>(<span class="dt">class_color =</span>  RColorBrewer<span class="op">::</span><span class="kw"><a href="https://www.rdocumentation.org/packages/RColorBrewer/topics/ColorBrewer">brewer.pal</a></span>(<span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/length">length</a></span>(chain),<span class="st">"Set1"</span>)[<span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/seq">seq_along</a></span>(chain)])</a>
316
+<a class="sourceLine" id="cb20-2" data-line-number="2"><span class="co">#&gt; Warning: `data_frame()` is deprecated, use `tibble()`.</span></a>
317
+<a class="sourceLine" id="cb20-3" data-line-number="3"><span class="co">#&gt; This warning is displayed once per session.</span></a>
318
+<a class="sourceLine" id="cb20-4" data-line-number="4"><span class="co">#&gt; Warning in RColorBrewer::brewer.pal(length(chain), "Set1"): minimal value for n is 3, returning requested palette with 3 different levels</span></a>
319
+<a class="sourceLine" id="cb20-5" data-line-number="5"></a>
320
+<a class="sourceLine" id="cb20-6" data-line-number="6">aa80<span class="op">$</span>cluster_pk =<span class="st"> 'representative'</span></a>
321
+<a class="sourceLine" id="cb20-7" data-line-number="7">pairing_list =<span class="st"> </span><span class="kw"><a href="../reference/pairing_tables.html">pairing_tables</a></span>(aa80, <span class="dt">table_order =</span> <span class="dv">2</span>, <span class="dt">orphan_level =</span> <span class="dv">1</span>, <span class="dt">min_expansion =</span> <span class="dv">2</span>, <span class="dt">cluster_keys =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/c">c</a></span>(<span class="st">'cdr3'</span>, <span class="st">'representative'</span>, <span class="st">'chain'</span>, <span class="st">'v_gene'</span>, <span class="st">'j_gene'</span>, <span class="st">'avg_distance'</span>))</a>
322
+<a class="sourceLine" id="cb20-8" data-line-number="8"><span class="co">#&gt; Warning: Factor `cluster_idx.2` contains implicit NA, consider using</span></a>
323
+<a class="sourceLine" id="cb20-9" data-line-number="9"><span class="co">#&gt; `forcats::fct_explicit_na`</span></a>
324
+<a class="sourceLine" id="cb20-10" data-line-number="10"><span class="co">#&gt; Warning: Column `representative` joining factors with different levels,</span></a>
325
+<a class="sourceLine" id="cb20-11" data-line-number="11"><span class="co">#&gt; coercing to character vector</span></a></code></pre></div>
326
+<div class="sourceCode" id="cb21"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb21-1" data-line-number="1">pairs_plt =<span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot</a></span>(pairing_list<span class="op">$</span>cell_tbl, <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">x =</span> cluster_idx<span class="fl">.1</span>_fct, <span class="dt">y =</span> cluster_idx<span class="fl">.2</span>_fct, <span class="dt">color =</span> sample, <span class="dt">shape =</span> pop)) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/geom_jitter.html">geom_jitter</a></span>(<span class="dt">width =</span> <span class="fl">.3</span>, <span class="dt">height =</span> <span class="fl">.3</span>) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggtheme.html">theme_minimal</a></span>()</a>
327
+<a class="sourceLine" id="cb21-2" data-line-number="2"></a>
328
+<a class="sourceLine" id="cb21-3" data-line-number="3">ylab =<span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/reexports.html">data_frame</a></span>(<span class="dt">cdr3_representative =</span>  <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggplot_build.html">ggplot_build</a></span>(pairs_plt)<span class="op">$</span>layout<span class="op">$</span>panel_params[[<span class="dv">1</span>]]<span class="op">$</span>y.label) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/join.html">left_join</a></span>(feature_tbl) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate</a></span>(<span class="dt">class_color =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/ifelse">ifelse</a></span>(<span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/NA">is.na</a></span>(class_color), <span class="st">'#E41A1C'</span>, class_color))</a>
329
+<a class="sourceLine" id="cb21-4" data-line-number="4"></a>
330
+<a class="sourceLine" id="cb21-5" data-line-number="5">xlab =<span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/reexports.html">data_frame</a></span>(<span class="dt">cdr3_representative =</span>  <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggplot_build.html">ggplot_build</a></span>(pairs_plt)<span class="op">$</span>layout<span class="op">$</span>panel_params[[<span class="dv">1</span>]]<span class="op">$</span>x.label) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/join.html">left_join</a></span>(feature_tbl) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate</a></span>(<span class="dt">class_color =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/ifelse">ifelse</a></span>(<span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/NA">is.na</a></span>(class_color), <span class="st">'#E41A1C'</span>, class_color))</a>
331
+<a class="sourceLine" id="cb21-6" data-line-number="6"></a>
332
+<a class="sourceLine" id="cb21-7" data-line-number="7">pairs_plt =<span class="st"> </span>pairs_plt <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/theme.html">theme</a></span>(<span class="dt">axis.text.x =</span> <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/element.html">element_text</a></span>(<span class="dt">angle =</span> <span class="dv">90</span>, <span class="dt">color =</span> xlab<span class="op">$</span>class_color, <span class="dt">size =</span> <span class="dv">8</span>), <span class="dt">axis.text.y =</span> <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/element.html">element_text</a></span>(<span class="dt">color =</span> ylab<span class="op">$</span>class_color, <span class="dt">size =</span> <span class="dv">8</span>))</a>
333
+<a class="sourceLine" id="cb21-8" data-line-number="8"></a>
334
+<a class="sourceLine" id="cb21-9" data-line-number="9">pairs_plt</a></code></pre></div>
335
+<div id="expanded-clones" class="section level2">
336
+<h2 class="hasAnchor">
337
+<a href="#expanded-clones" class="anchor"></a>Expanded clones</h2>
338
+<div class="sourceCode" id="cb22"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb22-1" data-line-number="1">pairing_list =<span class="st"> </span><span class="kw"><a href="../reference/pairing_tables.html">pairing_tables</a></span>(oligo_clusters_all <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/select.html">select</a></span>(cdr3_representative, dataset, barcode, chain, umis, reads), <span class="dt">cluster_idx =</span> <span class="st">'cdr3_representative'</span>, <span class="dt">cell_identifiers =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/c">c</a></span>(<span class="st">'dataset'</span>, <span class="st">'barcode'</span>), <span class="dt">canonicalize_fun =</span> canonicalize_by_prevalence, <span class="dt">table_order =</span> <span class="dv">2</span>, <span class="dt">orphan_level =</span> <span class="dv">1</span>, <span class="dt">min_expansion =</span> <span class="dv">4</span>, <span class="dt">feature_tbl =</span> feature_tbl, <span class="dt">cell_tbl =</span> good_cells, <span class="dt">cluster_whitelist =</span> <span class="kw"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span>(oligo_clusters, n_cluster<span class="op">&gt;</span><span class="dv">8</span>) <span class="op">%&gt;%</span><span class="st"> </span>dplyr<span class="op">::</span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/select.html">select</a></span>(<span class="dt">cluster_idx.1 =</span> cdr3_representative) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/unique">unique</a></span>())</a>
339
+<a class="sourceLine" id="cb22-2" data-line-number="2">pairs_plt =<span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot</a></span>(pairing_list<span class="op">$</span>cell_tbl, <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">x =</span> cluster_idx<span class="fl">.1</span>_fct, <span class="dt">y =</span> cluster_idx<span class="fl">.2</span>_fct, <span class="dt">color =</span> sample, <span class="dt">shape =</span> pop)) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/geom_jitter.html">geom_jitter</a></span>(<span class="dt">width =</span> <span class="fl">.3</span>, <span class="dt">height =</span> <span class="fl">.3</span>) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggtheme.html">theme_minimal</a></span>()</a>
340
+<a class="sourceLine" id="cb22-3" data-line-number="3"></a>
341
+<a class="sourceLine" id="cb22-4" data-line-number="4">ylab =<span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/reexports.html">data_frame</a></span>(<span class="dt">cdr3_representative =</span>  <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggplot_build.html">ggplot_build</a></span>(pairs_plt)<span class="op">$</span>layout<span class="op">$</span>panel_params[[<span class="dv">1</span>]]<span class="op">$</span>y.label) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/join.html">left_join</a></span>(feature_tbl) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate</a></span>(<span class="dt">class_color =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/ifelse">ifelse</a></span>(<span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/NA">is.na</a></span>(class_color), <span class="st">'#E41A1C'</span>, class_color))</a>
342
+<a class="sourceLine" id="cb22-5" data-line-number="5"></a>
343
+<a class="sourceLine" id="cb22-6" data-line-number="6">xlab =<span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/reexports.html">data_frame</a></span>(<span class="dt">cdr3_representative =</span>  <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggplot_build.html">ggplot_build</a></span>(pairs_plt)<span class="op">$</span>layout<span class="op">$</span>panel_params[[<span class="dv">1</span>]]<span class="op">$</span>x.label) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/join.html">left_join</a></span>(feature_tbl) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate</a></span>(<span class="dt">class_color =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/ifelse">ifelse</a></span>(<span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/NA">is.na</a></span>(class_color), <span class="st">'#E41A1C'</span>, class_color))</a>
344
+<a class="sourceLine" id="cb22-7" data-line-number="7"></a>
345
+<a class="sourceLine" id="cb22-8" data-line-number="8">pairs_plt =<span class="st"> </span>pairs_plt <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/theme.html">theme</a></span>(<span class="dt">axis.text.x =</span> <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/element.html">element_text</a></span>(<span class="dt">angle =</span> <span class="dv">90</span>, <span class="dt">color =</span> xlab<span class="op">$</span>class_color, <span class="dt">size =</span> <span class="dv">8</span>), <span class="dt">axis.text.y =</span> <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/element.html">element_text</a></span>(<span class="dt">color =</span> ylab<span class="op">$</span>class_color, <span class="dt">size =</span> <span class="dv">8</span>))</a>
346
+<a class="sourceLine" id="cb22-9" data-line-number="9"></a>
347
+<a class="sourceLine" id="cb22-10" data-line-number="10">pairs_plt</a></code></pre></div>
348
+</div>
349
+</div>
350
+<div id="length-of-cdr3" class="section level1">
351
+<h1 class="hasAnchor">
352
+<a href="#length-of-cdr3" class="anchor"></a>Length of CDR3</h1>
353
+<div class="sourceCode" id="cb23"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb23-1" data-line-number="1">aa80<span class="op">$</span>contig_tbl =<span class="st"> </span>aa80<span class="op">$</span>contig_tbl <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate</a></span>(<span class="dt">cdr3_length =</span> <span class="kw"><a href="https://stringr.tidyverse.org/reference/str_length.html">str_length</a></span>(cdr3_nt))</a>
354
+<a class="sourceLine" id="cb23-2" data-line-number="2"><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot</a></span>(aa80<span class="op">$</span>contig_tbl, <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">fill =</span> pop, <span class="dt">x=</span> cdr3_length)) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/geom_histogram.html">geom_histogram</a></span>(<span class="dt">binwidth =</span> <span class="dv">1</span>, <span class="dt">mapping =</span> <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">y =</span> ..density..)) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggtheme.html">theme_minimal</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/scale_brewer.html">scale_fill_brewer</a></span>(<span class="dt">type =</span> <span class="st">'qual'</span>) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/facet_grid.html">facet_grid</a></span>(sample <span class="op">~</span>chain) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/theme.html">theme</a></span>(<span class="dt">strip.text.y =</span> <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/element.html">element_text</a></span>(<span class="dt">angle =</span> <span class="dv">0</span>)) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/coord_cartesian.html">coord_cartesian</a></span>(<span class="dt">xlim =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/c">c</a></span>(<span class="dv">25</span>, <span class="dv">55</span>))</a></code></pre></div>
355
+<p><img src="cdr3_clustering_files/figure-html/unnamed-chunk-17-1.png" width="700"></p>
356
+<p>Plot the CDR3 length distribution for each sample and pop. There doesn’t appear to be a noticeable difference between Balbc and Black6 mice, but if we needed to make sure, an appropriate procedure would be to run a mixed model with a random <code>sample</code> effect (assumed to represent a biological replicate).</p>
357
+<div class="sourceCode" id="cb24"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb24-1" data-line-number="1">cdr_len =<span class="st"> </span>aa80<span class="op">$</span>contig_tbl <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by</a></span>(chain) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/do.html">do</a></span>(broom<span class="op">::</span><span class="kw"><a href="https://www.rdocumentation.org/packages/broom/topics/reexports">tidy</a></span>(lme4<span class="op">::</span><span class="kw"><a href="https://www.rdocumentation.org/packages/lme4/topics/lmer">lmer</a></span>(cdr3_length <span class="op">~</span><span class="st"> </span>pop <span class="op">+</span><span class="st"> </span>(<span class="dv">1</span><span class="op">|</span>sample), <span class="dt">data =</span> .), <span class="dt">conf.int =</span> <span class="ot">TRUE</span>))</a>
358
+<a class="sourceLine" id="cb24-2" data-line-number="2"><span class="co">#&gt; boundary (singular) fit: see ?isSingular</span></a>
359
+<a class="sourceLine" id="cb24-3" data-line-number="3"><span class="co">#&gt; Warning in bind_rows_(x, .id): binding factor and character vector,</span></a>
360
+<a class="sourceLine" id="cb24-4" data-line-number="4"><span class="co">#&gt; coercing into character vector</span></a>
361
+<a class="sourceLine" id="cb24-5" data-line-number="5"><span class="co">#&gt; Warning in bind_rows_(x, .id): binding character and factor vector,</span></a>
362
+<a class="sourceLine" id="cb24-6" data-line-number="6"><span class="co">#&gt; coercing into character vector</span></a>
363
+<a class="sourceLine" id="cb24-7" data-line-number="7"><span class="co">#&gt; boundary (singular) fit: see ?isSingular</span></a>
364
+<a class="sourceLine" id="cb24-8" data-line-number="8"><span class="co">#&gt; Warning in bind_rows_(x, .id): binding factor and character vector,</span></a>
365
+<a class="sourceLine" id="cb24-9" data-line-number="9"><span class="co">#&gt; coercing into character vector</span></a>
366
+<a class="sourceLine" id="cb24-10" data-line-number="10"></a>
367
+<a class="sourceLine" id="cb24-11" data-line-number="11"><span class="co">#&gt; Warning in bind_rows_(x, .id): binding character and factor vector,</span></a>
368
+<a class="sourceLine" id="cb24-12" data-line-number="12"><span class="co">#&gt; coercing into character vector</span></a>
369
+<a class="sourceLine" id="cb24-13" data-line-number="13"><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot</a></span>(cdr_len <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span>(group <span class="op">==</span><span class="st"> 'fixed'</span>, term <span class="op">!=</span><span class="st"> '(Intercept)'</span>), <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">x =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/interaction">interaction</a></span>(chain, term), <span class="dt">y =</span> estimate, <span class="dt">ymin =</span> conf.low, <span class="dt">ymax =</span> conf.high)) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/geom_linerange.html">geom_pointrange</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggtheme.html">theme_minimal</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/coord_flip.html">coord_flip</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/labs.html">ylab</a></span>(<span class="st">'Length(CDR3 Nt)'</span>) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/labs.html">xlab</a></span>(<span class="st">'Term/Chain'</span>)</a></code></pre></div>
370
+<p><img src="cdr3_clustering_files/figure-html/cdr3_len-1.png" width="288"></p>
371
+<p>As was suggested by the histogram, there doesn’t seem to be an obvious <code>pop</code> effect.</p>
372
+</div>
373
+  </div>
374
+
375
+  <div class="col-md-3 hidden-xs hidden-sm" id="sidebar">
376
+        <div id="tocnav">
377
+      <h2 class="hasAnchor">
378
+<a href="#tocnav" class="anchor"></a>Contents</h2>
379
+      <ul class="nav nav-pills nav-stacked">
380
+<li><a href="#load-filtered-contig-files">Load filtered contig files</a></li>
381
+      <li><a href="#chain-pairings">Chain pairings</a></li>
382
+      <li><a href="#cluster-cdr3-protein-sequences">Cluster CDR3 protein sequences</a></li>
383
+      <li><a href="#cluster-cdr3-dna-sequences">Cluster CDR3 DNA sequences</a></li>
384
+      <li><a href="#oligo-clusters">Oligo clusters</a></li>
385
+      <li><a href="#formal-testing-for-frequency-differences">Formal testing for frequency differences</a></li>
386
+      <li>
387
+<a href="#clonal-pairs">Clonal pairs</a><ul class="nav nav-pills nav-stacked">
388
+<li><a href="#expanded-clones">Expanded clones</a></li>
389
+      </ul>
390
+</li>
391
+      <li><a href="#length-of-cdr3">Length of CDR3</a></li>
392
+      </ul>
393
+</div>
394
+      </div>
395
+
396
+</div>
397
+
398
+
399
+      <footer><div class="copyright">
400
+  <p>Developed by Andrew McDavid, Yu Gu.</p>
401
+</div>
402
+
403
+<div class="pkgdown">
404
+  <p>Site built with <a href="https://pkgdown.r-lib.org/">pkgdown</a> 1.3.0.</p>
405
+</div>
406
+      </footer>
407
+</div>
408
+
409
+  
410
+
411
+  </body>
412
+</html>
0 413
new file mode 100644
1 414
Binary files /dev/null and b/docs/articles/cdr3_clustering_files/figure-html/cdr3_len-1.png differ
2 415
new file mode 100644
3 416
Binary files /dev/null and b/docs/articles/cdr3_clustering_files/figure-html/unnamed-chunk-13-1.png differ
4 417
new file mode 100644
5 418
Binary files /dev/null and b/docs/articles/cdr3_clustering_files/figure-html/unnamed-chunk-14-1.png differ
6 419
new file mode 100644
7 420
Binary files /dev/null and b/docs/articles/cdr3_clustering_files/figure-html/unnamed-chunk-17-1.png differ
8 421
new file mode 100644
9 422
Binary files /dev/null and b/docs/articles/cdr3_clustering_files/figure-html/unnamed-chunk-3-1.png differ
10 423
new file mode 100644
11 424
Binary files /dev/null and b/docs/articles/cdr3_clustering_files/figure-html/unnamed-chunk-4-1.png differ
12 425
new file mode 100644
13 426
Binary files /dev/null and b/docs/articles/cdr3_clustering_files/figure-html/unnamed-chunk-5-1.png differ
14 427
new file mode 100644
15 428
Binary files /dev/null and b/docs/articles/cdr3_clustering_files/figure-html/unnamed-chunk-7-1.png differ
16 429
new file mode 100644
17 430
Binary files /dev/null and b/docs/articles/cdr3_clustering_files/figure-html/unnamed-chunk-8-1.png differ
18 431
new file mode 100644
19 432
Binary files /dev/null and b/docs/articles/cdr3_clustering_files/figure-html/unnamed-chunk-9-1.png differ
20 433
new file mode 100644
21 434
Binary files /dev/null and b/docs/articles/cdr3_clustering_files/figure-html/unnamed-chunk-9-2.png differ
22 435
new file mode 100644
23 436
Binary files /dev/null and b/docs/articles/cdr3_clustering_files/figure-html/unnamed-chunk-9-3.png differ
24 437
new file mode 100644
25 438
Binary files /dev/null and b/docs/articles/cdr3_clustering_files/figure-html/unnamed-chunk-9-4.png differ
26 439
new file mode 100644
... ...
@@ -0,0 +1,144 @@
1
+<!-- Generated by pkgdown: do not edit by hand -->
2
+<!DOCTYPE html>
3
+<html lang="en">
4
+  <head>
5
+  <meta charset="utf-8">
6
+<meta http-equiv="X-UA-Compatible" content="IE=edge">
7
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
8
+
9
+<title>Articles • CellaRepertorium</title>
10
+
11
+<!-- jquery -->
12
+<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
13
+<!-- Bootstrap -->
14
+
15
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha256-916EbMg70RQy9LHiGkXzG8hSg9EdNy97GazNG/aiY1w=" crossorigin="anonymous" />
16
+<script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha256-U5ZEeKfGNOja007MMD3YBI0A3OSZOQbeG6z2f2Y0hu8=" crossorigin="anonymous"></script>
17
+
18
+<!-- Font Awesome icons -->
19
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css" integrity="sha256-eZrrJcwDc/3uDhsdt61sL2oOBY362qM3lon1gyExkL0=" crossorigin="anonymous" />
20
+
21
+<!-- clipboard.js -->
22
+<script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.4/clipboard.min.js" integrity="sha256-FiZwavyI2V6+EXO1U+xzLG3IKldpiTFf3153ea9zikQ=" crossorigin="anonymous"></script>
23
+
24
+<!-- sticky kit -->
25
+<script src="https://cdnjs.cloudflare.com/ajax/libs/sticky-kit/1.1.3/sticky-kit.min.js" integrity="sha256-c4Rlo1ZozqTPE2RLuvbusY3+SU1pQaJC0TjuhygMipw=" crossorigin="anonymous"></script>
26
+
27
+<!-- pkgdown -->
28
+<link href="../pkgdown.css" rel="stylesheet">
29
+<script src="../pkgdown.js"></script>
30
+
31
+
32
+
33
+<meta property="og:title" content="Articles" />
34
+
35
+
36
+
37
+<!-- mathjax -->
38
+<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js" integrity="sha256-nvJJv9wWKEm88qvoQl9ekL2J+k/RWIsaSScxxlsrv8k=" crossorigin="anonymous"></script>
39
+<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/config/TeX-AMS-MML_HTMLorMML.js" integrity="sha256-84DKXVJXs0/F8OTMzX4UR909+jtl4G7SPypPavF+GfA=" crossorigin="anonymous"></script>
40
+
41
+<!--[if lt IE 9]>
42
+<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
43
+<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
44
+<![endif]-->
45
+
46
+
47
+  </head>
48
+
49
+  <body>
50
+    <div class="container template-article-index">
51
+      <header>
52
+      <div class="navbar navbar-default navbar-fixed-top" role="navigation">
53
+  <div class="container">
54
+    <div class="navbar-header">
55
+      <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false">
56
+        <span class="sr-only">Toggle navigation</span>
57
+        <span class="icon-bar"></span>
58
+        <span class="icon-bar"></span>
59
+        <span class="icon-bar"></span>
60
+      </button>
61
+      <span class="navbar-brand">
62
+        <a class="navbar-link" href="../index.html">CellaRepertorium</a>
63
+        <span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="Released version">0.3.1</span>
64
+      </span>
65
+    </div>
66
+
67
+    <div id="navbar" class="navbar-collapse collapse">
68
+      <ul class="nav navbar-nav">
69
+        <li>
70
+  <a href="../index.html">
71
+    <span class="fa fa-home fa-lg"></span>
72
+     
73
+  </a>
74
+</li>
75
+<li>
76
+  <a href="../reference/index.html">Reference</a>
77
+</li>
78
+<li class="dropdown">
79
+  <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">
80
+    Articles
81
+     
82
+    <span class="caret"></span>
83
+  </a>
84
+  <ul class="dropdown-menu" role="menu">
85
+    <li>
86
+      <a href="../articles/cdr3_clustering.html">Clustering repertoire via CDR3 sequences</a>
87
+    </li>
88
+    <li>
89
+      <a href="../articles/mouse_tcell_qc.html">Quality control and Exploration of UMI-based repertoire data</a>
90
+    </li>
91
+  </ul>
92
+</li>
93
+      </ul>
94
+      
95
+      <ul class="nav navbar-nav navbar-right">
96
+        <li>
97
+  <a href="https://github.com/amcdavid/CellaRepertorium">
98
+    <span class="fa fa-github fa-lg"></span>
99
+     
100
+  </a>
101
+</li>
102
+      </ul>
103
+      
104
+    </div><!--/.nav-collapse -->
105
+  </div><!--/.container -->
106
+</div><!--/.navbar -->
107
+
108
+      
109
+      </header>
110
+
111
+<div class="row">
112
+  <div class="col-md-9 contents">
113
+    <div class="page-header">
114
+      <h1>Articles</h1>
115
+    </div>
116
+
117
+    <div class="section ">
118
+      <h3>All vignettes</h3>
119
+      <p class="section-desc"></p>
120
+
121
+      <ul>
122
+        <li><a href="cdr3_clustering.html">Clustering repertoire via CDR3 sequences</a></li>
123
+        <li><a href="mouse_tcell_qc.html">Quality control and Exploration of UMI-based repertoire data</a></li>
124
+      </ul>
125
+    </div>
126
+  </div>
127
+</div>
128
+
129
+      <footer>
130
+      <div class="copyright">
131
+  <p>Developed by Andrew McDavid, Yu Gu.</p>
132
+</div>
133
+
134
+<div class="pkgdown">
135
+  <p>Site built with <a href="https://pkgdown.r-lib.org/">pkgdown</a> 1.3.0.</p>
136
+</div>
137
+      </footer>
138
+   </div>
139
+
140
+  
141
+
142
+  </body>
143
+</html>
144
+
0 145
new file mode 100644
... ...
@@ -0,0 +1,323 @@
1
+<!DOCTYPE html>
2
+<!-- Generated by pkgdown: do not edit by hand --><html lang="en">
3
+<head>
4
+<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
5
+<meta charset="utf-8">
6
+<meta http-equiv="X-UA-Compatible" content="IE=edge">
7
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
8
+<title>Quality control and Exploration of UMI-based repertoire data • CellaRepertorium</title>
9
+<!-- jquery --><script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script><!-- Bootstrap --><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha256-916EbMg70RQy9LHiGkXzG8hSg9EdNy97GazNG/aiY1w=" crossorigin="anonymous">
10
+<script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha256-U5ZEeKfGNOja007MMD3YBI0A3OSZOQbeG6z2f2Y0hu8=" crossorigin="anonymous"></script><!-- Font Awesome icons --><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css" integrity="sha256-eZrrJcwDc/3uDhsdt61sL2oOBY362qM3lon1gyExkL0=" crossorigin="anonymous">
11
+<!-- clipboard.js --><script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.4/clipboard.min.js" integrity="sha256-FiZwavyI2V6+EXO1U+xzLG3IKldpiTFf3153ea9zikQ=" crossorigin="anonymous"></script><!-- sticky kit --><script src="https://cdnjs.cloudflare.com/ajax/libs/sticky-kit/1.1.3/sticky-kit.min.js" integrity="sha256-c4Rlo1ZozqTPE2RLuvbusY3+SU1pQaJC0TjuhygMipw=" crossorigin="anonymous"></script><!-- pkgdown --><link href="../pkgdown.css" rel="stylesheet">
12
+<script src="../pkgdown.js"></script><meta property="og:title" content="Quality control and Exploration of UMI-based repertoire data">
13
+<meta property="og:description" content="">
14
+<meta name="twitter:card" content="summary">
15
+<!-- mathjax --><script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js" integrity="sha256-nvJJv9wWKEm88qvoQl9ekL2J+k/RWIsaSScxxlsrv8k=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/config/TeX-AMS-MML_HTMLorMML.js" integrity="sha256-84DKXVJXs0/F8OTMzX4UR909+jtl4G7SPypPavF+GfA=" crossorigin="anonymous"></script><!--[if lt IE 9]>
16
+<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
17
+<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
18
+<![endif]-->
19
+</head>
20
+<body>
21
+    <div class="container template-article">
22
+      <header><div class="navbar navbar-default navbar-fixed-top" role="navigation">
23
+  <div class="container">
24
+    <div class="navbar-header">
25
+      <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false">
26
+        <span class="sr-only">Toggle navigation</span>
27
+        <span class="icon-bar"></span>
28
+        <span class="icon-bar"></span>
29
+        <span class="icon-bar"></span>
30
+      </button>
31
+      <span class="navbar-brand">
32
+        <a class="navbar-link" href="../index.html">CellaRepertorium</a>
33
+        <span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="Released version">0.3.1</span>
34
+      </span>
35
+    </div>
36
+
37
+    <div id="navbar" class="navbar-collapse collapse">
38
+      <ul class="nav navbar-nav">
39
+<li>
40
+  <a href="../index.html">
41
+    <span class="fa fa-home fa-lg"></span>
42
+     
43
+  </a>
44
+</li>
45
+<li>
46
+  <a href="../reference/index.html">Reference</a>
47
+</li>
48
+<li class="dropdown">
49
+  <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">
50
+    Articles
51
+     
52
+    <span class="caret"></span>
53
+  </a>
54
+  <ul class="dropdown-menu" role="menu">
55
+<li>
56
+      <a href="../articles/cdr3_clustering.html">Clustering repertoire via CDR3 sequences</a>
57
+    </li>
58
+    <li>
59
+      <a href="../articles/mouse_tcell_qc.html">Quality control and Exploration of UMI-based repertoire data</a>
60
+    </li>
61
+  </ul>
62
+</li>
63
+      </ul>
64
+<ul class="nav navbar-nav navbar-right">
65
+<li>
66
+  <a href="https://github.com/amcdavid/CellaRepertorium">
67
+    <span class="fa fa-github fa-lg"></span>
68
+     
69
+  </a>
70
+</li>
71
+      </ul>
72
+</div>
73
+<!--/.nav-collapse -->
74
+  </div>
75
+<!--/.container -->
76
+</div>
77
+<!--/.navbar -->
78
+
79
+      
80
+      </header><div class="row">
81
+  <div class="col-md-9 contents">
82
+    <div class="page-header toc-ignore">
83
+      <h1>Quality control and Exploration of UMI-based repertoire data</h1>
84
+            
85
+      
86
+      <small class="dont-index">Source: <a href="https://github.com/amcdavid/CellaRepertorium/blob/master/vignettes/mouse_tcell_qc.Rmd"><code>vignettes/mouse_tcell_qc.Rmd</code></a></small>
87
+      <div class="hidden name"><code>mouse_tcell_qc.Rmd</code></div>
88
+
89
+    </div>
90
+
91
+    
92
+    
93
+<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb1-1" data-line-number="1"><span class="co">#load_all()</span></a>
94
+<a class="sourceLine" id="cb1-2" data-line-number="2"><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/library">library</a></span>(CellaRepertorium)</a>
95
+<a class="sourceLine" id="cb1-3" data-line-number="3"><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/library">library</a></span>(dplyr)</a>
96
+<a class="sourceLine" id="cb1-4" data-line-number="4"><span class="co">#&gt; </span></a>
97
+<a class="sourceLine" id="cb1-5" data-line-number="5"><span class="co">#&gt; Attaching package: 'dplyr'</span></a>
98
+<a class="sourceLine" id="cb1-6" data-line-number="6"><span class="co">#&gt; The following objects are masked from 'package:stats':</span></a>
99
+<a class="sourceLine" id="cb1-7" data-line-number="7"><span class="co">#&gt; </span></a>
100
+<a class="sourceLine" id="cb1-8" data-line-number="8"><span class="co">#&gt;     filter, lag</span></a>
101
+<a class="sourceLine" id="cb1-9" data-line-number="9"><span class="co">#&gt; The following objects are masked from 'package:base':</span></a>
102
+<a class="sourceLine" id="cb1-10" data-line-number="10"><span class="co">#&gt; </span></a>
103
+<a class="sourceLine" id="cb1-11" data-line-number="11"><span class="co">#&gt;     intersect, setdiff, setequal, union</span></a>
104
+<a class="sourceLine" id="cb1-12" data-line-number="12"><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/library">library</a></span>(ggplot2)</a>
105
+<a class="sourceLine" id="cb1-13" data-line-number="13"><span class="co">#&gt; Registered S3 methods overwritten by 'ggplot2':</span></a>
106
+<a class="sourceLine" id="cb1-14" data-line-number="14"><span class="co">#&gt;   method         from </span></a>
107
+<a class="sourceLine" id="cb1-15" data-line-number="15"><span class="co">#&gt;   [.quosures     rlang</span></a>
108
+<a class="sourceLine" id="cb1-16" data-line-number="16"><span class="co">#&gt;   c.quosures     rlang</span></a>
109
+<a class="sourceLine" id="cb1-17" data-line-number="17"><span class="co">#&gt;   print.quosures rlang</span></a>
110
+<a class="sourceLine" id="cb1-18" data-line-number="18"><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/library">library</a></span>(readr)</a>
111
+<a class="sourceLine" id="cb1-19" data-line-number="19"><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/library">library</a></span>(tidyr)</a>
112
+<a class="sourceLine" id="cb1-20" data-line-number="20"><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/library">library</a></span>(stringr)</a></code></pre></div>
113
+<div id="general-questions" class="section level1">
114
+<h1 class="hasAnchor">
115
+<a href="#general-questions" class="anchor"></a>General questions</h1>
116
+<ol style="list-style-type: decimal">
117
+<li>Can/should elements of this script be included as package functionality (cell calling/filtering)?</li>
118
+<li>Can the QC be made more statistically principled?</li>
119
+</ol>
120
+</div>
121
+<div id="load-contig-files" class="section level1">
122
+<h1 class="hasAnchor">
123
+<a href="#load-contig-files" class="anchor"></a>Load contig files</h1>
124
+<div class="sourceCode" id="cb2"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb2-1" data-line-number="1">files =<span class="st"> </span><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/list.files">list.files</a></span>(<span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/system.file">system.file</a></span>(<span class="st">'extdata'</span>, <span class="dt">package =</span> <span class="st">'CellaRepertorium'</span>), <span class="dt">pattern =</span> <span class="st">"all_contig_annotations_.+?.csv.xz"</span>, <span class="dt">recursive =</span> <span class="ot">TRUE</span>, <span class="dt">full.names =</span> <span class="ot">TRUE</span>)</a>
125
+<a class="sourceLine" id="cb2-2" data-line-number="2"><span class="co"># Pull out sample and population names</span></a>
126
+<a class="sourceLine" id="cb2-3" data-line-number="3">samp_map =<span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/reexports.html">tibble</a></span>(<span class="dt">anno_file =</span> files, <span class="dt">pop =</span> <span class="kw"><a href="https://stringr.tidyverse.org/reference/str_match.html">str_match</a></span>(files, <span class="st">'b6|balbc'</span>)[,<span class="dv">1</span>], <span class="dt">sample =</span> <span class="kw"><a href="https://stringr.tidyverse.org/reference/str_match.html">str_match</a></span>(files, <span class="st">'_([0-9])</span><span class="ch">\\</span><span class="st">.'</span>)[,<span class="dv">2</span>])</a>
127
+<a class="sourceLine" id="cb2-4" data-line-number="4"></a>
128
+<a class="sourceLine" id="cb2-5" data-line-number="5">knitr<span class="op">::</span><span class="kw"><a href="https://www.rdocumentation.org/packages/knitr/topics/kable">kable</a></span>(samp_map)</a></code></pre></div>
129
+<table class="table">
130
+<thead><tr class="header">
131
+<th align="left">anno_file</th>
132
+<th align="left">pop</th>
133
+<th align="left">sample</th>
134
+</tr></thead>
135
+<tbody>
136
+<tr class="odd">
137
+<td align="left">/Users/amcdavid/Library/R/3.9-bioc/CellaRepertorium/extdata/all_contig_annotations_b6_4.csv.xz</td>
138
+<td align="left">b6</td>
139
+<td align="left">4</td>
140
+</tr>
141
+<tr class="even">
142
+<td align="left">/Users/amcdavid/Library/R/3.9-bioc/CellaRepertorium/extdata/all_contig_annotations_b6_5.csv.xz</td>
143
+<td align="left">b6</td>
144
+<td align="left">5</td>
145
+</tr>
146
+<tr class="odd">
147
+<td align="left">/Users/amcdavid/Library/R/3.9-bioc/CellaRepertorium/extdata/all_contig_annotations_b6_6.csv.xz</td>
148
+<td align="left">b6</td>
149
+<td align="left">6</td>
150
+</tr>
151
+<tr class="even">
152
+<td align="left">/Users/amcdavid/Library/R/3.9-bioc/CellaRepertorium/extdata/all_contig_annotations_balbc_1.csv.xz</td>
153
+<td align="left">balbc</td>
154
+<td align="left">1</td>
155
+</tr>
156
+<tr class="odd">
157
+<td align="left">/Users/amcdavid/Library/R/3.9-bioc/CellaRepertorium/extdata/all_contig_annotations_balbc_2.csv.xz</td>
158
+<td align="left">balbc</td>
159
+<td align="left">2</td>
160
+</tr>
161
+<tr class="even">
162
+<td align="left">/Users/amcdavid/Library/R/3.9-bioc/CellaRepertorium/extdata/all_contig_annotations_balbc_3.csv.xz</td>
163
+<td align="left">balbc</td>
164
+<td align="left">3</td>
165
+</tr>
166
+</tbody>
167
+</table>
168
+<p>PBMC pooled from Balb/C and C57BL/6 mice <a href="https://support.10xgenomics.com/single-cell-vdj/datasets/3.0.0/vdj_v1_mm_c57bl6_pbmc_t">were assayed on 10X genomics V3 chemistry</a> and a library enriched for TCR was run. For the purposes of illustrating functionality in this package, cell barcodes were subsampled 3 times for each of the Balb/C and Black6 pools to generate distinct <code>samples</code>, which is reflected in the <code>sample</code> column. More details are available in the scripts in the <code>data-raw</code> directory of this package.</p>
169
+<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb3-1" data-line-number="1"><span class="co"># read in CSV</span></a>
170
+<a class="sourceLine" id="cb3-2" data-line-number="2">all_anno =<span class="st"> </span>samp_map <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/rowwise.html">rowwise</a></span>() <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate</a></span>(<span class="dt">anno =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/list">list</a></span>(<span class="kw"><a href="https://readr.tidyverse.org/reference/read_delim.html">read_csv</a></span>(anno_file, <span class="dt">col_types =</span> <span class="kw"><a href="https://readr.tidyverse.org/reference/cols.html">cols</a></span>(</a>
171
+<a class="sourceLine" id="cb3-3" data-line-number="3">  <span class="dt">barcode =</span> <span class="kw"><a href="https://readr.tidyverse.org/reference/parse_atomic.html">col_character</a></span>(),</a>
172
+<a class="sourceLine" id="cb3-4" data-line-number="4">  <span class="dt">is_cell =</span> <span class="kw"><a href="https://readr.tidyverse.org/reference/parse_atomic.html">col_logical</a></span>(),</a>
173
+<a class="sourceLine" id="cb3-5" data-line-number="5">  <span class="dt">contig_id =</span> <span class="kw"><a href="https://readr.tidyverse.org/reference/parse_atomic.html">col_character</a></span>(),</a>
174
+<a class="sourceLine" id="cb3-6" data-line-number="6">  <span class="dt">high_confidence =</span> <span class="kw"><a href="https://readr.tidyverse.org/reference/parse_atomic.html">col_logical</a></span>(),</a>
175
+<a class="sourceLine" id="cb3-7" data-line-number="7">  <span class="dt">length =</span> <span class="kw"><a href="https://readr.tidyverse.org/reference/parse_atomic.html">col_double</a></span>(),</a>
176
+<a class="sourceLine" id="cb3-8" data-line-number="8">  <span class="dt">chain =</span> <span class="kw"><a href="https://readr.tidyverse.org/reference/parse_atomic.html">col_character</a></span>(),</a>
177
+<a class="sourceLine" id="cb3-9" data-line-number="9">  <span class="dt">v_gene =</span> <span class="kw"><a href="https://readr.tidyverse.org/reference/parse_atomic.html">col_character</a></span>(),</a>
178
+<a class="sourceLine" id="cb3-10" data-line-number="10">  <span class="dt">d_gene =</span> <span class="kw"><a href="https://readr.tidyverse.org/reference/parse_atomic.html">col_character</a></span>(),</a>
179
+<a class="sourceLine" id="cb3-11" data-line-number="11">  <span class="dt">j_gene =</span> <span class="kw"><a href="https://readr.tidyverse.org/reference/parse_atomic.html">col_character</a></span>(),</a>
180
+<a class="sourceLine" id="cb3-12" data-line-number="12">  <span class="dt">c_gene =</span> <span class="kw"><a href="https://readr.tidyverse.org/reference/parse_atomic.html">col_character</a></span>(),</a>
181
+<a class="sourceLine" id="cb3-13" data-line-number="13">  <span class="dt">full_length =</span> <span class="kw"><a href="https://readr.tidyverse.org/reference/parse_atomic.html">col_logical</a></span>(),</a>
182
+<a class="sourceLine" id="cb3-14" data-line-number="14">  <span class="dt">productive =</span> <span class="kw"><a href="https://readr.tidyverse.org/reference/parse_atomic.html">col_character</a></span>(),</a>
183
+<a class="sourceLine" id="cb3-15" data-line-number="15">  <span class="dt">cdr3 =</span> <span class="kw"><a href="https://readr.tidyverse.org/reference/parse_atomic.html">col_character</a></span>(),</a>
184
+<a class="sourceLine" id="cb3-16" data-line-number="16">  <span class="dt">cdr3_nt =</span> <span class="kw"><a href="https://readr.tidyverse.org/reference/parse_atomic.html">col_character</a></span>(),</a>
185
+<a class="sourceLine" id="cb3-17" data-line-number="17">  <span class="dt">reads =</span> <span class="kw"><a href="https://readr.tidyverse.org/reference/parse_atomic.html">col_double</a></span>(),</a>
186
+<a class="sourceLine" id="cb3-18" data-line-number="18">  <span class="dt">umis =</span> <span class="kw"><a href="https://readr.tidyverse.org/reference/parse_atomic.html">col_double</a></span>(),</a>
187
+<a class="sourceLine" id="cb3-19" data-line-number="19">  <span class="dt">raw_clonotype_id =</span> <span class="kw"><a href="https://readr.tidyverse.org/reference/parse_atomic.html">col_character</a></span>(),</a>
188
+<a class="sourceLine" id="cb3-20" data-line-number="20">  <span class="dt">raw_consensus_id =</span> <span class="kw"><a href="https://readr.tidyverse.org/reference/parse_atomic.html">col_character</a></span>()</a>
189
+<a class="sourceLine" id="cb3-21" data-line-number="21">))))</a>
190
+<a class="sourceLine" id="cb3-22" data-line-number="22"></a>
191
+<a class="sourceLine" id="cb3-23" data-line-number="23">all_anno =<span class="st"> </span>all_anno <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://tidyr.tidyverse.org/reference/unnest.html">unnest</a></span>() <span class="op">%&gt;%</span><span class="st"> </span></a>
192
+<a class="sourceLine" id="cb3-24" data-line-number="24"><span class="st">    </span><span class="co"># write a column specifying what cell type the contig belonged to</span></a>
193
+<a class="sourceLine" id="cb3-25" data-line-number="25"><span class="st">    </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate</a></span>(<span class="dt">celltype =</span> <span class="kw"><a href="https://dplyr.tidyverse.org/reference/case_when.html">case_when</a></span>(chain <span class="op">%in%</span><span class="st"> </span><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/c">c</a></span>(<span class="st">'TRA'</span>, <span class="st">'TRB'</span>) <span class="op">~</span><span class="st"> "T_ab"</span>, chain <span class="op">%in%</span><span class="st"> </span><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/c">c</a></span>(<span class="st">'TRD'</span>, <span class="st">'TRG'</span>) <span class="op">~</span><span class="st"> 'T_gd'</span>, chain <span class="op">==</span><span class="st"> 'Multi'</span> <span class="op">~</span><span class="st"> 'Multi'</span>, chain <span class="op">%in%</span><span class="st"> </span><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/c">c</a></span>(<span class="st">'IGH'</span>,<span class="st">'IGK'</span>, <span class="st">'IGL'</span>) <span class="op">~</span><span class="st"> 'B'</span>, <span class="ot">TRUE</span> <span class="op">~</span><span class="st"> </span><span class="ot">NA_character_</span>))</a></code></pre></div>
194
+<p>The pipeline for assembling reads into contigs, and mapping them to UMIs and cells is described in <a href="https://support.10xgenomics.com/single-cell-vdj/software/pipelines/latest/algorithms/overview">the 10X genomics documentation</a>, and its source code is available <a href="https://github.com/10XGenomics/cellranger/tree/5f5a6293bbc067e1965e50f0277286914b96c908">here</a>.</p>
195
+<p>[Comment in greater depth on what output we are using, maybe make a schematic?]</p>
196
+<p>We read in the contig annotation file for each of the samples, and annotate each contig as an alpha-beta T cell, gamma-delta T cell, B cell or chimeric “multi” cell type based on the chain assigned to it.</p>
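+<p>As a rough illustration of the chain-to-cell-type rule in the <code>case_when()</code> call above, the sketch below re-expresses it in base R on a made-up vector of chain labels. The helper name <code>chain_to_celltype</code> and the toy labels are assumptions for illustration only and are not part of the package.</p>

```r
# Hypothetical helper mirroring the case_when() mapping above, in base R.
chain_to_celltype = function(chain) {
  celltype = rep(NA_character_, length(chain))
  celltype[chain %in% c("TRA", "TRB")] = "T_ab"
  celltype[chain %in% c("TRD", "TRG")] = "T_gd"
  celltype[chain %in% "Multi"] = "Multi"
  celltype[chain %in% c("IGH", "IGK", "IGL")] = "B"
  # Anything not matched above stays NA, like the TRUE ~ NA_character_ branch.
  celltype
}

# Toy chain labels (invented for illustration, not taken from the data set).
toy_chain = c("TRA", "TRB", "TRG", "IGK", "Multi", "XYZ")
toy_celltype = chain_to_celltype(toy_chain)
print(data.frame(chain = toy_chain, celltype = toy_celltype))
```

+<p>Labels that match none of the listed chains are left as <code>NA</code>, so they can be counted or dropped explicitly later.</p>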
197
+</div>
198
+<div id="high-confidence-umis-belonging-to-t-cells-per-cell" class="section level1">
199
+<h1 class="hasAnchor">
200
+<a href="#high-confidence-umis-belonging-to-t-cells-per-cell" class="anchor"></a>High confidence UMIs belonging to T cells per cell</h1>
201
+<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb4-1" data-line-number="1">total_umi =<span class="st"> </span>all_anno  <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by</a></span>(sample, pop, barcode, is_cell, high_confidence, celltype <span class="op">==</span><span class="st"> 'T_ab'</span>) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize</a></span>(<span class="dt">total_umi =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/sum">sum</a></span>(umis)) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/select.html">rename</a></span>(<span class="dt">is_T =</span> <span class="st">`</span><span class="dt">celltype == "T_ab"</span><span class="st">`</span>)</a>
202
+<a class="sourceLine" id="cb4-2" data-line-number="2"></a>
203
+<a class="sourceLine" id="cb4-3" data-line-number="3"><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot</a></span>(<span class="kw"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span>(total_umi, high_confidence, is_T), <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">color =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/factor">factor</a></span>(is_cell), <span class="dt">x =</span> total_umi, <span class="dt">group =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/interaction">interaction</a></span>(is_cell, sample, pop))) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/stat_ecdf.html">stat_ecdf</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/coord_cartesian.html">coord_cartesian</a></span>(<span class="dt">xlim =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/c">c</a></span>(<span class="dv">0</span>, <span class="dv">10</span>)) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/labs.html">ylab</a></span>(<span class="st">'Fraction of barcodes'</span>) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggtheme.html">theme_minimal</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/scale_hue.html">scale_color_discrete</a></span>(<span class="st">'10X called cell?'</span>)</a></code></pre></div>
204
+<p><img src="mouse_tcell_qc_files/figure-html/unnamed-chunk-4-1.png" width="700"></p>
205
+<p>10X defines <a href="https://support.10xgenomics.com/single-cell-vdj/software/pipelines/latest/algorithms/cell-calling">a procedure</a> to separate cells from background that fits a Gaussian mixture model to the UMI distributions for each sample. However, in some cases it may be desirable to implement a common QC threshold with a different stringency, such as:</p>
206
+<ul>
207
+<li>When comparing across multiple samples.</li>
208
+<li>When a sample has been enriched for a particular cell type (e.g., with pre-sequencing flow cytometry).</li>
209
+</ul>
210
+<p>When we consider only high confidence UMIs that unambiguously map to T cells, most “non cells” have 1 or fewer, while most putative cells have &gt;5. However, we might want to adopt a different UMI-based cell filter, as is done below. Is there a way to evaluate sensitivity and specificity in distinguishing cells from debris, or T cells from other cell types?</p>
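+<p>One way to explore candidate thresholds, sketched here in base R on invented numbers, is to tabulate how many 10X-called cells and non-cells would pass each cutoff. This treats the 10X <code>is_cell</code> call as the reference, which is an assumption and not ground truth, so the fractions are not true sensitivity or specificity. The names <code>toy_umi</code> and <code>sweep_threshold</code> are hypothetical.</p>

```r
# Toy per-barcode table (values invented for illustration).
toy_umi = data.frame(
  barcode   = paste0("bc", 1:10),
  is_cell   = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  total_umi = c(12, 8, 6, 3, 1, 4, 2, 1, 1, 0)
)

# For each candidate threshold, compute the fraction of 10X-called cells kept
# and the fraction of 10X-called non-cells that would also pass.
sweep_threshold = function(tbl, thresholds) {
  do.call(rbind, lapply(thresholds, function(th) {
    pass = tbl$total_umi >= th
    data.frame(
      threshold          = th,
      frac_cells_kept    = mean(pass[tbl$is_cell]),
      frac_noncells_kept = mean(pass[!tbl$is_cell])
    )
  }))
}

sweep = sweep_threshold(toy_umi, c(1, 2, 5))
print(sweep)
```

+<p>On real data the same table could be computed per sample to see how a common threshold behaves across replicates.</p>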
211
+</div>
212
+<div id="reads-umis" class="section level1">
213
+<h1 class="hasAnchor">
214
+<a href="#reads-umis" class="anchor"></a>Reads / UMIs</h1>
215
+<div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb5-1" data-line-number="1">qual_plot =<span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot</a></span>(all_anno, <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">x =</span> celltype, <span class="dt">y=</span> umis)) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/geom_violin.html">geom_violin</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/geom_jitter.html">geom_jitter</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/facet_wrap.html">facet_wrap</a></span>(<span class="op">~</span>sample <span class="op">+</span><span class="st"> </span>pop) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/scale_continuous.html">scale_y_log10</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/labs.html">xlab</a></span>(<span class="st">"Annotated cell type"</span>)</a>
216
+<a class="sourceLine" id="cb5-2" data-line-number="2"></a>
217
+<a class="sourceLine" id="cb5-3" data-line-number="3">qual_plot </a></code></pre></div>
218
+<p><img src="mouse_tcell_qc_files/figure-html/unnamed-chunk-5-1.png" width="700"></p>
219
+<div class="sourceCode" id="cb6"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb6-1" data-line-number="1">qual_plot <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">y =</span> reads)</a></code></pre></div>
220
+<p><img src="mouse_tcell_qc_files/figure-html/unnamed-chunk-5-2.png" width="700"></p>
221
+<p>The number of UMIs and reads by sample and annotated cell type.</p>
222
+</div>
223
+<div id="apply-t-cell-contig-umi-filter" class="section level1">
224
+<h1 class="hasAnchor">
225
+<a href="#apply-t-cell-contig-umi-filter" class="anchor"></a>Apply T-cell contig UMI filter</h1>
226
+<div class="sourceCode" id="cb7"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb7-1" data-line-number="1"><span class="co"># At least 2 UMIs mapping to high confidence T cell contigs.</span></a>
227
+<a class="sourceLine" id="cb7-2" data-line-number="2">good_bc =<span class="st"> </span>total_umi <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/group_by.html">ungroup</a></span>() <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span>(is_cell) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span>(total_umi <span class="op">&gt;=</span><span class="st"> </span><span class="dv">2</span>, is_T) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span>(high_confidence)</a>
228
+<a class="sourceLine" id="cb7-3" data-line-number="3">total_cells =<span class="st"> </span>good_bc <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by</a></span>(sample, pop) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize</a></span>(<span class="dt">good_bc =</span> <span class="kw"><a href="https://dplyr.tidyverse.org/reference/n.html">n</a></span>())</a>
229
+<a class="sourceLine" id="cb7-4" data-line-number="4">knitr<span class="op">::</span><span class="kw"><a href="https://www.rdocumentation.org/packages/knitr/topics/kable">kable</a></span>(total_cells)</a></code></pre></div>
230
+<table class="table">
231
+<thead><tr class="header">
232
+<th align="left">sample</th>
233
+<th align="left">pop</th>
234
+<th align="right">good_bc</th>
235
+</tr></thead>
236
+<tbody>
237
+<tr class="odd">
238
+<td align="left">1</td>
239
+<td align="left">balbc</td>
240
+<td align="right">133</td>
241
+</tr>
242
+<tr class="even">
243
+<td align="left">2</td>
244
+<td align="left">balbc</td>
245
+<td align="right">137</td>
246
+</tr>
247
+<tr class="odd">
248
+<td align="left">3</td>
249
+<td align="left">balbc</td>
250
+<td align="right">143</td>
251
+</tr>
252
+<tr class="even">
253
+<td align="left">4</td>
254
+<td align="left">b6</td>
255
+<td align="right">149</td>
256
+</tr>
257
+<tr class="odd">
258
+<td align="left">5</td>
259
+<td align="left">b6</td>
260
+<td align="right">150</td>
261
+</tr>
262
+<tr class="even">
263
+<td align="left">6</td>
264
+<td align="left">b6</td>
265
+<td align="right">148</td>
266
+</tr>
267
+</tbody>
268
+</table>
269
+<p>Apply a filter on UMIs.</p>
270
+<div class="sourceCode" id="cb8"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb8-1" data-line-number="1">contigs_qc =<span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/join.html">semi_join</a></span>(all_anno, good_bc <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/select.html">select</a></span>(sample, pop, barcode)) <span class="op">%&gt;%</span><span class="st"> </span></a>
271
+<a class="sourceLine" id="cb8-2" data-line-number="2"><span class="st">  </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span>(full_length, productive <span class="op">==</span><span class="st"> 'True'</span>, high_confidence, chain <span class="op">!=</span><span class="st"> 'Multi'</span>)</a>
272
+<a class="sourceLine" id="cb8-3" data-line-number="3"><span class="co">#&gt; Joining, by = c("pop", "sample", "barcode")</span></a></code></pre></div>
273
+<p>And take only high confidence, full length, productive <span class="math inline">\(\alpha-\beta\)</span> T cell contigs.</p>
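+<p>The barcode semi-join and contig-level filter above can be traced on a small invented table. The sketch below uses base R only, and keys on sample and barcode (omitting <code>pop</code> for brevity). All values and the names <code>toy_contigs</code>, <code>toy_good_bc</code> and <code>make_key</code> are assumptions for illustration.</p>

```r
# Toy contig table (invented values); `productive` is a string, as in the CSV.
toy_contigs = data.frame(
  sample          = c(1, 1, 1, 2, 2),
  barcode         = c("AAA", "AAA", "CCC", "AAA", "GGG"),
  chain           = c("TRA", "TRB", "TRB", "Multi", "TRA"),
  full_length     = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  productive      = c("True", "None", "True", "True", "True"),
  high_confidence = c(TRUE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Barcodes assumed to have passed the UMI filter (also invented).
toy_good_bc = data.frame(
  sample  = c(1, 2, 2),
  barcode = c("AAA", "AAA", "GGG"),
  stringsAsFactors = FALSE
)

# Semi-join: keep contigs whose (sample, barcode) pair appears in toy_good_bc.
make_key = function(tbl) paste(tbl$sample, tbl$barcode, sep = "_")
in_good = make_key(toy_contigs) %in% make_key(toy_good_bc)

# Contig-level filter, matching the conditions used in the filter() call.
toy_qc = toy_contigs[in_good &
  toy_contigs$full_length &
  toy_contigs$productive == "True" &
  toy_contigs$high_confidence &
  toy_contigs$chain != "Multi", ]
print(toy_qc)
```

+<p>Only the first toy contig survives: the others fail on productivity, barcode membership, chain type or length, respectively.</p>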
274
+</div>
275
+<div id="multi-chain-t-cells" class="section level1">
276
+<h1 class="hasAnchor">
277
+<a href="#multi-chain-t-cells" class="anchor"></a>“Multi-chain” T cells</h1>
278
+<div class="sourceCode" id="cb9"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb9-1" data-line-number="1"></a>
279
+<a class="sourceLine" id="cb9-2" data-line-number="2">n_productive =<span class="st"> </span>contigs_qc <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by</a></span>(sample, pop, barcode) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize</a></span>(<span class="dt">n_productive =</span> <span class="kw"><a href="https://dplyr.tidyverse.org/reference/n.html">n</a></span>())</a>
280
+<a class="sourceLine" id="cb9-3" data-line-number="3"></a>
281
+<a class="sourceLine" id="cb9-4" data-line-number="4"><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot</a></span>(n_productive, <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span>(<span class="dt">x =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/interaction">interaction</a></span>(sample,pop), <span class="dt">fill =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/factor">factor</a></span>(n_productive))) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/geom_bar.html">geom_bar</a></span>(<span class="dt">position =</span> <span class="kw"><a href="https://ggplot2.tidyverse.org/reference/position_stack.html">position_stack</a></span>()) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/ggtheme.html">theme_minimal</a></span>() <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/labs.html">xlab</a></span>(<span class="st">"Replicate"</span>) <span class="op">+</span><span class="st"> </span><span class="kw"><a href="https://ggplot2.tidyverse.org/reference/labs.html">ylab</a></span>(<span class="st">"Number of cell barcodes"</span>)</a></code></pre></div>
282
+<p><img src="mouse_tcell_qc_files/figure-html/unnamed-chunk-8-1.png" width="700"></p>
283
+<p>Number of productive chains per replicate.</p>
284
+</div>
285
+<div id="add-a-plot-showing-chain-breakdowns-alpha-alphaalpha-beta-alpha-alpha-beta-etc" class="section level1">
286
+<h1 class="hasAnchor">
287
+<a href="#add-a-plot-showing-chain-breakdowns-alpha-alphaalpha-beta-alpha-alpha-beta-etc" class="anchor"></a>Add a plot showing chain breakdowns alpha-alpha/alpha-beta, alpha-alpha-beta, etc</h1>
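+<p>As one possible starting point for this breakdown, the sketch below collapses the chains seen in each barcode into a single label (for example <code>TRA+TRB</code>) and counts barcodes per label. It runs on an invented table in base R. The names <code>toy_qc_contigs</code>, <code>chain_combo</code> and <code>combo_counts</code> are hypothetical, and the real analysis would also need to group by sample and population.</p>

```r
# Toy filtered contigs: one row per contig (values invented for illustration).
toy_qc_contigs = data.frame(
  barcode = c("AAA", "AAA", "CCC", "CCC", "CCC", "GGG", "TTT", "TTT"),
  chain   = c("TRA", "TRB", "TRA", "TRA", "TRB", "TRB", "TRB", "TRA"),
  stringsAsFactors = FALSE
)

# Collapse the chains of each barcode into a sorted label such as "TRA+TRB".
chain_combo = tapply(
  toy_qc_contigs$chain,
  toy_qc_contigs$barcode,
  function(x) paste(sort(x), collapse = "+")
)

# Count how many barcodes carry each chain combination.
combo_counts = table(chain_combo)
print(combo_counts)
```

+<p>The resulting counts could then be passed to a bar plot, with one bar per chain combination.</p>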
288
+</div>
289
+  </div>
290
+
291
+  <div class="col-md-3 hidden-xs hidden-sm" id="sidebar">
292
+        <div id="tocnav">
293
+      <h2 class="hasAnchor">
294
+<a href="#tocnav" class="anchor"></a>Contents</h2>
295
+      <ul class="nav nav-pills nav-stacked">
296
+<li><a href="#general-questions">General questions</a></li>
297
+      <li><a href="#load-contig-files">Load contig files</a></li>
298
+      <li><a href="#high-confidence-umis-belonging-to-t-cells-per-cell">High confidence UMIs belonging to T cells per cell</a></li>
299
+      <li><a href="#reads-umis">Reads / UMIs</a></li>
300
+      <li><a href="#apply-t-cell-contig-umi-filter">Apply T-cell contig UMI filter</a></li>
301
+      <li><a href="#multi-chain-t-cells">“Multi-chain” T cells</a></li>
302
+      <li><a href="#add-a-plot-showing-chain-breakdowns-alpha-alphaalpha-beta-alpha-alpha-beta-etc">Add a plot showing chain breakdowns alpha-alpha/alpha-beta, alpha-alpha-beta, etc</a></li>
303
+      </ul>
304
+</div>
305
+      </div>
306
+
307
+</div>
308
+
309
+
310
+      <footer><div class="copyright">
311
+  <p>Developed by Andrew McDavid, Yu Gu.</p>
312
+</div>
313
+
314
+<div class="pkgdown">
315
+  <p>Site built with <a href="https://pkgdown.r-lib.org/">pkgdown</a> 1.3.0.</p>
316
+</div>
317
+      </footer>
318
+</div>
319
+
320
+  
321
+
322
+  </body>
323
+</html>
0 324
new file mode 100644
1 325
Binary files /dev/null and b/docs/articles/mouse_tcell_qc_files/figure-html/unnamed-chunk-4-1.png differ
2 326
new file mode 100644
3 327
Binary files /dev/null and b/docs/articles/mouse_tcell_qc_files/figure-html/unnamed-chunk-5-1.png differ
4 328
new file mode 100644
5 329
Binary files /dev/null and b/docs/articles/mouse_tcell_qc_files/figure-html/unnamed-chunk-5-2.png differ
6 330
new file mode 100644
7 331
Binary files /dev/null and b/docs/articles/mouse_tcell_qc_files/figure-html/unnamed-chunk-8-1.png differ
8 332
new file mode 100644
... ...
@@ -0,0 +1,148 @@
1
+<!-- Generated by pkgdown: do not edit by hand -->
2
+<!DOCTYPE html>
3
+<html lang="en">
4
+  <head>
5
+  <meta charset="utf-8">
6
+<meta http-equiv="X-UA-Compatible" content="IE=edge">
7
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
8
+
9
+<title>Authors • CellaRepertorium</title>
10
+
11
+<!-- jquery -->
12
+<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
13
+<!-- Bootstrap -->
14
+
15
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha256-916EbMg70RQy9LHiGkXzG8hSg9EdNy97GazNG/aiY1w=" crossorigin="anonymous" />
16
+<script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha256-U5ZEeKfGNOja007MMD3YBI0A3OSZOQbeG6z2f2Y0hu8=" crossorigin="anonymous"></script>
17
+
18
+<!-- Font Awesome icons -->
19
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css" integrity="sha256-eZrrJcwDc/3uDhsdt61sL2oOBY362qM3lon1gyExkL0=" crossorigin="anonymous" />
20
+
21
+<!-- clipboard.js -->
22
+<script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.4/clipboard.min.js" integrity="sha256-FiZwavyI2V6+EXO1U+xzLG3IKldpiTFf3153ea9zikQ=" crossorigin="anonymous"></script>
23
+
24
+<!-- sticky kit -->
25
+<script src="https://cdnjs.cloudflare.com/ajax/libs/sticky-kit/1.1.3/sticky-kit.min.js" integrity="sha256-c4Rlo1ZozqTPE2RLuvbusY3+SU1pQaJC0TjuhygMipw=" crossorigin="anonymous"></script>
26
+
27
+<!-- pkgdown -->
28
+<link href="pkgdown.css" rel="stylesheet">
29
+<script src="pkgdown.js"></script>
30
+
31
+
32
+
33
+<meta property="og:title" content="Authors" />
34
+
35
+
36
+
37
+<!-- mathjax -->
38
+<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js" integrity="sha256-nvJJv9wWKEm88qvoQl9ekL2J+k/RWIsaSScxxlsrv8k=" crossorigin="anonymous"></script>
39
+<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/config/TeX-AMS-MML_HTMLorMML.js" integrity="sha256-84DKXVJXs0/F8OTMzX4UR909+jtl4G7SPypPavF+GfA=" crossorigin="anonymous"></script>
40
+
41
+<!--[if lt IE 9]>
42
+<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
43
+<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
44
+<![endif]-->
45
+
46
+
47
+  </head>
48
+
49
+  <body>
50
+    <div class="container template-authors">
51
+      <header>
52
+      <div class="navbar navbar-default navbar-fixed-top" role="navigation">
53
+  <div class="container">
54
+    <div class="navbar-header">
55
+      <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false">
56
+        <span class="sr-only">Toggle navigation</span>
57
+        <span class="icon-bar"></span>
58
+        <span class="icon-bar"></span>
59
+        <span class="icon-bar"></span>
60
+      </button>
61
+      <span class="navbar-brand">
62
+        <a class="navbar-link" href="index.html">CellaRepertorium</a>
63
+        <span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="Released version">0.3.1</span>
64
+      </span>
65
+    </div>
66
+
67
+    <div id="navbar" class="navbar-collapse collapse">
68
+      <ul class="nav navbar-nav">
69
+        <li>
70
+  <a href="index.html">
71
+    <span class="fa fa-home fa-lg"></span>
72
+     
73
+  </a>
74
+</li>
75
+<li>
76
+  <a href="reference/index.html">Reference</a>
77
+</li>
78
+<li class="dropdown">
79
+  <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">
80
+    Articles
81
+     
82
+    <span class="caret"></span>
83
+  </a>
84
+  <ul class="dropdown-menu" role="menu">
85
+    <li>
86
+      <a href="articles/cdr3_clustering.html">Clustering repertoire via CDR3 sequences</a>
87
+    </li>
88
+    <li>
89
+      <a href="articles/mouse_tcell_qc.html">Quality control and Exploration of UMI-based repertoire data</a>
90
+    </li>
91
+  </ul>
92
+</li>
93
+      </ul>
94
+      
95
+      <ul class="nav navbar-nav navbar-right">
96
+        <li>
97
+  <a href="https://github.com/amcdavid/CellaRepertorium">
98
+    <span class="fa fa-github fa-lg"></span>
99
+     
100
+  </a>
101
+</li>
102
+      </ul>
103
+      
104
+    </div><!--/.nav-collapse -->
105
+  </div><!--/.container -->
106
+</div><!--/.navbar -->
107
+
108
+      
109
+      </header>
110
+
111
+<div class="row">
112
+  <div class="contents col-md-9">
113
+    <div class="page-header">
114
+      <h1>Authors</h1>
115
+    </div>
116
+
117
+    <ul class="list-unstyled">
118
+      <li>
119
+        <p><strong>Andrew McDavid</strong>. Author, maintainer. 
120
+        </p>
121
+      </li>
122
+      <li>
123
+        <p><strong>Yu Gu</strong>. Author. 
124
+        </p>
125
+      </li>
126
+    </ul>
127
+
128
+  </div>
129
+
130
+</div>
131
+
132
+
133
+      <footer>
134
+      <div class="copyright">
135
+  <p>Developed by Andrew McDavid, Yu Gu.</p>
136
+</div>
137
+
138
+<div class="pkgdown">
139
+  <p>Site built with <a href="https://pkgdown.r-lib.org/">pkgdown</a> 1.3.0.</p>
140
+</div>
141
+      </footer>
142
+   </div>
143
+
144
+  
145
+
146
+  </body>
147
+</html>
148
+
0 149
new file mode 100644
... ...
@@ -0,0 +1,148 @@
1
+/* Docsearch -------------------------------------------------------------- */
2
+/*
3
+  Source: https://github.com/algolia/docsearch/
4
+  License: MIT
5
+*/
6
+
7
+.algolia-autocomplete {
8
+  display: block;
9
+  -webkit-box-flex: 1;
10
+  -ms-flex: 1;
11
+  flex: 1
12
+}
13
+
14
+.algolia-autocomplete .ds-dropdown-menu {
15
+  width: 100%;
16
+  min-width: none;
17
+  max-width: none;
18
+  padding: .75rem 0;
19
+  background-color: #fff;
20
+  background-clip: padding-box;
21
+  border: 1px solid rgba(0, 0, 0, .1);
22
+  box-shadow: 0 .5rem 1rem rgba(0, 0, 0, .175);
23
+}
24
+
25
+@media (min-width:768px) {
26
+  .algolia-autocomplete .ds-dropdown-menu {
27
+      width: 175%
28
+  }
29
+}
30
+
31
+.algolia-autocomplete .ds-dropdown-menu::before {
32
+  display: none
33
+}
34
+
35
+.algolia-autocomplete .ds-dropdown-menu [class^=ds-dataset-] {
36
+  padding: 0;
37
+  background-color: rgb(255,255,255);
38
+  border: 0;
39
+  max-height: 80vh;
40
+}
41
+
42
+.algolia-autocomplete .ds-dropdown-menu .ds-suggestions {
43
+  margin-top: 0
44
+}
45
+
46
+.algolia-autocomplete .algolia-docsearch-suggestion {
47
+  padding: 0;
48
+  overflow: visible
49
+}
50
+
51