Bioconductor Code: CellaRepertorium

# CellaRepertorium

This package contains methods for clustering and analyzing single-cell RepSeq data, especially as generated by the [10X Genomics VDJ solution](https://support.10xgenomics.com/single-cell-vdj).

## Installation

```
# install.packages("devtools")  # if devtools is not already installed
devtools::install_github('amcdavid/CellaRepertorium')
```

Requires R >= 3.5.

## Data requirements and package structure

The fundamental unit is the **contig**, a section of contiguously stitched reads from a single **cell**. Each contig belongs to exactly one cell, but a cell can generate multiple contigs. Contigs can also belong to a **cluster**. Because of these two many-to-one mappings, the data can be thought of as a series of ragged arrays, and the links between them make them relational data.

[A schematic of contigs and cells should go here]

A `ContigCellDB` object stores these data as three `data.frame`s (well, `tibble`s, actually). `ContigCellDB` also tracks the columns (keys) that uniquely identify each row in each of these tables. The `contig_tbl` is the `tibble` containing the **contigs**, the `cell_tbl` contains the **cells**, and the `cluster_tbl` contains the **clusters**. The `contig_pk`, `cell_pk` and `cluster_pk` specify the columns that identify a contig, cell and cluster, respectively; these keys must be unique within each of the respective tables. The tables are kept in sync, so that subsetting the contigs will subset the cells and clusters, and vice versa.

[A schematic showing table relations should go here]

Of course, each of these tables can contain many other columns that serve as covariates for various analyses, such as the CDR3 sequence or the identity of the V, D and J regions. Various derived quantities that describe cells and clusters, such as the medoid of a cluster, can also be calculated and added to these tables.

## Functions

[a screencap of something interesting?]

* `cdhit`: an R interface to CD-HIT, which was originally ported by Thomas Lin Pedersen.
* `fine_cluster`: clustering CDR3 sequences by edit distance (possibly using empirical amino acid substitution matrices)
* `cluster_permute_test`: permutation tests of cluster statistics
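The contig/cell/cluster model described under *Data requirements and package structure* can be illustrated without the package itself. The following is a minimal sketch assuming plain base-R `data.frame`s, a made-up `barcode` column as the cell key and made-up toy rows; the helper `subset_contigs_sync()` is hypothetical and only mimics the idea that subsetting contigs also subsets cells and clusters. It is not the `ContigCellDB` API.

```r
# Hypothetical illustration of the contig -> cell and contig -> cluster relations
# using plain data.frames (NOT the ContigCellDB class). Toy data, made up.
contig_tbl <- data.frame(
  contig_id  = c("c1", "c2", "c3", "c4", "c5"),   # contig key
  barcode    = c("AAA", "AAA", "CCC", "GGG", "GGG"), # cell key
  cluster_id = c(1L, 2L, 1L, 2L, 3L),              # cluster key
  cdr3       = c("CASSL", "CAVR", "CASSF", "CAVK", "CASRG"),
  stringsAsFactors = FALSE
)
cell_tbl <- data.frame(barcode = c("AAA", "CCC", "GGG"),
                       sample  = c("s1", "s1", "s2"),
                       stringsAsFactors = FALSE)
cluster_tbl <- data.frame(cluster_id = 1:3)

# Each key must be unique within its own table
stopifnot(!anyDuplicated(contig_tbl$contig_id),
          !anyDuplicated(cell_tbl$barcode),
          !anyDuplicated(cluster_tbl$cluster_id))

# Ragged array: each cell owns a variable number of contigs
contigs_per_cell <- split(contig_tbl$contig_id, contig_tbl$barcode)
lengths(contigs_per_cell)  # AAA 2, CCC 1, GGG 2

# Hypothetical helper: subset the contigs, then drop any cell or cluster
# that no longer has a contig, so the three tables stay in sync
subset_contigs_sync <- function(keep, contig_tbl, cell_tbl, cluster_tbl) {
  contig_tbl  <- contig_tbl[keep, , drop = FALSE]
  cell_tbl    <- cell_tbl[cell_tbl$barcode %in% contig_tbl$barcode, , drop = FALSE]
  cluster_tbl <- cluster_tbl[cluster_tbl$cluster_id %in% contig_tbl$cluster_id, ,
                             drop = FALSE]
  list(contig_tbl = contig_tbl, cell_tbl = cell_tbl, cluster_tbl = cluster_tbl)
}

# Keep only contigs whose CDR3 starts with "CASS"
db <- subset_contigs_sync(grepl("^CASS", contig_tbl$cdr3),
                          contig_tbl, cell_tbl, cluster_tbl)
db$cell_tbl$barcode        # "AAA" "CCC"
db$cluster_tbl$cluster_id  # 1
```

Going the other way (subsetting cells and then dropping their contigs and any emptied clusters) follows the same pattern with the roles of the tables swapped.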
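`cdhit` wraps CD-HIT, whose core idea is greedy incremental clustering: sequences are visited longest first, and each one joins the first existing representative that it matches at or above an identity threshold, or else founds a new cluster. The sketch below is a simplified, hypothetical illustration of that idea in base R, not the `cdhit` function itself. It approximates identity as one minus the edit distance (from `utils::adist`) divided by the length of the shorter sequence, and it skips CD-HIT's short-word filter and alignment. The CDR3 strings are made up.

```r
# Hypothetical sketch of CD-HIT-style greedy incremental clustering.
# Identity is approximated as 1 - edit distance / length of the shorter
# sequence; real CD-HIT uses a short-word filter and alignment instead.
greedy_identity_cluster <- function(seqs, identity = 0.9) {
  ord <- order(nchar(seqs), decreasing = TRUE)  # process longest sequences first
  reps <- integer(0)                            # indices of cluster representatives
  cluster <- integer(length(seqs))
  for (i in ord) {
    assigned <- FALSE
    for (k in seq_along(reps)) {
      r <- reps[k]
      ed <- adist(seqs[i], seqs[r])[1, 1]
      ident <- 1 - ed / min(nchar(seqs[i]), nchar(seqs[r]))
      if (ident >= identity) {  # join the first representative that is close enough
        cluster[i] <- k
        assigned <- TRUE
        break
      }
    }
    if (!assigned) {            # otherwise this sequence starts a new cluster
      reps <- c(reps, i)
      cluster[i] <- length(reps)
    }
  }
  names(cluster) <- names(seqs)
  cluster
}

# Made-up toy CDR3 sequences
cdr3 <- c(a = "CASSLGQGYEQYF", b = "CASSLGQGFEQYF", c = "CASSPRDRGYTF",
          d = "CASSPRDRGYAF", e = "CAWSVQGNTEAFF")
cl <- greedy_identity_cluster(cdr3, identity = 0.9)
cl  # a and b share a cluster, c and d share another, e is alone
```

Because assignment is greedy, the result depends on visiting order; processing longer sequences first is what lets each representative act as the "centre" of its cluster.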
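In the spirit of `fine_cluster`, CDR3 sequences can be grouped by edit distance. This sketch uses unit-cost Levenshtein distance from base R's `adist()` and complete-linkage `hclust()`, both assumptions made for illustration; the package's own distance (which may use empirical amino acid substitution matrices), linkage and cutoff may differ. The sequences are the same made-up toy strings as above.

```r
# Sketch: cluster CDR3 amino-acid sequences by edit distance.
# Unit-cost Levenshtein distance and complete linkage are assumptions;
# fine_cluster may use substitution matrices and different settings.
cdr3 <- c(a = "CASSLGQGYEQYF", b = "CASSLGQGFEQYF", c = "CASSPRDRGYTF",
          d = "CASSPRDRGYAF", e = "CAWSVQGNTEAFF")

d <- adist(cdr3)                                # matrix of pairwise edit distances
dimnames(d) <- list(names(cdr3), names(cdr3))
hc <- hclust(as.dist(d), method = "complete")

# Cutting at height 2: with complete linkage, every pair of sequences
# within a cluster differs by at most 2 edits
clusters <- cutree(hc, h = 2)
clusters
```

With complete linkage, the cut height is an upper bound on every within-cluster distance; single linkage would instead allow long chains of pairwise-similar but overall dissimilar sequences.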
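`cluster_permute_test` performs permutation tests of cluster statistics. As a generic, hypothetical illustration of that technique (not the package's implementation or interface), the sketch below asks whether contigs in the same cluster share a label, such as sample or condition, more often than chance would predict. Labels are shuffled while cluster membership is held fixed. The statistic `concordance()` and the toy data are assumptions chosen for simplicity.

```r
# Hypothetical permutation test (not the package's cluster_permute_test).
concordance <- function(cluster, label) {
  # number of within-cluster pairs of contigs whose labels agree
  sum(vapply(split(label, cluster), function(l) sum(choose(table(l), 2)),
             numeric(1)))
}

perm_test <- function(cluster, label, n_perm = 999, seed = 1) {
  set.seed(seed)  # note: resets the global RNG state
  observed <- concordance(cluster, label)
  # null distribution: shuffle labels, keep cluster membership fixed
  null <- replicate(n_perm, concordance(cluster, sample(label)))
  # add-one correction keeps the p-value strictly positive
  p_value <- (1 + sum(null >= observed)) / (1 + n_perm)
  list(observed = observed, p_value = p_value)
}

# Toy data: 5 clusters of 4 contigs each, labels perfectly aligned with clusters
cluster <- rep(1:5, each = 4)
label <- rep(c("A", "B", "C", "D", "E"), each = 4)
res <- perm_test(cluster, label)
res$observed  # 30: five clusters with choose(4, 2) = 6 concordant pairs each
res$p_value
```

Which quantity gets permuted (contig labels, cell labels, or cluster assignments) determines the null hypothesis being tested, so that choice matters more than the particular statistic.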