SNPRelate: Parallel computing toolset for relatedness and principal component analysis of SNP data

[GNU General Public License, GPLv3](
## Features
Genome-wide association studies are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed SNPRelate, an R package for multi-core symmetric multiprocessing computer architectures, to accelerate two key computations on SNP data: principal component analysis (PCA) and relatedness analysis using identity-by-descent (IBD) measures. The kernels of our algorithms are written in C/C++ and highly optimized.
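The two computations named above can be sketched as follows. This is a minimal illustration rather than a recommended workflow: it assumes SNPRelate is installed and uses the small example GDS file shipped with the package; the thread count and the MAF and missing-rate thresholds are arbitrary choices for the sketch.

```r
library(SNPRelate)

# Open the example GDS file bundled with the package
genofile <- snpgdsOpen(snpgdsExampleFileName())

# Principal component analysis on autosomal SNPs, using two threads
pca <- snpgdsPCA(genofile, autosome.only = TRUE, num.thread = 2,
                 verbose = FALSE)
# Percentage of variance explained by the first four components
pc_percent <- head(pca$varprop, 4) * 100

# Identity-by-descent estimation with the method of moments
# (the thresholds below are illustrative assumptions, not defaults)
ibd <- snpgdsIBDMoM(genofile, maf = 0.05, missing.rate = 0.05,
                    num.thread = 2, verbose = FALSE)
# Pairwise coefficients as a data frame (ID1, ID2, k0, k1, kinship)
ibd_coef <- snpgdsIBDSelection(ibd)

snpgdsClose(genofile)
```

`pca$eigenvect` holds one row per sample and one column per component, so it can be plotted directly against population labels if those are available.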
The GDS format offers efficient operations designed specifically for two-bit integers, since a SNP genotype can be stored in only two bits. The SNP GDS format in this package is also used by the [GWASTools]( package, which adds S4 classes and generic functions. The extended GDS format is implemented in the [SeqArray]( package to support the storage of single nucleotide variation (SNV), insertion/deletion polymorphism (indel) and structural variation calls. For large-scale whole-exome and whole-genome sequencing variant data, using [SeqArray]( is strongly recommended instead of [SNPRelate](
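To make the two-bit idea concrete, the base R sketch below packs four genotypes into one byte. The helpers `pack2bit()` and `unpack2bit()` are hypothetical and written only for this illustration; they are not part of SNPRelate or gdsfmt and do not reproduce the actual GDS on-disk layout. Genotypes are assumed to be coded 0, 1 or 2 (allele counts), with 3 for missing.

```r
# Hypothetical helper: pack genotypes (0, 1, 2, or 3 = missing) into bytes,
# four genotypes per byte, two bits each.
pack2bit <- function(geno) {
  stopifnot(all(geno %in% 0:3))
  # Pad with the missing code so the length is a multiple of four
  pad <- (4L - length(geno) %% 4L) %% 4L
  g <- matrix(as.integer(c(geno, rep(3L, pad))), nrow = 4L)
  # Row i of each column occupies bits 2*(i-1) and 2*(i-1)+1 of the byte
  as.integer(g[1L, ] + 4L * g[2L, ] + 16L * g[3L, ] + 64L * g[4L, ])
}

# Hypothetical helper: recover the first n genotypes from packed bytes.
unpack2bit <- function(bytes, n) {
  divisors <- c(1L, 4L, 16L, 64L)
  g <- vapply(bytes, function(b) (b %/% divisors) %% 4L, integer(4L))
  as.integer(g)[seq_len(n)]
}

geno   <- c(0, 1, 2, 3, 2, 1)
packed <- pack2bit(geno)           # two bytes hold six genotypes
back   <- unpack2bit(packed, length(geno))
```

Stored this way, a genotype takes a quarter of the space of a one-byte integer, which is the motivation for dedicated two-bit operations.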
## Bioconductor
Release Version: v1.38.0
## News
* See [package news](NEWS).
## Tutorials
## Citations
Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS (2012). A High-performance Computing Toolset for Relatedness and Principal Component Analysis of SNP Data. *Bioinformatics*. [DOI: 10.1093/bioinformatics/bts606](
Zheng X, Gogarten S, Lawrence M, Stilp A, Conomos M, Weir BS, Laurie C, Levine D (2017). SeqArray -- A storage-efficient high-performance data format for WGS variant calls. *Bioinformatics*. [DOI: 10.1093/bioinformatics/btx145](
## Installation
* Bioconductor repository:

      if (!requireNamespace("BiocManager", quietly=TRUE))
          install.packages("BiocManager")
      BiocManager::install("SNPRelate")

* Development version from GitHub (for developers/testers only):
The `install_github()` approach builds the package from source, so `make` and compilers must be installed on your system; see the [R FAQ]( for your operating system. You may also need to install dependencies manually.
## Implementation with Intel Intrinsics
| Functions | No SIMD | SSE2 | AVX | AVX2 | AVX-512 |
|:----------|:-------:|:----:|:---:|:----:|:-------:|
| snpgdsDiss [»]( | X | | | | |
| snpgdsEIGMIX [»]( | X | X | X | | |
| snpgdsGRM [»]( | X | X | X | . | |
| snpgdsIBDKING [»]( | X | X | | X | |
| snpgdsIBDMoM [»]( | X | | | | |
| snpgdsIBS [»]( | X | X | | | |
| snpgdsIBSNum [»]( | X | X | | | |
| snpgdsIndivBeta [»]( | X | X | P | X | |
| snpgdsPCA [»]( | X | X | X | | |
| snpgdsPCACorr [»]( | X | | | | |
| snpgdsPCASampLoading [»]( | X | | | | |
| snpgdsPCASNPLoading [»]( | X | | | | |
| [...]( | | | | | |

`X: fully supported; .: partially supported; P: POPCNT instruction.`
### Install the package from the source code with the support of Intel SIMD Intrinsics:
You have to customize the package compilation, see: [CRAN: Customizing-package-compilation](
Assuming the GNU compilers (gcc/g++) or the Clang compiler (clang++) are installed, set the following in `~/.R/Makevars`:

    ## for C code
    CFLAGS=-g -O3 -march=native -mtune=native
    ## for C++ code
    CXXFLAGS=-g -O3 -march=native -mtune=native
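
Because `-march=native` targets the CPU of the build machine, it can help to check which of the instruction sets in the table above that machine offers before compiling. The base R sketch below is one hedged way to do so. It assumes a Linux x86 system that exposes `/proc/cpuinfo` with a `flags` line, and `cpu_simd_flags()` is a hypothetical helper written for this illustration, not a SNPRelate function.

```r
# Hypothetical helper (not part of SNPRelate): report which instruction sets
# appear in the "flags" line of /proc/cpuinfo-style text.
cpu_simd_flags <- function(cpuinfo_lines) {
  wanted <- c(SSE2 = "sse2", POPCNT = "popcnt", AVX = "avx",
              AVX2 = "avx2", `AVX-512` = "avx512f")
  flag_line <- grep("^flags[[:space:]]*:", cpuinfo_lines, value = TRUE)
  if (length(flag_line) == 0L) {
    # No flags line found (e.g. non-x86 hardware): nothing can be concluded
    return(setNames(rep(NA, length(wanted)), names(wanted)))
  }
  # All cores normally report the same flags, so the first line is enough
  flags <- strsplit(sub("^flags[[:space:]]*:[[:space:]]*", "", flag_line[1L]),
                    "[[:space:]]+")[[1L]]
  vapply(wanted, function(f) f %in% flags, logical(1L))
}

# Linux only: other systems do not provide /proc/cpuinfo
if (file.exists("/proc/cpuinfo")) {
  print(cpu_simd_flags(readLines("/proc/cpuinfo")))
}
```

A `TRUE` entry only indicates that the CPU advertises the instruction set; whether a given function uses it still depends on the table above and on the compiler flags in effect when the package was built.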