gdsfmt: R Interface to CoreArray Genomic Data Structure (GDS) files

GNU Lesser General Public License, LGPL-3
## Features
This package provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files. GDS files are portable across platforms and use a hierarchical structure to store multiple scalable, array-oriented data sets with metadata. The format is suited to large-scale datasets, especially data much larger than the available random-access memory. The gdsfmt package offers efficient operations specifically designed for integers of less than 8 bits, since a single genetic/genomic variant, such as a single-nucleotide polymorphism, usually occupies less than a byte. Data compression and decompression are available with relatively efficient random access. A GDS file can also be read in parallel by multiple R processes using the parallel package.
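To illustrate why sub-byte integer types save space, here is a minimal base-R sketch that packs values in the range 0..3 into bytes, four values per byte. The helper names `pack_bit2()` and `unpack_bit2()` and the low-bits-first layout are assumptions made for illustration; they are not gdsfmt functions and do not necessarily match the on-disk layout used by the CoreArray library.

```r
# Pack integers in 0..3 into raw bytes, four 2-bit values per byte
# (lowest bits first). Illustrative only; not gdsfmt's internal code.
pack_bit2 <- function(x) {
  stopifnot(all(x >= 0L & x <= 3L))
  # pad with zeros so the length is a multiple of four
  pad <- (4L - length(x) %% 4L) %% 4L
  m <- matrix(c(as.integer(x), integer(pad)), nrow = 4L)
  # each column holds four values; weight the rows by 1, 4, 16, 64
  as.raw(colSums(m * c(1L, 4L, 16L, 64L)))
}

# Recover the first n values from the packed bytes
unpack_bit2 <- function(bytes, n) {
  b <- as.integer(bytes)
  # extract the four 2-bit fields of every byte, one row per field
  vals <- rbind(b %% 4L, (b %/% 4L) %% 4L, (b %/% 16L) %% 4L, b %/% 64L)
  as.integer(vals)[seq_len(n)]
}

set.seed(1)
x <- sample(0:3, 1000, replace = TRUE)
packed <- pack_bit2(x)
length(packed)                                 # 250 bytes for 1000 values
identical(unpack_bit2(packed, length(x)), x)   # TRUE
```

Stored as 32-bit integers, the same 1000 values would take 4000 bytes, so the 2-bit representation is 16 times smaller before any compression is applied.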
## Bioconductor:
Release Version: v1.8.3
[Help Documents](
Development Version: v1.9.3
[Help Documents](
## Package Vignettes
## Citation
Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS (2012). A High-performance Computing Toolset for Relatedness and Principal Component Analysis of SNP Data. *Bioinformatics*. DOI: 10.1093/bioinformatics/bts606
## Package Maintainer
Dr. Xiuwen Zheng
## URL
## Installation
* Bioconductor repository:
* Development version from Github:
The `install_github()` approach requires building from source, i.e. `make` and compilers must be installed on your system (see the R FAQ for your operating system); you may also need to install dependencies manually.
## Copyright Notice
* CoreArray C++ library, LGPL-3 License, 2007-2016, Xiuwen Zheng
* zlib, zlib License, 1995-2016, Jean-loup Gailly and Mark Adler
* LZ4, BSD 2-clause License, 2011-2016, Yann Collet
* liblzma, public domain, 2005-2016, Lasse Collin and other xz contributors
## GDS Command-line Tools
In the R environment,

    install.packages("getopt")
    install.packages("optparse")
    install.packages("crayon")

[See More...](
### *viewgds*
`viewgds` is a shell script written in R (viewgds.R) to view the contents of a GDS file. The R packages `gdsfmt`, `getopt` and `optparse` should be installed before running `viewgds`, and the package `crayon` is optional.

    Usage: viewgds [options] file

Installation with command line,

    echo '#!' `which Rscript` '--vanilla' > viewgds
    curl -L >> viewgds
    chmod +x viewgds

    ## Or
    echo '#!' `which Rscript` '--vanilla' > viewgds
    wget -qO- --no-check-certificate >> viewgds
    chmod +x viewgds

### *diffgds*
`diffgds` is a shell script written in R (diffgds.R) to compare two GDS files. The R packages `gdsfmt`, `getopt` and `optparse` should be installed before running `diffgds`.

    Usage: diffgds [options] file1 file2

Installation with command line,

    echo '#!' `which Rscript` '--vanilla' > diffgds
    curl -L >> diffgds
    chmod +x diffgds

    ## Or
    echo '#!' `which Rscript` '--vanilla' > diffgds
    wget -qO- --no-check-certificate >> diffgds
    chmod +x diffgds

## Examples

    library(gdsfmt)

    # create a GDS file
    f <- createfn.gds("test.gds")

    add.gdsn(f, "int", val=1:10000)
    add.gdsn(f, "double", val=seq(1, 1000, 0.4))
    add.gdsn(f, "character", val=c("int", "double", "logical", "factor"))
    add.gdsn(f, "logical", val=rep(c(TRUE, FALSE, NA), 50))
    add.gdsn(f, "factor", val=as.factor(c(NA, "AA", "CC")))
    add.gdsn(f, "bit2", val=sample(0:3, 1000, replace=TRUE), storage="bit2")

    # list and data.frame
    add.gdsn(f, "list", val=list(X=1:10, Y=seq(1, 10, 0.25)))
    add.gdsn(f, "data.frame", val=data.frame(X=1:19, Y=seq(1, 10, 0.5)))

    # nodes can be nested inside folders
    folder <- addfolder.gdsn(f, "folder")
    add.gdsn(folder, "int", val=1:1000)
    add.gdsn(folder, "double", val=seq(1, 100, 0.4))

    # show the contents
    f

    # close the GDS file
    closefn.gds(f)

Output:

    File: test.gds (1.1K)
    +    [  ]
    |--+ int   { Int32 10000, 39.1K }
    |--+ double   { Float64 2498, 19.5K }
    |--+ character   { Str8 4, 26B }
    |--+ logical   { Int32,logical 150, 600B } *
    |--+ factor   { Int32,factor 3, 12B } *
    |--+ bit2   { Bit2 1000, 250B }
    |--+ list   [ list ] *
    |  |--+ X   { Int32 10, 40B }
    |  \--+ Y   { Float64 37, 296B }
    |--+ data.frame   [ data.frame ] *
    |  |--+ X   { Int32 19, 76B }
    |  \--+ Y   { Float64 19, 152B }
    \--+ folder   [  ]
       |--+ int   { Int32 1000, 3.9K }
       \--+ double   { Float64 248, 1.9K }
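
Reading data back follows the same node-based pattern. The sketch below is a minimal illustration, assuming gdsfmt is installed; it uses `openfn.gds()`, `index.gdsn()` and `read.gdsn()`, and writes its own small temporary file (the file path and variable names are illustrative) rather than relying on `test.gds`.

```r
library(gdsfmt)

# write a small file to a temporary location so the sketch is self-contained
fn <- tempfile(fileext = ".gds")
f <- createfn.gds(fn)
add.gdsn(f, "int", val = 1:10000)
folder <- addfolder.gdsn(f, "folder")
add.gdsn(folder, "double", val = seq(1, 100, 0.4))
closefn.gds(f)

# reopen the file (read-only by default) and locate nodes by path
f <- openfn.gds(fn)
node <- index.gdsn(f, "int")

# read only a slice: five values starting at the first element
first_five <- read.gdsn(node, start = 1, count = 5)

# nested nodes are addressed with a slash-separated path
nested <- read.gdsn(index.gdsn(f, "folder/double"))

closefn.gds(f)
unlink(fn)
```

Passing `start` and `count` to `read.gdsn()` reads part of an array without loading the whole node, which is the access pattern that matters when the data are larger than memory.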