Name Mode Size
..
colAlls-dgCMatrix-method.Rd 100644 2 kb
colAnyNAs-dgCMatrix-method.Rd 100644 2 kb
colAnys-dgCMatrix-method.Rd 100644 2 kb
colAvgsPerRowSet-dgCMatrix-method.Rd 100644 2 kb
colCollapse-dgCMatrix-method.Rd 100644 2 kb
colCounts-dgCMatrix-method.Rd 100644 2 kb
colCummaxs-dgCMatrix-method.Rd 100644 2 kb
colCummins-dgCMatrix-method.Rd 100644 2 kb
colCumprods-dgCMatrix-method.Rd 100644 2 kb
colCumsums-dgCMatrix-method.Rd 100644 2 kb
colDiffs-dgCMatrix-method.Rd 100644 2 kb
colIQRDiffs-dgCMatrix-method.Rd 100644 2 kb
colIQRs-dgCMatrix-method.Rd 100644 2 kb
colLogSumExps-dgCMatrix-method.Rd 100644 2 kb
colMadDiffs-dgCMatrix-method.Rd 100644 2 kb
colMads-dgCMatrix-method.Rd 100644 2 kb
colMaxs-dgCMatrix-method.Rd 100644 2 kb
colMeans2-dgCMatrix-method.Rd 100644 2 kb
colMedians-dgCMatrix-method.Rd 100644 2 kb
colMins-dgCMatrix-method.Rd 100644 2 kb
colOrderStats-dgCMatrix-method.Rd 100644 2 kb
colProds-dgCMatrix-method.Rd 100644 2 kb
colQuantiles-dgCMatrix-method.Rd 100644 2 kb
colRanges-dgCMatrix-method.Rd 100644 2 kb
colRanks-dgCMatrix-method.Rd 100644 3 kb
colSdDiffs-dgCMatrix-method.Rd 100644 2 kb
colSds-dgCMatrix-method.Rd 100644 2 kb
colSums2-dgCMatrix-method.Rd 100644 2 kb
colTabulates-dgCMatrix-method.Rd 100644 2 kb
colVarDiffs-dgCMatrix-method.Rd 100644 2 kb
colVars-dgCMatrix-method.Rd 100644 2 kb
colWeightedMads-dgCMatrix-method.Rd 100644 2 kb
colWeightedMeans-dgCMatrix-method.Rd 100644 2 kb
colWeightedMedians-dgCMatrix-method.Rd 100644 2 kb
colWeightedSds-dgCMatrix-method.Rd 100644 2 kb
colWeightedVars-dgCMatrix-method.Rd 100644 2 kb
README.Rmd
--- output: github_document --- <!-- README.md is generated from README.Rmd. Please edit that file --> ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" ) ``` # sparseMatrixStats <!-- badges: start --> [![codecov](https://codecov.io/gh/const-ae/sparseMatrixStats/branch/master/graph/badge.svg)](https://codecov.io/gh/const-ae/sparseMatrixStats) <!-- badges: end --> The goal of `sparseMatrixStats` is to make the API of the [matrixStats](https://github.com/HenrikBengtsson/matrixStats) available for sparse matrices. ## Installation You can install the release version of `r BiocStyle::Biocpkg("sparseMatrixStats")` from BioConductor: ``` r if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("sparseMatrixStats") ``` Alternatively, you can get the development version of the package from [GitHub](https://github.com/const-ae/sparseMatrixStats) with: ``` r # install.packages("devtools") devtools::install_github("const-ae/sparseMatrixStats") ``` ## Example ```{r include=FALSE} set.seed(1) ``` ```{r} library(sparseMatrixStats) ``` ```{r} mat <- matrix(0, nrow=10, ncol=6) mat[sample(seq_len(60), 4)] <- 1:4 # Convert dense matrix to sparse matrix sparse_mat <- as(mat, "dgCMatrix") sparse_mat ``` The package provides an interface to quickly do common operations on the rows or columns. For example calculate the variance: ```{r} apply(mat, 2, var) matrixStats::colVars(mat) sparseMatrixStats::colVars(sparse_mat) ``` On this small example data, all methods are basically equally fast, but if we have a much larger dataset, the optimizations for the sparse data start to show. I generate a dataset with 10,000 rows and 50 columns that is 99% empty ```{r} big_mat <- matrix(0, nrow=1e4, ncol=50) big_mat[sample(seq_len(1e4 * 50), 5000)] <- rnorm(5000) # Convert dense matrix to sparse matrix big_sparse_mat <- as(big_mat, "dgCMatrix") ``` I use the `bench` package to benchmark the performance difference: ```{r} bench::mark( sparseMatrixStats=sparseMatrixStats::colMedians(big_sparse_mat), matrixStats=matrixStats::colMedians(big_mat), apply=apply(big_mat, 2, median) ) ``` As you can see `sparseMatrixStats` is ca. 60 times fast than `matrixStats`, which in turn is 7 times faster than the `apply()` version. # API The package is still work in progress. For example, it is still completely lacking any documentation. Most functions have already been optimized for `dgCMatrix` input. The following list gives an overview which already have. In particular the `colXXXDiff()` functions have not yet been implemented. ```{r, echo=FALSE, results="asis"} matrixStats_functions <- sort( c("colsum", "rowsum", grep("^(col|row)", getNamespaceExports("matrixStats"), value = TRUE))) DelayedMatrixStats_functions <- grep("^(col|row)", getNamespaceExports("DelayedMatrixStats"), value=TRUE) DelayedArray_functions <- grep("^(col|row)", getNamespaceExports("DelayedArray"), value=TRUE) sparseMatrixStats_functions <- grep("^(col|row)", getNamespaceExports("sparseMatrixStats"), value=TRUE) notes <- c("colAnyMissings"="Not implemented because it is deprecated in favor of `colAnyNAs()`", "rowAnyMissings"="Not implemented because it is deprecated in favor of `rowAnyNAs()`", "colsum"="Base R function", "rowsum"="Base R function", "colWeightedMedians"="Only equivalent if `interpolate=FALSE`", "rowWeightedMedians"="Only equivalent if `interpolate=FALSE`", "colWeightedMads"="Sparse version behaves slightly differently, because it always uses `interpolate=FALSE`.", "rowWeightedMads"="Sparse version behaves slightly differently, because it always uses `interpolate=FALSE`.") api_df <- data.frame( Method = paste0(matrixStats_functions, "()"), matrixStats = ifelse(matrixStats_functions %in% matrixStats_functions, "✔", "❌"), sparseMatrixStats = ifelse(matrixStats_functions %in%sparseMatrixStats_functions, "✔", "❌"), Notes = ifelse(matrixStats_functions %in% names(notes), notes[matrixStats_functions], ""), stringsAsFactors = FALSE ) knitr::kable(api_df, row.names = FALSE) ```