% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sample_filtering.R
\name{metric_sample_filter}
\alias{metric_sample_filter}
\title{Metric-based Sample Filtering: Function to filter single-cell RNA-Seq libraries.}
\usage{
metric_sample_filter(expr, nreads = colSums(expr), ralign = NULL,
  gene_filter = NULL, pos_controls = NULL, scale. = FALSE, glen = NULL,
  AUC_range = c(0, 15), zcut = 1, mixture = TRUE, dip_thresh = 0.05,
  hard_nreads = 25000, hard_ralign = 15, hard_breadth = 0.2,
  hard_auc = 10, suff_nreads = NULL, suff_ralign = NULL,
  suff_breadth = NULL, suff_auc = NULL, plot = FALSE, hist_breaks = 10,
  ...)
}
\arguments{
\item{expr}{matrix. The data matrix (genes in rows, cells in columns).}

\item{nreads}{A numeric vector giving the number of reads in each library. Defaults to \code{colSums(expr)}.}

\item{ralign}{A numeric vector giving the proportion of reads aligned to the reference genome in each library. If NULL, filtered_ralign is returned as NA.}

\item{gene_filter}{A logical vector indexing the genes used to compute library transcriptome breadth. If NULL, filtered_breadth is returned as NA.}

\item{pos_controls}{A logical, numeric, or character vector indicating the positive control genes used to compute false-negative rate (FNR) characteristics. If NULL, filtered_fnr is returned as NA.}

\item{scale.}{logical. Should expression be scaled by total expression for FNR computation? Default FALSE.}

\item{glen}{Gene lengths for gene-length normalization (the normalized data are used in FNR computation).}

\item{AUC_range}{A numeric vector of two values giving the range over which the FNR AUC is computed (log(expr_units)). Default c(0, 15).}

\item{zcut}{A numeric value giving the threshold Z-score for the sd, mad, and mixture sub-criteria. Default 1.
If NULL, only hard threshold sub-criteria will be applied.}

\item{mixture}{A logical value determining whether the mixture modeling sub-criterion is applied per primary criterion (metric). If TRUE, a dip test is applied to each metric. If a metric is multimodal, it is fit to a two-component normal mixture model. Samples deviating zcut standard deviations from the optimal mean (in the inferior direction) fail this sub-criterion.}

\item{dip_thresh}{A numeric value giving the dip test p-value threshold. Default 0.05.}

\item{hard_nreads}{numeric. Hard (lower bound on) nreads threshold. Default 25000.}

\item{hard_ralign}{numeric. Hard (lower bound on) ralign threshold. Default 15.}

\item{hard_breadth}{numeric. Hard (lower bound on) breadth threshold. Default 0.2.}

\item{hard_auc}{numeric. Hard (upper bound on) FNR AUC threshold. Default 10.}

\item{suff_nreads}{numeric. If not NULL, serves as an overriding upper bound on the nreads threshold.}

\item{suff_ralign}{numeric. If not NULL, serves as an overriding upper bound on the ralign threshold.}

\item{suff_breadth}{numeric. If not NULL, serves as an overriding upper bound on the breadth threshold.}

\item{suff_auc}{numeric. If not NULL, serves as an overriding lower bound on the FNR AUC threshold.}

\item{plot}{logical. Should a plot be produced?}

\item{hist_breaks}{The breaks argument passed to hist(). Ignored if \code{plot = FALSE}.}

\item{...}{Arguments to be passed to methods.}
}
\value{
A list with the following elements:
\itemize{
\item{filtered_nreads}{ Logical. Sample has too few reads.}
\item{filtered_ralign}{ Logical. Sample has too few reads aligned.}
\item{filtered_breadth}{ Logical. Sample has too few genes detected (low breadth).}
\item{filtered_fnr}{ Logical. Sample has a high FNR AUC.}
}
}
\description{
This function returns a sample-filtering report for each cell in the input expression matrix, describing which filtering criteria are satisfied.
}
\details{
For each primary criterion (metric), a sample is evaluated on four sub-criteria:
\enumerate{
\item Hard (encoded) threshold
\item Adaptive thresholding via sd's from the mean
\item Adaptive thresholding via mad's from the median
\item Adaptive thresholding via sd's from the mean (after mixture modeling)
}
A sample must pass all sub-criteria to pass the primary criterion.
}
\examples{
# Simulate a small count matrix: 100 genes (rows) by 10 cells (columns)
mat <- matrix(rpois(1000, lambda = 5), ncol = 10)
colnames(mat) <- paste("X", 1:ncol(mat), sep = "")

# Per-cell QC metrics: total counts and number of detected genes
qc <- as.matrix(cbind(colSums(mat), colSums(mat > 0)))
rownames(qc) <- colnames(mat)
colnames(qc) <- c("NCOUNTS", "NGENES")

# Filter on read count only; hard_nreads = 0 disables the hard threshold
mfilt <- metric_sample_filter(expr = mat, nreads = qc[, "NCOUNTS"],
                              plot = TRUE, hard_nreads = 0)
}
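\section{Illustrative sketch of the sub-criteria}{
The code below is a minimal sketch of how the hard, sd, and mad sub-criteria could combine into a single cutoff for a metric where larger values are better (such as nreads). It is an assumption-laden illustration and not the implementation used by \code{metric_sample_filter}: the helper name \code{sketch_threshold_filter} and its arguments \code{hard} and \code{suff} are hypothetical, the metric is used on its raw scale, and the mixture modeling sub-criterion is omitted.
\preformatted{
# Hypothetical helper (not part of the package). Returns TRUE for samples
# that fail at least one of the hard, sd, or mad sub-criteria.
sketch_threshold_filter <- function(x, hard, zcut = 1, suff = NULL) {
  # Passing every sub-criterion means exceeding the strictest lower bound
  cutoff <- max(hard,
                mean(x) - zcut * sd(x),
                median(x) - zcut * mad(x))
  # Assumed behavior of a "sufficient" value: it caps the cutoff from above
  if (!is.null(suff)) cutoff <- min(cutoff, suff)
  x < cutoff
}

# Toy metric: four similar libraries and one much smaller library
x <- c(100, 105, 98, 102, 20)
sketch_threshold_filter(x, hard = 50)
}
Taking the maximum of the individual lower bounds is one way to express that a sample must pass all sub-criteria. For a metric where smaller values are better (such as the FNR AUC), the directions would be reversed.
}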