% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sc_atac_trim_barcode.R
\name{sc_atac_trim_barcode}
\alias{sc_atac_trim_barcode}
\title{Demultiplex raw single-cell ATAC-Seq fastq reads}
\usage{
sc_atac_trim_barcode(
  r1,
  r2,
  bc_file = NULL,
  valid_barcode_file = "",
  output_folder = "",
  umi_start = 0,
  umi_length = 0,
  umi_in = "both",
  rmN = FALSE,
  rmlow = FALSE,
  min_qual = 20,
  num_below_min = 2,
  id1_st = -0,
  id1_len = 16,
  id2_st = 0,
  id2_len = 16,
  no_reverse_complement = FALSE
)
}
\arguments{
\item{r1}{read one for paired-end reads.}

\item{r2}{read two for paired-end reads, \code{NULL} if single-end.}

\item{bc_file}{the barcode information, either in \code{fastq} format (e.g. from 10x-ATAC) or
as a \code{.csv} file (in which case the barcode is expected to be in the second column).
For the fastq approach, this can currently be a list of barcode files.}

\item{valid_barcode_file}{optional file path of the valid (expected) barcode sequences to be found in the \code{bc_file} (\code{.txt}, can be \code{.txt.gz}).
Must contain one barcode per line, in the second column of comma-separated values (default = "").
If given, each barcode from \code{bc_file} is matched to the valid barcode of
best fit (allowing a Hamming distance of 1). If a FASTQ \code{bc_file} is provided, barcodes with higher quality, as given by
the fastq read quality scores, are prioritised.}

\item{output_folder}{the output directory for the demultiplexed fastq files, in which
the barcode and UMI have been moved into the read name.
Output files with names ending in \code{.gz} are compressed automatically.}

\item{umi_start}{if available, the start position of the molecular identifier.}

\item{umi_length}{if available, the length of the molecular identifier.}

\item{umi_in}{the read(s) in which the UMI is located (default = "both").}

\item{rmN}{logical, whether to remove reads that contain N in the UMI or cell barcode.}

\item{rmlow}{logical, whether to remove reads that have low-quality barcode sequences.}

\item{min_qual}{the minimum base quality that is allowed (default = 20).}

\item{num_below_min}{the maximum number of bases allowed below the quality threshold (default = 2).}

\item{id1_st}{barcode start position (0-indexed) for read 1, which is an extra parameter that is needed if the
\code{bc_file} is in a \code{.csv} format.}

\item{id1_len}{barcode length for read 1, which is an extra parameter that is needed if the
\code{bc_file} is in a \code{.csv} format.}

\item{id2_st}{barcode start position (0-indexed) for read 2, which is an extra parameter that is needed if the
\code{bc_file} is in a \code{.csv} format.}

\item{id2_len}{barcode length for read 2, which is an extra parameter that is needed if the
\code{bc_file} is in a \code{.csv} format.}

\item{no_reverse_complement}{specifies whether the reverse complement of the barcode sequence may be
used for barcode error correction (only applies when barcode sequences are provided as fastq files). \code{FALSE} (default)
lets the function decide whether to use the reverse complement, and \code{TRUE} forces the function to
use the forward barcode sequences.}
}
\value{
None (invisible \code{NULL}).
}
\description{
Single-cell data need to be demultiplexed in order to retain the information about which cell barcode
each read belongs to. Here the fastq files are reformatted so that the barcode(s) (and, if available, the UMI sequences) are moved from
the sequence into the read name. Since scATAC-Seq data are mostly paired-end, both \code{r1} and \code{r2} are demultiplexed in this function.
}
\examples{
data.folder <- system.file("extdata", package = "scPipe", mustWork = TRUE)
r1      <- file.path(data.folder, "small_chr21_R1.fastq.gz") 
r2      <- file.path(data.folder, "small_chr21_R3.fastq.gz") 

# Using a barcode fastq file:

# barcodes in fastq format
barcode_fastq      <- file.path(data.folder, "small_chr21_R2.fastq.gz") 

sc_atac_trim_barcode(
  r1            = r1,
  r2            = r2,
  bc_file       = barcode_fastq,
  rmN           = TRUE,
  rmlow         = TRUE,
  output_folder = tempdir())
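
# A minimal sketch of how min_qual and num_below_min could interact when
# rmlow = TRUE. This is an illustration only and not the package internals:
# it assumes Phred+33 quality encoding and that a barcode is rejected when
# more than num_below_min bases score below min_qual. The helper name
# is_low_quality is hypothetical.
is_low_quality <- function(qual_string, min_qual = 20, num_below_min = 2) {
  phred <- utf8ToInt(qual_string) - 33L   # decode ASCII characters to Phred scores
  sum(phred < min_qual) > num_below_min   # TRUE when too many bases are below threshold
}

is_low_quality("IIIIIIII")   # every base at Q40
is_low_quality("III+++II")   # three bases at Q10, one more than num_below_min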

# Using a barcode csv file:

# barcodes in .csv format
barcode_1000       <- file.path(data.folder, "chr21_modified_barcode_1000.csv")

\dontrun{
sc_atac_trim_barcode(
  r1            = r1,
  r2            = r2,
  bc_file       = barcode_1000,
  id1_st        = 0,
  rmN           = TRUE,
  rmlow         = TRUE,
  output_folder = tempdir())
}
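
# A minimal sketch of the barcode handling described for id1_st, id1_len and
# valid_barcode_file. This is an illustration only and not the package
# internals: the helper names and the toy sequences below are hypothetical.

# Slice a barcode out of a read using a 0-indexed start position and a length.
extract_barcode <- function(read_seq, st = 0, len = 16) {
  substr(read_seq, st + 1, st + len)   # substr is 1-indexed and inclusive
}

# Return the valid barcodes within a Hamming distance of max_dist of the
# observed barcode. Barcodes of a different length are never matched.
match_barcode <- function(observed, valid, max_dist = 1) {
  obs <- strsplit(observed, "")[[1]]
  dists <- vapply(strsplit(valid, ""), function(v) {
    if (length(v) != length(obs)) return(NA_integer_)
    sum(v != obs)                      # number of mismatching positions
  }, integer(1))
  valid[!is.na(dists) & dists <= max_dist]
}

toy_read  <- "ACGTACGTACGTACGTTTTTGGGGCCCC"
toy_valid <- c("ACGTACGTACGTACGA", "TTTTTTTTTTTTTTTT")
bc <- extract_barcode(toy_read, st = 0, len = 16)
match_barcode(bc, toy_valid)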
}