\name{BSmooth}
\alias{BSmooth}
\title{
  BSmooth, smoothing bisulfite sequence data
}
\description{
  This implements the BSmooth algorithm for estimating methylation levels from bisulfite sequencing data.}
\usage{
BSmooth(BSseq,
        ns = 70,
        h = 1000,
        maxGap = 10^8,
        keep.se = FALSE,
        BPPARAM = bpparam(),
        chunkdim = NULL,
        level = NULL,
        verbose = getOption("verbose"))
}
\arguments{
  \item{BSseq}{An object of class \code{BSseq}.}
  \item{ns}{The minimum number of methylation loci in a smoothing window.}
  \item{h}{The minimum smoothing window, in bases.}
  \item{maxGap}{The maximum gap between two methylation loci, before the smoothing is broken across the gap.
  The default smoothes each chromosome separately.}
  \item{keep.se}{Should the estimated standard errors from the smoothing
    algorithm be kept. This will make the return object roughly 30
    percent bigger and is currently not be used for anything in \pkg{bsseq}.}
  \item{BPPARAM}{An optional \linkS4class{BiocParallelParam} instance
    determining the parallel back-end to be used during evaluation. Currently
    supported are \linkS4class{SerialParam} (Unix, Mac, Windows),
    \linkS4class{MulticoreParam} (Unix and Mac), \linkS4class{SnowParam}
    (Unix, Mac, and Windows, limited to single-machine clusters), and
    \linkS4class{BatchJobsParam} (Unix, Mac, Windows, only with the in-memory
    realization backend). See sections 'Parallelization and progress
    monitoring' and 'Realization backends' for further details.}
  \item{chunkdim}{\strong{Only applicable if \code{BACKEND == "HDF5Array"}.}
    The dimensions of the chunks to use for writing the data to
    disk. By default, \code{\link[HDF5Array]{getHDF5DumpChunkDim}()} using the
    dimensions of the returned \linkS4class{BSseq} object will be used. See
    \code{?\link[HDF5Array]{getHDF5DumpChunkDim}} for more information.}
  \item{level}{\strong{Only applicable if \code{BACKEND == "HDF5Array"}.}
    The compression level to use for writing the data to disk. By
    default, \code{\link[HDF5Array]{getHDF5DumpCompressionLevel}()} will be
    used. See \code{?\link[HDF5Array]{getHDF5DumpCompressionLevel}} for more
    information.}
  \item{verbose}{A \code{logical(1)} indicating whether progress messages
    should be printed (default \code{TRUE}).}
}
\details{
  \code{ns} and \code{h} are passed to the \code{locfit} function. The
  bandwidth used is the maximum (in genomic distance) of the \code{h}
  and a width big enough to contain \code{ns} number of methylation
  loci.
}

\section{Realization backends}{
  The \code{BSmooth()} function creates a new assay to store the coefficients
  used to construct the smoothed methylation estimates ((\code{coef}). An
  additional assay is also created if \code{keep.se == TRUE} (\code{se.coef}).

  The choice of \emph{realization backend} controls whether these assay(s) are
  stored in-memory as an ordinary \link[base]{matrix} or on-disk as a
  \linkS4class{HDF5Array}, for example.

  The choice of realization backend is controlled by the \code{BACKEND}
  argument, which defaults to the current value of
  \code{DelayedArray::\link[DelayedArray]{getRealizationBackend}()}.

  \code{BSmooth} supports the following realization backends:

  \itemize{
    \item \code{NULL} (in-memory): This stores each new assay in-memory using
      an ordinary \link[base]{matrix}.
    \item \code{HDF5Array} (on-disk): This stores each new assay on-disk in a
    HDF5 file using an \linkS4class{HDF5Matrix} from \pkg{HDF5Array}.
  }

  Please note that certain combinations of realization backend and
  parallelization backend are currently not supported. For example, the
  \linkS4class{HDF5Array} realization backend is currently only compatible when
  used with a single-machine parallelization backend (i.e. it is not compatible
  with a \linkS4class{SnowParam} that specifies an \emph{ad hoc} cluster of
  \strong{multiple} machines). \code{BSmooth()} will issue an error when given
  such incompatible realization and parallelization backends. Furthermore, to
  avoid memory usage blow-ups, \code{BSmooth()} will issue an error if an
  in-memory realization backend is used when smoothing a disk-backed
  \linkS4class{BSseq} object.

  Additional arguments related to the realization backend can be passed via the
  \code{...} argument. These arguments must be named and are passed to the
  relevant \linkS4class{RealizationSink} constructor. For example, the
  \code{...} argument can be used to specify the path to the HDF5 file to be
  used by \code{BSmooth()}. Please see the examples at the bottom of the page.
}

\section{Parallelization and progress monitoring}{
  \code{BSmooth()} now uses the \pkg{BiocParallel} package to implement
  parallelization. This brings some notable improvements:

  \itemize{
    \item Smoothed results can now be written directly to an on-disk
      realization backend by the worker. This dramatically reduces memory
      usage compared to previous versions of \pkg{bsseq} that required all
      results be retained in-memory.
    \item Parallelization is now supported on Windows through the use of a
    \linkS4class{SnowParam} object as the value of \code{BPPARAM}.
    \item Detailed and extensive job logging facilities.
  }

  All parallelization options are controlled via the \code{BPPARAM} argument.
  In general, we recommend that users combine multicore (single-machine)
  parallelization with an on-disk realization backend (see section,
  'Realization backend'). For Unix and Mac users, this means using
  a \linkS4class{MulticoreParam}. For Windows users, this means using a
  single-machine \linkS4class{SnowParam}. Please consult the \pkg{BiocParallel}
  documentation to take full advantage of the more advanced features.

  \subsection{Deprecated arguments}{
    \code{parallelBy}, \code{mc.cores}, and \code{mc.preschedule} are
    deprecated and will be removed in subsequent releases of \pkg{bsseq}. These
    arguments were necessary when \code{BSmooth()} used the \pkg{parallel}
    package to implement parallelization, but this functionality is superseded
    by the aforementioned use of \pkg{BiocParallel}. We recommend that users
    who previously relied on these arguments switch to
    \code{BPPARAM = MulticoreParam(workers = mc.cores, progressbar = TRUE)}.
  }

  \subsection{Progress monitoring}{
    A useful feature of \pkg{BiocParallel} are progress bars to monitor the
    status of long-running jobs, such as \code{BSmooth()}. Progress bars are
    controlled via the \code{progressbar} argument in the
    \linkS4class{BiocParallelParam} constructor. Progress bars replace the
    use of the deprecated \code{verbose} argument to print out information on
    the status of \code{BSmooth()}.

    \pkg{BiocParallel} also supports extensive and detailed logging facilities.
    Please consult the \pkg{BiocParallel} documentation to take full advantage
    these advanced features.
  }
}

\value{
  An object of class \code{BSseq}, containing coefficients used to fit smoothed
  methylation values and optionally standard errors for these.
}
\author{
  Method and original implementation by Kasper Daniel Hansen
  \email{khansen@jhsph.edu}. Updated implementation to support disk-backed
  \linkS4class{BSseq} objects and more general parallelization by Peter Francis
  Hickey.
}
\references{
  KD Hansen, B Langmead, and RA Irizarry.
  \emph{BSmooth: from whole genome bisulfite sequencing reads to
    differentially methylated regions}.
  Genome Biology (2012) 13:R83.
  doi:\href{http://www.dx.doi.org/10.1186/gb-2012-13-10-r83}{10.1186/gb-2012-13-10-r83}.
}
\seealso{
  \code{\link[locfit]{locfit}} in the locfit package, as well as
  \code{\linkS4class{BSseq}}.
}
\examples{
\dontrun{
  # Run BSmooth() on a matrix-backed BSseq object using an in-memory realization
  # backend with serial evaluation.
  data(BS.chr22)
  # This is a matrix-backed BSseq object.
  sapply(assays(BS.chr22, withDimnames = FALSE), class)
  BS.fit <- BSmooth(BS.chr22, BPPARAM = SerialParam(progressbar = TRUE))
  # The new 'coef' assay is an ordinary matrix.
  sapply(assays(BS.fit, withDimnames = FALSE), class)
  BS.fit

  # Run BSmooth() on a disk-backed BSseq object using the HDF5Array realization
  # backend (with data written to the file 'BSmooth_example.h5') with
  # multi-core parallel evaluation.
  BS.chr22 <- realize(BS.chr22, "HDF5Array")
  # This is a disk-backed BSseq object.
  sapply(assays(BS.chr22, withDimnames = FALSE), class)
  BS.fit <- BSmooth(BS.chr22,
              BPPARAM = MulticoreParam(workers = 2, progressbar = TRUE),
              BACKEND = "HDF5Array",
              filepath = "BSmooth_example.h5")
  # The new 'coef' assay is an HDF5Matrix.
  sapply(assays(BS.fit, withDimnames = FALSE), class)
  BS.fit
  # The new 'coef' assay is in the HDF5 file 'BSmooth_example.h5' (in the
  # current working directory).
  sapply(assays(BS.fit, withDimnames = FALSE), path)
}
}