\name{crlmmCopynumber}
\alias{crlmmCopynumber}
\alias{crlmmCopynumber2}
\alias{crlmmCopynumberLD}
\title{Locus- and allele-specific estimation of copy number}
\description{
  Locus- and allele-specific estimation of copy number.
}
\usage{
crlmmCopynumber(object, MIN.SAMPLES=10, SNRMin = 5, MIN.OBS = 1,
	        DF.PRIOR = 50, bias.adj = FALSE,
                prior.prob = rep(1/4, 4), seed = 1, verbose = TRUE,
                GT.CONF.THR = 0.80, MIN.NU = 2^3, MIN.PHI = 2^3,
                THR.NU.PHI = TRUE, type=c("SNP", "NP", "X.SNP", "X.NP"))
}

\arguments{

  \item{object}{object of class \code{CNSet}.}

 \item{MIN.SAMPLES}{ 'Integer'.  The minimum number of samples in a
  batch.  Bathes with fewer than MIN.SAMPLES are skipped.  Therefore,
  samples in batches with fewer than MIN.SAMPLES have NA's for the
  allele-specific copy number and NA's for the linear model
  parameters.

}

  \item{SNRMin}{ Samples with low signal to noise ratios are  excluded.  }

  \item{MIN.OBS}{

  For a SNP with with fewer than \code{MIN.OBS} of a genotype in a given
  batch, the within-genotype median is imputed.  The imputation is based
  on a regression using SNPs for which all three biallelic genotypes are
  observed.  For example, assume at at a given SNP genotypes AA and AB
  were observed and BB is an unobserved genotype.  For SNPs in which all
  3 genotypes were observed, we fit the model E(mean_BB) = beta0 +
  beta1*mean_AA + beta2*mean_AB, obtaining estimates; of beta0, beta1,
  and beta2.  The imputed mean at the SNP with unobserved BB is then
  beta0hat + beta1hat * mean_AA of beta2hat * mean_AB.

}

  \item{DF.PRIOR}{

  The 2 x 2 covariance matrix of the background and signal variances
  is estimated from the data at each locus.  This matrix is then
  smoothed towards a common matrix estimated from all of the loci.
  DF.PRIOR controls the amount of smoothing towards the common matrix,
  with higher values corresponding to greater smoothing.  Currently,
  DF.PRIOR is not estimated from the data.  Future versions may
  estimate DF.PRIOR empirically.

}

\item{bias.adj}{

  \code{bias.adj} is currently ignored (as well as the prior.prob
  argument).  We plan to add this feature back to the crlmm package in
  the near future. This feature, when \code{TRUE}, updated initial
  estimates from the linear model after excluding samples with a low
  posterior probability of normal copy number.  Excluding samples that
  have a low posterior probability can be helpful at loci in which a
  substantial fraction of the samples have a copy number alteration.
  For additional information, see Scharpf et al., 2010.

}
  \item{prior.prob}{

    This argument is currently ignored.  A numerical vector providing
  prior probabilities for copy number states corresponding to homozygous
  deletion, hemizygous deletion, normal copy number, and amplification,
  respectively.

}
  \item{seed}{ Seed for random number generation.}

  \item{verbose}{ Logical. }

  \item{GT.CONF.THR}{

    Confidence threshold for genotype calls (0, 1).  Calls with
    confidence scores below this theshold are not used to estimate the
    within-genotype medians. See Carvalho et al., 2007 for information
    regarding confidence scores of biallelic genotypes.

}


  \item{MIN.NU}{ numeric. Minimum value for background
    intensity. Ignored if \code{THR.NU.PHI} is \code{FALSE}. }

  \item{MIN.PHI}{numeric. Minimum value for slope. Ignored if
    \code{THR.NU.PHI} is \code{FALSE}.}

  \item{THR.NU.PHI}{  If \code{THR.NU.PHI} is \code{FALSE},
    \code{MIN.NU} and \code{MIN.PHI} are ignored. When TRUE, background
    (nu) and slope (phi) coefficients below MIN.NU and MIN.PHI are set
    to MIN.NU and MIN.PHI, respectively.}

  \item{type}{ Character string vector that must be one or more of
    "SNP", "NP", "X.SNP", or "X.NP". Type refers to a set of
    markers. See details below}

 }

 \references{

  Carvalho B, Bengtsson H, Speed TP, Irizarry RA. Exploration,
  normalization, and genotype calls of high-density oligonucleotide SNP
  array data. Biostatistics. 2007 Apr;8(2):485-99. Epub 2006 Dec
  22. PMID: 17189563.

  Carvalho BS, Louis TA, Irizarry RA.
  Quantifying uncertainty in genotype calls.
  Bioinformatics. 2010 Jan 15;26(2):242-9.

  Scharpf RB, Ruczinski I, Carvalho B, Doan B, Chakravarti A, and
  Irizarry RA, Biostatistics.  Biostatistics, Epub July 2010.

}

\details{

    We suggest a minimum of 10 samples per batch for using
    \code{crlmmCopynumber}.  50 or more samples per batch is preferred
    and will improve the estimates.

    The functions \code{crlmmCopynumberLD} and
    \code{crlmmCopynumber2} have been deprecated.

	The argument \code{type} can be used to specify a subset of
	markers for which the copy number estimation algorithm is run.
	One or more of the following possible entries are valid: 'SNP',
	'NP', 'X.SNP', and 'X.NP'.

	'SNP' referers to autosomal SNPs.

	'NP' refers to autosomal nonpolymorphic markers.

	'X.SNP' refers to SNPs on chromosome X.

	'X.NP' refers to autosomes on chromosome X.

	However, users must run 'SNP' prior to running 'NP' and 'X.NP',
	or specify \code{type = c('SNP', 'X.NP')}.

      }

      \value{

	The value returned by the \code{crlmmCopynumber} function
	depends on whether the data is stored in RAM or whether the data
	is stored on disk using the R package \code{ff} for reading /
	writing.  If uncertain, the first line of the \code{show} method
	defined for \code{CNSet} objects prints whether the
	\code{assayData} elements are derived from the \code{ff} package
	in the first line.  Specifically,

	- if the elements of the \code{batchStaticts} slot in the
	\code{CNSet} object have the class "ff_matrix" or "ffdf", then
	the \code{crlmmCopynumber} function updates the data stored on
	disk and returns the value \code{TRUE}.

	- if the elements of the \code{batchStatistics} slot in the
	\code{CNSet} object have the class 'matrix', then the
	\code{crlmmCopynumber} function returns an object of class
	\code{CNSet} with the elements of \code{batchStatistics}
	updated.

      }

\author{R. Scharpf}
\keyword{manip}