 % File src/library/stats/man/wilcox.test.Rd
 % Part of the R package, https://www.R-project.org
 % Copyright 1995-2018 R Core Team
 % Distributed under GPL 2 or later

 \name{wilcox.test}
 \title{Wilcoxon Rank Sum and Signed Rank Tests}
 \alias{wilcox.test}
 \alias{wilcox.test.default}
 \alias{wilcox.test.formula}
 \concept{Mann-Whitney Test}
 \description{
   Performs one- and two-sample Wilcoxon tests on vectors of data; the
   latter is also known as the \sQuote{Mann-Whitney} test.
 }
 \usage{
 wilcox.test(x, \dots)

 \method{wilcox.test}{default}(x, y = NULL,
             alternative = c("two.sided", "less", "greater"),
             mu = 0, paired = FALSE, exact = NULL, correct = TRUE,
             conf.int = FALSE, conf.level = 0.95, \dots)

 \method{wilcox.test}{formula}(formula, data, subset, na.action, \dots)
 }
 \arguments{
   \item{x}{numeric vector of data values.  Non-finite (e.g., infinite or
     missing) values will be omitted.}
   \item{y}{an optional numeric vector of data values: as with \code{x}
     non-finite values will be omitted.}
   \item{alternative}{a character string specifying the alternative
     hypothesis, must be one of \code{"two.sided"} (default),
     \code{"greater"} or \code{"less"}.  You can specify just the initial
     letter.}
   \item{mu}{a number specifying an optional parameter used to form the
     null hypothesis.  See \sQuote{Details}.}
   \item{paired}{a logical indicating whether you want a paired test.}
   \item{exact}{a logical indicating whether an exact p-value
     should be computed.}
   \item{correct}{a logical indicating whether to apply continuity
     correction in the normal approximation for the p-value.}
   \item{conf.int}{a logical indicating whether a confidence interval
     should be computed.}
   \item{conf.level}{confidence level of the interval.}
   \item{formula}{a formula of the form \code{lhs ~ rhs} where \code{lhs}
     is a numeric variable giving the data values and \code{rhs} a factor
     with two levels giving the corresponding groups.}
   \item{data}{an optional matrix or data frame (or similar: see
     \code{\link{model.frame}}) containing the variables in the
     formula \code{formula}.  By default the variables are taken from
     \code{environment(formula)}.}
   \item{subset}{an optional vector specifying a subset of observations
     to be used.}
   \item{na.action}{a function which indicates what should happen when
     the data contain \code{NA}s.  Defaults to
     \code{getOption("na.action")}.}
   \item{\dots}{further arguments to be passed to or from methods.}
 }
 \details{
   The formula interface applies only to the two-sample tests.

   If only \code{x} is given, or if both \code{x} and \code{y} are given
   and \code{paired} is \code{TRUE}, a Wilcoxon signed rank test of the
   null that the distribution of \code{x} (in the one-sample case) or of
   \code{x - y} (in the paired two-sample case) is symmetric about
   \code{mu} is performed.

   Otherwise, if both \code{x} and \code{y} are given and \code{paired}
   is \code{FALSE}, a Wilcoxon rank sum test (equivalent to the
   Mann-Whitney test: see the Note) is carried out.  In this case, the
   null hypothesis is that the distributions of \code{x} and \code{y}
   differ by a location shift of \code{mu} and the alternative is that
   they differ by some other location shift (and the one-sided
   alternative \code{"greater"} is that \code{x} is shifted to the right
   of \code{y}).

   By default (if \code{exact} is not specified), an exact p-value
   is computed if the samples contain fewer than 50 finite values and
   there are no ties.  Otherwise, a normal approximation is used.

   Optionally (if argument \code{conf.int} is true), a nonparametric
   confidence interval and an estimator for the pseudomedian (one-sample
   case) or for the difference of the location parameters \code{x-y} is
   computed.  (The pseudomedian of a distribution \eqn{F} is the median
   of the distribution of \eqn{(u+v)/2}, where \eqn{u} and \eqn{v} are
   independent, each with distribution \eqn{F}.  If \eqn{F} is symmetric,
   then the pseudomedian and median coincide.  See Hollander & Wolfe
   (1973), page 34.)  Note that in the two-sample case the estimator for
   the difference in location parameters does \bold{not} estimate the
   difference in medians (a common misconception) but rather the median
   of the difference between a sample from \code{x} and a sample from
   \code{y}.

   If exact p-values are available, an exact confidence interval is
   obtained by the algorithm described in Bauer (1972), and the
   Hodges-Lehmann estimator is employed.  Otherwise, the returned
   confidence interval and point estimate are based on normal
   approximations.  These are continuity-corrected for the interval but
   \emph{not} the estimate (as the correction depends on the
   \code{alternative}).

   With small samples it may not be possible to achieve very high
   confidence interval coverage.  If this happens, a warning will be
   given and an interval with lower coverage will be substituted.

   When \code{x} (and \code{y} if applicable) are valid, the function
   always returns a result, even when \code{conf.int = TRUE} and a
   confidence interval cannot be computed; in that case the interval
   boundaries, and sometimes the \code{estimate}, contain
   \code{\link{NaN}}.
 }
 \value{
   A list with class \code{"htest"} containing the following components:
   \item{statistic}{the value of the test statistic with a name
     describing it.}
   \item{parameter}{the parameter(s) for the exact distribution of the
     test statistic.}
   \item{p.value}{the p-value for the test.}
   \item{null.value}{the location parameter \code{mu}.}
   \item{alternative}{a character string describing the alternative
     hypothesis.}
   \item{method}{the type of test applied.}
   \item{data.name}{a character string giving the names of the data.}
   \item{conf.int}{a confidence interval for the location parameter.
     (Only present if argument \code{conf.int = TRUE}.)}
   \item{estimate}{an estimate of the location parameter.
     (Only present if argument \code{conf.int = TRUE}.)}
 }
 \note{
   The literature is not unanimous about the definitions of the Wilcoxon
   rank sum and Mann-Whitney tests.  The two most common definitions
   correspond to the sum of the ranks of the first sample with the
   minimum value subtracted or not: \R subtracts and S-PLUS does not,
   giving a value which is larger by \eqn{m(m+1)/2} for a first sample
   of size \eqn{m}.  (It seems Wilcoxon's original paper used the
   unadjusted sum of the ranks but subsequent tables subtracted the
   minimum.)

   \R's value can also be computed as the number of all pairs
   \code{(x[i], y[j])} for which \code{y[j]} is not greater than
   \code{x[i]}, the most common definition of the Mann-Whitney test.
 }
 \section{Warning}{
   This function can use large amounts of memory and stack (and even
   crash \R if the stack limit is exceeded) if \code{exact = TRUE} and
   one sample is large (several thousands or more).
 }
 \references{
   David F. Bauer (1972).
   Constructing confidence sets using rank statistics.
   \emph{Journal of the American Statistical Association}
   \bold{67}, 687--690.
   \doi{10.1080/01621459.1972.10481279}.

   Myles Hollander and Douglas A. Wolfe (1973).
   \emph{Nonparametric Statistical Methods}.
   New York: John Wiley & Sons.
   Pages 27--33 (one-sample), 68--75 (two-sample).\cr
   Or second edition (1999).
 }
 \seealso{
   \code{\link{psignrank}}, \code{\link{pwilcox}}.

   \code{\link[coin:LocationTests]{wilcox_test}} in package
   \CRANpkg{coin} for exact, asymptotic and Monte Carlo
   \emph{conditional} p-values, including in the presence of ties.

   \code{\link{kruskal.test}} for testing homogeneity in location
   parameters in the case of two or more samples;
   \code{\link{t.test}} for an alternative under normality
   assumptions [or large samples].
 }
 \examples{
 require(graphics)
 ## One-sample test.
 ## Hollander & Wolfe (1973), 29f.
 ## Hamilton depression scale factor measurements in 9 patients with
 ##  mixed anxiety and depression, taken at the first (x) and second
 ##  (y) visit after initiation of a therapy (administration of a
 ##  tranquilizer).
 x <- c(1.83,  0.50,  1.62,  2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
 y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)
 wilcox.test(x, y, paired = TRUE, alternative = "greater")
 wilcox.test(y - x, alternative = "less")    # The same.
 wilcox.test(y - x, alternative = "less",
             exact = FALSE, correct = FALSE) # H&W large sample
                                             # approximation
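
 ## A sketch of the point estimate described in Details, assuming the
 ## exact case (no ties, no zero differences): the pseudomedian
 ## estimate should then equal the median of the Walsh averages
 ## (d[i] + d[j]) / 2 for i <= j.  x and y are the paired data above;
 ## 'd' and 'walsh' are illustrative names, not part of the API.
 d <- x - y
 walsh <- outer(d, d, "+") / 2
 median(walsh[!lower.tri(walsh)])          # by hand
 wilcox.test(d, conf.int = TRUE)$estimate  # as reported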

 ## Two-sample test.
 ## Hollander & Wolfe (1973), 69f.
 ## Permeability constants of the human chorioamnion (a placental
 ##  membrane) at term (x) and between 12 to 26 weeks gestational
 ##  age (y).  The alternative of interest is greater permeability
 ##  of the human chorioamnion for the term pregnancy.
 x <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)
 y <- c(1.15, 0.88, 0.90, 0.74, 1.21)
 wilcox.test(x, y, alternative = "g")        # greater
 wilcox.test(x, y, alternative = "greater",
             exact = FALSE, correct = FALSE) # H&W large sample
                                             # approximation
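
 ## A sketch of how the statistic can be recomputed by hand, following
 ## the Note and assuming no ties.  x and y are the two-sample data
 ## above; m, n, r, W_ranks, W_pairs and z are illustrative names.
 m <- length(x); n <- length(y)
 r <- rank(c(x, y))                               # joint ranks
 W_ranks <- sum(r[seq_len(m)]) - m * (m + 1) / 2  # rank sum minus minimum
 W_pairs <- sum(outer(x, y, ">="))                # pairs with y[j] <= x[i]
 c(W_ranks = W_ranks, W_pairs = W_pairs)          # compare with W above
 ## Large-sample approximation without continuity correction:
 z <- (W_ranks - m * n / 2) / sqrt(m * n * (m + n + 1) / 12)
 pnorm(z, lower.tail = FALSE)                     # compare with the p-value above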

 wilcox.test(rnorm(10), rnorm(10, 2), conf.int = TRUE)
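
 ## A sketch of the estimate returned with conf.int = TRUE, assuming
 ## the exact two-sample case: it should be the Hodges-Lehmann
 ## estimator, i.e. the median of all pairwise differences x[i] - y[j],
 ## which in general is not the difference in medians.  x and y are
 ## the permeability data above; 'hl' is an illustrative name.
 hl <- median(outer(x, y, "-"))
 c(hodges_lehmann = hl,
   reported = unname(wilcox.test(x, y, conf.int = TRUE)$estimate),
   diff_of_medians = median(x) - median(y))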

 ## Formula interface.
 boxplot(Ozone ~ Month, data = airquality)
 wilcox.test(Ozone ~ Month, data = airquality,
             subset = Month \%in\% c(5, 8))
 }
 \keyword{htest}