% File src/library/stats/man/step.Rd
% Part of the R package, https://www.R-project.org
% Copyright 1995-2014 R Core Team
% Distributed under GPL 2 or later
\name{step}
\alias{step}
\title{
Choose a model by AIC in a Stepwise Algorithm
}
\description{
Select a formula-based model by AIC.
}
\usage{
step(object, scope, scale = 0,
direction = c("both", "backward", "forward"),
trace = 1, keep = NULL, steps = 1000, k = 2, \dots)
}
\arguments{
\item{object}{
an object representing a model of an appropriate class (mainly
\code{"lm"} and \code{"glm"}).
This is used as the initial model in the stepwise search.
}
\item{scope}{
defines the range of models examined in the stepwise search.
This should be either a single formula, or a list containing
components \code{upper} and \code{lower}, both formulae. See the
details for how to specify the formulae and how they are used.
}
\item{scale}{
used in the definition of the AIC statistic for selecting the models,
currently only for \code{\link{lm}}, \code{\link{aov}} and
\code{\link{glm}} models. The default value, \code{0}, indicates
the scale should be estimated: see \code{\link{extractAIC}}.
}
\item{direction}{
the mode of stepwise search, can be one of \code{"both"},
\code{"backward"}, or \code{"forward"}, with a default of \code{"both"}.
If the \code{scope} argument is missing the default for
\code{direction} is \code{"backward"}. Values can be abbreviated.
}
\item{trace}{
if positive, information is printed during the running of \code{step}.
Larger values may give more detailed information.
}
\item{keep}{
a filter function whose input is a fitted model object and the
associated \code{AIC} statistic, and whose output is arbitrary.
Typically \code{keep} will select a subset of the components of
the object and return them. The default is not to keep anything.
}
\item{steps}{
the maximum number of steps to be considered. The default is 1000
(essentially as many as required). It is typically used to stop the
process early.
}
\item{k}{
the multiple of the number of degrees of freedom used for the penalty.
Only \code{k = 2} gives the genuine AIC: \code{k = log(n)}, where
\code{n} is the number of observations, is sometimes referred to as
BIC or SBC.
}
\item{\dots}{
any additional arguments to \code{\link{extractAIC}}.
}
}
\value{
the stepwise-selected model is returned, with up to two additional
components. There is an \code{"anova"} component corresponding to the
steps taken in the search, as well as a \code{"keep"} component if the
\code{keep=} argument was supplied in the call. The
\code{"Resid. Dev"} column of the analysis of deviance table refers
to a constant minus twice the maximized log likelihood: it will be a
deviance only in cases where a saturated model is well-defined
(thus excluding \code{lm}, \code{aov} and \code{survreg} fits,
for example).
}
\details{
\code{step} uses \code{\link{add1}} and \code{\link{drop1}}
repeatedly; it will work for any method for which they work, and that
is determined by having a valid method for \code{\link{extractAIC}}.
When the additive constant can be chosen so that AIC is equal to
Mallows' \eqn{C_p}{Cp}, this is done and the tables are labelled
appropriately.

The set of models searched is determined by the \code{scope} argument.
The right-hand-side of its \code{lower} component is always included
in the model, and the right-hand-side of the model is included in the
\code{upper} component. If \code{scope} is a single formula, it
specifies the \code{upper} component, and the \code{lower} model is
empty. If \code{scope} is missing, the initial model is used as the
\code{upper} model.

Models specified by \code{scope} can be templates to update
\code{object} as used by \code{\link{update.formula}}. So using
\code{.} in a \code{scope} formula means \sQuote{what is
already there}, with \code{.^2} indicating all interactions of
existing terms.

There is a potential problem in using \code{\link{glm}} fits with a
variable \code{scale}, as in that case the deviance is not simply
related to the maximized log-likelihood. The \code{"glm"} method for
function \code{\link{extractAIC}} makes the
appropriate adjustment for a \code{gaussian} family, but may need to be
amended for other cases. (The \code{binomial} and \code{poisson}
families have fixed \code{scale} by default and do not correspond
to a particular maximum-likelihood problem for variable \code{scale}.)
}
\note{
This function differs considerably from the function in S, which uses a
number of approximations and does not in general compute the correct AIC.

This is a minimal implementation. Use \code{\link[MASS]{stepAIC}}
in package \CRANpkg{MASS} for a wider range of object classes.
}
\section{Warning}{
The model fitting must apply the models to the same dataset. This
may be a problem if there are missing values and \R's default of
\code{na.action = na.omit} is used. We suggest you remove the
missing values first.

Calls to the function \code{\link{nobs}} are used to check that the
number of observations involved in the fitting process remains unchanged.
}
\seealso{
\code{\link[MASS]{stepAIC}} in \CRANpkg{MASS}, \code{\link{add1}},
\code{\link{drop1}}
}
\references{
Hastie, T. J. and Pregibon, D. (1992)
\emph{Generalized linear models.}
Chapter 6 of \emph{Statistical Models in S}
eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

Venables, W. N. and Ripley, B. D. (2002)
\emph{Modern Applied Statistics with S.}
New York: Springer (4th ed).
}
\author{
B. D. Ripley: \code{step} is a slightly simplified version of
\code{\link[MASS]{stepAIC}} in package \CRANpkg{MASS} (Venables &
Ripley, 2002 and earlier editions).

The idea of a \code{step} function follows that described in Hastie &
Pregibon (1992); but the implementation in \R is more general.
}
\examples{\donttest{
## following on from example(lm)
\dontshow{utils::example("lm", echo = FALSE)}
step(lm.D9)
summary(lm1 <- lm(Fertility ~ ., data = swiss))
slm1 <- step(lm1)
summary(slm1)
slm1$anova
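
## The remaining examples are illustrative sketches of the 'scope', 'k'
## and 'keep' arguments described above.  The object names (null, slm.fwd,
## slm.int, slm.bic, slm.keep) are chosen here for illustration only.

## Forward search from the intercept-only model, with 'scope' given as a
## list of 'lower' and 'upper' formulae.
null <- lm(Fertility ~ 1, data = swiss)
slm.fwd <- step(null,
                scope = list(lower = ~ 1,
                             upper = ~ Agriculture + Examination + Education +
                                 Catholic + Infant.Mortality),
                direction = "forward", trace = 0)
formula(slm.fwd)

## '.' in a scope formula means 'what is already there', so '.^2' lets the
## search consider all two-way interactions of the terms in lm1.
slm.int <- step(lm1, scope = list(lower = ~ 1, upper = ~ .^2), trace = 0)
slm.int$anova

## BIC-type penalty: k = log(n), with n the number of observations.
slm.bic <- step(lm1, k = log(nobs(lm1)), trace = 0)
slm.bic$anova

## 'keep' receives the fitted model and its AIC at each step; this filter
## records the AIC and the residual degrees of freedom.
slm.keep <- step(lm1, trace = 0,
                 keep = function(fit, aic)
                     list(aic = aic, df = df.residual(fit)))
slm.keep$keep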
}}
\keyword{models}