 % File src/library/base/man/Extract.data.frame.Rd
 % Part of the R package, https://www.R-project.org
 % Copyright 1995-2019 R Core Team
 % Distributed under GPL 2 or later

 \name{Extract.data.frame}
 \alias{[.data.frame}
 \alias{[[.data.frame}
 \alias{[<-.data.frame}
 \alias{[[<-.data.frame}
 % \alias{$.data.frame}
 \alias{$<-.data.frame}
 \title{Extract or Replace Parts of a Data Frame}
 \description{
   Extract or replace subsets of data frames.
 }
 \usage{
 \method{[}{data.frame}(x, i, j, drop = )
 \method{[}{data.frame}(x, i, j) <- value
 \method{[[}{data.frame}(x, ..., exact = TRUE)
 \method{[[}{data.frame}(x, i, j) <- value
 % \method{$}{data.frame}(x, name)
 \method{$}{data.frame}(x, name) <- value
 }
 \arguments{
   \item{x}{data frame.}

   \item{i, j, ...}{elements to extract or replace.  For \code{[} and
     \code{[[}, these are \code{numeric} or \code{character} or, for
     \code{[} only, empty.  Numeric values are coerced to integer as if
     by \code{\link{as.integer}}.  For replacement by \code{[}, a logical
     matrix is allowed.}

   \item{name}{
     A literal character string or a \link{name} (possibly \link{backtick}
     quoted).}

   \item{drop}{logical.  If \code{TRUE} the result is coerced to the
     lowest possible dimension.  The default is to drop if only one
     column is left, but \bold{not} to drop if only one row is left.}

   \item{value}{A suitable replacement value: it will be repeated a whole
     number of times if necessary and it may be coerced: see the
     Coercion section.  If \code{NULL}, deletes the column if a single
     column is selected.}

    \item{exact}{logical: see \code{\link{[}}, and applies to column names.}
 }
 \details{
   Data frames can be indexed in several modes.  When \code{[} and
   \code{[[} are used with a single vector index (\code{x[i]} or
   \code{x[[i]]}), they index the data frame as if it were a list.  In
   this usage a \code{drop} argument is ignored, with a warning.

   There is no \code{data.frame} method for \code{$}, so \code{x$name}
   uses the default method which treats \code{x} as a list (with partial
   matching of column names if the match is unique, see
   \code{\link{Extract}}).  The replacement method (for \code{$}) checks
   \code{value} for the correct number of rows, and replicates it if necessary.

   When \code{[} and \code{[[} are used with two indices (\code{x[i, j]}
   and \code{x[[i, j]]}) they act like indexing a matrix:  \code{[[} can
   only be used to select one element.  Note that for each selected
   column, \code{xj} say, typically (if it is not matrix-like), the
   resulting column will be \code{xj[i]}, and hence relies on the
   corresponding \code{[} method; see the examples section.

   If \code{[} returns a data frame it will have unique (and non-missing)
   row names, if necessary transforming the row names using
   \code{\link{make.unique}}.  Similarly, if columns are selected column
   names will be transformed to be unique if necessary (e.g., if columns
   are selected more than once, or if more than one column of a given
   name is selected if the data frame has duplicate column names).

   When \code{drop = TRUE}, this is applied to the subsetting of any
   matrices contained in the data frame as well as to the data frame itself.

   The replacement methods can be used to add whole column(s) by specifying
   non-existent column(s), in which case the column(s) are added at the
   right-hand edge of the data frame and numerical indices must be
   contiguous to existing indices.  On the other hand, rows can be added
   at any row after the current last row, and the columns will be
   in-filled with missing values.  Missing values in the indices are not
   allowed for replacement.

   For \code{[} the replacement value can be a list: each element of the
   list is used to replace (part of) one column, recycling the list as
   necessary.  If columns specified by number are created, the names
   (if any) of the corresponding list elements are used to name the
   columns.  If the replacement is not selecting rows, list values can
   contain \code{NULL} elements which will cause the corresponding
   columns to be deleted.  (See the Examples.)

   Matrix indexing (\code{x[i]} with a logical or a 2-column integer
   matrix \code{i}) using \code{[} is not recommended.  For extraction,
   \code{x} is first coerced to a matrix. For replacement, logical
   matrix indices must be of the same dimension as \code{x}.
   Replacements are done one column at a time, with multiple type
   coercions possibly taking place.

   Both \code{[} and \code{[[} extraction methods partially match row
   names.  By default neither partially matches column names, but \code{[[}
   will if \code{exact = FALSE} (and with a warning if \code{exact =
   NA}).  If you want exact matching on row names use
   \code{\link{match}}, as in the examples.
 }
 \section{Coercion}{
   The story over when replacement values are coerced is a complicated
   one, and one that has changed during \R's development.  This section
   is a guide only.

   When \code{[} and \code{[[} are used to add or replace a whole column,
   no coercion takes place but \code{value} will be
   replicated (by calling the generic function \code{\link{rep}}) to the
   right length if an exact number of repeats can be used.

   When \code{[} is used with a logical matrix, each value is coerced to
   the type of the column into which it is to be placed.

   When  \code{[} and \code{[[} are used with two indices, the
   column will be coerced as necessary to accommodate the value.

   Note that when the replacement value is an array (including a matrix)
   it is \emph{not} treated as a series of columns (as
   \code{\link{data.frame}} and \code{\link{as.data.frame}} do) but
   inserted as a single column.
 }
 \section{Warning}{
   The default behaviour when only one \emph{row} is left is equivalent to
   specifying \code{drop = FALSE}.  To drop from a data frame to a list,
   \code{drop = TRUE} has to be specified explicitly.

   Arguments other than \code{drop} and \code{exact} should not be named:
   there is a warning if they are and the behaviour differs from the
   description here.
 }
 \value{
   For \code{[} a data frame, list or a single column (the latter two
   only when dimensions have been dropped).  If matrix indexing is used for
   extraction a vector results.  If the result would be a data frame an
   error results if undefined columns are selected (as there is no general
   concept of a 'missing' column in a data frame).  Otherwise if a single
   column is selected and this is undefined the result is \code{NULL}.

   For \code{[[} a column of the data frame or \code{NULL}
   (extraction with one index)
   or a length-one vector (extraction with two indices).

   For \code{$}, a column of the data frame (or \code{NULL}).

   For \code{[<-}, \code{[[<-} and \code{$<-}, a data frame.
 }
 \seealso{
   \code{\link{subset}} which is often easier for extraction,
   \code{\link{data.frame}}, \code{\link{Extract}}.
 }
 \examples{
 sw <- swiss[1:5, 1:4]  # select a manageable subset

 sw[1:3]      # select columns
 sw[, 1:3]    # same
 sw[4:5, 1:3] # select rows and columns
 sw[1]        # a one-column data frame
 sw[, 1, drop = FALSE]  # the same
 sw[, 1]      # a (unnamed) vector
 sw[[1]]      # the same
 sw$Fert      # the same (possibly w/ warning, see ?Extract)

 sw[1,]       # a one-row data frame
 sw[1,, drop = TRUE]  # a list

 sw["C", ] # partially matches
 sw[match("C", row.names(sw)), ] # no exact match
 try(sw[, "Ferti"]) # column names must match exactly
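
 ## A minimal sketch of column-name matching with "[[" and 'exact'.
 ## 'pm' is a small made-up data frame used only for illustration.
 pm <- data.frame(Fertility = c(80.2, 83.1), Agriculture = c(17.0, 45.1))
 pm[["Fert"]]                 # NULL: exact matching is the default
 pm[["Fert", exact = FALSE]]  # the Fertility column, by partial matching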

 \dontshow{
 stopifnot(identical(sw[, 1], sw[[1]]),
           identical(sw[, 1][1], 80.2),
           identical(sw[, 1, drop = FALSE], sw[1]),
           is.data.frame(sw[1 ]), dim(sw[1 ]) == c(5, 1),
           is.data.frame(sw[1,]), dim(sw[1,]) == c(1, 4),
           is.list(s1 <- sw[1, , drop = TRUE]), identical(s1$Fertility, 80.2))
 tools::assertError(sw[, "Ferti"])
 }
 swiss[ c(1, 1:2), ]   # duplicate row, unique row names are created
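 ## Similarly, a sketch of selecting one column more than once:
 ## the column names of the result are made unique.
 names(swiss[c(1, 1)])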

 sw[sw <= 6] <- 6  # logical matrix indexing
 sw

 ## adding a column
 sw["new1"] <- LETTERS[1:5]   # adds a character column
 sw[["new2"]] <- letters[1:5] # ditto
 sw[, "new3"] <- LETTERS[1:5] # ditto
 sw$new4 <- 1:5
 sapply(sw, class)
 sw$new  # -> NULL: no unique partial match
 sw$new4 <- NULL              # delete the column
 sw
 sw[6:8] <- list(letters[10:14], NULL, aa = 1:5)
 # update col. 6, delete 7, append
 sw
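
 ## A sketch of growing a data frame by replacement, as described in
 ## the Details.  'g' is a made-up data frame used only for illustration.
 g <- data.frame(a = 1:3, b = c("x", "y", "z"))
 g[5, "a"] <- 10L  # row 4 is created as well and in-filled with NA
 g
 try(g[4] <- 0)    # expected to fail: column 4 would leave a hole after column 2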

 ## matrices in a data frame
 A <- data.frame(x = 1:3, y = I(matrix(4:9, 3, 2)),
                          z = I(matrix(letters[1:9], 3, 3)))
 A[1:3, "y"] # a matrix
 A[1:3, "z"] # a matrix
 A[, "y"]    # a matrix
 stopifnot(identical(colnames(A), c("x", "y", "z")), ncol(A) == 3L,
           identical(A[,"y"], A[1:3, "y"]),
           inherits (A[,"y"], "AsIs"))
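
 ## A sketch of two points from the Coercion section.
 ## 'cf' is a made-up data frame used only for illustration.
 cf <- data.frame(n = 1:3)
 cf[2, "n"] <- 2.5               # two indices: the integer column is coerced
 class(cf$n)
 cf[["m"]] <- matrix(1:6, 3, 2)  # a matrix value is inserted as a single column
 ncol(cf)
 dim(cf[["m"]])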

 ## keeping special attributes: use a class with a
 ## "as.data.frame" and "[" method;
 ## "avector" := vector that keeps attributes.   Could provide a constructor
 ##  avector <- function(x) { class(x) <- c("avector", class(x)); x }
 as.data.frame.avector <- as.data.frame.vector

 `[.avector` <- function(x,i,...) {
   r <- NextMethod("[")
   mostattributes(r) <- attributes(x)
   r
 }

 d <- data.frame(i = 0:7, f = gl(2,4),
                 u = structure(11:18, unit = "kg", class = "avector"))
 str(d[2:4, -1]) # 'u' keeps its "unit"
 \dontshow{
 stopifnot(identical(d[2:4,-1][,"u"],
                     structure(12:14, unit = "kg", class = "avector")))
 }
 }
 \keyword{array}