% File src/library/utils/man/read.DIF.Rd
% Part of the R package, https://www.R-project.org
% Copyright 1995-2014 R Core Team
% Distributed under GPL 2 or later
\name{read.DIF}
\alias{read.DIF}
\title{Data Input from Spreadsheet}
\description{
Reads a file in Data Interchange Format (DIF) and creates a data frame
from it. DIF is a format for data matrices such as single spreadsheets.
}
\usage{
read.DIF(file, header = FALSE,
dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
row.names, col.names, as.is = !stringsAsFactors,
na.strings = "NA", colClasses = NA, nrows = -1,
skip = 0, check.names = TRUE, blank.lines.skip = TRUE,
stringsAsFactors = default.stringsAsFactors(),
transpose = FALSE, fileEncoding = "")
}
\arguments{
\item{file}{the name of the file which the data are to be read from,
or a \link{connection}, or a complete URL.
The name \code{"clipboard"} may also be used on Windows, in which
case \code{read.DIF("clipboard")} will look for a DIF format entry
in the Windows clipboard.
}
\item{header}{a logical value indicating whether the spreadsheet contains the
names of the variables as its first line. If missing, the value is
determined from the file format: \code{header} is set to \code{TRUE}
if and only if the first row contains only character values and
the top left cell is empty.}
\item{dec}{the character used in the file for decimal points.}
\item{numerals}{string indicating how to convert numbers whose conversion
to double precision would lose accuracy, see \code{\link{type.convert}}.}
\item{row.names}{a vector of row names. This can be a vector giving
the actual row names, or a single number giving the column of the
table which contains the row names, or a character string giving the
name of the table column containing the row names.
If there is a header and the first row contains one fewer field than
the number of columns, the first column in the input is used for the
row names. Otherwise if \code{row.names} is missing, the rows are
numbered.
Using \code{row.names = NULL} forces row numbering.
}
\item{col.names}{a vector of optional names for the variables.
The default is to use \code{"V"} followed by the column number.}
\item{as.is}{the default behavior of \code{read.DIF} is to convert
character variables to factors. The argument \code{as.is} controls the
conversion of columns not otherwise specified by \code{colClasses}.
Its value is either a vector of logicals (values are recycled if
necessary), or a vector of numeric or character indices which
specify which columns should not be converted to factors.
Note: In releases prior to \R{} 2.12.1, cells marked as being of
character type were converted to logical, numeric or complex using
\code{\link{type.convert}} as in \code{\link{read.table}}.
Note: to suppress all conversions including those of numeric
columns, set \code{colClasses = "character"}.
Note that \code{as.is} is specified per column (not per
variable) and so includes the column of row names (if any) and any
columns to be skipped.
}
\item{na.strings}{a character vector of strings which are to be
interpreted as \code{\link{NA}} values. Blank fields are also
considered to be missing values in logical, integer, numeric and
complex fields.}
\item{colClasses}{character. A vector of classes to be assumed for
the columns. Recycled as necessary, or if the character vector is
named, unspecified values are taken to be \code{NA}.
Possible values are \code{NA} (when \code{\link{type.convert}} is
used), \code{"NULL"} (when the column is skipped), one of the atomic
vector classes (logical, integer, numeric, complex, character, raw),
or \code{"factor"}, \code{"Date"} or \code{"POSIXct"}. Otherwise
there needs to be an \code{as} method (from package \pkg{methods})
for conversion from \code{"character"} to the specified formal
class.
Note that \code{colClasses} is specified per column (not per
variable) and so includes the column of row names (if any).
}
\item{nrows}{the maximum number of rows to read in. Negative values
are ignored.}
\item{skip}{the number of lines of the data file to skip before
beginning to read data.}
\item{check.names}{logical. If \code{TRUE} then the names of the
variables in the data frame are checked to ensure that they are
syntactically valid variable names. If necessary they are adjusted
(by \code{\link{make.names}}) so that they are, and also to ensure
that there are no duplicates.}
\item{blank.lines.skip}{logical: if \code{TRUE} blank lines in the
input are ignored.}
\item{stringsAsFactors}{logical: should character vectors be converted
to factors?}
\item{transpose}{logical, indicating if the row and column
interpretation should be transposed. Microsoft's Excel has been
known to produce (non-standard conforming) DIF files which would
need \code{transpose = TRUE} to be read correctly.}
\item{fileEncoding}{character string: if non-empty declares the
encoding used on a file (not a connection or clipboard) so the
character data can be re-encoded. See the \sQuote{Encoding} section
of the help for \code{\link{file}}, the \sQuote{R Data Import/Export
Manual} and \sQuote{Note}.}
}
\value{
A data frame (\code{\link{data.frame}}) containing a representation of
the data in the file. Empty input is an error unless \code{col.names}
is specified, in which case a 0-row data frame is returned. Similarly,
input consisting of just a header line results in a 0-row data frame
if \code{header = TRUE}.
}
\note{
The columns referred to in \code{as.is} and \code{colClasses} include
the column of row names (if any).
Less memory will be used if \code{colClasses} is specified as one of
the six atomic vector classes.
}
\author{R Core; \code{transpose} option by Christoph Buser, ETH Zurich}
\seealso{
The \emph{R Data Import/Export} manual.
\code{\link{scan}}, \code{\link{type.convert}},
\code{\link{read.fwf}} for reading \emph{f}ixed \emph{w}idth
\emph{f}ormatted input;
\code{\link{read.table}};
\code{\link{data.frame}}.
}
\references{
The DIF format specification can be found by searching on
\url{http://www.wotsit.org/}; the optional header fields are ignored.
See also
\url{https://en.wikipedia.org/wiki/Data_Interchange_Format}.
The term is likely to lead to confusion: Windows will have a
\sQuote{Windows Data Interchange Format (DIF) data format} as part of
its WinFX system, which may or may not be compatible.
}
\examples{
## read.DIF() may need transpose = TRUE for a file exported from Excel
udir <- system.file("misc", package = "utils")
dd <- read.DIF(file.path(udir, "exDIF.dif"), header = TRUE, transpose = TRUE)
dc <- read.csv(file.path(udir, "exDIF.csv"), header = TRUE)
stopifnot(identical(dd, dc), dim(dd) == c(4,2))
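
## A hedged sketch, not part of the original examples: according to the
## documentation of 'as.is', setting colClasses = "character" should
## suppress all conversions, including those of numeric columns.
## The name 'dch' is illustrative only; str() is used to inspect the
## result rather than asserting a particular outcome.
dch <- read.DIF(file.path(udir, "exDIF.dif"), header = TRUE,
                transpose = TRUE, colClasses = "character")
str(dch)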
}
\keyword{file}
\keyword{connection}