\name{strcapture}
\alias{strcapture}
\title{Capture String Tokens into a data.frame}
\description{
Given a character vector and a regular expression containing capture
expressions, \code{strcapture} extracts the captured tokens into a
tabular data structure, such as a \code{data.frame}, whose type and
structure are specified by a prototype object. Every input string is
assumed to yield the same number of captured tokens.
}
\usage{
strcapture(pattern, x, proto, perl = FALSE, useBytes = FALSE)
}
\arguments{
\item{pattern}{
A regular expression containing the capture expressions.
}
\item{x}{
A character vector from which to capture the tokens.
}
\item{proto}{
A \code{data.frame}, or an S4 object that behaves like one, giving
the type and name of each output column. See \sQuote{Details}.
}
\item{perl,useBytes}{
Arguments passed to \code{\link{regexec}}.
}
}
\details{
The \code{proto} argument is typically a \code{data.frame} with one
column per capture expression, in the order in which the expressions
appear in \code{pattern}. Each captured character vector is coerced
to the type of the corresponding column, and the column names are
carried over to the return value. Only the types and names of
\code{proto} are used; any data it contains are ignored. See the
examples.
}
\value{
A tabular data structure of the same type as \code{proto}, so
typically a \code{data.frame}, containing a column for each capture
expression. The column types and names are inherited from
\code{proto}. Cases in \code{x} that do not match \code{pattern} have
\code{NA} in every column.
}
\seealso{
\code{\link{regexec}} and \code{\link{regmatches}} for related
low-level utilities.
}
\examples{
x <- "chr1:1-1000"
pattern <- "(.*?):([[:digit:]]+)-([[:digit:]]+)"
proto <- data.frame(chr=character(), start=integer(), end=integer())
strcapture(pattern, x, proto)
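
## A further sketch using hypothetical "key=value" inputs (the strings,
## pattern and column names below are illustrative only). It shows that
## each captured token is coerced to the type of its column in 'proto'
## (here 'value' becomes numeric), and that an input which does not
## match the pattern ("gamma" has no '=') gives NA in every column.
kv <- c("alpha=1", "beta=22", "gamma")
kv_proto <- data.frame(key = character(), value = numeric())
res <- strcapture("([[:alpha:]]+)=([[:digit:]]+)", kv, kv_proto)
res
str(res)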
}
\keyword{utilities}