src/library/base/man/Quotes.Rd - R - Git at Google

 % File src/library/base/man/Quotes.Rd
 % Part of the R package, https://www.R-project.org
 % Copyright 1995-2022 R Core Team
 % Distributed under GPL 2 or later

 \name{Quotes}
 \alias{Quotes}
 \alias{backtick}
 \alias{backquote}
 \alias{'}%'
 \alias{"}%"
 \alias{`}%`
 \concept{quotes}
 \concept{backslash}
 \concept{raw character strings}
 \title{Quotes}
 \description{
   Descriptions of the various uses of quoting in \R.
 }
 \details{
   Three types of quotes are part of the syntax of \R: single and double
   quotation marks and the backtick (or back quote, \samp{`}).  In
   addition, backslash is used to escape the following character
   inside character constants.
 }
 \section{Character constants}{
   Single and double quotes delimit character constants.  They can be used
   interchangeably but double quotes are preferred (and character
   constants are printed using double quotes), so single quotes are
   normally only used to delimit character constants containing double
   quotes.

   Backslash is used to start an escape sequence inside character
   constants.  Escaping a character not in the following table is an
   error.

   Single quotes need to be escaped by backslash in single-quoted
   strings, and double quotes in double-quoted strings.

   \tabular{ll}{
     \samp{\\n}\tab newline (aka \sQuote{line feed})\cr
     \samp{\\r}\tab carriage return\cr
     \samp{\\t}\tab tab\cr
     \samp{\\b}\tab backspace\cr
     \samp{\\a}\tab alert (bell)\cr
     \samp{\\f}\tab form feed\cr
     \samp{\\v}\tab vertical tab\cr
     \samp{\\\\}\tab backslash \samp{\\}\cr
     \samp{\\'}\tab ASCII apostrophe \samp{'}\cr
     \samp{\\"}\tab ASCII quotation mark \samp{"}\cr
     \samp{\\`}\tab ASCII grave accent (backtick) \samp{`}\cr
     \samp{\\nnn}\tab character with given octal code (1, 2 or 3 digits)\cr
     \samp{\\xnn}\tab character with given hex code (1 or 2 hex digits)\cr
     \samp{\\unnnn}\tab Unicode character with given code (1--4 hex digits)\cr
     \samp{\\Unnnnnnnn}\tab Unicode character with given code (1--8 hex digits)\cr
   }
   Alternative forms for the last two are \samp{\\u\{nnnn\}} and
   \samp{\\U\{nnnnnnnn\}}.  All except the Unicode escape sequences are
   also supported when reading character strings by \code{\link{scan}}
   and \code{\link{read.table}} if \code{allowEscapes = TRUE}.  Unicode
   escapes can be used to enter Unicode characters not in the current
   locale's charset (when the string will be stored internally in UTF-8).
   The maximum allowed value for \samp{\\nnn} is \samp{\\377} (the same
   character as \samp{\\xff}).

   As from \R 4.1.0 the largest allowed \samp{\\U} value is
   \samp{\\U10FFFF}, the maximum Unicode point.

   The parser does not allow the use of both octal/hex and Unicode
   escapes in a single string.

   These forms will also be used by \code{\link{print.default}}
   when outputting non-printable characters (including backslash).

   Embedded nuls are not allowed in character strings, so using escapes
   (such as \samp{\\0}) for a nul will result in the string being
   truncated at that point (usually with a warning).

   Raw character constants are also available using a syntax similar to
   the one used in C++: \code{r"(...)"} with \code{...} any character
   sequence, except that it must not contain the closing sequence
   \samp{)"}. %"
   The delimiter pairs \code{[]} and \code{\{\}} can also be
   used, and \code{R} can be used in place of \code{r}.  For additional
   flexibility, a number of dashes can be placed between the opening quote
   and the opening delimiter, as long as the same number of dashes appear
   between the closing delimiter and the closing quote.
 }
 \section{Names and Identifiers}{
   Identifiers consist of a sequence of letters, digits, the period
   (\code{.}) and the underscore.  They must not start with a digit nor
   underscore, nor with a period followed by a digit.  \link{Reserved}
   words are not valid identifiers.

   The definition of a \emph{letter} depends on the current locale, but
   only ASCII digits are considered to be digits.

   Such identifiers are also known as \emph{syntactic names} and may be used
   directly in \R code.  Almost always, other names can be used
   provided they are quoted.  The preferred quote is the backtick
   (\samp{`}), and \code{\link{deparse}} will normally use it, but under
   many circumstances single or double quotes can be used (as a character
   constant will often be converted to a name).  One place where
   backticks may be essential is to delimit variable names in formulae:
   see \code{\link{formula}}.
 }

 \note{
   UTF-16 surrogate pairs in \samp{\\unnnn\uoooo} form will be converted
   to a single Unicode point, so for example \samp{\\uD834\\uDD1E} gives
   the single character \samp{\\U1D11E}.  However, unpaired values in
   the surrogate range such as in the string \code{"abc\uD834de"} will be
   converted to a non-standard-conformant UTF-8 string (as is done by most
   other software): this may change in future.
 }

 \seealso{
   \code{\link{Syntax}} for other aspects of the syntax.

   \code{\link{sQuote}} for quoting English text.

   \code{\link{shQuote}} for quoting OS commands.

   The  \sQuote{R Language Definition} manual.
 }
 \examples{%% NOTE: Quote the \ even  "once more" !
 'single quotes can be used more-or-less interchangeably'
 "with double quotes to create character vectors"

 ## Single quotes inside single-quoted strings need backslash-escaping.
 ## Ditto double quotes inside double-quoted strings.
 ##
 identical('"It\'s alive!", he screamed.',
           "\"It's alive!\", he screamed.") # same

 ## Backslashes need doubling, or they have a special meaning.
 x <- "In ALGOL, you could do logical AND with /\\\\."
 print(x)      # shows it as above ("input-like")
 writeLines(x) # shows it as you like it ;-)

 ## Single backslashes followed by a letter are used to denote
 ## special characters like tab(ulator)s and newlines:
 x <- "long\tlines can be\nbroken with newlines"
 writeLines(x) # see also ?strwrap

 ## Backticks are used for non-standard variable names.
 ## (See make.names and ?Reserved for what counts as
 ## non-standard.)
 `x y` <- 1:5
 `x y`
 d <- data.frame(`1st column` = rchisq(5, 2), check.names = FALSE)
 d$`1st column`

 ## Backslashes followed by up to three numbers are interpreted as
 ## octal notation for ASCII characters.
 "\110\145\154\154\157\40\127\157\162\154\144\41"

 ## \x followed by up to two numbers is interpreted as
 ## hexadecimal notation for ASCII characters.
 (hw1 <- "\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21")

 ## Mixing octal and hexadecimal in the same string is OK
 (hw2 <- "\110\x65\154\x6c\157\x20\127\x6f\162\x6c\144\x21")

 ## \u is also hexadecimal, but supports up to 4 digits,
 ## using Unicode specification.  In the previous example,
 ## you can simply replace \x with \u.
 (hw3 <- "\u48\u65\u6c\u6c\u6f\u20\u57\u6f\u72\u6c\u64\u21")

 ## The last three are all identical to
 hw <- "Hello World!"
 stopifnot(identical(hw, hw1), identical(hw1, hw2), identical(hw2, hw3))

 ## Using Unicode makes more sense for non-latin characters.
 (nn <- "\u0126\u0119\u1114\u022d\u2001\u03e2\u0954\u0f3f\u13d3\u147b\u203c")

 ## Mixing \x and \u throws a _parse_ error (which is not catchable!)
 \dontrun{
   "\x48\u65\x6c\u6c\x6f\u20\x57\u6f\x72\u6c\x64\u21"
 }
 ##   -->   Error: mixing Unicode and octal/hex escapes .....

 ## \U works like \u, but supports up to six hex digits.
 ## So we can replace \u with \U in the previous example.
 n2 <- "\U0126\U0119\U1114\U022d\U2001\U03e2\U0954\U0f3f\U13d3\U147b\U203c"
 stopifnot(identical(nn, n2))

 ## Under systems supporting multi-byte locales (and not Windows),
 ## \U also supports the rarer characters outside the usual 16^4 range.
 ## See the R language manual,
 ## https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Literal-constants
 ## and bug 16098 https://bugs.r-project.org/show_bug.cgi?id=16098
 ## This character may or not be printable (the platform decides)
 ## and if it is, may not have a glyph in the font used.
 "\U1d4d7" # On Windows this used to give the incorrect value of "\Ud4d7"

 ## nul characters (for terminating strings in C) are not allowed (parse errors)
 \dontrun{% as above, these errors cannot be caught via try*(..)
   "foo\0bar"     # Error: nul character not allowed (line 1)
   "foo\u0000bar" # same error
 }

 ## A Windows path written as a raw string constant:
 r"(c:\Program files\R)"

 ## More raw strings:
 r"{(\1\2)}"
 r"(use both "double" and 'single' quotes)"
 r"---(\1--)-)---"
 }
 \keyword{documentation}
	% File src/library/base/man/Quotes.Rd
	% Part of the R package, https://www.R-project.org
	% Copyright 1995-2022 R Core Team
	% Distributed under GPL 2 or later

	\name{Quotes}
	\alias{Quotes}
	\alias{backtick}
	\alias{backquote}
	\alias{'}%'
	\alias{"}%"
	\alias{`}%`
	\concept{quotes}
	\concept{backslash}
	\concept{raw character strings}
	\title{Quotes}
	\description{
	Descriptions of the various uses of quoting in \R.
	}
	\details{
	Three types of quotes are part of the syntax of \R: single and double
	quotation marks and the backtick (or back quote, \samp{`}). In
	addition, backslash is used to escape the following character
	inside character constants.
	}
	\section{Character constants}{
	Single and double quotes delimit character constants. They can be used
	interchangeably but double quotes are preferred (and character
	constants are printed using double quotes), so single quotes are
	normally only used to delimit character constants containing double
	quotes.

	Backslash is used to start an escape sequence inside character
	constants. Escaping a character not in the following table is an
	error.

	Single quotes need to be escaped by backslash in single-quoted
	strings, and double quotes in double-quoted strings.

	\tabular{ll}{
	\samp{\\n}\tab newline (aka \sQuote{line feed})\cr
	\samp{\\r}\tab carriage return\cr
	\samp{\\t}\tab tab\cr
	\samp{\\b}\tab backspace\cr
	\samp{\\a}\tab alert (bell)\cr
	\samp{\\f}\tab form feed\cr
	\samp{\\v}\tab vertical tab\cr
	\samp{\\\\}\tab backslash \samp{\\}\cr
	\samp{\\'}\tab ASCII apostrophe \samp{'}\cr
	\samp{\\"}\tab ASCII quotation mark \samp{"}\cr
	\samp{\\`}\tab ASCII grave accent (backtick) \samp{`}\cr
	\samp{\\nnn}\tab character with given octal code (1, 2 or 3 digits)\cr
	\samp{\\xnn}\tab character with given hex code (1 or 2 hex digits)\cr
	\samp{\\unnnn}\tab Unicode character with given code (1--4 hex digits)\cr
	\samp{\\Unnnnnnnn}\tab Unicode character with given code (1--8 hex digits)\cr
	}
	Alternative forms for the last two are \samp{\\u\{nnnn\}} and
	\samp{\\U\{nnnnnnnn\}}. All except the Unicode escape sequences are
	also supported when reading character strings by \code{\link{scan}}
	and \code{\link{read.table}} if \code{allowEscapes = TRUE}. Unicode
	escapes can be used to enter Unicode characters not in the current
	locale's charset (when the string will be stored internally in UTF-8).
	The maximum allowed value for \samp{\\nnn} is \samp{\\377} (the same
	character as \samp{\\xff}).

	As from \R 4.1.0 the largest allowed \samp{\\U} value is
	\samp{\\U10FFFF}, the maximum Unicode point.

	The parser does not allow the use of both octal/hex and Unicode
	escapes in a single string.

	These forms will also be used by \code{\link{print.default}}
	when outputting non-printable characters (including backslash).

	Embedded nuls are not allowed in character strings, so using escapes
	(such as \samp{\\0}) for a nul will result in the string being
	truncated at that point (usually with a warning).

	Raw character constants are also available using a syntax similar to
	the one used in C++: \code{r"(...)"} with \code{...} any character
	sequence, except that it must not contain the closing sequence
	\samp{)"}. %"
	The delimiter pairs \code{[]} and \code{\{\}} can also be
	used, and \code{R} can be used in place of \code{r}. For additional
	flexibility, a number of dashes can be placed between the opening quote
	and the opening delimiter, as long as the same number of dashes appear
	between the closing delimiter and the closing quote.
	}
	\section{Names and Identifiers}{
	Identifiers consist of a sequence of letters, digits, the period
	(\code{.}) and the underscore. They must not start with a digit nor
	underscore, nor with a period followed by a digit. \link{Reserved}
	words are not valid identifiers.

	The definition of a \emph{letter} depends on the current locale, but
	only ASCII digits are considered to be digits.

	Such identifiers are also known as \emph{syntactic names} and may be used
	directly in \R code. Almost always, other names can be used
	provided they are quoted. The preferred quote is the backtick
	(\samp{`}), and \code{\link{deparse}} will normally use it, but under
	many circumstances single or double quotes can be used (as a character
	constant will often be converted to a name). One place where
	backticks may be essential is to delimit variable names in formulae:
	see \code{\link{formula}}.
	}

	\note{
	UTF-16 surrogate pairs in \samp{\\unnnn\uoooo} form will be converted
	to a single Unicode point, so for example \samp{\\uD834\\uDD1E} gives
	the single character \samp{\\U1D11E}. However, unpaired values in
	the surrogate range such as in the string \code{"abc\uD834de"} will be
	converted to a non-standard-conformant UTF-8 string (as is done by most
	other software): this may change in future.
	}

	\seealso{
	\code{\link{Syntax}} for other aspects of the syntax.

	\code{\link{sQuote}} for quoting English text.

	\code{\link{shQuote}} for quoting OS commands.

	The \sQuote{R Language Definition} manual.
	}
	\examples{%% NOTE: Quote the \ even "once more" !
	'single quotes can be used more-or-less interchangeably'
	"with double quotes to create character vectors"

	## Single quotes inside single-quoted strings need backslash-escaping.
	## Ditto double quotes inside double-quoted strings.
	##
	identical('"It\'s alive!", he screamed.',
	"\"It's alive!\", he screamed.") # same

	## Backslashes need doubling, or they have a special meaning.
	x <- "In ALGOL, you could do logical AND with /\\\\."
	print(x) # shows it as above ("input-like")
	writeLines(x) # shows it as you like it ;-)

	## Single backslashes followed by a letter are used to denote
	## special characters like tab(ulator)s and newlines:
	x <- "long\tlines can be\nbroken with newlines"
	writeLines(x) # see also ?strwrap

	## Backticks are used for non-standard variable names.
	## (See make.names and ?Reserved for what counts as
	## non-standard.)
	`x y` <- 1:5
	`x y`
	d <- data.frame(`1st column` = rchisq(5, 2), check.names = FALSE)
	d$`1st column`

	## Backslashes followed by up to three numbers are interpreted as
	## octal notation for ASCII characters.
	"\110\145\154\154\157\40\127\157\162\154\144\41"

	## \x followed by up to two numbers is interpreted as
	## hexadecimal notation for ASCII characters.
	(hw1 <- "\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21")

	## Mixing octal and hexadecimal in the same string is OK
	(hw2 <- "\110\x65\154\x6c\157\x20\127\x6f\162\x6c\144\x21")

	## \u is also hexadecimal, but supports up to 4 digits,
	## using Unicode specification. In the previous example,
	## you can simply replace \x with \u.
	(hw3 <- "\u48\u65\u6c\u6c\u6f\u20\u57\u6f\u72\u6c\u64\u21")

	## The last three are all identical to
	hw <- "Hello World!"
	stopifnot(identical(hw, hw1), identical(hw1, hw2), identical(hw2, hw3))

	## Using Unicode makes more sense for non-latin characters.
	(nn <- "\u0126\u0119\u1114\u022d\u2001\u03e2\u0954\u0f3f\u13d3\u147b\u203c")

	## Mixing \x and \u throws a _parse_ error (which is not catchable!)
	\dontrun{
	"\x48\u65\x6c\u6c\x6f\u20\x57\u6f\x72\u6c\x64\u21"
	}
	## --> Error: mixing Unicode and octal/hex escapes .....

	## \U works like \u, but supports up to six hex digits.
	## So we can replace \u with \U in the previous example.
	n2 <- "\U0126\U0119\U1114\U022d\U2001\U03e2\U0954\U0f3f\U13d3\U147b\U203c"
	stopifnot(identical(nn, n2))

	## Under systems supporting multi-byte locales (and not Windows),
	## \U also supports the rarer characters outside the usual 16^4 range.
	## See the R language manual,
	## https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Literal-constants
	## and bug 16098 https://bugs.r-project.org/show_bug.cgi?id=16098
	## This character may or not be printable (the platform decides)
	## and if it is, may not have a glyph in the font used.
	"\U1d4d7" # On Windows this used to give the incorrect value of "\Ud4d7"

	## nul characters (for terminating strings in C) are not allowed (parse errors)
	\dontrun{% as above, these errors cannot be caught via try*(..)
	"foo\0bar" # Error: nul character not allowed (line 1)
	"foo\u0000bar" # same error
	}

	## A Windows path written as a raw string constant:
	r"(c:\Program files\R)"

	## More raw strings:
	r"{(\1\2)}"
	r"(use both "double" and 'single' quotes)"
	r"---(\1--)-)---"
	}
	\keyword{documentation}