| \input texinfo |
| @c %**start of header |
| @setfilename R-ints.info |
| @settitle R Internals |
| @setchapternewpage on |
| @c %**end of header |
| |
| @c @documentencoding ISO-8859-1 |
| |
| @syncodeindex fn vr |
| |
| @dircategory Programming |
| @direntry |
| * R Internals: (R-ints). R Internals. |
| @end direntry |
| |
| @finalout |
| |
| @include R-defs.texi |
| @include version.texi |
| |
| @copying |
| This manual is for R, version @value{VERSION}. |
| |
| @Rcopyright{1999} |
| |
| @quotation |
| @permission{} |
| @end quotation |
| @end copying |
| |
| @titlepage |
| @title R Internals |
| @subtitle Version @value{VERSION} |
| @author R Core Team |
| @page |
| @vskip 0pt plus 1filll |
| @insertcopying |
| @end titlepage |
| |
| @ifplaintext |
| @insertcopying |
| @end ifplaintext |
| |
| @c @ifnothtml |
| @contents |
| @c @end ifnothtml |
| |
| @ifnottex |
| @node Top, R Internal Structures, (dir), (dir) |
| @top R Internals |
| |
| This is a guide to the internal structures of @R{} and coding standards for |
| the core team working on @R{} itself. |
| |
| @insertcopying |
| |
| @end ifnottex |
| |
| @menu |
| * R Internal Structures:: |
| * .Internal vs .Primitive:: |
| * Internationalization in the R sources:: |
| * Package Structure:: |
| * Files:: |
| * Graphics Devices:: |
| * GUI consoles:: |
| * Tools:: |
| * R coding standards:: |
| * Testing R code:: |
| * Use of TeX dialects:: |
| * Current and future directions:: |
| * Function and variable index:: |
| * Concept index:: |
| @end menu |
| @c Could have (autogenerated) @detailmenu here .. |
| |
| @node R Internal Structures, .Internal vs .Primitive, Top, Top |
| @chapter R Internal Structures |
| |
| This chapter is the beginnings of documentation about @R{} internal |
| structures. It is written for the core team and others studying the |
| code in the @file{src/main} directory. |
| |
| It is a work-in-progress and should be checked against the current |
| version of the source code. Versions for @R{} 2.x.y contain historical |
| comments about when features were introduced: this version is for the |
| 3.x.y series. |
| |
| @menu |
| * SEXPs:: |
| * Environments and variable lookup:: |
| * Attributes:: |
| * Contexts:: |
| * Argument evaluation:: |
| * Autoprinting:: |
| * The write barrier:: |
| * Serialization Formats:: |
| * Encodings for CHARSXPs:: |
| * The CHARSXP cache:: |
| * Warnings and errors:: |
| * S4 objects:: |
| * Memory allocators:: |
| * Internal use of global and base environments:: |
| * Modules:: |
| * Visibility:: |
| * Lazy loading:: |
| @end menu |
| |
| @node SEXPs, Environments and variable lookup, R Internal Structures, R Internal Structures |
| @section SEXPs |
| |
| @cindex SEXP |
| @cindex SEXPREC |
| What @R{} users think of as @emph{variables} or @emph{objects} are |
| symbols which are bound to a value. The value can be thought of as |
| either a @code{SEXP} (a pointer), or the structure it points to, a |
| @code{SEXPREC} (and there are alternative forms used for vectors, namely |
| @code{VECSXP} pointing to @code{VECTOR_SEXPREC} structures). |
| So the basic building blocks of @R{} objects are often called |
| @emph{nodes}, meaning @code{SEXPREC}s or @code{VECTOR_SEXPREC}s. |
| |
| Note that the internal structure of the @code{SEXPREC} is not made |
| available to @R{} Extensions: rather @code{SEXP} is an opaque pointer, |
| and the internals can only be accessed by the functions provided. |
| |
| @cindex node |
| Both types of node structure begin with a 64-bit |
| @code{sxpinfo} header followed by three pointers (to the attributes and the |
| previous and next node in a doubly-linked list), and then some further |
| fields. On a 32-bit platform a node@footnote{strictly, a @code{SEXPREC} |
| node; @code{VECTOR_SEXPREC} nodes are slightly smaller but followed by |
| data in the node.} occupies 32 bytes: on a 64-bit platform typically 56 |
| bytes (depending on alignment constraints). |
| |
| The first five bits of the @code{sxpinfo} header specify one of up to 32 |
| @code{SEXPTYPE}s. |
| |
| @menu |
| * SEXPTYPEs:: |
| * Rest of header:: |
| * The 'data':: |
| * Allocation classes:: |
| @end menu |
| |
| @node SEXPTYPEs, Rest of header, SEXPs, SEXPs |
| @subsection SEXPTYPEs |
| |
| @cindex SEXPTYPE |
| Currently @code{SEXPTYPE}s 0:10 and 13:25 are in use. Values 11 and 12 |
| were used for internal factors and ordered factors and have since been |
| withdrawn. Note that the @code{SEXPTYPE} numbers are stored in |
| @code{save}d objects and that the ordering of the types is used, so the |
| gap cannot easily be reused. |
| |
| @cindex SEXPTYPE table |
| @quotation |
| @multitable {no} {SPECIALSXPXXX} {S4 classes not of simple type} |
| @headitem no @tab SEXPTYPE@tab Description |
| @item @code{0} @tab @code{NILSXP} @tab @code{NULL} |
| @item @code{1} @tab @code{SYMSXP} @tab symbols |
| @item @code{2} @tab @code{LISTSXP} @tab pairlists |
| @item @code{3} @tab @code{CLOSXP} @tab closures |
| @item @code{4} @tab @code{ENVSXP} @tab environments |
| @item @code{5} @tab @code{PROMSXP} @tab promises |
| @item @code{6} @tab @code{LANGSXP} @tab language objects |
| @item @code{7} @tab @code{SPECIALSXP} @tab special functions |
| @item @code{8} @tab @code{BUILTINSXP} @tab builtin functions |
| @item @code{9} @tab @code{CHARSXP} @tab internal character strings |
| @item @code{10} @tab @code{LGLSXP} @tab logical vectors |
| @item @code{13} @tab @code{INTSXP} @tab integer vectors |
| @item @code{14} @tab @code{REALSXP} @tab numeric vectors |
| @item @code{15} @tab @code{CPLXSXP} @tab complex vectors |
| @item @code{16} @tab @code{STRSXP} @tab character vectors |
| @item @code{17} @tab @code{DOTSXP} @tab dot-dot-dot object |
| @item @code{18} @tab @code{ANYSXP} @tab make ``any'' args work |
| @item @code{19} @tab @code{VECSXP} @tab list (generic vector) |
| @item @code{20} @tab @code{EXPRSXP} @tab expression vector |
| @item @code{21} @tab @code{BCODESXP} @tab byte code |
| @item @code{22} @tab @code{EXTPTRSXP} @tab external pointer |
| @item @code{23} @tab @code{WEAKREFSXP} @tab weak reference |
| @item @code{24} @tab @code{RAWSXP} @tab raw vector |
| @item @code{25} @tab @code{S4SXP} @tab S4 classes not of simple type |
| @end multitable |
| @end quotation |
| |
| @cindex atomic vector type |
| Many of these will be familiar from @R{} level: the atomic vector types |
| are @code{LGLSXP}, @code{INTSXP}, @code{REALSXP}, @code{CPLXSXP}, |
| @code{STRSXP} and @code{RAWSXP}. Lists are @code{VECSXP} and names |
| (also known as symbols) are @code{SYMSXP}. Pairlists (@code{LISTSXP}, |
| the name going back to the origins of @R{} as a Scheme-like language) |
| are rarely seen at @R{} level, but are for example used for argument |
| lists. Character vectors are effectively lists all of whose elements |
| are @code{CHARSXP}, a type that is rarely visible at @R{} level. |
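
The numbering rules above can be sketched in a few lines of C. This is
only an illustration: the @code{toy_} names are hypothetical and are not
declarations from the @R{} sources, and only a subset of the table is
reproduced.

```c
#include <stdbool.h>

/* Type codes copied from the table above (a subset).  The enum and its
   member names are illustrative, not R's own declarations. */
enum toy_sexptype {
    TOY_NILSXP = 0, TOY_LGLSXP = 10, TOY_INTSXP = 13, TOY_REALSXP = 14,
    TOY_CPLXSXP = 15, TOY_STRSXP = 16, TOY_VECSXP = 19, TOY_RAWSXP = 24,
    TOY_S4SXP = 25
};

/* The type field is five bits wide, so codes range over 0..31;
   0:10 and 13:25 are the ones the text lists as in use. */
bool toy_type_in_use(unsigned int code)
{
    return code <= 10 || (code >= 13 && code <= 25);
}

/* The atomic vector types named in the text. */
bool toy_is_atomic(unsigned int code)
{
    switch (code) {
    case TOY_LGLSXP: case TOY_INTSXP: case TOY_REALSXP:
    case TOY_CPLXSXP: case TOY_STRSXP: case TOY_RAWSXP:
        return true;
    default:
        return false;
    }
}
```

Note that a list (@code{VECSXP}) is a vector type but not an atomic one,
which is why it is absent from the @code{switch}.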
| |
| @cindex language object |
| @cindex argument list |
| Language objects (@code{LANGSXP}) are calls (including formulae and so |
| on). Internally they are pairlists with first element a |
| reference@footnote{a pointer to a function or a symbol to look up the |
| function by name, or a language object to be evaluated to give a |
| function.} to the function to be called with remaining elements the |
| actual arguments for the call (and with the tags if present giving the |
| specified argument names). Although this is not enforced, many places |
| in the code assume that the pairlist is of length one or more, often |
| without checking. |
| |
| @cindex expression |
| Expressions are of type @code{EXPRSXP}: they are a vector of (usually |
| language) objects most often seen as the result of @code{parse()}. |
| |
| @cindex function |
| The functions are of types @code{CLOSXP}, @code{SPECIALSXP} and |
| @code{BUILTINSXP}: where @code{SEXPTYPE}s are stored in an integer |
| these are sometimes lumped into a pseudo-type @code{FUNSXP} with code |
| 99. Functions defined via @code{function} are of type @code{CLOSXP} and |
| have formals, body and environment. |
| |
| @cindex S4 type |
| The @code{SEXPTYPE} @code{S4SXP} is for S4 objects which do not consist |
| solely of a simple type such as an atomic vector or function. |
| |
| |
| @node Rest of header, The 'data', SEXPTYPEs, SEXPs |
| @subsection Rest of header |
| |
| Note that the size and structure of the header changed in @R{} 3.5.0: |
| see earlier editions of this manual for the previous layout. |
| |
| The @code{sxpinfo} header is defined as a 64-bit C structure by |
| |
| @example |
| #define NAMED_BITS 16 |
| struct sxpinfo_struct @{ |
| SEXPTYPE type : 5; /* @r{discussed above} */ |
| unsigned int scalar: 1; /* @r{is this a numeric vector of length 1?} */ |
| unsigned int obj : 1; /* @r{is this an object with a class attribute?} */ |
| unsigned int alt : 1; /* @r{is this an @code{ALTREP} object?} */ |
| unsigned int gp : 16; /* @r{general purpose, see below} */ |
| unsigned int mark : 1; /* @r{mark object as `in use' in GC} */ |
| unsigned int debug : 1; |
| unsigned int trace : 1; |
| unsigned int spare : 1; /* @r{debug once and with reference counting} */ |
| unsigned int gcgen : 1; /* @r{generation for GC} */ |
| unsigned int gccls : 3; /* @r{class of node for GC} */ |
| unsigned int named : NAMED_BITS; /* @r{used to control copying} */ |
| unsigned int extra : 32 - NAMED_BITS; |
| @}; /* Tot: 64 */ |
| @end example |
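
As a quick check of the arithmetic, the field widths can be summed at
compile time. The sketch below copies the widths from the definition
above into a hypothetical @code{toy_sxpinfo} structure, using plain
@code{unsigned int} for the type field. Bit-field layout is
implementation-defined in C, so the 8-byte size is what common compilers
produce rather than something the language guarantees.

```c
#define TOY_NAMED_BITS 16

/* Same field widths as the definition above; the struct name is
   hypothetical and the type field is a plain unsigned int here. */
struct toy_sxpinfo {
    unsigned int type   : 5;
    unsigned int scalar : 1;
    unsigned int obj    : 1;
    unsigned int alt    : 1;
    unsigned int gp     : 16;
    unsigned int mark   : 1;
    unsigned int debug  : 1;
    unsigned int trace  : 1;
    unsigned int spare  : 1;
    unsigned int gcgen  : 1;
    unsigned int gccls  : 3;
    unsigned int named  : TOY_NAMED_BITS;
    unsigned int extra  : 32 - TOY_NAMED_BITS;
};

/* Sum of the declared widths, independent of how a compiler packs them. */
enum {
    TOY_SXPINFO_BITS = 5 + 1 + 1 + 1 + 16 + 1 + 1 + 1 + 1 + 1 + 3
                       + TOY_NAMED_BITS + (32 - TOY_NAMED_BITS)
};
```

The first eleven fields fill exactly 32 bits, so @code{named} and
@code{extra} start on a fresh 32-bit unit.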
| |
| @findex debug bit |
| The @code{debug} bit is used for closures and environments. For |
| closures it is set by @code{debug()} and unset by @code{undebug()}, and |
| indicates that evaluations of the function should be run under the |
| browser. For environments it indicates whether the browsing is in |
| single-step mode. |
| |
| @findex trace bit |
| The @code{trace} bit is used for functions for @code{trace()} and for |
| other objects when tracing duplications (see @code{tracemem}). |
| |
| @findex spare bit |
| The @code{spare} bit is used for closures to mark them for one-time |
| debugging. |
| |
| @findex named bits |
| @findex NAMED |
| @findex SET_NAMED |
| @cindex copying semantics |
| The @code{named} field is set and accessed by the @code{SET_NAMED} and |
| @code{NAMED} macros, and takes values @code{0}, @code{1} and @code{2}, or |
| possibly higher if @code{NAMEDMAX} is set to a higher value. |
| [The @code{NAMED} mechanism has been replaced by reference counting.] |
| @R{} has a `call by value' illusion, so an assignment like |
| @example |
| b <- a |
| @end example |
| |
| @noindent |
| appears to make a copy of @code{a} and refer to it as @code{b}. |
| However, if neither @code{a} nor @code{b} are subsequently altered there |
| is no need to copy. What really happens is that a new symbol @code{b} |
| is bound to the same value as @code{a} and the @code{named} field on the |
| value object is set (in this case to @code{2}). When an object is about |
| to be altered, the @code{named} field is consulted. A value of @code{2} |
| or more means that the object must be duplicated before being changed. (Note |
| that this does not say that it is necessary to duplicate, only that it |
| should be duplicated whether necessary or not.) A value of @code{0} |
| means that it is known that no other @code{SEXP} shares data with this |
| object, and so it may safely be altered. A value of @code{1} is used |
| for situations like |
| |
| @example |
| dim(a) <- c(7, 2) |
| @end example |
| |
| @noindent |
| where in principle two copies of @code{a} exist for the duration of the |
| computation as (in principle) |
| |
| @example |
| a <- `dim<-`(a, c(7, 2)) |
| @end example |
| |
| @noindent |
| but for no longer, and so some primitive functions can be optimized to |
| avoid a copy in this case. [This mechanism is scheduled to be replaced |
| in @R{} 4.0.0.] |
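
The duplicate-before-modify rule can be illustrated with a toy model.
This is a sketch under stated assumptions, not the code in the @R{}
sources: @code{toy_vec} and the @code{toy_} functions are hypothetical,
and only the decision on the @code{named} value is modelled.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a vector node: only the fields needed here. */
typedef struct {
    int named;      /* 0, 1 or 2, as described in the text */
    size_t length;
    int *data;
} toy_vec;

toy_vec *toy_alloc(size_t length)
{
    toy_vec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->data = calloc(length ? length : 1, sizeof *v->data);
    if (v->data == NULL) { free(v); return NULL; }
    v->named = 0;           /* nothing else shares a fresh value */
    v->length = length;
    return v;
}

void toy_free(toy_vec *v) { if (v) { free(v->data); free(v); } }

/* Modelled on `b <- a': a second symbol now shares the value. */
void toy_bind_second_symbol(toy_vec *v) { v->named = 2; }

/* Set element i, duplicating first when the value may be shared.
   Returns the object that was actually modified (the caller would
   rebind its symbol to it), or NULL on a bad index or failed allocation. */
toy_vec *toy_set_elt(toy_vec *v, size_t i, int x)
{
    if (i >= v->length) return NULL;
    if (v->named >= 2) {
        toy_vec *copy = toy_alloc(v->length);
        if (copy == NULL) return NULL;
        memcpy(copy->data, v->data, v->length * sizeof *v->data);
        v = copy;           /* the fresh copy has named == 0 */
    }
    v->data[i] = x;
    return v;
}
```

With @code{named} below 2 the element is changed in place and the same
pointer comes back; after @code{toy_bind_second_symbol} the call returns
a new object and leaves the original untouched.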
| |
| The @code{gp} bits are by definition `general purpose'. We label these |
| from 0 to 15. Bits 0--6 and bits 14--15 have been used as described below |
| (mainly from detective work on the sources). |
| |
| @findex gp bits |
| @findex LEVELS |
| @findex SETLEVELS |
| The bits can be accessed and set by the @code{LEVELS} and |
| @code{SETLEVELS} macros, which names appear to date back to the internal |
| factor and ordered types and are now used in only a few places in the |
| code. The @code{gp} field is serialized/unserialized for the |
| @code{SEXPTYPE}s other than @code{NILSXP}, @code{SYMSXP} and |
| @code{ENVSXP}. |
| |
| Bits 14 and 15 of @code{gp} are used for `fancy bindings'. Bit 14 is |
| used to lock a binding or an environment, and bit 15 is used to indicate |
| an active binding. (For the definition of an `active binding' see the |
| header comments in file @file{src/main/envir.c}.) Bit 15 is used for an |
| environment to indicate if it participates in the global cache. |
| |
| @findex ARGUSED |
| @findex SET_ARGUSED |
| The macros @code{ARGUSED} and @code{SET_ARGUSED} are used when matching |
| actual and formal function arguments, and take the values 0, 1 and 2. |
| |
| @findex MISSING |
| @findex SET_MISSING |
| The macros @code{MISSING} and @code{SET_MISSING} are used for pairlists |
| of arguments. Four bits are reserved, but only two are used (and |
| exactly what for is not explained). It seems that bit 0 is used by |
| @code{matchArgs_NR} to mark missingness on the returned argument list, and |
| bit 1 is used to mark the use of a default value for an argument copied |
| to the evaluation frame of a closure. |
| |
| @findex DDVAL |
| @findex SET_DDVAL |
| @cindex ... argument |
| Bit 0 is used by macros @code{DDVAL} and @code{SET_DDVAL}. This |
| indicates that a @code{SYMSXP} is one of the symbols @code{..n} which |
| are implicitly created when @code{...} is processed, and so indicates |
| that it may need to be looked up in a @code{DOTSXP}. |
| |
| @findex PRSEEN |
| @cindex promise |
| Bit 0 is used for @code{PRSEEN}, a flag to indicate if a promise has |
| already been seen during the evaluation of the promise (and so to avoid |
| recursive loops). |
| |
| Bit 0 is used for @code{HASHASH}, on the @code{PRINTNAME} of the |
| @code{TAG} of the frame of an environment. (This bit is not serialized |
| for @code{CHARSXP} objects.) |
| |
| Bits 0 and 1 are used for weak references (to indicate `ready to |
| finalize', `finalize on exit'). |
| |
| Bit 0 is used by the condition handling system (on a @code{VECSXP}) to |
| indicate a calling handler. |
| |
| Bit 4 is turned on to mark S4 objects. |
| |
| Bits 1, 2, 3, 5 and 6 are used for a @code{CHARSXP} to denote its |
| encoding. Bit 1 indicates that the @code{CHARSXP} should be treated as |
| a set of bytes, not necessarily representing a character in any known |
| encoding. Bits 2, 3 and 6 are used to indicate that it is known to be |
| in Latin-1, UTF-8 or @acronym{ASCII} respectively. |
| |
| Bit 5 for a @code{CHARSXP} indicates that it is hashed by its address, |
| that is, it is @code{NA_STRING} or is in the @code{CHARSXP} cache (this is not |
| serialized). Only exceptionally is a @code{CHARSXP} not hashed, and |
| this should never happen in end-user code. |
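
The @code{CHARSXP} bits just described can be written as masks on the
@code{gp} field. In this sketch the bit positions follow the text, but
the @code{TOY_} mask names, the order in which the bits are tested and
the returned labels are assumptions made for illustration.

```c
/* Bit positions follow the text above; the names are illustrative. */
#define TOY_BYTES_MASK  (1u << 1)
#define TOY_LATIN1_MASK (1u << 2)
#define TOY_UTF8_MASK   (1u << 3)
#define TOY_CACHED_MASK (1u << 5)
#define TOY_ASCII_MASK  (1u << 6)

/* Describe the encoding recorded in a CHARSXP's gp field.
   The test order is an assumption of this sketch. */
const char *toy_charsxp_encoding(unsigned int gp)
{
    if (gp & TOY_BYTES_MASK)  return "bytes";
    if (gp & TOY_LATIN1_MASK) return "latin1";
    if (gp & TOY_UTF8_MASK)   return "UTF-8";
    if (gp & TOY_ASCII_MASK)  return "ASCII";
    return "unmarked";
}

/* Bit 5: hashed by address (NA_STRING or in the CHARSXP cache). */
int toy_charsxp_is_cached(unsigned int gp)
{
    return (gp & TOY_CACHED_MASK) != 0;
}
```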
| |
| @node The 'data', Allocation classes, Rest of header, SEXPs |
| @subsection The `data' |
| |
| A @code{SEXPREC} is a C structure containing the 64-bit header as |
| described above, three pointers (to the attributes, previous and next |
| node) and the node data, a union |
| |
| @example |
| union @{ |
| struct primsxp_struct primsxp; |
| struct symsxp_struct symsxp; |
| struct listsxp_struct listsxp; |
| struct envsxp_struct envsxp; |
| struct closxp_struct closxp; |
| struct promsxp_struct promsxp; |
| @} u; |
| @end example |
| |
| @noindent |
| All of these alternatives apart from the first (an @code{int}) are three |
| pointers, so the union occupies three words. |
| |
| @cindex vector type |
| The vector types are @code{RAWSXP}, @code{CHARSXP}, @code{LGLSXP}, |
| @code{INTSXP}, @code{REALSXP}, @code{CPLXSXP}, @code{STRSXP}, |
| @code{VECSXP}, @code{EXPRSXP} and @code{WEAKREFSXP}. Remember that such |
| types are a @code{VECTOR_SEXPREC}, which again consists of the header |
| and the same three pointers, but followed by two integers giving the |
| length and `true length'@footnote{The current uses are for hash tables of |
| environments (@code{VECSXP}s), where @code{length} is the size of the table |
| and @code{truelength} is the number of primary slots in use; for the |
| reference hash tables in serialization (@code{VECSXP}s); and for `growable' |
| vectors (atomic vectors, @code{VECSXP}s and @code{EXPRSXP}s, which are |
| created by slightly over-committing when enlarging a vector during |
| subassignment, so that some number of the following enlargements during |
| subassignment can be performed in place), where @code{truelength} is the |
| number of slots in use.} of the vector, and then followed by the data |
| (aligned as required: on most 32-bit systems with a 24-byte |
| @code{VECTOR_SEXPREC} node the data can follow immediately after the node). |
| The data are a block of memory of the appropriate length to store `true |
| length' elements (rounded up to a multiple of 8 bytes, with the 8-byte |
| blocks being the `Vcells' referred to in the documentation for @code{gc()}). |
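
The rounding to 8-byte blocks amounts to a ceiling division. A minimal
sketch, with a hypothetical function name and no overflow checking:

```c
#include <stddef.h>

/* Number of 8-byte Vcells needed to hold n elements of elt_size bytes,
   rounding up as described in the text.  Overflow of n * elt_size is
   not checked in this sketch. */
size_t toy_vcells(size_t n, size_t elt_size)
{
    size_t bytes = n * elt_size;
    return (bytes + 7) / 8;
}
```

For instance, three 4-byte integers need 12 bytes and so occupy two
Vcells.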
| |
| The `data' for the various types are given in the table below. A lot of |
| this is interpretation, i.e.@: the types are not checked. |
| |
| @table @code |
| @item NILSXP |
| There is only one object of type @code{NILSXP}, @code{R_NilValue}, with |
| no data. |
| |
| @item SYMSXP |
| Pointers to three nodes, the name, value and internal, accessed by |
| @code{PRINTNAME} (a @code{CHARSXP}), @code{SYMVALUE} and |
| @code{INTERNAL}. (If the symbol's value is a @code{.Internal} function, |
| the last is a pointer to the appropriate @code{SEXPREC}.) Many symbols |
| have @code{SYMVALUE} @code{R_UnboundValue}. |
| |
| @item LISTSXP |
| Pointers to the CAR, CDR (usually a @code{LISTSXP} or @code{NULL}) and |
| TAG (a @code{SYMSXP} or @code{NULL}). |
| |
| @item CLOSXP |
| Pointers to the formals (a pairlist), the body and the environment. |
| |
| @item ENVSXP |
| Pointers to the frame, enclosing environment and hash table (@code{NULL} or a |
| @code{VECSXP}). A frame is a tagged pairlist with tag the symbol and |
| CAR the bound value. |
| |
| @item PROMSXP |
| Pointers to the value, expression and environment (in which to evaluate |
| the expression). Once a promise has been evaluated, the environment is |
| set to @code{NULL}. |
| |
| @item LANGSXP |
| A special type of @code{LISTSXP} used for function calls. (The CAR |
| references the function (perhaps via a symbol or language object), and |
| the CDR the argument list with tags for named arguments.) @R{}-level |
| documentation references to `expressions' / `language objects' are |
| mainly @code{LANGSXP}s, but can be symbols (@code{SYMSXP}s) or |
| expression vectors (@code{EXPRSXP}s). |
| |
| @item SPECIALSXP |
| @itemx BUILTINSXP |
| An integer giving the offset into the table of |
| primitives/@code{.Internal}s. |
| |
| @item CHARSXP |
| @code{length}, @code{truelength} followed by a block of bytes (allowing |
| for the @code{nul} terminator). |
| |
| @item LGLSXP |
| @itemx INTSXP |
| @code{length}, @code{truelength} followed by a block of C @code{int}s |
| (which are 32 bits on all @R{} platforms). |
| |
| @item REALSXP |
| @code{length}, @code{truelength} followed by a block of C @code{double}s. |
| |
| @item CPLXSXP |
| @code{length}, @code{truelength} followed by a block of C99 @code{double |
| complex}s. |
| |
| @item STRSXP |
| @code{length}, @code{truelength} followed by a block of pointers |
| (@code{SEXP}s pointing to @code{CHARSXP}s). |
| |
| @item DOTSXP |
| A special type of @code{LISTSXP} for the value bound to a @code{...} |
| symbol: a pairlist of promises. |
| |
| @item ANYSXP |
| This is used as a place holder for any type: there are no actual objects |
| of this type. |
| |
| @item VECSXP |
| @itemx EXPRSXP |
| @code{length}, @code{truelength} followed by a block of pointers. These |
| are internally identical (and identical to @code{STRSXP}) but differ in |
| the interpretations placed on the elements. |
| |
| @item BCODESXP |
| For the `byte-code' objects generated by the compiler. |
| |
| @item EXTPTRSXP |
| Has three pointers, to the pointer, the protection value (an @R{} object |
| which if alive protects this object) and a tag (a @code{SYMSXP}?). |
| |
| @item WEAKREFSXP |
| A @code{WEAKREFSXP} is a special @code{VECSXP} of length 4, with |
| elements @samp{key}, @samp{value}, @samp{finalizer} and @samp{next}. |
| The @samp{key} is @code{NULL}, an environment or an external pointer, |
| and the @samp{finalizer} is a function or @code{NULL}. |
| |
| @item RAWSXP |
| @code{length}, @code{truelength} followed by a block of bytes. |
| |
| @item S4SXP |
| Two unused pointers and a tag. |
| @end table |
| |
| @node Allocation classes, , The 'data', SEXPs |
| @subsection Allocation classes |
| |
| @cindex allocation classes |
| As we have seen, the field @code{gccls} in the header is three bits to |
| label up to 8 classes of nodes. Non-vector nodes are of class 0, and |
| `small' vector nodes are of classes 1 to 5; class 6 is for vector nodes |
| allocated by a custom allocator, and `large' vector nodes are of class 7. The |
| `small' vector nodes are able to store vector data of up to 8, 16, 32, |
| 64 and 128 bytes: larger vectors are @code{malloc}-ed individually |
| whereas the `small' nodes are allocated from pages of about 2000 |
| bytes. Vector nodes allocated using custom allocators (via |
| @code{allocVector3}) are not counted in the gc memory usage statistics |
| since their memory semantics is not under @R{}'s control and may be |
| non-standard (e.g., memory could be partially shared across nodes). |
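
The mapping from data size to class can be sketched as follows. The
class numbers and byte limits are taken from the paragraph above; the
function name and its flag argument are hypothetical, and the real
allocator may compute its limits differently.

```c
#include <stddef.h>

/* Return the allocation class for a vector node holding data_bytes of
   data.  Non-vector nodes (class 0) are outside this sketch. */
int toy_vector_gc_class(size_t data_bytes, int custom_allocator)
{
    /* Upper limits for the `small' classes 1 to 5, from the text. */
    static const size_t small_limit[] = { 8, 16, 32, 64, 128 };

    if (custom_allocator)
        return 6;
    for (int k = 0; k < 5; k++)
        if (data_bytes <= small_limit[k])
            return k + 1;
    return 7;   /* `large': allocated individually */
}
```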
| |
| |
| @node Environments and variable lookup, Attributes, SEXPs, R Internal Structures |
| @section Environments and variable lookup |
| |
| @cindex environment |
| @cindex variable lookup |
| What users think of as `variables' are symbols which are bound to |
| objects in `environments'. The word `environment' is used ambiguously |
| in @R{} to mean @emph{either} the frame of an @code{ENVSXP} (a pairlist |
| of symbol-value pairs) @emph{or} an @code{ENVSXP}, a frame plus an |
| enclosure. |
| |
| @cindex user databases |
| There are additional places that `variables' can be looked up, called |
| `user databases' in comments in the code. These seem undocumented in |
| the @R{} sources, but apparently refer to the @pkg{RObjectTable} package |
| formerly available at @uref{https://www.omegahat.net/RObjectTables/}. |
| |
| @cindex base environment |
| @cindex environment, base |
| The base environment is special. There is an @code{ENVSXP} environment |
| with enclosure the empty environment @code{R_EmptyEnv}, but the frame of |
| that environment is not used. Rather its bindings are part of the |
| global symbol table, being those symbols in the global symbol table |
| whose values are not @code{R_UnboundValue}. When @R{} is started the |
| internal functions are installed (by C code) in the symbol table, with |
| primitive functions having values and @code{.Internal} functions having |
| what would be their values in the field accessed by the @code{INTERNAL} |
| macro. Then @code{.Platform} and @code{.Machine} are computed and the |
| base package is loaded into the base environment followed by the system |
| profile. |
| |
| The frames of environments (and the symbol table) are normally hashed |
| for faster access (including insertion and deletion). |
| |
| By default @R{} maintains a (hashed) global cache of `variables' (that |
| is symbols and their bindings) which have been found, and this refers |
| only to environments which have been marked to participate, which |
| consists of the global environment (aka the user workspace), the base |
| environment plus environments@footnote{Remember that attaching a list or |
| a saved image actually creates and populates an environment and attaches |
| that.} which have been @code{attach}ed. When an environment is either |
| @code{attach}ed or @code{detach}ed, the names of its symbols are flushed |
| from the cache. The cache is used whenever searching for variables from |
| the global environment (possibly as part of a recursive search). |
| |
| @menu |
| * Search paths:: |
| * Namespaces:: |
| * Hash table:: |
| @end menu |
| |
| @node Search paths, Namespaces, Environments and variable lookup, Environments and variable lookup |
| @subsection Search paths |
| |
| @cindex search path |
| @Sl{} has the notion of a `search path': the lookup for a `variable' |
| leads (possibly through a series of frames) to the `session frame', the |
| `working directory', and then along the search path. The search path is |
| a series of databases (as returned by @code{search()}) which contain the |
| system functions (but not necessarily at the end of the path, as by |
| default the equivalent of packages are added at the end). |
| |
| @R{} has a variant on the @Sl{} model. There is a search path (also |
| returned by @code{search()}) which consists of the global environment |
| (aka user workspace) followed by environments which have been attached |
| and finally the base environment. Note that unlike @Sl{} it is not |
| possible to attach environments before the workspace nor after the base |
| environment. |
| |
| However, the notion of variable lookup is more general in @R{}, hence |
| the plural in the title of this subsection. Since environments have |
| enclosures, from any environment there is a search path found by looking |
| in the frame, then the frame of its enclosure and so on. Since loops |
| are not allowed, this process will eventually terminate: it can |
| terminate at either the base environment or the empty environment. (It |
| can be conceptually simpler to think of the search always terminating at |
| the empty environment, but with an optimization to stop at the base |
| environment.) So the `search path' describes the chain of environments |
| which is traversed once the search reaches the global environment. |
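
The frame-then-enclosure search can be modelled with a toy linked
structure. This is a sketch only: the types and @code{toy_find_var} are
hypothetical, frames are unhashed lists, values are plain integers, and
a @code{NULL} enclosure stands in for the empty environment.

```c
#include <stddef.h>
#include <string.h>

/* One symbol-value pair in a frame. */
typedef struct toy_binding {
    const char *name;
    int value;
    struct toy_binding *next;
} toy_binding;

/* A frame plus an enclosure. */
typedef struct toy_env {
    toy_binding *frame;
    struct toy_env *enclos;   /* NULL plays the role of the empty environment */
} toy_env;

/* Look in the frame, then the frame of the enclosure, and so on.
   Returns a pointer to the value, or NULL once the chain is exhausted. */
const int *toy_find_var(const toy_env *env, const char *name)
{
    for (; env != NULL; env = env->enclos)
        for (const toy_binding *b = env->frame; b != NULL; b = b->next)
            if (strcmp(b->name, name) == 0)
                return &b->value;
    return NULL;
}
```

Because the nearest frame is searched first, a binding in a function's
frame masks one of the same name further along the chain.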
| |
| @node Namespaces, Hash table, Search paths, Environments and variable lookup |
| @subsection Namespaces |
| |
| @cindex namespace |
| Namespaces are environments associated with packages (and once again |
| the base package is special and will be considered separately). A |
| package @code{@var{pkg}} defines two environments |
| @code{namespace:@var{pkg}} and @code{package:@var{pkg}}: it is |
| @code{package:@var{pkg}} that can be @code{attach}ed and form part of |
| the search path. |
| |
| The objects defined by the @R{} code in the package are symbols with |
| bindings in the @code{namespace:@var{pkg}} environment. The |
| @code{package:@var{pkg}} environment is populated by selected symbols |
| from the @code{namespace:@var{pkg}} environment (the exports). The |
| enclosure of the @code{namespace:@var{pkg}} environment is an environment |
| populated with the explicit imports from other namespaces, and the enclosure of |
| @emph{that} environment is the base namespace. (So the illusion of the |
| imports being in the namespace environment is created via the |
| environment tree.) The enclosure of the base namespace is the global |
| environment, so the search from a package namespace goes via the |
| (explicit and implicit) imports to the standard `search path'. |
| |
| @cindex base namespace |
| @cindex namespace, base |
| @findex R_BaseNamespace |
| The base namespace environment @code{R_BaseNamespace} is another |
| @code{ENVSXP} that is special-cased. It is effectively the same thing |
| as the base environment @code{R_BaseEnv} @emph{except} that its |
| enclosure is the global environment rather than the empty environment: |
| the internal code diverts lookups in its frame to the global symbol |
| table. |
| |
| @node Hash table, , Namespaces, Environments and variable lookup |
| @subsection Hash table |
| |
| Environments in @R{} usually have a hash table, and nowadays that is the |
| default in @code{new.env()}. It is stored as a @code{VECSXP} where |
| @code{length} is used for the allocated size of the table and |
| @code{truelength} is the number of primary slots in use---the pointer to |
| the @code{VECSXP} is part of the header of a @code{SEXP} of type |
| @code{ENVSXP}, and this points to @code{R_NilValue} if the environment |
| is not hashed. |
| |
| For the pros and cons of hashing, see a basic text on Computer Science. |
| |
| The code to implement hashed environments is in @file{src/main/envir.c}. |
| Unless set otherwise (e.g.@: by the @code{size} argument of |
| @code{new.env()}) the initial table size is @code{29}. The table will |
| be resized by a factor of 1.2 once the load factor (the proportion of |
| primary slots in use) reaches 85%. |
| |
| The hash chains are stored as pairlist elements of the @code{VECSXP}: |
| items are inserted at the front of the pairlist. Hashing is principally |
| designed for fast searching of environments, which are from time to time |
| added to but rarely deleted from, so items are not actually deleted but |
| have their value set to @code{R_UnboundValue}. |
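
The resizing policy can be expressed with the three constants quoted
above. In this sketch the @code{TOY_} names are hypothetical, and two
details are assumptions: that the load-factor comparison is strict, and
that the new size is truncated to an integer.

```c
#define TOY_HASH_INITIAL_SIZE 29
#define TOY_HASH_GROWTH_RATE  1.2
#define TOY_HASH_LOAD_LIMIT   0.85

/* Non-zero when the proportion of primary slots in use exceeds the
   limit.  Whether the boundary case counts is an assumption. */
int toy_hash_needs_resize(int primary_slots_in_use, int size)
{
    return primary_slots_in_use > size * TOY_HASH_LOAD_LIMIT;
}

/* Size of the table after one resize; truncation is an assumption. */
int toy_hash_grown_size(int size)
{
    return (int)(size * TOY_HASH_GROWTH_RATE);
}
```

Starting from the default size of 29, a resize is triggered once 25
primary slots are in use.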
| |
| |
| @node Attributes, Contexts, Environments and variable lookup, R Internal Structures |
| @section Attributes |
| |
| @cindex attributes |
| @findex ATTRIB |
| @findex SET_ATTRIB |
| @findex DUPLICATE_ATTRIB |
| As we have seen, every @code{SEXPREC} has a pointer to the attributes of |
| the node (default @code{R_NilValue}). The attributes can be |
| accessed/set by the macros/functions @code{ATTRIB} and |
| @code{SET_ATTRIB}, but such direct access is normally only used to check |
| if the attributes are @code{NULL} or to reset them. Otherwise access |
| goes through the functions @code{getAttrib} and @code{setAttrib} which |
| impose restrictions on the attributes. One thing to watch is that if |
| you copy attributes from one object to another you may (un)set the |
| @code{"class"} attribute and so need to copy the object and S4 bits as |
| well. There is a macro/function @code{DUPLICATE_ATTRIB} to automate |
| this. |
| |
| Note that the `attributes' of a @code{CHARSXP} are used as part of the |
| management of the @code{CHARSXP} cache: of course @code{CHARSXP}s are |
| not user-visible but C-level code might look at their attributes. |
| |
| The code assumes that the attributes of a node are either |
| @code{R_NilValue} or a pairlist of non-zero length (and this is checked |
| by @code{SET_ATTRIB}). The attributes are named (via tags on the |
| pairlist). The replacement function @code{attributes<-} ensures that |
| @code{"dim"} precedes @code{"dimnames"} in the pairlist. Attribute |
| @code{"dim"} is one of several that are treated specially: the values are |
| checked, and any @code{"names"} and @code{"dimnames"} attributes are |
| removed. Similarly, you cannot set @code{"dimnames"} without having set |
| @code{"dim"}, and the value assigned must be a list of the correct |
| length and with elements of the correct lengths (and all zero-length |
| elements are replaced by @code{NULL}). |
| |
| The other attributes which are given special treatment are |
| @code{"names"}, @code{"class"}, @code{"tsp"}, @code{"comment"} and |
| @code{"row.names"}. For pairlist-like objects the names are not stored |
| as an attribute but (as symbols) as the tags: however the @R{} interface |
| makes them look like conventional attributes, and for one-dimensional |
| arrays they are stored as the first element of the @code{"dimnames"} |
| attribute. The C code ensures that the @code{"tsp"} attribute is a |
| @code{REALSXP}, the frequency is positive and the implied length agrees |
| with the number of rows of the object being assigned to. Classes and |
| comments are restricted to character vectors, and assigning a |
| zero-length comment or class removes the attribute. Setting or removing |
| a @code{"class"} attribute sets the object bit appropriately. Integer |
| row names are converted to and from the internal compact representation. |
| |
| @cindex copying semantics |
| Care needs to be taken when adding attributes to objects of the types |
| with non-standard copying semantics. There is only one object of type |
| @code{NILSXP}, @code{R_NilValue}, and that should never have attributes |
| (and this is enforced in @code{installAttrib}). For environments, |
| external pointers and weak references, the attributes should be relevant |
| to all uses of the object: it is for example reasonable to have a name |
| for an environment, and also a @code{"path"} attribute for those |
| environments populated from @R{} code in a package. |
| |
| @cindex attributes, preserving |
| @cindex preserving attributes |
| When should attributes be preserved under operations on an object? |
| Becker, Chambers & Wilks (1988, pp. 144--6) give some guidance. Scalar |
| functions (those which operate element-by-element on a vector and whose |
| output is similar to the input) should preserve attributes (except |
| perhaps class, and if they do preserve class they need to preserve the |
| @code{OBJECT} and S4 bits). Binary operations normally call |
| @findex copyMostAttrib |
| @code{copyMostAttrib} to copy most attributes from the longer |
| argument (and if they are of the same length from both, preferring the |
| values on the first). Here `most' means all except the @code{names}, |
| @code{dim} and @code{dimnames} which are set appropriately by the code |
| for the operator. |
| |
| Subsetting (other than by an empty index) generally drops all attributes |
| except @code{names}, @code{dim} and @code{dimnames} which are reset as |
| appropriate. On the other hand, subassignment generally preserves such |
| attributes even if the length is changed. Coercion drops all |
| attributes. For example: |
| |
| @example |
| > x <- structure(1:8, names=letters[1:8], comm="a comment") |
| > x[] |
| a b c d e f g h |
| 1 2 3 4 5 6 7 8 |
| attr(,"comm") |
| [1] "a comment" |
| > x[1:3] |
| a b c |
| 1 2 3 |
| > x[3] <- 3 |
| > x |
| a b c d e f g h |
| 1 2 3 4 5 6 7 8 |
| attr(,"comm") |
| [1] "a comment" |
| > x[9] <- 9 |
| > x |
| a b c d e f g h |
| 1 2 3 4 5 6 7 8 9 |
| attr(,"comm") |
| [1] "a comment" |
| @end example |
| |
| |
| @node Contexts, Argument evaluation, Attributes, R Internal Structures |
| @section Contexts |
| |
| @cindex context |
| @emph{Contexts} are the internal mechanism used to keep track of where a |
| computation has got to (and from where), so that control-flow constructs |
| can work and reasonable information can be produced on error conditions |
| (such as @emph{via} traceback), and otherwise (the @code{sys.@var{xxx}} |
| functions). |
| |
| Execution contexts are a stack of C @code{structs}: |
| |
| @example |
| typedef struct RCNTXT @{ |
| struct RCNTXT *nextcontext; /* @r{The next context up the chain} */ |
| int callflag; /* @r{The context `type'} */ |
| JMP_BUF cjmpbuf; /* @r{C stack and register information} */ |
| int cstacktop; /* @r{Top of the pointer protection stack} */ |
| int evaldepth; /* @r{Evaluation depth at inception} */ |
| SEXP promargs; /* @r{Promises supplied to closure} */ |
| SEXP callfun; /* @r{The closure called} */ |
| SEXP sysparent; /* @r{Environment the closure was called from} */ |
| SEXP call; /* @r{The call that effected this context} */ |
| SEXP cloenv; /* @r{The environment} */ |
| SEXP conexit; /* @r{Interpreted @code{on.exit} code} */ |
| void (*cend)(void *); /* @r{C @code{on.exit} thunk} */ |
| void *cenddata; /* @r{Data for C @code{on.exit} thunk} */ |
| char *vmax; /* @r{Top of the @code{R_alloc} stack} */ |
| int intsusp; /* @r{Interrupts are suspended} */ |
| SEXP handlerstack; /* @r{Condition handler stack} */ |
| SEXP restartstack; /* @r{Stack of available restarts} */ |
| struct RPRSTACK *prstack; /* @r{Stack of pending promises} */ |
| @} RCNTXT, *context; |
| @end example |
| |
| @noindent |
| plus additional fields for the byte-code compiler. The `types' |
| are from |
| |
| @example |
| enum @{ |
| CTXT_TOPLEVEL = 0, /* @r{toplevel context} */ |
| CTXT_NEXT = 1, /* @r{target for @code{next}} */ |
| CTXT_BREAK = 2, /* @r{target for @code{break}} */ |
| CTXT_LOOP = 3, /* @r{@code{break} or @code{next} target} */ |
| CTXT_FUNCTION = 4, /* @r{function closure} */ |
| CTXT_CCODE = 8, /* @r{other functions that need error cleanup} */ |
| CTXT_RETURN = 12, /* @r{@code{return()} from a closure} */ |
| CTXT_BROWSER = 16, /* @r{return target on exit from browser} */ |
| CTXT_GENERIC = 20, /* @r{rather, running an S3 method} */ |
| CTXT_RESTART = 32, /* @r{a call to @code{restart} was made from a closure} */ |
| CTXT_BUILTIN = 64 /* @r{builtin internal function} */ |
| @}; |
| @end example |
| |
| @noindent |
| where the @code{CTXT_FUNCTION} bit is on wherever function closures are |
| involved. |
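| |
| The bit test can be sketched in C as follows. This is only an |
| illustration: the enum repeats the values listed above, and the helper |
| @code{involves_closure} is a hypothetical name, not a function in the |
| @R{} sources. |
| |
```c
#include <stdbool.h>

/* Context type codes, copied from the enum shown above. */
enum {
    CTXT_TOPLEVEL = 0, CTXT_NEXT = 1, CTXT_BREAK = 2, CTXT_LOOP = 3,
    CTXT_FUNCTION = 4, CTXT_CCODE = 8, CTXT_RETURN = 12,
    CTXT_BROWSER = 16, CTXT_GENERIC = 20, CTXT_RESTART = 32,
    CTXT_BUILTIN = 64
};

/* Hypothetical helper: true if the context type has the
   CTXT_FUNCTION bit (value 4) set. */
static bool involves_closure(int callflag)
{
    return (callflag & CTXT_FUNCTION) != 0;
}
```
| |
| With these values @code{CTXT_RETURN} (12) and @code{CTXT_GENERIC} (20) |
| both have the bit set, whereas @code{CTXT_CCODE} (8) and |
| @code{CTXT_BROWSER} (16) do not. |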
| |
| Contexts are created by a call to @code{begincontext} and ended by a |
| call to @code{endcontext}: code can search up the stack for a |
| particular type of context via @code{findcontext} (and jump there) or |
| jump to a specific context via @code{R_JumpToContext}. |
| @code{R_ToplevelContext} is the `idle' state (normally the command |
| prompt), and @code{R_GlobalContext} is the top of the stack. |
| |
| Note that whilst calls to closures set a context, internal functions never |
| do and primitive builtins only set it when profiling or when they are |
| interfaces to foreign functions. |
| |
| The byte-code compiler generates a map of instructions to source references |
| and expressions at compile time, which allows information on error |
| conditions to be produced. As an optimization, the byte-code interpreter |
| then does not set a context in some cases, such as in simple loops or when |
| inlining simple builtins or wrappers for internal functions. |
| |
| @findex UseMethod |
| @cindex method dispatch |
| Dispatching from an S3 generic (via @code{UseMethod} or its internal |
| equivalent) or calling @code{NextMethod} sets the context type to |
| @code{CTXT_GENERIC}. This is used to set the @code{sysparent} of the |
| method call to that of the @code{generic}, so the method appears to have |
| been called in place of the generic rather than from the generic. |
| |
| The @R{} @code{sys.frame} and @code{sys.call} functions work by counting |
| calls to closures (type @code{CTXT_FUNCTION}) from either end of the |
| context stack. |
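| |
| The counting can be illustrated with a toy context chain. The sketch |
| below is not the code used by @R{}: @code{toy_cntxt}, |
| @code{frame_depth} and @code{nth_frame} are hypothetical names, and only |
| the two fields needed for counting are modelled. |
| |
```c
#include <stddef.h>

#define CTXT_TOPLEVEL 0
#define CTXT_FUNCTION 4

/* Toy stand-in for RCNTXT: only the fields needed for counting. */
typedef struct toy_cntxt {
    struct toy_cntxt *nextcontext;  /* next context up the chain */
    int callflag;                   /* context type */
} toy_cntxt;

/* Number of closure contexts between cptr (the top of the stack)
   and the top-level context. */
static int frame_depth(const toy_cntxt *cptr)
{
    int n = 0;
    for (; cptr != NULL && cptr->callflag != CTXT_TOPLEVEL;
         cptr = cptr->nextcontext)
        if (cptr->callflag & CTXT_FUNCTION)
            n++;
    return n;
}

/* The n-th closure context counted from the top-level end (n >= 1),
   or NULL if there is no such frame. */
static const toy_cntxt *nth_frame(const toy_cntxt *cptr, int n)
{
    int depth = frame_depth(cptr);
    if (n < 1 || n > depth)
        return NULL;
    int skip = depth - n;  /* closure contexts to pass over from the top */
    for (; cptr != NULL; cptr = cptr->nextcontext) {
        if (cptr->callflag & CTXT_FUNCTION) {
            if (skip == 0)
                return cptr;
            skip--;
        }
    }
    return NULL;
}
```
| |
| Contexts of other types (for example @code{CTXT_CCODE}) sit in the chain |
| but are skipped by both functions, so they do not affect frame numbers. |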
| |
| Note that the @code{sysparent} element of the structure is not the same |
| thing as @code{sys.parent()}. Element @code{sysparent} is primarily |
| used in managing changes of the function being evaluated, i.e.@: by |
| @code{Recall} and method dispatch. |
| |
| @code{CTXT_CCODE} contexts are currently used in @code{cat()}, |
| @code{load()}, @code{scan()} and @code{write.table()} (to close the |
| connection on error), by @code{PROTECT}, serialization (to recover from |
| errors, e.g.@: free buffers) and within the error handling code (to |
| raise the C stack limit and reset some variables). |
| |
| |
| @node Argument evaluation, Autoprinting, Contexts, R Internal Structures |
| @section Argument evaluation |
| |
| @cindex argument evaluation |
| As we have seen, functions in @R{} come in three types, closures |
| (@code{SEXPTYPE} @code{CLOSXP}), specials (@code{SPECIALSXP}) and |
| builtins (@code{BUILTINSXP}). In this section we consider when (and if) |
| the actual arguments of function calls are evaluated. The rules are |
| different for the internal (special/builtin) and @R{}-level functions |
| (closures). |
| |
| For a call to a closure, the actual and formal arguments are matched and |
| a matched call (another @code{LANGSXP}) is constructed. This process |
| first replaces the actual argument list by a list of promises to the |
| values supplied. It then constructs a new environment which contains |
| the names of the formal parameters matched to actual or default values: |
| all the matched values are promises, the defaults as promises to be |
| evaluated in the environment just created. That environment is then |
| used for the evaluation of the body of the function, and promises will |
| be forced (and hence actual or default arguments evaluated) when they |
| are encountered. |
| @findex NAMED |
| (Evaluating a promise sets @code{NAMED = NAMEDMAX} on its value, so if the |
| argument was a symbol its binding is regarded as having multiple |
| references during the evaluation of the closure call.) |
| [The @code{NAMED} mechanism has been replaced by reference counting.] |
| |
| If the closure is an S3 generic (that is, contains a call to |
| @code{UseMethod}) the evaluation process is the same until the |
| @code{UseMethod} call is encountered. At that point the argument on |
| which to do dispatch (normally the first) will be evaluated if it has |
| not been already. If a method has been found which is a closure, a new |
| evaluation environment is created for it containing the matched |
| arguments of the method plus any new variables defined so far during the |
| evaluation of the body of the generic. (Note that this means changes to |
| the values of the formal arguments in the body of the generic are |
| discarded when calling the method, but @emph{actual} argument promises |
| which have been forced retain the values found when they were forced. |
| On the other hand, missing arguments have values which are promises to |
| use the default supplied by the method and not by the generic.) If the |
| method found is a primitive it is called with the matched argument list |
| of promises (possibly already forced) used for the generic. |
| |
| @cindex builtin function |
| @cindex special function |
| @cindex primitive function |
| @cindex .Internal function |
| The essential difference@footnote{There is currently one other |
| difference: when profiling builtin functions are counted as function |
| calls but specials are not.} between special and builtin functions is |
| that the arguments of specials are not evaluated before the C code is |
| called, and those of builtins are. Note that being a special/builtin is |
| separate from being primitive or @code{.Internal}: @code{quote} is a |
| special primitive, @code{+} is a builtin primitive, @code{cbind} is a |
| special @code{.Internal} and @code{grep} is a builtin @code{.Internal}. |
| |
| @cindex generic, internal |
| @findex DispatchOrEval |
| Many of the internal functions are internal generics, which for specials |
| means that they do not evaluate their arguments on call, but the C code |
| starts with a call to @code{DispatchOrEval}. The latter evaluates the |
| first argument, and looks for a method based on its class. (If S4 |
| dispatch is on, S4 methods are looked for first, even for S3 classes.) |
| If it finds a method, it dispatches to that method with a call based on |
| promises to evaluate the remaining arguments. If no method is found, |
| the remaining arguments are evaluated before return to the internal |
| generic. |
| |
| @cindex generic, group |
| @findex DispatchGroup |
| The other way that internal functions can be generic is to be group |
| generic. Most such functions are builtins (so immediately evaluate all |
| their arguments), and all contain a call to the C function |
| @code{DispatchGroup}. There are some peculiarities over the number of |
| arguments for the @code{"Math"} group generic: some members allow only |
| one argument, some have two (with a default for the second), and |
| @code{trunc} allows one or more although its default method only |
| accepts one. |
| |
| @menu |
| * Missingness:: |
| * Dot-dot-dot arguments:: |
| @end menu |
| |
| @node Missingness, Dot-dot-dot arguments, Argument evaluation, Argument evaluation |
| @subsection Missingness |
| |
| @cindex missingness |
| Actual arguments to (non-internal) @R{} functions can be fewer than are |
| required to match the formal arguments of the function. Having |
| unmatched formal arguments will not matter if the argument is never used |
| (by lazy evaluation), but when the argument is evaluated, either its |
| default value is evaluated (within the evaluation environment of the |
| function) or an error is thrown with a message along the lines of |
| |
| @example |
| argument "foobar" is missing, with no default |
| @end example |
| |
| @findex MISSING |
| @findex R_MissingArg |
| Internally missingness is handled by two mechanisms. The object |
| @code{R_MissingArg} is used to indicate that a formal argument has no |
| (default) value. When matching the actual arguments to the formal |
| arguments, a new argument list is constructed from the formals all of |
| whose values are @code{R_MissingArg} with the first @code{MISSING} bit |
| set. Then whenever a formal argument is matched to an actual argument, |
| the corresponding member of the new argument list has its value set to |
| that of the matched actual argument, and if that is not |
| @code{R_MissingArg} the missing bit is unset. |
| |
| This new argument list is used to form the evaluation frame for the |
| function, and if named arguments are subsequently given a new value |
| (before they are evaluated) the missing bit is cleared. |
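| |
| The bookkeeping described above can be modelled with a small toy in C. |
| Everything here is an assumption made for illustration: @code{toy_arg} |
| stands in for a cell of the argument list, a @code{NULL} value stands in |
| for @code{R_MissingArg}, matching is by exact name only, and none of the |
| function names exist in the @R{} sources. |
| |
```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *tag;    /* name of the formal argument */
    const char *value;  /* NULL stands in for R_MissingArg */
    int missing;        /* stand-in for the MISSING bit */
} toy_arg;

/* Build the new argument list from the formals: every value starts
   as "missing" with the missing bit set. */
static void init_formals(toy_arg *args, const char *const *names, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        args[i].tag = names[i];
        args[i].value = NULL;
        args[i].missing = 1;
    }
}

/* Match one actual argument by exact name.  The bit is cleared only
   if the supplied value is not itself the missing marker.
   Returns 1 if a formal with that name was found. */
static int match_actual(toy_arg *args, size_t n,
                        const char *tag, const char *value)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(args[i].tag, tag) == 0) {
            args[i].value = value;
            if (value != NULL)
                args[i].missing = 0;
            return 1;
        }
    }
    return 0;
}

/* Missing if the bit is set or the value is the missing marker. */
static int is_missing(const toy_arg *a)
{
    return a->missing || a->value == NULL;
}
```
| |
| The toy does not follow promises, so it cannot model missingness being |
| passed on from one function to another. |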
| |
| Missingness of arguments can be interrogated via the @code{missing()} |
| function. An argument is clearly missing if its missing bit is set or |
| if the value is @code{R_MissingArg}. However, missingness can be passed |
| on from function to function, for using a formal argument as an actual |
| argument in a function call does not count as evaluation. So |
| @code{missing()} has to examine the value (a promise) of a |
| not-yet-evaluated formal argument to see if it might be missing, which |
| might involve investigating a promise and so on @dots{}. |
| |
| Special primitives also need to handle missing arguments, and in some |
| cases (e.g.@: @code{log}) that is why they are special and not |
| builtin. This is usually done by testing if an argument's value is |
| @code{R_MissingArg}. |
| |
| @node Dot-dot-dot arguments, , Missingness, Argument evaluation |
| @subsection Dot-dot-dot arguments |
| |
| @cindex ... argument |
| Dot-dot-dot arguments are convenient when writing functions, but |
| complicate the internal code for argument evaluation. |
| |
| The formals of a function with a @code{...} argument represent that as a |
| single argument like any other argument, with tag the symbol |
| @code{R_DotsSymbol}. When the actual arguments are matched to the |
| formals, the value of the @code{...} argument is of @code{SEXPTYPE} |
| @code{DOTSXP}, a pairlist of promises (as used for matched arguments) |
| but distinguished by the @code{SEXPTYPE}. |
| |
| Recall that the evaluation frame for a function initially contains the |
| @code{@var{name}=@var{value}} pairs from the matched call, and hence |
| this will be true for @code{...} as well. The value of @code{...} is a |
| (special) pairlist whose elements are referred to by the special symbols |
| @code{..1}, @code{..2}, @dots{} which have the @code{DDVAL} bit set: |
| when one of these is encountered it is looked up (via @code{ddfindVar}) |
| in the value of the @code{...} symbol in the evaluation frame. |
| |
| Values of arguments matched to a @code{...} argument can be missing. |
| |
| Special primitives may need to handle @code{...} arguments: see for |
| example the internal code of @code{switch} in file |
| @file{src/main/builtin.c}. |
| |
| @node Autoprinting, The write barrier, Argument evaluation, R Internal Structures |
| @section Autoprinting |
| |
| @cindex autoprinting |
| @findex R_Visible |
| |
| Whether the returned value of a top-level @R{} expression is printed is |
| controlled by the global boolean variable @code{R_Visible}. This is set |
| (to true or false) on entry to all primitive and internal functions |
| based on the @code{eval} column of the table in file |
| @file{src/main/names.c}: the appropriate setting can be extracted by the |
| macro @code{PRIMPRINT}. |
| @findex PRIMPRINT |
| |
| @findex invisible |
| The @R{} primitive function @code{invisible} makes use of this |
| mechanism: it just sets @code{R_Visible = FALSE} before entry and |
| returns its argument. |
| |
| For most functions the intention will be that the setting of |
| @code{R_Visible} when they are entered is the setting used when they |
| return, but there need to be exceptions. The @R{} functions |
| @code{identify}, @code{options}, @code{system} and @code{writeBin} |
| determine whether the result should be visible from the arguments or |
| user action. Other functions themselves dispatch functions which may |
| change the visibility flag: examples@footnote{the other current example |
| is left brace, which is implemented as a primitive.} are |
| @code{.Internal}, @code{do.call}, @code{eval}, @code{withVisible}, |
| @code{if}, @code{NextMethod}, @code{Recall}, @code{recordGraphics}, |
| @code{standardGeneric}, @code{switch} and @code{UseMethod}. |
| |
| `Special' primitive and internal functions evaluate their arguments |
| internally @emph{after} @code{R_Visible} has been set, and evaluation of |
| the arguments (e.g.@: an assignment as in PR#9263) can change the value |
| of the flag. |
| |
| The @code{R_Visible} flag can also get altered during the evaluation of |
| a function, with comments in the code about @code{warning}, |
| @code{writeChar} and graphics functions calling @code{GText} (PR#7397). |
| (Since the C-level function @code{eval} sets @code{R_Visible}, this |
| could apply to any function calling it. Since it is called when |
| evaluating promises, even object lookup can change @code{R_Visible}.) |
| Internal and primitive functions force the documented setting of |
| @code{R_Visible} on return, unless the C code is allowed to change it |
| (the exceptions above are indicated by @code{PRIMPRINT} having value 2). |
| |
| The actual autoprinting is done by @code{PrintValueEnv} in file |
| @file{print.c}. If the object to be printed has the S4 bit set and S4 |
| methods dispatch is on, @code{show} is called to print the object. |
| Otherwise, if the object bit is set (so the object has a |
| @code{"class"} attribute), @code{print} is called to dispatch methods: |
| for objects without a class the internal code of @code{print.default} |
| is called. |
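| |
| The choice between the three printing routes can be summarised as a |
| small decision function. This is a sketch of the rule as stated above, |
| not the code of @code{PrintValueEnv}; the enum and the function name are |
| hypothetical. |
| |
```c
#include <stdbool.h>

typedef enum {
    ROUTE_SHOW,          /* call show() */
    ROUTE_PRINT,         /* call print() so that methods are dispatched */
    ROUTE_PRINT_DEFAULT  /* internal code of print.default */
} print_route;

/* Pick the autoprinting route from the S4 bit, whether S4 dispatch
   is on, and the object bit. */
static print_route autoprint_route(bool s4_bit, bool s4_dispatch_on,
                                   bool object_bit)
{
    if (s4_bit && s4_dispatch_on)
        return ROUTE_SHOW;
    if (object_bit)
        return ROUTE_PRINT;
    return ROUTE_PRINT_DEFAULT;
}
```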
| |
| |
| @node The write barrier, Serialization Formats, Autoprinting, R Internal Structures |
| @section The write barrier and the garbage collector |
| |
| @cindex write barrier |
| @cindex garbage collector |
| @R{} has long had a generational garbage collector, and bit @code{gcgen} |
| in the @code{sxpinfo} header is used in the implementation of this. |
| This is used in conjunction with the @code{mark} bit to identify two |
| previous generations. |
| |
| There are three levels of collections. Level 0 collects only the |
| youngest generation, level 1 collects the two youngest generations and |
| level 2 collects all generations. After 20 level-0 collections the next |
| collection is at level 1, and after 5 level-1 collections at level 2. |
| Further, if a level-@var{n} collection fails to provide 20% free space |
| (for each of nodes and the vector heap), the next collection will be at |
| level @var{n+1}. (The @R{}-level function @code{gc()} performs a |
| level-2 collection.) |
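| |
| One possible reading of these scheduling rules is sketched below. It is |
| an illustration under stated assumptions rather than the logic in |
| @file{memory.c}: the structure @code{gc_schedule}, the two functions and |
| the constant names are all hypothetical, and @code{free_frac} is taken |
| to be the smaller of the free fractions of the nodes and the vector |
| heap. |
| |
```c
#include <assert.h>

#define LEVEL_0_FREQ 20     /* level-0 collections before a level-1 one */
#define LEVEL_1_FREQ 5      /* level-1 collections before a level-2 one */
#define MIN_FREE_FRAC 0.20  /* free space a collection should provide */

typedef struct {
    int level0_count;  /* level-0 collections since the last higher one */
    int level1_count;  /* level-1 collections since the last level-2 one */
    int forced_level;  /* level requested by the free-space rule, else 0 */
} gc_schedule;

/* Choose the level of the next collection and update the counters. */
static int choose_level(gc_schedule *s)
{
    int level = s->forced_level;
    s->forced_level = 0;
    if (level == 0 && s->level0_count >= LEVEL_0_FREQ)
        level = 1;
    if (level == 1 && s->level1_count >= LEVEL_1_FREQ)
        level = 2;
    if (level == 0) {
        s->level0_count++;
    } else if (level == 1) {
        s->level0_count = 0;
        s->level1_count++;
    } else {
        s->level0_count = 0;
        s->level1_count = 0;
    }
    return level;
}

/* After a collection at `level`, request the next level up if too
   little space was freed. */
static void record_outcome(gc_schedule *s, int level, double free_frac)
{
    assert(level >= 0 && level <= 2);
    if (level < 2 && free_frac < MIN_FREE_FRAC)
        s->forced_level = level + 1;
}
```
| |
| Starting from a zeroed @code{gc_schedule}, the first 20 calls to |
| @code{choose_level} return 0 and the 21st returns 1. |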
| |
| A generational collector needs to efficiently `age' the objects, |
| especially list-like objects (including @code{STRSXP}s). This is done |
| by ensuring that the elements of a list are regarded as at least as old |
| as the list @emph{when they are assigned}. This is handled by the |
| functions @code{SET_VECTOR_ELT} and @code{SET_STRING_ELT}, which is why |
| they are functions and not macros. Ensuring the integrity of such |
| operations is termed the @dfn{write barrier} and is done by making the |
| @code{SEXP} opaque and only providing access via functions (which cannot |
| be used as lvalues in assignments in C). |
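| |
| The invariant can be modelled with a toy setter. This is only a |
| sketch of the idea that assignment goes through a function which can |
| adjust ages: the bookkeeping in @file{memory.c} is more involved, and |
| @code{toy_node} and @code{toy_set_elt} are hypothetical names. |
| |
```c
#include <assert.h>
#include <stddef.h>

#define TOY_LIST_MAX 8

typedef struct toy_node {
    int gen;                             /* 0 is the youngest generation */
    struct toy_node *elt[TOY_LIST_MAX];  /* list elements, or NULL */
} toy_node;

/* Setter playing the role of the write barrier: the stored element
   ends up regarded as at least as old as the list holding it.
   (The toy does not age the element's own elements.) */
static void toy_set_elt(toy_node *list, size_t i, toy_node *value)
{
    assert(list != NULL && i < TOY_LIST_MAX);
    if (value != NULL && value->gen < list->gen)
        value->gen = list->gen;
    list->elt[i] = value;
}
```
| |
| Because the only way to store an element is through the setter, a |
| direct assignment that bypasses the age check cannot be written by |
| code that sees the node type as opaque. |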
| |
| All code in @R{} extensions is behind the write barrier. @R{} |
| extensions cannot directly access the internals of the @code{SEXPREC}s. |
| Base code can access internals if @samp{USE_RINTERNALS} is defined. |
| This is normally defined in @file{Defn.h} when @R{} is compiled. To |
| enable a check on the way that the access is used, @R{} can be compiled |
| with flag @option{--enable-strict-barrier} which ensures that header |
| @file{Defn.h} does not define @samp{USE_RINTERNALS} and hence that |
| @code{SEXP} is opaque in most of @R{} itself. (There are some necessary |
| exceptions: foremost in file @file{memory.c} where the accessor |
| functions are defined and also in file @file{size.c} which needs access |
| to the sizes of the internal structures.) |
| |
| For background papers see |
| @uref{https://homepage.stat.uiowa.edu/~luke/R/barrier.html} and |
| @uref{https://homepage.stat.uiowa.edu/~luke/R/gengcnotes.html}. |
| |
| @node Serialization Formats, Encodings for CHARSXPs, The write barrier, R Internal Structures |
| @section Serialization Formats |
| |
| @cindex serialization |
| Serialized versions of @R{} objects are used by @code{load}/@code{save} |
| and also at a slightly lower level by @code{saveRDS}/@code{readRDS} (and |
| their earlier `internal' dot-name versions) and |
| @code{serialize}/@code{unserialize}. These differ in what they |
| serialize to (a file, a connection, a raw vector) and whether they are |
| intended to serialize a single object or a collection of objects |
| (typically the workspace). @code{save} writes a header at the beginning |
| of the file (a single LF-terminated line) which the lower-level versions |
| do not. |
| |
| @code{save} and @code{saveRDS} allow various forms of compression, and |
| @command{gzip} compression is the default (except for @acronym{ASCII} |
| saves). Compression is applied to the whole file stream, including the |
| headers, so serialized files can be uncompressed or re-compressed by |
| external programs. Both @code{load} and @code{readRDS} can read |
| @command{gzip}, @command{bzip2} and @command{xz} forms of compression |
| when reading from a file, and @command{gzip} compression when reading |
| from a connection. |
| |
| @R{} has used the same serialization format called `version 2' from @R{} |
| 1.4.0 in December 2001 until @R{} 3.5.3 in March 2019. It has been expanded |
| in back-compatible ways since its inception, for example to support |
| additional @code{SEXPTYPE}s. Earlier formats are still supported via |
| @code{load} and @code{save} but such formats are not described here. The |
| current default serialization format is called `version 3', and was |
| introduced in @R{} 3.5.0. |
| |
| @code{save} works by writing a single-line header (typically |
| @code{RDX2\n} for a binary save: the only other current value is |
| @code{RDA2\n} for @code{save(ascii=TRUE)}), then creating a tagged |
| pairlist of the objects to be saved and serializing that single object. |
| @code{load} reads the header line, unserializes a single object (a |
| pairlist or a vector list) and assigns the elements of the object in the |
| specified environment. The header line serves two purposes in @R{}: it |
| identifies the serialization format so @code{load} can switch to the |
| appropriate reader code, and the newline @code{\n} allows the detection of files |
| which have been subjected to a non-binary transfer which re-mapped line |
| endings. It can also be thought of as a `magic number' in the sense |
| used by the @command{file} program (although @R{} save files are not yet |
| by default known to that program). |
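| |
| The two purposes of the header line can be illustrated by a small |
| classifier. This is a sketch, not the reader code used by @code{load}: |
| the enum and @code{classify_header} are hypothetical names, only the two |
| header values mentioned above are recognised, and the buffer is assumed |
| to hold the start of an already decompressed file. |
| |
```c
#include <string.h>

typedef enum {
    HDR_UNKNOWN,  /* not one of the headers handled here */
    HDR_BINARY,   /* "RDX2\n" */
    HDR_ASCII,    /* "RDA2\n" */
    HDR_MANGLED   /* known magic, but the LF has become CR LF */
} save_header;

/* Classify the first bytes of a save file. */
static save_header classify_header(const unsigned char *buf, size_t len)
{
    if (len < 5 || memcmp(buf, "RD", 2) != 0 || buf[3] != '2')
        return HDR_UNKNOWN;
    if (buf[2] != 'X' && buf[2] != 'A')
        return HDR_UNKNOWN;
    if (buf[4] == '\r')
        return HDR_MANGLED;  /* line endings were re-mapped in transfer */
    if (buf[4] != '\n')
        return HDR_UNKNOWN;
    return buf[2] == 'X' ? HDR_BINARY : HDR_ASCII;
}
```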
| |
| Serialization in @R{} needs to take into account that objects may |
| contain references to environments, which then have enclosing |
| environments and so on. (Environments recognized as package or |
| namespace environments are saved by name.) There are `reference objects' |
| which are not duplicated on copy and should remain shared on |
| unserialization. These are weak references, external pointers and |
| environments other than those associated with packages, namespaces and |
| the global environment. These are handled via a hash table, and |
| references after the first are written out as a reference marker indexed |
| by the table entry. |
| |
| Version-2 serialization first writes a header indicating the format |
| (normally @samp{X\n} for an XDR format binary save, but @samp{A\n}, |
| ASCII, and @samp{B\n}, native word-order binary, can also occur) and |
| then three integers giving the version of the format and two @R{} |
| versions (packed by the @code{R_Version} macro from @file{Rversion.h}). |
| (Unserialization interprets the two versions as the version of @R{} |
| which wrote the file followed by the minimal version of @R{} needed to |
| read the format.) Serialization then writes out the object recursively |
| using function @code{WriteItem} in file @file{src/main/serialize.c}. |
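| |
| As an illustration of the packed version integers, the sketch below |
| packs and unpacks a version triple. It assumes the usual definition of |
| the macro, namely major times 65536 plus minor times 256 plus patch |
| level; @code{pack_version} and @code{unpack_version} are hypothetical |
| helper names. |
| |
```c
/* Pack a version triple using the arithmetic assumed for R_Version. */
static int pack_version(int v, int p, int s)
{
    return v * 65536 + p * 256 + s;
}

/* Recover the triple from a packed value (p and s assumed < 256). */
static void unpack_version(int packed, int *v, int *p, int *s)
{
    *v = packed / 65536;
    *p = (packed % 65536) / 256;
    *s = packed % 256;
}
```
| |
| Under that assumption version 3.5.0 packs to 197888. |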
| |
| Some objects are written as if they were @code{SEXPTYPE}s: such |
| pseudo-@code{SEXPTYPE}s cover @code{R_NilValue}, @code{R_EmptyEnv}, |
| @code{R_BaseEnv}, @code{R_GlobalEnv}, @code{R_UnboundValue}, |
| @code{R_MissingArg} and @code{R_BaseNamespace}. |
| |
| For all @code{SEXPTYPE}s except @code{NILSXP}, @code{SYMSXP} and |
| @code{ENVSXP} serialization starts with an integer with the |
| @code{SEXPTYPE} in bits 0:7@footnote{only bits 0:4 are currently used |
| for @code{SEXPTYPE}s but values 241:255 are used for |
| pseudo-@code{SEXPTYPE}s.} followed by the object bit, two bits |
| indicating if there are any attributes and if there is a tag (for the |
| pairlist types), an unused bit and then the @code{gp} |
| field@footnote{Currently the only relevant bits are 0:1, 4, 14:15.} in |
| bits 12:27. Pairlist-like objects write their attributes (if any), tag |
| (if any), CAR and then CDR (using tail recursion): other objects write |
| their attributes after themselves. Atomic vector objects write their |
| length followed by the data: generic vector-list objects write their |
| length followed by a call to @code{WriteItem} for each element. The |
| code for @code{CHARSXP}s special-cases @code{NA_STRING} and writes it as |
| length @code{-1} with no data. Lengths no more than @code{2^31 - 1} are |
| written in that way and larger lengths (which only occur on 64-bit |
| systems) as @code{-1} followed by the upper and lower 32-bits as integers |
| (regarded as unsigned). |
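| |
| The layout of the leading integer and the encoding of lengths can be |
| sketched as follows. The bit positions are those given in the text |
| (type in bits 0:7, object bit 8, attribute bit 9, tag bit 10, bit 11 |
| unused, @code{gp} in bits 12:27); the function names are hypothetical |
| and the length words are held as unsigned 32-bit bit patterns, so the |
| @code{-1} marker appears as all bits set. |
| |
```c
#include <assert.h>
#include <stdint.h>

/* Pack the leading integer written for most SEXPTYPEs. */
static int pack_flags(int type, int gp, int is_obj, int has_attr,
                      int has_tag)
{
    int flags = type & 0xFF;          /* bits 0:7 */
    if (is_obj)
        flags |= 1 << 8;              /* object bit */
    if (has_attr)
        flags |= 1 << 9;              /* attributes present */
    if (has_tag)
        flags |= 1 << 10;             /* tag present */
    flags |= (gp & 0xFFFF) << 12;     /* gp field in bits 12:27 */
    return flags;
}

/* Encode a length as one word, or as the -1 marker followed by the
   upper and lower 32 bits.  Returns the number of words written. */
static int encode_length(int64_t len, uint32_t out[3])
{
    assert(len >= 0);
    if (len <= INT32_MAX) {
        out[0] = (uint32_t) len;
        return 1;
    }
    out[0] = UINT32_MAX;                        /* bit pattern of -1 */
    out[1] = (uint32_t) ((uint64_t) len >> 32); /* upper 32 bits */
    out[2] = (uint32_t) ((uint64_t) len & 0xFFFFFFFFu); /* lower 32 bits */
    return 3;
}
```
| |
| For instance a type code of 16 with attributes but no tag packs to 528, |
| and a length of @code{2^32 + 5} is written as the marker followed by the |
| words 1 and 5. |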
| |
| Environments are treated in several ways: as we have seen, some are |
| written as specific pseudo-@code{SEXPTYPE}s. Package and namespace |
| environments are written with pseudo-@code{SEXPTYPE}s followed by the |
| name. `Normal' environments are written out as @code{ENVSXP}s with an |
| integer indicating if the environment is locked followed by the |
| enclosure, frame, `tag' (the hash table) and attributes. |
| |
| In the `XDR' format integers and doubles are written in big-endian order: |
| however the format is not fully XDR (as defined in RFC 1832) as byte |
| quantities (such as the contents of @code{CHARSXP} and @code{RAWSXP} |
| types) are written as-is and not padded to a multiple of four bytes. |
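| |
| Writing values in big-endian order independently of the host can be |
| sketched as below. The helper names are hypothetical, and the double |
| version assumes 64-bit IEEE doubles whose byte order matches that of |
| 64-bit integers on the host. |
| |
```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit integer most significant byte first. */
static void put_int_be(int32_t x, unsigned char out[4])
{
    uint32_t u = (uint32_t) x;
    out[0] = (unsigned char) (u >> 24);
    out[1] = (unsigned char) (u >> 16);
    out[2] = (unsigned char) (u >> 8);
    out[3] = (unsigned char) u;
}

/* Store a double most significant byte first. */
static void put_double_be(double x, unsigned char out[8])
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);  /* reinterpret the bits of the double */
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char) (u >> (56 - 8 * i));
}
```
| |
| Byte quantities need no such treatment, which is consistent with their |
| being written as-is. |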
| |
| The `ASCII' format writes 7-bit characters. Integers are formatted with |
| @code{%d} (except that @code{NA_integer_} is written as @code{NA}), |
| doubles formatted with @code{%.16g} (plus @code{NA}, @code{Inf} and |
| @code{-Inf}) and bytes with @code{%02x}. Strings are written using |
| standard escapes (e.g.@: @code{\t} and @code{\013}) for non-printing and |
| non-@acronym{ASCII} bytes. |
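| |
| The numeric formats can be tried out with the sketch below. It is not |
| the code in @file{serialize.c}: the function names are hypothetical, |
| the integer @code{NA} is assumed to be represented by @code{INT_MIN}, |
| and the double version handles only finite and infinite values (not |
| @code{NA} or @code{NaN}). |
| |
```c
#include <float.h>
#include <limits.h>
#include <stdio.h>

/* Format an integer with %d, writing the NA value as "NA". */
static void ascii_int(int x, char *buf, size_t n)
{
    if (x == INT_MIN)
        snprintf(buf, n, "NA");
    else
        snprintf(buf, n, "%d", x);
}

/* Format a finite or infinite double. */
static void ascii_double(double x, char *buf, size_t n)
{
    if (x > DBL_MAX)
        snprintf(buf, n, "Inf");
    else if (x < -DBL_MAX)
        snprintf(buf, n, "-Inf");
    else
        snprintf(buf, n, "%.16g", x);
}

/* Format one byte as two lower-case hexadecimal digits. */
static void ascii_byte(unsigned char b, char *buf, size_t n)
{
    snprintf(buf, n, "%02x", b);
}
```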
| |
| Version-3 serialization extends version 2 with support for custom |
| serialization of @code{ALTREP} framework objects. It also stores the |
| current native encoding at serialization time, so that unflagged strings can |
| be converted if unserialized in @R{} running under a different native encoding. |
| |
| @node Encodings for CHARSXPs, The CHARSXP cache, Serialization Formats, R Internal Structures |
| @section Encodings for CHARSXPs |
| |
| Character data in @R{} are stored in the sexptype @code{CHARSXP}. |
| |
| There is support for encodings other than that of the current locale, in |
| particular UTF-8 and the multi-byte encodings used on Windows for CJK |
| languages. A limited means to indicate the encoding of a @code{CHARSXP} |
| is @emph{via} two of the `general purpose' bits which are used to declare |
| the encoding to be either Latin-1 or UTF-8. (Note that it is possible |
| for a character vector to contain elements in different encodings.) |
| Both printing and plotting notice the declaration and convert the string |
| to the current locale (possibly using @code{<xx>} to display in |
| hexadecimal bytes that are not valid in the current locale). Many (but |
| not all) of the character manipulation functions will either preserve |
| the declaration or re-encode the character string. |
| |
| Strings that refer to the OS such as file names need to be passed |
| through a wide-character interface on some OSes (e.g.@: Windows). |
| |
| When are character strings declared to be of known encoding? One way is |
| to do so directly via @code{Encoding}. The parser declares the encoding |
| if this is known, either via the @code{encoding} argument to |
| @code{parse} or from the locale within which parsing is being done at |
| the @R{} command line. (Other ways are recorded on the help page for |
| @code{Encoding}.) |
| |
| It is not necessary to declare the encoding of @acronym{ASCII} strings |
| as they will work in any locale. @acronym{ASCII} strings should never |
| have a marked encoding, as any encoding will be ignored when entering |
| such strings into the @code{CHARSXP} cache. |
| |
| The rationale behind considering only UTF-8 and Latin-1 was that most |
| systems are capable of producing UTF-8 strings and this is the nearest |
| we have to a universal format. For those that do not (for example those |
| lacking a powerful enough @code{iconv}), it is likely that they work in |
| Latin-1, the old @R{} assumption. Then the parser can return a |
| UTF-8-encoded string if it encounters a @samp{\uxxxx} escape for a |
| Unicode point that cannot be represented in the current charset. (This |
| needs MBCS support, and in the past was only enabled@footnote{See define |
| @code{USE_UTF8_IF_POSSIBLE} in old versions of file @file{src/main/gram.c}.} on |
| Windows.) Now this is enabled for all platforms, and a @samp{\uxxxx} or |
| @samp{\Uxxxxxxxx} escape ensures that the parsed string will be marked |
| as UTF-8. |
| |
| Most of the character manipulation functions now preserve UTF-8 |
| encodings: there are some notes as to which at the top of file |
| @file{src/main/character.c} and in file |
| @file{src/library/base/man/Encoding.Rd}. |
| |
| Graphics devices are offered the possibility of handling UTF-8-encoded |
| strings without re-encoding to the native character set, by setting |
| @code{hasTextUTF8} to be @samp{TRUE} and supplying functions |
| @code{textUTF8} and @code{strWidthUTF8} that expect UTF-8-encoded |
| inputs. Normally the symbol font is encoded in Adobe Symbol encoding, |
| but that can be re-encoded to UTF-8 by setting @code{wantSymbolUTF8} to |
| @samp{TRUE}. The Windows port of cairographics has a rather peculiar |
| assumption: it wants the symbol font to be encoded in UTF-8 as if it |
| were encoded in Latin-1 rather than Adobe Symbol: this is selected by |
| @code{wantSymbolUTF8 = NA_LOGICAL}. |
| |
| Windows with MSVCRT as the C runtime has no UTF-8 locales, but rather expects to work with |
| UCS-2@footnote{or UTF-16 if support for surrogates is enabled in the OS, |
| which it used not to be when encoding support was added to @R{}.} |
| strings. @R{} (being written in standard C) would not work internally |
| with UCS-2 without extensive changes. The @file{Rgui} |
| console@footnote{but not the GraphApp toolkit.} uses UCS-2 internally, |
| but communicates with the @R{} engine in the native encoding. To allow |
| UTF-8 strings to be printed in UTF-8 in @file{Rgui.exe}, an escape |
| convention is used (see header file @file{rgui_UTF8.h}) by |
| @code{cat}, @code{print} and autoprinting. |
| |
| `Unicode' (UCS-2LE) files are common in the Windows world, and |
| @code{readLines} and @code{scan} will read them into UTF-8 strings on |
| Windows if the encoding is declared explicitly on an unopened |
| connection passed to those functions. |
| |
| Windows has multiple notions of the current locale encoding: one is in the
| C runtime (C library) and another is the active code page (system locale).
| The active code page is used when calling non-UTF-16 variants of Windows API |
| functions (earlier referred to as ANSI calls), either directly or indirectly |
| via POSIX wrappers inside MinGW-w64, from R, R packages and libraries they |
| link to. While R has handled many cases by directly calling the UTF-16
| variants of the Windows API, it still may sometimes use the non-UTF-16 ones, |
| and external libraries also primarily developed for POSIX systems typically |
| do that. Therefore, for R to reliably work with (non-ASCII) strings on |
| Windows, both the C locale encoding and the active code page on Windows must |
| be the same, and by default they are. |
| |
| The Windows UCRT C runtime supports UTF-8 locales. Historically, the active |
| code page was a system-wide setting, changing which required a reboot, and |
| UTF-8 was not supported. Later a "BETA: Use Unicode UTF-8 for worldwide
| language support" feature was added to set the active code page to
| UTF-8, but this still required a reboot and impacted all applications, many |
| of which would not work correctly with that unexpected setting, so it could |
| not be used in practice. |
| |
| Windows since Windows 10 (version 1903), Windows Server 2022 (LTSC), and |
| Windows Server 1903 (semi-annual channel) allow setting the active code page |
| to UTF-8 in the application manifest. This changes the active code page
| only for the given application, and at the same time changes the UCRT C
| locale to UTF-8. R 4.2 for Windows uses this feature to get UTF-8 as
| the native encoding on Windows. To make that possible, R had to switch to |
| UCRT, which in turn required creation of Rtools42. |
| |
| Older versions of Windows still rely on the previous encoding support where |
| the native encoding cannot be UTF-8. R 4.2 requires UCRT to work, but UCRT |
| can be installed on Windows since Vista SP2 and Windows Server 2008 SP2. It |
| is shipped with Windows since Windows 10 and Windows Server 2016. |
| |
| @node The CHARSXP cache, Warnings and errors, Encodings for CHARSXPs, R Internal Structures |
| @section The CHARSXP cache |
| |
| @findex mkChar |
| There is a global cache for @code{CHARSXP}s created by @code{mkChar} --- |
| the cache ensures that most @code{CHARSXP}s with the same contents share |
| storage (`contents' including any declared encoding). Not all |
| @code{CHARSXP}s are part of the cache -- notably @samp{NA_STRING} is |
| not. @code{CHARSXP}s reloaded from the @code{save} formats of @R{} prior |
| to 0.99.0 are not cached (since the code used is frozen and very few |
| examples still exist). |
| |
| @findex mkCharLenCE |
| The cache records the encoding of the string as well as the bytes: all |
| requests to create a @code{CHARSXP} should be @emph{via} a call to |
| @code{mkCharLenCE}. Any encoding given in a @code{mkCharLenCE} call will
| be ignored if the string's bytes are all @acronym{ASCII} characters. |
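| 
| The behaviour described above can be modelled by a small interning table.
| The following is a standalone sketch, not the code in @R{}: the names
| @code{toy_charsxp}, @code{toy_enc} and @code{toy_mkCharLenCE}, the hash
| function and the fixed bucket count are all assumptions made for
| illustration. It shows only the two properties stated in the text: equal
| bytes with the same encoding share storage, and the encoding is ignored
| for pure @acronym{ASCII} input.
| 
```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for the declared encodings (assumed names). */
typedef enum { TOY_NATIVE = 0, TOY_UTF8 = 1, TOY_LATIN1 = 2 } toy_enc;

typedef struct toy_charsxp {
    char *bytes;
    size_t len;
    toy_enc enc;
    struct toy_charsxp *next;     /* hash chain */
} toy_charsxp;

#define TOY_NBUCKETS 64
static toy_charsxp *toy_cache[TOY_NBUCKETS];

static int toy_is_ascii(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if ((unsigned char) s[i] > 127) return 0;
    return 1;
}

static unsigned toy_hash(const char *s, size_t len)
{
    unsigned h = 5381;
    for (size_t i = 0; i < len; i++)
        h = h * 33 + (unsigned char) s[i];
    return h % TOY_NBUCKETS;
}

/* Return the cached entry for (bytes, encoding), creating it if needed.
   Returns NULL only if memory allocation fails. */
const toy_charsxp *toy_mkCharLenCE(const char *s, size_t len, toy_enc enc)
{
    if (toy_is_ascii(s, len)) enc = TOY_NATIVE;  /* encoding ignored */
    unsigned h = toy_hash(s, len);
    for (toy_charsxp *p = toy_cache[h]; p != NULL; p = p->next)
        if (p->enc == enc && p->len == len &&
            memcmp(p->bytes, s, len) == 0)
            return p;                            /* shared storage */

    toy_charsxp *p = malloc(sizeof *p);
    if (p == NULL) return NULL;
    p->bytes = malloc(len + 1);
    if (p->bytes == NULL) { free(p); return NULL; }
    memcpy(p->bytes, s, len);
    p->bytes[len] = '\0';
    p->len = len;
    p->enc = enc;
    p->next = toy_cache[h];
    toy_cache[h] = p;
    return p;
}
```
| 
| In this model two requests for @code{"abc"} with different declared
| encodings return the same entry, whereas the same non-@acronym{ASCII}
| bytes declared as UTF-8 and as Latin-1 give two distinct entries.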
| |
| |
| @node Warnings and errors, S4 objects, The CHARSXP cache, R Internal Structures |
| @section Warnings and errors |
| |
| @findex warning |
| @findex warningcall |
| @findex error |
| @findex errorcall |
| |
| Each of @code{warning} and @code{stop} has two C-level equivalents,
| @code{warning}, @code{warningcall}, @code{error} and @code{errorcall}. |
| The relationship between the pairs is similar: @code{warning} tries to |
| fathom out a suitable call, and then calls @code{warningcall} with that |
| call as the first argument if it succeeds, and with @code{call = |
| R_NilValue} if it does not. When @code{warningcall} is called, it |
| includes the deparsed call in its printout unless @code{call = |
| R_NilValue}. |
| |
| @code{warning} and @code{error} look at the context stack. If the |
| topmost context is not of type @code{CTXT_BUILTIN}, it is used to |
| provide the call, otherwise the next context provides the call. |
| This means that when these functions are called from a primitive or |
| @code{.Internal}, the imputed call will not be to the
| primitive/@code{.Internal} but to the function calling the
| primitive/@code{.Internal}. This is exactly what one wants for a
| @code{.Internal}, as this will give the call to the closure wrapper. |
| (Further, for a @code{.Internal}, the call is the argument to |
| @code{.Internal}, and so may not correspond to any @R{} function.) |
| However, it is unlikely to be what is needed for a primitive. |
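| 
| The rule for choosing the call can be written down in a few lines. This is
| a standalone toy model rather than the code in @R{}: the types
| @code{toy_context} and @code{toy_ctxt_type} and the function
| @code{toy_imputed_call} are hypothetical, calls are represented as plain
| strings, and @code{NULL} plays the role of @code{R_NilValue}.
| 
```c
#include <stddef.h>

/* Toy context types: only the distinction used in the text. */
typedef enum { TOY_CTXT_FUNCTION, TOY_CTXT_BUILTIN } toy_ctxt_type;

typedef struct toy_context {
    toy_ctxt_type type;
    const char *call;            /* deparsed call, for illustration */
    struct toy_context *next;    /* next context down the stack */
} toy_context;

/* Pick the call to report: the topmost context supplies it unless that
   context is a builtin, in which case the next context does. */
const char *toy_imputed_call(const toy_context *top)
{
    if (top == NULL) return NULL;
    if (top->type != TOY_CTXT_BUILTIN) return top->call;
    return top->next != NULL ? top->next->call : NULL;
}
```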
| |
| The upshot is that @code{warningcall} and @code{errorcall} should
| normally be used for code called from a primitive, and @code{warning} |
| and @code{error} should be used for code called from a @code{.Internal} |
| (and necessarily from @code{.Call}, @code{.C} and so on, where the call |
| is not passed down). However, there are two complications. One is that |
| code might be called from either a primitive or a @code{.Internal}, in |
| which case probably @code{warningcall} is more appropriate. The other |
| involves replacement functions, where the call was once of the form |
| @example |
| > length(x) <- y ~ x |
| Error in "length<-"(`*tmp*`, value = y ~ x) : invalid value |
| @end example |
| |
| @noindent |
| which is unpalatable to the end user. For replacement functions there |
| will be a suitable context at the top of the stack, so @code{warning} |
| should be used. (The results for @code{.Internal} replacement functions |
| such as @code{substr<-} are not ideal.) |
| |
| |
| |
| @node S4 objects, Memory allocators, Warnings and errors, R Internal Structures |
| @section S4 objects |
| |
| [This section is currently a preliminary draft and should not be taken |
| as definitive. The description assumes that @env{R_NO_METHODS_TABLES} |
| has not been set.] |
| |
| @menu |
| * Representation of S4 objects:: |
| * S4 classes:: |
| * S4 methods:: |
| * Mechanics of S4 dispatch:: |
| @end menu |
| |
| @node Representation of S4 objects, S4 classes, S4 objects, S4 objects |
| @subsection Representation of S4 objects |
| |
| S4 objects can be of any @code{SEXPTYPE}. They are either an object of |
| a simple type (such as an atomic vector or function) with S4 class |
| information or of type @code{S4SXP}. In all cases, the `S4 bit' (bit 4 |
| of the `general purpose' field) is set, and can be tested by the |
| macro/function @code{IS_S4_OBJECT}. |
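| 
| As a minimal sketch of what testing and setting `bit 4' amounts to, the
| following standalone C fragment models the 16-bit `general purpose' field
| as an @code{unsigned short}. The struct and the @code{toy_} names are
| assumptions made for illustration; real code should use
| @code{IS_S4_OBJECT} and the related macros rather than manipulate the
| field directly.
| 
```c
/* Toy model of the header field holding the 'general purpose' bits. */
typedef struct {
    unsigned short gp;
} toy_sxpinfo;

/* Bit 4 of the field marks an S4 object. */
#define TOY_S4_OBJECT_MASK ((unsigned short) (1 << 4))

int toy_is_s4_object(const toy_sxpinfo *info)
{
    return (info->gp & TOY_S4_OBJECT_MASK) != 0;
}

void toy_set_s4_object(toy_sxpinfo *info)
{
    info->gp = (unsigned short) (info->gp | TOY_S4_OBJECT_MASK);
}

void toy_unset_s4_object(toy_sxpinfo *info)
{
    /* clear bit 4 and leave all other bits untouched */
    info->gp = (unsigned short) (info->gp & ~TOY_S4_OBJECT_MASK);
}
```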
| |
| S4 objects are created via @code{new()}@footnote{This can also create |
| non-S4 objects, as in @code{new("integer")}.} and thence via the C |
| function @code{R_do_new_object}. This duplicates the prototype of the |
| class, adds a class attribute and sets the S4 bit. All S4 class |
| attributes should be character vectors of length one with an attribute |
| giving (as a character string) the name of the package (or |
| @code{.GlobalEnv}) containing the class definition. Since S4 objects |
| have a class attribute, the @code{OBJECT} bit is set. |
| |
| It is currently unclear what should happen if the class attribute is |
| removed from an S4 object, or if this should be allowed. |
| |
| @node S4 classes, S4 methods, Representation of S4 objects, S4 objects |
| @subsection S4 classes |
| |
| S4 classes are stored as @R{} objects in the environment in which they |
| are created, with names @code{.__C__@var{classname}}: as such they are |
| not listed by default by @code{ls}. |
| |
| The objects are S4 objects of class @code{"classRepresentation"} which |
| is defined in the @pkg{methods} package. |
| |
| Since these are just objects, they are subject to the normal scoping |
| rules and can be imported and exported from namespaces like other |
| objects. The directives @code{importClassesFrom} and |
| @code{exportClasses} are merely convenient ways to refer to class |
| objects without needing to know their internal `metaname' (although |
| @code{exportClasses} does a little sanity checking via @code{isClass}). |
| |
| @node S4 methods, Mechanics of S4 dispatch, S4 classes, S4 objects |
| @subsection S4 methods |
| |
| Details of the methods are stored in environments (typically hidden in the |
| respective namespace) with a non-syntactic name of the form |
| @code{.__T__@var{generic}:@var{package}} containing objects of class |
| @code{MethodDefinition} for all methods defined in the current environment |
| for the named generic derived from a specific package (which might be @code{.GlobalEnv}). |
| This is sometimes referred to as a `methods table'. |
| |
| For example, |
| @example |
| length(nM <- asNamespace("Matrix") ) # 941 for Matrix 1.2-6 |
| length(meth <- grep("^[.]__T__", names(nM), value=TRUE))# 107 generics with methods |
| length(meth.Ops <- nM$`.__T__Ops:base`) # 71 methods for the 'Ops' (group)generic |
| head(sort(names(meth.Ops))) ## "abIndex#abIndex" ... "ANY#ddiMatrix" "ANY#ldiMatrix" "ANY#Matrix" |
| @end example |
| |
| During an @R{} session there is an environment associated with each |
| non-primitive generic containing objects @code{.AllMTable}, |
| @code{.Generic}, @code{.Methods}, @code{.MTable}, @code{.SigArgs} and |
| @code{.SigLength}. @code{.MTable} and @code{.AllMTable} are merged
| methods tables containing all the methods defined directly and via |
| inheritance respectively. @code{.Methods} is a merged methods list. |
| |
| Exporting methods from a namespace is more complicated than exporting a |
| class. Note first that you do not export a method, but rather the |
| directive @code{exportMethods} will export all the methods defined in |
| the namespace for a specified generic: the code also adds to the list |
| of generics any that are exported directly. For generics which are |
| listed via @code{exportMethods} or exported themselves, the |
| corresponding environment is exported and so |
| will appear (as a hidden object) in the package environment.
| |
| Methods for primitives which are internally S4 generic (see below) are |
| always exported, whether mentioned in the @file{NAMESPACE} file or not. |
| |
| Methods can be imported either via the directive |
| @code{importMethodsFrom} or via importing a namespace by @code{import}. |
| Also, if a generic is imported via @code{importFrom}, its methods are |
| also imported. In all cases the generic will be imported if it is in |
| the namespace, so @code{importMethodsFrom} is most appropriate for |
| methods defined on generics in other packages. Since methods for a |
| generic could be imported from several different packages, the methods |
| tables are merged. |
| |
| When a package is attached |
| @code{methods:::cacheMetaData} is called to update the internal tables: |
| only the visible methods will be cached. |
| |
| |
| @node Mechanics of S4 dispatch, , S4 methods, S4 objects |
| @subsection Mechanics of S4 dispatch |
| |
| This subsection does not discuss how S4 methods are chosen: see |
| @uref{https://developer.@/r-project.org/howMethodsWork.pdf}. |
| |
| For all but primitive functions, setting a method on an existing |
| function that is not itself S4 generic creates a new object in the |
| current environment which is a call to @code{standardGeneric} with the |
| old definition as the default method. Such S4 generics can also be |
| created @emph{via} a call to @code{setGeneric}@footnote{although this is |
| not recommended as it is less future-proof.} and are standard closures |
| in the @R{} language, with environment the environment within which they |
| are created. With the advent of namespaces this is somewhat |
| problematic: if @code{myfn} was previously in a package with a
| namespace there will be two functions called @code{myfn} on the search
| paths, and which will be called depends on which search path is in use. |
| This is starkest for functions in the base namespace, where the |
| original will be found ahead of the newly created function from any |
| other package. |
| |
| Primitive functions are treated quite differently, for efficiency |
| reasons: this results in different semantics. @code{setGeneric} is |
| disallowed for primitive functions. The @pkg{methods} namespace |
| contains a list @code{.BasicFunsList} named by primitive functions: |
| the entries are either @code{FALSE} or a standard S4 generic showing |
| the effective definition. When @code{setMethod} (or |
| @code{setReplaceMethod}) is called, it either fails (if the list entry |
| is @code{FALSE}) or a method is set on the effective generic given in |
| the list. |
| |
| Actual dispatch of S4 methods for almost all primitives piggy-backs on |
| the S3 dispatch mechanism, so S4 methods can only be dispatched for |
| primitives which are internally S3 generic. When a primitive that is |
| internally S3 generic is called with a first argument which is an S4 |
| object and S4 dispatch is on (that is, the @pkg{methods} namespace is |
| loaded), @code{DispatchOrEval} calls @code{R_possible_dispatch} (defined |
| in file @file{src/main/objects.c}). (Members of the S3 group generics, |
| which includes all the generic operators, are treated slightly |
| differently: the first two arguments are checked and |
| @code{DispatchGroup} is called.) @code{R_possible_dispatch} first |
| checks an internal table to see if any S4 methods are set for that |
| generic (and S4 dispatch is currently enabled for that generic), and if |
| so proceeds to S4 dispatch using methods stored in another internal |
| table. All primitives are in the base namespace, and this mechanism |
| means that S4 methods can be set for (some) primitives and will always |
| be used, in contrast to setting methods on non-primitives. |
| |
| The exception is @code{%*%}, which is S4 generic but not S3 generic as |
| its C code contains a direct call to @code{R_possible_dispatch}. |
| |
| The primitive @code{as.double} is special, as @code{as.numeric} and |
| @code{as.real} are copies of it. The @pkg{methods} package code partly |
| refers to generics by name and partly by function, and maps |
| @code{as.double} and @code{as.real} to @code{as.numeric} (since that is |
| the name used by packages exporting methods for it). |
| |
| Some elements of the language are implemented as primitives, for example |
| @code{@}}. This includes the subset and subassignment `functions' and |
| they are S4 generic, again piggybacking on S3 dispatch. |
| |
| @code{.BasicFunsList} is generated when @pkg{methods} is installed, by |
| computing all primitives, initially disallowing methods on all and then |
| setting generics for members of @code{.GenericArgsEnv}, the S4 group |
| generics and a short exceptions list in file @file{BasicFunsList.R}: this |
| currently contains the subsetting and subassignment operators and an |
| override for @code{c}. |
| |
| @node Memory allocators, Internal use of global and base environments, S4 objects, R Internal Structures |
| @section Memory allocators |
| |
| @R{}'s memory allocation is almost all done via routines in file |
| @file{src/main/memory.c}. |
| |
| The rest of @R{} should where possible make use of the allocators made |
| available by file @file{src/main/memory.c}, which are also the methods |
| recommended in |
| @ifset UseExternalXrefs |
| @ref{Memory allocation, , Memory allocation, R-exts, Writing R Extensions} |
| @end ifset |
| @ifclear UseExternalXrefs |
| `Writing R Extensions' |
| @end ifclear |
| @findex R_alloc |
| @findex R_Calloc |
| @findex R_Realloc |
| @findex R_Free |
| for use in @R{} packages, namely the use of @code{R_alloc}, |
| @code{R_Calloc}, @code{R_Realloc} and @code{R_Free}. Memory allocated by |
| @code{R_alloc} is freed by the garbage collector once the `watermark' |
| has been reset by calling |
| @findex vmaxset |
| @code{vmaxset}. This is done automatically by the wrapper code calling |
| primitives and @code{.Internal} functions (and also by the wrapper code |
| to @code{.Call} and @code{.External}), but |
| @findex vmaxget |
| @code{vmaxget} and @code{vmaxset} can be used to reset the watermark |
| from within internal code if the memory is only required for a short |
| time. |
| |
| @findex alloca |
| All of the methods of memory allocation mentioned so far are relatively |
| expensive. All @R{} platforms support @code{alloca}, and in almost all |
| cases@footnote{but apparently not on Windows.} this is managed by the |
| compiler, allocates memory on the C stack and is very efficient. |
| |
| There are two disadvantages in using @code{alloca}. First, it is |
| fragile and care is needed to avoid writing (or even reading) outside |
| the bounds of the allocation block returned. Second, it increases the |
| danger of overflowing the C stack. It is suggested that it is only |
| used for smallish allocations (up to tens of thousands of bytes), and |
| that |
| |
| @findex R_CheckStack |
| @example |
| R_CheckStack(); |
| @end example |
| |
| @noindent |
| is called immediately after the allocation (as @R{}'s stack checking |
| mechanism will warn far enough from the stack limit to allow for modest |
| use of @code{alloca}). (@code{do_makeunique} in file @file{src/main/unique.c}
| provides an example of both points.) |
| |
| There is an alternative check, |
| @findex R_CheckStack2 |
| @example |
| R_CheckStack2(size_t extra); |
| @end example |
| |
| @noindent |
| to be called immediately @emph{before} trying an allocation of |
| @code{extra} bytes. |
| |
| An alternative strategy has been used for various functions which |
| require intermediate blocks of storage of varying but usually small |
| size, and this has been consolidated into the routines in the header |
| file @file{src/main/RBufferUtils.h}. This uses a structure which |
| contains a buffer, the current size and the default size. A call to |
| @findex R_AllocStringBuffer |
| @example |
| R_AllocStringBuffer(size_t blen, R_StringBuffer *buf); |
| @end example |
| |
| @noindent |
| sets @code{buf->data} to a memory area of at least @code{blen+1} bytes. |
| At least the default size is used, which means that for small |
| allocations the same buffer can be reused. A call to |
| @findex R_FreeStringBufferL |
| @findex R_FreeStringBuffer |
| @code{R_FreeStringBufferL} releases memory if more than the default has |
| been allocated whereas a call to @code{R_FreeStringBuffer} frees any |
| memory allocated. |
| |
| The @code{R_StringBuffer} structure needs to be initialized, for example by |
| |
| @example |
| static R_StringBuffer ex_buff = @{NULL, 0, MAXELTSIZE@}; |
| @end example |
| |
| @noindent |
| which uses a default size of @code{MAXELTSIZE = 8192} bytes. Most |
| current uses have a static @code{R_StringBuffer} structure, which |
| allows the (default-sized) buffer to be shared between calls to e.g.@: |
| @code{grep} and even between functions: this will need to be changed if |
| @R{} ever allows concurrent evaluation threads. So the idiom is |
| |
| @example |
| static R_StringBuffer ex_buff = @{NULL, 0, MAXELTSIZE@}; |
| ... |
| char *buf; |
| for(i = 0; i < n; i++) @{
|     /* compute len */
|     buf = R_AllocStringBuffer(len, &ex_buff);
|     /* use buf */
| @}
| /* free allocation if larger than the default, but leave |
| default allocated for future use */ |
| R_FreeStringBufferL(&ex_buff); |
| @end example |
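| 
| The buffer behaviour described in this section can be modelled without any
| @R{} headers. The sketch below is a toy reimplementation for illustration
| only: the @code{toy_} names are hypothetical, the field names are assumed,
| and the policy of rounding the size up to a multiple of the default size
| is an assumption chosen to satisfy `at least the default size is used'.
| 
```c
#include <stdlib.h>

/* Toy counterpart of the structure: buffer, current size, default size. */
typedef struct {
    char *data;
    size_t bufsize;
    size_t defaultSize;   /* must be greater than zero */
} toy_StringBuffer;

/* Make buf->data hold at least blen+1 bytes; reuse it if big enough.
   Returns NULL on allocation failure (the old buffer stays valid). */
char *toy_AllocStringBuffer(size_t blen, toy_StringBuffer *buf)
{
    size_t need = blen + 1;
    if (buf->data != NULL && need <= buf->bufsize)
        return buf->data;

    /* round up to a multiple of the default size (assumed policy) */
    size_t size = ((need + buf->defaultSize - 1) / buf->defaultSize)
                  * buf->defaultSize;
    char *p = realloc(buf->data, size);
    if (p == NULL) return NULL;
    if (buf->data == NULL) p[0] = '\0';   /* fresh buffer: empty string */
    buf->data = p;
    buf->bufsize = size;
    return p;
}

/* Free whatever has been allocated. */
void toy_FreeStringBuffer(toy_StringBuffer *buf)
{
    free(buf->data);
    buf->data = NULL;
    buf->bufsize = 0;
}

/* Free only if more than the default size has been allocated. */
void toy_FreeStringBufferL(toy_StringBuffer *buf)
{
    if (buf->bufsize > buf->defaultSize)
        toy_FreeStringBuffer(buf);
}
```
| 
| With a default size of 16, requests for 5 and then 10 bytes share one
| 16-byte buffer that survives @code{toy_FreeStringBufferL}, whereas a
| request for 40 bytes grows the buffer and the same call then releases it.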
| |
| |
| @menu |
| * Internals of R_alloc:: |
| @end menu |
| |
| @node Internals of R_alloc, , Memory allocators, Memory allocators |
| @subsection Internals of R_alloc |
| |
| The memory used by @code{R_alloc} is allocated as @R{} vectors, of type |
| @code{RAWSXP}. Thus the allocation is in units of 8 bytes, and is |
| rounded up. A request for zero bytes currently returns @code{NULL} (but |
| this should not be relied on). For historical reasons, in all other |
| cases 1 byte is added before rounding up so the allocation is always |
| 1--8 bytes more than was asked for: again this should not be relied on. |
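| 
| The size rule just described can be checked with a little arithmetic. The
| function below is a hypothetical helper written for this illustration (it
| is not part of @R{}) and, like the behaviour it models, should not be
| relied on.
| 
```c
#include <stddef.h>

/* Bytes reserved for a request of n bytes under the rule in the text:
   nothing for n == 0, otherwise n + 1 rounded up to a multiple of 8.
   Overflow for huge n is not handled in this sketch. */
size_t toy_ralloc_size(size_t n)
{
    if (n == 0) return 0;
    return ((n + 1 + 7) / 8) * 8;
}
```
| 
| For example a request for 7 bytes reserves 8 (one more than asked for)
| and a request for 8 bytes reserves 16 (eight more).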
| |
| The vectors allocated are protected via the setting of @code{R_VStack}, |
| as the garbage collector marks everything that can be reached from that |
| location. When a vector is @code{R_alloc}ated, its @code{ATTRIB} |
| pointer is set to the current @code{R_VStack}, and @code{R_VStack} is |
| set to the latest allocation. Thus @code{R_VStack} is a single-linked |
| chain of the vectors currently allocated via @code{R_alloc}. Function |
| @code{vmaxset} resets the location @code{R_VStack}, and should be set to a
| value that has previously been obtained @emph{via} @code{vmaxget}:
| allocations after the value was obtained will no longer be protected and |
| hence available for garbage collection. |
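| 
| The chain and the effect of resetting the watermark can be illustrated by
| a standalone toy model. Nothing here is the code in @R{}: the
| @code{toy_} names are hypothetical, a plain struct stands in for a
| @code{RAWSXP} vector, and there is no garbage collector, so vectors
| dropped from the chain are simply no longer reachable from it.
| 
```c
#include <stdlib.h>

/* Toy vector: 'attrib' plays the role of the ATTRIB pointer. */
typedef struct toy_vec {
    struct toy_vec *attrib;
} toy_vec;

static toy_vec *toy_VStack = NULL;    /* head of the chain */

/* Allocate a toy vector and push it on the chain. */
toy_vec *toy_R_alloc(void)
{
    toy_vec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->attrib = toy_VStack;           /* link to the previous head */
    toy_VStack = v;                   /* newest allocation is the head */
    return v;
}

/* Record and reset the watermark. */
void *toy_vmaxget(void) { return toy_VStack; }
void toy_vmaxset(const void *ovmax) { toy_VStack = (toy_vec *) ovmax; }

/* Number of vectors still reachable from the chain head. */
int toy_chain_length(void)
{
    int n = 0;
    for (const toy_vec *v = toy_VStack; v != NULL; v = v->attrib) n++;
    return n;
}
```
| 
| Allocating one vector, recording the watermark, allocating two more and
| then resetting the watermark leaves only the first vector on the chain.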
| |
| @node Internal use of global and base environments, Modules, Memory allocators, R Internal Structures |
| @section Internal use of global and base environments |
| |
| This section notes known use by the system of these environments: the |
| intention is to minimize or eliminate such uses. |
| |
| @menu |
| * Base environment:: |
| * Global environment:: |
| @end menu |
| |
| @node Base environment, Global environment, Internal use of global and base environments, Internal use of global and base environments |
| @subsection Base environment |
| |
| @cindex base environment |
| @cindex environment, base |
| @findex .Device |
| @findex .Devices |
| The graphics devices system maintains two variables @code{.Device} and |
| @code{.Devices} in the base environment: both are always set. The |
| variable @code{.Devices} gives a list of character vectors of the names |
| of open devices, and @code{.Device} is the element corresponding to the |
| currently active device. The null device will always be open. |
| |
| @findex .Options |
| There appears to be a variable @code{.Options}, a pairlist giving the |
| current options settings. But in fact this is just a symbol with a |
| value assigned, and so shows up as a base variable. |
| |
| @findex .Last.value |
| Similarly, the evaluator creates a symbol @code{.Last.value} which |
| appears as a variable in the base environment. |
| |
| @findex .Traceback |
| @findex last.warning |
| Errors can give rise to objects @code{.Traceback} and |
| @code{last.warning} in the base environment. |
| |
| @node Global environment, , Base environment, Internal use of global and base environments |
| @subsection Global environment |
| |
| @cindex global environment |
| @cindex environment, global |
| @findex .Random.seed |
| The seed for the random number generator is stored in object |
| @code{.Random.seed} in the global environment. |
| |
| @findex dump.frames |
| Some error handlers may give rise to objects in the global environment: |
| for example @code{dump.frames} by default produces @code{last.dump}. |
| |
| @findex .SavedPlots |
| The @code{windows()} device makes use of a variable @code{.SavedPlots} |
| to store display lists of saved plots for later display. This is |
| regarded as a variable created by the user. |
| |
| |
| @node Modules, Visibility, Internal use of global and base environments, R Internal Structures |
| @section Modules |
| |
| @cindex modules |
| @R{} makes use of a number of shared objects/DLLs stored in the |
| @file{modules} directory. These are parts of the code which have been |
| chosen to be loaded `on demand' rather than linked as dynamic libraries |
| or incorporated into the main executable/dynamic library. |
| |
| The motivation for these modules has been the amount of (often
| optional) code they will bring in @emph{via} libraries to which they are
| linked.
| |
| @table @asis |
| |
| @item @code{internet} |
| The internal HTTP and FTP clients and socket support, which link to |
| system-specific support libraries. This may load @code{libcurl} and on |
| Windows will load @file{wininet.dll} and @file{ws2_32.dll}. |
| |
| @item @code{lapack} |
| The code which makes use of the LAPACK library, and is linked to |
| @file{libRlapack} or an external LAPACK library. |
| |
| @item @code{X11} |
| (Unix-alikes only.) The @code{X11()}, @code{jpeg()}, @code{png()} and |
| @code{tiff()} devices. These are optional, and link to some or all of
| the @code{X11}, @code{pango}, @code{cairo}, @code{jpeg}, @code{libpng} |
| and @code{libtiff} libraries. |
| @end table |
| |
| @node Visibility, Lazy loading, Modules, R Internal Structures |
| @section Visibility |
| @cindex visibility |
| |
| @menu |
| * Hiding C entry points:: |
| * Variables in Windows DLLs:: |
| @end menu |
| |
| @node Hiding C entry points, Variables in Windows DLLs, Visibility, Visibility |
| @subsection Hiding C entry points |
| |
| We make use of the visibility mechanisms discussed in |
| @ifset UseExternalXrefs |
| @ref{Controlling visibility, , Controlling visibility, R-exts, Writing R Extensions}.
| @end ifset
| @ifclear UseExternalXrefs
| section `Controlling Visibility' in `Writing R Extensions'.
| @end ifclear |
| C entry points not needed outside the main @R{} executable/dynamic |
| library (and in particular in no package nor module) should be prefixed |
| by @code{attribute_hidden}. |
| @findex attribute_hidden |
| Minimizing the visibility of symbols in the @R{} dynamic library will |
| speed up linking to it (which packages will do) and reduce the |
| possibility of linking to the wrong entry points of the same name. In |
| addition, on some platforms reducing the number of entry points allows |
| more efficient versions of PIC to be used: somewhat over half the entry |
| points are hidden. A convenient way to hide variables (as distinct from |
| functions) is to declare them @code{extern0} in header file @file{Defn.h}. |
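| 
| As a rough sketch of how such a prefix can work with a compiler that
| supports the visibility attribute, consider the following standalone
| fragment. The macro name @code{toy_attribute_hidden} and the preprocessor
| test guarding it are assumptions made for this illustration; the
| conditions under which @R{} itself defines @code{attribute_hidden} are
| not reproduced here.
| 
```c
/* Assumed guard: use the attribute with GCC-compatible compilers on
   non-Windows platforms, and expand to nothing elsewhere. */
#if defined(__GNUC__) && !defined(_WIN32)
# define toy_attribute_hidden __attribute__ ((visibility ("hidden")))
#else
# define toy_attribute_hidden
#endif

/* Internal helper: callable within the library being built, but not
   exported from it when the attribute is in effect. */
toy_attribute_hidden int toy_helper(int x)
{
    return 2 * x;
}

/* Public entry point with default visibility. */
int toy_entry(int x)
{
    return toy_helper(x) + 1;
}
```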
| |
| The visibility mechanism used is only available with some compilers and |
| platforms, and in particular not on Windows, where an alternative |
| mechanism is used. Entry points will not be made available in |
| @file{R.dll} if they are listed in the file |
| @file{src/gnuwin32/Rdll.hide}. |
| @findex Rdll.hide |
| Entries in that file start with a space and must be strictly in |
| alphabetic order in the C locale (use @command{sort} on the file to |
| ensure this if you change it). It is possible to hide Fortran as well |
| as C entry points via this file: the former are lower-cased and have an |
| underline as suffix, and the suffixed name should be included in the |
| file. Some entry points exist only on Windows or need to be visible |
| only on Windows, and some notes on these are provided in file |
| @file{src/gnuwin32/Maintainers.notes}.
| |
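| The two rules for the hide file (strict C-locale ordering, and Fortran
| names lower-cased with an underline suffix) can be expressed as small
| helpers. These are hypothetical functions written for illustration and
| are not part of the @R{} build: ordering in the C locale is modelled by
| byte-wise comparison with @code{strcmp}.
| 
```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if names[0..n-1] are in strictly increasing byte order,
   which is the order produced by sort in the C locale. */
int toy_is_c_sorted(const char *const *names, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (strcmp(names[i - 1], names[i]) >= 0) return 0;
    return 1;
}

/* Write the entry name for a Fortran routine into 'out': the name
   lower-cased with '_' appended.  Returns 0 on success, or -1 if
   'out' (of size outlen) is too small. */
int toy_fortran_entry(const char *name, char *out, size_t outlen)
{
    size_t len = strlen(name);
    if (outlen < len + 2) return -1;
    for (size_t i = 0; i < len; i++)
        out[i] = (char) tolower((unsigned char) name[i]);
    out[len] = '_';
    out[len + 1] = '\0';
    return 0;
}
```
| 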
| Because of the advantages of reducing the number of visible entry |
| points, they should be declared @code{attribute_hidden} where possible. |
| Note that this only has an effect on a shared-R-library build, and so |
| care is needed not to hide entry points that are legitimately used by |
| packages. So it is best if the decision on visibility is made when a |
| new entry point is created, including the decision if it should be |
| included in header file @file{Rinternals.h}. A list of the visible |
| entry points on a shared-R-library build on a reasonably standard
| Unix-alike can be made by something like |
| |
| @example |
| nm -g libR.so | grep ' [BCDT] ' | cut -b20- |
| @end example |
| |
| @node Variables in Windows DLLs, , Hiding C entry points, Visibility |
| @subsection Variables in Windows DLLs |
| |
| Windows is unique in that it conventionally treats importing variables |
| differently from functions: variables that are imported from a DLL need |
| to be specified by a prefix (often @samp{_imp_}) when being linked to |
| (`imported') but not when being linked from (`exported'). The details |
| depend on the compiler system, and have changed for MinGW during the |
| lifetime of that port. They are in the main hidden behind some macros |
| defined in header file @file{R_ext/libextern.h}. |
| |
| A (non-function) variable in the main @R{} sources that needs to be |
| referred to outside @file{R.dll} (in a package, module or another DLL |
| such as @file{Rgraphapp.dll}) should be declared with prefix |
| @code{LibExtern}. The main use is in @file{Rinternals.h}, but it needs |
| to be considered for any public header and also @file{Defn.h}. |
| |
| It would nowadays be possible to make use of the `auto-import' feature |
| of the MinGW port of @command{ld} to fix up imports from DLLs (and if |
| @R{} is built for the Cygwin platform this is what happens). However, |
| this was not possible when the MinGW build of @R{} was first constructed
| in ca 1998; it also allows less control of visibility and would not work
| for other Windows compiler suites.
| |
| It is only possible to check if this has been handled correctly by |
| compiling the @R{} sources on Windows. |
| |
| @node Lazy loading, , Visibility, R Internal Structures |
| @section Lazy loading |
| |
| Lazy loading is always used for code in packages but is optional |
| (selected by the package maintainer) for datasets in packages. When a |
| package/namespace which uses it is loaded, the package/namespace |
| environment is populated with promises for all the named objects: when |
| these promises are evaluated they load the actual code from a database. |
| |
| There are separate databases for code and data, stored in the @file{R} |
| and @file{data} subdirectories. The database consists of two files, |
| @file{@var{name}.rdb} and @file{@var{name}.rdx}. The @file{.rdb} file |
| is a concatenation of serialized objects, and the @file{.rdx} file |
| contains an index. The objects are stored in (usually) a |
| @command{gzip}-compressed format with a 4-byte header giving the |
| uncompressed serialized length (in XDR, that is big-endian, byte order) |
| and read by a call to the primitive @code{lazyLoadDBfetch}. (Note that |
| this makes lazy-loading unsuitable for really large objects: the |
| unserialized length of an @R{} object can exceed 4GB.) |
| |
| The index or `map' file @file{@var{name}.rdx} is a compressed serialized |
| @R{} object to be read by @code{readRDS}. It is a list with three |
| elements @code{variables}, @code{references} and @code{compressed}. The |
| first two are named lists of integer vectors of length 2 giving the |
| offset and length of the serialized object in the @file{@var{name}.rdb} |
| file. Element @code{variables} has an entry for each named object: |
| @code{references} serializes a temporary environment used when named |
| environments are added to the database. @code{compressed} is a logical |
| indicating if the serialized objects were compressed: compression is |
| always used nowadays. We later added the values @code{compressed = 2} |
| and @code{3} for @command{bzip2} and @command{xz} compression (with the |
| possibility of future expansion to other methods): these formats add a |
| fifth byte to the header for the type of compression, and store |
| serialized objects uncompressed if compression expands them. |
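| 
| The record header described above can be decoded as follows. This is a
| standalone sketch and not the code used by @code{lazyLoadDBfetch}: the
| struct and function names are hypothetical, the meaning of individual
| values of the fifth byte is deliberately not modelled, and the case
| @code{compressed = FALSE} is rejected rather than guessed at.
| 
```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t length;      /* uncompressed serialized length */
    int type;             /* fifth header byte, or -1 if absent */
    size_t payload_off;   /* offset of the data after the header */
} toy_rdb_header;

/* Read a 4-byte unsigned value in XDR (big-endian) byte order. */
static uint32_t toy_read_be32(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];
}

/* Parse the header of one record, given the 'compressed' value from
   the index: 1 means a 4-byte header, 2 or 3 a 5-byte header.
   Returns 0 on success, -1 for a short record or unmodelled value. */
int toy_parse_rdb_header(const unsigned char *rec, size_t reclen,
                         int compressed, toy_rdb_header *h)
{
    size_t hlen;
    if (compressed == 1) hlen = 4;
    else if (compressed == 2 || compressed == 3) hlen = 5;
    else return -1;
    if (reclen < hlen) return -1;

    h->length = toy_read_be32(rec);
    h->type = (hlen == 5) ? rec[4] : -1;
    h->payload_off = hlen;
    return 0;
}
```
| 
| Because the length field is only four bytes wide, the largest value it
| can hold is 4294967295, which is the limit behind the remark about really
| large objects.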
| |
| Source references are treated specially for performance reasons: bindings |
| @code{lines} and @code{parseData} from @code{srcfile} environments are |
| loaded lazily. This uses a mechanism that allows loading selected bindings |
| from an environment lazily. The key for such an environment is a list with two
| elements: @code{eagerKey} gives the length-two integer key for the bindings |
| loaded eagerly and @code{lazyKeys} gives a vector of length-two integer |
| keys, one for each lazily loaded binding. |
| |
| The loader for a lazy-load database of code or data is function |
| @code{lazyLoad} in the @pkg{base} package, but note that there is a |
| separate copy to load @pkg{base} itself in file |
| @file{R_HOME/base/R/base}. |
| |
| Lazy-load databases are created by the code in |
| @file{src/library/tools/R/makeLazyLoad.R}: the main tool is the |
| unexported function @code{makeLazyLoadDB} and the insertion of database |
| entries is done by calls to @code{.Call("R_lazyLoadDBinsertValue", |
| ...)}. |
| |
| Lazy-load databases of less than 10MB are cached in memory at first use: |
| this was found necessary when using file systems with high latency |
| (removable devices and network-mounted file systems on Windows). |
| |
| Lazy-load databases are loaded into the exports for a package, but not |
| into the namespace environment itself. Thus they are visible when the |
| package is @emph{attached}, and also @emph{via} the @code{::} operator. |
| This was a deliberate design decision, as packages mostly make datasets |
| available for use by the end user (or other packages), and they should |
| not be found preferentially from functions in the package, surprising |
| users who expected the normal search path to be used. (There is an |
| alternative mechanism, @file{sysdata.rda}, for `system datasets' that |
| are intended primarily to be used within the package.) |
| |
| The same database mechanism is used to store parsed @file{Rd} files. |
| One or all of the parsed objects is fetched by a call to |
| @code{tools:::fetchRdDB}. |
| |
| @node .Internal vs .Primitive, Internationalization in the R sources, R Internal Structures, Top |
| @chapter @code{.Internal} vs @code{.Primitive} |
| |
| @findex .Internal |
| @findex .Primitive |
| C code compiled into @R{} at build time can be called directly in what |
| are termed @emph{primitives} or via the @code{.Internal} interface, |
| which is very similar to the @code{.External} interface except in |
| syntax. More precisely, @R{} maintains a table of @R{} function names and |
| corresponding C functions to call, which by convention all start with |
| @samp{do_} and return a @code{SEXP}. This table (@code{R_FunTab} in |
| file @file{src/main/names.c}) also specifies how many arguments to a |
| function are required or allowed, whether or not the arguments are to be |
| evaluated before calling, and whether the function is `internal' in |
| the sense that it must be accessed via the @code{.Internal} interface, |
| or directly accessible in which case it is printed in @R{} as |
| @code{.Primitive}. |
| |
| Functions using @code{.Internal()} wrapped in a closure are in general |
| preferred as this ensures standard handling of named and default |
| arguments. For example, @code{grep} is defined as |
| |
| @example |
| @group |
| grep <- |
| function (pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, |
| fixed = FALSE, useBytes = FALSE, invert = FALSE) |
| @{ |
| if (!is.character(x)) x <- structure(as.character(x), names = names(x)) |
| .Internal(grep(as.character(pattern), x, ignore.case, value, |
| perl, fixed, useBytes, invert)) |
| @} |
| |
| @end group |
| @end example |
| @noindent |
| and the use of @code{as.character} allows methods to be dispatched (for |
| example, for factors). |
| |
| However, for reasons of convenience and also efficiency (as there is |
| some overhead in using the @code{.Internal} interface wrapped in a |
| function closure), the primitive functions are exceptions that can be |
| accessed directly. And of course, primitive functions are needed for |
| basic operations---for example @code{.Internal} is itself a primitive. |
| Note that primitive functions make no use of @R{} code, and hence are |
| very different from the usual interpreted functions. In particular, |
| @code{formals} and @code{body} return @code{NULL} for such objects, and |
| argument matching can be handled differently. For some primitives |
| (including @code{call}, @code{switch}, @code{.C} and @code{.subset}) |
| positional matching is important to avoid partial matching of the first |
| argument. |
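| |
| These differences are easy to see from the @R{} prompt. The following |
| lines are only an illustration, using @code{sum} as an example of a |
| primitive and @code{grep} as an example of a closure wrapping |
| @code{.Internal}. |
| |
| @example |
| @group |
| is.primitive(sum)           # TRUE |
| is.primitive(grep)          # FALSE: a closure wrapping .Internal |
| formals(sum)                # NULL |
| body(sum)                   # NULL |
| names(formals(grep))[1:2]   # "pattern" "x" |
| @end group |
| @end example |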
| |
| The list of primitive functions is subject to change; currently, it |
| includes the following. |
| |
| @enumerate |
| |
| @item |
| ``Special functions'' which really are @emph{language} elements, but |
| implemented as primitive functions: |
| |
| @example |
| @group |
| @{ ( if for while repeat break next |
| return function quote switch |
| @end group |
| @end example |
| |
| @item |
| Language elements and basic @emph{operator}s (i.e., functions usually |
| @emph{not} called as @code{foo(a, b, ...)}) for subsetting, assignment, |
| arithmetic, comparison and logic: |
| |
| @example |
| @group |
| [ [[ $ @@ |
| <- <<- = [<- [[<- $<- @@<- |
| |
| + - * / ^ %% %*% %/% |
| < <= == != >= > |
| | || & && ! |
| @end group |
| @end example |
| |
| @noindent |
| When the arithmetic, comparison and logical operators are called as |
| functions, any argument names are discarded so positional matching is used. |
| |
| @item |
| ``Low level'' 0-- and 1--argument functions which belong to one of the |
| following groups of functions: |
| |
| @enumerate a |
| @item |
| Basic mathematical functions with a single argument, i.e., |
| |
| @example |
| @group |
| abs sign sqrt |
| floor ceiling |
| @end group |
| |
| @group |
| exp expm1 |
| log2 log10 log1p |
| cos sin tan |
| acos asin atan |
| cosh sinh tanh |
| acosh asinh atanh |
| cospi sinpi tanpi |
| @end group |
| |
| @group |
| gamma lgamma digamma trigamma |
| @end group |
| |
| @group |
| cumsum cumprod cummax cummin |
| @end group |
| |
| @group |
| Im Re Arg Conj Mod |
| @end group |
| @end example |
| |
| @code{log} is a primitive function of one or two arguments with named |
| argument matching. |
| |
| @code{trunc} is a difficult case: it is a primitive that can have one |
| or more arguments: the default method handled in the primitive has |
| only one. |
| |
| @item |
| Functions rarely used outside of ``programming'' (i.e., mostly used |
| inside other functions), such as |
| |
| @example |
| @group |
| nargs missing on.exit interactive |
| as.call as.character as.complex as.double |
| as.environment as.integer as.logical as.raw |
| is.array is.atomic is.call is.character |
| is.complex is.double is.environment is.expression |
| is.finite is.function is.infinite is.integer |
| is.language is.list is.logical is.matrix |
| is.na is.name is.nan is.null |
| is.numeric is.object is.pairlist is.raw |
| is.real is.recursive is.single is.symbol |
| baseenv emptyenv globalenv pos.to.env |
| unclass invisible seq_along seq_len |
| @end group |
| @end example |
| |
| @item |
| The programming and session management utilities |
| |
| @example |
| @group |
| browser proc.time gc.time tracemem retracemem untracemem |
| @end group |
| @end example |
| |
| @end enumerate |
| |
| @item |
| The following basic replacement and extractor functions |
| |
| @example |
| @group |
| length length<- |
| class class<- |
| oldClass oldClass<- |
| attr attr<- |
| attributes attributes<- |
| names names<- |
| dim dim<- |
| dimnames dimnames<- |
| environment<- |
| levels<- |
| storage.mode<- |
| @end group |
| @end example |
| |
| @findex NAMED |
| @noindent |
| Note that optimizing @code{NAMED = 1} is only effective within a |
| primitive (as the closure wrapper of a @code{.Internal} will set |
| @code{NAMED = NAMEDMAX} when the promise to the argument is evaluated) and |
| hence replacement functions should where possible be primitive to avoid |
| copying (at least in their default methods). |
| [The @code{NAMED} mechanism has been replaced by reference counting.] |
| |
| @item |
| The following functions are primitive for efficiency reasons: |
| |
| @example |
| @group |
| : ~ c list |
| call expression substitute |
| UseMethod standardGeneric |
| .C .Fortran .Call .External |
| round signif rep seq.int |
| @end group |
| @end example |
| |
| @noindent |
| as well as the following internal-use-only functions |
| |
| @example |
| @group |
| .Primitive .Internal |
| .Call.graphics .External.graphics |
| .subset .subset2 |
| .primTrace .primUntrace |
| lazyLoadDBfetch |
| @end group |
| @end example |
| |
| @end enumerate |
| |
| |
| The multi-argument primitives |
| @example |
| @group |
| call switch |
| .C .Fortran .Call .External |
| @end group |
| @end example |
| |
| @noindent |
| intentionally use positional matching, and need to do so to avoid |
| partial matching to their first argument. They do check that the first |
| argument is unnamed or, for the first two, partially matches the formal |
| argument name. On the other hand, |
| |
| @example |
| @group |
| attr attr<- browser retracemem substitute UseMethod |
| log round signif rep seq.int |
| @end group |
| @end example |
| |
| @noindent |
| manage their own argument matching and do work in the standard way. |
| |
| All the one-argument primitives check that, if they are called with a |
| named argument, the name (partially) matches the one given in the |
| documentation: this is also done for replacement functions with one |
| argument plus @code{value}. |
| |
| The net effect is that argument matching for primitives intended for |
| end-user use @emph{as functions} is done in the same way as for |
| interpreted functions except for the six exceptions where positional |
| matching is required. |
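| |
| The matching rules described above can be tried out as follows. This is |
| a small sketch rather than a test of every case, and the particular |
| calls are chosen only for illustration. |
| |
| @example |
| @group |
| ## operators called as functions: argument names are discarded, |
| ## so this is evaluated positionally as 1 - 10 |
| "-"(e2 = 1, e1 = 10) |
| |
| ## one-argument primitives check the supplied argument name |
| sqrt(x = 4) |
| tryCatch(sqrt(y = 4), error = function(e) conditionMessage(e)) |
| |
| ## primitives doing their own matching behave like closures |
| log(base = 2, x = 8) |
| rep(times = 2, x = 1:3) |
| @end group |
| @end example |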
| |
| @menu |
| * Special primitives:: |
| * Special internals:: |
| * Prototypes for primitives:: |
| * Adding a primitive:: |
| @end menu |
| |
| @node Special primitives, Special internals, .Internal vs .Primitive, .Internal vs .Primitive |
| @section Special primitives |
| |
| A small number of primitives are @emph{specials} rather than |
| @emph{builtins}, that is they are entered with unevaluated arguments. |
| This is clearly necessary for the language constructs and the assignment |
| operators, as well as for @code{&&} and @code{||} which conditionally |
| evaluate their second argument, and @code{~}, @code{.Internal}, |
| @code{call}, @code{expression}, @code{missing}, @code{on.exit}, |
| @code{quote} and @code{substitute} which do not evaluate some of their |
| arguments. |
| |
| @code{rep} and @code{seq.int} are special as they evaluate some of their |
| arguments conditional on which are non-missing. |
| |
| @code{log}, @code{round} and @code{signif} are special to allow default |
| values to be given to missing arguments. |
| |
| The subsetting, subassignment and @code{@@} operators are all special. |
| (For both extraction and replacement forms, @code{$} and @code{@@} |
| take a symbol argument, and @code{[} and @code{[[} allow missing |
| arguments.) |
| |
| @code{UseMethod} is special to avoid the additional contexts added to |
| calls to builtins. |
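| |
| The distinction is visible from @R{} via @code{typeof}, which reports |
| @code{"special"} or @code{"builtin"} for primitives. The lines below |
| are an illustrative sketch using a few of the functions named above, |
| with @code{sum} added as an example of a builtin. |
| |
| @example |
| @group |
| typeof(quote)        # "special" |
| typeof(get("&&"))    # "special" |
| typeof(get("["))     # "special" |
| typeof(sum)          # "builtin" |
| |
| ## the second operand is never evaluated here, so no error is raised |
| FALSE && stop("not reached") |
| @end group |
| @end example |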
| |
| @node Special internals, Prototypes for primitives, Special primitives, .Internal vs .Primitive |
| @section Special internals |
| |
| There are also special @code{.Internal} functions: @code{NextMethod}, |
| @code{Recall}, @code{withVisible}, @code{cbind}, @code{rbind} (to allow |
| for the @code{deparse.level} argument), @code{eapply}, @code{lapply} and |
| @code{vapply}. |
| |
| @node Prototypes for primitives, Adding a primitive, Special internals, .Internal vs .Primitive |
| @section Prototypes for primitives |
| |
| Prototypes are available for the primitive functions and operators, and |
| these are used for printing, @code{args} and package checking (e.g.@: by |
| @code{tools::checkS3methods} and by package @CRANpkg{codetools}). There are |
| two environments in the @pkg{base} package (and namespace), |
| @samp{.GenericArgsEnv} for those primitives which are internal S3 |
| generics, and @samp{.ArgsEnv} for the rest. Those environments contain |
| closures with the same names as the primitives, formal arguments derived |
| (manually) from the help pages, a body which is a suitable call to |
| @code{UseMethod} or @code{NULL} and environment the base namespace. |
| |
| The C code for @code{print.default} and @code{args} uses the closures in |
| these environments in preference to the definitions in base (as |
| primitives). |
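| |
| The prototypes can be examined from @R{} code. The following sketch |
| uses @code{sum} and @code{length} purely as examples, and makes no |
| assumption about which other names the two environments contain. |
| |
| @example |
| @group |
| args(sum) |
| names(formals(args(sum)))    # "..." "na.rm" |
| exists("length", envir = .GenericArgsEnv, inherits = FALSE) |
| head(ls(.ArgsEnv)) |
| @end group |
| @end example |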
| |
| The QC function @code{undoc} checks that all the functions prototyped in |
| these environments are currently primitive, and that the primitives not |
| included are better thought of as language elements (at the time of |
| writing |
| |
| @example |
| $ $<- && ( : @@ @@<- [ [[ [[<- [<- @{ || ~ <- <<- = |
| break for function if next repeat return while |
| @end example |
| |
| @noindent |
| ). One could argue about @code{~}, but it is known to the parser and has |
| semantics quite unlike a normal function. And @code{:} is documented |
| with different argument names in its two meanings. |
| |
| The QC functions @code{codoc} and @code{checkS3methods} also make use of |
| these environments (effectively placing them in front of base in the |
| search path), and hence the formals of the functions they contain are |
| checked against the help pages by @code{codoc}. However, there are two |
| problems with the generic primitives. The first is that many of the |
| operators are part of the S3 group generic @code{Ops} and that defines |
| their arguments to be @code{e1} and @code{e2}: although it would be very |
| unusual, an operator could be called as e.g.@: @code{"+"(e1=a, e2=b)} |
| and if method dispatch occurred to a closure, there would be an argument |
| name mismatch. So the definitions in environment @code{.GenericArgsEnv} |
| have to use argument names @code{e1} and @code{e2} even though the |
| traditional documentation is in terms of @code{x} and @code{y}: |
| @code{codoc} makes the appropriate adjustment via |
| @code{tools:::.make_S3_primitive_generic_env}. The second discrepancy |
| is with the @code{Math} group generics, where the group generic is |
| defined with argument list @code{(x, ...)}, but most of the members only |
| allow one argument when used as the default method (and @code{round} and |
| @code{signif} allow two as default methods): again fix-ups are used. |
| |
| Those primitives which are in @code{.GenericArgsEnv} are checked (via |
| @file{tests/primitives.R}) to be generic @emph{via} defining methods for |
| them, and a check is made that the remaining primitives are probably not |
| generic, by setting a method and checking it is not dispatched to (but |
| this can fail for other reasons). However, there is no certain way to |
| know whether other @code{.Internal} or primitive functions are |
| internally generic except by reading the source code. |
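| |
| The idea behind that check can be sketched in a few lines. This is not |
| the code in @file{tests/primitives.R}: the class name @code{demo} and |
| the method defined here are hypothetical and exist only for the |
| example. |
| |
| @example |
| @group |
| length.demo <- function(x) 99L |
| obj <- structure(list(), class = "demo") |
| length(obj)                  # 99: the method was dispatched to |
| "length" %in% .S3PrimitiveGenerics |
| rm(length.demo) |
| @end group |
| @end example |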
| |
| @node Adding a primitive, , Prototypes for primitives, .Internal vs .Primitive |
| @section Adding a primitive |
| |
| [For R-core use: reverse this procedure to remove a primitive. Most |
| commonly this is done by changing a @code{.Internal} to a primitive or |
| @emph{vice versa}.] |
| |
| Primitives are listed in the table @code{R_FunTab} in |
| @file{src/main/names.c}: primitives have @samp{Y = 0} in the @samp{eval} |
| field. |
| |
| There needs to be an @samp{\alias} entry in a help file in the @pkg{base} |
| package, and the primitive needs to be added to one of the lists at the |
| start of this chapter. |
| |
| Some primitives are regarded as language elements (the current ones are |
| listed above). These need to be added to two lists of exceptions, |
| @code{langElts} in @code{undoc()} (in file |
| @file{src/library/tools/R/QC.R}) and @code{lang_elements} in |
| @file{tests/primitives.R}. |
| |
| All other primitives are regarded as functions and should be listed in |
| one of the environments defined in @file{src/library/base/R/zzz.R}, |
| either @code{.ArgsEnv} or @code{.GenericArgsEnv}: internal generics also |
| need to be listed in the character vector @code{.S3PrimitiveGenerics}. |
| Note too the discussion about argument matching above: if you add a |
| primitive function with more than one argument by converting a |
| @code{.Internal} you need to add argument matching to the C code, and |
| for those with a single argument, add argument-name checking. |
| |
| Do ensure that @command{make check-devel} has been run: that tests most |
| of these requirements. |
| |
| @node Internationalization in the R sources, Package Structure, .Internal vs .Primitive, Top |
| @chapter Internationalization in the R sources |
| |
| The process of marking messages (errors, warnings etc) for translation |
| in an @R{} package is described in |
| @ifset UseExternalXrefs |
| @ref{Internationalization, , Internationalization, R-exts, Writing R Extensions}, |
| @end ifset |
| @ifclear UseExternalXrefs |
| `Writing R Extensions', |
| @end ifclear |
| and the standard packages included with @R{} have (with an exception in |
| @pkg{grDevices} for the menus of the @code{windows()} device) been |
| internationalized in the same way as other packages. |
| |
| @menu |
| * R code:: |
| * Main C code:: |
| * Windows-GUI-specific code:: |
| * macOS GUI:: |
| * Updating:: |
| @end menu |
| |
| @node R code, Main C code, Internationalization in the R sources, Internationalization in the R sources |
| @section R code |
| |
| Internationalization for @R{} code is done in exactly the same way as |
| for extension packages. As all standard packages which have @R{} code |
| also have a namespace, it is never necessary to specify @code{domain}, |
| but for efficiency calls to @code{message}, @code{warning} and |
| @code{stop} should include @code{domain = NA} when the message is |
| constructed @emph{via} @code{gettextf}, @code{gettext} or |
| @code{ngettext}. |
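| |
| A minimal sketch of this convention follows. The functions |
| @code{check_n} and @code{files_msg} are hypothetical and are not taken |
| from the @R{} sources. |
| |
| @example |
| @group |
| check_n <- function(n) |
|     if (n < 0) stop(gettextf("'%s' must be non-negative", "n"), domain = NA) |
| |
| files_msg <- function(n) |
|     message(sprintf(ngettext(n, "%d file", "%d files"), n), domain = NA) |
| @end group |
| @end example |
| |
| @noindent |
| As the message has already been translated when it was constructed, |
| @code{domain = NA} stops @code{stop} and @code{message} from looking it |
| up a second time. |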
| |
| For each package, the extracted messages and translation sources are |
| stored under package directory @file{po} in the source package, and |
| compiled translations under @file{inst/po} for installation to package |
| directory @file{po} in the installed package. This also applies to C |
| code in packages. |
| |
| @node Main C code, Windows-GUI-specific code, R code, Internationalization in the R sources |
| @section Main C code |
| |
| The main C code (e.g.@: that in files @file{src/*/*.c} and in |
| the modules) is where @R{} is closest to the sort of application for |
| which @samp{gettext} was written. Messages in the main C code are in |
| domain @code{R} and stored in the top-level directory @file{po} with |
| compiled translations under @file{share/locale}. |
| |
| The list of files covered by the @R{} domain is specified in file |
| @file{po/POTFILES.in}. |
| |
| The normal way to mark messages for translation is via @code{_("msg")} |
| just as for packages. However, sometimes one needs to mark passages for |
| translation without wanting them translated at the time, for example |
| when declaring string constants. This is the purpose of the @code{N_} |
| macro, for example |
| |
| @example |
| @{ ERROR_ARGTYPE, N_("invalid argument type")@}, |
| @end example |
| |
| @noindent |
| from file @file{src/main/errors.c}. |
| |
| The @code{P_} macro |
| |
| @example |
| #ifdef ENABLE_NLS |
| #define P_(StringS, StringP, N) ngettext (StringS, StringP, N) |
| #else |
| #define P_(StringS, StringP, N) (N > 1 ? StringP: StringS) |
| #endif |
| @end example |
| |
| @noindent |
| may be used |
| as a wrapper for @code{ngettext}: however in some cases the preferred |
| approach has been to conditionalize (on @code{ENABLE_NLS}) code using |
| @code{ngettext}. |
| |
| The macro @code{_("msg")} can safely be used in directory |
| @file{src/appl}; the header for standalone @samp{nmath} skips possible |
| translation. (This does not apply to @code{N_} or @code{P_}.) |
| |
| |
| @node Windows-GUI-specific code, macOS GUI, Main C code, Internationalization in the R sources |
| @section Windows-GUI-specific code |
| |
| Messages for the Windows GUI are in a separate domain @samp{RGui}. This |
| was done for two reasons: |
| |
| @itemize |
| @item |
| The translators for the Windows version of @R{} might be separate from |
| those for the rest of @R{} (familiarity with the GUI helps), and |
| |
| @item |
| Messages for Windows are most naturally handled in the native charset |
| for the language, and in the case of CJK languages the charset is |
| Windows-specific. (It transpires that as the @code{iconv} we ported |
| works well under Windows, this is less important than anticipated.) |
| @end itemize |
| |
| Messages for the @samp{RGui} domain are marked by @code{G_("msg")}, a |
| macro that is defined in header file @file{src/gnuwin32/win-nls.h}. The |
| list of files that are considered is hardcoded in function @code{update_RGui_po} |
| in file @file{src/library/tools/R/translations.R}, which is invoked via the |
| @code{update-RGui} target of file @file{po/Makefile.win}: note |
| that this includes @file{devWindows.c} as the menus on the |
| @code{windows} device are considered to be part of the GUI. (There is |
| also @code{GN_("msg")}, the analogue of @code{N_("msg")}.) |
| |
| The template and message catalogs for the @samp{RGui} domain are in the |
| top-level @file{po} directory of package @pkg{base}. |
| |
| |
| @node macOS GUI, Updating, Windows-GUI-specific code, Internationalization in the R sources |
| @section macOS GUI |
| |
| This is handled separately: see |
| @uref{https://developer.r-project.org/Translations30.html}. |
| |
| |
| @node Updating, , macOS GUI, Internationalization in the R sources |
| @section Updating |
| |
| See file @file{po/README} for how to update the message templates and catalogs. |
| |
| @node Package Structure, Files, Internationalization in the R sources, Top |
| @chapter Structure of an Installed Package |
| |
| @menu |
| * Metadata:: |
| * Help:: |
| @end menu |
| |
| The structure of a @emph{source} package is described in @ref{Creating |
| R packages, , Creating R packages, R-exts, Writing R Extensions}: this |
| chapter is concerned with the structure of @emph{installed} packages. |
| |
| An installed package has a top-level file @file{DESCRIPTION}, a copy of |
| the file of that name in the package sources with a @samp{Built} field |
| appended, and file @file{INDEX}, usually describing the objects on which |
| help is available, a file @file{NAMESPACE} if the package has a |
| namespace, optional files such as @file{CITATION}, @file{LICENCE} and |
| @file{NEWS}, and any other files copied in from @file{inst}. It will |
| have directories @file{Meta}, @file{help} and @file{html} (even if the |
| package has no help pages), almost always has a directory @file{R} and |
| often has a directory @file{libs} to contain compiled code. Other |
| directories with known meaning to @R{} are @file{data}, @file{demo}, |
| @file{doc} and @file{po}. |
| |
| Function @code{library} looks for a namespace and if one is found |
| passes control to @code{loadNamespace}. Then @code{library} or |
| @code{loadNamespace} looks for file @file{R/@var{pkgname}}, warns if it |
| is not found and otherwise sources the code (using @code{sys.source}) |
| into the package's environment, then lazy-loads a database |
| @file{R/sysdata} if present. So how @R{} code gets loaded depends on |
| the contents of @file{R/@var{pkgname}}: a standard template to load |
| lazy-load databases is provided in @file{share/R/nspackloader.R}. |
| |
| Compiled code is usually loaded when the package's namespace is loaded |
| by a @code{useDynlib} directive in a @file{NAMESPACE} file or by the |
| package's @code{.onLoad} function. Conventionally compiled code is |
| loaded by a call to @code{library.dynam} and this looks in directory |
| @file{libs} (and in an appropriate sub-directory if sub-architectures |
| are in use) for a shared object (Unix-alike) or DLL (Windows). |
| |
| Subdirectory @file{data} serves two purposes. In a package using |
| lazy-loading of data, it contains a lazy-load database @file{Rdata}, |
| plus a file @file{Rdata.rds} which contains a named character vector used |
| by @code{data()} in the (unusual) event that it is used for such a |
| package. Otherwise it is a copy of the @file{data} directory in the |
| sources, with saved images re-compressed if @command{R CMD INSTALL |
| --resave-data} was used. |
| |
| Subdirectory @file{demo} supports the @code{demo} function, and is |
| copied from the sources. |
| |
| Subdirectory @file{po} contains (in subdirectories) compiled message |
| catalogs. |
| |
| @node Metadata, Help, Package Structure, Package Structure |
| @section Metadata |
| |
| Directory @file{Meta} contains several files in @code{.rds} format, that |
| is serialized @R{} objects written by @code{saveRDS}. All packages |
| have files @file{Rd.rds}, @file{hsearch.rds}, @file{links.rds}, |
| @file{features.rds}, and |
| @file{package.rds}. Packages with namespaces have a file |
| @file{nsInfo.rds}, and those with data, demos or vignettes have |
| @file{data.rds}, @file{demo.rds} or @file{vignette.rds} files. |
| |
| The structure of these files (and their existence and names) is private |
| to @R{}, so the description here is for those trying to follow the @R{} |
| sources: there should be no reference to these files in non-base |
| packages. |
| |
| File @file{package.rds} is a dump of information extracted from the |
| @file{DESCRIPTION} file. It is a list of several components. The |
| first, @samp{DESCRIPTION}, is a character vector, the @file{DESCRIPTION} |
| file as read by @code{read.dcf}. Further elements @samp{Depends}, |
| @samp{Suggests}, @samp{Imports}, @samp{Rdepends} and @samp{Rdepends2} |
| record the @samp{Depends}, @samp{Suggests} and @samp{Imports} fields. |
| These are all lists, and can be empty. The first three have an entry |
| for each package named, each entry being a list of length 1 or 3, with |
| element @samp{name} (the package name) and optional elements @samp{op} |
| (a character string) and @samp{version} (an object of class |
| @samp{"package_version"}). Element @samp{Rdepends} is used for the |
| first version dependency on @R{}, and @samp{Rdepends2} is a list of zero |
| or more @R{} version dependencies---each is a three-element list of the |
| form described for packages. Element @samp{Rdepends} is no longer used, |
| but it is still potentially needed so @R{} < 2.7.0 can detect that the |
| package was not installed for it. |
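| |
| When following the sources it can help to look at one of these files. |
| The sketch below reads the file for package @pkg{stats}, chosen |
| arbitrarily. As the format is private to @R{}, this is for exploration |
| only. |
| |
| @example |
| @group |
| meta <- readRDS(system.file("Meta", "package.rds", package = "stats")) |
| names(meta) |
| meta$DESCRIPTION[["Package"]] |
| meta$Imports |
| @end group |
| @end example |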
| |
| File @file{nsInfo.rds} records a list, a parsed version of the |
| @file{NAMESPACE} file. |
| |
| File @file{Rd.rds} records a data frame with one row for each help file. |
| The columns are @samp{File} (the file name with extension), @samp{Name} |
| (the @samp{\name} section), @samp{Type} (from the optional |
| @samp{\docType} section), @samp{Title}, @samp{Encoding}, @samp{Aliases}, |
| @samp{Concepts} and @samp{Keywords}. All columns are character vectors |
| apart from @samp{Aliases}, which is a list of character vectors. |
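| |
| The column layout can be confirmed in the same exploratory way, again |
| using package @pkg{stats} only as a convenient example. |
| |
| @example |
| @group |
| rd <- readRDS(system.file("Meta", "Rd.rds", package = "stats")) |
| names(rd) |
| sapply(rd, class) |
| @end group |
| @end example |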
| |
| File @file{hsearch.rds} records the information to be used by |
| @samp{help.search}. This is a list of four unnamed elements which are |
| character matrices for help files, aliases, keywords and concepts. All |
| the matrices have columns @samp{ID} and @samp{Package} which are used to |
| tie the aliases, keywords and concepts (the remaining column of the last |
| three elements) to a particular help file. The first element has |
| further columns @samp{LibPath} (stored as @code{""} and filled in when |
| the file is loaded), @samp{name}, @samp{title}, @samp{topic} (the first |
| alias, used when presenting the results as |
| @samp{@var{pkgname}::@var{topic}}) and @samp{Encoding}. |
| |
| File @file{links.rds} records a named character vector, the names being |
| aliases and the values character strings of the form |
| @example |
| "../../@var{pkgname}/html/@var{filename}.html" |
| @end example |
| |
| File @file{data.rds} records a two-column character matrix with columns |
| of dataset names and titles from the corresponding help file. File |
| @file{demo.rds} has the same structure for package demos. |
| |
| File @file{vignette.rds} records a data frame with one row for each |
| `vignette' (@file{.[RS]nw} file in @file{inst/doc}) and with columns |
| @samp{File} (the full file path in the sources), @samp{Title}, |
| @samp{PDF} (the pathless file name of the installed PDF version, if |
| present), @samp{Depends}, @samp{Keywords} and @samp{R} (the pathless |
| file name of the installed @R{} code, if present). |
| |
| |
| @node Help, , Metadata, Package Structure |
| @section Help |
| |
| All installed packages, whether they had any @file{.Rd} files or not, |
| have @file{help} and @file{html} directories. The latter normally only |
| contains the single file @file{00Index.html}, the package index which |
| has hyperlinks to the help topics (if any). |
| |
| Directory @file{help} contains files @file{AnIndex}, @file{paths.rds} |
| and @file{@var{pkgname}.rd[bx]}. The latter two files are a lazy-load |
| database of parsed @file{.Rd} files, accessed by |
| @code{tools:::fetchRdDB}. File @file{paths.rds} is a saved character |
| vector of the original path names of the @file{.Rd} files, used when |
| updating the database. |
| |
| File @file{AnIndex} is a two-column tab-delimited file: the first column |
| contains the aliases defined in the help files and the second the |
| basename (without the @file{.Rd} or @file{.rd} extension) of the file |
| containing that alias. It is read by @code{utils:::index.search} to |
| search for files matching a topic (alias), and read by @code{scan} in |
| @code{utils:::matchAvailableTopics}, part of the completion system. |
| |
| File @file{aliases.rds} is the same information as @file{AnIndex} as a |
| named character vector (names the topics, values the file basename), for |
| faster access. |
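| |
| Putting these pieces together, a help topic can be traced from alias to |
| parsed @file{.Rd} object roughly as follows. This is a sketch for |
| package @pkg{stats} and topic @code{median}, both chosen arbitrarily. |
| It calls the unexported @code{tools:::fetchRdDB}, whose interface is |
| internal and may change. |
| |
| @example |
| @group |
| helpdir <- system.file("help", package = "stats") |
| aliases <- readRDS(file.path(helpdir, "aliases.rds")) |
| rdname <- aliases[["median"]] |
| ## the database is named after the package, without extension |
| rd <- tools:::fetchRdDB(file.path(helpdir, "stats"), rdname) |
| class(rd) |
| @end group |
| @end example |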
| |
| @node Files, Graphics Devices, Package Structure, Top |
| @chapter Files |
| |
| @R{} provides many functions to work with files and directories: many of |
| these have been added relatively recently to facilitate scripting in |
| @R{} and in particular the replacement of Perl scripts by @R{} scripts |
| in the management of @R{} itself. |
| |
| These functions are implemented by standard C/POSIX library calls, |
| except on Windows. That means that filenames must be encoded in the |
| current locale as the OS provides no other means to access the file |
| system: increasingly filenames are stored in UTF-8 and the OS will |
| translate filenames to UTF-8 in other locales. So using a UTF-8 locale |
| gives transparent access to the whole file system. |
| |
| Windows is another story. There the internal view of filenames is in |
| UTF-16LE (so-called `Unicode'), and standard C library calls can only |
| access files whose names can be expressed in the current codepage. To |
| circumvent that restriction, there is a parallel set of Windows-specific |
| calls which take wide-character arguments for filepaths. Much of the |
| file-handling in @R{} has been moved over to using these functions, so |
| filenames can be manipulated in @R{} as UTF-8 encoded character strings, |
| converted to wide characters (which on Windows are UTF-16LE) and passed |
| to the OS. The utilities @code{RC_fopen} and @code{filenameToWchar} |
| help this process. Currently @code{file.copy} to a directory, |
| @code{list.files}, @code{list.dirs} and @code{path.expand} work only |
| with filepaths encoded in the current codepage. |
| |
| All these functions do tilde expansion, in the same way as |
| @code{path.expand}, with the deliberate exception of @code{Sys.glob}. |
| |
| File names may be case sensitive or not: the latter is the norm on |
| Windows and macOS, the former on other Unix-alikes. Note that this |
| is a property of both the OS and the file system: it is often possible |
| to map names to upper or lower case when mounting the file system. This |
| can affect the matching of patterns in @code{list.files} and |
| @code{Sys.glob}. |
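| |
| Where the case of file names is uncertain, one option for |
| @code{list.files} is to ask for case-insensitive pattern matching |
| explicitly. The following sketch works in a temporary directory and |
| assumes a file system that preserves the case of names as they are |
| created. The file name @file{Data.CSV} is made up for the example. |
| |
| @example |
| @group |
| d <- tempfile("casedemo") |
| dir.create(d) |
| file.create(file.path(d, "Data.CSV")) |
| list.files(d, pattern = "\\.csv$")                      # character(0) |
| list.files(d, pattern = "\\.csv$", ignore.case = TRUE)  # "Data.CSV" |
| unlink(d, recursive = TRUE) |
| @end group |
| @end example |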
| |
| File names commonly contain spaces on Windows and macOS but not |
| elsewhere. As file names are handled as character strings by @R{}, |
| spaces are not usually a concern unless file names are passed to other |
| processes, e.g.@: by a @code{system} call. |
| |
| Windows has another couple of peculiarities. Whereas a POSIX file |
| system has a single root directory (and other physical file systems are |
| mounted onto logical directories under that root), Windows has separate |
| roots for each physical or logical file system (`volume'), organized |
| under @emph{drives} (with file paths starting @code{D:} for an |
| @acronym{ASCII} letter, case-insensitively) and @emph{network shares} |
| (with paths like @code{\\netname\topdir\myfiles\a file}). There is a |
| current drive, and path names without a drive part are relative to the |
| current drive. Further, each drive has a current directory, and |
| relative paths are relative to that current directory, on a particular |
| drive if one is specified. So @file{D:dir\file} and @file{D:} are valid |
| path specifications (the last being the current directory on drive |
| @file{D:}). |
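| |
| The distinctions drawn above can be sketched as a small classifier. |
| This is only an illustration: the enumeration and function names are |
| hypothetical and not part of @R{}, and treating @samp{/} as a |
| separator alongside the backslash is an assumption of the sketch. |
| |
```c
#include <stddef.h>

/* Hypothetical classification of a Windows path; not part of R. */
typedef enum {
    WINPATH_UNC,            /* network share: starts with two separators  */
    WINPATH_DRIVE_ABSOLUTE, /* D:\dir\file: rooted on a named drive       */
    WINPATH_DRIVE_RELATIVE, /* D:dir\file or D: : current dir on drive D  */
    WINPATH_ROOTED,         /* \dir\file: root of the current drive       */
    WINPATH_RELATIVE        /* dir\file: current dir on the current drive */
} winpath_kind;

/* Assumption: both separators are accepted. */
static int is_sep(char c) { return c == '\\' || c == '/'; }

/* Drive letters are ASCII letters, matched case-insensitively. */
static int is_ascii_letter(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

winpath_kind classify_winpath(const char *p)
{
    if (is_sep(p[0]) && is_sep(p[1]))
        return WINPATH_UNC;
    if (is_ascii_letter(p[0]) && p[1] == ':')
        return is_sep(p[2]) ? WINPATH_DRIVE_ABSOLUTE
                            : WINPATH_DRIVE_RELATIVE;
    if (is_sep(p[0]))
        return WINPATH_ROOTED;
    return WINPATH_RELATIVE;
}
```
| |
| With this sketch both @file{D:dir\file} and @file{D:} classify as |
| drive-relative, matching the description above. |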
| |
| @c basename Wchar na |
| @c dir.create Wchar ~ |
| @c dirname Wchar ~ |
| @c getwd |
| @c file.access Wchar ~ |
| @c file.append RC_fopen |
| @c file.copy no ~ (+ file.append) |
| @c file.create RC_fopen |
| @c file.edit UTF-8 in R code |
| @c file.exists Wchar ~ |
| @c file.info Wchar ~ |
| @c file.link 8-bit ~ |
| @c file.remove Wchar ~ |
| @c file.rename Wchar ~ |
| @c file.show UTF-8 in R code |
| @c file.symlink not ~ |
| @c file_test |
| @c list.dirs no ~ |
| @c list.files no ~ |
| @c normalizePath Wchar ~ |
| @c path.expand no |
| @c setwd Wchar ~ |
| @c Sys.chmod Wchar ~ |
| @c Sys.glob Wchar not |
| @c Sys.readlink not ~ |
| @c Sys.umask |
| @c unlink Wchar ~ |
| |
| |
| @node Graphics Devices, GUI consoles, Files, Top |
| @chapter Graphics |
| |
| @R{}'s graphics internals were re-designed to enable multiple graphics |
| systems to be installed on top of the graphics `engine' -- currently |
| there are two such systems, one supporting `base' graphics (based on |
| that in S and whose @R{} code@footnote{The C code is in files |
| @file{base.c}, @file{graphics.c}, @file{par.c}, @file{plot.c} and |
| @file{plot3d.c} in directory @file{src/main}.} is in package |
| @pkg{graphics}) and one implemented in package @pkg{grid}. |
| |
| Some notes on the historical changes can be found at |
| @uref{https://www.stat.auckland.ac.nz/~paul/R/basegraph.html} and |
| @uref{https://www.stat.auckland.ac.nz/~paul/R/graphicsChanges.html}. |
| |
| At the lowest level is a graphics device, which manages a plotting |
| surface (a screen window or a representation to be written to a file). |
| This implements a set of graphics primitives, to `draw' |
| |
| @itemize |
| @item a circle, optionally filled |
| @item a rectangle, optionally filled |
| @item a line |
| @item a set of connected lines |
| @item a polygon, optionally filled |
| @item a path, optionally filled using a winding rule |
| @item text |
| @item a raster image (optional) |
| @item and to set a clipping rectangle |
| @end itemize |
| |
| @noindent |
| as well as requests for information such as |
| |
| @itemize |
| @item the width of a string if plotted |
| @item the metrics (width, ascent, descent) of a single character |
| @item the current size of the plotting surface |
| @end itemize |
| |
| @noindent |
| and requests/opportunities to take action such as |
| |
| @itemize |
| @item start a new `page', possibly after responding to a request to ask |
| the user for confirmation. |
| @item return the position of the device pointer (if any). |
| @item when a device becomes the current device or stops being the current |
| device (this is usually used to change the window title on a screen |
| device). |
| @item when drawing starts or finishes (e.g.@: used to flush graphics to |
| the screen when drawing stops). |
| @item wait for an event, for example a mouse click or keypress. |
| @item an `onexit' action, to clean up if plotting is interrupted (by an |
| error or by the user). |
| @item capture the current contents of the device as a raster image. |
| @item close the device. |
| @end itemize |
| |
| |
| |
| The device also sets a number of variables, mainly Boolean flags |
| indicating its capabilities. Devices work entirely in `device units' |
| which are up to its developer: they can be in pixels, big points (1/72 |
| inch), twips, @dots{}, and can differ@footnote{although that needs to be |
| handled carefully, as for example the @code{circle} callback is given a |
| radius (and that should be interpreted as in the x units).} in the |
| @samp{x} and @samp{y} directions. |
| |
| @c think of the engine as colors.c, devices.c, engine.c, plotmath.c, vfonts.c |
| The next layer up is the graphics `engine' that is the main interface to |
| the device (although the graphics subsystems do talk directly to |
| devices). This is responsible for clipping lines, rectangles and |
| polygons, converting the @code{pch} values @code{0...26} to sets of |
| lines/circles, centring (and otherwise adjusting) text, rendering |
| mathematical expressions (`plotmath') and mapping colour descriptions |
| such as names to the internal representation. |
| |
| @c graphics.c looks at device dimensions, locator, metricinfo |
| @c par.c looks at various device pars |
| @c plot3d.c looks at useRotatedTextInContour |
| @c grid looks at size, clipping, locator, ipr |
| |
| Another function of the engine is to manage display lists and snapshots. |
| Some but not all instances of graphics devices maintain display lists, a |
| `list' of operations that have been performed on the device to produce |
| the current plot (since the device was opened or the plot was last |
| cleared, e.g.@: by @code{plot.new}). Screen devices generally maintain |
| a display list to handle repaint and resize events whereas file-based |
| formats do not---display lists are also used to implement |
| @code{dev.copy()} and friends. The display list is a pairlist of |
| @code{.Internal} (base graphics) or @code{.Call.graphics} (grid |
| graphics) calls, which means that the C code implementing a graphics |
| operation will be re-called when the display list is replayed (apart |
| from the part which records the operation if successful). |
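| |
| The record-and-replay idea can be illustrated with a toy display list. |
| This is a sketch only: the real display list is a pairlist of @R{} |
| calls, whereas here it is a linked list of C function pointers, and |
| all the @code{toy_} names are hypothetical. |
| |
```c
#include <stdlib.h>

typedef void (*toy_op)(int arg);

typedef struct toy_node {
    toy_op op;
    int arg;
    struct toy_node *next;
} toy_node;

typedef struct {
    toy_node *head, *tail;
    int recording;          /* plays the role of displayListOn */
} toy_dl;

static int toy_ink = 0;                      /* total "ink" put down */
static void toy_draw(int n) { toy_ink += n; }

/* Perform an operation and, if recording, append it to the list. */
void toy_do(toy_dl *dl, toy_op op, int arg)
{
    op(arg);
    if (dl->recording) {
        toy_node *n = malloc(sizeof *n);
        if (n == NULL) return;               /* sketch: silently skip */
        n->op = op; n->arg = arg; n->next = NULL;
        if (dl->tail) dl->tail->next = n; else dl->head = n;
        dl->tail = n;
    }
}

/* Re-call every recorded operation, with recording switched off so
   that the replay does not append to the list being replayed. */
void toy_replay(toy_dl *dl)
{
    int saved = dl->recording;
    dl->recording = 0;
    for (toy_node *n = dl->head; n != NULL; n = n->next)
        toy_do(dl, n->op, n->arg);
    dl->recording = saved;
}

/* Discard the list, as happens when the plot is cleared. */
void toy_clear(toy_dl *dl)
{
    toy_node *n = dl->head;
    while (n != NULL) { toy_node *next = n->next; free(n); n = next; }
    dl->head = dl->tail = NULL;
}
```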
| |
| Snapshots of the current graphics state are taken by |
| @code{GEcreateSnapshot} and replayed later in the session by |
| @code{GEplaySnapshot}. These are used by @code{recordPlot()}, |
| @code{replayPlot()} and the GUI menus of the @code{windows()} device. |
| The `state' includes the display list. |
| |
| |
| The top layer comprises the graphics subsystems. Although there has |
| been provision for 24 subsystems since about 2001, only two currently |
| exist, `base' and |
| `grid'. The base subsystem is registered with the engine when @R{} is |
| initialized, and unregistered (via @code{KillAllDevices}) when an @R{} |
| session is shut down. The grid subsystem is registered in its |
| @code{.onLoad} function and unregistered in the @code{.onUnload} |
| function. The graphics subsystem may also have `state' information |
| saved in a snapshot (currently base does and grid does not). |
| |
| Package @pkg{grDevices} was originally created to contain the basic |
| graphics devices (although @code{X11} is in a separate load-on-demand |
| module because of the volume of external libraries it brings in). Since |
| then it has been used for other functionality that was thought desirable |
| for use with @pkg{grid}, and hence has been transferred from package |
| @pkg{graphics} to @pkg{grDevices}. This is principally concerned with |
| the handling of colours and recording and replaying plots. |
| |
| @menu |
| * Graphics devices:: |
| * Colours:: |
| * Base graphics:: |
| * Grid graphics:: |
| @end menu |
| |
| @node Graphics devices, Colours, Graphics Devices, Graphics Devices |
| @section Graphics Devices |
| |
| @R{} ships with several graphics devices, and there is support for |
| third-party packages to provide additional devices---several packages |
| now do. This section describes the device internals from the viewpoint |
| of a would-be writer of a graphics device. |
| |
| @menu |
| * Device structures:: |
| * Device capabilities:: |
| * Handling text:: |
| * Conventions:: |
| * 'Mode':: |
| * Graphics events:: |
| * Specific devices:: |
| @end menu |
| |
| @node Device structures, Device capabilities, Graphics devices, Graphics devices |
| @subsection Device structures |
| |
| There are two types used internally which are pointers to structures |
| related to graphics devices. |
| |
| The @code{DevDesc} type is a structure defined in the header file |
| @file{R_ext/GraphicsDevice.h} (which is included by |
| @file{R_ext/GraphicsEngine.h}). This describes the physical |
| characteristics of a device, the capabilities of the device driver and |
| contains a set of callback functions that will be used by the graphics |
| engine to obtain information about the device and initiate actions |
| (e.g.@: a new page, plotting a line or some text). Type @code{pDevDesc} |
| is a pointer to this type. |
| |
| The following callbacks can be omitted (or set to the null pointer, |
| their default value) when appropriate default behaviour will be taken by |
| the graphics engine: @code{activate}, @code{cap}, @code{deactivate}, |
| @code{locator}, @code{holdflush} (API version 9), @code{mode}, |
| @code{newFrameConfirm}, @code{path}, @code{raster} and @code{size}. |
| |
| The relationship of device units to physical dimensions is set by the |
| element @code{ipr} of the @code{DevDesc} structure: a @samp{double} |
| array of length 2. |
| |
| |
| The @code{GEDevDesc} type is a structure defined in |
| @file{R_ext/GraphicsEngine.h} (with comments in the file) as |
| |
| @example |
| typedef struct _GEDevDesc GEDevDesc; |
| struct _GEDevDesc @{ |
| pDevDesc dev; |
| Rboolean displayListOn; |
| SEXP displayList; |
| SEXP DLlastElt; |
| SEXP savedSnapshot; |
| Rboolean dirty; |
| Rboolean recordGraphics; |
| GESystemDesc *gesd[MAX_GRAPHICS_SYSTEMS]; |
| Rboolean ask; |
| @}; |
| @end example |
| |
| @noindent |
| So this is essentially a device structure plus information about the |
| device maintained by the graphics engine and normally@footnote{It is |
| possible for the device to find the @code{GEDevDesc} which points to its |
| @code{DevDesc}, and this is done often enough that there is a |
| convenience function @code{desc2GEDesc} to do so.} visible to the engine |
| and not to the device. Type @code{pGEDevDesc} is a pointer to this |
| type. |
| |
| The graphics engine maintains an array of devices, as pointers to |
| @code{GEDevDesc} structures. The array is of size 64 but the first |
| element is always occupied by the @code{"null device"} and the final |
| element is kept as NULL as a sentinel.@footnote{Calling |
| @code{R_CheckDeviceAvailable()} ensures there is a free slot or throws |
| an error.} This array is reflected in the @R{} variable |
| @samp{.Devices}. Once a device is killed its element becomes available |
| for reallocation (and its name will appear as @code{""} in |
| @samp{.Devices}). Exactly one of the devices is `active': this is the |
| null device if no other device has been opened and not killed. |
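| |
| The slot bookkeeping just described might look roughly as follows. |
| This is a self-contained sketch with hypothetical names |
| (@code{toy_device}, @code{toy_slots} and so on); it is not the |
| engine's code, and only the array size, the reserved first element |
| and the @code{NULL} final element are taken from the text. |
| |
```c
#include <stddef.h>

#define TOY_MAX_SLOTS 64    /* array size stated in the text */

/* Hypothetical stand-in for a GEDevDesc. */
typedef struct { const char *name; } toy_device;

static toy_device toy_null_device = { "null device" };

/* Slot 0 always holds the null device; the last slot stays NULL. */
static toy_device *toy_slots[TOY_MAX_SLOTS] = { &toy_null_device };

/* Index of the first free slot, or -1 if every usable slot is taken. */
int toy_find_free_slot(void)
{
    for (int i = 1; i < TOY_MAX_SLOTS - 1; i++)
        if (toy_slots[i] == NULL)
            return i;
    return -1;
}

/* Install a device; returns its slot or -1 when the table is full. */
int toy_add_device(toy_device *d)
{
    int i = toy_find_free_slot();
    if (i > 0)
        toy_slots[i] = d;
    return i;
}

/* Killing a device frees its slot for reallocation. */
void toy_kill_device(int i)
{
    if (i > 0 && i < TOY_MAX_SLOTS - 1)
        toy_slots[i] = NULL;
}
```
| |
| Under these assumptions at most 62 devices other than the null device |
| can be open at once. |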
| |
| Each instance of a graphics device needs to set up a @code{GEDevDesc} |
| structure by code very similar to |
| |
| @example |
| pGEDevDesc gdd; |
| |
| R_GE_checkVersionOrDie(R_GE_version); |
| R_CheckDeviceAvailable(); |
| BEGIN_SUSPEND_INTERRUPTS @{ |
| pDevDesc dev; |
| /* Allocate and initialize the device driver data */ |
| if (!(dev = (pDevDesc) calloc(1, sizeof(DevDesc)))) |
| return 0; /* or error() */ |
| /* set up device driver or free 'dev' and error() */ |
| gdd = GEcreateDevDesc(dev); |
| GEaddDevice2(gdd, "dev_name"); |
| @} END_SUSPEND_INTERRUPTS; |
| @end example |
| |
| The @code{DevDesc} structure contains a @code{void *} pointer |
| @samp{deviceSpecific} which is used to store data specific to the |
| device. Setting up the device driver includes initializing all the |
| non-zero elements of the @code{DevDesc} structure. |
| |
| Note that the device structure is zeroed when allocated: this provides |
| some protection against future expansion of the structure since the |
| graphics engine can add elements that need to be non-NULL/non-zero to be |
| `on' (and the structure ends with 64 reserved bytes which will be zeroed |
| and allow for future expansion). |
| |
| Rather more protection is provided by the version number of the |
| engine/device API, @code{R_GE_version} defined in |
| @file{R_ext/GraphicsEngine.h} together with access functions |
| |
| @example |
| int R_GE_getVersion(void); |
| void R_GE_checkVersionOrDie(int version); |
| @end example |
| |
| @noindent |
| If a graphics device calls @code{R_GE_checkVersionOrDie(R_GE_version)} |
| it can ensure it will only be used in versions of @R{} which provide the |
| API it was designed for and compiled against. |
| |
| The @code{DevDesc} structure also contains an @code{int} @samp{deviceVersion} |
| to indicate which version of the engine/device API that the device supports. |
| If the device driver sets this correctly, |
| there is no need for a device driver to use |
| @code{R_GE_checkVersionOrDie(R_GE_version)} because the graphics engine will |
| not make use of callbacks from an API version above the version that is |
| supported by the device. |
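| |
| One way to picture this gating is the following sketch. It uses a |
| hypothetical cut-down device structure, and assumes (from the list of |
| optional callbacks above) that @code{holdflush} belongs to API |
| version 9. The engine's actual checks may differ in detail. |
| |
```c
#include <stddef.h>

/* Hypothetical, much reduced device structure. */
typedef struct toy_vdev {
    int deviceVersion;                        /* API version supported */
    int (*holdflush)(struct toy_vdev *dd, int level);
} toy_vdev;

#define TOY_VERSION_HOLDFLUSH 9   /* assumption taken from the text */

/* Example device callback: simply reports the requested level. */
static int toy_holdflush(toy_vdev *dd, int level)
{
    (void) dd;
    return level;
}

/* Engine side: use the callback only if the device declares an API
   version that includes it and has actually supplied it. */
int toy_engine_holdflush(toy_vdev *dd, int level)
{
    if (dd->deviceVersion >= TOY_VERSION_HOLDFLUSH && dd->holdflush != NULL)
        return dd->holdflush(dd, level);
    return 0;                                 /* default behaviour */
}
```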
| |
| @node Device capabilities, Handling text, Device structures, Graphics devices |
| @subsection Device capabilities |
| |
| The following `capabilities' can be defined for the device's |
| @code{DevDesc} structure. |
| |
| @itemize |
| @item @code{canChangeGamma} -- |
| @code{Rboolean}: can the display gamma be adjusted? This is now |
| ignored, as gamma support has been removed. |
| @item @code{canHadj} -- |
| @code{integer}: can the device do horizontal adjustment of text |
| @emph{via} the @code{text} callback, and if so, how precisely? 0 = no |
| adjustment, 1 = @{0, 0.5, 1@} (left, centre, right justification) or 2 = |
| continuously variable (in [0,1]) between left and right justification. |
| @item @code{canGenMouseDown} -- |
| @code{Rboolean}: can the device handle mouse down events? This |
| flag and the next three are not currently used by R, but are maintained |
| for back compatibility. |
| @item @code{canGenMouseMove} -- |
| @code{Rboolean}: ditto for mouse move events. |
| @item @code{canGenMouseUp} -- |
| @code{Rboolean}: ditto for mouse up events. |
| @item @code{canGenKeybd} -- |
| @code{Rboolean}: ditto for keyboard events. |
| @item @code{hasTextUTF8} -- |
| @code{Rboolean}: should non-symbol text be sent (in UTF-8) to the |
| @code{textUTF8} and @code{strWidthUTF8} callbacks, and sent as Unicode |
| points (negative values) to the @code{metricInfo} callback? |
| @item @code{wantSymbolUTF8} -- |
| @code{Rboolean}: should symbol text be handled in UTF-8 in the same way |
| as other text? Requires @code{hasTextUTF8 = TRUE}. |
| @item @code{haveTransparency}: |
| does the device support semi-transparent colours? |
| @item @code{haveTransparentBg}: |
| can the background be fully or semi-transparent? |
| @item @code{haveRaster}: |
| is there support for rendering raster images? |
| @item @code{haveCapture}: |
| is there support for @code{grid::grid.cap}? |
| @item @code{haveLocator}: |
| is there an interactive locator? |
| @item @code{deviceClip}: |
| should the engine leave @emph{all} clipping to the device? |
| @end itemize |
| |
| @code{haveRaster}, |
| @code{haveCapture}, and @code{haveLocator} |
| can often be deduced to be false from the presence of |
| @code{NULL} entries instead of the corresponding functions. |
| |
| In addition, the @code{capabilities} callback allows the device driver |
| to provide more detailed information, especially related to callbacks |
| in the engine/device API version 13 or higher. |
| |
| The @code{capabilities} callback is called with a list of integer |
| vectors that represent the best guess that the graphics engine |
| can make, based on the flags in the @code{DevDesc} structure and |
| the @samp{deviceVersion}. For some capabilities, the integer vector |
| is length 1 with @code{0} for no support, @code{1} for support, or |
| @code{NA} for unknown support. For capabilities where support |
| can be more nuanced, the integer vector may either take higher values |
| or it may have length greater than 1, |
| though length 1 and @code{0} still means no support and @code{NA} still means |
| unknown support. |
| |
| The following components of |
| this list are likely to need modifying (for these, the graphics engine |
| can only guess @code{0} |
| if @samp{deviceVersion} is too low or @code{NA} |
| otherwise): |
| |
| @itemize |
| @item The @code{patterns} component reports what sort of pattern fills are |
| supported. If the device supports one or more pattern types, |
| this component should be replaced with an integer vector containing |
| a value for each supported pattern type; |
| the graphics engine provides |
| constants @code{R_GE_linearGradientPattern}, |
| @code{R_GE_radialGradientPattern}, and |
| @code{R_GE_tilingPattern}. |
| If the device does not provide support, this component should |
| be set to 0. |
| |
| @item The @code{clippingPaths} component reports whether |
| arbitrary clipping paths are supported. |
| If the device supports clipping paths, this component should be set to 1. |
| If the device does not provide support, this component should |
| be set to 0. |
| |
| @item The @code{masks} component reports what sort of masks are supported. |
| If the device supports one or more mask types, |
| this component should be replaced with an integer vector containing |
| a value for each supported mask type; |
| the graphics engine provides |
| constants @code{R_GE_alphaMask} and @code{R_GE_luminanceMask}. |
| If the device does not provide support, this component should |
| be set to 0. |
| |
| @item The @code{compositing} component reports which compositing operators |
| are supported. |
| If the device supports one or more compositing operators, |
| this component should be replaced with an integer vector containing |
| a value for each supported operator; |
| The list of possible operators is long, |
| encompassing Porter-Duff operators and Adobe PDF Blend Modes; |
| the graphics engine provides constants @code{R_GE_compositeClear}, etc. |
| If the device does not provide support, this component should |
| be set to 0. |
| |
| @item The @code{transformations} component reports whether |
| affine transformations are supported. |
| If the device supports transformations, this component should be set to 1. |
| If the device does not provide support, this component should |
| be set to 0. |
| |
| @item The @code{paths} component reports whether stroking and |
| filling of paths composed of multiple shapes is supported. |
| If the device supports stroking and filling paths, |
| this component should be set to 1. |
| If the device does not provide support, this component should |
| be set to 0. |
| |
| @end itemize |
| |
| The graphics engine provides constants like |
| @code{R_GE_capability_patterns} for selecting the appropriate |
| component of the list of capabilities. |
| |
| It is valid (if unhelpful) for the device driver to return the list |
| of capabilities unaltered. |
| |
| @node Handling text, Conventions, Device capabilities, Graphics devices |
| @subsection Handling text |
| |
| Handling text is probably the hardest task for a graphics device, and |
| the design allows for the device to optionally indicate that it has |
| additional capabilities. (If the device does not, these will if |
| possible be handled in the graphics engine.) |
| |
| The three callbacks for handling text that must be in all graphics |
| devices are @code{text}, @code{strWidth} and @code{metricInfo} with |
| declarations |
| |
| @example |
| void text(double x, double y, const char *str, double rot, double hadj, |
| pGEcontext gc, pDevDesc dd); |
| |
| double strWidth(const char *str, pGEcontext gc, pDevDesc dd); |
| |
| void metricInfo(int c, pGEcontext gc, |
| double* ascent, double* descent, double* width, |
| pDevDesc dd); |
| @end example |
| |
| @noindent |
| The @samp{gc} parameter provides the graphics context, most importantly |
| the current font and fontsize, and @samp{dd} is a pointer to the active |
| device's structure. |
| |
| The @code{text} callback should plot @samp{str} at @samp{(x, |
| y)}@footnote{in device coordinates} with an anti-clockwise rotation of |
| @samp{rot} degrees. (For @samp{hadj} see below.) The interpretation |
| for horizontal text is that the baseline is at @code{y} and the start is |
| at @code{x}, so any left bearing for the first character will start at |
| @code{x}. |
| |
| The @code{strWidth} callback computes the width of the string which it |
| would occupy if plotted horizontally in the current font. (Width here |
| is expected to include both (preferably) or neither of left and right |
| bearings.) |
| |
| The @code{metricInfo} callback computes the size of a single |
| character: @code{ascent} is the distance it extends above the baseline |
| and @code{descent} how far it extends below the baseline. |
| @code{width} is the amount by which the cursor should be advanced when |
| the character is placed. For @code{ascent} and @code{descent} this is |
| intended to be the bounding box of the `ink' put down by the glyph and |
| not the box which might be used when assembling a line of conventional |
| text (it needs to be for e.g.@: @code{hat(beta)} to work correctly). |
| However, the @code{width} is used in plotmath to advance to the next |
| character, and so needs to include left and right bearings. |
| |
| The @emph{interpretation} of @samp{c} depends on the locale. In a |
| single-byte locale values @code{32...255} indicate the corresponding |
| character in the locale (if present). For the symbol font (as used by |
| @samp{graphics::par(font=5)}, @samp{grid::gpar(fontface=5)} and by |
| `plotmath'), values @code{32...126, 161...239, 241...254} indicate |
| glyphs in the Adobe Symbol encoding. In a multibyte locale, @code{c} |
| represents a Unicode point (except in the symbol font). So the function |
| needs to include code like |
| |
| @example |
| Rboolean Unicode = mbcslocale && (gc->fontface != 5); |
| if (c < 0) @{ Unicode = TRUE; c = -c; @} |
| if(Unicode) UniCharMetric(c, ...); else CharMetric(c, ...); |
| @end example |
| |
| @noindent |
| In addition, if device capability @code{hasTextUTF8} (see below) is |
| true, Unicode points will be passed as negative values: the code snippet |
| above shows how to handle this. (This applies to the symbol font only |
| if device capability @code{wantSymbolUTF8} is true.) |
| |
| If possible, the graphics device should handle clipping of text. It |
| indicates this by the structure element @code{canClip} which if true |
| will result in calls to the callback @code{clip} to set the clipping |
| region. If this is not done, the engine will clip very crudely (by |
| omitting any text that does not appear to be wholly inside the clipping |
| region). |
| |
| The device structure has an integer element @code{canHadj}, which |
| indicates if the device can do horizontal alignment of text. If this is |
| one, argument @samp{hadj} to @code{text} will be passed as @code{0, 0.5, |
| 1} to indicate left-, centre- and right-alignment at the indicated |
| position. If it is two, continuous values in the range @code{[0, 1]} |
| are assumed to be supported. |
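| |
| The three levels of @code{canHadj} could be honoured by reducing a |
| requested adjustment to whatever the device accepts, for example as |
| in the sketch below. The function name is hypothetical, and snapping |
| to the nearest of the three values is an assumption about what a |
| sensible caller would do rather than a description of the engine. |
| |
```c
/* Reduce a requested horizontal adjustment to a value the device can
   take: canHadj == 2 passes any value in [0, 1], canHadj == 1 snaps to
   the nearest of 0, 0.5 and 1, and canHadj == 0 sends 0 so that any
   adjustment has to be done by the caller. */
double toy_device_hadj(double hadj, int canHadj)
{
    if (canHadj == 2) {
        if (hadj < 0.0) return 0.0;
        if (hadj > 1.0) return 1.0;
        return hadj;
    }
    if (canHadj == 1) {
        if (hadj < 0.25) return 0.0;
        if (hadj < 0.75) return 0.5;
        return 1.0;
    }
    return 0.0;
}
```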
| |
| If capability @code{hasTextUTF8} is true, it has two consequences. |
| First, there are callbacks @code{textUTF8} and @code{strWidthUTF8} that |
| should behave identically to @code{text} and @code{strWidth} except that |
| @samp{str} is assumed to be in UTF-8 rather than the current locale's |
| encoding. The graphics engine will call these for all text except in |
| the symbol font. Second, Unicode points will be passed to the |
| @code{metricInfo} callback as negative integers. If your device would |
| prefer to have UTF-8-encoded symbols, define @code{wantSymbolUTF8} as |
| well as @code{hasTextUTF8}. In that case text in the symbol font is |
| sent to @code{textUTF8} and @code{strWidthUTF8}. |
| |
| Some devices can produce high-quality rotated text, but those based on |
| bitmaps often cannot. Those which can should set |
| @code{useRotatedTextInContour} to be true from graphics API version 4. |
| |
| Several other elements relate to the precise placement of text by the |
| graphics engine: |
| |
| @example |
| double xCharOffset; |
| double yCharOffset; |
| double yLineBias; |
| double cra[2]; |
| @end example |
| |
| @noindent |
| These are more than a little mysterious. Element @code{cra} provides an |
| indication of the character size, @code{par("cra")} in base graphics, in |
| device units. The mystery is what is meant by `character size': which |
| character, which font at which size? Some help can be obtained by |
| looking at what this is used for. The first element, `width', is not |
| used by @R{} except to set the graphical parameters. The second, |
| `height', is used to set the line spacing, that is the relationship |
| between @code{par("mar")} and @code{par("mai")} and so on. It is |
| suggested that a good choice is |
| |
| @example |
| dd->cra[0] = 0.9 * fnsize; |
| dd->cra[1] = 1.2 * fnsize; |
| @end example |
| |
| @noindent |
| where @samp{fnsize} is the `size' of the standard font (@code{cex=1}) |
| on the device, in device units. So for a 12-point font (the usual |
| default for graphics devices), @samp{fnsize} should be 12 points in |
| device units. |
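| |
| As a worked sketch, the conversion from a point size to @samp{fnsize} |
| and hence @code{cra} might be written as below. It assumes that |
| @code{ipr} holds inches per device unit, that the vertical value |
| @code{ipr[1]} is the right one for font size, and it uses a |
| hypothetical cut-down structure rather than @code{DevDesc}. |
| |
```c
/* Hypothetical cut-down device structure holding only what is needed. */
typedef struct {
    double ipr[2];   /* assumption: inches per device unit, x and y */
    double cra[2];   /* character size in device units              */
} toy_sized_dev;

/* Set cra from a font size given in big points (1/72 inch). */
void toy_set_cra(toy_sized_dev *dd, double pointsize)
{
    /* points -> inches -> device units */
    double fnsize = (pointsize / 72.0) / dd->ipr[1];
    dd->cra[0] = 0.9 * fnsize;
    dd->cra[1] = 1.2 * fnsize;
}
```
| |
| For a device whose units are big points, a 12-point font gives |
| @samp{fnsize} of 12; for a 96 pixels-per-inch bitmap it gives 16. |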
| |
| The remaining elements are yet more mysterious. The @code{postscript()} |
| device says |
| |
| @example |
| /* Character Addressing Offsets */ |
| /* These offsets should center a single */ |
| /* plotting character over the plotting point. */ |
| /* Pure guesswork and eyeballing ... */ |
| |
| dd->xCharOffset = 0.4900; |
| dd->yCharOffset = 0.3333; |
| dd->yLineBias = 0.2; |
| @end example |
| |
| @noindent |
| It seems that @code{xCharOffset} is not currently used, and |
| @code{yCharOffset} is used by the base graphics system to set vertical |
| alignment in @code{text()} when @code{pos} is specified, and in |
| @code{identify()}. It is occasionally used by the graphics engine when |
| attempting exact centring of text, such as character string values of |
| @code{pch} in @code{points()} or @code{grid.points()}---however, it is |
| only used when precise character metric information is not available or |
| for multi-line strings. |
| |
| @code{yLineBias} is used in the base graphics system in @code{axis()} and |
| @code{mtext()} to provide a default for their @samp{padj} argument. |
| |
| @node Conventions, 'Mode', Handling text, Graphics devices |
| @subsection Conventions |
| |
| The aim is to make the (default) output from graphics devices as similar |
| as possible. Generally people follow the model of the @code{postscript} |
| and @code{pdf} devices (which share most of their internal code). |
| |
| The following conventions have become established: |
| |
| @itemize |
| |
| @item |
| The default size of a device should be 7 inches square. |
| |
| @item |
| There should be a @samp{pointsize} argument which defaults to 12, and it |
| should give the pointsize in big points (1/72 inch). How exactly this |
| is interpreted is font-specific, but it should use a font which works |
| with lines packed 1/6 inch apart, and looks good with lines 1/5 inch |
| apart (that is with 2pt leading). |
| |
| @item |
| The default font family should be a sans serif font, e.g.@: Helvetica or |
| similar (e.g.@: Arial on Windows). |
| |
| @item |
| @code{lwd = 1} should correspond to a line width of 1/96 inch. This |
| will be a problem with pixel-based devices, and generally there is a |
| minimum line width of 1 pixel (although this may not be appropriate |
| where anti-aliasing of lines is used, and @code{cairo} prefers a minimum |
| of 2 pixels). |
| |
| @item |
| Even very small circles should be visible, e.g.@: by using a minimum |
| radius of 1 pixel or replacing very small circles by a single filled |
| pixel. |
| |
| @item |
| How RGB colour values will be interpreted should be documented, and |
| preferably be sRGB. |
| |
| @item |
| The help page should describe its policy on these conventions. |
| |
| @end itemize |
| |
| These conventions are less clear-cut for bitmap devices, especially |
| where the bitmap format does not have a design resolution. |
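| |
| For a pixel-based device the line-width and small-circle conventions |
| amount to a couple of clamps, sketched here. The function names and |
| the @samp{dpi} parameter are hypothetical, and a device that |
| anti-aliases lines may well prefer a different minimum. |
| |
```c
/* Convert an R line width to pixels: lwd = 1 is 1/96 inch, and the
   result is never allowed to fall below one pixel. */
double toy_lwd_to_pixels(double lwd, double dpi)
{
    double w = lwd * dpi / 96.0;
    return w < 1.0 ? 1.0 : w;
}

/* Keep very small circles visible by enforcing a one-pixel radius. */
double toy_visible_radius(double r_pixels)
{
    return r_pixels < 1.0 ? 1.0 : r_pixels;
}
```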
| |
| The interpretation of the line texture (@code{par("lty")}) is described |
| in the header @file{GraphicsEngine.h} and in the help for @code{par}: note that the |
| `scale' of the pattern should be proportional to the line width (at |
| least for widths above the default). |
| |
| |
| @node 'Mode', Graphics events, Conventions, Graphics devices |
| @subsection `Mode' |
| |
| One of the device callbacks is a function @code{mode}, documented in |
| the header as |
| |
| @example |
| * device_Mode is called whenever the graphics engine |
| * starts drawing (mode=1) or stops drawing (mode=0) |
| * GMode (in graphics.c) also says that |
| * mode = 2 (graphical input on) exists. |
| * The device is not required to do anything |
| @end example |
| |
| @noindent |
| @code{mode = 2} has only recently been documented at device level. |
| It could be used to change the graphics cursor, but devices currently do |
| that in the @code{locator} callback. (In base graphics the mode is set |
| for the duration of a @code{locator} call, but if @code{type != "n"} it is |
| switched back for each point whilst annotation is being done.) |
| |
| Many devices do indeed do nothing on this call, but some screen devices |
| ensure that drawing is flushed to the screen when called with @code{mode |
| = 0}. It is tempting to use it for some sort of buffering, but note |
| that `drawing' is interpreted at quite a low level and a typical single |
| figure will stop and start drawing many times. The buffering introduced |
| in the @code{X11()} device makes use of @code{mode = 0} to indicate |
| activity: it updates the screen after @emph{ca} 100ms of inactivity. |
| |
| This callback need not be supplied if it does nothing. |
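| |
| A minimal sketch of a screen device that flushes when drawing stops |
| is given below. The structure and function names are hypothetical and |
| the signature is simplified: a real @code{mode} callback receives the |
| mode and a @code{pDevDesc}. |
| |
```c
/* Hypothetical screen-device state; not R's DevDesc. */
typedef struct {
    int dirty;     /* has anything been drawn since the last flush? */
    int flushes;   /* number of times the screen was updated        */
} toy_screen;

/* Stand-in for any drawing primitive. */
void toy_draw_line(toy_screen *s) { s->dirty = 1; }

/* Flush only when drawing stops (mode 0) and there is something to
   show; modes 1 and 2 need no action in this sketch. */
void toy_mode(int mode, toy_screen *s)
{
    if (mode == 0 && s->dirty) {
        s->flushes++;
        s->dirty = 0;
    }
}
```
| |
| Because drawing starts and stops many times within one figure, even |
| this simple scheme can flush far more often than once per plot. |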
| |
| @node Graphics events, Specific devices, 'Mode', Graphics devices |
| @subsection Graphics events |
| |
| Graphics devices may be designed to handle user interaction: not all are. |
| |
| Users may use @code{grDevices::setGraphicsEventEnv} to set the |
| @code{eventEnv} environment in the device driver to hold event |
| handlers. When the user calls @code{grDevices::getGraphicsEvent}, R will |
| take three steps. First, it sets the device driver member |
| @code{gettingEvent} to @code{true} for each device with a |
| non-@code{NULL} @code{eventEnv} entry, and calls @code{initEvent(dd, |
| true)} if the callback is defined. It then enters an event loop. Each |
| time through the loop R will process events once, then check whether any |
| device has set the @code{result} member of @code{eventEnv} to a |
| non-@code{NULL} value, and will save the first such value found to be |
| returned. C functions @code{doMouseEvent} and @code{doKeybd} are |
| provided to call the R event handlers @code{onMouseDown}, |
| @code{onMouseMove}, @code{onMouseUp}, and @code{onKeybd} and set |
| @code{eventEnv$result} during this step. Finally, @code{initEvent} is |
| called again with @code{init=false} to inform the devices that the |
| loop is done, and the result is returned to the user. |
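| |
| The three steps can be sketched in plain C as follows. Everything |
| here is a hypothetical stand-in: @code{has_env} replaces a |
| non-@code{NULL} @code{eventEnv}, a string replaces |
| @code{eventEnv$result}, the @samp{pump} argument replaces @R{}'s |
| event processing, and resetting @code{gettingEvent} at the end is an |
| assumption of the sketch. |
| |
```c
#include <stddef.h>

typedef struct toy_evdev {
    int has_env;          /* stands in for a non-NULL eventEnv    */
    int gettingEvent;
    const char *result;   /* stands in for eventEnv$result        */
    int inits, finis;     /* calls to the initEvent stand-in      */
} toy_evdev;

static void toy_initEvent(toy_evdev *d, int init)
{
    if (init) d->inits++; else d->finis++;
}

/* Processes one round of events, possibly setting some result. */
typedef void (*toy_pump)(toy_evdev *devs, int n);

/* Example pump: the last device reports a click on the third round. */
static int toy_rounds = 0;
static void toy_sample_pump(toy_evdev *devs, int n)
{
    if (++toy_rounds == 3 && n > 0 && devs[n - 1].has_env)
        devs[n - 1].result = "click";
}

const char *toy_getGraphicsEvent(toy_evdev *devs, int n,
                                 toy_pump pump, int max_rounds)
{
    const char *result = NULL;

    /* Step 1: mark participating devices and let them initialize. */
    for (int i = 0; i < n; i++)
        if (devs[i].has_env) {
            devs[i].gettingEvent = 1;
            toy_initEvent(&devs[i], 1);
        }

    /* Step 2: process events until some device has set a result. */
    for (int r = 0; result == NULL && r < max_rounds; r++) {
        pump(devs, n);
        for (int i = 0; i < n && result == NULL; i++)
            if (devs[i].has_env && devs[i].result != NULL)
                result = devs[i].result;
    }

    /* Step 3: tell the devices that the loop is done. */
    for (int i = 0; i < n; i++)
        if (devs[i].has_env) {
            toy_initEvent(&devs[i], 0);
            devs[i].gettingEvent = 0;
        }
    return result;
}
```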
| |
| @node Specific devices, , Graphics events, Graphics devices |
| @subsection Specific devices |
| |
| Specific devices are mostly documented by comments in their sources, |
| although for devices of many years' standing those comments can be in |
| need of updating. This subsection is a repository of notes on design |
| decisions. |
| |
| @menu |
| * X11():: |
| * windows():: |
| @end menu |
| |
| @node X11(), windows(), Specific devices, Specific devices |
| @subsubsection X11() |
| |
| The @code{X11(type="Xlib")} device dates back to the mid 1990's and was |
| written then in @code{Xlib}, the most basic X11 toolkit. It has since |
| optionally made use of a few features from other toolkits: @code{libXt} |
| is used to read X11 resources, and @code{libXmu} is used in the handling |
| of clipboard selections. |
| |
| Using basic @code{Xlib} code makes drawing fast, but is limiting. There |
| is no support of translucent colours (that came in the @code{Xrender} |
| toolkit of 2000) nor for rotated text (which @R{} implements by |
| rendering text to a bitmap and rotating the latter). |
| |
| The hinting for the X11 window asks for backing store to be used, and |
| some window managers may use it to handle repaints, but it seems that |
| most repainting is done by replaying the display list (and here the fast |
| drawing is very helpful). |
| |
| There are perennial problems with finding fonts. Many users fail to |
| realize that fonts are a function of the X server and not of the machine |
| that @R{} is running on. After many difficulties, @R{} tries first to |
| find the nearest size match in the sizes provided for Adobe fonts in the |
| standard 75dpi and 100dpi X11 font packages---even that will fail to |
| work when users of near-100dpi screens have only the 75dpi set |
| installed. The 75dpi set allows sizes down to 6 points on a 100dpi |
| screen, but some users do try to use smaller sizes and even 6 and 8 |
| point bitmapped fonts do not look good. |
| |
| Introduction of UTF-8 locales has caused another wave of difficulties. |
| X11 has very few genuine UTF-8 fonts, and produces composite fontsets |
| for the @code{iso10646-1} encoding. Unfortunately these seem to have |
| low coverage apart from a few monospaced fonts in a few sizes (which are |
| not suitable for graph annotation), and where glyphs are missing what is |
| plotted is often quite unsatisfactory. |
| |
| The current approach is to make use of more modern toolkits, namely |
| @code{cairo} for rendering and @code{Pango} for font |
| management---because these are associated with @code{Gtk+2} they are |
| widely available. Cairo supports translucent colours and alpha-blending |
| (@emph{via} @code{Xrender}), and anti-aliasing for the display of lines |
| and text. Pango's font management is based on @code{fontconfig} and |
| somewhat mysterious, but it seems mainly to use Type 1 and TrueType |
| fonts on the machine running @R{} and send grayscale bitmaps to cairo. |
| |
| |
| @node windows(), , X11(), Specific devices |
| @subsubsection windows() |
| |
| The @code{windows()} device is a family of devices: it supports plotting |
| to Windows (enhanced) metafiles, @code{BMP}, @code{JPEG}, @code{PNG} and |
| @code{TIFF} files as well as to Windows printers. |
| |
| In most of these cases the primary plotting is to a bitmap: this is used |
| for the (default) buffering of the screen device, which also enables the |
| current plot to be saved to BMP, JPEG, PNG or TIFF (it is the internal |
| bitmap which is copied to the file in the appropriate format). |
| |
| The device units are pixels (logical ones on a metafile device). |
| |
| The code was originally written by Guido Masarotto with extensive use of |
| macros, which can make it hard to disentangle. |
| |
| For a screen device, @code{xd->gawin} is the canvas of the screen, and |
| @code{xd->bm} is the off-screen bitmap. So macro @code{DRAW} arranges |
| to plot to @code{xd->bm}, and if buffering is off, also to |
| @code{xd->gawin}. For all other devices, @code{xd->gawin} is the canvas, |
| a bitmap for the @code{jpeg()} and @code{png()} devices, and an internal |
| representation of a Windows metafile for the @code{win.metafile()} and |
| @code{win.print()} devices. Since `plotting' is done by Windows GDI calls |
| to the appropriate canvas, its precise nature is hidden by the GDI |
| system. |
| |
| Buffering on the screen device is achieved by running a timer, which |
| when it fires copies the internal bitmap to the screen. This is set to |
| fire every 500ms (by default) and is reset to 100ms after plotting |
| activity. |
| |
| Repaint events are handled by copying the internal bitmap to the screen |
| canvas (and then reinitializing the timer), unless there has been a resize. |
| Resizes are handled by replaying the display list: this might not be |
| necessary if a fixed canvas with scrollbars is being used, but that is |
| the least popular of the three forms of resizing. |
| |
| Text on the device has moved to `Unicode' (UCS-2) in recent years. |
| UTF-8 is requested (@code{hasTextUTF8 = TRUE}) for standard text, and |
| converted to UCS-2 in the plotting functions in file |
| @file{src/extra/graphapp/gdraw.c}. However, GDI has no support for |
| Unicode symbol fonts, and symbols are handled in Adobe Symbol encoding. |
| |
| Support for translucent colours (with alpha channel between 0 and 255) |
| was introduced on the screen device and bitmap |
| devices.@footnote{It is technically possible to use alpha-blending on |
| metafile devices such as printers, but it seems few drivers have support |
| for this.} This is done by drawing on a further internal bitmap, |
| @code{xd->bm2}, in the opaque version of the colour then alpha-blending |
| that bitmap to @code{xd->bm}. The alpha-blending routine is in a |
| separate DLL, @file{msimg32.dll}, which is loaded on first use. As |
| small a rectangular region as reasonably possible is alpha-blended (this |
| is rectangle @code{r} in the code), but things like mitre joins make |
| estimation of a tight bounding box too much work for lines and polygonal |
| boundaries. Translucent-coloured lines are not common, and the |
| performance seems acceptable. |
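| |
| The two-bitmap scheme can be illustrated by a minimal sketch. The |
| device itself uses GDI and the routine in @file{msimg32.dll}; the code |
| below instead works on plain single-channel arrays, and the names |
| @code{blend_channel} and @code{blend_rect} as well as the rounding rule |
| are assumptions made for illustration. It also assumes that the |
| scratch bitmap starts as a copy of the target within the rectangle, so |
| that pixels not drawn on are left unchanged by the blend. |
| |
```c
#include <stdint.h>

/* Blend one 8-bit channel: alpha = 255 gives src, alpha = 0 keeps dst. */
static uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t alpha)
{
    return (uint8_t) ((src * alpha + dst * (255 - alpha) + 127) / 255);
}

/* Alpha-blend the rectangle [x0, x1) x [y0, y1) of `scratch` (drawn in
   the opaque version of the colour) onto `target`.  Both are
   single-channel bitmaps that are `w` pixels wide. */
void blend_rect(const uint8_t *scratch, uint8_t *target, int w,
                int x0, int y0, int x1, int y1, uint8_t alpha)
{
    for (int y = y0; y < y1; y++)
        for (int x = x0; x < x1; x++) {
            int i = y * w + x;
            target[i] = blend_channel(scratch[i], target[i], alpha);
        }
}
```
| |
| Restricting the loops to the rectangle is what keeps the cost down: |
| only the pixels inside the bounding box are visited. |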
| |
| The support for a transparent background in @code{png()} predates full |
| alpha-channel support in @code{libpng} (let alone in PNG viewers), so |
| makes use of the limited transparency support in earlier versions of |
| PNG. Where 24-bit colour is used, this is done by marking a single |
| colour to be rendered as transparent. @R{} chose @samp{#fdfefd}, and |
| uses this as the background colour (in @code{GA_NewPage}) if the |
| specified background colour is transparent (and all non-opaque |
| background colours are treated as transparent). So this works by |
| marking that colour in the PNG file, and viewers without transparency |
| support see a slightly-off-white background, as if there were a |
| near-white canvas. Where a palette is used in the PNG file (if fewer |
| than 256 colours were used) then this colour is recorded with full |
| transparency and the remaining colours as opaque. If 32-bit colour were |
| available then we could add a full alpha channel, but this is dependent |
| on the graphics hardware and undocumented properties of GDI. |
| |
| |
| @node Colours, Base graphics, Graphics devices, Graphics Devices |
| @section Colours |
| |
| Devices receive colours as a @code{typedef} @code{rcolor} (an |
| @code{unsigned int}) defined in the header |
| @file{R_ext/GraphicsEngine.h}. The 4 bytes are @emph{R}, @emph{G}, |
| @emph{B} and @emph{alpha} from least to most significant. So each of RGB |
| has 256 levels of luminosity from 0 to 255. The alpha byte represents |
| opacity, so value 255 is fully opaque and 0 fully transparent: many but |
| not all devices handle semi-transparent colours. |
| |
| Colours can be created in C via the macro @code{R_RGBA}, and a set of |
| macros is defined in @file{R_ext/GraphicsDevice.h} to extract the |
| various components. |
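| |
| The byte layout can be made concrete with a short sketch. The helper |
| functions below are illustrative only: the names @code{toy_rcolor}, |
| @code{pack_rgba}, @code{red_of}, @code{green_of}, @code{blue_of} and |
| @code{alpha_of} are hypothetical and are not the macros from the @R{} |
| headers, although they follow the layout described above. |
| |
```c
#include <stdint.h>

typedef uint32_t toy_rcolor;   /* stand-in for rcolor (an unsigned int) */

/* Pack four 0..255 components, R in the least significant byte. */
static inline toy_rcolor pack_rgba(unsigned r, unsigned g,
                                   unsigned b, unsigned a)
{
    return (toy_rcolor) ((r & 255u) | ((g & 255u) << 8) |
                         ((b & 255u) << 16) | ((a & 255u) << 24));
}

/* Extract the individual components again. */
static inline unsigned red_of(toy_rcolor c)   { return c & 255u; }
static inline unsigned green_of(toy_rcolor c) { return (c >> 8) & 255u; }
static inline unsigned blue_of(toy_rcolor c)  { return (c >> 16) & 255u; }
static inline unsigned alpha_of(toy_rcolor c) { return (c >> 24) & 255u; }
```
| |
| Under this layout opaque red is @code{0xFF0000FF} and transparent white |
| is @code{0x00FFFFFF}. |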
| |
| Colours in the base graphics system were originally adopted from S (and |
| before that the GRZ library from Bell Labs), with the concept of a |
| (variable-sized) palette of colours referenced by numbers |
| @samp{1...@var{N}} plus @samp{0} (the background colour of the current |
| device). @R{} introduced the idea of referring to colours by character |
| strings, either in the forms @samp{#RRGGBB} or @samp{#RRGGBBAA} |
| (representing the bytes in hex) as given by function @code{rgb()} or via |
| names: the 657 known names are given in the character vector |
| @code{colors} and in a table in file @file{colors.c} in package |
| @pkg{grDevices}. Note that semi-transparent colours are not |
| `premultiplied', so 50% transparent white is @samp{#ffffff80}. |
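| |
| As an illustration of the string forms, the sketch below converts |
| @samp{#RRGGBB} or @samp{#RRGGBBAA} into a packed value with R in the |
| least significant byte. The function names @code{hex_digit} and |
| @code{parse_hex_colour} are hypothetical, and this is not the parser |
| used in package @pkg{grDevices}. |
| |
```c
#include <stdint.h>
#include <string.h>

/* Value of one hex digit, or -1 if ch is not a hex digit. */
static int hex_digit(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

/* Parse "#RRGGBB" or "#RRGGBBAA" into R | G<<8 | B<<16 | A<<24.
   Returns 1 on success, 0 if the string has neither form. */
int parse_hex_colour(const char *s, uint32_t *out)
{
    size_t len = strlen(s);
    uint32_t bytes[4] = { 0, 0, 0, 255 };   /* alpha defaults to opaque */

    if (s[0] != '#' || (len != 7 && len != 9)) return 0;
    for (size_t i = 0; i < (len - 1) / 2; i++) {
        int hi = hex_digit(s[1 + 2 * i]);
        int lo = hex_digit(s[2 + 2 * i]);
        if (hi < 0 || lo < 0) return 0;
        bytes[i] = (uint32_t) (16 * hi + lo);
    }
    *out = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
    return 1;
}
```
| |
| The components are stored as given, without premultiplying by alpha, so |
| @samp{#ffffff80} yields @code{0x80FFFFFF}. |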
| |
| Integer or character @code{NA} colours are mapped internally to |
| transparent white, as is the character string @code{"NA"}. |
| |
| Negative colour numbers are an error. Colours greater than |
| @samp{@var{N}} are wrapped around, so that for example with the default |
| palette of size 8, colour @samp{10} is colour @samp{2} in the palette. |
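| |
| The wrap-around rule can be written as a one-line computation. The |
| function @code{palette_index} below is hypothetical, and its expression |
| is an assumption chosen to agree with the example just given; the |
| return values @code{0} and @code{-1} are conventions of this sketch |
| only. |
| |
```c
/* Map a colour number to a 1-based palette index.
   Returns 0 for colour 0 (the background colour) and -1 for a
   negative colour number (which is an error in R). */
int palette_index(int colour, int palette_size)
{
    if (colour < 0) return -1;
    if (colour == 0) return 0;
    return (colour - 1) % palette_size + 1;
}
```
| |
| For a palette of size 8, colours 8, 9 and 10 map to entries 8, 1 and 2. |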
| |
| Integer colours have been used more widely than the base graphics |
| sub-system, as they are supported by package @pkg{grid} and hence by |
| @CRANpkg{lattice} and @CRANpkg{ggplot2}. (They are also used by package |
| @CRANpkg{rgl}.) @pkg{grid} did re-define colour @samp{0} to be |
| transparent white, but @CRANpkg{rgl} used @code{col2rgb} and hence the |
| background colour of base graphics. |
| |
| Note that positive integer colours refer to the current palette and |
| colour @samp{0} to the current device (and a device is opened if needs |
| be). These are mapped to type @code{rcolor} at the time of use: this |
| matters when re-playing the display list, e.g.@: when a device is |
| resized or @code{dev.copy} is used. The palette should be thought of as |
| per-session: it is stored in package @pkg{grDevices}. |
| |
| The convention is that devices use the colorspace `sRGB'. This is an |
| industry standard: it is used by Web browsers and JPEGs from all but |
| high-end digital cameras. The interpretation is a matter for graphics |
| devices and for code that manipulates colours, but not for the graphics |
| engine or subsystems. |
| |
| @R{} uses a painting model similar to PostScript and PDF. This means |
| that where shapes (circles, rectangles and polygons) can both be filled |
| and have a stroked border, the fill should be painted first and then the |
| border (or otherwise only half the border will be visible). Where both |
| the fill and the border are semi-transparent there is some room for |
| interpretation of the intention. Most devices first paint the fill and |
| then the border, alpha-blending at each step. However, PDF does some |
| automatic grouping of objects, and @emph{when the fill and the border |
| have the same alpha}, they are painted onto the same layer and then |
| alpha-blended in one step. (See p. 569 of the PDF Reference Sixth |
| Edition, version 1.7. Unfortunately, although this is what the PDF |
| standard says should happen, it is not correctly implemented by some |
| viewers.) |
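| |
| The difference between the two interpretations shows up where the |
| border overlaps the fill. The following sketch computes one colour |
| channel of such a pixel both ways. It is a simplified reading of the |
| grouping behaviour, assuming that within the group the border is |
| painted opaquely over the fill; the function names @code{over}, |
| @code{stepwise} and @code{grouped} are hypothetical. |
| |
```c
/* Source-over compositing of one channel: opacity a lies in [0, 1]. */
double over(double src, double dst, double a)
{
    return a * src + (1.0 - a) * dst;
}

/* Most devices: blend the fill onto the background, then blend the
   border onto the result. */
double stepwise(double fill, double border, double bg, double a)
{
    return over(border, over(fill, bg, a), a);
}

/* Grouped: within the group the border covers the fill completely,
   and the group is then blended onto the background in one step. */
double grouped(double fill, double border, double bg, double a)
{
    (void) fill;   /* hidden under the border inside the group */
    return over(border, bg, a);
}
```
| |
| For a white fill (1.0), black border (0.0), mid-grey background (0.5) |
| and alpha 0.5, the stepwise result is 0.375 and the grouped result is |
| 0.25, so the inner half of the border looks lighter on devices that |
| blend at each step. |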
| |
| The mapping from colour numbers to type @code{rcolor} is primarily done |
| by function @code{RGBpar3}: this is exported from the @R{} binary but |
| linked to code in package @pkg{grDevices}. The first argument is a |
| @code{SEXP} pointing to a character, integer or double vector, and the |
| second is the @code{rcolor} value for colour @code{0} (or @code{"0"}). |
| C entry point @code{RGBpar} is a wrapper that takes @code{0} to be |
| transparent white: it is often used to set colour defaults for devices. |
| The @R{}-level wrapper is @code{col2rgb}. |
| |
| There is also @code{R_GE_str2col} which takes a C string and converts to |
| type @code{rcolor}: @code{"0"} is converted to transparent white. |
| |
| There is a @R{}-level conversion of colours to @samp{#RRGGBBAA} by |
| @code{image.default(useRaster = TRUE)}. |
| |
| The other color-conversion entry point in the API is @code{name2col} |
| which takes a colour name (a C string) and returns a value of type |
| @code{rcolor}. This handles @code{"NA"}, @code{"transparent"} and the |
| 657 colours known to the @R{} function @code{colors()}. |
| |
| @node Base graphics, Grid graphics, Colours, Graphics Devices |
| @section Base graphics |
| |
| The base graphics system was migrated to package @pkg{graphics} in @R{} |
| 3.0.0: it was previously implemented in files in @file{src/main}. |
| |
| For historical reasons it is largely implemented in two layers. |
| Files @file{plot.c}, @file{plot3d.c} and @file{par.c} contain the code |
| for the around 30 @code{.External} calls that implement the basic |
| graphics operations. This code then calls functions in file |
| @file{graphics.c} with names starting with @code{G} (declared in header |
| @file{Rgraphics.h}), which in turn call the graphics engine (whose |
| functions almost all have names starting with @code{GE}). |
| |
| A large part of the infrastructure of the base graphics subsystem is |
| the graphics parameters (as set/read by @code{par()}). These are stored |
| in a @code{GPar} structure declared in the private header |
| @file{Graphics.h}. This structure has two variables (@code{state} and |
| @code{valid}) tracking the state of the base subsystem on the device, |
| and many variables recording the graphics parameters and functions of |
| them. |
| |
| The base system state is contained in @code{baseSystemState} structure |
| defined in the private header @file{GraphicsBase.h}. |
| This contains three @code{GPar} |
| structures and a Boolean variable used to record if @code{plot.new()} |
| (or @code{persp}) has been used successfully on the device. |
| |
| The three copies of the @code{GPar} structure are used to store the |
| current parameters (accessed via @code{gpptr}), the `device copy' |
| (accessed via @code{dpptr}) and space for a saved copy of the `device |
| copy' parameters. The current parameters are, clearly, those currently |
| in use and are copied from the `device copy' whenever @code{plot.new()} |
| is called (whether or not that advances to the next `page'). The saved |
| copy keeps the state when the device was last completely cleared (e.g.@: |
| when @code{plot.new()} was called with @code{par(new=TRUE)}), and is |
| used to replay the display list. |
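| |
| The relationship between the three copies can be modelled by a small |
| sketch. The types @code{toy_gpar} and @code{toy_base_state} and the |
| function @code{toy_plot_new} are hypothetical: the real @code{GPar} |
| structure has many more members, and the exact points at which @R{} |
| copies between the structures are only approximated here. |
| |
```c
/* A drastically reduced stand-in for the GPar structure. */
typedef struct toy_gpar {
    double cex;
    int col;
} toy_gpar;

/* Stand-in for the base system state on one device. */
typedef struct toy_base_state {
    toy_gpar current;   /* current parameters (cf. gpptr) */
    toy_gpar device;    /* the `device copy' (cf. dpptr) */
    toy_gpar saved;     /* saved copy of the `device copy' */
    int plotted;        /* has plot.new() been used successfully? */
} toy_base_state;

/* Models plot.new(): the current parameters are always refreshed from
   the device copy; the saved copy is updated only when the device is
   completely cleared. */
void toy_plot_new(toy_base_state *s, int device_cleared)
{
    if (device_cleared)
        s->saved = s->device;
    s->current = s->device;
    s->plotted = 1;
}
```
| |
| After a call that does not clear the device, @code{current} follows |
| @code{device} while @code{saved} still holds the state from the last |
| complete clear, which is the state needed to replay the display list. |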
| |
| The separation is not completely clean: the `device copy' is altered if |
| a plot with log scale(s) is set up via @code{plot.window()}. |
| |
| There is yet another copy of most of the graphics parameters in |
| @code{static} variables in @file{graphics.c} which are used to preserve |
| the current parameters across the processing of inline parameters in |
| high-level graphics calls (handled by @code{ProcessInlinePars}). |
| |
| Snapshots of the base subsystem record the `saved device copy' of the |
| @code{GPar} structure. |
| |
| @menu |
| * Arguments and parameters:: |
| @end menu |
| |
| @node Arguments and parameters, , Base graphics, Base graphics |
| @subsection Arguments and parameters |
| |
| There is an unfortunate confusion between some of the graphical |
| parameters (as set by @code{par}) and arguments to base graphic |
| functions of the same name. This description may help set the record |
| straight. |
| |
| Most of the high-level plotting functions accept graphical parameters as |
| additional arguments, which are then often passed to lower-level |
| functions if not already named arguments (which is the main source of |
| confusion). |
| |
| Graphical parameter @code{bg} is the background colour of the plot. |
| Argument @code{bg} refers to the fill colour for the filled symbols |
| @code{21} to @code{25}. It is an argument to the function |
| @code{plot.xy}, but normally passed by the default method of |
| @code{points}, often from a @code{plot} method. |
| |
| Graphics parameters @code{cex}, @code{col}, @code{lty}, @code{lwd} and |
| @code{pch} also appear as arguments of @code{plot.xy} and so are often |
| passed as arguments from higher-level plot functions such as |
| @code{lines}, @code{points} and @code{plot} methods. They appear as |
| arguments of @code{legend}; @code{col}, @code{lty} and @code{lwd} are |
| arguments of @code{arrows} and @code{segments}. When used as arguments |
| they can be vectors, recycled to control the various lines, points and |
| segments. When set as graphical parameters they set the default |
| rendering: in addition @code{par(cex=)} sets the overall character |
| expansion which subsequent calls (as arguments or inline graphical |
| parameters) multiply. |
| |
| The handling of missing values differs in the two classes of uses. |
| Generally these are errors when used in @code{par} but cause the |
| corresponding element of the plot to be omitted when used as an element |
| of a vector argument. Originally the interpretation of arguments was |
| mainly left to the device, but nowadays some of this is pre-empted in |
| the graphics engine (but for example the handling of @code{lwd = 0} |
| remains device-specific, with some interpreting it as a `thinnest |
| possible' line). |
| |
| @node Grid graphics, , Base graphics, Graphics Devices |
| @section Grid graphics |
| |
| [At least pointers to documentation.] |
| |
| @node GUI consoles, Tools, Graphics Devices, Top |
| @chapter GUI consoles |
| |
| The standard @R{} front-ends are programs which run in a terminal, but |
| there are several ways to provide a GUI console. |
| |
| This can be done by a package which is loaded from terminal-based @R{} |
| and launches a console as part of its startup code or by the user |
| running a specific function: package @CRANpkg{Rcmdr} is a well-known |
| example with a Tk-based GUI. |
| |
| There used to be a Gtk-based console invoked by @command{R --gui=GNOME}: |
| this relied on special-casing in the front-end shell script to launch a |
| different executable. There still is @command{R --gui=Tk}, which starts |
| terminal-based @R{} and runs @code{tcltk::tkStartGui()} as part of the |
| modified startup sequence. |
| |
| However, the main way to run a GUI console is to launch a separate |
| program which runs embedded @R{}: this is done by @command{Rgui.exe} on |
| Windows and @command{R.app} on macOS. The first is an integral part |
| of @R{} and the code for the console is currently in @file{R.dll}. |
| |
| @menu |
| * R.app:: |
| @end menu |
| |
| @node R.app, , GUI consoles, GUI consoles |
| @section R.app |
| |
| @command{R.app} is a macOS application which provides a console. Its |
| sources are a separate project@footnote{an Xcode project, in SVN at |
| @uref{https://svn.r-project.org/R-packages/trunk/Mac-GUI/}.}, and its binaries |
| link to an @R{} installation which it runs as a dynamic library |
| @file{libR.dylib}. The standard @acronym{CRAN} distribution of @R{} for |
| macOS bundles the GUI and @R{} itself, but installing the GUI is optional |
| and either component can be updated separately. |
| |
| @command{R.app} relies on @file{libR.dylib} being in a specific place, |
| and hence on @R{} having been built and installed as a macOS |
| `framework'. Specifically, it uses |
| @file{/Library/Frameworks/R.framework/R}. This is a symbolic link, as |
| frameworks can contain multiple versions of @R{}. It eventually |
| resolves to |
| @file{/Library/Frameworks/R.framework/Versions/Current/Resources/lib/libR.dylib}, |
| which is (in the @acronym{CRAN} distribution) a `fat' binary containing |
| multiple sub-architectures. |
| |
| macOS applications are directory trees: each @command{R.app} contains |
| a front-end written in Objective-C for one sub-architecture: in the |
| standard distribution there are separate applications for 32- and 64-bit |
| Intel architectures. |
| |
| Originally the @R{} sources contained quite a lot of code used only by |
| the macOS GUI, but this was migrated to the @command{R.app} sources. |
| |
| @command{R.app} starts @R{} as an embedded application with a |
| command-line which includes @option{--gui=aqua} (see below). It uses |
| most of the interface pointers defined in the header |
| @file{Rinterface.h}, plus a private interface pointer in file |
| @file{src/main/sysutils.c}. It adds an environment |
| it names @code{tools:RGUI} to the second position in the search path. |
| This contains a number of utility functions used to support the menu |
| items, for example @code{package.manager()}, plus functions @code{q()} |
| and @code{quit()} which mask those in package @pkg{base}---the custom |
| versions save the history in a way specific to @code{R.app}. |
| |
| There is a @command{configure} option @option{--with-aqua} for @R{} |
| which customizes the way @R{} is built: this is distinct from the |
| @option{--enable-R-framework} option which causes @command{make install} |
| to install @R{} as the framework needed for use with @code{R.app}. (The |
| option @option{--with-aqua} is the default on macOS.) It sets the |
| macro @code{HAVE_AQUA} in @file{config.h} and the make variable |
| @code{BUILD_AQUA_TRUE}. These have several consequences: |
| |
| @itemize |
| @item |
| The @code{quartz()} device is built (other than as a stub) in package |
| @pkg{grDevices}: this needs an Objective-C compiler. Then |
| @code{quartz()} can be used with terminal @R{} provided the latter has |
| access to the macOS screen. |
| |
| @item |
| File @file{src/unix/aqua.c} is compiled. This now only contains an |
| interface pointer for the @code{quartz()} device(s). |
| |
| @item |
| @code{capabilities("aqua")} is set to @code{TRUE}. |
| |
| @item |
| The default path for a personal library directory is set as |
| @file{~/Library/R/arch/x.y/library}. |
| @c This is done in @file{etc/Renviron}. |
| |
| @item |
| There is support for setting a `busy' indicator whilst waiting for |
| @code{system()} to return. |
| |
| @item |
| @code{R_ProcessEvents} is inhibited in a forked child from package |
| @pkg{parallel}. The associated callback in @code{R.app} does things |
| which should not be done in a child, and forking forks the whole process |
| including the console. |
| |
| @item |
| There is support for starting the embedded @R{} with the option |
| @option{--gui=aqua}: when this is done the global C variable |
| @code{useaqua} is set to a true value. This has consequences: |
| |
| @itemize |
| @item |
| The @R{} session is asserted to be interactive @emph{via} @code{R_Interactive}. |
| |
| @item |
| @code{.Platform$GUI} is set to @code{"AQUA"}. That has consequences: |
| @itemize |
| @item |
| The environment variable @env{DISPLAY} is set to @samp{:0} if not |
| already set. |
| |
| @item |
| @file{/usr/local/bin} is appended to @env{PATH} since that is where |
| @command{gfortran} is installed. |
| |
| @item |
| The default @HTML{} browser is switched to the one in @command{R.app}. |
| |
| @item |
| Various widgets are switched to the versions provided in |
| @command{R.app}: these include graphical menus, the data editor (but not |
| the data viewer used by @code{View()}) and the workspace browser invoked |
| by @code{browseEnv()}. |
| |
| @item |
| The @pkg{grDevices} package when loaded knows that it is being run |
| under @command{R.app} and so informs any @code{quartz} devices that a |
| Quartz event loop is already running. |
| @end itemize |
| |
| @item |
| The use of the OS's @code{system} function (including by @code{system()} |
| and @code{system2()}, and to launch editors and pagers) is replaced by a |
| version in @code{R.app} (which by default just calls the OS's |
| @code{system} with various signal handlers reset). |
| |
| @end itemize |
| |
| @item |
| If either @R{} was started by @option{--gui=aqua} or @R{} is running in |
| a terminal which is not of type @samp{dumb}, the standard output to |
| files @file{stdout} and @file{stderr} is directed through the C function |
| @code{Rstd_WriteConsoleEx}. This uses ANSI terminal escapes to render |
| lines sent to @code{stderr} as bold on @code{stdout}. |
| |
| @item |
| For historical reasons the startup option @code{-psn} is allowed but |
| ignored. (It seems that in 2003, @samp{r27492}, this was added by Finder.) |
| |
| @end itemize |
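| |
| The rendering of @code{stderr} lines as bold, mentioned in the list |
| above, can be illustrated by a minimal sketch. The function |
| @code{format_console_line} is hypothetical and is not |
| @code{Rstd_WriteConsoleEx}; the particular escape sequences (SGR 1 for |
| bold and SGR 0 to reset) are standard ANSI codes, but that they are the |
| ones emitted by @R{} is an assumption. |
| |
```c
#include <stdio.h>

/* Format one console line into dst (capacity n).  Lines destined for
   stderr are wrapped in ANSI escapes: ESC[1m (bold) ... ESC[0m (reset).
   Returns the value returned by snprintf(). */
int format_console_line(char *dst, size_t n, const char *buf, int is_stderr)
{
    if (is_stderr)
        return snprintf(dst, n, "\033[1m%s\033[0m", buf);
    return snprintf(dst, n, "%s", buf);
}
```
| |
| A terminal of type @samp{dumb} would show the escape sequences |
| literally, which is consistent with such terminals being excluded. |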
| |
| |
| |
| @node Tools, R coding standards, GUI consoles, Top |
| @chapter Tools |
| |
| The behavior of @command{R CMD check} can be controlled through a |
| variety of command line arguments and environment variables. |
| |
| There is an internal @option{--install=@var{value}} command line |
| argument not shown by @command{R CMD check --help}, with possible values |
| |
| @table @code |
| @item check:@var{file} |
| Assume that installation was already performed with stdout/stderr to |
| @var{file}, the contents of which need to be checked (without repeating |
| the installation). This is useful for checks applied by repository |
| maintainers: it reduces the check time by the installation time given |
| that the package has already been installed. In this case, one also |
| needs to specify @emph{where} the package was installed to using command |
| line option @option{--library}. |
| @item fake |
| Fake installation, and turn off the run-time tests. |
| @item skip |
| Skip installation, e.g., when testing recommended packages bundled with |
| R. |
| @item no |
| The same as @option{--no-install}: turns off installation and the tests |
| which require the package to be installed. |
| @end table |
| |
| The following environment variables can be used to customize the |
| operation of @command{check}: a convenient place to set these is the |
| check environment file (default, @file{~/.R/check.Renviron}). |
| |
| @vtable @code |
| @item _R_CHECK_ALL_NON_ISO_C_ |
| If true, do not ignore compiler (typically GCC) warnings about non ISO C |
| code in @emph{system} headers. Note that this may also show additional |
| ISO C++ warnings. |
| Default: false. |
| @item _R_CHECK_FORCE_SUGGESTS_ |
| If true, give an error if suggested packages are not available. |
| Default: true (but false for CRAN submission checks). |
| @item _R_CHECK_RD_CONTENTS_ |
| If true, check @file{Rd} files for auto-generated content which needs |
| editing, and missing argument documentation. |
| Default: true. |
| @item _R_CHECK_RD_LINE_WIDTHS_ |
| If true, check @file{Rd} line widths in usage and examples sections. |
| Default: false (but true for CRAN submission checks). |
| @item _R_CHECK_RD_STYLE_ |
| If true, check whether @file{Rd} usage entries for S3 methods use the full |
| function name rather than the appropriate @code{\method} markup. |
| Default: true. |
| @item _R_CHECK_RD_XREFS_ |
| If true, check the cross-references in @file{.Rd} files. |
| Default: true. |
| @item _R_CHECK_SUBDIRS_NOCASE_ |
| If true, check the case of directories such as @file{R} and @file{man}. |
| Default: true. |
| @item _R_CHECK_SUBDIRS_STRICT_ |
| Initial setting for @option{--check-subdirs}. |
| Default: @samp{default} (which checks only tarballs, and checks in the |
| @file{src} only if there is no @file{configure} file). |
| @item _R_CHECK_USE_CODETOOLS_ |
| If true, make use of the @CRANpkg{codetools} package, which provides a |
| detailed analysis of visibility of objects (but may give false |
| positives). |
| Default: true (if recommended packages are installed). |
| @item _R_CHECK_USE_INSTALL_LOG_ |
| If true, record the output from installing a package as part of its |
| check to a log file (@file{00install.out} by default), even when running |
| interactively. |
| Default: true. |
| @item _R_CHECK_VIGNETTES_NLINES_ |
| Maximum number of lines to show from the bottom of the output when |
| reporting errors in running or re-building vignettes. (Value @code{0} |
| means all lines will be shown.) |
| Default: 10 for running, 25 for re-building. |
| @item _R_CHECK_CODOC_S4_METHODS_ |
| Control whether @code{codoc()} testing is also performed on S4 methods. |
| Default: true. |
| @item _R_CHECK_DOT_INTERNAL_ |
| Control whether the package code is scanned for @code{.Internal} calls, |
| which should only be used by base (and occasionally by recommended) packages. |
| Default: true. |
| @item _R_CHECK_EXECUTABLES_ |
| Control checking for executable (binary) files. |
| Default: true. |
| @item _R_CHECK_EXECUTABLES_EXCLUSIONS_ |
| Control whether checking for executable (binary) files ignores files |
| listed in the package's @file{BinaryFiles} file. |
| Default: true (but false for CRAN submission checks). |
| However, most likely this package-level override mechanism will be |
| removed eventually. |
| @item _R_CHECK_PERMISSIONS_ |
| Control whether permissions of files should be checked. |
| Default: true iff @code{.Platform$OS.type == "unix"}. |
| @item _R_CHECK_FF_CALLS_ |
| Allows turning off @code{checkFF()} testing. If set to |
| @samp{registration}, checks the registration information (number of |
| arguments, correct choice of @code{.C/.Fortran/.Call/.External}) for |
| such calls provided the package is installed. |
| Default: true. |
| @item _R_CHECK_FF_DUP_ |
| Controls @code{checkFF(check_DUP)}. |
| Default: true (and forced to be true for CRAN submission checks). |
| @item _R_CHECK_LICENSE_ |
| Control whether/how license checks are performed. A possible value is |
| @samp{maybe} (warn in case of problems, but not about standardizable |
| non-standard license specs). |
| Default: true. |
| @item _R_CHECK_RD_EXAMPLES_T_AND_F_ |
| Control whether @code{check_T_and_F()} also looks for ``bad'' (global) |
| @samp{T}/@samp{F} uses in examples. |
| Off by default because this can result in false positives. |
| @item _R_CHECK_RD_CHECKRD_MINLEVEL_ |
| Controls the minimum level for reporting warnings from @code{checkRd}. |
| Default: -1. |
| @item _R_CHECK_XREFS_REPOSITORIES_ |
| If set to a non-empty value, a space-separated list of repositories to |
| use to determine known packages. Default: empty, when the CRAN |
| and Bioconductor repositories known to @R{} are used. |
| @item _R_CHECK_SRC_MINUS_W_IMPLICIT_ |
| Control whether installation output is checked for compilation warnings |
| about implicit function declarations (as spotted by GCC with command |
| line option @option{-Wimplicit-function-declaration}, which is implied |
| by @option{-Wall}). NB: implicit function declarations are errors in |
| some recent C compilers, including Apple @command{clang}. |
| Default: true from @R{} 4.2.0, previously false. |
| @item _R_CHECK_SRC_MINUS_W_UNUSED_ |
| Control whether installation output is checked for compilation warnings |
| about unused code constituents (as spotted by GCC with command line |
| option @option{-Wunused}, which is implied by @option{-Wall}). |
| Default: true. |
| @item _R_CHECK_WALL_FORTRAN_ |
| Control whether gfortran 4.0 or later @option{-Wall} warnings are used in |
| the analysis of installation output. |
| Default: false, even though the warnings are justifiable. |
| @item _R_CHECK_ASCII_CODE_ |
| If true, check @R{} code for non-ascii characters. |
| Default: true. |
| @item _R_CHECK_ASCII_DATA_ |
| If true, check data for non-ascii characters. @emph{En route}, checks |
| that all the datasets can be loaded and that their components can be |
| accessed. |
| Default: true. |
| @item _R_CHECK_COMPACT_DATA_ |
| If true, check data for ascii and uncompressed saves, and also check if |
| using @command{bzip2} or @code{xz} compression would be significantly |
| better. |
| Default: true. |
| @item _R_CHECK_SKIP_ARCH_ |
| Comma-separated list of architectures that will be omitted from |
| checking in a multi-arch setup. |
| Default: none. |
| @item _R_CHECK_SKIP_TESTS_ARCH_ |
| Comma-separated list of architectures that will be omitted from |
| running tests in a multi-arch setup. |
| Default: none. |
| @item _R_CHECK_SKIP_EXAMPLES_ARCH_ |
| Comma-separated list of architectures that will be omitted from |
| running examples in a multi-arch setup. |
| Default: none. |
| @item _R_CHECK_VC_DIRS_ |
| Should the unpacked package directory be checked for version-control |
| directories (@file{CVS}, @file{.svn} @dots{})? |
| Default: true for tarballs. |
| @item _R_CHECK_PKG_SIZES_ |
| Should @command{du} be used to find the installed sizes of packages? |
| @command{R CMD check} does check for the availability of @command{du}, |
| but this option allows the check to be overruled if an unsuitable |
| command is found (including one that does not respect the @option{-k} |
| flag to report in units of 1Kb, or reports in a different format -- the |
| GNU, macOS and Solaris @command{du} commands have been tested). |
| Default: true if @command{du} is found. |
| @item _R_CHECK_PKG_SIZES_THRESHOLD_ |
| Threshold used for @env{_R_CHECK_PKG_SIZES_} (in Mb). |
| Default: 5 |
| @item _R_CHECK_DOC_SIZES_ |
| Should @command{qpdf} be used to check the installed sizes of PDFs? |
| Default: true if @command{qpdf} is found. |
| @item _R_CHECK_DOC_SIZES2_ |
| Should @command{gs} be used to check the installed sizes of PDFs? This |
| is slower than (and in addition to) the previous check, but does detect |
| figures with excessive detail (often hidden by over-plotting) or bitmap |
| figures with too high a resolution. Requires that @env{R_GSCMD} is set |
| to a valid program, or @command{gs} (or on Windows, |
| @command{gswin32.exe} or @command{gswin64c.exe}) is on the path. |
| Default: false (but true for CRAN submission checks). |
| @item _R_CHECK_ALWAYS_LOG_VIGNETTE_OUTPUT_ |
| By default the output from running the @R{} code in the vignettes is |
| kept only if there is an error. This also applies to the |
| @file{build_vignettes.log} log from the re-building of vignettes. |
| Default: false. |
| @item _R_CHECK_CLEAN_VIGN_TEST_ |
| Should the @file{vign_test} directory be removed if the test is |
| successful? |
| Default: true. |
| @item _R_CHECK_REPLACING_IMPORTS_ |
| Should warnings about replacing imports be reported? These sometimes come |
| from auto-generated @file{NAMESPACE} files in other packages, but most |
| often from importing the whole of a namespace rather than using |
| @code{importFrom}. |
| Default: true. |
| @item _R_CHECK_UNSAFE_CALLS_ |
| Check for calls that appear to tamper with (or allow tampering with) |
| already loaded code not from the current package: such calls may well |
| contravene CRAN policies. |
| Default: true. |
| @item _R_CHECK_TIMINGS_ |
| Optionally report timings for installation, examples, tests and |
| running/re-building vignettes as part of the check log. The format is |
| @samp{[as/bs]} for the total CPU time (including child processes) |
| @samp{a} and elapsed time @samp{b}, except on Windows, when it is |
| @samp{[bs]}. In most cases timings are only given for @samp{OK} checks. |
| Times with an elapsed component over 10 mins are reported in minutes |
| (with abbreviation @samp{m}). The value is the smallest numerical value |
| in elapsed seconds that should be reported: non-numerical values |
| indicate that no report is required, a value of @samp{0} that a report |
| is always required. |
| Default: @code{""}. (@code{10} for CRAN checks.) |
| |
| @item _R_CHECK_EXAMPLE_TIMING_THRESHOLD_ |
| If timings are being recorded, set the threshold in seconds for |
| reporting long-running examples (either user+system CPU time or elapsed |
| time). Default: @code{"5"}. |
| |
| @item _R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_ |
| For checks with timings enabled, report examples where the ratio of CPU |
| time to elapsed time exceeds this threshold (and the CPU time is at |
| least one second). This can help detect the simultaneous use of |
| multiple CPU cores. |
| Default: @code{NA}. |
| |
| @item _R_CHECK_TEST_TIMING_CPU_TO_ELAPSED_THRESHOLD_ |
| Report for running an individual test if the ratio of CPU time to |
| elapsed time exceeds this threshold (and the CPU time is at least one |
| second). Not supported on Windows. |
| Default: @code{NA}. |
| |
| @item _R_CHECK_VIGNETTE_TIMING_CPU_TO_ELAPSED_THRESHOLD_ |
| Report if, when running/re-building vignettes (individually or in |
| aggregate), the ratio of CPU time to elapsed time exceeds this threshold |
| (and the CPU time is at least one second). Not supported on |
| Windows. |
| Default: @code{NA}. |
| |
| @item _R_CHECK_CODETOOLS_PROFILE_ |
| A string with comma-separated @code{@var{name}=@var{value}} pairs (with |
| @var{value} a logical constant) giving additional arguments for the |
| @CRANpkg{codetools} functions used for analyzing package code. E.g., |
| use @code{_R_CHECK_CODETOOLS_PROFILE_="suppressLocalUnused=FALSE"} to |
| turn off suppressing warnings about unused local variables. Default: no |
| additional arguments, corresponding to using @code{skipWith = TRUE}, |
| @code{suppressPartialMatchArgs = FALSE} and @code{suppressLocalUnused = |
| TRUE}. |
| |
| @item _R_CHECK_CRAN_INCOMING_ |
| Check whether package is suitable for publication on CRAN. |
| Default: false, except for CRAN submission checks. |
| |
| @item _R_CHECK_CRAN_INCOMING_REMOTE_ |
| Include checks that require remote access among the above. |
| Default: same as @code{_R_CHECK_CRAN_INCOMING_} |
| |
| @item _R_CHECK_XREFS_USE_ALIASES_FROM_CRAN_ |
| When checking anchored Rd xrefs, use Rd aliases from the CRAN package |
| web areas in addition to those in the packages installed locally. |
| Default: false. |
| |
| @item _R_SHLIB_BUILD_OBJECTS_SYMBOL_TABLES_ |
| Make the checks of compiled code more accurate by recording the symbol |
| tables for objects (@file{.o} files) at installation in a file |
| @file{symbols.rds}. (Only currently supported on Linux, Solaris, macOS, |
| Windows and FreeBSD.) |
| Default: true. |
| |
| @item _R_CHECK_CODE_ASSIGN_TO_GLOBALENV_ |
| Should the package code be checked for assignments to the global |
| environment? |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_CODE_ATTACH_ |
| Should the package code be checked for calls to @code{attach()}? |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_CODE_DATA_INTO_GLOBALENV_ |
| Should the package code be checked for calls to @code{data()} which load |
| into the global environment? |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_DOT_FIRSTLIB_ |
| Should the package code be checked for the presence of the obsolete function |
| @code{.First.lib()}? |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_DEPRECATED_DEFUNCT_ |
| Should the package code be checked for the presence of recently deprecated |
| or defunct functions (including completely removed functions)? This |
| also covers platform-specific graphics devices. |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_SCREEN_DEVICE_ |
| If set to @samp{warn}, give a warning if examples etc open a screen |
| device. If set to @samp{stop}, give an error. |
| Default: empty (but @samp{stop} for CRAN submission checks). |
| |
| @item _R_CHECK_WINDOWS_DEVICE_ |
| If set to @samp{stop}, give an error if a Windows-only device is used in |
| examples etc. This is only useful on Windows: the devices do not exist |
| elsewhere. |
| Default: empty (but @samp{stop} for CRAN submission checks on Windows). |
| |
| @item _R_CHECK_TOPLEVEL_FILES_ |
| Report on top-level files in the package sources that are not described |
| in `Writing R Extensions' nor are commonly understood (like |
| @file{ChangeLog}). Variations on standard names (e.g.@: |
| @file{COPYRIGHT}) are also reported. |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_GCT_N_ |
| Should the @option{--use-gct} use @code{gctorture2(@var{n})} rather than |
| @code{gctorture(TRUE)}? Use a positive integer to enable this. |
| Default: @code{0}. |
| |
| @item _R_CHECK_LIMIT_CORES_ |
| If set, check the usage of too many cores in package @pkg{parallel}. If |
| set to @samp{warn} gives a warning, to @samp{false} or @samp{FALSE} the |
| check is skipped, and any other non-empty value gives an error when more |
| than 2 children are spawned. |
| Default: unset (but @samp{TRUE} for CRAN submission checks). |
| |
| @item _R_CHECK_CODE_USAGE_VIA_NAMESPACES_ |
| If set, check code usage (via @CRANpkg{codetools}) directly on the |
| package namespace without loading and attaching the package and its |
| suggests and enhances. |
| Default: true (and true for CRAN submission checks). |
| |
| @item _R_CHECK_CODE_USAGE_WITH_ONLY_BASE_ATTACHED_ |
| If set, check code usage (via @CRANpkg{codetools}) with only the base |
| package attached. |
| Default: true. |
| |
| @item _R_CHECK_EXIT_ON_FIRST_ERROR_ |
| If set to a true value, the check will exit on the first error. |
| Default: false. |
| |
| @item _R_CHECK_S3_METHODS_NOT_REGISTERED_ |
| If set to a true value, report (apparent) S3 methods exported but not |
| registered. |
| Default: true. |
| |
| @item _R_CHECK_OVERWRITE_REGISTERED_S3_METHODS_ |
| If set to a true value, report already registered S3 methods in |
| base/recommended packages which are overwritten when this package's |
| namespace is loaded. |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_TESTS_NLINES_ |
| Number of trailing lines of test output to reproduce in the log. If |
| @code{0} all lines except the @R{} preamble are reproduced. |
| Default: 13. |
| |
| @item _R_CHECK_NATIVE_ROUTINE_REGISTRATION_ |
| If set to a true value, report if the entry points to register native |
| routines and to suppress dynamic search are not found in a package's |
| DLL. (@strong{NB:} this requires system command @command{nm} to be on the |
| @env{PATH}. On Windows, @command{objdump.exe} is first searched for in the |
| compiler toolchain specified via @code{Makeconf} (can be customized by |
| environment variable @env{BINPREF}). If not found there, it must be on the |
| @env{PATH}. On Unix this would be normal when using a package with compiled |
| code (which are the only ones this checks), but Windows users should check.) |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_NO_STOP_ON_TEST_ERROR_ |
| If set to a true value, do not stop running tests after first error (as |
| if command line option @option{--no-stop-on-test-error} had been given). |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_PRAGMAS_ |
| Run additional checks on the pragmas in C/C++ source code and headers. |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_COMPILATION_FLAGS_ |
| If the package is installed and has C/C++/Fortran code, check the |
| install log for non-portable flags (for example those added to |
| @file{src/Makevars} during configuration). Currently @option{-W} flags |
| are reported, except @option{-Wall}, @option{-Wextra} and |
| @option{-Weverything}, and flags which appear to be attempts to suppress |
| warnings are highlighted. |
| See |
| @ifset UseExternalXrefs |
| @ref{Writing portable packages, , Writing portable packages, R-exts, Writing R Extensions} |
| @end ifset |
| @ifclear UseExternalXrefs |
| `Writing R Extensions' |
| @end ifclear |
| for the rationale of this check (and why even @option{-Werror} is |
| unsafe). |
| |
| Environment variable @env{_R_CHECK_COMPILATION_FLAGS_KNOWN_} can be set |
| to a space-separated set of flags which come from the @R{} build used |
| for testing (flags such as @option{-Wall} and @option{-Wextra} are |
| already known). For example, for CRAN build of @R{} >= 4.0.0 on macOS |
| one could use |
| @example |
| _R_CHECK_COMPILATION_FLAGS_KNOWN_="-mmacosx-version-min=10.13" |
| @end example |
| @noindent |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_R_DEPENDS_ |
| Check that any dependence on R is not on a recent patch-level version |
| such as @code{R (>= 3.3.3)} since blocking installation of a package |
| will also block its reverse dependencies. Possible values |
| @samp{"note"}, @samp{"warn"} and logical values (where currently true |
| values are equivalent to @samp{"note"}). |
| Default: false (but @samp{"warn"} for @option{--as-cran}). |
| |
| @item _R_CHECK_SERIALIZATION_ |
| Check that serialized @R{} objects in the package sources were |
| serialized with version 2 and there is no dependence on @samp{R >= |
| 3.5.0}. (Version 3 is in use as from @R{} 3.5.0 but should only be used |
| when necessary.) |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_R_ON_PATH_ |
| This checks if the package attempts to use @command{R} or |
| @command{Rscript} from the path rather than that under test. |
| It does so by putting scripts at the head of the path which print a |
| message and fail. |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_PACKAGES_USED_IN_TESTS_USE_SUBDIRS_ |
| If set to a true value, also check the R code in common unit test |
| subdirectories of @file{tests} for undeclared package dependencies. |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_SHLIB_OPENMP_FLAGS_ |
| Check correct and portable use of @code{SHLIB_OPENMP_*FLAGS} in |
| @file{src/Makevars} (and similar). |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_CONNECTIONS_LEFT_OPEN_ |
| When checking examples, check for each example if connections are left |
| open: if any are found, this is reported with a fatal error. NB: |
| `connections' includes most use of files and any parallel clusters which |
| have not been stopped by @code{stopCluster()}. |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_FUTURE_FILE_TIMESTAMPS_ |
| Check if any of the input files has a timestamp in the future (and to do |
| so, checks that the system clock is correct to within 5 minutes). |
| Default: false (but true for CRAN submission checks). |
| @c _R_CHECK_SYSTEM_CLOCK_ can be used to disable the clock check, for |
| @c use on a check farm. |
| |
| @item _R_CHECK_LENGTH_1_CONDITION_ |
| No longer in use: conditions of length greater than one in @code{if} or |
| @code{while} statements are now an error. |
| |
| @item _R_CHECK_LENGTH_1_LOGIC2_ |
| Optionally check if an argument of the binary operators @code{&&} and |
| @code{||} has length greater than one, the right-hand side being checked |
| only if it is used. For a false value (@samp{F}, @samp{False}, |
| @samp{FALSE} or @samp{false}) or when unset, print a warning. Any other |
| non-true non-empty value needs to be a list of commands separated by |
| commas: @samp{abort} causes R to terminate unconditionally instead of |
| signalling an error (which is useful in pinpointing issues in code |
| called from @code{try} or @code{tryCatch}), @samp{verbose} prints a very |
| detailed diagnostic message, @samp{package:pkg} restricts the check to |
| if/while statements executing in the namespace of package @samp{pkg} |
| (but @code{all_base} refers to all the standard packages), |
| @samp{package:_R_CHECK_PACKAGE_NAME_} restricts the check to if/while |
| statements executing in the package that is currently being checked by |
| @code{R CMD check}, @samp{warn} causes R to report a warning instead of |
| signalling an error. (More than one package specification can be given: |
| a report, error or warning will be given if any are satisfied.) |
| Default: unset (a warning is reported), but |
| @samp{package:_R_CHECK_PACKAGE_NAME_,abort,verbose} for the CRAN |
| submission checks. |
| |
| @item _R_CHECK_VIGNETTES_SKIP_RUN_MAYBE_ |
| Should running the @R{} code in the vignettes be skipped if vignette |
| outputs are to be rebuilt (which will involve running that code)? |
| Default: false (but true for CRAN checking) |
| |
| @item _R_CHECK_BUILD_VIGNETTES_SEPARATELY_ |
| Prior to @R{} 3.6.0, re-building the vignette outputs was done in a |
| single @R{} session which allowed accidental reliance of one vignette on |
| another (for example, in the loading of packages). The current default |
| is to use a separate session for each vignette; this option allows |
| testing the older behaviour. |
| Default: true |
| |
| @item _R_CHECK_SYSTEM_CLOCK_ |
| As part of the `checking for future file timestamps' enabled by |
| @option{--as-cran}, check the system clock against an external clock to |
| catch errors such as the wrong day or even year. Not necessary on |
| systems doing repeated checks. |
| Default: true (but false for CRAN checking) |
| |
| @item _R_CHECK_AUTOCONF_ |
| For packages with a @file{configure} file generated by GNU |
| @command{autoconf} and either @file{configure.ac} or |
| @file{configure.in}, check that @command{autoreconf} can, if available, |
| be run in a copy of the sources (this will detect missing source files |
| and report @command{autoconf} warnings). Environment variable |
| @env{AUTORECONF} controls the command used: it can give the full path to |
| @command{autoreconf} (without spaces) and can include flags such as |
| @option{--warnings=obsolete} (which is added for @command{autoreconf} |
| version 2.68 or 2.69 and is the default for later versions). |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_DATALIST_ |
| Check whether file @file{data/datalist} is out-of-date. |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_THINGS_IN_CHECK_DIR_ |
| Check and report at the end of the check run if files have been left in |
| the check directory. |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_THINGS_IN_TEMP_DIR_ |
| Check and report at the end of the check run if files would have been |
| left in the temporary directory (usually @file{/tmp} on a Unix-alike). |
| It does this by setting the environment variable @env{TMPDIR} to a |
| subdirectory of the @R{} session directory for the @code{check} process: |
| if any files or directories are left there they are removed. Since some |
| of these might be out of the user's control, environment variable |
| @env{_R_CHECK_THINGS_IN_TEMP_DIR_EXCLUDE_} can specify an (extended |
| regex) pattern of file paths not to be reported -- CRAN uses |
| @samp{^ompi.} for directories left behind by OpenMPI. There are rare |
| instances where @env{TMPDIR} is not respected and so files are left in |
| @file{/tmp} (and not reported, but see the next item): one example is |
| @file{/tmp/boost_interprocess} on some OSes. |
| @c macOS is one. |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_THINGS_IN_OTHER_DIRS_ |
| Check and report at the end of the check run if new files or directories |
| are created in a selected set of directories during the check run. |
| (This is confined to files owned by the user running the check process.) |
| Currently the directories monitored are the home directory, @file{/tmp} |
| (excluding @file{RtmpXXXXXX} dirs), @file{/dev/shm}, @file{~/.cache} |
| (recursively) and @file{~/.local/share} (recursively) or their |
| equivalents on Windows and macOS (the directories in which the default |
| settings for @code{tools::R_user_dir()} use a @file{R} subdirectory). |
| Additional directories can be specified in environment variable |
| @env{_R_CHECK_THINGS_IN_OTHER_DIRS_XTRA_}, separated by semicolons. |
| Directories are reported with a trailing @samp{/} on all platforms. |
| |
| Environment variable @env{_R_CHECK_THINGS_IN_OTHER_DIRS_EXCLUDE_} can |
| specify an (extended regex) pattern of file paths not to be reported -- |
| this should match absolute file paths with home represented by |
| @file{~}. For example, on a Linux system |
| @example |
| '^~/.cache/(mozilla/firefox|mesa_shader_cache)/' |
| @end example |
| @noindent |
| matches cache directories used by Firefox and OpenGL (and their |
| content). If the value starts with @samp{@@} it is considered as a |
| filepath which is read with each line treated as a pattern to be |
| matched. |
| |
| Note that other processes (including check runs in parallel) may create |
| new files in these directories which will get reported. However, this |
| optional check is very useful for narrowing down possible packages which |
| are leaving behind unexpected files. |
| @c |
| Default: false |
| |
| @item _R_CHECK_BASHISMS_ |
| Check the top-level scripts @file{configure} (unless generated by |
| @command{autoconf}) and @file{cleanup} for non-Bourne-shell code, using the |
| Perl script @command{checkbashisms} if available. This includes |
| reporting scripts using the non-portable @code{#! /bin/bash}. |
| (Script @command{checkbashisms} is available in most Linux distributions |
| in a package named either @samp{devscripts} or @samp{devscripts-checkbashisms} |
| and from @uref{https://sourceforge.net/projects/checkbaskisms/files}.) |
| Default: false (but true for CRAN submission checks except on Windows). |
| |
| @item _R_CHECK_ORPHANED_ |
| Check if dependencies are orphaned packages. As from @R{} 4.1.0 this |
| checks strict dependencies recursively, so will report any orphaned |
| packages which are needed to attach the package by @code{library()} as |
| well as any orphaned packages which are suggested. |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_EXCESSIVE_IMPORTS_ |
| A positive integer. If set, give a NOTE if the number of imports from |
| non-base packages exceeds this threshold. Large numbers of imports |
| make a package vulnerable to any of them becoming unavailable. |
| Default: unset (but 20 for CRAN submission checks) |
| |
| @item _R_CHECK_DONTTEST_EXAMPLES_ |
| If true and examples are found with @code{\donttest} sections, the |
| tests are run in one pass with these commented out and then in a |
| second pass including the @code{\donttest} sections (for the main |
| architecture only). Only for the first pass are the results compared |
| to any @file{.Rout.save} file and timings analysed. Overridden by |
| @option{--run-donttest}. |
| Default: false unless @option{--as-cran} is specified (which can be |
| overridden by setting @samp{_R_CHECK_DONTTEST_EXAMPLES_=false}). |
| |
| @item _R_CHECK_XREFS_PKGS_ARE_DECLARED_ |
| Check if packages used in `anchored' cross-references in @file{.Rd} |
| files (those of the form @code{\link[@var{pkg}]@{@var{foo}@}} and |
| @code{\link[@var{pkg:bar}]@{@var{foo}@}}) are declared in the |
| @file{DESCRIPTION} file and so these links can be checked. |
| Default: false. |
| |
| @item _R_CHECK_XREFS_MIND_SUSPECT_ANCHORS_ |
| Check if package-anchored Rd cross-references are to @emph{files} (and |
| not aliases). |
| Default: false. |
| |
| @item _R_CHECK_BOGUS_RETURN_ |
| If true and @env{_R_CHECK_USE_CODETOOLS_} is also true, functions are |
| scanned for use of @code{return} rather than @code{return()}. |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_MATRIX_DATA_ |
| By default, the check for a mismatch between the data length and the |
| dimensions in a call to @code{matrix} gives a warning: setting this to a |
| true value gives an error with a compact traceback. |
| @c |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_CODE_CLASS_IS_STRING_ |
| Check if package code has @code{if()} conditions which compare the class |
| of an object to a string. |
| See |
| @uref{https://developer.r-project.org/Blog/public/2019/11/09/when-you-think-class.-think-again/index.html} |
| why this is a bad idea. |
| Default: false (but true for CRAN submission checks). |
| |
| @item _R_CHECK_RD_VALIDATE_RD2HTML_ |
| Check the validity of the package HTML help pages using HTML tidy |
| (@uref{https://www.html-tidy.org/}) (if available on the system path for |
| executables). |
| Default: false (but true for CRAN submission checks). |
| @end vtable |
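| |
| As a minimal sketch of how the timing variables above might be combined |
| (the numerical values are illustrative assumptions, not recommended |
| settings), the environment for a check run could contain |
| @example |
| _R_CHECK_TIMINGS_=0 |
| _R_CHECK_EXAMPLE_TIMING_THRESHOLD_=5 |
| _R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_=2 |
| @end example |
| |
| @noindent |
| Here @samp{0} asks for timings always to be reported, examples taking |
| more than 5 seconds are flagged, and so are examples whose CPU time is |
| more than twice the elapsed time (a possible sign of several cores being |
| used at once). |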
| |
| The following variables control checks for undeclared/unconditional use |
| of other packages. They work by setting up a temporary library |
| directory and setting @code{.libPaths()} to just that and |
| @code{.Library}, so are only effective if additional packages are |
| installed somewhere other than @code{.Library}. The temporary library |
| is populated by symbolic links@footnote{under Windows, junction points, |
| or copies if environment variable @env{R_WIN_NO_JUNCTIONS} has a |
| non-empty value.} to installed packages not also in @code{.Library}. |
| |
| @vtable @code |
| @item _R_CHECK_INSTALL_DEPENDS_ |
| If set to a true value and a test installation is to be done, this is |
| done with a temporary library populated by all the |
| Depends/Imports/LinkingTo packages. |
| Default: false (but true for CRAN submission checks). |
| |
| Note that this is actually implemented in @command{R CMD INSTALL}, so it |
| is available to those who first install recording to a log, then call |
| @command{R CMD check}. |
| |
| @item _R_CHECK_SUGGESTS_ONLY_ |
| If set to a true value, running examples, tests and vignettes is done |
| with a temporary library directory populated by all the |
| Depends/Imports/Suggests packages. (As exceptions, packages in a |
| @samp{VignetteBuilder} field are always made available.) |
| @c |
| Default: false (but true for CRAN submission checks: some of the regular |
| checks use true and some use false). |
| |
| @item _R_CHECK_DEPENDS_ONLY_ |
| As for @env{_R_CHECK_SUGGESTS_ONLY_} but using only Depends/Imports (and |
| the exceptions, including test-suite managers in @samp{Suggests}). |
| Default: false |
| |
| @item _R_CHECK_DEPENDS_ONLY_DATA_ |
| Apply @env{_R_CHECK_DEPENDS_ONLY_} only to the check of loading from |
| the @file{data} directory, so checks if any dataset depends on |
| packages which are in Suggests or undeclared. |
| Default: false (but true for CRAN submission checks) |
| |
| @item _R_CHECK_DEPENDS_ONLY_EXAMPLES_ |
| @itemx _R_CHECK_DEPENDS_ONLY_TESTS_ |
| @itemx _R_CHECK_DEPENDS_ONLY_VIGNETTES_ |
| Apply @env{_R_CHECK_DEPENDS_ONLY_} only to the checking of |
| examples, tests or vignettes. These can be used on their own, or |
| with a false value to override @env{_R_CHECK_DEPENDS_ONLY_}. |
| @c |
| Default: the value of @env{_R_CHECK_DEPENDS_ONLY_} or false if that is unset. |
| |
| @item _R_CHECK_NO_RECOMMENDED_ |
| If set to a true value, augment the previous checks to make recommended |
| packages unavailable unless declared (even if installed in @code{.Library}). |
| Default: false (but true for CRAN submission checks). |
| |
| This may give false positives on code which uses |
| @code{grDevices::densCols} and @code{stats:::.asSparse} / |
| @code{stats:::.Diag} as these invoke @CRANpkg{KernSmooth} and |
| @CRANpkg{Matrix} respectively. (Those in @pkg{stats} are called from |
| various contrasts functions if @code{sparse = TRUE} is used.) |
| @end vtable |
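| |
| For instance, someone who wants examples and tests run with only the |
| Depends/Imports packages available, but vignettes left unrestricted, |
| might (as a sketch of the override described above, not a recommended |
| configuration) use |
| @example |
| _R_CHECK_DEPENDS_ONLY_=true |
| _R_CHECK_DEPENDS_ONLY_VIGNETTES_=false |
| @end example |
| |
| @noindent |
| The second setting takes precedence over the first for vignettes only; |
| examples and tests inherit the value of @env{_R_CHECK_DEPENDS_ONLY_}. |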
| |
| CRAN's submission checks use something like |
| |
| @example |
| _R_CHECK_CRAN_INCOMING_=TRUE |
| _R_CHECK_CRAN_INCOMING_REMOTE_=TRUE |
| _R_CHECK_VC_DIRS_=TRUE |
| _R_CHECK_TIMINGS_=10 |
| _R_CHECK_INSTALL_DEPENDS_=TRUE |
| _R_CHECK_SUGGESTS_ONLY_=TRUE |
| _R_CHECK_NO_RECOMMENDED_=TRUE |
| _R_CHECK_EXECUTABLES_EXCLUSIONS_=FALSE |
| _R_CHECK_DOC_SIZES2_=TRUE |
| _R_CHECK_CODE_ASSIGN_TO_GLOBALENV_=TRUE |
| _R_CHECK_CODE_ATTACH_=TRUE |
| _R_CHECK_CODE_DATA_INTO_GLOBALENV_=TRUE |
| _R_CHECK_CODE_USAGE_VIA_NAMESPACES_=TRUE |
| _R_CHECK_DOT_FIRSTLIB_=TRUE |
| _R_CHECK_DEPRECATED_DEFUNCT_=TRUE |
| _R_CHECK_REPLACING_IMPORTS_=TRUE |
| _R_CHECK_SCREEN_DEVICE_=stop |
| _R_CHECK_TOPLEVEL_FILES_=TRUE |
| _R_CHECK_S3_METHODS_NOT_REGISTERED_=TRUE |
| _R_CHECK_OVERWRITE_REGISTERED_S3_METHODS_=TRUE |
| _R_CHECK_PRAGMAS_=TRUE |
| _R_CHECK_COMPILATION_FLAGS_=TRUE |
| _R_CHECK_R_DEPENDS_=warn |
| _R_CHECK_SERIALIZATION_=TRUE |
| _R_CHECK_R_ON_PATH_=TRUE |
| _R_CHECK_PACKAGES_USED_IN_TESTS_USE_SUBDIRS_=TRUE |
| _R_CHECK_SHLIB_OPENMP_FLAGS_=TRUE |
| _R_CHECK_CONNECTIONS_LEFT_OPEN_=TRUE |
| _R_CHECK_FUTURE_FILE_TIMESTAMPS_=TRUE |
| _R_CHECK_LENGTH_1_CONDITION_=package:_R_CHECK_PACKAGE_NAME_,abort,verbose |
| _R_CHECK_LENGTH_1_LOGIC2_=package:_R_CHECK_PACKAGE_NAME_,abort,verbose |
| _R_CHECK_AUTOCONF_=true |
| _R_CHECK_DATALIST_=true |
| _R_CHECK_THINGS_IN_CHECK_DIR_=true |
| _R_CHECK_THINGS_IN_TEMP_DIR_=true |
| _R_CHECK_BASHISMS_=true |
| _R_CLASS_MATRIX_ARRAY_=true |
| _R_CHECK_ORPHANED_=true |
| _R_CHECK_BOGUS_RETURN_=true |
| _R_CHECK_MATRIX_DATA_=TRUE |
| _R_CHECK_CODE_CLASS_IS_STRING_=true |
| _R_CHECK_RD_VALIDATE_RD2HTML_=true |
| @end example |
| |
| @noindent |
| These are turned on by @command{R CMD check --as-cran}: the incoming |
| checks also use |
| @example |
| _R_CHECK_FORCE_SUGGESTS_=FALSE |
| @end example |
| |
| @noindent |
| since some packages do suggest other packages not available on CRAN or |
| other commonly-used repositories. |
| |
| Several environment variables can be used to set `timeouts': limits for |
| the elapsed time taken by the sub-processes used for parts of the |
| checks. A value of @code{0} indicates no limit, and is the default. |
| Character strings ending in @samp{s}, @samp{m} or @samp{h} indicate a |
| number of seconds, minutes or hours respectively: other values are |
| interpreted as a whole number of seconds (with invalid inputs being |
| treated as no limit). |
| @vtable @code |
| @item _R_CHECK_ELAPSED_TIMEOUT_ |
| The default timeout for sub-processes not otherwise mentioned, and the |
| default value for all except @env{_R_CHECK_ONE_TEST_ELAPSED_TIMEOUT_}. |
| (This is also used by @code{tools::check_packages_in_dir}.) |
| |
| @item _R_CHECK_INSTALL_ELAPSED_TIMEOUT_ |
| Limit for when @command{R CMD INSTALL} is run by @command{check}. |
| |
| @item _R_CHECK_EXAMPLES_ELAPSED_TIMEOUT_ |
| Limit for running all the examples for one sub-architecture. |
| |
| @item _R_CHECK_ONE_TEST_ELAPSED_TIMEOUT_ |
| Limit for running one test for one sub-architecture. Default |
| @env{_R_CHECK_TESTS_ELAPSED_TIMEOUT_}. |
| |
| @item _R_CHECK_TESTS_ELAPSED_TIMEOUT_ |
| Limit for running all the tests for one sub-architecture (and the |
| default limit for running one test). |
| |
| @item _R_CHECK_ONE_VIGNETTE_ELAPSED_TIMEOUT_ |
| Limit for running the @R{} code in one vignette, including for |
| re-building each vignette separately. |
| |
| @item _R_CHECK_BUILD_VIGNETTES_ELAPSED_TIMEOUT_ |
| Limit for re-building all vignettes. |
| |
| @item _R_CHECK_PKGMAN_ELAPSED_TIMEOUT_ |
| Limit for each attempt at building the PDF package manual. |
| @end vtable |
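| |
| To illustrate the accepted formats, a set of timeouts could look |
| something like the following (the limits themselves are arbitrary |
| values chosen for the example): |
| @example |
| _R_CHECK_ELAPSED_TIMEOUT_=30m |
| _R_CHECK_TESTS_ELAPSED_TIMEOUT_=1h |
| _R_CHECK_ONE_TEST_ELAPSED_TIMEOUT_=600 |
| @end example |
| |
| @noindent |
| This would give a general limit of 30 minutes, allow one hour for all |
| the tests of a sub-architecture, and limit each individual test to 600 |
| seconds (a value without a suffix being taken as a number of seconds). |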
| |
| Another variable which enables stricter checks is to set |
| @env{R_CHECK_CONSTANTS} to @code{5}. This checks that |
| nothing@footnote{The usual culprits are calls to compiled code |
| @emph{via} @code{.Call} or @code{.External} which alter their |
| arguments.} changes the values of `constants'@footnote{things which the |
| byte compiler assumes do not change, e.g.@: function bodies.} in @R{} |
| code. This is best used in conjunction with setting |
| @env{R_JIT_STRATEGY} to @code{3}, which checks code on first use (by |
| default most code is only checked after byte-compilation on second use). |
| Unfortunately these checks slow down checking of examples, tests and |
| vignettes, typically two-fold but in the worst cases at least a |
| hundred-fold. |
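| |
| With a Bourne-type shell, one way to apply both settings to a single |
| check run might be as follows, where @file{mypkg_1.0.tar.gz} is a |
| hypothetical package tarball: |
| @example |
| R_CHECK_CONSTANTS=5 R_JIT_STRATEGY=3 R CMD check mypkg_1.0.tar.gz |
| @end example |
| |
| @noindent |
| Setting the variables on the command line in this way confines the |
| slowdown to that one run. |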
| |
| The following environment variables can be used to customize the |
| operation of @command{INSTALL}. |
| |
| @vtable @code |
| @item _R_INSTALL_LIBS_ONLY_FORCE_DEPENDS_IMPORTS_ |
| If true, give an error if installing only package libraries via |
| @option{--libs-only} and some package imported or depended on is not |
| available. |
| Default: true (false only for special applications, which analyze native |
| code of packages). |
| @end vtable |
| |
| @node R coding standards, Testing R code, Tools, Top |
| @chapter R coding standards |
| |
| @cindex coding standards |
| @R{} is meant to run on a wide variety of platforms, including Linux and |
| most variants of Unix as well as Windows and macOS. |
| Therefore, when extending @R{} by either adding to the @R{} base |
| distribution or by providing an add-on package, one should not rely on |
| features specific to only a few supported platforms, if this can be |
| avoided. In particular, although most @R{} developers use @acronym{GNU} |
| tools, they should not employ the @acronym{GNU} extensions to standard |
| tools. Whereas some other software packages explicitly rely on e.g.@: |
| @acronym{GNU} make or the @acronym{GNU} C++ compiler, @R{} does not. |
| Nevertheless, @R{} is a @acronym{GNU} project, and the spirit of the |
| @cite{@acronym{GNU} Coding Standards} should be followed if possible. |
| |
| The following tools can ``safely be assumed'' for @R{} extensions. |
| |
| @itemize @bullet |
| @item |
| An ISO C99 C compiler. Note that extensions such as @acronym{POSIX} |
| 1003.1 must be tested for, typically using Autoconf unless you are sure |
| they are supported on all mainstream @R{} platforms (including Windows |
| and macOS). |
| |
| @item |
| A fixed-form Fortran compiler. |
| |
| @item |
| A simple @command{make}, considering the features of @command{make} in |
| 4.2 @acronym{BSD} systems as a baseline. |
| @findex make |
| |
| @acronym{GNU} or other extensions, including pattern rules using |
| @samp{%}, the automatic variable @samp{$^}, the @samp{+=} syntax to |
| append to the value of a variable, the (``safe'') inclusion of makefiles |
| with no error, conditional execution, and many more, must not be used |
| (see Chapter ``Features'' in the @cite{@acronym{GNU} Make Manual} for |
| more information). On the other hand, building @R{} in a separate |
| directory (not containing the sources) should work provided that |
| @command{make} supports the @code{VPATH} mechanism. |
| |
| Windows-specific makefiles can assume @acronym{GNU} @command{make} 3.79 |
| or later, as no other @command{make} is viable on that platform. |
| |
| @item |
| A Bourne shell and the ``traditional'' Unix programming tools, including |
| @command{grep}, @command{sed}, and @command{awk}. |
| |
| There are @acronym{POSIX} standards for these tools, but these may not |
| be fully supported. Baseline features could be determined from a book |
| such as @cite{The UNIX Programming Environment} by Brian W. Kernighan & |
| Rob Pike. Note in particular that @samp{|} in a regexp is an extended |
| regexp, and is not supported by all versions of @command{grep} or |
| @command{sed}. The Open Group Base Specifications, Issue 7, which are |
| technically identical to IEEE Std 1003.1 (POSIX), 2008, |
| are available at |
| @uref{https://pubs.opengroup.org/onlinepubs/9699919799/mindex.html}. |
| @end itemize |
| |
| Under Windows, most users will not have these tools installed, and you |
| should not require their presence for the operation of your package. |
| However, users who install your package from source will have them, as |
| they can be assumed to have followed the instructions in ``the Windows |
| toolset'' appendix of the ``R Installation and Administration'' manual |
| to obtain them. Redirection cannot be assumed to be available via |
| @command{system} as this does not use a standard shell (let alone a |
| Bourne shell). |
| |
| @noindent |
| In addition, the following tools are needed for certain tasks. |
| |
| @itemize @bullet |
| @item |
| Perl version 5 is needed for the maintainer-only script |
| @file{tools/help2man.pl}. |
| @findex Perl |
| |
| @item |
| @command{texinfo} version 5.1 or later is needed to build the HTML, PDF
| and Info files for the @R{} manuals written in the @acronym{GNU} Texinfo
| system; this in turn requires Perl.
| @findex makeinfo |
| @end itemize |
| |
| It is also important that code is written in a way that allows others to |
| understand it. This is particularly helpful for fixing problems, and |
| includes using self-descriptive variable names, commenting the code, and |
| also formatting it properly. The @R{} Core Team recommends using a
| basic indentation of 4 for @R{} and C (and most likely also Perl) code,
| and 2 for documentation in Rd format. Emacs (21 or later) users can
| implement this indentation style by putting the following in one of
| their startup files, and using customization to set the
| @code{c-default-style} to @code{"bsd"} and @code{c-basic-offset} to
| @code{4}.
| @findex emacs |
| |
| @smallexample |
| @group |
| ;;; ESS |
| (add-hook 'ess-mode-hook |
| (lambda () |
| (ess-set-style 'C++ 'quiet) |
| ;; Because |
| ;; DEF GNU BSD K&R C++ |
| ;; ess-indent-level 2 2 8 5 4 |
| ;; ess-continued-statement-offset 2 2 8 5 4 |
| ;; ess-brace-offset 0 0 -8 -5 -4 |
| ;; ess-arg-function-offset 2 4 0 0 0 |
| ;; ess-expression-offset 4 2 8 5 4 |
| ;; ess-else-offset 0 0 0 0 0 |
| ;; ess-close-brace-offset 0 0 0 0 0 |
| (add-hook 'local-write-file-hooks |
| (lambda () |
| (ess-nuke-trailing-whitespace))))) |
| (setq ess-nuke-trailing-whitespace-p 'ask) |
| ;; or even |
| ;; (setq ess-nuke-trailing-whitespace-p t) |
| @end group |
| @group |
| ;;; Perl |
| (add-hook 'perl-mode-hook |
| (lambda () (setq perl-indent-level 4))) |
| @end group |
| @end smallexample |
| |
| @noindent |
| (The `GNU' styles for Emacs' C and R modes use a basic indentation of 2, |
| which has been determined not to display the structure clearly enough |
| when using narrow fonts.) |
| |
| @node Testing R code, Use of TeX dialects, R coding standards, Top |
| @chapter Testing R code |
| |
| When you (as @R{} developer) add new functions to the @R{} base (all the
| packages distributed with @R{}), be careful to check that @kbd{make
| test-Specific} or, in particular, @kbd{cd tests; make no-segfault.Rout}
| still works (without interactive user intervention, and on a standalone |
| computer). If the new function, for example, accesses the Internet, or |
| requires @acronym{GUI} interaction, please add its name to the ``stop |
| list'' in @file{tests/no-segfault.Rin}. |
| |
| [To be revised: use @command{make check-devel}, check the write barrier |
| if you change internal structures.] |
| |
| @node Use of TeX dialects, Current and future directions, Testing R code, Top |
| @chapter Use of TeX dialects |
| |
| Various dialects of TeX are used for different purposes in @R{}. The |
| policy is that manuals be written in @samp{texinfo}, and for convenience |
| the main and Windows FAQs are also. This has the advantage that it is
| easy to produce @HTML{} and plain text versions as well as typeset manuals. |
| |
| @LaTeX{} is not used directly, but rather as an intermediate format for |
| typeset help documents and for vignettes. |
| |
| Care needs to be taken about the assumptions made about the @R{} user's |
| system: it may not have either @samp{texinfo} or a TeX system |
| installed. We have attempted to abstract out the cross-platform |
| differences, and almost all the setting of typeset documents is done by |
| @code{tools::texi2dvi}. This is used for offline printing of help |
| documents, preparing vignettes and for package manuals via @command{R |
| CMD Rd2pdf}. It is not currently used for the @R{} manuals created in |
| directory @file{doc/manual}. |
| |
| @code{tools::texi2dvi} makes use of a system command @command{texi2dvi} |
| where available. On a Unix-alike this is usually part of |
| @samp{texinfo}, whereas on Windows if it exists at all it would be an |
| executable, part of MiKTeX. If none is available, the @R{} code runs |
| a sequence of @command{(pdf)latex}, @command{bibtex} and |
| @command{makeindex} commands. |
| |
| This process has been rather vulnerable to the versions of the external |
| software used: particular issues have been @command{texi2dvi} and |
| @file{texinfo.tex} updates, mismatches between the two@footnote{Linux |
| distributions tend to unbundle @file{texinfo.tex} from @samp{texinfo}.}, |
| versions of the @LaTeX{} package @samp{hyperref} and quirks in index |
| production. The licenses used for @LaTeX{} and latterly @samp{texinfo} |
| prohibit us from including `known good' versions in the @R{} |
| distribution. |
| |
| On a Unix-alike @command{configure} looks for the executables for TeX and |
| friends and if found records the absolute paths in the system |
| @file{Renviron} file. This used to record @samp{false} if no command |
| was found, but it nowadays records the name for looking up on the path |
| at run time. The latter can be important for binary distributions: one |
| does not want to be tied to, for example, TeX Live 2007. |
| |
| |
| @node Current and future directions, Function and variable index, Use of TeX dialects, Top |
| @chapter Current and future directions |
| |
| This chapter is for notes about possible in-progress and future changes |
| to @R{}: there is no commitment to release such changes, let alone to a |
| timescale. |
| |
| @menu |
| * Long vectors:: |
| * 64-bit types:: |
| * Large matrices:: |
| @end menu |
| |
| @node Long vectors, 64-bit types, Current and future directions, Current and future directions |
| @section Long vectors |
| |
| Vectors in @R{} 2.x.y were limited to a length of 2^31 - 1 elements |
| (about 2 billion), as the length is stored in the @code{SEXPREC} as a C |
| @code{int}, and that type is used extensively to record lengths and |
| element numbers, including in packages. |
| |
| Note that longer vectors are effectively impossible under 32-bit |
| platforms because of their address limit, so this section applies only |
| on 64-bit platforms. The internals are unchanged on a 32-bit build of |
| @R{}. |
| |
| A single object with 2^31 or more elements will take up at least 8GB of |
| memory if integer or logical and 16GB if numeric or character, so |
| routine use of such objects is still some way off. |
| |
| There is now some support for long vectors. This applies to raw, |
| logical, integer, numeric and character vectors, and lists and |
| expression vectors. (Elements of character vectors (@code{CHARSXP}s) |
| remain limited to 2^31 - 1 bytes.) Some considerations: |
| |
| |
| @itemize |
| |
| @item |
| This has been implemented by recording the length (and true length) as |
| @code{-1} and recording the actual length as a 64-bit field at the |
| beginning of the header. Because a fair amount of code in @R{} uses a |
| signed type for the length, the `long length' is recorded using the |
| signed C99 type @code{ptrdiff_t}, which is typedef-ed to |
| @code{R_xlen_t}. |
| |
| @item |
| These can in theory have 63-bit lengths, but note that current 64-bit |
| OSes do not even theoretically offer 64-bit address spaces and there is |
| currently a 52-bit limit (which exceeds the theoretical limit of current |
| OSes and ensures that such lengths can be stored exactly in doubles). |
| |
| @item |
| The serialization format has been changed to accommodate longer lengths, |
| but vectors of lengths up to 2^31-1 are stored in the same way as |
| before. Longer vectors have their length field set to @code{-1} and |
| followed by two 32-bit fields giving the upper and lower 32-bits of the |
| actual length. There is currently a sanity check which limits lengths |
| to 2^48 on unserialization. |
| |
| @item |
| The type @code{R_xlen_t} is made available to packages in C header |
| @file{Rinternals.h}: this should be fine in C code since C99 is |
| required. People do try to use @R{} internals in C++, but C++98 |
| compilers are not required to support these types. |
| |
| @item |
| Indexing can be done via the use of doubles. The internal indexing code |
| used to work with positive integer indices (and negative, logical and |
| matrix indices were all converted to positive integers): it now works |
| with either @code{INTSXP} or @code{REALSXP} indices. |
| |
| @item |
| The @R{} function @code{length} returns a double value if the length |
| exceeds 2^31-1. Code calling @code{as.integer(length(x))} before passing |
| to @code{.C}/@code{.Fortran} should check for an @code{NA} result.
| |
| @end itemize |
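| 
| The serialized length layout described in the list above can be sketched
| in C as follows. This is only an illustration under stated assumptions:
| the names @code{length_fields}, @code{encode_length} and
| @code{decode_length} are hypothetical and are not taken from the @R{}
| sources, and the sanity check applied on unserialization is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration only: these names do not come from R's sources. */
typedef struct {
    int32_t  first;  /* ordinary length field, or -1 to mark a long vector */
    uint32_t upper;  /* upper 32 bits of the length, used only if first == -1 */
    uint32_t lower;  /* lower 32 bits of the length, used only if first == -1 */
} length_fields;

/* Lengths up to 2^31 - 1 use the single 32-bit field, as before;
   longer lengths are flagged with -1 and split into two 32-bit halves. */
length_fields encode_length(int64_t len)
{
    length_fields f = { 0, 0, 0 };
    assert(len >= 0);
    if (len <= INT32_MAX) {
        f.first = (int32_t) len;
    } else {
        f.first = -1;
        f.upper = (uint32_t) ((uint64_t) len >> 32);
        f.lower = (uint32_t) ((uint64_t) len & 0xFFFFFFFFu);
    }
    return f;
}

/* Reassemble the length from the fields written by encode_length(). */
int64_t decode_length(length_fields f)
{
    if (f.first != -1)
        return f.first;
    return (int64_t) (((uint64_t) f.upper << 32) | (uint64_t) f.lower);
}
```

| With these definitions, a length of 2^31 encodes as the marker
| @code{-1} followed by an upper half of @code{0} and a lower half of
| @code{0x80000000}, and decodes back to the same value.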
| |
| @node 64-bit types, Large matrices, Long vectors, Current and future directions |
| @section 64-bit types |
| |
| There is also some desire to be able to store larger integers in @R{}, |
| although the possibility of storing these as @code{double} is often |
| overlooked (and e.g.@: file pointers as returned by @code{seek} are |
| already stored as @code{double}). |
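| 
| How far a @code{double} can stand in for a large integer can be checked
| with a small round-trip test. The sketch below assumes IEEE 754 doubles
| (53-bit significand); the function name @code{exact_in_double} is
| hypothetical and is not part of @R{}.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the 64-bit integer n is represented
   exactly when stored as a double, and 0 otherwise. */
int exact_in_double(int64_t n)
{
    double d = (double) n;

    /* Values near INT64_MAX round up to 2^63, which cannot be converted
       back to int64_t, so treat that case as inexact without converting. */
    if (d >= 9223372036854775808.0)
        return 0;

    return (int64_t) d == n;
}
```

| Under that assumption, @code{exact_in_double} returns 1 for 2^31, 2^52
| and 2^53, but 0 for 2^53 + 1, the first positive integer that a
| @code{double} cannot hold exactly.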
| |
| Different routes have been proposed: |
| |
| @itemize |
| |
| @item |
| Add a new type to @R{} and use that for lengths and indices---most likely |
| this would be a 64-bit signed type, say @code{longint}. @R{}'s usual |
| implicit coercion rules would ensure that supplying an @code{integer} |
| vector for indexing or @code{length<-} would work. |
| |
| @item |
| A more radical alternative is to change the existing @code{integer} type |
| to be 64-bit on 64-bit platforms (which was the approach taken by S-PLUS |
| for DEC/Compaq Alpha systems). Or even on all platforms. |
| |
| @item |
| Allow either @code{integer} or @code{double} values for lengths and |
| indices, and return @code{double} only when necessary. |
| |
| @end itemize |
| |
| The third has the advantages of minimal disruption to existing code and |
| not increasing memory requirements. In the first and third scenarios |
| both @R{}'s own code and user code would have to be adapted for lengths |
| that were not of type @code{integer}, and in the third code branches for |
| long vectors would be tested rarely. |
| |
| Most users of the @code{.C} and @code{.Fortran} interfaces use |
| @code{as.integer} for lengths and element numbers, but a few omit these |
| in the knowledge that these were of type @code{integer}. It may be |
| reasonable to assume that these are never intended to be used with long |
| vectors. |
| |
| The remaining interfaces will need to cope with the changed |
| @code{VECTOR_SEXPREC} types. It seems likely that in most cases lengths |
| are accessed by the @code{length} and @code{LENGTH} |
| functions@footnote{but @code{LENGTH} is a macro under some internal
| uses.}. The current approach is to keep these returning 32-bit lengths and
| introduce `long' versions @code{xlength} and @code{XLENGTH} which return |
| @code{R_xlen_t} values. |
| |
| |
| See also @uref{https://homepage.cs.uiowa.edu/~luke/talks/useR10.pdf}. |
| |
| @node Large matrices, , 64-bit types, Current and future directions |
| @section Large matrices |
| |
| Matrices are stored as vectors and so were also limited to 2^31-1 |
| elements. Now that longer vectors are allowed on 64-bit platforms, matrices
| with more elements are supported provided that each of the dimensions is |
| no more than 2^31-1. However, not all applications can be supported. |
| |
| The main problem is linear algebra done by Fortran code compiled with |
| 32-bit @code{INTEGER}. Although not guaranteed, it seems that all the |
| compilers currently used with @R{} on a 64-bit platform allow matrices |
| each of whose dimensions is less than 2^31 but with 2^31 or more |
| elements and index them correctly, and a substantial part of the |
| support software (such as @acronym{BLAS} and @acronym{LAPACK}) also |
| works.
| |
| There are exceptions: for example some complex @acronym{LAPACK} |
| auxiliary routines do use a single @code{INTEGER} index and hence |
| overflow silently and segfault or give incorrect results. One example |
| seen was @code{svd()} on a complex matrix. |
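| 
| The arithmetic behind this failure mode can be sketched in C. The
| example is illustrative only: the function names @code{linear_index} and
| @code{fits_int32} are hypothetical, the offsets are zero-based, and the
| code is not taken from @acronym{LAPACK} or from @R{}.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-based column-major offset of element (i, j) in a matrix with
   nrow rows, computed in 64-bit arithmetic so it cannot overflow. */
int64_t linear_index(int32_t i, int32_t j, int32_t nrow)
{
    assert(i >= 0 && j >= 0 && nrow > 0 && i < nrow);
    return (int64_t) i + (int64_t) j * (int64_t) nrow;
}

/* Returns 1 if the offset could be held in a single 32-bit signed index,
   and 0 if code using such an index would overflow. */
int fits_int32(int64_t offset)
{
    return offset >= 0 && offset <= INT32_MAX;
}
```

| For a 50000 by 50000 matrix each dimension is far below 2^31, yet the
| last element has offset 2499999999, which @code{fits_int32} rejects:
| any routine that forms this offset in a 32-bit @code{INTEGER} is at risk.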
| |
| Since this is implementation-dependent, it is possible that optimized |
| @acronym{BLAS} and @acronym{LAPACK} may have further restrictions: |
| a segfault has been reported from @code{svd()} using ATLAS on
| @cputype{x86_64} Linux. |
| @c https://stat.ethz.ch/pipermail/r-devel/2021-August/081004.html |
| |
| For matrix algebra on large matrices one almost certainly wants a |
| machine with a lot of RAM (100s of gigabytes), many cores and a |
| multi-threaded @acronym{BLAS}. |
| |
| |
| |
| @node Function and variable index, Concept index, Current and future directions, Top |
| @unnumbered Function and variable index |
| |
| @printindex vr |
| |
| @node Concept index, , Function and variable index, Top |
| @unnumbered Concept index |
| |
| @printindex cp |
| |
| @bye |
| |
| @c Local Variables: *** |
| @c mode: TeXinfo *** |
| @c End: *** |