title: "Immediate Binding Values"
author: Luke Tierney
output: html_document

Background

For scalar numerical code it can help to allow variable bindings to hold scalar integer, logical, and double values as immediate values rather than as allocated scalar vectors, or boxed values. This eliminates the overhead of checking whether they might be shared or have attributes. It also makes inlining scalar computations for basic arithmetic operations and element access in the byte code engine more effective. The combined benefit can be as high as 20% for some examples, including the convolution example from the extensions manual. Having immediate bindings also allows some brittle optimizations for updating scalar variable bindings and loop indices to be removed.

(This note reflects changes committed to R_devel in r77327.)

Interface

Binding cells have a marker that is returned by BNDCELL_TAG. The marker, or tag, is zero for standard bindings, and one of REALSXP, INTSXP, or LGLSXP for immediate bindings.

BINDING_VALUE, used only in eval.c and envir.c, always returns an allocated object as the value of a binding. For immediate bindings it first converts to a standard binding by allocating and installing a scalar vector of the appropriate type. This allows most code to be unaware of the existence of typed bindings. The allocation is done by R_expand_binding_value.
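The box-on-demand behavior described above can be illustrated with a small standalone model. This is only a sketch: `cell_t`, `boxed_t`, `expand_binding_value`, and `binding_value` are hypothetical stand-ins, not R's actual definitions, and the tag constants are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tags standing in for LGLSXP, INTSXP, REALSXP (illustrative). */
enum { TAG_NONE = 0, TAG_LGL = 10, TAG_INT = 13, TAG_REAL = 14 };

/* A boxed scalar: a heap-allocated value carrying its own type code. */
typedef struct {
    int type;
    union { int i; double d; } v;
} boxed_t;

/* A binding cell: tag 0 means val.boxed is valid; otherwise the value
   is stored directly in the union member selected by the tag. */
typedef struct {
    int tag;
    union { boxed_t *boxed; int ival; int lval; double dval; } val;
} cell_t;

/* Convert an immediate binding into a standard one, in place. */
static void expand_binding_value(cell_t *cell)
{
    boxed_t *b = malloc(sizeof *b);
    if (b == NULL) abort();
    b->type = cell->tag;
    switch (cell->tag) {
    case TAG_INT:  b->v.i = cell->val.ival; break;
    case TAG_LGL:  b->v.i = cell->val.lval; break;
    case TAG_REAL: b->v.d = cell->val.dval; break;
    default: abort();              /* not an immediate binding */
    }
    cell->val.boxed = b;           /* install the boxed value */
    cell->tag = TAG_NONE;          /* the cell is now a standard binding */
}

/* Always returns a boxed value, expanding the binding first if needed. */
static boxed_t *binding_value(cell_t *cell)
{
    if (cell->tag != TAG_NONE)
        expand_binding_value(cell);
    return cell->val.boxed;
}
```

In this model a caller that only ever uses `binding_value` never sees a tagged cell, which mirrors how most code can remain unaware of typed bindings.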

Code that wants to take advantage of typed bindings can read and set their values with

  • INTSXP: BNDCELL_IVAL(cell), SET_BNDCELL_IVAL(cell, val)
  • LGLSXP: BNDCELL_LVAL(cell), SET_BNDCELL_LVAL(cell, val)
  • REALSXP: BNDCELL_DVAL(cell), SET_BNDCELL_DVAL(cell, val)

These do not check or set the type tag. To create and initialize a new immediate binding in a cell use

  • INTSXP: NEW_BNDCELL_IVAL(cell, val)
  • LGLSXP: NEW_BNDCELL_LVAL(cell, val)
  • REALSXP: NEW_BNDCELL_DVAL(cell, val)
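Because the plain accessors neither check nor set the tag, code using them has to test the tag itself. The following standalone sketch shows that division of labor for an integer binding. The names `cell_t`, `CELL_IVAL`, `SET_CELL_IVAL`, `new_cell_ival`, and `try_add_int` are hypothetical and only mimic the shape of the macros listed above.

```c
#include <assert.h>

/* Hypothetical tags standing in for INTSXP and REALSXP (illustrative). */
enum { TAG_NONE = 0, TAG_INT = 13, TAG_REAL = 14 };

typedef struct {
    int tag;
    union { void *boxed; int ival; double dval; } val;
} cell_t;

/* Unchecked accessors in the spirit of BNDCELL_IVAL and SET_BNDCELL_IVAL:
   they neither inspect nor modify the tag. */
#define CELL_IVAL(c)        ((c)->val.ival)
#define SET_CELL_IVAL(c, x) ((c)->val.ival = (x))

/* Initializer in the spirit of NEW_BNDCELL_IVAL: sets both tag and value. */
static void new_cell_ival(cell_t *c, int x)
{
    c->tag = TAG_INT;
    c->val.ival = x;
}

/* Add `by` to an integer binding. Returns 1 if the fast path applied and
   0 if the caller must fall back to the general boxed code.
   Overflow handling is omitted in this sketch. */
static int try_add_int(cell_t *c, int by)
{
    if (c->tag != TAG_INT)
        return 0;                       /* the caller's tag check */
    SET_CELL_IVAL(c, CELL_IVAL(c) + by);
    return 1;
}
```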

The generic CAR accessor has been modified to signal an error if it encounters a cell with an immediate CAR value. This ensures immediate values are only used in the context of bindings. This makes it easier to avoid inadvertent boxing and may help with a transition to a different environment and binding representation.

The setters, such as SETCAR, clear an immediate binding marker without signaling an error.
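The asymmetry between the generic getter and the setters can be modeled as follows. This is a sketch under stated assumptions: `cell_car` and `cell_setcar` are hypothetical names, and a counter plus a NULL return stands in for R's error signaling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag standing in for INTSXP (illustrative). */
enum { TAG_NONE = 0, TAG_INT = 13 };

typedef struct {
    int tag;
    union { void *boxed; int ival; } val;
} cell_t;

/* Counts rejected reads; a stand-in for signaling an error. */
static int bad_car_count = 0;

/* Generic getter: refuses to read a cell holding an immediate value. */
static void *cell_car(cell_t *c)
{
    if (c->tag != TAG_NONE) {
        bad_car_count++;
        return NULL;
    }
    return c->val.boxed;
}

/* Generic setter: clears any immediate marker, then stores the pointer. */
static void cell_setcar(cell_t *c, void *x)
{
    c->tag = TAG_NONE;
    c->val.boxed = x;
}
```

With this arrangement, a read through the generic accessor cannot silently interpret an immediate value as a pointer, while a write always leaves the cell in the standard state.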

Notes

  • For now, the sxpinfo.extra field is used to hold the binding tag.

  • Two implementations are provided for representing the immediate values. One replaces the SEXP CAR field by a union; the other allocates a boxed value. The union representation is conceptually more natural and a little more efficient. But it would require a change in memory layout on 32-bit platforms, since the union requires 8 bytes for the double value while a pointer only requires 4 bytes. On 64-bit hardware the union approach should not change the memory layout.

    For now, the union approach is used on 64-bit platforms and the boxed approach on 32-bit ones. It would be best to use the union approach unconditionally, but this would require changing the binary version and rebuilding all packages with compiled code. This should probably be done before release.

  • The approach taken for now is to just allow immediate values in the CAR of binding cells. An alternative would be to allow immediate values in all CONS cells, or even more widely, such as in vector elements. Allowing immediate values in all CONS cells would have been a little simpler. But it would have made it harder to detect unintended boxing, and might also have made it harder to transition to an alternate environment or binding representation should we wish to do that.

    If immediate values were to be supported more widely it would probably be necessary to suspend the GC when boxing values in R_expand_binding_value.

  • Serialization handles environment frames with standard pairlist code, so the code now checks for an immediate binding and boxes the value if necessary. An alternative would be to update the serialization format to support immediate bindings. But given how challenging it is to change the format it seemed best just to box.

  • Only unlocked standard environment bindings that can be cached can be turned into immediate bindings. Symbol bindings for the base environment are not cached, and bindings for user databases are locked when returned by findVarLoc or findVarLocInFrame, so neither of these can become immediate bindings.

  • BINDING_VALUE is defined slightly differently in eval.c and envir.c. It would be good to unify these eventually.
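The size argument in the note on the two representations can be checked directly on a given platform. The sketch below uses a hypothetical `car_union_t` that is not R's actual declaration; it only shows why a union containing a double matches the pointer size on typical 64-bit platforms but not on typical 32-bit ones.

```c
#include <assert.h>

/* Hypothetical union in place of a pointer-valued CAR field. */
typedef union {
    void  *sxpval;   /* standard binding: pointer to the value */
    double dval;     /* immediate double */
    int    ival;     /* immediate integer or logical */
} car_union_t;

/* Returns 1 if replacing a pointer field by the union keeps the field
   size unchanged, and 0 if the field would grow. */
static int union_keeps_pointer_size(void)
{
    /* A union is at least as large as its largest member. */
    assert(sizeof(car_union_t) >= sizeof(double));
    return sizeof(car_union_t) == sizeof(void *);
}
```

Where pointers are 4 bytes and doubles are 8, the function returns 0, which corresponds to the layout change that motivates the boxed fallback on 32-bit platforms.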