 % -*- mode: ess-noweb; ess-noweb-code-mode: R-mode -*-

 \documentclass[11pt]{article}
 \usepackage{hyperref}
 \usepackage[headings]{fullpage}
 \usepackage{verbatim}
 \usepackage{noweb}

 % This is a minor modification to the new verbatim environment that uses
 % the same size as noweb's code size and indents the same amount as
 % noweb's code chunks. I redefine verbatim instead of defining my own
 % environment since the html converter only seems to understand
 % verbatim, not new definitions.
 \makeatletter
 \addto@hook{\every@verbatim}{\nowebsize\setlength{\leftmargin}{50mm}}
 \def\verbatim@processline{\hspace{\codemargin}\the\verbatim@line\par}
 \makeatother

 % The following try to prevent wasteful page breaks
 \def\nwendcode{\endtrivlist \endgroup}
 \let\nwdocspar=\par

 \pagestyle{noweb}
 \bibliographystyle{plain}

 \noweboptions{noidentxref,longchunks,smallcode}

 \title{A Byte Code Compiler for R}
 \author{Luke Tierney\\
   Department of Statistics and Actuarial Science\\
   University of Iowa}

 \begin{document}
 \maketitle
 This document presents the current implementation of the byte code
 compiler for R.  The compiler produces code for a virtual machine that
 is then executed by a virtual machine runtime system.  The virtual
 machine is a stack based machine.  Thus instructions for the virtual
 machine take arguments off a stack and may leave one or more results
 on the stack.  Byte code objects consist of an integer vector
 representing instruction opcodes and operands, and a generic vector
 representing a constant pool. The compiler is implemented almost
 entirely in R, with just a few support routines in C to manage
 compiled code objects.
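
 As a rough illustration of the stack discipline described above, the
 following toy interpreter runs an integer code vector against a
 constant pool.  It is only a sketch: the opcode names and values
 ([[TOY.LDCONST]], [[TOY.ADD]], [[TOY.RETURN]]) and the function
 [[runToy]] are made up for this example and are not those of the R
 virtual machine.
 \begin{verbatim}
 ## Toy opcodes; hypothetical values chosen for this sketch only.
 TOY.LDCONST <- 1L   # one operand: index into the constant pool
 TOY.ADD     <- 2L   # pops two values, pushes their sum
 TOY.RETURN  <- 3L   # returns the value on top of the stack

 runToy <- function(code, consts) {
     stack <- numeric(0)
     pc <- 1L
     repeat {
         op <- code[pc]
         if (op == TOY.LDCONST) {
             idx <- code[pc + 1L]
             ## .subset2(x, i) extracts element i of a list
             stack <- c(stack, .subset2(consts, idx))
             pc <- pc + 2L
         }
         else if (op == TOY.ADD) {
             n <- length(stack)
             stack <- c(stack[seq_len(n - 2L)], stack[n - 1L] + stack[n])
             pc <- pc + 1L
         }
         else if (op == TOY.RETURN)
             return(stack[length(stack)])
         else
             stop("unknown toy opcode")
     }
 }

 ## Toy encoding of 1 + 3: push 1, push 3, add, return.
 code <- c(TOY.LDCONST, 1L, TOY.LDCONST, 2L, TOY.ADD, TOY.RETURN)
 consts <- list(1, 3)
 runToy(code, consts)
 \end{verbatim}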

 The virtual machine instruction set is designed to allow much of the
 interpreter internals to be re-used.  In particular, for now the
 mechanism for calling functions of all types from compiled code
 remains the same as the function calling mechanism for interpreted
 code.  There are opportunities for efficiency improvements through
 using a different mechanism for calls from compiled functions to
 compiled functions, or changing the mechanism for both interpreted and
 compiled code; this will be explored in future work.

 The style used by the compiler for building up code objects is
 imperative: A code buffer object is created that contains buffers for
 the instruction stream and the constant pool.  Instructions and
 constants are written to the buffer as the compiler processes an
 expression tree, and at the end a code object is constructed. A more
 functional design in which each compiler step returns a modified code
 object might be more elegant in principle, but it would be more
 difficult to make efficient.

 A multi-pass compiler in which a first pass produces an intermediate
 representation, subsequent passes optimize the intermediate
 representation, and a final pass produces actual code would also be
 useful and might be able to produce better code.  A future version of
 the compiler may use this approach.  But for now to keep things simple
 a single pass is used.

 %% **** Some peephole optimization is probably possible, and at least
 %% **** some constant folding could be done on the bytecode, but more
 %% **** sophisticated optimizations like inlining of R code would require
 %% **** a more suitable intermediate representation.

 %% **** I _think_ conversion from stack-based byte code to register-based
 %% **** code is reasonably straightforward but I haven't thought it
 %% **** through thoroughly yet.


 \section{The compiler interface}
 The compiler can be used either explicitly by calling certain
 functions to carry out compilations, or implicitly by enabling
 compilation to occur automatically at certain points.

 \subsection{Explicit compilation}
 The primary functions for explicit compilation are [[compile]],
 [[cmpfun]], and [[cmpfile]].

 The [[compile]] function compiles an expression and returns a byte code
 object, which can then be passed to [[eval]].  A simple example is
 \begin{verbatim}
 > library(compiler)
 > compile(quote(1+3))
 <bytecode: 0x25ba070>
 > eval(compile(quote(1+3)))
 [1] 4
 \end{verbatim}

 A closure can be compiled using [[cmpfun]].  If the function [[f]] is
 defined as
 \begin{verbatim}
 f <- function(x) {
     s <- 0.0
     for (y in x)
         s <- s + y
     s
 }
 \end{verbatim}
 then a compiled version is produced by
 \begin{verbatim}
 fc <- cmpfun(f)
 \end{verbatim}
 We can then compare the performance of the interpreted and compiled
 versions:
 \begin{verbatim}
 > x <- as.double(1 : 10000000)
 > system.time(f(x))
    user  system elapsed
   6.470   0.010   6.483
 > system.time(fc(x))
    user  system elapsed
   1.870   0.000   1.865
 \end{verbatim}
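
 Timings such as these depend on the machine and the R version.  In
 recent R versions JIT compilation is enabled by default, so the
 interpreted function may itself be compiled after a few calls.  The
 following is a minimal sketch of a comparison that assumes JIT should
 be switched off for its duration (see the subsection on implicit
 compilation for [[enableJIT]]); the vector length is arbitrary.
 \begin{verbatim}
 library(compiler)

 f <- function(x) {
     s <- 0.0
     for (y in x)
         s <- s + y
     s
 }

 oldJIT <- enableJIT(0)              # keep f interpreted for now
 fc <- cmpfun(f)
 x <- as.double(1 : 1000000)
 stopifnot(identical(f(x), fc(x)))   # both versions agree
 system.time(f(x))
 system.time(fc(x))
 enableJIT(oldJIT)                   # restore the previous JIT level
 \end{verbatim}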

 A source file can be compiled with [[cmpfile]].  For now, the resulting
 file has to then be loaded with [[loadcmp]].  In the future it may
 make sense to allow [[source]] to either load a pre-compiled file or
 to optionally compile while sourcing.
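
 A minimal sketch of this round trip is shown below.  The file names
 and the function [[sq]] are made up for the example, and the argument
 names used for [[cmpfile]] and [[loadcmp]] are assumed to match the
 current package.
 \begin{verbatim}
 library(compiler)

 src <- tempfile(fileext = ".R")     # hypothetical source file
 out <- tempfile(fileext = ".Rc")    # hypothetical compiled file
 writeLines("sq <- function(x) x * x", src)

 cmpfile(src, out)                   # compile the source file
 e <- new.env()
 loadcmp(out, envir = e)             # evaluate the compiled code in e
 get("sq", envir = e)(3)
 \end{verbatim}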

 \subsection{Implicit compilation}
 Implicit compilation can be used to compile packages as they are
 installed or for just-in-time (JIT) compilation of functions or
 expressions.  The mechanism for enabling these is experimental and
 likely to change.

 For now, compilation of packages requires the use of lazy loading and can be
 enabled either by calling [[compilePKGS]] with argument [[TRUE]] or by
 starting R with the environment variable [[_R_COMPILE_PKGS_]] set to a
 positive integer value.  These settings are used internally during the R
 build to compile the base package (and tools, utils, methods, etc.) and by
 [[R CMD INSTALL]].
 Functions are compiled as they are written to the lazy loading database.
 Compilation of packages should only be enabled for that time, because it
 adds noticeable time and space overhead to any serialization.

 In a UNIX-like environment, for example, installing a package with
 \begin{verbatim}
 env R_COMPILE_PKGS=1 R CMD INSTALL foo.tar.gz
 \end{verbatim}
 will internally enable package compilation using [[compilePKGS]].

 If R is installed from source then the base and required packages can
 be compiled on installation using
 \begin{verbatim}
 make bytecode
 \end{verbatim}
 This does not require setting the [[_R_COMPILE_PKGS_]] environment variable.

 JIT compilation can be enabled from within R by calling [[enableJIT]]
 with a non-negative integer argument or by starting R with the
 environment variable [[R_ENABLE_JIT]] set to a non-negative integer.
 The possible values of the argument to [[enableJIT]] and their
 meanings are
 \begin{itemize}
 \item[0] turn off JIT
 \item[1] compile closures before they are called the first time
 \item[2] same as 1, plus compile closures before duplicating (useful
   for packages that store closures in lists, like lattice)
 \item[3] same as 2, plus compile all [[for()]], [[while()]], and
   [[repeat()]] loops before executing.
 \end{itemize}
 R may initially be somewhat sluggish if JIT is enabled and base and
 recommended packages have not been pre-compiled as almost everything
 will initially need some compilation.
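
 Since [[enableJIT]] returns the previous level, a setting can be
 changed temporarily and then restored.  The sketch below assumes, as
 the package documentation states, that a negative argument only
 queries the current level.
 \begin{verbatim}
 library(compiler)

 level <- enableJIT(-1)      # query only (assumed behavior)
 previous <- enableJIT(0)    # turn JIT off; previous level is returned
 stopifnot(previous == level)
 enableJIT(previous)         # restore the original level
 \end{verbatim}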


 \section{The basic compiler}
 This section presents the basic compiler for compiling R expressions
 to byte code objects.


 \subsection{The compiler top level}
 R expressions consist of function calls, variable references, and
 literal constants.  To create a byte code object representing an R
 expression the compiler has to walk the expression tree and emit code
 for the different node types it encounters. The code emitted may
 depend on the environment in which the expression will be evaluated as
 well as various compiler option settings.

 The simplest function in the top level compiler interface is the
 function [[compile]].  This function requires an expression argument
 and takes three optional arguments: an environment, a list of
 options, and a source code reference.  The default environment is the global
 environment. By default, the source reference argument is [[NULL]] and the
 source reference is taken from the [[srcref]] attribute of the expression
 argument.
 <<[[compile]] function>>=
 compile <- function(e, env = .GlobalEnv, options = NULL, srcref = NULL) {
     cenv <- makeCenv(env)
     cntxt <- make.toplevelContext(cenv, options)
     cntxt$env <- addCenvVars(cenv, findLocals(e, cntxt))
     if (mayCallBrowser(e, cntxt))
         ## NOTE: compilation will be attempted repeatedly
         e
     else if (is.null(srcref))
         genCode(e, cntxt)
     else
         genCode(e, cntxt, loc = list(expr = e, srcref = srcref))
 }
 @ %def compile
 The supplied environment is converted into a compilation environment
 data structure.  This compilation environment and any options
 provided are then used to construct a compiler context.  The function
 [[genCode]] is then used to generate a byte code object for the
 expression and the constructed compiler context.
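
 As a small usage sketch, an expression can be compiled against an
 environment other than the global one and then evaluated there.  The
 environment [[e]] and the variable [[x]] are made up for the example.
 \begin{verbatim}
 library(compiler)

 e <- new.env()
 assign("x", 41, envir = e)
 code <- compile(quote(x + 1), env = e)
 typeof(code)
 eval(code, e)
 \end{verbatim}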

 Compilation environments are described in Section
 \ref{sec:environments} and compiler contexts in Section
 \ref{sec:contexts}. The [[genCode]] function is defined as
 <<[[genCode]] function>>=
 genCode <- function(e, cntxt, gen = NULL, loc = NULL) {
     cb <- make.codeBuf(e, loc)
     if (is.null(gen))
         cmp(e, cb, cntxt, setloc = FALSE)
     else
         gen(cb, cntxt)
     codeBufCode(cb, cntxt)
 }
 @ %def genCode
 [[genCode]] creates a code buffer, fills the code buffer, and then
 calls [[codeBufCode]] to extract and return the byte code object.  In
 the most common case [[genCode]] uses the low level recursive
 compilation function [[cmp]], described in Section \ref{subsec:cmp},
 to generate the code.  For added flexibility it can be given a
 generator function that emits code into the code buffer based on the
 provided context.  This is used in Section \ref{sec:loops} for
 compilation of loop bodies in loops that require an explicit loop context
 (and a long jump in the byte-code interpreter).


 \subsection{Basic code buffer interface}
 Code buffers are used to accumulate the compiled code and related
 constant values.  A code buffer [[cb]] is a list containing a number
 of closures used to manipulate the content of the code buffer.  In
 this section two closures are used, [[putconst]] and [[putcode]].

 The closure [[cb$putconst]] is used to enter constants into the
 constant pool.  It takes a single argument, an arbitrary R object to
 be entered into the constant pool, and returns an integer index into
 the pool.  The [[cb$putcode]] closure takes an instruction opcode and
 any operands the opcode requires and emits them into the code buffer.
 The operands are typically constant pool indices or labels, to be
 introduced in Section \ref{sec:codebuf}.

 As an example, the [[GETVAR]] instruction takes one operand, the index
 in the constant pool of a symbol. The opcode for this instruction is
 [[GETVAR.OP]].  The instruction retrieves the symbol from the constant
 pool, looks up its value in the current environment, and pushes the
 value on the stack.  If [[sym]] is a variable whose value is a symbol,
 then code to enter the symbol in the constant pool and emit an
 instruction to get its value would be
 <<example of emitting a [[GETVAR]] instruction>>=
 ci <- cb$putconst(sym)
 cb$putcode(GETVAR.OP, ci)
 @ %def

 The complete code buffer implementation is given in Section
 \ref{sec:codebuf}.
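
 To make the interface concrete, here is a much simplified stand-in
 for a code buffer.  It is a sketch only: the names [[makeToyCodeBuf]]
 and [[TOY.GETVAR.OP]], the accessor closures, and the one-based
 constant indices are assumptions of this example, not the actual
 implementation.
 \begin{verbatim}
 makeToyCodeBuf <- function() {
     buf <- new.env()             # shared mutable state for the closures
     buf$code <- integer(0)
     buf$consts <- list()
     putconst <- function(x) {
         buf$consts <- c(buf$consts, list(x))
         length(buf$consts)       # index of the new constant (toy: 1-based)
     }
     putcode <- function(...) {
         buf$code <- c(buf$code, as.integer(c(...)))
         invisible(NULL)
     }
     list(putconst = putconst, putcode = putcode,
          code = function() buf$code,
          consts = function() buf$consts)
 }

 TOY.GETVAR.OP <- 1L              # made-up opcode value
 cb <- makeToyCodeBuf()
 ci <- cb$putconst(as.name("x"))
 cb$putcode(TOY.GETVAR.OP, ci)
 cb$code()
 cb$consts()
 \end{verbatim}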


 \subsection{The recursive code generator}
 \label{subsec:cmp}
 The function [[cmp]] is the basic code generation function. It
 recursively traverses the expression tree and emits code as it visits
 each node in the tree.

 Before generating code for an expression the function [[cmp]] attempts
 to determine the value of the expression by constant folding using the
 function [[constantFold]].  If constant folding is successful then
 [[constantFold]] returns a named list containing a [[value]] element.
 Otherwise it returns [[NULL]].  If constant folding is successful,
 then the result is compiled as a constant.  Otherwise, the standard
 code generation process is used.
 %% **** comment on alternative of doing constant folding as an
 %% **** optimization on the bytecode or an intermediate representation?

 In the interpreter there are four types of objects that are not
 treated as constants, i.e. as evaluating to themselves: function calls
 of type [["language"]], variable references of type [["symbol"]],
 promises, and byte code objects.  Neither promises nor byte code
 objects should appear as literals in code so an error is signaled for
 those.  The language, symbol, and constant cases are each handled by
 their own code generators.
 %% **** promises do appear in the expressions generated by the
 %% **** interpreter for evaluating complex assignment expressions
 <<generate code for expression [[e]]>>=
 if (typeof(e) == "language")
     cmpCall(e, cb, cntxt)
 else if (typeof(e) == "symbol")
     cmpSym(e, cb, cntxt, missingOK)
 else if (typeof(e) == "bytecode")
     cntxt$stop(gettext("cannot compile byte code literals in code"),
                cntxt, loc = cb$savecurloc())
 else if (typeof(e) == "promise")
     cntxt$stop(gettext("cannot compile promise literals in code"),
                cntxt, loc = cb$savecurloc())
 else
     cmpConst(e, cb, cntxt)
 @
 The function [[cmp]] is then defined as
 <<[[cmp]] function>>=
 cmp <- function(e, cb, cntxt, missingOK = FALSE, setloc = TRUE) {
     if (setloc) {
         sloc <- cb$savecurloc()
         cb$setcurexpr(e)
     }
     ce <- constantFold(e, cntxt, loc = cb$savecurloc())
     if (is.null(ce)) {
         <<generate code for expression [[e]]>>
     }
     else
         cmpConst(ce$value, cb, cntxt)
     if (setloc)
         cb$restorecurloc(sloc)
 }
 @ %def cmp
 The call code generator [[cmpCall]] will recursively call [[cmp]].
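
 The dispatch on [[typeof]] can be tried outside the compiler with a
 small tree walker.  This is an illustrative sketch; [[nodeKind]] and
 [[walk]] are hypothetical helpers, and unlike [[cmpCall]] the walker
 visits the function position of a call like any other node.
 \begin{verbatim}
 nodeKind <- function(e) {
     if (typeof(e) == "language") "call"
     else if (typeof(e) == "symbol") "variable"
     else "constant"
 }

 ## Visit a call and, recursively, each of its components.
 walk <- function(e) {
     kind <- nodeKind(e)
     if (kind == "call")
         c(kind, unlist(lapply(as.list(e), walk)))
     else
         kind
 }

 kinds <- walk(quote(f(x, 1 + y)))
 table(kinds)
 \end{verbatim}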
 %% **** should promises/byte code produce compiler errors or runtime errors??


 \subsection{Compiling constant expressions}
 The constant code generator [[cmpConst]] is the simplest of the three
 generators. A simplified generator can be defined as
 <<simplified [[cmpConst]] function>>=
 cmpConst <- function(val, cb, cntxt) {
     ci <- cb$putconst(val)
     cb$putcode(LDCONST.OP, ci)
     if (cntxt$tailcall) cb$putcode(RETURN.OP)
 }
 @ %def cmpConst
 This function enters the constant in the constant pool using the
 closure [[cb$putconst]].  The value returned by this closure is an
 index for the constant in the constant pool.  Then the code generator
 emits an instruction to load the constant at the specified constant
 pool index and push it onto the stack.  If the expression appears in
 tail position then a [[RETURN]] instruction is emitted as well.
 %% **** explain tail position here??

 Certain constant values, such as [[TRUE]], [[FALSE]], and [[NULL]]
 appear very often in code. It may be useful to provide and use special
 instructions for loading these. The resulting code will have slightly
 smaller constant pools and may be a little faster, though the
 difference is likely to be small.  A revised definition of
 [[cmpConst]] that makes use of instructions for loading these
 particular values is given by
 <<[[cmpConst]] function>>=
 cmpConst <- function(val, cb, cntxt) {
     if (identical(val, NULL))
         cb$putcode(LDNULL.OP)
     else if (identical(val, TRUE))
         cb$putcode(LDTRUE.OP)
     else if (identical(val, FALSE))
         cb$putcode(LDFALSE.OP)
     else {
         ci <- cb$putconst(val)
         cb$putcode(LDCONST.OP, ci)
     }
     if (cntxt$tailcall) cb$putcode(RETURN.OP)
 }
 @ %def cmpConst
 It might be useful to handle other constants in a similar way, such as
 [[NA]] or small integer values; this may be done in the future.
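
 The use of [[identical]] means that only the plain length-one values
 select the special instructions.  The sketch below mirrors the tests
 in [[cmpConst]] but merely returns the name of the instruction that
 would be chosen; [[loadOpFor]] is a hypothetical helper.
 \begin{verbatim}
 loadOpFor <- function(val) {
     if (identical(val, NULL)) "LDNULL"
     else if (identical(val, TRUE)) "LDTRUE"
     else if (identical(val, FALSE)) "LDFALSE"
     else "LDCONST"
 }

 loadOpFor(TRUE)
 loadOpFor(c(TRUE, TRUE))     # a longer vector goes to the constant pool
 loadOpFor(c(a = TRUE))       # so does a value with attributes
 loadOpFor(NA)
 \end{verbatim}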
 %% **** check out if small integers is worth doing.
 %% **** mention peephole optimization as alternative

 The implementation marks values in the constant pool as read-only
 after they are loaded. In the past, all values were duplicated as they
 were retrieved from the constant pool as a precaution against bad
 package code: several packages in the wild assumed that an expression
 [[TRUE]], for example, appearing in code would result in a freshly
 allocated value that could be freely modified in [[.C]] calls.


 \subsection{Compiling variable references}
 The function [[cmpSym]] handles compilation of variable
 references. For standard variables this involves entering the symbol
 in the constant pool, emitting code to look up the value of the
 variable at the specified constant pool location in the current
 environment, and, if necessary, emitting a [[RETURN]] instruction.

 In addition to standard variables there is the ellipsis variable
 [[...]]  and the accessors [[..1]], [[..2]], and so on that need to be
 considered. The ellipsis variable can only appear as an argument in
 function calls, so [[cmp]], like the interpreter [[eval]] itself,
 should not encounter it. The interpreter signals an error if it does
 encounter a [[...]] variable, and the compiler emits code that does
 the same at runtime.  The compiler also emits a warning at compile
 time.  Variables representing formal parameters may not have values
 provided in their calls, i.e. may have missing values. In some cases
 this should signal an error; in others the missing value can be passed
 on (for example in expressions of the form [[x[]]]). To support this,
 [[cmpSym]] takes an optional argument for allowing missing argument
 values.
 <<[[cmpSym]] function>>=
 cmpSym <- function(sym, cb, cntxt, missingOK = FALSE) {
     if (sym == "...") {
         notifyWrongDotsUse("...", cntxt, loc = cb$savecurloc())
         cb$putcode(DOTSERR.OP)
     }
     else if (is.ddsym(sym)) {
         <<emit code for [[..n]] variable references>>
     }
     else {
         <<emit code for standard variable references>>
     }
 }
 @ %def cmpSym

 References to [[..n]] variables are also only appropriate when a
 [[...]] variable is available, so a warning is given if that is not
 the case. The virtual machine provides instructions [[DDVAL]] and
 [[DDVAL_MISSOK]] for the case where missing arguments are not allowed
 and for the case where they are, and the appropriate instruction is
 used based on the [[missingOK]] argument to [[cmpSym]].
 <<emit code for [[..n]] variable references>>=
 if (! findLocVar("...", cntxt))
     notifyWrongDotsUse(sym, cntxt, loc = cb$savecurloc())
 ci <- cb$putconst(sym)
 if (missingOK)
     cb$putcode(DDVAL_MISSOK.OP, ci)
 else
     cb$putcode(DDVAL.OP, ci)
 if (cntxt$tailcall) cb$putcode(RETURN.OP)
 @ %def

 There are also two instructions available for obtaining the value of a
 general variable from the current environment, one that allows missing
 values and one that does not.
 <<emit code for standard variable references>>=
 if (! findVar(sym, cntxt))
     notifyUndefVar(sym, cntxt, loc = cb$savecurloc())
 ci <- cb$putconst(sym)
 if (missingOK)
     cb$putcode(GETVAR_MISSOK.OP, ci)
 else
     cb$putcode(GETVAR.OP, ci)
 if (cntxt$tailcall) cb$putcode(RETURN.OP)
 @ %def
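
 The two situations distinguished by [[missingOK]] can be seen at the
 R level.  In this sketch the function names are made up; the first
 function passes a missing argument on to a subsetting call, while the
 second evaluates it directly.
 \begin{verbatim}
 library(compiler)

 f <- function(x, i) x[i]     # a missing i is passed on
 g <- function(x, i) i        # a missing i is an error when evaluated
 fc <- cmpfun(f)
 gcmp <- cmpfun(g)

 fc(1:3)                                         # i is missing here
 tryCatch(gcmp(1:3), error = conditionMessage)   # reports the error
 \end{verbatim}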

 For now, these instructions only take an index in the constant pool
 for the symbol as operands, not any information about where the
 variable can be found within the environment.  This approach to
 obtaining the value of variables requires a search of the current
 environment for every variable reference.  In a less dynamic language
 it would be possible to compute locations of variable bindings within
 an environment at compile time and to choose environment
 representations that allow constant time access to any variable's
 value.  Since bindings in R can be added or removed at runtime this
 would require a semantic change that would need some form of
 declaration to make legitimate. Another approach that may be worth
 exploring is some sort of caching mechanism in which the location of
 each variable is stored when it is first found by a full search, and
 that cached location is used until an event occurs that forces
 flushing of the cache. If such events are rare, as they typically are,
 then this may be effective.
 %% **** need to look into caching strategies

 %% **** looks like a simple cache of the local frame speeds up sum and
 %% **** Neal's em by about 10% (just lookup, not assignment -- with
 %% **** assignment should be a bit better)

 %% **** Is it really necessary for bcEval to save/restore stack tops?
 %% **** Shouldn't that happen automatically?
 %% **** Is it possible to have closure calling stay in the same bc?
 %% **** maybe at least for promises?


 \subsection{Compiling function calls}
 Conceptually, the R function calling mechanism uses lazy evaluation of
 arguments.  Thus calling a function involves three steps:
 \begin{itemize}
 \item finding the function to call
 \item packaging up the argument expressions into deferred evaluation
   objects, or promises
 \item executing the call
 \end{itemize}
 Code for this process is generated by the function [[cmpCall]]. A
 simplified version is defined as
 <<simplified [[cmpCall]] function>>=
 cmpCall <- function(call, cb, cntxt) {
     cntxt <- make.callContext(cntxt, call)
     fun <- call[[1]]
     args <- call[-1]
     if (typeof(fun) == "symbol")
         cmpCallSymFun(fun, args, call, cb, cntxt)
     else
         cmpCallExprFun(fun, args, call, cb, cntxt)
 }
 @ %def cmpCall
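
 The effect of packaging arguments as promises can be observed
 directly.  In the following sketch, with made-up function names, an
 argument that is never used is never evaluated, and an argument that
 is used is evaluated only when the body first needs it.
 \begin{verbatim}
 first <- function(a, b) a
 first(1, stop("b is never forced"))

 loud <- function(a) { cat("body starts\n"); a }
 loud({ cat("argument evaluated\n"); 2 })
 \end{verbatim}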

 Call expressions in which the function is represented by a symbol are
 compiled by [[cmpCallSymFun]].  This function emits a [[GETFUN]]
 instruction and then compiles the arguments.
 <<[[cmpCallSymFun]] function>>=
 maybeNSESymbols <- c("bquote")
 cmpCallSymFun <- function(fun, args, call, cb, cntxt) {
     ci <- cb$putconst(fun)
     cb$putcode(GETFUN.OP, ci)
     nse <- as.character(fun) %in% maybeNSESymbols
     <<compile arguments and emit [[CALL]] instruction>>
 }
 @ %def cmpCallSymFun
 The [[GETFUN]] instruction takes a constant pool index of the symbol
 as an operand, looks for a function binding to the symbol in the
 current environment, places it on the stack, and prepares the stack
 for handling function call arguments.
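
 Looking for a function binding differs from ordinary variable lookup
 in that bindings to non-function values are skipped.  A small example
 of this R-level behavior:
 \begin{verbatim}
 local({
     sum <- 10          # a non-function binding named sum
     sum(1, 2, sum)     # call position finds base::sum; evaluates to 13
 })
 \end{verbatim}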
 %% **** need a fun cache and a var cache???

 Argument compilation is carried out by the function [[cmpCallArgs]],
 presented in Section \ref{subsec:callargs}, and is followed by emitting code
 to execute the call and, if necessary, return a result.  Calls to functions
 listed in [[maybeNSESymbols]] have their arguments left uncompiled.  Currently
 this applies only to [[bquote]], which does not evaluate its argument
 [[expr]] normally, but modifies the expression first (non-standard
 evaluation).  Compiling such an argument could result in warnings, because the
 argument may not be a valid R expression (e.g.\ when it contains [[.()]]
 subexpressions in complex assignments), and the generated code would be
 irrelevant, since it would never be used.  Not compiling an argument that will in fact be
 evaluated normally is safe, hence the code is not differentiating between
 individual function arguments nor is it checking whether [[bquote]] is the
 one from the [[base]] package.
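
 For reference, this is how [[bquote]] rewrites its argument before
 anything is evaluated; the variable names are arbitrary.
 \begin{verbatim}
 x <- 2
 bquote(y + .(x))     # .(x) is replaced by the value of x: y + 2
 \end{verbatim}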

 <<compile arguments and emit [[CALL]] instruction>>=
 cmpCallArgs(args, cb, cntxt, nse)
 ci <- cb$putconst(call)
 cb$putcode(CALL.OP, ci)
 if (cntxt$tailcall) cb$putcode(RETURN.OP)
 @ %def
 The call expression itself is stored in the constant pool and is
 available to the [[CALL]] instruction.

 Calls in which the function is represented by an expression other than
 a symbol are handled by [[cmpCallExprFun]].  This emits code to
 evaluate the expression, leaving the value on the stack, and then
 emits a [[CHECKFUN]] instruction.  This instruction checks that the
 value on top of the stack is a function and prepares the stack for
 receiving call arguments.  Generation of argument code and the
 [[CALL]] instruction are handled as for symbol function calls.
 <<[[cmpCallExprFun]] function>>=
 cmpCallExprFun <- function(fun, args, call, cb, cntxt) {
     ncntxt <- make.nonTailCallContext(cntxt)
     cmp(fun, cb, ncntxt)
     cb$putcode(CHECKFUN.OP)
     nse <- FALSE
     <<compile arguments and emit [[CALL]] instruction>>
 }
 @ %def cmpCallExprFun

 The actual definition of [[cmpCall]] is a bit more complex than the
 simplified one given above:
 <<[[cmpCall]] function>>=
 cmpCall <- function(call, cb, cntxt, inlineOK = TRUE) {
     sloc <- cb$savecurloc()
     cb$setcurexpr(call)
     cntxt <- make.callContext(cntxt, call)
     fun <- call[[1]]
     args <- call[-1]
     if (typeof(fun) == "symbol") {
         if (! (inlineOK && tryInline(call, cb, cntxt))) {
             <<check the call to a symbol function>>
             cmpCallSymFun(fun, args, call, cb, cntxt)
         }
     }
     else {
         <<hack for handling [[break()]] and [[next()]] expressions>>
         cmpCallExprFun(fun, args, call, cb, cntxt)
     }
     cb$restorecurloc(sloc)
 }
 @ %def cmpCall
 The main addition is the use of a [[tryInline]] function which tries
 to generate more efficient code for particular functions.  This function
 returns [[TRUE]] if it has handled code generation and [[FALSE]] if it
 has not.  The [[inlineOK]] argument can be used to disable inlining.
 Code will be generated by the inline mechanism if inline
 handlers for the particular function are available and the
 optimization level permits their use.  Details of the inlining
 mechanism are given in Section \ref{sec:inlining}.

 In addition to the inlining mechanism, some checking of the call is
 carried out for symbol calls. The checking code is
 <<check the call to a symbol function>>=
 if (findLocVar(fun, cntxt))
     notifyLocalFun(fun, cntxt, loc = cb$savecurloc())
 else {
     def <- findFunDef(fun, cntxt)
     if (is.null(def))
         notifyUndefFun(fun, cntxt, loc = cb$savecurloc())
     else
         checkCall(def, call,
                   function(w) notifyBadCall(w, cntxt, loc = cb$savecurloc()))
 }
 @
 and [[checkCall]] is defined as
 <<[[checkCall]] function>>=
 ## **** figure out how to handle multi-line deparses
 ## ****     e.g. checkCall(`{`, quote({}))
 ## **** better design would capture error object, wrap it up, and pass it on
 ## **** use approach from codetools to capture partial argument match
 ## ****     warnings if enabled?
 checkCall <- function(def, call, signal = warning) {
     if (typeof(def) %in% c("builtin", "special"))
         def <- args(def)
     if (typeof(def) != "closure" || any.dots(call))
         NA
     else {
         msg <- tryCatch({match.call(def, call); NULL},
                         error = function(e) conditionMessage(e))
         if (! is.null(msg)) {
             emsg <- gettextf("possible error in '%s': %s",
                              deparse(call, 20)[1], msg)
             if (! is.null(signal)) signal(emsg)
             FALSE
         }
         else TRUE
     }
 }
 @ %def checkCall
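
 The core of the check is the [[match.call]] probe, which can be tried
 in isolation.  In this sketch [[def]] and [[probe]] are hypothetical
 names; the probe returns [[NULL]] when the call matches the definition
 and the error message otherwise.
 \begin{verbatim}
 def <- function(x, y) NULL
 probe <- function(call)
     tryCatch({match.call(def, call); NULL},
              error = function(e) conditionMessage(e))

 probe(quote(f(1, y = 2)))    # arguments match the formals
 probe(quote(f(1, 2, 3)))     # too many arguments
 typeof(args(sum))            # args() turns a builtin into a closure
 \end{verbatim}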

 Finally, for calls where the function is an expression a hack is
 currently needed for dealing with the way the parser handles
 expressions of the form [[break()]] and [[next()]].  To be able to
 compile as many [[break]] and [[next]] calls as possible as simple
 [[GOTO]] instructions these need to be handled specially to avoid
 placing things on the stack.  A better solution would probably be to
 modify the parser to make expressions of the form [[break()]] be
 syntax errors.
 <<hack for handling [[break()]] and [[next()]] expressions>>=
 ## **** this hack is needed for now because of the way the
 ## **** parser handles break() and next() calls
 if (typeof(fun) == "language" && typeof(fun[[1]]) == "symbol" &&
     as.character(fun[[1]]) %in% c("break", "next"))
     return(cmp(fun, cb, cntxt))
 @


 \subsection{Compiling call arguments}
 \label{subsec:callargs}
 Function calls can contain three kinds of arguments:
 \begin{itemize}
 \item missing arguments
 \item [[...]] arguments
 \item general expressions
 \end{itemize}
 In the first and third cases the arguments can also be named.  The argument
 compilation function [[cmpCallArgs]] loops over the argument lists and
 handles each of the three cases, in addition to signaling errors for
 arguments that are literal bytecode or promise objects.  When [[nse]] is
 [[TRUE]] (non-standard evaluation), promises will only get uncompiled
 expressions.
 <<[[cmpCallArgs]] function>>=
 cmpCallArgs <- function(args, cb, cntxt, nse = FALSE) {
     names <- names(args)
     pcntxt <- make.promiseContext(cntxt)
     for (i in seq_along(args)) {
         a <- args[[i]]
         n <- names[[i]]
         <<compile missing argument>>
         <<compile [[...]] argument>>
         <<signal an error for promise or bytecode argument>>
         <<compile a general argument>>
     }
 }
 @ %def cmpCallArgs

 The missing argument case is handled by
 <<compile missing argument>>=
 if (missing(a)) { ## better test for missing??
     cb$putcode(DOMISSING.OP)
     cmpTag(n, cb)
 }
 @ %def
 Computations on the language related to missing arguments are tricky.
 The use of [[missing]] is a little odd, but for now at least it does
 work.

 An ellipsis argument [[...]] is handled by the [[DODOTS]] instruction:
 <<compile [[...]] argument>>=
 else if (is.symbol(a) && a == "...") {
     if (! findLocVar("...", cntxt))
         notifyWrongDotsUse("...", cntxt, loc = cb$savecurloc())
     cb$putcode(DODOTS.OP)
 }
 @ %def
 A warning is issued if no [[...]] argument is visible.

 As in [[cmp]], errors are signaled for literal bytecode or promise
 values as arguments.
 <<signal an error for promise or bytecode argument>>=
 else if (typeof(a) == "bytecode")
     cntxt$stop(gettext("cannot compile byte code literals in code"),
                cntxt, loc = cb$savecurloc())
 else if (typeof(a) == "promise")
     cntxt$stop(gettext("cannot compile promise literals in code"),
                cntxt, loc = cb$savecurloc())
 @ %def

 A general non-constant argument expression is compiled to a separate
 byte code object which is stored in the constant pool.  The compiler
 then emits a [[MAKEPROM]] instruction that uses the stored code
 object. Promises are not needed for literal constant arguments as
 these are self-evaluating.  Within the current implementation both the
 evaluation process and use of [[substitute]] will work properly if
 constants are placed directly in the argument list rather than being
 wrapped in promises. This could also be done in the interpreter,
 though the benefit is less clear as a runtime determination of whether
 an argument is a constant would be needed.  This may still be cheap
 enough compared to the cost of allocating a promise to be worth doing.
 Constant folding in [[cmp]] may also produce more constants, but
 promises are needed in this case in order for [[substitute]] to work
 properly.  These promises could be created as evaluated promises,
 though it is not clear how much this would gain.
 <<compile a general argument>>=
 else {
     if (is.symbol(a) || typeof(a) == "language") {
         if (nse)
             ci <- cb$putconst(a)
         else
             ci <- cb$putconst(genCode(a, pcntxt, loc = cb$savecurloc()))
         cb$putcode(MAKEPROM.OP, ci)
     }
     else
         cmpConstArg(a, cb, cntxt)
     cmpTag(n, cb)
 }
 @ %def
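
 The requirement that [[substitute]] keeps working can be checked with
 a small sketch.  The function names are made up; the compiled caller
 passes an argument that constant folding could reduce to a single
 value, yet [[substitute]] should still recover the original
 expression.
 \begin{verbatim}
 library(compiler)

 argExpr <- function(x) substitute(x)
 caller <- cmpfun(function() argExpr(1 + 2))
 deparse(caller())      # the argument expression, not its value
 argExpr(3)             # a literal constant argument
 \end{verbatim}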
 %% **** look into using evaluated promises for constant folded arguments
 %% **** then we would use a variant of this:
 % else {
 %     ca <- constantFold(a, cntxt)
 %     if (is.null(ca)) {
 %         if (is.symbol(a) || typeof(a) == "language") {
 %             ci <- cb$putconst(genCode(a, pcntxt))
 %             cb$putcode(MAKEPROM.OP, ci)
 %         }
 %         else
 %             cmpConstArg(a, cb, cntxt)
 %     }
 %     else
 %         cmpConstArg(ca$value, cb, cntxt)
 %     cmpTag(n, cb)
 % }

 For calls to closures the [[MAKEPROM]] instruction retrieves the code
 object, creates a promise from the code object and the current
 environment, and pushes the promise on the argument stack. For calls
 to functions of type [[BUILTIN]] the [[MAKEPROM]] instruction actually
 executes the code object in the current environment and pushes the
 resulting value on the stack.  For calls to functions of type
 [[SPECIAL]] the [[MAKEPROM]] instruction does nothing as these calls use
 only the call expression.
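
 The three cases can be observed from R itself, independently of the
 compiler.  The following is a small sketch using only base R; the
 helper names [[lazy]] and [[showExpr]] are illustrative and are not
 part of the compiler.

```r
# A closure receives its arguments as promises: an argument that is
# never used is never evaluated.
lazy <- function(x) "argument not forced"
r1 <- lazy(stop("never evaluated"))

# substitute() recovers the argument expression from the promise.
showExpr <- function(x) deparse(substitute(x))
r2 <- showExpr(a + b)

# A BUILTIN primitive such as sum() has all arguments evaluated
# before the call, so the error is raised.
r3 <- tryCatch(sum(1, stop("evaluated eagerly")),
               error = function(e) conditionMessage(e))

# A SPECIAL primitive such as quote() only uses the call expression.
kinds <- c(typeof(sum), typeof(quote))
```

 Here [[r1]] is returned without the error being raised, [[r2]] is the
 deparsed argument expression, and [[r3]] holds the message of the
 error raised while the arguments of [[sum]] were being evaluated.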

 Constant arguments are compiled by [[cmpConstArg]].  Again there are
 special instructions for the common special constants [[NULL]],
 [[TRUE]], and [[FALSE]].
 <<[[cmpConstArg]]>>=
 cmpConstArg <- function(a, cb, cntxt) {
     if (identical(a, NULL))
         cb$putcode(PUSHNULLARG.OP)
     else if (identical(a, TRUE))
         cb$putcode(PUSHTRUEARG.OP)
     else if (identical(a, FALSE))
         cb$putcode(PUSHFALSEARG.OP)
     else {
         ci <- cb$putconst(a)
         cb$putcode(PUSHCONSTARG.OP, ci)
     }
 }
 @ %def cmpConstArg
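 The dispatch can be tried out without a real code buffer.  The sketch
 below is an illustration under stated assumptions: [[newRecorder]] and
 [[constArgOp]] are hypothetical names, opcodes are written as strings
 rather than the integer opcode values, and the constant index is a
 dummy value.

```r
# Hypothetical recorder standing in for a code buffer: it only
# remembers the opcode names it was given.
newRecorder <- function() {
    st <- new.env()
    st$ops <- character(0)
    list(putcode = function(op, ...) st$ops <- c(st$ops, op),
         putconst = function(x) 0L,      # dummy constant index
         ops = function() st$ops)
}

# Same dispatch as cmpConstArg, with opcodes written as strings.
constArgOp <- function(a, cb) {
    if (identical(a, NULL)) cb$putcode("PUSHNULLARG")
    else if (identical(a, TRUE)) cb$putcode("PUSHTRUEARG")
    else if (identical(a, FALSE)) cb$putcode("PUSHFALSEARG")
    else cb$putcode("PUSHCONSTARG", cb$putconst(a))
}

cb <- newRecorder()
for (a in list(NULL, TRUE, FALSE, 42, c(TRUE, TRUE)))
    constArgOp(a, cb)
ops <- cb$ops()
```

 Only the exact values [[NULL]], [[TRUE]], and [[FALSE]] take the
 special instructions; a logical vector such as [[c(TRUE, TRUE)]] goes
 through the general constant path.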

 Code to install names for named arguments is generated by [[cmpTag]]:
 <<[[cmpTag]] function>>=
 cmpTag <- function(n, cb) {
     if (! is.null(n) && n != "") {
         ci <- cb$putconst(as.name(n))
         cb$putcode(SETTAG.OP, ci)
     }
 }
 @ %def cmpTag

 The current implementation allocates a linked list of call arguments,
 stores tags in the list cells, and allocates promises. Alternative
 implementations that avoid some or all allocation are worth exploring.
 Also worth exploring is having an instruction specifically for calls that
 do not require matching of named arguments to formal arguments, since
 cases that use only order of arguments, not names, are quite common
 and are known at compile time. In the case of calls to functions with
 definitions known at compile time matching of named arguments to
 formal ones could also be done at compile time.


 \subsection{Discussion}
 The framework presented in this section, together with some support
 functions, is actually able to compile any legal R code.  But this is
 somewhat deceptive. The R implementation, and the [[CALL]] opcode,
 support three kinds of functions: closures (i.e. R-level functions),
 primitive functions of type [[BUILTIN]], and primitive functions of
 type [[SPECIAL]].  Primitives of type [[BUILTIN]] always evaluate
 their arguments in order, so creating promises is not necessary and in
 fact the [[MAKEPROM]] instruction does not do so --- if the function
 to be called is a [[BUILTIN]] then [[MAKEPROM]] runs the code for
 computing the argument in the current environment and pushes the value
 on the stack.  On the other hand, primitive functions of type
 [[SPECIAL]] use the call expression and evaluate bits of it as
 needed. As a result, they will be running interpreted code.  Since
 core functions like the sequencing function [[{]] and the conditional
   evaluation function [[if]] are of type [[SPECIAL]] this means most
   non-trivial code will be run by the standard interpreter.  This will
   be addressed by defining inlining rules that allow functions like
   [[{]] and [[if]] to be compiled properly.


 \section{The code buffer}
 \label{sec:codebuf}
 The code buffer is a collection of closures that accumulate code and
 constants in variables in their defining environment.  For a code
 buffer [[cb]] the closures [[cb$putcode]] and [[cb$putconst]] write an
 instruction sequence and a constant, respectively, into the code
 buffer. The closures [[cb$code]] and [[cb$const]] extract the code
 vector and the constant pool.

 The function [[make.codeBuf]] creates a set of closures for managing
 the instruction stream buffer and the constant pool buffer and returns
 a list of these closures for use by the compilation functions.  In
 addition, the expression to be compiled into the code buffer is stored
 as the first constant in the constant pool; this can be used to
 retrieve the source code for a compiled expression.
 <<[[make.codeBuf]] function>>=
 make.codeBuf <- function(expr, loc = NULL) {
     <<source location tracking implementation>>
     <<instruction stream buffer implementation>>
     <<constant pool buffer implementation>>
     <<label management interface>>
     cb <- list(code = getcode,
                const = getconst,
                putcode = putcode,
                putconst = putconst,
                makelabel = makelabel,
                putlabel = putlabel,
                patchlabels = patchlabels,
                setcurexpr = setcurexpr,
                setcurloc = setcurloc,
                commitlocs = commitlocs,
                savecurloc = savecurloc,
                restorecurloc = restorecurloc)
     cb$putconst(expr) ## insert expression as first constant.
       ## NOTE: this will also insert the srcref directly into the constant
       ## pool
     cb
 }
 @ %def make.codeBuf

 The instruction stream buffer uses a list structure and a count of
 elements in use, and doubles the size of the list to make room for new
 code when necessary.  By convention the first entry is a byte code
 version number; if the interpreter sees a byte code version number it
 cannot handle then it falls back to interpreting the uncompiled
 expression. The doubling strategy is needed to avoid quadratic
 compilation times for large instruction streams.
 <<instruction stream buffer implementation>>=
 codeBuf <- list(.Internal(bcVersion()))
 codeCount <- 1
 putcode <- function(...) {
     new <- list(...)
     newLen <- length(new)
     while (codeCount + newLen > length(codeBuf)) {
         codeBuf <<- c(codeBuf, vector("list", length(codeBuf)))
         if (exprTrackingOn)
             exprBuf <<- c(exprBuf, vector("integer", length(exprBuf)))
         if (srcrefTrackingOn)
             srcrefBuf <<- c(srcrefBuf, vector("integer", length(srcrefBuf)))
     }
     codeRange <- (codeCount + 1) : (codeCount + newLen)
     codeBuf[codeRange] <<- new

     if (exprTrackingOn) {   ## put current expression into the constant pool
         ei <- putconst(curExpr)
         exprBuf[codeRange] <<- ei
     }
     if (srcrefTrackingOn) { ## put current srcref into the constant pool
         si <- putconst(curSrcref)
         srcrefBuf[codeRange] <<- si
     }

     codeCount <<- codeCount + newLen
 }
 getcode <- function() as.integer(codeBuf[1 : codeCount])
 @ %def
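 The size-doubling idea can be sketched independently of the compiler.
 The following is a simplified illustration, not the compiler's own
 code: [[makeGrowBuf]] and its accessors are hypothetical names, and
 the state is kept in an environment.

```r
# Hypothetical growable buffer using the same doubling strategy.
makeGrowBuf <- function() {
    st <- new.env()
    st$buf <- vector("list", 1)
    st$count <- 0
    st$grows <- 0
    put <- function(...) {
        new <- list(...)
        # Double the capacity until the new entries fit.
        while (st$count + length(new) > length(st$buf)) {
            st$buf <- c(st$buf, vector("list", length(st$buf)))
            st$grows <- st$grows + 1
        }
        st$buf[st$count + seq_along(new)] <- new
        st$count <- st$count + length(new)
        invisible(NULL)
    }
    list(put = put,
         get = function() st$buf[seq_len(st$count)],
         grows = function() st$grows)
}

gb <- makeGrowBuf()
for (i in 1:1000) gb$put(i)
nGrows <- gb$grows()    # 10 doublings: capacity 1, 2, 4, ..., 1024
```

 Because the capacity doubles each time, the number of reallocations
 grows only logarithmically with the number of entries, which is what
 keeps the total copying cost linear rather than quadratic.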

 The constant pool is accumulated into a list buffer.  The zero-based
 index of the constant in the pool is returned by the insertion
 function.  Values are only entered once; if a value is already in the
 pool, as determined by [[identical]], its existing index is returned.
 Again a size-doubling strategy is used for the buffer.  [[.Internal]]
 functions are used both for performance reasons and to prevent
 duplication of the constants.
 <<constant pool buffer implementation>>=
 constBuf <- vector("list", 1)
 constCount <- 0
 putconst <- function(x) {
     if (constCount == length(constBuf))
         constBuf <<- .Internal(growconst(constBuf))
     i <- .Internal(putconst(constBuf, constCount, x))
     if (i == constCount)
         constCount <<- constCount + 1
     i
 }
 getconst <- function()
     .Internal(getconst(constBuf, constCount))
 @ %def
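 The behavior of the insertion function can be illustrated with a
 pure R analogue.  This is only a sketch: [[makeConstPool]] is a
 hypothetical name, the pool is grown one element at a time rather
 than by doubling, and no attempt is made to avoid duplicating the
 constants as the [[.Internal]] versions do.

```r
# Simplified pure-R analogue of the constant pool (hypothetical names).
makeConstPool <- function() {
    st <- new.env()
    st$consts <- list()
    put <- function(x) {
        for (i in seq_along(st$consts))
            if (identical(st$consts[[i]], x))
                return(i - 1L)          # already present: zero-based index
        st$consts <- c(st$consts, list(x))
        length(st$consts) - 1L          # index of the newly added constant
    }
    list(put = put, get = function() st$consts)
}

pool <- makeConstPool()
i1 <- pool$put(quote(x + 1))
i2 <- pool$put(3.14)
i3 <- pool$put(quote(x + 1))   # same value as the first entry
```

 The third insertion returns the index of the first entry, so the pool
 holds two constants after three calls.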

 The compiler maintains a mapping from code to source locations. For each
 value in the code buffer (instruction and operand) there is a source code
 reference ([[srcref]]) and the corresponding expression (AST).  The code
 buffer implementation remembers the current location (source reference and
 expression), which can be set by [[setcurloc]], [[setcurexpr]] or
 [[restorecurloc]] and retrieved by [[savecurloc]].  In addition to emitting
 code, [[putcode]] also copies the current location information into the
 constant pool and records the resulting constant indices in a source
 reference buffer and expression buffer.  When the final code is extracted
 using [[codeBufCode]], the source reference and expression buffers are
 copied into the constant pool as vectors indexed by code offset (program
 counter).
 <<source location tracking functions>>=
 extractSrcref <- function(sref, idx) {
     if (is.list(sref) && length(sref) >= idx)
         sref[[idx]]
     else if (is.integer(sref) && length(sref) >= 6)
         sref
     else
         NULL
 }
 getExprSrcref <- function(expr) {
     sattr <- attr(expr, "srcref")
     extractSrcref(sattr, 1)
 }
 # if block is a block srcref, get its idx'th entry
 # if block is a single srcref, return this srcref
 getBlockSrcref <- function(block, idx) {
   extractSrcref(block, idx)
 }
 addLocString <- function(msg, loc) {
     if (is.null(loc$srcref))
         msg
     else
         paste0(msg, " at ", utils::getSrcFilename(loc$srcref), ":",
                utils::getSrcLocation(loc$srcref, "line"))
 }
 @ %def

 <<source location tracking implementation>>=
 exprTrackingOn <- TRUE
 srcrefTrackingOn <- TRUE

 if (is.null(loc)) {
     curExpr <- expr
     curSrcref <- getExprSrcref(expr)
 } else {
     curExpr <- loc$expr
     curSrcref <- loc$srcref
 }

 if (is.null(curSrcref))
     ## when top-level srcref is null, we speculate there will be no
     ##   source references within the compiled expressions either,
     ##   disabling the tracking makes the resulting constant pool
     ##   smaller
     srcrefTrackingOn <- FALSE

 exprBuf <- NA   ## exprBuf will have the same length as codeBuf
 srcrefBuf <- NA ## srcrefBuf will have the same length as codeBuf

 if (!exprTrackingOn) {
     curExpr <- NULL
     exprBuf <- NULL
 }
 if (!srcrefTrackingOn) {
     curSrcref <- NULL
     srcrefBuf <- NULL
 }

 ## set the current expression
 ## also update the srcref according to expr, if expr has srcref attribute
 ##   (note: never clears current srcref)
 setcurexpr <- function(expr) {
     if (exprTrackingOn) {
         curExpr <<- expr
     }
     if (srcrefTrackingOn) {
         sref <- getExprSrcref(expr)
         if (!is.null(sref) && srcrefTrackingOn)
             curSrcref <<- sref
      }
 }
 ## unconditionally sets the current expression and srcrefs
 setcurloc <- function(expr, sref) {
     if (exprTrackingOn)
         curExpr <<- expr
     if (srcrefTrackingOn)
         curSrcref <<- sref
 }
 ## add location information (current expressions, srcrefs) to the constant pool
 commitlocs <- function() {
     if (exprTrackingOn) {
       exprs <- exprBuf[1:codeCount]
       class(exprs) <- "expressionsIndex"
       putconst(exprs)
     }

     if (srcrefTrackingOn) {
       srefs <- srcrefBuf[1:codeCount]
       class(srefs) <- "srcrefsIndex"
       putconst(srefs)
     }

     ## these entries will be at the end of the constant pool, assuming only the compiler
     ## uses these two classes
     NULL
 }
 savecurloc <- function() {
     list(expr = curExpr, srcref = curSrcref)
 }
 restorecurloc <- function(saved) {
     if (exprTrackingOn) curExpr <<- saved$expr
     if (srcrefTrackingOn) curSrcref <<- saved$srcref
 }
 @ %def


 Labels are used for identifying targets for branching instructions.  The
 label management interface creates new labels with [[makelabel]] as
 character strings that are unique within the buffer.  These labels can
 then be included as operands in branching instructions. The
 [[putlabel]] function records the current code position as the value
 of the label.
 <<label management interface>>=
 idx <- 0
 labels <- vector("list")
 makelabel <- function() { idx <<- idx + 1; paste0("L", idx) }
 putlabel <- function(name) labels[[name]] <<- codeCount
 @

 Once code generation is complete the symbolic labels in the code
 stream need to be converted to numerical offset values.  This is done
 by [[patchlabels]].  Labels can appear directly in the instruction
 stream and in lists that have been placed in the instruction stream;
 this is used for the [[SWITCH]] instruction.
 <<label management interface>>=
 patchlabels <- function(cntxt) {
     offset <- function(lbl) {
         if (is.null(labels[[lbl]]))
             cntxt$stop(gettextf("no offset recorded for label \"%s\"", lbl),
                        cntxt)
         labels[[lbl]]
     }
     for (i in 1 : codeCount) {
         v <- codeBuf[[i]]
         if (is.character(v))
             codeBuf[[i]] <<- offset(v)
         else if (typeof(v) == "list") {
             off <- as.integer(lapply(v, offset))
             ci <- putconst(off)
             codeBuf[[i]] <<- ci
         }
     }
 }
 @ %def
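 The patching step for labels that appear directly in the instruction
 stream can be sketched on a toy example.  The opcode numbers, the
 label offset, and the function name [[patchToy]] below are all
 hypothetical, and the list case used for [[SWITCH]] is left out.

```r
GOTO <- 1L; LDNULL <- 2L; RETURN <- 3L     # hypothetical opcode numbers

# Toy instruction stream: the operand of GOTO is a symbolic label.
code <- list(GOTO, "L1", LDNULL, RETURN)
labels <- list(L1 = 2L)                    # hypothetical recorded offset

patchToy <- function(code, labels) {
    for (i in seq_along(code)) {
        v <- code[[i]]
        if (is.character(v)) {
            if (is.null(labels[[v]]))
                stop(sprintf("no offset recorded for label \"%s\"", v))
            code[[i]] <- labels[[v]]       # replace label by its offset
        }
    }
    as.integer(code)
}

patched <- patchToy(code, labels)
```

 After patching the stream contains only integers, so it can be
 converted to an integer vector in the same way as [[getcode]] does.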

 The contents of the code buffer are extracted into a code object by
 calling [[codeBufCode]]:
 <<[[codeBufCode]] function>>=
 codeBufCode <- function(cb, cntxt) {
     cb$patchlabels(cntxt)
     cb$commitlocs()
     .Internal(mkCode(cb$code(), cb$const()))
 }
 @ %def codeBufCode
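 From the user's point of view the resulting code object is what the
 exported functions of the [[compiler]] package return.  A brief usage
 sketch, assuming a standard R installation with the [[compiler]]
 package available:

```r
library(compiler)

bc <- compile(quote(x + 1))   # compile an expression to a code object
kind <- typeof(bc)            # "bytecode"

x <- 41
val <- eval(bc)               # evaluates like the original expression

dis <- disassemble(bc)        # symbolic listing of code and constants
```

 Printing [[dis]] is a convenient way to inspect the instruction
 stream and the constant pool produced for a small expression.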


 \section{Compiler contexts}
 \label{sec:contexts}
 The compiler context object [[cntxt]] carries along information about
 whether the expression appears in tail position and should be followed
 by a return, whether the result is ignored, or whether the
 expression is contained in a loop.  The context object also contains
 current compiler option settings as well as functions used to issue
 warnings or signal errors.


 \subsection{Top level contexts}
 Top level compiler functions start by creating a top level context.
 The constructor for top level contexts takes as arguments the current
 compilation environment, described in Section \ref{sec:environments},
 and a list of option values used to override default option settings.
 The [[toplevel]] field will be set to [[FALSE]] for compiling
 expressions, such as function arguments, that do not appear at top
 level.  Top level expressions are assumed to be in tail position, so
 the [[tailcall]] field is initialized as [[TRUE]]. The
 [[needRETURNJMP]] specifies whether a call to the [[return]] function
 can use the [[RETURN]] instruction or has to use a [[longjmp]] via the
 [[RETURNJMP]] instruction.  Initially using a simple [[RETURN]] is
 safe; this is set to [[TRUE]] when compiling promises and certain
 loops where [[RETURNJMP]] is needed.
 <<[[make.toplevelContext]] function>>=
 make.toplevelContext <- function(cenv, options = NULL)
     structure(list(toplevel = TRUE,
                    tailcall = TRUE,
                    needRETURNJMP = FALSE,
                    env = cenv,
                    optimize = getCompilerOption("optimize", options),
                    suppressAll = getCompilerOption("suppressAll", options),
                    suppressNoSuperAssignVar =
                        getCompilerOption("suppressNoSuperAssignVar", options),
                    suppressUndefined = getCompilerOption("suppressUndefined",
                                                          options),
                    call = NULL,
                    stop = function(msg, cntxt, loc = NULL)
                        stop(simpleError(addLocString(msg, loc), cntxt$call)),
                    warn = function(x, cntxt, loc = NULL)
                        cat(paste("Note:", addLocString(x, loc), "\n"))
               ),
               class = "compiler_context")
 @ %def make.toplevelContext
 Errors are signaled using a version of [[stop]] that uses the current
 call in the compilation context.  The default would be to use the call
 in the compiler code where the error was raised, and that would not be
 meaningful to the end user.  Ideally [[warn]] should do something
 similar and also use the condition system, but for now it just prints
 a simple message to standard output.
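
 The effect of passing a call to [[simpleError]] can be seen in a
 small stand-alone sketch; [[signalWithCall]] and [[userCall]] are
 illustrative names only.

```r
# Signal an error that reports a user-level call rather than the
# call of the function that actually raised it.
userCall <- quote(f(x, y))
signalWithCall <- function(msg, call) stop(simpleError(msg, call))

err <- tryCatch(signalWithCall("bad argument", userCall),
                error = function(e) e)
reported <- conditionCall(err)     # the call f(x, y)
```

 The condition carries [[userCall]] as its call, so a printed error
 message refers to the user's code instead of the signaling function.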
 %% **** look into adding source info to errors/warnings
 %% **** comment on class of context object


 \subsection{Other compiler contexts}
 The [[cmpCall]] function creates a new context for each call it
 compiles.  The new context is the current context with the [[call]]
 entry replaced by the current call; this is useful for issuing
 meaningful warning and error messages.
 <<[[make.callContext]] function>>=
 make.callContext <- function(cntxt, call) {
     cntxt$call <- call
     cntxt
 }
 @ %def make.callContext

 Non-tail-call contexts are used when a value is being computed for use
 in a subsequent computation. The constructor returns a new context
 that is the current context with the tailcall field set to [[FALSE]].
 <<[[make.nonTailCallContext]] function>>=
 make.nonTailCallContext <- function(cntxt) {
     cntxt$tailcall <- FALSE
     cntxt
 }
 @ %def make.nonTailCallContext
 A no value context is used in cases where the computed value will be
 ignored.  For now this is identical to a non-tail-call context, but it
 may eventually be useful to distinguish the two situations. This is
 used mainly for expressions other than the final one in [[{]] calls
   and for compiling the bodies of loops.
 %% **** can avoid generating push/pop pairs if novalue = TRUE is used
 %% **** might simplify tailcall/RETURN stuff??
 <<[[make.noValueContext]] function>>=
 make.noValueContext <- function(cntxt) {
     cntxt$tailcall <- FALSE
     cntxt
 }
 @ %def make.noValueContext

 The compiler context for compiling a function is a new toplevel
 context using the function environment and the current compiler
 options settings.
 %% **** copy other compiler options; maybe cntxt$options$optimize??
 <<[[make.functionContext]] function>>=
 make.functionContext <- function(cntxt, forms, body) {
     nenv <- funEnv(forms, body, cntxt)
     ncntxt <- make.toplevelContext(nenv)
     ncntxt$optimize <- cntxt$optimize
     ncntxt$suppressAll <- cntxt$suppressAll
     ncntxt$suppressNoSuperAssignVar <- cntxt$suppressNoSuperAssignVar
     ncntxt$suppressUndefined <- cntxt$suppressUndefined
     ncntxt
 }
 @ %def make.functionContext

 The context for compiling the body of a loop is a no value context
 with the loop information available.
 <<[[make.loopContext]] function>>=
 make.loopContext <- function(cntxt, loop.label, end.label) {
     ncntxt <- make.noValueContext(cntxt)
     ncntxt$loop <- list(loop = loop.label, end = end.label, gotoOK = TRUE)
     ncntxt
 }
 @ %def make.loopContext

 The initial loop context allows [[break]] and [[next]] calls to be
 implemented as [[GOTO]] instructions.  This is OK for calls that are
 in top level position relative to the loop.  Calls that occur in
 promises or in other contexts where the stack has changed from the
 loop top level state need stack unwinding and cannot be implemented as
 [[GOTO]] instructions. These should be compiled with contexts
 that have the [[loop$gotoOK]] field set to [[FALSE]].  The promise
 context does this for promises and the argument context for other
 settings.  The promise context also sets [[needRETURNJMP]] to [[TRUE]]
 since a [[return]] call that is triggered by forcing a promise
 requires a [[longjmp]] to return from the appropriate function.
 <<[[make.argContext]] function>>=
 make.argContext <- function(cntxt) {
     cntxt$toplevel <- FALSE
     cntxt$tailcall <- FALSE
     if (! is.null(cntxt$loop))
         cntxt$loop$gotoOK <- FALSE
     cntxt
 }
 @ %def make.argContext
 <<[[make.promiseContext]] function>>=
 make.promiseContext <- function(cntxt) {
     cntxt$toplevel <- FALSE
     cntxt$tailcall <- TRUE
     cntxt$needRETURNJMP <- TRUE
     if (! is.null(cntxt$loop))
         cntxt$loop$gotoOK <- FALSE
     cntxt
 }
 @ %def make.promiseContext
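 Since contexts are ordinary lists, each constructor works on a copy
 and the context of the caller is left unchanged.  The sketch below
 uses a hypothetical minimal context with only the fields needed for
 the illustration, and [[promiseLike]] is an illustrative name.

```r
# Hypothetical minimal context with only the fields used here.
cntxt <- list(toplevel = TRUE, tailcall = TRUE, needRETURNJMP = FALSE,
              loop = list(loop = "L1", end = "L2", gotoOK = TRUE))

# Same kind of field updates as make.promiseContext.
promiseLike <- function(cntxt) {
    cntxt$toplevel <- FALSE
    cntxt$needRETURNJMP <- TRUE
    if (! is.null(cntxt$loop))
        cntxt$loop$gotoOK <- FALSE
    cntxt
}

pc <- promiseLike(cntxt)
```

 After the call [[pc$loop$gotoOK]] is [[FALSE]] while
 [[cntxt$loop$gotoOK]] is still [[TRUE]], so code compiled after the
 promise can continue to use [[GOTO]] instructions.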
 %% pull out gotoOK chunk


 \subsection{Compiler options}
 Default compiler options are maintained in an environment.  For now,
 the supported options are [[optimize]], which is initialized to level
 2, and three options for controlling compiler messages.  The
 [[suppressAll]] option, if [[TRUE]], suppresses all notifications.
 The [[suppressNoSuperAssignVar]] option, if [[TRUE]], suppresses
 notifications about a missing binding for a super-assigned variable.
 The [[suppressUndefined]] option can be [[TRUE]] to suppress all
 notifications about undefined variables and functions, or it can be a
 character vector of the names of variables for which warnings should
 be suppressed.
 <<compiler options data base>>=
 compilerOptions <- new.env(hash = TRUE, parent = emptyenv())
 compilerOptions$optimize <- 2
 compilerOptions$suppressAll <- TRUE
 compilerOptions$suppressNoSuperAssignVar <- FALSE
 compilerOptions$suppressUndefined <-
     c(".Generic", ".Method", ".Random.seed", ".self")
 @ %def compilerOptions

 Options are retrieved with the [[getCompilerOption]] function.
 <<[[getCompilerOption]] function>>=
 getCompilerOption <- function(name, options = NULL) {
     if (name %in% names(options))
         options[[name]]
     else
         get(name, compilerOptions)
 }
 @ %def getCompilerOption
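 The lookup rule, explicit options first and defaults otherwise, can
 be tried in isolation.  In this sketch [[defaults]] and
 [[lookupOption]] are hypothetical stand-ins for the options data base
 and the retrieval function.

```r
# Hypothetical defaults table with the same structure as the
# compiler options data base.
defaults <- new.env(hash = TRUE, parent = emptyenv())
defaults$optimize <- 2
defaults$suppressAll <- TRUE

lookupOption <- function(name, options = NULL) {
    if (name %in% names(options))
        options[[name]]
    else
        get(name, defaults)
}

o1 <- lookupOption("optimize")                         # default value
o2 <- lookupOption("optimize", list(optimize = 3))     # overridden
o3 <- lookupOption("suppressAll", list(optimize = 3))  # default value
```

 Only the options named in the [[options]] list are overridden; all
 others fall back to the defaults.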

 The [[suppressAll]] function determines whether a context has its
 [[suppressAll]] property set to [[TRUE]].
 <<[[suppressAll]] function>>=
 suppressAll <- function(cntxt)
     identical(cntxt$suppressAll, TRUE)
 @ %def suppressAll
 The [[suppressNoSuperAssignVar]] function determines whether a context has
 its [[suppressNoSuperAssignVar]] property set to [[TRUE]].
 <<[[suppressNoSuperAssignVar]] function>>=
 suppressNoSuperAssignVar <- function(cntxt)
     isTRUE(cntxt$suppressNoSuperAssignVar)
 @ %def suppressNoSuperAssignVar
 The [[suppressUndef]] function determines whether undefined variable
 or function definition notifications for a particular variable should
 be suppressed in a particular compiler context.
 <<[[suppressUndef]] function>>=
 suppressUndef <- function(name, cntxt) {
     if (identical(cntxt$suppressAll, TRUE))
         TRUE
     else {
         suppress <- cntxt$suppressUndefined
         if (is.null(suppress))
             FALSE
         else if (identical(suppress, TRUE))
             TRUE
         else if (is.character(suppress) && as.character(name) %in% suppress)
             TRUE
         else FALSE
     }
 }
 @ %def suppressUndef

 At some point we will need mechanisms for setting default options from
 the interpreter and in package meta-data. A declaration mechanism for
 adjusting option settings locally will also be needed.


 \subsection{Compiler notifications}
 Compiler notifications are currently sent by calling the context's
 [[warn]] function, which in turn prints a message to standard output.
 It would be better to use an approach based on the condition system,
 and this will be done eventually.  The use of separate notification
 functions for each type of issue signaled is a step in this direction.

 Undefined function and undefined variable notifications are issued by
 [[notifyUndefFun]] and [[notifyUndefVar]].  These both use
 [[suppressUndef]] to determine whether the notification should be
 suppressed in the current context.
 <<[[notifyUndefFun]] function>>=
 notifyUndefFun <- function(fun, cntxt, loc = NULL) {
     if (! suppressUndef(fun, cntxt)) {
         msg <- gettextf("no visible global function definition for '%s'",
                         as.character(fun))
         cntxt$warn(msg, cntxt, loc)
     }
 }
 @ %def notifyUndefFun
 <<[[notifyUndefVar]] function>>=
 notifyUndefVar <- function(var, cntxt, loc = NULL) {
     if (! suppressUndef(var, cntxt)) {
         msg <- gettextf("no visible binding for global variable '%s'",
                         as.character(var))
         cntxt$warn(msg, cntxt, loc)
     }
 }
 @ %def notifyUndefVar
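 A usage sketch, assuming the exported [[cmpfun]] function and the
 defaults shown above: since [[suppressAll]] defaults to [[TRUE]],
 notifications have to be requested explicitly.  The function name
 [[noSuchFunction]] is hypothetical and deliberately left undefined.

```r
library(compiler)

f <- function(x) noSuchFunction(x)   # calls an undefined function

# Notes are printed to standard output, so they can be captured.
notes <- capture.output(
    g <- cmpfun(f, options = list(suppressAll = FALSE)))
```

 With these settings [[notes]] is expected to contain a line reporting
 that there is no visible global function definition for
 [[noSuchFunction]]; compilation itself still succeeds and [[g]] is a
 function.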

 Codetools currently optionally notifies about use of local
 functions. This is of course not an error but may sometimes be the
 result of a mis-spelling.  For now the compiler does not notify about
 these, but this could be changed by redefining [[notifyLocalFun]].
 <<[[notifyLocalFun]] function>>=
 notifyLocalFun <- function(fun, cntxt, loc = NULL) {
     if (! suppressAll(cntxt))
         NULL
 }
 @ %def notifyLocalFun

 Warnings about possible improper use of [[...]] and [[..n]] variables
 are sent by [[notifyWrongDotsUse]].
 <<[[notifyWrongDotsUse]] function>>=
 notifyWrongDotsUse <- function(var, cntxt, loc = NULL) {
     if (! suppressAll(cntxt)) {
         msg <- paste(var, "may be used in an incorrect context")
         cntxt$warn(msg, cntxt, loc)
     }
 }
 @ %def notifyWrongDotsUse

 Wrong argument count issues are signaled by [[notifyWrongArgCount]].
 <<[[notifyWrongArgCount]] function>>=
 notifyWrongArgCount <- function(fun, cntxt, loc = NULL) {
     if (! suppressAll(cntxt)) {
         msg <- gettextf("wrong number of arguments to '%s'",
                         as.character(fun))
         cntxt$warn(msg, cntxt, loc)
     }
 }
 @ %def notifyWrongArgCount
 Other issues with calls that do not match their definitions are
 signaled by [[notifyBadCall]].  Ideally these should be broken down
 more finely, but that would require some rewriting of the error
 signaling in [[match.call]].
 <<[[notifyBadCall]] function>>=
 notifyBadCall <- function(w, cntxt, loc = NULL) {
     if (! suppressAll(cntxt))
         cntxt$warn(w, cntxt, loc)
 }
 @ %def notifyBadCall

 [[break]] or [[next]] calls that occur in a context where no loop is
 visible will most likely result in runtime errors, and
 [[notifyWrongBreakNext]] is used to signal such cases.
 <<[[notifyWrongBreakNext]] function>>=
 notifyWrongBreakNext <- function(fun, cntxt, loc = NULL) {
     if (! suppressAll(cntxt)) {
         msg <- paste(fun, "used in wrong context: no loop is visible")
         cntxt$warn(msg, cntxt, loc)
     }
 }
 @ %def notifyWrongBreakNext

 Several issues can arise in assignments.  For super-assignments a
 target variable should be defined; otherwise there will be a runtime
 warning.
 <<[[notifyNoSuperAssignVar]] function>>=
 notifyNoSuperAssignVar <- function(symbol, cntxt, loc = NULL) {
     if (! suppressAll(cntxt) && ! suppressNoSuperAssignVar(cntxt)) {
         msg <- gettextf("no visible binding for '<<-' assignment to '%s'",
                         as.character(symbol))
         cntxt$warn(msg, cntxt, loc)
     }
 }
 @ %def notifyNoSuperAssignVar
 If the compiler detects an invalid function in a complex assignment
 then this is signaled at compile time; a corresponding error would
 occur at runtime.
 %% **** should put function/call into message
 <<[[notifyBadAssignFun]] function>>=
 notifyBadAssignFun <- function(fun, cntxt, loc = NULL) {
     if (! suppressAll(cntxt)) {
         msg <- gettext("invalid function in complex assignment")
         cntxt$warn(msg, cntxt, loc)
     }
 }
 @ %def notifyBadAssignFun

 In [[switch]] calls it is an error if a character selector argument is
 used and there are multiple default alternatives.  The compiler
 signals a possible problem with [[notifyMultipleSwitchDefaults]] if
 there are some named cases but more than one unnamed one.
 <<[[notifyMultipleSwitchDefaults]] function>>=
 notifyMultipleSwitchDefaults <- function(ndflt, cntxt, loc = NULL)
     if (! suppressAll(cntxt)) {
         msg <- gettext("more than one default provided in switch() call")
         cntxt$warn(msg, cntxt, loc)
     }
 @ %def notifyMultipleSwitchDefaults

 <<[[notifyNoSwitchcases]] function>>=
 notifyNoSwitchcases <- function(cntxt, loc = NULL)
     if (! suppressAll(cntxt)) {
         msg <- gettext("'switch' with no alternatives")
         cntxt$warn(msg, cntxt, loc)
     }
 @ %def notifyNoSwitchcases

 The compiler signals when it detects an assignment to a special
 syntactic function, such as [[for]].
 <<[[notifyAssignSyntacticFun]] function>>=
 notifyAssignSyntacticFun <- function(funs, cntxt, loc = NULL) {
     if (! suppressAll(cntxt)) {
         msg <- ngettext(length(funs),
             "local assignment to syntactic function: ",
             "local assignments to syntactic functions: ")
         cntxt$warn(paste(msg, paste(funs, collapse = ", ")),
                    cntxt, loc)
     }
 }
 @ %def notifyAssignSyntacticFun

 When the compiler encounters an error during JIT or package
 compilation, it catches the error and returns the original uncompiled
 code letting the AST interpreter handle it. This can happen due to a
 compiler bug or when the code being compiled violates certain
 assumptions made by the compiler (such as a certain discipline on
 frame types in the evaluation environment, as checked in
 [[frameTypes]]). The compiler will notify about catching such errors
 via [[notifyCompilerError]].

 <<[[notifyCompilerError]] function>>=
 notifyCompilerError <- function(msg)
     if (!compilerOptions$suppressAll)
         cat(paste(gettext("Error: compilation failed - "), msg, "\n"))
 @ %def notifyCompilerError


 \section{Compilation environments}
 \label{sec:environments}
 %% **** lambda lifting/eliminating variables not captured
 %% **** defer SETVAR until needed; avoid GETVAR if still in register
 %% **** Could preserve semantics by pre-test to check that fun in env
 %% **** is inlined one.  Would need to make efficient somehow, e.g
 %% **** increment counter each time one of inlined names is assigned
 %% **** to and only check when count has changed.
 At this point the compiler will essentially use the interpreter to
 evaluate an expression of the form
 \begin{verbatim}
 if (x > 0) log(x) else 0
 \end{verbatim}
 since [[if]] is a [[SPECIAL]] function. To make further improvements
 the compiler needs to be able to implement the [[if]] expression in
 terms of conditional and unconditional branch instructions.  It might
 then also be useful to implement [[>]] and [[log]] with special
 virtual machine instructions.  To be able to do this, the compiler
 needs to know that [[if]], [[>]], and [[log]] refer to the standard
 versions of these functions in the base package.  While this is very
 likely, it is not guaranteed.

 R is a very dynamic language.  Functions defined in the base and other
 packages could be shadowed at runtime by definitions in loaded user
 packages or by local definitions within a function.  It is even
 possible for user code to redefine the functions in the base package,
 though this is discouraged by binding locking and would be poor
 programming practice.  Finally, it is possible for functions called
 prior to evaluating an expression like the one above to reach into
 their calling environment and add new definitions of [[log]] or [[if]]
 that would then be used in evaluating this expression.  Again this is
 not common and generally not a good idea outside of a debugging
 context.
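
 The case of a visible local definition is easy to demonstrate.  In
 the sketch below, which uses illustrative names only, the function
 defines its own [[log]], and both the interpreted and the compiled
 version use that local definition.

```r
# A local definition shadows base::log inside the function body.
f <- function(x) {
    log <- function(y) "local log"
    if (x > 0) log(x) else 0
}
r1 <- f(1)

# The compiler sees the local assignment to log and therefore must
# not treat the call as a call to the base version.
r2 <- compiler::cmpfun(f)(1)
```

 Both [[r1]] and [[r2]] are the string returned by the local
 function rather than the numeric value of the base [[log]].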

 Ideally the compiler should completely preserve semantics of the
 language implemented by the interpreter.  While this is possible it
 would significantly complicate the compiler and the compiled code, and
 carry at least some runtime penalty. The approach taken here is
 therefore to permit the compiler to inline some functions when they
 are not visibly shadowed in the compiled code.  What the compiler is
 permitted to do is determined by the setting of an optimization level.
 The details are described in Section \ref{sec:inlining}.

 For the compiler to be able to decide whether it can inline a function
 it needs to be able to determine whether there are any local variables
 that might shadow a variable from a base package. This requires adding
 environment information to the compilation process.


 \subsection{Representing compilation environments}
 When compiling an expression the compiler needs to take into account
 an evaluation environment, which would typically be a toplevel
 environment, along with local variable definitions discovered during
 the compilation process. The evaluation environment should not be
 modified, so the local variables need to be considered in addition to
 ones defined in the evaluation environment.  If an expression
 \begin{verbatim}
 { x <- 1; x + 2 }
 \end{verbatim}
 is compiled for evaluation in the global environment then existing
 definitions in the global environment as well as the new definition
 for [[x]] need to be taken into account. To address this the
 compilation environment is a list of two components, an environment
 and a list of character vectors.  The environment consists of one
 frame for each level of local variables followed by the top level
 evaluation environment.  The list of character vectors consists of one
 element for each frame for which local variables have been discovered
 by the compiler. For efficiency the compilation environment structure
 also includes a character vector [[ftypes]] classifying each frame as a
 local, namespace, or global frame.
 <<[[makeCenv]] function>>=
 ## Create a new compiler environment
 ## **** need to explain the structure
 makeCenv <- function(env) {
     structure(list(extra = list(character(0)),
                    env = env,
                    ftypes = frameTypes(env)),
               class = "compiler_environment")
 }
 @ %def makeCenv

 When an expression is to be compiled in a particular environment a
 first step is to identify any local variable definitions and add these
 to the top level frame.
 <<[[addCenvVars]] function>>=
 ## Add vars to the top compiler environment frame
 addCenvVars <- function(cenv, vars) {
     cenv$extra[[1]] <- union(cenv$extra[[1]], vars)
     cenv
 }
 @ %def addCenvVars

 When compiling a function a new frame is added to the compilation
 environment.  Typically a number of local variables are added
 immediately, so an optional [[vars]] argument is provided so this
 can be done without an additional call to [[addCenvVars]].
 <<[[addCenvFrame]] function>>=
 ## Add a new frame to a compiler environment
 addCenvFrame <- function(cenv, vars) {
     cenv$extra <- c(list(character(0)), cenv$extra)
     cenv$env <- new.env(parent = cenv$env)
     cenv$ftypes <- c("local", cenv$ftypes)
     if (missing(vars))
         cenv
     else
         addCenvVars(cenv, vars)
 }
 @ %def addCenvFrame
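
 As a rough illustration, the following sketch builds a compilation
 environment for the global environment and then adds a function
 frame.  It assumes that the functions defined above can be reached as
 internal objects of the installed [[compiler]] package using [[:::]];
 that is an assumption about the installed R version, not a supported
 interface.
 \begin{verbatim}
 ## Sketch only: relies on internal, unexported functions.
 cenv <- compiler:::makeCenv(globalenv())
 names(cenv)                  # components of the structure
 length(cenv$extra)           # one entry for the top level frame

 ## Entering a function with arguments x and y adds a local frame.
 fenv <- compiler:::addCenvFrame(cenv, c("x", "y"))
 fenv <- compiler:::addCenvVars(fenv, "tmp")
 fenv$extra                   # locals of the new frame, then top level
 head(fenv$ftypes, 2)         # the new frame is classified as local
 identical(parent.env(fenv$env), globalenv())
 \end{verbatim}
 The evaluation environment itself is never modified; only the
 [[extra]] lists and the freshly created frames change.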

 The compilation environment is queried by calling [[findCenvVar]].
 %% **** change name to findCenvVarInfo or some such??
 If a binding for the specified variable is found then [[findCenvVar]]
 returns a list containing information about the binding.  If no
 binding is found then [[NULL]] is returned.
 <<[[findCenvVar]] function>>=
 ## Find binding information for a variable (character or name).
 ## If a binding is found, return a list containing components
 ##   ftype -- one of "local", "namespace", "global"
 ##   value -- current value if available
 ##   frame -- frame containing the binding (not useful for "local" variables)
 ##   index -- index of the frame (1 for top, 2 for next one, etc.)
 ## Return NULL if no binding is found.
 ## **** drop the index, maybe value, to reduce cost? (query as needed?)
 findCenvVar <- function(var, cenv) {
     if (typeof(var) == "symbol")
         var <- as.character(var)
     extra <- cenv$extra
     env <- cenv$env
     frame <- NULL
     <<search [[extra]] entries and environment frames>>
     <<search the remaining environment frames if necessary>>
     <<create the [[findCenvVar]] result>>
 }
 @ %def findCenvVar

 The initial search for a matching binding proceeds down each frame for
 which there is also an entry in [[extra]], searching the [[extra]]
 entry before the environment frame.
 <<search [[extra]] entries and environment frames>>=
 for (i in seq_along(cenv$extra)) {
     if (var %in% extra[[i]] || exists(var, env, inherits = FALSE)) {
         frame <- env
         break
     }
     else
         env <- parent.env(env)
 }
 @ %def
 If [[frame]] is still [[NULL]] after the initial search then the
 remaining environment frames from the evaluation environment for which
 there are no corresponding entries in [[extra]] are searched.
 <<search the remaining environment frames if necessary>>=
 if (is.null(frame)) {
     empty <- emptyenv()
     while (! identical(env, empty)) {
         i <- i + 1
         if (exists(var, env, inherits = FALSE)) {
             frame <- env
             break
         }
         else
             env <- parent.env(env)
     }
 }
 @ %def

 If a binding frame is found then the result consists of a list
 containing the frame, the frame type, the value if available, and the
 frame index.  The value is not looked up for [[...]] variables.  A
 promise to compute the value is stored in an environment in the
 result.  This avoids computing the value in some cases where doing so
 may fail or produce unwanted side effects.
 <<create the [[findCenvVar]] result>>=
 if (! is.null(frame)) {
     if (exists(var, frame, inherits = FALSE) && var != "...") {
         value <- new.env(parent = emptyenv())
         delayedAssign("value", get(var, frame, inherits = FALSE),
                       assign.env = value)
     }
     else
         value <- NULL
     list(frame = frame, ftype = cenv$ftypes[i], value = value, index = i)
 }
 else
     NULL
 @ %def
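
 A short sketch of how the lookup behaves, again assuming that the
 internal functions are reachable with [[:::]] in the installed
 [[compiler]] package and that [[sum]] has not been redefined in the
 global environment:
 \begin{verbatim}
 ## Sketch only: relies on internal, unexported functions.
 fenv <- compiler:::addCenvFrame(compiler:::makeCenv(globalenv()), "x")

 loc <- compiler:::findCenvVar("x", fenv)
 loc$ftype                    # a local variable of the new frame
 is.null(loc$value)           # no value is known at compile time

 glb <- compiler:::findCenvVar(as.name("sum"), fenv)
 glb$ftype                    # found further down the search path
 is.function(glb$value$value) # the promise is forced only on access

 is.null(compiler:::findCenvVar("no.such.variable.here", fenv))
 \end{verbatim}
 Wrapping the value in a promise means that a caller interested only
 in the frame type never triggers the lookup of the value.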

 Useful functions for querying the environment associated with a
 compilation context are [[findVar]], [[findLocVar]], and
 [[findFunDef]]. The function [[findVar]] returns [[TRUE]] if a binding
 for the specified variable is visible and [[FALSE]] otherwise.
 <<[[findVar]] function>>=
 findVar <- function(var, cntxt) {
     cenv <- cntxt$env
     info <- findCenvVar(var, cenv)
     ! is.null(info)
 }
 @ %def findVar
 [[findLocVar]] returns [[TRUE]] only if a local binding is found.
 <<[[findLocVar]] function>>=
 ## test whether a local version of a variable might exist
 findLocVar <- function(var, cntxt) {
     cenv <- cntxt$env
     info <- findCenvVar(var, cenv)
     ! is.null(info) && info$ftype == "local"
 }
 @ %def findLocVar
 [[findFunDef]] returns a function definition if one is available for
 the specified name and [[NULL]] otherwise.
 <<[[findFunDef]] function>>=
 ## **** should this check for local functions as well?
 findFunDef <- function(fun, cntxt) {
     cenv <- cntxt$env
     info <- findCenvVar(fun, cenv)
     if (! is.null(info$value) && is.function(info$value$value))
         info$value$value
     else
         NULL
 }
 @ %def findFunDef


 \subsection{Identifying possible local variables}
 For the compiler to be able to know that it can optimize a reference
 to a particular global function or variable it needs to be able to
 determine that that variable will not be shadowed by a local
 definition at runtime.  R semantics do not allow certain
 identification of local variables.  If a function body consists of the
 two lines
 \begin{verbatim}
 if (x) y <- 1
 y
 \end{verbatim}
 then whether the variable [[y]] in the second line is local or global
 depends on the value of [[x]].  Lazy evaluation of arguments also
 means that whether and when an assignment in a function argument
 occurs can be uncertain.
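
 The dependence on the value of [[x]] can be seen directly in the
 interpreter.  The names [[f]] and the string values below are chosen
 only for illustration:
 \begin{verbatim}
 y <- "global"
 f <- function(x) {
     if (x) y <- "local"      # creates a local y only when x is TRUE
     y
 }
 f(TRUE)                      # "local"
 f(FALSE)                     # "global"
 \end{verbatim}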

 The approach taken by the compiler is to conservatively identify all
 variables that might be created within an expression, such as a
 function body, and consider those to be potentially local variables
 that inhibit optimizations.  This ignores runtime creation of new
 variables, but as already mentioned that is generally not good
 programming practice.

 Variables are created by the assignment operators [[<-]] and [[=]] and
 by [[for]] loops.  In addition, calls to [[assign]] and
 [[delayedAssign]] with a literal character name argument are
 considered to create potential local variables if the environment
 argument is missing, which means the assignment is in the current
 environment.

 A simple approach for identifying all local variables created within
 an expression is given by
 <<findlocals0>>=
 findLocals0 <- function(e, cntxt) {
     if (typeof(e) == "language") {
         if (typeof(e[[1]]) %in% c("symbol", "character"))
             switch(as.character(e[[1]]),
                    <<[[findLocals0]] switch clauses>>
                    findLocalsList0(e[-1], cntxt))
          else findLocalsList0(e, cntxt)
     }
     else character(0)
 }

 findLocalsList0 <- function(elist, cntxt)
     unique(unlist(lapply(elist, findLocals0, cntxt)))
 @ %def findLocals0 findLocalsList0

 For assignment expressions the assignment variable is added to any
 variables found in the value expression.
 <<[[findLocals0]] switch clauses>>=
 "=" =,
 "<-" = unique(c(getAssignedVar(e, cntxt),
                 findLocalsList0(e[-1], cntxt))),
 @ %def
 The assigned variable is determined by [[getAssignedVar]]:
 <<[[getAssignedVar]] function>>=
 getAssignedVar <- function(e, cntxt) {
     v <- e[[2]]
     if (missing(v))
         cntxt$stop(gettextf("bad assignment: %s", pasteExpr(e)), cntxt)
     else if (typeof(v) %in% c("symbol", "character"))
         as.character(v)
     else {
         while (typeof(v) == "language") {
             if (length(v) < 2)
                 cntxt$stop(gettextf("bad assignment: %s", pasteExpr(e)), cntxt)
             v <- v[[2]]
             if (missing(v))
                 cntxt$stop(gettextf("bad assignment: %s", pasteExpr(e)), cntxt)
         }
         if (typeof(v) != "symbol")
             cntxt$stop(gettextf("bad assignment: %s", pasteExpr(e)), cntxt)
         as.character(v)
     }
 }
 @ %def getAssignedVar

 For [[for]] loops the loop variable is added to any variables found in
 the sequence and body expressions.
 <<[[findLocals0]] switch clauses>>=
 "for" = unique(c(as.character(e[2]),
                  findLocalsList0(e[-2], cntxt))),
 @ %def

 The variable in [[assign]] and [[delayedAssign]] expressions is
 considered local if it is an explicit character string and there is no
 environment argument.
 <<[[findLocals0]] switch clauses>>=
 "delayedAssign" =,
 "assign" = if (length(e) == 3 &&
                is.character(e[[2]]) &&
                length(e[[2]]) == 1)
                c(e[[2]], findLocals0(e[[3]], cntxt))
            else findLocalsList0(e[1], cntxt),
 @ %def

 Variables defined within local functions created by [[function]]
 expressions do not shadow globals within the containing expression and
 therefore [[function]] expressions do not contribute any new local
 variables. Similarly, [[local]] calls without an environment argument
 create a new environment for evaluating their expression and do not
 add new local variables.  If an environment argument is present then
 this might be the current environment and so assignments in the
 expression are considered to create possible local variables.
 Finally, [[~]], [[expression]], and [[quote]] do not
 evaluate their arguments and so do not contribute new local variables.
 <<[[findLocals0]] switch clauses>>=
 "function" = character(0),
 "~" = character(0),
 "local" = if (length(e) == 2) character(0)
           else findLocalsList0(e[-1], cntxt),
 "expression" =,
 "quote" = character(0),
 @ %def findLocals0
 Other functions, for example [[Quote]] from the [[methods]] package,
 are also known to not evaluate their arguments but these do not often
 contain assignment expressions and so ignoring them only slightly
 increases the degree of conservatism in this approach.

 A problem with this simple implementation is that it assumes that all
 of the functions named in the [[switch]] correspond to the bindings in
 the base package.  This is reasonable for the ones that are
 syntactically special, but not for [[expression]], [[local]] and
 [[quote]].  These might be shadowed by local definitions in a
 surrounding function.  To allow for this we can add an optional
 variable [[shadowed]] for providing a character vector of names of
 variables with shadowing local definitions.

 The more sophisticated implementation is also slightly optimized to avoid
 recursive calls. Instead of searching through the full transitive closure of
 language objects, [[findLocals1]] now examines only the top level of its
 argument and returns the subexpressions that remain to be searched. The
 variables found are stored in an environment, which avoids some extra calls
 and ensures that each variable is listed at most once.
 <<[[findLocals1]] function>>=
 addVar <- function(v, vars) assign(v, 1, envir = vars)
 findLocals1 <- function(e, shadowed = character(0), cntxt, vars) {
     if (typeof(e) == "language") {
         if (typeof(e[[1]]) %in% c("symbol", "character")) {
             v <- as.character(e[[1]])
             switch(v,
                    <<[[findLocals1]] switch clauses>>
                    e[-1])
         }
         else e
     }
     else NULL
 }
 @ %def findLocals1
 %% **** merge into single chunk??
 <<[[findLocalsList1]] function>>=
 findLocalsList1 <- function(elist, shadowed, cntxt) {
     todo <- elist
     vars <- new.env()
     while(length(todo) > 0) {
         newtodo <- list()
         lapply(todo, function(e)
             lapply(findLocals1(e, shadowed, cntxt, vars),
                    function(x)
 		       if (typeof(x) == "language")
 		           newtodo <<- append(newtodo, x))
         )
 	todo <- newtodo
     }
     ls(vars, all.names = TRUE)
 }

 @ %def findLocalsList1
 The handling of assignment operators, [[for]] loops, [[function]] and
 [[~]] expressions is analogous to the approach in [[findLocals0]].
 <<[[findLocals1]] switch clauses>>=
 "=" =,
 "<-" = { addVar(getAssignedVar(e, cntxt), vars); e[-1] },

 "for" = { addVar(as.character(e[2]), vars); e[-2] },

 "delayedAssign" =,
 "assign" = if (length(e) == 3 &&
                is.character(e[[2]]) &&
                length(e[[2]]) == 1) {

                addVar(e[[2]], vars); list(e[[3]])
            }
            else e[1],
 "function" = character(0),
 "~" = character(0),
 @ %def
 The rules for ignoring assignments in [[local]], [[expression]], and
 [[quote]] calls are only applied if there are no shadowing
 definitions.
 <<[[findLocals1]] switch clauses>>=
 "local" = if (! v %in% shadowed && length(e) == 2)
               NULL
           else e[-1],
 "expression" =,
 "quote" = if (! v %in% shadowed)
               NULL
           else e[-1],
 @ %def
 The assignment functions could also be shadowed, but this is not very
 common, and assuming that they are not errs in the conservative
 direction.

 This approach can handle the case where [[quote]] or one of the other
 non-syntactic functions is shadowed by an outer definition but does not
 handle assignments that occur in the expression itself.  For example,
 in
 \begin{verbatim}
 function (f, x, y) {
     local <- f
     local(x <- y)
     x
 }
 \end{verbatim}
 the reference to [[x]] in the third line has to be considered
 potentially local.  To deal with this multiple passes are needed.  The
 first pass assumes that [[expression]], [[local]] or [[quote]] might
 be shadowed by local assignments.  If no assignments to some of them
 are visible, then a second pass can be used in which they are assumed
 not to be shadowed.  This can be iterated to convergence.  It is also
 useful to check before returning whether any of the syntactically
 special variables has been assigned to.  If so, a warning is
 issued.
 %% **** look into speeding up findLocalsList
 <<[[findLocalsList]] function>>=
 findLocalsList <- function(elist, cntxt) {
     initialShadowedFuns <- c("expression", "local", "quote")
     shadowed <- Filter(function(n) ! isBaseVar(n, cntxt), initialShadowedFuns)
     specialSyntaxFuns <- c("~", "<-", "=", "for", "function")
     sf <- initialShadowedFuns
     nsf <- length(sf)
     repeat {
         vals <- findLocalsList1(elist, sf, cntxt)
         redefined <- sf %in% vals
         last.nsf <- nsf
         sf <- unique(c(shadowed, sf[redefined]))
         nsf <- length(sf)
         ## **** need to fix the termination condition used in codetools!!!
         if (last.nsf == nsf) {
             rdsf <- vals %in% specialSyntaxFuns
             if (any(rdsf))
                 ## cannot get location info (source reference) here
                 notifyAssignSyntacticFun(vals[rdsf], cntxt)
             return(vals)
         }
     }
 }
 @ %def findLocalsList
 <<[[findLocals]] function>>=
 findLocals <- function(e, cntxt)
     findLocalsList(list(e), cntxt)
 @ %def findLocals
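
 The following sketch exercises the analysis on a few expressions.  It
 assumes that [[findLocals]], [[makeCenv]], and the context
 constructor [[make.toplevelContext]] are reachable as internal objects
 of the installed [[compiler]] package and that the default
 optimization level is in effect; results may differ in other versions.
 \begin{verbatim}
 ## Sketch only: relies on internal, unexported functions.
 cenv <- compiler:::makeCenv(globalenv())
 cntxt <- compiler:::make.toplevelContext(cenv)

 ## Assignments and loop variables are collected, even conditional ones.
 v1 <- compiler:::findLocals(
     quote({ if (x) y <- 1; for (i in 1:3) z <- i }), cntxt)
 v1

 ## A local() call with a single argument hides its assignments ...
 v2 <- compiler:::findLocals(quote({ local(x <- y); x }), cntxt)
 v2

 ## ... unless local itself may have been redefined in the expression.
 v3 <- compiler:::findLocals(
     quote({ local <- f; local(x <- y); x }), cntxt)
 v3
 \end{verbatim}
 In the last case the first pass reports [[local]] as assigned, so it
 stays in the shadowed set and [[x]] is reported as a possible local
 variable.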

 Standard definitions for all functions in [[initialShadowedFuns]] are
 in the base package and [[isBaseVar]] checks the compilation
 environment to see whether the specified variable's definition comes
 from that package either via a namespace or the global environment.
 <<[[isBaseVar]] function>>=
 isBaseVar <- function(var, cntxt) {
     info <- getInlineInfo(var, cntxt)
     (! is.null(info) &&
      (identical(info$frame, .BaseNamespaceEnv) ||
       identical(info$frame, baseenv())))
 }
 @ %def isBaseVar
 The use of [[getInlineInfo]], defined in Section \ref{sec:inlining}, means
 that the setting of the [[optimize]] compiler option will influence whether
 a variable should be considered to be from the base package or not.
 It might also be useful to warn about assignments to other functions.

 When a [[function]] expression is compiled, its body and default
 arguments need to be compiled using a compilation environment that
 contains a new frame for the function that contains variables for the
 arguments and any assignments in the body and the default expressions.
 [[funEnv]] creates such an environment.
 <<[[funEnv]] function>>=
 ## augment compiler environment with function args and locals
 funEnv <- function(forms, body, cntxt) {
     cntxt$env <- addCenvFrame(cntxt$env, names(forms))
     locals <- findLocalsList(c(forms, body), cntxt)
     addCenvVars(cntxt$env, locals)
 }
 @ %def funEnv


 \section{The inlining mechanism}
 \label{sec:inlining}
 To allow for inline coding of calls to some functions the [[cmpCall]]
 function calls the [[tryInline]] function.  The [[tryInline]] function
 will either generate code for the call and return [[TRUE]], or it will
 decline to do so and return [[FALSE]], in which case the standard code
 generation process for a function call is used.

 The function [[tryInline]] calls [[getInlineInfo]] to determine
 whether inlining is permissible given the current environment and
 optimization settings.  There are four possible optimization levels:
 \begin{description}
 \item[Level 0:] No inlining.
 \item[Level 1:] Functions in the base packages found through a
   namespace that are not shadowed by function arguments or visible
   local assignments may be inlined.
 \item[Level 2:] In addition to the inlining permitted by Level 1,
   functions that are syntactically special or are considered core
   language functions and are found via the global environment at
   compile time may be inlined. Other functions in the base packages
   found via the global environment may be inlined with a guard that
   ensures at runtime that the inlined function has not been masked;
   if it has, then the call is handled by the AST interpreter.
 \item[Level 3:] Any function in the base packages found via the global
   environment may be inlined.
   %% **** should there be an explicit list of functions where inlining
   %% **** is OK here??
 \end{description}
 The syntactically special and core language functions are
 <<[[languageFuns]] definition>>=
 languageFuns <- c("^", "~", "<", "<<-", "<=", "<-", "=", "==", ">", ">=",
                   "|", "||", "-", ":", "!", "!=", "/", "(", "[", "[<-", "[[",
                   "[[<-", "{", "@", "$", "$<-", "*", "&", "&&", "%/%", "%*%",
                   "%%", "+",
                   "::", ":::", "@<-",
                   "break", "for", "function", "if", "next", "repeat", "while",
                   "local", "return", "switch")
 @ %def languageFuns
 %% **** local, return, and switch are dubious here
 %% **** if we allow them, should we also allow a few others, like .Internal?
 The default optimization level is Level 2. Future versions of the
 compiler may allow some functions to be explicitly excluded from
 inlining and may provide a means for allowing user-defined functions
 to be declared eligible for inlining.
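
 The effect of the optimization level and of visible shadowing can be
 explored with the exported functions [[cmpfun]] and [[disassemble]].
 The functions [[f]] and [[g]] below are made up for illustration.
 \begin{verbatim}
 library(compiler)

 f <- function(x) x + 1
 f0 <- cmpfun(f, options = list(optimize = 0))   # no inlining
 f2 <- cmpfun(f, options = list(optimize = 2))   # default level
 c(f0(1), f2(1))              # both agree with the interpreter
 ## The generated code can be compared with disassemble(f0) and
 ## disassemble(f2).

 ## A visible local assignment to + prevents inlining of the call.
 g <- cmpfun(function(x) {
     `+` <- function(a, b) "shadowed"
     x + 1
 })
 g(1)                         # "shadowed"
 \end{verbatim}
 Whatever the level, compiled code is expected to return the same
 results as the interpreter; the levels only change how much the
 compiler may assume about base functions.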

 If inlining is permissible then the result returned by
 [[getInlineInfo]] contains the package associated with the specified
 variable in the current environment. The variable name and package are
 then looked up in a data base of handlers.  If a handler is found then
 the handler is called.  The handler can either generate code and
 return [[TRUE]] or decline to and return [[FALSE]].  If inlining is
 not possible then [[getInlineInfo]] returns [[NULL]] and [[tryInline]]
 returns [[FALSE]].
 %% **** think about adding GETNSFUN to use when inlining is OK
 <<[[tryInline]] function>>=
 tryInline <- function(e, cb, cntxt) {
     name <- as.character(e[[1]])
     info <- getInlineInfo(name, cntxt, guardOK = TRUE)
     if (is.null(info))
         FALSE
     else {
         h <- getInlineHandler(name, info$package)
         if (! is.null(h)) {
             if (info$guard) {
 	        <<inline with a guard instruction>>
             }
             else h(e, cb, cntxt)
 	}
         else FALSE
     }
 }
 @ %def tryInline
 If a guard instruction is needed then an instruction is emitted that
 will check validity of the inlined function at runtime; if the inlined
 code is not valid the guard instruction will evaluate the call in the
 AST interpreter and jump over the inlined code. The inlined code is
 handled as a non-tail-call; if the call is in tail position, then a
 return instruction is emitted.
 <<inline with a guard instruction>>=
 tailcall <- cntxt$tailcall
 if (tailcall) cntxt$tailcall <- FALSE
 expridx <- cb$putconst(e)
 endlabel <- cb$makelabel()
 cb$putcode(BASEGUARD.OP, expridx, endlabel)
 if (! h(e, cb, cntxt))
     cmpCall(e, cb, cntxt, inlineOK = FALSE)
 cb$putlabel(endlabel)
 if (tailcall) cb$putcode(RETURN.OP)
 TRUE
 @
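
 The observable consequence of the guard is that masking a base
 function after compilation is still respected.  The sketch below uses
 only the exported [[cmpfun]]; whether a guard is actually emitted for
 this call depends on the optimization level and the installed version,
 but the results shown follow from R semantics in either case.
 \begin{verbatim}
 library(compiler)

 f <- cmpfun(function(x) sum(x))
 f(1:3)                       # 6

 sum <- function(...) "masked"   # mask base::sum after compilation
 f(1:3)                       # "masked"

 rm(sum)                      # remove the mask again
 f(1:3)                       # 6
 \end{verbatim}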

 The function [[getInlineInfo]] implements the optimization rules
 described at the beginning of this section.
 <<[[getInlineInfo]] function>>=
 noInlineSymbols <- c("standardGeneric")

 getInlineInfo <- function(name, cntxt, guardOK = FALSE) {
     optimize <- cntxt$optimize
     if (optimize > 0 && ! (name %in% noInlineSymbols)) {
         info <- findCenvVar(name, cntxt$env)
         if (is.null(info))
             NULL
         else {
             ftype <- info$ftype
             frame <- info$frame
             if (ftype == "namespace") {
                 <<fixup for a namespace import frame>>
                 info$package <- nsName(findHomeNS(name, frame, cntxt))
 		info$guard <- FALSE
                 info
             }
             else if (ftype == "global" &&
                      (optimize >= 3 ||
                       (optimize >= 2 && name %in% languageFuns))) {
                 info$package <- packFrameName(frame)
 		info$guard <- FALSE
                 info
             }
             else if (guardOK && ftype == "global" &&
                      packFrameName(frame) == "base") {
                 info$package <- packFrameName(frame)
                 info$guard <- TRUE
                 info
             }
             else NULL
         }
     }
     else NULL
 }
 @ %def getInlineInfo
 The code for finding the home namespace from a namespace import frame
 is needed here to deal with the fact that a namespace may not be
 registered when this function is called, so the mechanism used in
 [[findHomeNS]] to locate the namespace to which an import frame
 belongs may not work.
 <<fixup for a namespace import frame>>=
 if (! isNamespace(frame)) {
     ## should be the import frame of the current topenv
     top <- topenv(cntxt$env$env)
     if (! isNamespace(top) ||
         ! identical(frame, parent.env(top)))
         cntxt$stop(gettext("bad namespace import frame"))
     frame <- top
 }
 @

 For this version of the compiler the inline handler data base is
 managed as an environment in which handlers are entered and looked up
 by name.  For now it is assumed that a name can only appear associated
 with one package and an error is signaled if an attempt is made to
 redefine a handler for a given name for a different package than an
 existing definition.  This can easily be changed if it should prove too
 restrictive.
 %% **** note on haveInlineHandler
 %% **** allow same name in multiple packages?
 <<inline handler implementation>>=
 inlineHandlers <- new.env(hash = TRUE, parent = emptyenv())

 setInlineHandler <- function(name, h, package = "base") {
     if (exists(name, inlineHandlers, inherits = FALSE)) {
         entry <- get(name, inlineHandlers)
         if (entry$package != package) {
             fmt <- "handler for '%s' is already defined for another package"
             stop(gettextf(fmt, name), domain = NA)
         }
     }
     entry <- list(handler = h, package = package)
     assign(name, entry, inlineHandlers)
 }

 getInlineHandler <- function(name, package = "base") {
     if (exists(name, inlineHandlers, inherits = FALSE)) {
         hinfo <- get(name, inlineHandlers)
         if (hinfo$package == package)
             hinfo$handler
         else NULL
     }
     else NULL
 }

 haveInlineHandler <- function(name, package = "base") {
     if (exists(name, inlineHandlers, inherits = FALSE)) {
         hinfo <- get(name, inlineHandlers)
         package == hinfo$package
     }
     else FALSE
 }
 @ %def inlineHandlers getInlineHandler setInlineHandler haveInlineHandler
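
 A brief sketch of querying the handler data base, assuming the
 functions above are reachable as internal objects of the installed
 [[compiler]] package using [[:::]]:
 \begin{verbatim}
 ## Sketch only: relies on internal, unexported functions.
 h <- compiler:::getInlineHandler("{", "base")
 is.function(h)               # a handler is registered for base

 ## The same name looked up for another package yields nothing.
 compiler:::haveInlineHandler("{", "stats")
 is.null(compiler:::getInlineHandler("{", "stats"))

 ## Unknown names have no handler at all.
 is.null(compiler:::getInlineHandler("no.such.function.here"))
 \end{verbatim}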


 \section{Default inlining rules for primitives}
 This section defines generic handlers for [[BUILTIN]] and [[SPECIAL]]
 functions.  These are installed programmatically for all [[BUILTIN]]
 and [[SPECIAL]] functions.  The following sections present more
 specialized handlers for a range of functions that are installed in
 place of the default ones.
 <<install default inlining handlers>>=
 local({
     <<install default [[SPECIAL]] handlers>>
     <<install default [[BUILTIN]] handlers>>
 })
 @
 The handler installations are wrapped in a [[local]] call to reduce
 environment pollution.


 \subsection{[[BUILTIN]] functions}
 Calls to functions known at compile time to be of type [[BUILTIN]] can
 be handled more efficiently. The interpreter evaluates all arguments
 for [[BUILTIN]] functions before calling the function, so the compiler
 can evaluate the arguments in line without the need for creating
 promises.
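
 The distinction between the function types can be checked from R with
 [[typeof]].  The following lines illustrate it and show that a
 [[BUILTIN]] function receives evaluated arguments whereas a
 [[SPECIAL]] function need not evaluate them:
 \begin{verbatim}
 typeof(sum)                  # "builtin"
 typeof(`+`)                  # "builtin"
 typeof(quote)                # "special"
 typeof(`if`)                 # "special"
 typeof(mean)                 # "closure"

 ## A BUILTIN evaluates its argument before the call, so this fails:
 r1 <- tryCatch(sum(stop("evaluated")),
                error = function(e) "argument was evaluated")
 r1

 ## A SPECIAL such as quote leaves its argument unevaluated:
 r2 <- quote(stop("never evaluated"))
 is.call(r2)
 \end{verbatim}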

 A generic handler for inlining a call to a [[BUILTIN]] function is
 provided by [[cmpBuiltin]].  For now, the handler returns [[FALSE]] if
 the call contains missing arguments, which are currently not allowed
 in [[BUILTIN]] functions, or [[...]]  arguments.  The handling of
 [[...]] arguments should be improved.
 %% **** look into improving handling ... arguments to BUILTINs?
 For [[BUILTIN]] functions the function to call is pushed on the stack
 with the [[GETBUILTIN]] instruction.  The [[internal]] argument allows
 [[cmpBuiltin]] to be used with [[.Internal]] functions of type
 [[BUILTIN]] as well; this is used in the handler for [[.Internal]]
 defined in Section \ref{subsec:.Internal}.
 <<[[cmpBuiltin]] function>>=
 cmpBuiltin <- function(e, cb, cntxt, internal = FALSE) {
     fun <- e[[1]]
     args <- e[-1]
     names <- names(args)
     if (dots.or.missing(args))
         FALSE
     else {
         ci <- cb$putconst(fun)
         if (internal)
             cb$putcode(GETINTLBUILTIN.OP, ci)
         else
             cb$putcode(GETBUILTIN.OP, ci)
         cmpBuiltinArgs(args, names, cb, cntxt)
         ci <- cb$putconst(e)
         cb$putcode(CALLBUILTIN.OP, ci)
         if (cntxt$tailcall) cb$putcode(RETURN.OP)
         TRUE
     }
 }
 @ %def cmpBuiltin

 Argument evaluation code is generated by [[cmpBuiltinArgs]].  In the
 context of [[BUILTIN]] functions missing arguments are currently not
 allowed.  But to allow [[cmpBuiltinArgs]] to be used in other contexts
 missing arguments are supported if the optional argument [[missingOK]]
 is [[TRUE]].
 %% **** should this warn/stop if there are missings and missingOK is FALSE??
 %% **** can this be adjusted so error messages match the interpreter?
 %% **** for f <- function(x, y) x + y compare errors for f(1,) and cmpfun(f)(1,)
 %% **** test code for constant folding is needed (sym and non-sym)
 <<[[cmpBuiltinArgs]] function>>=
 cmpBuiltinArgs <- function(args, names, cb, cntxt, missingOK = FALSE) {
     ncntxt <- make.argContext(cntxt)
     for (i in seq_along(args)) {
         a <- args[[i]]
         n <- names[[i]]
         <<compile missing [[BUILTIN]] argument>>
         ## **** handle ... here ??
         <<signal an error for promise or bytecode argument>>
         <<compile a general [[BUILTIN]] argument>>
     }
 }
 @ %def cmpBuiltinArgs

 Missing argument code is generated by
 <<compile missing [[BUILTIN]] argument>>=
 if (missing(a)) {
     if (missingOK) {
         cb$putcode(DOMISSING.OP)
         cmpTag(n, cb)
     }
     else
         cntxt$stop(gettext("missing arguments are not allowed"), cntxt,
                    loc = cb$savecurloc())
 }
 @
 The error case should not be reached as [[cmpBuiltinArgs]] should not
 be called with missing arguments unless [[missingOK]] is [[TRUE]].

 The code for general arguments handles symbols separately to allow for
 the case when missing values are acceptable.  Constant folding is
 tried first for symbols since symbol arguments are compiled with
 [[cmpSym]] rather than [[cmp]], so the constant folding code in
 [[cmp]] is not reached in this case.
 <<compile a general [[BUILTIN]] argument>>=
 else {
     if (is.symbol(a)) {
         ca <- constantFold(a, cntxt, loc = cb$savecurloc())
         if (is.null(ca)) {
             cmpSym(a, cb, ncntxt, missingOK)
             cb$putcode(PUSHARG.OP)
         }
         else
             cmpConstArg(ca$value, cb, cntxt)
     }
     else if (typeof(a) == "language") {
         cmp(a, cb, ncntxt)
         cb$putcode(PUSHARG.OP)
     }
     else
         cmpConstArg(a, cb, cntxt)
     cmpTag(n, cb)
 }
 @ %def
 Handling the constant case separately is not really necessary but
 makes the code a bit cleaner.

 Default handlers for all [[BUILTIN]] functions in the [[base]] package
 are installed programmatically by
 <<install default [[BUILTIN]] handlers>>=
 for (b in basevars[types == "builtin"])
     if (! haveInlineHandler(b, "base"))
         setInlineHandler(b, cmpBuiltin)
 @ %def


 \subsection{[[SPECIAL]] functions}
 Calls to functions known to be of type [[SPECIAL]] can also be
 compiled somewhat more efficiently by the [[cmpSpecial]] function:
 <<[[cmpSpecial]] function>>=
 cmpSpecial <- function(e, cb, cntxt) {
     fun <- e[[1]]
     if (typeof(fun) == "character")
         fun <- as.name(fun)
     ci <- cb$putconst(e)
     cb$putcode(CALLSPECIAL.OP, ci)
     if (cntxt$tailcall)
         cb$putcode(RETURN.OP)
     TRUE
 }
 @ %def cmpSpecial

 This handler is installed for all [[SPECIAL]] functions in the base
 package with
 <<install default [[SPECIAL]] handlers>>=
 basevars <- ls('package:base', all.names = TRUE)
 types <- sapply(basevars, function(n) typeof(get(n)))
 for (s in basevars[types == "special"])
     if (! haveInlineHandler(s, "base"))
         setInlineHandler(s, cmpSpecial)
 @ %def


 \section{Some simple inlining handlers}
 This section presents inlining handlers for a number of core primitive
 functions.  With these additions the compiler will begin to show some
 performance improvements.

 \subsection{The left brace sequencing function}
 The inlining handler for [[{]] needs to consider that a pair of braces
 [[{]] and [[}]] can surround zero, one, or more expressions.  A set
 of empty braces is equivalent to the constant [[NULL]].  If there is
 more than one expression, then all the values of all expressions other
 than the last are ignored.  These expressions are compiled in a
 no-value context (currently equivalent to a non-tail-call context),
 and then code is generated to pop their values off the stack.  The
 final expression is then compiled according to the context in which
 the braces expression occurs.
 <<inlining handler for left brace function>>=
 setInlineHandler("{", function(e, cb, cntxt) {
     n <- length(e)
     if (n == 1)
         cmp(NULL, cb, cntxt)
     else {
         sloc <- cb$savecurloc()
         bsrefs <- attr(e, "srcref")
         if (n > 2) {
             ncntxt <- make.noValueContext(cntxt)
             for (i in 2 : (n - 1)) {
                 subexp <- e[[i]]
                 cb$setcurloc(subexp, getBlockSrcref(bsrefs, i))
                 cmp(subexp, cb, ncntxt, setloc = FALSE)
                 cb$putcode(POP.OP)
             }
         }
         subexp <- e[[n]]
         cb$setcurloc(subexp, getBlockSrcref(bsrefs, n))
         cmp(subexp, cb, cntxt, setloc = FALSE)
         cb$restorecurloc(sloc)
     }
     TRUE
 })
 @ %def
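
 The semantics the handler has to reproduce can be checked on compiled
 functions.  The functions [[f]] and [[g]] are illustrative only:
 \begin{verbatim}
 library(compiler)

 f <- cmpfun(function() {})
 is.null(f())                 # empty braces are equivalent to NULL

 g <- cmpfun(function() { x <- 1; y <- x + 1; y * 10 })
 g()                          # 20, the value of the last expression
 \end{verbatim}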


 \subsection{The closure constructor function}
 Compiling of [[function]] expressions is somewhat similar to compiling
 promises for function arguments.  The body of a function is compiled
 into a separate byte code object and stored in the constant pool
 together with the formals.  Then code is emitted for creating a
 closure from the formals, compiled body, and the current environment.
 For now, only the body of functions is compiled, not the
 default argument expressions.  This should be changed in future
 versions of the compiler.
 <<inlining handler for [[function]]>>=
 setInlineHandler("function", function(e, cb, cntxt) {
     forms <- e[[2]]
     body <- e[[3]]
     sref <- e[[4]]
     ncntxt <- make.functionContext(cntxt, forms, body)
     if (mayCallBrowser(body, cntxt))
         return(FALSE)
     cbody <- genCode(body, ncntxt, loc = cb$savecurloc())
     ci <- cb$putconst(list(forms, cbody, sref))
     cb$putcode(MAKECLOSURE.OP, ci)
     if (cntxt$tailcall) cb$putcode(RETURN.OP)
     TRUE
 })
 @ %def
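 The handler relies on the layout of a [[function]] expression as
 produced by the parser.  As a small sketch of that layout, the code
 below inspects a quoted expression; the variable names are
 illustrative only.
 \begin{verbatim}
 e <- quote(function(x, y = 2) x + y)

 ## The call has four elements: the symbol `function`, the formals,
 ## the body, and a source reference slot (NULL when source
 ## references are not being kept).
 parts <- as.list(e)
 types <- vapply(parts[1:3], typeof, character(1))

 stopifnot(length(e) == 4,
           identical(types, c("symbol", "pairlist", "language")))
 \end{verbatim}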


 \subsection{The left parenthesis function}
 In R an expression of the form [[(expr)]] is interpreted as a call to
 the function [[(]] with the argument [[expr]].  Parentheses are used
 to guide the parser, and for the most part [[(expr)]] is equivalent to
 [[expr]]. There are two exceptions:
 \begin{itemize}
 \item Since [[(]] is a function an expression of the form [[(...)]] is
   legal whereas just [[...]] may not be, depending on the context.  A
   runtime error will occur unless the [[...]] argument expands to
   exactly one non-missing argument.
 \item In tail position a call to [[(]] sets the visible flag to
   [[TRUE]].  So at top level for example the result of an assignment
   expression [[x <- 1]] would not be printed, but the result of
   [[(x <- 1)]] would be printed.  It is not clear that this feature really
   needs to be preserved within functions --- it could be made a
   feature of the read-eval-print loop --- but for now it is a feature
   of the interpreter that the compiler should preserve.
 \end{itemize}
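
 The effect on the visible flag can be observed with [[withVisible]].
 This is a sketch assuming the [[compiler]] package is available; the
 function names [[quiet]] and [[loud]] are hypothetical.
 \begin{verbatim}
 library(compiler)

 quiet <- cmpfun(function() x <- 1)     # assignment: invisible result
 loud  <- cmpfun(function() (x <- 1))   # parentheses: visible result

 stopifnot(!withVisible(quiet())$visible,
           withVisible(loud())$visible)
 \end{verbatim}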

 The inlining handler for calls to [[(]] handles the case of a [[...]]
 argument, or of an argument count other than one, as a generic
 [[BUILTIN]] call.  If the expression is in tail position then the
 argument is compiled in a non-tail-call context, a [[VISIBLE]]
 instruction is emitted to set the visible flag to [[TRUE]], and a
 [[RETURN]] instruction is emitted.  If the expression is in non-tail
 position, then code for the argument is generated in the current context.
 <<inlining handler for [[(]]>>=
 setInlineHandler("(", function(e, cb, cntxt) {
     if (any.dots(e))
         cmpBuiltin(e, cb, cntxt) ## punt
     else if (length(e) != 2) {
         notifyWrongArgCount("(", cntxt, loc = cb$savecurloc())
         cmpBuiltin(e, cb, cntxt) ## punt
     }
     else if (cntxt$tailcall) {
         ncntxt <- make.nonTailCallContext(cntxt)
         cmp(e[[2]], cb, ncntxt)
         cb$putcode(VISIBLE.OP)
         cb$putcode(RETURN.OP)
         TRUE
     }
     else {
         cmp(e[[2]], cb, cntxt)
         TRUE
     }
 })
 @ %def


 \subsection{The [[.Internal]] function}
 \label{subsec:.Internal}
 One frequently used [[SPECIAL]] function is [[.Internal]]. When the
 [[.Internal]] function called is of type [[BUILTIN]] it is useful to
 compile the call as for a [[BUILTIN]] function.  For [[.Internal]]
 functions of type [[SPECIAL]] there is less of an advantage, and so
 the [[.Internal]] expression is compiled with [[cmpSpecial]].  It may
 be useful to introduce a [[GETINTLSPECIAL]] instruction and handle
 these analogously to [[.Internal]] functions of type [[BUILTIN]].  The
 handler is assigned to the variable [[cmpDotInternalCall]] to allow
 its use in inlining.
 %% **** look into adding GETINTLSPECIAL??
 <<inlining handler for [[.Internal]]>>=
 cmpDotInternalCall <- function(e, cb, cntxt) {
     ee <- e[[2]]
     sym <- ee[[1]]
     if (.Internal(is.builtin.internal(sym)))
         cmpBuiltin(ee, cb, cntxt, internal = TRUE)
     else
         cmpSpecial(e, cb, cntxt)
 }

 setInlineHandler(".Internal", cmpDotInternalCall)
 @ %def cmpDotInternalCall


 \subsection{The [[local]] function}
 While [[local]] is currently implemented as a closure, because of its
 importance relative to local variable determination it is a good idea
 to inline it as well. The current semantics are such that the
 interpreter treats
 \begin{verbatim}
 local(expr)
 \end{verbatim}
 essentially the same as
 \begin{verbatim}
 (function() expr)()
 \end{verbatim}
 There may be some minor differences related to what the [[sys.xyz]]
 functions return. An instance of this was found in the [[RefManageR]]
 package which used [[parent.frame(2)]] to access the environment from which
 [[local]] was invoked. In this case, the use of [[parent.frame]] was
 unnecessary (and [[local]] was not needed either); the maintainer
 accepted a patch fixing this. The code pattern in the package was
 \begin{verbatim}
 MakeBibLaTeX <- function(docstyle = "text") local({
   docstyle <- get("docstyle", parent.frame(2))
   sortKeys <- function() 42
   environment()
 })
 \end{verbatim}
 and the suggested fix was
 \begin{verbatim}
 MakeBibLaTeX <- function(docstyle = "text") {
   sortKeys <- function() 42
   environment()
 }
 \end{verbatim}
 So the compiler handles one-argument [[local]] calls by making this
 conversion and compiling the result.
 %% **** add to language manual?
 <<inlining handler for [[local]] function>>=
 setInlineHandler("local", function(e, cb, cntxt) {
     if (length(e) == 2) {
         ee <- as.call(list(as.call(list(
             as.name("function"), NULL, e[[2]], NULL))))
         cmp(ee, cb, cntxt)
         TRUE
     }
     else FALSE
 })
 @ %def
 The interpreter could, and probably should, be modified to handle this
 case of a [[local]] call expression in the same way as the compiler.
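
 The equivalence that the conversion relies on can be illustrated as
 follows.  This is only a sketch with made-up variable names; it does
 not exercise the [[sys.xyz]] differences mentioned above.
 \begin{verbatim}
 x <- "outer"

 ## Both forms evaluate the expression in a fresh environment whose
 ## parent is the current one, so the outer x is left untouched.
 a <- local({ x <- "inner"; x })
 b <- (function() { x <- "inner"; x })()

 stopifnot(identical(a, b), identical(x, "outer"))
 \end{verbatim}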

 \subsection{The [[return]] function}
 \label{subsec:return}
 A call to [[return]] causes a return from the associated function
 call, as determined by the lexical context in which the [[return]]
 expression is defined.  If the [[return]] is captured in a closure and
 is executed within a callee then this requires a [[longjmp]].  A
 [[longjmp]] is also needed if the [[return]] call occurs within a loop
 that is compiled to a separate code object to support a [[setjmp]] for
 [[break]] or [[next]] calls.  The [[RETURNJMP]] instruction is
 provided for this purpose.  In all other cases an ordinary [[RETURN]]
 instruction can be used.
 %% **** if function body code was tagged as such then changing from
 %% **** RETURNJMP to RETURN could be done by post-processing the
 %% **** bytecode or by putcode
 Calls to [[return]] with a [[...]] argument, which may be legal if
 [[...]] contains only one argument, and calls with missing arguments
 or more than one argument, which will produce runtime errors, are
 compiled as generic [[SPECIAL]] calls.
 <<inlining handler for [[return]] function>>=
 setInlineHandler("return", function(e, cb, cntxt) {
     if (dots.or.missing(e) || length(e) > 2)
         cmpSpecial(e, cb, cntxt) ## **** punt for now
     else {
         if (length(e) == 1)
             val <- NULL
         else
             val <- e[[2]]
         ncntxt <- make.nonTailCallContext(cntxt)
         cmp(val, cb, ncntxt)
         if (cntxt$needRETURNJMP)
             cb$putcode(RETURNJMP.OP)
         else
             cb$putcode(RETURN.OP)
     }
     TRUE
 })
 @
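 A case in which the [[return]] call is evaluated within a callee can
 be sketched as follows, assuming the [[compiler]] package is
 available.  The functions [[f]] and [[g]] are hypothetical.
 \begin{verbatim}
 library(compiler)

 ## g forces its argument, so return(1) is evaluated while the call
 ## to g is active.
 g <- function(x) { x; "from g" }

 ## The return belongs lexically to f, so it returns from f, not g.
 f <- function() { g(return(1)); "not reached" }

 stopifnot(identical(f(), 1), identical(cmpfun(f)(), 1))
 \end{verbatim}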


 \section{Branching and labels}
 The code generated so far is straight line code without conditional or
 unconditional branches.  To implement conditional evaluation
 constructs and loops we need to add conditional and unconditional
 branching instructions.  These make use of the labels mechanism
 provided by the code buffer.


 \subsection{Inlining handler for [[if]] expressions}
 Using the labels mechanism we can implement an inlining handler for
 [[if]] expressions.  The first step extracts the components of the
 expression.  An [[if]] expression with no [[else]] clause will
 invisibly return [[NULL]] if the test is [[FALSE]], but the visible
 flag setting only matters if the [[if]] expression is in tail
 position.  So the case of no [[else]] clause will be handled slightly
 differently in tail and non-tail contexts.
 %% **** In no value contexts it would be good to avoid pushing and
 %% **** immediately popping constants. Alternatively a peephole optimizer
 %% **** could clean these up.
 %% **** Should there be error checking for either two or three arguments here??
 <<[[if]] inline handler body>>=
 test <- e[[2]]
 then.expr <- e[[3]]
 if (length(e) == 4) {
     have.else.expr <- TRUE
     else.expr <- e[[4]]
 }
 else have.else.expr <- FALSE
 @ %def

 To deal with use of [[if (FALSE) ...]] for commenting out code and of
 [[if (is.R()) ... else ...]] for handling both R and Splus code it is
 useful to attempt to constant-fold the test.  If this succeeds and
 produces either [[TRUE]] or [[FALSE]] then only the appropriate branch
 is compiled and the handler returns [[TRUE]].
 <<[[if]] inline handler body>>=
 ct <- constantFold(test, cntxt, loc = cb$savecurloc())
 if (! is.null(ct) && is.logical(ct$value) && length(ct$value) == 1
     && ! is.na(ct$value)) {
     if (ct$value)
         cmp(then.expr, cb, cntxt)
     else if (have.else.expr)
         cmp(else.expr, cb, cntxt)
     else if (cntxt$tailcall) {
         cb$putcode(LDNULL.OP)
         cb$putcode(INVISIBLE.OP)
         cb$putcode(RETURN.OP)
     }
     else cb$putcode(LDNULL.OP)
     return(TRUE)
 }
 @

 Next, the test code is compiled, a label for the start of code for the
 [[else]] clause is generated, and a conditional branch instruction
 that branches to the [[else]] label if the test fails is emitted.
 This is followed by code for the consequent (test is [[TRUE]])
 expression.  The [[BRIFNOT]] instruction takes two operands, the
 constant pool index for the call and the label to branch to if the
 value on the stack is [[FALSE]].  The call is used if an error needs
 to be signaled for an improper test result on the stack.
 <<[[if]] inline handler body>>=
 ncntxt <- make.nonTailCallContext(cntxt)
 cmp(test, cb, ncntxt)
 callidx <- cb$putconst(e)
 else.label <- cb$makelabel()
 cb$putcode(BRIFNOT.OP, callidx, else.label)
 cmp(then.expr, cb, cntxt)
 @

 The code for the alternative [[else]] expression will be placed after
 the code for the consequent expression.  If the [[if]] expression
 appears in tail position then the code for the consequent will end with
 a [[RETURN]] instruction and there is no need to jump over the
 following instructions for the [[else]] expression. All that is needed
 is to record the value of the label for the [[else]] clause and to
 emit the code for the [[else]] clause.  If no [[else]] clause was
 provided then that code arranges for the value [[NULL]] to be returned
 invisibly.
 <<[[if]] inline handler body>>=
 if (cntxt$tailcall) {
     cb$putlabel(else.label)
     if (have.else.expr)
         cmp(else.expr, cb, cntxt)
     else {
         cb$putcode(LDNULL.OP)
         cb$putcode(INVISIBLE.OP)
         cb$putcode(RETURN.OP)
     }
 }
 @ %def
 On the other hand, if the [[if]] expression is not in tail position
 then a label for the next instruction after the [[else]] expression
 code is needed, and the consequent expression code needs to end with a
 [[GOTO]] instruction to that label.  If the expression does not
 include an [[else]] clause then the alternative code just places
 [[NULL]] on the stack.
 <<[[if]] inline handler body>>=
 else {
     end.label <- cb$makelabel()
     cb$putcode(GOTO.OP, end.label)
     cb$putlabel(else.label)
     if (have.else.expr)
         cmp(else.expr, cb, cntxt)
     else
         cb$putcode(LDNULL.OP)
     cb$putlabel(end.label)
 }
 @ %def

 The resulting handler definition is
 <<inlining handler for [[if]]>>=
 setInlineHandler("if", function(e, cb, cntxt) {
     ## **** test for missing, ...
     <<[[if]] inline handler body>>
     TRUE
 })
 @ %def
 %% **** need some assembly code examples??
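 The behavior the handler has to reproduce can be checked from R.  The
 sketch below assumes the [[compiler]] package is available and uses
 the made-up function names [[f]] and [[h]].
 \begin{verbatim}
 library(compiler)

 ## No else clause and a FALSE test: NULL is returned invisibly.
 f <- cmpfun(function(x) if (x) "yes")
 r <- withVisible(f(FALSE))
 stopifnot(is.null(r$value), !r$visible, identical(f(TRUE), "yes"))

 ## A test that is the constant FALSE selects the else branch.
 h <- cmpfun(function() if (FALSE) stop("not run") else "kept")
 stopifnot(identical(h(), "kept"))
 \end{verbatim}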


 \subsection{Inlining handlers for [[&&]] and [[||]] expressions}
 In many languages it is possible to convert the expression [[a && b]]
 to an equivalent [[if]] expression of the form
 \begin{verbatim}
 if (a) { if (b) TRUE else FALSE }
 \end{verbatim}
 Similarly, in these languages the expression [[a || b]] is equivalent
 to
 \begin{verbatim}
 if (a) TRUE else if (b) TRUE else FALSE
 \end{verbatim}
 Compilation of these expressions is thus reduced to compiling [[if]]
 expressions.

 Unfortunately, because of the possibility of [[NA]] values, these
 equivalencies do not hold in R. In R, [[NA || TRUE]] should evaluate
 to [[TRUE]] and [[NA && FALSE]] to [[FALSE]].  This is handled by
 introducing special instructions [[AND1ST]] and [[AND2ND]] for [[&&]]
 expressions and [[OR1ST]] and [[OR2ND]] for [[||]].
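
 The difference between the operators and the [[if]] based rewriting
 can be seen in a short example; the variable name [[rewritten]] is
 illustrative only.
 \begin{verbatim}
 ## NA does not prevent && and || from producing a definite answer
 ## when the other argument decides the result ...
 stopifnot(identical(NA || TRUE, TRUE),
           identical(NA && FALSE, FALSE),
           is.na(NA && TRUE))

 ## ... but the if based rewriting of NA || TRUE signals an error.
 rewritten <- tryCatch(if (NA) TRUE else if (TRUE) TRUE else FALSE,
                       error = function(e) "error")
 stopifnot(identical(rewritten, "error"))
 \end{verbatim}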

 The code generator for [[&&]] expressions generates code to evaluate
 the first argument and then emits an [[AND1ST]] instruction. The
 [[AND1ST]] instruction has two operands, the constant pool index for
 the call and the label for the instruction following code for the
 second argument.  If the value on the stack
 produced by the first argument is [[FALSE]] then [[AND1ST]] jumps to
 the label and skips evaluation of the second argument; the value of
 the expression is [[FALSE]].  The code for the second argument is
 generated next, followed by an [[AND2ND]] instruction.  This removes
 the values of the two arguments to [[&&]] from the stack and pushes
 the value of the expression onto the stack.  A [[RETURN]] instruction
 is generated if the [[&&]] expression was in tail position.
 %% **** check over all uses of argContext vs nonTailCallContext
 %% **** The first argument can use nonTailCallCOntext because nothing
 %% **** is on the stack yet.  The second one has to use argContext.
 %% **** This wouldn't be an issue if break/next could reset the stack
 %% **** before the jump.
 <<inlining handler for [[&&]]>>=
 setInlineHandler("&&", function(e, cb, cntxt) {
     ## **** arity check??
     ncntxt <- make.argContext(cntxt)
     callidx <- cb$putconst(e)
     label <- cb$makelabel()
     cmp(e[[2]], cb, ncntxt)
     cb$putcode(AND1ST.OP, callidx, label)
     cmp(e[[3]], cb, ncntxt)
     cb$putcode(AND2ND.OP, callidx)
     cb$putlabel(label)
     if (cntxt$tailcall)
         cb$putcode(RETURN.OP)
     TRUE
 })
 @ %def

 The code generator for [[||]] expressions is analogous.
 <<inlining handler for [[||]]>>=
 setInlineHandler("||", function(e, cb, cntxt) {
     ## **** arity check??
     ncntxt <- make.argContext(cntxt)
     callidx <- cb$putconst(e)
     label <- cb$makelabel()
     cmp(e[[2]], cb, ncntxt)
     cb$putcode(OR1ST.OP, callidx, label)
     cmp(e[[3]], cb, ncntxt)
     cb$putcode(OR2ND.OP, callidx)
     cb$putlabel(label)
     if (cntxt$tailcall)
         cb$putcode(RETURN.OP)
     TRUE
 })
 @ %def


 \section{Loops}
 \label{sec:loops}
 In principle code for [[repeat]] and [[while]] loops can be generated
 using just [[GOTO]] and [[BRIFNOT]] instructions.  [[for]] loops
 require a little more to manage the loop variable and termination.  A
 complication arises due to the need to support [[break]] and [[next]]
 calls in the context of lazy evaluation of arguments: if a [[break]]
 or [[next]] expression appears in a function argument that is compiled
 as a closure, then the expression may be evaluated deep inside a
 series of nested function calls and require a non-local jump.  A
 similar issue arises for calls to the [[return]] function as described
 in Section \ref{subsec:return}.
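
 A [[break]] that needs such a non-local jump can be sketched as
 follows, assuming the [[compiler]] package is available.  The
 function [[count]] is hypothetical; [[force]] is used only because it
 evaluates its argument.
 \begin{verbatim}
 library(compiler)

 count <- function() {
     n <- 0
     repeat {
         n <- n + 1
         ## break is an argument of force, so it is evaluated lazily
         ## while the call to force is active.
         if (n >= 3) force(break)
     }
     n
 }

 stopifnot(count() == 3, cmpfun(count)() == 3)
 \end{verbatim}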

 To support these non-local jumps the interpreter sets up a [[setjmp]]
 context for each loop, and [[break]] and [[next]] use [[longjmp]] to
 transfer control.  In general, compiled loops need to use a similar
 approach.  For now, this is implemented by the [[STARTLOOPCNTXT]] and
 [[ENDLOOPCNTXT]] instructions.  The [[STARTLOOPCNTXT]] instruction
 takes two operands, a flag indicating whether the loop is a [[for]]
 loop or not, and a label which points after the loop. The interpreter
 jumps to this label in case of a non-local jump implementing
 [[break]].  The loop body should end with a call to [[ENDLOOPCNTXT]],
 which takes one operand indicating whether this is a [[for]] loop or
 not.  [[ENDLOOPCNTXT]] terminates the context established by
 [[STARTLOOPCNTXT]] and pops it off the context stack.  The context
 data is stored on the byte code interpreter stack; in the case of a
 [[for]] loop some loop state information is duplicated on the stack by
 [[STARTLOOPCNTXT]] and removed again by [[ENDLOOPCNTXT]]. The byte
 code interpreter stores the [[pc]] in a slot in the [[RCNTXT]]
 structure so it is available after a [[longjmp]] triggered by a
 [[break]] for retrieving the label on the [[ENDLOOPCNTXT]]
 instruction.  An alternative would be to add separate
 [[STARTFORLOOPCNTXT]] and [[ENDFORLOOPCNTXT]] instructions. Then the
 [[pc]] or the label could be stored on the node stack.

 At least with some assumptions it is often possible to implement
 [[break]] and [[next]] calls as simple [[GOTO]]s.  If all [[break]]
 and [[next]] calls in a loop can be implemented using [[GOTO]]s then
 the loop context is not necessary.  The mechanism to enable the
 simpler code generation is presented in Section
 \ref{subsec:skipcntxt}.

 The current engine implementation executes one [[setjmp]] per
 [[STARTLOOPCNTXT]] and uses nested calls to [[bceval]] to run the
 code.  Eventually we should be able to reduce the need for nested
 [[bceval]] calls and to arrange that [[setjmp]] buffers be reused for
 multiple purposes.


 \subsection{[[repeat]] loops}
 The simplest loop in R is the [[repeat]] loop.  The code generator is
 defined as
 <<inlining handler for [[repeat]] loops>>=
 setInlineHandler("repeat", function(e, cb, cntxt) {
     body <- e[[2]]
     <<generate context and body for [[repeat]] loop>>
     <<generate [[repeat]] and [[while]] loop wrap-up code>>
     TRUE
 })
 @ %def

 If a loop context is not needed then the code for the loop body is
 just written to the original code buffer.  The [[else]] clause in the
 code chunk below generates the code for the general case. The need for
 using [[RETURNJMP]] for [[return]] calls is indicated by setting the
 [[needRETURNJMP]] flag in the compiler context to [[TRUE]].
 <<generate context and body for [[repeat]] loop>>=
 if (checkSkipLoopCntxt(body, cntxt))
     cmpRepeatBody(body, cb, cntxt)
 else {
     cntxt$needRETURNJMP <- TRUE ## **** do this a better way
     ljmpend.label <- cb$makelabel()
     cb$putcode(STARTLOOPCNTXT.OP, 0, ljmpend.label)
     cmpRepeatBody(body, cb, cntxt)
     cb$putlabel(ljmpend.label)
     cb$putcode(ENDLOOPCNTXT.OP, 0)
 }
 @ %def

 The loop body uses two labels. [[loop.label]] marks the top of the
 loop and is the target of the [[GOTO]] instruction at the end of the
 body.  This label is also used by [[next]] expressions that do not
 require [[longjmp]]s.  The [[end.label]] label is placed after the
 [[GOTO]] instruction and is used by [[break]] expressions that do not
 require [[longjmp]]s.  The body is compiled in a context that makes
 these labels available, and the value left on the stack is removed by
 a [[POP]] instruction.  The [[POP]] instruction is followed by a
 [[GOTO]] instruction that returns to the top of the loop.
 <<[[cmpRepeatBody]] function>>=
 cmpRepeatBody <- function(body, cb, cntxt) {
     loop.label <- cb$makelabel()
     end.label <- cb$makelabel()
     cb$putlabel(loop.label)
     lcntxt <- make.loopContext(cntxt, loop.label, end.label)
     cmp(body, cb, lcntxt)
     cb$putcode(POP.OP)
     cb$putcode(GOTO.OP, loop.label)
     cb$putlabel(end.label)
 }
 @ %def cmpRepeatBody


 The wrap-up code for the loop places the [[NULL]] value of the loop
 expression on the stack and emits [[INVISIBLE]] and [[RETURN]]
 instructions to return the value if the loop appears in tail position.
 <<generate [[repeat]] and [[while]] loop wrap-up code>>=
 cb$putcode(LDNULL.OP)
 if (cntxt$tailcall) {
     cb$putcode(INVISIBLE.OP)
     cb$putcode(RETURN.OP)
 }
 @ %def
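 The value and visibility produced by the wrap-up code can be checked
 with [[withVisible]].  This is a sketch assuming the [[compiler]]
 package is available; [[f]] and [[g]] are made-up names.
 \begin{verbatim}
 library(compiler)

 f <- cmpfun(function() repeat break)
 g <- cmpfun(function() { i <- 0; while (i < 3) i <- i + 1 })

 ## Both loops are in tail position and yield an invisible NULL.
 rf <- withVisible(f())
 rg <- withVisible(g())
 stopifnot(is.null(rf$value), !rf$visible,
           is.null(rg$value), !rg$visible)
 \end{verbatim}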

 The [[break]] and [[next]] code generators emit [[GOTO]] instructions
 if the loop information is available and the [[gotoOK]] compiler
 context flag is [[TRUE]].  A warning is issued if no loop is visible
 in the compilation context.
 <<inlining handlers for [[next]] and [[break]]>>=
 setInlineHandler("break", function(e, cb, cntxt) {
     if (is.null(cntxt$loop)) {
         notifyWrongBreakNext("break", cntxt, loc = cb$savecurloc())
         cmpSpecial(e, cb, cntxt)
     }
     else if (cntxt$loop$gotoOK) {
         cb$putcode(GOTO.OP, cntxt$loop$end)
         TRUE
     }
     else cmpSpecial(e, cb, cntxt)
 })

 setInlineHandler("next", function(e, cb, cntxt) {
     if (is.null(cntxt$loop)) {
         notifyWrongBreakNext("next", cntxt, loc = cb$savecurloc())
         cmpSpecial(e, cb, cntxt)
     }
     else if (cntxt$loop$gotoOK) {
         cb$putcode(GOTO.OP, cntxt$loop$loop)
         TRUE
     }
     else cmpSpecial(e, cb, cntxt)
 })
 @ %def


 \subsection{[[while]] loops}
 %% could just compile repeat{ if (condition) body else break } ??
 The structure for the [[while]] loop code generator is similar to the
 structure of the [[repeat]] code generator:
 <<inlining handler for [[while]] loops>>=
 setInlineHandler("while", function(e, cb, cntxt) {
     cond <- e[[2]]
     body <- e[[3]]
     <<generate context and body for [[while]] loop>>
     <<generate [[repeat]] and [[while]] loop wrap-up code>>
     TRUE
 })
 @ %def
 The context and body generation chunk is similar as well. The
 expression stored in the code object isn't quite right as what is
 compiled includes both the test and the body, but this code object
 should not be externally visible.
 <<generate context and body for [[while]] loop>>=
 if (checkSkipLoopCntxt(cond, cntxt) && checkSkipLoopCntxt(body, cntxt))
     cmpWhileBody(e, cond, body, cb, cntxt)
 else {
     cntxt$needRETURNJMP <- TRUE ## **** do this a better way
     ljmpend.label <- cb$makelabel()
     cb$putcode(STARTLOOPCNTXT.OP, 0, ljmpend.label)
     cmpWhileBody(e, cond, body, cb, cntxt)
     cb$putlabel(ljmpend.label)
     cb$putcode(ENDLOOPCNTXT.OP, 0)
 }
 @ %def

 Again two labels are used, one at the top of the loop and one at the
 end.  The [[loop.label]] is followed by code for the test.  Next is a
 [[BRIFNOT]] instruction that jumps to the end of the loop if the value
 left on the stack by the test is [[FALSE]].  This is followed by the
 code for the body, a [[POP]] instruction, and a [[GOTO]] instruction
 that jumps to the top of the loop. Finally, the [[end.label]] is
 recorded.
 <<[[cmpWhileBody]] function>>=
 cmpWhileBody <- function(call, cond, body, cb, cntxt) {
     loop.label <- cb$makelabel()
     end.label <- cb$makelabel()
     cb$putlabel(loop.label)
     lcntxt <- make.loopContext(cntxt, loop.label, end.label)
     cmp(cond, cb, lcntxt)
     callidx <- cb$putconst(call)
     cb$putcode(BRIFNOT.OP, callidx, end.label)
     cmp(body, cb, lcntxt)
     cb$putcode(POP.OP)
     cb$putcode(GOTO.OP, loop.label)
     cb$putlabel(end.label)
 }
 @ %def cmpWhileBody


 \subsection{[[for]] loops}
 %% could compile repeat { if (stepfor) body else break } and peephole a bit ??
 Code generation for [[for]] loops is a little more complex because of
 the need to manage the loop variable value and stepping through the
 sequence.  Code for [[for]] loops uses three additional instructions.
 [[STARTFOR]] takes the constant pool index of the call, the constant
 pool index of the loop variable symbol, and the label of the start
 instruction as operands. It finds the sequence to iterate over on the
 stack and places information for accessing the loop variable binding
 and stepping the sequence on the stack before jumping to the label.
 The call is used if an error for an improper for loop sequence needs
 to be signaled.  The [[STEPFOR]] instruction takes a label for the top
 of the loop as its operand.  If there are more elements in the
 sequence then [[STEPFOR]] advances the position within the sequence,
 sets the loop variable, and jumps to the top of the loop.  Otherwise
 it drops through to the next instruction.  Finally [[ENDFOR]] cleans
 up the loop information stored on the stack by [[STARTFOR]] and leaves
 the [[NULL]] loop value on the stack.
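
 Two consequences of this design are visible at the R level: the
 sequence is computed once before the loop starts, and the loop
 variable keeps its last value afterwards.  The following sketch
 assumes the [[compiler]] package is available; all names in it are
 illustrative.
 \begin{verbatim}
 library(compiler)

 f <- cmpfun(function() {
     x <- 1:3
     seen <- integer(0)
     for (i in x) {
         x <- 10:20            # does not affect the iteration
         seen <- c(seen, i)
     }
     list(seen = seen, last = i)
 })

 r <- f()
 stopifnot(identical(r$seen, 1:3), identical(r$last, 3L))
 \end{verbatim}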

 The inlining handler for a [[for]] loop starts out by checking the
 loop variable.  If it is not a symbol then the code generator
 declines to inline the loop expression; no warning is issued since
 the parser should not allow this.  The expression is then compiled
 as a generic function call and will signal an error at runtime.  An
 alternative would be to generate code to signal the error as is done
 with improper use of [[...]] arguments.  After checking the symbol,
 code to compute the sequence to iterate over is generated.  From then
 on the structure is similar to the structure of the other loop code
 generators.
 %% **** do cmpSpecial instead of returning FALSE??
 <<inlining handler for [[for]] loops>>=
 setInlineHandler("for", function(e, cb, cntxt) {
     sym <- e[[2]]
     seq <- e[[3]]
     body <- e[[4]]
     if (! is.name(sym)) {
         ## not worth warning here since the parser should not allow this
         return(FALSE)
     }
     ncntxt <- make.nonTailCallContext(cntxt)
     cmp(seq, cb, ncntxt)
     ci <- cb$putconst(sym)
     callidx <- cb$putconst(e)
     <<generate context and body for [[for]] loop>>
     <<generate [[for]] loop wrap-up code>>
     TRUE
 })
 @ %def

 When a [[setjmp]] context is needed, the label given to [[STARTFOR]]
 is just the following instruction, which is a [[STARTLOOPCNTXT]]
 instruction.  If the context is not needed then the label for the
 [[STARTFOR]] instruction will be the loop's [[STEPFOR]] instruction;
 if the context is needed then the first instruction in the code object
 for the body will be a [[GOTO]] instruction that jumps to the
 [[STEPFOR]] instruction.  This design means the stepping and the jump
 can be handled by one instruction instead of two, a step instruction
 and a [[GOTO]].
 <<generate context and body for [[for]] loop>>=
 if (checkSkipLoopCntxt(body, cntxt))
     cmpForBody(callidx, body, ci, cb, cntxt)
 else {
     cntxt$needRETURNJMP <- TRUE ## **** do this a better way
     ctxt.label <- cb$makelabel()
     cb$putcode(STARTFOR.OP, callidx, ci, ctxt.label)
     cb$putlabel(ctxt.label)
     ljmpend.label <- cb$makelabel()
     cb$putcode(STARTLOOPCNTXT.OP, 1, ljmpend.label)
     cmpForBody(NULL, body, NULL, cb, cntxt)
     cb$putlabel(ljmpend.label)
     cb$putcode(ENDLOOPCNTXT.OP, 1)
 }
 @ %def

 The body code generator takes two additional arguments, the constant
 pool indices of the call and of the loop variable symbol.  For the
 case where a [[setjmp]] context is needed these arguments are
 [[NULL]], and the first instruction generated is a [[GOTO]] targeting
 the [[STEPFOR]] instruction.  This is labeled by the [[loop.label]]
 label, since this will also be the target used by a [[next]]
 expression. An additional label, [[body.label]], is needed for the
 top of the loop, which is used by [[STEPFOR]] if there are more loop
 elements to process.  When the [[ci]] argument is not [[NULL]] code
 is being generated for the case without a [[setjmp]] context, and the
 first instruction is the [[STARTFOR]] instruction which initializes
 the loop and jumps to [[loop.label]] at the [[STEPFOR]] instruction.
 <<[[cmpForBody]] function>>=
 cmpForBody <- function(callidx, body, ci, cb, cntxt) {
     body.label <- cb$makelabel()
     loop.label <- cb$makelabel()
     end.label <- cb$makelabel()
     if (is.null(ci))
         cb$putcode(GOTO.OP, loop.label)
     else
         cb$putcode(STARTFOR.OP, callidx, ci, loop.label)
     cb$putlabel(body.label)
     lcntxt <- make.loopContext(cntxt, loop.label, end.label)
     cmp(body, cb, lcntxt)
     cb$putcode(POP.OP)
     cb$putlabel(loop.label)
     cb$putcode(STEPFOR.OP, body.label)
     cb$putlabel(end.label)
 }
 @ %def cmpForBody

 The wrap-up code issues an [[ENDFOR]] instruction instead of the
 [[LDNULL]] instruction used for [[repeat]] and [[while]] loops.
 <<generate [[for]] loop wrap-up code>>=
 cb$putcode(ENDFOR.OP)
 if (cntxt$tailcall) {
     cb$putcode(INVISIBLE.OP)
     cb$putcode(RETURN.OP)
 }
 @ %def


 \subsection{Avoiding runtime loop contexts}
 \label{subsec:skipcntxt}
 When all uses of [[break]] or [[next]] in a loop occur only in top
 level contexts then all [[break]] and [[next]] calls can be
 implemented with simple [[GOTO]] instructions and a [[setjmp]] context
 for the loop is not needed. Top level contexts are the loop body
 itself and argument expressions in top level calls to [[if]], [[{]],
 and [[(]].  The [[switch]] function will eventually be included as well.
 %% **** need to add switch to the top level functions
 %% **** may not be OK if switch uses vmpSpecial because of ... arg
 The function [[checkSkipLoopCntxt]] recursively traverses an
 expression tree to determine whether all relevant uses of [[break]] or
 [[next]] are safe to compile as [[GOTO]] instructions. The search
 returns [[FALSE]] if a [[break]] or [[next]] call occurs in an unsafe
 place.  The search stops and returns [[TRUE]] for any expression that
 cannot contain relevant [[break]] or [[next]] calls.  These stop
 expressions are calls to the three loop functions and to [[function]].
 Calls to functions like [[quote]] that are known not to evaluate their
 arguments could also be included among the stop functions but this
 doesn't seem particularly worthwhile at this time. Loops that include a
 call to [[eval]] (or [[evalq]], [[source]]) are compiled with a loop context to support a
 programming pattern present e.g. in package [[Rmpi]]: a server application
 is implemented using an infinite loop, which evaluates de-serialized code
 received from the client; the server shuts down when it receives a
 serialized version of [[break]].
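
 A much reduced version of this pattern, with a list of quoted calls
 standing in for de-serialized messages, might look as follows.  The
 sketch assumes the [[compiler]] package is available; [[serve]] and
 [[msgs]] are hypothetical names and no actual serialization is
 involved.
 \begin{verbatim}
 library(compiler)

 serve <- function(msgs) {
     handled <- 0
     for (m in msgs) {
         eval(m)               # a message may be the call `break`
         handled <- handled + 1
     }
     handled
 }

 msgs <- list(quote(1 + 1), quote(2 + 2), quote(break), quote(3 + 3))
 stopifnot(serve(msgs) == 2, cmpfun(serve)(msgs) == 2)
 \end{verbatim}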

 The recursive checking function is defined as
 <<[[checkSkipLoopCntxt]] function>>=
 checkSkipLoopCntxt <- function(e, cntxt, breakOK = TRUE) {
     if (typeof(e) == "language") {
         fun <- e[[1]]
         if (typeof(fun) == "symbol") {
             fname <- as.character(fun)
             if (! breakOK && fname %in% c("break", "next"))
                 FALSE
             else if (isLoopStopFun(fname, cntxt))
                 TRUE
             else if (isLoopTopFun(fname, cntxt))
                 checkSkipLoopCntxtList(e[-1], cntxt, breakOK)
             else if (fname %in% c("eval", "evalq", "source"))
                 FALSE
             else
                 checkSkipLoopCntxtList(e[-1], cntxt, FALSE)
         }
         else
             checkSkipLoopCntxtList(e, cntxt, FALSE)
     }
     else TRUE
 }
 @ %def checkSkipLoopCntxt
 A version that operates on a list of expressions is given by
 <<[[checkSkipLoopCntxtList]] function>>=
 checkSkipLoopCntxtList <- function(elist, cntxt, breakOK) {
     for (a in as.list(elist))
         if (! missing(a) && ! checkSkipLoopCntxt(a, cntxt, breakOK))
             return(FALSE)
     TRUE
 }
 @ %def checkSkipLoopCntxtList

 The stop functions are identified by [[isLoopStopFun]].  This uses
 [[isBaseVar]] to ensure that interpreting a reference to a stop
 function name as referring to the corresponding function in the
 [[base]] package is permitted by the current optimization settings.
 %% **** could also stop for quote() and some others.
 <<[[isLoopStopFun]] function>>=
 isLoopStopFun <- function(fname, cntxt)
     (fname %in% c("function", "for", "while", "repeat") &&
      isBaseVar(fname, cntxt))
 @ %def isLoopStopFun

 The top level functions are identified by [[isLoopTopFun]].  Again the
 compilation context is consulted to ensure that the candidate can be
 assumed to be from the [[base]] package.
 %% **** eventually add "switch"
 <<[[isLoopTopFun]] function>>=
 isLoopTopFun <- function(fname, cntxt)
     (fname %in% c("(", "{", "if") &&
      isBaseVar(fname, cntxt))
 @ %def isLoopTopFun

 The [[checkSkipLoopCntxt]] function does not check whether calls to
 [[break]] or [[next]] are indeed calls to the [[base]] functions.
 Given the special syntactic nature of [[break]] and [[next]] this is
 very unlikely to cause problems, but if it does it will result in some
 safe loops being considered unsafe and so errs in the conservative
 direction.


 \section{More inlining}

 \subsection{Basic arithmetic expressions}
 The addition and subtraction functions [[+]] and [[-]] are [[BUILTIN]]
 functions that can both be called with one or two arguments.
 Multiplication and division functions [[*]] and [[/]] require two
 arguments.  Since code generation for all one-argument cases and all
 two-argument cases is very similar these are abstracted out into
 functions [[cmpPrim1]] and [[cmpPrim2]].

 The code generators for addition and subtraction are given by
 <<inline handlers for [[+]] and [[-]]>>=
 setInlineHandler("+", function(e, cb, cntxt) {
     if (length(e) == 3)
         cmpPrim2(e, cb, ADD.OP, cntxt)
     else
         cmpPrim1(e, cb, UPLUS.OP, cntxt)
 })

 setInlineHandler("-", function(e, cb, cntxt) {
     if (length(e) == 3)
         cmpPrim2(e, cb, SUB.OP, cntxt)
     else
         cmpPrim1(e, cb, UMINUS.OP, cntxt)
 })
 @ %def
 The code generators for multiplication and division are
 <<inline handlers for [[*]] and [[/]]>>=
 setInlineHandler("*", function(e, cb, cntxt)
     cmpPrim2(e, cb, MUL.OP, cntxt))

 setInlineHandler("/", function(e, cb, cntxt)
     cmpPrim2(e, cb, DIV.OP, cntxt))
 @ %def
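 The dispatch on [[length(e)]] in the handlers for [[+]] and [[-]]
 relies on the fact that a call object counts the function name as its
 first element.  A small illustration, not part of the compiler, is
 \begin{verbatim}
 unary <- quote(-x)
 binary <- quote(a - b)
 length(unary)    # 2: function name plus one argument
 length(binary)   # 3: function name plus two arguments
 \end{verbatim}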

 Code for instructions corresponding to calls to a [[BUILTIN]] function
 with one argument is generated by [[cmpPrim1]]. The generator
 produces code for a generic [[BUILTIN]] call using [[cmpBuiltin]] if
 there are any missing or [[...]] arguments or if the number of
 arguments is not equal to one.  Otherwise code for the argument is
 generated in a non-tail-call context, and the instruction provided as
 the [[op]] argument is emitted followed by a [[RETURN]] instruction
 for an expression in tail position. The [[op]] instructions take the
 call as operand for use in error messages and for internal dispatching.
 <<[[cmpPrim1]] function>>=
 cmpPrim1 <- function(e, cb, op, cntxt) {
     if (dots.or.missing(e[-1]))
         cmpBuiltin(e, cb, cntxt)
     else if (length(e) != 2) {
         notifyWrongArgCount(e[[1]], cntxt, loc = cb$savecurloc())
         cmpBuiltin(e, cb, cntxt)
     }
     else {
         ncntxt <- make.nonTailCallContext(cntxt)
         cmp(e[[2]], cb, ncntxt);
         ci <- cb$putconst(e)
         cb$putcode(op, ci)
         if (cntxt$tailcall)
             cb$putcode(RETURN.OP)
         TRUE
     }
 }
 @ %def cmpPrim1

 Code generation for the two argument case is similar, except that the
 second argument has to be compiled with an argument context since the
 stack already has the value of the first argument on it and that would
 need to be popped before a jump.
 <<[[cmpPrim2]] function>>=
 cmpPrim2 <- function(e, cb, op, cntxt) {
     if (dots.or.missing(e[-1]))
         cmpBuiltin(e, cb, cntxt)
     else if (length(e) != 3) {
         notifyWrongArgCount(e[[1]], cntxt, loc = cb$savecurloc())
         cmpBuiltin(e, cb, cntxt)
     }
     else {
         needInc <- checkNeedsInc(e[[3]], cntxt)
         ncntxt <- make.nonTailCallContext(cntxt)
         cmp(e[[2]], cb, ncntxt);
         if (needInc) cb$putcode(INCLNK.OP)
         ncntxt <- make.argContext(cntxt)
         cmp(e[[3]], cb, ncntxt)
         if (needInc) cb$putcode(DECLNK.OP)
         ci <- cb$putconst(e)
         cb$putcode(op, ci)
         if (cntxt$tailcall)
             cb$putcode(RETURN.OP)
         TRUE
     }
 }
 @ %def cmpPrim2

 The [[INCLNK]] and [[DECLNK]] instructions are used to protect
 evaluated arguments on the stack from modifications during evaluation
 of subsequent arguments. These instructions can be omitted if the
 subsequent argument evaluations cannot modify values on the stack.
 <<[[checkNeedsInc]] function>>=
 checkNeedsInc <- function(e, cntxt) {
     type <- typeof(e)
     if (type %in% c("language", "bytecode", "promise"))
         TRUE
     else FALSE ## symbols and constants
 }
 @ %def checkNeedsInc
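 As a rough illustration, the same test can be applied by hand to some
 possible second arguments.  The helper name [[needsInc]] is
 hypothetical and only mirrors the type check above.
 \begin{verbatim}
 needsInc <- function(arg)
     typeof(arg) %in% c("language", "bytecode", "promise")

 needsInc(quote(y))      # FALSE: a symbol, as in x + y
 needsInc(2)             # FALSE: a constant, as in x + 2
 needsInc(quote(f(y)))   # TRUE: a call, as in x + f(y)
 \end{verbatim}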

 Calls to the power function [[^]] and the functions [[exp]] and
 [[sqrt]] can be compiled using [[cmpPrim1]] and [[cmpPrim2]] as well:
 <<inline handlers for [[^]], [[exp]], and [[sqrt]]>>=
 setInlineHandler("^", function(e, cb, cntxt)
     cmpPrim2(e, cb, EXPT.OP, cntxt))

 setInlineHandler("exp", function(e, cb, cntxt)
     cmpPrim1(e, cb, EXP.OP, cntxt))

 setInlineHandler("sqrt", function(e, cb, cntxt)
     cmpPrim1(e, cb, SQRT.OP, cntxt))
 @

 The [[log]] function is currently defined as a [[SPECIAL]].  The
 default inline handler action is therefore to use [[cmpSpecial]]. For
 calls with one unnamed argument the [[LOG.OP]] instruction is
 used. For two unnamed arguments [[LOGBASE.OP]] is used. It might be
 useful to introduce instructions for [[log2]] and [[log10]] as well
 but this has not been done yet.
 <<inline handler for [[log]]>>=
 setInlineHandler("log", function(e, cb, cntxt) {
     if (dots.or.missing(e) || ! is.null(names(e)) ||
         length(e) < 2 || length(e) > 3)
         cmpSpecial(e, cb, cntxt)
     else {
         ci <- cb$putconst(e)
         ncntxt <- make.nonTailCallContext(cntxt)
         cmp(e[[2]], cb, ncntxt);
         if (length(e) == 2)
             cb$putcode(LOG.OP, ci)
         else {
             needInc <- checkNeedsInc(e[[3]], cntxt)
             if (needInc) cb$putcode(INCLNK.OP)
             ncntxt <- make.argContext(cntxt)
             cmp(e[[3]], cb, ncntxt)
             if (needInc) cb$putcode(DECLNK.OP)
             cb$putcode(LOGBASE.OP, ci)
         }
         if (cntxt$tailcall)
             cb$putcode(RETURN.OP)
         TRUE
     }
 })
 @

 A number of one argument math functions are handled by the interpreter
 using the function [[math1]] in [[arithmetic.c]]. The [[MATH1.OP]]
 instruction handles these for compiled code. The instruction takes two
 operands, an index for the call expression in the constant table, and
 an index for the function to be called in a table of function
 pointers. The table of names in the byte code compiler has to match
 the function pointer array in the byte code interpreter.  It would
 have been possible to use the same indices as the offset values used
 in [[names.c]], but keeping this consistent seemed more challenging.
 <<list of one argument math functions>>=
 ## Keep the order consistent with the order in the internal byte code
 ## interpreter!
 math1funs <- c("floor", "ceiling", "sign",
                "expm1", "log1p",
                "cos", "sin", "tan", "acos", "asin", "atan",
                "cosh", "sinh", "tanh", "acosh", "asinh", "atanh",
                "lgamma", "gamma", "digamma", "trigamma",
                "cospi", "sinpi", "tanpi")
 @ %def math1funs

 The code generation is done by [[cmpMath1]]:
 <<[[cmpMath1]] function>>=
 cmpMath1 <- function(e, cb, cntxt) {
     if (dots.or.missing(e[-1]))
         cmpBuiltin(e, cb, cntxt)
     else if (length(e) != 2) {
         notifyWrongArgCount(e[[1]], cntxt, loc = cb$savecurloc())
         cmpBuiltin(e, cb, cntxt)
     }
     else {
         name <- as.character(e[[1]])
         idx <- match(name, math1funs) - 1
         if (is.na(idx))
             cntxt$stop(
                 paste(sQuote(name), "is not a registered math1 function"),
                 cntxt, loc = cb$savecurloc())
         ncntxt <- make.nonTailCallContext(cntxt)
         cmp(e[[2]], cb, ncntxt);
         ci <- cb$putconst(e)
         cb$putcode(MATH1.OP, ci, idx)
         if (cntxt$tailcall)
             cb$putcode(RETURN.OP)
         TRUE
     }
 }
 @ %def cmpMath1
 The generators are installed by
 <<inline one argument math functions>>=
 for (name in math1funs)
     setInlineHandler(name, cmpMath1)
 @
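 The operand computed by [[cmpMath1]] is a zero-based position in the
 table.  A sketch using only the first three entries of [[math1funs]]
 (the variable [[tbl]] is introduced here for illustration):
 \begin{verbatim}
 tbl <- c("floor", "ceiling", "sign")
 match("floor", tbl) - 1       # 0
 match("sign", tbl) - 1        # 2
 is.na(match("foo", tbl) - 1)  # TRUE: not a registered function
 \end{verbatim}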

 \subsection{Logical operators}
 Two argument instructions are provided for the comparison operators
 and code for them can be generated using [[cmpPrim2]]:
 <<inline handlers for comparison operators>>=
 setInlineHandler("==", function(e, cb, cntxt)
    cmpPrim2(e, cb, EQ.OP, cntxt))

 setInlineHandler("!=", function(e, cb, cntxt)
    cmpPrim2(e, cb, NE.OP, cntxt))

 setInlineHandler("<", function(e, cb, cntxt)
    cmpPrim2(e, cb, LT.OP, cntxt))

 setInlineHandler("<=", function(e, cb, cntxt)
    cmpPrim2(e, cb, LE.OP, cntxt))

 setInlineHandler(">=", function(e, cb, cntxt)
    cmpPrim2(e, cb, GE.OP, cntxt))

 setInlineHandler(">", function(e, cb, cntxt)
    cmpPrim2(e, cb, GT.OP, cntxt))
 @ %def

 The vectorized [[&]] and [[|]] functions are handled similarly:
 <<inline handlers for [[&]] and [[|]]>>=
 setInlineHandler("&", function(e, cb, cntxt)
    cmpPrim2(e, cb, AND.OP, cntxt))

 setInlineHandler("|", function(e, cb, cntxt)
    cmpPrim2(e, cb, OR.OP, cntxt))
 @ %def

 The negation operator [[!]] takes only one argument and code for calls
 to it is generated using [[cmpPrim1]]:
 <<inline handler for [[!]]>>=
 setInlineHandler("!", function(e, cb, cntxt)
    cmpPrim1(e, cb, NOT.OP, cntxt))
 @ %def

 %% **** do log() somewhere around here?
 %% **** is log(x,) == log(x)???
 %% **** is log(,y) allowed?


 \subsection{Subsetting and related operations}
 \label{subsec:subset}
 Current R semantics are such that the subsetting operator [[[]] and a
 number of others may not evaluate some of their arguments if S3 or S4
 methods are available.  S-plus has different semantics---there the
 subsetting operator is guaranteed to evaluate its arguments.
 % In the case of the concatenation function [[c]] it is not clear
 % whether these semantics are worth preserving; changing [[c]] to a
 % [[BUILTIN]] seems to cause no problems on [[CRAN]] and [[BioC]]
 % packages tested.
 For subsetting there are [[CRAN]] packages that use non-standard
 evaluation of their arguments ([[igraph]] is one example), so this
 probably can no longer be changed.
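
 The effect of these semantics can be seen with a small S3 method that
 never evaluates its index argument.  The class name [[lazy]] and the
 variable names are made up for this example.
 \begin{verbatim}
 "[.lazy" <- function(x, i) deparse(substitute(i))
 obj <- structure(list(), class = "lazy")
 obj[not_defined_anywhere]   # "not_defined_anywhere", no error
 \end{verbatim}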

 The compiler preserves these semantics.  To do so subsetting is
 implemented in terms of two instructions, [[STARTSUBSET]] and
 [[DFLTSUBSET]].  The object being subsetted is evaluated and placed on
 the stack. [[STARTSUBSET]] takes a constant table index for the
 expression and a label operand as operands and examines the object on
 the stack.  If an internal S3 or S4 dispatch succeeds then the
 receiver object is removed and the result is placed on the stack and a
 jump to the label is carried out.  If the dispatch fails then code to
 evaluate the arguments and push them on the stack is executed,
 followed by a [[DFLTSUBSET]] instruction.
 This pattern is used for several other operations and is abstracted
 into the code generation function [[cmpDispatch]]. Code for subsetting
 and other operations is then generated by
 <<inlining handlers for some dispatching SPECIAL functions>>=
 # **** this is now handled differently; see "Improved subset ..."
 # setInlineHandler("[", function(e, cb, cntxt)
 #     cmpDispatch(STARTSUBSET.OP, DFLTSUBSET.OP, e, cb, cntxt))

 # **** c() is now a BUILTIN
 # setInlineHandler("c", function(e, cb, cntxt)
 #     cmpDispatch(STARTC.OP, DFLTC.OP, e, cb, cntxt, FALSE))

 # **** this is now handled differently; see "Improved subset ..."
 # setInlineHandler("[[", function(e, cb, cntxt)
 #     cmpDispatch(STARTSUBSET2.OP, DFLTSUBSET2.OP, e, cb, cntxt))
 @

 The [[cmpDispatch]] function takes the two opcodes as arguments.  It
 declines to handle cases with [[...]] arguments in the call or with a
 missing first argument --- these will be handled as calls to a
 [[SPECIAL]] primitive. For the case handled it generates code for the
 first argument, followed by the [[start.op]] instruction.  The
 operands for the [[start.op]] are a constant pool
 index for the expression and a label for the instruction following the
 [[dflt.op]] instruction that allows skipping over the default case
 code. The default case code consists of code to compute and push the
 arguments followed by the [[dflt.op]] instruction.
 <<[[cmpDispatch]] function>>=
 cmpDispatch <- function(start.op, dflt.op, e, cb, cntxt, missingOK = TRUE) {
     if ((missingOK && any.dots(e)) ||
         (! missingOK && dots.or.missing(e)) ||
         length(e) == 1)
         cmpSpecial(e, cb, cntxt) ## punt
     else {
         ne <- length(e)
         oe <- e[[2]]
         if (missing(oe))
             cmpSpecial(e, cb, cntxt) ## punt
         else {
             ncntxt <- make.argContext(cntxt)
             cmp(oe, cb, ncntxt)
             ci <- cb$putconst(e)
             end.label <- cb$makelabel()
             cb$putcode(start.op, ci, end.label)
             if (ne > 2)
                 cmpBuiltinArgs(e[-(1:2)], names(e)[-(1:2)], cb, cntxt,
                                missingOK)
             cb$putcode(dflt.op)
             cb$putlabel(end.label)
             if (cntxt$tailcall) cb$putcode(RETURN.OP)
             TRUE
         }
     }
 }
 @ %def cmpDispatch
 %% **** The implementation currently implies that arguments to things
 %% **** with S4 methods may be evaluated more than once if dispatch
 %% **** does not happen.  It would be better to rewrite this so if
 %% **** arguments are evaluated we stay with the interpreted version
 %% **** all the way.  This requires a bit of refactoring of
 %% **** DispatchOrEval code to get it to work. But it should not
 %% **** affect the compiler.
 %% ****
 %% **** There may be some merit to always go with the interpreted code
 %% **** if the receiver has the object bit set -- that way the
 %% **** sequence could be done as
 %% ****
 %% ****     if (object bit set)
 %% ****         CALLSPECIAL
 %% ****     else
 %% ****         do default thing
 %% ****
 %% **** and in some cases the object bit test can be hoisted.

 The [[$]] function is simpler to implement since its selector argument
 is never evaluated.  The [[DOLLAR]] instruction takes the object to
 extract a component from off the stack and takes a constant index
 argument specifying the selection symbol.
 %% signal warning if selector is not a symbol or a string??
 %% also decline if any missing args?
 <<inlining handler for [[$]]>>=
 setInlineHandler("$", function(e, cb, cntxt) {
     if (any.dots(e) || length(e) != 3)
         cmpSpecial(e, cb, cntxt)
     else {
         sym <- if (is.character(e[[3]]) && length(e[[3]]) == 1
                    && e[[3]] != "")
             as.name(e[[3]]) else e[[3]]
         if (is.name(sym)) {
             ncntxt <- make.argContext(cntxt)
             cmp(e[[2]], cb, ncntxt)
             ci <- cb$putconst(e)
             csi <- cb$putconst(sym)
             cb$putcode(DOLLAR.OP, ci, csi)
             if (cntxt$tailcall) cb$putcode(RETURN.OP)
             TRUE
         }
         else cmpSpecial(e, cb, cntxt)
     }
 })
 @ %def


 \subsection{Inlining simple [[.Internal]] functions}
 A number of functions are defined as simple wrappers around
 [[.Internal]] calls. One example is [[dnorm]], which is currently
 defined as
 \begin{verbatim}
 dnorm <- function (x, mean = 0, sd = 1, log = FALSE)
     .Internal(dnorm(x, mean, sd, log))
 \end{verbatim}
 The implementation of [[.Internal]] functions can be of type
 [[BUILTIN]] or [[SPECIAL]].  The [[dnorm]] implementation is of type
 [[BUILTIN]], so its arguments are guaranteed to be evaluated in order,
 and this particular function does not depend on the position of its
 calls in the evaluation stack. As a result, a call of the form
 \begin{verbatim}
 dnorm(2, 1)
 \end{verbatim}
 can be replaced by the call
 \begin{verbatim}
 .Internal(dnorm(2, 1, 1, FALSE))
 \end{verbatim}
 %% **** except for error messages maybe??
 This can result in considerable speed-up since it avoids the overhead
 of the call to the wrapper function.

 The substitution of a call to the wrapper with a [[.Internal]] call
 can be done by a function [[inlineSimpleInternalCall]] defined as
 <<[[inlineSimpleInternalCall]] function>>=
 inlineSimpleInternalCall <- function(e, def) {
     if (! dots.or.missing(e) && is.simpleInternal(def)) {
         forms <- formals(def)
         b <- body(def)
         if (typeof(b) == "language" && length(b) == 2 && b[[1]] == "{")
             b <- b[[2]]
         icall <- b[[2]]
         defaults <- forms ## **** could strip missings but OK not to?
         cenv <- c(as.list(match.call(def, e, FALSE))[-1], defaults)
         subst <- function(n)
             if (typeof(n) == "symbol") cenv[[as.character(n)]] else n
         args <- lapply(as.list(icall[-1]), subst)
         as.call(list(quote(.Internal), as.call(c(icall[[1]], args))))
     }
     else NULL
 }
 @ %def inlineSimpleInternalCall
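 The argument matching step can be tried in isolation.  In this sketch
 [[wrapper]] is a stand-in with the same formals as [[dnorm]]; it is
 not the real definition.
 \begin{verbatim}
 wrapper <- function(x, mean = 0, sd = 1, log = FALSE) NULL
 call <- quote(wrapper(2, sd = 3))
 cenv <- c(as.list(match.call(wrapper, call, FALSE))[-1],
           formals(wrapper))
 names(cenv)   # "x" "sd" "x" "mean" "sd" "log"
 cenv$sd       # 3: the supplied value precedes the default
 cenv$mean     # 0: only the default is available
 \end{verbatim}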

 Code for an inlined simple internal function can then be generated by
 [[cmpSimpleInternal]]:
 <<[[cmpSimpleInternal]] function>>=
 cmpSimpleInternal <- function(e, cb, cntxt) {
     if (any.dots(e))
         FALSE
     else {
         name <- as.character(e[[1]])
         def <- findFunDef(name, cntxt)
         if (! checkCall(def, e, NULL)) return(FALSE)
         call <- inlineSimpleInternalCall(e, def)
         if (is.null(call))
             FALSE
         else
             cmpDotInternalCall(call, cb, cntxt)
     }
 }
 @ %def cmpSimpleInternal

 <<inline safe simple [[.Internal]] functions from [[base]]>>=
 safeBaseInternals <- c("atan2", "besselY", "beta", "choose",
                        "drop", "inherits", "is.vector", "lbeta", "lchoose",
                        "nchar", "polyroot", "typeof", "vector", "which.max",
                        "which.min", "is.loaded", "identical",
                        "match", "rep.int", "rep_len")

 for (i in safeBaseInternals) setInlineHandler(i,  cmpSimpleInternal)
 @ %def safeBaseInternals

 %% **** nextn would also be OK with a broader definition of 'safe'
 <<inline safe simple [[.Internal]] functions from [[stats]]>>=
 safeStatsInternals <- c("dbinom", "dcauchy", "dgeom", "dhyper", "dlnorm",
                         "dlogis", "dnorm", "dpois", "dunif", "dweibull",
                         "fft", "mvfft", "pbinom", "pcauchy",
                         "pgeom", "phyper", "plnorm", "plogis", "pnorm",
                         "ppois", "punif", "pweibull", "qbinom", "qcauchy",
                         "qgeom", "qhyper", "qlnorm", "qlogis", "qnorm",
                         "qpois", "qunif", "qweibull", "rbinom", "rcauchy",
                         "rgeom", "rhyper", "rlnorm", "rlogis", "rnorm",
                         "rpois", "rsignrank",  "runif", "rweibull",
                         "rwilcox", "ptukey", "qtukey")

 for (i in safeStatsInternals) setInlineHandler(i,  cmpSimpleInternal, "stats")
 @ %def

 It is possible to automate the process of identifying functions with
 the simple wrapper form and with [[.Internal]] implementations of type
 [[BUILTIN]], and the function [[simpleInternals]] produces a list of
 such candidates for a given package on the search path.  But
 determining whether such a candidate can be safely inlined needs to be
 done manually.  Most can, but some, such as [[sys.call]], cannot since
 they depend on their position on the call stack (removing the wrapper
 call that the implementation expects would change the result).
 Nevertheless, [[simpleInternals]] is useful for providing a list of
 candidates to screen. The [[is.simpleInternal]] function can be used
 in test code to check that the assumption made in the compiler is
 valid.  The implementation is
 <<[[simpleInternals]] function>>=
 simpleInternals <- function(pos = "package:base") {
     names <- ls(pos = pos, all = TRUE)
     if (length(names) == 0)
         character(0)
     else {
         fn <-  function(n)
             is.simpleInternal(get(n, pos = pos))
         names[sapply(names, fn)]
     }
 }
 @ %def simpleInternals

 <<[[is.simpleInternal]] function>>=
 is.simpleInternal <- function(def) {
     if (typeof(def) == "closure" && simpleFormals(def)) {
         b <- body(def)
         if (typeof(b) == "language" && length(b) == 2 && b[[1]] == "{")
             b <- b[[2]]
         if (typeof(b) == "language" &&
             typeof(b[[1]]) == "symbol" &&
             b[[1]] == ".Internal") {
             icall <- b[[2]]
             ifun <- icall[[1]]
             typeof(ifun) == "symbol" &&
             .Internal(is.builtin.internal(as.name(ifun))) &&
             simpleArgs(icall, names(formals(def)))
         }
         else FALSE
     }
     else FALSE
 }
 @ %def is.simpleInternal

 <<[[simpleFormals]] function>>=
 simpleFormals <- function(def) {
     forms <- formals(def)
     if ("..." %in% names(forms))
         return(FALSE)
     for (d in as.list(forms)) {
         if (! missing(d)) {
             ## **** check constant folding
             if (typeof(d) %in% c("symbol", "language", "promise", "bytecode"))
                 return(FALSE)
         }
     }
     TRUE
 }
 @ %def simpleFormals

 <<[[simpleArgs]] function>>=
 simpleArgs <- function(icall, fnames) {
     for (a in as.list(icall[-1])) {
         if (missing(a))
             return(FALSE)
         else if (typeof(a) == "symbol") {
             if (! (as.character(a) %in% fnames))
                 return(FALSE)
         }
         else if (typeof(a) %in% c("language", "promise", "bytecode"))
             return(FALSE)
     }
     TRUE
 }
 @ %def simpleArgs


 \subsection{Inlining [[is.xyz]] functions}
 Most of the [[is.xyz]] functions in [[base]] are simple [[BUILTIN]]s
 that do not do internal dispatch.  They have simple instructions
 defined for them and are compiled in a common way.  [[cmpIs]] abstracts
 out the common compilation process.
 <<[[cmpIs]] function>>=
 cmpIs <- function(op, e, cb, cntxt) {
     if (any.dots(e) || length(e) != 2)
         cmpBuiltin(e, cb, cntxt)
     else {
         ## **** check that the function is a builtin somewhere??
         s <- make.argContext(cntxt)
         cmp(e[[2]], cb, s)
         cb$putcode(op)
         if (cntxt$tailcall) cb$putcode(RETURN.OP)
         TRUE
     }
 }
 @ %def cmpIs

 Inlining handlers are then defined by
 <<inlining handlers for [[is.xyz]] functions>>=
 setInlineHandler("is.character", function(e, cb, cntxt)
     cmpIs(ISCHARACTER.OP, e, cb, cntxt))
 setInlineHandler("is.complex", function(e, cb, cntxt)
     cmpIs(ISCOMPLEX.OP, e, cb, cntxt))
 setInlineHandler("is.double", function(e, cb, cntxt)
     cmpIs(ISDOUBLE.OP, e, cb, cntxt))
 setInlineHandler("is.integer", function(e, cb, cntxt)
     cmpIs(ISINTEGER.OP, e, cb, cntxt))
 setInlineHandler("is.logical", function(e, cb, cntxt)
     cmpIs(ISLOGICAL.OP, e, cb, cntxt))
 setInlineHandler("is.name", function(e, cb, cntxt)
      cmpIs(ISSYMBOL.OP, e, cb, cntxt))
 setInlineHandler("is.null", function(e, cb, cntxt)
     cmpIs(ISNULL.OP, e, cb, cntxt))
 setInlineHandler("is.object", function(e, cb, cntxt)
     cmpIs(ISOBJECT.OP, e, cb, cntxt))
 setInlineHandler("is.symbol", function(e, cb, cntxt)
     cmpIs(ISSYMBOL.OP, e, cb, cntxt))
 @ %def

 At present [[is.numeric]], [[is.matrix]], and [[is.array]] do internal
 dispatching so we just handle them as ordinary [[BUILTIN]]s.  It might
 be worth defining virtual machine instructions for them as well.


 \subsection{Inline handler for calling C functions}
 The [[.Call]] interface is now the preferred interface for calling C
 functions and is also used in base packages like [[stats]]. The
 [[DOTCALL.OP]] instruction allows these calls to be made without
 allocating a list of arguments---the arguments are accumulated on the
 stack. For now only 16 or fewer arguments are handled; more arguments,
 and cases with named arguments, are handled by the standard [[.Call]]
 [[BUILTIN]].
 <<inline handler for [[.Call]]>>=
 setInlineHandler(".Call", function(e, cb, cntxt) {
     nargsmax <- 16 ## should match DOTCALL_MAX in eval.c
     if (dots.or.missing(e[-1]) || ! is.null(names(e)) ||
         length(e) < 2 || length(e) > nargsmax + 2)
         cmpBuiltin(e, cb, cntxt) ## punt
     else {
         ncntxt <- make.nonTailCallContext(cntxt)
         cmp(e[[2]], cb, ncntxt);
         nargs <- length(e) - 2
         if (nargs > 0) {
             ncntxt <- make.argContext(cntxt)
             for (a in as.list(e[-(1:2)]))
                 cmp(a, cb, ncntxt);
         }
         ci <- cb$putconst(e)
         cb$putcode(DOTCALL.OP, ci, nargs)
         if (cntxt$tailcall)
             cb$putcode(RETURN.OP)
         TRUE
     }
 })
 @
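 The argument count test can be checked on a sample call.  The symbol
 [[C_fun]] is a placeholder for a native routine reference.
 \begin{verbatim}
 e <- quote(.Call(C_fun, a, b))
 length(e) - 2                              # 2 arguments for DOTCALL.OP
 is.null(names(e))                          # TRUE: handled inline
 is.null(names(quote(.Call(C_fun, k = a)))) # FALSE: left to the BUILTIN
 \end{verbatim}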


 \subsection{Inline handlers for generating integer sequences}
 The colon operator and the [[BUILTIN]] functions [[seq_along]] and
 [[seq_len]] generate sequences (the sequence might not be integers if
 long vectors are involved or the colon operator is given non-integer
 arguments). The [[COLON.OP]], [[SEQALONG.OP]], and [[SEQLEN.OP]]
 instructions implement these operations in byte code. This allows an
 implementation in which the result stored on the stack is not a fully
 realized sequence but only a recipe that the [[for]] loop, for
 example, can use to run the loop without generating the sequence.
 This is optionally implemented in the byte code interpreter. It would
 also be possible to allow the compact sequence representation to be
 stored in variables, etc., but this would require more extensive
 changes.
 <<inline handlers for integer sequences>>=
 setInlineHandler(":", function(e, cb, cntxt)
     cmpPrim2(e, cb, COLON.OP, cntxt))

 setInlineHandler("seq_along", function(e, cb, cntxt)
     cmpPrim1(e, cb, SEQALONG.OP, cntxt))

 setInlineHandler("seq_len", function(e, cb, cntxt)
     cmpPrim1(e, cb, SEQLEN.OP, cntxt))
 @
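 The remark about non-integer results is easy to confirm at top level:
 \begin{verbatim}
 typeof(1:3)           # "integer"
 typeof(1.5:3)         # "double": the values are 1.5 and 2.5
 typeof(seq_len(3))    # "integer"
 \end{verbatim}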

 \subsection{Inlining handlers for controlling warnings}
 The inlining handlers in this section do not actually affect code
 generation. Their purpose is to suppress warnings.

 Compiling calls to the [[::]] and [[:::]] functions without special
 handling would generate undefined variable warnings for the arguments.
 This is avoided by converting the arguments from symbols to strings,
 which these functions would do anyway at runtime, and then compiling
 the modified calls. The common process is handled by [[cmpMultiColon]].
 <<[[cmpMultiColon]] function>>=
 cmpMultiColon <- function(e, cb, cntxt) {
     if (! dots.or.missing(e) && length(e) == 3) {
         goodType <- function(a)
             typeof(a) == "symbol" ||
             (typeof(a) == "character" && length(a) == 1)
         fun <- e[[1]]
         x <- e[[2]]
         y <- e[[3]]
         if (goodType(x) && goodType(y)) {
             args <- list(as.character(x), as.character(y))
             cmpCallSymFun(fun, args, e, cb, cntxt)
             TRUE
         }
         else FALSE
     }
     else FALSE
 }
 @ %def cmpMultiColon
 Code generators are then registered by
 <<inlining handlers for [[::]] and [[:::]]>>=
 setInlineHandler("::", cmpMultiColon)
 setInlineHandler(":::", cmpMultiColon)
 @
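 The conversion is safe because [[::]] accepts strings as well as
 symbols.  For example, using [[base::sum]] purely as an illustration:
 \begin{verbatim}
 as.character(quote(base))                 # "base"
 identical(base::sum, `::`("base", "sum")) # TRUE
 \end{verbatim}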

 Calls to [[with]] will often generate spurious undefined variable warnings
 for variables appearing in the expression argument.  A crude approach
 is to compile the entire call with undefined variable warnings
 suppressed.
 <<inlining handler for [[with]]>>=
 setInlineHandler("with", function(e, cb, cntxt) {
     cntxt$suppressUndefined <- TRUE
     cmpCallSymFun(e[[1]], e[-1], e, cb, cntxt)
     TRUE
 })
 @

 A similar issue arises for [[require]], where an unquoted argument is
 often used.
 <<inlining handler for [[require]]>>=
 setInlineHandler("require", function(e, cb, cntxt) {
     cntxt$suppressUndefined <- TRUE
     cmpCallSymFun(e[[1]], e[-1], e, cb, cntxt)
     TRUE
 })
 @


 \section{The [[switch]] function}
 The [[switch]] function has somewhat awkward semantics that vary
 depending on whether the value of the first argument is a character
 string or is numeric.  For a string all or all but one of the
 alternatives must be named, and empty case arguments are allowed and
 result in falling through to the next non-empty case.  In the numeric
 case selecting an empty case produces an error.  If there is more than
 one alternative case and no cases are named then a character selector
 argument will produce an error, so one can assume that a numeric
 switch is intended.  But a [[switch]] with named arguments can be used
 with a numeric selector, so it is not in general possible to determine
 the intended type of the [[switch]] call from the structure of the
 call alone.  The compiled code therefore has to allow for both
 possibilities.
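
 These rules can be illustrated by a few calls evaluated by the
 interpreter:
 \begin{verbatim}
 switch("a", a = , b = "AB", "other")   # "AB": empty case falls through
 switch("z", a = , b = "AB", "other")   # "other": unnamed default
 switch(2, "x", "y")                    # "y"
 switch(2, a = "x", b = "y")            # "y": numeric selector, named cases
 is.null(switch(3, "x", "y"))           # TRUE: no case selected
 \end{verbatim}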

 The inlining handler goes through a number of steps collecting and
 processing information computed from the call and finally emits code
 for the non-empty alternatives.  If the [[switch]] expression appears
 in tail position then each alternative will end in a [[RETURN]]
 instruction.  If the call is not in tail position then each
 alternative will end with a [[GOTO]] that jumps to a label placed
 after the code for the final alternative.
 <<inline handler for [[switch]]>>=
 setInlineHandler("switch", function(e, cb, cntxt) {
     if (length(e) < 2 || any.dots(e))
         cmpSpecial(e, cb, cntxt)
     else {
         ## **** check name on EXPR, if any, partially matches EXPR?
         <<extract the [[switch]] expression components>>

         <<collect information on named alternatives>>

         <<create the labels>>

         <<create the map from names to labels for a character switch>>

         <<emit code for the [[EXPR]] argument>>

         <<emit the switch instruction>>

         <<emit error code for empty alternative in numerical switch>>

         <<emit code for the default case>>

         <<emit code for non-empty alternatives>>

         if (! cntxt$tailcall)
             cb$putlabel(endLabel)
     }
     TRUE
 })
 @

 The first step in processing the [[switch]] expression is to extract
 the selector expression [[expr]] and the case expressions, to identify
 which, if any, of the cases are empty, and to extract the names of the
 cases as [[nm]].  A warning is issued if there are no cases.  If there
 is only one case and that case is not named then setting [[nm = ""]]
 allows this situation to be processed by code used when names are
 present.
 <<extract the [[switch]] expression components>>=
 expr <- e[[2]]
 cases <- e[-c(1, 2)]

 if (is.null(cases))
     notifyNoSwitchcases(cntxt, loc = cb$savecurloc())

 miss <- missingArgs(cases)
 nm <- names(cases)

 ## allow for corner cases like switch(x, 1) which always
 ## returns 1 if x is a character scalar.
 if (is.null(nm) && length(cases) == 1)
     nm <- ""
 @ %def
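 For example, the names extracted for two small calls are
 \begin{verbatim}
 names(quote(switch(x, a = 1, 2))[-c(1, 2)])   # "a" ""
 names(quote(switch(x, 1, 2))[-c(1, 2)])       # NULL
 \end{verbatim}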

 The next step in the case where some cases are named is to check for a
 default expression.  If there is more than one default expression then
 the [[switch]] is compiled by [[cmpSpecial]].  This avoids having to
 reproduce the runtime error that would be generated if the [[switch]]
 is called with a character selector.
 %% **** would probably be better to not punt though -- then we could
 %% **** allow break/next to use GOTO
 <<collect information on named alternatives>>=
 ## collect information on named alternatives and check for
 ## multiple default cases.
 if (! is.null(nm)) {
     haveNames <- TRUE
     ndflt <- sum(nm == "")
     if (ndflt > 1) {
         notifyMultipleSwitchDefaults(ndflt, cntxt, loc = cb$savecurloc())
         ## **** punt back to interpreted version for now to get
         ## **** runtime error message for multiple defaults
         cmpSpecial(e, cb, cntxt)
         return(TRUE)
     }
     if (ndflt > 0)
         haveCharDflt <- TRUE
     else
         haveCharDflt <- FALSE
 }
 else {
     haveNames <- FALSE
     haveCharDflt <- FALSE
 }
 @ %def

 Next the labels are generated.  [[missLabel]] will be the label for
 code that signals an error if a numerical selector expression chooses
 a case with an empty argument.  The label [[dfltLabel]] will be for
 code that invisibly produces the value [[NULL]], which is the default
 case for a numerical selector argument and also for a character
 selector when no unnamed default case is provided. All non-empty cases
 are given their own labels, and [[endLabel]] is generated if it will
 be needed as the [[GOTO]] target for a [[switch]] expression that is
 not in tail position.
 <<create the labels>>=
 ## create the labels
 if (any(miss))
     missLabel <- cb$makelabel()
 dfltLabel <- cb$makelabel()

 lab <- function(m)
     if (m) missLabel
     else cb$makelabel()
 labels <- c(lapply(miss, lab), list(dfltLabel))

 if (! cntxt$tailcall)
     endLabel <- cb$makelabel()
 @ %def

 When there are named cases a map from the case names to the
 corresponding code labels is constructed next.  If no unnamed default
 was provided one is added that uses the [[dfltLabel]].
 <<create the map from names to labels for a character switch>>=
 ## create the map from names to labels for a character switch
 if (haveNames) {
     unm <- unique(nm[nm != ""])
     if (haveCharDflt)
         unm <- c(unm, "")
     nlabels <- labels[unlist(lapply(unm, findActionIndex, nm, miss))]
     ## if there is no unnamed case to act as a default for a
     ## character switch then the numeric default becomes the
     ## character default as well.
     if (! haveCharDflt) {
         unm <- c(unm, "")
         nlabels <- c(nlabels, list(dfltLabel))
     }
 }
 else {
     unm <- NULL
     nlabels <- NULL
 }
 @ %def
 The computation of the index of the appropriate label for a given name
 is carried out by [[findActionIndex]].
 %% **** rewrite this to directly return the label?
 <<[[findActionIndex]] function>>=
 findActionIndex <- function(name, nm, miss) {
     start <- match(name, nm)
     aidx <- c(which(! miss), length(nm) + 1)
     min(aidx[aidx >= start])
 }
 @ %def findActionIndex
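
 The fall-through rule can be checked in isolation.  The following is
 a minimal sketch that repeats the definition above so that it can be
 run on its own; the names vector and the [[miss]] flags are made up
 for illustration and correspond to a call of the form
 [[switch(x, a = 1, b = , c = 3, 4)]].
 \begin{verbatim}
 findActionIndex <- function(name, nm, miss) {
     start <- match(name, nm)
     aidx <- c(which(! miss), length(nm) + 1)
     min(aidx[aidx >= start])
 }

 nm <- c("a", "b", "c", "")            # "" marks the unnamed default
 miss <- c(FALSE, TRUE, FALSE, FALSE)  # the "b" alternative is empty

 findActionIndex("a", nm, miss)  # 1: "a" has its own code
 findActionIndex("b", nm, miss)  # 3: "b" falls through to "c"
 findActionIndex("", nm, miss)   # 4: the unnamed default
 \end{verbatim}
 The extra index [[length(nm) + 1]] selects the last entry of
 [[labels]], which is [[dfltLabel]]; it is chosen when every
 alternative from the matched name onwards is empty.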

 At this point we are ready to start emitting code into the code
 buffer.  First code to compute the selector is emitted.  As with the
 condition for an [[if]] expression a non-tail-call context is used.
 <<emit code for the [[EXPR]] argument>>=
 ## compile the EXPR argument
 ncntxt <- make.nonTailCallContext(cntxt)
 cmp(expr, cb, ncntxt)
 @ %def

 The switch instruction takes the selector off the stack and four
 operands from the instruction stream: the call index, an index for the
 names, or [[NULL]] if there are none, and indices for the labels for a
 character selector and for a numeric selector.  At this point lists of
 labels are placed in the instruction buffer.  At code extraction time
 these will be replaced by indices for numeric offset vectors by the
 [[patchlabels]] function of the code buffer.
 <<emit the switch instruction>>=
 ## emit the SWITCH instruction
 cei <- cb$putconst(e)
 if (haveNames) {
     cni <- cb$putconst(unm)
     cb$putcode(SWITCH.OP, cei, cni, nlabels, labels)
 }
 else {
     cni <- cb$putconst(NULL)
     cb$putcode(SWITCH.OP, cei, cni, cni, labels)
 }
 @ %def

 If there are empty alternatives then code to signal an error for a
 numeric selector that chooses one of these is needed and is
 identified by the label [[missLabel]].
 <<emit error code for empty alternative in numerical switch>>=
 ## emit code to signal an error if a numeric switch hits an
 ## empty alternative (fall through, as for character, might
 ## make more sense but that isn't the way switch() works)
 if (any(miss)) {
     cb$putlabel(missLabel)
     cmp(quote(stop("empty alternative in numeric switch")), cb, cntxt)
 }
 @ %def

 Code for the numeric default case, corresponding to [[dfltLabel]],
 places [[NULL]] on the stack, and for a [[switch]] in tail position
 this is followed by an [[INVISIBLE]] and a [[RETURN]] instruction.
 <<emit code for the default case>>=
 ## emit code for the default case
 cb$putlabel(dfltLabel)
 cb$putcode(LDNULL.OP)
 if (cntxt$tailcall) {
     cb$putcode(INVISIBLE.OP)
     cb$putcode(RETURN.OP)
 }
 else
     cb$putcode(GOTO.OP, endLabel)
 @ %def

 Finally the labels and code for the non-empty alternatives are written
 to the code buffer.  In non-tail position the code is followed by a
 [[GOTO]] instruction that jumps to [[endLabel]].  The final case does
 not need this [[GOTO]].
 %% **** maybe try to drop the final GOTO
 <<emit code for non-empty alternatives>>=
 ## emit code for the non-empty alternatives
 for (i in seq_along(cases)) {
     if (! miss[i]) {
         cb$putlabel(labels[[i]])
         cmp(cases[[i]], cb, cntxt)
         if (! cntxt$tailcall)
             cb$putcode(GOTO.OP, endLabel)
     }
 }
 @ %def
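
 The runtime behavior that the generated code has to reproduce can be
 observed directly in R.  The following short sketch uses only base R
 and shows the three situations handled above: fall-through for a
 character selector, the invisible [[NULL]] default, and the error for
 an empty alternative selected numerically.  The variable names
 [[r1]] to [[r4]] are only for illustration.
 \begin{verbatim}
 ## character selector: the empty "b" alternative falls through to "c"
 r1 <- switch("b", a = 1, b = , c = 3)

 ## no matching name and no unnamed default: invisible NULL
 r2 <- withVisible(switch("z", a = 1))

 ## numeric selector out of range: invisible NULL as well
 r3 <- withVisible(switch(3, "x", "y"))

 ## numeric selector choosing an empty alternative: an error
 r4 <- tryCatch(switch(2, a = 1, b = , c = 3),
                error = function(e) conditionMessage(e))
 \end{verbatim}
 Here [[r1]] is 3, [[r2]] and [[r3]] both hold a [[NULL]] value with
 the visible flag [[FALSE]], and [[r4]] is the error message for an
 empty alternative in a numeric switch.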


 \section{Assignment expressions}
 R supports simple assignments in which the left-hand side of the
 assignment expression is a symbol and complex assignments of the form
 \begin{verbatim}
 f(x) <- v
 \end{verbatim}
 or
 \begin{verbatim}
 g(f(x)) <- v
 \end{verbatim}
 The second form is sometimes called a nested complex assignment.
 Ordinary assignment creates or modifies a binding in the current
 environment.  Superassignment via the [[<<-]] operator modifies a
 binding in a containing environment.
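
 As a reminder of what these forms mean, the following sketch, using
 only base R functions, compares a nested complex assignment with a
 hand-written expansion into replacement function calls and then
 shows a superassignment.  The expansion illustrates the semantics
 only; it is not the code the compiler emits, and the names
 [[make_counter]] and [[bump]] are made up for the example.
 \begin{verbatim}
 x <- c(a = 1, b = 2)
 names(x)[2] <- "z"          # nested complex assignment

 y <- c(a = 1, b = 2)
 y <- `names<-`(y, value = `[<-`(names(y), 2, value = "z"))

 identical(x, y)             # TRUE

 ## superassignment modifies a binding in an enclosing environment
 make_counter <- function() {
     n <- 0
     function() n <<- n + 1
 }
 bump <- make_counter()
 bump()                      # n is now 1
 bump()                      # n is now 2
 \end{verbatim}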

 Assignment expressions are compiled by [[cmpAssign]].  This function
 checks the form of the assignment expression and, for well-formed
 expressions, uses [[cmpSymbolAssign]] for simple assignments and
 [[cmpComplexAssign]] for complex assignments.  Expressions that are
 not well formed are handed to [[cmpSpecial]].

 For now, a temporary hack is needed to address a discrepancy between
 byte code and AST code that can be caused by assignments in arguments
 to primitives. The root issue is that we are not recording references
 to arguments that have been evaluated. Once that is addressed we can
 remove this hack.
 <<temporary hack to deal with assignments in arguments issue>>=
 ## if (! cntxt$toplevel)
 ##    return(cmpSpecial(e, cb, cntxt))
 @
 <<[[cmpAssign]] function>>=
 cmpAssign <- function(e, cb, cntxt) {
     <<temporary hack to deal with assignments in arguments issue>>
     if (! checkAssign(e, cntxt, loc = cb$savecurloc()))
         return(cmpSpecial(e, cb, cntxt))
     superAssign <- as.character(e[[1]]) == "<<-"
     lhs <- e[[2]]
     value <- e[[3]]
     symbol <- as.name(getAssignedVar(e, cntxt))
     if (superAssign && ! findVar(symbol, cntxt))
         notifyNoSuperAssignVar(symbol, cntxt, loc = cb$savecurloc())
     if (is.name(lhs) || is.character(lhs))
         cmpSymbolAssign(symbol, value, superAssign, cb, cntxt)
     else if (typeof(lhs) == "language")
         cmpComplexAssign(symbol, lhs, value, superAssign, cb, cntxt)
     else cmpSpecial(e, cb, cntxt) # punt for now
 }
 @ %def cmpAssign

 The code generators for the assignment operators [[<-]] and [[=]] and
 the superassignment operator [[<<-]] are registered by
 <<inlining handlers for [[<-]], [[=]], and [[<<-]]>>=
 setInlineHandler("<-", cmpAssign)
 setInlineHandler("=", cmpAssign)
 setInlineHandler("<<-", cmpAssign)
 @ %def

 The function [[checkAssign]] is used to check that an assignment
 expression is well-formed.
 <<[[checkAssign]] function>>=
 checkAssign <- function(e, cntxt, loc = NULL) {
     if (length(e) != 3)
         FALSE
     else {
         place <- e[[2]]
         if (typeof(place) == "symbol" ||
             (typeof(place) == "character" && length(place) == 1))
             TRUE
         else {
             <<check left hand side call>>
         }
     }
 }
 @ %def checkAssign
 A valid left hand side call must have a function that is either a
 symbol or is of the form [[foo::bar]] or [[foo:::bar]], and the first
 argument must be a symbol or another valid left hand side call.  A
 [[while]] loop is used to unravel nested calls.
 <<check left hand side call>>=
 while (typeof(place) == "language") {
     fun <- place[[1]]
     if (typeof(fun) != "symbol" &&
         ! (typeof(fun) == "language" && length(fun) == 3 &&
            typeof(fun[[1]]) == "symbol" &&
            as.character(fun[[1]]) %in% c("::", ":::"))) {
         notifyBadAssignFun(fun, cntxt, loc)
         return(FALSE)
     }
     place <- place[[2]]
 }
 if (typeof(place) == "symbol")
     TRUE
 else FALSE
 @ %def
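
 The shape of a valid left hand side can be explored with a small
 stand-alone sketch.  The helper [[validPlace]] below is hypothetical:
 it mirrors the loop above but leaves out the compiler context and the
 notification, so it is only meant to show which expressions pass the
 test.
 \begin{verbatim}
 validPlace <- function(place) {
     while (typeof(place) == "language") {
         fun <- place[[1]]
         ok <- typeof(fun) == "symbol" ||
             (typeof(fun) == "language" && length(fun) == 3 &&
              typeof(fun[[1]]) == "symbol" &&
              as.character(fun[[1]]) %in% c("::", ":::"))
         if (! ok)
             return(FALSE)
         place <- place[[2]]
     }
     typeof(place) == "symbol"
 }

 validPlace(quote(names(x)[2]))     # TRUE: nested calls ending in x
 validPlace(quote(base::names(x)))  # TRUE: function given as foo::bar
 validPlace(quote(f()(x)))          # FALSE: function is the call f()
 validPlace(quote(f(1)))            # FALSE: innermost argument is 1
 \end{verbatim}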

 \subsection{Simple assignment expressions}
 %% **** handle fun defs specially for message purposes??
 Code for assignment to a symbol is generated by [[cmpSymbolAssign]].
 <<[[cmpSymbolAssign]] function>>=
 cmpSymbolAssign <- function(symbol, value, superAssign, cb, cntxt) {
     <<compile the right hand side value expression>>
     <<emit code for the symbol assignment instruction>>
     <<for tail calls return the value invisibly>>
     TRUE
 }
 @ %def cmpSymbolAssign

 A non-tail-call context is used to generate code for the right hand
 side value expression.
 <<compile the right hand side value expression>>=
 ncntxt <- make.nonTailCallContext(cntxt)
 cmp(value, cb, ncntxt)
 @ %def

 The [[SETVAR]] and [[SETVAR2]] instructions assign the value on the
 stack to the symbol specified by its constant pool index operand.  The
 [[SETVAR]] instruction is used by ordinary assignment to assign in the
 local frame, and [[SETVAR2]] for superassignments.
 <<emit code for the symbol assignment instruction>>=
 ci <- cb$putconst(symbol)
 if (superAssign)
     cb$putcode(SETVAR2.OP, ci)
 else
     cb$putcode(SETVAR.OP, ci)
 @ %def
 The superassignment case does not need to check for and warn about a
 missing binding since this is done in [[cmpAssign]].

 The [[SETVAR]] and [[SETVAR2]] instructions leave the value on the
 stack as the value of the assignment expression; if the expression
 appears in tail position then this value is returned with the visible
 flag set to [[FALSE]].
 <<for tail calls return the value invisibly>>=
 if (cntxt$tailcall) {
     cb$putcode(INVISIBLE.OP)
     cb$putcode(RETURN.OP)
 }
 @ %def
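
 The effect of the [[INVISIBLE]] instruction in tail position can be
 observed from R.  The following is a minimal sketch using
 [[cmpfun]] from the [[compiler]] package and [[withVisible]] from
 base R; the function [[f]] is made up for the example.
 \begin{verbatim}
 library(compiler)

 ## the assignment is the last expression, so it is in tail position
 f <- cmpfun(function(v) x <- v)

 r <- withVisible(f(3))
 r$value     # 3: the value of the assignment expression
 r$visible   # FALSE: the value is returned invisibly
 \end{verbatim}
 The interpreted version of [[f]] gives the same result, which is the
 behavior the compiled code is intended to match.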


 \subsection{Complex assignment expressions}
 \label{subsec:complexassign}
 It seems somehow appropriate at this point to mention that the code in
 [[eval.c]] implementing the interpreter semantics starts with the
 following comment:
 \begin{verbatim}
     /*
      *  Assignments for complex LVAL specifications. This is the stuff that
      *  nightmares are made of ...
 \end{verbatim}

 There are some issues with the semantics for complex assignment as
 implemented by the interpreter:
 \begin{itemize}
 \item With the current approach the following legal, though strange,
   code fails:
 <<inner assignment trashes temporary>>=
 f <- function(x, y) x
 `f<-` <- function(x, y, value) { y; x}
 x <- 1
 y <- 2
 f(x, y[] <- 1) <- 3
 @ %def
   The reason is that the current left hand side object is maintained in
   a variable [[*tmp*]], and processing the assignment in the second
   argument first overwrites the value of [[*tmp*]] and then removes
   [[*tmp*]] before the first argument is evaluated. Using evaluated
   promises as arguments, as is done for the right hand side value,
   solves this.

 \item The current approach of using a temporary variable [[*tmp*]] to
   hold the evaluated LHS object requires an internal cleanup context
   to ensure that the variable is removed in the event of a non-local
   exit. Implementing this in the compiler would introduce significant
   overhead.

 \item The asymmetry of handling the pre-evaluated right hand side
   value via an evaluated promise and the pre-evaluated left hand side
   via a temporary variable makes the code harder to understand and the
   semantics harder to explain.

 \item Using promises in an expression passed to eval means promises
   can leak out into R via sys.call. This is something we have tried to
   avoid and should try to avoid so we can have the freedom to
   implement lazy evaluation differently if that seems useful. [It may
   be possible at some point to avoid allocation of promise objects in
   compiled code.] The compiler can avoid this by using promises only
   in the argument lists passed to function calls, not in the call
   expressions.  A similar change could be made in the interpreter but
   it would have a small runtime penalty for constructing an expression
   in addition to an argument list.  I would prefer to avoid that
   until the compiler has been turned on by default.

 \item The current approach of installing the intermediate RHS value as
   the expression for the RHS promise in nested complex assignments has
   several drawbacks:
   \begin{itemize}
   \item it can produce huge expressions.

   \item the result is misleading if the intermediate RHS value is a
     symbol or a language object.

   \item to maintain this in compiled code it would be necessary to
     construct the assignment function call expression at runtime even
     though it is usually not needed (or it would require significant
     rewriting to allow on-demand computation of the call). If *vtmp*
     is used as a marker for the expression and documented as not a
     real variable then the call can be constructed at compile time.
   \end{itemize}

 \item In nested complex assignments the additional arguments of the
   inner functions are evaluated twice. This is illustrated by running
   this code:
 <<multiple evaluation of arguments in assignments>>=
 f <- function(x, y) {y ; x }
 `f<-` <- function(x, y, value) { y; x }
 g <- function(x, y) {y ; x }
 `g<-` <- function(x, y, value) { y; x }
 x <- 1
 y <- 2
 f(g(x, print(y)), y) <- 3
 @ %def
   This is something we have lived with, and I don't propose to change
   it at this time. But it would be good to be able to change it in the
   future.
 \end{itemize}

 Because of these issues the compiler implements slightly different
 semantics for complex assignment than the current interpreter.
 \emph{Evaluation} semantics should be identical; the difference arises
 in how intermediate values are managed and has some effect on results
 produced by [[substitute]].  In particular, no intermediate [[*tmp*]]
 value is used and therefore no cleanup frame is needed.  This does
 mean that uses of the form
 \begin{verbatim}
     eval(substitute(<first arg>), parent.frame())
 \end{verbatim}
 will no longer work.  In tests of most of CRAN and BioC this directly
 affected only one function, [[$.proto]] in the [[proto]] package, and
 indirectly about 30 packages using proto failed.  I looked at the
 [[$.proto]] implementation, and it turned out that the
 [[eval(substitute())]] approach used there could be replaced by
 standard evaluation using lexical scope. This produces better code,
 and the result works with both the current R interpreter and compiled
 code (proto and all the dependent packages pass check with this
 change).  The [[proto]] maintainer has changed [[proto]] along these
 lines.  It would be good to soon change the interpreter to also use
 evaluated promises in place of the [[*tmp*]] variable to bring the
 compiled and interpreted semantics closer together.

 Complex assignment expressions are compiled by [[cmpComplexAssign]].
 <<[[cmpComplexAssign]] function>>=
 cmpComplexAssign <- function(symbol, lhs, value, superAssign, cb, cntxt) {
     <<select complex assignment instructions>>
     <<compile the right hand side value expression>>
     <<compile the left hand side call>>
     <<for tail calls return the value invisibly>>
     TRUE
 }
 @ %def cmpComplexAssign

 Assignment code is bracketed by a start and an end instruction.
 <<compile the left hand side call>>=
 csi <- cb$putconst(symbol)
 cb$putcode(startOP, csi)

 <<compile code to compute left hand side values>>
 <<compile code to compute right hand side values>>

 cb$putcode(endOP, csi)
 @ %def
 The appropriate instructions [[startOP]] and [[endOP]] depend on
 whether the assignment is an ordinary assignment or a superassignment.
 <<select complex assignment instructions>>=
 if (superAssign) {
     startOP <- STARTASSIGN2.OP
     endOP <- ENDASSIGN2.OP
 }
 else {
     if (! findVar(symbol, cntxt))
         notifyUndefVar(symbol, cntxt, loc = cb$savecurloc())
     startOP <- STARTASSIGN.OP
     endOP <- ENDASSIGN.OP
 }
 @ %def
 An undefined variable notification is issued for ordinary assignment,
 since this will produce a runtime error. For superassignment
 [[cmpAssign]] has already checked for an undefined left-hand-side
 variable and issued a notification if none was found.

 The start instructions obtain the initial value of the left-hand-side
 variable and in the case of standard assignment assign it in the local
 frame if it is not assigned there already. They also prepare the stack
 for the assignment process.  The stack invariant maintained by the
 assignment process is that the current right hand side value is on the
 top, followed by the evaluated left hand side values and the original
 right hand side value. Thus the start instruction leaves the right
 hand side value, the value of the left hand side variable, and again
 the right hand side value on the top of the stack.

 The end instruction finds the final right hand side value followed by
 the original right hand side value on the top of the stack.  The final
 value is removed and assigned to the appropriate variable binding.
 The original right hand side value is left on the top of the stack as
 the value of the assignment expression.

 Evaluating a nested complex assignment involves evaluating a sequence
 of expressions to obtain the left hand sides to modify, and then
 evaluating a sequence of corresponding calls to replacement functions
 in the opposite order. The function [[flattenPlace]] returns a list
 of the expressions that need to be considered, with [[*tmp*]] in place
 of the current left hand side argument. For example, for an assignment
 of the form [[f(g(h(x, k), j), i) <- v]] this produces
 \begin{verbatim}
 > flattenPlace(quote(f(g(h(x, k), j), i)))$places
 [[1]]
 f(`*tmp*`, i)

 [[2]]
 g(`*tmp*`, j)

 [[3]]
 h(`*tmp*`, k)
 \end{verbatim}
 The sequence of left hand side values needed consists of the original
 variable value, which is already on the stack, and the values of
 [[h(`*tmp*`, k)]] and [[g(`*tmp*`, j)]].

 In general the additional evaluations needed are of all but the first
 expression produced by [[flattenPlace]], evaluated in reverse
 order. An argument context is used since there are already values on
 the stack.
 <<compile code to compute left hand side values>>=
 ncntxt <- make.argContext(cntxt)
 flat <- flattenPlace(lhs, cntxt, loc = cb$savecurloc())
 flatOrigPlace <- flat$origplaces
 flatPlace <- flat$places
 flatPlaceIdxs <- seq_along(flatPlace)[-1]
 for (i in rev(flatPlaceIdxs))
     cmpGetterCall(flatPlace[[i]], flatOrigPlace[[i]], cb, ncntxt)
 @ %def
 The compilation of the individual calls is carried out by
 [[cmpGetterCall]], which is presented in Section \ref{subsec:getter}.
 Each compilation places the new left hand side value on the top of the
 stack and then switches it with the value below, which is the original
 right hand side value, to preserve the stack invariant.

 The function [[flattenPlace]] is defined as
 <<[[flattenPlace]] function>>=
 flattenPlace <- function(place, cntxt, loc = NULL) {
     places <- NULL
     origplaces <- NULL
     while (typeof(place) == "language") {
         if (length(place) < 2)
             cntxt$stop(gettext("bad assignment 1"), cntxt, loc = loc)
         origplaces <- c(origplaces, list(place))
         tplace <- place
         tplace[[2]] <- as.name("*tmp*")
         places <- c(places, list(tplace))
         place <- place[[2]]
     }
     if (typeof(place) != "symbol")
         cntxt$stop(gettext("bad assignment 2"), cntxt, loc = loc)
     list(places = places, origplaces = origplaces)
 }
 @ %def flattenPlace
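
 The second component of the result, [[origplaces]], keeps the
 unmodified sub-expressions.  The following stand-alone sketch shows
 both components for the example used earlier.  The function
 [[flattenPlaceSketch]] is a hypothetical copy of the definition above
 in which the compiler context is dropped and [[cntxt$stop]] is
 replaced by a plain [[stop]].
 \begin{verbatim}
 flattenPlaceSketch <- function(place) {
     places <- NULL
     origplaces <- NULL
     while (typeof(place) == "language") {
         if (length(place) < 2)
             stop("bad assignment 1")
         origplaces <- c(origplaces, list(place))
         tplace <- place
         tplace[[2]] <- as.name("*tmp*")
         places <- c(places, list(tplace))
         place <- place[[2]]
     }
     if (typeof(place) != "symbol")
         stop("bad assignment 2")
     list(places = places, origplaces = origplaces)
 }

 flat <- flattenPlaceSketch(quote(f(g(h(x, k), j), i)))
 flat$places      # f(`*tmp*`, i), g(`*tmp*`, j), h(`*tmp*`, k)
 flat$origplaces  # f(g(h(x, k), j), i), g(h(x, k), j), h(x, k)
 \end{verbatim}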

 After the left hand side values have been computed the stack contains
 the original right hand side value followed by the left hand side
 values in the order in which they need to be modified. Code to call
 the sequence of replacement functions is generated by
 <<compile code to compute right hand side values>>=
 cmpSetterCall(flatPlace[[1]], flatOrigPlace[[1]], value, cb, ncntxt)
 for (i in flatPlaceIdxs)
     cmpSetterCall(flatPlace[[i]], flatOrigPlace[[i]], as.name("*vtmp*"), cb, ncntxt)
 @ %def
 The first call uses the expression for the original right hand side in
 its call; all others will use [[*vtmp*]].  Each replacement function
 call compiled by [[cmpSetterCall]] will remove the top two elements
 from the stack and then push the new right hand side value on the
 stack.  [[cmpSetterCall]] is described in Section \ref{subsec:setter}.
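
 The order of getter and setter evaluations can be imitated in plain
 R.  The following sketch carries out by hand the steps for the
 assignment [[names(x$a)[1] <- "p"]] and compares the result with the
 ordinary assignment.  The variables [[lhs0]], [[lhs1]], [[lhs2]] and
 [[rhs]] are made up and stand in for stack entries; the sketch
 illustrates the evaluation order only, not the instructions emitted.
 \begin{verbatim}
 x <- list(a = c(u = 1, v = 2))
 expected <- x
 names(expected$a)[1] <- "p"

 ## left hand side values, innermost place first
 lhs0 <- x                 # value obtained by the start instruction
 lhs1 <- lhs0$a            # getter call for `$`(`*tmp*`, a)
 lhs2 <- names(lhs1)       # getter call for names(`*tmp*`)

 ## replacement calls, in the opposite order
 rhs <- "p"
 rhs <- `[<-`(lhs2, 1, value = rhs)     # setter for `[`(`*tmp*`, 1)
 rhs <- `names<-`(lhs1, value = rhs)    # setter for names(`*tmp*`)
 rhs <- `$<-`(lhs0, "a", value = rhs)   # setter for `$`(`*tmp*`, a)

 x <- rhs                  # done by the end instruction
 identical(x, expected)    # TRUE
 \end{verbatim}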


 \subsection{Compiling setter calls}
 \label{subsec:setter}
 Setter calls, or calls to replacement functions, in compiled
 assignment expressions find a stack that contains the current right
 hand side value on the top followed by the current left hand side
 value.  Some replacement function calls, such as calls to [[$<-]],
 are handled by an inlining mechanism described below.  The general
 case when the function is specified by a symbol is handled by using a
 [[GETFUN]] instruction to push the function on the stack, pushing any
 additional arguments on the stack, and using the [[SETTER_CALL]]
 instruction to execute the call.  This instruction adjusts the
 argument list by inserting as the first argument an evaluated promise
 for the left hand side value and as the last argument an evaluated
 promise for the right hand side value; the final argument also has
 the [[value]] tag. The case where the function is specified in the
 form [[foo::bar]] or [[foo:::bar]] differs only in compiling the
 function expression and using [[CHECKFUN]] to verify the result and
 prepare the stack.
 <<[[cmpSetterCall]] function>>=
 cmpSetterCall <- function(place, origplace, vexpr, cb, cntxt) {
     afun <- getAssignFun(place[[1]])
     acall <- as.call(c(afun, as.list(place[-1]), list(value = vexpr)))
     acall[[2]] <- as.name("*tmp*")
     ncntxt <- make.callContext(cntxt, acall)
     sloc <- cb$savecurloc()
     cexpr <- as.call(c(afun, as.list(origplace[-1]), list(value = vexpr)))
     cb$setcurexpr(cexpr)
     if (is.null(afun))
         ## **** warn instead and arrange for cmpSpecial?
         ## **** or generate code to signal runtime error?
         cntxt$stop(gettext("invalid function in complex assignment"),
                    loc = cb$savecurloc())
     else if (typeof(afun) == "symbol") {
         if (! trySetterInline(afun, place, origplace, acall, cb, ncntxt)) {
             ci <- cb$putconst(afun)
             cb$putcode(GETFUN.OP, ci)
             <<compile additional arguments and call to setter function>>
         }
     }
     else {
         cmp(afun, cb, ncntxt)
         cb$putcode(CHECKFUN.OP)
         <<compile additional arguments and call to setter function>>
     }
     cb$restorecurloc(sloc)
 }
 @ %def cmpSetterCall
 The common code for compiling additional arguments and issuing the
 [[SETTER_CALL]] instruction is given by
 <<compile additional arguments and call to setter function>>=
 cb$putcode(PUSHNULLARG.OP)
 cmpCallArgs(place[-c(1, 2)], cb, ncntxt)
 cci <- cb$putconst(acall)
 cvi <- cb$putconst(vexpr)
 cb$putcode(SETTER_CALL.OP, cci, cvi)
 @ %def
 The [[PUSHNULLARG]] instruction places [[NULL]] in the argument list as a
 first argument to serve as a place holder; [[SETTER_CALL]] replaces
 this with the evaluated promise for the current left hand side value.
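
 The call expression stored in the constant pool for [[SETTER_CALL]]
 is built with ordinary language manipulation.  The following sketch
 repeats the construction of [[acall]] from [[cmpSetterCall]] for a
 made-up place [[f(`*tmp*`, i)]] and value expression [[v]], with the
 replacement function name written out by hand.
 \begin{verbatim}
 place <- quote(f(`*tmp*`, i))
 vexpr <- quote(v)
 afun <- as.name("f<-")

 ## replacement function, then the arguments of the place, then the
 ## value expression tagged as 'value'
 acall <- as.call(c(afun, as.list(place[-1]), list(value = vexpr)))
 acall   # `f<-`(`*tmp*`, i, value = v)

 ## the additional arguments that cmpCallArgs compiles
 as.list(place[-c(1, 2)])   # a list containing the symbol i
 \end{verbatim}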

 The replacement function corresponding to [[fun]] is computed by
 [[getAssignFun]].  If [[fun]] is a symbol then the assignment function
 is the symbol followed by [[<-]].  The function [[fun]] can also be an
 expression of the form [[foo::bar]], in which case the replacement
 function is the expression [[foo::`bar<-`]].  [[NULL]] is returned if
 [[fun]] does not fit into one of these two cases.
 <<[[getAssignFun]] function>>=
 getAssignFun <- function(fun) {
     if (typeof(fun) == "symbol")
         as.name(paste0(fun, "<-"))
     else {
         ## check for and handle foo::bar(x) <- y assignments here
         if (typeof(fun) == "language" && length(fun) == 3 &&
             (as.character(fun[[1]]) %in% c("::", ":::")) &&
             typeof(fun[[2]]) == "symbol" && typeof(fun[[3]]) == "symbol") {
             afun <- fun
             afun[[3]] <- as.name(paste0(fun[[3]],"<-"))
             afun
         }
         else NULL
     }
 }
 @ %def getAssignFun
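
 The three cases can be tried out directly.  The sketch below assumes
 that the installed [[compiler]] package contains [[getAssignFun]] as
 defined above; since the function is internal it is reached with
 [[:::]], which is fine for experimenting but should not be relied on
 in other code.
 \begin{verbatim}
 gaf <- compiler:::getAssignFun   # internal function, see above

 a1 <- gaf(quote(names))          # the symbol `names<-`
 a2 <- gaf(quote(base::names))    # the call base::`names<-`
 a3 <- gaf(quote(f()))            # NULL: neither a symbol nor foo::bar
 \end{verbatim}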

 To produce more efficient code some replacement function calls can be
 inlined and use specialized instructions.  The most important of these
 are [[$<-]], [[[<-]], and [[[[<-]].  An inlining mechanism similar to
 the one described in Section \ref{sec:inlining} is used for this
 purpose.  A separate mechanism is needed because of the fact that in
 the present context two arguments, the left hand side and right hand
 side values, are already on the stack.
 <<setter inlining mechanism>>=
 setterInlineHandlers <- new.env(hash = TRUE, parent = emptyenv())

 setSetterInlineHandler <- function(name, h, package = "base") {
     if (exists(name, setterInlineHandlers, inherits = FALSE)) {
         entry <- get(name, setterInlineHandlers)
         if (entry$package != package) {
             fmt <- "handler for '%s' is already defined for another package"
             stop(gettextf(fmt, name), domain = NA)
         }
     }
     entry <- list(handler = h, package = package)
     assign(name, entry, setterInlineHandlers)
 }

 getSetterInlineHandler <- function(name, package = "base") {
     if (exists(name, setterInlineHandlers, inherits = FALSE)) {
         hinfo <- get(name, setterInlineHandlers)
         if (hinfo$package == package)
             hinfo$handler
         else NULL
     }
     else NULL
 }

 trySetterInline <- function(afun, place, origplace, call, cb, cntxt) {
     name <- as.character(afun)
     info <- getInlineInfo(name, cntxt)
     if (is.null(info))
         FALSE
     else {
         h <- getSetterInlineHandler(name, info$package)
         if (! is.null(h))
             h(afun, place, origplace, call, cb, cntxt)
         else FALSE
     }
 }
 @ %def

 The inline handler for [[$<-]] replacement calls uses the
 [[DOLLARGETS]] instruction.  The handler declines to handle cases that
 would produce runtime errors; these are compiled by the generic
 mechanism.
 %% **** might be useful to signal a warning at compile time
 <<setter inline handler for [[$<-]]>>=
 setSetterInlineHandler("$<-", function(afun, place, origplace, call, cb, cntxt) {
     if (any.dots(place) || length(place) != 3)
         FALSE
     else {
         sym <- place[[3]]
         if (is.character(sym))
             sym <- as.name(sym)
         if (is.name(sym)) {
             ci <- cb$putconst(call)
             csi <- cb$putconst(sym)
             cb$putcode(DOLLARGETS.OP, ci, csi)
             TRUE
         }
         else FALSE
     }
 })
 @ %def

 The replacement functions [[[<-]] and [[[[<-]] are implemented as
 [[SPECIAL]] functions that do internal dispatching.  They are
 therefore compiled along the same lines as their corresponding
 accessor functions as described in Section \ref{subsec:subset}.  The
 common pattern is implemented by [[cmpSetterDispatch]].
 <<[[cmpSetterDispatch]] function>>=
 cmpSetterDispatch <- function(start.op, dflt.op, afun, place, call, cb, cntxt) {
     if (any.dots(place))
         FALSE ## punt
     else {
         ci <- cb$putconst(call)
         end.label <- cb$makelabel()
         cb$putcode(start.op, ci, end.label)
         if (length(place) > 2) {
             args <- place[-(1:2)]
             cmpBuiltinArgs(args, names(args), cb, cntxt, TRUE)
         }
         cb$putcode(dflt.op)
         cb$putlabel(end.label)
         TRUE
     }
 }
 @ %def cmpSetterDispatch
 The two inlining handlers are then defined as
 <<setter inline handlers for [[ [<- ]] and [[ [[<- ]]>>=
 # **** this is now handled differently; see "Improved subset ..."
 # setSetterInlineHandler("[<-", function(afun, place, origplace, call, cb, cntxt)
 #     cmpSetterDispatch(STARTSUBASSIGN.OP, DFLTSUBASSIGN.OP,
 #                       afun, place, call, cb, cntxt))

 # setSetterInlineHandler("[[<-", function(afun, place, origplace, call, cb, cntxt)
 #     cmpSetterDispatch(STARTSUBASSIGN2.OP, DFLTSUBASSIGN2.OP,
 #                       afun, place, call, cb, cntxt))
 @ %def

 An inline handler is defined for [[@<-]] in order to suppress spurious
 warnings about the slot name symbol. A call in which the slot is
 specified by a symbol is converted to one using a string instead, and
 is then compiled by a recursive call to [[cmpSetterCall]]; the handler
 will decline in this second call and the default compilation strategy
 will be used.
 <<setter inlining handler for [[@<-]]>>=
 setSetterInlineHandler("@<-", function(afun, place, origplace, acall, cb, cntxt) {
     if (! dots.or.missing(place) && length(place) == 3 &&
         typeof(place[[3]]) == "symbol") {
         place[[3]] <- as.character(place[[3]])
         vexpr <- acall[[length(acall)]]
         cmpSetterCall(place, origplace, vexpr, cb, cntxt)
         TRUE
     }
     else FALSE
 })
 @


 \subsection{Compiling getter calls}
 \label{subsec:getter}
 Getter calls within an assignment also need special handling because
 of the left hand side argument being on the stack already and because
 of the need to restore the stack invariant. There are again two cases
 for installing the getter function on the stack.  These are then
 followed by common code for handling the additional arguments and the
 call.
 <<[[cmpGetterCall]] function>>=
 cmpGetterCall <- function(place, origplace, cb, cntxt) {
     ncntxt <- make.callContext(cntxt, place)
     sloc <- cb$savecurloc()
     cb$setcurexpr(origplace)
     fun <- place[[1]]
     if (typeof(fun) == "symbol") {
         if (! tryGetterInline(place, cb, ncntxt)) {
             ci <- cb$putconst(fun)
             cb$putcode(GETFUN.OP, ci)
             <<compile additional arguments and call to getter function>>
         }
     }
     else {
         cmp(fun, cb, ncntxt)
         cb$putcode(CHECKFUN.OP)
         <<compile additional arguments and call to getter function>>
     }
     cb$restorecurloc(sloc)
 }
 @ %def cmpGetterCall
 In the common code, as in setter calls, a [[NULL]] is placed on the
 argument stack as a place holder for the left hand side promise.  Then
 the additional arguments are placed on the stack and the
 [[GETTER_CALL]] instruction is issued.  This instruction installs the
 evaluated promise with the left hand side value as the first argument
 and executes the call.  The call will leave the next left hand side
 value on the top of the stack.  A [[SWAP]] instruction then switches
 the top two stack entries.  This leaves the original right hand side
 value on top followed by the new left hand side value returned by the
 getter call and any other left hand side values produced by earlier
 getter calls.
 <<compile additional arguments and call to getter function>>=
 cb$putcode(PUSHNULLARG.OP)
 cmpCallArgs(place[-c(1, 2)], cb, ncntxt)
 cci <- cb$putconst(place)
 cb$putcode(GETTER_CALL.OP, cci)
 cb$putcode(SWAP.OP)
 @ %def

 Again an inlining mechanism is needed to handle calls to functions like
 [[$]] and [[[]].  These are able to use the same instructions as the
 inline handlers in Section \ref{subsec:subset} for ordinary calls to
 [[$]] and [[[]] but require some additional work to deal with
 maintaining the stack invariant.

 The inlining mechanism itself is analogous to the general one and the
 one for inlining setter calls.
 <<getter inlining mechanism>>=
 getterInlineHandlers <- new.env(hash = TRUE, parent = emptyenv())

 setGetterInlineHandler <- function(name, h, package = "base") {
     if (exists(name, getterInlineHandlers, inherits = FALSE)) {
         entry <- get(name, getterInlineHandlers)
         if (entry$package != package) {
             fmt <- "handler for '%s' is already defined for another package"
             stop(gettextf(fmt, name), domain = NA)
         }
     }
     entry <- list(handler = h, package = package)
     assign(name, entry, getterInlineHandlers)
 }

 getGetterInlineHandler <- function(name, package = "base") {
     if (exists(name, getterInlineHandlers, inherits = FALSE)) {
         hinfo <- get(name, getterInlineHandlers)
         if (hinfo$package == package)
             hinfo$handler
         else NULL
     }
     else NULL
 }

 tryGetterInline <- function(call, cb, cntxt) {
     name <- as.character(call[[1]])
     info <- getInlineInfo(name, cntxt)
     if (is.null(info))
         FALSE
     else {
         h <- getGetterInlineHandler(name, info$package)
         if (! is.null(h))
             h(call, cb, cntxt)
         else FALSE
     }
 }
 @ %def

 The inline handler for [[$]] in a getter context uses the [[DUP2ND]]
 instruction to push a copy of the second value on the stack, the
 previous left hand side value, onto the top of the stack.  The
 [[DOLLAR]] instruction pops this
 value, computes the component for this value and the symbol in the
 constant pool, and pushes the result on the stack.  A [[SWAP]]
 instruction then interchanges this value with the next value, which is
 the original right hand side value, thus restoring the stack
 invariant.
 <<getter inline handler for [[$]]>>=
 setGetterInlineHandler("$", function(call, cb, cntxt) {
     if (any.dots(call) || length(call) != 3)
         FALSE
     else {
         sym <- call[[3]]
         if (is.character(sym))
             sym <- as.name(sym)
         if (is.name(sym)) {
             ci <- cb$putconst(call)
             csi <- cb$putconst(sym)
             cb$putcode(DUP2ND.OP)
             cb$putcode(DOLLAR.OP, ci, csi)
             cb$putcode(SWAP.OP)
             TRUE
         }
         else FALSE
     }
 })
 @ %def
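 As an illustration only, the stack effect of this three-instruction
 sequence can be traced symbolically in plain R.  The helpers
 [[dup2nd]], [[dollar]] and [[swap]] below are hypothetical stand-ins
 that operate on a character vector whose last element is the top of
 the stack; they are not part of the compiler or of the virtual machine.
 \begin{verbatim}
 ## Hypothetical symbolic model: the last element is the top of the stack.
 dup2nd <- function(s) c(s, s[length(s) - 1])   # push a copy of the 2nd value
 dollar <- function(s, sym) {                   # replace top by top$sym
     n <- length(s)
     s[n] <- paste0(s[n], "$", sym)
     s
 }
 swap <- function(s) {                          # exchange the top two values
     n <- length(s)
     s[c(n - 1, n)] <- s[c(n, n - 1)]
     s
 }

 s <- c("x", "rhs")    # left hand side value below, right hand side on top
 s <- dup2nd(s)        # "x" "rhs" "x"
 s <- dollar(s, "a")   # "x" "rhs" "x$a"
 s <- swap(s)          # "x" "x$a" "rhs"
 print(s)
 \end{verbatim}
 After the sequence the right hand side value is on top again, with the
 newly computed component just below it.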

 Calls to [[[]] and [[[[]] again need two instructions to support the
 internal dispatch.  The general pattern is implemented in
 [[cmpGetterDispatch]].  A [[DUP2ND]] instruction is used to place the
 first argument for the call on top of the stack, code analogous to the
 code for ordinary calls to [[[]] and [[[[]] is used to make the call,
 and this is followed by a [[SWAP]] instruction to rearrange the stack.
 <<[[cmpGetterDispatch]] function>>=
 cmpGetterDispatch <- function(start.op, dflt.op, call, cb, cntxt) {
     if (any.dots(call))
         FALSE ## punt
     else {
         ci <- cb$putconst(call)
         end.label <- cb$makelabel()
         cb$putcode(DUP2ND.OP)
         cb$putcode(start.op, ci, end.label)
         if (length(call) > 2) {
             args <- call[-(1:2)]
             cmpBuiltinArgs(args, names(args), cb, cntxt, TRUE)
         }
         cb$putcode(dflt.op)
         cb$putlabel(end.label)
         cb$putcode(SWAP.OP)
         TRUE
     }
 }
 @ %def cmpGetterDispatch
 The two inline handlers are then defined as
 <<getter inline handlers for [[[]] and [[[[]]>>=
 # **** this is now handled differently; see "Improved subset ..."
 # setGetterInlineHandler("[", function(call, cb, cntxt)
 #     cmpGetterDispatch(STARTSUBSET.OP, DFLTSUBSET.OP, call, cb, cntxt))

 # setGetterInlineHandler("[[", function(call, cb, cntxt)
 #     cmpGetterDispatch(STARTSUBSET2.OP, DFLTSUBSET2.OP, call, cb, cntxt))
 @ %def


 \section{Constant folding}
 A very valuable compiler optimization is constant folding. For
 example, an expression for computing a normal density function may
 include the code
 \begin{verbatim}
 1 / sqrt(2 * pi)
 \end{verbatim}
 The interpreter would have to evaluate this expression each time it is
 needed, but a compiler can often compute the value once at compile
 time.

 The constant folding optimization can be applied at various points in
 the compilation process: It can be applied to the source code before
 code generation or to the generated code in a separate optimization
 phase.  For now, constant folding is applied during the code
 generation phase.

 The [[constantFold]] function examines its expression argument and
 handles each expression type by calling an appropriate function.
 <<[[constantFold]] function>>=
 ## **** rewrite using switch??
 constantFold <- function(e, cntxt, loc = NULL) {
     type = typeof(e)
     if (type == "language")
         constantFoldCall(e, cntxt)
     else if (type == "symbol")
         constantFoldSym(e, cntxt)
     else if (type == "promise")
         cntxt$stop(gettext("cannot constant fold literal promises"),
                    cntxt, loc = loc)
     else if (type == "bytecode")
         cntxt$stop(gettext("cannot constant fold literal bytecode objects"),
                    cntxt, loc = loc)
     else checkConst(e)
 }
 @ %def constantFold
 %% **** warn and return NULL instead of calling stop??

 The [[checkConst]] function decides whether a value is a constant that
 is small enough and simple enough to enter into the constant pool.  If
 so, then [[checkConst]] wraps the value in a list as the [[value]]
 component.  If not, then [[NULL]] is returned.
 <<[[checkConst]] function>>=
 checkConst <- function(e) {
     if (mode(e) %in% constModes && length(e) <= maxConstSize)
         list(value = e)
     else
         NULL
 }
 @ %def checkConst
 The maximal size and acceptable modes are defined by
 <<[[maxConstSize]] and [[constModes]] definitions>>=
 maxConstSize <- 10

 constModes <- c("numeric", "logical", "NULL", "complex", "character")
 @ %def maxConstSize constModes
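 The following sketch repeats the three definitions above so that it
 can be run on its own, and shows the possible outcomes.  It suggests
 why the value is wrapped in a list: a folded [[NULL]] can then be told
 apart from a failure to fold.
 \begin{verbatim}
 ## Definitions copied from the chunks above to make the example standalone.
 maxConstSize <- 10
 constModes <- c("numeric", "logical", "NULL", "complex", "character")
 checkConst <- function(e) {
     if (mode(e) %in% constModes && length(e) <= maxConstSize)
         list(value = e)
     else
         NULL
 }

 str(checkConst(1:10))           # small numeric vector: accepted
 is.null(checkConst(1:11))       # longer than maxConstSize: rejected
 is.null(checkConst(quote(x)))   # a symbol has mode "name": rejected
 str(checkConst(NULL))           # NULL is acceptable: a list holding NULL
 \end{verbatim}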

 For now, constant folding is only applied for a particular set of
 variables and functions defined in the base package.  The constant
 folding code uses [[isBaseVar]] to determine whether a variable can be
 assumed to reference the corresponding base variable given the
 current compilation environment and optimization setting.
 [[constantFoldSym]] is applied to base variables in the [[constNames]]
 list.
 <<[[constantFoldSym]] function>>=
 ## Assumes all constants will be defined in base.
 ## Eventually allow other packages to define constants.
 ## Any variable with locked binding could be used if type is right.
 ## Allow local declaration of optimize, notinline declaration.
 constantFoldSym <- function(var, cntxt) {
     var <- as.character(var)
     if (var %in% constNames && isBaseVar(var, cntxt))
         checkConst(get(var, .BaseNamespaceEnv))
     else NULL
 }
 @ %def constantFoldSym
 <<[[constNames]] definition>>=
 constNames <- c("pi", "T", "F")
 @ %def constNames

 Call expressions are handled by determining whether the function
 called is eligible for constant folding, attempting to constant fold
 the arguments, and calling the folding function.  The result is then
 passed to [[checkConst]].  If an error or a warning occurs in the call to the
 folding function then [[constantFoldCall]] returns [[NULL]].
 <<[[constantFoldCall]] function>>=
 constantFoldCall <- function(e, cntxt) {
     fun <- e[[1]]
     if (typeof(fun) == "symbol") {
         ffun <- getFoldFun(fun, cntxt)
         if (! is.null(ffun)) {
             args <- as.list(e[-1])
             for (i in seq_along(args)) {
                 a <- args[[i]]
                 if (missing(a))
                     return(NULL)
                 val <- constantFold(a, cntxt)
                 if (! is.null(val))
                     args[i] <- list(val$value) ## **** in case value is NULL
                 else return(NULL)
             }
             modes <- unlist(lapply(args, mode))
             if (all(modes %in% constModes)) {
                 tryCatch(checkConst(do.call(ffun, args)),
                          error = function(e) NULL, warning = function(w) NULL)
                 ## **** issue warning??
             }
             else NULL
         }
         else NULL
     }
     else NULL
 }
 @ %def constantFoldCall
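 The error and warning handling can be seen in isolation in the
 following sketch.  The helper [[foldOrNULL]] is a hypothetical
 reduction of the [[tryCatch]] call above and omits the argument
 processing and the [[checkConst]] test; it is meant only to show which
 calls would be abandoned rather than folded.
 \begin{verbatim}
 ## Hypothetical helper: apply f to constant arguments, giving up on any
 ## error or warning, as the tryCatch call in constantFoldCall does.
 foldOrNULL <- function(f, args)
     tryCatch(list(value = do.call(f, args)),
              error = function(e) NULL, warning = function(w) NULL)

 foldOrNULL(sqrt, list(2))$value         # folded at compile time
 is.null(foldOrNULL(sqrt, list(-1)))     # TRUE: sqrt(-1) signals a warning
 is.null(foldOrNULL(log, list("a")))     # TRUE: log("a") signals an error
 \end{verbatim}
 Leaving such calls unfolded means the warning or error is signaled at
 run time, as it would be in interpreted code.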
 %% **** separate out and explain argument processing chunk (maybe also the call)

 The functions in the base package eligible for constant folding are
 <<[[foldFuns]] definition>>=
 foldFuns <- c("+", "-", "*", "/", "^", "(",
               ">", ">=", "==", "!=", "<", "<=", "||", "&&", "!",
               "|", "&", "%%",
               "c", "rep", ":",
               "abs", "acos", "acosh", "asin", "asinh", "atan", "atan2",
               "atanh", "ceiling", "choose", "cos", "cosh", "exp", "expm1",
               "floor", "gamma", "lbeta", "lchoose", "lgamma", "log", "log10",
               "log1p", "log2", "max", "min", "prod", "range", "round",
               "seq_along", "seq.int", "seq_len", "sign", "signif",
               "sin", "sinh", "sqrt", "sum", "tan", "tanh", "trunc",
               "baseenv", "emptyenv", "globalenv",
               "Arg", "Conj", "Im", "Mod", "Re",
               "is.R")
 @ %def foldFuns
 [[getFoldFun]] checks the called function against this list and
 whether the binding for the variable can be assumed to be from the
 base package. It then returns the appropriate function from the base
 package or [[NULL]].
 <<[[getFoldFun]] function>>=
 ## For now assume all foldable functions are in base
 getFoldFun <- function(var, cntxt) {
     var <- as.character(var)
     if (var %in% foldFuns && isBaseVar(var, cntxt)) {
         val <- get(var, .BaseNamespaceEnv)
         if (is.function(val))
             val
         else
             NULL
     }
     else NULL
 }
 @ %def getFoldFun


 \section{More top level functions}
 \subsection{Compiling closures}
 The function [[cmpfun]] is for compiling a closure.  The body is
 compiled with [[genCode]] and combined with the closure's formals and
 environment to form a compiled closure.  The [[.Internal]] function
 [[bcClose]] does this. Some additional fiddling is needed if the
 closure is an S4 generic.  The need for the [[asS4]] bit seems a bit
 odd but it is apparently needed at this point.
 <<[[cmpfun]] function>>=
 cmpfun <- function(f, options = NULL) {
     type <- typeof(f)
     if (type == "closure") {
         cntxt <- make.toplevelContext(makeCenv(environment(f)), options)
         ncntxt <- make.functionContext(cntxt, formals(f), body(f))
         if (mayCallBrowser(body(f), ncntxt))
             return(f)
         if (typeof(body(f)) != "language" || body(f)[1] != "{")
             loc <- list(expr = body(f), srcref = getExprSrcref(f))
         else
             loc <- NULL
         b <- genCode(body(f), ncntxt, loc = loc)
         val <- .Internal(bcClose(formals(f), b, environment(f)))
         attrs <- attributes(f)
         if (! is.null(attrs))
             attributes(val) <- attrs
         if (isS4(f)) ## **** should this really be needed??
             val <- asS4(val)
         val
     }
     else if (type == "builtin" || type == "special")
         f
     else stop("cannot compile a non-function")
 }
 @ %def cmpfun
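 A short usage sketch, assuming the [[compiler]] package as distributed
 with R is installed; the function [[f]] is only an example.
 \begin{verbatim}
 f <- function(x) x + 1
 fc <- compiler::cmpfun(f)       # compiled closure with the same behavior
 fc(1)

 ## Builtins and specials are returned unchanged.
 identical(compiler::cmpfun(sum), sum)

 ## Compiler options can be supplied for a single call.
 fc3 <- compiler::cmpfun(f, options = list(optimize = 3))
 fc3(1)
 \end{verbatim}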

 For use in compiling packages and in JIT compilation it is useful to
 have a variant that returns the uncompiled function if there is an
 error during compilation.
 <<[[tryCmpfun]] function>>=
 tryCmpfun <- function(f)
     tryCatch(cmpfun(f), error = function(e) {
         notifyCompilerError(paste(e$message, "at", deparse(e$call)))
         f
     })
 @ %def tryCmpfun

 A similar utility for expressions for use in JIT compilation of loops:
 <<[[tryCompile]] function>>=
 tryCompile <- function(e, ...)
     tryCatch(compile(e, ...), error = function(err) {
         notifyCompilerError(paste(err$message, "at", deparse(err$call)))
         e
     })
 @ %def tryCompile

 If a function contains a call to [[browser]], it should not be compiled,
 because the byte-code interpreter does not support command-by-command
 execution ("n"). This function explores the AST of a closure to find out if
 it may contain a call to [[browser]]:

 <<[[mayCallBrowser]] function>>=
 mayCallBrowser <- function(e, cntxt) {
     if (typeof(e) == "language") {
         fun <- e[[1]]
         if (typeof(fun) == "symbol") {
             fname <- as.character(fun)
             if (fname == "browser") ## not checking isBaseVar to err on the
                                     ## positive
                 TRUE
             else if (fname == "function" && isBaseVar(fname, cntxt))
                 FALSE
             else
                 mayCallBrowserList(e[-1], cntxt)
         }
         else
             mayCallBrowserList(e, cntxt)
     }
     else FALSE
 }
 @ %def mayCallBrowser

 A version that operates on a list of expressions is
 <<[[mayCallBrowserList]] function>>=
 mayCallBrowserList <- function(elist, cntxt) {
     for (a in as.list(elist))
         if (! missing(a) && mayCallBrowser(a, cntxt))
             return(TRUE)
     FALSE
 }
 @ %def mayCallBrowserList


 \subsection{Compiling and loading files}
 A file can be compiled with [[cmpfile]] and loaded with [[loadcmp]].
 [[cmpfile]] reads in the expressions, compiles them, and serializes
 the list of compiled expressions by calling the [[.Internal]] function
 [[save.to.file]].
 <<[[cmpfile]] function>>=
 cmpfile <- function(infile, outfile, ascii = FALSE, env = .GlobalEnv,
                     verbose = FALSE, options = NULL, version = NULL) {
     if (! is.environment(env) || ! identical(env, topenv(env)))
         stop("'env' must be a top level environment")
     <<create [[outfile]] if argument is missing>>
     <<check that [[infile]] and [[outfile]] are not the same>>
     forms <- parse(infile)
     nforms <- length(forms)
     srefs <- attr(forms, "srcref")
     if (nforms > 0) {
         expr.needed <- 1000
         expr.old <- getOption("expressions")
         if (expr.old < expr.needed) {
             options(expressions = expr.needed)
             on.exit(options(expressions = expr.old))
         }
         cforms <- vector("list", nforms)
         cenv <- makeCenv(env)
         cntxt <- make.toplevelContext(cenv, options)
         cntxt$env <- addCenvVars(cenv, findLocalsList(forms, cntxt))
         for (i in 1:nforms) {
             e <- forms[[i]]
             sref <- srefs[[i]]
             if (verbose) {
                 if (typeof(e) == "language" && e[[1]] == "<-" &&
                     typeof(e[[3]]) == "language" && e[[3]][[1]] == "function")
                     cat(paste0("compiling function \"", e[[2]], "\"\n"))
                 else
                     cat(paste("compiling expression", deparse(e, 20)[1],
                               "...\n"))
             }
             if (!mayCallBrowser(e, cntxt))
                 cforms[[i]] <- genCode(e, cntxt,
                                        loc = list(expr = e, srcref = sref))
         }
         cat(gettextf("saving to file \"%s\" ... ", outfile))
         .Internal(save.to.file(cforms, outfile, ascii, version))
         cat(gettext("done"), "\n", sep = "")
     }
     else warning("empty input file; no output written");
     invisible(NULL)
 }
 @ %def cmpfile

 The default output file name is the base name of the input file with a
 [[.Rc]] extension.
 <<create [[outfile]] if argument is missing>>=
 if (missing(outfile)) {
     basename <- sub("\\.[a-zA-Z0-9]+$", "", infile)
     outfile <- paste0(basename, ".Rc")
 }
 @ %def
 As a precaution it is useful to check that [[infile]] and [[outfile]]
 are not the same and signal an error if they are.
 <<check that [[infile]] and [[outfile]] are not the same>>=
 if (infile == outfile)
     stop("input and output file names are the same")
 @ %def

 The [[loadcmp]] function reads in the serialized list of expressions
 using the [[.Internal]] function [[load.from.file]].  The compiled
 expressions are then evaluated in the environment [[envir]], which
 defaults to the global environment.
 <<[[loadcmp]] function>>=
 loadcmp <- function (file, envir = .GlobalEnv, chdir = FALSE) {
     if (!(is.character(file) && file.exists(file)))
         stop(gettextf("file '%s' does not exist", file), domain = NA)
     exprs <- .Internal(load.from.file(file))
     if (length(exprs) == 0)
         return(invisible())
     if (chdir && (path <- dirname(file)) != ".") {
         owd <- getwd()
         on.exit(setwd(owd), add = TRUE)
         setwd(path)
     }
     for (i in exprs) {
         eval(i, envir)
     }
     invisible()
 }
 @ %def loadcmp
 [[loadcmp]] is the analog to [[source]] for compiled files.
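
 A round trip through the two functions might look as follows.  This is
 a sketch that assumes the exported [[cmpfile]] and [[loadcmp]] of the
 installed [[compiler]] package accept the arguments shown in the
 definitions above; the file names and the function [[sq]] are made up
 for the example.
 \begin{verbatim}
 src <- tempfile(fileext = ".R")
 out <- tempfile(fileext = ".Rc")
 writeLines("sq <- function(x) x * x", src)

 compiler::cmpfile(src, out)          # parse, compile and serialize

 e <- new.env()
 compiler::loadcmp(out, envir = e)    # evaluate the compiled forms in e
 e$sq(3)
 \end{verbatim}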

 Two additional functions that are currently not exported or used are
 [[cmpframe]] and [[cmplib]].  They should probably be removed.
 <<[[cmpframe]] function>>=
 cmpframe <- function(inpos, file) {
     expr.needed <- 1000
     expr.old <- getOption("expressions")
     if (expr.old < expr.needed)
        options(expressions = expr.needed)
     on.exit(options(expressions = expr.old))

     attach(NULL, name="<compiled>")
     inpos <- inpos + 1
     outpos <- 2
     on.exit(detach(pos=outpos), add=TRUE)

     for (f in ls(pos = inpos, all.names = TRUE)) {
         def <- get(f, pos = inpos)
         if (typeof(def) == "closure") {
                 cat(gettextf("compiling '%s'", f), "\n", sep = "")
                 fc <- cmpfun(def)
                 assign(f, fc, pos=outpos)
         }
     }
     cat(gettextf("saving to file \"%s\" ... ", file))
     save(list = ls(pos = outpos, all.names = TRUE), file = file)
     cat(gettext("done"), "\n", sep = "")
 }
 @ %def cmpframe

 <<[[cmplib]] function>>=
 cmplib <- function(package, file) {
     package <- as.character(substitute(package))
     pkgname <- paste("package", package, sep = ":")
     pos <- match(pkgname, search());
     if (missing(file))
         file <- paste0(package,".Rc")
     if (is.na(pos)) {
         library(package, character.only = TRUE)
         pos <- match(pkgname, search());
         on.exit(detach(pos=match(pkgname, search())))
     }
     cmpframe(pos, file)
 }
 @ %def cmplib


 \subsection{Enabling implicit compilation}
 <<[[enableJIT]] function>>=
 enableJIT <- function(level)
     .Internal(enableJIT(level))
 @ %def enableJIT
 <<[[compilePKGS]] function>>=
 compilePKGS <- function(enable)
     .Internal(compilePKGS(enable))
 @ %def compilePKGS


 \subsection{Setting compiler options}
 The [[setCompilerOptions]] function provides a means for users to
 adjust the default compiler option values.  This interface is
 experimental and may change.
 <<[[setCompilerOptions]] function>>=
 setCompilerOptions <- function(...) {
     options <- list(...)
     nm <- names(options)
     for (n in nm)
         if (! exists(n, compilerOptions))
             stop(gettextf("'%s' is not a valid compiler option", n),
                  domain = NA)
     old <- list()
     newOptions <- as.list(compilerOptions) # copy options
     for (n in nm) {
         op <- options[[n]]
         switch(n,
                optimize = {
                    op <- as.integer(op)
                    if (length(op) == 1 && 0 <= op && op <= 3) {
                        old <- c(old, list(optimize =
                                           compilerOptions$optimize))
                        newOptions$optimize <- op
                    }
                },
                suppressAll = {
                    if (identical(op, TRUE) || identical(op, FALSE)) {
                        old <- c(old, list(suppressAll =
                                           compilerOptions$suppressAll))
                        newOptions$suppressAll <- op
                    }
                },
                suppressNoSuperAssignVar = {
                    if (isTRUE(op) || isFALSE(op)) {
                        old <- c(old, list(
                            suppressNoSuperAssignVar =
                                compilerOptions$suppressNoSuperAssignVar))
                        newOptions$suppressNoSuperAssignVar <- op
                    }
                },
                suppressUndefined = {
                    if (identical(op, TRUE) || identical(op, FALSE) ||
                        is.character(op)) {
                        old <- c(old, list(suppressUndefined =
                                           compilerOptions$suppressUndefined))
                        newOptions$suppressUndefined <- op
                    }
                })
     }
     jitEnabled <- enableJIT(-1)
     if (checkCompilerOptions(jitEnabled, newOptions))
         for(n in names(newOptions)) # commit the new options
             assign(n, newOptions[[n]], compilerOptions)
     invisible(old)
 }
 @ %def
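 Since the previous values of the changed options are returned
 invisibly as a named list, a setting can be changed temporarily and
 then restored.  A minimal sketch, assuming the installed [[compiler]]
 package behaves as the definition above:
 \begin{verbatim}
 old <- compiler::setCompilerOptions(optimize = 3)
 str(old)                 # previous value of each option that was changed

 ## ... compile some code under the new default ...

 do.call(compiler::setCompilerOptions, old)   # restore previous values
 \end{verbatim}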

 For now, a [[.onLoad]] function is used to allow all warnings to be
 suppressed.  This is probably useful for building packages, since the
 way lazy loading is done means variables defined in shared libraries
 are not available and produce a raft of warnings.  The [[.onLoad]]
 function also allows undefined variables to be suppressed and the
 optimization level to be specified using environment variables.
 <<[[.onLoad]] function>>=
 .onLoad <- function(libname, pkgname) {
     envAsLogical <- function(varName) {
         value = Sys.getenv(varName)
         if (value == "")
             NA
         else
             switch(value,
                 "1"=, "TRUE"=, "true"=, "True"=, "yes"=, "Yes"= TRUE,
                 "0"=, "FALSE"=,"false"=,"False"=, "no"=, "No" = FALSE,
                 stop(gettextf("invalid environment variable value: %s==%s",
                     varName, value)))
     }
     val <- envAsLogical("R_COMPILER_SUPPRESS_ALL")
     if (!is.na(val))
         setCompilerOptions(suppressAll = val)
     val <- envAsLogical("R_COMPILER_SUPPRESS_UNDEFINED")
     if (!is.na(val))
         setCompilerOptions(suppressUndefined = val)
     val <- envAsLogical("R_COMPILER_SUPPRESS_NO_SUPER_ASSIGN_VAR")
     if (!is.na(val))
         setCompilerOptions(suppressNoSuperAssignVar = val)
     if (Sys.getenv("R_COMPILER_OPTIMIZE") != "")
         tryCatch({
             lev <- as.integer(Sys.getenv("R_COMPILER_OPTIMIZE"))
             if (0 <= lev && lev <= 3)
                 setCompilerOptions(optimize = lev)
         }, error = function(e) e, warning = function(w) w)
 }
 @ %def .onLoad

 When [[enableJIT]] is set to 3, loops should be compiled before executing.
 However, if the [[optimize]] option is set to 0 or 1, a compiled loop will
 call the same primitive function as is used by the AST interpreter (e.g.
 [[do_for]]), and the compilation would run into infinite recursion.
 [[checkCompilerOptions]] detects invalid combinations of [[enableJIT]]
 and [[optimize]] and signals an error.
 %% **** could also change the interface and atomically set enableJIT and optimize
 <<[[checkCompilerOptions]] function>>=
 checkCompilerOptions <- function(jitEnabled, options = NULL) {
     optimize <- getCompilerOption("optimize", options)
     if (jitEnabled <= 2 || optimize >= 2)
         TRUE
     else {
         stop(gettextf(
             "invalid compiler options: optimize(==%d)<2 and jitEnabled(==%d)>2",
             optimize, jitEnabled))
         FALSE
     }
 }
 @ %def checkCompilerOptions
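 The condition tested by [[checkCompilerOptions]] can be tabulated over
 all JIT and optimization levels.  The function [[validCombo]] below is
 a hypothetical, vectorized restatement of that condition and is not
 part of the compiler.
 \begin{verbatim}
 ## Hypothetical restatement of the test in checkCompilerOptions.
 validCombo <- function(jitEnabled, optimize)
     jitEnabled <= 2 | optimize >= 2

 tab <- outer(0:3, 0:3, validCombo)
 dimnames(tab) <- list(jit = as.character(0:3),
                       optimize = as.character(0:3))
 tab    # rows are JIT levels, columns are optimization levels
 \end{verbatim}
 The only rejected combinations are JIT level 3 together with
 optimization level 0 or 1.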


 \subsection{Disassembler}
 A minimal disassembler is provided by [[disassemble]]. This is
 primarily useful for debugging the compiler.  A more readable output
 representation might be nice to have. It would also probably make
 sense to give the result a class and write a print method.
 <<[[disassemble]] function>>=
 disassemble <- function(code) {
     .CodeSym <- as.name(".Code")
     disasm.const<-function(x)
         if (typeof(x)=="list" && length(x) > 0 && identical(x[[1]], .CodeSym))
             disasm(x) else x
     disasm <-function(code) {
         code[[2]]<-bcDecode(code[[2]])
         code[[3]]<-lapply(code[[3]], disasm.const)
         code
     }
     if (typeof(code)=="closure") {
         code <- .Internal(bodyCode(code))
         if (typeof(code) != "bytecode")
             stop("function is not compiled")
     }
     dput(disasm(.Internal(disassemble(code))))
 }
 @ %def disassemble
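 A usage sketch, assuming the installed [[compiler]] package exports
 [[disassemble]]; the exact form of the result may differ between R
 versions, so no output is shown.
 \begin{verbatim}
 fc <- compiler::cmpfun(function(x) x + 1)
 compiler::disassemble(fc)    # symbolic opcodes followed by the constant pool
 \end{verbatim}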

 The [[.Internal]] function [[disassemble]] extracts the numeric code
 vector and constant pool.  The function [[bcDecode]] uses the
 [[Opcodes.names]] array to translate the numeric opcodes into symbolic
 ones.  At this point not enough information is available in a
 reasonable place to also convert labels back to symbolic form.
 <<[[bcDecode]] function>>=
 bcDecode <- function(code) {
     n <- length(code)
     ncode <- vector("list", n)
     ncode[[1]] <- code[1] # version number
     i <- 2
     while (i <= n) {
         name<-Opcodes.names[code[i]+1]
         argc<-Opcodes.argc[[code[i]+1]]
         ncode[[i]] <- as.name(name)
         i<-i+1
         if (argc > 0)
             for (j in 1:argc) {
                 ncode[[i]]<-code[i]
                 i<-i+1
             }
     }
     ncode
 }
 @ %def bcDecode
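 The decoding loop can be tried on a made-up instruction set.  In the
 sketch below the tables [[toyNames]] and [[toyArgc]] and the opcode
 numbering are hypothetical, and the result is a character vector
 rather than a list; only the traversal mirrors [[bcDecode]].
 \begin{verbatim}
 ## Hypothetical opcode tables for opcodes 0, 1 and 2.
 toyNames <- c("RETURN", "LDCONST", "ADD")
 toyArgc  <- c(0L, 1L, 1L)

 toyDecode <- function(code) {
     out <- character(length(code))
     out[1] <- as.character(code[1])        # version number
     i <- 2
     while (i <= length(code)) {
         op <- code[i] + 1                  # opcodes are zero-based
         out[i] <- toyNames[op]
         nargs <- toyArgc[op]
         if (nargs > 0) {
             idx <- i + seq_len(nargs)      # operands follow the opcode
             out[idx] <- as.character(code[idx])
         }
         i <- i + 1 + nargs
     }
     out
 }

 decoded <- toyDecode(c(10L, 1L, 0L, 1L, 1L, 2L, 2L, 0L))
 print(decoded)
 \end{verbatim}
 Operands are copied through unchanged, so an operand that happens to
 equal an opcode number is not mistaken for an instruction.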


 \section{Improved subset and sub-assignment handling}
 This section describes changes that allow subset and subassign
 operations in most cases to be handled without allocating a list of the
 index arguments; the arguments are passed on the stack instead.
 The function [[cmpSubsetDispatch]] is analogous to [[cmpDispatch]]
 described above. The [[dflt.op]] argument passes information about the
 instruction to be emitted. For instructions designed for a particular
 number of arguments the [[rank]] component is [[FALSE]] and no index
 count is emitted; this is used for [[VECSUBSET.OP]] and
 [[MATSUBSET.OP]] instructions. If [[rank]] is [[TRUE]], then the
 number of indices is emitted as an operand; this is used by the
 [[SUBSET_N.OP]] instruction.
 <<[[cmpSubsetDispatch]] function>>=
 cmpSubsetDispatch <- function(start.op, dflt.op, e, cb, cntxt) {
     if (dots.or.missing(e) || ! is.null(names(e)) || length(e) < 3)
         cntxt$stop(gettext("cannot compile this expression"), cntxt,
                    loc = cb$savecurloc())
     else {
         oe <- e[[2]]
         if (missing(oe))
             cntxt$stop(gettext("cannot compile this expression"), cntxt,
                        loc = cb$savecurloc())
         ncntxt <- make.argContext(cntxt)
         ci <- cb$putconst(e)
         label <- cb$makelabel()
         cmp(oe, cb, ncntxt)
         cb$putcode(start.op, ci, label)
         indices <- e[-c(1, 2)]
         cmpIndices(indices, cb, ncntxt)
         if (dflt.op$rank) cb$putcode(dflt.op$code, ci, length(indices))
         else cb$putcode(dflt.op$code, ci)
         cb$putlabel(label)
         if (cntxt$tailcall) cb$putcode(RETURN.OP)
         TRUE
     }
 }
 @ %def cmpSubsetDispatch

 Index expressions are compiled by
 <<[[cmpIndices]] function>>=
 cmpIndices <- function(indices, cb, cntxt) {
     n <- length(indices)
     needInc <- FALSE
     for (i in seq_along(indices))
         if (i > 1 && checkNeedsInc(indices[[i]], cntxt)) {
             needInc <- TRUE
             break
         }
     for (i in seq_along(indices)) {
         cmp(indices[[i]], cb, cntxt, TRUE)
         if (needInc && i < n) cb$putcode(INCLNK.OP)
     }
     if (needInc) {
         if (n == 2) cb$putcode(DECLNK.OP)
         else if (n > 2) cb$putcode(DECLNK_N.OP, n - 1)
     }
 }
 @ %def cmpIndices
 This adds instructions to increment and later decrement link counts on
 previously computed values to prevent later computations from
 modifying earlier ones. Eventually it should be possible to eliminate
 some of these increment/decrement instructions in an optimization
 phase.
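
 The placement of these instructions can be sketched without the code
 buffer.  The function [[linkOps]] below is a hypothetical model that
 returns the instruction names as strings, with a placeholder for the
 code compiled for each index.
 \begin{verbatim}
 ## Hypothetical model of the instruction sequence from cmpIndices.
 linkOps <- function(n, needInc) {
     ops <- character(0)
     for (i in seq_len(n)) {
         ops <- c(ops, paste0("<index ", i, ">"))
         if (needInc && i < n) ops <- c(ops, "INCLNK")
     }
     if (needInc) {
         if (n == 2) ops <- c(ops, "DECLNK")
         else if (n > 2) ops <- c(ops, paste("DECLNK_N", n - 1))
     }
     ops
 }

 linkOps(3, TRUE)
 linkOps(2, TRUE)
 linkOps(3, FALSE)
 \end{verbatim}
 Every index value except the last is protected, and a single
 decrement instruction releases them once all indices are computed.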

 The subsetting handlers fall back to using [[cmpDispatch]] if there
 are any named arguments or if an error would need to be signaled (we
 could issue a compiler warning at this point as well). If all
 arguments are unnamed and there are no dots then [[cmpSubsetDispatch]]
 is used; the instruction emitted depends on the argument count.
 <<inline handlers for subsetting>>=
 setInlineHandler("[", function(e, cb, cntxt) {
     if (dots.or.missing(e) || ! is.null(names(e)) || length(e) < 3)
         cmpDispatch(STARTSUBSET.OP, DFLTSUBSET.OP, e, cb, cntxt) ## punt
     else {
         nidx <- length(e) - 2;
         if (nidx == 1)
             dflt.op <- list(code = VECSUBSET.OP, rank = FALSE)
         else if (nidx == 2)
             dflt.op <- list(code = MATSUBSET.OP, rank = FALSE)
         else
             dflt.op <- list(code = SUBSET_N.OP, rank = TRUE)
         cmpSubsetDispatch(STARTSUBSET_N.OP, dflt.op, e, cb, cntxt)
     }
 })

 setInlineHandler("[[", function(e, cb, cntxt) {
     if (dots.or.missing(e) || ! is.null(names(e)) || length(e) < 3)
         cmpDispatch(STARTSUBSET2.OP, DFLTSUBSET2.OP, e, cb, cntxt) ## punt
     else {
         nidx <- length(e) - 2;
         if (nidx == 1)
             dflt.op <- list(code = VECSUBSET2.OP, rank = FALSE)
         else if (nidx == 2)
             dflt.op <- list(code = MATSUBSET2.OP, rank = FALSE)
         else
             dflt.op <- list(code = SUBSET2_N.OP, rank = TRUE)
         cmpSubsetDispatch(STARTSUBSET2_N.OP, dflt.op, e, cb, cntxt)
     }
 })
 @
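 The choice of instruction depends only on the shape of the call, as
 the following standalone sketch shows.  The function [[subsetOp]] is
 hypothetical and returns instruction names as strings; it does not
 check for dots or missing arguments as the handlers do.
 \begin{verbatim}
 ## Hypothetical summary of the instruction selection for single-bracket
 ## subsetting calls.
 subsetOp <- function(e) {
     if (! is.null(names(e)) || length(e) < 3)
         return("DFLTSUBSET")               # punt to cmpDispatch
     nidx <- length(e) - 2
     if (nidx == 1) "VECSUBSET"
     else if (nidx == 2) "MATSUBSET"
     else "SUBSET_N"
 }

 subsetOp(quote(x[i]))
 subsetOp(quote(x[i, j]))
 subsetOp(quote(x[i, j, k]))
 subsetOp(quote(x[i, drop = FALSE]))        # named argument
 \end{verbatim}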

 Similarly, [[cmpSubassignDispatch]] is a variant of
 [[cmpSetterDispatch]] that passes index arguments on the stack and
 emits an index count if necessary.
 <<[[cmpSubassignDispatch]] function>>=
 cmpSubassignDispatch <- function(start.op, dflt.op, afun, place, call, cb,
                                  cntxt) {
     if (dots.or.missing(place) || ! is.null(names(place)) || length(place) < 3)
         cntxt$stop(gettext("cannot compile this expression"), cntxt,
                    loc = cb$savecurloc())
     else {
         ci <- cb$putconst(call)
         label <- cb$makelabel()
         cb$putcode(start.op, ci, label)
         indices <- place[-c(1, 2)]
         cmpIndices(indices, cb, cntxt)
         if (dflt.op$rank) cb$putcode(dflt.op$code, ci, length(indices))
         else cb$putcode(dflt.op$code, ci)
         cb$putlabel(label)
         TRUE
     }
 }
 @  %def cmpSubassignDispatch

 Again the handlers fall back to [[cmpSetterDispatch]] if there are
 named arguments or other complications.
 <<inline handlers for subassignment>>=
 setSetterInlineHandler("[<-", function(afun, place, origplace, call, cb, cntxt) {
     if (dots.or.missing(place) || ! is.null(names(place)) || length(place) < 3)
         cmpSetterDispatch(STARTSUBASSIGN.OP, DFLTSUBASSIGN.OP,
                           afun, place, call, cb, cntxt) ## punt
     else {
         nidx <- length(place) - 2
         if (nidx == 1)
             dflt.op <- list(code = VECSUBASSIGN.OP, rank = FALSE)
         else if (nidx == 2)
             dflt.op <- list(code = MATSUBASSIGN.OP, rank = FALSE)
         else
             dflt.op <- list(code = SUBASSIGN_N.OP, rank = TRUE)
         cmpSubassignDispatch(STARTSUBASSIGN_N.OP, dflt.op, afun, place, call,
                              cb, cntxt)
     }
 })

 setSetterInlineHandler("[[<-", function(afun, place, origplace, call, cb, cntxt) {
     if (dots.or.missing(place) || ! is.null(names(place)) || length(place) < 3)
         cmpSetterDispatch(STARTSUBASSIGN2.OP, DFLTSUBASSIGN2.OP,
                           afun, place, call, cb, cntxt) ## punt
     else {
         nidx <- length(place) - 2
         if (nidx == 1)
             dflt.op <- list(code = VECSUBASSIGN2.OP, rank = FALSE)
         else if (nidx == 2)
             dflt.op <- list(code = MATSUBASSIGN2.OP, rank = FALSE)
         else
             dflt.op <- list(code = SUBASSIGN2_N.OP, rank = TRUE)
         cmpSubassignDispatch(STARTSUBASSIGN2_N.OP, dflt.op, afun, place, call,
                              cb, cntxt)
     }
 })
 @

 Similarly, again, [[cmpSubsetGetterDispatch]] is a variant of
 [[cmpGetterDispatch]] that passes index arguments on the stack.
 <<[[cmpSubsetGetterDispatch]] function>>=
 cmpSubsetGetterDispatch <- function(start.op, dflt.op, call, cb, cntxt) {
     if (dots.or.missing(call) || ! is.null(names(call)) || length(call) < 3)
         cntxt$stop(gettext("cannot compile this expression"), cntxt,
                    loc = cb$savecurloc())
     else {
         ci <- cb$putconst(call)
         end.label <- cb$makelabel()
         cb$putcode(DUP2ND.OP)
         cb$putcode(start.op, ci, end.label)
         indices <- call[-c(1, 2)]
         cmpIndices(indices, cb, cntxt)
         if (dflt.op$rank)
             cb$putcode(dflt.op$code, ci, length(indices))
         else
             cb$putcode(dflt.op$code, ci)
         cb$putlabel(end.label)
         cb$putcode(SWAP.OP)
         TRUE
     }
 }
 @  %def cmpSubsetGetterDispatch

 And again the handlers fall back to [[cmpGetterDispatch]] if necessary.
 <<inline handlers for subset getters>>=
 setGetterInlineHandler("[", function(call, cb, cntxt) {
     if (dots.or.missing(call) || ! is.null(names(call)) || length(call) < 3)
         cmpGetterDispatch(STARTSUBSET.OP, DFLTSUBSET.OP, call, cb, cntxt)
     else {
         nidx <- length(call) - 2;
         if (nidx == 1)
             dflt.op <- list(code = VECSUBSET.OP, rank = FALSE)
         else if (nidx == 2)
             dflt.op <- list(code = MATSUBSET.OP, rank = FALSE)
         else
             dflt.op <- list(code = SUBSET_N.OP, rank = TRUE)
         cmpSubsetGetterDispatch(STARTSUBSET_N.OP, dflt.op, call, cb, cntxt)
     }
 })

 setGetterInlineHandler("[[", function(call, cb, cntxt) {
     if (dots.or.missing(call) || ! is.null(names(call)) || length(call) < 3)
         cmpGetterDispatch(STARTSUBSET2.OP, DFLTSUBSET2.OP, call, cb, cntxt)
     else {
         nidx <- length(call) - 2;
         if (nidx == 1)
             dflt.op <- list(code = VECSUBSET2.OP, rank = FALSE)
         else if (nidx == 2)
             dflt.op <- list(code = MATSUBSET2.OP, rank = FALSE)
         else
             dflt.op <- list(code = SUBSET2_N.OP, rank = TRUE)
         cmpSubsetGetterDispatch(STARTSUBSET2_N.OP, dflt.op, call, cb, cntxt)
     }
 })
 @
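The choice of default opcode in the two handlers above depends only on
the number of index arguments.  The following is a minimal sketch of
that selection, not the compiler's own code: the helper
[[pickSubsetOp]] and its [[dbl]] argument are hypothetical, and opcode
names are returned as strings rather than as the numeric constants.

```r
## Hypothetical helper, not part of the compiler: maps the number of
## index arguments to the name of the default opcode and a flag saying
## whether the index count has to be passed as an extra operand.
pickSubsetOp <- function(nidx, dbl = FALSE) {
    stopifnot(nidx >= 1)
    suffix <- if (dbl) "2" else ""
    if (nidx == 1)
        list(code = paste0("VECSUBSET", suffix, ".OP"), rank = FALSE)
    else if (nidx == 2)
        list(code = paste0("MATSUBSET", suffix, ".OP"), rank = FALSE)
    else
        list(code = paste0("SUBSET", suffix, "_N.OP"), rank = TRUE)
}

## The call x[i, j, k] has five elements: the function, the object,
## and three indices.
nidx <- length(quote(x[i, j, k])) - 2
op <- pickSubsetOp(nidx)
print(op$code)
print(op$rank)
```

Only the general case sets [[rank]], which is why
[[cmpSubsetGetterDispatch]] emits the index count as an operand only
for that case.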

 \section{Discussion and future directions}
 Despite its long gestation period this compiler should be viewed as a
 first pass at creating a byte code compiler for R.  The compiler
 itself is very simple in design as a single pass compiler with no
 separate optimization phases.  Similarly the virtual machine uses a
 very simple stack design.  While the compiler already achieves some
 useful performance improvements on loop-intensive code, more can be
 achieved with more sophisticated approaches.  This will be explored in
 future work.

 A major objective of this first version was to reproduce R's
 interpreted semantics with as few departures as possible while at the
 same time optimizing a number of aspects of the execution process. The
 inlining rules controlled by an optimization level setting seem to
 provide a good way of doing this, and the default optimization setting
 seems to be reasonably effective.  Mechanisms for adjusting the
 default settings via declarations will be explored and added in the
 near future.
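
One way to check informally that compiled code reproduces the
interpreted result is to compile the same function at each
optimization level and compare values.  The sketch below assumes the
exported [[cmpfun]] interface with its [[options]] argument; the test
function [[sf]] is only an illustration.

```r
library(compiler)

## A simple loop-intensive function used as the reference.
sf <- function(x) {
    s <- 0
    for (y in x)
        s <- s + y
    s
}

x <- 1:1000
expected <- sf(x)

## Compile at each optimization level and compare with the reference.
agree <- vapply(0:3, function(level) {
    sfc <- cmpfun(sf, options = list(optimize = level))
    identical(sfc(x), expected)
}, logical(1))
print(agree)
```

Agreement on one input is of course not a proof of equivalence; it is
only a quick sanity check.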

 Future versions of the compiler and the engine will explore a number
 of alternative designs.  Switching to a register-based virtual machine
 will be explored fairly soon. Preliminary experiments suggest that
 this can provide significant improvements in the case of tight loops
 by allowing allocation of intermediate results to be avoided in many
 cases.  It may be possible at least initially to keep the current
 compiler and just translate the stack-based machine code to a
 register-based code.

 Another direction that will be explored is whether sequences of
 arithmetic and other numerical operations can be fused and possibly
 vectorized. Again preliminary experiments are promising, but more
 exploration is needed.

 Other improvements to be examined may affect interpreted code as much
 as compiled code.  These include more efficient environment
 representations and more efficient calling conventions.

 %% **** add some benchmarks
 %% **** comment on engine

 %% **** lots of other builtins, specials, and .Internals
 %% **** controlling compiler warnings
 %% **** merging in codetools features
 %% **** put in install-time tests of assumptions about BUILTINs, etc.

 %% **** switch to register-based engine
 %% **** make function calls more efficient
 %% **** try to stay within a single bceval call

 %% **** think about optimizing things like mean?

 %% **** jit that compiles all expressions??
 %% **** can it record code for expr/env pairs or some such?
 %% **** can it inline primitives at that point?

 %% **** Stuff to think about:
 %% ****   alternate environment representation for compiler
 %% ****   optimizing function calls in general
 %% ****   avoiding matching for BOA calls

 %% **** Think about different ways of handling environments.  Should every op
 %% **** return a new env object that includes (possible) new local vars?

 %% **** Useful to be able to distinguish ... in args from assigned-to ...

 %% **** matrix subsetting has to be slow because of the way dim is stored.
 %% **** might make sense to explicitly compile as multiple operations and
 %% **** invariant hoisting  out of loops for loops?

 %% **** look at tail call optimization -- pass call and parent.frame??
 %% **** eliminating variables not needed?
 %% **** alternate builtin call implementations?

 %% **** install compiled promises in code body??

 %% **** catch errors at dispatching of inliners; fall back to runtime error


 \appendix
 \section{General utility functions}
 This appendix provides a few general utility functions.

 The utility function [[pasteExpr]] is used in the error messages.
 %% **** use ellipsis instead of collapse??
 %% **** use error context or catch errors?
 %% **** maybe don't need expression if we catch errors?
 <<[[pasteExpr]] function>>=
 pasteExpr <- function(e, prefix = "\n    ") {
     de <- deparse(e)
     if (length(de) == 1) sQuote(de)
     else paste(prefix, de, collapse = "")
 }
 @ %def pasteExpr
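As a small usage sketch, the definition is restated here so the
example runs on its own.  A short expression is quoted on one line; a
longer one is returned as a single string in which every deparsed
line is preceded by the prefix.

```r
pasteExpr <- function(e, prefix = "\n    ") {
    de <- deparse(e)
    if (length(de) == 1) sQuote(de)
    else paste(prefix, de, collapse = "")
}

## Deparses to a single line, so it is wrapped in quotes.
short <- pasteExpr(quote(x + y))

## A braced block deparses to several lines.
long <- pasteExpr(quote({ x <- 1; y <- 2 }))

cat("short:", short, "\n")
cat("long:", long, "\n")
```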

 The function [[dots.or.missing]] checks the argument list for any
 missing or [[...]] arguments:
 <<[[dots.or.missing]] function>>=
 dots.or.missing <- function(args) {
     for (i in seq_along(args)) {
         a <- args[[i]]
         if (missing(a)) return(TRUE) #**** better test?
         if (typeof(a) == "symbol" && a == "...") return(TRUE)
     }
     return(FALSE)
 }
 @ %def dots.or.missing

 The function [[any.dots]] is defined as
 <<[[any.dots]] function>>=
 any.dots <- function(args) {
     for (i in seq_along(args)) {
         a <- args[[i]]
         if (! missing(a) && typeof(a) == "symbol" && a == "...")
             return(TRUE)
     }
     return(FALSE)
 }
 @ %def any.dots
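The difference between the two predicates shows up on a call with an
empty argument.  In this sketch both functions are restated, written
with [[seq_along]], so that it is self-contained; the three sample
calls are illustrative only.

```r
dots.or.missing <- function(args) {
    for (i in seq_along(args)) {
        a <- args[[i]]
        if (missing(a)) return(TRUE)
        if (typeof(a) == "symbol" && a == "...") return(TRUE)
    }
    FALSE
}

any.dots <- function(args) {
    for (i in seq_along(args)) {
        a <- args[[i]]
        if (! missing(a) && typeof(a) == "symbol" && a == "...")
            return(TRUE)
    }
    FALSE
}

plain <- quote(f(x, y))      # ordinary arguments only
dots  <- quote(f(x, ...))    # contains a ... argument
empty <- quote(x[1, ])       # second index is empty

res <- rbind(
    dots.or.missing = c(dots.or.missing(plain), dots.or.missing(dots),
                        dots.or.missing(empty)),
    any.dots        = c(any.dots(plain), any.dots(dots), any.dots(empty)))
colnames(res) <- c("plain", "dots", "empty")
print(res)
```

Both functions report the [[...]] argument, but only
[[dots.or.missing]] reports the empty index.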

 The utility function [[is.ddsym]] is used to recognize symbols of the
 form [[..1]], [[..2]], and so on.
 <<[[is.ddsym]] function>>=
 is.ddsym <- function(name) {
     (is.symbol(name) || is.character(name)) &&
     length(grep("^\\.\\.[0-9]+$", as.character(name))) != 0
 }
 @ %def is.ddsym
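A few sample inputs illustrate the pattern; the definition is repeated
so the sketch is self-contained.

```r
is.ddsym <- function(name) {
    (is.symbol(name) || is.character(name)) &&
    length(grep("^\\.\\.[0-9]+$", as.character(name))) != 0
}

checks <- c(
    sym.dd1   = is.ddsym(quote(..1)),   # symbol of the form ..n
    chr.dd12  = is.ddsym("..12"),       # character strings are accepted too
    sym.dots  = is.ddsym(quote(...)),   # plain ... is not of the form ..n
    sym.other = is.ddsym(quote(..x)),   # non-numeric suffix
    number    = is.ddsym(1))            # neither a symbol nor a string
print(checks)
```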

 [[missingArgs]] takes an argument list for a call and returns a logical
 vector indicating for each argument whether it is empty (missing) or not.
 <<[[missingArgs]] function>>=
 missingArgs <- function(args) {
     val <- logical(length(args))
     for (i in seq_along(args)) {
         a <- args[[i]]
         if (missing(a))
             val[i] <- TRUE
         else
             val[i] <- FALSE
     }
     val
 }
 @ %def missingArgs
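For example, applied to the arguments of a call whose second argument
is empty (the definition is repeated so the sketch runs on its own,
and the sample call is illustrative):

```r
missingArgs <- function(args) {
    val <- logical(length(args))
    for (i in seq_along(args)) {
        a <- args[[i]]
        if (missing(a))
            val[i] <- TRUE
        else
            val[i] <- FALSE
    }
    val
}

## Drop the function position to leave only the arguments.
arglist <- as.list(quote(f(a, , b)))[-1]
miss <- missingArgs(arglist)
print(miss)
```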


 \section{Environment utilities}
 This appendix presents some utilities for computations on environments.

 The function [[frameTypes]] takes an environment argument and returns
 a character vector with elements for each frame in the environment
 classifying the frame as local, namespace, or global. The environment
 is assumed to be a standard evaluation environment that contains
 [[.GlobalEnv]] as one of its parents. It does this by computing the
 number of local, namespace, and global frames and then generating the
 result using [[rep]].
 <<[[frameTypes]] function>>=
 frameTypes <- function(env) {
     top <- topenv(env)
     empty <- emptyenv()
     <<find the number [[nl]] of local frames>>
     <<find the number [[nn]] of namespace frames>>
     <<find the number [[ng]] of global frames>>
     rep(c("local", "namespace", "global"), c(nl, nn, ng))
 }
 @ %def frameTypes
 The number of local frames is computed by marching down the parent
 frames with [[parent.env]] until the top level environment is reached.
 <<find the number [[nl]] of local frames>>=
 nl <- 0
 while (! identical(env, top)) {
     if (isNamespace(env))
         stop("namespace found within local environments")
     env <- parent.env(env)
     nl <- nl + 1
     if (identical(env, empty))
         stop("not a proper evaluation environment")
 }
 @ %def
 The number of namespace frames is computed by continuing down the
 parent frames until [[.GlobalEnv]] is reached.
 <<find the number [[nn]] of namespace frames>>=
 nn <- 0
 if (isNamespace(env)) {
     while (! identical(env, .GlobalEnv)) {
         if (!isNamespace(env)) {
             name <- attr(env, "name")
             if (!is.character(name) || !startsWith(name, "imports:"))
                 stop("non-namespace found within namespace environments")
         }
         env <- parent.env(env)
         nn <- nn + 1
         if (identical(env, empty))
             stop("not a proper evaluation environment")
     }
 }
 @ %def
 Finally the number of global frames is computed by continuing until
 the empty environment is reached.  An alternative would be to compute
 the length of the result returned by [[search]].
 <<find the number [[ng]] of global frames>>=
 ng <- 0
 while (! identical(env, empty)) {
     if (isNamespace(env))
         stop("namespace found within global environments")
     env <- parent.env(env)
     ng <- ng + 1
 }
 @
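To try the classification outside the compiler, the three chunks can
be assembled into one function by hand.  The sketch below does that
and applies it to a fresh environment whose parent is the global
environment and to the [[stats]] namespace; both inputs are chosen
only for illustration.

```r
frameTypes <- function(env) {
    top <- topenv(env)
    empty <- emptyenv()
    ## local frames: everything below the top level environment
    nl <- 0
    while (! identical(env, top)) {
        if (isNamespace(env))
            stop("namespace found within local environments")
        env <- parent.env(env)
        nl <- nl + 1
        if (identical(env, empty))
            stop("not a proper evaluation environment")
    }
    ## namespace frames: namespaces and their imports frames
    nn <- 0
    if (isNamespace(env)) {
        while (! identical(env, .GlobalEnv)) {
            if (!isNamespace(env)) {
                name <- attr(env, "name")
                if (!is.character(name) || !startsWith(name, "imports:"))
                    stop("non-namespace found within namespace environments")
            }
            env <- parent.env(env)
            nn <- nn + 1
            if (identical(env, empty))
                stop("not a proper evaluation environment")
        }
    }
    ## global frames: the global environment and the attached frames
    ng <- 0
    while (! identical(env, empty)) {
        if (isNamespace(env))
            stop("namespace found within global environments")
        env <- parent.env(env)
        ng <- ng + 1
    }
    rep(c("local", "namespace", "global"), c(nl, nn, ng))
}

ft.local <- frameTypes(new.env(parent = globalenv()))
ft.ns <- frameTypes(asNamespace("stats"))
print(table(ft.local))
print(table(ft.ns))
```

For the first input the result has one local entry followed by one
global entry per element of the search path.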

 The function [[findHomeNS]] takes a variable name and a namespace
 frame, or a namespace imports frame, and returns the namespace frame
 in which the variable was originally defined, if any. The code assumes
 that renaming has not been used (it may no longer be supported in the
 namespace implementation in any case). Just in case, an attempt is
 made to check for renaming.  The result returned is the namespace
 frame for the namespace in which the variable was defined or [[NULL]]
 if the variable was not defined in the specified namespace or one of
 its imports, or if the home namespace cannot be determined.
 <<[[findHomeNS]] function>>=
 ## Given a symbol name and a namespace environment (or a namespace
 ## imports environment) find the namespace in which the symbol's value
 ## was originally defined. Returns NULL if the symbol is not found via
 ## the namespace.
 findHomeNS <- function(sym, ns, cntxt) {
     <<if [[ns]] is an imports frame find the corresponding namespace>>
     if (exists(sym, ns, inherits = FALSE))
         ns
     else if (exists(".__NAMESPACE__.", ns, inherits = FALSE)) {
         <<search the imports for [[sym]]>>
         NULL
     }
     else NULL
 }
 @ %def findHomeNS

 If the [[ns]] argument is not a namespace frame it should be the
 imports frame of a namespace.  Such an imports frame should have a
 [[name]] attribute of the form [["imports:foo"]] if it is associated
 with namespace [["foo"]]. This is used to find the namespace frame
 that owns the imports frame in this case, and this frame is then
 assigned to [[ns]].
 <<if [[ns]] is an imports frame find the corresponding namespace>>=
 if (! isNamespace(ns)) {
     ## As a convenience this allows for 'ns' to be the imports frame
     ## of a namespace. It appears that these now have a 'name'
     ## attribute of the form 'imports:foo' if 'foo' is the
     ## namespace.
     name <- attr(ns, "name")
     if (is.null(name))
         cntxt$stop("'ns' must be a namespace or a namespace imports environment",
             cntxt)
     ns <- getNamespace(sub("imports:", "", attr(ns, "name")))
 }
 @ %def

 The imports are searched in reverse order since in the case of name
 conflicts the last one imported will take precedence.  Full imports
 via an [[import]] directive have to be handled differently than
 selective imports created with [[importFrom]] directives.
 <<search the imports for [[sym]]>>=
 imports <- get(".__NAMESPACE__.", ns)$imports
 for (i in rev(seq_along(imports))) {
     iname <- names(imports)[i]
     ins <- getNamespace(iname)
     if (identical(imports[[i]], TRUE)) {
         <<search in a full import>>
     }
     else {
         <<search in a selective import>>
     }
 }
 @ %def
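The shape of the imports specification can be inspected from the
outside with [[getNamespaceImports]].  The sketch below lists the
entries of the [[stats]] namespace in the same reverse order used
above and labels each one as a full or a selective import; the choice
of [[stats]] is only for illustration.

```r
imports <- getNamespaceImports("stats")

## A full import is recorded as TRUE; a selective import is a
## character vector of imported names.
kinds <- vapply(imports, function(entry)
    if (identical(entry, TRUE)) "full" else "selective",
    character(1))

for (i in rev(seq_along(imports)))
    cat(names(imports)[i], ":", kinds[i], "\n")
```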

 If an entry in the [[imports]] specification for the import source
 namespace [[ins]] has value [[TRUE]], then all exports of [[ins]]
 have been imported.  If [[sym]] is in the exports then the result of a
 recursive call to [[findHomeNS]] is returned.
 <<search in a full import>>=
 if (identical(ins, .BaseNamespaceEnv))
     exports <- .BaseNamespaceEnv
 else
     exports <- get(".__NAMESPACE__.", ins)$exports
 if (exists(sym, exports, inherits = FALSE))
     return(findHomeNS(sym, ins, cntxt))
 @ %def

 For selective imports the [[imports]] entry is a named character
 vector mapping export name to import name.  In the absence of renaming
 the names should match the values; if this is not the case [[NULL]] is
 returned. Otherwise, a match results again in returning a recursive
 call to [[findHomeNS]].
 <<search in a selective import>>=
 exports <- imports[[i]]
 pos <- match(sym, names(exports), 0)
 if (pos) {
     ## If renaming has been used things get too
     ## confusing so return NULL. (It is not clear if
     ## renaming is still supported by the
     ## namespace code.)
     if (sym == exports[pos])
         return(findHomeNS(sym, ins, cntxt))
     else
         return(NULL)
 }
 @

 Given a package frame from the global environment the function
 [[packFrameName]] returns the associated package name, which is
 computed from the [[name]] attribute.
 %% **** might be good to check the name is of the form package:foo
 <<[[packFrameName]] function>>=
 packFrameName <- function(frame) {
     fname <- attr(frame, "name")
     if (is.character(fname))
         sub("package:", "", fname)
     else if (identical(frame , baseenv()))
         "base"
     else ""
 }
 @ %def packFrameName

 For a namespace frame the function [[nsName]] retrieves the namespace
 name from the namespace information structure.
 <<[[nsName]] function>>=
 nsName <- function(ns) {
     if (identical(ns, .BaseNamespaceEnv))
         "base"
     else {
         name <- ns$.__NAMESPACE__.$spec["name"]
         if (is.character(name))
             as.character(name) ## strip off names
         else ""
     }
 }
 @ %def nsName
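The two name helpers can be exercised on an attached package frame and
on a namespace.  Both definitions are repeated so the sketch is
self-contained, and [[stats]] is used only as a convenient example.

```r
packFrameName <- function(frame) {
    fname <- attr(frame, "name")
    if (is.character(fname))
        sub("package:", "", fname)
    else if (identical(frame, baseenv()))
        "base"
    else ""
}

nsName <- function(ns) {
    if (identical(ns, .BaseNamespaceEnv))
        "base"
    else {
        name <- ns$.__NAMESPACE__.$spec["name"]
        if (is.character(name))
            as.character(name) ## strip off names
        else ""
    }
}

library(stats)   # make sure the package frame is on the search path

pkg.name  <- packFrameName(as.environment("package:stats"))
base.name <- packFrameName(baseenv())
ns.name   <- nsName(asNamespace("stats"))
bns.name  <- nsName(.BaseNamespaceEnv)
print(c(pkg.name, base.name, ns.name, bns.name))
```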


 \section{Experimental utilities}

 This section presents two experimental utilities that, for now, are
 not exported. The first is a simple byte code profiler. This requires
 that the file [[eval.c]] be compiled with [[BC_PROFILING]] enabled,
 which on [[gcc]]-compatible compilers will disable threaded code. The
 byte code profiler uses the profile timer to record the active byte
 code instruction at interrupt time. The function [[bcprof]] runs the
 profiler while evaluating its argument expression and returns a
 summary of the counts.

 <<[[bcprof]] function>>=
 bcprof <- function(expr) {
     .Internal(bcprofstart())
     expr
     .Internal(bcprofstop())
     val <- structure(.Internal(bcprofcounts()),
                      names = Opcodes.names)
     hits <- sort(val[val > 0], decreasing = TRUE)
     pct <- round(100 * hits / sum(hits), 1)
     data.frame(hits = hits, pct = pct)
 }
 @ %def bcprof

 The second utility is a simple interface to the code building
 mechanism that may help with experimenting with code optimizations.
 <<[[asm]] function>>=
 asm <- function(e, gen, env = .GlobalEnv, options = NULL) {
     cenv <- makeCenv(env)
     cntxt <- make.toplevelContext(cenv, options)
     cntxt$env <- addCenvVars(cenv, findLocals(e, cntxt))
     genCode(e, cntxt, gen = gen)
 }
 @ %def asm

 \section{Opcode constants}
 \subsection{Symbolic opcode names}
 <<opcode definitions>>=
 BCMISMATCH.OP <- 0
 RETURN.OP <- 1
 GOTO.OP <- 2
 BRIFNOT.OP <- 3
 POP.OP <- 4
 DUP.OP <- 5
 PRINTVALUE.OP <- 6
 STARTLOOPCNTXT.OP <- 7
 ENDLOOPCNTXT.OP <- 8
 DOLOOPNEXT.OP <- 9
 DOLOOPBREAK.OP <- 10
 STARTFOR.OP <- 11
 STEPFOR.OP <- 12
 ENDFOR.OP <- 13
 SETLOOPVAL.OP <- 14
 INVISIBLE.OP <- 15
 LDCONST.OP <- 16
 LDNULL.OP <- 17
 LDTRUE.OP <- 18
 LDFALSE.OP <- 19
 GETVAR.OP <- 20
 DDVAL.OP <- 21
 SETVAR.OP <- 22
 GETFUN.OP <- 23
 GETGLOBFUN.OP <- 24
 GETSYMFUN.OP <- 25
 GETBUILTIN.OP <- 26
 GETINTLBUILTIN.OP <- 27
 CHECKFUN.OP <- 28
 MAKEPROM.OP <- 29
 DOMISSING.OP <- 30
 SETTAG.OP <- 31
 DODOTS.OP <- 32
 PUSHARG.OP <- 33
 PUSHCONSTARG.OP <- 34
 PUSHNULLARG.OP <- 35
 PUSHTRUEARG.OP <- 36
 PUSHFALSEARG.OP <- 37
 CALL.OP <- 38
 CALLBUILTIN.OP <- 39
 CALLSPECIAL.OP <- 40
 MAKECLOSURE.OP <- 41
 UMINUS.OP <- 42
 UPLUS.OP <- 43
 ADD.OP <- 44
 SUB.OP <- 45
 MUL.OP <- 46
 DIV.OP <- 47
 EXPT.OP <- 48
 SQRT.OP <- 49
 EXP.OP <- 50
 EQ.OP <- 51
 NE.OP <- 52
 LT.OP <- 53
 LE.OP <- 54
 GE.OP <- 55
 GT.OP <- 56
 AND.OP <- 57
 OR.OP <- 58
 NOT.OP <- 59
 DOTSERR.OP <- 60
 STARTASSIGN.OP <- 61
 ENDASSIGN.OP <- 62
 STARTSUBSET.OP <- 63
 DFLTSUBSET.OP <- 64
 STARTSUBASSIGN.OP <- 65
 DFLTSUBASSIGN.OP <- 66
 STARTC.OP <- 67
 DFLTC.OP <- 68
 STARTSUBSET2.OP <- 69
 DFLTSUBSET2.OP <- 70
 STARTSUBASSIGN2.OP <- 71
 DFLTSUBASSIGN2.OP <- 72
 DOLLAR.OP <- 73
 DOLLARGETS.OP <- 74
 ISNULL.OP <- 75
 ISLOGICAL.OP <- 76
 ISINTEGER.OP <- 77
 ISDOUBLE.OP <- 78
 ISCOMPLEX.OP <- 79
 ISCHARACTER.OP <- 80
 ISSYMBOL.OP <- 81
 ISOBJECT.OP <- 82
 ISNUMERIC.OP <- 83
 VECSUBSET.OP <- 84
 MATSUBSET.OP <- 85
 VECSUBASSIGN.OP <- 86
 MATSUBASSIGN.OP <- 87
 AND1ST.OP <- 88
 AND2ND.OP <- 89
 OR1ST.OP <- 90
 OR2ND.OP <- 91
 GETVAR_MISSOK.OP <- 92
 DDVAL_MISSOK.OP <- 93
 VISIBLE.OP <- 94
 SETVAR2.OP <- 95
 STARTASSIGN2.OP <- 96
 ENDASSIGN2.OP <- 97
 SETTER_CALL.OP <- 98
 GETTER_CALL.OP <- 99
 SWAP.OP <- 100
 DUP2ND.OP <- 101
 SWITCH.OP <- 102
 RETURNJMP.OP <- 103
 STARTSUBSET_N.OP <- 104
 STARTSUBASSIGN_N.OP <- 105
 VECSUBSET2.OP <- 106
 MATSUBSET2.OP <- 107
 VECSUBASSIGN2.OP <- 108
 MATSUBASSIGN2.OP <- 109
 STARTSUBSET2_N.OP <- 110
 STARTSUBASSIGN2_N.OP <- 111
 SUBSET_N.OP <- 112
 SUBSET2_N.OP <- 113
 SUBASSIGN_N.OP <- 114
 SUBASSIGN2_N.OP <- 115
 LOG.OP <- 116
 LOGBASE.OP <- 117
 MATH1.OP <- 118
 DOTCALL.OP <- 119
 COLON.OP <- 120
 SEQALONG.OP <- 121
 SEQLEN.OP <- 122
 BASEGUARD.OP <- 123
 INCLNK.OP <- 124
 DECLNK.OP <- 125
 DECLNK_N.OP <- 126
 @

 \subsection{Instruction argument counts and names}
 <<opcode argument counts>>=
 Opcodes.argc <- list(
 BCMISMATCH.OP = 0,
 RETURN.OP = 0,
 GOTO.OP = 1,
 BRIFNOT.OP = 2,
 POP.OP = 0,
 DUP.OP = 0,
 PRINTVALUE.OP = 0,
 STARTLOOPCNTXT.OP = 2,
 ENDLOOPCNTXT.OP = 1,
 DOLOOPNEXT.OP = 0,
 DOLOOPBREAK.OP = 0,
 STARTFOR.OP = 3,
 STEPFOR.OP = 1,
 ENDFOR.OP = 0,
 SETLOOPVAL.OP = 0,
 INVISIBLE.OP = 0,
 LDCONST.OP = 1,
 LDNULL.OP = 0,
 LDTRUE.OP = 0,
 LDFALSE.OP = 0,
 GETVAR.OP = 1,
 DDVAL.OP = 1,
 SETVAR.OP = 1,
 GETFUN.OP = 1,
 GETGLOBFUN.OP = 1,
 GETSYMFUN.OP = 1,
 GETBUILTIN.OP = 1,
 GETINTLBUILTIN.OP = 1,
 CHECKFUN.OP = 0,
 MAKEPROM.OP = 1,
 DOMISSING.OP = 0,
 SETTAG.OP = 1,
 DODOTS.OP = 0,
 PUSHARG.OP = 0,
 PUSHCONSTARG.OP = 1,
 PUSHNULLARG.OP = 0,
 PUSHTRUEARG.OP = 0,
 PUSHFALSEARG.OP = 0,
 CALL.OP = 1,
 CALLBUILTIN.OP = 1,
 CALLSPECIAL.OP = 1,
 MAKECLOSURE.OP = 1,
 UMINUS.OP = 1,
 UPLUS.OP = 1,
 ADD.OP = 1,
 SUB.OP = 1,
 MUL.OP = 1,
 DIV.OP = 1,
 EXPT.OP = 1,
 SQRT.OP = 1,
 EXP.OP = 1,
 EQ.OP = 1,
 NE.OP = 1,
 LT.OP = 1,
 LE.OP = 1,
 GE.OP = 1,
 GT.OP = 1,
 AND.OP = 1,
 OR.OP = 1,
 NOT.OP = 1,
 DOTSERR.OP = 0,
 STARTASSIGN.OP = 1,
 ENDASSIGN.OP = 1,
 STARTSUBSET.OP = 2,
 DFLTSUBSET.OP = 0,
 STARTSUBASSIGN.OP = 2,
 DFLTSUBASSIGN.OP = 0,
 STARTC.OP = 2,
 DFLTC.OP = 0,
 STARTSUBSET2.OP = 2,
 DFLTSUBSET2.OP = 0,
 STARTSUBASSIGN2.OP = 2,
 DFLTSUBASSIGN2.OP = 0,
 DOLLAR.OP = 2,
 DOLLARGETS.OP = 2,
 ISNULL.OP = 0,
 ISLOGICAL.OP = 0,
 ISINTEGER.OP = 0,
 ISDOUBLE.OP = 0,
 ISCOMPLEX.OP = 0,
 ISCHARACTER.OP = 0,
 ISSYMBOL.OP = 0,
 ISOBJECT.OP = 0,
 ISNUMERIC.OP = 0,
 VECSUBSET.OP = 1,
 MATSUBSET.OP = 1,
 VECSUBASSIGN.OP = 1,
 MATSUBASSIGN.OP = 1,
 AND1ST.OP = 2,
 AND2ND.OP = 1,
 OR1ST.OP = 2,
 OR2ND.OP = 1,
 GETVAR_MISSOK.OP = 1,
 DDVAL_MISSOK.OP = 1,
 VISIBLE.OP = 0,
 SETVAR2.OP = 1,
 STARTASSIGN2.OP = 1,
 ENDASSIGN2.OP = 1,
 SETTER_CALL.OP = 2,
 GETTER_CALL.OP = 1,
 SWAP.OP = 0,
 DUP2ND.OP = 0,
 SWITCH.OP = 4,
 RETURNJMP.OP = 0,
 STARTSUBSET_N.OP = 2,
 STARTSUBASSIGN_N.OP = 2,
 VECSUBSET2.OP = 1,
 MATSUBSET2.OP = 1,
 VECSUBASSIGN2.OP = 1,
 MATSUBASSIGN2.OP = 1,
 STARTSUBSET2_N.OP = 2,
 STARTSUBASSIGN2_N.OP = 2,
 SUBSET_N.OP = 2,
 SUBSET2_N.OP = 2,
 SUBASSIGN_N.OP = 2,
 SUBASSIGN2_N.OP = 2,
 LOG.OP = 1,
 LOGBASE.OP = 1,
 MATH1.OP = 2,
 DOTCALL.OP = 2,
 COLON.OP = 1,
 SEQALONG.OP = 1,
 SEQLEN.OP = 1,
 BASEGUARD.OP = 2,
 INCLNK.OP = 0,
 DECLNK.OP = 0,
 DECLNK_N.OP = 1
 )
 @

 <<opcode names>>=
 Opcodes.names <- names(Opcodes.argc)
 @ %def Opcodes.names
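Because [[Opcodes.argc]] is written in opcode order starting at zero,
the name of opcode $k$ is found at position $k + 1$ of
[[Opcodes.names]], and the argument count tells a decoder how many
operands to skip.  The following is a minimal sketch of that idea on a
three-entry excerpt of the tables.  The function [[decodeSketch]] and
the instruction stream are hypothetical, the stream ignores any header
that real code objects may carry, and this is not the compiler's
[[bcDecode]].

```r
## Excerpt of the tables above: opcode numbers and operand counts.
opcodes <- c(RETURN.OP = 1, LDCONST.OP = 16, ADD.OP = 44)
argc    <- c(RETURN.OP = 0, LDCONST.OP = 1, ADD.OP = 1)

## Hypothetical instruction stream: each opcode is followed by its
## operands (constant pool indices in this case).
code <- c(16, 0, 16, 1, 44, 2, 1)

decodeSketch <- function(code, opcodes, argc) {
    out <- character()
    pc <- 1
    while (pc <= length(code)) {
        name <- names(opcodes)[match(code[pc], opcodes)]
        if (is.na(name))
            stop("unknown opcode ", code[pc])
        out <- c(out, name)
        ## skip the opcode itself and its operands
        pc <- pc + 1 + unname(argc[name])
    }
    out
}

decoded <- decodeSketch(code, opcodes, argc)
print(decoded)
```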


 \section{Implementation file}
 %% Benchmark code:
 % sf <- function(x) {
 %     s <- 0
 %     for (y in x)
 %         s <- s + y
 %     s
 % }

 % sfc <- cmpfun(sf)

 % x <- 1 : 1000000

 % system.time(sf(x))
 % system.time(sfc(x))


 %% **** need header/copyright/license stuff here
 <<cmp.R>>=
 #  Automatically generated from ../noweb/compiler.nw.
 #
 #  File src/library/compiler/R/cmp.R
 #  Part of the R package, https://www.R-project.org
 #  Copyright (C) 2001-2014 Luke Tierney
 #
 #  This program is free software; you can redistribute it and/or modify
 #  it under the terms of the GNU General Public License as published by
 #  the Free Software Foundation; either version 2 of the License, or
 #  (at your option) any later version.
 #
 #  This program is distributed in the hope that it will be useful,
 #  but WITHOUT ANY WARRANTY; without even the implied warranty of
 #  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 #  GNU General Public License for more details.
 #
 #  A copy of the GNU General Public License is available at
 #  https://www.R-project.org/Licenses/

 ##
 ## Compiler options
 ##

 <<compiler options data base>>

 <<[[getCompilerOption]] function>>


 ##
 ## General Utilities
 ##

 <<[[pasteExpr]] function>>

 <<[[dots.or.missing]] function>>

 <<[[any.dots]] function>>

 <<[[is.ddsym]] function>>


 <<[[missingArgs]] function>>


 ##
 ## Environment utilities
 ##

 <<[[frameTypes]] function>>

 <<[[findHomeNS]] function>>

 <<[[packFrameName]] function>>

 <<[[nsName]] function>>


 ##
 ## Finding possible local variables
 ##

 <<[[getAssignedVar]] function>>

 <<[[findLocals1]] function>>

 <<[[findLocalsList1]] function>>

 <<[[findLocals]] function>>

 <<[[findLocalsList]] function>>


 ##
 ## Compilation environment implementation
 ##

 <<[[makeCenv]] function>>

 <<[[addCenvVars]] function>>

 <<[[addCenvFrame]] function>>

 <<[[findCenvVar]] function>>

 <<[[isBaseVar]] function>>

 <<[[funEnv]] function>>

 <<[[findLocVar]] function>>

 <<[[findFunDef]] function>>

 <<[[findVar]] function>>


 ##
 ## Constant folding
 ##

 <<[[maxConstSize]] and [[constModes]] definitions>>

 <<[[constNames]] definition>>

 <<[[checkConst]] function>>

 <<[[constantFoldSym]] function>>

 <<[[getFoldFun]] function>>

 <<[[constantFoldCall]] function>>

 <<[[constantFold]] function>>

 <<[[foldFuns]] definition>>

 <<[[languageFuns]] definition>>


 ##
 ## Opcode constants
 ##

 <<opcode argument counts>>

 <<opcode names>>

 <<opcode definitions>>


 ##
 ## Code buffer implementation
 ##

 <<source location tracking functions>>

 <<[[make.codeBuf]] function>>

 <<[[codeBufCode]] function>>

 <<[[genCode]] function>>


 ##
 ## Compiler contexts
 ##

 <<[[make.toplevelContext]] function>>

 <<[[make.callContext]] function>>

 <<[[make.promiseContext]] function>>

 <<[[make.functionContext]] function>>

 <<[[make.nonTailCallContext]] function>>

 <<[[make.argContext]] function>>

 <<[[make.noValueContext]] function>>

 <<[[make.loopContext]] function>>


 ##
 ## Compiler top level
 ##

 <<[[cmp]] function>>

 <<[[cmpConst]] function>>

 <<[[cmpSym]] function>>

 <<[[cmpCall]] function>>

 <<[[cmpCallSymFun]] function>>

 <<[[cmpCallExprFun]] function>>

 <<[[cmpCallArgs]] function>>

 <<[[cmpConstArg]]>>

 <<[[checkCall]] function>>

 ## **** need to handle ... and ..n arguments specially
 ## **** separate call opcode for calls with named args?
 ## **** for (a in e[[-1]]) ... goes into infinite loop

 <<[[cmpTag]] function>>

 <<[[mayCallBrowser]] function>>

 <<[[mayCallBrowserList]] function>>

 ##
 ## Inlining mechanism
 ##

 <<inline handler implementation>>

 ## tryInline implements the rule permitting inlining as they stand now:
 ## Inlining is controlled by the optimize compiler option, with possible
 ## values 0, 1, 2, 3.

 <<[[getInlineInfo]] function>>

 <<[[tryInline]] function>>


 ##
 ## Inline handlers for some SPECIAL functions
 ##

 <<inlining handler for [[function]]>>

 <<inlining handler for left brace function>>

 <<inlining handler for [[if]]>>

 <<inlining handler for [[&&]]>>

 <<inlining handler for [[||]]>>


 ##
 ## Inline handlers for assignment expressions
 ##

 <<setter inlining mechanism>>

 <<getter inlining mechanism>>

 <<[[cmpAssign]] function>>

 <<[[flattenPlace]] function>>

 <<[[cmpGetterCall]] function>>

 <<[[checkAssign]] function>>

 <<[[cmpSymbolAssign]] function>>

 <<[[cmpComplexAssign]] function>>

 <<[[cmpSetterCall]] function>>

 <<[[getAssignFun]] function>>

 <<[[cmpSetterDispatch]] function>>

 <<inlining handlers for [[<-]], [[=]], and [[<<-]]>>

 <<setter inline handler for [[$<-]]>>

 <<setter inline handlers for [[ [<- ]] and [[ [[<- ]]>>

 <<[[cmpGetterDispatch]] function>>

 <<getter inline handler for [[$]]>>

 <<getter inline handlers for [[[]] and [[[[]]>>


 ##
 ## Inline handlers for loops
 ##

 <<inlining handlers for [[next]] and [[break]]>>

 <<[[isLoopStopFun]] function>>

 <<[[isLoopTopFun]] function>>

 <<[[checkSkipLoopCntxtList]] function>>

 <<[[checkSkipLoopCntxt]] function>>

 <<inlining handler for [[repeat]] loops>>

 <<[[cmpRepeatBody]] function>>

 <<inlining handler for [[while]] loops>>

 <<[[cmpWhileBody]] function>>

 <<inlining handler for [[for]] loops>>

 <<[[cmpForBody]] function>>


 ##
 ## Inline handlers for one and two argument primitives
 ##

 <<[[cmpPrim1]] function>>

 <<[[checkNeedsInc]] function>>

 <<[[cmpPrim2]] function>>

 <<inline handlers for [[+]] and [[-]]>>

 <<inline handlers for [[*]] and [[/]]>>

 <<inline handlers for [[^]], [[exp]], and [[sqrt]]>>

 <<inline handler for [[log]]>>

 <<list of one argument math functions>>

 <<[[cmpMath1]] function>>

 <<inline one argument math functions>>

 <<inline handlers for comparison operators>>

 <<inline handlers for [[&]] and [[|]]>>

 <<inline handler for [[!]]>>


 ##
 ## Inline handlers for the left parenthesis function
 ##

 <<inlining handler for [[(]]>>


 ##
 ## Inline handlers for general BUILTIN and SPECIAL functions
 ##

 <<[[cmpBuiltin]] function>>

 <<[[cmpBuiltinArgs]] function>>

 <<[[cmpSpecial]] function>>

 <<inlining handler for [[.Internal]]>>


 ##
 ## Inline handlers for subsetting and related operators
 ##

 <<[[cmpDispatch]] function>>

 <<inlining handlers for some dispatching SPECIAL functions>>

 <<inlining handler for [[$]]>>


 ##
 ## Inline handler for local() and return() functions
 ##

 <<inlining handler for [[local]] function>>

 <<inlining handler for [[return]] function>>


 ##
 ## Inline handlers for the family of is.xyz primitives
 ##

 <<[[cmpIs]] function>>

 <<inlining handlers for [[is.xyz]] functions>>


 ##
 ## Default inline handlers for BUILTIN and SPECIAL functions
 ##

 <<install default inlining handlers>>


 ##
 ## Inline handlers for some .Internal functions
 ##

 <<[[simpleFormals]] function>>

 <<[[simpleArgs]] function>>

 <<[[is.simpleInternal]] function>>

 <<[[inlineSimpleInternalCall]] function>>

 <<[[cmpSimpleInternal]] function>>

 <<inline safe simple [[.Internal]] functions from [[base]]>>

 <<inline safe simple [[.Internal]] functions from [[stats]]>>


 ##
 ## Inline handler for switch
 ##

 <<[[findActionIndex]] function>>

 <<inline handler for [[switch]]>>


 ##
 ## Inline handler for .Call
 ##

 <<inline handler for [[.Call]]>>


 ##
 ## Inline handlers for generating integer sequences
 ##

 <<inline handlers for integer sequences>>


 ##
 ## Inline handlers to control warnings
 ##

 <<[[cmpMultiColon]] function>>

 <<inlining handlers for [[::]] and [[:::]]>>

 <<setter inlining handler for [[@<-]]>>

 <<inlining handler for [[with]]>>

 <<inlining handler for [[require]]>>


 ##
 ## Compiler warnings
 ##

 <<[[suppressAll]] function>>

 <<[[suppressNoSuperAssignVar]] function>>

 <<[[suppressUndef]] function>>

 <<[[notifyLocalFun]] function>>

 <<[[notifyUndefFun]] function>>

 <<[[notifyUndefVar]] function>>

 <<[[notifyNoSuperAssignVar]] function>>

 <<[[notifyWrongArgCount]] function>>

 <<[[notifyWrongDotsUse]] function>>

 <<[[notifyWrongBreakNext]] function>>

 <<[[notifyBadCall]] function>>

 <<[[notifyBadAssignFun]] function>>

 <<[[notifyMultipleSwitchDefaults]] function>>

 <<[[notifyNoSwitchcases]] function>>

 <<[[notifyAssignSyntacticFun]] function>>

 <<[[notifyCompilerError]] function>>


 ##
 ## Compiler interface
 ##

 <<[[compile]] function>>

 <<[[cmpfun]] function>>

 <<[[tryCmpfun]] function>>

 <<[[tryCompile]] function>>

 <<[[cmpframe]] function>>

 <<[[cmplib]] function>>

 <<[[cmpfile]] function>>

 <<[[loadcmp]] function>>

 <<[[enableJIT]] function>>

 <<[[compilePKGS]] function>>

 <<[[setCompilerOptions]] function>>

 <<[[.onLoad]] function>>

 <<[[checkCompilerOptions]] function>>


 ##
 ## Disassembler
 ##

 <<[[bcDecode]] function>>

 <<[[disassemble]] function>>


 ##
 ## Experimental Utilities
 ##

 <<[[bcprof]] function>>

 <<[[asm]] function>>


 ##
 ## Improved subset and subassign handling
 ##

 <<[[cmpIndices]] function>>

 <<[[cmpSubsetDispatch]] function>>

 <<inline handlers for subsetting>>

 <<[[cmpSubassignDispatch]] function>>

 <<inline handlers for subassignment>>

 <<[[cmpSubsetGetterDispatch]] function>>

 <<inline handlers for subset getters>>
 @
 \end{document}