| \input texinfo |
| @c %**start of header |
| @setfilename R-exts.info |
| @settitle Writing R Extensions |
| @setchapternewpage on |
| @c %**end of header |
| |
| @c @documentencoding ISO-8859-1 |
| |
| @c Put the functions in the variable index |
| @syncodeindex fn vr |
| |
| @dircategory Programming |
| @direntry |
| * R Extensions: (R-exts). Writing R Extensions. |
| @end direntry |
| |
| @finalout |
| |
| @include R-defs.texi |
| @include version.texi |
| |
| @copying |
| This manual is for R, version @value{VERSION}. |
| |
| @Rcopyright{1999} |
| |
| @quotation |
| @permission{} |
| @end quotation |
| @end copying |
| |
| @titlepage |
| @title Writing R Extensions |
| @subtitle Version @value{VERSION} |
| @author R Core Team |
| @page |
| @vskip 0pt plus 1filll |
| @insertcopying |
| @end titlepage |
| |
| @ifplaintext |
| @insertcopying |
| @end ifplaintext |
| |
| @c @ifnothtml |
| @contents |
| @c @end ifnothtml |
| |
| @ifnottex |
| @node Top, Acknowledgements, (dir), (dir) |
| @top Writing R Extensions |
| |
| This is a guide to extending @R{}, describing the process of creating |
| @R{} add-on packages, writing @R{} documentation, @R{}'s system and |
| foreign language interfaces, and the @R{} @acronym{API}. |
| |
| @insertcopying |
| |
| @end ifnottex |
| |
| @menu |
| * Acknowledgements:: |
| * Creating R packages:: |
| * Writing R documentation files:: |
| * Tidying and profiling R code:: |
| * Debugging:: |
| * System and foreign language interfaces:: |
| * The R API:: |
| * Generic functions and methods:: |
| * Linking GUIs and other front-ends to R:: |
| * Function and variable index:: |
| * Concept index:: |
| @end menu |
| |
| @node Acknowledgements, Creating R packages, Top, Top |
| @unnumbered Acknowledgements |
| |
| |
| The contributions to early versions of this manual by Saikat DebRoy |
| (who wrote the first draft of a guide to using @code{.Call} and |
| @code{.External}) and Adrian Trapletti (who provided information on the |
| C++ interface) are gratefully acknowledged. |
| |
| @node Creating R packages, Writing R documentation files, Acknowledgements, Top |
| @chapter Creating R packages |
| @cindex Packages |
| @cindex Creating packages |
| |
| Packages provide a mechanism for loading optional code, data and |
| documentation as needed. The @R{} distribution itself includes about 30 |
| packages. |
| |
| In the following, we assume that you know the @code{library()} command, |
| including its @code{lib.loc} argument, and we also assume basic |
| knowledge of the @command{R CMD INSTALL} utility. Otherwise, please |
| look at @R{}'s help pages on |
| |
| @example |
| ?library |
| ?INSTALL |
| @end example |
| |
| @noindent |
| before reading on. |
| |
| For packages which contain code to be compiled, a computing environment |
| including a number of tools is assumed; the ``R Installation and |
| Administration'' manual describes what is needed for each OS. |
| |
| Once a source package is created, it must be installed by |
| the command @code{R CMD INSTALL}. |
| @ifset UseExternalXrefs |
| @xref{Add-on packages, , Add-on-packages, |
| R-admin, R Installation and Administration}. |
| @end ifset |
| |
| Other types of extensions are supported (but rare): see @ref{Package types}. |
| |
| Some notes on terminology complete this introduction. These will help |
| with the reading of this manual, and also in describing concepts |
| accurately when asking for help. |
| |
| A @emph{package} is a directory of files which extend @R{}, a |
| @emph{source package} (the master files of a package), or a tarball |
| containing the files of a source package, or an @emph{installed} |
| package, the result of running @command{R CMD INSTALL} on a source |
| package. On some platforms (notably macOS and Windows) there are also |
| @emph{binary packages}, a zip file or tarball containing the files of an |
| installed package which can be unpacked rather than installing from |
| sources. |
| |
| A package is @strong{not}@footnote{although this is a persistent |
| mis-usage. It seems to stem from S, whose analogues of @R{}'s packages |
| were officially known as @emph{library sections} and later as |
| @emph{chapters}, but almost always referred to as @emph{libraries}.} a |
| @emph{library}. The latter is used in two senses in @R{} documentation. |
| |
| @itemize |
| |
| @item |
| A directory into which packages are installed, e.g.@: |
| @file{/usr/lib/R/library}: in that sense it is sometimes referred to as |
| a @emph{library directory} or @emph{library tree} (since the library is |
| a directory which contains packages as directories, which themselves |
| contain directories). |
| |
| @item |
| That used by the operating system, as a shared, dynamic or static |
| library or (especially on Windows) a DLL, where the second L stands for |
| `library'. Installed packages may contain compiled code in what is |
| known on Unix-alikes as a @emph{shared object} and on Windows as a DLL. |
| The concept of a @emph{shared library} (@emph{dynamic library} on macOS) |
| as a collection of compiled code to which a package might link is also |
| used, especially for @R{} itself on some platforms. On most platforms |
| these concepts are interchangeable (shared objects and DLLs can both be |
| loaded into the @R{} process and be linked against), but macOS |
| distinguishes between shared objects (extension @file{.so}) and dynamic |
| libraries (extension @file{.dylib}). |
| |
| @end itemize |
| |
| There are a number of well-defined operations on source packages. |
| |
| @itemize |
| |
| @item |
| The most common is @emph{installation} which takes a source package and |
| installs it in a library using @command{R CMD INSTALL} or |
| @code{install.packages}. |
| |
| @item |
| Source packages can be @emph{built}. This involves taking a source |
| directory and creating a tarball ready for distribution, including |
| cleaning it up and creating PDF documentation from any @emph{vignettes} |
| it may contain. Source packages (and most often tarballs) can be |
| @emph{checked}, when a test installation is done and tested (including |
| running its examples); also, the contents of the package are tested in |
| various ways for consistency and portability. |
| |
| @item |
| @emph{Compilation} is not a correct term for a package. Installing a |
| source package which contains C, C++ or Fortran code will involve |
| compiling that code. There is also the possibility of `byte' compiling |
| the @R{} code in a package (using the facilities of package |
| @pkg{compiler}): nowadays this is enabled by default for all |
| packages. So @emph{compiling} a package may come to mean byte-compiling |
| its @R{} code. |
| |
| @item |
| It used to be unambiguous to talk about @emph{loading} an installed |
| package using @code{library()}, but since the advent of package |
| namespaces this has been less clear: people now often talk about |
| @emph{loading} the package's namespace and then @emph{attaching} the |
| package so it becomes visible on the search path. Function |
| @code{library} performs both steps, but a package's namespace can be |
| loaded without the package being attached (for example by calls like |
| @code{splines::ns}). |
| |
| @end itemize |
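| |
| The distinction between loading and attaching can be seen in a short |
| sketch. It assumes a fresh @R{} session in which @pkg{splines} has not |
| yet been attached. The call through @code{::} loads the namespace only, |
| and @code{library} then attaches the package to the search path. |
| |
| @example |
| # splines::ns loads the namespace of splines but does not attach it |
| basis <- splines::ns(1:10, df = 3) |
| loaded   <- isNamespaceLoaded("splines") |
| attached <- "package:splines" %in% search() |
| print(c(loaded = loaded, attached = attached)) |
| |
| # library() loads the namespace if needed and also attaches the package |
| library(splines) |
| attached_after <- "package:splines" %in% search() |
| print(attached_after) |
| @end example |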
| |
| The concept of @emph{lazy loading} of code or data is mentioned at |
| several points. This is part of the installation, always selected for |
| @R{} code but optional for data. When used the @R{} objects of the |
| package are created at installation time and stored in a database in the |
| @file{R} directory of the installed package, being loaded into the |
| session at first use. This makes the @R{} session start up faster and |
| use less (virtual) memory. |
| @ifset UseExternalXrefs |
| (For technical details, |
| @pxref{Lazy loading, , Lazy loading, R-ints, R Internals}.) |
| @end ifset |
| |
| @cindex CRAN |
| @acronym{CRAN} is a network of WWW sites holding the @R{} distributions |
| and contributed code, especially @R{} packages. Users of @R{} are |
| encouraged to join in the collaborative project and to submit their own |
| packages to @acronym{CRAN}: current instructions are linked from |
| @uref{https://CRAN.R-project.org/@/banner.shtml#submitting}. |
| |
| |
| @menu |
| * Package structure:: |
| * Configure and cleanup:: |
| * Checking and building packages:: |
| * Writing package vignettes:: |
| * Package namespaces:: |
| * Writing portable packages:: |
| * Diagnostic messages:: |
| * Internationalization:: |
| * CITATION files:: |
| * Package types:: |
| * Services:: |
| @end menu |
| |
| @node Package structure, Configure and cleanup, Creating R packages, Creating R packages |
| @section Package structure |
| @cindex Package structure |
| |
| The sources of an @R{} package consist of a subdirectory containing the |
| files @file{DESCRIPTION} and @file{NAMESPACE}, and the subdirectories |
| @file{R}, @file{data}, @file{demo}, @file{exec}, @file{inst}, |
| @file{man}, @file{po}, @file{src}, @file{tests}, @file{tools} and |
| @file{vignettes} (some of which can be missing, but which should not be |
| empty). The package subdirectory may also contain files @file{INDEX}, |
| @file{configure}, @file{cleanup}, @file{LICENSE}, @file{LICENCE} and |
| @file{NEWS}. Other files such as @file{INSTALL} (for non-standard |
| installation instructions), @file{README}/@file{README.md}@footnote{This |
| seems to be commonly used for a file in `markdown' format. Be aware |
| that most users of @R{} will not know that, nor know how to view such a |
| file: platforms such as macOS and Windows do not have a default viewer |
| set in their file associations. The @acronym{CRAN} package web pages |
| render such files in @HTML{}: the converter used expects the file to be |
| encoded in UTF-8.}, or @file{ChangeLog} will be ignored by @R{}, but may |
| be useful to end users. The utility @command{R CMD build} may add files |
| in a @file{build} directory (but this should not be used for other |
| purposes). |
| |
| Except where specifically mentioned,@footnote{currently, top-level files |
| @file{.Rbuildignore} and @file{.Rinstignore}, and |
| @file{vignettes/.install_extras}.} packages should not contain |
| Unix-style `hidden' files/directories (that is, those whose name starts |
| with a dot). |
| |
| The @file{DESCRIPTION} and @file{INDEX} files are described in the |
| subsections below. The @file{NAMESPACE} file is described in the |
| section on @ref{Package namespaces}. |
| |
| @cindex configure file |
| @cindex cleanup file |
| |
| The optional files @file{configure} and @file{cleanup} are (Bourne) |
| shell scripts which are, respectively, executed before and (if option |
| @option{--clean} was given) after installation on Unix-alikes, see |
| @ref{Configure and cleanup}. The analogues on Windows are |
| @file{configure.win} and @file{cleanup.win}. |
| |
| For the conventions for files @file{NEWS} and @file{ChangeLog} in the |
| @acronym{GNU} project see |
| @uref{https://www.gnu.org/@/prep/@/standards/@/standards.html#Documentation}. |
| |
| The package subdirectory should be given the same name as the package. |
| Because some file systems (e.g., those on Windows and by default on |
| macOS) are not case-sensitive, to maintain portability it is strongly |
| recommended that case distinctions not be used to distinguish different |
| packages. For example, if you have a package named @file{foo}, do not |
| also create a package named @file{Foo}. |
| |
| To ensure that file names are valid across file systems and supported |
| operating systems, the @acronym{ASCII} control characters as well as the |
| characters @samp{"}, @samp{*}, @samp{:}, @samp{/}, @samp{<}, @samp{>}, |
| @samp{?}, @samp{\}, and @samp{|} are not allowed in file names. In |
| addition, files with names @samp{con}, @samp{prn}, @samp{aux}, |
| @samp{clock$}, @samp{nul}, @samp{com1} to @samp{com9}, and @samp{lpt1} |
| to @samp{lpt9} after conversion to lower case and stripping possible |
| ``extensions'' (e.g., @samp{lpt5.foo.bar}), are disallowed. Also, file |
| names in the same directory must not differ only by case (see the |
| previous paragraph). In addition, the basenames of @samp{.Rd} files may |
| be used in URLs and so must be @acronym{ASCII} and not contain @code{%}. |
| For maximal portability filenames should contain only |
| @acronym{ASCII} characters not excluded already (that is |
| @code{A-Za-z0-9._!#$%&+,;=@@^()@{@}'[]}; we exclude space as many |
| utilities do not accept spaces in file paths): non-English alphabetic |
| characters cannot be guaranteed to be supported in all locales. It |
| would be good practice to avoid the shell metacharacters |
| @code{()@{@}'[]$~}: @code{~} is also used as part of `8.3' filenames on |
| Windows. In addition, packages are normally distributed as tarballs, |
| and these have a limit on path lengths: for maximal portability 100 |
| bytes. |
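| |
| Several of these rules can be checked mechanically. The sketch below |
| uses a hypothetical helper @code{name_problems} (it is not part of @R{}, |
| and it is not what @command{R CMD check} runs). The helper flags |
| disallowed characters, reserved names, names in one directory that |
| differ only by case, and names longer than 100 bytes. It looks at bare |
| file names only, whereas the tarball limit applies to the full path. |
| |
| @example |
| # name_problems() is a hypothetical helper covering some of the rules |
| # above; it is not the check performed by R CMD check. |
| bad_chars <- c('"', "*", ":", "/", "<", ">", "?", "\\", "|") |
| reserved_names <- c("con", "prn", "aux", "clock$", "nul", |
|                     paste0("com", 1:9), paste0("lpt", 1:9)) |
| |
| name_problems <- function(files) data.frame( |
|   file = files, |
|   # ASCII control characters or one of the disallowed characters |
|   bad_char = grepl("[[:cntrl:]]", files) | |
|     vapply(strsplit(files, ""), function(ch) any(ch %in% bad_chars), |
|            logical(1)), |
|   # reserved names, after lower-casing and stripping all "extensions" |
|   reserved = sub("\\..*$", "", tolower(files)) %in% reserved_names, |
|   # names in the same directory that differ only by case |
|   case_clash = duplicated(tolower(files)) | |
|     duplicated(tolower(files), fromLast = TRUE), |
|   # byte length of the bare name only (not the full tarball path) |
|   too_long = nchar(files, type = "bytes") > 100, |
|   stringsAsFactors = FALSE |
| ) |
| |
| files <- c("foo.R", "Foo.R", "lpt5.foo.bar", "what?.Rd", "helpers.R") |
| print(name_problems(files)) |
| @end example |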
| |
| A source package if possible should not contain binary executable files: |
| they are not portable, and a security risk if they are of the |
| appropriate architecture. @command{R CMD check} will warn about |
| them@footnote{false positives are possible, but only a handful have been |
| seen so far.} unless they are listed (one filepath per line) in a file |
| @file{BinaryFiles} at the top level of the package. Note that |
| @acronym{CRAN} will not accept submissions containing binary files |
| even if they are listed. |
| |
| The @R{} function @code{package.skeleton} can help to create the |
| structure for a new package: see its help page for details. |
| |
| @menu |
| * The DESCRIPTION file:: |
| * Licensing:: |
| * Package Dependencies:: |
| * The INDEX file:: |
| * Package subdirectories:: |
| * Data in packages:: |
| * Non-R scripts in packages:: |
| * Specifying URLs:: |
| @end menu |
| |
| @node The DESCRIPTION file, Licensing, Package structure, Package structure |
| @subsection The @file{DESCRIPTION} file |
| @cindex DESCRIPTION file |
| |
| The @file{DESCRIPTION} file contains basic information about the package |
| in the following format: |
| |
| @quotation |
| @cartouche |
| @smallexample |
| Package: pkgname |
| Version: 0.5-1 |
| Date: 2015-01-01 |
| Title: My First Collection of Functions |
| Authors@@R: c(person("Joe", "Developer", role = c("aut", "cre"), |
| email = "Joe.Developer@@some.domain.net"), |
| person("Pat", "Developer", role = "aut"), |
| person("A.", "User", role = "ctb", |
| email = "A.User@@whereever.net")) |
| Author: Joe Developer [aut, cre], |
| Pat Developer [aut], |
| A. User [ctb] |
| Maintainer: Joe Developer <Joe.Developer@@some.domain.net> |
| Depends: R (>= 3.1.0), nlme |
| Suggests: MASS |
| Description: A (one paragraph) description of what |
| the package does and why it may be useful. |
| License: GPL (>= 2) |
| URL: https://www.r-project.org, http://www.another.url |
| BugReports: https://pkgname.bugtracker.url |
| @end smallexample |
| @end cartouche |
| @end quotation |
| |
| @noindent |
| The format is that of a version of a `Debian Control File' (see the help |
| for @samp{read.dcf} and |
| @uref{https://www.debian.org/@/doc/@/debian-policy/@/ch-controlfields.html}: |
| @R{} does not require encoding in UTF-8 and does not support comments |
| starting with @samp{#}). Fields start with an @acronym{ASCII} name |
| immediately followed by a colon: the value starts after the colon and a |
| space. Continuation lines (for example, for descriptions longer than |
| one line) start with a space or tab. Field names are case-sensitive: |
| all those used by @R{} are capitalized. |
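| |
| Because the file uses this format, it can be read with @code{read.dcf}. |
| The following minimal sketch writes a cut-down version of the example |
| above to a temporary file, so the field values are illustrative only. |
| It reads the file back and lists which mandatory fields the record |
| lacks. The second @samp{Description} line is a continuation line |
| because it starts with whitespace. |
| |
| @example |
| # Write a small DESCRIPTION-style file to a temporary location |
| desc_lines <- c( |
|   "Package: pkgname", |
|   "Version: 0.5-1", |
|   "Title: My First Collection of Functions", |
|   "Description: A (one paragraph) description of what", |
|   "  the package does and why it may be useful.", |
|   "License: GPL (>= 2)" |
| ) |
| path <- tempfile("DESCRIPTION") |
| writeLines(desc_lines, path) |
| |
| # read.dcf() returns a character matrix with one row per record |
| desc <- read.dcf(path) |
| print(desc[1, c("Package", "Version")]) |
| |
| # Mandatory fields that are absent from this record |
| mandatory <- c("Package", "Version", "License", "Description", |
|                "Title", "Author", "Maintainer") |
| missing_fields <- setdiff(mandatory, colnames(desc)) |
| print(missing_fields) |
| @end example |
| |
| @noindent |
| Here @samp{Author} and @samp{Maintainer} are reported as missing. As |
| described below, they may be omitted when an @samp{Authors@@R} field is |
| supplied. |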
| |
| For maximal portability, the @file{DESCRIPTION} file should be written |
| entirely in @acronym{ASCII} --- if this is not possible it must contain |
| an @samp{Encoding} field (see below). |
| |
| Several optional fields take @emph{logical values}: these can be |
| specified as @samp{yes}, @samp{true}, @samp{no} or @samp{false}: |
| capitalized values are also accepted. |
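| |
| A sketch of how such a value might be interpreted, using a hypothetical |
| helper @code{as_dcf_logical}; it is not an @R{} function. The helper is |
| deliberately lenient and accepts any capitalization. Values it does not |
| recognize become @code{NA}. |
| |
| @example |
| # as_dcf_logical() is a hypothetical helper: yes/true/no/false in any |
| # capitalization map to TRUE/FALSE, anything else to NA. |
| as_dcf_logical <- function(x) |
|   unname(c(yes = TRUE, true = TRUE, no = FALSE, false = FALSE)[ |
|     tolower(trimws(x))]) |
| |
| as_dcf_logical(c("yes", "True", "NO", "false", "maybe")) |
| @end example |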
| |
| The @samp{Package}, @samp{Version}, @samp{License}, @samp{Description}, |
| @samp{Title}, @samp{Author}, and @samp{Maintainer} fields are mandatory; |
| all other fields are optional. Fields @samp{Author} and |
| @samp{Maintainer} can be auto-generated from @samp{Authors@@R}, and may |
| be omitted if the latter is provided: however if they are not |
| @acronym{ASCII} we recommend that they are provided. |
| |
| @c DESCRIPTION field Package |
| The mandatory @samp{Package} field gives the name of the package. This |
| should contain only (@acronym{ASCII}) letters, numbers and dot, have at |
| least two characters and start with a letter and not end in a dot. If |
| it needs explaining, this should be done in the @samp{Description} field |
| (and not the @samp{Title} field). |
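| |
| The naming rule can be expressed as a single regular expression. The |
| helper @code{valid_pkg_name} below is hypothetical, and the pattern is |
| one possible encoding of the rule, not necessarily the one @R{} uses |
| internally. |
| |
| @example |
| # valid_pkg_name() is a hypothetical helper: ASCII letters, digits and |
| # '.', at least two characters, starting with a letter and not ending |
| # in a dot. |
| valid_pkg_name <- function(x) |
|   grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", x, perl = TRUE) |
| |
| valid_pkg_name(c("ab", "data.table", "a", "a.", "1ab", "my_pkg")) |
| @end example |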
| |
| @c DESCRIPTION field Version |
| The mandatory @samp{Version} field gives the version of the package. |
| This is a sequence of at least @emph{two} (and usually three) |
| non-negative integers separated by single @samp{.} or @samp{-} |
| characters. The canonical form is as shown in the example, and a |
| version such as @samp{0.01} or @samp{0.01.0} will be handled as if it |
| were @samp{0.1-0}. It is @strong{not} a decimal number, so for example |
| @code{0.9 < 0.75} since @code{9 < 75}. |
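| |
| The base @R{} function @code{package_version} compares versions |
| component by component in this way. A short illustration: |
| |
| @example |
| # Components are compared as integers, not as a decimal number |
| as_versions <- package_version("0.9") < package_version("0.75")  # TRUE |
| as_numbers  <- 0.9 < 0.75                                        # FALSE |
| |
| # '.' and '-' are both separators, and leading zeros are dropped |
| same_version <- package_version("0.5-1") == package_version("0.5.1") |
| canonical    <- as.character(package_version("0.01"))  # "0.1" |
| print(c(as_versions = as_versions, as_numbers = as_numbers, |
|         same_version = same_version)) |
| print(canonical) |
| @end example |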
| |
| The mandatory @samp{License} field is discussed in the next subsection. |
| |
| @c DESCRIPTION field Title |
| The mandatory @samp{Title} field should give a @emph{short} description |
| of the package. Some package listings may truncate the title to 65 |
| characters. It should use @emph{title case} (that is, use capitals for |
| the principal words: @code{tools::toTitleCase} can help you with this), |
| not use any markup, not have any continuation lines, and not end in a |
| period (unless part of @dots{}). Do not repeat the package name: it is |
| often used prefixed by the name. Refer to other packages and external |
| software in single quotes, and to book titles (and similar) in double |
| quotes. |
| |
| @c DESCRIPTION field Description |
| The mandatory @samp{Description} field should give a |
| @emph{comprehensive} description of what the package does. One can use |
| several (complete) sentences, but only one paragraph. It should be |
| intelligible to all the intended readership (e.g.@: for a @acronym{CRAN} |
| package to all @acronym{CRAN} users). It is good practice not to start |
| with the package name, `This package' or similar. As with the |
| @samp{Title} field, double quotes should be used for quotations |
| (including titles of books and articles), and single quotes for |
| non-English usage, including names of other packages and external |
| software. This field should also be used for explaining the package |
| name if necessary. URLs should be enclosed in angle brackets, e.g.@: |
| @samp{<https://www.r-project.org>}: see also @ref{Specifying URLs}. |
| |
| @c DESCRIPTION field Author |
| @c DESCRIPTION field Authors@R |
| The mandatory @samp{Author} field describes who wrote @emph{the |
| package}. It is a plain text field intended for human readers, but not |
| for automatic processing (such as extracting the email addresses of all |
| listed contributors: for that use @samp{Authors@@R}). Note that all |
| significant contributors must be included: if you wrote an @R{} wrapper |
| for the work of others included in the @file{src} directory, you are not |
| the sole (and maybe not even the main) author. |
| |
| @c DESCRIPTION field Maintainer |
| The mandatory @samp{Maintainer} field should give a @emph{single} name |
| followed by a @emph{valid} (RFC 2822) email address in angle brackets. It |
| should not end in a period or comma. This field is what is reported by |
| the @code{maintainer} function and used by @code{bug.report}. For a |
| @acronym{CRAN} package it should be a @emph{person}, not a mailing list |
| and not a corporate entity: do ensure that it is valid and will remain |
| valid for the lifetime of the package. |
| |
| Note that the @emph{display name} (the part before the address in angle |
| brackets) should be enclosed in double quotes if it contains |
| non-alphanumeric characters such as comma or period. (The current |
| standard, RFC 5322, allows periods but RFC 2822 did not.) |
| |
| Both @samp{Author} and @samp{Maintainer} fields can be omitted if a |
| suitable @samp{Authors@@R} field is given. This field can be used to |
| provide a refined and machine-readable description of the package |
| ``authors'' (in particular specifying their precise @emph{roles}), @emph{via} |
| suitable @R{} code. It should create an object of class @code{"person"}, |
| by either a call to @code{person} or a series of calls (one per |
| ``author'') concatenated by @code{c()}: see the example |
| @file{DESCRIPTION} file above. The roles can include @samp{"aut"} |
| (author) for full authors, @samp{"cre"} (creator) for the package |
| maintainer, @samp{"ctb"} (contributor) for other contributors and |
| @samp{"cph"} (copyright holder), among others. See @code{?person} for |
| more information. Note that no role is assumed by default. |
| Auto-generated package citation information takes advantage of this |
| specification. The @samp{Author} and @samp{Maintainer} fields are |
| auto-generated from it if needed when building@footnote{at least if this |
| is done in a locale which matches the package encoding.} or installing. |
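| |
| A sketch of such a value built with @code{person}, mirroring the example |
| @file{DESCRIPTION} file above; e-mail addresses are omitted for brevity. |
| The call to @code{format} only approximates how an @samp{Author} line |
| can be derived from it. The exact text generated when building or |
| installing may differ. |
| |
| @example |
| # Person objects like those in the example DESCRIPTION file above |
| authors <- c(person("Joe", "Developer", role = c("aut", "cre")), |
|              person("Pat", "Developer", role = "aut"), |
|              person("A.", "User", role = "ctb")) |
| |
| # One formatted entry per person, with the roles in brackets |
| entries <- format(authors, include = c("given", "family", "role")) |
| cat(paste(entries, collapse = ",\n"), "\n") |
| @end example |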
| |
| @findex COPYRIGHTS |
| @c DESCRIPTION field Copyright |
| An optional @samp{Copyright} field can be used where the copyright |
| holder(s) are not the authors. If necessary, this can refer to an |
| installed file: the convention is to use file @file{inst/COPYRIGHTS}. |
| |
| @c DESCRIPTION field Date |
| The optional @samp{Date} field gives the @emph{release date} of the |
| current version of the package. It is strongly recommended@footnote{and |
| required by @acronym{CRAN}, so checked by @command{R CMD check |
| --as-cran}.} to use the @samp{yyyy-mm-dd} format conforming to the ISO |
| 8601 standard. |
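| |
| One way to test a value against this format is a round trip through |
| @code{as.Date}. The helper @code{is_iso_date} below is hypothetical. It |
| accepts only valid dates written exactly as @samp{yyyy-mm-dd}, with a |
| zero-padded month and day. |
| |
| @example |
| # is_iso_date() is a hypothetical helper: parse with the ISO format, |
| # then require that formatting the date reproduces the input exactly. |
| is_iso_date <- function(x) |
|   !is.na(as.Date(x, format = "%Y-%m-%d")) & |
|     (format(as.Date(x, format = "%Y-%m-%d")) == x) |
| |
| is_iso_date(c("2015-01-01", "2015-1-1", "01/01/2015")) |
| @end example |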
| |
| The @samp{Depends}, @samp{Imports}, @samp{Suggests}, @samp{Enhances}, |
| @samp{LinkingTo} and @samp{Additional_repositories} fields are discussed |
| in a later subsection. |
| |
| @c DESCRIPTION field SystemRequirements |
| Dependencies external to the @R{} system should be listed in the |
| @samp{SystemRequirements} field, possibly amplified in a separate |
| @file{README} file. |
| |
| @c DESCRIPTION field URL |
| The @samp{URL} field may give a list of @acronym{URL}s |
| separated by commas or whitespace, for example the homepage of the |
| author or a page where additional material describing the software can |
| be found. These @acronym{URL}s are converted to active hyperlinks in |
| @acronym{CRAN} package listings. @xref{Specifying URLs}. |
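| |
| A minimal sketch of splitting such a field into individual |
| @acronym{URL}s, treating any run of commas and whitespace as a |
| separator. The tools that process this field may parse it differently. |
| |
| @example |
| # Split a URL field value on commas and/or whitespace |
| url_field <- "https://www.r-project.org, http://www.another.url" |
| urls <- strsplit(url_field, "[,[:space:]]+")[[1]] |
| print(urls) |
| @end example |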
| |
| @c DESCRIPTION field BugReports |
| The @samp{BugReports} field may contain a single @acronym{URL} to which |
| bug reports about the package should be submitted. This @acronym{URL} |
| will be used by @code{bug.report} instead of sending an email to the |
| maintainer. A browser is opened for a @samp{http://} or @samp{https://} |
| @acronym{URL}. To specify another email address for bug reports, use |
| @samp{Contact} instead: however @code{bug.report} will try to extract an |
| email address (preferably from a @samp{mailto:} URL or enclosed in angle |
| brackets) from @samp{BugReports}. |
| |
| @c DESCRIPTION field Priority |
| Base and recommended packages (i.e., packages contained in the @R{} |
| source distribution or available from @acronym{CRAN} and recommended to |
| be included in every binary distribution of @R{}) have a @samp{Priority} |
| field with value @samp{base} or @samp{recommended}, respectively. These |
| priorities must not be used by other packages. |
| |
| @c DESCRIPTION field Collate |
| @c DESCRIPTION field Collate.unix |
| @c DESCRIPTION field Collate.windows |
| A @samp{Collate} field can be used for controlling the collation order |
| for the @R{} code files in a package when these are processed for |
| package installation. The default is to collate according to the |
| @samp{C} locale. If present, the collate specification must list |
| @emph{all} @R{} code files in the package (taking possible OS-specific |
| subdirectories into account, see @ref{Package subdirectories}) as a |
| whitespace separated list of file paths relative to the @file{R} |
| subdirectory. |
| @c % double quotes are not allowed in path names, for Windows |
| Paths containing white space or quotes need to be quoted. An |
| OS-specific collation field (@samp{Collate.unix} or |
| @samp{Collate.windows}) will be used in preference to @samp{Collate}. |
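| |
| The default C-locale order can be reproduced with |
| @code{sort(method = "radix")}, which collates character vectors as in |
| the C locale, so upper case sorts before lower case. The sketch then |
| parses a hypothetical @samp{Collate} value with @code{scan}, which |
| strips the quotes. It also checks that every code file is listed. |
| |
| @example |
| # Default collation: C-locale order |
| r_files <- c("utils.R", "zzz.R", "Methods.R", "AllClasses.R", "aaa.R") |
| default_order <- sort(r_files, method = "radix") |
| print(default_order) |
| |
| # A hypothetical Collate value; it must name every file in r_files |
| collate_field <- "'aaa.R' 'AllClasses.R' 'Methods.R' 'utils.R' 'zzz.R'" |
| collate_order <- scan(text = collate_field, what = "", quiet = TRUE) |
| unlisted <- setdiff(r_files, collate_order) |
| print(unlisted) |
| @end example |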
| |
| @c DESCRIPTION field LazyData |
| @c DESCRIPTION field LazyLoad |
| The @samp{LazyData} logical field controls whether the @R{} datasets use |
| lazy-loading. A @samp{LazyLoad} field was used in versions prior to |
| 2.14.0, but now is ignored. |
| |
| @c DESCRIPTION field KeepSource |
| The @samp{KeepSource} logical field controls if the package code is sourced |
| using @code{keep.source = TRUE} or @code{FALSE}: it might be needed |
| exceptionally for a package designed to always be used with |
| @code{keep.source = TRUE}. |
| |
| @c DESCRIPTION field ByteCompile |
| The @samp{ByteCompile} logical field controls if the package code is to |
| be byte-compiled on installation: the default is to byte-compile. This |
| can be overridden by installing with flag @option{--no-byte-compile}. |
| |
| @c DESCRIPTION field StagedInstall |
| The @samp{StagedInstall} logical field controls if package installation |
| is `staged', that is done to a temporary location and moved to the final |
| location when successfully completed. This field was introduced in @R{} |
| 3.6.0 and is true by default: it is considered to be a temporary measure |
| which may be withdrawn in future. |
| |
| @c DESCRIPTION field ZipData |
| The @samp{ZipData} logical field has been ignored since @R{} 2.13.0. |
| |
| @c DESCRIPTION field Biarch |
| The @samp{Biarch} logical field is used on Windows to select the |
| @command{INSTALL} option @option{--force-biarch} for this package. |
| |
| @c DESCRIPTION field BuildVignettes |
| The @samp{BuildVignettes} logical field can be set to a false value to |
| stop @command{R CMD build} from attempting to build the vignettes, as |
| well as preventing@footnote{But it is checked for Open Source packages |
| by @command{R CMD check --as-cran}.} @command{R CMD check} from testing |
| this. This should only be used exceptionally, for example if the PDFs |
| include large figures which are not part of the package sources (and |
| hence only in packages which do not have an Open Source license). |
| |
| @c DESCRIPTION field VignetteBuilder |
| The @samp{VignetteBuilder} field names (in a comma-separated list) |
| packages that provide an engine for building vignettes. These may |
| include the current package, or ones listed in @samp{Depends}, |
| @samp{Suggests} or @samp{Imports}. The @pkg{utils} package is always |
| implicitly appended. See @ref{Non-Sweave vignettes} for details. Note |
| that if, for example, a vignette has engine @samp{knitr::rmarkdown}, |
| then @CRANpkg{knitr} provides the engine but both @pkg{knitr} and |
| @CRANpkg{rmarkdown} are needed for using it, so @emph{both} these |
| packages need to be in the @samp{VignetteBuilder} field and at least |
| suggested (as @pkg{rmarkdown} is only suggested by @pkg{knitr}, and |
| hence not available automatically along with it). Many packages using |
| @CRANpkg{knitr} also need the package @CRANpkg{formatR}, which @pkg{knitr} |
| only suggests, so the user package needs to suggest it too and include it |
| in @samp{VignetteBuilder}. |
| |
| @c DESCRIPTION field Encoding |
| If the @file{DESCRIPTION} file is not entirely in @acronym{ASCII} it |
| should contain an @samp{Encoding} field specifying an encoding. This is |
| used as the encoding of the @file{DESCRIPTION} file itself and of the |
| @file{R} and @file{NAMESPACE} files, and as the default encoding of |
| @file{.Rd} files. The examples are assumed to be in this encoding when |
| running @command{R CMD check}, and it is used for the encoding of the |
| @code{CITATION} file. Only encoding names @code{latin1}, @code{latin2} |
| and @code{UTF-8} are known to be portable. (Do not specify an encoding |
| unless one is actually needed: doing so makes the package @emph{less} |
| portable. If a package has a specified encoding, you should run |
| @command{R CMD build} etc.@: in a locale using that encoding.) |
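| |
| Whether a file needs an @samp{Encoding} field at all can be decided by |
| looking for non-@acronym{ASCII} lines. A minimal sketch, assuming the |
| lines are available as a UTF-8 character vector (the contents here are |
| hypothetical). It relies on @code{iconv} returning @code{NA} for |
| elements that cannot be converted to @acronym{ASCII}. |
| |
| @example |
| # Lines of a hypothetical DESCRIPTION file; the second is not ASCII |
| desc_text <- c("Package: pkgname", "Author: Jos\u00e9 Developer") |
| # iconv() gives NA where conversion to ASCII fails |
| non_ascii <- is.na(iconv(desc_text, from = "UTF-8", to = "ASCII")) |
| needs_encoding_field <- any(non_ascii) |
| print(non_ascii) |
| @end example |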
| |
| @c DESCRIPTION NeedsCompilation |
| The @samp{NeedsCompilation} field should be set to @code{"yes"} if the |
| package contains code which is to be compiled, otherwise @code{"no"} (when |
| the package could be installed from source on any platform without |
| additional tools). This is used by @code{install.packages(type = |
| "both")} in @R{} >= 2.15.2 on platforms where binary packages are the |
| norm: it is normally set by @command{R CMD build} or the repository |
| assuming compilation is required if and only if the package has a |
| @file{src} directory. |
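| |
| The rule used by @command{R CMD build} can be mimicked with a |
| hypothetical helper @code{needs_compilation}. It returns @code{"yes"} |
| exactly when the package directory has a @file{src} subdirectory. The |
| package directory in the sketch is a temporary one created for |
| illustration. |
| |
| @example |
| # needs_compilation() is a hypothetical helper: "yes" if and only if |
| # the package directory has a src subdirectory |
| needs_compilation <- function(pkg_dir) |
|   if (dir.exists(file.path(pkg_dir, "src"))) "yes" else "no" |
| |
| pkg_dir <- file.path(tempdir(), "pkgname") |
| dir.create(pkg_dir, showWarnings = FALSE) |
| before <- needs_compilation(pkg_dir) |
| dir.create(file.path(pkg_dir, "src"), showWarnings = FALSE) |
| after <- needs_compilation(pkg_dir) |
| print(c(before = before, after = after)) |
| @end example |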
| |
| @c DESCRIPTION field OS_type |
| The @samp{OS_type} field specifies the OS(es) for which the |
| package is intended. If present, it should be one of @code{unix} or |
| @code{windows}, and indicates that the package can only be installed |
| on a platform with @samp{.Platform$OS.type} having that value. |
| |
| @c DESCRIPTION field Type |
| The @samp{Type} field specifies the type of the package: |
| @pxref{Package types}. |
| |
| @c DESCRIPTION field Classification/ACM |
| @c DESCRIPTION field Classification/ACM-2012 |
| @c DESCRIPTION field Classification/JEL |
| @c DESCRIPTION field Classification/MSC |
| @c DESCRIPTION field Classification/MSC-2010 |
| One can add subject classifications for the content of the package using |
| the fields @samp{Classification/ACM} or @samp{Classification/ACM-2012} |
| (using the Computing Classification System of the Association for |
| Computing Machinery, @uref{http://www.acm.org/about/class/}; the former refers |
| to the 1998 version), @samp{Classification/JEL} (the Journal of Economic |
| Literature Classification System, |
| @uref{https://www.aeaweb.org/@/econlit/@/jelCodes.php}), or |
| @samp{Classification/MSC} or @samp{Classification/MSC-2010} (the |
| Mathematics Subject Classification of the American Mathematical Society, |
| @uref{http://www.ams.org/msc/}; the former refers to the 2000 version). |
| The subject classifications should be comma-separated lists of the |
| respective classification codes, e.g., @samp{Classification/ACM: G.4, |
| H.2.8, I.5.1}. |
| |
| @c DESCRIPTION field Language |
| A @samp{Language} field can be used to indicate if the package |
| documentation is not in English: this should be a comma-separated list |
| of standard (not private use or grandfathered) IETF language tags as |
| currently defined by RFC 5646 |
| (@uref{https://tools.ietf.org/@/html/@/rfc5646}, see also |
| @uref{https://en.wikipedia.org/@/wiki/@/IETF_language_tag}), i.e., use |
| language subtags which in essence are 2-letter ISO 639-1 |
| (@uref{https://en.wikipedia.@/org/@/wiki/@/ISO_639-1}) or 3-letter ISO |
| 639-3 (@uref{https://en.wikipedia.@/org/@/wiki/@/ISO_639-3}) language |
| codes. |
| |
| @c DESCRIPTION field RdMacros |
| An @samp{RdMacros} field can be used to hold a comma-separated list of |
| packages from which the current package will import @file{Rd} macro |
| definitions. These packages should also be listed in @samp{Imports}, |
| @samp{Suggests} or @samp{Depends}. The macros in these packages will be |
| imported after the system macros, in the |
| order listed in the @samp{RdMacros} field, before any macro definitions |
| in the current package are loaded. Macro definitions in individual |
| @file{.Rd} files in the @file{man} directory are loaded last, and are |
| local to later parts of that file. In case of duplicates, the last |
loaded definition will be used.@footnote{Duplicate definitions may
trigger a warning: see @ref{User-defined macros}.} Both @command{R CMD
| Rd2pdf} and @command{R CMD Rdconv} have an optional flag |
| @option{--RdMacros=pkglist}. The option is also a comma-separated list |
| of package names, and has priority over the value given in |
| @file{DESCRIPTION}. Packages using @file{Rd} macros should depend on |
| @R{} 3.2.0 or later. |
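As a sketch, a package that takes its @file{Rd} macros from a
hypothetical package @pkg{mymacros} might declare the following in its
@file{DESCRIPTION} file. The @R{} version requirement reflects the
advice above.

@example
Depends: R (>= 3.2.0)
Imports: mymacros
RdMacros: mymacros
@end example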
| |
| @c DESCRIPTION field Built |
| @c DESCRIPTION field Packaged |
| @quotation Note |
| There should be no @samp{Built} or @samp{Packaged} fields, as these are |
| added by the package management tools. |
| @end quotation |
| |
| @c DESCRIPTION field Note |
| @c DESCRIPTION field Contact |
| @c DESCRIPTION field MailingList |
| There is no restriction on the use of other fields not mentioned here |
| (but using other capitalizations of these field names would cause |
| confusion). Fields @code{Note}, @code{Contact} (for contacting the |
| authors/developers@footnote{@code{bug.report} will try to extract an |
| email address from a @code{Contact} field if there is no |
| @code{BugReports} field.}) and @code{MailingList} are in common |
| use. Some repositories (including @acronym{CRAN} and R-forge) add their |
| own fields. |
| |
| |
| |
| @node Licensing, Package Dependencies, The DESCRIPTION file, Package structure |
| @subsection Licensing |
| |
| Licensing for a package which might be distributed is an important but |
| potentially complex subject. |
| |
| It is very important that you include license information! Otherwise, |
| it may not even be legally correct for others to distribute copies of |
| the package, let alone use it. |
| |
| The package management tools use the concept of |
| `free or open source software' |
| (FOSS, e.g., @uref{https://en.wikipedia.org/@/wiki/@/FOSS}) |
| licenses: the idea being that some users of @R{} and its packages want |
| to restrict themselves to such software. Others need to ensure that |
| there are no restrictions stopping them using a package, e.g.@: |
| forbidding commercial or military use. It is a central tenet of FOSS |
software that there are no restrictions on users or usage.
| |
| Do not use the @samp{License} field for information on copyright |
| holders: if needed, use a @samp{Copyright} field. |
| |
| @c DESCRIPTION field License |
| @c DESCRIPTION field License_is_FOSS |
| @c DESCRIPTION field License_restricts_use |
| The mandatory @samp{License} field in the @file{DESCRIPTION} file should |
| specify the license of the package in a standardized form. Alternatives |
| are indicated @emph{via} vertical bars. Individual specifications must |
| be one of |
| @itemize @bullet |
| @item |
| One of the ``standard'' short specifications |
| @example |
| GPL-2 GPL-3 LGPL-2 LGPL-2.1 LGPL-3 AGPL-3 Artistic-2.0 |
| BSD_2_clause BSD_3_clause MIT |
| @end example |
| @noindent |
| as made available @emph{via} @uref{https://www.R-project.org/@/Licenses/} and |
| contained in subdirectory @file{share/licenses} of the @R{} source or home |
| directory. |
| @item |
| The names or abbreviations of other licenses contained in the license |
| data base in file @file{share/licenses/license.db} in the @R{} source or |
| home directory, possibly (for versioned licenses) followed by a version |
| restriction of the form @samp{(@var{op} @var{v})} with @samp{@var{op}} one of |
| the comparison operators @samp{<}, @samp{<=}, @samp{>}, @samp{>=}, |
| @samp{==}, or @samp{!=} and @samp{@var{v}} a numeric version specification |
| (strings of non-negative integers separated by @samp{.}), possibly |
| combined @emph{via} @samp{,} (see below for an example). For versioned |
| licenses, one can also specify the name followed by the version, or |
| combine an existing abbreviation and the version with a @samp{-}. |
| |
| Abbreviations @code{GPL} and @code{LGPL} are ambiguous and |
| usually@footnote{@acronym{CRAN} expands them to e.g.@: @code{GPL-2 |
| | GPL-3}.} taken to mean any version of the license: but it is better |
| not to use them. |
| @item |
| One of the strings @samp{file LICENSE} or @samp{file LICENCE} referring |
| to a file named @file{LICENSE} or @file{LICENCE} in the package (source |
| and installation) top-level directory. |
| @item |
| The string @samp{Unlimited}, meaning that there are no restrictions on |
| distribution or use other than those imposed by relevant laws (including |
| copyright laws). |
| @end itemize |
| |
| If a package license @emph{restricts} a base license (where permitted, |
| e.g., using GPL-3 or AGPL-3 with an attribution clause), the additional |
| terms should be placed in file @file{LICENSE} (or @file{LICENCE}), and |
| the string @samp{+ file LICENSE} (or @samp{+ file LICENCE}, |
| respectively) should be appended to the corresponding individual license |
| specification. Note that several commonly used licenses do not permit |
| restrictions: this includes GPL-2 and hence any specification which |
| includes it. |
| |
| Examples of standardized specifications include |
| @example |
| License: GPL-2 |
| License: LGPL (>= 2.0, < 3) | Mozilla Public License |
| License: GPL-2 | file LICENCE |
| License: GPL (>= 2) | BSD_3_clause + file LICENSE |
| License: Artistic-2.0 | AGPL-3 + file LICENSE |
| @end example |
| @noindent |
| Please note in particular that ``Public domain'' is not a valid license, |
| since it is not recognized in some jurisdictions. |
| |
| Please ensure that the license you choose also covers any dependencies |
| (including system dependencies) of your package: it is particularly |
| important that any restrictions on the use of such dependencies are |
| evident to people reading your @file{DESCRIPTION} file. |
| |
| Fields @samp{License_is_FOSS} and @samp{License_restricts_use} may be |
| added by repositories where information cannot be computed from the name |
| of the license. @samp{License_is_FOSS: yes} is used for licenses which |
| are known to be FOSS, and @samp{License_restricts_use} can have values |
| @samp{yes} or @samp{no} if the @file{LICENSE} file is known to restrict |
| users or usage, or known not to. These are used by, e.g.@:, the |
| @code{available.packages} filters. |
| |
| |
| @cindex LICENSE file |
| @cindex LICENCE file |
| The optional file @file{LICENSE}/@file{LICENCE} contains a copy of the |
license of the package. To avoid any confusion, only include such a file
| if it is referred to in the @samp{License} field of the |
| @file{DESCRIPTION} file. |
| |
| Whereas you should feel free to include a license file in your |
| @emph{source} distribution, please do not arrange to @emph{install} yet |
| another copy of the @acronym{GNU} @file{COPYING} or @file{COPYING.LIB} |
| files but refer to the copies on |
| @uref{https://www.R-project.org/@/Licenses/} and included in the @R{} |
| distribution (in directory @file{share/licenses}). Since files named |
| @file{LICENSE} or @file{LICENCE} @emph{will} be installed, do not use |
| these names for standard license files. To include comments about the |
| licensing rather than the body of a license, use a file named something |
| like @file{LICENSE.note}. |
| |
A few ``standard'' licenses are license templates rather than complete
licenses: they need additional information (for example, the copyright
holder) to be supplied @emph{via} @samp{+ file LICENSE}.
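For example, @samp{MIT} is such a template. A package using it would
declare the following (the year and copyright holder shown in the
@file{LICENSE} file below are hypothetical):

@example
License: MIT + file LICENSE
@end example

@noindent
It would then ship a @file{LICENSE} file that contains only the
completing information, along the lines of

@example
YEAR: 2024
COPYRIGHT HOLDER: Jane Doe
@end example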
| |
| @node Package Dependencies, The INDEX file, Licensing, Package structure |
| @subsection Package Dependencies |
| |
| @c DESCRIPTION field Depends |
| The @samp{Depends} field gives a comma-separated list of package names |
| which this package depends on. Those packages will be attached before |
| the current package when @code{library} or @code{require} is called. |
| Each package name may be optionally followed by a comment in parentheses |
| specifying a version requirement. The comment should contain a |
| comparison operator, whitespace and a valid version number, |
| e.g.@: @samp{MASS (>= 3.1-20)}. |
| |
| The @samp{Depends} field can also specify a dependence on a certain |
| version of @R{} --- e.g., if the package works only with @R{} version |
| 3.6.0 or later, include @samp{R (>= 3.6)} in the @samp{Depends} |
| field. (As here, trailing zeroes can be dropped and it is recommended |
| that they are.) You can also require a certain SVN revision for R-devel |
| or R-patched, e.g.@: @samp{R (>= 2.14.0), R (>= r56550)} requires a |
| version later than R-devel of late July 2011 (including released |
| versions of 2.14.0). |
| |
| It makes no sense to declare a dependence on @code{R} without a version |
| specification, nor on the package @pkg{base}: this is an @R{} package |
| and package @pkg{base} is always available. |
| |
| A package or @samp{R} can appear more than once in the @samp{Depends} |
| field, for example to give upper and lower bounds on acceptable |
| versions. |
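For example, the following @samp{Depends} field requires @R{} 3.6.0 or
later and a version of a hypothetical package @pkg{foo} in its 1.x
series. The bounds are purely illustrative.

@example
Depends: R (>= 3.6), foo (>= 1.2-0), foo (< 2.0)
@end example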
| |
| It is inadvisable to use a dependence on @R{} with patchlevel (the third |
| digit) other than zero. Doing so with packages which others depend on |
| will cause the other packages to become unusable under earlier versions |
| in the series, and e.g.@: versions 3.x.1 are widely used throughout the |
| Northern Hemisphere academic year. |
| |
| Both @code{library} and the @R{} package checking facilities use this |
| field: hence it is an error to use improper syntax or misuse the |
| @samp{Depends} field for comments on other software that might be |
| needed. The @R{} @command{INSTALL} facilities check if the version of |
| @R{} used is recent enough for the package being installed, and the list |
| of packages which is specified will be attached (after checking version |
| requirements) before the current package. |
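As far as we know, these version comparisons behave like comparisons of
@R{}'s own version objects. In those, each component separated by
@samp{.} or @samp{-} is compared as a number rather than as text. A
quick way to see this in @R{} is @emph{via} @code{package_version} and
@code{getRversion}:

@example
v <- package_version("3.1-20")
v >= "3.1-9"            # TRUE: component 20 > 9 numerically
"3.1-20" >= "3.1-9"     # FALSE: compared as plain strings
v == "3.1.20"           # TRUE: '-' and '.' act as equivalent separators
getRversion() >= "3.6"  # is the running R at least 3.6.0?
@end example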
| |
| @c DESCRIPTION field Imports |
| The @samp{Imports} field lists packages whose namespaces are imported |
| from (as specified in the @file{NAMESPACE} file) but which do not need |
| to be attached. Namespaces accessed by the @samp{::} and @samp{:::} |
| operators must be listed here, or in @samp{Suggests} or @samp{Enhances} |
| (see below). Ideally this field will include all the standard packages |
| that are used, and it is important to include S4-using packages (as |
| their class definitions can change and the @file{DESCRIPTION} file is |
| used to decide which packages to re-install when this happens). |
| Packages declared in the @samp{Depends} field should not also be in the |
| @samp{Imports} field. Version requirements can be specified and are |
| checked when the namespace is loaded. |
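Here is a minimal sketch of how the two files fit together. It assumes
a hypothetical package that calls @code{median} from @pkg{stats} and
defines S4 classes with @pkg{methods}. Such a package might have this
@file{DESCRIPTION} entry:

@example
Imports: methods, stats
@end example

@noindent
together with these @file{NAMESPACE} directives:

@example
importFrom(stats, median)
import(methods)
@end example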
| |
| @c DESCRIPTION field Suggests |
| The @samp{Suggests} field uses the same syntax as @samp{Depends} and |
| lists packages that are not necessarily needed. This includes packages |
| used only in examples, tests or vignettes (@pxref{Writing package |
| vignettes}), and packages loaded in the body of functions. E.g., |
| suppose an example@footnote{even one wrapped in @code{\donttest}.} from |
| package @pkg{foo} uses a dataset from package @pkg{bar}. Then it is not |
necessary to have @pkg{bar} installed in order to use @pkg{foo} unless one wants to execute
| all the examples/tests/vignettes: it is useful to have @pkg{bar}, but |
| not necessary. Version requirements can be specified but should be |
| checked by the code which uses the package. |
| |
| @c DESCRIPTION field Enhances |
| Finally, the @samp{Enhances} field lists packages ``enhanced'' by the |
| package at hand, e.g., by providing methods for classes from these |
| packages, or ways to handle objects from these packages (so several |
| packages have @samp{Enhances: chron} because they can handle datetime |
| objects from @CRANpkg{chron} even though they prefer @R{}'s native |
| datetime functions). Version requirements can be specified, but are |
| currently not used. Such packages cannot be required to check the |
| package: any tests which use them must be conditional on the presence |
| of the package. (If your tests use e.g.@: a dataset from another |
| package it should be in @samp{Suggests} and not @samp{Enhances}.) |
| |
| The general rules are |
| |
| @itemize @bullet |
| @item |
| A package should be listed in only one of these fields. |
| @item |
| Packages whose namespace only is needed to load the package using |
| @code{library(@var{pkgname})} should be listed in the @samp{Imports} field |
| and not in the @samp{Depends} field. Packages listed in @code{import} |
| or @code{importFrom} directives in the @file{NAMESPACE} file should |
| almost always be in @samp{Imports} and not @samp{Depends}. |
| @item |
| Packages that need to be attached to successfully load the package using |
| @code{library(@var{pkgname})} must be listed in the @samp{Depends} |
| field. |
| @item |
| All packages that are needed@footnote{This includes all packages |
| directly called by @code{library} and @code{require} calls, as well as |
| data obtained @emph{via} @code{data(theirdata, package = "somepkg")} |
| calls: @command{R CMD check} will warn about all of these. But there |
| are subtler uses which it may not detect: e.g.@: if package A uses |
| package B and makes use of functionality in package B which uses package |
| C which package B suggests or enhances, then package C needs to be in |
| the @samp{Suggests} list for package A. Nor will undeclared uses in |
| included files be reported, nor unconditional uses of packages listed |
| under @samp{Enhances}. @command{R CMD check --as-cran} will detect more |
| of the subtler uses, especially for re-building of vignettes as from |
| @R{} 3.5.0.} to successfully run @code{R CMD check} on the package must |
| be listed in one of @samp{Depends} or @samp{Suggests} or @samp{Imports}. |
| Packages used to run examples or tests conditionally (e.g.@: @emph{via} |
| @code{if(require(@var{pkgname}))}) should be listed in @samp{Suggests} |
| or @samp{Enhances}. (This allows checkers to ensure that all the |
| packages needed for a complete check are installed.) |
| |
| @item |
| Packages needed to use datasets from the package should be in |
| @samp{Imports}: this includes those needed to define S4 classes used. |
| @end itemize |
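Consider a hypothetical package whose users are expected to want
@CRANpkg{lattice} on the search path. It imports from @pkg{stats}, uses
@CRANpkg{MASS} and @CRANpkg{rgl} only conditionally in examples and
tests, and provides methods for classes from @CRANpkg{chron}. Under
these rules, its dependency fields might read

@example
Depends: R (>= 3.6), lattice
Imports: stats
Suggests: MASS, rgl
Enhances: chron
@end example

@noindent
with @pkg{lattice} also imported from in the @file{NAMESPACE} file, as
recommended below.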
| |
| @noindent |
| In particular, packages providing ``only'' data for examples or |
| vignettes should be listed in @samp{Suggests} rather than @samp{Depends} |
| in order to make lean installations possible. |
| |
| Version dependencies in the @samp{Depends} and @samp{Imports} fields are |
| used by @code{library} when it loads the package, and |
| @code{install.packages} checks versions for the @samp{Depends}, |
| @samp{Imports} and (for @code{dependencies = TRUE}) @samp{Suggests} |
| fields. |
| |
| It is important that the information in these fields is complete and |
| accurate: it is for example used to compute which packages depend on an |
| updated package and which packages can safely be installed in parallel. |
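One such computation is available as @code{tools::package_dependencies}.
The sketch below runs it on a small hand-made metadata matrix describing
three hypothetical packages, in the form returned by
@code{available.packages}. It lists the packages with a strong
(@samp{Depends}, @samp{Imports} or @samp{LinkingTo}) dependence on a
given package:

@example
## Hypothetical metadata for three packages
db <- rbind(
  c(Package = "pkgA", Depends = "R (>= 3.6)", Imports = "pkgB",
    LinkingTo = NA, Suggests = "pkgC"),
  c(Package = "pkgB", Depends = NA, Imports = NA,
    LinkingTo = NA, Suggests = NA),
  c(Package = "pkgC", Depends = NA, Imports = "pkgB",
    LinkingTo = NA, Suggests = NA))

## Packages with a strong dependence on pkgB: pkgA and pkgC
tools::package_dependencies("pkgB", db = db, reverse = TRUE)

## pkgC is only suggested by pkgA, which is not a strong dependence,
## so nothing is reported for it by default
tools::package_dependencies("pkgC", db = db, reverse = TRUE)
@end example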
| |
| This scheme was developed before all packages had namespaces (@R{} |
| 2.14.0 in October 2011), and good practice changed once that was in |
| place. |
| |
| Field @samp{Depends} should nowadays be used rarely, only for packages |
| which are intended to be put on the search path to make their facilities |
| available to the end user (and not to the package itself): for example |
| it makes sense that a user of package @CRANpkg{latticeExtra} would want |
| the functions of package @CRANpkg{lattice} made available. |
| |
| Almost always packages mentioned in @samp{Depends} should also be |
| imported from in the @file{NAMESPACE} file: this ensures that any needed |
| parts of those packages are available when some other package imports |
| the current package. |
| |
| The @samp{Imports} field should not contain packages which are not |
| imported from (@emph{via} the @file{NAMESPACE} file or @code{::} or |
| @code{:::} operators), as all the packages listed in that field need to |
| be installed for the current package to be installed. (This is checked |
| by @command{R CMD check}.) |
| |
| @R{} code in the package should call @code{library} or @code{require} |
| only exceptionally. Such calls are never needed for packages listed in |
| @samp{Depends} as they will already be on the search path. It used to |
| be common practice to use @code{require} calls for packages listed in |
| @samp{Suggests} in functions which used their functionality, but |
| nowadays it is better to access such functionality @emph{via} @code{::} |
| calls. |
| |
| @c DESCRIPTION field LinkingTo |
| A package that wishes to make use of header files in other packages needs |
to declare those packages as a comma-separated list in the field @samp{LinkingTo}
| in the @file{DESCRIPTION} file. For example |
| |
| @example |
| LinkingTo: link1, link2 |
| @end example |
| |
| @noindent |
| The @samp{LinkingTo} field can have a version requirement which is |
| checked at installation. |
| |
| Specifying a package in @samp{LinkingTo} suffices if these are C++ |
| headers containing source code or static linking is done at |
| installation: the packages do not need to be (and usually should not be) |
| listed in the @samp{Depends} or @samp{Imports} fields. This includes |
| @acronym{CRAN} package @CRANpkg{BH} and almost all users of |
| @CRANpkg{RcppArmadillo} and @CRANpkg{RcppEigen}. |
| |
| For another use of @samp{LinkingTo} see @ref{Linking to native routines |
| in other packages}. |
| |
| @c DESCRIPTION field Additional_repositories |
| The @samp{Additional_repositories} field is a comma-separated list of |
| repository URLs where the packages named in the other fields may be |
| found. It is currently used by @command{R CMD check} to check that the |
| packages can be found, at least as source packages (which can be |
| installed on any platform). |
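For example, a package that suggests a hypothetical package @pkg{mydata},
distributed only from a hypothetical repository, could declare

@example
Suggests: mydata
Additional_repositories: https://example.org/repo
@end example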
| |
| @menu |
| * Suggested packages:: |
| @end menu |
| |
| @node Suggested packages, , Package Dependencies, Package Dependencies |
| @subsubsection Suggested packages |
| |
| Note that someone wanting to run the examples/tests/vignettes may not |
| have a suggested package available (and it may not even be possible to |
| install it for that platform). The recommendation used to be to make |
| their use conditional @emph{via} @code{if(require("@var{pkgname}"))}: |
| this is OK if that conditioning is done in examples/tests/vignettes, |
| although using @code{if(requireNamespace("@var{pkgname}"))} is |
| preferred, if possible. |
| |
| However, using @code{require} for conditioning @emph{in package code} is |
| not good practice as it alters the search path for the rest of the |
| session and relies on functions in that package not being masked by |
| other @code{require} or @code{library} calls. It is better practice to |
| use code like |
| @example |
| if (requireNamespace("rgl", quietly = TRUE)) @{ |
| rgl::plot3d(...) |
| @} else @{ |
| ## do something else not involving rgl. |
| @} |
| @end example |
| @noindent |
| Note the use of @code{rgl::} as that object would not necessarily be |
| visible (and if it is, it need not be the one from that namespace: |
| @code{plot3d} occurs in several other packages). If the intention is to |
| give an error if the suggested package is not available, simply use |
| e.g.@: @code{rgl::plot3d}. |
| |
| If the conditional code produces @code{print} output, function |
| @code{withAutoprint} can be useful. |
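Consider example code made conditional on @CRANpkg{MASS}, which is used
here purely for illustration. Normally only the value of the last
expression in a braced block would be printed. Wrapping the block in
@code{withAutoprint} instead prints each value as if it had been typed
at top level:

@example
if (requireNamespace("MASS", quietly = TRUE)) withAutoprint(@{
  MASS::fractions(0.75)          # printed
  MASS::fractions(c(0.5, 1/3))   # printed as well
@})
@end example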
| |
Note that the recommendation to use suggested packages conditionally in
tests also applies to packages used to manage test suites: a
| notorious example was @CRANpkg{testthat} which in version 1.0.0 contained |
| illegal C++ code and hence could not be installed on standards-compliant |
| platforms. |
| |
| Some people have assumed that a `recommended' package in @samp{Suggests} |
| can safely be used unconditionally, but this is not so. (@R{} can be |
| installed without recommended packages, and which packages are |
| `recommended' may change.) |
| |
| As noted above, packages in @samp{Enhances} @emph{must} be used |
| conditionally and hence objects within them should always be accessed |
| @emph{via} @code{::}. |
| |
| On most systems, @command{R CMD check} can be run with only those |
| packages declared in @samp{Depends} and @samp{Imports} by setting |
| environment variable @env{_R_CHECK_DEPENDS_ONLY_=true}, whereas setting |
| @env{_R_CHECK_SUGGESTS_ONLY_=true} also allows suggested packages, but |
| not those in @samp{Enhances} nor those not mentioned in the |
| @file{DESCRIPTION} file. It is recommended that a package is checked |
| with each of these set, as well as with neither. |
| |
| @node The INDEX file, Package subdirectories, Package Dependencies, Package structure |
| @subsection The @file{INDEX} file |
| @cindex INDEX file |
| |
| The optional file @file{INDEX} contains a line for each sufficiently |
| interesting object in the package, giving its name and a description |
| (functions such as print methods not usually called explicitly might not |
| be included). Normally this file is missing and the corresponding |
| information is automatically generated from the documentation sources |
| (using @code{tools::Rdindex()}) when installing from source. |
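For illustration only, the @file{INDEX} file of a hypothetical package
with two user-level functions might contain

@example
fitModel                Fit a model to grouped data
plot.modelFit           Plot method for fitted models
@end example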
| |
| The file is part of the information given by @code{library(help = |
| @var{pkgname})}. |
| |
| Rather than editing this file, it is preferable to put customized |
| information about the package into an overview help page |
| (@pxref{Documenting packages}) and/or a vignette (@pxref{Writing package |
| vignettes}). |
| |
| @node Package subdirectories, Data in packages, The INDEX file, Package structure |
| @subsection Package subdirectories |
| @cindex Package subdirectories |
| |
| The @file{R} subdirectory contains @R{} code files, only. The code |
| files to be installed must start with an @acronym{ASCII} (lower or upper |
| case) letter or digit and have one of the extensions@footnote{Extensions |
| @file{.S} and @file{.s} arise from code originally written for S(-PLUS), |
| but are commonly used for assembler code. Extension @file{.q} was used |
| for S, which at one time was tentatively called QPE.} @file{.R}, |
| @file{.S}, @file{.q}, @file{.r}, or @file{.s}. We recommend using |
@file{.R}, as this extension does not seem to be used by any other software.
| It should be possible to read in the files using @code{source()}, so |
| @R{} objects must be created by assignments. Note that there need be no |
| connection between the name of the file and the @R{} objects created by |
| it. Ideally, the @R{} code files should only directly assign @R{} |
| objects and definitely should not call functions with side effects such |
| as @code{require} and @code{options}. If computations are required to |
| create objects these can use code `earlier' in the package (see the |
| @samp{Collate} field) plus functions in the @samp{Depends} packages |
| provided that the objects created do not depend on those packages except |
| @emph{via} namespace imports. |
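Here is a sketch of two hypothetical code files that follow these rules.
Each file only makes assignments. The object created in the second file
is computed with a function from the first, so the first file has to
come earlier in the collation order.

@example
## File R/aaa-utils.R (hypothetical): assignments only
scale_factor <- function(n) 1 / sqrt(n)

## File R/zzz-data.R (hypothetical): an object computed from code
## defined earlier in the package, without side effects such as
## calls to require() or options()
default_weights <- scale_factor(c(1, 4, 16))
@end example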
| |
| Two exceptions are allowed: if the @file{R} subdirectory contains a file |
| @file{sysdata.rda} (a saved image of one or more @R{} objects: please |
| use suitable compression as suggested by @code{tools::resaveRdaFiles}, |
and see also the @samp{SysDataCompression} @file{DESCRIPTION} field),
| this will be lazy-loaded into the namespace environment -- this is |
| intended for system datasets that are not intended to be user-accessible |
| @emph{via} @code{data}. Also, files ending in @samp{.in} will be |
| allowed in the @file{R} directory to allow a @file{configure} script to |
| generate suitable files. |
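The following sketches how such a file might be created. It uses a
hypothetical package source directory under @code{tempdir()} and a
made-up lookup table. @code{tools::resaveRdaFiles} with
@code{compress = "auto"} chooses among the available compression
methods, and @code{tools::checkRdaFiles} reports the size and
compression of the result:

@example
## Hypothetical package source directory
pkg_r <- file.path(tempdir(), "mypkg", "R")
dir.create(pkg_r, recursive = TRUE, showWarnings = FALSE)

## An internal lookup table, not meant to be accessed via data()
lookup <- data.frame(code = c("a", "b", "c"), value = c(1.5, 2.5, 3.5))
save(lookup, file = file.path(pkg_r, "sysdata.rda"))

## Re-save with a suitable compression, then inspect the result
tools::resaveRdaFiles(pkg_r, compress = "auto")
tools::checkRdaFiles(pkg_r)
@end example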
| |
| Only @acronym{ASCII} characters (and the control characters tab, |
| formfeed, LF and CR) should be used in code files. Other characters are |
| accepted in comments@footnote{but they should be in the encoding |
| declared in the @file{DESCRIPTION} file.}, but then the comments may not |
| be readable in e.g.@: a UTF-8 locale. Non-@acronym{ASCII} characters in |
| object names will normally@footnote{This is true for OSes which |
| implement the @samp{C} locale: Windows' idea of the @samp{C} locale uses |
| the WinAnsi charset.} fail when the package is installed. Any byte will |
| be allowed in a quoted character string but @samp{\uxxxx} escapes should |
| be used for non-@acronym{ASCII} characters. However, |
| non-@acronym{ASCII} character strings may not be usable in some locales |
| and may display incorrectly in others. |
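For example, the German word for ``greetings'' can be written in a
source line that contains only @acronym{ASCII} characters:

@example
greeting <- "Gr\u00fc\u00dfe"   # u-umlaut and sharp s as \u escapes
nchar(greeting)      # 5
Encoding(greeting)   # "UTF-8"
@end example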
| |
| |
| @findex library.dynam |
| Various @R{} functions in a package can be used to initialize and |
| clean up. @xref{Load hooks}. |
| |
| The @file{man} subdirectory should contain (only) documentation files |
| for the objects in the package in @dfn{R documentation} (Rd) format. |
| The documentation filenames must start with an @acronym{ASCII} (lower or |
| upper case) letter or digit and have the extension @file{.Rd} (the |
| default) or @file{.rd}. Further, the names must be valid in |
| @samp{file://} URLs, which means@footnote{More precisely, they can |
| contain the English alphanumeric characters and the symbols |
| @samp{$ - _ . + ! ' ( ) , ; @ = &}.} |
| they must be entirely @acronym{ASCII} and not contain @samp{%}. |
| @xref{Writing R documentation files}, for more information. Note that |
| all user-level objects in a package should be documented; if a package |
| @var{pkg} contains user-level objects which are for ``internal'' use |
| only, it should provide a file @file{@var{pkg}-internal.Rd} which |
| documents all such objects, and clearly states that these are not meant |
| to be called by the user. See e.g.@: the sources for package @pkg{grid} |
| in the @R{} distribution. Note that packages which use internal objects |
| extensively should not export those objects from their namespace, when |
| they do not need to be documented (@pxref{Package namespaces}). |
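Here is a minimal sketch of such a file for a hypothetical package
@pkg{mypkg} with internal objects @code{helperOne} and
@code{helperTwo}:

@example
\name@{mypkg-internal@}
\alias@{mypkg-internal@}
\alias@{helperOne@}
\alias@{helperTwo@}
\title@{Internal mypkg Functions@}
\description@{
  Internal functions of package mypkg, not meant to be called by
  the user.
@}
\keyword@{internal@}
@end example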
| |
| Having a @file{man} directory containing no documentation files may give |
| an installation error. |
| |
| The @file{man} subdirectory may contain a subdirectory named @file{macros}; |
| this will contain source for user-defined Rd macros. |
| (See @ref{User-defined macros}.) These use the Rd format, but may |
| not contain anything but macro definitions, comments and whitespace. |
| |
| The @file{R} and @file{man} subdirectories may contain OS-specific |
| subdirectories named @file{unix} or @file{windows}. |
| |
| The sources and headers for the compiled code are in @file{src}, plus |
| optionally a file @file{Makevars} or @file{Makefile}. When a package is |
| installed using @code{R CMD INSTALL}, @command{make} is used to control |
| compilation and linking into a shared object for loading into @R{}. |
| There are default @command{make} variables and rules for this |
| (determined when @R{} is configured and recorded in |
| @file{@var{R_HOME}/etc@var{R_ARCH}/Makeconf}), providing support for C, |
| C++, fixed- or free-form Fortran, Objective C and Objective |
| C++@footnote{either or both of which may not be supported on particular |
| platforms} with associated extensions @file{.c}, @file{.cc} or |
| @file{.cpp}, @file{.f}, @file{.f90} or @file{.f95}, @file{.m}, and |
| @file{.mm}, respectively. We recommend using @file{.h} for headers, |
| also for C++@footnote{Using @file{.hpp} is not guaranteed to be |
| portable.} or Fortran 9x include files. (Use of extension @file{.C} for |
| C++ is no longer supported.) Files in the @file{src} directory should |
| not be hidden (start with a dot), and hidden files will under some |
| versions of @R{} be ignored. |
| |
| It is not portable (and may not be possible at all) to mix all these |
| languages in a single package. Because @R{} itself uses it, we know that |
| C and fixed-form Fortran can be used together, and mixing C, C++ and |
Fortran usually works for the platform's native compilers.
| |
If your code needs to depend on the platform there are certain defines
which can be used in C or C++. On all Windows builds (even 64-bit ones)
| @samp{_WIN32} will be defined: on 64-bit Windows builds also |
| @samp{_WIN64}, and on macOS @samp{__APPLE__} is defined.@footnote{There |
| is also @samp{__APPLE_CC__}, but that indicates a compiler with |
| Apple-specific features, not the OS. It is used in |
| @file{Rinlinedfuns.h}.} |
| |
| The default rules can be tweaked by setting macros@footnote{the POSIX |
| terminology, called `make variables' by GNU make.} in a file |
| @file{src/Makevars} (@pxref{Using Makevars}). Note that this mechanism |
| should be general enough to eliminate the need for a package-specific |
| @file{src/Makefile}. If such a file is to be distributed, considerable |
| care is needed to make it general enough to work on all @R{} platforms. |
| If it has any targets at all, it should have an appropriate first target |
| named @samp{all} and a (possibly empty) target @samp{clean} which |
| removes all files generated by running @command{make} (to be used by |
| @samp{R CMD INSTALL --clean} and @samp{R CMD INSTALL --preclean}). |
| There are platform-specific file names on Windows: |
| @file{src/Makevars.win} takes precedence over @file{src/Makevars} and |
| @file{src/Makefile.win} must be used. Some @command{make} programs |
| require makefiles to have a complete final line, including a newline. |
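As an illustration of the kind of tweak meant here, a package whose C
code calls LAPACK and BLAS routines can typically get by with a one-line
@file{src/Makevars}. This is a sketch; see @ref{Using Makevars} for the
variables available.

@example
PKG_LIBS = $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)
@end example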
| |
| A few packages use the @file{src} directory for purposes other than |
| making a shared object (e.g.@: to create executables). Such packages |
| should have files @file{src/Makefile} and @file{src/Makefile.win} |
| (unless intended for only Unix-alikes or only Windows). |
| |
| In very special cases packages may create binary files other than the |
| shared objects/DLLs in the @file{src} directory. Such files will not be |
| installed in a multi-architecture setting since @code{R CMD INSTALL |
| --libs-only} is used to merge multiple sub-architectures and it only |
| copies shared objects/DLLs. If a package wants to install other |
| binaries (for example executable programs), it should provide an @R{} |
| script @file{src/install.libs.R} which will be run as part of the |
| installation in the @code{src} build directory @emph{instead of} copying |
| the shared objects/DLLs. The script is run in a separate @R{} |
| environment containing the following variables: @code{R_PACKAGE_NAME} |
| (the name of the package), @code{R_PACKAGE_SOURCE} (the path to the |
| source directory of the package), @code{R_PACKAGE_DIR} (the path of the |
| target installation directory of the package), @code{R_ARCH} (the |
| arch-dependent part of the path, often empty), @code{SHLIB_EXT} (the |
| extension of shared objects) and @code{WINDOWS} (@code{TRUE} on Windows, |
| @code{FALSE} elsewhere). Something close to the default behavior could |
| be replicated with the following @file{src/install.libs.R} file: |
| |
| @example |
| files <- Sys.glob(paste0("*", SHLIB_EXT)) |
| dest <- file.path(R_PACKAGE_DIR, paste0('libs', R_ARCH)) |
| dir.create(dest, recursive = TRUE, showWarnings = FALSE) |
| file.copy(files, dest, overwrite = TRUE) |
| if(file.exists("symbols.rds")) |
| file.copy("symbols.rds", dest, overwrite = TRUE) |
| @end example |
| @noindent |
| On the other hand, executable programs could be installed along the |
| lines of |
| @example |
| execs <- c("one", "two", "three") |
| if(WINDOWS) execs <- paste0(execs, ".exe") |
| if ( any(file.exists(execs)) ) @{ |
| dest <- file.path(R_PACKAGE_DIR, paste0('bin', R_ARCH)) |
| dir.create(dest, recursive = TRUE, showWarnings = FALSE) |
| file.copy(execs, dest, overwrite = TRUE) |
| @} |
| @end example |
| |
| @noindent |
| Note the use of architecture-specific subdirectories of @file{bin} where |
| needed. |
| |
| The @file{data} subdirectory is for data files: @xref{Data in packages}. |
| |
| The @file{demo} subdirectory is for @R{} scripts (for running @emph{via} |
| @code{demo()}) that demonstrate some of the functionality of the |
| package. Demos may be interactive and are not checked automatically, so |
| if testing is desired use code in the @file{tests} directory to achieve |
| this. The script files must start with a (lower or upper case) letter |
| and have one of the extensions @file{.R} or @file{.r}. If present, the |
| @file{demo} subdirectory should also have a @file{00Index} file with one |
| line for each demo, giving its name and a description separated by a tab |
| or at least three spaces. (This index file is not generated |
| automatically.) Note that a demo does not have a specified encoding and |
| so should be an @acronym{ASCII} file (@pxref{Encoding issues}). Function |
| @code{demo()} will use the package encoding if there is one, but this is |
| mainly useful for non-@acronym{ASCII} comments. |
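| |
| The naming and index rules above can be checked with a few lines of |
| @R{}. This is only a sketch: @code{valid_demo_name} and |
| @code{read_demo_index} are hypothetical helpers, not functions provided |
| by @R{}, and the index parsing assumes exactly the tab or three-space |
| separator described above. |
| |
| @example |
| ## Hypothetical helper: TRUE for file names that start with a letter |
| ## and have extension .R or .r, as required for demo scripts |
| valid_demo_name <- function(file) grepl("^[A-Za-z].*[.][Rr]$", file) |
| |
| ## Hypothetical helper: the demo names listed in a 00Index file, that |
| ## is, each line up to its first tab or run of three or more spaces |
| read_demo_index <- function(index) |
|     sub("(\t|   +).*$", "", readLines(index)) |
| @end example |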
| |
| @cindex .Rinstignore file |
| The contents of the @file{inst} subdirectory will be copied recursively |
| to the installation directory. Subdirectories of @file{inst} should not |
| interfere with those used by @R{} (currently, @file{R}, @file{data}, |
| @file{demo}, @file{exec}, @file{libs}, @file{man}, @file{help}, |
| @file{html} and @file{Meta}; earlier versions also used @file{latex} and |
| @file{R-ex}). The copying of the @file{inst} directory happens after |
| @file{src} is built, so its @file{Makefile} can create files to be |
| installed. To |
| exclude files from being installed, one can specify a list of exclude |
| patterns in file @file{.Rinstignore} in the top-level source directory. |
| These patterns should be Perl-like regular expressions (see the help for |
| @code{regexp} in @R{} for the precise details), one per line, to be |
| matched case-insensitively against the file and directory paths, e.g.@: |
| @file{doc/.*[.]png$} will exclude all PNG files in @file{inst/doc} based |
| on the extension. |
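| |
| To preview which paths a set of @file{.Rinstignore} patterns would |
| exclude, the matching rule can be imitated in @R{}. This is a sketch |
| rather than the code used by @command{R CMD INSTALL}: |
| @code{rinstignore_excludes} is a hypothetical helper, and the example |
| paths are assumed to be relative to @file{inst}, as in the |
| @file{doc/.*[.]png$} example above. |
| |
| @example |
| ## Hypothetical helper: TRUE for each path matched by at least one |
| ## pattern, using Perl-like regexps matched case-insensitively |
| rinstignore_excludes <- function(paths, patterns) |
|     Reduce(`|`, |
|            lapply(patterns, grepl, x = paths, |
|                   ignore.case = TRUE, perl = TRUE), |
|            logical(length(paths))) |
| |
| paths <- c("doc/intro.png", "doc/Figure.PNG", "doc/intro.pdf", |
|            "extdata/plot.png") |
| rinstignore_excludes(paths, "doc/.*[.]png$")  # TRUE TRUE FALSE FALSE |
| @end example |
| |
| @noindent |
| In a package source directory the patterns could be taken from |
| @code{readLines(".Rinstignore")}. |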
| |
| Note that with the exceptions of @file{INDEX}, |
| @file{LICENSE}/@file{LICENCE} and @file{NEWS}, information files at the |
| top level of the package will @emph{not} be installed and so not be |
| known to users of Windows and macOS compiled packages (and not seen |
| by those who use @command{R CMD INSTALL} or @command{install.packages} |
| on the tarball). So any information files you wish an end user to see |
| should be included in @file{inst}. Note that if the named exceptions |
| also occur in @file{inst}, the version in @file{inst} will be that seen |
| in the installed package. |
| |
| @findex CITATION |
| @cindex citation |
| @findex NEWS.Rd |
| @cindex news |
| Things you might like to add to @file{inst} are a @file{CITATION} file |
| for use by the @code{citation} function, and a @file{NEWS.Rd} file for |
| use by the @code{news} function. See the help page for @code{news} for |
| the specific format restrictions of the @file{NEWS.Rd} file. |
| |
| @findex AUTHORS |
| @findex COPYRIGHTS |
| Another file sometimes needed in @file{inst} is @file{AUTHORS} or |
| @file{COPYRIGHTS} to specify the authors or copyright holders when this |
| is too complex to put in the @file{DESCRIPTION} file. |
| |
| Subdirectory @file{tests} is for additional package-specific test code, |
| similar to the specific tests that come with the @R{} distribution. |
| Test code can either be provided directly in a @file{.R} (or @file{.r} |
| as from @R{} 3.4.0) file, or @emph{via} a @file{.Rin} file containing |
| code which in turn creates the corresponding @file{.R} file (e.g., by |
| collecting all function objects in the package and then calling them |
| with the strangest arguments). The results of running a @file{.R} file |
| are written to a @file{.Rout} file. If there is a |
| corresponding@footnote{The best way to generate such a file is to copy |
| the @file{.Rout} from a successful run of @command{R CMD check}. If you |
| want to generate it separately, do run @R{} with options |
| @option{--vanilla --slave} and with environment variable |
| @env{LANGUAGE=en} set to get messages in English. Be careful not to use |
| output with the option @option{--timings} (and note that |
| @option{--as-cran} sets it).} @file{.Rout.save} file, these two are |
| compared, with differences being reported but not causing an error. The |
| directory @file{tests} is copied to the check area, and the tests are |
| run with the copy as the working directory and with @code{R_LIBS} set to |
| ensure that the copy of the package installed during testing will be |
| found by @code{library(@var{pkg_name})}. Note that the package-specific |
| tests are run in a vanilla @R{} session without setting the |
| random-number seed, so tests which use random numbers will need to set |
| the seed to obtain reproducible results (and it can be helpful to do so |
| in all cases, to avoid occasional failures when tests are run). |
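| |
| For example, a test script that uses random numbers might start by |
| fixing the seed. The file name @file{tests/simulate.R} is hypothetical, |
| and a real test would load the package under test and exercise its |
| functions rather than only base @R{}: |
| |
| @example |
| ## Hypothetical tests/simulate.R |
| ## library(mypkg)  # a real test would load the package under test |
| set.seed(1)        # fix the seed so that the output is reproducible |
| x <- rnorm(20) |
| stopifnot(length(x) == 20) |
| ## re-seeding reproduces exactly the same draws |
| set.seed(1) |
| stopifnot(identical(x, rnorm(20))) |
| print(summary(x), digits = 3) |
| @end example |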
| |
| If directory @file{tests} has a subdirectory @file{Examples} containing |
| a file @code{@var{pkg}-Ex.Rout.save}, this is compared to the output |
| file for running the examples when the latter are checked. Reference |
| output should be produced without having the @option{--timings} option |
| set (and note that @option{--as-cran} sets it). |
| |
| If reference output is included for examples, tests or vignettes, do make |
| sure that it is fully reproducible, as it will be compared verbatim to |
| that produced in a check run, unless the @samp{IGNORE_RDIFF} markup is |
| used. Things which trip maintainers up include displayed version |
| numbers from loading other packages, printing numerical results to an |
| unreproducibly high precision and printing timings. Another trap is |
| small values which are in fact rounding error from zero: consider using |
| @code{zapsmall}. |
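| |
| As a small illustration, the value of @code{sin(pi)} is rounding error |
| rather than exactly zero, and @code{zapsmall} removes it; printing with |
| a limited number of significant digits similarly keeps output stable: |
| |
| @example |
| x <- c(1, sin(pi))     # sin(pi) is about 1.2e-16, not exactly 0 |
| print(x)               # the rounding error is visible |
| print(zapsmall(x))     # gives 1 and 0 |
| print(pi, digits = 5)  # gives 3.1416 |
| @end example |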
| |
| Subdirectory @file{exec} could contain additional executable scripts the |
| package needs, typically scripts for interpreters such as the shell, |
| Perl, or Tcl. NB: only files (and not directories) under @file{exec} are |
| installed (and those with names starting with a dot are ignored), and |
| they are all marked as executable (mode @code{755}, moderated by |
| @samp{umask}) on POSIX platforms. Note too that this is not suitable |
| for executable @emph{programs} since some platforms (including Windows) |
| support multiple architectures using the same installed package |
| directory. |
| |
| Subdirectory @file{po} is used for files related to @emph{localization}: |
| @pxref{Internationalization}. |
| |
| Subdirectory @file{tools} is the preferred place for auxiliary files |
| needed during configuration, and also for sources needed to re-create |
| scripts (e.g.@: M4 files for @command{autoconf}). |
| |
| |
| @node Data in packages, Non-R scripts in packages, Package subdirectories, Package structure |
| @subsection Data in packages |
| |
| The @file{data} subdirectory is for data files, either to be made |
| available @emph{via} lazy-loading or for loading using @code{data()}. |
| (The choice is made by the @samp{LazyData} field in the |
| @file{DESCRIPTION} file: the default is not to do so.) It should not be |
| used for other data files needed by the package, and the convention has |
| grown up to use directory @file{inst/extdata} for such files. |
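| |
| Files placed in @file{inst/extdata} are installed in the @file{extdata} |
| subdirectory of the installed package and are usually located at run |
| time with @code{system.file}. A minimal sketch; the helper |
| @code{extdata_path} and the names in the final comment are hypothetical: |
| |
| @example |
| ## Hypothetical helper: full path to a file shipped in inst/extdata. |
| ## mustWork = TRUE gives an error, rather than "", if it is missing. |
| extdata_path <- function(file, package) |
|     system.file("extdata", file, package = package, mustWork = TRUE) |
| |
| ## e.g. read.csv(extdata_path("example.csv", "mypkg")) |
| @end example |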
| |
| Data files can have one of three types as indicated by their extension: |
| plain @R{} code (@file{.R} or @file{.r}), tables (@file{.tab}, |
| @file{.txt}, or @file{.csv}, see @code{?data} for the file formats, and |
| note that @file{.csv} is @strong{not} the standard@footnote{e.g.@: |
| @uref{https://tools.ietf.org/@/html/@/rfc4180}.} CSV format), or |
| @code{save()} images (@file{.RData} or @file{.rda}). The files should |
| not be hidden (have names starting with a dot). Note that @R{} code |
| should if possible be ``self-sufficient'' and not make use of extra |
| functionality provided by the package, so that the data file can also be |
| used without having to load the package or its namespace: it should run |
| as silently as possible and not change the @code{search()} path by |
| attaching packages or other environments. |
| |
| Images (extensions @file{.RData}@footnote{People who have trouble with |
| case are advised to use @file{.rda} as a common error is to refer to |
| @file{abc.RData} as @file{abc.Rdata}!} or @file{.rda}) can contain |
| references to the namespaces of packages that were used to create them. |
| Preferably there should be no such references in data files, and in any |
| case they should only be to packages listed in the @code{Depends} and |
| @code{Imports} fields, as otherwise it may be impossible to install the |
| package. To check for such references, load all the images into a |
| vanilla @R{} session, run @code{str()} on all the datasets, and look at |
| the output of @code{loadedNamespaces()}. |
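| |
| That check could be scripted roughly as follows. This is a sketch under |
| two assumptions: it is run in a vanilla session (@command{R --vanilla}), |
| and, purely for illustration, it first creates a throwaway @file{data} |
| directory in @code{tempdir()} holding one image. In practice |
| @code{datadir} would point at the package's own @file{data} directory. |
| |
| @example |
| ## For illustration only: a throwaway data directory with one image |
| datadir <- file.path(tempdir(), "data") |
| dir.create(datadir, showWarnings = FALSE) |
| example_df <- data.frame(x = 1:3) |
| save(example_df, file = file.path(datadir, "example_df.rda")) |
| |
| before <- loadedNamespaces() |
| env <- new.env() |
| images <- list.files(datadir, pattern = "[.](rda|RData)$", |
|                      full.names = TRUE) |
| ## load() returns the names of the objects it restored |
| objs <- unlist(lapply(images, load, envir = env)) |
| for (obj in objs) str(get(obj, envir = env)) |
| ## namespaces loaded only as a side effect of the images |
| setdiff(loadedNamespaces(), before) |
| @end example |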
| |
| Particular care is needed where a dataset or one of its components is of |
| an S4 class, especially if the class is defined in a different package. |
| First, the package containing the class definition has to be available |
| to do useful things with the dataset, so that package must be listed in |
| @code{Imports} or @code{Depends} (even if this gives a check warning |
| about unused imports). Second, the definition of an S4 class can |
| change, and such changes often go unnoticed when the class is defined |
| in a package with a different author. So it may be wiser to use the |
| @file{.R} form and use that to |
| create the dataset object when needed (loading package namespaces but |
| not attaching them by using @code{requireNamespace(@var{pkg}, quietly = |
| TRUE)} and using @code{@var{pkg}::} to refer to objects in the |
| namespace). |
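| |
| A sketch of such a @file{.R} data file follows. Everything in it is |
| hypothetical: @pkg{stats} merely stands in for the package that defines |
| the S4 class (in practice this might be, say, @CRANpkg{Matrix}, with the |
| object created by a call such as @code{Matrix::Matrix()}), and |
| @code{coefs} is an invented dataset name. |
| |
| @example |
| ## Hypothetical data/coefs.R, run when data(coefs) is called. |
| ## The namespace is loaded but not attached, so search() is unchanged; |
| ## objects in it are referred to with pkg:: (here stats::). |
| if (!requireNamespace("stats", quietly = TRUE)) |
|     stop("package 'stats' is needed to create 'coefs'") |
| coefs <- stats::setNames(c(0.5, 2), c("intercept", "slope")) |
| @end example |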
| |
| If you are not using @samp{LazyData} and either your data files are large |
| or, e.g., you use @file{data/foo.R} scripts which load your namespace to |
| produce your data, you can speed up installation by providing a file |
| @file{datalist} in the @file{data} subdirectory. This should have one |
| line per topic that @code{data()} will find, in the format @samp{foo} if |
| @code{data(foo)} provides @samp{foo}, or @samp{foo: bar bah} if |
| @code{data(foo)} provides @samp{bar} and @samp{bah}. @command{R CMD build} |
| will automatically add a @file{datalist} file to @file{data} directories |
| of over 1MB, using the function @code{tools::add_datalist}. |
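| |
| As a sketch, the @file{datalist} lines for a hypothetical @file{data} |
| directory in which @code{data(foo)} provides @samp{foo} and |
| @code{data(bar)} provides @samp{bar1} and @samp{bar2} could be generated |
| as follows (@code{datalist_lines} is not part of @R{}): |
| |
| @example |
| ## Hypothetical helper: datalist lines from a named list mapping each |
| ## data() topic to the names of the objects it provides |
| datalist_lines <- function(topics) |
|     unname(ifelse(mapply(identical, names(topics), topics), |
|                   names(topics), |
|                   paste0(names(topics), ": ", |
|                          vapply(topics, paste, "", collapse = " ")))) |
| |
| datalist_lines(list(foo = "foo", bar = c("bar1", "bar2"))) |
| ## then e.g. writeLines(..., file.path("mypkg", "data", "datalist")) |
| @end example |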
| |
| Tables (@file{.tab}, @file{.txt}, or @file{.csv} files) can be |
| compressed by @command{gzip}, @command{bzip2} or @command{xz}, |
| optionally with additional extension @file{.gz}, @file{.bz2} or |
| @file{.xz}. |
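| |
| For instance, a table can be written in the semicolon-separated form |
| that @code{data()} uses for @file{.csv} files (see @code{?data}) and |
| compressed with @command{xz} in one step. A sketch, with a temporary |
| directory standing in for the package's @file{data} directory and a |
| hypothetical file name: |
| |
| @example |
| datadir <- file.path(tempdir(), "data") |
| dir.create(datadir, showWarnings = FALSE) |
| tab <- data.frame(id = 1:3, grp = c("a", "b", "a")) |
| |
| ## write data/mytable.csv.xz through an xz-compressing connection |
| con <- xzfile(file.path(datadir, "mytable.csv.xz"), "w") |
| write.table(tab, con, sep = ";", row.names = FALSE, quote = FALSE) |
| close(con) |
| |
| ## read.table() decompresses transparently when reading it back |
| back <- read.table(file.path(datadir, "mytable.csv.xz"), |
|                    header = TRUE, sep = ";") |
| @end example |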
| |
| If your package is to be distributed, do consider the resource |
| implications of large datasets for your users: they can make packages |
| very slow to download and use up unwelcome amounts of storage space, as |
| well as taking many seconds to load. It is normally best to distribute |
| large datasets as @file{.rda} images prepared by @code{save(, compress = |
| TRUE)} (the default). Using @command{bzip2} or @command{xz} compression |
| will usually reduce the size of both the package tarball and the |
| installed package, in some cases by a factor of two or more. |
| |
| Package @pkg{tools} has a couple of functions to help with data images: |
| @code{checkRdaFiles} reports on the way the image was saved, and |
| @code{resaveRdaFiles} will re-save with a different type of compression, |
| including choosing the best type for that particular image. |
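| |
| A sketch of their use; a temporary directory holding one freshly saved |
| image stands in for a package's @file{data} directory. When given a |
| single directory, both functions act on all the @file{.rda} and |
| @file{.RData} files in it. |
| |
| @example |
| datadir <- file.path(tempdir(), "rda-demo") |
| dir.create(datadir, showWarnings = FALSE) |
| big <- data.frame(x = rep(1:10, 1000), |
|                   y = rep(letters, length.out = 10000)) |
| save(big, file = file.path(datadir, "big.rda"))  # gzip by default |
| |
| tools::checkRdaFiles(datadir)  # size and compression of each image |
| ## re-save, letting R choose the compression type |
| tools::resaveRdaFiles(datadir, compress = "auto") |
| info <- tools::checkRdaFiles(datadir) |
| info |
| @end example |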
| |
| @c DESCRIPTION field LazyDataCompression |
| Some packages using @samp{LazyData} will benefit from using a form of |
| compression other than @command{gzip} in the installed lazy-loading |
| database. This can be selected by the @option{--data-compress} option |
| to @command{R CMD INSTALL} or by using the @samp{LazyDataCompression} |
| field in the @file{DESCRIPTION} file. Useful values are @code{bzip2}, |
| @code{xz} and the default, @code{gzip}. The only way to discover which |
| is best is to try them all and look at the size of the |
| @file{@var{pkgname}/data/Rdata.rdb} file. |
| |
| @c DESCRIPTION field SysDataCompression |
| The analogue for @file{sysdata.rda} is field @samp{SysDataCompression}: |
| the default is @code{xz} for files bigger than 1MB, otherwise |
| @code{gzip}. |
| |
| Lazy-loading is not supported for very large datasets (those which when |
| serialized exceed 2GB, the limit for the format on 32-bit platforms). |
| |
| @node Non-R scripts in packages, Specifying URLs, Data in packages, Package structure |
| @subsection Non-R scripts in packages |
| |
| Code which needs to be compiled (C, C++, Fortran @dots{}) |
| is included in the @file{src} subdirectory and discussed elsewhere in |
| this document. |
| |
| Subdirectory @file{exec} could be used for scripts for interpreters such |
| as the shell, BUGS, JavaScript, Matlab, Perl, php (@CRANpkg{amap}), |
| Python or Tcl (@CRANpkg{Simile}), or even @R{}. However, it seems more |
| common to use the @file{inst} directory, for example |
| @file{WriteXLS/inst/Perl}, @file{NMF/inst/m-files}, |
| @file{RnavGraph/inst/tcl}, @file{RProtoBuf/inst/python}, |
| @file{emdbook/inst/BUGS} and @file{gridSVG/inst/js}. |
| |
| Java code is a special case: except for very small programs, |
| @file{.java} files should be byte-compiled (to a @file{.class} file) and |
| distributed as part of a @file{.jar} file: the conventional location for |
| the @file{.jar} file(s) is @file{inst/java}. It is desirable (and |
| required under an Open Source license) to make the Java source files |
| available: this is best done in a top-level @file{java} directory in the |
| package---the source files should not be installed. |
| |
| If your package requires one of these interpreters or an extension then |
| this should be declared in the @samp{SystemRequirements} field of its |
| @file{DESCRIPTION} file. (Users of Java most often do so @emph{via} |
| @CRANpkg{rJava}, in which case depending on or importing that package |
| suffices.) |
| |
| Windows and Mac users should be aware that the Tcl extensions |
| @samp{BWidget} and @samp{Tktable} which are currently included with the |
| @R{} for Windows and in the macOS installers @emph{are} extensions and do |
| need to be declared for users of other platforms (and that |
| @samp{Tktable} is less widely available than it used to be, including |
| not in the main repositories for major Linux distributions). |
| @c Not in Fedora since 17, only in launchpad for Ubuntu. |
| |
| @samp{BWidget} needs to be installed by the user on other OSes. This is |
| fairly easy to do: first find the Tcl/Tk search path: |
| |
| @example |
| library(tcltk) |
| strsplit(tclvalue('auto_path'), " ")[[1]] |
| @end example |
| |
| @noindent |
| then download the sources from |
| @uref{https://sourceforge.net/@/projects/@/tcllib/@/files/@/BWidget/} and |
| at the command line run something like |
| |
| @example |
| tar xf bwidget-1.9.8.tar.gz |
| sudo mv bwidget-1.9.8 /usr/local/lib |
| @end example |
| |
| @noindent |
| substituting a location on the Tcl/Tk search path for @file{/usr/local/lib} if |
| needed. |
| |
| @node Specifying URLs, , Non-R scripts in packages, Package structure |
| @subsection Specifying URLs |
| |
| URLs in many places in the package documentation will be converted to |
| clickable hyperlinks in at least some of their renderings. So care is |
| needed that their forms are correct and portable. |
| |
| The full URL should be given, including the scheme (often @samp{http://} |
| or @samp{https://}) and a final @samp{/} for references to directories. |
| |
| Spaces in URLs are not portable and how they are handled does vary by |
| HTTP server and by client. There should be no space in the host part of |
| an @samp{http://} URL, and spaces in the remainder should be encoded, |
| with each space replaced by @samp{%20}. |
| |
| Other characters may benefit from being encoded: see the help on |
| @code{URLencode()}. |
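| |
| For example, @code{URLencode()} with its default @code{reserved = FALSE} |
| encodes spaces as @samp{%20} while leaving separators such as @samp{:} |
| and @samp{/} untouched. The URL below is made up for illustration: |
| |
| @example |
| u <- "https://www.example.org/some dir/file name.txt" |
| enc <- URLencode(u) |
| enc                    # "https://www.example.org/some%20dir/file%20name.txt" |
| identical(URLdecode(enc), u)  # TRUE |
| @end example |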
| |
| The canonical URL for a @acronym{CRAN} package is |
| @example |
| https://cran.r-project.org/package=@var{pkgname} |
| @end example |
| |
| @noindent |
| and not a version starting |
| @samp{https://cran.r-project.org/web/packages/@var{pkgname}}. |
| |
| @node Configure and cleanup, Checking and building packages, Package structure, Creating R packages |
| @section Configure and cleanup |
| |
| Note that most of this section is specific to Unix-alikes: see the |
| comments later on about the Windows port of @R{}. |
| |
| If your package needs some system-dependent configuration before |
| installation you can include an executable (Bourne@footnote{The script |
| should only assume a POSIX-compliant @command{/bin/sh} -- see |
| @uref{http://pubs.opengroup.org/@/onlinepubs/@/9699919799/@/utilities/@/V3_chap02.html}. |
| In particular @command{bash} extensions must not be used, and not all |
| @R{} platforms have a @command{bash} command, let alone one at |
| @file{/bin/bash}. All known shells used with @R{} support the use of |
| backticks, but not all support @samp{$(@var{cmd})}. However, real-world |
| shells are not fully POSIX-compliant and omissions and idiosyncrasies |
| need to be worked around---which Autoconf will do for you. Arithmetic |
| expansion is a known issue: see |
| @uref{https://www.gnu.org/@/software/@/autoconf/@/manual/autoconf.html#Portable-Shell} |
| for this and others. Some checks can be done by the |
| @code{checkbashisms} Perl script at |
| @uref{https://sourceforge.net/@/projects/@/checkbaskisms/@/files}, also |
| available in most Linux distributions in a package named either @samp{devscripts} or @samp{devscripts-checkbashisms}.}) shell script @file{configure} in |
| your package which (if present) is executed by @code{R CMD INSTALL} |
| before any other action is performed. This can be a script created by |
| the Autoconf mechanism, but may also be a script written by yourself. |
| Use this to detect if any nonstandard libraries are present such that |
| corresponding code in the package can be disabled at install time rather |
| than giving error messages when the package is compiled or used. To |
| summarize, the full power of Autoconf is available for your extension |
| package (including variable substitution, searching for libraries, |
| etc.). |
| |
| Under a Unix-alike only, an executable (Bourne shell) script |
| @file{cleanup} is executed as the last thing by @code{R CMD INSTALL} if |
| option @option{--clean} was given, and by @code{R CMD build} when |
| preparing the package for building from its source. |
| |
| As an example, suppose we want to use functionality provided by a (C or |
| Fortran) library @code{foo}. Using Autoconf, we can create a configure |
| script which checks for the library, sets variable @code{HAVE_FOO} to |
| @code{TRUE} if it was found and to @code{FALSE} otherwise, and then |
| substitutes this value into output files (by replacing instances of |
| @samp{@@HAVE_FOO@@} in input files with the value of @code{HAVE_FOO}). |
| For example, if a function named @code{bar} is to be made available by |
| linking against library @code{foo} (i.e., using @option{-lfoo}), one |
| could use |
| |
| @example |
| @group |
| AC_CHECK_LIB(foo, @var{fun}, [HAVE_FOO=TRUE], [HAVE_FOO=FALSE]) |
| AC_SUBST(HAVE_FOO) |
| ...... |
| AC_CONFIG_FILES([foo.R]) |
| AC_OUTPUT |
| @end group |
| @end example |
| |
| @noindent |
| in @file{configure.ac} (assuming Autoconf 2.50 or later). |
| |
| The definition of the respective @R{} function in @file{foo.R.in} could be |
| |
| @example |
| @group |
| foo <- function(x) @{ |
| if(!@@HAVE_FOO@@) |
| stop("Sorry, library 'foo' is not available") |
| ... |
| @} |
| @end group |
| @end example |
| |
| @noindent |
| From this file @command{configure} creates the actual @R{} source file |
| @file{foo.R} looking like |
| |
| @example |
| @group |
| foo <- function(x) @{ |
| if(!FALSE) |
| stop("Sorry, library 'foo' is not available") |
| ... |
| @} |
| @end group |
| @end example |
| |
| @noindent |
| if library @code{foo} was not found (with the desired functionality). |
| In this case, the above @R{} code effectively disables the function. |
| |
| One could also use different file fragments for available and missing |
| functionality, respectively. |
| |
| You will very likely need to ensure that the same C compiler and |
| compiler flags are used in the @file{configure} tests as when compiling |
| @R{} or your package. Under a Unix-alike, you can achieve this by |
| including the following fragment early in @file{configure.ac} |
| (@emph{before} calling @code{AC_PROG_CC}) |
| |
| @example |
| @group |
| : $@{R_HOME=`R RHOME`@} |
| if test -z "$@{R_HOME@}"; then |
| echo "could not determine R_HOME" |
| exit 1 |
| fi |
| CC=`"$@{R_HOME@}/bin/R" CMD config CC` |
| CFLAGS=`"$@{R_HOME@}/bin/R" CMD config CFLAGS` |
| CPPFLAGS=`"$@{R_HOME@}/bin/R" CMD config CPPFLAGS` |
| @end group |
| @end example |
| |
| @noindent |
| (Using @samp{$@{R_HOME@}/bin/R} rather than just @samp{R} is necessary |
| in order to use the correct version of @R{} when running the script as |
| part of @code{R CMD INSTALL}, and the quotes since @samp{$@{R_HOME@}} |
| might contain spaces.) |
| |
| If your code does load checks then you may also need |
| @example |
| LDFLAGS=`"$@{R_HOME@}/bin/R" CMD config LDFLAGS` |
| @end example |
| |
| @noindent |
| and packages written with C++ need to pick up the details for the C++ |
| compiler and switch the current language to C++ by something like |
| @example |
| CXX=`"$@{R_HOME@}/bin/R" CMD config CXX` |
| if test -z "$CXX"; then |
| AC_MSG_ERROR([No C++ compiler is available]) |
| fi |
| CXXFLAGS=`"$@{R_HOME@}/bin/R" CMD config CXXFLAGS` |
| AC_LANG(C++) |
| @end example |
| |
| @noindent |
| The latter is important, as for example C headers may not be available |
| to C++ programs or may not be written to avoid C++ name-mangling. Note |
| that an @R{} installation is not required to have a C++ compiler so |
| @samp{CXX} may be empty. |
| |
| @findex R CMD config |
| You can use @code{R CMD config} to get the value of the basic |
| configuration variables, and also the header and library flags necessary |
| for linking a front-end executable program against @R{}, see @kbd{R CMD |
| config --help} for details. If you do, it is essential that you use |
| both the command and the appropriate flags, so that for example |
| @samp{CC} must always be used with @samp{CFLAGS} and (for code to be |
| linked into a shared library) @samp{CPICFLAGS}. For Fortran, be careful |
| to use @samp{FC FFLAGS FPICFLAGS} for fixed-form Fortran and |
| @samp{FC FCFLAGS FPICFLAGS} for free-form Fortran. (Packages intended to |
| be used with @R{} versions before 3.6.0 should use the legacy forms |
| @samp{F77 FFLAGS FPICFLAGS} and @samp{FC FCFLAGS FCPICFLAGS}, which are |
| still accepted.) |
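| |
| The same values can also be obtained from within @R{}. A sketch for a |
| Unix-alike; @code{r_config} is a hypothetical helper, and it runs the |
| @command{R} script in @code{R.home("bin")} so that, as in |
| @file{configure}, the running @R{} is used rather than whichever @R{} |
| is first on the path: |
| |
| @example |
| ## Hypothetical helper: value of one R CMD config variable |
| ## (shQuote() because the path to R may contain spaces) |
| r_config <- function(var) |
|     system2(shQuote(file.path(R.home("bin"), "R")), |
|             c("CMD", "config", var), stdout = TRUE) |
| |
| ## fetch the compiler together with its flags, never on its own |
| cc <- vapply(c("CC", "CFLAGS", "CPICFLAGS"), r_config, "") |
| cc |
| @end example |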
| |
| To check for an external BLAS library using the @code{AX_BLAS} macro |
| from the official Autoconf Macro Archive, one can simply do |
| |
| @example |
| @group |
| FC=`"$@{R_HOME@}/bin/R" CMD config FC` |
| FFLAGS=`"$@{R_HOME@}/bin/R" CMD config FFLAGS` |
| AC_PROG_FC |
| FLIBS=`"$@{R_HOME@}/bin/R" CMD config FLIBS` |
| AX_BLAS([], AC_MSG_ERROR([could not find your BLAS library], 1)) |
| @end group |
| @end example |
| |
| Note that @code{FLIBS} as determined by @R{} must be used to ensure that |
| Fortran code works on all @R{} platforms. |
| @c Calls to the Autoconf macro |
| @c @code{AC_F77_LIBRARY_LDFLAGS}, which would overwrite @code{FLIBS}, must |
| @c not be used (and hence e.g.@: removed from @code{ACX_BLAS}). (Recent |
| @c versions of Autoconf in fact allow an already set @code{FLIBS} to |
| @c override the test for the Fortran linker flags.) |
| |
| |
| @strong{N.B.}: If the @command{configure} script creates files, e.g.@: |
| @file{src/Makevars}, you do need a @command{cleanup} script to remove |
| them. Otherwise @command{R CMD build} may ship the files that are |
| created. For example, package @CRANpkg{RODBC} has |
| |
| @example |
| #!/bin/sh |
| |
| rm -f config.* src/Makevars src/config.h |
| @end example |
| |
| @noindent |
| As this example shows, @command{configure} often creates working files |
| such as @file{config.log}. |
| |
| If your configure script needs auxiliary files, it is recommended that |
| you ship them in a @file{tools} directory (as @R{} itself does). |
| |
| You should bear in mind that the configure script will not be used on |
| Windows systems. If your package is to be made publicly available, |
| please give enough information for a user on a non-Unix-alike platform |
| to configure it manually, or provide a @file{configure.win} script to be |
| used on that platform. (Optionally, there can be a @file{cleanup.win} |
| script. Both should be shell scripts to be executed by @command{ash}, |
| which is a minimal version of Bourne-style @command{sh}.) When |
| @file{configure.win} is run the environment variables @env{R_HOME} |
| (which uses @samp{/} as the file separator), @env{R_ARCH} and |
| @env{R_ARCH_BIN} will be set. Use @env{R_ARCH} to decide if this is a |
| 64-bit build (its value there is @samp{/x64}) and to install DLLs to the |
| correct place (@file{$@{R_HOME@}/libs$@{R_ARCH@}}). Use |
| @env{R_ARCH_BIN} to find the correct place under the @file{bin} |
| directory, e.g.@: @file{$@{R_HOME@}/bin$@{R_ARCH_BIN@}/Rscript.exe}. |
| |
| In some rare circumstances, the configuration and cleanup scripts need |
| to know the location into which the package is being installed. An |
| example of this is a package that uses C code and creates two shared |
| object/DLLs. Usually, the object that is dynamically loaded by @R{} |
| is linked against the second, dependent, object. On some systems, we |
| can add the location of this dependent object to the object that is |
| dynamically loaded by @R{}. This means that each user does not have to |
| set the value of the @env{LD_LIBRARY_PATH} (or equivalent) environment |
| variable, but that the secondary object is automatically resolved. |
| Another example is when a package installs support files that are |
| required at run time, and their location is substituted into an @R{} |
| data structure at installation time. |
| @vindex R_LIBRARY_DIR |
| @vindex R_PACKAGE_DIR |
| @vindex R_PACKAGE_NAME |
| The names of the top-level library directory (i.e., specifiable |
| @emph{via} the @samp{-l} argument) and the directory of the package |
| itself are made available to the installation scripts @emph{via} the two |
| shell/environment variables @env{R_LIBRARY_DIR} and @env{R_PACKAGE_DIR}. |
| Additionally, the name of the package (e.g.@: @samp{survival} or |
| @samp{MASS}) being installed is available from the environment variable |
| @env{R_PACKAGE_NAME}. (Currently the value of @env{R_PACKAGE_DIR} is |
| always @code{$@{R_LIBRARY_DIR@}/$@{R_PACKAGE_NAME@}}, but this used not to |
| be the case when versioned installs were allowed. Its main use is in |
| @file{configure.win} scripts for the installation path of external |
| software's DLLs.) Note that the value of @env{R_PACKAGE_DIR} may |
| contain spaces and other shell-unfriendly characters, and so should be |
| quoted in makefiles and configure scripts. |
| |
| One of the more tricky tasks can be to find the headers and libraries of |
| external software. One tool which is increasingly available on |
| Unix-alikes (but not by default@footnote{but it is available on the machines |
| used to produce the CRAN binary packages.} on macOS) to do this is |
| @command{pkg-config}. The @file{configure} script will need to test for |
| the presence of the command itself (see for example package |
| @CRANpkg{Cairo}), and if present it can be asked if the software is |
| installed, of a suitable version and for compilation/linking flags by |
| e.g.@: |
| |
| @example |
| $ pkg-config --exists 'QtCore >= 4.0.0' # check the status |
| $ pkg-config --modversion QtCore |
| 4.8.7 |
| $ pkg-config --cflags QtCore |
| -DQT_SHARED -I/usr/include/QtCore |
| $ pkg-config --libs QtCore |
| -lQtCore |
| $ pkg-config --static --libs QtCore |
| -lQtCore -lpthread -lz -lm -ldl -lgthread-2.0 -pthread -lglib-2.0 -lrt |
| @end example |
| |
| @noindent |
| Note that @command{pkg-config --libs} gives the information required to |
| link against the default version@footnote{but not all projects get this |
| right when only a static library is installed, so it is often necessary |
| to try in turn @command{pkg-config --libs} and @command{pkg-config |
| --static --libs}.} of that library (usually the dynamic one), and |
| @command{pkg-config --static --libs} may be needed if the static library is |
| to be used. |
| |
| Sometimes the name by which the software is known to |
| @command{pkg-config} is not what one might expect (e.g.@: |
| @samp{gtk+-2.0} even for 2.22). To get a complete list use |
| |
| @example |
| pkg-config --list-all | sort |
| @end example |
| |
| If using Autoconf it is good practice to include all the Autoconf |
| sources in the package (and required for an Open Source package). |
| This will include the file @file{configure.ac}@footnote{a decade ago |
| Autoconf used @file{configure.in}: this is still accepted but should be |
| renamed and @command{autoreconf} as used by @command{R CMD check |
| --as-cran} will report as such.} in the top-level directory of the |
| package. If extensions written in @command{m4} are needed, these should |
| be included under the directory @file{tools} and included in |
| @file{configure.ac} @emph{via} e.g., |
| @example |
| m4_include([tools/ax_pthread.m4]) |
| @end example |
| @noindent |
| One source of such extensions is the `Autoconf Archive' |
| (@uref{https://www.gnu.org/@/software/@/autoconf-archive}). It is not |
| safe to assume this is installed on users' machines, so the extension |
| should be shipped with the package (taking care to comply with its |
| licence). |
| |
| @menu |
| * Using Makevars:: |
| * Configure example:: |
| * Using F9x code:: |
| * Using C++11 code:: |
| * Using C++14 code:: |
| * Using C++17 code:: |
| @end menu |
| |
| @node Using Makevars, Configure example, Configure and cleanup, Configure and cleanup |
| @subsection Using @file{Makevars} |
| |
| @menu |
| * OpenMP support:: |
| * Using pthreads:: |
| * Compiling in sub-directories:: |
| @end menu |
| |
| Sometimes writing your own @file{configure} script can be avoided by |
| supplying a file @file{Makevars}: also one of the most common uses of a |
| @file{configure} script is to make @file{Makevars} from |
| @file{Makevars.in}. |
| |
| A @file{Makevars} file is a makefile and is used as one of several |
| makefiles by @command{R CMD SHLIB} (which is called by @command{R CMD |
| INSTALL} to compile code in the @file{src} directory). It should be |
| written if at all possible in a portable style, in particular (except |
| for @file{Makevars.win}) without the use of GNU extensions. |
| |
| The most common use of a @file{Makevars} file is to set additional |
| preprocessor options (for example include paths and definitions) for |
| C/C++ files @emph{via} @code{PKG_CPPFLAGS}, and additional compiler |
| flags by setting @code{PKG_CFLAGS}, @code{PKG_CXXFLAGS} or |
| @code{PKG_FFLAGS}, for C, C++ or Fortran respectively (@pxref{Creating |
| shared objects}). |
| |
| @strong{N.B.}: Include paths are preprocessor options, not compiler |
| options, and @strong{must} be set in @code{PKG_CPPFLAGS} as otherwise |
| platform-specific paths (e.g.@: @samp{-I/usr/local/include}) will take |
| precedence. @code{PKG_CPPFLAGS} should contain @samp{-I}, @samp{-D}, |
| @samp{-U} and (where supported) @samp{-include} and @samp{-pthread} |
| options: everything else should be a compiler flag. |
| |
| @file{Makevars} can also be used to set flags for the linker, for |
| example @samp{-L} and @samp{-l} options, @emph{via} @code{PKG_LIBS}. |
| |
| When writing a @file{Makevars} file for a package you intend to |
| distribute, take care to ensure that it is not specific to your |
| compiler: flags such as @option{-O2 -Wall -pedantic} (and all other |
| @option{-W} flags: for the Oracle compilers these are used to pass |
| arguments to compiler phases) are all specific to GCC. |
| |
| Also, do not set variables such as @code{CPPFLAGS}, @code{CFLAGS} etc.: |
| these should be settable by users (sites) through appropriate personal |
| (site-wide) @file{Makevars} files. |
| @ifset UseExternalXrefs |
| @xref{Customizing package compilation, , Customizing package compilation, |
| R-admin, R Installation and Administration}. |
| @end ifset |
| |
| There are some macros@footnote{in POSIX parlance: GNU @command{make} |
| calls these `make variables'.} which are set whilst configuring the |
| building of @R{} itself and are stored in |
| @file{@var{R_HOME}/etc@var{R_ARCH}/Makeconf}. That makefile is included |
| as a @file{Makefile} @emph{after} @file{Makevars[.win]}, and the macros |
| it defines can be used in macro assignments and make command lines in |
| the latter. These include |
| |
| @table @code |
| @item FLIBS |
| @vindex FLIBS |
| A macro containing the set of libraries needed to link Fortran code. This |
| may need to be included in @code{PKG_LIBS}: it will normally be included |
| automatically if the package contains Fortran source files in the |
| @file{src} directory. |
| |
| @item BLAS_LIBS |
| @vindex BLAS_LIBS |
| A macro containing the BLAS libraries used when building @R{}. This may |
| need to be included in @code{PKG_LIBS}. Beware that if it is empty then |
| the @R{} executable will contain all the double-precision and |
| double-complex BLAS routines, but no single-precision or complex |
| routines. If @code{BLAS_LIBS} is included, then @code{FLIBS} also needs |
| to be@footnote{at least on Unix-alikes: the Windows build currently |
| resolves such dependencies to a static Fortran library when |
| @file{Rblas.dll} is built.} included following it, as most BLAS |
| libraries are written at least partially in Fortran. |
| |
| @item LAPACK_LIBS |
| @vindex LAPACK_LIBS |
| A macro containing the LAPACK libraries (and paths where appropriate) |
| used when building @R{}. This may need to be included in |
| @code{PKG_LIBS}. It may point to a dynamic library @code{libRlapack} |
| which contains the main double-precision LAPACK routines as well as |
| those double-complex LAPACK routines needed to build @R{}, or it may |
| point to an external LAPACK library, or may be empty if an external BLAS |
| library also contains LAPACK. |
| |
| [@code{libRlapack} includes all the double-precision LAPACK routines |
| which were current in 2003: a list of which routines are included is in |
| file @file{src/modules/lapack/README}. Note that an external LAPACK/BLAS |
| library need not do so, as some were `deprecated' (and not compiled by |
| default) in LAPACK 3.6.0 in late 2015.] |
| |
| For portability, the macros @code{BLAS_LIBS} and @code{FLIBS} should |
| always be included @emph{after} @code{LAPACK_LIBS} (and in that order). |
| |
| @item SAFE_FFLAGS |
| @vindex SAFE_FFLAGS |
| A macro containing flags which are needed to circumvent |
| over-optimization of Fortran code: it might be @samp{-g -O2 |
| -ffloat-store} or @samp{-g -O2 -msse2 -mfpmath=sse} on @cputype{ix86} |
| platforms using @command{gfortran}. Note that this is @strong{not} an |
| additional flag to be used as part of @code{PKG_FFLAGS}, but a |
| replacement for @code{FFLAGS}. See the example later in this section. |
| @end table |
| |
| @vindex OBJECTS |
| Setting certain macros in @file{Makevars} will prevent @command{R CMD |
| SHLIB} setting them: in particular if @file{Makevars} sets |
| @samp{OBJECTS} it will not be set on the @command{make} command line. |
| This can be useful in conjunction with implicit rules to allow other |
| types of source code to be compiled and included in the shared object. |
| It can also be used to control the set of files which are compiled, |
| either by excluding some files in @file{src} or including some files in |
| subdirectories. For example |
| |
| @example |
| OBJECTS = 4dfp/endianio.o 4dfp/Getifh.o R4dfp-object.o |
| @end example |
| |
| |
| Note that @file{Makevars} should not normally contain targets, as it is |
| included before the default makefile and @command{make} will call the |
| first target, intended to be @code{all} in the default makefile. If you |
| really need to circumvent that, use a suitable (phony) target @code{all} |
| before any actual targets in @file{Makevars[.win]}: for example package |
| @CRANpkg{fastICA} used to have |
| |
| @example |
| PKG_LIBS = @@BLAS_LIBS@@ |
| |
| SLAMC_FFLAGS=$(R_XTRA_FFLAGS) $(FPICFLAGS) $(SHLIB_FFLAGS) $(SAFE_FFLAGS) |
| |
| all: $(SHLIB) |
| |
| slamc.o: slamc.f |
| $(FC) $(SLAMC_FFLAGS) -c -o slamc.o slamc.f |
| @end example |
| |
| @noindent |
| needed to ensure that the LAPACK routines find some constants without |
| infinite looping. The Windows equivalent was |
| |
| @example |
| all: $(SHLIB) |
| |
| slamc.o: slamc.f |
| $(FC) $(SAFE_FFLAGS) -c -o slamc.o slamc.f |
| @end example |
| |
| @noindent |
| (since the other macros are all empty on that platform, and @R{}'s |
| internal BLAS was not used). Note that the first target in |
| @file{Makevars} will be called, but for back-compatibility it is best |
| named @code{all}. |
| |
| If you want to create and then link to a library, say using code in a |
| subdirectory, use something like |
| |
| @example |
| .PHONY: all mylibs |
| |
| all: $(SHLIB) |
| $(SHLIB): mylibs |
| |
| mylibs: |
| (cd subdir; $(MAKE)) |
| @end example |
| |
| @noindent |
| Be careful to create all the necessary dependencies, as there is no |
| guarantee that the dependencies of @code{all} will be run in a |
| particular order (and some of the @acronym{CRAN} build machines use |
| multiple CPUs and parallel makes). In particular, |
| |
| @example |
| all: mylibs |
| @end example |
| |
| @noindent |
| does @strong{not} suffice. GNU make does allow the construct |
| @example |
| .NOTPARALLEL: all |
| all: mylibs $(SHLIB) |
| @end example |
| |
| @noindent |
| but that is not portable. @command{dmake} and @command{pmake} allow the |
| similar @code{.NO_PARALLEL}, also not portable: some variants of |
| @command{pmake} accept @code{.NOTPARALLEL} as an alias for |
| @code{.NO_PARALLEL}. |
| |
| Note that on Windows it is required that @file{Makevars[.win]} does |
| create a DLL: this is needed as it is the only reliable way to ensure |
| that building a DLL succeeded. If you want to use the @file{src} |
| directory for some purpose other than building a DLL, use a |
| @file{Makefile.win} file. |
| |
| It is sometimes useful to have a target @samp{clean} in @file{Makevars} |
| or @file{Makevars.win}: this will be used by @command{R CMD build} to |
| clean up (a copy of) the package sources. When it is run by |
| @command{build} it will have fewer macros set, in particular not |
| @code{$(SHLIB)}, nor @code{$(OBJECTS)} unless set in the file itself. |
| It would also be possible to add tasks to the target @samp{shlib-clean} |
| which is run by @command{R CMD INSTALL} and @command{R CMD SHLIB} with |
| options @option{--clean} and @option{--preclean}. |
| |
| If you want to run @R{} code in @file{Makevars}, e.g.@: to find |
| configuration information, please do ensure that you use the correct |
| copy of @code{R} or @code{Rscript}: there might not be one in the path |
| at all, or it might be the wrong version or architecture. The correct |
| way to do this is @emph{via} |
| |
| @example |
| "$(R_HOME)/bin$(R_ARCH_BIN)/Rscript" @var{filename} |
| "$(R_HOME)/bin$(R_ARCH_BIN)/Rscript" -e '@var{R expression}' |
| @end example |
| |
| @noindent |
| where @code{$(R_ARCH_BIN)} is currently only needed on Windows. |
| |
| Environment or make variables can be used to select different macros for |
| 32- and 64-bit code, for example (GNU @command{make} syntax, allowed on |
| Windows) |
| |
| @example |
| ifeq "$(WIN)" "64" |
| PKG_LIBS = @var{value for 64-bit Windows} |
| else |
| PKG_LIBS = @var{value for 32-bit Windows} |
| endif |
| @end example |
| |
| On Windows there is normally a choice between linking to an import |
| library or directly to a DLL. Where possible, the latter is much more |
| reliable: import libraries are tied to a specific toolchain, and in |
| particular on 64-bit Windows two different conventions have been |
| commonly used. So for example instead of |
| |
| @example |
| PKG_LIBS = -L$(XML_DIR)/lib -lxml2 |
| @end example |
| |
| @noindent |
| one can use |
| |
| @example |
| PKG_LIBS = -L$(XML_DIR)/bin -lxml2 |
| @end example |
| |
| @noindent |
| since on Windows @code{-lxxx} will look in turn for |
| |
| @example |
| libxxx.dll.a |
| xxx.dll.a |
| libxxx.a |
| xxx.lib |
| libxxx.dll |
| xxx.dll |
| @end example |
| |
| @noindent |
| where the first and second are conventionally import libraries, the |
| third and fourth often static libraries (with @code{.lib} intended for |
| Visual C++), but might be import libraries. See for example |
| @uref{https://sourceware.org/@/binutils/@/docs-2.20/@/ld/@/WIN32.html#WIN32}. |
| |
| The fly in the ointment is that the DLL might not be named |
| @file{libxxx.dll}, and in fact on 32-bit Windows there is a |
| @file{libxml2.dll} whereas on one build for 64-bit Windows the DLL is |
| called @file{libxml2-2.dll}. Using import libraries can cover over |
| these differences but can cause equal difficulties. |
| |
| If static libraries are available they can save a lot of problems with |
| run-time finding of DLLs, especially when binary packages are to be |
| distributed and even more when these support both architectures. Where |
| using DLLs is unavoidable we normally arrange (@emph{via} |
| @file{configure.win}) to ship them in the same directory as the package |
| DLL. |
| |
| @node OpenMP support, Using pthreads, Using Makevars, Using Makevars |
| @subsubsection OpenMP support |
| |
| @cindex OpenMP |
| |
| There is some support for packages which wish to use |
| OpenMP@footnote{@uref{http://www.openmp.org/}, |
| @uref{https://en.wikipedia.org/@/wiki/@/OpenMP}, |
| @uref{https://computing.llnl.gov/@/tutorials/@/openMP/}}. The |
| @command{make} macros |
| |
| @example |
| SHLIB_OPENMP_CFLAGS |
| SHLIB_OPENMP_CXXFLAGS |
| SHLIB_OPENMP_FFLAGS |
| @end example |
| |
| @noindent |
| are available for use in @file{src/Makevars} or @file{src/Makevars.win}. |
| Include the appropriate macro in @code{PKG_CFLAGS}, @code{PKG_CXXFLAGS} |
| and so on, and also in @code{PKG_LIBS} (but see below for Fortran). |
| C/C++ code that needs to be conditioned on the use of OpenMP can be used |
| inside @code{#ifdef _OPENMP}: note that some toolchains used for @R{} |
| (including Apple's for macOS and some others using |
| @command{clang}@footnote{Default builds of @command{clang} 3.8.0 and |
| later have support for OpenMP, but the @code{libomp} run-time library |
| may not be installed.}) have no OpenMP support at all, not even |
| @file{omp.h}. |
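| |
| A minimal sketch of such conditioning follows. The function name |
| @code{pkg_max_threads} is hypothetical. @file{omp.h} is only included |
| when @code{_OPENMP} is defined, and the code falls back to a single |
| thread when compiled without OpenMP support: |
| |
| @example |
| #ifdef _OPENMP |
| # include <omp.h> |
| #endif |
| |
| /* Hypothetical helper: the maximum number of threads this code would |
|    use, 1 when compiled without OpenMP (e.g. no omp.h available). */ |
| int pkg_max_threads(void) |
| @{ |
| #ifdef _OPENMP |
|     return omp_get_max_threads(); |
| #else |
|     return 1; |
| #endif |
| @} |
| @end example |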
| |
| For example, a package with C code written for OpenMP should have in |
| @file{src/Makevars} the lines |
| |
| @example |
| PKG_CFLAGS = $(SHLIB_OPENMP_CFLAGS) |
| PKG_LIBS = $(SHLIB_OPENMP_CFLAGS) |
| @end example |
| |
| Note that the macro @code{SHLIB_OPENMP_CXXFLAGS} applies to the default |
| C++11 compiler and not necessarily to the C++98/14/17 compilers: users of |
| these should do their own @command{configure} checks. If you do |
| use your own checks, make sure that OpenMP support is complete by |
| compiling and linking an OpenMP-using program: on some platforms the |
| runtime library is optional and on others that library depends on other |
| optional libraries. |
| @c For clang pre-7, libomp.so depended on libatomic. |
| |
| Some care is needed when compilers are from different families which may |
| use different OpenMP runtimes (e.g.@: @command{clang} @emph{vs} GCC |
| including @command{gfortran}, although it is often possible to use the |
| @command{clang} runtime with GCC but not @emph{vice versa}: however |
| @command{gfortran} 9.x may generate calls not in the @command{clang} |
| runtime). For a package with Fortran code using OpenMP the appropriate |
| lines are |
| |
| @example |
| PKG_FFLAGS = $(SHLIB_OPENMP_FFLAGS) |
| PKG_LIBS = $(SHLIB_OPENMP_CFLAGS) |
| @end example |
| |
| @noindent |
| as the C compiler will be used to link the package code. There are |
| platforms on which this does not work @emph{for some OpenMP-using code} |
| and installation will fail, so portable packages wanting to use Fortran |
| code with OpenMP need to test their usage for themselves. An |
| alternative for a package with only Fortran sources using OpenMP is to |
| use a file @file{src/Makefile} (and @file{src/Makefile.win}) something |
| like |
| @example |
| PKG_FFLAGS = $(SHLIB_OPENMP_FFLAGS) |
| PKG_LIBS = $(SHLIB_OPENMP_FFLAGS) $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) |
| SHLIB_LD = $(SHLIB_FCLD) |
| SHLIB_LDFLAGS = $(SHLIB_FCLDFLAGS) |
| SHLIB = pkgname$(SHLIB_EXT) |
| |
| all: $(SHLIB) |
| |
| $(SHLIB): $(OBJECTS) |
| $(SHLIB_LINK) -o $@@ $(OBJECTS) $(ALL_LIBS) |
| @end example |
| @noindent |
| As from @R{} 3.6.2 a further alternative is to use |
| @example |
| USE_FC_TO_LINK = |
| PKG_FFLAGS = $(SHLIB_OPENMP_FFLAGS) |
| PKG_LIBS = $(SHLIB_OPENMP_FFLAGS) |
| @end example |
| @noindent |
| in @file{src/Makevars} or @file{src/Makevars.win}. |
| |
| It is not portable to use OpenMP with more than one of C, C++ and |
| Fortran in a single package since it is not uncommon that the compilers |
| are of different families. |
| |
| For portability, any C/C++ code using the @code{omp_*} functions should |
| include the @file{omp.h} header: some compilers (but not all) include it |
| when OpenMP mode is switched on (e.g.@: @emph{via} flag |
| @option{-fopenmp}). |
| |
| @c http://openmp.org/wp/openmp-compilers-tools/ |
| @c clang 3.8.x reports 201307 but has full support only for 3.1 (201111) |
| @c clang 3.9.x reports 201111 but has all but offloading support of 4.0. |
| @c Debian Wheezy was on ELTS until 2019-05-31 |
| @c Centos 6 EOL 2020-11-30, defaults to gcc 4.7, but 4.8.4 is available |
| There is nothing@footnote{In most implementations the @code{_OPENMP} |
| macro has value a date which can be mapped to an OpenMP version: for |
| example, value @code{201307} is the date of version 4.0 (July |
| 2013). However this may be used to denote the latest version which is |
| partially supported, not that which is fully implemented.} to say what |
| version of OpenMP is supported: version 3.1 (and much of 4.0) is |
| supported by recent versions of the Linux, Windows and Solaris |
| platforms, but portable packages cannot assume that end users have |
| recent versions. Apple builds of @command{clang} |
| on macOS currently have no OpenMP support, but @acronym{CRAN} binary |
| packages are built with a @command{clang}-based toolchain which supports |
| OpenMP. @uref{http://www.openmp.org/@/resources/@/openmp-compilers-tools} |
| gives some idea of what compilers support what versions. |
| |
| The performance of OpenMP varies substantially between platforms. The |
| Windows implementation has substantial overheads@footnote{as did the |
| GCC-based Apple implementation, but not the Intel/LLVM OpenMP runtime |
| on macOS.}, so is only beneficial if quite substantial tasks are run in |
| parallel. Also, on Windows new threads are started with the |
| default@footnote{Windows default, not MinGW-w64 default.} FPU control |
| word, so computations done on OpenMP threads will not make use of |
| extended-precision arithmetic which is the default for the main process. |
| @c mingw64-public, 2015-02-02. |
| @c https://stackoverflow.com/questions/2553725/is-the-fpu-control-word-setting-per-thread-or-per-process |
| |
| Do not include these macros unless your code does make use of OpenMP |
| (possibly for C++ via included external headers): this can result in the |
| OpenMP runtime being linked in, threads being started, @dots{}. |
| |
| Calling any of the @R{} API from threaded code is `for experts only' and |
| strongly discouraged. Many functions in the @R{} API modify internal |
| @R{} data structures and might corrupt these data structures if called |
| simultaneously from multiple threads. Most @R{} API functions can |
| signal errors, which must only happen on the @R{} main thread. Also, |
| external libraries (e.g.@: LAPACK) may not be thread-safe. |
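| |
| One way to respect these restrictions is sketched below with a |
| hypothetical function @code{pkg_recip_all}. The threads do only plain C |
| work on plain C arrays and record any problems in a reduction variable. |
| The caller, running on the @R{} main thread, then decides whether to |
| signal an error (for example with @code{error()}) after the parallel |
| region has finished: |
| |
| @example |
| #include <math.h> |
| |
| /* Hypothetical example: y[i] = 1/x[i] computed in parallel.  No R API |
|    calls are made inside the loop; zero inputs give NaN and are |
|    counted, and the count is returned to the (main-thread) caller. */ |
| int pkg_recip_all(const double *x, double *y, int n) |
| @{ |
|     int nbad = 0; |
| #ifdef _OPENMP |
| #pragma omp parallel for reduction(+:nbad) |
| #endif |
|     for (int i = 0; i < n; i++) @{ |
|         if (x[i] == 0.0) @{ |
|             y[i] = NAN; |
|             nbad++; |
|         @} else @{ |
|             y[i] = 1.0 / x[i]; |
|         @} |
|     @} |
|     return nbad; |
| @} |
| @end example |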
| |
| Packages are not stand-alone programs, and an @R{} process could |
| contain more than one OpenMP-enabled package as well as other components |
| (for example, an optimized BLAS) making use of OpenMP. So careful |
| consideration needs to be given to resource usage. OpenMP works with |
| parallel regions, and for most implementations the default is to use as |
| many threads as `CPUs' for such regions. Parallel regions can be |
| nested, although it is common to use only a single thread below the |
| first level. Both the correctness of the detected number of `CPUs' and |
| the assumption that the @R{} process is entitled to use them all are |
| dubious. One way to limit resources is to limit the overall |
| number of threads available to OpenMP in the @R{} process: this can be |
| done @emph{via} environment variable @env{OMP_THREAD_LIMIT}, where |
| implemented.@footnote{Which it was at the time of writing with GCC, |
| Oracle, Intel and Clang compilers. The count may include the thread |
| running the main process.} Alternatively, the number of threads per |
| region can be limited by the environment variable @env{OMP_NUM_THREADS} |
| or API call @code{omp_set_num_threads}, or, better, for the regions in |
| your code as part of their specification. E.g.@: @R{} uses@footnote{Be |
| careful not to declare @code{nthreads} as @code{const int}: the Oracle |
| compiler requires it to be `an lvalue'.} |
| @example |
| #pragma omp parallel for num_threads(nthreads) @dots{} |
| @end example |
| @noindent |
| That way you only control your own code and not that of other OpenMP users. |
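| |
| The following is a minimal self-contained sketch of this pattern. The |
| name @code{pkg_sum} and its arguments are hypothetical. The thread count |
| applies only to this region, @code{nthreads} is a plain @code{int} as |
| the footnote above requires, and the pragma is guarded so that compilers |
| without OpenMP support see a plain serial loop: |
| |
| @example |
| /* Hypothetical example: sum of x[0..n-1] using at most nthreads |
|    threads, for this parallel region only. */ |
| double pkg_sum(const double *x, int n, int nthreads) |
| @{ |
|     double total = 0.0; |
|     if (nthreads < 1) nthreads = 1; |
| #ifdef _OPENMP |
| #pragma omp parallel for num_threads(nthreads) reduction(+:total) |
| #endif |
|     for (int i = 0; i < n; i++) |
|         total += x[i]; |
|     return total; |
| @} |
| @end example |
| |
| @noindent |
| With more than one thread the additions may be done in a different |
| order, so the result can differ in the last bits from a serial sum. |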
| |
| Note that setting environment variables to control OpenMP is |
| implementation-dependent and may need to be done outside the @R{} |
| process or before any use of OpenMP (which might be by another process |
| or @R{} itself). Also, implementation-specific variables such as |
| @env{KMP_THREAD_LIMIT} might take precedence. |
| |
| @node Using pthreads, Compiling in sub-directories, OpenMP support, Using Makevars |
| @subsubsection Using pthreads |
| |
| There is no direct support for POSIX threads (more commonly known as |
| @code{pthreads}): by the time we considered adding it several packages |
| were using it unconditionally so it seems that nowadays it is |
| universally available on POSIX operating systems (hence not Windows). |
| |
| For reasonably recent versions of @command{gcc} and @command{clang} the |
| correct specification is |
| |
| @example |
| PKG_CPPFLAGS = -pthread |
| PKG_LIBS = -pthread |
| @end example |
| |
| @noindent |
| (and the plural version is also accepted on some systems/versions). For |
| other platforms the specification is |
| |
| @example |
| PKG_CPPFLAGS = -D_REENTRANT |
| PKG_LIBS = -lpthread |
| @end example |
| @noindent |
| (and note that the library name is singular). This is what |
| @option{-pthread} does on all known current platforms (although earlier |
| versions of OpenBSD used a different library name). |
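| |
| For illustration, here is a minimal sketch (all names hypothetical) which |
| sums an array using one extra POSIX thread. It assumes the package is |
| compiled and linked with the @option{-pthread} flags shown above: |
| |
| @example |
| #include <pthread.h> |
| #include <stddef.h> |
| |
| /* Hypothetical work unit: one contiguous chunk of the input. */ |
| typedef struct @{ |
|     const double *x; |
|     size_t n; |
|     double sum; |
| @} pkg_chunk; |
| |
| static void *pkg_sum_chunk(void *arg) |
| @{ |
|     pkg_chunk *c = arg; |
|     double s = 0.0; |
|     for (size_t i = 0; i < c->n; i++) |
|         s += c->x[i]; |
|     c->sum = s; |
|     return NULL; |
| @} |
| |
| /* Sums the second half on a new thread and the first half on the |
|    calling thread.  Returns 0 on success (total in *result), nonzero |
|    if the thread could not be created. */ |
| int pkg_sum_pthreads(const double *x, size_t n, double *result) |
| @{ |
|     pkg_chunk lo = @{ x, n / 2, 0.0 @}; |
|     pkg_chunk hi = @{ x + n / 2, n - n / 2, 0.0 @}; |
|     pthread_t tid; |
| |
|     if (pthread_create(&tid, NULL, pkg_sum_chunk, &hi) != 0) |
|         return 1; |
|     pkg_sum_chunk(&lo); |
|     pthread_join(tid, NULL); |
|     *result = lo.sum + hi.sum; |
|     return 0; |
| @} |
| @end example |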
| |
| For a tutorial see |
| @uref{https://computing.llnl.gov/@/tutorials/@/pthreads/}. |
| |
| POSIX threads are not normally used on Windows, which has its own native |
| concepts of threads. However, there are two projects implementing |
| @code{pthreads} on top of Windows, @code{pthreads-w32} and |
| @code{winpthreads} (part of the MinGW-w64 project). |
| |
| Whether Windows toolchains implement @code{pthreads} is up to the |
| toolchain provider. A @command{make} variable |
| @code{SHLIB_PTHREAD_FLAGS} is available for use in |
| @file{src/Makevars.win}: this should be included in both |
| @code{PKG_CPPFLAGS} (or the Fortran compiler flags) and @code{PKG_LIBS}. |
| |
| The presence of a working @code{pthreads} implementation cannot be |
| unambiguously determined without testing for yourself: however, that |
| @samp{_REENTRANT} is defined@footnote{some Windows toolchains had the |
| typo @samp{_REENTRANCE} instead.} in C/C++ code is a good indication. |
| |
| Note that not all @code{pthreads} implementations are equivalent as parts |
| are optional (see |
| @uref{http://pubs.opengroup.org/@/onlinepubs/@/009695399/@/basedefs/@/pthread.h.html}): |
| for example, macOS lacks the `Barriers' option. |
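| |
| POSIX signals optional parts through constants in @file{unistd.h}. For |
| Barriers, @code{_POSIX_BARRIERS} is -1 if the option is unsupported and |
| positive if it is always supported. If it is 0 or undefined, @code{sysconf(_SC_BARRIERS)} |
| must be consulted at run time. A sketch of such a check follows (the |
| function name is hypothetical); a @command{configure} test is an |
| alternative: |
| |
| @example |
| #include <unistd.h> |
| |
| /* Hypothetical helper: 1 if the optional POSIX Barriers option |
|    (pthread_barrier_*) appears to be supported, 0 otherwise. */ |
| int pkg_have_barriers(void) |
| @{ |
| #if defined(_POSIX_BARRIERS) && _POSIX_BARRIERS > 0 |
|     return 1;                          /* always supported */ |
| #elif defined(_POSIX_BARRIERS) && _POSIX_BARRIERS < 0 |
|     return 0;                          /* not supported */ |
| #elif defined(_SC_BARRIERS) |
|     return sysconf(_SC_BARRIERS) > 0;  /* decided at run time */ |
| #else |
|     return 0; |
| #endif |
| @} |
| @end example |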
| |
| See also the comments on thread-safety and performance under OpenMP: on |
| all known @R{} platforms OpenMP is implemented @emph{via} |
| @code{pthreads} and the known performance issues are in the latter. |
| |
| @node Compiling in sub-directories, , Using pthreads, Using Makevars |
| @subsubsection Compiling in sub-directories |
| |
| Package authors fairly often want to organize code in sub-directories of |
| @file{src}, for example if they are including a separate piece of |
| external software to which this is an @R{} interface. |
| |
| One simple way is to set @code{OBJECTS} to be all the objects |
| that need to be compiled, including in sub-directories. For example, |
| @acronym{CRAN} package @CRANpkg{RSiena} has |
| |
| @smallexample |
| SOURCES = $(wildcard data/*.cpp network/*.cpp utils/*.cpp model/*.cpp model/*/*.cpp model/*/*/*.cpp) |
| |
| OBJECTS = siena07utilities.o siena07internals.o siena07setup.o siena07models.o $(SOURCES:.cpp=.o) |
| @end smallexample |
| |
| @noindent |
| One problem with that approach is that unless GNU make extensions are |
| used, the source files need to be listed and kept up-to-date, as in the |
| following from @acronym{CRAN} package @CRANpkg{lossDev}: |
| |
| @smallexample |
| OBJECTS.samplers = samplers/ExpandableArray.o samplers/Knots.o \ |
| samplers/RJumpSpline.o samplers/RJumpSplineFactory.o \ |
| samplers/RealSlicerOV.o samplers/SliceFactoryOV.o samplers/MNorm.o |
| OBJECTS.distributions = distributions/DSpline.o \ |
| distributions/DChisqrOV.o distributions/DTOV.o \ |
| distributions/DNormOV.o distributions/DUnifOV.o distributions/RScalarDist.o |
| OBJECTS.root = RJump.o |
| |
| OBJECTS = $(OBJECTS.samplers) $(OBJECTS.distributions) $(OBJECTS.root) |
| @end smallexample |
| |
| Where the subdirectory is self-contained code with a suitable makefile, |
| the best approach is something like |
| |
| @smallexample |
| PKG_LIBS = -LCsdp/lib -lsdp $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) |
| |
| $(SHLIB): Csdp/lib/libsdp.a |
| |
| Csdp/lib/libsdp.a: |
| @@(cd Csdp/lib && $(MAKE) libsdp.a \ |
| CC="$(CC)" CFLAGS="$(CFLAGS) $(CPICFLAGS)" AR="$(AR)" RANLIB="$(RANLIB)") |
| @end smallexample |
| |
| @noindent |
| Note the quotes: the macros can contain spaces, e.g.@: @code{CC = "gcc |
| -m64 -std=gnu99"}. Several authors have forgotten about parallel makes: |
| the static library in the subdirectory must be made before the shared |
| object (@code{$(SHLIB)}) and so the latter must depend on the former. |
| Others forget the need@footnote{A few OSes (AIX, IRIX, Windows) do not |
| need special flags for such code, but most do---although compilers will |
| often generate PIC code when not asked to do so.} for |
| position-independent code. |
| |
| We really do not recommend using @file{src/Makefile} instead of |
| @file{src/Makevars}, and as the example above shows, it is not |
| necessary. |
| |
| @node Configure example, Using F9x code, Using Makevars, Configure and cleanup |
| @subsection Configure example |
| |
| It may be helpful to give an extended example of using a |
| @file{configure} script to create a @file{src/Makevars} file: this is |
| based on that in the @CRANpkg{RODBC} package. |
| |
| The @file{configure.ac} file follows: @file{configure} is created from |
| this by running @command{autoconf} in the top-level package directory |
| (containing @file{configure.ac}). |
| |
| @quotation |
| @c @cartouche |
| @smallexample |
| AC_INIT([RODBC], 1.1.8) dnl package name, version |
| |
| dnl A user-specifiable option |
| odbc_mgr="" |
| AC_ARG_WITH([odbc-manager], |
| AC_HELP_STRING([--with-odbc-manager=MGR], |
| [specify the ODBC manager, e.g. odbc or iodbc]), |
| [odbc_mgr=$withval]) |
| |
| if test "$odbc_mgr" = "odbc" ; then |
| AC_PATH_PROGS(ODBC_CONFIG, odbc_config) |
| fi |
| |
| dnl Select an optional include path, from a configure option |
| dnl or from an environment variable. |
| AC_ARG_WITH([odbc-include], |
| AC_HELP_STRING([--with-odbc-include=INCLUDE_PATH], |
| [the location of ODBC header files]), |
| [odbc_include_path=$withval]) |
| RODBC_CPPFLAGS="-I." |
| if test [ -n "$odbc_include_path" ] ; then |
| RODBC_CPPFLAGS="-I. -I$@{odbc_include_path@}" |
| else |
| if test [ -n "$@{ODBC_INCLUDE@}" ] ; then |
| RODBC_CPPFLAGS="-I. -I$@{ODBC_INCLUDE@}" |
| fi |
| fi |
| |
| dnl ditto for a library path |
| AC_ARG_WITH([odbc-lib], |
| AC_HELP_STRING([--with-odbc-lib=LIB_PATH], |
| [the location of ODBC libraries]), |
| [odbc_lib_path=$withval]) |
| if test [ -n "$odbc_lib_path" ] ; then |
| LIBS="-L$odbc_lib_path $@{LIBS@}" |
| else |
| if test [ -n "$@{ODBC_LIBS@}" ] ; then |
| LIBS="-L$@{ODBC_LIBS@} $@{LIBS@}" |
| else |
| if test -n "$@{ODBC_CONFIG@}"; then |
| odbc_lib_path=`odbc_config --libs | sed s/-lodbc//` |
| LIBS="$@{odbc_lib_path@} $@{LIBS@}" |
| fi |
| fi |
| fi |
| |
| dnl Now find the compiler and compiler flags to use |
| : $@{R_HOME=`R RHOME`@} |
| if test -z "$@{R_HOME@}"; then |
| echo "could not determine R_HOME" |
| exit 1 |
| fi |
| CC=`"$@{R_HOME@}/bin/R" CMD config CC` |
| CFLAGS=`"$@{R_HOME@}/bin/R" CMD config CFLAGS` |
| CPPFLAGS=`"$@{R_HOME@}/bin/R" CMD config CPPFLAGS` |
| |
| if test -n "$@{ODBC_CONFIG@}"; then |
| RODBC_CPPFLAGS=`odbc_config --cflags` |
| fi |
| CPPFLAGS="$@{CPPFLAGS@} $@{RODBC_CPPFLAGS@}" |
| |
| dnl Check the headers can be found |
| AC_CHECK_HEADERS(sql.h sqlext.h) |
| if test "$@{ac_cv_header_sql_h@}" = no || |
| test "$@{ac_cv_header_sqlext_h@}" = no; then |
| AC_MSG_ERROR("ODBC headers sql.h and sqlext.h not found") |
| fi |
| |
| dnl search for a library containing an ODBC function |
| if test [ -n "$@{odbc_mgr@}" ] ; then |
| AC_SEARCH_LIBS(SQLTables, $@{odbc_mgr@}, , |
| AC_MSG_ERROR("ODBC driver manager $@{odbc_mgr@} not found")) |
| else |
| AC_SEARCH_LIBS(SQLTables, odbc odbc32 iodbc, , |
| AC_MSG_ERROR("no ODBC driver manager found")) |
| fi |
| |
| dnl for 64-bit ODBC need SQL[U]LEN, and it is unclear where they are defined. |
| AC_CHECK_TYPES([SQLLEN, SQLULEN], , , [# include <sql.h>]) |
| dnl for unixODBC header |
| AC_CHECK_SIZEOF(long, 4) |
| |
| dnl substitute RODBC_CPPFLAGS and LIBS |
| AC_SUBST(RODBC_CPPFLAGS) |
| AC_SUBST(LIBS) |
| AC_CONFIG_HEADERS([src/config.h]) |
| dnl and do substitution in the src/Makevars.in and src/config.h |
| AC_CONFIG_FILES([src/Makevars]) |
| AC_OUTPUT |
| @end smallexample |
| @c @end cartouche |
| @end quotation |
| |
| @noindent |
| where @file{src/Makevars.in} would be simply |
| |
| @quotation |
| @example |
| PKG_CPPFLAGS = @@RODBC_CPPFLAGS@@ |
| PKG_LIBS = @@LIBS@@ |
| @end example |
| @end quotation |
| |
| A user can then be advised to specify the location of the ODBC driver |
| manager files by options like (lines broken for easier reading) |
| |
| @example |
| R CMD INSTALL \ |
| --configure-args='--with-odbc-include=/opt/local/include \ |
| --with-odbc-lib=/opt/local/lib --with-odbc-manager=iodbc' \ |
| RODBC |
| @end example |
| |
| @noindent |
| or by setting the environment variables @code{ODBC_INCLUDE} and |
| @code{ODBC_LIBS}. |
| |
| @node Using F9x code, Using C++11 code, Configure example, Configure and cleanup |
| @subsection Using F9x code |
| |
| @R{} assumes that source files with extension @file{.f} are fixed-form |
| Fortran 90 (which includes Fortran 77), and passes them to the compiler |
| specified by macro @samp{FC}. On known platforms the Fortran compiler |
| will also accept free-form Fortran 90/95 code with extension @file{.f90} |
| or @file{.f95}, but those are not used by @R{} itself so this is not |
| required. |
| |
| @vindex PKG_FCFLAGS |
| As from @R{} 3.6.0 the same compiler is used for both fixed-form and |
| free-form Fortran code (with different file extensions and possibly |
| different flags). For both, macro @code{PKG_FFLAGS} can be used for |
| package-specific flags: in the unusual case that both are included in a |
| single package and that different flags are needed for the two forms, |
| macro @code{PKG_FCFLAGS} is also available for free-form Fortran. |
| |
| The code used to build @R{} allows a `Fortran 90' compiler to be |
| selected as @samp{FC}, so platforms might be encountered which only |
| support Fortran 90. However, Fortran 95 is widely supported. |
| |
| Some compilers specified by @samp{FC} will accept Fortran 2003, 2008 or |
| 2018 code: such code should still use file extension @file{.f90} or |
| @file{.f95}. Most platforms use @command{gfortran} where you may need |
| to include @option{-std=f2003}, @option{-std=f2008} or (from version 8) |
| @option{-std=f2018} in @code{PKG_FFLAGS} or @code{PKG_FCFLAGS}: the |
| default is `GNU Fortran', Fortran 95 with non-standard extensions. The |
| Oracle @command{f95} compiler `accepts some Fortran 2003/8 features' |
| (search for `Oracle Developer Studio 12.6: Fortran User's Guide' and |
| look for §4.6). Intel Fortran has full Fortran 2008 support from |
| version 17.0, and some 2018 support in version 16.0 and more in version |
| 19.0. |
| |
| Modern versions of Fortran support modules, whereby compiling one source |
| file creates a module file which is then included in others. (Module |
| files typically have a @file{.mod} extension: they do depend on the |
| compiler used and so should never be included in a package.) This |
| creates a dependence which @command{make} will not know about and often |
| causes installation with a parallel make to fail. Thus it is necessary |
| to add explicit dependencies to @file{src/Makevars} to tell |
| @command{make} the constraints on the order of compilation. For |
| example, if file @file{iface.f90} creates a module @samp{iface} used by |
| files @file{cmi.f90} and @file{dmi.f90} then @file{src/Makevars} needs |
| to contain something like |
| |
| @example |
| cmi.o dmi.o: iface.o |
| @end example |
| |
| @noindent |
| Note that it is not portable (although some platforms do accept it) to |
| define a module of the same name in multiple source files. |
| @c As was done by frailtypack in 2018-12: gfortran accepted this, ODS |
| @c on Solaris did not. |
| |
| @node Using C++11 code, Using C++14 code, Using F9x code, Configure and cleanup |
| @subsection Using C++11 code |
| |
| @R{} can be built without a C++ compiler although one is available (but |
| not necessarily installed) on all known @R{} platforms. For full |
| portability across platforms, all that can be assumed is approximate |
| support for the C++98 standard (the widely used @command{g++} deviates |
| considerably from the standard). Some compilers have a concept of |
| `C++03' (`essentially a bug fix') or `C++ Technical Report 1' (TR1), an |
| optional addition to the `C++03' revision which was published in 2007. |
| A revised standard was published in 2011 and compilers with pretty much |
| complete implementations are available. C++11 added all of the C99 |
| features which are not otherwise implemented in C++, and C++ compilers |
| commonly accept C99 extensions to C++98. A minor update@footnote{The |
| changes are linked from |
| @uref{https://isocpp.org/@/std/@/standing-documents/@/sd-6-sg10-feature-test-recommendations}.} |
| (`C++14') was published in December 2014. A revision (`C++17') was |
| published in December 2017, and a further revision (`C++20', with many |
| new features) is scheduled for publication in May 2020. |
| |
| What standard a C++ compiler aims to support can be hard to determine: |
| the value@footnote{Values @code{199711}, @code{201103L} and |
| @code{201402L} are most commonly used for C++98, C++11 and C++14 |
| respectively, but some compilers set @code{1L}. The official value for |
| C++17 is @code{201703L} but some compiler versions have smaller values |
| so a test of @code{__cplusplus > 201402L} is safer.} of |
| @code{__cplusplus} may help but some compilers use it to denote a |
| standard which is partially supported and some the latest standard which |
| is (almost) fully supported. |
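| |
| As a rough illustration, the following C++ sketch maps a value of |
| @code{__cplusplus} to the standard it most probably denotes. It uses the |
| values listed in the footnote above and the safer `greater than |
| @code{201402L}' test for C++17. The function names are hypothetical, |
| and given the caveats above the result is only a hint. |
| |

```cpp
// Hypothetical helpers (not part of R): guess which C++ standard a value
// of __cplusplus most probably denotes, returned as the year of the standard.
// Intermediate values (partial support) map to the older standard, and
// values such as 1L, which some compilers use, give 0 ("unknown").
inline int likely_standard_year(long value)
{
    if (value > 201402L) return 2017;   // C++17 or later (official value 201703L)
    if (value >= 201402L) return 2014;  // C++14
    if (value >= 201103L) return 2011;  // C++11
    if (value >= 199711L) return 1998;  // C++98 and C++03 share 199711L
    return 0;                           // unknown
}

// The guess for the compiler mode this file is actually compiled in.
inline int compiled_standard_year()
{
    return likely_standard_year(__cplusplus);
}
```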
| |
| The webpage |
| @uref{http://en.cppreference.com/@/w/@/cpp/@/compiler_support} gives |
| some information on which compilers are known to support recent C++ |
| features. @command{g++} claims full C++11 support from version 4.8.1. |
| |
| As from version 3.6.2@footnote{version 3.6.0 except on Windows.}, @R{} |
| selects a default C++ compiler with options that conform as far as |
| possible@footnote{allowing `GNU extensions', but if possible excluding |
| C++14 features.} to C++11. Packages using C++11 features which do not |
| specify @samp{R (>= 3.6.2)} in their @file{DESCRIPTION} files need to |
| request C++11 explicitly, as described in the rest of this section. |
| |
| In order to specify C++11 code in a package to be used with @R{} |
| versions from 3.1.0 but before 3.6.2, the package's @file{Makevars} file |
| (or @file{Makevars.win} on Windows) should include the line |
| |
| @example |
| CXX_STD = CXX11 |
| @end example |
| @noindent |
| Compilation and linking will then be done with the C++11 compiler (if any). |
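| |
| A source file which relies on C++11 can also state the requirement |
| itself. Compilation in an older mode then stops with a clear message |
| rather than with obscure syntax errors. The following is a sketch only: |
| the function @code{sum_of_squares} is hypothetical, and the test would |
| wrongly reject a compiler which sets @code{__cplusplus} to @code{1L}. |
| |

```cpp
// Stop early, with a clear message, if not compiled in C++11 (or later) mode.
#if __cplusplus < 201103L
#error "C++11 is required: add CXX_STD = CXX11 to src/Makevars"
#endif

#include <numeric>
#include <vector>

// Hypothetical helper using C++11 features (auto and a lambda).
inline double sum_of_squares(const std::vector<double>& x)
{
    auto add_square = [](double acc, double xi) { return acc + xi * xi; };
    return std::accumulate(x.begin(), x.end(), 0.0, add_square);
}
```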
| |
| Packages without a @file{src/Makevars} or @file{src/Makefile} file may |
| specify that they require C++11 for code in the @file{src} directory by |
| including @samp{C++11} in the @samp{SystemRequirements} field of the |
| @file{DESCRIPTION} file, e.g. |
| |
| @example |
| SystemRequirements: C++11 |
| @end example |
| |
| If a package does have a @file{src/Makevars[.win]} file then setting the |
| make variable @samp{CXX_STD} is preferred, as it allows @command{R CMD |
| SHLIB} to work correctly in the package's @file{src} directory. |
| |
| Conversely, to ensure that the C++98 standard is assumed even when this |
| is not the compiler default, use |
| |
| @example |
| SystemRequirements: C++98 |
| @end example |
| @noindent |
| or |
| @example |
| CXX_STD = CXX98 |
| @end example |
| @noindent |
| Note, however, that this is deprecated and will be ignored in future |
| versions of @R{}. |
| |
| The C++11 compiler will be used systematically by R for all C++ code |
| if the environment variable @env{USE_CXX11} is defined (with any |
| value). Hence this environment variable should be defined when invoking |
| @command{R CMD SHLIB} in the absence of a @file{Makevars} file (or |
| @file{Makevars.win} on Windows) if a C++11 compiler is required. |
| |
| Further control over compilation of C++11 code can be obtained by |
| specifying the macros @samp{CXX11} and @samp{CXX11STD} when @R{} is |
| configured@footnote{For |
| details of these and related macros, see file @file{config.site} in |
| the @R{} sources.}, or in a personal or site @file{Makevars} file. |
| @ifset UseExternalXrefs |
| @xref{Customizing package compilation, , Customizing package compilation, |
| R-admin, R Installation and Administration}. |
| @end ifset |
| If C++11 support is not available then these macros are both empty; if |
| it is available by default, @samp{CXX11} defaults to @samp{CXX} and |
| @samp{CXX11STD} is empty. Otherwise, @samp{CXX11} defaults to the same |
| value as the C++ compiler @samp{CXX} and the flag @samp{CXX11STD} |
| defaults to @option{-std=c++11} or similar. It is possible to specify |
| @samp{CXX11} to be a distinct compiler just for C++11-using packages, |
| e.g.@: @command{g++} on Solaris. Note however that different C++ |
| compilers (and even different versions of the same compiler) often |
| differ in their ABI so their outputs can rarely be mixed. By setting |
| @samp{CXX11STD} it is also possible to choose a different dialect of the |
| standard such as @option{-std=gnu++11}. |
| |
| As noted above, support for C++11 varies across platforms: on some |
| platforms, it may be possible or necessary to select a different |
| compiler for C++11, @emph{via} personal or site @file{Makevars} files. |
| |
| There is no guarantee that C++11 can be used in a package in combination |
| with any other compiled language (even C), as the C++11 compiler may be |
| incompatible with the native compilers for the platform. |
| |
| If a package using C++11 has a @command{configure} script it is |
| essential that it selects the correct compiler, @emph{via} something like |
| |
| @example |
| CXX11=`"$@{R_HOME@}/bin/R" CMD config CXX11` |
| if test -z "$CXX11"; then |
| AC_MSG_ERROR([No C++11 compiler is available]) |
| fi |
| CXX11STD=`"$@{R_HOME@}/bin/R" CMD config CXX11STD` |
| CXX="$@{CXX11@} $@{CXX11STD@}" |
| CXXFLAGS=`"$@{R_HOME@}/bin/R" CMD config CXX11FLAGS` |
| AC_LANG(C++) |
| @end example |
| |
| @noindent |
| (paying attention to all the quotes required). |
| |
| If you want to compile C++11 code in a subdirectory, make sure you pass |
| down the macros to specify that compiler, e.g.@: in @file{src/Makevars} |
| @example |
| sublibs: |
| @@(cd libs && $(MAKE) \ |
| CXX="$(CXX11) $(CXX11STD)" CXXFLAGS="$(CXX11FLAGS) $(CXX11PICFLAGS)") |
| @end example |
| |
| Note that the mechanisms described here specify C++11 for code compiled |
| by @command{R CMD SHLIB} as used by default by @command{R CMD INSTALL}. |
| They do not necessarily apply if there is a @file{src/Makefile} file, |
| nor to compilation done in vignettes or @emph{via} other packages. |
| |
| @node Using C++14 code, Using C++17 code, Using C++11 code, Configure and cleanup |
| @subsection Using C++14 code |
| |
| Support for a C++14 compiler (where available) was added to @R{} in |
| version 3.4.0. Similar considerations to C++11 apply, with the |
| variables associated with the C++14 compiler using the prefix |
| @samp{CXX14} instead of @samp{CXX11}. Hence to use C++14 code in a |
| package, the package's @file{Makevars} file (or @file{Makevars.win} on |
| Windows) should include the line |
| @example |
| CXX_STD = CXX14 |
| @end example |
| |
| In the absence of a @file{Makevars} file, C++14 support can also be |
| requested by the line: |
| @example |
| SystemRequirements: C++14 |
| @end example |
| @noindent |
| in the @file{DESCRIPTION} file. Finally, the C++14 compiler can be |
| used systematically by setting the environment variable @env{USE_CXX14}. |
| |
| Note that code written for C++11 that emulates features of C++14 will |
| not necessarily compile under a C++14 compiler@footnote{As from @R{} |
| 3.4.0, @command{configure} attempts to supply a C++14 compiler only if |
| explicitly requested. However, earlier versions of @R{} will use the |
| default C++14 mode of @command{g++} 6 and later.}, since the emulation |
| typically leads to a namespace clash. In order to ensure that the code |
| also compiles under C++14, something like the following should be |
| done: |
| @example |
| #if __cplusplus >= 201402L |
| using std::make_unique; |
| #else |
| // your emulation |
| #endif |
| @end example |
| |
| @noindent |
| Code needing C++14 features would do better to test for their presence |
| @emph{via} `SD-6 feature tests'@footnote{See |
| @uref{https://isocpp.org/@/std/@/standing-documents/@/sd-6-sg10-feature-test-recommendations} |
| or |
| @uref{http://en.cppreference.com/@/w/@/cpp/@/experimental/@/feature_test}. |
| It seems a reasonable assumption that any compiler promising some C++14 |
| conformance will provide these---e.g.@: @command{g++} 4.9.x did but |
| 4.8.5 did not.}. That test could be |
| |
| @example |
| #include <memory> // header where this is defined |
| #if defined(__cpp_lib_make_unique) && (__cpp_lib_make_unique >= 201304) |
| using std::make_unique; |
| #else |
| // your emulation |
| #endif |
| @end example |
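| |
| To make the `your emulation' branch concrete, here is a minimal sketch |
| assuming a hypothetical package namespace @code{mypkg}. The emulation |
| is defined in the package's own namespace, never in @code{std}, and only |
| when the library does not supply @code{std::make_unique}. It covers |
| single objects but not arrays, so it is not a complete replacement. |
| |

```cpp
#include <memory>   // std::unique_ptr, and std::make_unique in C++14
#include <utility>  // std::forward

namespace mypkg {   // hypothetical package namespace

#if defined(__cpp_lib_make_unique) && (__cpp_lib_make_unique >= 201304)
using std::make_unique;
#else
// C++11 emulation for non-array types only.
template <typename T, typename... Args>
std::unique_ptr<T> make_unique(Args&&... args)
{
    return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}
#endif

// Callers use mypkg::make_unique whichever branch was compiled.
inline std::unique_ptr<int> make_counter(int start)
{
    return mypkg::make_unique<int>(start);
}

} // namespace mypkg
```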
| |
| Note that @command{g++} 4.9.x (as used for @R{} on Windows at |
| least up to 3.6.x) has only partial C++14 support, and the flag to |
| obtain that support is not included in the default Windows build of @R{} |
| --- one could try something like |
| |
| @example |
| CXX14="$(BINPREF)g++ $(M_ARCH)" |
| CXX14FLAGS="-O2 -Wall" |
| CXX14STD=-std=gnu1y |
| @end example |
| |
| @noindent |
| in @file{@var{HOME}/.R/Makevars.win}. |
| |
| |
| @node Using C++17 code, , Using C++14 code, Configure and cleanup |
| @subsection Using C++17 code |
| |
| Support for C++17 was added to @R{} version 3.4.0. The @file{configure} |
| script tests a subset of C++17 features. @code{clang 4.0.0} and |
| @code{gcc 7.1} and later versions passed these tests (with flag |
| @option{-std=gnu++17} or @option{-std=gnu++1z} chosen by the |
| @file{configure} script). Note that the C++17 feature tests are |
| incomplete and are subject to change in future @R{} versions as |
| support for the standard improves. |
| @c Both the compiler and the library are involved, conceivably also the |
| @c OS headers. |
| |
| The variables associated with the C++17 compiler use the prefix |
| @samp{CXX17}. Hence to use C++17 code in a package, the package's |
| @file{Makevars} file (or @file{Makevars.win} on Windows) should |
| include the line |
| @example |
| CXX_STD = CXX17 |
| @end example |
| |
| In the absence of a @file{Makevars} file, C++17 support can also be |
| requested by the line: |
| @example |
| SystemRequirements: C++17 |
| @end example |
| @noindent |
| in the @file{DESCRIPTION} file. Finally, the C++17 compiler can be |
| used systematically by setting the environment variable @env{USE_CXX17}. |
| |
| As for C++14, feature tests can be used (and probably should be, as |
| support, especially library support, is still patchy). |
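| |
| For example, a package might use @code{std::optional} where the library |
| provides it and a plain sentinel value otherwise. The sketch below |
| assumes the SD-6 macro @code{__cpp_lib_optional} (@code{201606L} for |
| C++17). It also assumes the @code{__has_include} extension, which some |
| compilers provide before C++17. The macro @code{MYPKG_HAVE_OPTIONAL} and |
| the function @code{parse_digit} are hypothetical. |
| |

```cpp
// Include <optional> only if the compiler can tell us it exists.
#if defined(__has_include)
#  if __has_include(<optional>)
#    include <optional>
#  endif
#endif

// The feature-test macro is defined only when the library actually
// provides std::optional in the current language mode.
#if defined(__cpp_lib_optional) && (__cpp_lib_optional >= 201606L)
#  define MYPKG_HAVE_OPTIONAL 1
#else
#  define MYPKG_HAVE_OPTIONAL 0
#endif

// Hypothetical helper: value of a character '0'-'9', or -1 otherwise.
inline int parse_digit(char c)
{
#if MYPKG_HAVE_OPTIONAL
    auto to_digit = [](char ch) -> std::optional<int> {
        if (ch >= '0' && ch <= '9') return ch - '0';
        return std::nullopt;
    };
    return to_digit(c).value_or(-1);
#else
    return (c >= '0' && c <= '9') ? c - '0' : -1;
#endif
}
```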
| |
| No C++17 support is enabled in the current default build of @R{} on |
| Windows. |
| |
| @node Checking and building packages, Writing package vignettes, Configure and cleanup, Creating R packages |
| @section Checking and building packages |
| |
| Before using these tools, please check that your package can be |
| installed (which checks that it can be loaded). @code{R CMD check} will |
| @emph{inter alia} do this, but you may get more detailed error messages |
| doing the install directly. |
| |
| @menu |
| * Checking packages:: |
| * Building package tarballs:: |
| * Building binary packages:: |
| @end menu |
| |
| If your package specifies an encoding in its @file{DESCRIPTION} file, |
| you should run these tools in a locale which makes use of that encoding: |
| they may not work at all or may work incorrectly in other locales |
| (although UTF-8 locales will most likely work). |
| |
| @quotation Note |
| @code{R CMD check} and @code{R CMD build} run @R{} processes with |
| @option{--vanilla} in which none of the user's startup files are read. |
| If you need @env{R_LIBS} set (to find packages in a non-standard |
| library) you can set it in the environment: also you can use the check |
| and build environment files (as specified by the environment variables |
| @env{R_CHECK_ENVIRON} and @env{R_BUILD_ENVIRON}; if unset, |
| files@footnote{On systems which use sub-architectures, |
| architecture-specific versions such as @file{~/.R/check.Renviron.i386} |
| take precedence.} @file{~/.R/check.Renviron} and |
| @file{~/.R/build.Renviron} are used) to set environment variables when |
| using these utilities. |
| @end quotation |
| |
| @quotation Note to Windows users |
| @code{R CMD build} may make use of the Windows toolset (see the ``R |
| Installation and Administration'' manual) if present and in your path, |
| and it is required for packages which need it to install (including |
| those with @file{configure.win} or @file{cleanup.win} scripts or a |
| @file{src} directory) and e.g.@: need vignettes built. |
| |
| You may need to set the environment variable @env{TMPDIR} to point to a |
| suitable writable directory with a path not containing spaces -- use |
| forward slashes for the separators. Also, the directory needs to be on |
| a case-honouring file system (some network-mounted file systems are |
| not). |
| @end quotation |
| |
| |
| @node Checking packages, Building package tarballs, Checking and building packages, Checking and building packages |
| @subsection Checking packages |
| @cindex Checking packages |
| |
| @findex R CMD check |
| Using @code{R CMD check}, the @R{} package checker, one can test whether |
| @emph{source} @R{} packages work correctly. It can be run on one or |
| more directories, or compressed package @command{tar} archives with |
| extension @file{.tar.gz}, @file{.tgz}, @file{.tar.bz2} or |
| @file{.tar.xz}. |
| |
| It is strongly recommended that the final checks are run on a |
| @command{tar} archive prepared by @command{R CMD build}. |
| |
| This runs a series of checks, including |
| |
| @enumerate |
| @item |
| The package is installed. This will warn about missing cross-references |
| and duplicate aliases in help files. |
| |
| @item |
| The file names are checked to be valid across file systems and supported |
| operating system platforms. |
| |
| @item |
| The files and directories are checked for sufficient permissions |
| (Unix-alikes only). |
| |
| @item |
| The files are checked for binary executables, using a suitable version |
| of @command{file} if available@footnote{A suitable @command{file.exe} is |
| part of the Windows toolset: it checks for @command{gfile} if a suitable |
| @command{file} is not found: the latter is available in the OpenCSW |
| collection for Solaris at @uref{http://www.opencsw.org}. The source |
| repository is @uref{ftp://ftp.astron.com/pub/file/}.}. (There may be |
| rare false positives.) |
| |
| @item |
| The @file{DESCRIPTION} file is checked for completeness, and some of its |
| entries for correctness. Unless installation tests are skipped, |
| checking is aborted if the package dependencies cannot be resolved at |
| run time. (You may need to set @env{R_LIBS} in the environment if |
| dependent packages are in a separate library tree.) One check is that |
| the package name is not that of a standard package, nor one of the |
| defunct standard packages (@samp{ctest}, @samp{eda}, @samp{lqs}, |
| @samp{mle}, @samp{modreg}, @samp{mva}, @samp{nls}, @samp{stepfun} and |
| @samp{ts}). Another check is that all packages mentioned in |
| @code{library} or @code{require} calls or in @file{NAMESPACE} imports, |
| or used @emph{via} @code{::} or @code{:::}, are listed (in |
| @samp{Depends}, @samp{Imports}, @samp{Suggests}): this is not an |
| exhaustive check of the actual imports. |
| |
| @item |
| Available index information (in particular, for demos and vignettes) is |
| checked for completeness. |
| |
| @item |
| The package subdirectories are checked for suitable file names and for |
| not being empty. The checks on file names are controlled by the option |
| @option{--check-subdirs=@var{value}}. This defaults to @samp{default}, |
| which runs the checks only if checking a tarball: the default can be |
| overridden by specifying the value as @samp{yes} or @samp{no}. Further, |
| the check on the @file{src} directory is only run if the package |
| does not contain a @file{configure} script (which corresponds to the |
| value @samp{yes-maybe}) and there is no @file{src/Makefile} or |
| @file{src/Makefile.in}. |
| |
| To allow a @file{configure} script to generate suitable files, files |
| ending in @samp{.in} will be allowed in the @file{R} directory. |
| |
| A warning is given for directory names that look like @R{} package check |
| directories -- many packages have been submitted to @acronym{CRAN} |
| containing these. |
| |
| @item |
| The @R{} files are checked for syntax errors. Bytes which are |
| non-@acronym{ASCII} are reported as warnings, but these should be |
| regarded as errors unless it is known that the package will always be |
| used in the same locale. |
| |
| @item |
| It is checked that the package can be loaded, first with the usual |
| default packages and then only with package @pkg{base} already |
| loaded. It is also checked that the namespace can be loaded in an empty |
| session with only the @pkg{base} namespace loaded. (Namespaces and |
| packages can be loaded very early in the session, before the default |
| packages are available, so packages should work then.) |
| |
| @item |
| The @R{} files are checked for correct calls to @code{library.dynam}. |
| Package startup functions are checked for correct argument lists and |
| (incorrect) calls to functions which modify the search path or |
| inappropriately generate messages. The @R{} code is checked for |
| possible problems using @CRANpkg{codetools}. In addition, it is checked |
| whether S3 methods have all arguments of the corresponding generic, and |
| whether the final argument of replacement functions is called |
| @samp{value}. All foreign function calls (@code{.C}, @code{.Fortran}, |
| @code{.Call} and @code{.External} calls) are tested to see if they have |
| a @code{PACKAGE} argument, and if not, whether the appropriate DLL might |
| be deduced from the namespace of the package. Any other calls are |
| reported. (The check is generous, and users may want to supplement this |
| by examining the output of @code{tools::checkFF("mypkg", verbose=TRUE)}, |
| especially if the intention is always to use a @code{PACKAGE} |
| argument.) |
| |
| @item |
| The @file{Rd} files are checked for correct syntax and metadata, |
| including the presence of the mandatory fields (@code{\name}, @code{\alias}, |
| @code{\title} and @code{\description}). The @file{Rd} name and |
| title are checked for being non-empty, and there is a check for missing |
| cross-references (links). |
| |
| @item |
| A check is made for missing documentation entries, such as undocumented |
| user-level objects in the package. |
| |
| @item |
| Documentation for functions, data sets, and S4 classes is checked for |
| consistency with the corresponding code. |
| |
| @item |
| It is checked whether all function arguments given in @code{\usage} |
| sections of @file{Rd} files are documented in the corresponding |
| @code{\arguments} section. |
| |
| @item |
| The @file{data} directory is checked for non-@acronym{ASCII} characters |
| and for the use of reasonable levels of compression. |
| |
| @item |
| C, C++ and Fortran source and header files@footnote{An exception is made |
| for subdirectories with names starting @samp{win} or @samp{Win}.} are |
| tested for portable (LF-only) line endings. If there is a |
| @file{Makefile} or @file{Makefile.in} or @file{Makevars} or |
| @file{Makevars.in} file under the @file{src} directory, it is checked |
| for portable line endings and the correct use of @samp{$(BLAS_LIBS)} and |
| @samp{$(LAPACK_LIBS)}. |
| |
| Compiled code is checked for symbols corresponding to functions which |
| might terminate @R{} or write to @file{stdout}/@file{stderr} instead of |
| the console. Note that the latter might give false positives in that |
| the symbols might be pulled in with external libraries and could never |
| be called. Windows@footnote{on most other platforms such runtime |
| libraries are dynamic, but static libraries are currently used on |
| Windows because the toolchain is not a standard part of the OS.} users |
| should note that the Fortran and C++ runtime libraries are examples of |
| such external libraries. |
| |
| @item |
| Some checks are made of the contents of the @file{inst/doc} directory. |
| These always include checking for files that look like leftovers, and if |
| suitable tools (such as @command{qpdf}) are available, checking that the |
| PDF documentation is of minimal size. |
| |
| @item |
| The examples provided by the package's documentation are run. |
| (@pxref{Writing R documentation files}, for information on using |
| @code{\examples} to create executable example code.) If there is a file |
| @file{tests/Examples/@var{pkg}-Ex.Rout.save}, the output of running the |
| examples is compared to that file. |
| |
| Of course, released packages should be able to run at least their own |
| examples. Each example is run in a `clean' environment (so earlier |
| examples cannot be assumed to have been run), and with the variables |
| @code{T} and @code{F} redefined to generate an error unless they are set |
| in the example: @xref{Logical vectors, , Logical vectors, R-intro, An |
| Introduction to R}. |
| |
| @item |
| If the package sources contain a @file{tests} directory then the tests |
| specified in that directory are run. (Typically they will consist of a |
| set of @file{.R} source files and target output files |
| @file{.Rout.save}.) Please note that the comparison will be done in the |
| end user's locale, so the target output files should be @acronym{ASCII} |
| if at all possible. (The command line option @code{--test-dir=foo} may |
| be used to specify tests in a non-standard location. For example, |
| unusually slow tests could be placed in @file{inst/slowTests} and then |
| @code{R CMD check --test-dir=inst/slowTests} would be used to run them. |
| Other names that have been suggested are, for example, |
| @file{inst/testWithOracle} for tests that require Oracle to be installed, |
| @file{inst/randomTests} for tests which use random values and may |
| occasionally fail by chance, etc.) |
| |
| @item |
| The code in package vignettes (@pxref{Writing package vignettes}) is |
| executed, and the vignette PDFs re-made from their sources as a check of |
| completeness of the sources (unless there is a @samp{BuildVignettes} |
| field in the package's @file{DESCRIPTION} file with a false value). If |
| there is a target output file @file{.Rout.save} in the vignette source |
| directory, the output from running the code in that vignette is compared |
| with the target output file and any differences are reported (but not |
| recorded in the log file). (If the vignette sources are in the |
| deprecated location @file{inst/doc}, do mark such target output files to |
| not be installed in @file{.Rinstignore}.) |
| |
| If there is an error@footnote{or if option @option{--use-valgrind} is |
| used or environment variable @env{_R_CHECK_ALWAYS_LOG_VIGNETTE_OUTPUT_} |
| is set to a true value or if there are differences from a target output |
| file} in executing the @R{} code in vignette @file{@var{foo.ext}}, a log |
| file @file{@var{foo.ext}.log} is created in the check directory. The |
| vignette PDFs are re-made in a copy of the package sources in the |
| @file{vign_test} subdirectory of the check directory, so for further |
| information on errors look in directory |
| @file{@var{pkgname}/vign_test/vignettes}. (It is only retained if there |
| are errors or if environment variable @env{_R_CHECK_CLEAN_VIGN_TEST_} is |
| set to a false value.) |
| |
| @item |
| The PDF version of the package's manual is created (to check that the |
| @file{Rd} files can be converted successfully). This needs @LaTeX{} and |
| suitable fonts and @LaTeX{} packages to be installed. |
| @ifset UseExternalXrefs |
| @xref{Making the manuals, , Making the manuals, |
| R-admin, R Installation and Administration}. |
| @end ifset |
| @ifclear UseExternalXrefs |
| See the section `Making the manuals' in the `R Installation and |
| Administration' manual for further details. |
| @end ifclear |
| |
| @end enumerate |
| |
| All these tests are run with collation set to the @code{C} locale, and |
| for the examples and tests with environment variable @env{LANGUAGE=en}: |
| this is to minimize differences between platforms. |
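| |
| In the @code{C} locale strings are compared byte by byte, so in ASCII |
| all upper-case letters sort before all lower-case ones. Test output |
| containing sorted strings can therefore differ between the @code{C} |
| locale and, say, an English locale. The following small C++ sketch |
| shows @code{C} collation using the C library's @code{setlocale} and |
| @code{strcoll}; the function name is hypothetical. |
| |

```cpp
#include <algorithm>
#include <clocale>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: sort strings using the collation rules of the "C"
// locale.  There strcoll() behaves like strcmp(), so "B" sorts before "a".
// Note that setlocale() changes the collation locale of the whole process.
std::vector<std::string> sort_in_c_locale(std::vector<std::string> v)
{
    std::setlocale(LC_COLLATE, "C");
    std::sort(v.begin(), v.end(),
              [](const std::string& a, const std::string& b) {
                  return std::strcoll(a.c_str(), b.c_str()) < 0;
              });
    return v;
}
```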
| |
| Use @kbd{R CMD check --help} to obtain more information about the usage |
| of the @R{} package checker. A subset of the checking steps can be |
| selected by adding command-line options. It also allows customization by |
| setting environment variables @w{@env{_R_CHECK_*_}} as described in |
| @ifset UseExternalXrefs |
| @ref{Tools, , Tools, R-ints, R Internals}: |
| @end ifset |
| @ifclear UseExternalXrefs |
| `R Internals': |
| @end ifclear |
| a set of these customizations similar to those used by @acronym{CRAN} |
| can be selected by the option @option{--as-cran} (which works best if |
| Internet access is available). Some Windows users may |
| need to set environment variable @env{R_WIN_NO_JUNCTIONS} to a non-empty |
| value. The test of cyclic declarations@footnote{For example, in early |
| 2014 @CRANpkg{gdata} declared @samp{Imports: gtools} and @CRANpkg{gtools} |
| declared @samp{Imports: gdata}.} in @file{DESCRIPTION} files needs |
| repositories (including @acronym{CRAN}) to be set: do this in |
| @file{~/.Rprofile}, e.g.@: by |
| @example |
| options(repos = c(CRAN="https://cran.r-project.org")) |
| @end example |
| |
| One check customization which can be revealing is |
| @example |
| _R_CHECK_CODETOOLS_PROFILE_="suppressLocalUnused=FALSE" |
| @end example |
| @noindent |
| which reports unused local assignments. Not only does this point out |
| computations which are unnecessary because their results are unused, it |
| can also uncover errors. (Two such errors are intending to update an |
| object by assigning a value but mistyping its name, and assigning in the |
| wrong scope, for example using @code{<-} where @code{<<-} was intended.) This can |
| give false positives, most commonly because of non-standard evaluation |
| for formulae and because the intention is to return objects in the |
| environment of a function for later use. |
| |
| Complete checking of a package which contains a file @file{README.md} |
| needs a reasonably current version of @command{pandoc} installed: see |
| @uref{http://johnmacfarlane.net/@/pandoc/@/installing.html}. |
| |
| You do need to ensure that the package is checked in a suitable locale |
| if it contains non-@acronym{ASCII} characters. Such packages are likely |
| to fail some of the checks in a @code{C} locale, and @command{R CMD |
| check} will warn if it spots the problem. You should be able to check |
| any package in a UTF-8 locale (if one is available). Beware that |
| although a @code{C} locale is rarely used at a console, it may be the |
| default if logging in remotely or for batch jobs. |
| |
| @quotation Multiple sub-architectures |
| On systems which support multiple sub-architectures (principally |
| Windows), @command{R CMD check} will install and check a package which |
| contains compiled code under all available sub-architectures. (Use |
| option @option{--force-multiarch} to force this for packages without |
| compiled code, which are otherwise only checked under the main |
| sub-architecture.) This will run the loading tests, examples and |
| @file{tests} directory under each installed sub-architecture in turn, |
| and give an error if any fail. Where environment variables (including |
| perhaps @env{PATH}) need to be set differently for each |
| sub-architecture, these can be set in architecture-specific files such |
| as @file{@var{R_HOME}/etc/i386/Renviron.site}. |
| |
| An alternative approach is to use @command{R CMD check --no-multiarch} |
| to check the primary sub-architecture, and then to use something like |
| @command{R --arch=x86_64 CMD check --extra-arch} or (Windows) |
| @command{/path/to/R/bin/x64/Rcmd check --extra-arch} to run for each |
| additional sub-architecture just the checks@footnote{loading, examples, |
| tests, running vignette code} which differ by sub-architecture. (This |
| approach is required for packages which are installed by @command{R CMD |
| INSTALL --merge-multiarch}.) |
| |
| Where packages need additional commands to install all the |
| sub-architectures these can be supplied by e.g.@: |
| @option{--install-args=--force-biarch}. |
| |
| @end quotation |
| |
| |
| @node Building package tarballs, Building binary packages, Checking packages, Checking and building packages |
| @subsection Building package tarballs |
| @cindex Building source packages |
| |
| @findex R CMD build |
| @cindex Package builder |
| @cindex tarballs |
| Packages may be distributed in source form as ``tarballs'' |
| (@file{.tar.gz} files) or in binary form. The source form can be |
| installed on all platforms with suitable tools and is the usual form for |
| Unix-like systems; the binary form is platform-specific, and is the more |
| common distribution form for the Windows and macOS platforms. |
| |
| Using @command{R CMD build}, the @R{} package builder, one can build |
| @R{} package tarballs from their sources (for example, for subsequent |
| release). It is recommended that packages are built for release by the |
| current release version of @R{} or @samp{r-patched}, to avoid |
| inadvertently picking up new features of a development version of @R{}. |
| |
| Prior to actually building the package in the standard gzipped tar file |
| format, a few diagnostic checks and cleanups are performed. In |
| particular, it is tested whether object indices exist and can be assumed |
| to be up-to-date, and C, C++ and Fortran source files and relevant |
| makefiles in a @file{src} directory are tested and converted to LF |
| line-endings if necessary. |
| |
| Run-time checks whether the package works correctly should be performed |
| using @command{R CMD check} prior to invoking the final build procedure. |
| |
| @cindex .Rbuildignore file |
| To exclude files from being put into the package, one can specify a list |
| of exclude patterns in file @file{.Rbuildignore} in the top-level source |
| directory. These patterns should be Perl-like regular expressions (see |
| the help for @code{regexp} in @R{} for the precise details), one per |
| line, to be matched case-insensitively against the file and directory |
| names relative to the top-level package source directory. In addition, |
| directories from source control systems@footnote{called @file{CVS} or |
| @file{.svn} or @file{.arch-ids} or @file{.bzr} or @file{.git} (but not |
| files called @file{.git}) or @file{.hg}.} or from |
| @command{eclipse}@footnote{called @file{.metadata}.}, directories with |
| names ending @file{.Rcheck} or @file{Old} or @file{old} and files |
| @file{GNUMakefile}@footnote{which is an error: GNU make uses |
| @file{GNUmakefile}.}, @file{Read-and-delete-me} or with base names |
| starting with @samp{.#}, or starting and ending with @samp{#}, or ending |
| in @samp{~}, @samp{.bak} or @samp{.swp}, are excluded by default. In |
| addition, those files in the @file{R}, @file{demo} and @file{man} |
| directories which are flagged by @command{R CMD check} as having invalid |
| names will be excluded. |
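| |
| Each pattern is searched for anywhere in the relative path, so patterns |
| normally need @samp{^} and @samp{$} anchors. For example, the pattern |
| @samp{data-raw} would also exclude a file such as @file{R/data-raw.R}. |
| The following C++ sketch approximates this matching with |
| case-insensitive @code{std::regex_search}. ECMAScript regular |
| expressions agree with Perl-like ones for simple patterns like these but |
| not in general, and the function name and example paths are |
| hypothetical. |
| |

```cpp
#include <regex>
#include <string>
#include <vector>

// Sketch (not R's implementation): a path relative to the package's
// top-level directory is excluded if ANY pattern is found ANYWHERE in it,
// ignoring case, i.e. an unanchored, case-insensitive search.
bool is_excluded(const std::string& path,
                 const std::vector<std::string>& patterns)
{
    for (const std::string& p : patterns) {
        const std::regex re(p, std::regex::ECMAScript | std::regex::icase);
        if (std::regex_search(path, re)) return true;
    }
    return false;
}
```

With the anchored pattern @samp{^data-raw$} only the top-level @file{data-raw} entry matches, whereas the bare pattern @samp{data-raw} also matches @file{R/data-raw.R}.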
| |
| Use @kbd{R CMD build --help} to obtain more information about the usage |
| of the @R{} package builder. |
| |
| @c DESCRIPTION field BuildVignettes |
| Unless @kbd{R CMD build} is invoked with the |
| @option{--no-build-vignettes} option (or the package's |
| @file{DESCRIPTION} contains @samp{BuildVignettes: no} or similar), it |
| will attempt to (re)build the vignettes (@pxref{Writing package |
| vignettes}) in the package. To do so it installs the current package |
| into a temporary library tree, but any dependent packages need to be |
| installed in an available library tree (see the Note: at the top of this |
| section). |
| |
| @c DESCRIPTION field BuildManual |
| Similarly, if the @file{.Rd} documentation files contain any |
| @code{\Sexpr} macros (@pxref{Dynamic pages}), the package will be |
| temporarily installed to execute them. Post-execution binary copies of |
| those pages containing build-time macros will be saved in |
| @file{build/partial.rdb}. If there are any install-time or render-time |
| macros, a @file{.pdf} version of the package manual will be built and |
| installed in the @file{build} subdirectory. (This allows |
| @acronym{CRAN} or other repositories to display the manual even if they |
| are unable to install the package.) This can be suppressed by the |
| option @option{--no-manual} or if the package's @file{DESCRIPTION} contains |
| @samp{BuildManual: no} or similar. |
| |
| @c DESCRIPTION field BuildKeepEmpty |
| One of the checks that @command{R CMD build} runs is for empty source |
| directories. These are in most (but not all) cases unintentional; if |
| they are intentional, use the option @option{--keep-empty-dirs} (or set |
| the environment variable @env{_R_BUILD_KEEP_EMPTY_DIRS_} to @samp{TRUE}, |
| or have a @samp{BuildKeepEmpty} field with a true value in the |
| @file{DESCRIPTION} file). |
| |
| @c DESCRIPTION field BuildResaveData |
| The @option{--resave-data} option allows saved images (@file{.rda} and |
| @file{.RData} files) in the @file{data} directory to be optimized for |
| size. It will also compress tabular files and convert @file{.R} files |
| to saved images. It can take values @code{no}, @code{gzip} (the default |
| if this option is not supplied, which can be changed by setting the |
| environment variable @env{_R_BUILD_RESAVE_DATA_}) and @code{best} |
| (equivalent to giving it without a value), which chooses the most |
| effective compression. Using @code{best} adds a dependence on @code{R |
| (>= 2.10)} to the @file{DESCRIPTION} file if @command{bzip2} or |
| @command{xz} compression is selected for any of the files. If this is |
| undesirable, @option{--resave-data=gzip} will do what compression it |
| can with @command{gzip}. A package can control how its data is resaved by |
| supplying a @samp{BuildResaveData} field (with one of the values given |
| earlier in this paragraph) in its @file{DESCRIPTION} file. |
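| |
| As an illustration, the following @file{DESCRIPTION} fields turn off |
| vignette and manual building, keep empty directories, and request the |
| most effective data compression. This is only a sketch: which of these |
| fields a package sets, and with what values, is the package author's |
| choice. Recall that @samp{best} may add a dependence on @code{R (>= 2.10)}. |
| |
| @example |
| BuildVignettes: no |
| BuildManual: no |
| BuildKeepEmpty: yes |
| BuildResaveData: best |
| @end example |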
| |
| The @option{--compact-vignettes} option will run |
| @code{tools::compactPDF} over the PDF files in @file{inst/doc} (and its |
| subdirectories) to losslessly compress them. This is not enabled by |
| default (it can be selected by the environment variable |
| @env{_R_BUILD_COMPACT_VIGNETTES_}) and needs @command{qpdf} |
| (@uref{http://qpdf.sourceforge.net/}) to be available. |
| |
| It can be useful to run @command{R CMD check --check-subdirs=yes} on the |
| built tarball as a final check on the contents. |
| |
| Some care is needed with permissions on a non-POSIX file system that |
| does not use execute permissions. This applies on |
| Windows and to e.g.@: FAT-formatted drives and SMB-mounted file systems |
| on other OSes. The `mode' of the file recorded in the tarball will be |
| whatever @code{file.info()} returns. On Windows this will record only |
| directories as having execute permission and on other OSes it is likely |
| that all files have reported `mode' @code{0777}. A particular issue is |
| packages being built on Windows which are intended to contain executable |
| scripts such as @file{configure} and @file{cleanup}: @command{R CMD |
| build} ensures those two are recorded with execute permission. |
| |
| Directory @file{build} of the package sources is reserved for use by |
| @command{R CMD build}: it contains information which may not easily be |
| created when the package is installed, including index information on |
| the vignettes and, rarely, information on the help pages and perhaps a |
| copy of the PDF reference manual (see above). |
| |
| @node Building binary packages, , Building package tarballs, Checking and building packages |
| @subsection Building binary packages |
| @cindex Building binary packages |
| |
| Binary packages are compressed copies of installed versions of |
| packages. They contain compiled shared libraries rather than C, C++ or |
| Fortran source code, and the R functions are included in their installed |
| form. The format and filename are platform-specific; for example, a |
| binary package for Windows is usually supplied as a @file{.zip} file, |
| and for the macOS platform the default binary package file extension is |
| @file{.tgz}. |
| |
| The recommended method of building binary packages is to use |
| |
| @command{R CMD INSTALL --build pkg} |
| |
| @noindent |
| where @file{pkg} is either the name of a source tarball (in the usual |
| @file{.tar.gz} format) or the location of the directory of the package |
| source to be built. This operates by first installing the package and |
| then packing the installed binaries into the appropriate binary package |
| file for the particular platform. |
| |
| By default, @command{R CMD INSTALL --build} will attempt to install the |
| package into the default library tree for the local installation of |
| @R{}. This has two implications: |
| |
| @itemize @bullet |
| @item |
| If the installation is successful, it will overwrite any existing installation |
| of the same package. |
| |
| @item |
| The default library tree must have write permission; if not, the package will |
| not install and the binary will not be created. |
| |
| @end itemize |
| |
| @noindent |
| To prevent changes to the present working installation or to provide an |
| install location with write access, create a suitably located directory |
| with write access and use the @option{-l} option to build the package |
| in the chosen location. The usage is then |
| |
| @command{R CMD INSTALL -l location --build pkg} |
| |
| @noindent |
| where @file{location} is the chosen directory with write access. The package |
| will be installed as a subdirectory of @file{location}, and the package binary |
| will be created in the current directory. |
| |
| Other options for @command{R CMD INSTALL} can be found using @command{R |
| CMD INSTALL --help}, and platform-specific details for special cases are |
| discussed in the platform-specific FAQs. |
| |
| |
| @c In much earlier versions of @R{}, @command{R CMD build --binary} could |
| @c build a binary version of a package, but this approach is now deprecated |
| @c in favour of @command{R CMD INSTALL --build}. |
| |
| |
| Finally, at least one web-based service is available for building binary |
| packages from (checked) source code: WinBuilder (see |
| @uref{https://win-builder.R-project.org/}) is able to build Windows |
| binaries. Note that this is intended for developers on other platforms |
| who do not have access to Windows but wish to provide binaries for the |
| Windows platform. |
| |
| @node Writing package vignettes, Package namespaces, Checking and building packages, Creating R packages |
| @section Writing package vignettes |
| @cindex vignettes |
| @cindex Sweave |
| |
| @menu |
| * Encodings and vignettes:: |
| * Non-Sweave vignettes:: |
| @end menu |
| |
| In addition to the help files in @file{Rd} format, @R{} packages allow |
| the inclusion of documents in arbitrary other formats. The standard |
| location for these is subdirectory @file{inst/doc} of a source package; |
| its contents will be copied to subdirectory @file{doc} when the package |
| is installed. Pointers from package help indices to the installed |
| documents are automatically created. Documents in @file{inst/doc} can |
| be in any format; however, we strongly recommend providing them in |
| PDF format, so users on almost all platforms can easily read them. To |
| ensure that they can be accessed from a browser (as an @HTML{} index is |
| provided), the file names should start with an @acronym{ASCII} letter |
| and consist entirely of @acronym{ASCII} letters, digits, hyphens or |
| underscores. |
| |
| A special case is @emph{package vignettes}. Vignettes are documents in |
| PDF or @HTML{} format obtained from plain text literate source files |
| from which @R{} knows how to extract @R{} code and create output (in |
| PDF/@HTML{} or intermediate @LaTeX{}). Vignette engines do this work, |
| using ``tangle'' and ``weave'' functions respectively. Sweave, provided |
| by the R distribution, is the default engine. Other vignette engines |
| besides Sweave are supported; see @ref{Non-Sweave vignettes}. |
| |
| Package vignettes have their sources in subdirectory @file{vignettes} of |
| the package sources. Note that the location of the vignette sources |
| only affects @command{R CMD build} and @command{R CMD check}: the |
| tarball built by @command{R CMD build} includes in @file{inst/doc} the |
| components intended to be installed. |
| |
| Sweave vignette sources are normally given the file extension |
| @file{.Rnw} or @file{.Rtex}, but for historical reasons |
| extensions@footnote{and to avoid problems with case-insensitive file |
| systems, lower-case versions of all these extensions.} @file{.Snw} and |
| @file{.Stex} are also recognized. Sweave allows @R{} code to be |
| integrated into @LaTeX{} documents: see the @code{Sweave} help page in @R{} and the |
| @code{Sweave} vignette in package @pkg{utils} for details on the |
| source document format. |
| |
| Package vignettes are tested by @code{R CMD check} by executing all @R{} |
| code chunks they contain (except those marked for non-evaluation, e.g., |
| with option @code{eval=FALSE} for Sweave). The @R{} working directory |
| for all vignette tests in @code{R CMD check} is a @emph{copy} of the |
| vignette source directory. Make sure all files needed to run the @R{} |
| code in the vignette (data sets, @dots{}) are accessible by either |
| placing them in the @file{inst/doc} hierarchy of the source package or |
| by using calls to @code{system.file()}. All other files needed to |
| re-make the vignettes (such as @LaTeX{} style files, Bib@TeX{} input |
| files and files for any figures not created by running the code in the |
| vignette) must be in the vignette source directory. @code{R CMD check} |
| will check that vignette production has succeeded by comparing |
| modification times of output files in @file{inst/doc} with |
| the source in @file{vignettes}. |
| |
| @code{R CMD build} will automatically@footnote{unless inhibited by using |
| @samp{BuildVignettes: no} in the @file{DESCRIPTION} file.} create the |
| (PDF or @HTML{} versions of the) vignettes in @file{inst/doc} for |
| distribution with the package sources. Because the vignette outputs |
| are included in the package sources, they need not be re-buildable at |
| install time, i.e., the package author can use private @R{} |
| packages, screen snapshots and @LaTeX{} extensions which are only |
| available on their machine.@footnote{provided the conditions of the |
| package's license are met: many, including @acronym{CRAN}, see the |
| omission of source components as incompatible with an Open Source |
| license.} |
| |
| By default @code{R CMD build} will run @code{Sweave} on all Sweave |
| vignette source files in @file{vignettes}. If a @file{Makefile} is found |
| in the vignette source directory, then @code{R CMD build} will try to |
| run @command{make} after the @code{Sweave} runs; otherwise |
| @code{texi2pdf} is run on each @file{.tex} file produced. |
| |
| The first target in the @file{Makefile} should take care of both |
| creation of PDF/@HTML{} files and cleaning up afterwards (including |
| after @code{Sweave}), i.e., delete all files that should not appear in |
| the final package archive. Note that if the @code{make} step runs @R{} |
| it needs to be careful to respect the environment values of @env{R_LIBS} |
| and @env{R_HOME}@footnote{@code{R_HOME/bin} is prepended to the |
| @env{PATH} so that references to @command{R} or @command{Rscript} in the |
| @file{Makefile} do make use of the currently running version of @R{}.}. |
| Finally, if there is a @file{Makefile} and it has a @samp{clean:} |
| target, @command{make clean} is run. |
| |
| All the usual @emph{caveats} about including a @file{Makefile} apply. |
| It must be portable (no @acronym{GNU} extensions), use LF line endings |
| and must work correctly with a parallel @command{make}: too many authors |
| have written things like |
| |
| @example |
| ## BAD EXAMPLE |
| all: pdf clean |
| |
| pdf: ABC-intro.pdf ABC-details.pdf |
| |
| %.pdf: %.tex |
| texi2dvi --pdf $* |
| |
| clean: |
| rm *.tex ABC-details-*.pdf |
| @end example |
| |
| @noindent |
| which will start removing the source files whilst @command{pdflatex} is |
| working. |
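| |
| One way to avoid this race is to do the removal in the recipe of the |
| first target. @command{make} runs that recipe only after all of the |
| target's prerequisites are up to date, even when it runs in parallel. |
| This is a sketch that reuses the hypothetical file names above. |
| The suffix rule replaces the non-portable pattern rule. Using |
| @samp{rm -f} means the @samp{clean} target does not fail when |
| @command{make clean} is run after the files have already gone. Recipe |
| lines must start with a tab character. |
| |
| @example |
| ## A portable, parallel-safe alternative (sketch) |
| all: pdf |
| 	rm -f *.tex ABC-details-*.pdf |
| |
| pdf: ABC-intro.pdf ABC-details.pdf |
| |
| .SUFFIXES: .tex .pdf |
| .tex.pdf: |
| 	texi2dvi --pdf $< |
| |
| clean: |
| 	rm -f *.tex ABC-details-*.pdf |
| @end example |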
| |
| Metadata lines can be placed in the source file, preferably in @LaTeX{} |
| comments in the preamble. One such is a @code{\VignetteIndexEntry} of |
| the form |
| @example |
| %\VignetteIndexEntry@{Using Animal@} |
| @end example |
| @noindent |
| Others you may see are @code{\VignettePackage} (currently ignored), |
| @code{\VignetteDepends} and @code{\VignetteKeyword} (which replaced |
| @code{\VignetteKeywords}). These are processed at package installation |
| time to create the saved data frame @file{Meta/vignette.rds}, but only |
| the @code{\VignetteIndexEntry} and @code{\VignetteKeyword} statements |
| are currently used. The @code{\VignetteEngine} statement |
| is described in @ref{Non-Sweave vignettes}. |
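| |
| For example, the start of a hypothetical Sweave vignette using these |
| metadata lines might look like the following. The keyword and the |
| dependency on @pkg{stats} are illustrative assumptions. |
| |
| @example |
| \documentclass@{article@} |
| %\VignetteIndexEntry@{Using Animal@} |
| %\VignetteKeyword@{animals@} |
| %\VignetteDepends@{stats@} |
| \begin@{document@} |
| @end example |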
| |
| At install time an @HTML{} index for all vignettes in the package is |
| automatically created from the @code{\VignetteIndexEntry} statements |
| unless a file @file{index.html} exists in directory |
| @file{inst/doc}. This index is linked from the @HTML{} help index for |
| the package. If you do supply a @file{inst/doc/index.html} file it |
| should contain relative links only to files under the installed |
| @file{doc} directory, or perhaps (not really an index) to @HTML{} help |
| files or to the @file{DESCRIPTION} file, and be valid @HTML{} as |
| confirmed @emph{via} the @uref{https://validator.w3.org, W3C Markup |
| Validation Service} or @uref{https://validator.nu/, Validator.nu}. |
| |
| Sweave/Stangle allows the document to specify the @code{split=TRUE} |
| option to create a single @R{} file for each code chunk: this will not |
| work for vignettes where it is assumed that each vignette source |
| generates a single file with the vignette extension replaced by |
| @file{.R}. |
| |
| Do watch that PDFs are not too large -- one in a @acronym{CRAN} package |
| was 72MB! This is usually caused by the inclusion of overly detailed |
| figures, which will not render well in PDF viewers. Sometimes it is |
| much better to generate fairly high resolution bitmap (PNG, JPEG) |
| figures and include those in the PDF document. |
| |
| @cindex .install_extras file |
| When @command{R CMD build} builds the vignettes, it copies these and |
| the vignette sources from directory @file{vignettes} to @file{inst/doc}. |
| To install any other files from the @file{vignettes} directory, include |
| a file @file{vignettes/.install_extras} which specifies these as |
| Perl-like regular expressions on one or more lines. (See the |
| description of the @file{.Rinstignore} file for full details.) |
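| |
| As a sketch with hypothetical file names, a |
| @file{vignettes/.install_extras} file could contain the following. It |
| asks for a data file and all PNG images in the @file{vignettes} |
| directory to be installed as well. |
| |
| @example |
| mydata\.csv$ |
| \.png$ |
| @end example |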
| |
| |
| @node Encodings and vignettes, Non-Sweave vignettes, Writing package vignettes, Writing package vignettes |
| @subsection Encodings and vignettes |
| |
| Vignettes will in general include descriptive text, @R{} input, @R{} |
| output and figures, @LaTeX{} include files and bibliographic references. |
| As any of these may contain non-@acronym{ASCII} characters, the handling |
| of encodings can become very complicated. |
| |
| The vignette source file should be written in @acronym{ASCII} or contain |
| a declaration of the encoding (see below). This applies even to |
| comments within the source file, since vignette engines process comments |
| to look for options and metadata lines. When an engine's weave and |
| tangle functions are called on the vignette source, it will be converted |
| to the encoding of the current @R{} session. |
| |
| @code{Stangle()} will produce an @R{} code file in the current locale's |
| encoding: for a non-@acronym{ASCII} vignette, that encoding is recorded |
| in a comment at the top of the file. |
| |
| @code{Sweave()} will produce a @file{.tex} file in the current |
| encoding, or in UTF-8 if that is declared. Non-@acronym{ASCII} encodings |
| need to be declared to @LaTeX{} via a line like |
| @example |
| \usepackage[utf8]@{inputenc@} |
| @end example |
| @noindent |
| (It is also possible to use the more recent @samp{inputenx} @LaTeX{} |
| package.) For files where this line is not needed (e.g.@: chapters |
| included within the body of a larger document, or non-Sweave |
| vignettes), the encoding may be declared using a comment like |
| @example |
| %\VignetteEncoding@{UTF-8@} |
| @end example |
| @noindent |
| If the encoding is UTF-8, this can also be declared using |
| the declaration |
| @example |
| %\SweaveUTF8 |
| @end example |
| @noindent |
| If no declaration is given in the vignette, it will be assumed to be |
| in the encoding declared for the package. If there is no encoding |
| declared in either place, then it is an error to use non-@acronym{ASCII} |
| characters in the vignette. |
| |
| In any case, be aware that @LaTeX{} may require the @samp{usepackage} |
| declaration. |
| |
| @code{Sweave()} will also parse and evaluate the @R{} code in each |
| chunk. The @R{} output will also be in the current locale's encoding (or |
| @acronym{UTF-8} if so declared), and should be covered by the |
| @samp{inputenc} declaration. One thing people often forget is that the |
| @R{} output may not be @acronym{ASCII} even for @acronym{ASCII} @R{} |
| sources, for many possible reasons. A common one is the use of `fancy' |
| quotes (see the @R{} help on @code{sQuote}). Note carefully that it is |
| not portable to declare UTF-8 or CP1252 to cover such quotes, as their |
| encoding will depend on the locale used to run @code{Sweave()}; this can |
| be circumvented by setting @code{options(useFancyQuotes="UTF-8")} in the |
| vignette. |
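| |
| For instance, a Sweave vignette could include an early set-up chunk |
| like the one below. The vignette would also declare UTF-8, for example |
| with @code{\usepackage[utf8]@{inputenc@}}. The chunk label @samp{setup} |
| is arbitrary. |
| |
| @example |
| <<setup, echo=FALSE>>= |
| options(useFancyQuotes = "UTF-8") |
| @@ |
| @end example |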
| |
| The final issue is the encoding of figures -- this applies only to PDF |
| figures and not PNG etc. The PDF figures will contain declarations for |
| their encoding, but the Sweave option @code{pdf.encoding} may need to be |
| set appropriately: see the help for the @code{pdf()} graphics device. |
| |
| As a real example of the complexities, consider the @CRANpkg{fortunes} |
| package version @samp{1.4-0}. That package did not have a declared |
| encoding, and its vignette was in @acronym{ASCII}. However, the data it |
| displays are read from a UTF-8 CSV file and will be assumed to be in the |
| current encoding, so @file{fortunes.tex} will be in UTF-8 in any locale. |
| Had @code{read.table} been told the data were UTF-8, @file{fortunes.tex} |
| would have been in the locale's encoding. |
| |
| @node Non-Sweave vignettes, , Encodings and vignettes, Writing package vignettes |
| @subsection Non-Sweave vignettes |
| |
| Vignettes in formats other than Sweave are supported @emph{via} |
| ``vignette engines''. For example, @CRANpkg{knitr} version 1.1 or later |
| can create @file{.tex} files from a variation on Sweave format, and |
| @file{.html} files from a variation on ``markdown'' format. These |
| engines replace the @code{Sweave()} function with other functions to |
| convert vignette source files into @LaTeX{} files for processing into |
| @file{.pdf}, or directly into @file{.pdf} or @file{.html} files. The |
| @code{Stangle()} function is replaced with a function that extracts the |
| @R{} source from a vignette. |
| |
| @R{} recognizes non-Sweave vignettes using filename extensions specified |
| by the engine. For example, the @CRANpkg{knitr} package supports |
| the extension @file{.Rmd} (standing for |
| ``R markdown''). The user indicates the vignette engine |
| within the vignette source using a @code{\VignetteEngine} line, for example |
| @example |
| %\VignetteEngine@{knitr::knitr@} |
| @end example |
| @noindent |
| This specifies the name of a package and an engine to use in place of |
| Sweave in processing the vignette. As @code{Sweave} is the only engine |
| supplied with the @R{} distribution, the package providing any other |
| engine must be specified in the @samp{VignetteBuilder} field of the |
| package @file{DESCRIPTION} file, and also specified in the |
| @samp{Suggests}, @samp{Imports} or @samp{Depends} field (since its |
| namespace must be available to build or check your package). If more |
| than one package is specified as a builder, they will be searched in the |
| order given there. The @pkg{utils} package is always implicitly |
| appended to the list of builder packages, but may be included earlier |
| to change the search order. |
| |
| Note that a package with non-Sweave vignettes should always have a |
| @samp{VignetteBuilder} field in the @file{DESCRIPTION} file, since this |
| is how @command{R CMD check} recognizes that there are vignettes to be |
| checked: packages listed there are required when the package is checked. |
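| |
| For example, a package whose vignettes all use the @CRANpkg{knitr} |
| engine shown above might declare the following in its |
| @file{DESCRIPTION} file. This is a minimal sketch with other fields |
| omitted. |
| |
| @example |
| Suggests: knitr |
| VignetteBuilder: knitr |
| @end example |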
| |
| The vignette engine can produce @file{.tex}, @file{.pdf}, or @file{.html} |
| files as output. If it produces @file{.tex} files, @R{} will |
| call @code{texi2pdf} to convert them to @file{.pdf} for display |
| to the user (unless there is a @file{Makefile} in the @file{vignettes} |
| directory). |
| |
| Package writers who would like to supply vignette engines need |
| to register those engines in the package @code{.onLoad} function. |
| For example, that function could make the call |
| @example |
| tools::vignetteEngine("knitr", weave = vweave, tangle = vtangle, |
| pattern = "[.]Rmd$", package = "knitr") |
| @end example |
| @noindent |
| (The actual registration in @CRANpkg{knitr} is more complicated, because |
| it supports other input formats.) See the @code{?tools::vignetteEngine} |
| help topic for details on engine registration. |
| |
| |
| @node Package namespaces, Writing portable packages, Writing package vignettes, Creating R packages |
| @section Package namespaces |
| @cindex namespaces |
| |
| @R{} has a namespace management system for code in packages. This |
| system allows the package writer to specify which variables in the |
| package should be @emph{exported} to make them available to package |
| users, and which variables should be @emph{imported} from other |
| packages. |
| |
| The namespace for a package is specified by the |
| @file{NAMESPACE} file in the top level package directory. This file |
| contains @emph{namespace directives} describing the imports and exports |
| of the namespace. Additional directives register any shared objects to |
| be loaded and any S3-style methods that are provided. Note that |
| although the file looks like @R{} code (and often has @R{}-style |
| comments) it is not processed as @R{} code. Only very simple |
| conditional processing of @code{if} statements is implemented. |
| |
| Packages are loaded and attached to the search path by calling |
| @code{library} or @code{require}. Only the exported variables are |
| placed in the attached frame. Loading a package that imports variables |
| from other packages will cause these other packages to be loaded as well |
| (unless they have already been loaded), but they will @emph{not} be |
| placed on the search path by these implicit loads. Thus code in the |
| package can only depend on objects in its own namespace and its imports |
| (including the @pkg{base} namespace) being visible@footnote{Note that |
| lazy-loaded datasets are @emph{not} in the package's namespace so need |
| to be accessed @emph{via} @code{::}, e.g.@: |
| @code{survival::survexp.us}.}. |
| |
| Namespaces are @emph{sealed} once they are loaded. Sealing means that |
| imports and exports cannot be changed and that internal variable |
| bindings cannot be changed. Sealing allows a simpler implementation |
| strategy for the namespace mechanism. Sealing also allows code |
| analysis and compilation tools to accurately identify the definition |
| corresponding to a global variable reference in a function body. |
| |
| The namespace controls the search strategy for variables used by |
| functions in the package. If not found locally, @R{} searches the |
| package namespace first, then the imports, then the base namespace and |
| then the normal search path. |
| |
| |
| @menu |
| * Specifying imports and exports:: |
| * Registering S3 methods:: |
| * Load hooks:: |
| * useDynLib:: |
| * An example:: |
| * Namespaces with S4 classes and methods:: |
| @end menu |
| |
| |
| @node Specifying imports and exports, Registering S3 methods, Package namespaces, Package namespaces |
| @subsection Specifying imports and exports |
| |
| Exports are specified using the @code{export} directive in the |
| @file{NAMESPACE} file. A directive of the form |
| |
| @findex export |
| @example |
| export(f, g) |
| @end example |
| |
| @noindent |
| specifies that the variables @code{f} and @code{g} are to be exported. |
| (Note that variable names may be quoted, and reserved words and |
| non-standard names such as @code{[<-.fractions} must be.) |
| |
| For packages with many variables to export it may be more convenient to |
| specify the names to export with a regular expression using |
| @code{exportPattern}. The directive |
| |
| @findex exportPattern |
| @example |
| exportPattern("^[^\\.]") |
| @end example |
| |
| @noindent |
| exports all variables that do not start with a period. However, such |
| broad patterns are not recommended for production code: it is better to |
| list all exports or use narrowly-defined groups. (This pattern applies |
| to S4 classes.) Beware of patterns which include names starting with a |
| period: some of these are internal-only variables and should never be |
| exported, e.g.@: @samp{.__S3MethodsTable__.} (and the code nowadays |
| excludes known cases). |
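| |
| For example, suppose all the functions intended for users share a |
| common prefix; @samp{plot_} here is a hypothetical choice. That |
| narrowly-defined group could then be exported with |
| |
| @example |
| exportPattern("^plot_") |
| @end example |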
| |
| Packages implicitly import the base namespace. |
| Variables exported from other packages with namespaces need to be |
| imported explicitly using the directives @code{import} and |
| @code{importFrom}. The @code{import} directive imports all exported |
| variables from the specified package(s). Thus the directive |
| |
| @findex import |
| @example |
| import(foo, bar) |
| @end example |
| |
| @noindent |
| specifies that all exported variables in the packages @pkg{foo} and |
| @pkg{bar} are to be imported. If only some of the exported variables |
| from a package are needed, then they can be imported using |
| @code{importFrom}. The directive |
| |
| @findex importFrom |
| @example |
| importFrom(foo, f, g) |
| @end example |
| |
| @noindent |
| specifies that the exported variables @code{f} and @code{g} of the |
| package @pkg{foo} are to be imported. Using @code{importFrom} |
| selectively rather than @code{import} is good practice, and is |
| particularly recommended when importing from packages with more than a |
| dozen exports. |
| |
| To import every symbol from a package but for a few exceptions, |
| pass the @code{except} argument to @code{import}. The directive |
| |
| @example |
| import(foo, except=c(bar, baz)) |
| @end example |
| |
| @noindent |
| imports every symbol from @pkg{foo} except @code{bar} and |
| @code{baz}. The value of @code{except} should evaluate to something |
| coercible to a character vector, after substituting its corresponding |
| string for each symbol. |
| |
| A namespace can export variables that it has imported from other |
| namespaces: this has to be done explicitly and not @emph{via} |
| @code{exportPattern}. |
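| |
| For example, with the hypothetical package @pkg{foo} used above, the |
| directives |
| |
| @example |
| importFrom(foo, f) |
| export(f) |
| @end example |
| |
| @noindent |
| import @code{f} from @pkg{foo} and make it available to users of this |
| package as well. |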
| |
| If a package only needs a few objects from another package it can use a |
| fully qualified variable reference in the code instead of a formal |
| import. A fully qualified reference to the function @code{f} in package |
| @pkg{foo} is of the form @code{foo::f}. This is slightly less efficient |
| than a formal import and also loses the advantage of recording all |
| dependencies in the @file{NAMESPACE} file (but they still need to be |
| recorded in the @file{DESCRIPTION} file). Evaluating @code{foo::f} will |
| cause package @pkg{foo} to be loaded, but not attached, if it was not |
| loaded already---this can be an advantage in delaying the loading of a |
| rarely used package. |
| |
| Using @code{foo:::f} instead of @code{foo::f} allows access to |
| unexported objects. This is generally not recommended, as the |
| semantics of unexported objects may be changed by the package author |
| in routine maintenance. |
| |
| @node Registering S3 methods, Load hooks, Specifying imports and exports, Package namespaces |
| @subsection Registering S3 methods |
| |
| The standard method for S3-style @code{UseMethod} dispatching might fail |
| to locate methods defined in a package that is imported but not attached |
| to the search path. To ensure that these methods are available the |
| packages defining the methods should ensure that the generics are |
| imported and register the methods using @code{S3method} directives. If |
| a package defines a function @code{print.foo} intended to be used as a |
| @code{print} method for class @code{foo}, then the directive |
| |
| @findex S3method |
| @example |
| S3method(print, foo) |
| @end example |
| |
| @noindent |
| ensures that the method is registered and available for @code{UseMethod} |
| dispatch, and the function @code{print.foo} does not need to be exported. |
| Since the generic @code{print} is defined in @pkg{base} it does not need |
| to be imported explicitly. |
| |
| (Note that function and class names may be quoted, and reserved words |
| and non-standard names such as @code{[<-} and @code{function} must |
| be.) |
| |
| It is possible to specify a third argument to @code{S3method}, the |
| function to be used as the method, for example |
| |
| @example |
| S3method(print, check_so_symbols, .print.via.format) |
| @end example |
| |
| @noindent |
| when @code{print.check_so_symbols} is not needed. |
| |
| As of @R{} version 3.6.0, one can also use @code{S3method()} directives |
| to perform @emph{delayed} registration. With |
| @example |
| if(getRversion() >= "3.6.0") @{ |
| S3method(pkg::gen, cls) |
| @} |
| @end example |
| @noindent |
| function @code{gen.cls} will get registered as an S3 method for class |
| @code{cls} and generic @code{gen} from package @code{pkg} only when the |
| namespace of @code{pkg} is loaded. This can be employed to deal with |
| situations where the method is not ``immediately'' needed, and having to |
| pre-load the namespace of @code{pkg} (and all its strong dependencies) |
| in order to perform immediate registration is considered too ``costly''. |
| |
| @node Load hooks, useDynLib, Registering S3 methods, Package namespaces |
| @subsection Load hooks |
| |
| @findex .onLoad |
| @findex .onAttach |
| There are a number of hooks called as packages are loaded, attached, |
| detached, and unloaded. See @code{help(".onLoad")} for more details. |
| |
| Since loading and attaching are distinct operations, separate hooks are |
| provided for each. These hook functions are called @code{.onLoad} and |
| @code{.onAttach}. They both take arguments@footnote{they will be called |
| with two unnamed arguments, in that order.} @code{libname} and |
| @code{pkgname}; they should be defined in the namespace but not |
| exported. |
| |
| @findex .onUnload |
| @findex .onDetach |
| @findex .Last.lib |
| Packages can use a @code{.onDetach} or @code{.Last.lib} function |
| (provided the latter is exported from the namespace) when @code{detach} |
| is called on the package. It is called with a single argument, the full |
| path to the installed package. There is also a hook @code{.onUnload} |
| which is called when the namespace is unloaded (@emph{via} a call to |
| @code{unloadNamespace}, perhaps called by @code{detach(unload = TRUE)}) |
| with argument the full path to the installed package's directory. |
| @code{.onUnload} and @code{.onDetach} should be defined in the namespace |
| and not exported, but @code{.Last.lib} does need to be exported. |
| |
| Packages are not likely to need @code{.onAttach} (except perhaps for a |
| start-up banner); code to set options and load shared objects should be |
| placed in a @code{.onLoad} function, or use made of the @code{useDynLib} |
| directive described next. |
| |
| User-level hooks are also available: see the help on function |
| @code{setHook}. |
| |
| These hooks are often used incorrectly. People forget to export |
| @code{.Last.lib}. Compiled code should be loaded in @code{.onLoad} (or |
| @emph{via} a @code{useDynLib} directive: see below) and unloaded in |
| @code{.onUnload}. Do remember that a package's namespace can be loaded |
| without the namespace being attached (e.g.@: by @code{pkgname::fun}) and |
| that a package can be detached and re-attached whilst its namespace |
| remains loaded. |
| |
| @node useDynLib, An example, Load hooks, Package namespaces |
| @subsection useDynLib |
| |
| A @file{NAMESPACE} file can contain one or more @code{useDynLib} |
| directives which specify shared objects that need to be |
| loaded.@footnote{NB: this is only guaranteed to be read in all versions |
| of @R{} if the package contains @R{} code in an @file{R} directory.} The directive |
| |
| @findex useDynLib |
| @example |
| useDynLib(foo) |
| @end example |
| |
| @noindent |
| registers the shared object @code{foo}@footnote{Note that this is the |
| basename of the shared object, and the appropriate extension (@file{.so} |
| or @file{.dll}) will be added.} for loading with @code{library.dynam}. |
| Loading of registered object(s) occurs after the package code has been |
| loaded and before running the load hook function. Packages that would |
| only need a load hook function to load a shared object can use the |
| @code{useDynLib} directive instead. |
| |
| The @code{useDynLib} directive also accepts the names of the native |
| routines that are to be used in @R{} @emph{via} the @code{.C}, @code{.Call}, |
| @code{.Fortran} and @code{.External} interface functions. These are given as |
| additional arguments to the directive, for example, |
| |
| @example |
| useDynLib(foo, myRoutine, myOtherRoutine) |
| @end example |
| |
| By specifying these names in the @code{useDynLib} directive, the native |
| symbols are resolved when the package is loaded and @R{} variables |
| identifying these symbols are added to the package's namespace with |
| these names. These can be used in the @code{.C}, @code{.Call}, |
| @code{.Fortran} and @code{.External} calls in place of the name of the |
| routine and the @code{PACKAGE} argument. For instance, we can call the |
| routine @code{myRoutine} from @R{} with the code |
| |
| @example |
| .Call(myRoutine, x, y) |
| @end example |
| |
| @noindent |
| rather than |
| |
| @example |
| .Call("myRoutine", x, y, PACKAGE = "foo") |
| @end example |
| |
| There are at least two benefits to this approach. Firstly, the symbol |
| lookup is done just once for each symbol rather than each time the |
| routine is invoked. Secondly, this removes any ambiguity in resolving |
| symbols that might be present in several compiled DLLs. However, this |
| approach is nowadays deprecated in favour of supplying registration |
| information (see below). |
| |
| In some circumstances, there will already be an @R{} variable in the |
| package with the same name as a native symbol. For example, we may have |
| an @R{} function in the package named @code{myRoutine}. In this case, |
| it is necessary to map the native symbol to a different @R{} variable |
| name. This can be done in the @code{useDynLib} directive by using named |
| arguments. For instance, to map the native symbol name @code{myRoutine} |
| to the @R{} variable @code{myRoutine_sym}, we would use |
| |
| @example |
| useDynLib(foo, myRoutine_sym = myRoutine, myOtherRoutine) |
| @end example |
| |
| We could then call that routine from @R{} using the command |
| |
| @example |
| .Call(myRoutine_sym, x, y) |
| @end example |
| |
| Symbols without explicit names are assigned to the @R{} variable with |
| that name. |
| |
| In some cases, it may be preferable not to create @R{} variables in the |
| package's namespace that identify the native routines. It may be too |
| costly to compute these for many routines when the package is loaded |
| if many of these routines are not likely to be used. In this case, |
| one can still perform the symbol resolution correctly using the DLL, |
| but do this each time the routine is called. Given a reference to the |
| DLL as an @R{} variable, say @code{dll}, we can call the routine |
| @code{myRoutine} using the expression |
| |
| @example |
| .Call(dll$myRoutine, x, y) |
| @end example |
| |
| The @code{$} operator resolves the routine with the given name in the |
| DLL using a call to @code{getNativeSymbolInfo}. This is the same |
| computation as above where we resolve the symbol when the package is |
| loaded. The only difference is that this is done each time in the case |
| of @code{dll$myRoutine}. |
| |
| In order to use this dynamic approach (e.g., @code{dll$myRoutine}), one |
| needs the reference to the DLL as an @R{} variable in the package. The |
| DLL can be assigned to a variable by using the @code{variable = |
| dllName} format used above for mapping symbols to @R{} variables. For |
| example, if we wanted to assign the DLL reference for the DLL |
| @code{foo} in the example above to the variable @code{myDLL}, we would |
| use the following directive in the @file{NAMESPACE} file: |
| |
| @example |
| myDLL = useDynLib(foo, myRoutine_sym = myRoutine, myOtherRoutine) |
| @end example |
| |
| Then, the @R{} variable @code{myDLL} is in the package's namespace and |
| available for calls such as @code{myDLL$dynRoutine} to access routines |
| that are not explicitly resolved at load time. |
| |
| If the package has registration information (see @ref{Registering native |
| routines}), then we can use that directly rather than specifying the |
| list of symbols again in the @code{useDynLib} directive in the |
| @file{NAMESPACE} file. Each routine in the registration information is |
| specified by giving the name by which it is to be known in @R{}, along |
| with the address of the routine and any information about the number and |
| type of the parameters. Using the @code{.registration} argument of |
| @code{useDynLib}, we can instruct the namespace mechanism to create |
| @R{} variables for these symbols. For example, suppose we have the |
| following registration information for a DLL named @code{myDLL}: |
| |
| @example |
| static R_NativePrimitiveArgType foo_t[] = @{ |
| REALSXP, INTSXP, STRSXP, LGLSXP |
| @}; |
| |
| static const R_CMethodDef cMethods[] = @{ |
| @{"foo", (DL_FUNC) &foo, 4, foo_t@}, |
| @{"bar_sym", (DL_FUNC) &bar, 0@}, |
| @{NULL, NULL, 0, NULL@} |
| @}; |
| |
| static const R_CallMethodDef callMethods[] = @{ |
| @{"R_call_sym", (DL_FUNC) &R_call, 4@}, |
| @{"R_version_sym", (DL_FUNC) &R_version, 0@}, |
| @{NULL, NULL, 0@} |
| @}; |
| @end example |
| |
| Then, the directive in the @file{NAMESPACE} file |
| |
| @example |
| useDynLib(myDLL, .registration = TRUE) |
| @end example |
| |
| @noindent |
| causes the DLL to be loaded and the @R{} variables @code{foo}, |
| @code{bar_sym}, @code{R_call_sym} and @code{R_version_sym} to be |
| defined in the package's namespace. |
| |
| Note that the names for the @R{} variables are taken from the entry in |
| the registration information and do not need to be the same as the name |
| of the native routine. This allows the creator of the registration |
| information to map the native symbols to non-conflicting variable names |
| in @R{}, e.g.@: @code{R_version} to @code{R_version_sym} for use in an |
| @R{} function such as |
| |
| @example |
| R_version <- function() |
| @{ |
| .Call(R_version_sym) |
| @} |
| @end example |
| |
| Using argument @code{.fixes} allows an automatic prefix to be added to |
| the registered symbols, which can be useful when working with an |
| existing package. For example, package @CRANpkg{KernSmooth} has |
| |
| @example |
| useDynLib(KernSmooth, .registration = TRUE, .fixes = "F_") |
| @end example |
| |
| @noindent |
| which names the @R{} variables corresponding to the Fortran symbols |
| @code{F_bkde} and so on, and so avoids clashes with @R{} code in the |
| namespace. |
| |
| @strong{NB}: Using these arguments for a package which does not register |
| native symbols merely slows down the package loading (although at the |
| time of writing 90 @acronym{CRAN} packages did so). Once symbols are |
| registered, check that the corresponding @R{} variables are not |
| accidentally exported by a pattern in the @file{NAMESPACE} file. |
| |
| |
| @node An example, Namespaces with S4 classes and methods, useDynLib, Package namespaces |
| @subsection An example |
| |
| As an example consider two packages named @pkg{foo} and @pkg{bar}. The |
| @R{} code for package @pkg{foo} in file @file{foo.R} is |
| |
| @quotation |
| @cartouche |
| @example |
| x <- 1 |
| f <- function(y) c(x,y) |
| foo <- function(x) .Call("foo", x, PACKAGE="foo") |
| print.foo <- function(x, ...) cat("<a foo>\n") |
| @end example |
| @end cartouche |
| @end quotation |
| |
| @noindent |
| Some C code defines a C function compiled into DLL @code{foo} (with an |
| appropriate extension). The @file{NAMESPACE} file for this package is |
| |
| @quotation |
| @cartouche |
| @example |
| useDynLib(foo) |
| export(f, foo) |
| S3method(print, foo) |
| @end example |
| @end cartouche |
| @end quotation |
| |
| @noindent |
| The second package @pkg{bar} has code file @file{bar.R} |
| |
| @quotation |
| @cartouche |
| @example |
| c <- function(...) sum(...) |
| g <- function(y) f(c(y, 7)) |
| h <- function(y) y+9 |
| @end example |
| @end cartouche |
| @end quotation |
| |
| @noindent |
| and @file{NAMESPACE} file |
| |
| @quotation |
| @cartouche |
| @example |
| import(foo) |
| export(g, h) |
| @end example |
| @end cartouche |
| @end quotation |
| |
| @noindent |
| Calling @code{library(bar)} loads @pkg{bar} and attaches its exports to |
| the search path. Package @pkg{foo} is also loaded but not attached to |
| the search path. A call to @code{g} produces |
| |
| @example |
| > g(6) |
| [1] 1 13 |
| @end example |
| |
| @noindent |
| This is consistent with the definitions of @code{c} in the two settings: |
| in @pkg{bar} the function @code{c} is defined to be equivalent to |
| @code{sum}, but in @pkg{foo} the variable @code{c} refers to the |
| standard function @code{c} in @pkg{base}. |
| |
| @node Namespaces with S4 classes and methods, , An example, Package namespaces |
| @subsection Namespaces with S4 classes and methods |
| |
| Some additional steps are needed for packages which make use of formal |
| (S4-style) classes and methods (unless these are purely used |
| internally). The package should have @code{Depends: methods} in its |
| @file{DESCRIPTION} file, and its @file{NAMESPACE} file should contain |
| @code{import(methods)} or @code{importFrom(methods, ...)} and declare |
| any classes and methods which are to be exported. For |
| example, the @pkg{stats4} package has |
| |
| @findex exportClasses |
| @findex exportMethods |
| |
| @example |
| export(mle) # exporting methods implicitly exports the generic |
| importFrom("graphics", plot) |
| importFrom("stats", optim, qchisq) |
| ## For these, we define methods or (AIC, BIC, nobs) an implicit generic: |
| importFrom("stats", AIC, BIC, coef, confint, logLik, nobs, profile, |
| update, vcov) |
| exportClasses(mle, profile.mle, summary.mle) |
| ## All methods for imported generics: |
| exportMethods(coef, confint, logLik, plot, profile, summary, |
| show, update, vcov) |
| ## implicit generics which do not have any methods here |
| export(AIC, BIC, nobs) |
| @end example |
| |
| @findex exportPattern |
| @findex exportClassPattern |
| @noindent |
| All S4 classes to be used outside the package need to be listed in an |
| @code{exportClasses} directive. Alternatively, they can be specified |
| using @code{exportClassPattern}@footnote{This defaults to the same |
| pattern as @code{exportPattern}: use something like |
| @code{exportClassPattern("^$")} to override this.} in the same style as |
| for @code{exportPattern}. To export methods for generics from other |
| packages an @code{exportMethods} directive can be used. |
| |
| Note that exporting methods on a generic in the namespace will also |
| export the generic, and exporting a generic in the namespace will also |
| export its methods. If the generic function is not local to this |
| package, either because it was imported as a generic function or because |
| the non-generic version has been made generic solely to add S4 methods |
| to it (as for functions such as @code{plot} in the example above), it |
| can be declared @emph{via} either or both of @code{export} or |
| @code{exportMethods}, but the latter is clearer (and is used in the |
| @pkg{stats4} example above). In particular, for primitive functions |
| there is no generic function, so @code{export} would export the |
| primitive, which makes no sense. On the other hand, if the generic is |
| local to this package, it is more natural to export the function itself |
| using @code{export()}, and this @emph{must} be done if an implicit |
| generic is created without setting any methods for it (as is the case |
| for @code{AIC} in @pkg{stats4}). |
| |
| A non-local generic function is only exported to ensure that calls to |
| the function will dispatch the methods from this package (and that is |
| not done or required when the methods are for primitive functions). For |
| this reason, you do not need to document such implicitly created generic |
| functions, and @code{undoc} in package @pkg{tools} will not report them. |
| |
| If a package uses S4 classes and methods exported from another package, |
| but does not import the entire namespace of the other |
| package@footnote{if it does, there will be opaque warnings about |
| replacing imports if the classes/methods are also imported.}, it needs |
| to import the classes and methods explicitly, with directives |
| |
| @findex importClassesFrom |
| @findex importMethodsFrom |
| |
| @example |
| importClassesFrom(package, ...) |
| importMethodsFrom(package, ...) |
| @end example |
| |
| @noindent |
| listing the classes and functions with methods respectively. Suppose we |
| had two small packages @pkg{A} and @pkg{B} with @pkg{B} using @pkg{A}. |
| Then they could have @file{NAMESPACE} files |
| |
| @quotation |
| @cartouche |
| @example |
| export(f1, ng1) |
| exportMethods("[") |
| exportClasses(c1) |
| @end example |
| @end cartouche |
| @end quotation |
| |
| @noindent |
| and |
| |
| @quotation |
| @cartouche |
| @example |
| importFrom(A, ng1) |
| importClassesFrom(A, c1) |
| importMethodsFrom(A, f1) |
| export(f4, f5) |
| exportMethods(f6, "[") |
| exportClasses(c1, c2) |
| @end example |
| @end cartouche |
| @end quotation |
| |
| @noindent |
| respectively. |
| |
| Note that @code{importMethodsFrom} will also import any generics defined |
| in the namespace on those methods. |
| |
| If you export S4 methods, it is important that the corresponding generics |
| are available. You may, for example, need to import @code{plot} from |
| @pkg{graphics} to make visible a function to be converted into its |
| implicit generic. But it is better practice to make use of the generics |
| exported by @pkg{stats4} as this enables multiple packages to |
| unambiguously set methods on those generics. |
| |
| @node Writing portable packages, Diagnostic messages, Package namespaces, Creating R packages |
| @section Writing portable packages |
| |
| This section contains advice on writing packages to be used on multiple |
| platforms or for distribution (for example to be submitted to a package |
| repository such as @acronym{CRAN}). |
| |
| @menu |
| * PDF size:: |
| * Check timing:: |
| * Encoding issues:: |
| * Portable C and C++ code:: |
| * Binary distribution:: |
| @end menu |
| |
| Portable packages should have simple file names: use only alphanumeric |
| @acronym{ASCII} characters and period (@code{.}), and avoid those names |
| not allowed under Windows (@pxref{Package structure}). |
| |
| Many of the graphics devices are platform-specific: even @code{X11()} |
| (aka @code{x11()}), which although emulated on Windows may not be |
| available on a Unix-alike (and is not the preferred screen device on |
| macOS). It is rarely necessary for package code or examples to open a new |
| device, but if essential,@footnote{People use @code{dev.new()} to open a |
| device at a particular size: that is not portable but using |
| @code{dev.new(noRStudioGD = TRUE)} helps.} use @code{dev.new()}. |
| |
| Use @command{R CMD build} to make the release @file{.tar.gz} file. |
| |
| @command{R CMD check} provides a basic set of checks, but often further |
| problems emerge when people try to install and use packages submitted to |
| @acronym{CRAN} -- many of these involve compiled code. Here are some |
| further checks that you can do to make your package more portable. |
| |
| @itemize |
| |
| @item |
| If your package has a @file{configure} script, provide a |
| @file{configure.win} script to be used on Windows (an empty file if no |
| actions are needed). |
| |
| @item |
| If your package has a @file{Makevars} or @file{Makefile} file, make sure |
| that you use only portable make features. Such files should be |
| LF-terminated@footnote{Solaris @command{make} does not accept |
| CRLF-terminated Makefiles; Solaris warns about and some other |
| @command{make}s ignore incomplete final lines.} (including the final |
| line of the file) and not make use of GNU extensions. (The POSIX |
| specification is available at |
| @uref{http://pubs.opengroup.org/@/onlinepubs/@/9699919799/@/utilities/@/make.html}; |
| anything not documented there should be regarded as an extension to be |
| avoided. Further advice can be found at |
| @uref{https://www.gnu.org/@/software/@/autoconf/@/manual/@/autoconf.html#Portable-Make}.) |
| Commonly misused GNU extensions are conditional inclusions (@code{ifeq} |
| and the like), @code{$@{shell ...@}}, @code{$@{wildcard ...@}} and |
| similar, and the use of @code{+=}@footnote{This was apparently |
| introduced in SunOS 4, and is available elsewhere @emph{provided} it is |
| surrounded by spaces.} and @code{:=}. Also, the use of @code{$<} other |
| than in implicit rules is a GNU extension, as are the @code{$^} macro |
| and the use of @code{.PHONY} (which some other makes ignore). |
| Unfortunately makefiles which use GNU extensions often run on other |
| platforms but do not have the intended results. |
| |
| The use of @code{$@{shell ...@}} can be avoided by using backticks, e.g.@: |
| |
| @example |
| PKG_CPPFLAGS = `gsl-config --cflags` |
| @end example |
| |
| @noindent |
| which works in all versions of @command{make} known@footnote{GNU make, |
| BSD make and other variants of @command{pmake} in FreeBSD, NetBSD and |
| formerly in macOS, AT&T make as implemented on Solaris and `Distributed |
| Make' (@code{dmake}), part of Oracle Developer Studio and available in |
| other versions including from Apache OpenOffice.} to be used with @R{}. |
| |
| If you really must require GNU make, declare it in the @file{DESCRIPTION} |
| file by |
| |
| @example |
| SystemRequirements: GNU make |
| @end example |
| |
| @noindent |
| and ensure that you use the value of environment variable @env{MAKE} |
| (and not just @command{make}) in your scripts. (On some platforms GNU |
| make is available under a name such as @command{gmake}, and there |
| @code{SystemRequirements} is used to set @env{MAKE}.) |
| |
| If you only need GNU make for parts of the package which are rarely |
| needed (for example to create bibliography files under |
| @file{vignettes}), use a file called @file{GNUmakefile} rather than |
| @file{Makefile} as GNU make (only) will use the former. |
| |
| Since the only viable make for Windows is GNU make, it is permissible to |
| use GNU extensions in files @file{Makevars.win} or @file{Makefile.win}. |
| |
| @item |
| Bash extensions also need to be avoided in shell scripts, including |
| expressions in Makefiles (which are passed to the shell for processing). |
| Some @R{} platforms use strict@footnote{For example, @command{test} |
| options @option{-a} and @option{-e} are not portable, and not supported |
| in the AT&T Bourne shell used on Solaris 10/11, even though they are in |
| the POSIX standard. Nor does Solaris support @samp{$(@var{cmd})}.} |
| Bourne shells: the @R{} toolset on Windows and some Unix-alike OSes use |
| @command{ash} (@uref{https://en.wikipedia.org/@/wiki/@/Almquist_shell}), |
| a rather minimal shell with few builtins. Beware of assuming that all |
| the POSIX command-line utilities are available, especially on Windows |
| where only a minimal set is provided for use with @R{}. |
| @ifset UseExternalXrefs |
| (@xref{The command line tools, , The command line tools, |
| R-admin, R Installation and Administration}.) |
| @end ifset |
| One particular issue is the use of @command{echo}, for which two |
| behaviours are allowed |
| (@uref{http://pubs.opengroup.org/@/onlinepubs/@/9699919799/@/utilities/@/echo.html}) |
| and both occur as defaults on @R{} platforms: portable applications |
| should use neither @option{-n} (as the first argument) nor escape |
| sequences. The recommended replacement for @command{echo -n} is the |
| command @command{printf}. |
| Another common issue is the construction |
| @example |
| export FOO=value |
| @end example |
| @noindent |
| which is bash-specific (first set the variable then export it by name). |
| |
| Using @code{test -e} (or @code{[ -e ]}) in shell scripts is not |
| portable: @code{-f} is normally what is intended. Flags @option{-a} and |
| @option{-o} are nowadays declared obsolescent by POSIX and should not be |
| used. |
| |
| Use of `brace expansion', e.g., |
| @example |
| rm -f src/*.@{o,so,d@} |
| @end example |
| @noindent |
| is not portable. |
| |
| The @option{-o} flag for @command{set} in shell scripts is optional in |
| POSIX and not supported on all the platforms @R{} is used on. |
| |
| On macOS Catalina, which shell @file{/bin/sh} invokes is user- and |
| platform-dependent: it might be @command{bash} version 3.2, |
| @command{dash} or @command{zsh} (for new accounts it is @command{zsh}, |
| for accounts ported from an earlier version it is usually @command{bash}). |
| |
| @item |
| Make use of the abilities of your compilers to check the |
| standards-conformance of your code. For example, @command{gcc} and |
| @command{gfortran}@footnote{@uref{http://fortranwiki.org/@/fortran/@/show/Modernizing+Old+Fortran} |
| may help explain some of the warnings from @command{gfortran -Wall |
| -pedantic}.} can be used with options @option{-Wall -pedantic} to alert |
| you to potential problems. This is particularly important for C++, |
| where @code{g++ -Wall -pedantic} will alert you to the use of some of |
| the GNU extensions which fail to compile on most other C++ compilers. If |
| @R{} was not configured accordingly, one can achieve this @emph{via} |
| personal @file{Makevars} files. |
| @ifset UseExternalXrefs |
| @xref{Customizing package compilation, , Customizing package compilation, |
| R-admin, R Installation and Administration}. |
| @end ifset |
| |
| Portable C++ code needs to follow the 1998 standard (and not use |
| features from C99), or to specify a C++11 compiler (see @ref{Using C++11 |
| code}) where available (which is not the case on all @R{} platforms). |
| Currently C++14 code is less portable and C++17 support is patchy across |
| @R{} platforms. |
| |
| If using Fortran with the GNU compiler, use the flags |
| @option{-std=f95 -Wall -pedantic} which reject most GNU extensions and |
| features from later standards. (Although @R{} only requires Fortran 90, |
| @command{gfortran} does not have a way to specify that standard.) |
| |
| @R{} has tested that @code{DOUBLE COMPLEX} works, so it is preferred to |
| @code{COMPLEX*16}. (One can also use something like |
| @code{COMPLEX(KIND=KIND(0.0D0))}@footnote{See |
| @uref{http://people.ds.cam.ac.uk/nmm1/fortran/paper_07.pdf}.}.) |
| |
| |
| Not all common @R{} platforms conform to the expected standards, e.g.@: |
| C99 for C code. One common area of problems is the @code{*printf} |
| functions where Windows does not support @code{%lld}, @code{%Lf} and |
| similar formats (and has its own formats such as @code{%I64d} for 64-bit |
| integers). It is very rare to need to output such types, and 64-bit |
| integers can usually be converted to doubles for output. However, the |
| C11 standard (section 7.8.1) includes @code{PRIxNN} |
| macros@footnote{These are optional because the corresponding types are, |
| but must be provided if the types are.} in C header @file{inttypes.h} |
| (for example @code{PRId64}) so the portable approach is to test for |
| these and if not available provide emulations in the package. |
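| |
| A minimal sketch of that approach in C follows. The helper name |
| @code{format_int64} is hypothetical, and the fallback branch, which |
| prints @emph{via} a double and so is exact only for values up to 2^53 |
| in magnitude, is just one possible emulation. |
| |
| @example |
| #include <inttypes.h> |
| #include <stdint.h> |
| #include <stdio.h> |
| |
| /* Write 'value' to 'buf' in decimal; returns what snprintf returns. |
|    format_int64 is a hypothetical helper name. */ |
| int format_int64(char *buf, size_t len, int64_t value) |
| @{ |
| #ifdef PRId64 |
|     /* PRId64 expands to the conversion for int64_t on this platform, |
|        e.g. "ld", "lld" or "I64d". */ |
|     return snprintf(buf, len, "%" PRId64, value); |
| #else |
|     /* Fallback emulation: go through double, which is exact only for |
|        values up to 2^53 in magnitude. */ |
|     return snprintf(buf, len, "%.0f", (double) value); |
| #endif |
| @} |
| @end example |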
| |
| @item |
| @command{R CMD check} performs some checks for non-portable |
| compiler/linker flags in @file{src/Makevars}. However, it cannot check |
| the meaning of such flags, and some are commonly accepted but with |
| compiler-specific meanings. There are other non-portable flags which |
| are not checked, nor are @file{src/Makefile} files and makefiles in |
| sub-directories. As a comment in the code says |
| |
| @quotation |
| It is hard to think of anything apart from @option{-I*} and @option{-D*} |
| that is safe for general use @dots{} |
| @end quotation |
| |
| @noindent |
| although @option{-pthread} is pretty close to portable. (Option |
| @option{-U} is portable but little use on the command line as it will |
| only cancel built-in defines (not portable) and those defined earlier on |
| the command line (@R{} does not use any).) |
| |
| People have used @command{configure} to customize @file{src/Makevars}, |
| including for specific compilers. This is unsafe for several reasons. |
| First, unintended compilers might meet the check---for example, several |
| compilers other than GCC identify themselves as `GCC' whilst being only |
| partially conformant. Second, future versions of compilers may behave |
| differently (including updates to quite old series) so for example |
| @option{-Werror} (and specializations) can make a package |
| non-installable under a future version. Third, using flags to suppress |
| diagnostic messages can hide important information for debugging on a |
| platform not tested by the package maintainer. (@command{R CMD check} |
| can optionally report on unsafe flags which were used.) |
| |
| Avoid the use of @option{-march} and especially @option{-march=native}. |
| This allows the compiler to generate code that will only run on a |
| particular class of CPUs (that of the compiling machine for |
| @samp{native}). People assume this is a `minimum' CPU specification, |
| but that is not how it is documented for @command{gcc} (it is accepted |
| by @command{clang}, but apparently what precisely it does there is |
| undocumented, and other compilers may accept it but ignore it). |
| (For personal use @option{-mtune} is safer, but still not portable |
| enough to be used in a public package.) Not even @command{gcc} supports |
| @samp{native} for all CPUs, and it can do surprising things if it finds |
| a CPU released later than its version. |
| |
| @item |
| Do be very careful with passing arguments between @R{}, C and Fortran |
| code. In particular, @code{long} in C will be 32-bit on some @R{} |
| platforms (including 64-bit Windows), but 64-bit on most modern Unix and |
| Linux platforms. It is rather unlikely that the use of @code{long} in C |
| code has been thought through: if you need a longer type than @code{int} |
| you should use a configure test for a C99/C++11 type such as |
| @code{int_fast64_t} (and failing that, @code{long long}@footnote{but |
| note that @code{long long} is not a standard C++98 type, and C++ |
| compilers for earlier versions of @R{} set up for strict C++98 |
| conformance will reject it. C++11 (the default since @R{} 3.6.2) |
| includes @code{long long}.}) and typedef your own type, or use another |
| suitable type (such as @code{size_t}). |
| @c https://en.cppreference.com/w/cpp/language/types claims long long is |
| @c >= 64-bit, but that is not obvious in the standard. |
| |
| It is not safe to assume that @code{long} and pointer types are the same |
| size, and they are not on 64-bit Windows. If you need to convert |
| pointers to and from integers use the C99/C++11 integer types |
| @code{intptr_t} and @code{uintptr_t} (in the headers @code{<stdint.h>} |
| and @code{<cstdint>}: they are not required to be implemented by the |
| standards but are used in C code by @R{} itself). |
| |
| Note that @code{integer} in Fortran corresponds to @code{int} in C on |
| all @R{} platforms. |
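| |
| A sketch of the advice in this item, in C. It assumes that the |
| package's @command{configure} script defines a macro |
| @code{HAVE_INT_FAST64_T} when @code{int_fast64_t} is available; that |
| macro name and the names @code{mypkg_int64}, @code{ptr_to_int} and |
| @code{int_to_ptr} are all hypothetical. |
| |
| @example |
| #include <stdint.h> |
| |
| /* Package-wide 64-bit integer type rather than 'long' (which is only |
|    32 bits on 64-bit Windows).  HAVE_INT_FAST64_T is a hypothetical |
|    macro assumed to be set by the package's configure script. */ |
| #ifdef HAVE_INT_FAST64_T |
| typedef int_fast64_t mypkg_int64; |
| #else |
| typedef long long mypkg_int64; |
| #endif |
| |
| /* Round-trip a pointer through an integer: uintptr_t is wide enough |
|    for a pointer, whereas 'long' is not on 64-bit Windows. */ |
| uintptr_t ptr_to_int(void *p) |
| @{ |
|     return (uintptr_t) p; |
| @} |
| |
| void *int_to_ptr(uintptr_t u) |
| @{ |
|     return (void *) u; |
| @} |
| @end example |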
| |
| @item |
| Under no circumstances should your compiled code ever call @code{abort} |
| or @code{exit}@footnote{or where supported the variants @code{_Exit} and |
| @code{_exit}.}: these terminate the user's @R{} process, quite possibly |
| losing all unsaved work. One usage that could call @code{abort} |
| is the @code{assert} macro in C or C++ functions, which should never be |
| active in production code. The normal way to ensure that is to define |
| the macro @code{NDEBUG}, and @command{R CMD INSTALL} does so as part of |
| the compilation flags. If you wish to use @code{assert} during |
| development, you can include @code{-UNDEBUG} in @code{PKG_CPPFLAGS}. |
| Note that your own @file{src/Makefile} or makefiles in sub-directories |
| may also need to define @code{NDEBUG}. |
| |
| This applies not only to your own code but to any external software you |
| compile in or link to. |
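| |
| Instead of terminating the process, compiled code can report failure |
| to its caller and leave it to the @R{}-level code to signal an error. |
| A minimal sketch in C, in which the function name @code{mypkg_mean} |
| and its status codes are hypothetical: |
| |
| @example |
| #include <stddef.h> |
| |
| /* Compute the mean of x[0..n-1] into *result.  Returns 0 on success |
|    and a non-zero status on invalid input, instead of calling abort() |
|    or exit().  mypkg_mean and its status codes are hypothetical. */ |
| int mypkg_mean(const double *x, int n, double *result) |
| @{ |
|     if (x == NULL || result == NULL) |
|         return 2; |
|     if (n <= 0) |
|         return 1; |
|     double total = 0.0; |
|     for (int i = 0; i < n; i++) |
|         total += x[i]; |
|     *result = total / n; |
|     return 0; |
| @} |
| @end example |
| |
| A wrapper around such a function would check the returned status and |
| turn a non-zero value into an @R{} error, leaving the @R{} session intact. |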
| |
| @item |
| Compiled code should not write to @file{stdout} or @file{stderr} and C++ |
| and Fortran I/O should not be used. As with the previous item, such |
| calls may come from external software and may never be made, but |
| package authors are often mistaken about that. |
| |
| @item |
| Compiled code should not call the system random number generators such |
| as @code{rand}, @code{drand48} and @code{random}@footnote{This and |
| @code{srandom} are in any case not portable. They are in POSIX but not |
| in the C99 standard, and not available on Windows.}, but rather use the |
| interfaces to @R{}'s RNGs described in @ref{Random numbers}. In |
| particular, if more than one package initializes the system RNG (e.g.@: |
| @emph{via} @code{srand}), they will interfere with each other. |
| |
| Nor should the C++11 random number library be used, nor any other |
| third-party random number generators such as those in GSL. |
| |
| @item |
| Errors in memory allocation and reading/writing outside arrays are very |
| common causes of crashes (e.g., segfaults) on some machines. |
| See @ref{Checking memory access} for tools which can be used to look for this. |
| |
| @item |
| Many platforms will allow unsatisfied entry points in compiled code, but |
| will crash the application (here @R{}) if they are ever used. Some |
| (notably Windows) will not. Looking at the output of |
| |
| @example |
| nm -pg mypkg.so |
| @end example |
| |
| @noindent |
| and checking if any of the symbols marked @code{U} is unexpected is a |
| good way to avoid this. |
| |
| @item |
| Linkers have a lot of freedom in how to resolve entry points in |
| dynamically-loaded code, so the results may differ by platform. One |
| area that has caused grief is packages including copies of standard |
| system software such as @code{libz} (especially those already linked |
| into @R{}). In the case in point, entry point @code{gzgets} was |
| sometimes resolved against the old version compiled into the package, |
| sometimes against the copy compiled into @R{} and sometimes against the |
| system dynamic library. The only safe solution is to rename the entry |
| points in the copy in the package. We have even seen problems with |
| entry point name @code{myprintf}, which is a system entry |
| point@footnote{in @file{libselinux}.} on some Linux systems. |
| |
| @item |
| Conflicts between symbols in DLLs are handled in very platform-specific |
| ways. Good ways to avoid trouble are to make as many symbols as |
| possible static (check with @code{nm -pg}), and to use names which are |
| clearly tied to your package (which also helps users if anything does go |
| wrong). Note that symbol names starting with @code{R_} are regarded as |
| part of @R{}'s namespace and should not be used in packages. |
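| |
| For example, in the following C sketch only one function is meant to |
| be visible outside the DLL. The prefix @code{mypkg_} stands for the |
| package name; it and the function names are hypothetical. |
| |
| @example |
| /* Internal helper: 'static' gives it internal linkage, so it is not |
|    an external symbol and cannot clash with a 'square' defined in |
|    another DLL or in R itself. */ |
| static double square(double x) |
| @{ |
|     return x * x; |
| @} |
| |
| /* Visible entry point: the name is tied to the package and does not |
|    start with R_ (a prefix regarded as part of R's namespace). */ |
| double mypkg_sum_squares(const double *x, int n) |
| @{ |
|     double total = 0.0; |
|     for (int i = 0; i < n; i++) |
|         total += square(x[i]); |
|     return total; |
| @} |
| @end example |
| |
| Running @code{nm -pg} on the resulting shared object should then list |
| @code{mypkg_sum_squares} but not @code{square}. |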
| |
| @item |
| It is good practice for DLLs to register their symbols |
| (@pxref{Registering native routines}), restrict visibility |
| (@pxref{Controlling visibility}) and not allow symbol search |
| (@pxref{Registering native routines}). It should be possible for a DLL |
| to have only one visible symbol, @code{R_init_@var{pkgname}}, on |
| suitable platforms@footnote{At least Linux and Windows, but not macOS.}, |
| which would completely avoid symbol conflicts. |
| |
| @item |
| It is not portable to call compiled code in @R{} or other packages |
| @emph{via} @code{.Internal}, @code{.C}, @code{.Fortran}, @code{.Call} or |
| @code{.External}, since such interfaces are subject to change without |
| notice and will probably result in your code terminating the @R{} |
| process. |
| |
| @item |
| Do not use (hard or symbolic) file links in your package sources. |
| Where possible @command{R CMD build} will replace them by copies. |
| |
| @item |
| If you do not yourself have a Windows system, consider submitting your |
| source package to WinBuilder (@uref{https://win-builder.r-project.org/}) |
| before distribution. |
| |
| @item |
| It is bad practice for package code to alter the search path using |
| @code{library}, @code{require} or @code{attach} and this often does not |
| work as intended. For alternatives, see @ref{Suggested packages} and |
| @code{with}. |
| |
| @item |
| Examples can be run interactively @emph{via} @code{example} as well as |
| in batch mode when checking. So they should behave appropriately in |
| both scenarios, making the parts which need an operator or observer |
| conditional on @code{interactive()}. For instance, progress |
| bars@footnote{except perhaps the simplest kind as used by |
| @code{download.file()} in non-interactive use.} are only appropriate in |
| interactive use, as is displaying help pages or calling @code{View()} |
| (see below). |
| |
| @item |
| Be careful with the order of entries in macros such as @code{PKG_LIBS}. |
| Some linkers will re-order the entries, and behaviour can differ between |
| dynamic and static libraries. Generally @option{-L} options should |
| precede@footnote{Whereas the GNU linker reorders so @option{-L} options |
| are processed first, the Solaris one does not.} the libraries (typically |
| specified by @option{-l} options) to be found from those directories, |
| and libraries are searched once in the order they are specified. Not |
| all linkers allow a space after @option{-L}. |
| |
| @item |
| Care is needed with the use of @code{LinkingTo}. This puts one or more |
| directories on the include search path ahead of system headers but |
| (prior to @R{} 3.4.0) after those specified in the @code{CPPFLAGS} macro |
| of the @R{} build (which normally includes @code{-I/usr/local/include}, |
| but most platforms ignore that and include it with the system headers). |
| |
| Any confusion would be avoided by having @code{LinkingTo} headers in a |
| directory named after the package. In any case, name conflicts of |
| headers and directories under package @file{include} directories should |
| be avoided, both between packages and between a package and system and |
| third-party software. |
| |
| @item |
| The @command{ar} utility is often used in makefiles to make static |
| libraries. Its modifier @code{u} is defined by POSIX but is disabled in |
| GNU @command{ar} on some recent Linux distributions which use |
| `deterministic mode'. The safest way to make a static library is to first |
| remove any existing file of that name, then use @command{ar -cr} and then |
| @command{ranlib} if needed (which is system-dependent: on most |
| systems@footnote{some versions of macOS did not.} @command{ar} always |
| maintains a symbol table). The POSIX standard says options should be |
| preceded by a hyphen (as in @option{-cr}), although most OSes accept |
| them without. |
| @c flowWorkspace failed on macOS in Mar 2016 because a wildcard spec was empty |
| Note that on some systems @command{ar -cr} must have at least one file |
| specified. |
| |
| @item |
| Some people need to set a locale. Locale names are not portable, |
| and e.g.@: @samp{fr_FR.utf8} is commonly used on Linux but not accepted on |
| either Solaris or macOS. @samp{fr_FR.UTF-8} is more portable, being |
| accepted on recent Linux, AIX, FreeBSD, macOS and Solaris (at least). |
| However, some Linux distributions micro-package, so locales defined by |
| @pkg{glibc} (including these examples) may not be installed. |
| |
| @item |
| Avoid spaces in file names, not least as they can cause difficulties for |
| external tools. A recent example was a package with a @CRANpkg{knitr} |
| vignette that used spaces in plot names: this caused some versions of |
| @command{pandoc} to fail with a baffling error message. |
| @c msmtools in June 2016 failed with pandoc 1.12 but not 1.16. |
| |
| Non-ASCII filenames can also cause problems (particularly in non-UTF-8 |
| locales). |
| |
| @item |
| Make sure that any version requirement for Java code is both declared in |
| the @samp{SystemRequirements} field@footnote{If a Java interpreter is |
| required directly (not @emph{via} @CRANpkg{rJava}) this must be declared |
| and its presence tested like any other external command.} and tested at |
| runtime (not least as the Java installation when the package is |
| installed might not be the same as when the package is run and will not |
| be for binary packages). Java 8 is available for fewer platforms than |
| Java 7, and Java 11 for fewer still (at the time of writing, only |
| @cputype{x86_64} Linux, macOS, 64-bit Windows and 64-bit Solaris 11 from |
| Oracle). |
| |
| When specifying a minimum Java version please use the official version |
| names, which are (confusingly) |
| @example |
| 1.1 1.2 1.3 1.4 5.0 6 7 8 9 10 11 12 13 |
| @end example |
| @noindent |
| and as from 2018 a year.month scheme such as @samp{18.3} is also in use. |
| |
| A suitable test for Java at least version 8 for packages using |
| @CRANpkg{rJava} would be something like |
| @example |
| .jinit() |
| jv <- .jcall("java/lang/System", "S", "getProperty", "java.runtime.version") |
| if(substr(jv, 1L, 2L) == "1.") @{ |
| jvn <- as.numeric(paste0(strsplit(jv, "[.]")[[1L]][1:2], collapse = ".")) |
| if(jvn < 1.8) stop("Java >= 8 is needed for this package but not available") |
| @} |
| @end example |
| @noindent |
| Java 9 changed the format of this string (which used to be something |
| like @samp{1.8.0_162-b12}); Java 11 gives @code{jv} as @samp{11+28} |
| whereas Java 10.0.2 gave |
| @samp{10.0.2+10}. (@uref{http://openjdk.java.net/jeps/322} details the |
| current scheme. Note that it is necessary to allow for pre-releases |
| like @samp{11-ea+22}.) |
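| |
| In a package this check would normally be done in @R{} as above; purely |
| to make the two formats concrete, the same parsing logic is sketched |
| below in C. The function name is hypothetical, and the sketch extracts |
| only the leading (feature-release) number, ignoring update and build |
| components. |
| |

```c
#include <stdlib.h>

/* Return the major Java version encoded in a java.runtime.version
   string, or -1 if it cannot be parsed.  Handles the legacy "1.x" form
   (e.g. "1.8.0_162-b12") and the form used from Java 9 onwards
   (e.g. "11+28", "10.0.2+10", "11-ea+22"). */
int java_major_version(const char *jv)
{
    char *end;
    long major;

    if (jv == NULL)
        return -1;
    major = strtol(jv, &end, 10);
    if (end == jv)
        return -1;                 /* no leading digits */
    if (major == 1 && *end == '.') {
        /* legacy scheme: the second component is the version */
        const char *p = end + 1;
        major = strtol(p, &end, 10);
        if (end == p)
            return -1;
    }
    return (int) major;
}
```

| |
| Applied to the strings quoted above, it returns 8 for |
| @samp{1.8.0_162-b12}, 10 for @samp{10.0.2+10} and 11 for both |
| @samp{11+28} and @samp{11-ea+22}. |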
| |
| Note too that the compiler used to produce a @code{jar} can impose a minimum |
| Java version, often resulting in an arcane message like |
| |
| @example |
| java.lang.UnsupportedClassVersionError: ... Unsupported major.minor version 52.0 |
| @end example |
| @noindent |
| (Where @uref{https://en.wikipedia.org/@/wiki/@/Java_class_file} maps |
| class-file version numbers to Java versions.) Compile with something |
| like @command{javac -target 1.6} to ensure this is avoided. (As from |
| Java 8, @command{javac} defaults to compiling for Java 8. Versions as |
| old as @samp{1.6} are already deprecated and will give a warning with |
| Java 10's @command{javac}.) Note this also applies to packages |
| distributing (or even downloading) compiled Java code produced by |
| others, so their requirements need to be checked (they are often not |
| documented accurately) and accounted for. It should be possible to |
| check the class-file version @emph{via} command-line utility |
| @command{javap}, if necessary after extracting the @file{.class} files |
| from a @file{.jar} archive. |
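| |
| The class-file version can also be read directly: a @file{.class} file |
| starts with the four bytes 0xCAFEBABE followed by big-endian 16-bit |
| minor and major version numbers, and from major version 49 (Java 5) |
| onwards the Java release is the major version minus 44, so the |
| @samp{52.0} in the message above means Java 8. A minimal C sketch, with |
| hypothetical function names, operating on the first bytes of such a |
| file: |
| |

```c
#include <stddef.h>

/* Return the class-file major version from the first bytes of a .class
   file, or -1 if the buffer is too short or lacks the 0xCAFEBABE magic.
   Header layout: u4 magic, u2 minor_version, u2 major_version, all
   big-endian. */
int class_major_version(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < 8)
        return -1;
    if (buf[0] != 0xCA || buf[1] != 0xFE || buf[2] != 0xBA || buf[3] != 0xBE)
        return -1;
    return (buf[6] << 8) | buf[7];
}

/* Map a major version of 49 or more to its Java release (49 -> 5,
   52 -> 8, 55 -> 11).  Versions 45 to 48 correspond to Java 1.1 to 1.4
   and are reported as 0 here. */
int java_release_for_class_version(int major)
{
    return major >= 49 ? major - 44 : 0;
}
```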
| |
| Some packages have stated a requirement on a particular JDK, but a |
| package should only be requiring a JRE unless providing its own Java |
| interface. |
| |
| @item |
| A package with a hard-to-satisfy system requirement is by definition not |
| portable, annoyingly so if this is not declared in the |
| @samp{SystemRequirements} field. The most common example is the use of |
| @command{pandoc}, which is only available for a very limited range of |
| platforms (and has onerous requirements to install from source) and has |
| capabilities@footnote{For example, the ability to handle @samp{https://} |
| URLs, which even the build in some major Linux distributions in 2018 did |
| not possess. Further, Linux and macOS builds from late 2017 with |
| @samp{https://} support were unable to download from some sites using or |
| redirecting to @samp{https://} URLs.} that vary by build but are not |
| documented. |
| |
| Usage of external commands should always be conditional on a test for |
| presence (perhaps using @code{Sys.which}), as well as declared in the |
| @samp{SystemRequirements} field. A package should pass its checks |
| without warnings or errors when the external command is not present. |
| |
| An external command may be a (possibly optional) requirement of an |
| imported or suggested package but also be needed for examples, tests or |
| vignettes in the package itself. Such usages should always be declared |
| and conditional. |
| |
| Interpreters for scripting languages such as Perl, Python and Ruby need |
| to be declared as system requirements and used conditionally: for |
| example macOS 10.16 has been announced not to have them. Python 2 has |
| reached end-of-life and has been removed from some major distributions. This |
| applies also to a Java interpreter (which macOS does not have by default). |
| |
| @item |
| Be sure to use portable encoding names: @code{utf8}, @code{mac} and |
| @code{macroman} are not portable. See the help for @code{file} for more details. |
| |
| |
| @item |
| Do not invoke @R{} by plain @command{R}, @command{Rscript} or (on |
| Windows) @command{Rterm} in your examples, tests, vignettes, makefiles |
| or other scripts. As pointed out in several places earlier in this |
| manual, use something like |
| @example |
| "$(R_HOME)/bin/Rscript" |
| "$(R_HOME)/bin$(R_ARCH_BIN)/Rterm" |
| @end example |
| @noindent |
| with appropriate quotes, since @env{R_HOME} can contain spaces |
| (although that is not recommended). |
| |
| @item |
| Do not use @env{R_HOME} in makefiles except when passing it to the shell. |
| Specifically, do not use @env{R_HOME} in the argument to @code{include}, |
| as @env{R_HOME} can contain spaces. Quoting the argument to @code{include} |
| does not help. GNU @command{make}'s @code{include} accepts spaces when |
| escaped using backslashes (GNU @command{make} syntax required): |
| |
| @example |
| ## WARNING: requires GNU make (allowed on Windows) |
| sp = |
| sp += |
| sq = $(subst $(sp),\ ,$1) |
| include $(call sq,$@{R_HOME@}/etc$@{R_ARCH@}/Makeconf) |
| @end example |
| |
| The portable and recommended way to avoid the problem of spaces in |
| @code{$@{R_HOME@}} is to use the @option{-f} option of @command{make}. |
| This is easy to do with a recursive invocation of @command{make}, which |
| is also the only common situation in which @env{R_HOME} is needed in the |
| argument to @code{include}. |
| |
| @example |
| $(MAKE) -f "$@{R_HOME@}/etc$@{R_ARCH@}/Makeconf" -f Makefile.inner |
| @end example |
| @end itemize |
| |
| |
| Do be careful in what your tests (and examples) actually test. Bad |
| practices seen in distributed packages include: |
| |
| @itemize |
| |
| @item |
| It is not reasonable to test the time taken by a command: you cannot |
| know how fast or how heavily loaded an @R{} platform might be. At best |
| you can test a ratio of times, and even that is fraught with |
| difficulties and not advisable: the just-in-time (JIT) compiler and the |
| garbage collector (GC) may run at unpredictable times, following |
| heuristics that may change without notice. |
| |
| @item |
| Do not test the exact format of @R{} messages (from @R{} itself or from |
| other packages): they change, and they can be translated. |
| |
| Packages have even tested the exact format of system error messages, |
| which are platform-dependent and perhaps locale-dependent. |
| |
| @item |
| If you use functions such as @code{View}, remember that in testing there |
| is no one to look at the output. It is better to use something like one of |
| @example |
| if(interactive()) View(obj) else print(head(obj)) |
| if(interactive()) View(obj) else str(obj) |
| @end example |
| |
| @item |
| Be careful when comparing file paths. There can be multiple paths to a |
| single file, and some of these can be very long character strings. If |
| possible canonicalize paths before comparisons, but study |
| @code{?normalizePath} to be aware of the pitfalls. |
| |
| @item |
| Only test the accuracy of results if you have done a formal error |
| analysis. Things such as checking that probabilities numerically sum to |
| one are silly: numerical tests should always have a tolerance. That the |
| tests on your platform achieve a particular tolerance says little about |
| other platforms. @R{} is configured by default to make use of long |
| doubles where available, but they may not be available or be too slow |
| for routine use. Most @R{} platforms use @cputype{ix86} or |
| @cputype{x86_64} CPUs: these may use extended precision registers on some |
| but not all of their FPU instructions. Thus the achieved precision can |
| depend on the compiler version and optimization flags---our experience |
| is that 32-bit builds tend to be less precise than 64-bit ones. But not |
| all platforms use those CPUs, and not all@footnote{Not doing so is the |
| default on Windows, overridden for the @R{} executables. It is also the |
| default on some Solaris compilers.} which use them configure them to |
| allow the use of extended precision. In particular, ARM CPUs do not |
| (currently) have extended precision or long doubles, and long double |
| was 64-bit on HP/PA Linux. |
| |
| If you must try to establish a tolerance empirically, configure and |
| build @R{} with @option{--disable-long-double} and use appropriate |
| compiler flags (such as @option{-ffloat-store} and |
| @option{-fexcess-precision=standard} for @command{gcc}, depending on the |
| CPU type@footnote{These are not needed for the default compiler settings |
| on @cputype{x86_64} but are likely to be needed on @cputype{ix86}.}) to |
| mitigate the effects of extended-precision calculations. |
| |
| Tests which involve random inputs or non-deterministic algorithms should |
| normally set a seed or be tested for many seeds. |
| |
| @end itemize |
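| |
| As a sketch of a numerical test with a tolerance, assuming hypothetical |
| helper names and illustrative tolerance values, the C code below |
| compares with a combined relative and absolute tolerance rather than |
| with @code{==}. Summing ten copies of 0.1 in double precision typically |
| gives a value slightly different from 1, which is exactly the situation |
| an exact comparison gets wrong. |
| |

```c
#include <math.h>
#include <stdbool.h>

/* True if a and b agree to within abs_tol, or to within rel_tol relative
   to the larger magnitude.  The tolerances should come from an error
   analysis, not from what one platform happens to achieve. */
bool nearly_equal(double a, double b, double rel_tol, double abs_tol)
{
    double diff = fabs(a - b);
    double fa = fabs(a), fb = fabs(b);
    double scale = fa > fb ? fa : fb;
    return diff <= abs_tol || diff <= rel_tol * scale;
}

/* Sum n probabilities in order. */
double sum_probs(const double *p, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += p[i];
    return s;
}
```

| |
| A relative tolerance alone fails for values near zero, hence the |
| absolute term. |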
| |
| @node PDF size, Check timing, Writing portable packages, Writing portable packages |
| @subsection PDF size |
| |
| There are several tools available to reduce the size of PDF files: |
| often the size can be reduced substantially with no or minimal loss in |
| quality. Not only do large files take up space: they can stress the PDF |
| viewer and take many minutes to print (if they can be printed at all). |
| |
| @command{qpdf} (@uref{http://qpdf.sourceforge.net/}) can compress |
| losslessly. It is fairly readily available (e.g.@: it has binaries for |
| Windows and packages in Debian/Ubuntu/Fedora, and is installed as part |
| of the @acronym{CRAN} macOS distribution of @R{}). @command{R CMD |
| build} has an option to run @command{qpdf} over PDF files under |
| @file{inst/doc} and replace them if at least 10Kb and 10% is saved. The |
| full path to the @command{qpdf} command can be supplied as environment |
| variable @env{R_QPDF} (and is on the @acronym{CRAN} binary of @R{} for |
| macOS). It seems MiKTeX does not use PDF object compression and so |
| @command{qpdf} can considerably reduce the size of the files it outputs: MiKTeX's |
| defaults can be overridden by code in the preamble of an Sweave or |
| @LaTeX{} file --- see how this is done for the @R{} reference manual at |
| @uref{https://svn.r-project.org/@/R/@/trunk/@/doc/@/manual/@/refman.top}. |
| (Although earlier versions of @command{qpdf} are supported, versions |
| 6.0.0 and later in some cases achieve considerably better compression.) |
| |
| Other tools can reduce the size of PDFs containing bitmap images at |
| excessively high resolution. These are often best re-generated (for |
| example @code{Sweave} defaults to 300 ppi, and 100--150 is more |
| appropriate for a package manual). These tools include Adobe Acrobat |
| (not Reader), Apple's Preview@footnote{Select `Save as', and select |
| `Reduce file size' from the `Quartz filter' menu: this can be accessed |
| in other ways, for example by Automator.} and Ghostscript, which |
| converts PDF to PDF by |
| |
| @example |
| ps2pdf @var{options} -dAutoRotatePages=/None -dPrinted=false @var{in}.pdf @var{out}.pdf |
| @end example |
| |
| @noindent |
| where suitable options might be |
| |
| @example |
| -dPDFSETTINGS=/ebook |
| -dPDFSETTINGS=/screen |
| @end example |
| |
| @noindent |
| See @uref{http://www.ghostscript.com/@/doc/@/current/@/Ps2pdf.htm} for |
| more such options, and consider all the options for image downsampling. There |
| have been examples in @acronym{CRAN} packages for which current versions |
| of Ghostscript produced much bigger reductions than earlier ones. |
| |
| We occasionally come across large PDF files containing excessively |
| complicated figures using PDF vector graphics: such figures are often |
| best redesigned or, failing that, output as PNG files. |
| |
| Option @option{--compact-vignettes} to @command{R CMD build} defaults to |
| value @samp{qpdf}: use @samp{both} to try harder to reduce the size, |
| provided you have Ghostscript available (see the help for |
| @code{tools::compactPDF}). |
| |
| @node Check timing, Encoding issues, PDF size, Writing portable packages |
| @subsection Check timing |
| |
| There are several ways to find out where time is being spent in the |
| check process. Start by setting the environment variable |
| @env{_R_CHECK_TIMINGS_} to @samp{0}. This will report the total CPU |
| times (not on Windows) and elapsed times for installation and running |
| examples, tests and vignettes, under each sub-architecture if |
| appropriate. For tests and vignettes, it reports the time for each as |
| well as the total. |
| |
| Setting @env{_R_CHECK_TIMINGS_} to a positive value sets a threshold (in |
| seconds elapsed time) for reporting timings. |
| |
| If you need to look in more detail at the timings for examples, use |
| option @option{--timings} to @command{R CMD check} (this is set by |
| @option{--as-cran}). This adds a summary to the check output for all |
| the examples with CPU or elapsed time of more than 5 seconds. It |
| produces a file @file{@var{mypkg}.Rcheck/@var{mypkg}-Ex.timings} |
| containing timings for each help file: it is a tab-delimited file which |
| can be read into @R{} for further analysis. |
| |
| Timings for the tests and vignette runs are given at the bottom of the |
| corresponding log file: note that log files for successful vignette runs |
| are only retained if environment variable |
| @env{_R_CHECK_ALWAYS_LOG_VIGNETTE_OUTPUT_} is set to a true value. |
| |
| |
| @node Encoding issues, Portable C and C++ code, Check timing, Writing portable packages |
| @subsection Encoding issues |
| |
| Care is needed if your package contains non-@acronym{ASCII} text, and in |
| particular if it is intended to be used in more than one locale. It is |
| possible to mark the encoding used in the @file{DESCRIPTION} file and in |
| @file{.Rd} files, as discussed elsewhere in this manual. |
| |
| First, consider carefully if you really need non-@acronym{ASCII} text. |
| Many users of @R{} will only be able to view text correctly in their |
| native language group (e.g.@: Western European, Eastern European, |
| Simplified Chinese) and @acronym{ASCII}.@footnote{except perhaps some |
| special characters such as backslash and hash which may be taken over |
| for currency symbols.} Other characters may not be rendered at all, |
| rendered incorrectly, or cause your @R{} code to give an error. For |
| @file{.Rd} documentation, marking the encoding and including |
| @acronym{ASCII} transliterations is likely to do a reasonable job. The |
| set of characters which is commonly supported is wider than it used to |
| be around 2000, but non-Latin alphabets (Greek, Russian, Georgian, |
| @dots{}) are still often problematic and those with double-width |
| characters (Chinese, Japanese, Korean, emoji) often need specialist |
| fonts to render correctly. |
| |
| Several @acronym{CRAN} packages have messages in their @R{} code in French (and a |
| few in German). A better way to tackle this is to use the |
| internationalization facilities discussed elsewhere in this manual. |
| |
| Function @code{showNonASCIIfile} in package @pkg{tools} can help in |
| finding non-@acronym{ASCII} bytes in files. |
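| |
| The idea behind such a check is simple enough to sketch in C: report |
| every byte outside the @acronym{ASCII} range 0x00 to 0x7F together with |
| its position. This illustrates the approach under that assumption, with |
| a hypothetical function name, and is not the code of |
| @code{showNonASCIIfile}; a multi-byte UTF-8 character is reported once |
| per byte. |
| |

```c
#include <stdio.h>

/* Report the line and column (counted in bytes) of every byte outside
   the ASCII range 0x00-0x7F in a NUL-terminated buffer, writing to out
   unless it is NULL.  Returns the number of such bytes found. */
int report_non_ascii(const char *text, FILE *out)
{
    int line = 1, col = 1, found = 0;
    for (const unsigned char *p = (const unsigned char *) text; *p; p++) {
        if (*p > 0x7F) {
            found++;
            if (out != NULL)
                fprintf(out, "line %d, col %d: byte 0x%02X\n", line, col, *p);
        }
        if (*p == '\n') {
            line++;
            col = 1;
        } else {
            col++;
        }
    }
    return found;
}
```

| |
| For instance, a right single quote typed in @samp{don't} is stored in |
| UTF-8 as the three bytes 0xE2 0x80 0x99, all of which are reported. |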
| |
| There is a portable way to have arbitrary text in character strings |
| (only) in your @R{} code, which is to supply them in Unicode as |
| @samp{\uxxxx} escapes. If there are any characters not in the current |
| encoding the parser will encode the character string as UTF-8 and mark |
| it as such. This applies also to character strings in datasets: they |
| can be prepared using @samp{\uxxxx} escapes or encoded in UTF-8 in a |
| UTF-8 locale, or even converted to UTF-8 @emph{via} @code{iconv()}. If |
| you do this, make sure you have @samp{R (>= 2.10)} (or later) in the |
| @samp{Depends} field of the @file{DESCRIPTION} file. |
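| |
| For reference, the bytes stored for such a string follow the standard |
| UTF-8 encoding of each code point, sketched below in C (the function |
| name is hypothetical; this is not @R{}'s parser code). For example, |
| @samp{\u00e9} becomes the two bytes 0xC3 0xA9 and @samp{\u20ac} the |
| three bytes 0xE2 0x82 0xAC. |
| |

```c
#include <stddef.h>

/* Encode a Unicode code point (as written in an R "\uxxxx" escape) as
   UTF-8.  Writes 1 to 4 bytes to out and returns the count, or 0 for a
   value that is not a Unicode scalar value (surrogates, > 0x10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                  /* UTF-16 surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```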
| |
| @R{} sessions running in non-UTF-8 locales will if possible re-encode |
| such strings for display (and this is done by @command{RGui} on Windows, |
| for example). Suitable fonts will need to be selected or made |
| available@footnote{Typically on a Unix-alike this is done by telling |
| @command{fontconfig} where to find suitable fonts to select glyphs |
| from.} both for the console/terminal and graphics devices such as |
| @samp{X11()} and @samp{windows()}. Using @samp{postscript} or |
| @samp{pdf} will choose a default 8-bit encoding depending on the |
| language of the UTF-8 locale, and your users would need to be told how |
| to select the @samp{encoding} argument. |
| |
| Note that the previous two paragraphs only apply to character strings in |
| @R{} code. Non-ASCII characters are particularly prevalent in comments |
| (in the @R{} code of the package, in examples, tests, vignettes and even |
| in the @file{NAMESPACE} file) but should be avoided there. Most commonly |
| people use the Windows extensions to Latin-1 (often directional single |
| and double quotes, ellipsis, bullet and en and em dashes) which are not |
| supported in strict Latin-1 locales or in CJK locales on Windows. A |
| surprisingly common misuse is to use a right quote in @samp{don't} |
| instead of the correct apostrophe. |
| |
| If you want to run @command{R CMD check} on a Unix-alike over a package |
| that sets a package encoding in its @file{DESCRIPTION} file @emph{and do |
| not use a UTF-8 locale} you may need to specify a suitable locale |
| @emph{via} environment variable @env{R_ENCODING_LOCALES}. The default |
| is equivalent to the value |
| |
| @example |
| "latin1=en_US:latin2=pl_PL:UTF-8=en_US.UTF-8:latin9=fr_FR.iso885915@@euro" |
| @end example |
| |
| @noindent |
| (which is appropriate for a system based on @code{glibc}: macOS requires |
| @code{latin9=fr_FR.ISO8859-15}) except that if the current locale is |
| UTF-8 then the package code is translated to UTF-8 for syntax checking, |
| so it is strongly recommended to check in a UTF-8 locale. |
| |
| @node Portable C and C++ code, Binary distribution, Encoding issues, Writing portable packages |
| @subsection Portable C and C++ code |
| |
| @menu |
| * Common symbols:: |
| @end menu |
| |
| Writing portable C and C++ code is mainly a matter of observing the |
| standards (C99, C++11 or where declared C++98/14/17) and testing that |
| extensions (such as POSIX functions) are supported. |
| |
| @strong{C++ standards}: As from version 3.6.0 (3.6.2 on Windows), @R{} |
| defaults to C++11 where available@footnote{which it is on all known |
| platforms}. However, in earlier versions the default standard was that |
| of the compiler used, often C++98 or C++14. Thus for portability it is |
| desirable to specify the C++ standard@footnote{For C++98 this is only |
| possible since @R{} 3.5.0, for C++11 since @R{} 3.1.0.} assumed for a |
| package. Because most packages will be made available for earlier |
| versions of @R{}, comments below about C++98 have been retained. |
| |
| Note that the `TR1' C++ extensions are not part of any of these |
| standards and the @code{<tr1/@var{name}>} headers are not supplied by some of |
| the compilers used for @R{}, including on macOS. (Use the C++11 |
| versions instead.) |
| |
| Note too that the POSIX standards only require recently-defined |
| functions to be declared if certain macros are defined with large enough |
| values, and on some compiler/OS combinations@footnote{This is seen on |
| Linux, Solaris and FreeBSD, although each has other ways to turn on all |
| extensions, e.g.@: defining @code{_GNU_SOURCE}, @code{__EXTENSIONS__} or |
| @code{_BSD_SOURCE}: the GCC compilers by default define |
| @code{_GNU_SOURCE} unless a strict standard such as @option{-std=c99} is |
| used. On macOS extensions are declared unless one of these macros is |
| given too small a value.} they are not declared otherwise. So you may |
| need to include something like one of@footnote{Solaris 10 does not |
| recognize this value of @code{_POSIX_C_SOURCE}, nor values of |
| @code{_XOPEN_SOURCE} beyond 600 (700 corresponds to POSIX 2008). |
| Further, the value of 500 is not allowed in C99 mode, @R{}'s default for |
| C code.} |
| @example |
| #define _XOPEN_SOURCE 600 |
| @end example |
| @noindent |
| or |
| @example |
| #ifdef __GLIBC__ |
| # define _POSIX_C_SOURCE 200809L |
| #endif |
| @end example |
| @noindent |
| before @emph{any} headers. (@code{strdup} and @code{strncasecmp} are |
| two such functions.) |
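| |
| A minimal sketch which defines @code{_POSIX_C_SOURCE} unconditionally |
| before any header, and so assumes a platform whose headers accept that |
| value (compare the footnote about Solaris 10), might use @code{strdup} |
| and @code{strncasecmp} as follows; the function names are hypothetical. |
| |

```c
/* Must precede every #include so that the system headers see it. */
#define _POSIX_C_SOURCE 200809L

#include <stdlib.h>    /* free, for callers of copy_string */
#include <string.h>    /* strdup, strlen */
#include <strings.h>   /* strncasecmp */

/* 1 if s begins with prefix, ignoring case (in the C locale), else 0. */
int starts_with_ci(const char *s, const char *prefix)
{
    return strncasecmp(s, prefix, strlen(prefix)) == 0;
}

/* A heap copy of s, to be released with free(), or NULL on failure. */
char *copy_string(const char *s)
{
    return strdup(s);
}
```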
| |
| However, some common errors are worth pointing out here. It can be |
| helpful to look up functions at |
| @uref{http://www.cplusplus.com/reference/} or |
| @uref{http://en.cppreference.com/w/} and compare what is defined in the |
| various standards. |
| |
| Both the compiler and OS (@emph{via} system header files, which may |
| differ by architecture even for nominally the same OS) affect the |
| compilability of C/C++ code. Compilers from the GCC, @command{clang}, |
| Intel and Oracle Developer Studio suites are routinely used with @R{}, |
| and both @command{clang} and Oracle have more than one implementation of |
| C++ headers and library. The range of possibilities makes comprehensive |
| empirical checking impossible, and regrettably compilers are patchy at |
| best on warning about non-standard code. |
| |
| @itemize |
| @item |
| Mathematical functions such as @code{sqrt} are defined in C++11 for |
| floating-point arguments: @code{float}, @code{double}, @code{long |
| double} and possibly more. The standard specifies what happens with an |
| argument of integer type but this is not always implemented, resulting |
| in a report of `overloading ambiguity': this is commonly seen on |
| Solaris, but for @code{pow} also seen on macOS. |
| |
| A not-uncommonly-seen problem is to mistakenly call @code{floor(x/y)} or |
| @code{ceil(x/y)} for @code{int} arguments @code{x} and @code{y}. Since |
| @code{x/y} does integer division, the result is of type @code{int} and |
| `overloading ambiguity' may be reported. Some people have (pointlessly) |
| called @code{floor} and @code{ceil} on arguments of integer type, which |
| may have an `overloading ambiguity'. |
| |
| A surprisingly common misuse is code such as @code{pow(10, -3)}: this |
| should be the constant @code{1e-3}. Note that there are constants such |
| as @code{M_SQRT2} defined in @file{Rmath.h}@footnote{often taken from |
| the toolchain's headers.} for @code{sqrt(2.0)}, frequently mis-coded as |
| @code{sqrt(2)}. |
| |
| @item |
| Function @code{fabs} is defined only for floating-point types, except in |
| C++11 which has overloads for @code{std::fabs} in @file{<cmath>} for |
| integer types. Function @code{abs} is defined in C99's |
| @file{<stdlib.h>} for @code{int} and in C++'s @file{<cstdlib>} for |
| integer types, overloaded in @file{<cmath>} for floating-point types. |
| C++11 has additional overloads for @code{std::abs} in @file{<cmath>} for |
| integer types. The effect of calling @code{abs} with a floating-point |
| type is implementation-specific: it may truncate to an integer. For |
| clarity and to avoid compiler warnings, use @code{abs} for integer types |
| and @code{fabs} for double values. |
| |
| @item |
| It is an error (and makes little sense, although it has been seen) to call |
| macros/functions @code{isnan}, @code{isinf} and @code{isfinite} for integer |
| arguments: a few compilers give a compilation error. Function |
| @code{finite} is obsolete, and some compilers will warn about its use. |
| |
| @item |
| The GNU C/C++ compilers support a large number of non-portable |
| extensions. For example, @code{INFINITY} (which is a @emph{float} value |
| in C99 and C++11 but not C++98), for which @R{} provides the portable |
| double value @code{R_PosInf} (and @code{R_NegInf} for @code{-INFINITY}). |
| And @code{NAN}@footnote{also part of C++11 and later.} is just one NaN |
| @emph{float} value: for use with @R{}, @code{NA_REAL} is usually what |
| is intended, but @code{R_NaN} is also available. |
| |
| Some (but not all) extensions are listed at |
| @uref{https://gcc.gnu.org/@/onlinedocs/@/gcc/@/C-Extensions.html} and |
| @uref{https://gcc.gnu.org/@/onlinedocs/@/gcc/@/C_002b_002b-Extensions.html}. |
| |
| Other GNU extensions which have bitten package writers are the use of |
| non-portable characters such as @samp{$} in identifiers and use of C++ |
| headers under @file{ext}. |
| |
| The GNU Fortran compiler also supports a large number of non-portable |
| extensions, the most commonly encountered one being |
| @code{ISNAN}@footnote{There is a portable way to do this in Fortran 2003 |
| (@code{ieee_is_nan()} in module @code{ieee_arithmetic}), but ironically |
| that is not supported in the commonly-used versions 4.x of GNU Fortran. |
| A pretty robust alternative is to test @code{if(my_var /= my_var)}.}. |
| Some are listed at |
| @uref{https://gcc.gnu.org/@/onlinedocs/@/gfortran/@/Extensions-implemented-in-GNU-Fortran.html}. |
| One that frequently catches package writers is that it allows |
| out-of-order declarations: in standard-conformant Fortran variables must |
| be declared (explicitly or implicitly) before use in other declarations |
| such as dimensions. |
| |
| @item |
| Including C-style headers in C++ code is not portable. Including the |
| legacy header@footnote{which often is the same as the header included by |
| the C compiler, but some compilers have wrappers for some of the C |
| headers.} @file{math.h} in C++ code may conflict with @file{cmath} which |
| may be included by other headers. This is particularly problematic with |
| C++11 compilers, as functions like @code{sqrt} and @code{isnan} are |
| defined for @code{double} arguments in @file{math.h} and for a range of |
| types including @code{double} in @file{cmath}. Similar issues have been |
| seen for @file{stdlib.h} and @file{cstdlib}. Including the C++ version |
| first used to be a sufficient workaround but for some 2016 compilers |
| only one could be included. |
| |
| @item |
| Be careful to include the headers which declare the functions you use. |
| Some compilers/OSes have system headers which include other system |
| headers although the standards do not require this, so code may compile |
| on such systems and not on others. (A prominent example is the C++ header |
| @code{<random>} which is indirectly included by @code{<algorithm>} by |
| @command{g++}. Another issue is the C header @code{<time.h>} which is |
| included by other headers on Linux and Windows but not on macOS or |
| Solaris.) |
| |
| Note that @code{malloc}, @code{calloc}, @code{realloc} and @code{free} |
| are declared by C99 in the header @file{stdlib.h} and (in the |
| @code{std::} namespace) by C++ header @file{cstdlib}. Some earlier |
| implementations used a header @file{malloc.h}, but that is not portable |
| and does not exist on macOS. |
| |
| This also applies to types such as @code{ssize_t}. The POSIX standards |
| say that it is declared in headers @code{unistd.h} and @code{sys/types.h}, |
| and the latter is often included indirectly by other headers on some |
| but not all systems. |
| |
| Similarly for constants: for example @code{SIZE_MAX} is defined in |
| @code{stdint.h} alongside @code{size_t}. |
| |
| @item |
| For C++ code, be careful to specify namespaces where needed. Many |
| functions are defined by the standards to be in the @code{std} |
| namespace, but @command{g++} also puts many of them in the global |
| namespace. One way to specify namespaces is to use declarations such as |
| @example |
| using std::floor; |
| @end example |
| @noindent |
| but it is usually preferable to use explicit namespace prefixes in the code. |
| |
| Examples seen in @acronym{CRAN} packages include |
| @example |
| abs acos atan bind calloc ceil div exp fabs floor fmod free log malloc |
| memcpy memset pow printf qsort round sin sprintf sqrt strcmp strcpy |
| strerror strlen strncmp strtol tan trunc |
| @end example |
| @noindent |
| This problem is less common than it used to be, but in 2019 |
| @command{clang} did not have @code{bind} in the global namespace. |
| |
| @item |
| @c including clang as from 4.0.0 |
| Some C++ compilers refuse to compile constructs such as |
| @example |
| if(ptr > 0) @{ ....@} |
| @end example |
| @noindent |
| which compares a pointer to the integer @code{0}. This could just use |
| @code{if(ptr)} (pointer addresses cannot be negative), but if needed, |
| pointers can be tested against @code{nullptr} (C++11) or @code{NULL}. |
| |
| @item |
| Macros defined by the compiler/OS can cause problems. Identifiers |
| starting with an underscore followed by an upper-case letter or another |
| underscore are reserved for system macros and should not be used in |
| portable code (including not as guards in C/C++ headers). Other macros, |
| typically upper-case, may be defined by the compiler or system headers |
| and can cause problems. |
| @c http://lists.x.org/archives/xorg-devel/2013-November/038808.html |
| The most common issue involves the names of the Intel CPU registers such |
| as @code{CS}, @code{DS}, @code{ES}, @code{FS}, @code{GS} and @code{SS} |
| (and more with longer abbreviations@footnote{including @code{EAX}, |
| @code{EBP}, @code{EBX}, @code{ECX}, @code{EDI}, @code{EDX}, @code{EFL}, |
| @code{EIP}, @code{ESI} and @code{ESP}.}) defined on i586/x64 Solaris in |
| @file{<sys/regset.h>} and often included indirectly by @file{<stdlib.h>} |
| and other core headers. Further examples are @code{ERR}, @code{VERSION}, |
| @code{LITTLE_ENDIAN}, @code{zero} and @code{I} (which is defined in |
| Solaris' @file{<complex.h>} as a compiler intrinsic for the imaginary |
| unit). Some of these can be avoided by defining @code{_POSIX_C_SOURCE} |
| before including any system headers, but it is better to use only |
| all-upper-case names which have a unique prefix such as the package |
| name. |
| |
| @item |
| @code{typedef}s in OS headers can conflict with those in the package: |
| examples include @code{ulong} on several OSes and @code{index_t} and |
| @code{single} on Solaris. (Note that these may conflict with other uses |
| as identifiers, e.g.@: defining a C++ function called @code{single}.) |
| @c as done by package Emcdf in June 2017. |
| |
| @item |
| If you use OpenMP, check carefully that you have followed the advice in |
| the subsection on @ref{OpenMP support}. In particular, any use of |
| OpenMP in C/C++ code will need to use |
| @example |
| #ifdef _OPENMP |
| # include <omp.h> |
| #endif |
| @end example |
| @noindent |
| Any use of OpenMP functions, e.g.@: @code{omp_set_num_threads}, also |
| needs to be conditioned. |
| |
| And do not hardcode @option{-lgomp}: not only is that specific to the |
| GCC family of compilers, but using the correct linker flag often also |
| sets up the run-time path to the library. |
| |
| @item |
| Package authors commonly assume things are part of C/C++ when they are |
| not: the most common example is POSIX function @code{strdup}. The most |
| common C library on Linux, @code{glibc}, will hide the declarations of |
| such extensions unless a `feature-test macro' is defined @strong{before} |
| (almost) any system header is included. So for @code{strdup} you need |
| @example |
| #define _POSIX_C_SOURCE 200809L |
| ... |
| #include <string.h> |
| ... |
| strdup call(s) |
| @end example |
| @noindent |
| where the appropriate value can be found by @command{man strdup} on |
| Linux. (Use of @code{strncasecmp} is similar.) |
| |
| However, in the modes of @command{gcc} with GNU extensions (which are the |
| default, either @option{-std=gnu99} or @option{-std=gnu11}) enough |
| feature-test macros are in effect that missing declarations are rarely seen. |
| |
| This applies also to constants such as @code{M_PI} and @code{M_LN2}, |
| which are part of the X/Open standard: to use these define |
| @code{_XOPEN_SOURCE} before including any headers, or include the @R{} |
| header @file{Rmath.h}. |
| |
| @item |
| Similarly, package authors commonly assume things are part of C++ when |
| they were introduced in C++11 if at all. Recent examples from |
| @acronym{CRAN} packages include the C99/C++11 functions |
| @example |
| erf expm1 fmin fmax lgamma lround log1p round snprintf strcasecmp trunc |
| @end example |
| @noindent |
| (all of which are in the @code{std} namespace in C++11) and the POSIX |
| functions @code{strdup} and @code{strncasecmp} and constants @code{M_PI} |
| and @code{M_LN2} (see the previous item). @R{} has long provided |
| @code{fmax2}, @code{fmin2}, @code{fround}, @code{ftrunc}, |
| @code{lgammafn} and many of the X/Open constants, declared in header |
| @file{Rmath.h}. Uses of @code{erf} can be replaced by @code{pnorm} (see |
| the @R{} help page for the latter). |
| |
| @item |
| Using @code{alloca} portably is tricky: it is neither an ISO C/C++ nor a |
| POSIX function. An adequately portable preamble is |
| @example |
| #ifdef __GNUC__ |
| /* Includes GCC, clang and Intel compilers */ |
| # undef alloca |
| # define alloca(x) __builtin_alloca((x)) |
| #elif defined(__sun) || defined(_AIX) |
| /* this is necessary (and sufficient) for Solaris 10 and AIX 6: */ |
| # include <alloca.h> |
| #endif |
| @end example |
| |
| @item |
| Compiler writers feel free to implement features from later standards |
| than the one specified, so for example they may implement or warn on |
| C++14 or C++17 features. Portable code will not use such features. It |
| can be hard to know what they are, but the most common warnings are |
| @example |
| 'register' storage class specifier is deprecated and incompatible with C++17 |
| |
| ISO C++11 does not allow conversion from string literal to 'char *' |
| @end example |
| @noindent |
| (where conversion should be to @code{const char *}). Keyword |
| @code{register} was valid in C++98, deprecated in C++11 and |
| removed in C++17. |
| |
| There are quite a lot of other C++98 features deprecated in C++11 and |
| removed in C++17, and @command{clang} 9 and later warn about |
| them. Examples include @code{bind1st}/@code{bind2nd} (use |
| @code{std::bind} or |
| lambdas@footnote{@uref{https://stackoverflow.com/questions/32739018/a-replacement-for-stdbind2nd}}), |
| @code{std::auto_ptr} (replaced by @code{std::unique_ptr}), |
| @code{std::mem_fun_ref} and @code{std::ptr_fun}. |
| |
| @item |
| Be careful about including C headers in C++ code. Issues include |
| @itemize |
| @item |
| Use of the @code{register} storage class specifier (see the previous |
| item). |
| @item |
| The C99 keyword @code{restrict} is not part of@footnote{it is allowed |
| but ignored in system headers.} any C++ standard and is rejected by some |
| C++ compilers. |
| @c but package treatSens attempted to use it. |
| @c http://stackoverflow.com/questions/6434549/does-c11-add-the-c99-restrict-specifier-if-not-why-not |
| @item |
| Inclusion by such headers of C-style headers such as @file{math.h} (see above). |
| @end itemize |
| @noindent |
| The most portable way to interface to other software with a C API is to |
| use C code (which can normally be mixed with C++ code in a package). |
| |
| @item |
| @code{reinterpret_cast} in C++ is not safe for pointers: for example the types |
| may have different alignment requirements. Use @code{memcpy} to copy |
| the contents to a fresh variable of the destination type. |
| @c seen in 2019 for casts to both int32 and double from a byte stream |
| |
| @item |
| Avoid platform-specific code if at all possible, but if you need to test |
| for a platform ensure that all platforms are covered. For example, |
| @code{__unix__} is not defined on all Unix-alikes, in particular not on |
| macOS. A reasonably portable way to condition code for a Unix-alike is |
| @example |
| #if defined (__unix__) || (defined (__APPLE__) && defined (__MACH__)) |
| // Unix-alike code |
| #endif |
| @end example |
| @noindent |
| but |
| @example |
| #ifdef _WIN32 |
| // Windows-specific code |
| #else |
| // Unix-alike code |
| #endif |
| @end example |
| @noindent |
| would be better. For a Unix-alike it is much better to use |
| @command{configure} to test for the functionality needed than to make |
| assumptions about OSes (and people all too frequently forget that @R{} is |
| used on platforms other than Linux, Windows and macOS; some even forget |
| macOS). |
| |
| @end itemize |
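| |
| The advice above on including the right headers can be illustrated by a |
| minimal C sketch, in which the function name @code{alloc_doubles} is |
| hypothetical. It includes @file{stdlib.h} (for @code{malloc}) and |
| @file{stdint.h} (for @code{SIZE_MAX}) explicitly rather than relying on |
| other headers to include them, and uses @code{SIZE_MAX} to reject |
| requests whose size in bytes would overflow @code{size_t}. |
| |
| @verbatim |
| #include <stdint.h>  /* SIZE_MAX */ |
| #include <stdlib.h>  /* malloc, free, size_t */ |
| |
| /* Allocate space for n doubles; return NULL if n * sizeof(double) |
|    would overflow size_t, or if malloc itself fails. */ |
| double *alloc_doubles(size_t n) |
| { |
|     if (n > SIZE_MAX / sizeof(double)) |
|         return NULL; |
|     return malloc(n * sizeof(double)); |
| } |
| @end verbatim |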
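| |
| The advice on macro names might be applied in a package header as |
| follows. This is only a sketch: @samp{MYPKG} stands for a hypothetical |
| package-specific prefix. The include guard does not start with an |
| underscore followed by an upper-case letter, and the constants avoid |
| short names such as @code{ERR} or @code{VERSION} which system headers |
| may already define. |
| |
| @verbatim |
| /* mypkg_utils.h: MYPKG is a hypothetical package-specific prefix. */ |
| #ifndef MYPKG_UTILS_H   /* not _UTILS_H, which is reserved */ |
| #define MYPKG_UTILS_H |
| |
| #define MYPKG_MAX_ITER 100     /* rather than MAX_ITER */ |
| #define MYPKG_ERR_NOCONV 1     /* rather than ERR */ |
| #define MYPKG_VERSION "0.1.0"  /* rather than VERSION */ |
| |
| #endif /* MYPKG_UTILS_H */ |
| @end verbatim |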
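| |
| For the OpenMP item, a minimal C sketch of conditioning both the header |
| and the calls on @code{_OPENMP} follows; the functions @code{n_threads} |
| and @code{sum_vec} are hypothetical. When the compiler has not enabled |
| OpenMP, neither @file{omp.h} nor any @code{omp_*} function is referenced |
| and the loop runs serially. |
| |
| @verbatim |
| #ifdef _OPENMP |
| # include <omp.h> |
| #endif |
| |
| /* Maximum number of threads a parallel region may use: |
|    1 when the compiler did not enable OpenMP. */ |
| int n_threads(void) |
| { |
| #ifdef _OPENMP |
|     return omp_get_max_threads(); |
| #else |
|     return 1; |
| #endif |
| } |
| |
| /* Sum x[0], ..., x[n-1]; the loop is parallelized only under OpenMP. */ |
| double sum_vec(const double *x, int n) |
| { |
|     double s = 0.0; |
| #ifdef _OPENMP |
| #pragma omp parallel for reduction(+:s) |
| #endif |
|     for (int i = 0; i < n; i++) |
|         s += x[i]; |
|     return s; |
| } |
| @end verbatim |
| |
| @noindent |
| The @code{#pragma} is guarded as well, since some compilers warn about |
| unknown pragmas when OpenMP is not enabled. |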
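| |
| The @code{memcpy} technique recommended above for C++ applies equally to |
| pointer casts in C. As a sketch (the function @code{read_double} is |
| hypothetical), the following extracts a @code{double} from an arbitrary |
| offset in a byte buffer, such as one read from a file, without assuming |
| that the offset is suitably aligned for a @code{double}. |
| |
| @verbatim |
| #include <string.h>  /* memcpy, size_t */ |
| |
| /* Return the double stored at buf + offset.  Dereferencing |
|    (const double *)(buf + offset) instead could fault on platforms with |
|    strict alignment; copying the bytes into a local variable cannot. */ |
| double read_double(const unsigned char *buf, size_t offset) |
| { |
|     double value; |
|     memcpy(&value, buf + offset, sizeof value); |
|     return value; |
| } |
| @end verbatim |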
| |
| Some additional information for C++ is available at |
| @uref{http://journal.r-project.org/@/archive/@/2011-2/@/RJournal_2011-2_Plummer.pdf} |
| by Martyn Plummer. |
| |
| @node Common symbols, , Portable C and C++ code, Portable C and C++ code |
| @subsubsection Common symbols |
| |
| Most OSes (including all those commonly used for @R{}) have the concept |
| of `tentative definitions' where global C variables are defined without |
| an initializer. Traditionally the linker resolves all tentative |
| definitions of the same variable in different object files to the same |
| object, or to a non-tentative definition. However, |
| @command{gcc}@tie{}10 has changed its default so that tentative |
| definitions cannot be merged and the linker will give an error if the |
| same variable is defined in more than one object file. To avoid this, |
| all but one of the C source files should declare the variable |
| @code{extern}, which means that any such variables included in header |
| files need to be declared @code{extern}. A commonly used idiom |
| (including by @R{} itself) is to define all global variables as |
| @code{extern} in a header, say @file{globals.h} (and nowhere else), and |
| then in one (and one only) source file use |
| @example |
| #define extern |
| # include "globals.h" |
| #undef extern |
| @end example |
| |
| A cleaner approach is not to have global variables at all, but to place |
| the shared variables (declared @code{static}) in a single file, followed |
| by all the functions which use them: this may result in more efficient |
| code. |
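| |
| A minimal sketch of that approach, in a hypothetical file |
| @file{counter.c}: since @code{call_count} is declared @code{static} it |
| has internal linkage, so it cannot clash with a variable of the same |
| name in another object file, and other files can only reach it |
| @emph{via} the functions defined alongside it. |
| |
| @verbatim |
| /* counter.c: the shared state and every function that uses it. |
|    All names here are hypothetical. */ |
| |
| /* static: internal linkage, not visible to other object files */ |
| static int call_count = 0; |
| |
| void counter_bump(void) |
| { |
|     call_count++; |
| } |
| |
| int counter_get(void) |
| { |
|     return call_count; |
| } |
| @end verbatim |
| |
| @noindent |
| Other source files then need only the prototypes of |
| @code{counter_bump} and @code{counter_get}, typically from a header. |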
| |
| The `modern' behaviour can be seen@footnote{In principle this could |
| depend on the OS, but has been checked on Linux and macOS.} by using |
| compiler flag @option{-fno-common} as part of @samp{CFLAGS} in earlier |
| versions of @command{gcc} and also in @command{clang}. It is |
| recommended that packages be checked with that flag where possible, to |
| ensure portability. |
| |
| This is not pertinent to C++, which does not permit tentative definitions. |
| |
| @node Binary distribution, , Portable C and C++ code, Writing portable packages |
| @subsection Binary distribution |
| |
| If you want to distribute a binary version of a package on Windows or |
| macOS, there are further checks you need to do to check it is portable: |
| it is all too easy to depend on external software on your own machine |
| that other users will not have. |
| |
| For Windows, check what other DLLs your package's DLL depends on |
| (what it `imports' from, in the DLL tools' parlance). A convenient |
| GUI-based tool to do so is `Dependency Walker' |
| (@uref{http://www.dependencywalker.com/}) for both 32-bit and 64-bit |
| DLLs; note that this will report links to @R{}'s own DLLs such as |
| @file{R.dll} and @file{Rblas.dll} as missing. For 32-bit DLLs only, the |
| command-line tool @command{pedump.exe -i} (in @file{Rtools*.exe}) can be |
| used, and for the brave, the @command{objdump} tool in the appropriate |
| toolchain will also reveal what DLLs are imported from. If you use a |
| toolchain other than one provided by the @R{} developers or use your own |
| makefiles, watch out in particular for dependencies on the toolchain's |
| runtime DLLs such as @file{libgfortran}, @file{libstdc++} and |
| @file{libgcc_s}. |
| |
| For macOS, using @code{R CMD otool -L} on the package's shared object(s) |
| in the @file{libs} directory will show what they depend on: watch for |
| any dependencies in @file{/usr/local/lib} or |
| @file{/usr/local/gfortran/lib}, notably @file{libgfortran.?.dylib} and |
| @file{libquadmath.0.dylib}. |
| |
| Many people (including the @acronym{CRAN} package repository) will not |
| accept source packages containing binary files as the latter are a |
| security risk. If you want to distribute a source package which needs |
| external software on Windows or macOS, options include |
| @itemize |
| @item |
| To arrange for installation of the package to download the |
| additional software from a URL, as e.g.@: package @CRANpkg{Cairo} does. |
| |
| @item |
| (For @acronym{CRAN}.) |
| To negotiate with Uwe Ligges to host the additional components on |
| WinBuilder, and write a @file{configure.win} file to install them. |
| |
| @end itemize |
| |
| @noindent |
| Be aware that license requirements will need to be met, so you may need |
| to supply the sources for the additional components (and will need to |
| if your package has a GPL-like license). |
| |
| |
| @node Diagnostic messages, Internationalization, Writing portable packages, Creating R packages |
| @section Diagnostic messages |
| |
| Diagnostic messages can be made available for translation, so it is |
| important to write them in a consistent style. Using the tools |
| described in the next section to extract all the messages can give a |
| useful overview of your consistency (or lack of it). |
| Some guidelines follow. |
| |
| @itemize |
| @item |
| Messages are sentence fragments, and not viewed in isolation. So it is |
| conventional not to capitalize the first word and not to end with a |
| period (or other punctuation). |
| |
| @item |
| Try not to split up messages into small pieces. In C error messages, use |
| a single format string containing all English words in the message. |
| |
| In @R{} error messages, do not construct a message with @code{paste} (such |
| messages will not be translated) but @emph{via} multiple arguments to |
| @code{stop} or @code{warning}, or @emph{via} @code{gettextf}. |
| |
| @item |
| Do not use colloquialisms such as ``can't'' and ``don't''. |
| |
| @item |
| Conventionally single quotation marks are used for quotations such as |
| |
| @example |
| 'ord' must be a positive integer, at most the number of knots |
| @end example |
| |
| @noindent |
| and double quotation marks when referring to an @R{} character string or |
| a class, such as |
| |
| @example |
| 'format' must be "normal" or "short" - using "normal" |
| @end example |
| |
| Since @acronym{ASCII} does not contain directional quotation marks, it |
| is best to use @samp{'} and let the translator (including automatic |
| translation) use directional quotations where available. The range of |
| quotation styles is immense: unfortunately we cannot reproduce them in a |
| portable @code{texinfo} document. But as a taster, some languages use |
| `up' and `down' (comma) quotes rather than left or right quotes, and |
| some use guillemets (and some use what Adobe calls `guillemotleft' to |
| start and others use it to end). |
| |
| In @R{} messages it is also possible to use @code{sQuote} or @code{dQuote} as in |
| |
| @example |
| stop(gettextf("object must be of class %s or %s", |
| dQuote("manova"), dQuote("maov")), |
| domain = NA) |
| @end example |
| |
| @item |
| Occasionally messages need to be singular or plural (and in other |
| languages there may be no such concept or several plural forms -- |
| Slovenian has four). So avoid constructions such as the one once used in |
| @code{library} |
| |
| @example |
| if((length(nopkgs) > 0) && !missing(lib.loc)) @{ |
| if(length(nopkgs) > 1) |
| warning("libraries ", |
| paste(sQuote(nopkgs), collapse = ", "), |
| " contain no packages") |
| else |
| warning("library ", paste(sQuote(nopkgs)), |
| " contains no package") |
| @} |
| @end example |
| |
| @noindent |
| and was replaced by |
| |
| @example |
| if((length(nopkgs) > 0) && !missing(lib.loc)) @{ |
| pkglist <- paste(sQuote(nopkgs), collapse = ", ") |
| msg <- sprintf(ngettext(length(nopkgs), |
| "library %s contains no packages", |
| "libraries %s contain no packages", |
| domain = "R-base"), |
| pkglist) |
| warning(msg, domain=NA) |
| @} |
| @end example |
| |
| @noindent |
| Note that it is much better to have complete clauses as here, since |
| in another language one might need to say |
| `There is no package in library %s' or |
| `There are no packages in libraries %s'. |
| |
| @end itemize |
| |
| @node Internationalization, CITATION files, Diagnostic messages, Creating R packages |
| @section Internationalization |
| |
| There are mechanisms to translate the @R{}- and C-level error and warning |
| messages. These are only available if @R{} is compiled with NLS support |
| (which is requested by @command{configure} option @option{--enable-nls}, |
| the default). |
| |
| The procedures make use of @code{msgfmt} and @code{xgettext}, which are |
| part of @acronym{GNU} @code{gettext}; this will need to be installed: |
| Windows users can find pre-compiled binaries at |
| @uref{https://www.stats.ox.ac.uk/@/pub/@/Rtools/@/goodies/@/gettext-tools.zip}. |
| |
| @menu |
| * C-level messages:: |
| * R messages:: |
| * Preparing translations:: |
| @end menu |
| |
| @node C-level messages, R messages, Internationalization, Internationalization |
| @subsection C-level messages |
| |
| The process of enabling translations is |
| |
| @itemize |
| @item |
| In a header file that will be included in all the C (or C++ or Objective |
| C/C++) files containing messages that should be translated, declare |
| |
| @example |
| #include <R.h> /* to include Rconfig.h */ |
| |
| #ifdef ENABLE_NLS |
| #include <libintl.h> |
| #define _(String) dgettext ("@var{pkg}", String) |
| /* replace @var{pkg} as appropriate */ |
| #else |
| #define _(String) (String) |
| #endif |
| @end example |
| |
| @item |
| For each message that should be translated, wrap it in @code{_(...)}, |
| for example |
| |
| @example |
| error(_("'ord' must be a positive integer")); |
| @end example |
| |
| If you want to use different messages for singular and plural forms, you |
| need to add |
| |
| @example |
| #ifndef ENABLE_NLS |
| #define dngettext(pkg, String, StringP, N) ((N) == 1 ? (String) : (StringP)) |
| #endif |
| @end example |
| |
| @noindent |
| and mark strings by |
| |
| @example |
| dngettext("@var{pkg}", @var{<singular string>}, @var{<plural string>}, n) |
| @end example |
| |
| @item |
| In the package's @file{src} directory run |
| |
| @example |
| xgettext --keyword=_ -o @var{pkg}.pot *.c |
| @end example |
| |
| @end itemize |
| |
| The file @file{src/@var{pkg}.pot} is the template file, and |
| conventionally this is shipped as @file{po/@var{pkg}.pot}. |
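| |
| As an illustration of the steps above, here is a self-contained sketch |
| of translatable C messages. It assumes a hypothetical package |
| @code{mypkg} and uses @code{snprintf} in place of @R{}'s @code{error} or |
| @code{warning} so that it does not need the @R{} headers. When |
| @code{ENABLE_NLS} is not defined, the fallback for @code{dngettext} |
| chooses the singular form only for @code{n == 1}, which is what GNU |
| @code{ngettext} does for English when no translation is found. |
| |
| @verbatim |
| #include <stdio.h>  /* snprintf, size_t */ |
| |
| #ifdef ENABLE_NLS |
| #include <libintl.h> |
| #define _(String) dgettext ("mypkg", String) |
| #else |
| #define _(String) (String) |
| #define dngettext(pkg, String, StringP, N) ((N) == 1 ? (String) : (StringP)) |
| #endif |
| |
| /* Format a (possibly translated) message about n skipped files into |
|    buf, returning what snprintf returns. */ |
| int skipped_msg(char *buf, size_t len, int n) |
| { |
|     return snprintf(buf, len, |
|                     dngettext("mypkg", "%d file was skipped", |
|                               "%d files were skipped", n), |
|                     n); |
| } |
| |
| /* A message without plural forms is simply wrapped in _() */ |
| const char *bad_ord_msg(void) |
| { |
|     return _("'ord' must be a positive integer"); |
| } |
| @end verbatim |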
| |
| @node R messages, Preparing translations, C-level messages, Internationalization |
| @subsection R messages |
| |
| Mechanisms are also available to support the automatic translation of |
| @R{} @code{stop}, @code{warning} and @code{message} messages. They make |
| use of message catalogs in the same way as C-level messages, but using |
| domain @code{R-@var{pkg}} rather than @code{@var{pkg}}. Translation of |
| character strings inside @code{stop}, @code{warning} and @code{message} |
| calls is automatically enabled, as well as other messages enclosed in |
| calls to @code{gettext} or @code{gettextf}. (To suppress this, use |
| argument @code{domain=NA}.) |
| |
| Tools to prepare the @file{R-@var{pkg}.pot} file are provided in package |
| @pkg{tools}: @code{xgettext2pot} will prepare a file from all strings |
| occurring inside @code{gettext}/@code{gettextf}, @code{stop}, |
| @code{warning} and @code{message} calls. Some of these are likely to be |
| spurious and so the file is likely to need manual editing. |
| @code{xgettext} extracts the actual calls and so is more useful when |
| tidying up error messages. |
| |
| The @R{} function @code{ngettext} provides an interface to the C |
| function of the same name: see the example in @ref{Diagnostic messages}. It is |
| safest to use @code{domain="R-@var{pkg}"} explicitly in calls to |
| @code{ngettext}, and necessary for earlier versions of @R{} unless they |
| are calls directly from a function in the package. |
| |
| |
| @node Preparing translations, , R messages, Internationalization |
| @subsection Preparing translations |
| |
| Once the template files have been created, translations can be made. |
| Conventional translations have file extension @file{.po} and are placed |
| in the @file{po} subdirectory of the package with a name that is either |
| @samp{@var{ll}.po} or @samp{R-@var{ll}.po} for translations of the C and @R{} |
| messages respectively to the language with code @samp{@var{ll}}. |
| |
| @ifset UseExternalXrefs |
| @xref{Localization of messages, , Localization of messages, R-admin, |
| R Installation and Administration}, for details of language codes. |
| @end ifset |
| @ifclear UseExternalXrefs |
| See `Localization of messages' in `R Installation and Administration', |
| for details of language codes. |
| @end ifclear |
| |
| There is an @R{} function, @code{update_pkg_po} in package @pkg{tools}, |
| to automate much of the maintenance of message translations. See its |
| help for what it does in detail. |
| |
| If this is called on a package with no existing translations, it creates |
| the directory @file{@var{pkgdir}/po}, creates a template file of @R{} |
| messages, @file{@var{pkgdir}/po/R-@var{pkg}.pot}, within it, creates the |
| @samp{en@@quot} translation and installs that. (The @samp{en@@quot} |
| pseudo-language interprets quotes in their directional forms in suitable |
| (e.g.@: UTF-8) locales.) |
| |
| If the package has C source files in its @file{src} directory |
| that are marked for translation, use |
| |
| @example |
| touch @var{pkgdir}/po/@var{pkg}.pot |
| @end example |
| |
| @noindent |
| to create a dummy template file, then call @code{update_pkg_po} again |
| (this can also be done before it is called for the first time). |
| |
| When translations to new languages are added in the @file{@var{pkgdir}/po} |
| directory, running the same command will check and then |
| install the translations. |
| |
| If the package sources are updated, the same command will update the |
| template files, merge the changes into the translation @file{.po} files |
| and then install the updated translations. You will often see that |
| merging marks translations as `fuzzy' and this is reported in the |
| coverage statistics. As fuzzy translations are @emph{not} used, this is |
| an indication that the translation files need human attention. |
| |
| The merged translations are run through @code{tools::checkPofile} to |
| check that C-style formats are used correctly: if not the mismatches are |
| reported and the broken translations are not installed. |
| |
| This function needs the GNU @command{gettext-tools} installed and on the |
| path: see its help page. |
| |
| |
| @findex CITATION |
| @cindex citation |
| @node CITATION files, Package types, Internationalization, Creating R packages |
| @section CITATION files |
| |
| An installed file named @file{CITATION} will be used by the |
| @code{citation()} function. (It should be in the @file{inst} |
| subdirectory of the package sources.) |
| |
| The @file{CITATION} file is parsed as @R{} code (in the package's |
| declared encoding, or in @acronym{ASCII} if none is declared). If no |
| such file is present, @code{citation} auto-generates citation |
| information from the package @file{DESCRIPTION} metadata, and an example |
| of what that would look like as a @file{CITATION} file can be seen in |
| recommended package @CRANpkg{nlme} (see below): recommended packages |
| @CRANpkg{boot}, @CRANpkg{cluster} and @CRANpkg{mgcv} have further |
| examples. |
| |
| A @file{CITATION} file will contain calls to function @code{bibentry}. |
| |
| Here is that for @CRANpkg{nlme}: |
| |
| @example |
| year <- sub("-.*", "", meta$Date) |
| note <- sprintf("R package version %s", meta$Version) |
| |
| bibentry(bibtype = "Manual", |
| title = "@{nlme@}: Linear and Nonlinear Mixed Effects Models", |
| author = c(person("Jose", "Pinheiro"), |
| person("Douglas", "Bates"), |
| person("Saikat", "DebRoy"), |
| person("Deepayan", "Sarkar"), |
| person("R Core Team")), |
| year = year, |
| note = note, |
| url = "https://CRAN.R-project.org/package=nlme") |
| @end example |
| |
| Note the way that information that may need to be updated is picked up |
| from object @code{meta}, a parsed version of the @file{DESCRIPTION} |
| file: it is tempting to hardcode such information, but it then normally |
| gets outdated. See @code{?bibentry} for further details of the |
| information which can be provided. |
| |
| If a bibentry contains @LaTeX{} markup (e.g., for accented |
| characters or mathematical symbols), it may be necessary to provide a |
| text representation to be used for printing @emph{via} the |
| @code{textVersion} argument to @code{bibentry}. E.g., earlier versions |
| of @CRANpkg{nlme} additionally used |
| |
| @example |
| textVersion = |
| paste0("Jose Pinheiro, Douglas Bates, Saikat DebRoy, ", |
| "Deepayan Sarkar and the R Core Team (", |
| year, |
| "). nlme: Linear and Nonlinear Mixed Effects Models. ", |
| note, ".") |
| @end example |
| |
| The @file{CITATION} file should itself produce no output when |
| @code{source}-d. |
| |
| It is desirable (and essential for @acronym{CRAN}) that the |
| @file{CITATION} file does not contain calls to functions such as |
| @code{packageDescription} which assume the package is installed in a |
| library tree on the package search path. |
| |
| @node Package types, Services, CITATION files, Creating R packages |
| @section Package types |
| |
| The @file{DESCRIPTION} file has an optional field @code{Type} which if |
| missing is assumed to be @samp{Package}, the sort of extension discussed |
| so far in this chapter. Currently one other type is recognized; there |
| also used to be a @samp{Translation} type. |
| |
| @menu |
| * Frontend:: |
| @end menu |
| |
| @node Frontend, , Package types, Package types |
| @subsection Frontend |
| |
| This is a rather general mechanism, designed for adding new front-ends |
| such as the former @pkg{gnomeGUI} package (see the @file{Archive} area on |
| @acronym{CRAN}). If a @file{configure} file is found in the top-level |
| directory of the package it is executed, and then if a @file{Makefile} |
| is found (often generated by @file{configure}), @code{make} is called. |
| If @code{R CMD INSTALL --clean} is used, @code{make clean} is called. No |
| other action is taken. |
| |
| @code{R CMD build} can package up this type of extension, but @code{R |
| CMD check} will check the type and skip it. |
| |
| Many packages of this type need write permission for the @R{} |
| installation directory. |
| |
| @node Services, , Package types, Creating R packages |
| @section Services |
| |
| Several members of the @R{} project have set up services to assist those |
| writing @R{} packages, particularly those intended for public |
| distribution. |
| |
| @uref{https://win-builder.r-project.org, win-builder.r-project.org} |
| offers the automated preparation of (32/64-bit) Windows binaries from |
| well-tested source packages. |
| |
| R-Forge (@uref{https://R-Forge.r-project.org, R-Forge.r-project.org}) and |
| RForge (@uref{https://www.rforge.net, www.rforge.net}) are similar |
| services with similar names. Both provide source-code management |
| through SVN, daily building and checking, mailing lists and a repository |
| that can be accessed @emph{via} @code{install.packages} (they can be |
| selected by @code{setRepositories} and the GUI menus that use it). |
| Package developers can present their work through project websites or |
| news announcements. Mailing lists, forums or wikis give useRs convenient |
| means for discussion and for exchanging information between developers |
| and/or interested useRs. |
| |
| |
| @node Writing R documentation files, Tidying and profiling R code, Creating R packages, Top |
| @chapter Writing R documentation files |
| @cindex Documentation, writing |
| |
| @menu |
| * Rd format:: |
| * Sectioning:: |
| * Marking text:: |
| * Lists and tables:: |
| * Cross-references:: |
| * Mathematics:: |
| * Figures:: |
| * Insertions:: |
| * Indices:: |
| * Platform-specific sections:: |
| * Conditional text:: |
| * Dynamic pages:: |
| * User-defined macros:: |
| * Encoding:: |
| * Processing documentation files:: |
| * Editing Rd files:: |
| @end menu |
| |
| @node Rd format, Sectioning, Writing R documentation files, Writing R documentation files |
| @section Rd format |
| |
| @R{} objects are documented in files written in ``@R{} documentation'' |
| (Rd) format, a simple markup language much of which closely resembles |
| (La)@TeX{}, which can be processed into a variety of formats, |
| including @LaTeX{}, @HTML{} and plain text. The translation is |
| carried out by functions in the @pkg{tools} package called by the |
| script @command{Rdconv} in @file{@var{R_HOME}/bin} and by the |
| installation scripts for packages. |
| |
| @c 1324 as of 2011-01-16 |
| The @R{} distribution contains more than 1300 such files which can be |
| found in the @file{src/library/@var{pkg}/man} directories of the @R{} |
| source tree, where @var{pkg} stands for one of the standard packages |
| which are included in the @R{} distribution. |
| |
| As an example, let us look at a simplified version of |
| @file{src/library/base/man/load.Rd} which documents the @R{} function |
| @code{load}. |
| |
| @quotation |
| @cartouche |
| @smallexample |
| % File src/library/base/man/load.Rd |
| \name@{load@} |
| \alias@{load@} |
| \title@{Reload Saved Datasets@} |
| \description@{ |
| Reload the datasets written to a file with the function |
| \code@{save@}. |
| @} |
| \usage@{ |
| load(file, envir = parent.frame()) |
| @} |
| \arguments@{ |
| \item@{file@}@{a connection or a character string giving the |
| name of the file to load.@} |
| \item@{envir@}@{the environment where the data should be |
| loaded.@} |
| @} |
| \seealso@{ |
| \code@{\link@{save@}@}. |
| @} |
| \examples@{ |
| ## save all data |
| save(list = ls(), file= "all.RData") |
| |
| ## restore the saved values to the current environment |
| load("all.RData") |
| |
| ## restore the saved values to the workspace |
| load("all.RData", .GlobalEnv) |
| @} |
| \keyword@{file@} |
| @end smallexample |
| @end cartouche |
| @end quotation |
| |
| An @file{Rd} file consists of three parts. The header gives basic |
| information about the name of the file, the topics documented, a title, |
| a short textual description and @R{} usage information for the objects |
| documented. The body gives further information (for example, on the |
| function's arguments and return value, as in the above example). |
| Finally, there is an optional footer with keyword information. The |
| header is mandatory. |
| |
| Information is given within a series of @emph{sections} with standard |
| names (and user-defined sections are also allowed). Unless otherwise |
| specified@footnote{e.g.@: @code{\alias}, @code{\keyword} and |
| @code{\note} sections.} these should occur only once in an @file{Rd} |
| file (in any order), and the processing software will retain only the |
| first occurrence of a standard section in the file, with a warning. |
| |
See @uref{https://developer.r-project.org/Rds.html, ``Guidelines for Rd
files''} for advice on writing documentation in @file{Rd} format that
should be useful to package writers.
| @findex prompt |
| The @R{} |
| generic function @code{prompt} is used to construct a bare-bones @file{Rd} |
| file ready for manual editing. Methods are defined for documenting |
| functions (which fill in the proper function and argument names) and |
| data frames. There are also functions @code{promptData}, |
| @code{promptPackage}, @code{promptClass}, and @code{promptMethods} for |
| other types of @file{Rd} file. |
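As a minimal sketch (the function @code{scale01} and the file location
are hypothetical), a skeleton can be produced for a function defined in
the current session. With @code{filename = NA}, @code{prompt} returns
the skeleton as a list instead of writing a file.

@example
## A hypothetical function to be documented
scale01 <- function(x, na.rm = FALSE)
    (x - min(x, na.rm = na.rm)) / diff(range(x, na.rm = na.rm))

## Return the Rd skeleton as a list rather than writing a file
skel <- prompt(scale01, filename = NA)

## Write scale01.Rd to a temporary directory for manual editing
rdfile <- file.path(tempdir(), "scale01.Rd")
prompt(scale01, filename = rdfile)
@end example

The written file then needs editing and moving to the package's
@file{man} directory.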
| |
| The general syntax of @file{Rd} files is summarized below. For a detailed |
| technical discussion of current @file{Rd} syntax, see |
| @uref{https://developer.r-project.org/parseRd.pdf, ``Parsing Rd files''}. |
| |
| @file{Rd} files consist of four types of text input. The most common |
| is @LaTeX{}-like, with the backslash used as a prefix on markup |
| (e.g.@: @code{\alias}), and braces used to indicate arguments |
| (e.g.@: @code{@{load@}}). The least common type of text is `verbatim' |
| text, where no markup other than the comment marker (@code{%}) is |
| processed. There is also a rare variant of `verbatim' text |
| (used in @code{\eqn}, @code{\deqn}, @code{\figure}, |
| and @code{\newcommand}) where comment markers need not be escaped. |
| The final type is @R{}-like, intended for @R{} code, but allowing some |
| embedded macros. Quoted strings within @R{}-like text are handled |
| specially: regular character escapes such as @code{\n} may be entered |
| as-is. Only markup starting with @code{\l} (e.g.@: @code{\link}) or |
| @code{\v} (e.g.@: @code{\var}) will be recognized within quoted strings. |
| The rarely used vertical tab @code{\v} must be entered as @code{\\v}. |
| |
| Each macro defines the input type for its argument. For example, the |
| file initially uses @LaTeX{}-like syntax, and this is also used in the |
| @code{\description} section, but the @code{\usage} section uses |
| @R{}-like syntax, and the @code{\alias} macro uses `verbatim' syntax. |
Comments run from a percent symbol @code{%} to the end of the line (as
on the first line of the @code{load} example) in all types of text
except the rare `verbatim' variant.
| |
Because backslashes, braces and percent symbols have special meaning,
entering them into text sometimes requires escaping them with a
backslash. In general, balanced braces do not need to be escaped, but
percent symbols always do, except in the `verbatim' variant.
| For the complete list of macros and rules for escapes, see |
| @uref{https://developer.r-project.org/parseRd.pdf, ``Parsing Rd files''}. |
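The following fragment (describing a hypothetical function) illustrates
these rules: a comment line, percent signs escaped inside @R{}-like
@code{\code} text, balanced braces left alone, and an unpaired brace
escaped.

@example
% This line is a comment and is ignored by the parser
\description@{
  Returns \code@{x \%\% 2@}, the remainder after division by 2.
  Balanced braces need no escape, as in
  \code@{function(x) @{ x \%\% 2 @}@},
  but an unpaired brace must be written as \code@{\@{@}.
@}
@end example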
| |
| @menu |
| * Documenting functions:: |
| * Documenting data sets:: |
| * Documenting S4 classes and methods:: |
| * Documenting packages:: |
| @end menu |
| |
| @node Documenting functions, Documenting data sets, Rd format, Rd format |
| @subsection Documenting functions |
| |
| The basic markup commands used for documenting @R{} objects (in |
| particular, functions) are given in this subsection. |
| |
| @table @code |
| @item \name@{@var{name}@} |
| @findex \name |
| @var{name} typically@footnote{There can be exceptions: for example |
| @file{Rd} files are not allowed to start with a dot, and have to be |
| uniquely named on a case-insensitive file system.} is the basename of |
| the @file{Rd} file containing the documentation. It is the ``name'' of |
| the @file{Rd} object represented by the file and has to be unique in a |
package. To avoid problems with indexing the package manual, it may not
@c Problems seen in 2.13.x but not 2.14.0
contain @samp{!}, @samp{|} or @samp{@@}, and to avoid possible problems
with the @HTML{} help system it should not contain @samp{/} or a space.
| (@LaTeX{} special characters are allowed, but may not be collated |
| correctly in the index.) There can only be one @code{\name} entry in a |
| file, and it must not contain any markup. Entries in the package manual |
| will be in alphabetic@footnote{in the current locale, and with special |
| treatment for @LaTeX{} special characters and with any |
| @samp{@var{pkgname}-package} topic moved to the top of the list.} order |
| of the @code{\name} entries. |
| |
| @item \alias@{@var{topic}@} |
| @findex \alias |
| The @code{\alias} sections specify all ``topics'' the file documents. |
This information is collected into index databases for lookup by the
| on-line (plain text and @HTML{}) help systems. The @var{topic} can |
| contain spaces, but (for historical reasons) leading and trailing spaces |
| will be stripped. Percent and left brace need to be escaped by |
| a backslash. |
| |
| There may be several @code{\alias} entries. Quite often it is |
| convenient to document several @R{} objects in one file. For example, |
| file @file{Normal.Rd} documents the density, distribution function, |
| quantile function and generation of random variates for the normal |
| distribution, and hence starts with |
| |
| @example |
| @group |
| \name@{Normal@} |
| \alias@{Normal@} |
| \alias@{dnorm@} |
| \alias@{pnorm@} |
| \alias@{qnorm@} |
| \alias@{rnorm@} |
| @end group |
| @end example |
| |
| @noindent |
| Also, it is often convenient to have several different ways to refer to |
| an @R{} object, and an @code{\alias} does not need to be the name of an |
| object. |
| |
Note that the @code{\name} is not automatically a documented topic: if
it should be one, it needs an explicit @code{\alias} entry (as in this
example).
| |
| @item \title@{@var{Title}@} |
| @findex \title |
| Title information for the @file{Rd} file. This should be capitalized |
| and not end in a period; try to limit its length to at most 65 |
| characters for widest compatibility. |
| |
| Markup is supported in the text, but use of characters other than |
| English text and punctuation (e.g., @samp{<}) may limit portability. |
| |
| There must be one (and only one) @code{\title} section in a help file. |
| |
| @item \description@{@dots{}@} |
| @findex \description |
| A short description of what the function(s) do(es) (one paragraph, a few |
| lines only). (If a description is too long and cannot easily be |
| shortened, the file probably tries to document too much at once.) |
| This is mandatory except for package-overview files. |
| |
| @item \usage@{@var{fun}(@var{arg1}, @var{arg2}, @dots{})@} |
| @findex \usage |
| One or more lines showing the synopsis of the function(s) and variables |
| documented in the file. These are set in typewriter font. This is an |
| @R{}-like command. |
| |
| The usage information specified should match the function definition |
| @emph{exactly} (such that automatic checking for consistency between |
| code and documentation is possible). |
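For instance, if a package defines a hypothetical function
@code{scale01(x, na.rm = FALSE)}, its @code{\usage} entry should repeat
the argument list verbatim, including the default value. A mismatch,
such as a different default, is reported by @code{R CMD check}, which
uses @code{tools::codoc()} for this comparison.

@example
% The package's R code (hypothetical):
%   scale01 <- function(x, na.rm = FALSE) ...
\usage@{
scale01(x, na.rm = FALSE)
@}
@end example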
| |
| To indicate that a function can be used in several different ways, |
| depending on the named arguments specified, use section @code{\details}. |
| E.g., @file{abline.Rd} contains |
| |
| @example |
| @group |
| \details@{ |
| Typical usages are |
| \preformatted@{abline(a, b, untf = FALSE, \dots) |
| ...... |
| @} |
| @end group |
| @end example |
| |
| @findex \method |
| Use @code{\method@{@var{generic}@}@{@var{class}@}} to indicate the name |
| of an S3 method for the generic function @var{generic} for objects |
| inheriting from class @code{"@var{class}"}. In the printed versions, |
| this will come out as @var{generic} (reflecting the understanding that |
| methods should not be invoked directly but @emph{via} method dispatch), but |
| @code{codoc()} and other QC tools always have access to the full name. |
| |
| For example, @file{print.ts.Rd} contains |
| |
| @example |
| @group |
| \usage@{ |
| \method@{print@}@{ts@}(x, calendar, \dots) |
| @} |
| @end group |
| @end example |
| |
| @noindent |
| which will print as |
| |
| @example |
| @group |
| Usage: |
| |
| ## S3 method for class 'ts': |
| print(x, calendar, ...) |
| @end group |
| @end example |
| |
| Usage for replacement functions should be given in the style of |
| @code{dim(x) <- value} rather than explicitly indicating the name of the |
| replacement function (@w{@code{"dim<-"}} in the above). Similarly, one |
| can use @code{\method@{@var{generic}@}@{@var{class}@}(@var{arglist}) <- |
| value} to indicate the usage of an S3 replacement method for the generic |
| replacement function @code{"@var{generic}<-"} for objects inheriting |
| from class @code{"@var{class}"}. |
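As a sketch, for a hypothetical replacement generic @code{tag<-} with a
method for class @code{"myclass"}, the usage could be written as below.
Following the rules above, the method line is printed in the same
@code{tag(x) <- value} form, preceded by a comment naming the class.

@example
% Hypothetical R code in the package:
%   "tag<-" <- function(x, value) UseMethod("tag<-")
%   "tag<-.myclass" <- function(x, value) ...
\usage@{
tag(x) <- value
\method@{tag@}@{myclass@}(x) <- value
@}
@end example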
| |
| Usage for S3 methods for extracting or replacing parts of an object, S3 |
| methods for members of the Ops group, and S3 methods for user-defined |
| (binary) infix operators (@samp{%@var{xxx}%}) follows the above rules, |
| using the appropriate function names. E.g., @file{Extract.factor.Rd} |
| contains |
| |
| @example |
| @group |
| \usage@{ |
| \method@{[@}@{factor@}(x, \dots, drop = FALSE) |
| \method@{[[@}@{factor@}(x, \dots) |
| \method@{[@}@{factor@}(x, \dots) <- value |
| @} |
| @end group |
| @end example |
| |
| @noindent |
| which will print as |
| |
| @example |
| @group |
| Usage: |
| |
| ## S3 method for class 'factor': |
| x[..., drop = FALSE] |
| ## S3 method for class 'factor': |
| x[[...]] |
| ## S3 replacement method for class 'factor': |
| x[...] <- value |
| @end group |
| @end example |
| |
| @findex \S3method |
| @code{\S3method} is accepted as an alternative to @code{\method}. |
| |
| @item \arguments@{@dots{}@} |
| @findex \arguments |
| Description of the function's arguments, using an entry of the form |
| |
| @example |
| \item@{@var{arg_i}@}@{@var{Description of arg_i}.@} |
| @end example |
| |
| @noindent for each element of the argument list. (Note that there is |
| no whitespace between the three parts of the entry.) There may be |
| optional text outside the @code{\item} entries, for example to give |
| general information about groups of parameters. |
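For a hypothetical function @code{scale01(x, na.rm = FALSE)}, the
section could read:

@example
\arguments@{
  \item@{x@}@{a numeric vector.@}
  \item@{na.rm@}@{logical: should missing values be removed before
    the range is computed?@}
@}
@end example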
| |
| |
| @item \details@{@dots{}@} |
| @findex \details |
A detailed and, where possible, precise description of the
functionality provided, extending the basic information in the
@code{\description} section.
| |
| @item \value@{@dots{}@} |
| @findex \value |
| Description of the function's return value. |
| |
| If a list with multiple values is returned, you can use entries of the |
| form |
| |
| @example |
| \item@{@var{comp_i}@}@{@var{Description of comp_i}.@} |
| @end example |
| |
| @noindent |
| for each component of the list returned. Optional text may |
| precede@footnote{Text between or after list items is discouraged.} this |
| list (see for example the help for @code{rle}). Note that @code{\value} |
| is implicitly a @code{\describe} environment, so that environment should |
| not be used for listing components, just individual @code{\item@{@}@{@}} |
| entries. |
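A sketch modeled loosely on the help for @code{rle} (not a verbatim
copy of that file), with introductory text followed by one
@code{\item} per component:

@example
\value@{
  An object of class \code@{"rle"@}, a list with components
  \item@{lengths@}@{an integer vector containing the length of each run.@}
  \item@{values@}@{a vector of the same length as \code@{lengths@}
    giving the corresponding values.@}
@}
@end example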
| |
| @item \references@{@dots{}@} |
| @findex \references |
| A section with references to the literature. Use @code{\url@{@}} or |
| @code{\href@{@}@{@}} for web pointers. |
| |
@item \note@{@dots{}@}
| @findex \note |
| Use this for a special note you want to have pointed out. Multiple |
| @code{\note} sections are allowed, but might be confusing to the end users. |
| |
| For example, @file{pie.Rd} contains |
| |
| @example |
| @group |
| \note@{ |
| Pie charts are a very bad way of displaying information. |
| The eye is good at judging linear measures and bad at |
| judging relative areas. |
| ...... |
| @} |
| @end group |
| @end example |
| |
| @item \author@{@dots{}@} |
| @findex \author |
| Information about the author(s) of the @file{Rd} file. Use |
| @code{\email@{@}} without extra delimiters (such as @samp{( )} or |
| @samp{< >}) to specify email addresses, or @code{\url@{@}} or |
| @code{\href@{@}@{@}} for web pointers. |
| |
| @item \seealso@{@dots{}@} |
| @findex \seealso |
| Pointers to related @R{} objects, using @code{\code@{\link@{...@}@}} to |
| refer to them (@code{\code} is the correct markup for @R{} object names, |
| and @code{\link} produces hyperlinks in output formats which support |
| this. @xref{Marking text}, and @ref{Cross-references}). |
| |
| @findex \examples |
| @item \examples@{@dots{}@} |
| Examples of how to use the function. Code in this section is set |
| in typewriter font without reformatting and is run by |
| @code{example()} unless marked otherwise (see below). |
| |
| Examples are not only useful for documentation purposes, but also |
| provide test code used for diagnostic checking of @R{} code. By |
| default, text inside @code{\examples@{@}} will be displayed in the |
| output of the help page and run by @code{example()} and by @code{R CMD |
| check}. You can use @code{\dontrun@{@}} |
| @findex \dontrun |
| for text that should only be shown, but not run, and |
| @code{\dontshow@{@}} |
| @findex \dontshow |
| for extra commands for testing that should not be shown to users, but |
| will be run by @code{example()}. (Previously this was called |
| @code{\testonly}, and that is still accepted.) |
| |
| Text inside @code{\dontrun@{@}} is `verbatim', but the other parts |
| of the @code{\examples} section are @R{}-like text. |
| |
| For example, |
| |
| @example |
| @group |
| x <- runif(10) # @r{Shown and run.} |
| \dontrun@{plot(x)@} # @r{Only shown.} |
| \dontshow@{log(x)@} # @r{Only run.} |
| @end group |
| @end example |
| |
| Thus, example code not included in @code{\dontrun} must be executable! |
| In addition, it should not use any system-specific features or require |
| special facilities (such as Internet access or write permission to |
| specific directories). Text included in @code{\dontrun} is indicated by |
comments in the processed help files: it need not be valid @R{} code, but
| the escapes must still be used for @code{%}, @code{\} and unpaired |
| braces as in other `verbatim' text. |
| |
| Example code must be capable of being run by @code{example}, which uses |
| @code{source}. This means that it should not access @file{stdin}, |
| e.g.@: to @code{scan()} data from the example file. |
| |
| Data needed for making the examples executable can be obtained by random |
| number generation (for example, @code{x <- rnorm(100)}), or by using |
| standard data sets listed by @code{data()} (see @code{?data} for more |
| info). |
| |
Finally, there is @code{\donttest}, used (at the beginning of a separate
line) to mark code that should be run by @code{example()} but, by
default, not by @code{R CMD check} (the option @option{--run-donttest}
makes it run such code too). This should be needed only occasionally,
but can be used for
| code which might fail in circumstances that are hard to test for, for |
| example in some locales. (Use e.g.@: @code{capabilities()} or |
| @code{nzchar(Sys.which("someprogram"))} to test for features needed in |
| the examples wherever possible, and you can also use @code{try()} or |
| @code{tryCatch()}. Use @code{interactive()} to condition examples which |
| need someone to interact with.) Note that code included in |
| @code{\donttest} must be correct @R{} code, and any packages used should |
| be declared in the @file{DESCRIPTION} file. It is good practice to |
| include a comment in the @code{\donttest} section explaining why it is |
| needed. |
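A minimal sketch of an @code{\examples} section combining these
devices; the bootstrap computation merely stands in for code that might
be slow or fragile on a check machine.

@example
\examples@{
x <- rnorm(100)
summary(x)
\donttest@{
## Skipped by R CMD check by default: a (potentially slow) bootstrap
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))
quantile(boot_means, c(0.025, 0.975))
@}
## Only draw when someone is there to look at the plot
if (interactive()) hist(x)
@}
@end example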
| |
| Output from code between comments |
| @example |
| ## IGNORE_RDIFF_BEGIN |
| ## IGNORE_RDIFF_END |
| @end example |
| @noindent |
| is ignored when comparing check output to reference output (a |
| @file{-Ex.Rout.save} file). This markup can also be used for scripts |
| under @file{tests}. |
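For example, output that changes from run to run, such as the current
time or session details, can be bracketed like this inside
@code{\examples} (a sketch):

@example
\examples@{
x <- 1:10
mean(x)
## IGNORE_RDIFF_BEGIN
Sys.time()
sessionInfo()
## IGNORE_RDIFF_END
@}
@end example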
| |
| @findex \keyword |
| @item \keyword@{@var{key}@} |
There can be zero or more @code{\keyword} sections per file: more than
one if the @R{} object being documented falls into more than one
category, or none. Each @code{\keyword} section should specify a single
keyword, preferably one of the standard keywords as listed in file
@file{KEYWORDS} in the @R{} documentation directory (default
@file{@var{R_HOME}/doc}). Use e.g.@: @code{RShowDoc("KEYWORDS")} to
inspect the standard keywords from within @R{}.
| |
| Do strongly consider using @code{\concept} (@pxref{Indices}) instead of |
@code{\keyword} if you are about to use more than a very few non-standard
| keywords. |
| |
| The special keyword @samp{internal} marks a page of internal objects |
| that are not part of the package's API. If the help page for object |
| @code{foo} has keyword @samp{internal}, then @code{help(foo)} gives this |
| help page, but @code{foo} is excluded from several object indices, |
| including the alphabetical list of objects in the @HTML{} help system. |
| |
| @code{help.search()} can search by keyword, including user-defined |
values; however, the `Search Engine & Keywords' @HTML{} page accessed
| @emph{via} @code{help.start()} provides single-click access only to a |
| pre-defined list of keywords. |
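A short sketch of exploring keywords from @R{}: it reads the
@file{KEYWORDS} file directly (assuming it is installed in
@code{R.home("doc")}, as in a standard installation) and then searches
the help pages of package @pkg{datasets} by keyword.

@example
## The standard keywords, read from the R documentation directory
kw <- readLines(file.path(R.home("doc"), "KEYWORDS"))
head(kw)

## Help pages in package datasets carrying the keyword 'datasets'
hs <- help.search(keyword = "datasets", package = "datasets")
@end example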
| @end table |
| |
| |
| @node Documenting data sets, Documenting S4 classes and methods, Documenting functions, Rd format |
| @subsection Documenting data sets |
| |
| The structure of @file{Rd} files which document @R{} data sets is slightly |
| different. Sections such as @code{\arguments} and @code{\value} are not |
| needed but the format and source of the data should be explained. |
| |
| As an example, let us look at @file{src/library/datasets/man/rivers.Rd} |
| which documents the standard @R{} data set @code{rivers}. |
| |
| @quotation |
| @cartouche |
| @smallexample |
| \name@{rivers@} |
| \docType@{data@} |
| \alias@{rivers@} |
| \title@{Lengths of Major North American Rivers@} |
| \description@{ |
| This data set gives the lengths (in miles) of 141 \dQuote@{major@} |
| rivers in North America, as compiled by the US Geological |
| Survey. |
| @} |
| \usage@{rivers@} |
| \format@{A vector containing 141 observations.@} |
| \source@{World Almanac and Book of Facts, 1975, page 406.@} |
| \references@{ |
| McNeil, D. R. (1977) \emph@{Interactive Data Analysis@}. |
| New York: Wiley. |
| @} |
| \keyword@{datasets@} |
| @end smallexample |
| @end cartouche |
| @end quotation |
| |
| This uses the following additional markup commands. |
| |
| @table @code |
@item \docType@{@dots{}@}
@findex \docType
| Indicates the ``type'' of the documentation object. Always @samp{data} |
| for data sets, and @samp{package} for @file{@var{pkg}-package.Rd} |
| overview files. Documentation for S4 methods and classes uses |
| @samp{methods} (from @code{promptMethods()}) and @samp{class} (from |
| @code{promptClass()}). |
| |
| @item \format@{@dots{}@} |
| @findex \format |
| A description of the format of the data set (as a vector, matrix, data |
| frame, time series, @dots{}). For matrices and data frames this should |
| give a description of each column, preferably as a list or table. |
| @xref{Lists and tables}, for more information. |
| |
| @item \source@{@dots{}@} |
| @findex \source |
| Details of the original source (a reference or @acronym{URL}, |
| @pxref{Specifying URLs}). In addition, section @code{\references} could |
| give secondary sources and usages. |
| @end table |
| |
| Note also that when documenting data set @var{bar}, |
| |
| @itemize @bullet |
| @item |
| The @code{\usage} entry is always @code{@var{bar}} or (for packages |
| which do not use lazy-loading of data) @code{data(@var{bar})}. (In |
| particular, only document a @emph{single} data object per @file{Rd} file.) |
| @item |
| The @code{\keyword} entry should always be @samp{datasets}. |
| @end itemize |
| |
| If @code{@var{bar}} is a data frame, documenting it as a data set can |
| be initiated @emph{via} @code{prompt(@var{bar})}. Otherwise, the @code{promptData} |
| function may be used. |
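A minimal sketch with two hypothetical objects: a data frame handled by
@code{prompt} and a plain vector handled by @code{promptData}. Setting
@code{filename = NA} returns the skeletons as lists; by default a file
named after the object, with extension @file{.Rd}, would be written to
the current directory.

@example
## Hypothetical data objects
obs <- data.frame(id = 1:3, value = c(2.5, 3.1, 2.8))
depths <- c(406, 281)

## A data frame is handled by the prompt() method for data frames
skel_df <- prompt(obs, filename = NA)
## Other objects use promptData()
skel_vec <- promptData(depths, filename = NA)
@end example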
| |
| @node Documenting S4 classes and methods, Documenting packages, Documenting data sets, Rd format |
| @subsection Documenting S4 classes and methods |
| |
| There are special ways to use the @samp{?} operator, namely |
| @samp{class?@var{topic}} and @samp{methods?@var{topic}}, to access |
| documentation for S4 classes and methods, respectively. This mechanism |
| depends on conventions for the topic names used in @code{\alias} |
| entries. The topic names for S4 classes and methods respectively are of |
| the form |
| |
| @example |
| @var{class}-class |
| @var{generic},@var{signature_list}-method |
| @end example |
| |
| @noindent |
| where @var{signature_list} contains the names of the classes in the |
| signature of the method (without quotes) separated by @samp{,} (without |
| whitespace), with @samp{ANY} used for arguments without an explicit |
| specification. E.g., @samp{genericFunction-class} is the topic name for |
| documentation for the S4 class @code{"genericFunction"}, and |
| @samp{coerce,ANY,NULL-method} is the topic name for documentation for |
| the S4 method for @code{coerce} for signature @code{c("ANY", "NULL")}. |
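For a hypothetical S4 class @code{"Track"} with a @code{show} method and
a coercion method to @code{"numeric"} (as created by @code{setAs}), the
@code{\alias} entries would follow this convention, so that
@code{class?Track} finds the class documentation:

@example
\alias@{Track-class@}
\alias@{show,Track-method@}
\alias@{coerce,Track,numeric-method@}
@end example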
| |
| Skeletons of documentation for S4 classes and methods can be generated |
| by using the functions @code{promptClass()} and @code{promptMethods()} |
| from package @pkg{methods}. If it is necessary or desired to provide an |
| explicit function declaration (in a @code{\usage} section) for an S4 |
| method (e.g., if it has ``surprising arguments'' to be mentioned |
| explicitly), one can use the special markup |
| |
| @example |
| \S4method@{@var{generic}@}@{@var{signature_list}@}(@var{argument_list}) |
| @end example |
| |
| @noindent |
| (e.g., @samp{\S4method@{coerce@}@{ANY,NULL@}(from, to)}). |
| |
| To make full use of the potential of the on-line documentation system, |
| all user-visible S4 classes and methods in a package should at least |
| have a suitable @code{\alias} entry in one of the package's @file{Rd} files. |
| If a package has methods for a function defined originally somewhere |
| else, and does not change the underlying default method for the |
| function, the package is responsible for documenting the methods it |
| creates, but not for the function itself or the default method. |
| |
| An S4 replacement method is documented in the same way as an S3 one: see |
| the description of @code{\method} in @ref{Documenting functions}. |
| |
| |
| See @code{help("Documentation", package = "methods")} for more |
| information on using and creating on-line documentation for S4 classes and |
| methods. |
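A minimal sketch of generating a class skeleton for a hypothetical
class; with @code{filename = NA} the skeleton is returned as a list
rather than written to a file.

@example
library(methods)
## A hypothetical S4 class with two numeric slots
setClass("Track", slots = c(x = "numeric", y = "numeric"))
## Skeleton documentation for the class, returned as a list
skel <- promptClass("Track", filename = NA)
@end example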
| |
| @node Documenting packages, , Documenting S4 classes and methods, Rd format |
| @subsection Documenting packages |
| |
| Packages may have an overview help page with an @code{\alias} |
| @code{@var{pkgname}-package}, e.g.@: @samp{utils-package} for the |
@pkg{utils} package; @code{package?@var{pkgname}} will then open that
| help page. If a topic named @code{@var{pkgname}} does not exist in |
| another @file{Rd} file, it is helpful to use this as an additional |
| @code{\alias}. |
| |
| Skeletons of documentation for a package can be generated using the |
function @code{promptPackage()}. If the @code{final = TRUE} argument
is used, then the @file{Rd} file will be generated in final form, containing
the information that would be produced by
| @code{library(help = @var{pkgname})}. Otherwise (the default) comments |
| will be inserted giving suggestions for content. |
| |
| Apart from the mandatory @code{\name} and @code{\title} and the |
| @code{@var{pkgname}-package} alias, the only requirement for the package |
| overview page is that it include a @code{\docType@{package@}} statement. |
| All other content is optional. We suggest that it should be a short |
| overview, to give a reader unfamiliar with the package enough |
| information to get started. More extensive documentation is better |
| placed into a package vignette (@pxref{Writing package vignettes}) and |
| referenced from this page, or into individual man pages for the |
| functions, datasets, or classes. |
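A minimal overview page for a hypothetical package @pkg{mypkg} (all
names and the title are placeholders) might contain the following; the
plain @code{mypkg} alias is only appropriate if no other @file{Rd} file
uses that topic.

@example
\name@{mypkg-package@}
\alias@{mypkg-package@}
\alias@{mypkg@}
\docType@{package@}
\title@{Scaling Utilities for Numeric Data@}
\description@{
  A brief overview of what the package does, pointing new users to
  the main entry points such as \code@{\link@{scale01@}@} and to the
  package vignette for extended examples.
@}
@end example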
| |
| @node Sectioning, Marking text, Rd format, Writing R documentation files |
| @section Sectioning |
| |
| To begin a new paragraph or leave a blank line in an example, just |
| insert an empty line (as in (La)@TeX{}). To break a line, use |
| @code{\cr}. |
| @findex \cr |
| |
| In addition to the predefined sections (such as @code{\description@{@}}, |
| @code{\value@{@}}, etc.), you can ``define'' arbitrary ones by |
| @code{\section@{@var{section_title}@}@{@dots{}@}}. |
| @findex \section |
| For example |
| |
| @example |
| \section@{Warning@}@{ |
| You must not call this function unless @dots{} |
| @} |
| @end example |
| |
| @noindent |
| For consistency with the pre-assigned sections, the section name (the |
| first argument to @code{\section}) should be capitalized (but not all |
| upper case). Whitespace between the first and second braced expressions |
| is not allowed. Markup (e.g.@: @code{\code}) within the section title |
may cause problems with the @LaTeX{} conversion (depending on the version
| of macro packages such as @samp{hyperref}) and so should be avoided. |
| |
| The @code{\subsection} macro takes arguments in the same format as |
| @code{\section}, but is used within a section, so it may be used to |
| nest subsections within sections or other subsections. There is no |
| predefined limit on the nesting level, but formatting is not designed |
| for more than 3 levels (i.e.@: subsections within subsections within |
| sections). |
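For example, a user-defined section with one nested subsection (the
text is illustrative):

@example
\section@{Warning@}@{
  Results may depend on the locale.
  \subsection@{Collation@}@{
    Character sorting follows the collation order of the current locale.
  @}
@}
@end example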
| |
| Note that additional named sections are always inserted at a fixed |
| position in the output (before @code{\note}, @code{\seealso} and the |
| examples), no matter where they appear in the input (but in the same |
| order amongst themselves as in the input). |
| |
| |
| @node Marking text, Lists and tables, Sectioning, Writing R documentation files |
| @section Marking text |
| @cindex Marking text in documentation |
| |
| The following logical markup commands are available for emphasizing or |
| quoting text. |
| |
| @table @code |
| @item \emph@{@var{text}@} |
| @findex \emph |
| @itemx \strong@{@var{text}@} |
| @findex \strong |
| Emphasize @var{text} using @emph{italic} and @strong{bold} font if |
| possible; @code{\strong} is regarded as stronger (more emphatic). |
| |
| @item \bold@{@var{text}@} |
| @findex \bold |
| Set @var{text} in @b{bold} font where possible. |
| |
| @item \sQuote@{@var{text}@} |
| @findex \sQuote |
| @itemx \dQuote@{@var{text}@} |
| @findex \dQuote |
| Portably single or double quote @var{text} (without hard-wiring the |
| characters used for quotation marks). |
| @end table |
| |
| Each of the above commands takes @LaTeX{}-like input, so other macros |
| may be used within @var{text}. |
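A short illustrative fragment; note the @code{\code} nested inside
@code{\emph}, which is possible because @code{\emph} takes
@LaTeX{}-like input.

@example
\description@{
  Missing values are \emph@{not@} removed, and the function should
  \strong@{never@} be called on \sQuote@{raw@} input.  It returns a
  \dQuote@{tidy@} data frame with \emph@{one row per \code@{id@}@}.
@}
@end example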
| |
| The following logical markup commands are available for indicating |
| specific kinds of text. Except as noted, these take `verbatim' text |
| input, and so other macros may not be used within them. Some characters |
| will need to be escaped (@pxref{Insertions}). |
| |
| @table @code |
| @item \code@{@var{text}@} |
| @findex \code |
| Indicate text that is a literal example of a piece of an @R{} program, |
| e.g., a fragment of @R{} code or the name of an @R{} object. Text is |
| entered in @R{}-like syntax, and displayed using @code{typewriter} font |
| where possible. Macros @code{\var} and @code{\link} are interpreted within |
| @var{text}. |
| |
| @item \preformatted@{@var{text}@} |
| @findex \preformatted |
| Indicate text that is a literal example of a piece of a program. Text |
| is displayed using @code{typewriter} font where possible. Formatting, |
| e.g.@: line breaks, is preserved. (Note that this includes a line break |
| after the initial @{, so typically text should start on the same line as |
| the command.) |
| |
Due to limitations in @LaTeX{} as of this writing, this macro may not be
nested within markup macros other than @code{\dQuote} and
@code{\sQuote}, as errors or bad formatting may result.
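For example (the function @code{scale01} is hypothetical), starting the
text on the same line as the command avoids a leading blank line, and
closing the brace directly after the last line avoids a trailing one:

@example
\details@{
  A typical call sequence is
\preformatted@{y <- scale01(x, na.rm = TRUE)
range(y)@}
@}
@end example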
| |
| @item \kbd@{@var{keyboard-characters}@} |
| @findex \kbd |
| Indicate keyboard input, using @kbd{slanted typewriter} font if |
| possible, so users can distinguish the characters they are supposed to |
| type from computer output. Text is entered `verbatim'. |
| |
| @item \samp@{@var{text}@} |
| @findex \samp |
| Indicate text that is a literal example of a sequence of characters, |
| entered `verbatim'. No wrapping or reformatting will occur. Displayed |
| using @code{typewriter} font where possible. |
| |
| |
| @item \verb@{@var{text}@} |
| @findex \verb |
| Indicate text that is a literal example of a sequence of characters, |
| with no interpretation of e.g.@: @code{\var}, but which will be included |
| within word-wrapped text. Displayed using @code{typewriter} font if |
| possible. |
| |
| @item \pkg@{@var{package_name}@} |
| @findex \pkg |
| Indicate the name of an @R{} package. @LaTeX{}-like. |
| |
| @item \file@{@var{file_name}@} |
| @findex \file |
| Indicate the name of a file. Text is @LaTeX{}-like, so backslash needs |
| to be escaped. Displayed using a distinct font where possible. |
| |
| @item \email@{@var{email_address}@} |
| @findex \email |
| Indicate an electronic mail address. @LaTeX{}-like, will be rendered as |
| a hyperlink in @HTML{} and PDF conversion. Displayed using |
| @code{typewriter} font where possible. |
| |
| @item \url@{@var{uniform_resource_locator}@} |
| @findex \url |
| Indicate a uniform resource locator (@acronym{URL}) for the World Wide |
| Web. The argument is handled as `verbatim' text (with percent and |
| braces escaped by backslash), and rendered as a hyperlink in @HTML{} and |
| PDF conversion. Linefeeds are removed, and leading and trailing |
| whitespace@footnote{as defined by the @R{} function @code{trimws}.} is |
| removed. @xref{Specifying URLs}. |
| |
| Displayed using @code{typewriter} font where possible. |
| |
| @item \href@{@var{uniform_resource_locator}@}@{@var{text}@} |
| @findex \href |
| Indicate a hyperlink to the World Wide Web. The first argument is |
| handled as `verbatim' text (with percent and braces escaped by |
| backslash) and is used as the @acronym{URL} in the hyperlink, with the |
| second argument of @LaTeX{}-like text displayed to the user. Linefeeds |
| are removed from the first argument, and leading and trailing whitespace |
| is removed. |
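For example, a @code{\references} section might combine both forms, a
bare @acronym{URL} and a hyperlink with descriptive text:

@example
\references@{
  The R Project for Statistical Computing,
  \url@{https://www.r-project.org/@}.

  See also \href@{https://cran.r-project.org/manuals.html@}@{the R manuals@}
  on CRAN.
@}
@end example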
| |
| Note that RFC3986-encoded URLs (e.g.@: using @samp{\%28VS.85\%29} in |
| place of @samp{(VS.85)}) may not work correctly in versions of @R{} |
before 3.1.3 and are best avoided; use @code{URLdecode()} to decode
them.
| |
| @item \var@{@var{metasyntactic_variable}@} |
| @findex \var |
| Indicate a metasyntactic variable. In some cases this will be rendered |
| distinctly, e.g.@: in italic, but not in all@footnote{Currently it is |
| rendered differently only in @HTML{} conversions, and @LaTeX{} conversion |
| outside @samp{\usage} and @samp{\examples} environments.}. @LaTeX{}-like. |
| @item \env@{@var{environment_variable}@} |
| @findex \env |
| Indicate an environment variable. `Verbatim'. |
Displayed using @code{typewriter} font where possible.
| @item \option@{@var{option}@} |
| @findex \option |
| Indicate a command-line option. `Verbatim'. |
| Displayed using @code{typewriter} font where possible. |
| @item \command@{@var{command_name}@} |
| @findex \command |
| Indicate the name of a command. @LaTeX{}-like, so @code{\var} is |
| interpreted. Displayed using @code{typewriter} font where possible. |
| @item \dfn@{@var{term}@} |
| @findex \dfn |
| Indicate the introductory or defining use of a term. @LaTeX{}-like. |
| @item \cite@{@var{reference}@} |
| @findex \cite |
| Indicate a reference without a direct cross-reference @emph{via} @code{\link} |
| (@pxref{Cross-references}), such as the name of a book. @LaTeX{}-like. |
| @item \acronym@{@var{acronym}@} |
| @findex \acronym |
| Indicate an acronym (an abbreviation written in all capital letters), |
| such as @acronym{GNU}. @LaTeX{}-like. |
| @end table |
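As an illustration, the following fragment of an @file{Rd} file combines several of these commands. Note that @code{\var} is interpreted inside @code{\command}, which is @LaTeX{}-like. The section and its wording are only a sketch.

@example
@group
\details@{
  See \url@{https://www.r-project.org/@} and the
  \href@{https://cran.r-project.org/manuals.html@}@{R manuals@}.
  Setting \env@{R_LIBS_USER@} changes the personal library, and
  \option@{--vanilla@} starts \R without reading the site and user
  startup files.  Install a source package with
  \command@{R CMD INSTALL \var@{pkg@}@}.
@}
@end group
@end example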
| |
| |
| @node Lists and tables, Cross-references, Marking text, Writing R documentation files |
| @section Lists and tables |
| @cindex Lists and tables in documentation |
| |
| @findex \itemize |
| @findex \enumerate |
| The @code{\itemize} and @code{\enumerate} commands take a single |
| argument, within which there may be one or more @code{\item} commands. |
| The text following each @code{\item} is formatted as one or more |
| paragraphs, suitably indented and with the first paragraph marked with a |
| bullet point (@code{\itemize}) or a number (@code{\enumerate}). |
| |
Note that, unlike argument lists, @code{\item} in these formats is
followed by a space and the text (not enclosed in braces). For example:
| |
| @example |
| \enumerate@{ |
| \item A database consists of one or more records, each with one or |
| more named fields. |
| \item Regular lines start with a non-whitespace character. |
| \item Records are separated by one or more empty lines. |
| @} |
| @end example |
| |
| @code{\itemize} and @code{\enumerate} commands may be nested. |
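For example, the following sketch nests a numbered list inside a bulleted one. Each inner @code{\item} is again followed by plain text rather than a braced argument:

@example
@group
\itemize@{
  \item Numeric summaries:
    \enumerate@{
      \item the mean;
      \item the median.
    @}
  \item Graphical summaries.
@}
@end group
@end example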
| |
| @findex \describe |
| The @code{\describe} command is similar to @code{\itemize} but allows |
| initial labels to be specified. Each @code{\item} takes two arguments, |
| the label and the body of the item, in exactly the same way as an |
| argument or value @code{\item}. @code{\describe} commands are mapped to |
| @code{<DL>} lists in @HTML{} and @code{\description} lists in @LaTeX{}. |
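As a sketch, a @code{\describe} list documenting the allowed values of a hypothetical @code{method} argument might read:

@example
@group
\describe@{
  \item@{\code@{"fast"@}@}@{Use the approximate algorithm.@}
  \item@{\code@{"exact"@}@}@{Use the exact, but slower, algorithm.@}
@}
@end group
@end example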
| |
| @findex \tabular |
The @code{\tabular} command takes two arguments. The first gives the
required alignment for each column (@samp{l} for left-justification,
@samp{r} for right-justification or @samp{c} for centring). The second
argument consists of an arbitrary number of
| lines separated by @code{\cr}, and with fields separated by @code{\tab}. |
| For example: |
| |
| @example |
| @group |
| \tabular@{rlll@}@{ |
| [,1] \tab Ozone \tab numeric \tab Ozone (ppb)\cr |
| [,2] \tab Solar.R \tab numeric \tab Solar R (lang)\cr |
| [,3] \tab Wind \tab numeric \tab Wind (mph)\cr |
| [,4] \tab Temp \tab numeric \tab Temperature (degrees F)\cr |
| [,5] \tab Month \tab numeric \tab Month (1--12)\cr |
| [,6] \tab Day \tab numeric \tab Day of month (1--31) |
| @} |
| @end group |
| @end example |
| |
| @noindent |
| There must be the same number of fields on each line as there are |
| alignments in the first argument, and they must be non-empty (but can |
| contain only spaces). (There is no whitespace between @code{\tabular} |
| and the first argument, nor between the two arguments.) |
| |
| @node Cross-references, Mathematics, Lists and tables, Writing R documentation files |
| @section Cross-references |
| @cindex Cross-references in documentation |
| |
| @findex \link |
| The markup @code{\link@{@var{foo}@}} (usually in the combination |
| @code{\code@{\link@{@var{foo}@}@}}) produces a hyperlink to the help for |
| @var{foo}. Here @var{foo} is a @emph{topic}, that is the argument of |
| @code{\alias} markup in another @file{Rd} file (possibly in another package). |
| Hyperlinks are supported in some of the formats to which @file{Rd} files are |
| converted, for example @HTML{} and PDF, but ignored in others, e.g.@: |
| the text format. |
| |
One main use of @code{\link} is in the @code{\seealso} section of the
help page (@pxref{Rd format}).
| |
| Note that whereas leading and trailing spaces are stripped when |
| extracting a topic from a @code{\alias}, they are not stripped when |
| looking up the topic of a @code{\link}. |
| |
| @cindex \linkS4class |
You can link to a topic whose name differs from the displayed text with
@code{\link[=@var{dest}]@{@var{name}@}}, which links to topic @var{dest}
but displays @var{name}. This can be used to refer to the documentation
for S3 and S4 classes, for example @code{\code@{"\link[=abc-class]@{abc@}"@}}
| would be a way to refer to the documentation of an S4 class @code{"abc"} |
| defined in your package, and |
| @code{\code@{"\link[=terms.object]@{terms@}"@}} to the S3 @code{"terms"} |
| class (in package @pkg{stats}). To make these easy to read in the |
| source file, @code{\code@{"\linkS4class@{abc@}"@}} expands to the form |
| given above. |
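For instance, a @code{\seealso} section using these forms might read as follows. Here @code{lm} is the model-fitting function in package @pkg{stats}, and @code{"abc"} is the hypothetical S4 class used in the example above:

@example
@group
\seealso@{
  \code@{\link@{lm@}@} for fitting linear models,
  \code@{"\link[=terms.object]@{terms@}"@} for the S3 class of model terms, and
  \code@{"\linkS4class@{abc@}"@} for the S4 class.
@}
@end group
@end example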
| |
| There are two other forms of optional argument specified as |
| @code{\link[@var{pkg}]@{@var{foo}@}} and |
| @code{\link[@var{pkg:bar}]@{@var{foo}@}} to link to the package |
| @pkg{@var{pkg}}, to @emph{files} @file{@var{foo}.html} and |
| @file{@var{bar}.html} respectively. These are rarely needed, perhaps to |
| refer to not-yet-installed packages (but there the @HTML{} help system |
| will resolve the link at run time) or in the normally undesirable event |
| that more than one package offers help on a topic@footnote{a common |
| example in @acronym{CRAN} packages is @code{\link[mgcv]@{gam@}}.} (in |
| which case the present package has precedence so this is only needed to |
| refer to other packages). They are currently only used in @HTML{} help |
| (and ignored for hyperlinks in @LaTeX{} conversions of help pages), and |
| link to the file rather than the topic (since there is no way to know |
| which topics are in which files in an uninstalled package). The |
| @strong{only} reason to use these forms for base and recommended |
| packages is to force a reference to a package that might be further down |
| the search path. Because they have been frequently misused, the @HTML{} |
| help system looks for topic @code{@var{foo}} in package @pkg{@var{pkg}} |
| if it does not find file @file{@var{foo}.html}. |
| |
| @node Mathematics, Figures, Cross-references, Writing R documentation files |
| @section Mathematics |
| @cindex Mathematics in documentation |
| @findex \eqn |
| @findex \deqn |
| |
| Mathematical formulae should be set beautifully for printed |
| documentation yet we still want something useful for text and @HTML{} |
| online help. To this end, the two commands |
| @code{\eqn@{@var{latex}@}@{@var{ascii}@}} and |
| @code{\deqn@{@var{latex}@}@{@var{ascii}@}} are used. Whereas @code{\eqn} |
| is used for ``inline'' formulae (corresponding to @TeX{}'s |
| @code{$@dots{}$}), @code{\deqn} gives ``displayed equations'' (as in |
| @LaTeX{}'s @code{displaymath} environment, or @TeX{}'s |
| @code{$$@dots{}$$}). Both arguments are treated as `verbatim' text. |
| |
| Both commands can also be used as @code{\eqn@{@var{latexascii}@}} (only |
| @emph{one} argument) which then is used for both @var{latex} and |
@var{ascii}. No whitespace is allowed between the command and the first
argument, nor between the first and second arguments.
| |
| The following example is from @file{Poisson.Rd}: |
| |
| @example |
| @group |
| \deqn@{p(x) = \frac@{\lambda^x e^@{-\lambda@}@}@{x!@}@}@{% |
| p(x) = \lambda^x exp(-\lambda)/x!@} |
| for \eqn@{x = 0, 1, 2, \ldots@}. |
| @end group |
| @end example |
| |
| @iftex |
| For the @LaTeX{} manual, this becomes |
| @c: Name-and-shame for Brian Diggs: |
| @c: this is TeXinfo markup, not the result of the conversions. |
| @quotation |
| @cartouche |
| @tex |
| $$ p(x) = \lambda^x\ {e^{-\lambda} \over x!} $$ |
| for $x = 0, 1, 2, \ldots$. |
| @end tex |
| @end cartouche |
| @end quotation |
| @end iftex |
| |
| For text on-line help we get |
| |
| @quotation |
| @cartouche |
| @example |
| p(x) = lambda^x exp(-lambda)/x! |
| |
| for x = 0, 1, 2, .... |
| @end example |
| @end cartouche |
| @end quotation |
| |
Greek letters (both cases) will be rendered in @HTML{} if preceded by a
backslash; @code{\dots} and @code{\ldots} will be rendered as ellipses,
and @code{\sqrt}, @code{\ge} and @code{\le} as mathematical symbols.
| |
| Note that only basic @LaTeX{} can be used, there being no provision to |
| specify @LaTeX{} style files such as the AMS extensions. |
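The following sketch is not taken from any standard help page. It uses the one-argument form where the @LaTeX{} reads acceptably as plain text, and it supplies an @acronym{ASCII} alternative where it does not:

@example
@group
\deqn@{\bar@{x@} = \frac@{1@}@{n@}\sum_@{i=1@}^n x_i@}@{xbar = sum(x_i)/n@}
where \eqn@{n@} is the number of observations and \eqn@{x_i@} is the
\eqn@{i@}th observation, so that
\eqn@{\bar@{x@} \le \max_i x_i@}@{xbar <= max(x_i)@}.
@end group
@end example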
| |
| @node Figures, Insertions, Mathematics, Writing R documentation files |
| @section Figures |
| @cindex Figures in documentation |
| @findex \figure |
| |
| To include figures in help pages, use the @code{\figure} markup. There |
| are three forms. |
| |
| The two commonly used simple forms are @code{\figure@{@var{filename}@}} |
| and @code{\figure@{@var{filename}@}@{@var{alternate text}@}}. This will |
| include a copy of the figure in either @HTML{} or @LaTeX{} output. In text |
| output, the alternate text will be displayed instead. (When the second |
| argument is omitted, the filename will be used.) Both the filename and |
| the alternate text will be parsed verbatim, and should not include |
| special characters that are significant in @HTML{} or @LaTeX{}. |
| |
| The expert form is @code{\figure@{@var{filename}@}@{options: |
| @var{string}@}}. (The word @samp{options:} must be typed exactly as |
| shown and followed by at least one space.) In this form, the |
| @var{string} is copied into the @HTML{} @code{img} tag as attributes |
| following the @code{src} attribute, or into the second argument of the |
| @code{\Figure} macro in @LaTeX{}, which by default is used as options to |
| an @code{\includegraphics} call. As it is unlikely that any single |
| string would suffice for both display modes, the expert form would |
normally be wrapped in conditionals. It is up to the author to make
sure that valid @HTML{}/@LaTeX{} is used. For example, to include a
logo in both @HTML{} and @LaTeX{}, using the expert form with
format-specific options in each, the following could be used:
| |
| @example |
| \if@{html@}@{\figure@{Rlogo.svg@}@{options: width=100 alt="R logo"@}@} |
| \if@{latex@}@{\figure@{Rlogo.pdf@}@{options: width=0.5in@}@} |
| @end example |
| |
| The files containing the figures should be stored in the directory |
| @file{man/figures}. Files with extensions @file{.jpg}, @file{.jpeg}, |
| @file{.pdf}, @file{.png} and @file{.svg} from that directory will be |
| copied to the @file{help/figures} directory at install time. (Figures in |
| PDF format will not display in most @HTML{} browsers, but might be the |
| best choice in reference manuals.) Specify the filename relative to |
| @file{man/figures} in the @code{\figure} directive. |
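As a sketch with hypothetical file names, a package that stores a flow chart as @file{man/figures/diagram.png} could refer to it from @file{man/fitmodel.Rd} as shown below. The second argument is the text displayed in text help:

@example
@group
\details@{
  The fitting procedure is summarized in the following figure.

  \figure@{diagram.png@}@{Flow chart of the fitting procedure@}
@}
@end group
@end example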
| |
| @node Insertions, Indices, Figures, Writing R documentation files |
| @section Insertions |
| |
| @findex \R |
| Use @code{\R} for the @R{} system itself. Use @code{\dots} |
| @findex \dots |
| for the dots in function argument lists @samp{@dots{}}, and |
| @code{\ldots} |
| @findex \ldots |
| for ellipsis dots in ordinary text.@footnote{There is only a fine |
| distinction between @code{\dots} and @code{\ldots}. It is technically |
| incorrect to use @code{\ldots} in code blocks and @code{tools::checkRd} |
| will warn about this---on the other hand the current converters treat |
| them the same way in code blocks, and elsewhere apart from the small |
| distinction between the two in @LaTeX{}.} These can be followed by |
| @code{@{@}}, and should be unless followed by whitespace. |
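For example, the following fragment documents a hypothetical function that passes its @samp{@dots{}} arguments on to @code{mean}. It uses @code{\R@{@}} where an apostrophe follows and @code{\ldots@{@}} in ordinary text:

@example
@group
\arguments@{
  \item@{x@}@{a numeric vector.@}
  \item@{\dots@}@{further arguments passed to \code@{\link@{mean@}@}.@}
@}
\details@{
  This calls \R@{@}'s own \code@{mean@}, so arguments such as \code@{trim@},
  \code@{na.rm@}, \ldots@{@} are passed on unchanged.
@}
@end group
@end example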
| |
| After an unescaped @samp{%}, you can put your own comments regarding the |
| help text. The rest of the line (but not the newline at the end) will |
| be completely disregarded. Therefore, you can also use it to make part |
| of the ``help'' invisible. |
| |
You can produce a backslash (@samp{\}) by escaping it with another
backslash. (Note that @code{\cr} is used for generating line breaks.)
| |
| The ``comment'' character @samp{%} and unpaired braces@footnote{See the |
| examples section in the file @file{Paren.Rd} for an example.} |
| @emph{almost always} need to be escaped by @samp{\}, and @samp{\\} can |
| be used for backslash and needs to be when there are two or more adjacent |
| backslashes. In @R{}-like code quoted strings are handled slightly |
| differently; see @uref{https://developer.r-project.org/parseRd.pdf, |
| ``Parsing Rd files''} for details -- in particular braces should not be |
| escaped in quoted strings. |
| |
| All of @samp{% @{ @} \} should be escaped in @LaTeX{}-like text. |
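As a sketch of these rules, the following fragment escapes a percent sign in @LaTeX{}-like text and in @R{}-like code, including inside a quoted string. It leaves an unpaired brace inside a quoted string unescaped:

@example
@group
\details@{
  The default threshold is 5\% of the total.
@}
\examples@{
  ## percent signs are escaped even inside quoted strings
  format(Sys.Date(), "\%Y-\%m-\%d")
  ## braces inside quoted strings are not escaped
  nchar("@{")
@}
@end group
@end example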
| |
| @findex \enc |
| Text which might need to be represented differently in different |
| encodings should be marked by @code{\enc}, e.g.@: |
| @code{\enc@{J@"oreskog@}@{Joreskog@}} (with no whitespace between the |
| braces) where the first argument will be used where encodings are |
| allowed and the second should be @acronym{ASCII} (and is used for e.g.@: |
| the text conversion in locales that cannot represent the encoded form). |
| (This is intended to be used for individual words, not whole sentences |
| or paragraphs.) |
| |
| @node Indices, Platform-specific sections, Insertions, Writing R documentation files |
| @section Indices |
| @cindex Indices |
| |
| The @code{\alias} command (@pxref{Documenting functions}) is used to |
| specify the ``topics'' documented, which should include @emph{all} @R{} |
| objects in a package such as functions and variables, data sets, and S4 |
| classes and methods (@pxref{Documenting S4 classes and methods}). The |
| on-line help system searches the index data base consisting of all |
| alias topics. |
| |
| @findex \concept |
| In addition, it is possible to provide ``concept index entries'' using |
| @code{\concept}, which can be used for @code{help.search()} lookups. |
| E.g., file @file{cor.test.Rd} in the standard package @pkg{stats} |
| contains |
| |
| @example |
| \concept@{Kendall correlation coefficient@} |
| \concept@{Pearson correlation coefficient@} |
| \concept@{Spearman correlation coefficient@} |
| @end example |
| |
| @noindent |
| so that e.g.@: @kbd{??Spearman} will succeed in finding the |
| help page for the test for association between paired samples using |
| Spearman's @eqn{\rho, rho}. |
| |
| (Note that @code{help.search()} only uses ``sections'' of documentation |
| objects with no additional markup.) |
| |
| Each @code{\concept} entry should give a @emph{single} index term (word |
| or phrase), and not use any Rd markup. |
| |
If you want to cross-reference such items from other help files @emph{via}
| @code{\link}, you need to use @code{\alias} and not @code{\concept}. |
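As a sketch, the concept entries above can also be queried directly from @R{}. Restricting the search to the @code{concept} field and to package @pkg{stats} is a choice made only for this illustration, and the search should find @code{cor.test}:

@example
@group
res <- help.search("Spearman", fields = "concept", package = "stats")
## one row per matching help page
res$matches[, c("Topic", "Package")]
@end group
@end example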
| |
| |
| @node Platform-specific sections, Conditional text, Indices, Writing R documentation files |
| @section Platform-specific documentation |
| @cindex Platform-specific documentation |
| |
| Sometimes the documentation needs to differ by platform. Currently two |
| OS-specific options are available, @samp{unix} and @samp{windows}, and |
| lines in the help source file can be enclosed in |
| |
| @example |
| @group |
| #ifdef @var{OS} |
| ... |
| #endif |
| @end group |
| @end example |
| |
| @noindent |
| or |
| |
| @example |
| @group |
| #ifndef @var{OS} |
| ... |
| #endif |
| @end group |
| @end example |
| |
| @noindent |
| for OS-specific inclusion or exclusion. Such blocks should not be |
nested, and should be entirely within a block (that is, between the
| opening and closing brace of a section or item), or at top-level contain |
| one or more complete sections. |
| |
| If the differences between platforms are extensive or the @R{} objects |
| documented are only relevant to one platform, platform-specific @file{Rd} files |
| can be put in a @file{unix} or @file{windows} subdirectory. |
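For example, the following sketch keeps both conditional blocks entirely within a single @code{\details} section, with the directives at the start of their lines. The wording is illustrative only:

@example
@group
\details@{
  The files are read with \code@{readLines@}.
#ifdef windows
  File paths may use either forward slashes or backslashes.
#endif
#ifndef windows
  File paths must use forward slashes.
#endif
@}
@end group
@end example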
| |
| @node Conditional text, Dynamic pages, Platform-specific sections, Writing R documentation files |
| @section Conditional text |
| @cindex conditionals |
| @findex \if |
| @findex \ifelse |
| @findex \out |
| |
| Occasionally the best content for one output format is different from |
| the best content for another. For this situation, the |
| @code{\if@{@var{format}@}@{@var{text}@}} or |
| @code{\ifelse@{@var{format}@}@{@var{text}@}@{@var{alternate}@}} markup |
is used. Here @var{format} is a comma-separated list of formats in
| which the @var{text} should be rendered. The @var{alternate} will be |
| rendered if the format does not match. Both @var{text} and |
| @var{alternate} may be any sequence of text and markup. |
| |
| Currently the following formats are recognized: @code{example}, |
| @code{html}, @code{latex} and @code{text}. These select output for |
| the corresponding targets. (Note that @code{example} refers to |
| extracted example code rather than the displayed example in some other |
| format.) Also accepted are @code{TRUE} (matching all formats) and |
| @code{FALSE} (matching no formats). These could be the output |
| of the @code{\Sexpr} macro (@pxref{Dynamic pages}). |
| |
| The @code{\out@{@var{literal}@}} macro would usually be used within |
| the @var{text} part of @code{\if@{@var{format}@}@{@var{text}@}}. It |
| causes the renderer to output the literal text exactly, with no |
| attempt to escape special characters. For example, use |
the following to output the markup necessary to display the Greek letter alpha in
| @LaTeX{} or @HTML{}, and the text string @code{alpha} in other formats: |
| @example |
| \ifelse@{latex@}@{\out@{$\alpha$@}@}@{\ifelse@{html@}@{\out@{α@}@}@{alpha@}@} |
| @end example |
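As a further sketch, a comma-separated format list lets one piece of content serve several formats. The figure file here is hypothetical:

@example
@group
\if@{html,latex@}@{\figure@{diagram.png@}@{Flow chart@}@}
\if@{text@}@{A flow chart is shown in the HTML and PDF versions of this page.@}
@end group
@end example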
| |
| @node Dynamic pages, User-defined macros, Conditional text, Writing R documentation files |
| @section Dynamic pages |
| @cindex dynamic pages |
| @findex \Sexpr |
| @findex \RdOpts |
| |
| Two macros supporting dynamically generated man pages are @code{\Sexpr} |
| and @code{\RdOpts}. These are modelled after Sweave, and are intended |
| to contain executable @R{} expressions in the @file{Rd} file. |
| |
| The main argument to @code{\Sexpr} must be valid @R{} code that can be |
| executed. It may also take options in square brackets before the main |
| argument. Depending on the options, the code may be executed at |
| package build time, package install time, or man page rendering time. |
| |
| The options follow the same format as in Sweave, but different options |
| are supported. Currently the allowed options and their defaults are: |
| |
| @itemize @bullet |
| @item @code{eval=TRUE} |
| Whether the @R{} code should be evaluated. |
| |
| @item @code{echo=FALSE} |
| Whether the @R{} code should be echoed. If @code{TRUE}, a display will |
| be given in a preformatted block. For example, |
| @code{\Sexpr[echo=TRUE]@{ x <- 1 @}} will be displayed as |
| @example |
| > x <- 1 |
| @end example |
| |
| @item @code{keep.source=TRUE} |
| Whether to keep the author's formatting when displaying the |
| code, or throw it away and use a deparsed version. |
| |
| @item @code{results=text} |
| How should the results be displayed? The possibilities |
| are: |
| |
| @itemize @minus |
| @item @code{results=text} |
| Apply @code{as.character()} to the result of the code, and insert it |
| as a text element. |
| |
| @item @code{results=verbatim} |
Print the results of the code just as if it were executed at the console,
| and include the printed results verbatim. (Invisible results will not print.) |
| |
| @item @code{results=rd} |
| The result is assumed to be a character vector containing markup to be |
| passed to @code{parse_Rd()}, with the result inserted in place. This |
| could be used to insert computed aliases, for instance. |
| @code{parse_Rd()} is called first with @code{fragment = FALSE} to allow |
| a single Rd section macro to be inserted. If that fails, it is called |
| again with @code{fragment = TRUE}, the older behavior. |
| |
| @item @code{results=hide} |
| Insert no output. |
| @end itemize |
| |
| @item @code{strip.white=TRUE} |
| Remove leading and trailing white space from each line of |
| output if @code{strip.white=TRUE}. With |
| @code{strip.white=all}, also remove blank lines. |
| |
| @item @code{stage=install} |
Control when this macro is run. Possible values are:
| @itemize @minus |
| @item @code{stage=build} |
| The macro is run when building a source tarball. |
| |
| @item @code{stage=install} |
| The macro is run when installing from source. |
| |
| @item @code{stage=render} |
| The macro is run when displaying the help page. |
| @end itemize |
| |
| Conditionals such as @code{#ifdef} |
| (@pxref{Platform-specific sections}) are applied after the |
| @code{build} macros but before the @code{install} macros. In some |
| situations (e.g.@: installing directly from a source directory without a |
| tarball, or building a binary package) the above description is not |
| literally accurate, but authors can rely on the sequence being |
| @code{build}, @code{#ifdef}, @code{install}, @code{render}, with all |
| stages executed. |
| |
| Code is only run once in each stage, so a @code{\Sexpr[results=rd]} |
| macro can output an @code{\Sexpr} macro designed for a later stage, |
| but not for the current one or any earlier stage. |
| |
| @item @code{width, height, fig} |
| These options are currently allowed but ignored. |
| @end itemize |
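A sketch combining some of these options follows, and the expressions in it are arbitrary illustrations. The first @code{\Sexpr} is evaluated once when the source tarball is built, and its result is inserted as text. The second is evaluated, echoed and printed each time the page is displayed:

@example
@group
\details@{
  This page was generated on \Sexpr[stage=build]@{format(Sys.Date())@}.

  \Sexpr[stage=render,results=verbatim,echo=TRUE]@{summary(c(2, 4, 9))@}
@}
@end group
@end example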
| |
| The @code{\RdOpts} macro is used to set new defaults for options to apply |
| to following uses of @code{\Sexpr}. |
| |
| For more details, see the online document |
| @uref{https://developer.r-project.org/parseRd.pdf, ``Parsing Rd files''}. |
| |
| @node User-defined macros, Encoding, Dynamic pages, Writing R documentation files |
| @section User-defined macros |
| @cindex user-defined macros |
| @findex \newcommand |
| @findex \renewcommand |
| |
| The @code{\newcommand} and @code{\renewcommand} macros allow new macros |
| to be defined within an Rd file. These are similar but not identical to |
| the same-named @LaTeX{} macros. |
| |
| They each take two arguments which are parsed verbatim. The first is |
| the name of the new macro including the initial backslash, and the |
| second is the macro definition. As in @LaTeX{}, @code{\newcommand} |
| requires that the new macro not have been previously defined, whereas |
| @code{\renewcommand} allows existing macros (including all built-in |
| ones) to be replaced. (This test is disabled by default, but may be |
| enabled by setting the environment variable |
| @env{_WARN_DUPLICATE_RD_MACROS_} to a true value.) |
| |
| Also as in @LaTeX{}, the new macro may be defined to take arguments, |
| and numeric placeholders such as @code{#1} are used in the macro |
| definition. However, unlike @LaTeX{}, the number of arguments is |
| determined automatically from the highest placeholder number seen in |
| the macro definition. For example, a macro definition containing |
| @code{#1} and @code{#3} (but no other placeholders) will define a |
three-argument macro (whose second argument will be ignored). As in
| @LaTeX{}, at most 9 arguments may be defined. If the @code{#} |
| character is followed by a non-digit it will have no special |
| significance. All arguments to user-defined macros will be parsed as |
| verbatim text, and simple text-substitution will be used to replace |
| the place-holders, after which the replacement text will be parsed. |
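For example, the hypothetical two-argument macro below is defined and then used in the same @file{Rd} file:

@example
@group
\newcommand@{\seefun@}@{\code@{\link@{#1@}@} (in package \pkg@{#2@})@}

\seealso@{\seefun@{lm@}@{stats@}@}
@end group
@end example

@noindent
After substitution, the call is parsed as
@code{\code@{\link@{lm@}@} (in package \pkg@{stats@})}.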
| |
| A number of macros are defined in the file |
| @file{share/Rd/macros/system.Rd} of the @R{} source or home directory, |
| and these will normally be available in all @file{.Rd} files. For |
| example, that file contains the definition |
| @example |
| \newcommand@{\PR@}@{\Sexpr[results=rd]@{tools:::Rd_expr_PR(#1)@}@} |
| @end example |
| @noindent |
which defines @code{\PR} to be a single-argument macro; then code
| (typically used in the @file{NEWS.Rd} file) like |
| @example |
| \PR@{1234@} |
| @end example |
| @noindent |
| will expand to |
| @example |
| \Sexpr[results=rd]@{tools:::Rd_expr_PR(1234)@} |
| @end example |
| @noindent |
| when parsed. |
| |
| Some macros that might be of general use are: |
| @ftable @code |
| @item \CRANpkg@{@var{pkg}@} |
| A package on CRAN |
| |
| @item \sspace |
| A single space (used after a period that does not end a sentence). |
| |
| @item \doi@{@var{numbers}@} |
| A digital object identifier (DOI). |
| @end ftable |
| See the @file{system.Rd} file in @file{share/Rd/macros} for more details |
| and macro definitions, including macros @code{\packageTitle}, |
| @code{\packageDescription}, @code{\packageAuthor}, @code{\packageMaintainer}, |
| @code{\packageDESCRIPTION} and @code{\packageIndices}. |
| @findex @code{\packageTitle} |
| @findex @code{\packageDescription} |
| @findex @code{\packageAuthor} |
| @findex @code{\packageMaintainer} |
| @findex @code{\packageDESCRIPTION} |
| @findex @code{\packageIndices} |
| |
| |
| Packages may also define their own common macros; these would be stored |
| in an @file{.Rd} file in @file{man/macros} in the package source and |
| will be installed into @file{help/macros} when the package is installed. |
| A package may also use the macros from a different package by listing |
| the other package in the @samp{RdMacros} field in the @file{DESCRIPTION} |
| file. |
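A minimal sketch, with hypothetical package names, follows. Package @pkg{mypkg} ships a macro file such as

@example
@group
% file man/macros/mymacros.Rd in the sources of package mypkg
\newcommand@{\mypkg@}@{\pkg@{mypkg@}@}
@end group
@end example

@noindent
and another package that wants to use @code{\mypkg} in its own @file{Rd} files declares this in its @file{DESCRIPTION} file:

@example
RdMacros: mypkg
@end example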
| |
| |
| |
| @node Encoding, Processing documentation files, User-defined macros, Writing R documentation files |
| @section Encoding |
| @cindex encoding |
| |
| Rd files are text files and so it is impossible to deduce the encoding |
| they are written in unless @acronym{ASCII}: files with 8-bit characters |
| could be UTF-8, Latin-1, Latin-9, KOI8-R, EUC-JP, @emph{etc}. So an |
| @code{\encoding@{@}} section must be used to specify the encoding if it |
| is not @acronym{ASCII}. (The @code{\encoding@{@}} section must be on a |
| line by itself, and in particular one containing no non-@acronym{ASCII} |
| characters. The encoding declared in the @file{DESCRIPTION} file will |
| be used if none is declared in the file.) The @file{Rd} files are |
| converted to UTF-8 before parsing and so the preferred encoding for the |
| files themselves is now UTF-8. |
| |
Wherever possible, avoid non-@acronym{ASCII} characters in @file{Rd} files, and
| even symbols such as @samp{<}, @samp{>}, @samp{$}, @samp{^}, @samp{&}, |
| @samp{|}, @samp{@@}, @samp{~}, and @samp{*} outside `verbatim' |
| environments (since they may disappear in fonts designed to render |
| text). (Function @code{showNonASCIIfile} in package @pkg{tools} can help |
| in finding non-@acronym{ASCII} bytes in the files.) |
| |
| For convenience, encoding names @samp{latin1} and @samp{latin2} are |
| always recognized: these and @samp{UTF-8} are likely to work fairly |
| widely. However, this does not mean that all characters in UTF-8 will |
| be recognized, and the coverage of non-Latin characters@footnote{@R{} |
| 2.9.0 added support for UTF-8 Cyrillic characters in @LaTeX{}, but on |
| some OSes this will need Cyrillic support added to @LaTeX{}, so |
| environment variable @env{_R_CYRILLIC_TEX_} may need to be set to a |
| non-empty value to enable this.} is fairly low. Using @LaTeX{} |
| @code{inputenx} (see @code{?Rd2pdf} in @R{}) will give greater coverage |
| of UTF-8. |
| |
| The @code{\enc} command (@pxref{Insertions}) can be used to provide |
| transliterations which will be used in conversions that do not support |
| the declared encoding. |
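For example, the start of a hypothetical help page written in UTF-8 might look like the following. The @code{\encoding} line contains only @acronym{ASCII} characters, and the accented name has an @acronym{ASCII} transliteration:

@example
@group
\encoding@{UTF-8@}
\name@{fitfa@}
\alias@{fitfa@}
\title@{Fit a Factor Analysis Model@}
\description@{
  Fits the model by maximum likelihood, in the manner described by
  \enc@{J@"oreskog@}@{Joreskog@}.
@}
@end group
@end example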
| |
| The @LaTeX{} conversion converts the file to UTF-8 from the declared |
| encoding, and includes a |
| |
| @example |
| \inputencoding@{utf8@} |
| @end example |
| |
| @noindent |
| command, and this needs to be matched by a suitable invocation of the |
| @command{\usepackage@{inputenc@}} command. The @R{} utility @command{R |
| CMD Rd2pdf} looks at the converted code and includes the encodings used: |
| it might for example use |
| |
| @example |
| \usepackage[utf8]@{inputenc@} |
| @end example |
| |
| @noindent |
| (Use of @code{utf8} as an encoding requires @LaTeX{} dated 2003/12/01 or |
| later. Also, the use of Cyrillic characters in @samp{UTF-8} appears to |
| also need @samp{\usepackage[T2A]@{fontenc@}}, and @command{R CMD Rd2pdf} |
| includes this conditionally on the file @file{t2aenc.def} being present |
| and environment variable @env{_R_CYRILLIC_TEX_} being set.) |
| |
| Note that this mechanism works best with Latin letters: the coverage of |
| UTF-8 in @LaTeX{} is quite low. |
| |
| |
| |
| @node Processing documentation files, Editing Rd files, Encoding, Writing R documentation files |
| @section Processing documentation files |
| @cindex Processing Rd format |
| |
| There are several commands to process Rd files from the system command |
| line. |
| |
| @findex R CMD Rdconv |
| Using @code{R CMD Rdconv} one can convert @R{} documentation format to |
| other formats, or extract the executable examples for run-time testing. |
| The currently supported conversions are to plain text, @HTML{} and |
| @LaTeX{} as well as extraction of the examples. |
| |
| @findex R CMD Rd2pdf |
| @code{R CMD Rd2pdf} generates PDF output from documentation in @file{Rd} |
| files, which can be specified either explicitly or by the path to a |
| directory with the sources of a package. In the latter case, a |
| reference manual for all documented objects in the package is created, |
| including the information in the @file{DESCRIPTION} files. |
| |
| @findex R CMD Sweave |
| @findex R CMD Stangle |
| @code{R CMD Sweave} and @code{R CMD Stangle} process vignette-like |
| documentation files (e.g.@: Sweave vignettes with extension |
| @samp{.Snw} or @samp{.Rnw}, or other non-Sweave vignettes). |
| @code{R CMD Stangle} is used to extract the @R{} code fragments. |
| |
| The exact usage and a detailed list of available options for all of |
| these commands can be obtained by running @code{R CMD @var{command} |
| --help}, e.g., @kbd{R CMD Rdconv --help}. All available commands can be |
| listed using @kbd{R --help} (or @kbd{Rcmd --help} under Windows). |
| |
All of these work under Windows. You may need to have installed the
tools to build packages from source as described in the ``R
| Installation and Administration'' manual, although typically all that is |
| needed is a @LaTeX{} installation. |
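As a sketch, assuming @command{R} is on the path, the following commands create a minimal, hypothetical @file{hello.Rd}. They then convert it to text on the terminal and write the @HTML{} version and the extracted example code to files:

@example
@group
mkdir -p man
cat > man/hello.Rd <<'EOF'
\name@{hello@}
\alias@{hello@}
\title@{Print a Greeting@}
\description@{Prints a short greeting.@}
\usage@{hello()@}
\examples@{hello()@}
EOF
R CMD Rdconv -t txt man/hello.Rd
R CMD Rdconv -t html -o hello.html man/hello.Rd
R CMD Rdconv -t example -o hello-Ex.R man/hello.Rd
@end group
@end example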
| |
| @node Editing Rd files, , Processing documentation files, Writing R documentation files |
| @section Editing Rd files |
| @cindex Editing Rd files |
| |
It can be very helpful to prepare @file{.Rd} files using an editor which
| knows about their syntax and will highlight commands, indent to show the |
| structure and detect mis-matched braces, and so on. |
| |
| The system most commonly used for this is some version of |
| @command{Emacs} (including @command{XEmacs}) with the @acronym{ESS} |
package (@uref{https://ESS.R-project.org/}: it is often installed with
| @command{Emacs} but may need to be loaded, or even installed, |
| separately). |
| |
| Another is the Eclipse IDE with the Stat-ET plugin |
| (@uref{http://www.walware.de/goto/statet}), and (on Windows only) |
| Tinn-R (@uref{http://sourceforge.net/@/projects/@/tinn-r/}). |
| |
People have also used @LaTeX{} mode in an editor, as @file{.Rd} files are
| rather similar to @LaTeX{} files. |
| |
| Some @R{} front-ends provide editing support for @file{.Rd} files, for |
| example RStudio (@uref{https://www.rstudio.com/}). |
| |
| @node Tidying and profiling R code, Debugging, Writing R documentation files, Top |
| @chapter Tidying and profiling R code |
| |
| @menu |
| * Tidying R code:: |
| * Profiling R code for speed:: |
| * Profiling R code for memory use:: |
| * Profiling compiled code:: |
| @end menu |
| |
| @R{} code which is worth preserving in a package and perhaps making |
| available for others to use is worth documenting, tidying up and perhaps |
| optimizing. The last two of these activities are the subject of this |
| chapter. |
| |
| @node Tidying R code, Profiling R code for speed, Tidying and profiling R code, Tidying and profiling R code |
| @section Tidying R code |
| @cindex Tidying R code |
| |
| @R{} treats function code loaded from packages and code entered by users |
| differently. By default code entered by users has the source code stored |
| internally, and when the function is listed, the original source is |
| reproduced. Loading code from a package (by default) discards the |
| source code, and the function listing is re-created from the parse tree |
| of the function. |
| |
| Normally keeping the source code is a good idea, and in particular it |
| avoids comments being removed from the source. However, we can make |
| use of the ability to re-create a function listing from its parse tree |
| to produce a tidy version of the function, for example with consistent |
| indentation and spaces around operators. If the original source |
| does not follow the standard format this tidied version can be much |
| easier to read. |
| |
| We can subvert the keeping of source in two ways. |
| |
| @enumerate |
| @item |
| The option @code{keep.source} can be set to @code{FALSE} before the code |
| is loaded into @R{}. |
| @item |
| The stored source code can be removed by calling the @code{removeSource()} |
| function, for example by |
| |
| @example |
| myfun <- removeSource(myfun) |
| @end example |
| |
| @end enumerate |
| |
| @noindent |
| In each case if we then list the function we will get the standard |
| layout. |
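| |
| The difference can be seen on a small scale by parsing a function with |
| its source kept. The following is a minimal sketch in which the |
| function text and its comment are made up for illustration. The |
| @code{srcref} attribute holds the original text. After |
| @code{removeSource}, the function can only be re-created from its parse |
| tree, so the comment disappears. |
| |
| @example |
| src <- c("function(x)", |
|          "    # add one to the argument", |
|          "    x+1") |
| ## parse with keep.source = TRUE so that a srcref is recorded |
| f <- eval(parse(text = src, keep.source = TRUE)[[1]]) |
| deparse(f, control = "useSource")  # the original text, with the comment |
| g <- removeSource(f) |
| is.null(attr(g, "srcref"))         # the stored source has gone |
| deparse(g)                         # re-created in the standard layout |
| @end example |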
| |
| Suppose we have a file of functions @file{myfuns.R} that we want to |
| tidy up. Create a file @file{tidy.R} containing |
| |
| @example |
| @group |
| source("myfuns.R", keep.source = FALSE) |
| dump(ls(all.names = TRUE), file = "new.myfuns.R") |
| @end group |
| @end example |
| |
| @noindent |
| and run @R{} with this as the source file, for example by @kbd{R |
| --vanilla < tidy.R} or by pasting into an @R{} session. Then the file |
| @file{new.myfuns.R} will contain the functions in alphabetical order in |
| the standard layout. Warning: comments in your functions will be lost. |
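| |
| The same steps can also be carried out from within a running @R{} |
| session. The sketch below is only an illustration. It writes a |
| throw-away @file{myfuns.R}, containing a made-up function @code{sq}, to |
| the temporary directory. It then sources that file into a separate |
| environment, so that only its own objects are dumped, and reads back the |
| tidied file. |
| |
| @example |
| src <- file.path(tempdir(), "myfuns.R") |
| out <- file.path(tempdir(), "new.myfuns.R") |
| writeLines(c("sq <- function(x)", |
|              "      # square the argument", |
|              "      x^2"), src) |
| e <- new.env() |
| source(src, local = e, keep.source = FALSE)  # discard the original text |
| ## envir = e makes dump() look the names up in e, not in the workspace |
| dump(ls(e, all.names = TRUE), file = out, envir = e) |
| readLines(out)   # standard layout; the comment has been lost |
| @end example |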
| |
| The standard format provides a good starting point for further tidying. |
| Although the deparsing cannot do so, we recommend the consistent use of |
| the preferred assignment operator @samp{<-} (rather than @samp{=}) for |
| assignment. Many package authors use a version of Emacs (on a |
| Unix-alike or Windows) to edit @R{} code, using the ESS[S] mode of the |
| @acronym{ESS} Emacs package. See @ref{R coding standards, , R coding |
| standards, R-ints, R Internals} for style options within the ESS[S] mode |
| recommended for the source code of @R{} itself. |
| |
| |
| |
| @node Profiling R code for speed, Profiling R code for memory use, Tidying R code, Tidying and profiling R code |
| @section Profiling R code for speed |
| @cindex Profiling |
| @findex Rprof |
| |
| It is possible to profile @R{} code on Windows and most@footnote{@R{} |
| has to be built to enable this, but the option |
| @option{--enable-R-profiling} is the default.} Unix-alike versions of |
| @R{}. |
| |
| The function @code{Rprof} is used to control profiling, and its help |
| page can be consulted for full details. Profiling works by recording |
| at fixed intervals@footnote{For Unix-alikes these are intervals of CPU |
| time, and for Windows of elapsed time.} (by default every 20 msecs) |
| which line in which @R{} function is being used, and recording the |
| results in a file (default @file{Rprof.out} in the working directory). |
| Then the function @code{summaryRprof} or the command-line utility |
| @code{R CMD Rprof @var{Rprof.out}} can be used to summarize the |
| activity. |
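| |
| A minimal sketch of that workflow follows, using only base @R{} and |
| @pkg{utils}. The loop is an arbitrary piece of busy work, and the |
| output goes to a temporary file rather than @file{Rprof.out}. The |
| 10 msec sampling interval is chosen only to collect more samples from a |
| short run. |
| |
| @example |
| prof <- tempfile(fileext = ".out") |
| Rprof(prof, interval = 0.01)    # start sampling every 10 msecs |
| x <- 0 |
| for (i in 1:1000) x <- x + sum(sort(runif(1e4))) |
| Rprof(NULL)                     # stop profiling |
| s <- summaryRprof(prof) |
| head(s$by.self)                 # functions ranked by time spent in them alone |
| s$sampling.time                 # seconds covered by the samples |
| @end example |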
| |
| As an example, consider the following code (from Venables & Ripley, |
| 2002, pp. 225--6). |
| |
| @smallexample |
| @group |
| library(MASS); library(boot) |
| storm.fm <- nls(Time ~ b*Viscosity/(Wt - c), stormer, |
| start = c(b=30.401, c=2.2183)) |
| st <- cbind(stormer, fit=fitted(storm.fm)) |
| storm.bf <- function(rs, i) @{ |
| st$Time <- st$fit + rs[i] |
| tmp <- nls(Time ~ (b * Viscosity)/(Wt - c), st, |
| start = coef(storm.fm)) |
| tmp$m$getAllPars() |
| @} |
| rs <- scale(resid(storm.fm), scale = FALSE) # remove the mean |
| Rprof("boot.out") |
| storm.boot <- boot(rs, storm.bf, R = 4999) # slow enough to profile |
| Rprof(NULL) |
| @end group |
| @end smallexample |
| |
| @noindent |
| Having run this we can summarize the results by |
| |
| @smallexample |
| @group |
| R CMD Rprof boot.out |
| |
| Each sample represents 0.02 seconds. |
| Total run time: 22.52 seconds. |
| |
| Total seconds: time spent in function and callees. |
| Self seconds: time spent in function alone. |
| @end group |
| |
| @group |
| % total % self |
| total seconds self seconds name |
| 100.0 25.22 0.2 0.04 "boot" |
| 99.8 25.18 0.6 0.16 "statistic" |
| 96.3 24.30 4.0 1.02 "nls" |
| 33.9 8.56 2.2 0.56 "<Anonymous>" |
| 32.4 8.18 1.4 0.36 "eval" |
| 31.8 8.02 1.4 0.34 ".Call" |
| 28.6 7.22 0.0 0.00 "eval.parent" |
| 28.5 7.18 0.3 0.08 "model.frame" |
| 28.1 7.10 3.5 0.88 "model.frame.default" |
| 17.4 4.38 0.7 0.18 "sapply" |
| 15.0 3.78 3.2 0.80 "nlsModel" |
| 12.5 3.16 1.8 0.46 "lapply" |
| 12.3 3.10 2.7 0.68 "assign" |
| ... |
| @end group |
| |
| @group |
| % self % total |
| self seconds total seconds name |
| 5.7 1.44 7.5 1.88 "inherits" |
| 4.0 1.02 96.3 24.30 "nls" |
| 3.6 0.92 3.6 0.92 "$" |
| 3.5 0.88 28.1 7.10 "model.frame.default" |
| 3.2 0.80 15.0 3.78 "nlsModel" |
| 2.8 0.70 9.8 2.46 "qr.coef" |
| 2.7 0.68 12.3 3.10 "assign" |
| 2.5 0.64 2.5 0.64 ".Fortran" |
| 2.5 0.62 7.1 1.80 "qr.default" |
| 2.2 0.56 33.9 8.56 "<Anonymous>" |
| 2.1 0.54 5.9 1.48 "unlist" |
| 2.1 0.52 7.9 2.00 "FUN" |
| ... |
| @end group |
| @end smallexample |
| |
| @noindent |
| This often produces |
| surprising results and can be used to identify bottlenecks or pieces of |
| @R{} code that could benefit from being replaced by compiled code. |
| |
| Two warnings: profiling does impose a small performance penalty, and the |
| output files can be very large if long runs are profiled at the default |
| sampling interval. |
| |
| Profiling short runs can sometimes give misleading results. @R{} from |
| time to time performs @emph{garbage collection} to reclaim unused |
| memory, and this takes an appreciable amount of time which profiling |
| will charge to whichever function happens to provoke it. It may be |
| useful to compare profiling code immediately after a call to @code{gc()} |
| with a profiling run without a preceding call to @code{gc}. |
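| |
| One way to make that comparison is sketched below. The function |
| @code{work} is a made-up workload. The @code{gc.profiling = TRUE} |
| argument of @code{Rprof} additionally marks samples taken during |
| garbage collection, which should then be reported under the name |
| @code{<GC>}. The figures will differ from run to run. |
| |
| @example |
| work <- function(n) sum(vapply(seq_len(n), function(i) sum(rnorm(500)), 0)) |
| p1 <- tempfile(); p2 <- tempfile() |
| invisible(gc())                   # profile 1 starts after a collection |
| Rprof(p1, interval = 0.01, gc.profiling = TRUE) |
| w1 <- work(20000) |
| Rprof(NULL) |
| Rprof(p2, interval = 0.01, gc.profiling = TRUE)   # no gc() beforehand |
| w2 <- work(20000) |
| Rprof(NULL) |
| head(summaryRprof(p1)$by.self) |
| head(summaryRprof(p2)$by.self) |
| @end example |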
| |
| More detailed analysis of the output can be achieved by the tools in the |
| @acronym{CRAN} packages @CRANpkg{proftools} and @CRANpkg{profr}: in |
| particular these allow call graphs to be studied. |
| |
| @node Profiling R code for memory use, Profiling compiled code, Profiling R code for speed, Tidying and profiling R code |
| @section Profiling R code for memory use |
| @cindex Profiling |
| @cindex Memory use |
| |
| Measuring memory use in @R{} code is useful either when the code takes |
| more memory than is conveniently available or when memory allocation and |
| copying of objects is responsible for slow code. There are three ways to |
| profile memory use over time in @R{} code. All three require @R{} to |
| have been compiled with @option{--enable-memory-profiling}, which is not |
| the default, but is currently used for the macOS and Windows binary |
| distributions. All can be misleading, for different reasons. |
| |
| In understanding the memory profiles it is useful to know a little more |
| about @R{}'s memory allocation. Looking at the results of @code{gc()} |
| shows a division of memory into @code{Vcells} used to store the contents |
| of vectors and @code{Ncells} used to store everything else, including |
| all the administrative overhead for vectors such as type and length |
| information. In fact the vector contents are divided into two |
| pools. Memory for small vectors (by default 128 bytes or less) is |
| obtained in large chunks and then parcelled out by @R{}; memory for |
| larger vectors is obtained directly from the operating system. |
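| |
| The two kinds of cell can be seen in the matrix returned by @code{gc()}. |
| In this sketch the vector @code{v} is an arbitrary example. A double |
| vector of a million elements holds 8,000,000 bytes of data, well above |
| the small-vector limit. Each @code{Vcell} is 8 bytes, so the |
| @code{Vcells} count should rise by about a million. |
| |
| @example |
| g0 <- gc() |
| g0                     # one row for Ncells, one for Vcells |
| v <- numeric(1e6)      # a large vector |
| g1 <- gc() |
| g1["Vcells", "used"] - g0["Vcells", "used"]   # roughly 1e6 |
| @end example |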
| |
| Some memory allocation is obvious in interpreted code, for example, |
| |
| @smallexample |
| y <- x + 1 |
| @end smallexample |
| |
| @noindent |
| allocates memory for a new vector @code{y}. Other memory allocation is |
| less obvious and occurs because @R{} is forced to make good on its |
| promise of `call-by-value' argument passing. When an argument is |
| passed to a function it is not immediately copied. Copying occurs (if |
| necessary) only when the argument is modified. This can lead to |
| surprising memory use. For example, in the @CRANpkg{survey} package we have |
| |
| @smallexample |
| print.svycoxph <- function (x, ...) |
| @{ |
| print(x$survey.design, varnames = FALSE, design.summaries = FALSE, ...) |
| x$call <- x$printcall |
| NextMethod() |
| @} |
| @end smallexample |
| |
| @noindent |
| It may not be obvious that the assignment to @code{x$call} will cause |
| the entire object @code{x} to be copied. This copying to preserve the |
| call-by-value illusion is usually done by the internal C function |
| @code{duplicate}. |
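| |
| The deferred copy can be watched with @code{tracemem} |
| (@pxref{Tracing copies of an object}), provided @R{} was built with |
| memory profiling, which @code{capabilities("profmem")} reports. In this |
| sketch, with an arbitrary vector @code{v}, the assignment @code{w <- v} |
| copies nothing. The duplicate is made only when @code{w} is modified, |
| and @code{v} keeps its original value. |
| |
| @example |
| v <- runif(1e5) |
| traced <- capabilities("profmem") |
| if (traced) tracemem(v)    # prints the address of v |
| w <- v                     # no copy: both names refer to one vector |
| w[1] <- 0                  # modifying w forces a duplicate |
| if (traced) untracemem(v) |
| c(v[1], w[1])              # v is unchanged |
| @end example |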
| |
| The main reason that memory-use profiling is difficult is garbage |
| collection. Memory is allocated at well-defined times in an @R{} |
| program, but is freed whenever the garbage collector happens to run. |
| |
| @menu |
| * Memory statistics from Rprof:: |
| * Tracking memory allocations:: |
| * Tracing copies of an object:: |
| @end menu |
| |
| @node Memory statistics from Rprof, Tracking memory allocations, Profiling R code for memory use, Profiling R code for memory use |
| @subsection Memory statistics from @code{Rprof} |
| @findex Rprof |
| @findex summaryRprof |
| |
| The sampling profiler @code{Rprof} described in the previous section can |
| be given the option @code{memory.profiling=TRUE}. It then writes out the |
| total @R{} memory allocation in small vectors, large vectors, and cons |
| cells or nodes at each sampling interval. It also writes out the number |
| of calls to the internal function @code{duplicate}, which is called to |
| copy @R{} objects. @code{summaryRprof} provides summaries of this |
| information. The main reason that this can be misleading is that the |
| memory use is attributed to the function running at the end of the |
| sampling interval. A second reason is that garbage collection can make |
| the amount of memory in use decrease, so a function appears to use |
| little memory. Running under @code{gctorture} helps with both problems: |
| it slows down the code to effectively increase the sampling frequency |
| and it makes each garbage collection release a smaller amount of memory. |
| Changing the memory limits with @code{mem.limits()} may also be useful, |
| to see how the code would run under different memory conditions. |
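| |
| A sketch of such a run follows, again with an arbitrary loop as the |
| workload. It checks @code{capabilities("profmem")} first, so that on a |
| build without memory profiling it falls back to an ordinary time |
| profile. |
| |
| @example |
| prof <- tempfile() |
| mem <- capabilities("profmem") |
| Rprof(prof, interval = 0.01, memory.profiling = mem) |
| for (i in 1:500) x <- cumsum(runif(1e5)) |
| Rprof(NULL) |
| s <- summaryRprof(prof, memory = if (mem) "both" else "none") |
| head(s$by.self)   # with memory = "both", memory use is reported as well |
| @end example |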
| |
| @node Tracking memory allocations, Tracing copies of an object, Memory statistics from Rprof, Profiling R code for memory use |
| @subsection Tracking memory allocations |
| @findex Rprofmem |
| |
| The second method of memory profiling uses a memory-allocation |
| profiler, @code{Rprofmem()}, which writes out a stack trace to an |
| output file every time a large vector is allocated (with a |
| user-specified threshold for `large') or a new page of memory is |
| allocated for the @R{} heap. Summary functions for this output are still |
| being designed. |
| |
| Running the example from the previous section with |
| |
| @smallexample |
| > Rprofmem("boot.memprof",threshold=1000) |
| > storm.boot <- boot(rs, storm.bf, R = 4999) |
| > Rprofmem(NULL) |
| @end smallexample |
| |
| @noindent |
| shows that apart from some initial and final work in @code{boot} there |
| are no vector allocations over 1000 bytes. |
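| |
| Here is a self-contained sketch of @code{Rprofmem} on two made-up |
| allocations. It is guarded by @code{capabilities("profmem")} because |
| @code{Rprofmem} needs the same build option. With a threshold of 1000 |
| bytes, the first allocation below (80,000 bytes of data) is logged. |
| The second is under the threshold. |
| |
| @example |
| mp <- tempfile() |
| ok <- capabilities("profmem") |
| if (ok) Rprofmem(mp, threshold = 1000) |
| big <- numeric(1e4)     # 80000 bytes of data: above the threshold |
| small <- numeric(10)    # 80 bytes: below the threshold |
| if (ok) Rprofmem(NULL) |
| if (ok) readLines(mp)   # the logged allocations, with their call stacks |
| @end example |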
| |
| @node Tracing copies of an object, , Tracking memory allocations, Profiling R code for memory use |
| @subsection Tracing copies of an object |
| @findex tracemem |
| @findex untracemem |
| |
| The third method of memory profiling involves tracing copies made of a |
| specific (presumably large) @R{} object. Calling @code{tracemem} on an |
| object marks it so that a message is printed to standard output when |
| the object is copied @emph{via} @code{duplicate} or coercion to another type, |
| or when a new object of the same size is created in arithmetic |
| operations. The main reason that this can be misleading is that |
| copying of subsets or components of an object is not tracked. It may |
| be helpful to use @code{tracemem} on these components. |
| |
| |
| In the example above we can run @code{tracemem} on the data frame |
| @code{st}: |
| |
| @smallexample |
| > tracemem(st) |
| [1] "<0x9abd5e0>" |
| > storm.boot <- boot(rs, storm.bf, R = 4) |
| memtrace[0x9abd5e0->0x92a6d08]: statistic boot |
| memtrace[0x92a6d08->0x92a6d80]: $<-.data.frame $<- statistic boot |
| memtrace[0x92a6d80->0x92a6df8]: $<-.data.frame $<- statistic boot |
| memtrace[0x9abd5e0->0x9271318]: statistic boot |
| memtrace[0x9271318->0x9271390]: $<-.data.frame $<- statistic boot |
| memtrace[0x9271390->0x9271408]: $<-.data.frame $<- statistic boot |
| memtrace[0x9abd5e0->0x914f558]: statistic boot |
| memtrace[0x914f558->0x914f5f8]: $<-.data.frame $<- statistic boot |
| memtrace[0x914f5f8->0x914f670]: $<-.data.frame $<- statistic boot |
| memtrace[0x9abd5e0->0x972cbf0]: statistic boot |
| memtrace[0x972cbf0->0x972cc68]: $<-.data.frame $<- statistic boot |
| memtrace[0x972cc68->0x972cd08]: $<-.data.frame $<- statistic boot |
| memtrace[0x9abd5e0->0x98ead98]: statistic boot |
| memtrace[0x98ead98->0x98eae10]: $<-.data.frame $<- statistic boot |
| memtrace[0x98eae10->0x98eae88]: $<-.data.frame $<- statistic boot |
| @end smallexample |
| |
| @noindent |
| The object is duplicated fifteen times, three times for each of the |
| @code{R+1} calls to @code{storm.bf}. This is surprising, since none of |
| the duplications happen inside @code{nls}. Stepping through |
| @code{storm.bf} in the debugger shows that all three happen in the line |
| |
| @smallexample |
| st$Time <- st$fit + rs[i] |
| @end smallexample |
| |
| Data frames are slower than matrices and this is an example of why. |
| Using @code{tracemem(st$Viscosity)} does not reveal any additional |
| copying. |
| |
| @node Profiling compiled code, , Profiling R code for memory use, Tidying and profiling R code |
| @section Profiling compiled code |
| @cindex Profiling |
| |
| Profiling compiled code is highly system-specific, but this section |
| contains some hints gleaned from various @R{} users. Some methods need |
| to be different for a compiled executable and for dynamic/shared |
| libraries/objects as used by @R{} packages. We know of no good way to |
| profile DLLs on Windows. |
| |
| @menu |
| * Linux:: |
| * Solaris:: |
| * macOS:: |
| @end menu |
| |
| @node Linux, Solaris, Profiling compiled code, Profiling compiled code |
| @subsection Linux |
| |
| Options include using @command{sprof} for a shared object, and |
| @command{oprofile} (see @uref{http://oprofile.sourceforge.net/}) and |
| @command{perf} (see |
| @uref{https://perf.wiki.kernel.org/@/index.php/@/Tutorial}) for any |
| executable or shared object. |
| |
| @subsubsection sprof |
| |
| You can select shared objects to be profiled with @command{sprof} by |
| setting the environment variable @env{LD_PROFILE}. For example |
| |
| @example |
| % setenv LD_PROFILE /path/to/R_HOME/library/stats/libs/stats.so |
| R |
| ... run the boot example |
| % sprof /path/to/R_HOME/library/stats/libs/stats.so \ |
| /var/tmp/path/to/R_HOME/library/stats/libs/stats.so.profile |
| |
| Flat profile: |
| |
| Each sample counts as 0.01 seconds. |
| % cumulative self self total |
| time seconds seconds calls us/call us/call name |
| 76.19 0.32 0.32 0 0.00 numeric_deriv |
| 16.67 0.39 0.07 0 0.00 nls_iter |
| 7.14 0.42 0.03 0 0.00 getListElement |
| |
| rm /var/tmp/path/to/R_HOME/library/stats/libs/stats.so.profile |
| ... to clean up ... |
| @end example |
| |
| It is possible that root access is needed to create the directories used |
| for the profile data. |
| |
| @subsubsection oprofile and operf |
| |
| The @command{oprofile} project has two modes of operation. In what is |
| now called `legacy' mode, it uses a daemon to collect information on |
| a process (see below). Since version 0.9.8 (August 2012), the preferred |
| mode is to use @command{operf}, so we discuss that first. The modes |
| differ in how the profiling data is collected: in both, it is analysed |
| by tools such as @command{opreport} and @command{opannotate}. |
| |
| Here is an example on @code{x86_64} Linux using @R{} 3.0.2. File |
| @file{pvec.R} contains part of the examples from @code{pvec} in |
| package @pkg{parallel}: |
| @example |
| library(parallel) |
| N <- 1e6 |
| dates <- sprintf('%04d-%02d-%02d', as.integer(2000+rnorm(N)), |
| as.integer(runif(N, 1, 12)), as.integer(runif(N, 1, 28))) |
| system.time(a <- as.POSIXct(dates, format = "%Y-%m-%d")) |
| @end example |
| @noindent |
| with timings from the final step |
| @example |
| user system elapsed |
| 0.371 0.237 0.612 |
| @end example |
| |
| @R{}-level profiling by @code{Rprof} shows |
| @example |
| self.time self.pct total.time total.pct |
| "strptime" 1.70 41.06 1.70 41.06 |
| "as.POSIXct.POSIXlt" 1.40 33.82 1.42 34.30 |
| "sprintf" 0.74 17.87 0.98 23.67 |
| ... |
| @end example |
| @noindent |
| so the conversion from character to @code{POSIXlt} takes most of the |
| time. |
| |
| This can be run under @command{operf} and analysed by |
| @example |
| operf R -f pvec.R |
| opreport |
| opreport -l /path/to/R_HOME/bin/exec/R |
| opannotate --source /path/to/R_HOME/bin/exec/R |
| ## And for the system time |
| opreport -l /lib64/libc.so.6 |
| @end example |
| @noindent |
| The first report shows where (which library, etc.) the time was spent: |
| @example |
| CPU_CLK_UNHALT...| |
| samples| %| |
| ------------------ |
| 166761 99.9161 Rdev |
| CPU_CLK_UNHALT...| |
| samples| %| |
| ------------------ |
| 70586 42.3276 no-vmlinux |
| 56963 34.1585 libc-2.16.so |
| 36922 22.1407 R |
| 1584 0.9499 stats.so |
| 624 0.3742 libm-2.16.so |
| ... |
| @end example |
| |
| @noindent |
| The rest of the output is voluminous, and only extracts are shown below. |
| |
| Most of the time within @R{} is spent in |
| @example |
| samples % image name symbol name |
| 10397 28.5123 R R_gc_internal |
| 5683 15.5848 R do_sprintf |
| 3036 8.3258 R do_asPOSIXct |
| 2427 6.6557 R do_strptime |
| 2421 6.6392 R Rf_mkCharLenCE |
| 1480 4.0587 R w_strptime_internal |
| 1202 3.2963 R Rf_qnorm5 |
| 1165 3.1948 R unif_rand |
| 675 1.8511 R mktime0 |
| 617 1.6920 R makelt |
| 617 1.6920 R validate_tm |
| 584 1.6015 R day_of_the_week |
| ... |
| @end example |
| @noindent |
| @command{opannotate} shows that 31% of the time in @R{} is spent in |
| @file{memory.c}, 21% in @file{datetime.c} and 7% in @file{Rstrptime.h}. |
| The analysis for @file{libc} showed that calls to @code{wcsftime} |
| dominated, so those calls were cached for @R{} 3.0.3: the time spent in |
| @code{no-vmlinux} (the kernel) was reduced dramatically. |
| |
| On platforms which support it, call graphs can be produced by |
| @command{opreport --callgraph} if collected @emph{via} @command{operf |
| --callgraph}. |
| |
| The profiling data is by default stored in sub-directory |
| @file{oprofile_data} of the current directory, which can be removed at |
| the end of the session. |
| |
| Another example comes from @CRANpkg{sm} version 2.2-5.4, where the |
| example for @code{sm.variogram} took a long time: |
| @example |
| system.time(example(sm.variogram)) |
| ... |
| user system elapsed |
| 5.543 3.202 8.785 |
| @end example |
| |
| @noindent |
| including a lot of system time. Profiling just the slow part, the |
| second plot, showed |
| |
| @example |
| samples| %| |
| ------------------ |
| 381845 99.9885 R |
| CPU_CLK_UNHALT...| |
| samples| %| |
| ------------------ |
| 187484 49.0995 sm.so |
| 169627 44.4230 no-vmlinux |
| 12636 3.3092 libgfortran.so.3.0.0 |
| 6455 1.6905 R |
| @end example |
| |
| @noindent |
| so the system time was almost all in the Linux kernel. It is possible |
| to dig deeper if you have a matching uncompressed kernel with debug |
| symbols to specify @emph{via} @option{--vmlinux}: we did not. |
| |
| In `legacy' mode @code{oprofile} works by running a daemon which |
| collects information. The daemon must be started as root, e.g. |
| |
| @example |
| % su |
| % opcontrol --no-vmlinux |
| % (optional, some platforms) opcontrol --callgraph=5 |
| % opcontrol --start |
| % exit |
| @end example |
| |
| Then as a user |
| |
| @example |
| % R |
| ... run the boot example |
| % opcontrol --dump |
| % opreport -l /path/to/R_HOME/library/stats/libs/stats.so |
| ... |
| samples % symbol name |
| 1623 75.5939 anonymous symbol from section .plt |
| 349 16.2552 numeric_deriv |
| 113 5.2632 nls_iter |
| 62 2.8878 getListElement |
| % opreport -l /path/to/R_HOME/bin/exec/R |
| ... |
| samples % symbol name |
| 76052 11.9912 Rf_eval |
| 54670 8.6198 Rf_findVarInFrame3 |
| 37814 5.9622 Rf_allocVector |
| 31489 4.9649 Rf_duplicate |
| 28221 4.4496 Rf_protect |
| 26485 4.1759 Rf_cons |
| 23650 3.7289 Rf_matchArgs |
| 21088 3.3250 Rf_findFun |
| 19995 3.1526 findVarLocInFrame |
| 14871 2.3447 Rf_evalList |
| 13794 2.1749 R_Newhashpjw |
| 13522 2.1320 R_gc_internal |
| ... |
| @end example |
| |
| Shutting down the profiler and clearing the records needs to be done as |
| root. |
| |
| |
| @node Solaris, macOS, Linux, Profiling compiled code |
| @subsection Solaris |
| |
| On 64-bit (only) Solaris, the standard profiling tool @command{gprof} |
| collects information from shared objects compiled with @option{-pg}. |
| |
| @node macOS, , Solaris, Profiling compiled code |
| @subsection macOS |
| |
| Developers have recommended @command{sample} (or @command{Sampler.app}, |
| which is a GUI version), @command{Shark} (in versions of @code{Xcode} |
| up to those for Snow Leopard), and @command{Instruments} (part of |
| @code{Xcode}, see |
| @uref{https://developer.apple.com/library/content/documentation/DeveloperTools/Conceptual/InstrumentsUserGuide/index.html}). |
| |
| |
| @node Debugging, System and foreign language interfaces, Tidying and profiling R code, Top |
| @chapter Debugging |
| |
| This chapter covers the debugging of @R{} extensions, starting with the |
| ways to get useful error information and moving on to how to deal with |
| errors that crash @R{}. For those who prefer other styles there are |
| contributed packages such as @CRANpkg{debug} on @acronym{CRAN} |
| (described in an article in |
| @uref{https://CRAN.R-project.org/@/doc/@/Rnews/@/Rnews_2003-3.pdf, R-News |
| 3/3}). (There are notes from 2002 provided by Roger Peng at |
| @uref{http://www.biostat.jhsph.edu/@/~rpeng/@/docs/@/R-debug-tools.pdf} |
| which provide complementary examples to those given here.) |
| |
| @menu |
| * Browsing:: |
| * Debugging R code:: |
| * Checking memory access:: |
| * Debugging compiled code:: |
| * Using Link-time Optimization:: |
| @end menu |
| |
| |
| @node Browsing, Debugging R code, Debugging, Debugging |
| @section Browsing |
| |
| @findex browser |
| Most of the @R{}-level debugging facilities are based around the |
| built-in browser. This can be used directly by inserting a call to |
| @code{browser()} into the code of a function (for example, using |
| @code{fix(my_function)}). When code execution reaches that point in |
| the function, control returns to the @R{} console with a special prompt. |
| For example |
| |
| @example |
| > fix(summary.data.frame) ## insert browser() call after for() loop |
| > summary(women) |
| Called from: summary.data.frame(women) |
| Browse[1]> ls() |
| [1] "digits" "i" "lbs" "lw" "maxsum" "nm" "nr" "nv" |
| [9] "object" "sms" "z" |
| Browse[1]> maxsum |
| [1] 7 |
| Browse[1]> |
| height weight |
| Min. :58.0 Min. :115.0 |
| 1st Qu.:61.5 1st Qu.:124.5 |
| Median :65.0 Median :135.0 |
| Mean :65.0 Mean :136.7 |
| 3rd Qu.:68.5 3rd Qu.:148.0 |
| Max. :72.0 Max. :164.0 |
| > rm(summary.data.frame) |
| @end example |
| |
| @noindent |
| At the browser prompt one can enter any @R{} expression, so for example |
| @code{ls()} lists the objects in the current frame, and entering the |
| name of an object will@footnote{With the exceptions of the commands |
| listed below: an object of such a name can be printed @emph{via} an |
| explicit call to @code{print}.} print it. The following commands are |
| also accepted |
| |
| @itemize @bullet |
| @item @code{n} |
| |
| Enter `step-through' mode. In this mode, hitting return executes the |
| next line of code (more precisely one line and any continuation lines). |
| Typing @code{c} will continue to the end of the current context, e.g.@: |
| to the end of the current loop or function. |
| |
| @item @code{c} |
| |
| In normal mode, this quits the browser and continues execution, and just |
| return works in the same way. @code{cont} is a synonym. |
| |
| @item @code{where} |
| |
| This prints the call stack. For example |
| |
| @example |
| > summary(women) |
| Called from: summary.data.frame(women) |
| Browse[1]> where |
| where 1: summary.data.frame(women) |
| where 2: summary(women) |
| |
| Browse[1]> |
| @end example |
| |
| @item @code{Q} |
| |
| Quit both the browser and the current expression, and return to the |
| top-level prompt. |
| @end itemize |
| |
| Errors in code executed at the browser prompt will normally return |
| control to the browser prompt. Objects can be altered by assignment, |
| and will keep their changed values when the browser is exited. If |
| really necessary, objects can be assigned to the workspace from the |
| browser prompt (by using @code{<<-} if the name is not already in |
| scope). |
| |
| @node Debugging R code, Checking memory access, Browsing, Debugging |
| @section Debugging R code |
| |
| @findex traceback |
| Suppose your @R{} program gives an error message. The first thing to |
| find out is what @R{} was doing at the time of the error, and the most |
| useful tool is @code{traceback()}. We suggest that this is run whenever |
| the cause of the error is not immediately obvious. Daily, errors are |
| reported to the @R{} mailing lists as being in some package when |
| @code{traceback()} would show that the error was being reported by some |
| other package or base @R{}. Here is an example from the regression |
| suite. |
| |
| @smallexample |
| > success <- c(13,12,11,14,14,11,13,11,12) |
| > failure <- c(0,0,0,0,0,0,0,2,2) |
| > resp <- cbind(success, failure) |
| > predictor <- c(0, 5^(0:7)) |
| > glm(resp ~ 0+predictor, family = binomial(link="log")) |
| Error: no valid set of coefficients has been found: please supply starting values |
| > traceback() |
| 3: stop("no valid set of coefficients has been found: please supply |
| starting values", call. = FALSE) |
| 2: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, |
| mustart = mustart, offset = offset, family = family, control = control, |
| intercept = attr(mt, "intercept") > 0) |
| 1: glm(resp ~ 0 + predictor, family = binomial(link ="log")) |
| @end smallexample |
| |
| @noindent |
| The calls to the active frames are given in reverse order (starting with |
| the innermost). So we see the error message comes from an explicit |
| check in @code{glm.fit}. (@code{traceback()} shows you all the lines of |
| the function calls, which can be limited by setting the option |
| @code{deparse.max.lines}.) |
| |
| Sometimes the traceback will indicate that the error was detected inside |
| compiled code, for example (from @code{?nls}) |
| |
| @smallexample |
| Error in nls(y ~ a + b * x, start = list(a = 0.12345, b = 0.54321), trace = TRUE) : |
| step factor 0.000488281 reduced below 'minFactor' of 0.000976563 |
| > traceback() |
| 2: .Call(R_nls_iter, m, ctrl, trace) |
| 1: nls(y ~ a + b * x, start = list(a = 0.12345, b = 0.54321), trace = TRUE) |
| @end smallexample |
| |
| @noindent |
| This will be the case if the innermost call is to @code{.C}, |
| @code{.Fortran}, @code{.Call}, @code{.External} or @code{.Internal}, but |
| as it is also possible for such code to evaluate @R{} expressions, this |
| need not be the innermost call, as in |
| |
| @smallexample |
| > traceback() |
| 9: gm(a, b, x) |
| 8: .Call(R_numeric_deriv, expr, theta, rho, dir) |
| 7: numericDeriv(form[[3]], names(ind), env) |
| 6: getRHS() |
| 5: assign("rhs", getRHS(), envir = thisEnv) |
| 4: assign("resid", .swts * (lhs - assign("rhs", getRHS(), envir = thisEnv)), |
| envir = thisEnv) |
| 3: function (newPars) |
| @{ |
| setPars(newPars) |
| assign("resid", .swts * (lhs - assign("rhs", getRHS(), envir = thisEnv)), |
| envir = thisEnv) |
| assign("dev", sum(resid^2), envir = thisEnv) |
| assign("QR", qr(.swts * attr(rhs, "gradient")), envir = thisEnv) |
| return(QR$rank < min(dim(QR$qr))) |
| @}(c(-0.00760232418963883, 1.00119632515036)) |
| 2: .Call(R_nls_iter, m, ctrl, trace) |
| 1: nls(yeps ~ gm(a, b, x), start = list(a = 0.12345, b = 0.54321)) |
| @end smallexample |
| |
| Occasionally @code{traceback()} does not help, and this can be the case |
| if S4 method dispatch is involved. Consider the following example |
| |
| @example |
| > xyd <- new("xyloc", x=runif(20), y=runif(20)) |
| Error in as.environment(pkg) : no item called "package:S4nswv" |
| on the search list |
| Error in initialize(value, ...) : S language method selection got |
| an error when called from internal dispatch for function 'initialize' |
| > traceback() |
| 2: initialize(value, ...) |
| 1: new("xyloc", x = runif(20), y = runif(20)) |
| @end example |
| |
| @noindent |
| which does not help much, as there is no call to @code{as.environment} |
| in @code{initialize} (and the note ``called from internal dispatch'' |
| tells us so). In this case we searched the @R{} sources for the quoted |
| call, which occurred in only one place, |
| @code{methods:::.asEnvironmentPackage}. So now we knew where the |
| error was occurring. (This was an unusually opaque example.) |
| |
| The error message |
| |
| @example |
| evaluation nested too deeply: infinite recursion / options(expressions=)? |
| @end example |
| |
| @noindent |
| can be hard to handle with the default value (5000). Unless you know |
| that there actually is deep recursion going on, it can help to set |
| something like |
| |
| @example |
| options(expressions=500) |
| @end example |
| |
| @noindent |
| and re-run the example showing the error. |
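| |
| The following sketch shows the effect, using a made-up function |
| @code{f} that recurses without ever stopping. Under the lower limit the |
| error appears after at most a few hundred nested calls. @code{tryCatch} |
| is used here only so that the message can be inspected. |
| |
| @example |
| old <- options(expressions = 500) |
| f <- function(n) f(n + 1)          # no stopping rule |
| msg <- tryCatch(f(1), error = conditionMessage) |
| options(old)                       # restore the previous limit |
| msg |
| @end example |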
| |
| Sometimes there is a warning that is clearly the precursor to some later |
| error, but it is not obvious where it is coming from. Setting |
| @code{options(warn = 2)} (which turns warnings into errors) can help here. |
| |
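| The following small sketch uses a coercion warning as a stand-in for |
| the real one. Once converted, the warning is an ordinary error, so |
| @code{traceback()} and the other tools described here can be applied |
| to it. |
| |
| @example |
| old <- options(warn = 2) |
| res <- tryCatch(as.integer("x"), error = conditionMessage) |
| options(old)   # restore the previous setting |
| res            # notes that the error was converted from a warning |
| @end example |
| |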
| Once we have located the error, we have some choices. One way to proceed |
| is to find out more about what was happening at the time of the crash by |
| looking at a @emph{post-mortem} dump. To do so, set |
| @findex dump.frames |
| @code{options(error = dump.frames)} and run the code again. Then invoke |
| @code{debugger()} and explore the dump. Continuing our example: |
| |
| @smallexample |
| > options(error = dump.frames) |
| > glm(resp ~ 0 + predictor, family = binomial(link ="log")) |
| Error: no valid set of coefficients has been found: please supply starting values |
| @end smallexample |
| |
| @noindent |
| which is the same as before, but an object called @code{last.dump} has |
| appeared in the workspace. (Such objects can be large, so remove the |
| dump when it is no longer needed.) We can examine this at a later time by |
| calling the function @code{debugger}. |
| @findex debugger |
| |
| @smallexample |
| > debugger() |
| Message: Error: no valid set of coefficients has been found: please supply starting values |
| Available environments had calls: |
| 1: glm(resp ~ 0 + predictor, family = binomial(link = "log")) |
| 2: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, mus |
| 3: stop("no valid set of coefficients has been found: please supply starting values |
| Enter an environment number, or 0 to exit Selection: |
| @end smallexample |
| |
| @noindent |
| which gives the same sequence of calls as @code{traceback}, but in |
| outer-first order and with only the first line of the call, truncated to |
| the current width. However, we can now examine in more detail what was |
| happening at the time of the error. Selecting an environment opens the |
| browser in that frame. So we select the function call which spawned the |
| error message, and explore some of the variables (and execute two |
| function calls). |
| |
| @smallexample |
| Enter an environment number, or 0 to exit Selection: 2 |
| Browsing in the environment with call: |
| glm.fit(x = X, y = Y, weights = weights, start = start, etas |
| Called from: debugger.look(ind) |
| Browse[1]> ls() |
| [1] "aic" "boundary" "coefold" "control" "conv" |
| [6] "dev" "dev.resids" "devold" "EMPTY" "eta" |
| [11] "etastart" "family" "fit" "good" "intercept" |
| [16] "iter" "linkinv" "mu" "mu.eta" "mu.eta.val" |
| [21] "mustart" "n" "ngoodobs" "nobs" "nvars" |
| [26] "offset" "start" "valideta" "validmu" "variance" |
| [31] "varmu" "w" "weights" "x" "xnames" |
| [36] "y" "ynames" "z" |
| Browse[1]> eta |
| 1 2 3 4 5 |
| 0.000000e+00 -2.235357e-06 -1.117679e-05 -5.588393e-05 -2.794197e-04 |
| 6 7 8 9 |
| -1.397098e-03 -6.985492e-03 -3.492746e-02 -1.746373e-01 |
| Browse[1]> valideta(eta) |
| [1] TRUE |
| Browse[1]> mu |
| 1 2 3 4 5 6 7 8 |
| 1.0000000 0.9999978 0.9999888 0.9999441 0.9997206 0.9986039 0.9930389 0.9656755 |
| 9 |
| 0.8397616 |
| Browse[1]> validmu(mu) |
| [1] FALSE |
| Browse[1]> c |
| Available environments had calls: |
| 1: glm(resp ~ 0 + predictor, family = binomial(link = "log")) |
| 2: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart |
| 3: stop("no valid set of coefficients has been found: please supply starting v |
| |
| Enter an environment number, or 0 to exit Selection: 0 |
| > rm(last.dump) |
| @end smallexample |
| |
| Because @code{last.dump} can be looked at later or even in another @R{} |
| session, post-mortem debugging is possible even for batch usage of @R{}. |
| We do need to arrange for the dump to be saved: this can be done either |
| using the command-line flag @option{--save} to save the workspace at the |
| end of the run, or @emph{via} a setting such as |
| |
| @example |
| > options(error = quote(@{dump.frames(to.file=TRUE); q()@})) |
| @end example |
| |
| @noindent |
| See the help on @code{dump.frames} for further options and a worked |
| example. |
| |
| @findex recover |
| An alternative error action is to use the function @command{recover()}: |
| |
| @smallexample |
| > options(error = recover) |
| > glm(resp ~ 0 + predictor, family = binomial(link = "log")) |
| Error: no valid set of coefficients has been found: please supply starting values |
| |
| Enter a frame number, or 0 to exit |
| |
| 1: glm(resp ~ 0 + predictor, family = binomial(link = "log")) |
| 2: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart |
| |
| Selection: |
| @end smallexample |
| |
| @noindent |
| which is very similar to @code{dump.frames}. However, we can examine |
| the state of the program directly, without dumping and re-loading the |
| dump. As its help page says, @code{recover} can be routinely used as |
| the error action in place of @code{dump.calls} and @code{dump.frames}, |
| since it behaves like @code{dump.frames} in non-interactive use. |
| |
| |
| @findex debug |
| Post-mortem debugging is good for finding out exactly what went wrong, |
| but not necessarily why. An alternative approach is to take a closer |
| look at what was happening just before the error, and a good way to do |
| that is to use @command{debug}. This inserts a call to the browser |
| at the beginning of the function, starting in step-through mode. So in |
| our example we could use |
| |
| @smallexample |
| > debug(glm.fit) |
| > glm(resp ~ 0 + predictor, family = binomial(link ="log")) |
| debugging in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, |
| mustart = mustart, offset = offset, family = family, control = control, |
| intercept = attr(mt, "intercept") > 0) |
| debug: @{ |
| ## lists the whole function |
| Browse[1]> |
| debug: x <- as.matrix(x) |
| ... |
| Browse[1]> start |
| [1] -2.235357e-06 |
| debug: eta <- drop(x %*% start) |
| Browse[1]> eta |
| 1 2 3 4 5 |
| 0.000000e+00 -2.235357e-06 -1.117679e-05 -5.588393e-05 -2.794197e-04 |
| 6 7 8 9 |
| -1.397098e-03 -6.985492e-03 -3.492746e-02 -1.746373e-01 |
| Browse[1]> |
| debug: mu <- linkinv(eta <- eta + offset) |
| Browse[1]> mu |
| 1 2 3 4 5 6 7 8 |
| 1.0000000 0.9999978 0.9999888 0.9999441 0.9997206 0.9986039 0.9930389 0.9656755 |
| 9 |
| 0.8397616 |
| @end smallexample |
| |
| @noindent |
| (The prompt @code{Browse[1]>} indicates that this is the first level of |
| browsing: it is possible to step into another function that is itself |
| being debugged or contains a call to @code{browser()}.) |
| |
| @code{debug} can be used for hidden functions and S3 methods by |
| e.g.@: @code{debug(stats:::predict.Arima)}. (It cannot be used for S4 |
| methods, but an alternative is given on the help page for @code{debug}.) |
| Sometimes you want to debug a function defined inside another function, |
| e.g.@: the function @code{arimafn} defined inside @code{arima}. To do so, |
| set @code{debug} on the outer function (here @code{arima}) and |
| step through it until the inner function has been defined. Then |
| call @code{debug} on the inner function (and use @code{c} to get out of |
| step-through mode in the outer function). |
| |
| @findex undebug |
| To remove debugging of a function, call @code{undebug} with the argument |
| previously given to @code{debug}; debugging otherwise lasts for the rest |
| of the @R{} session (or until the function is edited or otherwise |
| replaced). |
| |
| @findex trace |
| @code{trace} can be used to temporarily insert debugging code into a |
| function, for example to insert a call to @code{browser()} just before |
the point of the error. To return to our running example:
| |
| @example |
| ## first get a numbered listing of the expressions of the function |
| > page(as.list(body(glm.fit)), method="print") |
| > trace(glm.fit, browser, at=22) |
| Tracing function "glm.fit" in package "stats" |
| [1] "glm.fit" |
| > glm(resp ~ 0 + predictor, family = binomial(link ="log")) |
| Tracing glm.fit(x = X, y = Y, weights = weights, start = start, |
| etastart = etastart, .... step 22 |
| Called from: eval(expr, envir, enclos) |
| Browse[1]> n |
| ## and single-step from here. |
| > untrace(glm.fit) |
| @end example |
| @noindent |
| For your own functions, it may be as easy to use @code{fix} to insert |
| temporary code, but @code{trace} can help with functions in a namespace |
| (as can @code{fixInNamespace}). Alternatively, use |
| @code{trace(,edit=TRUE)} to insert code visually. |
| |
| |
| @node Checking memory access, Debugging compiled code, Debugging R code, Debugging |
| @section Checking memory access |
| |
| Errors in memory allocation and reading/writing outside arrays are very |
common causes of crashes (e.g.@: segfaults) on some machines. Often
| the crash appears long after the invalid memory access: in particular |
| damage to the structures which @R{} itself has allocated may only become |
| apparent at the next garbage collection (or even at later garbage |
| collections after objects have been deleted). |
| |
| Note that memory access errors may be seen with LAPACK, BLAS, OpenMP and |
| Java-using packages: some at least of these seem to be intentional, and |
| some are related to passing characters to Fortran. |
| |
| Some of these tools can detect mismatched allocation and deallocation. |
| C++ programmers should note that memory allocated by @code{new []} must |
| be freed by @code{delete []}, other uses of @code{new} by @code{delete}, |
| and memory allocated by @code{malloc}, @code{calloc} and @code{realloc} |
| by @code{free}. Some platforms will tolerate mismatches (perhaps with |
| memory leaks) but others will segfault. |
| |
| @menu |
| * Using gctorture:: |
| * Using valgrind:: |
| * Using Address Sanitizer:: |
| * Using Undefined Behaviour Sanitizer:: |
| * Other analyses with `clang':: |
| * Using `Dr. Memory':: |
| * Fortran array bounds checking:: |
| @end menu |
| |
| @node Using gctorture, Using valgrind, Checking memory access, Checking memory access |
| @subsection Using gctorture |
| |
| @findex gctorture |
| We can help to detect memory problems in @R{} objects earlier by running |
| garbage collection as often as possible. This is achieved by |
| @code{gctorture(TRUE)}, which as described on its help page |
| |
| @quotation |
| Provokes garbage collection on (nearly) every memory allocation. |
| Intended to ferret out memory protection bugs. Also makes @R{} run |
| @emph{very} slowly, unfortunately. |
| @end quotation |
| |
| @noindent |
The reference to `memory protection' is to C-level calls to
@code{PROTECT}/@code{UNPROTECT} (@pxref{Garbage Collection}) which, if
missing, allow @R{} objects to be garbage-collected while they are still
in use. But it can also help with other memory-related errors.
| |
Normally running under @code{gctorture(TRUE)} will just produce a crash
earlier in the @R{} program, hopefully close to the actual cause.
@xref{Debugging compiled code}, for how to decipher such crashes.
| |
| It is possible to run all the examples, tests and vignettes covered by |
| @code{R CMD check} under @code{gctorture(TRUE)} by using the option |
| @option{--use-gct}. |
| |
| The function @code{gctorture2} provides more refined control over the GC |
| torture process. Its arguments @code{step}, @code{wait} and |
| @code{inhibit_release} are documented on its help page. Environment |
| variables can also be used at the start of the @R{} session to turn on |
| GC torture: @env{R_GCTORTURE} corresponds to the @code{step} argument to |
| @code{gctorture2}, @env{R_GCTORTURE_WAIT} to @code{wait}, and |
| @env{R_GCTORTURE_INHIBIT_RELEASE} to @code{inhibit_release}. |
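As an illustration only, the environment variables can be set for a
single command from a POSIX shell without changing the calling shell's
environment. The helper name @code{gct_run} is hypothetical and not
part of @R{}; it simply exports @env{R_GCTORTURE} and
@env{R_GCTORTURE_WAIT} before running whatever command it is given.

```shell
#!/bin/sh
# Hypothetical helper: run a command (typically R) with GC torture
# turned on through the environment variables described above.
#   $1  value for R_GCTORTURE (the 'step' argument of gctorture2)
#   $2  value for R_GCTORTURE_WAIT (the 'wait' argument)
#   remaining arguments: the command to run
# The body is a subshell, so the exports do not leak into the caller.
gct_run() (
    R_GCTORTURE=$1
    R_GCTORTURE_WAIT=$2
    export R_GCTORTURE R_GCTORTURE_WAIT
    shift 2
    exec "$@"
)
```

For example, @code{gct_run 100 0 R --vanilla -f mytests.R}, where
@file{mytests.R} is a placeholder, would request a GC on every 100th
allocation from the start of the session.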
| |
If @R{} is configured with @option{--enable-strict-barrier} then a
variety of tests for the integrity of the write barrier are enabled. In
addition, tests to help detect protection issues are enabled:
| |
| @itemize @bullet |
| |
| @item |
| All GCs are full GCs. |
| |
| @item |
| New nodes in small node pages are marked as @code{NEWSXP} on creation. |
| |
| @item |
| After a GC all free nodes that are not of type @code{NEWSXP} are marked |
| as type @code{FREESXP} and their previous type is recorded. |
| |
| @item |
| Most calls to accessor functions check their @code{SEXP} inputs and |
| @code{SEXP} outputs and signal an error if a @code{FREESXP} is found. |
| The address of the node and the old type are included in the error |
| message. |
| |
| @end itemize |
| |
| @code{R CMD check --use-gct} can be set to use |
| @code{gctorture2(@var{n})} rather than @code{gctorture(TRUE)} by setting |
| environment variable @env{_R_CHECK_GCT_N_} to a positive integer value |
| to be used as @code{@var{n}}. |
| |
| Used with a debugger and with @code{gctorture} or @code{gctorture2} this |
| mechanism can be helpful in isolating memory protect problems. |
| |
| |
| @node Using valgrind, Using Address Sanitizer, Using gctorture, Checking memory access |
| @subsection Using valgrind |
| |
| If you have access to Linux on a common CPU type or supported versions |
| of macOS or Solaris you can use @code{valgrind} |
| (@uref{http://www.valgrind.org/}, pronounced to rhyme with `tinned') to |
| check for possible problems. To run some examples under @code{valgrind} |
| use something like |
| |
| @example |
| R -d valgrind --vanilla < mypkg-Ex.R |
| R -d "valgrind --tool=memcheck --leak-check=full" --vanilla < mypkg-Ex.R |
| @end example |
| |
| @noindent |
| where @file{mypkg-Ex.R} is a set of examples, e.g.@: the file created in |
@file{mypkg.Rcheck} by @code{R CMD check}. Occasionally this reports
memory reads of `uninitialised values' that are the result of compiler
optimization, so it can be worth checking under an unoptimized compile: for
maximal information use a build with debugging symbols. We know there
will be some small memory leaks from @code{readline} and @R{} itself:
these are memory areas that are in use right up to the end of the @R{}
session. Expect this to run around 20x slower than without
@code{valgrind}, and in some cases much slower than that. Several
versions of @code{valgrind} were not happy with some optimized BLASes
that use @acronym{CPU}-specific instructions, so you may need to build a
version of @R{} specifically to use with @code{valgrind}.
| |
| On platforms where @code{valgrind} is installed you can build a version |
| of @R{} with extra instrumentation to help @code{valgrind} detect errors |
| in the use of memory allocated from the @R{} heap. The |
| @command{configure} option is |
| @option{--with-valgrind-instrumentation=@var{level}}, where @var{level} |
| is 0, 1 or 2. Level 0 is the default and does not add anything. |
| Level 1 will detect some uses@footnote{Those in some numeric, logical, |
| integer, raw, complex vectors and in memory allocated by |
| @code{R_alloc}.} of uninitialised memory and has little impact on speed |
| (compared to level 0). Level 2 will detect many other memory-use |
| bugs@footnote{including using the data sections of @R{} vectors after |
they are freed.} but makes @R{} much slower when running under
| @code{valgrind}. Using this in conjunction with @code{gctorture} can be |
| even more effective (and even slower). |
| |
| An example of @code{valgrind} output is |
| @smallexample |
| ==12539== Invalid read of size 4 |
| ==12539== at 0x1CDF6CBE: csc_compTr (Mutils.c:273) |
| ==12539== by 0x1CE07E1E: tsc_transpose (dtCMatrix.c:25) |
| ==12539== by 0x80A67A7: do_dotcall (dotcode.c:858) |
| ==12539== by 0x80CACE2: Rf_eval (eval.c:400) |
| ==12539== by 0x80CB5AF: R_execClosure (eval.c:658) |
| ==12539== by 0x80CB98E: R_execMethod (eval.c:760) |
| ==12539== by 0x1B93DEFA: R_standardGeneric (methods_list_dispatch.c:624) |
| ==12539== by 0x810262E: do_standardGeneric (objects.c:1012) |
| ==12539== by 0x80CAD23: Rf_eval (eval.c:403) |
| ==12539== by 0x80CB2F0: Rf_applyClosure (eval.c:573) |
| ==12539== by 0x80CADCC: Rf_eval (eval.c:414) |
| ==12539== by 0x80CAA03: Rf_eval (eval.c:362) |
| ==12539== Address 0x1C0D2EA8 is 280 bytes inside a block of size 1996 alloc'd |
| ==12539== at 0x1B9008D1: malloc (vg_replace_malloc.c:149) |
| ==12539== by 0x80F1B34: GetNewPage (memory.c:610) |
| ==12539== by 0x80F7515: Rf_allocVector (memory.c:1915) |
| ... |
| @end smallexample |
| @noindent |
| This example is from an instrumented version of @R{}, while tracking |
| down a bug in the @CRANpkg{Matrix} package in 2006. The first line |
| indicates that @R{} has tried to read 4 bytes from a memory address that |
| it does not have access to. This is followed by a C stack trace showing |
| where the error occurred. Next is a description of the memory that was |
| accessed. It is inside a block allocated by @code{malloc}, called from |
| @code{GetNewPage}, that is, in the internal @R{} heap. Since this |
| memory all belongs to @R{}, @code{valgrind} would not (and did not) |
| detect the problem in an uninstrumented build of @R{}. In this example |
| the stack trace was enough to isolate and fix the bug, which was in |
| @code{tsc_transpose}, and in this example running under |
| @code{gctorture()} did not provide any additional information. |
| @c Was removed: see https://sourceforge.net/p/valgrind/mailman/message/34306867/ |
| @c When the stack trace is not sufficiently informative the option |
| @c @option{--db-attach=yes} to @code{valgrind} may be helpful. This starts |
| @c a post-mortem debugger (by default @code{gdb}) so that variables in the |
| @c C code can be inspected (@pxref{Inspecting R objects}). |
| |
| @command{valgrind} is good at spotting the use of uninitialized values: |
| use option @option{--track-origins=yes} to show where these originated |
| from. What it cannot detect is the misuse of arrays allocated on the |
| stack: this includes C automatic variables and some@footnote{small |
| fixed-size arrays by default in @command{gfortran}, for example.} |
| Fortran arrays. |
| |
| It is possible to run all the examples, tests and vignettes covered by |
| @code{R CMD check} under @code{valgrind} by using the option |
| @option{--use-valgrind}. If you do this you will need to select the |
| @code{valgrind} options some other way, for example by having a |
| @file{~/.valgrindrc} file containing |
| |
| @example |
| --leak-check=full |
| --track-origins=yes |
| @end example |
| |
| @noindent |
| or setting the environment variable @env{VALGRIND_OPTS}. |
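A small sketch of the @file{~/.valgrindrc} approach follows. The helper
name @code{setup_valgrindrc} is made up for this example. It appends the
two options shown above, one per line, and skips any that are already
present, so running it again does no harm.

```shell
#!/bin/sh
# Sketch: append memcheck options to a valgrind rc file, one per line,
# skipping any already present so that re-running is harmless.
# $1 is the rc file; if omitted, ~/.valgrindrc is used.
setup_valgrindrc() (
    rcfile=$1
    [ -n "$rcfile" ] || rcfile="$HOME/.valgrindrc"
    touch "$rcfile" || exit 1
    for opt in --leak-check=full --track-origins=yes
    do
        # -x: whole-line match, -F: fixed string, --: end of grep options
        grep -qxF -- "$opt" "$rcfile" || printf '%s\n' "$opt" >> "$rcfile"
    done
)
```

For a one-off run the same options can instead go in the environment,
e.g.@: @code{VALGRIND_OPTS='--leak-check=full --track-origins=yes' R CMD
check --use-valgrind mypkg_1.0.tar.gz} (the tarball name is a
placeholder).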
| |
| On macOS you may need to ensure that debugging symbols are made available |
| (so @command{valgrind} reports line numbers in files). This can usually |
| be done with the @command{valgrind} option @option{--dsymutil=yes} to |
| ask for the symbols to be dumped when the @file{.so} file is loaded. |
| This will not work where packages are installed into a system area (such |
| as the @file{R.framework}) and can be slow. Installing packages with |
| @command{R CMD INSTALL --dsym} installs the dumped symbols. (This can |
| also be done by setting environment variable @env{PKG_MAKE_DSYM} to a |
| non-empty value before the @command{INSTALL}.) |
| |
This section has described the use of @command{memcheck}, the default
(and most useful) of @code{valgrind}'s tools. There are others
described in its documentation: @command{helgrind} can be useful for
threaded programs.
| |
| @node Using Address Sanitizer, Using Undefined Behaviour Sanitizer, Using valgrind, Checking memory access |
| @subsection Using the Address Sanitizer |
| |
| @command{AddressSanitizer} (`ASan') is a tool with similar aims to the |
| memory checker in @command{valgrind}. It is available with suitable |
| builds@footnote{currently on Linux and macOS (including the builds from |
| Xcode 7 and later), with some support for Solaris. On some platforms |
| the runtime library, @pkg{libasan}, needs to be installed separately, |
| and for checking C++ you may also need @pkg{libubsan}.} of @command{gcc} |
| and @command{clang} on common Linux and macOS platforms. See |
| @uref{https://clang.llvm.org/@/docs/@/UsersManual.html#controlling-code-generation}, |
| @uref{https://clang.llvm.org/@/docs/@/AddressSanitizer.html} and |
| @uref{https://code.google.com/@/p/@/address-sanitizer/}. |
| |
| More thorough checks of C++ code are done if the C++ library has been |
| `annotated': at the time of writing this applied to @code{std::vector} |
| in @code{libc++} for use with @command{clang} and gives rise to |
| @samp{container-overflow}@footnote{see |
| @uref{http://llvm.org/devmtg/2014-04/PDFs/LightningTalks/EuroLLVM%202014%20--%20container%20overflow.pdf}.} |
| reports. |
| |
It requires code to have been compiled @emph{and linked} with
@option{-fsanitize=address}, and compiling with @option{-fno-omit-frame-pointer}
will give more legible reports. It has a runtime penalty of 2--3x,
extends compilation times and uses substantially more memory, often
1--2GB, at run time. On 64-bit platforms it reserves (but does not
allocate) 16--20TB of virtual memory: restrictive shell settings can
cause problems.
| |
| By comparison with @command{valgrind}, ASan can |
| detect misuse of stack and global variables but not the use of |
| uninitialized memory. |
| |
| Recent versions return symbolic addresses for the location of the error |
| provided @command{llvm-symbolizer}@footnote{part of the LLVM project and |
distributed in @code{llvm} RPMs and @code{.deb}s on Linux. It is not
| currently shipped by Apple.} is on the path: if it is available but not |
| on the path or has been renamed@footnote{as Ubuntu has been said to do.}, one |
| can use an environment variable, e.g.@: |
| |
| @example |
| ASAN_SYMBOLIZER_PATH=/path/to/llvm-symbolizer |
| @end example |
| |
| @noindent |
| An alternative is to pipe the output through |
| @command{asan_symbolize.py}@footnote{installed on some Linux systems as |
| @command{asan_symbolize}, and obtainable from |
| @uref{https://llvm.org/@/svn/@/llvm-project/@/compiler-rt/@/trunk/@/lib/@/asan/scripts/@/asan_symbolize.py}: |
| it makes use of @command{llvm-symbolizer} if available.} and perhaps |
| then (for compiled C++ code) @command{c++filt}. (On macOS, you may need |
| to run @command{dsymutil} to get line-number reports.) |
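Where the symbolizer has been installed only under a versioned name, a
short POSIX shell sketch can look for it and set
@env{ASAN_SYMBOLIZER_PATH}. The helper name @code{find_symbolizer} is
hypothetical. The @file{llvm-symbolizer-@var{N}} naming pattern is an
assumption about how such renaming is typically done; ASan does not
require it.

```shell
#!/bin/sh
# Sketch: print the path of an llvm-symbolizer, preferring one on the
# PATH under its own name, else the first versioned copy found
# (assumed to be named llvm-symbolizer-N) in PATH order.
find_symbolizer() (
    if command -v llvm-symbolizer >/dev/null 2>&1
    then
        command -v llvm-symbolizer
        exit 0
    fi
    IFS=:
    for dir in $PATH
    do
        for f in "$dir"/llvm-symbolizer-*
        do
            # an unmatched pattern stays literal and fails this test
            if [ -x "$f" ]
            then
                printf '%s\n' "$f"
                exit 0
            fi
        done
    done
    exit 1
)
if sym=$(find_symbolizer)
then
    ASAN_SYMBOLIZER_PATH=$sym
    export ASAN_SYMBOLIZER_PATH
fi
```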
| |
| The simplest way to make use of this is to build a version of @R{} with |
| something like |
| |
| @example |
| CC="gcc -std=gnu99 -fsanitize=address" |
| CFLAGS="-fno-omit-frame-pointer -g -O2 -Wall -pedantic -mtune=native" |
| @end example |
| |
| @noindent |
which will ensure that the @code{libasan} run-time library is linked
into the @R{} executable. However, this check can also be enabled on a
per-package basis by using a @file{~/.R/Makevars} file like
| @example |
| CC = gcc -std=gnu99 -fsanitize=address -fno-omit-frame-pointer |
| CXX = g++ -fsanitize=address -fno-omit-frame-pointer |
| FC = gfortran -fsanitize=address |
| @end example |
| @noindent |
| (Note that @code{-fsanitize=address} has to be part of the compiler |
| specification to ensure it is used for linking. These settings will not |
| be honoured by packages which ignore @file{~/.R/Makevars}.) It will |
| be necessary to build @R{} with |
| |
| @example |
| MAIN_LDFLAGS = -fsanitize=address |
| @end example |
| |
| @noindent |
| to link the runtime libraries into the @R{} executable if it was not |
| specified as part of @samp{CC} when @R{} was built. (For some builds |
| without OpenMP, @option{-pthread} is also required.) |
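One way to set up the per-package variant from a shell is sketched
below. It writes exactly the three lines shown above and keeps any
existing file as @file{Makevars.bak}. Both the helper name
@code{write_asan_makevars} and the backup name are illustrative
choices, not conventions used by @R{}.

```shell
#!/bin/sh
# Sketch: write a per-user Makevars enabling ASan for package code.
# $1 is the directory holding Makevars (normally ~/.R).
# An existing Makevars is kept as Makevars.bak rather than lost.
write_asan_makevars() (
    dir=$1
    mkdir -p "$dir" || exit 1
    if [ -f "$dir/Makevars" ]
    then
        cp "$dir/Makevars" "$dir/Makevars.bak" || exit 1
    fi
    # -fsanitize=address is part of the compiler names so that it is
    # also used when linking.
    cat > "$dir/Makevars" <<'EOF'
CC = gcc -std=gnu99 -fsanitize=address -fno-omit-frame-pointer
CXX = g++ -fsanitize=address -fno-omit-frame-pointer
FC = gfortran -fsanitize=address
EOF
)
```

It would be called as @code{write_asan_makevars "$HOME/.R"}. Restoring
@file{Makevars.bak} afterwards undoes the change.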
| |
| For options available @emph{via} the environment variable |
| @env{ASAN_OPTIONS} see |
| @uref{https://code.google.com/@/p/@/address-sanitizer/@/wiki/@/AddressSanitizerFLags}. |
| With @command{gcc} additional control is available @emph{via} the |
| @option{--param} flag: see its @command{man} page. |
| |
| For more detailed information on an error, @R{} can be run under a |
| debugger with a breakpoint set before the address sanitizer report is |
| produced: for @command{gdb} or @command{lldb} you could use |
| @example |
| break __asan_report_error |
| @end example |
| @noindent |
| (See |
| @uref{https://code.google.com/@/p/@/address-sanitizer/@/wiki@//AddressSanitizer#gdb}.) |
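Such breakpoints can be kept in a @command{gdb} command file. The sketch
below writes one. The helper name @code{write_gdb_breaks} is
hypothetical. The line @samp{set breakpoint pending on} lets
@command{gdb} accept a breakpoint whose symbol is not yet loaded,
without prompting.

```shell
#!/bin/sh
# Sketch: write a gdb command file that stops just before a sanitizer
# report is printed. $1 is the output file; further arguments are the
# functions to break on (default: __asan_report_error).
write_gdb_breaks() (
    out=$1
    shift
    [ $# -gt 0 ] || set -- __asan_report_error
    printf '%s\n' 'set breakpoint pending on' > "$out" || exit 1
    for fn in "$@"
    do
        printf 'break %s\n' "$fn" >> "$out"
    done
)
```

The file could then be passed on with something like @code{R -d gdb
--debugger-args="-x /tmp/asan.gdb"}, followed by @code{run} at the
@command{gdb} prompt. The path is a placeholder.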
| |
| More recent versions@footnote{including @command{gcc} 7.1 and |
| @command{clang} 4.0.0: for @command{gcc} it is implied by |
| @option{-fsanitize=address}.} added the flag |
| @option{-fsanitize-address-use-after-scope}: see |
| @uref{https://github.com/@/google/@/sanitizers/@/wiki/@/AddressSanitizerUseAfterScope}. |
| |
One of the checks done by ASan is that @code{malloc/free} and in C++
@code{new/delete} and @code{new[]/delete[]} are used consistently
(rather than, say, @code{free} being used to deallocate memory allocated by
@code{new[]}). This matters on some systems but not all: unfortunately
on some of those where it does not matter, system libraries@footnote{for
example, X11/GL libraries on Linux, seen when checking package
@CRANpkg{rgl} and some others using it: a workaround is to set
environment variable @env{RGL_USE_NULL=true}.} are not consistent. The
check can be suppressed by including @samp{alloc_dealloc_mismatch=0} in
@env{ASAN_OPTIONS}.
| |
ASan also checks system calls, and reports can sometimes refer to
problems in the system software rather than in the package or in @R{}. A couple
of reports have been of `heap-use-after-free' errors in the X11
libraries called from Tcl/Tk.
| |
| @menu |
| * Using Leak Sanitizer:: |
| @end menu |
| |
| @node Using Leak Sanitizer, , Using Address Sanitizer, Using Address Sanitizer |
| @subsubsection Using the Leak Sanitizer |
| |
| For @code{x86_64} Linux there is a leak sanitizer, `LSan': see |
| @uref{https://code.google.com/@/p/@/address-sanitizer/@/wiki/@/LeakSanitizer}. |
| This is available on recent versions of @code{gcc} and @code{clang}, and |
| where available is compiled in as part of ASan. |
| |
| One way to invoke this from an ASan-enabled build is by the environment |
| variable |
| |
| @example |
| ASAN_OPTIONS='detect_leaks=1' |
| @end example |
| @noindent |
| However, this was made the default as from @command{clang} 3.5 and |
| @command{gcc} 5.1.0. |
| |
When LSan is enabled, leaks give the process a failure error status (by
default @code{23}). For an @R{} package this means the @R{} process,
and since the parser retains some memory until the end of the process,
if @R{} itself was built against ASan, all runs will have a failure error
status (which may include running @R{} as part of building @R{} itself).

To disable this, as well as allocation-mismatch checking and some strict
C++ checking, use
| |
| @example |
| setenv ASAN_OPTIONS 'alloc_dealloc_mismatch=0:detect_leaks=0:detect_odr_violation=0' |
| @end example |
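The @code{setenv} form above is for @command{csh}-like shells. For a
Bourne-type shell, the minimal sketch below adds options to any
existing colon-separated @env{ASAN_OPTIONS} value rather than replacing
it. The helper @code{asan_opts_add} is hypothetical.

```shell
#!/bin/sh
# Sketch: print the colon-separated option string $1 with the
# remaining arguments appended (no separator if $1 is empty).
asan_opts_add() (
    opts=$1
    shift
    for o in "$@"
    do
        if [ -z "$opts" ]
        then
            opts=$o
        else
            opts="$opts:$o"
        fi
    done
    printf '%s\n' "$opts"
)
ASAN_OPTIONS=$(asan_opts_add "$ASAN_OPTIONS" alloc_dealloc_mismatch=0 \
    detect_leaks=0 detect_odr_violation=0)
export ASAN_OPTIONS
```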
| |
| LSan also has a `stand-alone' mode where it is compiled in using |
| @option{-fsanitize=leak} and avoids the run-time overhead of ASan. |
| |
| @node Using Undefined Behaviour Sanitizer, Other analyses with `clang', Using Address Sanitizer, Checking memory access |
| @subsection Using the Undefined Behaviour Sanitizer |
| |
| `Undefined behaviour' is where the language standard does not require |
| particular behaviour from the compiler. Examples include division by |
| zero (where for doubles @R{} requires the |
| @acronym{ISO}/@acronym{IEC}@tie{}60559 behaviour but C/C++ do not), use |
| of zero-length arrays, shifts too far for signed types (e.g.@: @code{int |
| x, y; y = x << 31;}), out-of-range coercion, invalid C++ casts and |
| mis-alignment. Not uncommon examples of out-of-range coercion in @R{} |
| packages are attempts to coerce a @code{NaN} or infinity to type |
| @code{int} or @code{NA_INTEGER} to an unsigned type such as |
| @code{size_t}. Also common is @code{y[x - 1]} forgetting that @code{x} |
| might be @code{NA_INTEGER}. |
| |
| `UBSanitizer' is a tool for C/C++ source code selected by |
| @option{-fsanitize=undefined} in suitable builds@footnote{On some |
| platforms the runtime library, @pkg{libubsan}, needs to be installed |
| separately.} of @command{clang} and GCC. Its (main) runtime library is |
linked into each package's DLL, so it less often needs to be
included in @env{MAIN_LDFLAGS}.
| |
| This sanitizer can be combined with the Address Sanitizer by |
| @option{-fsanitize=undefined,address} (where both are supported). |
| |
| Finer control of what is checked can be achieved by other options. |
| |
| For @command{clang} see |
| @uref{https://clang.llvm.org/@/docs/@/UndefinedBehaviorSanitizer.html#ubsan-checks}. |
| The current set is (on a single line): |
| @example |
| -fsanitize=alignment,bool,bounds,builtin,enum,float-cast-overflow, |
| float-divide-by-zero,function,implicit-unsigned-integer-truncation, |
| implicit-signed-integer-truncation,implicit-integer-sign-change, |
| integer-divide-by-zero,nonnull-attribute,null,object-size, |
| pointer-overflow,return,returns-nonnull-attribute,shift, |
| signed-integer-overflow,unreachable,unsigned-integer-overflow,vla-bound,vptr |
| @end example |
| |
| @noindent |
(plus the more specific versions @code{shift-base} and
@code{shift-exponent}), a subset of which could be combined with
@code{address}, or use something like
| |
| @example |
| -fsanitize=undefined -fno-sanitize=float-divide-by-zero |
| @end example |
| |
| @noindent |
| Options @code{function}, @code{return} and @code{vptr} apply only to C++: to |
| use @code{vptr} its run-time library needs to be linked into the main |
| @R{} executable by building the latter with something like |
| @example |
| MAIN_LD="clang++ -fsanitize=undefined" |
| @end example |
| @noindent |
Option @code{float-divide-by-zero} is undesirable for use with @R{},
which allows such divisions as part of @acronym{IEC}@tie{}60559
arithmetic.
| |
| For GCC see |
| @uref{https://gcc.gnu.org/@/onlinedocs/@/gcc/@/Instrumentation-Options.html} |
| (or the manual for your version of GCC, installed or @emph{via} |
| @uref{https://gcc.gnu.org/@/onlinedocs/}: look for `Program |
Instrumentation Options') for the options supported by GCC: versions 6 and 7 supported
| @example |
| -fsanitize=alignment,bool,bounds,enum,integer-divide-by-zero, |
| nonnull-attribute,null,object-size,return,returns-nonnull-attribute, |
| shift,signed-integer-overflow,unreachable,vla-bound,vptr |
| @end example |
| @noindent |
| plus the more specific versions @code{shift-base} and |
| @code{shift-exponent} and non-default options |
| @example |
bounds-strict,float-cast-overflow,float-divide-by-zero
| @end example |
| @noindent |
where @code{float-divide-by-zero} is not desirable for @R{} uses and
@code{bounds-strict} is an extension of @code{bounds}. From GCC 8
@code{signed-integer-overflow} is no longer a default part of
@option{-fsanitize=undefined}, but can be specified separately. GCC 8
also added the options @option{-fsanitize=pointer-overflow} and
@option{-fsanitize=builtin}.
| |
| Other useful flags include |
| @example |
-fno-sanitize-recover
| @end example |
| |
| @noindent |
| which causes the first report to be fatal (it always is for the |
| @code{unreachable} and @code{return} suboptions). For more detailed |
| information on where the runtime error occurs, using |
| |
| @example |
| setenv UBSAN_OPTIONS 'print_stacktrace=1' |
| @end example |
@noindent
will include a traceback in the report. Beyond that, @R{} can
| be run under a debugger with a breakpoint set before the sanitizer |
| report is produced: for @command{gdb} or @command{lldb} you could use |
| @example |
| break __ubsan_handle_float_cast_overflow |
| break __ubsan_handle_float_cast_overflow_abort |
| @end example |
| @noindent |
| or similar (there are handlers for each type of undefined behaviour). |
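The @env{UBSAN_OPTIONS} setting shown earlier also uses @command{csh}
syntax. In a Bourne-type shell, the sketch below adds
@samp{print_stacktrace=1} for a single command and keeps any options
that are already set. The name @code{ubsan_run} is made up for this
example.

```shell
#!/bin/sh
# Sketch: run a command with print_stacktrace=1 appended to any
# existing UBSAN_OPTIONS; the subshell keeps the caller unchanged.
ubsan_run() (
    if [ -n "$UBSAN_OPTIONS" ]
    then
        UBSAN_OPTIONS="$UBSAN_OPTIONS:print_stacktrace=1"
    else
        UBSAN_OPTIONS=print_stacktrace=1
    fi
    export UBSAN_OPTIONS
    exec "$@"
)
```

For example, @code{ubsan_run R --vanilla -f mycheck.R}, where
@file{mycheck.R} is a placeholder.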
| |
| There are also the compiler flags @option{-fcatch-undefined-behavior} |
| and @option{-ftrapv}, said to be more reliable in @command{clang} than |
| @command{gcc}. |
| |
| For more details on the topic see |
| @uref{http://blog.regehr.org/archives/213} and |
| @uref{http://blog.llvm.org/@/2011/@/05/@/what-every-c-programmer-should-know.html} |
| (which has 3 parts). |
| |
| It may or may not be possible to build @R{} itself with |
| @option{-fsanitize=undefined}: when last tried it worked with |
| @command{clang} but there were problems with OpenMP-using code with |
| @command{gcc}. |
| |
| |
| @node Other analyses with `clang', Using `Dr. Memory', Using Undefined Behaviour Sanitizer, Checking memory access |
| @subsection Other analyses with `clang' |
| |
| Recent versions of @command{clang} on @cputype{x86_64} Linux have |
| `ThreadSanitizer' (@uref{https://code.google.com/@/p/@/thread-sanitizer/}), |
| a `data race detector for C/C++ programs', and `MemorySanitizer' |
| (@uref{https://clang.llvm.org/@/docs/@/MemorySanitizer.html}, |
| @uref{https://code.google.com/@/p/@/memory-sanitizer/@/wiki/@/MemorySanitizer}) |
| for the detection of uninitialized memory. Both are based on and |
| provide similar functionality to tools in @command{valgrind}. |
| |
| @command{clang} has a `Static Analyser' which can be run on the source |
| files during compilation: see @uref{https://clang-analyzer.llvm.org/}. |
| |
| @node Using `Dr. Memory', Fortran array bounds checking, Other analyses with `clang', Checking memory access |
| @subsection Using `Dr. Memory' |
| |
| `Dr. Memory' from @uref{http://www.drmemory.org/} is a memory checker |
| for (currently) 32-bit Windows, Linux and macOS with similar aims to |
| @command{valgrind}. It works with unmodified executables@footnote{but |
| works better if inlining and frame pointer optimizations are disabled.} |
| and detects memory access errors, uninitialized reads and memory leaks. |
| |
| @node Fortran array bounds checking, , Using `Dr. Memory', Checking memory access |
| @subsection Fortran array bounds checking |
| |
| Most of the Fortran compilers used with @R{} allow code to be compiled |
| with checking of array bounds: for example @command{gfortran} has option |
| @option{-fbounds-check} and Oracle Developer Studio has @option{-C}. |
| This will give an error when the upper or lower bound is exceeded, e.g. |
| @example |
| At line 97 of file .../src/appl/dqrdc2.f |
| Fortran runtime error: Index '1' of dimension 1 of array 'x' above upper bound of 0 |
| @end example |
| |
One does need to be aware that lazy programmers often specify Fortran
dimensions as @code{1} rather than @code{*} or a real bound, and such
dimensions will be reported (as may @code{*} dimensions).
| |
| It is easy to arrange to use this check on just the code in your |
| package: add to @file{~/.R/Makevars} something like (for |
| @command{gfortran}) |
| @example |
| FFLAGS = -g -O2 -mtune=native -fbounds-check |
| @end example |
| |
| @noindent |
| when you run @command{R CMD check}. |
| |
| This may report errors with the way that Fortran character variables are |
| passed, particularly when Fortran subroutines are called from C code and |
| character lengths are not passed (@pxref{Fortran character strings}). |
| |
| |
| @node Debugging compiled code, Using Link-time Optimization, Checking memory access, Debugging |
| @section Debugging compiled code |
| @cindex Debugging |
| |
| |
| Sooner or later programmers will be faced with the need to debug |
| compiled code loaded into @R{}. This section is geared to platforms |
| using @command{gdb} with code compiled by @code{gcc}, but similar things |
| are possible with other debuggers such as @command{lldb} |
| (@uref{http://lldb.llvm.org/}, used on macOS) and Sun's @command{dbx}: |
| some debuggers have graphical front-ends available. |
| |
Consider first `crashes', that is when @R{} terminates unexpectedly with
| an illegal memory access (a `segfault' or `bus error'), illegal |
| instruction or similar. Unix-alike versions of @R{} use a signal |
| handler which aims to give some basic information. For example |
| |
| @example |
| *** caught segfault *** |
| address 0x20000028, cause 'memory not mapped' |
| |
| Traceback: |
| 1: .identC(class1[[1]], class2) |
| 2: possibleExtends(class(sloti), classi, ClassDef2 = getClassDef(classi, |
| where = where)) |
| 3: validObject(t(cu)) |
| 4: stopifnot(validObject(cu <- as(tu, "dtCMatrix")), validObject(t(cu)), |
| validObject(t(tu))) |
| |
| Possible actions: |
| 1: abort (with core dump) |
| 2: normal R exit |
| 3: exit R without saving workspace |
| 4: exit R saving workspace |
| Selection: 3 |
| @end example |
| |
| @noindent |
| Since the @R{} process may be damaged, the only really safe options are |
| the first or third. (Note that a core dump is only produced where |
| enabled: a common default in a shell is to limit its size to 0, thereby |
| disabling it.) |
| |
A fairly common cause of such crashes is a package which uses @code{.C}
or @code{.Fortran} and writes beyond (at either end) one of the
arguments it is passed. There is a good way to detect this: setting
@code{options(CBoundsCheck = TRUE)} (which can also be selected @emph{via}
the environment variable @env{R_C_BOUNDS_CHECK=yes}) changes the way
@code{.C} and @code{.Fortran} work so that they check whether the
compiled code writes in the 64 bytes at either end of an argument.
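
The idea behind this check can be illustrated in plain C. What follows
is a minimal sketch of the guard-byte technique, not @R{}'s actual
implementation: the routine @code{fill_buggy} (a hypothetical example)
writes one element past the end of its argument, and the hypothetical
helper @code{call_with_guards} surrounds a copy of the argument with 64
bytes of a known fill pattern at each end and reports whether any of
them changed.

@example
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTES 64     /* bytes checked at each end */
#define GUARD_FILL  0xEE   /* hypothetical fill pattern */

/* Hypothetical .C routine with an off-by-one bug:
   it also writes x[*n], one element past the end. */
void fill_buggy(double *x, int *n)
@{
    for (int i = 0; i <= *n; i++)
        x[i] = (double) i;
@}

/* The corrected version writes only x[0], ..., x[*n - 1]. */
void fill_ok(double *x, int *n)
@{
    for (int i = 0; i < *n; i++)
        x[i] = (double) i;
@}

/* Copy x into a buffer with guard bytes on both sides, call f on the
   copy and copy the result back.  Returns 1 if a guard byte was
   overwritten, 0 if not, and -1 if allocation failed. */
int call_with_guards(void (*f)(double *, int *), double *x, int n)
@{
    size_t len = (size_t) n * sizeof(double);
    unsigned char *buf = malloc(len + 2 * GUARD_BYTES);
    if (buf == NULL) return -1;

    memset(buf, GUARD_FILL, GUARD_BYTES);
    memcpy(buf + GUARD_BYTES, x, len);
    memset(buf + GUARD_BYTES + len, GUARD_FILL, GUARD_BYTES);

    int nn = n;
    f((double *) (buf + GUARD_BYTES), &nn);

    int overrun = 0;
    for (size_t i = 0; i < GUARD_BYTES; i++)
        if (buf[i] != GUARD_FILL || buf[GUARD_BYTES + len + i] != GUARD_FILL)
            overrun = 1;

    memcpy(x, buf + GUARD_BYTES, len);
    free(buf);
    return overrun;
@}
@end example

With a routine like @code{fill_buggy}, setting the option should turn a
silent corruption of memory into an error reported by @code{.C}.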
| |
| Another cause of a `crash' is to overrun the C stack. @R{} tries to |
| track that in its own code, but it may happen in third-party compiled |
| code. For modern POSIX-compliant OSes @R{} can safely catch that and |
| return to the top-level prompt, so one gets something like |
| |
| @example |
| > .C("aaa") |
| Error: segfault from C stack overflow |
| > |
| @end example |
| |
| @noindent |
However, C stack overflows are fatal under Windows and normally defeat
attempts at debugging on that platform. Further, on Windows the size of
the stack is set when @R{} is compiled, whereas on POSIX OSes it can be
set in the shell from which @R{} is launched (e.g.@: by @command{ulimit -s}).
| |
| If you have a crash which gives a core dump you can use something like |
| |
| @example |
| gdb /path/to/R/bin/exec/R core.12345 |
| @end example |
| |
| @noindent |
| to examine the core dump. If core dumps are disabled or to catch errors |
| that do not generate a dump one can run @R{} directly under a debugger |
| by for example |
| |
| @example |
| $ R -d gdb --vanilla |
| ... |
| gdb> run |
| @end example |
| |
| @noindent |
| at which point @R{} will run normally, and hopefully the debugger will |
| catch the error and return to its prompt. This can also be used to |
| catch infinite loops or interrupt very long-running code. For a simple |
| example |
| |
| @example |
| > for(i in 1:1e7) x <- rnorm(100) |
| [hit Ctrl-C] |
| Program received signal SIGINT, Interrupt. |
| 0x00397682 in _int_free () from /lib/tls/libc.so.6 |
| (gdb) where |
| #0 0x00397682 in _int_free () from /lib/tls/libc.so.6 |
| #1 0x00397eba in free () from /lib/tls/libc.so.6 |
| #2 0xb7cf2551 in R_gc_internal (size_needed=313) |
| at /users/ripley/R/svn/R-devel/src/main/memory.c:743 |
| #3 0xb7cf3617 in Rf_allocVector (type=13, length=626) |
| at /users/ripley/R/svn/R-devel/src/main/memory.c:1906 |
| #4 0xb7c3f6d3 in PutRNGstate () |
| at /users/ripley/R/svn/R-devel/src/main/RNG.c:351 |
| #5 0xb7d6c0a5 in do_random2 (call=0x94bf7d4, op=0x92580e8, args=0x9698f98, |
| rho=0x9698f28) at /users/ripley/R/svn/R-devel/src/main/random.c:183 |
| ... |
| @end example |
| |
| In many cases it is possible to attach a debugger to a running process: |
| this is helpful if an alternative front-end is in use or to investigate |
| a task that seems to be taking far too long. This is done by something |
| like |
| |
| @example |
| gdb -p @var{pid} |
| @end example |
| |
| @noindent |
where @code{@var{pid}} is the process id of the @R{} executable or
front-end. This stops the process so its state can be examined: use
@code{continue} to resume execution.
| |
| Some ``tricks'' worth knowing follow: |
| |
| @menu |
| * Finding entry points:: |
| * Inspecting R objects:: |
| @end menu |
| |
| @node Finding entry points, Inspecting R objects, Debugging compiled code, Debugging compiled code |
| @subsection Finding entry points in dynamically loaded code |
| |
| Under most compilation environments, compiled code dynamically loaded |
| into @R{} cannot have breakpoints set within it until it is loaded. To |
| use a symbolic debugger on such dynamically loaded code under |
Unix-alikes, proceed as follows:
| |
| @itemize @bullet |
| @item |
| Call the debugger on the @R{} executable, for example by @kbd{R -d gdb}. |
| @item |
| Start @R{}. |
| @item |
| At the @R{} prompt, use @code{dyn.load} or @code{library} to load your |
| shared object. |
| @item |
| Send an interrupt signal. This will put you back to the debugger |
| prompt. |
| @item |
| Set the breakpoints in your code. |
| @item |
| Continue execution of @R{} by typing @kbd{signal 0@key{RET}}. |
| @end itemize |
| |
Under Windows signals may not be usable, in which case the procedure is
more complicated: see the rw-FAQ.
| |
| |
| @node Inspecting R objects, , Finding entry points, Debugging compiled code |
| @subsection Inspecting R objects when debugging |
| @cindex Inspecting R objects when debugging |
| |
| The key to inspecting @R{} objects from compiled code is the function |
| @code{PrintValue(SEXP @var{s})} which uses the normal @R{} printing |
| mechanisms to print the @R{} object pointed to by @var{s}, or the safer |
| version @code{R_PV(SEXP @var{s})} which will only print `objects'. |
| |
| One way to make use of @code{PrintValue} is to insert suitable calls |
| into the code to be debugged. |
| |
| Another way is to call @code{R_PV} from the symbolic debugger. |
| (@code{PrintValue} is hidden as @code{Rf_PrintValue}.) For example, |
| from @code{gdb} we can use |
| |
| @example |
| (gdb) p R_PV(ab) |
| @end example |
| |
| @noindent |
| using the object @code{ab} from the convolution example, if we have |
| placed a suitable breakpoint in the convolution C code. |
| |
| To examine an arbitrary @R{} object we need to work a little harder. |
| For example, let |
| |
| @example |
| R> DF <- data.frame(a = 1:3, b = 4:6) |
| @end example |
| |
| @noindent |
| By setting a breakpoint at @code{do_get} and typing @kbd{get("DF")} at |
| the @R{} prompt, one can find out the address in memory of @code{DF}, for |
| example |
| |
| @example |
| @group |
| Value returned is $1 = (SEXPREC *) 0x40583e1c |
| (gdb) p *$1 |
| $2 = @{ |
| sxpinfo = @{type = 19, obj = 1, named = 1, gp = 0, |
| mark = 0, debug = 0, trace = 0, = 0@}, |
| attrib = 0x40583e80, |
| u = @{ |
| vecsxp = @{ |
| length = 2, |
| type = @{c = 0x40634700 "0>X@@D>X@@0>X@@", i = 0x40634700, |
| f = 0x40634700, z = 0x40634700, s = 0x40634700@}, |
| truelength = 1075851272, |
| @}, |
| primsxp = @{offset = 2@}, |
| symsxp = @{pname = 0x2, value = 0x40634700, internal = 0x40203008@}, |
| listsxp = @{carval = 0x2, cdrval = 0x40634700, tagval = 0x40203008@}, |
| envsxp = @{frame = 0x2, enclos = 0x40634700@}, |
| closxp = @{formals = 0x2, body = 0x40634700, env = 0x40203008@}, |
| promsxp = @{value = 0x2, expr = 0x40634700, env = 0x40203008@} |
| @} |
| @} |
| @end group |
| @end example |
| |
| @noindent |
(Debugger output reformatted for better legibility.)
| |
| Using @code{R_PV()} one can ``inspect'' the values of the various |
| elements of the SEXP, for example, |
| |
| @example |
| @group |
| (gdb) p R_PV($1->attrib) |
| $names |
| [1] "a" "b" |
| |
| $row.names |
| [1] "1" "2" "3" |
| |
| $class |
| [1] "data.frame" |
| |
| $3 = void |
| @end group |
| @end example |
| |
| To find out where exactly the corresponding information is stored, one |
| needs to go ``deeper'': |
| |
| @example |
| @group |
| (gdb) set $a = $1->attrib |
| (gdb) p $a->u.listsxp.tagval->u.symsxp.pname->u.vecsxp.type.c |
| $4 = 0x405d40e8 "names" |
| (gdb) p $a->u.listsxp.carval->u.vecsxp.type.s[1]->u.vecsxp.type.c |
| $5 = 0x40634378 "b" |
| (gdb) p $1->u.vecsxp.type.s[0]->u.vecsxp.type.i[0] |
| $6 = 1 |
| (gdb) p $1->u.vecsxp.type.s[1]->u.vecsxp.type.i[1] |
| $7 = 5 |
| @end group |
| @end example |
| |
| Another alternative is the @code{R_inspect} function which shows the |
| low-level structure of the objects recursively (addresses differ from |
| the above as this example is created on another machine): |
| |
| @example |
| @group |
| (gdb) p R_inspect($1) |
| @@100954d18 19 VECSXP g0c2 [OBJ,NAM(2),ATT] (len=2, tl=0) |
| @@100954d50 13 INTSXP g0c2 [NAM(2)] (len=3, tl=0) 1,2,3 |
| @@100954d88 13 INTSXP g0c2 [NAM(2)] (len=3, tl=0) 4,5,6 |
| ATTRIB: |
| @@102a70140 02 LISTSXP g0c0 [] |
| TAG: @@10083c478 01 SYMSXP g0c0 [MARK,NAM(2),gp=0x4000] "names" |
| @@100954dc0 16 STRSXP g0c2 [NAM(2)] (len=2, tl=0) |
| @@10099df28 09 CHARSXP g0c1 [MARK,gp=0x21] "a" |
| @@10095e518 09 CHARSXP g0c1 [MARK,gp=0x21] "b" |
| TAG: @@100859e60 01 SYMSXP g0c0 [MARK,NAM(2),gp=0x4000] "row.names" |
| @@102a6f868 13 INTSXP g0c1 [NAM(2)] (len=2, tl=1) -2147483648,-3 |
| TAG: @@10083c948 01 SYMSXP g0c0 [MARK,gp=0x4000] "class" |
| @@102a6f838 16 STRSXP g0c1 [NAM(2)] (len=1, tl=1) |
| @@1008c6d48 09 CHARSXP g0c2 [MARK,gp=0x21,ATT] "data.frame" |
| @end group |
| @end example |
| |
| In general the representation of each object follows the format: |
| |
| @smallexample |
| @@<address> <type-nr> <type-name> <gc-info> [<flags>] ... |
| @end smallexample |
| |
For finer-grained control over the depth of the recursion and the
output of vectors, @code{R_inspect3} takes two additional integer
parameters: the maximum depth and the maximum number of elements that
will be printed for vectors. The defaults in @code{R_inspect} are
currently -1 (no limit) and 5 respectively.
| |
| @node Using Link-time Optimization, , Debugging compiled code, Debugging |
| @section Using Link-time Optimization |
| |
| Where supported, @emph{link time optimization} provides a comprehensive |
| way to check the consistency of calls between Fortran files or between C |
| and Fortran. |
| @ifset UseExternalXrefs |
| @xref{Link-Time Optimization, , Link-Time Optimization, |
| R-admin, R Installation and Administration}. |
| @end ifset |
| |
| For example: |
| @example |
| boot.f:61: warning: type of 'ddot' does not match original declaration [-Wlto-type-mismatch] |
| y(j,i)=ddot(p,x(j,1),n,b(1,j,i),1) |
| crq.f:1023: note: return value type mismatch |
| @end example |
| @noindent |
| where the package author forgot to declare |
| @example |
| double precision ddot |
| external ddot |
| @end example |
| @noindent |
| in @file{boot.f}. |
| |
| Further examples: |
| |
| @c package assist 3.1.4 |
| @example |
| rkpk2.f:77:5: warning: type of 'dstup' does not match original declaration [-Wlto-type-mismatch] |
| *info, wk) |
| rkpk1.f:2565:5: note: type mismatch in parameter 14 |
| subroutine dstup (s, lds, nobs, nnull, qraux, jpvt, y, q, ldqr, |
| rkpk1.f:2565:5: note: 'dstup' was previously declared here |
| @end example |
| @noindent |
| where the fourteenth argument @code{dum} was missing in the call. |
| |
| @c package gss 2.1-9 |
| @example |
| reg.f:78:33: warning: type of 'dqrdc' does not match original declaration [-Wlto-type-mismatch] |
| call dqrdc (sr, nobs, nobs, nnull, wk, dum, dum, 0) |
| dstup.f:20: note: 'dqrdc' was previously declared here |
| call dqrdc (s, lds, nobs, nnull, qraux, jpvt, work, 1) |
| @end example |
| @noindent |
| @code{dqrdc} is a LINPACK routine from @R{}, @code{jpvt} is an integer |
| array and @code{work} is a double precision one so @code{dum} cannot |
| match both. (If @option{--enable-lto=check} had been used the |
| comparison would have been with the definition in @R{}.) |
| |
| For Fortran files all in the package, most inconsistencies can be |
| detected by concatenating the Fortran files and compiling the result, |
| sometimes with clearer diagnostics than provided by LTO. For our last |
| two examples this gives |
| |
| @example |
| all.f:2966:72: |
| |
| *info, work1) |
| 1 |
| Warning: Missing actual argument for argument 'dum' at (1) |
| @end example |
| @noindent |
| and |
| @example |
| all.f:1663:72: |
| |
| *ipvtwk), wk(ikwk), wk(iwork1), wk(iwork2), info) |
| 1 |
| Warning: Type mismatch in argument 'jpvt' at (1); passed REAL(8) to INTEGER(4) |
| @end example |
| |
| @node System and foreign language interfaces, The R API, Debugging, Top |
| @chapter System and foreign language interfaces |
| |
| @menu |
| * Operating system access:: |
| * Interface functions .C and .Fortran:: |
| * dyn.load and dyn.unload:: |
| * Registering native routines:: |
| * Creating shared objects:: |
| * Interfacing C++ code:: |
| * Fortran I/O:: |
| * Linking to other packages:: |
| * Handling R objects in C:: |
| * Interface functions .Call and .External:: |
| * Evaluating R expressions from C:: |
| * Parsing R code from C:: |
| * External pointers and weak references:: |
| * Vector accessor functions:: |
| * Character encoding issues:: |
| @end menu |
| |
| @node Operating system access, Interface functions .C and .Fortran, System and foreign language interfaces, System and foreign language interfaces |
| @section Operating system access |
| @cindex Operating system access |
| |
| Access to operating system functions is @emph{via} the @R{} functions |
| @code{system} and @code{system2}. |
| @findex system |
| @findex system2 |
| The details will differ by platform (see the on-line help), and about |
| all that can safely be assumed is that the first argument will be a |
| string @code{command} that will be passed for execution (not necessarily |
| by a shell) and the second argument to @code{system} will be |
| @code{internal} which if true will collect the output of the command |
| into an @R{} character vector. |
| |
| On POSIX-compliant OSes these commands pass a command-line to a shell: |
| Windows is not POSIX-compliant and there is a separate function |
| @code{shell} to do so. |
| |
| The function @code{system.time} |
| @findex system.time |
| is available for timing. Timing on child processes is only available on |
| Unix-alikes, and may not be reliable there. |
| |
| @node Interface functions .C and .Fortran, dyn.load and dyn.unload, Operating system access, System and foreign language interfaces |
| @section Interface functions @code{.C} and @code{.Fortran} |
| @cindex Interfaces to compiled code |
| |
| @findex .C |
| @findex .Fortran |
| |
| These two functions provide an interface to compiled code that has been |
| linked into @R{}, either at build time or @emph{via} @code{dyn.load} |
| (@pxref{dyn.load and dyn.unload}). They are primarily intended for |
| compiled C and Fortran code respectively, but the @code{.C} function can |
| be used with other languages which can generate C interfaces, for |
| example C++ (@pxref{Interfacing C++ code}). |
| |
| The first argument to each function is a character string specifying the |
| symbol name as known@footnote{possibly after some platform-specific |
| translation, e.g.@: adding leading or trailing underscores.} to C or |
| Fortran, that is the function or subroutine name. (That the symbol is |
| loaded can be tested by, for example, @code{is.loaded("cg")}. Use the |
| name you pass to @code{.C} or @code{.Fortran} rather than the translated |
| symbol name.) |
| |
| There can be up to 65 further arguments giving @R{} objects to be passed |
| to compiled code. Normally these are copied before being passed in, and |
| copied again to an @R{} list object when the compiled code returns. If |
| the arguments are given names, these are used as names for the |
| components in the returned list object (but not passed to the compiled |
| code). |
| |
| The following table gives the mapping between the modes of @R{} atomic |
| vectors and the types of arguments to a C function or Fortran |
| subroutine. |
| |
| @quotation |
| @multitable {RRR storage.mode} {RRR unsigned char * RR} {DOUBLE PRECISION} |
| @headitem @R{} storage mode @tab C type @tab Fortran type |
| @item @code{logical} @tab @code{int *} @tab @code{INTEGER} |
| @item @code{integer} @tab @code{int *} @tab @code{INTEGER} |
| @item @code{double} @tab @code{double *} @tab @code{DOUBLE PRECISION} |
| @item @code{complex} @tab @code{Rcomplex *} @tab @code{DOUBLE COMPLEX} |
| @item @code{character} @tab @code{char **} @tab @code{CHARACTER(255)} |
| @item @code{raw} @tab @code{unsigned char *} @tab none |
| @end multitable |
| @end quotation |
| |
| @noindent |
| On all @R{} platforms @code{int} and @code{INTEGER} are 32-bit. Code |
| ported from S-PLUS (which uses @code{long *} for @code{logical} and |
| @code{integer}) will not work on all 64-bit platforms (although it may |
| appear to work on some, including Windows). Note also that if your |
| compiled code is a mixture of C functions and Fortran subprograms the |
| argument types must match as given in the table above. |
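
As a small illustration of the table, a hypothetical routine
@code{weighted_count} that takes an @R{} logical vector, a double vector
and an integer length, and returns its result through a length-one
double vector, might look like this (a sketch; the name and arguments
are purely illustrative):

@example
/* Hypothetical .C routine: sum the weights w[i] for which the logical
   flag[i] is TRUE.  R logical and integer vectors arrive as int *,
   double vectors as double *; the result is returned via *total. */
void weighted_count(int *flag, double *w, int *n, double *total)
@{
    double s = 0.0;
    for (int i = 0; i < *n; i++)
        if (flag[i] == 1)   /* TRUE; NA cannot occur unless NAOK = TRUE */
            s += w[i];
    *total = s;
@}
@end example

It could be called from @R{} by
@code{.C("weighted_count", as.logical(f), as.double(w), as.integer(length(w)), total = double(1))$total},
coercing each argument to the storage mode the C code expects.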
| |
| C type @code{Rcomplex} is a structure with @code{double} members |
| @code{r} and @code{i} defined in the header file @file{R_ext/Complex.h} |
| included by @file{R.h}. (On most platforms this is stored in a way |
| compatible with the C99 @code{double complex} type: however, it may not |
| be possible to pass @code{Rcomplex} to a C99 function expecting a |
| @code{double complex} argument. Nor need it be compatible with a C++ |
| @code{complex} type. Moreover, the compatibility can depend on the |
| optimization level set for the compiler.) |
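
For example, a hypothetical routine @code{cmod} computing the moduli of
a complex vector need only use the @code{r} and @code{i} members. So
that this sketch compiles without @R{}'s headers it declares a local
stand-in structure @code{cplx} with the same two @code{double} members;
real code would include @file{R.h} and use @code{Rcomplex} instead.

@example
#include <math.h>

/* Stand-in for Rcomplex so this sketch compiles without R's headers;
   real code would #include <R.h> and use Rcomplex. */
typedef struct @{ double r; double i; @} cplx;

/* Hypothetical .C routine: mod[k] = |z[k]|, using only the r and i
   members rather than C99 complex arithmetic. */
void cmod(cplx *z, int *n, double *mod)
@{
    for (int k = 0; k < *n; k++)
        mod[k] = hypot(z[k].r, z[k].i);
@}
@end example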
| |
| Only a single character string of fixed length can be passed to or from |
| Fortran (the length is not passed), and the success of this is |
| compiler-dependent: its use was formally deprecated in 2019. Other @R{} |
| objects can be passed to @code{.C}, but it is much better to use one of |
| the other interfaces. |
| |
| It is possible to pass numeric vectors of storage mode @code{double} to |
| C as @code{float *} or to Fortran as @code{REAL} by setting the |
| attribute @code{Csingle}, most conveniently by using the @R{} functions |
| @code{as.single}, @code{single} or @code{mode}. This is intended only |
| to be used to aid interfacing existing C or Fortran code. |
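
A minimal sketch of a C routine receiving such an argument, under the
hypothetical name @code{scale_single}:

@example
/* Hypothetical .C routine: multiply a single-precision vector by a
   scalar.  x arrives as float * because the R argument carries the
   Csingle attribute (e.g. from as.single); k is an ordinary double. */
void scale_single(float *x, int *n, double *k)
@{
    float kk = (float) *k;
    for (int i = 0; i < *n; i++)
        x[i] *= kk;
@}
@end example

It might be called as
@code{.C("scale_single", as.single(x), as.integer(length(x)), as.double(k))[[1]]};
the values in the returned component are converted back to double
precision.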
| |
| Logical values are sent as @code{0} (@code{FALSE}), @code{1} |
| (@code{TRUE}) or @code{INT_MIN = -2147483648} (@code{NA}, but only if |
| @code{NAOK} is true), and the compiled code should return one of these |
| three values. (Non-zero values other than @code{INT_MIN} are mapped to |
| @code{TRUE}.) |
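
For instance, a hypothetical routine that negates a logical vector while
leaving @code{NA} alone, to be called with @code{NAOK = TRUE}, could
compare against @code{INT_MIN} from @file{limits.h} (code that includes
@R{}'s headers can use the constant @code{NA_LOGICAL}, which has the
same value):

@example
#include <limits.h>

/* Hypothetical .C routine: elementwise logical negation which leaves
   NA (INT_MIN) unchanged.  Only meaningful when called with
   NAOK = TRUE, as otherwise NA values give an error before the call. */
void not_keep_na(int *x, int *n)
@{
    for (int i = 0; i < *n; i++)
        if (x[i] != INT_MIN)
            x[i] = !x[i];     /* !0 == 1 (TRUE), !1 == 0 (FALSE) */
@}
@end example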
| |
| Unless formal argument @code{NAOK} is true, all the other arguments are |
| checked for missing values @code{NA} and for the @acronym{IEEE} special |
| values @code{NaN}, @code{Inf} and @code{-Inf}, and the presence of any |
| of these generates an error. If it is true, these values are passed |
| unchecked. |
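
A routine called with @code{NAOK = TRUE} must then cope with these
values itself. A minimal sketch, with the hypothetical name
@code{finite_mean}, which averages the finite elements of its first
argument: since @R{}'s @code{NA} for doubles is one of the IEEE NaNs,
the C99 macro @code{isfinite} screens out @code{NA}, @code{NaN},
@code{Inf} and @code{-Inf} together.

@example
#include <math.h>

/* Hypothetical .C routine, intended to be called with NAOK = TRUE:
   average the finite elements of x and count how many were used. */
void finite_mean(double *x, int *n, double *ans, int *nused)
@{
    double s = 0.0;
    int k = 0;
    for (int i = 0; i < *n; i++)
        if (isfinite(x[i])) @{    /* false for NA, NaN, Inf, -Inf */
            s += x[i];
            k++;
        @}
    *nused = k;
    *ans = (k > 0) ? s / k : NAN;  /* no finite values: return NaN */
@}
@end example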
| |
| Argument @code{PACKAGE} confines the search for the symbol name to a |
| specific shared object (or use @code{"base"} for code compiled into |
| @R{}). Its use is highly desirable, as there is no way to avoid two |
| package writers using the same symbol name, and such name clashes are |
normally sufficient to cause @R{} to crash. (If it is not present and
the call is from the body of a function defined in a package namespace,
the shared object loaded by the first (if any) @code{useDynLib}
directive will be used.)
| @c However, prior to @R{} 2.15.2 the detection of the correct namespace is |
| @c unreliable and you are strongly recommended to use the @code{PACKAGE} |
| @c argument for packages to be used with earlier versions of @R{}. |
| |
| Note that the compiled code should not return anything except through |
| its arguments: C functions should be of type @code{void} and Fortran |
| subprograms should be subroutines. |
| |
| To fix ideas, let us consider a very simple example which convolves two |
| finite sequences. (This is hard to do fast in interpreted @R{} code, but |
| easy in C code.) We could do this using @code{.C} by |
| |
| @example |
| @group |
| void convolve(double *a, int *na, double *b, int *nb, double *ab) |
| @{ |
| int nab = *na + *nb - 1; |
| |
| for(int i = 0; i < nab; i++) |
| ab[i] = 0.0; |
| for(int i = 0; i < *na; i++) |
| for(int j = 0; j < *nb; j++) |
| ab[i + j] += a[i] * b[j]; |
| @} |
| @end group |
| @end example |
| |
| @noindent |
| called from @R{} by |
| |
| @example |
| @group |
| conv <- function(a, b) |
| .C("convolve", |
| as.double(a), |
| as.integer(length(a)), |
| as.double(b), |
| as.integer(length(b)), |
| ab = double(length(a) + length(b) - 1))$ab |
| @end group |
| @end example |
| |
| Note that we take care to coerce all the arguments to the correct @R{} |
| storage mode before calling @code{.C}; mistakes in matching the types |
| can lead to wrong results or hard-to-catch errors. |
| |
| Special care is needed in handling @code{character} vector arguments in |
| C (or C++). On entry the contents of the elements are duplicated and |
| assigned to the elements of a @code{char **} array, and on exit the |
| elements of the C array are copied to create new elements of a character |
| vector. This means that the contents of the character strings of the |
| @code{char **} array can be changed, including to @code{\0} to shorten |
| the string, but the strings cannot be lengthened. It is |
| possible@footnote{Note that this is then not checked for over-runs by |
| option @code{CBoundsCheck = TRUE}.} to allocate a new string @emph{via} |
| @code{R_alloc} and replace an entry in the @code{char **} array by the |
| new string. However, when character vectors are used other than in a |
| read-only way, the @code{.Call} interface is much to be preferred. |
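
The following hypothetical routine stays within these rules: it
truncates each string to at most @code{*width} bytes by writing a
@code{\0}, and upper-cases the rest in place, so no string ever grows.
It is only a sketch: @code{toupper} and the truncation both work byte
by byte, so it is only appropriate for single-byte encodings.

@example
#include <ctype.h>
#include <string.h>

/* Hypothetical .C routine: shorten each of the *n strings to at most
   *width bytes and convert it to upper case, both in place. */
void trunc_upper(char **s, int *n, int *width)
@{
    for (int i = 0; i < *n; i++) @{
        if (*width >= 0 && strlen(s[i]) > (size_t) *width)
            s[i][*width] = '\0';          /* shortening is allowed */
        for (char *p = s[i]; *p != '\0'; p++)
            *p = (char) toupper((unsigned char) *p);
    @}
@}
@end example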
| |
| Passing character strings to Fortran code needs even more care, is |
| deprecated and should be avoided where possible. Only the first element |
| of the character vector is passed in, as a fixed-length (255) character |
| array. Up to 255 characters are passed back to a length-one character |
| vector. How well this works (or even if it works at all) depends on the |
| C and Fortran compilers on each platform (including on their options). |
| Often what is being passed to Fortran is one of a small set of possible |
| values (a factor in @R{} terms) which could alternatively be passed as |
| an integer code: similarly Fortran code that wants to generate |
| diagnostic messages could pass an integer code to a C or @R{} wrapper |
| which would convert it to a character string. |
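
A minimal sketch of the C side of such a scheme, assuming a Fortran
subroutine that reports its outcome through an @code{INTEGER} argument
@code{info} using the hypothetical codes below:

@example
/* Map the integer status returned by a (hypothetical) Fortran
   subroutine to a message.  The codes and texts are illustrative. */
const char *status_message(int info)
@{
    switch (info) @{
    case 0:  return "success";
    case 1:  return "matrix is singular";
    case 2:  return "iteration limit reached";
    default: return "unknown error code";
    @}
@}
@end example

A C wrapper called @emph{via} @code{.Call} could then pass the message
to @code{Rf_warning} or @code{Rf_error}; alternatively the integer could
be returned to @R{} and translated there.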
| |
| It is possible to pass some @R{} objects other than atomic vectors @emph{via} |
| @code{.C}, but this is only supported for historical compatibility: use |
| the @code{.Call} or @code{.External} interfaces for such objects. Any |
| C/C++ code that includes @file{Rinternals.h} should be called @emph{via} |
| @code{.Call} or @code{.External}. |
| |
| @node dyn.load and dyn.unload, Registering native routines, Interface functions .C and .Fortran, System and foreign language interfaces |
| @section @code{dyn.load} and @code{dyn.unload} |
| @cindex Dynamic loading |
| |
| @findex dyn.load |
| @findex dyn.unload |
| |
| Compiled code to be used with @R{} is loaded as a shared object |
| (Unix-alikes including macOS, @pxref{Creating shared objects} for more |
| information) or DLL (Windows). |
| |
| The shared object/DLL is loaded by @code{dyn.load} and unloaded by |
| @code{dyn.unload}. Unloading is not normally necessary and is not safe in |
| general, but it is needed to allow the DLL to be re-built on some platforms, |
| including Windows. Unloading a DLL and then re-loading a DLL of the same name |
| may not work: Solaris uses the first version loaded. A DLL that registers |
C finalizers, but fails to unregister them when unloaded, may cause @R{} to crash
| after unloading. |
| |
| The first argument to both functions is a character string giving the |
| path to the object. Programmers should not assume a specific file |
| extension for the object/DLL (such as @file{.so}) but use a construction |
| like |
| |
| @example |
| file.path(path1, path2, paste0("mylib", .Platform$dynlib.ext)) |
| @end example |
| |
| @noindent |
| for platform independence. On Unix-alike systems the path supplied to |
| @code{dyn.load} can be an absolute path, one relative to the current |
| directory or, if it starts with @samp{~}, relative to the user's home |
| directory. |
| |
| Loading is most often done automatically based on the @code{useDynLib()} |
| declaration in the @file{NAMESPACE} file, but may be done |
| explicitly @emph{via} a call to @code{library.dynam}. |
| @findex library.dynam |
| This has the form |
| |
| @example |
| library.dynam("libname", package, lib.loc) |
| @end example |
| |
| @noindent |
| where @code{libname} is the object/DLL name @emph{with the extension |
| omitted}. Note that the first argument, @code{chname}, should |
| @strong{not} be @code{package} since this will not work if the package |
| is installed under another name. |
| |
| Under some Unix-alike systems there is a choice of how the symbols are |
| resolved when the object is loaded, governed by the arguments |
| @code{local} and @code{now}. Only use these if really necessary: in |
| particular using @code{now=FALSE} and then calling an unresolved symbol |
| will terminate @R{} unceremoniously. |
| |
@R{} provides a way of executing some code automatically when an object/DLL
| is either loaded or unloaded. This can be used, for example, to |
| register native routines with @R{}'s dynamic symbol mechanism, initialize |
| some data in the native code, or initialize a third party library. On |
| loading a DLL, @R{} will look for a routine within that DLL named |
| @code{R_init_@var{lib}} where @var{lib} is the name of the DLL file with |
| the extension removed. For example, in the command |
| |
| @example |
| library.dynam("mylib", package, lib.loc) |
| @end example |
| |
| @noindent |
@R{} looks for the symbol named @code{R_init_mylib}. Similarly, when
| unloading the object, @R{} looks for a routine named |
| @code{R_unload_@var{lib}}, e.g., @code{R_unload_mylib}. In either case, |
| if the routine is present, @R{} will invoke it and pass it a single |
| argument describing the DLL. This is a value of type @code{DllInfo} |
| which is defined in the @file{Rdynload.h} file in the @file{R_ext} |
| directory. |
| |
| Note that there are some implicit restrictions on this mechanism as the |
| basename of the DLL needs to be both a valid file name and valid as part |
| of a C entry point (e.g.@: it cannot contain @samp{.}): for portable |
| code it is best to confine DLL names to be @acronym{ASCII} alphanumeric |
| plus underscore. If entry point @code{R_init_@var{lib}} is not found it |
| is also looked for with @samp{.} replaced by @samp{_}. |
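
The alternative name can be sketched as follows; @code{init_entry_name}
is a hypothetical helper, not part of @R{}'s @acronym{API}, that builds
the entry point name with each @samp{.} in the library name replaced by
@samp{_}:

@example
#include <stdio.h>
#include <string.h>

/* Build "R_init_<lib>" in buf, with every '.' in lib replaced by '_'.
   lib is the DLL name with its extension removed.  Returns 0 on
   success, -1 if buf is too small. */
int init_entry_name(const char *lib, char *buf, size_t size)
@{
    const char *prefix = "R_init_";
    int len = snprintf(buf, size, "%s%s", prefix, lib);
    if (len < 0 || (size_t) len >= size)
        return -1;
    for (char *p = buf + strlen(prefix); *p != '\0'; p++)
        if (*p == '.')
            *p = '_';
    return 0;
@}
@end example

So for a DLL whose name without extension is @file{my.lib}, the first
name tried, @code{R_init_my.lib}, cannot be a C identifier, and the
fallback @code{R_init_my_lib} is the one the package should define.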
| |
| |
| The following example shows templates for the initialization and |
| unload routines for the @code{mylib} DLL. |
| |
| @quotation |
| @cartouche |
| @example |
| #include <R_ext/Rdynload.h> |
| |
| void |
| R_init_mylib(DllInfo *info) |
| @{ |
| /* Register routines, |
| allocate resources. */ |
| @} |
| |
| void |
| R_unload_mylib(DllInfo *info) |
| @{ |
| /* Release resources. */ |
| @} |
| @end example |
| @end cartouche |
| @end quotation |
| |
| If a shared object/DLL is loaded more than once the most recent version |
| is used.@footnote{Strictly this is OS-specific, but no exceptions have |
| been seen for many years.} More generally, if the same symbol name |
| appears in several shared objects, the most recently loaded occurrence |
| is used. The @code{PACKAGE} argument and registration (see the next |
| section) provide good ways to avoid any ambiguity in which occurrence is |
| meant. |
| |
| On Unix-alikes the paths used to resolve dynamically linked dependent |
| libraries are fixed (for security reasons) when the process is launched, |
| so @code{dyn.load} will only look for such libraries in the locations |
| set by the @file{R} shell script (@emph{via} @file{etc/ldpaths}) and in |
| the OS-specific defaults. |
| |
| Windows allows more control (and less security) over where dependent |
| DLLs are looked for. On all versions this includes the @env{PATH} |
| environment variable, but with lowest priority: note that it does not |
| include the directory from which the DLL was loaded. It is possible to |
| add a single path with quite high priority @emph{via} the @code{DLLpath} |
| argument to @code{dyn.load}. This is (by default) used by |
| @code{library.dynam} to include the package's @file{libs/i386} or |
| @file{libs/x64} directory in the DLL search path. |
| |
| |
| @node Registering native routines, Creating shared objects, dyn.load and dyn.unload, System and foreign language interfaces |
| @section Registering native routines |
| @cindex Registering native routines |
| |
| @menu |
| * Speed considerations:: |
| * Converting a package to use registration:: |
| * Linking to native routines in other packages:: |
| @end menu |
| |
| By `native' routine, we mean an entry point in compiled code. |
| |
| In calls to @code{.C}, @code{.Call}, @code{.Fortran} and |
| @code{.External}, @R{} must locate the specified native routine by |
| looking in the appropriate shared object/DLL. By default, @R{} uses the |
operating-system-specific dynamic loader to look up the symbol in
| all@footnote{For calls from within a namespace the search is confined to |
| the DLL loaded for that package.} loaded DLLs and the @R{} executable |
| or libraries it is linked to. Alternatively, the author of the DLL can |
| explicitly register routines with @R{} and use a single, |
| platform-independent mechanism for finding the routines in the DLL. One |
| can use this registration mechanism to provide additional information |
| about a routine, including the number and type of the arguments, and |
| also make it available to @R{} programmers under a different name. |
| @c No sign of this in 15 years .... |
| @c In the future, registration may be used to |
| @c implement a form of ``secure'' or limited native access. |
| |
| Registering routines has two main advantages: it provides a |
| faster@footnote{For unregistered entry points the OS's @code{dlsym} |
| routine is used to find addresses. Its performance varies considerably |
| by OS and even in the best case it will need to search a much larger |
| symbol table than, say, the table of @code{.Call} entry points.} way to |
| find the address of the entry point @emph{via} tables stored in the DLL |
| at compilation time, and it provides a run-time check that the entry |
| point is called with the right number of arguments and, optionally, the |
| right argument types. |
| |
| @findex R_registerRoutines |
| To register routines with @R{}, one calls the C routine |
| @code{R_registerRoutines}. This is typically done when the DLL is first |
| loaded within the initialization routine @code{R_init_@var{dll name}} |
| described in @ref{dyn.load and dyn.unload}. @code{R_registerRoutines} |
| takes 5 arguments. The first is the @code{DllInfo} object passed by |
| @R{} to the initialization routine. This is where @R{} stores the |
| information about the methods. The remaining 4 arguments are arrays |
| describing the routines for each of the 4 different interfaces: |
| @code{.C}, @code{.Call}, @code{.Fortran} and @code{.External}. Each |
| argument is a @code{NULL}-terminated array of the element types given in |
| the following table: |
| |
| @quotation |
| @multitable {@code{.External }} {@code{R_ExternalMethodDef}} |
| @item @code{.C} @tab @code{R_CMethodDef} |
| @item @code{.Call} @tab @code{R_CallMethodDef} |
| @item @code{.Fortran} @tab @code{R_FortranMethodDef} |
| @item @code{.External} @tab @code{R_ExternalMethodDef} |
| @end multitable |
| @end quotation |
| |
| Currently, the @code{R_ExternalMethodDef} type is the same as the |
| @code{R_CallMethodDef} type and contains fields for the name of the |
| routine by which it can be accessed in @R{}, a pointer to the actual |
| native symbol (i.e., the routine itself), and the number of arguments |
| the routine expects to be passed from @R{}. For example, if we had a |
| routine named @code{myCall} defined as |
| |
| @example |
| SEXP myCall(SEXP a, SEXP b, SEXP c); |
| @end example |
| |
| @noindent |
| we would describe this as |
| |
| @example |
| static const R_CallMethodDef callMethods[] = @{ |
| @{"myCall", (DL_FUNC) &myCall, 3@}, |
| @{NULL, NULL, 0@} |
| @}; |
| @end example |
| |
| @noindent |
| along with any other routines for the @code{.Call} interface. For |
| routines with a variable number of arguments invoked @emph{via} the |
| @code{.External} interface, one specifies @code{-1} for the number of |
| arguments which tells @R{} not to check the actual number passed. |
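The following self-contained sketch shows how a @code{NULL}-terminated table of name, address and argument count can be searched and checked. It deliberately does not include @R{}'s headers: the types @code{demo_fn} and @code{DemoCallDef}, the table @code{demoMethods} and the function @code{demo_find} are hypothetical stand-ins that only mimic the layout of @code{R_CallMethodDef`}. This is not how @R{} itself implements the lookup. A negative count means `do not check', as with @code{-1} above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for DL_FUNC: a generic function pointer type. */
typedef void (*demo_fn)(void);

/* Hypothetical mirror of the {name, address, numArgs} layout. */
typedef struct {
    const char *name;
    demo_fn     fun;
    int         numArgs;   /* negative: number of arguments not checked */
} DemoCallDef;

/* Dummy entry points used only to fill the table. */
static void demo_myCall(void) { }
static void demo_myExternal(void) { }

/* NULL-terminated table, like the registration arrays above. */
static const DemoCallDef demoMethods[] = {
    {"myCall",     demo_myCall,     3},
    {"myExternal", demo_myExternal, -1},
    {NULL, NULL, 0}
};

/* Find 'name' in the table and check the argument count.
   Returns the entry, or NULL if the name is not registered or the
   number of arguments does not match the registered count. */
static const DemoCallDef *demo_find(const DemoCallDef *table,
                                    const char *name, int nargs)
{
    for (const DemoCallDef *p = table; p->name != NULL; p++) {
        if (strcmp(p->name, name) == 0) {
            if (p->numArgs >= 0 && p->numArgs != nargs)
                return NULL;   /* wrong number of arguments */
            return p;
        }
    }
    return NULL;               /* name not in the table */
}
```

For brevity the sketch returns @code{NULL} both for an unknown name and for a wrong argument count; @R{} instead signals an error when a registered routine is called with the wrong number of arguments.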
| |
| Routines for use with the @code{.C} and @code{.Fortran} interfaces are |
| described with similar data structures, which have an additional field |
| for describing the type of each argument. This field can be omitted. |
| However, if specified, it should be an array with the same number of |
| elements as the number of parameters for the routine, containing the |
| @code{SEXP} types describing the expected type of each argument. |
| (Technically, the elements |
| of the types array are of type @code{R_NativePrimitiveArgType} which is |
| just an unsigned integer.) The @R{} types and corresponding type |
| identifiers are provided in the following table: |
| |
| @quotation |
| @multitable {@code{character }} {@code{SINGLESXP}} |
| @item @code{numeric} @tab @code{REALSXP} |
| @item @code{integer} @tab @code{INTSXP} |
| @item @code{logical} @tab @code{LGLSXP} |
| @item @code{single} @tab @code{SINGLESXP} |
| @item @code{character} @tab @code{STRSXP} |
| @item @code{list} @tab @code{VECSXP} |
| @end multitable |
| @end quotation |
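To show what a types array is used for, here is a sketch of a per-argument type check. The enumeration @code{demo_type} and the function @code{demo_first_mismatch} are hypothetical: @R{} uses the real @code{SEXP} type codes and its own checking code, neither of which is reproduced here. The array @code{demo_myC_t} follows the argument order of @code{myC_t} in the example below.

```c
#include <stddef.h>

/* Hypothetical type codes standing in for REALSXP, INTSXP, ... */
typedef enum { DEMO_REAL, DEMO_INT, DEMO_LGL, DEMO_STR } demo_type;

/* Same order as myC(double *x, int *n, char **names, int *status). */
static const demo_type demo_myC_t[] = { DEMO_REAL, DEMO_INT, DEMO_STR, DEMO_LGL };

/* Return the index of the first of the n arguments whose type differs
   from the registered types array, or -1 if they all match.
   A NULL 'expected' array means no types were registered, so nothing
   is checked. */
static int demo_first_mismatch(const demo_type *expected,
                               const demo_type *actual, int n)
{
    if (expected == NULL)
        return -1;
    for (int i = 0; i < n; i++)
        if (expected[i] != actual[i])
            return i;
    return -1;
}
```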
| |
| Consider a C routine, @code{myC}, declared as |
| |
| @example |
| void myC(double *x, int *n, char **names, int *status); |
| @end example |
| |
| We would register it as |
| |
| @example |
| @group |
| static R_NativePrimitiveArgType myC_t[] = @{ |
| REALSXP, INTSXP, STRSXP, LGLSXP |
| @}; |
| |
| static const R_CMethodDef cMethods[] = @{ |
| @{"myC", (DL_FUNC) &myC, 4, myC_t@}, |
| @{NULL, NULL, 0, NULL@} |
| @}; |
| @end group |
| @end example |
| |
| @c Never implemented .... |
| @c One can also specify whether each argument is used simply as input, or |
| @c as output, or as both input and output. The style field in the |
| @c description of a method is used for this. The purpose is to |
| @c allow@footnote{but this is not currently done.} @R{} to transfer values |
| @c more efficiently across the @R{}-C/Fortran interface by avoiding copying |
| @c values when it is not necessary. Typically, one omits this information |
| @c in the registration data. |
| |
| Note that @code{.Fortran} entry points are mapped to lowercase, so |
| registration should use lowercase only. |
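Because of this mapping, a name registered with any uppercase letters can never be matched. The sketch below models that with a hypothetical lookup, @code{demo_fortran_matches}, which lowercases the requested name and then compares it exactly with the registered one. It is an illustration only, not @R{}'s code.

```c
#include <ctype.h>
#include <string.h>

/* Copy 'src' into 'dst' (capacity 'cap' > 0), converting to lowercase;
   longer names are truncated. */
static void demo_lower(char *dst, const char *src, size_t cap)
{
    size_t i = 0;
    for (; i + 1 < cap && src[i] != '\0'; i++)
        dst[i] = (char) tolower((unsigned char) src[i]);
    dst[i] = '\0';
}

/* Hypothetical lookup rule: lowercase the requested name, then
   compare it exactly with the registered name. */
static int demo_fortran_matches(const char *registered, const char *requested)
{
    char buf[64];
    demo_lower(buf, requested, sizeof buf);
    return strcmp(registered, buf) == 0;
}
```

So @code{"supsmu"} is found however the caller spells it, while a registration as @code{"Supsmu"} is never found.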
| |
| Having created the arrays describing each routine, the last step is to |
| actually register them with @R{}. We do this by calling |
| @code{R_registerRoutines}. For example, if we have the descriptions |
| above for the routines accessed by the @code{.C} and @code{.Call} |
| interfaces, we would use the following code: |
| |
| @example |
| void |
| R_init_myLib(DllInfo *info) |
| @{ |
| R_registerRoutines(info, cMethods, callMethods, NULL, NULL); |
| @} |
| @end example |
| |
| This routine will be invoked when @R{} loads the shared object/DLL named |
| @code{myLib}. The last two arguments in the call to |
| @code{R_registerRoutines} are for the routines accessed by |
| @code{.Fortran} and @code{.External} interfaces. In our example, these |
| are given as @code{NULL} since we have no routines of these types. |
| |
| When @R{} unloads a shared object/DLL, its registrations are removed. |
| There is no other facility for unregistering a symbol. |
| |
| Examples of registering routines can be found in the different packages |
| in the @R{} source tree (e.g., @pkg{stats} and @pkg{graphics}). Also, |
| there is a brief, high-level introduction in @emph{R News} (volume 1/3, |
| September 2001, pages 20--23, |
| @uref{https://www.r-project.org/@/doc/@/Rnews/Rnews_2001-3.pdf}). |
| |
| Once routines are registered, they can be referred to as @R{} objects if |
| this is arranged in the @code{useDynLib} call in the package's |
| @file{NAMESPACE} file (see @ref{useDynLib}). So for example the |
| @pkg{stats} package has |
| @example |
| # Refer to all C/Fortran routines by their name prefixed by C_ |
| useDynLib(stats, .registration = TRUE, .fixes = "C_") |
| @end example |
| |
| @noindent |
| in its @file{NAMESPACE} file, and then @code{ansari.test}'s default |
| methods can contain |
| @example |
| pansari <- function(q, m, n) |
| .C(C_pansari, as.integer(length(q)), p = as.double(q), |
| as.integer(m), as.integer(n))$p |
| @end example |
| |
| @noindent |
| This avoids the overhead of looking up an entry point each time it is |
| used, and ensures that the entry point in the package is the one used |
| (without a @code{PACKAGE = "pkg"} argument). |
| |
| @code{R_init_} routines are often of the form |
| @example |
| void attribute_visible R_init_mypkg(DllInfo *dll) |
| @{ |
| R_registerRoutines(dll, CEntries, CallEntries, FortEntries, |
| ExternalEntries); |
| R_useDynamicSymbols(dll, FALSE); |
| R_forceSymbols(dll, TRUE); |
| ... |
| @} |
| @end example |
| |
| @noindent |
| @findex R_useDynamicSymbols |
| @findex R_forceSymbols |
| The @code{R_useDynamicSymbols} call says the DLL is not to be searched |
| for entry points specified by character strings, so @code{.C} etc.@: calls |
| will only find registered symbols: the @code{R_forceSymbols} call only |
| allows @code{.C} etc.@: calls which specify entry points by @R{} objects |
| such as @code{C_pansari} (and not by character strings). Each provides |
| some protection against accidentally finding your entry points when |
| people supply a character string without a package, and avoids slowing |
| down such searches. (For the visibility attribute @pxref{Controlling |
| visibility}.) |
| |
| In more detail, if a package @code{mypkg} contains entry points |
| @code{reg} and @code{unreg} and the first is registered as a 0-argument |
| @code{.Call} routine, we could use (from code in the package) |
| |
| @example |
| .Call("reg") |
| .Call("unreg") |
| @end example |
| |
| @noindent |
| With or without registration, these will both work. If |
| @code{R_init_mypkg} calls @code{R_useDynamicSymbols(dll, FALSE)}, only |
| the first will work. If in addition to registration the |
| @file{NAMESPACE} file contains |
| |
| @example |
| useDynLib(mypkg, .registration = TRUE, .fixes = "C_") |
| @end example |
| |
| @noindent |
| then we can call @code{.Call(C_reg)}. Finally, if @code{R_init_mypkg} |
| also calls @code{R_forceSymbols(dll, TRUE)}, only @code{.Call(C_reg)} |
| will work (and not @code{.Call("reg")}). This is usually what we want: |
| it ensures that all of our own @code{.Call} calls go directly to the |
| intended code in our package and that no one else accidentally finds our |
| entry points. (Should someone need to call our code from outside the |
| package, for example for debugging, they can use |
| @code{.Call(mypkg:::C_reg)}.) |
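The combinations just described can be summarized as a small decision function. This is a hypothetical model written only to restate the rules above: the function @code{demo_call_works} and its parameters are not part of @R{}. It assumes that an @R{} object such as @code{C_reg} exists only for a registered routine when @code{.registration = TRUE} is used in @file{NAMESPACE}.

```c
#include <stdbool.h>

/* Hypothetical model of which .Call forms succeed; not R's code.
   by_symbol: the call uses an R object such as C_reg, not "reg". */
static bool demo_call_works(bool registered, bool use_dynamic_symbols,
                            bool force_symbols, bool by_symbol)
{
    if (by_symbol)
        return registered;          /* C_reg exists only if registered */
    if (force_symbols)
        return false;               /* character strings are refused */
    if (registered)
        return true;                /* found via the registration table */
    return use_dynamic_symbols;     /* otherwise only by dynamic lookup */
}
```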
| |
| |
| @node Speed considerations, Converting a package to use registration, Registering native routines, Registering native routines |
| @subsection Speed considerations |
| |
| Sometimes registering native routines or using a @code{PACKAGE} argument |
| can make a large difference. The results can depend quite markedly on |
| the OS (and even on whether it is 32- or 64-bit), on the version of @R{} and |
| what else is loaded into @R{} at the time. |
| |
| To fix ideas, first consider @code{x86_64} OS X 10.7 and @R{} 2.15.2. A |
| simple @code{.Call} function might be |
| @example |
| foo <- function(x) .Call("foo", x) |
| @end example |
| @noindent |
| with C code |
| @example |
| @group |
| #include <Rinternals.h> |
| |
| SEXP foo(SEXP x) |
| @{ |
| return x; |
| @} |
| @end group |
| @end example |
| If we compile with @command{R CMD SHLIB foo.c}, load the code by |
| @code{dyn.load("foo.so")} and run @code{foo(pi)}, it took around 22 |
| microseconds (us). Specifying the DLL by |
| @example |
| foo2 <- function(x) .Call("foo", x, PACKAGE = "foo") |
| @end example |
| @noindent |
| reduced the time to 1.7 us. |
| |
| Now consider making these functions part of a package whose |
| @file{NAMESPACE} file uses @code{useDynLib(foo)}. This immediately |
| reduces the running time as @code{"foo"} will be preferentially looked |
| for in @file{foo.dll}. Without specifying @code{PACKAGE} it took about 5 |
| us (it needs to fathom out the appropriate DLL each time it is invoked |
| but it does not need to search all DLLs), and with the @code{PACKAGE} |
| argument it is again about 1.7 us. |
| |
| Next suppose the package has registered the native routine @code{foo}. |
| Then @code{foo()} still has to find the appropriate DLL but can get to |
| the entry point in the DLL faster, in about 4.2 us. And @code{foo2()} |
| now takes about 1 us. If we register the symbols in the |
| @file{NAMESPACE} file and use |
| @example |
| foo3 <- function(x) .Call(C_foo, x) |
| @end example |
| @noindent |
| then the address for the native routine is looked up just once when the |
| package is loaded, and @code{foo3(pi)} takes about 0.8 us. |
| |
| Versions using @code{.C()} rather than @code{.Call()} took about 0.2 us |
| longer. |
| |
| These are all quite small differences, but C routines are not uncommonly |
| invoked millions of times for run times of a few microseconds each, and |
| those doing such things may wish to be aware of the differences. |
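Here is a rough worked example, using the first set of timings above purely for illustration (they will differ on other systems). A million calls costing about 22 us each rather than about 0.8 us each add roughly 10^6 times 21.2 us, that is, about 21 seconds. The hypothetical helper @code{demo_extra_seconds} does this arithmetic.

```c
/* Extra wall-clock time, in seconds, from making n_calls calls that
   each cost slow_us rather than fast_us microseconds. */
static double demo_extra_seconds(double n_calls, double slow_us, double fast_us)
{
    return n_calls * (slow_us - fast_us) * 1e-6;
}
```

With the 5 us and 1.7 us figures for a package using @code{useDynLib}, the same million calls would differ by about 3.3 seconds.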
| |
| On Linux and Solaris there is a smaller overhead in looking up |
| symbols. |
| |
| Symbol lookup on Windows used to be far slower, so @R{} maintains a |
| small cache. If the cache is currently empty enough that the symbol can |
| be stored in the cache then the performance is similar to Linux and |
| Solaris: if not it may be slower. @R{}'s own code always uses |
| registered symbols and so these never contribute to the cache; however, |
| many other packages do rely on symbol lookup. |
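The cache behaviour described above can be pictured with a fixed-capacity cache. The sketch below is hypothetical and is not @R{}'s implementation. The capacity @code{DEMO_CACHE_SIZE}, the stub @code{demo_slow_lookup} standing in for the OS search, and @code{demo_cached_lookup} are all invented for illustration. Once the cache is full, each further new symbol pays for the slow search on every call.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_CACHE_SIZE 4           /* hypothetical capacity */

typedef struct {
    const char *name;               /* must outlive the cache entry */
    void       *addr;
} DemoCacheEntry;

static DemoCacheEntry demo_cache[DEMO_CACHE_SIZE];
static int demo_cache_used = 0;
static int demo_slow_lookups = 0;   /* number of simulated OS searches */
static int demo_token;              /* dummy "address" returned by the stub */

/* Stand-in for an expensive OS symbol search (dlsym, GetProcAddress). */
static void *demo_slow_lookup(const char *name)
{
    (void) name;
    demo_slow_lookups++;
    return &demo_token;
}

/* Use the cache when possible; store new symbols only while a slot
   is free. */
static void *demo_cached_lookup(const char *name)
{
    for (int i = 0; i < demo_cache_used; i++)
        if (strcmp(demo_cache[i].name, name) == 0)
            return demo_cache[i].addr;      /* hit: no OS search */
    void *addr = demo_slow_lookup(name);    /* miss */
    if (addr != NULL && demo_cache_used < DEMO_CACHE_SIZE) {
        demo_cache[demo_cache_used].name = name;
        demo_cache[demo_cache_used].addr = addr;
        demo_cache_used++;
    }                                       /* full cache: not stored */
    return addr;
}
```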
| |
| In more recent versions of @R{} all the standard packages register |
| native symbols and do not allow symbol search, so in a new session |
| @code{foo()} can only look in @file{foo.so} and may be as fast as |
| @code{foo2()}. This will no longer apply when many contributed packages |
| are loaded, and generally those last loaded are searched first. For |
| example, consider @R{} 3.3.2 on @code{x86_64} Linux. In an empty @R{} session, |
| both @code{foo()} and @code{foo2()} took about 0.75 us; however, after |
| packages @CRANpkg{igraph} and @CRANpkg{spatstat} had been loaded (which |
| loaded another 12 DLLs), @code{foo()} took 3.6 us but @code{foo2()} |
| still took about 0.80 us. Using registration in a package reduced this |
| to 0.55 us and @code{foo3()} took 0.40 us, times which were unchanged |
| when further packages were loaded. |
| |
| @node Converting a package to use registration, Linking to native routines in other packages, Speed considerations, Registering native routines |
| @subsection Example: converting a package to use registration |
| |
| The @pkg{splines} package was converted to use symbol registration in |
| 2001, but we can use it as an example@footnote{Because it is a standard |
| package, one would need to rename it before attempting to reproduce the |
| account here.} of what needs to be done for a small package. |
| |
| @itemize |
| |
| @item |
| Find the relevant entry points. |
| This is somewhat OS-specific, but something like the following should be |
| possible at the OS command line: |
| |
| @example |
| @group |
| nm -g /path/to/splines.so | grep " T " |
| 0000000000002670 T _spline_basis |
| 0000000000001ec0 T _spline_value |
| @end group |
| @end example |
| |
| @noindent |
| This indicates that there are two relevant entry points. (They may or |
| may not have a leading underscore, as here. Fortran entry points will |
| have a trailing underscore.) Check in the @R{} code that they are |
| called by the package and how: in this case they are used by |
| @code{.Call}. |
| |
| Alternatively, examine the package's @R{} code for all @code{.C}, |
| @code{.Fortran}, @code{.Call} and @code{.External} calls. |
| |
| @item |
| Construct the registration table. First write skeleton registration |
| code, conventionally in file @file{src/init.c} (or at the end of the |
| only C source file in the package: if included in a C++ file the |
| @samp{R_init} function would need to be declared @code{extern "C"}): |
| |
| @example |
| @group |
| #include <stdlib.h> // for NULL |
| #include <R_ext/Rdynload.h> |
| |
| #define CALLDEF(name, n) @{#name, (DL_FUNC) &name, n@} |
| |
| static const R_CallMethodDef R_CallDef[] = @{ |
| CALLDEF(spline_basis, ?), |
| CALLDEF(spline_value, ?), |
| @{NULL, NULL, 0@} |
| @}; |
| |
| void R_init_splines(DllInfo *dll) |
| @{ |
| R_registerRoutines(dll, NULL, R_CallDef, NULL, NULL); |
| @} |
| @end group |
| @end example |
| |
| @noindent |
| and then replace the @code{?} in the skeleton with the actual numbers of |
| arguments. You will need to add declarations (also known as |
| `prototypes') of the functions unless appending to the only C source |
| file. Some packages will already have these in a header file, or you |
| could create one and include it in @file{init.c}, for example |
| @file{splines.h} containing |
| |
| @smallexample |
| @group |
| #include <Rinternals.h> // for SEXP |
| extern SEXP spline_basis(SEXP knots, SEXP order, SEXP xvals, SEXP derivs); |
| extern SEXP spline_value(SEXP knots, SEXP coeff, SEXP order, SEXP x, SEXP deriv); |
| @end group |
| @end smallexample |
| @noindent |
| Tools are available to extract declarations, at least for C and C++ |
| code: see the help file for |
| @code{package_native_routine_registration_skeleton} in package |
| @pkg{tools}. Here we could have used |
| @example |
| cproto -I/path/to/R/include -e splines.c |
| @end example |
| |
| For examples of registering other types of calls, see packages |
| @pkg{graphics} and @pkg{stats}. In particular, when registering entry |
| points for @code{.Fortran} one needs declarations as if called from C, |
| such as |
| |
| @example |
| @group |
| #include <R_ext/RS.h> |
| void F77_NAME(supsmu)(int *n, double *x, double *y, |
| double *w, int *iper, double *span, double *alpha, |
| double *smo, double *sc, double *edf); |
| @end group |
| @end example |
| |
| @noindent |
| @command{gfortran} 9.2@footnote{This was added on 2019-05-09, just after |
| release as 9.1.} and later can help generate such prototypes with its |
| flag @option{-fc-prototypes-external} (although one will need to replace |
| the hard-coded trailing underscore with the @code{F77_NAME} macro). |
| |
| One can get away with inaccurate argument lists in the declarations: it |
| is easy to specify the arguments for @code{.Call} (all @code{SEXP}) and |
| @code{.External} (one @code{SEXP}) and as the arguments for @code{.C} |
| and @code{.Fortran} are all pointers, specifying them as @code{void *} |
| suffices. (For most platforms one can omit all the arguments, although |
| link-time optimization will warn.) |
| |
| @item |
| (Optional but highly recommended.) Restrict @code{.Call} etc to use the |
| symbols you chose to register by editing @file{src/init.c} to contain |
| |
| @example |
| @group |
| void R_init_splines(DllInfo *dll) |
| @{ |
| R_registerRoutines(dll, NULL, R_CallDef, NULL, NULL); |
| R_useDynamicSymbols(dll, FALSE); |
| @} |
| @end group |
| @end example |
| |
| @end itemize |
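The @code{CALLDEF} macro relies on the C preprocessor's @code{#} (stringizing) operator, so the name registered for each routine is spelled exactly like its C identifier. The sketch below checks this without @R{}'s headers. @code{demo_DL_FUNC} and @code{demo_CallMethodDef} are hypothetical stand-ins for @code{DL_FUNC} and @code{R_CallMethodDef}, and the two functions are dummies that take @code{void *} arguments in place of @code{SEXP}.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins so the example compiles without R's headers. */
typedef void *(*demo_DL_FUNC)(void);
typedef struct {
    const char  *name;
    demo_DL_FUNC fun;
    int          numArgs;
} demo_CallMethodDef;

/* Same shape as CALLDEF above: #name makes "spline_basis" etc. */
#define CALLDEF(name, n) {#name, (demo_DL_FUNC) &name, n}

/* Dummy entry points; real ones would take and return SEXP. */
static void *spline_basis(void *a, void *b, void *c, void *d)
{ (void) a; (void) b; (void) c; (void) d; return NULL; }
static void *spline_value(void *a, void *b, void *c, void *d, void *e)
{ (void) a; (void) b; (void) c; (void) d; (void) e; return NULL; }

static const demo_CallMethodDef demo_CallDef[] = {
    CALLDEF(spline_basis, 4),   /* expands to {"spline_basis", ..., 4} */
    CALLDEF(spline_value, 5),
    {NULL, NULL, 0}
};
```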
| |
| A skeleton for the steps so far can be made using |
| @code{package_native_routine_registration_skeleton} in package |
| @pkg{tools}. This will optionally create declarations based on the |
| usage in the @R{} code. |
| |
| The remaining steps are optional but recommended. |
| |
| @itemize |
| @item |
| Edit the @file{NAMESPACE} file to create @R{} objects for the registered |
| symbols: |
| |
| @example |
| useDynLib(splines, .registration = TRUE, .fixes = "C_") |
| @end example |
| |
| @item |
| Find all the relevant calls in the @R{} code and edit them to use the |
| @R{} objects. This entailed changing the lines |
| |
| @smallexample |
| temp <- .Call("spline_basis", knots, ord, x, derivs, PACKAGE = "splines") |
| y[accept] <- .Call("spline_value", knots, coeff, ord, x[accept], deriv, PACKAGE = "splines") |
| y = .Call("spline_value", knots, coef(object), ord, x, deriv, PACKAGE = "splines") |
| @end smallexample |
| @noindent |
| to |
| |
| @smallexample |
| temp <- .Call(C_spline_basis, knots, ord, x, derivs) |
| y[accept] <- .Call(C_spline_value, knots, coeff, ord, x[accept], deriv) |
| y = .Call(C_spline_value, knots, coef(object), ord, x, deriv) |
| @end smallexample |
| |
| Check that there is no @code{exportPattern} directive which |
| unintentionally exports the newly created @R{} objects. |
| |
| @item |
| Restrict @code{.Call} to use the @R{} symbols by editing |
| @file{src/init.c} to contain |
| |
| @example |
| @group |
| void R_init_splines(DllInfo *dll) |
| @{ |
| R_registerRoutines(dll, NULL, R_CallDef, NULL, NULL); |
| R_useDynamicSymbols(dll, FALSE); |
| R_forceSymbols(dll, TRUE); |
| @} |
| @end group |
| @end example |
| |
| @item |
| Consider visibility. On some OSes we can hide entry points from the |
| loader, which precludes any possible name clashes and calling them |
| accidentally (usually with incorrect arguments and crashing the @R{} |
| process). If we repeat the first step we now see |
| |
| @example |
| @group |
| nm -g /path/to/splines.so | grep " T " |
| 0000000000002e00 T _R_init_splines |
| 00000000000025e0 T _spline_basis |
| 0000000000001e20 T _spline_value |
| @end group |
| @end example |
| |
| @noindent |
| If there were any entry points not intended to be used by the package we |
| should try to avoid exporting them, for example by making them |
| @code{static}. Now that the two relevant entry points are only accessed |
| @emph{via} the registration table, we can hide them. There are two ways |
| to do so on some Unix-alikes. We can hide individual entry points |
| @emph{via} |
| |
| @example |
| @group |
| #include <R_ext/Visibility.h> |
| |
| SEXP attribute_hidden |
| spline_basis(SEXP knots, SEXP order, SEXP xvals, SEXP derivs) |
| @dots{} |
| |
| SEXP attribute_hidden |
| spline_value(SEXP knots, SEXP coeff, SEXP order, SEXP x, SEXP deriv) |
| @dots{} |
| @end group |
| @end example |
| |
| @noindent |
| Alternatively, we can change the default visibility for all C symbols by |
| including |
| |
| @example |
| PKG_CFLAGS = $(C_VISIBILITY) |
| @end example |
| |
| @noindent |
| in @file{src/Makevars}, and then we need to allow registration by |
| declaring @code{R_init_splines} to be visible: |
| |
| @example |
| @group |
| #include <R_ext/Visibility.h> |
| |
| void attribute_visible |
| R_init_splines(DllInfo *dll) |
| @dots{} |
| @end group |
| @end example |
| |
| @noindent |
| @xref{Controlling visibility}, for more details, including using Fortran |
| code and ways to restrict visibility on Windows. |
| |
| @item |
| We end up with a file @file{src/init.c} containing |
| |
| @quotation |
| @cartouche |
| @example |
| #include <stdlib.h> |
| #include <R_ext/Rdynload.h> |
| #include <R_ext/Visibility.h> // optional |
| |
| #include "splines.h" |
| |
| #define CALLDEF(name, n) @{#name, (DL_FUNC) &name, n@} |
| |
| static const R_CallMethodDef R_CallDef[] = @{ |
| CALLDEF(spline_basis, 4), |
| CALLDEF(spline_value, 5), |
| @{NULL, NULL, 0@} |
| @}; |
| |
| void |
| attribute_visible // optional |
| R_init_splines(DllInfo *dll) |
| @{ |
| R_registerRoutines(dll, NULL, R_CallDef, NULL, NULL); |
| R_useDynamicSymbols(dll, FALSE); |
| R_forceSymbols(dll, TRUE); |
| @} |
| @end example |
| @end cartouche |
| @end quotation |
| |
| @end itemize |
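Here is a standalone sketch of the hidden/visible split, assuming GCC or Clang. The macros @code{DEMO_HIDDEN} and @code{DEMO_VISIBLE} are hypothetical local equivalents written in the spirit of @code{attribute_hidden} and @code{attribute_visible} (real package code should use @file{R_ext/Visibility.h}), and the function names are invented. The hidden routine remains reachable through the pointer returned by the visible one, which is how registered routines are reached through the registration table.

```c
/* Hypothetical local macros; assumes GCC or Clang, else no-ops. */
#if defined(__GNUC__)
# define DEMO_HIDDEN  __attribute__((visibility("hidden")))
# define DEMO_VISIBLE __attribute__((visibility("default")))
#else
# define DEMO_HIDDEN
# define DEMO_VISIBLE
#endif

/* Hidden: not exported from a shared object, so it can only be
   reached through a pointer (e.g. one stored in a table). */
int DEMO_HIDDEN demo_hidden_entry(int x) { return 2 * x; }

/* Visible: like R_init_<pkg>, the loader must find it by name. */
typedef int (*demo_entry_t)(int);
DEMO_VISIBLE demo_entry_t demo_init(void) { return &demo_hidden_entry; }
```

If this file is built into a shared object (for example with @command{R CMD SHLIB}), @command{nm -g} on the result should list @code{demo_init} but not @code{demo_hidden_entry}, on platforms where the visibility attribute is supported.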
| |
| @node Linking to native routines in other packages, , Converting a package to use registration, Registering native routines |
| @subsection Linking to native routines in other packages |
| |
| In addition to registering C routines to be called by @R{}, it can at |
| times be useful for one package to make some of its C routines available |
| to be called by C code in another package. The interface consists of |
| two routines declared in header @file{R_ext/Rdynload.h} as |
| |
| @findex R_RegisterCCallable |
| @findex R_GetCCallable |
| @example |
| void R_RegisterCCallable(const char *package, const char *name, |
| DL_FUNC fptr); |
| DL_FUNC R_GetCCallable(const char *package, const char *name); |
| @end example |
| |
| A package @pkg{packA} that wants to make a C routine @code{myCfun} |
| available to C code in other packages would include the call |
| |
| @example |
| R_RegisterCCallable("packA", "myCfun", (DL_FUNC) myCfun); |
| @end example |
| @noindent |
| in its initialization function @code{R_init_packA}. A package |
| @pkg{packB} that wants to use this routine would retrieve the function |
| pointer with a call of the form |
| |
| @example |
| p_myCfun = R_GetCCallable("packA", "myCfun"); |
| @end example |
| |
| The author of @pkg{packB} is responsible for ensuring that |
| @code{p_myCfun} has an appropriate declaration. In the future @R{} may |
| provide some automated tools to simplify exporting larger numbers of |
| routines. |
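The division of responsibility can be illustrated with a toy registry keyed by package and routine name. Everything here is hypothetical. @code{demo_register_ccallable} and @code{demo_get_ccallable} only mimic the roles of @code{R_RegisterCCallable} and @code{R_GetCCallable}, and @code{myCfun_t} is an assumed signature. The point is the cast back to the correct function type on the caller's side, which is what `an appropriate declaration' of @code{p_myCfun} amounts to.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical generic function pointer type, standing in for DL_FUNC. */
typedef void *(*demo_DL_FUNC)(void);

#define DEMO_MAX_CALLABLES 16

static struct {
    const char  *package, *name;
    demo_DL_FUNC fptr;
} demo_registry[DEMO_MAX_CALLABLES];
static int demo_n_registered = 0;

/* Plays the role of R_RegisterCCallable; returns 0 if the toy table
   is full. */
static int demo_register_ccallable(const char *package, const char *name,
                                   demo_DL_FUNC fptr)
{
    if (demo_n_registered == DEMO_MAX_CALLABLES)
        return 0;
    demo_registry[demo_n_registered].package = package;
    demo_registry[demo_n_registered].name = name;
    demo_registry[demo_n_registered].fptr = fptr;
    demo_n_registered++;
    return 1;
}

/* Plays the role of R_GetCCallable; returns NULL if not found. */
static demo_DL_FUNC demo_get_ccallable(const char *package, const char *name)
{
    for (int i = 0; i < demo_n_registered; i++)
        if (strcmp(demo_registry[i].package, package) == 0 &&
            strcmp(demo_registry[i].name, name) == 0)
            return demo_registry[i].fptr;
    return NULL;
}

/* "packA" side: the routine with its real (assumed) signature. */
static double myCfun(double x) { return x * x; }

/* "packB" side must declare the matching pointer type itself. */
typedef double (*myCfun_t)(double);
```

The sketch returns @code{NULL} for a name that was never registered; it does not model what @R{} does in that case.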
| |
| A package that wishes to make use of header files in other packages |
| needs to declare them as a comma-separated list in the field |
| @samp{LinkingTo} in the @file{DESCRIPTION} file. This then arranges |
| for the @file{include} directories in the installed linked-to packages |
| to be added to the include paths for C and C++ code. |
| |
| It must also list@footnote{whether or not @samp{LinkingTo} is used.} |
| those packages in @samp{Imports} or @samp{Depends}, as they have to be |
| loaded@footnote{so there needs to be a corresponding @code{import} or |
| @code{importFrom} entry in the @file{NAMESPACE} file.} prior to this one |
| (so that the path to their compiled code has been registered). |
| |
| |
| @acronym{CRAN} examples of the use of this mechanism include @CRANpkg{coxme} |
| linking to @CRANpkg{bdsmatrix} and @CRANpkg{xts} linking to |
| @CRANpkg{zoo}. |
| |
| @node Creating shared objects, Interfacing C++ code, Registering native routines, System and foreign language interfaces |
| @section Creating shared objects |
| @cindex Creating shared objects |
| @findex R CMD SHLIB |
| |
| Shared objects for loading into @R{} can be created using @command{R CMD |
| SHLIB}. This accepts as arguments a list of files which must be object |
| files (with extension @file{.o}) or sources for C, C++, Fortran, |
| Objective C or Objective C++ (with extensions @file{.c}, @file{.cc} or |
| @file{.cpp}, @file{.f} (fixed-form Fortran), @file{.f90} or @file{.f95} |
| (free-form), @file{.m}, and @file{.mm} or @file{.M}, respectively), or |
| commands to be passed to the linker. See @kbd{R CMD SHLIB --help} (or |
| the @R{} help for @code{SHLIB}) for usage information. Note that files |
| intended for the Fortran pre-processor with extension @file{.F} are not |
| accepted. |
| |
| If compiling the source files does not work ``out of the box'', you can |
| specify additional flags by setting some of the variables |
| @vindex PKG_CPPFLAGS |
| @code{PKG_CPPFLAGS} (for the C/C++ preprocessor, mainly @samp{-I}, |
| @samp{-D} and @samp{-U} flags), |
| @vindex PKG_CFLAGS |
| @vindex PKG_CXXFLAGS |
| @vindex PKG_FFLAGS |
| @vindex PKG_OBJCFLAGS |
| @vindex PKG_OBJCXXFLAGS |
| @code{PKG_CFLAGS}, @code{PKG_CXXFLAGS}, @code{PKG_FFLAGS}, |
| @code{PKG_OBJCFLAGS}, and @code{PKG_OBJCXXFLAGS} |
| (for the C, C++, Fortran, Objective C, and Objective C++ |
| compilers, respectively) in the file @file{Makevars} in the compilation |
| directory (or, of course, create the object files directly from the |
| command line). |
| @vindex PKG_LIBS |
| Similarly, variable @code{PKG_LIBS} in @file{Makevars} can be used for |
| additional @samp{-l} and @samp{-L} flags to be passed to the linker when |
| building the shared object. (Supplying linker commands as arguments to |
| @code{R CMD SHLIB} will take precedence over @code{PKG_LIBS} in |
| @file{Makevars}.) |
| |
| @vindex OBJECTS |
| It is possible to arrange to include compiled code from other languages |
| by setting the macro @samp{OBJECTS} in file @file{Makevars}, together |
| with suitable rules to make the objects. |
| |
| Flags that are already set (for example in file |
| @file{etc@var{R_ARCH}/Makeconf}) can be overridden by the environment |
| variable @env{MAKEFLAGS} (at least for systems using a POSIX-compliant |
| @code{make}), as in (Bourne shell syntax) |
| |
| @example |
| MAKEFLAGS="CFLAGS=-O3" R CMD SHLIB *.c |
| @end example |
| |
| It is also possible to set such variables in personal @file{Makevars} |
| files, which are read after the local @file{Makevars} and the system |
| makefiles, or in a site-wide @file{Makevars.site} file. |
| @ifset UseExternalXrefs |
| @xref{Customizing package compilation, , Customizing package compilation, |
| R-admin, R Installation and Administration}. |
| @end ifset |
| |
| |
| Note that as @command{R CMD SHLIB} uses Make, it will not remake a shared |
| object just because the flags have changed, and if @file{test.c} and |
| @file{test.f} both exist in the current directory |
| |
| @example |
| R CMD SHLIB test.f |
| @end example |
| |
| @noindent |
| will compile @file{test.c}! |
| |
| |
| If the @file{src} subdirectory of an add-on package contains source code |
| with one of the extensions listed above or a file @file{Makevars} but |
| @strong{not} a file @file{Makefile}, @command{R CMD INSTALL} creates a |
| shared object (for loading into @R{} through @code{useDynLib} in the |
| @file{NAMESPACE}, or in the @code{.onLoad} function of the package) |
| using the @command{R CMD SHLIB} mechanism. If file @file{Makevars} |
| exists it is read first, then the system makefile and then any personal |
| @file{Makevars} files. |
| |
| If the @file{src} subdirectory of a package contains a file |
| @file{Makefile}, this is used by @command{R CMD INSTALL} in place of the |
| @code{R CMD SHLIB} mechanism. @command{make} is called with makefiles |
| @file{@var{R_HOME}/etc@var{R_ARCH}/Makeconf}, @file{src/Makefile} and |
| any personal @file{Makevars} files (in that order). The first target |
| found in @file{src/Makefile} is used. |
| |
| It is better to make use of a @file{Makevars} file rather than a |
| @file{Makefile}: the latter should be needed only exceptionally. |
| |
| @c Not so clearcut on case-insensitive file systems. |
| @c Note that whereas @code{R CMD INSTALL} makes use of a @file{Makefile}, |
| @c @code{R CMD SHLIB} does not. The file must be named @file{Makefile}, |
| @c not for example @file{makefile} nor @file{GNUmakefile}. |
| |
| Under Windows the same commands work, but @file{Makevars.win} will be |
| used in preference to @file{Makevars}, and only @file{src/Makefile.win} |
| will be used by @code{R CMD INSTALL} with @file{src/Makefile} being |
| ignored. For past experiences of building DLLs with a variety of |
| compilers, see file @samp{README.packages}. |
| Under Windows you can supply an exports definitions file called |
| @file{@var{dllname}-win.def}: otherwise all entry points in objects (but |
| not libraries) supplied to @code{R CMD SHLIB} will be exported from the |
| DLL. An example is @file{stats-win.def} for the @pkg{stats} package; a |
| @acronym{CRAN} example is in package @CRANpkg{fastICA}. |
| |
| If you feel tempted to read the source code and subvert these |
| mechanisms, please resist. Far too much developer time has been wasted |
| in chasing down errors caused by failures to follow this documentation, |
| and even more by package authors demanding explanations as to why their |
| packages no longer work. |
| @c Jasjeet Singh Sekhon: this is your moment of infamy. |
| In particular, undocumented environment or @command{make} variables are |
| not for use by package writers and are subject to change without notice. |
| |
| @node Interfacing C++ code, Fortran I/O, Creating shared objects, System and foreign language interfaces |
| @section Interfacing C++ code |
| @cindex Interfacing C++ code |
| @cindex C++ code, interfacing |
| |
| Suppose we have the following hypothetical C++ library, consisting of |
| the two files @file{X.h} and @file{X.cpp}, and implementing the two |
| classes @code{X} and @code{Y} which we want to use in @R{}. |
| |
| @quotation |
| @cartouche |
| @example |
| // X.h |
| |
| class X @{ |
| public: X (); ~X (); |
| @}; |
| |
| class Y @{ |
| public: Y (); ~Y (); |
| @}; |
| @end example |
| @end cartouche |
| @end quotation |
| |
| @quotation |
| @cartouche |
| @example |
| // X.cpp |
| |
| #include <R.h> |
| #include "X.h" |
| |
| static Y y; |
| |
| X::X() @{ REprintf("constructor X\n"); @} |
| X::~X() @{ REprintf("destructor X\n"); @} |
| Y::Y() @{ REprintf("constructor Y\n"); @} |
| Y::~Y() @{ REprintf("destructor Y\n"); @} |
| @end example |
| @end cartouche |
| @end quotation |
| |
| To use these classes with @R{}, the only thing we have to do is write a wrapper |
| function and ensuring that the function is enclosed in |
| |
| @example |
| @group |
| extern "C" @{ |
| |
| @} |
| @end group |
| @end example |
| |
| For example, |
| |
| @quotation |
| @cartouche |
| @example |
| // X_main.cpp: |
| |
| #include "X.h" |
| |
| extern "C" @{ |
| |
| void X_main () @{ |
| X x; |
| @} |
| |
| @} // extern "C" |
| @end example |
| @end cartouche |
| @end quotation |
| |
| Compiling and linking should be done with the C++ compiler-linker |
| (rather than the C compiler-linker or the linker itself); otherwise the |
| C++ initialization code (and hence the constructor of the static |
| variable @code{y}) is not run. On a properly configured system, one |
| can simply use |
| |
| @example |
| R CMD SHLIB X.cpp X_main.cpp |
| @end example |
| |
| @noindent |
| to create the shared object, typically @file{X.so} (the file name |
| extension may be different on your platform). Now starting @R{} yields |
| |
| @example |
| @group |
| R version 2.14.1 Patched (2012-01-16 r58124) |
| Copyright (C) 2012 The R Foundation for Statistical Computing |
| ... |
| Type "q()" to quit R. |
| @end group |
| |
| @group |
| R> dyn.load(paste("X", .Platform$dynlib.ext, sep = "")) |
| constructor Y |
| R> .C("X_main") |
| constructor X |
| destructor X |
| list() |
| R> q() |
| Save workspace image? [y/n/c]: y |
| destructor Y |
| @end group |
| @end example |
| |
| The @R{} for Windows @acronym{FAQ} (@file{rw-FAQ}) contains details of how |
| to compile this example under Windows. |
| |
| Earlier versions of this example used C++ iostreams: this is best |
| avoided. There is no guarantee that the output will appear in the @R{} |
| console, and indeed it will not on the @R{} for Windows console. Use |
| @R{} code or the C entry points (@pxref{Printing}) for all I/O if at all |
| possible. Examples have been seen where merely loading a DLL that |
| contained calls to C++ I/O upset @R{}'s own C I/O (for example by |
| resetting buffers on open files). |
| |
| Most @R{} header files can be included within C++ programs but they |
| should @strong{not} be included within an @code{extern "C"} block (as |
| they include system headers@footnote{Even including C system headers in |
| such a block has caused compilation errors.}). The inclusion of system |
| headers in C++ changed in @R{} 3.3.0, so if you care about earlier |
| versions of @R{} please check your package there. |
| |
| Legacy header @file{S.h} cannot be used with C++. |
| |
| @subsection External C++ code |
| |
| Quite a lot of external C++ software is header-only (e.g.@: most of the |
| Boost `libraries' including all those supplied by package @CRANpkg{BH}, |
| and most of Armadillo as supplied by package @CRANpkg{RcppArmadillo}) |
| and so is compiled when an @R{} package which uses it is installed. |
| This causes few problems. |
| |
| A small number of external libraries used in @R{} packages have a C++ |
| interface to a library of compiled code, e.g.@: packages @CRANpkg{rgdal} |
| and @CRANpkg{rjags}. This raises many more problems! The C++ interface |
| uses name-mangling and the |
| ABI@footnote{@uref{https://en.wikipedia.org/@/wiki/@/Application_binary_interface}.} |
| may depend on the compiler, version and even C++ defines@footnote{For |
| example, @samp{_GLIBCXX_USE_CXX11_ABI} in @command{g++} 5.1 and later: |
| @uref{https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html}.}, |
| and so requires the package's C++ code to be compiled in exactly the same way |
| as the library (and what that was is often undocumented). Examples |
| include use of @command{g++} @emph{vs} @command{clang++} or Solaris' |
| @command{CC}, and the two ABIs available for C++11 in @command{g++} with |
| different defaults for GCC 4.9 and 5.x in some Linux distributions. |
| |
| Even fewer external libraries use C++ internally but present a C |
| interface, such as @CRANpkg{rgeos}. These require the C++ runtime |
| library to be linked into the package's shared object/DLL, and this is |
| best done by including a dummy C++ file in the package sources. |
| |
| There is a recent trend to link to the C++ interfaces offered by C |
| software such as @pkg{hdf5}, @pkg{pcre} and @pkg{ImageMagick}. Their C |
| interfaces are much preferred for portability (and can be used from C++ |
| code). Also, the C++ interfaces are often optional in the software |
| build or packaged separately and so users installing from package |
| sources are far less likely to already have them installed. |
| |
| |
| @node Fortran I/O, Linking to other packages, Interfacing C++ code, System and foreign language interfaces |
| @section Fortran I/O |
| |
| We have already warned against the use of C++ iostreams, not least |
| because output is not guaranteed to appear on the @R{} console, and this |
| warning applies equally to Fortran output to units @code{*} |
| and @code{6}. @xref{Printing from Fortran}, which describes workarounds. |
| |
| In the past most Fortran compilers implemented I/O on top of the C I/O |
| system and so the two interworked successfully. This was true of |
| @command{g77}, but it is less true of @command{gfortran} as used in |
| @command{gcc} 4 and later. In particular, any package that makes use of |
| Fortran I/O will, when compiled on Windows, interfere with C I/O: when the |
| Fortran I/O support code is initialized (typically when the package is |
| loaded) the C @code{stdout} and @code{stderr} are switched to LF line |
| endings. (Function @code{init} in file |
| @file{src/modules/lapack/init_win.c} shows how to mitigate this. In a |
| package this would look something like |
| @example |
| #include <R_ext/Rdynload.h> /* for DllInfo */ |
| #ifdef _WIN32 |
| # include <fcntl.h> /* for _O_TEXT */ |
| # include <io.h> /* for setmode */ |
| #endif |
| |
| void R_init_mypkgname(DllInfo *dll) |
| @{ |
| // Native symbol registration calls |
| |
| #ifdef _WIN32 |
| // gfortran I/O initialization sets these to _O_BINARY |
| setmode(1, _O_TEXT); /* stdout */ |
| setmode(2, _O_TEXT); /* stderr */ |
| #endif |
| @} |
| @end example |
| @noindent |
| in the file used for native symbol registration.) |
| |
| |
| @node Linking to other packages, Handling R objects in C, Fortran I/O, System and foreign language interfaces |
| @section Linking to other packages |
| |
| It is not in general possible to link a DLL in package @pkg{packA} to a |
| DLL provided by package @pkg{packB} (for the security reasons mentioned |
| in @ref{dyn.load and dyn.unload}, and also because some platforms |
| distinguish between shared objects and dynamic libraries), but it is |
| possible on Windows. |
| |
| Note that there can be tricky versioning issues here, as package |
| @pkg{packB} could be re-installed after package @pkg{packA} --- it is |
| desirable that the API provided by package @pkg{packB} remains |
| backwards-compatible. |
| |
| Shipping a static library in package @pkg{packB} for other packages to |
| link to avoids most of the difficulties. |
| |
| @menu |
| * Unix-alikes:: |
| * Windows:: |
| @end menu |
| |
| @node Unix-alikes, Windows, Linking to other packages, Linking to other packages |
| @subsection Unix-alikes |
| |
| It is possible to link a shared object in package @pkg{packA} to a |
| library provided by package @pkg{packB} under limited circumstances |
| on a Unix-alike OS. There are severe portability issues, so this is not |
| recommended for a distributed package. |
| |
| This is easiest if @pkg{packB} provides a static library |
| @file{packB/lib/libpackB.a}. (Note that using directory @file{lib} rather |
| than @file{libs} is conventional, and architecture-specific |
| sub-directories may be needed and are assumed in the sample code |
| below. The code in the static library will need to be compiled with |
| @code{PIC} flags on platforms where it matters.) Then as the code from |
| package @pkg{packB} is incorporated when package @pkg{packA} is |
| installed, we only need to find the static library at install time for |
| package @pkg{packA}. The only issue is to find package @pkg{packB}, and |
| for that we can ask @R{} by something like (long lines broken for |
| display here) |
| |
| @example |
| PKGB_PATH=`echo 'library(packB); |
| cat(system.file("lib", package="packB", mustWork=TRUE))' \ |
| | "$@{R_HOME@}/bin/R" --vanilla --slave` |
| PKG_LIBS="$(PKGB_PATH)$(R_ARCH)/libpackB.a" |
| @end example |
| |
| For a dynamic library @file{packB/lib/libpackB.so} |
| (@file{packB/lib/libpackB.dylib} on macOS: note that you cannot link to |
| a shared object, @file{.so}, on that platform) we could use |
| |
| @example |
| PKGB_PATH=`echo 'library(packB); |
| cat(system.file("lib", package="packB", mustWork=TRUE))' \ |
| | "$@{R_HOME@}/bin/R" --vanilla --slave` |
| PKG_LIBS=-L"$(PKGB_PATH)$(R_ARCH)" -lpackB |
| @end example |
| |
| @noindent |
| This will work for installation, but very likely not when package |
| @pkg{packA} is loaded, as the path to package @pkg{packB}'s @file{lib} |
| directory is not in the @command{ld.so}@footnote{@command{dyld} on macOS, |
| and @env{DYLD_LIBRARY_PATH} below.} search path. You can arrange to |
| put it there @strong{before} @R{} is launched by setting (on some |
| platforms) @env{LD_RUN_PATH} or @env{LD_LIBRARY_PATH} or adding to the |
| @command{ld.so} cache (see @command{man ldconfig}). On platforms that |
| support it, the path to the directory containing the dynamic library can |
| be hardcoded at install time (which assumes that the location of package |
| @pkg{packB} will not be changed nor the package updated to a changed |
| API). On systems using @command{gcc} or @command{clang} with the |
| @acronym{GNU} linker (e.g.@: Linux), and on some others, this can be |
| done by, for example, |
| |
| @example |
| PKGB_PATH=`echo 'library(packB); |
| cat(system.file("lib", package="packB", mustWork=TRUE))' \ |
| | "$@{R_HOME@}/bin/R" --vanilla --slave` |
| PKG_LIBS=-L"$(PKGB_PATH)$(R_ARCH)" -Wl,-rpath,"$(PKGB_PATH)$(R_ARCH)" -lpackB |
| @end example |
| |
| @noindent |
| Some other systems (e.g.@: Solaris with its native linker) use |
| @option{-Rdir} rather than @option{-rpath,dir} (and this is accepted by |
| the compiler as well as the linker). |
| |
| It may be possible to figure out what is required semi-automatically |
| from the result of @command{R CMD libtool --config} (look for |
| @samp{hardcode}). |
| |
| Making headers provided by package @pkg{packB} available to the code to |
| be compiled in package @pkg{packA} can be done by the @code{LinkingTo} |
| mechanism (@pxref{Registering native routines}). |
| |
| |
| @node Windows, , Unix-alikes, Linking to other packages |
| @subsection Windows |
| |
| Suppose package @pkg{packA} wants to make use of compiled code provided |
| by @pkg{packB} in DLL @file{packB/libs/exB.dll}, possibly the package's |
| DLL @file{packB/libs/packB.dll}. (This can be extended to linking to |
| more than one package in a similar way.) There are three issues to be |
| addressed: |
| |
| @itemize |
| |
| @item |
| Making headers provided by package @pkg{packB} available to the code to |
| be compiled in package @pkg{packA}. |
| |
| This is done by the @code{LinkingTo} mechanism (@pxref{Registering native |
| routines}). |
| |
| @item preparing @code{packA.dll} to link to @file{packB/libs/exB.dll}. |
| |
| This needs an entry in @file{Makevars.win} of the form |
| |
| @example |
| PKG_LIBS= -L<something> -lexB |
| @end example |
| @noindent |
| and one possibility is that @code{<something>} is the path to the |
| installed @file{packB/libs} directory. To find that, we need to ask @R{} |
| where it is by something like |
| |
| @example |
| PKGB_PATH=`echo 'library(packB); |
| cat(system.file("libs", package="packB", mustWork=TRUE))' \ |
| | rterm --vanilla --slave` |
| PKG_LIBS= -L"$(PKGB_PATH)$(R_ARCH)" -lexB |
| @end example |
| |
| Another possibility is to use an import library, shipping with package |
| @pkg{packA} an exports file @file{exB.def}. Then @file{Makevars.win} |
| could contain |
| |
| @example |
| PKG_LIBS= -L. -lexB |
| |
| all: $(SHLIB) before |
| |
| before: libexB.dll.a |
| libexB.dll.a: exB.def |
| @end example |
| @noindent |
| and then installing package @pkg{packA} will make and use the import |
| library for @file{exB.dll}. (One way to prepare the exports file is to |
| use @file{pexports.exe}.) |
| |
| @item loading @file{packA.dll} which depends on @file{exB.dll}. |
| |
| If @code{exB.dll} was used by package @pkg{packB} (because it is in fact |
| @file{packB.dll} or @file{packB.dll} depends on it) and @pkg{packB} has |
| been loaded before @pkg{packA}, then nothing more needs to be done as |
| @file{exB.dll} will already be loaded into the @R{} executable. (This |
| is the most common scenario.) |
| |
| More generally, we can use the @code{DLLpath} argument to |
| @code{library.dynam} to ensure that @code{exB.dll} is found, for example |
| by setting |
| |
| @example |
| library.dynam("packA", pkg, lib, |
| DLLpath = system.file("libs", package="packB")) |
| @end example |
| |
| Note that @code{DLLpath} can only set one path, and so for linking to |
| two or more packages you would need to resort to setting environment |
| variable @env{PATH}. |
| |
| @end itemize |
| |
| @node Handling R objects in C, Interface functions .Call and .External, Linking to other packages, System and foreign language interfaces |
| @section Handling R objects in C |
| @cindex Handling R objects in C |
| |
| Using C code to speed up the execution of an @R{} function is often very |
| fruitful. Traditionally this has been done @emph{via} the @code{.C} |
| function in @R{}. However, if a user wants to write C code using |
| internal @R{} data structures, then that can be done using the |
| @code{.Call} and @code{.External} functions. The syntax for the calling |
| function in @R{} in each case is similar to that of @code{.C}, but the |
| two functions have different C interfaces. Generally the @code{.Call} |
| interface is simpler to use, but @code{.External} is a little more |
| general. |
| @findex .Call |
| @findex .External |
| |
| A call to @code{.Call} is very similar to @code{.C}, for example |
| |
| @example |
| .Call("convolve2", a, b) |
| @end example |
| |
| @noindent |
| The first argument should be a character string giving a C symbol name |
| of code that has already been loaded into @R{}. Up to 65 @R{} objects |
| can be passed as arguments. The C side of the interface is |
| |
| @example |
| @group |
| #include <R.h> |
| #include <Rinternals.h> |
| |
| SEXP convolve2(SEXP a, SEXP b) |
| ... |
| @end group |
| @end example |
| |
| A call to @code{.External} is almost identical |
| |
| @example |
| .External("convolveE", a, b) |
| @end example |
| |
| @noindent |
| but the C side of the interface is different, having only one argument |
| |
| @example |
| @group |
| #include <R.h> |
| #include <Rinternals.h> |
| |
| SEXP convolveE(SEXP args) |
| ... |
| @end group |
| @end example |
| |
| @noindent |
| Here @code{args} is a @code{LISTSXP}, a Lisp-style pairlist from which |
| the arguments can be extracted. |
| |
| In each case the @R{} objects are available for manipulation @emph{via} |
| a set of functions and macros defined in the header file |
| @file{Rinternals.h} or some @Sl{}-compatibility macros.@footnote{That is, |
| similar to those defined in @Sl{} version 4 from the 1990s: these are |
| not kept up to date and are not recommended for new projects.} See |
| @ref{Interface functions .Call and .External} for details on |
| @code{.Call} and @code{.External}. |
| |
| Before you decide to use @code{.Call} or @code{.External}, you should |
| look at other alternatives. First, consider working in interpreted @R{} |
| code; if this is fast enough, this is normally the best option. You |
| should also see if using @code{.C} is enough. If the task to be |
| performed in C is simple enough, involving only atomic vectors and |
| requiring no calls to @R{}, @code{.C} suffices. A great deal of useful |
| code was written using just @code{.C} before @code{.Call} and |
| @code{.External} were available. These interfaces allow much more |
| control, but they also impose much greater responsibilities so need to |
| be used with care. Neither @code{.Call} nor @code{.External} copies its |
| arguments: you should treat arguments you receive through these |
| interfaces as read-only. |
| |
| To handle @R{} objects from within C code we use the macros and functions |
| that have been used to implement the core parts of @R{}. A |
| public@footnote{ @pxref{The R API}: note that these are not all part of |
| the API.} subset of these is defined in the header file |
| @file{Rinternals.h} in the directory @file{@var{R_INCLUDE_DIR}} (default |
| @file{@var{R_HOME}/include}) that should be available on any @R{} |
| installation. |
| |
| A substantial amount of @R{}, including the standard packages, is |
| implemented using the functions and macros described here, so the @R{} |
| source code provides a rich source of examples and ``how to do it'': do |
| make use of the source code for inspirational examples. |
| |
| It is necessary to know something about how @R{} objects are handled in |
| C code. All the @R{} objects you will deal with will be handled with |
| the type @dfn{SEXP}@footnote{SEXP is an acronym for @emph{S}imple |
| @emph{EXP}ression, common in LISP-like language syntaxes.}, which is a |
| pointer to a structure with typedef @code{SEXPREC}. Think of this |
| structure as a @emph{variant type} that can handle all the usual types |
| of @R{} objects, that is vectors of various modes, functions, |
| environments, language objects and so on. The details are given later |
| in this section and in @ref{R Internal Structures, , R Internal |
| Structures, R-ints, R Internals}, but for most |
| purposes the programmer does not need to know them. Think rather of a |
| model such as that used by Visual Basic, in which @R{} objects are |
| handed around in C code (as they are in interpreted @R{} code) as the |
| variant type, and the appropriate part is extracted for, for example, |
| numerical calculations, only when it is needed. As in interpreted @R{} |
| code, much use is made of coercion to force the variant object to the |
| right type. |
| |
| @menu |
| * Garbage Collection:: |
| * Allocating storage:: |
| * Details of R types:: |
| * Attributes:: |
| * Classes:: |
| * Handling lists:: |
| * Handling character data:: |
| * Finding and setting variables:: |
| * Some convenience functions:: |
| * Named objects and copying:: |
| @end menu |
| |
| @node Garbage Collection, Allocating storage, Handling R objects in C, Handling R objects in C |
| @subsection Handling the effects of garbage collection |
| @cindex Garbage collection |
| |
| @findex PROTECT |
| @findex UNPROTECT |
| |
| We need to know a little about the way @R{} handles memory allocation. |
| The memory allocated for @R{} objects is not freed by the user; instead, |
| the memory is from time to time @dfn{garbage collected}. That is, some |
| or all of the allocated memory not being used is freed or marked as |
| re-usable. |
| |
| The @R{} object types are represented by a C structure defined by a |
| typedef @code{SEXPREC} in @file{Rinternals.h}. It contains several |
| things among which are pointers to data blocks and to other |
| @code{SEXPREC}s. A @code{SEXP} is simply a pointer to a @code{SEXPREC}. |
| |
| If you create an @R{} object in your C code, you must tell @R{} that you |
| are using the object by using the @code{PROTECT} macro on a pointer to |
| the object. This tells @R{} that the object is in use so it is not |
| destroyed during garbage collection. Notice that it is the object which |
| is protected, not the pointer variable. It is a common mistake to |
| believe that if you invoked @code{PROTECT(@var{p})} at some point then |
| @var{p} is protected from then on, but that is not true once a new |
| object is assigned to @var{p}. |
| |
| Protecting an @R{} object automatically protects all the @R{} objects |
| pointed to in the corresponding @code{SEXPREC}; for example, all elements |
| of a protected list are automatically protected. |
| |
| The programmer is solely responsible for housekeeping the calls to |
| @code{PROTECT}. There is a corresponding macro @code{UNPROTECT} that |
| takes as argument an @code{int} giving the number of objects to |
| unprotect when they are no longer needed. The protection mechanism is |
| stack-based, so @code{UNPROTECT(@var{n})} unprotects the last @var{n} |
| objects which were protected. The calls to @code{PROTECT} and |
| @code{UNPROTECT} must balance when the user's code returns. @R{} will |
| warn about @code{"stack imbalance in .Call"} (or @code{.External}) if |
| the housekeeping is wrong. |
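The stack discipline described above can be illustrated with a toy model in plain C. This is a minimal sketch under stated assumptions, not @R{}'s implementation: the type @code{toy_obj} and the functions @code{toy_protect}, @code{toy_unprotect} and @code{toy_is_protected} are hypothetical stand-ins for @code{PROTECT}, @code{UNPROTECT} and the protection stack. Because the toy stack records the object's address, reassigning the variable that pointed to it does not change what is protected.

```c
#include <stddef.h>

/* Toy model of a stack-based protection mechanism (hypothetical, not
   R's implementation).  The stack stores object addresses, so it is the
   object that is protected, not the variable that points to it. */
#define TOY_STACK_SIZE 8

typedef struct { int id; } toy_obj;

static const toy_obj *toy_stack[TOY_STACK_SIZE];
static int toy_top = 0;

/* Analogue of PROTECT(): record the object and return it, so the call
   can wrap an allocation.  Returns NULL when the stack is full, where R
   would signal an error instead. */
const toy_obj *toy_protect(const toy_obj *o)
{
    if (toy_top >= TOY_STACK_SIZE) return NULL;
    toy_stack[toy_top++] = o;
    return o;
}

/* Analogue of UNPROTECT(n): drop the n most recently protected entries
   (clamped at zero in this toy). */
void toy_unprotect(int n)
{
    toy_top = (n < toy_top) ? toy_top - n : 0;
}

/* Return 1 if the object is currently on the stack, 0 otherwise. */
int toy_is_protected(const toy_obj *o)
{
    for (int i = 0; i < toy_top; i++)
        if (toy_stack[i] == o) return 1;
    return 0;
}
```

For example, after @code{toy_protect(&a); toy_protect(&b);} a call @code{toy_unprotect(1)} releases @code{b}, the most recently protected object, while @code{a} stays protected; a second @code{toy_unprotect(1)} leaves the stack balanced.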
| |
| Here is a small example of creating an @R{} numeric vector in C code: |
| |
| @example |
| @group |
| #include <R.h> |
| #include <Rinternals.h> |
| |
| SEXP ab; |
| .... |
| ab = PROTECT(allocVector(REALSXP, 2)); |
| REAL(ab)[0] = 123.45; |
| REAL(ab)[1] = 67.89; |
| UNPROTECT(1); |
| @end group |
| @end example |
| |
| Now, the reader may ask how the @R{} object could possibly get removed |
| during those manipulations, as it is just our C code that is running. |
| As it happens, we can do without the protection in this example, but in |
| general we do not know (nor want to know) what is hiding behind the @R{} |
| macros and functions we use, and any of them might cause memory to be |
| allocated, hence garbage collection and hence our object @code{ab} to be |
| removed. It is usually wise to err on the side of caution and assume |
| that any of the @R{} macros and functions might remove the object. |
| |
| In some cases it is necessary to keep better track of whether protection |
| is really needed. Be particularly aware of situations where a large |
| number of objects are generated. The pointer protection stack has a |
| fixed size (default 10,000) and can become full. It is not a good idea |
| then to just @code{PROTECT} everything in sight and @code{UNPROTECT} |
| several thousand objects at the end. It will almost invariably be |
| possible to either assign the objects as part of another object (which |
| automatically protects them) or unprotect them immediately after use. |
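The difference between the two strategies can be seen in a toy model that only tracks stack depth. It is a hypothetical sketch, not @R{}'s implementation: @code{loop_protect} and @code{loop_unprotect} stand in for @code{PROTECT} and @code{UNPROTECT}, and the capacity of 4 is chosen only so that the overflow is easy to trigger (the default in @R{} is 10,000, as noted above).

```c
/* Toy protection stack that only counts depth (hypothetical, not R's
   implementation), with a tiny capacity so overflow is easy to see. */
#define LOOP_STACK_SIZE 4

static int loop_depth = 0, loop_max_depth = 0;

/* Analogue of PROTECT(): returns 0 on overflow, where R would raise an
   error instead, and records the deepest use of the stack. */
int loop_protect(void)
{
    if (loop_depth >= LOOP_STACK_SIZE) return 0;
    loop_depth++;
    if (loop_depth > loop_max_depth) loop_max_depth = loop_depth;
    return 1;
}

/* Analogue of UNPROTECT(n), clamped at zero in this toy. */
void loop_unprotect(int n)
{
    loop_depth = (n < loop_depth) ? loop_depth - n : 0;
}

/* Handle n temporaries, unprotecting each one immediately after use.
   Returns 1 if every protect call succeeded. */
int process_unprotect_each(int n)
{
    for (int k = 0; k < n; k++) {
        if (!loop_protect()) return 0;
        /* ... use the temporary object here ... */
        loop_unprotect(1);
    }
    return 1;
}

/* Handle n temporaries by protecting all of them and unprotecting at
   the end; fails once n exceeds the capacity. */
int process_unprotect_at_end(int n)
{
    int done = 0, ok = 1;
    for (; done < n; done++)
        if (!loop_protect()) { ok = 0; break; }
    loop_unprotect(done);   /* release everything actually protected */
    return ok;
}
```

With 100 temporaries, unprotecting each one immediately after use never needs more than one stack slot, whereas protecting them all and unprotecting at the end runs out of space after the fourth.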
| |
| Protection is not needed for objects which @R{} already knows are in |
| use. In particular, this applies to function arguments. |
| |
| There is a less-used macro @code{UNPROTECT_PTR(@var{s})} that unprotects the |
| object pointed to by the @code{SEXP} @var{s}, even if it is not the top item |
| on the pointer protection stack. This macro was introduced for use in the |
| parser, where the code interfacing with the @R{} heap is generated and the |
| generator cannot be configured to insert proper calls to @code{PROTECT} and |
| @code{UNPROTECT}. However, @code{UNPROTECT_PTR} is dangerous to use in |
| combination with @code{UNPROTECT} when the same object has been protected |
| multiple times. It has been superseded by multi-set based functions |
| @code{R_PreserveInMSet} and @code{R_ReleaseFromMSet}, which protect objects |
| in a multi-set created by @code{R_NewPreciousMSet} and typically itself |
| protected using @code{PROTECT}. These functions should not be needed |
| outside parsers. |
| @findex UNPROTECT_PTR |
| @findex R_PreserveInMSet |
| @findex R_ReleaseFromMSet |
| @findex R_NewPreciousMSet |
| |
| Sometimes an object is changed (for example duplicated, coerced or |
| grown) yet the current value needs to be protected. For these cases |
| @code{PROTECT_WITH_INDEX} saves an index of the protection location that |
| can be used to replace the protected value using @code{REPROTECT}. |
| @findex PROTECT_WITH_INDEX |
| @findex REPROTECT |
| For example (from the internal code for @code{optim}) |
| |
| @example |
| PROTECT_INDEX ipx; |
| |
| .... |
| PROTECT_WITH_INDEX(s = eval(OS->R_fcall, OS->R_env), &ipx); |
| REPROTECT(s = coerceVector(s, REALSXP), ipx); |
| @end example |
| |
| Note that it is also dangerous to mix @code{UNPROTECT_PTR} with |
| @code{PROTECT_WITH_INDEX}, as the former changes the protection |
| locations of objects that were protected after the one being |
| unprotected. |
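The problem can be reproduced with a toy protection stack of integer object ids. This is a hypothetical sketch, not @R{}'s code: @code{ix_protect_with_index}, @code{ix_reprotect} and @code{ix_unprotect_ptr} mimic @code{PROTECT_WITH_INDEX}, @code{REPROTECT} and @code{UNPROTECT_PTR}, with the last one assumed (as described above) to close the gap it leaves by moving later entries down one slot.

```c
/* Toy protection stack of integer object ids (hypothetical, not R's
   implementation), used to show why UNPROTECT_PTR and
   PROTECT_WITH_INDEX do not mix. */
#define IX_STACK_SIZE 8

static int ix_stack[IX_STACK_SIZE];
static int ix_top = 0;

/* Analogue of PROTECT_WITH_INDEX: push obj and report the slot used
   (-1 if the toy stack is full). */
void ix_protect_with_index(int obj, int *index)
{
    if (ix_top >= IX_STACK_SIZE) { *index = -1; return; }
    *index = ix_top;
    ix_stack[ix_top++] = obj;
}

/* Analogue of REPROTECT: overwrite the slot recorded earlier.  Nothing
   checks that the slot still lies below the top of the stack. */
void ix_reprotect(int obj, int index)
{
    if (index >= 0 && index < IX_STACK_SIZE) ix_stack[index] = obj;
}

/* Analogue of UNPROTECT_PTR: remove the topmost occurrence of obj and
   move every later entry down one slot. */
void ix_unprotect_ptr(int obj)
{
    for (int i = ix_top - 1; i >= 0; i--) {
        if (ix_stack[i] == obj) {
            for (int j = i; j < ix_top - 1; j++)
                ix_stack[j] = ix_stack[j + 1];
            ix_top--;
            return;
        }
    }
}

/* Return 1 if obj is among the live entries, 0 otherwise. */
int ix_is_protected(int obj)
{
    for (int i = 0; i < ix_top; i++)
        if (ix_stack[i] == obj) return 1;
    return 0;
}
```

If objects 10, 20 and 30 are protected with indices 0, 1 and 2 and object 10 is then removed with @code{ix_unprotect_ptr}, a later @code{ix_reprotect(35, 2)} writes to a slot above the top of the stack: 35 is left unprotected while the stale 30 remains protected.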
| |
| @findex R_PreserveObject |
| @findex R_ReleaseObject |
| There is another way to avoid the effects of garbage collection: a call |
| to @code{R_PreserveObject} adds an object to an internal list of objects |
| not to be collected, and a subsequent call to @code{R_ReleaseObject} |
| removes it from that list. This provides a way for objects which are |
| not returned as part of @R{} objects to be protected across calls to |
| compiled code: on the other hand it becomes the user's responsibility to |
| release them when they are no longer needed (and this often requires the |
| use of a finalizer). It is less efficient than the normal protection |
| mechanism, and should be used sparingly. |
| |
| @node Allocating storage, Details of R types, Garbage Collection, Handling R objects in C |
| @subsection Allocating storage |
| @cindex Allocating storage |
| |
| For many purposes it is sufficient to allocate @R{} objects and |
| manipulate those. There are quite a few @code{alloc@var{Xxx}} functions |
| defined in @file{Rinternals.h}---you may want to explore them. |
| |
| @findex allocVector |
| One that is commonly used is @code{allocVector}, the C-level equivalent |
| of @R{}-level @code{vector()} and its wrappers such as @code{integer()} |
| and @code{character()}. One distinction is that whereas the @R{} |
| functions always initialize the elements of the vector, |
| @code{allocVector} only does so for lists, expressions and character |
| vectors (the cases where the elements are themselves @R{} objects). |
| |
| If storage is required for C objects during the calculations, this is |
| best allocated by calling @code{R_alloc}; @pxref{Memory allocation}. |
| All of these memory allocation routines do their own error-checking, so |
| the programmer may assume that they will raise an error and not return |
| if the memory cannot be allocated. |
| |
| @node Details of R types, Attributes, Allocating storage, Handling R objects in C |
| @subsection Details of R types |
| @cindex Details of R types |
| |
| Users of the @file{Rinternals.h} macros will need to know how the @R{} |
| types are known internally. The different @R{} data types are |
| represented in C by @dfn{SEXPTYPE}. Some of these are familiar from |
| @R{} and some are internal data types. The usual @R{} object modes are |
| given in the table. |
| |
| @quotation |
| @multitable {SEXPTYPE} {numeric with storage mode integer integer} |
| @headitem SEXPTYPE @tab @R{} equivalent |
| @item @code{REALSXP} @tab numeric with storage mode @code{double} |
| @item @code{INTSXP} @tab integer |
| @item @code{CPLXSXP} @tab complex |
| @item @code{LGLSXP} @tab logical |
| @item @code{STRSXP} @tab character |
| @item @code{VECSXP} @tab list (generic vector) |
| @item @code{LISTSXP} @tab pairlist |
| @item @code{DOTSXP} @tab a @samp{@dots{}} object |
| @item @code{NILSXP} @tab NULL |
| @item @code{SYMSXP} @tab name/symbol |
| @item @code{CLOSXP} @tab function or function closure |
| @item @code{ENVSXP} @tab environment |
| @end multitable |
| @end quotation |
| |
| @noindent |
| Among the important internal @code{SEXPTYPE}s are @code{LANGSXP}, |
| @code{CHARSXP}, @code{PROMSXP}, etc. (@strong{N.B.}: although it is |
| possible to return objects of internal types, it is unsafe to do so as |
| assumptions are made about how they are handled which may be violated at |
| user-level evaluation.) More details are given in @ref{R Internal |
| Structures, , R Internal Structures, R-ints, R Internals}. |
| |
| Unless you are very sure about the type of the arguments, the code |
| should check the data types. Sometimes it may also be necessary to |
| check data types of objects created by evaluating an @R{} expression in |
| the C code. You can use functions like @code{isReal}, @code{isInteger} |
| and @code{isString} to do type checking. See the header file |
| @file{Rinternals.h} for definitions of other such functions. All of |
| these take a @code{SEXP} as argument and return 1 or 0 to indicate |
| @code{TRUE} or @code{FALSE}. |
| |
| What happens if the @code{SEXP} is not of the correct type? Sometimes |
| you have no other option except to generate an error. You can use the |
| function @code{error} for this. It is usually better to coerce the |
| object to the correct type. For example, if you find that a |
| @code{SEXP} is of type @code{INTSXP} but you need a @code{REALSXP} |
| object, you can change the type by using |
| |
| @example |
| @var{newSexp} = PROTECT(coerceVector(@var{oldSexp}, REALSXP)); |
| @end example |
| |
| @noindent |
| Protection is needed as a new @emph{object} is created; the object |
| formerly pointed to by the @code{SEXP} is still protected but now |
| unused.@footnote{If no coercion was required, @code{coerceVector} would |
| have passed the old object through unchanged.} |
| |
| All the coercion functions do their own error-checking, and generate |
| @code{NA}s with a warning or stop with an error as appropriate. |
| |
| Note that these coercion functions are @emph{not} the same as calling |
| @code{as.numeric} (and so on) in @R{} code, as they do not dispatch on |
| the class of the object. Thus it is normally preferable to do the |
| coercion in the calling @R{} code. |
| |
| So far we have only seen how to create and coerce @R{} objects from C |
| code, and how to extract the numeric data from numeric @R{} vectors. |
| These can suffice to take us a long way in interfacing @R{} objects to |
| numerical algorithms, but we may need to know a little more to create |
| useful return objects. |
| |
| @node Attributes, Classes, Details of R types, Handling R objects in C |
| @subsection Attributes |
| @cindex Attributes |
| |
| Many @R{} objects have attributes: some of the most useful are classes |
| and the @code{dim} and @code{dimnames} that mark objects as matrices or |
| arrays. It can also be helpful to work with the @code{names} attribute |
| of vectors. |
| |
| To illustrate this, let us write code to take the outer product of two |
| vectors (which @code{outer} and @code{%o%} already do). As usual the |
| @R{} code is simple |
| |
| @example |
| out <- function(x, y) |
| @{ |
| storage.mode(x) <- storage.mode(y) <- "double" |
| .Call("out", x, y) |
| @} |
| @end example |
| |
| @noindent |
| where we expect @code{x} and @code{y} to be numeric vectors (possibly |
| integer), possibly with names. This time we do the coercion in the |
| calling @R{} code. |
| |
| C code to do the computations is |
| |
| @example |
| @group |
| #include <R.h> |
| #include <Rinternals.h> |
| |
| SEXP out(SEXP x, SEXP y) |
| @{ |
| int nx = length(x), ny = length(y); |
| SEXP ans = PROTECT(allocMatrix(REALSXP, nx, ny)); |
| double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans); |
| for(int i = 0; i < nx; i++) @{ |
| double tmp = rx[i]; |
| for(int j = 0; j < ny; j++) |
| rans[i + nx*j] = tmp * ry[j]; |
| @} |
| UNPROTECT(1); |
| return ans; |
| @} |
| @end group |
| @end example |
| |
| @noindent |
| Note the way @code{REAL} is used: as it is a function call, it can be |
| considerably faster to call it once, store the resulting pointer and index that. |
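@R{} stores matrices in column-major order, so element [i, j] (counting from zero) of an @code{nx} by @code{ny} matrix is at offset @code{i + nx*j}, which is why the loop in @code{out} writes to @code{rans[i + nx*j]}. The indexing can be checked without any @R{} headers in a sketch such as the following, where the function @code{outer_colmajor} is hypothetical and takes plain C arrays in place of the @code{SEXP}s.

```c
/* Plain-C version of the loop in out(): ans must have room for
   nx * ny doubles, which are filled in column-major order. */
void outer_colmajor(const double *x, int nx, const double *y, int ny,
                    double *ans)
{
    for (int i = 0; i < nx; i++) {
        double tmp = x[i];                /* reused across the inner loop */
        for (int j = 0; j < ny; j++)
            ans[i + nx * j] = tmp * y[j]; /* element [i, j] */
    }
}
```

For @code{x = c(1, 2)} and @code{y = c(10, 20, 30)} it fills @code{ans} with 10, 20, 20, 40, 30, 60, which is the column-major layout of @code{outer(x, y)}.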
| |
| However, we would like to set the @code{dimnames} of the result. We can use |
| |
| @example |
| #include <R.h> |
| #include <Rinternals.h> |
| |
| @group |
| SEXP out(SEXP x, SEXP y) |
| @{ |
| int nx = length(x), ny = length(y); |
| SEXP ans = PROTECT(allocMatrix(REALSXP, nx, ny)); |
| double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans); |
| |
| for(int i = 0; i < nx; i++) @{ |
| double tmp = rx[i]; |
| for(int j = 0; j < ny; j++) |
| rans[i + nx*j] = tmp * ry[j]; |
| @} |
| |
| SEXP dimnames = PROTECT(allocVector(VECSXP, 2)); |
| SET_VECTOR_ELT(dimnames, 0, getAttrib(x, R_NamesSymbol)); |
| SET_VECTOR_ELT(dimnames, 1, getAttrib(y, R_NamesSymbol)); |
| setAttrib(ans, R_DimNamesSymbol, dimnames); |
| @end group |
| |
| @group |
| UNPROTECT(2); |
| return ans; |
| @} |
| @end group |
| @end example |
| |
| This example introduces several new features. The @code{getAttrib} and |
| @code{setAttrib} |
| @findex getAttrib |
| @findex setAttrib |
| functions get and set individual attributes. Their second argument is a |
| @code{SEXP} defining the name in the symbol table of the attribute we |
| want; these and many such symbols are defined in the header file |
| @file{Rinternals.h}. |
| |
| There are shortcuts here too: the functions @code{namesgets}, |
| @code{dimgets} and @code{dimnamesgets} are the internal versions of the |
| default methods of @code{names<-}, @code{dim<-} and @code{dimnames<-} |
| (for vectors and arrays), and there are functions such as |
| @code{GetMatrixDimnames} and @code{GetArrayDimnames}. |
| |
| What happens if we want to add an attribute that is not pre-defined? We |
| need to add a symbol for it @emph{via} a call to |
| @findex install |
| @code{install}. Suppose for illustration we wanted to add an attribute |
| @code{"version"} with value @code{3.0}. We could use |
| |
| @example |
| @group |
| SEXP version; |
| version = PROTECT(allocVector(REALSXP, 1)); |
| REAL(version)[0] = 3.0; |
| setAttrib(ans, install("version"), version); |
| UNPROTECT(1); |
| @end group |
| @end example |
| |
| Using @code{install} when it is not needed is harmless and provides a |
| simple way to retrieve the symbol from the symbol table if it is already |
| installed. However, the lookup takes a non-trivial amount of time, so |
| consider code such as |
| |
| @example |
| static SEXP VerSymbol = NULL; |
| ... |
| if (VerSymbol == NULL) VerSymbol = install("version"); |
| @end example |
| |
| @noindent |
| if it is to be done frequently. |
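| |
| The caching idiom can be sketched in plain C. Here the hypothetical |
| @code{slow_lookup} stands in for @code{install} and counts how often it |
| runs; because the cached pointer is @code{static}, the lookup happens |
| only on the first call, however often @code{get_version_symbol} is used |
| afterwards. |
| |
| @example |
| @group |
| /* Hypothetical stand-in for install(): counts how often it is called. */ |
| static int lookups = 0; |
| static const char *slow_lookup(const char *name) |
| @{ |
|     lookups++; |
|     return name;              /* a real lookup would search a table */ |
| @} |
| |
| const char *get_version_symbol(void) |
| @{ |
|     static const char *VerSymbol = NULL;   /* cached across calls */ |
|     if (VerSymbol == NULL) VerSymbol = slow_lookup("version"); |
|     return VerSymbol; |
| @} |
| @end group |
| @end example |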
| |
| This example can be simplified by another convenience function: |
| |
| @example |
| @group |
| SEXP version = PROTECT(ScalarReal(3.0)); |
| setAttrib(ans, install("version"), version); |
| UNPROTECT(1); |
| @end group |
| @end example |
| |
| |
| @node Classes, Handling lists, Attributes, Handling R objects in C |
| @subsection Classes |
| @cindex Classes |
| |
| In @R{} the class is just the attribute named @code{"class"} so it can |
| be handled as such, but there is a shortcut @code{classgets}. Suppose |
| we want to give the return value in our example the class @code{"mat"}. |
| We can use |
| |
| @example |
| @group |
| #include <R.h> |
| #include <Rinternals.h> |
| .... |
| SEXP ans, dim, dimnames, class; |
| .... |
| class = PROTECT(allocVector(STRSXP, 1)); |
| SET_STRING_ELT(class, 0, mkChar("mat")); |
| classgets(ans, class); |
| UNPROTECT(4); |
| return ans; |
| @} |
| @end group |
| @end example |
| |
| @noindent |
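| As the value is a character vector, we have to know how to create one |
| from a C character array: @code{mkChar} creates the @code{CHARSXP} |
| element from the C string, and @code{SET_STRING_ELT} stores it in the |
| vector. (The single call @code{mkString("mat")} would do both steps.) |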
| |
| @node Handling lists, Handling character data, Classes, Handling R objects in C |
| @subsection Handling lists |
| @cindex Handling lists |
| |
| Some care is needed with lists, as @R{} moved early on from using |
| LISP-like lists (now called ``pairlists'') to S-like generic vectors. |
| As a result, the appropriate test for an object of mode @code{list} is |
| @code{isNewList}, and we need @code{allocVector(VECSXP, @var{n})} and |
| @emph{not} @code{allocList(@var{n})}. |
| |
| List elements can be retrieved or set by direct access to the elements |
| of the generic vector. Suppose we have a list object |
| |
| @example |
| a <- list(f = 1, g = 2, h = 3) |
| @end example |
| |
| @noindent |
| Then we can access @code{a$g} as @code{a[[2]]} by |
| |
| @example |
| @group |
| double g; |
| .... |
| g = REAL(VECTOR_ELT(a, 1))[0]; |
| @end group |
| @end example |
| |
| This can rapidly become tedious, and the following function (based on |
| one in package @pkg{stats}) is very useful: |
| |
| @example |
| @group |
| #include <string.h>  /* for strcmp */ |
| |
| /* get the list element named str, or return R_NilValue (R's NULL) */ |
| |
| SEXP getListElement(SEXP list, const char *str) |
| @{ |
|     SEXP elmt = R_NilValue, names = getAttrib(list, R_NamesSymbol); |
|     if (isNull(names)) return elmt;  /* unnamed list: no match possible */ |
| @end group |
| |
| @group |
|     for (int i = 0; i < length(list); i++) |
|         if(strcmp(CHAR(STRING_ELT(names, i)), str) == 0) @{ |
|             elmt = VECTOR_ELT(list, i); |
|             break; |
|         @} |
|     return elmt; |
| @} |
| @end group |
| @end example |
| |
| @noindent |
| and enables us to say |
| |
| @example |
| @group |
| double g; |
| g = REAL(getListElement(a, "g"))[0]; |
| @end group |
| @end example |
| |
| @node Handling character data, Finding and setting variables, Handling lists, Handling R objects in C |
| @subsection Handling character data |
| @cindex handling character data |
| |
| @R{} character vectors are stored as @code{STRSXP}s, a vector type like |
| @code{VECSXP} where every element is of type @code{CHARSXP}. The |
| @code{CHARSXP} elements of @code{STRSXP}s are accessed using |
| @code{STRING_ELT} and @code{SET_STRING_ELT}. |
| |
| @code{CHARSXP}s are read-only objects and must never be modified. In |
| particular, the C-style string contained in a @code{CHARSXP} should be |
| treated as read-only and for this reason the @code{CHAR} function used |
| to access the character data of a @code{CHARSXP} returns @code{(const |
| char *)} (this also allows compilers to issue warnings about improper |
| use). Since @code{CHARSXP}s are immutable, the same @code{CHARSXP} can |
| be shared by any @code{STRSXP} needing an element representing the same |
| string. @R{} maintains a global cache of @code{CHARSXP}s so that there |
| is only ever one @code{CHARSXP} representing a given string in memory. |
| |
| @findex mkChar |
| @findex mkCharLen |
| You can obtain a @code{CHARSXP} by calling @code{mkChar} and providing a |
| nul-terminated C-style string. This function will return a pre-existing |
| @code{CHARSXP} if one with a matching string already exists, otherwise |
| it will create a new one and add it to the cache before returning it to |
| you. The variant @code{mkCharLen(@var{buf}, @var{len})} can be used to |
| create a @code{CHARSXP} from the first @var{len} bytes of a buffer and |
| will ensure nul-termination. |
| |
| Note that @R{} character strings are restricted to @code{2^31 - 1} |
| bytes, so the input to @code{mkChar} must be no longer than that |
| (although C allows longer strings on 64-bit platforms). |
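| |
| The two points above (copying part of a buffer into a nul-terminated |
| string, and the @code{2^31 - 1} byte limit) can be sketched in plain C. |
| The function @code{copy_prefix} is hypothetical and not part of the @R{} |
| @acronym{API}; it only shows the checks one might make before handing a |
| string to @code{mkChar} or to @code{mkCharLen}, whose length argument |
| is an @code{int}. |
| |
| @example |
| @group |
| #include <limits.h> |
| #include <stdlib.h> |
| #include <string.h> |
| |
| /* Hypothetical helper: return a malloc-ed, nul-terminated copy of the |
|    first len bytes of buf, or NULL if len exceeds INT_MAX (2^31 - 1 |
|    where int has 32 bits) or if allocation fails. */ |
| char *copy_prefix(const char *buf, size_t len) |
| @{ |
|     if (len > (size_t) INT_MAX) return NULL; |
|     char *s = malloc(len + 1); |
|     if (s == NULL) return NULL; |
|     memcpy(s, buf, len); |
|     s[len] = '\0';                  /* ensure nul-termination */ |
|     return s; |
| @} |
| @end group |
| @end example |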
| |
| @node Finding and setting variables, Some convenience functions, Handling character data, Handling R objects in C |
| @subsection Finding and setting variables |
| @cindex Finding variables |
| @cindex Setting variables |
| |
| Usually all the @R{} objects needed in our C computations will be |
| passed as arguments to @code{.Call} or @code{.External}, but it is |
| possible to find the values of @R{} objects from within C code given |
| their names. The following code is the equivalent of @code{get(name, |
| envir = rho)}. |
| |
| @example |
| @group |
| SEXP getvar(SEXP name, SEXP rho) |
| @{ |
| SEXP ans; |
| |
| if(!isString(name) || length(name) != 1) |
| error("name is not a single string"); |
| if(!isEnvironment(rho)) |
| error("rho should be an environment"); |
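| ans = findVar(installChar(STRING_ELT(name, 0)), rho); |
| if(ans == R_UnboundValue) |
| error("object not found"); |
| if(!isReal(ans) || length(ans) == 0) |
| error("object is not a non-empty double vector"); |
| Rprintf("first value is %f\n", REAL(ans)[0]); |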
| return R_NilValue; |
| @} |
| @end group |
| @end example |
| |
| The main work is done by |
| @findex findVar |
| @code{findVar}, but to use it we need to install @code{name} as a name |
| in the symbol table. As we wanted the value for internal use, we return |
| @code{NULL}. |
| |
| Similar functions with syntax |
| |
| @example |
| @group |
| void defineVar(SEXP symbol, SEXP value, SEXP rho) |
| void setVar(SEXP symbol, SEXP value, SEXP rho) |
| @end group |
| @end example |
| @findex defineVar |
| @findex setVar |
| |
| @noindent |
| can be used to assign values to @R{} variables. @code{defineVar} |
| creates a new binding or changes the value of an existing binding in the |
| specified environment frame; it is the analogue of @code{assign(symbol, |
| value, envir = rho, inherits = FALSE)}, but unlike @code{assign}, |
| @code{defineVar} does not make a copy of the object |
| @code{value}.@footnote{You can assign a @emph{copy} of the object in the |
| environment frame @code{rho} using @code{defineVar(symbol, |
| duplicate(value), rho)}.} @code{setVar} searches for an existing |
| binding for @code{symbol} in @code{rho} or its enclosing environments. |
| If a binding is found, its value is changed to @code{value}. Otherwise, |
| a new binding with the specified value is created in the global |
| environment. This corresponds to @code{assign(symbol, value, envir = |
| rho, inherits = TRUE)}. |
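| |
| The difference between @code{defineVar} and @code{setVar} is where the |
| binding ends up. The following self-contained C sketch models it with a |
| toy environment type; @code{Env}, @code{toy_define} and @code{toy_set} |
| are hypothetical names, not part of the @R{} @acronym{API}, and each toy |
| frame holds at most @code{MAXB} bindings. |
| |
| @example |
| @group |
| #include <string.h> |
| |
| #define MAXB 8 |
| typedef struct Env @{ |
|     struct Env *enclos;         /* enclosing environment, NULL at top */ |
|     const char *names[MAXB]; |
|     double values[MAXB]; |
|     int n; |
| @} Env; |
| |
| /* Look for name in this frame only; return its index or -1. */ |
| static int frame_index(const Env *e, const char *name) |
| @{ |
|     for (int i = 0; i < e->n; i++) |
|         if (strcmp(e->names[i], name) == 0) return i; |
|     return -1; |
| @} |
| @end group |
| |
| @group |
| /* Like defineVar: create or change the binding in rho's own frame. */ |
| void toy_define(const char *name, double value, Env *rho) |
| @{ |
|     int i = frame_index(rho, name); |
|     if (i < 0) @{ |
|         if (rho->n == MAXB) return;  /* toy limit: frame full, ignored */ |
|         i = rho->n++; |
|         rho->names[i] = name; |
|     @} |
|     rho->values[i] = value; |
| @} |
| |
| /* Like setVar: change the first binding found in rho or its |
|    enclosures; if there is none, create the binding in global. */ |
| void toy_set(const char *name, double value, Env *rho, Env *global) |
| @{ |
|     for (Env *e = rho; e != NULL; e = e->enclos) @{ |
|         int i = frame_index(e, name); |
|         if (i >= 0) @{ e->values[i] = value; return; @} |
|     @} |
|     toy_define(name, value, global); |
| @} |
| @end group |
| @end example |
| |
| @noindent |
| Calling @code{toy_define} on a child frame shadows a binding in its |
| parent, whereas @code{toy_set} changes the parent's binding in place. |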
| |
| @node Some convenience functions, Named objects and copying, Finding and setting variables, Handling R objects in C |
| @subsection Some convenience functions |
| |
| Some operations are done so frequently that there are convenience |
| functions to handle them. (All these are provided @emph{via} the header |
| file @file{Rinternals.h}.) |
| |
| Suppose we wanted to pass a single logical argument |
| @code{ignore_quotes}: we could use |
| |
| @example |
| int ign = asLogical(ignore_quotes); |
| if(ign == NA_LOGICAL) error("'ignore_quotes' must be TRUE or FALSE"); |
| @end example |
| |
| @noindent |
| which will do any coercion needed (at least from a vector argument), and |
| return @code{NA_LOGICAL} if the value passed was @code{NA} or coercion |
| failed. There are also @code{asInteger}, @code{asReal} and |
| @code{asComplex}. The function @code{asChar} returns a @code{CHARSXP}. |
| All of these functions ignore any elements of an input vector after the |
| first. |
| |
| To return a length-one real vector we can use |
| |
| @example |
| double x; |
| |
| ... |
| return ScalarReal(x); |
| @end example |
| |
| @noindent |
| and there are versions of this for all the atomic vector types (those for |
| a length-one character vector being @code{ScalarString} with argument a |
| @code{CHARSXP} and @code{mkString} with argument @code{const char *}). |
| |
| Some of the @code{is@var{XXXX}} functions differ from their apparent |
| @R{}-level counterparts: for example @code{isVector} is true for any |
| atomic vector type (@code{isVectorAtomic}) and for lists and expressions |
| (@code{isVectorList}) (with no check on attributes). @code{isMatrix} is |
| a test of a length-2 @code{"dim"} attribute. |
| |
| There is a series of small macros/functions to help construct pairlists |
| and language objects (whose internal structures just differ by |
| @code{SEXPTYPE}). Function @code{CONS(u, v)} is the basic building |
| block: it constructs a pairlist from @code{u} followed by @code{v} |
| (which is a pairlist or @code{R_NilValue}). @code{LCONS} is a variant |
| that constructs a language object. Functions @code{list1} to |
| @code{list6} construct a pairlist from one to six items, and |
| @code{lang1} to @code{lang6} do the same for a language object (a |
| function to call plus zero to five arguments). Functions @code{elt} and |
| @code{lastElt} find the @var{i}th element and the last element of a |
| pairlist, and @code{nthcdr} returns a pointer to the @var{n}th position |
| in the pairlist (whose @code{CAR} is the @var{n}th item). |
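| |
| To make @code{CONS}, @code{CAR}, @code{CDR}, @code{nthcdr} and |
| @code{elt} concrete, here is a toy model of pairlist cells in plain C. |
| The type @code{Cell} and the @code{toy_} functions are hypothetical |
| stand-ins written for this sketch: real pairlists are @code{SEXP}s |
| managed by @R{}'s garbage collector, whereas this sketch never frees its |
| cells. Indices count from 0 here. |
| |
| @example |
| @group |
| #include <stdlib.h> |
| |
| typedef struct Cell @{ |
|     int car;              /* the item (an int, to keep the toy small) */ |
|     struct Cell *cdr;     /* rest of the list; NULL plays R_NilValue */ |
| @} Cell; |
| |
| Cell *toy_cons(int car, Cell *cdr) |
| @{ |
|     Cell *c = malloc(sizeof *c); |
|     if (c == NULL) abort(); |
|     c->car = car; |
|     c->cdr = cdr; |
|     return c; |
| @} |
| @end group |
| |
| @group |
| /* Follow n CDRs; the result's car is item n (counting from 0). */ |
| Cell *toy_nthcdr(Cell *l, int n) |
| @{ |
|     while (n-- > 0 && l != NULL) l = l->cdr; |
|     return l; |
| @} |
| |
| /* Item i; the caller must ensure i is less than the list length. */ |
| int toy_elt(Cell *l, int i) @{ return toy_nthcdr(l, i)->car; @} |
| |
| /* Last item of a non-empty list. */ |
| int toy_last(Cell *l) |
| @{ |
|     while (l->cdr != NULL) l = l->cdr; |
|     return l->car; |
| @} |
| @end group |
| @end example |
| |
| @noindent |
| Skipping the routine name in @code{.External} code with |
| @code{args = CDR(args)} is the same one-step walk as |
| @code{toy_nthcdr(l, 1)}. |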
| |
| Functions @code{str2type} and @code{type2str} map @R{} |
| length-one character strings to and from @code{SEXPTYPE} numbers, and |
| @code{type2char} maps numbers to C character strings. |
| |
| @comment Want to encourage use of some of the more stable and useful R_* |
| @comment and Rf_* functions: |
| @menu |
| * Semi-internal convenience functions:: |
| @end menu |
| |
| @node Semi-internal convenience functions, , Some convenience functions, Some convenience functions |
| @subsubsection Semi-internal convenience functions |
| |
| There is quite a collection of functions that may be used in your C code |
| @emph{if} you are willing to adapt to rare ``API'' changes. |
| These are typically the ``workhorses'' behind their @R{} counterparts. |
| |
| Functions @code{any_duplicated} and @code{any_duplicated3} are fast |
| versions of @R{}'s @code{any(duplicated(.))}. |
| |
| Function @code{R_compute_identical} corresponds to @R{}'s @code{identical} function. |
| |
| |
| @node Named objects and copying, , Some convenience functions, Handling R objects in C |
| @subsection Named objects and copying |
| @findex duplicate |
| @cindex Copying objects |
| |
| When assignments are done in @R{} such as |
| |
| @example |
| x <- 1:10 |
| y <- x |
| @end example |
| |
| @noindent |
| the named object is not necessarily copied, so after those two |
| assignments @code{y} and @code{x} are bound to the same @code{SEXPREC} |
| (the structure a @code{SEXP} points to). This means that any code which |
| alters one of them has to make a copy before modifying the copy if the |
| usual @R{} semantics are to apply. Note that whereas @code{.C} and |
| @code{.Fortran} do copy their arguments (unless the dangerous @code{dup |
| = FALSE} is used), @code{.Call} and @code{.External} do not. So |
| @code{duplicate} is commonly called on arguments to @code{.Call} before |
| modifying them. |
| |
| However, at least some of this copying is unneeded. In the first |
| assignment shown, @code{x <- 1:10}, @R{} first creates an object with |
| value @code{1:10} and then assigns it to @code{x}, but if @code{x} is |
| modified no copy is necessary as the temporary object with value |
| @code{1:10} cannot be referred to again. @R{} distinguishes between |
| named and unnamed objects @emph{via} a field in a @code{SEXPREC} that |
| can be accessed @emph{via} the macros @code{NAMED} and @code{SET_NAMED}. This |
| can take values |
| |
| @table @code |
| @item 0 |
| The object is not bound to any symbol |
| @item 1 |
| The object has been bound to exactly one symbol |
| @item >= 2 |
| The object has potentially been bound to two or more symbols, and one |
| should act as if another variable is currently bound to this value. |
| The maximal value is @code{NAMEDMAX}. |
| @end table |
| |
| @noindent |
| Note the past tenses: @R{} does not do full reference counting and there |
| may currently be fewer bindings. |
| |
| It is safe to modify the value of any @code{SEXP} for which |
| @code{NAMED(foo)} is zero, and if @code{NAMED(foo)} is two or more, the |
| value should be duplicated (@emph{via} a call to @code{duplicate}) |
| before any modification. Note that it is the responsibility of the |
| author of the code making the modification to do the duplication, even |
| if it is @code{x} whose value is being modified after @code{y <- x}. |
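| |
| The rule can be sketched with a toy object in plain C. The type |
| @code{Obj}, its @code{named} field and @code{prepare_to_modify} are |
| hypothetical; they imitate @code{NAMED} and @code{duplicate} only to |
| show the decision being made, and they conservatively copy whenever the |
| count is non-zero, the simple policy the next paragraph says is always |
| permissible. |
| |
| @example |
| @group |
| #include <stdlib.h> |
| #include <string.h> |
| |
| typedef struct @{ |
|     int named;       /* 0, 1 or 2+, imitating NAMED() */ |
|     int n; |
|     double *data; |
| @} Obj; |
| @end group |
| |
| @group |
| /* Return an object that is safe to modify: o itself if nothing else |
|    may refer to it, otherwise a fresh copy (imitating duplicate()). */ |
| Obj *prepare_to_modify(Obj *o) |
| @{ |
|     if (o->named == 0) return o; |
|     Obj *copy = malloc(sizeof *copy); |
|     if (copy == NULL) abort(); |
|     copy->named = 0; |
|     copy->n = o->n; |
|     copy->data = malloc((size_t) o->n * sizeof(double)); |
|     if (copy->data == NULL && o->n > 0) abort(); |
|     if (o->n > 0) |
|         memcpy(copy->data, o->data, (size_t) o->n * sizeof(double)); |
|     return copy; |
| @} |
| @end group |
| @end example |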
| |
| The case @code{NAMED(foo) == 1} allows some optimization, but it can be |
| ignored (and duplication done whenever @code{NAMED(foo) > 0}). (This |
| optimization is not currently usable in user code.) It is intended |
| for use within replacement functions. Suppose we used |
| |
| @example |
| x <- 1:10 |
| foo(x) <- 3 |
| @end example |
| |
| @noindent |
| which is computed as |
| |
| @example |
| x <- 1:10 |
| x <- "foo<-"(x, 3) |
| @end example |
| |
| @noindent |
| Then inside @code{"foo<-"} the object pointing to the current value of |
| @code{x} will have @code{NAMED(foo)} as one, and it would be safe to |
| modify it as the only symbol bound to it is @code{x} and that will be |
| rebound immediately. (Provided the remaining code in @code{"foo<-"} |
| makes no reference to @code{x}, and no one is going to attempt a direct |
| call such as @code{y <- "foo<-"(x)}.) |
| |
| This mechanism is to be replaced in @R{} 4.0.0. To |
| support future changes, package code should use the macros |
| @code{MAYBE_REFERENCED}, @code{MAYBE_SHARED}, and |
| @code{MARK_NOT_MUTABLE}. These currently correspond to |
| |
| @table @code |
| @item MAYBE_REFERENCED(x) |
| @code{NAMED(x) > 0} |
| @item MAYBE_SHARED(x) |
| @code{NAMED(x) > 1} |
| @item MARK_NOT_MUTABLE(x) |
| @code{SET_NAMED(x, NAMEDMAX)} |
| @end table |
| |
| @c commented out as people misread this as general. |
| @c Currently all arguments to a @code{.Call} call will have @code{NAMED} |
| @c set to 2 or higher and so users must assume that they need to be duplicated |
| @c before alteration. |
| |
| @node Interface functions .Call and .External, Evaluating R expressions from C, Handling R objects in C, System and foreign language interfaces |
| @section Interface functions @code{.Call} and @code{.External} |
| @cindex Interfaces to compiled code |
| |
| In this section we consider the details of the @R{}/C interfaces. |
| |
| These two interfaces have almost the same functionality. @code{.Call} is |
| based on the interface of the same name in @Sl{} version 4, and |
| @code{.External} is based on @R{}'s @code{.Internal}. @code{.External} |
| is more complex but allows a variable number of arguments. |
| |
| @menu |
| * Calling .Call:: |
| * Calling .External:: |
| * Missing and special values:: |
| @end menu |
| |
| @node Calling .Call, Calling .External, Interface functions .Call and .External, Interface functions .Call and .External |
| @subsection Calling @code{.Call} |
| |
| @findex .Call |
| |
| Let us convert our finite convolution example to use @code{.Call}. The |
| calling function in @R{} is |
| |
| @example |
| conv <- function(a, b) .Call("convolve2", a, b) |
| @end example |
| |
| @noindent |
| which could hardly be simpler, but as we shall see all the type |
| coercion is transferred to the C code, which is |
| |
| @example |
| @group |
| #include <R.h> |
| #include <Rinternals.h> |
| |
| SEXP convolve2(SEXP a, SEXP b) |
| @{ |
| int na, nb, nab; |
| double *xa, *xb, *xab; |
| SEXP ab; |
| |
| a = PROTECT(coerceVector(a, REALSXP)); |
| b = PROTECT(coerceVector(b, REALSXP)); |
| na = length(a); nb = length(b); nab = na + nb - 1; |
| ab = PROTECT(allocVector(REALSXP, nab)); |
| xa = REAL(a); xb = REAL(b); xab = REAL(ab); |
| for(int i = 0; i < nab; i++) xab[i] = 0.0; |
| for(int i = 0; i < na; i++) |
| for(int j = 0; j < nb; j++) xab[i + j] += xa[i] * xb[j]; |
| UNPROTECT(3); |
| return ab; |
| @} |
| @end group |
| @end example |
| |
| @node Calling .External, Missing and special values, Calling .Call, Interface functions .Call and .External |
| @subsection Calling @code{.External} |
| |
| @findex .External |
| |
| We can use the same example to illustrate @code{.External}. The @R{} |
| code changes only by replacing @code{.Call} by @code{.External}: |
| |
| @example |
| conv <- function(a, b) .External("convolveE", a, b) |
| @end example |
| |
| @noindent |
| but the arguments are now passed to the C code as a single @code{SEXP}, |
| a pairlist whose first element corresponds to the routine name and |
| whose remaining elements are the arguments. So the only change to the |
| C code is in how we handle the arguments. |
| |
| @example |
| @group |
| #include <R.h> |
| #include <Rinternals.h> |
| |
| SEXP convolveE(SEXP args) |
| @{ |
| int i, j, na, nb, nab; |
| double *xa, *xb, *xab; |
| SEXP a, b, ab; |
| |
| a = PROTECT(coerceVector(CADR(args), REALSXP)); |
| b = PROTECT(coerceVector(CADDR(args), REALSXP)); |
| ... |
| @} |
| @end group |
| @end example |
| |
| @noindent |
| Once again we do not need to protect the arguments, as on the @R{} side |
| of the interface they are objects that are already in use. The macros |
| |
| @example |
| @group |
| first = CADR(args); |
| second = CADDR(args); |
| third = CADDDR(args); |
| fourth = CAD4R(args); |
| @end group |
| @end example |
| |
| @noindent |
| provide convenient ways to access the first four arguments. More |
| generally we can use the |
| @findex CAR |
| @findex CDR |
| @code{CDR} and @code{CAR} macros as in |
| |
| @example |
| @group |
| args = CDR(args); a = CAR(args); |
| args = CDR(args); b = CAR(args); |
| @end group |
| @end example |
| |
| @noindent |
| which clearly allows us to extract an unlimited number of arguments |
| (whereas @code{.Call} has a limit, albeit at 65 not a small one). |
| |
| More usefully, the @code{.External} interface provides an easy way to |
| handle calls with a variable number of arguments, as @code{length(args)} |
| will give the number of arguments supplied (of which the first is |
| ignored). We may need to know the names (`tags') given to the actual |
| arguments, which we can get by using the @code{TAG} macro, as in the |
| following example, which prints the names and the first value of its |
| arguments if they are vector types. |
| |
| @example |
| @group |
| SEXP showArgs(SEXP args) |
| @{ |
| args = CDR(args); /* skip 'name' */ |
| for(int i = 0; args != R_NilValue; i++, args = CDR(args)) @{ |
| const char *name = |
| isNull(TAG(args)) ? "" : CHAR(PRINTNAME(TAG(args))); |
| SEXP el = CAR(args); |
| if (length(el) == 0) @{ |
| Rprintf("[%d] '%s' R type, length 0\n", i+1, name); |
| continue; |
| @} |
| @end group |
| @group |
| switch(TYPEOF(el)) @{ |
| case REALSXP: |
| Rprintf("[%d] '%s' %f\n", i+1, name, REAL(el)[0]); |
| break; |
| @end group |
| @group |
| case LGLSXP: |
| case INTSXP: |
| Rprintf("[%d] '%s' %d\n", i+1, name, INTEGER(el)[0]); |
| break; |
| @end group |
| @group |
| case CPLXSXP: |
| @{ |
| Rcomplex cpl = COMPLEX(el)[0]; |
| Rprintf("[%d] '%s' %f + %fi\n", i+1, name, cpl.r, cpl.i); |
| @} |
| break; |
| @end group |
| @group |
| case STRSXP: |
| Rprintf("[%d] '%s' %s\n", i+1, name, |
| CHAR(STRING_ELT(el, 0))); |
| break; |
| @end group |
| @group |
| default: |
| Rprintf("[%d] '%s' R type\n", i+1, name); |
| @} |
| @} |
| return R_NilValue; |
| @} |
| @end group |
| @end example |
| |
| This can be called by the wrapper function |
| |
| @example |
| showArgs <- function(...) invisible(.External("showArgs", ...)) |
| @end example |
| |
| @noindent |
| Note that this style of programming is convenient but not necessary, as |
| an alternative style is |
| |
| @example |
| showArgs1 <- function(...) invisible(.Call("showArgs1", list(...))) |
| @end example |
| |
| @noindent |
| The (very similar) C code is in the scripts. |
| |
| @node Missing and special values, , Calling .External, Interface functions .Call and .External |
| @subsection Missing and special values |
| @cindex Missing values |
| @cindex IEEE special values |
| |
| One piece of error-checking the @code{.C} call does (unless @code{NAOK} |
| is true) is to check for missing (@code{NA}) and @acronym{IEEE} special |
| values (@code{Inf}, @code{-Inf} and @code{NaN}) and give an error if any |
| are found. With the @code{.Call} interface these will be passed to our |
| code. In this example the special values are no problem, as |
| @acronym{IEC} 60559 arithmetic will handle them correctly. In the current |
| implementation, this is also true of @code{NA}, as it is a type of |
| @code{NaN}, but it is unwise to rely on such details. Thus we will |
| rewrite the code to handle @code{NA}s using macros defined in |
| @file{R_ext/Arith.h} included by @file{R.h}. |
| |
| The code changes are the same in any of the versions of @code{convolve2} |
| or @code{convolveE}: |
| |
| @example |
| @group |
| ... |
| for(int i = 0; i < na; i++) |
| for(int j = 0; j < nb; j++) |
| if(ISNA(xa[i]) || ISNA(xb[j]) || ISNA(xab[i + j])) |
| xab[i + j] = NA_REAL; |
| else |
| xab[i + j] += xa[i] * xb[j]; |
| ... |
| @end group |
| @end example |
| |
| @findex ISNA |
| @findex ISNAN |
| |
| Note that the @code{ISNA} macro, and the similar macros @code{ISNAN} |
| (which checks for @code{NaN} or @code{NA}) and @code{R_FINITE} (which is |
| false for @code{NA} and all the special values), only apply to numeric |
| values of type @code{double}. Missingness of integers, logicals and |
| character strings can be tested by equality to the constants |
| @code{NA_INTEGER}, @code{NA_LOGICAL} and @code{NA_STRING}. These and |
| @code{NA_REAL} can be used to set elements of @R{} vectors to @code{NA}. |
| |
| The constants @code{R_NaN}, @code{R_PosInf} and @code{R_NegInf} can be |
| used to set @code{double}s to the special values. |
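| |
| Outside @R{} the @code{ISNA} test is not available, but the same guard |
| can be sketched in plain C with @code{isnan} from @file{math.h}. Note |
| the difference: @code{isnan} is true for every @code{NaN}, so the |
| hypothetical @code{convolve_nan} below treats @code{NA} and other |
| @code{NaN}s alike, as @code{ISNAN} (not @code{ISNA}) would inside @R{}. |
| |
| @example |
| @group |
| #include <math.h> |
| |
| /* Hypothetical plain-C version of the NA-aware loop: xab must have |
|    room for na + nb - 1 doubles.  A NaN input poisons every output |
|    element it contributes to, mirroring xab[i + j] = NA_REAL above. */ |
| void convolve_nan(const double *xa, int na, const double *xb, int nb, |
|                   double *xab) |
| @{ |
|     int nab = na + nb - 1; |
|     for (int i = 0; i < nab; i++) xab[i] = 0.0; |
|     for (int i = 0; i < na; i++) |
|         for (int j = 0; j < nb; j++) |
|             if (isnan(xa[i]) || isnan(xb[j]) || isnan(xab[i + j])) |
|                 xab[i + j] = NAN; |
|             else |
|                 xab[i + j] += xa[i] * xb[j]; |
| @} |
| @end group |
| @end example |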
| |
| @node Evaluating R expressions from C, Parsing R code from C, Interface functions .Call and .External, System and foreign language interfaces |
| @section Evaluating R expressions from C |
| @cindex Evaluating R expressions from C |
| |
| The main function we will use is |
| |
| @example |
| SEXP eval(SEXP expr, SEXP rho); |
| @end example |
| |
| @noindent |
| the equivalent of the interpreted @R{} code @code{eval(expr, envir = |
| rho)} (so @code{rho} must be an environment), although we can also make |
| use of @code{findVar}, @code{defineVar} and @code{findFun} (which |
| restricts the search to functions). |
| |
| To see how this might be applied, here is a simplified internal version |
| of @code{lapply} for expressions, used as |
| |
| @example |
| @group |
| a <- list(a = 1:5, b = rnorm(10), test = runif(100)) |
| .Call("lapply", a, quote(sum(x)), new.env()) |
| @end group |
| @end example |
| |
| @noindent |
| with C code |
| |
| @example |
| @group |
| SEXP lapply(SEXP list, SEXP expr, SEXP rho) |
| @{ |
| int n = length(list); |
| SEXP ans; |
| |
| if(!isNewList(list)) error("'list' must be a list"); |
| if(!isEnvironment(rho)) error("'rho' should be an environment"); |
| ans = PROTECT(allocVector(VECSXP, n)); |
| for(int i = 0; i < n; i++) @{ |
| defineVar(install("x"), VECTOR_ELT(list, i), rho); |
| SET_VECTOR_ELT(ans, i, eval(expr, rho)); |
| @} |
| setAttrib(ans, R_NamesSymbol, getAttrib(list, R_NamesSymbol)); |
| UNPROTECT(1); |
| return ans; |
| @} |
| @end group |
| @end example |
| |
| It would be closer to @code{lapply} if we could pass in a function |
| rather than an expression. One way to do this is @emph{via} interpreted |
| @R{} code that wraps the function call in an expression (as the |
| zero-finding example below does), but it is possible (if somewhat |
| obscure) to do this in C code. The following is based on the code in |
| @file{src/main/optimize.c}. |
| |
| @example |
| @group |
| SEXP lapply2(SEXP list, SEXP fn, SEXP rho) |
| @{ |
| int n = length(list); |
| SEXP R_fcall, ans; |
| |
| if(!isNewList(list)) error("'list' must be a list"); |
| if(!isFunction(fn)) error("'fn' must be a function"); |
| if(!isEnvironment(rho)) error("'rho' should be an environment"); |
| R_fcall = PROTECT(lang2(fn, R_NilValue)); |
| ans = PROTECT(allocVector(VECSXP, n)); |
| for(int i = 0; i < n; i++) @{ |
| SETCADR(R_fcall, VECTOR_ELT(list, i)); |
| SET_VECTOR_ELT(ans, i, eval(R_fcall, rho)); |
| @} |
| setAttrib(ans, R_NamesSymbol, getAttrib(list, R_NamesSymbol)); |
| UNPROTECT(2); |
| return ans; |
| @} |
| @end group |
| @end example |
| |
| @noindent |
| used by |
| |
| @example |
| .Call("lapply2", a, sum, new.env()) |
| @end example |
| |
| @noindent |
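| Function @code{lang2} creates an executable pairlist (a call) of two |
| elements, here the function @code{fn} and a placeholder argument; |
| @code{SETCADR} then replaces that argument with the current list |
| element before each evaluation. (Readers who know a LISP-like |
| language will recognize the construction.) |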
| |
| As a more comprehensive example of constructing an @R{} call in C code |
| and evaluating it, consider the following fragment of |
| @code{printAttributes} in @file{src/main/print.c}. |
| |
| @example |
| /* Need to construct a call to |
| print(CAR(a), digits=digits) |
| based on the R_print structure, then eval(call, env). |
| See do_docall for the template for this sort of thing. |
| */ |
| SEXP s, t; |
| t = s = PROTECT(allocList(3)); |
| SET_TYPEOF(s, LANGSXP); |
| SETCAR(t, install("print")); t = CDR(t); |
| SETCAR(t, CAR(a)); t = CDR(t); |
| SETCAR(t, ScalarInteger(digits)); |
| SET_TAG(t, install("digits")); |
| eval(s, env); |
| UNPROTECT(1); |
| @end example |
| |
| @noindent |
| At this point @code{CAR(a)} is the @R{} object to be printed, the |
| current attribute. There are three steps: the call is constructed as |
| a pairlist of length 3, the list is filled in, and the expression |
| represented by the pairlist is evaluated. |
| |
| A pairlist is quite distinct from a generic vector list, the only |
| user-visible form of list in @R{}. A pairlist is a linked list (with |
| @code{CDR(t)} computing the next entry), with items (accessed by |
| @code{CAR(t)}) and names or tags (set by @code{SET_TAG}). In this call |
| there are to be three items, a symbol (pointing to the function to be |
| called) and two argument values, the first unnamed and the second named. |
| Setting the type to @code{LANGSXP} makes this a call which can be evaluated. |
| |
| Customarily, the evaluation environment is passed from the calling |
| @R{} code (see @code{rho} above). In special cases the C code may need |
| to obtain the current evaluation environment, which can be done via the |
| @code{R_GetCurrentEnv()} function. |
| |
| @menu |
| * Zero-finding:: |
| * Calculating numerical derivatives:: |
| @end menu |
| |
| @node Zero-finding, Calculating numerical derivatives, Evaluating R expressions from C, Evaluating R expressions from C |
| @subsection Zero-finding |
| @cindex Zero-finding |
| |
| In this section we re-work the example of Becker, Chambers & Wilks (1988, |
| pp.@: 205--10) on finding a zero of a univariate function. The @R{} code |
| and an example are |
| |
| @example |
| zero <- function(f, guesses, tol = 1e-7) @{ |
| f.check <- function(x) @{ |
| x <- f(x) |
| if(!is.numeric(x)) stop("Need a numeric result") |
| as.double(x) |
| @} |
| .Call("zero", body(f.check), as.double(guesses), as.double(tol), |
| new.env()) |
| @} |
| |
| cube1 <- function(x) (x^2 + 1) * (x - 1.5) |
| zero(cube1, c(0, 5)) |
| @end example |
| |
| @noindent |
| where this time we do the coercion and error-checking in the @R{} code. |
| The C code is |
| |
| @example |
| @group |
| #include <R.h> |
| #include <Rinternals.h> |
| |
| SEXP mkans(double x) |
| @{ |
| // no need for PROTECT() here, as REAL(.) does not allocate: |
| SEXP ans = allocVector(REALSXP, 1); |
| REAL(ans)[0] = x; |
| return ans; |
| @} |
| @end group |
| |
| @group |
| double feval(double x, SEXP f, SEXP rho) |
| @{ |
| // a version with (too) much PROTECT()ion .. "better safe than sorry" |
| SEXP symbol, value; |
| PROTECT(symbol = install("x")); |
| PROTECT(value = mkans(x)); |
| defineVar(symbol, value, rho); |
| UNPROTECT(2); |
| return(REAL(eval(f, rho))[0]); |
| @} |
| @end group |
| |
| @group |
| SEXP zero(SEXP f, SEXP guesses, SEXP stol, SEXP rho) |
| @{ |
| double x0 = REAL(guesses)[0], x1 = REAL(guesses)[1], |
| tol = REAL(stol)[0]; |
| double f0, f1, fc, xc; |
| @end group |
| |
| @group |
| if(tol <= 0.0) error("non-positive tol value"); |
| f0 = feval(x0, f, rho); f1 = feval(x1, f, rho); |
| if(f0 == 0.0) return mkans(x0); |
| if(f1 == 0.0) return mkans(x1); |
| if(f0*f1 > 0.0) error("x[0] and x[1] have the same sign"); |
| @end group |
| |
| @group |
| for(;;) @{ |
| xc = 0.5*(x0+x1); |
| if(fabs(x0-x1) < tol) return mkans(xc); |
| fc = feval(xc, f, rho); |
| if(fc == 0) return mkans(xc); |
| if(f0*fc > 0.0) @{ |
| x0 = xc; f0 = fc; |
| @} else @{ |
| x1 = xc; f1 = fc; |
| @} |
| @} |
| @} |
| @end group |
| @end example |
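| |
| The same bisection loop can be written without the @R{} evaluator by |
| passing a C function pointer instead of an @R{} expression. This is a |
| sketch for comparison only: @code{bisect} and its error convention |
| (returning @code{NAN} when @code{tol} is not positive or the end points |
| do not bracket a zero, instead of calling @code{error}) are choices made |
| here, not part of the example above. |
| |
| @example |
| @group |
| #include <math.h> |
| |
| /* Find a zero of f in [x0, x1] by bisection, to within tol. |
|    Returns NAN if tol <= 0 or if f(x0) and f(x1) have the same sign. */ |
| double bisect(double (*f)(double), double x0, double x1, double tol) |
| @{ |
|     if (tol <= 0.0) return NAN; |
|     double f0 = f(x0), f1 = f(x1); |
|     if (f0 == 0.0) return x0; |
|     if (f1 == 0.0) return x1; |
|     if (f0 * f1 > 0.0) return NAN;     /* no sign change: no bracket */ |
| @end group |
| @group |
|     for (;;) @{ |
|         double xc = 0.5 * (x0 + x1); |
|         if (fabs(x0 - x1) < tol) return xc; |
|         double fc = f(xc); |
|         if (fc == 0.0) return xc; |
|         if (f0 * fc > 0.0) @{ x0 = xc; f0 = fc; @}  /* zero in [xc, x1] */ |
|         else x1 = xc;                             /* zero in [x0, xc] */ |
|     @} |
| @} |
| |
| /* The test function from the R example: its only real zero is 1.5. */ |
| double cube1(double x) @{ return (x * x + 1) * (x - 1.5); @} |
| @end group |
| @end example |
| |
| @noindent |
| @code{bisect(cube1, 0.0, 5.0, 1e-7)} returns a value within @code{1e-7} |
| of the root 1.5, as @code{zero(cube1, c(0, 5))} does. |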
| |
| @node Calculating numerical derivatives, , Zero-finding, Evaluating R expressions from C |
| @subsection Calculating numerical derivatives |
| @cindex Numerical derivatives |
| |
| We will use a longer example (by Saikat DebRoy) to illustrate the use of |
| evaluation and @code{.External}. This calculates numerical derivatives, |
| something that could be done as effectively in interpreted @R{} code but |
| may be needed as part of a larger C calculation. |
| |
| An interpreted @R{} version and an example are |
| |
| @example |
| @group |
| numeric.deriv <- function(expr, theta, rho=sys.frame(sys.parent())) |
| @{ |
| eps <- sqrt(.Machine$double.eps) |
| ans <- eval(substitute(expr), rho) |
| grad <- matrix(, length(ans), length(theta), |
| dimnames=list(NULL, theta)) |
| for (i in seq_along(theta)) @{ |
| old <- get(theta[i], envir=rho) |
| delta <- eps * max(1, abs(old)) |
| assign(theta[i], old+delta, envir=rho) |
| ans1 <- eval(substitute(expr), rho) |
| assign(theta[i], old, envir=rho) |
| grad[, i] <- (ans1 - ans)/delta |
| @} |
| attr(ans, "gradient") <- grad |
| ans |
| @} |
| omega <- 1:5; x <- 1; y <- 2 |
| numeric.deriv(sin(omega*x*y), c("x", "y")) |
| @end group |
| @end example |
| |
| @noindent |
| where @code{expr} is an expression, @code{theta} a character vector of |
| variable names and @code{rho} the environment to be used. |
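| |
| The step-size rule used above, @code{delta <- eps * max(1, abs(old))} |
| with @code{eps} the square root of the machine epsilon, can be sketched |
| for a scalar function in plain C. @code{forward_diff} and |
| @code{square} are hypothetical helpers, and @code{DBL_EPSILON} from |
| @file{float.h} plays the role of @code{.Machine$double.eps}. |
| |
| @example |
| @group |
| #include <float.h> |
| #include <math.h> |
| |
| /* Forward-difference approximation to f'(x), scaling the step with |x| |
|    as numeric.deriv does: delta = sqrt(eps) * max(1, |x|). */ |
| double forward_diff(double (*f)(double), double x) |
| @{ |
|     double eps = sqrt(DBL_EPSILON); |
|     double delta = eps * fmax(1.0, fabs(x)); |
|     return (f(x + delta) - f(x)) / delta; |
| @} |
| |
| double square(double x) @{ return x * x; @} |
| @end group |
| @end example |
| |
| @noindent |
| Scaling the step keeps it from being lost to rounding when |
| @code{|x|} is large, while the floor of 1 stops it becoming tiny near |
| zero. |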
| |
| For the compiled version the call from @R{} will be |
| |
| @example |
| .External("numeric_deriv", @var{expr}, @var{theta}, @var{rho}) |
| @end example |
| |
| @noindent |
| with example usage |
| |
| @example |
| .External("numeric_deriv", quote(sin(omega*x*y)), |
| c("x", "y"), .GlobalEnv) |
| @end example |
| |
| @noindent |
| Note the need to quote the expression to stop it being evaluated in the |
| caller. |
| |
| Here is the complete C code which we will explain section by section. |
| |
| @example |
| @group |
| #include <R.h> /* for DOUBLE_EPS */ |
| #include <Rinternals.h> |
| |
| SEXP numeric_deriv(SEXP args) |
| @{ |
| SEXP theta, expr, rho, ans, ans1, gradient, par, dimnames; |
| double tt, xx, delta, eps = sqrt(DOUBLE_EPS), *rgr, *rans; |
| int i, start; |
| @end group |
| |
| @group |
| expr = CADR(args); |
| if(!isString(theta = CADDR(args))) |
| error("theta should be of type character"); |
| if(!isEnvironment(rho = CADDDR(args))) |
| error("rho should be an environment"); |
| @end group |
| |
| @group |
| ans = PROTECT(coerceVector(eval(expr, rho), REALSXP)); |
| gradient = PROTECT(allocMatrix(REALSXP, LENGTH(ans), LENGTH(theta))); |
| rgr = REAL(gradient); rans = REAL(ans); |
| @end group |
| |
| @group |
| for(i = 0, start = 0; i < LENGTH(theta); i++, start += LENGTH(ans)) @{ |
| par = PROTECT(findVar(installChar(STRING_ELT(theta, i)), rho)); |
| tt = REAL(par)[0]; |
| xx = fabs(tt); |
| delta = (xx < 1) ? eps : xx*eps; |
| REAL(par)[0] += delta; |
| ans1 = PROTECT(coerceVector(eval(expr, rho), REALSXP)); |
| for(int j = 0; j < LENGTH(ans); j++) |
| rgr[j + start] = (REAL(ans1)[j] - rans[j])/delta; |
| REAL(par)[0] = tt; |
| UNPROTECT(2); /* par, ans1 */ |
| @} |
| @end group |
| |
| @group |
| dimnames = PROTECT(allocVector(VECSXP, 2)); |
| SET_VECTOR_ELT(dimnames, 1, theta); |
| dimnamesgets(gradient, dimnames); |
| setAttrib(ans, install("gradient"), gradient); |
| UNPROTECT(3); /* ans gradient dimnames */ |
| return ans; |
| @} |
| @end group |
| @end example |
| |
| The code to handle the arguments is |
| |
| @example |
| @group |
| expr = CADR(args); |
| if(!isString(theta = CADDR(args))) |
| error("theta should be of type character"); |
| if(!isEnvironment(rho = CADDDR(args))) |
| error("rho should be an environment"); |
| @end group |
| @end example |
| |
| @noindent |
| Note that we check for correct types of @code{theta} and @code{rho} but |
| do not check the type of @code{expr}. That is because @code{eval} can |
| handle many types of @R{} objects other than @code{EXPRSXP}. There is |
| no useful coercion we can do for @code{theta} and @code{rho}, so we stop |
| with an error message if they are not of the correct mode. |
| |
| The first step in the code is to evaluate the expression in the |
| environment @code{rho} and coerce the result to a real vector, by |
| |
| @example |
| ans = PROTECT(coerceVector(eval(expr, rho), REALSXP)); |
| @end example |
| |
| @noindent |
| We then allocate space for the calculated derivative by |
| |
| @example |
| gradient = PROTECT(allocMatrix(REALSXP, LENGTH(ans), LENGTH(theta))); |
| @end example |
| |
| @noindent |
| The first argument to @code{allocMatrix} gives the @code{SEXPTYPE} of |
| the matrix: here we want it to be @code{REALSXP}. The other two |
| arguments are the numbers of rows and columns. (Note that @code{LENGTH} |
| is intended to be used for vectors: @code{length} is more generally |
| applicable.) |
| |
| @example |
| @group |
| for(i = 0, start = 0; i < LENGTH(theta); i++, start += LENGTH(ans)) @{ |
| par = PROTECT(findVar(installChar(STRING_ELT(theta, i)), rho)); |
| @end group |
| @end example |
| |
| @noindent |
| Here, we are entering a @code{for} loop over the variables named in |
| @code{theta}. Inside the loop, @code{STRING_ELT(theta, i)} accesses the |
| @code{i}'th element (a @code{CHARSXP}) of the @code{STRSXP} |
| @code{theta}, and @code{installChar} returns the symbol with that name |
| (installing it if necessary)@footnote{@pxref{Character encoding issues} |
| for why this might not be what is required.}. We then use |
| @code{findVar} to find the value bound to that symbol in @code{rho}. |
| |
| @example |
| @group |
| tt = REAL(par)[0]; |
| xx = fabs(tt); |
| delta = (xx < 1) ? eps : xx*eps; |
| REAL(par)[0] += delta; |
| ans1 = PROTECT(coerceVector(eval(expr, rho), REALSXP)); |
| @end group |
| @end example |
| |
| @noindent |
| We first extract the real value of the parameter, then calculate |
| @code{delta}, the increment to be used for approximating the numerical |
| derivative. Then we change the value stored in @code{par} (in |
| environment @code{rho}) by @code{delta} and evaluate @code{expr} in |
| environment @code{rho} again. Because @code{par} is the object bound to |
| the variable in @code{rho} rather than a copy, changing |
| @code{REAL(par)[0]} in place changes the value that @code{eval} sees, so |
| the expression is evaluated at the changed parameter value. |
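| |
| Stripped of the @R{} API, the step-size rule and the forward difference |
| are plain C. The following is a minimal sketch for a scalar function of |
| one variable, not part of the package code: the names |
| @code{fd_derivative} and @code{cube} are hypothetical, and |
| @code{DBL_EPSILON} from @file{float.h} is used in place of |
| @code{DOUBLE_EPS}. |
| |
| @example |
| #include <float.h> |
| #include <math.h> |
| |
| /* Hypothetical helper: forward-difference approximation to f'(x), |
|    using the step rule of numeric_deriv: delta = eps when fabs(x) < 1, |
|    otherwise fabs(x) * eps, where eps = sqrt(DBL_EPSILON). */ |
| double fd_derivative(double (*f)(double), double x) |
| @{ |
|     double eps = sqrt(DBL_EPSILON); |
|     double ax = fabs(x); |
|     double delta = (ax < 1) ? eps : ax * eps; |
|     return (f(x + delta) - f(x)) / delta; |
| @} |
| |
| /* Example function for illustration: f(x) = x^3, so f'(x) = 3 x^2. */ |
| double cube(double x) |
| @{ |
|     return x * x * x; |
| @} |
| @end example |
| |
| @noindent |
| Taking @code{eps} as the square root of the machine epsilon roughly |
| balances the truncation error of the forward difference against |
| rounding error, so @code{fd_derivative(cube, 2.0)} should agree with the |
| exact value 12 to several significant digits. |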
| |
| @example |
| @group |
| for(int j = 0; j < LENGTH(ans); j++) |
| rgr[j + start] = (REAL(ans1)[j] - rans[j])/delta; |
| REAL(par)[0] = tt; |
| UNPROTECT(2); |
| @} |
| @end group |
| @end example |
| |
| @noindent |
| Now, we compute the @code{i}'th column of the gradient matrix. Note how |
| it is accessed: @R{} stores matrices by column (like Fortran). |
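| |
| The offset arithmetic can be checked in isolation. In a matrix with |
| @var{nrow} rows stored by column, element (@var{j}, @var{i}) is at |
| offset @code{j + i * nrow}, so in @code{numeric_deriv} the variable |
| @code{start} is the offset of the first element of column @code{i}. A |
| minimal plain-C sketch (the helper names @code{cm_index} and |
| @code{set_column} are hypothetical): |
| |
| @example |
| #include <stddef.h> |
| |
| /* Offset of element (j, i) in a column-major matrix with nrow rows. */ |
| size_t cm_index(size_t j, size_t i, size_t nrow) |
| @{ |
|     return j + i * nrow; |
| @} |
| |
| /* Copy vec (of length nrow) into column i of the column-major matrix m, |
|    using the same 'start' offset as the loop in numeric_deriv. */ |
| void set_column(double *m, size_t nrow, size_t i, const double *vec) |
| @{ |
|     size_t start = i * nrow;    /* offset of the first element of column i */ |
|     for (size_t j = 0; j < nrow; j++) |
|         m[j + start] = vec[j]; |
| @} |
| @end example |
| |
| @noindent |
| For a 3 by 2 matrix, element (1, 1) (second row, second column) is at |
| offset @code{cm_index(1, 1, 3)}, that is 4. |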
| |
| @example |
| @group |
| dimnames = PROTECT(allocVector(VECSXP, 2)); |
| SET_VECTOR_ELT(dimnames, 1, theta); |
| dimnamesgets(gradient, dimnames); |
| setAttrib(ans, install("gradient"), gradient); |
| UNPROTECT(3); |
| return ans; |
| @} |
| @end group |
| @end example |
| |
| @noindent |
| First we add column names to the gradient matrix. This is done by |
| allocating a list (a @code{VECSXP}) whose first element, the row names, |
| is @code{NULL} (the default) and the second element, the column names, |
| is set as @code{theta}. This list is then assigned as the attribute |
| having the symbol @code{R_DimNamesSymbol}. Finally we set the gradient |
| matrix as the gradient attribute of @code{ans}, unprotect the remaining |
| protected locations and return the answer @code{ans}. |
| |
| @node Parsing R code from C, External pointers and weak references, Evaluating R expressions from C, System and foreign language interfaces |
| @section Parsing R code from C |
| @cindex Parsing R code from C |
| |
| Suppose an @R{} extension wants to accept an @R{} expression from the |
| user and evaluate it. The previous section covered evaluation, but the |
| expression will be entered as text and needs to be parsed first. A |
| small part of @R{}'s parse interface is declared in header file |
| @file{R_ext/Parse.h}@footnote{This is only guaranteed to show the |
| current interface: it is liable to change.}. |
| |
| An example of the usage can be found in the (example) Windows package |
| @pkg{windlgs} included in the @R{} source tree. The essential part is |
| |
| @example |
| @group |
| #include <R.h> |
| #include <Rinternals.h> |
| #include <R_ext/Parse.h> |
| |
| SEXP menu_ttest3() |
| @{ |
| char cmd[256]; |
| SEXP cmdSexp, cmdexpr, ans = R_NilValue; |
| ParseStatus status; |
| ... |
| if(done == 1) @{ |
| cmdSexp = PROTECT(allocVector(STRSXP, 1)); |
| SET_STRING_ELT(cmdSexp, 0, mkChar(cmd)); |
| cmdexpr = PROTECT(R_ParseVector(cmdSexp, -1, &status, R_NilValue)); |
| if (status != PARSE_OK) @{ |
| UNPROTECT(2); |
| error("invalid call %s", cmd); |
| @} |
| /* Loop is needed here as the EXPRSXP may be of length > 1 */ |
| for(int i = 0; i < length(cmdexpr); i++) |
| ans = eval(VECTOR_ELT(cmdexpr, i), R_GlobalEnv); |
| UNPROTECT(2); |
| @} |
| return ans; |
| @} |
| @end group |
| @end example |
| @noindent |
| Note that a single line of text may give rise to more than one @R{} |
| expression. |
| |
| @findex R_ParseVector |
| @code{R_ParseVector} is essentially the code used to implement |
| @code{parse(text=)} at @R{} level. The first argument is a character |
| vector (corresponding to @code{text}) and the second the maximal |
| number of expressions to parse (corresponding to @code{n}; a negative |
| value, as in the example above, means that all the input is parsed). |
| The third argument is a pointer to a variable of an enumeration type, |
| and it is normal (as @code{parse} does) to regard all values other than |
| @code{PARSE_OK} as an error. Other values which might be returned are |
| @code{PARSE_INCOMPLETE} (an incomplete expression was found) and |
| @code{PARSE_ERROR} (a syntax error); in both cases the value returned |
| is @code{R_NilValue}. The fourth argument is a length-one character |
| vector to be used as a filename in error messages, a @code{srcfile} |
| object or the @R{} @code{NULL} object (as in the example above). If a |
| @code{srcfile} object was used, a @code{srcref} attribute would be |
| attached to the result, containing a list of @code{srcref} objects of |
| the same length as the expression, to allow it to be echoed with its |
| original formatting. |
| |
| @menu |
| * Accessing source references:: |
| @end menu |
| |
| @node Accessing source references, , Parsing R code from C, Parsing R code from C |
| @subsection Accessing source references |
| |
| The source references added by the parser are recorded by @R{}'s evaluator |
| as it evaluates code. Two functions |
| make these available to debuggers running C code: |
| @findex R_Srcref |
| @findex R_GetCurrentSrcref |
| @findex R_GetSrcFilename |
| |
| @example |
| SEXP R_GetCurrentSrcref(int skip); |
| @end example |
| |
| This function checks @code{R_Srcref} and the current evaluation stack |
| for entries that contain source reference information. The |
| @code{skip} argument tells how many source references to skip before |
| returning the @code{SEXP} of the @code{srcref} object, counting from |
| the top of the stack. If @code{skip < 0}, @code{abs(skip)} locations |
| are counted up from the bottom of the stack. If too few or no source |
| references are found, @code{NULL} is returned. |
| |
| @example |
| SEXP R_GetSrcFilename(SEXP srcref); |
| @end example |
| |
| This function extracts the filename from the source reference for |
| display, returning a length 1 character vector containing the |
| filename. If no name is found, @code{""} is returned. |
| |
| @node External pointers and weak references, Vector accessor functions, Parsing R code from C, System and foreign language interfaces |
| @section External pointers and weak references |
| |
| The @code{SEXPTYPE}s @code{EXTPTRSXP} and @code{WEAKREFSXP} can be |
| encountered at @R{} level, but are created in C code. |
| |
| @cindex external pointer |
| External pointer @code{SEXP}s are intended to handle references to C |
| structures such as `handles', and are used for this purpose in package |
| @CRANpkg{RODBC} for example. They are unusual in their copying semantics in |
| that when an @R{} object is copied, the external pointer object is not |
| duplicated. (For this reason external pointers should only be used as |
| part of an object with normal semantics, for example an attribute or an |
| element of a list.) |
| |
| An external pointer is created by |
| |
| @example |
| SEXP R_MakeExternalPtr(void *p, SEXP tag, SEXP prot); |
| @end example |
| |
| @noindent |
| where @code{p} is the pointer (and hence this cannot portably be a |
| function pointer), and @code{tag} and @code{prot} are references to |
| ordinary @R{} objects which will remain in existence (be protected from |
| garbage collection) for the lifetime of the external pointer object. A |
| useful convention is to use the @code{tag} field for some form of type |
| identification and the @code{prot} field for protecting the memory that |
| the external pointer represents, if that memory is allocated from the |
| @R{} heap. Both @code{tag} and @code{prot} can be @code{R_NilValue}, |
| and often are. |
| |
| As from @R{} 3.4.0, an alternative way to create an external pointer |
| from a function pointer is |
| |
| @example |
| typedef void * (*R_DL_FUNC)(); |
| SEXP R_MakeExternalPtrFn(R_DL_FUNC p, SEXP tag, SEXP prot); |
| @end example |
| |
| |
| The elements of an external pointer can be accessed and set @emph{via} |
| |
| @example |
| void *R_ExternalPtrAddr(SEXP s); |
| DL_FUNC R_ExternalPtrAddrFn(SEXP s); |
| SEXP R_ExternalPtrTag(SEXP s); |
| SEXP R_ExternalPtrProtected(SEXP s); |
| void R_ClearExternalPtr(SEXP s); |
| void R_SetExternalPtrAddr(SEXP s, void *p); |
| void R_SetExternalPtrTag(SEXP s, SEXP tag); |
| void R_SetExternalPtrProtected(SEXP s, SEXP p); |
| @end example |
| |
| @noindent |
| Clearing a pointer sets its value to the C @code{NULL} pointer. |
| |
| @cindex finalizer |
| An external pointer object can have a @emph{finalizer}, a piece of code |
| to be run when the object is garbage collected. This can be @R{} code |
| or C code, and the respective interfaces are |
| |
| @example |
| void R_RegisterFinalizerEx(SEXP s, SEXP fun, Rboolean onexit); |
| |
| typedef void (*R_CFinalizer_t)(SEXP); |
| void R_RegisterCFinalizerEx(SEXP s, R_CFinalizer_t fun, Rboolean onexit); |
| @end example |
| |
| @noindent |
| The @R{} function indicated by @code{fun} should be a function of a |
| single argument, the object to be finalized. @R{} does not perform a |
| garbage collection when shutting down, and the @code{onexit} argument of |
| the extended forms can be used to ask that the finalizer be run during a |
| normal shutdown of the @R{} session. It is good practice to clear the |
| pointer on finalization. |
| |
| The only @R{} level function for interacting with external pointers is |
| @code{reg.finalizer} which can be used to set a finalizer. |
| |
| It is probably not a good idea to allow an external pointer to be |
| @code{save}d and then reloaded, but if this happens the pointer will be |
| set to the C @code{NULL} pointer. |
| |
| Finalizers can be run at many places in the code base and much of it, |
| including the @R{} interpreter, is not re-entrant. So great care is |
| needed in choosing the code to be run in a finalizer. Finalizers are |
| marked to be run at garbage collection but only run at a somewhat safe |
| point thereafter. |
| |
| @cindex weak reference |
| Weak references are used to allow the programmer to maintain information |
| on entities without preventing the garbage collection of the entities |
| once they become unreachable. |
| |
| A weak reference contains a key and a value. The value is reachable |
| if it is either reachable directly or @emph{via} weak references with |
| reachable keys. Once a value is determined to be unreachable during garbage |
| collection, the key and value are set to @code{R_NilValue} and the |
| finalizer will be run later in the garbage collection. |
| |
| Weak reference objects are created by one of |
| |
| @example |
| SEXP R_MakeWeakRef(SEXP key, SEXP val, SEXP fin, Rboolean onexit); |
| SEXP R_MakeWeakRefC(SEXP key, SEXP val, R_CFinalizer_t fin, |
| Rboolean onexit); |
| @end example |
| |
| @noindent |
| where the @R{} or C finalizer is specified in exactly the same way as |
| for an external pointer object (whose finalization interface is |
| implemented @emph{via} weak references). |
| |
| The parts can be accessed @emph{via} |
| |
| @example |
| SEXP R_WeakRefKey(SEXP w); |
| SEXP R_WeakRefValue(SEXP w); |
| void R_RunWeakRefFinalizer(SEXP w); |
| @end example |
| |
| A toy example of the use of weak references can be found at |
| @uref{https://homepage.stat.uiowa.edu/~luke/R/references/weakfinex.html}, |
| but that is used to add finalizers to external pointers which can now be |
| done more directly. At the time of writing no @acronym{CRAN} or |
| Bioconductor package uses weak references. |
| |
| |
| @menu |
| * An external pointer example:: |
| @end menu |
| |
| @node An external pointer example, , External pointers and weak references, External pointers and weak references |
| @subsection An example |
| |
| Package @CRANpkg{RODBC} uses external pointers to maintain its |
| @emph{channels}, connections to databases. There can be several |
| connections open at once, and the status information for each is stored |
| in a C structure (pointed to by @code{thisHandle} in the code extract |
| below) that is returned @emph{via} an external pointer as part of the RODBC |
| `channel' (as the @code{"handle_ptr"} attribute). The external pointer |
| is created by |
| |
| @example |
| SEXP ans, ptr; |
| ans = PROTECT(allocVector(INTSXP, 1)); |
| ptr = R_MakeExternalPtr(thisHandle, install("RODBC_channel"), R_NilValue); |
| PROTECT(ptr); |
| R_RegisterCFinalizerEx(ptr, chanFinalizer, TRUE); |
| ... |
| /* return the channel no */ |
| INTEGER(ans)[0] = nChannels; |
| /* and the connection string as an attribute */ |
| setAttrib(ans, install("connection.string"), constr); |
| setAttrib(ans, install("handle_ptr"), ptr); |
| UNPROTECT(3); |
| return ans; |
| @end example |
| |
| @noindent |
| Note the symbol given to identify the usage of the external pointer, and |
| the use of the finalizer. Since the final argument when registering the |
| finalizer is @code{TRUE}, the finalizer will be run at the end of the |
| @R{} session (unless it crashes). This is used to close and clean up |
| the connection to the database. The finalizer code is simply |
| |
| @example |
| static void chanFinalizer(SEXP ptr) |
| @{ |
| if(!R_ExternalPtrAddr(ptr)) return; |
| inRODBCClose(R_ExternalPtrAddr(ptr)); |
| R_ClearExternalPtr(ptr); /* not really needed */ |
| @} |
| @end example |
| |
| @noindent |
| Clearing the pointer and checking for a @code{NULL} pointer avoids any |
| possibility of attempting to close an already-closed channel. |
| |
| @R{}'s connections provide another example of using external pointers, |
| in that case purely to be able to use a finalizer to close and destroy the |
| connection if it is no longer in use. |
| |
| @node Vector accessor functions, Character encoding issues, External pointers and weak references, System and foreign language interfaces |
| @section Vector accessor functions |
| |
| The vector accessors like @code{REAL} and @code{INTEGER} and |
| @code{VECTOR_ELT} are @emph{functions} when used in @R{} extensions. |
| (For efficiency they may be macros or inline functions when used in |
| the @R{} source code, apart from @code{SET_STRING_ELT} and |
| @code{SET_VECTOR_ELT} which are always functions.) |
| |
| The accessor functions check that they are being used on an appropriate |
| type of @code{SEXP}. |
| |
| If efficiency is essential, the internal versions of the accessors can be |
| obtained by defining @samp{USE_RINTERNALS} before including |
| @file{Rinternals.h}. If you find it necessary to do so, please do test |
| that your code compiles without @samp{USE_RINTERNALS} defined, as this |
| provides a stricter test that the accessors have been used correctly. |
| Also be prepared to adjust your code should @R{} internals change. |
| Note too that the use of @samp{USE_RINTERNALS} when the header is |
| included in C++ code is not supported: doing so may use C99 features |
| which are not necessarily supported by the C++ compiler. Nor is use |
| with @file{Rdefines.h} supported. |
| |
| The accessor functions, and other functions in the @R{} API, are also |
| subject to change to support the @samp{ALTREP} project |
| (@uref{https://svn.r-project.org/R/branches/ALTREP/ALTREP.html}). Code |
| that does not define @samp{USE_RINTERNALS} should not be affected by |
| these changes, but code that does define @samp{USE_RINTERNALS} may need |
| to be adjusted. |
| |
| |
| @node Character encoding issues, , Vector accessor functions, System and foreign language interfaces |
| @section Character encoding issues |
| |
| @findex translateChar |
| @findex translateCharUTF8 |
| @code{CHARSXP}s can be marked as coming from a known encoding (Latin-1 |
| or UTF-8). This is mainly intended for human-readable output, and most |
| packages can just treat such @code{CHARSXP}s as a whole. However, if |
| they need to be interpreted as characters or output at C level then it |
| would normally be correct to ensure that they are converted to the |
| encoding of the current locale: this can be done by accessing the data |
| in the @code{CHARSXP} by @code{translateChar} rather than by |
| @code{CHAR}. If re-encoding is needed this allocates memory with |
| @code{R_alloc} which thus persists to the end of the |
| @code{.Call}/@code{.External} call unless @code{vmaxset} is used |
| (@pxref{Transient storage allocation}). |
| |
| There is a similar function @code{translateCharUTF8} which converts to |
| UTF-8: this has the advantage that a faithful translation is almost |
| always possible (whereas only a few languages can be represented in the |
| encoding of the current locale unless that is UTF-8). |
| |
| Both @code{translateChar} and @code{translateCharUTF8} will translate |
| any input, using escapes such as @samp{<A9>} and @samp{<U+0093>} to |
| represent untranslatable parts of the input. |
| |
| @findex getCharCE |
| @findex mkCharCE |
| There is a public interface to the encoding marked on @code{CHARSXP}s |
| @emph{via} |
| |
| @example |
| typedef enum @{CE_NATIVE, CE_UTF8, CE_LATIN1, CE_BYTES, CE_SYMBOL, CE_ANY@} cetype_t; |
| cetype_t getCharCE(SEXP); |
| SEXP mkCharCE(const char *, cetype_t); |
| @end example |
| |
| @noindent |
| Only @code{CE_UTF8} and @code{CE_LATIN1} are marked on @code{CHARSXP}s |
| (and so @code{Rf_getCharCE} will only return one of the first three), |
| and these should only be used on non-@acronym{ASCII} strings. Value |
| @code{CE_BYTES} is used to make @code{CHARSXP}s which should be regarded |
| as a set of bytes and not translated. Value @code{CE_SYMBOL} is used |
| internally to indicate Adobe Symbol encoding. Value @code{CE_ANY} is |
| used to indicate a character string that will not need re-encoding -- |
| this is used for character strings known to be in @acronym{ASCII}, and |
| can also be used as an input parameter where the intention is that the |
| string is treated as a series of bytes. (See the comments under |
| @code{mkChar} about the length of input allowed.) |
| |
| Function |
| |
| @findex reEnc |
| @example |
| const char *reEnc(const char *x, cetype_t ce_in, cetype_t ce_out, |
| int subst); |
| @end example |
| |
| @noindent |
| can be used to re-encode character strings: like @code{translateChar} it |
| returns a string allocated by @code{R_alloc}. This can translate from |
| @code{CE_SYMBOL} to @code{CE_UTF8}, but not conversely. Argument |
| @code{subst} controls what to do with untranslatable characters or |
| invalid input: this is done byte-by-byte, with @code{1} indicating that |
| hex of the form @code{<a0>} is output, @code{2} that the byte is |
| replaced by @code{.}, and any other value that the byte produces no |
| output. |
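| |
| The three documented choices for @code{subst} can be mimicked for a |
| single untranslatable byte in plain C. This only illustrates the output |
| described above and is not @R{}'s implementation; the helper name |
| @code{subst_byte} is hypothetical. |
| |
| @example |
| #include <stdio.h> |
| |
| /* Hypothetical helper: store in out (room for at least 5 chars) what is |
|    output for one untranslatable byte b: subst == 1 gives hex such as |
|    "<a0>", subst == 2 gives ".", and any other value gives nothing. */ |
| void subst_byte(unsigned char b, int subst, char *out) |
| @{ |
|     if (subst == 1) |
|         snprintf(out, 5, "<%02x>", (unsigned int) b); |
|     else if (subst == 2) |
|         snprintf(out, 5, "."); |
|     else |
|         out[0] = '\0'; |
| @} |
| @end example |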
| |
| @findex mkCharLenCE |
| There is also |
| |
| @example |
| SEXP mkCharLenCE(const char *, size_t, cetype_t); |
| @end example |
| |
| @noindent |
| to create marked character strings of a given length. |
| |
| |
| @node The R API, Generic functions and methods, System and foreign language interfaces, Top |
| @chapter The R @acronym{API}: entry points for C code |
| |
| @menu |
| * Memory allocation:: |
| * Error signaling:: |
| * Random numbers:: |
| * Missing and IEEE values:: |
| * Printing:: |
| * Calling C from Fortran and vice versa:: |
| * Numerical analysis subroutines:: |
| * Optimization:: |
| * Integration:: |
| * Utility functions:: |
| * Re-encoding:: |
| * Condition handling and cleanup code:: |
| * Allowing interrupts:: |
| * Platform and version information:: |
| * Inlining C functions:: |
| * Controlling visibility:: |
| * Standalone Mathlib:: |
| * Organization of header files:: |
| @end menu |
| |
| There are a large number of entry points in the @R{} executable/DLL that |
| can be called from C code (and some that can be called from Fortran |
| code). Only those documented here are stable enough that they will only |
| be changed with considerable notice. |
| |
| The recommended procedure to use these is to include the header file |
| @file{R.h} in your C code by |
| |
| @example |
| #include <R.h> |
| @end example |
| |
| @noindent |
| This will include several other header files from the directory |
| @file{@var{R_INCLUDE_DIR}/R_ext}, and there are other header files |
| there that can be included too, but many of the features they contain |
| should be regarded as undocumented and unstable. |
| |
| Most of these header files, including all those included by @file{R.h}, |
| can be used from C++ code. |
| |
| @quotation Note |
| Because @R{} re-maps many of its external names to avoid clashes with |
| user code, it is @emph{essential} to include the appropriate header |
| files when using these entry points. |
| @end quotation |
| |
| This remapping can cause problems@footnote{Known problems are redefining |
| @code{LENGTH}, @code{error}, @code{length}, @code{vector} and |
| @code{warning}.}, and can be eliminated by defining @code{R_NO_REMAP} and |
| prepending @samp{Rf_} to @emph{all} the function names used from |
| @file{Rinternals.h} and @file{R_ext/Error.h}. These problems can |
| usually be avoided by including other headers (such as system headers |
| and those for external software used by the package) before @file{R.h}. |
| |
| We can classify the entry points as |
| |
| @table @emph |
| @item API |
| Entry points which are documented in this manual and declared in an |
| installed header file. These can be used in distributed packages and |
| will only be changed after deprecation. |
| |
| @item public |
| Entry points declared in an installed header file that are exported |
| on all @R{} platforms but are not documented and subject to change |
| without notice. |
| |
| @item private |
| Entry points that are used when building @R{} and exported on all @R{} |
| platforms but are not declared in the installed header files. |
| Do not use these in distributed code. |
| |
| @item hidden |
| Entry points that are, where possible (on Windows and with some modern |
| Unix-alike compilers/loaders when using @R{} as a shared library), not |
| exported. |
| @end table |
| |
| @node Memory allocation, Error signaling, The R API, The R API |
| @section Memory allocation |
| @cindex Memory allocation from C |
| |
| @menu |
| * Transient storage allocation:: |
| * User-controlled memory:: |
| @end menu |
| |
| There are two types of memory allocation available to the C programmer, |
| one in which @R{} manages the clean-up and the other in which the user |
| has full control (and responsibility). |
| |
| @node Transient storage allocation, User-controlled memory, Memory allocation, Memory allocation |
| @subsection Transient storage allocation |
| @findex R_alloc |
| @findex R_allocLD |
| @findex S_alloc |
| @findex S_realloc |
| @findex vmaxget |
| @findex vmaxset |
| |
| Here @R{} will reclaim the memory at the end of the call to @code{.C}, |
| @code{.Call} or @code{.External}. Use |
| |
| @example |
| char *R_alloc(size_t @var{n}, int @var{size}) |
| @end example |
| |
| @noindent |
| which allocates @var{n} units of @var{size} bytes each. A typical usage |
| (from package @pkg{stats}) is |
| |
| @example |
| x = (int *) R_alloc(nrows(merge)+2, sizeof(int)); |
| @end example |
| |
| @noindent |
| (@code{size_t} is defined in @file{stddef.h} which the header defining |
| @code{R_alloc} includes.) |
| |
| There is a similar call, @code{S_alloc} (for compatibility with older |
| versions of @Sl{}) which zeroes the memory allocated, |
| |
| @example |
| char *S_alloc(long @var{n}, int @var{size}) |
| @end example |
| |
| @noindent |
| and |
| |
| @example |
| char *S_realloc(char *@var{p}, long @var{new}, long @var{old}, int @var{size}) |
| @end example |
| |
| @noindent |
| which changes the allocation size from @var{old} to @var{new} units, and |
| zeroes the additional units. |
| |
| For compatibility with current versions of @Sl{}, header @file{S.h} |
| (only) defines wrapper macros equivalent to |
| |
| @example |
| type* Salloc(long @var{n}, int @var{type}) |
| type* Srealloc(char *@var{p}, long @var{new}, long @var{old}, int @var{type}) |
| @end example |
| |
| This memory is taken from the heap, and released at the end of the |
| @code{.C}, @code{.Call} or @code{.External} call. Users can also manage |
| it, by noting the current position with a call to @code{vmaxget} and |
| subsequently clearing memory allocated by a call to @code{vmaxset}. An |
| example might be |
| |
| @example |
| void *vmax = vmaxget(); |
| // a loop involving the use of R_alloc at each iteration |
| vmaxset(vmax); |
| @end example |
| |
| @noindent |
| This is only recommended for experts. |
| |
| Note that this memory will be freed on error or user interrupt |
| (if allowed: @pxref{Allowing interrupts}). |
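| |
| Conceptually, @code{vmaxget} records the current top of @R{}'s stack of |
| transient allocations and @code{vmaxset} releases everything allocated |
| after that point. The toy arena below illustrates this mark/release |
| pattern in plain C; the names @code{Arena}, @code{arena_alloc}, |
| @code{arena_mark} and @code{arena_release} are hypothetical, and @R{}'s |
| actual implementation is different. |
| |
| @example |
| #include <stddef.h> |
| |
| #define ARENA_UNITS 128 |
| |
| /* Toy arena: storage is an array of doubles, so every block handed out |
|    is aligned for double, as R_alloc memory is. */ |
| typedef struct @{ |
|     double buf[ARENA_UNITS]; |
|     size_t top;                 /* number of doubles in use */ |
| @} Arena; |
| |
| /* Allocate nbytes (rounded up to whole doubles); NULL if full. */ |
| void *arena_alloc(Arena *a, size_t nbytes) |
| @{ |
|     size_t units = (nbytes + sizeof(double) - 1) / sizeof(double); |
|     if (units > ARENA_UNITS - a->top) |
|         return NULL; |
|     void *p = a->buf + a->top; |
|     a->top += units; |
|     return p; |
| @} |
| |
| /* Analogue of vmaxget(): remember the current position. */ |
| size_t arena_mark(const Arena *a) |
| @{ |
|     return a->top; |
| @} |
| |
| /* Analogue of vmaxset(): release everything allocated since 'mark'. */ |
| void arena_release(Arena *a, size_t mark) |
| @{ |
|     a->top = mark; |
| @} |
| @end example |
| |
| @noindent |
| A loop that takes a mark before each iteration and releases it at the |
| end therefore needs no more arena space than a single iteration uses. |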
| |
| The memory returned is only guaranteed to be aligned as required for |
| @code{double} pointers: take precautions if casting to a pointer which |
| needs more. There is also |
| |
| @example |
| long double *R_allocLD(size_t @var{n}) |
| @end example |
| |
| @noindent |
| which is guaranteed to have the 16-byte alignment needed for @code{long |
| double} pointers on some platforms. |
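| |
| When memory from @code{R_alloc} is to be used for a type that may need |
| stricter alignment than @code{double}, its address can be checked before |
| casting. A minimal C11 sketch follows; the helper name |
| @code{is_aligned_for} is hypothetical, and converting a pointer to |
| @code{uintptr_t} is implementation-defined, although it behaves as |
| expected on mainstream platforms. |
| |
| @example |
| #include <stddef.h> |
| #include <stdint.h> |
| |
| /* Hypothetical helper: nonzero if the address p is a multiple of |
|    'align', an alignment requirement such as _Alignof(long double). */ |
| int is_aligned_for(const void *p, size_t align) |
| @{ |
|     return ((uintptr_t) p % align) == 0; |
| @} |
| @end example |
| |
| @noindent |
| For example, @code{is_aligned_for(x, _Alignof(long double))} could guard |
| a cast of a pointer @code{x} returned by @code{R_alloc}; @code{R_allocLD} |
| avoids the need for such a check for @code{long double}. |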
| |
| |
| These functions should only be used in code called by @code{.C} etc, |
| never from front-ends. They are not thread-safe. |
| |
| @node User-controlled memory, , Transient storage allocation, Memory allocation |
| @subsection User-controlled memory |
| @findex Calloc |
| @findex Realloc |
| @findex Free |
| |
| The other form of memory allocation is an interface to @code{malloc} |
| which provides @R{} error signaling. This memory lasts until |
| freed by the user and is additional to the memory allocated for the @R{} |
| workspace. |
| |
| The interface functions are |
| |
| @example |
| @group |
| @var{type}* Calloc(size_t @var{n}, @var{type}) |
| @var{type}* Realloc(@var{any} *@var{p}, size_t @var{n}, @var{type}) |
| void Free(@var{any} *@var{p}) |
| @end group |
| @end example |
| |
| @noindent |
| providing analogues of @code{calloc}, @code{realloc} and @code{free}. |
| If there is an error during allocation it is handled by @R{}, so if |
| these routines return the memory has been successfully allocated or |
| freed. @code{Free} will set the pointer @var{p} to @code{NULL}. (Some |
| but not all versions of @Sl{} do so.) |
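| |
| The two guarantees above, that a return from an allocation means it |
| succeeded and that @code{Free} resets the pointer, can be sketched with |
| the standard @code{calloc} and @code{free}. This is an illustration of |
| the idiom only: the names @code{checked_calloc}, @code{CHECKED_CALLOC} |
| and @code{CHECKED_FREE} are hypothetical, and where @R{} signals an @R{} |
| error on failure this sketch calls @code{abort}. |
| |
| @example |
| #include <stdio.h> |
| #include <stdlib.h> |
| |
| /* Hypothetical helper: calloc that only returns if the allocation |
|    succeeded (R signals an R error instead of aborting). */ |
| void *checked_calloc(size_t n, size_t size) |
| @{ |
|     void *p = calloc(n, size); |
|     if (p == NULL && n > 0 && size > 0) @{ |
|         fprintf(stderr, "cannot allocate %zu units of %zu bytes\n", n, size); |
|         abort(); |
|     @} |
|     return p; |
| @} |
| |
| /* Typed allocation in the style of Calloc(n, type). */ |
| #define CHECKED_CALLOC(n, type) ((type *) checked_calloc((n), sizeof(type))) |
| |
| /* Free the memory and reset the pointer, in the style of Free(p). */ |
| #define CHECKED_FREE(p) (free(p), (p) = NULL) |
| @end example |
| |
| @noindent |
| After @code{CHECKED_FREE(x)} the pointer @code{x} is @code{NULL}, so a |
| second call is harmless because @code{free(NULL)} does nothing. |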
| |
| Users should arrange to @code{Free} this memory when no longer needed, |
| including on error or user interrupt. This can often be done most |
| conveniently from an @code{on.exit} action in the calling @R{} function |
| -- see @code{pwilcox} for an example. |
| |
| Do not assume that memory allocated by @code{Calloc}/@code{Realloc} |
| comes from the same pool as used by @code{malloc}: in particular do not |
| use @code{free} or @code{strdup} with it. |
| |
| Memory obtained by these functions should be aligned in the same way as |
| @code{malloc}, that is `suitably aligned for any kind of variable'. |
| |
| These entry points need to be prefixed by @code{R_} if |
| @code{STRICT_R_HEADERS} has been defined. |
| |
| |
| @node Error signaling, Random numbers, Memory allocation, The R API |
| @section Error signaling |
| @cindex Error signaling from C |
| |
| The basic error signaling routines are the equivalents of @code{stop} and |
| @code{warning} in @R{} code, and use the same interface. |
| |
| @example |
| @group |
| void error(const char * @var{format}, ...); |
| void warning(const char * @var{format}, ...); |
| @end group |
| @end example |
| |
| @noindent |
| These have the same call sequences as calls to @code{printf}, but in the |
| simplest case can be called with a single character string argument |
| giving the error message. (Don't do this if the string contains @samp{%} |
| or might otherwise be interpreted as a format.) |
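| |
| The caution about @samp{%} applies to any @code{printf}-style |
| interface: the first argument is a format, so a message that might |
| contain @samp{%} is best passed as an argument, as in |
| @code{error("%s", msg)}. The following plain-C sketch shows this with a |
| hypothetical formatter @code{format_msg} built on @code{vsnprintf}, |
| standing in for @code{error} and @code{warning}. |
| |
| @example |
| #include <stdarg.h> |
| #include <stdio.h> |
| |
| /* Hypothetical printf-style formatter: like error() and warning(), its |
|    'format' argument is a format string, not a plain message. */ |
| int format_msg(char *buf, size_t size, const char *format, ...) |
| @{ |
|     va_list ap; |
|     va_start(ap, format); |
|     int n = vsnprintf(buf, size, format, ap); |
|     va_end(ap); |
|     return n; |
| @} |
| @end example |
| |
| @noindent |
| Here @code{format_msg(buf, sizeof buf, "%s", "100% done")} copies the |
| message literally, whereas passing @code{"100% done"} as the format |
| would make @samp{% d} a conversion specification with no matching |
| argument. |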
| |
| If @code{STRICT_R_HEADERS} is not defined there is also an |
| @Sl{}-compatibility interface which uses calls of the form |
| |
| @example |
| @group |
| PROBLEM ...... ERROR |
| MESSAGE ...... WARN |
| PROBLEM ...... RECOVER(NULL_ENTRY) |
| MESSAGE ...... WARNING(NULL_ENTRY) |
| @end group |
| @end example |
| |
| @noindent |
| the last two being the forms available in all @Sl{} versions. Here |
| @samp{......} is a set of arguments to @code{printf}, so can be a string |
| or a format string followed by arguments separated by commas. |
| |
| @menu |
| * Error signaling from Fortran:: |
| @end menu |
| |
| @node Error signaling from Fortran, , Error signaling, Error signaling |
| @subsection Error signaling from Fortran |
| @cindex Error signaling from Fortran |
| |
| There are two interface functions provided to call @code{error} and |
| @code{warning} from Fortran code, in each case with a simple character |
| string argument. They are defined as |
| |
| @example |
| @group |
| subroutine rexit(@var{message}) |
| subroutine rwarn(@var{message}) |
| @end group |
| @end example |
| |
| Messages of more than 255 characters are truncated, with a warning. |
| |
| |
| @node Random numbers, Missing and IEEE values, Error signaling, The R API |
| @section Random number generation |
| @cindex Random numbers in C |
| @findex unif_rand |
| @findex norm_rand |
| @findex exp_rand |
| @findex R_unif_index |
| @findex GetRNGstate |
| @findex PutRNGstate |
| @findex .Random.seed |
| @findex seed_in |
| @findex seed_out |
| |
| The interface to @R{}'s internal random number generation routines is |
| |
| @example |
| @group |
| double unif_rand(); |
| double norm_rand(); |
| double exp_rand(); |
| double R_unif_index(double); |
| @end group |
| @end example |
| |
| @noindent |
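| giving one uniform, normal or exponential pseudo-random variate |
| respectively, while @code{R_unif_index(@var{dn})} gives a random |
| integer value (returned as a @code{double}) uniformly distributed over |
| @code{0, 1, @dots{}, @var{dn} - 1}. |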
| However, before these are used, the user must call |
| |
| @example |
| GetRNGstate(); |
| @end example |
| |
| @noindent |
| and after all the required variates have been generated, call |
| |
| @example |
| PutRNGstate(); |
| @end example |
| |
| @noindent |
| These essentially read in (or create) @code{.Random.seed} and write it |
| out after use. |
| |
| File @file{S.h} defines @code{seed_in} and @code{seed_out} for |
| @Sl{}-compatibility rather than @code{GetRNGstate} and |
| @code{PutRNGstate}. These take a @code{long *} argument which is |
| ignored. |
| |
| The random number generator is private to @R{}; there is no way to |
| select the kind of RNG or set the seed except by evaluating calls to the |
| @R{} functions. |
| |
| The C code behind @R{}'s @code{r@var{xxx}} functions can be accessed by |
| including the header file @file{Rmath.h}; @xref{Distribution |
| functions}. Those calls generate a single variate and should also be |
| enclosed in calls to @code{GetRNGstate} and @code{PutRNGstate}. |
| |
| @c MM: FIXME void rmultinom() is different, returning a vector! |
| |
| @node Missing and IEEE values, Printing, Random numbers, The R API |
| @section Missing and @acronym{IEEE} special values |
| @cindex Missing values |
| @cindex IEEE special values |
| @findex ISNA |
| @findex ISNAN |
| @findex R_FINITE |
| @findex R_IsNaN |
| @findex R_PosInf |
| @findex R_NegInf |
| @findex NA_REAL |
| |
| A set of functions is provided to test for @code{NA}, @code{Inf}, |
| @code{-Inf} and @code{NaN}. These functions are accessed @emph{via} macros: |
| |
| @example |
| @group |
| ISNA(@var{x}) @r{True for R's @code{NA} only} |
| ISNAN(@var{x}) @r{True for R's @code{NA} and @acronym{IEEE} @code{NaN}} |
| R_FINITE(@var{x}) @r{False for @code{Inf}, @code{-Inf}, @code{NA}, @code{NaN}} |
| @end group |
| @end example |
| |
| @noindent |
| and @emph{via} function @code{R_IsNaN} which is true for @code{NaN} but not |
| @code{NA}. |
| |
| Do use @code{R_FINITE} rather than @code{isfinite} or @code{finite}; the |
| latter is often mendacious and @code{isfinite} is only available on |
| some platforms, on which @code{R_FINITE} is a macro expanding to |
| @code{isfinite}. |
| |
| Currently in C code @code{ISNAN} is a macro calling @code{isnan}. |
| (Since this gives problems on some C++ systems, if the @R{} headers are |
| included in C++ code a function call is used.) |
| |
| You can check for @code{Inf} or @code{-Inf} by testing equality to |
| @code{R_PosInf} or @code{R_NegInf}, and set (but not test) an @code{NA} |
| as @code{NA_REAL}. |
| |
| All of the above apply to @emph{double} variables only. For integer |
| variables there is a variable accessed by the macro @code{NA_INTEGER} |
| which can be used to set or test for missingness. |
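| |
| A minimal sketch of testing for integer missingness follows. So that it |
| compiles without the @R{} headers, it supplies a stand-in for |
| @code{NA_INTEGER}, which in @R{} has the value @code{INT_MIN}. In |
| package code, include @file{R.h} and drop the stand-in. The function |
| name @code{sum_non_na} is hypothetical. |
| |
| @example |
| #include <limits.h> |
| |
| #ifndef NA_INTEGER |
| # define NA_INTEGER INT_MIN   /* stand-in for R's definition */ |
| #endif |
| |
| /* Hypothetical helper: sum x[0..n-1], skipping missing values and |
|    counting them.  The sum is kept in a double to avoid int overflow. */ |
| static double sum_non_na(const int *x, int n, int *n_missing) |
| @{ |
|     double s = 0.0; |
|     *n_missing = 0; |
|     for (int i = 0; i < n; i++) @{ |
|         if (x[i] == NA_INTEGER)   /* test by equality */ |
|             (*n_missing)++; |
|         else |
|             s += x[i]; |
|     @} |
|     return s; |
| @} |
| @end example |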
| |
| |
| @node Printing, Calling C from Fortran and vice versa, Missing and IEEE values, The R API |
| @section Printing |
| @cindex Printing from C |
| @findex Rprintf |
| @findex REprintf |
| @findex Rvprintf |
| @findex REvprintf |
| |
| The most useful function for printing from a C routine compiled into |
| @R{} is @code{Rprintf}. This is used in exactly the same way as |
| @code{printf}, but is guaranteed to write to @R{}'s output (which might |
| be a @acronym{GUI} console rather than a file, and can be re-directed by |
| @code{sink}). It is wise to write complete lines (including the |
| @code{"\n"}) before returning to @R{}. It is defined in |
| @file{R_ext/Print.h}. |
| |
| The function @code{REprintf} is similar but writes on the error stream |
| (@code{stderr}) which may or may not be different from the standard |
| output stream. |
| |
| Functions @code{Rvprintf} and @code{REvprintf} are analogues using the |
| @code{vprintf} interface. Because that is a C99@footnote{also part of |
| C++11.} interface, they are only defined by @file{R_ext/Print.h} in C++ |
| code if the macro @code{R_USE_C99_IN_CXX} is defined when it is |
| included. |
| |
| Another circumstance when it may be important to use these functions is |
| when using parallel computation on a cluster of computational nodes, as |
| their output will be re-directed/logged appropriately. |
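| |
| The @code{vprintf}-style entry points are what a package needs when it |
| wraps output in its own variadic function. In package code the |
| forwarding call would be @code{Rvprintf(fmt, ap)}. The sketch below |
| uses the standard @code{vsnprintf} in its place so that it compiles on |
| its own, and the name @code{pkg_report} is hypothetical. |
| |
| @example |
| #include <stdarg.h> |
| #include <stdio.h> |
| |
| /* Hypothetical variadic reporting function.  The va_list is forwarded |
|    to a v*printf-style function; in an R package that call would be |
|    Rvprintf(fmt, ap) rather than vsnprintf into a buffer. */ |
| static int pkg_report(char *buf, size_t size, const char *fmt, ...) |
| @{ |
|     va_list ap; |
|     va_start(ap, fmt); |
|     int n = vsnprintf(buf, size, fmt, ap); |
|     va_end(ap); |
|     return n; |
| @} |
| @end example |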
| |
| @menu |
| * Printing from Fortran:: |
| @end menu |
| |
| @node Printing from Fortran, , Printing, Printing |
| @subsection Printing from Fortran |
| @cindex Printing from Fortran |
| |
| On many systems Fortran @code{write} and @code{print} statements can be |
| used, but the output may not interleave well with that of C, and will be |
| invisible on @acronym{GUI} interfaces. They are not portable and best |
| avoided. |
| |
| Three subroutines are provided to ease the output of information from |
| Fortran code. |
| |
| @example |
| @group |
| subroutine dblepr(@var{label}, @var{nchar}, @var{data}, @var{ndata}) |
| subroutine realpr(@var{label}, @var{nchar}, @var{data}, @var{ndata}) |
| subroutine intpr (@var{label}, @var{nchar}, @var{data}, @var{ndata}) |
| @end group |
| @end example |
| |
| @noindent |
| Here @var{label} is a character label of up to 255 characters, |
| @var{nchar} is its length (which can be @code{-1} if the whole label is |
| to be used), and @var{data} is an array of length at least @var{ndata} |
| of the appropriate type (@code{double precision}, @code{real} and |
| @code{integer} respectively). These routines print the label on one |
| line and then print @var{data} as if it were an @R{} vector on |
| subsequent line(s). They work with zero @var{ndata}, and so can be used |
| to print a label alone. Note though that some compilers will give an |
| error or warning unless @var{data} is an array: others will accept a |
| scalar when @var{ndata} has value one or zero. |
| |
| @node Calling C from Fortran and vice versa, Numerical analysis subroutines, Printing, The R API |
| @section Calling C from Fortran and vice versa |
| @cindex Calling C from Fortran and vice versa |
| |
| Naming conventions for symbols generated by Fortran differ by platform: |
| it is not safe to assume that Fortran names appear to C with a trailing |
| underscore. To help cover up the platform-specific differences there is |
| a set of macros@footnote{The @samp{F77_} in the names is historical and |
| dates back to usage in @Sl{}.} that should be used. |
| |
| @table @code |
| @item F77_SUB(@var{name}) |
| to define a function in C to be called from Fortran |
| @item F77_NAME(@var{name}) |
| to declare a Fortran routine in C before use |
| @item F77_CALL(@var{name}) |
| to call a Fortran routine from C |
| @item F77_COMDECL(@var{name}) |
| to declare a Fortran common block in C |
| @item F77_COM(@var{name}) |
| to access a Fortran common block from C |
| @end table |
| |
| On most current platforms these are all the same, but it is unwise to |
| rely on this. Note that names containing underscores were not legal in |
| Fortran 77, and are not portably handled by the above macros. (Also, |
| all Fortran names for use by @R{} are lower case, but this is not |
| enforced by the macros.) |
| |
| For example, suppose we want to call R's normal random numbers from |
| Fortran. We need a C wrapper along the lines of |
| |
| @cindex Random numbers in Fortran |
| @example |
| @group |
| #include <R.h> |
| |
| void F77_SUB(rndstart)(void) @{ GetRNGstate(); @} |
| void F77_SUB(rndend)(void) @{ PutRNGstate(); @} |
| double F77_SUB(normrnd)(void) @{ return norm_rand(); @} |
| @end group |
| @end example |
| |
| @noindent |
| to be called from Fortran as in |
| |
| @example |
| @group |
| subroutine testit() |
| double precision normrnd, x |
| call rndstart() |
| x = normrnd() |
| call dblepr("X was", 5, x, 1) |
| call rndend() |
| end |
| @end group |
| @end example |
| |
| @noindent |
| Note that this is not guaranteed to be portable, for the return |
| conventions might not be compatible between the C and Fortran compilers |
| used. (Passing values @emph{via} arguments is safer.) |
| |
| The standard packages, for example @pkg{stats}, are a rich source of |
| further examples. |
| |
| Where supported, @emph{link time optimization} provides a reliable way |
| to check the consistency of calls to C from Fortran or @emph{vice |
| versa}. |
| @ifset UseExternalXrefs |
| @xref{Link-Time Optimization, , Link-Time Optimization, |
| R-admin, R Installation and Administration}. |
| @end ifset |
| One place where this occurs is the registration of @code{.Fortran} calls |
| in C code (@pxref{Registering native routines}). For example |
| @example |
| init.c:10:13: warning: type of 'vsom_' does not match original |
| declaration [-Wlto-type-mismatch] |
| extern void F77_NAME(vsom)(void *, void *, void *, void *, |
| void *, void *, void *, void *, void *); |
| vsom.f90:20:33: note: type mismatch in parameter 9 |
| subroutine vsom(neurons,dt,dtrows,dtcols,xdim,ydim,alpha,train) |
| vsom.f90:20:33: note: 'vsom' was previously declared here |
| @end example |
| shows that a subroutine has been registered with 9 arguments (as that is |
| what the @code{.Fortran} call used) but only has 8. |
| |
| @menu |
| * Fortran character strings:: |
| * Fortran LOGICAL:: |
| @end menu |
| |
| @node Fortran character strings, Fortran LOGICAL, Calling C from Fortran and vice versa, Calling C from Fortran and vice versa |
| @subsection Fortran character strings |
| |
| Passing character strings from C to Fortran or @emph{vice versa} is |
| not portable, but can be done with care. The internal representations |
| are different: a character array in C (or C++) is nul-terminated so its |
| length can be computed by @code{strlen}. Fortran character arrays are |
| typically stored as an array of bytes and a length. This matters when |
| passing strings from C to Fortran or @emph{vice versa}: in many cases |
| one has been able to get away with passing the string but not the |
| length. However, in 2019 this changed for @command{gfortran}, starting |
| with version 9 but backported to versions 7 and 8. Several months |
| later, @command{gfortran} 9.2 introduced an option |
| @example |
| -ftail-call-workaround |
| @end example |
| @noindent |
| and made it the current default but said it might be withdrawn in future. |
| |
| Suppose we want a function to report a message from Fortran to @R{}'s |
| console (one could use @code{intpr} with dummy data, but it might be the |
| basis of a custom reporting function). Suppose the equivalent in Fortran |
| would be |
| @example |
| subroutine rmsg(msg) |
| character*(*) msg |
| print *, msg |
| end |
| @end example |
| @noindent |
| in file @file{rmsg.f}. Using @command{gfortran} 9.2 and later we can |
| extract the C view by |
| @example |
| gfortran -c -fc-prototypes-external rmsg.f |
| @end example |
| @noindent |
| which gives |
| @example |
| void rmsg_ (char *msg, size_t msg_len); |
| @end example |
| @noindent |
| (where @code{size_t} applies to version 8 and later). We could re-write |
| that portably in C as |
| @example |
| #define USE_FC_LEN_T |
| #include <Rconfig.h> // included by R.h, so define USE_FC_LEN_T early |
| #include <R_ext/RS.h> // for F77_NAME |
| #include <string.h>   // for strncpy |
| |
| void F77_NAME(rmsg)(char *msg, FC_LEN_T msg_len) |
| @{ |
| char cmsg[msg_len+1]; |
| strncpy(cmsg, msg, msg_len); |
| cmsg[msg_len] = '\0'; // nul-terminate the string, to be sure |
| // do something with 'cmsg' |
| @} |
| @end example |
| @noindent |
| in code depending on @code{R(>= 3.6.2)}. For earlier versions of @R{} we |
| could just assume that @code{msg} is nul-terminated (not guaranteed, but |
| people have been getting away with it for many years), so the complete C |
| side might be |
| @example |
| #define USE_FC_LEN_T |
| #include <Rconfig.h> |
| #include <R_ext/RS.h> // for F77_NAME |
| #include <string.h>   // for strncpy |
| |
| #ifdef FC_LEN_T |
| void F77_NAME(rmsg)(char *msg, FC_LEN_T msg_len) |
| @{ |
| char cmsg[msg_len+1]; |
| strncpy(cmsg, msg, msg_len); |
| cmsg[msg_len] = '\0'; |
| // do something with 'cmsg' |
| @} |
| #else |
| void F77_NAME(rmsg)(char *msg) |
| @{ |
| // do something with 'msg' |
| @} |
| #endif |
| @end example |
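| |
| Fortran fixed-length character variables are padded on the right with |
| blanks rather than nul-terminated, so a C routine receiving @code{msg} |
| and @code{msg_len} may also want to drop trailing blanks. A minimal |
| self-contained sketch of such a conversion follows. The helper name |
| @code{fortran_to_cstring} is hypothetical, and the caller must |
| @code{free} the result, which is allocated with @code{malloc}. |
| |
| @example |
| #include <stdlib.h> |
| #include <string.h> |
| |
| /* Hypothetical helper: return a nul-terminated copy of the first len |
|    bytes of a Fortran character argument, without trailing blanks. |
|    Returns NULL if allocation fails; the caller must free() it. */ |
| static char *fortran_to_cstring(const char *s, size_t len) |
| @{ |
|     while (len > 0 && s[len - 1] == ' ') |
|         len--;                    /* drop Fortran blank padding */ |
|     char *out = malloc(len + 1); |
|     if (out == NULL) |
|         return NULL; |
|     memcpy(out, s, len);          /* s need not be nul-terminated */ |
|     out[len] = '\0'; |
|     return out; |
| @} |
| @end example |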
| |
| |
| An alternative is to use Fortran 2003 features@footnote{These started to |
| be implemented in compilers @emph{ca} 2007, e.g.@: in @command{gfortran} |
| 4.3.} to set up the Fortran routine to pass a C-compatible character |
| string. We could use something like |
| @example |
| module cfuncs |
| use iso_c_binding, only: c_char, c_null_char |
| interface |
| subroutine cmsg(msg) bind(C, name = 'cmsg') |
| use iso_c_binding, only: c_char |
| character(kind = c_char):: msg(*) |
| end subroutine cmsg |
| end interface |
| end module |
| |
| subroutine rmsg(msg) |
| use cfuncs |
| character(*) msg |
| call cmsg(msg//c_null_char) ! need to concatenate a nul terminator |
| end subroutine rmsg |
| @end example |
| @noindent |
| where the C side is simply |
| @example |
| void cmsg(const char *msg) |
| @{ |
| // do something with nul-terminated string 'msg' |
| @} |
| @end example |
| |
| Passing a variable-length string from C to Fortran is trickier, but all |
| the uses in BLAS and LAPACK are of a single character, and for these we |
| can write a wrapper in Fortran along the lines of |
| @example |
| subroutine c_dgemm(transa, transb, m, n, k, alpha, |
| + a, lda, b, ldb, beta, c, ldc) |
| + bind(C, name = 'Cdgemm') |
| use iso_c_binding, only : c_char, c_int, c_double |
| character(kind = c_char, len = 1):: transa, transb |
| integer(c_int):: m, n, k, lda, ldb, ldc |
| real(c_double):: alpha, beta, a(lda,*), b(ldb,*), c(ldc,*) |
| call dgemm(transa, transb, m, n, k, alpha, |
| + a, lda, b, ldb, beta, c, ldc) |
| end subroutine c_dgemm |
| @end example |
| @noindent |
| which is then called from C with declaration |
| @example |
| void |
| Cdgemm(const char *transa, const char *transb, const int *m, |
| const int *n, const int *k, const double *alpha, |
| const double *a, const int *lda, const double *b, const int *ldb, |
| const double *beta, double *c, const int *ldc); |
| @end example |
| |
| @noindent |
| Alternatively, do as @R{} does as from version 3.6.2 and pass |
| the character length(s) from C to Fortran. A portable way to do this is |
| @example |
| // before any R headers, or define in PKG_CPPFLAGS |
| #define USE_FC_LEN_T |
| #include <Rconfig.h> |
| #include <R_ext/BLAS.h> |
| #ifndef FCONE |
| # define FCONE |
| #endif |
| ... |
| F77_CALL(dgemm)("N", "T", &nrx, &ncy, &ncx, &one, x, |
| &nrx, y, &nry, &zero, z, &nrx FCONE FCONE); |
| @end example |
| @noindent |
| (Note there is no comma before or between the @code{FCONE} invocations.) |
| It is strongly recommended that packages which call from C/C++ |
| BLAS/LAPACK routines with character arguments adopt this approach. |
| |
| @node Fortran LOGICAL, , Fortran character strings, Calling C from Fortran and vice versa |
| @subsection Fortran LOGICAL |
| |
| Passing Fortran LOGICAL variables to/from C/C++ is potentially |
| compiler-dependent. Fortran compilers have long used a 32-bit integer |
| type so it is pretty portable to use @code{int *} on the C/C++ side. |
| However, recent versions of @command{gfortran} @emph{via} the option |
| @option{-fc-prototypes-external} say the C equivalent is |
| @code{int_least32_t *}: `Link-Time Optimization' will report @code{int |
| *} as a mismatch. |
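| |
| A sketch of the C side of such an interface follows. It uses the type |
| reported by @option{-fc-prototypes-external}. The routine name |
| @code{counttrue} is hypothetical; in a package it would be defined as |
| @code{F77_SUB(counttrue)} so that the Fortran symbol name is correct on |
| each platform. Treating any non-zero value as true is a cautious |
| assumption, since compilers need not agree on the value used for |
| @code{.TRUE.}. |
| |
| @example |
| #include <stdint.h> |
| |
| /* Hypothetical C routine to be called from Fortran as |
|      call counttrue(flags, n, nt) |
|    with LOGICAL flags(n) and INTEGER n, nt, all passed by reference. |
|    In package code, name it F77_SUB(counttrue). */ |
| void counttrue(const int_least32_t *flags, const int *n, int *nt) |
| @{ |
|     int count = 0; |
|     for (int i = 0; i < *n; i++) |
|         if (flags[i] != 0)        /* any non-zero value taken as .TRUE. */ |
|             count++; |
|     *nt = count; |
| @} |
| @end example |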
| |
| @node Numerical analysis subroutines, Optimization, Calling C from Fortran and vice versa, The R API |
| @section Numerical analysis subroutines |
| @cindex Numerical analysis subroutines from C |
| |
| @R{} contains a large number of mathematical functions for its own use, |
| for example numerical linear algebra computations and special functions. |
| |
| The header files @file{R_ext/BLAS.h}, @file{R_ext/Lapack.h} and |
| @file{R_ext/Linpack.h} contain declarations of the BLAS, LAPACK and |
| LINPACK linear algebra functions included in @R{}. These are expressed |
| as calls to Fortran subroutines, and they will also be usable from |
| users' Fortran code. Although not part of the official @acronym{API}, |
| this set of subroutines is unlikely to change (but might be |
| supplemented). |
| |
| The header file @file{Rmath.h} lists many other functions that are |
| available and documented in the following subsections. Many of these are |
| C interfaces to the code behind @R{} functions, so the @R{} function |
| documentation may give further details. |
| |
| @menu |
| * Distribution functions:: |
| * Mathematical functions:: |
| * Numerical Utilities:: |
| * Mathematical constants:: |
| @end menu |
| |
| @node Distribution functions, Mathematical functions, Numerical analysis subroutines, Numerical analysis subroutines |
| @subsection Distribution functions |
| @cindex Distribution functions from C |
| |
| The routines used to calculate densities, cumulative distribution |
| functions and quantile functions for the standard statistical |
| distributions are available as entry points. |
| |
| The arguments for the entry points follow the pattern of those for the |
| normal distribution: |
| |
| @example |
| @group |
| double dnorm(double @var{x}, double @var{mu}, double @var{sigma}, int @var{give_log}); |
| double pnorm(double @var{x}, double @var{mu}, double @var{sigma}, int @var{lower_tail}, |
| int @var{give_log}); |
| double qnorm(double @var{p}, double @var{mu}, double @var{sigma}, int @var{lower_tail}, |
| int @var{log_p}); |
| double rnorm(double @var{mu}, double @var{sigma}); |
| @end group |
| @end example |
| |
| @noindent |
| That is, the first argument gives the position for the density and CDF |
| and probability for the quantile function, followed by the |
| distribution's parameters. Argument @var{lower_tail} should be |
| @code{TRUE} (or @code{1}) for normal use, but can be @code{FALSE} (or |
| @code{0}) if the probability of the upper tail is desired or specified. |
| |
| Finally, @var{give_log} should be non-zero if the result is required on |
| log scale, and @var{log_p} should be non-zero if @var{p} has been |
| specified on log scale. |
| |
| Note that you directly get the cumulative (or ``integrated'') |
| @emph{hazard} function, @eqn{H(t) = - \log(1 - F(t)), H(t) = - log(1 - |
| F(t))}, by using |
| |
| @example |
| - p@var{dist}(t, ..., /*lower_tail = */ FALSE, /* give_log = */ TRUE) |
| @end example |
| |
| @noindent |
| or shorter (and more cryptic) @code{- p@var{dist}(t, ..., 0, 1)}. |
| @cindex cumulative hazard |
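| |
| To make the argument conventions concrete, here is a self-contained |
| sketch of a distribution function with the same @var{lower_tail} and |
| @var{log_p} flags. It covers the exponential distribution with |
| parameter @code{scale}, for which @code{H(t) = t/scale}. The name |
| @code{pexp_sketch} is hypothetical. In package code one would call |
| @code{pexp} from @file{Rmath.h}, which also checks its arguments. |
| |
| @example |
| #include <math.h> |
| |
| /* Hypothetical stand-in for pexp(x, scale, lower_tail, log_p), |
|    assuming x >= 0 and scale > 0 (no argument checking). */ |
| static double pexp_sketch(double x, double scale, int lower_tail, int log_p) |
| @{ |
|     double log_upper = -x / scale;          /* log(1 - F(x)) */ |
|     if (lower_tail)                         /* F(x) = 1 - exp(-x/scale) */ |
|         return log_p ? log(-expm1(log_upper)) : -expm1(log_upper); |
|     return log_p ? log_upper : exp(log_upper); |
| @} |
| |
| /* Cumulative hazard H(t) = -log(1 - F(t)), using the log upper tail */ |
| static double cumhaz_exp(double t, double scale) |
| @{ |
|     return -pexp_sketch(t, scale, /* lower_tail = */ 0, /* log_p = */ 1); |
| @} |
| @end example |
| |
| @noindent |
| For @code{t = 1000} and @code{scale = 1}, @code{1 - F(t)} underflows to |
| 0 in double precision, so @code{-log(1 - F(t))} would be @code{Inf}. |
| The log upper tail gives 1000 directly. |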
| |
| The random-variate generation routine @code{rnorm} returns one normal |
| variate. @xref{Random numbers}, for the protocol in using the |
| random-variate routines. |
| @cindex Random numbers in C |
| |
| Note that these argument sequences are (apart from the names and that |
| @code{rnorm} has no @var{n}) mainly the same as the corresponding @R{} |
| functions of the same name, so the documentation of the @R{} functions |
| can be used. Note that the exponential and gamma distributions are |
| parametrized by @code{scale} rather than @code{rate}. |
| |
| |
| For reference, the following table gives the basic name (to be prefixed |
| by @samp{d}, @samp{p}, @samp{q} or @samp{r} apart from the exceptions |
| noted) and distribution-specific arguments for the complete set of |
| distributions. |
| |
| @quotation |
| @multitable @columnfractions .28 .22 .30 |
| @item beta @tab @code{beta} @tab @code{a}, @code{b} |
| @item non-central beta @tab @code{nbeta} @tab @code{a}, @code{b}, @code{ncp} |
| @c in R shape1, shape2, ncp |
| @item binomial @tab @code{binom} @tab @code{n}, @code{p} |
| @item Cauchy @tab @code{cauchy} @tab @code{location}, @code{scale} |
| @item chi-squared @tab @code{chisq} @tab @code{df} |
| @item non-central chi-squared @tab @code{nchisq} @tab @code{df}, @code{ncp} |
| @item exponential @tab @code{exp} @tab @code{scale} (and @strong{not} @code{rate}) |
| @item F @tab @code{f} @tab @code{n1}, @code{n2} |
| @item non-central F @tab @code{nf} @tab @code{n1}, @code{n2}, @code{ncp} |
| @item gamma @tab @code{gamma} @tab @code{shape}, @code{scale} |
| @item geometric @tab @code{geom} @tab @code{p} |
| @item hypergeometric @tab @code{hyper} @tab @code{NR}, @code{NB}, @code{n} |
| @c in R m, n, k |
| @item logistic @tab @code{logis} @tab @code{location}, @code{scale} |
| @item lognormal @tab @code{lnorm} @tab @code{logmean}, @code{logsd} |
| @item negative binomial @tab @code{nbinom} @tab @code{size}, @code{prob} |
| @item normal @tab @code{norm} @tab @code{mu}, @code{sigma} |
| @item Poisson @tab @code{pois} @tab @code{lambda} |
| @item Student's t @tab @code{t} @tab @code{n} |
| @item non-central t @tab @code{nt} @tab @code{df}, @code{delta} |
| @item Studentized range @tab @code{tukey} (*) @tab @code{rr}, @code{cc}, @code{df} |
| @c in R nranges, nmeans, df |
| @item uniform @tab @code{unif} @tab @code{a}, @code{b} |
| @c in R min, max |
| @item Weibull @tab @code{weibull} @tab @code{shape}, @code{scale} |
| @item Wilcoxon rank sum @tab @code{wilcox} @tab @code{m}, @code{n} |
| @item Wilcoxon signed rank @tab @code{signrank} @tab @code{n} |
| @end multitable |
| @end quotation |
| |
| @noindent |
| Entries marked with an asterisk only have @samp{p} and @samp{q} |
| functions available, and none of the non-central distributions have |
| @samp{r} functions. After a call to @code{dwilcox}, @code{pwilcox} or |
| @code{qwilcox} the function @code{wilcox_free()} should be called, and |
| similarly for the signed rank functions. |
| |
| (If remapping is suppressed, the Normal distribution names are |
| @code{Rf_dnorm4}, @code{Rf_pnorm5} and @code{Rf_qnorm5}.) |
| |
| For the negative binomial distribution (@samp{nbinom}), in addition to the |
| @code{(size, prob)} parametrization, the alternative @code{(size, mu)} |
| parametrization is provided as well by functions @samp{[dpqr]nbinom_mu()}, |
| see @kbd{?NegBinomial} in @R{}. |
| |
| Functions @code{dpois_raw(x, *)} and @code{dbinom_raw(x, *)} are versions of the |
| Poisson and binomial probability mass functions which work continuously in |
| @code{x}, whereas @code{dbinom(x,*)} and @code{dpois(x,*)} only return |
| non-zero values for integer @code{x}. |
| @example |
| @group |
| double dbinom_raw(double x, double n, double p, double q, int give_log); |
| double dpois_raw (double x, double lambda, int give_log); |
| @end group |
| @end example |
| Note that @code{dbinom_raw()} gets both @eqn{p, p} and @eqn{q = 1-p, q = 1-p} which |
| may be advantageous when one of them is close to @eqn{1, 1}. |
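| |
| The benefit of passing @var{q} separately shows up even with a naive |
| formula. In the sketch below, the function @code{dbinom_naive} is |
| hypothetical and much less careful than @code{dbinom_raw}. Suppose the |
| caller knows @code{q = 1e-17} exactly. Then @code{p = 1 - q} rounds to |
| exactly 1 in double precision, so recomputing @var{q} as @code{1 - p} |
| gives 0. |
| |
| @example |
| #include <math.h> |
| |
| /* Hypothetical, naive binomial probability of x successes in n trials, |
|    taking both p and q = 1 - p; assumes 0 < x < n. */ |
| static double dbinom_naive(double x, double n, double p, double q) |
| @{ |
|     double lchoose_nx = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1); |
|     return exp(lchoose_nx + x * log(p) + (n - x) * log(q)); |
| @} |
| @end example |
| |
| @noindent |
| With @code{q = 1e-17} and @code{p = 1 - q}, |
| @code{dbinom_naive(9, 10, p, q)} is about @code{1e-16}, whereas |
| @code{dbinom_naive(9, 10, p, 1 - p)} is 0. |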
| |
| |
| @node Mathematical functions, Numerical Utilities, Distribution functions, Numerical analysis subroutines |
| @subsection Mathematical functions |
| |
| @findex gammafn |
| @findex lgammafn |
| @findex digamma |
| @findex trigamma |
| @findex tetragamma |
| @findex pentagamma |
| @findex psigamma |
| @cindex Gamma function |
| @deftypefun double gammafn (double @var{x}) |
| @deftypefunx double lgammafn (double @var{x}) |
| @deftypefunx double digamma (double @var{x}) |
| @deftypefunx double trigamma (double @var{x}) |
| @deftypefunx double tetragamma (double @var{x}) |
| @deftypefunx double pentagamma (double @var{x}) |
| @deftypefunx double psigamma (double @var{x}, double @var{deriv}) |
| The Gamma function, the natural logarithm of its absolute value and |
| first four derivatives and the n-th derivative of Psi, the digamma |
| function, which is the derivative of @code{lgammafn}. In other words, |
| @code{digamma(x)} is the same as @code{psigamma(x,0)}, |
| @code{trigamma(x) == psigamma(x,1)}, etc. |
| @end deftypefun |
| |
| @findex beta |
| @findex lbeta |
| @cindex Beta function |
| @deftypefun double beta (double @var{a}, double @var{b}) |
| @deftypefunx double lbeta (double @var{a}, double @var{b}) |
| The (complete) Beta function and its natural logarithm. |
| @end deftypefun |
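| |
| For moderate arguments, the logarithm of the Beta function can be |
| obtained from log-Gamma values as |
| @code{lgamma(a) + lgamma(b) - lgamma(a + b)}, as in the sketch below. |
| The sketch uses standard C @code{lgamma}, and the name |
| @code{lbeta_naive} is hypothetical. The naive difference can lose |
| accuracy through cancellation when @var{a} or @var{b} is large, and |
| @R{}'s @code{lbeta} is more careful than this. |
| |
| @example |
| #include <math.h> |
| |
| /* Hypothetical naive log-Beta for a, b > 0, via log-Gamma values. */ |
| static double lbeta_naive(double a, double b) |
| @{ |
|     return lgamma(a) + lgamma(b) - lgamma(a + b); |
| @} |
| @end example |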
| |
| @findex choose |
| @findex lchoose |
| @deftypefun double choose (double @var{n}, double @var{k}) |
| @deftypefunx double lchoose (double @var{n}, double @var{k}) |
| The number of combinations of @var{k} items chosen from @var{n} and |
| the natural logarithm of its absolute value, generalized to arbitrary real |
| @var{n}. @var{k} is rounded to the nearest integer (with a warning if |
| needed). |
| @end deftypefun |
| |
| @findex bessel_i |
| @findex bessel_j |
| @findex bessel_k |
| @findex bessel_y |
| @cindex Bessel functions |
| @deftypefun double bessel_i (double @var{x}, double @var{nu}, double @var{expo}) |
| @deftypefunx double bessel_j (double @var{x}, double @var{nu}) |
| @deftypefunx double bessel_k (double @var{x}, double @var{nu}, double @var{expo}) |
| @deftypefunx double bessel_y (double @var{x}, double @var{nu}) |
| Bessel functions of types I, J, K and Y with index @var{nu}. For |
| @code{bessel_i} and @code{bessel_k} there is the option to return |
| @w{exp(-@var{x}) I(@var{x}; @var{nu})} or @w{exp(@var{x}) K(@var{x}; |
| @var{nu})} if @var{expo} is 2. (Use @code{@var{expo} == 1} for unscaled |
| values.) |
| @end deftypefun |
| |
| |
| @node Numerical Utilities, Mathematical constants, Mathematical functions, Numerical analysis subroutines |
| @subsection Numerical Utilities |
| There are a few other numerical utility functions available as entry points. |
| |
| |
| @deftypefun double R_pow (double @var{x}, double @var{y}) |
| @deftypefunx double R_pow_di (double @var{x}, int @var{i}) |
| @code{R_pow(@var{x}, @var{y})} and @code{R_pow_di(@var{x}, @var{i})} |
| compute @code{@var{x}^@var{y}} and @code{@var{x}^@var{i}}, respectively, |
| using @code{R_FINITE} checks and returning the proper result (the same |
| as @R{}) for the cases where @var{x}, @var{y} or @var{i} are 0 or |
| missing or infinite or @code{NaN}. |
| @end deftypefun |
| |
| @deftypefun double log1p (double @var{x}) |
| Computes @code{log(1 + @var{x})} (@emph{log 1 @b{p}lus x}), accurately |
| even for small @var{x}, i.e., @eqn{|x| \ll 1, |x| << 1}. |
| |
| This should be provided by your platform, in which case it is not |
| included in @file{Rmath.h}, but is (probably) in @file{math.h} which |
| @file{Rmath.h} includes (except under C++, so it may not be declared for |
| C++98). |
| @end deftypefun |
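| |
| The following self-contained sketch shows why @code{log1p} matters. |
| For @code{x = 1e-17}, @code{1 + x} rounds to exactly 1 in double |
| precision, so @code{log(1 + x)} returns 0, whereas @code{log1p(x)} |
| returns approximately @code{x}. It uses the C99 @code{log1p} from |
| @file{math.h}, and the helper name is hypothetical. |
| |
| @example |
| #include <math.h> |
| |
| /* Hypothetical helper comparing two ways of computing log(1 + x). */ |
| static void log1p_vs_naive(double x, double *naive, double *accurate) |
| @{ |
|     *naive = log(1.0 + x);   /* 1.0 + x rounds to 1.0 for tiny x */ |
|     *accurate = log1p(x);    /* accurate for small |x| */ |
| @} |
| @end example |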
| |
| @deftypefun double log1pmx (double @var{x}) |
| Computes @code{log(1 + @var{x}) - @var{x}} (@emph{log 1 @b{p}lus x @b{m}inus @b{x}}), |
| accurately even for small @var{x}, i.e., @eqn{|x| \ll 1, |x| << 1}. |
| @end deftypefun |
| |
| @deftypefun double log1pexp (double @var{x}) |
| Computes @code{log(1 + exp(@var{x}))} (@emph{log 1 @b{p}lus @b{exp}}), |
| accurately, notably for large @var{x}, e.g., @eqn{x > 720, x > 720}. |
| @end deftypefun |
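| |
| One simple way to get this behaviour is to switch formulas on the sign |
| of @var{x}, using @code{log(1 + exp(x)) = x + log1p(exp(-x))}. The |
| sketch below never evaluates @code{exp} of a large positive number, so |
| it does not overflow for, say, @code{x = 800}, where the naive formula |
| returns @code{Inf}. The name @code{log1pexp_sketch} and the cut-off at |
| 0 are illustrative choices, not @R{}'s implementation. |
| |
| @example |
| #include <math.h> |
| |
| /* Hypothetical sketch of log(1 + exp(x)) without overflow. */ |
| static double log1pexp_sketch(double x) |
| @{ |
|     if (x <= 0) |
|         return log1p(exp(x));       /* exp(x) <= 1: no overflow */ |
|     return x + log1p(exp(-x));      /* exp(-x) < 1 */ |
| @} |
| @end example |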
| |
| @c log1mexp(.) to come |
| |
| @deftypefun double expm1 (double @var{x}) |
| Computes @code{exp(@var{x}) - 1} (@emph{exp x @b{m}inus 1}), accurately |
| even for small @var{x}, i.e., @eqn{|x| \ll 1, |x| << 1}. |
| |
| This should be provided by your platform, in which case it is not |
| included in @file{Rmath.h}, but is (probably) in @file{math.h} which |
| @file{Rmath.h} includes (except under C++, so it may not be declared for |
| C++98). |
| @end deftypefun |
| |
| @deftypefun double lgamma1p (double @var{x}) |
| Computes @code{log(gamma(@var{x} + 1))} (@emph{log(gamma(1 @b{p}lus x))}), |
| accurately even for small @var{x}, i.e., @eqn{0 < x < 0.5, 0 < x < 0.5}. |
| @end deftypefun |
| |
| @deftypefun double cospi (double @var{x}) |
| Computes @code{cos(pi * x)} (where @code{pi} is 3.14159...), |
| accurately, notably for half integer @var{x}. |
| |
| This might be provided by your platform@footnote{It is an optional C11 |
| extension.}, in which case it is not included in @file{Rmath.h}, but is |
| in @file{math.h} which @file{Rmath.h} includes. (Ensure that |
| neither @file{math.h} nor @file{cmath} is included before |
| @file{Rmath.h} or define |
| @example |
| #define __STDC_WANT_IEC_60559_FUNCS_EXT__ 1 |
| @end example |
| @noindent |
| before the first inclusion.) |
| @end deftypefun |
| |
| @deftypefun double sinpi (double @var{x}) |
| Computes @code{sin(pi * x)} accurately, notably for (half) integer @var{x}. |
| |
| This might be provided by your platform, in which case it is not |
| included in @file{Rmath.h}, but is in @file{math.h} which @file{Rmath.h} |
| includes (but see the comments for @code{cospi}). |
| @end deftypefun |
| |
| @deftypefun double tanpi (double @var{x}) |
| Computes @code{tan(pi * x)} accurately, notably for (half) integer @var{x}. |
| |
| This might be provided by your platform, in which case it is not included |
| in @file{Rmath.h}, but is in @file{math.h} which @file{Rmath.h} includes |
| (but see the comments for @code{cospi}). |
| @end deftypefun |
| |
| @deftypefun double logspace_add (double @var{logx}, double @var{logy}) |
| @deftypefunx double logspace_sub (double @var{logx}, double @var{logy}) |
| @deftypefunx double logspace_sum (const double* @var{logx}, int @var{n}) |
| Compute the log of a sum or difference from logs of terms, i.e., ``x + |
| y'' as @code{log (exp(@var{logx}) + exp(@var{logy}))} and ``x - y'' as |
| @code{log (exp(@var{logx}) - exp(@var{logy}))}, |
| and ``sum_i x[i]'' as @code{log (sum[i = 1:@var{n} exp(@var{logx}[i])] )} |
| without causing unnecessary overflows or throwing away too much accuracy. |
| @end deftypefun |
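| |
| The usual technique is to factor out the larger term, for example |
| @code{log(exp(a) + exp(b)) = max(a, b) + log1p(exp(-|a - b|))}. A |
| self-contained sketch of the sum and difference for finite arguments |
| follows. The names are hypothetical. For the difference, it chooses |
| between @code{log(-expm1(d))} and @code{log1p(-exp(d))} according to |
| @code{d}, which keeps nearly equal terms accurate. |
| |
| @example |
| #include <math.h> |
| |
| /* Hypothetical sketch: log(exp(logx) + exp(logy)) without overflow. */ |
| static double logspace_add_sketch(double logx, double logy) |
| @{ |
|     double hi = fmax(logx, logy), lo = fmin(logx, logy); |
|     return hi + log1p(exp(lo - hi));      /* exp(lo - hi) <= 1 */ |
| @} |
| |
| /* Hypothetical sketch: log(exp(logx) - exp(logy)), assuming logx >= logy. |
|    log(1 - exp(d)) for d <= 0 is computed in one of two ways. */ |
| static double logspace_sub_sketch(double logx, double logy) |
| @{ |
|     double d = logy - logx;               /* d <= 0 */ |
|     double l1mexp = (d > -log(2.0)) ? log(-expm1(d)) : log1p(-exp(d)); |
|     return logx + l1mexp; |
| @} |
| @end example |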
| |
| @deftypefun int imax2 (int @var{x}, int @var{y}) |
| @deftypefunx int imin2 (int @var{x}, int @var{y}) |
| @deftypefunx double fmax2 (double @var{x}, double @var{y}) |
| @deftypefunx double fmin2 (double @var{x}, double @var{y}) |
| Return the larger (@code{max}) or smaller (@code{min}) of two integer or |
| double numbers, respectively. Note that @code{fmax2} and @code{fmin2} |
| differ from C99/C++11's @code{fmax} and @code{fmin} when one of the |
| arguments is a @code{NaN}: these versions return @code{NaN}. |
| @end deftypefun |
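| |
| The difference can be shown with a self-contained sketch that |
| propagates @code{NaN} in this way. The name @code{fmax2_sketch} is |
| hypothetical; in package code use @code{fmax2} from @file{Rmath.h}. |
| C99's @code{fmax(NAN, 1.0)} returns @code{1.0}, while the sketch returns |
| @code{NaN}. |
| |
| @example |
| #include <math.h> |
| |
| /* Hypothetical NaN-propagating maximum of two doubles. */ |
| static double fmax2_sketch(double x, double y) |
| @{ |
|     if (isnan(x) || isnan(y)) |
|         return x + y;            /* a sum involving a NaN is a NaN */ |
|     return (x < y) ? y : x; |
| @} |
| @end example |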
| |
| @deftypefun double sign (double @var{x}) |
| Compute the @emph{signum} function, where sign(@var{x}) is 1, 0, or |
| @math{-1}, when @var{x} is positive, 0, or negative, respectively, and |
| @code{NaN} if @code{x} is a @code{NaN}. |
| @end deftypefun |
| |
| @deftypefun double fsign (double @var{x}, double @var{y}) |
| Performs ``transfer of sign'' and is defined as @eqn{|x| * |
| \hbox{sign}(y), |x| * sign(y)}. |
| @end deftypefun |
| |
| @deftypefun double fprec (double @var{x}, double @var{digits}) |
| Returns the value of @var{x} rounded to @var{digits} @emph{significant} |
| decimal digits. |
| |
| This is the function used by @R{}'s @code{signif()}. |
| @end deftypefun |
| |
| @deftypefun double fround (double @var{x}, double @var{digits}) |
| Returns the value of @var{x} rounded to @var{digits} decimal digits |
| (after the decimal point). |
| |
| This is the function used by @R{}'s @code{round()}. (Note that C99/C++11 |
| provide a @code{round} function but C++98 need not.) |
| @end deftypefun |
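| |
| The difference between rounding to decimal places and rounding to |
| significant digits can be seen in the following naive sketch. The |
| function names are hypothetical. C's @code{round} breaks ties away from |
| zero, which differs from @R{}'s @code{round()}, and @R{}'s functions |
| take more care over edge cases and representation error than this. |
| |
| @example |
| #include <math.h> |
| |
| /* Hypothetical: round x to 'places' digits after the decimal point |
|    (places >= 0).  Ties go away from zero, unlike R's round(). */ |
| static double round_places(double x, int places) |
| @{ |
|     double p10 = pow(10.0, places); |
|     return round(x * p10) / p10; |
| @} |
| |
| /* Hypothetical: round x to 'digits' significant digits (digits >= 1). */ |
| static double round_signif(double x, int digits) |
| @{ |
|     if (x == 0 || isnan(x) || isinf(x)) |
|         return x; |
|     int e = (int) floor(log10(fabs(x))) + 1;  /* digits before the point */ |
|     int shift = digits - e; |
|     if (shift >= 0) @{ |
|         double p10 = pow(10.0, shift); |
|         return round(x * p10) / p10; |
|     @} |
|     double p10 = pow(10.0, -shift); |
|     return round(x / p10) * p10; |
| @} |
| @end example |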
| |
| @deftypefun double ftrunc (double @var{x}) |
| Returns the value of @var{x} truncated (to an integer value) towards |
| zero. |
| @end deftypefun |
| |
| @node Mathematical constants, , Numerical Utilities, Numerical analysis subroutines |
| @subsection Mathematical constants |
| @findex M_E |
| @findex M_PI |
| @c maybe not all into the index ... |
| |
@R{} has a set of commonly used mathematical constants, comprising those
defined by POSIX and usually@footnote{But see the second paragraph of
@ref{Portable C and C++ code}.} found in @file{math.h} (but perhaps not
in the C++ header @file{cmath}), together with further ones used in
statistical computations. These are defined to (at least) 30 digits
accuracy in @file{Rmath.h}. The following definitions use @code{ln(x)}
for the natural logarithm (@code{log(x)} in @R{}).
| |
| @quotation |
| @multitable {Name can be long} {Definition (needs space)} {0.123456789012345678 ...} |
| @headitem Name @tab Definition (@code{ln = log}) @tab round(@emph{value}, 7) |
| @c SVID & X/Open Constants -- names from Solaris math.h : |
| @item @code{M_E} @tab @math{e} @tab 2.7182818 |
| @item @code{M_LOG2E} @tab log2(@math{e}) @tab 1.4426950 |
| @item @code{M_LOG10E} @tab log10(@math{e}) @tab 0.4342945 |
| @item @code{M_LN2} @tab ln(2) @tab 0.6931472 |
| @item @code{M_LN10} @tab ln(10) @tab 2.3025851 |
| @item @code{M_PI} @tab @eqn{\pi, pi} @tab 3.1415927 |
| @item @code{M_PI_2} @tab @eqn{\pi/2, pi/2} @tab 1.5707963 |
| @item @code{M_PI_4} @tab @eqn{\pi/4, pi/4} @tab 0.7853982 |
| @item @code{M_1_PI} @tab @eqn{1/\pi, 1/pi} @tab 0.3183099 |
| @item @code{M_2_PI} @tab @eqn{2/\pi, 2/pi} @tab 0.6366198 |
| @item @code{M_2_SQRTPI} @tab 2/sqrt(@eqn{\pi, pi}) @tab 1.1283792 |
| @item @code{M_SQRT2} @tab sqrt(2) @tab 1.4142136 |
| @item @code{M_SQRT1_2} @tab 1/sqrt(2) @tab 0.7071068 |
| @c R-specific ones |
| @item @code{M_SQRT_3} @tab sqrt(3) @tab 1.7320508 |
| @item @code{M_SQRT_32} @tab sqrt(32) @tab 5.6568542 |
| @item @code{M_LOG10_2} @tab log10(2) @tab 0.3010300 |
| @item @code{M_2PI} @tab @eqn{2\pi, 2*pi} @tab 6.2831853 |
| @item @code{M_SQRT_PI} @tab sqrt(@eqn{\pi, pi}) @tab 1.7724539 |
| @item @code{M_1_SQRT_2PI} @tab 1/sqrt(@eqn{2\pi, 2*pi}) @tab 0.3989423 |
| @item @code{M_SQRT_2dPI} @tab sqrt(2/@eqn{\pi, pi}) @tab 0.7978846 |
| @item @code{M_LN_SQRT_PI} @tab ln(sqrt(@eqn{\pi, pi})) @tab 0.5723649 |
| @item @code{M_LN_SQRT_2PI} @tab ln(sqrt(@eqn{2\pi, 2*pi})) @tab 0.9189385 |
| @item @code{M_LN_SQRT_PId2} @tab ln(sqrt(@eqn{\pi, pi}/2)) @tab 0.2257914 |
| @end multitable |
| @end quotation |
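The following sketch shows a typical use of these constants: computing the standard normal density and its logarithm. The function names are hypothetical. The @code{#ifndef} fallbacks are included only so that the sketch compiles without @R{}'s headers. In package code, include @file{Rmath.h} and omit them.

@example
#include <math.h>

/* In package code these come from <Rmath.h>; the fallbacks below are
   only so that this sketch compiles on its own. */
#ifndef M_1_SQRT_2PI
# define M_1_SQRT_2PI  0.398942280401432677939946059934 /* 1/sqrt(2 pi) */
#endif
#ifndef M_LN_SQRT_2PI
# define M_LN_SQRT_2PI 0.918938533204672741780329736406 /* ln(sqrt(2 pi)) */
#endif

/* Standard normal density: exp(-x^2/2) / sqrt(2 pi). */
double std_norm_dens(double x)
@{
    return M_1_SQRT_2PI * exp(-0.5 * x * x);
@}

/* Its logarithm, -(ln(sqrt(2 pi)) + x^2/2), computed without exp()
   so that it stays finite far into the tails. */
double std_norm_logdens(double x)
@{
    return -(M_LN_SQRT_2PI + 0.5 * x * x);
@}
@end example

For example, at @code{x = 100} the density underflows to zero, but the log density is still a finite value (about @code{-5000.92}).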
| |
There is also a set of constants (such as @code{PI} and
@code{DOUBLE_EPS}) defined (unless @code{STRICT_R_HEADERS} is defined)
in the included header @file{R_ext/Constants.h}, mainly for
compatibility with @Sl{}.
| |
| @findex TRUE |
| @findex FALSE |
| Further, the included header @file{R_ext/Boolean.h} has enumeration |
| constants @code{TRUE} and @code{FALSE} of type @code{Rboolean} in |
| order to provide a way of using ``logical'' variables in C consistently. |
| This can conflict with other software: for example it conflicts with the |
| headers in IJG's @code{jpeg-9} (but not earlier versions). |
| |
| |
| @node Optimization, Integration, Numerical analysis subroutines, The R API |
| @section Optimization |
| @cindex optimization |
| |
The C code underlying @code{optim} can be accessed directly. The user
needs to supply a function that computes the value of the objective
function to be minimized, of the type
| |
| @example |
| typedef double optimfn(int n, double *par, void *ex); |
| @end example |
| |
| @noindent |
| where the first argument is the number of parameters in the second |
| argument. The third argument is a pointer passed down from the calling |
| routine, normally used to carry auxiliary information. |
| |
| Some of the methods also require a gradient function |
| |
| @example |
| typedef void optimgr(int n, double *par, double *gr, void *ex); |
| @end example |
| |
| @noindent |
| which passes back the gradient in the @code{gr} argument. No function |
| is provided for finite-differencing, nor for approximating the Hessian |
| at the result. |
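The following sketch defines an objective function and its gradient that match the @code{optimfn} and @code{optimgr} types above. It uses the Rosenbrock function from the examples on @code{optim}'s help page. The typedefs are copied from the text so that the sketch compiles on its own; in package code they come from @file{R_ext/Applic.h} and should not be redeclared. The structure @code{rosen_data}, passed through @code{ex}, is a hypothetical way of carrying a coefficient.

@example
/* Copied from the text; in package code these come from
   <R_ext/Applic.h> and should not be redeclared. */
typedef double optimfn(int n, double *par, void *ex);
typedef void optimgr(int n, double *par, double *gr, void *ex);

/* Hypothetical auxiliary data passed through the 'ex' pointer. */
typedef struct @{
    double a;   /* coefficient of the first term, 100 classically */
@} rosen_data;

/* Rosenbrock function a*(x2 - x1^2)^2 + (1 - x1)^2; n must be 2. */
double rosen_fn(int n, double *par, void *ex)
@{
    const rosen_data *d = (const rosen_data *) ex;
    double x1 = par[0], x2 = par[1];
    (void) n;
    return d->a * (x2 - x1 * x1) * (x2 - x1 * x1) + (1 - x1) * (1 - x1);
@}

/* Its analytic gradient, written into gr[0] and gr[1]. */
void rosen_gr(int n, double *par, double *gr, void *ex)
@{
    const rosen_data *d = (const rosen_data *) ex;
    double x1 = par[0], x2 = par[1];
    (void) n;
    gr[0] = -4 * d->a * x1 * (x2 - x1 * x1) - 2 * (1 - x1);
    gr[1] = 2 * d->a * (x2 - x1 * x1);
@}

/* Compile-time check that the functions have the required types. */
optimfn *rosen_fn_ptr = rosen_fn;
optimgr *rosen_gr_ptr = rosen_gr;
@end example

These would then be passed as @code{fn} and @code{gr} to, say, @code{vmmin}, with a pointer to a @code{rosen_data} as @code{ex}.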
| |
| The interfaces (defined in header @file{R_ext/Applic.h}) are |
| |
| @itemize @bullet |
| @item Nelder Mead: |
| @findex nmmin |
| @example |
| void nmmin(int n, double *xin, double *x, double *Fmin, optimfn fn, |
| int *fail, double abstol, double intol, void *ex, |
| double alpha, double beta, double gamma, int trace, |
| int *fncount, int maxit); |
| @end example |
| |
| @item BFGS: |
| @findex vmmin |
| @example |
| void vmmin(int n, double *x, double *Fmin, |
| optimfn fn, optimgr gr, int maxit, int trace, |
| int *mask, double abstol, double reltol, int nREPORT, |
| void *ex, int *fncount, int *grcount, int *fail); |
| @end example |
| |
| @item Conjugate gradients: |
| @findex cgmin |
| @example |
| void cgmin(int n, double *xin, double *x, double *Fmin, |
| optimfn fn, optimgr gr, int *fail, double abstol, |
| double intol, void *ex, int type, int trace, |
| int *fncount, int *grcount, int maxit); |
| @end example |
| |
| @item Limited-memory BFGS with bounds: |
| @findex lbfgsb |
| @example |
| void lbfgsb(int n, int lmm, double *x, double *lower, |
| double *upper, int *nbd, double *Fmin, optimfn fn, |
| optimgr gr, int *fail, void *ex, double factr, |
| double pgtol, int *fncount, int *grcount, |
| int maxit, char *msg, int trace, int nREPORT); |
| @end example |
| |
| @item Simulated annealing: |
| @findex samin |
| @example |
| void samin(int n, double *x, double *Fmin, optimfn fn, int maxit, |
| int tmax, double temp, int trace, void *ex); |
| @end example |
| |
| @end itemize |
| |
| @noindent |
Many of the arguments are common to the various methods. @code{n} is
the number of parameters, @code{x} or @code{xin} holds the starting
parameters on entry and @code{x} the final parameters on exit, with the
final value returned in @code{Fmin}. Most of the other parameters are
described on the help page for @code{optim}: see the source code
@file{src/appl/lbfgsb.c} for the values of @code{nbd}, which
specifies which bounds are to be used.
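The following sketch fills @code{nbd} from @code{lower} and @code{upper}, treating an infinite bound as absent. It assumes the usual L-BFGS-B coding: 0 for unbounded, 1 for a lower bound only, 2 for both bounds and 3 for an upper bound only. Check this assumption against the comments in @file{src/appl/lbfgsb.c}. The function name is hypothetical.

@example
#include <math.h>

/* Hypothetical helper: derive L-BFGS-B bound codes nbd[] from lower[]
   and upper[], where an infinite bound means "no bound" (assumed
   coding: 0 none, 1 lower only, 2 both, 3 upper only). */
void set_nbd(int n, const double *lower, const double *upper, int *nbd)
@{
    for (int i = 0; i < n; i++) @{
        int has_lo = isfinite(lower[i]), has_up = isfinite(upper[i]);
        if (has_lo && has_up)  nbd[i] = 2;
        else if (has_lo)       nbd[i] = 1;
        else if (has_up)       nbd[i] = 3;
        else                   nbd[i] = 0;
    @}
@}
@end example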
| |
| |
| @node Integration, Utility functions, Optimization, The R API |
| @section Integration |
| @cindex integration |
| |
| The C code underlying @code{integrate} can be accessed directly. The |
| user needs to supply a @emph{vectorizing} C function to compute the |
| function to be integrated, of the type |
| |
| @example |
| typedef void integr_fn(double *x, int n, void *ex); |
| @end example |
| |
| @noindent |
| where @code{x[]} is both input and output and has length @code{n}, i.e., |
| a C function, say @code{fn}, of type @code{integr_fn} must basically do |
@code{for(i in 1:n) x[i] := f(x[i], ex)}. The vectorization requirement
allows the integrand to be computed more efficiently than by calling it
@code{n} times. Note that in the current implementation built on
QUADPACK, @code{n} will be either 15 or 21. The @code{ex} argument is a pointer
| passed down from the calling routine, normally used to carry auxiliary |
| information. |
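Here is a minimal sketch of such a vectorizing integrand for @code{f(x) = exp(-lambda * x)}, with the rate @code{lambda} passed through @code{ex}. The function name is hypothetical. The typedef is copied from above; in package code it comes from @file{R_ext/Applic.h}.

@example
#include <math.h>

/* Copied from the text; in package code this comes from
   <R_ext/Applic.h> and should not be redeclared. */
typedef void integr_fn(double *x, int n, void *ex);

/* Overwrite each x[i] with exp(-lambda * x[i]), where lambda is the
   double pointed to by ex. */
void exp_integrand(double *x, int n, void *ex)
@{
    double lambda = *(double *) ex;
    for (int i = 0; i < n; i++)
        x[i] = exp(-lambda * x[i]);
@}

integr_fn *exp_integrand_ptr = exp_integrand;  /* type check */
@end example

You could then pass @code{exp_integrand} as @code{f} and @code{&lambda} as @code{ex} to the infinite-range interface, with lower bound 0. This should approximate @code{1/lambda}.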
| |
| There are interfaces (defined in header @file{R_ext/Applic.h}) for |
| integrals over finite and infinite intervals (or ``ranges'' or |
| ``integration boundaries''). |
| |
| @itemize @bullet |
| @item Finite: |
| @findex Rdqags |
| @example |
| void Rdqags(integr_fn f, void *ex, double *a, double *b, |
| double *epsabs, double *epsrel, |
| double *result, double *abserr, int *neval, int *ier, |
| int *limit, int *lenw, int *last, |
| int *iwork, double *work); |
| @end example |
| |
| @item Infinite: |
| @findex Rdqagi |
| @example |
| void Rdqagi(integr_fn f, void *ex, double *bound, int *inf, |
| double *epsabs, double *epsrel, |
| double *result, double *abserr, int *neval, int *ier, |
| int *limit, int *lenw, int *last, |
| int *iwork, double *work); |
| @end example |
| |
| @end itemize |
| |
| @noindent |
Only the 3rd and 4th arguments differ for the two integrators: for the
finite range integral using @code{Rdqags}, @code{a} and @code{b} are the
integration interval bounds, whereas for an infinite range integral using
@code{Rdqagi}, @code{bound} is the finite bound of the integration (if
the integral is not doubly-infinite) and @code{inf} is a code indicating
the kind of integration range:
| |
| @table @code |
| @item inf = 1 |
| corresponds to (bound, +Inf), |
| @item inf = -1 |
| corresponds to (-Inf, bound), |
| @item inf = 2 |
| corresponds to (-Inf, +Inf), |
| @end table |
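The choice between the two integrators, and the values of @code{bound} and @code{inf}, can be derived from the limits of integration. The following sketch does this roughly as @R{}'s @code{integrate()} does, but that correspondence is an assumption, not a copy of its code. The function name is hypothetical, and the sketch assumes that the lower limit is less than the upper one. It returns 0 when @code{Rdqags} applies and 1 when @code{Rdqagi} should be used.

@example
#include <math.h>

/* Hypothetical helper: decide which QUADPACK interface to use for the
   interval (a, b), assuming a < b.  Returns 0 for Rdqags (finite
   range), or 1 for Rdqagi, in which case *bound and *inf are set. */
int classify_range(double a, double b, double *bound, int *inf)
@{
    if (isfinite(a) && isfinite(b))
        return 0;
    if (isfinite(a)) @{          /* (a, +Inf) */
        *bound = a; *inf = 1;
    @} else if (isfinite(b)) @{   /* (-Inf, b) */
        *bound = b; *inf = -1;
    @} else @{                    /* (-Inf, +Inf): bound is unused */
        *bound = 0.0; *inf = 2;
    @}
    return 1;
@}
@end example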
| |
@code{f} and @code{ex} define the integrand function, as described above;
@code{epsabs} and @code{epsrel} specify the absolute and relative
accuracy requested; @code{result}, @code{abserr} and @code{last} are the
output components @code{value}, @code{abs.err} and @code{subdivisions}
of the @R{} function @code{integrate}; @code{neval} gives the number of
integrand function evaluations; and the error code @code{ier} is
translated to the @code{message} component of the value of @R{}'s
@code{integrate()} (see that function's definition). @code{limit}
corresponds to @code{integrate(..., subdivisions = *)}. It seems you
should always define the two work arrays and the length of the second
one as
| |
| @example |
| lenw = 4 * limit; |
| iwork = (int *) R_alloc(limit, sizeof(int)); |
| work = (double *) R_alloc(lenw, sizeof(double)); |
| @end example |
| |
| The comments in the source code in @file{src/appl/integrate.c} give |
| more details, particularly about reasons for failure (@code{ier >= 1}). |
| |
| |
| @node Utility functions, Re-encoding, Integration, The R API |
| @section Utility functions |
| @cindex Sort functions from C |
| |
| @R{} has a fairly comprehensive set of sort routines which are made |
| available to users' C code. |
| The following is declared in header file @file{Rinternals.h}. |
| |
| @deftypefun void R_orderVector (int* @var{indx}, int @var{n}, SEXP @var{arglist}, Rboolean @var{nalast}, Rboolean @var{decreasing}) |
| @deftypefunx void R_orderVector1 (int* @var{indx}, int @var{n}, SEXP @var{x}, Rboolean @var{nalast}, Rboolean @var{decreasing}) |
| |
| @code{R_orderVector()} corresponds to @R{}'s @code{order(..., na.last, decreasing)}. |
| More specifically, @code{indx <- order(x, y, na.last, decreasing)} corresponds to |
| @code{R_orderVector(indx, n, Rf_lang2(x, y), nalast, decreasing)} and for |
| three vectors, @code{Rf_lang3(x,y,z)} is used as @var{arglist}. |
| |
| Both @code{R_orderVector} and @code{R_orderVector1} assume the vector |
| @code{indx} to be allocated to length @eqn{\ge n, >= n}. On return, |
| @code{indx[]} contains a permutation of @code{0:(n-1)}, i.e., 0-based C |
indices (not the 1-based @R{} indices returned by @R{}'s @code{order()}).
| |
| When ordering only one vector, @code{R_orderVector1} is faster and |
| corresponds (but is 0-based) to @R{}'s @code{indx <- order(x, na.last, |
| decreasing)}. It was added in @R{} 3.3.0. |
| @end deftypefun |
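The following sketch makes the 0-based convention concrete for a numeric vector without @code{NA}s. It computes the permutation that an increasing, @code{NA}-free ordering should produce. It uses a stable insertion sort, so tied values keep their original order, as with @R{}'s @code{order()}. @code{order_sketch} is a hypothetical reference for small inputs, not the algorithm @R{} uses.

@example
/* Hypothetical reference: fill indx[0..n-1] with the 0-based
   permutation that sorts x[] into increasing order, keeping ties in
   their original order (stable insertion sort on the indices; O(n^2),
   no NA handling). */
void order_sketch(const double *x, int n, int *indx)
@{
    for (int i = 0; i < n; i++) @{
        int j = i;
        /* shift larger entries right; '>' (not '>=') keeps it stable */
        while (j > 0 && x[indx[j - 1]] > x[i]) @{
            indx[j] = indx[j - 1];
            j--;
        @}
        indx[j] = i;
    @}
@}
@end example

For @code{x = @{3, 1, 2, 1@}}, it gives @code{@{1, 3, 2, 0@}}. Adding 1 to each element gives @R{}'s @code{order(c(3, 1, 2, 1))}, which is @code{c(2, 4, 3, 1)}.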
| |
| All other sort routines are declared in header file |
| @file{R_ext/Utils.h} (included by @file{R.h}) and include the following. |
| |
| @deftypefun void R_isort (int* @var{x}, int @var{n}) |
| @deftypefunx void R_rsort (double* @var{x}, int @var{n}) |
| @deftypefunx void R_csort (Rcomplex* @var{x}, int @var{n}) |
| @deftypefunx void rsort_with_index (double* @var{x}, int* @var{index}, int @var{n}) |
| The first three sort integer, real (double) and complex data |
| respectively. (Complex numbers are sorted by the real part first then |
| the imaginary part.) @code{NA}s are sorted last. |
| |
| @code{rsort_with_index} sorts on @var{x}, and applies the same |
| permutation to @var{index}. @code{NA}s are sorted last. |
| @end deftypefun |
| |
| @deftypefun void revsort (double* @var{x}, int* @var{index}, int @var{n}) |
| Is similar to @code{rsort_with_index} but sorts into decreasing order, |
| and @code{NA}s are not handled. |
| @end deftypefun |
| |
| @deftypefun void iPsort (int* @var{x}, int @var{n}, int @var{k}) |
| @deftypefunx void rPsort (double* @var{x}, int @var{n}, int @var{k}) |
| @deftypefunx void cPsort (Rcomplex* @var{x}, int @var{n}, int @var{k}) |
| These all provide (very) partial sorting: they permute @var{x} so that |
| @code{@var{x}[@var{k}]} is in the correct place with smaller values to |
| the left, larger ones to the right. |
| @end deftypefun |
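The guarantee can be illustrated with a quickselect sketch, following Hoare's FIND algorithm in the form given by Wirth. It has the hypothetical name @code{psort_sketch} and is not claimed to be @R{}'s own code. As with the C routines above, @var{k} is a 0-based index. A typical use is finding a median without a full sort.

@example
/* Hypothetical stand-in with the same contract as rPsort(): permute
   x[] so that x[k] (0-based) holds the value it would have after a
   full ascending sort, with no larger values before it and no smaller
   values after it.  NAs are not handled. */
void psort_sketch(double *x, int n, int k)
@{
    int lo = 0, hi = n - 1;
    while (lo < hi) @{
        double pivot = x[k];
        int i = lo, j = hi;
        do @{
            while (x[i] < pivot) i++;
            while (pivot < x[j]) j--;
            if (i <= j) @{
                double t = x[i]; x[i] = x[j]; x[j] = t;
                i++; j--;
            @}
        @} while (i <= j);
        if (j < k) lo = i;   /* x[k] lies in the right part */
        if (k < i) hi = j;   /* x[k] lies in the left part */
    @}
@}
@end example

For odd @var{n}, calling it with @code{k = n / 2} leaves the median in @code{x[n / 2]}.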
| |
| |
| @deftypefun void R_qsort (double *@var{v}, size_t @var{i}, size_t @var{j}) |
| @deftypefunx void R_qsort_I (double *@var{v}, int *@var{I}, int @var{i}, int @var{j}) |
| @deftypefunx void R_qsort_int (int *@var{iv}, size_t @var{i}, size_t @var{j}) |
| @deftypefunx void R_qsort_int_I (int *@var{iv}, int *@var{I}, int @var{i}, int @var{j}) |
| |
| |
| These routines sort @code{@var{v}[@var{i}:@var{j}]} or |
| @code{@var{iv}[@var{i}:@var{j}]} (using 1-indexing, i.e., |
| @code{@var{v}[1]} is the first element) calling the quicksort algorithm |
| as used by @R{}'s @code{sort(v, method = "quick")} and documented on the |
| help page for the @R{} function @code{sort}. The @code{..._I()} |
| versions also return the @code{sort.index()} vector in @code{I}. Note |
| that the ordering is @emph{not} stable, so tied values may be permuted. |
| |
| Note that @code{NA}s are not handled (explicitly) and you should |
| use different sorting functions if @code{NA}s can be present. |
| @end deftypefun |
| |
| @deftypefun subroutine qsort4 (double precision @var{v}, integer @var{indx}, integer @var{ii}, integer @var{jj}) |
| @deftypefunx subroutine qsort3 (double precision @var{v}, integer @var{ii}, integer @var{jj}) |
| |
| The Fortran interface routines for sorting double precision vectors are |
| @code{qsort3} and @code{qsort4}, equivalent to @code{R_qsort} and |
| @code{R_qsort_I}, respectively. |
| @end deftypefun |
| |
| @deftypefun void R_max_col (double* @var{matrix}, int* @var{nr}, int* @var{nc}, int* @var{maxes}, int* @var{ties_meth}) |
| Given the @var{nr} by @var{nc} matrix @code{matrix} in column-major |
| (``Fortran'') |
| order, @code{R_max_col()} returns in @code{@var{maxes}[@var{i}-1]} the |
| column number of the maximal element in the @var{i}-th row (the same as |
| @R{}'s @code{max.col()} function). In the case of ties (multiple maxima), |
| @code{*ties_meth} is an integer code in @code{1:3} determining the method: |
| 1 = ``random'', 2 = ``first'' and 3 = ``last''. |
| See @R{}'s help page @code{?max.col}. |
| @end deftypefun |
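The column-major indexing and the ``first'' ties method can be sketched as follows. The function @code{max_col_first} is hypothetical and assumes there are no @code{NA} or @code{NaN} values. It does not reproduce @code{R_max_col}'s handling of missing values or of the random ties method. Element (@var{i}, @var{j}) of the matrix, both 0-based, is stored at @code{m[i + j * nr]}.

@example
/* Hypothetical sketch of ties method 2 ("first"): for the nr x nc
   matrix m stored in column-major order, set maxes[i] to the 1-based
   column of the first maximal element of row i.  Assumes no NaNs. */
void max_col_first(const double *m, int nr, int nc, int *maxes)
@{
    for (int i = 0; i < nr; i++) @{
        int best = 0;
        for (int j = 1; j < nc; j++)
            if (m[i + j * nr] > m[i + best * nr])  /* strict: first tie */
                best = j;
        maxes[i] = best + 1;
    @}
@}
@end example

Consider the 2 by 3 matrix with rows (1, 3, 2) and (5, 5, 4), stored as @code{@{1, 5, 3, 5, 2, 4@}}. For this matrix, the sketch gives columns 2 and 1.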
| |
| @deftypefun int findInterval (double* @var{xt}, int @var{n}, double @var{x}, Rboolean @var{rightmost_closed}, Rboolean @var{all_inside}, int @var{ilo}, int* @var{mflag}) |
| @deftypefunx int findInterval2(double* @var{xt}, int @var{n}, double @var{x}, Rboolean @var{rightmost_closed}, Rboolean @var{all_inside}, Rboolean @var{left_open}, int @var{ilo}, int* @var{mflag}) |
| Given the ordered vector @var{xt} of length @var{n}, return the interval |
| or index of @var{x} in @code{@var{xt}[]}, typically max(@math{i}; @eqn{1 |
| \le i \le @var{n}, 1 <= i <= @var{n}} & @math{@var{xt}[i]} @eqn{\le, <=} |
| @var{x}) where we use 1-indexing as in @R{} and Fortran (but not C). If |
| @var{rightmost_closed} is true, also returns @math{@var{n}-1} if @var{x} |
| equals @math{@var{xt}[@var{n}]}. If @var{all_inside} is not 0, the |
| result is coerced to lie in @code{1:(@var{n}-1)} even when @var{x} is |
| outside the @var{xt}[] range. On return, @code{*@var{mflag}} equals |
| @math{-1} if @var{x} < @var{xt}[1], @math{+1} if @var{x} >= |
| @var{xt}[@var{n}], and 0 otherwise. |
| |
| The algorithm is particularly fast when @var{ilo} is set to the last |
| result of @code{findInterval()} and @var{x} is a value of a sequence which |
| is increasing or decreasing for subsequent calls. |
| |
| @code{findInterval2()} is a generalization of @code{findInterval()}, |
| with an extra @code{Rboolean} argument @var{left_open}. Setting |
| @code{left_open = TRUE} basically replaces all left-closed right-open |
| intervals @eqn{[s, t)} by left-open ones @eqn{(s, t]}, see the help page |
| of @R{} function @code{findInterval} for details. |
| |
| There is also an @code{F77_CALL(interv)()} version of |
| @code{findInterval()} with the same arguments, but all pointers. |
| @end deftypefun |
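The basic semantics can be sketched as a binary search. The sketch omits the @var{ilo} acceleration, @var{left_open} and @var{mflag}. The function below is hypothetical and assumes @math{@var{n} >= 2}. It takes a 0-based C array @code{xt[0..n-1]} but returns the 1-based result described above.

@example
/* Hypothetical sketch of findInterval(): return the number of xt[]
   entries <= x, i.e. the 1-based i with xt[i] <= x < xt[i+1], for the
   sorted 0-based array xt[0..n-1] (n >= 2). */
int find_interval_sketch(const double *xt, int n, double x,
                         int rightmost_closed, int all_inside)
@{
    int lo = 0, hi = n;                 /* answer lies in [lo, hi] */
    while (lo < hi) @{
        int mid = lo + (hi - lo + 1) / 2;   /* in (lo, hi] */
        if (xt[mid - 1] <= x) lo = mid; else hi = mid - 1;
    @}
    int i = lo;
    if (rightmost_closed && i == n && x == xt[n - 1])
        i = n - 1;                      /* close the last interval */
    if (all_inside) @{                   /* coerce into 1:(n-1) */
        if (i < 1) i = 1;
        if (i > n - 1) i = n - 1;
    @}
    return i;
@}
@end example

With @code{xt = @{1, 2, 3, 4@}}, @code{x = 2.5} gives 2, and @code{x = 4} gives 4, or 3 when @var{rightmost_closed} is true.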
| |
| A system-independent interface to produce the name of a temporary |
| file is provided as |
| |
| @deftypefun {char *} R_tmpnam (const char *@var{prefix}, const char *@var{tmpdir}) |
| @deftypefunx {char *} R_tmpnam2 (const char *@var{prefix}, const char *@var{tmpdir}, const char *@var{fileext}) |
| Return a pathname for a temporary file with name beginning with |
| @var{prefix} and ending with @var{fileext} in directory @var{tmpdir}. |
| A @code{NULL} prefix or extension is replaced by @code{""}. Note that |
| the return value is dynamically allocated and should be freed using |
| @code{R_free_tmpnam} when no longer needed (unlike the |
| system call @code{tmpnam}). Freeing the result using @code{free} is no |
| longer recommended. |
| @end deftypefun |
| |
| @c ---- |
| |
| There is also the internal function used to expand file names in several |
| @R{} functions, and called directly by @code{path.expand}. |
| |
| @deftypefun {const char *} R_ExpandFileName (const char *@var{fn}) |
| Expand a path name @var{fn} by replacing a leading tilde by the user's |
| home directory (if defined). The precise meaning is platform-specific; |
| it will usually be taken from the environment variable @env{HOME} if |
| this is defined. |
| @end deftypefun |
| |
| For historical reasons there are Fortran interfaces to functions |
| @code{D1MACH} and @code{I1MACH}. These can be called from C code as |
| e.g.@: @code{F77_CALL(d1mach)(4)}. Note that these are emulations of |
| the original functions by Fox, Hall and Schryer on NetLib at |
| @uref{http://www.netlib.org/slatec/src/} for IEC 60559 arithmetic |
| (required by @R{}). |
| |
| @node Re-encoding, Condition handling and cleanup code, Utility functions, The R API |
| @section Re-encoding |
| |
| @R{} has its own C-level interface to the encoding conversion |
| capabilities provided by @code{iconv} because there are |
| incompatibilities between the declarations in different implementations |
| of @code{iconv}. |
| |
| These are declared in header file @file{R_ext/Riconv.h}. |
| |
| @deftypefun {void *} Riconv_open (const char *@var{to}, const char *@var{from}) |
| @end deftypefun |
| Set up a pointer to an encoding object to be used to convert between two |
| encodings: @code{""} indicates the current locale. |
| |
| @deftypefun size_t Riconv (void *@var{cd}, const char **@var{inbuf}, size_t *@var{inbytesleft}, char **@var{outbuf}, size_t *@var{outbytesleft}) |
| @end deftypefun |
Convert as much as possible of @code{inbuf} to @code{outbuf}. Initially
the @code{size_t} variables indicate the number of bytes available in the
| buffers, and they are updated (and the @code{char} pointers are updated |
| to point to the next free byte in the buffer). The return value is the |
| number of characters converted, or @code{(size_t)-1} (beware: |
| @code{size_t} is usually an unsigned type). It should be safe to assume |
| that an error condition sets @code{errno} to one of @code{E2BIG} (the |
| output buffer is full), @code{EILSEQ} (the input cannot be converted, |
| and might be invalid in the encoding specified) or @code{EINVAL} (the |
| input does not end with a complete multi-byte character). |
| |
| @deftypefun int Riconv_close (void * @var{cd}) |
| @end deftypefun |
| Free the resources of an encoding object. |
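These functions follow the calling conventions of @code{iconv}. The usual conversion pattern can therefore be sketched with the POSIX functions @code{iconv_open}, @code{iconv} and @code{iconv_close}, so that the sketch compiles without @R{}. In package code, you would call @code{Riconv_open}, @code{Riconv} and @code{Riconv_close} instead; note that @code{Riconv} takes a @code{const char **} input argument. The helper @code{convert_string} is hypothetical. The encoding names accepted depend on the platform's @code{iconv}.

@example
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string 'in' from
   encoding 'from' to encoding 'to', storing a NUL-terminated result
   in out[0..outsize-1] (outsize >= 1).  Returns 0 on success, or -1
   with errno set (e.g. E2BIG if out is too small, EILSEQ if the input
   cannot be converted). */
int convert_string(const char *to, const char *from, const char *in,
                   char *out, size_t outsize)
@{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t) -1)
        return -1;
    char *inbuf = (char *) in;     /* POSIX iconv() takes char ** */
    size_t inleft = strlen(in);
    char *outbuf = out;
    size_t outleft = outsize - 1;  /* keep a byte for the final NUL */
    size_t res = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    int err = errno;               /* iconv_close() may change errno */
    iconv_close(cd);
    if (res == (size_t) -1) @{
        errno = err;
        return -1;
    @}
    *outbuf = '\0';
    return 0;
@}
@end example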
| |
| |
| @node Condition handling and cleanup code, Allowing interrupts, Re-encoding, The R API |
| @section Condition handling and cleanup code |
| @cindex Condition handling |
| @cindex Cleanup code |
| @cindex Error handling |
| |
| Two functions are available for establishing condition handlers from |
| within C code: |
| |
| @example |
| #include <Rinternals.h> |
| |
| SEXP R_tryCatchError(SEXP (*fun)(void *data), void *data, |
| SEXP (*hndlr)(SEXP cond, void *hdata), void *hdata); |
| |
| SEXP R_tryCatch(SEXP (*fun)(void *data), void *data, |
| SEXP, |
| SEXP (*hndlr)(SEXP cond, void *hdata), void *hdata, |
| void (*clean)(void *cdata), void *cdata); |
| @end example |
| |
@code{R_tryCatchError} establishes an exiting handler for conditions
inheriting from class @code{error}.
| |
@code{R_tryCatch} can be used to establish a handler for other
conditions and to register a cleanup action. The conditions to be
handled are specified as a character vector (@code{STRSXP}) of condition
classes, passed as the third argument. A @code{NULL} pointer can be
passed as @code{hndlr} or @code{clean} if condition handling or cleanup
are not needed.
| |
| These are currently implemented using the R-level @code{tryCatch} |
| mechanism so are subject to some overhead. |
| |
| The function @code{R_UnwindProtect} can be used to ensure that a cleanup |
| action takes place on ordinary return as well as on a non-local transfer |
| of control, which R implements as a @code{longjmp}. |
| |
| @example |
| SEXP R_UnwindProtect(SEXP (*fun)(void *data), void *data, |
| void (*clean)(void *data, Rboolean jump), void *cdata, |
| SEXP cont); |
| @end example |
| |
@code{R_UnwindProtect} can be used in two ways. The simpler usage,
suitable for use in C code, passes @code{NULL} for the @code{cont}
argument. @code{R_UnwindProtect} will call @code{fun(data)}. If
@code{fun} returns a value, then @code{R_UnwindProtect} calls
@code{clean(cdata, FALSE)} before returning the value returned by
@code{fun}. If @code{fun} executes a non-local transfer of control, then
@code{clean(cdata, TRUE)} is called, and the non-local transfer of
control is resumed.
| |
| The second use pattern, suitable to support C++ stack unwinding, uses |
| two additional functions: |
| |
| @example |
| SEXP R_MakeUnwindCont(); |
| void NORET R_ContinueUnwind(SEXP cont); |
| @end example |
| |
| @code{R_MakeUnwindCont} allocates a @emph{continuation token} |
| @code{cont} to pass to @code{R_UnwindProtect}. This token should be |
| protected with @code{PROTECT} before calling |
| @code{R_UnwindProtect}. When the @code{clean} function is called with |
| @code{jump == TRUE}, indicating that R is executing a non-local transfer |
| of control, it can throw a C++ exception to a C++ @code{catch} outside |
the C++ code to be unwound, and then use the continuation token in a
call to @code{R_ContinueUnwind(cont)} to resume the non-local transfer of
| control within R. |
| |
| |
| @node Allowing interrupts, Platform and version information, Condition handling and cleanup code, The R API |
| @section Allowing interrupts |
| @cindex Interrupts |
| |
| No part of @R{} can be interrupted whilst running long computations in |
| compiled code, so programmers should make provision for the code to be |
| interrupted at suitable points by calling from C |
| |
| @example |
| #include <R_ext/Utils.h> |
| |
| void R_CheckUserInterrupt(void); |
| @end example |
| |
| @noindent |
| and from Fortran |
| |
| @example |
| subroutine rchkusr() |
| @end example |
| |
| These check if the user has requested an interrupt, and if so branch to |
| @R{}'s error signaling functions. |
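A common pattern is to check only once every so many iterations, which keeps the overhead small. In the following sketch, the check is passed as a function pointer so that the code compiles without @R{}. In package code, you would pass @code{R_CheckUserInterrupt} or call it directly. The function name and the interval of 10000 iterations are arbitrary choices for illustration.

@example
#include <stddef.h>

/* Hypothetical example: a sum of squares with a periodic interrupt
   check.  'check' stands in for R_CheckUserInterrupt (which may not
   return if the user has interrupted); NULL disables checking. */
double sum_squares(const double *x, size_t n, void (*check)(void))
@{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) @{
        if (check && i % 10000 == 0)   /* every 10000 iterations */
            check();
        s += x[i] * x[i];
    @}
    return s;
@}
@end example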
| |
Note that the code behind any of the entry points described here may
itself be interruptible or signal an error when called from your C or
Fortran code, and so may not return to your code.
| |
| |
| @node Platform and version information, Inlining C functions, Allowing interrupts, The R API |
| @section Platform and version information |
| @cindex Version information from C |
| @cindex OpenMP |
| @findex R_Version |
| |
| The header files define @code{USING_R}, which can be used to test if |
| the code is indeed being used with @R{}. |
| |
| Header file @file{Rconfig.h} (included by @file{R.h}) is used to define |
| platform-specific macros that are mainly for use in other header files. |
| The macro @code{WORDS_BIGENDIAN} is defined on |
| big-endian@footnote{@uref{https://en.wikipedia.org/@/wiki/@/Endianness}.} |
| systems (e.g.@: most OSes on Sparc and PowerPC hardware) and not on |
| little-endian systems (nowadays all the commoner @R{} platforms). It |
| can be useful when manipulating binary files. NB: these macros apply |
| only to the C compiler used to build @R{}, not necessarily to another C |
| or C++ compiler. |
| |
| Header file @file{Rversion.h} (@strong{not} included by @file{R.h}) |
| defines a macro @code{R_VERSION} giving the version number encoded as an |
| integer, plus a macro @code{R_Version} to do the encoding. This can be |
| used to test if the version of @R{} is late enough, or to include |
| back-compatibility features. For protection against very old versions |
| of @R{} which did not have this macro, use a construction such as |
| |
| @example |
| @group |
| #if defined(R_VERSION) && R_VERSION >= R_Version(3, 1, 0) |
| ... |
| #endif |
| @end group |
| @end example |
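Assume the encoding used by current versions of @file{Rversion.h}, namely @code{R_Version(v, p, s)} equal to @code{v * 65536 + p * 256 + s}. Under that assumption, an encoded version can be decoded as in the following sketch. The fallback definition and the function name exist only for this sketch; check @file{Rversion.h} for the actual definition.

@example
/* Fallback so the sketch compiles without R; package code should
   include <Rversion.h> instead (assumed encoding v*65536 + p*256 + s). */
#ifndef R_Version
# define R_Version(v, p, s) (((v) * 65536) + ((p) * 256) + (s))
#endif

/* Hypothetical helper: split an encoded version into its parts. */
void decode_r_version(int code, int *major, int *minor, int *patch)
@{
    *major = code / 65536;
    *minor = (code / 256) % 256;
    *patch = code % 256;
@}
@end example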
| |
| More detailed information is available in the macros @code{R_MAJOR}, |
| @code{R_MINOR}, @code{R_YEAR}, @code{R_MONTH} and @code{R_DAY}: see the |
| header file @file{Rversion.h} for their format. Note that the minor |
| version includes the patchlevel (as in @samp{2.2}). |
| |
| Packages which use @code{alloca} need to ensure it is defined: as it is |
| part of neither C nor POSIX there is no standard way to do so. One can |
| use |
| |
| @example |
| #include <Rconfig.h> // for HAVE_ALLOCA_H |
| #ifdef __GNUC__ |
| // this covers gcc, clang, icc |
| # undef alloca |
| # define alloca(x) __builtin_alloca((x)) |
| #elif defined(HAVE_ALLOCA_H) |
| // needed for native compilers on Solaris and AIX |
| # include <alloca.h> |
| #endif |
| @end example |
| |
| @noindent |
| (and this should be included before standard C headers such as |
| @file{stdlib.h}, since on some platforms these include @file{malloc.h} |
| which may have a conflicting definition), which suffices for known @R{} |
| platforms. |
| |
| @node Inlining C functions, Controlling visibility, Platform and version information, The R API |
| @section Inlining C functions |
| @findex R_INLINE |
| |
| The C99 keyword @code{inline} should be recognized by all compilers |
| nowadays used to build @R{}. Portable code which might be used with |
| earlier versions of @R{} can be written using the macro @code{R_INLINE} |
| (defined in file @file{Rconfig.h} included by @file{R.h}), as for |
| example from package @CRANpkg{cluster} |
| |
| @example |
| #include <R.h> |
| |
| static R_INLINE int ind_2(int l, int j) |
| @{ |
| ... |
| @} |
| @end example |
| |
| Be aware that using inlining with functions in more than one compilation |
| unit is almost impossible to do portably, see |
| @uref{http://www.greenend.org.uk/@/rjk/@/2003/@/03/@/inline.html}, so this usage |
| is for @code{static} functions as in the example. All the @R{} |
| configure code has checked is that @code{R_INLINE} can be used in a |
| single C file with the compiler used to build @R{}. We recommend that |
| packages making extensive use of inlining include their own configure |
| code. |
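The following sketch illustrates the recommended @code{static} usage. The hypothetical function @code{tri_index} computes where the dissimilarity between objects @var{l} and @var{j} (1-based) is stored in a packed lower-triangular vector, with element 0 reserved for the diagonal. This layout is an assumption made for illustration, in the spirit of (but not copied from) @CRANpkg{cluster}. The fallback definition of @code{R_INLINE} is included only so that the sketch compiles without @R{}'s headers.

@example
/* In package code R_INLINE comes from <R.h>; fallback for the sketch. */
#ifndef R_INLINE
# define R_INLINE inline
#endif

/* Index into a packed vector holding d(l, j) for 1 <= j < l in the
   order d(2,1), d(3,1), d(3,2), d(4,1), ..., starting at position 1;
   position 0 is reserved for the (zero) diagonal. */
static R_INLINE int tri_index(int l, int j)
@{
    if (l == j) return 0;
    if (l < j) @{ int t = l; l = j; j = t; @}   /* d is symmetric */
    return (l - 2) * (l - 1) / 2 + j;
@}
@end example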
| |
| @node Controlling visibility, Standalone Mathlib, Inlining C functions, The R API |
| @section Controlling visibility |
| @cindex Visibility |
| |
| Header @file{R_ext/Visibility.h} has some definitions for controlling the |
| visibility of entry points. These are only effective when |
| @samp{HAVE_VISIBILITY_ATTRIBUTE} is defined -- this is checked when @R{} |
| is configured and recorded in header @file{Rconfig.h} (included by |
| @file{R_ext/Visibility.h}). It is often defined on modern Unix-alikes |
| with a recent compiler@footnote{It is defined by the Intel compilers, |
| but also hides unsatisfied references and so cannot be used with @R{}. |
It is not supported by the AIX or Solaris compilers.}, but not
supported on macOS or Windows. Minimizing the visibility of symbols in
| a shared library will both speed up its loading (unlikely to be |
| significant) and reduce the possibility of linking to other entry points |
| of the same name. |
| |
| C/C++ entry points prefixed by @code{attribute_hidden} will not be |
| visible in the shared object. There is no comparable mechanism for |
| Fortran entry points, but there is a more comprehensive scheme used by, |
for example, package @pkg{stats}. Most compilers which allow control of
| visibility will allow control of visibility for all symbols @emph{via} a |
| flag, and where known the flag is encapsulated in the macros |
| @samp{C_VISIBILITY}, @samp{CXX_VISIBILITY}@footnote{As from @R{} 3.5.2: |
| This applies to the compiler for the default C++ dialect (currently |
| normally C++11) and not necessarily to other dialects.} and |
| @samp{F_VISIBILITY} for C, C++ and Fortran |
| compilers.@footnote{@samp{F77_VISIBILITY} was used prior to @R 3.6.0 and |
| is still available (but deprecated). In some cases the Fortran |
| compilers accept the flag but do not actually hide their symbols.} |
| These are defined in @file{etc/Makeconf} and so available for normal |
| compilation of package code. For example, @file{src/Makevars} could |
| include some of |
| |
| @example |
| PKG_CFLAGS=$(C_VISIBILITY) |
| PKG_CXXFLAGS=$(CXX_VISIBILITY) |
| PKG_FFLAGS=$(F_VISIBILITY) |
| @end example |
| |
| This would end up with @strong{no} visible entry points, which would be |
| pointless. However, the effect of the flags can be overridden by using |
| the @code{attribute_visible} prefix. A shared object which registers |
its entry points needs only to have one visible entry point, its
| initializer, so for example package @pkg{stats} has |
| |
| @example |
| void attribute_visible R_init_stats(DllInfo *dll) |
| @{ |
| R_registerRoutines(dll, CEntries, CallEntries, FortEntries, NULL); |
| R_useDynamicSymbols(dll, FALSE); |
| ... |
| @} |
| @end example |
| |
| Because the @samp{C_VISIBILITY} mechanism is only useful in conjunction |
| with @code{attribute_visible}, it is not enabled unless |
| @samp{HAVE_VISIBILITY_ATTRIBUTE} is defined. The usual visibility flag |
| is @option{-fvisibility=hidden}: some compilers also support |
| @option{-fvisibility-inlines-hidden} which can be used by overriding |
| @samp{C_VISIBILITY} and @samp{CXX_VISIBILITY} in @file{config.site} when |
| building @R{}, or editing @file{etc/Makeconf} in the @R{} installation. |
| |
| Note that @command{configure} only checks that visibility attributes and |
| flags are accepted, not that they actually hide symbols. |
| |
| The visibility mechanism is not available on Windows, but there is an |
| equally effective way to control which entry points are visible, by |
| supplying a definitions file |
@file{@var{pkgname}/src/@var{pkgname}-win.def}: only entry points
| listed in that file will be visible. Again using @pkg{stats} as an |
| example, it has |
| |
| @example |
| LIBRARY stats.dll |
| EXPORTS |
| R_init_stats |
| @end example |
| |
| @node Standalone Mathlib, Organization of header files, Controlling visibility, The R API |
| @section Using these functions in your own C code |
| |
| It is possible to build @code{Mathlib}, the @R{} set of mathematical |
| functions documented in @file{Rmath.h}, as a standalone library |
| @file{libRmath} under both Unix-alikes and Windows. (This includes the |
| functions documented in @ref{Numerical analysis subroutines} as from |
| that header file.) |
| |
| The library is not built automatically when @R{} is installed, but can |
| be built in the directory @file{src/nmath/standalone} in the @R{} |
| sources: see the file @file{README} there. To use the code in your own |
| C program include |
| |
| @example |
| @group |
| #define MATHLIB_STANDALONE |
| #include <Rmath.h> |
| @end group |
| @end example |
| |
| @noindent |
| and link against @samp{-lRmath} (and perhaps @samp{-lm}). There is an |
| example file @file{test.c}. |
| |
| A little care is needed to use the random-number routines. You will |
| need to supply the uniform random number generator |
| |
| @example |
| double unif_rand(void) |
| @end example |
| |
| @noindent |
or use the one supplied (and with a dynamic library or DLL you will have
to use the one supplied, which is the Marsaglia-multicarry with entry
points
| |
| @example |
| set_seed(unsigned int, unsigned int) |
| @end example |
| |
| @noindent |
| to set its seeds and |
| |
| @example |
| get_seed(unsigned int *, unsigned int *) |
| @end example |
| |
| @noindent |
| to read the seeds). |
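For a static @file{libRmath}, a user-supplied generator needs only the three entry points above. The following is a minimal sketch, not the library's own code: it assumes a 32-bit @code{unsigned int}, uses the multiply-with-carry constants 36969 and 18000 of Marsaglia's multicarry generator, and its initial seeds (1234 and 5678) are arbitrary values chosen for illustration.

```c
/* Sketch of a user-supplied generator for a static libRmath.
   Assumes unsigned int is 32 bits wide; the initial seeds are arbitrary. */
static unsigned int I1 = 1234, I2 = 5678;

/* Set both seeds; neither should be zero (a zero half stays at zero). */
void set_seed(unsigned int i1, unsigned int i2)
{
    I1 = i1;
    I2 = i2;
}

/* Report the current state of both seeds. */
void get_seed(unsigned int *i1, unsigned int *i2)
{
    *i1 = I1;
    *i2 = I2;
}

/* Two multiply-with-carry steps, combined into one 32-bit value. */
double unif_rand(void)
{
    I1 = 36969 * (I1 & 0177777) + (I1 >> 16);
    I2 = 18000 * (I2 & 0177777) + (I2 >> 16);
    /* scale by (approximately) 1/(2^32 - 1) into the unit interval */
    return ((I1 << 16) ^ (I2 & 0177777)) * 2.328306437080797e-10;
}
```

Calling @code{set_seed} again with the same pair of seeds reproduces the same stream of values.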
| |
| @node Organization of header files, , Standalone Mathlib, The R API |
| @section Organization of header files |
| |
| The header files which @R{} installs are in directory |
| @file{@var{R_INCLUDE_DIR}} (default @file{@var{R_HOME}/include}). This |
| currently includes |
| |
| @quotation |
| @multitable @columnfractions 0.30 0.55 |
| @item @file{R.h} @tab includes many other files |
| @item @file{S.h} @tab different version for code ported from @Sl{} |
| @item @file{Rinternals.h} @tab definitions for using @R{}'s internal |
| structures |
| @item @file{Rdefines.h} @tab macros for an @Sl{}-like interface to the |
| above (no longer maintained) |
| @item @file{Rmath.h} @tab standalone math library |
| @item @file{Rversion.h} @tab @R{} version information |
| @item @file{Rinterface.h} @tab for add-on front-ends (Unix-alikes only) |
| @item @file{Rembedded.h} @tab for add-on front-ends |
| @item @file{R_ext/Applic.h} @tab optimization and integration |
| @item @file{R_ext/BLAS.h} @tab C definitions for BLAS routines |
| @item @file{R_ext/Callbacks.h} @tab C (and R function) top-level task |
| handlers |
| @item @file{R_ext/GetX11Image.h} @tab X11Image interface used by package |
| @pkg{trkplot} |
| @item @file{R_ext/Lapack.h} @tab C definitions for some LAPACK routines |
| @item @file{R_ext/Linpack.h} @tab C definitions for some LINPACK |
| routines, not all of which are included in @R{} |
| @item @file{R_ext/Parse.h} @tab a small part of @R{}'s parse interface: |
| not part of the stable API. |
| @item @file{R_ext/RStartup.h} @tab for add-on front-ends |
| @item @file{R_ext/Rdynload.h} @tab needed to register compiled code in |
| packages |
| @item @file{R_ext/R-ftp-http.h} @tab interface to internal method of |
| @code{download.file} |
| @item @file{R_ext/Riconv.h} @tab interface to @code{iconv} |
| @item @file{R_ext/Visibility.h} @tab definitions controlling visibility |
| @item @file{R_ext/eventloop.h} @tab for add-on front-ends and for |
| packages that need to share in the @R{} event loops (not Windows) |
| @end multitable |
| @end quotation |
| |
| The following headers are included by @file{R.h}: |
| |
| @quotation |
| @multitable @columnfractions 0.30 0.55 |
| @item @file{Rconfig.h} @tab configuration info that is made available |
| @item @file{R_ext/Arith.h} @tab handling for @code{NA}s, @code{NaN}s, |
| @code{Inf}/@code{-Inf} |
| @item @file{R_ext/Boolean.h} @tab @code{TRUE}/@code{FALSE} type |
| @item @file{R_ext/Complex.h} @tab C typedefs for @R{}'s @code{complex} |
| @item @file{R_ext/Constants.h} @tab constants |
| @item @file{R_ext/Error.h} @tab error signaling |
| @item @file{R_ext/Memory.h} @tab memory allocation |
| @item @file{R_ext/Print.h} @tab @code{Rprintf} and variations. |
| @item @file{R_ext/RS.h} @tab definitions common to @file{R.h} and |
| @file{S.h}, including @code{F77_CALL} etc. |
| @item @file{R_ext/Random.h} @tab random number generation |
| @item @file{R_ext/Utils.h} @tab sorting and other utilities |
| @item @file{R_ext/libextern.h} @tab definitions for exports from |
| @file{R.dll} on Windows. |
| @end multitable |
| @end quotation |
| |
| The graphics systems are exposed in headers |
| @file{R_ext/GraphicsEngine.h}, @file{R_ext/GraphicsDevice.h} (which it |
| includes) and @file{R_ext/QuartzDevice.h}. Facilities for defining |
| custom connection implementations are provided in |
| @file{R_ext/Connections.h}, but make sure you consult the file before |
| use. |
| |
| Let us reiterate the advice to include system headers before the @R{} |
| header files, especially @file{Rinternals.h} (included by |
| @file{Rdefines.h}) and @file{Rmath.h}, which redefine names that may be |
| used in system headers (fewer if @samp{R_NO_REMAP} is defined, or |
| @samp{R_NO_REMAP_RMATH} for @file{Rmath.h}). |
| |
| @node Generic functions and methods, Linking GUIs and other front-ends to R, The R API, Top |
| @chapter Generic functions and methods |
| @cindex Generic functions |
| @cindex Method functions |
| |
| @R{} programmers will often want to add methods for existing generic |
| functions, and may want to add new generic functions or make existing |
| functions generic. In this chapter we give guidelines for doing so, |
| with examples of the problems caused by not adhering to them. |
| |
| This chapter covers only the `informal' class system copied from S3, |
| not the S4 (formal) methods of package @pkg{methods}. |
| |
| First, a @emph{caveat}: a function named @code{@var{gen}.@var{cl}} will |
| be invoked by the generic @code{@var{gen}} for class @code{@var{cl}}, so |
| do not name functions in this style unless they are intended to be |
| methods. |
| |
| The key function for methods is @code{NextMethod}, which dispatches the |
| next method. It is quite typical for a method function to make a few |
| changes to its arguments, dispatch to the next method, receive the |
| results and modify them a little. An example is |
| |
| @example |
| @group |
| t.data.frame <- function(x) |
| @{ |
| x <- as.matrix(x) |
| NextMethod("t") |
| @} |
| @end group |
| @end example |
| |
| @noindent |
| Note that the example above works because there is a @emph{next} method, |
| the default method, and not because a new method is selected when the |
| class is changed. |
| |
| @emph{Any} method a programmer writes may be invoked from another method |
| by @code{NextMethod}, @emph{with the arguments appropriate to the |
| previous method}. Further, the programmer cannot predict which method |
| @code{NextMethod} will pick (it might be one not yet dreamt of), and the |
| end user calling the generic needs to be able to pass arguments to the |
| next method. For this to work |
| |
| @quotation |
| @emph{A method must have all the arguments of the generic, including |
| @code{@dots{}} if the generic does.} |
| @end quotation |
| |
| It is a grave misunderstanding to think that a method needs only to |
| accept the arguments it needs. The original S version of |
| @code{predict.lm} did not have a @code{@dots{}} argument, although |
| @code{predict} did. It soon became clear that @code{predict.glm} needed |
| an argument @code{dispersion} to handle over-dispersion. As |
| @code{predict.lm} had neither a @code{dispersion} nor a @code{@dots{}} |
| argument, @code{NextMethod} could no longer be used. (The legacy, two |
| direct calls to @code{predict.lm}, lives on in @code{predict.glm} in |
| @R{}, which is based on the workaround for S3 written by Venables & |
| Ripley.) |
| |
| Further, the user is entitled to use positional matching when calling |
| the generic, and the arguments to a method called by @code{UseMethod} |
| are those of the call to the generic. Thus |
| |
| @quotation |
| @emph{A method must have arguments in exactly the same order as the |
| generic.} |
| @end quotation |
| |
| @noindent |
| To see the scale of this problem, consider the generic function |
| @code{scale}, defined as |
| |
| @example |
| @group |
| scale <- function (x, center = TRUE, scale = TRUE) |
| UseMethod("scale") |
| @end group |
| @end example |
| |
| @noindent |
| Suppose an unthinking package writer created methods such as |
| |
| @example |
| scale.foo <- function(x, scale = FALSE, ...) @{ @} |
| @end example |
| |
| @noindent |
| Then for @code{x} of class @code{"foo"} the calls |
| |
| @example |
| @group |
| scale(x, , TRUE) |
| scale(x, scale = TRUE) |
| @end group |
| @end example |
| |
| @noindent |
| would most likely do different things, to the justifiable |
| consternation of the end user. |
| |
| To add a further twist, which default is used when a user calls |
| @code{scale(x)} in our example? What if |
| |
| @example |
| scale.bar <- function(x, center, scale = TRUE) NextMethod("scale") |
| @end example |
| |
| @noindent |
| and @code{x} has class @code{c("bar", "foo")}? It is the default |
| specified in the method that is used, but the default |
| specified in the generic may be the one the user sees. |
| This leads to the recommendation: |
| |
| @quotation |
| @emph{If the generic specifies defaults, all methods should use the same defaults.} |
| @end quotation |
| |
| @noindent |
| An easy way to follow these recommendations is to always keep generics |
| simple, e.g. |
| |
| @example |
| scale <- function(x, ...) UseMethod("scale") |
| @end example |
| |
| Only add parameters and defaults to the generic if they make sense in |
| all possible methods implementing it. |
| |
| @menu |
| * Adding new generics:: |
| @end menu |
| |
| @node Adding new generics, , Generic functions and methods, Generic functions and methods |
| @section Adding new generics |
| |
| When creating a new generic function, bear in mind that its argument |
| list will be the maximal set of arguments for methods, including those |
| written elsewhere years later. So choosing a good set of arguments may |
| well be an important design issue, and there need to be good arguments |
| @emph{not} to include a @code{@dots{}} argument. |
| |
| If a @code{@dots{}} argument is supplied, some thought should be given |
| to its position in the argument sequence. Arguments which follow |
| @code{@dots{}} must be named in calls to the function, and they must be |
| named in full (partial matching is suppressed after @code{@dots{}}). |
| Formal arguments before @code{@dots{}} can be partially matched, and so |
| may `swallow' actual arguments intended for @code{@dots{}}. Although it |
| is commonplace to make the @code{@dots{}} argument the last one, that is |
| not always the right choice. |
| |
| Sometimes package writers want to make a function in the base package |
| generic, and request a change in @R{}. This may be justifiable, but |
| making a function generic with the old definition as the default method |
| does have a small performance cost. It is never necessary, as a package |
| can take over a function in the base package and make it generic by |
| something like |
| |
| @example |
| @group |
| foo <- function(object, ...) UseMethod("foo") |
| foo.default <- function(object, ...) base::foo(object) |
| @end group |
| @end example |
| |
| @noindent |
| Earlier versions of this manual suggested assigning @code{foo.default <- |
| base::foo}. This is @strong{not} a good idea, as it captures the base |
| function at the time of installation and it might be changed as @R{} is |
| patched or updated. |
| |
| The same idea can be applied for functions in other packages with namespaces. |
| |
| @node Linking GUIs and other front-ends to R, Function and variable index, Generic functions and methods, Top |
| @chapter Linking GUIs and other front-ends to R |
| |
| There are a number of ways to build front-ends to @R{}: we take this to |
| mean a GUI or other application that has the ability to submit commands |
| to @R{} and perhaps to receive results back (not necessarily in a text |
| format). There are other routes besides those described here, for |
| example the package @CRANpkg{Rserve} (from @acronym{CRAN}, see also |
| @uref{https://www.rforge.net/@/Rserve/}) and connections to Java in |
| @samp{JRI} (part of the @CRANpkg{rJava} package on @acronym{CRAN}) and |
| the Omegahat/Bioconductor package @samp{SJava}. |
| |
| Note that the APIs described in this chapter are only intended to be |
| used in an alternative front-end: they are not part of the API made |
| available for @R{} packages and can be dangerous to use in a |
| conventional package (although packages may contain alternative |
| front-ends). Conversely some of the functions from the API (such as |
| @code{R_alloc}) should not be used in front-ends. |
| |
| @menu |
| * Embedding R under Unix-alikes:: |
| * Embedding R under Windows:: |
| @end menu |
| |
| @node Embedding R under Unix-alikes, Embedding R under Windows, Linking GUIs and other front-ends to R, Linking GUIs and other front-ends to R |
| @section Embedding R under Unix-alikes |
| |
| @R{} can be built as a shared library@footnote{In the parlance of macOS |
| this is a @emph{dynamic} library, and is the normal way to build @R{} on |
| that platform.} if configured with @option{--enable-R-shlib}. This |
| shared library can be used to run @R{} from alternative front-end |
| programs. We will assume this has been done for the rest of this |
| section. Also, it can be built as a static library if configured with |
| @option{--enable-R-static-lib}, and that can be used in a very similar |
| way (at least on Linux: on other platforms one needs to ensure that all |
| the symbols exported by @file{libR.a} are linked into the front-end). |
| |
| The command-line @R{} front-end, @file{@var{R_HOME}/bin/exec/R}, is one |
| such example, and the former @acronym{GNOME} (see package @pkg{gnomeGUI} |
| on @acronym{CRAN}'s @samp{Archive} area) and macOS consoles are others. |
| The source for @file{@var{R_HOME}/bin/exec/R} is in file |
| @file{src/main/Rmain.c} and is very simple |
| |
| @example |
| int Rf_initialize_R(int ac, char **av); /* in ../unix/system.c */ |
| void Rf_mainloop(); /* in main.c */ |
| |
| extern int R_running_as_main_program; /* in ../unix/system.c */ |
| |
| int main(int ac, char **av) |
| @{ |
| R_running_as_main_program = 1; |
| Rf_initialize_R(ac, av); |
| Rf_mainloop(); /* does not return */ |
| return 0; |
| @} |
| @end example |
| |
| @noindent |
| indeed, misleadingly simple. Remember that |
| @file{@var{R_HOME}/bin/exec/R} is run from a shell script |
| @file{@var{R_HOME}/bin/R} which sets up the environment for the |
| executable, and this is used for |
| |
| @itemize @bullet |
| @item |
| Setting @env{R_HOME} and checking it is valid, as well as the paths |
| @env{R_SHARE_DIR} and @env{R_DOC_DIR} to the installed @file{share} and |
| @file{doc} directory trees. Also setting @env{R_ARCH} if needed. |
| |
| @item |
| Setting @env{LD_LIBRARY_PATH} to include the directories used in linking |
| @R{}. This is recorded as the default setting of |
| @env{R_LD_LIBRARY_PATH} in the shell script |
| @file{@var{R_HOME}/etc@var{R_ARCH}/ldpaths}. |
| |
| @item |
| Processing some of the arguments, for example to run @R{} under a |
| debugger and to launch alternative front-ends to provide GUIs. |
| @end itemize |
| |
| @noindent |
| The first two of these can be achieved for your front-end by running it |
| @emph{via} @command{R CMD}. So, for example |
| |
| @example |
| R CMD /usr/local/lib/R/bin/exec/R |
| R CMD exec/R |
| @end example |
| |
| @noindent |
| will both work in a standard @R{} installation. (@command{R CMD} looks |
| first for executables in @file{@var{R_HOME}/bin}. These command-lines |
| need modification if a sub-architecture is in use.) If you do not want |
| to run your front-end in this way, you need to ensure that @env{R_HOME} |
| is set and @env{LD_LIBRARY_PATH} is suitable. (The latter may well |
| already be the case, but modern Unix/Linux systems do not normally |
| include @file{/usr/local/lib} (@file{/usr/local/lib64} on some |
| architectures) in the run-time library search path, and @R{} does look |
| there for system components.) |
| |
| The other senses in which this example is too simple are that all the |
| internal defaults are used and that control is handed over to the |
| @R{} main loop. There are a number of small examples@footnote{but these |
| are not part of the automated test procedures and so little tested.} in the |
| @file{tests/Embedding} directory. These make use of |
| @code{Rf_initEmbeddedR} in @file{src/main/Rembedded.c}, and essentially |
| use |
| @example |
| #include <Rembedded.h> |
| |
| int main(int argc, char **argv) |
| @{ |
| /* do some setup */ |
| Rf_initEmbeddedR(argc, argv); |
| /* do some more setup */ |
| |
| /* submit some code to R, which is done interactively via |
| run_Rmainloop(); |
| |
| A possible substitute for a pseudo-console is |
| |
| R_ReplDLLinit(); |
| while(R_ReplDLLdo1() > 0) @{ |
| /* add user actions here if desired */ |
| @} |
| |
| */ |
| Rf_endEmbeddedR(0); |
| /* final tidying up after R is shutdown */ |
| return 0; |
| @} |
| @end example |
| |
| @noindent |
| If you do not want to pass @R{} arguments, you can fake an @code{argv} |
| array, for example by |
| |
| @example |
| char *argv[]= @{"REmbeddedPostgres", "--silent"@}; |
| Rf_initEmbeddedR(sizeof(argv)/sizeof(argv[0]), argv); |
| @end example |
| |
| However, to make a GUI we usually do want to run @code{run_Rmainloop} |
| after setting up various parts of @R{} to talk to our GUI, and arranging |
| for our GUI callbacks to be called during the @R{} mainloop. |
| |
| One issue to watch is that on some platforms @code{Rf_initEmbeddedR} and |
| @code{Rf_endEmbeddedR} change the settings of the FPU (e.g.@: to allow |
| errors to be trapped and to make use of extended precision registers). |
| |
| The standard code sets up a session temporary directory in the usual |
| way, @emph{unless} @code{R_TempDir} is set to a non-NULL value before |
| @code{Rf_initEmbeddedR} is called. In that case the value is assumed to |
| name an existing writable directory (no check is done), and it is not |
| cleaned up when @R{} is shut down. |
| |
| @code{Rf_initEmbeddedR} sets @R{} to be in interactive mode: you can set |
| @code{R_Interactive} (defined in @file{Rinterface.h}) subsequently to |
| change this. |
| |
| Note that @R{} expects to be run with the locale category |
| @samp{LC_NUMERIC} set to its default value of @code{C}, and so should |
| not be embedded into an application which changes that. |
| |
| It is the user's responsibility to ensure that @R{} is initialized only |
| once. To protect the @R{} interpreter, @code{Rf_initialize_R} will exit |
| the process if re-initialization is attempted. |
| |
| @menu |
| * Compiling against the R library:: |
| * Setting R callbacks:: |
| * Registering symbols:: |
| * Meshing event loops:: |
| * Threading issues:: |
| @end menu |
| |
| @node Compiling against the R library, Setting R callbacks, Embedding R under Unix-alikes, Embedding R under Unix-alikes |
| @subsection Compiling against the R library |
| |
| Suitable flags to compile and link against the @R{} (shared or static) |
| library can be found by |
| |
| @example |
| R CMD config --cppflags |
| R CMD config --ldflags |
| @end example |
| |
| @noindent |
| (These apply only to an uninstalled copy or a standard install.) |
| |
| If @R{} is installed, @code{pkg-config} is available and neither |
| sub-architectures nor a macOS framework have been used, alternatives for |
| a shared @R{} library are |
| |
| @example |
| pkg-config --cflags libR |
| pkg-config --libs libR |
| @end example |
| |
| @noindent |
| and for a static @R{} library |
| |
| @example |
| pkg-config --cflags libR |
| pkg-config --libs --static libR |
| @end example |
| |
| @noindent |
| (This may work for an installed macOS framework if @code{pkg-config} is |
| taught where to look for @file{libR.pc}: it is installed inside the |
| framework.) |
| |
| However, a more comprehensive way is to set up a @file{Makefile} to |
| compile the front-end. Suppose file @file{myfe.c} is to be compiled to |
| @file{myfe}. A suitable @file{Makefile} might be |
| |
| @example |
| ## WARNING: does not work when $@{R_HOME@} contains spaces |
| include $@{R_HOME@}/etc$@{R_ARCH@}/Makeconf |
| all: myfe |
| |
| ## The following is not needed, but avoids PIC flags. |
| myfe.o: myfe.c |
| $(CC) $(ALL_CPPFLAGS) $(CFLAGS) -c myfe.c -o $@@ |
| |
| ## replace $(LIBR) $(LIBS) by $(STATIC_LIBR) if R was built with a static libR |
| myfe: myfe.o |
| $(MAIN_LINK) -o $@@ myfe.o $(LIBR) $(LIBS) |
| @end example |
| |
| @noindent |
| invoked as |
| |
| @example |
| R CMD make |
| R CMD myfe |
| @end example |
| |
| Even though not recommended, @code{$@{R_HOME@}} may contain spaces. In |
| that case, it cannot be passed as an argument to @code{include} in the |
| makefile. Instead, one can instruct @command{make} using the @code{-f} |
| option to include @file{Makeconf}, for example @emph{via} recursive |
| invocation of @command{make}, see @ref{Writing portable packages}. |
| |
| @example |
| all: |
| $(MAKE) -f "$@{R_HOME@}/etc$@{R_ARCH@}/Makeconf" -f Makefile.inner |
| @end example |
| |
| Additional flags which @code{$(MAIN_LINK)} includes are, amongst others, |
| those to select OpenMP and @option{--export-dynamic} for the GNU linker |
| on some platforms. In principle @code{$(LIBS)} is not needed |
| when using a shared @R{} library as @file{libR} is linked against |
| those libraries, but some platforms need the executable also linked |
| against them. |
| @c E.g. it seems current Linux needs the executable linked against -lm. |
| |
| @node Setting R callbacks, Registering symbols, Compiling against the R library, Embedding R under Unix-alikes |
| @subsection Setting R callbacks |
| |
| For Unix-alikes there is a public header file @file{Rinterface.h} that |
| makes it possible to change the standard callbacks used by @R{} in a |
| documented way. This defines pointers (if @code{R_INTERFACE_PTRS} is |
| defined) |
| |
| @example |
| extern void (*ptr_R_Suicide)(const char *); |
| extern void (*ptr_R_ShowMessage)(const char *); |
| extern int (*ptr_R_ReadConsole)(const char *, unsigned char *, int, int); |
| extern void (*ptr_R_WriteConsole)(const char *, int); |
| extern void (*ptr_R_WriteConsoleEx)(const char *, int, int); |
| extern void (*ptr_R_ResetConsole)(); |
| extern void (*ptr_R_FlushConsole)(); |
| extern void (*ptr_R_ClearerrConsole)(); |
| extern void (*ptr_R_Busy)(int); |
| extern void (*ptr_R_CleanUp)(SA_TYPE, int, int); |
| extern int (*ptr_R_ShowFiles)(int, const char **, const char **, |
| const char *, Rboolean, const char *); |
| extern int (*ptr_R_ChooseFile)(int, char *, int); |
| extern int (*ptr_R_EditFile)(const char *); |
| extern void (*ptr_R_loadhistory)(SEXP, SEXP, SEXP, SEXP); |
| extern void (*ptr_R_savehistory)(SEXP, SEXP, SEXP, SEXP); |
| extern void (*ptr_R_addhistory)(SEXP, SEXP, SEXP, SEXP); |
| // added in R 3.0.0 |
| extern int (*ptr_R_EditFiles)(int, const char **, const char **, const char *); |
| extern SEXP (*ptr_do_selectlist)(SEXP, SEXP, SEXP, SEXP); |
| extern SEXP (*ptr_do_dataentry)(SEXP, SEXP, SEXP, SEXP); |
| extern SEXP (*ptr_do_dataviewer)(SEXP, SEXP, SEXP, SEXP); |
| extern void (*ptr_R_ProcessEvents)(); |
| @end example |
| |
| @noindent |
| which allow standard @R{} callbacks to be redirected to your GUI. What |
| these do is generally documented in the file @file{src/unix/system.txt}. |
| |
| @deftypefun void R_ShowMessage (char *@var{message}) |
| This should display the message, which may have multiple lines: it |
| should be brought to the user's attention immediately. |
| @end deftypefun |
| |
| @deftypefun void R_Busy (int @var{which}) |
| This function invokes actions (such as change of cursor) when @R{} |
| embarks on an extended computation (@code{@var{which}=1}) and when such |
| a state terminates (@code{@var{which}=0}). |
| @end deftypefun |
| |
| @deftypefun int R_ReadConsole (const char *@var{prompt}, unsigned char *@var{buf}, @ |
| int @var{buflen}, int @var{hist}) |
| @deftypefunx void R_WriteConsole (const char *@var{buf}, int @var{buflen}) |
| @deftypefunx void R_WriteConsoleEx (const char *@var{buf}, int @var{buflen}, int @var{otype}) |
| @deftypefunx void R_ResetConsole () |
| @deftypefunx void R_FlushConsole () |
| @deftypefunx void R_ClearerrConsole () |
| |
| These functions interact with a console. |
| |
| @code{R_ReadConsole} prints the given prompt at the console and then |
| does a @code{fgets(3)}--like operation, transferring up to @var{buflen} |
| characters into the buffer @var{buf}. The last two bytes should be |
| set to @samp{"\n\0"} to preserve sanity. If @var{hist} is non-zero, |
| then the line should be added to any command history which is being |
| maintained. The return value is 0 if no input is available and >0 |
| otherwise. |
| |
| @code{R_WriteConsoleEx} writes the given buffer to the console; |
| @var{otype} specifies the output type (regular output or |
| warning/error). A call to @code{R_WriteConsole(buf, buflen)} is equivalent |
| to @code{R_WriteConsoleEx(buf, buflen, 0)}. To ensure backward |
| compatibility of the callbacks, @code{ptr_R_WriteConsoleEx} is used only |
| if @code{ptr_R_WriteConsole} is set to @code{NULL}. To ensure that |
| @code{stdout()} and @code{stderr()} connections point to the console, |
| set the corresponding files to @code{NULL} @emph{via} |
| @example |
| R_Outputfile = NULL; |
| R_Consolefile = NULL; |
| @end example |
| |
| @code{R_ResetConsole} is called when the system is reset after an error. |
| @code{R_FlushConsole} is called to flush any pending output to the |
| system console. @code{R_ClearerrConsole} clears any errors associated |
| with reading from the console. |
| @end deftypefun |
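As an illustration of the conventions just described, here is a minimal sketch of console callbacks for a front-end that talks to plain C streams. The names @code{fe_ReadConsole} and @code{fe_WriteConsoleEx} and the streams @code{fe_in}, @code{fe_out} and @code{fe_err} are hypothetical; the functions have the signatures of @code{ptr_R_ReadConsole} and @code{ptr_R_WriteConsoleEx} but do not need the @R{} headers, and no command history is kept.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical front-end streams; NULL means stdin/stdout/stderr. */
static FILE *fe_in = NULL, *fe_out = NULL, *fe_err = NULL;

/* Signature of *ptr_R_ReadConsole: returns 0 if no input, >0 otherwise. */
int fe_ReadConsole(const char *prompt, unsigned char *buf, int buflen, int hist)
{
    FILE *in = fe_in ? fe_in : stdin;
    FILE *out = fe_out ? fe_out : stdout;

    (void) hist;                      /* this sketch keeps no history */
    fputs(prompt, out);
    fflush(out);
    if (buflen < 2 || fgets((char *) buf, buflen, in) == NULL)
        return 0;                     /* end of input */

    size_t n = strlen((char *) buf);
    if (n == 0 || buf[n - 1] != '\n') {
        /* make the buffer end in "\n\0", truncating an over-long line */
        if (n > (size_t) buflen - 2)
            n = (size_t) buflen - 2;
        buf[n] = '\n';
        buf[n + 1] = '\0';
    }
    return 1;
}

/* Signature of *ptr_R_WriteConsoleEx: otype 0 is regular output,
   any other value is treated as warning/error output. */
void fe_WriteConsoleEx(const char *buf, int buflen, int otype)
{
    FILE *out = (otype == 0) ? (fe_out ? fe_out : stdout)
                             : (fe_err ? fe_err : stderr);
    fwrite(buf, 1, (size_t) buflen, out);
}
```

In a real front-end these would be installed by assigning them to @code{ptr_R_ReadConsole} and @code{ptr_R_WriteConsoleEx} (with @code{R_INTERFACE_PTRS} defined before including @file{Rinterface.h}) and setting @code{ptr_R_WriteConsole} to @code{NULL}, as described above.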
| |
| @deftypefun int R_ShowFiles (int @var{nfile}, const char **@var{file}, @ |
| const char **@var{headers}, const char *@var{wtitle}, Rboolean @var{del}, @ |
| const char *@var{pager}) |
| |
| This function is used to display the contents of files. |
| @end deftypefun |
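A minimal sketch of such a callback for a text-only front-end follows. The name @code{fe_ShowFiles} is hypothetical, @code{int} stands in for @code{Rboolean} so that it compiles without the @R{} headers, @var{del} is taken to mean that the files should be removed after they have been shown, and the return values (0 on success, 1 if there is nothing to show) are assumptions of this sketch.

```c
#include <stdio.h>

/* Shape of *ptr_R_ShowFiles, with int in place of Rboolean.
   Copies each file, preceded by its header, to stdout. */
int fe_ShowFiles(int nfile, const char **file, const char **headers,
                 const char *wtitle, int del, const char *pager)
{
    (void) pager;                     /* no external pager in this sketch */
    if (nfile <= 0)
        return 1;                     /* nothing to show */
    if (wtitle && *wtitle)
        printf("== %s ==\n", wtitle);

    for (int i = 0; i < nfile; i++) {
        if (headers && headers[i] && *headers[i])
            printf("%s\n", headers[i]);
        FILE *fp = fopen(file[i], "r");
        if (fp == NULL) {
            printf("cannot open file '%s'\n", file[i]);
            continue;
        }
        int c;
        while ((c = getc(fp)) != EOF)
            putchar(c);
        fclose(fp);
        if (del)
            remove(file[i]);          /* caller asked for the file to go */
    }
    return 0;
}
```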
| |
| @deftypefun int R_ChooseFile (int @var{new}, char *@var{buf}, @ |
| int @var{len}) |
| |
| Choose a file and return its name in @var{buf} of length @var{len}. |
| Return value is 0 for success, > 0 otherwise. |
| @end deftypefun |
| |
| @deftypefun int R_EditFile (const char *@var{buf}) |
| Send a file to an editor window. |
| @end deftypefun |
| |
| @deftypefun int R_EditFiles (int @var{nfile}, const char **@var{file}, const char **@var{title}, const char *@var{editor}) |
| Send @var{nfile} files to an editor, with titles possibly to be used for |
| the editor window(s). |
| @end deftypefun |
| |
| @deftypefun SEXP R_loadhistory (SEXP, SEXP, SEXP, SEXP); |
| @deftypefunx SEXP R_savehistory (SEXP, SEXP, SEXP, SEXP); |
| @deftypefunx SEXP R_addhistory (SEXP, SEXP, SEXP, SEXP); |
| |
| @code{.Internal} functions for @code{loadhistory}, @code{savehistory} |
| and @code{timestamp}. |
| |
| If the console has no history mechanism these can be as |
| simple as |
| |
| @example |
| SEXP R_loadhistory (SEXP call, SEXP op, SEXP args, SEXP env) |
| @{ |
| errorcall(call, "loadhistory is not implemented"); |
| return R_NilValue; |
| @} |
| SEXP R_savehistory (SEXP call, SEXP op , SEXP args, SEXP env) |
| @{ |
| errorcall(call, "savehistory is not implemented"); |
| return R_NilValue; |
| @} |
| SEXP R_addhistory (SEXP call, SEXP op , SEXP args, SEXP env) |
| @{ |
| return R_NilValue; |
| @} |
| @end example |
| |
| The @code{R_addhistory} function should return silently if no history |
| mechanism is present, as a user may be calling @code{timestamp} purely |
| to write the time stamp to the console. |
| @end deftypefun |
| |
| @deftypefun void R_Suicide (const char *@var{message}) |
| This should abort @R{} as rapidly as possible, displaying the message. |
| A possible implementation is |
| |
| @example |
| void R_Suicide (const char *message) |
| @{ |
| char pp[1024]; |
| snprintf(pp, sizeof(pp), "Fatal error: %s\n", message); |
| R_ShowMessage(pp); |
| R_CleanUp(SA_SUICIDE, 2, 0); |
| @} |
| @end example |
| @end deftypefun |
| |
| @deftypefun void R_CleanUp (SA_TYPE @var{saveact}, int @var{status}, @ |
| int @var{RunLast}) |
| |
| This function invokes any actions which occur at system termination. |
| It needs to be quite complex: |
| |
| @example |
| #include <Rinterface.h> |
| #include <Rembedded.h> /* for Rf_KillAllDevices */ |
| |
| void R_CleanUp (SA_TYPE saveact, int status, int RunLast) |
| @{ |
| if(saveact == SA_DEFAULT) saveact = SaveAction; |
| if(saveact == SA_SAVEASK) @{ |
| /* ask what to do and set saveact */ |
| @} |
| switch (saveact) @{ |
| case SA_SAVE: |
| if(RunLast) R_dot_Last(); |
| if(R_DirtyImage) R_SaveGlobalEnv(); |
| /* save the console history in R_HistoryFile */ |
| break; |
| case SA_NOSAVE: |
| if(RunLast) R_dot_Last(); |
| break; |
| case SA_SUICIDE: |
| default: |
| break; |
| @} |
| |
| R_RunExitFinalizers(); |
| /* clean up after the editor e.g. CleanEd() */ |
| |
| R_CleanTempDir(); |
| |
| /* close all the graphics devices */ |
| if(saveact != SA_SUICIDE) Rf_KillAllDevices(); |
| fpu_setup(FALSE); |
| |
| exit(status); |
| @} |
| @end example |
| @end deftypefun |
| |
| These callbacks should never be changed in a running @R{} session (and |
| hence cannot be set from an extension package). |
| |
| @deftypefun SEXP R_dataentry (SEXP, SEXP, SEXP, SEXP); |
| @deftypefunx SEXP R_dataviewer (SEXP, SEXP, SEXP, SEXP); |
| @deftypefunx SEXP R_selectlist (SEXP, SEXP, SEXP, SEXP); |
| |
| @code{.External} functions for @code{dataentry} (and @code{edit} on |
| matrices and data frames), @code{View} and @code{select.list}. These |
| can be changed if they are not currently in use. |
| @end deftypefun |
| |
| |
| @node Registering symbols, Meshing event loops, Setting R callbacks, Embedding R under Unix-alikes |
| @subsection Registering symbols |
| |
| An application embedding @R{} needs a different way of registering |
| symbols because it is not a dynamic library loaded by @R{} as would be |
| the case with a package. Therefore @R{} reserves a special |
| @code{DllInfo} entry for the embedding application such that it can |
| register symbols to be used with @code{.C}, @code{.Call} etc. This |
| entry can be obtained by calling @code{R_getEmbeddingDllInfo}, so a |
| typical use is |
| |
| @example |
| DllInfo *info = R_getEmbeddingDllInfo(); |
| R_registerRoutines(info, cMethods, callMethods, NULL, NULL); |
| @end example |
| |
| The native routines defined by @code{cMethods} and @code{callMethods} |
| should be present in the embedding application. See @ref{Registering |
| native routines} for details on registering symbols in general. |
| |
| |
| @node Meshing event loops, Threading issues, Registering symbols, Embedding R under Unix-alikes |
| @subsection Meshing event loops |
| |
| One of the most difficult issues in interfacing @R{} to a front-end is |
| the handling of event loops, at least if a single thread is used. @R{} |
| uses events and timers for |
| |
| @itemize |
| @item |
| Running X11 windows such as the graphics device and data editor, and |
| interacting with them (e.g., using @code{locator()}). |
| |
| @item |
| Supporting Tcl/Tk events for the @pkg{tcltk} package (for at least the |
| X11 version of Tk). |
| |
| @item |
| Preparing input. |
| |
| @item |
| Timing operations, for example for profiling @R{} code and |
| @code{Sys.sleep()}. |
| |
| @item |
| Interrupts, where permitted. |
| @end itemize |
| |
| @noindent |
| Specifically, the Unix-alike command-line version of @R{} runs separate |
| event loops for |
| |
| @itemize |
| @item |
| Preparing input at the console command-line, in file |
| @file{src/unix/sys-unix.c}. |
| |
| @item |
| Waiting for a response from a socket in the internal functions |
| underlying FTP and HTTP transfers in @code{download.file()} and for |
| direct socket access, in files |
| @file{src/@/modules/@/internet/@/nanoftp.c}, |
| @file{src/@/modules/@/internet/@/nanohttp.c} and |
| @file{src/@/modules/@/internet/@/Rsock.c} |
| |
| @item |
| Mouse and window events when displaying the X11-based dataentry window, |
| in file @file{src/modules/X11/dataentry.c}. This is regarded as |
| @emph{modal}, and no other events are serviced whilst it is active. |
| @end itemize |
| |
| There is a protocol for adding event handlers to the first two types of |
| event loops, using types and functions declared in the header |
| @file{R_ext/eventloop.h} and described in comments in file |
| @file{src/unix/sys-std.c}. It is possible to add (or remove) an input |
| handler for events on a particular file descriptor, or to set a polling |
| interval (@emph{via} @code{R_wait_usec}) and a function to be called |
| periodically @emph{via} @code{R_PolledEvents}: the polling mechanism is used by |
| the @pkg{tcltk} package. |
| |
| It is not intended that these facilities are used by packages, but if |
| they are needed exceptionally, the package should ensure that it cleans |
| up and removes its handlers when its namespace is unloaded. Note that |
| the header @file{sys/select.h} is needed@footnote{At least according to |
| POSIX 2004 and later. Earlier standards prescribed @file{sys/time.h}: |
| @file{R_ext/eventloop.h} will include it if @code{HAVE_SYS_TIME_H} is |
| defined.}: users should check this is available and define |
| @code{HAVE_SYS_SELECT_H} before including @file{R_ext/eventloop.h}. (It |
| is often the case that another header will include @file{sys/select.h} |
| before @file{eventloop.h} is processed, but this should not be relied |
| on.) |
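| |
| The following self-contained sketch illustrates only the underlying |
| idea, using plain POSIX @code{select()} rather than @R{}'s own |
| functions: it waits on one file descriptor for at most a polling |
| interval, calls an input handler if data arrive, and otherwise calls a |
| periodic function, roughly the roles played by input handlers and by |
| @code{R_wait_usec} and @code{R_PolledEvents}. The names |
| @code{wait_once}, @code{read_byte} and @code{count_poll} are |
| hypothetical and are not part of @file{R_ext/eventloop.h}. |
| |
| @example |
| /* Sketch only: plain POSIX, not R's event-loop API. */ |
| #define _POSIX_C_SOURCE 200809L |
| #include <stddef.h> |
| #include <sys/select.h> |
| #include <sys/time.h> |
| #include <unistd.h> |
| |
| typedef void (*fd_handler)(int fd, void *data);  /* hypothetical */ |
| typedef void (*polled_fn)(void);                  /* hypothetical */ |
| |
| /* Wait up to wait_usec microseconds for fd to become readable.  Call |
|    on_input if it did, otherwise call polled (if non-NULL). |
|    Returns 1 if input was handled, 0 on timeout, -1 on error. */ |
| int wait_once(int fd, fd_handler on_input, void *data, |
|               long wait_usec, polled_fn polled) |
| @{ |
|     fd_set readfds; |
|     struct timeval tv; |
|     int n; |
| |
|     FD_ZERO(&readfds); |
|     FD_SET(fd, &readfds); |
|     tv.tv_sec = wait_usec / 1000000; |
|     tv.tv_usec = wait_usec % 1000000; |
| |
|     n = select(fd + 1, &readfds, NULL, NULL, &tv); |
|     if (n < 0) |
|         return -1;                 /* e.g. interrupted by a signal */ |
|     if (n > 0 && FD_ISSET(fd, &readfds)) @{ |
|         on_input(fd, data); |
|         return 1; |
|     @} |
|     if (polled != NULL) |
|         polled();                  /* timed out: service other sources */ |
|     return 0; |
| @} |
| |
| /* Example input handler: read one byte into *(char *)data. */ |
| static void read_byte(int fd, void *data) |
| @{ |
|     if (read(fd, data, 1) != 1) |
|         *(char *)data = '\0'; |
| @} |
| |
| /* Example polled function: count how often it is called. */ |
| static int n_polls = 0; |
| static void count_poll(void) @{ n_polls++; @} |
| @end example |
| |
| For example, with a pipe @code{p} created by @code{pipe(p)}, the call |
| @code{wait_once(p[0], read_byte, &c, 1000, count_poll)} returns 0 |
| after about a millisecond and calls @code{count_poll} while the pipe is |
| empty, and returns 1 with the byte stored in @code{c} once one has been |
| written to @code{p[1]}. |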
| |
| An alternative front-end needs both to make provision for other @R{} |
| events whilst waiting for input, and to ensure that it is not frozen out |
| during events of the second type. The ability to add a polled handler |
| as @code{R_timeout_handler} is used by the @pkg{tcltk} package. |
| |
| |
| @node Threading issues, , Meshing event loops, Embedding R under Unix-alikes |
| @subsection Threading issues |
| |
| Embedded @R{} is designed to be run in the main thread, and all the |
| testing is done in that context. There is a potential issue with the |
| stack-checking mechanism where threads are involved. This uses two |
| variables declared in @file{Rinterface.h} (if @code{CSTACK_DEFNS} is |
| defined) as |
| |
| @example |
| extern uintptr_t R_CStackLimit; /* C stack limit */ |
| extern uintptr_t R_CStackStart; /* Initial stack address */ |
| @end example |
| |
| @noindent |
| Note that @code{uintptr_t} is an optional C99 type for which a |
| substitute is defined in @R{}, so your code needs to define |
| @code{HAVE_UINTPTR_T} appropriately. To do so, test if the type is |
| defined in C header @file{stdint.h} or C++ header @file{cstdint} and if |
| so include the header and define @code{HAVE_UINTPTR_T} before including |
| @file{Rinterface.h}. (For C code one can simply include |
| @file{Rconfig.h}, possibly @emph{via} @file{R.h}, and for C++11 code |
| @file{Rinterface.h} will include the header @file{cstdint}.) |
| |
| These will be set@footnote{at least on platforms where the values are |
| available, that is having @code{getrlimit} and on Linux or having |
| @code{sysctl} supporting @code{KERN_USRSTACK}, including FreeBSD and OS |
| X.} when @code{Rf_initialize_R} is called, to values appropriate to the |
| main thread. Stack-checking can be disabled by setting |
| @code{R_CStackLimit = (uintptr_t)-1} immediately after |
| @code{Rf_initialize_R} is called, but it is better, if possible, to set |
| appropriate values. (What these are and how to determine them are |
| OS-specific, and the stack size limit may differ for secondary threads. |
| If you have a choice of stack size, at least 10Mb is recommended.) |
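| |
| On a platform with @code{getrlimit}, the stack-size limit that applies |
| to the main thread can be queried as in the following sketch. It is |
| an illustration, not a description of how @R{} computes its values: the |
| function name is hypothetical, it says nothing about |
| @code{R_CStackStart}, and it does not report the stack size of a |
| secondary thread. An unlimited stack is mapped to the same all-ones |
| value, @code{(uintptr_t)-1}, that is used above to disable checking. |
| |
| @example |
| #include <stdint.h> |
| #include <sys/resource.h> |
| |
| /* Soft limit (bytes) on the stack of the calling process as reported |
|    by getrlimit(): UINTPTR_MAX if unlimited, 0 if unavailable. */ |
| uintptr_t stack_soft_limit(void) |
| @{ |
|     struct rlimit rl; |
| |
|     if (getrlimit(RLIMIT_STACK, &rl) != 0) |
|         return 0; |
|     if (rl.rlim_cur == RLIM_INFINITY) |
|         return UINTPTR_MAX;          /* equals (uintptr_t)-1 */ |
|     return (uintptr_t) rl.rlim_cur; |
| @} |
| @end example |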
| |
| You may also want to consider how signals are handled: @R{} sets signal |
| handlers for several signals, including @code{SIGINT}, @code{SIGSEGV}, |
| @code{SIGPIPE}, @code{SIGUSR1} and @code{SIGUSR2}, but these can all be |
| suppressed by setting the variable @code{R_SignalHandlers} (declared in |
| @file{Rinterface.h}) to @code{0}. |
| |
| Note that these variables must not be changed by an @R{} |
| @strong{package}: a package should not call @R{} internals which |
| make use of the stack-checking mechanism on a secondary thread. |
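| |
| A front-end which sets @code{R_SignalHandlers} to @code{0} may want to |
| install handlers of its own. A common pattern, sketched below with |
| POSIX @code{sigaction} and no @R{} headers, is for the handler to do no |
| more than set a flag which the front-end's own loop checks later, much |
| as @code{my_onintr} sets @code{UserBreak} in the Windows front-end |
| @file{rtest.c}. The names @code{interrupt_requested} and |
| @code{install_sigint_handler} are hypothetical. |
| |
| @example |
| #define _POSIX_C_SOURCE 200809L |
| #include <signal.h> |
| #include <string.h> |
| |
| /* Set by the handler, polled by the front-end's own loop. */ |
| static volatile sig_atomic_t interrupt_requested = 0; |
| |
| static void on_sigint(int sig) |
| @{ |
|     (void) sig; |
|     interrupt_requested = 1;   /* only async-signal-safe work here */ |
| @} |
| |
| /* Install the front-end's SIGINT handler; returns 0 on success. */ |
| int install_sigint_handler(void) |
| @{ |
|     struct sigaction sa; |
| |
|     memset(&sa, 0, sizeof sa); |
|     sa.sa_handler = on_sigint; |
|     sigemptyset(&sa.sa_mask); |
|     sa.sa_flags = 0; |
|     return sigaction(SIGINT, &sa, NULL); |
| @} |
| @end example |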
| |
| @node Embedding R under Windows, , Embedding R under Unix-alikes, Linking GUIs and other front-ends to R |
| @section Embedding R under Windows |
| |
| All Windows interfaces to @R{} call entry points in the DLL |
| @file{R.dll}, directly or indirectly. Simpler applications may find it |
| easier to use the indirect route @emph{via} @acronym{(D)COM}. |
| |
| @menu |
| * Using (D)COM:: |
| * Calling R.dll directly:: |
| * Finding R_HOME:: |
| @end menu |
| |
| @node Using (D)COM, Calling R.dll directly, Embedding R under Windows, Embedding R under Windows |
| @subsection Using (D)COM |
| |
| @acronym{(D)COM} is a standard Windows mechanism used for communication |
| between Windows applications. One application (here @R{}) is run as a COM |
| server which offers services to clients, here the front-end calling |
| application. The services are described in a `Type Library' and are |
| (more or less) language-independent, so the calling application can be |
| written in C or C++ or Visual Basic or Perl or Python and so on. |
| The `D' in (D)COM refers to `distributed', as the client and server can |
| be running on different machines. |
| |
| The basic @R{} distribution is not a (D)COM server, but two add-ons are |
| currently available that interface directly with @R{} and provide a |
| (D)COM server: |
| @itemize |
| @item |
| There is a (D)COM server called @code{StatConnector} written by Thomas |
| Baier available @emph{via} @uref{http://www.autstat.com/}, |
| which works with @R{} packages to support transfer of data to and from |
| @R{} and remote execution of @R{} commands, as well as embedding of an |
| @R{} graphics window. |
| |
| Recent versions have usage restrictions. |
| |
| @item |
| Another (D)COM server, @code{RDCOMServer}, may be available from Omegahat, |
| @uref{http://www.omegahat.net/}. Its philosophy is discussed in |
| @uref{http://www.omegahat.net/@/RDCOMServer/@/Docs/@/Paradigm.html} and is |
| very different from the purpose of this section. |
| @end itemize |
| |
| @node Calling R.dll directly, Finding R_HOME, Using (D)COM, Embedding R under Windows |
| @subsection Calling R.dll directly |
| |
| The @code{R} DLL is mainly written in C and has @code{_cdecl} entry |
| points. Calling it directly will be tricky except from C code (or C++ |
| with a little care). |
| |
| There is a version of the Unix-alike interface calling |
| |
| @example |
| int Rf_initEmbeddedR(int ac, char **av); |
| void Rf_endEmbeddedR(int fatal); |
| @end example |
| |
| @noindent |
| which are entry points in @file{R.dll}. Examples of their use (and a |
| suitable @file{Makefile.win}) can be found in the @file{tests/Embedding} |
| directory of the sources. You may need to ensure that |
| @file{@var{R_HOME}/bin} is in your @env{PATH} so the @R{} DLLs are found. |
| |
| Examples of calling @file{R.dll} directly are provided in the directory |
| @file{src/@/gnuwin32/@/front-ends}, including a simple command-line |
| front end @file{rtest.c} whose code is |
| |
| @smallexample |
| #define Win32 |
| #include <windows.h> |
| #include <stdio.h> |
| #include <stdlib.h>   /* exit */ |
| #include <string.h>   /* strcmp */ |
| #include <Rversion.h> |
| #define LibExtern __declspec(dllimport) extern |
| #include <Rembedded.h> |
| #include <R_ext/RStartup.h> |
| /* for askok and askyesnocancel */ |
| #include <graphapp.h> |
| |
| /* for signal-handling code */ |
| #include <psignal.h> |
| |
| /* simple input, simple output */ |
| |
| /* This version blocks all events: a real one needs to call ProcessEvents |
|    frequently. See rterm.c and ../system.c for one approach using |
|    a separate thread for input. |
| */ |
| int myReadConsole(const char *prompt, char *buf, int len, int addtohistory) |
| @{ |
|     fputs(prompt, stdout); |
|     fflush(stdout); |
|     if(fgets(buf, len, stdin)) return 1; else return 0; |
| @} |
| |
| void myWriteConsole(const char *buf, int len) |
| @{ |
|     printf("%s", buf); |
| @} |
| |
| void myCallBack(void) |
| @{ |
|     /* called during i/o, eval, graphics in ProcessEvents */ |
| @} |
| |
| void myBusy(int which) |
| @{ |
|     /* set a busy cursor ... if which = 1, unset if which = 0 */ |
| @} |
| |
| static void my_onintr(int sig) @{ UserBreak = 1; @} |
| |
| int main (int argc, char **argv) |
| @{ |
|     structRstart rp; |
|     Rstart Rp = &rp; |
|     char Rversion[25], *RHome; |
| |
|     sprintf(Rversion, "%s.%s", R_MAJOR, R_MINOR); |
|     if(strcmp(getDLLVersion(), Rversion) != 0) @{ |
|         fprintf(stderr, "Error: R.DLL version does not match\n"); |
|         exit(1); |
|     @} |
| |
|     R_setStartTime(); |
|     R_DefParams(Rp); |
|     if((RHome = get_R_HOME()) == NULL) @{ |
|         fprintf(stderr, "R_HOME must be set in the environment or Registry\n"); |
|         exit(1); |
|     @} |
|     Rp->rhome = RHome; |
|     Rp->home = getRUser(); |
|     Rp->CharacterMode = LinkDLL; |
|     Rp->ReadConsole = myReadConsole; |
|     Rp->WriteConsole = myWriteConsole; |
|     Rp->CallBack = myCallBack; |
|     Rp->ShowMessage = askok; |
|     Rp->YesNoCancel = askyesnocancel; |
|     Rp->Busy = myBusy; |
| |
|     Rp->R_Quiet = TRUE;        /* Default is FALSE */ |
|     Rp->R_Interactive = FALSE; /* Default is TRUE */ |
|     Rp->RestoreAction = SA_RESTORE; |
|     Rp->SaveAction = SA_NOSAVE; |
|     R_SetParams(Rp); |
|     R_set_command_line_arguments(argc, argv); |
| |
|     FlushConsoleInputBuffer(GetStdHandle(STD_INPUT_HANDLE)); |
| |
|     signal(SIGBREAK, my_onintr); |
|     GA_initapp(0, 0); |
|     readconsolecfg(); |
|     setup_Rmainloop(); |
| #ifdef SIMPLE_CASE |
|     run_Rmainloop(); |
| #else |
|     R_ReplDLLinit(); |
|     while(R_ReplDLLdo1() > 0) @{ |
|         /* add user actions here if desired */ |
|     @} |
|     /* only get here on EOF (not q()) */ |
| #endif |
|     Rf_endEmbeddedR(0); |
|     return 0; |
| @} |
| @end smallexample |
| |
| The ideas are |
| |
| @itemize |
| @item |
| Check that the front-end and the linked @file{R.dll} match -- other |
| front-ends may allow a looser match. |
| |
| @item |
| Find and set the @R{} home directory and the user's home directory. The |
| former may be available from the Windows Registry: it will be in |
| @code{HKEY_LOCAL_MACHINE\Software\R-core\R\InstallPath} from an |
| administrative install and |
| @code{HKEY_CURRENT_USER\Software\R-core\R\InstallPath} otherwise, if |
| selected during installation (as it is by default). |
| |
| @item |
| Define startup conditions and callbacks @emph{via} the @code{Rstart} structure. |
| @code{R_DefParams} sets the defaults, and @code{R_SetParams} sets |
| updated values. |
| |
| @item |
| Record the command-line arguments @emph{via} |
| @code{R_set_command_line_arguments} for use by the @R{} function |
| @code{commandArgs()}. |
| |
| @item |
| Set up the signal handler and the basic user interface. |
| |
| @item |
| Run the main @R{} loop, possibly with our actions intermeshed. |
| |
| @item |
| Arrange to clean up. |
| @end itemize |
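| |
| The version check in @file{rtest.c} demands an exact match. A |
| front-end content with a looser one might, for example, accept any |
| @file{R.dll} from the same @var{x}.@var{y} series; whether that is |
| safe for a particular front-end is an assumption it must verify for |
| itself. A sketch of such a test, with a hypothetical helper name: |
| |
| @example |
| #include <string.h> |
| |
| /* Nonzero if version strings a and b (such as "3.0.2") agree in |
|    their first two dot-separated components. */ |
| int same_major_minor(const char *a, const char *b) |
| @{ |
|     const char *da = strchr(a, '.'), *db = strchr(b, '.'); |
|     size_t la, lb; |
| |
|     if (da == NULL || db == NULL) |
|         return 0; |
|     da = strchr(da + 1, '.');    /* second dot, if any */ |
|     db = strchr(db + 1, '.'); |
|     la = da ? (size_t)(da - a) : strlen(a); |
|     lb = db ? (size_t)(db - b) : strlen(b); |
|     return la == lb && strncmp(a, b, la) == 0; |
| @} |
| @end example |
| |
| It would replace the @code{strcmp} test, as in |
| @code{if (!same_major_minor(getDLLVersion(), Rversion))}. |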
| |
| An underlying theme is the need to keep the GUI `alive', and this has |
| not been done in this example. The @R{} callback @code{R_ProcessEvents} |
| needs to be called frequently to ensure that Windows events in @R{} |
| windows are handled expeditiously. Conversely, @R{} needs to allow the |
| GUI code (which is running in the same process) to update itself as |
| needed -- two ways are provided to allow this: |
| |
| @itemize |
| @item |
| @code{R_ProcessEvents} calls the callback registered by |
| @code{Rp->CallBack}. A version of this is used to process Tcl/Tk events |
| for package @pkg{tcltk} under Windows; the code is |
| |
| @example |
| void R_ProcessEvents(void) |
| @{ |
|     while (peekevent()) doevent(); /* Windows events for GraphApp */ |
|     if (UserBreak) @{ UserBreak = FALSE; onintr(); @} |
|     R_CallBackHook(); |
|     if(R_tcldo) R_tcldo(); |
| @} |
| @end example |
| |
| @item |
| The mainloop can be split up to allow the calling application to take |
| some action after each line of input has been dealt with: see the |
| alternative code below @code{#ifdef SIMPLE_CASE}. |
| @end itemize |
| |
| It may be that no @R{} GraphApp windows need to be considered, although |
| these include pagers, the @code{windows()} graphics device, the @R{} |
| data and script editors and various popups such as @code{choose.files()} |
| and @code{select.list()}. It would be possible to replace all of these, |
| but it seems easier to allow GraphApp to handle most of them. |
| |
| It is possible to run @R{} in a GUI in a single thread (as |
| @file{RGui.exe} shows) but it will normally be easier@footnote{An |
| attempt to use only threads in the late 1990s failed to work correctly |
| under Windows 95, the predominant version of Windows at that time.} to |
| use multiple threads. |
| |
| Note that @R{}'s own front ends use a stack size of 10Mb, whereas MinGW |
| executables default to 2Mb, and Visual C++ ones to 1Mb. The latter |
| stack sizes are too small for a number of @R{} applications, so |
| general-purpose front-ends should use a larger stack size. |
| |
| |
| @node Finding R_HOME, , Calling R.dll directly, Embedding R under Windows |
| @subsection Finding R_HOME |
| |
| Both applications which embed @R{} and those which use a @code{system} |
| call to invoke @R{} (as @command{Rscript.exe}, @command{Rterm.exe} or |
| @command{R.exe}) need to be able to find the @R{} @file{bin} directory. |
| The simplest way to do so is to ask the user to set an environment |
| variable @env{R_HOME} and use that, but naive users may be flummoxed as |
| to how to do so or what value to use. |
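| |
| Taking the environment-variable route, a front-end might form the path |
| of the @file{bin} directory as in this sketch. The function names are |
| hypothetical, and the sketch does not look for architecture-specific |
| subdirectories of @file{bin}. |
| |
| @example |
| #include <stdio.h> |
| #include <stdlib.h> |
| #include <string.h> |
| |
| /* Write "<rhome>\bin" into buf.  Returns 0 on success, -1 if rhome is |
|    NULL or empty or buf is too small. */ |
| int r_bin_dir_from(const char *rhome, char *buf, size_t n) |
| @{ |
|     size_t len; |
|     const char *sep; |
|     int m; |
| |
|     if (rhome == NULL || (len = strlen(rhome)) == 0) |
|         return -1; |
|     /* avoid doubling a trailing separator */ |
|     sep = (rhome[len - 1] == '\\' || rhome[len - 1] == '/') ? "" : "\\"; |
|     m = snprintf(buf, n, "%s%sbin", rhome, sep); |
|     return (m < 0 || (size_t) m >= n) ? -1 : 0; |
| @} |
| |
| /* The same, taking R_HOME from the environment. */ |
| int r_bin_dir(char *buf, size_t n) |
| @{ |
|     return r_bin_dir_from(getenv("R_HOME"), buf, n); |
| @} |
| @end example |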
| |
| The @R{} for Windows installers have for a long time allowed the value |
| of @code{R_HOME} to be recorded in the Windows Registry: this is |
| optional but selected by default. @emph{Where} it is recorded has |
| changed over the years to allow for multiple versions of @R{} to be |
| installed at once, and to allow 32- and 64-bit versions of @R{} to be |
| installed on the same machine. |
| |
| The basic Registry location is @code{Software\R-core\R}. For an |
| administrative install this is under @code{HKEY_LOCAL_MACHINE} and on a |
| 64-bit OS @code{HKEY_LOCAL_MACHINE\Software\R-core\R} is by default |
| redirected for a 32-bit application, so a 32-bit application will see |
| the information for the last 32-bit install, and a 64-bit application |
| that for the last 64-bit install. For a personal install, the |
| information is under @code{HKEY_CURRENT_USER\Software\R-core\R} which is |
| seen by both 32-bit and 64-bit applications and so records the last |
| install of either architecture. To circumvent this, there are locations |
| @code{Software\R-core\R32} and @code{Software\R-core\R64} which always |
| refer to one architecture. |
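| |
| For a given choice of which hive comes first, the locations just |
| described might be tried in the order built by the following sketch, |
| which looks at an architecture-specific key before the shared one. It |
| only lists (hive, subkey) pairs: reading the values would use the |
| Windows Registry API, which is not shown, and the type and function |
| names are hypothetical. |
| |
| @example |
| #include <stddef.h> |
| #include <string.h> |
| |
| /* Hypothetical type: a hive name and a subkey to look under. */ |
| typedef struct @{ const char *hive; const char *subkey; @} r_reg_location; |
| |
| /* Fill out[] (room for 4 entries) with locations to try, in order. |
|    arch is 32 or 64 if one architecture is wanted, 0 if either will do. |
|    Returns the number of entries written. */ |
| size_t r_registry_locations(int prefer_personal, int arch, |
|                             r_reg_location out[4]) |
| @{ |
|     const char *hives[2]; |
|     size_t i, n = 0; |
| |
|     hives[0] = prefer_personal ? "HKEY_CURRENT_USER" : "HKEY_LOCAL_MACHINE"; |
|     hives[1] = prefer_personal ? "HKEY_LOCAL_MACHINE" : "HKEY_CURRENT_USER"; |
|     for (i = 0; i < 2; i++) @{ |
|         if (arch == 32 || arch == 64) @{ |
|             out[n].hive = hives[i]; |
|             out[n].subkey = (arch == 32) ? "Software\\R-core\\R32" |
|                                          : "Software\\R-core\\R64"; |
|             n++; |
|         @} |
|         out[n].hive = hives[i];                 /* shared location */ |
|         out[n].subkey = "Software\\R-core\\R"; |
|         n++; |
|     @} |
|     return n; |
| @} |
| @end example |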
| |
| When @R{} is installed and recording is not disabled then two string |
| values are written at that location for keys @code{InstallPath} and |
| @code{Current Version}, and these keys are removed when @R{} is |
| uninstalled. To allow information about other installed versions to be |
| retained, there is also a key named something like @code{3.0.0} or |
| @code{3.0.0 patched} or @code{3.1.0 Pre-release} with a value for |
| @code{InstallPath}. |
| |
| So a comprehensive algorithm to search for @code{R_HOME} is something |
| like |
| |
| @itemize |
| @item |
| Decide which of personal or administrative installs should have |
| precedence. There are arguments both ways: we find that with roaming |
| profiles @code{HKEY_CURRENT_USER\Software} often gets reverted to |
| an earlier version. Do the following for one or both of |
| @code{HKEY_CURRENT_USER} and @code{HKEY_LOCAL_MACHINE}. |
| |
| @item |
| If the desired architecture is known, look in @code{Software\R-core\R32} |
| or @code{Software\R-core\R64}, and if that does not exist or the |
| architecture is immaterial, in @code{Software\R-core\R}. |
| |
| @item |
| If key @code{InstallPath} exists then this is @code{R_HOME} (recorded |
| using backslashes). If it does not, look for version-specific keys like |
| @code{2.11.0 alpha}, pick the latest (which is in itself a complicated |
| algorithm as @code{2.11.0 patched > 2.11.0 > 2.11.0 alpha > 2.8.1}) and |
| use its value for @code{InstallPath}. |
| @end itemize |
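| |
| The comparison in the last step can be written as a function in the |
| style of @code{strcmp}, as in the following sketch. The function name |
| is hypothetical, and where suffixes other than those named above fit |
| into the order is an assumption. |
| |
| @example |
| #include <stdlib.h> |
| #include <string.h> |
| |
| /* Rank of the text after the version number.  The order of "patched", |
|    a plain release and "alpha" is as stated above; where "Pre-release", |
|    "beta" and "RC" fit in is an assumption. */ |
| static int suffix_rank(const char *s) |
| @{ |
|     if (*s == '\0')                return 4;   /* plain release */ |
|     if (strcmp(s, "patched") == 0) return 5; |
|     if (strcmp(s, "RC") == 0)      return 3; |
|     if (strcmp(s, "beta") == 0)    return 2; |
|     if (strcmp(s, "alpha") == 0)   return 1; |
|     return 0;                      /* "Pre-release" or unknown */ |
| @} |
| |
| /* Parse "x.y.z" at the start of s into v[]; return a pointer just past |
|    it, or NULL if s does not start that way. */ |
| static const char *parse_xyz(const char *s, long v[3]) |
| @{ |
|     char *end; |
|     int i; |
| |
|     for (i = 0; i < 3; i++) @{ |
|         if (*s < '0' || *s > '9') |
|             return NULL; |
|         v[i] = strtol(s, &end, 10); |
|         s = end; |
|         if (i < 2) @{ |
|             if (*s != '.') |
|                 return NULL; |
|             s++; |
|         @} |
|     @} |
|     return s; |
| @} |
| |
| /* Compare key names such as "2.11.0 alpha": <0, 0 or >0 like strcmp. |
|    Names that do not start with x.y.z sort first. */ |
| int r_version_cmp(const char *a, const char *b) |
| @{ |
|     long va[3], vb[3]; |
|     const char *ta = parse_xyz(a, va), *tb = parse_xyz(b, vb); |
|     int i, ra, rb; |
| |
|     if (ta == NULL || tb == NULL) |
|         return (ta != NULL) - (tb != NULL); |
|     for (i = 0; i < 3; i++) |
|         if (va[i] != vb[i]) |
|             return va[i] < vb[i] ? -1 : 1; |
|     while (*ta == ' ') ta++; |
|     while (*tb == ' ') tb++; |
|     ra = suffix_rank(ta); |
|     rb = suffix_rank(tb); |
|     return (ra > rb) - (ra < rb); |
| @} |
| @end example |
| |
| With it, @code{r_version_cmp("2.11.0 patched", "2.11.0")}, |
| @code{r_version_cmp("2.11.0", "2.11.0 alpha")} and |
| @code{r_version_cmp("2.11.0 alpha", "2.8.1")} are all positive. |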
| |
| @node Function and variable index, Concept index, Linking GUIs and other front-ends to R, Top |
| @unnumbered Function and variable index |
| |
| @printindex vr |
| |
| @node Concept index, , Function and variable index, Top |
| @unnumbered Concept index |
| |
| @printindex cp |
| |
| @bye |
| |
| @c Local Variables: *** |
| @c mode: TeXinfo *** |
| @c End: *** |