| \input texinfo |
| @c %**start of header |
| @setfilename R-intro.info |
| @settitle An Introduction to R |
| @setchapternewpage on |
| @c %**end of header |
| |
| @c Authors: If you edit/add @example(s) , please keep |
| @c ./R-intro.R up-to-date ! |
| @c ~~~~~~~~~~~ |
| @syncodeindex fn vr |
| |
| |
| @dircategory Programming |
| @direntry |
| * R Introduction: (R-intro). An introduction to R. |
| @end direntry |
| |
| @finalout |
| |
| @include R-defs.texi |
| @include version.texi |
| |
| @copying |
| This manual is for R, version @value{VERSION}. |
| |
| Copyright @copyright{} 1990 W.@: N.@: Venables@* |
| Copyright @copyright{} 1992 W.@: N.@: Venables & D.@: M.@: Smith@* |
| Copyright @copyright{} 1997 R.@: Gentleman & R.@: Ihaka@* |
| Copyright @copyright{} 1997, 1998 M.@: Maechler@* |
| @Rcopyright{1999} |
| |
| @quotation |
| @permission{} |
| @end quotation |
| @end copying |
| |
| |
| @c <FIXME> |
| @c Apparently AUCTeX 11.06 has a problem with '@appendixsection' entries |
| @c when updating nodes---the equivalent '@appendixsec' seems to work. |
| @c Hence changed (temporarily?) ... |
| @c </FIXME> |
| |
| @c <NOTE> |
| @c Conversion to PDF fails if sectioning titles contain (user-defined) |
| @c macros such as @R{}. Hence in section titles we changed @R{} to R. |
| @c Revert when this is fixed. |
| @c </NOTE> |
| |
| @titlepage |
| @title An Introduction to R |
| @subtitle Notes on @R{}: A Programming Environment for Data Analysis and Graphics |
| @subtitle Version @value{VERSION} |
| @author W. N. Venables, D. M. Smith |
| @author and the R Core Team |
| @page |
| @vskip 0pt plus 1filll |
| @insertcopying |
| @end titlepage |
| |
| @ifplaintext |
| @insertcopying |
| @end ifplaintext |
| |
| @contents |
| |
| @ifnottex |
| @node Top, Preface, (dir), (dir) |
| @top An Introduction to R |
| |
| This is an introduction to R (``GNU S''), a language and environment for |
| statistical computing and graphics. R is similar to the |
| award-winning@footnote{ACM Software Systems award, 1998: |
| @uref{https://awards.acm.org/award_winners/chambers_6640862.cfm}.} S |
| system, which was developed at Bell Laboratories by John Chambers et al. |
| It provides a wide variety of statistical and graphical techniques |
| (linear and nonlinear modelling, statistical tests, time series |
| analysis, classification, clustering, ...). |
| |
| This manual provides information on data types, programming elements, |
| statistical modelling and graphics. |
| |
| @insertcopying |
| |
| @end ifnottex |
| |
| @menu |
| * Preface:: |
| * Introduction and preliminaries:: |
| * Simple manipulations numbers and vectors:: |
| * Objects:: |
| * Factors:: |
| * Arrays and matrices:: |
| * Lists and data frames:: |
| * Reading data from files:: |
| * Probability distributions:: |
| * Loops and conditional execution:: |
| * Writing your own functions:: |
| * Statistical models in R:: |
| * Graphics:: |
| * Packages:: |
| * OS facilities:: |
| * A sample session:: |
| * Invoking R:: |
| * The command-line editor:: |
| * Function and variable index:: |
| * Concept index:: |
| * References:: |
| @end menu |
| |
| @node Preface, Introduction and preliminaries, Top, Top |
| @unnumbered Preface |
| |
| This introduction to @R{} is derived from an original set of notes |
| describing the @Sl{} and @SPLUS{} environments written in 1990--2 by |
| Bill Venables and David M. Smith when at the University of Adelaide. We |
| have made a number of small changes to reflect differences between the |
| @R{} and @Sl{} programs, and expanded some of the material. |
| |
| We would like to extend warm thanks to Bill Venables (and David Smith) |
| for granting permission to distribute this modified version of the notes |
| in this way, and for being a supporter of @R{} from way back. |
| |
| Comments and corrections are always welcome. Please address email |
| correspondence to @email{R-core@@R-project.org}. |
| |
| @subheading Suggestions to the reader |
| |
| Most @R{} novices will start with the introductory session in Appendix |
| A. This should give some familiarity with the style of @R{} sessions |
| and more importantly some instant feedback on what actually happens. |
| |
| Many users will come to @R{} mainly for its graphical facilities. |
| @xref{Graphics}, which can be read at almost any time and need not wait |
| until all the preceding sections have been digested. |
| |
| @menu |
| * Introduction and preliminaries:: |
| @end menu |
| |
| @node Introduction and preliminaries, Simple manipulations numbers and vectors, Preface, Top |
| @chapter Introduction and preliminaries |
| |
| @menu |
| * The R environment:: |
| * Related software and documentation:: |
| * R and statistics:: |
| * R and the window system:: |
| * Using R interactively:: |
| * Getting help:: |
| * R commands; case sensitivity etc:: |
| * Recall and correction of previous commands:: |
| * Executing commands from or diverting output to a file:: |
| * Data permanency and removing objects:: |
| @end menu |
| |
| @node The R environment, Related software and documentation, Introduction and preliminaries, Introduction and preliminaries |
| @section The R environment |
| |
| @R{} is an integrated suite of software facilities for data |
| manipulation, calculation and graphical display. Among other things it |
| has |
| |
| @itemize @bullet |
| @item |
| an effective data handling and storage facility, |
| @item |
| a suite of operators for calculations on arrays, in particular matrices, |
| @item |
| a large, coherent, integrated collection of intermediate tools for data |
| analysis, |
| @item |
| graphical facilities for data analysis and display either directly at |
| the computer or on hardcopy, and |
| @item |
| a well-developed, simple and effective programming language (called `S') |
| which includes conditionals, loops, user-defined recursive functions and |
| input and output facilities. (Indeed most of the system supplied |
| functions are themselves written in the @Sl{} language.) |
| @end itemize |
| |
| The term ``environment'' is intended to characterize it as a fully |
| planned and coherent system, rather than an incremental accretion of |
| very specific and inflexible tools, as is frequently the case with other |
| data analysis software. |
| |
| @R{} is very much a vehicle for newly developing methods of interactive |
| data analysis. It has developed rapidly, and has been extended by a |
| large collection of @emph{packages}. However, most programs written in |
| @R{} are essentially ephemeral, written for a single piece of data |
| analysis. |
| |
| @node Related software and documentation, R and statistics, The R environment, Introduction and preliminaries |
| @section Related software and documentation |
| |
| @R{} can be regarded as an implementation of the @Sl{} language which |
| was developed at Bell Laboratories by Rick Becker, John Chambers and |
| Allan Wilks, and also forms the basis of the @SPLUS{} systems. |
| |
| The evolution of the @Sl{} language is characterized by four books by |
| John Chambers and coauthors. For @R{}, the basic reference is @emph{The |
| New @Sl{} Language: A Programming Environment for Data Analysis and |
| Graphics} by Richard A.@: Becker, John M.@: Chambers and Allan R.@: |
| Wilks. The new features of the 1991 release of @Sl{} |
| @c (@Sl{} version 3) JMC says the 1988 version is S3. |
| are covered in @emph{Statistical Models in @Sl{}} edited by John M.@: |
| Chambers and Trevor J.@: Hastie. The formal methods and classes of the |
| @pkg{methods} package are based on those described in @emph{Programming |
| with Data} by John M.@: Chambers. @xref{References}, for precise |
| references. |
| |
| There are now a number of books which describe how to use @R{} for data |
| analysis and statistics, and documentation for @Sl{}/@SPLUS{} can |
| typically be used with @R{}, keeping the differences between the @Sl{} |
| implementations in mind. @xref{What documentation exists for R?, , , |
| R-FAQ, The R statistical system FAQ}. |
| |
| @node R and statistics, R and the window system, Related software and documentation, Introduction and preliminaries |
| @section R and statistics |
| @cindex Packages |
| |
| Our introduction to the @R{} environment did not mention |
| @emph{statistics}, yet many people use @R{} as a statistics system. We |
| prefer to think of it as an environment within which many classical and |
| modern statistical techniques have been implemented. A few of these are |
| built into the base @R{} environment, but many are supplied as |
| @emph{packages}. There are about 25 packages supplied with @R{} (called |
| ``standard'' and ``recommended'' packages) and many more are available |
| through the @acronym{CRAN} family of Internet sites (via |
| @uref{https://CRAN.R-project.org}) and elsewhere. More details on |
| packages are given later (@pxref{Packages}). |
| |
| Most classical statistics and much of the latest methodology is |
| available for use with @R{}, but users may need to be prepared to do a |
| little work to find it. |
| |
| There is an important difference in philosophy between @Sl{} (and hence |
| @R{}) and the other main statistical systems. In @Sl{} a statistical |
| analysis is normally done as a series of steps, with intermediate |
| results being stored in objects. Thus whereas SAS and SPSS will give |
| copious output from a regression or discriminant analysis, @R{} will |
| give minimal output and store the results in a fit object for subsequent |
| interrogation by further @R{} functions. |
| |
| @node R and the window system, Using R interactively, R and statistics, Introduction and preliminaries |
| @section R and the window system |
| |
| The most convenient way to use @R{} is at a graphics workstation running |
| a windowing system. This guide is aimed at users who have this |
| facility. In particular we will occasionally refer to the use of @R{} |
| on an X window system although the vast bulk of what is said applies |
| generally to any implementation of the @R{} environment. |
| |
| Most users will find it necessary to interact directly with the |
| operating system on their computer from time to time. In this guide, we |
| mainly discuss interaction with the operating system on UNIX machines. |
| If you are running @R{} under Windows or macOS you will need to make |
| some small adjustments. |
| |
| Setting up a workstation to take full advantage of the customizable |
| features of @R{} is a straightforward if somewhat tedious procedure, and |
| will not be considered further here. Users in difficulty should seek |
| local expert help. |
| |
| @node Using R interactively, Getting help, R and the window system, Introduction and preliminaries |
| @section Using R interactively |
| |
| When you use the @R{} program it issues a prompt when it expects input |
| commands. The default prompt is @samp{@code{>}}, which on UNIX might be |
| the same as the shell prompt, and so it may appear that nothing is |
| happening. However, as we shall see, it is easy to change to a |
| different @R{} prompt if you wish. We will assume that the UNIX shell |
| prompt is @samp{@code{$}}. |
| |
| In using @R{} under UNIX the suggested procedure for the first occasion |
| is as follows: |
| |
| @enumerate |
| @item |
| Create a separate sub-directory, say @file{work}, to hold data files on |
| which you will use @R{} for this problem. This will be the working |
| directory whenever you use @R{} for this particular problem. |
| |
| @example |
| $ mkdir work |
| $ cd work |
| @end example |
| |
| @item |
| Start the @R{} program with the command |
| |
| @example |
| $ R |
| @end example |
| |
| @item |
| At this point @R{} commands may be issued (see later). |
| |
| @item |
| To quit the @R{} program the command is |
| |
| @example |
| > q() |
| @end example |
| |
| At this point you will be asked whether you want to save the data from |
| your @R{} session. On some systems this will bring up a dialog box, and |
| on others you will receive a text prompt to which you can respond |
| @kbd{yes}, @kbd{no} or @kbd{cancel} (a single letter abbreviation will |
| do) to save the data before quitting, quit without saving, or return to |
| the @R{} session. Data which is saved will be available in future @R{} |
| sessions. |
| |
| @end enumerate |
| |
| Further @R{} sessions are simple. |
| |
| @enumerate |
| |
| @item |
| Make @file{work} the working directory and start the program as before: |
| |
| @example |
| $ cd work |
| $ R |
| @end example |
| |
| @item |
| Use the @R{} program, terminating with the @code{q()} command at the end |
| of the session. |
| |
| @end enumerate |
| |
| To use @R{} under Windows the procedure to |
| follow is basically the same. Create a folder as the working directory, |
| and set that in the @file{Start In} field in your @R{} shortcut. |
| Then launch @R{} by double clicking on the icon. |
| |
| @section An introductory session |
| |
| Readers wishing to get a feel for @R{} at a computer before proceeding |
| are strongly advised to work through the introductory session |
| given in @ref{A sample session}. |
| |
| @node Getting help, R commands; case sensitivity etc, Using R interactively, Introduction and preliminaries |
| @section Getting help with functions and features |
| @findex help |
| |
| @R{} has an inbuilt help facility similar to the @code{man} facility of |
| UNIX. To get more information on any specific named function, for |
| example @code{solve}, the command is |
| |
| @example |
| > help(solve) |
| @end example |
| @findex help |
| |
| An alternative is |
| |
| @example |
| > ?solve |
| @end example |
| @findex ? |
| |
| For a feature specified by special characters, the argument must be |
| enclosed in double or single quotes, making it a ``character string''. |
| This is also necessary for a few words with syntactic meaning, including |
| @code{if}, @code{for} and @code{function}. For example, |
| |
| @example |
| > help("[[") |
| @end example |
| |
| Either form of quote mark may be used to escape the other, as in the |
| string @code{"It's important"}. Our convention is to use |
| double quote marks for preference. |
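| |
| As a small illustration (the variable names @code{s1} and @code{s2} are |
| invented for this example), each kind of quote mark can enclose the |
| other without any further escaping: |
| |
| @example |
| s1 <- "It's important"     # a single quote inside double quotes |
| s2 <- 'He said "yes"'      # double quotes inside single quotes |
| nchar(s1)                  # 14 characters |
| @end example |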
| |
| On most @R{} installations help is available in @HTML{} format by |
| running |
| |
| @example |
| > help.start() |
| @end example |
| @findex help.start |
| |
| @noindent |
| which will launch a Web browser that allows the help pages to be browsed |
| with hyperlinks. On UNIX, subsequent help requests are sent to the |
| @HTML{}-based help system. The `Search Engine and Keywords' link in the |
| page loaded by @code{help.start()} is particularly useful as it |
| contains a high-level concept list which searches through available |
| functions. It can be a great way to get your bearings quickly and to |
| understand the breadth of what @R{} has to offer. |
| |
| @findex help.search |
| The @code{help.search} command (alternatively @code{??}) |
| allows searching for help in various |
| ways. For example, |
| |
| @example |
| > ??solve |
| @end example |
| @findex ?? |
| |
| Try @code{?help.search} for details and more examples. |
| |
| The examples on a help topic can normally be run by |
| |
| @example |
| > example(@var{topic}) |
| @end example |
| @findex example |
| |
| Windows versions of @R{} have other optional help systems: use |
| |
| @example |
| > ?help |
| @end example |
| |
| @noindent |
| for further details. |
| |
| @node R commands; case sensitivity etc, Recall and correction of previous commands, Getting help, Introduction and preliminaries |
| @section R commands, case sensitivity, etc. |
| |
| Technically @R{} is an @emph{expression language} with a very simple |
| syntax. It is @emph{case sensitive} as are most UNIX-based packages, so |
| @code{A} and @code{a} are different symbols and would refer to different |
| variables. The set of symbols which can be used in @R{} names depends |
| on the operating system and country within which @R{} is being run |
| (technically on the @emph{locale} in use). Normally all alphanumeric |
| symbols are allowed@footnote{For portable @R{} code (including that to |
| be used in @R{} packages) only A--Za--z0--9 should be used.} (and in |
| some countries this includes accented letters) plus @samp{@code{.}} and |
| @samp{@code{_}}, with the restriction that a name must start with |
| @samp{@code{.}} or a letter, and if it starts with @samp{@code{.}} the |
| second character must not be a digit. Names are effectively |
| unlimited in length. |
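| |
| A minimal sketch of these naming rules follows. The names are |
| hypothetical and chosen only to show what is allowed. Note that names |
| beginning with @samp{@code{.}} are omitted from the default listing |
| produced by @code{ls()}, which is described later. |
| |
| @example |
| a <- 1          # 'a' and 'A' are different variables, |
| A <- 2          # because names are case sensitive |
| .hidden <- 3    # a name may start with '.' (ls() does not list it) |
| my_var.2 <- 4   # '.', '_' and digits may follow the first character |
| a + A           # 3 |
| @end example |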
| |
| Elementary commands consist of either @emph{expressions} or |
| @emph{assignments}. If an expression is given as a command, it is |
| evaluated, printed (unless specifically made invisible), and the value |
| is lost. An assignment also evaluates an expression and passes the |
| value to a variable but the result is not automatically printed. |
| |
| Commands are separated either by a semi-colon (@samp{@code{;}}), or by a |
| newline. Elementary commands can be grouped together into one compound |
| expression by braces (@samp{@code{@{}} and @samp{@code{@}}}). |
| @emph{Comments} can be put almost@footnote{@strong{not} inside strings, |
| nor within the argument list of a function definition} anywhere. A |
| comment starts with a hashmark (@samp{@code{#}}), and everything from |
| there to the end of the line is part of the comment. |
| |
| If a command is not complete at the end of a line, @R{} will |
| give a different prompt, by default |
| |
| @example |
| + |
| @end example |
| |
| @noindent |
| on second and subsequent lines and continue to read input until the |
| command is syntactically complete. This prompt may be changed by the |
| user. We will generally omit the continuation prompt |
| and indicate continuation by simple indenting. |
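| |
| In the following sketch, the first line ends inside an open parenthesis. |
| At the console @R{} would therefore show the continuation prompt and |
| keep reading until the closing parenthesis arrives. Following the |
| convention just described, the prompt is omitted and the second line is |
| indented. The name @code{total} is only illustrative. |
| |
| @example |
| total <- sum(1, 2, |
|              3, 4)   # one command spread over two lines |
| total                # 10 |
| @end example |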
| |
| Command lines entered at the console are limited@footnote{some of the |
| consoles will not allow you to enter more, and amongst those which do |
| some will silently discard the excess and some will use it as the start |
| of the next line.} to about 4095 bytes (not characters). |
| |
| @node Recall and correction of previous commands, Executing commands from or diverting output to a file, R commands; case sensitivity etc, Introduction and preliminaries |
| @section Recall and correction of previous commands |
| |
| Under many versions of UNIX and on Windows, @R{} provides a mechanism |
| for recalling and re-executing previous commands. The vertical arrow |
| keys on the keyboard can be used to scroll forward and backward through |
| a @emph{command history}. Once a command is located in this way, the |
| cursor can be moved within the command using the horizontal arrow keys, |
| and characters can be removed with the @key{DEL} key or added with the |
| other keys. More details are provided later: @pxref{The command-line |
| editor}. |
| |
| The recall and editing capabilities under UNIX are highly customizable. |
| You can find out how to do this by reading the manual entry for the |
| @strong{readline} library. |
| |
| Alternatively, the Emacs text editor provides more general support |
| mechanisms (via @acronym{ESS}, @emph{Emacs Speaks Statistics}) for |
| working interactively with @R{}. @xref{R and Emacs, , , R-FAQ, The R |
| statistical system FAQ}. |
| |
| @node Executing commands from or diverting output to a file, Data permanency and removing objects, Recall and correction of previous commands, Introduction and preliminaries |
| @section Executing commands from or diverting output to a file |
| @cindex Diverting input and output |
| |
| If commands@footnote{of unlimited length.} are stored in an external |
| file, say @file{commands.R} in the working directory @file{work}, they |
| may be executed at any time in an @R{} session with the command |
| |
| @example |
| > source("commands.R") |
| @end example |
| @findex source |
| |
| On Windows, @strong{Source} is also available on the |
| @strong{File} menu. The function @code{sink}, |
| |
| @example |
| > sink("record.lis") |
| @end example |
| @findex sink |
| |
| @noindent |
| will divert all subsequent output from the console to an external file, |
| @file{record.lis}. The command |
| |
| @example |
| > sink() |
| @end example |
| |
| @noindent |
| restores it to the console once again. |
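| |
| The sketch below exercises @code{source} and @code{sink} without |
| touching the working directory. As an assumption made only for this |
| illustration, temporary files from @code{tempfile()} stand in for |
| @file{commands.R} and @file{record.lis}, and the two commands written to |
| the first file are invented. |
| |
| @example |
| cmds <- tempfile(fileext = ".R")           # stands in for commands.R |
| writeLines(c("a <- 2", "b <- a^2"), cmds)  # write two commands to it |
| source(cmds)                               # execute them in this session |
| b                                          # 4 |
| |
| out <- tempfile(fileext = ".lis")          # stands in for record.lis |
| sink(out)                                  # divert output to the file |
| print(1:3) |
| sink()                                     # restore output to the console |
| readLines(out)                             # "[1] 1 2 3" |
| @end example |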
| |
| @node Data permanency and removing objects, , Executing commands from or diverting output to a file, Introduction and preliminaries |
| @section Data permanency and removing objects |
| |
| The entities that @R{} creates and manipulates are known as |
| @emph{objects}. These may be variables, arrays of numbers, character |
| strings, functions, or more general structures built from such |
| components. |
| |
| During an @R{} session, objects are created and stored by name (we |
| discuss this process in the next section). The @R{} command |
| |
| @example |
| > objects() |
| @end example |
| |
| @noindent |
| (alternatively, @code{ls()}) can be used to display the names of (most |
| of) the objects which are currently stored within @R{}. The collection |
| of objects currently stored is called the @emph{workspace}. |
| @cindex Workspace |
| |
| To remove objects the function @code{rm} is available: |
| |
| @example |
| > rm(x, y, z, ink, junk, temp, foo, bar) |
| @end example |
| @findex rm |
| @cindex Removing objects |
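| |
| A short sketch of the effect of @code{rm}, using a few throwaway |
| objects with hypothetical names: |
| |
| @example |
| x <- 1; junk <- 2; temp <- 3 |
| "junk" %in% objects()   # TRUE: junk is in the workspace |
| rm(junk, temp) |
| "junk" %in% objects()   # FALSE: it has been removed |
| exists("x")             # TRUE: x is untouched |
| @end example |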
| |
| All objects created during an @R{} session can be stored permanently in |
| a file for use in future @R{} sessions. At the end of each @R{} session |
| you are given the opportunity to save all the currently available |
| objects. If you indicate that you want to do this, the objects are |
| written to a file called @file{.RData}@footnote{The leading ``dot'' in |
| this file name makes it @emph{invisible} in normal file listings in |
| UNIX, and in default GUI file listings on macOS and Windows.} in the |
| current directory, and the command lines used in the session are saved |
| to a file called @file{.Rhistory}. |
| |
| When @R{} is started at a later time from the same directory it reloads |
| the workspace from this file. At the same time the associated command |
| history is reloaded. |
| |
| We recommend using separate working directories for different |
| analyses conducted with @R{}. It is quite common for objects with names |
| @code{x} and @code{y} to be created during an analysis. Names like this |
| are often meaningful in the context of a single analysis, but it can be |
| quite hard to decide what they might be when several analyses have |
| been conducted in the same directory. |
| |
| @node Simple manipulations numbers and vectors, Objects, Introduction and preliminaries, Top |
| @chapter Simple manipulations; numbers and vectors |
| @cindex Vectors |
| |
| @menu |
| * Vectors and assignment:: |
| * Vector arithmetic:: |
| * Generating regular sequences:: |
| * Logical vectors:: |
| * Missing values:: |
| * Character vectors:: |
| * Index vectors:: |
| * Other types of objects:: |
| @end menu |
| |
| @node Vectors and assignment, Vector arithmetic, Simple manipulations numbers and vectors, Simple manipulations numbers and vectors |
| @section Vectors and assignment |
| |
| @R{} operates on named @emph{data structures}. The simplest such |
| structure is the numeric @emph{vector}, which is a single entity |
| consisting of an ordered collection of numbers. To set up a vector |
| named @code{x}, say, consisting of five numbers, namely 10.4, 5.6, 3.1, |
| 6.4 and 21.7, use the @R{} command |
| |
| @example |
| > x <- c(10.4, 5.6, 3.1, 6.4, 21.7) |
| @end example |
| @findex c |
| @findex vector |
| |
| This is an @emph{assignment} statement using the @emph{function} |
| @code{c()} which in this context can take an arbitrary number of vector |
| @emph{arguments} and whose value is a vector got by concatenating its |
| arguments end to end.@footnote{With other than vector types of argument, |
| such as @code{list} mode arguments, the action of @code{c()} is rather |
| different. See @ref{Concatenating lists}.} |
| |
| A number occurring by itself in an expression is taken as a vector of |
| length one. |
| |
| Notice that the assignment operator (@samp{@code{<-}}) consists |
| of the two characters @samp{@code{<}} (``less than'') and |
| @samp{@code{-}} (``minus'') occurring strictly side-by-side, and that it |
| `points' to the object receiving the value of the expression. |
| In most contexts the @samp{@code{=}} operator can be used as an alternative. |
| @c In this text, the assignment operator is printed as @samp{<-}, rather |
| @c than ``@code{<-}''. |
| @cindex Assignment |
| |
| Assignment can also be made using the function @code{assign()}. An |
| equivalent way of making the same assignment as above is with: |
| |
| @example |
| > assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7)) |
| @end example |
| |
| @noindent |
| The usual operator, @code{<-}, can be thought of as a syntactic |
| short-cut to this. |
| |
| Assignments can also be made in the other direction, using the obvious |
| change in the assignment operator. So the same assignment could be made |
| using |
| |
| @example |
| > c(10.4, 5.6, 3.1, 6.4, 21.7) -> x |
| @end example |
| |
| If an expression is used as a complete command, the value is printed |
| @emph{and lost}@footnote{Actually, it is still available as |
| @code{.Last.value} before any other statements are executed.}. So now if we |
| were to use the command |
| |
| @example |
| > 1/x |
| @end example |
| |
| @noindent |
| the reciprocals of the five values would be printed at the terminal (and |
| the value of @code{x}, of course, unchanged). |
| |
| The further assignment |
| |
| @example |
| > y <- c(x, 0, x) |
| @end example |
| |
| @noindent |
| would create a vector @code{y} with 11 entries consisting of two copies |
| of @code{x} with a zero in the middle place. |
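| |
| The following sketch checks that the four forms of assignment described |
| above produce the same vector, and that @code{y} has 11 elements. The |
| names @code{x1} to @code{x4} are chosen only for this comparison. |
| |
| @example |
| x1 <- c(10.4, 5.6, 3.1, 6.4, 21.7) |
| x2 = c(10.4, 5.6, 3.1, 6.4, 21.7) |
| assign("x3", c(10.4, 5.6, 3.1, 6.4, 21.7)) |
| c(10.4, 5.6, 3.1, 6.4, 21.7) -> x4 |
| identical(x1, x2) && identical(x1, x3) && identical(x1, x4)  # TRUE |
| y <- c(x1, 0, x1) |
| length(y)                                                    # 11 |
| @end example |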
| |
| @node Vector arithmetic, Generating regular sequences, Vectors and assignment, Simple manipulations numbers and vectors |
| @section Vector arithmetic |
| |
| Vectors can be used in arithmetic expressions, in which case the |
| operations are performed element by element. Vectors occurring in the |
| same expression need not all be of the same length. If they are not, |
| the value of the expression is a vector with the same length as the |
| longest vector which occurs in the expression. Shorter vectors in the |
| expression are @emph{recycled} as often as need be (perhaps |
| fractionally) until they match the length of the longest vector. In |
| particular a constant is simply repeated. So with the above assignments |
| the command |
| @cindex Recycling rule |
| |
| @example |
| > v <- 2*x + y + 1 |
| @end example |
| |
| @noindent |
| generates a new vector @code{v} of length 11 constructed by adding |
| together, element by element, @code{2*x} repeated 2.2 times, @code{y} |
| repeated just once, and @code{1} repeated 11 times. |
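| |
| A small sketch of the recycling rule follows. When the shorter length |
| divides the longer one the shorter vector is repeated whole. Otherwise |
| @R{} still recycles, but it issues a warning, as happens here with |
| lengths 11 and 5. |
| |
| @example |
| 1:6 + c(10, 20)    # 11 22 13 24 15 26: c(10, 20) is used 3 times |
| x <- c(10.4, 5.6, 3.1, 6.4, 21.7) |
| y <- c(x, 0, x) |
| v <- 2*x + y + 1   # a warning: 11 is not a multiple of 5 |
| length(v)          # 11 |
| v[6]               # 21.8, that is 2*x[1] + y[6] + 1 |
| @end example |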
| |
| @cindex Arithmetic functions and operators |
| The elementary arithmetic operators are the usual @code{+}, @code{-}, |
| @code{*}, @code{/} and @code{^} for raising to a power. |
| @findex + |
| @findex - |
| @findex * |
| @findex / |
| @findex ^ |
| In addition all of the common arithmetic functions are available. |
| @code{log}, @code{exp}, @code{sin}, @code{cos}, @code{tan}, @code{sqrt}, |
| and so on, all have their usual meaning. |
| @findex log |
| @findex exp |
| @findex sin |
| @findex cos |
| @findex tan |
| @findex sqrt |
| @code{max} and @code{min} select the largest and smallest elements of a |
| vector respectively. |
| @findex max |
| @findex min |
| @code{range} is a function whose value is a vector of length two, namely |
| @code{c(min(x), max(x))}. |
| @findex range |
| @code{length(x)} is the number of elements in @code{x}, |
| @findex length |
| @code{sum(x)} gives the total of the elements in @code{x}, |
| @findex sum |
| and @code{prod(x)} their product. |
| @findex prod |
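| |
| A quick sketch applying some of these functions to the vector @code{x} |
| used earlier: |
| |
| @example |
| x <- c(10.4, 5.6, 3.1, 6.4, 21.7) |
| max(x)       # 21.7 |
| min(x)       # 3.1 |
| range(x)     # 3.1 21.7, the same as c(min(x), max(x)) |
| length(x)    # 5 |
| sum(x)       # 47.2 |
| prod(1:4)    # 24 |
| @end example |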
| |
| Two statistical functions are @code{mean(x)} which calculates the sample |
| mean, which is the same as @code{sum(x)/length(x)}, |
| @findex mean |
| and @code{var(x)} which gives |
| |
| @example |
| sum((x-mean(x))^2)/(length(x)-1) |
| @end example |
| @findex var |
| |
| @noindent |
| or sample variance. If the argument to @code{var()} is an |
| @math{n}-by-@math{p} matrix the value is a @math{p}-by-@math{p} sample |
| covariance matrix got by regarding the rows as independent |
| @math{p}-variate sample vectors. |
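| |
| The sketch below checks the two formulas above numerically. It then |
| builds a small 4-by-2 matrix with @code{cbind()} (matrices are covered |
| later, and the values are arbitrary) to show that @code{var()} returns a |
| 2-by-2 covariance matrix. |
| |
| @example |
| x <- c(10.4, 5.6, 3.1, 6.4, 21.7) |
| all.equal(mean(x), sum(x)/length(x))                    # TRUE |
| all.equal(var(x), sum((x-mean(x))^2)/(length(x)-1))     # TRUE |
| m <- cbind(c(1, 2, 3, 4), c(2, 4, 5, 9))   # n = 4 rows, p = 2 columns |
| dim(var(m))                                # 2 2 |
| @end example |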
| |
| @code{sort(x)} returns a vector of the same size as @code{x} with the |
| elements arranged in increasing order; however there are other more |
| flexible sorting facilities available (see @code{order()} or |
| @code{sort.list()} which produce a permutation to do the sorting). |
| @findex sort |
| @findex order |
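| |
| For example, @code{order()} returns the permutation that puts @code{x} |
| into increasing order, so indexing @code{x} by it reproduces |
| @code{sort(x)}: |
| |
| @example |
| x <- c(10.4, 5.6, 3.1, 6.4, 21.7) |
| sort(x)        # 3.1 5.6 6.4 10.4 21.7 |
| o <- order(x)  # 3 2 4 1 5 |
| x[o]           # the same as sort(x) |
| @end example |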
| |
| Note that @code{max} and @code{min} select the largest and smallest |
| values in their arguments, even if they are given several vectors. The |
| @emph{parallel} maximum and minimum functions @code{pmax} and |
| @code{pmin} return a vector (of length equal to their longest argument) |
| that contains in each element the largest (smallest) element in that |
| position in any of the input vectors. |
| @findex pmax |
| @findex pmin |
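| |
| A small sketch contrasting the two kinds of function, with two short |
| vectors invented for the purpose: |
| |
| @example |
| a <- c(1, 8, 3) |
| b <- c(4, 2, 6) |
| max(a, b)    # 8: a single value taken over all arguments |
| pmax(a, b)   # 4 8 6: the largest value in each position |
| pmin(a, b)   # 1 2 3: the smallest value in each position |
| @end example |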
| |
| For most purposes the user will not be concerned if the ``numbers'' in a |
| numeric vector are integers, reals or even complex. Internally |
| calculations are done as double precision real numbers, or double |
| precision complex numbers if the input data are complex. |
| |
| To work with complex numbers, supply an explicit complex part. Thus |
| |
| @example |
| sqrt(-17) |
| @end example |
| |
| @noindent |
| will give @code{NaN} and a warning, but |
| |
| @example |
| sqrt(-17+0i) |
| @end example |
| |
| @noindent |
| will do the computations as complex numbers. |
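| |
| The sketch below shows both calls side by side. It also uses |
| @code{as.complex()}, which converts an existing real value instead of |
| writing the @code{0i} part explicitly. |
| |
| @example |
| sqrt(-17)               # NaN, with a warning |
| sqrt(-17+0i)            # 0+4.123106i |
| sqrt(as.complex(-17))   # the same complex result |
| @end example |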
| |
| @menu |
| * Generating regular sequences:: |
| @end menu |
| |
| @node Generating regular sequences, Logical vectors, Vector arithmetic, Simple manipulations numbers and vectors |
| @section Generating regular sequences |
| @cindex Regular sequences |
| |
| @R{} has a number of facilities for generating commonly used sequences |
| of numbers. For example @code{1:30} is the vector @code{c(1, 2, |
| @dots{}, 29, 30)}. |
| @c <NOTE> |
| @c Info cannot handle ':' as an index entry. |
| @ifnotinfo |
| @findex : |
| @end ifnotinfo |
| @c </NOTE> |
| The colon operator has high priority within an expression, so, for |
| example @code{2*1:15} is the vector @code{c(2, 4, @dots{}, 28, 30)}. |
| Put @code{n <- 10} and compare the sequences @code{1:n-1} and |
| @code{1:(n-1)}. |
| |
| The construction @code{30:1} may be used to generate a sequence |
| backwards. |
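| |
| The comparison suggested above, together with the other colon examples, |
| can be tried as follows: |
| |
| @example |
| n <- 10 |
| 1:n-1      # 0 1 2 ... 9, because ":" is applied before "-" |
| 1:(n-1)    # 1 2 ... 9 |
| 2*1:15     # 2 4 ... 30 |
| 30:1       # 30 29 ... 1 |
| @end example |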
| |
| @findex seq |
| The function @code{seq()} is a more general facility for generating |
| sequences. It has five arguments, only some of which may be specified |
| in any one call. The first two arguments, if given, specify the |
| beginning and end of the sequence, and if these are the only two |
| arguments given the result is the same as the colon operator. That is |
| @code{seq(2,10)} is the same vector as @code{2:10}. |
| |
| Arguments to @code{seq()}, and to many other @R{} functions, can also |
| be given in named form, in which case the order in which they appear is |
| irrelevant. The first two arguments may be named |
| @code{from=@var{value}} and @code{to=@var{value}}; thus |
| @code{seq(1,30)}, @code{seq(from=1, to=30)} and @code{seq(to=30, |
| from=1)} are all the same as @code{1:30}. The next two arguments to |
| @code{seq()} may be named @code{by=@var{value}} and |
| @code{length=@var{value}}, which specify a step size and a length for |
| the sequence respectively. If neither of these is given, the default |
| @code{by=1} is assumed. |
| |
| For example |
| |
| @example |
| > seq(-5, 5, by=.2) -> s3 |
| @end example |
| |
| @noindent |
| generates in @code{s3} the vector @code{c(-5.0, -4.8, -4.6, @dots{}, |
| 4.6, 4.8, 5.0)}. Similarly |
| |
| @example |
| > s4 <- seq(length=51, from=-5, by=.2) |
| @end example |
| |
| @noindent |
| generates the same vector in @code{s4}. |
| |
| The fifth argument may be named @code{along=@var{vector}}, which is |
| normally used as the only argument to create the sequence @code{1, 2, |
| @dots{}, length(@var{vector})}, or the empty sequence if the vector is |
| empty (as it can be). |
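The different ways of calling @code{seq()} described above can be compared side by side. This is a sketch; the vector @code{v} is made up, and the argument names @code{length} and @code{along} rely on @R{}'s partial matching of the full argument names @code{length.out} and @code{along.with}.

@example
s1 <- seq(2, 10)                             # same values as 2:10
s2 <- seq(to = 30, from = 1)                 # named arguments, any order
s3 <- seq(-5, 5, by = .2)                    # step size given
s4 <- seq(length = 51, from = -5, by = .2)   # length given instead of 'to'
v  <- c(3.1, 4.1, 5.9)
s5 <- seq(along = v)                         # 1, 2, 3
s6 <- seq(along = numeric(0))                # empty sequence for an empty vector
@end example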
| |
| A related function is @code{rep()} |
| @findex rep |
| which can be used for replicating an object in various complicated ways. |
| The simplest form is |
| |
| @example |
| > s5 <- rep(x, times=5) |
| @end example |
| |
| @noindent |
| which will put five copies of @code{x} end-to-end in @code{s5}. Another |
| useful version is |
| |
| @example |
| > s6 <- rep(x, each=5) |
| @end example |
| |
| @noindent |
| which repeats each element of @code{x} five times before moving on to |
| the next. |
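A small illustration of the two forms of @code{rep()}, using a three-element @code{x} chosen for illustration and two copies rather than five to keep the output short:

@example
x <- c(1, 2, 3)
s5 <- rep(x, times = 2)   # whole vector repeated: 1 2 3 1 2 3
s6 <- rep(x, each = 2)    # each element repeated: 1 1 2 2 3 3
@end example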
| |
| @node Logical vectors, Missing values, Generating regular sequences, Simple manipulations numbers and vectors |
| @section Logical vectors |
| |
| As well as numerical vectors, @R{} allows manipulation of logical |
| quantities. The elements of a logical vector can have the values |
| @code{TRUE}, @code{FALSE}, and @code{NA} (for ``not available'', see |
| below). The first two are often abbreviated as @code{T} and @code{F}, |
respectively. Note, however, that @code{T} and @code{F} are just
variables which are set to @code{TRUE} and @code{FALSE} by default; they
are not reserved words and can therefore be overwritten by the user. For
this reason you should always use @code{TRUE} and @code{FALSE}.
| @findex FALSE |
| @findex TRUE |
| @findex F |
| @findex T |
| |
| Logical vectors are generated by @emph{conditions}. For example |
| |
| @example |
| > temp <- x > 13 |
| @end example |
| |
| @noindent |
| sets @code{temp} as a vector of the same length as @code{x} with values |
| @code{FALSE} corresponding to elements of @code{x} where the condition |
| is @emph{not} met and @code{TRUE} where it is. |
| |
The comparison (relational) operators are @code{<}, @code{<=}, @code{>},
@code{>=}, @code{==} for exact equality and @code{!=} for inequality.
| @findex < |
| @findex <= |
| @findex > |
| @findex >= |
| @findex == |
| @findex != |
| In addition if @code{c1} and @code{c2} are logical expressions, then |
| @w{@code{c1 & c2}} is their intersection (@emph{``and''}), @w{@code{c1 | c2}} |
| is their union (@emph{``or''}), and @code{!c1} is the negation of |
| @code{c1}. |
| @findex ! |
| @findex | |
| @findex & |
| |
Logical vectors may be used in ordinary arithmetic, in which case they
are @emph{coerced} into numeric vectors, @code{FALSE} becoming @code{0}
and @code{TRUE} becoming @code{1}. However, there are situations where
logical vectors and their coerced numeric counterparts are not
equivalent; for an example, see the next section.
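The following sketch, with a made-up vector @code{x}, shows a condition, the operators that combine conditions, and the coercion of a logical vector in arithmetic:

@example
x <- c(10, 15, 13, 20)
temp <- x > 13          # FALSE TRUE FALSE TRUE
c1 <- x > 12
c2 <- x < 18
both   <- c1 & c2       # TRUE only where both conditions hold
either <- c1 | c2       # TRUE where at least one holds
notc1  <- !c1           # element-wise negation
n_above <- sum(temp)    # TRUE counts as 1 and FALSE as 0, so this is 2
@end example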
| |
| @node Missing values, Character vectors, Logical vectors, Simple manipulations numbers and vectors |
| @section Missing values |
| @cindex Missing values |
| |
| In some cases the components of a vector may not be completely |
| known. When an element or value is ``not available'' or a ``missing |
| value'' in the statistical sense, a place within a vector may be |
| reserved for it by assigning it the special value @code{NA}. |
| @findex NA |
| In general any operation on an @code{NA} becomes an @code{NA}. The |
| motivation for this rule is simply that if the specification of an |
| operation is incomplete, the result cannot be known and hence is not |
| available. |
| |
| @findex is.na |
| The function @code{is.na(x)} gives a logical vector of the same size as |
| @code{x} with value @code{TRUE} if and only if the corresponding element |
| in @code{x} is @code{NA}. |
| |
| @example |
| > z <- c(1:3,NA); ind <- is.na(z) |
| @end example |
| |
| Notice that the logical expression @code{x == NA} is quite different |
| from @code{is.na(x)} since @code{NA} is not really a value but a marker |
| for a quantity that is not available. Thus @code{x == NA} is a vector |
| of the same length as @code{x} @emph{all} of whose values are @code{NA} |
| as the logical expression itself is incomplete and hence undecidable. |
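To see the difference between @code{is.na(z)} and @code{z == NA} concretely, one might run the following, using the same illustrative @code{z} as above:

@example
z   <- c(1:3, NA)
ind <- is.na(z)   # FALSE FALSE FALSE TRUE
cmp <- z == NA    # NA NA NA NA: every comparison is undecidable
y   <- z + 1      # arithmetic on NA gives NA: 2 3 4 NA
@end example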
| |
Note that there is a second kind of ``missing'' value, produced by
numerical computation: the so-called @emph{Not a Number}, or
@code{NaN},
@findex NaN
value. Examples are
| |
| @example |
| > 0/0 |
| @end example |
| |
| @noindent |
| or |
| |
| @example |
| > Inf - Inf |
| @end example |
| |
| @noindent |
| which both give @code{NaN} since the result cannot be defined sensibly. |
| |
| In summary, @code{is.na(xx)} is @code{TRUE} @emph{both} for @code{NA} |
| and @code{NaN} values. To differentiate these, @code{is.nan(xx)} is only |
| @code{TRUE} for @code{NaN}s. |
| @findex is.nan |
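The distinction between the two kinds of missing value can be seen on a single vector; a minimal sketch:

@example
v <- c(1, NA, 0/0, Inf - Inf)
na_flags  <- is.na(v)    # FALSE TRUE TRUE TRUE: NA and NaN both count
nan_flags <- is.nan(v)   # FALSE FALSE TRUE TRUE: only the NaNs
@end example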
| |
| Missing values are sometimes printed as @code{<NA>} when character |
| vectors are printed without quotes. |
| |
| @node Character vectors, Index vectors, Missing values, Simple manipulations numbers and vectors |
| @section Character vectors |
| @cindex Character vectors |
| |
| Character quantities and character vectors are used frequently in @R{}, |
| for example as plot labels. Where needed they are denoted by a sequence |
| of characters delimited by the double quote character, e.g., |
| @code{"x-values"}, @code{"New iteration results"}. |
| |
Character strings are entered using either matching double (@code{"}) or
single (@code{'}) quotes, but are printed using double quotes (or
sometimes without quotes). They use C-style escape sequences, with
@code{\} as the escape character, so a backslash is entered and printed
as @code{\\}, and inside double quotes @code{"} is entered as @code{\"}.
Other useful escape sequences are @code{\n} (newline), @code{\t} (tab)
and @code{\b} (backspace); see @command{?Quotes} for a full list.
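A few string literals illustrating the quoting and escape rules; the strings are arbitrary examples. Note that @code{print()} shows the escapes, whereas @code{writeLines()} writes the characters themselves.

@example
s1 <- "He said \"hi\""   # double quote escaped inside double quotes
s2 <- 'He said "hi"'     # no escape needed inside single quotes
s3 <- "a\tb"             # three characters: a, a tab, b
s4 <- "C:\\temp"         # a single backslash, entered as \\
writeLines(s4)           # shows C:\temp with one backslash
@end example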
| |
| Character vectors may be concatenated into a vector by the @code{c()} |
| function; examples of their use will emerge frequently. |
| @findex c |
| |
| @findex paste |
| The @code{paste()} function takes an arbitrary number of arguments and |
| concatenates them one by one into character strings. Any numbers given |
| among the arguments are coerced into character strings in the evident |
| way, that is, in the same way they would be if they were printed. The |
| arguments are by default separated in the result by a single blank |
| character, but this can be changed by the named argument, |
| @code{sep=@var{string}}, which changes it to @code{@var{string}}, |
| possibly empty. |
| |
| For example |
| |
| @example |
| > labs <- paste(c("X","Y"), 1:10, sep="") |
| @end example |
| |
| @noindent |
| makes @code{labs} into the character vector |
| |
| @example |
| c("X1", "Y2", "X3", "Y4", "X5", "Y6", "X7", "Y8", "X9", "Y10") |
| @end example |
| |
Note particularly that recycling of short vectors takes place here too;
thus @code{c("X", "Y")} is repeated 5 times to match the sequence
@code{1:10}.@footnote{@code{paste(..., collapse=@var{ss})} joins the
arguments into a single character string putting @var{ss} in between, e.g.,
@code{ss <- "|"}. There are more tools for character manipulation; see the help
for @code{sub} and @code{substring}.}
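The recycling and the @code{collapse} argument can be seen in a shorter version of the example above, using @code{1:4} instead of @code{1:10}. The sketch also uses @code{paste0()}, which is shorthand for @code{paste(..., sep="")}.

@example
labs   <- paste(c("X", "Y"), 1:4, sep = "")  # "X1" "Y2" "X3" "Y4"
one    <- paste(labs, collapse = "|")        # "X1|Y2|X3|Y4"
spaced <- paste("x", 1:2)                    # default sep is a blank: "x 1" "x 2"
labs0  <- paste0(c("X", "Y"), 1:4)           # same as labs
@end example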
| |
| @node Index vectors, Other types of objects, Character vectors, Simple manipulations numbers and vectors |
| @section Index vectors; selecting and modifying subsets of a data set |
| @cindex Indexing vectors |
| |
| Subsets of the elements of a vector may be selected by appending to the |
| name of the vector an @emph{index vector} in square brackets. More |
| generally any expression that evaluates to a vector may have subsets of |
| its elements similarly selected by appending an index vector in square |
| brackets immediately after the expression. |
| |
| @c FIXME: Add a forward reference to subset() here |
| @c FIXME and add a paragraph about subset() {which needs to come after |
| @c FIXME data frames ... |
| |
| Such index vectors can be any of four distinct types. |
| |
| @enumerate |
| |
| @item |
| @strong{A logical vector}. In this case the index vector is recycled to the |
| same length as the vector from which elements are to be selected. |
| Values corresponding to @code{TRUE} in the index vector are selected and |
| those corresponding to @code{FALSE} are omitted. For example |
| |
| @example |
| > y <- x[!is.na(x)] |
| @end example |
| |
| @noindent |
| creates (or re-creates) an object @code{y} which will contain the |
| non-missing values of @code{x}, in the same order. Note that if |
| @code{x} has missing values, @code{y} will be shorter than @code{x}. |
| Also |
| |
| @example |
| > (x+1)[(!is.na(x)) & x>0] -> z |
| @end example |
| |
| @noindent |
| creates an object @code{z} and places in it the values of the vector |
| @code{x+1} for which the corresponding value in @code{x} was both |
| non-missing and positive. |
| |
| @item |
| @strong{A vector of positive integral quantities}. In this case the |
| values in the index vector must lie in the set @{1, 2, @dots{}, |
| @code{length(x)}@}. The corresponding elements of the vector are |
| selected and concatenated, @emph{in that order}, in the result. The |
| index vector can be of any length and the result is of the same length |
| as the index vector. For example @code{x[6]} is the sixth component of |
| @code{x} and |
| |
| @example |
| > x[1:10] |
| @end example |
| |
| @noindent |
| selects the first 10 elements of @code{x} (assuming @code{length(x)} is |
| not less than 10). Also |
| |
| @example |
| > c("x","y")[rep(c(1,2,2,1), times=4)] |
| @end example |
| |
| @noindent |
| (an admittedly unlikely thing to do) produces a character vector of |
| length 16 consisting of @code{"x", "y", "y", "x"} repeated four times. |
| |
| @item |
| @strong{A vector of negative integral quantities}. Such an index vector |
| specifies the values to be @emph{excluded} rather than included. Thus |
| |
| @example |
| > y <- x[-(1:5)] |
| @end example |
| |
| @noindent |
| gives @code{y} all but the first five elements of @code{x}. |
| |
| @item |
| @strong{A vector of character strings}. This possibility only applies |
| where an object has a @code{names} attribute to identify its components. |
| In this case a sub-vector of the names vector may be used in the same way |
| as the positive integral labels in item 2 further above. |
| |
| @example |
| > fruit <- c(5, 10, 1, 20) |
| > names(fruit) <- c("orange", "banana", "apple", "peach") |
| > lunch <- fruit[c("apple","orange")] |
| @end example |
| |
| The advantage is that alphanumeric @emph{names} are often easier to |
| remember than @emph{numeric indices}. This option is particularly |
| useful in connection with data frames, as we shall see later. |
| |
| @end enumerate |
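The four kinds of index vector can be tried on one small vector. The values below are invented for illustration; @code{fruit} is built with named arguments to @code{c()}, which has the same effect as setting @code{names(fruit)} afterwards.

@example
x <- c(10, NA, -3, 7)
v1 <- x[!is.na(x)]        # logical index: 10 -3 7
v5 <- x[c(TRUE, FALSE)]   # logical index recycled: elements 1 and 3
v2 <- x[c(4, 1)]          # positive integers, in that order: 7 10
v3 <- x[-(1:2)]           # negative integers exclude: -3 7
fruit <- c(orange = 5, banana = 10, apple = 1, peach = 20)
v4 <- fruit[c("apple", "orange")]   # character index via names: 1 5
@end example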
| |
| An indexed expression can also appear on the receiving end of an |
| assignment, in which case the assignment operation is performed |
| @emph{only on those elements of the vector}. The expression must be of |
| the form @code{vector[@var{index_vector}]} as having an arbitrary |
| expression in place of the vector name does not make much sense here. |
| |
| For example |
| |
| @example |
| > x[is.na(x)] <- 0 |
| @end example |
| |
| @noindent |
| replaces any missing values in @code{x} by zeros and |
| |
| @example |
| > y[y < 0] <- -y[y < 0] |
| @end example |
| |
| @noindent |
| has the same effect as |
| |
| @example |
| > y <- abs(y) |
| @end example |
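A short sketch of assignment to an indexed expression, with made-up vectors @code{x} and @code{y}:

@example
x <- c(3, NA, -1, NA)
x[is.na(x)] <- 0            # x is now 3 0 -1 0
y  <- c(-2, 5, -7)
y2 <- y
y2[y2 < 0] <- -y2[y2 < 0]   # flips the sign of the negative elements only
@end example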
| |
| @node Other types of objects, , Index vectors, Simple manipulations numbers and vectors |
| @section Other types of objects |
| |
| Vectors are the most important type of object in @R{}, but there are |
| several others which we will meet more formally in later sections. |
| |
| @itemize @bullet |
| @item |
| @emph{matrices} or more generally @emph{arrays} are multi-dimensional |
| generalizations of vectors. In fact, they @emph{are} vectors that can |
| be indexed by two or more indices and will be printed in special ways. |
| @xref{Arrays and matrices}. |
| |
| @item |
| @emph{factors} provide compact ways to handle categorical data. |
| @xref{Factors}. |
| |
| @item |
| @emph{lists} are a general form of vector in which the various elements |
| need not be of the same type, and are often themselves vectors or lists. |
| Lists provide a convenient way to return the results of a statistical |
| computation. @xref{Lists}. |
| |
| @item |
| @emph{data frames} are matrix-like structures, in which the columns can |
| be of different types. Think of data frames as `data matrices' with one |
| row per observational unit but with (possibly) both numerical and |
| categorical variables. Many experiments are best described by data |
| frames: the treatments are categorical but the response is numeric. |
| @xref{Data frames}. |
| |
| @item |
| @emph{functions} are themselves objects in @R{} which can be stored in |
| the project's workspace. This provides a simple and convenient way to |
| extend @R{}. @xref{Writing your own functions}. |
| |
| @end itemize |
| |
| @node Objects, Factors, Simple manipulations numbers and vectors, Top |
| @chapter Objects, their modes and attributes |
| @cindex Objects |
| @cindex Attributes |
| |
| @c <FIXME> |
| @c This needs to be re-written for R. We really have data types (as |
| @c returned by typeof()) and that functions mode() and storage.mode() |
| @c are for S compatibility mostly. Hence in particular, there is no |
| @c intrinsic attribute `mode'. |
| |
| @menu |
| * The intrinsic attributes mode and length:: |
| * Changing the length of an object:: |
| * Getting and setting attributes:: |
| * The class of an object:: |
| @end menu |
| |
| @node The intrinsic attributes mode and length, Changing the length of an object, Objects, Objects |
| @section Intrinsic attributes: mode and length |
| |
| The entities @R{} operates on are technically known as @emph{objects}. |
| Examples are vectors of numeric (real) or complex values, vectors of |
| logical values and vectors of character strings. These are known as |
| ``atomic'' structures since their components are all of the same type, |
| or @emph{mode}, namely @emph{numeric}@footnote{@emph{numeric} mode is |
| actually an amalgam of two distinct modes, namely @emph{integer} and |
| @emph{double} precision, as explained in the manual.}, @emph{complex}, |
| @emph{logical}, @emph{character} and @emph{raw}. |
| |
| Vectors must have their values @emph{all of the same mode}. Thus any |
| given vector must be unambiguously either @emph{logical}, |
| @emph{numeric}, @emph{complex}, @emph{character} or @emph{raw}. (The |
| only apparent exception to this rule is the special ``value'' listed as |
| @code{NA} for quantities not available, but in fact there are several |
| types of @code{NA}). Note that a vector can be empty and still have a |
| mode. For example the empty character string vector is listed as |
| @code{character(0)} and the empty numeric vector as @code{numeric(0)}. |
| |
| @R{} also operates on objects called @emph{lists}, which are of mode |
| @emph{list}. These are ordered sequences of objects which individually |
can be of any mode. @emph{Lists} are known as ``recursive'' rather than
| atomic structures since their components can themselves be lists in |
| their own right. |
| |
| The other recursive structures are those of mode @emph{function} and |
| @emph{expression}. Functions are the objects that form part of the @R{} |
| system along with similar user written functions, which we discuss in |
| some detail later. Expressions as objects form an |
| advanced part of @R{} which will not be discussed in this guide, except |
| indirectly when we discuss @emph{formulae} used with modeling in @R{}. |
| |
| By the @emph{mode} of an object we mean the basic type of its |
| fundamental constituents. This is a special case of a ``property'' |
| of an object. Another property of every object is its @emph{length}. The |
functions @code{mode(@var{object})} and @code{length(@var{object})} can be
used to find out the mode and length of any defined
structure.@footnote{Note however that @code{length(@var{object})} does not always
contain intrinsically useful information, e.g., when @code{@var{object}} is a
function.}
| |
| Further properties of an object are usually provided by |
| @code{attributes(@var{object})}, see @ref{Getting and setting attributes}. |
| Because of this, @emph{mode} and @emph{length} are also called ``intrinsic |
| attributes'' of an object. |
| @findex mode |
| @findex length |
| |
| For example, if @code{z} is a complex vector of length 100, then in an |
| expression @code{mode(z)} is the character string @code{"complex"} and |
| @code{length(z)} is @code{100}. |
| |
@R{} caters for changes of mode almost anywhere it could be considered
sensible to do so (and a few where it might not be). For example, with
| |
| @example |
| > z <- 0:9 |
| @end example |
| |
| @noindent |
| we could put |
| |
| @example |
| > digits <- as.character(z) |
| @end example |
| |
| @noindent |
| after which @code{digits} is the character vector @code{c("0", "1", "2", |
| @dots{}, "9")}. A further @emph{coercion}, or change of mode, |
| reconstructs the numerical vector again: |
| |
| @example |
| > d <- as.integer(digits) |
| @end example |
| |
| @noindent |
| Now @code{d} and @code{z} are the same.@footnote{In general, coercion |
| from numeric to character and back again will not be exactly reversible, |
| because of roundoff errors in the character representation.} There is a |
| large collection of functions of the form @code{as.@var{something}()} |
| for either coercion from one mode to another, or for investing an object |
| with some other attribute it may not already possess. The reader should |
| consult the different help files to become familiar with them. |
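The following sketch checks the mode, length and coercion statements above. It also uses @code{typeof()}, which reports the underlying storage type, to show that integer and double vectors share the mode @code{"numeric"}.

@example
z <- 0:9
digits <- as.character(z)   # "0" "1" ... "9"
d <- as.integer(digits)     # back to the integer vector 0, 1, ..., 9
zc <- complex(100)          # a complex vector of length 100
m_z  <- mode(z)             # "numeric"
m_d  <- mode(digits)        # "character"
t_z  <- typeof(z)           # "integer"
t_db <- typeof(c(0.5, 1))   # "double", yet its mode is also "numeric"
@end example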
| |
| @c </FIXME> |
| |
| @node Changing the length of an object, Getting and setting attributes, The intrinsic attributes mode and length, Objects |
| @section Changing the length of an object |
| |
| An ``empty'' object may still have a mode. For example |
| |
| @example |
| > e <- numeric() |
| @end example |
| |
| @noindent |
makes @code{e} an empty vector structure of mode numeric. Similarly
@code{character()} is an empty character vector, and so on. Once an
| object of any size has been created, new components may be added to it |
| simply by giving it an index value outside its previous range. Thus |
| |
| @example |
| > e[3] <- 17 |
| @end example |
| |
| @noindent |
now makes @code{e} a vector of length 3 (the first two components of
which are at this point both @code{NA}). This applies to any structure
| at all, provided the mode of the additional component(s) agrees with the |
| mode of the object in the first place. |
| |
| This automatic adjustment of lengths of an object is used often, for |
| example in the @code{scan()} function for input. (@pxref{The scan() |
| function}.) |
| |
Conversely, truncating an object requires only an assignment. Hence, if
@code{alpha} is an object of length 10, then
| |
| @example |
| > alpha <- alpha[2 * 1:5] |
| @end example |
| |
| @noindent |
| makes it an object of length 5 consisting of just the former components |
| with even index. (The old indices are not retained, of course.) We can |
| then retain just the first three values by |
| |
| @example |
| > length(alpha) <- 3 |
| @end example |
| |
| @noindent |
| and vectors can be extended (by missing values) in the same way. |
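The length adjustments described in this section can be traced step by step. Here @code{alpha} is given the hypothetical value @code{11:20} so that the results are easy to follow.

@example
e <- numeric()            # empty vector of mode numeric
e[3] <- 17                # e is now NA NA 17
alpha <- 11:20            # hypothetical object of length 10
alpha <- alpha[2 * 1:5]   # even-indexed elements: 12 14 16 18 20
length(alpha) <- 3        # truncated: 12 14 16
length(alpha) <- 5        # extended with missing values: 12 14 16 NA NA
@end example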
| |
| @node Getting and setting attributes, The class of an object, Changing the length of an object, Objects |
| @section Getting and setting attributes |
| @findex attr |
| @findex attributes |
| |
| The function @code{attributes(@var{object})} |
| returns a list of all the non-intrinsic attributes currently defined for |
| that object. The function @code{attr(@var{object}, @var{name})} |
| can be used to select a specific attribute. These functions are rarely |
| used, except in rather special circumstances when some new attribute is |
| being created for some particular purpose, for example to associate a |
| creation date or an operator with an @R{} object. The concept, however, |
| is very important. |
| |
| Some care should be exercised when assigning or deleting attributes |
| since they are an integral part of the object system used in @R{}. |
| |
When @code{attr()} is used on the left-hand side of an assignment, it
can be used either to associate a new attribute with @code{@var{object}}
or to change an existing one. For example
| |
| @example |
| > attr(z, "dim") <- c(10,10) |
| @end example |
| |
| @noindent |
| allows @R{} to treat @code{z} as if it were a 10-by-10 matrix. |
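A sketch of setting attributes with @code{attr()}. The @code{"note"} attribute is a hypothetical example of the kind of purpose-made attribute mentioned above, such as a creation date.

@example
z <- 1:100
attr(z, "dim") <- c(10, 10)                      # z now behaves as a 10-by-10 matrix
attr(z, "note") <- "created for illustration"    # a hypothetical extra attribute
a <- attributes(z)                               # list of the non-intrinsic attributes
@end example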
| |
| @node The class of an object, , Getting and setting attributes, Objects |
| @section The class of an object |
| @cindex Classes |
| |
| All objects in @R{} have a @emph{class}, reported by the function |
| @code{class}. For simple vectors this is just the mode, for example |
| @code{"numeric"}, @code{"logical"}, @code{"character"} or @code{"list"}, |
| but @code{"matrix"}, @code{"array"}, @code{"factor"} and |
| @code{"data.frame"} are other possible values. |
| |
| A special attribute known as the @emph{class} of the object is used to |
| allow for an object-oriented style@footnote{A different style using |
| `formal' or `S4' classes is provided in package @code{methods}.} of |
| programming in @R{}. For example if an object has class |
| @code{"data.frame"}, it will be printed in a certain way, the |
| @code{plot()} function will display it graphically in a certain way, and |
| other so-called generic functions such as @code{summary()} will react to |
| it as an argument in a way sensitive to its class. |
| |
To remove the effects of class temporarily, use the function
@code{unclass()}.
| @findex unclass |
| For example if @code{winter} has the class @code{"data.frame"} then |
| |
| @example |
| > winter |
| @end example |
| |
| @noindent |
| will print it in data frame form, which is rather like a matrix, whereas |
| |
| @example |
| > unclass(winter) |
| @end example |
| |
| @noindent |
| will print it as an ordinary list. Only in rather special situations do |
| you need to use this facility, but one is when you are learning to come |
| to terms with the idea of class and generic functions. |
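A small sketch of @code{class()} and @code{unclass()}. Since @code{winter} is not defined in this manual, a hypothetical two-column data frame stands in for it. Note that in @R{} 4.0.0 and later a matrix has class @code{c("matrix", "array")}.

@example
winter <- data.frame(temp = c(12.5, 9.8), site = c("A", "B"))  # hypothetical data
cl_w <- class(winter)            # "data.frame"
u    <- unclass(winter)          # the same columns, now an ordinary list
cl_u <- class(u)                 # "list"
cl_v <- class(c(1.5, 2))         # "numeric"
cl_m <- class(matrix(1:4, 2))    # includes "matrix"
@end example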
| |
| Generic functions and classes will be discussed further in @ref{Object |
| orientation}, but only briefly. |
| |
| @node Factors, Arrays and matrices, Objects, Top |
| @chapter Ordered and unordered factors |
| @cindex Factors |
| @cindex Ordered factors |
| |
| A @emph{factor} is a vector object used to specify a discrete |
| classification (grouping) of the components of other vectors of the same length. |
| @R{} provides both @emph{ordered} and @emph{unordered} factors. |
| While the ``real'' application of factors is with model formulae |
| (@pxref{Contrasts}), we here look at a specific example. |
| |
| @section A specific example |
| |
| Suppose, for example, we have a sample of 30 tax accountants from all |
| the states and territories of Australia@footnote{Readers should note |
| that there are eight states and territories in Australia, namely the |
| Australian Capital Territory, New South Wales, the Northern Territory, |
| Queensland, South Australia, Tasmania, Victoria and Western Australia.} |
| and their individual state of origin is specified by a character vector |
| of state mnemonics as |
| |
| @example |
| > state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa", |
| "qld", "vic", "nsw", "vic", "qld", "qld", "sa", "tas", |
| "sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa", |
| "sa", "act", "nsw", "vic", "vic", "act") |
| @end example |
| |
By default the levels of a factor are the distinct values of the vector
in sorted order; for a character vector, ``sorted'' means sorted in
alphabetical order. A @emph{factor} is created using the @code{factor()}
function:
| @findex factor |
| |
| @example |
| > statef <- factor(state) |
| @end example |
| |
| The @code{print()} function handles factors slightly differently from |
| other objects: |
| |
| @example |
| > statef |
| [1] tas sa qld nsw nsw nt wa wa qld vic nsw vic qld qld sa |
| [16] tas sa nt wa vic qld nsw nsw wa sa act nsw vic vic act |
| Levels: act nsw nt qld sa tas vic wa |
| @end example |
| |
| To find out the levels of a factor the function @code{levels()} can be |
| used. |
| @findex levels |
| |
| @example |
| > levels(statef) |
| [1] "act" "nsw" "nt" "qld" "sa" "tas" "vic" "wa" |
| @end example |
| |
| @menu |
| * The function tapply() and ragged arrays:: |
| * Ordered factors:: |
| @end menu |
| |
| @node The function tapply() and ragged arrays, Ordered factors, Factors, Factors |
| @section The function @code{tapply()} and ragged arrays |
| @findex tapply |
| |
| To continue the previous example, suppose we have the incomes of the |
| same tax accountants in another vector (in suitably large units of |
| money) |
| |
| @example |
| > incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56, |
| 61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46, |
| 59, 46, 58, 43) |
| @end example |
| |
| To calculate the sample mean income for each state we can now use the |
| special function @code{tapply()}: |
| |
| @example |
| > incmeans <- tapply(incomes, statef, mean) |
| @end example |
| |
| @noindent |
| giving a means vector with the components labelled by the levels |
| |
| @example |
| act nsw nt qld sa tas vic wa |
| 44.500 57.333 55.500 53.600 55.000 60.500 56.000 52.250 |
| @end example |
| |
The function @code{tapply()} is used to apply a function, here
@code{mean()}, to each group of components of the first argument, here
@code{incomes}, defined by the levels of the second argument, here
@code{statef}@footnote{Note that @code{tapply()} also works in this case
when its second argument is not a factor, e.g.,
@samp{@code{tapply(incomes, state, mean)}}, and this is true for quite a few
other functions, since arguments are @emph{coerced} to factors when
necessary (using @code{as.factor()}).}, as if they were separate vector
structures. The result is a structure, of the same length as the levels
attribute of the factor, containing the results. The reader should
consult the help document for more details.
| |
| Suppose further we needed to calculate the standard errors of the state |
| income means. To do this we need to write an @R{} function to calculate |
the standard error for any given vector. Since there is a built-in
function @code{var()} to calculate the sample variance, such a function
is a very simple one-liner, specified by the assignment:
| |
| @example |
| > stdError <- function(x) sqrt(var(x)/length(x)) |
| @end example |
| |
| @noindent |
(Writing functions will be considered later in @ref{Writing your own
functions}. Note that @R{}'s built-in function @code{sd()} computes the
sample standard deviation, which is something different.)
| @findex sd |
| @findex var |
| After this assignment, the standard errors are calculated by |
| |
| @example |
| > incster <- tapply(incomes, statef, stdError) |
| @end example |
| |
| @noindent |
| and the values calculated are then |
| |
| @example |
| > incster |
| act nsw nt qld sa tas vic wa |
| 1.5 4.3102 4.5 4.1061 2.7386 0.5 5.244 2.6575 |
| @end example |
| |
| As an exercise you may care to find the usual 95% confidence limits for |
| the state mean incomes. To do this you could use @code{tapply()} once |
| more with the @code{length()} function to find the sample sizes, and the |
| @code{qt()} function to find the percentage points of the appropriate |
| @math{t}-distributions. (You could also investigate @R{}'s facilities |
| for @math{t}-tests.) |
| |
| The function @code{tapply()} can also be used to handle more complicated |
| indexing of a vector by multiple categories. For example, we might wish |
to split the tax accountants by both state and sex. However, in this
simple instance (just one factor), what happens can be thought of as
| follows. The values in the vector are collected into groups |
| corresponding to the distinct entries in the factor. The function is |
| then applied to each of these groups individually. The value is a |
| vector of function results, labelled by the @code{levels} attribute of |
| the factor. |
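The grouping described above can be made explicit with @code{split()}, which collects the values into one group per level, followed by @code{sapply()} to apply the function to each group. The sketch below uses a small made-up factor @code{g}. It also shows @code{tapply()} with a list of two factors, using a hypothetical @code{sex} factor, which gives a table of means in which empty cells are @code{NA}.

@example
g <- factor(c("b", "a", "b", "a", "c"))   # hypothetical grouping factor
v <- c(10, 1, 20, 3, 5)
m1 <- tapply(v, g, mean)                  # a = 2, b = 15, c = 5
m2 <- sapply(split(v, g), mean)           # the same, in two explicit steps
sex <- factor(c("f", "m", "m", "f", "f")) # hypothetical second factor
m3 <- tapply(v, list(g, sex), mean)       # 3 by 2 table of group means
@end example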
| |
| The combination of a vector and a labelling factor is an example of what |
| is sometimes called a @emph{ragged array}, since the subclass sizes are |
| possibly irregular. When the subclass sizes are all the same the |
| indexing may be done implicitly and much more efficiently, as we see in |
| the next section. |
| |
| |
| @node Ordered factors, , The function tapply() and ragged arrays, Factors |
| @section Ordered factors |
| @findex ordered |
| |
| The levels of factors are stored in alphabetical order, or in the order |
| they were specified to @code{factor} if they were specified explicitly. |
| |
Sometimes the levels will have a natural ordering that we want to record
and want our statistical analysis to make use of. The @code{ordered()}
function creates such ordered factors but is otherwise identical to
@code{factor}. For most purposes the only difference between ordered
and unordered factors is that the former are printed showing the
ordering of the levels; however, the contrasts generated for them in
fitting linear models are also different.
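A sketch contrasting the default sorted levels of @code{factor()} with an explicit ordering given to @code{ordered()}; the values are made up. Comparisons such as @code{<} are meaningful only for ordered factors.

@example
sizes <- c("lo", "hi", "mid", "lo")
f <- factor(sizes)                                  # levels sorted: "hi" "lo" "mid"
o <- ordered(sizes, levels = c("lo", "mid", "hi"))  # levels in the order given
lo_before_hi <- o[1] < o[2]                         # TRUE: "lo" precedes "hi"
top <- max(o)                                       # the highest level present, "hi"
@end example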
| |
| |
| @node Arrays and matrices, Lists and data frames, Factors, Top |
| @chapter Arrays and matrices |
| |
| @menu |
| * Arrays:: |
| * Array indexing:: |
| * Index matrices:: |
| * The array() function:: |
| * The outer product of two arrays:: |
| * Generalized transpose of an array:: |
| * Matrix facilities:: |
| * Forming partitioned matrices:: |
| * The concatenation function c() with arrays:: |
| * Frequency tables from factors:: |
| @end menu |
| |
| @node Arrays, Array indexing, Arrays and matrices, Arrays and matrices |
| @section Arrays |
| @cindex Arrays |
| @cindex Matrices |
| |
An array can be considered as a multiply subscripted collection of data
entries, for example numeric. @R{} provides simple facilities for
creating and handling arrays, and in particular the special case of
matrices.
| |
| A dimension vector is a vector of non-negative integers. If its length is |
| @math{k} then the array is @math{k}-dimensional, e.g.@: a matrix is a |
| @math{2}-dimensional array. The dimensions are indexed from one up to |
| the values given in the dimension vector. |
| |
| A vector can be used by @R{} as an array only if it has a dimension |
| vector as its @emph{dim} attribute. Suppose, for example, @code{z} is a |
| vector of 1500 elements. The assignment |
| |
| @example |
| > dim(z) <- c(3,5,100) |
| @end example |
| @findex dim |
| |
| @noindent |
| gives it the @emph{dim} attribute that allows it to be treated as a |
| @math{3} by @math{5} by @math{100} array. |
| |
| Other functions such as @code{matrix()} and @code{array()} are available |
| for simpler and more natural looking assignments, as we shall see in |
| @ref{The array() function}. |
| |
| The values in the data vector give the values in the array in the same |
| order as they would occur in FORTRAN, that is ``column major order,'' |
| with the first subscript moving fastest and the last subscript slowest. |
| |
| For example if the dimension vector for an array, say @code{a}, is |
| @code{c(3,4,2)} then there are @eqn{3 \times 4 \times 2 = 24, 3 * 4 * 2 |
| = 24} entries in @code{a} and the data vector holds them in the order |
| @code{a[1,1,1], a[2,1,1], @dots{}, a[2,4,2], a[3,4,2]}. |
| |
| Arrays can be one-dimensional: such arrays are usually treated in the |
| same way as vectors (including when printing), but the exceptions can |
| cause confusion. |
| |
| @node Array indexing, Index matrices, Arrays, Arrays and matrices |
| @section Array indexing. Subsections of an array |
| @cindex Indexing of and by arrays |
| |
| Individual elements of an array may be referenced by giving the name of |
| the array followed by the subscripts in square brackets, separated by |
| commas. |
| |
| More generally, subsections of an array may be specified by giving a |
| sequence of @emph{index vectors} in place of subscripts; however |
| @emph{if any index position is given an empty index vector, then the |
| full range of that subscript is taken}. |
| |
| Continuing the previous example, @code{a[2,,]} is a @eqn{4 \times 2, 4 * |
| 2} array with dimension vector @code{c(4,2)} and data vector containing |
| the values |
| |
| @example |
| c(a[2,1,1], a[2,2,1], a[2,3,1], a[2,4,1], |
| a[2,1,2], a[2,2,2], a[2,3,2], a[2,4,2]) |
| @end example |
| |
| @noindent |
| in that order. @code{a[,,]} stands for the entire array, which is the |
| same as omitting the subscripts entirely and using @code{a} alone. |
| |
| For any array, say @code{Z}, the dimension vector may be referenced |
| explicitly as @code{dim(Z)} (on either side of an assignment). |
| |
| Also, if an array name is given with just @emph{one subscript or index |
| vector}, then only the corresponding values of the data vector are used; |
| in this case the dimension vector is ignored. This is not the case, |
| however, if the single index is not a vector but itself an array, as we |
| next discuss. |
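| |
| A short sketch of these indexing rules, with @code{a} set up for |
| illustration as @code{array(1:24, dim = c(3, 4, 2))}: |
| |
| @example |
| a <- array(1:24, dim = c(3, 4, 2)) |
| s <- a[2, , ]            # first subscript fixed, full range of the others |
| dim(s)                   # 4 2 |
| as.vector(s)             # 2 5 8 11 14 17 20 23, in the order listed above |
| identical(a[, , ], a)    # TRUE: empty subscripts select the whole array |
| a[c(1, 24)]              # one index vector uses the data vector: 1 24 |
| @end example |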
| |
| @menu |
| * Index matrices:: |
| * The array() function:: |
| @end menu |
| |
| @node Index matrices, The array() function, Array indexing, Arrays and matrices |
| @section Index matrices |
| |
| As well as an index vector in any subscript position, a matrix may be |
| used with a single @emph{index matrix} in order either to assign a vector |
| of quantities to an irregular collection of elements in the array, or to |
| extract an irregular collection as a vector. |
| |
| A matrix example makes the process clear. In the case of a doubly |
| indexed array, an index matrix may be given consisting of two columns |
| and as many rows as desired. The entries in the index matrix are the |
| row and column indices for the doubly indexed array. Suppose for |
| example we have a @math{4} by @math{5} array @code{X} and we wish to do |
| the following: |
| |
| @itemize @bullet |
| @item |
| Extract elements @code{X[1,3]}, @code{X[2,2]} and @code{X[3,1]} as a |
| vector structure, and |
| @item |
| Replace these entries in the array @code{X} by zeroes. |
| @end itemize |
| In this case we need a @math{3} by @math{2} subscript array, as in the |
| following example. |
| |
| @example |
| > x <- array(1:20, dim=c(4,5)) # @r{Generate a 4 by 5 array.} |
| > x |
|      [,1] [,2] [,3] [,4] [,5] |
| [1,]    1    5    9   13   17 |
| [2,]    2    6   10   14   18 |
| [3,]    3    7   11   15   19 |
| [4,]    4    8   12   16   20 |
| > i <- array(c(1:3,3:1), dim=c(3,2)) |
| > i # @r{@code{i} is a 3 by 2 index array.} |
|      [,1] [,2] |
| [1,]    1    3 |
| [2,]    2    2 |
| [3,]    3    1 |
| > x[i] # @r{Extract those elements} |
| [1] 9 6 3 |
| > x[i] <- 0 # @r{Replace those elements by zeros.} |
| > x |
|      [,1] [,2] [,3] [,4] [,5] |
| [1,]    1    5    0   13   17 |
| [2,]    2    0   10   14   18 |
| [3,]    0    7   11   15   19 |
| [4,]    4    8   12   16   20 |
| > |
| @end example |
| @noindent |
| Negative indices are not allowed in index matrices. @code{NA} and zero |
| values are allowed: rows in the index matrix containing a zero are |
| ignored, and rows containing an @code{NA} produce an @code{NA} in the |
| result. |
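| |
| A small sketch of these rules, using the @code{x} of the example above. |
| In the hypothetical index matrix below, the second row contains a zero |
| and the third an @code{NA}. |
| |
| @example |
| x <- array(1:20, dim = c(4, 5)) |
| i <- rbind(c(1, 3), |
|            c(0, 2),     # contains a zero: this row is ignored |
|            c(NA, 1))    # contains an NA: gives an NA in the result |
| x[i]                    # 9 NA |
| @end example |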
| |
| |
| As a less trivial example, suppose we wish to generate an (unreduced) |
| design matrix for a block design defined by factors @code{blocks} |
| (@code{b} levels) and @code{varieties} (@code{v} levels). Further |
| suppose there are @code{n} plots in the experiment. We could proceed as |
| follows: |
| |
| @example |
| > Xb <- matrix(0, n, b) |
| > Xv <- matrix(0, n, v) |
| > ib <- cbind(1:n, blocks) |
| > iv <- cbind(1:n, varieties) |
| > Xb[ib] <- 1 |
| > Xv[iv] <- 1 |
| > X <- cbind(Xb, Xv) |
| @end example |
| |
| To construct the incidence matrix, @code{N} say, we could use |
| |
| @example |
| > N <- crossprod(Xb, Xv) |
| @end example |
| @findex crossprod |
| |
| However, a simpler and more direct way of producing this matrix is to use |
| @code{table()}: |
| @findex table |
| |
| @example |
| > N <- table(blocks, varieties) |
| @end example |
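| |
| To make the block design example concrete, here is a sketch with |
| made-up data: six plots in three blocks, each block containing both of |
| two varieties. The data are hypothetical; the indexing steps are the |
| ones shown above. |
| |
| @example |
| blocks    <- factor(c(1, 1, 2, 2, 3, 3))   # hypothetical data |
| varieties <- factor(c(1, 2, 1, 2, 1, 2)) |
| n <- length(blocks); b <- nlevels(blocks); v <- nlevels(varieties) |
| Xb <- matrix(0, n, b) |
| Xv <- matrix(0, n, v) |
| Xb[cbind(1:n, blocks)] <- 1     # a factor in cbind() contributes its codes |
| Xv[cbind(1:n, varieties)] <- 1 |
| X <- cbind(Xb, Xv)              # each row has exactly two 1s |
| N <- crossprod(Xb, Xv)          # b by v incidence matrix |
| table(blocks, varieties)        # the same counts, with labels |
| @end example |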
| |
| Index matrices must be numerical: any other form of matrix (e.g.@: a |
| logical or character matrix) supplied as a matrix is treated as an |
| indexing vector. |
| |
| @node The array() function, The outer product of two arrays, Index matrices, Arrays and matrices |
| @section The @code{array()} function |
| @findex array |
| |
| As well as giving a vector structure a @code{dim} attribute, arrays can |
| be constructed from vectors by the @code{array} function, which has the |
| form |
| |
| @example |
| > Z <- array(@var{data_vector}, @var{dim_vector}) |
| @end example |
| |
| For example, if the vector @code{h} contains 24 or fewer numbers, then |
| the command |
| |
| @example |
| > Z <- array(h, dim=c(3,4,2)) |
| @end example |
| |
| @noindent |
| would use @code{h} to set up a @math{3} by @math{4} by @math{2} array in |
| @code{Z}. If the size of @code{h} is exactly 24 the result is the same as |
| |
| @example |
| > Z <- h ; dim(Z) <- c(3,4,2) |
| @end example |
| |
| However, if @code{h} is shorter than 24, its values are recycled from the |
| beginning to make it up to size 24 (@pxref{The recycling rule}), |
| whereas @code{dim(h) <- c(3,4,2)} would signal an error about mismatching |
| length. |
| |
| As an extreme but common example, |
| |
| @example |
| > Z <- array(0, c(3,4,2)) |
| @end example |
| |
| @noindent |
| makes @code{Z} an array of all zeros. |
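| |
| The recycling and the error can both be seen in a short sketch, where |
| @code{h} is a hypothetical vector of length 10: |
| |
| @example |
| h <- 1:10                          # hypothetical h, shorter than 24 |
| Z <- array(h, dim = c(3, 4, 2))    # h is recycled to fill all 24 cells |
| Z[11:13]                           # 1 2 3: recycling restarts at h[1] |
| r <- try(dim(h) <- c(3, 4, 2), silent = TRUE) |
| inherits(r, "try-error")           # TRUE: length 10 does not match 24 |
| Z0 <- array(0, c(3, 4, 2)) |
| all(Z0 == 0)                       # TRUE |
| @end example |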
| |
| With @code{Z} set up from @code{h} as above, @code{dim(Z)} stands for the |
| dimension vector @code{c(3,4,2)}, @code{Z[1:24]} stands for the data |
| vector as it was in @code{h}, and @code{Z[]} with an empty subscript or |
| @code{Z} with no subscript stands for the entire array as an array. |
| |
| Arrays may be used in arithmetic expressions and the result is an array |
| formed by element-by-element operations on the data vector. The |
| @code{dim} attributes of operands generally need to be the same, and |
| this becomes the dimension vector of the result. So if @code{A}, |
| @code{B} and @code{C} are all similar arrays, then |
| |
| @example |
| > D <- 2*A*B + C + 1 |
| @end example |
| |
| @noindent |
| makes @code{D} a similar array with its data vector being the result of |
| the given element-by-element operations. However the precise rule |
| concerning mixed array and vector calculations has to be considered a |
| little more carefully. |
| |
| @menu |
| * The recycling rule:: |
| @end menu |
| |
| @node The recycling rule, , The array() function, The array() function |
| @subsection Mixed vector and array arithmetic. The recycling rule |
| @cindex Recycling rule |
| |
| The precise rule affecting element by element mixed calculations with |
| vectors and arrays is somewhat quirky and hard to find in the |
| references. From experience we have found the following to be a reliable |
| guide. |
| |
| @itemize @bullet |
| @item |
| The expression is scanned from left to right. |
| @item |
| Any short vector operands are extended by recycling their values until |
| they match the size of any other operands. |
| @item |
| As long as short vectors and arrays @emph{only} are encountered, the |
| arrays must all have the same @code{dim} attribute or an error results. |
| @item |
| Any vector operand longer than a matrix or array operand generates an error. |
| @item |
| If array structures are present and no error or coercion to vector has |
| been precipitated, the result is an array structure with the common |
| @code{dim} attribute of its array operands. |
| @end itemize |
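| |
| The following sketch illustrates these rules with a 2 by 3 matrix |
| @code{m} (a hypothetical name); @code{try()} is used so that the error |
| cases report their messages without stopping a script. |
| |
| @example |
| m <- matrix(1:6, nrow = 2) |
| m + 1:3                          # 1:3 recycled to length 6; result is 2 by 3 |
| dim(m * 10)                      # 2 3: a length-one vector is the commonest case |
| try(m + 1:7)                     # vector longer than the matrix: an error |
| try(m + matrix(1:6, nrow = 3))   # arrays with different dim attributes: an error |
| @end example |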
| |
| @node The outer product of two arrays, Generalized transpose of an array, The array() function, Arrays and matrices |
| @section The outer product of two arrays |
| @cindex Outer products of arrays |
| |
| An important operation on arrays is the @emph{outer product}. If |
| @code{a} and @code{b} are two numeric arrays, their outer product is an |
| array whose dimension vector is obtained by concatenating their two |
| dimension vectors (order is important), and whose data vector is obtained by |
| forming all possible products of elements of the data vector of @code{a} |
| with those of @code{b}. The outer product is formed by the special |
| operator @code{%o%}: |
| @findex %o% |
| |
| @example |
| > ab <- a %o% b |
| @end example |
| |
| An alternative is |
| |
| @example |
| > ab <- outer(a, b, "*") |
| @end example |
| @findex outer |
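| |
| A small sketch of how the dimension vectors combine, with a |
| hypothetical matrix @code{a} and vector @code{b} (a plain vector |
| contributes its length as its dimension): |
| |
| @example |
| a <- matrix(1:6, nrow = 2)       # dim c(2, 3) |
| b <- 1:4                         # length 4 |
| ab <- a %o% b |
| dim(ab)                          # 2 3 4 |
| ab[2, 3, 4] == a[2, 3] * b[4]    # TRUE |
| identical(ab, outer(a, b, "*"))  # TRUE |
| @end example |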
| |
| The multiplication function can be replaced by an arbitrary function of |
| two variables. For example if we wished to evaluate the function |
| @eqn{f(x; y) = \cos(y)/(1 + x^2), f(x; y) = cos(y)/(1 + x^2)} |
| over a regular grid of values with @math{x}- and @math{y}-coordinates |
| defined by the @R{} vectors @code{x} and @code{y} respectively, we could |
| proceed as follows: |
| |
| @example |
| > f <- function(x, y) cos(y)/(1 + x^2) |
| > z <- outer(x, y, f) |
| @end example |
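| |
| For instance, with a small hypothetical grid (the values of @code{x} and |
| @code{y} below are arbitrary), @code{z[i, j]} holds |
| @code{f(x[i], y[j])}: |
| |
| @example |
| x <- seq(-2, 2, length.out = 5) |
| y <- seq(0, pi, length.out = 3) |
| f <- function(x, y) cos(y)/(1 + x^2) |
| z <- outer(x, y, f) |
| dim(z)                              # 5 3: one row per x, one column per y |
| all.equal(z[2, 3], f(x[2], y[3]))   # TRUE |
| @end example |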
| |
| In particular the outer product of two ordinary vectors is a doubly |
| subscripted array (that is a matrix, of rank at most 1). Notice that |
| the outer product operator is of course non-commutative. Defining your |
| own @R{} functions will be considered further in @ref{Writing your own |
| functions}. |
| |
| @subsubheading An example: Determinants of 2 by 2 single-digit matrices |
| |
| As an artificial but cute example, consider the determinants of @math{2} |
| by @math{2} matrices @math{[a, b; c, d]} where each entry is a |
| non-negative integer in the range @math{0, 1, @dots{}, 9}, that is a |
| digit. |
| |
| The problem is to find the determinants, @math{ad - bc}, of all possible |
| matrices of this form and represent the frequency with which each value |
| occurs as a @emph{high density} plot. This amounts to finding the |
| probability distribution of the determinant if each digit is chosen |
| independently and uniformly at random. |
| |
| A neat way of doing this uses the @code{outer()} function twice: |
| |
| @example |
| > d <- outer(0:9, 0:9) |
| > fr <- table(outer(d, d, "-")) |
| > plot(fr, xlab="Determinant", ylab="Frequency") |
| @end example |
| |
| Notice that @code{plot()} here uses a histogram-like plot method, because |
| it ``sees'' that @code{fr} is of class @code{"table"}. |
| The ``obvious'' way of doing this problem with @code{for} loops, to be |
| discussed in @ref{Loops and conditional execution}, is so inefficient as |
| to be impractical. |
| |
| It is also perhaps surprising that more than 1 in 20 such matrices |
| (570 out of 10000) are singular. |
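| |
| The proportion of singular matrices can be read off the same |
| computation; this sketch repeats the construction of @code{d} from the |
| example above and simply counts the zero determinants. |
| |
| @example |
| d <- outer(0:9, 0:9) |
| dets <- outer(d, d, "-")   # all 10000 values of ad - bc |
| sum(dets == 0)             # 570 |
| mean(dets == 0)            # 0.057 |
| @end example |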
| |
| @node Generalized transpose of an array, Matrix facilities, The outer product of two arrays, Arrays and matrices |
| @section Generalized transpose of an array |
| @cindex Generalized transpose of an array |
| |
| The function @code{aperm(a, perm)} |
| @findex aperm |
| may be used to permute an array, @code{a}. The argument @code{perm} |
| must be a permutation of the integers @math{@{1, @dots{}, k@}}, where |
| @math{k} is the number of subscripts in @code{a}. The result of the |
| function is an array of the same size as @code{a} but with old dimension |
| given by @code{perm[j]} becoming the new @code{j}-th dimension. The |
| easiest way to think of this operation is as a generalization of |
| transposition for matrices. Indeed, if @code{A} is a matrix (that is, a |
| doubly subscripted array), then @code{B} given by |
| |
| @example |
| > B <- aperm(A, c(2,1)) |
| @end example |
| |
| @noindent |
| is just the transpose of @code{A}. For this special case a simpler |
| function @code{t()} |
| @findex t |
| is available, so we could have used @code{B <- t(A)}. |
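| |
| A sketch with a hypothetical 3 by 4 by 2 array: with |
| @code{perm = c(3, 1, 2)} the old third dimension becomes the new first |
| one, and subscripts are permuted in the same way. |
| |
| @example |
| a <- array(1:24, dim = c(3, 4, 2)) |
| p <- aperm(a, c(3, 1, 2)) |
| dim(p)                                # 2 3 4 |
| p[2, 3, 4] == a[3, 4, 2]              # TRUE |
| A <- matrix(1:6, nrow = 2) |
| identical(aperm(A, c(2, 1)), t(A))    # TRUE |
| @end example |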
| |
| @node Matrix facilities, Forming partitioned matrices, Generalized transpose of an array, Arrays and matrices |
| @section Matrix facilities |
| |
| @iftex |
| @macro xTx{} |
| @tex |
| $@strong{x}^T @strong{x}$% |
| @end tex |
| @end macro |
| @macro xxT{} |
| @tex |
| $@strong{x}@strong{x}^T$% |
| @end tex |
| @end macro |
| @end iftex |
| |
| @ifnottex |
| @macro xTx{} |
| x'x |
| @end macro |
| @macro xxT{} |
| x x' |
| @end macro |
| @end ifnottex |
| |
| As noted above, a matrix is just an array with two subscripts. However, |
| it is such an important special case that it needs a separate discussion. |
| @R{} contains many operators and functions that are available only for |
| matrices. For example @code{t(X)} is the matrix transpose function, as |
| noted above. The functions @code{nrow(A)} and @code{ncol(A)} give the |
| number of rows and columns in the matrix @code{A} respectively. |
| @findex nrow |
| @findex ncol |
| |
| @menu |
| * Multiplication:: |
| * Linear equations and inversion:: |
| * Eigenvalues and eigenvectors:: |
| * Singular value decomposition and determinants:: |
| * Least squares fitting and the QR decomposition:: |
| @end menu |
| |
| @node Multiplication, Linear equations and inversion, Matrix facilities, Matrix facilities |
| @subsection Matrix multiplication |
| |
| @cindex Matrix multiplication |
| The operator @code{%*%} is used for matrix multiplication. |
| @findex %*% |
| An @math{n} by @math{1} or @math{1} by @math{n} matrix may of course be |
| used as an @math{n}-vector where the context makes this appropriate. |
| Conversely, vectors which occur in matrix multiplication expressions are |
| automatically promoted either to row or column vectors, whichever is |
| multiplicatively coherent, if possible (although this is not always |
| unambiguously possible, as we see later). |
| |
| If, for example, @code{A} and @code{B} are square matrices of the same |
| size, then |
| |
| @example |
| > A * B |
| @end example |
| |
| @noindent |
| is the matrix of element by element products and |
| |
| @example |
| > A %*% B |
| @end example |
| |
| @noindent |
| is the matrix product. If @code{x} is a vector, then |
| |
| @example |
| > x %*% A %*% x |
| @end example |
| |
| @noindent |
| is a quadratic form.@footnote{Note that @code{x %*% x} is ambiguous, as |
| it could mean either @xTx{} or @xxT{}, where @eqn{@strong{x},x} is the |
| column form. In such cases the smaller matrix seems implicitly to be |
| the interpretation adopted, so the scalar @xTx{} is in this case the |
| result. The matrix @xxT{} may be calculated either by @code{cbind(x) |
| %*% x} or @code{x %*% rbind(x)} since the result of @code{rbind()} or |
| @code{cbind()} is always a matrix. However, the best way to compute |
| @xTx{} or @xxT{} is @code{crossprod(x)} or @code{x %o% x} respectively.} |
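| |
| The difference between @code{*} and @code{%*%} shows up clearly in a |
| small hypothetical example, where @code{B} is the 2 by 2 identity |
| matrix: |
| |
| @example |
| A <- matrix(c(2, 1, 1, 3), nrow = 2) |
| B <- diag(2) |
| A * B              # element by element: only the diagonal of A survives |
| A %*% B            # matrix product: A itself |
| x <- c(1, 2) |
| x %*% A %*% x      # the quadratic form, a 1 by 1 matrix containing 18 |
| @end example |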
| |
| @findex crossprod |
| The function @code{crossprod()} forms ``crossproducts'', meaning that |
| @code{crossprod(X, y)} is the same as @code{t(X) %*% y} but the |
| operation is more efficient. If the second argument to |
| @code{crossprod()} is omitted it is taken to be the same as the first. |
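| |
| A sketch of the equivalences described here and in the footnote above, |
| with a hypothetical 3 by 2 matrix @code{X}: |
| |
| @example |
| X <- matrix(1:6, nrow = 3) |
| y <- c(1, 0, 1) |
| all.equal(crossprod(X, y), t(X) %*% y)   # TRUE |
| all.equal(crossprod(X), t(X) %*% X)      # second argument defaults to X |
| x <- c(1, 2, 3) |
| crossprod(x)                             # x'x as a 1 by 1 matrix: 14 |
| all.equal(x %o% x, cbind(x) %*% x)       # x x', a 3 by 3 matrix |
| @end example |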
| |
| @findex diag |
| The meaning of @code{diag()} depends on its argument. @code{diag(v)}, |
| where @code{v} is a vector, gives a diagonal matrix with elements of the |
| vector as the diagonal entries. On the other hand @code{diag(M)}, where |
| @code{M} is a matrix, gives the vector of main diagonal entries of |
| @code{M}. This is the same convention as that used for @code{diag()} in |
| @sc{Matlab}. Also, somewhat confusingly, if @code{k} is a single |
| numeric value then @code{diag(k)} is the @code{k} by @code{k} identity |
| matrix! |
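| |
| The three uses of @code{diag()} side by side, on hypothetical arguments: |
| |
| @example |
| diag(c(4, 5, 6))          # 3 by 3 matrix with 4, 5, 6 on the diagonal |
| M <- matrix(1:9, nrow = 3) |
| diag(M)                   # 1 5 9: the main diagonal of M |
| diag(3)                   # a single number: the 3 by 3 identity matrix |
| @end example |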
| |
| @node Linear equations and inversion, Eigenvalues and eigenvectors, Multiplication, Matrix facilities |
| @subsection Linear equations and inversion |
| |
| @cindex Linear equations |
| @findex solve |
| Solving linear equations is the inverse of matrix multiplication. |
| If, after |
| |
| @example |
| > b <- A %*% x |
| @end example |
| |
| @noindent |
| only @code{A} and @code{b} are given, the vector @code{x} is the |
| solution of that linear equation system. In @R{}, |
| |
| @example |
| > solve(A,b) |
| @end example |
| |
| @noindent |
| solves the system, returning @code{x} (up to some accuracy loss). |
| Note that in linear algebra, formally |
| @eqn{@strong{x} = @strong{A}^{-1} @strong{b}, @code{x = A^@{-1@} %*% b}} |
| where |
| @eqn{@strong{A}^{-1}, @code{A^@{-1@}}} denotes the @emph{inverse} of |
| @eqn{@strong{A},@code{A}}, which can be computed by |
| |
| @example |
| solve(A) |
| @end example |
| |
| @noindent |
| but is rarely needed. Numerically, it is both inefficient and |
| potentially unstable to compute @code{x <- solve(A) %*% b} instead of |
| @code{solve(A,b)}. |
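| |
| A minimal sketch with a hypothetical nonsingular @code{A}: the vector |
| @code{x} is recovered from @code{A} and @code{b}, and @code{solve(A)} |
| is checked to be the inverse. |
| |
| @example |
| A <- matrix(c(4, 2, 1, 3), nrow = 2) |
| x <- c(1, -1) |
| b <- A %*% x |
| solve(A, b)                            # x again, as a 2 by 1 matrix |
| all.equal(solve(A) %*% A, diag(2))     # TRUE: solve(A) is the inverse |
| @end example |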
| |
| The quadratic form @eqn{@strong{x}^T @strong{A}^{-1} @strong{x}, @code{x %*% A^@{-1@} %*% x}}, |
| which is used in multivariate computations, should be computed by |
| something like@footnote{Even better would be to form a matrix square |
| root @eqn{B, B} with @eqn{A = BB^T, A = BB'} and find the squared length |
| of the solution of @eqn{By = x, By = x}, perhaps using the Cholesky or |
| eigen decomposition of @eqn{A, A}.} @code{x %*% solve(A,x)}, rather |
| than computing the inverse of @code{A}. |
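| |
| A sketch comparing the two approaches for a hypothetical symmetric |
| positive definite @code{A} (which @code{chol()} requires). Here |
| @code{chol(A)} returns an upper triangular @code{R} with |
| @code{A = t(R) %*% R}, so @code{t(R)} plays the role of @code{B} in the |
| footnote. |
| |
| @example |
| A <- matrix(c(4, 2, 2, 3), nrow = 2) |
| x <- c(1, 2) |
| q1 <- x %*% solve(A, x)                   # as recommended in the text |
| R <- chol(A)                              # upper triangular, A = t(R) %*% R |
| yv <- backsolve(R, x, transpose = TRUE)   # solves t(R) %*% yv = x |
| q2 <- sum(yv^2)                           # squared length of the solution |
| all.equal(as.vector(q1), q2)              # TRUE; both equal 11/8 here |
| @end example |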
| |
| @node Eigenvalues and eigenvectors, Singular value decomposition and determinants, Linear equations and inversion, Matrix facilities |
| @subsection Eigenvalues and eigenvectors |
| @cindex Eigenvalues and eigenvectors |
| |
| @findex eigen |
| The function @code{eigen(Sm)} calculates the eigenvalues and |
| eigenvectors of a symmetric matrix @code{Sm}. The result of this |
| function is a list of two components named @code{values} and |
| @code{vectors}. The assignment |
| |
| @example |
| > ev <- eigen(Sm) |
| @end example |
| |
| @noindent |
| will assign this list to @code{ev}. Then @code{ev$val} is the vector of |
| eigenvalues of @code{Sm} and @code{ev$vec} is the matrix of |
| corresponding eigenvectors. Had we only needed the eigenvalues we could |
| have used the assignment: |
| |
| @example |
| > evals <- eigen(Sm)$values |
| @end example |
| |
| @noindent |
| @code{evals} now holds the vector of eigenvalues and the second |
| component is discarded. If the expression |
| |
| @example |
| > eigen(Sm) |
| @end example |
| |
| @noindent |
| is used by itself as a command the two components are printed, with |
| their names. For large matrices it is better to avoid computing the |
| eigenvectors if they are not needed by using the expression |
| |
| @example |
| > evals <- eigen(Sm, only.values = TRUE)$values |
| @end example |
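| |
| A sketch with a hypothetical 2 by 2 symmetric matrix, checking the |
| defining property of the first eigenvector: |
| |
| @example |
| Sm <- matrix(c(2, 1, 1, 2), nrow = 2) |
| ev <- eigen(Sm) |
| ev$values                                # 3 1, in decreasing order |
| v1 <- ev$vectors[, 1] |
| all.equal(as.vector(Sm %*% v1), ev$values[1] * v1)   # TRUE |
| eigen(Sm, only.values = TRUE)$vectors    # NULL: no vectors were computed |
| @end example |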
| |
| |
| @node Singular value decomposition and determinants, Least squares fitting and the QR decomposition, Eigenvalues and eigenvectors, Matrix facilities |
| @subsection Singular value decomposition and determinants |
| @cindex Singular value decomposition |
| |
| @findex svd |
| The function @code{svd(M)} takes an arbitrary matrix argument, @code{M}, |
| and calculates the singular value decomposition of @code{M}. This |
| consists of a matrix of orthonormal columns @code{U} with the same |
| column space as @code{M}, a second matrix of orthonormal columns |
| @code{V} whose column space is the row space of @code{M} and a diagonal |
| matrix of non-negative entries @code{D} such that @code{M = U %*% D %*% |
| t(V)}. @code{D} is actually returned as a vector of the diagonal |
| elements. The result of @code{svd(M)} is a list of three |
| components named @code{d}, @code{u} and @code{v}, with evident meanings. |
| |
| If @code{M} is in fact square, then it is not hard to see that |
| |
| @example |
| > absdetM <- prod(svd(M)$d) |
| @end example |
| |
| @noindent |
| calculates the absolute value of the determinant of @code{M}. If this |
| calculation were needed often with a variety of matrices it could be |
| defined as an @R{} function |
| |
| @example |
| > absdet <- function(M) prod(svd(M)$d) |
| @end example |
| |
| @cindex Determinants |
| @noindent |
| after which we could use @code{absdet()} as just another @R{} function. |
| As a further trivial but potentially useful example, you might like to |
| consider writing a function, say @code{tr()}, to calculate the trace of |
| a square matrix. [Hint: You will not need to use an explicit loop. |
| Look again at the @code{diag()} function.] |
| |
| @findex det |
| @findex determinant |
| @R{} has a built-in function @code{det} to calculate a determinant, |
| including the sign, and another, @code{determinant}, to give the sign |
| and modulus (optionally on the log scale). |
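| |
| A sketch tying these together for a hypothetical 3 by 3 matrix whose |
| determinant is @math{-1}: the decomposition reproduces @code{M}, and |
| @code{absdet()} agrees with @code{abs(det(M))}. |
| |
| @example |
| M <- matrix(c(2, 1, 1, 3, 0, 1, 4, 1, 2), nrow = 3) |
| absdet <- function(M) prod(svd(M)$d) |
| s <- svd(M) |
| all.equal(s$u %*% diag(s$d) %*% t(s$v), M)   # TRUE: M = U D V' |
| det(M)                                       # -1, up to rounding |
| all.equal(absdet(M), abs(det(M)))            # TRUE |
| determinant(M)                               # log modulus about 0, sign -1 |
| @end example |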
| |
| @c Functions will be discussed formally later in these notes. |
| |
| @node Least squares fitting and the QR decomposition, , Singular value decomposition and determinants, Matrix facilities |
| @subsection Least squares fitting and the QR decomposition |
| @cindex Least squares fitting |
| @cindex QR decomposition |
| |
| The function @code{lsfit()} returns a list giving results of a least |
| squares fitting procedure. An assignment such as |
| |
| @example |
| > ans <- lsfit(X, y) |
| @end example |
| @findex lsfit |
| |
| @noindent |
| gives the results of a least squares fit where @code{y} is the vector of |
| observations and @code{X} is the design matrix. See the help facility |
| for more details, and also for the follow-up function @code{ls.diag()} |
| for, among other things, regression diagnostics. Note that a grand mean |
| term is automatically included and need not be included explicitly as a |
| column of @code{X}. Further note that you almost always will prefer |
| using @code{lm(.)} (@pxref{Linear models}) to @code{lsfit()} for |
| regression modelling. |
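| |
| A sketch with simulated data (the values are arbitrary) showing that |
| @code{lsfit()} adds the intercept itself and gives the same estimates |
| as @code{lm()}; only the coefficient names differ. |
| |
| @example |
| set.seed(1) |
| x <- runif(20) |
| y <- 2 + 3 * x + rnorm(20, sd = 0.1) |
| ans <- lsfit(x, y)                # no column of 1s needed in x |
| ans$coefficients |
| coef(lm(y ~ x)) |
| all.equal(unname(ans$coefficients), unname(coef(lm(y ~ x))))   # TRUE |
| @end example |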
| |
| @findex qr |
| Another closely related function is @code{qr()} and its allies. |
| Consider the following assignments |
| |
| @example |
| > Xplus <- qr(X) |
| > b <- qr.coef(Xplus, y) |
| > fit <- qr.fitted(Xplus, y) |
| > res <- qr.resid(Xplus, y) |
| @end example |
| |
| @noindent |
| These compute the orthogonal projection of @code{y} onto the range of |
| @code{X} in @code{fit}, the projection onto the orthogonal complement in |
| @code{res} and the coefficient vector for the projection in @code{b}, |
| that is, @code{b} is essentially the result of the @sc{Matlab} |
| `backslash' operator. |
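| |
| These properties can be checked on simulated data; the design matrix |
| and response below are hypothetical. |
| |
| @example |
| set.seed(1) |
| X <- cbind(1, runif(10), runif(10)) |
| y <- rnorm(10) |
| Xplus <- qr(X) |
| b   <- qr.coef(Xplus, y) |
| fit <- qr.fitted(Xplus, y) |
| res <- qr.resid(Xplus, y) |
| all.equal(as.vector(fit + res), y)             # TRUE: the projections add up to y |
| all.equal(as.vector(X %*% b), as.vector(fit))  # TRUE: fit is X times b |
| max(abs(crossprod(X, res)))                    # essentially 0: res is orthogonal to X |
| @end example |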
| |
| It is not assumed that @code{X} has full column rank. Redundancies will |
| be discovered and removed as they are found. |
| |
| This alternative is the older, low-level way to perform least squares |
| calculations. Although still useful in some contexts, it would now |
| generally be replaced by the statistical models features, as will be |
| discussed in @ref{Statistical models in R}. |
| |
| |
| @node Forming partitioned matrices, The concatenation function c() with arrays, Matrix facilities, Arrays and matrices |
| @section Forming partitioned matrices, @code{cbind()} and @code{rbind()} |
| @findex cbind |
| @findex rbind |
| |
| As we have already seen informally, matrices can be built up from other |
| vectors and matrices by the functions @code{cbind()} and @code{rbind()}. |
| Roughly @code{cbind()} forms matrices by binding together matrices |
| horizontally, or column-wise, and @code{rbind()} vertically, or |
| row-wise. |
| |
| In the assignment |
| |
| @example |
| > X <- cbind(@var{arg_1}, @var{arg_2}, @var{arg_3}, @dots{}) |
| @end example |
| |
| @noindent |
| the arguments to @code{cbind()} must be either vectors of any length, or |
| matrices with the same column size, that is the same number of rows. |
| The result is a matrix with the concatenated arguments @var{arg_1}, |
| @var{arg_2}, @dots{} forming the columns. |
| |
| If some of the arguments to @code{cbind()} are vectors they may be |
| shorter than the column size of any matrices present, in which case they |
| are cyclically extended to match the matrix column size (or the length |
| of the longest vector if no matrices are given). |
| |
| The function @code{rbind()} does the corresponding operation for rows. |
| In this case any vector argument, possibly cyclically extended, is of |
| course taken as a row vector. |
| |
| Suppose @code{X1} and @code{X2} have the same number of rows. To |
| combine these by columns into a matrix @code{X}, together with an |
| initial column of @code{1}s we can use |
| |
| @example |
| > X <- cbind(1, X1, X2) |
| @end example |
| |
| The result of @code{rbind()} or @code{cbind()} always has matrix status. |
| Hence @code{cbind(x)} and @code{rbind(x)} are possibly the simplest ways |
| explicitly to allow the vector @code{x} to be treated as a column or row |
| matrix respectively. |
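| |
| A sketch with hypothetical matrices @code{X1} and @code{X2}, each with |
| three rows: |
| |
| @example |
| X1 <- matrix(1:6, nrow = 3) |
| X2 <- matrix(7:12, nrow = 3) |
| X <- cbind(1, X1, X2)           # the single 1 is recycled to a full column |
| dim(X)                          # 3 5 |
| x <- 1:4 |
| dim(cbind(x))                   # 4 1: a column matrix |
| dim(rbind(x))                   # 1 4: a row matrix |
| @end example |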
| |
| @node The concatenation function c() with arrays, Frequency tables from factors, Forming partitioned matrices, Arrays and matrices |
| @section The concatenation function, @code{c()}, with arrays |
| |
| It should be noted that whereas @code{cbind()} and @code{rbind()} are |
| concatenation functions that respect @code{dim} attributes, the basic |
| @code{c()} function does not, but rather clears numeric objects of all |
| @code{dim} and @code{dimnames} attributes. This is occasionally useful |
| in its own right. |
| |
| The official way to coerce an array back to a simple vector object is to |
| use @code{as.vector()} |
| |
| @example |
| > vec <- as.vector(X) |
| @end example |
| @findex as.vector |
| |
| However a similar result can be achieved by using @code{c()} with just |
| one argument, simply for this side-effect: |
| |
| @example |
| > vec <- c(X) |
| @end example |
| @findex c |
| |
| There are slight differences between the two, but ultimately the choice |
| between them is largely a matter of style (with the former being |
| preferable). |
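| |
| For a plain numeric matrix with row names (a hypothetical example), |
| both approaches strip the @code{dim} and @code{dimnames} attributes and |
| give the same vector: |
| |
| @example |
| X <- matrix(1:6, nrow = 2, dimnames = list(c("a", "b"), NULL)) |
| attributes(c(X))                # NULL: dim and dimnames are gone |
| identical(c(X), as.vector(X))   # TRUE for this matrix |
| @end example |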
| |
| @node Frequency tables from factors, , The concatenation function c() with arrays, Arrays and matrices |
| @section Frequency tables from factors |
| @cindex Tabulation |
| |
| Recall that a factor defines a partition into groups. Similarly a pair |
| of factors defines a two way cross classification, and so on. |
| @findex table |
| The function @code{table()} allows frequency tables to be calculated |
| from equal length factors. If there are @math{k} factor arguments, |
| the result is a @math{k}-way array of frequencies. |
| |
| Suppose, for example, that @code{statef} is a factor giving the state |
| code for each entry in a data vector. The assignment |
| |
| @example |
| > statefr <- table(statef) |
| @end example |
| |
| @noindent |
| gives in @code{statefr} a table of frequencies of each state in the |
| sample. The frequencies are ordered and labelled by the @code{levels} |
| attribute of the factor. This simple case is equivalent to, but more |
| convenient than, |
| |
| @example |
| > statefr <- tapply(statef, statef, length) |
| @end example |
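| |
| A sketch with a short hypothetical @code{statef}; the counts come out |
| in the order of @code{levels(statef)}, which here is alphabetical: |
| |
| @example |
| statef <- factor(c("nsw", "vic", "nsw", "qld", "vic", "nsw")) |
| statefr <- table(statef) |
| statefr                                    # nsw 3, qld 1, vic 2 |
| tapply(statef, statef, length)             # the same counts |
| @end example |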
| |
| Further suppose that @code{incomef} is a factor giving a suitably |
| defined ``income class'' for each entry in the data vector, for example |
| with the @code{cut()} function: |
| |
| @example |
| > factor(cut(incomes, breaks = 35+10*(0:7))) -> incomef |
| @end example |
| @findex cut |
| |
| Then to calculate a two-way table of frequencies: |
| |
| @example |
| > table(incomef,statef) |
|          statef |
| incomef   act nsw nt qld sa tas vic wa |
|   (35,45]   1   1  0   1  0   0   1  0 |
|   (45,55]   1   1  1   1  2   0   1  3 |
|   (55,65]   0   3  1   3  2   2   2  1 |
|   (65,75]   0   1  0   0  0   0   1  0 |
| @end example |
| |
| Extension to higher-way frequency tables is immediate. |
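| |
| A sketch of the same steps on six hypothetical observations. Wrapping |
| @code{cut()} in @code{factor()} drops the income classes that do not |
| occur, so only four rows remain: |
| |
| @example |
| incomes <- c(40, 52, 61, 48, 70, 58) |
| statef  <- factor(c("nsw", "vic", "nsw", "qld", "vic", "nsw")) |
| incomef <- factor(cut(incomes, breaks = 35+10*(0:7))) |
| tab <- table(incomef, statef) |
| dim(tab)     # 4 3: income classes present by states |
| sum(tab)     # 6: every observation is counted once |
| @end example |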
| |
| @node Lists and data frames, Reading data from files, Arrays and matrices, Top |
| @chapter Lists and data frames |
| |
| @menu |
| * Lists:: |
| * Constructing and modifying lists:: |
| * Data frames:: |
| @end menu |
| |
| @node Lists, Constructing and modifying lists, Lists and data frames, Lists and data frames |
| @section Lists |
| @cindex Lists |
| |
| An @R{} @emph{list} is an object consisting of an ordered collection of |
| objects known as its @emph{components}. |
| |
| There is no particular need for the components to be of the same mode or |
| type, and, for example, a list could consist of a numeric vector, a |
| logical value, a matrix, a complex vector, a character array, a |
| function, and so on. Here is a simple example of how to make a list: |
| |
| @example |
| > Lst <- list(name="Fred", wife="Mary", no.children=3, |
| child.ages=c(4,7,9)) |
| @end example |
| @findex list |
| |
| Components are always @emph{numbered} and may always be referred to as |
| such. Thus if @code{Lst} is the name of a list with four components, |
| these may be individually referred to as @code{Lst[[1]]}, |
| @code{Lst[[2]]}, @code{Lst[[3]]} and @code{Lst[[4]]}. If, further, |
| @code{Lst[[4]]} is a vector subscripted array then @code{Lst[[4]][1]} is |
| its first entry. |
| |
| If @code{Lst} is a list, then the function @code{length(Lst)} gives the |
| number of (top level) components it has. |
| |
| Components of lists may also be @emph{named}, and in this case the |
| component may be referred to either by giving the component name as a |
| character string in place of the number in double square brackets, or, |
| more conveniently, by giving an expression of the form |
| |
| @example |
| > @var{name}$@var{component_name} |
| @end example |
| |
| @noindent |
| for the same thing. |
| |
| This is a very useful convention as it makes it easier to get the right |
| component if you forget the number. |
| |
| So in the simple example given above: |
| |
| @code{Lst$name} is the same as @code{Lst[[1]]} and is the string |
| @code{"Fred"}, |
| |
| @code{Lst$wife} is the same as @code{Lst[[2]]} and is the string |
| @code{"Mary"}, |
| |
| @code{Lst$child.ages[1]} is the same as @code{Lst[[4]][1]} and is the |
| number @code{4}. |
| |
| Additionally, one can also use the names of the list components in |
| double square brackets, i.e., @code{Lst[["name"]]} is the same as |
| @code{Lst$name}. This is especially useful when the name of the |
| component to be extracted is stored in another variable as in |
| |
| @example |
| > x <- "name"; Lst[[x]] |
| @end example |
| |
| It is very important to distinguish @code{Lst[[1]]} from @code{Lst[1]}. |
| @samp{@code{[[@var{@dots{}}]]}} is the operator used to select a single |
| element, whereas @samp{@code{[@var{@dots{}}]}} is a general subscripting |
| operator. Thus the former is the @emph{first object in the list} |
| @code{Lst}, and if it is a named list the name is @emph{not} included. |
| The latter is a @emph{sublist of the list @code{Lst} consisting of the |
| first entry only. If it is a named list, the names are transferred to |
| the sublist.} |
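| |
| The distinction is easy to see by asking for the class of each result, |
| using the list @code{Lst} defined above: |
| |
| @example |
| Lst <- list(name="Fred", wife="Mary", no.children=3, |
|             child.ages=c(4,7,9)) |
| class(Lst[[1]])      # "character": the component itself |
| class(Lst[1])        # "list": a sublist of length one |
| names(Lst[1])        # "name": the name is kept in the sublist |
| Lst$child.ages[1]    # 4 |
| x <- "name" |
| Lst[[x]]             # "Fred" |
| @end example |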
| |
| The names of components may be abbreviated down to the minimum number of |
| letters needed to identify them uniquely. Thus @code{Lst$coefficients} |
| may be minimally specified as @code{Lst$coe} and @code{Lst$covariance} |
| as @code{Lst$cov}. |
| |
| The vector of names is in fact simply an attribute of the list like any |
| other and may be handled as such. Other structures besides lists may, |
| of course, similarly be given a @emph{names} attribute also. |
| |
| @node Constructing and modifying lists, Data frames, Lists, Lists and data frames |
| @section Constructing and modifying lists |
| |
| New lists may be formed from existing objects by the function |
| @code{list()}. An assignment of the form |
| |
| @example |
| > Lst <- list(@var{name_1}=@var{object_1}, @var{@dots{}}, @var{name_m}=@var{object_m}) |
| @end example |
| |
| @noindent |
| sets up a list @code{Lst} of @math{m} components using @var{object_1}, |
| @dots{}, @var{object_m} for the components and giving them names as |
| specified by the argument names, (which can be freely chosen). If these |
| names are omitted, the components are numbered only. The components |
| used to form the list are @emph{copied} when forming the new list and |
| the originals are not affected. |
| |
| Lists, like any subscripted object, can be extended by specifying |
| additional components. For example |
| |
| @example |
| > Lst[5] <- list(matrix=Mat) |
| @end example |
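| |
| A related sketch: assigning to a new component by name, here with a |
| hypothetical matrix @code{Mat}, extends the list and names the new |
| component in one step. |
| |
| @example |
| Lst <- list(name="Fred", wife="Mary", no.children=3, |
|             child.ages=c(4,7,9)) |
| Mat <- matrix(1:4, nrow = 2) |
| Lst[["matrix"]] <- Mat |
| length(Lst)          # 5 |
| names(Lst)[5]        # "matrix" |
| @end example |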
| |
| @menu |
| * Concatenating lists:: |
| @end menu |
| |
| @node Concatenating lists, , Constructing and modifying lists, Constructing and modifying lists |
| @subsection Concatenating lists |
| @cindex Concatenating lists |
| |
| @findex c |
| When the concatenation function @code{c()} is given list arguments, the |
| result is an object of mode list also, whose components are those of the |
| argument lists joined together in sequence. |
| |
| @example |
| > list.ABC <- c(list.A, list.B, list.C) |
| @end example |
| |
| Recall that with vector objects as arguments the concatenation function |
| similarly joined together all arguments into a single vector structure. |
| In this case all other attributes, such as @code{dim} attributes, are |
| discarded. |
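| |
| A small sketch with three made-up lists, followed by the vector case, |
| where a matrix passed to @code{c()} loses its @code{dim} attribute: |
| |
| @example |
| list.A <- list(a = 1) |
| list.B <- list(b = "x", c = TRUE) |
| list.C <- list(d = 1:3) |
| list.ABC <- c(list.A, list.B, list.C) |
| names(list.ABC)          # "a" "b" "c" "d" |
| m <- matrix(1:6, nrow = 2) |
| flat <- c(m)             # a plain vector: the dim attribute is dropped |
| dim(flat)                # NULL |
| @end example |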
| |
| |
| @node Data frames, , Constructing and modifying lists, Lists and data frames |
| @section Data frames |
| @cindex Data frames |
| |
| A @emph{data frame} is a list with class @code{"data.frame"}. There are |
| restrictions on lists that may be made into data frames, namely |
| |
| @itemize @bullet |
| @item |
| The components must be vectors (numeric, character, or logical), |
| factors, numeric matrices, lists, or other data frames. |
| @item |
| Matrices, lists, and data frames provide as many variables to the new |
| data frame as they have columns, elements, or variables, respectively. |
| @item |
| Numeric vectors, logicals and factors are included as is. Character |
| vectors are also included as is in @R{} 4.0.0 and later; in earlier |
| versions they were by default@footnote{Conversion of character columns |
| to factors is controlled by the @code{stringsAsFactors} argument to the |
| @code{data.frame()} function.} coerced to be factors, whose levels are |
| the unique values appearing in the vector. |
| @item |
| Vector structures appearing as variables of the data frame must all have |
| the @emph{same length}, and matrix structures must all have the same |
| @emph{row size}. |
| @end itemize |
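| |
| A small sketch of these rules (the column names are arbitrary): the |
| two-column matrix @code{m} contributes two variables, which |
| @code{data.frame()} names by prefixing the argument name, and every |
| component supplies the same number of rows. |
| |
| @example |
| m <- matrix(1:4, nrow = 2) |
| dat <- data.frame(id = c(10, 20), flag = c(TRUE, FALSE), m = m) |
| names(dat)    # "id" "flag" "m.1" "m.2" |
| dim(dat)      # 2 4 |
| @end example |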
| |
| A data frame may for many purposes be regarded as a matrix with columns |
| possibly of differing modes and attributes. It may be displayed in |
| matrix form, and its rows and columns extracted using matrix indexing |
| conventions. |
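| |
| For example, with a made-up data frame, rows and columns are selected |
| just as for a matrix: |
| |
| @example |
| dat <- data.frame(x = c(1.2, 3.4, 5.6), grp = c("a", "b", "a")) |
| dim(dat)                    # 3 rows, 2 columns |
| dat[2, ]                    # second row, as a one-row data frame |
| dat[, "x"]                  # column x, as a numeric vector |
| dat[dat$grp == "a", "x"]    # x for the rows where grp is "a" |
| @end example |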
| |
| @menu |
| * Making data frames:: |
| * attach() and detach():: |
| * Working with data frames:: |
| * Attaching arbitrary lists:: |
| * Managing the search path:: |
| @end menu |
| |
| @node Making data frames, attach() and detach(), Data frames, Data frames |
| @subsection Making data frames |
| |
| Objects satisfying the restrictions placed on the columns (components) |
| of a data frame may be used to form one using the function |
| @code{data.frame}: |
| @findex data.frame |
| |
| @example |
| > accountants <- data.frame(home=statef, loot=incomes, shot=incomef) |
| @end example |
| |
| A list whose components conform to the restrictions of a data frame may |
| be @emph{coerced} into a data frame using the function |
| @code{as.data.frame()}. |
| @findex as.data.frame |
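| |
| For instance, a hypothetical list with two components of equal length |
| can be coerced as follows: |
| |
| @example |
| lst <- list(home = c("nsw", "vic", "qld"), loot = c(61, 49, 52)) |
| acc <- as.data.frame(lst) |
| class(acc)    # "data.frame" |
| names(acc)    # "home" "loot" |
| @end example |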
| |
| The simplest way to construct a data frame from scratch is to use the |
| @code{read.table()} function to read an entire data frame from an |
| external file. This is discussed further in @ref{Reading data from |
| files}. |
| |
| @node attach() and detach(), Working with data frames, Making data frames, Data frames |
| @subsection @code{attach()} and @code{detach()} |
| @findex attach |
| @findex detach |
| |
| The @code{$} notation, such as @code{accountants$home}, for list |
| components is not always very convenient. A useful facility would be |
| somehow to make the components of a list or data frame temporarily |
| visible as variables under their component name, without the need to |
| quote the list name explicitly each time. |
| |
| The @code{attach()} function takes a `database' such as a list or data |
| frame as its argument. Thus suppose @code{lentils} is a |
| data frame with three variables @code{lentils$u}, @code{lentils$v}, |
| @code{lentils$w}. The command |
| |
| @example |
| > attach(lentils) |
| @end example |
| |
| @noindent |
| places the data frame in the search path at @w{position 2}, and provided |
| there are no variables @code{u}, @code{v} or @code{w} in @w{position 1}, |
| @code{u}, @code{v} and @code{w} are available as variables from the data |
| frame in their own right. At this point an assignment such as |
| |
| @example |
| > u <- v+w |
| @end example |
| |
| @noindent |
| does not replace the component @code{u} of the data frame, but rather |
| masks it with another variable @code{u} in the working directory at |
| @w{position 1} on the search path. To make a permanent change to the |
| data frame itself, the simplest way is to resort once again to the |
| @code{$} notation: |
| |
| @example |
| > lentils$u <- v+w |
| @end example |
| |
| However, the new value of component @code{u} is not visible until the |
| data frame is detached and attached again. |
| |
| To detach a data frame, use the function |
| |
| @example |
| > detach() |
| @end example |
| |
| More precisely, this statement detaches from the search path the entity |
| currently at @w{position 2}. Thus in the present context the variables |
| @code{u}, @code{v} and @code{w} would be no longer visible, except under |
| the list notation as @code{lentils$u} and so on. Entities at positions |
| greater than 2 on the search path can be detached by giving their number |
| to @code{detach}, but it is much safer to always use a name, for example |
| by @code{detach(lentils)} or @code{detach("lentils")}. |
| |
| @quotation Note |
| In @R{} lists and data frames can only be attached at position 2 or |
| above, and what is attached is a @emph{copy} of the original object. |
| You can alter the attached values @emph{via} @code{assign}, but the |
| original list or data frame is unchanged. |
| @end quotation |
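| |
| The sequence described above can be tried on a small, made-up version |
| of @code{lentils} (the values are hypothetical). The plain assignment |
| to @code{u} creates a masking variable in the workspace, whereas the |
| @code{$} assignment changes the data frame itself: |
| |
| @example |
| lentils <- data.frame(u = 1:3, v = c(2, 4, 6), w = c(1, 1, 1)) |
| attach(lentils) |
| u <- v + w            # new u in the workspace; lentils$u is untouched |
| lentils$u             # still 1 2 3 |
| lentils$u <- v + w    # now the data frame itself is changed |
| detach(lentils)       # detach by name |
| rm(u)                 # remove the masking variable |
| @end example |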
| |
| @node Working with data frames, Attaching arbitrary lists, attach() and detach(), Data frames |
| @subsection Working with data frames |
| |
| A useful convention that allows you to work with many different problems |
| comfortably together in the same working directory is |
| |
| @itemize @bullet |
| @item |
| gather together all variables for any well defined and separate problem |
| in a data frame under a suitably informative name; |
| @item |
| when working with a problem attach the appropriate data frame at |
| @w{position 2}, and use the working directory at @w{position 1} for |
| operational quantities and temporary variables; |
| @item |
| before leaving a problem, add any variables you wish to keep for future |
| reference to the data frame using the @code{$} form of assignment, and |
| then @code{detach()}; |
| @item |
| finally remove all unwanted variables from the working directory and |
| keep it as clean of left-over temporary variables as possible. |
| @end itemize |
| |
| In this way it is quite simple to work with many problems in the same |
| directory, all of which have variables named @code{x}, @code{y} and |
| @code{z}, for example. |
| |
| @node Attaching arbitrary lists, Managing the search path, Working with data frames, Data frames |
| @subsection Attaching arbitrary lists |
| |
| @code{attach()} is a generic function that allows not only directories |
| and data frames to be attached to the search path, but other classes of |
| object as well. In particular any object of mode @code{"list"} may be |
| attached in the same way: |
| |
| @example |
| > attach(any.old.list) |
| @end example |
| |
| Anything that has been attached can be detached by @code{detach}, by |
| position number or, preferably, by name. |
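| |
| A minimal sketch, with a hypothetical list: |
| |
| @example |
| any.old.list <- list(alpha = 1:3, note = "text") |
| attach(any.old.list) |
| total <- sum(alpha)        # alpha is found via the search path |
| detach("any.old.list")     # detach by name |
| @end example |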
| |
| @node Managing the search path, , Attaching arbitrary lists, Data frames |
| @subsection Managing the search path |
| @findex search |
| @cindex Search path |
| |
| The function @code{search} shows the current search path and so is |
| a very useful way to keep track of which data frames and lists (and |
| packages) have been attached and detached. In a minimal session (one |
| started with no default packages), it initially gives |
| |
| @example |
| > search() |
| [1] ".GlobalEnv" "Autoloads" "package:base" |
| @end example |
| @noindent |
| where @code{.GlobalEnv} is the workspace.@footnote{See the on-line help |
| for @code{autoload} for the meaning of the second term.} |
| |
| After @code{lentils} is attached we have |
| |
| @example |
| > search() |
| [1] ".GlobalEnv" "lentils" "Autoloads" "package:base" |
| > ls(2) |
| [1] "u" "v" "w" |
| @end example |
| |
| @noindent |
| and, as we see, @code{ls} (or @code{objects}) can be used to examine the |
| contents of any position on the search path. |
| |
| Finally, we detach the data frame and confirm it has been removed from |
| the search path. |
| |
| @example |
| > detach("lentils") |
| > search() |
| [1] ".GlobalEnv" "Autoloads" "package:base" |
| @end example |
| |
| @node Reading data from files, Probability distributions, Lists and data frames, Top |
| @chapter Reading data from files |
| @cindex Reading data from files |
| |
| Large data objects will usually be read as values from external files |
| rather than entered during an @R{} session at the keyboard. @R{} input |
| facilities are simple and their requirements are fairly strict and even |
| rather inflexible. There is a clear presumption by the designers of |
| @R{} that you will be able to modify your input files using other tools, |
| such as file editors or Perl@footnote{Under UNIX, the utilities |
| @command{sed} or @command{awk} can be used.} to fit in with the |
| requirements of @R{}. Generally this is very simple. |
| |
| If variables are to be held mainly in data frames, as we strongly |
| suggest they should be, an entire data frame can be read directly with |
| the @code{read.table()} function. There is also a more primitive input |
| function, @code{scan()}, that can be called directly. |
| |
| For more details on importing data into @R{} and also exporting data, |
| see the @emph{R Data Import/Export} manual. |
| |
| @menu |
| * The read.table() function:: |
| * The scan() function:: |
| * Accessing builtin datasets:: |
| * Editing data:: |
| @end menu |
| |
| @node The read.table() function, The scan() function, Reading data from files, Reading data from files |
| @section The @code{read.table()} function |
| @findex read.table |
| |
| To read an entire data frame directly, the external file will normally |
| have a special form. |
| |
| @itemize @bullet |
| @item |
| The first line of the file should have a @emph{name} for each variable |
| in the data frame. |
| |
| @item |
| Each additional line of the file has as its first item a @emph{row label} |
| and the values for each variable. |
| @end itemize |
| |
| If the file has one fewer item in its first line than in its second, this |
| arrangement is presumed to be in force. So the first few lines of a file |
| to be read as a data frame might look as follows. |
| |
| @quotation |
| @cartouche |
| @example |
| @r{Input file form with names and row labels:} |
| |
| Price Floor Area Rooms Age Cent.heat |
| 01 52.00 111.0 830 5 6.2 no |
| 02 54.75 128.0 710 5 7.5 no |
| 03 57.50 101.0 1000 5 4.2 no |
| 04 57.50 131.0 690 6 8.8 no |
| 05 59.75 93.0 900 5 1.9 yes |
| ... |
| @end example |
| @end cartouche |
| @end quotation |
| |
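| By default, numeric items (except row labels) are read as numeric |
| variables. Non-numeric variables, such as @code{Cent.heat} in the |
| example, are read as character vectors in @R{} 4.0.0 and later and as |
| factors in earlier versions; the @code{stringsAsFactors} argument of |
| @code{read.table()} changes this if necessary. |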
| |
| The function @code{read.table()} can then be used to read the data frame |
| directly |
| |
| @example |
| > HousePrice <- read.table("houses.data") |
| @end example |
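| |
| To try this without an existing @file{houses.data}, the sketch below |
| writes three of the lines shown above to a temporary file created with |
| @code{tempfile()}. Passing @code{stringsAsFactors = TRUE} asks for |
| factors explicitly, whatever the default of the @R{} version in use: |
| |
| @example |
| f <- tempfile(fileext = ".data") |
| writeLines(c("Price Floor Area Rooms Age Cent.heat", |
|   "01 52.00 111.0 830 5 6.2 no", |
|   "02 54.75 128.0 710 5 7.5 no", |
|   "05 59.75 93.0 900 5 1.9 yes"), f) |
| HousePrice <- read.table(f, stringsAsFactors = TRUE) |
| dim(HousePrice)                # 3 6 |
| rownames(HousePrice)           # "01" "02" "05", from the first column |
| levels(HousePrice$Cent.heat)   # "no" "yes" |
| @end example |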
| |
| Often you will want to omit the row labels and use the default labels |
| instead. In this case the file may omit the row-label column, as in |
| the following. |
| |
| @quotation |
| @cartouche |
| @example |
| @r{Input file form without row labels:} |
| |
| Price Floor Area Rooms Age Cent.heat |
| 52.00 111.0 830 5 6.2 no |
| 54.75 128.0 710 5 7.5 no |
| 57.50 101.0 1000 5 4.2 no |
| 57.50 131.0 690 6 8.8 no |
| 59.75 93.0 900 5 1.9 yes |
| ... |
| @end example |
| @end cartouche |
| @end quotation |
| |
| The data frame may then be read as |
| |
| @example |
| > HousePrice <- read.table("houses.data", header=TRUE) |
| @end example |
| |
| @noindent |
| where the @code{header=TRUE} option specifies that the first line is a |
| line of headings, and hence, by implication from the form of the file, |
| that no explicit row labels are given. |
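| |
| Again as a sketch using a temporary file in place of |
| @file{houses.data}, the form without row labels gives the default row |
| names @code{"1"}, @code{"2"}, @dots{}: |
| |
| @example |
| f <- tempfile(fileext = ".data") |
| writeLines(c("Price Floor Area Rooms Age Cent.heat", |
|   "52.00 111.0 830 5 6.2 no", |
|   "54.75 128.0 710 5 7.5 no", |
|   "59.75 93.0 900 5 1.9 yes"), f) |
| HousePrice <- read.table(f, header = TRUE) |
| rownames(HousePrice)    # "1" "2" "3" |
| HousePrice$Price        # 52.00 54.75 59.75 |
| @end example |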
| |
| |
| @node The scan() function, Accessing builtin datasets, The read.table() function, Reading data from files |
| @section The @code{scan()} function |
| @findex scan |
| |
| Suppose the data vectors are of equal length and are to be read in |
| parallel. Further suppose that there are three vectors, the first of |
| mode character and the remaining two of mode numeric, and the file is |
| @file{input.dat}. The first step is to use @code{scan()} to read in the |
| three vectors as a list, as follows |
| |
| @example |
| > inp <- scan("input.dat", list("",0,0)) |
| @end example |
| |
| The second argument is a dummy list structure that establishes the mode |
| of the three vectors to be read. The result, held in @code{inp}, is a |
| list whose components are the three vectors read in. To separate the |
| data items into three separate vectors, use assignments like |
| |
| @example |
| > label <- inp[[1]]; x <- inp[[2]]; y <- inp[[3]] |
| @end example |
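| |
| A self-contained sketch of the same steps; the file contents are made |
| up and written to a temporary file, and @code{quiet = TRUE} only |
| suppresses the message reporting how many records were read: |
| |
| @example |
| f <- tempfile(fileext = ".dat") |
| writeLines(c("a 1.5 2", "b 3.0 4", "c 4.5 6"), f) |
| inp <- scan(f, list("", 0, 0), quiet = TRUE) |
| label <- inp[[1]]; x <- inp[[2]]; y <- inp[[3]] |
| label    # "a" "b" "c" |
| x        # 1.5 3.0 4.5 |
| @end example |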
| |
| More conveniently, the dummy list can have named components, in which |
| case the names can be used to access the vectors read in. For example |
| |
| @example |
| > inp <- scan("input.dat", list(id="", x=0, y=0)) |
| @end example |
| |
| If you wish to access the variables separately they may either be |
| re-assigned to variables in the working frame: |
| |
| @example |
| > label <- inp$id; x <- inp$x; y <- inp$y |
| @end example |
| |
| @noindent |
| or the list may be attached at @w{position 2} of the search path |
| (@pxref{Attaching arbitrary lists}). |
| |
| If the second argument is a single value and not a list, a single vector |
| is read in, all components of which must be of the same mode as the |
| dummy value. |
| |
| @example |
| > X <- matrix(scan("light.dat", 0), ncol=5, byrow=TRUE) |
| @end example |
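| |
| For instance, with a made-up file of two lines of five numbers each, |
| @code{byrow = TRUE} keeps each line of the file as a row of the matrix: |
| |
| @example |
| f <- tempfile(fileext = ".dat") |
| writeLines(c("1 2 3 4 5", "6 7 8 9 10"), f) |
| X <- matrix(scan(f, 0, quiet = TRUE), ncol = 5, byrow = TRUE) |
| dim(X)      # 2 5 |
| X[2, 1]     # 6, the first value on the second line |
| @end example |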
| |
| There are more elaborate input facilities available, and these are |
| detailed in the @emph{R Data Import/Export} manual. |
| |
| @node Accessing builtin datasets, Editing data, The scan() function, Reading data from files |
| @section Accessing builtin datasets |
| @cindex Accessing builtin datasets |
| @findex data |
| |
| Around 100 datasets are supplied with @R{} (in package @pkg{datasets}), |
| and others are available in packages (including the recommended packages |
| supplied with @R{}). To see the list of datasets currently available |
| use |
| |
| @example |
| data() |
| @end example |
| |
| @noindent |
| All the datasets supplied with @R{} are available directly by name. |
| However, many packages still use the obsolete convention in which |
| @code{data} was also used to load datasets into @R{}, for example |
| |
| @example |
| data(infert) |
| @end example |
| |
| @noindent |
| and this can still be used with the standard packages (as in this |
| example). In most cases this will load an @R{} object of the same name. |
| However, in a few cases it loads several objects, so check the on-line |
| help for the object to see what to expect. |
| |
| @subsection Loading data from other R packages |
| |
| To access data from a particular package, use the @code{package} |
| argument, for example |
| |
| @example |
| data(package="rpart") |
| data(Puromycin, package="datasets") |
| @end example |
| |
| If a package has been attached by @code{library}, its datasets are |
| automatically included in the search. |
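| |
| The result of @code{data()} can also be used in code. The sketch below |
| assumes, as in current versions of @R{}, that the returned object has a |
| @code{results} component: a character matrix with an @code{Item} column |
| naming the datasets. |
| |
| @example |
| ds <- data(package = "datasets")$results |
| nrow(ds)                         # number of datasets listed |
| "Puromycin" %in% ds[, "Item"]    # TRUE |
| data(infert)                     # old-style loading still works here |
| dim(infert) |
| @end example |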
| |
| User-contributed packages can be a rich source of datasets. |
| |
| @node Editing data, , Accessing builtin datasets, Reading data from files |
| @section Editing data |
| |
| @findex edit |
| When invoked on a data frame or matrix, @code{edit} brings up a separate |
| spreadsheet-like environment for editing. This is useful for making |
| small changes once a data set has been read. The command |
| |
| @example |
| > xnew <- edit(xold) |
| @end example |
| |
| @noindent |
| will allow you to edit your data set @code{xold}, and on completion the |
| changed object is assigned to @code{xnew}. If you want to alter the |
| original dataset @code{xold}, the simplest way is to use |
| @code{fix(xold)}, which is equivalent to @code{xold <- edit(xold)}. |
| |
| Use |
| |
| @example |
| > xnew <- edit(data.frame()) |
| @end example |
| |
| @noindent |
| to enter new data via the spreadsheet interface. |
| |
| |
| @node Probability distributions, Loops and conditional execution, Reading data from files, Top |
| @chapter Probability distributions |
| @cindex Probability distributions |
| |
| @menu |
| * R as a set of statistical tables:: |
| * Examining the distribution of a set of data:: |
| * One- and two-sample tests:: |
| @end menu |
| |
| @node R as a set of statistical tables, Examining the distribution of a set of data, Probability distributions, Probability distributions |
| @section R as a set of statistical tables |
| |
| One convenient use of @R{} is to provide a comprehensive set of |
| statistical tables. Functions are provided to evaluate the cumulative |
| distribution function @eqn{P(X \le x), P(X <= x)}, |
| the probability density function and the quantile function (given |
| @math{q}, the smallest @math{x} such that @eqn{P(X \le x) \ge q, P(X <= x) >= q}), |
| and to simulate from the distribution. |
| |
| @quotation |
| @multitable{Distribution namessss}{names, names}{arguments, arguments} |
| @headitem Distribution @tab @R{} name @tab additional arguments |
| @item beta @tab @code{beta} @tab @code{shape1, shape2, ncp} |
| @item binomial @tab @code{binom} @tab @code{size, prob} |
| @item Cauchy @tab @code{cauchy} @tab @code{location, scale} |
| @item chi-squared @tab @code{chisq} @tab @code{df, ncp} |
| @item exponential @tab @code{exp} @tab @code{rate} |
| @item F @tab @code{f} @tab @code{df1, df2, ncp} |
| @item gamma @tab @code{gamma} @tab @code{shape, scale} |
| @item geometric @tab @code{geom} @tab @code{prob} |
| @item hypergeometric @tab @code{hyper} @tab @code{m, n, k} |
| @item log-normal @tab @code{lnorm} @tab @code{meanlog, sdlog} |
| @item logistic @tab @code{logis} @tab @code{location, scale} |
| @item negative binomial @tab @code{nbinom} @tab @code{size, prob} |
| @item normal @tab @code{norm} @tab @code{mean, sd} |
| @item Poisson @tab @code{pois} @tab @code{lambda} |
| @item signed rank @tab @code{signrank} @tab @code{n} |
| @item Student's t @tab @code{t} @tab @code{df, ncp} |
| @item uniform @tab @code{unif} @tab @code{min, max} |
| @item Weibull @tab @code{weibull} @tab @code{shape, scale} |
| @item Wilcoxon @tab @code{wilcox} @tab @code{m, n} |
| @end multitable |
| @end quotation |
| |
| @noindent |
| Prefix the name given here by @samp{d} for the density, @samp{p} for the |
| CDF, @samp{q} for the quantile function and @samp{r} for simulation |
| (@emph{r}andom deviates). The first argument is @code{x} for |
| @code{d@var{xxx}}, @code{q} for @code{p@var{xxx}}, @code{p} for |
| @code{q@var{xxx}} and @code{n} for @code{r@var{xxx}} (except for |
| @code{rhyper}, @code{rsignrank} and @code{rwilcox}, for which it is |
| @code{nn}). The non-centrality parameter @code{ncp} is not currently |
| available in all cases; see the on-line help for details. |
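| |
| A short sketch of the four prefixes for the normal distribution (the |
| argument values are arbitrary); it also checks that @code{qnorm} |
| inverts @code{pnorm}: |
| |
| @example |
| x <- 1.5 |
| dnorm(x)                          # density at x |
| p <- pnorm(x)                     # P(X <= x) |
| qnorm(p)                          # back to 1.5, up to rounding |
| pnorm(0)                          # 0.5 by symmetry |
| rnorm(5, mean = 10, sd = 2)       # five random deviates |
| dbinom(0, size = 1, prob = 0.5)   # 0.5 |
| @end example |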
| |
| The @code{p@var{xxx}} and @code{q@var{xxx}} functions all have logical |
| arguments @code{lower.tail} and @code{log.p} and the @code{d@var{xxx}} |
| ones have @code{log}. This allows, e.g., getting the cumulative (or |
| ``integrated'') @emph{hazard} function, @eqn{H(t) = - \log(1 - F(t)), |
| H(t) = - log(1 - F(t))}, by |
| |
| @example |
| - p@var{xxx}(t, ..., lower.tail = FALSE, log.p = TRUE) |
| @end example |
| |
| @noindent |
| or more accurate log-likelihoods (by @code{d@var{xxx}(..., log = |
| TRUE)}), directly. |
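| |
| As a check, the exponential distribution with rate @code{rate} has |
| cumulative hazard @code{rate * t}. The sketch below (arbitrary values) |
| confirms this, and also shows a log-density that stays finite where the |
| density itself underflows to zero: |
| |
| @example |
| rate <- 2; t0 <- 1.5 |
| H <- -pexp(t0, rate = rate, lower.tail = FALSE, log.p = TRUE) |
| H                        # 3, i.e. rate * t0 |
| dnorm(40, log = TRUE)    # about -800.92, computed on the log scale |
| log(dnorm(40))           # -Inf, because dnorm(40) underflows to 0 |
| @end example |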
| |
| In addition there are functions @code{ptukey} and @code{qtukey} for the |
| distribution of the studentized range of samples from a normal |
| distribution, and @code{dmultinom} and @code{rmultinom} for the |
| multinomial distribution. Further distributions are available in |
| contributed packages, notably @CRANpkg{SuppDists}. |
| |
| Here are some examples: |
| |
| @example |
| > ## @r{2-tailed p-value for t distribution} |
| > 2*pt(-2.43, df = 13) |
| > ## @r{upper 1% point for an F(2, 7) distribution} |
| > qf(0.01, 2, 7, lower.tail = FALSE) |
| @end example |
| |
| See the on-line help on @code{RNG} for how random-number generation is |
| done in @R{}. |
| |
| @node Examining the distribution of a set of data, One- and two-sample tests, R as a set of statistical tables, Probability distributions |
| @section Examining the distribution of a set of data |
| |
| Given a (univariate) set of data we can examine its distribution in a |
| large number of ways. The simplest is to examine the numbers. Two |
| slightly different summaries are given by @code{summary} and |
| @code{fivenum} |
| @findex summary |
| @findex fivenum |
| and a display of the numbers by @code{stem} (a ``stem and leaf'' plot). |
| @findex stem |
| |
| @example |
| > attach(faithful) |
| > summary(eruptions) |
| Min. 1st Qu. Median Mean 3rd Qu. Max. |
| 1.600 2.163 4.000 3.488 4.454 5.100 |
| > fivenum(eruptions) |
| [1] 1.6000 2.1585 4.0000 4.4585 5.1000 |
| > stem(eruptions) |
| |
| The decimal point is 1 digit(s) to the left of the | |
| |
| 16 | 070355555588 |
| 18 | 000022233333335577777777888822335777888 |
| 20 | 00002223378800035778 |
| 22 | 0002335578023578 |
| 24 | 00228 |
| 26 | 23 |
| 28 | 080 |
| 30 | 7 |
| 32 | 2337 |
| 34 | 250077 |
| 36 | 0000823577 |
| 38 | 2333335582225577 |
| 40 | 0000003357788888002233555577778 |
| 42 | 03335555778800233333555577778 |
| 44 | 02222335557780000000023333357778888 |
| 46 | 0000233357700000023578 |
| 48 | 00000022335800333 |
| 50 | 0370 |
| @end example |
| |
| A stem-and-leaf plot is like a histogram, and @R{} has a function |
| @code{hist} to plot histograms. |
| @findex hist |
| |
| @example |
| > hist(eruptions) |
| ## @r{make the bins smaller, make a plot of density} |
| > hist(eruptions, seq(1.6, 5.2, 0.2), prob=TRUE) |
| > lines(density(eruptions, bw=0.1)) |
| > rug(eruptions) # @r{show the actual data points} |
| @end example |
| |
| @findex density |
| @cindex Density estimation |
| More elegant density plots can be made by @code{density}, and we added a |
| line produced by @code{density} in this example. The bandwidth |
| @code{bw} was chosen by trial-and-error as the default gives too much |
| smoothing (it usually does for ``interesting'' densities). (Better |
| automated methods of bandwidth choice are available, and in this example |
| @code{bw = "SJ"} gives a good result.) |
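| |
| The bandwidths can be inspected directly. This sketch assumes, as the |
| on-line help for @code{density} describes, that the default bandwidth |
| rule is @code{bw.nrd0} and that @code{bw = "SJ"} uses @code{bw.SJ}: |
| |
| @example |
| bw.nrd0(faithful$eruptions)    # the default bandwidth |
| bw.SJ(faithful$eruptions)      # the Sheather-Jones choice |
| d <- density(faithful$eruptions, bw = "SJ") |
| d$bw                           # the bandwidth actually used |
| @end example |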
| |
| @ifnotinfo |
| @image{images/hist,9cm} |
| @end ifnotinfo |
| |
| We can plot the empirical cumulative distribution function by using the |
| function @code{ecdf}. |
| @findex ecdf |
| @cindex Empirical CDFs |
| |
| @example |
| > plot(ecdf(eruptions), do.points=FALSE, verticals=TRUE) |
| @end example |
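| |
| Note that @code{ecdf} returns a function, which can be evaluated at any |
| point; a minimal sketch: |
| |
| @example |
| Fn <- ecdf(faithful$eruptions) |
| Fn(3)                          # proportion of eruptions of at most 3 minutes |
| Fn(max(faithful$eruptions))    # 1 |
| @end example |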
| |
| This distribution is obviously far from any standard distribution. |
| How about the right-hand mode, say eruptions of longer than 3 minutes? |
| Let us fit a normal distribution and overlay the fitted CDF. |
| |
| @example |
| > long <- eruptions[eruptions > 3] |
| > plot(ecdf(long), do.points=FALSE, verticals=TRUE) |
| > x <- seq(3, 5.4, 0.01) |
| > lines(x, pnorm(x, mean=mean(long), sd=sqrt(var(long))), lty=3) |
| @end example |
| |
| @ifnotinfo |
| @image{images/ecdf,9cm} |
| @end ifnotinfo |
| |
| Quantile-quantile (Q-Q) plots can help us examine this more carefully. |
| @cindex Quantile-quantile plots |
| @findex qqnorm |
| @findex qqline |
| |
| @example |
| par(pty="s") # arrange for a square figure region |
| qqnorm(long); qqline(long) |
| @end example |
| |
| @noindent |
| which shows a reasonable fit but a shorter right tail than one would |
| expect from a normal distribution. Let us compare this with some |
| simulated data from a @math{t} distribution |
| |
| @ifnotinfo |
| @image{images/QQ,7cm} |
| @end ifnotinfo |
| |
| @example |
| x <- rt(250, df = 5) |
| qqnorm(x); qqline(x) |
| @end example |
| |
| @noindent |
| which will usually (if it is a random sample) show longer tails than |
| expected for a normal. We can make a Q-Q plot against the generating |
| distribution by |
| |
| @example |
| qqplot(qt(ppoints(250), df = 5), x, xlab = "Q-Q plot for t dsn") |
| qqline(x) |
| @end example |
| |
| Finally, we might want a more formal test of agreement with normality |
| (or not). @R{} provides the Shapiro-Wilk test |
| @cindex Shapiro-Wilk test |
| @findex shapiro.test |
| |
| @example |
| > shapiro.test(long) |
| |
| Shapiro-Wilk normality test |
| |
| data: long |
| W = 0.9793, p-value = 0.01052 |
| @end example |
| |
| @noindent |
| and the Kolmogorov-Smirnov test |
| @cindex Kolmogorov-Smirnov test |
| @findex ks.test |
| |
| @example |
| > ks.test(long, "pnorm", mean = mean(long), sd = sqrt(var(long))) |
| |
| One-sample Kolmogorov-Smirnov test |
| |
| data: long |
| D = 0.0661, p-value = 0.4284 |
| alternative hypothesis: two.sided |
| @end example |
| |
| @noindent |
| (Note that the distribution theory is not valid here as we |
| have estimated the parameters of the normal distribution from the same |
| sample.) |
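| |
| The test functions return objects of class @code{"htest"}, which are |
| lists, so individual results can be extracted by name. A minimal |
| sketch, rebuilding @code{long} from the @code{faithful} data: |
| |
| @example |
| long <- faithful$eruptions[faithful$eruptions > 3] |
| sw <- shapiro.test(long) |
| names(sw)        # components such as "statistic" and "p.value" |
| sw$statistic     # W |
| sw$p.value |
| @end example |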
| |
| @node One- and two-sample tests, , Examining the distribution of a set of data, Probability distributions |
| @section One- and two-sample tests |
| @cindex One- and two-sample tests |
| |
| So far we have compared a single sample to a normal distribution. A |
| much more common operation is to compare aspects of two samples. Note |
| that in @R{}, all ``classical'' tests, including the ones used below, are |
| in package @pkg{stats}, which is normally loaded. |
| |
| Consider the following sets of data on the latent heat of fusion of |
| ice (@emph{cal/gm}) from Rice (1995, p.490): |
| |
| @example |
| Method A: 79.98 80.04 80.02 80.04 80.03 80.03 80.04 79.97 |
| 80.05 80.03 80.02 80.00 80.02 |
| Method B: 80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97 |
| @end example |
| |
| @noindent |
| Boxplots provide a simple graphical comparison of the two samples. |
| |
| @c NOTE scan() from stdin is not parse()able, hence not source()able |
| @c Hence ./R-intro.R uses c(..) |
| @example |
| A <- scan() |
| 79.98 80.04 80.02 80.04 80.03 80.03 80.04 79.97 |
| 80.05 80.03 80.02 80.00 80.02 |
| |
| B <- scan() |
| 80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97 |
| |
| boxplot(A, B) |
| @end example |
| @findex boxplot |
| @cindex Box plots |
| |
| @noindent |
| which indicates that the first group tends to give higher results than |
| the second. |
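| |
| Because the @code{scan()} form above reads its data from the console, |
| it cannot be run with @code{source()}. In a script the same data can be |
| entered with @code{c()}: |
| |
| @example |
| A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, |
|   80.05, 80.03, 80.02, 80.00, 80.02) |
| B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97) |
| boxplot(A, B) |
| @end example |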
| |
| @ifnotinfo |
| @image{images/ice,7cm} |
| @end ifnotinfo |
| |
| To test for the equality of the means of the two samples, we can use |
| an @emph{unpaired} @math{t}-test by |
| @cindex Student's @math{t} test |
| @findex t.test |
| |
| @example |
| > t.test(A, B) |
| |
| Welch Two Sample t-test |
| |
| data: A and B |
| t = 3.2499, df = 12.027, p-value = 0.00694 |
| alternative hypothesis: true difference in means is not equal to 0 |
| 95 percent confidence interval: |
| 0.01385526 0.07018320 |
| sample estimates: |
| mean of x mean of y |
| 80.02077 79.97875 |
| @end example |
| |
| @noindent |
| which does indicate a significant difference, assuming normality. By |
| default the @R{} function does not assume equality of variances in the |
| two samples (in contrast to the similar @SPLUS{} @code{t.test} |
| function). We can use the F test to test for equality in the variances, |
| provided that the two samples are from normal populations. |
| |
| @example |
| > var.test(A, B) |
| |
| F test to compare two variances |
| |
| data: A and B |
| F = 0.5837, num df = 12, denom df = 7, p-value = 0.3938 |
| alternative hypothesis: true ratio of variances is not equal to 1 |
| 95 percent confidence interval: |
| 0.1251097 2.1052687 |
| sample estimates: |
| ratio of variances |
| 0.5837405 |
| @end example |
| @findex var.test |
| |
| @noindent |
| which shows no evidence of a significant difference, and so we can use |
| the classical @math{t}-test that assumes equality of the variances. |
| |
| @example |
| > t.test(A, B, var.equal=TRUE) |
| |
| Two Sample t-test |
| |
| data: A and B |
| t = 3.4722, df = 19, p-value = 0.002551 |
| alternative hypothesis: true difference in means is not equal to 0 |
| 95 percent confidence interval: |
| 0.01669058 0.06734788 |
| sample estimates: |
| mean of x mean of y |
| 80.02077 79.97875 |
| @end example |
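| |
| The printed summaries are built from components of the returned |
| @code{"htest"} objects, which can be extracted directly; a sketch, with |
| the data re-entered using @code{c()}: |
| |
| @example |
| A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, |
|   80.05, 80.03, 80.02, 80.00, 80.02) |
| B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97) |
| welch <- t.test(A, B) |
| classical <- t.test(A, B, var.equal = TRUE) |
| welch$p.value           # about 0.00694 |
| classical$parameter     # df = 19 |
| classical$conf.int      # the 95 percent confidence interval |
| @end example |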
| |
| All these tests assume normality of the two samples. The two-sample |
| Wilcoxon (or Mann-Whitney) test only assumes a common continuous |
| distribution under the null hypothesis. |
| |
| @cindex Wilcoxon test |
| @findex wilcox.test |
| @example |
| > wilcox.test(A, B) |
| |
| Wilcoxon rank sum test with continuity correction |
| |
| data: A and B |
| W = 89, p-value = 0.007497 |
| alternative hypothesis: true location shift is not equal to 0 |
| |
| Warning message: |
| Cannot compute exact p-value with ties in: wilcox.test(A, B) |
| @end example |
| |
| @noindent |
| Note the warning: there are several ties in each sample, which suggests |
| strongly that these data are from a discrete distribution (probably due |
| to rounding). |
| |
| There are several ways to compare graphically the two samples. We have |
| already seen a pair of boxplots. The following |
| |
| @example |
| > plot(ecdf(A), do.points=FALSE, verticals=TRUE, xlim=range(A, B)) |
| > plot(ecdf(B), do.points=FALSE, verticals=TRUE, add=TRUE) |
| @end example |
| |
| @noindent |
| will show the two empirical CDFs, and @code{qqplot} will perform a Q-Q |
| plot of the two samples. The Kolmogorov-Smirnov test is based on the |
| maximal vertical distance between the two empirical CDFs, assuming a |
| common continuous distribution: |
| |
| @example |
| > ks.test(A, B) |
| |
| Two-sample Kolmogorov-Smirnov test |
| |
| data: A and B |
| D = 0.5962, p-value = 0.05919 |
| alternative hypothesis: two-sided |
| |
| Warning message: |
| cannot compute correct p-values with ties in: ks.test(A, B) |
| @end example |
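| |
| As a sketch, the statistic @code{D} can be reproduced as the largest |
| vertical gap between the two empirical CDFs, evaluated at the pooled |
| data values; @code{ks.test} is still needed for the p-value: |
| |
| @example |
| A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, |
|   80.05, 80.03, 80.02, 80.00, 80.02) |
| B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97) |
| FA <- ecdf(A); FB <- ecdf(B) |
| z <- sort(unique(c(A, B)))      # jump points of either ECDF |
| D <- max(abs(FA(z) - FB(z))) |
| D                               # 0.5961538, printed above as 0.5962 |
| @end example |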
| |
| @node Loops and conditional execution, Writing your own functions, Probability distributions, Top |
| @chapter Grouping, loops and conditional execution |
| @cindex Loops and conditional execution |
| |
| @menu |
| * Grouped expressions:: |
| * Control statements:: |
| @end menu |
| |
| @node Grouped expressions, Control statements, Loops and conditional execution, Loops and conditional execution |
| @section Grouped expressions |
| @cindex Grouped expressions |
| |
| @R{} is an expression language in the sense that its only command type |
| is a function or expression which returns a result. Even an assignment |
| is an expression whose result is the value assigned, and it may be used |
| wherever any expression may be used; in particular multiple assignments |
| are possible. |
| |
| Commands may be grouped together in braces, @code{@{@var{expr_1}; |
| @var{@dots{}}; @var{expr_m}@}}, in which case the value of the group |
| is the result of the last expression in the group evaluated. Since such |
| a group is also an expression it may, for example, be itself included in |
| parentheses and used as part of an even larger expression, and so on. |
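| |
| A minimal sketch of these points (the variable names are arbitrary): |
| |
| @example |
| a <- b <- 10                     # multiple assignment: both become 10 |
| y <- (x <- 3) + 1                # the assignment x <- 3 has the value 3 |
| val <- @{ tmp <- 2; tmp * 3 @}     # the group takes the value of its last expression |
| @end example |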
| |
| @node Control statements, , Grouped expressions, Loops and conditional execution |
| @section Control statements |
| @cindex Control statements |
| |
| @menu |
| * Conditional execution:: |
| * Repetitive execution:: |
| @end menu |
| |
| @node Conditional execution, Repetitive execution, Control statements, Control statements |
| @subsection Conditional execution: @code{if} statements |
| @findex if |
| |
| The language has available a conditional construction of the form |
| |
| @example |
| > if (@var{expr_1}) @var{expr_2} else @var{expr_3} |
| @end example |
| @findex if |
| @findex else |
| |
| @noindent |
where @var{expr_1} must evaluate to a single logical value. The value of
the entire expression is then the value of @var{expr_2} if @var{expr_1}
is @code{TRUE}, and that of @var{expr_3} otherwise.
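
Because the construct is itself an expression, its value can be assigned
directly. A minimal sketch (variable names arbitrary), which also shows
that when the @code{else} part is omitted and the condition is
@code{FALSE}, the value is @code{NULL}:

@example
x <- -3
sgn <- if (x > 0) "positive" else if (x < 0) "negative" else "zero"
sgn                  # @r{"negative"}
r <- if (FALSE) 1    # @r{no else part, so the value is NULL}
is.null(r)           # @r{TRUE}
@end example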
| |
| @findex && |
| @findex || |
| The ``short-circuit'' operators @code{&&} and @code{||} are often used |
| as part of the condition in an @code{if} statement. Whereas @code{&} |
| and @code{|} apply element-wise to vectors, @code{&&} and @code{||} |
| apply to vectors of length one, and only evaluate their second argument |
| if necessary. |
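
The difference can be seen by supplying a second argument that would
signal an error if it were evaluated; @code{stop()} is used here purely
for illustration.

@example
r1 <- FALSE && stop("not evaluated")   # @r{FALSE: the second argument is skipped}
r2 <- TRUE || stop("not evaluated")    # @r{TRUE, for the same reason}
r3 <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)
r3                                     # @r{element-wise: TRUE FALSE FALSE}
@end example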
| |
| @findex ifelse |
| There is a vectorized version of the @code{if}/@code{else} construct, |
| the @code{ifelse} function. This has the form @code{ifelse(condition, a, |
| b)} and returns a vector of the same length as @code{condition}, with |
| elements @code{a[i]} if @code{condition[i]} is true, otherwise |
| @code{b[i]} (where @code{a} and @code{b} are recycled as necessary). |
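
For example, with an arbitrary numeric vector @code{x}, the second call
below relies on the scalar @code{0} being recycled to the length of the
condition:

@example
x <- c(5, -2, 0, 7)
ifelse(x > 0, "pos", "non-pos")   # @r{"pos" "non-pos" "non-pos" "pos"}
ifelse(x > 0, x, 0)               # @r{5 0 0 7}
@end example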
| |
| |
| @node Repetitive execution, , Conditional execution, Control statements |
| @subsection Repetitive execution: @code{for} loops, @code{repeat} and @code{while} |
| @findex for |
| |
| There is also a @code{for} loop construction which has the form |
| |
| @example |
| > for (@code{@var{name}} in @var{expr_1}) @var{expr_2} |
| @end example |
| |
| @noindent |
where @code{@var{name}} is the loop variable, @var{expr_1} is a
vector expression (often a sequence like @code{1:20}), and
@var{expr_2} is often a grouped expression with its sub-expressions
written in terms of the loop variable @var{name}. @var{expr_2} is
evaluated repeatedly as @var{name} ranges through the values in the
vector result of @var{expr_1}.
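
A minimal sketch of this form (the names @code{total} and @code{i} are
arbitrary), summing the squares of the first five integers:

@example
total <- 0
for (i in 1:5) total <- total + i^2   # @r{i takes the values 1, 2, @dots{}, 5}
total                                 # @r{55}
@end example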
| |
| As an example, suppose @code{ind} is a vector of class indicators and we |
| wish to produce separate plots of @code{y} versus @code{x} within |
classes. One possibility here is to use @code{coplot()},@footnote{To be
discussed later, or use @code{xyplot} from package @CRANpkg{lattice}.}
| which will produce an array of plots corresponding to each level of the |
| factor. Another way to do this, now putting all plots on the one |
| display, is as follows: |
| |
| @example |
| > xc <- split(x, ind) |
| > yc <- split(y, ind) |
> for (i in seq_along(yc)) @{
| plot(xc[[i]], yc[[i]]) |
| abline(lsfit(xc[[i]], yc[[i]])) |
| @} |
| @end example |
| |
| @findex split |
| |
| (Note the function @code{split()} which produces a list of vectors |
| obtained by splitting a larger vector according to the classes specified |
by a factor. This is a useful function, often used in connection
with boxplots. See the @code{help} facility for further details.)
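
For instance, with small made-up vectors, @code{split()} returns one list
component per factor level, holding the elements of @code{x} that fall in
that class:

@example
x <- c(1.2, 3.4, 0.5, 2.2, 4.1)
ind <- factor(c("a", "b", "a", "b", "b"))
s <- split(x, ind)
s          # @r{component $a is 1.2 0.5, component $b is 3.4 2.2 4.1}
@end example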
| |
| @quotation |
| @strong{Warning}: @code{for()} loops are used in @R{} code much less |
| often than in compiled languages. Code that takes a `whole object' view |
| is likely to be both clearer and faster in @R{}. |
| @end quotation |
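
As a small illustration of the `whole object' view, the explicit loop
below computes running totals element by element, while the single
vectorized call @code{cumsum()} gives the same result:

@example
x <- c(2, 7, 1, 8)
cs <- numeric(length(x))
for (i in seq_along(x))            # @r{explicit loop}
    cs[i] <- if (i == 1) x[i] else cs[i - 1] + x[i]
identical(cs, cumsum(x))           # @r{TRUE: one call does the same job}
@end example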
| |
| Other looping facilities include the |
| |
| @example |
| > repeat @var{expr} |
| @end example |
| @findex repeat |
| |
| @noindent |
| statement and the |
| |
| @example |
| > while (@var{condition}) @var{expr} |
| @end example |
| @findex while |
| |
| @noindent |
| statement. |
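
A minimal @code{while} sketch (the variable name is arbitrary): the body
is evaluated as long as the condition remains @code{TRUE}.

@example
p <- 1
while (p < 1000) p <- 2 * p   # @r{double p until it reaches at least 1000}
p                             # @r{1024}
@end example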
| |
The @code{break} statement terminates the innermost enclosing loop,
possibly abnormally. It is the only way to terminate @code{repeat} loops.
| @findex break |
| |
The @code{next} statement discontinues the current cycle of the
innermost enclosing loop and skips to the ``next''.
| @findex next |
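
The sketch below (an arbitrary task chosen only to exercise all three
statements) adds up odd numbers from 1 upwards inside a @code{repeat}
loop, using @code{next} to skip even values and @code{break} to stop
once the running total exceeds 10:

@example
i <- 0; total <- 0
repeat @{
    i <- i + 1
    if (i > 10) break          # @r{safety exit from the repeat loop}
    if (i %% 2 == 0) next      # @r{skip even values of i}
    total <- total + i
    if (total > 10) break      # @r{stop once the total exceeds 10}
@}
c(i, total)                    # @r{7 16}
@end example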
| |
| Control statements are most often used in connection with |
| @emph{functions} which are discussed in @ref{Writing your own |
| functions}, and where more examples will emerge. |
| |
| |
| @node Writing your own functions, Statistical models in R, Loops and conditional execution, Top |
| @chapter Writing your own functions |
| @cindex Writing functions |
| |
| As we have seen informally along the way, the @R{} language allows the |
| user to create objects of mode @emph{function}. These are true @R{} |
| functions that are stored in a special internal form and may be used in |
| further expressions and so on. In the process, the language gains |
| enormously in power, convenience and elegance, and learning to write |
| useful functions is one of the main ways to make your use of @R{} |
| comfortable and productive. |
| |
| It should be emphasized that most of the functions supplied as part of |
| the @R{} system, such as @code{mean()}, @code{var()}, |
| @code{postscript()} and so on, are themselves written in @R{} and thus |
| do not differ materially from user written functions. |
| |
| A function is defined by an assignment of the form |
| |
| @example |
| > @var{name} <- function(@var{arg_1}, @var{arg_2}, @dots{}) @var{expression} |
| @end example |
| @findex function |
| |
| @noindent |
| The @var{expression} is an @R{} expression, (usually a grouped |
| expression), that uses the arguments, @var{arg_i}, to calculate a value. |
| The value of the expression is the value returned for the function. |
| |
| A call to the function then usually takes the form |
| @code{@var{name}(@var{expr_1}, @var{expr_2}, @dots{})} and may occur |
| anywhere a function call is legitimate. |
| |
| @menu |
| * Simple examples:: |
| * Defining new binary operators:: |
| * Named arguments and defaults:: |
| * The three dots argument:: |
| * Assignment within functions:: |
| * More advanced examples:: |
| * Scope:: |
| * Customizing the environment:: |
| * Object orientation:: |
| @end menu |
| |
| @node Simple examples, Defining new binary operators, Writing your own functions, Writing your own functions |
| @section Simple examples |
| |
As a first example, consider a function to calculate the two-sample
| @math{t}-statistic, showing ``all the steps''. This is an artificial |
| example, of course, since there are other, simpler ways of achieving the |
| same end. |
| |
| The function is defined as follows: |
| |
| @example |
| > twosam <- function(y1, y2) @{ |
| n1 <- length(y1); n2 <- length(y2) |
| yb1 <- mean(y1); yb2 <- mean(y2) |
| s1 <- var(y1); s2 <- var(y2) |
| s <- ((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2) |
| tst <- (yb1 - yb2)/sqrt(s*(1/n1 + 1/n2)) |
| tst |
| @} |
| @end example |
| |
With this function defined, you could perform two-sample @math{t}-tests
| using a call such as |
| |
| @example |
| > tstat <- twosam(data$male, data$female); tstat |
| @end example |
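
As a check, the value should agree with the pooled-variance statistic
reported by @code{t.test()} with @code{var.equal = TRUE}. The sketch
repeats the definition so that it is self-contained; the data vectors
@code{y1} and @code{y2} are made up.

@example
twosam <- function(y1, y2) @{
    n1 <- length(y1); n2 <- length(y2)
    yb1 <- mean(y1); yb2 <- mean(y2)
    s1 <- var(y1); s2 <- var(y2)
    s <- ((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2)
    (yb1 - yb2)/sqrt(s*(1/n1 + 1/n2))
@}
y1 <- c(5.1, 4.8, 6.0, 5.5, 5.9)
y2 <- c(4.2, 4.9, 4.4, 5.0)
tst <- twosam(y1, y2)
tt <- t.test(y1, y2, var.equal = TRUE)$statistic
all.equal(tst, unname(tt))     # @r{TRUE}
@end example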
| |
As a second example, consider a function to emulate directly the
@sc{Matlab} backslash command, which returns the coefficients of the
orthogonal projection of the vector @math{y} onto the column space of
the matrix @math{X}. (This is usually called the least squares
estimate of the regression coefficients.) It would ordinarily be
computed with the @code{qr()} function; however, this is sometimes a bit
tricky to use directly, and it pays to have a simple function such as
the following to use it safely.
| |
Thus, given an @math{n} by @math{1} vector @math{y} and an @math{n} by
@math{p} matrix @math{X}, @math{X \ y} is defined as
| @ifnottex |
| (X'X)^@{-@}X'y, where (X'X)^@{-@} |
| @end ifnottex |
| @tex |
| $(X^T X)^{-}X^T y$, where $(X^T X)^{-}$ |
| @end tex |
| is a generalized inverse of @math{X'X}. |
| |
| @example |
| > bslash <- function(X, y) @{ |
| X <- qr(X) |
| qr.coef(X, y) |
| @} |
| @end example |
| |
| After this object is created it may be used in statements such as |
| |
| @example |
| > regcoeff <- bslash(Xmat, yvar) |
| @end example |
| |
| @noindent |
| and so on. |
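
When @math{X} has full column rank, the result should match the solution
of the normal equations @math{X'X b = X'y}. A sketch with made-up data
(@code{Xmat} and @code{yvar} are illustrative), repeating the definition
so it runs on its own:

@example
bslash <- function(X, y) @{
    X <- qr(X)
    qr.coef(X, y)
@}
Xmat <- cbind(1, c(1, 2, 3, 4, 5))     # @r{intercept and one covariate}
yvar <- c(2.1, 3.9, 6.2, 7.8, 10.1)
b1 <- bslash(Xmat, yvar)
b2 <- solve(crossprod(Xmat), crossprod(Xmat, yvar))  # @r{normal equations}
all.equal(as.vector(b1), as.vector(b2))              # @r{TRUE}
@end example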
| |
The classical @R{} function @code{lsfit()} does this job quite well, and
more@footnote{See also the methods described in @ref{Statistical models
in R}.}. It in turn uses the functions @code{qr()} and @code{qr.coef()}
in the slightly counterintuitive way above to do this part of the
calculation. Hence there is probably some value in having just this
part isolated in a simple-to-use function if it is going to be in
frequent use. If so, we may wish to make it a matrix binary operator
for even more convenient use.
| |
| @node Defining new binary operators, Named arguments and defaults, Simple examples, Writing your own functions |
| @section Defining new binary operators |
| @cindex Binary operators |
| |
| Had we given the @code{bslash()} function a different name, namely one of |
| the form |
| |
| @example |
| %@var{anything}% |
| @end example |
| |
| @noindent |
| it could have been used as a @emph{binary operator} in expressions |
| rather than in function form. Suppose, for example, we choose @code{!} |
| for the internal character. The function definition would then start as |
| |
| @example |
| > "%!%" <- function(X, y) @{ @dots{} @} |
| @end example |
| |
| @noindent |
| (Note the use of quote marks.) The function could then be used as |
| @code{X %!% y}. (The backslash symbol itself is not a convenient choice |
| as it presents special problems in this context.) |
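
A complete version of such an operator can be a one-line sketch. Here
the response is chosen to lie exactly on the line @math{y = 1 + 2x}, so
the coefficients are recovered up to rounding error:

@example
"%!%" <- function(X, y) qr.coef(qr(X), y)
X <- cbind(1, c(1, 2, 3, 4))
y <- c(3, 5, 7, 9)
X %!% y          # @r{approximately 1 2}
@end example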
| |
| The matrix multiplication operator, @code{%*%}, and the outer product |
| matrix operator @code{%o%} are other examples of binary operators |
| defined in this way. |
| |
| @node Named arguments and defaults, The three dots argument, Defining new binary operators, Writing your own functions |
| @section Named arguments and defaults |
| @cindex Named arguments |
| @cindex Default values |
| |
| As first noted in @ref{Generating regular sequences}, if arguments to |
| called functions are given in the ``@code{@var{name}=@var{object}}'' |
| form, they may be given in any order. Furthermore the argument sequence |
| may begin in the unnamed, positional form, and specify named arguments |
| after the positional arguments. |
| |
| Thus if there is a function @code{fun1} defined by |
| |
| @example |
| > fun1 <- function(data, data.frame, graph, limit) @{ |
| @r{[function body omitted]} |
| @} |
| @end example |
| |
| @noindent |
then the following calls
| |
| @example |
| > ans <- fun1(d, df, TRUE, 20) |
| > ans <- fun1(d, df, graph=TRUE, limit=20) |
| > ans <- fun1(data=d, limit=20, graph=TRUE, data.frame=df) |
| @end example |
| |
| @noindent |
| are all equivalent. |
| |
| In many cases arguments can be given commonly appropriate default |
| values, in which case they may be omitted altogether from the call when |
| the defaults are appropriate. For example, if @code{fun1} were defined |
| as |
| |
| @example |
| > fun1 <- function(data, data.frame, graph=TRUE, limit=20) @{ @dots{} @} |
| @end example |
| |
| @noindent |
| it could be called as |
| |
| @example |
| > ans <- fun1(d, df) |
| @end example |
| |
| @noindent |
| which is now equivalent to the three cases above, or as |
| |
| @example |
| > ans <- fun1(d, df, limit=10) |
| @end example |
| |
| @noindent |
| which changes one of the defaults. |
| |
It is important to note that defaults may be arbitrary expressions, even
involving other arguments to the same function; they are not restricted
to constants as in our simple example here.
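
For example, in the hypothetical function @code{avg} below the default
for @code{n} depends on the argument @code{x}; it is evaluated inside the
function when first needed.

@example
avg <- function(x, n = length(x)) sum(x) / n
avg(c(2, 4, 6))          # @r{n defaults to 3, giving 4}
avg(c(2, 4, 6), n = 2)   # @r{6}
@end example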
| |
| @node The three dots argument, Assignment within functions, Named arguments and defaults, Writing your own functions |
| @section The @samp{@dots{}} argument |
| |
| @c The ?Reserved topic links here, so please update it |
| @c if changing the node name. |
| |
Another frequent requirement is to allow one function to pass on
argument settings to another. For example, many graphics functions use
| the function @code{par()} and functions like @code{plot()} allow the |
| user to pass on graphical parameters to @code{par()} to control the |
| graphical output. (@xref{The par() function}, for more details on the |
| @code{par()} function.) This can be done by including an extra |
| argument, literally @samp{@dots{}}, of the function, which may then be |
| passed on. An outline example is given below. |
| |
| @example |
| fun1 <- function(data, data.frame, graph=TRUE, limit=20, ...) @{ |
| @r{[omitted statements]} |
| if (graph) |
| par(pch="*", ...) |
| @r{[more omissions]} |
| @} |
| @end example |
| |
| Less frequently, a function will need to refer to components of |
| @samp{@dots{}}. The expression @code{list(...)} evaluates all such |
| arguments and returns them in a named list, while @code{..1}, |
| @code{..2}, etc. evaluate them one at a time, with @samp{..n} |
| returning the n'th unmatched argument. |
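
A short sketch of both uses, with hypothetical function names: the first
passes any extra arguments on to @code{mean()}, the others inspect the
@samp{@dots{}} arguments directly.

@example
my.mean <- function(x, ...) mean(x, ...)   # @r{extra arguments passed on}
my.mean(c(1, NA, 3), na.rm = TRUE)         # @r{2}
dot.names <- function(...) names(list(...))
dot.names(a = 1, 2, b = "x")               # @r{"a" "" "b"}
first.dot <- function(...) ..1
first.dot(10, 20)                          # @r{10}
@end example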
| |
| @node Assignment within functions, More advanced examples, The three dots argument, Writing your own functions |
| @section Assignments within functions |
| |
| Note that @emph{any ordinary assignments done within the function are |
| local and temporary and are lost after exit from the function}. Thus |
| the assignment @code{X <- qr(X)} does not affect the value of the |
| argument in the calling program. |
| |
| To understand completely the rules governing the scope of @R{} assignments |
| the reader needs to be familiar with the notion of an evaluation |
| @emph{frame}. This is a somewhat advanced, though hardly difficult, |
| topic and is not covered further here. |
| |
| If global and permanent assignments are intended within a function, then |
| either the ``superassignment'' operator, @code{<<-} or the function |
| @code{assign()} can be used. See the @code{help} document for details. |
| @SPLUS{} users should be aware that @code{<<-} has different semantics |
| in @R{}. These are discussed further in @ref{Scope}. |
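
The contrast can be sketched as follows (all names are illustrative).
Since @code{bump} is defined at top level here, @code{<<-} finds
@code{counter} in the global environment and changes it there, whereas
the ordinary assignment inside @code{f} only creates a local copy.

@example
x <- 1
f <- function() @{ x <- 99; x @}   # @r{assigns a local x}
f()                                # @r{99}
x                                  # @r{still 1}
counter <- 0
bump <- function() counter <<- counter + 1
bump(); bump()
counter                            # @r{2}
@end example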
| |
| @node More advanced examples, Scope, Assignment within functions, Writing your own functions |
| @section More advanced examples |
| |
| @menu |
| * Efficiency factors in block designs:: |
| * Dropping all names in a printed array:: |
| * Recursive numerical integration:: |
| @end menu |
| |
| @node Efficiency factors in block designs, Dropping all names in a printed array, More advanced examples, More advanced examples |
| @subsection Efficiency factors in block designs |
| |
| As a more complete, if a little pedestrian, example of a function, |
| consider finding the efficiency factors for a block design. (Some |
| aspects of this problem have already been discussed in @ref{Index |
| matrices}.) |
| |
| A block design is defined by two factors, say @code{blocks} (@code{b} |
| levels) and @code{varieties} (@code{v} levels). If @math{R} and |
| @math{K} are the @math{v} by @math{v} and @math{b} by @math{b} |
| @emph{replications} and @emph{block size} matrices, respectively, and |
| @math{N} is the @math{b} by @math{v} incidence matrix, then the |
| efficiency factors are defined as the eigenvalues of the matrix |
| @ifnottex |
| E = I_v - R^@{-1/2@}N'K^@{-1@}NR^@{-1/2@} = I_v - A'A, where |
| A = K^@{-1/2@}NR^@{-1/2@}. |
| @end ifnottex |
| @tex |
| $$E = I_v - R^{-1/2}N^T K^{-1}NR^{-1/2} = I_v - A^T A,$$ |
| where $A = K^{-1/2}NR^{-1/2}$. |
| @end tex |
| One way to write the function is given below. |
| |
| @example |
| > bdeff <- function(blocks, varieties) @{ |
| blocks <- as.factor(blocks) # @r{minor safety move} |
| b <- length(levels(blocks)) |
| varieties <- as.factor(varieties) # @r{minor safety move} |
| v <- length(levels(varieties)) |
| K <- as.vector(table(blocks)) # @r{remove dim attr} |
| R <- as.vector(table(varieties)) # @r{remove dim attr} |
| N <- table(blocks, varieties) |
| A <- 1/sqrt(K) * N * rep(1/sqrt(R), rep(b, v)) |
| sv <- svd(A) |
| list(eff=1 - sv$d^2, blockcv=sv$u, varietycv=sv$v) |
| @} |
| @end example |
| |
| It is numerically slightly better to work with the singular value |
| decomposition on this occasion rather than the eigenvalue routines. |
| |
| The result of the function is a list giving not only the efficiency |
| factors as the first component, but also the block and variety canonical |
| contrasts, since sometimes these give additional useful qualitative |
| information. |
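
As a check on a small design, take 3 varieties in 3 blocks of size 2,
so that each variety is replicated twice and every pair of varieties
occurs together once. For such a balanced incomplete block design,
standard theory gives the non-trivial efficiency factors as
lambda v/(r k) = 1 x 3/(2 x 2) = 3/4; the remaining factor, 0,
corresponds to the overall mean. The sketch repeats the definition so
that it is self-contained.

@example
bdeff <- function(blocks, varieties) @{
    blocks <- as.factor(blocks)
    b <- length(levels(blocks))
    varieties <- as.factor(varieties)
    v <- length(levels(varieties))
    K <- as.vector(table(blocks))
    R <- as.vector(table(varieties))
    N <- table(blocks, varieties)
    A <- 1/sqrt(K) * N * rep(1/sqrt(R), rep(b, v))
    sv <- svd(A)
    list(eff=1 - sv$d^2, blockcv=sv$u, varietycv=sv$v)
@}
blocks    <- c(1, 1, 2, 2, 3, 3)
varieties <- c(1, 2, 1, 3, 2, 3)
round(bdeff(blocks, varieties)$eff, 4)   # @r{0 0.75 0.75}
@end example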
| |
| @node Dropping all names in a printed array, Recursive numerical integration, Efficiency factors in block designs, More advanced examples |
| @subsection Dropping all names in a printed array |
| |
| For printing purposes with large matrices or arrays, it is often useful |
| to print them in close block form without the array names or numbers. |
| Removing the @code{dimnames} attribute will not achieve this effect, but |
| rather the array must be given a @code{dimnames} attribute consisting of |
| empty strings. For example to print a matrix, @code{X} |
| |
| @example |
| > temp <- X |
| > dimnames(temp) <- list(rep("", nrow(X)), rep("", ncol(X))) |
| > temp; rm(temp) |
| @end example |
| |
| This can be much more conveniently done using a function, |
| @code{no.dimnames()}, shown below, as a ``wrap around'' to achieve the |
| same result. It also illustrates how some effective and useful user |
| functions can be quite short. |
| |
| @example |
| no.dimnames <- function(a) @{ |
| ## @r{Remove all dimension names from an array for compact printing.} |
| d <- list() |
| l <- 0 |
| for(i in dim(a)) @{ |
| d[[l <- l + 1]] <- rep("", i) |
| @} |
| dimnames(a) <- d |
| a |
| @} |
| @end example |
| |
| With this function defined, an array may be printed in close format |
| using |
| |
| @example |
| > no.dimnames(X) |
| @end example |
| |
| This is particularly useful for large integer arrays, where patterns are |
| the real interest rather than the values. |
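
The list of empty strings can also be built with @code{lapply()},
which avoids the explicit counter. A sketch of an equivalent variant
(the name @code{no.dimnames2} is hypothetical):

@example
no.dimnames2 <- function(a) @{
    ## @r{one vector of empty strings per dimension of a}
    dimnames(a) <- lapply(dim(a), function(n) rep("", n))
    a
@}
no.dimnames2(matrix(1:6, 2))
@end example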
| |
| @node Recursive numerical integration, , Dropping all names in a printed array, More advanced examples |
| @subsection Recursive numerical integration |
| |
| Functions may be recursive, and may themselves define functions within |
| themselves. Note, however, that such functions, or indeed variables, |
| are not inherited by called functions in higher evaluation frames as |
| they would be if they were on the search path. |
| |
| The example below shows a naive way of performing one-dimensional |
| numerical integration. The integrand is evaluated at the end points of |
the range and in the middle. If the one-panel trapezium rule answer is
close enough to the two-panel answer, then the latter is returned as the value.
| Otherwise the same process is recursively applied to each panel. The |
| result is an adaptive integration process that concentrates function |
| evaluations in regions where the integrand is farthest from linear. |
| There is, however, a heavy overhead, and the function is only |
| competitive with other algorithms when the integrand is both smooth and |
| very difficult to evaluate. |
| |
| The example is also given partly as a little puzzle in @R{} programming. |
| |
| @example |
| area <- function(f, a, b, eps = 1.0e-06, lim = 10) @{ |
| fun1 <- function(f, a, b, fa, fb, a0, eps, lim, fun) @{ |
| ## @r{function `fun1' is only visible inside `area'} |
| d <- (a + b)/2 |
| h <- (b - a)/4 |
| fd <- f(d) |
| a1 <- h * (fa + fd) |
| a2 <- h * (fd + fb) |
| if(abs(a0 - a1 - a2) < eps || lim == 0) |
| return(a1 + a2) |
| else @{ |
| return(fun(f, a, d, fa, fd, a1, eps, lim - 1, fun) + |
| fun(f, d, b, fd, fb, a2, eps, lim - 1, fun)) |
| @} |
| @} |
| fa <- f(a) |
| fb <- f(b) |
| a0 <- ((fa + fb) * (b - a))/2 |
| fun1(f, a, b, fa, fb, a0, eps, lim, fun1) |
| @} |
| @end example |
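
One possible reading of the puzzle, offered as a sketch: under the
lexical scoping described in the next section, an inner function can
refer to itself, and to @code{f} and @code{eps}, in the frame of the
enclosing call, so @R{} does not need them passed as arguments. The
names @code{area2} and @code{panel} are hypothetical; the algorithm is
the same as in @code{area}, and @code{integrate()} is shown only for
comparison.

@example
area2 <- function(f, a, b, eps = 1.0e-06, lim = 10) @{
    panel <- function(a, b, fa, fb, a0, lim) @{
        ## @r{f, eps and panel itself are found in the frame of area2}
        d <- (a + b)/2
        h <- (b - a)/4
        fd <- f(d)
        a1 <- h * (fa + fd)
        a2 <- h * (fd + fb)
        if (abs(a0 - a1 - a2) < eps || lim == 0)
            a1 + a2
        else
            panel(a, d, fa, fd, a1, lim - 1) + panel(d, b, fd, fb, a2, lim - 1)
    @}
    fa <- f(a)
    fb <- f(b)
    panel(a, b, fa, fb, ((fa + fb) * (b - a))/2, lim)
@}
area2(sin, 0, pi)              # @r{close to 2}
integrate(sin, 0, pi)$value    # @r{built-in quadrature, for comparison}
@end example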
| |
| |
| @node Scope, Customizing the environment, More advanced examples, Writing your own functions |
| @section Scope |
| @cindex Scope |
| |
| The discussion in this section is somewhat more technical than in other |
| parts of this document. However, it details one of the major differences |
| between @SPLUS{} and @R{}. |
| |
The symbols which occur in the body of a function can be divided into
three classes: formal parameters, local variables and free variables.
| The formal parameters of a function are those occurring in the argument |
| list of the function. Their values are determined by the process of |
| @emph{binding} the actual function arguments to the formal parameters. |
Local variables are those whose values are determined by the evaluation
of expressions in the body of the function. Variables which are not
| formal parameters or local variables are called free variables. Free |
| variables become local variables if they are assigned to. Consider the |
| following function definition. |
| |
| @example |
| f <- function(x) @{ |
| y <- 2*x |
| print(x) |
| print(y) |
| print(z) |
| @} |
| @end example |
| |
| In this function, @code{x} is a formal parameter, @code{y} is a local |
| variable and @code{z} is a free variable. |
| |
| In @R{} the free variable bindings are resolved by first looking in the |
| environment in which the function was created. This is called |
| @emph{lexical scope}. First we define a function called @code{cube}. |
| |
| @example |
| cube <- function(n) @{ |
| sq <- function() n*n |
| n*sq() |
| @} |
| @end example |
| |
| The variable @code{n} in the function @code{sq} is not an argument to that |
| function. Therefore it is a free variable and the scoping rules must be |
| used to ascertain the value that is to be associated with it. Under static |
| scope (@SPLUS{}) the value is that associated with a global variable named |
| @code{n}. Under lexical scope (@R{}) it is the parameter to the function |
| @code{cube} since that is the active binding for the variable @code{n} at |
| the time the function @code{sq} was defined. The difference between |
| evaluation in @R{} and evaluation in @SPLUS{} is that @SPLUS{} looks for a |
| global variable called @code{n} while @R{} first looks for a variable |
| called @code{n} in the environment created when @code{cube} was invoked. |
| |
| @example |
| ## @r{first evaluation in S} |
| S> cube(2) |
| Error in sq(): Object "n" not found |
| Dumped |
| S> n <- 3 |
| S> cube(2) |
| [1] 18 |
| ## @r{then the same function evaluated in R} |
| R> cube(2) |
| [1] 8 |
| @end example |
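
The @R{} behaviour can be checked directly: even when a global
@code{n} exists, @code{sq()} uses the @code{n} from the call to
@code{cube} in which it was created.

@example
cube <- function(n) @{
    sq <- function() n*n
    n*sq()
@}
n <- 3       # @r{a global n, which sq() does not use}
cube(2)      # @r{8 in R}
@end example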
| |
| Lexical scope can also be used to give functions @emph{mutable state}. |
| In the following example we show how @R{} can be used to mimic a bank |
| account. A functioning bank account needs to have a balance or total, a |
| function for making withdrawals, a function for making deposits and a |
function for stating the current balance. We achieve this by creating
the three functions within @code{open.account} and then returning a list
containing them. When @code{open.account} is invoked it takes a numerical
argument @code{total} and returns a list containing the three functions.
| Because these functions are defined in an environment which contains |
| @code{total}, they will have access to its value. |
| |
| The special assignment operator, @code{<<-}, |
| @findex <<- |
| is used to change the value associated with @code{total}. This operator |
| looks back in enclosing environments for an environment that contains |
| the symbol @code{total} and when it finds such an environment it |
replaces the value, in that environment, with the value of the right-hand
side. If the global or top-level environment is reached without finding
the symbol @code{total}, then that variable is created and assigned to
there. For most users @code{<<-} creates a global variable and assigns
the value of the right-hand side to it@footnote{In some sense this
| mimics the behavior in @SPLUS{} since in @SPLUS{} this operator always |
| creates or assigns to a global variable.}. Only when @code{<<-} has |
| been used in a function that was returned as the value of another |
| function will the special behavior described here occur. |
| |
| @example |
| open.account <- function(total) @{ |
| list( |
| deposit = function(amount) @{ |
| if(amount <= 0) |
| stop("Deposits must be positive!\n") |
| total <<- total + amount |
| cat(amount, "deposited. Your balance is", total, "\n\n") |
| @}, |
| withdraw = function(amount) @{ |
| if(amount > total) |
| stop("You don't have that much money!\n") |
| total <<- total - amount |
| cat(amount, "withdrawn. Your balance is", total, "\n\n") |
| @}, |
| balance = function() @{ |
| cat("Your balance is", total, "\n\n") |
| @} |
| ) |
| @} |
| |
| ross <- open.account(100) |
| robert <- open.account(200) |
| |
| ross$withdraw(30) |
| ross$balance() |
| robert$balance() |
| |
| ross$deposit(50) |
| ross$balance() |
| ross$withdraw(500) |
| @end example |
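
The same mechanism can be seen in a smaller sketch, using a hypothetical
@code{make.counter()}. Each call creates a fresh environment holding its
own @code{i}, so two counters are independent, and @code{environment()}
can be used to inspect the captured state.

@example
make.counter <- function() @{
    i <- 0
    function() @{
        i <<- i + 1      # @r{updates i in the enclosing environment}
        i
    @}
@}
c1 <- make.counter()
c2 <- make.counter()
c1(); c1()               # @r{1, then 2}
c2()                     # @r{1: c2 has its own i}
environment(c1)$i        # @r{2}
@end example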
| |
| @node Customizing the environment, Object orientation, Scope, Writing your own functions |
| @section Customizing the environment |
| @cindex Customizing the environment |
| |
| Users can customize their environment in several different ways. There |
| is a site initialization file and every directory can have its own |
| special initialization file. Finally, the special functions |
| @code{.First} and @code{.Last} can be used. |
| |
| The location of the site initialization file is taken from the value of |
| the @env{R_PROFILE} environment variable. If that variable is unset, |
| the file @file{Rprofile.site} in the @R{} home subdirectory @file{etc} is |
| used. This file should contain the commands that you want to execute |
| every time @R{} is started under your system. A second, personal, |
| profile file named @file{.Rprofile}@footnote{So it is hidden under |
| UNIX.} can be placed in any directory. If @R{} is invoked in that |
| directory then that file will be sourced. This file gives individual |
| users control over their workspace and allows for different startup |
| procedures in different working directories. If no @file{.Rprofile} |
| file is found in the startup directory, then @R{} looks for a |
| @file{.Rprofile} file in the user's home directory and uses that (if it |
| exists). If the environment variable @env{R_PROFILE_USER} is set, the |
| file it points to is used instead of the @file{.Rprofile} files. |
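
From within a running session, the candidate locations can be
inspected as in the following sketch. It only examines paths and
environment variables; it does not rerun the startup process.

@example
Sys.getenv("R_PROFILE")                     # @r{"" if unset}
file.path(R.home("etc"), "Rprofile.site")   # @r{default site file}
Sys.getenv("R_PROFILE_USER")                # @r{"" if unset}
file.exists(".Rprofile")                    # @r{profile in the current directory?}
path.expand("~/.Rprofile")                  # @r{profile in the home directory}
@end example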
| |
| Any function named @code{.First()} in either of the two profile files or |
| in the @file{.RData} image has a special status. It is automatically |
| performed at the beginning of an @R{} session and may be used to |
| initialize the environment. For example, the definition in the example |
| below alters the prompt to @code{$} and sets up various other useful |
| things that can then be taken for granted in the rest of the session. |
| |
Thus, the sequence in which files are executed is @file{Rprofile.site},
the user profile, @file{.RData} and then @code{.First()}. A definition
in later files will mask definitions in earlier files.
| |
| @example |
| > .First <- function() @{ |
| options(prompt="$ ", continue="+\t") # @r{@code{$} is the prompt} |
| options(digits=5, length=999) # @r{custom numbers and printout} |
| x11() # @r{for graphics} |
| par(pch = "+") # @r{plotting character} |
| source(file.path(Sys.getenv("HOME"), "R", "mystuff.R")) |
| # @r{my personal functions} |
| library(MASS) # @r{attach a package} |
| @} |
| @end example |
| @findex .First |
| |
| Similarly a function @code{.Last()}, if defined, is (normally) executed |
| at the very end of the session. An example is given below. |
| |
| @example |
| > .Last <- function() @{ |
| graphics.off() # @r{a small safety measure.} |
| cat(paste(date(),"\nAdios\n")) # @r{Is it time for lunch?} |
| @} |
| @end example |
| @findex .Last |
| |
| @node Object orientation, , Customizing the environment, Writing your own functions |
| @section Classes, generic functions and object orientation |
| @cindex Classes |
| @cindex Generic functions |
| @cindex Object orientation |
| |
| The class of an object determines how it will be treated by what are |
| known as @emph{generic} functions. Put the other way round, a generic |
| function performs a task or action on its arguments @emph{specific to |
| the class of the argument itself}. If the argument lacks any @code{class} |
| attribute, or has a class not catered for specifically by the generic |
| function in question, there is always a @emph{default action} provided. |
| |
The class mechanism offers the user the facility of designing and
writing generic functions for special purposes. Other examples of
generic functions include @code{plot()} for displaying objects
graphically, @code{summary()} for summarizing analyses of various types,
and @code{anova()} for comparing statistical models.
| |
| The number of generic functions that can treat a class in a specific way |
| can be quite large. For example, the functions that can accommodate in |
| some fashion objects of class @code{"data.frame"} include |
| |
| @example |
| [ [[<- any as.matrix |
| [<- mean plot summary |
| @end example |
| |
| @findex methods |
A complete list for the current session can be obtained by using the
@code{methods()} function:
| |
| @example |
| > methods(class="data.frame") |
| @end example |
| |
Conversely, the number of classes a generic function can handle can also
be quite large. For example, the @code{plot()} function has a default
method and variants for objects of classes @code{"data.frame"},
@code{"density"}, @code{"factor"}, and more. A complete list can again
be obtained by using the @code{methods()} function:
| |
| @example |
| > methods(plot) |
| @end example |
| |
| For many generic functions the function body is quite short, for example |
| |
| @example |
| > coef |
| function (object, ...) |
| UseMethod("coef") |
| @end example |
| |
| @noindent |
| The presence of @code{UseMethod} indicates this is a generic function. |
| To see what methods are available we can use @code{methods()} |
| |
| @example |
| > methods(coef) |
| [1] coef.aov* coef.Arima* coef.default* coef.listof* |
| [5] coef.nls* coef.summary.nls* |
| |
| Non-visible functions are asterisked |
| @end example |
| |
| @noindent |
In this example there are six methods, none of which can be seen by
typing its name (the asterisk marks methods that are not exported from
their namespace). We can read these by either of
| |
| @findex getAnywhere |
| @findex getS3method |
| @example |
| > getAnywhere("coef.aov") |
| A single object matching 'coef.aov' was found |
| It was found in the following places |
| registered S3 method for coef from namespace stats |
| namespace:stats |
| with value |
| |
| function (object, ...) |
| @{ |
| z <- object$coef |
| z[!is.na(z)] |
| @} |
| |
| > getS3method("coef", "aov") |
| function (object, ...) |
| @{ |
| z <- object$coef |
| z[!is.na(z)] |
| @} |
| @end example |
| |
| A function named @code{@var{gen}.@var{cl}} will be invoked by the |
| generic @code{@var{gen}} for class @code{@var{cl}}, so do not name |
| functions in this style unless they are intended to be methods. |
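
A minimal sketch of a new generic with two methods; the generic
@code{describe} and the class @code{"temperature"} are hypothetical.
@code{UseMethod()} dispatches on the class of the first argument and
falls back to the default method for objects of other classes.

@example
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "an object of no special class"
describe.temperature <- function(x, ...) paste(unclass(x), "degrees Celsius")
t1 <- structure(21.5, class = "temperature")
describe(t1)       # @r{"21.5 degrees Celsius"}
describe(1:3)      # @r{falls back to the default method}
@end example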
| |
| The reader is referred to the @emph{R Language Definition} for a more |
| complete discussion of this mechanism. |
| |
| |
| @node Statistical models in R, Graphics, Writing your own functions, Top |
| @chapter Statistical models in R |
| @cindex Statistical models |
| |
This chapter presumes the reader has some familiarity with statistical
| methodology, in particular with regression analysis and the analysis of |
| variance. Later we make some rather more ambitious presumptions, namely |
| that something is known about generalized linear models and nonlinear |
| regression. |
| |
| The requirements for fitting statistical models are sufficiently well |
| defined to make it possible to construct general tools that apply in a |
| broad spectrum of problems. |
| |
| @R{} provides an interlocking suite of facilities that make fitting |
| statistical models very simple. As we mention in the introduction, the |
| basic output is minimal, and one needs to ask for the details by calling |
| extractor functions. |
| |
| @menu |
| * Formulae for statistical models:: |
| * Linear models:: |
| * Generic functions for extracting model information:: |
| * Analysis of variance and model comparison:: |
| * Updating fitted models:: |
| * Generalized linear models:: |
| * Nonlinear least squares and maximum likelihood models:: |
| * Some non-standard models:: |
| @end menu |
| |
| @node Formulae for statistical models, Linear models, Statistical models in R, Statistical models in R |
| @section Defining statistical models; formulae |
| @cindex Formulae |
| |
| The template for a statistical model is a linear regression model with |
| independent, homoscedastic errors |
| |
| @ifnottex |
| @display |
| y_i = sum_@{j=0@}^p beta_j x_@{ij@} + e_i, @ @ @ @ i = 1, @dots{}, n, |
| @end display |
| @noindent |
| where the e_i are NID(0, sigma^2). |
| @end ifnottex |
| @tex |
| $$ y_i = \sum_{j=0}^p \beta_j x_{ij} + e_i, |
| \qquad e_i \sim {\rm NID}(0,\sigma^2), |
| \qquad i = 1, @dots{}, n |
| $$ |
| @end tex |
| In matrix terms this would be written |
| |
| @ifnottex |
| @display |
| y = X @ beta + e |
| @end display |
| @end ifnottex |
| @tex |
| $$ y = X \beta + e $$ |
| @end tex |
| |
| @noindent |
where @math{y} is the response vector, @math{X} is the @emph{model
matrix} or @emph{design matrix} and has columns
| @math{x_0, x_1, @dots{}, x_p}, |
| the determining variables. Very often @math{x_0} |
| will be a column of ones defining an @emph{intercept} term. |
| |
| @subsubheading Examples |
| |
| Before giving a formal specification, a few examples may usefully set |
| the picture. |
| |
| Suppose @code{y}, @code{x}, @code{x0}, @code{x1}, @code{x2}, @dots{} are |
| numeric variables, @code{X} is a matrix and @code{A}, @code{B}, |
| @code{C}, @dots{} are factors. The following formulae on the left |
| side below specify statistical models as described on the right. |
| |
| @table @code |
| @item y ~ x |
| @itemx y ~ 1 + x |
| Both imply the same simple linear regression model of @math{y} on |
| @math{x}. The first has an implicit intercept term, and the second an |
| explicit one. |
| |
| @item y ~ 0 + x |
| @itemx y ~ -1 + x |
| @itemx y ~ x - 1 |
| Simple linear regression of @math{y} on @math{x} through the origin |
| (that is, without an intercept term). |
| |
| @item log(y) ~ x1 + x2 |
| Multiple regression of the transformed variable, |
| @ifnottex |
| log(y), |
| @end ifnottex |
| @tex |
| $\log(y)$, |
| @end tex |
| on @math{x1} and @math{x2} (with an implicit intercept term). |
| |
| @item y ~ poly(x,2) |
| @itemx y ~ 1 + x + I(x^2) |
| Polynomial regression of @math{y} on @math{x} of degree 2. The first |
| form uses orthogonal polynomials, and the second uses explicit powers, |
| as basis. |
| |
| @item y ~ X + poly(x,2) |
Multiple regression of @math{y} with model matrix consisting of the matrix
| @math{X} as well as polynomial terms in @math{x} to degree 2. |
| |
| @item y ~ A |
| Single classification analysis of variance model of @math{y}, with |
| classes determined by @math{A}. |
| |
| @item y ~ A + x |
| Single classification analysis of covariance model of @math{y}, with |
| classes determined by @math{A}, and with covariate @math{x}. |
| |
| @item y ~ A*B |
| @itemx y ~ A + B + A:B |
| @itemx y ~ B %in% A |
| @itemx y ~ A/B |
| Two factor non-additive model of @math{y} on @math{A} and @math{B}. The |
| first two specify the same crossed classification and the second two |
| specify the same nested classification. In abstract terms all four |
| specify the same model subspace. |
| |
| @item y ~ (A + B + C)^2 |
| @itemx y ~ A*B*C - A:B:C |
| Three factor experiment but with a model containing main effects and two |
| factor interactions only. Both formulae specify the same model. |
| |
| @item y ~ A * x |
| @itemx y ~ A/x |
| @itemx y ~ A/(1 + x) - 1 |
| Separate simple linear regression models of @math{y} on @math{x} within |
| the levels of @math{A}, with different codings. The last form produces |
| explicit estimates of as many different intercepts and slopes as there |
| are levels in @math{A}. |
| |
| @item y ~ A*B + Error(C) |
| An experiment with two treatment factors, @math{A} and @math{B}, and |
error strata determined by factor @math{C}. For example, a split-plot
experiment, with whole plots (and hence also subplots) determined by
factor @math{C}.
| @end table |
| |
| @findex ~ |
| The operator @code{~} is used to define a @emph{model formula} in @R{}. |
| The form, for an ordinary linear model, is |
| |
| @example |
| @var{response} ~ @var{op_1} @var{term_1} @var{op_2} @var{term_2} @var{op_3} @var{term_3} @var{@dots{}} |
| @end example |
| |
| @noindent |
| where |
| |
| @table @var |
| @item response |
is a vector or matrix (or expression evaluating to a vector or matrix)
| defining the response variable(s). |
| @item op_i |
is an operator, either @code{+} or @code{-}, implying the inclusion or
exclusion of a term in the model (the first is optional).
| @item term_i |
| is either |
| @itemize @bullet |
| @item |
| a vector or matrix expression, or @code{1}, |
| @item |
| a factor, or |
| @item |
| a @emph{formula expression} consisting of factors, vectors or matrices |
| connected by @emph{formula operators}. |
| @end itemize |
| In all cases each term defines a collection of columns either to be |
| added to or removed from the model matrix. A @code{1} stands for an |
| intercept column and is by default included in the model matrix unless |
| explicitly removed. |
| |
| @end table |
| |
| The @emph{formula operators} are similar in effect to the Wilkinson and |
| Rogers notation used by such programs as Glim and Genstat. One |
| inevitable change is that the operator @samp{@code{.}} becomes |
| @samp{@code{:}} since the period is a valid name character in @R{}. |
| |
| The notation is summarized below (based on Chambers & Hastie, 1992, |
| p.29): |
| |
| @table @code |
| @item @var{Y} ~ @var{M} |
| @var{Y} is modeled as @var{M}. |
| |
| @item @var{M_1} + @var{M_2} |
| Include @var{M_1} and @var{M_2}. |
| |
| @item @var{M_1} - @var{M_2} |
| Include @var{M_1} leaving out terms of @var{M_2}. |
| |
| @item @var{M_1} : @var{M_2} |
| The tensor product of @var{M_1} and @var{M_2}. If both terms are |
| factors, then the ``subclasses'' factor. |
| |
| @item @var{M_1} %in% @var{M_2} |
| Similar to @code{@var{M_1}:@var{M_2}}, but with a different coding. |
| |
| @item @var{M_1} * @var{M_2} |
| @code{@var{M_1} + @var{M_2} + @var{M_1}:@var{M_2}}. |
| |
| @item @var{M_1} / @var{M_2} |
| @code{@var{M_1} + @var{M_2} %in% @var{M_1}}. |
| |
| @item @var{M}^@var{n} |
All terms in @var{M} together with ``interactions'' up to order @var{n}.
| |
| @item I(@var{M}) |
| Insulate @var{M}. Inside @var{M} all operators have their normal |
| arithmetic meaning, and that term appears in the model matrix. |
| @end table |
| |
| Note that inside the parentheses that usually enclose function arguments |
| all operators have their normal arithmetic meaning. The function |
| @code{I()} is an identity function used to allow terms in model formulae |
| to be defined using arithmetic operators. |
| |
| Note particularly that the model formulae specify the @emph{columns |
| of the model matrix}, the specification of the parameters being |
| implicit. This is not the case in other contexts, for example in |
| specifying nonlinear models. |
| |
| @menu |
| * Contrasts:: |
| @end menu |
| |
| @node Contrasts, , Formulae for statistical models, Formulae for statistical models |
| @subsection Contrasts |
| @cindex Contrasts |
| |
| We need at least some idea how the model formulae specify the columns of |
| the model matrix. This is easy if we have continuous variables, as each |
| provides one column of the model matrix (and the intercept will provide |
| a column of ones if included in the model). |
| |
| @cindex Factors |
| @cindex Ordered factors |
| What about a @math{k}-level factor @code{A}? The answer differs for |
| unordered and ordered factors. For @emph{unordered} factors @math{k - |
| 1} columns are generated for the indicators of the second, @dots{}, |
| @math{k}th levels of the factor. (Thus the implicit parameterization is |
| to contrast the response at each level with that at the first.) For |
| @emph{ordered} factors the @math{k - 1} columns are the orthogonal |
| polynomials on @math{1, @dots{}, k}, omitting the constant term. |
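As a minimal sketch (the factors @code{A} and @code{B} are made up for
illustration), @code{model.matrix()} shows the columns generated under
the default settings: indicator columns for the second and third levels
of the unordered factor, and linear and quadratic orthogonal polynomial
columns for the ordered one.

@example
A <- factor(c("a", "b", "c", "a"))            # unordered, 3 levels
B <- factor(c("lo", "mid", "hi", "lo"),
            levels = c("lo", "mid", "hi"), ordered = TRUE)
XA <- model.matrix(~ A)
XB <- model.matrix(~ B)
colnames(XA)    # "(Intercept)" "Ab" "Ac"
XA[, "Ab"]      # indicator of level "b", values 0 1 0 0
colnames(XB)    # "(Intercept)" "B.L" "B.Q"
@end example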
| |
| Although the answer is already complicated, it is not the whole story. |
| First, if the intercept is omitted in a model that contains a factor |
| term, the first such term is encoded into @math{k} columns giving the |
| indicators for all the levels. Second, the whole behavior can be |
| changed by the @code{options} setting for @code{contrasts}. The default |
| setting in @R{} is |
| |
| @example |
| options(contrasts = c("contr.treatment", "contr.poly")) |
| @end example |
| |
| @noindent |
| The main reason for mentioning this is that @R{} and @Sl{} have |
| different defaults for unordered factors, @Sl{} using Helmert |
| contrasts. So if you need to compare your results to those of a textbook |
| or paper which used @SPLUS{}, you will need to set |
| |
| @example |
| options(contrasts = c("contr.helmert", "contr.poly")) |
| @end example |
| |
| @noindent |
| This is a deliberate difference, as treatment contrasts (@R{}'s default) |
| are thought easier for newcomers to interpret. |
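Both complications can be checked directly with @code{model.matrix()}.
In this sketch (again with a made-up factor @code{A}), dropping the
intercept yields one indicator column per level, and switching the
global option to Helmert contrasts changes the coding. The old setting
is saved and restored so that the rest of the session is unaffected.

@example
A <- factor(c("a", "b", "c", "a"))
X0 <- model.matrix(~ A - 1)   # no intercept: one indicator per level
colnames(X0)                  # "Aa" "Ab" "Ac"
old <- options(contrasts = c("contr.helmert", "contr.poly"))
XH <- model.matrix(~ A)       # Helmert coding, as in S
options(old)                  # restore the previous contrasts setting
XH[, "A1"]                    # values -1  1  0 -1
XH[, "A2"]                    # values -1 -1  2 -1
@end example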
| |
| We have still not finished, as the contrast scheme to be used can be set |
| for each term in the model using the functions @code{contrasts} and |
| @code{C}. |
| @findex contrasts |
| @findex C |
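For instance, assigning a contrast matrix with @code{contrasts<-}
changes the coding of one factor only and leaves the global option
alone. The sketch below attaches sum-to-zero contrasts
(@code{contr.sum}) to a made-up factor; using @code{C(A, sum)} inside a
formula has a similar effect without modifying @code{A} itself.

@example
A <- factor(c("a", "b", "c", "a"))
contrasts(A) <- contr.sum(3)   # sum-to-zero coding, for this factor only
XS <- model.matrix(~ A)
colnames(XS)                   # "(Intercept)" "A1" "A2"
XS[, "A1"]                     # values 1 0 -1 1
@end example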
| |
| We have not yet considered interaction terms: these generate the |
| products of the columns introduced for their component terms. |
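A small sketch with made-up data makes this concrete: the interaction
columns of @code{A * x} are the elementwise products of the factor's
indicator columns with @code{x}.

@example
A <- factor(c("a", "b", "c", "a"))
x <- c(1.5, 2, 3, 4)
XI <- model.matrix(~ A * x)
colnames(XI)   # "(Intercept)" "Ab" "Ac" "x" "Ab:x" "Ac:x"
XI[, "Ab:x"]   # product of the "Ab" and "x" columns, values 0 2 0 0
@end example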
| |
| Although the details are complicated, model formulae in @R{} will |
| normally generate the models that an expert statistician would expect, |
| provided that marginality is preserved. Fitting, for example, a model |
| with an interaction but not the corresponding main effects will in |
| general lead to surprising results, and is for experts only. |
| |
| |
| @node Linear models, Generic functions for extracting model information, Formulae for statistical models, Statistical models in R |
| @section Linear models |
| @cindex Linear models |
| |
The basic function for fitting ordinary multiple regression models is @code{lm()},
| and a streamlined version of the call is as follows: |
| @findex lm |
| |
| @example |
| > @var{fitted.model} <- lm(@var{formula}, data = @var{data.frame}) |
| @end example |
| |
| For example |
| |
| @example |
| > fm2 <- lm(y ~ x1 + x2, data = production) |
| @end example |
| |
| @noindent |
| would fit a multiple regression model of @math{y} on @math{x1} and |
| @math{x2} (with implicit intercept term). |
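The @code{production} data frame is not supplied with @R{}. The
following sketch assumes a simulated stand-in (20 rows, with
coefficients chosen arbitrarily) so that the call above can actually be
run.

@example
set.seed(1)                     # reproducible simulated data
n <- 20
production <- data.frame(x1 = runif(n), x2 = runif(n))
production$y <- 1 + 2 * production$x1 - 3 * production$x2 +
    rnorm(n, sd = 0.1)
fm2 <- lm(y ~ x1 + x2, data = production)
coef(fm2)                       # estimates near 1, 2 and -3
@end example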
| |
| The important (but technically optional) parameter @code{data = |
| production} specifies that any variables needed to construct the model |
| should come first from the @code{production} @emph{data frame}. |
| @emph{This is the case regardless of whether data frame |
| @code{production} has been attached on the search path or not}. |
| |
| @node Generic functions for extracting model information, Analysis of variance and model comparison, Linear models, Statistical models in R |
| @section Generic functions for extracting model information |
| |
| The value of @code{lm()} is a fitted model object; technically a list of |
| results of class @code{"lm"}. Information about the fitted model can |
| then be displayed, extracted, plotted and so on by using generic |
| functions that orient themselves to objects of class @code{"lm"}. These |
| include |
| |
| @example |
| add1 deviance formula predict step |
| alias drop1 kappa print summary |
| anova effects labels proj vcov |
| coef family plot residuals |
| @end example |
| |
| A brief description of the most commonly used ones is given below. |
| |
| @table @code |
| @findex anova |
| @item anova(@var{object_1}, @var{object_2}) |
| Compare a submodel with an outer model and produce an analysis of |
| variance table. |
| |
| @findex coefficients |
| @findex coef |
| @item coef(@var{object}) |
| Extract the regression coefficient (matrix). |
| |
| Long form: @code{coefficients(@var{object})}. |
| |
| @findex deviance |
| @item deviance(@var{object}) |
| Residual sum of squares, weighted if appropriate. |
| |
| @findex formula |
| @item formula(@var{object}) |
| Extract the model formula. |
| |
| @findex plot |
| @item plot(@var{object}) |
| Produce four plots, showing residuals, fitted values and some |
| diagnostics. |
| |
| @findex predict |
| @item predict(@var{object}, newdata=@var{data.frame}) |
| The data frame supplied must have variables specified with the same |
| labels as the original. The value is a vector or matrix of predicted |
| values corresponding to the determining variable values in |
| @var{data.frame}. |
| |
| @c @item @code{predict.gam(@var{object},} |
| @c @item @w{@ @ @ @code{newdata=@var{data.frame})}} |
| @c @tab @code{predict.gam()} is a safe alternative to @code{predict()} that |
| @c can be used for @code{lm}, @code{glm} and @code{gam} fitted objects. It |
| @c must be used, for example, in cases where orthogonal polynomials are |
| @c used as the original basis functions, and the addition of new data |
| @c implies different basis functions to the original. |
| |
| @findex print |
| @item print(@var{object}) |
| Print a concise version of the object. Most often used implicitly. |
| |
| @findex residuals |
| @findex resid |
| @item residuals(@var{object}) |
| Extract the (matrix of) residuals, weighted as appropriate. |
| |
| Short form: @code{resid(@var{object})}. |
| |
| @findex step |
| @item step(@var{object}) |
| Select a suitable model by adding or dropping terms and preserving |
| hierarchies. The model with the smallest value of AIC (Akaike's An |
| Information Criterion) discovered in the stepwise search is returned. |
| |
| @findex summary |
| @item summary(@var{object}) |
| Print a comprehensive summary of the results of the regression analysis. |
| |
| @findex vcov |
| @item vcov(@var{object}) |
| Returns the variance-covariance matrix of the main parameters of a |
| fitted model object. |
| @end table |
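A minimal sketch of several of these extractors, applied to a fit on a
simulated (hypothetical) @code{production} data frame. For an
@code{"lm"} fit, @code{deviance()} is the residual sum of squares, and
the square roots of the diagonal of @code{vcov()} are the standard
errors reported by @code{summary()}.

@example
set.seed(1)
production <- data.frame(x1 = runif(20), x2 = runif(20))
production$y <- 1 + 2 * production$x1 - 3 * production$x2 +
    rnorm(20, sd = 0.1)
fm2 <- lm(y ~ x1 + x2, data = production)
b   <- coef(fm2)                   # named vector of coefficients
r   <- resid(fm2)                  # residuals
rss <- deviance(fm2)               # residual sum of squares, sum(r^2)
V   <- vcov(fm2)                   # 3 x 3 variance-covariance matrix
se  <- sqrt(diag(V))               # standard errors, as in summary(fm2)
nd  <- data.frame(x1 = 0.5, x2 = 0.5)
p   <- predict(fm2, newdata = nd)  # b[1] + 0.5 * b[2] + 0.5 * b[3]
@end example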
| |
| @node Analysis of variance and model comparison, Updating fitted models, Generic functions for extracting model information, Statistical models in R |
| @section Analysis of variance and model comparison |
| @cindex Analysis of variance |
| |
| The model fitting function @code{aov(@var{formula}, |
| data=@var{data.frame})} |
| @findex aov |
| operates at the simplest level in a very similar way to the function |
| @code{lm()}, and most of the generic functions listed in the table in |
| @ref{Generic functions for extracting model information} apply. |
| |
| It should be noted that in addition @code{aov()} allows an analysis of |
| models with multiple error strata such as split plot experiments, or |
| balanced incomplete block designs with recovery of inter-block |
| information. The model formula |
| |
| @example |
| @var{response} ~ @var{mean.formula} + Error(@var{strata.formula}) |
| @end example |
| @findex Error |
| |
| @noindent |
| specifies a multi-stratum experiment with error strata defined by the |
@var{strata.formula}. In the simplest case, @var{strata.formula} is
simply a factor, when it defines a two-stratum experiment, namely between
and within the levels of the factor.
| |
| For example, with all determining variables factors, a model formula such |
| as that in: |
| |
| @example |
| > fm <- aov(yield ~ v + n*p*k + Error(farms/blocks), data=farm.data) |
| @end example |
| |
| @noindent |
| would typically be used to describe an experiment with mean model |
| @code{v + n*p*k} and three error strata, namely ``between farms'', |
| ``within farms, between blocks'' and ``within blocks''. |
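The @code{farm.data} data frame above is hypothetical. As a runnable
analogue, the sketch below uses the @code{npk} data set from the
standard @code{datasets} package (a factorial fertilizer experiment on
peas laid out in six blocks). The result is an object of class
@code{"aovlist"}, with one fitted model per error stratum.

@example
fm <- aov(yield ~ N*P*K + Error(block), data = npk)
class(fm)       # an "aovlist": one fitted model per stratum
names(fm)       # the strata, including "block" and "Within"
summary(fm)     # one analysis of variance table per stratum
@end example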
| |
| @menu |
| * ANOVA tables:: |
| @end menu |
| |
| @node ANOVA tables, , Analysis of variance and model comparison, Analysis of variance and model comparison |
| @subsection ANOVA tables |
| |
Note also that the analysis of variance table (or tables) is for a
sequence of fitted models. The sums of squares shown are the decrease
in the residual sums of squares resulting from the inclusion of
@emph{that term} in the model at @emph{that place} in the sequence.
| Hence only for orthogonal experiments will the order of inclusion be |
| inconsequential. |
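The dependence on order is easy to see with non-orthogonal data. In
this sketch (simulated data, with @code{x2} deliberately correlated with
@code{x1}), the sum of squares attributed to @code{x1} changes when it
is entered second, while the residual sum of squares does not.

@example
set.seed(2)
n  <- 30
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)   # correlated with x1, so not orthogonal
y  <- 1 + x1 + x2 + rnorm(n)
a12 <- anova(lm(y ~ x1 + x2))   # x1 entered first
a21 <- anova(lm(y ~ x2 + x1))   # x1 entered after x2
c(first = a12["x1", "Sum Sq"], second = a21["x1", "Sum Sq"])  # differ
c(a12["Residuals", "Sum Sq"], a21["Residuals", "Sum Sq"])      # equal
@end example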
| |
| For multistratum experiments the procedure is first to project the |
| response onto the error strata, again in sequence, and to fit the mean |
| model to each projection. For further details, see Chambers & Hastie |
| (1992). |
| |
| A more flexible alternative to the default full ANOVA table is to |
| compare two or more models directly using the @code{anova()} function. |
| @findex anova |
| |
| @example |
| > anova(@var{fitted.model.1}, @var{fitted.model.2}, @dots{}) |
| @end example |
| |
The display is then an ANOVA table showing the differences between the
fitted models when fitted in sequence. The fitted models being compared
would usually be a hierarchical sequence, of course. This does not
give different information from the default, but rather makes it easier
to comprehend and control.
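A minimal sketch with simulated data: @code{anova()} applied to a model
and a larger model containing it reports the reduction in residual sum
of squares and the corresponding @math{F} test for the added term.

@example
set.seed(3)
d <- data.frame(x1 = rnorm(25), x2 = rnorm(25))
d$y <- 2 + 1.5 * d$x1 + rnorm(25)
small <- lm(y ~ x1, data = d)
big   <- lm(y ~ x1 + x2, data = d)
cmp <- anova(small, big)   # F test for adding x2 to the smaller model
cmp
@end example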
| |
| @node Updating fitted models, Generalized linear models, Analysis of variance and model comparison, Statistical models in R |
| @section Updating fitted models |
| @cindex Updating fitted models |
| |
| The @code{update()} function is largely a convenience function that |
| allows a model to be fitted that differs from one previously fitted |
| usually by just a few additional or removed terms. Its form is |
| @findex update |
| |
| @example |
| > @var{new.model} <- update(@var{old.model}, @var{new.formula}) |
| @end example |
| |
In the @var{new.formula} the special name consisting of a single period,
@samp{@code{.}},
@findex .
can be used to stand for ``the corresponding part of the old model
formula''. For example,
| |
| @example |
| > fm05 <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = production) |
| > fm6 <- update(fm05, . ~ . + x6) |
| > smf6 <- update(fm6, sqrt(.) ~ .) |
| @end example |
| |
| @noindent |
would fit a five-variate multiple regression with variables (presumably)
from the data frame @code{production}, fit an additional model including
a sixth regressor variable, and fit a variant on the model in which the
response has a square-root transformation applied.
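Since @code{production} is not a data set supplied with @R{}, the sketch
below simulates a small stand-in (with three regressors rather than six,
and a positive response so that the square root is defined) and prints
the formulae that @code{update()} constructs.

@example
set.seed(4)
production <- data.frame(x1 = runif(20), x2 = runif(20), x3 = runif(20))
production$y <- (2 + production$x1 + production$x2 + production$x3 +
                 rnorm(20, sd = 0.1))^2   # positive response
fm   <- lm(y ~ x1 + x2, data = production)
fm3  <- update(fm, . ~ . + x3)       # add a third regressor
sfm3 <- update(fm3, sqrt(.) ~ .)     # same regressors, square-root response
formula(fm3)                         # y ~ x1 + x2 + x3
formula(sfm3)                        # sqrt(y) ~ x1 + x2 + x3
@end example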
| |
| Note especially that if the @code{data=} argument is specified on the |
| original call to the model fitting function, this information is passed on |
| through the fitted model object to @code{update()} and its allies. |
| |
| The name @samp{.} can also be used in other contexts, but with slightly |
| different meaning. For example |
| |
| @example |
| > fmfull <- lm(y ~ . , data = production) |
| @end example |
| |
| @noindent |
| would fit a model with response @code{y} and regressor variables |
| @emph{all other variables in the data frame @code{production}}. |
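A quick check on a simulated stand-in for @code{production}: with
@code{y ~ .} every column other than the response becomes a regressor.

@example
set.seed(5)
production <- data.frame(x1 = runif(15), x2 = runif(15), x3 = runif(15))
production$y <- production$x1 + rnorm(15, sd = 0.1)
fmfull <- lm(y ~ ., data = production)
names(coef(fmfull))   # "(Intercept)" "x1" "x2" "x3"
@end example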
| |
| Other functions for exploring incremental sequences of models are |
| @code{add1()}, @code{drop1()} and @code{step()}. |
| @findex add1 |
| @findex drop1 |
| @findex step |
| The names of these give a good clue to their purpose, but for full |
| details see the on-line help. |
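For illustration only, here is a sketch on simulated data in which
@code{x2} and @code{x3} are irrelevant to the response. Setting
@code{trace = 0} suppresses the step-by-step output of @code{step()};
which terms survive the search depends on the data.

@example
set.seed(6)
production <- data.frame(x1 = runif(30), x2 = runif(30), x3 = runif(30))
production$y <- 1 + 3 * production$x1 + rnorm(30, sd = 0.2)
fm <- lm(y ~ x1 + x2 + x3, data = production)
drop1(fm, test = "F")                    # drop each term in turn
add1(lm(y ~ x1, data = production), ~ . + x2 + x3)  # add each candidate
fms <- step(fm, trace = 0)               # AIC-guided search, silent
formula(fms)
@end example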
| |
| @node Generalized linear models, Nonlinear least squares and maximum likelihood models, Updating fitted models, Statistical models in R |
| @section Generalized linear models |
| @cindex Generalized linear models |
| |
| Generalized linear modeling is a development of linear models to |
| accommodate both non-normal response distributions and transformations |
| to linearity in a clean and straightforward way. A generalized linear |
| model may be described in terms of the following sequence of |
| assumptions: |
| |
| @itemize @bullet |
| @item |
| There is a response, @math{y}, of interest and stimulus variables |
| @ifnottex |
| x_1, x_2, @dots{}, |
| @end ifnottex |
| @tex |
| $x_1$, $x_2$, @dots{}, |
| @end tex |
| whose values influence the distribution of the response. |
| |
| @item |
| The stimulus variables influence the distribution of @math{y} through |
| @emph{a single linear function, only}. This linear function is called |
| the @emph{linear predictor}, and is usually written |
| @ifnottex |
| @display |
| eta = beta_1 x_1 + beta_2 x_2 + @dots{} + beta_p x_p, |
| @end display |
| hence x_i has no influence on the distribution of @math{y} if and only if |
| beta_i is zero. |
| @end ifnottex |
| @tex |
| $$ \eta = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p, $$ |
| hence $x_i$ has no influence on the distribution of @math{y} if and only |
| if $\beta_i=0$. |
| @end tex |
| |
| @item |
| The distribution of @math{y} is of the form |
| @ifnottex |
| @display |
| f_Y(y; mu, phi) |
| = exp((A/phi) * (y lambda(mu) - gamma(lambda(mu))) + tau(y, phi)) |
| @end display |
where phi is a @emph{scale parameter} (possibly known), and is constant
for all observations, @math{A} represents a prior weight, assumed known
but possibly varying with the observations, and mu is the mean of
@math{y}.
| @end ifnottex |
| @tex |
| $$ |
| f_Y(y;\mu,\varphi) |
| = \exp\left[{A \over \varphi}\left\{y\lambda(\mu) - |
| \gamma\left(\lambda(\mu)\right)\right\} + \tau(y,\varphi)\right] |
| $$ |
| where $\varphi$ is a @emph{scale parameter} (possibly known), and is |
| constant for all observations, $A$ represents a prior weight, assumed |
| known but possibly varying with the observations, and $\mu$ is the mean |
| of $y$. |
| @end tex |
| So it is assumed that the distribution of @math{y} is determined by its |
| mean and possibly a scale parameter as well. |
| |
| @item |
| @ifnottex |
| The mean, mu, is a smooth invertible function of the linear predictor: |
| @display |
| mu = m(eta), eta = m^@{-1@}(mu) = ell(mu) |
| @end display |
| and this inverse function, ell(), is called the @emph{link function}. |
| @end ifnottex |
| @tex |
| The mean, $\mu$, is a smooth invertible function of the linear predictor: |
| $$ \mu = m(\eta),\qquad \eta = m^{-1}(\mu) = \ell(\mu) $$ |
| and this inverse function, $\ell()$, is called the @emph{link function}. |
| @end tex |
| |
| @end itemize |
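These components can be inspected directly on a family object in @R{}.
In the sketch below, @code{linkfun} computes the link function and
@code{linkinv} its inverse (the mean as a function of the linear
predictor), while @code{variance} is the variance function; here the
family is binomial with its default logit link.

@example
fam <- binomial()               # default logit link
fam$link                        # "logit"
fam$linkfun(0.5)                # link at mu = 0.5: log(0.5/0.5) = 0
fam$linkinv(0)                  # inverse link at eta = 0: 0.5
fam$linkinv(fam$linkfun(0.3))   # round trip recovers 0.3
fam$variance(0.5)               # binomial variance mu(1 - mu): 0.25
@end example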
| |
| These assumptions are loose enough to encompass a wide class of models |
| useful in statistical practice, but tight enough to allow the |
| development of a unified methodology of estimation and inference, at |
| least approximately. The reader is referred to any of the current |
| reference works on the subject for full details, such as McCullagh & |
| Nelder (1989) or Dobson (1990). |
| |
| @menu |
| * Families:: |
| * The glm() function:: |
| @end menu |
| |
| @node Families, The glm() function, Generalized linear models, Generalized linear models |
| @subsection Families |
| @cindex Families |
| |
| The class of generalized linear models handled by facilities supplied in |
| @R{} includes @emph{gaussian}, @emph{binomial}, @emph{poisson}, |
| @emph{inverse gaussian} and @emph{gamma} response distributions and also |
| @emph{quasi-likelihood} models where the response distribution is not |
| explicitly specified. In the latter case the @emph{variance function} |
| must be specified as a function of the mean, but in other cases this |
| function is implied by the response distribution. |
| |
| Each response distribution admits a variety of link functions to connect |
| the mean with the linear predictor. Those automatically available are |
| shown in the following table: |
| |
| @quotation |
| @multitable @columnfractions 0.25 0.55 |
| @headitem Family name @tab Link functions |
| @item @code{binomial} @tab @code{logit}, @code{probit}, @code{log}, @code{cloglog} |
| @item @code{gaussian} @tab @code{identity}, @code{log}, @code{inverse} |
| @item @code{Gamma} @tab @code{identity}, @code{inverse}, @code{log} |
| @item @code{inverse.gaussian} @tab @code{1/mu^2}, @code{identity}, @code{inverse}, @code{log} |
| @item @code{poisson} @tab @code{identity}, @code{log}, @code{sqrt} |
| @item @code{quasi} @tab @code{logit}, @code{probit}, @code{cloglog}, |
| @code{identity}, @code{inverse}, @code{log}, @code{1/mu^2}, @code{sqrt} |
| @end multitable |
| @end quotation |
| |
| The combination of a response distribution, a link function and various |
| other pieces of information that are needed to carry out the modeling |
| exercise is called the @emph{family} of the generalized linear model. |
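Family objects are created by calling the generator, optionally with a
link. The calls below show a few default and chosen links, and the
variance function of the Poisson family, for which the variance equals
the mean.

@example
binomial(link = probit)$link   # "probit"
poisson(link = "sqrt")$link    # "sqrt"
Gamma()$link                   # default for Gamma: "inverse"
gaussian()$link                # default for gaussian: "identity"
poisson()$variance(4)          # Var[y] = mu for the Poisson: 4
@end example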
| |
| @node The glm() function, , Families, Generalized linear models |
| @subsection The @code{glm()} function |
| @findex glm |
| |
| Since the distribution of the response depends on the stimulus variables |
| through a single linear function @emph{only}, the same mechanism as was |
| used for linear models can still be used to specify the linear part of a |
| generalized model. The family has to be specified in a different way. |
| |
| The @R{} function to fit a generalized linear model is @code{glm()} |
| which uses the form |
| |
| @example |
| > @var{fitted.model} <- glm(@var{formula}, family=@var{family.generator}, data=@var{data.frame}) |
| @end example |
| |
| The only new feature is the @var{family.generator}, which is the |
| instrument by which the family is described. It is the name of a |
| function that generates a list of functions and expressions that |
| together define and control the model and estimation process. Although |
| this may seem a little complicated at first sight, its use is quite |
| simple. |
| |
| The names of the standard, supplied family generators are given under |
| ``Family Name'' in the table in @ref{Families}. Where there is a choice |
| of links, the name of the link may also be supplied with the family |
| name, in parentheses as a parameter. In the case of the @code{quasi} |
| family, the variance function may also be specified in this way. |
| |
| Some examples make the process clear. |
| |
| @subsubheading The @code{gaussian} family |
| |
| A call such as |
| |
| @example |
| > fm <- glm(y ~ x1 + x2, family = gaussian, data = sales) |
| @end example |
| |
| @noindent |
| achieves the same result as |
| |
| @example |
| > fm <- lm(y ~ x1+x2, data=sales) |
| @end example |
| |
| @noindent |
but much less efficiently. The gaussian family does accept a link
parameter, but only from the small set listed in the table in
@ref{Families} (@code{identity}, the default, @code{log} and
@code{inverse}). If a problem requires a gaussian family with some other
nonstandard link, this can usually be achieved through the @code{quasi}
family, as we shall see later.
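A sketch with a simulated (hypothetical) @code{sales} data frame
confirms that the two fits agree.

@example
set.seed(7)
sales <- data.frame(x1 = rnorm(40), x2 = rnorm(40))
sales$y <- 5 + 2 * sales$x1 - sales$x2 + rnorm(40)
fm_glm <- glm(y ~ x1 + x2, family = gaussian, data = sales)
fm_lm  <- lm(y ~ x1 + x2, data = sales)
all.equal(coef(fm_glm), coef(fm_lm))           # TRUE
all.equal(deviance(fm_glm), deviance(fm_lm))   # TRUE: both are the RSS
@end example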
| |
| @subsubheading The @code{binomial} family |
| |
| Consider a small, artificial example, from Silvey (1970). |
| |
| On the Aegean island of Kalythos the male inhabitants suffer from a |
| congenital eye disease, the effects of which become more marked with |
| increasing age. Samples of islander males of various ages were tested |
| for blindness and the results recorded. The data is shown below: |
| |
| @iftex |
| @quotation |
| @multitable {No.@: tested::} {50} {50} {50} {50} {50} |
| @item Age: @tab 20 @tab 35 @tab 45 @tab 55 @tab 70 |
| @item No.@: tested: @tab 50 @tab 50 @tab 50 @tab 50 @tab 50 |
| @item No.@: blind: @tab @w{ 6} @tab 17 @tab 26 @tab 37 @tab 44 |
| @end multitable |
| @end quotation |
| @end iftex |
| @ifnottex |
| @multitable {No.@: tested::} {50} {50} {50} {50} {50} |
| @item Age: @tab 20 @tab 35 @tab 45 @tab 55 @tab 70 |
| @item No.@: tested: @tab 50 @tab 50 @tab 50 @tab 50 @tab 50 |
| @item No.@: blind: @tab @w{ 6} @tab 17 @tab 26 @tab 37 @tab 44 |
| @end multitable |
| @end ifnottex |
| |
| The problem we consider is to fit both logistic and probit models to |
| this data, and to estimate for each model the LD50, that is the age at |
| which the chance of blindness for a male inhabitant is 50%. |
| |
| If @math{y} is the number of blind at age @math{x} and @math{n} the |
| number tested, both models have the form |
| @ifnottex |
| y ~ B(n, F(beta_0 + beta_1 x)) |
| @end ifnottex |
| @tex |
| $$ y \sim {\rm B}(n, F(\beta_0 + \beta_1 x)) $$ |
| @end tex |
| where for the probit case, |
| @eqn{F(z) = \Phi(z), F(z) = Phi(z)} |
| is the standard normal distribution function, and in the logit case |
| (the default), |
| @eqn{F(z) = e^z/(1+e^z),F(z) = e^z/(1+e^z)}. |
| In both cases the LD50 is |
| @ifnottex |
| LD50 = - beta_0/beta_1 |
| @end ifnottex |
| @tex |
| $$ \hbox{LD50} = -\beta_0/\beta_1 $$ |
| @end tex |
| that is, the point at which the argument of the distribution function is |
| zero. |
| |
| The first step is to set the data up as a data frame |
| |
| @example |
| > kalythos <- data.frame(x = c(20,35,45,55,70), n = rep(50,5), |
| y = c(6,17,26,37,44)) |
| @end example |
| |
| To fit a binomial model using @code{glm()} there are three possibilities |
| for the response: |
| |
| @itemize @bullet |
| @item |
| If the response is a @emph{vector} it is assumed to hold @emph{binary} |
| data, and so must be a @math{0/1} vector. |
| |
| @item |
| If the response is a @emph{two-column matrix} it is assumed that the |
| first column holds the number of successes for the trial and the second |
| holds the number of failures. |
| |
| @item |
| If the response is a @emph{factor}, its first level is taken as failure |
| (0) and all other levels as `success' (1). |
| @end itemize |
| |
| Here we need the second of these conventions, so we add a matrix to our |
| data frame: |
| |
| @example |
| > kalythos$Ymat <- cbind(kalythos$y, kalythos$n - kalythos$y) |
| @end example |
| |
| To fit the models we use |
| |
| @example |
| > fmp <- glm(Ymat ~ x, family = binomial(link=probit), data = kalythos) |
| > fml <- glm(Ymat ~ x, family = binomial, data = kalythos) |
| @end example |
| |
| Since the logit link is the default the parameter may be omitted on the |
| second call. To see the results of each fit we could use |
| |
| @example |
| > summary(fmp) |
| > summary(fml) |
| @end example |
| |
| Both models fit (all too) well. To find the LD50 estimate we can use a |
| simple function: |
| |
| @example |
| > ld50 <- function(b) -b[1]/b[2] |
| > ldp <- ld50(coef(fmp)); ldl <- ld50(coef(fml)); c(ldp, ldl) |
| @end example |
| |
| The actual estimates from this data are 43.663 years and 43.601 years |
| respectively. |
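As a check on the definition of the LD50, the fitted probability of
blindness at the estimated LD50 should be one half. The sketch below
repeats the setup for the logit model so that it runs on its own.

@example
kalythos <- data.frame(x = c(20, 35, 45, 55, 70), n = rep(50, 5),
                       y = c(6, 17, 26, 37, 44))
kalythos$Ymat <- cbind(kalythos$y, kalythos$n - kalythos$y)
fml  <- glm(Ymat ~ x, family = binomial, data = kalythos)
ld50 <- function(b) -b[1]/b[2]
ldl  <- unname(ld50(coef(fml)))
predict(fml, newdata = data.frame(x = ldl), type = "response")  # 0.5
@end example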
| |
| @subsubheading Poisson models |
| |
| With the Poisson family the default link is the @code{log}, and in |
| practice the major use of this family is to fit surrogate Poisson |
| log-linear models to frequency data, whose actual distribution is often |
| multinomial. This is a large and important subject we will not discuss |
| further here. It even forms a major part of the use of non-gaussian |
| generalized models overall. |
| |
| Occasionally genuinely Poisson data arises in practice and in the past |
| it was often analyzed as gaussian data after either a log or a |
| square-root transformation. As a graceful alternative to the latter, a |
| Poisson generalized linear model may be fitted as in the following |
| example: |
| |
| @example |
| > fmod <- glm(y ~ A + B + x, family = poisson(link=sqrt), |
| data = worm.counts) |
| @end example |
| |
| @subsubheading Quasi-likelihood models |
| |
| For all families the variance of the response will depend on the mean |
| and will have the scale parameter as a multiplier. The form of |
| dependence of the variance on the mean is a characteristic of the |
| response distribution; for example for the poisson distribution |
| @eqn{\hbox{Var}[y] = \mu,Var(y) = mu}. |
| |
For quasi-likelihood estimation and inference the precise response
distribution is not specified, but rather only a link function and the
form of the variance function as it depends on the mean. Since
quasi-likelihood estimation uses techniques formally identical to those
for the gaussian distribution, this family also provides, incidentally,
a way of fitting gaussian models with non-standard link functions or
variance functions.
| |
| For example, consider fitting the non-linear regression |
| @ifnottex |
| y = theta_1 z_1 / (z_2 - theta_2) + e |
| @end ifnottex |
| @tex |
| $$ y = {\theta_1z_1 \over z_2 - \theta_2} + e $$ |
| @end tex |
| which may be written alternatively as |
| @ifnottex |
| y = 1 / (beta_1 x_1 + beta_2 x_2) + e |
| @end ifnottex |
| @tex |
| $$ y = {1 \over \beta_1x_1 + \beta_2x_2} + e $$ |
| @end tex |
| where |
| @ifnottex |
| x_1 = z_2/z_1, x_2 = -1/z_1, beta_1 = 1/theta_1, and beta_2 = |
| theta_2/theta_1. |
| @end ifnottex |
| @tex |
| $x_1 = z_2/z_1$, $x_2=-1/z_1$, $\beta_1=1/\theta_1$ and |
| $\beta_2=\theta_2/\theta_1$. |
| @end tex |
| Supposing a suitable data frame has been set up, we could fit this |
| non-linear regression as |
| |
| @example |
| > nlfit <- glm(y ~ x1 + x2 - 1, |
|                family = quasi(link=inverse, variance=constant), |
|                data = biochem) |
| @end example |
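The @code{biochem} data frame is not supplied with @R{}. As a sketch under that assumption, the lines below simulate a hypothetical version from the model above. The true values 2 and 0.5 for theta_1 and theta_2 are chosen arbitrarily. The code fits the quasi model and back-transforms the coefficients to the original parameters.

@example
## Hypothetical stand-in for 'biochem', simulated from
## y = theta_1 z_1 / (z_2 - theta_2) + e  with theta_1 = 2, theta_2 = 0.5
set.seed(1)
z1 <- runif(50, 1, 5)
z2 <- runif(50, 1, 3)
biochem <- data.frame(y  = 2 * z1/(z2 - 0.5) + rnorm(50, sd = 0.05),
                      x1 = z2/z1,
                      x2 = -1/z1)

nlfit <- glm(y ~ x1 + x2 - 1,
             family = quasi(link = "inverse", variance = "constant"),
             data = biochem)

## Back-transform: theta_1 = 1/beta_1 and theta_2 = beta_2/beta_1
b <- coef(nlfit)
theta.hat <- c(theta1 = 1/b[["x1"]], theta2 = b[["x2"]]/b[["x1"]])
theta.hat
@end example

With a constant variance function, the estimated dispersion @code{summary(nlfit)$dispersion} plays the role of the error variance.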
| |
| The reader is referred to the help pages for @code{glm} and |
| @code{family} for further information. |
| |
| @node Nonlinear least squares and maximum likelihood models, Some non-standard models, Generalized linear models, Statistical models in R |
| @section Nonlinear least squares and maximum likelihood models |
| @cindex Nonlinear least squares |
| |
| Certain forms of nonlinear model can be fitted by Generalized Linear |
| Models (@code{glm()}). But in the majority of cases we have to approach |
| the nonlinear curve fitting problem as one of nonlinear optimization. |
| @R{}'s nonlinear optimization routines are @code{optim()}, @code{nlm()} |
| and @code{nlminb()}, |
| @findex nlm |
| @findex optim |
| @findex nlminb |
| which provide the functionality (and more) of @SPLUS{}'s @code{ms()} and |
| @code{nlminb()}. We seek the parameter values that minimize some index |
| of lack-of-fit, and these routines do this by trying out various |
| parameter values iteratively. Unlike linear regression, for example, |
| there is no guarantee that the procedure will converge on satisfactory |
| estimates. |
| All the methods require initial guesses about what parameter values to |
| try, and convergence may depend critically upon the quality of the |
| starting values. |
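The three routines differ mainly in their calling conventions and in the names of the components they return. The following sketch minimizes a toy quadratic with each of them. The quadratic is chosen purely for illustration and has its minimum at (3, -1). The last call adds a lower bound, which @code{nlminb()} accepts directly.

@example
## Toy objective with its minimum at p = c(3, -1)
f <- function(p) (p[1] - 3)^2 + 10 * (p[2] + 1)^2
start <- c(0, 0)                            # initial guess

o1 <- optim(start, f)                       # Nelder-Mead by default
o2 <- optim(start, f, method = "BFGS")      # quasi-Newton
o3 <- nlm(f, start)                         # objective first, then start
o4 <- nlminb(start, f)                      # PORT routines
o5 <- nlminb(start, f, lower = c(-Inf, 0))  # with the bound p[2] >= 0

## The estimates are stored under different component names
rbind(o1$par, o2$par, o3$estimate, o4$par, o5$par)
## For optim() and nlminb(), a convergence code of 0 indicates success
c(o1$convergence, o2$convergence, o4$convergence, o5$convergence)
@end example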
| |
| @menu |
| * Least squares:: |
| * Maximum likelihood:: |
| @end menu |
| |
| @node Least squares, Maximum likelihood, Nonlinear least squares and maximum likelihood models, Nonlinear least squares and maximum likelihood models |
| @subsection Least squares |
| |
| One way to fit a nonlinear model is by minimizing the sum of the squared |
| errors (SSE) or residuals. This method makes sense if the observed |
| errors could have plausibly arisen from a normal distribution. |
| |
| Here is an example from Bates & Watts (1988), page 51. The data are: |
| |
| @example |
| > x <- c(0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56, |
|          1.10, 1.10) |
| > y <- c(76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200) |
| @end example |
| |
| The fit criterion to be minimized is: |
| |
| @example |
| > fn <- function(p) sum((y - (p[1] * x)/(p[2] + x))^2) |
| @end example |
| |
| In order to do the fit we need initial estimates of the parameters. One |
| way to find sensible starting values is to plot the data, guess some |
| parameter values, and superimpose the model curve using those values. |
| |
| @example |
| > plot(x, y) |
| > xfit <- seq(.02, 1.1, .05) |
| > yfit <- 200 * xfit/(0.1 + xfit) |
| > lines(spline(xfit, yfit)) |
| @end example |
| |
| We could do better, but these starting values of 200 and 0.1 seem |
| adequate. Now do the fit: |
| |
| @example |
| > out <- nlm(fn, p = c(200, 0.1), hessian = TRUE) |
| @end example |
| @findex nlm |
| |
| After the fitting, @code{out$minimum} is the SSE, and |
| @code{out$estimate} are the least squares estimates of the parameters. |
| To obtain the approximate standard errors (SE) of the estimates we do: |
| |
| @example |
| > sqrt(diag(2*out$minimum/(length(y) - 2) * solve(out$hessian))) |
| @end example |
| |
| The @code{2} which is subtracted in the line above represents the number |
| of parameters. An approximate 95% confidence interval would be the parameter |
| estimate @eqn{\pm, +/-} 1.96 SE. We can superimpose the least squares |
| fit on a new plot: |
| |
| @example |
| > plot(x, y) |
| > xfit <- seq(.02, 1.1, .05) |
| > yfit <- 212.68384222 * xfit/(0.06412146 + xfit) |
| > lines(spline(xfit, yfit)) |
| @end example |
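Typing the estimates by hand, as above, is error-prone. The following sketch gathers the steps of this subsection into one script. It computes approximate 95% intervals as estimate @eqn{\pm, +/-} 1.96 SE and draws the fitted curve from @code{out$estimate} directly. The data and criterion are repeated so that the block runs on its own.

@example
x <- c(0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56,
       1.10, 1.10)
y <- c(76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200)
fn <- function(p) sum((y - (p[1] * x)/(p[2] + x))^2)
out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)

est <- out$estimate
## Approximate SEs: residual variance times 2 * inverse Hessian of the SSE
se <- sqrt(diag(2 * out$minimum/(length(y) - 2) * solve(out$hessian)))
## Approximate 95% intervals: estimate +/- 1.96 SE
cbind(estimate = est, lower = est - 1.96 * se, upper = est + 1.96 * se)

## Superimpose the fitted curve without retyping the estimates
plot(x, y)
xfit <- seq(.02, 1.1, .05)
lines(spline(xfit, est[1] * xfit/(est[2] + xfit)))
@end example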
| |
| The standard package @pkg{stats} provides much more extensive facilities |
| for fitting non-linear models by least squares. The model we have just |
| fitted is the Michaelis-Menten model, so we can use |
| |
| @example |
| > df <- data.frame(x=x, y=y) |
| > fit <- nls(y ~ SSmicmen(x, Vm, K), df) |
| > fit |
| Nonlinear regression model |
|   model:  y ~ SSmicmen(x, Vm, K) |
|    data:  df |
|           Vm          K |
| 212.68370711 0.06412123 |
|  residual sum-of-squares:  1195.449 |
| > summary(fit) |
| |
| Formula: y ~ SSmicmen(x, Vm, K) |
| |
| Parameters: |
|     Estimate Std. Error t value Pr(>|t|) |
| Vm 2.127e+02  6.947e+00  30.615 3.24e-11 |
| K  6.412e-02  8.281e-03   7.743 1.57e-05 |
| |
| Residual standard error: 10.93 on 10 degrees of freedom |
| |
| Correlation of Parameter Estimates: |
|       Vm |
| K 0.7651 |
| @end example |
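@code{SSmicmen} is a @emph{self-starting} model, which is why no starting values were needed. If the model is written out as an ordinary formula, @code{nls()} needs explicit starting values, and the guesses used with @code{nlm()} above serve for this. The following sketch compares the two fits and draws the curve with @code{predict()}.

@example
x <- c(0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56,
       1.10, 1.10)
y <- c(76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200)
df <- data.frame(x = x, y = y)

fit  <- nls(y ~ SSmicmen(x, Vm, K), df)           # self-starting
fit2 <- nls(y ~ Vm * x/(K + x), data = df,
            start = list(Vm = 200, K = 0.1))    # explicit start
rbind(selfStart = coef(fit), explicit = coef(fit2))

## Fitted curve on a fine grid of x values
xfit <- seq(0.02, 1.1, length.out = 100)
plot(x, y)
lines(xfit, predict(fit2, newdata = data.frame(x = xfit)))
@end example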
| |
| @node Maximum likelihood, , Least squares, Nonlinear least squares and maximum likelihood models |
| @subsection Maximum likelihood |
| @cindex Maximum likelihood |
| |
| Maximum likelihood is a method of nonlinear model fitting that applies |
| even if the errors are not normal. The method finds the parameter values |
| which maximize the log likelihood, or equivalently which minimize the |
| negative log-likelihood. Here is an example from Dobson (1990), pp.@: |
| 108--111. This example fits a logistic model to dose-response data, |
| which clearly could also be fit by @code{glm()}. The data are: |
| |
| @example |
| > x <- c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, |
|          1.8369, 1.8610, 1.8839) |
| > y <- c( 6, 13, 18, 28, 52, 53, 61, 60) |
| > n <- c(59, 60, 62, 56, 63, 59, 62, 60) |
| @end example |
| |
| The negative log-likelihood to minimize is: |
| |
| @example |
| > fn <- function(p) |
|      sum( - (y*(p[1]+p[2]*x) - n*log(1+exp(p[1]+p[2]*x)) |
|              + log(choose(n, y)) )) |
| @end example |
| |
| @noindent |
| We pick sensible starting values and do the fit: |
| |
| @example |
| > out <- nlm(fn, p = c(-50,20), hessian = TRUE) |
| @end example |
| @findex nlm |
| |
| @noindent |
| After the fitting, @code{out$minimum} is the negative log-likelihood, |
| and @code{out$estimate} are the maximum likelihood estimates of the |
| parameters. To obtain the approximate SEs of the estimates we do: |
| |
| @example |
| > sqrt(diag(solve(out$hessian))) |
| @end example |
| |
| An approximate 95% confidence interval would be the parameter estimate @eqn{\pm, +/-} |
| 1.96 SE. |
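Because this model is an ordinary logistic regression, the @code{nlm()} results can be checked against @code{glm()}. For binomial data, @code{glm()} takes the numbers of successes and failures as a two-column matrix response. A sketch:

@example
x <- c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113,
       1.8369, 1.8610, 1.8839)
y <- c( 6, 13, 18, 28, 52, 53, 61, 60)
n <- c(59, 60, 62, 56, 63, 59, 62, 60)
fn <- function(p)
    sum( - (y*(p[1]+p[2]*x) - n*log(1+exp(p[1]+p[2]*x))
            + log(choose(n, y)) ))
out <- nlm(fn, p = c(-50, 20), hessian = TRUE)

## The same logistic model via glm(): response is cbind(successes, failures)
gfit <- glm(cbind(y, n - y) ~ x, family = binomial)

## Estimates and approximate standard errors side by side
rbind(nlm = out$estimate, glm = coef(gfit))
rbind(nlm = sqrt(diag(solve(out$hessian))), glm = sqrt(diag(vcov(gfit))))
@end example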
| |
| @node Some non-standard models, , Nonlinear least squares and maximum likelihood models, Statistical models in R |
| @section Some non-standard models |
| |
| We conclude this chapter with just a brief mention of some of the other |
| facilities available in @R{} for special regression and data analysis |
| problems. |
| |
| @itemize @bullet |
| @item |
| @cindex Mixed models |
| @strong{Mixed models.} The recommended @CRANpkg{nlme} package provides |
| functions @code{lme()} and @code{nlme()} |
| @findex lme |
| @findex nlme |
| for linear and non-linear mixed-effects models, that is, linear and |
| non-linear regressions in which some of the coefficients correspond to |
| random effects. These functions make heavy use of formulae to specify |
| the models. |
| |
| @item |
| @cindex Local approximating regressions |
| @strong{Local approximating regressions.} The @code{loess()} |
| @findex loess |
| function fits a nonparametric regression by using a locally weighted |
| regression. Such regressions are useful for highlighting a trend in |
| messy data or for data reduction to give some insight into a large data |
| set. |
| |
| Function @code{loess} is in the standard package @pkg{stats}, together |
| with code for projection pursuit regression. |
| @findex loess |
| |
| @item |
| @cindex Robust regression |
| @strong{Robust regression.} There are several functions available for |
| fitting regression models in a way resistant to the influence of extreme |
| outliers in the data. Function @code{lqs} |
| @findex lqs |
| in the recommended package @CRANpkg{MASS} provides state-of-the-art algorithms |
| for highly resistant fits. Less resistant but statistically more |
| efficient methods are also available, for example function |
| @code{rlm} |
| @findex rlm |
| in package @CRANpkg{MASS}. |
| |
| @item |
| @cindex Additive models |
| @strong{Additive models.} This technique aims to construct a regression |
| function from smooth additive functions of the determining variables, |
| usually one for each determining variable. Functions @code{avas} and |
| @code{ace} |
| @findex avas |
| @findex ace |
| in package @CRANpkg{acepack} and functions @code{bruto} and @code{mars} |
| @findex bruto |
| @findex mars |
| in package @CRANpkg{mda} provide some examples of these techniques in |
| user-contributed packages to @R{}. An extension is @strong{Generalized |
| Additive Models}, implemented in user-contributed packages @CRANpkg{gam} and |
| @CRANpkg{mgcv}. |
| |
| @item |
| @cindex Tree-based models |
| @strong{Tree-based models.} Rather than seek an explicit global linear |
| model for prediction or interpretation, tree-based models seek to |
| bifurcate the data, recursively, at critical points of the determining |
| variables in order to partition the data ultimately into groups that are |
| as homogeneous as possible within, and as heterogeneous as possible |
| between. The results often lead to insights that other data analysis |
| methods tend not to yield. |
| |
| Models are again specified in the ordinary linear model form. The model |
| fitting function is @code{tree()}, |
| @findex tree |
| but many other generic functions such as @code{plot()} and @code{text()} |
| are well adapted to displaying the results of a tree-based model fit in |
| a graphical way. |
| |
| Tree models are available in @R{} @emph{via} the user-contributed |
| packages @CRANpkg{rpart} and @CRANpkg{tree}. |
| |
| @end itemize |
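The lines below give a brief sketch of how some of these fitting functions are called. They use the recommended packages @CRANpkg{nlme}, @CRANpkg{MASS}, @CRANpkg{mgcv} and @CRANpkg{rpart}, together with data sets supplied with @R{} or with @CRANpkg{nlme}. The model choices are illustrative only. See each function's help page for the many options available.

@example
library(nlme)    # lme(); the Orthodont data
library(MASS)    # rlm(), lqs()
library(mgcv)    # gam()
library(rpart)   # rpart()

## Mixed model: a random intercept for each child
fm.lme  <- lme(distance ~ age, data = Orthodont, random = ~ 1 | Subject)
## Local regression (loess() is in the standard package stats)
fm.lo   <- loess(dist ~ speed, data = cars)
## Robust regression: an M-estimator and a highly resistant fit
fm.rlm  <- rlm(stack.loss ~ ., data = stackloss)
set.seed(1)                        # lqs() may use random subsampling
fm.lqs  <- lqs(stack.loss ~ ., data = stackloss)
## Additive model with a smooth term in speed
fm.gam  <- gam(dist ~ s(speed), data = cars)
## Classification tree for the iris species
fm.tree <- rpart(Species ~ ., data = iris)

fixef(fm.lme)
coef(fm.rlm)
@end example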
| |
| @node Graphics, Packages, Statistical models in R, Top |
| @chapter Graphical procedures |
| |
| Graphical facilities are an important and extremely versatile component |
| of the @R{} environment. It is possible to use the facilities to |
| display a wide variety of statistical graphs and also to build entirely |
| new types of graph. |
| |
| The graphics facilities can be used in both interactive and batch modes, |
| but in most cases, interactive use is more productive. Interactive use |
| is also easy because, when a plot is first requested, @R{} initiates |
| a graphics @emph{device driver} which opens a special |
| @emph{graphics window} for the display of interactive graphics. |
| Although this is done automatically, it may be useful to know that the |
| command used is |
| @code{X11()} under UNIX, @code{windows()} under Windows and |
| @code{quartz()} under macOS. A new device can always be opened by |
| @code{dev.new()}. |
| |
| Once the device driver is running, @R{} plotting commands can be used to |
| produce a variety of graphical displays and to create entirely new kinds |
| of display. |
| |
| Plotting commands are divided into three basic groups: |
| |
| @itemize @bullet |
| @item |
| @strong{High-level} plotting functions create a new plot on the graphics |
| device, possibly with axes, labels, titles and so on. |
| @item |
| @strong{Low-level} plotting functions add more information to an |
| existing plot, such as extra points, lines and labels. |
| @item |
| @strong{Interactive} graphics functions allow you to interactively add |
| information to, or extract information from, an existing plot, using a |
| pointing device such as a mouse. |
| @end itemize |
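The following short sketch shows the first two groups. It opens a @code{pdf()} file device explicitly, so that it also runs in batch mode. Interactively you can omit the @code{pdf()} and @code{dev.off()} lines. Interactive functions are left out because they need a screen device.

@example
tmp <- tempfile(fileext = ".pdf")
pdf(tmp)                          # open a file device
set.seed(1)
x <- 1:20
y <- sqrt(x) + rnorm(20, sd = 0.2)
plot(x, y)                        # high-level: starts a new plot
lines(x, sqrt(x), lty = 2)        # low-level: adds to the existing plot
title(sub = "dashed line: sqrt(x)")
dev.off()                         # close the device and write the file
@end example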
| |
| In addition, @R{} maintains a list of @emph{graphical parameters} which |
| can be manipulated to customize your plots. |
| |
| This manual only describes what are known as `base' graphics. A |
| separate graphics sub-system in package @pkg{grid} coexists with base -- |
| it is more powerful but harder to use. There is a recommended package |
| @CRANpkg{lattice} which builds on @pkg{grid} and provides ways to produce |
| multi-panel plots akin to those in the @emph{Trellis} system in @Sl{}. |
| |
| @menu |
| * High-level plotting commands:: |
| * Low-level plotting commands:: |
| * Interacting with graphics:: |
| * Using graphics parameters:: |
| * Graphics parameters:: |
| * Device drivers:: |
| * Dynamic graphics:: |
| @end menu |
| |
| @node High-level plotting commands, Low-level plotting commands, Graphics, Graphics |
| @section High-level plotting commands |
| |
| High-level plotting functions are designed to generate a complete plot |
| of the data passed as arguments to the function. Where appropriate, |
| axes, labels and titles are automatically generated (unless you request |
| otherwise.) High-level plotting commands always start a new plot, |
| erasing the current plot if necessary. |
| |
| @menu |
| * The plot() function:: |
| * Displaying multivariate data:: |
| * Display graphics:: |
| * Arguments to high-level plotting functions:: |
| @end menu |
| |
| @node The plot() function, Displaying multivariate data, High-level plotting commands, High-level plotting commands |
| @subsection The @code{plot()} function |
| @findex plot |
| |
| One of the most frequently used plotting functions in @R{} is the |
| @code{plot()} function. This is a @emph{generic} function: the type of |
| plot produced is dependent on the type or @emph{class} of the first |
| argument. |
| |
| @table @code |
| |
| @item plot(@var{x}, @var{y}) |
| @itemx plot(@var{xy}) |
| If @var{x} and @var{y} are vectors, @code{plot(@var{x}, @var{y})} |
| produces a scatterplot of @var{y} against @var{x}. The same effect can |
| be produced by supplying one argument (second form) as either a list |
| containing two elements @var{x} and @var{y} or a two-column matrix. |
| |
| @item plot(@var{x}) |
| If @var{x} is a time series, this produces a time-series plot. If |
| @var{x} is a numeric vector, it produces a plot of the values in the |
| vector against their index in the vector. If @var{x} is a complex |
| vector, it produces a plot of imaginary versus real parts of the vector |
| elements. |
| |
| @item plot(@var{f}) |
| @itemx plot(@var{f}, @var{y}) |
| @var{f} is a factor object, @var{y} is a numeric vector. The first form |
| generates a bar plot of @var{f}; the second form produces boxplots of |
| @var{y} for each level of @var{f}. |
| |
| @item plot(@var{df}) |
| @itemx plot(~ @var{expr}) |
| @itemx plot(@var{y} ~ @var{expr}) |
| @var{df} is a data frame, @var{y} is any object, @var{expr} is a list |
| of object names separated by `@code{+}' (e.g., @code{a + b + c}). The |
| first two forms produce distributional plots of the variables in a data |
| frame (first form) or of a number of named objects (second form). The |
| third form plots @var{y} against every object named in @var{expr}. |
| @end table |
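The following sketch illustrates the forms listed above. It uses data sets supplied with @R{} (@code{cars}, @code{ldeaths}, @code{mtcars}, @code{iris}), and it again writes to a temporary PDF file so that it runs in batch mode.

@example
tmp <- tempfile(fileext = ".pdf")
pdf(tmp)
plot(cars$speed, cars$dist)           # two vectors: scatterplot
plot(cbind(cars$speed, cars$dist))    # two-column matrix: the same plot
plot(ldeaths)                         # time series: time-series plot
plot(rnorm(50))                       # numeric vector: values against index
plot(exp(1i * seq(0, 2 * pi, length.out = 50)))  # complex: Im against Re
cyl <- factor(mtcars$cyl)
plot(cyl)                             # factor: bar plot
plot(cyl, mtcars$mpg)                 # factor and numeric: boxplots
plot(iris[1:4])                       # data frame: pairwise plots
plot(~ mpg + wt + hp, data = mtcars)  # named objects: pairwise plots
plot(mpg ~ wt + hp, data = mtcars)    # mpg against wt, then against hp
dev.off()
@end example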
| |
| @node Displaying multivariate data, Display graphics, The plot() function, High-level plotting commands |
| @subsection Displaying multivariate data |
| |
| @R{} provides two very useful functions for representing multivariate |
| data. If @code{X} is a numeric matrix or data frame, the command |
| |
| @example |
| > pairs(X) |
| @end example |
| @findex pairs |
| |
| @noindent |
| produces a pairwise scatterplot matrix of the variables defined by the |
| columns of @code{X}, that is, every column of @code{X} is plotted |
| against every other column of @code{X}. If @code{X} has @math{n} |
| columns, the resulting @math{n(n-1)} plots are arranged in a matrix with |
| plot scales constant over the rows and columns of the matrix. |
| |
| When three or four variables are involved a @emph{coplot} may be more |
| enlightening. If @code{a} and @code{b} are numeric vectors and @code{c} |
| is a numeric vector or factor object (all of the same length), then |
| the command |
| |
| @example |
| > coplot(a ~ b | c) |
| @end example |
| @findex coplot |
| |
| @noindent |
| produces a number of scatterplots of @code{a} against @code{b} for given |
| values of @code{c}. If @code{c} is a factor, this simply means that |
| @code{a} is plotted against @code{b} for every level of @code{c}. When |
| @code{c} is numeric, it is divided into a number of @emph{conditioning |
| intervals} and for each interval @code{a} is plotted against @code{b} |
| for values of @code{c} within the interval. The number and position of |
| intervals can be controlled with the @code{given.values=} argument to |
| @code{coplot()}---the function @code{co.intervals()} is useful for |
| selecting intervals. You can also use two @emph{given} variables with a |
| command like |
| |
| @example |
| > coplot(a ~ b | c + d) |
| @end example |
| |
| @noindent |
| which produces scatterplots of @code{a} against @code{b} for every joint |
| conditioning interval of @code{c} and @code{d}. |
| |
| The @code{coplot()} and @code{pairs()} functions both take an argument |
| @code{panel=} which can be used to customize the type of plot which |
| appears in each panel. The default is @code{points()} to produce a |
| scatterplot but by supplying some other low-level graphics function of |
| two vectors @code{x} and @code{y} as the value of @code{panel=} you can |
| produce any type of plot you wish. An example panel function useful for |
| coplots is @code{panel.smooth()}. |
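This sketch applies these ideas to data sets supplied with @R{}. It draws a scatterplot matrix with smooths. It then draws a coplot conditioned on four depth intervals chosen by @code{co.intervals()}, followed by a coplot with two given variables. Output again goes to a temporary PDF file.

@example
tmp <- tempfile(fileext = ".pdf")
pdf(tmp)
pairs(iris[1:4], panel = panel.smooth)     # scatterplot matrix with smooths

## Condition on depth, using 4 intervals from co.intervals()
depth.int <- co.intervals(quakes$depth, number = 4)
coplot(lat ~ long | depth, data = quakes,
       given.values = depth.int, panel = panel.smooth)

## Two given variables: 3 depth intervals by 4 magnitude intervals
coplot(lat ~ long | depth * mag, data = quakes, number = c(3, 4))
dev.off()
@end example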
| |
| @node Display graphics, Arguments to high-level plotting functions, Displaying multivariate data, High-level plotting commands |
| @subsection Display graphics |
| |
| Other high-level graphics functions produce different types of plots. |
| Some examples are: |
| |
| @table @code |
| @c @item tsplot(x_1, x_2, @dots{}) |
| @c @findex tsplot |
| @c Plots any number of time series on the same scale. This automatic |
| @c simultaneous scaling feature is also useful when the @code{x@var{_i}}'s |
| @c are ordinary numeric vectors, in which case they are plotted against the |
| @c numbers @math{1, 2, 3, @dots{}}. |
| |
| @item qqnorm(x) |
| @itemx qqline(x) |
| @itemx qqplot(x, y) |
| @findex qqnorm |
| @findex qqline |
| @findex qqplot |
| Distribution-comparison plots. The first form plots the numeric vector |
| @code{x} against the expected Normal order scores (a normal scores plot) |
| and the second adds a straight line to such a plot by drawing a line |
| through the distribution and data quartiles. The third form plots the |
| quantiles of @code{x} against those of @code{y} to compare their |
| respective distributions. |
| |
| @item hist(x) |
| @itemx hist(x, nclass=@var{n}) |
| @itemx hist(x, breaks=@var{b}, @dots{}) |
| @findex hist |
| Produces a histogram of the numeric vector @code{x}. A sensible number |
| of classes is usually chosen, but a recommendation can be given with the |
| @code{nclass=} argument. Alternatively, the breakpoints can be |
| specified exactly with the @code{breaks=} argument. If the |
| @code{probability=TRUE} argument is given, the bars represent relative |
| frequencies divided by bin width instead of counts. |
| |
| @item dotchart(x, @dots{}) |
| @findex dotchart |
| Constructs a dotchart of the data in @code{x}. In a dotchart the |
| @math{y}-axis gives a labelling of the data in @code{x} and the |
| @math{x}-axis gives its value. For example it allows easy visual |
| selection of all data entries with values lying in specified ranges. |
| |
| @item image(x, y, z, @dots{}) |
| @itemx contour(x, y, z, @dots{}) |
| @itemx persp(x, y, z, @dots{}) |
| @findex image |
| @findex contour |
| @findex persp |
| Plots of three variables. The @code{image} plot draws a grid of rectangles |
| using different colours to represent the value of @code{z}, the @code{contour} |
| plot draws contour lines to represent the value of @code{z}, and the |
| @code{persp} plot draws a 3D surface. |
| @end table |
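The following sketch calls each of these functions once. It uses simulated samples and the @code{VADeaths} and @code{volcano} data sets supplied with @R{}. The final lines show that @code{hist()} can also return its counts without plotting.

@example
set.seed(1)
x <- rnorm(100)
y <- rexp(100)
brk <- seq(floor(min(x)), ceiling(max(x)), by = 0.5)
tmp <- tempfile(fileext = ".pdf")
pdf(tmp)
qqnorm(x); qqline(x)                  # normal scores plot with quartile line
qqplot(x, y)                          # quantiles of x against those of y
hist(x, breaks = brk, probability = TRUE)
dotchart(VADeaths[, "Rural Male"])    # labelled values
image(volcano)                        # grid of coloured rectangles
contour(volcano)                      # contour lines
persp(volcano, theta = 30, phi = 30, expand = 0.5)   # 3D surface
dev.off()

h <- hist(x, breaks = brk, plot = FALSE)
sum(h$counts)                         # one count per observation
@end example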
| |
| @node Arguments to high-level plotting functions, , Display graphics, High-level plotting commands |
| @subsection Arguments to high-level plotting functions |
| |
| There are a number of arguments which may be passed to high-level |
| graphics functions, as follows: |
| |
| @table @code |
| @item add=TRUE |
| Forces the function to act as a low-level graphics function, |
| superimposing the plot on the current plot (some functions only). |
| |
| @item axes=FALSE |
| Suppresses generation of axes---useful for adding your own custom axes |
| with the @code{axis()} function. The default, @code{axes=TRUE}, means |
| include axes. |
| |
| @item log="x" |
| @itemx log="y" |
| @itemx log="xy" |
| Causes the @math{x}, @math{y} or both axes to be logarithmic. This will |
| work for many, but not all, types of plot. |
| |
| @item type= |
| The @code{type=} argument controls the type of plot produced, as |
| follows: |
| |
| @table @code |
| @item type="p" |
| Plot individual points (the default) |
| @item type="l" |
| Plot lines |
| @item type="b" |
| Plot points connected by lines (@emph{both}) |
| @item type="o" |
| Plot points overlaid by lines |
| @item type="h" |
| Plot vertical lines from points to the zero axis (@emph{high-density}) |
| @item type="s" |
| @itemx type="S" |
| Step-function plots. In the first form, the top of the vertical defines |
| the point; in the second, the bottom. |
| @item type="n" |
| No plotting at all. However axes are still drawn (by default) and the |
| coordinate system is set up according to the data. Ideal for creating |
| plots with subsequent low-level graphics functions. |
| @end table |
| |
| @item xlab=@var{string} |
| @itemx ylab=@var{string} |
| Axis labels for the @math{x} and @math{y} axes. Use these arguments to |
| change the default labels, usually the names of the objects used in the |
| call to the high-level plotting function. |
| |
| @item main=@var{string} |
| Figure title, placed at the top of the plot in a large font. |
| |
| @item sub=@var{string} |
| Sub-title, placed just below the @math{x}-axis in a smaller font. |
| @end table |
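The following sketch combines several of these arguments. It draws a logarithmic y axis, connects points with lines, suppresses the axes so that custom ones can be added with @code{axis()}, sets labels and titles, and finally sets up a plot without drawing anything.

@example
tmp <- tempfile(fileext = ".pdf")
pdf(tmp)
x <- 1:10
y <- 2^x
plot(x, y, log = "y", type = "b", axes = FALSE,
     xlab = "Exponent", ylab = "Value (log scale)",
     main = "Powers of two", sub = "axes added separately")
axis(1)                          # default x axis
axis(2, at = 2^c(1, 5, 10))      # chosen tick positions on the y axis
box()                            # frame the plot region
plot(x, y, type = "s")           # step function
plot(x, y, type = "n")           # coordinates set up, nothing plotted
dev.off()
@end example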
| |
| @node Low-level plotting commands, Interacting with graphics, High-level plotting commands, Graphics |
| @section Low-level plotting commands |
| |
| Sometimes the high-level plotting functions don't produce exactly the |
| kind of plot you desire. In this case, low-level plotting commands can |
| be used to add extra information (such as points, lines or text) to the |
| current plot. |
| |
| Some of the more useful low-level plotting functions are: |
| |
| @table @code |
| @item points(x, y) |
| @itemx lines(x, y) |
| @findex points |
| @findex lines |
| Adds points or connected lines to the current plot. @code{plot()}'s |
| @code{type=} argument can also be passed to these functions (and |
| defaults to @code{"p"} for @code{points()} and @code{"l"} for |
| @code{lines()}.) |
| |
| @item text(x, y, labels, @dots{}) |
| @findex text |
| Adds text to a plot at the points given by @code{x, y}. Normally |
| @code{labels} is an integer or character vector in which case |
| @code{labels[i]} is plotted at point @code{(x[i], y[i])}. The default |
| is @code{1:length(x)}. |
| |
| @strong{Note}: This function is often used in the sequence |
| |
| @example |
| > plot(x, y, type="n"); text(x, y, names) |
| @end example |
| |
| @noindent |
| The graphics parameter @code{type="n"} suppresses the points but sets up |
| the axes, and the @code{text()} function supplies special characters, as |
| specified by the character vector @code{names} for the points. |
| |
| @item abline(a, b) |
| @itemx abline(h=@var{y}) |
| @itemx abline(v=@var{x}) |
| @itemx abline(@var{lm.obj}) |
| @findex abline |
| Adds a line of slope @code{b} and intercept @code{a} to the current |
| plot. @code{h=@var{y}} may be used to specify @math{y}-coordinates for |
| the heights of horizontal lines to go across a plot, and |
| @code{v=@var{x}} similarly for the @math{x}-coordinates for vertical |
| lines. Also @var{lm.obj} may be a list with a @code{coefficients} |
| component of length 2 (such as the result of model-fitting functions,) |
| which are taken as an intercept and slope, in that order. |
| |
| @item polygon(x, y, @dots{}) |
| @findex polygon |
| Draws a polygon defined by the ordered vertices in (@code{x}, @code{y}) |
| and (optionally) shades it in with hatch lines, or fills it if the |
| graphics device allows the filling of figures. |
| |
| @item legend(x, y, legend, @dots{}) |
| @findex legend |
| Adds a legend to the current plot at the specified position. Plotting |
| characters, line styles, colors etc., are identified with the labels in |
| the character vector @code{legend}. At least one other argument @var{v} |
| (a vector the same length as @code{legend}) with the corresponding |
| values of the plotting unit must also be given, as follows: |
| |
| @table @code |
| @item legend( , fill=@var{v}) |
| Colors for filled boxes |
| @item legend( , col=@var{v}) |
| Colors in which points or lines will be drawn |
| @item legend( , lty=@var{v}) |
| Line styles |
| @item legend( , lwd=@var{v}) |
| Line widths |
| @item legend( , pch=@var{v}) |
| Plotting characters (character vector) |
| @end table |
| |
| @item title(main, sub) |
| @findex title |
| Adds a title @code{main} to the top of the current plot in a large font |
| and (optionally) a sub-title @code{sub} at the bottom in a smaller font. |
| |
| @item axis(side, @dots{}) |
| @findex axis |
| Adds an axis to the current plot on the side given by the first argument |
| (1 to 4, counting clockwise from the bottom.) Other arguments control |
| the positioning of the axis within or beside the plot, and tick |
| positions and labels. Useful for adding custom axes after calling |
| @code{plot()} with the @code{axes=FALSE} argument. |
| @end table |
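The following sketch builds a plot of the @code{cars} data entirely from these low-level functions. It starts from a plot set up with @code{type = "n"} and @code{axes = FALSE}. The positions of the hatched box and the labelled points are arbitrary choices.

@example
tmp <- tempfile(fileext = ".pdf")
pdf(tmp)
plot(cars, type = "n", axes = FALSE)          # coordinates only
points(cars, pch = 16)
fit <- lm(dist ~ speed, data = cars)
abline(fit, lty = 2)                          # intercept and slope from fit
abline(h = mean(cars$dist), col = "grey")     # horizontal reference line
text(cars$speed[49:50], cars$dist[49:50], labels = c("49", "50"), pos = 2)
polygon(c(5, 10, 10, 5), c(100, 100, 120, 120), density = 10)  # hatched box
legend("topleft", legend = c("least squares", "mean distance"),
       lty = c(2, 1), col = c("black", "grey"))
title(main = "Stopping distances", sub = "built with low-level functions")
axis(1); axis(2)
dev.off()
@end example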
| |
| Low-level plotting functions usually require some positioning |
| information (e.g., @math{x} and @math{y} coordinates) to determine where |
| to place the new plot elements. Coordinates are given in terms of |
| @emph{user coordinates} which are defined by the previous high-level |
| graphics command and are chosen based on the supplied data. |
| |
| Where @code{x} and @code{y} arguments are required, it is also |
| sufficient to supply a single argument being a list with elements named |
| @code{x} and @code{y}. Similarly a matrix with two columns is also |
| valid input. In this way functions such as @code{locator()} (see below) |
| may be used to specify positions on a plot interactively. |
| |
| @menu |
| * Mathematical annotation:: |
| * Hershey vector fonts:: |
| @end menu |
| |
| @node Mathematical annotation, Hershey vector fonts, Low-level plotting commands, Low-level plotting commands |
| @subsection Mathematical annotation |
| |
| In some cases, it is useful to add mathematical symbols and formulae to a |
| plot. This can be achieved in @R{} by specifying an @emph{expression} rather |
| than a character string in any one of @code{text}, @code{mtext}, @code{axis}, |
| or @code{title}. For example, the following code draws the formula for |
| the Binomial probability function: |
| |
| @example |
| > text(x, y, expression(paste(bgroup("(", atop(n, x), ")"), p^x, q^@{n-x@}))) |
| @end example |
| |
| More information, including a full listing of the features available, |
| can be obtained from within @R{} using the commands: |
| |
| @example |
| > help(plotmath) |
| > example(plotmath) |
| > demo(plotmath) |
| @end example |
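The following sketch places expressions in axis labels, text and a title. It also uses @code{bquote()} to substitute a computed value into a formula. The value @code{p.hat} is arbitrary.

@example
p.hat <- 0.3456
tmp <- tempfile(fileext = ".pdf")
pdf(tmp)
plot(1:10, 1:10, type = "n",
     xlab = expression(sigma^2), ylab = expression(beta[i]))
text(5, 8, expression(bar(x) == frac(1, n) * sum(x[i], i == 1, n)))
text(5, 5, bquote(hat(p) == .(round(p.hat, 2))))   # value substituted
title(main = expression(integral(f(x) * dx, a, b)))
dev.off()
@end example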
| |
| @node Hershey vector fonts, , Mathematical annotation, Low-level plotting commands |
| @subsection Hershey vector fonts |
| |
| It is possible to specify Hershey vector fonts for rendering text when using |
| the @code{text} and @code{contour} functions. There are three reasons for |
| using the Hershey fonts: |
| @itemize @bullet |
| @item |
| Hershey fonts can produce better |
| output, especially on a computer screen, for rotated and/or small text. |
| @item |
| Hershey fonts |
| provide certain symbols that may not be available |
| in the standard fonts. In particular, there are zodiac signs, cartographic |
| symbols and astronomical symbols. |
| @item |
| Hershey fonts provide Cyrillic and Japanese (Kana and Kanji) characters. |
| @end itemize |
| |
| More information, including tables of Hershey characters, can be obtained from |
| within @R{} using the commands: |
| |
| @example |
| > help(Hershey) |
| > demo(Hershey) |
| > help(Japanese) |
| > demo(Japanese) |
| @end example |
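In @code{text()} and @code{contour()}, a Hershey font is selected with the @code{vfont} argument, which takes a typeface and a font index. A sketch, including small rotated text:

@example
tmp <- tempfile(fileext = ".pdf")
pdf(tmp)
plot(1:10, 1:10, type = "n")
text(3, 8, "Hershey serif, plain", vfont = c("serif", "plain"))
text(5, 5, "small rotated text", vfont = c("serif", "bold"),
     srt = 30, cex = 0.8)
contour(volcano, vfont = c("sans serif", "plain"))   # contour labels
dev.off()
@end example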
| |
| @node Interacting with graphics, Using graphics parameters, Low-level plotting commands, Graphics |
| @section Interacting with graphics |
| |
| @R{} also provides functions which allow users to extract or add |
| information to a plot using a mouse. The simplest of these is the |
| @code{locator()} function: |
| |
| @table @code |
| @item locator(n, type) |
| @findex locator |
| Waits for the user to select locations on the current plot using the |
| left mouse button. This continues until @code{n} (default 512) points |
| have been selected, or another mouse button is pressed. The |
| @code{type} argument allows for plotting at the selected points and has |
| the same effect as for high-level graphics commands; the default is no |
| plotting. @code{locator()} returns the locations of the points selected |
| as a list with two components @code{x} and @code{y}. |
| @end table |
| |
| @code{locator()} is usually called with no arguments. It is |
| particularly useful for interactively selecting positions for graphic |
| elements such as legends or labels when it is difficult to calculate in |
| advance where the graphic should be placed. For example, to place some |
| informative text near an outlying point, the command |
| |
| @example |
| > text(locator(1), "Outlier", adj=0) |
| @end example |
| |
| @noindent |
| may be useful. (@code{locator()} will be ignored if the current device, |
| such as @code{postscript}, does not support interactive pointing.) |
| |
| @table @code |
| @item identify(x, y, labels) |
| @findex identify |
| Allows the user to highlight any of the points defined by @code{x} and |
| @code{y} (using the left mouse button) by plotting the corresponding |
| component of @code{labels} nearby (or the index number of the point if |
| @code{labels} is absent). Returns the indices of the selected points |
| when another button is pressed. |
| @end table |
| |
| Sometimes we want to identify particular @emph{points} on a plot, rather |
| than their positions. For example, we may wish the user to select some |
| observation of interest from a graphical display and then manipulate |
| that observation in some way. Given a number of @math{(x, y)} |
| coordinates in two numeric vectors @code{x} and @code{y}, we could use |
| the @code{identify()} function as follows: |
| |
| @example |
| > plot(x, y) |
| > identify(x, y) |
| @end example |
| |
| The @code{identify()} function does not draw the plot itself, but simply |
| allows the user to move the mouse pointer and click the left mouse |
| button near a point. If there is a point near the mouse pointer it will |
| be marked with its index number (that is, its position in the |
| @code{x}/@code{y} vectors) plotted nearby. Alternatively, you could use |
| some informative string (such as a case name) as a highlight by using |
| the @code{labels} argument to @code{identify()}, or disable marking |
| altogether with the @code{plot = FALSE} argument. When the process is |
| terminated (see above), @code{identify()} returns the indices of the |
| selected points; you can use these indices to extract the selected |
| points from the original vectors @code{x} and @code{y}. |
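@code{identify()} needs a device that supports interactive pointing. The following sketch uses the @code{mtcars} data supplied with @R{}. It calls @code{identify()} only when @code{dev.interactive()} reports an interactive device and otherwise falls back to an empty selection. It then uses the returned indices to extract the chosen cases.

@example
x <- mtcars$wt
y <- mtcars$mpg
plot(x, y)
## identify() needs an interactive device; otherwise select nothing
sel <- if (dev.interactive())
    identify(x, y, labels = rownames(mtcars)) else integer(0)
mtcars[sel, c("wt", "mpg")]    # the selected cases, via the returned indices
@end example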
| |
| @node Using graphics parameters, Graphics parameters, Interacting with graphics, Graphics |
| @section Using graphics parameters |
| |
| When creating graphics, particularly for presentation or publication |
| purposes, @R{}'s defaults do not always produce exactly what is |
| required. You can, however, customize almost every aspect of the |
| display using @emph{graphics parameters}. @R{} maintains a list of a |
| large number of graphics parameters which control things such as line |
| style, colors, figure arrangement and text justification among many |
| others. Every graphics parameter has a name (such as `@code{col}', |
| which controls colors,) and a value (a color number, for example.) |
| |
| A separate list of graphics parameters is maintained for each active |
| device, and each device has a default set of parameters when |
| initialized. Graphics parameters can be set in two ways: either |
| permanently, affecting all graphics functions which access the current |
| device; or temporarily, affecting only a single graphics function call. |
| |
| @menu |
| * The par() function:: |
| * Arguments to graphics functions:: |
| @end menu |
| |
| @node The par() function, Arguments to graphics functions, Using graphics parameters, Using graphics parameters |
| @subsection Permanent changes: The @code{par()} function |
| @findex par |
| @cindex Graphics parameters |
| |
| The @code{par()} function is used to access and modify the list of |
| graphics parameters for the current graphics device. |
| |
| @table @code |
| @item par() |
| Without arguments, returns a list of all graphics parameters and their |
| values for the current device. |
| @item par(c("col", "lty")) |
With a character vector argument, returns only the named graphics
parameters (again, as a list).
| @item par(col=4, lty=2) |
| With named arguments (or a single list argument), sets the values of |
| the named graphics parameters, and returns the original values of the |
| parameters as a list. |
| @end table |
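You can try all three forms of call without a screen device. The following is a minimal sketch. It assumes a PDF device that writes to a temporary file (from @code{tempfile()}), so that it runs non-interactively. The variable names are illustrative only.

@example
f <- tempfile(fileext = ".pdf")
pdf(f)                          # open a device for par() to act on
all_pars <- par()               # list of all graphics parameters
two <- par(c("col", "lty"))     # list with just the named two
old <- par(col = 4, lty = 2)    # set; returns the previous values
now <- par("lty")               # one name gives the value itself
par(old)                        # restore the previous values
restored <- par("lty")
dev.off()
@end example

Note that querying a single name returns the value directly rather than a one-element list.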
| |
| Setting graphics parameters with the @code{par()} function changes the |
| value of the parameters @emph{permanently}, in the sense that all future |
| calls to graphics functions (on the current device) will be affected by |
| the new value. You can think of setting graphics parameters in this way |
| as setting ``default'' values for the parameters, which will be used by |
| all graphics functions unless an alternative value is given. |
| |
| Note that calls to @code{par()} @emph{always} affect the global values |
| of graphics parameters, even when @code{par()} is called from within a |
| function. This is often undesirable behavior---usually we want to set |
| some graphics parameters, do some plotting, and then restore the |
| original values so as not to affect the user's @R{} session. You can |
| restore the initial values by saving the result of @code{par()} when |
| making changes, and restoring the initial values when plotting is |
| complete. |
| |
| @example |
| > oldpar <- par(col=4, lty=2) |
| @r{@dots{} plotting commands @dots{}} |
| > par(oldpar) |
| @end example |
| |
| @noindent |
| To save and restore @emph{all} settable@footnote{Some graphics |
| parameters such as the size of the current device are for information |
| only.} graphical parameters use |
| |
| @example |
| > oldpar <- par(no.readonly=TRUE) |
| @r{@dots{} plotting commands @dots{}} |
| > par(oldpar) |
| @end example |
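To see which parameters the footnote refers to, you can compare the full list with the settable one. This sketch again assumes a throw-away PDF device.

@example
pdf(tempfile(fileext = ".pdf"))
all_names <- names(par())
settable <- names(par(no.readonly = TRUE))
read_only <- setdiff(all_names, settable)
print(read_only)    # includes "din", the device size in inches
dev.off()
@end example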
| |
| |
| @node Arguments to graphics functions, , The par() function, Using graphics parameters |
| @subsection Temporary changes: Arguments to graphics functions |
| |
| Graphics parameters may also be passed to (almost) any graphics function |
| as named arguments. This has the same effect as passing the arguments |
| to the @code{par()} function, except that the changes only last for the |
| duration of the function call. For example: |
| |
| @example |
| > plot(x, y, pch="+") |
| @end example |
| |
| @noindent |
| produces a scatterplot using a plus sign as the plotting character, |
| without changing the default plotting character for future plots. |
| |
| Unfortunately, this is not implemented entirely consistently and it is |
| sometimes necessary to set and reset graphics parameters using |
| @code{par()}. |
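One way to confirm that an inline argument is temporary is to query @code{par("pch")} before and after the call. The sketch below uses made-up data for @code{x} and @code{y} and a PDF device that writes to a temporary file.

@example
x <- 1:10
y <- x^2
pdf(tempfile(fileext = ".pdf"))
before <- par("pch")
plot(x, y, pch = "+")     # plus signs for this call only
after <- par("pch")       # still the default
op <- par(pch = "+")      # a permanent change, via par()
plot(x, y)                # plus signs without asking
changed <- par("pch")
par(op)                   # undo the permanent change
dev.off()
@end example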
| |
| |
| @node Graphics parameters, Device drivers, Using graphics parameters, Graphics |
| @section Graphics parameters list |
| |
| The following sections detail many of the commonly-used graphical |
| parameters. The @R{} help documentation for the @code{par()} function |
| provides a more concise summary; this is provided as a somewhat more |
| detailed alternative. |
| |
| Graphics parameters will be presented in the following form: |
| |
| @table @code |
| @item @var{name}=@var{value} |
| A description of the parameter's effect. @var{name} is the name of the |
| parameter, that is, the argument name to use in calls to @code{par()} or |
| a graphics function. @var{value} is a typical value you might use when |
| setting the parameter. |
| @end table |
| |
| Note that @code{axes} is @strong{not} a graphics parameter but an |
| argument to a few @code{plot} methods: see @code{xaxt} and @code{yaxt}. |
| |
| @menu |
| * Graphical elements:: |
| * Axes and tick marks:: |
| * Figure margins:: |
| * Multiple figure environment:: |
| @end menu |
| |
| @node Graphical elements, Axes and tick marks, Graphics parameters, Graphics parameters |
| @subsection Graphical elements |
| |
@R{} plots are made up of points, lines, text and polygons (filled
regions). Graphical parameters exist which control how these
@emph{graphical elements} are drawn, as follows:
| |
| @table @code |
| @item pch="+" |
| Character to be used for plotting points. The default varies with |
| graphics drivers, but it is usually |
| @ifnottex |
| a circle. |
| @end ifnottex |
| @tex |
| `$\circ$'. |
| @end tex |
| Plotted points tend to appear slightly above or below the appropriate |
| position unless you use @code{"."} as the plotting character, which |
| produces centered points. |
| |
| @item pch=4 |
| When @code{pch} is given as an integer between 0 and 25 inclusive, a |
| specialized plotting symbol is produced. To see what the symbols are, |
| use the command |
| |
| @example |
| > legend(locator(1), as.character(0:25), pch = 0:25) |
| @end example |
| |
| @noindent |
| Those from 21 to 25 may appear to duplicate earlier symbols, but can be |
| coloured in different ways: see the help on @code{points} and its |
| examples. |
| |
| In addition, @code{pch} can be a character or a number in the range |
| @code{32:255} representing a character in the current font. |
| |
| @item lty=2 |
| Line types. Alternative line styles are not supported on all graphics |
| devices (and vary on those that do) but line type 1 is always a solid |
| line, line type 0 is always invisible, and line types 2 and onwards are |
| dotted or dashed lines, or some combination of both. |
| |
| @item lwd=2 |
| Line widths. Desired width of lines, in multiples of the ``standard'' |
| line width. Affects axis lines as well as lines drawn with |
| @code{lines()}, etc. Not all devices support this, and some have |
| restrictions on the widths that can be used. |
| |
| @item col=2 |
| Colors to be used for points, lines, text, filled regions and images. |
| A number from the current palette (see @code{?palette}) or a named colour. |
| |
| @item col.axis |
| @itemx col.lab |
| @itemx col.main |
| @itemx col.sub |
| The color to be used for axis annotation, @math{x} and @math{y} labels, |
| main and sub-titles, respectively. |
| |
| @item font=2 |
| An integer which specifies which font to use for text. If possible, |
| device drivers arrange so that @code{1} corresponds to plain text, |
| @code{2} to bold face, @code{3} to italic, @code{4} to bold italic |
and @code{5} to a symbol font (which includes Greek letters).
| |
| @item font.axis |
| @itemx font.lab |
| @itemx font.main |
| @itemx font.sub |
| The font to be used for axis annotation, @math{x} and @math{y} labels, |
| main and sub-titles, respectively. |
| |
| @item adj=-0.1 |
| Justification of text relative to the plotting position. @code{0} means |
| left justify, @code{1} means right justify and @code{0.5} means to |
| center horizontally about the plotting position. The actual value is |
| the proportion of text that appears to the left of the plotting |
| position, so a value of @code{-0.1} leaves a gap of 10% of the text width |
| between the text and the plotting position. |
| |
| @item cex=1.5 |
| Character expansion. The value is the desired size of text characters |
| (including plotting characters) relative to the default text size. |
| |
| @item cex.axis |
| @itemx cex.lab |
| @itemx cex.main |
| @itemx cex.sub |
| The character expansion to be used for axis annotation, @math{x} and |
| @math{y} labels, main and sub-titles, respectively. |
| @end table |
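The @code{legend(locator(1), @dots{})} command above waits for a mouse click, so a non-interactive chart can be more convenient in scripts. The sketch below shows one possible layout; the sizes, colours and positions are arbitrary choices. It draws the 26 numbered symbols, using a grey fill for codes 21 to 25. Below them it draws line types 1 to 6 at increasing widths.

@example
f <- tempfile(fileext = ".pdf")
pdf(f, width = 8, height = 4)
plot(0:25, rep(2, 26), pch = 0:25, cex = 1.5, bg = "grey",
     ylim = c(0, 3), axes = FALSE, xlab = "", ylab = "",
     main = "Plotting symbols and line types", font.main = 2)
text(0:25, 2.4, labels = 0:25, cex = 0.8)     # codes above symbols
ys <- (1:6) / 5
segments(0, ys, 25, ys, lty = 1:6, lwd = (1:6) / 2)
text(25, ys, labels = paste("lty =", 1:6), pos = 4, cex = 0.7, xpd = TRUE)
dev.off()
@end example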
| |
| @node Axes and tick marks, Figure margins, Graphical elements, Graphics parameters |
| @subsection Axes and tick marks |
| |
| Many of @R{}'s high-level plots have axes, and you can construct axes |
| yourself with the low-level @code{axis()} graphics function. Axes have |
| three main components: the @emph{axis line} (line style controlled by the |
| @code{lty} graphics parameter), the @emph{tick marks} (which mark off unit |
divisions along the axis line) and the @emph{tick labels} (which mark the
units). These components can be customized with the following graphics
parameters.
| |
| @table @code |
| @item lab=c(5, 7, 12) |
| The first two numbers are the desired number of tick intervals on the |
| @math{x} and @math{y} axes respectively. The third number is the |
desired length of axis labels, in characters (including the decimal
point); @R{} does not currently make use of this third number.
| |
| @item las=1 |
Orientation of axis labels. @code{0} means always parallel to the axis,
@code{1} means always horizontal, @code{2} means always
perpendicular to the axis, and @code{3} means always vertical.
| |
| @item mgp=c(3, 1, 0) |
| Positions of axis components. The first component is the distance from |
| the axis label to the axis position, in text lines. The second |
| component is the distance to the tick labels, and the final component is |
| the distance from the axis position to the axis line (usually zero). |
| Positive numbers measure outside the plot region, negative numbers |
| inside. |
| |
| @item tck=0.01 |
| Length of tick marks, as a fraction of the size of the plotting region. |
| When @code{tck} is small (less than 0.5) the tick marks on the @math{x} |
| and @math{y} axes are forced to be the same size. A value of 1 gives |
| grid lines. Negative values give tick marks outside the plotting |
| region. Use @code{tck=0.01} and @code{mgp=c(1,-1.5,0)} for internal |
| tick marks. |
| |
| @item xaxs="r" |
| @itemx yaxs="i" |
Axis styles for the @math{x} and @math{y} axes, respectively. Style
@code{"i"} (internal) uses the data range (or the supplied limits)
exactly, while style @code{"r"} (the default) first extends that range
by 4% at each end, leaving a small amount of space at the edges; in
both cases tick marks fall within the resulting axis range. (@Sl{} has
other styles not implemented in @R{}.)
| |
| @c Setting this parameter to @code{"d"} (direct axis) @emph{locks in} the |
| @c current axis and uses it for all future plots (or until the parameter is |
| @c set to one of the other values above, at least.) Useful for generating |
| @c series of fixed-scale plots. |
| @end table |
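The sketch below combines several of these parameters: horizontal tick labels (@code{las = 1}), short inward tick marks (a positive @code{tck}) and an @code{"i"}-style @math{y} axis. The data are made up, and the output goes to a temporary PDF file. The recorded @code{par("usr")} limits show the effect of the two axis styles.

@example
x <- seq(0, 10, length.out = 50)
y <- sin(x)
f <- tempfile(fileext = ".pdf")
pdf(f)
op <- par(las = 1, tck = 0.02, yaxs = "i")
plot(x, y, type = "l", xlab = "x", ylab = "sin(x)")
usr <- par("usr")   # x1, x2, y1, y2 of the plot region, in data units
par(op)
dev.off()
usr
@end example

With the default @code{xaxs = "r"}, the @math{x} limits extend slightly beyond 0 and 10. The @math{y} limits coincide with @code{range(y)}.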
| |
| @node Figure margins, Multiple figure environment, Axes and tick marks, Graphics parameters |
| @subsection Figure margins |
| |
| |
A single plot in @R{} is known as a @emph{figure} and comprises a
| @emph{plot region} surrounded by margins (possibly containing axis |
| labels, titles, etc.) and (usually) bounded by the axes themselves. |
| |
| @ifnotinfo |
| A typical figure is |
| |
| @image{images/fig11,7cm} |
| @end ifnotinfo |
| |
| Graphics parameters controlling figure layout include: |
| |
| @table @code |
| @item mai=c(1, 0.5, 0.5, 0) |
| Widths of the bottom, left, top and right margins, respectively, |
| measured in inches. |
| |
| @item mar=c(4, 2, 2, 1) |
| Similar to @code{mai}, except the measurement unit is text lines. |
| @end table |
| |
| @code{mar} and @code{mai} are equivalent in the sense that setting one |
| changes the value of the other. The default values chosen for this |
| parameter are often too large; the right-hand margin is rarely needed, |
| and neither is the top margin if no title is being used. The bottom and |
| left margins must be large enough to accommodate the axis and tick |
| labels. Furthermore, the default is chosen without regard to the size |
| of the device surface: for example, using the @code{postscript()} driver |
| with the @code{height=4} argument will result in a plot which is about |
| 50% margin unless @code{mar} or @code{mai} are set explicitly. When |
multiple figures are in use (see below) the margins are reduced;
however, this may not be enough when many figures share the same page.
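For a small device it usually pays to set the margins explicitly. In the sketch below, the 4-inch-high PDF device and the margin sizes are illustrative assumptions. The sketch also records @code{par("mai")} before and after setting @code{mar}, to show that the inch-based values follow the line-based ones.

@example
f <- tempfile(fileext = ".pdf")
pdf(f, width = 6, height = 4)
mai_default <- par("mai")          # inches, from the default mar
op <- par(mar = c(4, 4, 1, 1))     # bottom, left, top, right, in lines
mai_new <- par("mai")              # recomputed in inches
plot(1:10, xlab = "index", ylab = "value")
par(op)
dev.off()
@end example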
| |
| @node Multiple figure environment, , Figure margins, Graphics parameters |
| @subsection Multiple figure environment |
| |
| @R{} allows you to create an @math{n} by @math{m} array of figures on a |
| single page. Each figure has its own margins, and the array of figures |
| is optionally surrounded by an @emph{outer margin}, as shown in the |
| following figure. |
| |
| @ifnotinfo |
| @image{images/fig12,6cm} |
| @end ifnotinfo |
| |
| The graphical parameters relating to multiple figures are as follows: |
| |
| @table @code |
| @item mfcol=c(3, 2) |
| @itemx mfrow=c(2, 4) |
| Set the size of a multiple figure array. The first value is the number of |
| rows; the second is the number of columns. The only difference between |
| these two parameters is that setting @code{mfcol} causes figures to be |
| filled by column; @code{mfrow} fills by rows. |
| |
| The layout in the Figure could have been created by setting |
| @code{mfrow=c(3,2)}; the figure shows the page after four plots have |
| been drawn. |
| |
| Setting either of these can reduce the base size of symbols and text |
| (controlled by @code{par("cex")} and the pointsize of the device). In a |
layout with exactly two rows and columns the base size is reduced by a
factor of 0.83; if there are three or more of either rows or columns,
the reduction factor is 0.66.
| |
| @item mfg=c(2, 2, 3, 2) |
| Position of the current figure in a multiple figure environment. The first |
| two numbers are the row and column of the current figure; the last two |
are the number of rows and columns in the multiple figure array. Set
this parameter to jump between figures in the array. The four-number
form is accepted for compatibility with @Sl{}; the last two numbers
should match the current array, and @R{} ignores any mismatch.
| |
| @item fig=c(4, 9, 1, 4)/10 |
Position of the current figure on the page. Values are the positions of
the left, right, bottom and top edges respectively, as fractions of the
page width or height measured from the bottom left corner. The example value would
| be for a figure in the bottom right of the page. Set this parameter for |
| arbitrary positioning of figures within a page. If you want to add a |
| figure to a current page, use @code{new=TRUE} as well (unlike S). |
| |
| @item oma=c(2, 0, 3, 0) |
| @itemx omi=c(0, 0, 0.8, 0) |
| Size of outer margins. Like @code{mar} and @code{mai}, the first |
| measures in text lines and the second in inches, starting with the |
| bottom margin and working clockwise. |
| |
| @end table |
| |
| Outer margins are particularly useful for page-wise titles, etc. Text |
| can be added to the outer margins with the @code{mtext()} function with |
| argument @code{outer=TRUE}. There are no outer margins by default, |
| however, so you must create them explicitly using @code{oma} or |
| @code{omi}. |
| |
| More complicated arrangements of multiple figures can be produced by the |
| @code{split.screen()} and @code{layout()} functions, as well as by the |
| @pkg{grid} and @CRANpkg{lattice} packages. |
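The sketch below puts four plots of simulated data on one page with @code{mfrow}. It reserves a three-line outer margin at the top with @code{oma} and writes a page title there with @code{mtext(outer = TRUE)}. The output file and titles are illustrative only.

@example
set.seed(1)
f <- tempfile(fileext = ".pdf")
pdf(f)
op <- par(mfrow = c(2, 2), oma = c(0, 0, 3, 0))
mf <- par("mfrow")
cx <- par("cex")         # base size reduced for a 2 by 2 layout
for (i in 1:4) plot(rnorm(20), main = paste("Panel", i))
mtext("Four panels on one page", side = 3, outer = TRUE, line = 1)
par(op)
dev.off()
@end example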
| |
| @node Device drivers, Dynamic graphics, Graphics parameters, Graphics |
| @section Device drivers |
| @cindex Graphics device drivers |
| |
| @R{} can generate graphics (of varying levels of quality) on almost any |
| type of display or printing device. Before this can begin, however, |
| @R{} needs to be informed what type of device it is dealing with. This |
| is done by starting a @emph{device driver}. The purpose of a device |
| driver is to convert graphical instructions from @R{} (``draw a line,'' |
| for example) into a form that the particular device can understand. |
| |
| Device drivers are started by calling a device driver function. There |
| is one such function for every device driver: type @code{help(Devices)} |
| for a list of them all. For example, issuing the command |
| |
| @example |
| > postscript() |
| @end example |
| |
| @noindent |
causes all future graphics output to be sent, in PostScript format, to
a file (by default @file{Rplots.ps} in the working directory) rather than
to the screen. Some commonly-used device drivers are:
| |
| @table @code |
| @item X11() |
| @findex X11 |
| For use with the X11 window system on Unix-alikes |
| @item windows() |
| @findex windows |
| For use on Windows |
| @item quartz() |
| @findex quartz |
| For use on macOS |
| @item postscript() |
| @findex postscript |
| For printing on PostScript printers, or creating PostScript graphics |
| files. |
| @item pdf() |
| @findex pdf |
Produces a PDF file, which can also be included in other PDF documents.
| @item png() |
| @findex png |
| Produces a bitmap PNG file. (Not always available: see its help page.) |
| @item jpeg() |
| @findex jpeg |
| Produces a bitmap JPEG file, best used for @code{image} plots. |
| (Not always available: see its help page.) |
| @end table |
| |
| When you have finished with a device, be sure to terminate the device |
| driver by issuing the command |
| |
| @example |
| > dev.off() |
| @end example |
| |
| This ensures that the device finishes cleanly; for example in the case |
| of hardcopy devices this ensures that every page is completed and has |
| been sent to the printer. (This will happen automatically at the normal |
| end of a session.) |
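A typical script therefore opens a file device, draws, and closes the device. The following is a minimal sketch using @code{pdf()}. Writing to a temporary file is an assumption made only so that the example leaves nothing behind.

@example
f <- tempfile(fileext = ".pdf")
pdf(f, width = 5, height = 5)    # graphics now go to this file
hist(rnorm(100), main = "A histogram")
dev.off()                        # complete and close the file
ok <- file.exists(f) && file.size(f) > 0
@end example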
| |
| @menu |
| * PostScript diagrams for typeset documents:: |
| * Multiple graphics devices:: |
| @end menu |
| |
| @node PostScript diagrams for typeset documents, Multiple graphics devices, Device drivers, Device drivers |
| @subsection PostScript diagrams for typeset documents |
| |
| By passing the @code{file} argument to the @code{postscript()} device |
| driver function, you may store the graphics in PostScript format in a |
| file of your choice. The plot will be in landscape orientation unless |
| the @code{horizontal=FALSE} argument is given, and you can control the |
| size of the graphic with the @code{width} and @code{height} arguments |
(the plot will be scaled as appropriate to fit these dimensions). For
| example, the command |
| |
| @example |
| > postscript("file.ps", horizontal=FALSE, height=5, pointsize=10) |
| @end example |
| |
| @noindent |
| will produce a file containing PostScript code for a figure five inches |
| high, perhaps for inclusion in a document. It is important to note that |
| if the file named in the command already exists, it will be overwritten. |
| This is the case even if the file was only created earlier in the same |
| @R{} session. |
| |
Many uses of PostScript output will be to incorporate the figure in
| another document. This works best when @emph{encapsulated} PostScript |
| is produced: @R{} always produces conformant output, but only marks the |
| output as such when the @code{onefile=FALSE} argument is supplied. This |
| unusual notation stems from @Sl{}-compatibility: it really means that |
| the output will be a single page (which is part of the EPSF |
| specification). Thus to produce a plot for inclusion use something like |
| |
| @example |
| > postscript("plot1.eps", horizontal=FALSE, onefile=FALSE, |
| height=8, width=6, pointsize=10) |
| @end example |
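You can check whether a file has been marked as encapsulated by reading its first line, which for such output is expected to mention @samp{EPSF}. This sketch writes to a temporary file rather than to @file{plot1.eps}.

@example
f <- tempfile(fileext = ".eps")
postscript(f, horizontal = FALSE, onefile = FALSE,
           height = 8, width = 6, pointsize = 10)
plot(1:10)
dev.off()
header <- readLines(f, n = 1)
header
@end example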
| |
| |
| @node Multiple graphics devices, , PostScript diagrams for typeset documents, Device drivers |
| @subsection Multiple graphics devices |
| |
| In advanced use of @R{} it is often useful to have several graphics |
| devices in use at the same time. Of course only one graphics device can |
| accept graphics commands at any one time, and this is known as the |
| @emph{current device}. When multiple devices are open, they form a |
| numbered sequence with names giving the kind of device at any position. |
| |
| The main commands used for operating with multiple devices, and their |
| meanings are as follows: |
| |
| @table @code |
| @item X11() |
| [UNIX] |
| @item windows() |
| @itemx win.printer() |
| @itemx win.metafile() |
| [Windows] |
| @item quartz() |
| [macOS] |
| @item postscript() |
| @itemx pdf() |
| @item png() |
| @item jpeg() |
| @item tiff() |
| @item bitmap() |
| @itemx @dots{} |
| Each new call to a device driver function opens a new graphics device, |
| thus extending by one the device list. This device becomes the current |
| device, to which graphics output will be sent. |
| |
| @item dev.list() |
| @findex dev.list |
Returns the numbers and names of all open devices. Device number 1 is
always the @emph{null device}, which does not accept graphics commands
at all and is not included in this list.
| |
| @item dev.next() |
| @itemx dev.prev() |
| @findex dev.next |
| @findex dev.prev |
| Returns the number and name of the graphics device next to, or previous |
| to the current device, respectively. |
| |
| @item dev.set(which=@var{k}) |
| @findex dev.set |
| Can be used to change the current graphics device to the one at position |
| @var{k} of the device list. Returns the number and label of the device. |
| |
| @item dev.off(@var{k}) |
| @findex dev.off |
Terminate the graphics device at position @var{k} of the device list. For
| some devices, such as @code{postscript} devices, this will either print |
| the file immediately or correctly complete the file for later printing, |
| depending on how the device was initiated. |
| |
| @item dev.copy(device, @dots{}, which=@var{k}) |
| @itemx dev.print(device, @dots{}, which=@var{k}) |
Copy the contents of device @var{k} to a new device. Here @code{device} is a device
| function, such as @code{postscript}, with extra arguments, if needed, |
| specified by @samp{@dots{}}. @code{dev.print} is similar, but the |
| copied device is immediately closed, so that end actions, such as |
| printing hardcopies, are immediately performed. |
| |
| @item graphics.off() |
| Terminate all graphics devices on the list, except the null device. |
| @end table |
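The sketch below opens two PDF devices that write to temporary files, switches back to the first, and closes both. It counts devices relative to any that are already open.

@example
n_before <- length(dev.list())
pdf(tempfile(fileext = ".pdf"))
d1 <- dev.cur()                    # number (and name) of the first
pdf(tempfile(fileext = ".pdf"))
d2 <- dev.cur()                    # the newer device is now current
n_new <- length(dev.list()) - n_before
dev.set(d1)                        # make the first device current again
plot(1:10)                         # drawn on the first device
cur <- dev.cur()
dev.off(d2)
dev.off(d1)
@end example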
| |
| @node Dynamic graphics, , Device drivers, Graphics |
| @section Dynamic graphics |
| @cindex Dynamic graphics |
| |
@R{} does not have builtin capabilities for dynamic or
interactive graphics, e.g.@: rotating point clouds or ``brushing''
(interactively highlighting) points. However, extensive dynamic graphics
facilities are provided by the GGobi system of Swayne, Cook and Buja,
available from
| |
| @quotation |
| @uref{http://www.ggobi.org/} |
| @end quotation |
| |
| @noindent |
| and these can be accessed from @R{} via the package @CRANpkg{rggobi}, described at |
| @uref{http://www.ggobi.org/rggobi}. |
| |
| Also, package @CRANpkg{rgl} provides ways to interact with 3D plots, for example |
| of surfaces. |
| |
| @node Packages, OS facilities, Graphics, Top |
| @chapter Packages |
| @cindex Packages |
| |
| All @R{} functions and datasets are stored in @emph{packages}. Only |
| when a package is loaded are its contents available. This is done both |
| for efficiency (the full list would take more memory and would take |
| longer to search than a subset), and to aid package developers, who are |
| protected from name clashes with other code. The process of developing |
| packages is described in @ref{Creating R packages, , Creating R |
| packages, R-exts, Writing R Extensions}. Here, we will describe them |
| from a user's point of view. |
| |
| To see which packages are installed at your site, issue the command |
| |
| @example |
| > library() |
| @end example |
| |
| @noindent |
| with no arguments. To load a particular package (e.g., the @CRANpkg{boot} |
| package containing functions from Davison & Hinkley (1997)), use a |
| command like |
| |
| @example |
| > library(boot) |
| @end example |
| |
| Users connected to the Internet can use the @code{install.packages()} |
| and @code{update.packages()} functions (available through the |
| @code{Packages} menu in the Windows and macOS GUIs, @pxref{Installing |
| packages, , , R-admin, R Installation and Administration}) to install |
| and update packages. |
| |
| To see which packages are currently loaded, use |
| |
| @example |
| > search() |
| @end example |
| |
| @noindent |
| to display the search list. Some packages may be loaded but not |
| available on the search list (@pxref{Namespaces}): these will be |
| included in the list given by |
| |
| @example |
| > loadedNamespaces() |
| @end example |
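You can see the difference between the two lists by loading a namespace without attaching it. The following sketch uses the @pkg{tools} package, which is part of every @R{} installation. @code{requireNamespace()} loads a namespace but does not add it to the search list.

@example
loaded_ok <- requireNamespace("tools", quietly = TRUE)
in_namespaces <- "tools" %in% loadedNamespaces()
on_search <- "package:tools" %in% search()
c(loaded = in_namespaces, attached = on_search)
@end example

Unless @code{library(tools)} has been called earlier in the session, this reports that @pkg{tools} is loaded but not attached.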
| |
| |
| To see a list of all available help topics in an installed package, |
| use |
| |
| @example |
| > help.start() |
| @end example |
| |
| @noindent |
| to start the @HTML{} help system, and then navigate to the package |
| listing in the @code{Reference} section. |
| |
| @menu |
| * Standard packages:: |
| * Contributed packages and CRAN:: |
| * Namespaces:: |
| @end menu |
| |
| @node Standard packages, Contributed packages and CRAN, Packages, Packages |
| @section Standard packages |
| |
| The standard (or @emph{base}) packages are considered part of the @R{} |
| source code. They contain the basic functions that allow @R{} to work, |
| and the datasets and standard statistical and graphical functions that |
| are described in this manual. They should be automatically available in |
| any @R{} installation. @xref{Which add-on packages exist for R?, , R |
| packages, R-FAQ, R FAQ}, for a complete list. |
| |
| @node Contributed packages and CRAN, Namespaces, Standard packages, Packages |
| @section Contributed packages and @acronym{CRAN} |
| @cindex CRAN |
| |
| There are thousands of contributed packages for @R{}, written by many |
| different authors. Some of these packages implement specialized |
| statistical methods, others give access to data or hardware, and others |
| are designed to complement textbooks. Some (the @emph{recommended} |
| packages) are distributed with every binary distribution of @R{}. Most |
| are available for download from @acronym{CRAN} |
| (@uref{https://CRAN.R-project.org/} and its mirrors) and other |
repositories such as Bioconductor (@uref{https://www.bioconductor.org/})
and Omegahat (@uref{http://www.omegahat.net/}). The @emph{R FAQ}
| contains a list of CRAN packages current at the time of release, but the |
| collection of available packages changes very frequently. |
| |
| @node Namespaces, , Contributed packages and CRAN, Packages |
| @section Namespaces |
| @cindex Namespace |
| @findex :: |
| @findex ::: |
| |
| Packages have @emph{namespaces}, which do three things: they allow the |
| package writer to hide functions and data that are meant only for |
| internal use, they prevent functions from breaking when a user (or other |
| package writer) picks a name that clashes with one in the package, and |
| they provide a way to refer to an object within a particular package. |
| |
| For example, @code{t()} is the transpose function in @R{}, but users |
| might define their own function named @code{t}. Namespaces prevent |
the user's definition from taking precedence and breaking every
| function that tries to transpose a matrix. |
| |
| There are two operators that work with namespaces. The double-colon |
| operator @code{::} selects definitions from a particular namespace. |
| In the example above, the transpose function will always be available |
| as @code{base::t}, because it is defined in the @code{base} package. |
| Only functions that are exported from the package can be retrieved in |
| this way. |
| |
| The triple-colon operator @code{:::} may be seen in a few places in R |
| code: it acts like the double-colon operator but also allows access to |
| hidden objects. Users are more likely to use the @code{getAnywhere()} |
| function, which searches multiple packages. |
| |
| Packages are often inter-dependent, and loading one may cause others to |
| be automatically loaded. The colon operators described above will also |
| cause automatic loading of the associated package. When packages with |
| namespaces are loaded automatically they are not added to the search |
| list. |
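The sketch below masks @code{t} with a hypothetical, deliberately trivial user definition. It then shows that @code{base::t} still reaches the transpose. Finally, it calls @code{tools::file_ext} to show a @code{::} call loading a namespace on demand.

@example
t <- function(x) "user-defined t"   # masks base::t at top level
m <- matrix(1:6, nrow = 2)
masked <- t(m)                      # finds the user's t first
transposed <- base::t(m)            # always the transpose
s <- stats::sd(c(1, 2, 3))          # an exported function
ext <- tools::file_ext("plot1.eps") # loads tools if necessary
rm(t)                               # remove the mask again
@end example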
| |
| @node OS facilities, A sample session, Packages, Top |
| @chapter OS facilities |
| |
| @R{} has quite extensive facilities to access the OS under which it is |
| running: this allows it to be used as a scripting language and that |
| ability is much used by @R{} itself, for example to install packages. |
| |
| Because @R{}'s own scripts need to work across all platforms, |
considerable effort has gone into making the scripting facilities as
| platform-independent as is feasible. |
| |
| @menu |
| * Files and directories:: |
| * Filepaths:: |
| * System commands:: |
| * Compression and Archives:: |
| @end menu |
| |
| @node Files and directories, Filepaths, OS facilities, OS facilities |
| @section Files and directories |
| |
| There are many functions to manipulate files and directories. Here are |
| pointers to some of the more commonly used ones. |
| |
| To create an (empty) file or directory, use @code{file.create} or |
| @code{dir.create}. (These are the analogues of the POSIX utilities |
| @command{touch} and @command{mkdir}.) For temporary files and |
| directories in the @R{} session directory see @code{tempfile}. |
| |
| Files can be removed by either @code{file.remove} or @code{unlink}: the |
| latter can remove directory trees. |
| |
| For directory listings use @code{list.files} (also available as |
| @code{dir}) or @code{list.dirs}. These can select files using a regular |
| expression: to select by wildcards use @code{Sys.glob}. |
| |
Many types of information on a filepath (including, for example, whether
it is a file or a directory) can be found by @code{file.info}.
| |
| There are several ways to find out if a file `exists' (a file can |
| exist on the filesystem and not be visible to the current user). |
| There are functions @code{file.exists}, @code{file.access} and |
| @code{file_test} with various versions of this test: @code{file_test} is |
| a version of the POSIX @command{test} command for those familiar with |
| shell scripting. |
| |
| Function @code{file.copy} is the @R{} analogue of the POSIX command |
| @command{cp}. |
| |
| Choosing files can be done interactively by @code{file.choose}: the |
| Windows port has the more versatile functions @code{choose.files} and |
| @code{choose.dir} and there are similar functions in the @pkg{tcltk} |
| package: @code{tk_choose.files} and @code{tk_choose.dir}. |
| |
| Functions @code{file.show} and @code{file.edit} will display and edit |
| one or more files in a way appropriate to the @R{} port, using the |
| facilities of a console (such as RGui on Windows or R.app on macOS) if |
| one is in use. |
| |
| There is some support for @emph{links} in the filesystem: see functions |
| @code{file.link} and @code{Sys.readlink}. |
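The following sketch exercises several of these functions inside a fresh temporary directory (from @code{tempfile()}), so it does not touch any existing files. The file names are made up.

@example
d <- tempfile("demo")
dir.create(d)
file.create(file.path(d, c("a.txt", "b.csv")))
txt <- list.files(d, pattern = "\\.txt$")            # regular expression
csv <- basename(Sys.glob(file.path(d, "*.csv")))     # wildcard
is_dir <- file.info(d)$isdir
copied <- file.copy(file.path(d, "a.txt"), file.path(d, "c.txt"))
is_file <- file_test("-f", file.path(d, "c.txt"))
unlink(d, recursive = TRUE)                          # remove the tree
gone <- !file.exists(d)
@end example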
| |
| |
| @node Filepaths, System commands, Files and directories, OS facilities |
| @section Filepaths |
| |
| With a few exceptions, @R{} relies on the underlying OS functions to |
| manipulate filepaths. Some aspects of this are allowed to depend on the |
| OS, and do, even down to the version of the OS. There are POSIX |
| standards for how OSes should interpret filepaths and many @R{} users |
| assume POSIX compliance: but Windows does not claim to be compliant and |
| other OSes may be less than completely compliant. |
| |
| The following are some issues which have been encountered with filepaths. |
| |
| @itemize @bullet |
| @item |
| POSIX filesystems are case-sensitive, so @file{foo.png} and |
| @file{Foo.PNG} are different files. However, the defaults on Windows |
| and macOS are to be case-insensitive, and FAT filesystems (commonly used |
| on removable storage) are not normally case-sensitive (and all filepaths |
| may be mapped to lower case). |
| |
| @item |
Almost all of Windows' OS services support the use of slash or
| backslash as the filepath separator, and @R{} converts the known |
| exceptions to the form required by Windows. |
| |
| @item |
| The behaviour of filepaths with a trailing slash is OS-dependent. Such |
| paths are not valid on Windows and should not be expected to work. |
| POSIX-2008 requires such paths to match only directories, but earlier |
| versions allowed them to also match files. So they are best avoided. |
| |
| @item |
| Multiple slashes in filepaths such as @file{/abc//def} are valid on |
POSIX filesystems and treated as if there were only one slash. They are
| @emph{usually} accepted by Windows' OS functions. However, leading |
| double slashes may have a different meaning. |
| |
| @item |
| Windows' UNC filepaths (such as @file{\\server\dir1\dir2\file} and |
| @file{\\?\UNC\server\dir1\dir2\file}) are not supported, but they may |
| work in some @R{} functions. POSIX filesystems are allowed to treat a |
| leading double slash specially. |
| |
| @item |
| Windows allows filepaths containing drives and relative to the current |
| directory on a drive, e.g.@: @file{d:foo/bar} refers to |
| @file{d:/a/b/c/foo/bar} if the current directory @emph{on drive |
| @file{d:}} is @file{/a/b/c}. It is intended that these work, but the |
| use of absolute paths is safer. |
| @end itemize |
| |
| Functions @code{basename} and @code{dirname} select parts of a file |
| path: the recommended way to assemble a file path from components is |
@code{file.path}. Function @code{path.expand} does `tilde expansion',
| substituting values for home directories (the current user's, and |
| perhaps those of other users). |
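
Here is a short illustration of these functions, using made-up path components. @code{file.path} uses @samp{/} as the separator on all platforms, including Windows.

@example
p <- file.path("data", "raw", "morley.tab")
p                          # "data/raw/morley.tab"
basename(p)                # "morley.tab"
dirname(p)                 # "data/raw"
path.expand("~/notes.txt") # "~" replaced by the user's home directory
@end example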
| |
| On filesystems with links, a single file can be referred to by many |
| filepaths. Function @code{normalizePath} will find a canonical |
| filepath. |
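
For instance, a roundabout path with @file{..} and @file{.} components should normalize to the same canonical path as the direct one. This sketch creates a hypothetical sub-directory @file{sub} of the session's temporary directory. On Windows the result uses backslashes by default.

@example
d <- file.path(tempdir(), "sub")
dir.create(d, showWarnings = FALSE)
p <- file.path(d, "..", "sub", ".")  # another way to name the same directory
normalizePath(p)                      # absolute, canonical form
identical(normalizePath(p), normalizePath(d))   # TRUE
@end example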
| |
| Windows has the concepts of short (`8.3') and long file names: |
| @code{normalizePath} will return an absolute path using long file names |
| and @code{shortPathName} will return a version using short names. The |
| latter does not contain spaces and uses backslash as the separator, so |
| is sometimes useful for exporting names from @R{}. |
| |
| File @emph{permissions} are a related topic. @R{} has support for the |
| POSIX concepts of read/write/execute permission for owner/group/all but |
| this may be only partially supported on the filesystem, so for example |
| on Windows only read-only files (for the account running the @R{} |
| session) are recognized. Access Control Lists (ACLs) are employed on |
| several filesystems, but do not have an agreed standard and @R{} has no |
| facilities to control them. Use @code{Sys.chmod} to change permissions. |
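
The sketch below makes a scratch file read-only and then writable again. It assumes a POSIX filesystem; on Windows only the read-only setting has any effect. Setting @code{use_umask = FALSE} stops the session's umask from altering the requested mode.

@example
f <- tempfile()
writeLines("x", f)
Sys.chmod(f, mode = "0444", use_umask = FALSE)  # read-only for owner, group and all
format(file.info(f)$mode)                        # "444"
Sys.chmod(f, mode = "0644", use_umask = FALSE)  # owner may write again
format(file.info(f)$mode)                        # "644"
@end example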
| |
| @node System commands, Compression and Archives, Filepaths, OS facilities |
| @section System commands |
| |
| Functions @code{system} and @code{system2} are used to invoke a system |
| command and optionally collect its output. @code{system2} is a little |
| more general but its main advantage is that it is easier to write |
| cross-platform code using it. |
| |
| @code{system} behaves differently on Windows from other OSes (because |
the C API call of that name does). Elsewhere it invokes a shell to run
| the command: the Windows port of @R{} has a function @code{shell} to do |
| that. |
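
Here is a minimal @code{system2} sketch. It runs the @command{Rscript} front-end shipped with @R{}, so it should work on any platform, and @code{stdout = TRUE} captures the output as a character vector.

@example
rscript <- file.path(R.home("bin"), "Rscript")
out <- system2(rscript, c("-e", shQuote("writeLines(format(6 * 7))")),
               stdout = TRUE)
out   # "42"
@end example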
| |
| To find out if the OS includes a command, use @code{Sys.which}, which |
| attempts to do this in a cross-platform way (unfortunately it is not a |
| standard OS service). |
| |
| Function @code{shQuote} will quote filepaths as needed for commands in |
| the current OS. |
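
For example, the second command name below is deliberately bogus, and the file name is made up:

@example
Sys.which(c("sh", "no-such-command-xyz"))  # full path of sh if on the PATH; "" if not found
f <- file.path(tempdir(), "my results.txt")
shQuote(f)                                 # quoted so the space is safe in a shell
@end example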
| |
| @node Compression and Archives, , System commands, OS facilities |
| @section Compression and Archives |
| |
Recent versions of @R{} have extensive facilities to read and write
compressed files, often transparently. Reading of files in @R{} is to a
very large extent done by @emph{connections}, and the @code{file}
function, which is used to open a connection to a file (or a URL), is
able to identify the compression used from the `magic' header of the
file.
| |
| The type of compression which has been supported for longest is |
| @command{gzip} compression, and that remains a good general compromise. |
| Files compressed by the earlier Unix @command{compress} utility can also |
be read, but these are becoming rare. Two other forms of compression,
those of the @command{bzip2} and @command{xz} utilities, are also
| available. These generally achieve higher rates of compression |
| (depending on the file, much higher) at the expense of slower |
| decompression and much slower compression. |
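
The sketch below writes the same text with each of the three compressors and reads it back with @code{readLines}. The files have no extension, so it is the magic header, not the file name, that tells @code{file} how to decompress them. The text is made up and highly repetitive, so it compresses unusually well.

@example
txt <- rep("the quick brown fox jumps over the lazy dog", 500)
files <- c(gzip = tempfile(), bzip2 = tempfile(), xz = tempfile())
writeLines(txt, gzfile(files[["gzip"]]))
writeLines(txt, bzfile(files[["bzip2"]]))
writeLines(txt, xzfile(files[["xz"]]))
file.size(files)       # compressed sizes in bytes
sum(nchar(txt) + 1)    # size of the text itself: 22000 bytes
sapply(files, function(f) identical(readLines(f), txt))  # all TRUE
@end example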
| |
| There is some confusion between @command{xz} and @command{lzma} |
| compression (see @uref{https://en.wikipedia.org/wiki/Xz} and |
| @uref{https://en.wikipedia.org/wiki/LZMA}): @R{} can read files |
| compressed by most versions of either. |
| |
| File archives are single files which contain a collection of files, the |
| most common ones being `tarballs' and zip files as used to distribute |
@R{} packages. @R{} can list and unpack both (see functions @code{untar}
and @code{unzip}) and create both (see functions @code{tar} and
@code{zip}; the latter needs an external program).
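
The following sketch creates a small gzipped tarball of a hypothetical directory @file{demo} using @R{}'s internal @command{tar} implementation. It then lists the tarball and unpacks it again, all inside the session's temporary directory.

@example
old <- setwd(tempdir())
dir.create("demo", showWarnings = FALSE)
writeLines("hello", file.path("demo", "README"))
tar("demo.tar.gz", "demo", compression = "gzip", tar = "internal")
untar("demo.tar.gz", list = TRUE, tar = "internal")   # archive contents
untar("demo.tar.gz", exdir = "unpacked", tar = "internal")
readLines(file.path("unpacked", "demo", "README"))    # "hello"
setwd(old)                                             # restore working directory
@end example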
| |
| @node A sample session, Invoking R, OS facilities, Top |
| @appendix A sample session |
| |
| The following session is intended to introduce to you some features of |
| the @R{} environment by using them. Many features of the system will be |
| unfamiliar and puzzling at first, but this puzzlement will soon |
| disappear. |
| @c This is written for the UNIX user. Those using Windows will |
| @c need to adapt the discussion slightly. |
| |
| @table @code |
| @item Start @R{} appropriately for your platform (@pxref{Invoking R}). |
| |
| The @R{} program begins, with a banner. |
| |
| (Within @R{} code, the prompt on the left hand side will not be shown to |
| avoid confusion.) |
| |
| @item help.start() |
| Start the @HTML{} interface to on-line help (using a web browser |
| available at your machine). You should briefly explore the features of |
| this facility with the mouse. |
| |
| Iconify the help window and move on to the next part. |
| |
| @item x <- rnorm(50) |
| @itemx y <- rnorm(x) |
| Generate two pseudo-random normal vectors of @math{x}- and |
| @math{y}-coordinates. |
| |
| @item plot(x, y) |
| Plot the points in the plane. A graphics window will appear automatically. |
| |
| @item ls() |
| See which @R{} objects are now in the @R{} workspace. |
| |
| @item rm(x, y) |
| Remove objects no longer needed. (Clean up). |
| |
| @item x <- 1:20 |
| Make @math{x = (1, 2, @dots{}, 20)}. |
| |
| @item w <- 1 + sqrt(x)/2 |
| A `weight' vector of standard deviations. |
| |
| @item dummy <- data.frame(x=x, y= x + rnorm(x)*w) |
| @itemx dummy |
| Make a @emph{data frame} of two columns, @math{x} and @math{y}, and look |
| at it. |
| |
| @item fm <- lm(y ~ x, data=dummy) |
| @itemx summary(fm) |
| Fit a simple linear regression and look at the |
| analysis. With @code{y} to the left of the tilde, |
| we are modelling @math{y} dependent on @math{x}. |
| |
@item fm1 <- lm(y ~ x, data=dummy, weights=1/w^2)
| @itemx summary(fm1) |
| Since we know the standard deviations, we can do a weighted regression. |
| |
| @item attach(dummy) |
| Make the columns in the data frame visible as variables. |
| |
| @item lrf <- lowess(x, y) |
| Make a nonparametric local regression function. |
| |
| @item plot(x, y) |
| Standard point plot. |
| |
| @item lines(x, lrf$y) |
| Add in the local regression. |
| |
| @item abline(0, 1, lty=3) |
| The true regression line: (intercept 0, slope 1). |
| |
| @item abline(coef(fm)) |
| Unweighted regression line. |
| |
| @item abline(coef(fm1), col = "red") |
| Weighted regression line. |
| |
| @item detach() |
| Remove data frame from the search path. |
| |
| @item plot(fitted(fm), resid(fm), |
| @itemx @w{@ @ @ @ @ xlab="Fitted values"}, |
| @itemx @w{@ @ @ @ @ ylab="Residuals"}, |
| @itemx @w{@ @ @ @ @ main="Residuals vs Fitted")} |
| A standard regression diagnostic plot to check for heteroscedasticity. |
| Can you see it? |
| |
| @item qqnorm(resid(fm), main="Residuals Rankit Plot") |
| A normal scores plot to check for skewness, kurtosis and outliers. (Not |
| very useful here.) |
| |
| @item rm(fm, fm1, lrf, x, dummy) |
| Clean up again. |
| @end table |
| |
| The next section will look at data from the classical experiment of |
| Michelson to measure the speed of light. This dataset is available in |
| the @code{morley} object, but we will read it to illustrate the |
| @code{read.table} function. |
| |
| @table @code |
| @item filepath <- system.file("data", "morley.tab" , package="datasets") |
| @itemx filepath |
| Get the path to the data file. |
| |
| @item file.show(filepath) |
| Optional. Look at the file. |
| |
| @item mm <- read.table(filepath) |
| @itemx mm |
| Read in the Michelson data as a data frame, and look at it. |
There are five experiments (column @code{Expt}), each with 20 runs
(column @code{Run}), and @code{Speed} is the recorded speed of light,
suitably coded.
| |
| @item mm$Expt <- factor(mm$Expt) |
| @itemx mm$Run <- factor(mm$Run) |
| Change @code{Expt} and @code{Run} into factors. |
| |
| @item attach(mm) |
Make the data frame visible at position 2 (the default) of the search path.
| |
| @item plot(Expt, Speed, main="Speed of Light Data", xlab="Experiment No.") |
| Compare the five experiments with simple boxplots. |
| |
| @item fm <- aov(Speed ~ Run + Expt, data=mm) |
| @itemx summary(fm) |
| Analyze as a randomized block, with `runs' and `experiments' as factors. |
| |
| @item fm0 <- update(fm, . ~ . - Run) |
| @itemx anova(fm0, fm) |
| Fit the sub-model omitting `runs', and compare using a formal analysis |
| of variance. |
| |
| @item detach() |
| @itemx rm(fm, fm0) |
| Clean up before moving on. |
| |
| @end table |
| |
| We now look at some more graphical features: contour and image plots. |
| |
| @table @code |
| @item x <- seq(-pi, pi, len=50) |
| @itemx y <- x |
| @math{x} is a vector of 50 equally spaced values in |
| @ifnottex |
| the interval [-pi\, pi]. |
| @end ifnottex |
| @iftex |
| @tex |
| $-\pi\leq x \leq \pi$. |
| @end tex |
| @end iftex |
| @math{y} is the same. |
| |
| @item f <- outer(x, y, function(x, y) cos(y)/(1 + x^2)) |
| @math{f} is a square matrix, with rows and columns indexed by @math{x} |
| and @math{y} respectively, of values of the function |
| @eqn{\cos(y)/(1 + x^2),cos(y)/(1 + x^2)}. |
| |
| @item oldpar <- par(no.readonly = TRUE) |
| @itemx par(pty="s") |
| Save the plotting parameters and set the plotting region to ``square''. |
| |
| @item contour(x, y, f) |
| @itemx contour(x, y, f, nlevels=15, add=TRUE) |
| Make a contour map of @math{f}; add in more lines for more detail. |
| |
| @item fa <- (f-t(f))/2 |
| @code{fa} is the ``asymmetric part'' of @math{f}. (@code{t()} is |
| transpose). |
| |
| @item contour(x, y, fa, nlevels=15) |
| Make a contour plot, @dots{} |
| |
| @item par(oldpar) |
| @dots{} and restore the old graphics parameters. |
| |
| @item image(x, y, f) |
| @itemx image(x, y, fa) |
Make some high density image plots (of which you can get
hardcopies if you wish) @dots{}
| |
| @item objects(); rm(x, y, f, fa) |
| @dots{} and clean up before moving on. |
| @end table |
| |
@R{} can also do complex arithmetic.
| |
| @table @code |
| @item th <- seq(-pi, pi, len=100) |
| @itemx z <- exp(1i*th) |
| @code{1i} is used for the complex number @math{i}. |
| |
| @item par(pty="s") |
| @itemx plot(z, type="l") |
Plotting a complex argument means plotting imaginary versus real parts.
This should be a circle.
| |
| @item w <- rnorm(100) + rnorm(100)*1i |
| Suppose we want to sample points within the unit circle. One method |
| would be to take complex numbers with standard normal real and imaginary |
| parts @dots{} |
| |
| @item w <- ifelse(Mod(w) > 1, 1/w, w) |
| @dots{} and to map any outside the circle onto their reciprocal. |
| |
| @item plot(w, xlim=c(-1,1), ylim=c(-1,1), pch="+",xlab="x", ylab="y") |
| @itemx lines(z) |
| All points are inside the unit circle, but the distribution is not |
| uniform. |
| |
| @item w <- sqrt(runif(100))*exp(2*pi*runif(100)*1i) |
| @itemx plot(w, xlim=c(-1,1), ylim=c(-1,1), pch="+", xlab="x", ylab="y") |
| @itemx lines(z) |
| The second method uses the uniform distribution. The points should now |
| look more evenly spaced over the disc. |
| |
| @item rm(th, w, z) |
| Clean up again. |
| |
| @item q() |
| Quit the @R{} program. You will be asked if you want to save the @R{} |
| workspace, and for an exploratory session like this, you probably do not |
| want to save it. |
| @end table |
| |
| @node Invoking R, The command-line editor, A sample session, Top |
| @appendix Invoking R |
| |
| Users of @R{} on Windows or macOS should read the OS-specific section |
| first, but command-line use is also supported. |
| |
| @menu |
| * Invoking R from the command line:: |
| * Invoking R under Windows:: |
| * Invoking R under macOS:: |
| * Scripting with R:: |
| @end menu |
| |
| @node Invoking R from the command line, Invoking R under Windows, Invoking R, Invoking R |
| @appendixsec Invoking R from the command line |
| |
| When working at a command line on UNIX or Windows, the command @samp{R} |
| can be used both for starting the main @R{} program in the form |
| |
| @display |
| @code{R} [@var{options}] [@code{<}@var{infile}] [@code{>}@var{outfile}], |
| @end display |
| |
| @noindent |
| or, via the @code{R CMD} interface, as a wrapper to various @R{} tools |
| (e.g., for processing files in @R{} documentation format or manipulating |
| add-on packages) which are not intended to be called ``directly''. |
| |
| At the Windows command-line, @command{Rterm.exe} is preferred to |
| @command{R}. |
| |
| You need to ensure that either the environment variable @env{TMPDIR} is |
| unset or it points to a valid place to create temporary files and |
| directories. |
| |
| Most options control what happens at the beginning and at the end of an |
| @R{} session. The startup mechanism is as follows (see also the on-line |
| help for topic @samp{Startup} for more information, and the section below |
| for some Windows-specific details). |
| |
| @itemize @bullet |
| @item |
| Unless @option{--no-environ} was given, @R{} searches for user and site |
| files to process for setting environment variables. The name of the |
| site file is the one pointed to by the environment variable |
| @env{R_ENVIRON}; if this is unset, @file{@var{R_HOME}/etc/Renviron.site} |
| is used (if it exists). The user file is the one pointed to by the |
| environment variable @env{R_ENVIRON_USER} if this is set; otherwise, |
| files @file{.Renviron} in the current or in the user's home directory |
| (in that order) are searched for. These files should contain lines of |
| the form @samp{@var{name}=@var{value}}. (See @code{help("Startup")} for |
| a precise description.) Variables you might want to set include |
| @env{R_PAPERSIZE} (the default paper size), @env{R_PRINTCMD} (the |
| default print command) and @env{R_LIBS} (specifies the list of @R{} |
| library trees searched for add-on packages). |
| |
| @item |
| Then @R{} searches for the site-wide startup profile unless the command |
| line option @option{--no-site-file} was given. The name of this file is |
| taken from the value of the @env{R_PROFILE} environment variable. If |
| that variable is unset, the default |
| @file{@var{R_HOME}/etc/Rprofile.site} is used if this exists. |
| |
| @item |
| Then, unless @option{--no-init-file} was given, @R{} searches for a user |
| profile and sources it. The name of this file is taken from the |
| environment variable @env{R_PROFILE_USER}; if unset, a file called |
| @file{.Rprofile} in the current directory or in the user's home |
| directory (in that order) is searched for. |
| |
| @item |
| It also loads a saved workspace from file @file{.RData} in the current |
| directory if there is one (unless @option{--no-restore} or |
| @option{--no-restore-data} was specified). |
| |
| @item |
| Finally, if a function @code{.First()} exists, it is executed. This |
| function (as well as @code{.Last()} which is executed at the end of the |
| @R{} session) can be defined in the appropriate startup profiles, or |
| reside in @file{.RData}. |
| @end itemize |
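
Here is a sketch of what a user profile such as @file{~/.Rprofile} might contain. The particular options and messages are only examples. A @file{.Renviron} file, by contrast, holds plain @samp{@var{name}=@var{value}} lines, not @R{} code.

@example
## hypothetical contents of ~/.Rprofile, sourced unless --no-init-file is given
options(digits = 4, width = 100)
.First <- function() cat("R session started in", getwd(), "\n")
.Last  <- function() cat("Goodbye\n")
@end example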
| |
| In addition, there are options for controlling the memory available to |
| the @R{} process (see the on-line help for topic @samp{Memory} for more |
| information). Users will not normally need to use these unless they |
| are trying to limit the amount of memory used by @R{}. |
| |
| @R{} accepts the following command-line options. |
| |
| @table @option |
| @item --help |
| @itemx -h |
Print a short help message to standard output and exit successfully.
| |
| @item --version |
| Print version information to standard output and exit successfully. |
| |
| @item --encoding=@var{enc} |
| Specify the encoding to be assumed for input from the console or |
| @code{stdin}. This needs to be an encoding known to @code{iconv}: see |
| its help page. (@code{--encoding @var{enc}} is also accepted.) The |
| input is re-encoded to the locale @R{} is running in and needs to be |
| representable in the latter's encoding (so e.g.@: you cannot re-encode |
| Greek text in a French locale unless that locale uses the UTF-8 |
| encoding). |
| |
| @item RHOME |
| Print the path to the @R{} ``home directory'' to standard output and |
| exit successfully. Apart from the front-end shell script and the man |
| page, @R{} installation puts everything (executables, packages, etc.) |
| into this directory. |
| |
| @item --save |
| @itemx --no-save |
| Control whether data sets should be saved or not at the end of the @R{} |
| session. If neither is given in an interactive session, the user is |
| asked for the desired behavior when ending the session with @kbd{q()}; |
| in non-interactive use one of these must be specified or implied by some |
| other option (see below). |
| |
| @item --no-environ |
| Do not read any user file to set environment variables. |
| |
| @item --no-site-file |
| Do not read the site-wide profile at startup. |
| |
| @item --no-init-file |
| Do not read the user's profile at startup. |
| |
| @item --restore |
| @itemx --no-restore |
| @itemx --no-restore-data |
| Control whether saved images (file @file{.RData} in the directory where |
| @R{} was started) should be restored at startup or not. The default is |
| to restore. (@option{--no-restore} implies all the specific |
| @option{--no-restore-*} options.) |
| |
| @item --no-restore-history |
| Control whether the history file (normally file @file{.Rhistory} in the |
| directory where @R{} was started, but can be set by the environment |
| variable @env{R_HISTFILE}) should be restored at startup or not. The |
| default is to restore. |
| |
| @item --no-Rconsole |
| (Windows only) Prevent loading the @file{Rconsole} file at startup. |
| |
| @item --vanilla |
| Combine @option{--no-save}, @option{--no-environ}, |
| @option{--no-site-file}, @option{--no-init-file} and |
| @option{--no-restore}. Under Windows, this also includes |
| @option{--no-Rconsole}. |
| |
| @item -f @var{file} |
| @itemx --file=@var{file} |
| (not @command{Rgui.exe}) Take input from @var{file}: @samp{-} means |
| @code{stdin}. Implies @option{--no-save} unless @option{--save} has |
| been set. On a Unix-alike, shell metacharacters should be avoided in |
| @var{file} (but spaces are allowed). |
| |
| @item -e @var{expression} |
| (not @command{Rgui.exe}) Use @var{expression} as an input line. One or |
| more @option{-e} options can be used, but not together with @option{-f} |
| or @option{--file}. Implies @option{--no-save} unless @option{--save} |
| has been set. (There is a limit of 10,000 bytes on the total length of |
| expressions used in this way. Expressions containing spaces or shell |
| metacharacters will need to be quoted.) |
| |
| @item --no-readline |
| (UNIX only) Turn off command-line editing via @strong{readline}. This |
| is useful when running @R{} from within Emacs using the @acronym{ESS} |
| (``Emacs Speaks Statistics'') package. @xref{The command-line editor}, |
| for more information. Command-line editing is enabled for default |
| interactive use (see @option{--interactive}). This option also affects |
| tilde-expansion: see the help for @code{path.expand}. |
| |
| @item --min-vsize=@var{N} |
| @itemx --min-nsize=@var{N} |
| For expert use only: set the initial trigger sizes for garbage |
| collection of vector heap (in bytes) and @emph{cons cells} (number) |
| respectively. Suffix @samp{M} specifies megabytes or millions of cells |
respectively. The defaults are 6Mb and 350k respectively; they can also
be set by the environment variables @env{R_VSIZE} and @env{R_NSIZE}.
| |
| @item --max-ppsize=@var{N} |
| Specify the maximum size of the pointer protection stack as @var{N} |
| locations. This defaults to 10000, but can be increased to allow |
| large and complicated calculations to be done. Currently the maximum |
| value accepted is 100000. |
| |
| @item --max-mem-size=@var{N} |
| (Windows only) Specify a limit for the amount of memory to be used both |
| for @R{} objects and working areas. This is set by default to the |
| smaller of the amount of physical RAM in the machine and for 32-bit |
| @R{}, 1.5Gb@footnote{2.5Gb on versions of Windows that support 3Gb per |
| process and have the support enabled: see the @file{rw-FAQ} Q2.9; 3.5Gb |
| on most 64-bit versions of Windows.}, and must be between 32Mb and the |
| maximum allowed on that version of Windows. |
| |
| @item --quiet |
| @itemx --silent |
| @itemx -q |
| Do not print out the initial copyright and welcome messages. |
| |
| @item --slave |
| Make @R{} run as quietly as possible. This option is intended to |
| support programs which use @R{} to compute results for them. It implies |
| @option{--quiet} and @option{--no-save}. |
| |
| @item --interactive |
| (UNIX only) Assert that @R{} really is being run interactively even if |
| input has been redirected: use if input is from a FIFO or pipe and fed |
| from an interactive program. (The default is to deduce that @R{} is |
| being run interactively if and only if @file{stdin} is connected to a |
| terminal or @code{pty}.) Using @option{-e}, @option{-f} or |
| @option{--file} asserts non-interactive use even if |
| @option{--interactive} is given. |
| |
| Note that this does not turn on command-line editing. |
| |
| @item --ess |
| (Windows only) Set @code{Rterm} up for use by @code{R-inferior-mode} in |
| @acronym{ESS}, including asserting interactive use (without the |
| command-line editor) and no buffering of @file{stdout}. |
| |
| @item --verbose |
| Print more information about progress, and in particular set @R{}'s |
| option @code{verbose} to @code{TRUE}. @R{} code uses this option to |
| control the printing of diagnostic messages. |
| |
| @item --debugger=@var{name} |
| @itemx -d @var{name} |
| (UNIX only) Run @R{} through debugger @var{name}. For most debuggers |
| (the exceptions are @command{valgrind} and recent versions of |
| @command{gdb}), further command line options are disregarded, and should |
| instead be given when starting the @R{} executable from inside the |
| debugger. |
| |
| @item --gui=@var{type} |
| @itemx -g @var{type} |
| (UNIX only) Use @var{type} as graphical user interface (note that this |
| also includes interactive graphics). Currently, possible values for |
| @var{type} are @samp{X11} (the default) and, provided that @samp{Tcl/Tk} |
| support is available, @samp{Tk}. (For back-compatibility, @samp{x11} and |
| @samp{tk} are accepted.) |
| |
| @item --arch=@var{name} |
| (UNIX only) Run the specified sub-architecture. |
| |
| @item --args |
| This flag does nothing except cause the rest of the command line to be |
| skipped: this can be useful to retrieve values from it with |
| @code{commandArgs(TRUE)}. |
| @end table |
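
To illustrate @option{-e} and @option{--args} together, the sketch below starts a fresh @R{} process from within @R{} via @code{system2}. That process prints the arguments that follow @option{--args}; @samp{alpha} and @samp{beta} are arbitrary. @option{--vanilla} keeps startup files from interfering, and @option{--slave} suppresses the echoing of the expression.

@example
R <- file.path(R.home("bin"), "R")
out <- system2(R, c("--vanilla", "--slave",
                    "-e", shQuote("writeLines(commandArgs(TRUE))"),
                    "--args", "alpha", "beta"),
               stdout = TRUE)
out   # "alpha" "beta"
@end example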
| |
| Note that input and output can be redirected in the usual way (using |
| @samp{<} and @samp{>}), but the line length limit of 4095 bytes still |
| applies. Warning and error messages are sent to the error channel |
| (@code{stderr}). |
| |
| The command @code{R CMD} allows the invocation of various tools which |
| are useful in conjunction with @R{}, but not intended to be called |
| ``directly''. The general form is |
| |
| @example |
| R CMD @var{command} @var{args} |
| @end example |
| |
| @noindent |
| where @var{command} is the name of the tool and @var{args} the arguments |
| passed on to it. |
| |
| Currently, the following tools are available. |
| |
| @table @code |
| @item BATCH |
| Run @R{} in batch mode. Runs @command{R --restore --save} with possibly |
| further options (see @code{?BATCH}). |
| @item COMPILE |
| (UNIX only) Compile C, C++, Fortran @dots{} files for use with @R{}. |
| @item SHLIB |
| Build shared library for dynamic loading. |
| @item INSTALL |
| Install add-on packages. |
| @item REMOVE |
| Remove add-on packages. |
| @item build |
| Build (that is, package) add-on packages. |
| @item check |
| Check add-on packages. |
| @item LINK |
| (UNIX only) Front-end for creating executable programs. |
| @item Rprof |
| Post-process @R{} profiling files. |
| @item Rdconv |
| @itemx Rd2txt |
| Convert Rd format to various other formats, including @HTML{}, @LaTeX{}, |
| plain text, and extracting the examples. @code{Rd2txt} can be used as |
shorthand for @code{Rdconv -t txt}.
| @item Rd2pdf |
| Convert Rd format to PDF. |
@item Stangle
Extract S/R code from Sweave or other vignette documentation.
@item Sweave
Process Sweave or other vignette documentation.
@item Rdiff
Diff @R{} output ignoring headers etc.
@item config
Obtain configuration information.
@item javareconf
(Unix only) Update the Java configuration variables.
@item rtags
(Unix only) Create Emacs-style tag files from C, R, and Rd files.
@item open
(Windows only) Open a file via Windows' file associations.
@item texify
(Windows only) Process (La)TeX files with R's style files.
| @end table |
| |
| Use |
| |
| @example |
| R CMD @var{command} --help |
| @end example |
| |
| @noindent |
| to obtain usage information for each of the tools accessible via the |
| @code{R CMD} interface. |
| |
| In addition, you can use options @option{--arch=}, |
| @option{--no-environ}, @option{--no-init-file}, @option{--no-site-file} |
| and @option{--vanilla} between @command{R} and @command{CMD}: these |
| affect any @R{} processes run by the tools. (Here @option{--vanilla} is |
| equivalent to @option{--no-environ --no-site-file --no-init-file}.) |
| However, note that @command{R CMD} does not of itself use any @R{} |
| startup files (in particular, neither user nor site @file{Renviron} |
| files), and all of the @R{} processes run by these tools (except |
| @command{BATCH}) use @option{--no-restore}. Most use @option{--vanilla} |
| and so invoke no @R{} startup files: the current exceptions are |
| @command{INSTALL}, @command{REMOVE}, @command{Sweave} and |
@command{SHLIB} (which uses @option{--no-site-file --no-init-file}).

Finally, you can use
| |
| @example |
| R CMD @var{cmd} @var{args} |
| @end example |
| |
| @noindent |
| for any other executable @command{@var{cmd}} on the path or given by an |
| absolute filepath: this is useful to have the same environment as @R{} |
| or the specific commands run under, for example to run @command{ldd} or |
| @command{pdflatex}. Under Windows @var{cmd} can be an executable or a |
| batch file, or if it has extension @code{.sh} or @code{.pl} the |
| appropriate interpreter (if available) is called to run it. |
| |
| |
| @node Invoking R under Windows, Invoking R under macOS, Invoking R from the command line, Invoking R |
| @appendixsec Invoking R under Windows |
| |
| There are two ways to run @R{} under Windows. Within a terminal window |
| (e.g.@: @code{cmd.exe} or a more capable shell), the methods described in |
| the previous section may be used, invoking by @code{R.exe} or more |
| directly by @code{Rterm.exe}. For interactive use, there is a |
| console-based GUI (@code{Rgui.exe}). |
| |
| The startup procedure under Windows is very similar to that under |
| UNIX, but references to the `home directory' need to be clarified, as |
| this is not always defined on Windows. If the environment variable |
| @env{R_USER} is defined, that gives the home directory. Next, if the |
| environment variable @env{HOME} is defined, that gives the home |
| directory. After those two user-controllable settings, @R{} tries to |
| find system defined home directories. It first tries to use the |
Windows ``personal'' directory (typically @code{My Documents} in
| recent versions of Windows). If that fails, and |
| environment variables @env{HOMEDRIVE} and @env{HOMEPATH} are defined |
| (and they normally are) these define the home directory. Failing all |
| those, the home directory is taken to be the starting directory. |
| |
You need to ensure that the environment variables @env{TMPDIR},
@env{TMP} and @env{TEMP} are either all unset or that one of them points
to a valid place to create temporary files and directories.
| |
| Environment variables can be supplied as @samp{@var{name}=@var{value}} |
| pairs on the command line. |
| |
| If there is an argument ending @file{.RData} (in any case) it is |
| interpreted as the path to the workspace to be restored: it implies |
| @option{--restore} and sets the working directory to the parent of the |
| named file. (This mechanism is used for drag-and-drop and file |
| association with @code{RGui.exe}, but also works for @code{Rterm.exe}. |
| If the named file does not exist it sets the working directory |
| if the parent directory exists.) |
| |
| The following additional command-line options are available when |
| invoking @code{RGui.exe}. |
| |
| @table @option |
| @item --mdi |
| @itemx --sdi |
| @itemx --no-mdi |
| Control whether @code{Rgui} will operate as an MDI program |
| (with multiple child windows within one main window) or an SDI application |
| (with multiple top-level windows for the console, graphics and pager). The |
| command-line setting overrides the setting in the user's @file{Rconsole} file. |
| |
| @item --debug |
| Enable the ``Break to debugger'' menu item in @code{Rgui}, and trigger |
| a break to the debugger during command line processing. |
| @end table |
| |
| Under Windows with @code{R CMD} you may also specify your own |
| @file{.bat}, @file{.exe}, @file{.sh} or @file{.pl} file. It will be run |
| under the appropriate interpreter (Perl for @file{.pl}) with several |
| environment variables set appropriately, including @env{R_HOME}, |
| @env{R_OSTYPE}, @env{PATH}, @env{BSTINPUTS} and @env{TEXINPUTS}. For |
| example, if you already have @file{latex.exe} on your path, then |
| |
| @example |
| R CMD latex.exe mydoc |
| @end example |
| @noindent |
| will run @LaTeX{} on @file{mydoc.tex}, with the path to @R{}'s |
| @file{share/texmf} macros appended to @env{TEXINPUTS}. (Unfortunately, |
| this does not help with the MiKTeX build of @LaTeX{}, but |
| @command{R CMD texify mydoc} will work in that case.) |
| |
| @node Invoking R under macOS, Scripting with R, Invoking R under Windows, Invoking R |
| @appendixsec Invoking R under macOS |
| |
| There are two ways to run @R{} under macOS. Within a @code{Terminal.app} |
| window by invoking @code{R}, the methods described in the first |
subsection apply. There is also a console-based GUI (@code{R.app}) that by
| default is installed in the @code{Applications} folder on your |
| system. It is a standard double-clickable macOS application. |
| |
| The startup procedure under macOS is very similar to that under UNIX, but |
| @code{R.app} does not make use of command-line arguments. The `home |
| directory' is the one inside the R.framework, but the startup and |
current working directory are set to the user's home directory unless a
| different startup directory is given in the Preferences window |
| accessible from within the GUI. |
| |
| @node Scripting with R, , Invoking R under macOS, Invoking R |
| @appendixsec Scripting with R |
| |
| If you just want to run a file @file{foo.R} of @R{} commands, the |
| recommended way is to use @command{R CMD BATCH foo.R}. If you want to |
run this in the background or as a batch job, use OS-specific facilities
to do so: for example, in most shells on Unix-alike OSes @command{R CMD
BATCH foo.R &} runs a background job.
| |
| You can pass parameters to scripts via additional arguments on the |
| command line: for example (where the exact quoting needed will depend on |
| the shell in use) |
| |
| @example |
| R CMD BATCH "--args arg1 arg2" foo.R & |
| @end example |
| |
| @noindent |
will pass arguments to a script; they can be retrieved as a character
vector by
| |
| @example |
| args <- commandArgs(TRUE) |
| @end example |
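The elements of this vector are character strings, so numeric arguments
must be converted explicitly. The following is a minimal sketch of how
@file{foo.R} might do this; the meaning given to the two arguments and
their defaults are assumptions made purely for illustration.

@example
## Arguments given after --args arrive as a character vector
args <- commandArgs(trailingOnly = TRUE)  # same as commandArgs(TRUE)

## Hypothetical convention for this sketch: the first argument is a
## count (default 10), the second an optional output file name
n <- if (length(args) >= 1) as.integer(args[1]) else 10L
if (is.na(n)) stop("the first argument must be a whole number")
outfile <- if (length(args) >= 2) args[2] else ""

## file = "" writes to R's standard output (foo.Rout under R CMD BATCH)
cat("mean of", n, "uniform draws:", mean(runif(n)), "\n", file = outfile)
@end example

With this convention, @command{R CMD BATCH "--args 100 res.txt" foo.R}
would write its result to @file{res.txt}, while running the script with
no arguments uses the defaults.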
| |
| This is made simpler by the alternative front-end @command{Rscript}, |
| which can be invoked by |
| |
| @example |
| Rscript foo.R arg1 arg2 |
| @end example |
| |
| @noindent |
| and this can also be used to write executable script files like (at |
| least on Unix-alikes, and in some Windows shells) |
| |
| @example |
| #! /path/to/Rscript |
| args <- commandArgs(TRUE) |
| ... |
| q(status=<exit status code>) |
| @end example |
| |
| @noindent |
If this is saved in a text file @file{runfoo} which is then made
executable (by @command{chmod 755 runfoo}), it can be invoked with
different arguments (as @command{./runfoo} if it is not in a directory
on your path) by
| |
| @example |
| runfoo arg1 arg2 |
| @end example |
| |
| @noindent |
For further options see @code{help("Rscript")}. @command{Rscript}
writes @R{} output to @file{stdout} and @file{stderr}, and this can be
redirected in the usual way for the shell running the command.
| |
| If you do not wish to hardcode the path to @command{Rscript} but have it |
| in your path (which is normally the case for an installed @R{} except on |
| Windows, but e.g.@: macOS users may need to add @file{/usr/local/bin} |
| to their path), use |
| |
| @example |
| #! /usr/bin/env Rscript |
| ... |
| @end example |
| |
| @noindent |
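On many systems (Linux among them) the @code{#!} line is interpreted by
the kernel rather than by the shell, and everything after the
interpreter name is passed as a single argument, so extra arguments like
@code{#! /usr/bin/env Rscript --vanilla} do @strong{not} work.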
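Putting these pieces together, the following is a minimal sketch of a
complete executable script. It assumes @command{Rscript} is on your
path; the script's name and what it does are purely illustrative, the
point being the use of @code{q(status = 2)} to report failure to the
calling shell.

@example
#! /usr/bin/env Rscript
## runfoo (illustrative name and behavior): report the number of
## lines in each file named on the command line
args <- commandArgs(TRUE)

## Report every missing file on stderr, then exit with a non-zero
## status (2 is an arbitrary choice) so the calling shell can test it
absent <- args[!file.exists(args)]
for (f in absent) message("runfoo: cannot find ", f)
if (length(absent) > 0) q(status = 2)

for (f in args) cat(f, ":", length(readLines(f)), "lines\n")
## Reaching the end of the script gives exit status 0
@end example

After @command{chmod 755 runfoo}, running
@command{./runfoo foo.R nosuchfile; echo $?} reports the missing file on
standard error and prints the exit status 2.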
| |
| One thing to consider is what @code{stdin()} refers to. It is |
| commonplace to write @R{} scripts with segments like |
| |
| @example |
| chem <- scan(n=24) |
| 2.90 3.10 3.40 3.40 3.70 3.70 2.80 2.50 2.40 2.40 2.70 2.20 |
| 5.28 3.37 3.03 3.03 28.95 3.77 3.40 2.20 3.50 3.60 3.70 3.70 |
| @end example |
| |
| @noindent |
| and @code{stdin()} refers to the script file to allow such traditional |
| usage. If you want to refer to the process's @file{stdin}, use |
| @code{"stdin"} as a @code{file} connection, e.g.@: @code{scan("stdin", ...)}. |
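For example, here is a minimal sketch of a filter script (the name
@file{sumin} is hypothetical) that reads numbers from the process's
standard input, so that it can sit at the end of a shell pipeline:

@example
#! /usr/bin/env Rscript
## sumin (hypothetical name): total the whitespace-separated numbers
## arriving on the process's standard input.  "stdin" names the
## process's input; stdin() would refer to the script itself.
x <- scan("stdin", quiet = TRUE)
cat("n =", length(x), "sum =", sum(x), "\n")
@end example

Made executable as above, @command{printf '1 2 3\n4\n' | ./sumin} reads
four numbers and reports their sum, 10.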
| |
| Another way to write executable script files (suggested by Fran@,{c}ois |
| Pinard) is to use a @emph{here document} like |
| |
| @example |
| #!/bin/sh |
| [environment variables can be set here] |
| R --slave [other options] <<EOF |
| |
| R program goes here... |
| |
| EOF |
| @end example |
| |
| @noindent |
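but here @code{stdin()} refers to the program source and
@code{"stdin"} will not be usable. Note also that with an unquoted
delimiter the shell expands @code{$} and backquotes in the program text
(so an @R{} expression such as @code{df$x} would be mangled); write the
delimiter as @code{<<'EOF'} to pass the program through unchanged.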
| |
Short scripts can be passed to @command{Rscript} on the command line
@emph{via} the @option{-e} flag. (Empty scripts are not accepted.)
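For instance, the following illustrative one-liners run without any
script file. The @option{-e} flag may be given more than once, the
expressions being evaluated in order within a single @R{} session:

@example
# A single expression (illustrative): print the version of R being run
Rscript -e 'cat(R.version.string, "\n")'

# Several -e expressions are evaluated in turn in the same session
Rscript -e 'x <- 1:10' -e 'cat(sum(x), "\n", sep = "")'
@end example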
| |
Note that on a Unix-alike the input filename (such as @file{foo.R})
should contain neither spaces nor shell metacharacters.
| |
| |
| @node The command-line editor, Function and variable index, Invoking R, Top |
| @appendix The command-line editor |
| |
| @appendixsection Preliminaries |
| |
| When the @acronym{GNU} @strong{readline} library is available at the |
| time @R{} is configured for compilation under UNIX, an inbuilt command |
| line editor allowing recall, editing and re-submission of prior commands |
| is used. Note that other versions of @strong{readline} exist and may be |
| used by the inbuilt command line editor: this used to happen on macOS. |
| |
The command-line editor can be disabled (useful for usage with
@acronym{ESS}@footnote{The `Emacs Speaks Statistics' package; see the
@acronym{URL} @uref{https://ESS.R-project.org/}}) using the startup
option @option{--no-readline}.
| |
| Windows versions of @R{} have somewhat simpler command-line editing: see |
| @samp{Console} under the @samp{Help} menu of the @acronym{GUI}, and the |
| file @file{README.Rterm} for command-line editing under |
| @code{Rterm.exe}. |
| |
| When using @R{} with GNU@footnote{It is possible to build @R{} using an |
| emulation of GNU @strong{readline}, such as one based on NetBSD's |
@strong{editline}, in which case only a subset of the capabilities may
| be provided.} @strong{readline} capabilities, the functions described |
| below are available, as well as others (probably) documented in |
| @command{man readline} or @command{info readline} on your system. |
| |
| Many of these use either Control or Meta characters. Control |
characters, such as @kbd{Control-m}, are obtained by holding down the
@key{CTRL} key while you press the @key{m} key, and are written as
| @kbd{C-m} below. Meta characters, such as @kbd{Meta-b}, are typed by |
| holding down @key{META}@footnote{On a PC keyboard this is usually the |
| Alt key, occasionally the `Windows' key. On a Mac keyboard normally no |
| meta key is available.} and pressing @key{b}, and written as @kbd{M-b} |
| in the following. If your terminal does not have a @key{META} key |
| enabled, you can still type Meta characters using two-character |
| sequences starting with @kbd{ESC}. Thus, to enter @kbd{M-b}, you could |
| type @key{ESC}@key{b}. The @kbd{ESC} character sequences are also |
| allowed on terminals with real Meta keys. Note that case is significant |
| for Meta characters. |
| |
| Some but not all versions@footnote{In particular, not versions 6.3 or |
| later: this is worked around as from @R{} 3.4.0.} of @strong{readline} |
will recognize resizing of the terminal window, so resizing it while
@R{} is running is best avoided.
| |
| @appendixsection Editing actions |
| |
| The @R{} program keeps a history of the command lines you type, |
| including the erroneous lines, and commands in your history may be |
recalled, changed if necessary, and re-submitted as new commands. In
Emacs-style command-line editing, any straight typing you do while in
this editing phase causes the characters to be inserted in the command
you are editing, displacing any characters to the right of the cursor.
In @emph{vi} mode, character insertion mode is started by @kbd{M-i} or
@kbd{M-a}, characters are typed, and insertion mode is finished by typing
a further @key{ESC}. (The default is Emacs-style, and only that is
| described here: for @emph{vi} mode see the @strong{readline} |
| documentation.) |
| |
Pressing the @key{RET} key at any time causes the command to be
re-submitted.
| |
| Other editing actions are summarized in the following table. |
| |
| @appendixsection Command-line editor summary |
| |
| @subheading Command recall and vertical motion |
| |
| @table @kbd |
| @item C-p |
| Go to the previous command (backwards in the history). |
| @item C-n |
| Go to the next command (forwards in the history). |
| @item C-r @var{text} |
Find the last command with the @var{text} string in it. This can be
cancelled by @kbd{C-g} (and on some versions of @R{} by @kbd{C-c}).
| @end table |
| |
| On most terminals, you can also use the up and down arrow keys instead |
| of @kbd{C-p} and @kbd{C-n}, respectively. |
| |
| @subheading Horizontal motion of the cursor |
| |
| @table @kbd |
| @item C-a |
| Go to the beginning of the command. |
| @item C-e |
| Go to the end of the line. |
| @item M-b |
| Go back one word. |
| @item M-f |
| Go forward one word. |
| @item C-b |
| Go back one character. |
| @item C-f |
| Go forward one character. |
| @end table |
| |
| On most terminals, you can also use the left and right arrow keys |
| instead of @kbd{C-b} and @kbd{C-f}, respectively. |
| |
| @subheading Editing and re-submission |
| |
| @table @kbd |
| @item @var{text} |
| Insert @var{text} at the cursor. |
| @item C-f @var{text} |
| Append @var{text} after the cursor. |
| @item @key{DEL} |
| Delete the previous character (left of the cursor). |
| @item C-d |
| Delete the character under the cursor. |
| @item M-d |
| Delete the rest of the word under the cursor, and ``save'' it. |
| @item C-k |
| Delete from cursor to end of command, and ``save'' it. |
| @item C-y |
| Insert (yank) the last ``saved'' text here. |
@item C-t
Transpose the character before the cursor with the one under it.
@item M-l
Change the rest of the word to lower case.
@item M-u
Change the rest of the word to upper case.
@item M-c
Capitalize the rest of the word.
| @item @key{RET} |
| Re-submit the command to @R{}. |
| @end table |
| |
| The final @key{RET} terminates the command line editing sequence. |
| |
The @strong{readline} key bindings can be customized in the usual way
@emph{via} a @file{~/.inputrc} file. These customizations can be
conditioned on the application @code{R} by including a section like
| |
| @example |
| $if R |
| "\C-xd": "q('no')\n" |
| $endif |
| @end example |
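As a further sketch, the following section would make the up and down
arrow keys search the history for earlier commands beginning with the
text already typed, but only within @R{}. It assumes the terminal sends
@samp{\e[A} and @samp{\e[B} for these keys, as many xterm-compatible
terminals do; others may send different sequences.

@example
$if R
"\e[A": history-search-backward
"\e[B": history-search-forward
$endif
@end example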
| |
| @node Function and variable index, Concept index, The command-line editor, Top |
| @appendix Function and variable index |
| |
| @printindex vr |
| |
| @node Concept index, References, Function and variable index, Top |
| @appendix Concept index |
| |
| @printindex cp |
| |
| @node References, , Concept index, Top |
| @appendix References |
| |
| D.@: M.@: Bates and D.@: G.@: Watts (1988), @emph{Nonlinear Regression |
| Analysis and Its Applications.} John Wiley & Sons, New York. |
| |
| @noindent |
| Richard A.@: Becker, John M.@: Chambers and Allan R.@: Wilks (1988), |
| @emph{The New S Language.} Chapman & Hall, New York. |
| This book is often called the ``@emph{Blue Book}''. |
| |
| @noindent |
| John M.@: Chambers and Trevor J.@: Hastie eds. (1992), |
| @emph{Statistical Models in S.} Chapman & Hall, New York. |
| This is also called the ``@emph{White Book}''. |
| |
| @noindent |
John M.@: Chambers (1998),
| @emph{Programming with Data}. Springer, New York. |
| This is also called the ``@emph{Green Book}''. |
| |
| @noindent |
A.@: C.@: Davison and D.@: V.@: Hinkley (1997), @emph{Bootstrap Methods
and Their Application}. Cambridge University Press.
| |
| @noindent |
Annette J.@: Dobson (1990), @emph{An Introduction to Generalized Linear
Models}. Chapman and Hall, London.
| |
| @noindent |
| Peter McCullagh and John A.@: Nelder (1989), @emph{Generalized Linear |
| Models.} Second edition, Chapman and Hall, London. |
| |
| @noindent |
John A.@: Rice (1995), @emph{Mathematical Statistics and Data Analysis.}
| Second edition. Duxbury Press, Belmont, CA. |
| |
| @noindent |
| S.@: D.@: Silvey (1970), @emph{Statistical Inference.} Penguin, London. |
| |
| @bye |