| % File src/library/parallel/man/unix/mcparallel.Rd |
| % Part of the R package, https://www.R-project.org |
| % Copyright 2009-2018 R Core Team |
| % Distributed under GPL 2 or later |
| |
| \name{mcparallel} |
| \alias{mccollect} |
| \alias{mcparallel} |
| |
| \title{Evaluate an \R Expression Asynchronously in a Separate Process} |
| \description{ |
| These functions are based on forking and so are not available on Windows. |
| |
| \code{mcparallel} starts a parallel \R process which evaluates the |
| given expression. |
| |
| \code{mccollect} collects results from one or more parallel processes. |
| } |
| \usage{ |
| mcparallel(expr, name, mc.set.seed = TRUE, silent = FALSE, |
| mc.affinity = NULL, mc.interactive = FALSE, |
| detached = FALSE) |
| |
| mccollect(jobs, wait = TRUE, timeout = 0, intermediate = FALSE) |
| } |
| \arguments{ |
| \item{expr}{expression to evaluate (do \emph{not} use any on-screen |
| devices or GUI elements in this code, see \code{\link{mcfork}} for |
| the inadvisability of using \code{mcparallel} with GUI front-ends |
| and multi-threaded libraries). Raw vectors are reserved for |
| internal use and cannot be returned, but the expression may evaluate |
| e.g. to a list holding a raw vector. \code{NULL} should not be returned |
| because it is used by \code{mccollect} to signal an error. } |
| \item{name}{an optional name (character vector of length one) that can |
| be associated with the job.} |
| \item{mc.set.seed}{logical: see section \sQuote{Random numbers}.} |
| \item{silent}{if set to \code{TRUE} then all output on stdout will be |
| suppressed (stderr is not affected).} |
| \item{mc.affinity}{either a numeric vector specifying CPUs to restrict |
| the child process to (1-based) or \code{NULL} to not modify the CPU |
| affinity.} |
| \item{mc.interactive}{logical, if \code{TRUE} or \code{FALSE} then the |
| child process will be set as interactive or non-interactive |
| respectively. If \code{NA} then the child process will inherit the |
| interactive flag from the parent.} |
| \item{detached}{logical, if \code{TRUE} then the job is detached from |
| the current session and cannot deliver any results back; it is used |
| for the side effects of the code only.} |
| \item{jobs}{list of jobs (or a single job) to collect results |
| for. Alternatively \code{jobs} can also be an integer vector of |
| process IDs. If omitted, \code{mccollect} will wait for all currently |
| existing children.} |
| \item{wait}{if set to \code{FALSE} it checks for any results that are |
| available within \code{timeout} seconds from now, otherwise it waits |
| for all specified jobs to finish.} |
| \item{timeout}{timeout (in seconds) to check for job results -- applies |
| only if \code{wait} is \code{FALSE}.} |
| \item{intermediate}{\code{FALSE} or a function which will be called while |
| \code{mccollect} waits for results. The function will be called with one |
| parameter which is the list of results received so far.} |
| } |
| \details{ |
| \code{mcparallel} evaluates the \code{expr} expression in parallel to |
| the current \R process. Everything is shared read-only (or in fact |
| copy-on-write) between the parallel process and the current process, |
| i.e.\sspace{}no side-effects of the expression affect the main process. The |
| result of the parallel execution can be collected using the |
| \code{mccollect} function. |
| |
| \code{mccollect} collects any available results from parallel |
| jobs (or in fact any child process). If \code{wait} is \code{TRUE} |
| then \code{mccollect} waits for all specified jobs to finish before |
| returning a list containing the last reported result for each |
| job. If \code{wait} is \code{FALSE} then \code{mccollect} merely |
| checks for any results available at the moment and will not wait for |
| jobs to finish. If \code{jobs} is specified, jobs not listed there |
| will not be affected or acted upon. |
| |
| Note: If \code{expr} uses low-level multicore functions such |
| as \code{\link{sendMaster}} a single job can deliver results |
| multiple times and it is the responsibility of the user to interpret |
| them correctly. \code{mccollect} will return \code{NULL} for a |
| terminating job that has already sent its results, after which the |
| job is no longer available. |
| |
| Jobs are identified by process IDs (even when referred to as job objects), |
| which are reused by the operating system. Detached jobs created by |
| \code{mcparallel} can thus never be safely referred to by their process |
| IDs nor job objects. Non-detached jobs are guaranteed to exist until |
| collected by \code{mccollect}, even if crashed or terminated by a signal. |
| Once collected by \code{mccollect}, a job is regarded as detached, and |
| thus must no longer be referred to by its process ID nor its job object. |
| With \code{wait = TRUE}, all jobs passed to \code{mccollect} are |
| collected. With \code{wait = FALSE}, the collected jobs are given as |
| names of the result vector, and thus in subsequent calls to |
| \code{mccollect} these jobs must be excluded. Job objects should be used |
| in preference to process IDs whenever accepted by the API. |
| |
| The \code{mc.affinity} parameter can be used to try to restrict |
| the child process to specific CPUs. The availability and the extent of |
| this feature is system-dependent (e.g., some systems will only |
| consider the CPU count, others will ignore it completely). |
| } |
| \value{ |
| \code{mcparallel} returns an object of the class \code{"parallelJob"} |
| which inherits from \code{"childProcess"} (see the \sQuote{Value} |
| section of the help for \code{\link{mcfork}}). If argument |
| \code{name} was supplied this will have an additional component |
| \code{name}. |
| |
| \code{mccollect} returns any results that are available in a list. The |
| results will have the same order as the specified jobs. If there are |
| multiple jobs and a job has a name it will be used to name the |
| result, otherwise its process ID will be used. If none of the |
| specified children are still running, it returns \code{NULL}. |
| } |
| \section{Random numbers}{ |
| If \code{mc.set.seed = FALSE}, the child process has the same initial |
| random number generator (RNG) state as the current \R session. If the |
| RNG has been used (or \code{.Random.seed} was restored from a saved |
| workspace), the child will start drawing random numbers at the same |
| point as the current session. If the RNG has not yet been used, the |
| child will set a seed based on the time and process ID when it first |
| uses the RNG: this is pretty much guaranteed to give a different |
| random-number stream from the current session and any other child |
| process. |
| |
| The behaviour with \code{mc.set.seed = TRUE} is different only if |
| \code{\link{RNGkind}("L'Ecuyer-CMRG")} has been selected. Then each |
| time a child is forked it is given the next stream (see |
| \code{\link{nextRNGStream}}). So if you select that generator, set a |
| seed and call \code{\link{mc.reset.stream}} just before the first use |
| of \code{mcparallel} the results of simulations will be reproducible |
| provided the same tasks are given to the first, second, \ldots{} |
| forked process. |
| } |
| |
| \note{ |
| Prior to \R 3.4.0 and on a 32-bit platform, the \link{serialize}d |
| result from each forked process is limited to \eqn{2^{31} - 1}{2^31 - |
| 1} bytes. (Returning very large results via serialization is |
| inefficient and should be avoided.) |
| } |
| |
| |
| \author{ |
| Simon Urbanek and R Core. |
| |
| Derived from the \pkg{multicore} package formerly on |
| \acronym{CRAN} (but with different handling of the RNG stream). |
| } |
| \seealso{ |
| \code{\link{pvec}}, \code{\link{mclapply}} |
| } |
| \examples{ |
| p <- mcparallel(1:10) |
| q <- mcparallel(1:20) |
| # wait for both jobs to finish and collect all results |
| res <- mccollect(list(p, q)) |
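| |
| # A minimal sketch of the copy-on-write behaviour described in the |
| # Details section: an assignment made inside expr changes only the |
| # child's copy, so the parent's value should be left untouched. |
| # (x0, job and val are illustrative names, not part of the API.) |
| x0 <- 1 |
| job <- mcparallel({ x0 <- 2; x0 }) |
| val <- mccollect(job)[[1]] |
| val # value computed in the child |
| x0 # still the value set in the parent |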
| |
| ## IGNORE_RDIFF_BEGIN |
| ## reports process ids, so not reproducible |
| p <- mcparallel(1:10) |
| mccollect(p, wait = FALSE, 10) # will retrieve the result (since it's fast) |
| mccollect(p, wait = FALSE) # will signal the job as terminating |
| mccollect(p, wait = FALSE) # there is no longer such a job |
| ## IGNORE_RDIFF_END |
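| |
| # When several jobs are collected together, a job that was given a |
| # name is returned under that name rather than under its process ID. |
| # (The names total and avg are illustrative only.) |
| named <- mccollect(list(mcparallel(sum(1:10), name = "total"), |
| mcparallel(mean(1:10), name = "avg"))) |
| named$total |
| named$avg |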
| |
| \dontshow{set.seed(123, "L'Ecuyer"); mc.reset.stream()} |
| # a naive parallel lapply can be created using mcparallel alone: |
| jobs <- lapply(1:10, function(x) mcparallel(rnorm(x), name = x)) |
| mccollect(jobs) |
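| |
| # A sketch of the reproducibility recipe from the Random numbers |
| # section: with the L'Ecuyer-CMRG generator, re-seeding and calling |
| # mc.reset.stream() before forking should hand out the same streams |
| # again. draw2 is a hypothetical helper defined only for this example; |
| # results are unnamed because the process IDs differ between runs. |
| RNGkind("L'Ecuyer-CMRG") |
| draw2 <- function() { |
| set.seed(123); mc.reset.stream() |
| unname(mccollect(list(mcparallel(rnorm(3)), mcparallel(rnorm(3))))) |
| } |
| identical(draw2(), draw2()) # should be TRUE |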
| } |
| \keyword{interface} |