% File src/library/parallel/man/unix/mcparallel.Rd
% Part of the R package, https://www.R-project.org
% Copyright 2009-2018 R Core Team
% Distributed under GPL 2 or later
\name{mcparallel}
\alias{mccollect}
\alias{mcparallel}
\title{Evaluate an \R Expression Asynchronously in a Separate Process}
\description{
These functions are based on forking and so are not available on Windows.
\code{mcparallel} starts a parallel \R process which evaluates the
given expression.
\code{mccollect} collects results from one or more parallel processes.
}
\usage{
mcparallel(expr, name, mc.set.seed = TRUE, silent = FALSE,
mc.affinity = NULL, mc.interactive = FALSE,
detached = FALSE)
mccollect(jobs, wait = TRUE, timeout = 0, intermediate = FALSE)
}
\arguments{
\item{expr}{expression to evaluate (do \emph{not} use any on-screen
devices or GUI elements in this code, see \code{\link{mcfork}} for
the inadvisability of using \code{mcparallel} with GUI front-ends
and multi-threaded libraries). Raw vectors are reserved for
internal use and cannot be returned, but the expression may evaluate
e.g. to a list holding a raw vector. \code{NULL} should not be returned
because it is used by \code{mccollect} to signal an error. }
\item{name}{an optional name (character vector of length one) that can
be associated with the job.}
\item{mc.set.seed}{logical: see section \sQuote{Random numbers}.}
\item{silent}{if set to \code{TRUE} then all output on stdout will be
suppressed (stderr is not affected).}
\item{mc.affinity}{either a numeric vector specifying CPUs to restrict
the child process to (1-based) or \code{NULL} to not modify the CPU
affinity.}
\item{mc.interactive}{logical, if \code{TRUE} or \code{FALSE} then the
child process will be set as interactive or non-interactive
respectively. If \code{NA} then the child process will inherit the
interactive flag from the parent.}
\item{detached}{logical, if \code{TRUE} then the job is detached from
the current session and cannot deliver any results back; it is used
only for the side effects of the code.}
\item{jobs}{list of jobs (or a single job) to collect results
for. Alternatively, \code{jobs} can be an integer vector of
process IDs. If omitted, \code{mccollect} will wait for all currently
existing children.}
\item{wait}{if set to \code{FALSE} it checks for any results that are
available within \code{timeout} seconds from now, otherwise it waits
for all specified jobs to finish.}
\item{timeout}{timeout (in seconds) to check for job results -- applies
only if \code{wait} is \code{FALSE}.}
\item{intermediate}{\code{FALSE} or a function which will be called while
\code{mccollect} waits for results. The function will be called with one
argument, the list of results received so far.}
}
\details{
\code{mcparallel} evaluates the \code{expr} expression in parallel to
the current \R process. Everything is shared read-only (or in fact
copy-on-write) between the parallel process and the current process,
i.e.\sspace{}no side-effects of the expression affect the main process. The
result of the parallel execution can be collected using the
\code{mccollect} function.

The \code{mccollect} function collects any available results from parallel
jobs (or in fact any child process). If \code{wait} is \code{TRUE}
then \code{mccollect} waits for all specified jobs to finish before
returning a list containing the last reported result for each
job. If \code{wait} is \code{FALSE} then \code{mccollect} merely
checks for any results available at the moment and will not wait for
jobs to finish. If \code{jobs} is specified, jobs not listed there
will not be affected or acted upon.

Note: If \code{expr} uses low-level multicore functions such
as \code{\link{sendMaster}} a single job can deliver results
multiple times and it is the responsibility of the user to interpret
them correctly. \code{mccollect} will return \code{NULL} for a
terminating job that has already sent its results, after which the
job is no longer available.

Jobs are identified by process IDs (even when referred to as job objects),
which are reused by the operating system. Detached jobs created by
\code{mcparallel} can thus never be safely referred to by their process
IDs or job objects. Non-detached jobs are guaranteed to exist until
collected by \code{mccollect}, even if crashed or terminated by a signal.
Once collected by \code{mccollect}, a job is regarded as detached, and
thus must no longer be referred to by its process ID or its job object.
With \code{wait = TRUE}, all jobs passed to \code{mccollect} are
collected. With \code{wait = FALSE}, the collected jobs are given as
names of the result vector, and thus in subsequent calls to
\code{mccollect} these jobs must be excluded. Job objects should be used
in preference to process IDs whenever accepted by the API.

The \code{mc.affinity} parameter can be used to try to restrict
the child process to specific CPUs. The availability and extent of
this feature are system-dependent (e.g., some systems will only
consider the CPU count, others will ignore it completely).
}
\value{
\code{mcparallel} returns an object of the class \code{"parallelJob"}
which inherits from \code{"childProcess"} (see the \sQuote{Value}
section of the help for \code{\link{mcfork}}). If argument
\code{name} was supplied this will have an additional component
\code{name}.

\code{mccollect} returns any results that are available in a list. The
results will have the same order as the specified jobs. If there are
multiple jobs and a job has a name it will be used to name the
result, otherwise its process ID will be used. If none of the
specified children are still running, it returns \code{NULL}.
}
\section{Random numbers}{
If \code{mc.set.seed = FALSE}, the child process has the same initial
random number generator (RNG) state as the current \R session. If the
RNG has been used (or \code{.Random.seed} was restored from a saved
workspace), the child will start drawing random numbers at the same
point as the current session. If the RNG has not yet been used, the
child will set a seed based on the time and process ID when it first
uses the RNG: this is almost certain to give a different
random-number stream from the current session and any other child
process.

The behaviour with \code{mc.set.seed = TRUE} is different only if
\code{\link{RNGkind}("L'Ecuyer-CMRG")} has been selected. Then each
time a child is forked it is given the next stream (see
\code{\link{nextRNGStream}}). So if you select that generator, set a
seed and call \code{\link{mc.reset.stream}} just before the first use
of \code{mcparallel} the results of simulations will be reproducible
provided the same tasks are given to the first, second, \ldots{}
forked process.
}
\note{
Prior to \R 3.4.0 and on a 32-bit platform, the \link{serialize}d
result from each forked process is limited to \eqn{2^{31} - 1}{2^31 -
1} bytes. (Returning very large results via serialization is
inefficient and should be avoided.)
}
\author{
Simon Urbanek and R Core.

Derived from the \pkg{multicore} package formerly on
\acronym{CRAN} (but with different handling of the RNG stream).
}
\seealso{
\code{\link{pvec}}, \code{\link{mclapply}}
}
\examples{
p <- mcparallel(1:10)
q <- mcparallel(1:20)
# wait for both jobs to finish and collect all results
res <- mccollect(list(p, q))
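# A minimal sketch of how results are labelled, based on the Value
# section: results keep the order of the jobs passed to mccollect, and
# with multiple jobs a job name (if given) is used instead of the
# process ID. The job names "short" and "long" are arbitrary
# illustrations.
a <- mcparallel(sum(1:10), name = "short")
b <- mcparallel(sum(1:20), name = "long")
named <- mccollect(list(a, b))
names(named)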
## IGNORE_RDIFF_BEGIN
## reports process ids, so not reproducible
p <- mcparallel(1:10)
mccollect(p, wait = FALSE, 10) # will retrieve the result (since it's fast)
mccollect(p, wait = FALSE) # will signal the job as terminating
mccollect(p, wait = FALSE) # there is no longer such a job
## IGNORE_RDIFF_END
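## IGNORE_RDIFF_BEGIN
## A sketch of the intermediate argument of mccollect: the function is
## called with the list of results received so far. How often it is
## called depends on timing, so the messages are not reproducible.
## The name progress is illustrative only and not part of the API.
progress <- function(sofar) {
    # count the entries that have already been filled in
    done <- sum(!vapply(sofar, is.null, logical(1)))
    message(done, " of ", length(sofar), " results received so far")
}
sqjobs <- lapply(1:3, function(i) mcparallel(i^2, name = paste0("sq", i)))
squares <- mccollect(sqjobs, intermediate = progress)
## IGNORE_RDIFF_END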
\dontshow{set.seed(123, "L'Ecuyer"); mc.reset.stream()}
# a naive parallel lapply can be created using mcparallel alone:
jobs <- lapply(1:10, function(x) mcparallel(rnorm(x), name = x))
mccollect(jobs)
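# A sketch of the recipe from the Random numbers section, assuming the
# default mc.set.seed = TRUE: select the L'Ecuyer-CMRG generator, set a
# seed and call mc.reset.stream() just before the first mcparallel().
# The helper name draw and the seed value are illustrative only.
RNGkind("L'Ecuyer-CMRG")
draw <- function() {
    set.seed(2018)      # fix the seed ...
    mc.reset.stream()   # ... and restart the stream sequence from it
    rjobs <- lapply(1:3, function(i)
        mcparallel(rnorm(2), name = paste0("stream", i)))
    mccollect(rjobs)
}
# the same tasks are forked in the same order in both runs, so the two
# result lists should agree
identical(draw(), draw())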
}
\keyword{interface}