
purrr's map functions have a .parallel argument to parallelize a map using the mirai package. This allows you to run computations in parallel using more cores on your machine, or distributed over the network.

Parallelizing a map using n processes does not automatically reduce its run time to 1/n. Overhead from setting up the parallel task and communicating with the parallel processes eats into this benefit, and can outweigh it for very short tasks or those involving large amounts of data. The threshold at which parallelization becomes clearly beneficial depends on your individual setup and task, but as a rough guide each map iteration should take on the order of 100 microseconds to 1 millisecond or more.
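One way to judge whether a map is worth parallelizing is to time it sequentially first and work out the cost per iteration. The sketch below assumes purrr is installed; slow_task() and the iteration count are hypothetical stand-ins for your own work.

```r
library(purrr)

# Hypothetical per-iteration task: fit a small linear model.
slow_task <- function(i) {
  coef(lm(mpg ~ wt + hp, data = mtcars))
}

n_iter <- 200

# Time the sequential map, then convert to milliseconds per iteration.
elapsed <- system.time(map(seq_len(n_iter), slow_task))[["elapsed"]]
per_iter_ms <- 1000 * elapsed / n_iter

message(sprintf("About %.3f ms per iteration", per_iter_ms))
```

If the per-iteration time is well below the rough guide above, the overhead of parallelization is likely to dominate.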

Daemons settings

How and where parallelization occurs is determined by mirai::daemons(). This is a function from the mirai package that sets up daemons (persistent background processes that receive parallel computations) on your local machine or across the network.

purrr requires these to be set up prior to performing any parallel map operations. It is usual to set daemons once per session. You can leave them running as they consume almost no resources whilst waiting to receive tasks. The following sets up 6 daemons on your local machine:


mirai::daemons(6)

daemons() arguments:

  • n: the number of daemons to launch on your local machine, e.g. mirai::daemons(6). As a rule of thumb, for maximum efficiency this should be (at most) one less than the number of cores on your machine, leaving one core for the main R process.

  • url and remote: used to set up and launch daemons for distributed computing over the network. See the mirai::daemons() documentation for more details.

  • None: calling mirai::daemons() with no arguments returns a summary of the current connection status and mirai tasks.

For details on further options, see mirai::daemons.
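The rule of thumb for n can be sketched as follows, assuming a fresh session with no daemons set yet. parallel::detectCores() is used to count cores and mirai::status() to inspect the result; the variable names are illustrative only.

```r
# Number of cores reported by the operating system; may be NA on some platforms.
cores <- parallel::detectCores()

# Leave one core for the main R process, but always launch at least one daemon.
n_daemons <- if (is.na(cores)) 1L else max(1L, cores - 1L)

mirai::daemons(n_daemons)

# Inspect the current connections and daemons.
mirai::status()
```

Note that detectCores() counts logical cores by default, so on machines with hyper-threading a smaller n may perform just as well.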

Resetting daemons:

Daemons persist for the duration of your session. To reset and terminate any existing daemons:


mirai::daemons(0)

All daemons automatically terminate when your session ends and the connection drops. Hence you do not need to explicitly terminate daemons in this instance, although it is still good practice to do so.

Note: setting daemons should always be left to the user. If you use a parallel map within a package, do not call mirai::daemons() anywhere in the package code. This helps prevent inadvertently spawning too many daemons when functions that each set their own daemons call one another.
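For package authors, one possible pattern is to check whether the user has already set daemons and fall back to a sequential map otherwise. This is only a sketch: square_all() is a hypothetical package function, and it assumes that mirai::status() reports zero connections when no daemons are set.

```r
# Hypothetical package function: it never calls mirai::daemons() itself.
square_all <- function(xs) {
  # Assumption: `connections` is 0 when the user has not set any daemons.
  use_parallel <- mirai::status()$connections > 0L

  # Run in parallel only if the user has opted in by setting daemons.
  purrr::map(xs, function(x) x^2, .parallel = use_parallel)
}

square_all(1:3)
```

The decision about how many daemons to run, and where, stays entirely with the user.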

Crating a function

carrier::crate() provides a systematic way of making the function .f self-contained so that it can be readily shared with other processes.

Crating ensures that everything needed by the function is serialized along with it, but not other objects which happen to be in the function's enclosing environment. This helps to prevent inadvertently shipping large data objects to daemons, where they are not needed.
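The effect on what gets serialized can be sketched by comparing a plain closure with a crated one. The factory functions and object names below are hypothetical, and exact sizes vary with your R and carrier versions, so none are shown.

```r
# Hypothetical factories: each creates a large object, then returns a
# function that does not use it.
make_plain <- function() {
  big <- rnorm(1e6)                  # about 8 MB of doubles
  function(x) x + 1                  # the closure keeps `big` in its environment
}

make_crated <- function() {
  big <- rnorm(1e6)
  carrier::crate(function(x) x + 1)  # the crate is isolated from `big`
}

# Number of bytes needed to send each function to another process.
size_plain  <- length(serialize(make_plain(), NULL))
size_crated <- length(serialize(make_crated(), NULL))

# The crated function still works, and serializes to fewer bytes.
make_crated()(1)
size_crated < size_plain
```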

Any non-package function supplied to .f will be automatically crated. When this happens, a confirmation along with the crate size is printed to the console. Package functions are not crated as these are already self-contained.

If your function .f() contains free variables, for example if its body references other local functions, then crate it explicitly with carrier::crate(), supplying these variables as named arguments through .... This ensures that these objects are available to .f() when it is executed in a parallel process.

Examples:


# package functions are not auto-crated:
map(1:3, stats::runif, .parallel = TRUE)

# other functions (incl. anonymous functions) are auto-crated:
mtcars |> map_dbl(function(...) sum(...), .parallel = TRUE)

# explicitly crate a function to include other objects required by it:
fun <- function(x) x + x %% 2
map(1:3, carrier::crate(function(x) x + fun(x), fun = fun), .parallel = TRUE)

For details on further options, see carrier::crate.

Further documentation

purrr's parallelization is powered by mirai. See the mirai introduction and reference for more details.

Crating is provided by the carrier package. See the carrier readme for more details.