Parallelize Map()
This article is originally published at https://statcompute.wordpress.com
Map() is a convenient routine in Python to apply a function to all items from one or more lists, as shown below. This specific nature also makes map() a perfect candidate for the parallelism.
In [1]: a = (1, 2, 3) In [2]: b = (10, 20, 30) In [3]: def func(a, b): ...: print "a -->", a, "b -->", b ...: In [4]: ### SERIAL CALL ### In [5]: map(func, a, b) a --> 1 b --> 10 a --> 2 b --> 20 a --> 3 b --> 30
Pool.map() function in Multiprocessing Package is the parallel implementation of map(). However, a drawback is that Pool.map() doesn’t support more than one arguments in the function call. Therefore, in case of a functional call with multiple arguments, a wrapper function is necessary to make it working, which however should be defined before importing Multiprocessing package.
In [6]: ### PARALLEL CALL ### In [7]: ### SINCE POOL.MAP() DOESN'T TAKE MULTIPLE ARGUMENTS, A WRAPPER IS NEEDED In [8]: def f2(ab): ...: a, b = ab ...: return func(a, b) ...: In [9]: from multiprocessing import Pool, cpu_count In [10]: pool = Pool(processes = cpu_count()) In [11]: ### PARALLEL MAP() ON ALL CPUS In [12]: pool.map(f2, zip(a, b)) a --> 1 b --> 10 a --> 2 b --> 20 a --> 3 b --> 30
In addition, Pool.apply() function, with some tweaks, can also be employed to mimic the parallel version of map(). The advantage of this approach is that, different from Pool.map(), Pool.apply() is able to handle multiple arguments by using the list comprehension.
In [13]: ### POOL.APPLY() CAN ALSO MIMIC MAP() In [14]: [pool.apply(func, args = (i, j)) for i, j in zip(a, b)] a --> 1 b --> 10 a --> 2 b --> 20 a --> 3 b --> 30
Alternatively, starmap() function in the parmap package (https://github.com/zeehio/parmap), which is specifically designed to overcome limitations in Pool.map(), provides a more friendly and elegant interface to implement the parallelized map() with multiple arguments at the cost of a slight computing overhead.
In [15]: ### ALTERNATIVELY, PARMAP PACKAGE IS USED In [16]: from parmap import starmap In [17]: starmap(func, zip(a, b), pool = Pool(processes = cpu_count())) a --> 1 b --> 10 a --> 2 b --> 20 a --> 3 b --> 30
Thanks for visiting r-craft.org
This article is originally published at https://statcompute.wordpress.com
Please visit source website for post related comments.