Joblib is a nice Python library for "lightweight pipelining", which includes, for example, "easy simple parallel computing". scikit-learn relies on it heavily, for instance to speed up machine learning algorithms. However, joblib has some quirks.
One of them is that joblib.Parallel (used for easily parallelizing for-loops) overwrites certain variables defined in the outer scope. These variables include, for example, args, parser, exitcode, and spawn (see pierreglaser's response here). This can be rather unexpected and cause confusion. There is a bug report asking for this behavior to be either fixed or documented. See the following code for an example:
```python
import joblib

args = 5

def func():
    # Reads the module-level name `args` at call time inside the worker
    return args

joblib.Parallel(n_jobs=2)(joblib.delayed(func)() for i in range(2))
```
In the previous code snippet you would expect to get a list with two 5s: [5, 5]. Instead, you get a list of two Namespace objects defined by the joblib.Parallel "context".
For reference, this behavior was observed with Python 3, joblib 0.13.1, and Ubuntu 18.04 (kernel Linux 4.15.0-43-generic x86_64).
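One way to avoid depending on outer-scope names altogether is to pass the value into the function as an explicit argument. The sketch below uses only joblib's public Parallel and delayed; the parameter name value is an illustrative choice, not something from the bug report. Because the value travels with the call itself, func never has to look up the global name args inside a worker process.

```python
import joblib

args = 5  # module-level value we want every worker to use


def func(value):
    # Use the explicitly passed parameter instead of the global `args`,
    # so the result does not depend on how `args` is bound in the worker.
    return value


results = joblib.Parallel(n_jobs=2)(
    joblib.delayed(func)(args) for _ in range(2)
)
print(results)  # [5, 5]
```

The same pattern scales to several values: add them as extra parameters (or bundle them into one object) rather than reading them from the enclosing module.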