Joblib is a Python library for “lightweight pipelining”, which includes, for example, “easy simple parallel computing”. scikit-learn relies on it heavily, for instance to speed up machine learning algorithms. However, it has some quirks.
One of them is that joblib.Parallel (used to easily parallelize for-loops) overwrites certain variables defined in the outer scope. These variables include, for example,
spawn (see pierreglaser’s response here). This can be rather unexpected and cause confusion. There is a bug report asking to fix or to document this behavior. See the following code for an example:
joblib.Parallel(n_jobs=2)(joblib.delayed(func)() for i in range(1))
In the previous code snippet you would expect to get a list with two “5”s:
[5, 5]. Instead, what you get is a list of two
Namespace objects defined by the
For reference, this behavior was observed with Python 3,
joblib=0.13.1 and Ubuntu 18.04 with kernel
Linux 4.15.0-43-generic x86_64.
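One way to avoid depending on outer-scope variables at all is to hand every value to the worker function as an explicit argument. The following is only a minimal sketch of that idea, not an official fix from the joblib developers: the names `func`, `value` and `collect_values` are made up for this illustration, and it assumes your function can be changed to take parameters.

```python
import joblib


def func(value):
    # Hypothetical worker: it receives its input as an argument instead of
    # reading a variable from the enclosing module scope.
    return value


def collect_values(value, n_tasks):
    # Run func n_tasks times in parallel, passing the value explicitly
    # through joblib.delayed so each call carries its own argument.
    return joblib.Parallel(n_jobs=2)(
        joblib.delayed(func)(value) for _ in range(n_tasks)
    )


if __name__ == "__main__":
    print(collect_values(5, 2))
```

Because the value travels with each delayed call, the result no longer depends on what a same-named variable in the outer scope happens to contain when the workers run.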