For machine learning tasks with few instances (n) but many features, the ElasticNet (Zou and Hastie, 2005) often works well in practice, in my experience.
Missing libgfortran.so.3
To use the ElasticNet in Python, there are various packages on PyPI. I used glmnet_py/glmnet-py since it is referred to by the glmnet_python wrapper published on GitHub, which is in turn linked by the original authors. However, when I installed glmnet_py from PyPI via “pip install glmnet_py” (as recommended by the documentation), I got an error about a missing library file, namely libgfortran.so.3:
    OSError: libgfortran.so.3: cannot open shared object file: No such file or directory
To solve this, I had to compile and install glmnet_py manually from the code on GitHub:
    git clone https://github.com/bbalasub1/glmnet_python.git
    cd glmnet_python
    python setup.py install
This happened to me on Fedora 29 with Python 3.7.2.
To track this issue, please see here.
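To check whether a given machine is affected (or whether a rebuild actually helped), one option is to ask the dynamic loader directly. The following is a minimal sketch using only the standard library; the helper name can_load is my own and not part of glmnet_py.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Same failure mode as the error above: the file cannot be opened.
        return False
    return True


if __name__ == "__main__":
    # The library name is taken from the error message above.
    name = "libgfortran.so.3"
    status = "found" if can_load(name) else "missing"
    print(f"{name}: {status}")
```

If this reports the library as missing, the PyPI build of glmnet_py will most likely fail on import in the same way, and compiling from source as described above is one way around it.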
Different versions of the same thing
The glmnet_py situation seems a little strange at first glance, since there are two very similar PyPI projects, namely glmnet_py and glmnet_python. However, the latter seems to have been removed, since a) its project description also refers to glmnet_py and b) pip does not seem to be able to install it (at least for me).
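With two near-identical names around, it can be useful to see which distribution (if any) ended up in the current environment. A small sketch, assuming Python 3.8 or newer (importlib.metadata is not available in the 3.7.2 I used at the time); the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The two distribution names discussed above.
    for name in ("glmnet_py", "glmnet_python"):
        version = installed_version(name)
        print(f"{name}: {version if version is not None else 'not installed'}")
```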
Similarly, there are two GitHub repositories: one (by bbalasub1) linked by the original authors and one (by hanfang) that claims to have modified the former. However, the one by hanfang seems outdated, whereas the one by bbalasub1 has seen some updates as of the time of writing this note.
Other ElasticNet implementations in Python
Of course, there are other ElasticNet implementations in Python. I have simply opted for the one above since it seems to be closely related to the original authors as well as the R code used by my colleagues. Here are some other examples of Python implementations:
- There are several variants in scikit-learn (sklearn), for example
  - the regular regressor variant
  - one optimized by cross-validation
  - incorporated into their stochastic gradient descent regressor and classifier
  - or in a multi-task scenario
- There is also a wrapper around the original Fortran code (similar to glmnet_py) that integrates with scikit-learn (sklearn).
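For comparison, here is a minimal sketch of the first two scikit-learn variants from the list above (the regular regressor and the cross-validated one) on synthetic data with few instances and many features. The data set and all parameter values are made up for illustration and are not tuned. One thing to watch when comparing with glmnet or the R code: scikit-learn's alpha is the overall penalty strength (glmnet's lambda), while its l1_ratio is the L1/L2 mix (glmnet's alpha).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, ElasticNetCV

# Synthetic "few instances, many features" data: 50 rows, 200 columns,
# of which only 10 features actually influence the target.
X, y = make_regression(
    n_samples=50,
    n_features=200,
    n_informative=10,
    noise=1.0,
    random_state=0,
)

# Regular variant with a fixed penalty (illustrative values, not tuned).
model = ElasticNet(alpha=1.0, l1_ratio=0.5, max_iter=10000)
model.fit(X, y)
n_selected = int(np.sum(model.coef_ != 0))
print(f"non-zero coefficients: {n_selected} of {X.shape[1]}")

# Cross-validated variant: picks the penalty strength and the L1/L2 mix
# from the candidates below using 5-fold cross-validation.
cv_model = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5, max_iter=10000)
cv_model.fit(X, y)
print(f"chosen alpha: {cv_model.alpha_:.4f}, chosen l1_ratio: {cv_model.l1_ratio_}")
```

Since none of this relies on the Fortran library, it can also serve as a fallback when glmnet_py refuses to install.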