Error running Python’s GLMnet implementation `glmnet_py` (missing libgfortran.so.3)

For machine learning tasks with few instances (n) but many features, the ElasticNet (Zou and Hastie, 2005) often works well in my experience.

Missing libgfortran.so.3

To use the ElasticNet in Python, there are various packages on PyPI. I used glmnet_py/glmnet-py since it is referred to by the glmnet_python wrapper published on GitHub, which is in turn linked by the original authors.

However, when I installed glmnet_py from PyPI via “pip install glmnet_py” (as recommended by the documentation), I got an error about a missing library file, namely libgfortran.so.3:

To solve this, I had to compile and install glmnet_py manually from the code on GitHub:

This happened to me on Fedora 29 with Python 3.7.2.
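To check whether a given machine is likely to be affected, one can ask the dynamic loader directly whether it can open libgfortran.so.3. The following is only a minimal sketch using Python's standard ctypes module. The helper name can_load is made up for illustration, and the assumption that the error comes from a prebuilt binary linked against an older libgfortran than the distribution ships is mine, not something confirmed by the package authors.

```python
import ctypes
import ctypes.util


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the library file cannot be found or opened.
        return False


name = "libgfortran.so.3"
print(f"{name} loadable: {can_load(name)}")

# find_library expects the name without the "lib" prefix and the suffix.
# It reports whichever libgfortran version the system provides, or None.
print("libgfortran found as:", ctypes.util.find_library("gfortran"))
```

If the first line prints False while the second one reports a different libgfortran version, the system has a Fortran runtime, just not the one the prebuilt package expects.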

To track this issue, please see here.

Different versions of the same thing

The glmnet_py situation seems a little strange at first glance since there are two very similar PyPI projects, namely glmnet_py and glmnet_python.

However, the latter seems to have been removed since a) its project description also refers to glmnet_py and b) pip seems not to be able to install the latter (at least for me).

Similarly, there are two GitHub repositories: one (by bbalasub1) linked by the original authors and one (by hanfang) that claims to have modified the former. However, the one by hanfang seems outdated. In contrast, the one by bbalasub1 has actually seen some updates as of the time of writing this note.

Other ElasticNet implementations in Python

Of course, there are other ElasticNet implementations in Python. I have simply opted for the one above since it seems to be closely related to the original authors as well as the R code used by my colleagues. Here are some other examples of Python implementations:
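As one illustration of such an alternative (my pick for this sketch, not necessarily one of the packages meant above), scikit-learn ships an ElasticNet estimator. The example below fits it on synthetic data with few instances and many features. The data, the seed and the parameter values are all assumptions chosen for illustration. Note that scikit-learn's parameter names differ from glmnet's: here alpha is the overall penalty strength and l1_ratio is the mixing between L1 and L2, whereas glmnet uses lambda for the strength and alpha for the mixing.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Synthetic "few instances, many features" data (illustrative only).
rng = np.random.default_rng(0)
n_samples, n_features = 30, 200
X = rng.standard_normal((n_samples, n_features))

# Only the first five features actually influence the target.
true_coef = np.zeros(n_features)
true_coef[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
y = X @ true_coef + 0.1 * rng.standard_normal(n_samples)

# alpha = overall penalty strength, l1_ratio = share of the L1 penalty.
model = ElasticNet(alpha=0.5, l1_ratio=0.5, max_iter=10_000)
model.fit(X, y)

n_selected = int(np.count_nonzero(model.coef_))
print(f"non-zero coefficients: {n_selected} of {n_features}")
```

In practice the penalty strength would be chosen by cross-validation (scikit-learn offers ElasticNetCV for that) rather than fixed by hand as done here.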

Power-law, Pareto, Zipf and Scale-Free distributions

I did some related work on human mobility recently and kept coming across the terms power-law, Pareto, Zipf and scale-free distributions. And, shame on me, I did not know the “difference”. Indeed, it turns out that all these notions are essentially names for the same thing, as explained by

Power laws, Pareto distributions and Zipf’s law
M. Newman. Contemporary Physics 46 (5): 323-351 (2005)

In particular it says about the power-law, Zipf’s law and the Pareto distribution:

Since power-law cumulative distributions imply a power-law form for p(x), “Zipf’s law” and “Pareto distribution” are effectively synonymous with “power-law distribution”.
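The step behind this statement can be written out in one line. This is my own short derivation sketch, assuming a pure power law with normalisation constant C and exponent α > 1, and writing P(x) for the probability of observing a value of at least x:

```latex
p(x) = C\,x^{-\alpha}
\quad\Longrightarrow\quad
P(x) = \int_x^{\infty} p(x')\,\mathrm{d}x'
     = \frac{C}{\alpha - 1}\,x^{-(\alpha - 1)},
\qquad
p(x) = -\frac{\mathrm{d}P}{\mathrm{d}x}.
```

So the density and the cumulative distribution are both power laws, with exponents that differ by exactly one, and either one determines the other.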

With regard to the scale-free aspect of power-law, it says:

A power-law distribution is also sometimes called a scale-free distribution. Why? Because a power law is the only distribution that is the same whatever scale we look at it on.
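The scale-free property can be checked numerically: for a power law, rescaling x by a factor b changes p(x) by the same factor b^(-α) everywhere, while for other distributions the factor depends on x. Below is a minimal sketch. The function names, the exponent 2.5 and the comparison with an exponential distribution are my own choices for illustration.

```python
import math


def power_law(x: float, alpha: float = 2.5, c: float = 1.0) -> float:
    """Unnormalised power law p(x) = c * x^(-alpha)."""
    return c * x ** (-alpha)


def exponential(x: float, lam: float = 1.0) -> float:
    """Exponential density p(x) = lam * exp(-lam * x)."""
    return lam * math.exp(-lam * x)


def rescaling_ratios(p, b, xs):
    """Return p(b * x) / p(x) for every x in xs."""
    return [p(b * x) / p(x) for x in xs]


xs = [1.0, 10.0, 100.0]
b = 2.0
pl = rescaling_ratios(power_law, b, xs)
ex = rescaling_ratios(exponential, b, xs)

# The power-law ratios are all equal to b^(-alpha), independent of x.
print("power law  :", pl)
# The exponential ratios change with x, so there is a characteristic scale.
print("exponential:", ex)
```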

And, finally, a little fun fact:

Zipf’s law and the Pareto distribution differ from one another in the way the cumulative distribution is plotted—Zipf made his plots with x on the horizontal axis and P(x) on the vertical one; Pareto did it the other way around. This causes much confusion in the literature, but the data depicted in the plots are of course identical.
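To see that both plotting conventions show the same data, one can build the empirical cumulative distribution from a sample and simply swap the axes. The sketch below uses random.paretovariate from the Python standard library. The sample size, the shape parameter and the seed are arbitrary assumptions for illustration.

```python
import random

random.seed(42)
alpha = 1.5  # shape parameter, chosen arbitrarily for this sketch
samples = [random.paretovariate(alpha) for _ in range(1000)]

# Sort from largest to smallest: the r-th largest value has rank r,
# and (barring ties) exactly r samples are greater than or equal to it.
ranked = sorted(samples, reverse=True)
n = len(ranked)

# One orientation: x on the horizontal axis, empirical P(x) on the vertical.
x_vs_ccdf = [(x, rank / n) for rank, x in enumerate(ranked, start=1)]

# The other orientation: the very same pairs with the axes swapped.
ccdf_vs_x = [(p, x) for x, p in x_vs_ccdf]

for x, p in x_vs_ccdf[:3]:
    print(f"x = {x:10.3f}   P(x) = {p:.3f}")
```

Plotting either list on doubly logarithmic axes should give a roughly straight line; only the slope is inverted between the two orientations.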