“Bagging based on resampling with and without replacement is equivalent”, is it?

Recently, I found the Cross Validated post linked below, along with the corresponding paper, which states that bagging based on resampling with and without replacement is equivalent in the scenarios considered by the authors. In particular, the paper states that they found:

An equivalence between bagging based on resampling with and without replacement: resample sizes M_\text{with} = \alpha_\text{with}N and M_\text{w/o} = \alpha_\text{w/o}N (for resampling with and without replacement, respectively) produce very similar bagged statistics if \alpha_\text{with} = \alpha_\text{w/o}/(1 - \alpha_\text{w/o}). This approximate equality holds for each sample, not just on the average over samples. While our derivation is limited to U-statistics, the equivalence seems to hold more widely, as illustrated by our limited experiments with bagging CART trees.

in “Observations on Bagging” by Buja and Stuetzle (2006)

While the authors are optimistic that their finding may hold in general, this needs further investigation, since their results are mainly limited to U-statistics. Also note that it is generally NOT possible to simply swap sampling with replacement for sampling without replacement in an arbitrary application scenario, as argued in the notes on random forests below. Nevertheless, I think it is an interesting result. The following examples and notes try to illustrate what the paper “Observations on Bagging” by Buja and Stuetzle (2006) actually implies and what it does not. Please feel free to comment and discuss 🙂

Illustrations

Here is some Python code illustrating the formula mentioned in the text:
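The snippet below is a minimal sketch; the dataset, the median as the bagged statistic, and all sizes are my own choices for illustration. It compares the spread of the resampled statistic without replacement at \alpha_\text{without}=0.5, with replacement at the matched fraction \alpha_\text{with} = 0.5/(1 - 0.5) = 1, and, for contrast, with replacement at the unmatched fraction 0.5:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data and statistic (both arbitrary choices): N points, bag the median.
N = 1_000
data = rng.normal(size=N)

def resampled_medians(data, m, replace, n_bags=5_000):
    """Medians of n_bags resamples of size m, drawn with or without replacement."""
    return np.array([np.median(rng.choice(data, size=m, replace=replace))
                     for _ in range(n_bags)])

alpha_wo = 0.5                          # fraction sampled without replacement
alpha_with = alpha_wo / (1 - alpha_wo)  # matched fraction with replacement (= 1)

wo = resampled_medians(data, int(alpha_wo * N), replace=False)
matched = resampled_medians(data, int(alpha_with * N), replace=True)
naive = resampled_medians(data, int(alpha_wo * N), replace=True)  # unmatched

print(f"w/o replacement,  alpha=0.5: mean {wo.mean():+.4f}, sd {wo.std():.4f}")
print(f"with replacement, alpha=1.0: mean {matched.mean():+.4f}, sd {matched.std():.4f}")
print(f"with replacement, alpha=0.5: mean {naive.mean():+.4f}, sd {naive.std():.4f}")
```

The first two lines of output should be very close to each other, while the third (same fraction, but with replacement) should show a noticeably larger spread.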

It is apparent that, at equal resample fractions, replacement introduces additional variance, and that the formula from the paper correctly adjusts for that variance.

The following graph shows \alpha_\text{with} as a function of \alpha_\text{without}: the larger the fraction of the data we sample without replacement, the larger the fraction we must sample with replacement to obtain comparable results.
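A minimal matplotlib sketch for this curve (the grid and styling are my own choices):

```python
import numpy as np
import matplotlib.pyplot as plt

# alpha_with = alpha_without / (1 - alpha_without) diverges as alpha_without -> 1,
# so the grid stops short of 1.
alpha_without = np.linspace(0.0, 0.9, 200)
alpha_with = alpha_without / (1 - alpha_without)

plt.plot(alpha_without, alpha_with)
plt.xlabel(r"$\alpha_\mathrm{without}$")
plt.ylabel(r"$\alpha_\mathrm{with}$")
plt.title(r"$\alpha_\mathrm{with} = \alpha_\mathrm{without} / (1 - \alpha_\mathrm{without})$")
plt.show()
```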

Note on random forests

For random forests, the concept of replacement is generally considered essential. This is because the underlying concept of random forests is bagging to prevent overfitting: bagging builds an ensemble of estimators trained on resamples that vary strongly (with regard to the training data each estimator has seen), and averaging over them reduces variance.

In particular, random forests sample with replacement as many samples as are present in the training data, that is, \alpha_\text{with}=1. According to the paper, this is equivalent to \alpha_\text{without}=\alpha_\text{with}/(1 + \alpha_\text{with})=1/2 in terms of achieving the same variance as sampling with replacement. I haven’t tried this, but it sounds plausible (a sketch of how one could try it follows below). However, halving the number of samples each estimator trains on may get us into trouble if we only have a small number of samples to begin with, because the individual estimators may not have enough data to learn anything.
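This would be straightforward to check with scikit-learn’s BaggingClassifier, which, unlike RandomForestClassifier, exposes bootstrap and max_samples independently; the dataset and hyperparameters below are arbitrary choices for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)

# Classic bootstrap: alpha_with = 1 (sampling with replacement).
bag_with = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                             max_samples=1.0, bootstrap=True, random_state=0)
# Matched subsampling: alpha_without = 1/2 (sampling without replacement).
bag_without = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                                max_samples=0.5, bootstrap=False, random_state=0)

print("with replacement,    alpha=1:  ", cross_val_score(bag_with, X, y, cv=5).mean())
print("without replacement, alpha=1/2:", cross_val_score(bag_without, X, y, cv=5).mean())
```

If the equivalence carries over to bagged trees, the two cross-validation scores should end up close to each other, but this is exactly the part that, per the paper, is only supported by limited experiments.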

Finally, if we trained random forests on data sampled without replacement with \alpha_\text{without}=1, every estimator would see the entire training set: there would be no variance in the data used to train the individual estimators of the ensemble, and the ensemble would become meaningless. By the way, the \alpha_\text{with} matching \alpha_\text{without}=1 would approach infinity, as hinted at in the plot above and as can be seen directly from the formula: \alpha_\text{with} = \alpha_\text{without}/(1 - \alpha_\text{without}) diverges as \alpha_\text{without} approaches 1 (already at \alpha_\text{without}=0.9 we would need \alpha_\text{with}=9).

Links

  • https://stats.stackexchange.com/questions/130662/what-is-the-effect-of-bootstap-resampling-in-bagging-algorithmensemble-learning
  • https://www.jstor.org/stable/24307547
  • https://www.quora.com/Why-does-random-forest-use-sampling-with-replacement-instead-of-without-replacement

Acknowledgements

Thanks to Florian Lemmerich for working on this with me 🙂
