[MRG] Add quantile regression #9978
Conversation
Thanks. You'll need a lot of patience for core devs to review this in detail, I suspect. We have a number of pressing API issues as well as highly requested features that have been in the reviewing queue for some time. But I hope we'll get to this soon enough!
@avidale Thanks a lot for your contribution! @jnothman has already outlined the broader picture, but FWIW here are a few pointers from a wanderer for when the core devs become available. First, I would suggest looking into resolving the CI test failures. After that, you might want to build the PR up to a point where you feel comfortable changing its status from [WIP] to [MRG]. (Of course, this is of no use if you actually need some comments/ideas before you can work any further.) In my limited experience, [WIP]s are usually not prioritized for review (but don't quote me on this ;)), so you might want to consider the change. Finally, when you get to that point, you might want to tag some of the core devs that participated in the original discussion, since some of them might have missed the initial post.
total_iter = []
loss_args = (X, y, self.quantile, self.alpha, self.l1_ratio,
             sample_weight)
for i in range(10):
JasonSanchez
Dec 23, 2017
This seems a little strange to have 10 here. Could you explain more?
avidale
Mar 18, 2018
Author
Yes, the number of iterations was a "magic constant": the maximal number of refinements of the cost-function approximation (with decreasing gamma) was hardcoded to 10. Now it is a class parameter.
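To make the role of this parameter concrete, here is a minimal, self-contained sketch of the decreasing-gamma idea; the quadratic smoothing, the toy dataset, and the names gamma, gamma_decrease and n_gamma_decreases are illustrative only, and the PR's actual smoothed loss and solver wiring differ in detail.

import numpy as np
from scipy.optimize import minimize

def smoothed_pinball(w, X, y, q, gamma):
    # pinball loss with the kink at zero replaced by a quadratic on [-gamma, gamma]
    r = y - X @ w
    quad = r ** 2 / (4 * gamma) + (q - 0.5) * r + gamma / 4
    lin = np.where(r >= 0, q * r, (q - 1) * r)
    return np.where(np.abs(r) <= gamma, quad, lin).mean()

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=200)

w = np.zeros(X.shape[1])
gamma, gamma_decrease, n_gamma_decreases = 1.0, 0.1, 5
for i in range(n_gamma_decreases):
    # each round sharpens the approximation and warm-starts from the previous solution
    result = minimize(smoothed_pinball, w, args=(X, y, 0.5, gamma),
                      method="L-BFGS-B")
    w, gamma = result.x, gamma * gamma_decrease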
This would be a great addition to scikit-learn. I would personally really like to see it merged.
@@ -0,0 +1,102 @@
# Authors: David Dale [email protected]
# License: TBD
jnothman
Feb 1, 2018
Member
We have the same license across the code base...
avidale
Mar 18, 2018
Author
Fixed it.
def test_quantile_is_approximately_sparse():
    # now most of coefficients are not exact zero, but with large n_samples they are close enough
    X, y = make_regression(n_samples=3000, n_features=100, n_informative=10, random_state=0, noise=1.0)
jnothman
Feb 1, 2018
Member
please keep lines under 80 chars
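For illustration, the flagged call wrapped to stay under 80 characters:

from sklearn.datasets import make_regression

X, y = make_regression(n_samples=3000, n_features=100, n_informative=10,
                       random_state=0, noise=1.0)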
    assert_array_almost_equal(warm.coef_, warm_coef, 1)


def test_quantile_convergence():
jnothman
Feb 1, 2018
Member
This test is failing.
avidale
Mar 18, 2018
Author
I used a larger dataset (1000 observations vs 500 before). The convergence improved just enough for the test to pass.
Under Travis CI there is another failure: the
Current solution: just changed the solver from
This pull request introduces 1 alert when merging 463c633 into 4207225 - view on LGTM.com.
Comment posted by LGTM.com
Thanks for your patience @avidale! This is a very nice contribution! Below are a few comments; I have not checked the math. The question is how we can be sure that the results we get are valid. There are quite a few asymptotic and sanity checks in the added tests, which certainly help. Still, is there a way we could compare the results with some well-established implementation, e.g. in R or another language?
===================

Quantile regression estimates median or other quantiles of :math:`y` conditional on :math:`X`,
while OLS estimates conditional mean.
rth
Sep 27, 2018
Member
Please remove spaces in the beginning of the line, see rendered docs at https://20523-843222-gh.circle-artifacts.com/0/home/ubuntu/scikit-learn/doc/_build/html/stable/modules/linear_model.html#quantile-regression
coefficients, not its absolute value.

Quantile loss function can be used with models other than linear.
:class:`GradientBoostingRegressor` also has an option to predict conditional quantiles.
rth
Sep 27, 2018
Member
Please provide more details about how this could be achieved (the exact parameters are described in the parent issue, I think).
(namely, Pareto).
The second part of the code shows that LinearRegression minimizes RMSE,
while QuantileRegressor minimizes MAE, and both do their own job well.
rth
Sep 27, 2018
Member
Move this paragraph before the actual section of the code where it is computed. Use sphinx-gallery formatting there:
#########################################################################
#
# The second part of the code shows that LinearRegression minimizes RMSE,
# while QuantileRegressor minimizes MAE, and both do their own job well.
(except also keep the lines <80 characters long).
plt.figure(figsize=(15, 5))

plt.subplot(121)
rth
Sep 27, 2018
Member
The new matplotlib style API is recommended,
fig, ax = plt.subplots(1, 2, figsize=(15, 5))
ax.plot(..)
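A self-contained sketch of that object-oriented style; note that with two panels plt.subplots returns an array of Axes, so each one is indexed explicitly (the plotted data here is just a placeholder):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
fig, axes = plt.subplots(1, 2, figsize=(15, 5))
axes[0].plot(x, np.sin(x))   # left panel, replaces plt.subplot(121)
axes[1].plot(x, np.cos(x))   # right panel, replaces plt.subplot(122)
plt.show()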
plt.subplot(122)

y = 20 + x * 0.5 + np.random.pareto(10, size=x.shape[0])*10
rth
Sep 27, 2018
Member
Use RNG to get reproducible results,
rng = np.random.RandomState(42)
y = ... + rng.pareto(...)
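Applied to the line quoted above, the suggestion would look roughly like this (the x values here are a stand-in, since the example defines x elsewhere):

import numpy as np

rng = np.random.RandomState(42)
x = np.linspace(0, 10, 100)   # stand-in for the x used in the example
y = 20 + x * 0.5 + rng.pareto(10, size=x.shape[0]) * 10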
Read more in the :ref:`User Guide <quantile_regression>`
.. versionadded:: 0.20
rth
Sep 27, 2018
Member
0.21 now
if fit_intercept:
    grad = np.zeros(n_features + 1)
else:
    grad = np.zeros(n_features + 0)
rth
Sep 27, 2018
Member
np.empty
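A small sketch of the suggested change; using np.empty is only appropriate under the assumption that every entry of grad is written before it is read:

import numpy as np

n_features, fit_intercept = 5, True
# np.empty skips the zero-fill that np.zeros performs
grad = np.empty(n_features + 1 if fit_intercept else n_features)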
max_iter=10000, alpha=0.0001, l1_ratio=0.0,
warm_start=False, fit_intercept=True,
normalize=False, copy_X=True,
gamma=1e-2, gtol=1e-4, xtol=1e-6,
rth
Sep 27, 2018
Member
How were the default values for gtol, xtol determined?
avidale
Oct 7, 2018
Author
I did manual experiments on a few generated datasets with normalized features, and found that these values suffice for the algorithm to converge. So yes, they are somewhat arbitrary.
I can write a report describing how these parameters affect the results of the computation. Do you think it is necessary?
# if positive error costs more, the slope is maximal
model = QuantileRegressor(quantile=0.51, alpha=0).fit(X, y)
assert_almost_equal(model.intercept_, 1, 2)
assert_almost_equal(model.coef_[0], 10, 2)
rth
Sep 27, 2018
Member
use assert_allclose everywhere with the desired relative tolerance, as recommended in numpy docs https://docs.scipy.org/doc/numpy/reference/generated/numpy.testing.assert_array_almost_equal.html
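For example, the assertions quoted above could be rewritten as follows (the numeric values here are placeholders standing in for the fitted attributes):

from numpy.testing import assert_allclose

intercept_, first_coef = 1.002, 9.97   # placeholders for model.intercept_, model.coef_[0]
assert_allclose(intercept_, 1, rtol=1e-2)
assert_allclose(first_coef, 10, rtol=1e-2)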
X, y = make_regression(n_samples=1000, n_features=20, random_state=0,
                       noise=10.0)
X += np.random.normal(size=X.shape[1], scale=3)
X *= np.random.normal(size=X.shape[1], scale=3)
rth
Sep 27, 2018
Member
use np.random.RandomState
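Applied to the setup quoted above, this could look like the following sketch (the seed value is arbitrary):

import numpy as np
from sklearn.datasets import make_regression

rng = np.random.RandomState(0)
X, y = make_regression(n_samples=1000, n_features=20, random_state=0,
                       noise=10.0)
X += rng.normal(size=X.shape[1], scale=3)
X *= rng.normal(size=X.shape[1], scale=3)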
This pull request introduces 1 alert when merging 80c8182 into 63e5ae6 - view on LGTM.com.
Comment posted by LGTM.com
Thank you for the review, @rth! I have fixed some of the issues you indicated, and will probably fix some more.
noise=1.0)

# check that for small n_iter, warning is thrown
with warnings.catch_warnings(record=True) as w:
rth
Oct 7, 2018
Member
Here and below we can use with pytest.warns(None) as record: (see pytest warning capture) -- it has a very similar API, and a better integration/compatibility with pytest
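A minimal sketch of that pattern; the warning text is illustrative, and note that later pytest releases deprecate pytest.warns(None) in favour of naming the expected warning class:

import warnings
import pytest

def fit_with_warning():
    warnings.warn("QuantileRegressor did not converge", UserWarning)

def test_warns_for_small_n_iter():
    # record all warnings raised inside the block, then inspect them
    with pytest.warns(None) as record:
        fit_with_warning()
    assert any("did not converge" in str(w.message) for w in record)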
I was thinking more of some simple comparison on a few examples to validate this implementation. Comparing with R and statsmodels would be great! That could be done by posting a script and the resulting figure as a comment in the PR (you can use
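A possible shape for such a comparison script, sketched under two assumptions: that the QuantileRegressor from this PR is importable from sklearn.linear_model, and that statsmodels' QuantReg (an iteratively reweighted least squares solver) serves as the reference implementation:

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import QuantileRegressor  # provided by this PR

rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_t(df=3, size=1000)

# reference fit: statsmodels QuantReg on the same data, intercept added explicitly
sm_res = sm.QuantReg(y, sm.add_constant(X)).fit(q=0.5)

# candidate fit: unpenalized median regression with the estimator proposed here
skl = QuantileRegressor(quantile=0.5, alpha=0).fit(X, y)

print("statsmodels:", sm_res.params)            # intercept first, then slopes
print("this PR:    ", skl.intercept_, skl.coef_)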
Untagging this as it seems unlikely to get in in time.
Sorry the review is slow; we are severely limited in review bandwidth, but I now have the availability to look into this. Are you still interested in working on it @avidale? A few comments are below. Could you please resolve conflicts? The example is quite nice. I think the main reason this PR is currently somewhat difficult to get accepted is that the use of this smoothed quantile L1+L2 loss, and the resulting nested LBFGS solver, is fairly non-trivial. I think it would be easier to start with QuantileLoss from gradient boosting with L2 penalty only and LBFGS, and add this
The :class:`QuantileRegressor` applies linear loss to all samples. It is thus more radical than
:class:`HuberRegressor`, that applies linear penalty to small fraction of outliers and quadratic loss
to the rest of observations. :class:`QuantileRegressor` also supports L1 and L2 regularization,
like :class:`ElasticNet`. It solves
rth
Jul 31, 2019
Member
Please keep lines under 80 char length.
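For context, the objective that the quoted paragraph leads into is presumably of the following form; this is a hedged reconstruction from the quantile, alpha and l1_ratio parameters visible in the diff, and the exact scaling used in the PR may differ:

.. math::

    \min_{w, b} \frac{1}{n} \sum_{i=1}^{n} PB_q(y_i - x_i^\top w - b)
    + \alpha \left( \rho \|w\|_1 + \frac{1 - \rho}{2} \|w\|_2^2 \right),
    \quad
    PB_q(t) = \begin{cases} q\, t, & t \ge 0 \\ (q - 1)\, t, & t < 0 \end{cases}

where :math:`q` is the target quantile and :math:`\rho` is l1_ratio.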
Quantile regression estimates median or other quantiles of :math:`y` conditional on :math:`X`, while OLS estimates
conditional mean.

The :class:`QuantileRegressor` applies linear loss to all samples. It is thus more radical than
rth
Jul 31, 2019
Member
Maybe "Somewhat related, :class:HuberRegressor
..
Maybe "Somewhat related, :class:HuberRegressor
..
:align: center
:scale: 50%

Another possible advantage of quantile regression over OLS is its robustness
rth
Jul 31, 2019
Member
remove "possible"
remove "possible"
The right figure shows example of an asymmetric error distribution
(namely, Pareto).
"""
from __future__ import division
rth
Jul 31, 2019
Member
We don't support Python2 anymore so this can be removed.
    sample_weight = np.array(sample_weight)
    check_consistent_length(y, sample_weight)
else:
    sample_weight = np.ones_like(y)
rth
Jul 31, 2019
Member
If you merge master in, you could use the _check_sample_weights helper from utils validation.
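A minimal sketch of that helper; in recent scikit-learn versions the (private) function appears as _check_sample_weight, so both the exact name and its behaviour are assumptions here:

import numpy as np
from sklearn.utils.validation import _check_sample_weight  # private helper

X = np.arange(6.0).reshape(3, 2)
# None is expanded to an array of ones, one weight per sample;
# an explicit array would instead be validated for shape and dtype
sample_weight = _check_sample_weight(None, X)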
loss_args = (X, y, self.quantile, self.alpha, self.l1_ratio,
             self.fit_intercept, sample_weight)
for i in range(self.n_gamma_decreases):
    gamma = self.gamma * self.gamma_decrease ** i
rth
Jul 31, 2019
Member
So if l1_ratio=0.0 the penalty is smooth, do we actually need to do this loop? On failure to converge I think this would try another round of LBFGS optimization, while it should just exit with a ConvergenceWarning in that case I think?
rth
Aug 1, 2019
Member
Nevermind, I understand it's not smooth even without l1_ratio..
noise=1.0)

# check that for small n_iter, warning is thrown
with pytest.warns(None) as record:
rth
Jul 31, 2019
Member
This can be done with,
with pytest.warns(ConvergenceWarning, match="QuantileRegressor did not converge"):
    # code here
warnings.warn("QuantileRegressor did not converge:" + | ||
" Scipy solver terminated with '%s'." | ||
% str(result['message']) | ||
) |
rth
Jul 31, 2019
Member
if you merge master in you could use utils.optimize._check_optimize_result('lbfgs', result).
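A hedged sketch of what that replacement could look like; _check_optimize_result is a private scikit-learn helper, so its exact behaviour and return value are assumptions here (it is assumed to emit the convergence warning and return the iteration count):

import numpy as np
from scipy.optimize import minimize
from sklearn.utils.optimize import _check_optimize_result  # private helper

# stop LBFGS deliberately early so there is a non-convergence status to report
result = minimize(lambda w: np.sum((w - 3.0) ** 4), x0=np.zeros(5),
                  method="L-BFGS-B", options={"maxiter": 2})
n_iter = _check_optimize_result("lbfgs", result)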
or np.abs(old_param) < self.xtol \
and new_value < value + self.gtol:
    value = new_value
    parameters = new_parameters
rth
Jul 31, 2019
Member
This sounds quite expensive to calculate.
Another possible advantage of quantile regression over OLS is its robustness
to outliers, because it is only sign of an error that influences estimated
coefficients, not its absolute value.
jeromedockes
Aug 2, 2019
Contributor
shouldn't this be "it is the value [...] rather than the squared error"? "only the sign of an error" would correspond to L0 loss
For reference, after an IRL discussion with @rth and @GaelVaroquaux, here is a paper: Yi, Congrui, and Jian Huang. "Semismooth Newton coordinate descent algorithm ..."
Also, when applying (strictly positive) L2 regularization, the dual problem is of the form :math:`\arg\max_c \; c^\top y - \frac{1}{4\lambda} c^\top X X^\top c`.
Sorry for the noise if this is not helpful.
Thanks @jeromedockes that is very helpful! The reference implementation for quantile regression is probably
So it looks like only the last (experimental) solver supports penalties there.
yes; at least for "lasso" I think it is the same solver as described in Portnoy, Stephen, and Roger Koenker. "The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators." Statistical Science 12.4 (1997): 279-300, except that in this case the linear program becomes (n samples + n features)-dimensional with (n features) additional constraints.
Overall it looks like for quantile regression the solver we may want is either the Lasso from Portnoy & Koenker (1997) or the SCAD solver with smoothed penalty from Fan & Li (1999). Both Yi (2017), implemented in hqreg, and Chen & Wei (2005), for the smoothed penalty used in this PR, probably won't pass the scikit-learn inclusion criteria citation-wise. So adding this implementation to https://github.com/scikit-learn-contrib/scikit-learn-extra might be a first step toward making it accessible to users.
IMHO if clear benchmarking shows that they are the right solution, I am +1 for still including them. The logic is that the problem (quantile regression) clearly passes the bar. The difficult choice then becomes the solver, and rules for the solver are less stringent. This is of course to be discussed by a wider set of people.
This PR fixes issue #3148
This new feature implements quantile regression, an algorithm that directly minimizes the pinball loss of a linear regression model (for the median, this amounts to the mean absolute error).
The work is still in progress, but I do want to receive some feedback.
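For readers who want to try it out, a hedged usage sketch of the estimator proposed in this PR; the import path and parameter names follow the diff above and may change before merge:

import numpy as np
from sklearn.linear_model import QuantileRegressor  # added by this PR

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.pareto(3, size=500)

# unpenalized conditional median and conditional 90th percentile
median = QuantileRegressor(quantile=0.5, alpha=0).fit(X, y)
upper = QuantileRegressor(quantile=0.9, alpha=0).fit(X, y)
print(median.coef_, upper.intercept_)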