Mixture modeling reports wrong number of n_iter_
#10740
Conversation
you need to fix travis
sklearn/mixture/base.py
Outdated
@@ -228,7 +228,8 @@ def fit(self, X, y=None):
            if self.lower_bound_ > max_lower_bound:
                max_lower_bound = self.lower_bound_
                best_params = self._get_parameters()
-                best_n_iter = n_iter
+                # Report the number of completed iterations, not 0-indexed
+                best_n_iter = n_iter + 1
I would remove the comment
Then people will just think this is a mistake.
sklearn code could do with much more inline comments, actually. I also do not like how the current code relies on "leaking" the n_iter variable out of the for loop. That is pretty bad code style, but I tried to keep the patch small. And the variable is badly named: the name suggests it is the "number of iterations", but it is a 0-based index.
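The "leaking" pattern being criticized can be shown with a minimal sketch (illustrative only, not the actual sklearn code):

```python
# Minimal illustration (not the actual sklearn code) of how a fit
# loop can rely on the loop variable surviving past the loop body.
def count_iterations(max_iter):
    for n_iter in range(max_iter):
        pass  # the EM step would go here
    # n_iter "leaks" out of the for loop: it holds the last 0-based
    # index (max_iter - 1), not the number of completed iterations.
    return n_iter

print(count_iterations(5))  # prints 4, not 5
```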
Feel free to do any such changes, add a regression test, etc.
Second thought, the comment is ok there. It will remove some ambiguity.
Then people will just think this is a mistake.
I hope that people try to understand code before claiming it is a mistake :).
sklearn code could do with much more inline comments, actually
I would argue that code should be self-explanatory. Comments should be there to remove ambiguity.
And the variable is badly named
True and it brings ambiguity ;)
I also do not like how the current code relies on "leaking" the n_iter variable out of the for loop. That is pretty bad code style
I don't really see the issue with it.
Feel free to do any such changes, add a regression test, etc.
We review contributors' code but do not modify their PRs directly (apart from minor changes before merging).
The reporting in _print_verbose_msg_iter_end is still zero-based. Can't we just change the range?
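One way to read that suggestion, as a hypothetical sketch rather than the actual patch: make the loop itself 1-based, so both the final count and any per-iteration verbose message report completed iterations without a trailing "+ 1":

```python
# Hypothetical sketch of a 1-based fit loop; the names mirror the
# discussion but this is not the actual sklearn implementation.
def fit_sketch(max_iter):
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # E-step / M-step and convergence check would go here; a
        # verbose message printed with n_iter is already 1-based.
        pass
    return n_iter  # number of completed iterations

print(fit_sketch(5))  # prints 5
```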
@glemaitre Please re-review, as I did the requested change.
Can you add a what's new entry and a regression test?
If you run mixture modeling with max_iter set to 5, it will report having finished 4 iterations, because it reports the last iteration number (0...max_iter-1) in mixture/base.py, not the number of *completed* iterations. This patch changes the iterations to be numbered 1...max_iter (inclusive).
Added test that fails on master.
Contributions to sklearn would be easier if you demanded less boilerplate from occasional contributors. Even a trivial fix like this requires me to do many iterations. I probably won't go through this hassle again; the effort/benefit ratio is too low.
I'm sorry you feel this way, but the requirement of tests is pretty standard in software development. Changelog entries are necessary to help users understand behaviour that changes from version to version. The project is run almost entirely on volunteer time, so it's not like we're asking for these things just because we feel like it.
Do not apply the standards of commercial development to volunteers.
Do not apply the standards of commercial development to volunteers.
This is not commercial development but rather good practice. A lot of things are sometimes considered useless (documentation, PEP8, etc.)
Could you make the CI happy.
@@ -172,6 +172,18 @@ def test_gaussian_mixture_attributes():
    assert_equal(gmm.n_init, n_init)
    assert_equal(gmm.init_params, init_params)

+def test_regression_10740():
Rename test_regression_10740 to test_gmm_n_iter.
@@ -172,6 +172,18 @@ def test_gaussian_mixture_attributes():
    assert_equal(gmm.n_init, n_init)
    assert_equal(gmm.init_params, init_params)

+def test_regression_10740():
+    # Regression test for #10740. Ensure that n_iter_ is the number of iterations done
Too long ...
Please, if you feel more comfortable making an issue than a PR, that's fine. Do what you are comfortable with. Clearly different volunteers have different feelings about what is appropriate and what is burdensome. Those of us who spend many hours a week on this are hardly fazed by a small non-regression test.
This project would have nowhere near its level of success without some rigour (albeit fallible) in testing and documentation. And yet our testing does not even come close to a commercial software house with dedicated QA personnel. I could have, for instance, asked you to write a generic test for all iterative estimators. Or to make sure that the result holds under certain kinds of randomised and edge case inputs. That's what you might expect from enterprise development.
Ok. Forget about it. I do not care anymore. Goodbye. This is how you lose potential contributors.
I suppose you're saying I should have cared less and not replied. Thanks for raising the issue.
…On 24 Apr 2018 12:23 am, "Erich Schubert" wrote:
Closed #10740.
If you run mixture modeling with max_iter set to 5, it will report to have finished 4 iterations.
Because it is reporting the last iteration number 0...max_iter-1 in mixture/base.py, not the number of completed iterations.
This patch adds + 1 in the appropriate place.
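A minimal sketch of the kind of regression test discussed in the review, assuming scikit-learn's GaussianMixture API and the suggested name test_gmm_n_iter (the fixed behaviour, sklearn 0.20+, is assumed):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def test_gmm_n_iter():
    # With tol=0 the EM loop can never converge early, so exactly
    # max_iter iterations should be performed and reported.
    rng = np.random.RandomState(0)
    X = rng.rand(50, 2)
    max_iter = 5
    gmm = GaussianMixture(n_components=2, max_iter=max_iter, tol=0).fit(X)
    assert gmm.n_iter_ == max_iter

test_gmm_n_iter()
```

On the unfixed code, n_iter_ would be max_iter - 1 here, so this test fails on master and passes with the patch.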