The Wayback Machine - https://web.archive.org/web/20220524210025/https://github.com/scikit-learn/scikit-learn/pull/10740
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mixture modeling reports wrong number of n_iter_ #10740

Closed
wants to merge 2 commits into from

Conversation

kno10
Copy link
Contributor

@kno10 kno10 commented Mar 1, 2018

If you run mixture modeling with max_iter set to 5, it will report to have finished 4 iterations.

Because it is reporting the last iteration number 0...max_iter-1 in mixture/base.by, not the number of completed iterations.

This patch adds + 1 in the appropriate place.

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

@agramfort
Copy link
Member

@agramfort agramfort commented Mar 2, 2018

you need to fix travis

Copy link
Contributor

@glemaitre glemaitre left a comment

You need to add a regression test with the patch.

@@ -228,7 +228,8 @@ def fit(self, X, y=None):
if self.lower_bound_ > max_lower_bound:
max_lower_bound = self.lower_bound_
best_params = self._get_parameters()
best_n_iter = n_iter
# Report the number of completed iterations, not 0-indexed
Copy link
Contributor

@glemaitre glemaitre Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove the comment

Copy link
Contributor Author

@kno10 kno10 Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then people will just think this is a mistake.
sklearn code could do with much more inline comments, actually. I also do not like how the current code relies on "leaking" the n_iter variable out of the for loop. That is pretty bad code style, but I tried to keep the patch small. And the variable is badly named. The name appears to be the "number of iterations", but it is a 0-based index.
Feel free to do any such changes, add a regression test, etc.

Copy link
Contributor

@glemaitre glemaitre Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second thought, the comment is ok there. It will remove some ambiguity.

Then people will just think this is a mistake.

I hope that people try to understand code before to mention this is a mistake :).

sklearn code could do with much more inline comments, actually

I would argue that code should be self-explanatory. Comments should be there to remove ambiguity.

And the variable is badly named

True and it brings ambiguity ;)

I also do not like how the current code relies on "leaking" the n_iter variable out of the for loop. That is pretty bad code style

I don't really see the issue with it.

Feel free to do any such changes, add a regression test, etc.

We are reviewing code of contributors but do not modify theirs PRs directly (apart of minor changes before merging).

Copy link
Member

@jnothman jnothman left a comment

The reporting in _print_verbose_msg_iter_end is still zero-based. Can't we just change the range?

@glemaitre
Copy link
Contributor

@glemaitre glemaitre commented Mar 15, 2018

@jnothman do we have the same issue than in #10723 in which a user could check the n_iter and would not only be interested about the printing?

Copy link
Contributor Author

@kno10 kno10 left a comment

renumeration

@kno10
Copy link
Contributor Author

@kno10 kno10 commented Mar 22, 2018

@glemaitre Please re-review, as I did the requested change.

@glemaitre
Copy link
Contributor

@glemaitre glemaitre commented Apr 21, 2018

Can you add a what's new entry and a regression test.

kno10 and others added 2 commits Apr 23, 2018
If you run mixture modeling with max_iter set to 5, it will report to have finished 4 iterations.

Because it is reporting the last iteration number 0...max_iter-1 in mixture/base.by, not the number of *completed* iterations.

This patch changes the iterations to be numbered 1...max_iter (inclusive)
@kno10
Copy link
Contributor Author

@kno10 kno10 commented Apr 23, 2018

Added test that fails on master with AssertionError: 0 != 1

(On master, n_iter_=0 when running with max_iter=1).

Contributions to sklearn would be easier if you would demand less boilerplate from occasional contributors. Even a trivial fix like this requires me to do many iterations. I probably won't go through this hassle again, the effort/benefit is too low.

@jnothman
Copy link
Member

@jnothman jnothman commented Apr 23, 2018

Contributions to sklearn would be easier if you would demand less boilerplate from occasional contributors. Even a trivial fix like this requires me to do many iterations. I probably won't go through this hassle again, the effort/benefit is too low.

I'm sorry you feel this way, but the requirement of tests is pretty standard in software development. Changelog entries are necessary to help users understand changed behaviour when running the script from version to version. The project is almost entirely run on volunteer time, so it's not like we're asking for these things just because we feel like it.

@kno10
Copy link
Contributor Author

@kno10 kno10 commented Apr 23, 2018

Do not apply the standards of commercial development to volunteers.
I agree that changelogs etc. should be maintained, but I don't think it is appropriate to completely offload this to casual contributors.
See: if I had just made a issue, I would have had much less work. Next time, I am probably not going to make a pull request.

Copy link
Contributor

@glemaitre glemaitre left a comment

Do not apply the standards of commercial development to volunteers.

This is not commercial development but rather good practices. A lot of things are sometimes consider useless (documentation, PEP8, etc.)

Could you make the CI happy.

@@ -172,6 +172,18 @@ def test_gaussian_mixture_attributes():
assert_equal(gmm.n_init, n_init)
assert_equal(gmm.init_params, init_params)

def test_regression_10740():
Copy link
Contributor

@glemaitre glemaitre Apr 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename test_regression_10740 to test_gmm_n_iter

@@ -172,6 +172,18 @@ def test_gaussian_mixture_attributes():
assert_equal(gmm.n_init, n_init)
assert_equal(gmm.init_params, init_params)

def test_regression_10740():
# Regression test for #10740. Ensure that n_iter_ is the number of iterations done
Copy link
Contributor

@glemaitre glemaitre Apr 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too long ...

@jnothman
Copy link
Member

@jnothman jnothman commented Apr 23, 2018

Please, if you feel more comfortable making an issue than a PR, that's fine. Do what you are comfortable with. Clearly different volunteers have different feelings about what is appropriate and what is burdensome. Those of us who spend many hours a week on this are hardly fazed by a small non-regression test.

Do not apply the standards of commercial development to volunteers

This project would have nowhere near its level of success without some rigour (albeit fallible) in testing and documentation. And yet our testing does not even come close to a commercial software house with dedicated QA personnel. I could have, for instance, asked you to write a generic test for all iterative estimators. Or to make sure that the result holds under certain kinds of randomised and edge case inputs. That's what you might expect from enterprise development.

@kno10
Copy link
Contributor Author

@kno10 kno10 commented Apr 23, 2018

Ok. Forget about it. I do not care anymore. Goodbye.

This is how you lose potential contributors.

@kno10 kno10 closed this Apr 23, 2018
@kno10 kno10 deleted the patch-2 branch Apr 23, 2018
@kno10 kno10 restored the patch-2 branch Apr 23, 2018
@kno10 kno10 deleted the patch-2 branch Apr 23, 2018
@kno10 kno10 restored the patch-2 branch Apr 23, 2018
@jnothman
Copy link
Member

@jnothman jnothman commented Apr 23, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants