Mixture modeling reports wrong number of n_iter_
#10740
Conversation
you need to fix travis
sklearn/mixture/base.py
Outdated
@@ -228,7 +228,8 @@ def fit(self, X, y=None):
            if self.lower_bound_ > max_lower_bound:
                max_lower_bound = self.lower_bound_
                best_params = self._get_parameters()
-                best_n_iter = n_iter
+                # Report the number of completed iterations, not 0-indexed
+                best_n_iter = n_iter + 1
I would remove the comment
Then people will just think this is a mistake.
sklearn code could do with much more inline comments, actually. I also do not like how the current code relies on "leaking" the n_iter variable out of the for loop. That is pretty bad code style, but I tried to keep the patch small. And the variable is badly named: the name suggests it is the "number of iterations", but it is a 0-based index.
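The "leaking" pattern being criticized can be shown with a minimal sketch (illustrative only, not the actual sklearn code):

```python
# Minimal illustration (not the actual sklearn code) of how a fit
# loop can rely on the loop variable surviving past the loop body.
def count_iterations(max_iter):
    for n_iter in range(max_iter):
        pass  # the EM step would go here
    # n_iter "leaks" out of the for loop: it holds the last 0-based
    # index (max_iter - 1), not the number of completed iterations.
    return n_iter

print(count_iterations(5))  # prints 4, not 5
```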
Feel free to do any such changes, add a regression test, etc.
Second thought, the comment is ok there. It will remove some ambiguity.
Then people will just think this is a mistake.
I hope that people try to understand code before claiming it is a mistake :).
sklearn code could do with much more inline comments, actually
I would argue that code should be self-explanatory. Comments should be there to remove ambiguity.
And the variable is badly named
True and it brings ambiguity ;)
I also do not like how the current code relies on "leaking" the n_iter variable out of the for loop. That is pretty bad code style
I don't really see the issue with it.
Feel free to do any such changes, add a regression test, etc.
We review contributors' code but do not modify their PRs directly (apart from minor changes before merging).
The reporting in _print_verbose_msg_iter_end is still zero-based. Can't we just change the range?
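One way to read that suggestion, as a hypothetical sketch rather than the actual patch: make the loop itself 1-based, so both the final count and any per-iteration verbose message report completed iterations without a trailing "+ 1":

```python
# Hypothetical sketch of a 1-based fit loop; the names mirror the
# discussion but this is not the actual sklearn implementation.
def fit_sketch(max_iter):
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # E-step / M-step and convergence check would go here; a
        # verbose message printed with n_iter is already 1-based.
        pass
    return n_iter  # number of completed iterations

print(fit_sketch(5))  # prints 5
```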
@glemaitre Please re-review, as I did the requested change.
Can you add a what's new entry and a regression test?
If you run mixture modeling with max_iter set to 5, it will report having finished 4 iterations, because it reports the last iteration number (0...max_iter-1) in mixture/base.py, not the number of *completed* iterations. This patch changes the iterations to be numbered 1...max_iter (inclusive).
Added test that fails on master.
Contributions to sklearn would be easier if you demanded less boilerplate from occasional contributors. Even a trivial fix like this requires me to do many iterations. I probably won't go through this hassle again; the effort/benefit ratio is too low.
I'm sorry you feel this way, but the requirement of tests is pretty standard in software development. Changelog entries are necessary to help users understand behaviour that changes from version to version. The project is run almost entirely on volunteer time, so it's not like we're asking for these things just because we feel like it.
Do not apply the standards of commercial development to volunteers.
Do not apply the standards of commercial development to volunteers.
This is not commercial development but rather good practice. A lot of things are sometimes considered useless (documentation, PEP8, etc.)
Could you make the CI happy.
@@ -172,6 +172,18 @@ def test_gaussian_mixture_attributes():
    assert_equal(gmm.n_init, n_init)
    assert_equal(gmm.init_params, init_params)

+def test_regression_10740():
Rename test_regression_10740 to test_gmm_n_iter.
@@ -172,6 +172,18 @@ def test_gaussian_mixture_attributes():
    assert_equal(gmm.n_init, n_init)
    assert_equal(gmm.init_params, init_params)

+def test_regression_10740():
+    # Regression test for #10740. Ensure that n_iter_ is the number of iterations done
Too long ...
Please, if you feel more comfortable making an issue than a PR, that's fine. Do what you are comfortable with. Clearly different volunteers have different feelings about what is appropriate and what is burdensome. Those of us who spend many hours a week on this are hardly fazed by a small non-regression test.
This project would have nowhere near its level of success without some rigour (albeit fallible) in testing and documentation. And yet our testing does not even come close to a commercial software house with dedicated QA personnel. I could have, for instance, asked you to write a generic test for all iterative estimators. Or to make sure that the result holds under certain kinds of randomised and edge case inputs. That's what you might expect from enterprise development.
Ok. Forget about it. I do not care anymore. Goodbye. This is how you lose potential contributors.
I suppose you're saying I should have cared less and not replied. Thanks for raising the issue.
…On 24 Apr 2018 12:23 am, "Erich Schubert" wrote:
Closed #10740.
If you run mixture modeling with max_iter set to 5, it will report to have finished 4 iterations.
Because it is reporting the last iteration number 0...max_iter-1 in mixture/base.py, not the number of completed iterations.
This patch adds + 1 in the appropriate place.
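A minimal sketch of the kind of regression test discussed in the review, assuming scikit-learn's GaussianMixture API and the suggested name test_gmm_n_iter (the fixed behaviour, sklearn 0.20+, is assumed):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def test_gmm_n_iter():
    # With tol=0 the EM loop can never converge early, so exactly
    # max_iter iterations should be performed and reported.
    rng = np.random.RandomState(0)
    X = rng.rand(50, 2)
    max_iter = 5
    gmm = GaussianMixture(n_components=2, max_iter=max_iter, tol=0).fit(X)
    assert gmm.n_iter_ == max_iter

test_gmm_n_iter()
```

On the unfixed code, n_iter_ would be max_iter - 1 here, so this test fails on master and passes with the patch.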