Document on how to use custom bin edges in KBinsDiscretizer #18498

rachittoshniwal · 2020-09-30T11:26:40Z

Describe the workflow you want to enable

A parameter accepting custom bin edges as an array.

Describe your proposed solution

Use pd.cut() under the hood or any other computationally efficient method.

Describe alternatives you've considered, if relevant

pandas' cut function.

Additional context

Say we need to bin ages into 'infant', 'kid', 'teen', 'adult' and 'senior citizen' using the edges [0, 1, 13, 20, 60, np.inf].

We can't do it using KBinsDiscretizer currently.
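For reference, the age example above can already be expressed with pandas' `cut`. This is a minimal sketch: the edges and labels come from the request, while `right=False` (left-closed bins, so age 0 is an 'infant' and age 13 a 'teen') is an assumption about the intended boundaries.

```python
import numpy as np
import pandas as pd

# Bin edges and labels from the age example in this issue
edges = [0, 1, 13, 20, 60, np.inf]
labels = ["infant", "kid", "teen", "adult", "senior citizen"]

ages = pd.Series([0.5, 7, 15, 35, 72])

# right=False makes each bin left-closed, e.g. [13, 20), so age 13 counts as a teen
age_groups = pd.cut(ages, bins=edges, labels=labels, right=False)
print(age_groups.tolist())
# ['infant', 'kid', 'teen', 'adult', 'senior citizen']
```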

Is there a reason this has not been implemented?

titigmr · 2020-09-30T11:45:41Z

I think it is a good idea to create bins from np.array values, but we need to wait for an answer from a core developer on this issue. I can work on it if it is accepted.

Also, should the bin edges be strict (less than) or inclusive (less than or equal to) during discretization?
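The strict-versus-inclusive question only matters for values that fall exactly on an edge. The sketch below uses NumPy's `digitize`, whose `right` flag switches between the two conventions; as far as I can tell, KBinsDiscretizer itself behaves like the left-closed variant (an edge value goes to the upper bin).

```python
import numpy as np

inner_edges = np.array([1, 13, 20, 60])  # inner edges of the age example
x = np.array([13, 20])  # values that sit exactly on an edge

# right=False (default): bins are [a, b), so an edge value goes to the upper bin
left_closed = np.digitize(x, inner_edges, right=False)
# right=True: bins are (a, b], so an edge value stays in the lower bin
right_closed = np.digitize(x, inner_edges, right=True)

print(left_closed)   # [2 3]
print(right_closed)  # [1 2]
```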

glemaitre · 2020-09-30T12:02:40Z

One issue is that you will need to pass a list of arrays of length n_features. I don't know if it will be easy to deal with a high number of features.
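To make the shape of such a parameter concrete: it would be a list with exactly one edge array per feature, and the arrays may have different lengths. The helper `digitize_columns` and the income edges below are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical per-feature bin edges: one array per column, lengths may differ
bin_edges = [
    np.array([0, 1, 13, 20, 60, np.inf]),   # feature 0: age
    np.array([0, 10_000, 50_000, np.inf]),  # feature 1: income (made-up edges)
]

X = np.array([[5.0, 20_000.0],
              [30.0, 80_000.0]])


def digitize_columns(X, bin_edges):
    """Map each column to 0-based bin indices using that column's own edges."""
    if len(bin_edges) != X.shape[1]:
        raise ValueError("need exactly one edge array per feature")
    Xt = np.empty(X.shape, dtype=int)
    for j, edges in enumerate(bin_edges):
        # Only the inner edges are passed to np.digitize, so values outside
        # the outer edges end up in the first or last bin
        Xt[:, j] = np.digitize(X[:, j], edges[1:-1])
    return Xt


print(digitize_columns(X, bin_edges))
# [[1 1]
#  [3 2]]
```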

ping @ogrisel

rachittoshniwal · 2020-09-30T12:45:10Z

> One issue is that you will need to pass a list of arrays of length n_features. I don't know if it will be easy to deal with a high number of features.

Doesn't the same issue pop up with OrdinalEncoder when we pass in lists of ordered categories for each feature?
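For comparison, this is how OrdinalEncoder already accepts one ordered list of categories per feature; a per-feature list of bin edges would follow the same pattern. The data here is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

X = np.array([["low", "S"],
              ["high", "L"],
              ["medium", "M"]], dtype=object)

# One ordered list of categories per feature, analogous to one edge array per feature
enc = OrdinalEncoder(categories=[["low", "medium", "high"], ["S", "M", "L"]])
codes = enc.fit_transform(X)
print(codes)
# [[0. 0.]
#  [2. 2.]
#  [1. 1.]]
```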

Secondly, users would otherwise call pd.cut() anyway, passing in the bin edges one feature at a time and repeating the process manually for the train and test sets.

It would simplify things for them, as they could then automate the transform step for all binned features in the test set.

scikit-learn would remember the bin edges from fit and apply them appropriately when transforming the test set.
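A rough sketch of what "remembering the edges from fit" could look like as an estimator. `FixedBinsDiscretizer` is a hypothetical class, not part of scikit-learn; it only stores user-supplied edges in `fit` and reuses them in `transform`, so train and test data are binned identically.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import check_array, check_is_fitted


class FixedBinsDiscretizer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer that bins each column with user-supplied edges.

    `bin_edges` is a list with one sorted array of edges per feature.
    """

    def __init__(self, bin_edges):
        self.bin_edges = bin_edges

    def fit(self, X, y=None):
        X = check_array(X)
        if len(self.bin_edges) != X.shape[1]:
            raise ValueError("bin_edges must contain one array per feature")
        # Store the edges so transform() reuses exactly what fit() validated
        self.bin_edges_ = [np.asarray(e, dtype=float) for e in self.bin_edges]
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        check_is_fitted(self)
        X = check_array(X)
        if X.shape[1] != self.n_features_in_:
            raise ValueError("X has a different number of features than in fit")
        Xt = np.empty(X.shape, dtype=float)
        for j, edges in enumerate(self.bin_edges_):
            # Left-closed bins; out-of-range values land in the first/last bin
            Xt[:, j] = np.digitize(X[:, j], edges[1:-1])
        return Xt


X_train = np.array([[5.0], [30.0], [70.0]])
X_test = np.array([[15.0], [0.5]])
disc = FixedBinsDiscretizer(bin_edges=[[0, 1, 13, 20, 60, np.inf]]).fit(X_train)
print(disc.transform(X_test))  # the edges stored at fit time are reused on test data
```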

lorentzenchr · 2020-10-02T19:44:26Z

I'm in favour of offering a manual setup for bins. In my experience, this is important when you want to bin just one or a few features yourself, often together with a ColumnTransformer.
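The "one or a few features" case can be sketched with today's API by routing a single column through a FunctionTransformer inside a ColumnTransformer. The column names, the helper `bin_age` and the data are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer

AGE_INNER_EDGES = [1, 13, 20, 60]  # inner edges of the age example


def bin_age(X):
    # X arrives as a 2D slice (a one-column DataFrame) holding only "age"
    return np.digitize(np.asarray(X, dtype=float), AGE_INNER_EDGES)


df = pd.DataFrame({"age": [0.5, 15.0, 72.0], "income": [0.0, 1_000.0, 30_000.0]})

ct = ColumnTransformer(
    [("age_bins", FunctionTransformer(bin_age), ["age"])],
    remainder="passthrough",  # leave the other features untouched
)
out = ct.fit_transform(df)
print(out)  # binned age first, then the passed-through income column
```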

The SplineTransformer of PR #18368 supports this, i.e. manually specifying bins (there, as knot positions). Note that SplineTransformer(degree=0, n_knots=n_bins+1) is equivalent to KBinsDiscretizer(n_bins=n_bins, encode='onehot-dense').

jnothman · 2020-10-04T23:25:17Z

One reason we have not supported manual bins is that it should be equivalently available using FunctionTransformer with something like pd.cut. Do we need to add a transformer for this purpose?
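A minimal sketch of the FunctionTransformer route with pd.cut, for a single input column. The function name `cut_ages` is hypothetical; since FunctionTransformer is stateless, the same fixed edges are applied to every dataset it transforms.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

EDGES = [0, 1, 13, 20, 60, np.inf]  # age edges from the original request


def cut_ages(X):
    # pd.cut works on 1D data, so flatten the single input column first;
    # labels=False returns integer bin codes instead of Interval labels
    codes = pd.cut(np.ravel(X), bins=EDGES, labels=False, right=False)
    return np.asarray(codes).reshape(-1, 1)


binner = FunctionTransformer(cut_ages)
out = binner.fit_transform(np.array([[0.5], [15.0], [72.0]]))
print(out)
# [[0]
#  [2]
#  [4]]
```

Unlike KBinsDiscretizer, pd.cut returns NaN for values outside the outermost edges instead of clipping them into the first or last bin, which is why the outer edges here span the full range of valid ages.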

lorentzenchr · 2020-10-06T21:29:22Z

It would just be an additional argument. If we decide against it, it would be good to give an example using FunctionTransformer.

glemaitre · 2020-10-22T08:27:51Z

I would be in favour of using the FunctionTransformer. An example seems the best way. I think it should go in the user guide.
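One possible shape for such a user guide example, assuming the goal is to mimic KBinsDiscretizer's one-hot output: np.digitize (with the edges passed through `kw_args`) produces ordinal bin codes, and a OneHotEncoder with fixed categories expands them so that every bin gets a column even if no sample falls into it.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder

inner_edges = [1, 13, 20, 60]  # five bins for the age example
n_bins = len(inner_edges) + 1

onehot_binner = make_pipeline(
    # Ordinal bin codes 0..n_bins-1, similar to encode='ordinal'
    FunctionTransformer(np.digitize, kw_args={"bins": inner_edges}),
    # Fixed categories keep all n_bins columns even if a bin is empty,
    # similar to encode='onehot'
    OneHotEncoder(categories=[list(range(n_bins))]),
)

X = np.array([[0.5], [15.0], [72.0]])
out = onehot_binner.fit_transform(X).toarray()  # the encoder returns a sparse matrix
print(out)
```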

bhargavasomya · 2020-11-01T00:26:04Z

Hello @glemaitre! I am pretty new to this and would like to contribute. Is this issue still open, or is someone already working on it? Thanks.

glemaitre · 2020-11-10T13:59:44Z

@bhargavasomya sorry for the delay in answering. The issue is still open and you can submit a pull request.

hitesh9116 · 2020-11-23T02:53:19Z

Sir, I would also like to work on this issue. If no one else is working on it, may I?

glemaitre · 2020-11-23T08:05:03Z

@bhargavasomya are you working on the issue?

rachittoshniwal added the New Feature label Sep 30, 2020

glemaitre added Documentation and removed New Feature labels Oct 22, 2020

glemaitre changed the title from "Custom bin edges in KBinsDiscretizer" to "Document on how to use custom bin edges in KBinsDiscretizer" Oct 22, 2020

glemaitre added the Easy and good first issue labels Oct 22, 2020

lorentzenchr added the help wanted label Oct 24, 2020

CoMartel linked a pull request that will close this issue Nov 27, 2020

[DOC] Document on how to use custom bin edges in FunctionTransformer #18929 (open)

Todaime pushed a commit to Todaime/scikit-learn that referenced this issue Dec 6, 2020

Fixes #scikit-learn#18498

fb5307e

Todaime mentioned this issue Dec 6, 2020

Documentation on how to use custom edge bins, KBinsDiscretizer #18972 (open)

cmarmo removed the help wanted label Dec 7, 2020

