In dask.distributed we use a dispatch on type to determine the memory overhead of intermediate results. Having a rough sense of the size of an intermediate is useful for the scheduler, since it often correlates with the cost of serializing that intermediate between workers.
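For reference, that dispatch lives in `dask.sizeof`; a minimal illustration, assuming the current dask API:

```python
import numpy as np
from dask.sizeof import sizeof  # the type-based dispatch used by the scheduler

# The ndarray registration reports the underlying buffer size.
print(sizeof(np.ones(1000)))  # 8000 (1000 float64 values)
```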
The default is to fall back to `sys.getsizeof`, which calls the `__sizeof__` method on the object. It would be useful if this (or an equivalent method) were implemented for scikit-learn estimators.
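To make the hook concrete, `sys.getsizeof` defers to `__sizeof__` (plus a small GC header for tracked objects); a toy sketch:

```python
import sys

class Blob:
    def __sizeof__(self):
        return 1024  # pretend we hold 1 KiB of state

# Roughly 1024 plus CPython's GC header overhead.
print(sys.getsizeof(Blob()))
```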
A naive generic implementation for estimators might be:

```python
from sys import getsizeof

def __sizeof__(self):
    # Array attributes report nbytes; everything else falls back to getsizeof.
    return sum(x.nbytes if hasattr(x, 'nbytes') else getsizeof(x)
               for x in self.__dict__.values())
```
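Alternatively, the same heuristic could be registered from the dask side without touching scikit-learn at all; a sketch, assuming `dask.sizeof`'s `register` decorator:

```python
from sys import getsizeof

from dask.sizeof import sizeof
from sklearn.base import BaseEstimator

@sizeof.register(BaseEstimator)
def sizeof_estimator(est):
    # Same heuristic as above, applied to any estimator instance.
    return sum(v.nbytes if hasattr(v, 'nbytes') else getsizeof(v)
               for v in vars(est).values())
```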
It'd probably even be fine to ignore (or approximate) the memory usage of parameters, and just focus on the memory usage of the results of `fit`. This may be straightforward for numpy arrays, but less clear for things like trees.
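For tree-based models, one crude but serviceable proxy is the pickled payload, since serialization cost is what the scheduler cares about anyway; a sketch using `DecisionTreeClassifier` as a stand-in:

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
est = DecisionTreeClassifier().fit(X, y)

# The Cython Tree object has no nbytes; its pickled size is an
# upper bound that tracks the cost of moving it between workers.
print(len(pickle.dumps(est.tree_)))
```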