The Wayback Machine - https://web.archive.org/web/20250518013331/https://github.com/scikit-learn/scikit-learn/issues/8642
Implement nbytes or __sizeof__ (or equivalent) for estimators #8642

Open
@jcrist

Description

@jcrist

In dask.distributed we use dispatch on type to determine the memory overhead of intermediate results. Having a rough sense of the size of an intermediate result is useful to the scheduler, since size often correlates with the cost of serializing that result between workers.

The default is to fall back to sys.getsizeof, which calls the __sizeof__ method on the object. It would be useful if this (or an equivalent method) were implemented for scikit-learn estimators.
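To make the fallback concrete, here is a minimal sketch of how sys.getsizeof delegates to __sizeof__ (the class and attribute names are illustrative, not from dask or scikit-learn):

```python
import sys

class WithSizeof:
    """Toy object that reports the memory of its payload via __sizeof__."""

    def __init__(self):
        self.payload = bytearray(1024)

    def __sizeof__(self):
        # sys.getsizeof calls this method and adds any garbage-collector
        # overhead on top of the value returned here.
        return object.__sizeof__(self) + len(self.payload)

obj = WithSizeof()
# getsizeof reflects the payload plus interpreter overhead.
assert sys.getsizeof(obj) >= 1024
```

Any type that implements __sizeof__ this way is picked up automatically by size-based dispatch that defaults to sys.getsizeof.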

A naive generic implementation for estimators might be:

    def __sizeof__(self):
        # NumPy arrays report their buffer size via .nbytes; fall back
        # to sys.getsizeof for everything else.
        from sys import getsizeof
        return sum(x.nbytes if hasattr(x, 'nbytes') else getsizeof(x)
                   for x in self.__dict__.values())

It'd probably even be fine to ignore (or approximate) the memory usage of parameters, and just focus on the memory usage of the results of fit. This may be straightforward for numpy arrays, but less clear for things like trees.
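A self-contained sketch of that naive implementation applied to a stand-in for a fitted estimator (FittedEstimator and FakeArray are hypothetical names used only for illustration; a real estimator would hold actual ndarrays):

```python
from sys import getsizeof

class FakeArray:
    """Minimal stand-in for an ndarray, exposing only .nbytes."""

    def __init__(self, nbytes):
        self.nbytes = nbytes

class FittedEstimator:
    """Stand-in estimator: fitted attributes live in __dict__."""

    def __init__(self):
        self.coef_ = FakeArray(8 * 1000)  # e.g. 1000 float64 coefficients
        self.intercept_ = 0.5
        self.n_iter_ = 7

    def __sizeof__(self):
        # Sum .nbytes for array-like attributes, fall back to
        # sys.getsizeof for plain Python values.
        return sum(x.nbytes if hasattr(x, 'nbytes') else getsizeof(x)
                   for x in self.__dict__.values())

est = FittedEstimator()
# Dominated by the 8000-byte coefficient array.
assert getsizeof(est) > 8000
```

This deliberately ignores shared buffers (views counted twice) and non-array attributes such as fitted tree structures, which is where a generic implementation gets harder.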
