
Memory leak in textual output streaming #17624

Closed
@krassowski

Description


Streaming stderr and stdout output leads to a severe memory leak.

Reproduce

  1. Open JupyterLab
  2. Create a code cell with
    from time import sleep
    for i in range(10_000):
        print('X' * 40, flush=True)
        sleep(0.001)
    or
    from time import sleep
    import logging
    for i in range(10_000):
        logging.warning('X' * 27)  # this gives 40 characters
        sleep(0.001)
  3. Open Dev Tools → Memory tab
  4. Record allocations and run the cell
  5. See many copies of the string with decreasing sizes
  6. See more copies of the string stored as concatenated strings, with a retained size much larger than the shallow size
  7. See even more copies of the string stored as sliced strings, with a retained size much larger than the shallow size
  8. See the concatenated strings retaining the sliced strings, preventing garbage collection (a standalone sketch of this pattern follows the list)
  9. See that 2126 MB were allocated in total
  10. Try re-running with a larger iteration count and see Chrome crash
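
The retention pattern can be reproduced outside JupyterLab. The sketch below (plain TypeScript, not JupyterLab code; all names are made up) mimics streamed output that grows by concatenation while the newly appended tail is re-derived with slice() on every update; profiling it in the DevTools Memory tab shows the same concatenated and sliced strings with retained sizes far larger than their shallow sizes:

// Standalone sketch, not JupyterLab code: the accumulated text grows by
// concatenation and the new tail is re-derived with slice() on every update,
// analogous to what addText does with text.slice(idx) (see below).
const retained: string[] = []; // stands in for whatever keeps the slices alive
let full = '';                 // accumulated output so far

for (let i = 0; i < 10_000; i++) {
  const line = 'X'.repeat(40) + '\n';
  const oldLength = full.length;
  full = full + line;                 // engines may represent this as a concatenated (cons) string
  const tail = full.slice(oldLength); // a sliced string that can retain the whole of `full`
  retained.push(tail);                // keeping the slice keeps that snapshot of `full` alive
}

// The characters in `retained` add up to ~410 kB, but in engines where sliced
// strings keep a reference to their parent, the i-th slice can pin an
// i * 41-character snapshot, summing to roughly 2 GB over 10,000 iterations,
// in line with the ~2126 MB observed above.
console.log(retained.length, full.length);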

In addition, two non-native classes are implicated in the problematic retention: ch, which is a minified symbol for yjs.structs.Item, and Hl, which is yjs.structs.ContentString.


The allocation stack points to the addText function, which in some code paths creates substrings of the string and interfaces with the yjs string methods:

export function addText(
  prevIndex: number,
  curText: IObservableString,
  newText: string
): number {
  const { text, index } = processText(prevIndex, newText, curText.text);
  // Compute the difference between current text and new text.
  let done = false;
  let idx = 0;
  while (!done) {
    if (idx === text.length) {
      if (idx === curText.text.length) {
        done = true;
      } else {
        curText.remove(idx, curText.text.length);
        done = true;
      }
    } else if (idx === curText.text.length) {
      if (idx !== text.length) {
        // Appends the remaining tail as a slice of the full accumulated string.
        curText.insert(curText.text.length, text.slice(idx));
        done = true;
      }
    } else if (text[idx] !== curText.text[idx]) {
      curText.remove(idx, curText.text.length);
      // Re-inserts everything from the first divergence as a slice of `text`.
      curText.insert(idx, text.slice(idx));
      done = true;
    } else {
      idx++;
    }
  }
  return index;
}

For yjs.structs.Item the allocation stack is:

[screenshot: allocation stack for yjs.structs.Item]
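
For illustration, the snippet below traces what the diff loop above does for an append-only stream. StubText is a made-up stand-in for IObservableString and the processText step is omitted; the point is that every streamed chunk ends in a single insert of text.slice(idx), i.e. one fresh slice of the ever-growing accumulated string:

// Hypothetical stand-in for IObservableString, used only to log what the
// diff loop would do; this is not the real JupyterLab interface.
class StubText {
  text = '';
  insert(index: number, s: string): void {
    console.log(`insert at ${index}: ${s.length} chars, a slice of the ${index + s.length}-char accumulated string`);
    this.text = this.text.slice(0, index) + s + this.text.slice(index);
  }
  remove(start: number, end: number): void {
    console.log(`remove [${start}, ${end})`);
    this.text = this.text.slice(0, start) + this.text.slice(end);
  }
}

// Append-only stream run through a simplified version of the diff loop
// (the processText step is skipped for brevity).
const cur = new StubText();
let accumulated = '';
for (const chunk of ['X'.repeat(40) + '\n', 'X'.repeat(40) + '\n', 'X'.repeat(40) + '\n']) {
  accumulated += chunk;
  let idx = 0;
  while (idx < cur.text.length && accumulated[idx] === cur.text[idx]) {
    idx++;
  }
  if (idx < cur.text.length) {
    cur.remove(idx, cur.text.length);
  }
  if (idx < accumulated.length) {
    cur.insert(idx, accumulated.slice(idx)); // one fresh slice per streamed chunk
  }
}
// Logs inserts at 0, 41 and 82, each a 41-character slice of a progressively
// larger accumulated string.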

yjs might be partially to blame, as it may store more copies than necessary, but the core issue appears to be explained by a long-standing (10+ years) bug (feature?) in Chrome and Firefox: substrings retain the original source string they were sliced from, which, in the case of streaming where the final string is a composition of all previous substrings, can add up to a huge memory leak.

Note that the Firefox issue was closed last month and Firefox 141 may no longer manifest the bug, but attempts to fix it in Chrome have stalled.
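
A possible mitigation, sketched below under the assumption that the retained slices are what pin the large parent strings, would be to force a flat, self-contained copy of each substring before it is kept long-term. This is not something JupyterLab or yjs does today, there is no specification-level way to request it, and its effectiveness is engine- and version-dependent:

// Hypothetical helper, not part of JupyterLab or yjs: rebuild a string from
// scratch so the result is a fresh allocation rather than a sliced string
// pointing at a large parent. Costs an extra O(n) copy per call, and whether
// it actually breaks the parent reference depends on the JavaScript engine.
function flattenCopy(s: string): string {
  return s.split('').join('');
}

// Usage sketch: flatten the tail before handing it to the shared model, so
// retaining the tail does not keep the whole accumulated output alive, e.g.
//   curText.insert(idx, flattenCopy(text.slice(idx)));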

Expected behavior

The resulting string takes 410 kB, or a small multiple of that, in memory (10,000 lines × 41 ASCII characters each, counting the trailing newline).

Context

  • Browser and version: Chrome 137
  • JupyterLab version: 4.4.3
