The Wayback Machine - https://web.archive.org/web/20250530133635/https://github.com/python/cpython/pull/134588
Skip to content

gh-134584: Decref elimination for float ops in the JIT #134588

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

Fidget-Spinner
Copy link
Member

@Fidget-Spinner Fidget-Spinner commented May 23, 2025

The key idea is to remove decref of inputs of common operations in the JIT, if they are deemed to be borrowed/using borrowed refcounts. We rely on the base interpreter's static analysis to determine if things are borrowed.

For the float operations, these can't be generated using the cases generator, because they use the FromDoubleConsumeFloat things which are not DECREF_INPUT.

For the other list/tuple stuff, or anything that use DECREF_INPUTs, we can automatically generate the instruction variants by not emitting the decref at all. However, I left that out of this PR as this is the initial proof of concept

JIT without these optimizations:
nbody: Mean +- std dev: 106 ms +- 3 ms

JIT with decref optimizations (this branch at 7f90d0cf7357b2b703110c685311fbac5f317ac1):
nbody: Mean +- std dev: 98.1 ms +- 2.5 ms

interpreter (no JIT):
nbody: Mean +- std dev: 90.2 ms +- 0.9 ms

Roughly a 7.5% speedup on nbody just from these optimizations alone!

It's still slower than the normal interpreter on my system, but it's a good start.

I removed the list ops at ad03b1e to make the PR easier to review, we can add them back in later for the speed boost!

@Fidget-Spinner Fidget-Spinner changed the title Decref elimination floats gh-134584: Decref elimination for float ops in the JIT May 23, 2025
@Fidget-Spinner
Copy link
Member Author

@tomasr8 @brandtbucher 7.5% speedup in nbody on the JIT in this PR 👀. I removed some of the optimizations in this PR to make things easier to review, and also to split up work later!

@markshannon
Copy link
Member

Whether we can skip the refcount operation is a property of the reference, not the object.
I'm a bit puzzled how this works, as is.

You'll need to remove the skip_refcount field from JitOptSymbol and change the type of the reference from JitOptSymbol * to JitOptSymbolRef and put the skip_refcount field there. Much like we did for PyStackRef.

@Fidget-Spinner
Copy link
Member Author

Whether we can skip the refcount operation is a property of the reference, not the object. I'm a bit puzzled how this works, as is.

The optimizer in this PR uses LOAD_FAST_BORROW and LOAD_CONST for a safe indicator that says "something else holds a strong reference to this reference" (in this case, localsplus for LOAD_FAST_BORROW, and co_consts for LOAD_CONST) AND "this reference will be consumed immediately by something, so it's safe to borrow it".

Consider the following operation:

LOAD_FAST_BORROW x
LOAD_FAST_BORROW y
BINARY_OP_ADD_FLOAT

Then it's trivial to see that x and y have strong references elsewhere. Thus they can just become

LOAD_FAST_BORROW x
LOAD_FAST_BORROW y
BINARY_OP_ADD_FLOAT__NO_DECREF_INPUT

Basically, I'm leveraging on the work that Matt did to do cheap and safe reference analysis.

I think it should be possible to just do this for all LOAD_FAST, let me try that.

@Fidget-Spinner
Copy link
Member Author

I think it should be possible to just do this for all LOAD_FAST, let me try that.

Yeah this is not safe. If we assign it back to a locals, it needs to be an owned reference, not borrowed.

@Fidget-Spinner
Copy link
Member Author

You'll need to remove the skip_refcount field from JitOptSymbol and change the type of the reference from JitOptSymbol * to JitOptSymbolRef and put the skip_refcount field there. Much like we did for PyStackRef.

I'm internally deliberating on this one. It's a lot of churn; a few thousand lines of diff. I don't think we need to. We used the tagged pointer representation over PyObject because PyObject has proper lifetimes, reference management, and also to save space. Meanwhile JitOptSymbol has no lifetime management at all (it's effectively immortal for the context of the optimizer), needs no reference management, and if we want to save space, we can just reuse the memory in the thread state. We basically get none of the associated benefits of the _PyStackRef API (sound lifetime semantics), for all the associated negatives like the code churn.

If this needs better naming, I'm all for it. Eventually, I want to expand this so it's more powerful than the stackref API -- skipping reference counts when we know it's truly safe to skip, not just like stackrefs where we skip only because it's on the stack. For example, take the following code:

x = (a, b, c, d)
// Do stuff with a, b, c, d

A smart optimizer would be able to skip all refcounting operations on a,b,c,d as long as x is kept live. As we know x holds an owned reference to a b c d. This is why skipping refcounting is a property of the symbol, not a property of the stackref, and it should be kept as part of the symbol.

@Fidget-Spinner
Copy link
Member Author

I've renamed it to clarify that the main property is observing the strong references are held by someone else, not whether we can skip refcounts or not.

@Fidget-Spinner
Copy link
Member Author

I've renamed it to clarify that the main property is observing the strong references are held by someone else, not whether we can skip refcounts or not.

On second thought, this name is even worse. It's confusing and error-prone.

@markshannon
Copy link
Member

I'll repeat what I said before "Whether we can skip the refcount operation is a property of the reference, not the object."

def leaks():
    v = 1.0
    while True:
        v = (v,)[0] * v

Will keep leaking as (v,)[0] converts the reference to v to a strong reference which is then leaked if the BINARY_OP is converted to the no-decref version.

@Fidget-Spinner
Copy link
Member Author

Will keep leaking as (v,)[0] converts the reference to v to a strong reference which is then leaked if the BINARY_OP is converted to the no-decref version.

To be pedantic, that won't leak as we don't trace from function entry at the moment. So the optimizer will never see that as a constant symbol. But yes I get your point, let me just go through the churn then.

@Fidget-Spinner
Copy link
Member Author

@markshannon I have done the refactor like you requested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants