-
-
Notifications
You must be signed in to change notification settings - Fork 32k
gh-134584: Decref elimination for float ops in the JIT #134588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
gh-134584: Decref elimination for float ops in the JIT #134588
Conversation
@tomasr8 @brandtbucher 7.5% speedup in nbody on the JIT in this PR 👀. I removed some of the optimizations in this PR to make things easier to review, and also to split up work later! |
Whether we can skip the refcount operation is a property of the reference, not the object. You'll need to remove the |
The optimizer in this PR uses Consider the following operation:
Then it's trivial to see that x and y have strong references elsewhere. Thus they can just become
Basically, I'm leveraging on the work that Matt did to do cheap and safe reference analysis. I think it should be possible to just do this for all |
Yeah this is not safe. If we assign it back to a locals, it needs to be an owned reference, not borrowed. |
I'm internally deliberating on this one. It's a lot of churn; a few thousand lines of diff. I don't think we need to. We used the tagged pointer representation over PyObject because PyObject has proper lifetimes, reference management, and also to save space. Meanwhile JitOptSymbol has no lifetime management at all (it's effectively immortal for the context of the optimizer), needs no reference management, and if we want to save space, we can just reuse the memory in the thread state. We basically get none of the associated benefits of the _PyStackRef API (sound lifetime semantics), for all the associated negatives like the code churn. If this needs better naming, I'm all for it. Eventually, I want to expand this so it's more powerful than the stackref API -- skipping reference counts when we know it's truly safe to skip, not just like stackrefs where we skip only because it's on the stack. For example, take the following code:
A smart optimizer would be able to skip all refcounting operations on a,b,c,d as long as x is kept live. As we know x holds an owned reference to a b c d. This is why skipping refcounting is a property of the symbol, not a property of the stackref, and it should be kept as part of the symbol. |
I've renamed it to clarify that the main property is observing the strong references are held by someone else, not whether we can skip refcounts or not. |
On second thought, this name is even worse. It's confusing and error-prone. |
This reverts commit 5c429b6.
I'll repeat what I said before "Whether we can skip the refcount operation is a property of the reference, not the object."
Will keep leaking as |
To be pedantic, that won't leak as we don't trace from function entry at the moment. So the optimizer will never see that as a constant symbol. But yes I get your point, let me just go through the churn then. |
@markshannon I have done the refactor like you requested. |
The key idea is to remove decref of inputs of common operations in the JIT, if they are deemed to be borrowed/using borrowed refcounts. We rely on the base interpreter's static analysis to determine if things are borrowed.
For the float operations, these can't be generated using the cases generator, because they use the
FromDoubleConsumeFloat
things which are notDECREF_INPUT
.For the other list/tuple stuff, or anything that use
DECREF_INPUT
s, we can automatically generate the instruction variants by not emitting the decref at all. However, I left that out of this PR as this is the initial proof of conceptRoughly a 7.5% speedup on nbody just from these optimizations alone!
It's still slower than the normal interpreter on my system, but it's a good start.
I removed the list ops at ad03b1e to make the PR easier to review, we can add them back in later for the speed boost!