The Wayback Machine - https://web.archive.org/web/20231226213927/https://github.com/python/cpython/pull/113465
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-113464: A copy-and-patch JIT compiler #113465

Draft
wants to merge 404 commits into
base: main
Choose a base branch
from

Conversation

brandtbucher
Copy link
Member

@brandtbucher brandtbucher commented Dec 25, 2023

'Twas the night before Christmas, when all through the code
Not a core dev was merging, not even Guido;
The CI was spun on the PRs with care
In hopes that green check-markings soon would be there;
The buildbots were nestled all snug under desks,
Even PPC64 AIX;
Doc-writers, triage team, the Council of Steering,
Had just stashed every change and stopped engineering,

When in the "PRs" tab arose such a clatter,
They opened GitHub to see what was the matter.
Away to CPython they flew like a flash,
Towards sounds of PROT_EXEC and __builtin___clear_cache.
First LLVM was downloaded, unzipped
Then the Actions were running a strange new build script,
When something appeared, they were stopped in their tracks,
jit_stencils.h, generated from hacks,
With their spines all a-shiver, they muttered "Oh, shit...",
They knew in a moment it must be a JIT.

More rapid than interpretation it came
And it copied-and-patched every stencil by name:
"Now, _LOAD_FAST! Now, _STORE_FAST! _BINARY_OP_ADD_INT!
On, _GUARD_DORV_VALUES_INST_ATTR_FROM_DICT!
To the top of the loop! And down into the call!
Now cache away! Cache away! Cache away all!"
But why now? And how so? They needed a hint,
Thankfully, Brandt gave a great talk at the sprint;
So over to YouTube the reviewers flew,
They read the white paper, and the blog post too.

And then, after watching, they saw its appeal
Not writing the code themselves seemed so unreal.
And the platform support was almost too easy,
ARM64 Macs to 32-bit PCs.
There was some runtime C, not too much, just enough,
Basically a loader, relocating stuff;
It ran every test, one by one passed them all,
With not one runtime dependency to install.
Mostly build-time Python! With strict static typing!
For maintenance ease, and also nerd-sniping!

Though dispatch was faster, the JIT wasn't wise,
And the traces it used still should be optimized;
The code it was JIT'ing still needed some thinning,
With code models small, and some register pinning;
Or new calling conventions, shared stubs for paths slow,
Since this JIT was brand new, there was fruit hanging low.
It was awkwardly large, parsed straight out of the ELFs,
And they laughed when they saw it, in spite of themselves;

A configure flag, and no merging this year,
Soon gave them to know they had nothing to fear;
It wasn't much faster, at least it could work,
They knew that'd come later; no one was a jerk,
But they were still smart, and determined, and skilled,
They opened a shell, and configured the build;
--enable-experimental-jit, then made it,
And away the JIT flew as their "+1"s okay'ed it.
But they heard it exclaim, as it traced out of sight,
"Happy JIT-mas to all, and to all a good night!"

@brandtbucher brandtbucher added performance Performance or resource usage interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Dec 25, 2023
@brandtbucher brandtbucher self-assigned this Dec 25, 2023
@bedevere-app bedevere-app bot mentioned this pull request Dec 25, 2023
@brandtbucher
Copy link
Member Author

(FYI, the merge commit is broken because tier 2 isn't working on main right now... not something that's gonna get fixed this week. The head of my justin branch works fine, though.)


### Installing LLVM

The JIT compiler does not require end users to install any third-party dependencies, but part of it must be *built* using LLVM. You are *not* required to build the rest of CPython using LLVM, or the even the same version of LLVM (in fact, this is uncommon).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you perhaps put a few notes about what the blockers are for GCC support / what specifically is relied upon from LLVM?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1579,6 +1579,26 @@ else
AC_MSG_RESULT([no])
fi

# Check for --enable-experimental-jit:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this look for the LLVM tools and allow overriding the used paths and such?

Copy link
Member

@sobolevn sobolevn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits :)

Thanks a lot, this looks amazing.

#endif

#ifndef Py_BUILD_CORE
#error "this header requires Py_BUILD_CORE define"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#error "this header requires Py_BUILD_CORE define"
# error "this header requires Py_BUILD_CORE define"

replace = dataclasses.replace


S = typing.TypeVar("S", schema.COFFSection, schema.ELFSection, schema.MachOSection)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is very common to name such variables as _S and _R

self.disassembly.append(f"{offset:x}: {' '.join(['00'] * padding)}")
self.body.extend([0] * padding)

def emit_aarch64_trampoline(self, hole: Hole) -> typing.Generator[Hole, None, None]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def emit_aarch64_trampoline(self, hole: Hole) -> typing.Generator[Hole, None, None]:
def emit_aarch64_trampoline(self, hole: Hole) -> typing.Iterable[Hole]:

Most likely that is the only thing you care about in this contract.



S = typing.TypeVar("S", schema.COFFSection, schema.ELFSection, schema.MachOSection)
R = typing.TypeVar(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely you mean _R = TypeVar("_R", bound=schema.COFFRelocation | schema.ELFRelocation | schema.MachORelocation)

Using type var values seems not needed here.

return dataclasses.replace(target, debug=debug, verbose=verbose)


def dump_header() -> typing.Generator[str, None, None]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def dump_header() -> typing.Generator[str, None, None]:
def dump_header() -> typing.Iterable[str]:

yield ""


def dump_footer(opnames: list[str]) -> typing.Generator[str, None, None]:
Copy link
Member

@sobolevn sobolevn Dec 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def dump_footer(opnames: list[str]) -> typing.Generator[str, None, None]:
def dump_footer(opnames: list[str]) -> typing.Iterable[str]:

Copy link
Member

@Eclips4 Eclips4 Dec 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterable means that it's can be reused. I think here should be Iterator[str]. Same for dump & dump_header functions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both are fine :)

(but, mypy does not make this assumption about reusability: https://mypy-play.net/?mypy=latest&python=3.12&flags=strict&gist=c8d554830f0ab2731f4072c4ec6cb489)

yield "}"


def dump(stencil_groups: dict[str, StencilGroup]) -> typing.Generator[str, None, None]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def dump(stencil_groups: dict[str, StencilGroup]) -> typing.Generator[str, None, None]:
def dump(stencil_groups: dict[str, StencilGroup]) -> typing.Generator[str]:

Comment on lines +10 to +13
try:
args = [name, "--version"]
if echo:
print(shlex.join(args))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
try:
args = [name, "--version"]
if echo:
print(shlex.join(args))
args = [name, "--version"]
if echo:
print(shlex.join(args))
try:

Comment on lines +32 to +35
try:
args = ["brew", "--prefix", f"llvm@{LLVM_VERSION}"]
if echo:
print(shlex.join(args))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
try:
args = ["brew", "--prefix", f"llvm@{LLVM_VERSION}"]
if echo:
print(shlex.join(args))
args = ["brew", "--prefix", f"llvm@{LLVM_VERSION}"]
if echo:
print(shlex.join(args))
try:

@adrian17
Copy link
Contributor

I'm curious, given the perf results reported in your talk, do you have any documented ideas on improving the generated code - either by tinkering with whatever gets generated (though I'm aware messing with it too much manually defeats the idea of having it work "magically"), or by improving the template for LLVM? Some of the things I see in generated code seem really pessimized, like the obligatory jump-to-continue in each op (with a jump-to-register too, probably enforced by mcmodel=large), or 64-bit oparg immediates.

export QEMU_LD_PREFIX="/usr/$HOST"
./configure --enable-experimental-jit ${{ matrix.debug && '--with-pydebug' || '--enable-optimizations --with-lto' }} --build=x86_64-linux-gnu --host="$HOST" --with-build-python=../build/bin/python3 --with-pkg-config=no ac_cv_buggy_getaddrinfo=no ac_cv_file__dev_ptc=no ac_cv_file__dev_ptmx=yes
make all --jobs 2
./python -m test --exclude ${{ matrix.exclude }} --multiprocess 0 --timeout 3600 --verbose2 --verbose3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hugovk Can we apply reusable workflow as same as free-threaded?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think longer term we should integrate into the main test matrix, but there's a big matrix here, and we might want to have some different trigger rules and required checks whilst this is still experimental.

Plus shorter term, we need to get it passing, re: #113465 (comment) and https://github.com/python/cpython/actions/runs/7320036782?pr=113465

Comment on lines +62 to +63
- uses: actions/checkout@v3
- uses: actions/setup-python@v4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- uses: actions/checkout@v3
- uses: actions/setup-python@v4
- uses: actions/checkout@v4
- uses: actions/setup-python@v5

- uses: actions/checkout@v3
- uses: actions/setup-python@v4
with:
python-version: '3.11'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
python-version: '3.11'
python-version: '3.12'

or for latest stable

Suggested change
python-version: '3.11'
python-version: '3.x'

?

- uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Windows
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit of space to help readability.

Suggested change
- name: Windows
- name: Windows

choco install llvm --allow-downgrade --version ${{ matrix.llvm }}
./PCbuild/build.bat --experimental-jit ${{ matrix.debug && '-d' || '--pgo' }} -p ${{ matrix.architecture }}
./PCbuild/rt.bat ${{ matrix.debug && '-d' }} -p ${{ matrix.architecture }} -q --exclude ${{ matrix.exclude }} --multiprocess 0 --timeout 3600 --verbose2 --verbose3
- name: macOS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- name: macOS
- name: macOS

./configure --enable-experimental-jit ${{ matrix.debug && '--with-pydebug' || '--enable-optimizations --with-lto' }}
make all --jobs 3
./python.exe -m test --exclude ${{ matrix.exclude }} --multiprocess 0 --timeout 3600 --verbose2 --verbose3
- name: Native Linux
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- name: Native Linux
- name: Native Linux

export QEMU_LD_PREFIX="/usr/$HOST"
./configure --enable-experimental-jit ${{ matrix.debug && '--with-pydebug' || '--enable-optimizations --with-lto' }} --build=x86_64-linux-gnu --host="$HOST" --with-build-python=../build/bin/python3 --with-pkg-config=no ac_cv_buggy_getaddrinfo=no ac_cv_file__dev_ptc=no ac_cv_file__dev_ptmx=yes
make all --jobs 2
./python -m test --exclude ${{ matrix.exclude }} --multiprocess 0 --timeout 3600 --verbose2 --verbose3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think longer term we should integrate into the main test matrix, but there's a big matrix here, and we might want to have some different trigger rules and required checks whilst this is still experimental.

Plus shorter term, we need to get it passing, re: #113465 (comment) and https://github.com/python/cpython/actions/runs/7320036782?pr=113465

.tp_itemsize = 0,
.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_DISALLOW_INSTANTIATION,
.tp_dealloc = uop_opt_dealloc,
};

PyObject *
PyUnstable_Optimizer_NewUOpOptimizer(void)
PyUnstable_Optimizer_NewUOpOptimizer(int jit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, maybe jit parameter can be declared as bool type?

@jessekrubin
Copy link

FWIW: unrelated to anything code, the "night before christmas" theme of the post is extremely well done! I cracked up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants