The Wayback Machine - https://web.archive.org/web/20230308165249/https://github.com/python/cpython/pull/101441
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-97933: inline list/dict/set comprehensions in functions #101441

Open
wants to merge 30 commits into
base: main
Choose a base branch
from

Conversation

carljm
Copy link
Contributor

@carljm carljm commented Jan 30, 2023

Closes #97933.

In builds with --enable-optimizations:

➜ ./python -m pyperf timeit -s 'l = [1, 2, 3, 4, 5]' '[x for x in l]' --compare-to ../mainopt/python
/home/carljm/build/mainopt/python: ..................... 200 ns +- 3 ns
/home/carljm/build/ic2opt/python: ..................... 120 ns +- 2 ns

Mean +- std dev: [/home/carljm/build/mainopt/python] 200 ns +- 3 ns -> [/home/carljm/build/ic2opt/python] 120 ns +- 2 ns: 1.67x faster

Credit to @vladima for authoring the original version of this code in Cinder 3.8. The compiler approach is different in this version (so as to support inlining all comprehensions, regardless of name clashes); some of the symbol table changes remain the same.

A couple implementation notes:

  • We need a new opcode LOAD_FAST_AND_CLEAR because we need to push/pop the value of possibly-undefined variables on the stack while clearing them. To do this with existing opcodes would require eliminating the invariant/assert that LOAD_FAST never loads NULL, and would add inefficient refcount operations and interpreter loop cycles.
  • If a comprehension iteration variable is a cell inside the comprehension (i.e. the comprehension builds closures) we end up including that variable in both co_varnames and co_cellvars for the inlined-into outer function and using the non-offset (varnames) storage location for it. This is how function arguments that are also cells are currently handled, so we just piggyback on that handling. This makes the needed push/pop of the outer cell simpler as we can just use LOAD_FAST_AND_CLEAR/STORE_FAST as-is for it, and it will push/pop the cell itself. Otherwise we would need some new handling in the compiler for allowing push/pop of a cell in offset storage.

@carljm
Copy link
Contributor Author

carljm commented Jan 31, 2023

@markshannon @iritkatriel would be grateful for a pyperformance run on this one (presuming signal all looks good.)

@carljm carljm changed the title gh-97933: inline sync list/dict/set comprehensions gh-97933: inline list/dict/set comprehensions in functions Jan 31, 2023
@carljm carljm added the performance Performance or resource usage label Jan 31, 2023
@markshannon
Copy link
Member

I'll run the benchmarks now

@markshannon
Copy link
Member

Benchmarking is delayed a bit, but should be back tomorrow at the latest.

@markshannon
Copy link
Member

markshannon commented Jan 31, 2023

You don't seem to be clearing the inner variable before use, so that

def f():
    l = [ None ]
    [ (l[0], l) in [[1,2]] ]
    print(l)

will, I think, print [1] instead of raising an UnboundLocalError

You can actually make the code more efficient by clearing the local when stashing it on the stack:

inst(LOAD_FAST_AND_CLEAR, ( -- val)) {
    val = LOCALS[oparg];
    LOCALS[oparg] = NULL;
}

as there is no need for a NULL check or INCREF.

Python/compile.c Outdated
// could be cleverer about minimizing swaps but it doesn't matter
// because `apply_static_swaps` will eliminate all of these anyway
ADDOP_I(c, loc, SWAP, 2);
ADDOP_NAME(c, loc, STORE_FAST, k, varnames);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could be storing NULL here. That is OK for the interpreter, but the dataflow analysis in add_checks_for_loads_of_uninitialized_variables in the compiler assumes that STORE_FAST leaves the variable initialized.

What you'll need to do is add a pseudo instruction, and convert that to STORE_FAST in the assembler. That way the dataflow analysis won't be fooled.

You might want to update add_checks_for_loads_of_uninitialized_variables to understand the LOAD_FAST_AND_CLEAR, STORE_FAST_MAYBE_NULL pairs, but that should wait for another PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a test case:

def f():
     if False:
          x = 0
     [x for _ in [1]]
     return x

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, thanks!

The test case needs to use [x for x in [1]], not [x for _ in [1]] -- in the latter case, x is not bound in the comprehension and is just a free var (converted to local by inlining) so we don't push/pop it at all.

@markshannon
Copy link
Member

markshannon commented Jan 31, 2023

A couple of obscure issues, but this looks like a very nice improvement.

I think the gain in performance is well worth the cost in code size, given how common comprehensions are.

* main:
  pythongh-101440: fix json snippet error in logging-cookbook.rst (python#101439)
  pythongh-99276 - Updated Doc/faq/general.rst (python#101396)
  Add JOBS parameter to docs Makefile (python#101395)
  pythongh-98831: rewrite GET_LEN, GET_ITER, BEFORE_WITH and a few simple opcodes in the instruction definition DSL (python#101443)
  pythongh-77607: Improve accuracy of os.path.join docs (python#101406)
  Fixes typo in asyncio.TaskGroup context manager code example (python#101449)
  pythongh-98831: Clean up and add cache size static_assert to macro (python#101442)
  pythongh-99955: use SUCCESS/ERROR return values in optimizer and assembler. Use RETURN_IF_ERROR where appropriate. Fix a couple of bugs. (python#101412)
@carljm
Copy link
Contributor Author

carljm commented Jan 31, 2023

You don't seem to be clearing the inner variable before use

Your example code doesn't involve a comprehension, I'm assuming you meant something like

def f():
    l = [ None ]
    [ (l[0], l) for l in [[1,2]] ]
    print(l)

But this doesn't raise UnboundLocalError on main either, it works fine; the iteration variable l is always bound within the comprehension before the expression is evaluated.

So far I haven't been able to construct a test case where failing to clear an incoming variable bound inside the comprehension has any visible effect, because AFAICT any locals in a comprehension are always (necessarily by the syntactic limitations of how a comprehension is built) defined before use. Let me know if you can think of a test case I'm missing!

You can actually make the code more efficient by clearing the local when stashing it on the stack

But I think this is sufficient reason regardless to justify switching from LOAD_FAST_OR_NULL to LOAD_FAST_AND_CLEAR, so I'll do that!

@carljm
Copy link
Contributor Author

carljm commented Jan 31, 2023

Actually I don't think LOAD_FAST_AND_CLEAR will work, because of cases like this:

def f(x):
    return [x for x in x]

We need to save/restore the outer value of x, but we cannot clear it on entry because we are also relying on being able to load it when we evaluate the iterable part of the comprehension (which in the non-inlined case would have been evaluated outside the function and passed in as the .0 parameter.) So I think we need to stick with a non-clearing LOAD_FAST_OR_NULL on entry.

If there were a correctness reason for needing to clear in other cases, I could add an additional DELETE_FAST after the LOAD_FAST_OR_NULL, but so far I haven't found any such correctness need, and clearing this way is not an efficiency improvement.

EDIT: there is also the option of doing clearing pushes and also moving evaluation of the outermost iterable to before the pushes; this would require some non-elidable SWAPs to preserve it as TOS through the pushes, though...

@carljm
Copy link
Contributor Author

carljm commented Jan 31, 2023

@markshannon I think LOAD_FAST_OR_NULL could also become a pseudo-instruction that resolves to LOAD_FAST (which would get around having it turned into LOAD_FAST_CHECK), but we'd have to remove the assert(value != NULL) in the LOAD_FAST implementation. Do you prefer keeping that assertion at the cost of another (non-pseudo) opcode?

@carljm
Copy link
Contributor Author

carljm commented Jan 31, 2023

I've addressed the review comments. I have some TODOs for potential optimizations that could be done here or in a follow-up diff:

  • Eliminate some unnecessary push/pops in cases where the value is never loaded after the comprehension. This would fall into the general category of dead store elimination. I'm not sure if we can actually do this or if it violates some guarantees about introspectable frame state in tracing?
  • Better handling of SWAPs on the post-comprehension pops. apply_static_swaps can't help us here if the comprehension value is immediately returned (because we can't "swap" a RETURN_VALUE -- this case would go away if we can do the dead store elimination, since if it's right before a RETURN_VALUE it's clearly a dead store) or if it's immediately popped (because the POP_TOP will have a lineno of -1 and apply_static_swaps will thus refuse to swap it). Instead of having one SWAP 2 per pop, we could have a single SWAP N and do the last pop first.
  • Teach add_checks_for_loads_of_uninitialized_variables to understand and trace through the push/pop pairs.

* main:
  pythonGH-100288: Skip extra work when failing to specialize LOAD_ATTR (pythonGH-101354)
  pythongh-101409: Improve generated clinic code for self type checks (python#101411)
  pythongh-98831: rewrite BEFORE_ASYNC_WITH and END_ASYNC_FOR in the instruction definition DSL (python#101458)
  pythongh-101469: Optimise get_io_state() by using _PyModule_GetState() (pythonGH-101470)
@markshannon
Copy link
Member

Sorry for posting a gibberish example.
Here's what I should have posted:

def f():
    l = [ None ]
    [ 1 for (l[0], l) in [[1,2]] ]
    print(l)
f()

3.11

$ python3.11 ~/test/test.py 
Traceback (most recent call last):
   ...
UnboundLocalError: cannot access local variable 'l' where it is not associated with a value

This PR

$ ./python ~/test/test.py 
[1]

* main:
  pythongh-98831: rewrite PUSH_EXC_INFO and conditional jumps in the instruction definition DSL (python#101481)
  pythongh-98831: Modernize the LOAD_ATTR family (python#101488)
  pythongh-101498 : Fix asyncio.Timeout example in docs (python#101499)
  pythongh-101454: fix documentation for END_ASYNC_FOR (python#101455)
  pythongh-101277: Isolate itertools, add group and _grouper types to module state (python#101302)
  pythongh-101317: Add `ssl_shutdown_timeout` parameter for `asyncio.StreamWriter.start_tls` (python#101335)
  datetime.rst: fix combine() signature (python#101490)
* main: (82 commits)
  pythongh-101670: typo fix in PyImport_ExtendInittab() (python#101723)
  pythonGH-99293: Document that `Py_TPFLAGS_VALID_VERSION_TAG` shouldn't be used. (#pythonGH-101736)
  no-issue: Add Dong-hee Na as the cjkcodecs codeowner (pythongh-101731)
  pythongh-101678: Merge math_1_to_whatever() and math_1() (python#101730)
  pythongh-101678: refactor the math module to use special functions from c11 (pythonGH-101679)
  pythongh-85984: Remove legacy Lib/pty.py code. (python#92365)
  pythongh-98831: Use opcode metadata for stack_effect() (python#101704)
  pythongh-101283: Version was just released, so should be changed in 3.11.3 (pythonGH-101719)
  pythongh-101283: Fix use of unbound variable (pythonGH-101712)
  pythongh-101283: Improved fallback logic for subprocess with shell=True on Windows (pythonGH-101286)
  pythongh-101277: Port more itertools static types to heap types (python#101304)
  pythongh-98831: Modernize CALL and family (python#101508)
  pythonGH-101696: invalidate type version tag in `_PyStaticType_Dealloc` (python#101697)
  pythongh-100221: Fix creating dirs in `make sharedinstall` (pythonGH-100329)
  pythongh-101670: typo fix in PyImport_AppendInittab() (pythonGH-101672)
  pythongh-101196: Make isdir/isfile/exists faster on Windows (pythonGH-101324)
  pythongh-101614: Don't treat python3_d.dll as a Python DLL when checking extension modules for incompatibility (pythonGH-101615)
  pythongh-100933: Improve `check_element` helper in `test_xml_etree` (python#100934)
  pythonGH-101578: Normalize the current exception (pythonGH-101607)
  pythongh-47937: Note that Popen attributes are read-only (python#93070)
  ...
@carljm carljm marked this pull request as draft February 9, 2023 19:25
* main:
  Fix some typos in asdl_c.py (pythonGH-101757)
  pythongh-101747: Fix refleak in new `OrderedDict` repr (pythonGH-101748)
  pythongh-101430: Update tracemalloc to handle presize properly. (pythongh-101745)
  pythonGH-101228: Fix typo in docstring for read method of `_io.TextIOWrapper` class (python#101227)
  Fix typo in `test_fstring.py` (python#101600)
  pythongh-101726: Update the OpenSSL version to 1.1.1t (pythonGH-101727)
  pythongh-101283: Fix 'versionchanged' for the shell=True fallback on Windows in 3.12 (pythonGH-101728)
  LibFFI build requires x64 Cygwin, and skip the ARM build (pythonGH-101743)
@carljm carljm added the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Feb 10, 2023
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @carljm for commit 17d5d84 🤖

If you want to schedule another build, you need to add the :hammer: test-with-refleak-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Feb 10, 2023
@carljm carljm marked this pull request as ready for review February 10, 2023 06:14
@carljm
Copy link
Contributor Author

carljm commented Feb 10, 2023

@markshannon Ok, I think I've addressed all review comments, and CI looks good. Ready for review and pyperformance run.

Given the current usage of STORE_FAST_MAYBE_NULL, it isn't possible to
construct a repro where this is needed, because if the local was NULL
coming into the comprehension it will already be marked as unsafe coming
out of the comprehension (since the body of the comprehension is always
a loop which may be skipped entirely). But in case this opcode is used
in other contexts in future, it should mark the local unsafe.
* main:
  Docs: Fix getstatus() -> getcode() typos (python#101296)
  Docs: use parameter list for sqlite3.Cursor.execute* (python#101782)
  pythongh-101763: Update bundled copy of libffi to 3.4.4 on Windows (pythonGH-101784)
  pythongh-101517: make bdb avoid looking up in linecache with lineno=None (python#101787)
  pythongh-101759: Update Windows installer to SQLite 3.40.1 (python#101762)
  pythongh-101277: Finalise isolating itertools (pythonGH-101305)
  pythongh-101759: Update macOS installer to SQLite 3.40.1 (python#101761)
* main:
  pythongh-101810: Remove duplicated st_ino calculation (pythonGH-101811)
  pythongh-92547: Purge sqlite3_enable_shared_cache() detection from configure (python#101873)
  pythonGH-100987: Refactor `_PyInterpreterFrame` a bit, to assist generator improvement. (pythonGH-100988)
  pythonGH-87849: Simplify stack effect of SEND and specialize it for generators and coroutines. (pythonGH-101788)
  Correct trivial grammar in reset_mock docs (python#101861)
  pythongh-101845: pyspecific: Fix i18n for availability directive (pythonGH-101846)
  pythongh-89792: Limit test_tools freeze test build parallelism based on the number of cores (python#101841)
  pythongh-85984: Utilize new "winsize" functions from termios in pty tests. (python#101831)
  pythongh-89792: Prevent test_tools from copying 1000M of "source" in freeze test (python#101837)
  Fix typo in test_fstring.py (python#101823)
  pythonGH-101797: allocate `PyExpat_CAPI` capsule on heap (python#101798)
  pythongh-101390: Fix docs for `imporlib.util.LazyLoader.factory` to properly call it a class method (pythonGH-101391)
* main: (60 commits)
  pythongh-102056: Fix a few bugs in error handling of exception printing code (python#102078)
  pythongh-102011: use sys.exception() instead of sys.exc_info() in docs where possible (python#102012)
  pythongh-101566: Sync with zipp 3.14. (pythonGH-102018)
  pythonGH-99818: improve the documentation for zipfile.Path and Traversable (pythonGH-101589)
  pythongh-88233: zipfile: handle extras after a zip64 extra (pythonGH-96161)
  pythongh-101981: Apply HOMEBREW related environment variables (pythongh-102074)
  pythongh-101907: Stop using `_Py_OPCODE` and `_Py_OPARG` macros (pythonGH-101912)
  pythongh-101819: Adapt _io types to heap types, batch 1 (pythonGH-101949)
  pythongh-101981: Build macOS as recommended by the devguide (pythonGH-102070)
  pythongh-97786: Fix compiler warnings in pytime.c (python#101826)
  pythongh-101578: Amend PyErr_{Set,Get}RaisedException docs (python#101962)
  Misc improvements to the float tutorial (pythonGH-102052)
  pythongh-85417: Clarify behaviour on branch cuts in cmath module (python#102046)
  pythongh-100425: Update tutorial docs related to sum() accuracy (FH-101854)
  Add missing 'is' to `cmath.log()` docstring (python#102049)
  pythongh-100210: Correct the comment link for unescaping HTML (python#100212)
  pythongh-97930: Also include subdirectory in makefile. (python#102030)
  pythongh-99735: Use required=True in argparse subparsers example (python#100927)
  Fix incorrectly documented attribute in csv docs (python#101250)
  pythonGH-84783: Make the slice object hashable (pythonGH-101264)
  ...
@brianm78
Copy link

This seems to result in some surprising behaviour in the presence of nested functions. Eg:

     def f():
            x = 1
            print(f"start: {x=}")
            def g():
                    print(f"start g {x=}")
                    l = [x for x in range(10)]
                    print(f"end g {x=}")
            g()
            print(f"end f: {x=}")

This seems to result in the list comprehension overwriting the closed over cell var. Whereas in current python this will print x=1 in all cases, with this I get:

start: x=1
start g x=1
end g x=9
end f: x=9

It works if x isn't accessed in the inner function, but if it creates a closure, the STORE_FAST in the listcomp overwrites the parent x's value, rather than a local.

@carljm
Copy link
Contributor Author

carljm commented Feb 28, 2023

@brianm78 Thanks! This is fixed in the latest commit.

* main: (67 commits)
  pythongh-99108: Add missing md5/sha1 defines to Modules/Setup (python#102308)
  pythongh-100227: Move _str_replace_inf to PyInterpreterState (pythongh-102333)
  pythongh-100227: Move the dtoa State to PyInterpreterState (pythongh-102331)
  pythonGH-102305: Expand some macros in generated_cases.c.h (python#102309)
  Migrate to new PSF mailgun account (python#102284)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (in Python/) (python#102193)
  pythonGH-90744: Fix erroneous doc links in the sys module (python#101319)
  pythongh-87092: Make jump target label equal to the offset of the target in the instructions sequence (python#102093)
  pythongh-101101: Unstable C API tier (PEP 689) (pythonGH-101102)
  IDLE: Simplify DynOptionsMenu __init__code (python#101371)
  pythongh-101561: Add typing.override decorator (python#101564)
  pythongh-101825: Clarify that as_integer_ratio() output is always normalized (python#101843)
  pythongh-101773: Optimize creation of Fractions in private methods (python#101780)
  pythongh-102251: Updates to test_imp Toward Fixing Some Refleaks (pythongh-102254)
  pythongh-102296 Document that inspect.Parameter kinds support ordering (pythonGH-102297)
  pythongh-102250: Fix double-decref in COMPARE_AND_BRANCH error case (pythonGH-102287)
  pythongh-101100: Fix sphinx warnings in `types` module (python#102274)
  pythongh-91038: Change default argument value to `False` instead of `0` (python#31621)
  pythongh-101765: unicodeobject: use Py_XDECREF correctly (python#102283)
  [doc] Improve grammar/fix missing word (pythonGH-102060)
  ...
* main: (37 commits)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives in sub interpreters module (python#102472)
  pythongh-95672: Fix versionadded indentation of get_pagesize in test.rst (pythongh-102455)
  pythongh-102416: Do not memoize incorrectly loop rules in the parser (python#102467)
  pythonGH-101362: Optimise PurePath(PurePath(...)) (pythonGH-101667)
  pythonGH-101362: Check pathlib.Path flavour compatibility at import time (pythonGH-101664)
  pythonGH-101362: Call join() only when >1 argument supplied to pathlib.PurePath() (python#101665)
  pythongh-102444: Fix minor bugs in `test_typing` highlighted by pyflakes (python#102445)
  pythonGH-102341: Improve the test function for pow (python#102342)
  Fix unused classes in a typing test (pythonGH-102437)
  pythongh-101979: argparse: fix a bug where parentheses in metavar argument of add_argument() were dropped (python#102318)
  pythongh-102356: Add thrashcan macros to filter object dealloc (python#102426)
  Move around example in to_bytes() to avoid confusion (python#101595)
  pythonGH-97546: fix flaky asyncio `test_wait_for_race_condition` test (python#102421)
  pythongh-96821: Add config option `--with-strict-overflow` (python#96823)
  pythongh-101992: update pstlib module documentation (python#102133)
  pythongh-63301: Set exit code when tabnanny CLI exits on error (python#7699)
  pythongh-101863: Fix wrong comments in EUC-KR codec (pythongh-102417)
  pythongh-102302 Micro-optimize `inspect.Parameter.__hash__` (python#102303)
  pythongh-102179: Fix `os.dup2` error reporting for negative fds (python#102180)
  pythongh-101892: Fix `SystemError` when a callable iterator call exhausts the iterator (python#101896)
  ...
@carljm
Copy link
Contributor Author

carljm commented Mar 7, 2023

Latest commit also inlines comprehensions in module and class scopes; thanks @markshannon for the suggestion.

I'll come back to this tomorrow and add some more tests to increase confidence that all the edge cases are handled, and update PEP 709 accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting review performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inline dict/list/set comprehensions in the compiler for better performance
4 participants