New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-97933: inline list/dict/set comprehensions in functions #101441
base: main
Are you sure you want to change the base?
Conversation
@markshannon @iritkatriel would be grateful for a pyperformance run on this one (presuming signal all looks good.) |
I'll run the benchmarks now |
Benchmarking is delayed a bit, but should be back tomorrow at the latest. |
You don't seem to be clearing the inner variable before use, so that def f():
l = [ None ]
[ (l[0], l) in [[1,2]] ]
print(l) will, I think, print You can actually make the code more efficient by clearing the local when stashing it on the stack: inst(LOAD_FAST_AND_CLEAR, ( -- val)) {
val = LOCALS[oparg];
LOCALS[oparg] = NULL;
} as there is no need for a |
Python/compile.c
Outdated
// could be cleverer about minimizing swaps but it doesn't matter | ||
// because `apply_static_swaps` will eliminate all of these anyway | ||
ADDOP_I(c, loc, SWAP, 2); | ||
ADDOP_NAME(c, loc, STORE_FAST, k, varnames); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could be storing NULL
here. That is OK for the interpreter, but the dataflow analysis in add_checks_for_loads_of_uninitialized_variables
in the compiler assumes that STORE_FAST
leaves the variable initialized.
What you'll need to do is add a pseudo instruction, and convert that to STORE_FAST
in the assembler. That way the dataflow analysis won't be fooled.
You might want to update add_checks_for_loads_of_uninitialized_variables
to understand the LOAD_FAST_AND_CLEAR
, STORE_FAST_MAYBE_NULL
pairs, but that should wait for another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a test case:
def f():
if False:
x = 0
[x for _ in [1]]
return x
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, thanks!
The test case needs to use [x for x in [1]]
, not [x for _ in [1]]
-- in the latter case, x
is not bound in the comprehension and is just a free var (converted to local by inlining) so we don't push/pop it at all.
A couple of obscure issues, but this looks like a very nice improvement. I think the gain in performance is well worth the cost in code size, given how common comprehensions are. |
* main: pythongh-101440: fix json snippet error in logging-cookbook.rst (python#101439) pythongh-99276 - Updated Doc/faq/general.rst (python#101396) Add JOBS parameter to docs Makefile (python#101395) pythongh-98831: rewrite GET_LEN, GET_ITER, BEFORE_WITH and a few simple opcodes in the instruction definition DSL (python#101443) pythongh-77607: Improve accuracy of os.path.join docs (python#101406) Fixes typo in asyncio.TaskGroup context manager code example (python#101449) pythongh-98831: Clean up and add cache size static_assert to macro (python#101442) pythongh-99955: use SUCCESS/ERROR return values in optimizer and assembler. Use RETURN_IF_ERROR where appropriate. Fix a couple of bugs. (python#101412)
Your example code doesn't involve a comprehension, I'm assuming you meant something like
But this doesn't raise So far I haven't been able to construct a test case where failing to clear an incoming variable bound inside the comprehension has any visible effect, because AFAICT any locals in a comprehension are always (necessarily by the syntactic limitations of how a comprehension is built) defined before use. Let me know if you can think of a test case I'm missing!
But I think this is sufficient reason regardless to justify switching from |
Actually I don't think
We need to save/restore the outer value of If there were a correctness reason for needing to clear in other cases, I could add an additional EDIT: there is also the option of doing clearing pushes and also moving evaluation of the outermost iterable to before the pushes; this would require some non-elidable SWAPs to preserve it as TOS through the pushes, though... |
@markshannon I think |
I've addressed the review comments. I have some TODOs for potential optimizations that could be done here or in a follow-up diff:
|
* main: pythonGH-100288: Skip extra work when failing to specialize LOAD_ATTR (pythonGH-101354) pythongh-101409: Improve generated clinic code for self type checks (python#101411) pythongh-98831: rewrite BEFORE_ASYNC_WITH and END_ASYNC_FOR in the instruction definition DSL (python#101458) pythongh-101469: Optimise get_io_state() by using _PyModule_GetState() (pythonGH-101470)
Sorry for posting a gibberish example. def f():
l = [ None ]
[ 1 for (l[0], l) in [[1,2]] ]
print(l)
f() 3.11
This PR
|
* main: pythongh-98831: rewrite PUSH_EXC_INFO and conditional jumps in the instruction definition DSL (python#101481) pythongh-98831: Modernize the LOAD_ATTR family (python#101488) pythongh-101498 : Fix asyncio.Timeout example in docs (python#101499) pythongh-101454: fix documentation for END_ASYNC_FOR (python#101455) pythongh-101277: Isolate itertools, add group and _grouper types to module state (python#101302) pythongh-101317: Add `ssl_shutdown_timeout` parameter for `asyncio.StreamWriter.start_tls` (python#101335) datetime.rst: fix combine() signature (python#101490)
* main: (82 commits) pythongh-101670: typo fix in PyImport_ExtendInittab() (python#101723) pythonGH-99293: Document that `Py_TPFLAGS_VALID_VERSION_TAG` shouldn't be used. (#pythonGH-101736) no-issue: Add Dong-hee Na as the cjkcodecs codeowner (pythongh-101731) pythongh-101678: Merge math_1_to_whatever() and math_1() (python#101730) pythongh-101678: refactor the math module to use special functions from c11 (pythonGH-101679) pythongh-85984: Remove legacy Lib/pty.py code. (python#92365) pythongh-98831: Use opcode metadata for stack_effect() (python#101704) pythongh-101283: Version was just released, so should be changed in 3.11.3 (pythonGH-101719) pythongh-101283: Fix use of unbound variable (pythonGH-101712) pythongh-101283: Improved fallback logic for subprocess with shell=True on Windows (pythonGH-101286) pythongh-101277: Port more itertools static types to heap types (python#101304) pythongh-98831: Modernize CALL and family (python#101508) pythonGH-101696: invalidate type version tag in `_PyStaticType_Dealloc` (python#101697) pythongh-100221: Fix creating dirs in `make sharedinstall` (pythonGH-100329) pythongh-101670: typo fix in PyImport_AppendInittab() (pythonGH-101672) pythongh-101196: Make isdir/isfile/exists faster on Windows (pythonGH-101324) pythongh-101614: Don't treat python3_d.dll as a Python DLL when checking extension modules for incompatibility (pythonGH-101615) pythongh-100933: Improve `check_element` helper in `test_xml_etree` (python#100934) pythonGH-101578: Normalize the current exception (pythonGH-101607) pythongh-47937: Note that Popen attributes are read-only (python#93070) ...
* main: Fix some typos in asdl_c.py (pythonGH-101757) pythongh-101747: Fix refleak in new `OrderedDict` repr (pythonGH-101748) pythongh-101430: Update tracemalloc to handle presize properly. (pythongh-101745) pythonGH-101228: Fix typo in docstring for read method of `_io.TextIOWrapper` class (python#101227) Fix typo in `test_fstring.py` (python#101600) pythongh-101726: Update the OpenSSL version to 1.1.1t (pythonGH-101727) pythongh-101283: Fix 'versionchanged' for the shell=True fallback on Windows in 3.12 (pythonGH-101728) LibFFI build requires x64 Cygwin, and skip the ARM build (pythonGH-101743)
@markshannon Ok, I think I've addressed all review comments, and CI looks good. Ready for review and pyperformance run. |
Given the current usage of STORE_FAST_MAYBE_NULL, it isn't possible to construct a repro where this is needed, because if the local was NULL coming into the comprehension it will already be marked as unsafe coming out of the comprehension (since the body of the comprehension is always a loop which may be skipped entirely). But in case this opcode is used in other contexts in future, it should mark the local unsafe.
* main: Docs: Fix getstatus() -> getcode() typos (python#101296) Docs: use parameter list for sqlite3.Cursor.execute* (python#101782) pythongh-101763: Update bundled copy of libffi to 3.4.4 on Windows (pythonGH-101784) pythongh-101517: make bdb avoid looking up in linecache with lineno=None (python#101787) pythongh-101759: Update Windows installer to SQLite 3.40.1 (python#101762) pythongh-101277: Finalise isolating itertools (pythonGH-101305) pythongh-101759: Update macOS installer to SQLite 3.40.1 (python#101761)
* main: pythongh-101810: Remove duplicated st_ino calculation (pythonGH-101811) pythongh-92547: Purge sqlite3_enable_shared_cache() detection from configure (python#101873) pythonGH-100987: Refactor `_PyInterpreterFrame` a bit, to assist generator improvement. (pythonGH-100988) pythonGH-87849: Simplify stack effect of SEND and specialize it for generators and coroutines. (pythonGH-101788) Correct trivial grammar in reset_mock docs (python#101861) pythongh-101845: pyspecific: Fix i18n for availability directive (pythonGH-101846) pythongh-89792: Limit test_tools freeze test build parallelism based on the number of cores (python#101841) pythongh-85984: Utilize new "winsize" functions from termios in pty tests. (python#101831) pythongh-89792: Prevent test_tools from copying 1000M of "source" in freeze test (python#101837) Fix typo in test_fstring.py (python#101823) pythonGH-101797: allocate `PyExpat_CAPI` capsule on heap (python#101798) pythongh-101390: Fix docs for `imporlib.util.LazyLoader.factory` to properly call it a class method (pythonGH-101391)
* main: (60 commits) pythongh-102056: Fix a few bugs in error handling of exception printing code (python#102078) pythongh-102011: use sys.exception() instead of sys.exc_info() in docs where possible (python#102012) pythongh-101566: Sync with zipp 3.14. (pythonGH-102018) pythonGH-99818: improve the documentation for zipfile.Path and Traversable (pythonGH-101589) pythongh-88233: zipfile: handle extras after a zip64 extra (pythonGH-96161) pythongh-101981: Apply HOMEBREW related environment variables (pythongh-102074) pythongh-101907: Stop using `_Py_OPCODE` and `_Py_OPARG` macros (pythonGH-101912) pythongh-101819: Adapt _io types to heap types, batch 1 (pythonGH-101949) pythongh-101981: Build macOS as recommended by the devguide (pythonGH-102070) pythongh-97786: Fix compiler warnings in pytime.c (python#101826) pythongh-101578: Amend PyErr_{Set,Get}RaisedException docs (python#101962) Misc improvements to the float tutorial (pythonGH-102052) pythongh-85417: Clarify behaviour on branch cuts in cmath module (python#102046) pythongh-100425: Update tutorial docs related to sum() accuracy (FH-101854) Add missing 'is' to `cmath.log()` docstring (python#102049) pythongh-100210: Correct the comment link for unescaping HTML (python#100212) pythongh-97930: Also include subdirectory in makefile. (python#102030) pythongh-99735: Use required=True in argparse subparsers example (python#100927) Fix incorrectly documented attribute in csv docs (python#101250) pythonGH-84783: Make the slice object hashable (pythonGH-101264) ...
This seems to result in some surprising behaviour in the presence of nested functions. Eg:
This seems to result in the list comprehension overwriting the closed over cell var. Whereas in current python this will print x=1 in all cases, with this I get:
It works if x isn't accessed in the inner function, but if it creates a closure, the STORE_FAST in the listcomp overwrites the parent x's value, rather than a local. |
@brianm78 Thanks! This is fixed in the latest commit. |
* main: (67 commits) pythongh-99108: Add missing md5/sha1 defines to Modules/Setup (python#102308) pythongh-100227: Move _str_replace_inf to PyInterpreterState (pythongh-102333) pythongh-100227: Move the dtoa State to PyInterpreterState (pythongh-102331) pythonGH-102305: Expand some macros in generated_cases.c.h (python#102309) Migrate to new PSF mailgun account (python#102284) pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (in Python/) (python#102193) pythonGH-90744: Fix erroneous doc links in the sys module (python#101319) pythongh-87092: Make jump target label equal to the offset of the target in the instructions sequence (python#102093) pythongh-101101: Unstable C API tier (PEP 689) (pythonGH-101102) IDLE: Simplify DynOptionsMenu __init__code (python#101371) pythongh-101561: Add typing.override decorator (python#101564) pythongh-101825: Clarify that as_integer_ratio() output is always normalized (python#101843) pythongh-101773: Optimize creation of Fractions in private methods (python#101780) pythongh-102251: Updates to test_imp Toward Fixing Some Refleaks (pythongh-102254) pythongh-102296 Document that inspect.Parameter kinds support ordering (pythonGH-102297) pythongh-102250: Fix double-decref in COMPARE_AND_BRANCH error case (pythonGH-102287) pythongh-101100: Fix sphinx warnings in `types` module (python#102274) pythongh-91038: Change default argument value to `False` instead of `0` (python#31621) pythongh-101765: unicodeobject: use Py_XDECREF correctly (python#102283) [doc] Improve grammar/fix missing word (pythonGH-102060) ...
* main: (37 commits) pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives in sub interpreters module (python#102472) pythongh-95672: Fix versionadded indentation of get_pagesize in test.rst (pythongh-102455) pythongh-102416: Do not memoize incorrectly loop rules in the parser (python#102467) pythonGH-101362: Optimise PurePath(PurePath(...)) (pythonGH-101667) pythonGH-101362: Check pathlib.Path flavour compatibility at import time (pythonGH-101664) pythonGH-101362: Call join() only when >1 argument supplied to pathlib.PurePath() (python#101665) pythongh-102444: Fix minor bugs in `test_typing` highlighted by pyflakes (python#102445) pythonGH-102341: Improve the test function for pow (python#102342) Fix unused classes in a typing test (pythonGH-102437) pythongh-101979: argparse: fix a bug where parentheses in metavar argument of add_argument() were dropped (python#102318) pythongh-102356: Add thrashcan macros to filter object dealloc (python#102426) Move around example in to_bytes() to avoid confusion (python#101595) pythonGH-97546: fix flaky asyncio `test_wait_for_race_condition` test (python#102421) pythongh-96821: Add config option `--with-strict-overflow` (python#96823) pythongh-101992: update pstlib module documentation (python#102133) pythongh-63301: Set exit code when tabnanny CLI exits on error (python#7699) pythongh-101863: Fix wrong comments in EUC-KR codec (pythongh-102417) pythongh-102302 Micro-optimize `inspect.Parameter.__hash__` (python#102303) pythongh-102179: Fix `os.dup2` error reporting for negative fds (python#102180) pythongh-101892: Fix `SystemError` when a callable iterator call exhausts the iterator (python#101896) ...
Latest commit also inlines comprehensions in module and class scopes; thanks @markshannon for the suggestion. I'll come back to this tomorrow and add some more tests to increase confidence that all the edge cases are handled, and update PEP 709 accordingly. |
Closes #97933.
In builds with
--enable-optimizations
:Credit to @vladima for authoring the original version of this code in Cinder 3.8. The compiler approach is different in this version (so as to support inlining all comprehensions, regardless of name clashes); some of the symbol table changes remain the same.
A couple implementation notes:
LOAD_FAST_AND_CLEAR
because we need to push/pop the value of possibly-undefined variables on the stack while clearing them. To do this with existing opcodes would require eliminating the invariant/assert thatLOAD_FAST
never loadsNULL
, and would add inefficient refcount operations and interpreter loop cycles.co_varnames
andco_cellvars
for the inlined-into outer function and using the non-offset (varnames) storage location for it. This is how function arguments that are also cells are currently handled, so we just piggyback on that handling. This makes the needed push/pop of the outer cell simpler as we can just useLOAD_FAST_AND_CLEAR/STORE_FAST
as-is for it, and it will push/pop the cell itself. Otherwise we would need some new handling in the compiler for allowing push/pop of a cell in offset storage.