The Wayback Machine - https://web.archive.org/web/20220203064415/https://github.com/python/cpython/pull/28007
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-45021: Fix a hang in forked children #28007

Merged
merged 4 commits into from Sep 20, 2021
Merged

bpo-45021: Fix a hang in forked children #28007

merged 4 commits into from Sep 20, 2021

Conversation

@0x0L
Copy link
Contributor

@0x0L 0x0L commented Aug 27, 2021

_global_shutdown_lock should be reinitialized in forked children

https://bugs.python.org/issue45021

_global_shutdown_lock should be reinitialized in forked children
@the-knights-who-say-ni
Copy link

@the-knights-who-say-ni the-knights-who-say-ni commented Aug 27, 2021

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

@0x0L

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@pitrou
Copy link
Member

@pitrou pitrou commented Aug 31, 2021

It would be nice to add a simple test for this.

@pitrou pitrou requested a review from gpshead Aug 31, 2021
@0x0L
Copy link
Contributor Author

@0x0L 0x0L commented Aug 31, 2021

I'm not too sure how to test for a freeze: should I just put a minimal reproducer somewhere and let it freeze if run on unpatched version ?

@pitrou
Copy link
Member

@pitrou pitrou commented Aug 31, 2021

Yes, that would work.

@corona10 corona10 requested a review from vstinner Sep 11, 2021
Lib/test/test_concurrent_futures.py Outdated Show resolved Hide resolved
Lib/test/test_concurrent_futures.py Outdated Show resolved Hide resolved
Lib/concurrent/futures/thread.py Outdated Show resolved Hide resolved
@bedevere-bot
Copy link

@bedevere-bot bedevere-bot commented Sep 15, 2021

🤖 New build scheduled with the buildbot fleet by @ambv for commit 02891f9 🤖

If you want to schedule another build, you need to add the "🔨 test-with-buildbots" label again.

@0x0L
Copy link
Contributor Author

@0x0L 0x0L commented Sep 18, 2021

Build failures look unrelated to this patch

@vstinner
Copy link
Member

@vstinner vstinner commented Sep 20, 2021

buildbot/AMD64 Arch Linux Asan Debug PR

Unrelated and already fixed by: python/buildmaster-config@576b13d

buildbot/x86-64 macOS PR

Unrelated and known issue: https://bugs.python.org/issue43592

Copy link
Member

@vstinner vstinner left a comment

LGTM. @pitrou: Would you mind to double check the fix?

@gpshead
Copy link
Member

@gpshead gpshead commented Sep 20, 2021

because this test is mixing fork and threads which is a posix-nono it will probably run into hanging problems of its own on some platforms or build setups. If/When that happens, we can conditionally enable it only for a a known system and build configs it happens to not hang on. merging anyways, we'll untangle that when it happens.

@gpshead gpshead merged commit 0bfa110 into python:main Sep 20, 2021
74 of 76 checks passed
@miss-islington
Copy link
Contributor

@miss-islington miss-islington commented Sep 20, 2021

Thanks @0x0L for the PR, and @gpshead for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9, 3.10.
🐍🍒🤖

@bedevere-bot
Copy link

@bedevere-bot bedevere-bot commented Sep 20, 2021

GH-28480 is a backport of this pull request to the 3.10 branch.

@bedevere-bot
Copy link

@bedevere-bot bedevere-bot commented Sep 20, 2021

GH-28481 is a backport of this pull request to the 3.9 branch.

miss-islington added a commit to miss-islington/cpython that referenced this issue Sep 20, 2021
_global_shutdown_lock should be reinitialized in forked children
(cherry picked from commit 0bfa110)

Co-authored-by: nullptr <[email protected]>
miss-islington added a commit to miss-islington/cpython that referenced this issue Sep 20, 2021
_global_shutdown_lock should be reinitialized in forked children
(cherry picked from commit 0bfa110)

Co-authored-by: nullptr <[email protected]>
@vstinner
Copy link
Member

@vstinner vstinner commented Sep 20, 2021

I'm happy to see my Lock._at_fork_reinit() method being used :-) https://bugs.python.org/issue40089

@vstinner
Copy link
Member

@vstinner vstinner commented Sep 20, 2021

Maybe at some point, we should consider to make the method public?

miss-islington added a commit that referenced this issue Sep 20, 2021
_global_shutdown_lock should be reinitialized in forked children
(cherry picked from commit 0bfa110)


Co-authored-by: nullptr <[email protected]>

Automerge-Triggered-By: GH:gpshead
miss-islington added a commit that referenced this issue Sep 20, 2021
_global_shutdown_lock should be reinitialized in forked children
(cherry picked from commit 0bfa110)


Co-authored-by: nullptr <[email protected]>

Automerge-Triggered-By: GH:gpshead
nsait-linaro added a commit to nsait-linaro/cpython that referenced this issue Sep 21, 2021
_global_shutdown_lock should be reinitialized in forked children
@0x0L
Copy link
Contributor Author

@0x0L 0x0L commented Sep 28, 2021

@vstinner @gpshead
It looks like there's an issue with this patch. Running the following in on an unpatched python

import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.thread import _global_shutdown_lock

if hasattr(os, 'register_at_fork'):
    os.register_at_fork(before=_global_shutdown_lock.acquire,
                        after_in_child=_global_shutdown_lock._at_fork_reinit,
                        ​after_in_parent=_global_shutdown_lock.release)

with ProcessPoolExecutor() as pool:
    r = list(pool.map(str.upper, ['a']))

fails in some env.... Oddly I have two conda env with the exact same hash for the python package and this example hangs up in one env and not the other.

Calling os.register_at_fork only with the after_in_child looks ok and actually seems more appropriate.
0x0L@dc2d0c9 passes the test

Any thoughts ? Should I open a new issue or can we reopen this one ?

@gpshead
Copy link
Member

@gpshead gpshead commented Sep 28, 2021

I cannot reproduce any hang with that snippet on such a python revision.

While not technically necessary for this issue bug "fix" (quotes because all use of threading+fork in one process is a bad idea), It still seems right to grab the lock across the fork, otherwise the forked child could be in a strange state w.r.t. the locks protected resources.

@0x0L
Copy link
Contributor Author

@0x0L 0x0L commented Sep 29, 2021

I also struggle to reproduce it. It only happens - consistently though - in a single one of my envs.
I'll try to find some time to attach a gdb and investigate a bit more.

@0x0L
Copy link
Contributor Author

@0x0L 0x0L commented Sep 30, 2021

:) Actually the python env where it didn't work was already patched by hand... Everything's look ok.
My apologies

@vstinner
Copy link
Member

@vstinner vstinner commented Sep 30, 2021

Don't worry, it happens to everybody. Thanks for double checking!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants