gh-95778: Add pre-check for int-to-str conversion #96537

mdickinson · 2022-09-03T16:13:31Z

On current main, converting a large enough int to a decimal string raises ValueError as expected. However, the raise comes after the quadratic-time base-conversion algorithm has run to completion. For effective DoS prevention, we need some kind of check before entering the quadratic-time loop.

This PR gives a proof-of-concept quick fix: essentially we catch most values that exceed the threshold up front. Those that slip through will still be on the small side, and will get caught by the existing check.

For the record, here's the justification for the current check. The C code check is:

max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10

In math-speak, writing $M$ for max_str_digits, $L$ for PyLong_SHIFT and $s$ for size_a, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

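From this it follows, since $x < \lfloor x \rfloor + 1$ and $\lfloor y \rfloor \le y$ for all real $x$ and $y$, that
$$\frac{M}{3L} < \left\lfloor\frac{M}{3L}\right\rfloor + 1 \le \left\lfloor\frac{s - 11}{10}\right\rfloor + 1 \le \frac{s - 11}{10} + 1 = \frac{s-1}{10}$$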
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything below the intended limit in the check.
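The argument above can also be spot-checked numerically. The following is a rough sketch, not part of the PR: the function names are hypothetical, and it assumes `size_a >= 11` so that Python's floor division agrees with C integer division. It finds the smallest digit count at which the pre-check fires and confirms that $2^{L(s-1)} > 10^M$ there.

```python
def precheck_fires(size_a: int, max_str_digits: int, shift: int) -> bool:
    """Python mirror of the C pre-check (valid for size_a >= 11)."""
    return max_str_digits // (3 * shift) <= (size_a - 11) // 10


def smallest_flagged_size(max_str_digits: int, shift: int) -> int:
    """Smallest digit count size_a (>= 11) for which the pre-check fires."""
    size_a = 11
    while not precheck_fires(size_a, max_str_digits, shift):
        size_a += 1
    return size_a


def bound_holds(max_str_digits: int, shift: int) -> bool:
    """Check 2**(L*(s-1)) > 10**M at the smallest flagged size s."""
    s = smallest_flagged_size(max_str_digits, shift)
    return 2 ** (shift * (s - 1)) > 10 ** max_str_digits


# The right-hand side of the pre-check is non-decreasing in size_a, so
# checking the smallest flagged size also covers every larger size.
assert all(bound_holds(m, shift)
           for shift in (15, 30)
           for m in range(1, 5000, 37))
print(smallest_flagged_size(4300, 30))  # 481
```

This is only a sampled check over a few thousand limits and the two usual `PyLong_SHIFT` values; the inequality chain above is what actually establishes the result.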

~~I don't think this is ready to merge as-is - there are some details to figure out, and I'll add line-by-line comments for those.~~

Issue: CVE-2020-10735: Prevent DoS by large int<->str conversions #95778

mdickinson · 2022-09-03T17:00:38Z

To give an idea of how much the crude check misses: assuming 30 bits per digit and int_max_str_digits value at its default of 4300, the smallest integer for which the pre-check kicks in is 2**(30 * 480), which is about 6.79e4334. So values between 1e4300 and 6.79e4334 will end up getting converted and then meeting the post-loop check and raising ValueError.

EDIT: Whoops, sorry; the bound is 2**(30 * 480), not 2**(30 * 481). Edited to fix.
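The figures in this comment can be reproduced with a short calculation. This is a sketch under the same assumptions as the comment (30 bits per digit, limit of 4300); the variable names are illustrative. It uses logarithms rather than `str()`, since converting the threshold itself to a string would exceed the default limit.

```python
import math

PYLONG_SHIFT = 30        # bits per internal digit, as assumed in the comment
MAX_STR_DIGITS = 4300    # default int_max_str_digits

# Smallest size_a satisfying M // (3L) <= (size_a - 11) // 10.
quotient = MAX_STR_DIGITS // (3 * PYLONG_SHIFT)
smallest_size = 10 * quotient + 11

# Smallest integer with that many internal digits is 2**(L * (size_a - 1)).
threshold_bits = PYLONG_SHIFT * (smallest_size - 1)

# Express 2**threshold_bits as mantissa * 10**exponent via log10.
log10_threshold = threshold_bits * math.log10(2)
exponent = math.floor(log10_threshold)
mantissa = 10 ** (log10_threshold - exponent)

print(smallest_size, threshold_bits)    # 481 14400
print(f"{mantissa:.2f}e{exponent}")     # 6.79e4334
```

So the pre-check first fires at 481 internal digits, i.e. at `2**(30 * 480)`, and anything over the limit but below that value is left to the exact post-conversion check.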

bedevere-bot · 2022-09-04T09:55:18Z

🤖 New build scheduled with the buildbot fleet by @gpshead for commit 4dae3e0 🤖

If you want to schedule another build, you need to add the "🔨 test-with-buildbots" label again.

bedevere-bot · 2022-09-04T10:00:47Z

🤖 New build scheduled with the buildbot fleet by @gpshead for commit adb1784 🤖

If you want to schedule another build, you need to add the "🔨 test-with-buildbots" label again.

miss-islington · 2022-09-04T16:21:21Z

Thanks @mdickinson for the PR, and @gpshead for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11.
🐍🍒⛏🤖

…GH-96537)

Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DoS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:

```c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10
```

In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

* Issue: pythongh-95778

Co-authored-by: Gregory P. Smith [Google LLC]
(cherry picked from commit b126196)
Co-authored-by: Mark Dickinson

bedevere-bot · 2022-09-04T16:21:31Z

GH-96562 is a backport of this pull request to the 3.11 branch.

miss-islington · 2022-09-04T16:21:32Z

Sorry, @mdickinson and @gpshead, I could not cleanly backport this to 3.10 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker b126196838bbaf5f4d35120e0e6bcde435b0b480 3.10

bedevere-bot · 2022-09-04T16:32:33Z

GH-96563 is a backport of this pull request to the 3.10 branch.

* Correctly pre-check for int-to-str conversion (#96537)

Co-authored-by: Gregory P. Smith [Google LLC]
Co-authored-by: Christian Heimes
Co-authored-by: Mark Dickinson

Add pre-check for int-to-str conversion

2fbc9ce

bedevere-bot added the awaiting core review label Sep 3, 2022

mdickinson reviewed Sep 3, 2022

Objects/longobject.c (outdated)

mdickinson reviewed Sep 3, 2022

Objects/longobject.c (outdated)

Avoid potential undefined behaviour from overflow

d16db91

mdickinson reviewed Sep 3, 2022

Objects/longobject.c (outdated)

Reworked, even cruder bound, that avoids potential issues with integer overflow

66a07ac

mdickinson added the skip news label Sep 3, 2022

mdickinson requested review from gpshead and tiran Sep 3, 2022

mdickinson added the needs backport to 3.7, needs backport to 3.8, needs backport to 3.9, needs backport to 3.10, and needs backport to 3.11 labels Sep 3, 2022

gpshead self-assigned this Sep 4, 2022

gpshead added the type-bug, type-security, and release-blocker labels Sep 4, 2022

gpshead added 2 commits Sep 4, 2022

Rename the error message constants.

6eadbda

Add a DoS prevention success timed regression test.

195686c

gpshead added 3 commits Sep 4, 2022

Improve the test to check close to the limit.

87bd23d

Use fewer digits in the test to speed it up on slow hosts.

de9ed4d

A RPi4 takes 10 seconds for the int_to_str test now.

Misc: Fix a typo in the header comment.

dbd8da9

gpshead removed the needs backport to 3.8 and needs backport to 3.9 labels Sep 4, 2022

gpshead added 2 commits Sep 4, 2022

Minor comment typo fix (restart CI).

56f08c2

cleanup the test.

4dae3e0

skip rather than fail if we find unexpectedly high performance.

gpshead added the 🔨 test-with-buildbots label Sep 4, 2022

bedevere-bot removed the 🔨 test-with-buildbots label Sep 4, 2022

gpshead approved these changes Sep 4, 2022

bedevere-bot added awaiting merge and removed awaiting core review labels Sep 4, 2022

Add Mark's name to the NEWS entry.

adb1784

gpshead added the 🔨 test-with-buildbots label Sep 4, 2022

bedevere-bot removed the 🔨 test-with-buildbots label Sep 4, 2022

gpshead merged commit b126196 into python:main Sep 4, 2022
81 checks passed

bedevere-bot removed the awaiting merge label Sep 4, 2022

bedevere-bot removed the needs backport to 3.11 label Sep 4, 2022

bedevere-bot removed the needs backport to 3.10 label Sep 4, 2022
