The Wayback Machine - https://web.archive.org/web/20240429215548/https://github.com/python/cpython/issues/111140
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C API: Consider adding public PyLong_AsByteArray() and PyLong_FromByteArray() functions #111140

Open
vstinner opened this issue Oct 20, 2023 · 59 comments
Labels
topic-C-API type-feature A feature request or enhancement

Comments

@vstinner
Copy link
Member

vstinner commented Oct 20, 2023

Feature or enhancement

The private _PyLong_AsByteArray() and _PyLong_FromByteArray() functions were removed in Python 3.13: see PR #108429.

@scoder asked what is the intended replacement for _PyLong_FromByteArray().

The replacement for _PyLong_FromByteArray() is PyObject_CallMethod((PyObject*)&PyList_Type, "from_bytes", "s#s", str, len, "big") but I'm not sure what is the easy way to set the signed parameter to True (default: signed=False).

The replacement for _PyLong_AsByteArray() is PyObject_CallMethod(my_int, "to_bytes", "ns", length, "big"). Same, I'm not sure how to easy set the signed parameter to True (default: signed=False).

I propose to add public PyLong_AsByteArray() and PyLong_FromByteArray() functions to the C API.

Python 3.12 modified PyLongObject: it's no longer a simple array of digits, but it's now a more less straightforward _PyLongValue structure which requires using unstable functions to access small "compact" values:

  • PyUnstable_Long_IsCompact()
  • PyUnstable_Long_CompactValue()

So having a reliable and simple way to import/export a Python int object as bytes became even more important.


A code search for _PyLong_AsByteArray in PyPI top 5,000 projects found 12 projects using it:

  • Cython (0.29.36)
  • blspy (2.0.2)
  • catboost (1.2)
  • fastobo (0.12.2)
  • gevent (22.10.2)
  • guppy3 (3.1.3)
  • line_profiler (4.0.3)
  • msgspec (0.16.0)
  • orjson (3.9.1)
  • pickle5 (0.0.12)
  • pyodbc (4.0.39)
  • rlp (3.0.0)

Linked PRs

@vstinner vstinner added type-feature A feature request or enhancement topic-C-API labels Oct 20, 2023
@vstinner vstinner changed the title C API: Consider adding a public PyLong_AsByteArray() and PyLong_FromByteArray() functions C API: Consider adding public PyLong_AsByteArray() and PyLong_FromByteArray() functions Oct 20, 2023
@scoder
Copy link
Contributor

scoder commented Oct 21, 2023

Thanks for creating the issue. I agree that the functions should be added. The current replacements seem awful for this kind of basic functionality. Going through an expensive Python call like this for converting between PyLong and large C integer types (int128_t) seems excessive.

Note that at least a couple of projects that you list use Cython implemented parts and thus probably just mention the function in there. I'm sure something like line_profiler would never end up calling it.

@serhiy-storchaka
Copy link
Member

It was already discussed several times. This API lacks some features which would be needed for general use. You need to know the size of the resulting bytes array. Calculating it is not trivial, especially for negative integers. Also, it would be core convenient to support "native" ending, not just "big"/"littel".

I have been thinking about implementing a similar API for mpz_import/mpz_export in GMP (https://gmplib.org/manual/Integer-Import-and-Export) or mp_unpack/mp_unpack in libtommath (https://github.com/libtom/libtommath). It is powerful and a kind of standard. It is general enough to support internal PyLong represenation and the marshal format used for long integers (15 bits packed in 16-bit words). But it is only for unsigned integers, you are supposed to store the sign separately. It should be extended to support negative integers in several formats, for convenience and for performance.

@vstinner
Copy link
Member Author

I'm not sure that passing the endian as a string is efficient if this function is part of hot code.

@serhiy-storchaka
Copy link
Member

Not as a string. Just 3-variant value native/little/big instead of boolean little/big.

@vstinner
Copy link
Member Author

Not as a string. Just 3-variant value native/little/big instead of boolean little/big.

Sorry, I was confused between C API (int for the endian) and the Python API (string for the endian).

@scoder
Copy link
Contributor

scoder commented Oct 21, 2023

It was already discussed several times. This API lacks some features which would be needed for general use.

Ok, then please put the existing function back in until there is a public replacement.

@scoder
Copy link
Contributor

scoder commented Oct 21, 2023

I created PR #111162

@vstinner
Copy link
Member Author

This API lacks some features which would be needed for general use.

Which features are missing?

Do you have links to previous discussions if it was discussed previously?

@vstinner
Copy link
Member Author

You need to know the size of the resulting bytes array.

wcstombs() can be called with NULL buffer to compute the buffer size. It avoids to have to provide a second API "calculate the buffet size".

I suppose that a common use case is also to convert a Python int object to C int type for which there is no C API. Like int128_t

@scoder
Copy link
Contributor

scoder commented Oct 23, 2023 via email

@encukou
Copy link
Member

encukou commented Oct 23, 2023

I'm partial to API like mpz_export that accepts a buffer and length, and can:

  • Successfully fill the buffer, and output the length
  • Not fill the buffer, but output the necessary length

This allows handling the common cases with a minimum of function calls:

  • If the size is already known, you only need one function call
  • If the caller can pre-allocate a buffer, and the value happens to fit, this is again one function call
  • If the size is unknown, and a pre-allocated buffer can't be used or turns out to be too small, it's two function calls

But, IMO we also need general API for exporting/importing Python objects to/from buffers (see e.g. #15 in capi-workgroup/problems), and it would be good to make this consistent.

I'd prefer adding the original functions back until we design a proper replacement.

@scoder
Copy link
Contributor

scoder commented Oct 24, 2023

mpz_export() looks like a good blue print. It's based on "word data", though, not bytes. I'm not sure if that design is something important to copy. There are certainly use cases for this (it resembles SIMD-like operations, for example). However, I'm not convinced that reordering bytes in native/little/big-endian words is an important feature for a PyLong C-API. Whoever needs this kind of special functionality can probably implement it in a separate pass on their side. A simple byte array export in (overall) native/little/big ordering seems sufficient to get the data in and out.

Note that it goes together with an mpz_sgn() function to query the sign since that is not part of the export. That's reasonable since the sign is unlikely to become part of the internal PyLong digits representation even in the future. Given that Py_SIZE() cannot be used for the sign detection any more, a stable function and a fast inline function would both be helpful for this.

So, basically, the proposal is to add

  • a function PyLong_AsByteArray() for the unsigned export, modelled after mpz_export
  • a function PyLong_FromByteArray() for the unsigned import, modelled after mpz_import
  • a function PyLong_Sign() to detect the sign as -1, 0, 1
  • an inline function PyUnstable_Long_Sign() to read the sign without checks and ABI compatibility guarantees
  • a function PyLong_BitLength() to count the number of bits used by the PyLong value

IMO we also need general API for exporting/importing Python objects to/from buffers, and it would be good to make this consistent.

It's not strictly related, though. I think a PyLong number is sufficiently different from an arbitrary Python object array to not require a highly similar interface. If it can be made similar, fine. I wouldn't hold up one for the other, though.

Regarding Serhiy's concerns about missing ABI-stability of enum flags and arguments: we've used C macro names for this for ages, and they turn into simple integers that can be passed as C int arguments. Just #define some names and people will use them.

encukou pushed a commit that referenced this issue Oct 25, 2023
* gh-106320: Re-add _PyLong_FromByteArray(), _PyLong_AsByteArray() and _PyLong_GCD() to the public header files since they are used by third-party packages and there is no efficient replacement.

See #111140
See #111139

* gh-111262: Re-add _PyDict_Pop() to have a C-API until a new public one is designed.
@vstinner
Copy link
Member Author

See also comments about removed _PyLong_New(): #108604 (comment)

@vstinner
Copy link
Member Author

_PyLong_FromByteArray(), _PyLong_AsByteArray() and _PyLong_GCD() functions were restored by commit a8a89fc.

@zooba
Copy link
Member

zooba commented Feb 1, 2024

Reopening, because I think at a minimum we should have the two functions mentioned in the title.

My proposed API (I have an implementation, but not PR ready yet, and one open question) is basically the one Petr liked but simpler:

/* PyLong_AsByteArray: Copy the integer value to a native address.
   n is the number of bytes available in the buffer.
   Uses the current build's default endianness, and sign extends the value
   to fit the buffer.

   Returns 0 on success or -1 with an exception set, but if the buffer is
   not big enough, returns the desired buffer size without setting an
   exception. Note that the desired size may be larger than strictly
   necessary to avoid precise calculations. */
PyAPI_FUNC(int) PyLong_AsByteArray(PyObject* v, void* buffer, size_t n);

/* PyLong_FromByteArray: Create an integer value containing the number from
   a native buffer.
   n is the number of bytes to read from the buffer.
   Uses the current build's default endianness, and assumes the value was
   sign extended to 'n' bytes.

   Returns the int object, or NULL with an exception set. */
PyAPI_FUNC(PyObject *v) PyLong_FromByteArray(void* buffer, size_t n);

I'm comfortable making these only do default endianness, because they're really intended as an extension of all the other int conversions we have that also do default endianness. Alternate endianness is a specialised formatting or bit packing operation.

The "designed size may be larger than strictly necessary" is to allow returning sizeof(Py_ssize_t) for a "compact" int, rather than calculating the exact number of bits, and potentially doing the same for a larger int if we come up with a faster calculation.

I envision the use here to be like this (note the EAFP):

int value;
int n = PyLong_AsByteArray(o, &value, sizeof(value));
if (n < 0) goto abort; // exception, nothing useful we can do
if (n == 0) {
    // use value
} else {
    // malloc some memory
    n = PyLong_AsByteArray(o, <new memory>, n);
    // use new memory
    free(new_memory);
}

Similarly for FromByteArray - that should work efficiently against pointers to locals, so you don't have to choose the right API name for the type.

However, the bit I'm wavering on is what to do with unsigned values with the MSB set. Right now, you need to allow an extra byte to "prove" that 2**64-1 is not -1, which is just a pain when you'd rather PyLong_AsByteArray(o, &uint64_value, sizeof(unsigned int64_t)). With C++ templates this wouldn't even come up, but I can't decide on my own between:

  • a second function (easiest when native type is known at compile time, which seems likely)
  • a parameter (easiest when native type is only known at runtime...)
  • always treat the value as signed (most like Python, but harder for a caller to code)
  • always treat as unsigned but add a "negative" out parameter (not sure how callers would deal with that?)

I don't think we have perf concerns at the point where this matters, as we're already at the extremes of a 64-bit integer (for most platforms). That's too big for a "compact" int, so we're on the slow path already. But I do want to get the usability right. I'm leaning towards a PyLong_AsUnsignedByteArray function that basically differs only in the sign extension part.

@zooba zooba reopened this Feb 1, 2024
@encukou
Copy link
Member

encukou commented Feb 1, 2024

I'd prefer exposing both endiannness and signedness as arguments. As I see it, the functions should be intended for serialization too, not just for converting to native ints -- and in that case, it's best to be explicit.

Perhaps we should use named flags, like:

int n = PyLong_AsByteArray(o, &value, sizeof(value), Py_AS_NATIVE_ENDIAN | Py_AS_SIGNED);

@zooba
Copy link
Member

zooba commented Feb 1, 2024

As I see it, the functions should be intended for serialization too

All the scenarios I've seen have just been about converting to native ints (in contexts where serialization may happen next, but has to happen via a native int). Can you/anyone show me some where the caller doesn't want the int value, but just wants to store the bytes? (And doesn't want to/can't use the struct module, which is intended for this case.)

FWIW, non-default endianness is inevitably a slow path. We can make this very fast for normal sized, native endian values, which are the vast majority of cases, but forcing an endianness has to slow things down.

@zooba
Copy link
Member

zooba commented Feb 1, 2024

How about this as a proposed API:

PyAPI_FUNC(int) PyLong_AsByteArray(PyObject* v, void* buffer, size_t n);
PyAPI_FUNC(int) PyLong_AsUnsignedByteArray(PyObject* v, void* buffer, size_t n);
PyAPI_FUNC(int) PyLong_AsByteArrayWithOptions(PyObject* v, void* buffer, size_t n, int flags);

Where the first two are essentially exported aliases that make it easier to read/write code without having to remember/write a set of flags every time?

@zooba
Copy link
Member

zooba commented Feb 26, 2024

Putting -(2**127) into a 128-bit signed value should work fine. The topmost bit will be set and the rest are zero, which is correct. -(2**127 + 1) requires 129 bits or else you lose the sign.

Values in [2**127, 2**128) into an 128-bit unsigned value are the edge case. Note that all of them are positive, so your check needs to be that the number is in this range - not trivial to do in any language, and certainly not as simple as checking the sign.

An additional flag could be used to make the API return 16 rather than 17 in this case. But that's all it would be - it isn't really checking signed/unsigned, it's just not requiring room for a sign bit for large positive integers that would otherwise have lost their 0 sign bit.

(It would probably also imply "negative inputs are treated as positive with infinite leading 1 bits, but we only need room for at least one of them" which is actually the normal definition of two's complement, which is why I didn't bother making a fuss about it in the docs. Adding more details sometimes just makes things more confusing.)

@zooba
Copy link
Member

zooba commented Feb 26, 2024

The most efficient way I can think of to handle this particular case would be to extract into 128-bits, and if the returned value is 17 and the destination is unsigned then do the more complex comparison. Most of the time, the return value will be less than 17 and you don't have to do any extra work at all, and if it's more than 17 you don't have to do the work either.

At that sized value, we're already relying on a very complex function that doesn't give us enough information to know whether it only needs the sign bit or not. Maybe that could be added, but it would be more destabilising than I want to risk myself (and I don't know who the expert would be for it at this point). We'd probably end up doing an internal overallocation ourselves to implement a flag to make it transparent, which is going to be a penalty on every single conversion, not just those that are close to a boundary.

[Update] I meant 16/17 bytes, of course.

zooba added a commit to zooba/cpython that referenced this issue Feb 28, 2024
@zooba
Copy link
Member

zooba commented Feb 28, 2024

Putting -(2**127) into a 128-bit signed value should work fine.

Okay, I see the problem here is that _PyLong_NumBits (which already existed) doesn't handle negative values, and so _PyLong_NumBits(-2**127) == _PyLong_NumBits(2**127) despite this not being strictly true. We can adjust the expected size in this case to handle it.

This probably means that a new argument could be added here for the "don't require a sign bit if it would be 0" case.

@scoder
Copy link
Contributor

scoder commented Feb 28, 2024 via email

@gpshead
Copy link
Member

gpshead commented Feb 28, 2024

I think a PyLong_AsUnsignedNativeBytes addition to compliment PyLong_AsNativeBytes makes the most sense. It offers round-trippable symmetry with the new PyLong_FromNativeBytes and PyLong_FromUnsignedNativeBytes pair.

This way the API reports overflowing values accurately via the returned size instead of forcing API callers to reinvent how to determine that on their own.

@zooba
Copy link
Member

zooba commented Feb 29, 2024

What will PyLong_AsUnsignedNativeBytes do if passed a negative int? Will it still sign extend? That's important behaviour to me, and I'd hate to lose it in favour of an arbitrary exception.

FWIW, the underlying function that does the conversion has an "unsigned" mode that just rejects negative values, but doesn't otherwise change its behaviour. What is needed here is something different from that.

@encukou
Copy link
Member

encukou commented Feb 29, 2024

What will PyLong_AsUnsignedNativeBytes do if passed a negative int?

A negative value cannot be represented as unsigned native bytes, so I think it should reject them: set ValueError and return -1.

@zooba
Copy link
Member

zooba commented Feb 29, 2024

The problem is that conversions already exist that interpret unsigned native bytes as signed, and so you can get negative values that were never meant to be (e.g. many Windows error codes show up on OSError.winerror as negative, when they weren't meant to).

In C, if you cast those negative values to unsigned then it will sign extend. Why shouldn't we do the same in C?

@encukou
Copy link
Member

encukou commented Feb 29, 2024

IMO, that's a C language issue, and one that you can solve by careful typing. Regrettably, C is overly eager to cast between signed and unsigned. Recent compilers have warnings for some of that, but it's not really something C can change at this point.
Kinda like str/unicode in Python 2: you can convert between them so freely that they can look like the same thing. But they're not.

PyLong is strongly signed. If users want sign extension, they should use PyLong_AsNativeBytes, not the unsigned variant.

@zooba
Copy link
Member

zooba commented Feb 29, 2024

As long as the existing behaviour/test for -2147467259 doesn't change, then sure.

I'd like to see the new unsigned API be clearly marked as only working on positive input values, but I'm also happy to stay out of the way of it.

@scoder
Copy link
Contributor

scoder commented Feb 29, 2024

There are good use cases for both – reading the signed (actual) value of a PyLong and reading its abs() value. Given that the internal representation makes the latter very efficient, it seems tempting to pass that efficiency on to the users.

Maybe the unsigned function could have a second return argument pointer that gives the sign as 1 and -1? Users can pass NULL if they know the sign, and look at the returned sign if they don't. Raising an exception is easy enough to leave it to users if sign and expectation diverge.

@zooba
Copy link
Member

zooba commented Feb 29, 2024

reading its abs() value

This is yet another case that nobody has mentioned previously, though I agree it's temptingly cheap to expose (at least for large values - anything less than Py_ssize_t is going to involve a comparison and negation - and of course performance characteristics can change over time, which is why we're exposing an API that isn't based on the internal representation in the first place).

But (unsigned)-2147467259 is not the same as abs(-2147467259), which is why having the C-style conversion is useful.

What's really the problem here is extracting values that require all the available bits to provide full fidelity. So one solution could be an optional out parameter that returns true iff it a zero sign bit was the only bit that couldn't be copied. So then the checks become:

  • res < 0 for when an exception was set
  • res <= sizeof(target) for when the entire value fit (the usual case)
  • res > sizeof(target) && sign_bit_overflow_only == 1 for when a positive value happened to leave the MSB set (e.g. [128, 256) into a single byte)12
  • res > sizeof(target) && sign_bit_overflow_only == 0 for when the value was truncated to fit

If you're going to treat the value as signed, you only need to check res <= sizeof(target), as today. (If you want the sign, look at the MSB of the result.)

If you're going to treat the result as unsigned, you also allow sign_bit_overflow_only == 1 (noting that negative input values will never need to set this new flag).

Footnotes

  1. Note that negative values cannot overflow by just the sign bit. At least one leading 1 bit has to remain, and so (signed)0x...FFFF_7FFF cannot fit into 16 bits with a sign bit overflow. (signed)0x...FFFF_FFFF can always fit into 16 bits - the problem is that (unsigned)0xFFFF can also fit into 16 bits but we need to know that the 17th bit would've been zero.

  2. You wouldn't really need to check res > sizeof(target) here, but we should specify that the flag is only set when a signed overflow occurred

@encukou
Copy link
Member

encukou commented Mar 1, 2024

reading its abs() value

Without knowing the use cases my first reaction is that we need to draw the line somewhere, and I'd be OK with this joining exporting non-byte digits on the other side of the line.

What's really the problem here is extracting values that require all the available bits to provide full fidelity.

Which is fairly important since C-ish APIs tend to smuggle unrelated information in high bits. If everyone used ints for actual counting, we wouldn't be here :)

[scoder] Maybe the unsigned function could have a second return argument pointer that gives the sign as 1 and -1?
[zooba] one solution could be an optional out parameter that returns true iff it a zero sign bit was the only bit that couldn't be copied

Or a nullable char *sign_out argument:

  • if it's NULL, negative values cause ValueError
  • otherwise the sign is always filled in

@zooba
Copy link
Member

zooba commented Mar 1, 2024

Again, it's not negative values that are the tricky problem - it's large positive values. By excluding negatives, all you're doing is annoying the caller.

The extra information that has to be returned is not "was the input positive or negative", it's "was the only information that was lost the sign bit". And this is only relevant for positive values because if you omit the sign bit from a negative value in two's complement, you change the value and so have always lost more than just the sign bit.

I updated #116053 last night to do the extra checks we need, and there are comments where that extra information needs to be returned when we figure out how best to do it (either by returning it, or by taking a flag that says to assume it).

@encukou
Copy link
Member

encukou commented Mar 1, 2024

Ah, I think I finally get it. There are 3 cases:

  • Signed quantity (covered by the existing function)
  • Unsigned quantity (negatives are invalid; we want the full range)
  • The topmost bit is just a bit; we don't care if the Python representation of MSb=1 is a negative int or large positive one.

Is that right?

In my suggestion, I was thinking about a new function -- PyLong_AsUnsignedNativeBytes(..., char *sign_out) -- to cover the last two.

@zooba
Copy link
Member

zooba commented Mar 1, 2024

If we have a new function, that's signal enough that we don't need to give it an extra argument. When you're calling that function, if the input was positive but the resulting MSB is set, we don't care (provided nothing higher than the MSB needed to be set).

I don't honestly see the benefit in rejecting negatives. The same rule applies - if the input was negative, provided the MSB is set (no information loss) and everything higher than the MSB would be set (sign extension), we can return success (which is the same as for the signed case). I'd rather just add a function for getting the sign from the PyLongObject so that people who want to reject negatives can do it, but I wouldn't want to conflate it with choosing between AsNativeBytes and AsUnsignedNativeBytes.

The range check (sign check) is to do with business logic, not with the binary representation.

@encukou
Copy link
Member

encukou commented Mar 1, 2024

So, writing 255 and -1 into 1-byte buffer would have the same result -- all bits set, which PyLong_FromUnsignedNativeBytes would turn into 255. Right?

IMO, accepting negatives in AsUnsignedNativeBytes is a footgun that at least needs a prominent note in the docs. I see it as perpetuating C's mistakes. But, I can see where you're coming from, and I can live with our difference in opinions.

@zooba
Copy link
Member

zooba commented Mar 1, 2024

all bits set, which PyLong_FromUnsignedNativeBytes would turn into 255. Right?

and PyLong_FromNativeBytes would turn into -1. Right.

IMO, accepting negatives in AsUnsignedNativeBytes is a footgun that at least needs a prominent note in the docs.

This is fine, but my counterpoint is that there's no other way to do it in our C API (and the way to do it in Python is to & 0xFFFFF...., which is a pain to do dynamically). So if we cut it off, we force users into complex workarounds, whereas if we allow it then it becomes possible.

And I think the documentation for this makes the most sense framed as "behaves like AsNativeBytes but assumes the result will be used as unsigned, and so does not require positive input values to leave the most significant bit clear. This may result in large positive inputs being indistinguishable from some negative inputs. To exclude negative inputs, first test the sign with <new API>"

@gpshead
Copy link
Member

gpshead commented Mar 1, 2024

I like the direction this is going, yes, that is the way I was hoping an Unsigned API variant would behave. I do think it is useful to have a way to return that the value was negative. Petr's char *sign_out idea makes sense to me there, always fill that in with 0 or -1 if it is non-NULL.

woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024
…ve the test (python#115380)

This expands the examples to cover both realistic use cases for the API.
    
I noticed thing in the test that could be done better so I added those as well: We need to guarantee that all bytes of the result are overwritten and that too many are not written.  Tests now pre-fills the result with data in order to ensure that.

Co-authored-by: Steve Dower <[email protected]>
zooba added a commit to zooba/cpython that referenced this issue Mar 19, 2024
zooba added a commit to zooba/cpython that referenced this issue Mar 28, 2024
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
…ve the test (python#115380)

This expands the examples to cover both realistic use cases for the API.
    
I noticed thing in the test that could be done better so I added those as well: We need to guarantee that all bytes of the result are overwritten and that too many are not written.  Tests now pre-fills the result with data in order to ensure that.

Co-authored-by: Steve Dower <[email protected]>
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-C-API type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

7 participants