hash method in str child class causing unintended side effects #100313

Marlin-Na · 2022-12-17T07:24:04Z

Bug report

Consider the following example:

class MyStr(str):
    def __init__(self, value, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.value = value

    def __hash__(self) -> int:
        return hash(str(self))

    def __str__(self) -> str:
        return str(self.value)

def dummy_func(x):
    class MyClass:
        def __init__(self, data):
            self.data = data
        def __str__(self):
            return self.data
    str(MyClass(x))

dummy_func is a function that should have absolutely no side effects.
However, checkout:

a = MyStr("teststring")
a in {}
# False
dummy_func(a)
a in {}
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "<stdin>", line 6, in __hash__
#   File "<stdin>", line 6, in __hash__
#   File "<stdin>", line 8, in __str__
#   File "<stdin>", line 8, in __str__
#   File "<stdin>", line 8, in __str__
#   [Previous line repeated 329 more times]
# RecursionError: maximum recursion depth exceeded while calling a Python object
assert id(a.value.data) == id(a)

In this case, dummy_func mutates the input, and creates a weird circular dependency (i.e. id(a.value.data) == id(a)).

The issue can be fixed by removing the __hash__ method of MyStr.

Please confirm the behavior is unintended.

Environment

CPython versions tested on: Python 3.10.5
Operating system and architecture: CentOS, x86_64

The text was updated successfully, but these errors were encountered:

kwsp · 2022-12-17T09:51:28Z

I suspect this is undefined behaviour. I quote from the documentation of object.__str__:

object.str(self)
Called by str(object) and the built-in functions format() and print() to compute the “informal” or nicely printable string representation of an object. The return value must be a string object.

In other words, __str__ must return a string object. In MyClass.__str__, you returned a MyStr object, which isn't a string object, but a subclass of a string object. I don't know how str and __str__ work internally, and I'm looking for the code, but I suspect it check if its of type str, and if not, it does something with it.

Another fix (probably the correct fix) to this issue is to make sure __str__ always returns a str:

class MyClass:
    def __init__(self, data):
        self.data = data
    def __str__(self):
        return str(self.data)

...

dummy_func is a function that should have absolutely no side effects.

This is not really guaranteed, since your code has undefined behaviour.

Achxy · 2022-12-17T09:55:10Z

The strange part here is how MyStr is getting instantiated multiple times with MyClass as an argument, the rest is caused by MyClass.__str__ invoking MyStr.__str__ which then causes a recursive error.
As a fix, you could make sure self.value is binded to an str instance, and internally using str.__str__ desc.

Here is a somewhat bandaid:

class MyStr(str):
    def __init__(self, value, *args, **kwargs):
        self.value = str(value)
        super().__init__(*args, **kwargs)

    def __hash__(self) -> int:
        return hash(str(self))

    def __str__(self) -> str:
        return str.__str__(self.value)


def dummy_func(x):
    class MyClass:
        def __init__(self, data: str):
            self.data = data

        def __str__(self):
            return self.data

    str(MyClass(x))


a = MyStr("teststring")
print(a in {})  # False
dummy_func(a)
print(a in {})  # False

Achxy · 2022-12-17T10:11:59Z

I was able to recreate what was happening above a bit more concisely,

class DerivedStr(str):
    def __init__(self, v) -> None:
        # type(v)=<class 'str'>
        # type(v)=<class '__main__.Repro'>
        print(f"{type(v)=}")
        self.v = v

    def __str__(self) -> str:
        return self.v


class Repro:
    def __str__(self) -> str:
        return DerivedStr("Subclass test")


r = Repro()
x = str(r)  # Returns DerivedStr instance

# Instantiates DerivedStr with Repro
y = str(x)  # TypeError: __str__ returned non-string (type Repro)

Apparently instantiating str or invoking __str__ on DerivedStr instance causes a new DerivedStr instance to be created with Repro as an argument.

ronaldoussoren · 2022-12-17T11:12:56Z

I was able to recreate what was happening above a bit more concisely,

class DerivedStr(str):
    def __init__(self, v) -> None:
        # type(v)=<class 'str'>
        # type(v)=<class '__main__.Repro'>
        print(f"{type(v)=}")
        self.v = v

    def __str__(self) -> str:
        return self.v


class Repro:
    def __str__(self) -> str:
        return DerivedStr("Subclass test")


r = Repro()
x = str(r)  # Returns DerivedStr instance

# Instantiates DerivedStr with Repro
y = str(x)  # TypeError: __str__ returned non-string (type Repro)

Apparently instantiating str or invoking __str__ on DerivedStr instance causes a new DerivedStr instance to be created with Repro as an argument.

An issue with this reproducer is the use of __init__ for initialising: strings are immutable types that do most of their inititialision in their __new__. Replacing the __init__ method by a __new__ that calls super and otherwise has the same behaviour as the __init__ method fixes the error, e.g.:

class DerivedStr(str):
    def __new__(cls, v) -> None:
        self = super().__new__(cls, v)
        print(f"__init__ {v=} {type(v)=}")
        self.v = v
        return self

    def __str__(self) -> str:
        print(f"__str__ {self.v=} {type(self.v)=}")
        return (self.v)

Leaving the issue open because I'm not sure if this fully explains what's going on here.

sweeneyde · 2022-12-18T04:48:12Z

I think the main idea here is that mc = MyClass(x); str(mc) mutates the object x, where type(x) is MyStr:

str(mc) actually calls type(str).__call__(str, mc), which goes into type.__call__(str, mc).
- This first calls str.__new__(str, mc) (the ->tp_new(type, args, kwds) in the c code)
  - str.__new__ is unicode_new in the c code.
  - This calls PyObject_Str(mc), which calls type(mc).__str__(mc), which as implemented by OP, returns x.
- The body of type.__call__ receives the returned object x from. It then checks whether isinstance(x, str), and since it is (subclasses count as instances), it decides to call type(x).__init__(x, mc).
- As implemented by OP, MyStr.__init__ assigns x.value = mc (super is irrelevant).

This mutation causes hash(x), which was once a successfully terminating program where type(x.value) is str to turn into an infinite loop when type(x.value) is MyClass, caused by this circular definition:

hash(x)
   := MyStr.__hash__(x)
   := hash(str(x))
   := hash(MyStr.__str__(x))
   := hash(str(x.value))
   := hash(str(mc))
   := hash(MyClass.__str__(mc))
   := hash(mc.data)
   := hash(x)

The heart of this is that str is a type, not a function. So calling it has this extra default behavior of calling __init__ where applicable. This is documented here.

I agree with the suggestions that __new__ would work better and that returning subclasses from __str__ is asking for trouble, but I also agree that the behavior in this corner case is a bit surprising. But I don't see any way that this could be easily changed without changing that fundamental Python rule about when __init__ methods are called.

sweeneyde · 2022-12-18T05:05:06Z

One possibility to prevent this would be to deprecate returning anything but an exact str from __str__ methods. The related issue for operator.__index__ returning exact ints is #61776

Marlin-Na · 2022-12-18T05:09:54Z

@sweeneyde Thanks, that's a super clear view of what's happening here. I think we may either throw an TypeError if __str__ returns a str subclass, or we doing a conversion force converting it to plain str.

ronaldoussoren · 2022-12-21T15:18:30Z

One possibility to prevent this would be to deprecate returning anything but an exact str from __str__ methods. The related issue for operator.__index__ returning exact ints is #61776

That would break existing code though. I'm using a subclass of str() in PyObjC, including in the implementation of __repr__ (and hence __str__) for classes defined by that package.

Marlin-Na added the type-bug An unexpected behavior, bug, or error label Dec 17, 2022

ronaldoussoren added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Dec 17, 2022

Nov	DEC	Jan
	23
2021	2022	2023

hash method in str child class causing unintended side effects #100313

hash method in str child class causing unintended side effects #100313

Marlin-Na commented Dec 17, 2022

kwsp commented Dec 17, 2022

Achxy commented Dec 17, 2022 •

edited

Achxy commented Dec 17, 2022 •

edited

ronaldoussoren commented Dec 17, 2022

sweeneyde commented Dec 18, 2022 •

edited

sweeneyde commented Dec 18, 2022

Marlin-Na commented Dec 18, 2022

ronaldoussoren commented Dec 21, 2022

__hash__ method in str child class causing unintended side effects #100313

__hash__ method in str child class causing unintended side effects #100313

Comments

Marlin-Na commented Dec 17, 2022

Bug report

Environment

kwsp commented Dec 17, 2022

Achxy commented Dec 17, 2022 • edited

Achxy commented Dec 17, 2022 • edited

ronaldoussoren commented Dec 17, 2022

sweeneyde commented Dec 18, 2022 • edited

sweeneyde commented Dec 18, 2022

Marlin-Na commented Dec 18, 2022

ronaldoussoren commented Dec 21, 2022

hash method in str child class causing unintended side effects #100313

hash method in str child class causing unintended side effects #100313

Achxy commented Dec 17, 2022 •

edited

Achxy commented Dec 17, 2022 •

edited

sweeneyde commented Dec 18, 2022 •

edited