The Wayback Machine - https://web.archive.org/web/20221223142424/https://github.com/python/cpython/issues/100313
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

__hash__ method in str child class causing unintended side effects #100313

Open
Marlin-Na opened this issue Dec 17, 2022 · 8 comments
Open

__hash__ method in str child class causing unintended side effects #100313

Marlin-Na opened this issue Dec 17, 2022 · 8 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@Marlin-Na
Copy link

Marlin-Na commented Dec 17, 2022

Bug report

Consider the following example:

class MyStr(str):
    def __init__(self, value, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.value = value

    def __hash__(self) -> int:
        return hash(str(self))

    def __str__(self) -> str:
        return str(self.value)

def dummy_func(x):
    class MyClass:
        def __init__(self, data):
            self.data = data
        def __str__(self):
            return self.data
    str(MyClass(x))

dummy_func is a function that should have absolutely no side effects.
However, checkout:

a = MyStr("teststring")
a in {}
# False
dummy_func(a)
a in {}
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "<stdin>", line 6, in __hash__
#   File "<stdin>", line 6, in __hash__
#   File "<stdin>", line 8, in __str__
#   File "<stdin>", line 8, in __str__
#   File "<stdin>", line 8, in __str__
#   [Previous line repeated 329 more times]
# RecursionError: maximum recursion depth exceeded while calling a Python object
assert id(a.value.data) == id(a)

In this case, dummy_func mutates the input, and creates a weird circular dependency (i.e. id(a.value.data) == id(a)).

The issue can be fixed by removing the __hash__ method of MyStr.

Please confirm the behavior is unintended.

Environment

  • CPython versions tested on: Python 3.10.5
  • Operating system and architecture: CentOS, x86_64
@Marlin-Na Marlin-Na added the type-bug An unexpected behavior, bug, or error label Dec 17, 2022
@kwsp
Copy link
Contributor

kwsp commented Dec 17, 2022

I suspect this is undefined behaviour. I quote from the documentation of object.__str__:

object.str(self)
Called by str(object) and the built-in functions format() and print() to compute the “informal” or nicely printable string representation of an object. The return value must be a string object.

In other words, __str__ must return a string object. In MyClass.__str__, you returned a MyStr object, which isn't a string object, but a subclass of a string object. I don't know how str and __str__ work internally, and I'm looking for the code, but I suspect it check if its of type str, and if not, it does something with it.

Another fix (probably the correct fix) to this issue is to make sure __str__ always returns a str:

class MyClass:
    def __init__(self, data):
        self.data = data
    def __str__(self):
        return str(self.data)

...

dummy_func is a function that should have absolutely no side effects.

This is not really guaranteed, since your code has undefined behaviour.

@Achxy
Copy link

Achxy commented Dec 17, 2022

The strange part here is how MyStr is getting instantiated multiple times with MyClass as an argument, the rest is caused by MyClass.__str__ invoking MyStr.__str__ which then causes a recursive error.
As a fix, you could make sure self.value is binded to an str instance, and internally using str.__str__ desc.

Here is a somewhat bandaid:

class MyStr(str):
    def __init__(self, value, *args, **kwargs):
        self.value = str(value)
        super().__init__(*args, **kwargs)

    def __hash__(self) -> int:
        return hash(str(self))

    def __str__(self) -> str:
        return str.__str__(self.value)


def dummy_func(x):
    class MyClass:
        def __init__(self, data: str):
            self.data = data

        def __str__(self):
            return self.data

    str(MyClass(x))


a = MyStr("teststring")
print(a in {})  # False
dummy_func(a)
print(a in {})  # False

@Achxy
Copy link

Achxy commented Dec 17, 2022

I was able to recreate what was happening above a bit more concisely,

class DerivedStr(str):
    def __init__(self, v) -> None:
        # type(v)=<class 'str'>
        # type(v)=<class '__main__.Repro'>
        print(f"{type(v)=}")
        self.v = v

    def __str__(self) -> str:
        return self.v


class Repro:
    def __str__(self) -> str:
        return DerivedStr("Subclass test")


r = Repro()
x = str(r)  # Returns DerivedStr instance

# Instantiates DerivedStr with Repro
y = str(x)  # TypeError: __str__ returned non-string (type Repro)

Apparently instantiating str or invoking __str__ on DerivedStr instance causes a new DerivedStr instance to be created with Repro as an argument.

@ronaldoussoren
Copy link
Contributor

ronaldoussoren commented Dec 17, 2022

I was able to recreate what was happening above a bit more concisely,

class DerivedStr(str):
    def __init__(self, v) -> None:
        # type(v)=<class 'str'>
        # type(v)=<class '__main__.Repro'>
        print(f"{type(v)=}")
        self.v = v

    def __str__(self) -> str:
        return self.v


class Repro:
    def __str__(self) -> str:
        return DerivedStr("Subclass test")


r = Repro()
x = str(r)  # Returns DerivedStr instance

# Instantiates DerivedStr with Repro
y = str(x)  # TypeError: __str__ returned non-string (type Repro)

Apparently instantiating str or invoking __str__ on DerivedStr instance causes a new DerivedStr instance to be created with Repro as an argument.

An issue with this reproducer is the use of __init__ for initialising: strings are immutable types that do most of their inititialision in their __new__. Replacing the __init__ method by a __new__ that calls super and otherwise has the same behaviour as the __init__ method fixes the error, e.g.:

class DerivedStr(str):
    def __new__(cls, v) -> None:
        self = super().__new__(cls, v)
        print(f"__init__ {v=} {type(v)=}")
        self.v = v
        return self

    def __str__(self) -> str:
        print(f"__str__ {self.v=} {type(self.v)=}")
        return (self.v)

Leaving the issue open because I'm not sure if this fully explains what's going on here.

@ronaldoussoren ronaldoussoren added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Dec 17, 2022
@sweeneyde
Copy link
Member

sweeneyde commented Dec 18, 2022

I think the main idea here is that mc = MyClass(x); str(mc) mutates the object x, where type(x) is MyStr:

  • str(mc) actually calls type(str).__call__(str, mc), which goes into type.__call__(str, mc).
    • This first calls str.__new__(str, mc) (the ->tp_new(type, args, kwds) in the c code)
      • str.__new__ is unicode_new in the c code.
      • This calls PyObject_Str(mc), which calls type(mc).__str__(mc), which as implemented by OP, returns x.
    • The body of type.__call__ receives the returned object x from. It then checks whether isinstance(x, str), and since it is (subclasses count as instances), it decides to call type(x).__init__(x, mc).
    • As implemented by OP, MyStr.__init__ assigns x.value = mc (super is irrelevant).

This mutation causes hash(x), which was once a successfully terminating program where type(x.value) is str to turn into an infinite loop when type(x.value) is MyClass, caused by this circular definition:

hash(x)
   := MyStr.__hash__(x)
   := hash(str(x))
   := hash(MyStr.__str__(x))
   := hash(str(x.value))
   := hash(str(mc))
   := hash(MyClass.__str__(mc))
   := hash(mc.data)
   := hash(x)

The heart of this is that str is a type, not a function. So calling it has this extra default behavior of calling __init__ where applicable. This is documented here.

I agree with the suggestions that __new__ would work better and that returning subclasses from __str__ is asking for trouble, but I also agree that the behavior in this corner case is a bit surprising. But I don't see any way that this could be easily changed without changing that fundamental Python rule about when __init__ methods are called.

@sweeneyde
Copy link
Member

sweeneyde commented Dec 18, 2022

One possibility to prevent this would be to deprecate returning anything but an exact str from __str__ methods. The related issue for operator.__index__ returning exact ints is #61776

@Marlin-Na
Copy link
Author

Marlin-Na commented Dec 18, 2022

@sweeneyde Thanks, that's a super clear view of what's happening here. I think we may either throw an TypeError if __str__ returns a str subclass, or we doing a conversion force converting it to plain str.

@ronaldoussoren
Copy link
Contributor

ronaldoussoren commented Dec 21, 2022

One possibility to prevent this would be to deprecate returning anything but an exact str from __str__ methods. The related issue for operator.__index__ returning exact ints is #61776

That would break existing code though. I'm using a subclass of str() in PyObjC, including in the implementation of __repr__ (and hence __str__) for classes defined by that package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants