Why is hash(-1) == hash(-2) in Python?

by alexmolason 1/7/25, 10:23 PMwith 91 comments
by Terr_on 1/7/25, 10:40 PM

In a way this is an argument for languages where it's normal to have result-types [0] or exceptions, instead of -1 etc. It just feels wrong to have any potential collision or confusion between normal results and error results.

[0] https://en.wikipedia.org/wiki/Result_type

by vhcron 1/10/25, 10:10 PM

Somewhat related, hash() returns a result % sys.hash_info.modulus, so if you build a set or a dict with keys multiple of sys.hash_info.modulus, you run into quadratic behavior.

    >>> {i * sys.hash_info.modulus for i in range(5000)}
    0.40s
    >>> {i * sys.hash_info.modulus for i in range(50000)}
    29.34s
    >>> {i for i in range(50000)}
    0.06s

by nneonneoon 1/10/25, 9:23 PM

Naturally, this extends to custom types:

  >>> class X:
  ...     def __hash__(self): return -1
  ... 
  >>> hash(X())
  -2

by ks2048on 1/10/25, 8:19 PM

The next k for which hash(k) != k: 2**61-1

    print(hash(2**61 - 2))
    2305843009213693950

    print(hash(2**61 - 1))
    0

by binary132on 1/10/25, 10:25 PM

The fact that a C function really only has its return value for communicating success or error status is why most fallible C functions use mutable parameters, also known as output arguments, for their result value or values. This is like, elementary C program design.

by daxfohlon 1/10/25, 9:47 PM

Another question is why do ints largely hash to themselves? Most hashing guides recommend hashes be far more distributed IIUC.

by Galanweon 1/10/25, 8:58 PM

> So, in C, when you design a function, and you want your function to be able to indicate “there was an error”, it has to return that error as its return value.

Well, you can also use an errno-like system. It has its own set of drawbacks as well, but it removes the "I need to reserve a sentinel value" problem.

by TRiG_Irelandon 1/10/25, 8:38 PM

I enjoyed this, because it seems to be written by an interested and interesting human, at a level that I could actually follow.

by kazinatoron 1/11/25, 2:26 AM

Object hashes that can fail or be unimplemented, where lower-level hashes have to be aware of this and stay away from the value that the parent uses for error signaling?

What else is unmitigated shit in Python? :)

by AEVLon 1/11/25, 9:22 PM

Why not instead have

hash(-1)=-2,

hash(-2)=-3,

hash(-3)=-4,

and so on?

by snickerbockerson 1/10/25, 9:07 PM

>CPython is written in C, and unlike Python, C doesn’t have exceptions. So, in C, when you design a function, and you want your function to be able to indicate “there was an error”, it has to return that error as its return value

This is just wrong. C doesn't feature 'error handling' as a dedicated form of branching the way many higher level languages do and you're in no way required to use return codes as an error signal. This is a case of bad API design and is entirely python's fault.

by intalentiveon 1/10/25, 8:46 PM

>>> d = {-2: None}

>>> -1 in d

False

What gives?

by kstrauseron 1/10/25, 8:15 PM

Because the hash function is only reliably useful for laying out dict keys, and its outputs are totally opaque and coincidental.

The pigeonhole principle says there are an infinite number of inputs with the same hash output. Trying to figure out how that happens can be fun and enlightening, but why? “Because it just does, that’s all.”

by jiggawattson 1/10/25, 8:44 PM

This seems like a loaded footgun: anyone writing a custome hash function has to know about -1 being treated like an error and handle it in a similar way.

I can’t think of any other language where this kind of thing happens, which means other developers won’t expect it either.

I can see the bug report now: “certain records cause errors, occurs only for about one in a few billion.”

by pmarreckon 1/10/25, 8:19 PM

A better question might be, why are people still using OO dynamic languages without any typing?