ivank

See /ivank/about/. mailto:ivan@ludios.org

How to test your __eq__ / __ne__ / __cmp__

In Python, a common mistake is to implement __eq__ on your object without also implementing __ne__. Even worse, your unit tests will often hide the error because the default object-identity __ne__ will probably satisfy your assertions.

If you’ve implemented __eq__ and __ne__, you might still have a mistake if the superclass has a __cmp__: Python’s cmp will fall back to the superclass’ __cmp__ instead of using your __eq__ (example). You’ll probably never notice this problem unless you use cmp(...) on your object.

When I need to exercise all possible combinations of ==, !=, and cmp, I mix this into my TestCases and use self.assertReally(Not)Equal(a, b):

class ReallyEqualMixin(object):
	def assertReallyEqual(self, a, b):
		# assertEqual first, because it will have a good message if the
		# assertion fails.
		self.assertEqual(a, b)
		self.assertEqual(b, a)
		self.assertTrue(a == b)
		self.assertTrue(b == a)
		self.assertFalse(a != b)
		self.assertFalse(b != a)
		self.assertEqual(0, cmp(a, b))
		self.assertEqual(0, cmp(b, a))
 
	def assertReallyNotEqual(self, a, b):
		# assertNotEqual first, because it will have a good message if the
		# assertion fails.
		self.assertNotEqual(a, b)
		self.assertNotEqual(b, a)
		self.assertFalse(a == b)
		self.assertFalse(b == a)
		self.assertTrue(a != b)
		self.assertTrue(b != a)
		self.assertNotEqual(0, cmp(a, b))
		self.assertNotEqual(0, cmp(b, a))

No © on the above, enjoy.

Further reading: How to override comparison operators in Python

Notes on subclassing Python’s dict

Update 2011-05-10: This post was written after implementing securedict in Securetypes. If this post doesn’t make sense, see the code.

The notes:

If for some reason you must subclass Python’s dict, keep these in mind:

1. Both dict.__init__ and dict.update use the update algorithm, which doesn’t necessarily iterate over the object you pass in:

>>> help({}.update)

D.update(E, **F) -> None. Update D from dict/iterable E and F.
If E has a .keys() method, does: for k in E: D[k] = E[k]
If E lacks .keys() method, does: for (k, v) in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]

Above, for k in E: D[k] should actually read for k in E.keys(): D[k].

It also omits a CPython implementation quirk: the update algorithm has a fast path for dicts (and their subclasses), which ignores the keys method. CPython’s dictobject.c actually does this:

D.update(E, **F) -> None. Update D from dict/iterable E and F.
If isinstance(E, dict), does: for k in E: D[k] = E[k], bypassing E.__iter__
Else if E has a .keys() method, does: for k in E.keys(): D[k] = E[k]
Else if E lacks .keys() method, does: for (k, v) in E: D[k] = v
In any case, this is followed by: for k in F: D[k] = F[k]
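The three branches can be observed with a small sketch. ShoutingKeysDict and KeysAndGetitem are hypothetical classes made up for this illustration, and the behavior shown is what I would expect on CPython; other implementations may differ.

```python
class ShoutingKeysDict(dict):
    """Hypothetical dict subclass that overrides only keys()."""

    def keys(self):
        return ["ignored-by-update"]


class KeysAndGetitem(object):
    """Hypothetical non-dict object offering just keys() and __getitem__."""

    def keys(self):
        return ["a", "b"]

    def __getitem__(self, key):
        return key.upper()


# Branch 1: a dict subclass takes the fast path, so its keys() is never called.
fast_path = {}
fast_path.update(ShoutingKeysDict(x=1, y=2))

# Branch 2: a non-dict with keys() is read through keys() and __getitem__.
via_keys = {}
via_keys.update(KeysAndGetitem())

# Branch 3: no keys() at all, so the argument is treated as (k, v) pairs.
# The keyword arguments are applied afterwards in every case.
via_pairs = {}
via_pairs.update([("k", "v")], extra=True)
```

After this runs, fast_path holds the real contents of the subclass instance rather than anything derived from its keys() override.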

2. If you override __eq__ and __ne__, remember to override __cmp__ as well, or else cmp(yourCustomDict, ...) will be broken.

3. If your custom dict behaves a lot like the real Python dict, consider copying many of the unit tests from CPython’s Lib/test/test_dict.py:DictTest. These tests have some omissions, though: they don’t test .iteritems(), the new .view*() methods, or dict instantiation with **kwargs. They’re also missing comprehensive equality tests.

4. A custom __repr__ is tricky to implement, if you’re trying to avoid infinite recursion when the dict contains itself. Built-in types solve the problem by repr’ing to something like [[...]] or {"a": {...}}. In your dict subclass with a custom __repr__, use an instance variable to track whether you’ve already been __repr__’ed, and remember to reset that variable in a finally: block.

5. Do you want .copy() to return an instance of your own custom dict? If so, better implement it.

6. In Python 2.7+, dicts have three new methods: viewkeys, viewitems, and viewvalues. Changing their behavior in a good way doesn’t look practical.
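Notes 4 and 5 can be sketched together. TaggedDict and its _in_repr flag are hypothetical names for this illustration, and the flag is not thread-safe; treat this as a rough outline rather than the securedict implementation.

```python
class TaggedDict(dict):
    """Hypothetical dict subclass with a custom __repr__ and copy()."""

    _in_repr = False  # class-level default; set per instance while repr'ing

    def __repr__(self):
        if self._in_repr:
            # Already inside our own __repr__: the dict contains itself.
            return "TaggedDict({...})"
        self._in_repr = True
        try:
            # Format items by hand so the guard above is what stops recursion.
            body = ", ".join("%r: %r" % (k, v) for k, v in self.items())
            return "TaggedDict({%s})" % body
        finally:
            # Always reset, even if a value's repr raised.
            self._in_repr = False

    def copy(self):
        # dict.copy() would return a plain dict, so build our own type.
        return self.__class__(self)


d = TaggedDict()
d["me"] = d
text = repr(d)
```

Here text should come out as TaggedDict({'me': TaggedDict({...})}), and calling repr(d) again gives the same string because the flag was reset.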

Selected movie recommendations

Moved to this new page.

JavaScript’s sort: wonders never cease

>>> [1, 2, 10, 20].sort()
[1, 10, 2, 20]
 
>>> [3, "3", 3, "3", "3", "3", 3, 3, 3].sort()
["3", "3", "3", 3, 3, 3, "3", 3, 3]

The above was tested in Chrome 5. Sort varies by browser. Bedtime reading: ECMA-262 15.4.4.11 Array.prototype.sort(comparefn)
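The first result follows from the spec: with no comparefn, elements are compared by their string conversions. A rough Python approximation, assuming str() stands in for JavaScript’s ToString (close enough for small integers and digit strings); js_default_sort is a made-up name:

```python
def js_default_sort(items):
    """Approximate Array.prototype.sort() called without a comparefn.

    Assumption: Python's str() stands in for JavaScript's ToString.
    Python's sort is stable, which the spec of that era did not require.
    """
    return sorted(items, key=str)


numbers = js_default_sort([1, 2, 10, 20])
mixed = js_default_sort([3, "3", 3, "3"])
```

Here numbers is [1, 10, 2, 20], matching the browser. In mixed, every element has the same sort key, so a stable sort leaves the input order alone; an unstable sort is free to shuffle such elements, which is what the second example above shows.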

Don’t use from __future__ import division

tl;dr: For maintenance reasons, just float() one of the values instead.

Don’t use from __future__ import division. Consider these two cases:

1. You’re moving a block of code from a file without “future division” to a file with “future division”. You forget to change all of the /s to //s, and are screwed (because you have incomplete test coverage). Or maybe you’re moving a block of code the other way, and similarly forget to change things.

2. You have a module with from __future__ import division, but all division operations were removed in an earlier commit. Can you now remove the from __future__ import? Maybe not, and you might keep it forever, just in case there are outstanding patches to the module. But not everyone will follow that logic.

Summary: subtle global behavior mutation is bad, even if scoped to a single file.

(Consider ignoring all of this if you’re developing for both Python 2 and Python 3.)
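A short sketch of the float() approach from the tl;dr. The function names are made up for illustration; the point is that each line states which kind of division it wants, so the code means the same thing in any file it is pasted into.

```python
def mean(total, count):
    # float() forces true division whether or not the file
    # has "from __future__ import division".
    return total / float(count)


def pages_needed(items, per_page):
    # // is floor division for integers in both modes.
    return (items + per_page - 1) // per_page
```

With these, mean(7, 2) gives 3.5 and pages_needed(7, 2) gives 4 in either kind of file.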

Found an answer, left the conversation

I have had this experience several times in my life; I come across clear enough evidence that settles for me an issue I had seen long disputed. At that point my choice is to either go back and try to persuade disputants, or to continue on to explore the new issues that this settlement raises. As Eliezer implicitly advises, after a short detour to tell a few disputants, I have usually chosen this second route. This is one explanation for the existence of settled but still disputed issues; people who learn the answer leave the conversation.

Robin Hanson

Python < 2.5 and unicode/str comparisons

Comparing strings to unicode objects should never have been possible, but it does “work”, and you’ve probably seen this behavior in Python 2.5 – 2.7:

Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
>>> u"\xff" == "\xff"
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False

To do the comparison, Python calls unicode() on the str object behind the scenes, and if it cannot decode it, it emits a warning and returns False.

If you’re still maintaining software that must run on Python 2.4 (or worse), you might run into this old behavior:

Python 2.4.6 (#1, Aug 2 2010, 18:27:11)
>>> u"\xff" == "\xff"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

Also, if you’re writing tests that involve this, keep in mind that Python 2.4 does not have a UnicodeWarning.

(After I wrote this, I found that it was documented in What’s New in Python 2.5.)
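A rough model of the two behaviors, with the hidden decoding step written out. This is only a sketch: compare_like_python2 and its strict flag are made-up names, it assumes the default ASCII codec, it uses Python 3 type names (bytes for the old str), and it does not emit the UnicodeWarning.

```python
def compare_like_python2(text, raw, strict=False):
    """Mimic the implicit coercion when comparing unicode text to a byte string.

    strict=True models Python 2.4 (the decode error propagates);
    strict=False models Python 2.5 - 2.7 (treated as unequal, warning omitted).
    """
    try:
        decoded = raw.decode("ascii")  # assumption: default encoding is ASCII
    except UnicodeDecodeError:
        if strict:
            raise
        return False
    return text == decoded


new_style = compare_like_python2(u"\xff", b"\xff")
ascii_only = compare_like_python2(u"abc", b"abc")
```

Here new_style is False and ascii_only is True; passing strict=True with the first pair raises UnicodeDecodeError, as in the Python 2.4 traceback above.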

Blizzard annoyances

If you use Blizzard Downloader, you might see:

“There was a problem authenticating your download. Please go to http://www.blizzard.com/account to start a new download”

One possible cause (see others): when you download the downloader, Blizzard embeds your IP address into the executable, and only lets you start the download from the same IP. This is terrible software design, and completely unnecessary to stop bandwidth leeches: just embed a unique key into the executable, and ignore the IP address.
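The unique-key idea could look something like this on the server side. It is purely a sketch under my own assumptions: DownloadKeys and its methods are hypothetical and say nothing about how Blizzard’s systems actually work.

```python
import secrets


class DownloadKeys(object):
    """Hypothetical registry mapping embedded keys to accounts."""

    def __init__(self):
        self._accounts_by_key = {}

    def issue(self, account_id):
        # An unguessable key, embedded into the executable at download time.
        key = secrets.token_urlsafe(32)
        self._accounts_by_key[key] = account_id
        return key

    def authorize(self, key, client_ip=None):
        # client_ip is accepted but deliberately ignored: the key alone
        # identifies the download, so a changed IP does not matter.
        return self._accounts_by_key.get(key)


keys = DownloadKeys()
embedded = keys.issue("account-123")
```

A downloader presenting embedded is recognized from any address, while an unknown key gets None back.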

A language quirk

Let’s say you’re writing some documentation, and need to point people to a comment on a web page. Keep in mind that more comments may be added at any time. If you need to point people to the second (or third, or fourth, etc) comment, you have an easy time; you just say:

“See the second comment on <X>”

But if you need to point to the first comment, and at this time there is only one comment, you need to sound like a pedant:

“See the first (and possibly only) comment on <X>”

In English, “the second object” does not imply that more objects exist, but “the first object” does imply that more than one exists. Why would you have written “first” if there’s only one? If you wrote just “see the first comment”, and the reader saw only one comment on the page, they might wonder if a comment has been deleted, or if they’re even in the right place. If you wrote “see the only comment”, and soon another comment appeared, you would cause similar confusion.

I don’t expect this to catch on, but with a new word we can express doubts about the existence of more than one object:

“See the firsteth comment on <X>”.

If anyone out there is designing a human language, please make it easier to express things about changing environments.

Clock jumps and browsers

If you’re trying to build reliable web applications, you might have wondered what happens to existing timeouts and intervals after the system clock jumps. Not that clock jumps happen very often, but it’s nice to think of setTimeout(func, 0) (or similar) as a reliable abstraction. Unfortunately, it isn’t. See Clock jump test page for the results of my testing, or to do your own tests.

In some browsers on some OSes, if the clock jumps backwards, an existing timer might not fire for a long time. If the clock jumps forwards, an existing timer might fire too early, possibly causing misbehavior if you were relying on a correct time duration.

A summary of the test page: IE and Opera on any OS use the monotonic clock to schedule timeouts. Firefox on Windows uses the monotonic clock, but a setInterval timer is broken after a backwards clock jump. Every other OS/browser combination schedules by the system time, or behaves strangely when the time jumps.
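The difference between the two scheduling strategies can be sketched outside the browser. This is a Python analogy, not browser code: Deadline and JumpyWallClock are hypothetical names, and the fake clock stands in for a system clock that gets adjusted.

```python
import time


class Deadline(object):
    """A timeout measured against whichever clock it is given."""

    def __init__(self, seconds, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + seconds

    def remaining(self):
        return max(0.0, self._expires_at - self._clock())

    def expired(self):
        return self.remaining() == 0.0


class JumpyWallClock(object):
    """Hypothetical fake system clock whose reading can be moved by hand."""

    def __init__(self, now):
        self.now = now

    def __call__(self):
        return self.now


wall = JumpyWallClock(100.0)
late = Deadline(5, clock=wall)   # due at wall time 105
wall.now = 40.0                  # the clock jumps backwards by a minute
stuck_for = late.remaining()     # now 65 seconds away instead of 5

early = Deadline(60, clock=wall)  # due at wall time 100
wall.now = 300.0                  # the clock jumps forwards
fired_early = early.expired()     # True, although 60 seconds never passed
```

With the default time.monotonic clock neither jump affects the deadline, which is the behavior the summary above attributes to IE and Opera.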

Here’s the bug for Chromium, but a lot more bugs need to be filed.