Python NaN equality rules (and cw.eq.equals)
>>> float('nan') == float('nan')
False
>>> n = float('nan')
>>> n
nan
>>> n == n
False
>>> [float('nan')] == [float('nan')]
False # Note: in PyPy, True
>>> [n] == [n]
True
>>> nl = [float('nan')]
>>> nl == nl
True
Got it? Good. (The behavior above is caused by various object-identity shortcuts for either the NaN or the list object.)
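As a rough illustration of that shortcut (a sketch, not CPython's actual source; elements_equal is a made-up name), built-in containers compare their elements as if by identity first and only then by ==:

```python
def elements_equal(a, b):
    # Hypothetical helper that mimics how built-in containers compare
    # two elements: identical objects are assumed equal without calling ==.
    return a is b or a == b

n = float('nan')
print(n == n)                                      # False: NaN never equals itself
print(elements_equal(n, n))                        # True: same object
print(elements_equal(float('nan'), float('nan')))  # False: two distinct objects
print([n] == [n])                                  # True, via the identity shortcut
```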
If you’re wondering how this works in JavaScript, well, it doesn’t, because JavaScript doesn’t have any kind of deep-equality comparison. JavaScript Arrays and Objects compare by identity. But in Coreweb’s cw.eq.equals, I didn’t implement the object-identity shortcut, so NaN comparison works correctly:
>>> Number.NaN == Number.NaN
false
>>> n = Number.NaN
NaN
>>> n == n
false
>>> cw.eq.equals([Number.NaN], [Number.NaN])
false
>>> cw.eq.equals([n], [n])
false
>>> nl = [Number.NaN]
[NaN]
>>> cw.eq.equals(nl, nl)
false
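For comparison, here is a minimal Python sketch of a deep comparison with no identity shortcut. It is not Coreweb's implementation; strict_equals is a hypothetical name, and it only understands lists, tuples and dicts:

```python
def strict_equals(a, b):
    """
    Hypothetical deep comparison that never uses an identity shortcut.
    Handles lists, tuples and dicts; everything else falls back to ==.
    Does not guard against circular references.
    """
    if isinstance(a, (list, tuple)):
        if type(a) is not type(b) or len(a) != len(b):
            return False
        return all(strict_equals(x, y) for x, y in zip(a, b))
    if isinstance(a, dict):
        if not isinstance(b, dict) or set(a) != set(b):
            return False
        return all(strict_equals(a[k], b[k]) for k in a)
    return a == b

n = float('nan')
nl = [n]
print(strict_equals([n], [n]))   # False
print(strict_equals(nl, nl))     # False, even though nl is nl
print(strict_equals([1, [2, {"a": 3}]], [1, [2, {"a": 3}]]))  # True
```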
Performance improvements in Python Protocol Buffers
protobuf’s Python implementation has been known for its slowness, but that might be changing. From a 2010-11-01 changelog:
Python
* Added an experimental C++ implementation for Python messages via a Python
extension. Implementation type is controlled by an environment variable
PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION (valid values: "cpp" and "python")
The default value is currently "python" but will be changed to "cpp" in
future release.
* Improved performance on message instantiation significantly.
Most of the work on message instantiation is done just once per message
class, instead of once per message instance.
* Improved performance on text message parsing.
– http://code.google.com/p/protobuf/source/detail?r=349
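A small sketch of opting in to the experimental implementation. The variable name and its two valid values come from the changelog above; select_protobuf_implementation is a hypothetical helper, and I am assuming the variable is read when google.protobuf is first imported, so it has to be set before that:

```python
import os

VALID_IMPLEMENTATIONS = ("cpp", "python")  # the values named in the changelog

def select_protobuf_implementation(name):
    # Hypothetical helper. Call it before google.protobuf is imported,
    # on the assumption that the variable is consulted at import time.
    if name not in VALID_IMPLEMENTATIONS:
        raise ValueError(
            "expected one of %r, got %r" % (VALID_IMPLEMENTATIONS, name))
    os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = name

select_protobuf_implementation("cpp")
# import google.protobuf  # only after the variable has been set
```

Setting the variable in the shell before starting Python should have the same effect.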
Also, if you like Protocol Buffers and JSON, check out Protojson.
Getting the total size of a built-in Python object
Ever notice how sys.getsizeof doesn’t include the size of the object’s children?
>>> sys.getsizeof({})
136
>>> sys.getsizeof({"1": "x" * 1000000})
136
I don’t know if there is a truly good use for this, but someone in #python wanted it, so here it is:
import sys

def totalSizeOf(obj, _alreadySeen=None):
    """
    Get the size of object C{obj} using L{sys.getsizeof} on the object itself
    and all of its children recursively. If the same object appears more
    than once inside C{obj}, it is counted only once.

    This only works properly if C{obj} is a str, unicode, list, tuple, dict,
    set, frozenset, bool, NoneType, int, complex, float, long, or any nested
    combination of the above. C{obj} is allowed to have circular references.

    This might be useful for getting a good estimate of how much memory a
    JSON-decoded object is using after receiving it.

    Design notes: L{sys.getsizeof} returns reasonable numbers, but does not
    recurse into the object's children. As we recurse into the children, we
    keep track of objects we've already counted for two reasons:
      - If we've already counted the object's memory usage, we don't want to
        count it again.
      - As a bonus, we handle circular references gracefully.

    This function assumes that containers do not modify their children as
    they are traversed.
    """
    if _alreadySeen is None:
        _alreadySeen = set()

    total = sys.getsizeof(obj)
    _alreadySeen.add(id(obj))

    if isinstance(obj, dict):
        # Count the memory usage of both the keys and values.
        for k, v in obj.iteritems():
            if not id(k) in _alreadySeen:
                total += totalSizeOf(k, _alreadySeen)
            if not id(v) in _alreadySeen:
                total += totalSizeOf(v, _alreadySeen)
    else:
        try:
            iterator = obj.__iter__()
        except (TypeError, AttributeError):
            pass
        else:
            for item in iterator:
                if not id(item) in _alreadySeen:
                    total += totalSizeOf(item, _alreadySeen)

    return total
No © on the above, enjoy.
>>> totalSizeOf({})
136
>>> totalSizeOf({"1": "x" * 1000000})
1000179
If you want unit tests, see mypy/test/test_objops.py.
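The function above targets Python 2 (iteritems, and str having no __iter__). Here is a hedged sketch of how the same idea might look on Python 3. total_size_of is my own name, and the explicit container check is an adaptation: Python 3 strings are iterable, so the "try __iter__" trick would recurse into their characters.

```python
import sys

def total_size_of(obj, _seen=None):
    # Python 3 adaptation (an assumption, not the original code).
    # Counts each object once, keyed by id(), so shared and circular
    # references are handled the same way as in the original.
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:
        return 0
    _seen.add(id(obj))

    total = sys.getsizeof(obj)
    if isinstance(obj, dict):
        # Count the memory usage of both the keys and values.
        for k, v in obj.items():
            total += total_size_of(k, _seen)
            total += total_size_of(v, _seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            total += total_size_of(item, _seen)
    return total

big = "x" * 1000000
print(sys.getsizeof({"1": big}))   # size of the dict alone
print(total_size_of({"1": big}))   # dict plus key plus the large string
```

The exact numbers depend on the interpreter version and platform, so none are shown here.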
A template for immutable Python objects
First, the immutable object template, then the explanation:
import operator

class Circle(tuple):
    __slots__ = ()
    # An immutable and unique marker, used to make sure different
    # tuple subclasses are not equal to each other.
    _MARKER = object()

    size = property(operator.itemgetter(1))
    color = property(operator.itemgetter(2))

    def __new__(cls, size, color):
        """
        @param size: an int
        @param color: a str
        """
        return tuple.__new__(cls, (cls._MARKER, size, color))

    def __repr__(self):
        return '%s(%r, %r)' % (self.__class__.__name__, self[1], self[2])

    def double(self):
        """
        Get a Circle twice the size of this one.
        """
        return self.__class__(self.size * 2, self.color)
Why bother? Well, compared to normal user-defined class instances, Circle instances are immutable, have a __hash__ (hashes contents), and have better default comparison operators (compares contents). Everything works as you would expect:
>>> Circle(3, "red") == Circle(3, "red")
True
>>> Circle(3, "red") == Circle(3, "orange")
False
>>> Circle(3, "red") == (3, "red")
False
>>> Circle(3, "red").size
3
>>> Circle(3, "red").color
'red'
>>> a = set()
>>> a.add(Circle(3, "red"))
>>> a.add(Circle(3, "red"))
>>> a.add(Circle(4, "green"))
>>> a
set([Circle(3, 'red'), Circle(4, 'green')])
>>> c = Circle(2, "red")
>>> c.double()
Circle(4, 'red')
>>> c.color = 'blue'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
If you wanted hashability and good comparisons, couldn’t you just add __hash__ and comparison operators to your normal class? Yes, but then hashing and comparison would call into slower[1] Python code rather than tuple’s native methods. And since your object is not really immutable, a user of your API might be tempted to mutate an object that really shouldn’t be mutated.
The above hack is actually built in to Python as collections.namedtuple (see the source). It works by generating code (like the above template) and execing it. There are a few reasons you might not want it, though:
1. namedtuple is available in Python 2.6+ only (though there are some alternate implementations).
2. You cannot add your own methods or customize the __repr__.
3. You cannot validate parameters passed to the constructor.
4. If you want to add docstrings to your namedtuple, you probably have to subclass it.
5. Completely different namedtuples are equal to each other if they have the same contents:
>>> from collections import namedtuple
>>> A = namedtuple('A', 'x y')
>>> B = namedtuple('B', 'z t')
>>> A(1, 2) == B(1, 2)
True
Then again, there are reasons not to use the above “immutable object template” either: it’s very easy to mess up, you need rather superfluous unit tests for attribute access, and it might scare away new Python programmers. There are also a few surprises: Circle is len()able, indexable, and sliceable. But it is the least-terrible solution I could come up with.
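To make the surprises concrete, here is a trimmed copy of the template plus a second class built the same way. Square is hypothetical and exists only for this sketch:

```python
import operator

class Circle(tuple):
    __slots__ = ()
    _MARKER = object()
    size = property(operator.itemgetter(1))
    color = property(operator.itemgetter(2))

    def __new__(cls, size, color):
        return tuple.__new__(cls, (cls._MARKER, size, color))

class Square(tuple):
    # Hypothetical second class using the same template.
    __slots__ = ()
    _MARKER = object()
    side = property(operator.itemgetter(1))
    color = property(operator.itemgetter(2))

    def __new__(cls, side, color):
        return tuple.__new__(cls, (cls._MARKER, side, color))

c = Circle(3, "red")
print(len(c))                  # 3: the marker plus the two fields
print(c[1])                    # 3: indexing exposes the raw tuple layout
print(c[1:])                   # (3, 'red'): slicing gives a plain tuple
print(c == Square(3, "red"))   # False: the two markers are different objects
```

The marker occupies index 0, which is why the properties read indexes 1 and 2.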
[1] this might not be the case with PyPy
How to test your __eq__ / __ne__ / __cmp__
In Python, a common mistake is to implement __eq__ on your object without also implementing __ne__. Even worse, your unit tests will often hide the error because the default object-identity __ne__ will probably satisfy your assertions.
If you’ve implemented __eq__ and __ne__, you might still have a mistake if the superclass has a __cmp__: Python’s cmp will fall back to the superclass’ __cmp__ instead of using your __eq__ (example). You’ll probably never notice this problem unless you use cmp(...) on your object.
When I need to exercise all possible combinations of ==, !=, and cmp, I mix this into my TestCases and use self.assertReally(Not)Equal(a, b):
class ReallyEqualMixin(object):
    def assertReallyEqual(self, a, b):
        # assertEqual first, because it will have a good message if the
        # assertion fails.
        self.assertEqual(a, b)
        self.assertEqual(b, a)
        self.assertTrue(a == b)
        self.assertTrue(b == a)
        self.assertFalse(a != b)
        self.assertFalse(b != a)
        self.assertEqual(0, cmp(a, b))
        self.assertEqual(0, cmp(b, a))

    def assertReallyNotEqual(self, a, b):
        # assertNotEqual first, because it will have a good message if the
        # assertion fails.
        self.assertNotEqual(a, b)
        self.assertNotEqual(b, a)
        self.assertFalse(a == b)
        self.assertFalse(b == a)
        self.assertTrue(a != b)
        self.assertTrue(b != a)
        self.assertNotEqual(0, cmp(a, b))
        self.assertNotEqual(0, cmp(b, a))
No © on the above, enjoy.
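A usage sketch, written for Python 3, where cmp() no longer exists and so the cmp assertions are dropped. ReallyEqualMixin3, Point and run_point_tests are hypothetical names for this example only:

```python
import unittest

class ReallyEqualMixin3(object):
    # Same idea as the mixin above, minus the cmp() checks.
    def assertReallyEqual(self, a, b):
        self.assertEqual(a, b)
        self.assertEqual(b, a)
        self.assertTrue(a == b)
        self.assertTrue(b == a)
        self.assertFalse(a != b)
        self.assertFalse(b != a)

    def assertReallyNotEqual(self, a, b):
        self.assertNotEqual(a, b)
        self.assertNotEqual(b, a)
        self.assertFalse(a == b)
        self.assertFalse(b == a)
        self.assertTrue(a != b)
        self.assertTrue(b != a)

class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __ne__(self, other):
        # Explicit __ne__, as the post recommends for Python 2.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        return hash((self.x, self.y))

class PointTests(ReallyEqualMixin3, unittest.TestCase):
    def test_equality(self):
        self.assertReallyEqual(Point(1, 2), Point(1, 2))
        self.assertReallyNotEqual(Point(1, 2), Point(1, 3))
        self.assertReallyNotEqual(Point(1, 2), (1, 2))

def run_point_tests():
    # Run the test case programmatically and return the result object.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PointTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```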
Further reading: How to override comparison operators in Python
Notes on subclassing Python’s dict
Update 2011-05-10: This post was written after implementing securedict in Securetypes. If this post doesn’t make sense, see the code.
The notes:
If for some reason you must subclass Python’s dict, keep these in mind:
1. Both dict.__init__ and dict.update use the update algorithm, which doesn’t necessarily iterate over the object you pass in:
>>> help({}.update)
D.update(E, **F) -> None. Update D from dict/iterable E and F.
If E has a .keys() method, does: for k in E: D[k] = E[k]
If E lacks .keys() method, does: for (k, v) in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
Above, for k in E: D[k] should actually read for k in E.keys(): D[k].
It also omits a CPython implementation quirk: the update algorithm has a fast path for dicts (and subclasses of it), which ignores the keys method. CPython’s dictobject.c actually does this:
D.update(E, **F) -> None. Update D from dict/iterable E and F.
If isinstance(E, dict), does: for k in E: D[k] = E[k], bypassing E.__iter__
Else if E has a .keys() method, does: for k in E.keys(): D[k] = E[k]
Else if E lacks .keys() method, does: for (k, v) in E: D[k] = v
In any case, this is followed by: for k in F: D[k] = F[k]
2. If you override __eq__ and __ne__, remember to override __cmp__ as well, or else cmp(yourCustomDict, ...) will be broken.
3. If your custom dict behaves a lot like the real Python dict, consider copying many of the unit tests from CPython’s Lib/test/test_dict.py:DictTest. These tests have some omissions, though: they don’t test .iteritems(), the new .view*() methods, or dict instantiation with **kwargs. They’re also missing comprehensive equality tests.
4. A custom __repr__ is tricky to implement, if you’re trying to avoid infinite recursion when the dict contains itself. Built-in types solve the problem by repr’ing to something like [[...]] or {"a": {...}}. In your dict subclass with a custom __repr__, use an instance variable to track whether you’ve already been __repr__’ed, and remember to reset that variable in a finally: block.
5. Do you want .copy() to return an instance of your own custom dict? If so, better implement it.
6. In Python 2.7+, dicts have three new methods: viewkeys, viewitems, and viewvalues. Changing their behavior in a good way doesn’t look practical.
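Notes 4 and 5 can be sketched roughly as follows. TaggedDict is a hypothetical subclass, not securedict, and the flag-based guard is not thread-safe. The expected output assumes insertion-ordered dicts (Python 3.7+):

```python
class TaggedDict(dict):
    # Class-level default; __repr__ shadows it with an instance attribute.
    _in_repr = False

    def __repr__(self):
        if self._in_repr:
            # We are already inside our own __repr__: stop recursing.
            return "TaggedDict({...})"
        self._in_repr = True
        try:
            items = ", ".join("%r: %r" % (k, v) for k, v in self.items())
            return "TaggedDict({%s})" % items
        finally:
            # Always reset, even if repr() of a child raised.
            self._in_repr = False

    def copy(self):
        # Return our own type rather than a plain dict.
        return self.__class__(self)

d = TaggedDict(a=1)
d["me"] = d
print(repr(d))                   # TaggedDict({'a': 1, 'me': TaggedDict({...})})
print(type(d.copy()).__name__)   # TaggedDict
```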
Don’t use from __future__ import division
tl;dr: For maintenance reasons, just float() one of the values instead.
Don’t use from __future__ import division. Consider these two cases:
1. You’re moving a block of code from a file without “future division” to a file with “future division”. You forget to change all of the /s to //s, and are screwed (because you have incomplete test coverage). Or maybe you’re moving a block of code the other way, and similarly forget to change things.
2. You have a module with from __future__ import division, but all division operations were removed in an earlier commit. Can you now remove the from __future__ import? Maybe not, and you might keep it forever, just in case there are outstanding patches to the module. But not everyone will follow that logic.
Summary: subtle global behavior mutation is bad, even if scoped to a single file.
(Consider ignoring all of this if you’re developing for both Python 2 and Python 3.)
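A minimal sketch of the tl;dr, with made-up function names. Both functions behave the same whether or not the file has the __future__ import:

```python
def average(total, count):
    # float() on one operand gives true division on Python 2 and
    # Python 3 alike, regardless of any __future__ import.
    return float(total) / count

def full_pages(items, per_page):
    # // always means floor division, so intent is explicit here too.
    return items // per_page

print(average(7, 2))      # 3.5
print(full_pages(7, 2))   # 3
```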
Python < 2.5 and unicode/str comparisons
Comparing strings to unicode objects should never have been possible, but it does “work”, and you’ve probably seen this behavior in Python 2.5 – 2.7:
Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
>>> u"\xff" == "\xff"
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
To do the comparison, Python calls unicode() on the str object behind the scenes, and if it cannot decode it, it emits a warning and returns False.
If you’re still maintaining software that must run on Python 2.4 (or worse), you might run into this old behavior:
Python 2.4.6 (#1, Aug 2 2010, 18:27:11)
>>> u"\xff" == "\xff"
Traceback (most recent call last):
File "
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
Also, if you’re writing tests that involve this, keep in mind that Python 2.4 does not have a UnicodeWarning.
(After I wrote this, I found that it was documented in What’s New in Python 2.5.)
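One way to sidestep the version difference is to decode explicitly before comparing. This is a sketch with a hypothetical helper name, and the choice to treat undecodable bytes as "not equal" is my own assumption about what a caller would want:

```python
def text_equals_bytes(text, raw, encoding="utf-8"):
    # Decode explicitly instead of relying on the implicit conversion,
    # so the result is the same on every Python version.
    try:
        decoded = raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return text == decoded

print(text_equals_bytes(u"\xff", b"\xff"))              # False: not valid UTF-8
print(text_equals_bytes(u"\xff", b"\xc3\xbf"))          # True
print(text_equals_bytes(u"\xff", b"\xff", "latin-1"))   # True
```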
Monoclock: Access the monotonic clock from Python
Python’s time.time() jumps around if the system time changes. The monotonic clock doesn’t, and climbs steadily upwards. Unfortunately, Python doesn’t give you easy access to the monotonic clock. A 50-line Python module may help you out, at least if you have a POSIX-like OS that has librt (I tested only Linux).
Get it at: https://github.com/ludios/Monoclock
Let me know if it works, or doesn’t. Please contribute, especially if you’ve implemented Windows or OS X support.
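On newer interpreters (Python 3.3 and later) the standard library exposes time.monotonic(), so an extra module may not be needed there. A small sketch of timing a call with it; measure is a hypothetical helper:

```python
import time

def measure(func, *args):
    # Time a call with the monotonic clock, so a system clock change
    # during the call cannot distort the measured duration.
    start = time.monotonic()
    result = func(*args)
    return result, time.monotonic() - start

result, elapsed = measure(sum, range(1000))
print(result)          # 499500
print(elapsed >= 0)    # True
```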
Protojson: JSON serialization for Protocol Buffers
Google’s Protocol Buffers serializes to and deserializes from a compact binary format. As of this writing, there’s an open ticket for JSON serialization support. JSON is very useful when transporting data to web applications, or if you want human-readable bytes on the wire.
I recently wrote Protojson, a protobuf Message<->lists encoder/decoder in Python. You can get it at GitHub.
Protojson requires the google.protobuf Python module, because it works with google.protobuf.message.Messages. Right now, Protojson only supports the PbLite (PbJsLite) format. You can use Protojson to send and receive Messages from web applications with Closure Library’s goog.proto2, or even use it for non-webapp purposes. If you’re interested in goog.proto2, see this thread, which links to a .proto->.pb.js compiler.
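To give a feel for the wire format, here is a rough sketch of the PbLite layout as I understand it from Closure Library: each field value sits in a list at the index of its tag number, with index 0 unused. This is not Protojson's API; to_pblite is a hypothetical function that works on plain dicts, and the handling of an empty message is an assumption:

```python
import json

def to_pblite(fields):
    # fields: dict mapping protobuf tag numbers (positive ints) to
    # JSON-compatible values. Unset tags, and index 0, stay None.
    if not fields:
        return []
    result = [None] * (max(fields) + 1)
    for tag, value in fields.items():
        result[tag] = value
    return result

print(json.dumps(to_pblite({1: "hello", 3: 42})))   # [null, "hello", null, 42]
```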
I haven’t benchmarked Protojson, but I wouldn’t be surprised if Protojson+simplejson was faster than the binary serialization (at least for Python).
Try it out and let me know how it works.