ivank

See /ivank/about/. mailto:ivan@ludios.org

Python NaN equality rules (and cw.eq.equals)

>>> float('nan') == float('nan')
False
>>> n = float('nan')
>>> n
nan
>>> n == n
False
>>> [float('nan')] == [float('nan')]
False # Note: in PyPy, True
>>> [n] == [n]
True
>>> nl = [float('nan')]
>>> nl == nl
True

Got it? Good. (The behavior above is caused by various object-identity shortcuts for either the NaN or the list object.)
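To make the identity shortcut concrete, here is a minimal pure-Python model of it. The names items_equal and list_equal are made up for illustration, and this is only a sketch of the idea under CPython, not how the interpreter is actually written:

```python
def items_equal(a, b):
    # Identity is checked first; __eq__ is only consulted for distinct objects.
    return a is b or a == b

def list_equal(xs, ys):
    # Hypothetical model of list comparison built on the shortcut above.
    return len(xs) == len(ys) and all(items_equal(x, y) for x, y in zip(xs, ys))

n = float('nan')
same_object = list_equal([n], [n])                          # shortcut applies
fresh_objects = list_equal([float('nan')], [float('nan')])  # falls through to ==
```

With the same NaN object on both sides the shortcut reports equality; with two separately created NaN objects the comparison falls back to ==, which is False for NaN.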

If you’re wondering how this works in JavaScript, well, it doesn’t, because JavaScript doesn’t have any kind of deep-equality comparison. JavaScript Arrays and Objects compare by identity. But in Coreweb’s cw.eq.equals, I didn’t implement the object-identity shortcut, so NaN comparison works correctly:

>>> Number.NaN == Number.NaN
false
>>> n = Number.NaN
NaN
>>> n == n
false
>>> cw.eq.equals([Number.NaN], [Number.NaN])
false
>>> cw.eq.equals([n], [n])
false
>>> nl = [Number.NaN]
[NaN]
>>> cw.eq.equals(nl, nl)
false
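
For comparison, a deep-equality function without the identity shortcut could look roughly like this in Python. strict_equals is a hypothetical name and this is not Coreweb's code, just a sketch of the same design choice; it does not handle circular references:

```python
def strict_equals(a, b):
    # Hypothetical helper: deep equality that never short-circuits on identity.
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        return (type(a) is type(b) and len(a) == len(b)
                and all(strict_equals(x, y) for x, y in zip(a, b)))
    if isinstance(a, dict) and isinstance(b, dict):
        return (a.keys() == b.keys()
                and all(strict_equals(a[k], b[k]) for k in a))
    # Leaves are compared with plain ==, so NaN is never equal to itself.
    return a == b

n = float('nan')
nl = [n]
nan_list_equals_itself = strict_equals(nl, nl)
```

Because every leaf goes through ==, a list containing NaN is not equal even to itself, matching the cw.eq.equals results above.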

Sublime Text + Samba + GHC problem

If you’re using Sublime Text on Windows to edit files on a network drive powered by Samba, and then run a Haskell program on the system running Samba (not the machine running Sublime Text), you might see:

openBinaryFile: resource exhausted (Resource temporarily unavailable)

or

openFile: resource exhausted (Resource temporarily unavailable)

This problem is caused by Samba grabbing oplocks (opportunistic locks) on the files being edited. Sublime Text makes system calls that result in the oplocks being grabbed, while most other editors (including WordPad) do not. You can work around the problem by adding oplocks = no to the configuration for your share in smb.conf. Using Samba has some reading on oplocks.

Preserving italics in pastes from webpages

Tired of losing italics in pastes from websites to your plain-text editor? For a while, I thought I needed to switch to a rich-text editor to handle these pastes, but that would be a pain for several reasons. A few days ago I realized I just needed to mutate the webpage by surrounding italic text with slashes. I would run a bookmarklet, then copy/paste. (Skip to the end of this post for a working bookmarklet.)

My first attempt at this used the :before and :after CSS pseudo-elements:

javascript:(function() {
	var newSS, styles='em:before, em:after, i:before, i:after { content: "/" }';
	if(document.createStyleSheet) {
		document.createStyleSheet("javascript:'" + styles + "'");
	} else {
		newSS = document.createElement('link');
		newSS.rel = 'stylesheet';
		newSS.href = 'data:text/css,' + escape(styles);
		document.getElementsByTagName("head")[0].appendChild(newSS);
	}
})();

If you run the above (paste into your URL bar and hit enter), it looks like it works until you try to paste the text containing slashes; neither Chrome 10 nor Firefox 4 will put the slashes in your clipboard. I threw out the simple, clean, non-working code and went for a terrible hack:

javascript:(function() { document.body.innerHTML = document.body.innerHTML.
	replace(/<em(\s.*?)?>([\s\S]+?)<\/em>/gi, "/<em$1>$2</em>/").
	replace(/<i(\s.*?)?>([\s\S]+?)<\/i>/gi, "/<i$1>$2</i>/").
	replace(/<span style="font-style: italic;">([\s\S]+?)<\/span>/gi, '/<span style="font-style: italic;">$1</span>/').
	replace(/<strong(\s.*?)?>([\s\S]+?)<\/strong>/gi, "*<strong$1>$2</strong>*").
	replace(/<b(\s.*?)?>([\s\S]+?)<\/b>/gi, "*<b$1>$2</b>*"); })();

The above replaces document.body.innerHTML with a new string where italics are surrounded with “/” and bold is surrounded by “*”. This is terrible in several ways: it may break or re-run JavaScript code on the page, or somehow make you vulnerable to XSS (in a manner likely similar to the IE8 XSS “blocker”). Only while writing this post did I realize the security implications of the above. I decided to implement it using DOM manipulation (requires CSS3 selectors and querySelectorAll):

javascript:(function() { 
	var wrapElements = function(selector, wrapperText) {
		var elements = document.querySelectorAll(selector);
		for(var i=0; i < elements.length; i++) {
			var surround = elements[i];
			if(surround.parentNode) {
				surround.parentNode.insertBefore(
					document.createTextNode(wrapperText), surround);
			}
			surround.appendChild(
				document.createTextNode(wrapperText));
		}
	};
 
	var getStyleAttrVariations = function(elem, property, value) {
		return [
			 elem + '[style="' + property + ': ' + value + ';"]'
			,elem + '[style="' + property + ': ' + value + '"]'
			,elem + '[style="' + property + ':'  + value + ';"]'
			,elem + '[style="' + property + ':'  + value + '"]'
		];
	};
 
	wrapElements('em, i, span[class~="emphasis"], span[class~="italic"]', '/');
	wrapElements(getStyleAttrVariations('span', 'font-style', 'italic').join(' ,'), '/');
 
	wrapElements('strong, b, span[class~="bold"]', '*');
	wrapElements(getStyleAttrVariations('span', 'font-weight', 'bold').join(' ,'), '*');
})();
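
The same wrap-the-emphasis idea can also be applied outside the browser, to HTML you already have as a string. Here is a rough Python sketch using the standard library's html.parser; the class and function names are made up for illustration, and unlike the bookmarklet it only looks at tag names, not style attributes or classes:

```python
from html.parser import HTMLParser

class MarkupToPlainText(HTMLParser):
    # Hypothetical helper: emit "/" around italics and "*" around bold.
    WRAPPERS = {"em": "/", "i": "/", "strong": "*", "b": "*"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; unknown tags contribute nothing.
        self.parts.append(self.WRAPPERS.get(tag, ""))

    def handle_endtag(self, tag):
        self.parts.append(self.WRAPPERS.get(tag, ""))

    def handle_data(self, data):
        self.parts.append(data)

def to_plain_text(html):
    parser = MarkupToPlainText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

result = to_plain_text("<p>Some <em>italic</em> and <b>bold</b> text</p>")
```

Here result is the string "Some /italic/ and *bold* text".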

If you don’t want to manually create a bookmark with the above code, you can drag this link to your toolbar or bookmarks:

Preserve italics and bold

I use a bookmark keyword in Firefox to activate it; keywords are not just for searches.

Here’s some bold and italic text so you know it’s working: bold, italic.

Git repos have moved to GitHub

In case you cloned Monoclock or Protojson, http://ludios.net/git/ no longer exists; everything is now at https://github.com/ludios

Performance improvements in Python Protocol Buffers

protobuf’s Python implementation has been known for its slowness, but that might be changing. From a 2010-11-01 changelog:

  Python
  * Added an experimental  C++ implementation for Python messages via a Python
    extension. Implementation type is controlled by an environment variable
    PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION (valid values: "cpp" and "python")
    The default value is currently "python" but will be changed to "cpp" in
    future release.
  * Improved performance on message instantiation significantly.
    Most of the work on message instantiation is done just once per message
    class, instead of once per message instance.
  * Improved performance on text message parsing.

http://code.google.com/p/protobuf/source/detail?r=349
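
If you want to try the experimental implementation, one way to opt in from Python itself is to set the environment variable before anything imports protobuf. This sketch assumes the variable is read when the protobuf package is first imported, and that your build actually includes the C++ extension:

```python
import os

# Assumption: this must run before the first protobuf import in the process.
# Valid values per the changelog above: "cpp" and "python".
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "cpp"

# Import your generated *_pb2 modules only after this point.
selected = os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"]
```

Setting the variable in the shell before launching Python should have the same effect.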

Also, if you like Protocol Buffers and JSON, check out Protojson.

Getting the total size of a built-in Python object

Ever notice how sys.getsizeof doesn’t include the size of the object’s children?

>>> sys.getsizeof({})       
136
>>> sys.getsizeof({"1": "x" * 1000000})
136

I don’t know if there is a truly good use for this, but someone in #python wanted it, so here it is:

import sys
 
def totalSizeOf(obj, _alreadySeen=None):
	"""
	Get the size of object C{obj} using L{sys.getsizeof} on the object
	itself and all of its children recursively.  If the same object appears
	more than once inside C{obj}, it is counted only once.
 
	This only works properly if C{obj} is a str, unicode, list, tuple, dict,
	set, frozenset, bool, NoneType, int, complex, float, long, or any nested
	combination of the above.  C{obj} is allowed to have circular references.
 
	This might be useful for getting a good estimate of how much memory a
	JSON-decoded object is using after receiving it.
 
	Design notes: L{sys.getsizeof} returns reasonable numbers, but does not
	recurse into the object's children.  As we recurse into the children, we
	keep track of objects we've already counted for two reasons:
		- If we've already counted the object's memory usage, we don't
		want to count it again.
		- As a bonus, we handle circular references gracefully.
 
	This function assumes that containers do not modify their children as
	they are traversed.
	"""
	if _alreadySeen is None:
		_alreadySeen = set()
 
	total = sys.getsizeof(obj)
	_alreadySeen.add(id(obj))
 
	if isinstance(obj, dict):
		# Count the memory usage of both the keys and values.
		for k, v in obj.iteritems():
			if not id(k) in _alreadySeen:
				total += totalSizeOf(k, _alreadySeen)
			if not id(v) in _alreadySeen:
				total += totalSizeOf(v, _alreadySeen)
	else:
		try:
			iterator = obj.__iter__()
		except (TypeError, AttributeError):
			pass
		else:
			for item in iterator:
				if not id(item) in _alreadySeen:
					total += totalSizeOf(item, _alreadySeen)
 
	return total

No © on the above, enjoy.

>>> totalSizeOf({})
136
>>> totalSizeOf({"1": "x" * 1000000})  
1000179

If you want unit tests, see mypy/test/test_objops.py.
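
The function above is written for Python 2 (iteritems, unicode, long). For Python 3, a rough sketch of the same idea might look like the following. The name total_size_of and the explicit list of container types are choices made for this sketch, not part of the original; strings are treated as leaves so that they are not iterated character by character:

```python
import sys

def total_size_of(obj, _seen=None):
    # Sketch of a Python 3 variant: each object is counted once, by id().
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:
        return 0
    _seen.add(id(obj))

    total = sys.getsizeof(obj)
    if isinstance(obj, dict):
        # Count both keys and values.
        for k, v in obj.items():
            total += total_size_of(k, _seen)
            total += total_size_of(v, _seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            total += total_size_of(item, _seen)
    return total

shared = "y" * 1000
outer = [shared, shared]   # the same string appears twice
loop = []
loop.append(loop)          # a list that contains itself
```

total_size_of(outer) counts shared once, and total_size_of(loop) terminates instead of recursing forever. The exact numbers depend on the interpreter and platform.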

A template for immutable Python objects

First, the immutable object template, then the explanation:

import operator
 
class Circle(tuple):
	__slots__ = ()
	# An immutable and unique marker, used to make sure different
	# tuple subclasses are not equal to each other.
	_MARKER = object()
 
	size = property(operator.itemgetter(1))
	color = property(operator.itemgetter(2))
 
	def __new__(cls, size, color):
		"""
		@param size: an int
		@param color: a str
		"""
		return tuple.__new__(cls, (cls._MARKER, size, color))
 
	def __repr__(self):
		return '%s(%r, %r)' % (self.__class__.__name__, self[1], self[2])
 
	def double(self):
		"""
		Get a Circle twice the size of this one.
		"""
		return self.__class__(self.size * 2, self.color)

Why bother? Well, compared to normal user-defined class instances, Circle instances are immutable, have a __hash__ (hashes contents), and have better default comparison operators (compares contents). Everything works as you would expect:

>>> Circle(3, "red") == Circle(3, "red")
True
>>> Circle(3, "red") == Circle(3, "orange")
False
>>> Circle(3, "red") == (3, "red")
False
>>> Circle(3, "red").size
3
>>> Circle(3, "red").color
'red'
>>> a = set()
>>> a.add(Circle(3, "red"))
>>> a.add(Circle(3, "red"))
>>> a.add(Circle(4, "green")) 
>>> a
set([Circle(3, 'red'), Circle(4, 'green')])
>>> c = Circle(2, "red")
>>> c.double()
Circle(4, 'red')
>>> c.color = 'blue'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute

If you wanted hashability and good comparisons, couldn’t you just add __hash__ and comparison operators to your normal class? Yes, but then hashing and comparison would call into slower[1] Python code rather than tuple’s native methods. And since your object is not really immutable, a user of your API might be tempted to mutate an object that really shouldn’t be mutated.

The above hack is actually built in to Python as collections.namedtuple (see the source). It works by generating code (like the above template) and execing it. There are a few reasons you might not want it, though:

1. namedtuple is available in Python 2.6+ only (though there are some alternate implementations).
2. You cannot add your own methods or customize the __repr__.
3. You cannot validate parameters passed to the constructor.
4. If you want to add docstrings to your namedtuple, you probably have to subclass it.
5. Completely different namedtuples are equal to each other if they have the same contents:

>>> from collections import namedtuple
>>> A = namedtuple('A', 'x y')
>>> B = namedtuple('B', 'z t')
>>> A(1, 2) == B(1, 2)
True

Then again, there are reasons not to use the above “immutable object template” either: it’s very easy to mess up, you need rather superfluous unit tests for attribute access, and it might scare away new Python programmers. There are also a few surprises: Circle is len()able, indexable, and sliceable. But it is the least-terrible solution I could come up with.
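
To see both the marker and the surprises in action, here is a small self-contained sketch. Circle is a trimmed copy of the template; Square is a hypothetical second class added only for the comparison:

```python
import operator
from collections import namedtuple

class Circle(tuple):
    __slots__ = ()
    _MARKER = object()
    size = property(operator.itemgetter(1))
    color = property(operator.itemgetter(2))

    def __new__(cls, size, color):
        return tuple.__new__(cls, (cls._MARKER, size, color))

class Square(tuple):
    # Hypothetical second class built from the same template.
    __slots__ = ()
    _MARKER = object()
    side = property(operator.itemgetter(1))
    color = property(operator.itemgetter(2))

    def __new__(cls, side, color):
        return tuple.__new__(cls, (cls._MARKER, side, color))

A = namedtuple('A', 'x y')
B = namedtuple('B', 'z t')

c = Circle(3, "red")
namedtuples_equal = A(3, "red") == B(3, "red")  # True: only contents are compared
templates_equal = c == Square(3, "red")         # False: the markers differ
length = len(c)                                 # 3: the marker occupies index 0
tail = c[1:]                                    # a plain tuple, (3, 'red')
```

Note that len() reports 3 rather than 2, c[0] is the marker object, and slicing gives back an ordinary tuple rather than a Circle.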

[1] this might not be the case with PyPy

Worthwhile programming talks

Rich Hickey: Persistent Data Structures and Managed References: Clojure’s approach to Identity and State

Cliff Click: Not Your Father’s Von Neumann Machine: A Crash Course in Modern Hardware

Simon Peyton Jones: Data Parallel Haskell

I might update this post if I find any more great talks in the next few months.

The Clo*ure namespace is officially full

closure (English),
closure (programming),
Clojure (language),
Clozure (language),
Closure Library,
Closure Compiler,
Clojure Library,
and that’s not all….

Adobe is finally working on Flash Socket write() feedback

A reason to rejoice:

We are working on this feature. It is definitely under work, so it will be available soon guys.

Socket class write() – information feedback

This will allow you to know how much written data has actually made it to your kernel. Without this information, you cannot do flow control unless the server is constantly responding with TCP-level data.

This “write feedback” information is available in WebSocket (in the bufferedAmount property), but ideally it would dispatch events too.
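
To illustrate why the feedback matters, here is a toy Python simulation of sender-side flow control driven by a bufferedAmount-style counter. FakeSocket, flush_some, and pump are invented for this sketch and do not correspond to any real Flash or WebSocket API:

```python
from collections import deque

class FakeSocket:
    # Hypothetical stand-in for a socket that reports its unsent backlog.
    def __init__(self):
        self.buffered_amount = 0  # bytes written but not yet handed to the kernel

    def write(self, data):
        self.buffered_amount += len(data)

    def flush_some(self, n):
        # Simulate the runtime handing up to n bytes to the kernel.
        self.buffered_amount = max(0, self.buffered_amount - n)

def pump(sock, queue, high_water_mark):
    # Write queued chunks only while the backlog is below the mark.
    sent = 0
    while queue and sock.buffered_amount < high_water_mark:
        sock.write(queue.popleft())
        sent += 1
    return sent

sock = FakeSocket()
queue = deque([b"x" * 100 for _ in range(5)])
first_round = pump(sock, queue, 250)   # stops once the backlog passes 250 bytes
sock.flush_some(200)                   # some data reaches the kernel
second_round = pump(sock, queue, 250)  # the sender resumes
```

Without a counter like buffered_amount, the sender has no way to know when to stop queuing data, which is the gap the requested feature would close.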