Lack of persistent collections considered harmful

Can you spot the bug?

class Protocol:
      def __init__(self, tracker):
            self.tracker = tracker
            self.tracker.trackProto(self)

      def handleMessage(self, msg):
            # Broadcast the message to everyone
            for proto in self.tracker.protocols:
                  proto.tellPeer(msg)

      def tellPeer(self, msg):
            print msg
            if msg == "close":
                  self.tracker.forgetProto(self)

class Tracker:
      def __init__(self):
            self.protocols = set()

      def trackProto(self, proto):
            self.protocols.add(proto)

      def forgetProto(self, proto):
            self.protocols.remove(proto)

tracker = Tracker()
protocols = [Protocol(tracker), Protocol(tracker), Protocol(tracker)]
protocols[0].handleMessage("heya")
protocols[0].handleMessage("close")

(Spoilers follow)

.

.

.

.

.

.

.

.

.

The problem is that the set object self.tracker.protocols changes during iteration, raising a RuntimeError: Set changed size during iteration. This kind of bug is completely non-obvious, except to those who imagine all possible stack traces in their head. We could add a .copy() to work around this, but then we have to explain in a comment why we did so. There's also the penalty of copying the whole thing.

This bug happens over and over again, just about everywhere.

How about an easier one?

import random

def do(actions=['one']):
      if random.random() < 0.1:
            actions.append('two')
      return actions

for i in xrange(100):
      print do()

(Spoilers follow)

.

.

.

.

.

.

.

.

.

Of course, 10% of the time, actions will be mutated, and at some point the function will return ['one', 'two', 'two']. Initially, this looks like a "Python should copy the default argument" problem, but that idea doesn't work: you're not limited to literals; you can reference another object, which can be anything. Should Python copy that, too? The problem is really that almost every collection in the language is mutable, for no good reason.

What's wrong with this simple observer system?

observers = set()

def addObserver(observer):
      observers.add(observer)

def removeObserver(observer):
      observers.remove(observer)

def tellObservers():
      events = ['mouse_move', 'click']
      for o in observers:
            o(events)

###

def myObserver(events):
      events.append('key_down')
      removeObserver(myObserver)

def myOtherObserver(events):
      print events

addObserver(myObserver)
addObserver(myOtherObserver)
tellObservers()

(Spoilers follow)

.

.

.

.

.

.

.

.

.

Not one, but two things! First, we have the same problem as the first example: when tellObservers calls myObserver, the removeObserver call will mutate the observers set and raise RuntimeError. After we fix that, we still have our problematic events list. The first observer mutates it, messing it up for the second observer. You have two options at this juncture: do "defensive copying" where tellObservers calls o(events.copy()) instead of o(events), or document that observers must not mutate the list they get. Sound like a good use of everyone's time?

Do you want to write unit tests for all of these "edge cases", and hope you didn't miss anything? The real solution is to embrace persistent data structures. Immutable objects won't change while you're looking at them (or ever). Persistent data structures make modifications cheap through structural sharing. You will sleep better, not worrying about mutation issues.

Last modified July 28, 2012 .