Friday, August 5, 2011

Sets

Sets are python hash tables. You can think of them as just a dictionary without any values. So, they are a lot like lists, except they
  • Do not preserve order
  • Have fast lookups
  • Must be composed of hashable (read: immutable) elements
  • Must have unique elements
So, it looks like a bunch of downsides, but the fast lookup can really be killer. Par example:
>>> foo = range(10000)
>>> timeit int(rand()*100000) in foo
10000 loops, best of 3: 146 us per loop

>>> boo = set(range(10000))
>>> timeit int(rand()*100000) in boo
1000000 loops, best of 3: 479 ns per loop
Notice that the lookup time for a set is ~300 times faster. Sets also have nicely overloaded operators, so that we can do set operations
>>> A = set(range(5))
>>> B = set(range(2,10))
>>> 5 in A  #testing for element
False
>>> A | B  # A union B
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> A - B  # set difference
set([0,1])
>>> A & B  # set intersection
set([2,3,4])
>>> A <= B # testing subsets
False
Other nice functions to now are add and update. Add adds a single element, update will take the union of a set with something else.
>>> A
set([0,1,2,3,4])
>>> A.add(10)
>>> A
set([0,1,2,3,4,10])
>>> A.update(range(12))
>>> A
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
Full documentation here

1 comment: