Python provides the itertools module for constructing efficient iterators, so you don't have to implement your own.
Python version: 2.7
enumerate(iterable, start)
>>> names = ['Alice', 'Bob', 'Cindy']
>>> for index, element in enumerate(names, 6):
...     print '%d %s' % (index, element)
6 Alice
7 Bob
8 Cindy
| Type | Example |
|---|---|
| Infinite iterators | count, cycle, repeat |
| Iterators that terminate | chain, compress, dropwhile, ifilter, ifilterfalse, imap, islice, starmap |
| Combinatoric generators | product, permutations, combinations, combinations_with_replacement |
Infinite iterators
The itertools package comes with three iterators that can iterate infinitely: count, cycle, and repeat. When you use them, you need to set a break-out condition, or the loop will never end. They can be useful for generating numbers or cycling over iterables of unknown length, for example.
Iterators that terminate
As the title states, most of the iterators in itertools are not infinite; they terminate when their input is exhausted.
Combinatoric generators
The itertools library contains four iterators that can be used for creating combinations and permutations of data.
count(start, step)
The count iterator returns evenly spaced numbers, beginning with the start parameter and advancing by the step parameter.
>>> from itertools import count
>>> for i in count(start=10, step=1):
...     if i >= 15:
...         break
...     else:
...         print(i)
10
11
12
13
14
islice(iterable, stop) or islice(iterable, start, stop, step)
islice is another way to limit the output of an infinite iterator. Basically, islice takes a slice of the iterable by index and returns the selected items as an iterator. There are two forms: one takes only a stop count, and the other takes start, stop, and an optional step. If the iterable supports slicing and the range between start and stop is small, ordinary slice indexing can perform faster than islice, as the timing below shows.
>>> from itertools import count
>>> from itertools import islice
>>> for i in islice(count(10, 8), 5):
...     print(i)
10
18
26
34
42
>>> for i in islice("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 3, 8):
...     print(i)
D
E
F
G
H
>>> A = "ABCD"
>>> for j in xrange(1000000):
...     for i in A[3:15]:
...         pass
237 ms per loop
>>> for j in xrange(1000000):
...     for i in islice(A, 3, 15):
...         pass
354 ms per loop
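The timing above slices a plain string, which already supports ordinary slice indexing. The real advantage of islice is that it also works on iterators that cannot be sliced at all, such as generators. A minimal sketch (the naturals generator is a hypothetical example, not from the text above):

```python
from itertools import islice

def naturals():
    # A generator of the natural numbers; it supports iteration
    # but not indexing, so naturals()[3:8] would raise a TypeError.
    n = 0
    while True:
        yield n
        n += 1

# islice works where ordinary slice indexing cannot:
print(list(islice(naturals(), 3, 8)))  # [3, 4, 5, 6, 7]
```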
cycle(iterable)
The cycle iterator from itertools allows you to create an iterator that will cycle through a series of values infinitely.
>>> from itertools import cycle
>>> count = 0
>>> for item in cycle('XYZ'):
...     if count > 7:
...         break
...     print(item)
...     count += 1
...
X
Y
Z
X
Y
Z
X
Y
repeat(object, times)
The repeat iterator returns the same object over and over again forever, unless you set its times argument. It is quite similar to cycle, except that it doesn't cycle over a set of values; it yields a single object repeatedly.
>>> from itertools import repeat
>>> for i in repeat(2, 5):
...     print i
2
2
2
2
2
chain(*iterables) and chain.from_iterable(iterable)
There are three ways to achieve this concatenation: chain, chain.from_iterable, and the plain list operation. The chain iterator takes a series of iterables and flattens them into one long iterable. chain.from_iterable works slightly differently: instead of passing in a series of iterables as separate arguments, you pass in a single iterable of iterables, such as a nested list. In the test below, the plain list operation performs best in terms of time.
>>> from itertools import chain
>>> my_list = ['foo', ["four", "layer"], 'bar']
>>> numbers = list(range(5))
>>> cmd = ['ls', '/some/dir']
>>> chain_list = []
>>> for i in xrange(1000000):
...     chain_list = []
...     chain_list += cmd + numbers + my_list
322 ms per loop
>>> chain_list = []
>>> for i in xrange(1000000):
...     chain_list = []
...     chain_list = list(chain(my_list, numbers, cmd))
857 ms per loop
>>> chain_list = []
>>> for i in xrange(1000000):
...     chain_list = []
...     chain_list = list(chain.from_iterable([cmd, my_list, numbers]))
893 ms per loop
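The benchmarks above always materialize a full list, which plays to the list operation's strengths. chain's real benefit is laziness: you can iterate over a concatenation without ever building the combined sequence, which matters when the inputs are large or are themselves iterators. A sketch under that assumption:

```python
from itertools import chain, islice

# Pretend these are large inputs; chain never copies or concatenates them.
big_a = range(10 ** 6)
big_b = range(10 ** 6)

# Only the first five items are ever consumed; no combined
# two-million-element list is built for the concatenation.
first_five = list(islice(chain(big_a, big_b), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```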
compress(data, selectors)
compress is useful for filtering an iterable with a binary selector. The selector can be a list of booleans (or 1s and 0s). Note that compress stops when either input is exhausted, so if the selector is shorter than the data, the remaining data is dropped as though the missing entries were False; you may want to check the selector's length before the operation.
>>> from itertools import compress
>>> data = 'ABCDEFG'
>>> selector_1 = [True, False, True, True, False]
>>> list(compress(data, selector_1))
['A', 'C', 'D']
>>> selector_2 = [True, False, True, True, False, False, True]
>>> list(compress(data, selector_2))
['A', 'C', 'D', 'G']
dropwhile(predicate, iterable)
There is a neat little iterator in itertools called dropwhile. This iterator drops elements as long as the predicate is true; that is, the data does not appear in the output until the predicate first becomes false. The iterator behaves like a trigger that fires only once: after the first false result, all remaining elements are returned, even those that satisfy the predicate again.
>>> from itertools import dropwhile
>>> list(dropwhile(lambda x: x<5, [1,4,6,4,1]))
[6, 4, 1]
>>> list(dropwhile(lambda x: x<5, [1,4,6,4,1,10,11,5,7,4,3,2,1]))
[6, 4, 1, 10, 11, 5, 7, 4, 3, 2, 1]
>>> from itertools import dropwhile
>>> def greater_than_five(x):
...     return x > 5
>>> list(dropwhile(greater_than_five, [6, 7, 8, 9, 1, 2, 3, 10]))
[1, 2, 3, 10]
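dropwhile has a mirror image in itertools, takewhile, which is not shown in the examples above: it keeps the leading elements while the predicate is true, then stops at the first false result and ignores everything after it. A short sketch:

```python
from itertools import takewhile

# takewhile is the complement of dropwhile: it yields elements
# only until the predicate first becomes false, then stops for good.
print(list(takewhile(lambda x: x < 5, [1, 4, 6, 4, 1])))  # [1, 4]
```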
ifilter(predicate, iterable) and ifilterfalse(predicate, iterable)
Given a predicate, ifilter returns those values for which the predicate evaluates to True, and ifilterfalse returns those for which it evaluates to False.
>>> from itertools import ifilterfalse
>>> def greater_than_five(x):
...     return x > 5
>>> list(ifilterfalse(greater_than_five, [6, 7, 8, 9, 1, 2, 3, 10]))
[1, 2, 3]
>>> from itertools import ifilter
>>> def greater_than_five(x):
...     return x > 5
>>> list(ifilter(greater_than_five, [6, 7, 8, 9, 1, 2, 3, 10]))
[6, 7, 8, 9, 10]
imap(function, *iterables)
Make an iterator that computes the function using arguments from each of the iterables. If function is set to None, then imap() returns the arguments as a tuple. It is like map(), but it stops when the shortest iterable is exhausted instead of filling in None for shorter iterables. The reason for the difference is that infinite iterator arguments are typically an error for map() (because the output is fully evaluated), but they are a common and useful way of supplying arguments to imap().
>>> from itertools import imap
>>> for i in imap(pow, (2, 3, 10), (5, 2, 3)):
...     print i
32
9
1000
>>> for i in imap(pow, count(2), (5, 2, 3)):
...     print i
32
9
64
>>> for i in map(pow, count(2), (5, 2, 3)):
...     print i
TypeError: unsupported operand type(s) for ** or pow(): 'int' and 'NoneType'
>>> for i in map(lambda x, y: x + y, (4, 5, 2), (5, 2, 3)):
...     print i
9
7
5
starmap(function, iterable)
The starmap tool creates an iterator that computes the supplied function using arguments unpacked from the supplied iterable. As the official documentation mentions, “the difference between map() and starmap() parallels the distinction between function(a,b) and function(*c).”
>>> from itertools import starmap
>>> def mul_add(a, b, c):
...     return a + (b * c)
>>> for item in starmap(mul_add, [(2, 3, 6), (4, 5, 7)]):
...     print(item)
20
39
combinations(iterable, r)
itertools.combinations creates an iterator of combinations. The second parameter r represents the r in "n choose r", which can be written as \(C_{r}^{n}\).
>>> from itertools import combinations
>>> list(combinations('WXYZ', 2))
[('W', 'X'), ('W', 'Y'), ('W', 'Z'), ('X', 'Y'), ('X', 'Z'), ('Y', 'Z')]
combinations returns its results as tuples. We can loop over the iterator and join each tuple into a single string:
>>> for item in combinations('WXYZ', 2):
...     print(''.join(item))
...
WX
WY
WZ
XY
XZ
YZ
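combinations is only one of the four combinatoric generators mentioned earlier; the others are permutations, product, and combinations_with_replacement. A quick sketch of each on the same input:

```python
from itertools import (combinations_with_replacement, permutations,
                       product)

# Order matters and elements are not reused: P(3, 2) = 6 results.
print(list(permutations('XYZ', 2)))
# Cartesian product, equivalent to two nested for loops: 3 * 3 = 9 results.
print(list(product('XYZ', repeat=2)))
# Like combinations, but an element may be chosen more than once.
print(list(combinations_with_replacement('XYZ', 2)))
```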
- A generator is an iterator: an object that can be iterated through only once. The reason iterators are efficient has nothing to do with handing you the next item "by reference." They are efficient because they generate the next item on the fly: the next result is not produced until it is requested, and the items are never all generated at once.
- The keyword combination `for ... in` accepts an iterable object as its second argument. The generator is not a pointer. The iterable object can be a generator, but it can also be any other iterable object, such as a list, dict, string, or a user-defined type that provides the required functionality.
- The `iter` function is applied to the object to get an iterator. To be more precise, the object's `__iter__` method is called.
- If the call to `__iter__` is successful, the function `next()` is applied to the iterator over and over again in a loop, and the first variable supplied to `for ... in` is assigned the result of each `next()` call. To be more precise, it calls the iterator object's `__next__` method (`next` in Python 2). The for loop ends when `next()` raises the `StopIteration` exception.
- When used by itself, the `in` keyword first calls the object's `__contains__` method. If the iterable object is NOT a container (i.e. it doesn't have a `__contains__` method), `in` next tries to call the object's `__iter__` method and, if the call is successful, searches the returned iterator. Basically, an iterator is an object on which you can use the built-in generic function `next()`. If the object doesn't have an `__iter__` method to return an iterator, `in` then falls back on the old-style iteration protocol using the object's `__getitem__` method.
- If you wish to create your own object type to iterate over (i.e. you can use `for ... in`, or just `in`, on it), it's useful to know about the `yield` keyword, which is used in generators. The presence of `yield` turns a function or method into a generator instead of a regular function/method. You don't need a `__next__` method if you use a generator, because a generator brings `next` along with it automatically.
>>> class MyIterable():
...     def __iter__(self):
...         yield 1
>>> m = MyIterable()
>>> for _ in m: print(_)
1
>>> 1 in m
True
If you wish to create your own container object type (i.e. you can use `in` on it by itself, but NOT `for ... in`), you just need the `__contains__` method.
>>> class MyUselessContainer():
...     def __contains__(self, obj):
...         return True
>>> m = MyUselessContainer()
>>> 1 in m
True
>>> 'Foo' in m
True
>>> TypeError in m
True
>>> None in m
True
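To tie the protocol together, here is a minimal hand-written iterator class implementing `__iter__` and `next`/`__next__` itself, without relying on yield. CountDown is a hypothetical example, not something from the text above; it is written to work under both Python 2 and 3:

```python
class CountDown(object):
    """A hypothetical hand-written iterator that counts down to 1,
    implementing the iterator protocol without using yield."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator simply returns itself from __iter__.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__  # Python 2 spelling of the same method

print(list(CountDown(3)))  # [3, 2, 1]
print(2 in CountDown(3))   # True, found by iterating
```

Because the class has no `__contains__` method, the `in` test on the last line falls back on iteration, exactly as described in the bullets above.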