- `ChainMap`
- `defaultdict`
- `deque`
There is a submodule of collections called abc or Abstract Base Classes. These will not be covered in this post, let's start with the ChainMap container!
ChainMap
from collections import ChainMap car_parts = {'hood': 500, 'engine': 5000, 'front_door': 750} car_options = {'A/C': 1000, 'Turbo': 2500, 'rollbar': 300} car_accessories = {'cover': 100, 'hood_ornament': 150, 'seat_cover': 99} car_pricing = ChainMap(car_accessories, car_options, car_parts)
car_pricing >> ChainMap({'cover': 100, 'hood_ornament': 150, 'seat_cover': 99}, {'A/C': 1000, 'Turbo': 2500, 'rollbar': 300}, {'hood': 500, 'engine': 5000, 'front_door': 750}) car_pricing['hood'] >> 500
Finally, we try to access one of the keys of our ChainMap. When we do this, the ChainMap will go through each map to see if that key exists and has a value. If it does, then the ChainMap will return the first value it finds that matches that key.
This is especially useful if you want to set default values. Suppose we want to create an application that has some default values. The application will also know the operating system environment variables. If there is an environment variable that matches one of the keys we have as a default in our application, the environment will override our default value. In addition, let's assume that we can pass arguments to our application.
These arguments take precedence over the environment and the defaults. This is one place where a ChainMap can really come in handy. Let's look at a simple example that is based on one from the Python documentation:
Note: do not run this code from Jupyter Notebook, but from your favorite IDE and calling it from a terminal. this command `python chain_map.py -u daniel`
import argparse import os from collections import ChainMap def main(): app_defaults = {'username':'admin', 'password':'admin'} parser = argparse.ArgumentParser() parser.add_argument('-u', '--username') parser.add_argument('-p', '--password') args = parser.parse_args() command_line_arguments = {key:value for key, value in vars(args).items() if value} chain = ChainMap(command_line_arguments, os.environ, app_defaults) print(chain['username']) if __name__ == '__main__': main() os.environ['username'] = 'test' main()
➜ python python3 post.py -u daniel daniel daniel
Next we have a simple function that has some defaults. I have seen these defaults used for some popular routers. We then set up our argument parser and tell it how to handle certain command line options. You'll notice that argparse doesn't provide a way to get a dictionary object from its arguments, so we use a dict comprehension to extract what we need.
The other interesting piece here is the use of Python's built-in vars. If you called it without a vars argument it would behave like Python's built-in locales. But if you pass it an object, then vars is the equivalent of the `__dict__` property of object. In other words, vars(args) is equal to `args.__dict__`.
Finally we create our ChainMap by passing our command line arguments (if any), then the environment variables and finally the default values.
At the end of the code, we try calling our function, then setting an environment variable and calling it again. Try it and you will see that it prints admin and then tests as expected. Now let's try calling the script with a command line argument:
python chain_map.py -u daniel
When I run this on my machine, it returns daniel twice. This is because our command line argument overrides everything else. It doesn't matter what we set the environment to because our ChainMap will look at the command line arguments first before anything else. If you try it without the `-u daniel` it will run the actual arguments, in my case `"admin" "test"`.
Now that you know how to use ChainMaps, we can move on to Counter!
Counter
from collections import Counter Counter('superfluous') >>Counter({'s': 2, 'u': 3, 'p': 1, 'e': 1, 'r': 1, 'f': 1, 'l': 1, 'o': 1})counter = Counter('superfluous') counter['u'] >> 3
The counter provides some methods that you may be interested in. For example, you can call elements which will get an iterator over the elements that are in the dictionary, but in an arbitrary order. This function can be considered as an "encoder", since the output in this case is an encoded version of the string.
list(counter.elements()) >> ['s', 's', 'u', 'u', 'u', 'p', 'e', 'r', 'f', 'l', 'o']
counter.most_common(2) [('u', 3), ('s', 2)]
Here we just asked our Counter which were the two most recurring items. As you can see, it produced a list of tuples that tells us that `"u"` occurred 3 times and `"s"` occurred twice.
The other method I want to cover is the subtract method. The `subtract` method accepts an iterable or a mapping and uses that argument to subtract. It's a little easier to explain if you see some code:
counter_one = Counter('superfluous') counter_one >> Counter({'s': 2, 'u': 3, 'p': 1, 'e': 1, 'r': 1, 'f': 1, 'l': 1, 'o': 1}) counter_two = Counter('super') counter_one.subtract(counter_two) counter_one >> Counter({'s': 1, 'u': 2, 'p': 0, 'e': 0, 'r': 0, 'f': 1, 'l': 1, 'o': 1})
As I mentioned at the beginning of this section, you can use Counter against any iterable or mapping, so you don't have to use only strings. You can also pass tuples, dictionaries and lists to it.
Try it on your own to see how it works with those other data types. Now we are ready to move on to `defaultdict`!
`defaultdict`
sentence = "The red for jumped over the fence and ran to the zoo for food" words = sentence.split(' ') words >> ['The', 'red', 'for', 'jumped', 'over', 'the', 'fence', 'and', 'ran', 'to', 'the', 'zoo', 'for', 'food'] reg_dict = {} for word in words: if word in reg_dict: reg_dict[word] += 1 else: reg_dict[word] = 1 print(reg_dict) >> {'The': 1, 'red': 1, 'for': 2, 'jumped': 1, 'over': 1, 'the': 2, 'fence': 1, 'and': 1, 'ran': 1, 'to': 1, 'zoo': 1, 'food': 1}
Now let's try to do the same with defaultdict!
from collections import defaultdict sentence = "The red for jumped over the fence and ran to the zoo for food" words = sentence.split(' ') d = defaultdict(int) for word in words: d[word] += 1 print(d) >> defaultdict(<class 'int'>, {'The': 1, 'red': 1, 'for': 2, 'jumped': 1, 'over': 1, 'the': 2, 'fence': 1, 'and': 1, 'ran': 1, 'to': 1, 'zoo': 1, 'food': 1})
You will notice right away that the code is much simpler. The defaultdict will automatically assign zero as a value to any key it doesn't already have in it. We add one to make it make more sense and it will also increment if the word appears multiple times in the sentence.
Now let's try using a Python list type as our `default_factory`. We will start with a regular dictionary, as before.
my_list = [(1234, 100.23), (345, 10.45), (1234, 75.00), (345, 222.66), (678, 300.25), (1234, 35.67)] reg_dict = {} for acct_num, value in my_list: if acct_num in reg_dict: reg_dict[acct_num].append(value) else: reg_dict[acct_num] = [value]
If you run this code, you should get output similar to the following:
print(reg_dict) >> {1234: [100.23, 75.0, 35.67], 345: [10.45, 222.66], 678: [300.25]}
from collections import defaultdict my_list = [(1234, 100.23), (345, 10.45), (1234, 75.00), (345, 222.66), (678, 300.25), (1234, 35.67)] d = defaultdict(list) for acct_num, value in my_list: d[acct_num].append(value)
Again, this eliminates the if/else conditional logic and makes the code easier to follow. Here is the output of the above code:
print(d) >> defaultdict(<class 'list'>, {1234: [100.23, 75.0, 35.67], 345: [10.45, 222.66], 678: [300.25]})
This is a very good thing! Let's try using a `lambda` also as our `default_factory`!
from collections import defaultdict animal = defaultdict(lambda: "Monkey") animal >> defaultdict(<function __main__.<lambda>()>, {}) animal['Sam'] = 'Tiger' print (animal['Nick']) >> Monkey animal >> defaultdict(<function __main__.<lambda>()>, {'Sam': 'Tiger', 'Nick': 'Monkey'})
Here we create a `defaultdict` that will assign `Monkey` as default value to any key. The first key is set to `Tiger`, and the next key is not set. If you print the second key, you will see that it has 'Monkey' assigned to it.
In case you haven't noticed yet, it is basically impossible to cause a KeyError as long as you set the `default_factory` to something that makes sense. The documentation mentions that if you set the `default_factory` to `None`, then you will receive a KeyError.
Let's see how that works:
from collections import defaultdict x = defaultdict(None) x['Mike'] --------------------------------------------------------------------------- KeyError Traceback (most recent call last) <ipython-input-30-d21c3702d01d> in <module> 1 from collections import defaultdict 2 x = defaultdict(None) ----> 3 x['Mike'] KeyError: 'Mike'
In this case, we just created a `defaultdict` with an error. It can no longer assign a default value to our key, so it throws a `KeyError` instead. Of course, since it is a subclass of `dict`, we can simply set the key to some value and it will work. But that defeats the purpose of `defaultdict`.
`deque`
A list is optimized for fast fixed-length operations. Full details can be found in the Python documentation. A deque accepts a `maxlen` argument that sets the limits for the deque. Otherwise, the deque will grow to an arbitrary size. When a bounded deque is full, any new elements added will cause the same number of elements to come out the other end.
As a general rule, if you need to add or remove elements quickly, use a deque. If you need quick random access use a list. Let's take a moment to see how a deque can be created and used.
from collections import deque import string d = deque(string.ascii_lowercase) for letter in d: print(letter) >> a b c d e f g h i j k l m n o p q r s t u v w x y z
d.append('bye') d >> deque(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'bye']) d.appendleft('hello') d >> deque(['hello', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'bye']) d.rotate(1) d >> deque(['bye', 'hello', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'])
Let's break this down a bit. First we add a chain to the right end of the deque. Then we add another string to the left side of the deque. Finally, we call `rotate` on our deque and pass it a one, which causes it to rotate once to the right.
In other words, it makes an element rotate from the far right and in front. You can pass it a negative number to make the deque rotate to the left instead.
Let's end this section with an example based on some of the Python documentation
from collections import deque def get_last(filename, n=5): """ Returns the last n lines from the file """ try: with open(filename) as f: return deque(f, n) except OSError: print("Error opening file: {}".format(filename)) raise
This code works much like the Linux tail program does. Here we pass a `filename` to our script along with the `n` number of lines we want it to return.
The deque is limited to whatever number we pass as `n`. This means that once the deque is full, when new lines are read and added to the deque, the oldest lines are pulled from the other end and discarded.
I have also wrapped the file opening with a simple exception handler because it is very easy to pass a malformed path. This will catch files that do not exist.