14. Dictionary#

Suppose we want to manage data concerning a metro line, for example, Line 16 of the Grand Paris Express. We have four pieces of information about it: its latitude, longitude, identifier, and label. We could use a tuple or a list to group these four values. For example, for the La Défense station, we could have [2.237688709843252, 48.890854792161804, 77, “La Défense”]. To access the station name, we would need to retrieve the fourth value of the list. To manage all the stations on the line, we would have a list containing lists representing the stations.

[
    [2.2021300102601757, 48.8724809534261, 74, "Rueil - Suresnes 'Mont-Valérien'"],
    [2.237688709843252, 48.890854792161804, 77, "La Défense"],
    [2.2720135095825587, 48.914987386561855, 79, "Bois-Colombes"],
    ...
]

This approach is problematic. Indeed, you need to know that index 3 is used to access the station name, and while this may be obvious when creating the data structure, it may be less clear over time. Will you remember this association months after starting the project? If another developer needs to work on the code, how will they know what convention was used? Finally, what happens if we choose to modify a station’s data? Imagine we want to group the GPS coordinates into a single tuple: [(2.237688709843252, 48.890854792161804), 77, “La Défense”]. In this case, everywhere we used indices 2 and 3, we would now need to use indices 1 and 2. This is likely to cause bugs. We will therefore explore another solution to address this issue: dictionaries (see Dictionnary).

14.1. Definition#

In Python, a dictionary is a versatile data structure that stores data as key-value pairs. Unlike lists, which are indexed by numerical positions, dictionaries are indexed by keys. These keys can be of any immutable (see Immutable) type, such as strings, numbers, or tuples, allowing for a wide range of indexing possibilities. The values associated with these keys can be of any data type, including other dictionaries, making dictionaries a powerful tool for complex data structures.

Key Characteristics of a Dictionary:

  • Unique Keys: Each key in a dictionary must be unique. If a key is assigned a new value, the previous value associated with that key is replaced. This uniqueness ensures that each key maps to exactly one value, making it easy to update or retrieve specific pieces of information.

  • Fast Access: One of the most significant advantages of using dictionaries is the speed at which values can be accessed. Since dictionaries are implemented as hash tables (see Hash Table), accessing a value by its key is much faster than searching for an element in a list, especially as the size of the data grows. This efficiency is particularly beneficial when dealing with large datasets or when quick lookups are required.

  • Mutability: Dictionaries are mutable (see Mutable), meaning that their contents can be changed after creation. You can add new key-value pairs, update existing ones, or delete entries as needed. This flexibility makes dictionaries ideal for scenarios where the data structure needs to evolve during runtime, such as dynamically adjusting to new data or conditions.

station = {
    "name": "La Défense",
    "coordinates": (2.237688709843252, 48.890854792161804),
    "id": 77
}

14.2. Basic manipulations#

14.2.1. Creation#

Here’s how to create an empty dictionary.

d = {}

Using a dictionary for our scenario is useful. Each station can be represented as follows with three key-value pairs.

station = {
    "name": "La Défense",
    "coordinates": (2.237688709843252, 48.890854792161804),
    "id": 77
}

The values contained in a dictionary can be of any type, such as numbers, strings, lists, or even other dictionaries. However, the keys must be immutable, meaning they cannot be modified.

14.2.2. Accessing Values#

Accessing values in a dictionary is quite similar to accessing values in sequences. We use square brackets.

>>> station["name"]
"La Défense"

Don’t think that accessing a value in a Python dictionary is as laborious as looking up a word in a English dictionary. It’s as fast as accessing an element in a list by its index.

A KeyError will be raised if you try to access a key that doesn’t exist in the dictionary.

>>> station["lat"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[2], line 1
----> 1 station["lat"]

KeyError: 'lat'

To avoid this kind of error, Python offers the get function, which allows you to access the value associated with a key or a default value if the key is not associated with any value.

station.get("name", "Unkown")  # if name is not a key, then "Unkown" is returned

It is possible to determin if a key is present in a dictionnary using the in keyword.

>>> "name" in station
True

14.2.3. Adding and Removing Associations in a Dictionary#

To add a new key-value pair to a dictionary, you use square brackets to specify the key and assign it a value. This operation inserts the new key-value pair into the dictionary if the key does not already exist.

station["line"] = 16

In this example, a new key "line" with the value 16 is added to the dictionary. The dictionary now includes four key-value pairs.

{
    "name": "La Défense",
    "coordinates": (2.237688709843252, 48.890854792161804),
    "id": 77,
    "line": 16
}

If you add a new value for an existing key, the old value is replaced with the new one. This operation effectively updates the value associated with that key.

To remove a key-value pair from a dictionary, you use the del keyword followed by the key in square brackets. This operation deletes the specified key and its associated value from the dictionary.

del station["line"]

In this example, the key “line” and its associated value are removed from the dictionary. The resulting dictionary contains only the remaining key-value pairs.

14.3. Iterating Through a Dictionary#

A dictionary in Python can be iterated over in several ways: by keys, by values, or by key-value pairs.

14.3.1. Iterating Over Keys#

When you iterate over a dictionary using a for loop, you are iterating over its keys. The order of the keys is not guaranteed to be the same as the order in which they were added.

for key in station:
    print(key)

Output:

"name"
"coordinates"
"id"

Note

The output order might vary; the keys are visited in an arbitrary order.

14.3.2. Iterating Over Values#

To iterate over the values of a dictionary, use the values method (see Method). This method returns a view object that displays a list of all the values in the dictionary.

for value in d.values():
    print(value)

Output:

"La Défense"
(2.237688709843252, 48.890854792161804)
77

This will print all the values in the dictionary. The order of values corresponds to the order of the keys, but again, this order is not guaranteed to be consistent.

14.3.3. Iterating Over Key-Value Pairs#

To iterate over both keys and values simultaneously, use the items method. This method returns a view object that displays a list of tuples, where each tuple contains a key and its associated value.

for k, v in station.items():
    print(k, v)

Output:

"name", "La Défense"
"coordinates", (2.237688709843252, 48.890854792161804)
"line", 77

This approach is useful when you need to process both keys and values together.

14.3.4. Getting the Number of Key-Value Pairs#

To find out how many key-value pairs are present in a dictionary, use the len function. This function returns the number of key-value pairs in the dictionary.

print(len(station))

This output indicates that there are three key-value pairs in the dictionary.`

14.4. Hashing and dictionary#

Hashing (see Hash) is a process used to convert an input (often a key) into a fixed-size integer, known as a hash code or hash value. This value is used to determine where to store or retrieve data in a hash table. Here’s a more detailed breakdown.

14.4.1. Hash Function#

The hash function is a mathematical algorithm that takes an input (or key) and computes an integer hash code. This hash code is used as an index into an array or hash table. Hash functions must have the following properties:

  • Deterministic: The same input always produces the same hash code.

  • Efficient: The function should compute the hash code quickly.

  • Uniform Distribution: Ideally, hash codes are distributed uniformly across the hash table to minimize collisions.

Common hash functions include MD5, SHA-1, and custom algorithms depending on the use case.

14.4.2. Hash Table#

A hash table is an array where each slot can store data. When you insert a key-value pair into a hash table, the hash function computes the index based on the hash code. The data is then stored in the corresponding slot in the array.

Since different keys can produce the same hash code, collisions occur. Several methods exist to handle collisions we will not covered this in the course.

To maintain efficiency, a hash table may need to be resized and rehashed if it becomes too full. When rehashing, a new, larger hash table is created. The existing entries are then hashed again and inserted into the new table. This helps in maintaining low load factors and efficient operations.