28. General Policy Statements

In France, a general policy statement is delivered by the Government before Parliament when it takes office. This is a tradition, not a constitutional obligation. These statements are public, and all of them are available, starting with Michel Debré’s speech of January 15, 1959. Political scientists increasingly use computational methods to analyze these texts, for instance by counting how often certain words occur in the speeches, since changes in frequency can be read as trends.

We will write a program that performs this kind of analysis. It will let us practice file handling, string processing, lists, dictionaries, and the matplotlib library.

28.1. Reading the Data

The statements are stored in files named x.txt, where x is an integer, with 1.txt being the very first statement. These files are grouped in the statements directory, which can be downloaded here: statements.zip

28.1.1. Reading a Line

Write a function split_line that takes a str as a parameter. This function replaces all punctuation characters with white space, then splits the line into a list of lowercase words.

A regular expression (often abbreviated as regex or regexp) is a sequence of characters that forms a search pattern. It is used for pattern matching within strings, allowing you to search, edit, or manipulate text. We can use a regex to replace every punctuation character in a string.

import re
re.sub(r"""[,?;.:/'"()!-]""", " ", string_to_edit)
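As a minimal sketch of how this substitution behaves, the snippet below applies the same pattern to a short sample. The names sample and cleaned are chosen here for illustration only.

```python
import re

# Hypothetical sample text, used only to illustrate the substitution
sample = "l'ordre, ou l'anarchie?"

# Every character listed between the square brackets is replaced by one space
cleaned = re.sub(r"""[,?;.:/'"()!-]""", " ", sample)

print(repr(cleaned))
```

Each punctuation character becomes its own space, so the result can contain several consecutive spaces and a trailing space. The splitting step has to cope with that.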

Hint: You can also use the string methods strip, split, and lower.
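The three string methods named in the hint can be tried on their own before combining them. This is only an illustrative sketch on a made-up line; the variable names are assumptions, not part of the exercise.

```python
# Hypothetical raw line, with surrounding whitespace and mixed case
raw = "  La France  et la Paix \n"

stripped = raw.strip()      # removes leading and trailing whitespace
lowered = stripped.lower()  # converts every character to lower case
tokens = lowered.split()    # splits on any run of whitespace

print(tokens)
```

Called without an argument, split treats any run of whitespace as a single separator, so repeated spaces do not produce empty strings.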

import re
def split_line(s):
    pass


#test = "A cette première vérité, que nul d'entre nous n'a le droit de méconnaître, s'en ajoute une autre, également essentielle. En un siècle où le maintien de la paix résulte d'un fragile équilibre des forces, notre patrie est, par la nature et l'histoire, située à un carrefour du monde. Alors que nous vivons un temps où la faiblesse ne pardonne pas, la France, tout en ne pouvant prétendre à l'égalité de puissance avec les grands empires du monde, se voit imposer les sévères responsabilités d'un pays déterminé à un rôle de premier plan. De la résolution de ses pouvoirs publics, de leur vigueur, de leur ténacité, dépend, pour une bonne part, l'ordre ou l'anarchie dans deux continents."
#split_line(test)

28.1.2. Reading a Declaration

Write a function read_statement(i) that takes an integer i as a parameter, representing a file i.txt. This function creates a list of all the words contained in the speech. It returns this list.

Hint: In class, we presented a method for reading a file line by line.
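The method presented in class is not reproduced here. As a reminder, one common way to read a file line by line looks like the sketch below, which may differ from what was shown in class. The file sample.txt and its content are made up so that the example runs on its own.

```python
import os
import tempfile

# Create a small throwaway file so that the example is self-contained
# (the file name and its content are invented for this illustration)
folder = tempfile.mkdtemp()
path = os.path.join(folder, "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("première ligne\ndeuxième ligne\n")

# Read the file line by line; "with" closes the file automatically
lines = []
with open(path, "r", encoding="utf-8") as f:
    for line in f:
        lines.append(line.strip())

print(lines)
```

Passing encoding="utf-8" is a choice made for this sketch, on the assumption that the text contains accented characters stored as UTF-8.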

def read_statement(i):
    pass

#len(read_statement(1)) #8810

28.1.3. Counting Occurrences

Write a function couting_occurences that takes a list of words as a parameter. It returns a dictionary where the keys are the words from the input list, and the values represent the number of times each word appears.
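The same counting idea can be sketched on a simpler problem: counting the letters of a word. The names letters and counts are illustrative, and this is one possible pattern among several.

```python
# Hypothetical input: the letters of a short word
letters = list("banana")

counts = {}
for letter in letters:
    # dict.get returns 0 when the key has not been seen yet
    counts[letter] = counts.get(letter, 0) + 1

print(counts)
```

The values always add up to the length of the input list, which is a convenient sanity check.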

def couting_occurences(words):
    pass

# sum(couting_occurences(read_statement(1)).values()) #8810

28.1.4. Sorting Words

Since a dictionary is not a sequential data structure, we need to convert it into a list where each element is a tuple containing the occurrence count (first) and the word (second). We can then sort the list in decreasing order of occurrences. Tuples are compared element by element, so the sort uses the count first and falls back on the word as a tiebreaker when counts are equal.

Write a function sorting_words that takes a dictionary of word occurrences as a parameter and returns a sorted list of tuples.

Hint: use sort or sorted, as seen in a previous section.
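To see how tuple comparison drives the ordering, here is a small sketch on made-up counts. The dictionary content and variable names are assumptions for illustration.

```python
# Hypothetical word counts
counts = {"paix": 2, "france": 3, "ordre": 2}

# Build (count, word) tuples so that the count is compared first
pairs = [(n, word) for word, n in counts.items()]

# reverse=True sorts from the largest tuple to the smallest
ordered = sorted(pairs, reverse=True)

print(ordered)
```

Because the whole tuple is sorted in reverse, words with equal counts also come out in reverse alphabetical order: "paix" is placed before "ordre".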

def sorting_words(words):
    pass

words = sorting_words(couting_occurences(read_statement(1)))
# words  # [(442, 'de'), (353, 'la'), (270, 'l'), (244, 'et'), (205, 'des'),...

28.2. Refining Results

Many words in the text, such as adverbs and pronouns, do not contribute meaningfully to the analysis. To perform proper text analysis, we need to filter these words out. The file words_exclusion.txt contains a list of words to exclude from the analysis.

You can reuse the read_word_file function that we created in the Diceware exercise. Copy this function into your source code to proceed.

def read_word_file(file_path):
    file = None
    words = []
    try:
        file = open(file_path, 'r')
        for line in file:
            words.append(line.strip())
        return words
    except FileNotFoundError:
        print(f"The file {file_path} does not exist.")
    except IOError:
        print("An error occurred while accessing the file.")
    finally:
        if file :
            file.close()

banned_words = read_word_file("general_policy/words_exclusion.txt")
len(banned_words) # 865
865

28.2.1. Filter Banned Words

Create a filter_banned_words function that takes two parameters: a list of word occurrences (tuples of occurrence count and word) and a list of banned words. It returns a new list of word occurrences that contains only meaningful words, excluding the banned ones.
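Filtering a list against a list of excluded values can be sketched as follows, on toy data in the same (count, word) shape. The names occurrences, excluded, and kept are hypothetical, and converting to a set is a design choice of this sketch rather than a requirement of the exercise.

```python
# Hypothetical data: (count, word) tuples and a short exclusion list
occurrences = [(5, "de"), (3, "france"), (2, "la"), (1, "paix")]
excluded = ["de", "la"]

# A set makes each membership test fast, even with hundreds of excluded words
excluded_set = set(excluded)

# Keep only the tuples whose word is not excluded; the order is preserved
kept = [(n, word) for n, word in occurrences if word not in excluded_set]

print(kept)
```

The original list is left untouched, and the new list keeps the sorted order of the input.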

def filter_banned_words(occurences, banned_words):
    pass

# words = filter_banned_words(words, banned_words)
# words[:30] # [(42, 'gouvernement'), (42, 'france'), (32, 'politique'), (29, 'doit'),...

28.2.2. Visualize Results

We can now use the matplotlib module to visualize the results. Add the following function to your code.

import matplotlib.pyplot as plt

def top_words(i):
    banned_words = read_word_file("general_policy/words_exclusion.txt")
    words = sorting_words(couting_occurences(read_statement(i)))
    filter_words = filter_banned_words(words, banned_words)

    # zip(*...) unpacks the (count, word) tuples into two tuples:
    # the counts (x) and the words (y)
    x, y = zip(*filter_words[:20])

    # Add a title, rotate the x-axis labels, draw a bar chart, and display it
    plt.title("Top 20 most used words in the statement")
    plt.xticks(rotation=90)
    plt.bar(y, x)
    plt.show()

#top_words(1)
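The zip(*...) notation used in top_words can be explored on its own. This is a small sketch with made-up pairs; the names pairs, counts, and labels are illustrative.

```python
# Hypothetical (count, word) tuples
pairs = [(3, "france"), (2, "paix"), (2, "ordre")]

# The star passes each tuple as a separate argument to zip,
# which then groups the first elements together and the second elements together
counts, labels = zip(*pairs)

print(counts)
print(labels)
```

Both results are tuples rather than lists, which matplotlib accepts just as well.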

28.2.3. Temporal Analysis

We can perform a temporal analysis to discover patterns by examining all the statements. The files in the statements directory represent the following declarations:

  • Statement by Michel Debré, January 15, 1959

  • Statement by Georges Pompidou, April 26, 1962

  • Statement by Maurice Couve de Murville, July 17, 1968

  • Statement by Jacques Chaban-Delmas, September 16, 1969

  • Statement by Pierre Messmer, October 3, 1972

  • Statement by Jacques Chirac, June 5, 1974

  • Statement by Raymond Barre, October 5, 1976

  • Statement by Pierre Mauroy, July 8, 1981

  • Statement by Laurent Fabius, July 24, 1984

  • Statement by Jacques Chirac, April 9, 1986

  • Statement by Michel Rocard, June 29, 1988

  • Statement by Édith Cresson, May 22, 1991

  • Statement by Pierre Bérégovoy, April 8, 1992

  • Statement by Édouard Balladur, April 8, 1993

  • Statement by Alain Juppé, May 23, 1995

  • Statement by Lionel Jospin, June 24, 1997

  • Statement by Jean-Pierre Raffarin, July 3, 2002

  • Statement by Jean-Pierre Raffarin, April 5, 2004

  • Statement by Dominique de Villepin, June 8, 2005

  • Statement by François Fillon, July 3, 2007

  • Statement by François Fillon, November 24, 2010

  • Statement by Jean-Marc Ayrault, July 3, 2012

  • Statement by Manuel Valls, April 8, 2014

  • Statement by Manuel Valls, September 16, 2014

  • Statement by Bernard Cazeneuve, December 14, 2016

  • Statement by Édouard Philippe, July 4, 2017

  • Statement by Jean Castex, July 15, 2020

  • Statement by Élisabeth Borne, July 6, 2022

  • Statement by Gabriel Attal, January 30, 2024

Add the following evolution function to your code. This function reads all the statements and plots how the usage of a given word evolves over time.

def evolution(word):
    x = [1959, 1962, 1968, 1969, 1972,
         1974, 1976, 1981, 1984, 1986,
         1988, 1991, 1992, 1993, 1995,
         1997, 2002, 2004, 2005, 2007,
         2010, 2012, 2014, 2014, 2016,
         2017, 2020, 2022, 2024]
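    # Read every statement; the files are numbered from 1 to len(x)
    statements = [read_statement(i) for i in range(1, len(x)+1)]
    # Relative frequency: occurrences of the word divided by the length of the speech
    y = [s.count(word)/len(s) for s in statements]

    plt.title(f'Usage evolution of word "{word}"\nin general statements from {x[0]} to {x[-1]}')
    plt.plot(x, y)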
    plt.show()

# evolution("chômage")
# evolution("santé")
# evolution("école")
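The quantity plotted by evolution is a relative frequency. The following sketch computes it on two tiny made-up word lists, so that the arithmetic can be checked by hand; the data and names are illustrative only.

```python
# Hypothetical statements, already split into lowercase words
statements = [
    ["la", "paix", "et", "la", "france"],
    ["la", "france"],
]
word = "la"

# list.count gives the number of occurrences; dividing by the length
# of the speech makes long and short speeches comparable
frequencies = [s.count(word) / len(s) for s in statements]

print(frequencies)
```

Here the word appears twice in five words, then once in two words. The raw count falls from 2 to 1 while the relative frequency rises, which is why evolution normalizes by the length of each speech.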

28.2.4. Do It Yourself

You should now have a basic understanding of how to manage data and display results using matplotlib. Experiment with it to find interesting ways to visualize your data. Explore different types of charts and plots to represent your data effectively. You can find examples of charts plotted using matplotlib with the associated code.