28. General Policy Statements
In France, the Government delivers a general policy statement before Parliament when it takes office. This is a tradition, not a constitutional obligation. These statements are public, and all of them are available, starting with Michel Debré’s speech of January 15, 1959. In political science, computing is increasingly used to analyze such data, in particular to count the occurrences of certain words in speeches, whose frequencies can be interpreted as trends.
We will write a program to carry out this kind of analysis, which will allow us to work on file handling, string processing, lists, dictionaries, and the matplotlib library.
28.1. Reading the Data
The statements are stored in files named x.txt, where x is an integer, 1.txt being the very first statement. These files are grouped in the statements directory, which can be downloaded here: statements.zip
28.1.1. Reading a Line
Write a function split_line that takes a str as a parameter. This function replaces all punctuation characters with spaces, then splits the line into lowercase words and returns the list of these words.
A regular expression (often abbreviated as regex or regexp) is a sequence of characters that forms a search pattern. It is used for pattern matching within strings, allowing you to search, edit, or manipulate text. We can use a regex to replace every punctuation character in a string with a space:
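import re
string_to_edit = "Alors, que faire ? (l'ordre)"  # example input
cleaned = re.sub(r"""[,?;.:/'"()!-]""", " ", string_to_edit)  # each listed punctuation character becomes a space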
Hint: you can also use the string methods strip, split, and lower.
import re

def split_line(s):
    pass
#test = "A cette première vérité, que nul d'entre nous n'a le droit de méconnaître, s'en ajoute une autre, également essentielle. En un siècle où le maintien de la paix résulte d'un fragile équilibre des forces, notre patrie est, par la nature et l'histoire, située à un carrefour du monde. Alors que nous vivons un temps où la faiblesse ne pardonne pas, la France, tout en ne pouvant prétendre à l'égalité de puissance avec les grands empires du monde, se voit imposer les sévères responsabilités d'un pays déterminé à un rôle de premier plan. De la résolution de ses pouvoirs publics, de leur vigueur, de leur ténacité, dépend, pour une bonne part, l'ordre ou l'anarchie dans deux continents."
#split_line(test)
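A minimal sketch of split_line, assuming the regex above covers the punctuation used in the statement files. split() without arguments splits on any run of whitespace and ignores leading and trailing whitespace (including the newline), so no separate strip call is needed here. The character class only contains the straight apostrophe ' and the straight double quote; if the files use typographic characters such as ’, « or », they would have to be added to the class.

```python
import re


def split_line(s):
    """Return the lowercase words of s, with punctuation treated as separators."""
    # Replace each punctuation character listed in the class with a space
    no_punctuation = re.sub(r"""[,?;.:/'"()!-]""", " ", s)
    # Lowercase, then split on any run of whitespace
    return no_punctuation.lower().split()


test = "A cette première vérité, que nul d'entre nous n'a le droit de méconnaître"
words = split_line(test)
```

Elided words such as d'entre or n'a are split into two tokens (d and entre, n and a), which is why short tokens like l appear among the most frequent words later on.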
28.1.2. Reading a Declaration
Write a function read_statement(i) that takes an integer i as a parameter, identifying the file i.txt. This function builds the list of all the words contained in the speech and returns it.
Hint: In class, we presented a method for reading a file line by line.
def read_statement(i):
    pass
#len(read_statement(1)) #8810
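One possible sketch of read_statement, reading the file line by line and splitting each line with split_line (repeated here so the block runs on its own). The directory parameter, its default value "statements", and the UTF-8 encoding are assumptions of this sketch; adjust them to where you extracted statements.zip.

```python
import os
import re


def split_line(s):
    # Punctuation becomes spaces, then lowercase and split into words
    return re.sub(r"""[,?;.:/'"()!-]""", " ", s).lower().split()


def read_statement(i, directory="statements"):
    """Return the list of all words of the statement stored in <directory>/<i>.txt.

    The `directory` parameter and the UTF-8 encoding are assumptions.
    """
    words = []
    path = os.path.join(directory, f"{i}.txt")
    with open(path, "r", encoding="utf-8") as file:
        for line in file:  # read the file line by line
            words.extend(split_line(line))  # add this line's words to the list
    return words
```

Using extend rather than append keeps a flat list of words instead of a list of lists, so len(read_statement(1)) counts words.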
28.1.3. Counting Occurrences
Write a function couting_occurences that takes a list of words as a parameter. It returns a dictionary where the keys are the words from the input list and the values are the number of times each word appears.
def couting_occurences(words):
    pass
# sum(couting_occurences(read_statement(1)).values()) #8810
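A minimal sketch of couting_occurences using dict.get, which supplies a default count of 0 the first time a word is seen. Because every word adds exactly 1 to one counter, the sum of the values equals the number of words, which is what the commented check above relies on.

```python
def couting_occurences(words):
    """Return a dictionary mapping each word to its number of occurrences."""
    occurrences = {}
    for word in words:
        # get() returns 0 for a word not yet in the dictionary
        occurrences[word] = occurrences.get(word, 0) + 1
    return occurrences


sample = ["la", "paix", "la", "france"]
counts = couting_occurences(sample)
```

The standard library's collections.Counter(words) produces the same counts; writing the loop by hand is the point of the exercise.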
28.1.4. Sorting Words
A dictionary has no sort method, so we convert it into a list of tuples, each containing an occurrence count (first element) and a word (second element). We can then sort this list in decreasing order of occurrences: tuples are compared on their first element, the count, and the second element, the word, breaks ties when counts are equal (in reverse alphabetical order, since the sort is decreasing).
Write a function sorting_words that takes a dictionary of word occurrences as a parameter and returns a sorted list of tuples.
Hint: use sort or sorted, as seen in the previous section.
def sorting_words(words):
    pass
words = sorting_words(couting_occurences(read_statement(1)))
# words # [(442, 'de'), (353, 'la'), (270, 'l'), (244, 'et'), (205, 'des'),...
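A short sketch of sorting_words: build (count, word) tuples from the dictionary items, then let sorted compare the tuples element by element. With reverse=True, equal counts end up in reverse alphabetical order, which is why gouvernement comes before france in the expected output of the next section.

```python
def sorting_words(words):
    """Return (count, word) tuples sorted by decreasing count.

    Tuples are compared on the count first, then on the word,
    so ties come out in reverse alphabetical order with reverse=True.
    """
    pairs = [(count, word) for word, count in words.items()]
    return sorted(pairs, reverse=True)


ranking = sorting_words({"france": 42, "gouvernement": 42, "doit": 29})
```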
28.2. Refining Results
Many words in the text, such as articles, prepositions, pronouns, and adverbs, do not contribute meaningfully to the analysis. To perform a proper text analysis, we need to filter these words out. The file words_exclusion.txt contains a list of words to exclude from the analysis.
You can reuse the read_word_file function that we created in the Diceware exercise. Copy this function into your source code to proceed.
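def read_word_file(file_path):
    file = None
    words = []
    try:
        # encoding="utf-8" assumes the word files are UTF-8 encoded (accented French words)
        file = open(file_path, 'r', encoding="utf-8")
        for line in file:
            words.append(line.strip())  # remove the trailing newline and surrounding spaces
        return words
    except FileNotFoundError:
        print(f"The file {file_path} does not exist.")
    except IOError:
        print("An error occurred while accessing the file.")
    finally:
        # Always close the file, whether or not an error occurred
        if file:
            file.close()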
banned_words = read_word_file("general_policy/words_exclusion.txt")
len(banned_words) # 865
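The try/finally pattern in read_word_file guarantees that the file is closed even if an error occurs. The same guarantee can be obtained more compactly with a with statement. The sketch below is an alternative, not the version from the Diceware exercise; it assumes UTF-8 files and, like the original, prints a message and returns None when the file cannot be read.

```python
def read_word_file(file_path):
    """Return the stripped lines of file_path, or None if it cannot be read."""
    try:
        # The with statement closes the file automatically, even on error
        with open(file_path, "r", encoding="utf-8") as file:
            return [line.strip() for line in file]
    except FileNotFoundError:
        print(f"The file {file_path} does not exist.")
    except OSError:  # IOError is an alias of OSError in Python 3
        print("An error occurred while accessing the file.")
```

FileNotFoundError is a subclass of OSError, so it must be caught first to get the more specific message.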
28.2.1. Filter Banned Words
Create a filter_banned_words function that takes two parameters: a list of word occurrences (tuples of occurrence count and word) and a list of banned words. It returns a new list of word occurrences that contains only meaningful words, excluding the banned ones.
def filter_banned_words(occurences, banned_words):
    pass
# words = filter_banned_words(words, banned_words)
# words[:30] # [(42, 'gouvernement'), (42, 'france'), (32, 'politique'), (29, 'doit'),...
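A minimal sketch of filter_banned_words with a list comprehension. Converting the banned list to a set is a design choice of this sketch: membership tests on a set are fast on average, whereas each test on a list of 865 words scans the list. The comprehension keeps the input order, so an already sorted list stays sorted.

```python
def filter_banned_words(occurences, banned_words):
    """Keep only the (count, word) tuples whose word is not banned."""
    banned = set(banned_words)  # fast membership tests
    return [(count, word) for count, word in occurences if word not in banned]


ranking = [(442, "de"), (353, "la"), (42, "gouvernement"), (42, "france")]
meaningful = filter_banned_words(ranking, ["de", "la", "et"])
```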
28.2.2. Visualize Results
We can now use the matplotlib module to visualize the results. Add the following function to your code.
import matplotlib.pyplot as plt

def top_words(i):
    banned_words = read_word_file("general_policy/words_exclusion.txt")
    words = sorting_words(couting_occurences(read_statement(i)))
    filter_words = filter_banned_words(words, banned_words)
    # zip(*...) splits the (count, word) tuples into two tuples:
    # one with all the counts and one with all the words
    counts, labels = zip(*filter_words[:20])
    # Draw a bar chart, add a title, rotate the x-axis labels and display the graph
    plt.bar(labels, counts)
    plt.title("Top 20 most used words in the statement")
    plt.xticks(rotation=90)
    plt.show()

#top_words(1)
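The zip(*...) notation used in top_words can be puzzling at first. The small example below, on a few hand-written tuples in the format returned by sorting_words, shows what it does: the * passes each tuple as a separate argument to zip, which then regroups all first elements together and all second elements together. Note that zip produces tuples, not lists, and that unpacking fails with a ValueError if the filtered list is empty.

```python
# A few (count, word) tuples, in the format returned by sorting_words
pairs = [(42, "gouvernement"), (42, "france"), (32, "politique")]

# Equivalent to zip((42, "gouvernement"), (42, "france"), (32, "politique"))
counts, labels = zip(*pairs)

print(counts)  # (42, 42, 32)
print(labels)  # ('gouvernement', 'france', 'politique')
```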
28.2.3. Temporal Analysis
We can perform a temporal analysis to discover patterns across all the statements. The files in the statements directory correspond to the following statements, in chronological order:
1.txt: Statement by Michel Debré, January 15, 1959
2.txt: Statement by Georges Pompidou, April 26, 1962
3.txt: Statement by Maurice Couve de Murville, July 17, 1968
4.txt: Statement by Jacques Chaban-Delmas, September 16, 1969
5.txt: Statement by Pierre Messmer, October 3, 1972
6.txt: Statement by Jacques Chirac, June 5, 1974
7.txt: Statement by Raymond Barre, October 5, 1976
8.txt: Statement by Pierre Mauroy, July 8, 1981
9.txt: Statement by Laurent Fabius, July 24, 1984
10.txt: Statement by Jacques Chirac, April 9, 1986
11.txt: Statement by Michel Rocard, June 29, 1988
12.txt: Statement by Édith Cresson, May 22, 1991
13.txt: Statement by Pierre Bérégovoy, April 8, 1992
14.txt: Statement by Édouard Balladur, April 8, 1993
15.txt: Statement by Alain Juppé, May 23, 1995
16.txt: Statement by Lionel Jospin, June 24, 1997
17.txt: Statement by Jean-Pierre Raffarin, July 3, 2002
18.txt: Statement by Jean-Pierre Raffarin, April 5, 2004
19.txt: Statement by Dominique de Villepin, June 8, 2005
20.txt: Statement by François Fillon, July 3, 2007
21.txt: Statement by François Fillon, November 24, 2010
22.txt: Statement by Jean-Marc Ayrault, July 3, 2012
23.txt: Statement by Manuel Valls, April 8, 2014
24.txt: Statement by Manuel Valls, September 16, 2014
25.txt: Statement by Bernard Cazeneuve, December 14, 2016
26.txt: Statement by Édouard Philippe, July 4, 2017
27.txt: Statement by Jean Castex, July 15, 2020
28.txt: Statement by Élisabeth Borne, July 6, 2022
29.txt: Statement by Gabriel Attal, January 30, 2024
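Add the following evolution function to your code. It reads all the statements and plots how the usage of a word evolves over time, measured as its number of occurrences divided by the total number of words in each statement. Since split_line converts words to lowercase, the word should be given in lowercase.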
def evolution(word):
    # Year of each statement, in the same order as the files 1.txt, 2.txt, ...
    x = [1959, 1962, 1968, 1969, 1972,
         1974, 1976, 1981, 1984, 1986,
         1988, 1991, 1992, 1993, 1995,
         1997, 2002, 2004, 2005, 2007,
         2010, 2012, 2014, 2014, 2016,
         2017, 2020, 2022, 2024]
    statements = [read_statement(i) for i in range(1, len(x) + 1)]
    # Relative frequency: occurrences of the word divided by the length of the speech
    y = [s.count(word) / len(s) for s in statements]
    plt.title(f'Usage evolution of word "{word}"\nin general statements from {x[0]} to {x[-1]}')
    plt.plot(x, y)
    plt.show()

# evolution("chômage")
# evolution("santé")
# evolution("école")
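The key step in evolution is the normalization: dividing the count by len(s) makes speeches of very different lengths comparable. The sketch below isolates that computation in a hypothetical helper, relative_frequencies (not part of the exercise), and runs it on toy word lists standing in for the results of read_statement, so it can be checked without the data files.

```python
def relative_frequencies(statements, word):
    """Hypothetical helper: share of `word` in each statement (a list of word lists)."""
    # s.count(word) counts exact matches, so the word must be lowercase
    return [s.count(word) / len(s) for s in statements]


# Toy data standing in for read_statement(1), read_statement(2), ...
toy_statements = [
    ["la", "france", "et", "la", "paix"],
    ["le", "chômage", "et", "la", "santé", "chômage", "école", "emploi"],
]
print(relative_frequencies(toy_statements, "chômage"))  # [0.0, 0.25]
```

Because two statements share the year 2014, plt.plot in evolution connects their two points with a vertical segment at that year.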
28.2.4. Do It Yourself
You should now have a basic understanding of how to manage data and display results using matplotlib. Experiment with it to find interesting ways to visualize your data, and explore different types of charts and plots to represent it effectively. You can find examples of charts plotted using matplotlib with the associated code.