Given my background in finance, I celebrate the new year on 1st April 😉 and the Union Budget is as important an event as Christmas!
As an NLP exercise, I decided to use the budget speeches from the last decade. What is NLP, you ask? Natural Language Processing (NLP) is an interdisciplinary branch of artificial intelligence, computer science, and linguistics that helps computers understand, interpret, and generate human (natural) language. Do read our earlier blog post, A Quick Introduction To Natural Language Processing.
Alexa, Siri, and Google Assistant are all examples of NLP in practice. NLP has numerous applications, such as part-of-speech tagging, Named Entity Recognition (NER), question answering, speech recognition, text-to-speech and speech-to-text, topic modeling, sentiment classification, language modeling, and translation.
In this article, we will focus on the sentiment analysis of the budget speeches by Indian Finance Ministers. We have had 12 budget presentations (including 2 interim ones) in the past ten years. I downloaded the speeches from the Government of India’s site. However, the Sarkar does not pay much attention to detail, and the links for some years are wrong or dead! I sourced the missing speeches from national newspapers.
I used a loop to read the data and store it as a Python pandas data frame. I used regular expressions to clean the data. Using sklearn’s CountVectorizer, I created a document-term matrix excluding common English stop words. We did our analysis on the cleaned data.
Here are the top 15 words used in a few of the budget speeches:
Feb 2010 – Pranab Mukherjee
cent, propose, crore, year, duty, government, tax, sector, development, growth, budget, provide, fiscal, central
Feb 2013 – P Chidambaram
propose, crore, percent, tax, provide, government, year, sector, investment, development, funds, fund, rate, plan
Feb 2015 – Arun Jaitley
tax, crore, proposed, India, act, service, government, excise, duty, year, investment, madam, provide, credit
Jul 2019 – Nirmala Sitharaman
tax, government, proposed, India, provide, shall, lakh, section, scheme, crore, income, act, years, year
Looking at the list, I added some more words that we consider irrelevant for the analysis: add_stop_words = ['crore', 'year', 'propose', 'provide', 'sector', 'lakh', 'years', 'proposed', 'new', 'cent', 'percent', 'shall']
Then we built word clouds for the budget speeches from the last decade.
Do you notice any trends and patterns from the word clouds? Looking at the word clouds, what other words would you remove by adding to the add_stop_words list? Which words do you think would be among the most commonly used words in the 2020 Union Budget?
I did a short analysis of the Finance Ministers’ vocabulary. It would have been interesting to see how Shashi Tharoor would have measured up had he been the Finance Minister, don’t you think?
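The post doesn't say how vocabulary was measured. One simple, assumed approach is to count distinct words and compute the type-token ratio (distinct words divided by total words); `vocabulary_stats` below is a hypothetical helper.

```python
import re

def vocabulary_stats(text: str) -> dict:
    """Assumed metrics: total words, distinct words, and type-token ratio."""
    words = re.findall(r"[a-z]+", text.lower())
    distinct = set(words)
    return {
        "total_words": len(words),
        "unique_words": len(distinct),
        "type_token_ratio": len(distinct) / len(words) if words else 0.0,
    }

print(vocabulary_stats("The tax on tax is a tax"))
# {'total_words': 7, 'unique_words': 5, 'type_token_ratio': 0.7142857142857143}
```

Because the ratio falls as a text gets longer, comparing speeches of very different lengths this way is only rough; counting distinct words in an equal-sized sample from each speech gives a fairer comparison.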
We also did a sentiment analysis using the textblob library.
As we can see, our finance ministers are a positive lot. As cheery as Santa Claus! Ho Ho!!
As seen in 2013 and 2018, finance ministers tend to be more opinionated (their speeches score higher on subjectivity) in the final full budget before the national elections.
Finally, we analyzed the polarity for the budget speeches.
Are you wondering what polarity is? In brief, polarity measures whether the sentiment expressed in a piece of text is positive or negative; TextBlob scores it from -1 (most negative) to +1 (most positive). The strength of a sentiment or opinion is linked to the intensity of the emotion behind it, such as happiness or anger. The mood does appear to dip towards the end of the budget speeches.
Want to know more about how TextBlob calculates sentiment and polarity? You can read more here.
What else would you do? Use a bag of words/n-grams? Use stemming and lemmatization? Or do you side with Peter Skomoroch, the Principal Data Scientist at LinkedIn?
Interested in learning more about NLP? Join the Postgraduate Program in Data Science and Machine Learning (PGPDM) course, offered by Jigsaw Academy in collaboration with the University of Chicago, which has a new module on AI and DL. NLP is covered in detail.
We also cover text analysis in IIM Indore’s Integrated Program in Business Analytics (IPBA) course.