# How the PGPDM Program’s Capstone Project Helped Me Learn More About Analytics

After nine months of rigorous training and assignments, I entered the crucial final phase of my Post Graduate Program in Data Science & Machine Learning with Jigsaw Academy and the University of Chicago’s Graham School. My Capstone project was a real-world business problem from a leading cleaning product manufacturer in the USA, assigned by the industry partner. The problem statement was to identify the pricing dynamics and determine the price elasticity of the target brand by customer segment and time period.

Wow! Interesting!

When I was assigned the project, I had a few questions in mind:

- How will my results impact the cleaning brand when I solve this?
- Where will they use this?
- Why will they be interested in understanding this price movement?
- Who are the key stakeholders?
- How much benefit will they attain?

To clear these doubts and find a way to solve the challenge, I did the following:

__Business Understanding__

I took the first step to understand the business domain. My research with the industry partner and web sources led to very interesting insights about –

- the relationship between product manufacturers and retailers
- the importance of pricing correctly to increase revenue and margin
- the undisputable power of retailers who determine the final price
- how our cleaning brand manufacturer can influence the price by understanding the price elasticity of their products and
- what key factors will determine the price elasticity

I even signed up for an online course on pricing strategy to understand the details in-depth.

__Data Understanding__

Once I understood the points above, my next step was to look at the data provided to us. It covered 3 customer segments (retailers), with store-level and state-level weekly transactional data for the target brand and its competitors. Preliminary descriptive statistics helped me grasp the level of detail in the data, the missing elements, and the differences between customer segments (each retailer sold different SKUs of the target brand and had its own pricing strategy).

__Data Preparation__

- Exploratory Data Analysis: This is the most interesting part, slicing and dicing data through manipulation and visualization. At this stage I understood which products had exceptional or mediocre sales across states, stores, customer segments and time periods. The diagram below shows how the product price and quantity sold changed over time for one retailer.

- Feature Engineering: To build the model, I needed to engineer a few variables, such as the price of one unit of the product and the week number, and to apply log transformations to a few more variables to bring them closer to a normal distribution. Well-engineered variables usually yield a better, more accurate model!
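A minimal sketch of what this feature engineering step could look like. The field names (`week_ending`, `sales_value`, `units_sold`) and the sample values are hypothetical, since the post does not describe the actual dataset layout. The unit price is assumed to be sales value divided by units sold.

```python
import math
from datetime import date

# Hypothetical weekly records; field names and values are illustrative only.
rows = [
    {"week_ending": date(2019, 1, 6), "sales_value": 5000.0, "units_sold": 1000},
    {"week_ending": date(2019, 1, 13), "sales_value": 5400.0, "units_sold": 1200},
]


def engineer_features(row):
    """Return the row plus unit price, week number, and log-transformed columns.

    Assumes units_sold > 0; rows with zero sales would need separate handling.
    """
    unit_price = row["sales_value"] / row["units_sold"]
    return {
        **row,
        "unit_price": unit_price,
        # ISO week number of the week-ending date (an assumed convention).
        "week_number": row["week_ending"].isocalendar()[1],
        # Natural logs compress right-skewed price and quantity values.
        "log_unit_price": math.log(unit_price),
        "log_units_sold": math.log(row["units_sold"]),
    }


features = [engineer_features(r) for r in rows]
for f in features:
    print(f["week_number"], f["unit_price"], round(f["log_units_sold"], 3))
```

The logged columns are what a log-log regression would consume, which is one common way to read elasticities directly off the coefficients.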

__Modelling__

Building models is the fancy stage of the entire data science process. I did not get to build an advanced model, because determining a product’s elasticity calls for a statistical algorithm like linear regression. Nevertheless, I was lucky to apply linear regression in depth, avoiding over-fitting and learning more about the assumptions and interpretation of linear regression results. I believe this is a strong foundation for building advanced models in future. The key thing to note is that it is better to solve a business problem with an algorithm that fits the requirement and scales than to build an advanced model that does not. I estimated both the price elasticity and the cross-price elasticity of the target brand by customer segment and time period to present a comparative picture.

As linear regression helps find the statistical relationship between variables, the price elasticity can be determined as:

$$Q = \alpha + \beta_{1}X_{1} + \beta_{2}X_{2} + \beta_{3}X_{3} + \beta_{4}X_{4} + e_{i}$$

- $Q$: total quantity sold of the target product
- $\alpha$: intercept
- $\beta_{1}$: coefficient (slope) of the target product
- $X_{1}$: price of the target product
- $\beta_{2}$: coefficient (slope) of the related product
- $X_{2}$: price of the related product
- $\beta_{3}$: coefficient (slope) of the competitor product
- $X_{3}$: price of the competitor product
- $\beta_{4}$: coefficient (slope) of the time period
- $X_{4}$: time period
- $e_{i}$: error term

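Price elasticity of demand is the ratio of the percentage change in quantity to the percentage change in price:

$$\text{Price elasticity of demand} = \frac{\Delta Q / Q}{\Delta X_{1} / X_{1}} = \frac{\Delta Q}{\Delta X_{1}} \cdot \frac{X_{1}}{Q}$$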
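One common way to estimate elasticities with linear regression is a log-log specification, where the coefficient on log price is the elasticity itself. The sketch below assumes that form; the post mentions log transformations but does not give the exact specification used. All data is synthetic, generated with a known own-price elasticity of -2.0 so the fit can be checked, and the variable names are illustrative.

```python
import numpy as np

# Synthetic weekly data; every number here is made up for illustration.
rng = np.random.default_rng(0)
n_weeks = 104
week = np.arange(1, n_weeks + 1)                    # X4: time period
own_price = rng.uniform(4.0, 6.0, n_weeks)          # X1: target product price
related_price = rng.uniform(3.0, 5.0, n_weeks)      # X2: related product price
competitor_price = rng.uniform(4.0, 6.0, n_weeks)   # X3: competitor price

# Generate demand with a known own-price elasticity (assumed, not from the project).
TRUE_ELASTICITY = -2.0
log_qty = (
    9.0
    + TRUE_ELASTICITY * np.log(own_price)
    + 0.3 * np.log(related_price)
    + 0.8 * np.log(competitor_price)
    + 0.001 * week
    + rng.normal(0.0, 0.05, n_weeks)
)

# Design matrix: intercept, log prices, and the time period.
X = np.column_stack([
    np.ones(n_weeks),
    np.log(own_price),
    np.log(related_price),
    np.log(competitor_price),
    week,
])

# Ordinary least squares via numpy's least-squares solver.
coef, *_ = np.linalg.lstsq(X, log_qty, rcond=None)
alpha, b_own, b_related, b_competitor, b_time = coef

print(f"own-price elasticity estimate:      {b_own:.2f}")
print(f"cross-price (related) estimate:     {b_related:.2f}")
print(f"cross-price (competitor) estimate:  {b_competitor:.2f}")
```

In this form, the coefficients on the other products' log prices play the role of cross-price elasticities. Fitting the same model separately per retailer or per period would give the kind of comparative picture described above.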

__Evaluation__

The key evaluation metric was RMSE (Root Mean Square Error), which should be as low as possible for the linear regression model to be considered accurate. I also considered MAE (Mean Absolute Error) and predicted R² as further evaluation metrics.
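For reference, these three metrics can be sketched in plain Python. Predicted R² is computed here as 1 - PRESS / SST using leave-one-out refits of a single-predictor line, a simplification of the multi-variable model in this project. The function names and toy numbers are illustrative assumptions.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def predicted_r2(x, y):
    """1 - PRESS / SST, where PRESS sums squared leave-one-out prediction errors.

    Expects x and y as Python lists of equal length.
    """
    press = 0.0
    for i in range(len(x)):
        # Refit without observation i, then predict it.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / sst


# Toy data, made up for illustration.
log_price = [1.40, 1.45, 1.50, 1.55, 1.60, 1.65]
log_qty = [7.30, 7.15, 7.08, 6.95, 6.80, 6.74]

a, b = fit_line(log_price, log_qty)
fitted = [a + b * p for p in log_price]
print("RMSE:", rmse(log_qty, fitted))
print("MAE:", mae(log_qty, fitted))
print("Predicted R2:", predicted_r2(log_price, log_qty))
```

Because each point is predicted by a model that never saw it, predicted R² drops when a model over-fits, which makes it a useful companion to RMSE and MAE.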

Finally, what can be inferred from this exercise?

The target product had a price elasticity of -2.27. Intuitively, a 10% decrease in its price leads to a 22.7% increase in demand. If the retailer sells 1,000 units at $5 each in a week, reducing the price by 10% to $4.50 raises demand to 1,227 units. The retailer’s weekly revenue then increases from $5,000 to $5,521.50. The cleaning brand manufacturer can use this information to influence the retailer and optimize the price to gain more margin and market share for its product.
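The arithmetic above can be reproduced with a small helper. It uses the simple linear approximation (percentage change in demand = elasticity × percentage change in price); the function name and signature are illustrative.

```python
def apply_price_change(units, price, elasticity, price_change_pct):
    """Estimate new demand, price, and revenue after a percentage price change.

    Linear approximation: % change in demand = elasticity * % change in price.
    """
    demand_change_pct = elasticity * price_change_pct
    new_units = units * (1 + demand_change_pct / 100)
    new_price = price * (1 + price_change_pct / 100)
    return new_units, new_price, new_units * new_price


new_units, new_price, new_revenue = apply_price_change(
    units=1000, price=5.0, elasticity=-2.27, price_change_pct=-10
)
print(f"{new_units:.0f} units at ${new_price:.2f} -> ${new_revenue:.2f}")
# prints: 1227 units at $4.50 -> $5521.50
```

The approximation is most reliable for small price changes. For larger ones, the elasticity itself may not stay constant.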

To conclude, I approached this entire project using the CRISP-DM methodology, which was the data analytics project framework of the PGPDM course.