Decluttering Data Science: The Expert Review
In this second part of a three-part series on decluttering the idea of what Data Science is, members of the Analytics Leaders Network (ALN) share their thoughts.
Find Part 1 of the series here.
Data Analytics – Art, Science, or Engineering? – Avik Sarkar, Head, Data Analytics Cell, Niti Aayog
As a child growing up in West Bengal, it was hard to avoid discussions about the various martial arts that young people practised, like Taekwondo, Jujitsu, Kung Fu, Karate, etc. The more interesting part was the debate between these groups, each trying to prove how superior its style of martial arts was compared to the others. Over time, and with a few grey hairs, I have realised how irrelevant those debates were: the only objective of martial arts is self-defence in certain circumstances. The current discussion on data analytics seems quite similar, with people with and without grey hairs participating in the debate!
Data Analytics is growing in popularity, as are related terms like Data Science, Big Data, Artificial Intelligence, Business Analytics, Business Intelligence, Machine Learning, etc. etc. (yes, the two etc. are on purpose, as the list is very long). As more and more people enter this domain with various backgrounds and expectations, we will see more such terms to describe this agile domain. To me, the objective of this field is very simple: generate insights from data and apply them in the field.
The field is like the Bollywood film industry of the early 1900s, when few professionals were involved in it; the field must evolve over the years. For a long time, data analytics was restricted to researchers working in different domains, each of whom had a limited view of the field. For example, statisticians would focus on surveys, estimation, etc.; social scientists would focus more on regression and causality; and computer scientists would focus more on algorithms. Now the field is becoming more popular and is evolving into a truly interdisciplinary domain. The confluence of science, arts and engineering has opened up the enormous possibilities of data analytics today.
Data analytics professionals need to master the art of storytelling with data. This is a very effective way to communicate findings, starting from the high level and moving to the finer insights, and it helps in getting a larger group interested in what the data shows. Intuitive, interactive visualization can often help in the process of storytelling. Storytelling is primarily seen as an arts activity, hence data analytics can pick up new visualization and storytelling techniques from the arts domain.
Then there is the aspect of dealing with large amounts of data and making sense of it. Scientific domains like mathematics, physics, statistics and computer science lend much of the technology and many of the algorithms needed to do so. Often one cannot process all the data in a single go and must deal with samples; the field of statistics has developed on the principle of drawing insights about a population from intelligently chosen samples. Hence the field of data analytics has a lot to learn from the various fields of science.
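To make the sampling idea concrete, here is a minimal sketch in Python. It assumes a purely synthetic population and a simple random sample; the numbers, the variable names and the `estimate_mean` helper are all hypothetical and are not taken from any real dataset. It estimates the population mean from a 1% sample and reports a rough 95% margin of error.

```python
import math
import random
import statistics


def estimate_mean(sample):
    """Return (sample mean, standard error of the mean) for a list of numbers."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # statistics.stdev uses the n - 1 (sample) denominator.
    std_error = statistics.stdev(sample) / math.sqrt(n)
    return mean, std_error


# Hypothetical population: 100,000 synthetic values (made-up data).
rng = random.Random(42)
population = [rng.gauss(50.0, 10.0) for _ in range(100_000)]

# Simple random sample without replacement: 1% of the population.
sample = rng.sample(population, 1_000)

sample_mean, std_error = estimate_mean(sample)
population_mean = statistics.fmean(population)

# 1.96 standard errors gives an approximate 95% margin of error.
print(f"sample estimate: {sample_mean:.2f} +/- {1.96 * std_error:.2f}")
print(f"population mean: {population_mean:.2f}")
```

In practice the full population is usually not available for comparison, which is exactly why the standard error matters: it indicates how far the sample estimate is likely to be from the true value.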
Data from different datasets must be combined to get a complete picture, and these datasets often reside on different servers. Also, once the insights from the data are generated, it is important to apply them in real time for better outcomes. These and many other aspects of data analytics bring in the engineering element, which is crucial in today's world when delivering real-time recommendations on a shopping portal or dynamic discounts on a travel portal, etc. Hence, data analytics has to deal heavily with the engineering element too. All of this makes the field of data analytics a true confluence of arts, science, and engineering.
Data science is an Art – Indranath Mukherjee, Head – Strategic Analytics, XL Catlin India
Although there are strong elements of science and engineering in the space, data science is a lot more than just feeding data into a computer and applying some algorithm to magically get the insight. We are problem solvers first, and fortunately we are skilled in the art of decision science.
At a time when everyone is talking about AI and ML, it is very easy to jump on the bandwagon of sexy algorithms. Knowing a bunch of algorithms is great, but that won't necessarily make you a successful data scientist. My sincere advice to aspiring data scientists is this: learn as many algorithms as you want, but learn the basics well. Understanding the problem at hand is the first step. Knowing the domain well is essential, as your reading of the data will be much more insightful when you know the domain.
I can’t emphasize enough how important the step of feature engineering is in a modeling engagement. When you have done your exploratory data analysis (EDA) and feature engineering well, you are ready for modeling. And for most problems, a simple approach is likely to give you a good result that can be implemented in a real-life scenario. Some complex algorithms, like Deep Recurrent Neural Networks, may give marginally better results, but in many scenarios the resulting solution may be too complex to implement.
To simplify things further, understanding all the algorithms under the sun is neither a necessary nor a sufficient condition for becoming a good data scientist. Knowing logistic and linear regression and some decision tree models will be more than enough to start your journey. Focus on the problem at hand; learn how to be creative in your solution approach. Learn the art of EDA and feature engineering by practicing a lot.
When it comes to applying algorithms, start from a simple model and see how much incremental lift you get by doing something more complex. The scientific and engineering elements of your armoury will mostly be available in the public domain; you must develop the capability to think creatively and critically. Deep learning is not the silver bullet that will put an end to all our problems. Let the dust settle in your mind first, and enjoy learning the art of data science.
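The "start simple and measure the lift" habit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is a made-up toy series, the simple model just predicts the training mean, the more complex model is a least-squares line, and the helper names (`fit_line`, `mse`, `relative_lift`) are hypothetical. The same comparison applies one step up, for example a regression against a deep network.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(predictions, actuals):
    """Mean squared error between two equal-length lists."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)


def relative_lift(baseline_error, candidate_error):
    """Fraction of the baseline error removed by the candidate model."""
    return (baseline_error - candidate_error) / baseline_error


# Made-up toy data, split into training and held-out points.
train_x, train_y = [0, 1, 2, 3, 4, 5], [1.2, 2.9, 5.1, 6.8, 9.1, 10.9]
test_x, test_y = [6, 7], [13.1, 14.8]

# Simple model: always predict the training mean.
train_mean = sum(train_y) / len(train_y)
baseline_mse = mse([train_mean] * len(test_y), test_y)

# More complex model: a fitted straight line.
slope, intercept = fit_line(train_x, train_y)
candidate_mse = mse([slope * x + intercept for x in test_x], test_y)

lift = relative_lift(baseline_mse, candidate_mse)
print(f"baseline MSE:  {baseline_mse:.3f}")
print(f"candidate MSE: {candidate_mse:.3f}")
print(f"relative lift: {lift:.1%}")
```

If the lift from the more complex model is small on held-out data, the simpler model is usually the easier one to implement and maintain.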
Head here to read the concluding part of the 3-part series.
If you are an analytics leader and wish to share your experience or thoughts about the Indian analytics space, I invite you to write to me.