Data Science and Big Data Analytics may seem interchangeable at times, but they serve different purposes. Dr. Ghosh explains that Data Science is the in-depth evaluation of data and has been around for decades (probably under different names).
Big Data, on the other hand, arrived on the scene a little later. So despite being called ‘Big’ data, it is the younger sibling in this case. It differs in its approach, which is shaped by the problem you want to solve.
Big Data is not necessarily about having a huge amount of data. There are two dimensions to it. One is amassing different varieties of data. The other is having to answer certain questions when you do not have the data to do so. Big Data doesn’t necessarily mean sitting on a ready-made chunk of data. It often involves working around the lack of required data.
For example, take Donald Trump’s presidential campaign. It relied on a combination of structured and unstructured data, pooling social media data (unstructured) with information gathered from surveys and polls (structured).
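The pooling of structured and unstructured data described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how any real campaign worked: the poll rows, the posts, the keyword list and the function names are all hypothetical and invented for the example.

```python
import re
from collections import Counter

# Hypothetical structured data: one tidy row per region (made-up values).
polls = [
    {"region": "north", "support_pct": 48.0},
    {"region": "south", "support_pct": 55.0},
]

# Hypothetical unstructured data: free-text posts (made-up text).
posts = [
    {"region": "north", "text": "Great rally today!!! Love the energy"},
    {"region": "north", "text": "Not convinced by the speech..."},
    {"region": "south", "text": "LOVE the new plan, great stuff"},
]

# Assumed keyword list; a real project would use a proper sentiment model.
POSITIVE_WORDS = {"great", "love"}


def tokenize(text):
    """Lowercase the text and keep only alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def positive_mentions(posts):
    """Count positive keywords per region in the free-text posts."""
    counts = Counter()
    for post in posts:
        tokens = tokenize(post["text"])
        counts[post["region"]] += sum(1 for t in tokens if t in POSITIVE_WORDS)
    return counts


def combine(polls, posts):
    """Attach the text-derived count to each structured poll row."""
    mentions = positive_mentions(posts)
    return [
        {**row, "positive_mentions": mentions.get(row["region"], 0)}
        for row in polls
    ]


combined = combine(polls, posts)
for row in combined:
    print(row)
```

The point of the sketch is the shape of the work: the unstructured text has to be cleaned and reduced to something countable before it can sit next to the structured rows.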
Combining Big Data and Data Science
The Big Data problem is that much of the data is unstructured and may need cleaning and refining before it is usable. The volume of data may be large too. So how do you tackle the challenge of combining the two and drawing insights from them?
That’s where Data Science comes into the picture. It trains people in the skill sets needed to bring both of the above-mentioned factors together. So Big Data is more like a topic (theory), while Data Science is more about training. They are not two separate concepts; they are interlinked and interdependent. However, they are not substitutes for each other.
Data Science has also evolved over time. Initially, it was more like computer science or engineering. However, industry experts soon realized that slicing and dicing a huge data set as required was not enough to produce meaningful insights. To make reliable predictions, they needed to understand the relationships between all the elements. That paved the way for the tools used by statisticians and economists, which, when tied together, define the function of a data scientist.
Big Data largely focuses on storing and processing data. It usually involves a massive amount of data, which cannot be processed effectively with traditional data segregation tools. Data Science, on the other hand, is more focused on the decisions and actions based on the collected data. It involves the use of mathematical, machine learning and statistical algorithms for generating and using data.