Will Big Data Replace the Data Warehouse?
I have often come across this question, sometimes asked directly by a few of my colleagues and sometimes raised as a point of discussion while designing business intelligence systems for clients.
Data warehousing has been a buzzword for the past two decades, and big data has been a hot trend over the last decade. Let’s find out what the answer to this question could be.
For anyone who is not deeply familiar with these technologies, the obvious first thought is that the newer big data will replace the older data warehouse. The similarities between the two make this assumption even more tempting:
- Both hold a lot of data
- Both can be used for reporting
- Both are managed by electronic storage devices
Even so, big data and the data warehouse are not interchangeable. Why?
What is a data warehouse?
Data warehousing is the process of extracting data from one or more homogeneous or heterogeneous data sources, transforming it, and loading it into a data repository. The repository supports reporting and data analysis, which in turn help an organization make better decisions and improve its performance.
The data repository produced by this process is the data warehouse.
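The extract, transform, and load steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source systems (`crm_rows`, `billing_rows`), the `sales` table, and the in-memory SQLite database standing in for the warehouse are all hypothetical and chosen for brevity, not taken from any particular product.

```python
import sqlite3

# Hypothetical source data: two systems that store the same kind of
# information in different shapes (heterogeneous sources).
crm_rows = [
    {"customer": "Acme", "amount": "1,200.50"},
    {"customer": "Globex", "amount": "300"},
]
billing_rows = [("acme", 99.5), ("Initech", 10.0)]


def extract():
    """Yield (source, customer, amount) tuples from each source system."""
    for row in crm_rows:
        yield "crm", row["customer"], row["amount"]
    for customer, amount in billing_rows:
        yield "billing", customer, amount


def transform(records):
    """Normalize customer names and convert amounts to floats."""
    for source, customer, amount in records:
        name = customer.strip().title()
        value = float(str(amount).replace(",", ""))
        yield source, name, value


def load(conn, records):
    """Write the cleaned records into the (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(source TEXT, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl():
    """Run extract -> transform -> load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    return conn


conn = run_etl()
# A simple report over the consolidated data.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM sales "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Once the data is loaded, the report at the end can treat "Acme" from the CRM and "acme" from billing as the same customer, which is the kind of consistency the transformation step is meant to provide.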
What is big data?
Big data is characterized by the volume, variety, and velocity of data: how big the data is, how fast it arrives, and how varied it is together determine what counts as “Big Data”. The 3 V’s of big data were articulated by industry analyst Doug Laney in the early 2000s.
- Volume. Organizations collect data from a variety of sources, including business transactions, social media, and sensor or machine-to-machine data. In the past, storing it would’ve been a problem, but new technologies (such as Hadoop) have eased the burden.
- Velocity. Data streams in at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors, and smart metering are driving the need to deal with torrents of data in near-real time.
- Variety. Data comes in all types of formats – from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data and financial transactions.
Why does an organization want big data or a data warehouse?
- Big data: Organizations want a big data solution because many corporations hold large amounts of data. That data, if unlocked properly, can yield valuable information that leads to better decisions, which in turn can lead to more revenue, more profitability, and more customers. That is what most corporations want.
- Data warehouse: Organizations need a data warehouse in order to make informed decisions. In order to really know what is going on in your corporation, you need data that is reliable, believable and accessible to everyone.
The two motivations look similar, but there is a clear difference. A big data solution is a repository that holds lots of data, often without a settled idea of what will be done with it, whereas a data warehouse is designed with the clear intention of supporting informed decisions. Further, a big data solution can be used for data warehousing purposes.
Why is it like comparing apples to oranges?
Big data and the data warehouse are two different things; comparing them is like comparing an apple to an orange.
- A big data solution is a technology
- A data warehouse is an architecture
A technology, such as big data, is a means to store and manage large amounts of data. Organizations make use of various big data solutions to store a large volume of data at lower cost.
A data warehouse, by contrast, is a framework for organizing data to give a single version of the truth. Typically, a data warehouse is built to consolidate data from varied sources and organize it in an easily readable way. It also provides a data lineage capability that helps trace the origin of the data.
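To make "a single version of the truth" and "data lineage" a little more concrete, here is a minimal Python sketch. It assumes a simple rule that is not from any specific warehouse product: when sources disagree, the value from the source listed first in `priority` wins, and every merged value remembers which source it came from. The names `SourcedValue`, `consolidate`, and the sample `crm` and `billing` records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcedValue:
    value: str
    source: str  # name of the originating system (the lineage)


def consolidate(records_by_source, priority):
    """Merge per-source records into one record.

    Sources earlier in `priority` win when two sources disagree;
    missing (None) values are skipped so a later source can fill them.
    """
    merged = {}
    for source in priority:
        for field, value in records_by_source.get(source, {}).items():
            if field not in merged and value is not None:
                merged[field] = SourcedValue(value, source)
    return merged


# Hypothetical records for the same customer in two systems.
records = {
    "crm": {"email": "jane@example.com", "phone": None},
    "billing": {"email": "j.doe@example.com", "phone": "555-0100"},
}

customer = consolidate(records, priority=["crm", "billing"])
# Lineage: which system each consolidated field came from.
lineage = {field: sv.source for field, sv in customer.items()}
print(lineage)
```

Here the email comes from the CRM and the phone number from billing, and the `lineage` mapping lets anyone reading a report trace each value back to its origin.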
So, what is the conclusion?
As the differences above show, big data and the data warehouse are not the same and therefore not interchangeable, so a big data solution will not replace the data warehouse. Depending on its needs (and not because the two are similar), an organization can have any of the following combinations:
- Only big data solution
- Only data warehouse solution
- Big data as well as data warehouse solution
This is a guest post by Manjunath Hegde, who has over a decade’s experience in Business Intelligence and working with analytics related technologies. He is currently enrolled in the Executive Program in Business Analytics by Jigsaw Academy and MISB Bocconi.