Big Data and the Data Warehouse

So you have a traditional data warehouse and a BI implementation you have invested heavily in. You have critical business reports running every day. However, with the advent of Big Data, you are wondering what you should do with your enterprise data warehouse and BI tools.

The truth is that Big Data is NOT a substitute for the data warehouse and BI, at least not yet. In the future, BI tools will probably mature enough to do against Big Data everything they can do against data warehouses and OLTP systems today, but we are not there yet. Today Big Data should be used to augment the data warehouse, not replace it. Here is how both systems can co-exist:

  • Keep your structured, summarized, ETL’d data in your enterprise data warehouse as is.
  • Use Big Data systems like Hadoop to store massive amounts of unstructured data such as logs, social media content, reviews, comments and other text. Big Data systems can store and process massive amounts of data on commodity hardware and scale really well. Hence the Big Data system becomes an archive of data.
  • Analyze your Big Data using tools such as HBase and Hive, extract the meaningful pieces, and load them into the warehouse so you can report against them.
  • Use the warehouse to bring together structured data and filtered unstructured data from across the enterprise to offer accurate Business Intelligence.

For example, let’s say you are an online retail company and have traditionally stored structured information like orders and customer accounts in the data warehouse. Now there is a flood of new unstructured data like reviews, comments, product descriptions and social media data from customers. If you had to store all of that in the database it would be very costly. But you don’t want to throw it away either, since you never know what you will need. So dump all of that unstructured data in a Big Data file system. Derive the pieces you need and batch load them into the warehouse. Now you can use your BI tools to report against combined structured and unstructured data that has been put in dimensional data marts or OLAP cubes. The BI tools available today work well against dimensional marts and cubes, providing a rich set of functions and capabilities for reporting and ad hoc querying.
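To make the retail example concrete, here is a minimal sketch in Python of the "derive and batch load" step. Everything in it is an assumption for illustration: the sample reviews, the table names (orders, review_summary) and the function names are hypothetical, and an in-memory SQLite database stands in for the real warehouse. In practice the summarizing step would run as a Hive query or a Hadoop job over the raw files rather than in a Python loop.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw reviews, standing in for files kept in the Big Data store.
RAW_REVIEWS = [
    {"product_id": "P1", "rating": 5, "text": "Great blender, very quiet"},
    {"product_id": "P1", "rating": 3, "text": "Works, but the lid leaks"},
    {"product_id": "P2", "rating": 1, "text": "Stopped working after a week"},
]


def summarize_reviews(reviews):
    """Reduce raw reviews to one small structured row per product."""
    totals = defaultdict(lambda: [0, 0])  # product_id -> [review count, rating sum]
    for review in reviews:
        totals[review["product_id"]][0] += 1
        totals[review["product_id"]][1] += review["rating"]
    return [(pid, n, s / n) for pid, (n, s) in sorted(totals.items())]


def load_and_report(conn, summary_rows):
    """Batch load the summary next to order data, then report on both."""
    # Structured data the warehouse already holds (sample rows, assumed).
    conn.execute("CREATE TABLE orders (product_id TEXT, quantity INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("P1", 10), ("P2", 4), ("P1", 5)],
    )
    # Filtered, summarized result derived from the unstructured data.
    conn.execute(
        "CREATE TABLE review_summary ("
        "product_id TEXT PRIMARY KEY, review_count INTEGER, avg_rating REAL)"
    )
    conn.executemany("INSERT INTO review_summary VALUES (?, ?, ?)", summary_rows)
    # The kind of combined query a BI tool would issue against the warehouse.
    return conn.execute(
        "SELECT o.product_id, SUM(o.quantity), s.review_count, s.avg_rating "
        "FROM orders o JOIN review_summary s ON s.product_id = o.product_id "
        "GROUP BY o.product_id, s.review_count, s.avg_rating "
        "ORDER BY o.product_id"
    ).fetchall()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    for row in load_and_report(warehouse, summarize_reviews(RAW_REVIEWS)):
        print(row)
```

Note that only the small summary rows reach the warehouse. The full review text stays in the Big Data store, where it can be re-processed later if a new question comes up.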

Remember that Big Data tools and skills are specialized and still emerging, so a consulting engagement for a Big Data project is bound to be expensive. Know the problems you want to solve with Big Data, and don’t try to replace the enterprise data warehouse you have invested so heavily in with Hadoop!