Big Data and the Data Warehouse

So you have a traditional data warehouse and a heavily invested BI implementation, with critical business reports running every day. However, with the advent of Big Data, you are wondering: what should you do with your enterprise data warehouse and BI tools?

The truth is that Big Data is NOT a substitute for the data warehouse and BI, at least not yet. In the future, BI tools will probably mature to do against Big Data everything they can do against data warehouses and OLTP systems today, but we are not there yet. For now, Big Data should be used to augment the data warehouse, not replace it. Here is how the two systems can co-exist:

  • Keep your structured, summarized, ETL’d data in your enterprise data warehouse as is.
  • Use Big Data systems like Hadoop to store massive amounts of unstructured data such as logs, social media content, reviews, comments and other text. Big Data systems can store and process massive amounts of data on commodity hardware and scale very well, so the Big Data system becomes your archive of raw data.
  • Analyze your Big Data using HBase/Hive, extract the meaningful parts, and load them into the warehouse so you can report against them.
  • Use the warehouse to bring together structured data and filtered unstructured data from across the enterprise to offer accurate Business Intelligence.
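
To make the extraction step more concrete, here is a minimal sketch in Python. It assumes a hypothetical log format (timestamp, user, action, product) and uses plain Python over a few lines as a stand-in for what a Hive query or MapReduce job would do across a Hadoop cluster. What matters is the shape of the output: a large volume of raw events shrinks to a handful of summary rows small enough to load into the warehouse.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw log lines as they might sit in the Big Data store.
# The "timestamp user=... action=... product=..." format is an assumption.
RAW_LOGS = [
    "2013-05-01T10:15:02 user=17 action=view product=P100",
    "2013-05-01T10:16:40 user=17 action=view product=P200",
    "2013-05-01T11:02:13 user=42 action=view product=P100",
    "2013-05-02T09:30:00 user=42 action=purchase product=P100",
    "malformed line that the extraction step should skip",
]


def parse_line(line):
    """Turn one raw log line into a (date, action, product) tuple, or None if malformed."""
    parts = line.split()
    if len(parts) != 4:
        return None
    try:
        timestamp = datetime.strptime(parts[0], "%Y-%m-%dT%H:%M:%S")
        fields = dict(p.split("=", 1) for p in parts[1:])
        return timestamp.date().isoformat(), fields["action"], fields["product"]
    except (ValueError, KeyError):
        return None


def summarize(lines):
    """Reduce raw events to sorted (date, product, action, count) rows for the warehouse."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:  # skip malformed lines instead of failing the batch
            counts[parsed] += 1
    return sorted(
        (date, product, action, n) for (date, action, product), n in counts.items()
    )


if __name__ == "__main__":
    for row in summarize(RAW_LOGS):
        print(row)
```

Skipping malformed lines rather than aborting is one reasonable choice for a raw archive, where some noise is expected; a stricter pipeline might route them to a reject file instead.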

For example, let’s say you are an online retail company that has traditionally stored structured information, like orders and customer accounts, in the data warehouse. Now there is a flood of new unstructured data from customers: reviews, comments, product descriptions, social media posts. Storing all of that in the database would be very costly. But you don’t want to throw it away either, since you never know what you will need. So dump all of that unstructured data into a Big Data file system, derive the pieces you need, and batch load them into the warehouse. Now you can use your BI tools to report against the combined structured and unstructured data once it has been put into dimensional data marts or OLAP cubes. The BI tools available today work well against dimensional marts and cubes, providing a rich set of functions and capabilities for reporting and ad hoc querying.
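
The retail example can be sketched end to end in Python. This is an illustration under assumptions, not a reference design: the JSON review format, the `orders` and `review_summary` tables, and the function names are all hypothetical, and an in-memory SQLite database stands in for the enterprise warehouse. Raw reviews are reduced to a review count and average rating per product, batch loaded with `executemany`, and then joined with the structured order data so one report covers both.

```python
import json
import sqlite3
from collections import defaultdict

# Hypothetical review records stored as JSON lines in the Big Data store.
RAW_REVIEWS = [
    '{"product_id": "P100", "rating": 5, "text": "Great blender"}',
    '{"product_id": "P100", "rating": 3, "text": "Loud but works"}',
    '{"product_id": "P200", "rating": 4, "text": "Nice kettle"}',
]


def derive_review_summary(raw_lines):
    """Keep only what reporting needs: (product_id, review_count, avg_rating)."""
    ratings = defaultdict(list)
    for line in raw_lines:
        record = json.loads(line)
        ratings[record["product_id"]].append(record["rating"])
    return [(pid, len(r), sum(r) / len(r)) for pid, r in sorted(ratings.items())]


def build_demo_warehouse():
    """In-memory SQLite standing in for the warehouse, with hypothetical order data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, product_id TEXT, quantity INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "P100", 2), (2, "P200", 1), (3, "P100", 1), (4, "P300", 5)],
    )
    return conn


def load_and_report(conn, review_rows):
    """Batch-load derived review rows, then report units sold alongside review stats."""
    with conn:  # one transaction per batch; INSERT OR REPLACE refreshes earlier loads
        conn.execute(
            "CREATE TABLE IF NOT EXISTS review_summary "
            "(product_id TEXT PRIMARY KEY, review_count INTEGER, avg_rating REAL)"
        )
        conn.executemany("INSERT OR REPLACE INTO review_summary VALUES (?, ?, ?)", review_rows)
    return conn.execute(
        """
        SELECT o.product_id, o.units_sold, r.review_count, r.avg_rating
        FROM (SELECT product_id, SUM(quantity) AS units_sold
              FROM orders GROUP BY product_id) AS o
        LEFT JOIN review_summary AS r ON r.product_id = o.product_id
        ORDER BY o.product_id
        """
    ).fetchall()


if __name__ == "__main__":
    warehouse = build_demo_warehouse()
    for row in load_and_report(warehouse, derive_review_summary(RAW_REVIEWS)):
        print(row)
```

A product with no reviews (P300 in the sample data) still appears in the report with empty review columns, because the left join keeps every product that has orders.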

Remember that Big Data tools and skills are specialized and still emerging, so a consulting engagement for a Big Data project is bound to be expensive. Know the problems you want to solve with Big Data, and don’t try to replace your heavily invested enterprise data warehouse with Hadoop!


From Data to Decisions or Decisions to Data?

Analytics helps transform your data into information and derive insights for making decisions:

Data -> Information -> Decisions

We have heard this over and over again. But where should one start? One of the key issues I am seeing with the hype around data analytics is that people start with the data and say, “Do something with all the data I collect, tap into all the sources of information available, and give me some insights. I will then use the insights to make decisions.” This is the wrong approach, and it can get you into lengthy and messy engagements.

Start with the decisions you need to make, and prioritize them. Then break each decision down into the questions you need to answer to make it, and sort those questions by importance. Once you have the most important questions, ask your Data Scientists/Analysts to answer them, and specify the format in which you want to see the answers. Once you have narrowed things down this way, you will realize you need to tap only a few specific data sources and may need only a limited set of tools and technology. Some questions can be answered by basic business intelligence reports; others may need deeper data mining of unstructured data.
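
The decisions-first process can also be written down as a simple data structure. The Python sketch below is illustrative only: the `Decision` and `Question` classes, the `plan` function, the priority and importance scores, and the source names are all hypothetical. It ranks decisions by priority, keeps the most important questions for each, and collects only the data sources those questions need, so a low-ranked question about competitor prices never pulls web scrapes into scope.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    importance: int      # higher means more important (hypothetical scale)
    data_sources: set    # sources believed necessary to answer this question
    answer_format: str   # how the decision maker wants to see the answer


@dataclass
class Decision:
    name: str
    priority: int        # higher means more urgent (hypothetical scale)
    questions: list = field(default_factory=list)


def plan(decisions, top_questions_per_decision=2):
    """Return (work_items, data_sources) covering only the top questions of each decision."""
    work_items = []
    sources = set()
    for decision in sorted(decisions, key=lambda d: d.priority, reverse=True):
        ranked = sorted(decision.questions, key=lambda q: q.importance, reverse=True)
        for question in ranked[:top_questions_per_decision]:
            work_items.append((decision.name, question.text, question.answer_format))
            sources |= question.data_sources  # only sources a kept question needs
    return work_items, sources


# Hypothetical example for the online retailer.
DECISIONS = [
    Decision("Which products to discount next quarter?", priority=2, questions=[
        Question("Which products have falling sales?", 3, {"orders"}, "monthly trend report"),
        Question("Which products get negative reviews?", 2, {"reviews"}, "ranked list"),
        Question("What do competitors charge?", 1, {"web_scrapes"}, "price table"),
    ]),
    Decision("Where to open a new warehouse?", priority=1, questions=[
        Question("Where are our customers concentrated?", 2, {"customer_accounts"}, "map by region"),
    ]),
]

if __name__ == "__main__":
    items, needed_sources = plan(DECISIONS)
    for item in items:
        print(item)
    print(sorted(needed_sources))
```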

Get wise about your analytics strategy: just because the world’s data is available at your fingertips doesn’t mean you need all of it. Being selective will save you cost in the long run.