Big Data Analytics: Size is not important

Posted on August 29, 2011


There was a time when databases came in desktop, departmental and enterprise sizes. There was nothing larger than ‘enterprise’, and very few enterprises needed databases that scaled to what was then the largest imaginable unit of data, the terabyte. They even named a database after it.

We now live in the world of the networked enterprise. Last year, according to IDC, the digital universe totalled 1.2 zettabytes of data. And we are only at the beginning of the explosion: that figure is set to grow by as much as 40 times by 2020. Massive data sets are being generated by web logs, social networks, connected devices and RFID tags. This is even before we connect our fridges (and we will) to the internet. Data volumes are growing at such a clip that we needed a new term, Big Data (I know), to describe them.

What is meant by ‘big’ is highly subjective, but the term is loosely used to describe volumes of data that cannot be dealt with by a conventional RDBMS running on conventional hardware. That is to say, alternative approaches to software, hardware or data architecture (Hadoop, MapReduce, columnar stores, distributed data processing, etc.) are required.

Big Data is not just more of the same, though. Big Data is fundamentally different. It’s new, and new data can present new opportunities. According to the McKinsey Global Institute, the use of big data is a key way for leading companies to outperform their peers. Leading retailers, like Tesco, are already using big data to take even more market share.

This is because Big Data represents a fundamental shift from capturing transactions for analysis to capturing interactions. The sources of today’s analytic applications are customer purchases, product returns and supplier purchase orders, whilst Big Data captures every customer click and conversation. It can capture each and every interaction. This represents an extraordinary opportunity to capture, analyse and understand what customers really think about products and services, or how they are responding to a marketing campaign as the campaign is running.

Deriving analytics from big data, that is, from content, unstructured data and natural language conversations, requires a new approach. In spite of the name, though, it’s less about the size and more about the structure (or absence of structure) and the level of detail at which organisations can now understand their businesses and their customers.