We are living in the time of Big Data. It comes from everywhere: from our cell phones and computers, from fuel pumps, from water sensors in meteorological stations, and from countless other sources. It is analyzed in real time (that means now, while you are reading this), and the results are used to send us advertisements for products and services, to calculate market trends, and to figure out how groups of people will vote, to name just a few of the uses. The systems that are emerging can do this kind of analysis at levels of detail (now called granularity) that amaze, compel and even horrify us.
What is Big Data? In 2001 Doug Laney, an analyst at Gartner, defined data growth challenges and opportunities as being three-dimensional, the "three Vs": increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources). Gartner later updated the definition: "Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization." Some organizations add a fourth V, "veracity," to the description.
The volume of data seems to come from the land of unimaginable quantities like petabytes. (A petabyte is 10^15 bytes of information, or one quadrillion bytes. An exabyte is 10^18 bytes, or one quintillion bytes.) The world's effective capacity to exchange information through telecommunications networks was 281 petabytes in 1986 and grew to 65 exabytes by 2007. The amount of traffic flowing over the internet is predicted to reach 667 exabytes annually by the end of 2013. Why should we care about this? For one, technological improvements mean that analyzing the human genome, which once took ten years, now takes less than a week.
But it also means that Big Brother can watch us, and find us anywhere, anytime, using facial recognition software. And there are financial risks with that amount of information moving around. The Falcon Credit Card Fraud Detection System, for example, protects 2.1 billion active credit card accounts worldwide on a minute-to-minute basis.
In developing countries, the accelerated expansion of cell phone and Internet usage means they are also active contributors to Big Data, which can be used in a number of positive ways. For instance, it can help identify and address disease outbreaks, and provide information about mobility patterns and labor demand on a real-time basis. We can help address social services issues, improve market management and responsiveness, and plug some important gaps. Big Data can also help us with early warning systems, awareness of situations in the moment they occur, and fast feedback on changes. One good recent example was here in the Philippines, where the sheer volume of video, cellular, and meteorological data provided during and after Typhoon Yolanda helped the world respond faster, and likely helped limit the number of lives lost.
It is not all good news, however. The challenges faced in using Big Data, particularly for developing countries, are enormous. Lack of privacy, limited access to information collected by private firms, failure to analyze the right data to answer the right questions, bias in results, and inadequate reflection of the cultural context can cause us to leap before we look. And the sheer volume of information means that “data about data” (metadata) has to be created in order to make those exabytes useful. We are developing tools for this, and starting to determine which data sources can provide information useful in improving the lives of those in developing countries.
We need to see how we can make Big Data work for everyone. This is the next big innovation.