According to IBM, every day we create 2.5 quintillion (2.5 × 10^18) bytes of data, so much that about 90% of the world’s data today was created in the last two years alone. This vast amount of rapidly generated data poses many challenges for data science and related fields that try to analyze and use it. Data that arrives this fast, in this volume and variety, and that is this difficult to process is called big data.
Big data is not a single technology but a combination of old and new technologies that help companies gain actionable insight. In this sense, big data is the capability to manage a huge volume of disparate data at the right speed and within the right time frame to allow real-time analysis and action.
The major challenges of big data are:
- Volume: How much data there is.
- Velocity: How fast the data is generated and processed.
- Variety: The different types of data.
Big data comprises almost every kind of data available in the world, both structured and unstructured. Unstructured data is data that does not follow a predefined data model; examples include text, sensor data, audio, video, images, click streams, and log files. In 1998, Merrill Lynch cited a rule of thumb that around 80-90% of all potentially usable business information may originate in unstructured form. More recently, analysts have predicted that data will grow 800% over the next five years, and Computerworld says that unstructured information might account for more than 70-80% of all data in an organization. It is therefore crucial to analyze and use these vast amounts of data for the organization’s benefit.
Global market for big data:
- Digital information is growing at 57% per annum globally.
- With global social network penetration and mobile internet penetration both under 20%, this growth has only just begun.
- All the data generated is valuable, but only if it can be interpreted in a timely and cost-effective manner.
- IDC expects revenues for big data technology infrastructure to grow by 40% per annum for the next three years.
IDC estimated that the world produced 0.18 zettabytes of digital information in 2006. This grew to 1.8 zettabytes in 2011, and IDC projected it would reach 35 zettabytes by 2020.
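As a rough consistency check (our own arithmetic, not part of the IDC figures), the compound annual growth rate implied by two data points is (end / start)^(1 / years) − 1. The sketch below applies it to the IDC numbers; the helper name `cagr` is a hypothetical illustration. Going from 0.18 ZB in 2006 to 1.8 ZB in 2011 is a tenfold increase in five years, roughly 58.5% per year, which is in line with the 57% per annum figure above. The 2011-to-2020 projection implies a slower rate of roughly 39% per year.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate implied by growing from `start` to `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


# IDC figures quoted above, in zettabytes
print(f"2006 -> 2011: {cagr(0.18, 1.8, 5):.1%}")  # 58.5%
print(f"2011 -> 2020: {cagr(1.8, 35, 9):.1%}")    # 39.1%
```

Because growth compounds, a tenfold increase over five years needs only about 58.5% per year, not 200% per year as a simple linear split would suggest.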
A few statistics that demonstrate the ‘big’ in big data:
- Twitter generates nearly 12 TB of data and 58 million tweets per day.
- Every hour, Walmart handles more than 1 million customer transactions. All of this information is transferred into databases holding more than 2.5 petabytes of data.
- According to FICO, its credit card fraud detection system helps protect more than 2 billion accounts worldwide.
- Facebook currently stores more than 45 billion photos across its user base, and that number is growing rapidly.
- Google processes 20 PB of data per day, and Google sites handle 87.8 billion searches worldwide each month.
Here are some interesting statistics from YouTube alone:
- More than 1 billion UNIQUE users visit YouTube every month.
- Over 4 billion hours of video are watched each month.
- 72 hours of video are uploaded every minute; watching just one minute’s worth of uploads without sleep would take 3 days.
Big data is the next big thing in the IT industry. To succeed in IT, it is crucial to adopt big data analytics and make use of the exploding amount of data that is available now and will be in the future.