Introduction to NServiceBus

Why NServiceBus?

Simply put, NServiceBus elegantly solves the “reliability” part of building a high-performing, scalable distributed system. Suppose you architected, designed and developed a high-end system that is highly scalable, available and extensible, but not reliable: its success would be highly questionable. Reliability is crucial to earning the customer’s trust, which in turn leads to success. Beyond reliability, incorporating NServiceBus into an SOA-based distributed system also helps you build highly scalable, extensible and available applications.

Say, for example, you architected an online bookstore application and, to provide high scalability, designed separate subsystems to process ordering, billing and shipping as follows.

order

The UI portal depends on the ordering system to populate the cart and perform checkout. The ordering system talks to billing to bill the customer, and after billing, the order goes to shipping so the product can be delivered to the customer. All these subsystems are loosely coupled and hosted on their own servers to provide high scalability, extensibility and so on. Even though you implemented high-availability and disaster-recovery solutions for every subsystem and expect them to be highly available, there is still a chance that the system becomes unavailable in extreme cases such as a network outage, a reboot after a patch installation, or a system crash. In the world of distributed systems, our systems sometimes depend on services from third-party providers; if their systems are unavailable during an important transaction, you end up losing the revenue from that transaction.

I personally encountered this problem on a well-known online portal, firstcry.com, which specializes in baby products. I ordered some bath products for my baby, and since only a few items were left in stock, I ordered quickly. I completed a perfect checkout, entering all the details including my card details, and the amount was debited from my card. But when I was taken back to their order status page from the third-party card payment page, a surprise was waiting for me: the order status said the payment had not yet been received. How could this be? I had entered all the details correctly, the checkout was perfect, and I had even received an SMS from my credit card provider saying the amount was debited. I called firstcry.com and they said, “Sorry sir, but the payment has not yet been received; you have to wait 24 hours to learn what happened and for your order to be processed.” That’s unacceptable. Still, I waited 24 hours patiently, and then I received a call from them saying, “Now we have received the payment, but sorry sir, we are now out of stock on the products you ordered.” That’s intolerable. Only a few items were left when I placed the order, and by the time they received the payment (which took 24 hours in my case), the products had gone out of stock because somebody else bought them, so I had to get my money refunded. They lost the revenue from that transaction and, worst of all, they lost the customer’s trust, which means I will not go to them for future purchases. Clearly, some system went down, fell out of sync, or did not complete the transaction reliably while I placed the order. I wish they had used a service bus to connect their subsystems reliably.

How does NServiceBus achieve reliability?

Traditional distributed systems are designed as request/response systems: the client makes a request to the server for a service and waits for the response, and the server processes the request and sends the response back. Because the processing is synchronous, the client process is blocked until it gets the response from the server, and if any part of the communication is down, the whole transaction fails. NServiceBus instead uses message-based communication over message queues such as MSMQ or RabbitMQ. It provides reliable communication through message queues and asynchronous processing, implementing the following principles.

1. Store and forward Messaging.

store and forward

Store and forward follows a “fire and forget” strategy: the client sends its request to the server asynchronously, and control returns immediately to the calling process. The request message is first stored locally on the client side before being sent to the server. So if the server crashes or is down before the request reaches it, the request is not lost: it is kept in the client-side queue storage and is retried automatically until it reaches the server successfully. Even if the server crashes after the request has been received and stored in the server’s queue but before it was processed successfully, the request will be processed once the server is up and ready again. This strategy brings two benefits: 1. client/server communication is reliable through asynchronous messaging; 2. since it follows the “fire and forget” approach, control returns immediately once the request is made, releasing the client resources (such as the client thread and memory) tied up with that request.
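The store-and-forward idea can be illustrated with a small in-memory C# sketch. It is a hypothetical illustration, not the NServiceBus API: the `StoreAndForwardSender` class and its `Send`/`Forward` methods are invented for this example, and an in-memory `Queue<string>` stands in for a durable client-side queue such as MSMQ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of store-and-forward messaging; this is NOT the NServiceBus API.
// An in-memory queue stands in for the durable client-side queue (e.g. MSMQ).
public class StoreAndForwardSender
{
    private readonly Queue<string> outbox = new Queue<string>();

    // Number of messages stored locally and not yet delivered.
    public int PendingCount => outbox.Count;

    // "Fire and forget": store the message locally and return immediately.
    public void Send(string message)
    {
        outbox.Enqueue(message);
    }

    // Forward stored messages in order. 'tryDeliver' returns false when the server
    // is unreachable; that message (and everything behind it) stays stored so a
    // later call can retry it. Returns the number of messages delivered.
    public int Forward(Func<string, bool> tryDeliver)
    {
        int delivered = 0;
        while (outbox.Count > 0)
        {
            if (!tryDeliver(outbox.Peek()))
            {
                break; // server down: keep the message for the next retry
            }
            outbox.Dequeue();
            delivered++;
        }
        return delivered;
    }
}
```

For example, calling `Forward(m => false)` while the server is down delivers nothing and leaves every message stored; a later `Forward` with a working delivery function drains them in their original order. A real transport would persist the outbox to disk and retry on its own schedule.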

2. Request/Response & One way messaging.

request response

Unlike traditional synchronous request/response processing, NServiceBus implements request/response as two one-way asynchronous messages: one from the client to the server carrying the request, and another from the server back to the client carrying the response. Again, this brings all the benefits of the store-and-forward strategy.
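To make the two one-way messages concrete, here is a minimal in-memory C# sketch. It is not the NServiceBus API: the `Message`, `Client` and `Server` types and the correlation-id matching are assumptions for illustration, and plain `Queue<T>` objects stand in for real message queues.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not the NServiceBus API: request/response built from two
// one-way queues whose messages are matched by a correlation id.
public class Message
{
    public Guid CorrelationId { get; }
    public string Body { get; }

    public Message(Guid correlationId, string body)
    {
        CorrelationId = correlationId;
        Body = body;
    }
}

public static class Server
{
    // Drain the request queue and put one reply per request on the response
    // queue, copying the request's correlation id onto the reply.
    public static void ProcessRequests(Queue<Message> requests, Queue<Message> responses)
    {
        while (requests.Count > 0)
        {
            Message request = requests.Dequeue();
            responses.Enqueue(new Message(request.CorrelationId, "Processed: " + request.Body));
        }
    }
}

public class Client
{
    private readonly Queue<Message> requests;
    private readonly Dictionary<Guid, string> pending = new Dictionary<Guid, string>();

    public Client(Queue<Message> requests)
    {
        this.requests = requests;
    }

    // Requests sent but not yet answered.
    public int PendingCount => pending.Count;

    // One-way message #1: queue the request and return immediately (no blocking).
    public Guid SendRequest(string body)
    {
        Guid id = Guid.NewGuid();
        pending[id] = body;
        requests.Enqueue(new Message(id, body));
        return id;
    }

    // One-way message #2 arrives later: match each reply to its pending request.
    public List<string> HandleResponses(Queue<Message> responses)
    {
        var handled = new List<string>();
        while (responses.Count > 0)
        {
            Message reply = responses.Dequeue();
            if (pending.Remove(reply.CorrelationId))
            {
                handled.Add(reply.Body);
            }
        }
        return handled;
    }
}
```

The client never waits: `SendRequest` returns as soon as the request is queued, and each reply is matched to its outstanding request later, whenever it arrives.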

3. Publish/Subscribe messaging.

pubsub_sub

The publish/subscribe strategy makes a distributed system extremely loosely coupled: each system has no knowledge of the other systems’ purpose. Pub/sub typically involves one publisher and multiple subscribers, all interested in a certain message that the publisher publishes. First, each subscriber must subscribe to the publisher, expressing its interest in that message. Along with the subscription message, each subscriber sends its own endpoint, to which the publisher delivers the published message. Likewise, the subscriber has to know the publisher’s endpoint in order to subscribe.

pubsub_pub
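The subscription flow above can be sketched in memory as follows. This is a hypothetical C# illustration, not the NServiceBus API: the `Publisher` class, the string message types and the endpoint names are assumptions, and a dictionary of `Queue<string>` objects stands in for each endpoint’s input queue.

```csharp
using System.Collections.Generic;

// Hypothetical sketch, not the NServiceBus API: a publisher keeps a subscription
// table (message type -> subscriber endpoints) and copies each published
// message into the input queue of every subscribed endpoint.
public class Publisher
{
    private readonly Dictionary<string, List<string>> subscriptions =
        new Dictionary<string, List<string>>();
    private readonly Dictionary<string, Queue<string>> endpointQueues;

    public Publisher(Dictionary<string, Queue<string>> endpointQueues)
    {
        this.endpointQueues = endpointQueues;
    }

    // A subscriber registers interest in a message type and supplies the
    // endpoint the publisher should deliver to.
    public void Subscribe(string messageType, string subscriberEndpoint)
    {
        if (!subscriptions.TryGetValue(messageType, out List<string> endpoints))
        {
            endpoints = new List<string>();
            subscriptions[messageType] = endpoints;
        }
        if (!endpoints.Contains(subscriberEndpoint))
        {
            endpoints.Add(subscriberEndpoint);
        }
    }

    // Deliver a copy to every subscriber; returns how many endpoints received it.
    // The publisher knows only addresses, not what subscribers do with the message.
    public int Publish(string messageType, string payload)
    {
        if (!subscriptions.TryGetValue(messageType, out List<string> endpoints))
        {
            return 0; // nobody is interested
        }
        foreach (string endpoint in endpoints)
        {
            endpointQueues[endpoint].Enqueue(messageType + ": " + payload);
        }
        return endpoints.Count;
    }
}
```

For example, after hypothetical Billing and Shipping endpoints both subscribe to `OrderPlaced`, one `Publish("OrderPlaced", ...)` call places a copy in each of their queues, and the publisher never needs to know what either subscriber does with it.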

Facade Pattern

Make it easy.  Provide an easy to use interface for a complex subsystem.

One touch play:

The weekend is here again. You rented a movie to watch, and you have a sophisticated Full HD TV and home theatre system. Here are the steps to watch a movie:

  1. Turn on the TV and the home theatre system.
  2. Tune the TV to accept input from the home theatre; it could also be synced to DISH, a computer, etc.
  3. Change the home theatre mode to DVD, as you are about to watch a DVD. It usually has modes like FM, TV, external input, etc.
  4. Turn off all the lights.
  5. Turn on the popcorn popper for delicious popcorn to eat while watching the movie.
  6. Put the DVD in the DVD player tray and play the movie.

What an exhausting procedure to follow every time you want to watch a movie; it is tedious and daunting.

What if all these systems were interconnected and controlled by one remote control with a “One touch play” button?

Just put the DVD in the tray and press the “One touch play” button. All six steps are performed automatically, and you enjoy the movie peacefully. That “One touch play” button implements the Façade pattern.

Definition from GoF:

Provide a unified interface to a set of interfaces in a subsystem. Façade defines a higher-level interface that makes the subsystem easier to use.

Class diagram:

facade

Participants:
Façade (Remote control)
– Knows which subsystem is responsible for a particular request.
– Delegates client request to appropriate subsystem objects.
Subsystem classes (TV, Home theatre system, popcorn popper and Light system)
– Implement subsystem functionality.
– Handle work assigned by the façade object
– Have no knowledge of the façade; that is, they keep no references to it.

Implementation:

TV class


using System;

public class TV
{
    public void TurnOn()
    {
        Console.WriteLine("TV turning on.");
    }

    public void SyncUpToHomeTheatre()
    {
        Console.WriteLine("TV syncing up to home theatre system.");
    }
}

Home Theatre system class


public class HomeTheatreSystem
{
    public void PowerOn()
    {
        Console.WriteLine("Home theatre system turning on.");
    }

    public void SetModeToDVD()
    {
        Console.WriteLine("Change mode to DVD player.");
    }

    public void Play()
    {
        Console.WriteLine("Playing movie");
    }
}

Light system class:


public class RoomLightSystem
{
    public void TurnOff()
    {
        Console.WriteLine("Turning off the lights");
    }
}

Popcorn popper class:


public class PopcornPopper
{
    public void TurnOn()
    {
        Console.WriteLine("Turning on the popper");
    }

    public void PopIt()
    {
        Console.WriteLine("Popping popcorn");
    }
}

Remote control class:


public class RemoteControl
{
    // The façade owns the subsystem objects it coordinates.
    private TV tv = new TV();
    private HomeTheatreSystem hts = new HomeTheatreSystem();
    private RoomLightSystem lights = new RoomLightSystem();
    private PopcornPopper popper = new PopcornPopper();

    // One call runs the whole "watch a movie" procedure in the right order.
    public void OneTouchPlay()
    {
        tv.TurnOn();
        hts.PowerOn();
        tv.SyncUpToHomeTheatre();
        hts.SetModeToDVD();
        lights.TurnOff();
        popper.TurnOn();
        popper.PopIt();
        hts.Play();
    }
}

Client code:


class Program
{
    static void Main(string[] args)
    {
        RemoteControl remote = new RemoteControl();
        remote.OneTouchPlay();
        Console.ReadKey();
    }
}

Sample output:


TV turning on.
Home theatre system turning on.
TV syncing up to home theatre system.
Change mode to DVD player.
Turning off the lights
Turning on the popper
Popping popcorn
Playing movie

As you can see, clients communicate with the subsystem by sending requests to the Façade (the remote control), which forwards them to the appropriate subsystem objects. Although the subsystem objects perform the actual work, the façade may have to do some work of its own to translate its interface into the subsystem interfaces.

A simple implementation of the decorator pattern

For a detailed explanation of the decorator pattern, please check this link: http://alagesann.com/2013/08/16/decorator-pattern-made-easy/


using System;

public interface IPizza
{
    int GetPrice();
}

public class Pizza : IPizza
{
    public int GetPrice()
    {
        return 10;
    }
}

// Decorator: wraps any IPizza and adds the cheese price.
public class PizzaWithCheese : IPizza
{
    private int CheesePrice { get; set; }
    private IPizza pizza;

    public PizzaWithCheese(IPizza pizza, int price)
    {
        this.pizza = pizza;
        CheesePrice = price;
    }

    public int GetPrice()
    {
        return pizza.GetPrice() + CheesePrice;
    }
}

// Decorator: wraps any IPizza and adds the chicken price.
public class PizzaWithChicken : IPizza
{
    private int ChickenPrice { get; set; }
    private IPizza pizza;

    public PizzaWithChicken(IPizza pizza, int price)
    {
        this.pizza = pizza;
        ChickenPrice = price;
    }

    public int GetPrice()
    {
        return pizza.GetPrice() + ChickenPrice;
    }
}

class Program
{
    static void Main(string[] args)
    {
        IPizza pizza = new Pizza();
        Console.WriteLine("Default pizza price=" + pizza.GetPrice());
        IPizza pizzaWithCheese = new PizzaWithCheese(pizza, 10);
        Console.WriteLine("pizza with cheese price=" + pizzaWithCheese.GetPrice());

        IPizza pizzaWithChicken = new PizzaWithChicken(pizza, 20);
        Console.WriteLine("pizza with chicken price=" + pizzaWithChicken.GetPrice());

        // Decorators stack: chicken wraps the cheese-decorated pizza.
        IPizza pizzaWithCheeseAndChicken = new PizzaWithChicken(pizzaWithCheese, 20);
        Console.WriteLine("pizza with cheese and chicken price=" + pizzaWithCheeseAndChicken.GetPrice());
        Console.ReadKey();
    }
}

Sample output:


Default pizza price=10
pizza with cheese price=20
pizza with chicken price=30
pizza with cheese and chicken price=40

Decorator pattern made easy

Add new role or functionality dynamically to an object.

Real life Decorator in an organization:

Everybody in an organization is an employee, even when he or she has no role or job title for the moment. Once an employee is assigned a role, he becomes responsible for performing the functions of that role. Again, the assigned role is not permanent: based on his performance, he could be promoted to a new position with a new role, demoted to his old position, or stay in the same position. Often, employees perform more than one role at the same time; for example, a manager does team management as well as assigning tasks to subordinates. What if the HR department couldn’t assign or reassign roles and responsibilities to employees like this? Every employee would have only one permanent role forever, and you know that’s a bad HR department. Assigning and reassigning roles, responsibilities or properties to an object is decorating: just like make-up, you act differently depending on the make-up you put on.

Definition from GoF:

Attach additional responsibilities to an object dynamically. Decorators provide a flexible alternative to subclassing for extending functionality.

Problem domain:

Say we are designing an HR application. Every employee can perform basic functions such as joining, termination and other basic tasks, so Employee is the basic component of the application. Based on their responsibilities, employees are assigned one or more roles to perform. We should be able to assign or remove any role (Engineer, Team Lead, Manager, etc.) to or from an employee dynamically. The design should adhere to the Open-Closed principle, i.e., creating and assigning a new role should be easy, without changing the existing class hierarchy.

For now, the following must be implemented:

I) An Engineer can do basic stuff and coding.

II) A Team Lead can do basic stuff and task management.

III) A Manager can do basic stuff, task management and also people management.

Since the problem calls for dynamically adding/removing responsibilities to an object, it can be solved using decorator pattern.

Class diagram of HR application using decorator pattern:

HRApp_Classdiagram

Participants:

Component (IEmployee)
– Defines the interface for objects that can have responsibilities added to them dynamically.
Concrete Component (Employee)
– Defines an object to which additional responsibilities can be added.
Decorator (Role)
– Maintains a reference to a component object and defines an interface that conforms to the component’s interface.
Concrete decorators (Engineer, TeamLead, Manager)
– Add responsibilities to the component.

Implementation:
First, the component interface, which defines all the operations of an Employee along with a Name.


public interface IEmployee
{
    string Name { get; set; }
    string Join();
    string Terminate();
    string PerformJob();
}

Now we will implement one concrete Employee class. This is the basic concrete implementation of an employee, to which responsibilities will be added dynamically. For now, it performs only the basic functions.


public class Employee : IEmployee
{
    public string Name { get; set; }

    public Employee(string name)
    {
        Name = name;
    }

    public string Join()
    {
        return Name + " joined the company..";
    }

    public string Terminate()
    {
        return Name + " left the company..";
    }

    public string PerformJob()
    {
        return Name + " doing basic stuff..";
    }
}

Now we will define an abstract decorator class, Role, which holds a reference to a component (IEmployee) and delegates the basic operations to it. This saves each concrete decorator from reimplementing operations that are already available in the employee component and are common to all employee roles; these basic/common operations are inherited by the concrete decorators.


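public abstract class Role : IEmployee
{
    // The wrapped component: a plain Employee or another Role.
    public abstract IEmployee Employee { get; set; }

    // Delegate to the wrapped employee so a decorated employee keeps its name
    // (a separate auto-property here would always be null).
    public string Name
    {
        get { return Employee.Name; }
        set { Employee.Name = value; }
    }

    public string Join()
    {
        return Employee.Join();
    }

    public string Terminate()
    {
        return Employee.Terminate();
    }

    // Each concrete role adds its own responsibility on top of the wrapped job.
    public abstract string PerformJob();
}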

Engineer class implementation:


public class Engineer : Role
{
    public override IEmployee Employee { get; set; }

    public Engineer(IEmployee employee)
    {
        Employee = employee;
    }

    public override string PerformJob()
    {
        return Employee.PerformJob() + " and Coding";
    }
}

Team Lead class implementation:


public class TeamLead : Role
{
    public override IEmployee Employee { get; set; }

    public TeamLead(IEmployee employee)
    {
        Employee = employee;
    }

    public override string PerformJob()
    {
        return Employee.PerformJob() + " and Task Management";
    }
}

Manager class implementation:


public class Manager : Role
{
    public override IEmployee Employee { get; set; }

    public Manager(IEmployee employee)
    {
        Employee = employee;
    }

    public override string PerformJob()
    {
        return Employee.PerformJob() + " and People management";
    }
}

Client program:


using System;

class Program
{
    static void Main(string[] args)
    {
        IEmployee employee1 = new Employee("George");
        IEmployee employee2 = new Employee("John");
        IEmployee employee3 = new Employee("Thomas");
        Console.WriteLine(employee1.Join());
        Console.WriteLine(employee2.Join());
        Console.WriteLine(employee3.Join());

        Engineer engineer = new Engineer(employee1);
        Console.WriteLine(engineer.PerformJob());
        TeamLead lead = new TeamLead(employee2);
        Console.WriteLine(lead.PerformJob());
        // Roles stack: Thomas is both a team lead and a manager.
        Manager manager = new Manager(new TeamLead(employee3));
        Console.WriteLine(manager.PerformJob());

        Console.WriteLine(engineer.Terminate());
        Console.WriteLine(lead.Terminate());
        Console.WriteLine(manager.Terminate());
        Console.ReadKey();
    }
}

Sample output:

 George joined the company..
 John joined the company..
 Thomas joined the company..
 George doing basic stuff.. and Coding
 John doing basic stuff.. and Task Management
 Thomas doing basic stuff.. and Task Management and People management
 George left the company..
 John left the company..
 Thomas left the company..

As you can see, we can now create any role at run time and assign it to the basic employee object dynamically. This is far more flexible than inheritance, where responsibilities cannot be added dynamically because everything is fixed at compile time. The decorator pattern is also called Wrapper, since each decorator wraps a component to which it delegates functionality. Decorator is a structural pattern.

For a simple implementation of decorator pattern, please check this link: http://alagesann.com/2013/08/16/a-simple-implementation-for-decorator-pattern/

Trello Architecture

Trello is a state-of-the-art website developed by Fog Creek Software for managing anything (be it personal tasks, vacation planning, project planning, etc.), either collaboratively by a group of people or by an individual.

This is how the application is designed and working:

Trello Arch

  1. Trello is a single-page app that generates its UI on the client and accepts data updates from a push channel.
  2. All client-side functionality is written in CoffeeScript.
  3. The main client-side technologies other than CoffeeScript are:
     i) Backbone.js (client-side MVC)
     ii) HTML5 pushState
     iii) Mustache (templating language)
  4. Trello’s servers serve virtually no HTML; in fact, they don’t serve much client-side code at all. Since it’s a single-page app, a single minified and compressed JS file of approximately 250K is downloaded initially, and everything after that is asynchronous.
  5. This initial minified/compressed JS file contains third-party libraries and compiled CoffeeScript and Mustache templates; a CSS file (compiled from LESS source, with inlined images) is downloaded alongside it.
  6. This minified initial file is served through the CloudFront CDN, so the app loads fast irrespective of geographic location.
  7. In parallel, Trello kicks off an AJAX load of the first page’s data and tries to establish an HTML5 WebSocket connection to the server.
  8. When the data returns from the AJAX call, Backbone.js binds it to the DOM and displays the content, much like Knockout.js does.
  9. After the initial page load there are no full page transitions. HTML5 pushState is used to move between pages and keep consistent links in the location bar; only the data is loaded, and the Backbone-based script handles the transition.
  10. Mustache is used as the templating language to render client-side models as HTML.
  11. When the browser supports HTML5 WebSockets (Chrome, Firefox and Safari), a WebSocket connection is made so that the server can push changes made by other people down to browsers listening on the appropriate channels. So when anything happens to a board you are watching, that action is published to the Trello server processes and propagated to your browser with very little delay, usually under a second.
  12. When the browser doesn’t support WebSockets (at the time, Internet Explorer), Trello uses AJAX requests to poll for updates every couple of seconds while the user is active, and backs off to polling every 10 seconds when the user goes idle.
  13. The major server-side technologies are:
     i) Node.js (server-side platform)
     ii) Redis (to share data between server processes)
     iii) MongoDB (database)
  14. Trello’s server side is mostly built with Node.js. Trello wanted instant propagation of updates, which meant holding a lot of open connections, so an event-driven, non-blocking server seemed a good choice.
  15. The client simply invokes functions written in Node.js through a thin wrapper over a WebSocket.
  16. Trello uses Redis for data that needs to be shared between server processes but not persisted to disk.
  17. An interesting use of Redis is sending model changes down to browser clients. When an object changes on the server, a JSON message is sent down all the appropriate WebSockets to notify those clients, and the same message is stored in a fixed-length list for the affected model, noting how many messages have been added to that list over all time. Then, when a client on AJAX polling pings the server to see whether any changes have been made to an object since its last poll, the server can usually answer with little more than a permissions check and a check of a single Redis value.
  18. Since Trello has to be extremely fast, they used MongoDB to store data permanently.
  19. MongoDB is a document database, so a Trello card’s data is stored in a single document.
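Item 17 describes a small data structure: a fixed-length list of recent change messages per model plus a count of all messages ever added. The C# sketch below models it in memory rather than in Redis (so no Redis commands are shown), and the class and method names are hypothetical. Returning `null` when a polling client has fallen behind by more than the list length, so that the client reloads the whole model, is an assumption of this sketch, not something described above.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the per-model Redis data described in item 17:
// a fixed-length list of recent change messages plus a count of every message ever added.
public class ModelChangeLog
{
    private readonly int capacity;
    private readonly LinkedList<string> recent = new LinkedList<string>();

    public ModelChangeLog(int capacity)
    {
        this.capacity = capacity;
    }

    // Total number of messages added over all time.
    public long TotalAdded { get; private set; }

    // Record a change (the same JSON message that is pushed over WebSockets).
    public void Add(string jsonMessage)
    {
        recent.AddLast(jsonMessage);
        if (recent.Count > capacity)
        {
            recent.RemoveFirst(); // keep the list fixed-length
        }
        TotalAdded++;
    }

    // For an AJAX-polling client that has already seen 'seenCount' messages:
    // returns the messages it missed (empty if none), or null if some were
    // already dropped from the fixed-length list (assumed: client reloads the model).
    public List<string> ChangesSince(long seenCount)
    {
        long missed = TotalAdded - seenCount;
        var result = new List<string>();
        if (missed <= 0)
        {
            return result; // cheap answer: only the counter was checked
        }
        if (missed > recent.Count)
        {
            return null;
        }
        long skip = recent.Count - missed;
        foreach (string message in recent)
        {
            if (skip > 0)
            {
                skip--;
                continue;
            }
            result.Add(message);
        }
        return result;
    }
}
```

The counter is what makes the common case cheap: a client that is up to date is answered by comparing one number, without reading the list at all.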

Boyer-Moore Algorithm

Unlike the Knuth-Morris-Pratt (KMP) linear-time algorithm, the Boyer-Moore algorithm can search for a pattern in sublinear time. Up to KMP, the goal was an algorithm that searches for a pattern in linear time, i.e., O(M+N), where M is the length of the pattern and N is the length of the string being searched.

Boyer and Moore set out to find an algorithm that could work more efficiently than KMP, in sublinear time. By “sublinear”, Boyer and Moore mean that the algorithm can find the pattern in the string while inspecting fewer characters than the string contains, rather than inspecting every character. They devised a clever way to skip inspecting some characters while still finding the match whenever the pattern exists in the string.

Though the Boyer-Moore algorithm typically works in sublinear time, it has a few limitations under which it ends up working in linear time. Search efficiency increases as the pattern gets longer, because a longer pattern allows more characters to be skipped. Efficiency falls back to linear time when the pattern is a single character, since every character of the string must then be inspected.

The idea of how it works:

The main idea behind the algorithm is that it gains more information by matching the pattern from right to left rather than the usual left to right. Assume the pattern has characters p0, p1, p2, …, pm and the string has characters s0, s1, s2, …, sn. With the pattern aligned at the start of the string, the algorithm first compares sm with pm, then sm-1 with pm-1, and so on down to s0 and p0.

An example:

Image 1

Assume that there are no spaces between the characters in either the pattern or the string, and that the comparison happens where the upward arrow points.

Since the pattern length is 7, the search starts at the string character “F” (the seventh character) and steps left one by one. The character in the string pointed to by the upward arrow is looked up in the pattern.

The character “F” from the string is compared with “T” in the pattern, then with “A”, then with “H”, and so on. Since “F” does not match ANY character of the pattern, we can confirm that the whole pattern cannot match at the current position in the string. By current position, I mean the stretch of the string from the character under the first character of the pattern to the character under the last character of the pattern (that’s where the actual search happens now). In this case, the pattern cannot match the stretch from “W” to “F”.

The key point is that for a match to succeed, the character “F” must occur at least once in the pattern. Since it doesn’t occur even once, the pattern cannot match at the current position. So we can safely skip comparing the remaining characters to the left, from “-” to “W”, with the characters of the pattern.

Boyer-Moore observation 1: If the character is known not to occur in the pattern, then we need not consider the possibility of an occurrence of the pattern starting at string position 1, 2, 3, … or patternlength: such an occurrence would require that character to be a character of the pattern.

According to observation 1, we can safely skip the first 7 characters of the string and position the first character of the pattern at the 8th character of the string.

Image 2

Now the pattern lines up with the string from “I” to “-”, and the upward arrow points at “-”, where matching starts.

“-” is compared with the characters of the pattern from right to left, and it matches at the fifth position from the right. Importantly, only this first (rightmost) occurrence in the pattern is considered. When a match occurs, the pattern is realigned with the string so that the matched characters line up; in this case, the pattern is shifted so that the two hyphens line up. After the realignment, the arrow is also moved 4 characters to the right so that it points at the character under the last character of the pattern.

Since the search character “-” matched at the fifth position from the right in the pattern, not at the last (rightmost) position, we know for sure that it is pointless to compare the rest of the characters, from “I” to “Y”, at the current alignment. So moving the pattern 4 positions to the right is safe.

Boyer-Moore observation 2: If the last (rightmost) occurrence of the character in the pattern is delta1 characters from the right end of the pattern, then we know we can slide the pattern down delta1 positions without checking for matches.
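Observations 1 and 2 can be precomputed once per pattern as a table. The C# sketch below is a minimal illustration (the `Delta1` class and its methods are hypothetical names): for each character in the pattern it stores delta1, the distance of that character’s rightmost occurrence from the right end of the pattern, and characters absent from the pattern get the full pattern length. For the pattern “AT-THAT”, which is assumed here because the figures are not reproduced, “-” maps to 4, matching the shift described above.

```csharp
using System.Collections.Generic;

public static class Delta1
{
    // delta1 for every character that occurs in the pattern: the distance of its
    // rightmost occurrence from the right end of the pattern (observation 2).
    public static Dictionary<char, int> Build(string pattern)
    {
        var table = new Dictionary<char, int>();
        int last = pattern.Length - 1;
        for (int i = 0; i <= last; i++)
        {
            // Later (further right) occurrences overwrite earlier ones.
            table[pattern[i]] = last - i;
        }
        return table;
    }

    // Characters that never occur in the pattern let it slide its whole
    // length (observation 1).
    public static int Lookup(Dictionary<char, int> table, char c, int patternLength)
    {
        return table.TryGetValue(c, out int d) ? d : patternLength;
    }
}
```

The table costs O(M) to build and is reused at every alignment, which is why the per-step skip decision is a single lookup.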

Current position after applying observation 2:

Image 3

Now matching starts at “T”, the string character under the last character of the pattern.

The first comparison succeeds, as “T” in the string matches “T” in the pattern.

The upward arrow moves one position to the left, to “L”, which is compared with “A” in the pattern, and the comparison fails.

Boyer-Moore observation 3a: If the mismatch occurs k characters from the start of the pattern and the mismatched character is not in the pattern, then we can advance at least k characters.

“L” is not in the pattern, and the mismatch occurred against the sixth character of the pattern (p5), hence we can advance (at least) 6 characters.

Current position is:

Image 4

However, we can actually do better than this:

Boyer-Moore observation 3b: We know that we already matched some characters (1 in this case, the final “T”). If the matched characters don’t match the start of the pattern, then we can jump forward a little more, a delta2 distance. Here, advancing only 6 would place the first character of the pattern, “A”, under the “T” we already matched, which cannot succeed, so the pattern can advance 7 characters instead.

Current position is:

Image 5

Now the pointer points at a hyphen. The hyphen occurs as the third character of the pattern, so we can apply observation 2, which moves the pattern 4 characters forward to line up the two hyphens.

Current position is:

Image 6

Now the upward arrow points at a “T” under the last character of the pattern, and when we compare the characters of the string with the pattern from right to left, every character matches. So we have FOUND the pattern in the string.
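The walkthrough can be condensed into code. The C# sketch below implements Boyer-Moore with only the bad-character rule (observations 1, 2 and 3a); it deliberately omits the delta2 refinement from observation 3b, so it may try more alignments than the full algorithm, but it still finds the first match correctly. The class name is hypothetical, and the example string below is assumed to be the classic “WHICH-FINALLY-HALTS.--AT-THAT-POINT”, which is consistent with the characters mentioned in the walkthrough.

```csharp
using System;
using System.Collections.Generic;

public static class BoyerMooreSearch
{
    // Returns the zero-based index of the first occurrence of pattern in text, or -1.
    public static int IndexOf(string text, string pattern)
    {
        int m = pattern.Length, n = text.Length;
        if (m == 0) return 0;

        // Rightmost position of each character in the pattern.
        var lastOccurrence = new Dictionary<char, int>();
        for (int i = 0; i < m; i++)
        {
            lastOccurrence[pattern[i]] = i;
        }

        int s = 0; // text position under the first pattern character
        while (s <= n - m)
        {
            // Compare right to left.
            int j = m - 1;
            while (j >= 0 && pattern[j] == text[s + j])
            {
                j--;
            }
            if (j < 0)
            {
                return s; // every character matched
            }
            // Bad-character rule: line the mismatched text character up with its
            // rightmost occurrence in the pattern, or move the pattern past it if it
            // does not occur at all. Always advance at least one position.
            char bad = text[s + j];
            int r = lastOccurrence.TryGetValue(bad, out int pos) ? pos : -1;
            s += Math.Max(1, j - r);
        }
        return -1;
    }
}
```

`BoyerMooreSearch.IndexOf("WHICH-FINALLY-HALTS.--AT-THAT-POINT", "AT-THAT")` returns 22. Without delta2, the sketch tries the alignments starting at positions 0, 7, 11, 17, 19 and 22, one more than the walkthrough above, which jumps straight from 11 to 18 using observation 3b.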

Performance analysis:

Though the Boyer-Moore algorithm runs in sublinear time in the best case, it can degrade to O(MN) time in the worst case, typically with small alphabets and repetitive text and patterns (for example, reporting every occurrence of a pattern like “AAA” in a long run of “A”s).

The Boyer-Moore algorithm is very fast on larger alphabets; for small alphabets, the Knuth-Morris-Pratt algorithm is often recommended instead.

What is big data?

big-data

According to IBM, every day we create 2.5 quintillion (2.5 × 10^18) bytes of data, so much that about 90% of the world’s data today has been created in the last two years alone. This vast amount of data, generated so fast, poses many challenges for data science and related fields in analyzing and using it. This fast-growing, varied and hard-to-handle data is called big data.

Big data is not a single technology but a combination of old and new technologies that help companies gain actionable insight. So big data is the capability to manage huge volumes of disparate data, at the right speed and within the right time frame, to allow real-time analysis and action.

The major challenges of big data are:

Volume: How much data.

Velocity: How fast the data is processed.

Variety: Different types of data

Big data comprises almost all kinds of data available in the world, both structured and unstructured. Unstructured data is data that does not follow a particular data model; it can be text, sensor data, audio, video, images, click streams and log files, to name a few. In 1998, Merrill Lynch cited a rule of thumb that somewhere around 80-90% of all potentially usable business information may originate in unstructured form. More recently, analysts have predicted that data will grow 800% over the next five years. Computerworld says that unstructured information might account for more than 70-80% of all data in an organization. So it is crucial to analyze and use these vast amounts of data for the benefit of the organization.

Global Market for Big data:

  • Digital information is growing at 57% per annum globally.
  • With global social network penetration and mobile internet penetration both under 20%, this growth has only just begun.
  • All the data generated is valuable, but only if it can be interpreted in a timely and cost effective manner.
  • IDC expects revenues for big data technology infrastructure to grow by 40% per annum for the next three years.

In 2006, IDC estimated that the world produced 0.18 zettabytes of digital information. This grew to 1.8 zettabytes in 2011 and is projected to reach 35 zettabytes by 2020.
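These figures imply a compound annual growth rate, computed as (end value / start value)^(1/years). As a rough check, the 2006 to 2011 estimate works out to about 58% per year, close to the 57% figure above, while the 2020 projection implies a slower average of about 39% per year from 2011.

```latex
\begin{align*}
\text{2006 to 2011:}\quad \left(\frac{1.8}{0.18}\right)^{1/5} &= 10^{1/5} \approx 1.585 &&\Rightarrow \approx 58\%\ \text{per year}\\
\text{2011 to 2020:}\quad \left(\frac{35}{1.8}\right)^{1/9} &\approx 19.44^{1/9} \approx 1.39 &&\Rightarrow \approx 39\%\ \text{per year}
\end{align*}
```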

A few statistics to demonstrate the ‘big’ part of big data:

  1. Twitter generates nearly 12 TB of data and 58 million tweets per day.
  2. Every hour, Walmart handles more than 1 million customer transactions. All of this information is transferred into databases holding more than 2.5 petabytes of information.
  3. According to FICO, the credit card fraud system currently in place helps protect over 2 billion accounts all over the globe.
  4. Facebook currently holds more than 45 billion photos across its entire user base, and the number of photos is growing rapidly.
  5. Google processes 20 PB of data daily, and monthly worldwide searches on Google sites total 87.8 billion.

Here are some interesting statistics from YouTube alone:

  1. More than 1 billion UNIQUE users visit YouTube every month.
  2. Over 4 billion hours of video are watched each month.
  3. 72 hours of video are uploaded every minute (watching a single minute’s worth of uploads would take 3 days without sleep).

So big data is the next big thing in the IT industry. To succeed in the IT industry, it is crucial to adopt big data analytics and make use of the exploding amount of data available now and in the future.