I just attended a conference at Intel about Big Data. The vibe was very positive: people were excited about the problems we can solve by aggregating data and storing it in its native state using Hadoop. I had heard this buzzword before, but here I got a much better understanding of what it is. However, there were still some gaps in my knowledge, including the idea behind the MapReduce algorithm. I did some research online, and this explanation helped a lot to bring the idea down to a simple level: http://ksat.me/map-reduce-a-really-simple-introduction-kloudo/

As an addendum, I will describe a different problem that can be solved in the same way. Imagine that your grandfather just left you a fortune in a house somewhere on an uninhabited island. The house is filled to the brim with coins of the following denominations: 1c, 5c, 10c, 25c, $1.

You want to count how many coins of each denomination you have, so you can order the right number of paper tubes and hand everything to your bank neatly organized. You've sent a message to all your relatives asking them to come help in return for a fraction of the fortune.

100 relatives respond, and you decide to put them to work. You assign the 5 oldest people to be the Reducers, since they will have to move around the least, the best number cruncher as the Grouper, and the remaining 94 people as Mappers.

First:

You give each mapper a huge bag and a notebook. You send them into the house to gather the coins, and each time they drop a coin into the bag, they write its denomination in their notebook (in cents, so a $1 coin is written as 100), separated by commas. When all the change is collected, you take their notebooks, which may look something like this:

1,1,1,5,10,10,100,25,10,5,5,1,5,1….
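For readers who think in code, here is a minimal Python sketch of what one mapper does. The function name `map_coins` and the sample bag are made up for illustration; values are in cents, with 100 standing for a $1 coin as in the notebook above.

```python
def map_coins(coins_picked_up):
    """Mapper: record the denomination (in cents) of every coin, one entry per coin."""
    notebook = []
    for coin in coins_picked_up:
        notebook.append(coin)  # no counting or sorting here, just writing it down
    return notebook


# Hypothetical coins one mapper happens to pick up, in order.
coins = [1, 1, 1, 5, 10, 10, 100, 25, 10, 5, 5, 1, 5, 1]
notebook = map_coins(coins)
print(",".join(str(c) for c in notebook))
```

The point is that a mapper does no counting at all. Since the mappers never need to talk to each other, 94 of them can work at the same time.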

Second:

You drop these notebooks into the Grouper's lap and also give him five fresh notebooks, marked 1c, 5c, 10c, 25c and $1. His job is to go through each mapper's notebook: every time he encounters a 1c entry, he adds a tick mark to the 1c notebook, every 5c entry gets a tick mark in the 5c notebook, and so on until he has gone through all the mappers' notebooks. A page from one of the five tick-mark notebooks is nothing but row after row of tally marks.

In theory, this job can be shared among more people. Just make sure to safeguard the tick-mark notebooks so that only one person can write in a given notebook at a time, and keep track of which mapper notebooks have already been read.
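The grouping step, including the "more than one grouper" variation, could be sketched in Python roughly like this. Everything here is an assumption made for illustration: the name `group_notebooks`, the use of threads as stand-ins for people, one lock per tick-mark notebook as the "safeguard", and a queue as the way to track which mapper notebooks are still unread.

```python
import queue
import threading


def group_notebooks(mapper_notebooks, num_groupers=3):
    """Grouper(s): turn mapper notebooks into one tick-mark notebook per denomination."""
    tick_books = {1: [], 5: [], 10: [], 25: [], 100: []}
    # One lock per tick-mark notebook: only one grouper may write in it at a time.
    locks = {denom: threading.Lock() for denom in tick_books}

    # The queue hands out each mapper notebook exactly once.
    unread = queue.Queue()
    for notebook in mapper_notebooks:
        unread.put(notebook)

    def grouper():
        while True:
            try:
                notebook = unread.get_nowait()
            except queue.Empty:
                return  # nothing left to read
            for denom in notebook:
                with locks[denom]:
                    tick_books[denom].append("|")

    workers = [threading.Thread(target=grouper) for _ in range(num_groupers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return tick_books


# Two hypothetical mapper notebooks.
books = group_notebooks([[1, 1, 5, 100], [10, 1, 25, 5]])
print({denom: len(ticks) for denom, ticks in books.items()})
# prints {1: 3, 5: 2, 10: 1, 25: 1, 100: 1}
```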

Third:

You give the 5 reducers enormous eyeglasses and a big cup of coffee (or two), and each gets one tick-mark notebook. Their job is to count the tick marks in their notebook. When they are done, they write the total on the back.

For example, the 1c notebook will have 509110481 on the back, meaning there were that many pennies in the house, the 5c notebook will have something like 392740191 on the back, meaning there were that many nickels, and so on.
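A rough Python sketch of the reducers' work follows. The names `count_ticks` and `reduce_all` are invented for this example, and a thread pool is just one convenient way to let the five reducers work side by side.

```python
from concurrent.futures import ThreadPoolExecutor


def count_ticks(tick_book):
    """Reducer: count the tick marks in a single notebook."""
    total = 0
    for _ in tick_book:
        total += 1
    return total


def reduce_all(tick_books):
    """Give each reducer one notebook and collect the number written on the back."""
    denoms = list(tick_books)
    with ThreadPoolExecutor(max_workers=max(1, len(denoms))) as pool:
        totals = pool.map(count_ticks, (tick_books[d] for d in denoms))
        return dict(zip(denoms, totals))


# Tiny hypothetical tick-mark notebooks, keyed by denomination in cents.
tick_books = {1: "|||", 5: "||", 10: "|", 25: "|", 100: "|"}
print(reduce_all(tick_books))
# prints {1: 3, 5: 2, 10: 1, 25: 1, 100: 1}
```

Each reducer only ever sees one denomination, which is why the five of them never get in each other's way.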

Fourth:

Once you have these totals, working out how many paper tubes you need is simple: find out how many pennies, nickels, dimes, quarters, and dollar coins fit into their respective paper tubes, and divide each total by that number.

For example, if 50 pennies fit into a paper tube, divide 509110481 by 50 to get 10182209.62, which rounds up to 10182210 tubes needed.
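The tube arithmetic is a ceiling division, which can be sketched in Python as below. The helper name `tubes_needed` is made up, and the capacity of 50 pennies per tube is the assumption from the example above; integer arithmetic is used so that large counts do not pick up floating-point rounding errors.

```python
def tubes_needed(coin_count, coins_per_tube):
    """Number of tubes required, counting a partly filled last tube as a whole one."""
    if coins_per_tube <= 0:
        raise ValueError("coins_per_tube must be positive")
    # Ceiling division with integers only: -(-a // b) rounds a / b upward.
    return -(-coin_count // coins_per_tube)


pennies = 509110481
print(tubes_needed(pennies, 50))
# prints 10182210
```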

After all this, you will probably wonder how much money you have, but I bet you can figure that out with some easy math 😉