Map Reduce makes Cents

I just attended a conference at Intel about Big Data. The vibe was very positive: people were excited about the problems we can solve by aggregating data and storing it in its native state using Hadoop. I had heard this buzzword before, but here I got a much better understanding of what it is. However, there were still some gaps in my knowledge, including the idea behind the MapReduce algorithm. I did some research online, and this explanation helped a lot to bring the idea down to a simple level: http://ksat.me/map-reduce-a-really-simple-introduction-kloudo/

As an addendum, I will describe a different problem that can be solved in the same way. Imagine that your grandfather just left you a fortune in a house somewhere on an uninhabited island. The house is filled to the brim with coins of the following denominations: 1c, 5c, 10c, 25c, $1.
You want to count how many of each denomination you have so you can order the right number of paper tubes and hand it all, neatly organized, to your bank. You’ve sent a message to all your relatives asking them to come help you in return for a fraction of the fortune.

100 relatives respond, and you decide to put them to work. You assign the 5 oldest people to be the Reducers, since they will have to move around the least, the best number cruncher as the Grouper, and the remaining 94 people as Mappers.

First:
You give each mapper a huge bag and a notebook. You send them into the house to gather the coins into their bags and, each time they drop a coin into the bag, to write its denomination in their notebook, separated by commas. When all the change is collected, you take their notebooks, which may look something like this:
1,1,1,5,10,10,100,25,10,5,5,1,5,1….

Second:
You drop these notebooks into the Grouper’s lap, and give him five more notebooks, marked 1c, 5c, 10c, 25c, and $1 respectively. His job is to go through each mapper’s notebook and, for every 1c entry he encounters, put a tick mark in the 1c notebook, for every 5c entry a tick mark in the 5c notebook, and so on until he has gone through all the mappers’ notebooks. One of the five tick-mark notebooks might look like this:

In theory, this job can be shared by more people; just make sure to safeguard the tick-mark notebooks so that only one person can write in each of them at a time, and keep track of which mapper notebooks have already been read.

Third:
You give the 5 reducers enormous eyeglasses and a big cup of coffee (or two), and each gets one of the tick-mark notebooks. Their job is to count the number of tick marks in their notebook. When they are done, they should write the total on the back.
For example, the 1c notebook will have 509110481 on the back, meaning there were that many pennies in the house, the 5c notebook will have something like 392740191 on the back, meaning there were that many nickels in the house, and so on.
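
For the programmers in the family, here is a rough Python sketch of the three steps so far. It is only my toy version of the story, not how Hadoop actually does it: the function names are made up, and I split the sample notebook line from above into two hypothetical bags.

```python
from collections import defaultdict


def map_coins(bag):
    """Mapper: write down one (denomination, tick) record per coin in the bag."""
    return [(coin, 1) for coin in bag]


def group_by_denomination(mapper_notebooks):
    """Grouper: copy every tick into the notebook for its denomination."""
    tick_notebooks = defaultdict(list)
    for notebook in mapper_notebooks:
        for denomination, tick in notebook:
            tick_notebooks[denomination].append(tick)
    return tick_notebooks


def reduce_ticks(ticks):
    """Reducer: count the tick marks in a single notebook."""
    return sum(ticks)


def count_coins(bags):
    """Run the map, group, and reduce steps one after another."""
    mapper_notebooks = [map_coins(bag) for bag in bags]
    tick_notebooks = group_by_denomination(mapper_notebooks)
    return {denom: reduce_ticks(ticks) for denom, ticks in tick_notebooks.items()}


# Two hypothetical bags, values in cents (100 stands for the $1 coin).
bags = [
    [1, 1, 1, 5, 10, 10, 100],
    [25, 10, 5, 5, 1, 5, 1],
]
totals = count_coins(bags)
print(totals)  # {1: 5, 5: 4, 10: 3, 100: 1, 25: 1}
```

In the real thing the mappers and reducers would run at the same time on different machines; here everything runs in one loop after another, which is enough to see who does what.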

Fourth:
Once you have these totals, all you have to do to determine how many paper tubes you need is find out how many pennies, nickels, dimes, quarters, and dollar coins fit into their respective paper tubes, and divide each total by that number.
For example, if 50 pennies fit into a paper tube, divide 509110481 by 50 to get 10182209.62, which rounds up to 10182210 tubes needed.
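
The tube arithmetic is a ceiling division, since a partly filled tube still counts as a tube. A small sketch, assuming 50 pennies per tube as in the example; the 40 nickels per tube is my own assumption, and `tubes_needed` is just a name I picked.

```python
def tubes_needed(coin_count, coins_per_tube):
    """Return the number of tubes for coin_count coins, rounding up."""
    # Integer ceiling division: no floats, so no rounding surprises on big counts.
    return -(-coin_count // coins_per_tube)


penny_tubes = tubes_needed(509110481, 50)
nickel_tubes = tubes_needed(392740191, 40)  # 40 per tube is an assumption

print(penny_tubes)   # 10182210
print(nickel_tubes)  # 9818505
```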

After all this, you will probably wonder how much money you have, but I bet you can figure that out with some easy math 😉
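
If you would rather let the computer do the easy math, here is one possible sketch. The counts are a tiny made-up sample (the fourteen coins from the notebook line above), not the real fortune, and `total_cents` is a hypothetical helper.

```python
def total_cents(counts):
    """Sum denomination times count, with denominations given in cents."""
    return sum(denomination * count for denomination, count in counts.items())


# Sample counts only: 5 pennies, 4 nickels, 3 dimes, 1 quarter, 1 dollar coin.
sample_counts = {1: 5, 5: 4, 10: 3, 25: 1, 100: 1}

cents = total_cents(sample_counts)
dollars, remainder = divmod(cents, 100)
print(f"${dollars}.{remainder:02d}")  # $1.80
```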

Stanford CS experience

I have titled this post “..experience” because there is no real word for the “battle” I have fought to learn, complete assignments, and study for exams in the Computer Science classes I am taking at Stanford.
Imagine a hike where you are making your way up a steep hill, thinking that is the top, but as soon as you reach it, there is merely a dip in elevation and an even steeper hill ahead. Each such hill is like an assignment. And sometimes, there are booby traps along the way (the nearly forgotten lab assignment due on Fridays).

As you ascend each hill, there are many paths, and although more than one path leads to the top, some of them are multi-mile detours. In delirium, you sometimes take the wrong path, so absorbed in the code and how it runs, that you barely see past your nose. You sometimes just count the stars.. not above, but those that dereference the pointers in question. Where does this pointer point? Eyes watery, brain throbbing.

Delving deeper, it’s like walking through fog, where you’re using your sense of intuition, adding binary numbers in your head, recognizing offsets in hexadecimal which may or may not be correct, recognizing function addresses just by the sheer number of times you’ve gotten lost in the code, and seeing it over and over again.

At the end of each “hill”, the fog clears, you look around for booby traps, then look at the murky bog that is the next assignment. And then when you’re done with the last one… the last thing left is the final exam. For that, you have to turn around, run back down the mountain of hills, marking each twist and turn you came upon, and, all in one weekend, climb the whole thing back up. This time it is faster, as the path has already been laid out by previous work. However, it is far from easy, wriggling through all the material for the class, looking again at all the asterisks, the lecture details, notes, textbook. Checking again and again, writing questions on the message boards, tightening the ropes, getting ready for the three-hour brain torture ahead.

The day of the exam. The day of the last trial, where you don’t know what is ahead. You have to blindfold yourself right up to the precipice. Then the blindfold is gone. You have all the equipment, and maybe more than you need: ice axe, water bottles, ropes, fire extinguisher (wait, why did you bring that? it’s super heavy.. well, it’s just in case). Coffee, a bar of chocolate, more coffee, some fruit, coconut water, and coffee. Wait, did I mention coffee? Perhaps it’s just a mental cushion, the placebo effect. A furious run through the tasks, sometimes mumbling out loud “you can do it!” and “come on!”.. until the clock says the time is up. You are done. Done? Flashbacks begin. “What did I mess up? What will my grade be? Will I pass the class?”

But then it hits you: you are at the top. And you are happy, exuberant. The fog behind you is gone! Maybe the way up was not perfect, but you have learned many, many ways of solving problems, often the hard way, and it was such an awesome learning experience. But then you get chilly. What’s next? Looking out over the horizon.. there is another mountain. And it’s even taller. And even more covered in fog. The chill envelops you: one more week until the next climb starts.

..and quite literally, you get cold. You get A cold.
*sniffle* *cough* *groan* Awesome 🙁

Strategy

For the past 3 weeks, I’ve been in exam period. Having already taken four of my five exams, one would think that I’d be nearly relieved of all the stress involved with exam period. However, it is not really so.

Let me visualize this: it’s like a steeplechase course. Steeplechase and not hurdles, because hurdles are all the same size. And this steeplechase course is one with five hurdles, one for each exam, but imagine the solar system’s first five planets, and their relative sizes. Now match these up, respective to each hurdle. This is how I went into this exam period: First exam, ah piece of cake. Second one.. not too bad. Third one, ok slightly harder. Fourth one, ah no problem.. and then I look ahead. In the distance, there looms the last hurdle, heavy and thick, oozing with malice, weapons of confusing gas, to make you completely lost in its midst, if by chance you can’t catapult yourself over the top.

This last hurdle is 2 days away: Algebra/Geometry exam – Wednesday, May 25th, 2011. But I am not completely despondent.
This was my strategy, one which I hope has really paid off. First, I have been keeping track of how many hours per day I study specifically for this exam. This number keeps me motivated and gives me a goal to which I hold myself. Secondly, I recognized that my greatest difficulty is the fact that all the notes and the course itself are written in French. To combat this, I have been diligently translating the important points in each chapter. At first I tried to translate all of the text, but I realized that dwelling on little sections that are not too important was less efficient than just going over the propositions, theorems, corollaries, and remarks written in bold. So after translating two chapters entirely, I rewrote just the bold text. When I stumbled upon notation that I didn’t get within an instant, I highlighted it separately and tried to visualize it, or, if it was just something different from what I was used to in the USA, I clarified it in the translated notes.

To keep my brain from passing over information, I mixed other kinds of tactics in with this translation. These varied from copying proofs and then trying to rewrite them myself, to rewriting the corrected solutions of the midterm, which I had failed miserably in March. In fact, this last one was quite a confidence booster, because as I rewrote the answers, they made complete sense and really didn’t seem as convoluted as they had seemed to me that day in March.

Now the killer hurdle doesn’t seem as impossible anymore. In any case, I will have done almost everything in my power to get the highest score I can.

Current tally of hours: 35.5
Goal before exam: 40

———
Update: total hrs: 52, exam score: 7.06/20. => need for oral exam with teacher.
Studied another 10 hrs before the oral exam. Result:
Oral exam completed: “Je l’avais déchiré” (I crushed it). 🙂