Sunday, April 06, 2014
Map Reduce - Overview
MapReduce is a parallel, distributed programming model developed by Google for processing large datasets. It is described in this paper: http://research.google.com/archive/mapreduce.html
Map transforms a set of input data into key/value pairs, and Reduce aggregates the values emitted for each key into a single result (for example a count or a sum). A reducer receives all the data for an individual "key" from all the mappers.
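To make the two phases concrete, here is a minimal single-process sketch in Python. It is only an illustration of the idea, not how a real MapReduce framework is implemented: the names run_mapreduce, map_word_lengths and reduce_sum are made up for this example, and the grouping step stands in for the work a framework does between the map and reduce phases.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Apply map_fn to every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        # A mapper may emit zero or more key/value pairs per input record.
        for key, value in map_fn(record):
            groups[key].append(value)
    # Each key is reduced independently of all other keys.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def map_word_lengths(word):
    # Hypothetical mapper: key is the word length, value is 1.
    yield len(word), 1


def reduce_sum(key, values):
    # Hypothetical reducer: collapse all values for one key into one number.
    return sum(values)


result = run_mapreduce(["a", "bb", "cc", "ddd"], map_word_lengths, reduce_sum)
print(result)
```

Because every key is reduced on its own, the reduce calls could run on different machines without coordinating with each other.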
The approach assumes that there are no dependencies between the individual input records. This makes it easy to parallelize the problem. The number of parallel reduce tasks is limited by the number of distinct "key" values emitted by the map function.
MapReduce usually also comes with a framework that supports the MapReduce operations. A master controls the whole MapReduce process. The framework is responsible for load balancing, re-issuing a task if a worker has failed or is too slow, and so on. The master divides the input data into separate units, sends the individual chunks of data to the mapper machines and collects the information once a mapper is finished. When the mappers are finished, the reducer machines are assigned work. All key/value pairs with the same key are sent to the same reducer.
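One common way to guarantee that equal keys end up at the same reducer is to hash the key and take the result modulo the number of reducers. The sketch below assumes that scheme; the function name reducer_for and the use of zlib.crc32 as the hash are choices made for this example, not something a particular framework prescribes.

```python
import zlib


def reducer_for(key, num_reducers):
    """Pick a reducer index for a key using a deterministic hash."""
    # crc32 is used instead of Python's built-in hash() because hash() of a
    # string changes between interpreter runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


pairs = [("/index.html", 1), ("/about.html", 1), ("/index.html", 1)]
num_reducers = 3

# One bucket of key/value pairs per reducer.
buckets = {i: [] for i in range(num_reducers)}
for key, value in pairs:
    buckets[reducer_for(key, num_reducers)].append((key, value))

for index, bucket in buckets.items():
    print(index, bucket)
```

Whatever the number of reducers, both "/index.html" pairs land in the same bucket, so a single reducer sees all the values for that key.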
A classic example of using MapReduce is log file analysis.
Big log files are split into chunks, and each mapper searches its chunk for the webpages that were accessed. Every time a webpage is found in the log, a key/value pair is emitted to the reducer, where the key is the webpage and the value is "1". The reducers add up the number of hits for each webpage. As an end result you have the total number of hits for each webpage.
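The log file example can be sketched in a few lines of Python. This is a toy version that runs in one process: the log format, the sample lines and the function names (split_into_chunks, map_chunk, reduce_hits, count_hits) are all assumptions made for illustration, and a real framework would run the mappers and reducers on separate machines.

```python
from collections import defaultdict

# Made-up sample log lines; the request is the part inside double quotes.
LOG_LINES = [
    '10.0.0.1 "GET /index.html HTTP/1.1" 200',
    '10.0.0.2 "GET /about.html HTTP/1.1" 200',
    '10.0.0.3 "GET /index.html HTTP/1.1" 200',
    '10.0.0.1 "GET /contact.html HTTP/1.1" 404',
    '10.0.0.4 "GET /index.html HTTP/1.1" 200',
]


def split_into_chunks(lines, chunk_size):
    """Split the log into chunks, one chunk per mapper."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Mapper: emit (webpage, 1) for every request found in the chunk."""
    for line in chunk:
        request = line.split('"')[1]   # e.g. 'GET /index.html HTTP/1.1'
        page = request.split()[1]      # e.g. '/index.html'
        yield page, 1


def reduce_hits(page, counts):
    """Reducer: add up all the 1s emitted for a single webpage."""
    return page, sum(counts)


def count_hits(lines, chunk_size=2):
    grouped = defaultdict(list)
    for chunk in split_into_chunks(lines, chunk_size):
        for page, one in map_chunk(chunk):
            grouped[page].append(one)
    return dict(reduce_hits(page, counts) for page, counts in grouped.items())


hits = count_hits(LOG_LINES)
print(hits)
```

The chunk size does not change the result; it only changes how the work would be divided among the mappers.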