MapReduce

Introduction To MapReduce:

·       MapReduce is a computing model that decomposes large data-processing jobs into individual tasks.

·       These tasks can be executed in parallel across the cluster.

·       The results of the tasks are combined to form the final result.

·       MapReduce is the data processing component of Hadoop.

·       MapReduce transforms a list of input data elements into a list of output data elements.

·       MapReduce is the heart of Hadoop. It is designed for processing huge amounts of data.

·       There are two different processing layers:

1.    Map

2.    Reduce

Different Phases in MapReduce:


Map:

·       Map takes a set of data and converts it into another set of data, in which individual elements are broken down into tuples (key/value pairs).

·       The data can be in structured or unstructured format.

·       The key is a reference to the input value (e.g., IntWritable, LongWritable).

·       The value is the dataset on which to operate (e.g., IntWritable, LongWritable, Text).

·       The output of map is known as intermediate output, and it is stored on the local disk of the node running the map task (not on HDFS).

·       The intermediate output of map is given as input to the reduce phase.

·       If there is no reduce phase, or once the reduce processing is complete, the output is stored on HDFS.

·       The movement of output from the map phase to the reduce phase is known as shuffling.

·       The mapper's output (key, value) pair can be of a different type from its input pair.

·       Different phases under the map phase (a mapper sketch follows this list):

Ø Partitioner: The output of the mapper is divided into partitions by the partitioner, typically by hashing the key; each partition is processed by one reducer.

Ø Combiner: Before passing the output to the reduce phase, the combiner summarizes the output records that share the same key. The combiner is therefore known as a "Mini-Reducer".
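To make the map phase concrete, below is a minimal word-count mapper sketch written against the Hadoop Java API (the class name WordCountMapper is illustrative; the Mapper API and Writable types are standard Hadoop):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key   = byte offset of the line within the input split
        // value = one line of input text
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // emit the intermediate pair (word, 1)
        }
    }
}

Note that the input pair is (LongWritable, Text) while the output pair is (Text, IntWritable): as stated above, the mapper's output types may differ from its input types.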

Reduce:

·       The input to the reducer is the intermediate output produced by the mapper.

·       The key/value pairs provided to the reducer are sorted by key.

·       The reducer is the second phase of MapReduce.

·       The output of the reduce phase is the final output.

·       Aggregate operations (sum, count, max, filter, etc.) can be performed in the reduce phase.

·       By default, the number of reducers is 1.

·       There are 3 phases of the reducer in MapReduce (a reducer sketch follows this list):

Ø Shuffling: The process of transferring output from mappers to reducers is known as shuffling.

Ø Sorting: The keys generated by the mapper are automatically sorted by MapReduce. The values arrive at the reducer grouped by sorted key, which helps the reducer easily recognize when a new reduce call (for the next key) should start.

Ø Reduce phase: The final output is produced after the sorting and aggregate operations are performed.
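A matching word-count reducer sketch (again, the class name is illustrative): it receives each key together with all of its values, already grouped and sorted by key, and sums them:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // All values for one key arrive together, thanks to shuffling and sorting.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result);   // final output, written to HDFS
    }
}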

·       We can set the number of reducers using the following method (see the driver sketch below):

job.setNumReduceTasks(int)

Increasing the number of reducers:

Ø increases the framework overhead,

Ø improves load balancing, and

Ø lowers the cost of failures.
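A driver sketch tying the pieces together (the class names and the input/output paths taken from args are illustrative). It wires up the mapper, reuses the reducer as a combiner ("mini-reducer"), and sets the number of reduce tasks:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class); // combiner acts as a "mini-reducer"
        job.setReducerClass(WordCountReducer.class);

        job.setNumReduceTasks(4);   // override the default of 1 reducer

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // input on HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output on HDFS

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}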

 

Data Types:

Normal Data Type                  MapReduce Data Type

1.     int                        :         IntWritable

2.     float                      :         FloatWritable

3.     double                     :         DoubleWritable

4.     long                       :         LongWritable

5.     String                     :         Text (there is no StringWritable; Hadoop uses Text for strings)

6.     boolean                    :         BooleanWritable
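A small sketch of how these Writable wrappers are used in practice (the class name WritableDemo is illustrative):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class WritableDemo {
    public static void main(String[] args) {
        // Wrap plain Java values in their Writable counterparts...
        IntWritable count = new IntWritable(42);
        Text word = new Text("hadoop");

        // ...and unwrap them again with get() / toString().
        int n = count.get();
        String s = word.toString();
        System.out.println(s + " -> " + n);   // prints: hadoop -> 42
    }
}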

 

