

MapReduce = Programming Model + Execution Framework
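
As a rough illustration of the "programming model" half, the sketch below writes a word-count job as two plain Python functions. The names `map_fn` and `reduce_fn`, the document name `doc1`, and the word-count task itself are assumptions chosen for illustration; these notes do not prescribe a specific API. The execution framework is the part that would call such functions across a cluster.

```python
from typing import Iterator, List, Tuple


def map_fn(key: str, value: str) -> Iterator[Tuple[str, int]]:
    """map: (k1, v1) -> list of (k2, v2).

    Here k1 is a (hypothetical) document name and v1 is its text.
    One (word, 1) pair is emitted per word occurrence.
    """
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key: str, values: List[int]) -> Iterator[int]:
    """reduce: (k2, list of v2) -> list of v2.

    Receives every count emitted for one word and yields the total.
    """
    yield sum(values)


if __name__ == "__main__":
    print(list(map_fn("doc1", "to be or not to be")))
    print(list(reduce_fn("to", [1, 1])))
```

The user writes only these two functions; splitting the input, moving intermediate pairs between machines, and grouping values by key are left to the framework.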

Cluster Architecture


Simplest environment for parallel processing

MapReduce Execution Framework

Distributed File System

Distribution of the input

Execution Flow Overview


Overall schematic diagram for MapReduce framework



MapReduce: Step-by-Step Execution

MapReduce: Output

Execution Overview

(1) MapReduce splits the input files into M “splits”, then starts many copies of the program on the servers.


(2) One copy (the master) is special; the rest are workers. The master picks idle workers and assigns each one of the M map tasks or one of the R reduce tasks.


(3) A map worker reads its input split, parses key/value pairs out of the input data, and passes each pair to the user-defined map function.

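(4) The intermediate key/value pairs produced by the map function are buffered in memory and periodically written to the map worker's local disk, partitioned into R regions (one per reduce task).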


(5) A reduce worker reads the intermediate key/value pairs for its partition and sorts them by key, so that all values for the same key are grouped together.


(6) The reduce worker applies the user-defined reduce function to each intermediate key and its list of values, and writes the result to an output file.
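
The six steps above can be mimicked in a single process. The following is a minimal sketch, not how a real cluster framework is implemented: there is no master, no network, and no disk, and the helper names (`split_input`, `partition`, `run_job`) as well as the choice of `crc32(key) mod R` as the partitioning function are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter
import zlib


def split_input(lines, m):
    """Step 1: divide the input records into at most m splits."""
    size = max(1, -(-len(lines) // m))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def partition(key, r):
    """Pick one of r reduce partitions (assumed: crc32 of the key, mod r)."""
    return zlib.crc32(key.encode("utf-8")) % r


def word_count_map(line):
    """User-defined map function: emit (word, 1) for every word."""
    for word in line.split():
        yield word.lower(), 1


def word_count_reduce(key, values):
    """User-defined reduce function: total the counts for one word."""
    return sum(values)


def run_job(lines, map_fn, reduce_fn, m=3, r=2):
    # Steps 3-4: run one map task per split and buffer the intermediate
    # pairs into r partitions (local disk regions in a real system).
    buffers = [[] for _ in range(r)]
    for split in split_input(lines, m):
        for record in split:
            for key, value in map_fn(record):
                buffers[partition(key, r)].append((key, value))

    # Steps 5-6: each reduce task sorts its partition by key, groups the
    # values per key, and writes one output "file" (here, a dict).
    outputs = []
    for pairs in buffers:
        result = {}
        for key, group in groupby(sorted(pairs, key=itemgetter(0)),
                                  key=itemgetter(0)):
            result[key] = reduce_fn(key, [value for _, value in group])
        outputs.append(result)
    return outputs


if __name__ == "__main__":
    text = ["to be or not to be", "be quick", "not now"]
    for i, out in enumerate(run_job(text, word_count_map, word_count_reduce)):
        print(f"reduce output {i}: {out}")
```

Because every occurrence of a key is routed to the same partition, each word appears in exactly one of the R outputs, which is why the reduce tasks can run independently of one another.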


MapReduce Synchronization

MapReduce Failure Handling

MapReduce Redundant Execution