Table of Contents


ppt

MapReduce = Programming Model + Execution Framework


Cluster Architecture



Simplest environment for parallel processing


MapReduce Execution Framework


Distributed File System



GFS/HDFS


Distribution of the input


Execution Flow Overview


Overall schematic diagram for MapReduce framework


Master


MapReduce: Step-by-Step Execution



MapReduce: Output


Execution Overview

(1) MapReduce splits the input files into M “splits”, then starts many copies of the program on the servers.


(2) One copy (the master) is special; the rest are workers. The master picks idle workers and assigns each one either one of the M map tasks or one of the R reduce tasks.


(3) A map worker reads its assigned input split, parses key/value pairs out of the input data, and passes each pair to the user-defined map function.

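(4) The intermediate key/value pairs produced by the map function are written to in-memory buffers, which are periodically flushed to the map worker's local disk, partitioned into R regions (one per reduce task).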


(5) A reduce worker reads the intermediate key/value pairs for its partition and sorts them by key, so that all values for the same key are grouped together.

(6) The reduce worker runs the user-defined reduce function for each unique intermediate key and writes the results to the output files.

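Steps (1) to (6) can be sketched as a single-process Python simulation of a word-count job. This is only a minimal illustration under stated assumptions, not how GFS/HDFS-backed frameworks are actually implemented: the names `map_fn`, `reduce_fn`, `partition`, `make_splits`, `run_map_task`, `run_reduce_task`, and `run_job` are hypothetical, the values of M and R are arbitrary, the CRC32-based partitioner is just one possible choice, and the "workers" are ordinary function calls run one after another rather than processes scheduled by a master.

```python
from collections import defaultdict
from zlib import crc32

M = 3  # number of input splits / map tasks (illustrative value)
R = 2  # number of reduce tasks / output files (illustrative value)

def map_fn(key, value):
    """User-defined map: emit (word, 1) for every word in a line."""
    for word in value.split():
        yield word.lower(), 1

def reduce_fn(key, values):
    """User-defined reduce: sum the counts for one word."""
    return key, sum(values)

def partition(key):
    """Assumed partitioner: a deterministic hash of the key modulo R."""
    return crc32(key.encode("utf-8")) % R

def make_splits(records, m):
    """Step 1: divide the records into m splits (round-robin for brevity;
    real frameworks use contiguous blocks of the input files)."""
    return [records[i::m] for i in range(m)]

def run_map_task(split):
    """Steps 3-4: apply map_fn to each record and buffer the
    intermediate pairs in R regions, one per reduce task."""
    buffers = [[] for _ in range(R)]
    for key, value in split:
        for k, v in map_fn(key, value):
            buffers[partition(k)].append((k, v))
    return buffers

def run_reduce_task(r, map_outputs):
    """Steps 5-6: read region r from every map task, sort by key,
    group the values per key, and apply reduce_fn to each group."""
    pairs = sorted(kv for buffers in map_outputs for kv in buffers[r])
    grouped = defaultdict(list)
    for k, v in pairs:
        grouped[k].append(v)
    return [reduce_fn(k, vs) for k, vs in grouped.items()]

def run_job(lines):
    records = list(enumerate(lines))  # (line number, line text) pairs
    splits = make_splits(records, M)  # step 1
    # Step 2: a real master assigns tasks to idle workers;
    # here the tasks simply run in sequence.
    map_outputs = [run_map_task(s) for s in splits]
    return [run_reduce_task(r, map_outputs) for r in range(R)]

if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "the fox"]
    for r, output in enumerate(run_job(lines)):
        print(f"output file {r}: {output}")
```

Because the partitioner depends only on the key, every occurrence of a word lands in the same region, so each word is counted by exactly one reduce task and appears in exactly one of the R output lists.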


MapReduce Synchronization


MapReduce Failure Handling




MapReduce Redundant Execution