Table of Contents
What is Big Data?
- No single standard definition...
- “Big Data” is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
- “Big Data” is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it...
What is Big Data?
- “Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” by Gartner
What is Big Data?
- Some extend it to 4 V's
What is Big Data?
- Some extend it to 5 V's
Big Data Characteristics
Data is the New King!
From Data To Wisdom
Quotable Quotes about Big Data
- “Information is the oil of the 21st century, and analytics is the combustion engine” by Gartner
- “Data is the crude oil of the 21st century. You need data scientists to refine it!” by Karthik
- “Without big data analytics, companies are blind and deaf, wandering out onto the web like deer on a freeway.” by Geoffrey Moore
Data Science
- An area that manages, manipulates, extracts, and interprets knowledge from tremendous amounts of data
- Data science is a multidisciplinary field of study whose goal is to address the challenges of big data
Why is Big Data hard?
- How to store?
- 1,000 × 1 TB hard drives are required to store 1 PB of data
- How to move?
- Assuming a 10 Gbps network, it takes about 13 minutes to copy 1 TB, or about 9 days to copy 1 PB
- How to search?
- Assuming each record is 1 KB and one machine can process 1,000 records per second, it takes about 12 CPU days to process 1 TB and about 32 CPU years to process 1 PB
- How to process?
- How to adapt existing algorithms to work at large scale
- How to design new algorithms
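The estimates above can be checked with simple back-of-envelope arithmetic. A quick sketch in Python, assuming decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and the rates stated on the slide:

```python
# Back-of-envelope arithmetic for the "Why is Big Data hard?" numbers.
# Assumptions: decimal units, a 10 Gbps link, 1 KB records at 1,000 records/sec.

TB = 10**12
PB = 10**15

# How to move: a 10 Gbps link moves 1.25 GB per second
link_bytes_per_sec = 10e9 / 8
copy_1tb_min = TB / link_bytes_per_sec / 60       # ~13 minutes
copy_1pb_days = PB / link_bytes_per_sec / 86400   # ~9 days

# How to search: 1 KB records, one machine scanning 1,000 records/sec
record_size = 1000        # bytes
records_per_sec = 1000
scan_1tb_days = (TB / record_size) / records_per_sec / 86400           # ~12 CPU days
scan_1pb_years = (PB / record_size) / records_per_sec / (86400 * 365)  # ~32 CPU years

print(f"copy 1 TB: {copy_1tb_min:.0f} min, copy 1 PB: {copy_1pb_days:.1f} days")
print(f"scan 1 TB: {scan_1tb_days:.1f} CPU days, scan 1 PB: {scan_1pb_years:.1f} CPU years")
```

The point of the exercise: even at full line rate, a single machine is days to years away from touching a petabyte, which is why the processing itself must be distributed.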
Why is Big Data hard?
- There is no one-size-fits-all solution
- Rapidly-evolving technology
- Many different tools!
- Different computation model: need new algorithms!
Solution to Big Data Processing
- We need to combine distributed storage and distributed processing to handle big data
- Issues:
- Distributing computation across many machines
- Maximizing performance
- Minimizing I/O to disk
- Minimizing transfers across the network
- Combining the results of distributed computation
- Recovering from failures
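The "distribute, then combine" pattern behind these issues can be illustrated with a minimal single-process sketch (a word count; the partition data and function names are illustrative, not any framework's API — real systems such as Hadoop run the per-partition step on many nodes and merge partial results over the network):

```python
# Sketch of distributed computation: each partition is processed independently,
# then the partial results are combined into one final answer.
from collections import Counter
from functools import reduce

def map_partition(lines):
    """Runs on each node: produce a partial result from local data."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts  # aggregating locally minimizes network transfer

def combine(acc, partial):
    """Merge two partial results into one (the combining step)."""
    acc.update(partial)
    return acc

# Three "partitions" standing in for data spread across three nodes
partitions = [["big data big"], ["data is new king"], ["big data"]]
partials = [map_partition(p) for p in partitions]  # parallel in a real cluster
total = reduce(combine, partials, Counter())
print(total["big"], total["data"])  # 3 3
```

Failure recovery falls out of the same structure: because each partition's work is independent, a failed node's partitions can simply be reprocessed elsewhere.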
Big Data
Infrastructure for Big Data
Example: Google Data Center
Google data centers in The Dalles, Oregon
Cluster Computing
- Cluster: a collection of individual PCs (compute nodes) connected by a high-performance network
- Each compute node is an independent entity with its own
- Processor
- Main memory
- One or multiple networking cards
- All compute nodes typically have access to a shared file system (Distributed file system)
- Removes the necessity to replicate programs and data on all compute nodes
- All accesses to files require communication over the network
Scalability
- Scalability is the ability of a system to adapt to increased processing demands
- A system is said to be scalable if it can handle the addition of users and resources without suffering a noticeable loss of performance or increase in administrative complexity
Two types of scaling
- Scale up (= Vertical Scaling)
- Scales the computing resources on a single node, via parallel processing & faster memory/storage
- Specialized, expensive hardware
- Single point of failure (SPoF)
- Scale out (= Horizontal Scaling)
- Scales the computing across distributed nodes in a cluster
- Commodity hardware
- Any node may fail, but with no apparent loss of availability (fault tolerant)
Scale Up vs. Scale Out
Whatever system we choose, it has to scale for big data and big data processing, and it has to be economical!
Apache Hadoop is a scale-out platform!
Platform vs. Framework
- Platform
- an underlying computer system on which application programs can run.
- Ex) Windows platform, Linux platform, iOS platform, Android platform
- Framework
- an organizational structure in a specific language and possibly some libraries.
- Ex) .NET framework, EJB framework
End.