Big Data and Hadoop
Duration: 3 days
Description:
Big Data means different things to different people. In this course, we’ll make sure you come away with a solid definition grounded in fundamental computational theory. We will cover Apache Hadoop, MapReduce, and the wider Hadoop ecosystem (Pig, Hive, etc.).
Objectives:
- Understand Big Data
- Understand distributed architectures
- Understand Hadoop
- Learn how to write a MapReduce algorithm
- Understand how to use Apache Pig
- Understand how to use Apache Hive
Audience:
- Programmers
- Architects
- Data Engineers
- Data Scientists
- Managers who want to understand the value of Hadoop and Big Data
Outline:
Big Data
- What is Big Data?
- Why horizontal scaling?
- The fundamental problems, theories, and solutions in distributed computing
- What is the CAP theorem and why is it important?
- Principles of distributed computing
MapReduce
- How does MapReduce fit into the Big Data picture?
- MapReduce in Java
Hadoop
- What is Hadoop?
- Why Hadoop?
- Building MapReduce jobs in Hadoop
Apache Hive
- What is Hive?
- Why Hive?
- Hive SQL
- Hive joins
- HCatalog
- Performance tips in Hive
Apache Pig
- What is Pig?
- Why Pig?
- Pig Latin
- Advanced Pig
- Pig joins
- Pig performance
Related tools and a comparison
- Apache Spark
- HBase
- Kafka
- Flume