Big Data and Hadoop

Duration: 3 days


Big Data means different things to different people. In this course, we’ll establish a working definition grounded in fundamental computational theory. We will then cover Apache Hadoop, MapReduce, and the surrounding Hadoop ecosystem (Pig, Hive, and related tools).


Objectives

  • Understand Big Data
  • Understand distributed architectures
  • Understand Hadoop
  • Master writing a MapReduce algorithm
  • Understand how to use Apache Pig
  • Understand how to use Apache Hive


Audience

  • Programmers
  • Architects
  • Data Engineers
  • Data Scientists
  • Managers who want to understand the value of Hadoop and Big Data


Big Data

  • What is Big Data?

  • Why horizontal scaling?

  • The fundamental problems, theories and solutions in distributed computing

  • What is the CAP theorem and why is it important?

  • Principles of distributed computing

MapReduce

  • How does MapReduce fit into the Big Data picture?

  • MapReduce in Java


Hadoop

  • What is Hadoop?

  • Why Hadoop?

  • Building MapReduce jobs in Hadoop

Apache Hive

  • What is Hive?

  • Why Hive?

  • Hive SQL

  • Hive joins

  • HCatalog

  • Performance tips in Hive

Apache Pig

  • What is Pig?

  • Why Pig?

  • Pig Latin

  • Advanced Pig

  • Pig joins

  • Pig performance

Related tools and a comparison

  • Apache Spark

  • HBase

  • Kafka

  • Flume