Introduction to Big Data Analytics with Hadoop
Hone your data analyst skills and improve your workflow as you learn how to store, analyse, and scale big data using Hadoop with this online course from Packt.
Duration: 3 weeks
Weekly study: 2 hours
100% online
Understanding Hadoop is a highly valuable skill for anyone working with large amounts of data. Companies such as Amazon, eBay, Facebook, Google, LinkedIn, Spotify, and Twitter use Hadoop in some way to process huge chunks of data.
On this three-week course, you’ll become familiar with Hadoop’s ecosystem and understand how to apply Hadoop skills in the real world.
Exploring the history and key terminology of Hadoop, you'll then walk through the installation process on your desktop to help you get started.
With a solid introduction to Hadoop, you’ll learn how to manage big data on a cluster with Hadoop Distributed File System (HDFS).
You'll also discover what MapReduce is and how it's used before moving on to programming Hadoop with Pig and Spark.
With this knowledge, you’ll be able to start analysing data on Hadoop.
Next, you'll learn how to do more with your data by storing and querying it. To help you do this, you'll learn how to use applications such as Sqoop, Hive, MySQL, Phoenix, and MongoDB.
Finally, you'll hone your data analyst skills by learning how to query data interactively. You'll also gain an overview of Presto and learn how to install it so you can quickly query data of any size.
By the end of the course, you’ll have the skills to effectively work with big data using Hadoop and be able to streamline your processes.
Welcome to Introduction to Big Data Analytics with Hadoop and the start of your learning journey, brought to you by Packt.
In this activity, we will discuss how to install Hadoop, the effect of the Hortonworks and Cloudera merger, a Hadoop overview and history, and the Hadoop ecosystem.
In this activity, we will discuss the Hadoop Distributed File System (HDFS), downloading the MovieLens dataset, importing a dataset into HDFS using the command line, and MapReduce.
In this activity, we will discuss MapReduce, how MapReduce distributes processing and a MapReduce example.
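To give a feel for how MapReduce distributes processing, the classic word-count job can be sketched in plain Python. This is an illustration only, not course material: the shuffle function stands in for the grouping Hadoop performs between its map and reduce stages, and the sample lines are made up.

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the line.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle phase: group all values by key, as Hadoop does
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reducer(key, values):
    # Reduce phase: combine all values emitted for one key.
    return key, sum(values)

lines = ["big data big ideas", "big data tools"]
mapped = (pair for line in lines for pair in mapper(line))
counts = dict(reducer(k, v) for k, v in shuffle(mapped))
print(counts["big"])  # 3: "big" appears three times across the lines
```

On a real cluster, many mappers and reducers run this logic in parallel on separate machines; the single-process version above only mirrors the data flow.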
In this activity, we will explore Python MRJob, the Nano editor, and the MapReduce job. We will also describe how to rank movies by their popularity and check our results.
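The ranking idea in this activity can be previewed without MRJob or a cluster: count how many ratings each movie receives and sort. The tab-separated field layout and sample rows below follow the MovieLens u.data convention but are assumptions for this sketch, not data from the course.

```python
from collections import Counter

# Hypothetical sample rows in the MovieLens u.data layout:
# user_id \t movie_id \t rating \t timestamp
ratings = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "22\t242\t1\t878887116",
    "244\t242\t2\t880606923",
]

def movie_id(line):
    # The second tab-separated field is the movie ID.
    return line.split("\t")[1]

# "Popularity" here means number of ratings received.
popularity = Counter(movie_id(line) for line in ratings)

# Most-rated movies first.
ranked = popularity.most_common()
print(ranked[0])  # ('242', 3): movie 242 has three ratings
```

An MRJob version would split this into a mapper emitting `(movie_id, 1)` and a reducer summing the counts, which is exactly what the Counter collapses into one step here.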
You have reached the end of Week 1. In this activity, you'll reflect on what you have learned.
Welcome to Week 2. In this activity, we'll highlight the main topics that will be covered this week.
In this activity, we will cover an introduction to Ambari and an introduction to Pig. We will also apply Pig to an activity.
In this activity, we will discuss Pig in more detail and apply Pig to a challenge exercise.
In this activity, we will discuss Hadoop with Spark, Resilient Distributed Datasets (RDD) and using RDD.
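Spark's RDD API chains transformations such as `map` and `reduceByKey` across a cluster. Without a Spark installation to hand, the flavour of that chain can be mimicked in plain Python; the pairs and the sort-and-group step below are illustrative stand-ins, not how Spark is actually implemented.

```python
from itertools import groupby
from operator import itemgetter

# Illustrative (movie_id, rating) pairs, standing in for an RDD.
pairs = [(242, 3), (302, 3), (242, 1), (242, 2)]

# map: keep the key, reshape each value to (rating, 1)
mapped = [(movie, (rating, 1)) for movie, rating in pairs]

# reduceByKey: sum ratings and counts per movie. Spark merges
# values per partition; here we sort and group in one process.
averages = {}
for movie, group in groupby(sorted(mapped, key=itemgetter(0)), key=itemgetter(0)):
    total, count = 0, 0
    for _, (rating, n) in group:
        total += rating
        count += n
    averages[movie] = total / count  # average rating per movie

print(averages[242])  # 2.0: (3 + 1 + 2) / 3
```

The point of the RDD abstraction is that the same two-step pipeline runs unchanged whether the data fits in memory or is spread over many machines.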
In this activity, we will discuss Datasets and Spark 2.0.
You have reached the end of Week 2. In this activity, you'll reflect on what you have learned.
Welcome to Week 3. In this activity, we'll highlight the main topics that will be covered this week.
In this activity, we will discuss what Hive is and how Hive works.
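Hive lets you query data stored in HDFS with HiveQL, a SQL dialect. To show the kind of query involved without a Hadoop cluster, the same style of statement can be run against SQLite from Python; the table and values are made up for this sketch, and SQLite stands in purely to execute the SQL.

```python
import sqlite3

# In Hive this table would be backed by files in HDFS;
# an in-memory SQLite database stands in to run the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (movie_id INTEGER, rating INTEGER)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?)",
    [(242, 3), (302, 3), (242, 1), (242, 2)],
)

# A HiveQL-style aggregate: number of ratings per movie,
# most-rated first.
rows = conn.execute(
    "SELECT movie_id, COUNT(*) AS n FROM ratings "
    "GROUP BY movie_id ORDER BY n DESC"
).fetchall()
print(rows[0])  # (242, 3)
```

The difference in Hive is where the work happens: the same GROUP BY is compiled into distributed jobs over the cluster rather than executed by a local database engine.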
In this activity, we will discuss integrating MySQL with Hadoop. We will describe installing MySQL and importing data and using Sqoop to import and export data.
In this activity, we will discuss NoSQL and HBase.
In this activity, we will discuss Cassandra, installing Cassandra and writing Spark output into Cassandra.
In this activity, we will discuss MongoDB, integrating MongoDB with Spark and using the MongoDB shell.
You have reached the end of Week 3. In this activity, you'll reflect on what you have learned.