Introduction to Big Data Analytics with Hadoop
Hone your data analyst skills and improve your workflow as you learn how to store, analyse, and scale big data using Hadoop with this online course from Packt.
Duration: 3 weeks
Weekly study: 2 hours
100% online
Understanding Hadoop is a highly valuable skill for anyone working with large amounts of data. Companies such as Amazon, eBay, Facebook, Google, LinkedIn, Spotify, and Twitter use Hadoop in some way to process huge chunks of data.
On this three-week course, you’ll become familiar with Hadoop’s ecosystem and understand how to apply Hadoop skills in the real world.
Exploring the history and key terminology of Hadoop, you'll then walk through the installation process on your desktop to help you get started.
With a solid introduction to Hadoop, you’ll learn how to manage big data on a cluster with Hadoop Distributed File System (HDFS).
You'll also discover what MapReduce is and how it's used before moving on to programming Hadoop with Pig and Spark.
With this knowledge, you’ll be able to start analysing data on Hadoop.
Next, you'll learn how to do more with your data by storing and querying it. To help you do this, you'll learn how to use applications such as Sqoop, Hive, MySQL, Phoenix, and MongoDB.
Finally, you'll hone your data analyst skills by learning how to query data interactively. You'll also gain an overview of Presto and learn how to install it so you can quickly query data of any size.
By the end of the course, you’ll have the skills to effectively work with big data using Hadoop and be able to streamline your processes.
Welcome to Introduction to Big Data Analytics with Hadoop and the start of your learning journey, brought to you by Packt.
In this activity, we will discuss how to install Hadoop, the effect of the Hortonworks and Cloudera merger, a Hadoop overview and history, and the Hadoop ecosystem.
In this activity, we will discuss the Hadoop Distributed File System (HDFS), downloading the MovieLens dataset, importing a dataset into HDFS using the command line, and MapReduce.
In this activity, we will discuss MapReduce, how MapReduce distributes processing and a MapReduce example.
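To give a feel for how MapReduce distributes processing, the classic word-count job can be sketched in plain Python. This is an illustration only, not course material: the shuffle function stands in for the grouping Hadoop performs between its map and reduce stages, and the sample lines are made up.

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the line.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle phase: group all values by key, as Hadoop does
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reducer(key, values):
    # Reduce phase: combine all values emitted for one key.
    return key, sum(values)

lines = ["big data big ideas", "big data tools"]
mapped = (pair for line in lines for pair in mapper(line))
counts = dict(reducer(k, v) for k, v in shuffle(mapped))
print(counts["big"])  # 3: "big" appears three times across the lines
```

On a real cluster, many mappers and reducers run this logic in parallel on separate machines; the single-process version above only mirrors the data flow.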
In this activity, we will explore Python MRJob, the Nano editor, and the MapReduce job. We will also describe how to rank movies by their popularity and check our results.
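The ranking idea in this activity can be previewed without MRJob or a cluster: count how many ratings each movie receives and sort. The tab-separated field layout and sample rows below follow the MovieLens u.data convention but are assumptions for this sketch, not data from the course.

```python
from collections import Counter

# Hypothetical sample rows in the MovieLens u.data layout:
# user_id \t movie_id \t rating \t timestamp
ratings = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "22\t242\t1\t878887116",
    "244\t242\t2\t880606923",
]

def movie_id(line):
    # The second tab-separated field is the movie ID.
    return line.split("\t")[1]

# "Popularity" here means number of ratings received.
popularity = Counter(movie_id(line) for line in ratings)

# Most-rated movies first.
ranked = popularity.most_common()
print(ranked[0])  # ('242', 3): movie 242 has three ratings
```

An MRJob version would split this into a mapper emitting `(movie_id, 1)` and a reducer summing the counts, which is exactly what the Counter collapses into one step here.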
You have reached the end of Week 1. In this activity, you'll reflect on what you have learned.
Welcome to Week 2. In this activity, we'll highlight the main topics that will be covered this week.
In this activity, we will cover an introduction to Ambari and an introduction to Pig. We will also apply Pig to an activity.
In this activity, we will discuss Pig in more detail and apply Pig to a challenge exercise.
In this activity, we will discuss Hadoop with Spark, Resilient Distributed Datasets (RDD) and using RDD.
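Spark's RDD API chains transformations such as `map` and `reduceByKey` across a cluster. Without a Spark installation to hand, the flavour of that chain can be mimicked in plain Python; the pairs and the sort-and-group step below are illustrative stand-ins, not how Spark is actually implemented.

```python
from itertools import groupby
from operator import itemgetter

# Illustrative (movie_id, rating) pairs, standing in for an RDD.
pairs = [(242, 3), (302, 3), (242, 1), (242, 2)]

# map: keep the key, reshape each value to (rating, 1)
mapped = [(movie, (rating, 1)) for movie, rating in pairs]

# reduceByKey: sum ratings and counts per movie. Spark merges
# values per partition; here we sort and group in one process.
averages = {}
for movie, group in groupby(sorted(mapped, key=itemgetter(0)), key=itemgetter(0)):
    total, count = 0, 0
    for _, (rating, n) in group:
        total += rating
        count += n
    averages[movie] = total / count  # average rating per movie

print(averages[242])  # 2.0: (3 + 1 + 2) / 3
```

The point of the RDD abstraction is that the same two-step pipeline runs unchanged whether the data fits in memory or is spread over many machines.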
In this activity, we will discuss Datasets and Spark 2.0.
You have reached the end of Week 2. In this activity, you'll reflect on what you have learned.
Welcome to Week 3. In this activity, we'll highlight the main topics that will be covered this week.
In this activity, we will discuss what Hive is and how Hive works.
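Hive lets you query data stored in HDFS with HiveQL, a SQL dialect. To show the kind of query involved without a Hadoop cluster, the same style of statement can be run against SQLite from Python; the table and values are made up for this sketch, and SQLite stands in purely to execute the SQL.

```python
import sqlite3

# In Hive this table would be backed by files in HDFS;
# an in-memory SQLite database stands in to run the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (movie_id INTEGER, rating INTEGER)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?)",
    [(242, 3), (302, 3), (242, 1), (242, 2)],
)

# A HiveQL-style aggregate: number of ratings per movie,
# most-rated first.
rows = conn.execute(
    "SELECT movie_id, COUNT(*) AS n FROM ratings "
    "GROUP BY movie_id ORDER BY n DESC"
).fetchall()
print(rows[0])  # (242, 3)
```

The difference in Hive is where the work happens: the same GROUP BY is compiled into distributed jobs over the cluster rather than executed by a local database engine.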
In this activity, we will discuss integrating MySQL with Hadoop. We will describe installing MySQL and importing data and using Sqoop to import and export data.
In this activity, we will discuss NoSQL and HBase.
In this activity, we will discuss Cassandra, installing Cassandra and writing Spark output into Cassandra.
In this activity, we will discuss MongoDB, integrating MongoDB with Spark and using the MongoDB shell.
You have reached the end of Week 3. In this activity, you'll reflect on what you have learned.