Hadoop, MapReduce, HDFS, Spark, Pig, Hive, HBase, MongoDB, Cassandra, Flume – the list goes on! Over 25 technologies.
What Will I Learn?
- Design distributed systems that manage “big data” using Hadoop and related technologies.
- Use HDFS and MapReduce for storing and analyzing data at scale.
- Use Pig and Spark to create scripts to process data on a Hadoop cluster in more complex ways.
- Analyze relational data using Hive and MySQL
- Analyze non-relational data using HBase, Cassandra, and MongoDB
- Query data interactively with Drill, Phoenix, and Presto
- Choose an appropriate data storage technology for your application
- Understand how Hadoop clusters are managed by YARN, Tez, Mesos, Zookeeper, Zeppelin, Hue, and Oozie.
- Publish data to your Hadoop cluster using Kafka, Sqoop, and Flume
- Consume streaming data using Spark Streaming, Flink, and Storm
- You will need access to a PC running 64-bit Windows, MacOS, or Linux with an Internet connection, if you want to participate in the hands-on activities and exercises. You must have at least 8GB of free RAM on your system; 10GB or more is recommended. If your PC does not meet these requirements, you can still follow along in the course without doing hands-on activities.
- Some activities will require some prior programming experience, preferably in Python or Scala.
- A basic familiarity with the Linux command line will be very helpful.