Installing Spark
Big data consultant Jeffrey Aven covers the basics about how Spark is deployed and how to install Spark. He also covers how to deploy Spark on Hadoop using the Hadoop scheduler, YARN.
Now that you’ve gotten through the heavy stuff in the last two hours, you can dive headfirst into Spark and get your hands dirty, so to speak.
This hour covers the basics about how Spark is deployed and how to install Spark. I will also cover how to deploy Spark on Hadoop using the Hadoop scheduler, YARN, discussed in Hour 2.
By the end of this hour, you’ll be up and running with an installation of Spark that you will use in subsequent hours.
Spark Deployment Modes
There are three primary deployment modes for Spark:
Spark Standalone
Spark on YARN (Hadoop)
Spark on Mesos
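In practice, the deployment mode you choose shows up as the master URL you pass to Spark's launch scripts (such as spark-shell or spark-submit). As a rough sketch, the URL forms look like the following; the hostnames and ports here are illustrative placeholders, not values from any particular cluster:

```shell
# Master URL forms for each Spark deployment mode.
# Hostnames ("sparkmaster", "mesosmaster") are illustrative placeholders.

# Local mode: no cluster at all; [*] means "use all local cores"
MASTER_LOCAL="local[*]"

# Spark Standalone: host and port of the built-in standalone master
MASTER_STANDALONE="spark://sparkmaster:7077"

# Spark on YARN: the ResourceManager is resolved from the Hadoop
# configuration (HADOOP_CONF_DIR), so no host is given here
MASTER_YARN="yarn"

# Spark on Mesos: host and port of the Mesos master
MASTER_MESOS="mesos://mesosmaster:5050"

echo "${MASTER_LOCAL} ${MASTER_STANDALONE} ${MASTER_YARN} ${MASTER_MESOS}"
```

You would pass one of these to a launch script, for example spark-shell --master yarn.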
Spark Standalone refers to the built-in or “standalone” scheduler. The term can be confusing because Spark Standalone mode covers both a single machine and a fully distributed multinode cluster; “standalone” simply means that Spark does not need an external scheduler.
With Spark Standalone, you can get up and running quickly with few dependencies or environmental considerations. Spark Standalone includes everything you need to get started.
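A Standalone setup can be sketched as follows. The release version, download URL, and /opt install paths are my assumptions for illustration; substitute the release and locations you actually want:

```shell
# Sketch of a Spark Standalone install. The version, mirror URL, and
# /opt paths are assumptions -- adjust them to your environment.
SPARK_VERSION="2.4.8"
SPARK_PKG="spark-${SPARK_VERSION}-bin-hadoop2.7"
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_PKG}.tgz"
export SPARK_HOME="/opt/spark"

# One-time steps, run as a user that can write to /opt:
#   wget "${SPARK_URL}"
#   sudo tar -xzf "${SPARK_PKG}.tgz" -C /opt
#   sudo ln -s "/opt/${SPARK_PKG}" "${SPARK_HOME}"

# Start the built-in master and one worker, then connect an interactive
# shell to the standalone master (skipped if Spark is not installed):
if [ -x "${SPARK_HOME}/sbin/start-master.sh" ]; then
  "${SPARK_HOME}/sbin/start-master.sh"
  "${SPARK_HOME}/sbin/start-slave.sh" "spark://$(hostname):7077"
  "${SPARK_HOME}/bin/spark-shell" --master "spark://$(hostname):7077"
fi

echo "${SPARK_PKG}"
```

The standalone master also serves a status web UI, by default on port 8080 of the master host, which is a quick way to confirm the cluster is up.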
Spark on YARN and Spark on Mesos are deployment modes that use the external resource schedulers YARN and Mesos, respectively. In each case, you need a working YARN or Mesos cluster before installing and configuring Spark. For Spark on YARN, this typically means deploying Spark to an existing Hadoop cluster.
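Once a Hadoop cluster is available, submitting a Spark application to YARN looks roughly like the sketch below. The HADOOP_CONF_DIR value and the examples jar path are assumptions for a typical layout; both vary by distribution and Spark version:

```shell
# Sketch of submitting the bundled SparkPi example to YARN.
# HADOOP_CONF_DIR and the jar path are assumptions for a typical layout.
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
MASTER="yarn"
DEPLOY_MODE="cluster"   # "cluster": driver runs inside YARN;
                        # "client": driver runs on the submitting machine

# Submit only if a Spark installation is present:
if [ -x "${SPARK_HOME:-/opt/spark}/bin/spark-submit" ]; then
  "${SPARK_HOME:-/opt/spark}/bin/spark-submit" \
    --master "${MASTER}" \
    --deploy-mode "${DEPLOY_MODE}" \
    --class org.apache.spark.examples.SparkPi \
    "${SPARK_HOME:-/opt/spark}"/examples/jars/spark-examples*.jar 100
fi

echo "${MASTER} ${DEPLOY_MODE}"
```

Note that no host is named in the master URL: with --master yarn, Spark locates the ResourceManager from the Hadoop configuration pointed to by HADOOP_CONF_DIR.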
I will cover Spark Standalone and Spark on YARN installation examples in this hour because these are the most common deployment modes in use today.