Installing Spark


Deploying Spark on Hadoop

As discussed previously, deploying Spark with Hadoop is a popular option for many users because Spark can read from and write to the data in Hadoop (in HDFS) and can leverage Hadoop’s process scheduling subsystem, YARN.

Using a Management Console or Interface

If you are using a commercial distribution of Hadoop such as Cloudera or Hortonworks, you can often deploy Spark using the management console provided with each respective platform: for example, Cloudera Manager for Cloudera or Ambari for Hortonworks.

If you are using the management facilities of a commercial distribution, the version of Spark deployed may lag behind the latest stable Apache release, because Hadoop vendors typically update their software stacks on their own major and minor release schedules.

Installing Manually

Installing Spark on a YARN cluster manually (that is, without a management interface such as Cloudera Manager or Ambari) is quite straightforward.

Spark applications can be submitted to YARN in one of two submission modes: yarn-cluster or yarn-client.

Using the yarn-cluster option, the Spark Driver (with its Spark Context), the ApplicationMaster, and all executors run on YARN NodeManagers. These are all concepts we will explore in detail in Hour 4, “Understanding the Spark Runtime Architecture.” The yarn-cluster submission mode is intended for production or noninteractive (batch) Spark applications. You cannot use yarn-cluster as an option for any of the interactive Spark shells. For instance, running the following command:

spark-shell --master yarn-cluster

will result in this error:

Error: Cluster deploy mode is not applicable to Spark shells.

Using the yarn-client option, the Spark Driver runs on the client (the host where you ran the Spark application). All of the tasks and the ApplicationMaster still run on the YARN NodeManagers; however, unlike yarn-cluster mode, the Driver does not run in the ApplicationMaster. The yarn-client submission mode is intended for interactive applications such as pyspark or spark-shell.
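The choice between the two submission modes can be sketched as a small shell helper. This is an illustration only: build_launch_cmd is a hypothetical function (not part of Spark), my_app.py is a placeholder application name, and the helper prints the command line instead of running it, so no cluster is needed to try it. It assumes the yarn-cluster and yarn-client master values used in this section; later Spark releases express the same choice as --master yarn together with --deploy-mode cluster or --deploy-mode client.

```shell
#!/usr/bin/env bash
# Sketch: build (but do not run) a launch command for a YARN submission mode.
# build_launch_cmd is a hypothetical helper, not something Spark provides.

build_launch_cmd() {
  local mode="$1"   # yarn-cluster or yarn-client
  local tool="$2"   # spark-submit, spark-shell, or pyspark
  shift 2           # any remaining arguments are passed through unchanged

  case "$mode" in
    yarn-cluster|yarn-client) ;;
    *) echo "Unknown submission mode: $mode" >&2; return 2 ;;
  esac

  # Interactive shells need the Driver on the client host, so only
  # spark-submit is allowed to use yarn-cluster here.
  if [ "$mode" = "yarn-cluster" ] && [ "$tool" != "spark-submit" ]; then
    echo "Error: Cluster deploy mode is not applicable to Spark shells." >&2
    return 1
  fi

  local cmd="$tool --master $mode"
  if [ "$#" -gt 0 ]; then
    cmd="$cmd $*"
  fi
  echo "$cmd"
}

# Interactive shell: Driver runs on the client.
build_launch_cmd yarn-client spark-shell
# Batch application: Driver runs on the cluster (my_app.py is a placeholder).
build_launch_cmd yarn-cluster spark-submit my_app.py
```

Asking the helper for an interactive shell in yarn-cluster mode (for example, build_launch_cmd yarn-cluster pyspark) returns a nonzero status and prints a message modeled on the error shown earlier.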
