Installing Spark


Preparing to Install Spark

Spark is a cross-platform application that can be deployed on

  • Linux (all distributions)

  • Windows

  • Mac OS X

Although there are no specific hardware requirements, general hardware recommendations for a Spark instance are

  • 8 GB or more memory

  • Eight or more CPU cores

  • 10 gigabit or greater network speed

  • Four or more disks in a JBOD configuration (JBOD stands for “Just a Bunch of Disks,” meaning independent hard disks that are not configured as a RAID, or Redundant Array of Independent Disks)

Spark is written in Scala and provides programming interfaces in Scala and Python (PySpark). The following are software prerequisites for installing and running Spark:

  • Java

  • Python (if you intend to use PySpark)

If you wish to use Spark with R (as I will discuss in Hour 15, “Getting Started with Spark and R”), you will need to install R as well. Git, Maven, or SBT may also be useful if you intend to build Spark from source or compile Spark programs.
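
Before installing Spark, it is worth confirming that these prerequisites are actually present on the system. The short script below is an illustrative sketch, not part of the Spark distribution; it uses only the Python standard library (a recent Python 3) to check that java and python are on the PATH and to report their versions:

    import shutil
    import subprocess

    # Check that each prerequisite is on the PATH and report its version.
    for tool, flag in [("java", "-version"), ("python", "--version")]:
        path = shutil.which(tool)
        if path is None:
            print(f"{tool}: not found; install it before installing Spark")
            continue
        # Some tools (java among them) print their version banner to
        # stderr rather than stdout, so capture and check both streams.
        result = subprocess.run([path, flag], capture_output=True, text=True)
        print(f"{tool}: {path}")
        print((result.stdout or result.stderr).strip())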

If you are deploying Spark on YARN or Mesos, of course, you need to have a functioning YARN or Mesos cluster before deploying and configuring Spark to work with these platforms.

I will cover installing Spark in Standalone mode on a single machine on each type of platform, including satisfying all of the dependencies and prerequisites.
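
Once Spark is installed, a quick way to verify that a Standalone installation works end to end is to run a trivial job. The snippet below is a minimal sketch that assumes the pyspark package is importable from your Python environment (for example, by adding Spark’s python directory to PYTHONPATH); the application name InstallCheck is arbitrary:

    from pyspark import SparkConf, SparkContext

    # Run Spark locally, using as many worker threads as there are cores.
    conf = SparkConf().setAppName("InstallCheck").setMaster("local[*]")
    sc = SparkContext(conf=conf)

    # A trivial job: sum the integers 0 through 99.
    print(sc.version)                        # Spark version string
    print(sc.parallelize(range(100)).sum())  # expected output: 4950

    sc.stop()

If this prints the Spark version and 4950, the installation is working. The same check can be run interactively from the pyspark shell, which creates the SparkContext (as sc) for you.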
