(Part 1) Cluster Mode
This post covers cluster mode specific settings, for client mode specific settings, see Part 2.
One morning, while doing some back-of-an-envelope calculations, I discovered that we could lower our AWS costs by using clusters of fewer, powerful machines.
More cores, more memory, lower costs – it’s not every day a win/win/win comes along.
As you might expect, there was a catch. We had been using the AWS maximizeResourceAllocation setting to automatically set the size of our Spark executors and driver.
maximizeResourceAllocation allocates an entire node and its resources for the Spark driver. This worked well for us before. Our previous cluster of 10 nodes had been divided into 9 executors and 1 driver. 90% of our resources were processing data while 10% were dedicated to the various housekeeping tasks the driver performs.
However, allocating an entire node to the driver with our new cluster design wasted resources egregiously. A full 33% of our resources were devoted to the driver, leaving only 67% for processing data. Needless to say, our driver was significantly over-allocated.
Clearly, maximizeResourceAllocation wasn’t going to work for our new cluster. We were going to have to roll up our sleeves and manually configure our Spark jobs. Like any developer, I consulted the sacred texts (Google, Stack Overflow, Spark Docs). Helpful information abounded, but most of it was overly general. I had difficulty finding definite answers as to what settings I should choose.
While calculating the specifics for our setup, I knew that the cluster specs might change again in the future. I wanted to build a spreadsheet that would make this process less painful. With a generous amount of guidance gleaned from this Cloudera blogpost, How to Tune Your Apache Spark Jobs Part 2, I built the following spreadsheet:
If you would like an easy way to calculate the optimal settings for your Spark cluster, download the spreadsheet from the link above. Below, I’ve listed the fields in the spreadsheet and detail the way in which each is intended to be used.
A couple of quick caveats:
- The generated configs are optimized for running Spark jobs in cluster deploy-mode
- The generated configs result in the driver being allocated as many resources as a single executor.
The fields shown above are configurable. The green-shaded fields should be changed to match your cluster’s specs. It is not recommended that you change the yellow-shaded fields, but some use-cases might require customization. More information about the default, recommended values for the yellow-shaded fields can be found in the Cloudera post.
Number of Nodes
The number of worker machines in your cluster. This can be as low as one machine.
Memory Per Node (GB)
The amount of RAM per node that is available for Spark’s use. If using Yarn, this will be the amount of RAM per machine managed by Yarn Resource Manager.
Cores Per Node
The number of cores per node that are available for Spark’s use. If using Yarn, this will be the number of cores per machine managed by Yarn Resource Manager.
Memory Overhead Coefficient
Recommended value: .1
The percentage of memory in each executor that will be reserved for spark.yarn.executor.memoryOverhead.
Executor Memory Upper Bound (GB)
Recommended value: 64
The upper bound for executor memory. Each executor runs on its own JVM. Upwards of 64GB of memory and garbage collection issues can cause slowness.
Executor Core Upper Bound
Recommended value: 5
The upper bound for cores per executor. More than 5 cores per executor can degrade HDFS I/O throughput. I believe this value can safely be increased if writing to a web-based “file system” such as S3, but significant increases to this limit are not recommended.
OS Reserved Cores
Recommended value: 1
Cores per machine to reserve for OS processes. Should be zero if only a percentage of the machine’s cores were made available to Spark (i.e. entered in the Cores Per Node field above).
OS Reserved Memory (GB)
Recommended value: 1
The amount of RAM per machine to reserve for OS processes. Should be zero if only a percentage of the machine’s RAM was made available to Spark (i.e. entered in the Memory Per Node field above).
Parallelism Per Core
Recommended value: 2
The level of parallelism per allocated core. This field is used to determine the spark.default.parallelism setting. Generally recommended setting for this value is double the number of cores.
Note: Cores Per Node and Memory Per Node could also be used to optimize Spark for local mode. If your local machine has 8 cores and 16 GB of RAM and you want to allocate 75% of your resources to running a Spark job, setting Cores Per Node and Memory Per Node to 6 and 12 respectively will give you optimal settings. You would also want to zero out the OS Reserved settings. If Spark is limited to using only a portion of your system, there is no need to set aside resources specifically for the OS.
Once the configurable fields on the left-hand side of the spreadsheet have been set to the desired values, the resultant cluster configuration will be reflected in the reference table.
There is some degree of subjectivity in selecting the Executors Per Node setting that will work best for your use case, so I elected to use a reference table rather than selecting the number automatically.
A good rule of thumb for selecting the optimal number of Executors Per Node would be to select the setting that minimizes Unused Memory Per Node and Unused Cores Per Node while keeping Total Memory Per Executor below the Executor Memory Upper Bound and Core Per Executor below the Executor Core Upper Bound.
For example, take the reference table shown above:
- Executors Per Node: 1
- Unused Memory Per Node: 0
- Unused Cores Per Node: 0
Warning: Total Memory Per Executor exceeds the Executor Memory Upper Bound
Warning: Cores Per Executor exceeds Executor Core Upper Bound
- (That row has been greyed out since it has exceeded one of the upper bounds)
- Executors Per Node: 5
- Unused Memory Per Node: 0
- Unused Cores Per Node: 1
Warning: Cores Per Executor exceeds the Executor Core Upper Bound.
- Executors Per Node: 6
- Unused Memory Per Node: 1
- Unused Core Per Node: 1
- Total Memory Per Executor and Cores Per Executor are both below their respective upper bounds.
- Executors Per Node: All others
- Either exceed the Executor Memory Upper Bound, exceed the Executor Cores Upper Bound, or waste more resources than Executors Per Node = 6
Executors Per Node = 6 is the optimal setting.
Now that we have selected an optimal number of Executors Per Node, we are ready to generate the Spark configs with which we will run our job. We enter the optimal number of executors in the Selected Executors Per Node field. The correct settings will be generated automatically.
(Number of Nodes * Selected Executors Per Node) - 1
This is the number of total executors in your cluster. We subtract one to account for the driver. The driver will consume as many resources as we are allocating to an individual executor on one, and only one, of our nodes.
Equal to Overhead Memory Per Executor
The memory to be allocated for the memoryOverhead of each executor, in MB. Calculated from the values from the row in the reference table that corresponds to our Selected Executors Per Node.
Equal to Memory Per Executor
The memory to be allocated for each executor. Calculated from the values from the row in the reference table that corresponds to our Selected Executors Per Node.
Equal to spark.yarn.executor.memoryOverhead
The memory to be allocated for the memoryOverhead of the driver, in MB.
Equal to spark.executor.memory
The memory to be allocated for the driver.
Equal to Cores Per Executor
The number of cores allocated for each executor. Calculated from the values from the row in the reference table that corresponds to our Selected Executors Per Node.
Equal to spark.executor.cores
The number of cores allocated for the driver.
spark.executor.instances * spark.executor.cores * Parallelism Per Core
Default parallelism for Spark RDDs, Dataframes, etc.
Now that we have the proper numbers for our configs, using them is fairly simple. Below, I’ve demonstrated 3 different ways the configs might be used:
Add to spark-defaults.conf
Note: Will be used for submitted jobs unless overwritten by spark-submit args
Pass as software settings to an AWS EMR Cluster
Note: Will be added to spark-defaults.conf
Pass as args with spark-submit
--[your class] \
--master yarn \
--deploy-mode cluster \
--num-executors 17 \
--conf spark.yarn.executor.memoryOverhead=4096 \
--executor-memory 35G \
--conf spark.yarn.driver.memoryOverhead=4096 \
--driver-memory 35G \
--executor-cores 5 \
--driver-cores 5 \
--conf spark.default.parallelism=170 \