
Brainstorm on Memory issues for Spark

 
Hao_Zhu
New Member

Brainstorm on Memory issues for Spark

Hi Team,

Have you encountered any kind of memory issue with Spark?

If so, would you like to share your troubleshooting tips?

Thanks~

2 REPLIES
idyptan
Visitor

Re: Brainstorm on Memory issues for Spark

I would say memory issues are a subtopic of the more general topic of optimisation in Spark.

Since Spark was designed as an in-memory computation framework, it is naturally more demanding of RAM than legacy MapReduce. It is therefore always a good idea to design your cluster specification with this in mind.

However, there is no recipe that makes your cluster highly utilised and never hit OOM. Sizing is always somewhat speculative and subject to change over time; I would argue it is about balancing stability and cost. With time you develop an understanding of what a reasonable capacity is for your workloads. It is an iterative and dynamic process.

There are multiple layers of memory you should consider before taking action on an OOM issue.

1. Physical memory - this is what the OS sees when the job is launched. In Linux you can check it with "top", "free", etc.

If you're submitting Spark jobs through the YARN ResourceManager, you can diagnose this type of OOM in the container logs:

Error: ExecutorLostFailure Reason: Container killed by YARN for exceeding limits.
12.4 GB of 12.3 GB physical memory used. 
Consider boosting spark.yarn.executor.memoryOverhead.
Error: ExecutorLostFailure Reason: Container killed by YARN for exceeding limits.
4.5 GB of 3 GB physical memory used.
Consider boosting spark.yarn.executor.memoryOverhead.

As the message suggests, consider boosting "spark.yarn.executor.memoryOverhead". Typically, allocating about 1/10 of spark.executor.memory as overhead is enough to get rid of this error.
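For illustration only (the numbers are made up, and on Spark 2.3+ the equivalent key is spark.executor.memoryOverhead), the overhead could be raised when the session is created, assuming the config is applied before the SparkContext starts (or passed via --conf at submit time):

import org.apache.spark.sql.SparkSession

// Sketch: 12 GB executor heap plus roughly 10% overhead for off-heap and
// native allocations, so the YARN container limit is not exceeded.
val spark = SparkSession.builder()
  .appName("memory-overhead-sketch")
  .config("spark.executor.memory", "12g")
  .config("spark.yarn.executor.memoryOverhead", "1228") // in MB, ~10% of 12g
  .getOrCreate()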

2. Virtual memory. This is your app's physical memory + swap (paged files).

This is enforced by YARN (the NodeManager performs the check) and diagnosed by the message below:

Container killed by YARN for exceeding memory limits.
1.1gb of 1.0gb virtual memory used. Killing container.

It can be solved by disabling the vmem check on the NodeManager:

"yarn.nodemanager.vmem-check-enabled":"false"

 

3. Java heap space. This is the memory available to the Spark JVM itself (driver/executor).

It can be detected in the container logs by a message like the one below:

WARN TaskSetManager: Loss was due to 
java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: Java heap space

You request memory from the RM for your Spark app by setting the intrinsic config spark.executor.memory.

In many cases, if this runs out, Spark will try to spill data to disk and no OOM occurs. As a Spark app developer you can choose not to use disk at all for performance reasons; your app then fails fast with OOM instead of occupying cluster resources.
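For example, the cache storage level controls whether cached blocks may spill to local disk; a minimal sketch (the RDD and session here are placeholders):

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("storage-level-sketch").getOrCreate()
val rdd = spark.sparkContext.parallelize(1 to 1000000)

// MEMORY_AND_DISK: cached blocks that do not fit in the heap spill to local disk.
val spillable = rdd.persist(StorageLevel.MEMORY_AND_DISK)

// MEMORY_ONLY: cached blocks never touch disk; partitions that do not fit are
// dropped and recomputed when needed.
val inMemoryOnly = rdd.map(_ * 2).persist(StorageLevel.MEMORY_ONLY)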

There are, however, numerous optimisation techniques to lower the memory footprint.

Here are useful links that cover this subject:

https://0x0fff.com/spark-memory-management/

https://aws.amazon.com/blogs/big-data/best-practices-for-successfully-managing-memory-for-apache-spark-applications-on-amazon-emr/

https://g1thubhub.github.io/spark-memory.html

 

Vinayak_Meghraj
Occasional Contributor

Re: Brainstorm on Memory issues for Spark

By default, Spark uses On-heap memory only. The size of the On-heap memory is configured by the --executor-memory or spark.executor.memory parameter when the Spark application starts. The concurrent tasks running inside an Executor share the JVM's On-heap memory.

The On-heap memory area in the Executor can be roughly divided into the following four blocks (a configuration sketch follows the list):

Storage Memory: It's mainly used to store Spark cache data, such as RDD cache, Broadcast variable, Unroll data, and so on.
Execution Memory: It's mainly used to store temporary data in the calculation process of Shuffle, Join, Sort, Aggregation, etc.
User Memory: It's mainly used to store the data needed for RDD conversion operations, such as the information for RDD dependency.
Reserved Memory: This memory is reserved for the system and is used to store Spark's internal objects.
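In terms of configuration, the relative sizes of these blocks are governed by a handful of settings under the unified memory manager; a rough sketch (the values shown are the defaults):

import org.apache.spark.sql.SparkSession

// Sketch of how the on-heap regions are sized with the unified memory manager:
// Reserved Memory is a fixed ~300 MB; of the remaining heap,
// spark.memory.fraction (default 0.6) is shared by Execution and Storage,
// with spark.memory.storageFraction (default 0.5) protected for Storage;
// whatever is left over is User Memory.
val spark = SparkSession.builder()
  .appName("onheap-regions-sketch")
  .config("spark.executor.memory", "4g")         // total executor JVM heap
  .config("spark.memory.fraction", "0.6")        // Execution + Storage share
  .config("spark.memory.storageFraction", "0.5") // Storage share of the above
  .getOrCreate()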

 

https://support.datafabric.hpe.com/s/article/Spark-Troubleshooting-guide-Memory-Management-How-to-troubleshooting-out-of-memory-OOM-issues-on-Spark-Executor?language=en_US