HPE Ezmeral Software platform

MEP 7.1.0 and Spark 2.4.7 versus JDK 11

 
jamesrgrinter
Occasional Contributor

MEP 7.1.0 and Spark 2.4.7 versus JDK 11

"Developer" question: how are the MEP 7.x releases squaring the circle between Data Fabric 6.2 requiring JRE/JDK 11, and Spark 2.4.x being very much JDK 8?

That is, the Data Fabric 6.2-related releases, including components such as hadoop-common (2.7.4.0-mapr-710), contain Java 11 bytecode, but Spark 2.4 does not (officially, at least) run under JDK 11. In fact, I know some methods will definitely fail at run time if they get called.
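
For what it's worth, the bytecode claim is easy to verify. Here's a rough sketch in Scala (the JAR path is whatever you point it at, e.g. the hadoop-common JAR above) that reports the class-file major versions found inside a JAR; 52 means Java 8, 55 means Java 11:

```scala
import java.io.DataInputStream
import java.util.jar.JarFile
import scala.collection.JavaConverters._

// Sketch: report the class-file major version of every .class entry in a JAR.
// Major version 52 = compiled for Java 8, 55 = compiled for Java 11.
object BytecodeVersionCheck {
  def main(args: Array[String]): Unit = {
    val jar = new JarFile(args(0)) // e.g. hadoop-common-2.7.4.0-mapr-710.jar
    val majorVersions = jar.entries.asScala
      .filter(_.getName.endsWith(".class"))
      .map { entry =>
        val in = new DataInputStream(jar.getInputStream(entry))
        try {
          in.readInt()           // magic number 0xCAFEBABE
          in.readUnsignedShort() // minor_version
          in.readUnsignedShort() // major_version
        } finally in.close()
      }
      .toSet
    println(s"Class-file major versions found: $majorVersions")
  }
}
```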

jamesrgrinter
Occasional Contributor

Re: MEP 7.1.0 and Spark 2.4.7 versus JDK 11

To answer a bit of my own question: I can see that the 'MapR' version of the Spark code has been patched to accommodate Java 11.

But are there any caveats to that support that I might need to be aware of?

Harshkohli
HPE Pro

Re: MEP 7.1.0 and Spark 2.4.7 versus JDK 11

Hello,

We have not come across any such issues or caveats so far. If you do run into any, feel free to reach out to us.

Thanks.

I work for HPE
Harshkohli
HPE Pro

Re: MEP 7.1.0 and Spark 2.4.7 versus JDK 11

Kindly let me know if you are OK with the solution provided.

Regards.

I work for HPE
jamesrgrinter
Occasional Contributor

Re: MEP 7.1.0 and Spark 2.4.7 versus JDK 11

Well, one caveat has just shown up: we have been looking at whether we should migrate our Spark application code from Spark Streaming and RDDs to Structured Streaming and DataFrames. The question has arisen because all the optimisation work within the Spark project is directed at DataFrames and the newer APIs, and the Spark project now considers Spark Streaming 'legacy'.
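
For context, the migration we're weighing up has roughly this shape. The sketch below is the standard Structured Streaming word count on Spark 2.4; the socket source and console sink are placeholders for illustration, not our actual application:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: the Structured Streaming (DataFrame/Dataset) equivalent of a
// classic DStream word count.
object StructuredWordCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("structured-wordcount-sketch")
      .getOrCreate()
    import spark.implicits._

    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    val counts = lines.as[String]
      .flatMap(_.split("\\s+"))   // split lines into words
      .groupBy("value")           // default column name for a Dataset[String]
      .count()

    counts.writeStream
      .outputMode("complete")     // aggregations need complete/update mode
      .format("console")
      .start()
      .awaitTermination()
  }
}
```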

One of our libraries uses GraphX, so we would (probably) want to update it to use DataFrames and the newer GraphFrames API, to track the Spark project's direction of development. But the (MapR-specific) combination of Spark 2.4 and Scala 2.12 is considered experimental by the GraphFrames project, which puts a maintenance burden on us to build and maintain our own version of GraphFrames.
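
If we did end up pinning it ourselves, the dependency wiring would look something like this build.sbt sketch. The GraphFrames coordinates shown are our best guess at the experimental Spark 2.4 / Scala 2.12 build published on the Spark Packages repository; the exact version string would need checking:

```scala
// build.sbt sketch: Spark 2.4 on Scala 2.12 with the experimental
// GraphFrames build. Version strings are illustrative.
scalaVersion := "2.12.12"

resolvers += "Spark Packages" at "https://repos.spark-packages.org/"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"    % "2.4.7" % Provided,
  "org.apache.spark" %% "spark-graphx" % "2.4.7" % Provided,
  "graphframes"      %  "graphframes"  % "0.8.1-spark2.4-s_2.12"
)
```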

tdunning
HPE Pro

Re: MEP 7.1.0 and Spark 2.4.7 versus JDK 11

Can you clarify what you are actually asking?

Are you asking if you can move to Java 11? Whether you can move to a more current Spark? Whether you can avoid Java 11?

I really can't tell. There are good answers for all of the above.

Also, which version of data fabric is your cluster running? Which version of MEP?

I work for HPE
jamesrgrinter
Occasional Contributor

Re: MEP 7.1.0 and Spark 2.4.7 versus JDK 11

I'm really trying to understand what direction we should be taking with our Spark streaming applications if we want to continue running them on MapR (we're finally making the shift from 5.2 to 6.2) while taking advantage of the features and performance improvements that newer versions of Spark can offer.

We run the Spark applications outside of, but connecting to, the MapR cluster nodes (Spark on Kubernetes, actually), with container images containing the MapR-specific Spark JARs (in this case we had been working with version 2.4.7.100-mapr-710).

This thread had already established that for MapR 6.2 we'd be looking at MEP 7.1.0, which entails:

- Java 11
- Scala 2.12
- Spark 2.4.7.100-mapr-710

but that is not a standard combination as far as other Spark-related libraries are concerned (such as the aforementioned GraphFrames).

Of course, I was asking this back in September, and now, checking back at the EEP Release Notes, I see that MEP 8.0.0 (October 2021) brought Spark 3.1.2.0-eep-800. So perhaps we should skip ahead to that version and see what implications it has for us.
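
When we do trial MEP 8.0.0, a first step will be a sanity probe along these lines, run inside the new container image, to confirm that the Spark build, JVM, and Scala versions we actually get at runtime match the EEP we expect:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: confirm the runtime Spark/Java/Scala combination inside the image.
object VersionProbe {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("version-probe").getOrCreate()
    println(s"Spark version: ${spark.version}") // expecting 3.1.2.0-eep-800
    println(s"Java version:  ${System.getProperty("java.version")}")
    println(s"Scala version: ${scala.util.Properties.versionString}")
    spark.stop()
  }
}
```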

tdunning
HPE Pro

Re: MEP 7.1.0 and Spark 2.4.7 versus JDK 11

My apologies for not realizing that this was the tail of a longer thread.

To your point, yes, skipping forward (if practical given your code base) is the right choice.

I work for HPE