HPE Ezmeral: Uncut
Ellen_Friedman

Julia for large-scale systems: Cost vs benefits of trying a new programming language

Ever hear someone say, “If it ain’t broke, don’t fix it”? It’s an American phrase, but the sentiment is universal: don’t tamper with something that already works just to try something new.

That’s fairly good advice, as long as you carefully consider what counts as broken. Just getting by may not be good enough. A process may be working, but is it working well? Will it continue to work well as you go forward?

You may need to try a new technology, tool, or approach to be more efficient and stay competitive. While there is a cost in time and effort to try something new, the benefits may be worth it. If you automatically avoid any change, you could miss out on easier, more efficient results in the long run. The question is, are the benefits worth the cost?

Weighing costs and rewards of trying something new

Take the example of Julia, a relatively new, high-level, general-purpose, open source programming language. Whether you are a developer or data scientist, or you manage resources and teams of people who would use it, you may be wondering whether this new language is worth trying. It’s not just a question of whether the language is useful. It’s also a question of whether it’s worth the effort to change if you are already experienced with other languages.

To answer that, I turned to Ted Dunning, CTO for HPE Ezmeral Data Fabric and an experienced data scientist and developer, who has recently been trying Julia. I asked him why he would try a new language and how Julia compares with the languages he already likes.

His first response offers some basic advice: “Early exploration of new approaches and new technologies makes it easier to adapt when changes are forced on you.” Ted pointed out that in modern data systems, the question of whether it’s worth trying a new language -- or any new technology -- comes down to weighing benefits against costs. 

Ideally, a new language should be:

  • Fairly easy to learn and adopt

The new language also should be easy to use going forward.

  • Efficiently scalable

The new language should be able to scale up or down seamlessly.

  • Highly performant

The new language should have excellent performance at small or medium scale and in distributed systems at large to very large scale. Furthermore, it should deliver that performance without requiring specialized hardware or extreme human effort.

  • Flexible

The new language should be easy to integrate with other tools and systems.

So how does Julia measure up against these four basic criteria? It clears these first hurdles with room to spare. Ted found Julia to be developer-friendly and an easy transition from his Python background. In a quick test, Julia made it simple for him to produce a highly performant animation. When he tried Julia in his work on a large-scale open source data project, he found it highly scalable and easy to integrate with Python data flows.

The Julia language is an attractive option based on these key criteria, but does Julia offer sufficient advantages to be worth making a change from other popular languages with which you may already have experience? 

Julia vs Java or Python: Impressions from an experienced data scientist

To decide whether a new language is worth trying, you’re not just comparing its capabilities to those of other languages as if all the options were new to you. If you are an experienced developer, data engineer, or data scientist, you’re really assessing how much advantage, and what kinds of advantage, the new language offers over the languages you already know and use. You must weigh that difference against the effort of learning a new language and possibly entirely new approaches to problems.

I asked Ted for his initial impressions of Julia compared to two widely popular languages he often uses, Python and Java. Each of these languages meets some, but not all, of the criteria above, while Julia gets good marks on all of them. Ted says, “Julia offers the flexibility of a dynamic language like Python but with the excellent performance you expect from a more static language such as Java.” Furthermore, Ted found Julia to be “extraordinarily expressive, even more expressive than Python.”

Another great advantage of Julia is that it’s easily extensible. Surprisingly, you can extend software packages without having to change their internals: Julia’s multiple dispatch lets you add new methods for existing functions and types from outside the package that defined them. The important implication is that you can adapt packages other people wrote or mash up different packages to solve new problems. (It also means others may be able to use your programs in ways you didn’t expect.)

As a matter of personal preference, Ted also likes the way Julia lets you write mathematical symbols, such as Greek letters, directly in code. If you’re used to encoding mathematical expressions, it’s convenient and efficient to be able to read code that uses familiar mathematical notation. Such code may look like Greek to some people, but it helps those doing numerical analysis, including physicists and other scientists.

For managers, having teams use Julia is an advantage because it gives them the expressivity, convenience, and developer productivity of a high-level language with the high performance more often seen in low-level languages. This combination can pay off in faster development as well as more efficient execution. In addition, Julia code is portable, so teams can develop on laptops but deliver on high-performance hardware, improving team performance.

Trying out the Julia language does not mean giving up on other programming languages. Python and Java are both excellent choices and widely used for good reason. Most experienced data scientists, developers, and data engineers make use of multiple languages and tools that are well suited for different aspects of different projects. The lesson is to keep abreast of what tools you may want to add to your tool chest (and to have the flexibility needed to do so).

The power of community for open projects

A good computer language is not a stand-alone effort. A big part of Python’s popularity and strength is the strong, open Python community. Similarly, the Julia language and package ecosystem have a welcoming, growing open community with thousands of contributors. Julia makes it easy for people to contribute new packages by integrating git and GitHub directly into the package management system.

Flexibility in tool choice depends on flexible data access

Julia is an attractive option, but to have the flexibility to use new tools such as Julia, your data infrastructure must not get in the way. This matters especially if you want to take full advantage of Julia’s strong performance at scale. For instance, with Julia it is trivial to write a program that runs on 100 machines. But if your data infrastructure cannot provide the necessary performance, or if it isn't available on those machines, Julia won't be able to show its real potential.

Beyond scale, your data infrastructure should be accessible through industry-standard data APIs. Julia relies heavily on standard file APIs for data access, so it needs data infrastructure that does not require non-standard access methods.

A modern system that functions across multiple data centers and from edge to cloud needs highly performant data infrastructure that provides a unifying data layer with built-in data motion and broad flexibility in data access. The data infrastructure should also make it possible to scale up or down, or to add new applications and technologies, without re-architecting the system. HPE Ezmeral Data Fabric File and Object Store is hardware-agnostic software that provides the flexibility to use whatever tools you choose, on the data you need, wherever you need it.

Find out more

To explore more about the Julia language, visit the Julia language website or follow the community on Twitter at @JuliaLanguage.

For more information on scalable, flexible data infrastructure, visit HPE Ezmeral Data Fabric.

To consider better ways to use data for data science, read the blog “Avoiding pitfalls: Tips for better data science”.


Hewlett Packard Enterprise

www.hpe.com/containerplatform

www.hpe.com/mlops

www.hpe.com/datafabric

About the Author

Ellen_Friedman

Ellen Friedman is a principal technologist at HPE focused on large-scale data analytics and machine learning. Before her current role at HPE, Ellen worked at MapR Technologies for seven years, during which time she was a committer for the Apache Drill and Apache Mahout open source projects. She is a co-author of multiple books published by O’Reilly Media, including AI & Analytics in Production, Machine Learning Logistics, and the Practical Machine Learning series.