Tech Insights
AndreaFabrizi1

How to accelerate the performance of your SAS® environment with WEKA file systems

Learn why WEKA filesystems and HPE solutions for WEKA make a good fit for large production-scale SAS® 9.4 systems.

SAS® is a leader in advanced predictive analytics, with industry-specific product sets for healthcare, banking, government, retail, telecommunications, aerospace, marketing optimization, and high-performance computing.

However, setting up file systems that deliver adequate performance for large volumes of data can be very challenging. In fact, the most prevalent and noticeable cause of performance issues with SAS® software is insufficient I/O bandwidth in the infrastructure. As a rule of thumb, SAS® recommends a minimum storage throughput of 100-150 MB/sec per application core.
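To make the rule of thumb concrete, here is a small Python sketch that turns a SAS® core count into the aggregate storage throughput it implies. The function name is hypothetical, and the sketch assumes decimal units (1 GB = 1000 MB), as storage vendors usually quote them.

```python
# Per-core storage throughput guideline quoted above (MB/sec per core).
LOW_MB_PER_CORE = 100
HIGH_MB_PER_CORE = 150


def required_throughput_gb(cores: int) -> tuple[float, float]:
    """Return (low, high) aggregate throughput in GB/sec for `cores` SAS cores.

    Hypothetical helper; uses decimal units (1 GB = 1000 MB).
    """
    if cores < 0:
        raise ValueError("cores must be non-negative")
    return (cores * LOW_MB_PER_CORE / 1000, cores * HIGH_MB_PER_CORE / 1000)


if __name__ == "__main__":
    for cores in (32, 128, 260):
        low, high = required_throughput_gb(cores)
        print(f"{cores:>4} cores -> {low:.1f}-{high:.1f} GB/sec")
```

Even a few hundred cores therefore call for tens of GB/sec of sustained storage throughput, which is the scale the tests below set out to measure.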

WEKA filesystems and HPE solutions for WEKA address this challenge. The WEKA filesystem running on HPE ProLiant DL325 Gen10 Plus servers can provide enough sustained throughput to support SAS® 9.4 systems with hundreds of cores.

WEKA filesystem for SAS® 9.4 test results

HPE and WEKA engineers recently tested the WEKA filesystem with SAS® 9.4 to measure the sustained throughput it can deliver to SAS® workloads.

The tests used the SAS® Mixed Analytics workload suite scenario.[1] It consists of typical analytic jobs designed to replicate light to heavy workloads and has the following characteristics:

  • 50% CPU-intensive and 50% I/O-intensive jobs
  • SAS® steps and procedures used include the DATA step, PROC RISK, PROC LOGISTIC, PROC GLM (general linear model), PROC REG, PROC SQL, PROC MEANS, PROC SUMMARY, PROC FREQ, and PROC SORT
  • SAS® program input sizes of up to 50GB per job
  • Input data types: text, SAS® data sets, and SAS® transport files
  • Memory use of up to 1GB per job
  • Varied job runtimes (short- and long-running tasks)

The test was performed on the following hardware architecture:

[Table: HPE WEKA test hardware configuration]

The figure below shows WEKA system throughput versus the number of SAS® jobs:

[Figure: WEKA I/O performance versus number of SAS® jobs]

Test description

The SAS® environment was queried every 5 seconds during the tests. The results were then aggregated over the best 10 intervals (50 seconds), 24 intervals (2 minutes), 120 intervals (10 minutes), and 240 intervals (20 minutes). This was done because I/O demand starts high and tails off steadily over the course of each test. In fact, during a number of intervals in the later stages of the tests, only CPU-intensive jobs remained active and no I/O was being performed.
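The article does not spell out exactly how the "best N intervals" were selected. The Python sketch below assumes one plausible reading, the highest average over N consecutive 5-second samples (a sliding window), and applies it to a synthetic trace. The function name and the trace are hypothetical illustrations, not the test team's tooling or data.

```python
from typing import Sequence

SAMPLE_SECONDS = 5  # the SAS environment was sampled every 5 seconds


def best_window_average(samples: Sequence[float], n: int) -> float:
    """Return the highest mean over any n consecutive samples.

    ASSUMPTION: "best n intervals" is read as the best contiguous window;
    the original tests may have aggregated differently.
    """
    if not 1 <= n <= len(samples):
        raise ValueError("n must be between 1 and len(samples)")
    window = sum(samples[:n])          # sum of the first window
    best = window
    for i in range(n, len(samples)):   # slide the window one sample at a time
        window += samples[i] - samples[i - n]
        best = max(best, window)
    return best / n


if __name__ == "__main__":
    # Hypothetical throughput trace (GB/sec) that starts high and tails off
    # to zero, mimicking the shape described above; NOT measured values.
    trace = [max(0.0, 40.0 - 0.05 * t) for t in range(960)]
    for n in (10, 24, 120, 240):
        minutes = n * SAMPLE_SECONDS / 60
        print(f"best {n:>3} intervals ({minutes:.1f} min): "
              f"{best_window_average(trace, n):.2f} GB/sec")
```

Because demand decays over time, longer windows necessarily report lower averages, which is why the results are given for several window lengths rather than one.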

Graph explanation

The graph legend shows the number of compute servers (Clients), each of which executed one 60-user run (288 SAS® jobs). 1 Client means that two 30-user runs ran simultaneously on a single server. 2 Clients means that four 30-user runs ran simultaneously, two on each server. 8 Clients means two 30-user runs on each of 8 servers. Because each Client ran 288 SAS® jobs, 1 Client ran a total of 288 SAS® jobs, 2 Clients ran 576, 3 Clients ran 864, and so on. The largest run, with 8 Clients, executed 2304 SAS® jobs across all servers, with all of their I/O demand satisfied by the WEKA storage system.
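The run sizes in the legend follow directly from the per-Client figures. This Python sketch reproduces that arithmetic; the function and constant names are illustrative only.

```python
JOBS_PER_CLIENT = 288              # each Client executed one 60-user run = 288 SAS jobs
THIRTY_USER_RUNS_PER_CLIENT = 2    # a 60-user run was two simultaneous 30-user runs


def run_totals(clients: int) -> tuple[int, int, int]:
    """Return (30-user runs, simulated users, SAS jobs) for `clients` servers."""
    runs = clients * THIRTY_USER_RUNS_PER_CLIENT
    return runs, runs * 30, clients * JOBS_PER_CLIENT


if __name__ == "__main__":
    for clients in (1, 2, 3, 8):
        runs, users, jobs = run_totals(clients)
        print(f"{clients} Client(s): {runs} x 30-user runs, {users} users, {jobs} SAS jobs")
```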

You'll find a detailed description of this test in the HPE technical white paper: How to accelerate the performance of your SAS environment with HPE Solutions for WEKA

What these tests show

As the testing demonstrates, an entry-level, seven-node WEKA cluster provides a sustained throughput of up to 40 GB/sec. At the SAS® guideline of about 150 MB/sec per core, such a cluster can support up to about 260 cores of a SAS® 9.4 system. What's more, thanks to the WEKA distributed architecture and single namespace, additional performance can be obtained simply by progressively adding nodes.
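The sizing reasoning can be sketched in a few lines of Python. Dividing 40 GB/sec by 150 MB/sec per core gives roughly 266 cores, in line with the figure of about 260 quoted above. The node estimate is a rough sketch that assumes throughput scales linearly from the seven-node measurement (about 5.7 GB/sec per node); that linearity is an assumption for illustration, not a measured result, and real scaling should be validated on the target configuration. The function names are hypothetical.

```python
# Figures quoted above; decimal units (1 GB = 1000 MB).
MEASURED_CLUSTER_MB_S = 40_000   # up to 40 GB/sec from the seven-node cluster
MEASURED_NODES = 7
MB_PER_CORE = 150                # upper end of the SAS per-core guideline


def cores_supported(cluster_mb_s: int, mb_per_core: int = MB_PER_CORE) -> int:
    """Whole SAS cores that a given aggregate throughput can feed."""
    return cluster_mb_s // mb_per_core


def nodes_needed(cores: int, mb_per_core: int = MB_PER_CORE) -> int:
    """Rough node count for `cores` SAS cores.

    ASSUMPTION: throughput scales linearly with node count from the
    seven-node measurement. This is an illustration, not a sizing rule.
    """
    demand_mb_s = cores * mb_per_core
    # Integer ceiling of demand / (MEASURED_CLUSTER_MB_S / MEASURED_NODES)
    return (demand_mb_s * MEASURED_NODES + MEASURED_CLUSTER_MB_S - 1) // MEASURED_CLUSTER_MB_S


if __name__ == "__main__":
    print("cores supported by 40 GB/sec:", cores_supported(MEASURED_CLUSTER_MB_S))
    for cores in (260, 520, 1040):
        print(f"{cores} cores -> about {nodes_needed(cores)} nodes (linear-scaling estimate)")
```

Integer arithmetic is used for the ceiling so that borderline cases are not rounded the wrong way by floating-point error.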

HPE and WEKA engineers jointly developed and validated an optimized HPE Solution for WEKA based on HPE ProLiant DL360 Gen10 Plus and HPE ProLiant DL325 Gen10 Plus servers. The task-optimized HPE ProLiant DL360 and ProLiant DL325 server families provide an efficient and cost-effective hardware platform to combine with WEKAFS to support the most demanding AI and analytics workloads. HPE ProLiant DL servers support all the hardware required to maximize WEKA performance over the fabric, including NVMe drives and fast InfiniBand or Ethernet network adapters.

When your enterprise needs to accommodate large, I/O-intensive SAS® workloads, HPE solutions for WEKA on the HPE ProLiant DL server family provide a storage solution capable of delivering high throughput, scalability, easy-to-use high availability, and resiliency.

Why HPE and WEKA?

The key benefits of combining HPE solutions for WEKA software with HPE ProLiant DL servers are:

  • HPE and WEKA deliver a simple, high-performance, consumer-grade user experience for both compute and storage in large SAS® environments.
  • The combined solution is easy to operate and tune.
  • HPE is a one-stop shop for the whole hardware and software solution.
  • HPE has a dedicated solutions team focusing on SAS® analytics, unlike several competitors.
  • HPE regularly produces reference architectures and other collateral.

Learn more

Get more information about HPE Solutions for WEKA.


[1] The SAS Mixed Analytics workload suite scenario uses real-world data volumes and structures of a typical SAS customer. The scenario simulates the types of jobs received from various SAS clients such as Display Manager, batch, SAS Data Integration Studio, SAS Enterprise Miner™, SAS Add-In for Microsoft® Office, SAS Enterprise Guide®, and SAS Studio.


Andrea Fabrizi
Hewlett Packard Enterprise

twitter.com/HPE_Storage
linkedin.com/showcase/hpestorage/
hpe.com/storage

twitter.com/HPE_AI
linkedin.com/showcase/hpe-ai/
hpe.com/us/en/solutions/artificial-intelligence.html

About the Author


Andrea Fabrizi is the Strategic Portfolio Manager for Big Data and Analytics at HPE.