Server performance monitoring made easy with HPE iLO 5

Scott_Faasse

Did you know that server performance tuning is a data-driven exercise? With HPE iLO 5 you can easily monitor performance data to optimize server resources in real time.

Do you feel you’re not getting the most performance out of your servers? Do you struggle with how to leverage tools that report performance data and, maybe more importantly, how to put that data to good use? In this blog, we are going to take a peek under the server’s hood to see what powers the latest data-driven performance features that are available on HPE Gen10 servers.

Recently, I wrote about a new HPE Integrated Lights-Out (iLO 5) feature that takes workload performance tuning to a whole new level. If you haven’t had a chance to read my blog introducing Workload Performance Advisor, I encourage you to give it a read to learn how HPE is expanding beyond our previous performance management technologies, which include Workload Matching and Jitter Smoothing. At the center of all this goodness is a rich set of performance-related sensors, accessible to platform-level firmware. Performance tuning is a data-driven exercise, and I will show you how HPE is enabling you to access and leverage the same performance data for integration into your own performance-minded management activities and solutions.

Before diving into the bells and whistles of iLO 5 and its foray into performance monitoring and reporting, I want to share with you a precept I have developed on the topic over the past decade or so. First, let’s clear the air: performance monitoring is not new. Operating systems have shipped performance monitoring capabilities for years. So why is performance monitoring at the server management level relevant? The general rule I have adopted over the last several years is that performance data is only useful if there is something actionable you can do with it, or if it drives an important function in a bigger solution. Information for information’s sake isn’t particularly useful or valuable.

Take server power management, for example. iLO displays server power, but have you noticed that if you don’t like how much power the server is consuming you can enable power capping or change the power efficiency mode of your server? It makes sense to report power at the server level because you can manage it at the server level. So, with the advent of performance management features being embedded in your server firmware now and going forward, performance monitoring brings about a whole new degree of relevance.

Performance Monitoring with HPE Integrated Lights-Out (iLO 5)

As promised, let’s take a look under the hood at two of our recent server performance tuning features: workload matching and jitter smoothing. Both of these features analyze system performance in real time. That is, they are driven by performance metrics that are observed while the system is up and running your workload and performing actual work! It doesn’t matter what application or operating system is running; the platform monitors performance-related telemetry with the goal of helping you increase overall performance. The monitored metrics are processor utilization, processor average frequency, memory bus utilization, processor jitter, I/O bus utilization, processor power, and processor interconnect utilization.

These metrics are measured quite frequently by the platform, roughly once per second in most cases, which allows the platform to get an accurate and cohesive view of performance-related activity.

To keep the amount of stored data manageable, iLO averages these metrics over a 20 second period. The 20 second samples are then used to build 10 minute, 1 hour, 24 hour, and 1 week reports of performance-related resource activity. These reports are viewable as graphs in the web interface and are available as complete datasets through the RESTful API scripting interface.
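The averaging step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not iLO's firmware logic: the function names and the sample readings are hypothetical, and the helper that counts samples per report window is plain arithmetic, since how iLO stores the longer reports is not described here.

```python
from statistics import mean

SAMPLE_PERIOD_S = 20  # averaging period stated in the text


def average_into_samples(readings, period=SAMPLE_PERIOD_S):
    """Collapse roughly once-per-second readings into one mean per full period.

    Trailing readings that do not fill a whole period are dropped.
    """
    full = len(readings) - len(readings) % period
    return [mean(readings[i:i + period]) for i in range(0, full, period)]


def samples_in_window(window_seconds, period=SAMPLE_PERIOD_S):
    """How many averaged samples would span a report window (arithmetic only)."""
    return window_seconds // period


# Hypothetical data: one minute of processor utilization readings, in percent.
readings = [10.0] * 20 + [50.0] * 20 + [90.0] * 20
samples = average_into_samples(readings)
print(samples)
print(samples_in_window(10 * 60))
```

One minute of per-second readings collapses to three values, which is why a 10 minute report needs only 30 data points instead of 600.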

Flexibility in selecting time intervals for performance monitoring

You might be wondering why HPE chose to offer 10 minute, 1 hour, 24 hour, and 1 week time intervals for displaying performance data. Well, there is a good reason for these intervals, and they are based on some generic use cases summarized below:

  • 10 minute data. This interval is great for near instant feedback on resource utilization trends for an application or simple benchmarks. If you are running short duration benchmarks, the 10 minute data gives you a great snapshot of how performance is trending – STREAM and LINPACK are great examples.
  • 1 hour data. A lot of applications and benchmarks don’t complete in less than 10 minutes. I call these “get stuff done” workloads, meaning I can kick them off and go and get other stuff done while I await the results. And because I don’t want to have to stare at performance monitoring data while it runs, the 1 hour data is often perfect for seeing how the system was behaving during those runs.
  • 24 hour data. Some workloads take an entire workday to finish, and some benchmarks do as well. The 24 hour data is well suited for these workloads, without having to set up scripting tools to gather all the 1 hour data in chunks. Also, servers that run applications meant to service what I call “human activity” have 24 hour trends where peak activity can reveal unexpected bottlenecks that are not visible in live performance data. Being able to look at the last 24 hours is very helpful when trying to determine if you are running into performance QoS issues.
  • 1 week data. Quite simply, when a server is deployed to do real work, it doesn’t just run for a day and stop. It is important to be able to see the history of a server’s performance over more than a single day, especially when you are trying to find a bottleneck caused by an application that runs a batch job on Wednesdays only!

Leveraging performance monitoring within your deployed ecosystem

These sensors drive performance management functionality such as Workload Performance Advisor and Jitter Smoothing. But you can leverage the sensors too. Besides being able to look at the graphs or download the data via the RESTful API, you can configure each performance sensor to generate Simple Network Management Protocol (SNMP) alerts after it crosses user-defined thresholds and dwell times. This is particularly valuable if you want to, for instance, be alerted if your workload is suddenly maxing out processor utilization or memory bandwidth utilization for an extended period of time. For your solution, this may indicate that you are resource-constrained and need to take action, such as adding more servers to a grid in order to meet certain quality of service (QoS) contracts. When we talk about the health of a server, we look at whether components are online and functioning; with performance monitoring, you can track server health from a performance point of view.
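To make the threshold and dwell time idea concrete, here is a minimal Python sketch. It is an illustration under assumptions, not iLO's implementation: the class name, the choice to express dwell time as a count of consecutive samples, the strict "greater than" comparison, and the re-arming behavior are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThresholdAlert:
    """Fires once when a sensor stays above a threshold for a dwell period."""

    threshold: float      # e.g. percent utilization
    dwell_samples: int    # consecutive samples above threshold before alerting
    _streak: int = 0
    _active: bool = False

    def update(self, value: float) -> bool:
        """Feed one sample; return True only on the sample where the alert fires."""
        if value > self.threshold:
            self._streak += 1
        else:
            # Dropping back to or below the threshold resets and re-arms the alert.
            self._streak = 0
            self._active = False
        if self._streak >= self.dwell_samples and not self._active:
            self._active = True
            return True
        return False


# Hypothetical 20 second samples; a dwell of 3 samples is about 60 seconds.
alert = ThresholdAlert(threshold=90.0, dwell_samples=3)
fired = [alert.update(v) for v in [95, 96, 50, 95, 96, 97, 98, 10]]
print(fired)
```

The short spike at the start does not trigger anything because it ends before the dwell period elapses. Only the sustained run does, which is the point of pairing a threshold with a dwell time.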

There are several ways to leverage performance monitoring in iLO 5 today, with the latest 1.40a firmware update on Gen10 servers with Intel Xeon Scalable processors:

  • Log into your server’s iLO 5 web page and view the Workload Performance Advisor tab. Check out the tuning recommendations and workload characteristics report that iLO makes, based on these performance sensors.
  • Log into your server’s iLO 5 web page and view the Performance Monitoring tab. Check the status of each of the available performance sensors over the past 10 minutes, 1 hour, 24 hours, and 1 week.
  • Program custom threshold alerts for sensors that are most critical to your business.
  • Check out the processor jitter sensor, and if you are seeing high rates of processor jitter, consider enabling Jitter Smoothing in the Performance Settings tab.
  • Want to do all of the above using scripting tools? Check out the latest RESTful API properties that provide the performance monitoring and workload performance advisor data through this powerful, industry-standard interface.
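As a rough idea of what scripting against this data could look like, the Python sketch below post-processes a report after it has been downloaded. It assumes the data is shaped like a DMTF Redfish MetricReport, with a `MetricValues` array of `MetricId`, `Timestamp`, and `MetricValue` entries. The exact iLO resource paths and metric names are not given in this post, so the sample payload, the metric IDs, and the function names are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical payload in the general shape of a Redfish MetricReport.
SAMPLE_REPORT = """
{
  "Id": "ExampleReport",
  "MetricValues": [
    {"MetricId": "CPUUtil", "Timestamp": "2019-01-01T00:00:00Z", "MetricValue": "41"},
    {"MetricId": "CPUUtil", "Timestamp": "2019-01-01T00:00:20Z", "MetricValue": "97"},
    {"MetricId": "MemoryBusUtil", "Timestamp": "2019-01-01T00:00:00Z", "MetricValue": "12"},
    {"MetricId": "MemoryBusUtil", "Timestamp": "2019-01-01T00:00:20Z", "MetricValue": "15"}
  ]
}
"""


def series_by_metric(report):
    """Group a report's values into {metric id: [(timestamp, value), ...]}."""
    series = defaultdict(list)
    for entry in report.get("MetricValues", []):
        # Redfish reports carry metric values as strings, so convert them.
        point = (entry["Timestamp"], float(entry["MetricValue"]))
        series[entry["MetricId"]].append(point)
    return dict(series)


def peak(points):
    """Return the (timestamp, value) pair with the highest value."""
    return max(points, key=lambda p: p[1])


report = json.loads(SAMPLE_REPORT)
series = series_by_metric(report)
for metric_id, points in series.items():
    print(metric_id, peak(points))
```

In a real script, the JSON would come from an authenticated HTTPS request to the server's iLO rather than from an inline string; the grouping and peak-finding steps would stay the same.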

You can learn more about these features, including performance monitoring with iLO 5, in this whitepaper. If you don’t have an iLO 5 advanced license and want to give these features a try, check out the free trial program.



Scott Faasse
Distinguished Technologist, Server Firmware and Performance
Hewlett Packard Enterprise

twitter.com/HPE_Servers
linkedin.com/in/scott-faasse-43870941/
hpe.com/servers

About the Author

Scott Faasse is a Distinguished Technologist and HPE’s expert on Platform and Processor Power Management and Performance. Since joining Compaq/HP/HPE in 2001, Scott has served as the lead platform firmware developer for six generations of the ProLiant DL380 Server, architected and developed HPE’s Power Regulator feature, led HPE in several industry standards and partner collaboration efforts, and is one of the principal technologists behind HPE’s Intelligent System Tuning. Scott is also an avid outdoorsman. He enjoys hiking long distances with a really heavy backpack (rucking), camping with family, fly fishing in urban settings, and traditional archery.
