
How Mentor Graphics solved complex OS provisioning failure rate with HP Server Automation

Posted by NimishShelat on 08-05-2015 02:21 PM; last edited on 01-01-2017 09:35 PM by kimlock

By James Bagley, Grid Computing / SW Developer, Mentor Graphics

 

Editor’s note: This article is part of an ongoing series of guest posts by HP Software customers about Automation and Cloud Management use cases.

 

Mentor Graphics works in Electronic Design Automation (EDA), which covers designing, testing and verifying computer processors, printed circuit boards, automotive electrical systems such as infotainment systems, and even the wire routing for a modern jumbo jet.

 

One of Mentor's products allows designers to virtualize and simulate a computer processor, send it commands and examine the output. Considering that the latest CPU chip designs have about 5 billion transistors, the cost of tooling up to make one is pretty significant; you want to be reasonably sure it's going to work first.

 

Not surprisingly, these simulations and verifications are very CPU- and memory-intensive. Other types of regression testing take only a small amount of resources per test, but the tests number in the hundreds of thousands.

 

That’s why Mentor (and most of our customers) use some variant of grid computing to manage these engineering processes and get the right resources to the right process.

 

So. Many. Operating Systems.

But the broad range of large and small tasks also requires a broad range of small and large computers in our data center, and compatibility testing dictates that we also keep a fairly broad range of operating systems available. Finally, to gain the broadest support for compiled binaries, you need to compile on the oldest available OS. As a result, we are often bumping up against both the oldest OS we can still get support for and the latest bleeding-edge OS available from vendors.

 

Our computing environment has 50 unique hardware models in one data center, not including Sun or IBM (non-Intel) systems, with 31 unique operating systems and 35 different configurations. Altogether, that gives us more than 50,000 possible combinations.
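The arithmetic behind that figure is a simple product. The short sketch below assumes every hardware model, operating system and configuration can be paired freely, which is the worst case rather than a claim about which pairs are actually valid.

```python
# Counts taken from the paragraph above.
hardware_models = 50
operating_systems = 31
configurations = 35

# Worst case: every model can be paired with every OS and every configuration.
total_combinations = hardware_models * operating_systems * configurations
print(total_combinations)  # 54250
```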

 

Unreliable OS Provisioning

The challenge we faced was in the ability of automated OS provisioning tools to successfully complete their tasks. Initially, our efforts in automating OS provisioning experienced unacceptable failure rates due to the inability of the tool or process to establish compatibility between hardware and software. When a critical-path deployment for a high-profile division was blocked by issues with the system, my boss came to me and said, “Just make it work.” (That’s a very dangerous thing to say to a programmer like myself.)

 

So what caused the high failure rate?

 

I found technicians were selecting operating systems in HP Server Automation (SA) that were not compatible with the hardware being provisioned. So in some ways, the failure rate was artificially high since we would see 3-5 failures for one host as they hunted around trying one OS after another until one worked.

 

My programmer background told me that this was a pretty inefficient way of deriving compatibility. Surely I could come up with some way to relate operating systems to hardware models and avoid the error in the first place.

 

We worked up some new hardware compatibility logic in the form of a dynamic dropdown menu system that would offer only the hardware that was compatible with the OS being provisioned. Figure 1 (below) shows a SQL schema of what we came up with. “mp” in the bottom right stands for "management port", and mptype would be HP, Dell, VM or IPMI, while “productName” is the server model name.

 

(Note the “gfs_boot_minutes” and “osbp_minutes” in the bottom left of the graphic.)

 


 

Fig. 1: Mentor Graphics SQL schema to relate operating systems to hardware.
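As a rough illustration of how a compatibility table can drive a filtered dropdown, here is a minimal sketch using Python's built-in sqlite3 module. It is not the schema from Figure 1: the `compat` table, its columns, and the model and OS names are all hypothetical stand-ins.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical compatibility table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE compat ("
        " product_name TEXT NOT NULL,"
        " os_name TEXT NOT NULL,"
        " UNIQUE (product_name, os_name))"
    )
    # Made-up pairs standing in for validated hardware/OS combinations.
    conn.executemany(
        "INSERT INTO compat (product_name, os_name) VALUES (?, ?)",
        [("model-a", "os-old"), ("model-b", "os-old"), ("model-b", "os-new")],
    )
    return conn

def compatible_hardware(conn, os_name):
    """Return the hardware models known to work with the given OS."""
    rows = conn.execute(
        "SELECT DISTINCT product_name FROM compat"
        " WHERE os_name = ? ORDER BY product_name",
        (os_name,),
    )
    return [row[0] for row in rows]

conn = build_demo_db()
print(compatible_hardware(conn, "os-old"))  # ['model-a', 'model-b']
print(compatible_hardware(conn, "os-new"))  # ['model-b']
```

A dropdown built from `compatible_hardware` can only offer pairs that are already in the table, which is what removes the trial-and-error step.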

 

This hardware compatibility logic helped us achieve a 94.35 percent resolution rate, with a failure rate of only 5.65 percent.

 

This is clearly a big improvement!

 

Populating an OS compatibility database

 

To develop this dynamic dropdown system, we created a database. But how do you populate or maintain an OS compatibility database? Nobody would really want the job of populating 50,000 possible combinations; that's like one full-time employee doing data entry for a year!

 

Instead, we populated the database using a validation process in Server Automation. Validation is a sort of regression test. Each successful installation is recorded in the database.
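The idea of letting successful installs populate the table can be sketched as follows. This is only an illustration under stated assumptions: the `validated` table and the `record_validation` helper are hypothetical and do not represent Server Automation's API or the schema in Figure 1.

```python
import sqlite3

def open_db():
    """Create an in-memory table of validated hardware/OS pairs (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE validated ("
        " product_name TEXT NOT NULL,"
        " os_name TEXT NOT NULL,"
        " PRIMARY KEY (product_name, os_name))"
    )
    return conn

def record_validation(conn, product_name, os_name, succeeded):
    """Store the pair only when the install succeeded; return True if stored."""
    if not succeeded:
        return False
    # INSERT OR IGNORE keeps repeated successful runs from duplicating a pair.
    conn.execute(
        "INSERT OR IGNORE INTO validated (product_name, os_name) VALUES (?, ?)",
        (product_name, os_name),
    )
    conn.commit()
    return True

conn = open_db()
record_validation(conn, "model-a", "os-old", succeeded=True)
record_validation(conn, "model-a", "os-new", succeeded=False)  # not stored
```

Because only successes are written, the table fills itself in as regression runs complete, with no manual data entry.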

 

The validation also recorded various time measurements. This turned out to be a side benefit that solved another problem: a technician would typically need to wait for a deployment to fail via timeout before trying again. Now we use the recorded times to create timeout values appropriate for whatever hardware and operating system combination is being deployed. This has helped reduce wait times in cases where the process was going to fail for other reasons.
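One way to turn recorded times into a per-pair timeout is sketched below. The post does not say how the timeout is actually derived, so the rule used here (slowest recorded run times a safety factor, with a fallback for unseen pairs) and every name and number in it are assumptions.

```python
def timeout_for(recorded_minutes, pair, safety_factor=1.5, default_minutes=120.0):
    """Return a timeout in minutes for a (hardware, OS) pair.

    recorded_minutes maps each pair to the durations of its past successful
    installs. The safety factor and default are illustrative assumptions.
    """
    samples = recorded_minutes.get(pair)
    if not samples:
        # No history for this pair yet: fall back to a generic timeout.
        return default_minutes
    # Allow some headroom over the slowest run seen so far.
    return max(samples) * safety_factor

# Hypothetical history of successful install times, in minutes.
history = {("model-a", "os-old"): [20.0, 24.0]}

print(timeout_for(history, ("model-a", "os-old")))  # 36.0
print(timeout_for(history, ("model-b", "os-new")))  # 120.0
```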

 

Time to Provision

 

A second side benefit is that the regression helped us measure the speed of provisioning for each pair. What we found was that a full regression test of every OS typically takes between 10 and 12 hours, at the end of which whoever is running the test gets an email that looks like Figure 2.

 


 

Fig. 2: Mentor Graphics OS Build Plan Regression Results

 

This simple report shows which operating systems work on which hardware and, in reverse, which hardware is compatible with which operating system. Now we can suggest to management the right hardware to purchase, based on the pairing data. We also manage inventory by capacity, so we can deliver different service levels in terms of time to provision and pick the right pair based on time-to-market needs.
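Both views of the report can be derived from the same list of validated pairs. The sketch below is illustrative only: the `pivot` helper and the sample pairs are hypothetical, and the real report in Figure 2 may be produced quite differently.

```python
from collections import defaultdict

def pivot(pairs):
    """Build both lookups from (hardware, OS) pairs that passed validation."""
    os_by_hardware = defaultdict(set)
    hardware_by_os = defaultdict(set)
    for hardware, os_name in pairs:
        os_by_hardware[hardware].add(os_name)
        hardware_by_os[os_name].add(hardware)
    # Sort the values so the report is stable from run to run.
    return (
        {hw: sorted(names) for hw, names in os_by_hardware.items()},
        {name: sorted(hws) for name, hws in hardware_by_os.items()},
    )

# Hypothetical pairs that passed the regression run.
passed = [("model-a", "os-old"), ("model-b", "os-old"), ("model-b", "os-new")]
by_hardware, by_os = pivot(passed)

print(by_hardware["model-b"])  # ['os-new', 'os-old']
print(by_os["os-old"])         # ['model-a', 'model-b']
```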

 

With a little ingenuity, we have been able to substantially improve OS provisioning reliability and make valuable use of timeout values and time-to-provision data, all using HP Server Automation.

Learn more

 

Read more about HP Server Automation

 

Read the other blogs in this series:

 

About the author: James Bagley is an accomplished Software Developer and IT Professional working for Mentor Graphics. With a background in both programming and system administration, James had the experience to develop automation for IT.

 

About the Author

NimishShelat

Nimish Shelat is currently focused on Datacenter Automation and IT Process Automation solutions. Shelat strives to help customers, in both traditional IT and cloud-based IT, transform to a service-centric model. The scope of these solutions spans server, network, database and middleware infrastructure. The solutions are optimized for tasks like provisioning, patching, compliance and remediation, and for processes like Self-healing Incidence Remediation, Rapid Service Fulfilment, Change Management and Disaster Recovery. Shelat has 23 years of experience in IT, 20 of them at HP, spanning the networking, printing, storage and enterprise software businesses. Prior to his current role as a Manager of Product Marketing and Technical Marketing, Shelat held positions as Software Sales Specialist, Product Manager, Business Strategist, Project Manager and Programmer Analyst. Shelat has a B.S. in Computer Science and earned his MBA from the University of California, Davis, with a focus on Marketing and Finance.
