How to really stress out a system.

 
SOLVED
Steven E. Protter
Exalted Contributor

The day before it was supposed to go to production, an rp5450 server running 11.11 with the March 03 patch set and lots of other patches, which serves two moderate Oracle instances, went wild.

Monday morning the red fault light went on, but pre-production testing went along merrily and nobody noticed. I thought the system was supposed to halt when the red fault light goes on.

Anyway, at 6:30 p.m. the same day, I ordered (I can do that, wow!) a remote reboot. The server failed to come up.

At 00:44 the next morning, it booted again.

At 7:40 a.m. the same morning it booted again, successfully without error.

No crash dump.

GSP reports a software panic, but nothing further.

Since then, just green lights, no attention lights.

I need some creative ways to stress this system out and put it over the edge so that whatever is wrong with it can be detected and corrected.

I have changed the crash dump configuration so that the dump goes where I want it, where there's enough space.

So, what do I do to stress this system out and make it break before it goes production?

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
25 REPLIES
Ross Zubritski
Trusted Contributor

Re: How to really stress out a system.

Steven,

Does this box by chance have the new Tachyon XL2 FC Adapter(s) installed?

Regards,

RZ
John Meissner
Esteemed Contributor

Re: How to really stress out a system.

How about creating a script that runs hundreds of searches simultaneously and/or exports hundreds of X applications simultaneously? How much stress would that cause?
All paths lead to destiny
Pete Randall
Outstanding Contributor

Re: How to really stress out a system.

Hi Steven,

I have to wonder what it was doing when it crashed. Shouldn't that be what you're trying to duplicate? Even if you manage to figure a way to hammer the system, it may not cause it to crash.

By the way, find is a wonderful way to put a load on a system. Just spawn off a dozen or two find commands running in the background and watch the fun.
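
A rough sketch of that idea, assuming a POSIX shell. The function name spawn_finds and its arguments are made up for illustration: it starts a given number of find traversals in the background and waits for all of them to finish.

```sh
# Sketch: run COUNT find(1) traversals in parallel over DIR.
# spawn_finds COUNT DIR   (hypothetical helper, not a system command)
spawn_finds() (
    count=$1
    dir=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        # -xdev keeps each traversal on one filesystem; output is discarded.
        find "$dir" -xdev -print >/dev/null 2>&1 &
        i=$((i + 1))
    done
    wait    # block until every background find has finished
    echo "completed $count find passes over $dir"
)
```

Something like "spawn_finds 24 /" would match the dozen or two suggested here. Repeated passes may be served partly from the buffer cache, so this probably loads the CPU and filesystem code more than the disks.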


Pete

Pete
Steven E. Protter
Exalted Contributor

Re: How to really stress out a system.

Answer to the fiber card question.

No, but it has a different one in it.

I've tried find commands but obviously not enough. I think I may need to stress disk I/O.

What the system was doing when the fault light came on: very low I/O Oracle stuff. Nothing special, no real stress.

Quite a mystery.

SEP
Bill Hassell
Honored Contributor

Re: How to really stress out a system.

The red light indicates a hardware problem, but not necessarily a catastrophic error that takes the machine down (e.g., a redundant power supply fails, or one of the many fans stops spinning). It might be a known bug in the GSP code that reports an error when there isn't one.

The tool you're looking for is EMS, the online diagnostics, which is installed by default on new systems. If EMS is not installed, load it from your latest SupportPlus CDROM, or download it from the ITRC at: http://www.software.hp.com/cgi-bin/swdepot_parser.cgi/cgi/displayProductInfo.pl?productNumber=B6191AAE

These tools will monitor errors reported by the CPU and send email to root on the local machine (did you look at root's email?). You can also look at the NVRAM log kept by the GSP: connect to the console, type CTRL-b followed by 2 CRs for the login/password (if they've never been set), and look at the error logs.


Bill Hassell, sysadmin
Steven E. Protter
Exalted Contributor

Re: How to really stress out a system.

Wow, got Bill Hassell on one of my threads. Cool!

EMS was running, installed off the December 2002 Application CDs, or the Internet, but in the early January time frame.

I have verified that EMS is running the way it's supposed to. I had hardware check the machine and of course after two system boots on autopilot there wasn't much left in the logs. Just that one mysterious GSP record saying Software Panic.

Everyone's getting 7 points and the first person with good methodology and/or a script to safely stress the system on disk I/O gets a rabbit.

Hmmm. I'm thinking about a dd command. That would do it. A really, really huge read.
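
A minimal sketch of such a read test, assuming a POSIX shell. The helper name parallel_read is hypothetical, and the sources you pass in (files or raw disk devices) depend on your own layout. It starts one dd reader per source, discards the data, and counts the readers that failed.

```sh
# Sketch: one dd reader per source, all running at once, data thrown away.
# parallel_read SOURCE...   (hypothetical helper)
parallel_read() (
    pids=""
    for src in "$@"; do
        # Reading to /dev/null never writes to the source.
        dd if="$src" of=/dev/null bs=1024k 2>/dev/null &
        pids="$pids $!"
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "readers: $#, failed: $failed"
    [ "$failed" -eq 0 ]    # non-zero exit status if any reader failed
)
```

Pointing it at raw devices rather than files in a filesystem should bypass the buffer cache, which is probably what you want when the goal is to exercise the disks and controllers.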

SEP
Jeff Schussele
Honored Contributor

Re: How to really stress out a system.

Hi SEP,

I'm somewhat baffled as to why you seem determined to chase down a "possible" HW problem, when the GSP is reporting a SW panic?
Never underestimate the power of crap code to mess a system up.
What do the shutdownlog, possible tombstones & crash dump, if created, show?
Any HPMCs?
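
A quick way to see which of those files actually hold anything, sketched as a small shell helper. The name report_crash_evidence is made up, and the paths are passed as arguments because the right ones depend on the system.

```sh
# Sketch: report which crash-evidence files exist and contain data.
# report_crash_evidence FILE...   (hypothetical helper)
report_crash_evidence() (
    for f in "$@"; do
        if [ -s "$f" ]; then
            # -s is true only when the file exists and is not empty
            echo "has data: $f"
        else
            echo "empty or missing: $f"
        fi
    done
)
```

On HP-UX the usual candidates would be /etc/shutdownlog and /var/tombstones/ts99, assuming the default locations are in use.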


Oh & to answer the question posed in the subject:

Tell it that it may be decommissioned in a Corporate right-sizing initiative. =~()

Cheers,
Jeff (Who's looking forward to a nice spring-like weekend - in the 70's baby!)
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
John Poff
Honored Contributor

Re: How to really stress out a system.

Hi Steven,

Any clues in your /var/tombstones file? We have an N4000 loaner in for a software test and the thing was giving me fits for a couple of days. It would run for a couple of hours and then panic with the red LED. Finally I got hooked up with an HP engineer and he deciphered the tombstone file, and figured out that I was having problems with some bad memory. I pulled out one of the memory carriers [it had 3 with 4 GB each] and it was happy after that.

As mentioned before, you probably have a hardware problem, but the real trick is replicating it, which makes it a little tough to decide what stress test to run on the box. I'd suggest trying something that eats up lots of RAM, and also CPU. Since Oracle seems the best tool for that job, I'd try firing up some jobs inside Oracle to stress out the box.

JP
Steven E. Protter
Exalted Contributor

Re: How to really stress out a system.

Nada in /var/tombstones

Everything was checked.

Still scratching the head.

SEP
Volker Borowski
Honored Contributor
Solution

Re: How to really stress out a system.

Steven,

what about the most obvious:

create table ...
as select ... from
emp a, emp b, emp c, ....

I do not know if you have the standard tables in your DB, take another one then, but if you join the same table to itself a few times, you'll get tons of rows soon.

Just for I/O stress, create a dummy tablespace (TS) across all your disks, and create the table in this TS without any index.
Creation will give you a write test.
Any select * after creation will give you a read test, because the missing index forces a full table scan.
Optimize the stress by creating hash partitions for this table in each datafile to support parallel query on reads.
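
To make the self-join concrete, here is a small shell sketch that only prints the statement. The helper name self_join_sql and the target table name are invented for illustration, and selecting t1.* is one way to avoid duplicate column names in the new table. Joining a table with n rows to itself k times yields n^k rows, so start small.

```sh
# Sketch: print a CREATE TABLE ... AS SELECT over COPIES self-joins of TABLE.
# self_join_sql TABLE COPIES TARGET   (hypothetical helper)
self_join_sql() (
    table=$1
    copies=$2
    target=$3
    from="$table t1"
    i=2
    while [ "$i" -le "$copies" ]; do
        # No join condition on purpose: each extra copy multiplies the rows.
        from="$from, $table t$i"
        i=$((i + 1))
    done
    printf '%s\n' "create table $target as select t1.* from $from;"
)
```

The output of, say, "self_join_sql emp 4 stress_big" could then be piped into sqlplus with your own credentials. The tablespace and partitioning clauses are left out here and would need to be added by hand.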

Volker

Steven E. Protter
Exalted Contributor

Re: How to really stress out a system.

That would do it. A nice dd read would help me eliminate an I/O related system panic, wouldn't it?

Being lazy and sick, I'm offering a second rabbit for the dd command.

SEP
John Poff
Honored Contributor

Re: How to really stress out a system.

Could it be an environmental problem? You are a thorough kind of guy so I'm sure you've got good power and the temperature is under control, but you never know. Maybe somebody is plugging in the vacuum cleaner on the same circuit or something crazy like that? It's a crazy idea, but weird problems sometimes have weird causes.

JP
Volker Borowski
Honored Contributor

Re: How to really stress out a system.

No rabbit for this one :-)

dd if=/dev/zero of=/fs1/data1 bs=1k count=1000000 &

And then of course some 5 to 10 all at once.

Check "man dd" for "count=". I am at home and have no system available now, but it should be "count".
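
The "5 to 10 all at once" part could be wrapped up roughly like this, assuming a POSIX shell and that /dev/zero exists on the box. The helper name write_stress and the stress.N file names are made up. Each writer gets its own output file so they do not overwrite one another.

```sh
# Sketch: start WRITERS parallel dd writers, each writing COUNT 1k blocks.
# write_stress DIR WRITERS COUNT   (hypothetical helper)
write_stress() (
    dir=$1
    writers=$2
    count=$3
    i=1
    pids=""
    while [ "$i" -le "$writers" ]; do
        dd if=/dev/zero of="$dir/stress.$i" bs=1k count="$count" 2>/dev/null &
        pids="$pids $!"
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "writers: $writers, failed: $failed"
    [ "$failed" -eq 0 ]    # non-zero exit status if any writer failed
)
```

For example, "write_stress /fs1 8 1000000" would write eight files of roughly 1 GB each, so check free space first and remove the stress.N files afterwards.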

Volker

Pete Randall
Outstanding Contributor

Re: How to really stress out a system.

JP,

That would be way too easy to spot. All you have to do is look for the cleaning guy whose hair is still smoking after plugging his vacuum into the 208V circuit.

;^)


Pete

(Feeling better yet, Steven?)

Pete
Steven E. Protter
Exalted Contributor

Re: How to really stress out a system.

Hmmm. Environment. We have the machines in two racks with HP UPS built in. I've actually run the power cables so that an entire UPS can go down and all three systems should stay running.

In such a situation there should be a GSP message though. Power failure is an Attention or Yellow. I'll have to look that up.

Okay, thanks for the awesome help. I'm going to let this one go for the Sabbath now.

I'll be back Sat. night to read up and hand out more points.

I like handing out points.

SEP
John Bolene
Honored Contributor

Re: How to really stress out a system.

we had one that was going down due to a bad UPS system and another that had a bad battery in the UPS

for a cpu stress only, you can load seti, one copy of the program for each processor

http://setiathome.ssl.berkeley.edu
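
If installing seti is not an option, a plain shell busy loop is a much cruder stand-in for the same "one worker per processor" idea. This is only a sketch: the helper name cpu_burn is hypothetical, and it exercises integer arithmetic in the shell, not the floating-point work seti does.

```sh
# Sketch: start WORKERS busy loops in parallel, each counting to ITERATIONS.
# cpu_burn WORKERS ITERATIONS   (hypothetical helper)
cpu_burn() (
    workers=$1
    iterations=$2
    w=0
    while [ "$w" -lt "$workers" ]; do
        (
            n=0
            while [ "$n" -lt "$iterations" ]; do
                n=$((n + 1))    # pure CPU work, no I/O
            done
        ) &
        w=$((w + 1))
    done
    wait    # block until every worker has finished counting
    echo "$workers workers finished $iterations iterations each"
)
```

Pass the number of processors in the box as the first argument and a large iteration count as the second.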
It is always a good day when you are launching rockets! http://tripolioklahoma.org, Mostly Missiles http://mostlymissiles.com
John Poff
Honored Contributor

Re: How to really stress out a system.

Pete,

That would do it! Maybe all the smoke from the vacuum cleaner is causing his box to crash. :)

JP
John Bolene
Honored Contributor

Re: How to really stress out a system.

for a darn good I/O and cpu beating

cd /
find .|xargs cksum

this has to read and compute the checksum for every byte on the system
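
A slightly more defensive variant, sketched under the assumption that your find supports -xdev and the "-exec ... {} +" form. The helper name cksum_tree is made up. Restricting the walk to regular files keeps cksum away from device files under /dev, and using -exec instead of xargs copes with spaces in file names.

```sh
# Sketch: checksum every regular file under DIR on one filesystem
# and print how many files were read.
# cksum_tree DIR   (hypothetical helper)
cksum_tree() (
    dir=$1
    # -type f skips directories, devices and pipes; -xdev stays on one filesystem.
    find "$dir" -xdev -type f -exec cksum {} + 2>/dev/null | wc -l
)
```

Because of -xdev this needs one run per mounted filesystem, which also makes it easy to run several in parallel.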
Martin Johnson
Honored Contributor

Re: How to really stress out a system.

Don't rule out environmental problems.

A few years back a couple of us were swapping war stories and this one came up:

A software specialist was called to a customer site because a newly installed system was crashing several times a day and the hardware people could not find anything wrong with the hardware. They even replaced most of the major components, with no results.

The software specialist ruled out the software as different applications were running when the crashes occurred. There were no indications of an OS problem.

The software specialist requested an environmental specialist be called in. The environmental specialist, in his initial examination, did not find anything unusual. So he decided to take a break and go to the bathroom.

While he was in the bathroom, the system crashed. When he was told of the crash, he immediately went back to the urinal he used and flushed it several times. Nothing happened.

Further examination revealed that the system had been grounded to a drain pipe where a six foot section of the metal pipe had been replaced with plastic. Further testing demonstrated that flushing the urinal after it had been *USED* caused the system to crash. Something to do with electrolytes.

The fix was not to use the drain pipe to ground the system.

True Story
Marty
Pete Randall
Outstanding Contributor

Re: How to really stress out a system.

Marty,

If it hadn't been for the fact that it was a urinal, I would have been sorely tempted to say that that story was full of s**t.

Is that really, really true? Priceless!


Pete

Pete
Martin Johnson
Honored Contributor

Re: How to really stress out a system.

Pete,

I wasn't there personally, but I have no reason to doubt its authenticity. We software specialists have seen some pretty strange things in our time.

As a software specialist, I was called out to a customer site where a newly installed system was crashing for no apparent reason about every 2 weeks. The first thing I noticed was that the computer room was on the cool, damp side. I inspected the crash dumps and ruled out the software and OS. I called in an environmental specialist. It took him a while to determine the cause: the cool, damp conditions encouraged mold or fungus to grow inside the back of the system. When it touched the backplane, it shorted out the system and burnt back some of the growth. It took about 2 weeks to grow back and short out the system again. We removed the growth and had the customer raise the temperature and reduce the humidity. The problem did not reoccur.

Or there was the time I was called in to investigate a system that was repeatedly crashing. The system was installed in the winter with no problems. Now that it was springtime, the system would crash on warm, sunny afternoons. The system was grounded by a wire that was attached to a metal rod that had been pounded into the ground. The attachment was poorly connected. The warm afternoon sun would cause the wire to expand and no longer make contact with the rod. This left the system susceptible to crashing. Tightening the connection eliminated the problem.

Marty
Tim Medford
Valued Contributor

Re: How to really stress out a system.

Steven - Is this a brand new machine? I had a similar, although not identical problem with a new rp5470 system configured and shipped by HP.

The tech from HP ended up pulling and reseating all the PCI cards and I haven't had any trouble since. Seems that sometimes they work loose during shipping and can cause erratic problems.

For what it's worth.

Tim
John Meissner
Esteemed Contributor

Re: How to really stress out a system.

Seeing as how many people are "taking it there" I'll just add my 2 cents:

You should buy another squirrel to run on the treadmill inside your server.... If you only have one he might get tired and stop running... you could also get a cat on a treadmill to chase the squirrel....
Shannon Petry
Honored Contributor

Re: How to really stress out a system.

Well, I believe that MySQL has a benchmark program on their site somewhere which really beats the snot out of the system for all aspects of the database. The benchmark test runs in all major databases (SAP, Oracle, DB2, etc...) and tests DB performance. At the same time, this of course beats on the OS.


Not sure the route you want to go, but if I were in your shoes, I'd run this test in a loop for a day or so, or multiple instances for a day or so if possible.

Sincerely,
Shannon


ps Stress your machine by telling it you're trading it in for an NT server.
Microsoft. When do you want a virus today?