Operating System - HP-UX

Ron D.
Frequent Advisor

Need options for diagnosing problems

A system that I manage has been having performance problems during backup procedures for the past 2 months. CPU utilization is usually below 20% with occasional spikes to 30%. The system is a model L with 4 CPUs.

Disk utilization, however, is running much higher than it should. One disk that is part of a SAN has the following SAR data:
device %busy avque r+w/s blks/s avwait avserv
Average c6t0d2 99.72 646.82 339 6183 6.91 16.41
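The sar line can be turned into a couple of derived figures (throughput and average size per request). The sketch below assumes sar's blks/s column counts 512-byte blocks, and the function names are hypothetical helpers, not part of any HP tool:

```python
# Parse one "Average" line from `sar -d` output and derive secondary figures.
# Assumption: blks/s is reported in 512-byte blocks.
BLOCK_BYTES = 512

FIELDS = ["device", "busy_pct", "avque", "rw_per_s",
          "blks_per_s", "avwait_ms", "avserv_ms"]


def parse_sar_disk_line(line):
    """Return a dict of the sar -d columns for a single device line."""
    parts = line.split()
    if parts and parts[0] == "Average":
        parts = parts[1:]  # drop the leading "Average" label
    if len(parts) != len(FIELDS):
        raise ValueError(f"unexpected sar -d line: {line!r}")
    row = {"device": parts[0]}
    for name, value in zip(FIELDS[1:], parts[1:]):
        row[name] = float(value)
    return row


def derived_stats(row):
    """Throughput in KB/s and average KB transferred per request."""
    kb_per_s = row["blks_per_s"] * BLOCK_BYTES / 1024
    kb_per_io = kb_per_s / row["rw_per_s"] if row["rw_per_s"] else 0.0
    return {"kb_per_s": kb_per_s, "kb_per_io": kb_per_io}


row = parse_sar_disk_line("Average c6t0d2 99.72 646.82 339 6183 6.91 16.41")
print(derived_stats(row))
```

With the figures above this works out to about 3091.5 KB/s and roughly 9.1 KB per request; if the block-size assumption holds, that points to many small transfers rather than large sequential ones.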

The SAN is an LSI Logic MetaStor with 2 trays of disks. The volume consists of 3 disks in a RAID 5 configuration. Other systems attached to the MetaStor have no problems with performance.

Users don't seem to notice the problems, even though performance reports such as sar and GlancePlus indicate that it is happening nearly all the time. One oddity: full backups took 2 hours before the problem occurred and now take 13 hours, except for last Sunday evening, when they took 2 hours.

Does anyone have ideas about how to troubleshoot the problem? If you have potential solutions, I would appreciate them as well. So far, I have used glance and sar to get a better idea of utilization and of which processes are hitting the disks, but they all look pretty basic.

Thanks
John Meissner
Esteemed Contributor

Re: Need options for diagnosing problems

How much memory does your system have?
All paths lead to destiny
Eugeny Brychkov
Honored Contributor

Re: Need options for diagnosing problems

Ron,
check the LSI diagnostics output to see whether this volume has its read/write cache turned off.
Eugeny
Ron D.
Frequent Advisor

Re: Need options for diagnosing problems

4GB of RAM. Swap is also 4GB, with usage currently at 75MB (about 5%), and it doesn't vary much.
Vincente Fernandes
Valued Contributor

Re: Need options for diagnosing problems

Run a dd on the c6t0d2 drive using dd if=/dev/dsk/c6t0d2 of=/dev/null bs=4096k. Be warned: dd will take some time. If you have STM (Online Diagnostics) installed, use stm to run information/verify on the drive and see if it reports any errors.
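If you want a throughput number rather than just a completed dd, the same sequential read can be timed. This is a minimal sketch, assuming a Python interpreter is available on the host; sequential_read_rate is a hypothetical helper that mimics dd's read loop with a 4 MB block size, and it is not a replacement for the STM verify:

```python
import os
import time


def sequential_read_rate(path, block_size=4 * 1024 * 1024, max_bytes=None):
    """Read `path` sequentially in `block_size` chunks (like dd bs=4096k).

    Stops at end of file, or once at least `max_bytes` have been read.
    Returns (bytes_read, seconds, bytes_per_second).
    """
    total = 0
    start = time.monotonic()
    # buffering=0 so each read() goes straight to the OS, as dd does
    with open(path, "rb", buffering=0) as f:
        while max_bytes is None or total < max_bytes:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.monotonic() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate
```

For example, sequential_read_rate("/dev/dsk/c6t0d2", max_bytes=2 * 1024**3) would sample the first 2 GB of the device instead of reading all of it.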
A. Clay Stephenson
Acclaimed Contributor

Re: Need options for diagnosing problems

You have to understand what you are seeing when you examine disk utilization reports that refer to array devices. Your UNIX box doesn't have a clue that this is actually an array device; all it knows is that a tremendous amount of i/o is going through what it thinks is a single disk. You may, in fact, have no real problem at all.


The difference in your backup times is more puzzling. It might be related to the tape drive throughput or it might be related to the disk throughput OR if this is a network backup (OB2, Veritas, etc.) then you may have a network bottleneck or even a hostname resolution problem.

You need to use Glance and drill down to see how fast the file offset is advancing as the backup writes to the tape device. You also need to see what the i/o rates are through the various disks.
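Turning a few hand-copied offset readings into a rate makes it easy to compare a slow night with a fast one. A small sketch, assuming you note (seconds, offset in bytes) pairs from Glance yourself; offset_rates is a hypothetical helper and the sample values are made up:

```python
MB = 1024 * 1024


def offset_rates(samples):
    """samples: list of (timestamp_seconds, file_offset_bytes) readings.

    Returns the MB/s at which the offset advanced between each pair of
    consecutive readings.
    """
    rates = []
    for (t0, off0), (t1, off1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be increasing")
        rates.append((off1 - off0) / MB / (t1 - t0))
    return rates


# Hypothetical readings: 50 MB in the first 10 s, only 10 MB in the next 10 s
print(offset_rates([(0, 0), (10, 50 * MB), (20, 60 * MB)]))  # [5.0, 1.0]
```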

I would also check to see if mount options have changed. (convosync, mincache)
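One way to check is to diff an old and a current mount-option string (for example from an fstab entry saved before the problem started). A minimal sketch, assuming comma-separated VxFS-style options; changed_io_options is a hypothetical helper and the option strings in the example are illustrative:

```python
WATCHED = ("convosync", "mincache")


def parse_mount_options(opts):
    """Turn 'a,b=c' into {'a': True, 'b': 'c'}."""
    result = {}
    for item in opts.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        result[key] = value if sep else True
    return result


def changed_io_options(before, after, watched=WATCHED):
    """Return {option: (old, new)} for watched options that differ."""
    old, new = parse_mount_options(before), parse_mount_options(after)
    return {k: (old.get(k), new.get(k))
            for k in watched if old.get(k) != new.get(k)}


print(changed_io_options("delaylog,largefiles",
                         "delaylog,largefiles,convosync=direct,mincache=direct"))
```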

The other thing to look at is the tunable disksort_seconds (PHKL_21678, but it may have been superseded, or you may need a version for 11.11).
This changes the fairness algorithm for sequential vs. random i/o.
If it ain't broke, I can fix that.
Ron D.
Frequent Advisor

Re: Need options for diagnosing problems

Here's some more info:
Avg queue on the disk in question is 646.82, as noted above. Based on that, I would think that I actually do have a problem. Other disks on the system show queue lengths of less than 15.
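The comparison against the other disks can be made mechanical, which helps when a system has many LUNs. A sketch under stated assumptions: queue_outliers is a hypothetical helper, the factor of 10 is an arbitrary threshold, and every disk name except c6t0d2 is made up:

```python
from statistics import median


def queue_outliers(avque_by_disk, factor=10.0):
    """Flag disks whose avque exceeds `factor` times the median of the others.

    Returns {disk: avque / median_of_other_disks} for flagged disks.
    """
    flagged = {}
    for disk, q in avque_by_disk.items():
        others = [v for d, v in avque_by_disk.items() if d != disk]
        if not others:
            continue
        baseline = median(others)
        if baseline > 0 and q > factor * baseline:
            flagged[disk] = q / baseline
    return flagged


# Hypothetical values for the other disks ("less than 15")
print(queue_outliers({"c6t0d2": 646.82, "c1t2d0": 12.0,
                      "c2t2d0": 9.5, "c3t2d0": 14.0}))
```

Using the median of the other disks keeps one very busy device from inflating the baseline it is compared against.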

I checked the LSI Logic utilities, and both read and write cache are enabled for the volume. An option exists to enable write caching if the system senses low battery voltage on the controllers. I have temporarily enabled this option too, to see if there is a change.

I power cycled the tape drive to see if it makes a difference.

I/O rates for the drive are running at 300+ requests per second, occasionally going up to 350+.

The volume in question is utilized as a raw volume and is not in fstab; sorry I didn't mention this earlier.

I constructed a duplicate system (h/w and s/w) using tapes from Monday's backup and performed a backup of that system last evening. The full backup took 1h 33m 23s. Keep in mind that this is without users on it, but did have all processes running so it's about what I expected.
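Putting the backup times on a common scale shows how large the gap is. A small sketch; duration_seconds is a hypothetical helper that only understands h/m/s suffixes:

```python
import re

UNITS = {"h": 3600, "m": 60, "s": 1}


def duration_seconds(text):
    """Convert strings like '1h 33m 23s' or '13h' to seconds."""
    matches = re.findall(r"(\d+)\s*([hms])", text)
    if not matches:
        raise ValueError(f"no h/m/s durations found in {text!r}")
    return sum(int(n) * UNITS[u] for n, u in matches)


test_box = duration_seconds("1h 33m 23s")  # 5603 s
slow = duration_seconds("13h")             # 46800 s
normal = duration_seconds("2h")            # 7200 s
print(slow / normal, slow / test_box)
```

The 13-hour runs are 6.5 times the old 2-hour baseline and roughly 8.4 times the duplicate system's time.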

W.C. Epperson
Trusted Contributor

Re: Need options for diagnosing problems

What sort of tape drive, and how is it connected? Streaming tape drives will show the kinds of performance disparity you note if you don't feed them quite fast enough to keep them streaming. And streaming SCSI tape drives like dedicated SCSI buses: other activity on the bus can cause them to be data-starved and thus slow waaaaay down.
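The stop/start ("shoe-shining") effect can be illustrated with a deliberately crude model: the drive streams while its buffer holds data, stops when the buffer runs dry, and restarts only once the buffer has refilled. This is a sketch under those assumptions, not a model of any particular drive; estimated_repositions is hypothetical and the rates and buffer size in the example are placeholders, not DLT 35/70 specifications:

```python
import math


def estimated_repositions(feed_mb_s, drive_mb_s, buffer_mb, total_mb):
    """Approximate number of tape stops for `total_mb` of data.

    Model: the buffer refills (drive stopped) at feed_mb_s, then the drive
    streams at drive_mb_s while the buffer drains at (drive - feed) MB/s.
    """
    if feed_mb_s <= 0:
        raise ValueError("feed rate must be positive")
    if feed_mb_s >= drive_mb_s:
        return 0  # host keeps up, so the drive can stream continuously
    streaming_seconds = buffer_mb / (drive_mb_s - feed_mb_s)
    mb_per_cycle = drive_mb_s * streaming_seconds  # data written per burst
    return math.ceil(total_mb / mb_per_cycle)


# Placeholder numbers: host feeds 3 MB/s, drive wants 5 MB/s, 8 MB buffer
print(estimated_repositions(3, 5, 8, 1000))  # 50
```

Even in this simple model, a feed rate only somewhat below the drive's streaming rate turns one continuous pass into dozens of stops and repositions.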
"I have great faith in fools; self-confidence, my friends call it." --Poe
Ron D.
Frequent Advisor

Re: Need options for diagnosing problems

The tape drive is a DLT 35/70
W.C. Epperson
Trusted Contributor

Re: Need options for diagnosing problems

Ok, a DLT. They're about the worst for non-streaming performance (they actually wear out way fast because of the repositioning). And they're awful if there's any other activity on the SCSI bus with them. Is there anything else on this SCSI card?
Ron D.
Frequent Advisor

Re: Need options for diagnosing problems

The DLT is the only device attached to that particular SCSI card.

Incidentally, I found some corruption in the database being backed up and am pursuing that as a potential solution. The database is for a medical application and uses Mumps. The vendor is looking into it to see what can be done.

Ya gotta love those vertical market apps...