System Administration

disk io i've got figures but whats normal

SOLVED
user001
Frequent Advisor


 

hey all,

 

Chasing some help trying to work out if some disks are under too much load.

 

They've been graphing physical and logical disk I/O from glance into a monitoring system for a little while now.

 

I think it's an rx26XX server with MirrorDisk/UX, running HP-UX 11.31.

 

Users are complaining that the application they use randomly locks up. Personally I think they need to go back and look at the software, but I just want to make sure disk I/O is OK.

 

Is there a way to determine acceptable disk I/O for system performance?

 

I haven't been able to find a connection between the app locking up and the output of sar -d 3 500.

 

glance isn't complaining either (e.g. no impending-bottleneck alarms), and I've gone through the syslog.

 

I suggested they upgrade the patch level and see how it goes, but they seem hesitant. It's running the 2007 patch set, if I remember correctly.

 

Any advice would be great.

 

One thing I did notice was that the fcache_fb_policy kernel tunable was enabled, which I thought might be causing problems?

 

thanks.

 

11 REPLIES
Dennis Handly
Acclaimed Contributor

Re: disk I/O i've got figures but what's normal

>users are complaining the application they are using randomly locks

 

Locks up and hangs permanently?  Or just gets extremely slow for a while?

Bill Hassell
Honored Contributor

Re: disk io i've got figures but whats normal

Disks don't get tired. You can read or write as fast as your server will allow and the system will not hang. The users need to define what they mean by 'hang or lockup'. Is a hang more than 3 seconds with no change on the screen, or more than 5 minutes? Is the application using network (NFS) shares? Is the application communicating with the network when it hangs? If they experience a lockup, do you have to reboot the server?

 

And most important, 2007 is 5 years ago...are you running PCs that have never been patched for 5 years (or are they being patched automatically every week or so)? Bottom line is that there are numerous errors and changes that need to be made in the 2007 install, some of which will cure some lockup problems. I know the users want a simple go-faster button but the reality is: patches fix problems, current and future.



Bill Hassell, sysadmin
user001
Frequent Advisor

Re: disk io i've got figures but whats normal

Thanks for the replies.

 

When they say lockup, they mean the application locks up for about 10 seconds.

 

Another reason I think it's not disk I/O related is that they can still use the server, e.g. change between directories and run other tools.

 

It is a networked application, but it's only a small group of people on the same LAN.

 

No NFS shares.

 

No, they do not have to reboot the server; it just catches up and resumes.

 

I'm not sure about the patching; it's just something I noticed while I was there. I'd have to check with them. I'm pretty sure they said they were up to date, but I could have sworn I saw an old patch level applied, which raised some concerns.

 

Is there a way to go through "lockup cures" related to patching so I can double-check their patches again?

 

Thanks for the info.

 

BowlesCR
Advisor

Re: disk io i've got figures but whats normal

Being able to browse the server may or may not rule out a disk issue. Depending on your storage layout, the disks hosting the directories they're browsing may be totally unrelated to the disks the application is using.

I'd try using `glance` if your OE includes it and tabbing through the various subsystem reports (disk, network, etc) to see if any smoking guns pop out.
user001
Frequent Advisor

Re: disk io i've got figures but whats normal

Hi,

 

I've had glance running all day with the adviser running, and maybe once throughout the day disk util will get to 60% for a second and then return to normal.

 

The network, however, is always complaining and hitting 100% probability for packet rate and queue length.


I always figured these numbers were wrong, though, since the SNMP figures for the interface average around 50Mb. It's a 100Mb card.

 

I looked into it initially but then found others had the same problem because glance thought it was a 10Mb card or something along those lines.

 

Is there a way to confirm the network-related stats?

 

Thanks.

Bill Hassell
Honored Contributor

Re: disk io i've got figures but whats normal

...The network however is always complaining and hitting 100% probability for packet rate and queue length.

 

Are you using glance or gpm? glance just reports packet rates with no % value.


...I always figured these numbers were wrong though considering looking at the snmp figures for the interface it averages around 50Mb. Its a 100Mb card.

 

50Mbit throughput for a 100 Mb card is almost maxed out. With overhead and handshaking, 100 Mbit is really busy at 50 Mb -- and that is probably the red flag since your disks seem to be loafing. Use the character version of glance so you can see actual numbers.
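
One way to cross-check the SNMP figure is to work out utilization from two samples of the interface octet counter. The sketch below is only an illustration: the function name and parameters are made up for this example, and it assumes a 32-bit counter (such as ifInOctets) that wraps at most once between samples.

```python
def link_utilization(octets_start, octets_end, interval_s, link_bps,
                     counter_bits=32):
    """Return the fraction of link capacity used between two counter samples.

    octets_start / octets_end: octet counter readings (hypothetical inputs,
    e.g. two polls of an SNMP interface octet counter).
    interval_s: seconds between the two readings.
    link_bps: nominal link speed in bits per second.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    wrap = 2 ** counter_bits
    # Modulo arithmetic copes with a single counter wrap between samples.
    delta_octets = (octets_end - octets_start) % wrap
    return delta_octets * 8 / (interval_s * link_bps)


if __name__ == "__main__":
    # 375,000,000 octets in 60 seconds on a 100 Mbit/s link.
    print(link_utilization(0, 375_000_000, 60, 100_000_000))
```

The counter measures one direction only, so inbound and outbound need to be checked separately on a full-duplex link.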

 

There is no simple way to determine patches -- an old patch may be the latest while others have been updated this year. The patch numbers are a good indication. Use the command:

 

show_patches | sort

 

The highest numbered patches are the latest and for 11.23, numbers in the 30000-40000 range are fairly current. If the highest numbers are less than 30000, then the system is in need of patches.
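
Note that a plain sort orders the lines as text, so patches group by prefix (PHCO, PHKL, PHNE, PHSS) before number, and the highest number is not necessarily on the last line. A minimal sketch for pulling the highest number out of saved output, assuming patch IDs of the form PHxx_nnnnn; the function name and the sample IDs below are hypothetical, not taken from a real system:

```python
import re

# Matches IDs such as PHKL_12345 and captures the numeric part.
PATCH_ID = re.compile(r"\bPH[A-Z]{2}_(\d+)\b")


def highest_patch_number(listing):
    """Return the largest patch number found in the text, or None if none."""
    numbers = [int(m.group(1)) for m in PATCH_ID.finditer(listing)]
    return max(numbers) if numbers else None


if __name__ == "__main__":
    # Made-up sample lines standing in for saved patch listing output.
    sample = """\
    PHCO_12345   example description
    PHKL_40001   example description
    PHNE_23456   example description
    """
    print(highest_patch_number(sample))
```

Saving the command output to a file and reading it in keeps the sketch independent of how the listing is produced.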



Bill Hassell, sysadmin
user001
Frequent Advisor

Re: disk io i've got figures but whats normal

Hey Bill,

 

I was just looking at xglance; I don't use glance often and this seems easier for me.

 

Could you please point me in the right direction to verify the network is overloaded?

 

Thank you.

Bill Hassell
Honored Contributor

Re: disk io i've got figures but whats normal

Packets per second is useful. gpm (the X Window interface) will display the actual numbers. Dozens to hundreds are normal, thousands will be busy but as always, it depends. For very short packets (less than 1KB), thousands is busy, whereas for large blocks of data, hundreds may be maxed out. As with all 100 Mbit connections, be sure that the card is auto-negotiating speed and duplex correctly. For lan0, use the command:

 

lanadmin -x 0

(or lanadmin -x 1 for lan1, etc)

 

It should report 100 Full-Duplex and auto = on.

If you see half-duplex, your card is misconfigured and running at less than 10% normal speed.
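
To relate a packet rate to link capacity you also need a typical packet size. A rough sketch, ignoring Ethernet framing overhead; the function name and the example sizes are assumptions for illustration, not measurements from this system:

```python
def throughput_mbit(packets_per_s, avg_packet_bytes):
    """Approximate payload throughput in Mbit/s for a given packet rate.

    Both arguments are hypothetical inputs: a packet rate as shown by a
    monitoring tool and an assumed average packet size in bytes.
    """
    return packets_per_s * avg_packet_bytes * 8 / 1_000_000


if __name__ == "__main__":
    for size in (100, 1500):
        print(size, "bytes:", throughput_mbit(2000, size), "Mbit/s")
```

With these assumed sizes, 2000 packets per second works out to 1.6 Mbit/s at 100 bytes and 24 Mbit/s at 1500 bytes, so the same packet rate can mean very different loads.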



Bill Hassell, sysadmin
user001
Frequent Advisor

Re: disk io i've got figures but whats normal

 

I see, so I'm running gpm and looking at network by interface.

 

It's an aggregated link in LB_HS.

 

Duplex and negotiation are good.

 

in pkt rate     1540
out pkt rate    2231
input pkt       21003
out pkt         30754
network %       2.39

 

The current warning is a 60% chance of bottleneck.

 

Thanks.

Bill Hassell
Honored Contributor
Solution

Re: disk io i've got figures but whats normal

The packet counts look good, so your network is running well.  If Glance alarms at 60%, then you've probably reached the upper limit of the computer+LAN combination. And of course, the primary system throughput depends on the performance of the remote end plus all the steps in between.

 

The 10-second pauses may be due to the way the application handles data record contention. Ask the vendor about performance limits and enhancements.



Bill Hassell, sysadmin
user001
Frequent Advisor

Re: disk io i've got figures but whats normal

Hey Bill,

It sits at 60 and spikes to 95 and 100.

Thanks Bill, this was my suspicion.