Re: disk io i've got figures but whats normal

user001 · ‎01-12-2012

hey all,

Chasing some help trying to work out if some disks are under too much load.

they've been graphing physical and logical disk io through glance into a monitoring system for a little while now.

i think it's an rx26XX server with MirrorDisk/UX running 11.31

users are complaining the application they are using randomly locks up. i think they need to go back and look at the software myself, but i just want to make sure disk io is ok.

is there a way to determine acceptable disk IO for system performance?

i haven't been able to make a connection between the app locking up and the output of sar -d 3 500.

glance isn't complaining either (e.g. no impending bottleneck warnings), and i've gone through the syslog.

i suggested they should upgrade the patch level and see how it goes but they seem to be hesitant. it's running a 2007 patchset if i remember correctly.

any advice would be great.

one thing i did notice was that fcache_fb_policy was enabled in the kernel tunables, which i thought might be causing problems?

thanks.

Dennis Handly · ‎01-13-2012

>users are complaining the application they are using randomly locks

Locks up and hangs permanently? Or just gets extremely slow for a while?

Bill Hassell · ‎01-15-2012

Disks don't get tired. You can read or write as fast as your server will allow and the system will not hang. The users need to define what they mean by 'hang or lockup'. Is a hang more than 3 seconds with no change on the screen, or more than 5 minutes? Is the application using network (NFS) shares? Is the application communicating with the network when it hangs? If they experience a lockup, do you have to reboot the server?

And most important, 2007 is 5 years ago...are you running PCs that have never been patched for 5 years (or are they being patched automatically every week or so)? Bottom line is that there are numerous errors to fix and changes to make in the 2007 install, some of which will cure some lockup problems. I know the users want a simple go-faster button but the reality is: patches fix problems, current and future.

Bill Hassell, sysadmin

user001 · ‎01-16-2012

thanks for the replies.

when they say lockup they are talking about 10 second application lockup.

another reason i think it's not disk io related is because they can still use the server, e.g. change between directories and run other tools.

it is a networked application but it's only a small group of people on the same lan.

no nfs shares.

no they do not have to reboot the server, it just catches up and resumes.

i'm not sure about the patching, it's just something i noticed while i was there. I'd have to check with them, i'm pretty sure they said they were up to date but i could have sworn i saw an old patch level applied which raised some concerns.

is there a way to go through "lockup cures" related to patching so i can double check their patches again?

thanks for the info.

BowlesCR · ‎01-16-2012

Being able to browse the server may or may not rule out a disk issue. Depending on your storage layout, the disks hosting the directories they're browsing may be totally unrelated to the disks the application is using.

I'd try using `glance` if your OE includes it and tabbing through the various subsystem reports (disk, network, etc) to see if any smoking guns pop out.

user001 · ‎01-17-2012

Hi,

I've had glance running all day with the adviser running and maybe once throughout the day disk util will get to 60% for a second and return to normal.

The network however is always complaining and hitting 100% probability for packet rate and queue length.

I always figured these numbers were wrong though, considering that the snmp figures for the interface average around 50Mb. It's a 100Mb card.

I looked into it initially but then found others had the same problem because glance thought it was a 10Mb card or something along those lines.

Is there a way to confirm the network related stats?

Thanks.

Bill Hassell · ‎01-17-2012

...The network however is always complaining and hitting 100% probability for packet rate and queue length.

Are you using glance or gpm? glance just reports packet rates with no % value.

...I always figured these numbers were wrong though, considering that the snmp figures for the interface average around 50Mb. It's a 100Mb card.

50Mbit throughput for a 100 Mb card is almost maxed out. With overhead and handshaking, 100 Mbit is really busy at 50 Mb -- and that is probably the red flag since your disks seem to be loafing. Use the character version of glance so you can see actual numbers.
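For anyone wanting to cross-check the SNMP figure by hand: interface throughput can be worked out from two readings of an octet counter (for example MIB-II ifInOctets) taken a known number of seconds apart. What follows is only a rough Python sketch, not something that runs on the HP-UX box itself. The function names, the sample counter values and the 32-bit counter width are all assumptions for illustration, not readings from this system.

```python
def mbit_per_sec(octets_start, octets_end, interval_s, counter_bits=32):
    """Average throughput in Mbit/s between two octet-counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    modulus = 2 ** counter_bits
    # Modular subtraction copes with a single counter wrap between samples.
    delta_octets = (octets_end - octets_start) % modulus
    return delta_octets * 8 / interval_s / 1_000_000


def utilisation_pct(mbit, link_mbit):
    """Throughput as a percentage of the nominal link speed."""
    return 100.0 * mbit / link_mbit


# Hypothetical counter samples taken 60 seconds apart (not real readings).
rate = mbit_per_sec(1_000_000_000, 1_375_000_000, 60)
print(rate, "Mbit/s,", utilisation_pct(rate, 100), "% of a 100 Mbit link")
```

Keep the sampling interval short: at a sustained 100 Mbit/s a 32-bit octet counter wraps in under six minutes, and the modular subtraction above only copes with one wrap per interval.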

There is no simple way to determine patches -- an old patch may be the latest while others have been updated this year. The patch numbers are a good indication. Use the command:

show_patches | sort

The highest numbered patches are the latest and for 11.23, numbers in the 30000-40000 range are fairly current. If the highest numbers are less than 30000, then the system is in need of patches.
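If the output is saved to a file, picking out the highest number can be scripted. A plain sort orders lines as text, so if each line starts with the patch ID the entries group by prefix (PHCO, PHKL, PHNE, PHSS) rather than by number. The Python sketch below pulls out the numeric part of anything that looks like a patch ID and sorts numerically. It assumes nothing about the rest of the show_patches layout, and the sample text is made up purely for illustration.

```python
import re

# HP-UX patch IDs look like PHKL_12345 (PHCO, PHKL, PHNE, PHSS prefixes).
PATCH_ID = re.compile(r"\b(PH[A-Z]{2})_(\d+)\b")


def patch_numbers(text):
    """Return the sorted, de-duplicated numeric parts of patch IDs in text."""
    return sorted({int(num) for _prefix, num in PATCH_ID.findall(text)})


def highest_patch_number(text):
    """Return the largest patch number found, or None if there are none."""
    numbers = patch_numbers(text)
    return numbers[-1] if numbers else None


# Made-up sample lines, NOT real patch IDs or real show_patches output.
sample = """
  PHCO_11111  made-up description
  PHKL_22222  made-up description
  PHNE_33333  made-up description
"""
print(highest_patch_number(sample))
```

In practice the text would come from a file holding the saved command output, read with open(...).read().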

Bill Hassell, sysadmin

user001 · ‎01-17-2012

Hey Bill,

I was just looking at xglance, i don't use glance often and this seems to be easier for me.

Could you please point me in the right direction to verify the network is overloaded?

Thank you.

Bill Hassell · ‎01-17-2012

Packets per second is useful. gpm (the X Window interface) will display the actual numbers. Dozens to hundreds are normal, thousands will be busy but as always, it depends. For very short packets (less than 1KB), thousands is busy, whereas for large blocks of data, hundreds may be maxed out. As with all 100 Mbit connections, be sure that the card is auto-negotiating speed and duplex correctly. For lan0, use the command:

lanadmin -x 0

(or lanadmin -x 1 for lan1, etc)

It should report 100 Full-Duplex and auto = on.

If you see half-duplex, your card is misconfigured and running at less than 10% normal speed.
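To put a packet rate in context, it can be turned into an approximate bandwidth figure by multiplying by an average packet size. This is only a back-of-envelope Python sketch: the function name and the 2000 packets/sec rate are hypothetical, and the real average packet size on this system is unknown, so the two sizes used are just the usual Ethernet minimum frame (64 bytes) and standard MTU (1500 bytes) as rough bounds.

```python
def estimated_mbit_per_sec(packets_per_sec, avg_packet_bytes):
    """Rough throughput in Mbit/s; ignores framing and protocol overhead."""
    return packets_per_sec * avg_packet_bytes * 8 / 1_000_000


# Hypothetical packet rate, evaluated at a small and a large packet size.
for size in (64, 1500):
    print(size, "bytes:", estimated_mbit_per_sec(2000, size), "Mbit/s")
```

The same packet rate gives very different bandwidth figures depending on packet size, which is why the packet rate alone does not settle whether the link is busy.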

Bill Hassell, sysadmin

user001 · ‎01-17-2012

i see, so i'm running gpm and looking at network by interface.

it's an aggregated link in LB_HS

duplex and negotiation are good.

in pkt rate 1540

out pkt rate 2231

input pkt 21003

out pkt 30754

network % 2.39

current warning is a 60% chance of bottleneck

Thanks.
