DL140G2 + RHEL4 + SW RAID = system load never 0?

SOLVED

Strange problem dogging me. I have a DL140G2 system with two 80GB SATA drives and one 2.8GHz Xeon processor. I've installed RHEL4 on the system using software RAID1 (mirroring).

I'm booting with ide0=noprobe ide1=noprobe in order to make use of the ata_piix driver (otherwise PATA mode is used).

There is currently _nothing_ running on this system and the weird thing is, the system load never reaches 0.00! It's almost always around 0.10 or 0.22 and for the life of me I cannot figure out what is bumping the load up.

Now, maybe this is not something to be concerned about, but on my DL140G1 systems I have never encountered this. Of course these were systems using old ATA drives, but the idle load on 'em was always 0.00.

On the new G2 boxes, while I am using the system, there are also occasional IO pauses where I'll enter a simple command like 'uptime' and the system pauses to think for a good 5 seconds, then spits back the answer.

It all adds up to me worrying that this issue will magnify itself once the machine is in production.

Things I have tried:

- Booting in non-SMP mode (HT on in BIOS)
- Booting in SMP mode with noapic (HT on in BIOS)
- Booting in non-SMP mode (HT off in BIOS)
- Booting in SMP mode (HT off in BIOS)

Nothing has fixed the issue.

I've also disabled USB on the system as it appeared to be on the same IRQ as eth0. I am using the HP bcm5700 drivers and this is with the stock RHEL 2.6.9-34 kernel.

Any suggestions? Should I build my own kernel and disable APIC/HT support there?

Also wondering if perhaps the combo of SW RAID and the SATA drivers are causing the issue, but I don't know how to track this down.
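
One way to start narrowing this down is to rule out a background resync of the mirror, since a resyncing RAID1 array generates disk IO even on an otherwise idle box. The sketch below is only an illustration: it assumes the usual /proc/mdstat layout (an "mdN : ..." header line followed by indented status lines), and the function name resyncing_arrays is made up for this example.

```python
import re


def resyncing_arrays(mdstat_text):
    """Return {array_name: operation} for md arrays that report an
    ongoing or pending resync, recovery, check or reshape."""
    busy = {}
    current = None
    for line in mdstat_text.splitlines():
        # Array header lines look like "md0 : active raid1 sdb1[1] sda1[0]".
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Progress lines look like "[==>....]  resync = 12.6% (...)";
        # a queued operation shows up as "resync=DELAYED".
        progress = re.search(r"\b(resync|recovery|check|reshape)\s*=", line)
        if progress and current:
            busy[current] = progress.group(1)
    return busy
```

Calling resyncing_arrays(open("/proc/mdstat").read()) on the affected machine returns an empty dict when no array is being rebuilt, in which case the md resync can be crossed off the list of suspects.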
11 REPLIES
Steven E. Protter
Exalted Contributor

Re: DL140G2 + RHEL4 + SW RAID = system load never 0?

Shalom,

SATA could be contributing to the issue, but there really isn't an issue.

"I've installed RHEL4 on the system using software RAID1 (mirroring)."

If there is any activity on the system, even log entries being written, there is going to be work to keep the software mirror updated.

You should consider using hardware RAID instead. It takes the activity off the CPU, leaving it free for more meaningful work.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

Re: DL140G2 + RHEL4 + SW RAID = system load never 0?

Thanks for the reply... I've had great experience with Linux software RAID before (no symptoms like this) on ProLiant hardware. That's the only reason I'm puzzled by what's going on.

In any case, I'm not going to worry about it too much. Will see how it behaves once it's out in production.

Re: DL140G2 + RHEL4 + SW RAID = system load never 0?

Well, I've built a custom kernel using the latest 2.6.16 sources and am no longer having any of the symptoms I had with the Red Hat kernel.

If I switch back however, the symptoms return.

Guess I'll try using my .config from the .16 kernel on the RH kernel and see if that does the trick.
Alan_152
Honored Contributor

Re: DL140G2 + RHEL4 + SW RAID = system load never 0?

I wouldn't expect the system load to ever be "0". Start "top" or "ps -ef" and you'll see lots of stuff running in the background.

Re: DL140G2 + RHEL4 + SW RAID = system load never 0?

This is a completely empty system right now. Yes there are processes running, but I have plenty of other servers that are either sitting unused or only doing simple things like DNS resolution and their load is _always_ right around 0.00, 0.01.

There is no way this server should be sitting at 0.25-0.30 load when it's running nothing but sshd and the normal OS processes all night.

[rayvd@backup rayvd]$ uptime
08:32:43 up 51 days, 20:46, 3 users, load average: 0.00, 0.00, 0.00

This server has a "slower" processor than my DL140 but is idle most of the day just as my new server is.
Stuart Browne
Honored Contributor

Re: DL140G2 + RHEL4 + SW RAID = system load never 0?

I disagree with you there, Alan. Even with background processes, if the server isn't in production and isn't being tested against (in other words, 'idle'), it should show 0.00/0.00/0.00.

I have production email servers that usually stay lower than 0.35, whilst under constant load!

Anyway.. ;)

Figured out which kernel-config options are causing it?
One long-haired git at your service...
Alan_152
Honored Contributor
Solution

Re: DL140G2 + RHEL4 + SW RAID = system load never 0?

Well, I didn't mean to start a fight.

Anyway, I've just installed a max rhel4u3 config on a max-configured rx2620. I'm accessing it through the MP, so it is as hands-off and no load as I can make it without actually turning stuff off.

In multiuser mode, here's what I've got:

[root@max ~]# uptime
17:18:37 up 1 min, 1 user, load average: 0.49, 0.23, 0.09

top - 17:18:51 up 2 min, 1 user, load average: 0.38, 0.22, 0.08
Tasks: 77 total, 1 running, 76 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% s
Mem: 16641696k total, 417760k used, 16223936k free, 25152k buffers
Swap: 2031584k total, 0k used, 2031584k free, 102816k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3430 root 16 0 4192 2368 1840 R 0.1 0.0 0:00.01 top
1 root 16 0 3440 1536 1232 S 0.0 0.0 0:02.62 init
2 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
4 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
5 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/1

[root@max ~]# ps -eo "%C %c"
%CPU COMMAND
0.4 init
0.0 migration/0
0.0 ksoftirqd/0
0.0 migration/1
0.0 ksoftirqd/1
0.0 migration/2
0.0 ksoftirqd/2
0.0 migration/3
0.0 ksoftirqd/3
0.0 events/0
0.0 events/1
0.0 events/2
0.0 events/3
0.0 khelper
0.0 kacpid
0.0 kblockd/0
0.0 kblockd/1
0.0 kblockd/2
0.0 kblockd/3
0.0 khubd
0.0 pdflush
0.0 pdflush
0.0 aio/0
0.0 kswapd0
0.0 aio/1
0.0 aio/2
0.0 aio/3
0.0 kseriod
0.0 scsi_eh_0
0.0 qla2400_0_dpc
0.0 scsi_eh_1
0.0 qla2400_1_dpc
0.0 scsi_eh_2
0.0 scsi_eh_3
0.0 kmirrord
0.0 kmir_mon
0.0 kjournald
0.0 udevd
0.0 kauditd
0.0 kmpathd/0
0.0 kmpathd/1
0.0 kmpathd/2
0.0 kmpathd/3
0.0 dhclient
0.0 syslogd
0.0 klogd
0.0 portmap
0.0 rpc.statd
0.0 rpc.idmapd
0.0 smartd
0.0 acpid
0.0 cupsd
0.0 sshd
0.0 xinetd
0.0 sendmail
0.0 sendmail
0.0 gpm
0.0 htt
0.0 htt_server
0.2 cannaserver
0.0 crond
0.0 xfs
0.0 anacron
0.0 atd
0.0 salinfod
0.0 dbus-daemon-1
0.0 cups-config-dae
0.0 hald
0.0 login
0.0 mingetty
0.0 mingetty
0.0 mingetty
0.0 mingetty
0.0 mingetty
0.0 mingetty
0.0 bash
0.0 ps

If I wanted to improve my load factors, there's a bunch of stuff I could shut down there. I suspect it is the same for the OP.




Re: DL140G2 + RHEL4 + SW RAID = system load never 0?

Interesting. Well, I'd think that even with that many programs running, as long as nothing is running _intensively_ you should still see around 0.00 or 0.02, etc -- meaning no processes are having to wait to get CPU time.

In any case, it seems that disabling APIC in the kernel and also disabling SMP support (I have a uniprocessor system anyway) helps.

The only side effect of this is that /proc/interrupts shows ERR: incrementing a _lot_, but the system load numbers are more what I'd expect and the IO "pauses" are much less frequent.

I'll probably try this on a system without software RAID and see if the md driver is to blame or if it's ata_piix.
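
A note on the load numbers discussed above: on Linux the load average counts not only tasks waiting for CPU time but also tasks in uninterruptible sleep (state D, usually waiting on disk IO), so a box can show 100% idle CPU and still report a non-zero load. The sketch below is a rough way to see which tasks are in state R or D at a given instant. The function names are made up for this example, and it only looks at /proc/<pid>/stat, so individual threads of a multi-threaded process are not listed separately.

```python
import os


def parse_loadavg(text):
    """Parse the content of /proc/loadavg into (1min, 5min, 15min) floats."""
    fields = text.split()
    return tuple(float(x) for x in fields[:3])


def parse_stat_state(stat_line):
    """Return (command, state) from one /proc/<pid>/stat line."""
    # The command sits in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    head, _, tail = stat_line.rpartition(")")
    command = head.split("(", 1)[1]
    state = tail.split()[0]
    return command, state


def load_contributors(proc_root="/proc"):
    """List (pid, command, state) for tasks currently in state R or D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                command, state = parse_stat_state(f.read())
        except (OSError, IndexError):
            continue  # the process exited while we were scanning
        if state in ("R", "D"):
            found.append((int(entry), command, state))
    return found
```

Running load_contributors() a few times during one of the IO "pauses" should show whether the same task keeps turning up in state D. A single snapshot can easily miss short waits, so repeated sampling is more telling than one call.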
Steven E. Protter
Exalted Contributor

Re: DL140G2 + RHEL4 + SW RAID = system load never 0?

Kernel change fixed it. Interesting.

Part of the standard ProLiant install process involves a kernel change if you have a fiber connection.

If you'd care to share the customizations and changes, it would make this a valuable historical thread.

You have relatively few posts so I'll ask you to assign points to responses based on usefulness. If you found my advice useless, please take the time to assign me zero points.

TY

SEP

Re: DL140G2 + RHEL4 + SW RAID = system load never 0?

I'll probably do some more testing this evening but will definitely put together a summary of my changes -- or at least the kernel config I am using.

I still want to test if these changes have the same effect on my stock Red Hat kernel if I rebuild it.

And yeah, sorry I need to remember to use the points system on here :-)

Re: DL140G2 + RHEL4 + SW RAID = system load never 0?

Well, just a follow-up. The issue actually hasn't completely gone away -- even with 2.6.16 like I thought it had.

There seems to be an excessively high number of interrupts (1000/sec to the IO-APIC timer vs 100/sec on a production DL140G1).

I've posted about this on the Red Hat mailing list:

https://www.redhat.com/archives/redhat-list/2006-April/msg00112.html
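
For anyone wanting to compare interrupt rates between two machines the same way, the per-second figures can be derived by taking two snapshots of /proc/interrupts and dividing the difference by the interval. The following is a minimal sketch under the assumption that the file has its usual layout (a header row naming the CPUs, then one "label: counts... description" row per interrupt source). The function names are invented for this example.

```python
import time


def parse_interrupts(text):
    """Map each /proc/interrupts row label to
    (total count across CPUs, description)."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        tokens = rest.split()
        counts = []
        # Rows such as ERR carry fewer counters than there are CPUs,
        # so stop at the first non-numeric token.
        for tok in tokens[:ncpu]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        description = " ".join(tokens[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def rates(before, after, seconds):
    """Interrupts per second for each label present in both snapshots."""
    return {
        label: (after[label][0] - before[label][0]) / seconds
        for label in after
        if label in before
    }


def sample_rates(seconds=5.0, path="/proc/interrupts"):
    """Take two snapshots `seconds` apart and return per-label rates."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return rates(before, after, seconds)
```

Calling sample_rates() on both the G1 and the G2 gives figures that can be compared directly, including the rate at which the ERR counter mentioned earlier in the thread is climbing.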