

 
Tarek
Super Advisor

Raid Error Redhat EL AS 3 upd 3

Hi,
From time to time my Linux server goes to 100% iowait, and I'm going crazy trying to understand why.
The only way to recover is to reboot the system with the -n flag.
This server is a Lotus Notes Domino server, version 6.5.4.
The hardware is an IBM xSeries 255 with 2 XEON(TM) MP CPUs at 1.90GHz and 3GB RAM.
The system is connected to IBM FASTT storage via QLogic Fibre Channel.
The 100% iowait is on the partition that contains the Notes databases.
I think the problem may be one of:
- something hanging in Notes (unfortunately there's no Notes admin)
- the RAID controller
Today I saw this error in the messages log:
Nov 14 04:03:16 bw-noteslx1 kernel: 467 [RAIDarray.mpp]cannot get memory for mppLnx_CmndEntry_t
Nov 14 04:03:19 bw-noteslx1 last message repeated 18 times
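
To see how often the driver logs this error, the collapsed "last message repeated N times" lines need to be added to the count. A minimal Python 3 sketch, assuming the log uses the format quoted above (the function and variable names are illustrative, not part of any tool):

```python
import re

# Error text from the MPP driver message quoted in the post.
ERROR_TEXT = "cannot get memory for mppLnx_CmndEntry_t"
# syslogd collapses consecutive duplicates into a summary line like this.
REPEAT_RE = re.compile(r"last message repeated (\d+) times?")

# The two log lines from the post, used as sample input.
SAMPLE = [
    "Nov 14 04:03:16 bw-noteslx1 kernel: 467 [RAIDarray.mpp]cannot get memory for mppLnx_CmndEntry_t",
    "Nov 14 04:03:19 bw-noteslx1 last message repeated 18 times",
]


def count_mpp_errors(lines):
    """Count occurrences of the error, including collapsed repeats."""
    total = 0
    previous_was_error = False
    for line in lines:
        if ERROR_TEXT in line:
            total += 1
            previous_was_error = True
            continue
        match = REPEAT_RE.search(line)
        if match:
            # A summary line refers to the message logged just before it.
            if previous_was_error:
                total += int(match.group(1))
        else:
            previous_was_error = False
    return total


print(count_mpp_errors(SAMPLE))  # prints 19 (1 logged + 18 repeats)
```

The same function accepts an open file object, so it can be run over the whole messages log to see whether the errors cluster around the times the server stalls.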

When this happens, I can't tell which process is causing the 100% iowait.

I know for sure that the system needs more RAM, but I don't think that is what's causing the iowait.
/proc/meminfo shows:
[root@bw-noteslx1 log]# cat /proc/meminfo
total: used: free: shared: buffers: cached:
Mem: 3982761984 3964911616 17850368 0 55107584 3659128832
Swap: 8389574656 754298880 7635275776
MemTotal: 3889416 kB
MemFree: 17432 kB
MemShared: 0 kB
Buffers: 53816 kB
Cached: 3468024 kB
SwapCached: 105344 kB
Active: 2953868 kB
ActiveAnon: 841044 kB
ActiveCache: 2112824 kB
Inact_dirty: 563640 kB
Inact_laundry: 109392 kB
Inact_clean: 59844 kB
Inact_target: 737348 kB
HighTotal: 3047192 kB
HighFree: 1172 kB
LowTotal: 842224 kB
LowFree: 16260 kB
SwapTotal: 8192944 kB
SwapFree: 7456324 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
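
A few ratios make the dump above easier to read. The sketch below (Python 3, with illustrative names) parses the "Name: value kB" lines and computes the share of RAM used as page cache, the free share of low memory, and swap in use. Whether the small LowFree figure is related to the driver's "cannot get memory" message is an assumption, not something this output proves.

```python
# Subset of the /proc/meminfo output from the post.
SAMPLE = """\
total: used: free: shared: buffers: cached:
Mem: 3982761984 3964911616 17850368 0 55107584 3659128832
MemTotal: 3889416 kB
MemFree: 17432 kB
Cached: 3468024 kB
LowTotal: 842224 kB
LowFree: 16260 kB
SwapTotal: 8192944 kB
SwapFree: 7456324 kB
HugePages_Total: 0
"""


def parse_meminfo(text):
    """Return {name: kB} for lines of the form 'Name: <number> kB'."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip the 2.4-style summary rows and unitless counters.
        if sep and len(fields) == 2 and fields[1] == "kB" and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


mem = parse_meminfo(SAMPLE)
cached_pct = 100.0 * mem["Cached"] / mem["MemTotal"]
low_free_pct = 100.0 * mem["LowFree"] / mem["LowTotal"]
swap_used_kb = mem["SwapTotal"] - mem["SwapFree"]

print("page cache: %.1f%% of RAM" % cached_pct)
print("low memory free: %.2f%%" % low_free_pct)
print("swap in use: %d kB" % swap_used_kb)
```

Most of the "used" memory here is page cache, which the kernel can reclaim, so a small MemFree on its own is normal. The low-memory zone is the part a 32-bit kernel can address directly, which is why LowFree is worth watching separately.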

Any ideas? How can I proceed?

Thanks in advance
5 REPLIES
Tarek
Super Advisor

Re: Raid Error Redhat EL AS 3 upd 3

Hi,
While the system was at 100% iowait, these were the processes consuming the most CPU:
* updall (domino): two processes; one in state D, uninterruptible sleep (usually IO), the other in state R, runnable (on run queue)
* [catalog ] (domino): in state Z, a defunct ("zombie") process
* [kswapd]: in state S
So I need to understand why that process became a zombie; maybe it was caused by an IO problem (the process in D state).

Is this a reasonable troubleshooting approach?
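
To catch the blocked processes while the stall is happening, the state field of each /proc/<pid>/stat file can be sampled directly. This is a rough Python 3 sketch with illustrative names and made-up PIDs in the comments; it assumes a Python 3 interpreter is available, which a RHEL 3 install would most likely not have by default.

```python
import os


def parse_stat(text):
    """Split a /proc/<pid>/stat record into (pid, command, state)."""
    # The command is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    command = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, command, state


def processes_in_states(states=("D", "Z"), proc_root="/proc"):
    """List (pid, command, state) for processes in the given states."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, command, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # the process exited while we were scanning
        if state in states:
            found.append((pid, command, state))
    return found


# Hypothetical record, e.g. "4711 (updall) D 1 4711 ..." gives
# (4711, "updall", "D").
if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, command, state in processes_in_states():
        print(pid, command, state)
```

Running this every few seconds during an incident shows which processes stay in D state, which is a more direct pointer to the stuck IO than CPU usage.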

Thanks.
Steven E. Protter
Exalted Contributor

Re: Raid Error Redhat EL AS 3 upd 3

Shalom,

Seems Notes is exceeding the I/O capacity of the system. The system can be tuned to provide more capacity, but the real issue is configuring Notes to be a bit less I/O intensive.

Notes needs to be seriously looked at.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Tarek
Super Advisor

Re: Raid Error Redhat EL AS 3 upd 3

Hi,
I know there's too much IO on the system. During the night we compact and archive the Notes databases, which is why there is so much IO.

Our system has two QLogic Fibre Channel adapters (qla2300):
QLogic PCI to Fibre Channel Host Adapter for QLA2340 : Firmware version: 3.03.11, Driver version 7.05.00

We use IBM multipath (the Linux MPP driver), version 09.00.A5.00.

I'd like to know if there's a way to better tune the IO parameters related to MPP and Fibre Channel.

Thanks again
Tarek
Super Advisor

Re: Raid Error Redhat EL AS 3 upd 3

When the problem happened, this was the iostat of the server:
Device:     rrqm/s  wrqm/s     r/s     w/s  rsec/s  wsec/s   rkB/s   wkB/s avgrq-sz avgqu-sz   await   svctm   %util
/dev/sda     12.79   29.81   83.69   66.89  759.18  763.63  379.59  381.81    10.11     0.09    0.06    0.07    1.12
/dev/sda1     0.09    0.00    0.01    0.00    0.19    0.00    0.10    0.00    30.18     0.00    9.08    4.16    0.00
/dev/sda2     1.70    0.83    0.29    0.91    3.99    3.48    1.99    1.74     6.21     0.13   10.92    5.74    0.69
/dev/sda3     1.96    0.10    0.42    0.15   18.98    2.03    9.49    1.01    36.94     0.16   27.68   16.12    0.92
/dev/sda5     1.21    0.42    0.14    0.73   10.81    9.13    5.40    4.57    23.09     0.12   14.29    8.76    0.76
/dev/sda6     0.00    0.03    0.01    0.04    0.05    0.54    0.03    0.27    12.50     0.02   45.71   40.50    0.19
/dev/sda7     2.05    0.30    0.46    0.35   20.01    5.22   10.01    2.61    31.26     0.10   12.20   15.47    1.25
/dev/sda8     0.00   22.20   81.95   64.30  655.59  692.45  327.79  346.23     9.22     0.10    0.07    0.00    0.07
/dev/sda9     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    19.05     0.00    4.29    4.29    0.00
/dev/sda10    5.77    5.94    0.42    0.41   49.50   50.78   24.75   25.39   120.97     0.14   16.80    7.33    0.61
/dev/sdb      0.00    0.00    0.00    0.00    0.01    0.00    0.01    0.00    25.64     0.00   78.55   18.78    0.00
/dev/sdb1     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    12.80     0.00    1.50    1.50    0.00
/dev/sdb2     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    12.80     0.00    2.50    2.50    0.00
/dev/sdc      0.00    0.00    0.00    0.00    0.01    0.00    0.00    0.00    28.48     0.00   95.63   22.25    0.00
/dev/sdc1     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    10.67     0.00    2.50    2.50    0.00
/dev/sdd    430.53  125.35   92.26   45.90  769.65  853.20  384.82  426.60    11.75     0.01    0.12    0.01    0.18
/dev/sdd1   430.53  125.35   92.26   45.90  769.63  853.20  384.81  426.60    11.75     0.01    0.12    0.01    0.18
/dev/sde    318.66   10.54   33.70    4.44  853.20  120.27  426.60   60.14    25.52     0.13    0.33    0.14    0.54
/dev/sde1   318.66   10.54   33.70    4.44  853.20  120.27  426.60   60.14    25.52     0.13    0.33    0.14    0.54

Please help me read this output.
Thanks
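
One way to start reading extended iostat output is to turn each row into named fields and rank the whole disks by throughput. The Python 3 sketch below does that for a few rows copied from the post; the function names are illustrative. Note that the first report iostat prints covers averages since boot, so if that is what was captured (every %util here is under 2%), the figures describe typical load rather than the stall itself. Running `iostat -x 5` during an incident and reading the later reports would be more telling.

```python
# A few rows copied from the iostat output in the post.
SAMPLE = """\
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
/dev/sda 12.79 29.81 83.69 66.89 759.18 763.63 379.59 381.81 10.11 0.09 0.06 0.07 1.12
/dev/sda8 0.00 22.20 81.95 64.30 655.59 692.45 327.79 346.23 9.22 0.10 0.07 0.00 0.07
/dev/sdd 430.53 125.35 92.26 45.90 769.65 853.20 384.82 426.60 11.75 0.01 0.12 0.01 0.18
/dev/sdd1 430.53 125.35 92.26 45.90 769.63 853.20 384.81 426.60 11.75 0.01 0.12 0.01 0.18
/dev/sde 318.66 10.54 33.70 4.44 853.20 120.27 426.60 60.14 25.52 0.13 0.33 0.14 0.54
"""


def parse_iostat(text):
    """Return {device: {column: value}} from 'iostat -x' style rows."""
    lines = [line.split() for line in text.strip().splitlines()]
    columns = lines[0][1:]  # drop the leading 'Device:' label
    table = {}
    for fields in lines[1:]:
        table[fields[0]] = dict(zip(columns, map(float, fields[1:])))
    return table


def rank_disks_by_throughput(table):
    """Whole disks (names not ending in a digit), busiest first."""
    totals = [
        (name, row["rkB/s"] + row["wkB/s"])
        for name, row in table.items()
        if not name[-1].isdigit()
    ]
    return sorted(totals, key=lambda item: item[1], reverse=True)


ranked = rank_disks_by_throughput(parse_iostat(SAMPLE))
for name, kb_per_s in ranked:
    print("%-10s %8.2f kB/s" % (name, kb_per_s))
```

In this sample the traffic is concentrated on sda (almost all of it on sda8), sdd and sde. The await and avgqu-sz columns are the ones to watch in a report taken during the stall, since they show how long requests wait and how many are queued.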
Tobias Hennes
New Member

Re: Raid Error Redhat EL AS 3 upd 3

Hello Tarek,

I was searching for the same error message, "[RAIDarray.mpp]cannot get memory for mppLnx_CmndEntry_t", and found your post in this forum.
I have the same problem: CPU iowait goes to 100% and the only way to recover is to reboot the server.
My equipment is Red Hat 3, kernel 2.4.21-20.ELsmp, on IBM Blade HS20 43 hardware.
The system is connected to an IBM FastT DS4300 storage system through the RDAC IO driver (MPP failover).

I don't know what else I can do!
Were you able to solve the problem?

Thanks for your help

kind regards

tobias hennes