Extremely slow io on cciss raid6

 
Ulrik Holmén
Advisor

I've installed RHEL 5.1 on a DL320S server with a Smart Array P400 controller and 6 SATA disks in a RAID 6 (ADG) setup. The write speed is terrible: I normally get about 8 MB/s, which is not what I expect from such hardware.

I've tried different kernels and parameters to increase the speed. That has helped the read speed, which is now at a sustained rate of about 200 MB/s as long as no writes occur during the read. As soon as a write occurs, the read speed drops radically.

I've noticed that a lot of people seem to have the same problem, but so far I haven't seen any good solutions apart from replacing the array controller. The iowait is above 90% while writing to the disk, and this makes the whole system incredibly slow. Just listing the files in a directory can take 20 s due to the iowait.

I'm running the latest of everything now (firmware, kernel, etc.), but the problem is still there. I've tried the cciss.sf.net driver and the vanilla kernel driver. All the same.

System information:

uname:
Linux someserver 2.6.25 #1 SMP Wed Jun 11 21:21:21 CEST 2008 i686 i686 i386 GNU/Linux

from dmesg:
HP CISS Driver (v 3.6.14)
ACPI: PCI Interrupt 0000:0a:00.0[A] -> GSI 16 (level, low) -> IRQ 16
cciss0: <0x3230> at PCI 0000:0a:00.0 IRQ 217 using DAC
blocks= 4294967296 block_size= 512
blocks= 5860333808 block_size= 512
heads=255, sectors=32, cylinders=718179

blocks= 5860333808 block_size= 512
heads=255, sectors=32, cylinders=718179

cciss/c0d0: p1 p2

/proc/interrupts:
CPU0 CPU1
0: 255 0 IO-APIC-edge timer
1: 8 0 IO-APIC-edge i8042
3: 1 0 IO-APIC-edge
4: 2 0 IO-APIC-edge
8: 3 0 IO-APIC-edge rtc
9: 0 0 IO-APIC-fasteoi acpi
12: 131 0 IO-APIC-edge i8042
21: 990599 0 IO-APIC-fasteoi uhci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, ehci_hcd:usb6
22: 22324 0 IO-APIC-fasteoi ipmi_si
23: 166 0 IO-APIC-fasteoi uhci_hcd:usb5
215: 5802 2005 PCI-MSI-edge eth0
217: 512723 0 PCI-MSI-edge cciss0
NMI: 0 0 Non-maskable interrupts
LOC: 3144939 3144944 Local timer interrupts
RES: 1045 34959 Rescheduling interrupts
CAL: 209 653 function call interrupts
TLB: 445 478 TLB shootdowns
TRM: 0 0 Thermal event interrupts
SPU: 0 0 Spurious interrupts
ERR: 0
MIS: 0

/proc/driver/cciss/cciss0:
cciss0: HP Smart Array P400 Controller
Board ID: 0x3234103c
Firmware Version: 4.12
IRQ: 217
Logical drives: 1
Current Q depth: 0
Current # commands on controller: 16
Max Q depth since init: 19
Max # commands on controller since init: 24
Max SG entries since init: 31
Sequential access devices: 0

cciss/c0d0: 3000.49GB RAID ADG

/sys/block/cciss\!c0d0/queue/read_ahead_kb:
128

/sys/block/cciss\!c0d0/queue/max_sectors_kb:
512

vmstat -a 1 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
 0  9    128  13176 931408  57280    0    0  5938  2580  245 1682  3  2  5 90  0
 1  9    128  13124 931260  57268    0    0  3848 20088  455 3236  2  3  0 95  0
 0  9    128  13124 931260  57268    0    0     0  1516   65 2131  0  0  0 100  0
 0  6    128  13044 931548  57268    0    0  8712  3564  667 4246  5  3  0 93  0
 1  4    128  13208 931564  57268    0    0     0  6216   66 2084  0  1  0 99  0
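
For anyone who wants to summarise a vmstat sample like this one, here is a minimal Python sketch. The rows are copied from the output above; the helper names (`parse_vmstat`, `mean`) are made up for illustration, and the conversion of `bo` to MiB/s assumes that vmstat counts 1024-byte blocks, as its man page states for Linux.

```python
# Data rows copied from the "vmstat -a 1 5" output above.
VMSTAT_ROWS = """\
0 9 128 13176 931408 57280 0 0 5938 2580 245 1682 3 2 5 90 0
1 9 128 13124 931260 57268 0 0 3848 20088 455 3236 2 3 0 95 0
0 9 128 13124 931260 57268 0 0 0 1516 65 2131 0 0 0 100 0
0 6 128 13044 931548 57268 0 0 8712 3564 667 4246 5 3 0 93 0
1 4 128 13208 931564 57268 0 0 0 6216 66 2084 0 1 0 99 0
"""

# Column order taken from the vmstat -a header line.
COLUMNS = "r b swpd free inact active si so bi bo in cs us sy id wa st".split()


def parse_vmstat(text):
    """Turn whitespace-separated vmstat rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == len(COLUMNS):
            rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


rows = parse_vmstat(VMSTAT_ROWS)
avg_wa = mean(r["wa"] for r in rows)       # % of CPU time waiting on I/O
avg_blocked = mean(r["b"] for r in rows)   # processes in uninterruptible sleep
avg_bo = mean(r["bo"] for r in rows)       # blocks written per second

print(f"average iowait:        {avg_wa:.1f} %")
print(f"average blocked procs: {avg_blocked:.1f}")
# Assumption: 1 block = 1024 bytes, so bo / 1024 is MiB/s.
print(f"average write rate:    {avg_bo / 1024:.1f} MiB/s")
```

Over these five samples the iowait averages about 95% while the write rate stays in the single-digit MiB/s range, which fits the roughly 8 MB/s described above.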

Any ideas apart from changing the array adapter?

fschicker
Advisor

Re: Extremely slow io on cciss raid6

hi ulrik,

we have the same problem!
have you found a solution yet?
we have the issue on different HP servers with different P400 controllers, every time the same...

my questions:
- what hardware revision is your controller? (lspci output)
- did you try putting the controller in another pci-x slot?
- what is the output of "lshw" for the pci slot the controller is in?

My Post: http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1240003

thank you!

i hope we can solve this problem :(

greets
Ivan Ferreira
Honored Contributor

Re: Extremely slow io on cciss raid6

I would not expect high performance from a RAID 6 configuration on a small controller. Do you have a write-back cache? How many disks do you have? Have you considered using RAID 5 + spare instead of RAID 6? Which performance testing tool do you use? What block size is used?

Can you create a RAID 0 with 1 disk, and then with all disks, for performance testing purposes? That way you could identify the performance of a single disk and of all disks in a stripe configuration, and then compare both with the RAID 6 performance.
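
To put rough numbers on the RAID 6 versus RAID 5 + spare question, here is a small Python sketch. It uses the usual textbook write penalties for uncached small random writes (4 back-end I/Os per write for RAID 5, 6 for RAID 6). The 750 GB disk size is inferred from the 3000.49 GB logical drive on 6 disks, and the 80 IOPS per disk is purely a placeholder assumption for a SATA drive, not a measured value.

```python
DISKS = 6
DISK_GB = 750        # assumption: 3000.49 GB logical drive / 4 data disks
PER_DISK_IOPS = 80   # assumption: placeholder figure for one SATA disk

# Parity disks per array and textbook back-end I/Os per small random write.
PARITY_DISKS = {"RAID 0": 0, "RAID 5": 1, "RAID 6": 2}
WRITE_PENALTY = {"RAID 0": 1, "RAID 5": 4, "RAID 6": 6}


def usable_gb(level, disks, disk_gb, spares=0):
    """Usable capacity after removing hot spares and parity."""
    return (disks - spares - PARITY_DISKS[level]) * disk_gb


def random_write_iops(level, disks, per_disk_iops, spares=0):
    """Rough uncached random-write IOPS for the whole array."""
    return (disks - spares) * per_disk_iops / WRITE_PENALTY[level]


for label, level, spares in [("RAID 6", "RAID 6", 0),
                             ("RAID 5 + spare", "RAID 5", 1)]:
    cap = usable_gb(level, DISKS, DISK_GB, spares)
    iops = random_write_iops(level, DISKS, PER_DISK_IOPS, spares)
    print(f"{label:15s} usable {cap} GB, ~{iops:.0f} random write IOPS")
```

With 6 disks both layouts give the same usable capacity, and RAID 5 + spare comes out somewhat ahead on small random writes. This is only a model of uncached random writes; it says nothing about sequential throughput or about what the controller cache does.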

Use iostat -x to identify the "service time" (svctm) in each situation. Post your results.
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
fschicker
Advisor

Re: Extremely slow io on cciss raid6

hi ivan,

it's not an issue with the raid level.
we tried it on about 6 servers, with raid 1 / 5 and 6, every time the same.

please read the post i linked; i think it shows the issue a little better.
Ivan Ferreira
Honored Contributor

Re: Extremely slow io on cciss raid6

I see in your test that you ran it over a file system. You should run your tests on the raw device. Was this FS ext3? Journaling enabled? For filesystem tests, use Iozone or Bonnie.

What would the performance be on a single disk?

A large block size won't always be better.
fschicker
Advisor

Re: Extremely slow io on cciss raid6

hi ivan,

thanks for your answer.

i know the possibilities of tweaking block sizes and filesystems, but i think a write speed of 8 MB/s has its reasons somewhere else :)

i can't start bonnie because the server gets too much load and the services on it go offline if i write too much to the disk.
Ivan Ferreira
Honored Contributor

Re: Extremely slow io on cciss raid6

Then let's wait for Ulrik Holmén's results.
fschicker
Advisor

Re: Extremely slow io on cciss raid6

Hi Ivan,

Now i was able to run bonnie and some other tests.

Here are my results:

- directly to the disk, without ext3:

sync; time sh -c "dd if=/dev/zero of=/dev/cciss/c0d0p3 bs=1024k count=1000; sync"
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 677.96 seconds, 1.5 MB/s

real 11m18.088s
user 0m0.000s
sys 0m2.584s

- bonnie:

bonnie -b -s 1100 -d /tmp/ -u root
Using uid:0, gid:0.
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
our.server.na 1100M  7366  15  4572   0  3456   0 26078  53 76173   5 135.3   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   396   0 +++++ +++   263   0   227   0 +++++ +++  1564   2
our.server.name,1100M,7366,15,4572,0,3456,0,26078,53,76173,5,135.3,0,16,396,0,+++++,+++,263,0,227,0,+++++,+++,1564,2

i don't know bonnie very well, but it doesn't look good. the server had a load of 0.0 before.
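
For reading the bonnie result, a short Python sketch that pulls the throughput figures out of the CSV line above. The field names are labels chosen here to follow the order of the column headers in the table; they are not names defined by bonnie itself, and only the first 14 fields (the sequential and seek results) are used.

```python
# CSV summary line copied from the bonnie output above.
CSV_LINE = (
    "our.server.name,1100M,7366,15,4572,0,3456,0,26078,53,76173,5,135.3,0,"
    "16,396,0,+++++,+++,263,0,227,0,+++++,+++,1564,2"
)

# Labels for the first 14 fields, following the header order of the table.
# These names are illustrative only.
FIELDS = [
    "machine", "size",
    "out_char_kps", "out_char_cpu",
    "out_block_kps", "out_block_cpu",
    "rewrite_kps", "rewrite_cpu",
    "in_char_kps", "in_char_cpu",
    "in_block_kps", "in_block_cpu",
    "seeks_per_s", "seeks_cpu",
]


def parse_bonnie(line):
    """Map the leading CSV fields to labels and convert numbers to float."""
    result = dict(zip(FIELDS, line.split(",")))
    for key in FIELDS[2:]:
        result[key] = float(result[key])
    return result


r = parse_bonnie(CSV_LINE)
write_mib = r["out_block_kps"] / 1024
read_mib = r["in_block_kps"] / 1024
ratio = r["in_block_kps"] / r["out_block_kps"]

print(f"block write: {write_mib:.1f} MiB/s")
print(f"block read:  {read_mib:.1f} MiB/s")
print(f"read/write:  {ratio:.1f}x")
```

Block writes come out at roughly 4.5 MiB/s against roughly 74 MiB/s for block reads, so the write path is the part that stands out.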

greets,
florian
Ulrik Holmén
Advisor

Re: Extremely slow io on cciss raid6

I've tried both the raw disk and ext3, and the problem is not related to fs issues, as I've seen suggested elsewhere for the same problem.

I know I cannot expect the speed of lightning from RAID 5/6, but more than 8 MB/s is not too much to expect. The speed is actually not the biggest issue. The frustrating problem is that the server is totally locked up while writing to disk. The server is going to be a slave database server, but that is simply not possible with the current performance.

To test the performance I run:

read:

time dd of=/dev/zero if=/dev/mapper/VolGroup00-test bs=1M count=3000
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB) copied, 15.6588 seconds, 201 MB/s

real 0m15.713s
user 0m0.005s
sys 0m4.264s

write:

time dd if=/dev/zero of=/dev/mapper/VolGroup00-test bs=1M count=3000
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB) copied, 426.12 seconds, 7.4 MB/s

real 7m6.139s
user 0m0.003s
sys 0m4.418s
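
As a cross-check of the two runs, a small Python sketch that recomputes the throughput from the byte count and elapsed time in dd's summary line. The regular expression only targets the summary format shown above ("N bytes (...) copied, T seconds, ..."); other dd versions may print it differently, so treat the pattern as an assumption.

```python
import re

# Matches the dd summary format shown above; other dd versions may differ.
DD_SUMMARY = re.compile(r"(\d+) bytes \(.*?\) copied, ([\d.]+) seconds")


def dd_throughput_mb_s(summary):
    """Return throughput in decimal MB/s from a dd summary line."""
    match = DD_SUMMARY.search(summary)
    if match is None:
        raise ValueError(f"not a dd summary line: {summary!r}")
    nbytes = int(match.group(1))
    seconds = float(match.group(2))
    return nbytes / seconds / 1e6


read_line = "3145728000 bytes (3.1 GB) copied, 15.6588 seconds, 201 MB/s"
write_line = "3145728000 bytes (3.1 GB) copied, 426.12 seconds, 7.4 MB/s"

read_mb = dd_throughput_mb_s(read_line)
write_mb = dd_throughput_mb_s(write_line)

print(f"read:  {read_mb:.1f} MB/s")
print(f"write: {write_mb:.1f} MB/s")
print(f"reads are about {read_mb / write_mb:.0f}x faster than writes")
```

The recomputed figures agree with what dd printed, and reads come out about 27 times faster than writes on the same volume.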

Ulrik Holmén
Advisor

Re: Extremely slow io on cciss raid6

While writing (The LVM is located on cciss/c0d0p2):

Device:        rrqm/s   wrqm/s   r/s    w/s  rMB/s  wMB/s avgrq-sz avgqu-sz    await  svctm  %util
cciss/c0d0       0.00  1440.00  0.00  51.50   0.00   6.24   248.00   144.74  2835.11  19.43 100.05
cciss/c0d0p1     0.00     0.00  0.00   0.00   0.00   0.00     0.00     0.00     0.00   0.00   0.00
cciss/c0d0p2     0.00  1440.00  0.00  51.50   0.00   6.24   248.00   144.74  2835.11  19.43 100.05
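
The iostat columns can be checked against each other with a little arithmetic. The Python sketch below uses the values from the cciss/c0d0 row above. It assumes that avgrq-sz is given in 512-byte sectors and that wMB/s is in units of 2^20 bytes, which is how I understand sysstat reports them; the queue estimate is Little's law (queue length = arrival rate x time in system).

```python
SECTOR_BYTES = 512  # assumption: avgrq-sz is counted in 512-byte sectors

# Values from the cciss/c0d0 row above.
w_per_s = 51.50      # write requests completed per second
avgrq_sz = 248.00    # average request size in sectors
avgqu_sz = 144.74    # reported average queue length
await_ms = 2835.11   # average time per request, queueing included
svctm_ms = 19.43     # average service time per request


def write_mib_per_s(requests_per_s, sectors_per_request):
    """Throughput implied by request rate and request size."""
    return requests_per_s * sectors_per_request * SECTOR_BYTES / 2**20


def littles_law_queue(requests_per_s, await_milliseconds):
    """Expected queue length: arrival rate times time in system."""
    return requests_per_s * await_milliseconds / 1000.0


throughput = write_mib_per_s(w_per_s, avgrq_sz)
queue = littles_law_queue(w_per_s, await_ms)
queued_share = 1 - svctm_ms / await_ms

print(f"implied write rate: {throughput:.2f} MiB/s (iostat shows 6.24)")
print(f"implied queue:      {queue:.1f} (iostat shows {avgqu_sz})")
print(f"time spent queued:  {queued_share:.1%} of each request")
```

The implied numbers line up with the reported ones to within about 1%, and they show that each request spends over 99% of its roughly 2.8 seconds waiting in the queue rather than being serviced.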
Ulrik Holmén
Advisor

Re: Extremely slow io on cciss raid6

I solved the problem by forcing the write cache on. It seems I forgot to order the BBWC, so the write cache was disabled by default. I have to order a BBWC to make sure I don't break anything in case of a power failure.

ctrl slot=2 modify drivewritecache=enable

The difference was quite astonishing:

sync; time dd if=/dev/zero of=/dev/mapper/VolGroup00-test bs=1M count=3000; sync
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB) copied, 35.2907 seconds, 89.1 MB/s

real 0m35.292s
user 0m0.008s
sys 0m4.891s
Jon Gomersall
Advisor

Re: Extremely slow io on cciss raid6

fschicker
Advisor

Re: Extremely slow io on cciss raid6

dear ulrik,

i would NOT recommend enabling DWC because you will lose data on power loss. the BBWC doesn't help here because DWC is the write cache directly on the disks.
it's interesting that this helps. we have disabled the disk write cache on all our servers (where we don't use the p400) and get the performance you have now (100-120 MB/s).
i think enabling DWC helps, but it ISN'T the solution.

see what HP says under: http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c01149818&lang=en&cc=us&taskId=101&prodSeriesId=3369549&prodTypeId=329290

greets
Ulrik Holmén
Advisor

Re: Extremely slow io on cciss raid6

No, I think you're right. It wasn't the real solution, but it mitigates the problem with iowait. I have now ordered a battery backup for the controller, so soon I will feel safe with the solution as well.

I've read a lot of reports regarding what I think is the same problem with solutions ranging from BBWC to different fs.



fschicker
Advisor

Re: Extremely slow io on cciss raid6

dear ulrik,

i think it's not a problem with the bbwc.
we have been searching for the error for about a week; we have BBWC in our DL320s and the problem occurs there too.

i guess it is one of the following:

- a problem with DMA handling. i looked at the cciss driver source, and there is a case with the P600 where problems occur
- the hardware revision of the p400. we have a customer who has the same dl320s but a newer P400 HW rev, and he does NOT have the problems we have
- a high memory irq conflict or a shared pci-x slot

can you send me some information of your system:

- hpaducli -f result
- hpacucli output from "controller slot=2 show config detail"
- lspci -v and lshw output

thank you!

Florian
Ulrik Holmén
Advisor

Re: Extremely slow io on cciss raid6

Output from HP Array Configuration Utility CLI 8.0-14.0
Ulrik Holmén
Advisor

Re: Extremely slow io on cciss raid6

Output from hpaducli
Ulrik Holmén
Advisor

Re: Extremely slow io on cciss raid6

Output from lspci
Ulrik Holmén
Advisor

Re: Extremely slow io on cciss raid6

Output from lshw.

Here you go.

I have also been through the driver a couple of times while troubleshooting the issue, and I saw the P600 failure in DMA prefetch. But I'm convinced that it is not the same issue we have with the P400, since DMA prefetching from locations outside of memory would result in more than just slow write access :).
fschicker
Advisor

Re: Extremely slow io on cciss raid6

thanks for your files. everything looks normal, i think.

my questions:

- are you using original hp disks?
- did you try to upgrade to latest fw?

Ulrik Holmén
Advisor

Re: Extremely slow io on cciss raid6

No. At the moment they are not original disks, but we still have the original disks and the same performance problem was there with them as well. We actually changed them because we thought the speed issue was somehow connected to them.

Yes. We have upgraded the firmware in every piece of the server. We ran the latest firmware upgrade CD, FW810, and it did upgrade the controller, but the problem still remains.
Martin Wozenilek
New Member

Re: Extremely slow io on cciss raid6

Same problem still here with SLES 10.1. This is a real problem, as several VMware instances are running on this host. If you have had better experience with different HP controllers under Linux: which one would you advise? I would change the controller and throw the P400 in the trash. Or are there any other solutions? ;)

Bye!
fschicker
Advisor

Re: Extremely slow io on cciss raid6

hi,

please open a case at hp. i think it's the only way to solve this problem. we got a controller with a newer hardware revision, and after that we got 50 MB/s when writing.

but this "solution" only worked in our DL320S, not in the DL320 or 160/180 :(
beovax
Advisor

Re: Extremely slow io on cciss raid6

Did anyone manage to resolve this? We have 2x DL185 G5 servers, both with 512 MB BBWC. The cache ratio is set to 50/50, with no cache enabled on the physical disks (we can't afford to lose any data).

The servers are running ESX, which isn't supported by VMware, so I can't log any calls there.

HP support is useless. I had a call open for 6 months regarding the shared iLO and had to reopen it 4 times. They finally admitted the shared iLO is rubbish (sorry, I'm getting off the point here).

The performance we get is about 15-20 MB/s, which isn't too great.

If anyone has any ideas it would be very helpful. I suspect we need a new hardware revision or a magical firmware update.
HermanSmit
New Member

Re: Extremely slow io on cciss raid6

I've found the solution to our problem with low sequential write speed.
The servers: HP ProLiant ML350 G6
Controller: HP Smart Array P410i, 256 MB.

The write speeds of our three servers were 5 MB/s, 3 MB/s and 1.5 (!) MB/s.
Terribly low.

The three servers don't have a battery, so the write cache is deactivated.

Another server that has a battery (BBWC) has write speeds of 160 MB/s.

I tested that server with the cache set to 100% read / 0% write. The write speed then dropped to 5 MB/s!

We have to buy 3 batteries for the other 3 servers; then the write cache can be activated with the setting 25% read / 75% write. This will solve the problem for us.