Operating System - HP-UX

EMC Timeout Justifications

Hi All
I have read many instances of increasing the PV timeout on EMC disks - and have practiced this myself simply as a rule of best practice.

However, my question is this: we are experiencing I/O delays and timeouts on servers directly attached (via 1Gb optic fibre) to EMC 8x30 enclosures. I know that increasing the PV timeouts can help reduce the intermittent loss of a disk to LVM, but does increasing the timeout also have a beneficial effect on performance?

Thanks.
Ken Hubnik_2
Honored Contributor

Re: EMC Timeout Justifications

I do not believe there is a beneficial effect on performance other than eliminating the timeouts. With EMC disks the minimum timeout value should be set at 180 seconds. If you continue to experience problems you can bump it up to 240. If you still experience problems I would get with your EMC rep and let him/her look into the problem further.
Stuart Abramson_2
Honored Contributor
Solution

Re: EMC Timeout Justifications

Our EMC guy just told us 90. We used to use 180.

No impact on performance. What it means is, on a read (or write, I guess), wait this long before marking the disk/disk sector "bad". If the i/o returns satisfactorily in less than that time, just keep going. So it only has an effect when i/o is delayed past 90 or 180 seconds, which isn't that often.
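To illustrate the point above, here is a minimal sketch of why the timeout value only matters for I/Os that are already badly delayed. It is a simplified model, not how LVM or the disk driver is implemented: the function name `timed_out` and the latency figures are made up for illustration.

```python
def timed_out(latencies_s, pv_timeout_s):
    """Return the I/O latencies (in seconds) that exceed the PV timeout.

    Simplified model: an I/O that completes within the timeout is
    unaffected; only one that takes longer gets reported as a failure.
    """
    return [t for t in latencies_s if t > pv_timeout_s]


# Hypothetical latencies in seconds; most I/Os finish in well under a second.
latencies = [0.01, 0.2, 5.0, 95.0, 200.0]

flagged_at_90 = timed_out(latencies, 90)
flagged_at_180 = timed_out(latencies, 180)
print(flagged_at_90, flagged_at_180)
```

In this model, raising the timeout from 90 to 180 changes nothing for the fast I/Os. It only changes whether the 95-second request is reported as a failure.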

Stuart

James Murtagh
Honored Contributor

Re: EMC Timeout Justifications

Hi,

There are timeouts for LVM and for the disk driver, and the two are closely related. When you set the PV timeout, this is the time the driver spends trying the request before informing LVM of the failure. LVM will then switch to a PV link or inform the I/O requestor that there has been a failure. It's the application's or program's sole responsibility to deal with that, whether the request is rescheduled, cancelled, etc. If you have already set the PV timeouts to EMC's recommended 180 seconds, I would now look at your SCSI queue depth settings. The queue depth can be found using the scsictl command:

# scsictl -a /dev/rdsk/c3t6d0
immediate_report = 1; queue_depth = 8

This indicates that for this disk or LUN there can be 8 jobs queued by the driver. I have seen recommendations that the total number of jobs waiting per server should not exceed 1000. Hence if you have 250 LUNs presented to the server, a queue depth of 4 may be a good value. If there are too many jobs waiting for a specific LUN, there is more likelihood that the later jobs will not be processed in time, hence the driver issuing a timeout to LVM. If you are seeing these timeouts only on selected LUNs, this may be one option to look at. On 11i there is a dynamic tunable, scsi_max_qdepth, which can be set globally to aid this process.
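The arithmetic behind that suggestion can be sketched as follows. This is only an illustration of the rule of thumb quoted above, not an EMC or HP formula: the helper names `parse_scsictl` and `per_lun_queue_depth` are hypothetical, and the parser assumes the output has the `key = value; key = value` shape shown in the example.

```python
import re


def parse_scsictl(output):
    """Parse 'key = value; key = value' text into a dict of integers.

    Assumes the format shown in the scsictl -a example in this thread.
    """
    return {key: int(value)
            for key, value in re.findall(r"(\w+)\s*=\s*(\d+)", output)}


def per_lun_queue_depth(total_budget, lun_count):
    """Split a server-wide budget of outstanding jobs evenly across LUNs.

    Never returns less than 1, since a queue depth of 0 would be useless.
    """
    if lun_count <= 0:
        raise ValueError("lun_count must be positive")
    return max(1, total_budget // lun_count)


current = parse_scsictl("immediate_report = 1; queue_depth = 8")
lun_count = 250        # number of LUNs from the example above
total_budget = 1000    # rule-of-thumb ceiling quoted above

worst_case = current["queue_depth"] * lun_count
suggested = per_lun_queue_depth(total_budget, lun_count)
print(worst_case, suggested)
```

With a queue depth of 8 and 250 LUNs, up to 2000 jobs could be queued, which is twice the 1000 ceiling. Dividing the ceiling by the LUN count gives the depth of 4 mentioned above.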

Regards,

James.
Steve Post
Trusted Contributor

Re: EMC Timeout Justifications

So James?

So the command
scsictl -m queue_depth=4 /dev/rdsk/c#t#d#
might help lower the number of scsi timeout errors?

But if a request could not get into the queue (because it's smaller), what would happen?

steve
James Murtagh
Honored Contributor

Re: EMC Timeout Justifications

Hi Steve,

In this case the driver will return a QUEUE FULL status to the OS and the request will be re-queued. How the OS then deals with all this I'm not sure. I don't know if LVM is informed and manages it, or if the kernel handles this by taking the request off the callout queue, hence not triggering the timeout. I'll certainly look into this in more detail if you are interested, but I think the most important point is that the OS deals with it; it's not left to the application. I believe when the queue full condition occurs the driver throttles the I/O to the device until the number of requests reaches zero, then resets it.

Regards,

James.
Steve Post
Trusted Contributor

Re: EMC Timeout Justifications

thanks James.

I asked because I have a timeout error that shows up once, on one EMC disk, on random nights. The disk is mirrored, and heavily monitored by EMC support. It's not hurting anything. But I haven't had success fully eliminating the error. Of course I could change my timeout from 180 to 240 like another guy in here. But I'll pursue changing the queue depth from 8 to 4. I'll get with HP support on it. I don't want (or need) to move fast on this issue.

steve