I/O error Disk and timeout issue

caj

Hi,

I have five BL460c G7 servers with an EVA6400 and a Virtual Connect (VC) fabric with two Brocade 8/40 switches.

Each VC module has two 8 Gb links to each switch, and all four ports on each controller are cabled as per the standard.

OS: RHEL 5u6

Whenever there is heavy I/O, and sometimes (about one try in ten) even when I run a simple "multipath -ll" or "pvs", the command hangs for a while and the error below appears on the console and in the messages file. Does anyone have an idea what has gone wrong?

Jan 19 03:15:54 testkernel: INFO: task mpath_prio_alua:23084 blocked for more than 120 seconds.
Jan 19 03:15:54 testkernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 19 03:15:54 testkernel: mpath_prio_al D ffffffff80153806 0 23084 8051 (NOTLB)
Jan 19 03:15:54 testkernel: ffff810f6b2f9a28 0000000000000086 ffff81080a6a2080 ffff81080bdbe4f8
Jan 19 03:15:54 testkernel: ffff81080bdbe000 0000000000000001 ffff810828250820 ffff81080b82e7a0
Jan 19 03:15:54 testkernel: 00005afd5ca08298 0000000000002cd3 ffff810828250a08 0000000f0a0b52c0
Jan 19 03:15:54 testkernel: Call Trace:
Jan 19 03:15:54 testkernel: [] wait_for_completion+0x79/0xa2
Jan 19 03:15:54 testkernel: [] default_wake_function+0x0/0xe
Jan 19 03:15:54 testkernel: [] blk_execute_rq_nowait+0x7e/0x92
Jan 19 03:15:54 testkernel: [] blk_execute_rq+0x98/0xc0
Jan 19 03:15:54 testkernel: [] sg_io+0x258/0x356
Jan 19 03:15:54 testkernel: [] scsi_cmd_ioctl+0x1d2/0x3b5
Jan 19 03:15:54 testkernel: [] avc_has_perm+0x46/0x58
Jan 19 03:15:54 testkernel: [] do_lookup+0x65/0x1e6
Jan 19 03:15:54 testkernel: [] :sd_mod:sd_ioctl+0x93/0xc2
Jan 19 03:15:54 testkernel: [] blkdev_driver_ioctl+0x5d/0x72
Jan 19 03:15:54 testkernel: [] blkdev_ioctl+0x63c/0x697
Jan 19 03:15:54 testkernel: [] avc_has_perm+0x46/0x58
Jan 19 03:15:54 testkernel: [] inode_has_perm+0x56/0x63
Jan 19 03:15:54 testkernel: [] blkdev_open+0x0/0x4f
Jan 19 03:15:54 testkernel: [] blkdev_open+0x23/0x4f
Jan 19 03:15:54 testkernel: [] __dentry_open+0x101/0x1dc
Jan 19 03:15:54 testkernel: [] block_ioctl+0x1b/0x1f
Jan 19 03:15:54 testkernel: [] do_ioctl+0x21/0x6b
Jan 19 03:15:54 testkernel: [] vfs_ioctl+0x457/0x4b9
Jan 19 03:15:54 testkernel: [] sys_ioctl+0x59/0x78
Jan 19 03:15:54 testkernel: [] tracesys+0xd5/0xe0


I would appreciate any help with this.

Thanks
Matti_Kurkela

Re: I/O error Disk and timeout issue

The "mpath_prio_alua" process has been stuck waiting for a response from the storage system for more than 120 seconds, which triggered the kernel's hung-task detector.

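The detector automatically produces a kernel-space call trace of the possibly hung process. This does not necessarily mean anything is "wrong" in your system; it is just trying to help you figure out what the possibly hung process is doing. In this trace the process is in state D (uninterruptible sleep) inside sg_io() and blk_execute_rq(), that is, waiting for a SCSI command it sent through the SG_IO ioctl to complete.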
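As a small illustration of the mechanism described above, here is a Python sketch that pulls the hung-task reports out of a log such as the messages file and reads the current detector threshold. The function names are hypothetical, it assumes Python 3, and it assumes the warning is worded exactly as in the log above (other kernel versions may word it differently). Note that writing 0 to hung_task_timeout_secs only silences the warning; it does not make the blocked I/O finish any sooner.

```python
import re
from pathlib import Path

# Matches kernel lines such as:
#   "INFO: task mpath_prio_alua:23084 blocked for more than 120 seconds."
# (assumed wording, taken from the log in the question)
HUNG_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def parse_hung_tasks(lines):
    """Return (task name, PID, threshold in seconds) for each hung-task report."""
    found = []
    for line in lines:
        m = HUNG_RE.search(line)
        if m:
            found.append((m.group("name"), int(m.group("pid")), int(m.group("secs"))))
    return found


def read_hung_task_timeout(path="/proc/sys/kernel/hung_task_timeout_secs"):
    """Return the detector threshold in seconds (0 = warning disabled).

    Returns None if the file is missing or unreadable, e.g. on kernels
    built without the hung-task detector.
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None
```

For example, passing an open handle on the messages file to parse_hung_tasks() lists every process the detector has flagged, so you can see whether it is always mpath_prio_alua or also other tools such as pvs.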

"mpath_prio_alua" is the priority callout used by the multipath utilities to query the access state (for example active or passive) of the paths to a storage LUN using the SCSI ALUA (Asymmetric Logical Unit Access) standard.

If running "mpath_prio_alua" takes a long time in your environment, you might want to find out why and fix it.

One way to get more information is to run /sbin/mpath_prio_alua manually with the -v option against each LUN. If some LUNs make the command hang for longer than others, check the storage-side configuration of those LUNs.
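The per-LUN check suggested above could be scripted roughly as follows. This is a sketch, not a tested tool: it assumes Python 3.3 or later (RHEL 5 ships Python 2.4, so a newer interpreter would be needed), the device list is something you supply yourself (for example the /dev/sd* paths listed by "multipath -ll"), and the command line simply follows the "/sbin/mpath_prio_alua -v <device>" form described above. Check "man mpath_prio_alua" for the exact syntax on your system. The helper name time_per_device is hypothetical.

```python
import subprocess
import time


def time_per_device(devices, cmd_prefix=("/sbin/mpath_prio_alua", "-v"), timeout=180):
    """Run cmd_prefix + [device] once per device and time each run.

    Returns rows of (device, elapsed seconds, status), slowest first.
    status is the exit code, or "timeout" if the command was killed.

    Caveat: a process stuck in state D cannot die until its I/O returns,
    so after a timeout the kill is sent but subprocess.run() still waits
    for the process to exit; the timeout is not a hard upper bound.
    """
    rows = []
    for dev in devices:
        start = time.monotonic()
        try:
            proc = subprocess.run(
                list(cmd_prefix) + [dev],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                timeout=timeout,
            )
            status = proc.returncode
        except subprocess.TimeoutExpired:
            status = "timeout"
        rows.append((dev, time.monotonic() - start, status))
    # Slowest devices first: these are the LUNs worth checking on the EVA side.
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

For example, time_per_device(["/dev/sda", "/dev/sdb"]) returns one row per device with the slowest first; devices that keep coming out on top across several runs point to the LUNs whose storage-side configuration deserves a closer look.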

Run "man mpath_prio_alua" for details and examples.

If "mpath_prio_alua" takes a long time simply because your system sees a very large number of LUNs, check whether your system really needs to see them all. If it sees an excessive number of LUNs, you can limit them with storage-side WWN masking and/or fabric zoning.
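To get a rough idea of how many paths the ALUA callout may have to query, you can count the SCSI devices the kernel knows about. The sketch below assumes that the priority callout runs once per path (my understanding of multipath's behaviour, not something confirmed above) and relies on the standard /sys/class/scsi_device directory, whose entries are named host:channel:target:lun. It only looks at LUN numbers, so the same LUN number presented by two arrays, or local disks, get lumped together; treat the result as an estimate. The function names are hypothetical.

```python
import os
from collections import Counter


def summarize_scsi_paths(entries):
    """Summarize H:C:T:L names such as those in /sys/class/scsi_device.

    Returns (total paths, paths per SCSI host, sorted distinct LUN numbers).
    Each H:C:T:L entry is one SCSI device, i.e. one path to a LUN.
    """
    per_host = Counter()
    luns = set()
    for name in entries:
        parts = name.split(":")
        if len(parts) != 4 or not all(p.isdigit() for p in parts):
            continue  # skip anything that is not H:C:T:L
        host, _channel, _target, lun = (int(p) for p in parts)
        per_host[host] += 1
        luns.add(lun)
    return sum(per_host.values()), dict(per_host), sorted(luns)


def summarize_local_paths(sysfs_dir="/sys/class/scsi_device"):
    """Run summarize_scsi_paths() on this machine's sysfs entries."""
    return summarize_scsi_paths(os.listdir(sysfs_dir))
```

If each host (HBA port) shows many more paths than you would expect from your zoning and LUN presentation, that is where zoning or WWN masking can cut down the number of ALUA queries.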

MK