Linux filesystem I/O errors (URGENT)

 
Benoît
Regular Advisor

Linux filesystem I/O errors (URGENT)

Hi,

Are there any known issues that would explain the following log messages?
The configuration is a 4-node HP ServiceGuard 1.11.15 cluster on SUSE 8.0 (kernel 2.4.21-286-smp) with an MSA1000 and SecurePath Workgroup 3.0C SP2.

Jul 5 01:04:27 meditel4 cmcld: Failed to open /opt/cmcluster/run/.cm_start_time: No such file or directory
Jul 5 01:15:20 meditel4 kernel: ?cpqp: Path hsx_mod-1-0-0-2 Failed (LUN 600805F30016D060A5EB1FEB34000005 Controller P56350D9IPC075 Array 500805F30016D060 HBA 2300-1)
Jul 5 01:15:20 meditel4 kernel: cpqp: All paths for Target/LUN 0/1 (WWID=600805F30016D060A5EB1FEB34000005) on Controller P56350GX3RT0AT failed
Jul 5 01:15:20 meditel4 kernel: swspLDPrepFailover Lun has outstanding command
Jul 5 01:15:23 meditel4 kernel: swspLDDoFailover Going to call swspControllerStart
Jul 5 01:35:07 meditel4 kernel: hda: packet command error: status=0x51 { DriveReady SeekComplete Error }
Jul 5 01:35:07 meditel4 kernel: hda: packet command error: error=0x54
Jul 5 01:35:07 meditel4 kernel: Error: Illegal request -- (Sense key=0x05)
Jul 5 01:35:07 meditel4 kernel: The failed "Start/Stop Unit" packet command was:
Jul 5 01:35:07 meditel4 kernel: cdrom: open failed.
Jul 5 01:43:53 meditel4 kernel: SCSI disk error : host 2 channel 0 id 0 lun 1 return code = 10000
Jul 5 01:43:53 meditel4 kernel: I/O error: dev 08:12, sector 317704664
Jul 5 01:43:53 meditel4 kernel: zam-7001: io error in reiserfs_find_entry
Jul 5 01:43:54 meditel4 kernel: SCSI disk error : host 2 channel 0 id 0 lun 1 return code = 10000
Jul 5 01:43:54 meditel4 kernel: I/O error: dev 08:12, sector 317704664
Jul 5 01:43:54 meditel4 kernel: zam-7001: io error in reiserfs_find_entry
Jul 5 01:43:54 meditel4 kernel: SCSI disk error : host 2 channel 0 id 0 lun 1 return code = 10000
Jul 5 01:43:54 meditel4 kernel: I/O error: dev 08:12, sector 317704664
Jul 5 01:43:54 meditel4 kernel: zam-7001: io error in reiserfs_find_entry
Jul 5 01:43:54 meditel4 kernel: SCSI disk error : host 2 channel 0 id 0 lun 1 return code = 10000
Jul 5 01:43:54 meditel4 kernel: I/O error: dev 08:12, sector 317704664
Jul 5 01:43:54 meditel4 kernel: zam-7001: io error in reiserfs_find_entry
Jul 5 01:43:54 meditel4 kernel: SCSI disk error : host 2 channel 0 id 0 lun 1 return code = 10000
Jul 5 01:43:54 meditel4 kernel: I/O error: dev 08:12, sector 317704664
Jul 5 01:43:54 meditel4 kernel: zam-7001: io error in reiserfs_find_entry
Jul 5 01:43:54 meditel4 kernel: SCSI disk error : host 2 channel 0 id 0 lun 1 return code = 10000
Jul 5 01:43:54 meditel4 kernel: I/O error: dev 08:12, sector 317704664
Jul 5 01:43:54 meditel4 kernel: zam-7001: io error in reiserfs_find_entry
Jul 5 01:43:54 meditel4 kernel: SCSI disk error : host 2 channel 0 id 0 lun 1 return code = 10000
Jul 5 01:43:54 meditel4 kernel: I/O error: dev 08:12, sector 317704664
Jul 5 01:43:54 meditel4 kernel: zam-7001: io error in reiserfs_find_entry
Jul 5 01:43:54 meditel4 kernel: SCSI disk error : host 2 channel 0 id 0 lun 1 return code = 10000
Jul 5 01:43:54 meditel4 kernel: I/O error: dev 08:12, sector 317704664
Jul 5 01:43:54 meditel4 kernel: zam-7001: io error in reiserfs_find_entry
Jul 5 01:43:57 meditel4 kernel: ?cpqp: Path hsx_mod-0-0-0-2 Failed (LUN 600805F30016D060A5EB1FEB34000005 Controller P56350GX3RT0AT Array 500805F30016D060 HBA 2300-0)
Jul 5 01:43:57 meditel4 kernel: cpqp: All paths for Target/LUN 0/1 (WWID=600805F30016D060A5EB1FEB34000005) on Controller P56350D9IPC075 failed
Jul 5 01:43:57 meditel4 kernel: swspLDDoFailover Going to call swspControllerStart
Jul 5 01:58:51 meditel4 cmcld: Timed out node meditel1. It may have failed.
Jul 5 02:00:42 meditel4 kernel: ?cpqp: Path hsx_mod-1-0-0-1 Failed (LUN 600805F30016D060A6DB0FD3E9280004 Controller P56350D9IPC075 Array 500805F30016D060 HBA 2300-1)
Jul 5 02:00:42 meditel4 kernel: cpqp: All paths for Target/LUN 0/0 (WWID=600805F30016D060A6DB0FD3E9280004) on Controller P56350GX3RT0AT failed
Jul 5 02:00:42 meditel4 kernel: swspLDDoFailover Going to call swspControllerStart
Jul 5 02:09:08 meditel4 kernel: ?cpqp: Path hsx_mod-0-0-0-1 Failed (LUN 600805F30016D060A6DB0FD3E9280004 Controller P56350GX3RT0AT Array 500805F30016D060 HBA 2300-0)
Jul 5 02:09:08 meditel4 kernel: cpqp: All paths for Target/LUN 0/0 (WWID=600805F30016D060A6DB0FD3E9280004) on Controller P56350D9IPC075 failed
Jul 5 02:09:08 meditel4 kernel: swspLDDoFailover Going to call swspControllerStart
Jul 5 02:10:59 meditel4 kernel: ?cpqp: Path hsx_mod-1-0-0-1 Failed (LUN 600805F30016D060A6DB0FD3E9280004 Controller P56350D9IPC075 Array 500805F30016D060 HBA 2300-1)
Jul 5 02:10:59 meditel4 kernel: cpqp: All paths for Target/LUN 0/0 (WWID=600805F30016D060A6DB0FD3E9280004) on Controller P56350GX3RT0AT failed
Jul 5 02:10:59 meditel4 kernel: swspLDDoFailover Going to call swspControllerStart
Jul 5 02:18:15 meditel4 cmcld: Timed out node meditel2. It may have failed.
Jul 5 02:22:53 meditel4 kernel: ?cpqp: Path hsx_mod-0-0-0-1 Failed (LUN 600805F30016D060A6DB0FD3E9280004 Controller P56350GX3RT0AT Array 500805F30016D060 HBA 2300-0)
Jul 5 02:22:53 meditel4 kernel: cpqp: All paths for Target/LUN 0/0 (WWID=600805F30016D060A6DB0FD3E9280004) on Controller P56350D9IPC075 failed
Jul 5 02:22:53 meditel4 kernel: swspLDDoFailover Going to call swspControllerStart
Jul 5 02:23:07 meditel4 kernel: ?cpqp: Path hsx_mod-0-0-0-2 Failed (LUN 600805F30016D060A5EB1FEB34000005 Controller P56350GX3RT0AT Array 500805F30016D060 HBA 2300-0)
Jul 5 02:23:07 meditel4 kernel: cpqp: All paths for Target/LUN 0/1 (WWID=600805F30016D060A5EB1FEB34000005) on Controller P56350D9IPC075 failed
Jul 5 02:23:07 meditel4 kernel: swspLDDoFailover Going to call swspControllerStart
Jul 5 02:24:41 meditel4 kernel: ?cpqp: Path hsx_mod-1-0-0-1 Failed (LUN 600805F30016D060A6DB0FD3E9280004 Controller P56350D9IPC075 Array 500805F30016D060 HBA 2300-1)
Jul 5 02:24:41 meditel4 kernel: cpqp: All paths for Target/LUN 0/0 (WWID=600805F30016D060A6DB0FD3E9280004) on Controller P56350GX3RT0AT failed
Jul 5 02:24:41 meditel4 kernel: swspLDDoFailover Going to call swspControllerStart
Jul 5 02:39:44 meditel4 cmclconfd[22707]: Failed to get main daemon to connect out


Thanks
5 REPLIES
Steven E. Protter
Exalted Contributor

Re: Linux filesystem I/O errors (URGENT)

Shalom,

It seems your alternate path, provided by SecurePath, is not working.

The problem could be the connection or the disk itself.
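
To narrow down which disk the filesystem errors refer to, the `I/O error: dev 08:12` lines can be decoded. This is a minimal sketch, assuming the 2.4 kernel prints the device as hexadecimal major:minor and that major 8 follows the standard sd layout of 16 minors per disk. The function names are illustrative, and the device name SecurePath actually presents on this system may differ.

```python
import re

# Matches lines such as: "I/O error: dev 08:12, sector 317704664"
IO_ERROR = re.compile(
    r"I/O error: dev (?P<major>[0-9a-f]{2}):(?P<minor>[0-9a-f]{2}), "
    r"sector (?P<sector>\d+)"
)


def decode_scsi_disk(major, minor):
    """Map a (major, minor) pair to a conventional sd device name.

    Assumption: major 8 is the first SCSI disk major, with 16 minors
    per disk (minor 0 = whole disk, 1..15 = partitions).
    """
    if major != 8:
        raise ValueError("only SCSI disk major 8 is handled in this sketch")
    disk_index, partition = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk_index)
    return name if partition == 0 else f"{name}{partition}"


def decode_io_error(line):
    """Return (device_name, sector) for an I/O error line, or None."""
    match = IO_ERROR.search(line)
    if match is None:
        return None
    major = int(match["major"], 16)  # the log prints hex, so 12 means 18
    minor = int(match["minor"], 16)
    return decode_scsi_disk(major, minor), int(match["sector"])


line = "Jul 5 01:43:53 meditel4 kernel: I/O error: dev 08:12, sector 317704664"
result = decode_io_error(line)
print(result)
```

Under those assumptions the repeated error points at the second partition of the second SCSI disk, which is consistent with the `lun 1` reported in the adjacent `SCSI disk error` lines.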

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Benoît
Regular Advisor

Re: Linux filesystem I/O errors (URGENT)

Hi Steven,

Thanks for replying so fast.

I have the same kind of problem on all 4 of my nodes.

Do you think it could be a problem with one of the disks in the MSA or with a LUN?
I don't see any errors on the MSA.

Where should I investigate?
Benoît
Regular Advisor

Re: Linux filesystem I/O errors (URGENT)

Could it be a problem with the QLA Fibre Channel driver (qla2x00src-v7.05.00)?
Serviceguard for Linux
Honored Contributor

Re: Linux filesystem I/O errors (URGENT)

I haven't done the analysis Steven has, but if you are seeing path problems on all nodes, it would seem that either a switch or an MSA1000 controller has a problem.
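
One way to check whether the failures cluster on a single HBA, controller, or LUN is to tally the `cpqp: Path ... Failed` lines from each node's syslog. The following is a rough sketch, assuming the message format shown in the log above. The function name and the three sample lines (copied from that log) are for illustration only.

```python
import re
from collections import Counter

# Pattern for the "Path ... Failed" messages as they appear in the posted log.
PATH_FAILED = re.compile(
    r"cpqp: Path (?P<path>\S+) Failed \(LUN (?P<lun>\w+) "
    r"Controller (?P<controller>\w+) Array (?P<array>\w+) "
    r"HBA (?P<hba>[\w-]+)\)"
)


def tally_path_failures(lines):
    """Count path failures per HBA, per controller, and per LUN."""
    by_hba, by_controller, by_lun = Counter(), Counter(), Counter()
    for line in lines:
        match = PATH_FAILED.search(line)
        if match is None:
            continue  # not a path-failure message
        by_hba[match["hba"]] += 1
        by_controller[match["controller"]] += 1
        by_lun[match["lun"]] += 1
    return by_hba, by_controller, by_lun


# Three lines taken from the log in the original post.
sample_lines = [
    "Jul 5 01:15:20 meditel4 kernel: ?cpqp: Path hsx_mod-1-0-0-2 Failed "
    "(LUN 600805F30016D060A5EB1FEB34000005 Controller P56350D9IPC075 "
    "Array 500805F30016D060 HBA 2300-1)",
    "Jul 5 01:43:57 meditel4 kernel: ?cpqp: Path hsx_mod-0-0-0-2 Failed "
    "(LUN 600805F30016D060A5EB1FEB34000005 Controller P56350GX3RT0AT "
    "Array 500805F30016D060 HBA 2300-0)",
    "Jul 5 02:00:42 meditel4 kernel: ?cpqp: Path hsx_mod-1-0-0-1 Failed "
    "(LUN 600805F30016D060A6DB0FD3E9280004 Controller P56350D9IPC075 "
    "Array 500805F30016D060 HBA 2300-1)",
]

by_hba, by_controller, by_lun = tally_path_failures(sample_lines)
for label, counts in (("HBA", by_hba), ("Controller", by_controller), ("LUN", by_lun)):
    for key, count in counts.most_common():
        print(f"{label} {key}: {count} failure(s)")
```

If the counts are spread across both HBAs and both controllers on every node, as they appear to be in the posted excerpt, that points away from a single host adapter and toward shared components such as the switch or the array.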
Steven E. Protter
Exalted Contributor

Re: Linux filesystem I/O errors (URGENT)

I agree with Serviceguard. These messages are typical of a problem with storage.

SEP