SCSI: Read error - after mounting from another Server

pandora_1 · ‎05-31-2011

Hi All,
Error (pasted):
May 31 18:00:11 TEST1 vmunix: SCSI: Read error -- dev: b 31 0x036000, errno: 126, resid: 1024,
May 31 18:00:11 TEST1 vmunix: SCSI: Write error -- dev: b 31 0x036000, errno: 126, resid: 16384,

I get the errors shown above on read/write/async I/O. We have a VG shared between three servers on HP-UX 11.23, which is active on only one server at a time, like a floating filesystem. The disks are presented to all three servers. This filesystem was working perfectly on server PROD. After unmounting the FS there:
vgchange -a n /dev/vgflp (deactivated)
On Server TEST1:
vgchange -a y /dev/vgflp
mount FS.

After this, TEST1 is intermittently slow for any command typed, even ps, top, etc. The same tasks used to work properly before.
TEST1 has also been rebooted.
Could a vgexport and vgimport of the VG rectify the issue?
No multipathing is installed.
What could be the problem? How do I check it? Is this floating-filesystem approach right?

pandora_1 · ‎05-31-2011

Error - appended...

Jun 1 10:34:01 dopdwdb2 vmunix: SCSI: Read error -- dev: b 31 0x036000, errno: 126, resid: 1024,
Jun 1 10:34:01 dopdwdb2 vmunix: blkno: 8, sectno: 16, offset: 8192, bcount: 1024.

Duncan Edmonstone · ‎05-31-2011

First off, not sure the problem is related to this shared VG. According to your attachment, the disks in the shared VG are:

PV Name /dev/dsk/c10t0d7
PV Name /dev/dsk/c11t0d7 Alternate Link
PV Name /dev/dsk/c12t0d7 Alternate Link
PV Name /dev/dsk/c13t0d7 Alternate Link

But from your information above:

> May 31 18:00:11 TEST1 vmunix: SCSI: Read error -- dev: b 31 0x036000, errno: 126, resid: 1024,
> May 31 18:00:11 TEST1 vmunix: SCSI: Write error -- dev: b 31 0x036000, errno: 126, resid: 16384,

This is on device 0x036000, which I would expect to be c3t6d0.

Is that by any chance one of your boot disks - might explain why everything is going slow...

You don't mention what sort of disk it is in the VG... from the number of alternate paths, I'd guess at maybe a LUN on a disk array? Can you tell us what?

>> IS this float filesystem approach right ?

Well it works, but you are completely relying on "good practice" by sysadmins to prevent corruption - if someone accidentally activates/mounts the VG on 2 systems at once, the chances are your data is toast.

HTH

Duncan

I am an HPE Employee

pandora_1 · ‎06-01-2011

Hi Duncan,

I have attached some more outputs on VG for your reference.

Yes, the vgflp VG is presented from the EVA SAN 5000 as one LUN.

How did you map device 0x036000 to c3t6d0? I couldn't see any kind of hardware error or fault light. How do we check whether the disk has gone bad?
I tried using
dd if=/dev/rdsk/c3t6d0 of=/dev/null bs=1024
but it is still running. The first run completed, but the second one is still going. I hope there is no impact from running this. I am getting more SCSI errors:

Jun 1 11:31:50 dopdwdb2 vmunix: blkno: 1539104, sectno: 3078208, offset: 1576042496, bcount: 16384.
Jun 1 11:31:50 dopdwdb2 vmunix: SCSI: Write error -- dev: b 31 0x036000, errno: 126, resid: 4096,
Jun 1 11:31:50 dopdwdb2 vmunix: blkno: 1539096, sectno: 3078192, offset: 1576034304, bcount: 4096.
Jun 1 11:31:50 dopdwdb2 vmunix: SCSI: Write error -- dev: b 31 0x036000, errno: 126, resid: 16384,
Jun 1 11:31:50 dopdwdb2 vmunix: blkno: 1538828, sectno: 3077656, offset: 1575759872, bcount: 16384.
Jun 1 11:31:50 dopdwdb2 vmunix: SCSI: Write error -- dev: b 31 0x036000, errno: 126, resid: 4096,
Jun 1 11:31:50 dopdwdb2 vmunix: blkno: 1538776, sectno: 3077552, offset: 1575706624, bcount: 4096.
Jun 1 11:31:50 dopdwdb2 vmunix: SCSI: Write error -- dev: b 31 0x036000, errno: 126, resid: 8192,
Jun 1 11:31:50 dopdwdb2 vmunix: blkno: 1538768, sectno: 3077536, offset: 1575698432, bcount: 8192.
Jun 1 11:31:50 dopdwdb2 vmunix: blkno: 1538740, sectno: 3077480, offset: 1575669760, bcount: 8192.
Jun 1 11:31:50 dopdwdb2 vmunix: SCSI: Read error -- dev: b 31 0x036000, errno: 126, resid: 1024,
Jun 1 11:31:50 dopdwdb2 vmunix: SCSI: Write error -- dev: b 31 0x036000, errno: 126, resid: 8192,
Jun 1 11:31:50 dopdwdb2 vmunix: blkno: 8, sectno: 16, offset: 8192, bcount: 1024.
Jun 1 11:31:50 dopdwdb2 vmunix: LVM: VG 64 0x000000: PVLink 31 0x036000 Failed! The PV is not accessible.
Jun 1 11:31:50 dopdwdb2 vmunix:
Jun 1 11:31:56 dopdwdb2 above message repeats 79 times
Jun 1 11:31:55 dopdwdb2 vmunix: LVM: VG 64 0x000000: PVLink 31 0x036000 Recovered.

Duncan Edmonstone · ‎06-01-2011

Hi,

>> Jun 1 10:34:01 dopdwdb2 vmunix: SCSI: Read error -- dev: b 31 0x036000, errno: 126, resid: 1024

So this is telling me that the block (b) device with a major number of 31 and a minor number of 0x036000 has a problem.

To determine what sort of device, I use lsdev to show me the block device with a major number of 31:

# lsdev -b 31
Character Block Driver Class
188 31 sdisk disk

The minor number is in hex with the format 0xCCTD00, where CC is the bus, T is the target and D is the LUN in /dev/dsk/cXtYdZ. So this is c3t6d0. You can check that by looking at the device file:

# ll /dev/dsk/c3t6d0

The minor number reported there should be the same.
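
As a rough illustration of the mapping described above, here is a minimal Python sketch that pulls the major and minor numbers out of a syslog line and decodes the minor number into a cXtYdZ name. It assumes the minor number is 24 bits laid out as two hex digits of bus, one of target, one of LUN and a trailing byte that is ignored here, which is what the 0x036000 to c3t6d0 mapping implies. The function names and the regular expression are made up for this example and are not part of any HP-UX tool.

```python
import re

# Matches the "dev: b 31 0x036000" part of an HP-UX SCSI error line.
# (Illustrative pattern, written against the log lines quoted in this thread.)
DEV_RE = re.compile(
    r"dev:\s+(?P<type>[bc])\s+(?P<major>\d+)\s+(?P<minor>0x[0-9a-fA-F]+)"
)


def decode_sdisk_minor(minor):
    """Decode an sdisk minor number, assuming the 0xCCTD00 layout."""
    bus = (minor >> 16) & 0xFF    # CC: bus / controller instance
    target = (minor >> 12) & 0xF  # T: SCSI target
    lun = (minor >> 8) & 0xF      # D: LUN
    # The low byte is assumed to hold driver option bits and is ignored.
    return f"c{bus}t{target}d{lun}"


def device_from_log_line(line):
    """Return (major, 'cXtYdZ') for a SCSI error line, or None if no match."""
    match = DEV_RE.search(line)
    if match is None:
        return None
    major = int(match.group("major"))
    minor = int(match.group("minor"), 16)
    return major, decode_sdisk_minor(minor)


line = ("May 31 18:00:11 TEST1 vmunix: SCSI: Read error -- "
        "dev: b 31 0x036000, errno: 126, resid: 1024,")
print(device_from_log_line(line))
```

The decoded name is only a starting point; the device file check with ll shown above is still the way to confirm it on the system itself.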

Looking at your LVM output, I see you have stale extents in vg00. Again this looks like a failing disk, so I would get a hardware call logged and start consulting the "When Good Disks Go Bad" document:

http://h20000.www2.hp.com/bc/docs/support/SupportManual/c01911837/c01911837.pdf

HTH

Duncan

I am an HPE Employee

pandora_1 · ‎06-01-2011

So could this be the reason for the very slow performance when executing any command (sqlplus, copy, ps...)?
I hope this is not related to the VG presented from the SAN?

Duncan Edmonstone · ‎06-01-2011

Absolutely it could, yes... intermittent read/write failures to a disk in the root volume group will affect almost any process on the system (even commands like sqlplus, which may not be installed on the root volume group, will attempt to load libraries from /usr/lib, which is in the root volume group).

Get a hardware call logged, and consult the doc I referenced above...

HTH

Duncan

I am an HPE Employee

pandora_1 · ‎06-02-2011

Hi Duncan,

Until the disk is replaced, to avoid the performance/slowness issue, is it OK if we remove the mirror copy of vg00? Will this resolve the slowness issue? If so, what steps should be taken? Also, how do we bring the mirror back once the disk is replaced?

This is the disk that has failed on the rp7420. I hope this is the right (second) disk, because it doesn't show any error light on the disk.

disk 2 1/0/1/1/0/1/1.6.0 sdisk CLAIMED DEVICE HP 36.4GST336753LC
/dev/dsk/c3t6d0 /dev/rdsk/c3t6d0

Duncan Edmonstone · ‎06-02-2011

>> Is it OK if we remove the Mirror copy of vg00.

Yes.

>> Will this resolve the slowness issue ?

Yes.

>> If so What steps to be taken for this ? ALso how to activate back mirror once disk is replaced.

All covered in the document I referenced above.

HTH

Duncan

I am an HPE Employee

pandora_1 · ‎06-02-2011

Thanks a ton
