Monitoring the Health of VG00

Zinky · ‎12-14-2004

Has anyone written a vg00 health check script yet? If not, what parameters should be taken to indicate the imminent failure of a mirrored vg00? I am thinking of searching for patterns in dmesg/syslog or in vgdisplay -v output.

Also, when should LVM vgdisplay complain that its components are failing? On one system, dmesg/Syslog is spewing out SCSI errors like so:

SCSI: Read error -- dev: b 31 0x222000, errno: 126, resid: 1024,
blkno: 8, sectno: 16, offset: 8192, bcount: 1024.
LVM: VG 64 0x000000: PVLink 31 0x222000 Failed! The PV is not accessible.
LVM: VG 64 0x000000: PVLink 31 0x222000 Recovered.
LVM: Failed to automatically resync PV 1f222000 error: 5

BUT vgdisplay/lvdisplay/pvdisplay/diskinfo still show the components of vg00 as healthy...

Hakuna Matata

Favourite Toy:
AMD Athlon II X6 1090T 6-core, 16GB RAM, 12TB ZFS RAIDZ-2 Storage. Linux Centos 5.6 running KVM Hypervisor. Virtual Machines: Ubuntu, Mint, Solaris 10, Windows 7 Professional, Windows XP Pro, Windows Server 2008R2, DOS 6.22, OpenFiler

Sanjay_6 · ‎12-14-2004

Hi,

The disk seems to be

0x222000 --> c22t2d0

Do a dd on the disk and check if it returns any error.

dd if=/dev/rdsk/c22t2d0 of=/dev/null bs=1024k

Do a pvdisplay -v /dev/dsk/c22t2d0 and check for stale extents.
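For scripting the stale-extent check, here is a minimal sketch. It assumes the extent table printed by pvdisplay -v has the columns PE, Status, LV, LE and that a bad extent shows the word "stale" in the second column. The function name count_stale is made up for illustration.

```shell
# count_stale: read `pvdisplay -v` output on stdin and print how many
# physical extents have "stale" in the Status column.
# Assumption: extent lines look like "00001 stale /dev/vg00/lvol3 00005".
count_stale() {
    awk '$2 == "stale" { n++ } END { print n + 0 }'
}

# Possible use on the box itself (device name as given in this post):
#   pvdisplay -v /dev/dsk/c22t2d0 | count_stale
```

Anything other than 0 would be worth an alert, since a healthy mirror should show only current or free extents.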

Hope this helps.

Regds

Sanjay_6 · ‎12-14-2004

Hi Nelson,

If you are not using EMS, you can use this free product to monitor the system and direct all root emails to yourself and other admins.

Most of the time EMS is able to alert root about any unusual event on the system, say a disk failure.

http://software.hp.com/portal/swdepot/displayProductInfo.do?productNumber=B6191AAE

Hope this helps.

Regds

doug mielke · ‎12-14-2004

A good metric to monitor is the disk access time (sar -d).
If a disk is forced to do many retries to get data, the response times creep up. Beware of the effect of caching on arrays, but vg00 is a good candidate for this.

(This was a near constant when we looked through sar histories in my tech support days.)
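A rough way to watch for creeping response times is sketched below. It assumes sar -d prints avserv (average service time in ms) as the last column and the device name seven columns from the end, so lines with and without a leading timestamp are both handled. The function name slow_disks and the 30 ms limit in the usage line are hypothetical; pick a limit from your own sar history.

```shell
# slow_disks LIMIT: read `sar -d` output on stdin and print "device avserv"
# for every disk line whose avserv value is greater than LIMIT.
# Assumed columns: device %busy avque r+w/s blks/s avwait avserv
slow_disks() {
    awk -v limit="$1" '
        NF >= 7 && $(NF-6) ~ /^c[0-9]+t[0-9]+d[0-9]+$/ && ($NF + 0) > (limit + 0) {
            print $(NF-6), $NF
        }'
}

# Possible use: sample 5 times at 10 second intervals, flag disks over 30 ms
#   sar -d 10 5 | slow_disks 30
```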

Steve Steel · ‎12-14-2004

Hi

What is the disk?

I have seen this happen on a root disk when
AutoTrespass was not disabled.

It doesn't necessarily mean a hardware fault.

Also check patches.

EMS is the best advice.

Steve Steel

If you want truly to understand something, try to change it. (Kurt Lewin)

A. Clay Stephenson · ‎12-14-2004

Wrong; these are hex digits. 0x22 is 34 decimal, so the disk is /dev/dsk/c34t2d0.
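The hex conversion can be scripted. This is only a sketch and assumes the legacy HP-UX minor number layout 0xIITL00, where II is the interface card instance, T the target and L the LUN, each in hex. The function name minor_to_ctd is invented for the example.

```shell
# minor_to_ctd MINOR: convert a minor number such as 0x222000 to cXtYdZ.
# Assumption: layout is 0xIITL00 (II = card instance, T = target, L = LUN).
minor_to_ctd() {
    hex=${1#0x}                                  # drop the 0x prefix
    card=$(printf '%s' "$hex" | cut -c1-2)       # first two hex digits
    target=$(printf '%s' "$hex" | cut -c3)       # third hex digit
    lun=$(printf '%s' "$hex" | cut -c4)          # fourth hex digit
    printf 'c%dt%dd%d\n' "$((0x$card))" "$((0x$target))" "$((0x$lun))"
}

# minor_to_ctd 0x222000   prints c34t2d0
```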

dmesg is not at all a reliable source for these messages because it is a circular buffer. I wouldn't restrict this to vg00; I would do it for all disks.

Note that the PVLink recovered, so vgdisplay would display normally. You might find that lvdisplay does show stale extents.

I suspect most people who do this sort of monitoring rely upon a product like IT/O or VP/O (whatever the OpenView name of the month is) because disk failure is but one component of a much larger picture. In fact, if all your disks are mirrored or RAID'ed then it's not even a big deal; you just need to know that a disk has failed. VP/O and its default templates handle that quite nicely. The other advantage of the VP/O approach is that all the monitoring can be done from one location -- for all of your boxes.

If it ain't broke, I can fix that.

Zinky · ‎12-14-2004

We have EMS on some systems and none on the others. That is why I am asking for a "generic" way to test vg00. VxVM is *very* sensitive to minute disk failures and senses impending failures, but in the case of LVM I've had several scenarios in the past wherein component disks had already failed or were on the verge of failing, yet LVM (via vgdisplay, pvdisplay, lvdisplay) still showed a "healthy" config.

If LVM only flags this via syslog/dmesg messages, then I will probably just rely on that, plus EMS on the systems that have it.
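For the systems without EMS, a syslog scan along these lines might do as a starting point. It is a sketch that matches only the three message shapes quoted at the top of this thread; other failure messages surely exist and would need adding. The function name lvm_disk_errors is made up, and the usage line assumes the usual HP-UX syslog location.

```shell
# lvm_disk_errors: read syslog text on stdin and print the lines matching
# the SCSI/LVM messages quoted in this thread. "Recovered" lines are
# deliberately not matched.
lvm_disk_errors() {
    awk '/SCSI: Read error/ ||
         /LVM: .*PVLink .* Failed/ ||
         /LVM: Failed to automatically resync PV/'
}

# Possible use (path is an assumption about the local setup):
#   lvm_disk_errors < /var/adm/syslog/syslog.log
```

Reading the syslog file rather than dmesg avoids losing messages when the kernel buffer wraps.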


Zinky · ‎12-14-2004

Thanks for all the suggestions, and thanks Clay for correcting Sanjay. I usually just grep for the 0xaabbcc value in the dev directory, going by the driver (lsdev).

All of our monitoring is homegrown, and we actually use console/syslog monitoring tools to flag alerts, etc.
