Mirror Disk and dead boot drive

SOLVED
Patrick Wallek
Honored Contributor

Mirror Disk and dead boot drive

HP9000 D330 running HP-UX 10.20 with MirrorDisk/UX. 1 Jamaica attached with 8 x 18.2 GB FW Diff drives with 4 drives on each of 2 SCSI channels. OS on 1 drive mirrored to drive on other channel.

Now for the problem --

I noticed on Friday afternoon that one of the OS disks had gone bad. I called HP and had a drive shipped to me, which arrived today (the system is not on 24x7 support). The system was still responding before I left on Friday and I was able to get logged in, etc.

When I arrived at the office this morning the system was not responding: I couldn't get logged in, and the Glance (gpm) session I had up was not responding. I could ping the machine, but if I telnet'ed or rlogin'ed to it, there was no response. I had a perf meter on my Sun box pointing to this HP machine and that was still working. It was showing a load average of around 75-100.

I finally wound up having to restart the machine and boot from the alternate disk (it was the primary boot disk that failed). I was able to boot from the alternate disk, so I know that drive was good.

Anyone have any idea why the system didn't stay up as it should have? The reason I have the MirrorDisk software is so that it doesn't behave as it did. Anyone have any theories?

I got the disk replaced and the mirrors resynced and all is now good. I'm just trying to go back and figure out what happened.

As an aside -- Does anyone know if HP has gotten a bad shipment of 18.2 DF drives? This is the 3rd drive I have had to replace in this machine in the past 4 months. The one that I just replaced was the same one that was replaced a month or so ago. Both fans in the Jamaica are running and the power supply lights are green.
19 REPLIES
A. Clay Stephenson
Acclaimed Contributor

Re: Mirror Disk and dead boot drive

Hi Patrick,

This is strange. My only possible theory is that maybe one of the lvols was not mirrored. I know it's lame but I can't think of anything else. I've had several boot disk failures just as you describe and have never had a problem.
I usually replace the drive sooner than you indicated, and I can't recall running over a weekend without a mirror, but unless you also have a marginal second drive I can't see why this failed (assuming all lvols were mirrored).

As to your second question, I have a large number of the 18's and have not noticed a higher than usual failure rate. If you don't figure this one out you're going to scare me and I'll need 2 mirrors from now on.

Regards, Clay
If it ain't broke, I can fix that.
Patrick Wallek
Honored Contributor

Re: Mirror Disk and dead boot drive

Clay,

Thanks for the response. Believe me, I'm not trying to panic anyone. I wasn't wild about leaving the machine running on 1 mirror the whole weekend, but when I noticed the problem at about 3 PM Friday afternoon and only have 8a-5p next-day support, what could I do? :(

I just rechecked the vg00 VG and all lvols are showing 2 PVs used. According to lvlnboot, everything except the dump area is showing both drives.
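
For anyone wanting to repeat that check, here is a minimal sketch (not something posted in the thread) that filters lvdisplay-style output for logical volumes reporting zero mirror copies. It assumes the usual "LV Name" and "Mirror copies" lines in the output; the function name unmirrored_lvols is made up for illustration.

```shell
# Sketch: list logical volumes that report zero mirror copies.
# Reads lvdisplay-style text on stdin, so it can also be run on saved output.
# Assumes each volume prints an "LV Name" line before its "Mirror copies" line.
unmirrored_lvols() {
  awk '
    /^LV Name/       { name = $3 }              # remember the current lvol
    /^Mirror copies/ { if ($3 == 0) print name } # 0 copies means no mirror
  '
}

# Usage on the system itself (assumes vg00 and its lvols exist):
#   for lv in /dev/vg00/lvol*; do lvdisplay "$lv"; done | unmirrored_lvols
```

No output would mean every lvol in the volume group reports at least one mirror copy. It says nothing about whether the mirror disk itself is healthy.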

I guess my next step is to start poking through diagnostics on the mirror drive and see what crops up. Maybe I'll call that one in and get it replaced too.
A. Clay Stephenson
Acclaimed Contributor

Re: Mirror Disk and dead boot drive

Hi again Patrick,

I went back and checked my journals, and over the past three years I've had six boot disk replacements: three 4 GB, two 9 GB, and one 18 GB (4 under 10.20; 2 under 11.0; 0 under 11i). In all cases, the mirrors kept the machine up without problems. The most plausible explanation is that your other boot disk is indeed marginal, and I would replace it ASAP.

Clay
If it ain't broke, I can fix that.
Manju Kampli
Trusted Contributor

Re: Mirror Disk and dead boot drive

Patrick,

The reason I can think of is that your system went into a semi-hung state, where it allows existing processes to run as usual but does not allow any new process to be created or run. That would explain why telnet and rlogin did not produce any response. You also said your perf meter was working: that process was already running on the system and continued to work.
I have some suspicion about the swap space mirroring; it's worth looking at.

Hope this helps
Manju
Never stop "LEARNING"
John Poff
Honored Contributor

Re: Mirror Disk and dead boot drive

Patrick,

I've run across a similar situation a couple of times before. Once, I had a boot disk go out on one of our V-class boxes. The system was up and I could ping it, but it would not respond to telnet. I finally had to power it off and get the boot disk replaced. Our theory is that the machine was so busy logging the I/O errors that it couldn't respond to anything else.

Mirror disk did save you. I think your problem was that the disk wasn't totally dead. If the disk had completely fried, your system probably would have been in better shape. It's one of those cases where being half dead is worse than being dead.

JP
Varghese Mathew
Trusted Contributor

Re: Mirror Disk and dead boot drive

Hi,

Could the problem have been caused by the faulty disk itself? For example, a faulty controller on the disk might have disrupted the normal functioning of the SCSI bus, so that I/O activity got choked up. I don't see any other reason for this strange problem.

Cheers !!!
Mathew
Cheers !!!
Thierry Poels_1
Honored Contributor

Re: Mirror Disk and dead boot drive

Hi,
I agree with John. I had the same experience some years ago, where database queries sometimes worked fine and sometimes resulted in I/O errors. It seemed that the database sometimes tried to read from the defective mirror and sometimes from the working mirror.
So with a system disk, disk errors might have resulted in a panic.
It is indeed recommended to remove a defective disk as soon as possible.
regards,
Thierry.
All unix flavours are exactly the same . . . . . . . . . . for end users anyway.
Patrick Wallek
Honored Contributor

Re: Mirror Disk and dead boot drive

Thierry, John, Varghese,

I have a feeling you may be correct, BUT the kicker is that I did remove the disk that was bad. Since it was dead, I pulled it out of the Jamaica unit so that I could get the part # to give the RC. So there ain't no way it was going to read anything from that disk.

The machine was functioning this way MOST of the weekend. I was able to log into the thing Sunday evening and had no real problems.
John Poff
Honored Contributor

Re: Mirror Disk and dead boot drive

Patrick,

Were there any helpful messages in syslog? Were you running EMS on the box? If so, did it report anything?

Just curious.

JP
Mladen Despic
Honored Contributor

Re: Mirror Disk and dead boot drive

Patrick,

If you are collecting Measureware data on your system,
you could take a look at the global CPU and memory utilization. If Measureware is also collecting process data, you may be able to get more information about the events over the weekend by generating a report via 'extract -xp -gp -r '. You can create by editing /var/opt/perf/reptall . Do 'man extract' for more details.

HTH

Mladen
Patrick Wallek
Honored Contributor

Re: Mirror Disk and dead boot drive

Well Clay, I think you hit the nail on the head with one of your earlier responses to this thread. I have just noticed that my other boot disk on this machine has croaked, given up the ghost, gone kaput, died, whatever metaphor you prefer.

I have just called HP to get yet another disk sent to me.

One thing I have noticed with ioscan and with diagnostics is that the replacement disks I have are a slightly different model from the original. While it shouldn't make any difference, it does make me wonder.......

The newer disks I have received are SEAGATE ST318436LC while the originals that are still running are SEAGATE ST318275LC.
A. Clay Stephenson
Acclaimed Contributor
Solution

Re: Mirror Disk and dead boot drive

Hi Patrick,

I'm delighted to hear (and I'm sure you are as well) that your 2nd disk bit the dust; otherwise, I was really going to worry about Mirror/UX and ServiceGuard and such.

As for not replacing your disk over the weekend, that should not have made the slightest difference since to these boxes a few seconds is the same as a few days.

It's really a shame that you didn't do this right and crash before you got your volume group resynced and of course without a make_recovery. You would have had fun then.

I'm so gunshy that I actually take mirroring the system disks one more level. Each weekend (or before patching) I do a dd using raw devices to another disk (my lifeboat). This protects me from the two things that mirrors do not: 1) my stupidity and 2) really, really bad patches. It's a nice feeling to know that no matter how stupid I am, I simply have to move the lifeboat into a boot slot and I'm back up.
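
A rough sketch of what such a lifeboat copy might look like follows. This is not Clay's actual procedure: the device files, the block size, and the lifeboat_cmd helper are all assumptions for illustration, and it presumes the spare disk is at least as large as the boot disk. The helper only prints the dd command so it can be reviewed before anyone runs it.

```shell
# Hypothetical device files; substitute the real ones from `ioscan -fnC disk`.
SRC=/dev/rdsk/c0t6d0   # assumed: current boot disk (raw device)
DST=/dev/rdsk/c1t5d0   # assumed: spare "lifeboat" disk (raw device)

# Build the copy command instead of running it, so it can be checked first.
lifeboat_cmd() {
  # $1 = source raw device, $2 = destination raw device
  [ "$1" != "$2" ] || { echo "source and destination must differ" >&2; return 1; }
  echo "dd if=$1 of=$2 bs=1024k"
}

# Print the command for review; run it by hand once the devices are confirmed.
lifeboat_cmd "$SRC" "$DST"
```

Printing rather than executing is deliberate: a dd with the two devices swapped would overwrite the good boot disk.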

Anyway, I'm glad you missed an OS restore, Clay
If it ain't broke, I can fix that.
Patrick Wallek
Honored Contributor

Re: Mirror Disk and dead boot drive

I just hope (lots of knocking on wood here) that this is the last disk I have to replace for a while. I do have a query in to my ASE at HP to see if there are any known issues with the disks I have.

I'm keeping my fingers crossed that the machine behaves until I get the new disk from HP and can get it replaced tomorrow.

Varghese Mathew
Trusted Contributor

Re: Mirror Disk and dead boot drive

Hi Patrick,

Oh!! I was relieved once I saw your last comment. I had been worrying about a bug in Mirror/UX, as we have approx. 35 HP 9000 servers and all of them have Mirror/UX installed.

Cheers !!!
Mathew
Cheers !!!
Alexander M. Ermes
Honored Contributor

Re: Mirror Disk and dead boot drive

Hi there.
Jesus Christ almighty, you really scared me with your messages. Nice to hear it is up and running again. We also have a few 9000 servers here and I would hate the thought of these going crazy. Clay, thanks for the hint with the lifeboat. I think I will do a similar setup.
If possible, could you mail or post some details about it? I would really appreciate it.
Thanks everybody
Alexander M. Ermes
.. and all these memories are going to vanish like tears in the rain! final words from Rutger Hauer in "Blade Runner"
Wodisch
Honored Contributor

Re: Mirror Disk and dead boot drive

Hello Patrick,

I have noticed that problem on D-classes with the built-in FW-SCSI controllers. It seems to be a controller problem, not a disk problem. Once your pair of boot disks is re-mirrored, just test it by unplugging the power from one of them, "tail -f" the syslog.log and be amazed: it takes aeons to show the SCSI messages...

Just my $0.02,
Wodisch
Tim D Fulford
Honored Contributor

Re: Mirror Disk and dead boot drive

Hi

I had a problem with automatically booting from the alternate disk on a D320. I could boot from it manually but not automatically. Part of the problem is that HP said they do not support auto-bootable mirrored disks if they are outside the internal enclosure!!! However, they went on to say that even if it was an internal disk, the F/W needed to be ??? (I forget)

So my advice is
o Check the F/W of both disks
o Make sure pri & mir are in the internal enclosure (I know they are then on the same SCSI bus, but what can you do!)

Good Luck

Tim
-
Tommy Brown
Respected Contributor

Re: Mirror Disk and dead boot drive

Hi, I know this thread has closed, but I wanted to let you know of my very recent experience. I have a K580 with a mirrored vg00 (2 x 9.0 GB). Last Monday the primary failed, confusing the SCSI bus so that scans (predictive, SAM, ioscan) would not function. We did not even become aware of the problem, since no one checked the email or syslogs while I was on vacation. I discovered the failure a week later. The only problem we experienced was that some Oracle users could not reattach after they logged out Tuesday morning (7 days later). The CE replaced the drive Tuesday evening (not hot-swap) and everything is beautiful... I love this mirroring.
Tommy
I may be slow, but I get there !
Trevor Dyson
Trusted Contributor

Re: Mirror Disk and dead boot drive

Hi All,

The original symptom sounds very similar to problems I have seen with the hardware diagnostics when a SCSI disk goes belly up. The hw diags start going berserk and use up 100% CPU. The system effectively hangs (I think primarily because the diag processes run at a high priority).

Patches may help, but I have seen this happen a lot on HP-UX 10.20 systems.

If this was happening you would see messages in OLDsyslog showing "The diagnostics subsystem is generating messages too rapidly" or something similar.
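
A minimal sketch of how one might look for that, assuming the previous boot's log is in the usual HP-UX location /var/adm/syslog/OLDsyslog.log. Since the exact wording of the message is only remembered approximately above, the match is deliberately loose (any line mentioning "diagnostic"); the function name count_diag_floods is made up for illustration.

```shell
# Count lines mentioning the diagnostics subsystem in a syslog file.
# Default path is an assumption: the usual HP-UX location of the
# previous boot's log. Pass another file as $1 to override it.
count_diag_floods() {
  log=${1:-/var/adm/syslog/OLDsyslog.log}
  # -c prints the number of matching lines, -i ignores case
  grep -ci 'diagnostic' "$log"
}

# Usage:
#   count_diag_floods                       # check OLDsyslog.log
#   count_diag_floods /var/adm/syslog/syslog.log
```

A large count clustered around the time of the hang would support the theory; a count of zero would point elsewhere.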

-Trevor
I've got a little black book with me poems in