Operating System - HP-UX

John Bray_1
Occasional Advisor

How safe to use disk with fsck problems

I am in the UK. I have a customer in Bermuda who has a single HP-UX system on trial. We plan to replace it with an operational Linux one in July when the software port is ready. The customer has little knowledge of HP-UX, and there is probably not much support on the island.

The machine only has 7GB of disk space, 4.5GB of which is in a logical volume holding our application and data. That volume has a disk error that fsck cannot fix. It's HFS, so I can mount it with the -f option to ignore the errors.

I have various choices

1) continuing to use the dodgy volume for 2 more months

2) clearing and creating a new fs over the dodgy area. The space must be reused as there isn't enough space elsewhere on the machine

3) Arranging for a new disk to be installed.

(1) is the least work, (2) might come a cropper if I can't make a new fs, and (3) is likely to be awkward given the lack of local support. So I wonder how safe it is to continue using the dodgy volume for the duration?

All offers to fly out to Bermuda to fix the problem will probably be rejected :-)

John
Stefan Farrelly
Honored Contributor

Re: How safe to use disk with fsck problems

In my opinion it can't be a serious problem if your app/server stays up with it mounted with the -f option, so I would leave it alone. Chances are it will last another 2 months until replacement, and you could well be creating more problems for yourself by attempting 2) or 3).
I'm from Palmerston North, New Zealand, but somehow ended up in London...
Pete Randall
Outstanding Contributor

Re: How safe to use disk with fsck problems

John,

You broke my heart with that last sentence!

I would have to ask: how good are your backups? If you have *GOOD* backups, taken frequently, then I might consider option 1, but I would still look into option 3. If the disk does decide to fail, you're going to need to have a replacement plan in place - and you're going to need those backups!


Pete
Bill Hassell
Honored Contributor

Re: How safe to use disk with fsck problems

mount -f ignores the fact that the directory has not been fixed properly. It is a VERY dangerous option that often leads to system crashes! fsck can NEVER fix a bad spot on the disk; it only fixes logical errors in the directory structures. So your system is running on borrowed time, just waiting to crash beyond repair. You need a good Ignite-UX backup and a good fbackup of everything as soon as possible.
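Because fsck only repairs logical structures, one way to see whether the volume actually has unreadable blocks is to read it end to end without writing anything. This is not a step anyone in the thread ran; it is a minimal sketch assuming a hypothetical raw logical volume name (something like /dev/vg01/rlvol1 on HP-UX; substitute the real one). Reading a failing disk adds load to it, so take the backup first.

```shell
#!/bin/sh
# Sketch: read every block of a raw device (or an ordinary file) and report
# whether dd hit a read error. Nothing is written to the device.
# The device name below is hypothetical:
#   read_scan /dev/vg01/rlvol1 64k
read_scan() {
    dev=$1
    bs=${2:-64k}    # dd block size; 64k is an arbitrary choice
    # Without conv=noerror, dd stops at the first unreadable block and
    # exits with a non-zero status.
    if dd if="$dev" of=/dev/null bs="$bs" 2>/dev/null; then
        echo "read-ok: $dev"
    else
        echo "read-error: $dev"
        return 1
    fi
}
```

If it reports an error, running the same dd command by hand without discarding stderr shows how many records were read before the failure, which gives a rough idea of where the bad region starts.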

I'll be glad to travel by ship to Bermuda to fix the problems (and avoid all that flying).


Bill Hassell, sysadmin
A. Clay Stephenson
Acclaimed Contributor

Re: How safe to use disk with fsck problems

Because this is a trial system, I assume nothing critical (to the customer) is running on it. Your main concern (I assume) is blowing the sale. While the fsck won't fix the problem, it should have marked the blocks bad so that they are not used.
While fsck will fix logical block problems, any data on those blocks is probably corrupt and/or inaccessible.

If this were me, I would explain the problem to the customer and put him in the decision loop. The problem you face is that he may associate flaky behavior caused by disk problems with flaky software. Of course, disks are excellent excuses for bad software - so it cuts both ways.

Your marketing guys could probably use this as a good example of why backup solutions / redundant systems are so vital.

It has been my experience that flaky disks seldom improve with time; I would power-cycle as infrequently as possible because there is a rather significant chance that the disk will not come up.

It's really time to ask the customer; an honest, sincere approach at this point could prove as valuable as flawless hardware/software performance.

If it ain't broke, I can fix that.
John Bray_1
Occasional Advisor

Re: How safe to use disk with fsck problems

Pete

I was afraid someone would ask about backups. As it's a trial machine, we have no backup strategy other than storing the system configuration back in the UK. Hence my worry that if I muck things up I'd have to resend 500Mb of data across an ADSL-speed link or post a CD.

As the machine was carried across in hand luggage from the UK, I'm pretty sure it has no handy extras like tape drives, so I'm stuck with network backups.

I am backing up now to another machine on site, and could keep them in sync with mirrordir.
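Whatever tool keeps the on-site copy in sync, it is worth confirming that the copy really matches the original before relying on it. This check is not something suggested in the thread; it is a minimal sketch assuming only POSIX find, cksum and sort on both machines, with hypothetical directory names.

```shell
#!/bin/sh
# Sketch: print a checksum manifest ("checksum size ./path" per regular file,
# sorted by path) for a directory tree. Run it on the original and on the
# copy, then compare the two outputs with diff. Paths are hypothetical:
#   manifest /app/data > here.txt
manifest() {
    # The subshell keeps the cd local; paths are printed relative to $1 so
    # that manifests taken from different locations line up.
    (cd "$1" && find . -type f -exec cksum {} \; | LC_ALL=C sort -k 3)
}
```

Any line that diff reports points at a file that is missing, truncated or different in the copy.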

Any attempts to do restores from backups are likely to hit problems if I can't make a new FS in the dodgy part of the disk, so I'm wary of investigating that too much with the keyhole access I have.

John
John Bray_1
Occasional Advisor

Re: How safe to use disk with fsck problems

The disk problem is spreading. The customer is fully aware of the problem, and the best solution is to fly out a new disk. If I configure it on a C180 here, can it be plugged into a C200 there and boot cleanly? Are the two machines close enough for an install on one to work on the other?
A. Clay Stephenson
Acclaimed Contributor

Re: How safe to use disk with fsck problems

There are differences between a C180 and a C200 that extend beyond processor speed. The C2XX models have an additional fast SCSI bus. By far the safest option would be to ship the unit. In fact, even if you were sending a drive from a C180, it might not work because of different video card/monitor combinations.
If it ain't broke, I can fix that.
John Bray_1
Occasional Advisor

Re: How safe to use disk with fsck problems

By chance I found some colleagues who were flying out at the weekend, so I gave them a C180-configured HD and an internal CD-ROM to plug in. Hopefully this will avoid using the fast SCSI bus Clay talks about. Sending the whole box would be expensive and/or slow.
We shall see if this works.

BTW, I could not assign points to you all from Konqueror; I had to switch to Mozilla.

John
John Bray_1
Occasional Advisor

Re: How safe to use disk with fsck problems

The new disk has arrived in Bermuda, but refused to be recognised by the system either as a boot disk or as a resource for an OS install. They have been trying combinations of positions on the cables and keeping the old disks (it turns out we had two 4GB disks rather than one 8GB disk) in place, but all to no avail.

We're stumped.
A. Clay Stephenson
Acclaimed Contributor

Re: How safe to use disk with fsck problems

The physical positions on the SCSI cables mean absolutely nothing. I suspect that you have duplicate SCSI IDs, which are set by jumpers on the drives. The other possibilities are a bad controller, a bad terminator, or a bad cable. Remember, the SCSI bus must be terminated in exactly two places - on the ends of the bus.

If it ain't broke, I can fix that.
A. Clay Stephenson
Acclaimed Contributor
Solution

Re: How safe to use disk with fsck problems

One more possibility is that the firmware may not recognize the new disk - especially if the disk is much newer than the firmware on the old box and you replaced using a different drive model.

If it ain't broke, I can fix that.
Bill Hassell
Honored Contributor

Re: How safe to use disk with fsck problems

The good news about SCSI is that it has been standardized. The bad news is that there are an obscene number of 'standards', some of which are completely incompatible. So unless the disk is an exact replacement (same manufacturer and model number), there is a significant probability that the disk is incompatible with the interface. You can verify this with ioscan. Disconnect the new disk and run:

ioscan -fC disk

It will report on all the known disks. Now add the new disk (make a note of the SCSI address for this disk) and rerun the same command. There are 3 possibilities:

1. One of the current disks has disappeared, which indicates that the new disk has the same address as an existing disk. Every disk on a particular channel must have a unique address which is usually set with jumpers.

2. The new disk makes no difference in the ioscan listing, which means that it is dead. ioscan performs a low-level SCSI-ID command for every address, and if the new disk doesn't reply, it has an electronic failure (not common, but it does happen once in a while).

3. The new disk does show up and now needs to be incorporated into the existing LVM structures.

If #3 is true but SAM doesn't see the new disk as unassigned, you are probably missing SCSI, SAM and disk driver patches (perhaps many).
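The before/after comparison described above can be scripted so that the people on site only need to capture two listings. A minimal sketch, assuming the usual ioscan -f column layout (class in the first column, H/W path in the third) and hypothetical file names before.txt and after.txt; it matches disks by hardware path and prints which of the three cases applies.

```shell
#!/bin/sh
# Sketch: compare two saved "ioscan -fC disk" listings, matching disks by
# hardware path, and report which of the three cases applies.
# Assumes the ioscan -f layout: class in column 1, H/W path in column 3.
# File names are hypothetical:
#   ioscan -fC disk > before.txt    (new disk disconnected)
#   ioscan -fC disk > after.txt     (new disk connected)
#   classify_scans before.txt after.txt
classify_scans() {
    awk '
        # Rows from the first file: disks present before the new one was added.
        FILENAME == ARGV[1] { if ($1 == "disk") before[$3] = 1; next }
        # Rows from the second file: disks present afterwards.
        $1 == "disk" { after[$3] = 1 }
        END {
            lost = ""; gained = ""
            for (p in before) if (!(p in after)) lost = lost " " p
            for (p in after) if (!(p in before)) gained = gained " " p
            if (lost != "")        print "address-conflict:" lost   # case 1
            else if (gained == "") print "no-response"              # case 2
            else                   print "new-disk:" gained         # case 3
        }
    ' "$1" "$2"
}
```

A "new-disk" result only means the drive answered on the bus; it still has to be added to LVM, and if SAM does not offer it as unassigned, the patch situation described above is the next thing to check.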


Bill Hassell, sysadmin
John Bray_1
Occasional Advisor

Re: How safe to use disk with fsck problems

I don't think the problem is the age of the disk, as it came from a C180 of the same vintage as the C200.

It might be a problem with the termination or SCSI ID. I know that external SCSI devices have switches to set their IDs; do internal drives have something similar set with jumpers? They have tried combinations of disks, but surely one HD and one CD-ROM should work, especially as both worked together on the donor machine.

It is unlikely to be a cable or controller failure, unless the corrupt disk problems we saw last week could be caused by the cable or controller, which seems odd to me.

I'll try to get them to follow Bill's suggestion: boot with the pair of old disks as vg00, then reboot with the new disk attached as well and see whether it appears in the ioscan output.

Thanks for your continuing support on this.
Keely Jackson
Trusted Contributor

Re: How safe to use disk with fsck problems

Hi John

Yes, the internal disks do have jumpers to set the SCSI ID.

Cheers
Keely
Live long and prosper
Mike Fisher_5
Trusted Contributor

Re: How safe to use disk with fsck problems

Hi John

I'm not sure how helpful this is, but here goes...

1] Discs:
These are cheap as chips - why not hunt around for an exact replacement for the dodgy one?

2] Bermuda job market:
Classified ads indicate there are quite a few HP-UX end users in Bermuda, so there will be third-party support based on the island. I'd be amazed [& very interested from a biz POV] if this wasn't so.

Good luck
Mike [Beachcomber] Fisher
Don't get mad - get naked