ProLiant Servers (ML,DL,SL)

batbold_hp
Occasional Advisor

"Repair file system" problem

Dear all,

I have an urgent issue with our HP ProLiant DL585 server, which runs Red Hat 5 (nash version). Recently there was a power failure. After power was restored, the storage controller automatically started rebuilding the affected hard disks, which are in a RAID 5 array.
While the rebuild was running, the server was probably under high load, and the system automatically remounted the file system read-only.
I then rebooted the server, but it does not start. The following errors occurred:

Checking filesystems
fsck.ext3: No such file or directory while trying to open /dev/sda1
....
*** An error occurred during the file system check.
*** Dropping you to a shell: the system will reboot
*** When you leave the shell
Give root password for maintenance

..................
repair file system #


I have no idea what to do. I've tried several HP storage-related CDs, but they had no effect. I am now waiting for the storage controller to finish rebuilding the affected hard drives.
Does anyone have any ideas?

Thank you very much for your help
Matti_Kurkela
Honored Contributor

Re: "Repair file system" problem

First, find out where this filesystem is supposed to be mounted:

grep /dev/sda1 /etc/fstab

If the filesystem is non-essential, you might add a "noauto" option and change the fsck pass number (the sixth field) to 0 on this filesystem's line in /etc/fstab to let the system start up without it. You can then fix it later when the worst of your crisis is over.
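
The fstab change described above can be sketched in Python 3 as a pure text transformation. This is a minimal illustration, not a tool from this thread: the function names disable_automount and disable_device are hypothetical, and "/data" in the example is a made-up mount point, since the thread does not say where /dev/sda1 is mounted. The sketch only returns the edited text; saving it back to /etc/fstab (after keeping a copy of the original) requires a writable root filesystem.

```python
def disable_automount(line):
    """Return an /etc/fstab line with "noauto" added and the fsck pass set to 0.

    Blank lines, comments and lines with fewer than four fields
    (device, mount point, type, options) are returned unchanged.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line
    fields = stripped.split()
    if len(fields) < 4:
        return line
    # The dump (5th) and fsck pass (6th) fields default to 0 when omitted.
    while len(fields) < 6:
        fields.append("0")
    options = fields[3].split(",")
    if "noauto" not in options:
        options.append("noauto")  # do not mount this filesystem at boot
    fields[3] = ",".join(options)
    fields[5] = "0"  # fsck pass 0: skip the boot-time filesystem check
    return "\t".join(fields)


def disable_device(fstab_text, device):
    """Apply disable_automount() to every line whose first field is `device`."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if fields and fields[0] == device:
            line = disable_automount(line)
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, disable_automount("/dev/sda1 /data ext3 defaults 1 2") returns the same fields joined by tabs, with the options changed to "defaults,noauto" and the last field set to 0. Review the output of disable_device() before writing anything back.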

Based on the device name /dev/sda1, I'd say this is _not_ an internal SmartArray hard disk. It could be an external SCSI or FibreChannel disk. If so, what is the type of the Host Bus Adapter (=SCSI or FC card) used to connect it? And what kind of external device contains the disk?

Are all the external storage devices (if any) powered on and in a good state?

MK
batbold_hp
Occasional Advisor

Re: "Repair file system" problem

Thank you MK for your reply

/dev/sda1
/dev/sdb1
...

These disks are located in the HP storage box, which is connected to the server by a fibre-optic cable. They are SCSI hard disks.
These filesystems are listed in /etc/fstab to be mounted at boot. The file system type is ext3.

The HP storage box is now powered on and rebuilding the disks that went offline when the power failure occurred.

Could I replace the local server hard disks (keeping the old ones as a backup), install the OS on the new disks, and then mount the storage box disks?


Thank you again
Matti_Kurkela
Honored Contributor

Re: "Repair file system" problem

The question is, are the server and the storage box talking to each other at the hardware level? If not, OS reinstallation will not help.

Is the driver for the fibre-optic interface card loaded? If so, look into the /proc/scsi directory. There should be a sub-directory for each SCSI-like HBA.

(I *assume* your fibre-optic interface is a FibreChannel card, but as you have not told us the specific types of the card nor the storage box, I cannot be certain of that.)

Within the sub-directory for your fibre-optic HBA, you should find a file that reports the status of the HBA.

If the status is something like "AWAITING_LINK_UP", it means the card cannot communicate with the storage box. If so, inspect the fibre-optic cable: it is usually very thin and gets damaged very easily. Someone might have e.g. stepped on it while working to restore power to the systems.
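
To look for the "AWAITING_LINK_UP" status without opening each file by hand, a short Python 3 sketch can walk the HBA sub-directories of /proc/scsi. Assumptions: the function name find_link_down_hbas is hypothetical, and the exact file names and status wording under /proc/scsi depend on the driver and its version, so a match is only a hint that the link is down. Reading the status file yourself remains the real check.

```python
import os


def find_link_down_hbas(root="/proc/scsi"):
    """Return paths of files in HBA sub-directories of `root` that mention
    AWAITING_LINK_UP, the status text quoted in this thread."""
    hits = []
    for entry in sorted(os.listdir(root)):
        subdir = os.path.join(root, entry)
        if not os.path.isdir(subdir):
            continue  # plain files such as /proc/scsi/scsi are not HBA dirs
        for name in sorted(os.listdir(subdir)):
            path = os.path.join(subdir, name)
            try:
                with open(path, errors="replace") as f:
                    text = f.read()
            except OSError:
                continue  # unreadable entry or nested directory
            if "AWAITING_LINK_UP" in text:
                hits.append(path)
    return hits
```

Calling find_link_down_hbas() with no argument scans the live /proc/scsi; an empty list means no file contained that exact string.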

If the fibre-optic connection is OK, we will need to get more information about the state of the storage box.

Run "cat /proc/partitions" and "cat /proc/scsi/scsi" to get a list of all disk devices the system currently sees. Does it contain _any_ disk devices associated with the storage box?
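
This check can be partly scripted. The following Python 3 sketch (hypothetical function names) parses the text of /proc/partitions and /proc/scsi/scsi and reports whether any sd* block device or any "Direct-Access" SCSI device is present. It assumes, as discussed in this thread, that the internal Smart Array disks appear as cciss/... devices, so only sd* names or Direct-Access entries would point to external storage. Even then, a positive result is only a hint; it does not prove the device belongs to the storage box.

```python
def partition_names(text):
    """Device names listed in /proc/partitions text."""
    names = []
    for line in text.splitlines():
        fields = line.split()
        # Data lines look like: major minor #blocks name
        if len(fields) == 4 and fields[0].isdigit():
            names.append(fields[3])
    return names


def scsi_device_types(text):
    """The "Type:" value of every device listed in /proc/scsi/scsi text."""
    types = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Type:"):
            # e.g. "Type: Sequential-Access ANSI SCSI revision: 03"
            types.append(line[len("Type:"):].split("ANSI")[0].strip())
    return types


def external_disks_visible(partitions_text, scsi_text):
    """True if any sd* block device or Direct-Access SCSI device is present.

    Assumes internal Smart Array disks show up as cciss/... and so are not counted.
    """
    has_sd = any(n.startswith("sd") for n in partition_names(partitions_text))
    has_disk = "Direct-Access" in scsi_device_types(scsi_text)
    return has_sd or has_disk
```

Fed with output like the listings posted later in this thread (only cciss and dm devices, plus a tape drive and an autoloader), external_disks_visible() returns False.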

It might be useful to get the storage box management utility running and verify that the configuration of the storage box is OK.

If more than one physical disk was lost, it is possible that your /dev/sda1 device is not visible because it has uncorrectable damage. In that case, you might have to use the storage box management utility to re-initialize the /dev/sda1 LUN and then restore your data from backups.

If you need your server in GUI mode to run the storage box management utility, you must configure the system to skip mounting the /dev/sda1 for now, as I suggested in my previous reply.

MK
batbold_hp
Occasional Advisor

Re: "Repair file system" problem

Thank you very much for helping,

cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 05 Lun: 00
Vendor: HP Model: Ultrium 1-SCSI Rev: P53W
Type: Sequential-Access ANSI SCSI revision: 03

Host: scsi0 Channel: 00 Id: 05 Lun: 01
Vendor: HP Model: 1x8 autoloader Rev: 1.50
Type: Medium Changer ANSI SCSI revision: 03


cat /proc/partitions
major minor #blocks name
104 0 143367120 cciss/c0d0
104 1 104391 cciss/c0d0p1
104 2 143259637 cciss/c0d0p2
104 16 143367120 cciss/c0d1
104 17 143364028 cciss/c0d1p1
253 0 219400064 dm-0
253 1 67108866 dm-1


The storage box is an HP StorageWorks Modular Smart Array 1000.

The link is made through this adapter:
Adapter 1 LP1050 : PCI Bus #:06 PCI Device #: 0E

The data on the storage box is critical for us, and it has not been completely backed up.

If I reinitialize /dev/sda1, how can I make sure the data on it is not lost?

I just downloaded the "ProLiant Essential Firmware" software. Could it reinitialize the storage box?
batbold_hp
Occasional Advisor

Re: "Repair file system" problem

I tried to edit /etc/fstab to comment out the entries for the external disks, but I am unable to save the changes.
It says "This is read-only system"!
Matti_Kurkela
Honored Contributor

Re: "Repair file system" problem

So your root filesystem is in read-only state.

This command will fix that:
mount -o remount,rw /

Now you should be able to edit /etc/fstab.
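
To confirm whether the root filesystem is read-only before and after the remount, its mount options can be read from /proc/mounts. This is a minimal Python 3 sketch with hypothetical function names. It takes the last "/" entry because an initial "rootfs" entry may precede the real root mount, and the later entry is the one in effect.

```python
def root_mount_options(mounts_text):
    """Mount options of the last "/" entry in /proc/mounts text."""
    options = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device mount-point fs-type options dump pass
        if len(fields) >= 4 and fields[1] == "/":
            options = fields[3].split(",")  # a later entry overrides earlier ones
    return options


def root_is_read_only(mounts_text):
    """True if the root filesystem is currently mounted with the "ro" option."""
    return "ro" in root_mount_options(mounts_text)
```

On the server itself, the text would come from reading /proc/mounts; the result should change from True to False once the remount succeeds.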

If your data is not backed up elsewhere, DO NOT REINITIALIZE any disk devices.

Your /proc/scsi/scsi and /proc/partitions indicate that your system is not seeing the storage box at all.

Possible causes:
- driver not loaded?
- broken cable?
- broken LP1050 card?

The full name of the LP1050 card would be "Emulex LP1050 FibreChannel HBA". Please run "lsmod |grep lpfc". If it returns nothing at all, the driver is not loaded yet; run "modprobe lpfc" to load it. This command may cause a lot of messages to be printed; that is OK. If the messages indicate that the sda disk becomes available, that is good.
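
The lsmod check can also be done from a script. This Python 3 sketch uses a hypothetical helper, module_loaded, which matches the module-name column exactly instead of any line that merely contains "lpfc". The lsmod output is passed in as text, for example captured with subprocess.run(["lsmod"], capture_output=True, text=True).stdout. It only reports the state; loading the driver is still done with modprobe.

```python
def module_loaded(lsmod_text, module):
    """True if `module` is listed in the first column of `lsmod` output."""
    for line in lsmod_text.splitlines()[1:]:  # skip the "Module Size Used by" header
        fields = line.split()
        if fields and fields[0] == module:
            return True
    return False
```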

If the lsmod command indicates the lpfc driver module is already loaded, run "dmesg |grep lpfc" to see the driver's messages.

If you see a message like "lpfc ... Link Up Event ...", it means the LP1050 has fibre-optic connection to the storage box. A message like "lpfc ... Link Down Event ..." means the connection is lost.
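
The Link Up / Link Down check can be scripted as well. This Python 3 sketch (hypothetical last_link_state function) scans dmesg text and reports the most recent lpfc link event. The regular expression only encodes the wording quoted in this reply ("lpfc ... Link Up Event ..."); real messages may differ between driver versions, so treat a None result as "no recognized message", not as proof that the link is down.

```python
import re

# Assumption: link messages contain "lpfc" followed by "Link Up Event" or
# "Link Down Event", as quoted in this thread; exact wording may vary.
LINK_EVENT = re.compile(r"lpfc.*Link (Up|Down) Event")


def last_link_state(dmesg_text):
    """Return "Up", "Down", or None for the most recent lpfc link event."""
    state = None
    for line in dmesg_text.splitlines():
        match = LINK_EVENT.search(line)
        if match:
            state = match.group(1)  # later messages override earlier ones
    return state
```

Because dmesg lists messages in chronological order, the last match reflects the current link state as far as the kernel log shows it.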

If the system worked before, firmware updates are not likely to fix the current problem. Please do not waste any time with "ProLiant Essential Firmware" at this point.

Your system is currently in a special startup repair mode. Your first objective should be to get the system to boot up a little further, even if the applications still won't start. After this, you can use all the diagnostic utilities and verify that the connection to the storage box is working.

Your second objective should be to TAKE A FULL BACKUP of your critical data as soon as the disks can be accessed. This way you don't permanently lose anything if the problem becomes worse.

Getting the applications running should be the last objective.

Please show us your /etc/modprobe.conf and /etc/fstab files; seeing them will help in planning the next actions.

MK
batbold_hp
Occasional Advisor

Re: "Repair file system" problem

Thank you very much MK,

You helped me out of this problem.
I put the local root filesystem into read-write state with the following command:
mount -n -o remount /

After that, the OS started up successfully, but it still could not connect to the storage box. I tried several things without success, and since it was urgent, I couldn't wait.
Then I unplugged the power of the 3rd box, the one that had powered off during the power failure; it might have caused this crash.
Then I switched the entire storage box off and on again, and restarted the local system.
Once the local system was up, I tried to mount the storage box hard drives, and they connected successfully. Everything now seems to work normally except the 3rd box, which is rebuilding its disks from the 1st and 2nd boxes.

What do you think caused this problem?
How can we prevent this kind of problem?
In my opinion, the power failure on the 3rd box was the main cause. We have now connected it to redundant power.

Thank you