
fsck.ext3 error upon booting

nabeelhassan
Occasional Contributor


Hi all,

 

I have a ProLiant DL380 G7 server. We were getting I/O errors whenever we performed any action on it, so we restarted it. Now, while booting, it shows the error in the attached screenshot.

 

Below is the storage information from the diagnostics output:

 

Hard Drive 1, Storage Controller in Slot 0           300.0 GB,10k RPM,SAS,HP EG0300FAWHV

Hard Drive 2, Storage Controller in Slot 0           300.0 GB,10k RPM,SAS,HP EG0300FAWHV

Hard Drive 3, Storage Controller in Slot 0           300.0 GB,10k RPM,SAS,HP EG0300FAWHV

Hard Drive 4, Storage Controller in Slot 0           300.0 GB,10k RPM,SAS,HP EG0300FAWHV

Hard Drive 5, Storage Controller in Slot 0           300.0 GB,10k RPM,SAS,HP EG0300FAWHV

Hard Drive 6, Storage Controller in Slot 0           300.0 GB,10k RPM,SAS,HP EG0300FAWHV

Hard Drive 7, Storage Controller in Slot 0           300.0 GB,10k RPM,SAS,HP EG0300FAWHV

Logical Drive 1, Storage Controller in Slot 0       300.0 GB, RAID 1 - OK

Logical Drive 2, Storage Controller in Slot 0       1.2 TB, RAID 5 - Failed

 

Please let me know what to do.

 

Thanks

Nabeel

 

Matti_Kurkela
Honored Contributor

Re: fsck.ext3 error upon booting

> Logical Drive 2, Storage Controller in Slot 0       1.2 TB, RAID 5 - Failed

 

With Linux device names, logical drive 1 would be /dev/cciss/c0d0; logical drive 2 would be /dev/cciss/c0d1.
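As a quick illustration of that naming rule, here is a small POSIX shell function. It assumes a single Smart Array controller (so the controller part is always c0) and the legacy cciss driver; `cciss_dev` is a hypothetical helper name, not an HP tool.

```shell
#!/bin/sh
# Hypothetical helper: map a Smart Array logical drive number, as printed by
# the diagnostics (1-based), to its legacy cciss device name.
# Assumes one controller, so the controller part is always "c0".
cciss_dev() {
    # $1 = logical drive number (1-based)
    # $2 = optional partition number
    disk="/dev/cciss/c0d$(( $1 - 1 ))"
    if [ -n "$2" ]; then
        echo "${disk}p$2"
    else
        echo "$disk"
    fi
}

cciss_dev 1      # prints /dev/cciss/c0d0
cciss_dev 2      # prints /dev/cciss/c0d1
cciss_dev 2 2    # prints /dev/cciss/c0d1p2
```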

 

If the hardware RAID controller reports a logical drive as "Failed", any attempts to read that logical drive through the operating system are likely to fail too. In your screenshot, this seems to be exactly what is happening.

 

A RAID 5 set allows one physical drive to fail without causing data loss. But if another drive fails before the first failed drive is replaced and the recovery is complete, the RAID 5 set will fail and you will lose the data in it unless you have it backed up somewhere else.
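The arithmetic can be checked against the diagnostics above. RAID 5 spends one drive's worth of space on parity, so usable capacity is (n - 1) x drive size. With 7 drives of 300 GB and 2 of them in the RAID 1 set, the remaining 5 drives give (5 - 1) x 300 GB = 1200 GB, which matches the reported 1.2 TB. That drive split is an inference from the listed sizes, not something the diagnostics state outright. The helper names below are hypothetical, and "Degraded" is this sketch's own label rather than the controller's wording.

```shell
#!/bin/sh
# Hypothetical helpers illustrating RAID 5 capacity and fault tolerance.

# Usable capacity: one drive's worth of space holds parity.
# $1 = number of drives in the set, $2 = size of each drive in GB
raid5_usable_gb() {
    echo $(( ($1 - 1) * $2 ))
}

# State of the set for a given number of failed drives.
# "Degraded" is this sketch's label; the controller's own wording may differ.
raid5_state() {
    # $1 = number of failed drives
    if [ "$1" -eq 0 ]; then
        echo "OK"
    elif [ "$1" -eq 1 ]; then
        echo "Degraded"    # data intact, but no redundancy left
    else
        echo "Failed"      # data lost unless restored from backup
    fi
}

raid5_usable_gb 5 300   # prints 1200 (i.e. 1.2 TB)
raid5_state 1           # prints Degraded
raid5_state 2           # prints Failed
```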

 

If Linux has been installed on /dev/cciss/c0d0 and c0d1 is only used for application/data storage, you could enter the root password at the prompt shown in your screenshot to access the system in single-user mode.

Then, first make sure the root filesystem is writable:

mount -o remount,rw /

Then you can use your favourite editor to comment out the /dev/cciss/c0d1p1 and /dev/cciss/c0d1p2 lines in /etc/fstab and reboot the system. That might allow the system to boot up normally, although obviously with those two partitions of the c0d1 logical drive missing.
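If you prefer not to edit the file by hand, the same change can be sketched with GNU sed. This assumes the fstab entries name the devices directly (/dev/cciss/c0d1p1, /dev/cciss/c0d1p2); entries written as UUID= or LABEL= would not match and need editing by hand. `disable_c0d1_mounts` is a hypothetical function name, and `-i.bak` leaves a backup copy next to the file.

```shell
#!/bin/sh
# Hypothetical helper: prefix "#" to the fstab lines for /dev/cciss/c0d1p1 and
# /dev/cciss/c0d1p2 so they are skipped at boot. Lines already commented out
# do not match, so running it twice is harmless.
# Requires GNU sed for in-place editing; the original is kept as <file>.bak.
disable_c0d1_mounts() {
    fstab="${1:-/etc/fstab}"   # defaults to the real fstab
    sed -i.bak -e 's|^[[:space:]]*/dev/cciss/c0d1p[12][[:space:]]|#&|' "$fstab"
}
```

In single-user mode, after remounting / read-write, run `disable_c0d1_mounts` and check the result with `grep c0d1 /etc/fstab` before rebooting.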

 

The next steps would be to use the ACU (the SmartArray configuration utility; either "hpacucli" or its graphical version with the WebGUI) to make sure all the physical drives are OK and re-initialize the failed logical drive so that it will become accessible to the OS again. Then re-partition the c0d1 logical drive, run "mkfs" on the c0d1p1 and c0d1p2 partitions, uncomment the /etc/fstab entries, mount the partitions again (now totally empty), and start restoring the data from backups.
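A rough outline of the non-destructive parts of those steps, under stated assumptions: the `hpacucli` status commands in the comments are read-only checks, while re-initializing the logical drive, re-partitioning and `mkfs` are deliberately left out because they depend on your layout and destroy data. `enable_c0d1_mounts` is a hypothetical helper that reverses the fstab commenting-out described earlier, assuming the same device-name entries and GNU sed.

```shell
#!/bin/sh
# Before touching anything, check drive and array state (read-only), e.g.:
#   hpacucli ctrl all show config
#   hpacucli ctrl slot=0 pd all show status
#   hpacucli ctrl slot=0 ld all show status
#
# After the logical drive is usable again and c0d1p1/c0d1p2 have been
# recreated and formatted (e.g. mkfs.ext3, assuming they were ext3 before),
# restore the fstab entries that were commented out for the emergency boot.

# Hypothetical helper: strip the leading "#" from the c0d1 fstab lines.
# Requires GNU sed; the previous version is kept as <file>.bak.
enable_c0d1_mounts() {
    fstab="${1:-/etc/fstab}"
    sed -i.bak -e 's|^#\([[:space:]]*/dev/cciss/c0d1p[12][[:space:]]\)|\1|' "$fstab"
}
```

After `enable_c0d1_mounts`, running `mount -a` mounts the (now empty) partitions so the restore from backup can begin.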

 

Oh... and the last step would be to find out what went wrong. Did some of the physical drives fail at some point? Was there some software capable of monitoring SmartArray RAID controllers installed? Was it not configured to send disk failure alerts to the correct place? Did someone delay replacing the first failed drive until another disk failed?
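On the monitoring question: even without HP's own monitoring agents, a crude check can be scheduled from cron. This is an assumption-laden sketch, not an HP tool. `array_alert` is a hypothetical name; it scans status text for "Failed" (as in the diagnostics above) plus "Rebuild" and "Interim Recovery", which are assumed to cover a degraded or recovering array. Delivering the alert (mail, a monitoring system) is left to you.

```shell
#!/bin/sh
# Hypothetical check: read controller status text on stdin and print any line
# that suggests a failed or degraded array. Like grep, it exits 0 when such a
# line was found and non-zero otherwise.
# The keywords are assumptions; adjust them to what your tools report.
array_alert() {
    grep -Ei 'fail|rebuild|interim recovery'
}

# Example cron job body (assumes hpacucli and a working local mailer):
#   if hpacucli ctrl slot=0 ld all show status | array_alert > /tmp/raid.txt; then
#       mail -s "RAID problem on $(hostname)" root < /tmp/raid.txt
#   fi
```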

MK