
After kernel upgrade RAID B110i failed (Experts)

 
SOLVED
tomas3man
Occasional Collector


Hello,

I have an HP DL320 G6 (Smart Array B110i SATA) server with two 250 GB SATA disks. This is the master production server. It runs RHEL 5.7 i386.

 

When I "updated" my OS to 5.10, i now see 2 disk devices (sda, sdb) which are seemingly the same. However, they -should- be 1 device and mirrored. Further alarming is that the box booted from /dev/sdb1 (/boot) and that LVM is using /dev/sda2. 

 

I upgraded to 5.10 and forgot to upgrade the hpahcisr kernel module along with it.

 

My biggest concern: when I fix the RAID issue, which device will become the SOURCE (current) copy and which the DESTINATION, potentially overwriting the data that has been written since the upgrade?

 

Assumptions:
* booting back to the old kernel may not even be possible, and I don't want to do that anyway, for fear of how the RAID mirroring will behave (overwriting the newer data)


* I performed an "in-place" upgrade from 5.7 to 5.10 (as opposed to wiping the / volume)


* I believe LVM now sees two copies of the same volume group; it will then, of course, pick one copy and ignore the duplicate.
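If that assumption holds, it should be visible directly: LVM prints a duplicate-PV warning before choosing one device. A rough sketch of what pvs would show here (the PV UUID matches the one in the vgdisplay output further down, printed without dashes; the exact wording varies between LVM versions):

[root@wawa-slev5-b ~]# pvs
  Found duplicate PV Llavx8aQYPITwoHr4re1ssTU4sG0G5a8: using /dev/sda2 not /dev/sdb2
  PV         VG         Fmt  Attr PSize   PFree
  /dev/sda2  VolGroup00 lvm2 a-   232.72G    0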

 

 

I have a slave production server with an identical hardware/software setup. The slave hit the same case, but it did not boot with any kernel, and I had to reinstall it from scratch.

 

I checked the HP website, and it said that I have to download the kernel module as hpahcisr-***.rpm and install it. I did, but by mistake I installed the wrong module: the 5.9 build instead of the 5.10 one. After the reboot the slave refused to boot from any kernel, even though the HP web page suggested it would still boot from the previous kernel.

 

At the BIOS level the RAID was fully working and operational.

 

Now I will explain the issue with /dev/sda.

 

On the reinstalled slave, fdisk -l now displays:

 

Disk /dev/sda: 250.0 GB, 250023444480 bytes
255 heads, 63 sectors/track, 30396 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14       30396   244051447+  8e  Linux LVM

 

 

This is the normal working state: /dev/sda1 is /boot, and /dev/sda2 is the LVM physical volume holding /, /home, and /var.

 

Moving to the master server, which is not working well right now, fdisk shows:

 

Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14       30396   244051447+  8e  Linux LVM

Disk /dev/sdb: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104391   83  Linux
/dev/sdb2              14       30396   244051447+  8e  Linux LVM

 

 

As you can see, the two disks appear as separate devices; the RAID is not working.

 

On the master server, the commands return:

 

[root@wawa-slev5-b ~]# lsmod | grep -i hp
[root@wawa-slev5-b ~]#

 

[root@wawa-slev5-b ~]# rpm -qa | grep hpa
[root@wawa-slev5-b ~]#

 

[root@wawa-slev5-b ~]# find /lib/modules/ | grep hpahci
/lib/modules/2.6.18-238.el5PAE/updates/hpahcisr.ko

 

[root@wawa-slev5-b ~]# ls /lib/modules/
2.6.18-238.el5PAE 2.6.18-274.12.1.el5PAE 2.6.18-371.4.1.el5PAE

 

[root@wawa-slev5-b ~]# uname -r
2.6.18-371.4.1.el5PAE

 

[root@wawa-slev5-b ~]# blkid
/dev/sdb1: LABEL="/boot" UUID="f59b672a-a5c4-4f64-9720-321ed20c057d" SEC_TYPE="ext2" TYPE="ext3"
/dev/mapper/VolGroup00-LogVol01: TYPE="swap"
/dev/mapper/VolGroup00-LogVol04: UUID="a0b19fd1-4488-4a48-9d8b-016e29df91a8" TYPE="ext3"
/dev/mapper/VolGroup00-LogVol03: UUID="1d30b1b2-a7e0-4431-a1c4-500fa9667b32" TYPE="ext3"
/dev/mapper/VolGroup00-LogVol02: UUID="bfef3c3f-7887-4c1a-9ce7-bf916f13b4a4" TYPE="ext3"
/dev/mapper/VolGroup00-LogVol00: UUID="72900b8e-a7a2-4a84-9f22-61f9df664c31" TYPE="ext3"
/dev/sr0: LABEL="RHEL/5.6 i386 DVD" TYPE="iso9660"
/dev/sda1: LABEL="/boot" UUID="f59b672a-a5c4-4f64-9720-321ed20c057d" TYPE="ext3" SEC_TYPE="ext2"
/dev/VolGroup00/LogVol00: UUID="72900b8e-a7a2-4a84-9f22-61f9df664c31" TYPE="ext3"
/dev/VolGroup00/LogVol01: TYPE="swap"

 

[root@wawa-slev5-b ~]# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/mapper/VolGroup00-LogVol02 on /home type ext3 (rw)
/dev/mapper/VolGroup00-LogVol03 on /var type ext3 (rw)
/dev/mapper/VolGroup00-LogVol04 on /opt type ext3 (rw)
/dev/sdb1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

 

 

[root@wawa-slev5-b ~]# cat /etc/fstab
/dev/VolGroup00/LogVol00 / ext3 defaults 1 1
/dev/VolGroup00/LogVol02 /home ext3 defaults 1 2
/dev/VolGroup00/LogVol03 /var ext3 defaults 1 2
/dev/VolGroup00/LogVol04 /opt ext3 defaults 1 2
LABEL=/boot /boot ext3 defaults 1 2
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/VolGroup00/LogVol01 swap swap defaults 0 0

The newest kernel modules for RHEL 5.7 and 5.10 are:
kmod-hpahcisr-PAE-rhel5-1.2.6-16.rhel5u10.i686.rpm
kmod-hpahcisr-PAE-rhel5-1.2.6-16.rhel5u7.i686.rpm
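Before installing either package, it is worth checking which kernel a given RPM actually targets; a generic rpm query like the one below (nothing HP-specific assumed) would have caught the 5.9/5.10 mix-up on the slave:

# list the package contents without installing it; the
# /lib/modules/<kernel-version>/ path shows which kernel it is built for
rpm -qpl kmod-hpahcisr-PAE-rhel5-1.2.6-16.rhel5u10.i686.rpm | grep '^/lib/modules'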

My server was updated but not rebooted.

 

 

[root@wawa-slev5-b ~]# cat /etc/issue
Red Hat Enterprise Linux Server release 5.10 (Tikanga)

vgdisplay shows (excerpt):
--- Physical volumes ---
PV Name /dev/sda2
PV UUID Llavx8-aQYP-ITwo-Hr4r-e1ss-TU4s-G0G5a8
PV Status allocatable
Total PE / Free PE 7447 / 0

 

 

As you can see, LVM is working from /dev/sda2 while /boot is on /dev/sdb1. Which disk gets used appears to be random.

The question remains: if I install the kernel module kmod-hpahcisr-PAE-rhel5-1.2.6-16.rhel5u10.i686.rpm and reboot the server, will the mirror be resynchronized?

 


I want the data to be copied from the latest copy on /dev/sda2 to the stale /dev/sdb2; if it goes in reverse, it will be a disaster.

 

 

 

2 REPLIES
Matti_Kurkela
Honored Contributor

Re: After kernel upgrade RAID B110i failed (Experts)

Before doing anything to fix the RAID issue, make sure you have a good backup of your server. That way, if the RAID recovery goes wrong, the only thing you will lose is some amount of time.

 

The B110i controller is tricky because it is essentially a standard Intel AHCI SATA controller; all the RAID functionality is in the hpahcisr driver. The standard Linux kernel already includes a driver for AHCI SATA controllers: the ahci.ko kernel module. When you look carefully at the Linux OS installation instructions for hpahcisr, they include a step to blacklist the standard ahci.ko module. Looks like you missed that step. Many others have made the same mistake when using the B110i for the first time.

 

Also, when you're updating a system that uses the hpahcisr driver, before installing any kernel updates you must verify that the exact version of the new kernel is supported by the hpahcisr driver, and upgrade the hpahcisr driver RPM if necessary. It may be that even the newest available version of the hpahcisr driver does not support the newest available kernel: in that case, you must delay the kernel update until the corresponding hpahcisr update is released by HP.
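A quick sanity check before rebooting into a newly installed kernel is to confirm that an hpahcisr module actually exists under that kernel's module directory (the same kind of find used earlier in this thread; substitute the kernel version you are about to boot):

find /lib/modules/2.6.18-371.4.1.el5PAE -name 'hpahcisr*'
# no output means that kernel has no RAID driver yet - do not reboot
# into it until the matching hpahcisr RPM has been installed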

 

In your current situation, the standard ahci.ko driver module may be included in the current initrd file, so after installing the updated hpahcisr driver, you should:

1.) add the "blacklist ahci" line to /etc/modprobe.conf if it does not already exist (a combined sketch of all three steps follows below)

2.) verify that any "alias scsi_hostadapter" lines in /etc/modprobe.conf refer to hpahcisr instead of ahci

3.) update your initrd file (see "man mkinitrd")

 

Because the hpahcisr driver is used to access your system disks, the driver must be included in the initrd file, so that it can load before the root filesystem is mounted. The mkinitrd command will transform the scsi_hostadapter aliases in /etc/modprobe.conf to explicit module loading commands in the early-boot script within the initrd file.
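Putting the three steps together, a minimal sketch for this particular system (the kernel version is taken from the uname -r output above; adjust it to the kernel you intend to boot, and keep a copy of the old initrd in case anything goes wrong):

# 1) keep the in-box AHCI driver from claiming the controller
echo "blacklist ahci" >> /etc/modprobe.conf

# 2) confirm the storage alias points at hpahcisr, not ahci
grep scsi_hostadapter /etc/modprobe.conf
# expected: alias scsi_hostadapter hpahcisr

# 3) rebuild the initrd for that kernel, backing up the old one first
KVER=2.6.18-371.4.1.el5PAE
cp /boot/initrd-$KVER.img /boot/initrd-$KVER.img.bak
mkinitrd -f /boot/initrd-$KVER.img $KVER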

 

I haven't needed to do an hpahcisr recovery like yours, but I think I would proceed like this:

1.) take a backup

2.) find out which disk is currently being used (perhaps by looking at the disk LEDs, or with the OS-level check sketched after this list), power down the system and pull the disk that is not currently active (= contains stale data)

3.) restart and do whatever is needed to make the hpahcisr driver load and recognize the existing half of the RAID set.

As soon as the hpahcisr driver is writing to the disk again, it should also update the RAID metadata, so it will know which disk is new and which one is old. After that, you can reinsert the other disk. Now the "good" disk has a newer version of the RAID metadata than the "old" one, so the driver should automatically figure out which way to sync. If it syncs the wrong way... well, that's why you took the backup in the first place.
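For the OS-level check in step 2, the mount and pvs output already posted in this thread is enough; a condensed version might look like this:

# which physical disk holds the mounted /boot
mount | grep /boot
# which physical disk holds the active LVM physical volume
pvs -o pv_name,vg_name

Note that in the situation described above the answer is split: /boot is mounted from /dev/sdb1 while the active LVM PV is /dev/sda2, so both disks have been written to since the mirror broke - all the more reason to take that backup first.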

 

(The BIOS does minimal writes to the disk, so it may not update the RAID metadata unless you explicitly modify the RAID settings - and it certainly does not know anything about any writes done by the standard ahci driver.)

MK
tomas3man
Occasional Collector
Solution

Re: After kernel upgrade RAID B110i failed (Experts)

Hello