MSA Storage

hsg80 problem ?

 
BR139978
New Member


I've got a single HSG80 with four drive shelves/enclosures attached. Of the three servers hanging off it, two are repeatedly reporting bad blocks and controller errors against their SAN partitions, like this:
dmio: Harddisk1 read error at block 36409920: status 0xc000009c

However, the third server has no problems at all. The two faulty servers use drives in different enclosures, and they pass through the same fibre switch as the working one. All three use the same type of KGPSA-FC HBA. Does anyone have any ideas about the root of the problem? Or why the Windows OS should be seeing bad blocks at all?

Thanks,
Ian
Uwe Zessin
Honored Contributor

Re: hsg80 problem ?

There isn't enough information available to tell what the problem is, sorry.

From looking up the error message in the Microsoft Knowledge Base, it sounds like the system is using a host-based mirror. It is possible that a non-redundant storageset (RAID level) is configured on the HSG, so a bad block cannot be repaired by the storage system.

You would have to provide a lot more information for us to help you diagnose this system.

I would connect a terminal server to the HSG80's maintenance port (9600,N,1) and watch for any error messages.

It would be great if you could pull the configuration and attach it as a .TXT file (please, no cut & paste!). The commands are:

> show this_controller full
> show disks full
> show mirrorsets full
> show stripesets full
> show raidsets full
> show concatsets full
> show units full
> show connections
BR139978
New Member

Re: hsg80 problem ?

Apologies, I have attached it now.

The servers with an issue are using fp01r51 and db01r51. The good server is using the fp02r5x arrays. The other arrays are no longer in use. There are two sets of connections for db01 because it has had a dead HBA.

Thanks, Ian
Uwe Zessin
Honored Contributor

Re: hsg80 problem ?

I don't see anything wrong on the HSG. Did you get any unexpected output on the maintenance port?

Apart from not seeing any 'defects', the system is in very bad shape. Disk drives for a RAID set should be split equally over all available controller ports, but here they very often sit on a single SCSI bus. If that bus fails, the entire set is gone.

Same for the spare disks: they should be distributed over all ports. Next, either you have two bad disks or those slots (DISK40500 + DISK60500) are no longer in use.

Configuration data should not be saved on ALL disks in the array. And there are even some unused disks.

I don't know the exact physical layout of your storage, but the attached report might be useful anyway.
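The port-distribution advice can be checked mechanically once the storageset membership is known. This is a minimal sketch, assuming the HSG80 DISKpttll naming seen in this thread (first digit = device port, next two = SCSI target, last two = LUN, so DISK40500 is port 4, target 5, LUN 0) and that the members of each set have been copied by hand from the `show raidsets full` / `show mirrorsets full` output. The function names, set names and disk lists are hypothetical.

```python
from collections import Counter


def parse_disk_name(name: str) -> tuple[int, int, int]:
    """Split a DISKpttll name into (port, target, lun).

    Assumption: one digit for the device port, two for the SCSI target ID,
    two for the LUN, e.g. DISK40500 -> port 4, target 5, LUN 0.
    """
    digits = name.upper().removeprefix("DISK")
    if len(digits) != 5 or not digits.isdigit():
        raise ValueError(f"unexpected disk name: {name!r}")
    return int(digits[0]), int(digits[1:3]), int(digits[3:])


def check_storagesets(storagesets: dict[str, list[str]]) -> list[str]:
    """Warn about storagesets that place more than one member on the same port.

    A single failed SCSI bus then takes out several members at once. Sets with
    more members than device ports cannot avoid sharing, so treat those warnings
    as "spread as evenly as possible" rather than a hard error.
    """
    warnings = []
    for set_name, members in storagesets.items():
        per_port = Counter(parse_disk_name(m)[0] for m in members)
        crowded = sorted(port for port, count in per_port.items() if count > 1)
        if crowded:
            warnings.append(f"{set_name}: several members share port(s) {crowded}")
    return warnings


# Hypothetical example data, not taken from the attached configuration.
example = {
    "RAIDSET_A": ["DISK10000", "DISK10100", "DISK10200"],  # all on port 1
    "RAIDSET_B": ["DISK10300", "DISK20300", "DISK30300"],  # one per port
}
for warning in check_storagesets(example):
    print(warning)  # RAIDSET_A: several members share port(s) [1]
```

The same check applies to spare disks: listing them as one pseudo-set shows at a glance whether they all hang off a single port.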
BR139978
New Member

Re: hsg80 problem ?

I freely admit that the configuration isn't great... the two missing drives are indeed not there. Logging in to the console port didn't produce anything different from what I get in the CLI window. I didn't realise that about the config data; I shall look into it.
Uwe Zessin
Honored Contributor

Re: hsg80 problem ?

You can safely remove the device entries, then:

> delete DISK40500
> delete DISK60500

This will also clear the errors on the port button LEDs.

Unfortunately, the only way to get rid of the configuration data on a storageset is to re-initialize it. Not a good idea without a backup.

About the Windows error messages... is it always the same block on the disk, or are the block numbers scattered all over the disk?
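One way to answer that question is to tally the reported block numbers per disk. This is a minimal sketch, assuming the event log messages have been exported to plain text with one message per line; the regular expression is modelled on the single `dmio` message quoted at the top of the thread, and the other sample lines and all function names are hypothetical. As a rough heuristic only: the same few blocks over and over suggests fixed spots on the media, while numbers scattered across the whole disk suggest looking at the path as well.

```python
import re
from collections import Counter, defaultdict

# Matches messages shaped like the one quoted at the top of the thread:
#   dmio: Harddisk1 read error at block 36409920: status 0xc000009c
# Only "read" errors appear in the thread; \w+ also accepts other operations.
ERROR_RE = re.compile(
    r"dmio: (?P<disk>Harddisk\d+) (?P<op>\w+) error at block (?P<block>\d+): "
    r"status (?P<status>0x[0-9a-fA-F]+)"
)


def tally_blocks(lines):
    """Return {disk: Counter(block -> number of times reported)}."""
    per_disk = defaultdict(Counter)
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            per_disk[match["disk"]][int(match["block"])] += 1
    return per_disk


def summarize(per_disk):
    """Print one line per disk: total errors, distinct blocks, block range."""
    for disk, blocks in sorted(per_disk.items()):
        total = sum(blocks.values())
        print(f"{disk}: {total} error(s) at {len(blocks)} distinct block(s), "
              f"range {min(blocks)}..{max(blocks)}")


# The first line is the message from the thread; the others are hypothetical.
sample = [
    "dmio: Harddisk1 read error at block 36409920: status 0xc000009c",
    "dmio: Harddisk1 read error at block 36409920: status 0xc000009c",
    "dmio: Harddisk1 read error at block 1024: status 0xc000009c",
]
summarize(tally_blocks(sample))
# Harddisk1: 3 error(s) at 2 distinct block(s), range 1024..36409920
```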

The next step would be to check the fibre links on the Fibre Channel switch. I assume you have a Brocade OEMed SANswitch. I would check the error counters before and after a period of high activity. The command is:

switchname:admin> portErrShow
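Comparing the "before" and "after" readings by eye gets tedious with many ports. This is a minimal sketch, assuming the `portErrShow` counters have been transcribed (or parsed by your own code) into a dictionary per port number; the counter names below are placeholders rather than the exact column headings, which may differ between switch firmware versions. The function name is hypothetical.

```python
def counter_deltas(before, after):
    """Return {port: {counter: increase}} for counters that went up.

    Each snapshot maps a port number to {counter name: value}. Ports or
    counters that did not increase are left out; a counter that went down
    (for example because the statistics were cleared between snapshots) is
    ignored rather than reported as a negative delta.
    """
    deltas = {}
    for port, after_counts in after.items():
        before_counts = before.get(port, {})
        increased = {}
        for name, value in after_counts.items():
            diff = value - before_counts.get(name, 0)
            if diff > 0:
                increased[name] = diff
        if increased:
            deltas[port] = increased
    return deltas


# Hypothetical snapshots; counter names are placeholders, not the exact
# portErrShow column headings.
before = {0: {"crc_err": 0, "enc_out": 5}, 1: {"crc_err": 2, "enc_out": 0}}
after = {0: {"crc_err": 0, "enc_out": 5}, 1: {"crc_err": 7, "enc_out": 0}}
print(counter_deltas(before, after))  # {1: {'crc_err': 5}}
```

Only counters that grew during the busy period matter here; an empty result, as Ian reports below, points away from the switch links.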
BR139978
New Member

Re: hsg80 problem ?

OK, there aren't any errors on the fibre switch, so I'm wondering if it's the HBAs.

Ian
CA1004210
Frequent Advisor

Re: hsg80 problem ?

Ian,

I have experienced such errors in the past on Windows hosts with SAN storage allocated to them, especially on the HSG80. I don't recall seeing such errors on EVA-connected Windows nodes, though.

If I recall correctly (it's been a while), we narrowed the issue down to an old firmware revision on the HSG80 drives.

You may want to consider that. Other than that, just the basics (what we did when we were looking at this problem): ensure everything at the server end is up to date, such as Microsoft service packs, patches, system BIOS and ProLiant Support Packs. Most importantly, though, check the firmware of the HSG80 hard drives!
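Before flashing anything, it helps to see which drives are on which revision. This is a minimal sketch, assuming each drive's name, product ID and firmware revision have been copied into a list of tuples from the controller's disk listing (the `show disks full` output requested earlier is assumed here to include the product ID and revision). The function name and all model and revision strings are hypothetical placeholders. It only finds mixed revisions within a model; it cannot tell which revision is newer, so that still has to be checked against HP's firmware release notes.

```python
from collections import Counter, defaultdict


def find_firmware_outliers(disks):
    """Flag disks whose firmware differs from the most common revision for their model.

    `disks` is a list of (disk_name, model, firmware_revision) tuples. This only
    detects mixed revisions; it cannot tell which revision is newer.
    """
    by_model = defaultdict(list)
    for name, model, firmware in disks:
        by_model[model].append((name, firmware))

    outliers = []
    for model, entries in by_model.items():
        majority, _ = Counter(fw for _, fw in entries).most_common(1)[0]
        for name, firmware in entries:
            if firmware != majority:
                outliers.append((name, model, firmware, majority))
    return outliers


# Hypothetical inventory; model and revision strings are placeholders.
inventory = [
    ("DISK10000", "MODEL_X", "REV_B"),
    ("DISK20000", "MODEL_X", "REV_B"),
    ("DISK30000", "MODEL_X", "REV_A"),
    ("DISK10100", "MODEL_Y", "REV_C"),
]
for name, model, firmware, majority in find_firmware_outliers(inventory):
    print(f"{name} ({model}): {firmware}, most {model} disks run {majority}")
# DISK30000 (MODEL_X): REV_A, most MODEL_X disks run REV_B
```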

Hope it helps...
BR139978
New Member

Re: hsg80 problem ?

Thank you for that. I shall look at flashing the disk firmware as soon as I can. At the moment I'm trying to move the files off the disk so that I can format the partition on the HSG80 and see if that improves things.

Ian