MSA Storage

hsg80 problem ?

 
BR139978
Occasional Visitor

I've got a single HSG80 with 4 drive shelves/enclosures attached. Of the 3 servers hanging off it, two are repeatedly reporting bad blocks and controller errors against their SAN partitions, like this:
dmio: Harddisk1 read error at block 36409920: status 0xc000009c

However, the other server has no problems at all. The two faulty servers use drives in different enclosures and pass through the same fibre switch as the working one. All use the same type of KGPSA-FC. Has anyone got any ideas as to the root of the problem? Or why the Windows OS should be seeing bad blocks at all?

thanks,
Ian
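For readers triaging a message like the one quoted above, here is a minimal Python sketch that splits the `dmio` line into its fields and decodes the NTSTATUS value. The regular expression is an assumption based only on the single line in the post, and the lookup table is deliberately tiny; `parse_dmio` is a hypothetical helper name, not part of any Windows tool.

```python
import re

# Assumed message shape, taken from the one line quoted in the post.
DMIO_RE = re.compile(
    r"dmio:\s+(?P<disk>\S+)\s+(?P<op>\w+) error at block "
    r"(?P<block>\d+):\s+status (?P<status>0x[0-9a-fA-F]+)"
)

# Deliberately small table of NTSTATUS names; extend as needed.
KNOWN_STATUS = {
    0xC000009C: "STATUS_DEVICE_DATA_ERROR",
    0xC000009D: "STATUS_DEVICE_NOT_CONNECTED",
}


def parse_dmio(line):
    """Return the fields of a dmio error line, or None if it does not match."""
    m = DMIO_RE.search(line)
    if m is None:
        return None
    status = int(m.group("status"), 16)
    return {
        "disk": m.group("disk"),
        "op": m.group("op"),
        "block": int(m.group("block")),
        "status": status,
        "severity": status >> 30,            # top two bits; 3 means error
        "facility": (status >> 16) & 0xFFF,  # 12-bit facility field
        "code": status & 0xFFFF,             # low 16 bits
        "name": KNOWN_STATUS.get(status, "unknown"),
    }
```

For the line in the post, the status decodes to a device data error, i.e. the storage device reported that it could not return the data for that block.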
Uwe Zessin
Honored Contributor
Solution

Re: hsg80 problem ?

There isn't enough information available to tell what the problem is, sorry.

From looking up the error message in the Microsoft knowledgebase it sounds like the system is using a host-based mirror. It is possible that a non-redundant storage set (RAID level) is configured on the HSG, so a bad block cannot be repaired by the storage system.

You would have to provide a lot more information for us to help you diagnose this system.

I would connect a terminal server to the HSG80's maintenance port (9600,N,1) and see whether any error messages come up.

It would be great if you can pull the configuration and attach it as a .TXT file (please, no cut & paste!). The commands are:

> show this_controller full
> show disks full
> show mirrorsets full
> show stripesets full
> show raidsets full
> show concatsets full
> show units full
> show connections
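If typing these in by hand and saving the output is inconvenient, the capture can be scripted. The following is only a sketch: `send_command` is a hypothetical callable that you would have to supply yourself (for example a wrapper around whatever serial or terminal-server session you use); it takes one CLI command and returns the controller's text output. Nothing here talks to an HSG80 directly.

```python
# The eight commands listed above, in the same order.
HSG_COMMANDS = [
    "show this_controller full",
    "show disks full",
    "show mirrorsets full",
    "show stripesets full",
    "show raidsets full",
    "show concatsets full",
    "show units full",
    "show connections",
]


def render_config(send_command):
    """Run every command through send_command and return one text report.

    send_command is a hypothetical transport function: str -> str.
    """
    parts = []
    for cmd in HSG_COMMANDS:
        parts.append(f"> {cmd}")                   # echo the command as a header
        parts.append(send_command(cmd).rstrip())   # raw controller output
        parts.append("")                           # blank line between sections
    return "\n".join(parts)


def save_config(send_command, out_path):
    """Write the report to a plain .TXT file suitable for attaching."""
    with open(out_path, "w") as fh:
        fh.write(render_config(send_command))
    return out_path
```

Keeping the raw output untouched in a text file preserves the column alignment that cut & paste tends to destroy.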
BR139978
Occasional Visitor

Re: hsg80 problem ?

Apologies - I have it attached.

The servers with an issue are using the fp01r51 and db01r51 arrays. The good server is using the fp02r5x arrays. The other arrays are no longer in use. There are two sets of connections for db01 because it has had a dead HBA.

thanks, ian
Uwe Zessin
Honored Contributor

Re: hsg80 problem ?

I don't see anything wrong on the HSG. Did you get any unexpected output on the maintenance port?

Although I don't see any 'defects', the configuration is in very bad shape. Disk drives for a RAID set should be split equally over all available controller ports, but here they very often sit on a single SCSI bus. If that bus fails, the entire set is gone.

The same goes for the spare disks: they should be distributed over all ports. Next, you either have two bad disks or those slots (DISK40500 + DISK60500) are no longer in use.

Configuration data should not be saved on ALL disks in the array. And there are even some unused disks.

I don't know the exact physical layout of your storage, but the attached report might be useful anyway.
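The port-distribution point can be checked mechanically once the set membership is known. This Python sketch assumes the usual HSG80 device naming, DISKpttll, where the first digit is the controller port, followed by a two-digit target and a two-digit LUN; that reading of the names is an assumption here, and the set names and members in the usage note are made up for illustration, not taken from the attached configuration.

```python
from collections import Counter


def port_of(disk_name):
    """Extract the port digit from a name like DISK40500 (assumed DISKpttll)."""
    name = disk_name.upper()
    if not (name.startswith("DISK") and len(name) == 9 and name[4:].isdigit()):
        raise ValueError(f"unexpected device name: {disk_name!r}")
    return int(name[4])


def ports_per_set(storage_sets):
    """Map each storage set name to a Counter of members per port."""
    return {
        set_name: Counter(port_of(d) for d in members)
        for set_name, members in storage_sets.items()
    }


def poorly_spread(storage_sets):
    """Names of sets where one port (SCSI bus) holds more than one member.

    Losing that single bus would take out several members at once.
    """
    return sorted(
        set_name
        for set_name, counts in ports_per_set(storage_sets).items()
        if counts and max(counts.values()) > 1
    )
```

For example, `poorly_spread({"R1": ["DISK10000", "DISK20000", "DISK30000"], "R2": ["DISK10100", "DISK10200", "DISK20100"]})` flags only the hypothetical set `R2`, because two of its members share port 1.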
BR139978
Occasional Visitor

Re: hsg80 problem ?

I freely admit that the configuration isn't great... the two missing drives are indeed not there. Logging in to the console port didn't produce anything different from what I get from the CLI window. I didn't realise that about the config data; I shall look into it.
Uwe Zessin
Honored Contributor

Re: hsg80 problem ?

You can safely remove the device entries, then:

> delete DISK40500
> delete DISK60500

This will also clear the errors on the port button LEDs.

Unfortunately, the only way to get rid of the configuration data from a storage set is to re-initialize it. Not a good idea without a backup.

About the Windows error messages... is it always the same block on the disk, or are the numbers scattered all over the disk?
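One way to answer the same-block-or-scattered question from an exported event log is to tally the block numbers per disk. A rough Python sketch follows, assuming the log lines have the same shape as the `dmio` message quoted in the original post; `summarize_blocks` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Assumed line shape: "dmio: <disk> <op> error at block <n>: status <hex>"
BLOCK_RE = re.compile(r"dmio:\s+(\S+)\s+\w+ error at block (\d+)")


def summarize_blocks(lines):
    """Per disk: how many errors, how many distinct blocks, and their range."""
    blocks = defaultdict(list)
    for line in lines:
        m = BLOCK_RE.search(line)
        if m:
            blocks[m.group(1)].append(int(m.group(2)))
    return {
        disk: {
            "errors": len(nums),
            "distinct": len(set(nums)),
            "min": min(nums),
            "max": max(nums),
        }
        for disk, nums in blocks.items()
    }
```

Many errors with only one or two distinct blocks point towards a real defect at a fixed location, while a wide spread of block numbers makes a transport problem (links, HBA, switch port) more plausible.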

Next step would be to check the fiber links on the Fibre Channel switch. I assume you have a Brocade OEMed SANswitch. I would check the error counters before and after high activity. The command is:

switchname:admin> portErrShow
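Comparing the counters before and after a load test is easier with a small diff helper. This Python sketch does not parse `portErrShow` output itself, since the column layout differs between firmware versions; it assumes you have already transcribed the two snapshots into dictionaries of the form `{port: {counter_name: value}}`. The counter names used in the usage note (`crc_err`, `enc_out`) are illustrative keys, not guaranteed column headings.

```python
def counter_deltas(before, after):
    """Return only the counters that increased between two snapshots.

    Both arguments are {port: {counter_name: value}} dictionaries.
    """
    changed = {}
    for port, counters in after.items():
        base = before.get(port, {})
        grew = {
            name: value - base.get(name, 0)
            for name, value in counters.items()
            if value > base.get(name, 0)
        }
        if grew:
            changed[port] = grew
    return changed
```

For instance, `counter_deltas({3: {"crc_err": 0, "enc_out": 10}}, {3: {"crc_err": 5, "enc_out": 10}})` reports only the `crc_err` increase on port 3, which is the kind of port worth inspecting first.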