StoreVirtual Storage

Cannot log in to management group

 
SOLVED
fusiongroup
Advisor

Hi,

Really hoping someone can give me a few ideas on what to do here... Background: We had a switch fail that two of our four nodes, and our FOM, were connected to.

Once this was fixed, one of the nodes couldn't be connected to: "Login failed. Read timed out". All nodes pinged fine except the one having the problem, which would not respond to jumbo packet pings. It did respond to default size packet pings, so I changed the packet size on the CMC machine to default and successfully logged on. I checked the switch port and jumbo packets were enabled, so I presumed it must be an issue with the NIC on the node. I ran 'repair the system' to take it out of the management group and resync the config. This got stuck for a couple of hours so I cancelled it.
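For anyone wanting to repeat the jumbo packet test, here is a rough Python sketch of how the ping command could be built. It assumes a 9000-byte MTU, IPv4, and either Windows `ping` or Linux iputils `ping`; the function names and the 10.0.0.11 address are made up for illustration and are not from my setup. The don't-fragment flag matters: without it, an oversized ping can be fragmented and still succeed over a path that does not pass jumbo frames.

```python
import platform
from typing import List, Optional

IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def jumbo_ping_command(host: str, mtu: int = 9000,
                       system: Optional[str] = None) -> List[str]:
    """Build a ping command with 'don't fragment' set.

    Assumption: anything that is not Windows is treated as Linux (iputils).
    """
    system = system or platform.system()
    size = str(icmp_payload_for_mtu(mtu))
    if system == "Windows":
        # -f: don't fragment, -l: payload size, -n: number of echo requests
        return ["ping", "-f", "-l", size, "-n", "3", host]
    # -M do: prohibit fragmentation, -s: payload size, -c: count
    return ["ping", "-M", "do", "-s", size, "-c", "3", host]


if __name__ == "__main__":
    # 10.0.0.11 is a placeholder node address.
    print(" ".join(jumbo_ping_command("10.0.0.11")))
```

If the default size ping works but this one fails, something on the path (NIC, switch port, or switch config) is not passing jumbo frames.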

The SAN is still up but now I cannot connect to any of the four managers. They all just sit at "Waiting 60 seconds to connect and log in..." then bring up the same "Login failed. Read timed out" error.

I've tried rebooting the server with the CMC on it multiple times, I've tried switching jumbo frames on/off, and I've tried connecting to each manager in turn. I'm just not getting a response from any of them.

Any tips on how I can get these working? Reboot one/all of the nodes? I don't really want to switch them off/on in case I lose access to the data. These hold all our business critical data so I can't afford for it to be down. Fortunately I do have the option to migrate all VMs to a different SAN; it'll just take days, but I can trash everything and start from scratch if need be.

7 REPLIES
BenLoveday
Valued Contributor

Re: Cannot log in to management group

Hi there,

Just wondering... are these VSAs or physical StoreVirtual nodes?

I have seen it before where the nodes lose their default gateway preventing CMC connectivity (when the CMC is in a different subnet to the nodes).

Also, are you using split networking, e.g. one NIC for management and one for iSCSI? Check these via the console/iLO to make sure they are still configured correctly.

Cheers,

Ben

fusiongroup
Advisor

Re: Cannot log in to management group

Hi,

They are four physical StoreVirtual nodes.

The VM with the CMC installed on it connects via a NIC on the same subnet.

The NICs are bonded for iSCSI. Each of the nodes has a 'mgmt' NIC in it, but when the nodes were installed the guy couldn't get them configured for some reason. I've never been able to configure them from the CMC either... I've just plugged the management NIC in to the network and it's lighting up OK, but it's not getting an address from DHCP.

I installed the CLIQ Shell on my CMC VM and ran a GetGroupInfo against each of the three 'good' nodes, but two fail with 'connection timed out'. One just hangs until I <Ctrl>+C. The 'bad' node tells me the credentials are incorrect, but connecting with the default 'admin/admin' accounts gets results... It tells me the device isn't part of a management group.
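To avoid sitting on a hung query until Ctrl+C, each per-node command can be wrapped in a timeout. A minimal Python sketch of that idea follows; `run_with_timeout` is a hypothetical helper name, and the command it runs here is just a stand-in Python one-liner, not a real CLIQ invocation. Substitute whatever CLI call you actually use.

```python
import subprocess
import sys
from typing import List, Tuple


def run_with_timeout(cmd: List[str], timeout_s: float = 30.0) -> Tuple[str, str]:
    """Run a command and classify the result as 'ok', 'failed' or 'timeout'.

    Returns (status, stdout). The child process is killed if it runs
    longer than timeout_s seconds.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return ("timeout", "")
    status = "ok" if proc.returncode == 0 else "failed"
    return (status, proc.stdout.strip())


if __name__ == "__main__":
    # Stand-in command; replace with the real per-node query.
    stand_in = [sys.executable, "-c", "print('node responded')"]
    print(run_with_timeout(stand_in, timeout_s=5.0))
```

Looping this over the node addresses gives a quick table of which nodes answer, which refuse, and which hang.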

I moved all data off the SAN so I can have a proper play with it today. I'm thinking that I need to get the three 'good' nodes responding properly before I attempt to add the 'bad' node to the management group. I've rebooted all the nodes and now I can connect to the management group in CMC again.

It chucked an error saying that the 'bad' node thinks it is part of the management group, but that the management group disagrees. It has a status of 'Joining/leaving management group, Storage system stat missing'. If I try to log in to it, I get an error: 'Socket is closed'. It still won't respond to a jumbo frame ping.

fusiongroup
Advisor

Re: Cannot log in to management group

Hi,

Just updating this because I think I have figured out what the problem is... I have 'Node 1' plugged into the master switch of a two-switch stack. 'Node 2' (the bad node) is plugged into the master-standby switch.

When looking at the config on the master switch I can see that jumbo frames are configured across all ports on both switches. However, when I check the stack status, I can see that the master switch believes the standby switch is not present. It seems the standby switch is operational, just in a blank state, so jumbo frames are not enabled.

My thoughts are that there is something not quite right with the stacking module/cabling, so the standby switch isn't pulling its config from the master switch correctly. I'll re-seat everything and reboot the standby switch outside of business hours.

fusiongroup
Advisor

Re: Cannot log in to management group

Hi,

I've reloaded the stack and now the 'bad' node is pinging with jumbo frames again. Communication is fully restored.

I still can't connect to the device through the CMC, however. It just errors out with "Socket is closed".

When attempting to connect using the CLI, I am told that it's a credentials issue. I have managed to make a successful connection using the default admin/admin account. I have attempted to change the admin account name and password to match the credentials set on my cluster, but it just errors out saying that the node must be in a management group. I'm getting this same error for most of the commands that I attempt to run.

I have added a matching user to the management group on the CMC in the hope that it will try both sets of credentials to log in to the node, but that hasn't worked either.

BenLoveday
Valued Contributor
Solution

Re: Cannot log in to management group

Hmm, sounds like you might be better off rebuilding the management group completely from scratch if you've been able to migrate the data off...

Resetting each node to defaults is fairly quick from the iLO.

Once all nodes are back at default you should be able to discover them as available devices; from there you can patch them, create a new MG, etc.

Just remember to unmount these datastores from your hosts if using vSphere so the hosts don't flip out :-)

Cheers,
Ben

fusiongroup
Advisor

Re: Cannot log in to management group

I was considering rebuilding the whole thing, but the one disk I can't move off without scheduling weekend downtime is the cluster witness disk.

I think I'll fire up the iLO, reset the bad node and then see if I can pull it back in to the original group. It's currently showing twice in the CMC: once directly under the group, where it shows as 'Joining/Leaving the management group', and once under 'Storage Systems', where it shows as being removed for repair.

I'm hoping that if I reset the thing back to defaults, it will be classed as 'repaired' and the group will pull it back in.

Fingers crossed...

Edit: Well I used the 'Edit Cluster' function to make all the required changes at the same time and now all my volumes are restriping. I've set all the rebuild priority rates to 'Medium' and the ETA is approx 17 hours.

One thing though. I still have the old storage unit in the management group from when I ran the 'Repair Storage System' option on it. I can't contact the device (it lists the MAC address as RIP0-xx:xx:xx:xx:xx:xx) and there are no options to remove it.

Is there any way of getting rid of it?

Thanks.

 

fusiongroup
Advisor

Re: Cannot log in to management group

Well I couldn't find any way of removing the system so, in the end, I changed the cluster witness disk on the server, removed the whole management group and recreated it all from scratch.

VMs are moving back right now.

Thanks for your help :)