02-28-2011 11:50 AM
Collapsing a multi-site SAN down to one vlan
What they're suggesting, since I already have the storage nodes split between two sites (they show up in A/B/A/B/A/B/A/B order when I edit the cluster in the CMC), is that I can shut down all the VMs, log out the iSCSI sessions from the cluster, delete the second VIP, and reassign all the storage nodes in site B to the subnet for site A. It won't change the storage order in the cluster; things will just start communicating, and the VIP failover is supposedly fast enough to avoid problems. We are running a virtualized FOM in a third site.
We're using vSphere 4 Update 2 plus patches, and we're running into the iSCSI software initiator issue: no routing between subnets from the software initiator. And since each iSCSI target is assigned a single storage node as a gateway (and thus lives in a single subnet), losing a single connection from any VMware host will cause it to lose its path to any storage on the corresponding subnet. (I guess we could have two paths per subnet, but I haven't tested that configuration yet, and it would double my copper SFP and switch port count.) Without the ability to route across subnets with iSCSI (or doubling my connection count, if that would even work), any storage switch reboot or failure (there are two in each site) is a single point of failure. That's no good. A single switch failure is more likely than a site failure, honestly.
What I was told is that for environments like ours (multiple buildings, single campus, VMware hosts in clusters distributed between the sites), they're moving away from the multi-subnet model in 9.0 anyway, and we'd be better off just switching now.
It makes sense to me. It certainly sounds better than my current situation, or doubling the number of vmnics assigned to iSCSI from 2 to 4, or doing the unsupported VMware "manually add a route for the iSCSI interfaces" trick outlined under "Third Topic" in the Multivendor Post on using iSCSI with VMware vSphere document. http://virtualgeek.typepad.com/virtual_geek/2009/09/a-multivendor-post-on-using-iscsi-with-vmware-vsphere.html
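For reference, the unsupported workaround mentioned above boils down to adding a static vmkernel route on each host so iSCSI traffic can cross subnets. A minimal sketch, with made-up subnet addresses and gateway (your addressing will differ):

```shell
# Assumed addressing: iSCSI subnet A = 10.0.1.0/24, subnet B = 10.0.2.0/24,
# router reachable from the host's subnet-A vmkernel port at 10.0.1.254.

# List the current vmkernel routing table first.
esxcfg-route -l

# Add a static route so vmkernel traffic destined for subnet B leaves
# via the gateway on subnet A. This is the unsupported part: the
# software iSCSI initiator otherwise won't route between subnets.
esxcfg-route -a 10.0.2.0/24 10.0.1.254
```

This has to be repeated on every host, and it's exactly the configuration VMware doesn't support, which is part of why collapsing to one subnet looks attractive.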
Thanks in advance for any opinions or suggestions.
02-28-2011 05:06 PM
Re: Collapsing a multi-site SAN down to one vlan
Do you have both storage subnets "stretched"?
When I set up a multi-site SAN, I configure two stretched VLANs, so BOTH storage subnets are accessible at both "sites".
Then you add both VIPs to the vSphere hosts.
This way, if you have a local path problem, it simply fails over to the WAN site...
Maybe I'm not seeing your issue properly. Do you have a network diagram of your iSCSI environment that you could post?
02-28-2011 05:58 PM
Re: Collapsing a multi-site SAN down to one vlan
However, while there are two VIPs - one at each site - SAN/iQ assigns only one gateway for any iSCSI target/volume, on a single storage node at any given point.
If the storage node holding the gateway is in site A on subnet A, and any vSphere host in any location loses its connection to subnet A due to a cable failure or a switch failure, there's nothing to fail over to for that target. The vSphere host's connection to subnet B cannot talk to an iSCSI target gateway assigned in subnet A, because vSphere's iSCSI software initiator doesn't support routing, and no gateway is assigned for that particular volume in subnet B on any of the storage nodes there. And nothing in that situation would make SAN/iQ reassign the gateway for that target to another storage node: it's only a problem with a single vSphere host or a single switch, neither of which will disrupt array communications.
The only way to work around that - aside from the unsupported routing trick, or doubling the number of connections (which I haven't tested yet) so I have two connections to each subnet, one per switch per VLAN per vSphere server, four total per server - is to drop down to one subnet.
The way it's set up currently, it can work in the event of a complete site failure. But a single switch failure/reboot is very disruptive.
Does that make sense?
03-01-2011 06:34 AM
Re: Collapsing a multi-site SAN down to one vlan
What if you set up one vSwitch for iSCSI traffic with two pNICs:
NIC 1 goes to switch A; its port is a trunk, and both storage VLANs are allowed.
NIC 2 goes to switch B; its port is a trunk, and both storage VLANs are allowed.
You set up two iSCSI vmkernel ports, one in each subnet, and tag each portgroup for the appropriate VLAN.
NIC 1 is the primary NIC for (iSCSI vmkernel 1) storage subnet "A"; NIC 2 is the standby for storage subnet "A".
NIC 2 is the primary NIC for (iSCSI vmkernel 2) storage subnet "B"; NIC 1 is the standby for storage subnet "B".
This way you would have both VIPs on each vSphere host, 2 physical NICs into redundant switches. Each host has redundant paths to BOTH storage subnets.
You have BOTH VIPs entered into ALL vSphere hosts. The Storage stack will use the 2 storage paths for failover, the network stack will provide failover from link, nic, or switch failure.
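From the vSphere 4 CLI, that layout would look something like the sketch below. All the names and IDs here (vSwitch1, vmnic2/vmnic3, VLANs 101/102, the vmk addresses, adapter vmhba33) are assumptions for illustration, not anything from your environment:

```shell
# One vSwitch for iSCSI, with both pNICs as uplinks (the physical
# switch ports are trunks carrying both storage VLANs).
esxcfg-vswitch -a vSwitch1
esxcfg-vswitch -L vmnic2 vSwitch1
esxcfg-vswitch -L vmnic3 vSwitch1

# Two vmkernel portgroups, one per storage subnet, each tagged
# with its own VLAN.
esxcfg-vswitch -A iSCSI-A vSwitch1
esxcfg-vswitch -v 101 -p iSCSI-A vSwitch1
esxcfg-vmknic -a -i 10.0.1.11 -n 255.255.255.0 iSCSI-A

esxcfg-vswitch -A iSCSI-B vSwitch1
esxcfg-vswitch -v 102 -p iSCSI-B vSwitch1
esxcfg-vmknic -a -i 10.0.2.11 -n 255.255.255.0 iSCSI-B

# Bind both vmkernel ports to the software iSCSI adapter.
# (The per-portgroup NIC failover order is set in the vSphere
# Client under the portgroup's NIC Teaming tab.)
esxcli swiscsi nic add -n vmk1 -d vmhba33
esxcli swiscsi nic add -n vmk2 -d vmhba33
```

Then enter both VIPs as dynamic discovery targets on the software iSCSI adapter on every host.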
I'll have to dig more into what you were saying about "moving away from the multi-subnet model in 9.0". I'm not sure how they are accomplishing this, unless they have built better/faster VIP failover algorithms. In the past, the VIP failover was too slow for database apps (SQL/Exchange) and VMs, and could cause crashes during site link failures, as they would time out before the VIP failed over. Maybe this has changed in 9.0??
Any HP/P4000 moderators care to comment on this? Sounds like I've got some reading to do! ;)
03-01-2011 07:00 AM
Re: Collapsing a multi-site SAN down to one vlan
I actually think that config would work. I was somewhat caught up in the "configure one NIC, disable the other on each portgroup" advice. You know, ours may be even simpler: it's two subnets in a single VLAN, so we don't need trunks or tagging at all.
I have to think through what to do on the Linux side. Our only non-VMs on this are going to be a pair of Novell SLES 10 SP3 / OES2 file servers in a cluster, and I need to figure out whether I can get the correct standby behavior out of the software iSCSI initiator, DM-Multipath, and the two NICs.
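For what it's worth, on the SLES side I'd expect it to come down to open-iscsi discovery against the VIP plus an active/passive DM-Multipath policy. A rough sketch, with an assumed VIP address and a deliberately minimal multipath.conf (the exact settings for P4000 volumes would need testing):

```shell
# Assumed VIP: 10.0.1.100. Discover the cluster's targets and log in
# with open-iscsi.
iscsiadm -m discovery -t sendtargets -p 10.0.1.100:3260
iscsiadm -m node --login

# Minimal /etc/multipath.conf sketch: group paths for failover
# (active/passive) rather than spreading I/O across both, and queue
# briefly while a path comes back.
cat > /etc/multipath.conf <<'EOF'
defaults {
    path_grouping_policy  failover
    no_path_retry         12
}
EOF

/etc/init.d/multipathd restart
multipath -ll    # check for one active and one standby path per LUN
```

That should give the same "one primary path, one standby" behavior as the vSphere portgroup setup, but I'd want to pull a cable and watch `multipath -ll` before trusting it in the cluster.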
In terms of VIP failover - how long are we talking? It looked to me like a VIP move only takes a few seconds if the VIP-holding storage node goes down but the site stays up - about the same amount of time it would take the VRRP gateway on our Brocade 10GbE switches to fail from one to the other during a switch outage. When I was reassigning gateways on storage nodes during a planned shutdown, I did the VIP-holder last, but it moved pretty much instantly.
In the case of a site failure with a FOM configured, I'd expect the FOM to intervene and the VIP to come back more quickly as well. They did say to retain our FOM in the new configuration as well.
03-03-2011 06:22 AM
Re: Collapsing a multi-site SAN down to one vlan
Did you get anything from HP indicating that they had recently improved the VIP failover?
I helped a customer (about a year ago) migrate TO a multi-site SAN because the VIP failover was too slow, and the only way we maintained high availability (particularly for Exchange) was the multi-VIP configuration.
Just wondering if the improvements they made in 9.0 should change my thinking on this going forward.
03-03-2011 06:31 AM
Re: Collapsing a multi-site SAN down to one vlan
Single link/port failures or storage switch failures are more likely. And there's only a 1-in-8 chance that a storage node that fails is the VIP holder.
So if I'm going to have to compromise anyway, I may as well cover the most common outages smoothly. In a major disaster we still have replicated data; it may require a little intervention to bring some services back. Right now, we're guaranteed it'll require intervention.
I do wish there was a feature where you could specify that you don't want a few critical volumes to be gatewayed through the same storage node holding the VIP. That way, if the VIP-holder fails, those critical volumes are still running since there's an established path, and if one of those fails, you're more likely to have a running VIP to ask for a new gateway assignment. There's probably a downside to that, but it would be handy.