- Re: In-guest stun times during failover
12-24-2015 01:39 PM
Hi all,
I was wondering what the usual duration is of the in-guest stuns that you see during failovers: from the moment you lose connectivity to the moment you regain it (or from the moment the guest stops disk activity until it restarts it).
We did some manual failovers and have seen recovery times between 20 and 30 seconds. In my opinion that is a bit much, and I was hoping to hear some examples from your experience.
Thanks
12-31-2015 09:39 AM
Re: In-guest stun times during failover
Not sure what you mean by 'stuns'. I associate 'stun' with the VM freeze that occurs during vMotions, which affects the entire VM. Array failover only affects IO. I have seen iSCSI connections fail for up to 30 seconds during failover of the array controllers. Most of the time it is around 15-20 seconds. Since my SCSI timeouts are longer than that, it isn't an issue for most VMs. However, some MS Failover Cluster nodes freak out if they miss a partner IO, so I have to monitor them during planned failovers. VMware datastores act the same, no issues there at all. I haven't had an unplanned failover yet, but I have tested by pulling controllers and the results are about the same.
Nimble likes to point out that lots of companies upgrade during business hours but I think that is silly. Why run the risk? I'd never schedule a failover during potentially heavy IO periods.
12-31-2015 09:44 AM
Re: In-guest stun times during failover
Thanks Chuck for sharing your experience.
By stuns I mean the IO freeze to the vmdk, not a guest stun.
I also haven't seen any issues with guests, but VMware datastores report All Paths Down (APD) after 10 seconds of connection loss. Other than this event, nothing major. I was just hoping that it might be lower than 10 seconds so as to avoid the APD errors.
I agree with you on not upgrading during business hours.
01-03-2016 08:55 AM
Re: In-guest stun times during failover
Hi Vlad.
You might take a look at a post I made a while back regarding similar problems: A disruptive non-disruptive failover?. There's a lot of different feedback there. It's tied to UCS in particular but it also deals with VMware. One thing I'd note is to make sure your NCM package is current since it will help with path selection and failover to the array. Once I made all of the changes and updated our drivers our guests don't notice the IO pause and just keep running.
Alan
01-03-2016 11:42 AM
Re: In-guest stun times during failover
Hi Alan,
Thanks for chipping in. I have gone through your post in the past and checked the host network from the start. Everything is up to date, and when pinging the Nimble iSCSI interface from the host during a failover, we only see a single ping fail or get delayed. So the switch and NICs work fine. I went through the ESXi logs and it seems that the Nimble doesn't start processing iSCSI immediately, only after 20-30 seconds.
I am waiting on support to get a confirmation.
Vlad
01-03-2016 12:07 PM
Re: In-guest stun times during failover
Sounds good. As I noted in the post I marked as an answer, when Support pulled our logs they found abnormal login timeouts. Installing Cisco's enic drivers helped get those under control, along with keeping our ESXi and NCM patches up-to-date.
Out of curiosity, do you have your Nimble connected to dedicated uplink switches? Our UCS setup uses Appliance Ports, so some minor issues come up because more paths and VLAN routes have to go down during failover of the controllers or UCS FIs. In previous iSCSI setups I've run, we had fully redundant uplink switches, so there was extremely little disruption to the paths even during an array's failover (since there were multiple physical paths to a given interface or VLAN). I've opted not to purchase switches just for this purpose since UCS and Nimble are designed to handle changes gracefully, but that means I've seen slightly longer, though rarely problematic, failovers.
Alan
01-03-2016 12:30 PM
Re: In-guest stun times during failover
We have a 5412zl HP chassis with 2 x 10 gig/8 port modules. These are used only for iSCSI traffic. Each module is on a separate VLAN.
During failover we have not seen any IP disruption, only the occasional delayed ping between the host iSCSI port and the Nimble iSCSI port, which is expected as the target IP floats between controllers.
According to the default VMware timeouts, and to my understanding of those, after 10 seconds of no iSCSI servicing (RecoveryTimeout) the ESXi host drops the iSCSI connection and tries to log in. Every 5 seconds after that, the host attempts to log in again until it succeeds.
We have applied the settings from KB-000087, although only LoginTimeout would have made sense even if we have a small environment, but we have seen no changes in behavior.
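To make the timing above concrete, here is a rough sketch of that retry behavior as I understand it. It is only a model, not VMware's actual logic: the function name `estimated_io_pause` and the parameter `target_ready_s` (seconds until the array answers iSCSI logins again) are made up for illustration, and the 10 second and 5 second defaults are simply the values described above.

```python
import math


def estimated_io_pause(target_ready_s, recovery_timeout_s=10.0, login_retry_s=5.0):
    """Estimate seconds from loss of iSCSI service until a login succeeds.

    Simplified model (an assumption, not VMware's documented algorithm):
    the host makes its first login attempt when the recovery timeout
    expires, then retries at a fixed interval until the target answers.
    """
    if target_ready_s <= recovery_timeout_s:
        # Target is already back by the first attempt, so that attempt succeeds.
        return recovery_timeout_s
    # Number of retries needed after the first attempt to reach or pass
    # the moment the target starts accepting logins.
    retries = math.ceil((target_ready_s - recovery_timeout_s) / login_retry_s)
    return recovery_timeout_s + retries * login_retry_s


if __name__ == "__main__":
    for ready in (8, 20, 22, 30):
        print(f"target ready after {ready:>2}s -> IO pause of about "
              f"{estimated_io_pause(ready):.0f}s")
```

If this model is anywhere near right, a target that needs 20-30 seconds before it accepts logins would give a pause in the same 20-30 second range whatever the host-side retry settings are, which would fit with us seeing no change after applying the KB settings.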
01-03-2016 12:37 PM