HPE EVA Storage
10-22-2007 07:40 AM
Disk Timeouts on CA LUNs
Hi all
We have some serious problems with our EVAs.
This is the setup:
We have 4 EVAs:
SAN6_EVA4000___ESX: 6100 CA (10km) to SAN4_EVA4000___Coloc: 6100
SAN3_EVA5000___Mix: 3028 CA (10km) to SAN4_EVA4000___Coloc: “
SAN5_EVA4100___UX: 6110 No CA
Connected to SAN6_EVA4000___ESX we have VMware ESX 3.0 running Windows 2003 with file server, Exchange, …
Last Monday we received errors from Windows 2003 about disk timeouts greater than 30 sec.
We narrowed it down:
The disk timeouts occurred only on the SAN6 LUNs that are CA replicated.
No problems with the other SAN6 LUNs.
No problems with the SAN3 LUNs that replicate to the same DG on SAN4 as the SAN6 LUNs do.
In the SAN6 Controller Log I see:
11:51:52:053 15-Oct-2007 Controller A upper 0c1b5f0c #8781
A Data Replication Group has transitioned to the Logging state because the alternate Storage System is not accessible.
Corrective action code: 5f More details
11:51:51:780 15-Oct-2007 Controller A upper 0c345f0c #8779
The Data Replication Path between this Storage System and the Peer Storage System has closed, due to slow response on the connection between the specified host port and the Peer Storage System.
Corrective action code: 5f More details
11:51:51:780 15-Oct-2007 Controller A upper 0c18640c #8778
Conditions on the Data Replication Destination Storage System are preventing acceptable replication throughput: Initiating temporary logging on the affected Data Replication Group that is failsafe mode disabled.
Corrective action code: 64
I opened a case last Monday (15 Oct) and escalated it last Friday. Today they tell me the expert is out on holiday until Wednesday. I am speaking with people who know less about EVAs than I do. We have a 24x7 contract with 4h response.
What I know:
Timeouts on EVA LUNs reported by ESX are common; we see them occasionally and I don't worry about them. Timeouts greater than 30 sec that are reported by the guest OS are another issue.
My questions:
- Should I be worried about my data?
- Can somebody explain the controller events?
- How to proceed with HP Support?
- Any other input is highly appreciated.
Thx & rgds Stiwi Wondrusch
2 REPLIES
10-23-2007 05:06 AM
Re: Disk Timeouts on CA LUNs
It looks like you have a problem with the intersite links. I suggest checking the switch counters, too.
The disk timeout for Windows in a SAN or in a VM should be at least 60 seconds:
> reg query "HKLM\SYSTEM\CurrentControlSet\Services\Disk" /v TimeOutValue
> reg add "HKLM\SYSTEM\CurrentControlSet\Services\Disk" /v TimeOutValue /t REG_DWORD /d 60
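When auditing this setting across many hosts, the output of the reg query command above can be collected and parsed. A minimal sketch in Python; the sample output string is an assumption modeled on reg.exe's usual format (a hex REG_DWORD value), not captured from a real host:

```python
import re

# Hypothetical sample of `reg query ... /v TimeOutValue` output;
# 0x3c hex == 60 decimal seconds
sample_output = """\
HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\Disk
    TimeOutValue    REG_DWORD    0x3c
"""

def parse_timeout(reg_output):
    """Extract the disk TimeOutValue in seconds, or None if not present."""
    m = re.search(r"TimeOutValue\s+REG_DWORD\s+0x([0-9a-fA-F]+)", reg_output)
    return int(m.group(1), 16) if m else None

print(parse_timeout(sample_output))  # -> 60
```

Note that the value is reported in hex, so a host already set to 60 seconds shows 0x3c.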
I would also check whether any .ISO files are stored on a VMFS together with VMs and mapped. I've been told that VMware ESX Server can run into SCSI reservation issues if such a file is permanently polled by a VM. The suggestion is to store them locally or on a separate VMFS.
> -How to proceed with HP Support?
Yes, that is a BIG problem these days :-( :-( :-( :-(
I suggest that you involve your management and have them talk to HP's upper layers. Don't get angry with those poor souls at layer 0 + 1.
10-23-2007 06:40 AM
Re: Disk Timeouts on CA LUNs
Hi Uwe
I checked the switch counters:
The ISLs to the colocation site are on port 15.
sanswitch7:admin> portErrShow
frames enc crc too too bad enc disc link loss loss frjt fbsy
tx rx in err shrt long eof out c3 fail sync sig
=====================================================================
0: 1.0g 1.2g 0 0 0 0 0 146 3.9k 0 1 1 0 0
1: 40m 12m 0 0 0 0 0 296 3 0 11 12 0 0
2: 704m 3.2g 0 0 0 0 0 565 12 23 31 32 0 0
3: 1.7g 2.5g 0 0 0 0 0 4.2k 29 11 40 40 0 0
4: 3.8g 1.1g 0 0 0 0 0 183 16k 0 1 1 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 2 0 0
6: 6.7m 3.9m 0 0 0 0 0 5.2k 0 2 6 6 0 0
7: 613m 876m 0 0 0 0 0 225m 0 3 101 290 0 0
8: 3.3g 489m 0 0 0 0 0 155k 9 2 4 4 0 0
9: 696m 808m 0 0 0 0 0 102 9 5 12 12 0 0
10: 725m 1.7g 0 0 0 0 0 662 0 12 6 4 0 0
11: 350m 650m 0 0 0 0 0 939k 0 0 4 4 0 0
12: 203m 481m 0 0 0 0 0 1.1k 36k 7 1 44 0 0
13: 63m 36m 0 0 0 0 0 0 9 2 7 8 0 0
14: 0 0 0 0 0 0 0 0 0 0 0 2 0 0
15: 2.3g 2.5g 0 0 0 0 0 14 4.8k 1 2 2 0 0
sanswitch8:admin> portErrShow
frames enc crc too too bad enc disc link loss loss frjt fbsy
tx rx in err shrt long eof out c3 fail sync sig
=====================================================================
0: 155m 175m 0 0 0 0 0 188 16 0 1 1 0 0
1: 0 0 0 0 0 0 0 0 0 0 0 2 0 0
2: 894m 1.2g 0 0 0 0 0 2.0k 11 15 16 17 0 0
3: 1.7g 2.5g 0 0 0 0 0 12k 20 14 72 72 0 0
4: 2.7g 686m 0 0 0 0 0 155 0 0 1 1 0 0
5: 1.4g 4.1g 0 0 0 0 0 278m 19 0 69 93 0 0
6: 2.7g 1.5g 0 0 0 0 0 577k 0 4 10 14 0 0
7: 530k 528k 0 0 0 0 0 208m 0 0 123 441 0 0
8: 538m 3.5g 0 0 0 0 0 264k 0 3 6 6 0 0
9: 4.1g 1.3g 0 0 0 0 0 0 0 5 12 12 0 0
10: 2.1g 1.9g 0 0 0 0 0 24m 0 14 8 4 0 0
11: 86m 3.3g 0 0 0 0 0 192k 25 0 4 4 0 0
12: 170m 53m 0 0 0 0 0 759k 2 3 15 16 0 0
13: 149m 664m 0 0 0 0 0 0 0 2 5 6 0 0
14: 0 0 0 0 0 0 0 0 0 0 0 2 0 0
15: 3.8g 3.3g 0 0 0 0 0 12 414 0 3 2 0 0
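As a side note, output in the portErrShow format above can be scanned programmatically for the counters that matter most on an ISL (disc c3, link fail, loss sync). A minimal sketch, assuming the Brocade column order shown in the listings above; the sample rows are copied from the output, and the thresholds are illustrative, not official guidance:

```python
import re

# Column names matching the portErrShow header shown above
COLS = ["tx", "rx", "enc_in", "crc_err", "too_shrt", "too_long",
        "bad_eof", "enc_out", "disc_c3", "link_fail", "loss_sync",
        "loss_sig", "frjt", "fbsy"]

SUFFIX = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}

def to_int(tok):
    """Expand abbreviated counters like '3.9k' or '1.2g' to plain integers."""
    m = re.fullmatch(r"([\d.]+)([kmg]?)", tok)
    return int(float(m.group(1)) * SUFFIX.get(m.group(2), 1))

def parse_porterrshow(text):
    """Return {port: {column: value}} for each data row of portErrShow output."""
    ports = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+):\s+(.*)", line)
        if not m:
            continue
        values = [to_int(t) for t in m.group(2).split()]
        if len(values) == len(COLS):
            ports[int(m.group(1))] = dict(zip(COLS, values))
    return ports

sample = """\
 7: 613m 876m 0 0 0 0 0 225m 0 3 101 290 0 0
15: 2.3g 2.5g 0 0 0 0 0 14 4.8k 1 2 2 0 0
"""
stats = parse_porterrshow(sample)
# Flag ports with class-3 discards or sync losses, which on an ISL often
# point at credit starvation or a flapping link
suspects = [p for p, s in stats.items() if s["disc_c3"] or s["loss_sync"]]
print(suspects)  # -> [7, 15]
```

Absolute counter values mean little without knowing when they were last cleared, so comparing two snapshots taken some minutes apart is usually more telling than the raw totals.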
- What about these figures? Do you think they are too high?
- Do you know a command to figure out when they were last cleared?
- We don't have any ISO files on VMFS partitions on the SAN.
- 60 sec timeouts:
I'm speaking with our ESX guys; they have been discussing this 60 sec timeout setting since last Monday as well. Your suggestion implies that a timeout of 30-60 sec is absolutely normal. Is there no way to eliminate these?
- Support:
Management is already involved all the way up. These days that doesn't help anymore.
thx a lot Uwe
rgds Stiwi
The opinions expressed above are the personal opinions of the authors, not of Hewlett Packard Enterprise. By using this site, you accept the Terms of Use and Rules of Participation.
News and Events
Support
© Copyright 2024 Hewlett Packard Enterprise Development LP