
Re: Unresponsive DL100 and Gigabit Ethernet

 
Rod Hewitt
Occasional Advisor

Unresponsive DL100 and Gigabit Ethernet

We run W2K3 SBS and Windows Storage Server.

An ML370 G4 and a DL100 (formerly known as NAS 1500s) are connected to each other using gigabit Ethernet through a Belkin switch.

When using Explorer to copy a 10 GB file from the ML370 G4 to the DL100, the NAS appears to freeze. Any other use of the NAS is impossible: you can't even log in to it. If the transfer is terminated the NAS does, eventually, appear to get back on its feet, but that can take fifteen minutes or more. It makes no difference whether Explorer runs the copy on the ML370 or on the DL100.

We have reverted from using gigabit through the Belkin switch to 100 Mbps connections on the HP ProCurve 2650 switch.

As of now, it appears to be working fine. So why did we have problems when using gigabit?

The critical issues appear to be:

· Size of file
· Use of gigabit
· Use of Belkin switch

I have happily copied large files and done backups over the network in the past. However, doing so now causes major problems.

Is the change from the Belkin to the HP switch itself important? Or is it just that 100 Mbps is throttling back the maximum transfer rate?

Any help would be very much appreciated.

ML370 Event Log Entries
===================

Event Type: Error
Event Source: cpqcissm
Event Category: None
Event ID: 117
Date: 04/02/2005
Time: 10:24:06
User: N/A
Computer: ***SBS
Description:
The driver for device \Device\Scsi\cpqcissm1 detected a port timeout due to prolonged inactivity. All associated busses were reset in an effort to clear the condition.

Event Type: Error
Event Source: cpqcissm
Event Category: None
Event ID: 9
Date: 04/02/2005
Time: 10:24:08
User: N/A
Computer: ***SBS
Description:
The device, \Device\Scsi\cpqcissm1, did not respond within the timeout period.

Event Type: Error
Event Source: NIC Agents
Event Category: Service
Event ID: 1285
Date: 04/02/2005
Time: 10:27:31
User: N/A
Computer: ***SBS
Description:
NIC Agent: Connectivity has been lost for the NIC in slot 3, port 1. [SNMP TRAP: 18006 in CPQNIC.MIB]
5 REPLIES
Ron Kinner
Honored Contributor

Re: Unresponsive DL100 and Gigabit Ethernet

The base DL100 only has 512 MB of memory. I don't know what you have in the 370. I would expect a gigabit interface (if using UDP) could fill available memory faster than the DL100 or the 370 can transfer data to their hard drives, so I expect what you are seeing is a buffer overrun due to a lack of flow control. I'm not familiar enough with any of the equipment you are working with to tell you which is at fault, but I would look for a flow control issue. Perhaps the switch is disabling flow control?
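
As a rough back-of-envelope sketch of that reasoning (not a measurement): the snippet below estimates how long it takes incoming data to fill a buffer when the network delivers faster than the disks can drain it. The 40 MB/s disk write rate is a purely hypothetical figure, the whole 512 MB of RAM is treated as an upper bound on buffer space, and line rates ignore protocol overhead.

```python
def seconds_until_buffer_full(buffer_bytes, ingress_bytes_per_s, drain_bytes_per_s):
    """Return the seconds until the buffer fills, or None if it never fills."""
    net_fill_rate = ingress_bytes_per_s - drain_bytes_per_s
    if net_fill_rate <= 0:
        # Data is drained at least as fast as it arrives.
        return None
    return buffer_bytes / net_fill_rate


MB = 10**6                         # decimal megabytes, to keep the arithmetic simple
GIGABIT = 1000 * MB // 8           # 125 MB/s line rate
FAST_ETHERNET = 100 * MB // 8      # 12.5 MB/s line rate
BUFFER = 512 * MB                  # assumption: all of the DL100's RAM is buffer
DISK_WRITE = 40 * MB               # hypothetical sustained disk write rate

print("gigabit:", seconds_until_buffer_full(BUFFER, GIGABIT, DISK_WRITE))
print("100 Mbps:", seconds_until_buffer_full(BUFFER, FAST_ETHERNET, DISK_WRITE))
```

Under these assumed numbers the buffer fills in roughly six seconds at gigabit line rate, while at 100 Mbps the disks keep up and it never fills, which would fit with the problem disappearing on the slower link.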

Ron
Rod Hewitt
Occasional Advisor

Re: Unresponsive DL100 and Gigabit Ethernet

Ron,

Thank you - that is where I had got to in a sort of intuitive way. We have 4 GB on the ML370.

I am not an Ethernet expert (far from it). So, reading between the lines, we should be using Flow Control and, if we were, this should help with the problems?

I shall be checking that out in the next few minutes.

Rod
Oleg Koroz
Honored Contributor

Re: Unresponsive DL100 and Gigabit Ethernet

Jenik
New Member

Re: Unresponsive DL100 and Gigabit Ethernet

Have you found the solution? I have the same problem on a DL380. Both NICs lose their connectivity.
Rod Hewitt
Occasional Advisor

Re: Unresponsive DL100 and Gigabit Ethernet

Jenik,

We are not really sure!

We switched on flow control (on every server NIC in the network).

We readjusted the schedule for the DL100's Volume Shadow service to avoid busy times on the DL100.

We changed some of the wiring round.

We replaced some of the cables with 'real' Cat 6 ones.

We used MS Network Monitor on various servers to try to discern anything useful.

We changed our backup device (from DAT to Ultrium).

Finally, I noticed that DHCP on the SBS server was not correctly configured. It should have either excluded or reserved the IP address used by the DL100. It didn't. So I set it to reserve the address.

I now suspect that when something on the network asked DHCP for an IP address, it might have been given the one in use by the NAS. This would cause some confusion, with DHCP eventually marking the address as 'bad' because it was already in use. This occurred to me because, after a long period with few problems, a brand new thin client box was added to the network and the problem resurfaced. With hindsight, I now realise that we have had few problems since the last round of thin client installations.
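
A minimal sketch of that check, assuming a single DHCP scope described by a start and end address: a statically assigned address is at risk if it lies inside the scope and is neither excluded nor reserved. The function name and every address below are hypothetical, chosen only for illustration, and are not taken from our network.

```python
import ipaddress


def can_be_leased(static_ip, scope_start, scope_end, exclusions=(), reservations=()):
    """Return True if the DHCP scope could lease static_ip to another client."""
    ip = ipaddress.ip_address(static_ip)
    start = ipaddress.ip_address(scope_start)
    end = ipaddress.ip_address(scope_end)
    if not (start <= ip <= end):
        return False  # outside the scope, so DHCP never offers it
    for low, high in exclusions:
        if ipaddress.ip_address(low) <= ip <= ipaddress.ip_address(high):
            return False  # inside an exclusion range
    reserved = {ipaddress.ip_address(r) for r in reservations}
    return ip not in reserved  # a reservation pins the address to one client


# Hypothetical addresses: the NAS sits inside the scope with no protection.
nas = "192.168.16.50"
print(can_be_leased(nas, "192.168.16.10", "192.168.16.200"))
print(can_be_leased(nas, "192.168.16.10", "192.168.16.200", reservations=[nas]))
```

The first call reports that the address could be handed out to a new client such as a thin client box; the second shows that reserving it removes the risk.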

I have now updated the DL100 to SR5.5. (Just this weekend!) So we now await any further incidents.

We did find that SNMP.exe (a service) on the DL100 had a tendency to run at 50% processor after a reboot. Restarting the service helps.

Enjoy yourself picking from that smorgasbord of options. And the best of luck. Please do post again if you get a definitive answer or if I can be of more help.