StoreEver Tape Storage

TLB Exception causes E1200 to reboot

 
SOLVED
Ian Grobler
Frequent Advisor

TLB Exception causes E1200 to reboot

Recently I have seen a strange issue on two separate sites with similar infrastructure.
Scenario: MSL5030 LTO Library (FW 0430) with an LTO Drive (FW E38W) connected to a SAN via an onboard E1200 (FW 5606). While running a backup (Data Protector 5.1 with necessary patches) an error occurs on the tape, which raises an exception on the Tape Library. The E1200 interprets this exception as some sort of hardware error and reboots itself.

Surely this is incorrect? A problem with a faulty tape should not cause a re-initialize of the complete E1200. The exception being raised is internal between the E1200 and MSL5030(?) so I cannot see software being involved unless DP5.1 is causing this TLB exception on the Library.
Recently another site with a slightly older version of E1200 Firmware has done the same thing after a couple of errors on a tape. Any ideas? Sample of the E1200 event logs attached.
22 REPLIES
Marino Meloni_1
Honored Contributor

Re: TLB Exception causes E1200 to reboot

I have two suggestions:

First, log on to the serial port and reset the NSR to its default values. This will reset some fields inside the unit that the BIOS uses to store data.

Second, try downgrading the firmware to the previous version and see whether that solves the problem.
Ian Grobler
Frequent Advisor

Re: TLB Exception causes E1200 to reboot

Thanks, I have been waiting for some time for the customer to let me reset the E1200. As we are dealing with the same issue/exception on different versions of firmware, I don't think changing firmware will make a difference. I will reset to defaults and set up from scratch to see if that makes a difference.
Ian Grobler
Frequent Advisor

Re: TLB Exception causes E1200 to reboot

I have reset the E1200 to defaults last week and set it up from scratch again. As soon as a tape error occurs I will be able to tell if this has helped (Let's hope so!).
Marino Meloni_1
Honored Contributor

Re: TLB Exception causes E1200 to reboot

Some new information I received: if your NSR is connected to the LAN, some monitoring software can cause the NSR to reboot after exceptions. To identify whether this is the cause of your reboots, leave the LAN cable disconnected, or connect it directly to a PC with a crossover cable.
Ian Grobler
Frequent Advisor

Re: TLB Exception causes E1200 to reboot

Yes, the E1200s in question are connected to their respective LANs via Ethernet with a fixed IP configured. I am not aware of any tools which may be actively monitoring these, but this definitely cannot be ruled out. I am still awaiting a tape issue after the reset to see how it goes. Will let you know when/if this occurs. Your advice is much appreciated!

Re: TLB Exception causes E1200 to reboot

All those routers seem to reboot if they get a certain kind of packet. I first suspected SNMP, but this also happens when they are plugged in to a disabled port on our Cisco switches.
(The idea was to keep the port on the switch disabled and simply enable it when needed to manage the NSRs.)
The only solution is to pull the Ethernet cable and plug it in when you need to manage the router.
This problem has existed for as long as I have used these routers, and HP (or Crossroads) has not fixed it, so going back to an older version is a waste of time!

Bernhard
Ian Grobler
Frequent Advisor

Re: TLB Exception causes E1200 to reboot

Unfortunately resetting the E1200 to defaults and setting up from scratch did not help - as soon as there were errors writing to tape during a backup, a TLB exception occurred on the MSL5030 and the E1200 rebooted. I have seen this on two independent systems again this week. I have not tried removing the E1200 from the LAN as yet - I would lose all remote manageability if I did this at the moment.
Even if there are some errors on a tape or a drive is acting up, it should NOT cause the E1200 to reboot - this just does not make sense. I have attached an E1200 report from one of the two problem sites. Any advice would be most welcome.
David Ruska
Honored Contributor
Solution

Re: TLB Exception causes E1200 to reboot

1) You may want to update to 5.6.69 firmware, as it has some fixes for assertions and TLBs.

---
Fixed between versions 5606 and 5669
Several fixes for Assertions and TLBs

FC ports with nothing attached will no longer fill up the Event Log

The default Target Reset mode is now Alternate

Several SCSI and fibre protocol fixes to improve reliability

Fixed an issue with tape spanning
---

2) If you are still having problems after that, then see if you can eliminate them by disconnecting the LAN (as a short term solution).

3) If the product is still under support, call your HP support center and open a case to investigate root cause.
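
The thread writes the same router firmware level in several ways (5606, 5.6.06, 5.6.69). Below is a minimal Python sketch for comparing such strings before deciding on an update. It assumes the compact four-digit form means major, minor and a two-digit build, which is inferred from how the versions are written in this thread and is not a documented HP format. `parse_fw` and `needs_update` are hypothetical helper names.

```python
import re
from typing import Tuple


def parse_fw(version: str) -> Tuple[int, int, int]:
    """Parse '5606' or '5.6.69' into (major, minor, build).

    Assumption: a compact 4-digit string is major, minor, 2-digit build.
    """
    v = version.strip()
    if re.fullmatch(r"\d{4}", v):
        return int(v[0]), int(v[1]), int(v[2:])
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", v)
    if not m:
        raise ValueError(f"unrecognized firmware version: {version!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3))


def needs_update(current: str, target: str) -> bool:
    """True if the current firmware is older than the target firmware."""
    return parse_fw(current) < parse_fw(target)


if __name__ == "__main__":
    for fw in ("5.4.25", "5606", "5.6.69"):
        print(fw, "-> update to 5.6.69 suggested:", needs_update(fw, "5.6.69"))
```

Comparing tuples instead of raw strings avoids treating "5606" and "5.6.06" as different levels.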
The journey IS the reward.
David Ruska
Honored Contributor

Re: TLB Exception causes E1200 to reboot

Ian,

We took a look at the event logs you provided. There were two snippets that showed one TLB each on two routers back in January, one on 5.4.25, and another on 5.6.06. We'll check to see if there's anything the code addresses might reveal, but without a previous boot trace there's not much to go on.

The more recent report page you provided does not show any TLBs or reboots since 2/10/05 (start of the log). It does show two SCSI bus resets after a SCSI "write buffer" command (3C) to the drive failed to be sent. There are also 3 errors returned from a drive at different times (2 hardware errors, one medium (tape) error).

Can you provide us the report page for the other router?
The journey IS the reward.
David Ruska
Honored Contributor

Re: TLB Exception causes E1200 to reboot

Bernhard,

You mention that HP has not been able to resolve your issue. Have you provided trace logs to HP for analysis?

If you update to 5.6.69 and still see issues related to network connectivity, I'll be happy to have the engineering team review the router logs to see if we can determine the cause.
The journey IS the reward.
David Ruska
Honored Contributor

Re: TLB Exception causes E1200 to reboot

Ian,

I've confirmed that the TLB you had on 1/8/05 with 5.4.25 has been fixed in 5.6.06.

The second TLB (different issue) with 5.6.06 has not been reported before. If that problem reoccurs (with 5.6.69) we would be interested in trace logs so we can investigate further.
The journey IS the reward.
Ian Grobler
Frequent Advisor

Re: TLB Exception causes E1200 to reboot

David, Many thanks for all the effort on this so far. I have upgraded the router that was on 5.4.25 to 5.6.06 about 2 weeks ago and so far this has been stable.
I will upgrade the other router that is on 5.6.06 to the new 5.6.69 firmware as soon as possible and give it a test. After the two SCSI bus resets on the 5.6.06 router occurred earlier in April, we removed one of the 'dodgy' tapes from the backup cycle and gave the drive a good cleaning, and it has been holding up since.
Will do the upgrade and we'll take it from there. Much appreciated!! Ian
David Ruska
Honored Contributor

Re: TLB Exception causes E1200 to reboot

Ian,

That's good to hear. Keep us posted if you have any issues with 5.6.69.

Here's the problem that was fixed in 5.6.06:

* Previously, if the buffered tape write option was used and the write command was aborted just before the drive responded with an error, then the router might reboot with a TLB message recorded in the router Event Log. This issue has been resolved.

A bad tape or other issue causing drive write problems needed to occur to expose the potential for that TLB.
The journey IS the reward.
Ian Grobler
Frequent Advisor

Re: TLB Exception causes E1200 to reboot

I upgraded the router from 5.6.06 to 5.6.69 this morning after another TLB exception last night.
I have also attached the router report file done just after the upgrade. I will monitor and let you know how it goes. Thanks for the feedback so far.
Marino Meloni_1
Honored Contributor

Re: TLB Exception causes E1200 to reboot

Hi Ian,
Have you run an "acceptance test" on your Ultrium drive? It reports sense code 04 44 00, which seems to be an internal error and may be the source of your trouble.

marino
Ian Grobler
Frequent Advisor

Re: TLB Exception causes E1200 to reboot

Using the new 3.5 SR2 LTT tools I ran an "LTO Drive Assessment Test" and it passed the drive without any errors. I also ran a "Connectivity test" and this passed without any errors as well.
I am quite happy that the drive is ok at this stage. The occasional errors being experienced occur during high backup I/O, and the fact that the E1200 reboots is the primary concern. I am hoping the new 5.6.69 firmware helps resolve this; even when a problem occurs writing to the tape, the router should be rock solid and not reboot. We will be re-using a suspect tape this evening just to give it a good go ;-)
David Ruska
Honored Contributor

Re: TLB Exception causes E1200 to reboot

Ian,

As Marino said, the 0x04/0x4400 sense data from the drive says "Internal Target Failure", which typically means the drive encountered some hardware error, or a hardware condition (e.g. interrupt) that it was not expecting. There were a few cases of that error being reported, with the most recent at 04/21/2005 00:24:55. That does not indicate a restart of the router around that time.
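
As a rough illustration of how the 0x04/0x4400 value breaks down into sense key, ASC and ASCQ, here is a small Python sketch. The tables hold only a handful of standard SCSI values and are deliberately incomplete; `decode_sense` is a hypothetical helper name, and it accepts both notations used in this thread ("04 44 00" and "0x04/0x4400").

```python
import re
from typing import Tuple

# Partial table of standard SCSI sense keys (not exhaustive).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}

# Only the additional sense code discussed in this thread.
ASC_ASCQ = {
    (0x44, 0x00): "INTERNAL TARGET FAILURE",
}


def decode_sense(text: str) -> Tuple[str, str]:
    """Decode 'key asc ascq' written as '04 44 00' or '0x04/0x4400'."""
    digits = re.sub(r"0x", "", text, flags=re.IGNORECASE)
    digits = re.sub(r"[^0-9A-Fa-f]", "", digits)
    if len(digits) != 6:
        raise ValueError(f"expected three hex bytes, got {text!r}")
    key, asc, ascq = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return (
        SENSE_KEYS.get(key, "unknown sense key"),
        ASC_ASCQ.get((asc, ascq), "not in this partial table"),
    )


if __name__ == "__main__":
    print(decode_sense("0x04/0x4400"))
```

The drive's own documentation remains the reference for vendor-specific codes; this table only covers the case discussed here.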

One possible cause of this drive error is multiple hosts attempting to communicate with the drive at the same time. This can happen if the drives are mapped to more than one host, and a host not performing the backup has an application or service running that does polling. There's an issue with win2003 that causes test unit ready commands to be sent down. Do you have any win2003 systems on the SAN that see the library? Windows RSM should also be disabled for these devices (unless needed by the backup app on the backup server).

The report page you provided did show a unit restart on 04/20/2005 22:19:43, but there were no errors prior to that. Is that the reboot you are referencing?

It's possible that the previous trace log would have captured some useful info on this, but the firmware update cleared it out.

Should you have another failure, capturing the full router report page and collecting an LTT support ticket from the drive as soon as possible, would allow us to have the best info to help identify the cause.
The journey IS the reward.
Ian Grobler
Frequent Advisor

Re: TLB Exception causes E1200 to reboot

There are three Win2003 hosts that have access to the library (as mapped on the router). All 3 hosts have the registry keys set to disable TUR and the RSM services are stopped and disabled (The backup application is Data Protector 5.1). One of the Win2003 hosts is a NAS4000 appliance which may/may not have some additional components on it causing this - it's configured "as is" as shipped by HP. Otherwise there should not be any additional access and the one other host (SMA) is zoned not to see the library.
Yes, the unit restart on 04/20/2005 22:19:43 is the latest reboot I was referencing.
There was no problem last night, and if another issue occurs I will capture the full router report page and collect an LTT support ticket as requested. Thanks for all the help on this so far.
Marino Meloni_1
Honored Contributor

Re: TLB Exception causes E1200 to reboot

Another component that usually polls the SAN and can disrupt backups is the Insight Agents. Version 7.20 (currently the latest) can stop polling tapes while leaving all other agents active. (In previous versions you had to disable all the FC components or use specific registry changes.)
So if you have the agents installed, I suggest upgrading to the latest release and stopping the related component.
Ian Grobler
Frequent Advisor

Re: TLB Exception causes E1200 to reboot

Thanks Marino. The FC Tape support has been removed from the active agents on all 3 of the SAN-connected servers. I double checked the 0x04/0x4400 sense data and it actually occurred at the exact time the Exchange host was scheduled to start its backup (12:30 on host, 12:24 on router). As another backup was still running, the 12:30 backup was placed on hold until the device was free, so the 0x04/0x4400 sense data is very likely related to 2 servers accessing the backup device. This is all controlled from Data Protector as per schedule. I could move the backup job time on a bit, but I would assume that the 0x04/0x4400 error is transient and not a real issue in this case - it would just indicate that another host has tried to access the device and is not really an error(?).
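
Because the router clock and the host clock differ (12:24 on the router versus 12:30 on the host), matching router events to scheduled jobs needs an offset. A minimal Python sketch of that correlation follows. The six-minute offset is only the rounded difference quoted above, the timestamp format is copied from the router report times quoted in this thread, the 12:30 job is assumed to be 00:30 in 24-hour time, and `router_to_host` and `near` are hypothetical helper names.

```python
from datetime import datetime, timedelta

# Format of the router timestamps quoted in this thread, e.g. "04/21/2005 00:24:55".
ROUTER_FMT = "%m/%d/%Y %H:%M:%S"


def router_to_host(router_stamp: str, offset: timedelta) -> datetime:
    """Convert a router log timestamp to host time using a known clock offset."""
    return datetime.strptime(router_stamp, ROUTER_FMT) + offset


def near(router_stamp: str, host_time: datetime, offset: timedelta,
         tolerance: timedelta = timedelta(minutes=2)) -> bool:
    """True if the router event falls within `tolerance` of a host-side time."""
    return abs(router_to_host(router_stamp, offset) - host_time) <= tolerance


if __name__ == "__main__":
    offset = timedelta(minutes=6)            # assumed: host 12:30 vs router 12:24
    job_start = datetime(2005, 4, 21, 0, 30)  # assumed 24-hour form of the 12:30 job
    print(near("04/21/2005 00:24:55", job_start, offset))
```

Measuring the real offset once (or syncing the router clock) would make this kind of matching more reliable than relying on rounded times.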
Ian Grobler
Frequent Advisor

Re: TLB Exception causes E1200 to reboot

Just an update. While I was on leave the E1200 rebooted 3 nights in a row one week and another 2 nights the next. The support team ended up testing direct-attach SCSI backups on the library, which worked 100%. HP ended up replacing the E1200 card (FW 5.6.69) with a new one (5.6.06) and the system is now running 100% without any offending reboots. The other site, which I upgraded to 5.6.06 from 5.4.25 two months ago, is still running fine without issue. So... I missed out on getting those traces, but at the end of the day the E1200 seems to have been at fault; even with the 5.6.69 firmware there must have been some sort of hardware issue with the card. Many thanks to all who have assisted with this one!
Ian Grobler
Frequent Advisor

Re: TLB Exception causes E1200 to reboot

E1200 replaced with new E1200.