Tape Libraries and Drives

M2402 Router down monthly

alexlex
Occasional Contributor


I have to power it off and back on before it will work again, and the Event Log gives no reason for the failure.

MSA1000
Tape library MSL5026
Fabric Switch 6

These are the dates on which it went down:
11/29/2006
12/09/2006
01/02/2007
01/09/2007
01/16/2007
01/18/2007
01/19/2007 (upgraded firmware from 5.6.78 to 5.6.8c; the new firmware seems to have reduced how often it goes down)
02/07/2007
03/16/2007
04/02/2007
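
As a quick check on the "monthly" pattern, the gaps between the dates above can be computed. This is only an illustrative sketch in Python: the dates are copied from the list, are assumed to be in MM/DD/YYYY format, and the 01/19/2007 firmware-upgrade entry is included just as it appears in the list.

```python
from datetime import datetime

# Dates copied from the list above (assumed MM/DD/YYYY).
# 01/19/2007 is the firmware-upgrade entry and is kept as listed.
OUTAGES = [
    "11/29/2006", "12/09/2006", "01/02/2007", "01/09/2007", "01/16/2007",
    "01/18/2007", "01/19/2007", "02/07/2007", "03/16/2007", "04/02/2007",
]

def gaps_in_days(dates):
    """Return the number of days between each pair of consecutive dates."""
    parsed = [datetime.strptime(d, "%m/%d/%Y") for d in dates]
    return [(later - earlier).days for earlier, later in zip(parsed, parsed[1:])]
```

Calling `gaps_in_days(OUTAGES)` gives the spacing in days, which makes it easier to see whether the failures got less frequent after the firmware upgrade.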

The Event Log follows. It contains only the events recorded after each reboot.

1231. 03/16/2007 12:29:51 0d00h00m14.91s New device is added to location 3/2/0/0
1232. 03/16/2007 12:29:51 0d00h00m14.91s New device is added to location 3/2/1/0
1233. 03/16/2007 12:29:51 0d00h00m14.91s New device is added to location 3/2/2/0
1234. 03/16/2007 12:29:51 0d00h00m15.46s Unit restart and initialization, Firmware Version: 5.6 Build Level: 5.6.8c
1235. 03/16/2007 12:29:54 0d00h00m15.62s FC Port 0 Link is UP.
1236. 03/16/2007 12:29:54 0d00h00m15.62s FC Port 1 Link is UP.
1237. 03/16/2007 12:29:58 0d00h00m19.58s Voltage normal
1238. 03/16/2007 12:29:58 0d00h00m19.60s Fan Tray Is Present
1239. 03/16/2007 12:29:59 0d00h00m19.66s Fan RPM normal
1240. 03/16/2007 12:29:59 0d00h00m19.72s Power Supply normal
1241. 03/16/2007 15:26:36 0d02h56m53.73s FC discovery requested for port 0 via HTTP Discovery Menu
1242. 03/16/2007 15:26:39 0d02h56m55.95s FC discovery requested for port 1 via HTTP Discovery Menu
1243. 03/16/2007 15:26:41 0d02h56m58.59s pSCSI discovery requested for port -1 (all) via HTTP Discovery Menu
1244. 04/02/2007 16:25:08 0d00h00m14.91s New device is added to location 3/2/0/0
1245. 04/02/2007 16:25:08 0d00h00m14.91s New device is added to location 3/2/1/0
1246. 04/02/2007 16:25:08 0d00h00m14.91s New device is added to location 3/2/2/0
1247. 04/02/2007 16:25:09 0d00h00m15.46s Unit restart and initialization, Firmware Version: 5.6 Build Level: 5.6.8c
1248. 04/02/2007 16:25:12 0d00h00m15.62s FC Port 1 Link is UP.
1249. 04/02/2007 16:25:12 0d00h00m15.62s FC Port 0 Link is UP.
1250. 04/02/2007 16:25:16 0d00h00m19.58s Voltage normal
1251. 04/02/2007 16:25:16 0d00h00m19.60s Fan Tray Is Present
1252. 04/02/2007 16:25:16 0d00h00m19.66s Fan RPM normal
1253. 04/02/2007 16:25:16 0d00h00m19.72s Power Supply normal
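
Since the only trace of each failure in this log is the restart entry, it may help to pull those entries out automatically. The following is a rough Python sketch, not an HP tool: the line layout (sequence number, date, time, uptime, message) is inferred from the log lines above, and `restart_times` and `SAMPLE` are names made up for this example.

```python
import re
from datetime import datetime

# Layout inferred from the log above:
# "<seq>. <MM/DD/YYYY> <HH:MM:SS> <uptime> <message>"
LINE_RE = re.compile(
    r"^(?P<seq>\d+)\.\s+(?P<date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<uptime>\S+)\s+(?P<msg>.*)$"
)

def restart_times(log_text):
    """Return the timestamp of every 'Unit restart and initialization' entry."""
    times = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m.group("msg").startswith("Unit restart and initialization"):
            stamp = m.group("date") + " " + m.group("time")
            times.append(datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S"))
    return times

# Three lines copied from the log above, used as sample input.
SAMPLE = """\
1233. 03/16/2007 12:29:51 0d00h00m14.91s New device is added to location 3/2/2/0
1234. 03/16/2007 12:29:51 0d00h00m15.46s Unit restart and initialization, Firmware Version: 5.6 Build Level: 5.6.8c
1247. 04/02/2007 16:25:09 0d00h00m15.46s Unit restart and initialization, Firmware Version: 5.6 Build Level: 5.6.8c
"""
```

`restart_times(SAMPLE)` returns one timestamp per restart, so a saved copy of the whole Event Log can be reduced to a list of reboot times.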

7 REPLIES
Jack Trachtman
Super Advisor

Re: M2402 Router down monthly

Our M2402 is also running 5.6.8c.

We have rarely (never? I'm not sure) had failures with the unit, but the web interface is often inaccessible. I only access it every few months, but I have often had to power-cycle the unit to get a response from it.
Marino Meloni_1
Honored Contributor

Re: M2402 Router down monthly

The log reports only the reboot operation, but you may find some information in the traces.
You can attach the full NSR report here. It is available from the left-hand menu > Report; save it in MHT format (single-file HTML).
alexlex
Occasional Contributor

Re: M2402 Router down monthly

Here is the report page.

Another question: the information in the traces is hard to understand. Is there something that could help me read the trace information?
Marino Meloni_1
Honored Contributor

Re: M2402 Router down monthly

The traces do not show any errors that could cause the NSR to lock up.
However, there are a few things you could correct to make the system more stable:

I see that the host listed below (with WWN 10000000 C930B36D) is issuing a lot of TUR (Test Unit Ready) commands, which may cause problems during backups:
010200 872F34F0 PRLI_LOG_IN 20000000 C930B36D 10000000 C930B36D Auto Assigned

You should stop the RSM (Removable Storage Manager) service on the Windows servers, and check whether the driver for the SDLT drive you are using has a checkbox that disables TUR sending.


Also, it seems you have no zones created on your switch, because the NSR can see the MSA controllers. It is better to avoid this situation:
FC Module 0 Devices
Port  Device Type  State  D_ID      Map Count  Node Name           Port Name           LUN ID  Device Description  Serial No.
0     CTLR         UP     0x010600  0          0x500805F300016A40  0x500805F300016A41  0       COMPAQ MSA1000 4.48
0     ENCL         UP     0x01EF01  0          0x100000E0241213F7  0x500805F300016A44  0       Compaq MSA Fabric Sw 6 G12
1     CTLR         UP     0x010600  0          0x500805F300016A40  0x500805F300016A49  0       COMPAQ MSA1000 4.48
1     ENCL         UP     0x01EF01  0          0x100000E02412140B  0x500805F300016A4C  0       Compaq MSA Fabric Sw 6 G12

Also, several hosts are getting maps and therefore have the library presented to them.

Host Maps
8B85B3FA (FC Port Name (Low)) Host Id 10300 FC Port 0 Auto Assigned
C92FC826 (FC Port Name (Low)) Host Id 10100 FC Port 0 Auto Assigned
C930B36D (FC Port Name (Low)) Host Id 10200 FC Port 0 Auto Assigned
8BA5B3FA (FC Port Name (Low)) Host Id 10300 FC Port 1 Indexed
C9432D55 (FC Port Name (Low)) Host Id 10100 FC Port 1 Indexed
C930B520 (FC Port Name (Low)) Host Id 10200 FC Port 1 Indexed

I always recommend creating a dedicated map for the host or hosts that should see the library, and creating an empty auto-assigned map that presents nothing to any host that appears and registers unexpectedly with the NSR.


In conclusion, the traces do not show any cause of a hang or reboot (such a cause usually appears in the previous or assert traces).

The way to read the traces is to identify the D_ID of each trace and the SCSI command it carries; then you can get an overview of what is going on.
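
For the first step, matching a trace D_ID to a host, a small lookup built from the Host Maps list can save some time. This is only a sketch in Python. It assumes that the "Host Id" shown in the Host Maps is the trace D_ID with its leading zero dropped (as 010200 and 10200 suggest for host C930B36D), and that you already know which FC port a trace came from. The names `build_lookup` and `identify` are made up for this example.

```python
import re

# Layout taken from the Host Maps lines in the report, e.g.
# "C930B36D (FC Port Name (Low)) Host Id 10200 FC Port 0 Auto Assigned"
MAP_RE = re.compile(
    r"^(?P<wwn_low>[0-9A-Fa-f]{8}) \(FC Port Name \(Low\)\) "
    r"Host Id (?P<host_id>[0-9A-Fa-f]+) FC Port (?P<port>\d+) (?P<map>.+)$"
)

def build_lookup(host_map_text):
    """Map (FC port, zero-padded Host Id) -> (low WWN word, map name)."""
    lookup = {}
    for line in host_map_text.splitlines():
        m = MAP_RE.match(line.strip())
        if m:
            # Assumption: Host Id is the D_ID without leading zeros.
            key = (int(m.group("port")), m.group("host_id").upper().zfill(6))
            lookup[key] = (m.group("wwn_low").upper(), m.group("map"))
    return lookup

def identify(lookup, fc_port, d_id):
    """Return the Host Maps entry for a trace D_ID, or None if it is unknown."""
    return lookup.get((fc_port, d_id.upper().zfill(6)))

# Host Maps copied from the report above.
HOST_MAPS = """\
8B85B3FA (FC Port Name (Low)) Host Id 10300 FC Port 0 Auto Assigned
C92FC826 (FC Port Name (Low)) Host Id 10100 FC Port 0 Auto Assigned
C930B36D (FC Port Name (Low)) Host Id 10200 FC Port 0 Auto Assigned
8BA5B3FA (FC Port Name (Low)) Host Id 10300 FC Port 1 Indexed
C9432D55 (FC Port Name (Low)) Host Id 10100 FC Port 1 Indexed
C930B520 (FC Port Name (Low)) Host Id 10200 FC Port 1 Indexed
"""
```

With `lookup = build_lookup(HOST_MAPS)`, a call such as `identify(lookup, 0, "010200")` returns the low WWN word and map name of the host behind that D_ID. Decoding the SCSI command in each trace still has to be done by hand.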
alexlex
Occasional Contributor

Re: M2402 Router down monthly

Marino Meloni, thank you very much.
I have a new question about the mapping: the M2402 connects to two Compaq Fabric Switch 6 units. Every host has two Fibre Channel cards, each connected to one of the switches.

If I use the mapping below:
8B85B3FA (FC Port Name (Low)) Host Id 10300 FC Port 0 Auto Assigned
C92FC826 (FC Port Name (Low)) Host Id 10100 FC Port 0 Auto Assigned
C930B36D (FC Port Name (Low)) Host Id 10200 FC Port 0 Auto Assigned
8BA5B3FA (FC Port Name (Low)) Host Id 10300 FC Port 1 Auto Assigned
C9432D55 (FC Port Name (Low)) Host Id 10100 FC Port 1 Auto Assigned
C930B520 (FC Port Name (Low)) Host Id 10200 FC Port 1 Auto Assigned

then every host will see two tape libraries (in fact there is only one). Will this affect the system's stability? If so, what should I do?
Marino Meloni_1
Honored Contributor

Re: M2402 Router down monthly

You should not have a dual connection from the same host. If you keep this configuration, you should disable all the duplicate devices in Device Manager.
Usually, backups are not that critical, and dual paths with automatic failover do not work for them; in any case, a backup that loses its path will fail and has to be restarted.
The tape drives do not support multi-initiator hosts, and if you want redundancy in your SAN, you have to manage it manually in a nearline environment.
The EBS design guide at www.hp.com/go/ebs contains all the rules that apply in this environment.

The easiest way is to have only one path from the host to the tape drive/robot, and possibly an alternative path via another host, but that is managed by the backup application.
Heshmat
Advisor

Re: M2402 Router down monthly

In other words, dual paths to a single drive are not supported.
About the Test Unit Ready issue: the Removable Storage service issues this command through the Plug and Play driver. If you disable the service in W2K3, it will report as disabled, but this only holds until you reboot the server. After a reboot, the service might still be running even though it reports as disabled; this is described in MS article 842411.
If you follow the MS article on each of the MS boxes that see the library, you should be able to get rid of this issue (TUR interrupting backups) for good.
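
To check after a reboot whether the service is really stopped, rather than just reported as disabled, its state can be read from `sc query`. This is a hedged sketch in Python, not the procedure from the MS article. It assumes the short name of the Removable Storage service is `ntmssvc`, that Python is available on the server, and that `sc query` prints an English `STATE : <n>  <WORD>` line. `parse_state` and `service_state` are names made up for this example.

```python
import re
import subprocess

SERVICE = "ntmssvc"  # assumed short name of the Removable Storage service

def parse_state(sc_output):
    """Extract the state word (e.g. RUNNING, STOPPED) from `sc query` output."""
    m = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_output, re.MULTILINE)
    return m.group(1) if m else None

def service_state(name=SERVICE):
    """Run `sc query <name>` on a Windows host and return its state word."""
    result = subprocess.run(["sc", "query", name], capture_output=True, text=True)
    return parse_state(result.stdout)

# Example of the output layout that parse_state expects.
SAMPLE_OUTPUT = """\
SERVICE_NAME: ntmssvc
        TYPE               : 20  WIN32_SHARE_PROCESS
        STATE              : 4  RUNNING
        WIN32_EXIT_CODE    : 0  (0x0)
"""
```

If `service_state()` still returns `RUNNING` on a box where the service is set to disabled, that box is a candidate for the steps in the MS article.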