StoreEver Tape Storage

jknightcs1
Advisor

Constant EML problems, fixed only by reboot

We purchased an EML E-Series from HP a few months ago, and it's been giving us nothing but headaches. Every couple of weeks, especially during heavy-duty weekend backups, it would stop unloading tapes, or drives would stall while trying to inventory tapes and refuse to even eject them, even through commandview. The problem appeared to lie with the interface controller (it would be fixed after rebooting the controller and/or the entire EML, and there were error messages about lost connection to the IFC), and after going back and forth with HP support for a while, they finally replaced it last month.

However, a couple of weeks ago, two of the 8 tape drives started to stall while reading tapes, without any apparent problems with the IFC. I disabled those two drives to see if that solved the problem, but a third tape drive started doing it last weekend. After shutting down the EML for 30 seconds and bringing it back up, they magically worked again. I'm going to call HP again, but I've had a very mixed experience with their support department, and we're getting very frustrated.

Has anyone else had similar problems with their EMLs, especially with error messages like the following in Command View TL?

(request id = HOST/0x101d6bb8) IfmMove::prepareForMore():DRIVE UNLOAD of (LMRC) 0,3,1,9 with tape AT1981L4 failed

Director - fetchResponse() cmo_user_fetch: Retry Performed 5610; End of Text
Wei Jung
Trusted Contributor

Re: Constant EML problems, fixed only by reboot

I still have some questions:

What OS are you running on the server used for backup?
What backup software?
What type of switch is this EML connected to?
Is this a mixed environment? Are you using Windows, HP-UX, RHEL?
Also, are they independently zoned?

The reason for all these questions is that your symptom might be caused by polling, so we need to understand your backup solution before any suggestions can be made.

Another possible cause is the library running an old firmware version that makes the IFC hang. There is also a feature that can be disabled from the CLI if the EML has support enabled.
jknightcs1
Advisor

Re: Constant EML problems, fixed only by reboot

Thanks for responding. Here's our setup:

We're running EMC (formerly Legato) Networker 7.3.3. The main backup server runs Red Hat Linux AS 4 Update 6. There is also a storage node server, also Red Hat AS 4 Update 6, that has two QLogic HBAs zoned into two different Brocade switches. The EML is also zoned into these two switches (the robot and 4 drives on one switch, 4 other drives on the other switch). So the storage node controls the EML, which is controlled by the main backup server. Not sure what you mean by independently zoned, but I can find out if you define it for me.

We're backing up mostly Windows 2003 and Red Hat/Oracle Linux, with some Windows 2000 and a few Solaris Sparc 9 machines as well.

As far as firmware goes, we're on the latest version for everything; robot, interface controller, etc. That's always the first thing the tech had me do in my multiple phone calls.

Let me know if you need any other information.
GustavoT
Valued Contributor

Re: Constant EML problems, fixed only by reboot

Could you get a support ticket from that IFC and post it here?

Independently zoned: I believe this means having one zone per OS. Say you have Windows, RHEL and HP-UX hosts: you would create a zone for each OS and include the EML in each of them.
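
As a rough illustration of that idea, here is a minimal Python sketch that groups hosts into one zone per OS and adds the library ports to every zone. The host names, port names and the `zone_<os>` naming scheme are all hypothetical, and this only models the grouping; it does not talk to a switch.

```python
from typing import Dict, Iterable, List


def zones_per_os(hosts_by_os: Dict[str, Iterable[str]],
                 library_ports: Iterable[str]) -> Dict[str, List[str]]:
    """Build one zone per OS, each containing that OS's hosts plus the library.

    All names are placeholders; a real SAN would use WWNs or switch aliases.
    """
    ports = sorted(library_ports)
    return {
        f"zone_{os_name.lower()}": sorted(hosts) + ports
        for os_name, hosts in hosts_by_os.items()
    }


# Hypothetical inventory, for illustration only.
example = zones_per_os(
    {"Windows": ["win_host1"], "RHEL": ["rhel_node1", "rhel_node2"]},
    ["eml_drive1", "eml_robot"],
)
```

Each OS ends up in its own zone, and no zone contains hosts from two different operating systems.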
Aashique
Honored Contributor

Re: Constant EML problems, fixed only by reboot

Hi,
When all of your drives stop working, please check the status of every drive. It may be that one drive is faulty and is causing problems for all the others. I had this kind of problem and afterwards found that one of the drives was at fault.

Thanks & Regards
Aashique
jknightcs1
Advisor

Re: Constant EML problems, fixed only by reboot

I'll try to attach the support ticket, though the forum is giving me problems doing that. As far as I know, we don't do independent zoning. And I always check the tape drives when this issue crops up; no indication they have a problem. They're always in the green in command view.
Marino Meloni_1
Honored Contributor

Re: Constant EML problems, fixed only by reboot


From your initial description of the problem, this seems to be related to some of the drives staying in a "reserved" condition.

The reserve command prevents any other host from sending commands to a drive while it is in use by its owner, until the owner issues a "release" command.

If communication between the owning host and the device is lost, the drive stays in "reserved" mode. It then appears dead, because it does not accept any command from other hosts or from the management tool.

Only a power cycle of that drive can clear the "reserved" flag, and this is what appears to solve your problem.
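
To make the reserve/release behaviour concrete, here is a toy Python model of it. This is only a sketch of the logic described above, not real SCSI code: the class name, method names and host names are made up for illustration, and a real drive has more ways to clear a reservation than this model shows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyDrive:
    """Toy model of a drive that honours reserve/release (illustrative only)."""
    reserved_by: Optional[str] = None

    def reserve(self, host: str) -> bool:
        # Succeeds if the drive is free or already reserved by this host.
        if self.reserved_by in (None, host):
            self.reserved_by = host
            return True
        return False

    def release(self, host: str) -> bool:
        # In this model only the owner's release clears the reservation.
        if self.reserved_by == host:
            self.reserved_by = None
            return True
        return False

    def command(self, host: str, name: str) -> str:
        # Commands from anyone other than the owner are refused while reserved.
        if self.reserved_by not in (None, host):
            return "RESERVATION CONFLICT"
        return f"{name}: GOOD"

    def power_cycle(self) -> None:
        # A power cycle drops the reservation regardless of who held it.
        self.reserved_by = None


drive = ToyDrive()
drive.reserve("backup-node")                   # owner takes the drive
blocked = drive.command("mgmt-tool", "UNLOAD")  # refused: someone else owns it
drive.power_cycle()                            # owner vanished, so reset the drive
cleared = drive.command("mgmt-tool", "UNLOAD")  # accepted again
```

If the owner disappears without sending a release, every other initiator keeps getting refused until the reservation is cleared, which matches a drive that looks dead and then "magically" works after a restart.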

I would then investigate in two directions:
1- Make sure you do not have a connectivity issue between your host and the library, by analysing the SAN logs, port logs, etc.

2- Make sure no Windows host is taking unexpected ownership of the drives. This may happen if you have not put into practice all the recommendations for Windows hosts in a SAN environment listed in the EBS design guide: www.hp.com/go/ebs

Also, if you have disconnection issues between the IM and the IC that are reported in Command View, these are usually purely cosmetic. The IM is a management tool, and if it does not get an answer from the IC (because the IC is busy) during one of its inquiries, it will report that, but it will recover on the next attempt.

marino
jknightcs1
Advisor

Re: Constant EML problems, fixed only by reboot

That does sound like a possible reason. I'm looking over the docs now to see if it helps out.

In my case, I believe I'd be exploring problems with Linux hosts, rather than Windows hosts, since the EML is attached via fibre to a Linux server. Let me know if you think otherwise. Is there any way to tell if this "reserved" flag has been tripped the next time this occurs?
jknightcs1
Advisor

Re: Constant EML problems, fixed only by reboot

Actually, the more I look at this, the less I'm certain this is it. The jukebox and tape drives are only zoned to one machine, the backup node server. Wouldn't that mean there shouldn't be any competition from anything else for the SCSI addresses?
Marino Meloni_1
Honored Contributor

Re: Constant EML problems, fixed only by reboot

I have seen some Linux hosts with certain FC HBAs send a SCSI bus reset command during boot. If any real or virtual Linux host in the SAN reboots, it may cause problems across the SAN, resetting the drive and aborting the backup. This may also occur if the Linux host is not in the zone. After the reset, the drive is no longer connected to the cell server, so when the unload command is issued it is sent only to the robot and not to the tape drive. The robot moves in front of the drive and waits for the eject, but the tape drive never receives the command, and you get an error.


Regarding the reserve command, you can try powering down and powering up just the single tape drive from the front panel of the library. Power cycling the device will clear the reservation.
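
On the earlier question of how to tell whether a drive is reserved: a SCSI device that is reserved by another initiator answers commands with the status byte 0x18, RESERVATION CONFLICT. The sketch below only decodes a status byte you have already captured. How you capture it (host-side SCSI tools, HBA driver logs, backup software logs) depends on your setup and is not covered here, and the function names are made up for this example.

```python
# Standard SCSI status byte values (a subset).
SCSI_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
}


def describe_status(status: int) -> str:
    """Return a readable name for a SCSI status byte."""
    return SCSI_STATUS.get(status, f"unknown (0x{status:02x})")


def looks_reserved(status: int) -> bool:
    """True if the status byte indicates another initiator holds a reservation."""
    return status == 0x18
```

If a stalled drive returns this status to the host that is supposed to own it, that would support the reservation theory. A plain BUSY or CHECK CONDITION would point elsewhere.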