Operating System - HP-UX

Re: Ignite and Dataprotector backup problem

 
SOLVED
Fabian Briseño
Esteemed Contributor

Ignite and Dataprotector backup problem

Hello guys.
I have the following problem.

Problem: Our Data Protector backups are failing.

Ignite software version: C.7.1.93
dataprotector version: 5.1

The scenario is this.
When a Data Protector backup is running and we start an Ignite backup (with make_net_recovery or make_tape_recovery), the Data Protector backup fails with the messages below.

More specifically, the Data Protector backup fails while the Ignite backup is in its initial stages, when it is scanning the VGs, etc.

"Testing for necessary pax patch.
Recovery Archive Description = Recovery Archive"

Just as it finishes doing this, the Data Protector backup fails.


[Major] From: BMA-NDMP@mtjbkup2 "Drive 1 - Library B" Time: 06/10/07 06:13:21
[90:51] /dev/rmt/39mn
Cannot write to device ([5] I/O error)

[Critical] From: VBDA@msdidev "/oracle/MUP/sapdata24" Time: 06/10/07 06:07:50
Received ABORT request from SM => aborting.

[Critical] From: VBDA@msdidev "/oracle/MUP/sapdata21" Time: 06/10/07 06:07:50
Received ABORT request from SM => aborting.

[Critical] From: VBDA@msdidev "/oracle/MUP/sapdata24" Time: 06/10/07 06:07:50
Connection to Media Agent broken => aborting.

[Critical] From: VBDA@msdidev "/oracle/MUP/sapdata21" Time: 06/10/07 06:07:50
Connection to Media Agent broken => aborting.
[Major] From: BMA-NDMP@mtjbkup2 "Drive 1 - Library B" Time: 06/10/07 06:14:04
[90:161] Cannot write filemark. ([13] Permission denied)

[Major] From: BMA-NDMP@mtjbkup2 "Drive 1 - Library B" Time: 06/10/07 06:14:04
[90:161] Cannot write filemark. ([13] Permission denied)

[Minor] From: BMA-NDMP@mtjbkup2 "Drive 1 - Library B" Time: 06/10/07 06:14:04
[90:190] Invalid format version of Data Protector medium.

[Critical] From: BSM@mtjbkup2 "MSDIDEV_MSMADV_Weekly_Offline_2" Time: 06/10/07 06:14:07
[61:17112] Medium header verification failed.
All objects on this medium will be marked as failed.

[Major] From: BMA-NDMP@mtjbkup2 "Drive 1 - Library B" Time: 06/10/07 06:14:04
[90:135] Cannot eject medium. ([13] Permission denied)

[Major] From: BMA-NDMP@mtjbkup2 "Drive 1 - Library B" Time: 06/10/07 06:14:04
[90:64] Cannot unload exchanger medium (Details unknown.)

DATA PROTECTOR LOG.

description.
The xMA has detected an error while writing to the device. The
error message indicates the reason as reported by the OS. In addition,
you can check if debug log file contains more information.

Actions.
The session will not be completed successfully. You should
restart the session. If the error occurred on NT system and the
error reports incorrect parameter, the most probable cause is
the invalid block size used to configure the backup device.
You should check the device's block size and set it to the
appropriate value.

I checked the syslogs of the Ignite server and the Data Protector server and found this:

Ignite server log.
Jun 8 02:59:34 IG5470 tftpd[15223]: Timeout (no requests in 10 minutes)
Jun 8 04:11:47 IG5470 tftpd[19472]: Timeout (no requests in 10 minutes)
Jun 8 22:52:37 IG5470 tftpd[18568]: Timeout (no requests in 10 minutes)
Jun 9 04:21:06 IG5470 tftpd[6094]: Timeout (no requests in 10 minutes)


dataprotector server log.
Jun 8 16:52:55 mtjbkup2 LVM[18959]: vgchange -a y /dev/vg132
Jun 8 17:24:25 mtjbkup2 vmunix: msgcnt 22 vxfs: mesg 064: vx_ivalidate - /MEXPROD/oracle/ME1 file system inode 20133 version number exceeds fileset's
Jun 9 05:00:36 mtjbkup2 LVM[16449]: lvlnboot -v /dev/vg00
Jun 9 05:00:37 mtjbkup2 LVM[16492]: lvlnboot -v /dev/vg00
Jun 9 05:10:18 mtjbkup2 LVM[22901]: lvlnboot -v


Any help you can give me would be welcome. If you need more info, please let me know.
Knowledge is power.
A. Clay Stephenson
Acclaimed Contributor

Re: Ignite and Dataprotector backup problem

I suspect that some of Ignite's I/O probing is interfering with DP's I/O, but this is purely a guess. Frankly, it would never have occurred to me to try running Ignite at the same time that my normal backups are running on the same server.
If it ain't broke, I can fix that.
Fabian Briseño
Esteemed Contributor

Re: Ignite and Dataprotector backup problem

Thanks for the reply Clay.
I forgot to mention that this fails for any backup. For example:

Data Protector could be running a backup on SERVER A,

and if you start an Ignite backup on SERVER C,

the Data Protector backup fails with the above messages.

I must also mention that this has never happened before. I haven't installed any patches; the only software I installed is Ignite.


This problem has me baffled, help.
Knowledge is power.
A. Clay Stephenson
Acclaimed Contributor
Solution

Re: Ignite and Dataprotector backup problem

When you say "this hasn't happened before" and "the only software I installed is Ignite" then, of course, you haven't had an interaction problem with Ignite before, because Ignite wasn't installed.

It's also very difficult to say that you were not running Ignite on the same server that you were backing up because the disk agent might be on one server, the media agent on another, and the cell server on still another. Moreover you might have multiple media agents or disk agents on multiple hosts in the DP session.

At this point, you need to establish a baseline now that you have installed Ignite. Do any DP sessions fail when Ignite is not running? Have you made certain that there are no hung Ignite sessions or their child processes running on any hosts in the cell? Have you made certain that there are no stray DP processes running? At this point, I would issue an "omnisv -stop" command and then do a 'ps -ef | grep "omni"' on every host in the cell to make certain there are no stray DP processes. At the same time, make certain no Ignite processes are running on any hosts. I would then see if a DP session works before I started a make_xxx_recovery.
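
A minimal sketch of the stray-process check described above, assuming a POSIX shell on each host. The function name `find_backup_procs` is hypothetical, and the match pattern (anything containing "omni", plus the two Ignite recovery commands named in this thread) is an assumption about how the processes appear in `ps -ef` output on your hosts.

```shell
#!/bin/sh
# Read `ps -ef` output on stdin and print only the lines that look like
# Data Protector ("omni") or Ignite recovery processes.
# The pattern is an assumption; adjust it to match your installation.
find_backup_procs() {
    grep -E 'omni|make_(net|tape)_recovery' | grep -v 'grep'
}

# Intended use, run by hand on every host in the cell after DP is stopped:
#   ps -ef | find_backup_procs
# No output means no matching processes were found on that host.
```

Reading from stdin rather than calling `ps` inside the function keeps the filter easy to try against saved `ps -ef` listings from several hosts.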

Fundamentally, you need to see why the write error occurred on your tape device. All of your errors stem from that point.
If it ain't broke, I can fix that.
Fabian Briseño
Esteemed Contributor

Re: Ignite and Dataprotector backup problem

Thanks for your suggestions, Clay. I'll check them out and then get back to you.
Knowledge is power.
Bill Hassell
Honored Contributor

Re: Ignite and Dataprotector backup problem

There is indeed a possibility that I/O probing by Ignite is causing the problem. Ignite was recently updated to use a different probing method, and I have seen EMS messages as well as DP errors when Ignite starts. Note that this occurs on an (unsupported) public loop SAN switch, which may be one of the causes. The ioscan driver went through a similar update, followed by some quick patches to the fibre drivers to prevent SAN switch difficulties when it ran.


Bill Hassell, sysadmin
Fabian Briseño
Esteemed Contributor

Re: Ignite and Dataprotector backup problem

Hello.
Guys, just to update: I will be sending this error to HP so that they can analyze the problem further. I will update this thread once they give us the results.
Knowledge is power.
Court Campbell
Honored Contributor

Re: Ignite and Dataprotector backup problem

Why are you running Ignite while a DP backup is running? Also, you shouldn't use Ignite to back up the DP database while DP is running. You should stop the DP processes and then run Ignite. If you don't, you risk hosing your database and/or having a corrupt database on the Ignite tape. My suggestions would be:

1. Do not run Ignite while DP is running.

2. Exclude the DP database directory during the Ignite backup.
"The difference between me and you? I will read the man page." and "Respect the hat." and "You could just do a search on ITRC, you don't need to start a thread on a topic that's been answered 100 times already." Oh, and "What. no points???"
Fabian Briseño
Esteemed Contributor

Re: Ignite and Dataprotector backup problem

Hello Court.
Thanks for the reply.

Actually, we don't do this. It was a one-time thing, and the problem occurred. I know you shouldn't be running both backups at the same time.

What I do remember is that some time ago, because of an emergency, I ran an Ignite backup while DP backups were running, and nothing happened. That's why I posted the message when this happened.

Thanks for your suggestions.



Knowledge is power.
Court Campbell
Honored Contributor

Re: Ignite and Dataprotector backup problem

> What I do remember is that some time ago, because of an emergency, I ran an Ignite backup while DP backups were running, and nothing happened


Every dog has its day...


I know that's not technical, but it could have been a timing thing.
"The difference between me and you? I will read the man page." and "Respect the hat." and "You could just do a search on ITRC, you don't need to start a thread on a topic that's been answered 100 times already." Oh, and "What. no points???"
Fabian Briseño
Esteemed Contributor

Re: Ignite and Dataprotector backup problem

Hi again Court.
I guess Murphy's law entered the scene. ;D

"Every dog has its day" sounds about right.
Knowledge is power.
Eric SAUBIGNAC
Honored Contributor

Re: Ignite and Dataprotector backup problem

Hi Fabian,

Shared libraries, no? You are probably working in a SAN environment?

If so, change the kernel parameter "st_san_safe" from 0 to 1 on each Ignite client.

This disables the use of device special files like /dev/rmt/Xm. You can still use other special files like /dev/rmt/Xmn, but no longer /dev/rmt/Xm.

When Ignite does its I/O probing and st_san_safe is not activated, it interferes with /dev/rmt/Xm, which is the device file that rewinds the tape when you close it. So, if DP is currently working with that drive ... bad things happen ;-(

I have many clients with SAN, shared libraries, DP and of course Ignite. With st_san_safe=1 we have zero problems.

If you have any scripts that use /dev/rmt/Xm, you will have to edit them to use /dev/rmt/Xmn instead and add some "mt -f /dev/rmt/Xmn rewind" commands. That's all.
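
A minimal sketch of that script change, assuming a POSIX shell. The helper name `to_norewind` is hypothetical, and it only handles the two naming forms mentioned above (Xm and Xmn); the commented `tar` line is just a placeholder for whatever your script writes to tape.

```shell
#!/bin/sh
# Map a rewind-on-close tape device file to its no-rewind counterpart,
# e.g. /dev/rmt/0m -> /dev/rmt/0mn. Anything else is passed through as-is.
to_norewind() {
    case "$1" in
        *mn) printf '%s\n' "$1" ;;      # already a no-rewind device file
        *m)  printf '%s\n' "${1}n" ;;   # append "n" to get the no-rewind file
        *)   printf '%s\n' "$1" ;;      # unknown form: leave untouched
    esac
}

# Intended use inside an existing backup script (left commented out here):
#   TAPE=$(to_norewind /dev/rmt/0m)
#   tar cvf "$TAPE" /some/dir       # placeholder write; tape is not rewound on close
#   mt -f "$TAPE" rewind            # explicit rewind, as suggested above
```

With st_san_safe=1 the Xm device files stop working, so the explicit rewind is what replaces the rewind-on-close behaviour the old scripts relied on.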

Hope this will help

Regards
Eric
Fabian Briseño
Esteemed Contributor

Re: Ignite and Dataprotector backup problem

Hello.

Eric, as of this moment we haven't modified the kernel. What I did was install some patches and update the firmware that HP recommended (fibre cards, switches). Also, the Ignite server was sharing a switch with our SAN network, and we have removed the Ignite server from the SAN switch. As of today the problem has not presented itself again. If the problem comes up again, I will try your suggestion, Eric.


Thanks to everyone who helped me out. I was really stumped on this problem, but thanks to all of you I saw the light again.


Thanks guys.
Knowledge is power.