

 
blokemann
Occasional Advisor

HP DPX 4.0 1 agent (linux 64-bit) keeps failing during backup

 

 Hi

I need some advice on the best course of action with a frustrating issue.

We have Data Protector Express 4.00-sp1 - 56906 running on two SUSE Linux Enterprise Server (SLES) v10 servers.

The domain server is running on 32-bit SLES 10 SP1, OES 2,
while the remote agent server is running on 64-bit SLES 10 SP3, OES 2 SP2.

I have run test backups during the day and they have worked, so I am not sure why the 2100hrs scheduled backup has the issue.

Basically the backup of the remote agents starts, files are counted and file backup commences but then it dies.

/var/log/messages shows the following for the job that was scheduled to run at 2100hrs.

The first entry and this one look interesting:
Sep 7 21:11:46 srv2 dplinsdr: *** glibc detected *** /usr/local/hp/dpx/lin/x86_64/dplinsdr: double free or corruption (!prev): 0x00002aaaaca008d0 ***

------------------------------------------------------------------------------------------

Sep 7 21:10:52 srv2 kernel: dplinsdr: page allocation failure. order:4, mode:0xd0
Sep 7 21:10:52 srv2 kernel:
Sep 7 21:10:52 srv2 kernel: Call Trace: <ffffffff80167964>{__alloc_pages+796} <ffffffff80182e4c>{kmem_getpages+106}
Sep 7 21:10:52 srv2 kernel: <ffffffff80184231>{fallback_alloc+275} <ffffffff80184753>{__kmalloc+179}
Sep 7 21:10:52 srv2 kernel: <ffffffff8016d1a7>{kzalloc+9} <ffffffff801a74a9>{getxattr+137}
Sep 7 21:10:52 srv2 kernel: <ffffffff80196cf4>{link_path_walk+218} <ffffffff802f1209>{__down_write+21}
Sep 7 21:10:52 srv2 kernel: <ffffffff801fee72>{__up_write+20} <ffffffff80174544>{sys_brk+244}
Sep 7 21:10:52 srv2 kernel: <ffffffff801a75cf>{sys_lgetxattr+75} <ffffffff802f1209>{__down_write+21}
Sep 7 21:10:52 srv2 kernel: <ffffffff801fee72>{__up_write+20} <ffffffff80174544>{sys_brk+244}
Sep 7 21:10:52 srv2 kernel: <ffffffff8010ae36>{system_call+126}
Sep 7 21:10:52 srv2 kernel: Mem-info:
Sep 7 21:10:52 srv2 kernel: Node 0 DMA per-cpu:
Sep 7 21:10:52 srv2 kernel: CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Sep 7 21:10:52 srv2 kernel: CPU 1: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Sep 7 21:10:52 srv2 kernel: CPU 2: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Sep 7 21:10:52 srv2 kernel: CPU 3: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Sep 7 21:10:52 srv2 kernel: CPU 4: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Sep 7 21:10:52 srv2 kernel: CPU 5: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Sep 7 21:10:52 srv2 kernel: CPU 6: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Sep 7 21:10:52 srv2 kernel: CPU 7: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Sep 7 21:10:52 srv2 kernel: Node 0 DMA32 per-cpu:
Sep 7 21:10:52 srv2 kernel: CPU 0: Hot: hi: 186, btch: 31 usd: 100 Cold: hi: 62, btch: 15 usd: 55
Sep 7 21:10:52 srv2 kernel: CPU 1: Hot: hi: 186, btch: 31 usd: 169 Cold: hi: 62, btch: 15 usd: 11
Sep 7 21:10:52 srv2 kernel: CPU 2: Hot: hi: 186, btch: 31 usd: 173 Cold: hi: 62, btch: 15 usd: 51
Sep 7 21:10:52 srv2 kernel: CPU 3: Hot: hi: 186, btch: 31 usd: 159 Cold: hi: 62, btch: 15 usd: 54
Sep 7 21:10:52 srv2 kernel: CPU 4: Hot: hi: 186, btch: 31 usd: 179 Cold: hi: 62, btch: 15 usd: 48
Sep 7 21:10:52 srv2 kernel: CPU 5: Hot: hi: 186, btch: 31 usd: 178 Cold: hi: 62, btch: 15 usd: 12
Sep 7 21:10:52 srv2 kernel: CPU 6: Hot: hi: 186, btch: 31 usd: 155 Cold: hi: 62, btch: 15 usd: 50
Sep 7 21:10:52 srv2 kernel: CPU 7: Hot: hi: 186, btch: 31 usd: 156 Cold: hi: 62, btch: 15 usd: 51
Sep 7 21:10:52 srv2 kernel: Node 0 Normal per-cpu:
Sep 7 21:10:52 srv2 kernel: CPU 0: Hot: hi: 186, btch: 31 usd: 124 Cold: hi: 62, btch: 15 usd: 48
Sep 7 21:10:52 srv2 kernel: CPU 1: Hot: hi: 186, btch: 31 usd: 154 Cold: hi: 62, btch: 15 usd: 6
Sep 7 21:10:52 srv2 kernel: CPU 2: Hot: hi: 186, btch: 31 usd: 15 Cold: hi: 62, btch: 15 usd: 57
Sep 7 21:10:52 srv2 kernel: CPU 3: Hot: hi: 186, btch: 31 usd: 139 Cold: hi: 62, btch: 15 usd: 55
Sep 7 21:10:52 srv2 kernel: CPU 4: Hot: hi: 186, btch: 31 usd: 115 Cold: hi: 62, btch: 15 usd: 3
Sep 7 21:10:52 srv2 kernel: CPU 5: Hot: hi: 186, btch: 31 usd: 155 Cold: hi: 62, btch: 15 usd: 14
Sep 7 21:10:52 srv2 kernel: CPU 6: Hot: hi: 186, btch: 31 usd: 168 Cold: hi: 62, btch: 15 usd: 48
Sep 7 21:10:52 srv2 kernel: CPU 7: Hot: hi: 186, btch: 31 usd: 175 Cold: hi: 62, btch: 15 usd: 60
Sep 7 21:10:52 srv2 kernel: Free pages: 804448kB (0kB HighMem)
Sep 7 21:10:52 srv2 kernel: Active:202289 inactive:137558 dirty:229 writeback:0 unstable:0 free:201112 slab:717254 mapped:26192 pagetables:2886
Sep 7 21:10:52 srv2 kernel: Node 0 DMA free:12188kB min:16kB low:20kB high:24kB active:0kB inactive:0kB present:11780kB pages_scanned:0 all_unreclaimable? yes
Sep 7 21:10:52 srv2 kernel: lowmem_reserve[]: 0 3630 6029 6029
Sep 7 21:10:52 srv2 kernel: Node 0 DMA32 free:649472kB min:5976kB low:7468kB high:8964kB active:258228kB inactive:460832kB present:3717536kB pages_scanned:0 all_unreclaimable? no
Sep 7 21:10:52 srv2 kernel: lowmem_reserve[]: 0 0 2398 2398
Sep 7 21:10:52 srv2 kernel: Node 0 Normal free:142788kB min:3948kB low:4932kB high:5920kB active:550928kB inactive:89400kB present:2456320kB pages_scanned:5 all_unreclaimable? no
Sep 7 21:10:52 srv2 kernel: lowmem_reserve[]: 0 0 0 0
Sep 7 21:10:52 srv2 kernel: Node 0 DMA: 3*4kB 2*8kB 2*16kB 5*32kB 1*64kB 3*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12188kB
Sep 7 21:10:52 srv2 kernel: Node 0 DMA32: 120290*4kB 20847*8kB 26*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 649472kB
Sep 7 21:10:52 srv2 kernel: Node 0 Normal: 27731*4kB 3857*8kB 13*16kB 1*32kB 0*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 142788kB
Sep 7 21:10:52 srv2 kernel: Swap cache: add 58, delete 58, find 4/4, race 0+0
Sep 7 21:10:52 srv2 kernel: Free swap = 1052064kB
Sep 7 21:10:52 srv2 kernel: Total swap = 1052248kB
Sep 7 21:10:52 srv2 kernel: Free swap: 1052064kB
Sep 7 21:10:52 srv2 kernel: 1671167 pages of RAM
Sep 7 21:10:52 srv2 kernel: 144740 reserved pages
Sep 7 21:10:52 srv2 kernel: 225539 pages shared
Sep 7 21:10:52 srv2 kernel: 0 pages swap cached
Sep 7 21:11:46 srv2 dplinsdr: *** glibc detected *** /usr/local/hp/dpx/lin/x86_64/dplinsdr: double free or corruption (!prev): 0x00002aaaaca008d0 ***
Sep 7 21:27:34 srv2 syslog-ng[30490]: STATS: dropped 0
Sep 7 22:27:34 srv2 syslog-ng[30490]: STATS: dropped 0
Sep 7 23:27:34 srv2 syslog-ng[30490]: STATS: dropped 0
------------------------------------------------------------------------------------------
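
A note for anyone reading the trace above: "order:4" means the kernel was asked for 2^4 = 16 physically contiguous pages, which is 64 kB with 4 kB pages. The per-zone lines near the end of the dump list free memory by block size, so it is possible to count how many free blocks of at least that size each zone had. The following is only a rough sketch in Python 3, with the three zone lines copied from the log; the 4 kB page size is an assumption taken from the "*4kB" buckets, and the helper name is made up for illustration.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, as the "*4kB" buckets in the log suggest


def free_blocks_at_least(line, order):
    """Count free blocks of at least 2**order pages in one per-zone line."""
    min_kb = PAGE_KB << order  # order 4 -> 64 kB
    pairs = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) for count, size in pairs if int(size) >= min_kb)


# Zone lines copied from the /var/log/messages extract above.
zones = {
    "DMA": "3*4kB 2*8kB 2*16kB 5*32kB 1*64kB 3*128kB 1*256kB 0*512kB "
           "1*1024kB 1*2048kB 2*4096kB",
    "DMA32": "120290*4kB 20847*8kB 26*16kB 1*32kB 1*64kB 0*128kB 0*256kB "
             "0*512kB 1*1024kB 0*2048kB 0*4096kB",
    "Normal": "27731*4kB 3857*8kB 13*16kB 1*32kB 0*64kB 0*128kB 1*256kB "
              "1*512kB 0*1024kB 0*2048kB 0*4096kB",
}

for name, line in zones.items():
    print(name, free_blocks_at_least(line, order=4))
```

With the figures from this log it counts 9 such blocks in DMA and only 2 each in DMA32 and Normal, even though about 800 MB is free in total. That would be consistent with fragmented free memory rather than a plain shortage, but it is a reading of one snapshot, not a diagnosis.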
Some things I have tried:
1. Recreated the backup job
2. Tested backups several times during the day and they worked.

 

Where to from here?
There appears to be a bug somewhere.

ATTACHMENTS:

(NB: the attachments are from Linux, so if using Windows it is best not to view them with Notepad; use WordPad or a word processor.)

hpdx-error Sep7.txt
= /var/log/messages extract for the 7th Sep 2010 on the remote agent server. Scheduled backup starts at 2100hrs.

var-log-messages-dplinsdr-page-alloc-failure.txt
= /var/log/messages grep of dplinsdr showing page allocation failures.

var-log-messages-extract.log
= /var/log/messages extract, full log excluding some irrelevant DNS and other daemon messages.

Should I raise this as a bug and if so, where is the best place to do that?

4 REPLIES
blokemann
Occasional Advisor

Re: HP DPX 4.0 1 agent (linux 64-bit) keeps failing during backup

Here is the second attachment.

It seems that I can't add more than one attachment per post!  I won't worry about the 3rd attachment for now unless someone requests it.

 

I forgot to mention earlier that this is a brand new server with a fresh install of SLES and, of course, Data Protector Express Network Agent 4.00 SP1.

David Williams_25
Occasional Visitor

Re: HP DPX 4.0 1 agent (linux 64-bit) keeps failing during backup

Can you send the exact error message displayed in Data Protector Express?

 

Also, can you send the scheduled backup job configuration and all the details (screenshots are best), especially the scheduler page and the settings you are using?

 

This will help me to determine the problem.

blokemann
Occasional Advisor

Re: HP DPX 4.0 1 agent (linux 64-bit) keeps failing during backup

The error on the domain server is :

 

WARNINGS and Errors

    srv2  Error 1037: Object is not active

 

This is obviously because the network agent is not running on the remote server because the process has died.
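
One way to confirm whether the process is really gone, independent of the DPX tools, is to look for it in /proc. The snippet below is a minimal sketch in Python 3, assuming the agent process shows up under the name dplinsdr (as it does in the syslog lines); the function name and the `proc_root` parameter are illustrative only.

```python
import os


def pids_by_name(name, proc_root="/proc"):
    """Return the sorted PIDs whose /proc/<pid>/stat reports the given name."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style /proc; nothing to scan
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "net" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat"),
                      errors="replace") as fh:
                stat = fh.read()
        except OSError:
            continue  # process exited between listdir() and open()
        # The stat line looks like "pid (comm) state ..."; take the text
        # between the first "(" and the last ")".
        comm = stat[stat.find("(") + 1:stat.rfind(")")]
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)


# Assumption: the agent appears as "dplinsdr", as in the log above.
print(pids_by_name("dplinsdr"))
```

An empty list would support the idea that the agent died; a non-empty list would suggest the process is still there but not responding.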

blokemann
Occasional Advisor

Re: HP DPX 4.0 1 agent (linux 64-bit) keeps failing during backup

I just ran another test job...

This gets quite inconsistent, because some test backups have worked in the past...

 

Before submitting the job, I ran a dplinsvc -q which confirmed that the remote agent daemon was running.

Then I started the backup job from the domain server and instant failure:

Error 1037: Object is not active

 

So I checked on the remote agent again with dplinsvc -q, and its status is still running!?

An attempt to restart it with dplinsvc -t led to it hanging with "Service is being stopped...".  Still hanging after 10 minutes of attempting to stop. :-(
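
If the stop command is going to be scripted (for example before a scheduled job), wrapping it in a timeout avoids a shell that hangs indefinitely. This is only a sketch in Python 3: `run_with_timeout` is a made-up helper, and the commented-out call assumes dplinsvc is on the PATH, which may not be true on a given install.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (exit_code, output); exit_code is None on timeout."""
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
            universal_newlines=True,
        )
        return done.returncode, done.stdout
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so nothing is
        # left hanging in this script.
        return None, ""


# Stand-in for a command that never finishes (sleeps far longer than allowed).
code, _ = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(30)"], timeout_s=0.5
)
print("timed out" if code is None else "exit code %d" % code)

# Hypothetical real use, assuming dplinsvc is on the PATH:
# code, output = run_with_timeout(["dplinsvc", "-t"], timeout_s=120)
```

A timeout here only tells you the stop request did not finish in time; it does not say why the service is stuck.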