64bit Exe "add_to_restore_list: malloc of file_list_rec failed" - maxdsiz64 or maxtsiz64

Alzhy
Honored Contributor

The exe is actually one of Netbackup 6.5's processes -- bprd.

The above happens when trying to restore a filesystem with 35+ million files.

bprd grows to 8GB and bombs out with the above error.

We've increased maxdsiz64 to 16GB from 8GB. Still the same issue. Server has ample Physmem -- 32GB in fact.

While waiting for Symantec support, I thought I'd ask for inputs/opinions from the good guys (and gals) here on ITRC.

Many thanks in advance.
Hakuna Matata.
6 REPLIES
Don Morris_1
Honored Contributor

Re: 64bit Exe "add_to_restore_list: malloc of file_list_rec failed" - maxdsiz64 or maxtsiz64

Could be swap exhaustion (plenty of physmem doesn't always mean plenty of swap especially if there's a big but sparse virtual allocation).

You should also check that this process isn't invoked from a shell or parent that has used ulimit or setrlimit to enforce a lower limit than the tunable. In that case the tunable never gets to apply to the child (this is the standard: children inherit the parent's limits).
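A quick way to see which limit a process would inherit is to query it from the same environment that launches the daemon. This is only a sketch: it assumes Python is available on the host, and `describe_limit` is a made-up helper name. A soft limit below the maxdsiz_64bit tunable would suggest the parent has capped it.

```python
import resource

def describe_limit(value):
    """Render an rlimit value as text; RLIM_INFINITY means no cap."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes ({value / 2**30:.2f} GiB)"

# RLIMIT_DATA is the data-segment (heap) limit; children inherit it from the parent.
soft, hard = resource.getrlimit(resource.RLIMIT_DATA)
print("data soft limit:", describe_limit(soft))
print("data hard limit:", describe_limit(hard))
```

Run it from the startup script or shell that actually starts the process, since an interactive login shell may have different limits.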

Certainly isn't maxtsiz_64bit -- that only affects Text and if the process starts, it fits.

maxssiz_64bit may be larger than needed and consuming some private space... but 64-bit private space is big enough that wouldn't matter.

Otherwise... I'd just be guessing. [Could even be an application bug casting the pointer returned to an int32 and mistaking it for NULL...]
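To illustrate that last guess only (nothing here is known about how bprd is written): if a 64-bit pointer is squeezed into a 32-bit int, only the low 32 bits survive on typical platforms, so a perfectly valid address that is a multiple of 4 GiB looks like NULL. The addresses below are hypothetical values picked for the demonstration.

```python
def truncate_to_int32(address):
    """Keep the low 32 bits and reinterpret them as a signed 32-bit value,
    which is what a typical C cast from a 64-bit pointer to int32 does."""
    low = address & 0xFFFFFFFF
    return low - 2**32 if low >= 2**31 else low

# Hypothetical heap addresses, chosen only for illustration.
ok_address = 0x1_0000_1000    # low 32 bits are nonzero
bad_address = 0x2_0000_0000   # exactly 8 GiB: low 32 bits are all zero

for addr in (ok_address, bad_address):
    truncated = truncate_to_int32(addr)
    print(f"{addr:#x} -> {truncated:#x}, mistaken for NULL: {truncated == 0}")
```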
Alzhy
Honored Contributor

Re: 64bit Exe "add_to_restore_list: malloc of file_list_rec failed" - maxdsiz64 or maxtsiz64

Swap's healthy.
Freemem's healthy.

I monitor via OVPA and see the process grow... when it hits 8GB - it bombs out.

I am theorizing this executable loads into memory the pathnames of all 30+ million files in the restore job. So to all ye programmers out there: do such structures use maxdsiz, maxtsiz and maxssiz? I admit I do not have a full understanding of the workings of these 3 kernel params that affect a process's memory needs.
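As a rough plausibility check of that theory, the arithmetic below divides the observed 8GB peak by the file counts mentioned in this thread. It assumes, without any confirmation, that essentially all of the process's memory goes to per-file records; `bytes_per_file` is just an illustrative helper.

```python
GIB = 2**30

def bytes_per_file(total_bytes, file_count):
    """Average memory per file if the whole footprint were per-file records."""
    return total_bytes / file_count

observed_peak = 8 * GIB  # process size at the point of failure, per the thread
for files in (30_000_000, 35_000_000):
    per_file = bytes_per_file(observed_peak, files)
    print(f"{files:,} files -> about {per_file:.0f} bytes each")
```

A few hundred bytes per file is in the range you'd expect for a pathname plus some bookkeeping, so the theory is at least not unreasonable.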

This is on 11.31 IA64 and Netbackup 6.5. "bprd" I've validated to be a 64 bit native IA64 application.

Hakuna Matata.
Michael Mike Reaser
Valued Contributor

Re: 64bit Exe "add_to_restore_list: malloc of file_list_rec failed" - maxdsiz64 or maxtsiz64

You're failing on a call to malloc, which means heap space is being requested, which is governed by maxdsiz_64bit, which you say is set to 16 GB.

You state that swap space is OK, so I presume you've got acres-n-acres-n-acres of swap space available when the process dies.

As Don said, if it's not a ulimit/setrlimit issue with the parent shell or process, then it's probably Just A Bug in the coding of the application itself, and there's really nothing that can be done from the OS level. Looks like you'll have to wait on Symantec to get back to you. :-(
There's no place like 127.0.0.1

HP-Server-Literate since 1979
Dennis Handly
Acclaimed Contributor

Re: 64bit Exe "add_to_restore_list: malloc of file_list_rec failed" - maxdsiz64 or maxtsiz64

If you can run tusc on it, it would be helpful to see the size of the last allocation. If the allocations are small, it is probably out of swap. If the last one is more than 8 GB, it is probably an application bug. Possibly trying to double the size each time?
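To show why doubling could fail well below the configured limit, here is a small simulation. It is a sketch under stated assumptions, not a description of bprd: it assumes a realloc-style grow where the old and new buffers must coexist during the copy, and the 1 GiB starting size is arbitrary.

```python
GIB = 2**30

def first_failing_request(limit_bytes, start_bytes):
    """Return (held, requested) at the first doubling step that cannot fit.

    Assumes the old buffer stays allocated while the new, doubled buffer
    is obtained, so both count against the limit at the same time.
    """
    held = start_bytes
    while True:
        requested = held * 2
        if held + requested > limit_bytes:
            return held, requested
        held = requested

held, requested = first_failing_request(limit_bytes=16 * GIB, start_bytes=1 * GIB)
print(f"holding {held / GIB:.0f} GiB, request for {requested / GIB:.0f} GiB fails")
```

Under these assumptions a process would stall at 8 GiB even with a 16 GiB data limit, which is why the size of the last request in the tusc output matters.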
Alzhy
Honored Contributor

Re: 64bit Exe "add_to_restore_list: malloc of file_list_rec failed" - maxdsiz64 or maxtsiz64

Our swap's hefty and healthy.
We increased maxssiz_64bit too, to double the default (1 to 2GB). No dice.

We've submitted bprd debugging logs to Symantec and are waiting for the verdict.

We are expecting a response like "Use RAW Backup (aka FlashBackup)". It will not be nice.

I know the best practice for Gazillion Files Filesystems is to re-org into several filesystem trees, but that's just not viable right now.
Hakuna Matata.
Dennis Handly
Acclaimed Contributor

Re: 64bit Exe "add_to_restore_list: malloc of file_list_rec failed" - maxdsiz64 or maxtsiz64

>We increased maxssize64 too, to double the default (1 to 2GB).

You probably should put that back.

>We've submitted bprd debugging logs to Symantec

Did you want to try tusc?