12-30-2006 12:36 PM
Memory Error on Tru64
Last weekend, they loaded a massive amount of data into our database. My process runs weekly, and it failed on half the library calls, the ones that load very large linked lists into RAM. Since I didn't write the library, all I know is that it's failing on a memory allocation call, I believe malloc(), and I don't know if he's just getting a null pointer or an error code.
This happens on our Dev and Prod hosts, both TRU64. The RAM, ulimit settings and swap space size are:
Prod:
RAM - 12 Gigs
ulimit -d 12000000
ulimit -v 12000000
Swap space - 51 Gigs
Dev:
RAM - 12 Gigs (maybe 16 Gigs)
ulimit -d 6000000
ulimit -v 6000000
Swap space - 34 Gigs
The lib's programmer (as well as most of us) says that either host should be more than adequate. He did some calcs and his largest linked list should be just over 2.1 Gigs in size.
My questions are:
1. Is there another ulimit setting or UNIX setting we should be setting for this process?
2. Is there a limit on how much RAM or swap space a process can access? Are these independent of the ulimit settings?
3. Am I understanding ulimit correctly - does the -d set the size of the process's data segment and the -v set the size of the process's virtual memory which I thought was the disk swap space?
4. A while back, I found some C functions that gave me host hardware and memory info. Just when I need it, I can't find my code or the URLs. Does anyone know of those?
5. Any easy way to monitor a process's memory and virtual memory usage?
6. Suppose he doesn't have logic errors and does need to build a 50 Gig linked list. On our Prod server with the 12 G of RAM and 51 G Swap that should work. But would we have to increase the ulimit -v from 12000000 to 38000000 or greater? Does the -v limit a process to how much of the swap space it can use, even if it's free?
This process is dead in the water and we have contractual agreements to meet. Any help would be extremely appreciated.
12-30-2006 02:48 PM
Re: Memory Error on Tru64
Paul, welcome to the Tru64 ITRC forum.
The ulimit settings are secondary to, and controlled by, the system configuration for the 'proc' subsystem.
Check out (and post as a reply?): sysconfig -q proc
While thinking in that space, you may also want to check out sysconfig -q vm and sysconfig -q ipc.
You are talking about non-trivial amounts of malloc space which may deserve special attention. Be sure to read the 'Tuning Memory Allocation' section in the 'malloc' man page.
To see the memory allocation for a given process use 'top' or ps -o rssize,vsize,...
- rssize RSS [Tru64 UNIX] Real memory (resident set) size of the process (in 1024 byte units)
- vsize VSZ [Tru64 UNIX] Process virtual address size
IMHO it is not reasonable to attempt to manipulate 50GB of malloced data with anything less than 50 GB physical memory.
Pushing that much data into the page and swap subsystem, possibly with modest disk I/O resources, is going to be horrible.
If the data lives in a DB, then leave most of it there!? Massage it piecemeal. Have the DB return just what you need, when you need it. That's what the DB is for, no?
It is not uncommon to see a few bytes returned from a database explode into dozens of bytes of object storage due to malloc allocation rounding, pointers and so on. 2.1 GB of stored data bytes may well exceed 12 GB once malloced into objects in a list.
>> This process is dead in the water and we have contractual agreements to meet
Find whoever signed for the contract! They'll have to fess up no?
Run some benchmarks on the dev box:
- measure idle process size, extract 100MB from the DB, measure time to process it and sleep. Now remeasure process size.
Repeat for 200MB, 500MB, 1GB.
Does it scale as expected?
Regards,
Hein van den Heuvel
HvdH Performance Engineering
12-30-2006 06:17 PM
Re: Memory Error on Tru64
total available _virtual_ memory, not the
physical memory. (Bad _performance_ probably
involves a shortage of _physical_ memory.)
When malloc() fails, it returns a null
pointer, and normally sets errno to ENOMEM.
Failing is failing -- the fine print is
inconsequential.
12-31-2006 01:05 AM
Re: Memory Error on Tru64
>> total available _virtual_ memory, not the physical memory. (Bad _performance_ probably involves a shortage of _physical_ memory.)
Yes and no. Yes, malloc is about virtual memory. Memory has to be touched before it requires physical memory. Conceivably you can malloc much more than you physically have, with good performance. However, memory managers may keep the administration for what is free in the un-malloced sbrk area. As a chunk is requested, the memory manager may well touch that chunk and/or neighbouring memory (although I could imagine an implementation where at least the initial, clean mallocs keep on carving fresh chunks whilst only updating the 'space remaining' and/or next-free-address data).
As Steven suggests, when mallocing big chunks (tens of pages at a time), one can probably run out of virtual memory before running out of physical memory, with good speed.
However, the indicated usage suggests to me smaller chunks which are immediately touched. That would make physical memory consumption go hand in hand with virtual memory.
fwiw,
Hein.
12-31-2006 02:03 AM
Re: Memory Error on Tru64
Hein.
12-31-2006 09:32 AM
Re: Memory Error on Tru64
He made some changes in the code to verify the total amount of memory being allocated.
(Added a global variable to the program to count each and every realloc)
He did a test on another Dev server with a 4GB ulimit and the process fails when 1.8G of total memory is allocated.
Tried the sysconfig and ipc commands from my directory, as well as qualifying it with /bin, /usr/bin and /usr/sbin. No luck. I wonder if they have it disabled for us "regular" folks. I'll have to see if a SysAdmin can do these.
As far as the reasons for our process, you're preaching to the choir. It is being re-engineered for another group in a manner similar to your suggestion. Without revealing any "trade secrets", it was built that way so that elaborate searches, that were too complex or time-consuming in SQL, could be done via the linked lists. It would take a long while to describe it, which I couldn't do anyway since I didn't write it, don't know all the details and would be fired as a result.
In the meantime, my process and group are at its mercy. We just had a large data load and it appears our servers need to be tweaked to allow for larger linked lists. Also, I think the programmer is right and we're not building 50 Gig linked lists. We're getting stopped at around the 2 Gig limit.
I'm going to contact a SysAdmin. I'll let you know what happens. Thanks for all the great and speedy responses and info. And don't get me started on crazy contracts...
12-31-2006 12:31 PM
Re: Memory Error on Tru64
distinguish from that for malloc().
I'd expect, however that multiple realloc()
calls could eat more memory (at least
temporarily) than one big malloc() (or one
big realloc()), because it needs to copy the
old data into the newly allocated storage.
urtx# type sysconfig
sysconfig is /sbin/sysconfig
or "man sysconfig".
01-01-2007 02:00 AM
Re: Memory Error on Tru64
Depending on the malloced size, that may be a reasonable overhead.
As I suggested earlier, write a short test program to malloc or realloc a known quantity in similar sized chunks to match the application. Now look at the vsize/rssize before and after. You may even want to trace the test program to see the sbrk syscalls when it needs to grow.
Also, malloc a few chunks of known size and display the addresses. What is the distance between the addresses? It will be the malloced size plus some more.
Sorry, I do not have access to a Tru64 system right now to test for you.
Hein.
01-01-2007 02:03 PM
Re: Memory Error on Tru64
The sysconfig info is attached.
I'll study it and then contact our SysAdmin.
01-02-2007 05:18 AM
Re: Memory Error on Tru64
max_per_proc_address_space = 4398046511104
per_proc_address_space = 4294967296
Correct me if I'm wrong, but from another website, I learned that per_proc_address_space is the maximum amount of memory a process can address, while max_per_proc_address_space is the maximum that a process can address if the ulimit -v is allowed to be set to unlimited.
The HP webpage: http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V51_HTML/MAN/MAN5/0141____.HTM
defines them as:
max_per_proc_address_space -
"Maximum amount, in bytes, of user process address space"
AND
per_proc_address_space -
"The maximum amount, in bytes, of user process address space."
which I find confusing.
Am I correct to conjecture that if we increase the per_proc_address_space to say 8G or 10G, we should be okay? The programmer just told me that his library should be using 2.5 G total. I'm figuring that with other overhead, it may double to 5G. So 8G or 10G should allow it to run.
01-02-2007 01:45 PM
Re: Memory Error on Tru64
Personally I would prefer to 'see' the old setting/problem first with ps -o vsize,rssize, but I appreciate that may be hard if a program exits upon failure. You can also just turn the knob and hope for the best, then run ps -o afterwards, while it works.
Hein.
01-02-2007 02:32 PM
Re: Memory Error on Tru64
Rebooted and we re-ran with no luck.
Tried twice with ulimit -d and -v both set to 10G and then 8G.
Should we be changing another kernel param?
Another ulimit setting? -m?
The previous realloc() comment has me wondering whether that is doubling memory usage due to shuffling of old and new pages.
Will examine memory some more during next tests. SysAdmin had said that only 7.5 of 12G of RAM was being used.
Programmer felt that only 2.5 G had been allocated for linked lists. Even doubled for overhead that would be just 5G. Host has 12G RAM/51G swap. Can't see why it would run out. Also, it ran out at same amount of data where it did on Dev server with per_proc_address_space set to 4G.
01-02-2007 11:02 PM
Re: Memory Error on Tru64
Check the data limit?
Sorry for not pointing that out immediately.
per_proc_data_size = 4294967296
max_per_proc_data_size = 8589934592
01-14-2007 03:19 PM
Re: Memory Error on Tru64
Turns out those who mentioned realloc were correct. It leaves holes in memory. A tech from HP did a test and found that if you do the following:
ptr = realloc(ptr, 1024); /* 1K */
ptr = realloc(ptr, 2048); /* 2K */
ptr = realloc(ptr, 4096); /* 4K */
ptr = realloc(ptr, 8192); /* 8K */
you might think you were only using 8K of RAM. Instead you will have used 23K: the 8K you want plus 15K of holes in RAM that don't get reused by the OS.
He advised our library programmer to change how he was allocating memory. He did and we have been running in production successfully since.
Thanks to everyone for your timely advice!