
Re: ProC programs crash on 11.23

 
SOLVED
dictum9
Super Advisor

ProC programs crash on 11.23

It's HP-UX version 11.23 on an Itanium rx2620.

ProC programs randomly crash on the OpenDir call. Any idea why that is? Is there a patch for this?

26 REPLIES 26
Steven E. Protter
Exalted Contributor
Solution

Re: ProC programs crash on 11.23

Shalom,

One would have to look at the core dumps, and perhaps the code, to get an idea.

Perhaps use a debugger such as gdb to get some idea of what's wrong with the programs.

My wild guess is that the problem is in the code, and that a recompile or code changes will be needed.

To hope to get lucky with a patch, search the ITRC patch database for some form of the call (opendir).

I can't see how anyone will be able to recommend a specific patch unless they had the same symptoms and fixed them with a patch.

hint: More information required to help with this.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
dictum9
Super Advisor

Re: ProC programs crash on 11.23

More info:

Pro*C/C++: Release 9.2.0.2.0
HP-UX 11.23
Itanium rx2620
A. Clay Stephenson
Acclaimed Contributor

Re: ProC programs crash on 11.23

Is OpenDir() actually the standard function opendir() or is it something else? This is rather standard debugging so add the -g flag (which I assume is passed from the ProC pre-compiler to the C/C++ compiler) and execute your program and allow it to die. Then do a stack trace and you should see the culprit.

Since you seem to be asking for psychic help, given the wealth of data that you have supplied, Miss Cleo says that what a pointer once pointed to as valid data no longer is. Perhaps a local string variable that should be declared static --- but contrary to her vast audience, I find that Miss Cleo is often wrong.
If it ain't broke, I can fix that.
dictum9
Super Advisor

Re: ProC programs crash on 11.23

opendir it is.

dictum9
Super Advisor

Re: ProC programs crash on 11.23

----Quote----

No, there is no core dump because it does not "crash". Instead it exits "gracefully". The program calls opendir() which would normally return a pointer to the directory structure and all the files in that directory. The program loops (sleeping for 5 seconds each time) looking in the directory for a file. Most of the time, the opendir() call is successful, but every once in a while it will return NULL and the program exits "gracefully" when that happens i.e. it just stops running.

Today we made a change to the program to re-try access to this directory if it fails, for up to 5 times. That band-aid may or may not be a work around to the problem, as GAR mentioned this morning it may be related to a memory leak, in which case the program will likely still quit running.

----end quote----
James R. Ferguson
Acclaimed Contributor

Re: ProC programs crash on 11.23

Hi:

Just out of curiosity, what is the latest standard patch bundle that you have applied?

Regards!

...JRF...
A. Clay Stephenson
Acclaimed Contributor

Re: ProC programs crash on 11.23

Oh, that's a horse of a different color. A NULL pointer returned by opendir() is not entirely unexpected, and when that happens you need to check the value of the extern variable errno, which should indicate the reason. A common problem might be that your process has reached maxdsiz and thus malloc() failed. Note: the word "crash", used to describe program behavior, does not normally mean that a function returned a result the program was not expecting (but should have been expecting, or should at least have reported with a meaningful error message before exiting).

PHKL_31500 does have a fix for a problem with opendir() but I don't think it fits your problem and you should read the installation instructions carefully before installing --- although I suspect it is already installed on your system.
If it ain't broke, I can fix that.
Sandman!
Honored Contributor

Re: ProC programs crash on 11.23

To reiterate what Clay has said a simple and probably generic fix would be to add exception handling to your Pro*C program. It can be like the one below:

DIR *dirp = opendir("Some_Dir_Name"); /* keep the pointer so the stream can be used and closed */
if (dirp == NULL)
    perror("Error");
dictum9
Super Advisor

Re: ProC programs crash on 11.23

J. Ferguson,

The system was installed in the fall of 2006. I ran a custom patch install with swainv.

C.Stephenson,

Yes, I have PHKL_31500.

Sandman!,

Good idea.
A. Clay Stephenson
Acclaimed Contributor

Re: ProC programs crash on 11.23

I was almost certain that you did have that patch, as it is quite old, and it doesn't really fit your symptoms because your symptoms almost certainly are not a bug. There are a number of reasons opendir() can fail and putting a loop in to retry the call is not likely to fix anything. You need to resolve the base cause of the problem and that is where errno comes into the picture. Something like this should be rather close:

#include <stdio.h>
#include <errno.h>
#include <sys/types.h>
#include <dirent.h>
#define assign_errno(x) ((errno != 0) ? errno : (x))

int process_dir(char *dirname)
{
    int cc = 0;
    DIR *dp = NULL;

    dp = opendir(dirname);
    if (dp != NULL)
    {
        /* normal processing goes here */
        closedir(dp); /* If you miss this step you have an instant memory leak */
    }
    else
    {
        cc = assign_errno(255);
        (void) fprintf(stderr,"Can't open directory \"%s\"; status %d.\n",dirname,cc);
        (void) fflush(stderr);
    }
    return(cc);
} /* process_dir */


------------------------------------------
You now will have an integer exit code that means something and an error message on stderr that means something.
If it ain't broke, I can fix that.
dictum9
Super Advisor

Re: ProC programs crash on 11.23


I haven't yet tried the above code to get the error code, but I will.

Here is the output from the log file. A couple of scenarios were looked at:

(1) The directory was renamed to another name, and
(2) the permissions of the directory were set to 000

The corresponding messages given in the log file on dev were:
****** Start of qms1108a Report ******
Start time: 14:27:1 7/18/2007

Error Opening the /apps/qms/data/hnstest directory
stat() error on /apps/qms/data/hnstest: No such file or directory

qms1108a STOPPED

End time: 14:28:6 7/18/2007
****** End of qms1108a Report ******
****** Start of qms1108a Report ******
Start time: 14:34:32 7/18/2007

Error Opening the /apps/qms/data/hnstest directory
stat() error on /apps/qms/data/hnstest: Permission denied

qms1108a STOPPED

End time: 14:34:32 7/18/2007
****** End of qms1108a Report ******


However, when the actual crash happened, no data was logged.

****** Start of qms1108a Report ******
Start time: 11:31:19 7/23/2007

Error opening the /apps/qms/data/hnstest directory

qms1108a STOPPED

End time: 11:31:19 7/23/2007
****** End of qms1108a Report ******




dictum9
Super Advisor

Re: ProC programs crash on 11.23


==============================================================
Here is the actual code with opendir() and closedir(). Are they properly matched?
==============================================================

if ((dirp = opendir(path)) == NULL)
{
    while (dirp == NULL && sleep_count < 5)
    {
        fprintf(log_file,"OPENDIR error:%s, sleep count %d, sleep interval %d\n", strerror(errno), sleep_count, sleep_seconds);
        fflush(log_file);
        sleep(sleep_seconds);
        dirp = opendir(path);
        sleep_count++;
        if (sleep_seconds < 5)
        {
            sleep_seconds++;
        }
    }

    if (dirp == NULL)
    {
        fprintf(log_file,"Error opening the %s directory\n", path);
        fprintf(log_file, "stat() error on %s: %s\n", path,
                strerror(errno));
        fflush(log_file);
        return(-1);
    }
}

while ((dp = readdir(dirp)) != NULL)
{
    if (dp->d_name[0] != '.')
    {
        sprintf(infile, "%s/%s", path, dp->d_name);
        ret_stat = stat(infile, &filstat_ret);
        if (ret_stat >= 0)
        {
            strcpy(filstat_buf[file_count].filename,dp->d_name);
            filstat_buf[file_count].lmodt = filstat_ret.st_mtime;
            file_count++;
        }
        else
            return(-1);
    }
}

filstat_buf[file_count].filename[0] = '\0';
j = file_count;

for (i=0; i < j; i++)
{
    for (k=0; k < j-i-1; k++)
    {
        if (filstat_buf[k].lmodt > filstat_buf[k+1].lmodt)
        {
            strcpy(filstat_tmp.filename,filstat_buf[k].filename);
            filstat_tmp.lmodt = filstat_buf[k].lmodt;
            strcpy(filstat_buf[k].filename,filstat_buf[k+1].filename);
            filstat_buf[k].lmodt = filstat_buf[k+1].lmodt;
            strcpy(filstat_buf[k+1].filename,filstat_tmp.filename);
            filstat_buf[k+1].lmodt = filstat_tmp.lmodt;
        }
    }
}
closedir(dirp);
Steven E. Protter
Exalted Contributor

Re: ProC programs crash on 11.23

Shalom,

In light of the additional information, I would take the following steps:

1) Test the code on another system and see if the results are consistent.
2) Take a look at the general patch level of the system and consider bringing it up to July 2007 or the previous QPK.

Is it possible to see a snippet of the debug information, to be sure it's really an opendir call it's crashing on?

Sorry there was no core dump; that would have made it easy. Are you sure core dumps are enabled on the system in question?

SEP
A. Clay Stephenson
Acclaimed Contributor

Re: ProC programs crash on 11.23

There's a lot I don't like about this coding style but I'm all but positive this is not a patching issue. It is vitally important that you capture errno immediately after a function/system call error because the next function/system call could also fail and by the time you actually output errno, it's actually an artifact of the wrong function.
If it ain't broke, I can fix that.
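A minimal sketch of the "capture errno immediately" advice follows. The helper name open_dir_logged and its signature are made up for illustration and are not from the poster's program; the point is only that errno is copied into a local variable before fprintf() or any other call has a chance to overwrite it.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original program): opens `path` and,
   on failure, logs the reason. Returns 0 on success, otherwise the errno
   value saved right after opendir() failed (or -1 if errno was not set). */
int open_dir_logged(const char *path, FILE *log, DIR **out)
{
    DIR *dirp;
    int saved_errno;

    errno = 0;
    dirp = opendir(path);
    if (dirp == NULL)
    {
        saved_errno = errno;   /* copy before any other library call runs */
        fprintf(log, "opendir(\"%s\") failed: errno %d (%s)\n",
                path, saved_errno, strerror(saved_errno));
        fflush(log);
        *out = NULL;
        return (saved_errno != 0) ? saved_errno : -1;
    }

    *out = dirp;               /* caller is responsible for closedir() */
    return 0;
}
```

A caller would check the return value, and on success use the DIR pointer and then closedir() it. Logging the numeric value as well as the strerror() text makes it easier to match against the errno list in the opendir manpage.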
Sandman!
Honored Contributor

Re: ProC programs crash on 11.23

Actually you would need two closedir() calls. One just before the return (-1) call and one right at the end (which exists). Else the current dir stream stays open and its malloc'ed memory remains in use and is not freed-up or returned.

while ((dp = readdir(dirp)) != NULL)
{
    if (dp->d_name[0] != '.')
    {
        .
        .
        .
        else
        {
            closedir(dirp); /* need to add this one */
            return(-1);
        }
    }
}
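To illustrate why an unclosed directory stream matters in a long-running loop, here is a small sketch. It assumes that each open stream holds a file descriptor (true on common implementations, not checked here against HP-UX 11.23) and that the POSIX getrlimit()/setrlimit() calls are available. The function name errno_after_leak and both constants are invented for the demonstration. It lowers the descriptor limit so the effect shows up after a few dozen opens rather than after days of running.

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>
#include <sys/resource.h>

#define DEMO_FD_LIMIT 32   /* illustrative soft limit, not a recommended value */
#define MAX_STREAMS   64   /* more opens than the lowered limit allows */

/* Hypothetical demonstration: opens `path` repeatedly without closing, the
   way a loop that skips closedir() on an error path would. Returns the errno
   of the first opendir() failure, 0 if no open failed, or -1 if the limits
   could not be adjusted. Everything opened here is closed before returning. */
int errno_after_leak(const char *path)
{
    DIR *open_streams[MAX_STREAMS];
    struct rlimit old_limit, low_limit;
    size_t n = 0;
    int first_errno = 0;

    if (getrlimit(RLIMIT_NOFILE, &old_limit) != 0)
        return -1;
    low_limit = old_limit;
    if (low_limit.rlim_cur > DEMO_FD_LIMIT)
        low_limit.rlim_cur = DEMO_FD_LIMIT;   /* only ever lower the soft limit */
    if (setrlimit(RLIMIT_NOFILE, &low_limit) != 0)
        return -1;

    while (n < MAX_STREAMS)
    {
        DIR *dirp = opendir(path);
        if (dirp == NULL)
        {
            first_errno = errno;              /* capture immediately */
            break;
        }
        open_streams[n++] = dirp;             /* "leaked": kept open on purpose */
    }

    while (n > 0)
        closedir(open_streams[--n]);
    setrlimit(RLIMIT_NOFILE, &old_limit);     /* restore the original soft limit */
    return first_errno;
}
```

On a system where the assumption holds, calling errno_after_leak(".") should report EMFILE. If the real program logs EMFILE (or ENOMEM) when opendir() returns NULL, that points at leaked streams rather than at a patch.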
dictum9
Super Advisor

Re: ProC programs crash on 11.23

The change was incorporated but the problem still persists.

There are a few places in the script where this was necessary, I need to confirm that they all were addressed.

Still unable to capture the error code.

Need to monitor memory usage for specific processes without glance.
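Where the same closedir() fix is needed in several places, one option is to route every exit through a single cleanup point, so that no return statement can skip it. The following is only a sketch of that pattern: count_entries is a hypothetical function, not part of the program posted above, and it merely counts entries instead of filling filstat_buf.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: counts the entries of `path` whose names do not start
   with '.' and that stat() can read. Returns the count, or -1 with errno set.
   After a successful opendir(), every path leaves through one closedir(). */
int count_entries(const char *path)
{
    DIR *dirp;
    struct dirent *dp;
    struct stat sb;
    char name[1024];
    int count = 0;
    int saved_errno = 0;

    dirp = opendir(path);
    if (dirp == NULL)
        return -1;                    /* nothing to close; errno set by opendir() */

    while ((dp = readdir(dirp)) != NULL)
    {
        if (dp->d_name[0] == '.')
            continue;
        if (snprintf(name, sizeof name, "%s/%s", path, dp->d_name)
                >= (int) sizeof name)
        {
            saved_errno = ENAMETOOLONG;   /* path would not fit in the buffer */
            break;
        }
        if (stat(name, &sb) < 0)
        {
            saved_errno = errno;          /* remember why stat() failed */
            break;
        }
        count++;
    }

    closedir(dirp);                   /* the single cleanup point */
    if (saved_errno != 0)
    {
        errno = saved_errno;          /* closedir() may have changed errno */
        return -1;
    }
    return count;
}
```

The error paths use break instead of return, which is what keeps the stream from being left open. snprintf() is used in place of sprintf() so that an unexpectedly long file name cannot overrun the buffer.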
James R. Ferguson
Acclaimed Contributor

Re: ProC programs crash on 11.23

Hi:

In the absence of 'glance' (the best tool!) you could craft a simple script to monitor the virtual size of a process thusly:

# cat ./watch
#!/usr/bin/sh
typeset WHO=$1
while true
do
{ date; UNIX95= ps -C ${WHO} -o comm= -o vsz= ;}|xargs
sleep 30
done

...run as :

# ./watch processname

If I were to monitor 'init' you would see output like:

Fri Jul 27 15:26:29 EDT 2007 init 344
Fri Jul 27 15:26:59 EDT 2007 init 344

See the manpages for 'ps'.

Regards!

...JRF...
Sandman!
Honored Contributor

Re: ProC programs crash on 11.23

Could you attach the code instead of pasting it, so that it can be looked at? To reiterate what Clay has already said: check for errors after each and every system/library call.
dictum9
Super Advisor

Re: ProC programs crash on 11.23


When running the UNIX95 command to monitor the memory usage of a process, what do the first two fields (numbers) on the line mean?



UNIX95= ps -e -o vsz -o pid -o args | sort -k1nr

15564 14395 /apps/qms/bin/qms1108a /bin/ON1108 /apps/qms/fims/upload /data/hnstest
^^^^^ ^^^^^
1860 14016 /apps/qms/bin/qms1108wrc /bin/ON1108wrc /data/hnswrc
1256 925 sh /apps/qms/bin/qms1109_IRU.sh


Sandman!
Honored Contributor

Re: ProC programs crash on 11.23

The first two fields are exactly what you have supplied to the ps(1) command viz., vsz (the memory size of the process in 1K units) and pid (its process ID) and the last one is the process creation command line.
dictum9
Super Advisor

Re: ProC programs crash on 11.23

The program crashed again today. The memory usage got as high as 18892 in 1K blocks. After restart, it's growing again.

Is there a kernel variable I can tune, or is this strictly a code issue?

The memory leak doesn't seem to be fixed.



18892 14395 /apps/qms/bin/qms1108a /bin/ON1108 /apps/qms/fims/upload /data/hnstest
1860 14016 /apps/qms/bin/qms1108wrc /bin/ON1108wrc /data/hnswrc
1256 925 sh /apps/qms/bin/qms1109_IRU.sh


Mon Jul 30 10:47:04 EDT 2007

18892 14395 /apps/qms/bin/qms1108a /bin/ON1108 /apps/qms/fims/upload /data/hnstest
1860 14016 /apps/qms/bin/qms1108wrc /bin/ON1108wrc /data/hnswrc
1256 925 sh /apps/qms/bin/qms1109_IRU.sh


Mon Jul 30 10:47:34 EDT 2007

1860 14016 /apps/qms/bin/qms1108wrc /bin/ON1108wrc /data/hnswrc
1256 925 sh /apps/qms/bin/qms1109_IRU.sh


James R. Ferguson
Acclaimed Contributor

Re: ProC programs crash on 11.23

Hi:

> The program crashed again today. The memory usage got as high as 18892 in 1K blocks. After restart, it's growing again. Is there a kernel variable I can tune, or is this strictly a code issue? The memory leak doesn't seem to be fixed.

Why do you believe that you have a memory leak? Usually, you would see the memory utilization for the process grow until your program fails calling 'malloc' with an error of ENOMEM. With such a failure, the call will also return a NULL pointer.

In all your posts, I don't see evidence of continued memory growth nor any 'errno'. As already noted by others, you need to capture and test for non-zero 'errno' after every system call.

For the heap (data) (governed by 'malloc'), the kernel parameters 'maxdsiz' and 'maxdsiz_64bit' are the fences for the maximum data size of 32-bit and of 64-bit processes respectively.

Regards!

...JRF...
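To see from inside the process how much room the data segment has, something like the sketch below could be compiled into the program or run on its own. It assumes the POSIX getrlimit() call with RLIMIT_DATA reports the limit that maxdsiz imposes, which has not been verified here for HP-UX 11.23. The function name print_data_limit is made up for the example.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: prints the soft and hard data-segment limits of the
   calling process to `out`. Returns 0 on success, -1 if getrlimit() fails. */
int print_data_limit(FILE *out)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_DATA, &lim) != 0)
        return -1;

    if (lim.rlim_cur == RLIM_INFINITY)
        fprintf(out, "data segment soft limit: unlimited\n");
    else
        fprintf(out, "data segment soft limit: %llu bytes\n",
                (unsigned long long) lim.rlim_cur);

    if (lim.rlim_max == RLIM_INFINITY)
        fprintf(out, "data segment hard limit: unlimited\n");
    else
        fprintf(out, "data segment hard limit: %llu bytes\n",
                (unsigned long long) lim.rlim_max);

    return 0;
}
```

Comparing these figures with the vsz column from ps (reported in 1K units, so multiply by 1024) gives a rough idea of whether the process is anywhere near the fence when opendir() starts failing.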

Sandman!
Honored Contributor

Re: ProC programs crash on 11.23

What do you mean by the "program crashed"? Did it produce a core dump? You can get a core dump by sending the SIGABRT signal & see what's happening.
Dennis Handly
Acclaimed Contributor

Re: ProC programs crash on 11.23

>The program crashed again today. ... or is this strictly a code issue?

If you think you have a memory leak, you can use wdb's leak detection commands.
(gdb) set heap-check on
run it
(gdb) info leaks