Re: Java "not enough core"

 
Ben Armstrong
Regular Advisor

Re: Java "not enough core"

Murali,

That's a good question. No, a considerable number of tests fail prior to this failure. In fact, they don't merely fail, but rather they have errors, which renders those test results invalid. It looks like mspec is unable to recover from those, as a single error is often followed by a whole stream of them (every test case is marked "E" thereafter to the end of the file in which the error occurs). Perhaps it is the accumulation of such errors that ultimately leads to the stack dump. We're not sure. In any case, in parallel with an inquiry to HP, we are continuing to investigate these errors to see if they can be resolved, and perhaps if they can be, the stack dump will also be solved as a by-product.

Ben
P Muralidhar Kini
Honored Contributor

Re: Java "not enough core"

Hi Ben,

>> Perhaps it is the accumulation of such errors that ultimately leads
>> to the stack dump. We're not sure.
This is what I had in mind too, given the observation that
when the test suite runs as a whole a particular test fails, but
when only that test is run, there is no failure.

>> No, a considerable number of tests fail prior to this failure
Maybe the first test (or first few tests) that failed holds a clue as to what the root cause might be.

>> In any case, in parallel with an inquiry to HP, we are continuing to investigate
>> these errors to see if they can be resolved, and perhaps if they can be,
>> the stack dump will also be solved as a by-product.
Good luck.
Also please do post the solution to this problem in this thread, so that we all know the root cause (& solution) for this problem.

Regards,
Murali
Let There Be Rock - AC/DC
Ben Armstrong
Regular Advisor

Re: Java "not enough core"

OK, this is very strange. I had a hard time narrowing down which quota differed between the failing case and the successful one. Even after reducing all my quotas to equal or lower than the ones on the failing account, I still couldn't reproduce it. Then I observed that some quotas on the failing account were actually higher, so I started bumping mine up, one at a time, until ...

I discovered actually *increasing* my BYTLM from 40000 to 382000 to match his finally causes the test to fail! Would someone please explain why this parameter would be relevant here? I have some guesses, based on what I've read about it, but would really like an expert opinion. (Keep in mind that in my simple test case, the relevant construct is backticks, which is supposed to spawn a process and return any output from that process.)

Thanks,
Ben
Hein van den Heuvel
Honored Contributor

Re: Java "not enough core"

>> *increasing* my BYTLM from 40000 to 382000 to match his finally causes the test to fail!

Interesting. Can you watch BYTLM during the test with a dedicated program, a DCL script, or simply with SHOW PROC/CONT ... Q?

Does the issue occur 'right away', after some minutes, or after a specific known test in a long series of tests?

Can you slow the problem down by sleeping every so many iterations, or print a summary line every so often, just to better understand when this happens?


SPAWN uses bytlm to transfer symbols and logical names.

Do we know how the backticks are implemented? crtl-system call? call to Lib$spawn? concoction around SYS$CREPRC?

Since bytlm plays a role, you gotta think system services play a role.
How about trying to grab a log of those with SET PROC/SSLOG ?
It is a sledge-hammer approach, and you'll probably need some smarts (perl!) to weed through the details generated, but it could help pinpoint the root cause.
(I typically SPAWN before SET PROC/SSLOG, as it has caused me to 'lose' processes with specific command ordering.)

hope this helps some,
Hein
Hoff
Honored Contributor

Re: Java "not enough core"

>...*increasing* my BYTLM from 40000 to 382000 to match his finally causes the test to fail! Would someone please explain why this parameter would be relevant here?

That reeks of a Java application bug somewhere, or of a Java VM bug.

On zero evidence, I'd look for a synchronization problem in the I/O processing, where the code was implicitly synchronizing or throttling itself by running out of buffer storage space and hitting the associated resource wait. Allowed to free-run by a higher quota, the same code could then consume (other) process memory resources for use as buffers, and eventually crash.

See if (for instance) the pending I/O counts stored in the PCB spike when this thing tips over.

Whether this is a bug in JRuby or in the underpinnings is an open question.
Ben Armstrong
Regular Advisor

Re: Java "not enough core"

We delved into the source and found this problematic code in [.src.org.jruby.util]ShellLauncher.java:

private void verifyExecutableForShell() {
    String cmdline = rawArgs[0].toString().trim();
    if (doExecutableSearch && shouldVerifyPathExecutable(cmdline) && !cmdBuiltin) {
        verifyExecutable();
    }

    // now, prepare the exec args

    execArgs = new String[3];
    execArgs[0] = shell;
    execArgs[1] = shell.endsWith("sh") ? "-c" : "/c";

    if (Platform.IS_WINDOWS) {
        // that's how MRI does it too
        execArgs[2] = "\"" + cmdline + "\"";
    } else {
        execArgs[2] = cmdline;
    }
}

It turns out that "shell" is ultimately set in [.src.org.jruby.libraries]RbConfigLibrary.java by:


// TODO: note lack of command.com support for Win 9x...
public static String jrubyShell() {
    return SafePropertyAccessor.getProperty("jruby.shell", Platform.IS_WINDOWS ? "cmd.exe" : "/bin/sh").replace('\\', '/');
}
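
To make the switch selection in the two snippets above concrete, here is a minimal standalone Java sketch. It copies only the `shell.endsWith("sh")` expression and the non-Windows branch from the quoted source; the class name ShellSwitchDemo and its helper methods are hypothetical and are not part of JRuby.

```java
import java.util.Arrays;

class ShellSwitchDemo {
    // Same expression as in the quoted ShellLauncher code: only a shell
    // path ending in "sh" gets the Unix-style "-c" switch.
    static String switchFor(String shell) {
        return shell.endsWith("sh") ? "-c" : "/c";
    }

    // Builds the three-element argument vector the way the non-Windows
    // branch does (the command line is passed through unquoted).
    static String[] execArgsFor(String shell, String cmdline) {
        return new String[] { shell, switchFor(shell), cmdline };
    }

    public static void main(String[] args) {
        // "/bin/sh" is the non-Windows default; the second path stands in
        // for any replacement shell whose name ends in ".exe".
        String[] shells = { "/bin/sh", "/path_to/sh.exe" };
        for (String shell : shells) {
            System.out.println(Arrays.toString(execArgsFor(shell, "show time")));
        }
    }
}
```

Under these assumptions, a shell path ending in ".exe" is handed the "/c" form even on a non-Windows platform, so any replacement shell has to tolerate that switch.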

Of course, this is not Windows, so it's assuming /bin/sh here, and then appending "-c", followed by the arguments. Fortunately, this can be worked around by replacing the shell with something else, e.g. set:

"-Djruby.shell=/path_to/sh.exe"

After writing a very simple shell in C++ that does nothing but drop the spurious "/c" (because sh.exe doesn't end with "sh", the windows switch form is used here) and pass the arguments to lib$do_command, and defining the jruby.shell as indicated above, the stack dumps have ceased. Here is that code:

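// The header names were lost from the original post; these are the
// ones the code below needs.
#include <string>
#include <string.h>
#include <strings.h>
#include <descrip.h>
#include <lib$routines.h>

using std::string;

int main(int argc, char **argv)
{
    // Rebuild the command line, dropping the "/c" switch that JRuby inserts.
    // Note: comparing only strlen(argv[i]) characters also drops "" and "/".
    string args="";
    for(int i=1; i < argc; i++) {
        if(strncasecmp(argv[i], "/c", strlen(argv[i]))!=0 )
            args = args + string(" ") + argv[i];
    }

    char *str = const_cast<char *>(args.c_str());
    static unsigned long int r0_status;

    // Fixed-length string descriptor for the command text.
    struct dsc$descriptor_s str_d =
        {strlen(str), DSC$K_DTYPE_T, DSC$K_CLASS_S, str };

    // Hands the command to the CLI; this image is run down in the process.
    r0_status = lib$do_command(&str_d);

    return 0;
}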

Our only complaint is that this is slow. Three times slower to do 40 iterations of backticks to execute a simple DCL command than, say, executing sh.exe in a DCL procedure that spawns an execution of sh.exe 40 times. Any reason why things go so much slower in Java? Or is this likely to be a JRuby-specific issue? (I guess I should write a wrapper to shell out in Java, stripping away all of the JRuby stuff; btw, any good doc on how to call system services and RTL from Java?)

Here's a simple ruby test:

100.times{|_|puts `show time`; puts _}

On our rx2600 this takes 10 seconds, compared to 3 seconds for the DCL equivalent:

$index=0
$loop:
$spawn/nolog sh "show time"
$index=index+1
$echo index
$if index.eq.100 then exit
$goto loop
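
The plain-Java wrapper mentioned above (shelling out without any of the JRuby layers) could be sketched roughly as follows. This assumes only the standard java.lang.ProcessBuilder API; the class name SpawnTimer and its methods are hypothetical, and the sketch has not been run on OpenVMS.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SpawnTimer {
    // Runs the command once and returns its output lines
    // (stderr is merged into stdout).
    static List<String> runOnce(List<String> command) {
        List<String> lines = new ArrayList<String>();
        try {
            ProcessBuilder pb = new ProcessBuilder(command);
            pb.redirectErrorStream(true);
            Process p = pb.start();
            BufferedReader reader =
                new BufferedReader(new InputStreamReader(p.getInputStream()));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            } finally {
                reader.close();
            }
            p.waitFor();
        } catch (IOException e) {
            throw new RuntimeException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return lines;
    }

    // Returns the elapsed wall-clock milliseconds for the given number of spawns.
    static long timeRuns(List<String> command, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            runOnce(command);
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: java SpawnTimer <iterations> <command> [args...]");
            return;
        }
        int iterations = Integer.parseInt(args[0]);
        List<String> command = Arrays.asList(args).subList(1, args.length);
        System.out.println(iterations + " spawns took "
                + timeRuns(command, iterations) + " ms");
    }
}
```

Invoked with an iteration count followed by the shell image and the command, it gives a third timing to set beside the Ruby and DCL figures, which may help show whether the extra time is spent in Java process creation or in JRuby itself.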


Ben
Hoff
Honored Contributor

Re: Java "not enough core"

>Any reason why things go so much slower in Java?

Must. resist. zinger. :-)

That lib$do_command is an image teardown and a restart, so that's not going to be all that speedy. (And Unix does process creation operations vastly faster than VMS.)

And this:

string args="";
for(int i=1; i < argc; i++) {
if(strncasecmp(argv[i], "/c", strlen(argv[i]))!=0 )
args = args + string(" ") + argv[i];
}

If the argv strings are long or if there are a number of /c tokens here, that for-loop should be replaced with a couple of pointers in a for-loop that shuffle through the whole string, looking at the current character (as /) and then peeking at the next (as c) and compressing the string, rather than repeatedly searching the front part of that string. (And you can save off the length there, since you'll have it, rather than adding a strlen to fetch it again.)
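
For comparison, a single-pass version of that filter is sketched below in Java rather than C++. It is not the pointer-shuffling approach described above: it simply skips whole tokens equal to "/c" and appends the rest to one StringBuilder, so the growing string is never copied again. The class name ArgFilter and its method are hypothetical, and unlike the posted C++ it matches "/c" exactly and emits no leading space.

```java
class ArgFilter {
    // Joins the arguments with single spaces, skipping any token that is
    // exactly "/c" (case-insensitive). Note that a Java args array has no
    // program name in slot 0, unlike C's argv.
    static String joinWithoutSlashC(String[] argv) {
        StringBuilder sb = new StringBuilder();
        for (String arg : argv) {
            if (arg.equalsIgnoreCase("/c")) {
                continue; // drop the spurious switch
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(arg);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithoutSlashC(new String[] { "/c", "show time" }));
    }
}
```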

Now as for where the wallclock is going, profile what you can.

And the difference between 10 seconds and 3 seconds doesn't look all that bad, given the volume of baggage here. And IIRC, DCL is likely running that particular command out of the CLI itself rather than an image activation, so you have 100 image activations and DCL doesn't.
Hein van den Heuvel
Honored Contributor

Re: Java "not enough core"

[arghhh, ITRC not behaving (or is it my internet connection?) ]





>> Any reason why things go so much slower in Java?

Image activations, for that 'shell'

In DCL, SHOW TIME is a native command; no image will be run.

You can easily verify that using a USER mode logical.
SHOW LOGICAL is an image and will 'eat' the logical.
SHOW TIME, SHOW DEFAULT and more are not and the logical will survive the command.

See below.
Hein


$ define/user test blah
$ show log test
"TEST" = "BLAH" (LNM$PROCESS_TABLE)
$ show log test
%SHOW-S-NOTRAN, no translation for logical name TEST
$ define/user test blah
$ show time
31-MAR-2011 08:54:28
$ show log test
"TEST" = "BLAH" (LNM$PROCESS_TABLE)
$ show log test
%SHOW-S-NOTRAN, no translation for logical name TEST



Ben Armstrong
Regular Advisor

Re: Java "not enough core"

Did you guys miss the fact that I'm running the "SHOW TIME" through "SH", which is my sh.cpp that I pasted earlier to use as jruby.shell? So that's always an image activation right there. Now, maybe there still are image activations that I'm missing, but I tried to be quite careful to make the two tests equivalent ...

i.e.

$ sh=="$path_to:sh.exe"
Ben Armstrong
Regular Advisor

Re: Java "not enough core"

In any case, optimizing this is probably premature, and tightening up the argument parsing loop in particular is unlikely to pay off much. I need to move on and figure out how to call system services/RTL routines from Java, and so forth. Guess I have some reading to do.

Thanks for all of your answers!

Ben