Re: HPBASIC Thread safe?

 
SOLVED
True_Geek
Occasional Visitor

HPBASIC Thread safe?

Hi

 

I'm wondering if there is anything in HP BASIC similar to Fortran's and Pascal's "VOLATILE"?

 

I have /NOOPT when compiling.

I have not tried /SYNCH and I wonder if that would help?

Sometimes it seems that code isn't executed in the order it's written...

 

BR

n00b

19 REPLIES
abrsvc
Respected Contributor

Re: HPBASIC Thread safe?

More specifics about the OS level, language version, etc. would be nice.

Given that you are requesting /NOOPT, you should not see anything optimized away that the equivalent of /VOLATILE would prevent.

One option that comes to mind though, would be to declare the variable as External. With the external reference, the compiler should not be able to optimize it in any way as it will not know much about it.

Why do you believe that there are out of order execution issues? Can you provide more details about the program environment?

Thanks,
Dan
Hein van den Heuvel
Honored Contributor

Re: HPBASIC Thread safe?

I don't think you can get BASIC to be truly thread safe, but if I recall correctly it is relatively safe anyway since it does not optimize variables into registers and such.

I would stick anything that can change behind its back into a MAP or COMMON to be sure.

For more tricky stuff you may need to protect with $SETAST (yuck) or a wrapper function with a sync mechanism.

 

What problem are you trying to solve?

What are you observing that makes you conclude the order is not as expected/intended?

 

Itanium? OpenVMS version? Basic version?

 

Hein

Craig A Berry
Honored Contributor

Re: HPBASIC Thread safe?

I think I remember being told that BASIC is thread-safe but not AST-reentrant.  Or, in other words, it depends on the threading model being used, which I don't see specified yet.

True_Geek
Occasional Visitor

Re: HPBASIC Thread safe?

Hi

 

OpenVMS 8.3 Alpha and BASIC 1.7.

 

Sorry for my English in advance :)

 

I started to think a bit, yay :) And made myself some test programs, see test.zip.

The "run_counter1" program just simulated the sys$enqw, but did not generate what I expected when forcing the processes onto different CPUs... so skip that for now.

The run_counter2 program simulates my problem.

Edit build.com to point at where you placed your source (see the MY_EXEC define), then run:

@build

Then run run_odd.com and run_even.com in 2 different screens.

Check which CPUs you have on your system, and clear and set the ones you have in run_even.com/run_odd.com.

 

The idea is simple:

 

 map (counter_common)  byte in_use,               &
                       long counter,              &
                       long counter_array(32000)
 

Loop over every odd place in counter_array and write a value known only locally to the program, then loop again and check that every odd place in the array still holds the right value.

While the odd process is doing the above, a second process does the same thing for the even places...

In theory they access different parts of shared memory. You could expect problems if you force them onto different CPUs, due to cache updates etc., but not if you force them onto the same CPU.

My test bench is a 2-CPU system, see the "set aff" in run_even/run_odd.

I was running on the same CPU but they still messed up.

 

Run 1

 

Terminal ODD:

@run_odd

$set process /affinity /permanent /set=0 /clear=(1)
$run run_counter2
1
 Interrupt - I interrupted with Ctrl-C because EVEN broke

 

Terminal EVEN:

@run_even

$set process /affinity /permanent /set=0 /clear=(1)
$run run_counter2
0
Process 2: Counter_array[ 19892 ]; 3753  <> last_counter: 3764

 

Even should find the same value all the way through the loop.

It expected to find 3764, but found 3753... how come???

It looks like a typical CPU cache problem, and it's repeatable for me...

In real life I force those old VAX programs onto one CPU, but from time to time they still "crash" because the shared array gets mixed data.

So is it an OpenVMS bug, where under heavy load it migrates processes to another CPU even though it should not?

Or a compiler bug?

 

BR

n00b

True_Geek
Occasional Visitor

Re: HPBASIC Thread safe?

Quick update.

 

If I select a FIXED value of LAST_COUNTER and do not increment it, then ODD and EVEN never break.

How odd...

 

BR

n00b

True_Geek
Occasional Visitor

Re: HPBASIC Thread safe?

 

Update:

 

Changed it so that the value I add to the array will be ODD if written to an ODD array place and EVEN if written to an EVEN one.

Also I print the array value at i-2, and I don't exit the program, I just let it run through the rest of the array.

 

Terminal ODD:

run run_counter3
Start value for this process? 1
Process 2: Counter_array[ 12011 ]; 2297  <> last_counter: 2305
Prev: Counter_array[ 12009 ]; 2305
Process 2: Counter_array[ 12013 ]; 2297  <> last_counter: 2305
Prev: Counter_array[ 12011 ]; 2297
Process 2: Counter_array[ 19459 ]; 3737  <> last_counter: 3759
Prev: Counter_array[ 19457 ]; 3759
Process 2: Counter_array[ 19461 ]; 3737  <> last_counter: 3759
Prev: Counter_array[ 19459 ]; 3737
Process 2: Counter_array[ 337 ]; 4333  <> last_counter: 4339

 

Terminal EVEN:

 run run_counter3
Start value for this process? 0
Process 2: Counter_array[ 30340 ]; 5276  <> last_counter: 5304
Prev: Counter_array[ 30338 ]; 5304
Process 2: Counter_array[ 7234 ]; 7690  <> last_counter: 7714
Prev: Counter_array[ 7232 ]; 7714
Process 2: Counter_array[ 24178 ]; 8220  <> last_counter: 8274
Prev: Counter_array[ 24176 ]; 8274
Process 2: Counter_array[ 25090 ]; 9170  <> last_counter: 9198
Prev: Counter_array[ 25088 ]; 9198

 

The thing with this is that it always looks like ODD only sees a wrong "ODD" value, i.e. the value from many cycles ago; the same goes for EVEN.

When running alone it is OK, but when I start the other process, each one seems to read OLD data...

 

The code:

 

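        input "Start value for this process";start      ! 0 = even places, 1 = odd places

        last_counter = start

        while (1=1)     ! loop forever

                i = start
                last_counter = last_counter + 2

                ! Pass 1: write this process's value into every second place
                while (i < 32000)
                        counter_array(i) = last_counter
                        i = i + 2
                next

                i = start

                ! Pass 2: check that every second place still holds that value
                while (i < 32000)
                        if counter_array(i) <> last_counter then
                                print "Process 2: Counter_array[";i;"];";counter_array(i);" <> last_counter:";last_counter
                                ! i-2 is below the array for the first place, so guard it
                                if i >= 2 then
                                        print "Prev: Counter_array[";i-2;"];";counter_array(i-2)
                                end if
                        end if
                        i = i + 2
                next

        next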

 

 

BR

n00b

John Gillings
Honored Contributor
Solution

Re: HPBASIC Thread safe?


BR,

   I don't think this has anything to do with thread safety, AST reentrancy or stale CPU caches, it's a simple case of unsynchronised access to a shared write access data structure.

 

  There's no point in arguing about theory, or trying to figure out what might be happening. IF you have a shared data structure that will be written by multiple processes, ESPECIALLY when the unit of reference is smaller than a quadword, THEN you need some form of synchronisation. No ifs, ands, buts or maybes. I present your posting as evidence. If you find you need to lock a process to a CPU in order for your algorithm to work, then IT'S WRONG! Locking multiple processes to a single CPU is basically using the CPU itself as a very expensive lock.

 

  What you need to work out is the required granularity of locking and choose an appropriate mechanism. $ENQ is general and fairly simple. Depending on the frequency of access and the performance requirements you may be able to lock the whole table. Typically your writers would $ENQ a null mode lock, then convert to EX to write. Readers would $ENQ a null mode lock and convert to CR when they want to read.
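
The NL/CR/EX pattern above works because of the lock manager's mode compatibility rules. As a rough illustration only, here they are tabulated in plain C (this is not system-service code; the enum and function names are made up for the sketch, and real code would use the LCK$K_* constants with $ENQ):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names only; real code would use the LCK$K_* constants. */
enum lock_mode { NL, CR, CW, PR, PW, EX, MODE_COUNT };

/* compatible[held][requested]: true if a request can be granted
 * while another process holds the resource in mode `held`. */
static const bool compatible[MODE_COUNT][MODE_COUNT] = {
    /*          NL     CR     CW     PR     PW     EX   */
    /* NL */ { true,  true,  true,  true,  true,  true  },
    /* CR */ { true,  true,  true,  true,  true,  false },
    /* CW */ { true,  true,  true,  false, false, false },
    /* PR */ { true,  true,  false, true,  false, false },
    /* PW */ { true,  true,  false, false, false, false },
    /* EX */ { true,  false, false, false, false, false },
};

/* Would a request in `requested` mode be granted alongside `held`? */
bool can_grant(enum lock_mode held, enum lock_mode requested)
{
    assert(held < MODE_COUNT && requested < MODE_COUNT);
    return compatible[held][requested];
}
```

Holding NL blocks nobody, so every process can keep its lock queued cheaply. A writer converting to EX has to wait until no reader holds CR, and readers at CR never block each other.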

 

  Don't even think about trying to hack around this fundamental principle. Without proper synchronisation your code WILL FAIL somewhere, sometime, and most likely in a manner that's very difficult to reproduce or diagnose.

 

  Remember that the underlying hardware accesses memory in whole quadwords. So, to update a longword, the hardware reads the surrounding quadword, masks in the updated longword and writes back the whole quadword. If two processes attempt to update adjacent longwords simultaneously, only one will win. This is known as "word tearing".
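
The read, mask and write-back sequence described above can be modelled in a few lines of C. This is only a single-threaded sketch of one unlucky interleaving, with made-up function names. It makes no claim about which Alpha models, data sizes or alignments actually produce such a sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Replace one 32-bit half of a 64-bit word: the "mask in" step.
 * half = 0 selects bits 0..31, half = 1 selects bits 32..63. */
uint64_t mask_in(uint64_t quad, int half, uint32_t value)
{
    assert(half == 0 || half == 1);
    unsigned shift = (unsigned)half * 32u;
    uint64_t mask = (uint64_t)0xFFFFFFFFu << shift;
    return (quad & ~mask) | ((uint64_t)value << shift);
}

/* Model two writers whose read-modify-write sequences interleave:
 * both read the quadword before either one writes it back. */
uint64_t torn_update(uint64_t memory, uint32_t a_value, uint32_t b_value)
{
    uint64_t a_copy = memory;               /* writer A reads        */
    uint64_t b_copy = memory;               /* writer B reads        */
    a_copy = mask_in(a_copy, 0, a_value);   /* A updates the low half  */
    b_copy = mask_in(b_copy, 1, b_value);   /* B updates the high half */
    memory = a_copy;                        /* A writes back         */
    memory = b_copy;                        /* B writes back, undoing A */
    return memory;
}
```

With the low half starting at 3753, `torn_update(3753, 3764, 3765)` leaves the low half at 3753 even though writer A stored 3764. That is the same shape as the stale values in the test output: each writer's own store is correct, but a neighbour's write-back restores older data.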

 

A crucible of informative mistakes
True_Geek
Occasional Visitor

Re: HPBASIC Thread safe?

Hi

 

Thank you for your reply.

 

It was most helpful.

In the real application we have sync with enq, ast etc., but that doesn't help.

 

The thing you said about quadwords sounded like the right thing.

So I tested it.

So I=I+4

And I start the processes with "0" and "2" (because I have 4-byte longs).

Then it works perfectly!!!

So now I have a solution:

To encapsulate the struct/record with 1-8 bytes of padding (8 bytes is safest) at the start, and add a STRING FILL = 0 at the end to get it to an even boundary.

 

Just one question remains... just because I'm a n00b on Alpha.

If set aff really works and the kernel doesn't have a bug that starts to migrate processes under load, then:

What I don't understand is that both processes sit on the same CPU and share the same cache.

I would expect that we have only one copy of the memory area in the CPU, and therefore the QUADWORD problem should not exist.

But it does exist, so either the memory area exists in 2 instances in the CPU or SET AFF is broken...

Anyway, that discussion is perhaps pointless if I can get my hands on some good documentation on Alpha internal memory/cache handling. Anyone have a link?

 

 

 

Br

n00b

abrsvc
Respected Contributor

Re: HPBASIC Thread safe?

The issue here is less to do with the cache and more to do with the sync that John explained. For more information, look up the LDL_L/LDQ_L and STL_C/STQ_C instruction sets for Alpha. Reference the Alpha Architecture Reference Manual pages 4-8 thru 4-12. This is more commonly known as the "load locked/store conditional" instruction group. This will further explain the problem/solution.

Dan
Hein van den Heuvel
Honored Contributor

Re: HPBASIC Thread safe?

 

>> If Set aff realy works and kernel don't have a bug that starts to migrate processes during load, then:

 

Yeah right. You, as a self-professed nOOb, think you are going to find a kernel bug on your first day. NFW!

More likely: if it does not work as you expect, then your understanding is not yet adequate.

That was proven by the opening line... this was not at all about thread safety (which is understood to be within a process), but about shared data access from multiple processes (kernel threads, I give you that).

You may want to go more aggressive on the alignment/separation.

256 bits would be my first choice (32 bytes).

Is this application guaranteed never ever to be ported to Itanium?

 

Cheers,

Hein.

H.Becker
Honored Contributor

Re: HPBASIC Thread safe?

On Alpha (and Itanium) systems, with more than one writer (process), you need to synchronize access to shared data, no matter how many CPUs your application is running on and what the size of the data and its alignment (or separation) is. The load locked/store conditional instructions can be used to construct such a synchronization, but high level OS or language synchronization methods are recommended.

 

Your examples show more than one writer.

 
With one writer and many readers you can get away without synchronization but ONLY IF the data is naturally aligned AND the CPU provides single instructions to access the entire data.
 
With multiple CPUs you want to separate the data - have them in separate cache lines - to avoid cache thrashing, which is a performance penalty.
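
As a small illustration of the two layout ideas above (natural alignment, and one cache line per writer), here is a sketch in C11. The 64-byte line size and all names are assumptions for the example, not values taken from any particular Alpha or Itanium system.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed line size; check your CPU */

/* One slot per writer; the alignment pads each slot to a full line,
 * so two writers never share a cache line. */
struct writer_slot {
    alignas(CACHE_LINE) int32_t counter;
};

/* Natural alignment: the address is a multiple of the object size. */
bool naturally_aligned(uintptr_t address, size_t size)
{
    assert(size != 0);
    return address % size == 0;
}
```

A longword at byte offset 8 passes `naturally_aligned(8, 4)`, while one at offset 5 does not. Note that this layout only addresses performance and the single-writer case; with more than one writer it does not replace synchronization.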
John Gillings
Honored Contributor

Re: HPBASIC Thread safe?

Br,

 

> What i don't understand is that both processes sits on same cpu and share same cache.

> I would expect that we have only one copy of the memory area in CPU and there for

> the QUADWORD problem should not exists.

 

What you expect has very little to do with a modern CPU and its behaviour related to shared memory access. You DON'T have one copy of memory, nor do you share caches. You can't assume anything about the order of memory accesses. Unless you're designing the microprocessor, or writing operating system kernels, don't try to think at the hardware level.

 

THE biggest cost in processor speed in a multiprocessing environment is synchronising memory states across multiple processes. Advanced processors like Alpha and Itanium in essence assume that you don't care if an update to memory location X in process A is synchronously visible to process B. This allows many optimisations and significantly increases the speed of the processor. But, it means you need to explicitly create synchronisation points in your instruction stream to make it clear where you care about memory ordering. Look up "Alpha memory barriers" if you're interested in the hardware details.
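
To make "explicit synchronisation points" a bit more concrete, here is a sketch in portable C11 rather than Alpha assembly or BASIC. The names are made up, and the compiler is assumed to turn the release/acquire orderings into whatever barrier instructions the target needs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;                 /* ordinary shared data   */
static atomic_bool ready = false;   /* synchronisation flag   */

/* Writer: store the data, then set the flag with release ordering,
 * so the data store cannot become visible after the flag store. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader: acquire ordering on the flag guarantees that, once the flag
 * is seen as true, the writer's earlier store to payload is visible. */
bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

Everything other than the flag stays an ordinary, unsynchronised access, which is the trade-off described above: ordering is paid for only at the points where the code says it matters. The sketch covers one writer publishing once; multiple writers still need a lock.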

 

Most memory accesses are in private memory, so ordering and synchronisation don't matter. Where you are accessing common memory, you need to do proper synchronisation. If you're having trouble with $ENQ, then you must be doing it incorrectly. $ENQ works.

 

>So now I have a solution:

>To encapsule the struct/Record with 1-8 bytes (8 bytes most safe) from start

>and add a STRING FILL = 0 at end to get it to even boundary.

 

This is NOT a solution. It's a way of hiding the potential problems from your particular test program.

 

As I said, don't try to hack around it. Do it right. Shared memory MUST be properly synchronised.

A crucible of informative mistakes
abrsvc
Respected Contributor

Re: HPBASIC Thread safe?

BR,

 

Let's take a different approach. As indicated above, there are methods to ensure that there is no tearing of the data item. You indicate that $ENQ did not work for you. Along with John, I too have used $ENQ without problems. Perhaps we can assist in fixing the problems you are experiencing with the use of $ENQ. Can you post a snippet of code that shows how you are using the $ENQ services? Maybe a simple change to that will resolve the problem you are seeing.

 

Dan

Dennis Handly
Acclaimed Contributor

Re: HPBASIC Thread safe?

>ESPECIALLY when the unit of reference is smaller than a quadword, THEN you need some form of synchronisation

 

The latest C and C++ Standards threading model basically says this type of hardware would be unsupported, unless the software automatically provided this synchronization.  Also, the software couldn't optimize by loading multiple fields.

 

>Remember that the underlying hardware accesses memory in whole quadwords.  ...  the hardware reads the surrounding quadword, masks in the updated longword and writes back ...

 

The hardware?  Or the multiple generated instructions since there isn't just one?

GuentherF
Trusted Contributor

Re: HPBASIC Thread safe?

Does a "map" force natural alignment of elements? If not the counter_array would start at a +5 byte offset. In which case the generated code may fetch the quadword around one element of counter_array, update the long and write the quadword back. That would be on "top" of any hardware cache.

If that's the case then I wonder what happens if counter_array is declared first in counter_common.
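
The offset arithmetic can be checked with a C sketch. It ASSUMES the MAP is laid out with no padding, which is exactly the open question here; the packed struct below only mimics that hypothesis, and the helper names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)          /* no padding between members */
struct counter_common {
    int8_t  in_use;
    int32_t counter;
    int32_t counter_array[8];  /* shortened from the 32000 in the post */
};
#pragma pack(pop)

/* True if an object of `size` bytes at `offset` crosses an 8-byte boundary. */
bool straddles_quadword(size_t offset, size_t size)
{
    assert(size >= 1 && size <= 8);
    return offset / 8 != (offset + size - 1) / 8;
}

/* Byte offset of longword element i in an array starting at `base`. */
size_t element_offset(size_t base, size_t i)
{
    return base + i * sizeof(int32_t);
}
```

With the array starting at offset 5, element 0 covers bytes 5 to 8 and so straddles two quadwords, element 1 (bytes 9 to 12) does not, and element 2 straddles again. With the array declared first (base 0) no element straddles, and each aligned quadword holds one even and one odd element.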

/Guenther
GuentherF
Trusted Contributor

Re: HPBASIC Thread safe?

To eliminate a hardware cache related issue I would do the test with run_counter3 again but only with one CPU active.

It sounds odd to me to find a value that is 20+ iterations old. I can't make any sense out of this. Not yet.

/Guenther
John Gillings
Honored Contributor

Re: HPBASIC Thread safe?

re: Dennis,

 

> The hardware?

 

  Yes, the hardware. It can only read and write memory in whole (aligned) quadwords - it's an optimisation for speed, BUT it means if you want to update a smaller unit of memory, the hardware has to read the containing quadword, mask in the smaller field, then write back the quadword. That's why there's more than one operation involved to do such an update, and why they're vulnerable to word tearing if shared memory is updated with inadequate synchronisation. I think some of the later Alphas added instructions to do smaller granularity writes. 

 

>The latest C and C++ Standards threading model basically says this type of hardware would be

>unsupported, unless the software automatically provided this synchronization

 

So, they're mandating full memory synchronisation on all instructions just in case some bozo hasn't designed their threaded code correctly? The hardware CAN provide synchronisation, if required, but it's unnecessary and expensive to do it for all operations. That's how Alpha and Itanium work, and it's one of the reasons they're so fast. Please don't penalise my code just because you're too lazy to engineer yours properly!

A crucible of informative mistakes
Dennis Handly
Acclaimed Contributor

Re: HPBASIC Thread safe?

>if you want to update a smaller unit of memory, the hardware has to read the containing quadword, mask in the smaller field, then write back the quadword.

 

You're saying there IS a byte store instruction but it has to be done in non-atomic steps?

 

>they're mandating full memory synchronisation on all instructions just in case some bozo hasn't designed their threaded code correctly?

 

They do require 1, 2 and 4 byte aligned stores that are atomic.  The ordering probably requires a special qualifier like volatile.

 

>That's how Alpha and Itanium work, and it's one of the reasons they're so fast.

 

Yes, but you expect atomic sub-word stores. And now that is required.

H.Becker
Honored Contributor

Re: HPBASIC Thread safe?

In the beginning, Alpha had only load and store instructions for (in VMS terms) quadwords and longwords (which are 64 and 32 bit entities). With EV56 the architecture was extended with word and byte (which are 16 and 8 bit entities) load and store instructions. Each (load and) store instruction requires aligned data and is atomic.

That is, until EV56 (and code generators using the then new instructions) you didn't expect atomic sub-longword stores.

Even with current compilers the code generator usually generates code compatible with all CPUs, including pre-EV56. That results in a store byte operation being non-atomic. You have to tell the compiler to make use of the "new" instructions. For example, the C compiler accepts an /ARCHITECTURE=EV56 switch.