Operating System - OpenVMS

Re: HPBASIC Thread safe?

 
Hein van den Heuvel
Honored Contributor

Re: HPBASIC Thread safe?

 

>> If Set aff really works and the kernel doesn't have a bug that starts to migrate processes during load, then:

 

Yeah, right. You, as a self-professed nOOb, think you are going to find a kernel bug on your first day. NFW!

More likely: if it does not work as you expect, then your understanding is not yet adequate.

That was proven by the opening line: this was not at all about thread safety (which is understood to be within a process), but about shared data access from multiple processes (kernel threads, I give you that).

 

You may want to go more aggressive on the alignment/separation.

256 bits (32 bytes) would be my first choice.

Is this application guaranteed never, ever to be ported to Itanium?

 

Cheers,

Hein.

H.Becker
Honored Contributor

Re: HPBASIC Thread safe?

On Alpha (and Itanium) systems, with more than one writer (process), you need to synchronize access to shared data, no matter how many CPUs your application is running on and regardless of the size of the data and its alignment (or separation). The load locked/store conditional instructions can be used to construct such a synchronization, but high level OS or language synchronization methods are recommended.
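
As a rough illustration of the "language synchronization method" route, here is a minimal sketch in C11 of a retry loop around a compare-and-exchange, which is the same shape as a load locked/store conditional sequence. The names shared_counter, counter_add and counter_read are made up for this example, and it assumes a compiler with <stdatomic.h>; nothing here comes from the original poster's BASIC program, and in a real multi-process setup the counter would have to live in memory mapped by every writer.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared counter. In the thread's scenario this would sit in a
   section mapped by every writer process (an assumption of this sketch). */
static _Atomic int64_t shared_counter;

/* Add delta to the counter. If another writer changed the value between our
   read and our write, the exchange fails, 'seen' is refreshed, and we retry.
   Returns the value this call stored. */
int64_t counter_add(int64_t delta)
{
    int64_t seen = atomic_load(&shared_counter);
    while (!atomic_compare_exchange_weak(&shared_counter, &seen, seen + delta)) {
        /* 'seen' now holds the value found in memory; loop and try again. */
    }
    return seen + delta;
}

/* Read the current value with a single atomic load. */
int64_t counter_read(void)
{
    return atomic_load(&shared_counter);
}
```

Every writer calls counter_add() instead of doing its own load, add and store, so no update can be lost however the writers interleave.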

 

Your examples show more than one writer.

 
With one writer and many readers you can get away without synchronization but ONLY IF the data is naturally aligned AND the CPU provides single instructions to access the entire data.
 
With multiple CPUs you want to separate the data - have them in separate cache lines - to avoid cache thrashing, which is a performance penalty.
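
To make the separation point concrete, a minimal C11 sketch of one counter per cache line follows. The 64-byte line size, the struct name padded_counter and the four-slot array are assumptions picked for the example; check the line size of the actual CPU. Note that padding only addresses the performance side: with more than one writer per slot, synchronization is still needed.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed line size, not taken from the thread */

/* One naturally aligned quadword per cache line. The alignment request
   also rounds the struct size up to a full line. */
struct padded_counter {
    alignas(CACHE_LINE) int64_t value;
};

static_assert(sizeof(struct padded_counter) == CACHE_LINE,
              "each counter should occupy exactly one line");

/* Hypothetical layout: one slot per writer, so writers never share a line. */
struct padded_counter counters[4];

/* Distance in bytes between two neighbouring slots. */
size_t slot_stride(void)
{
    return sizeof(struct padded_counter);
}
```

With this layout, writer N touches only counters[N].value, and a reader can fetch any slot with a single aligned quadword load.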
John Gillings
Honored Contributor

Re: HPBASIC Thread safe?

Br,

 

> What i don't understand is that both processes sits on same cpu and share same cache.

> I would expect that we have only one copy of the memory area in CPU and there for

> the QUADWORD problem should not exists.

 

What you expect has very little to do with a modern CPU and its behaviour related to shared memory access. You DON'T have one copy of memory, nor do you share caches. You can't assume anything about the order of memory accesses. Unless you're designing the microprocessor, or writing operating system kernels, don't try to think at the hardware level.

 

THE biggest cost in processor speed in a multiprocessing environment is synchronising memory states across multiple processes. Advanced processors like Alpha and Itanium in essence assume that you don't care if an update to memory location X in process A is synchronously visible to process B. This allows many optimisations and significantly increases the speed of the processor. But, it means you need to explicitly create synchronisation points in your instruction stream to make it clear where you care about memory ordering. Look up "Alpha memory barriers" if you're interested in the hardware details.
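
For readers who want to see what an explicit synchronisation point looks like in source code, here is a minimal sketch using C11 release/acquire atomics. It is a portable stand-in, not Alpha assembler and not anything from the original program; the names payload, ready, publish and try_consume are invented for the example, and it assumes a compiler that provides <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;          /* ordinary shared data */
static atomic_bool ready;    /* flag that orders access to payload; starts false */

/* Writer side: store the data first, then set the flag with release
   ordering, so the data store cannot become visible after the flag. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: test the flag with acquire ordering. Only if it is set is
   the payload read, and then the reader sees the value stored before it. */
bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

The release store and the acquire load are the two places where the code says "I care about ordering here"; everything else is left free for the processor and compiler to reorder.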

 

Most memory accesses are in private memory, so ordering and synchronisation don't matter. Where you are accessing common memory, you need to do proper synchronisation. If you're having trouble with $ENQ, then you must be doing it incorrectly. $ENQ works.

 

>So now I have a solution:

>To encapsulate the struct/Record with 1-8 bytes (8 bytes most safe) from start

>and add a STRING FILL = 0 at end to get it to even boundary.

 

This is NOT a solution. It's a way of hiding the potential problems from your particular test program.

 

As I said, don't try to hack around it. Do it right. Shared memory MUST be properly synchronised.

A crucible of informative mistakes
abrsvc
Respected Contributor

Re: HPBASIC Thread safe?

BR,

 

Let's take a different approach. As indicated above, there are methods to ensure that there is no tearing of the data item. You indicate that $ENQ did not work for you. Along with John, I too have used $ENQ without problems. Perhaps we can help fix the problems you are experiencing with $ENQ. Can you post a snippet of code that shows how you are using the $ENQ services? Maybe a simple change to that will resolve the problem you are seeing.

 

Dan

Dennis Handly
Acclaimed Contributor

Re: HPBASIC Thread safe?

>ESPECIALLY when the unit of reference is smaller than a quadword, THEN you need some form of synchronisation

 

The latest C and C++ Standards threading model basically says this type of hardware would be unsupported, unless the software automatically provided this synchronization.  Also, the software couldn't optimize by loading multiple fields.

 

>Remember that the underlying hardware accesses memory in whole quadwords.  ...  the hardware reads the surrounding quadword, masks in the updated longword and writes back ...

 

The hardware?  Or the multiple generated instructions since there isn't just one?

GuentherF
Trusted Contributor

Re: HPBASIC Thread safe?

Does a "map" force natural alignment of elements? If not the counter_array would start at a +5 byte offset. In which case the generated code may fetch the quadword around one element of counter_array, update the long and write the quadword back. That would be on "top" of any hardware cache.

If that's the case then I wonder what happens if counter_array is declared first in counter_common.

/Guenther
GuentherF
Trusted Contributor

Re: HPBASIC Thread safe?

To eliminate a hardware cache related issue I would do the test with run_counter3 again but only with one CPU active.

It sounds odd to me to find a value that is 20+ iterations old. I can't make any sense out of this. Not yet.

/Guenther
John Gillings
Honored Contributor

Re: HPBASIC Thread safe?

re: Dennis,

 

> The hardware?

 

  Yes, the hardware. It can only read and write memory in whole (aligned) quadwords - it's an optimisation for speed, BUT it means if you want to update a smaller unit of memory, the hardware has to read the containing quadword, mask in the smaller field, then write back the quadword. That's why there's more than one operation involved to do such an update, and why they're vulnerable to word tearing if shared memory is updated with inadequate synchronisation. I think some of the later Alphas added instructions to do smaller granularity writes. 
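
The read, mask and write-back sequence can be replayed step by step. The following is a minimal single-threaded simulation in C, not code for any particular Alpha: two imaginary writers, A and B, each update a different longword inside the same quadword, and one unlucky interleaving is spelled out by hand. All function names are made up for the illustration, and it does not matter for the outcome whether the three steps are done by the hardware or by separate generated instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Step 1: fetch the whole containing quadword. */
static uint64_t load_quad(const uint64_t *mem)
{
    return *mem;
}

/* Step 2: merge a longword into the low (index 0) or high (index 1)
   half of a private copy of the quadword. */
static uint64_t mask_in_long(uint64_t quad, int index, uint32_t value)
{
    assert(index == 0 || index == 1);
    int shift = index * 32;
    uint64_t mask = (uint64_t)0xFFFFFFFFu << shift;
    return (quad & ~mask) | ((uint64_t)value << shift);
}

/* Step 3: write the whole quadword back. */
static void store_quad(uint64_t *mem, uint64_t quad)
{
    *mem = quad;
}

/* Replay one bad interleaving and return the final quadword. */
uint64_t torn_update_demo(void)
{
    uint64_t shared = 0;

    uint64_t a = load_quad(&shared);        /* A reads the quadword          */
    uint64_t b = load_quad(&shared);        /* B reads it before A stores    */
    a = mask_in_long(a, 0, 0x11111111u);    /* A updates the low longword    */
    b = mask_in_long(b, 1, 0x22222222u);    /* B updates the high longword   */
    store_quad(&shared, a);                 /* A writes back                 */
    store_quad(&shared, b);                 /* B writes back its stale copy  */

    return shared;                          /* A's longword has been lost    */
}
```

The two writers never touched the same longword, yet the low longword ends up zero again because B wrote back a copy of the quadword taken before A's store.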

 

>The latest C and C++ Standards threading model basically says this type of hardware would be

>unsupported, unless the software automatically provided this synchronization

 

So, they're mandating full memory synchronisation on all instructions just in case some bozo hasn't designed their threaded code correctly? The hardware CAN provide synchronisation, if required, but it's unnecessary and expensive to do it for all operations. That's how Alpha and Itanium work, and it's one of the reasons they're so fast. Please don't penalise my code just because you're too lazy to engineer yours properly!

A crucible of informative mistakes
Dennis Handly
Acclaimed Contributor

Re: HPBASIC Thread safe?

>if you want to update a smaller unit of memory, the hardware has to read the containing quadword, mask in the smaller field, then write back the quadword.

 

You're saying there IS a byte store instruction but it has to be done in non-atomic steps?

 

>they're mandating full memory synchronisation on all instructions just in case some bozo hasn't designed their threaded code correctly?

 

They do require 1, 2 and 4 byte aligned stores that are atomic.  The ordering probably requires a special qualifier like volatile.

 

>That's how Alpha and Itanium work, and it's one of the reasons they're so fast.

 

Yes, but you expect atomic sub-word stores. And now that is required.

H.Becker
Honored Contributor

Re: HPBASIC Thread safe?

In the beginning, Alpha had only load and store instructions for (in VMS terms) quadwords and longwords (which are 64 and 32 bit entities). With EV56 the architecture was extended with load and store instructions for words and bytes (which are 16 and 8 bit entities). Each (load and) store instruction requires aligned data and is atomic.

 

That is, until EV56 (and code generators using the then-new instructions), you didn't expect atomic sub-longword stores.

 

Even with current compilers the code generator usually generates code compatible with all CPUs, including pre-EV56 ones. As a result, a byte store operation is non-atomic. You have to tell the compiler to make use of the "new" instructions. For example, the C compiler accepts an /ARCHITECTURE=EV56 switch.