Re: Physical Representation of Binary Data

Dave Overall · ‎11-28-2007

I'm working in COBOL and want to manipulate data at the bit level by redefining an alphanumeric field as a numeric COMP field. To do this I need to understand how positive and negative integers are stored in binary fields. The reference and user manuals simply describe the format as "Word integer".

What is the physical representation of these word integers? What would be the hexadecimal result of placing -1234 in an S9(4) COMP field? (It doesn't seem to be "FB2E" which is what I expected).

Many thanks for any help on this.

Hein van den Heuvel · ‎11-28-2007

You are probably just confused by little-endian vs. big-endian byte order.

( http://en.wikipedia.org/wiki/Endianness )

Handy documentation for the OpenVMS data types can be found (among other places) in the MACRO language manual:

http://h71000.www7.hp.com/doc/73final/4515/4515pro_012.html#basic_architecture

I don't need to tell you that COBOL, while a fine language in general, is not well suited to bit manipulation, do I?

For a convenient, albeit not maximum-speed, method to diddle with bit fields, you may want to check out the functions LIB$EXTV and LIB$INSV:

http://h71000.www7.hp.com/DOC/82final/5932/5932pro_016.html

and

http://h71000.www7.hp.com/doc/82final/5932/5932pro_032.html

But you can also do an AND and stuff on comps.
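
For readers who want to see what a field extract and insert amount to, here is a minimal sketch in Python (not COBOL, and not the actual LIB$EXTV / LIB$INSV interface). The function names, argument order, and the `signed` flag are assumptions made for illustration only; it just shows the shift-and-mask idea on an unsigned bit pattern.

```python
def extract_field(value: int, pos: int, size: int, signed: bool = True) -> int:
    """Return `size` bits of `value` starting at bit `pos` (bit 0 = least significant)."""
    mask = (1 << size) - 1
    field = (value >> pos) & mask
    # Optionally sign-extend: if the field's top bit is set, treat it as negative.
    if signed and field & (1 << (size - 1)):
        field -= 1 << size
    return field


def insert_field(value: int, pos: int, size: int, field: int) -> int:
    """Return `value` with `size` bits at bit `pos` replaced by the low bits of `field`."""
    mask = ((1 << size) - 1) << pos
    return (value & ~mask) | ((field << pos) & mask)


pattern = 0xFB2E  # 16-bit two's complement pattern for -1234
high_byte_unsigned = extract_field(pattern, 8, 8, signed=False)  # 0xFB
high_byte_signed = extract_field(pattern, 8, 8)                  # -5
low_byte_replaced = insert_field(pattern, 0, 8, 0xFF)            # 0xFBFF
```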

Good luck!
Hein.

Robert Gezelter · ‎11-28-2007

Dave,

I agree with Hein: the most likely source of confusion is little-endian vs. big-endian byte ordering.

In a little-endian representation (used on all OpenVMS platforms), the lowest-order byte is at the lowest address. Thus, in the example of -1234 (0xFB2E), the lowest-order byte (0x2E) will be at the lowest address, followed by 0xFB. In a big-endian representation, the highest-order byte is at the lowest address.
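
A quick way to see the two byte orders side by side is a small Python sketch (Python is used here only as a convenient calculator; it assumes the field is a 16-bit signed word, which is what the "Word integer" description suggests):

```python
import struct

value = -1234

# "<h" = little-endian signed 16-bit, ">h" = big-endian signed 16-bit
little = struct.pack("<h", value)
big = struct.pack(">h", value)

print(little.hex())  # 2efb  (low byte 0x2E first, as on OpenVMS)
print(big.hex())     # fb2e  (high byte 0xFB first)
```

Reading memory byte by byte on a little-endian machine therefore shows 2E FB, even though the value of the word is 0xFB2E.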

I do not have the citation, but there was a nice article a long time ago about this difference in either IEEE Spectrum or IEEE Computer.

Most architectures that are currently extant use two's complement arithmetic. In this scheme, negative numbers are padded with high-order "1" bits and positive numbers are padded with high-order "0" bits.
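
The padding rule can be checked with a minimal Python sketch. The helper names below are made up for illustration; the masking trick is one common way to obtain a two's complement bit pattern of a given width.

```python
def to_twos_complement(value: int, bits: int) -> int:
    """Return the unsigned bit pattern that represents `value` in `bits` bits."""
    return value & ((1 << bits) - 1)


def from_twos_complement(pattern: int, bits: int) -> int:
    """Interpret an unsigned `bits`-wide pattern as a signed two's complement value."""
    sign_bit = 1 << (bits - 1)
    return (pattern ^ sign_bit) - sign_bit


print(f"{to_twos_complement(-1234, 16):04X}")  # FB2E
print(f"{to_twos_complement(-1234, 32):08X}")  # FFFFFB2E  (padded with 1 bits)
print(f"{to_twos_complement(1234, 32):08X}")   # 000004D2  (padded with 0 bits)
print(from_twos_complement(0xFB2E, 16))        # -1234
```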

One of the architecture handbooks is good for a discussion of two's complement arithmetic, as are any of the college-level texts on computer hardware design.

What "unexpected" results are you seeing?

- Bob Gezelter, http://www.rlgsc.com

Hein van den Heuvel · ‎11-28-2007

btw... if you redefined onto a pic 9(4) only because 'that is big enough', then please don't.

Please use pic 9(9) comp or pic s9(9) comp everywhere you can, and make them aligned by using an 01 level or a carefully arranged lower-level field.

To convince yourself as to why, please take the 5 or 10 minutes to study the output from COBOL/LIST/MACHINE for a short program.

hth,
Hein.

John Gillings · ‎11-28-2007

Dave,
What do you need to do at the bit level? There may be easier (and faster) ways.

As Hein suggested, COBOL is great at what COBOL does, but low-level bit manipulation isn't one of those things.

With such a rich range of data types, it's also sometimes difficult to determine exactly what representation COBOL will use. There's a large table in an appendix of the COBOL Reference Manual which maps COBOL PIC clauses to lower-level data types. You may also need to refer to the Architecture Reference Manual for your architecture to work out the binary representation for the data types.

A crucible of informative mistakes

Phil.Howell · ‎11-28-2007

I would write a simple program with a selection of data types, and then run it with /debug.
You can then deposit -1234 and examine the data item; it will be FB2E.
You can even examine/hex or examine/binary to be sure to be sure.
You could also define the item as usage binary, but I don't know if this would be of any benefit.
Phil

Dave Overall · ‎11-29-2007

To Hein, Robert, John and Phil,

Thanks very much for all your responses. It's been a very useful learning experience but after running into parity bit problems I've decided to do this another way.

Thanks

Dave

Hoff · ‎11-29-2007

Parity bit? What parity bit?

Are you working with a serial line, or some sort of external hardware?

Neither VAX, Alpha nor Itanium memory will have any sort of (application-visible) parity bit.

There's certainly a sign bit that can come into play, depending on how the integer value is declared and processed.

Dave Overall · ‎12-04-2007

Hello Hoff,

I tried to follow up on each of the suggestions above but I still don't seem to be able to match the results I'm getting with the results I think I should get. Let me try and provide a more complete description of what I'm trying to do.

I need to quantify the difference between one 32-byte alphanumeric string (A) and another (B). I redefine both 01 level strings with PIC S9(4) COMP OCCURS 16 and subtract each binary field in string (A) from the corresponding binary field in string (B), checking for a non-zero result.
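
The approach described above can be sketched in Python as follows. This is only a model of the COBOL redefinition, assuming each PIC S9(4) COMP element is a 16-bit signed little-endian word; the function name and the sample strings are hypothetical.

```python
import struct


def word_differences(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Compare two 32-byte strings as 16 signed little-endian words.

    Returns (word_index, b_word - a_word) for every word that differs.
    """
    if len(a) != 32 or len(b) != 32:
        raise ValueError("both strings must be exactly 32 bytes long")
    words_a = struct.unpack("<16h", a)
    words_b = struct.unpack("<16h", b)
    return [(i, wb - wa)
            for i, (wa, wb) in enumerate(zip(words_a, words_b))
            if wa != wb]


# Hypothetical sample data: the strings differ only in their second character.
string_a = b"AB" + b" " * 30
string_b = b"AC" + b" " * 30
print(word_differences(string_a, string_b))  # [(0, 256)]
```

Because the second character of each pair is the high-order byte of the word, a one-step change in that character shows up as a difference of 256 rather than 1.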

When the two byte positions are exactly the same, I get zero - as expected. However, in my test case, I have string (A) which contains "RS" in positions 10 & 11 and string (B) which contains "RT" in positions 10 & 11. Originally, I expected a difference of 1 but after reading about little endian binary storage, I realised that VMS will view these strings as "SR" and "TR" in terms of their numerical significance.

According to my ASCII tables "SR" should convert to hex "5352" and "TR" to hex "5452". This should give me the result of hex "0100" or decimal 256. Even if I do the maths in binary: 01010100,01010010 - 01010011,01010010 = 100000000 which is still decimal 256.
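
The arithmetic above can be double-checked with a short Python sketch, again assuming a 16-bit little-endian word. It only confirms the expected value; it says nothing about what the COBOL program actually does.

```python
# Bytes in memory order: "R" at the lower address, then "S" or "T".
rs_little = int.from_bytes(b"RS", "little")
rt_little = int.from_bytes(b"RT", "little")

print(f"{rs_little:04X} {rt_little:04X}")  # 5352 5452
print(rt_little - rs_little)               # 256

# For comparison, the same bytes read as big-endian words differ by 1.
rs_big = int.from_bytes(b"RS", "big")
rt_big = int.from_bytes(b"RT", "big")
print(rt_big - rs_big)                     # 1
```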

The problem is that in my test results, I get decimal 265 as a result of this subtraction.

Then I thought of ASCII parity bits. I tried the subtraction using even parity and got some very large negative numbers. I presumed that these were being truncated at various stages in the calculation and that this must be the explanation for my results, although I've not been able to work out exactly what was going on.

If you're saying that parity bits wouldn't be included in my text strings, I'm even more confused than before.

I probably just shouldn't try to do bit manipulation using COBOL but I feel I should be able to understand what's being stored in these fields.

Dave

Dave Overall · ‎12-04-2007

Sorry!

The "RT"/"RS" are in positions 9 & 10 - not 10 & 11 as previously stated.

Dave

John Gillings · ‎12-04-2007

Dave,

With little endian representation, ASCII strings are read "backwards". The first character in the string is the least significant byte (lowest address). You probably have to play with the DUMP command to see how it works. Simple example:

$ create ascii.txt
here is some ASCII text^Z

$ dump/record ascii.txt

Dump of file DKA100:[JG]ASCII.TXT;1 on 5-DEC-2007 09:05:20.57
File ID (105,3941,0) End of file block 1 / Allocated 141

Record number 1 (00000001), 23 (0017) bytes, RFA(0001,0000,0000)

43534120 656D6F73 20736920 65726568 here is some ASC 000000
747865 74204949 II text......... 000010

With a dump you read the text part on the right from left to right, but the binary part on the left from right to left. (It sounds a bit silly when expressed like that, but that's how it works, and when you get used to it, it makes sense.)
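
To make the right-to-left layout concrete, here is a small Python sketch that rebuilds the hex part of the first dump line from the text. It is an illustration of the layout only, not a reimplementation of DUMP; the function name is made up.

```python
import struct


def dump_hex_line(chunk: bytes) -> str:
    """Format 16 bytes as four little-endian longwords, lowest address on the right."""
    longwords = struct.unpack("<4I", chunk)
    return " ".join(f"{word:08X}" for word in reversed(longwords))


print(dump_hex_line(b"here is some ASC"))
# 43534120 656D6F73 20736920 65726568
```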

So, the text "here" translates to the rightmost 32-bit integer, "65726568", which you need to read right to left. The LEAST significant byte when interpreted as a numeric value is "h". So, if you have 4-byte strings which you want to interpret as INTEGER and sort in the same sequence as the ASCII text, you need to shuffle the bytes around: swap bytes 0 and 3, and swap bytes 1 and 2. So "here" would transform into "68657265", which in memory would then read as ASCII "ereh".
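
A rough Python sketch of the byte shuffle, with a few made-up 4-byte strings, shows why the swap is needed before the integers sort like the text:

```python
def as_little_endian(text: bytes) -> int:
    """Value of the bytes when read as a little-endian integer (first byte least significant)."""
    return int.from_bytes(text, "little")


def as_swapped(text: bytes) -> int:
    """Value after reversing the byte order (swap 0 with 3, 1 with 2)."""
    return int.from_bytes(text[::-1], "little")


print(f"{as_little_endian(b'here'):08X}")  # 65726568
print(f"{as_swapped(b'here'):08X}")        # 68657265

words = [b"here", b"hear", b"help", b"axle"]  # hypothetical sample strings
print(sorted(words, key=as_little_endian))  # [b'axle', b'here', b'help', b'hear']
print(sorted(words, key=as_swapped))        # [b'axle', b'hear', b'help', b'here']
```

Only the swapped interpretation reproduces the ordinary ASCII ordering of the strings.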

Does this clear up anything?

A crucible of informative mistakes

John Gillings · ‎12-04-2007

Sorry, format of the dump messed up. Hopefully this will look better:

$ dump/record ascii.txt

Dump of file DKA100:[JG]ASCII.TXT;1 on 5-DEC-2007 09:05:20.57
File ID (105,3941,0) End of file block 1 / Allocated 141

Record number 1 (00000001), 23 (0017) bytes, RFA(0001,0000,0000)

43534120 656D6F73 20736920 65726568 here is some ASC 000000
                    747865 74204949 II text......... 000010

A crucible of informative mistakes
