data manipulation using

Ratzie · 10-02-2006

I need to change two fields in a file, in the records where:
$if "XG".eqs.F$EXTR(205,2,record)

Then replace WNPGMB01TTC with WNPIMBTVTOC at character position 352 (11 characters long) and at character position 504 (11 characters long).

I have used the following script (thanks to Hein), but it swaps one field with another, whereas now I need to replace one word with another.

So I am trying to wrap my head around it.

Appreciate any help.
Laura

$old = "CIRCUIT.DAT;4"
$new = "CIRCUIT.new;4"
$fdl = "CIRCUIT.FDL;4"
$tmp = "tmp.tmp"
$mod = 0
$num = 0
$
$close/nolog old
$close/nolog tmp
$creat/fdl=sys$input 'tmp
file; allo 20000; exte 5000; record; size 773; form fixed
$open/appe tmp 'tmp
$open/read old 'old
$
$loop:
$read/end=done old record
$num = num + 1
$if "XG".eqs.F$EXTR(205,2,record)
$then
$ mod = mod + 1
$
$ old_key = F$EXTR(00,38,record)
$ new_key = F$EXTR(86,38,record)
$ mid = F$EXTR(38,48,record)
$ rest = F$EXTR(124,999,record)
$ record = new_key + mid + old_key + rest
$
$ if mod.lt.3
$then
$ write sys$output "Record ''num', Old=''old_key', New=''New_key'"
$ sample = num
$endif
$
$endif
$write/symbol tmp record
$goto loop
$
$done:
$write sys$output "''num' records read, ''mod' modified."
$close old
$close tmp
$dump /record=(count=1,start='sample') 'tmp
$conv/stat/fast/sort/fdl='fdl' 'tmp 'new /EXCEPT=circuit.exception
$exit
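For readers without a VMS system at hand, the record surgery in the script above can be modelled offline. This Python sketch is an illustration, not part of the original thread: it assumes each record is a plain string, relies on DCL's F$EXTR(start,length,string) using 0-based offsets, and the name `swap_keys` is hypothetical. It rebuilds an XG record as new_key + mid + old_key + rest, which is why this script swaps fields rather than replacing a word.

```python
def swap_keys(record: str) -> str:
    """Model of the DCL swap: F$EXTR(start, length, s) ~ s[start:start + length]."""
    if record[205:207] != "XG":     # $if "XG".eqs.F$EXTR(205,2,record)
        return record               # non-XG records are written unchanged
    old_key = record[0:38]          # F$EXTR(00,38,record)
    mid = record[38:86]             # F$EXTR(38,48,record)
    new_key = record[86:124]        # F$EXTR(86,38,record)
    rest = record[124:]             # F$EXTR(124,999,record); same tail for records up to 1123 bytes
    return new_key + mid + old_key + rest


if __name__ == "__main__":
    # Hypothetical 773-byte record: three recognisable fields, XG at offset 205.
    rec = "A" * 38 + "M" * 48 + "B" * 38 + "." * (773 - 124)
    rec = rec[:205] + "XG" + rec[207:]
    out = swap_keys(rec)
    print(out[:4], out[86:90], len(out))  # prints: BBBB AAAA 773
```

Because the three pieces before `rest` add up to 124 bytes either way, the record length is preserved, which matters for a fixed-length file.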

Hein van den Heuvel · 10-02-2006

Hello again Laura,

This would be a perfect task for Datatrieve!
Perhaps you have that installed on your system? Perhaps you should?

You can do it with DCL as before.
You could break the record up into chunks and glue them back together, but I would just replace in place.

The fields you need to change now are NOT part of the primary key, so an in-place update is fine. No need for delete + write.

You need something like my earlier reply to:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1059083

UNTESTED:
1) Make a backup first?!
2) Add /SHARE=WRITE to the OPEN if needed.

$open/read/write tmp circuit.dat
$i = 0
$u = 0
$loop:
$read/end=done tmp record
$i = i + 1 ! Count record (after a successful read, so EOF is not counted)
$if "XG".nes.F$EXTR(205,2,record) then goto loop
$
$! Found XG, check first field to replace.
$
$prim = F$EXTR(0,38,record)
$WNPGMB01TTC = F$EXTR(352,11,record)
$if WNPGMB01TTC .NES. "WNPGMB01TTC"
$then
$ write sys$output "record ''i', primary=''prim' XG + X=''WNPGMB01TTC'"
$ goto loop
$endif
$
$! First field to replace OK, check second.
$
$WNPGMB01TTC = F$EXTR(504,11,record)
$if WNPGMB01TTC .NES. "WNPGMB01TTC"
$then
$ write sys$output "record ''i', primary=''prim' XG + Y=''WNPGMB01TTC'"
$ goto loop
$
$! Found XG and two fields were ok. Update!
$
$u = u + 1
$record[352,11] := "WNPIMBTVTOC"
$record[504,11] := "WNPIMBTVTOC"
$write/update/symbol tmp record
$goto loop
$
$done:
$write sys$output "''i' records read, ''u' updated."
$close tmp

Cheers.
Hein
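The in-place update described above can be modelled the same way. This Python sketch is hypothetical (`update_record` and `process` are illustrative names, and records are assumed to be fixed-length strings long enough to reach offset 515); it checks for XG at offset 205, requires both 11-character fields at 352 and 504 to hold WNPGMB01TTC, and then overlays WNPIMBTVTOC, which is the effect of `$record[352,11] := "WNPIMBTVTOC"` in DCL. The counters mirror `i` and `u` in the script.

```python
from typing import List, Tuple

OLD = "WNPGMB01TTC"
NEW = "WNPIMBTVTOC"
FIELD_OFFSETS = (352, 504)  # 0-based, as with F$EXTR(352,11,record)
FIELD_LEN = 11


def update_record(record: str) -> Tuple[str, bool]:
    """Return (possibly updated record, whether it was updated)."""
    if record[205:207] != "XG":
        return record, False            # not an XG record: skip
    for off in FIELD_OFFSETS:
        if record[off:off + FIELD_LEN] != OLD:
            return record, False        # the DCL script reports these and skips them
    for off in FIELD_OFFSETS:
        # Same effect as $record[off,11] := "WNPIMBTVTOC"; length is unchanged.
        record = record[:off] + NEW + record[off + FIELD_LEN:]
    return record, True


def process(records: List[str]) -> Tuple[List[str], int, int]:
    """Mimic the i (records read) and u (updates) counters of the DCL loop."""
    read = updated = 0
    result = []
    for rec in records:
        read += 1                       # counted only after a successful read
        new_rec, changed = update_record(rec)
        if changed:
            updated += 1
        result.append(new_rec)
    return result, read, updated
```

Since OLD and NEW are both 11 characters, the overlay never changes the record length, which is what makes a plain in-place update (no delete + write) possible.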

Phil.Howell · 10-02-2006

Another option for data manipulation is to use SORT with specification files.
This is also a lot faster than DCL with large data sets.
See this link for examples:
http://h71000.www7.hp.com/doc/73final/6489/6489pro_026.html
Phil

Hein van den Heuvel · 10-02-2006

I agree that, in general, bulk data processing is best done with tools like SORT, Perl, or a dedicated COBOL program (apply deferred write), and so on.

But in this case, assuming we are about to update fewer than 10% of the records, DCL will actually do about as well as anything else, albeit in a clumsy manner. DTR would be the most elegant and clear. Actually, even SQL on a PC through an ODBC tool would probably be fine.

Why would DCL do well (according to my expectations)? Well, in another topic Laura showed this is an indexed file. All 'normal' tools read indexed files a bucket at a time through RMS: DTR will, COBOL will, DCL will, SORT will.
DCL will only have one buffer, but that's all it needs for read next, read next, ...
When an update candidate is found, DCL tells RMS to update it in this one buffer, causing one write IO. That's about as well as anything can do (barring deferred write and some luck that a subsequent update happens to hit the same bucket).

Hmm, I must double-check whether unchanged alternate keys cause IOs; I don't think so.

Yeah, COBOL, BASIC, C, or anything else really will burn fewer user CPU cycles figuring out what to do next, but that is hardly going to matter on top of the resource usage of the RMS $GETs and $UPDATEs.

The only thing that could speed this up is a tool that can read more than one bucket per IO. I know of some: freeware DIX, my Tune_check, and commercial tools like "EGH SELECT" and Syncsort. SELECT (aka Vselect), and probably Syncsort too, could readily generate a file with the records to be updated, quicker than anything else out there.

For this case, I suspect it is more important to just get the job done.
If these kinds of questions keep coming back, though, then investing (learning, acquisition) in a tool like Datatrieve, DIX, SELECT, or Syncsort would quickly earn its keep.

Regards,
Hein
HvdH Performance Consulting.
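The IO argument in the post above can be expressed as a rough cost model. This Python sketch is a simplified reading of that argument, not a measured formula: it assumes every bucket is read once, each read IO fetches `buckets_per_read` buckets (1 for DCL), and every update costs exactly one write IO; it ignores caching, deferred write, bucket splits, and alternate keys. The function name and the example numbers are hypothetical.

```python
import math


def estimated_ios(buckets: int, updates: int, buckets_per_read: int = 1) -> int:
    """Rough IO count for a sequential scan plus in-place updates.

    Simplifying assumptions: every bucket is read once, each read IO fetches
    `buckets_per_read` buckets, and every update costs exactly one write IO.
    """
    read_ios = math.ceil(buckets / buckets_per_read)
    write_ios = updates
    return read_ios + write_ios


if __name__ == "__main__":
    # Hypothetical file: 100,000 data buckets, 10,000 records to update.
    print(estimated_ios(100_000, 10_000))      # one bucket per read IO, like DCL
    print(estimated_ios(100_000, 10_000, 8))   # a multi-bucket reader
```

Under these assumptions the scan dominates when only a small fraction of records is updated, so the only large lever left is reading more than one bucket per IO.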

Ratzie · 10-03-2006

Hi Hein, thanks again for all the useful information.

I get an error when I run the script.

%DCL-E-INVIFNEST, invalid IF-THEN-ELSE nesting structure or data inconsistency

I have no idea why; I believe it is related to the IF statement...

Steven Schweda · 10-03-2006

Perhaps a "$endif" is missing after the second "$ goto loop"?

Hein van den Heuvel · 10-03-2006

Well, I said DCL would be more or less on par for performance, but it is not.
Not on my RX2620: it burns way too much CPU.
Things like a logical name translation for each and every "$READ file record" do not help.

Here are some experiment numbers:

GB file, 10.1M records @300 bytes,
10K selected records, 7% buckets split
RX2620, 73GB 15Krpm drive (LDdriver)

Tool      IO        CPU (mm:ss)  Elapsed (mm:ss)
------------------------------------------------
DCL       400,000   14:12        16:38
SEARCH    400,000   03:09        04:55
SELECT     74,400   00:25        01:46
SELECT/A  108,000   00:33        01:54
Tune*      72,000   00:20        01:00

More fragmentation (18% of buckets split)
(more, smaller clumps/batches of records)

Tool      IO        CPU (mm:ss)  Elapsed (mm:ss)
------------------------------------------------
DCL       427,000   14:41        17:57
DIX*      421,000   04:45        07:52
SEARCH    421,000   03:34        06:35
SELECT     98,000   00:23        02:15
SELECT/A  114,000   00:35        01:59
Tune*     109,000   00:45        01:23

*: reads all data and counts, but does not select.

DIX /FAST has great potential... but did not work on my test file.
http://oooovms.dyndns.org/dix/

Load data...
Initial 10.0M record load (presorted), and
additional 100K record add (random clumps)

Tool      IO        CPU (mm:ss)  Elapsed (mm:ss)
------------------------------------------------
Convert    61,000   00:59        04:16
Frag 7%   147,000   00:20        05:13
Frag 18%  357,000   00:45        13:00

Cheers,
Hein.
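To put the first benchmark table (7% of buckets split) in relative terms, its columns can be converted and compared. This Python sketch assumes the CPU and elapsed columns are minutes:seconds; the figures are copied from that table, and `to_seconds`, `ratio`, and `RUN_7PCT` are illustrative names.

```python
# Copied from the first table above (7% of buckets split).
# Assumption: CPU and elapsed columns are minutes:seconds.
RUN_7PCT = {
    "DCL": (400_000, "14:12", "16:38"),
    "SEARCH": (400_000, "03:09", "04:55"),
    "SELECT": (74_400, "00:25", "01:46"),
    "SELECT/A": (108_000, "00:33", "01:54"),
    "Tune": (72_000, "00:20", "01:00"),  # reads and counts only, no selection
}
COLUMNS = {"io": 0, "cpu": 1, "elapsed": 2}


def to_seconds(mmss: str) -> int:
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def ratio(table, slow: str, fast: str, column: str) -> float:
    """How many times more of `column` the `slow` tool used than the `fast` one."""
    idx = COLUMNS[column]
    a, b = table[slow][idx], table[fast][idx]
    if column != "io":
        a, b = to_seconds(a), to_seconds(b)
    return a / b


if __name__ == "__main__":
    for col in ("io", "cpu", "elapsed"):
        print(col, round(ratio(RUN_7PCT, "DCL", "SELECT", col), 1))
```

With that assumption, DCL used about 5.4 times the IOs, 34 times the CPU (852 s vs 25 s), and 9.4 times the elapsed time of SELECT, which is consistent with CPU, not IO, being DCL's main problem here.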

Ratzie · 02-08-2007

Thanks
