
data manipulation using

 
Ratzie
Super Advisor


I need to change two fields in a file, for records where this test is true:
$if "XG".eqs.F$EXTR(205,2,record)

For those records, replace WNPGMB01TTC with WNPIMBTVTOC at character position 352 (length 11) and at character position 504 (length 11).

I have been using the following script, thanks to Hein, but this time the task is not swapping one field with another; it is replacing one word with another.

So I am trying to wrap my head around it.

Appreciate any help.
Laura


$old = "CIRCUIT.DAT;4"
$new = "CIRCUIT.new;4"
$fdl = "CIRCUIT.FDL;4"
$tmp = "tmp.tmp"
$mod = 0
$num = 0
$
$close/nolog old
$close/nolog tmp
$creat/fdl=sys$input 'tmp
file; allo 20000; exte 5000; record; size 773; form fixed
$open/appe tmp 'tmp
$open/read old 'old
$
$loop:
$read/end=done old record
$num = num + 1
$if "XG".eqs.F$EXTR(205,2,record)
$then
$ mod = mod + 1
$
$ old_key = F$EXTR(00,38,record)
$ new_key = F$EXTR(86,38,record)
$ mid = F$EXTR(38,48,record)
$ rest = F$EXTR(124,999,record)
$ record = new_key + mid + old_key + rest
$
$ if mod.lt.3
$then
$ write sys$output "Record ''num', Old=''old_key', New=''New_key'"
$ sample = num
$endif
$
$endif
$write/symbol tmp record
$goto loop
$
$done:
$write sys$output "''num' records read, ''mod' modified."
$close old
$close tmp
$dump /record=(count=1,start='sample') 'tmp
$conv/stat/fast/sort/fdl='fdl' 'tmp 'new /EXCEPT=circuit.exception
$exit
Hein van den Heuvel
Honored Contributor
Solution

Re: data manipulation using

Hello again Laura,

This would be a perfect task for Datatrieve!
Perhaps you have that installed on your system? Perhaps you should?

You can do it with DCL as before.
And you could break up the record in chunks and glue them together. But I would just replace, in place.

The fields you need to change now are NOT part of the primary key, so an in-place update is fine. No need for delete + write.

You need something like my early reply to:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1059083


UNTESTED:
1) make a backup first?!
2) add /SHARE=WRITE to first open if needed.


$open/read/write tmp circuit.dat
$i = 0
$u = 0
$loop:
$read/end=done tmp record
$i = i + 1 ! Count record (only after a successful read)
$if "XG".nes.F$EXTR(205,2,record) then goto loop
$
$! Found XG, check first field to replace.
$
$prim = F$EXTR(0,38,record)
$WNPGMB01TTC = F$EXTR(352,11,record)
$if WNPGMB01TTC .NES. "WNPGMB01TTC"
$then
$ write sys$output "record ''i', primary=''prim' XG + X=''WNPGMB01TTC'"
$ goto loop
$endif
$
$! First field to replace OK, check second.
$
$WNPGMB01TTC = F$EXTR(504,11,record)
$if WNPGMB01TTC .NES. "WNPGMB01TTC"
$then
$ write sys$output "record ''i', primary=''prim' XG + Y=''WNPGMB01TTC'"
$ goto loop
$
$! Found XG and two fields were ok. Update!
$
$u = u + 1
$record[352,11] := "WNPIMBTVTOC"
$record[504,11] := "WNPIMBTVTOC"
$write/update/symbol tmp record
$goto loop
$
$done:
$write sys$output "''i' records read, ''u' updates"
$close tmp
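
The two precautions listed before the script could be sketched roughly as follows. This is an illustration under assumptions, not part of the original reply: CIRCUIT.BAK is a made-up name for the safety copy, and COPY may fail if another process has the file locked at that moment.

```dcl
$! Sketch only: CIRCUIT.BAK is a hypothetical name for the safety copy.
$ copy circuit.dat circuit.bak
$!
$! /SHARE=WRITE allows other processes read and write access to the
$! file while this procedure has it open for update.
$ open/read/write/share=write tmp circuit.dat
$!
$! ... the read / check / update loop would go here ...
$!
$ close tmp
```

If nobody else has the file open while the update runs, the /SHARE=WRITE qualifier can be left off, as in the script as posted.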

Cheers.
Hein
Phil.Howell
Honored Contributor

Re: data manipulation using

Another option for data manipulation is to use sort - with specification files.
This is also a lot faster than dcl with large data sets.
see this link for examples
http://h71000.www7.hp.com/doc/73final/6489/6489pro_026.html
Phil
Hein van den Heuvel
Honored Contributor

Re: data manipulation using



I agree that, in general, bulk data processing is best done with tools like SORT or PERL or a dedicated COBOL program (apply deferred write) and so on.

But in this case, assuming we are about to update less than 10% of the records, DCL will actually do about as well as anything else, albeit in a clumsy manner. DTR would be the most elegant and clear. Actually, even SQL on a PC through an ODBC tool would probably be fine also.

Why would DCL do well (according to my expectations)? Well, in another topic Laura showed this was an indexed file. All 'normal' tools will read indexed files a bucket at a time through RMS. DTR will, Cobol will, DCL will, Sort will.
DCL will only have one buffer, but that's all it needs for read next, read next,...
When an update candidate is found, DCL will tell RMS to update it in this one buffer, causing one write IO. That's about as well as anything can do (barring deferred write and some luck that a subsequent update happens to the same bucket. Hmm, must double check whether unchanged alternate keys cause IOs; I don't think so).

Yeah, Cobol, Basic, C or anything else really will burn fewer user CPU cycles figuring out what to do next, but that is hardly going to have an impact on top of the resource usage of the RMS $GETs and $UPDATEs.

The only thing that could speed this up is a tool that can read more than one bucket per IO. I know of some: the freeware DIX, my Tune_check, and commercial tools like "EGH SELECT" and Syncsort. SELECT (aka Vselect), and probably Syncsort also, could readily generate a file with the records to be updated, quicker than anything else out there.

For this case, I suspect it is more critical to just get the job done.
If these kinds of questions come back, though, then an investment (learning, acquisition) in a tool like Datatrieve, DIX, SELECT or Syncsort would quickly earn its keep.

Regards,
Hein
HvdH Performance Consulting.
Ratzie
Super Advisor

Re: data manipulation using

Hi Hein, thanks again for all the useful information.

I get an error when I run the script.

%DCL-E-INVIFNEST, invalid IF-THEN-ELSE nesting structure or data inconsistency

I have no idea, I believe it is part of the if statement...
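
For reference, in the script as posted the second field check opens a $then block that is never closed with $endif, which is one plausible source of the INVIFNEST error. A minimal sketch of the same check, update and loop pattern with every $then closed is shown here. It touches no files: the three short records, the "/" separator and the offsets 10 and 12 are made up for illustration and are not the thread's 205, 352 and 504.

```dcl
$! Sketch only: made-up short records stand in for the real file.
$ recs = "0123456789XGWNPGMB01TTC-tail/0123456789ABWNPGMB01TTC-tail/0123456789XGSOMETHINGELSE"
$ n = 0
$ i = 0
$ u = 0
$loop:
$ record = F$ELEMENT(n, "/", recs)      ! returns "/" past the last element
$ if record .eqs. "/" then goto done
$ n = n + 1
$ i = i + 1                             ! count the record just fetched
$ if "XG" .nes. F$EXTRACT(10,2,record) then goto loop
$!
$ field = F$EXTRACT(12,11,record)
$ if field .nes. "WNPGMB01TTC"
$ then
$   write sys$output "record ''i': XG, but field=''field'"
$   goto loop
$ endif                                 ! closes the block opened by $then
$!
$ u = u + 1
$ record[12,11] := "WNPIMBTVTOC"        ! overlay 11 characters at offset 12
$ write sys$output record
$ goto loop
$!
$done:
$ write sys$output "''i' records read, ''u' updated"
```

With these made-up records, the first should be updated, the second skipped because it is not an XG record, and the third reported because its field holds something other than WNPGMB01TTC.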
Steven Schweda
Honored Contributor

Re: data manipulation using

Perhaps a "$endif" after the second
"$ goto loop"?
Hein van den Heuvel
Honored Contributor

Re: data manipulation using


Well, I said DCL would be more or less on par for performance, but it is not.
Not on my RX2620. It burns way too much CPU.
Things like a logical name translation for each and every "$READ file record" do not help.

Here are some experiment numbers:

GB file, 10.1M records @300 bytes,
10K selected records, 7% buckets split
RX2620, 73GB 15Krpm drive (LDdriver)

Tool        IO       CPU    Elapsed
-----------------------------------
DCL         400,000  14:12  16:38
SEARCH      400,000  03:09  04:55
SELECT       74,400  00:25  01:46
SELECT/A    108,000  00:33  01:54
Tune*        72,000  00:20  01:00


More fragmentation (18% buckets split)
(more, smaller clumps/batches of records)


Tool        IO       CPU    Elapsed
-----------------------------------
DCL         427,000  14:41  17:57
DIX*        421,000  04:45  07:52
SEARCH      421,000  03:34  06:35
SELECT       98,000  00:23  02:15
SELECT/A    114,000  00:35  01:59
Tune*       109,000  00:45  01:23

*: reads all data and counts, but does not select.

DIX /FAST has great potential... but did not work on my test file.
http://oooovms.dyndns.org/dix/

Load data...
Initial 10.0M record load (presorted), and
additional 100K record add (random clumps)

Tool        IO       CPU    Elapsed
-----------------------------------
Convert      61,000  00:59  04:16
Frag 7%     147,000  00:20  05:13
Frag 18%    357,000  00:45  13:00

Cheers,
Hein.
Ratzie
Super Advisor

Re: data manipulation using

Thanks