07-15-2004 09:46 AM
splitting up files with unknown record type
I am wondering what options there are to split a file into smaller parts.
We have a script that collects files from different locations (nodes/disks) and puts them all together in a single file, using APPEND. Since that does not work with the original file organisation, the record type is set to unknown or unformatted; the append will then work. Usually a maximum file size is configured in the script, but sometimes this is forgotten and the result is a file too large to handle.
So my question is: how to split it up again if the organisation is unformatted.
(I will get back on the exact file attributes. I do not have the info at hand right now)
07-15-2004 04:01 PM
Re: splitting up files with unknown record type
I'm not sure I understand your question. Obviously you must have some criteria that you use to decide where the file gets split. Typically, I'd expect the split point would be at a record boundary. But if the file doesn't have record structure, how can that work?
Depending on the contents of the file, you might be able to use SET FILE to change attributes to some "standard" format. You can then have a program read the file, interpret the data and split it up any way you like. Choosing the "best" format depends on your data.
Please post the exact file attributes of the files you're dealing with, and describe how you know where to split the data. Anything is possible!
07-15-2004 05:22 PM
Re: splitting up files with unknown record type
I'm a little surprised that you entered this topic with half a story: no exact commands, no numbers for what is 'too large to handle', no exact error message showing why a normal append did not work, no indication of what in the individual files might set them apart.
For further help, show us the head and tail of a typical file, the DIR/FULL output for an aggregate file, and a DUMP of a block or two in an attached text file. Stuff like that.
Also, VMS itself has no notion of 'too large to handle'. So please help us understand why your application has that notion.
When those files are not too large, how do you use them? TYPE/PAGE, edit, an application only?
VMS has no native 'split' command, but it is trivial to write one in DCL or Perl to count records and/or bytes and hunt for record endings. If the file has no structure, then I'd grab Perl with 'binmode'. Or I'd change the file attribute to fixed-512 and use DCL reads to read 512-byte chunks and look for structure within those chunks.
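A minimal sketch of such a home-grown split in portable C (not VMS-specific, and not anything from the original script): it reads the input in 512-byte blocks and starts a new output part at the first LF seen once a byte budget has been reached. The function name split_at_lf, the part naming scheme (prefix_1, prefix_2, ...) and the max_bytes parameter are assumptions made for illustration.

```c
#include <stdio.h>

/* Split in_path into parts named <out_prefix>_1, <out_prefix>_2, ...
   The input is read in 512-byte blocks. A part is closed at the first
   LF found once it holds at least max_bytes bytes, so a part can exceed
   max_bytes by up to one record.
   Returns the number of parts written, or -1 on error. */
int split_at_lf(const char *in_path, const char *out_prefix, long max_bytes)
{
    unsigned char block[512];
    char name[256];
    FILE *in = fopen(in_path, "rb");   /* "rb": no newline translation */
    FILE *out = NULL;
    long written = 0;
    int parts = 0;
    size_t n, i;

    if (!in) return -1;
    while ((n = fread(block, 1, sizeof block, in)) > 0) {
        for (i = 0; i < n; i++) {
            if (!out) {                /* open the next part only when there is data */
                snprintf(name, sizeof name, "%s_%d", out_prefix, ++parts);
                out = fopen(name, "wb");
                if (!out) { fclose(in); return -1; }
                written = 0;
            }
            if (fputc(block[i], out) == EOF) {
                fclose(out);
                fclose(in);
                return -1;
            }
            written++;
            if (block[i] == '\n' && written >= max_bytes) {
                fclose(out);           /* record boundary at or past the budget */
                out = NULL;
            }
        }
    }
    if (out) fclose(out);
    fclose(in);
    return parts;
}
```

This only makes sense if LF really marks the end of a record in the data. For binary content the boundary test would have to be replaced by something that understands the encoding.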
Cheers,
Hein.
07-15-2004 06:15 PM
Re: splitting up files with unknown record type
You're both right. However, I entered it when I was at home. I remembered somebody asking me this question, but I did not have a chance to look into it, and I did not want to forget to enter the topic or find a solution. I think I should have sent an e-mail to my e-mail address at work.
Anyway, since I have entered the topic, we might as well continue.
More info:
1) This is how the script does the append:
$ rfm=F$FILE_ATTRIBUTES(in_file, "RFM")
$ SET FILE/ATTRIBUTE=(RFM:UDF) 'in_file'
$ IF F$SEARCH(outfile).NES."" THEN -
$ SET FILE/ATTRIBUTE=(RFM:UDF) 'outfile'
$ APPEND /LOG /NEW_VERSION 'in_file' 'outfile'
$ IF $STATUS
$ THEN
$ DELETE /LOG 'in_file'
$ ENDIF
$ SET FILE/ATTRIBUTE=(RFM:'rfm') 'outfile'
2) File attributes of the input and output files:
File organization: Sequential
Shelved state: Online
Caching attribute: Writethrough
File attributes: Allocation: 0, Extend: 0, Global buffer count: 0, No version limit, Contiguous best try
Record format: Stream_LF, maximum 0 bytes, longest 0 bytes
Record attributes: Carriage return carriage control
RMS attributes: None
Journaling enabled: None
File protection: System:RWED, Owner:RWED, Group:RE, World:
Access Cntrl List: None
Client attributes: None
3) "Too large to handle" is in terms of the application that has to deal with the file. If the file is too large, the application runs out of memory.
Of course I should have gathered all the info before entering this topic. It is clear to me now that without knowing how the file is built up, it is impossible to split.
The data in the file is ASN.1 encoded, so I'll probably have to write something to split the file. It can probably not be done with some simple scripting.
I am always interested in DCL examples for splitting files (or anything else).
Cheers,
Peter
07-15-2004 06:35 PM
Re: splitting up files with unknown record type
I don't understand the scope of your application, but it seems you are creating your own library of files. Have you thought of using a library instead of APPEND?
Look at this example:
$ LIBR/CREA/TEXT infile.TLB
$ LIBR/INS infile.TLB outfile1
$ LIBR/INS infile.TLB outfile2
Now infile.TLB contains two files.
You can split (extract) them by typing:
$ LIBR/EXTR=outfile1 infile.TLB
This is only an example and perhaps it can't help you, but it might be an idea worth trying.
Antonio Vigliotti
07-15-2004 06:44 PM
Re: splitting up files with unknown record type
It is not a library.
The process that creates the files has certain settings that have influence on the file size. The settings are either a number of ASN.1 encoded records in the file or the time the file is allowed to be open. In either case the file is closed and a new file is created.
There is more than one process doing that. All the files created by those processes are collected from different nodes/disks. The script that does the collecting can be configured to append files (to prevent, e.g., the disk index from running full).
After that the files are transferred to a system that processes the data in it.
Peter
07-15-2004 07:45 PM
Re: splitting up files with unknown record type
Do you need to read a single file that contains the included (appended) files?
If not, a library works fine for you.
Antonio Vigliotti
07-15-2004 09:57 PM
Re: splitting up files with unknown record type
If the record formats of the files are not the same, you get a corrupted file from the APPEND command, because SET FILE/ATTRIBUTES does not change the file structure; only the file attributes are changed.
I made a small test:
I created a small sequential file with VARIABLE record format:
$ CREATE A.TMP
aaaaaaaaaaaa
bbbbbbbbbbbb
Then I created another one with the CONVERT utility:
$ CONVERT/FDL=STREAM.FDL A.TMP B.TMP
Content of STREAM.FDL:
IDENT "16-JUL-2004 09:45:19 OpenVMS FDL Editor"
SYSTEM
SOURCE "OpenVMS"
FILE
ALLOCATION 0
BEST_TRY_CONTIGUOUS yes
EXTENSION 0
ORGANIZATION sequential
RECORD
BLOCK_SPAN yes
CARRIAGE_CONTROL carriage_return
FORMAT stream
SIZE 0
After that I appended the files with your script, so that in_file="B.TMP" and outfile="A.TMP".
The resulting file was corrupted, with the content:
aaaaaaaaaaaa
bbbbbbbbbbbbaaaaaaaaaaaa
bbbbbbbbbbbb
So I think that you must modify the append script! First you must find out which organisation and record format is needed after the append, then create an FDL for it and convert all files with this FDL before appending.
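To illustrate why relabelling the attributes cannot make the two layouts compatible, here is a small sketch in C that builds the on-disk bytes of one record in each format. It assumes the usual layout of a VARIABLE sequential file (a 2-byte little-endian length in front of each record, padded to an even byte boundary) and uses STREAM_LF (data followed by LF) as the stream example. The function names encode_var and encode_stream_lf are made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Encode one record the way a VARIABLE-format sequential file stores it:
   a 2-byte little-endian length, the data, and one pad byte if the
   length is odd. Returns the number of bytes written to out. */
size_t encode_var(const char *rec, unsigned char *out)
{
    size_t len = strlen(rec);
    out[0] = (unsigned char)(len & 0xFF);
    out[1] = (unsigned char)((len >> 8) & 0xFF);
    memcpy(out + 2, rec, len);
    if (len % 2) out[2 + len] = 0;     /* pad to an even byte boundary */
    return 2 + len + (len % 2);
}

/* Encode one record as STREAM_LF: the data followed by a single LF.
   Returns the number of bytes written to out. */
size_t encode_stream_lf(const char *rec, unsigned char *out)
{
    size_t len = strlen(rec);
    memcpy(out, rec, len);
    out[len] = '\n';
    return len + 1;
}
```

The same 12-character record takes 14 bytes in the first form and 13 in the second, and only the first carries its length in the data. Gluing the two byte streams together under one record format therefore leaves part of the file misread, whichever format is set afterwards.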
Bojan Nemec
07-15-2004 10:11 PM
Re: splitting up files with unknown record type
I think I cannot split the file with an existing VMS command, nor with a DCL script.
I think I will write a little program that understands ASN.1 and can find the end of an ASN.1 record.
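A minimal sketch of the boundary-finding part in C, under the assumption (not confirmed anywhere in this thread) that the file is a plain concatenation of BER/DER-encoded elements with definite lengths. The function name ber_element_size is hypothetical. It only decodes the identifier and length octets, which is enough to know where one top-level element ends and the next begins.

```c
#include <stddef.h>

/* Return the total size in bytes (identifier + length + contents) of the
   BER element starting at buf. Returns 0 if the element is not complete
   within len bytes, uses the indefinite length form, or has a length
   that does not fit in size_t. */
size_t ber_element_size(const unsigned char *buf, size_t len)
{
    size_t pos = 0, content = 0;
    unsigned nlen, i;

    if (len < 2) return 0;
    /* Identifier: low 5 bits all set means a multi-byte tag number follows. */
    if ((buf[pos++] & 0x1F) == 0x1F) {
        while (pos < len && (buf[pos] & 0x80)) pos++;   /* continuation bytes */
        if (pos >= len) return 0;
        pos++;                                          /* last tag byte */
    }
    if (pos >= len) return 0;
    if (!(buf[pos] & 0x80)) {
        content = buf[pos++];                           /* short form: 0..127 */
    } else {
        nlen = buf[pos++] & 0x7F;                       /* long form: number of length bytes */
        if (nlen == 0 || nlen > sizeof(size_t)) return 0;   /* indefinite or too big */
        if (len - pos < nlen) return 0;
        for (i = 0; i < nlen; i++)
            content = (content << 8) | buf[pos++];
    }
    if (content > len - pos) return 0;                  /* contents not fully in buffer */
    return pos + content;
}
```

Calling it repeatedly, each time advancing by the returned size, gives the offsets at which the file can be cut without ending up with half a record. If the data uses the indefinite length form, a real decoder is needed instead.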
07-15-2004 10:20 PM
Re: splitting up files with unknown record type
You posted that the input file and output file have the same format (stream_lf), so why do you set them to UDF before the append? Your target file still remains UDF, not stream_lf. I guess that if you set your target file to stream_lf you can read and split it (using an appropriate DCL procedure).
Antonio Vigliotti
07-15-2004 10:22 PM
Re: splitting up files with unknown record type
I don't think so because he said it was stream_lf.
You can ftp the file to unix and split it over there.
You could use DCL to split it (but create the output files with an FDL for stream_lf, because the default is VFC):
$ i=-1
$ nam=0
$ open/read inp 'p1'
$r:
$ i=i+1
$ read/end=e inp rec
$ j=(i/10)*10
$ if j .eq. i
$ then
$ if f$tr("outp") .nes. "" then close outp
$ nam=nam+1
$ open/write outp 'p2'_'nam'
$ endif
$ write outp rec
$ goto r
$e:
$ close inp
$ close outp
Wim
07-16-2004 01:32 AM
Re: splitting up files with unknown record type
Sorry, I missed that all files are STREAM_LF. I also think that you must write a program.
If your files have records that are too long (more than 32,767 bytes), you will have problems reading them. I have the same problems reading XML files that are not formatted for humans (no LF).
Here are some pieces of such a program in C:
#include <rms.h>
#include <starlet.h>
#include <string.h>
.
.
.
struct FAB infab;
struct RAB inrab;
int stat;
char buffer[1024];
char * filename;
.
.
.
.
infab = cc$rms_fab;
infab.fab$l_fna = filename;
infab.fab$b_fns = strlen(filename);
infab.fab$b_fac = FAB$M_BIO|FAB$M_GET;
inrab = cc$rms_rab;
inrab.rab$l_fab = &infab;
inrab.rab$l_bkt = 0;
inrab.rab$l_ubf = buffer;
inrab.rab$w_usz = 1024;
stat = sys$open (&infab);
if (!(stat & 1)) sys$exit (stat);
stat = sys$connect (&inrab);
if (!(stat & 1)) sys$exit (stat);
for (;;)
{
    stat = sys$read (&inrab);
    if (!(stat & 1))
    {
        if (stat == RMS$_EOF)
        {
            /* End of file */
            break;
        } else {
            sys$exit (stat);
        }
    } else {
        /*
        Process data in buffer.
        The length of the buffer is in inrab.rab$w_rsz
        For example if you want to copy the buffer to another buffer
        memcpy (newbuffer , buffer , inrab.rab$w_rsz);
        */
    }
}
For more on programming with RMS look at:
http://h71000.www7.hp.com/doc/731FINAL/4523/4523PRO.HTML
Bojan Nemec
07-16-2004 01:44 AM
Re: splitting up files with unknown record type
and search for split. I am unable to get it over here but maybe it is what you are looking for.
Wim
07-16-2004 02:36 AM
Re: splitting up files with unknown record type
" But since that is not working with the original file organisation, the record type is set to unknown or unformatted. "
Are you sure that did not work, or was someone confused by the warning messages?
Witness log below.
The really scary thing to mix/append through this 'udf' lie is stream and variable length (the VMS default). That is because variable-length record files have metadata in the file (the record length), and you turned that into user data. Mixing various stream formats, and the file 'print attributes', is sort of OK, as you can sort it out later (it is a little tricky to decide whether to terminate on LF, CR or CR+LF, but it can be sorted out).
In conclusion... you might not have a problem on the append side.
Next reply on the splitting.
Hein.
$ cre/fdl=nl: tmp_var.tmp
$ cre/fdl=tt: tmp_lf.tmp
record; format stream_lf;
$ appen/log tt: tmp_var.tmp
aap
noot
%APPEND-S-APPENDED, TNA78: appended to U$1:[HEIN]TMP_VAR.TMP;1 (2 records)
$ appen/log tt: tmp_lf.tmp
mies
teun
%APPEND-S-APPENDED, TNA78: appended to U$1:[HEIN]TMP_LF.TMP;1 (2 records)
$ cre/fdl=nl: tmp.tmp
$ append tmp_var.tmp tmp.tmp/log
%APPEND-S-APPENDED, U$1:[HEIN]TMP_VAR.TMP;1 appended to U$1:[HEIN]TMP.TMP;11 (2 records)
$ append tmp_lf.tmp tmp.tmp/log
%APPEND-W-INCOMPAT, U$1:[HEIN]TMP_LF.TMP;1 (input) and U$1:[HEIN]TMP.TMP;11 (output) have incompatible attributes
%APPEND-S-APPENDED, U$1:[HEIN]TMP_LF.TMP;1 appended to U$1:[HEIN]TMP.TMP;11 (2 records)
$ type tmp.tmp
aap
noot
mies
teun
So there was this "%APPEND-W-INCOMPAT".
But it did work as expected!
07-16-2004 03:09 AM
Re: splitting up files with unknown record type
You can also use
$ convert/append inputf outputf
That will read inputf and convert the records to the format of outputf and then append them.
Wim
07-16-2004 03:26 AM
Re: splitting up files with unknown record type
From what I gather so far, in the end you just have stream_lf records in a stream_lf file, or at least you could have, if the first file is that and you accept the warnings from APPEND.
So that should make it straightforward to read records in the file. If the individual records are in the (low) hundreds of bytes, then you can use DCL to split.
Be sure to pre-allocate the new output and/or use large extents.
Supposedly you have lots of data, and thus I'd recommend a little home-grown program in the language of your choice (C, BASIC, Perl...). If you choose a language, then you can optimize the split a lot by using TRUNCATE after splitting:
1) read until limit
2) remember RFA
3) create next file
4) start copying
5) close next file
6) reposition input using RFA (quick!)
7) truncate original (quick!)
Somehow... and only you know this so far... you need to be able to tell the start of a new 'file' within the file.
You need to know the absolute max for split file sizes (OUT_MAX), and it is probably handy to know the size of the biggest appended file (ADD_MAX).
In DCL you might do something like (UNTESTED, NOT EVEN SYNTAX CHECKED!)
$ MAX_BYTE = OUT_MAX_BYTE - ADD_MAX_BYTE
$ MAX_LINE = OUT_MAX_LINE - ADD_MAX_LINE
$ file = 1
$ need_a_break = 0
$ OPEN/READ in input.dat
$
$NEW_FILE_LOOP:
$ CLOSE/NOLOG out
$ CREATE/FDL=SYS$INPUT out_'file'.dat
FILE; ALLOCATION 10000; EXTENSION 5000;
RECORD; FORMAT STREAM_LF
$
$ OPEN/APPEND out out_'file'.dat
$! The record that triggered the break becomes the first record of the new file
$ IF need_a_break THEN WRITE/SYMBOL out record
$ file = file + 1
$ need_a_break = 0
$ lines = 0
$ bytes = 0
$
$READ_LOOP:
$ READ/END_OF_FILE=DONE in record
$ lines = lines + 1
$ bytes = bytes + F$LENGTH(record)
$ IF lines .GT. MAX_LINE .OR. bytes .GT. MAX_BYTE THEN need_a_break = 1
$ IF need_a_break
$ THEN
$! Pseudocode: replace the next line with a real test for the start of a new file
$ if record looks like start of a new file then goto NEW_FILE_LOOP
$ ENDIF
$ WRITE/SYMBOL out record
$ GOTO READ_LOOP
$
$DONE:
$ CLOSE in
$ CLOSE out
In perl (again, just a brain dump: UNTESTED) that might be
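$MAX_BYTE = $OUT_MAX_BYTE - $ADD_MAX_BYTE;
$MAX_LINE = $OUT_MAX_LINE - $ADD_MAX_LINE;
$file = 1;
open (IN,"<input.dat") || die "input";
open (OU,">output_$file.dat") || die "output";
while (<IN>) {
    $bytes += length($_);
    # Past a limit and at the start of a new chunk: switch to the next output file
    if (($lines++ >= $MAX_LINE || $bytes >= $MAX_BYTE) && /looks like a new file/) {
        close (OU);
        $file++;
        open (OU,">output_$file.dat") || die "output";
        $lines = 1;
        $bytes = length($_);
    }
    print OU;
}
close (OU);
close (IN);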
hth,
Hein.
07-16-2004 09:22 AM
Re: splitting up files with unknown record type
The best way to split would be a small program instead of a script, especially since the file cannot be split up just anywhere because of the ASN.1 encoded records in it. I don't want to end up with half records at the end or beginning of the parts.
A program is also the best solution, since the amount of data in the files is megabytes, and not just hundreds of bytes.
It would be a good exercise for me anyway to create a program like that.
Thanks all for your help.
07-17-2004 12:40 PM
Re: splitting up files with unknown record type
at least at some point in time Digital had it then (TM). Check
http://h18000.www1.hp.com/info/SP3290/SP3290PF.PDF
for ASN.1 support and toolsets to make writing your program easier. I am not sure whether you can still get this product...
Greetings, Martin
07-17-2004 04:03 PM
Re: splitting up files with unknown record type
Interesting find!
Peter,
"A program is also the best solution, since the amount of data in the files is megabytes, and not just hundreds of bytes.
It would be a good exercise for me anyway to create a program like that."
Sounds right to me. I don't know whether you ever tried Perl, but even if you did not, you may want to consider it for jobs like this.
My example outlined a couple of replies back should be pretty close to what you need.
Just replace that "looks like a new file" by the correct regular expression to trigger on the start of a chunk of ASN.1. That, and fix my errors, because I did not try to run it.
With kind regards,
Hein