Operating System - HP-UX

summarize bytes of files and make a control break for a fix directory structure

 
support_billa
Valued Contributor


Hello,

I have a file that was created by Data Protector.

I want to sum the file sizes in bytes, which are printed in the "size" column.

But there is one special requirement.

The files are listed like this:

/filesystem/directory/DIR/file

/filesystem/directory/MOVED201202/subdirectory/app/DIR/IF/geladen/...../file

I need a total only for the first three directory levels, such as "/filesystem/directory/DIR" and "/filesystem/directory/MOVED201202". So I think I have to split each path on "/" and keep the first three components.

That means the "control break" comes after the third directory level.

Is this possible with awk?

The output should be:

"/filesystem/directory/DIR"  : 3000 bytes

"/filesystem/directory/MOVED201202" : xxxx bytes

The reason for the report: we need to create separate filesystems for these subdirectories, so I have to know their sizes.

Regards

The output is in the attachment.

Dennis Handly
Acclaimed Contributor

Re: summarize sizes of files and make a control break for a fix directory structure

Try this:

#!/usr/bin/ksh

 

# Adds up hierarchical (must be ordered) directory and file sizes.

# Outputs just directories (ends in "/") and sizes

 

awk '
BEGIN { getline; getline # skip first two lines
# Add dummy directory entry for "/"
directory["/"] = 0
OFMT = "%.0f"
}
function add_entry() {
   for (dir in directory) {
      if (filename ~ dir) # regex match: dir is used as an unanchored pattern
         directory[dir] += size
   }
   # check to see if a directory, ends in "/"
   if (substr(filename, length(filename), 1) == "/")
      directory[filename] = size   # initialize with directory size
}
{
size = $4
filename = $7
add_entry()
}
# print out directory entries
END {
for (dir in directory)
   printf "%12.0f %s\n", directory[dir], dir
}' input-file | sort -k2,2

 

I get:

   410893371 /
   410893371 /filesystem/directory/
   337175785 /filesystem/directory/DIR/
    73716466 /filesystem/directory/MOVED201202/
    73691984 /filesystem/directory/MOVED201202/subdirectory/
    73691888 /filesystem/directory/MOVED201202/subdirectory/app/
    73691792 /filesystem/directory/MOVED201202/subdirectory/app/DIR/
    73691696 /filesystem/directory/MOVED201202/subdirectory/app/DIR/IF/
    73691600 /filesystem/directory/MOVED201202/subdirectory/app/DIR/IF/geladen/
          96 /filesystem/directory/lost+found/
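
One caveat about the match in add_entry(): the ~ operator treats dir as a regular expression and matches it anywhere in filename, so a "+" in a name like lost+found is read as a repetition operator. The sketch below only illustrates the difference between the regex test and a literal prefix test with index(); the helper name compare_match and the /backup path are made up for the demonstration and are not part of the script above.

```shell
#!/bin/sh
# compare_match DIR PATH (hypothetical helper, for illustration only)
# Prints whether PATH matches DIR as a regex and as a literal prefix.
compare_match() {
    awk -v dir="$1" -v path="$2" 'BEGIN {
        # (path ~ dir)            : dir used as an unanchored regex
        # (index(path, dir) == 1) : dir is a literal prefix of path
        printf "regex=%d prefix=%d\n", (path ~ dir), (index(path, dir) == 1)
    }'
}

# "+" means "one or more t" in the regex, so the literal "+" never matches.
compare_match "/filesystem/directory/lost+found/" \
              "/filesystem/directory/lost+found/x"
# prints: regex=0 prefix=1

# The regex is not anchored, so it also matches in the middle of a path.
compare_match "/filesystem/directory/DIR/" \
              "/backup/filesystem/directory/DIR/x"
# prints: regex=1 prefix=0
```

If only literal prefixes should count, replacing (filename ~ dir) with (index(filename, dir) == 1) is one option.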

support_billa
Valued Contributor

Re: summarize sizes of files and make a control break for a fix directory structure

hello Dennis,

 

 

 

 

 

Dennis Handly
Acclaimed Contributor

Re: summarize sizes of files and make a control break for a fix directory structure

>but only the output is different from the target value:

 

Because I add in the size of the directories too.
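
For the report as originally asked (one total per third-level directory, files only), a minimal sketch could look like the following. It assumes the same column layout as the script above (size in $4, path in $7, two header lines), that directory entries end in "/", and that paths contain no spaces. The function name sum_by_depth3 is hypothetical.

```shell
#!/bin/sh
# sum_by_depth3 [FILE...] (hypothetical name)
# Sums file sizes per third-level directory; directory entries are skipped.
sum_by_depth3() {
    awk '
    NR <= 2    { next }              # skip the two header lines
    $7 ~ /\/$/ { next }              # skip directory entries (trailing "/")
    {
        # The path starts with "/", so part[1] is the empty string.
        n = split($7, part, "/")
        if (n < 5) next              # file lies above the third level
        key = "/" part[2] "/" part[3] "/" part[4]
        total[key] += $4             # control break: first three components
    }
    END {
        for (key in total)
            printf "%12.0f %s\n", total[key], key
    }' "$@" | sort -k2,2
}
```

Called as sum_by_depth3 input-file, it prints one line per third-level directory. Unlike the hierarchical script, the input does not have to be ordered, because each path is reduced to its key on its own.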

support_billa
Valued Contributor

Re: summarize sizes of files and make a control break for a fix directory structure

hello,

 

Can you explain this part?


- First you initialize the array:

  # Add dummy directory entry for "/"
  directory["/"] = 0

- What does "for (dir in directory)" mean?
   What is the content of the array?
   first entry :  directory["/"] = 0
   next  entry :   ???

function add_entry() {
   for (dir in directory)
   {
      if (filename ~ dir) # substring match
              directory[dir] += size
      

I tried to extend your awk script, but the totals come out slightly lower than yours.
Is this the right approach?

My extension:

 

#!/usr/bin/ksh

# Adds up hierarchical (must be ordered) directory and file sizes.
# Outputs just directories (ends in "/") and sizes

awk '
BEGIN { getline; getline # skip first two lines
  # Add dummy directory entry for "/"
  directory["/"] = 0
  OFMT = "%.0f"
}
function add_entry() {
   for (dir in directory)
   {
      if (filename ~ dir) # substring match
      {
        if (substr(typ, 1 , 1) == "-")
         directory[dir] += size
      }
   }
   # check to see if a directory, ends in "/"
   if (substr(filename, length(filename), 1) == "/")
   {
      if (substr(typ, 1 , 1) == "d" )
      {
        directory[filename] -= size
      }
   }
}
{
   typ   = $1
   size = $4
   filename = $7
   add_entry()
}
# print out directory entries
END {
   for (dir in directory)
     printf "%12.0f %s\n", directory[dir], dir
}'    file | sort -k2,2


Dennis Handly
Acclaimed Contributor
Solution

Re: summarize sizes of files and make a control break for a fix directory structure

>- what means for dir in directory

 

It goes through all of the keys in the array (in an unspecified order).

 

>  what is the content of the array
   first entry:  directory["/"] = 0
   next  entry:   ???

 

Since the order is unspecified, there is no specific "next".  The array is a hash table, not an ordered map.
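
A small sketch of that point: the loop visits every key once, in whatever order the awk implementation chooses, which is why the script pipes its output through sort. The function name show_keys is hypothetical; the two sizes are copied from the output earlier in the thread.

```shell
#!/bin/sh
# show_keys (hypothetical name): fills a small array and lists its keys.
show_keys() {
    awk 'BEGIN {
        directory["/"] = 0
        directory["/filesystem/directory/DIR/"] = 337175785
        directory["/filesystem/directory/MOVED201202/"] = 73716466
        # The traversal order of "for (key in array)" is unspecified.
        for (dir in directory)
            print dir, directory[dir]
    }' | LC_ALL=C sort               # sorting makes the listing repeatable
}

show_keys
```

Without the sort, the three lines may come out in a different order on another awk implementation.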

 

>        if (substr(typ, 1 , 1) == "-")
>         directory[dir] += size

 

Why bother with the leading "-"?  Better to just look for the trailing "/".

Also the proper check for the opposite of "d" is (!= "d") and not (== "-").

 

>if (substr(typ, 1 , 1) == "d" )

 

Again, no need to check both ways.

 

>        directory[filename] -= size

 

Why are you subtracting?  If you don't want to count them, initialize it to 0.
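
Put together, a files-only variant along these lines might look like the sketch below. It is not the original script: it registers each directory with 0, checks only the trailing "/", and uses index() for a literal prefix test instead of the ~ match. The function name dir_totals_files_only is hypothetical, and the input is assumed to list each directory before the files in it.

```shell
#!/bin/sh
# dir_totals_files_only [FILE...] (hypothetical name)
# Hierarchical totals that count files only; directories start at 0.
dir_totals_files_only() {
    awk '
    BEGIN   { directory["/"] = 0 }   # dummy entry for the root
    NR <= 2 { next }                 # skip the two header lines
    {
        size = $4
        filename = $7
        if (substr(filename, length(filename), 1) == "/") {
            directory[filename] = 0  # new directory key, own size ignored
            next
        }
        # A file: add its size to every directory that prefixes its path.
        for (dir in directory)
            if (index(filename, dir) == 1)
                directory[dir] += size
    }
    END {
        for (dir in directory)
            printf "%12.0f %s\n", directory[dir], dir
    }' "$@" | LC_ALL=C sort -k2,2
}
```

Called as dir_totals_files_only input-file, it prints the same kind of listing as before, minus the sizes of the directory entries themselves.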

 

support_billa
Valued Contributor

Re: summarize sizes of files and make a control break for a fix directory structure

> Why are you subtracting?  If you don't want to count them, initialize it to 0.

I changed directory[filename] -= size to directory[filename] = 0

and then I get the right output:

Files of Directories : /filesystem/directory/DIR           Target value (size):    336998633   awk value : 336998633   
Files of Directories : /filesystem/directory/MOVED201202   Target value (size):    73709842    awk value : 73709842    

some last questions:

>> what means for dir in directory
>> Goes through all of the entries in the array (in random order)

this i understand.
 
>> what is the content of the array
>> Since they are random, there is no specific "next".  This is a hash, not a map.

What I don't understand:

For example, when awk reads this line:

-rw-------    prod    edv    2636    04/10/12 03:13:10    /filesystem/directory/DIR/ABRKONZ_AUFWO_201137_IF_20120410031303.dmp.Z


then add_entry() is called,
and the file is not in the directory array.

How do I put it into the array?

for (dir in directory)   <= at this point, isn't the value missing from the array?
   {

I don't understand how the "directories" are stored in the array.

 

regards

Dennis Handly
Acclaimed Contributor

Re: summarize sizes of files and make a control break for a fix directory structure

>how I put it into the array?

 

This puts it in the array:

   directory[filename] = size   # add directory size

 

And I only put directory entries into the array and for each file, I add to that directory entry.

 

>for (dir in directory)   <= here it isn't in the array the value?

 

This just iterates over the list of keys in the array.

 

>I don't understand, how the "directories" are stored in the array?

 

The array stores key/value pairs in the hash:  directory[filename] = size

filename is the key, directory[filename] is the corresponding value.
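
To watch this happen, a small tracing sketch can print a line whenever a key is created. It assumes input without header lines and uses a literal prefix test with index() rather than the ~ match; the function name trace_entries is hypothetical.

```shell
#!/bin/sh
# trace_entries [FILE...] (hypothetical name)
# Shows which input lines create a key in the directory array.
trace_entries() {
    awk '
    BEGIN { directory["/"] = 0 }
    {
        size = $4
        filename = $7
        # Every line adds its size to the directory keys stored so far.
        for (dir in directory)
            if (index(filename, dir) == 1)
                directory[dir] += size
        if (filename ~ /\/$/) {
            directory[filename] = size     # only directories become keys
            print "new key: " filename
        } else
            print "file, no new key: " filename
    }
    END {
        n = 0
        for (dir in directory) n++
        print "keys stored: " n
    }' "$@"
}
```

Fed one directory line followed by two file lines, it reports one new key, two files without a key, and "keys stored: 2" (the dummy "/" plus the directory).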