Operating System - HP-UX

use parallelism in a script (command find )

 
support_billa
Valued Contributor


hello,

 

Following up on my thread "copy a filesystem best solution (local host or remote host)", I have a question about speeding up a script with what I call "parallelism".

 

I want to use "&" and the "wait" command in a shell script, like this:

 

# - Count Files of 2 Filesystems parallel

# - Get Cksum of 2 Filesystems parallel

 

local_old_fs=/old_fs
local_new_fs=/new_fs

cnt_old_fs_files=$( find ${local_old_fs} -type f ! -path "${local_old_fs}/lost+found/*" -type f | wc -l & )
cnt_new_fs_files=$( find ${local_new_fs} -type f ! -path "${local_new_fs}/lost+found/*" -type f | wc -l & )
wait

cd ${local_old_fs} && find . -type f ! -path "./lost+found/*" -type f -exec cksum {} + | sort -k3,3 > /tmp/cksum_old_fs_file &
cd ${local_new_fs} && find . -type f ! -path "./lost+found/*" -type f -exec cksum {} + | sort -k3,3 > /tmp/cksum_new_fs_file &
wait

Is this OK?

 

 

 

James R. Ferguson
Acclaimed Contributor
Solution

Re: use parallelism in a script (command find )

Hi:

 

Since you're checking two different filesystems, I don't see any problem running two processes concurrently.

 

I would eliminate the check to skip the 'lost+found' directory.  There should be only one file (if any) there and only for the mountpoint.  This seems like worrying about a drop in a rainstorm.

 

I would also write:

 

cd ${local_old_fs} && find . -type f ! -path "./lost+found/*" -type f -exec cksum {} + | sort -k3,3 > /tmp/cksum_old_fs_file &

...as this:

 

cd ${local_old_fs} && { find . -type f -exec cksum {} + | sort -k3,3 > /tmp/cksum_old_fs_file ; } &

...which clearly puts the whole subshell { ... } in the background.  Note the addition of the trailing semicolon.  I also removed the duplicated '-type f'.
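
As a minimal sketch of this pattern (not the poster's actual script), the following runs two checksum jobs concurrently and then checks each job's exit status. The scratch directories under ${TMPDIR:-/tmp} stand in for /old_fs and /new_fs, and the names work, pid_old, pid_new, rc_old, rc_new and result are assumptions made up for illustration. It uses ( ... ) subshells rather than { ...; } so that the cd is visibly confined to each job.

```shell
#!/bin/sh
# Scratch directories stand in for the real filesystems (illustrative only).
work=${TMPDIR:-/tmp}/par_demo.$$
mkdir -p "$work/old_fs" "$work/new_fs"
echo one > "$work/old_fs/a"
echo one > "$work/new_fs/a"

# Each job runs in its own subshell, so the cd never affects the parent shell.
( cd "$work/old_fs" && find . -type f -exec cksum {} + | sort -k3,3 > "$work/cksum_old" ) &
pid_old=$!
( cd "$work/new_fs" && find . -type f -exec cksum {} + | sort -k3,3 > "$work/cksum_new" ) &
pid_new=$!

# "wait PID" returns that job's exit status, so a failed cd is not silently lost.
wait "$pid_old"; rc_old=$?
wait "$pid_new"; rc_new=$?

# Compare the two sorted checksum lists only if both jobs succeeded.
if [ "$rc_old" -eq 0 ] && [ "$rc_new" -eq 0 ] && cmp -s "$work/cksum_old" "$work/cksum_new"; then
    result=identical
else
    result=different
fi
echo "$result"
rm -rf "$work"
```

The exit status seen by wait is that of the last command in each pipeline (sort), or of cd if it fails, so an error inside find itself would still need separate handling.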

 

Of course, when you're done adding "parallel" executions, measure what you would have achieved without that.

 

Regards!

 

...JRF...

Dennis Handly
Acclaimed Contributor

Re: use parallelism in a script (command find)

># - Count Files of 2 Filesystems parallel

># - Get Cksum of 2 Filesystems parallel

 

If you have lots of files, I wouldn't do these in separate steps.  I.e. collect the filenames in one pass, then count and cksum in the next two.

Or better yet, just use wc -l on the cksum output file.
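
A small sketch of that idea, assuming a throwaway test tree under ${TMPDIR:-/tmp} (the paths and the names work, cksum_list and cnt_files are made up for illustration): the tree is walked once for the checksums, and the file count is taken from the resulting list instead of a second find.

```shell
#!/bin/sh
# Throwaway tree with two regular files (illustrative only).
work=${TMPDIR:-/tmp}/count_demo.$$
mkdir -p "$work/fs/sub"
echo a > "$work/fs/file1"
echo b > "$work/fs/sub/file2"

# One pass over the tree: checksum every regular file.
( cd "$work/fs" && find . -type f -exec cksum {} + | sort -k3,3 ) > "$work/cksum_list"

# cksum prints one line per file, so the line count is the file count.
# Arithmetic expansion strips the padding some wc implementations add.
cnt_files=$(( $(wc -l < "$work/cksum_list") ))
echo "files: $cnt_files"
rm -rf "$work"
```

This count is only reliable when no file name contains a newline, since such a name would span more than one line of cksum output.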

 

If you want to skip lost+found, the proper primary is -prune:

find ${local_new_fs}  -path "${local_new_fs}/lost+found" -prune -o  -type f ... -print or -exec

(And if you know there is only one lost+found at the top, just use -name. ;-)

 

I'm not sure what happens if you use "&" inside $()?  You may want to write to separate files and then after wait, set the variables.
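
One way to sketch the "separate files, then wait, then set the variables" approach is shown below. The scratch directories and the result files cnt_old and cnt_new are hypothetical names chosen for this example; only the variable names cnt_old_fs_files and cnt_new_fs_files come from the original script.

```shell
#!/bin/sh
# Scratch directories stand in for /old_fs and /new_fs (illustrative only).
work=${TMPDIR:-/tmp}/cnt_demo.$$
mkdir -p "$work/old_fs" "$work/new_fs"
touch "$work/old_fs/a" "$work/old_fs/b" "$work/new_fs/a"

# Start both counts in the background, each writing to its own file.
find "$work/old_fs" -type f | wc -l > "$work/cnt_old" &
find "$work/new_fs" -type f | wc -l > "$work/cnt_new" &

# Block until both background pipelines have finished.
wait

# Only now read the results into variables.
cnt_old_fs_files=$(( $(cat "$work/cnt_old") ))
cnt_new_fs_files=$(( $(cat "$work/cnt_new") ))
echo "old=$cnt_old_fs_files new=$cnt_new_fs_files"
rm -rf "$work"
```

The result files are kept outside the directories being counted so that they do not change the counts.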

 

>cd ${local_old_fs} && find . -type f ! -path "./lost+found/*" -type f -exec cksum {} + | sort -k3,3 > /tmp/cksum_old_fs_file &

 

Instead of trying to put this all on one line, you may want to do separate commands and you don't need that {}:

if cd ${local_old_fs}; then

   find ... &

fi

 

>There should be only one file (if any) there and only for the mountpoint.

 

Right.

support_billa
Valued Contributor

Re: use parallelism in a script (command find)

hello,

 

>If you want to skip lost+found, the proper primary is -prune:
>find ${local_new_fs}  -path "${local_new_fs}/lost+found" -prune -o  -type f ... -print or -exec
>(And if you know there is only one lost+found at the top, just use -name. ;-)

 

Yes, I want to skip the whole lost+found directory.

With your proposal it still shows the lost+found directory; I tested the "-prune" option as well. I use this find command because running "fsadm" creates a file named ".fsadm".

 

A small test with output:

 

mkdir -p /tmp/test_fs/lost+found
touch    /tmp/test_fs/lost+found/.fsadm
touch    /tmp/test_fs/test_file

local_new_fs=/tmp/test_fs

# your proposal
find ${local_new_fs}  -path "${local_new_fs}/lost+found" -prune -o -type f

# output:
/tmp/test_fs/lost+found
/tmp/test_fs/test_file

find ${local_new_fs} ! -path "${local_new_fs}/lost+found/*" -type f

# output:
/tmp/test_fs/test_file

 

regards

 

support_billa
Valued Contributor

Re: use parallelism in a script (command find)

>...which clearly puts the whole subshell { ... } in the background.  Note the addition of the trailing semicolon.  
>Of course, when you're done adding "parallel" executions, measure what you would have achieved without that.

Thanks for the tip about the trailing semicolon. I forgot it the first time and got a syntax error :-))

 

cnt_new_fs_files=$( find ${local_new_fs} -type f ! -path "${local_new_fs}/lost+found/*" -type f | wc -l & )

>I'm not sure what happens if you use "&" inside $()?  You may want to write to separate files and then after wait, set the variables.

 

I ran several tests of how to use "&", but when I want to store the command output in a variable, I don't know whether it is OK.

 

Dennis Handly
Acclaimed Contributor

Re: use parallelism in a script (command find)

>with your proposal it shows directory lost+found, it tested option "-prune" also

 

I already showed you how to not print that:

>>find ...  -prune -o -type f ... -print or -exec

 

You must have -print or -exec on the end.  Otherwise you get:

find ...  \( ... -prune -o -type f ... \) -print

 

>it created a file ".fsadm"

 

How big is it?

 

>when I want to store the commands in a variable, I don't know, if it is ok?

 

Did you try something trivial?

X=$( sleep 10; echo finished & )

And what's in $X?
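
A slightly extended version of this experiment, as a sketch: the whole group is put in the background inside $( ), and the elapsed time is measured. In common Bourne-style shells the command substitution keeps reading until every process holding its output pipe has exited, so the "&" gains nothing here. The variable names Y, start, end and elapsed are assumptions for illustration, and date +%s is widely supported but not required by POSIX.

```shell
#!/bin/sh
start=$(date +%s)   # epoch seconds; a common extension, not guaranteed everywhere

# Each substitution returns only when the background job has closed the pipe,
# so these two one-second jobs run one after the other, not side by side.
X=$( { sleep 1; echo finished; } & )
Y=$( { sleep 1; echo finished; } & )

end=$(date +%s)
elapsed=$(( end - start ))
echo "X=$X Y=$Y elapsed=${elapsed}s"
```

Both variables end up containing the text, but the script takes at least two seconds, which is why writing to separate files and calling wait is the approach that actually runs the jobs concurrently.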

James R. Ferguson
Acclaimed Contributor

Re: use parallelism in a script (command find)


@support_billa wrote:

 

Yes, I want to skip the whole lost+found directory.



This is overkill in my opinion.  Every mounted filesystem is going to have a 'lost+found' directory.  When you run 'fsadm', a file called '.fsadm' is created if it doesn't already exist.  This file is used as a lock for the 'fsadm' process.  Generally that's all that you should find in this directory unless you have had corruption of the filesystem and a 'fsck' places orphaned files and directories into 'lost+found'.  If those exist, you might want to ascertain what they "were" and see if anything is worth renaming.

 

Regards!

 

...JRF...

support_billa
Valued Contributor

Re: use parallelism in a script (command find)

>I already showed you how to not print that:
>>>find ...  -prune -o -type f ... -print or -exec
>You must have -print or -exec on the end.  Otherwise you get:
>find ...  \( ... -prune -o -type f ... \) -print

Sorry, it was my mistake. It's a bad habit of mine: when I use "find", I don't use "-print", but here "-print" is the key.

I was wondering why "-prune" didn't work when I tested it weeks ago.

 

>>it created a file ".fsadm"
>How big is it?

It is not big, but the file exists and my checks fail (the number of files differs).

 

>X=$( sleep 10; echo finished & )

>And what's in $X?

It contains "finished" :-)

>This is overkill in my opinion.  Every mounted filesystem is going to have a 'lost+found' directory.  When you run 'fsadm', a file called '.fsadm' is created if it doesn't already exist.  This file is used as a lock for the 'fsadm' process.  Generally that's all that you should find in this directory unless you have had corruption of the filesystem and a 'fsck' places orphaned files and directories into 'lost+found'.  If those exist, you might want to ascertain what they "were" and see if anything is worth renaming.

Maybe you consider it overkill, but these scripts are used by other people, and they don't know exactly what ".fsadm" is.
And as I said, my checks fail (cksum, ...).

 

regards

Dennis Handly
Acclaimed Contributor

Re: use parallelism in a script (command find)

>when I use "find" , I don't use "-print"

 

And usually neither do I, especially since the POSIX standard requires find(1) to add a -print to the end if no action is given.

But it adds it as if the whole expression were enclosed in the parentheses I showed.
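
The difference can be reproduced with the test layout from earlier in the thread. This is a sketch that uses a scratch path under ${TMPDIR:-/tmp} instead of /tmp/test_fs; the names fs, implicit and explicit are made up for the example.

```shell
#!/bin/sh
# Same test layout as earlier in the thread, under a scratch path.
fs=${TMPDIR:-/tmp}/prune_demo.$$
mkdir -p "$fs/lost+found"
touch "$fs/lost+found/.fsadm" "$fs/test_file"

# No action given: find behaves as if it were \( ... \) -print,
# so the pruned directory itself is printed as well.
implicit=$(find "$fs" -path "$fs/lost+found" -prune -o -type f | sort)

# Explicit -print binds only to the right-hand side of -o,
# so the pruned directory is skipped silently.
explicit=$(find "$fs" -path "$fs/lost+found" -prune -o -type f -print)

printf 'implicit:\n%s\nexplicit:\n%s\n' "$implicit" "$explicit"
rm -rf "$fs"
```

The first form lists both the lost+found directory and test_file, while the second lists only test_file, and in neither case is .fsadm visited.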