Use parallelism in a script (find command)

support_billa · ‎09-28-2011

hello,

following up on my thread "copy a filesystem best solution (local host or remote host)",

I have a question about speeding up a script with what I call "parallelism".

I want to use "&" and the "wait" command in a shell script, like this:

# - Count Files of 2 Filesystems in parallel

# - Get Cksum of 2 Filesystems in parallel

local_old_fs=/old_fs
local_new_fs=/new_fs

cnt_old_fs_files=$( find ${local_old_fs} -type f ! -path "${local_old_fs}/lost+found/*" -type f | wc -l & )
cnt_new_fs_files=$( find ${local_new_fs} -type f ! -path "${local_new_fs}/lost+found/*" -type f | wc -l & )
wait

cd ${local_old_fs} && find . -type f ! -path "./lost+found/*" -type f -exec cksum {} + | sort -k3,3 > /tmp/cksum_old_fs_file &
cd ${local_new_fs} && find . -type f ! -path "./lost+found/*" -type f -exec cksum {} + | sort -k3,3 > /tmp/cksum_new_fs_file &
wait

Is this ok?

James R. Ferguson · ‎09-28-2011

Hi:

Since you're checking two different filesystems, I don't see any problem running two processes concurrently.

I would eliminate the check to skip the 'lost+found' directory. There should be only one file (if any) there and only for the mountpoint. This seems like worrying about a drop in a rainstorm.

I would also write:

cd ${local_old_fs} && find . -type f ! -path "./lost+found/*" -type f -exec cksum {} + | sort -k3,3 > /tmp/cksum_old_fs_file &

...as this:

cd ${local_old_fs} && { find . -type f -exec cksum {} + | sort -k3,3 > /tmp/cksum_old_fs_file ; } &

...which clearly puts the whole { ... } group in the background, where it runs in a subshell. Note the addition of the trailing semicolon. I also removed the duplicated '-type f' and the 'lost+found' test.

Of course, when you're done adding "parallel" executions, measure what you would have achieved without that.
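One rough way to do that measurement is to time the same work run serially and then in the background with "wait". The sketch below is only an illustration: "job" is a hypothetical stand-in (a one-second sleep) for the real find pipelines, and "date +%s" is assumed to be available on the system.

```shell
# Hypothetical stand-in for one of the real find/cksum pipelines.
job() { sleep 1; }

# Serial run: the second job starts only after the first one ends.
start=$(date +%s)
job
job
serial=$(( $(date +%s) - start ))

# Parallel run: both jobs start at once, then wait for both to exit.
start=$(date +%s)
job &
job &
wait
parallel=$(( $(date +%s) - start ))

echo "serial: ${serial}s, parallel: ${parallel}s"
```

With real filesystems the gain depends on whether the two trees sit on different disks or controllers, so the numbers have to come from the actual hosts.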

Regards!

...JRF...

Dennis Handly · ‎09-28-2011

># - Count Files of 2 Filesystems parallel

># - Get Cksum of 2 Filesystems parallel

If you have lots of files, I wouldn't do these in separate steps. I.e. collect the filenames in one pass, then count and cksum in the next two.

Or better yet, just use wc -l on the cksum output file.
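A minimal sketch of that idea, assuming file names contain no newlines (each cksum output line is then exactly one file). The scratch paths under /tmp are hypothetical and only make the example self-contained.

```shell
# Hypothetical scratch tree standing in for the real filesystem.
work=/tmp/cksum_demo.$$
local_old_fs=${work}/old_fs
mkdir -p "${local_old_fs}/sub"
echo alpha > "${local_old_fs}/a"
echo beta  > "${local_old_fs}/sub/b"

# One pass over the tree: checksum every regular file, sorted by name.
# The ( ... ) subshell keeps the cd from changing the caller's directory.
( cd "${local_old_fs}" && find . -type f -exec cksum {} + | sort -k3,3 ) \
    > "${work}/cksum_old_fs_file"

# The file count is simply the number of lines in the cksum output.
cnt_old_fs_files=$(wc -l < "${work}/cksum_old_fs_file")
echo "files: ${cnt_old_fs_files}"

rm -rf "${work}"
```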

If you want to skip lost+found, the proper primary is -prune:

find ${local_new_fs} -path "${local_new_fs}/lost+found" -prune -o -type f ... -print or -exec

(And if you know there is only one lost+found at the top, just use -name. ;-)

I'm not sure what happens if you use "&" inside $()? You may want to write to separate files and then after wait, set the variables.
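The "separate files, then wait, then set the variables" approach could look roughly like this. It is a sketch under assumptions: the trees and the result file names (cnt_old, cnt_new) under /tmp are hypothetical and exist only so the example runs on its own.

```shell
# Hypothetical scratch trees standing in for /old_fs and /new_fs.
work=/tmp/parallel_demo.$$
local_old_fs=${work}/old_fs
local_new_fs=${work}/new_fs
mkdir -p "${local_old_fs}" "${local_new_fs}"
touch "${local_old_fs}/a" "${local_old_fs}/b" "${local_new_fs}/a"

# Each pipeline runs in the background and writes to its own result file.
find "${local_old_fs}" -type f | wc -l > "${work}/cnt_old" &
find "${local_new_fs}" -type f | wc -l > "${work}/cnt_new" &

# Block until both background jobs have exited.
wait

# The result files are complete now, so the variables can be set.
cnt_old_fs_files=$(cat "${work}/cnt_old")
cnt_new_fs_files=$(cat "${work}/cnt_new")
echo "old: ${cnt_old_fs_files}  new: ${cnt_new_fs_files}"

rm -rf "${work}"
```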

>cd ${local_old_fs} && find . -type f ! -path "./lost+found/*" -type f -exec cksum {} + | sort -k3,3 > /tmp/cksum_old_fs_file &

Instead of trying to put this all on one line, you may want to do separate commands and you don't need that {}:

if cd ${local_old_fs}; then

find ... &

fi

>There should be only one file (if any) there and only for the mountpoint.

Right.

support_billa · ‎09-29-2011

hello,

>If you want to skip lost+found, the proper primary is -prune:
>find ${local_new_fs} -path "${local_new_fs}/lost+found" -prune -o -type f ... -print or -exec
>(And if you know there is only one lost+found at the top, just use -name. ;-)

yes, I want to skip the whole lost+found directory:

with your proposal, find still shows the lost+found directory. I tested the "-prune" option as well. The reason I use this find command is that when you use "fsadm", it creates a file ".fsadm".

a little test with output:

mkdir -p /tmp/test_fs/lost+found
touch /tmp/test_fs/lost+found/.fsadm
touch /tmp/test_fs/test_file

local_new_fs=/tmp/test_fs

# your proposal
find ${local_new_fs} -path "${local_new_fs}/lost+found" -prune -o -type f

# output:
/tmp/test_fs/lost+found
/tmp/test_fs/test_file

find ${local_new_fs} ! -path "${local_new_fs}/lost+found/*" -type f

# output:
/tmp/test_fs/test_file

regards

support_billa · ‎09-29-2011

>...which clearly puts the whole subshell { ... } in the background. Note the addition of the trailing semicolon.
>Of course, when you're done adding "parallel" executions, measure what you would have achieved without that.

thx for the info about the "trailing semicolon". I forgot it the first time ... and got a syntax error :-))

cnt_new_fs_files=$( find ${local_new_fs} -type f ! -path "${local_new_fs}/lost+found/*" -type f | wc -l & )

>I'm not sure what happens if you use "&" inside $()? You may want to write to separate files and then after wait, set the variables.

I made several tests of how to use "&" when I want to store the command output in a variable. I don't know if it is ok?

Dennis Handly · ‎09-29-2011

>with your proposal it shows directory lost+found, it tested option "-prune" also

I already showed you how to not print that:

>>find ... -prune -o -type f ... -print or -exec

You must have -print or -exec on the end. Otherwise you get:

find ... \( ... -prune -o -type f ... \) -print
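The difference can be reproduced with the small test tree posted earlier in the thread. This is only a sketch: the scratch directory name under /tmp is hypothetical, and "sort" is added just to make the output order stable.

```shell
# Hypothetical scratch tree, mirroring the test posted earlier.
demo_fs=/tmp/prune_demo.$$
mkdir -p "${demo_fs}/lost+found"
touch "${demo_fs}/lost+found/.fsadm" "${demo_fs}/test_file"

# No action given: find acts as if the whole expression were wrapped in
# \( ... \) -print, so the pruned directory itself is printed as well.
without_print=$(find "${demo_fs}" -path "${demo_fs}/lost+found" -prune -o -type f | sort)

# Explicit -print binds only to the right-hand side of -o.
with_print=$(find "${demo_fs}" -path "${demo_fs}/lost+found" -prune -o -type f -print | sort)

echo "without -print:"; echo "${without_print}"
echo "with -print:";    echo "${with_print}"

rm -rf "${demo_fs}"
```

In both cases find never descends into lost+found, so .fsadm is not listed. Only the directory entry itself differs.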

>it created a file ".fsadm"

How big is it?

>when I want to store the commands in a variable, I don't know, if it is ok?

Did you try something trivial?

X=$( sleep 10; echo finished & )

And what's in $X?

James R. Ferguson · ‎09-29-2011

@support_billa wrote:

yes , i want skip the whole directory lost+found :

This is overkill in my opinion. Every mounted filesystem is going to have a 'lost+found' directory. When you run 'fsadm', a file called '.fsadm' is created if it doesn't already exist. This file is used as a lock for the 'fsadm' process. Generally that's all that you should find in this directory unless you have had corruption of the filesystem and a 'fsck' places orphaned files and directories into 'lost+found'. If those exist, you might want to ascertain what they "were" and see if anything is worth renaming.

Regards!

...JRF...

support_billa · ‎09-29-2011

>I already showed you how to not print that:
>>find ... -prune -o -type f ... -print or -exec
>You must have -print or -exec on the end. Otherwise you get:
>find ... \( ... -prune -o -type f ... \) -print

sorry, it was my mistake. It's a bad habit of mine: when I use "find", I don't use "-print", but here "-print" is the key.

I was wondering why "-prune" didn't work when I tested it weeks ago ....

>it created a file ".fsadm"
>How big is it?

it is not big, but the file exists and my checks fail (the file counts differ)

>X=$( sleep 10; echo finished & )
>
>And what's in $X?

it is finished :-)
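That result suggests "&" inside $( ) buys no parallelism: the background job keeps the write end of the command substitution's pipe open, so the shell reads until that job ends. A rough sketch to check this, assuming "date +%s" is available (the variable names are illustrative only):

```shell
start=$(date +%s)

# Each assignment blocks until its background job closes the pipe,
# so the two one-second sleeps run one after the other.
a=$( { sleep 1; echo one; } & )
b=$( { sleep 1; echo two; } & )

elapsed=$(( $(date +%s) - start ))
echo "a=${a} b=${b} elapsed=${elapsed}s"
```

If the two substitutions really ran in parallel, the elapsed time would be about one second. Here it comes out at two or more.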

>This is overkill in my opinion. Every mounted filesystem is going to have a 'lost+found' directory. When you run 'fsadm', a file called '.fsadm' is created if it doesn't already exist. This file is used as a lock for the 'fsadm' process. Generally that's all that you should find in this directory unless you have had corruption of the filesystem and a 'fsck' places orphaned files and directories into 'lost+found'. If those exist, you might want to ascertain what they "were" and see if anything is worth renaming.

maybe you find it overkill, but other people use those scripts and they don't know exactly what ".fsadm" is.
and as I said, my checks fail (cksum, ....)

regards

Dennis Handly · ‎09-30-2011

>when I use "find" , I don't use "-print"

And usually neither do I. Especially since the POSIX standard requires find(1) to add a -print to the end if no action is given.

But it places it as if there were those \( \) I showed.
