How does find prevent endless loops (e.g. when renaming files while finding them)? [duplicate]




























This question already has an answer here:




  • Will we ever “find” files whose names are changed by “find”? Why not?

    1 answer




Please consider the following command:



find . -type f -name '*.*' -exec mv '{}' '{}_foo' \;


How does find prevent endless loops in this case?



I believe that find does not work like shell globs do, i.e. it does not fetch a list of all matching files, store that list internally and then process the list entries. Instead, it gets the files to process "incrementally" from the underlying OS and processes each of them as soon as it knows about it (let's ignore any buffering that might take place, since it is irrelevant to the question). After all, as far as I understand, this is the main advantage of find over globs in directories that contain a lot of files.



If this is true, I would like to understand how find prevents endless loops. In the example above, 1.jpg would be renamed to 1.jpg_foo, which still matches '*.*'. From discussions on Stack Overflow and elsewhere, I know that renaming might result in the file (name) occupying a different slot in the directory's file list, so find could encounter that file a second time, rename it again (to 1.jpg_foo_foo), and so on.



Obviously, this does not happen. Could somebody please give some insight?
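For comparison, here is a minimal sketch of the glob-based approach the question contrasts find with. The shell expands `*.*` into the complete list of matching names before the loop body runs, so a file renamed inside the loop cannot reappear in the list being iterated. The scratch directory and file names are hypothetical test data.

```shell
#!/bin/sh
# Sketch: rename files with a shell glob instead of find.
# The scratch directory and file names are hypothetical test data.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
touch 1.jpg 2.jpg 3.png

# The glob is expanded once, up front, into the full list of matching names;
# renaming files inside the loop does not change that list.
for f in *.*; do
  mv -- "$f" "${f}_foo"
done
ls
```

The trade-off is that the whole list exists at once; when such an expansion is passed to an external command instead of a shell loop, it can exceed the system's argument-length limit.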










linux bash find
edited Jan 6 at 10:39

Binarus

asked Jan 6 at 10:30

Binarus




marked as duplicate by ilkkachu bash
Jan 6 at 12:27


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
  • In your example, the renamed file wouldn't be covered by the match since it would end in _foo.

    – Torin
    Jan 6 at 10:38











  • OK, thanks - Fixed it.

    – Binarus
    Jan 6 at 10:40











  • Hmm, I didn't notice this had the bash tag. But this does look like the same question, just yell at me if you disagree...

    – ilkkachu
    Jan 6 at 12:28













  • @ilkkachu No, you are right. Despite searching thoroughly and for a long time, I did not find the other question. So your action is appropriate. The bash tag is here because I hoped to attract people who know both variants (i.e. find as well as globs) and could confirm this and tell me why find is faster by orders of magnitude in some cases.

    – Binarus
    Jan 6 at 12:33













  • @Binarus, oh, hmm, I didn't realize you meant find vs globs as such an important point. That might be worth a question in itself, if it isn't here already. I think there are a couple of points other than just speed: standardization and batching come to mind.

    – ilkkachu
    Jan 6 at 13:02
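On the batching point raised in the comment above: as a sketch, the `-exec ... {} +` form lets find pass many paths to a single command invocation instead of starting one process per file. The inline `sh -c` loop is just one way to rename each path in a batch; the file names are hypothetical test data.

```shell
#!/bin/sh
# Sketch: rename matching files in batches with -exec ... {} +.
# File names are hypothetical test data.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
touch 1.jpg 2.jpg 3.jpg

# With '+', find appends as many paths as fit to one 'sh -c' invocation;
# the inline script then loops over its positional parameters.
find . -type f -name '*.jpg' -exec sh -c '
  for f in "$@"; do
    mv -- "$f" "${f}_foo"
  done
' sh {} +
ls
```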
1 Answer
































Within a single directory, it may be as simple as find reading the entire file list before processing any of it (and strace makes it look like that's what happens):



# read all directory entries first
openat(AT_FDCWD, ".", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_DIRECTORY) = 4
getdents(4, /* 1024 entries */, 32768) = 32752
getdents(4, /* 1024 entries */, 32768) = 32768
getdents(4, /* 426 entries */, 32768) = 13632
getdents(4, /* 0 entries */, 32768) = 0
close(4) = 0


(output abridged for readability)



# process stuff later
clone(...
wait4(...
--- SIGCHLD...
clone(...
wait4(...
--- SIGCHLD ...
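A rough way to repeat this observation without strace is sketched below: it creates a number of files in a scratch directory, runs the question's command on them and counts names that were renamed more than once. The file count (200) is an arbitrary assumption, and the result depends on the find implementation and the filesystem. If find reads the whole directory first, as the strace output above suggests, the count should be 0, but nothing guarantees that.

```shell
#!/bin/sh
# Sketch: repeat the question's experiment in a scratch directory and count
# files that were renamed more than once. N=200 is an arbitrary choice.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

N=200
i=1
while [ "$i" -le "$N" ]; do
  : > "$i.jpg"          # create an empty file named <i>.jpg
  i=$((i + 1))
done

# The question's command, with ';' escaped for the shell and {} passed to
# sh -c as a separate argument rather than embedded in '{}_foo'.
find . -type f -name '*.*' -exec sh -c 'mv -- "$1" "${1}_foo"' sh {} \;

total=$(ls | wc -l | tr -d ' ')
twice=$(ls | grep -c '_foo_foo' || true)   # grep -c prints 0 but exits 1 on no match
echo "files: $total, renamed more than once: $twice"
```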


In general, though, find does nothing to prevent this kind of loop. If you move files into a subdirectory that find has not visited yet, the same file can be processed multiple times:



mkdir -p sub/sub/sub/sub
find -type f -exec mv {} sub/{}_foo \;


Depending on the order in which find reads the directory entries, this can result in sub/sub/sub/sub/file_foo_foo_foo_foo and similar paths. (-depth might help in this case.)
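One way to rule out the subdirectory case is sketched below: it prunes the destination directory so find never descends into it. This uses -prune as an alternative to -depth and is not the answer's own suggestion; the directory and file names are hypothetical test data.

```shell
#!/bin/sh
# Sketch: move regular files into ./sub without ever revisiting them, by
# pruning ./sub from the traversal. Names are hypothetical test data.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir -p sub/sub/sub/sub
touch a b c

# For ./sub, -prune is true, so find does not descend into it and the -o
# branch is skipped. Everything else that is a regular file gets moved;
# ${1#./} strips the leading "./" that find puts on each path.
find . -path ./sub -prune -o -type f \
  -exec sh -c 'mv -- "$1" "sub/${1#./}_foo"' sh {} \;
ls sub
```

Because the pruned directory is never read, the outcome no longer depends on the order of the directory entries.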



It's best to avoid any possible clashes in the first place instead of blindly relying on find employing some magic that just isn't there. Your command before your edit was a good solution, since its pattern simply didn't match the already renamed file at all.
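As a sketch of that approach, assuming the pre-edit command used -name '*.jpg' (the comments suggest so, but this is an assumption): because the new name ends in _foo, it no longer matches the pattern, so neither the same find run nor a later one picks it up again.

```shell
#!/bin/sh
# Sketch: rename *.jpg to *.jpg_foo; the new names no longer match '*.jpg'.
# The -name '*.jpg' pattern and the file names are assumptions for illustration.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
touch 1.jpg 2.jpg

# {} is passed to sh -c as "$1": POSIX leaves it implementation-defined
# whether find substitutes {} inside a larger word such as '{}_foo'.
find . -type f -name '*.jpg' -exec sh -c 'mv -- "$1" "${1}_foo"' sh {} \;
ls
```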



Even where it's not strictly required, it's nice to make it clear that files can't and shouldn't be processed twice: we're renaming jpg files here, not foo files.



Also, even if a single find call never processes a file twice, there's always a risk that the script as a whole is re-run and find runs a second time, so you'll need safeguards in place either way.
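A minimal sketch of such a safeguard, assuming you keep the broad -name '*.*' pattern from the question: an explicit ! -name '*_foo' test skips files that have already been renamed, so running the same command twice leaves the result unchanged. The helper name `rename_once` and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: keep the broad '*.*' pattern but exclude names that already carry
# the _foo suffix, so re-running the command is harmless.
# rename_once and the file names are hypothetical.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
touch 1.jpg 2.png

rename_once() {
  find . -type f -name '*.*' ! -name '*_foo' \
    -exec sh -c 'mv -- "$1" "${1}_foo"' sh {} \;
}

rename_once
rename_once   # second run: every remaining *.* name ends in _foo, so nothing matches
ls
```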






answered Jan 6 at 11:04









frostschutz













  • Thank you very much and +1. Notably, the part of your answer with the recursive folder structure was very enlightening. In addition, I still think that find in general is not guaranteed to read the entire file list from a directory before processing the files, even though this was the case in your test. So I think I have learned my lesson: carefully craft renaming, moving, copying and the search pattern so that find won't process the same file twice.

    – Binarus
    Jan 6 at 12:10