I've written about File-Find-Object before, but I've been meaning to write
an entry demonstrating its philosophical advantages over the core File::Find
module. Today, I'd like to get to it.
As opposed to File::Find, File-Find-Object:
- Has an iterative interface, and is capable of being interrupted in the
  middle.
- Can be instantiated and be used to traverse an arbitrary number of directory
  trees in one process.
- Can return result objects instead of just plain paths.
I'd like to demonstrate some of these advantages now.
Case Study #1: Looking for a Needle in a Haystack
Let's suppose you have a huge directory tree containing many directories
and files, and you're looking for only one result (or a few). Once you've
found that result, you wish to stop. This question was raised in
this Stack Overflow post.
So how can you do it with File::Find? Not very easily. Either you can throw
an exception:
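use strict;
use warnings;
use File::Find;

# The directory to search (defaults to the current one).
my $mydir = shift(@ARGV) || '.';

sub processFile {
    if ($_ =~ /target/) {
        # Abuse die() to carry the result out of find().
        die { type => "file-was-found", path => $File::Find::name };
    }
}

eval {
    find(\&processFile, $mydir);
};
if ( $@ ) {
    my $result = $@;
    # The type must match the one thrown in processFile().
    if ( (ref($result) eq "HASH") &&
         ($result->{type} eq "file-was-found")
    )
    {
        my $path = $result->{path};
        # Do something with $path.
    }
    elsif ( $result ) {
        # A genuine error: rethrow it.
        die $result;
    }
}
else {
    # be sad
}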
This is incredibly inelegant, and abuses the Perl exception system for
propagating values instead of errors. But there's an even worse way, using
$File::Find::prune:
#! /usr/bin/perl -w

use strict;
use File::Find;

my @hits = ();
my $hit_lim = shift || 20;

find(
    sub {
        if( scalar @hits >= $hit_lim ) {
            $File::Find::prune = 1;
            return;
        }
        elsif( -d $_ ) {
            return;
        }
        push @hits, $File::Find::name;
    },
    shift || '.'
);

$, = "\n";
print @hits, "\n";
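Here, once we have enough hits, we set $File::Find::prune on every remaining
call, so that File::Find skips each directory still pending at every level,
up to the root, until the traversal finally winds down.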
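To see what this costs, here is a small sketch that uses only core modules
(File::Find and File::Temp). The throwaway tree layout, the limit of one hit
and the $wasted_calls counter are assumptions made for illustration; the
sketch counts how often the callback is still invoked after the limit has
been reached.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use File::Find qw(find);
use File::Temp qw(tempdir);

# Build a small throwaway tree: 3 directories with 10 files each.
my $root = tempdir(CLEANUP => 1);
for my $d (1 .. 3)
{
    mkdir "$root/dir$d" or die "mkdir: $!";
    for my $f (1 .. 10)
    {
        open my $fh, '>', "$root/dir$d/file$f" or die "open: $!";
        close $fh;
    }
}

my $hit_lim = 1;
my @hits;
my $wasted_calls = 0;

find(
    sub {
        if (@hits >= $hit_lim)
        {
            # We already have what we need, but File::Find keeps calling us.
            $wasted_calls++;
            $File::Find::prune = 1;
            return;
        }
        return if -d $_;
        push @hits, $File::Find::name;
    },
    $root
);

print "hits: ", scalar(@hits),
    ", callback calls after the limit: $wasted_calls\n";
```

Pruning does save the descent into the directories that have not been
entered yet, but the callback still runs for the remaining entries of the
directory where the hit was found, and once for each pending directory, even
though nothing more is wanted.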
So how can you do it with File-Find-Object? In a very straightforward manner:
#!/usr/bin/perl

use strict;
use warnings;

use File::Find::Object;

sub find_needle
{
    my $base = shift;

    my $finder = File::Find::Object->new({}, $base);

    while (defined(my $r = $finder->next()))
    {
        if ($r =~ /target/)
        {
            return $r;
        }
    }

    return;
}

my $found = find_needle(shift(@ARGV));

if (defined($found))
{
    print "$found\n";
}
else
{
    die "Could not find target.";
}
The find_needle() function is the important thing here, and one
can see it doesn't use any exceptions, excessive prunes or anything like
that. It just harnesses the iterative interface of File-Find-Object. And
it works too:
shlomi:~$ perl f-f-o-find-needle.pl ~/progs/
/home/shlomi/progs/Rpms/BUILD/ExtUtils-MakeMaker-6.52/t/dir_target.t
shlomi:~$
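The same loop extends naturally to the "a few results" case. The following
is a minimal sketch, assuming File::Find::Object is installed; the helper
name find_first_n() and its parameters are made up for this illustration. It
uses next_obj() so that directories can be skipped, and simply leaves the
loop once enough files have been collected.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use File::Find::Object;

# Hypothetical helper: return up to $limit paths of plain files under
# $base that match $pattern, abandoning the traversal once we have enough.
sub find_first_n
{
    my ($base, $pattern, $limit) = @_;

    return if $limit <= 0;

    my $finder = File::Find::Object->new({}, $base);
    my @hits;

    while (defined(my $r = $finder->next_obj()))
    {
        # Only plain files are interesting here.
        next if !$r->is_file();

        my $path = $r->path();
        next if $path !~ $pattern;

        push @hits, $path;

        # Stopping is just a matter of leaving the loop.
        last if @hits >= $limit;
    }

    return @hits;
}

if (@ARGV)
{
    print "$_\n" for find_first_n(shift(@ARGV), qr/target/, 3);
}
```

Because the traverser is an ordinary object, nothing special is needed to
abandon it: once the function returns, $finder goes out of scope and the
rest of the tree is never read.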
Case Study #2: Recursive Diff
Let's suppose an evil djinni has removed the -r flag from your
diff program, making you
unable to recursively find the differences between files in two directory
trees. As a result, you now need to write a recursive-diff program in
Perl that will run diff -u on the two copies of each equivalent path
in the two directories.
Since File::Find cannot run two traversals side by side, using it means
collecting all the results from both directories first, and then
traversing them in memory. But with File-Find-Object there is a better way:
#!/usr/bin/perl

use strict;
use warnings;

use File::Find::Object;
use List::MoreUtils qw(all);

my @indexes = (0,1);

# The two directory trees to compare.
my @paths;
for my $idx (@indexes)
{
    push @paths, shift(@ARGV);
}

# One independent traverser per tree.
my @finders = map { File::Find::Object->new({}, $_ ) } @paths;

# The current result object of each traverser, and its path within
# that tree.
my @results;
my @fns;

# Advance traverser number $idx by one result.
sub fetch
{
    my $idx = shift;

    if ($results[$idx] = $finders[$idx]->next_obj())
    {
        $fns[$idx] = join("/", @{$results[$idx]->full_components()});
    }

    return;
}

# Report a path that exists in only one tree, then move on.
sub only_in
{
    my $idx = shift;

    printf("Only in %s: %s\n", $paths[$idx], $fns[$idx]);
    fetch($idx);

    return;
}

for my $idx (@indexes)
{
    fetch($idx);
}

# Walk both trees in lockstep for as long as both have results left.
COMPARE:
while (all { $_ } @results)
{
    # Skip anything that is not a plain file.
    my $skip = 0;
    foreach my $idx (@indexes)
    {
        if (!$results[$idx]->is_file())
        {
            fetch($idx);
            $skip = 1;
        }
    }
    if ($skip)
    {
        next COMPARE;
    }

    if ($fns[0] lt $fns[1])
    {
        only_in(0);
    }
    elsif ($fns[1] lt $fns[0])
    {
        only_in(1);
    }
    else
    {
        # Same path in both trees: diff the two copies.
        system("diff", "-u", map {$_->path() } @results);
        foreach my $idx (@indexes)
        {
            fetch($idx);
        }
    }
}

# Whatever is left in either tree exists only there.
foreach my $idx (@indexes)
{
    while($results[$idx])
    {
        only_in($idx);
    }
}
(As a bonus, we do not need to sort the results explicitly at any stage,
because File-Find-Object sorts them for us.)
This program did not take me long to write, it works pretty well,
and it does not populate a long list of results from one or both directories.
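For comparison, the collect-then-compare approach that File::Find pushes us
towards might look roughly like the sketch below. It uses only core modules;
the helper names collect_files() and compare_trees() are made up for this
illustration, and it is a rough outline rather than a polished tool. Note
that both listings are held in memory in full before any comparison starts.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use File::Find qw(find);
use File::Spec;

# Hypothetical helper: collect the relative paths of all plain files
# under $base into a hash. The whole listing is kept in memory.
sub collect_files
{
    my ($base) = @_;

    my %seen;
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                # With no_chdir, $_ holds the full path.
                return if !-f $_;
                $seen{ File::Spec->abs2rel($_, $base) } = 1;
            },
        },
        $base
    );

    return \%seen;
}

# Hypothetical helper: classify every relative path as being in the
# left tree only, in the right tree only, or in both.
sub compare_trees
{
    my ($left, $right) = @_;

    my ($l, $r) = (collect_files($left), collect_files($right));
    my %all = (%$l, %$r);

    my (@only_left, @only_right, @common);
    for my $fn (sort keys %all)
    {
        if    (!$r->{$fn}) { push @only_left,  $fn; }
        elsif (!$l->{$fn}) { push @only_right, $fn; }
        else               { push @common,     $fn; }
    }

    return (\@only_left, \@only_right, \@common);
}

if (@ARGV == 2)
{
    my ($left, $right) = @ARGV;
    my ($only_left, $only_right, $common) = compare_trees($left, $right);

    print "Only in $left: $_\n"  for @$only_left;
    print "Only in $right: $_\n" for @$only_right;
    for my $fn (@$common)
    {
        system("diff", "-u",
            File::Spec->catfile($left,  $fn),
            File::Spec->catfile($right, $fn));
    }
}
```

This works, but nothing is printed until both traversals have finished, and
memory use grows with the size of the trees rather than staying constant.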
Conclusion
If you use File-Find-Object instead of File::Find, your code may be
cleaner, your logic less convoluted, and you may actually be able to
achieve things that are not possible with the latter. I hope I've whetted
your appetite here and convinced you to give File-Find-Object a try.
So what does the future hold? I recently ported File-Find-Rule to
File-Find-Object and called the result File-Find-Object-Rule. As a result,
"->start" and "->match" are now truly iterative, and I believe
you can iterate with them on several objects at once. As I discovered by
porting File-Find-Object-Rule-MMagic,
I unfortunately cannot maintain full backwards compatibility with the plugin
API of File-Find-Rule, because the latter exposes some of the behaviour of
File::Find (in a leaky abstraction fashion).
I'm planning on porting more File-Find-Rule plugins to File-Find-Object-Rule,
and would appreciate any help. I also would like to look at the directory
tree traversal APIs of other languages to see if they contain any
interesting techniques.