Shlomif's Technical Posts Community


What you can do with File-Find-Object (that you can't with File::Find) [Jul. 17th, 2009|03:58 pm]

shlomif_tech

[shlomif]
[Current Location |Home]
[Current Mood |calm]
[Current Music |Ronald Jenkees - Super-Fun]

I've written about File-Find-Object before, and I have been meaning to write an entry demonstrating its philosophical advantages over the core File::Find module. Today, I'd like to get to it.

As opposed to File::Find, File-Find-Object:

  1. Has an iterative interface, and is capable of being interrupted in the middle.
  2. Can be instantiated and be used to traverse an arbitrary number of directory trees in one process.
  3. Can return result objects instead of just plain paths.
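To make the third point concrete, here is a minimal sketch that walks a throwaway tree and inspects each result object. It only uses the next_obj(), is_file(), path() and full_components() calls that also appear in the recursive-diff program later in this entry; the temporary directory and the file names are made up for illustration.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use File::Find::Object;
use File::Temp qw(tempdir);

# Build a tiny throwaway tree: <tmp>/sub/file.txt (hypothetical names).
my $base = tempdir( CLEANUP => 1 );
mkdir("$base/sub") or die "Cannot create $base/sub: $!";
open my $fh, '>', "$base/sub/file.txt" or die "Cannot create file: $!";
close($fh);

my $finder = File::Find::Object->new({}, $base);

my @files;
# next_obj() hands back a result object, or a false value when done.
while (my $r = $finder->next_obj())
{
    # Skip anything that is not a plain file (e.g. directories).
    next if !$r->is_file();

    # full_components() is an array reference of path components.
    my $joined = join("/", @{$r->full_components()});
    push @files, $joined;

    printf("%s (components: %s)\n", $r->path(), $joined);
}
```

Because the result is an object rather than a bare string, the loop can ask it questions (is it a file? what are its components?) without parsing the path again.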

I'd like to demonstrate some of these advantages now.

Case Study #1: Looking for a Needle in a Haystack

Let's suppose you have a huge directory tree containing many directories and files, and you're looking for only one result (or a few). Once you have found that result, you wish to stop. This question was raised in this Stack Overflow post.

So how can you do it with File::Find? Not very easily. Either you can throw an exception:


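use strict;
use warnings;

use File::Find;

# Directory to search; taken from the command line here for illustration.
my $mydir = shift(@ARGV);

sub processFile {
   # $_ holds the basename of the entry currently being visited.
   if ($_ =~ /target/) {
      # Throw a hash reference to unwind out of find() with the path.
      die { type => "file-was-found", path => $File::Find::name };
   }
}

eval {
    find (\&processFile, $mydir);
};

if ( $@ ) {
    my $result = $@;
    # The type must be the same string that processFile() throws.
    if ( (ref($result) eq "HASH") && 
         ($result->{type} eq "file-was-found")
       )
    {
        my $path = $result->{path};
        # Do something with $path.
    }
    elsif ( $result ) {
        # A genuine error: propagate it.
        die $result;
    }
}
else {
   # be sad
}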

This is incredibly inelegant, and abuses the Perl exception system for propagating values instead of errors. But there's an even worse way, using $File::Find::prune:

#! /usr/bin/perl -w

use strict;
use File::Find;

my @hits = ();
my $hit_lim = shift || 20;

find(
    sub {
        if( scalar @hits >= $hit_lim ) {
            $File::Find::prune = 1;
            return;
        }
        elsif( -d $_ ) {
            return;
        }
        push @hits, $File::Find::name;
    },
    shift || '.'
);

$, = "\n";
print @hits, "\n";

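Here, once enough hits have been collected, every further call sets $File::Find::prune, so File::Find stops descending into any more directories and the traversal winds its way back up to the root and out of the loop.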
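To see why this is wasteful, here is a small sketch (not part of the original script) that counts how many times the callback still runs after the limit has been reached. It uses only the core File::Find and File::Temp modules; the three file names and the $calls_after_limit counter are made up for illustration.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use File::Find;
use File::Temp qw(tempdir);

# Build a throwaway directory holding three plain files (hypothetical names).
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(a.txt b.txt c.txt))
{
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close($fh);
}

my $hit_lim = 1;
my @hits;
my $calls_after_limit = 0;

find(
    sub {
        if ( @hits >= $hit_lim ) {
            # We already have what we wanted, yet we are still being called.
            $calls_after_limit++;
            $File::Find::prune = 1;
            return;
        }
        return if -d $_;
        push @hits, $File::Find::name;
    },
    $dir
);

printf("hits: %d, wasted callback calls: %d\n",
    scalar(@hits), $calls_after_limit);
```

With a limit of one, the remaining files in the directory still reach the callback after the first hit: pruning only stops File::Find from descending further, it does not stop find() itself.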

So how can you do it with File-Find-Object? In a very straightforward manner:

#!/usr/bin/perl

use strict;
use warnings;

use File::Find::Object;

sub find_needle
{
    my $base = shift;

    my $finder = File::Find::Object->new({}, $base);

    while (defined(my $r = $finder->next()))
    {
        if ($r =~ /target/)
        {
            return $r;
        }
    }

    return;
}

my $found = find_needle(shift(@ARGV));

if (defined($found))
{
    print "$found\n";
}
else
{
    die "Could not find target.";
}

The find_needle() function is the important thing here, and one can see it doesn't use any exceptions, excessive prunes or anything like that. It just harnesses the iterative interface of File-Find-Object. And it works too:
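The same loop extends naturally to the "a few results" case. The following is a sketch under assumptions: find_first_n() is a hypothetical helper of my own naming, and the temporary directory and needle-*.txt files exist only to make the example self-contained.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use File::Find::Object;
use File::Temp qw(tempdir);

# Hypothetical helper: return at most $limit paths that match $pattern.
sub find_first_n
{
    my ($base, $pattern, $limit) = @_;

    my $finder = File::Find::Object->new({}, $base);

    my @hits;
    # Stop asking for results as soon as we have enough of them.
    while (@hits < $limit and defined(my $r = $finder->next()))
    {
        push @hits, $r if $r =~ $pattern;
    }

    return @hits;
}

# Throwaway directory with one non-matching and three matching files.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(hay.txt needle-1.txt needle-2.txt needle-3.txt))
{
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close($fh);
}

my @found = find_first_n($dir, qr/needle-\d+\.txt\z/, 2);
print "$_\n" for @found;
```

Once the limit is reached, the finder is simply never asked for another result, so no further part of the tree is scanned.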

shlomi:~$ perl f-f-o-find-needle.pl ~/progs/
/home/shlomi/progs/Rpms/BUILD/ExtUtils-MakeMaker-6.52/t/dir_target.t
shlomi:~$

Case Study #2: Recursive Diff

[Image: Evil Djinni from Disney's Aladdin]

Let's suppose an evil djinni has removed the -r flag from your diff program, making you unable to recursively find the differences between files in two directory trees. As a result, you now need to write a recursive-diff program in Perl that will run diff -u on the two copies of each equivalent path in the two directories.

Since File::Find cannot be instantiated twice at the same time, using it would force us to collect all the results from both directories first, and then traverse them in memory. But with File-Find-Object there is a better way:
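For contrast, this is roughly what the collect-everything-first approach with File::Find might look like. It is only a sketch using core modules: collect_files() and make_tree() are hypothetical helpers, the trees are throwaway temporary directories, and it stops at reporting which files differ in presence rather than running diff.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use File::Find;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: every plain file under $base, as a sorted list of
# paths relative to $base.
sub collect_files
{
    my $base = shift;

    my @found;
    find(
        {
            no_chdir => 1,
            wanted => sub {
                return if !-f $File::Find::name;
                push @found, File::Spec->abs2rel($File::Find::name, $base);
            },
        },
        $base
    );

    my @sorted = sort @found;
    return @sorted;
}

# Hypothetical helper: a throwaway directory containing the given files.
sub make_tree
{
    my @names = @_;

    my $dir = tempdir( CLEANUP => 1 );
    for my $name (@names)
    {
        open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
        close($fh);
    }

    return $dir;
}

my @trees = (
    make_tree(qw(common.txt old.txt)),
    make_tree(qw(common.txt new.txt)),
);

# Both complete listings are held in memory before any comparison starts.
my @lists = map { [ collect_files($_) ] } @trees;

my %in_first  = map { ($_ => 1) } @{$lists[0]};
my %in_second = map { ($_ => 1) } @{$lists[1]};

my @only_first  = grep { !$in_second{$_} } @{$lists[0]};
my @only_second = grep { !$in_first{$_} } @{$lists[1]};
my @in_both     = grep { $in_second{$_} } @{$lists[0]};

print "Only in first: @only_first\n";
print "Only in second: @only_second\n";
print "In both: @in_both\n";
```

This works, but its memory use grows with the size of both trees, which is exactly what the iterative version avoids.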

#!/usr/bin/perl

use strict;
use warnings;

use File::Find::Object;
use List::MoreUtils qw(all);

my @indexes = (0,1);
my @paths;
for my $idx (@indexes)
{
    push @paths, shift(@ARGV);
}

my @finders = map { File::Find::Object->new({}, $_ ) } @paths;

my @results;

my @fns;

sub fetch
{
    my $idx = shift;

    if ($results[$idx] = $finders[$idx]->next_obj())
    {
        $fns[$idx] = join("/", @{$results[$idx]->full_components()});
    }

    return;
}

sub only_in
{
    my $idx = shift;

    printf("Only in %s: %s\n", $paths[$idx], $fns[$idx]);
    fetch($idx);

    return;
}

for my $idx (@indexes)
{
    fetch($idx);
}

COMPARE:
while (all { $_ } @results)
{
    my $skip = 0;
    foreach my $idx (@indexes)
    {
        if (!$results[$idx]->is_file())
        {
            fetch($idx);
            $skip = 1;
        }
    }
    if ($skip)
    {
        next COMPARE;
    }

    if ($fns[0] lt $fns[1])
    {
        only_in(0);
    }
    elsif ($fns[1] lt $fns[0])
    {
        only_in(1);
    }
    else
    {
        system("diff", "-u", map {$_->path() } @results);
        foreach my $idx (@indexes)
        {
            fetch($idx);
        }
    }
}

foreach my $idx (@indexes)
{
    while($results[$idx])
    {
        only_in($idx);
    }
}

( As a bonus, we do not need to sort the results explicitly at any stage, because File-Find-Object sorts them for us. )

This program did not take me long to write, it works pretty well, and it does not populate a long list of results from one or both directories.
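The heart of the program is a merge over two sorted sequences, each with its own marker. Stripped of the file-system parts, it can be sketched on two plain arrays as follows; merge_sorted() and the sample file names are illustrative only.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: walk two sorted lists in step and report where each
# name occurs.
sub merge_sorted
{
    my ($left, $right) = @_;

    # One marker per list.
    my ($i, $j) = (0, 0);
    my @report;

    while ($i < @{$left} and $j < @{$right})
    {
        if ($left->[$i] lt $right->[$j])
        {
            # The left marker is behind: report its entry and advance it.
            push @report, "Only in left: " . $left->[$i];
            $i++;
        }
        elsif ($right->[$j] lt $left->[$i])
        {
            # The right marker is behind: report its entry and advance it.
            push @report, "Only in right: " . $right->[$j];
            $j++;
        }
        else
        {
            # Same name on both sides: this is where diff would be run.
            push @report, "In both: " . $left->[$i];
            $i++;
            $j++;
        }
    }

    # Whatever remains on either side has no counterpart.
    push @report, map { "Only in left: $_" } @{$left}[$i .. $#{$left}];
    push @report, map { "Only in right: $_" } @{$right}[$j .. $#{$right}];

    return @report;
}

my @report = merge_sorted(
    [qw(a.txt b.txt d.txt)],
    [qw(b.txt c.txt d.txt e.txt)],
);

print "$_\n" for @report;
```

In the real program the two arrays are replaced by two File::Find::Object instances, and advancing a marker becomes a call to fetch().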

Conclusion

If you use File-Find-Object instead of File::Find, your code may be cleaner, your logic less convoluted, and you may actually be able to achieve things that are not possible with the latter. I hope I have whetted your appetite here and convinced you to give File-Find-Object a try.

So what does the future hold? I recently ported File-Find-Rule to File-Find-Object and called the result File-Find-Object-Rule. As a result, "->start" and "->match" are now truly iterative, and I believe you can iterate with them on several objects at once. As I discovered by porting File-Find-Object-Rule-MMagic, I unfortunately cannot maintain full backwards compatibility with the plugin API of File-Find-Rule, because the latter exposes some of the behaviour of File::Find (in a leaky abstraction fashion).
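As a rough sketch of what iterating with "->start" and "->match" might look like, assuming File-Find-Object-Rule keeps the file() and name() rule methods of File-Find-Rule (the temporary directory and file names are made up for illustration):

```perl
#!/usr/bin/perl

use strict;
use warnings;

use File::Find::Object::Rule;
use File::Temp qw(tempdir);

# Throwaway directory with two .txt files and one other file.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(notes.txt todo.txt script.pl))
{
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close($fh);
}

# Assumption: file() and name() behave as they do in File::Find::Rule.
my $rule = File::Find::Object::Rule->file()->name('*.txt');
$rule->start($dir);

my @matches;
while (defined(my $path = $rule->match()))
{
    push @matches, $path;

    # Being iterative, the search can be abandoned after the first match.
    last if @matches >= 1;
}

print "$_\n" for @matches;
```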

I'm planning on porting more File-Find-Rule plugins to File-Find-Object-Rule, and would appreciate any help. I also would like to look at the directory tree traversal APIs of other languages to see if they contain any interesting techniques.


Comments:
From: shlomif
2009-07-19 02:54 pm (UTC)

Re: a typo and traversal order


Hi spx2! Thanks for your comment.

You are right that I had a problem in my code and that I should have either used "file-was-found" or "file-found" in both places. I guess I should have tested the code.

Regarding your question - File-Find-Object sorts the files lexicographically by default, so I can depend on their order to be consistent and predictable. What I do is keep two markers on the two result sets and make sure they are as synchronised as possible. If the first marker is lower than the second one, I increment it, and vice versa. If they are equal, then I have two matching filenames and I invoke diff on them.

Regards,

-- Shlomi Fish