Shlomif's Technical Posts Community - Project Euler Problem #10 in Haskell, Perl and C [entries|archive|friends|userinfo]
Shlomif's Technical Posts Community

[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

Links
[Links:| Shlomi Fish's Homepage Main Journal Homesite Blog Planet Linux-IL Amir Aharoni in Unicode open dot dot dot ]

Project Euler Problem #10 in Haskell, Perl and C [Jan. 30th, 2010|02:50 pm]
Previous Entry Add to Memories Share Next Entry

shlomif_tech

[shlomif]
[Tags|, , , , , , , , , , , , , ]
[Current Location |Home]
[Current Music |Klein Four Group - Finite Simple Group of Order Two]

zerothorder told how he found a solution for Project Euler's Problem 10 in Haskell. The problem is "Find the sum of all primes less than 2,000,000". It is given here below:

primes :: [Integer]
primes = 2 : filter isPrime [3, 5 ..]
    where
        -- only check divisibility of the numbers less than the square root of n
        isPrime n = all (not . divides n) $ takeWhile (\p -> p*p <= n) primes
        divides n p = n `mod` p == 0
 
result = sum $ takeWhile (< 2000000) primes
 
main = do putStrLn( show result )

He says "If this doesn't give you a nerdgasm, I don't know what will.". The problem is that this nerdgasm will last a long time. Benchmarking this program gives that 20 iterations of it run at 310 seconds - less than - about 15 seconds each (on my Pentium 4 2.4GHz machine running Mandriva Linux Cooker). So next I tried a better Haskell implmenetation that I recalled from a thread I started in the Haskell Café mailing list about implementing a sieve of Eratosthenes in Haskell:

import Data.Int

primes :: Int64 -> [Int64]

primes how_much = sieve [2..how_much] where
         sieve (p:x) = 
             p : (if p <= mybound
                 then sieve (remove (p*p) x)
                 else x) where
             remove what (a:as) | what > how_much = (a:as)
                                | a < what = a:(remove what as)
                                | a == what = (remove (what+step) as)
                                | a > what = a:(remove (what+step) as)
             remove what [] = []
             step = (if (p == 2) then p else (2*p)) 
         sieve [] = []
         mybound = ceiling(sqrt(fromIntegral how_much))

--main = print (length (primes 1000000))
main = print (sum (primes 2000000))

This does not involve costly operations such as modulo or division and 20 iterations of it run at 135 wallclocks seconds - over two times faster than zeroth's Haskell version.

Now how about Perl? Since Perl has assignment, we have the advantage that we can create a vector of bits that we will mark with the primes by iterating over all numbers up to the root of the limit. Here is the code:

#!/usr/bin/perl

use strict;
use warnings;

use Math::BigInt lib => 'GMP';

my $limit = 2_000_000;

my $primes_bitmask = "";

my $loop_to = int(sqrt($limit));
my $sum = 0;
my $total_sum = Math::BigInt->new('0');

for my $p (2 .. $loop_to)
{
    if (vec($primes_bitmask, $p, 1) == 0)
    {
        $sum += $p;

        my $i = $p * $p;

        while ($i < $limit)
        {
            vec($primes_bitmask, $i, 1) = 1;
        }
        continue
        {
            $i += $p;
        }

    }
}

for my $p ($loop_to .. $limit)
{
    if (vec($primes_bitmask, $p, 1) == 0)
    {
        if (($sum += $p) > (1 << 30))
        {
            $total_sum += $sum;
            $sum = 0;
        }
    }
}

$total_sum += $sum;
print "$total_sum\n";

20 runs of it run at 95 walclock seconds - even faster than the Haskell version. But it gets better. Since all the primes we encounter greater than 2 are not even, we can create a map of their pseduo-halves and conserve on memory and iterations. This is the Perl version:

#!/usr/bin/perl

use strict;
use warnings;

use Math::BigInt lib => 'GMP';

my $limit = 2_000_000;

my $primes_bitmask = "";

my $loop_to = (int(sqrt($limit)))>>1;
my $half_limit = ($limit-1)>>1;

my $sum = 0+2;
my $total_sum = Math::BigInt->new('0');

for my $half (1 .. $loop_to)
{
    if (vec($primes_bitmask, $half, 1) == 0)
    {
        my $p = (($half<<1)+1);
        $sum += $p;

        my $i = ($p * $p)>>1;

        while ($i < $limit)
        {
            vec($primes_bitmask, $i, 1) = 1;
        }
        continue
        {
            $i += $p;
        }

    }
}


for my $half ($loop_to .. $half_limit)
{
    if (vec($primes_bitmask, $half, 1) == 0)
    {
        if (($sum += (($half<<1)+1)) > (1 << 30))
        {
            $total_sum += $sum;
            $sum = 0;
        }
    }
}

$total_sum += $sum;
print "$total_sum\n";

Running this 20 times takes 73 wallclock seconds, close to half that of my Haskell version.

Then I wondered how long C will take. Here is a C implementation without the halving:

#include <string.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define limit 2000000
int8_t bitmask[(limit+1)/8];

int main(int argc, char * argv[])
{
    int p, i;
    int mark_limit;
    long long sum = 0;

    memset(bitmask, '\0', sizeof(bitmask));
    mark_limit = (int)sqrt(limit);
    
    for (p=2 ; p <= mark_limit ; p++)
    {
        if (! ( bitmask[p>>3]&(1 << (p&(8-1))) ) )
        {
            /* It is a prime. */
            sum += p;
            for (i=p*p;i<=limit;i+=p)
            {
                bitmask[i>>3] |= (1 << (i&(8-1)));
            }
        }
    }
    for (; p <= limit; p++)
    {
        if (! ( bitmask[p>>3]&(1 << (p&(8-1))) ) )
        {
            sum += p;
        }
    }

    printf("%lli\n", sum);

    return 0;
}

This was too fast to measure with 20 runs alone, so 500 runs of it took 15 seconds, two or three orders of magnitude faster than the fastest Haskell or Perl versions. But naturally, we can apply the halving paradigm there too:

#include <string.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define limit 2000000
int8_t bitmask[(limit+1)/8/2];

int main(int argc, char * argv[])
{
    int half, p, i;
    int half_limit;
    int loop_to;
    long long sum = 0 + 2;

    memset(bitmask, '\0', sizeof(bitmask));

    loop_to=(((int)(sqrt(limit)))>>1);
    half_limit = (limit-1)>>1;
    
    for (half=1 ; half <= loop_to ; half++)
    {
        if (! ( bitmask[half>>3]&(1 << (half&(8-1))) ) )
        {
            /* It is a prime. */
            p = (half << 1)+1;
            sum += p;
            for (i = ((p*p)>>1) ; i < half_limit ; i+=p )
            {
                bitmask[i>>3] |= (1 << (i&(8-1)));
            }
        }
    }

    for( ; half < half_limit ; half++)
    {
        if (! ( bitmask[half>>3]&(1 << (half&(8-1))) ) )
        {
            sum += (half<<1)+1;
        }
    }

    printf("%lli\n", sum);

    return 0;
}

500 runs of it take 10 wallclock seconds - 54.35 times per second, and 50% better than the previous C version. And I still haven't applied platform-specific gcc optimisations.

I should also note that the executables generated by ghc are extremely large in comparison to their C ones:

$ ls -l c_* haskell_*
-rwxr-xr-x 1 shlomi shlomi   6082 2009-12-04 07:46 c_mine
-rwxr-xr-x 1 shlomi shlomi   6103 2009-12-04 07:46 c_mine_half
-rwxr-xr-x 1 shlomi shlomi   6092 2009-12-04 07:46 c_mine_micro_opt
-rwxr-xr-x 1 shlomi shlomi 796825 2009-12-04 07:56 haskell_mine
-rwxr-xr-x 1 shlomi shlomi 571717 2009-12-04 07:46 haskell_zeroth

c_mine_half is less than 1% the size of haskell_mine (and runs faster). When talking about this to other people, they said that Haskell has a very optimised primes sequence generator, which I can try using (which should be over 3 times as fast), and that it has two kinds of integers, which the other type is faster, and that it has a better way to emulate assignment. But the bottom line is that the naïve and intuitive way to write such programs in Haskell is under-performant, even in comparison to Perl, and 100 or 1,000 times as much in comparison to C.

I've written this as a separate post and not as a comment to the original blog post because I'm very limited with the markup in the commenting there (I'm going to post a comment there with a link to this blog post, though). I should note that you can find all the code I mentioned inside a dedicated Github repository, and you can experiment with it further.

LinkReply

Comments:
From: (Anonymous)
2010-01-30 03:45 pm (UTC)

nsieve Haskell GHC

(Link)

>> the naïve and intuitive way to write such programs in Haskell is under-performant <<

So don't use "the naïve and intuitive way".

2.07s nsieve gcc
2.12s nsieve ghc

http://shootout.alioth.debian.org/gp4/program.php?test=nsieve&lang=ghc&id=1
From: (Anonymous)
2010-01-30 06:13 pm (UTC)

Re: nsieve Haskell GHC

(Link)

I agree you shouldn`t balk if your naive code underperforms, but I'm not sure about the "intuitive" part. Isn't one of the points of these higher level languages the fact that they are more... er... intuitive?
[User Picture]From: shlomif
2010-04-05 01:26 pm (UTC)

Re: nsieve Haskell GHC

(Link)

Hi Anonymous.

Very nice comment. I admit it's very insightful. Thanks for sharing.
From: (Anonymous)
2010-01-30 06:21 pm (UTC)

Re: nsieve Haskell GHC

(Link)

what if you have to solve a new problem that isn't in the lib?
[User Picture]From: shlomif
2010-01-30 07:30 pm (UTC)

Re: nsieve Haskell GHC

(Link)

Mr./Ms. Anonymous!

Your URL is broken and it's not a hyperlink. I'm tempted to delete your comment. I don't know what this nsieve is, but next time enter your text in HTML markup and preview it before you submit. I've been getting many comments like that lately. Presentation is very important.

Regards,

Shlomi Fish
From: (Anonymous)
2010-01-30 06:41 pm (UTC)

ghci vs. ghc?

(Link)

Are you running the Haskell code in the interpreter? When I compile your first program with `ghc -O2 --make prime` a single run takes about 1.8 seconds on my hardware (similar to yours).
[User Picture]From: shlomif
2010-01-30 07:21 pm (UTC)

Re: ghci vs. ghc?

(Link)

No, I'm not - I'm using ghc. See the Makefile:

HS_OPTS = -O2 -fvia-c -optc-O2

haskell_mine: 10.hs
    ghc $(HS_OPTS) -o $@ $<

I have ran about 20 runs one after the other and naturally there's some overhead of Perl 5 and its Benchmark.pm module. My latest benchmarks show that 20 runs of my algorithm after compiled to a final binary take 60 seconds (or 3 seconds per run on average) - comparable to yours.

From: (Anonymous)
2010-01-30 09:58 pm (UTC)

Re: ghci vs. ghc?

(Link)

It takes 1.7s / run on my laptop (2.66GHz Intel Core 2 Duo).
With a very simple change, computing the primes as Int and converting to Integer before summing I get 1.1s.

This is, of course, nowhere near the speed of your C code. But when you have a particular problem to solve you should solve it as easy as possible, and I think a running time of < 2s is quite acceptable.
[User Picture]From: shlomif
2010-01-31 01:47 pm (UTC)

Re: ghci vs. ghc?

(Link)

Hi.

1.7s / run sounds reasonable for a 2.66 GHz Intel Core 2 Duo. My machine is a Pentium 4 2.4GHz machine. You can optimise the Haskell program further by keeping the sum as an Int64 instead of an Integer, because an Int64 should be faster and the sum does not exceed it in this case. Can you show us any code?