MetaOcaml and high-performance [was: AST versus Ocaml]
Newsgroups: fa.caml
From: o...@okmij.org
Date: Mon, 09 Nov 2009 04:24:02 UTC
Local: Mon 9 Nov 2009 04:24
Subject: [Caml-list] MetaOcaml and high-performance [was: AST versus Ocaml]

Jon Harrop wrote:
> I did some experiments with MetaOCaml. Firstly, it is x86 only and not x64
> which means poor floating-point performance, many times slower than HLVM's.
> The language itself is also very restrictive, e.g. you cannot generate
> pattern matches dynamically so you cannot leverage the pattern match
> compiler, which is a huge loss. In essence, effective use of MetaOCaml
> requires writing in continuation passing style which OCaml and, therefore,
> MetaOCaml do not handle well (closure invocations are slow in OCaml and CPS
> is not optimized back into anything sane). So I do not consider MetaOCaml to
> be a viable option for performance users.

A few clarifications seem to be in order. First of all, the original
poster asked about _offshoring_ with MetaOCaml. When the generated
code is run with offshoring, a C or Fortran file is created, which can
be compiled with your favorite compiler and dynamically linked back
into the running OCaml program. Alternatively, you can use the
generated C/Fortran code as it is, as a part of a C/Fortran project. We
did exactly the latter in our FFT project: we used MetaOCaml to create C
files for FFT kernels, and plugged the files into the FFTW benchmarking
framework, which is pure C. It worked as expected.
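
To make the workflow concrete, here is a minimal sketch of the "generate a C file, then compile it with any C compiler" idea. It is only an illustration under stated assumptions: plain OCaml strings stand in for MetaOCaml code values, and the names `dot_expr` and `gen_dot_c` are hypothetical. Real offshoring translates staged code automatically and is not shown here.

```ocaml
(* Hypothetical illustration: strings stand in for MetaOCaml code values. *)

(* Build the C expression a[0]*b[0] + ... + a[n-1]*b[n-1]. *)
let dot_expr n =
  List.init n (fun i -> Printf.sprintf "a[%d]*b[%d]" i i)
  |> String.concat " + "

(* Emit a complete C function specialised to vectors of length n.
   The loop is fully unrolled at generation time. *)
let gen_dot_c n =
  if n <= 0 then invalid_arg "gen_dot_c: n must be positive";
  Printf.sprintf
    "double dot%d(const double *a, const double *b) {\n  return %s;\n}\n"
    n (dot_expr n)

(* Print the kernel for n = 3; redirect the output to a .c file to use it. *)
let () = print_string (gen_dot_c 3)
```

The emitted text is ordinary C, so it can be built for a 32-bit or 64-bit target and dropped into an existing C project, which is the point made above.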

Because offshoring produces a portable C or Fortran code file, you can
use the code on a 32-bit or 64-bit platform. The native MetaOCaml
without offshoring does not work on amd64 because, at that time,
OCaml didn't emit PIC code for amd64, so dynamic linking was
impossible. That problem has long been fixed in later versions of
OCaml. Offshoring is a good way to get around it, thanks Jan Kybic.

Offshoring could just as well produce Verilog or LLVM code. Alas,
we didn't get around to exploring that idea.

Regarding continuation-passing style (CPS): if your code generatOR
needs let-insertion, then the generatOR _may_ need to be encoded in
CPS. The generatED code is _not_ in CPS; it is in the `conventional',
so-called `direct style'. Even if we assume that CPS code is
difficult to compile efficiently (which is not certain: in CPS, all
`important' function calls are tail-calls, and OCaml is very good with
tail-calls), that difficulty affects only the code generator rather
than the generated code. Most of the time it is the generated code
that is performance-critical.
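
The distinction between a CPS generator and direct-style generated code can be sketched as follows. This is an illustration only, under these assumptions: a tiny hand-written expression type replaces MetaOCaml's code values, and the names `exp`, `letk`, `power` and `gen_power` are hypothetical. It is not the technique of any particular paper.

```ocaml
(* A tiny AST standing in for generated code (assumption: not MetaOCaml). *)
type exp =
  | Var of string
  | Mul of exp * exp
  | Let of string * exp * exp

(* Fresh-name supply for generated let-bindings. *)
let counter = ref 0
let fresh () = incr counter; Printf.sprintf "t%d" !counter

(* Let-insertion in CPS: bind [e] to a fresh name, then pass that name to
   the continuation [k], which builds the body of the let. *)
let letk (e : exp) (k : exp -> exp) : exp =
  let v = fresh () in
  Let (v, e, k (Var v))

(* The generatOR for x^n (n >= 1) by repeated squaring, written in CPS.
   Every recursive call is a tail call. *)
let rec power n x k =
  if n = 1 then k x
  else if n mod 2 = 0 then power (n / 2) x (fun y -> letk (Mul (y, y)) k)
  else power (n - 1) x (fun y -> letk (Mul (x, y)) k)

(* Printer for the generatED code. Operands of Mul are always variables
   in this sketch, so no parentheses are needed. *)
let rec show = function
  | Var v -> v
  | Mul (a, b) -> Printf.sprintf "%s * %s" (show a) (show b)
  | Let (v, e, body) ->
    Printf.sprintf "let %s = %s in %s" v (show e) (show body)

let gen_power n =
  if n < 1 then invalid_arg "gen_power: n must be at least 1";
  counter := 0;
  show (power n (Var "x") (fun r -> r))

(* Prints: let t1 = x * x in let t2 = t1 * t1 in let t3 = x * t2 in t3 *)
let () = print_endline (gen_power 5)
```

The generator uses continuations, but the printed result is a plain chain of let-bindings with no closures or continuations in it.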

One may say that writing the generator in CPS is a bother. I have
heard such objections. Please see our PEPM2009 paper about a way to
address it
        http://okmij.org/ftp/Computation/Generative.html#circle-shift
and write generators in the conventional style. Please see the example
of writing a flexible Gaussian Elimination code, paying no penalty for
abstractions.

Fortunately, some people have considered MetaOCaml to be a viable
option for performance users and have reported good results. For
example,

        Tuning MetaOCaml Programs for High Performance
        Diploma Thesis of Tobias Langhammer.
        http://www.infosun.fmi.uni-passau.de/cl/arbeiten/Langhammer.pdf

Here is a good quotation from the Introduction:

``This thesis proposes MetaOCaml for enriching the domain of high-performance
computing by multi-staged programming. MetaOCaml extends the OCaml
language.
..
    Benchmarks for all presented implementations confirm that the
execution time can be reduced significantly by high-level
optimizations. Some MetaOCaml programs even run as fast as respective
C implementations. Furthermore, in situations where optimizations in
pure MetaOCaml are limited, computation hotspots can be explicitly or
implicitly exported to C. This combination of high-level and low-level
techniques allows optimizations which cannot be obtained in pure C
without enormous effort.''

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


Newsgroups: fa.caml
From: Jon Harrop <j...@ffconsultancy.com>
Date: Tue, 10 Nov 2009 15:38:01 UTC
Local: Tues 10 Nov 2009 15:38
Subject: Re: [Caml-list] MetaOcaml and high-performance [was: AST versus Ocaml]
On Monday 09 November 2009 04:23:28 o...@okmij.org wrote:

> Because offshoring produces a portable C or Fortran code file, you can
> use the code on 32 or 64-bit platform. The reason the native MetaOCaml
> without offshoring does not work on amd64 is because at that time
> OCaml didn't emit PIC code for amd64. So, dynamic linking was
> impossible. That problem has long been fixed in later versions of
> OCaml...

Has the problem been fixed in MetaOCaml?

That thesis contains three benchmarks:

1. Dense float matrix-matrix multiply.

2. Blur of an int image matrix as convolution with a 3x3 stencil matrix.

3. Polynomial multiplication with distributed parallelism.

I don't know about polynomial multiplication (suffice it to say that it is not
leveraging shared-memory parallelism, which is what performance users value in
today's multicore era) but the code for the first two benchmarks is probably
10-100x slower than any decent implementation. For example, his fastest
2048x2048 matrix multiply takes 167s whereas Matlab takes only 3.6s here.
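
For scale, the two timings above can be converted into throughput. This is a rough sketch that assumes the classic algorithm with 2*n^3 floating-point operations, and it takes both timings at face value. The function names `flops` and `gflops` are hypothetical.

```ocaml
(* Assumption: classic O(n^3) multiply, i.e. 2*n^3 floating-point
   operations (one multiply and one add per inner-loop step). *)
let flops n = 2.0 *. (float_of_int n ** 3.0)

(* Throughput in GFLOP/s for an n-by-n multiply taking [seconds]. *)
let gflops n seconds = flops n /. seconds /. 1e9

let () =
  let n = 2048 in
  Printf.printf "167 s: %.3f GFLOP/s\n" (gflops n 167.0);
  Printf.printf "3.6 s: %.3f GFLOP/s\n" (gflops n 3.6);
  Printf.printf "ratio: %.1fx\n" (167.0 /. 3.6)
```

Under that assumption the quoted timings correspond to roughly 0.10 and 4.77 GFLOP/s, a factor of about 46.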

In essence, the performance gain (if any) from offshoring to C or Fortran is
dwarfed by the lack of shared-memory parallelism.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
