OC4MC : OCaml for Multicore architectures
Mark Wong-VanHaren  
Newsgroups: fa.caml
From: Mark Wong-VanHaren <mark...@gmail.com>
Date: Fri, 25 Sep 2009 01:33:24 -0700 (PDT)
Local: Fri 25 Sep 2009 09:33
Subject: Re: OC4MC : OCaml for Multicore architectures
On Sep 24, 8:39 am, Jon Harrop <j...@ffconsultancy.com> wrote:

> Indeed. I have no idea how well received JoCaml has been but am certain that
> your work is of huge value.

My opinion: JoCaml is terrific.  Beautiful abstractions; a joy to use.

Absent an oc4mc-like change to OCaml's GC, one must use multiple OS
processes to obtain physical parallelism.  As a result, with
"distributed" JoCaml, speedup is possible only with coarse-grained
tasks (of, say, >1ms).  In such cases, it works great; we're using it
for physical parallelism in two commercial projects.

The marriage of JoCaml with a fast implementation of parallel GC would
be exciting.
--
Mark


Philippe Wang  
Newsgroups: fa.caml
From: Philippe Wang <philippe.w...@lip6.fr>
Date: Fri, 25 Sep 2009 09:33:51 UTC
Local: Fri 25 Sep 2009 10:33
Subject: Re: [Caml-list] OC4MC : OCaml for Multicore architectures

On Sep 25, 2009, at 6:07 AM, Jacques Garrigue wrote:

> First, like everybody else, I'd like very much to try this out.
> Is there any chance it could compile on Snow Leopard :-)
> (I suppose it's near impossible, but still ask...)

I haven't tried that yet, mostly because I guess that it wouldn't work  
out-of-the-box.
However, the .asm file should be ok with OS X and what may clash are  
configure file behavior and C macros.
I should take a closer look at that, since SL now seems to work well.

Cheers,

--
Philippe Wang
   Philippe.W...@lip6.fr
   http://www-apr.lip6.fr/~pwang/

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


Jon Harrop  
Newsgroups: fa.caml
From: Jon Harrop <j...@ffconsultancy.com>
Date: Fri, 25 Sep 2009 10:06:38 UTC
Local: Fri 25 Sep 2009 11:06
Subject: Re: [Caml-list] OC4MC : OCaml for Multicore architectures
On Friday 25 September 2009 08:32:26 Hugo Ferreira wrote:

> Put it another way; if parallel/concurrent programming could be
> easily used with a minimum of effort then I believe "most people"
> would use it simply because it is available.

Once your run-time supports it, you just need a library that farms tasks out
to threads via queues and a lot of parallelism really is easy.
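
A minimal sketch of such a task-farming library, assuming a run-time whose threads really do execute in parallel. It relies on the `Domain` and `Mutex` modules from the OCaml 5 standard library, which did not exist when this thread was written; `run_tasks` and `num_workers` are names invented for illustration.

```ocaml
(* Requires OCaml >= 5.0: Domain, Mutex and Queue are all in the stdlib. *)

(* Run every task on a pool of [num_workers] domains that pull work from a
   shared queue. Results come back in the same order as the input list. *)
let run_tasks ~num_workers (tasks : (unit -> 'a) list) : 'a list =
  let queue = Queue.create () in
  List.iteri (fun i task -> Queue.add (i, task) queue) tasks;
  let lock = Mutex.create () in
  let results = Array.make (List.length tasks) None in
  let rec worker () =
    (* Only the queue access is serialized; the task itself runs unlocked. *)
    Mutex.lock lock;
    let job = Queue.take_opt queue in
    Mutex.unlock lock;
    match job with
    | None -> ()                      (* queue drained: this worker stops *)
    | Some (i, task) ->
        results.(i) <- Some (task ()); (* each task owns its own slot *)
        worker ()
  in
  let domains = List.init num_workers (fun _ -> Domain.spawn worker) in
  List.iter Domain.join domains;
  List.map Option.get (Array.to_list results)
```

For example, `run_tasks ~num_workers:4 (List.init 10 (fun i () -> i * i))` evaluates the ten closures on four domains. The queue is filled before the workers start, so no condition variable is needed in this simplified form.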

>  >...
>  > If I tell you that you just have to modify a bit your program to get a
>  > near linear speedup, then it looks great. But in practice it is rather
>  > having to rethink completely your algorithm, to eventually get a
>  > speedup bounded by bandwidth, and starting from a point lower than the
>  > original single thread program.
>  >...

> Rethinking our application/algorithmic structure may not be a real
> deterrent. An application does not require parallel/concurrent
> processing everywhere. It is really a question of identifying where
> and when this is useful. Much like selecting the most "appropriate"
> data-structure for any application. It's not an all or nothing
> proposition.

Right. Parallelizing programs generally consists of identifying a performance
bottleneck via measurement and performing the outermost parallelizable loops
in parallel. You can do many more clever things but they are far less common.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

kche...@math.carleton.ca  
Newsgroups: fa.caml
From: kche...@math.carleton.ca
Date: Fri, 25 Sep 2009 13:04:53 UTC
Local: Fri 25 Sep 2009 14:04
Subject: Re: [Caml-list] OC4MC : OCaml for Multicore architectures

> On Friday 25 September 2009 08:32:26 Hugo Ferreira wrote:
>> Put it another way; if parallel/concurrent programming could be
>> easily used with a minimum of effort then I believe "most people"
>> would use it simply because it is available.

> Once your run-time supports it, you just need a library that farms tasks
> out to threads via queues and a lot of parallelism really is easy.

I wonder if Snow Leopard's Grand Central Dispatch
is of relevance here.  But then, it'll be OS-specific.

Philippe Wang  
Newsgroups: fa.caml
From: Philippe Wang <philippe.wang.li...@gmail.com>
Date: Fri, 25 Sep 2009 14:12:02 UTC
Local: Fri 25 Sep 2009 15:12
Subject: Re: [Caml-list] OC4MC : OCaml for Multicore architectures

On Fri, Sep 25, 2009 at 1:28 AM, Jon Harrop <j...@ffconsultancy.com> wrote:
> On Thursday 24 September 2009 15:38:06 Philippe Wang wrote:
>> Very few programs that are not written with multicore in mind would
>> not be penalized.
>> I mean our GC is much much dumber than INRIA OCaml's one.
>> Our goal was to show it was possible to have good performance with
>> multicores for OCaml.
>> Maybe someday we'll find some time to optimize the GC, but it's likely
>> not very soon.

> Just to quantify this with a data point: the fastest (serial) version of my
> ray tracer benchmark is 10x slower with the new GC. However, this is
> anomalous with respect to complexity and the relative performance is much
> better for simpler renderings. For example, the new GC is only 1.7x slower
> with n=6 instead of n=9.

I just put up a version (20090925) with a bug fix for the allocation of
some structures. I hope it removes this anomaly.

--
Philippe Wang
   m...@philippewang.info

Xavier Leroy  
Newsgroups: fa.caml
From: Xavier Leroy <Xavier.Le...@inria.fr>
Date: Fri, 25 Sep 2009 15:05:18 UTC
Local: Fri 25 Sep 2009 16:05
Subject: Re: [Caml-list] OC4MC : OCaml for Multicore architectures

Jon Harrop wrote:
> On Thursday 24 September 2009 13:39:40 Stefano Zacchiroli wrote:
>> On Thu, Sep 24, 2009 at 12:52:24PM +0100, Jon Harrop wrote:
>>> The next steps are to get oc4mc into the apt repositories and build
>> Uhm, I'm curious: how do you plan to achieve that?

> Good question. I have no idea, of course. :-)

That would be suicidal.  I definitely do not want to belittle the work
of Philippe and his teammates -- what they did is an amazing hack
indeed --, but you need to keep in mind the difference between a
proof-of-concept experiment and a product.

In a proof-of-concept experiment, you implement the feature you want to
experiment with and keep everything else as simple as possible
(otherwise there is little chance that you'll complete the
experiment).  That's exactly what Philippe et al did, and rightly so:
their GC is about the simplest you can think of, they didn't bother
adapting some features of the run-time system, they target AMD64/Unix
only, etc.  Now they have a platform they can experiment with and make
measurements on: mission accomplished.

In a product, you'd need something that is essentially a drop-in
replacement for today's OCaml and can run, say, Coq with at most a 10%
slowdown.  That's a long way to go (I'd say a couple of years of work).
For example, single-generation stop-and-copy GC is known to have
terrible performance (both in running time and in latency) for
programs that have large data sets and allocate intensively.  This is
true in the sequential case and even worse in a stop-the-world
parallel setting, by Amdahl's law.  Note that the programs I mentioned
above are exactly those that the Caml user community cares most about
-- not matrix multiply nor ray tracers, Harrop's propaganda
notwithstanding -- and those for which OCaml has been delivering
top-class performance for the last 12 years -- again, Harrop's
propaganda notwithstanding.

On your way to a product, you'd need independently-collectable
generations (which means some work on the compiler as well), plus a
parallel or even better concurrent major collector.  And of course a
lot more work on the runtime system and C interface to make everything
truly reentrant while remaining portable.  And probably some kind of
two-level scheduler for threads.  And after all that work
you'd end up with an extremely low-level and unsafe parallel
programming model that you'd need to tame by developing clever
libraries that mere mortals can use effectively (Apple's Grand Central
was mentioned on this thread; it's a good example)...

In summary, Philippe and his coauthors do deserve a round of applause,
but please keep a cool head.

- Xavier Leroy

Jon Harrop  
Newsgroups: fa.caml
From: Jon Harrop <j...@ffconsultancy.com>
Date: Fri, 25 Sep 2009 21:28:39 UTC
Local: Fri 25 Sep 2009 22:28
Subject: Re: [Caml-list] OC4MC : OCaml for Multicore architectures
On Friday 25 September 2009 05:07:21 Jacques Garrigue wrote:

> Your benchmark seems strange to me, as you are comparing apples with
> oranges.

In some sense, yes. I was interested in the performance of the de facto
standard hash table implementations and not the performance that can
be obtained by reinventing the wheel.

> Hashtables in Python are a basic feature of the language,
> and they are of course implemented in C. In ocaml, they are
> implemented in ocaml (except the hashing function, which has to be
> polymorphic), using an array of association lists!
> (Actually the pairs are flattened for better performance, but still)
> What is impressive is that you don't need any special optimization to
> get reasonably good performance.

OCaml is 4x slower than F# on that benchmark for several reasons:

1. Overhead of 31-bit int arithmetic.

2. Lack of constant table sizes in the implementation and OCaml's failure to
optimize mod-by-a-constant.

3. No monomorphization.

You can write a far more efficient hash table implementation in F# than you
can in OCaml because it addressed all of those deficiencies.

> Actually the only tuning you need is to start from a reasonable table size,
> which you didn't...

No, the exact opposite is true: OCaml had the unfair advantage of starting
from the optimal table size for the problem whereas F# started from the
default size and had to resize. If you level the playing field then OCaml is
8x slower than F#.
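
For readers unfamiliar with the table-size issue: OCaml's `Hashtbl.create` takes an initial size hint, and a table that starts too small has to grow as it fills. A hypothetical micro-benchmark (not the one whose numbers are quoted in this thread) might look like the sketch below; `fill` and `time` are illustrative names, and the timings will vary by machine and compiler version.

```ocaml
(* Build an int -> int table with [n] bindings, starting from a given
   initial size hint. *)
let fill ~initial_size ~n =
  let table : (int, int) Hashtbl.t = Hashtbl.create initial_size in
  for i = 1 to n do
    Hashtbl.replace table i (2 * i)
  done;
  table

(* Report the processor time taken by [f ()] and return its result. *)
let time label f =
  let start = Sys.time () in
  let result = f () in
  Printf.printf "%s: %.3fs\n" label (Sys.time () -. start);
  result

let () =
  let n = 1_000_000 in
  (* Starts tiny, so the table resizes repeatedly while filling. *)
  ignore (time "initial size 1" (fun () -> fill ~initial_size:1 ~n));
  (* Starts at the final size, so little or no resizing is needed. *)
  ignore (time "initial size n" (fun () -> fill ~initial_size:n ~n))
```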

> > Even if that were not the case, the idea of cherry picking interpreted
> > scripting languages to compete with because OCaml has fallen so far
> > behind mainstream languages (let alone modern languages) is embarrassing.
> > What's next, OCaml vs Bash for your high performance needs?

> OCaml was never touted as an HPC language!

I started learning OCaml because people were running high performance OCaml
code on a 256-CPU supercomputer in Cambridge. I have been touting OCaml for
HPC ever since. Thousands of scientists and engineers all over the world have
used OCaml for technical computing and chose it precisely because it was
competitively performant.

> The only claim I've seen is that it intends to stay within 2x of C for most
> applications. (Which is not so easy these days, gcc getting much faster.)

Yes. The infrastructure for compiler writers is improving rapidly as well
though, e.g. LLVM.

> Actually, I believe that Philippe's point is rather different.
> Making a functional language work well on multicores is difficult.
> If I tell you that you just have to modify a bit your program to get a
> near linear speedup, then it looks great. But in practice it is rather
> having to rethink completely your algorithm,

Sure. The free lunch is over. However, the solution usually consists either of
spawning independent computations or parallelizing outer loops, both of which
can be made very easy by the language implementor.
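
A rough sketch of the "parallelize the outer loop" case, assuming a parallel run-time. It uses `Domain` from the OCaml 5 standard library, which postdates this thread; `parallel_sum` and its parameters are made-up names, and `num_domains` is assumed to be at least 1.

```ocaml
(* Requires OCaml >= 5.0. *)

(* Sum [f i] for i in [0, n) by splitting the index range into contiguous
   chunks and running each chunk's loop on its own domain. *)
let parallel_sum ~num_domains ~n (f : int -> float) : float =
  let chunk = (n + num_domains - 1) / num_domains in
  let worker d () =
    let lo = d * chunk and hi = min n ((d + 1) * chunk) in
    let acc = ref 0.0 in
    for i = lo to hi - 1 do
      acc := !acc +. f i
    done;
    !acc
  in
  let domains = List.init num_domains (fun d -> Domain.spawn (worker d)) in
  (* Joining is the only synchronization: each domain owns its accumulator. *)
  List.fold_left (fun total d -> total +. Domain.join d) 0.0 domains
```

The loop body is unchanged from what a sequential version would contain; only the range splitting and the final reduction are added around it.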

> to eventually get a speedup bounded by bandwidth,

For some applications under certain circumstances, yes.

> and starting from a point lower than the original single thread program.

Yes.

> There are applications for that (ray tracing is one), but this is not the
> kind of needs most people have.

Not the kind of needs the remaining OCaml programmers have, perhaps. Outside
the OCaml world, a lot of people are now programming for multicores.

> By the way, I was discussing with numerical computation people working
> on BLAS the other day, and their answer was clear: if you need high
> performance, better use a grid than SMP, since bandwidth is
> paramount.

That is a false dichotomy. Grids are inevitably composed of multicores so you
will still lose out if you fail to leverage SMP when programming for a grid.

> ...And you have to write in C or FORTRAN (or asm), because the timing of
> instructions matter.

I have written linear algebra code in F# that outperforms Intel's vendor tuned
Fortran (the MKL) by a substantial margin on Intel hardware. Moreover, their
code only works on certain types whereas mine is generic.

OCaml is an excellent language for this kind of work but it requires an
implementation with a performance profile that is very different from
OCaml's.

> The funniest part was that those people were working on integer
> computations, but had to stick to floating point, because timing on integers
> is unpredictable, making synchronization harder.  

Interesting.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Gerd Stolpmann  
Newsgroups: fa.caml
From: Gerd Stolpmann <g...@gerd-stolpmann.de>
Date: Fri, 25 Sep 2009 21:34:44 UTC
Local: Fri 25 Sep 2009 22:34
Subject: Re: [Caml-list] OC4MC : OCaml for Multicore architectures

> Rethinking our application/algorithmic structure may not be a real
> deterrent. An application does not require parallel/concurrent
> processing everywhere. It is really a question of identifying where
> and when this is useful. Much like selecting the most "appropriate"
> data-structure for any application. It's not an all or nothing
> proposition.

Well, if you get many cores for free it sounds logical to get the most
out of them. If you have to pay for extra cores, it quickly becomes a bad
deal. Imagine you can parallelize 50% of the runtime of the application.
Even if you have as many cores as you want, and the runtime of the
sped-up part drops to almost 0, the other still-sequential 50% limits the
overall reduction in runtime to 50%, i.e. a speedup of at most 2x.
(That's known as Amdahl's law; Xavier also mentioned it.) So, especially
when you have many cores, it is not the number of cores that limits the
speed-up in practice, but the fraction of the algorithm that can be
parallelized at all.
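
The arithmetic can be checked with a few lines of OCaml. This is only a sketch of the textbook formula, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the runtime and n the number of cores; `amdahl_speedup` is an illustrative name.

```ocaml
(* Amdahl's law: overall speedup on [n] cores when a fraction [p] of the
   original runtime can be parallelized perfectly. *)
let amdahl_speedup ~p ~n = 1.0 /. ((1.0 -. p) +. (p /. float_of_int n))

let () =
  (* With p = 0.5 the speedup stays below the bound 1 / (1 - p) = 2,
     however many cores are added. *)
  List.iter
    (fun n ->
      Printf.printf "%4d cores: %.2fx\n" n (amdahl_speedup ~p:0.5 ~n))
    [ 1; 2; 4; 16; 100 ]
```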

I'm working for a company that uses Ocaml in a highly parallelized
world. We are running it on grid-style compute clusters to process text
and symbolic data. We are using multi-processing, which is easy to do
with current Ocaml. Programs we write often run on more than 100 cores.
Guess what our biggest problem is? Getting all the cores busy. Because
there is always also some sequential part, or buggy parallel part that
limits the overall throughput. We are constantly searching for these
"bottlenecks" as our managers call this phenomenon (and we get a lot of
pressure because the company pays a lot for these many cores, and they
want to see them utilized).

We have the big advantage that our data sets are already organized in an
easy-to-parallelize way, i.e. you can usually split it up into
independent portions, and process them independently (but not always).
If you cannot do this (as in a multi-core-capable GC, where some part of
the heap is always shared by all cores), things quickly become very
complicated. So I generally do not expect much from such a GC.

We are also using Java with its multi-core GC. However, we are sometimes
seeing better performance when we don't scale it to the full number of
cores the system has, but also combine it with multi-processing (i.e.
start several Javas). My guess is simply that the GC at some point runs
into lock contention, and has to do many things sequentially.

So, I'm a professional and massive user of multi-core programming.
Nevertheless, my first wish is not to get a multi-core GC for
shared-memory parallelism, because I doubt we will ever get a satisfactory
solution. My first wish is to make single-threaded execution as fast as
possible. The second one is to make RPC's cheaper, especially between
processes on the same system (or put it this way: I'd like to see that
the processes normally have their private heaps and are fully separated,
but also that they can use a shared memory segment by explicitly moving
values there - in the direction of Richard's Ancient module - so that it
is possible to make an RPC call by moving data to this special segment).
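
For contrast with the shared-segment idea, a minimal sketch of what passing a value between processes with private heaps involves when no such segment exists: the sender serializes the value to bytes and the receiver rebuilds a copy. Only the standard `Marshal` module is used; the transport (pipe, socket) is left out, and `request`, `encode` and `decode` are illustrative names, not part of any RPC library mentioned here.

```ocaml
(* A hypothetical RPC payload. *)
type request = { id : int; words : string list }

(* Sender side: flatten the value into a byte string. *)
let encode (r : request) : string = Marshal.to_string r []

(* Receiver side: rebuild the value. Marshal is not type-safe, so the
   annotation must match what the sender actually encoded. *)
let decode (s : string) : request = Marshal.from_string s 0

let () =
  let r = { id = 1; words = [ "multi"; "core" ] } in
  let copy = decode (encode r) in
  assert (copy = r);   (* structurally equal to the original *)
  assert (copy != r)   (* but a freshly allocated, distinct value *)
```

The cost of each call is proportional to the size of the value, which is the overhead an explicitly shared segment would aim to avoid.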

Of course, I appreciate any work on multi-core improvements, so applause
to Philippe and team.

Gerd
--
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany
g...@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------

Benjamin Canou  
Newsgroups: fa.caml
From: Benjamin Canou <benjamin.ca...@gmail.com>
Date: Fri, 25 Sep 2009 23:27:15 UTC
Local: Sat 26 Sep 2009 00:27
Subject: Re: [Caml-list] OC4MC : OCaml for Multicore architectures
  Hi everyone,

And let's have a little prayer for Philippe who is now in bed, suffering
from his head and hands because of his teammates letting him answer all
the mail.
Just (half) kidding.

So,

Xavier Leroy wrote (and probably described the work quite well):

> what they did is an amazing hack [1]
> indeed --, but you need to keep in mind the difference between a
> proof-of-concept experiment and a product.

From reading some messages in this thread, I think we need to clarify again
the context and goals of OC4MC.

One of our main goals for OC4MC is to serve as a parallel and shared
memory low-level concurrency implementation, on top of which higher
level research concurrency libraries and language extensions can be
built. And as most of us agree, multicores, and soon manycores, are hard
to program, in particular because of the memory bandwidth. So there
probably are experiments to be done to help this at the language level,
now that we have this parallel runtime. Moreover, and to answer a
question that appeared in this thread, we provide our simple GC, but we
separated the GC algorithm from the runtime, so OC4MC is also a
low-level playground to experiment with your own GCs and choose the one
you want to use at linking.

To sum up, let's see OC4MC as an experimentation platform that lifts
some restrictions of OCaml, but of course neither as a drop-in
replacement for the official distribution nor as the future of OCaml. We
do not claim that the ideal solution to bring shared memory parallelism
to OCaml is, as OC4MC does, only to replace the runtime (and that INRIA
can just replace the official runtime by our hacked one).  However, from
a pragmatic (and optimistic) point of view, the modifications to the
compiler have been kept very lightweight, yet sufficient to break binary
compatibility. So if the excitement continues around OC4MC as in this
thread, maybe these modifications could be integrated into the
distribution since they really do not touch the core of the compiler and
cannot cause a lot of maintenance overhead.

I will add that we did not make this experiment to beat F# or python's
hashtables, so I will not comment on that here. The point about
performance is that it should be *predictable*.  We have now rewritten
and debugged most of the memory related behaviors present in the
original runtime in a more generic (and OC4MC friendly) way to achieve
this, and if it's not the case for some particular cases, we'll be glad
to (try to) fix these bugs.

On the maintenance side, as Philippe said, we already have some half
working version with ocaml 3.11.x, but partly because of the changes
made to the native runtime in this release and partly because of [1],
porting the patch is not trivial.

Cheers and have fun experimenting with OC4MC (so it will compensate for
the amount of debugging we spent on it ;-) ).
  Benjamin.

kche...@math.carleton.ca  
Newsgroups: fa.caml
From: kche...@math.carleton.ca
Date: Sat, 26 Sep 2009 00:46:22 UTC
Local: Sat 26 Sep 2009 01:46
Subject: Re: [Caml-list] OC4MC : OCaml for Multicore architectures

> I will add that we did not make this experiment to beat F# or python's
> hashtables, so I will not comment on that here. The point about
> performance is that it should be *predictable*.

Perhaps an off-topic and naive question:
What does it take to beat F# and still
have predictable performance?

In any case, OC4MC is very encouraging.
Congrats to the team!

Jon Harrop  
Newsgroups: fa.caml
From: Jon Harrop <j...@ffconsultancy.com>
Date: Sat, 26 Sep 2009 01:42:42 UTC
Local: Sat 26 Sep 2009 02:42
Subject: Re: [Caml-list] OC4MC : OCaml for Multicore architectures
On Saturday 26 September 2009 01:45:50 kche...@math.carleton.ca wrote:

> Perhaps an off-topic and naive question: What does it take to beat F# and
> still have predictable performance?

Provided you're talking about today's machines and don't care about pause
times, HLVM with a parallel GC (not unlike the oc4mc one) and a task library
would beat F# and still have predictable performance.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

kche...@math.carleton.ca  
Newsgroups: fa.caml
From: kche...@math.carleton.ca
Date: Sat, 26 Sep 2009 13:51:39 UTC
Local: Sat 26 Sep 2009 14:51
Subject: Re: [Caml-list] OC4MC : OCaml for Multicore architectures

> On Saturday 26 September 2009 01:45:50 kche...@math.carleton.ca wrote:
>> Perhaps an off-topic and naive question: What does it take to beat F#
>> and still have predictable performance?

> Provided you're talking about today's machines and don't care about pause
> times, HLVM with a parallel GC (not unlike the oc4mc one) and a task
> library would beat F# and still have predictable performance.

If I understand correctly, HLVM is an
analog of Microsoft's CLR.  So theoretically,
one can build a compiler for ocaml that
compiles to HLVM.  Would that make ocaml
beat F#?

Kevin.

Jon Harrop  
Newsgroups: fa.caml
From: Jon Harrop <j...@ffconsultancy.com>
Date: Sat, 26 Sep 2009 14:35:28 UTC
Local: Sat 26 Sep 2009 15:35
Subject: Re: [Caml-list] OC4MC : OCaml for Multicore architectures
On Saturday 26 September 2009 14:51:21 kche...@math.carleton.ca wrote:

> > On Saturday 26 September 2009 01:45:50 kche...@math.carleton.ca wrote:
> >> Perhaps an off-topic and naive question: What does it take to beat F#
> >> and still have predictable performance?

> > Provided you're talking about today's machines and don't care about
> > pause times, HLVM with a parallel GC (not unlike the oc4mc one) and a
> > task library would beat F# and still have predictable performance.

> If I understand correctly, HLVM is an
> analog of Microsoft's CLR.

HLVM certainly draws upon ideas from the CLR but it is different in many
respects. One important advantage of HLVM over the CLR is that it handles
structs correctly in the presence of tail calls (thanks to LLVM). This means
that tuples can be represented (in the absence of polymorphic recursion) as
unboxed C structs which *greatly* reduces the burden on the garbage
collector. HLVM also uses a far superior code generator (LLVM) compared to
the CLR and OCaml.

> So theoretically,
> one can build a compiler for ocaml that
> compiles to HLVM.  Would that make ocaml
> beat F#?

That would beat the performance of F# with minimal effort. That was the goal
of my HLVM hobby project but I was forced to shelve it when the recession
hit. Hopefully I'll get back to it in 2010...

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Jon Harrop  
Newsgroups: fa.caml
From: Jon Harrop <j...@ffconsultancy.com>
Date: Sat, 26 Sep 2009 16:44:47 UTC
Local: Sat 26 Sep 2009 17:44
Subject: Re: [Caml-list] OC4MC : OCaml for Multicore architectures
On Friday 25 September 2009 22:39:42 Jon Harrop wrote:

> On Friday 25 September 2009 05:07:21 Jacques Garrigue wrote:
> > Hashtables in Python are a basic feature of the language,
> > and they are of course implemented in C. In ocaml, they are
> > implemented in ocaml (except the hashing function, which has to be
> > polymorphic), using an array of association lists!
> > (Actually the pairs are flattened for better performance, but still)
> > What is impressive is that you don't need any special optimization to
> > get reasonably good performance.

> OCaml is 4x slower than F# on that benchmark...

That was mapping int -> int where OCaml has the unfair advantage of optimal
initial size. If you map float -> float and give F# an initial size then it
is over 18x faster than OCaml. The reason is, of course, OCaml's data
representation strategy that is optimized for Xavier's Coq.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Jon Harrop  
Newsgroups: fa.caml
From: Jon Harrop <j...@ffconsultancy.com>
Date: Sat, 10 Oct 2009 04:01:14 UTC
Local: Sat 10 Oct 2009 05:01
Subject: Re: [Caml-list] OC4MC : OCaml for Multicore architectures
On Saturday 26 September 2009 00:26:50 Benjamin Canou wrote:

> On the maintenance side, as Philippe said, we already have some half
> working version with ocaml 3.11.x, but partly because of the changes
> made to the native runtime in this release and partly because of [1],
> porting the patch is not trivial.

OC4MC seems to work very well for numerical problems that do not allocate at
all but introducing even the slightest mutation (not even in the inner loop)
completely destroys performance and scaling. I'm guessing the reason is that
any allocations eventually trigger collections and those copy the entire
heap which, in this case, consists almost entirely of float array arrays.

My guess was that using big arrays would alleviate this problem by placing
most of the data outside the OCaml heap (I'm guessing that oc4mc leaves the
element data of a big array alone and copies only the small reference to
it?). However, it does not seem to handle bigarrays:

./out/lib/ocaml//libbigarray.a(bigarray_stubs.o): In function `caml_ba_compare':
bigarray_stubs.c:(.text+0x1e5): undefined reference to `caml_compare_unordered'
bigarray_stubs.c:(.text+0x28d): undefined reference to `caml_compare_unordered'
collect2: ld returned 1 exit status
Error during linking

If I am correct then I would value functioning bigarrays above OCaml 3.11
support.
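
To make the distinction concrete, here is a minimal sketch of the two representations, written against the standard OCaml runtime. The function names are assumptions made for illustration, and whether oc4mc's collector treats bigarray data the same way is exactly the guess made above, not something this sketch establishes. In standard OCaml a float array array is an outer array of pointers to flat float rows, all on the OCaml heap, whereas a Bigarray keeps its element data outside the heap and leaves only a small handle for the collector to deal with.

```ocaml
(* Hypothetical helper names; a sketch, not code from the program discussed. *)

(* Heap representation: every row is a separate block on the OCaml heap. *)
let make_heap_matrix n m : float array array =
  Array.make_matrix n m 0.0

(* Bigarray representation: element data is allocated outside the OCaml heap. *)
let make_big_matrix n m =
  let a = Bigarray.Array2.create Bigarray.float64 Bigarray.c_layout n m in
  Bigarray.Array2.fill a 0.0;
  a

(* Sum every element of a two-dimensional bigarray. *)
let sum_big_matrix a =
  let total = ref 0.0 in
  for i = 0 to Bigarray.Array2.dim1 a - 1 do
    for j = 0 to Bigarray.Array2.dim2 a - 1 do
      total := !total +. Bigarray.Array2.get a i j
    done
  done;
  !total
```

With OCaml 3.x the bigarray library has to be linked explicitly, which is the step that fails with the error shown above.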

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Jon Harrop  
Newsgroups: fa.caml
From: Jon Harrop <j...@ffconsultancy.com>
Date: Sun, 08 Nov 2009 18:11:50 UTC
Local: Sun 8 Nov 2009 18:11
Subject: Re: [Caml-list] OC4MC : OCaml for Multicore architectures
On Friday 25 September 2009 00:28:57 Jon Harrop wrote:

> Just to quantify this with a data point: the fastest (serial) version of my
> ray tracer benchmark is 10x slower with the new GC. However, this is
> anomalous with respect to complexity and the relative performance is much
> better for simpler renderings. For example, the new GC is only 1.7x slower
> with n=6 instead of n=9.

The new SmartPumpkin release of OC4MC does a lot better. Specifically, the
version compiled with partial collections is now only 3.9x slower on a serial
ray tracer with n=9 (compared to 10x slower before). I'll try it in more
detail...

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
