to Lisp. Some of them are reporting competitive performance. However, when I try to run their programs with either CMUCL or SBCL they are two orders of magnitude slower. Given the number of people claiming similarly good performance, I'd like to know what the possible cause of the relative slowdown on my computer is?
My system is an unladen 900MHz Athlon T-bird with 768Mb RAM running Debian testing with SBCL 0.8.16 and CMUCL "19b-release-20050628-3 + minimal debian patches". Other people have both slower and faster CPUs and more and less RAM but all are consistently much faster than mine.
I believe SBCL always compiles to native code and I am asking CMUCL to compile to native code with:
Jon Harrop wrote:
> ... when I try to run their programs with either CMUCL or SBCL they are two orders of magnitude slower.
Slight correction - CMUCL is only 1 order of magnitude slower.
Also, I just tried running the ackermann and harmonic tests from the shootout and they run at the same (fast) speed on my machine as on the shootout's machine. So it seems the problem is specific to raytracing-like code.
> to Lisp. Some of them are reporting competitive performance. However, when I try to run their programs with either CMUCL or SBCL they are two orders of magnitude slower. Given the number of people claiming similarly good performance, I'd like to know what the possible cause of the relative slowdown on my computer is?
> My system is an unladen 900MHz Athlon T-bird with 768Mb RAM running Debian testing with SBCL 0.8.16 and CMUCL "19b-release-20050628-3 + minimal debian patches". Other people have both slower and faster CPUs and more and less RAM but all are consistently much faster than mine.
> I believe SBCL always compiles to native code and I am asking CMUCL to compile to native code with:
Time for running in Allegro CL 7.0 on a 933 MHz Pentium II, as given:
; cpu time (non-gc) 60,202 msec (00:01:00.202) user, 421 msec system
; cpu time (gc) 33,623 msec user, 10 msec system
; cpu time (total) 93,825 msec (00:01:33.825) user, 431 msec system
; real time 105,632 msec (00:01:45.632)
; space allocation:
; 967,240 cons cells, 3,719,472,272 other bytes, 4,472 static bytes
But converting all "single-float" to double-float:
; cpu time (non-gc) 4,446 msec user, 120 msec system
; cpu time (gc) 2,524 msec user, 0 msec system
; cpu time (total) 6,970 msec user, 120 msec system
; real time 7,961 msec
; space allocation:
; 418,422 cons cells, 292,782,544 other bytes, 0 static bytes
The compiler complained about the "shadow" statement:
warning: compile-file found "SHADOW" at the top-level -- see the documentation for comp:*cltl1-compile-file-toplevel-compatibility-p*
My guess is that much of the verbosity for the sbcl version could be struck out of the allegro cl version without any loss in speed, and that careful attention to other potential optimizations/ declarations could squeeze out better performance.
It could be that this single/double issue relates to your AMD64 timings. Most of it may be converting single to double and back.
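To make the single/double change concrete, here is a minimal sketch of what swapping the float type in the declarations looks like. It is not the benchmark code; `dot-single` and `dot-double` are made-up names, and how much either variant costs will depend on the implementation.

```lisp
;; Hypothetical helpers for illustration only (not from the ray tracer).
(declaim (optimize (speed 3) (safety 1)))

;; Dot product with every component declared single-float.
(defun dot-single (ax ay az bx by bz)
  (declare (type single-float ax ay az bx by bz))
  (+ (* ax bx) (* ay by) (* az bz)))

;; The same function after converting "single-float" to double-float.
(defun dot-double (ax ay az bx by bz)
  (declare (type double-float ax ay az bx by bz))
  (+ (* ax bx) (* ay by) (* az bz)))

;; With the default reader settings, 1.0 reads as a single-float and
;; 1.0d0 as a double-float, so the literals have to change as well.
(dot-single 1.0 2.0 3.0 4.0 5.0 6.0)
(dot-double 1d0 2d0 3d0 4d0 5d0 6d0)
```

Note that the literals in the source need the `d0` exponent marker too, otherwise the declarations and the constants disagree.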
Richard Fateman wrote:
> Time for running in Allegro CL 7.0 on a 933 MHz Pentium II, as given:
> ; cpu time (non-gc) 60,202 msec (00:01:00.202) user, 421 msec system
> ; cpu time (gc) 33,623 msec user, 10 msec system
> ; cpu time (total) 93,825 msec (00:01:33.825) user, 431 msec system
> ; real time 105,632 msec (00:01:45.632)
> ; space allocation:
> ; 967,240 cons cells, 3,719,472,272 other bytes, 4,472 static bytes
These times agree with my own.
> But converting all "single-float" to double-float:
> ; cpu time (non-gc) 4,446 msec user, 120 msec system
> ; cpu time (gc) 2,524 msec user, 0 msec system
> ; cpu time (total) 6,970 msec user, 120 msec system
> ; real time 7,961 msec
> ; space allocation:
> ; 418,422 cons cells, 292,782,544 other bytes, 0 static bytes
> The compiler complained about the "shadow" statement
> warning: compile-file found "SHADOW" at the top-level -- see the documentation for comp:*cltl1-compile-file-toplevel-compatibility-p*
Very interesting - thanks for that. Unfortunately replacing single-float with double-float causes SBCL to spew out a lot of error messages. I'll see if I can figure out why and I'll try CMUCL.
> My guess is that much of the verbosity for the sbcl version could be struck out of the allegro cl version without any loss in speed, and that careful attention to other potential optimizations/declarations could squeeze out better performance.
Right. Is Allegro CL free?
> It could be that this single/double issue relates to your AMD64 timings. Most of it may be converting single to double and back.
Yes. I've no idea what CL says about coercions. The other languages may well be much more lax in this respect.
It is interesting that single vs double precision has such bizarre and (for me) unexpected performance implications...
<use...@jdh30.plus.com> wrote:
>>> My system is an unladen 900MHz Athlon T-bird with 768Mb RAM running Debian testing with SBCL 0.8.16 and CMUCL "19b-release-20050628-3 + minimal debian patches".
That's a rather ancient version of SBCL, you might want to upgrade. For example:
This is using about 2x the memory of what you'd expect for each instance, and doing an extra memory indirection for each slot access. Proper support for storing the floats "raw" in the struct was added in 0.9.2 by David Lichteblau.
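For readers without the original example, a struct with float-typed slots of the kind being discussed might look like the sketch below. The name `vec` and its slots are assumptions for illustration, not the benchmark's definitions, and whether the floats are stored raw or as pointers to boxed objects depends on the implementation and version.

```lisp
;; Hypothetical 3D vector struct; all names here are illustrative.
;; The :type options let a compiler that supports raw slots store the
;; floats directly in the instance instead of as boxed objects.
(defstruct (vec (:constructor make-vec (x y z)))
  (x 0d0 :type double-float)
  (y 0d0 :type double-float)
  (z 0d0 :type double-float))

;; Component-wise addition; each VEC-X call is a slot access, which
;; costs an extra indirection when slots hold boxed floats.
(defun vec-add (a b)
  (declare (type vec a b))
  (make-vec (+ (vec-x a) (vec-x b))
            (+ (vec-y a) (vec-y b))
            (+ (vec-z a) (vec-z b))))

(vec-add (make-vec 1d0 2d0 3d0) (make-vec 4d0 5d0 6d0))
```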
Another possible pitfall on older SBCLs (<0.8.21) is that they don't honor the compiler policy for code entered on the repl, but compile it with low speed, high debug/safety. If you've been pasting the code into the repl instead of LOADing it, performance would indeed be horrible.
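A minimal sketch of the file-based workflow, assuming a throwaway file name `policy-demo.lisp` and a made-up function `demo-square` (neither is from the benchmark). The file is written out first only so the example is self-contained; normally it would be your existing source file.

```lisp
;; Hypothetical file name, resolved against the current directory.
(defparameter *demo-source* "policy-demo.lisp")

;; Write a tiny source file that sets its own compiler policy.
(with-open-file (out *demo-source* :direction :output :if-exists :supersede)
  (format out "(declaim (optimize (speed 3) (safety 1) (debug 0)))~%")
  (format out "(defun demo-square (x) (declare (type double-float x)) (* x x))~%"))

;; COMPILE-FILE applies the policy declared in the file and returns the
;; pathname of the compiled output, which LOAD then loads.
(load (compile-file *demo-source*))

(demo-square 3d0)
```

The point is only that the code goes through `compile-file` rather than being pasted at the prompt, so the declared policy is what the compiler sees.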
-- Juho Snellman "Premature profiling is the root of all evil."
-- * Harald Hanche-Olsen <URL:http://www.math.ntnu.no/~hanche/> - Debating gives most of us much more psychological satisfaction than thinking does: but it deprives us of whatever chance there is of getting closer to the truth. -- C.P. Snow
Juho Snellman wrote:
> <use...@jdh30.plus.com> wrote:
>>>> My system is an unladen 900MHz Athlon T-bird with 768Mb RAM running Debian testing with SBCL 0.8.16 and CMUCL "19b-release-20050628-3 + minimal debian patches".
> That's a rather ancient version of SBCL, you might want to upgrade.
Debian unstable has 0.9.3. It's upgrading now... :-)
> This is using about 2x the memory of what you'd expect for each instance, and doing an extra memory indirection for each slot access. Proper support for storing the floats "raw" in the struct was added in 0.9.2 by David Lichteblau.
I see.
> Another possible pitfall on older SBCLs (<0.8.21) is that they don't honor the compiler policy for code entered on the repl, but compile it with low speed, high debug/safety. If you've been pasting the code into the repl instead of LOADing it, performance would indeed be horrible.
Aha! Yes indeed. I just tried (load (compile-file "...")) and it runs in only 20secs compared to ~250secs from the top-level and 2.5secs for C++.
Thanks for the help. I'll repost when I get results with the new compiler.
Cute. I presume Allegro can completely optimize away the allocation of temporary structs consisting of only double-floats, but doesn't do this for single-floats?
-- Juho Snellman "Premature profiling is the root of all evil."
Raffael Cavallaro wrote:
> On 2005-08-13 07:18:40 -0400, Jon Harrop <use...@jdh30.plus.com> said:
>> Here is Nathan Baum's port for CMUCL and SBCL:
> just as an additional data point, this code runs in just over 6 seconds in sbcl 0.9.3 on a dual 2.0 GHz G5 (though sbcl only uses one processor).
On a single Xeon 3.4 GHz with sbcl 0.9.3:
Evaluation took:
  3.783 seconds of real time
  3.305498 seconds of user run time
  0.477927 seconds of system run time
  0 page faults and
  509,576,768 bytes consed
Svenne Krap wrote:
> Raffael Cavallaro wrote:
>> just as an additional data point, this code runs in just over 6 seconds in sbcl 0.9.3 on a dual 2.0 GHz G5 (though sbcl only uses one processor).
> On a single Xeon 3.4 GHz with sbcl 0.9.3:
> Evaluation took:
>   3.783 seconds of real time
>   3.305498 seconds of user run time
>   0.477927 seconds of system run time
>   0 page faults and
>   509,576,768 bytes consed
On my 1.8GHz AMD64 in 32-bit mode with SBCL 0.9.3 I'm now getting:
  5.21 seconds of real time
  4.15 seconds of user run time
  0.62 seconds of system run time
  0 page faults and
  509,569,248 bytes consed
This seems to be on-par with other people's observations.
This compares to 1.037s for OCaml and 0.987s for C++, so SBCL is now much more competitive.
Ulrich Hobelmann wrote:
> I wouldn't consider 5 times as slow as a *functional* language very competitive, but it might be fast enough for many problems.
Well, it's relative. Most of the other Lisp/Scheme implementations were two orders of magnitude slower. Stalin gets even closer than SBCL.
Also, MLton often beats g++, so functional languages aren't slow coaches any more...
Jon Harrop <use...@jdh30.plus.com> writes:
> Ulrich Hobelmann wrote:
> > I wouldn't consider 5 times as slow as a *functional* language very competitive, but it might be fast enough for many problems.
> Well, it's relative. Most of the other Lisp/Scheme implementations were two orders of magnitude slower. Stalin gets even closer than SBCL.
Which Lisps are you talking about? We've already seen where Allegro is faster than this SBCL timing and it hadn't even been optimized yet. I would be surprised if Lispworks were much different in this regard as well.
> Also, MLton often beats g++, so functional languages aren't slow coaches any more...
So do many CL implementations on many benchmarks when "properly" written. Several have been shown here in the past. Typically this starts with the original posting "showing" how bad CL is supposed to be when comparing optimized C/C++ with naively written CL. It also often ends with the CL version beating the optimized C/C++ version.
Most typical of all is that such benchmarks (including this ray tracing thing) don't have much of anything interesting to say about anything.
/Jon
-- 'j' - a n t h o n y at romeo/charley/november com
jayessay wrote:
> Jon Harrop <use...@jdh30.plus.com> writes:
>> Ulrich Hobelmann wrote:
>> > I wouldn't consider 5 times as slow as a *functional* language very competitive, but it might be fast enough for many problems.
>> Well, it's relative. Most of the other Lisp/Scheme implementations were two orders of magnitude slower. Stalin gets even closer than SBCL.
> Which Lisps are you talking about?
Primarily CMUCL and SBCL.
> We've already seen where Allegro is faster than this SBCL timing and it hadn't even been optimized yet. I would be surprised if Lispworks were much different in this regard as well.
Is Lispworks free?
>> Also, MLton often beats g++, so functional languages aren't slow coaches any more...
> So do many CL implementations on many benchmarks when "properly" written. Several have been shown here in the past.
What kinds of tasks is Lisp best at, in terms of performance? I Googled for information on this but most of the sites I found were no longer up.
> Typically this starts with the original posting "showing" how bad CL is supposed to be when comparing optimized C/C++ with naively written CL. It also often ends with the CL version beating the optimized C/C++ version.
Can you point me to some examples of this? I heard of a benchmark written long ago where some Lisp gurus managed to code an equivalently-efficient implementation in Lisp. However, it is important to know how easily an efficient version can be written. LOC is a very rudimentary measure of development time.
> Most typical of all is that such benchmarks (including this ray tracing thing) don't have much of anything interesting to say about anything.
I think my conclusions were interesting. In particular, I was surprised to see modern functional language implementations doing so well at what is arguably their weakest point.
I'd like to do another benchmark with an example from scientific computing next...
Jon Harrop wrote:
> What kinds of tasks is Lisp best at, in terms of performance? I Googled for information on this but most of the sites I found were no longer up.
Why performance at all? Lisp is good at many things, most notably good error recovery (interactive debugger, restarts...), but not for high-performance computing. There you probably want Fortran or C (and maybe link them to Lisp).
For symbolic processing, or anything non-number-crunchy I wouldn't be surprised if an application written in Lisp (compiled) isn't a bit slower than the same app written in C++ or Java. But of course nobody writes an app in several languages...
-- I believe in Karma. That means I can do bad things to people all day long and I assume they deserve it. Dogbert
Ulrich Hobelmann wrote:
> Jon Harrop wrote:
>> What kinds of tasks is Lisp best at, in terms of performance? I Googled for information on this but most of the sites I found were no longer up.
> Why performance at all?
I became interested in Lisp's performance because several people advocated Lisp to me for these kinds of tasks, claiming that it was suitably efficient. I wanted to test that.
> Lisp is good at many things, most notably good error recovery (interactive debugger, restarts...), but not for high-performance computing. There you probably want Fortran or C (and maybe link them to Lisp).
My background is in computational science. Fortran is fine for trivial programs that just loop over arrays of floats. Mathematica is great for symbolic computation. But there is a huge gap between those where Fortran isn't expressive enough and Mathematica isn't efficient enough. Languages like OCaml, SML, Haskell and Lisp fill that gap.
> For symbolic processing, or anything non-number-crunchy I wouldn't be surprised if an application written in Lisp (compiled) isn't a bit slower than the same app written in C++ or Java. But of course nobody writes an app in several languages...
I think it is productive to choose suitable tasks and implement them in several different languages. It helps other people to learn, e.g. by comparing C++ code to the equivalent OCaml, and it gives us all an idea of how efficient and expressive the different languages are.
jayessay wrote:
> Jon Harrop <use...@jdh30.plus.com> writes:
>> Well, it's relative. Most of the other Lisp/Scheme implementations were two orders of magnitude slower. Stalin gets even closer than SBCL.
> So do many CL implementations on many benchmarks when "properly" written. Several have been shown here in the past. Typically this starts with the original posting "showing" how bad CL is supposed to be when comparing optimized C/C++ with naively written CL. It also often ends with the CL version beating the optimized C/C++ version.
> Most typical of all is that such benchmarks (including this ray tracing thing) don't have much of anything interesting to say about anything.
The comp.lang.lisp thread on Almabench comes to mind. Rif summed it up pretty nicely:
So what have we learned? We confirmed what we pretty much knew: you can write a C program in CL, at which point the relative speed of your C and CL versions will depend on the relative quality of the code generation.
I am sure Jon can pick up some nice tricks in the thread.
Ulrich Hobelmann <u.hobelm...@web.de> wrote:
> I wouldn't consider 5 times as slow as a *functional* language very competitive, but it might be fast enough for many problems.
For a runtime typed language vs. a statically typed one it is quite competitive. Generally, the compiler can never resolve all the runtime typing at compile time, unless you declare all and everything in your program (which would be nothing but a major PITA).
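To illustrate the trade-off, here is a rough sketch contrasting an undeclared function with a fully declared one. Both function names are made up for this example, and how much the declarations help is implementation-dependent.

```lisp
;; No declarations: works on any mix of numbers, but every * and +
;; has to dispatch on the argument types at run time.
(defun sum-squares-generic (xs)
  (let ((acc 0))
    (dolist (x xs acc)
      (incf acc (* x x)))))

;; Fully declared: only accepts a specialized array of double-floats,
;; which lets a native-code compiler use unboxed float arithmetic.
(defun sum-squares-declared (xs)
  (declare (optimize (speed 3) (safety 1))
           (type (simple-array double-float (*)) xs))
  (let ((acc 0d0))
    (declare (type double-float acc))
    (dotimes (i (length xs) acc)
      (incf acc (* (aref xs i) (aref xs i))))))

(sum-squares-generic '(1 2 3))
(sum-squares-declared
 (make-array 3 :element-type 'double-float
               :initial-contents '(1d0 2d0 3d0)))
```

The declared version is more verbose and less general, which is the cost being described above.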
Jens Axel Søgaard wrote:
> The comp.lang.lisp thread on Almabench comes to mind. Rif summed it up pretty nicely:
Almabench. That's the one I'd heard of before.
> So what have we learned? We confirmed what we pretty much knew: you can write a C program in CL, at which point the relative speed of your C and CL versions will depend on the relative quality of the code generation.
> I am sure Jon can pick up some nice tricks in the thread.
Jon Harrop wrote:
> jayessay wrote:
>> We've already seen where Allegro is faster than this SBCL timing and it hadn't even been optimized yet. I would be surprised if Lispworks were much different in this regard as well.
> Is Lispworks free?
How many implementations of Ocaml are there? One. So every developer working on Ocaml is hammering that one version. If you want to compare Ocaml to an open source language then choose a language for which there is but a single implementation; Perl, Python, Parrot, PHP, Ruby etc.
If you want to compare Ocaml to a specific implementation of Lisp, then target CMUCL, SBCL, CLISP, Lispworks etc. However, if you want to compare the performance of Lisp to the performance of Ocaml then choose the fastest implementation of Lisp available and run that. Allegro, Lispworks and Corman (Corman is Windows only) are all free to you for this purpose.
Raffael Cavallaro wrote:
> On 2005-08-13 07:18:40 -0400, Jon Harrop <use...@jdh30.plus.com> said:
> > Here is Nathan Baum's port for CMUCL and SBCL:
> just as an additional data point, this code runs in just over 6 seconds in sbcl 0.9.3 on a dual 2.0 GHz G5 (though sbcl only uses one processor).
I've also written a version of Jon's raytracer benchmark. Unlike the one given here, mine uses "simple-vector" for the vectors.
I've also been using GCL to compile it so far.
It will be interesting to see how the performance compares.
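For anyone wondering what the simple-vector approach might look like, here is a minimal sketch. It is a guess at the general shape, not the actual code from that version, and the names `vec3`, `vec3-dot` and `vec3-scale` are invented for the example.

```lisp
;; Hypothetical helpers; a 3D vector is just a three-element simple-vector.
;; VECTOR returns a fresh simple-vector of its arguments.
(defun vec3 (x y z)
  (vector x y z))

;; Dot product using SVREF, the accessor specific to simple-vectors.
(defun vec3-dot (a b)
  (declare (type simple-vector a b))
  (+ (* (svref a 0) (svref b 0))
     (* (svref a 1) (svref b 1))
     (* (svref a 2) (svref b 2))))

;; Scale a vector by S, returning a new simple-vector.
(defun vec3-scale (s a)
  (declare (type simple-vector a))
  (vec3 (* s (svref a 0)) (* s (svref a 1)) (* s (svref a 2))))

(vec3-dot (vec3 1d0 2d0 3d0) (vec3 4d0 5d0 6d0))
(vec3-scale 2d0 (vec3 1d0 2d0 3d0))
```

One thing to keep in mind with this representation: a simple-vector has element type T, so the floats in it may be stored as boxed objects, unlike a struct with raw float slots or an array specialized to double-float.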