to Lisp. Some of them are reporting competitive performance. However, when I try to run their programs with either CMUCL or SBCL they are two orders of magnitude slower. Given the number of people claiming similarly good performance, I'd like to know what the possible cause of the relative slowdown on my computer is?
My system is an unladen 900MHz Athlon T-bird with 768Mb RAM running Debian testing with SBCL 0.8.16 and CMUCL "19b-release-20050628-3 + minimal debian patches". Other people have both slower and faster CPUs and more and less RAM but all are consistently much faster than mine.
I believe SBCL always compiles to native code and I am asking CMUCL to compile to native code with:
Jon Harrop wrote:
> ... when
> I try to run their programs with either CMUCL or SBCL they are two orders
> of magnitude slower.
Slight correction - CMUCL is only 1 order of magnitude slower.
Also, I just tried running the ackermann and harmonic tests from the shootout and they run at the same (fast) speed on my machine as on the shootout's machine. So it seems the problem is specific to raytracing-like code.
> to Lisp. Some of them are reporting competitive performance. However, when I
> try to run their programs with either CMUCL or SBCL they are two orders of
> magnitude slower. Given the number of people claiming similarly good
> performance, I'd like to know what the possible cause of the relative
> slowdown on my computer is?

> My system is an unladen 900MHz Athlon T-bird with 768Mb RAM running Debian
> testing with SBCL 0.8.16 and CMUCL "19b-release-20050628-3 + minimal debian
> patches". Other people have both slower and faster CPUs and more and less
> RAM but all are consistently much faster than mine.

> I believe SBCL always compiles to native code and I am asking CMUCL to
> compile to native code with:
Time for running in Allegro CL 7.0 on a 933 MHz Pentium II, with the code as given:
; cpu time (non-gc) 60,202 msec (00:01:00.202) user, 421 msec system
; cpu time (gc)     33,623 msec user, 10 msec system
; cpu time (total)  93,825 msec (00:01:33.825) user, 431 msec system
; real time  105,632 msec (00:01:45.632)
; space allocation:
;  967,240 cons cells, 3,719,472,272 other bytes, 4,472 static bytes
But converting all "single-float" to double-float:
; cpu time (non-gc) 4,446 msec user, 120 msec system
; cpu time (gc)     2,524 msec user, 0 msec system
; cpu time (total)  6,970 msec user, 120 msec system
; real time  7,961 msec
; space allocation:
;  418,422 cons cells, 292,782,544 other bytes, 0 static bytes

The compiler complained about the "shadow" statement

warning: compile-file found "SHADOW" at the top-level -- see the
         documentation for
         comp:*cltl1-compile-file-toplevel-compatibility-p*
My guess is that much of the verbosity for the sbcl version could be struck out of the allegro cl version without any loss in speed, and that careful attention to other potential optimizations/declarations could squeeze out better performance.
It could be that this single/double issue relates to your AMD64 timings. Most of it may be converting single to double and back.
Richard Fateman wrote:
> Time for running in Allegro CL 7.0 on a 933 MHz Pentium II.
> as given:

> ; cpu time (non-gc) 60,202 msec (00:01:00.202) user, 421 msec system
> ; cpu time (gc) 33,623 msec user, 10 msec system
> ; cpu time (total) 93,825 msec (00:01:33.825) user, 431 msec system
> ; real time 105,632 msec (00:01:45.632)
> ; space allocation:
> ; 967,240 cons cells, 3,719,472,272 other bytes, 4,472 static bytes
These times agree with my own.
> But converting all "single-float" to double-float:
> ; cpu time (non-gc) 4,446 msec user, 120 msec system
> ; cpu time (gc) 2,524 msec user, 0 msec system
> ; cpu time (total) 6,970 msec user, 120 msec system
> ; real time 7,961 msec
> ; space allocation:
> ; 418,422 cons cells, 292,782,544 other bytes, 0 static bytes
> The compiler complained about the "shadow" statement
> warning: compile-file found "SHADOW" at the top-level -- see the
> documentation for
> comp:*cltl1-compile-file-toplevel-compatibility-p*
Very interesting - thanks for that. Unfortunately replacing single-float with double-float causes SBCL to spew out a lot of error messages. I'll see if I can figure out why and I'll try CMUCL.
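One plausible source of such messages, offered as an assumption rather than a diagnosis of this particular program: changing the type declarations does not change the float literals. Under the default reader settings a literal such as 0.5 is still a single-float, so it can conflict with a slot or variable now declared double-float. The sketch below uses hypothetical names (SCALE-BY-HALF, LITERAL-TYPES, READ-AS-DOUBLE) that are not part of the ray tracer.

```lisp
;; Sketch only: these functions are hypothetical illustrations, not code
;; from the ray tracer discussed in this thread.

(defun scale-by-half (x)
  (declare (type double-float x))
  ;; 0.5d0 is a double-float literal. A plain 0.5 would be read as a
  ;; single-float under the default reader settings.
  (* 0.5d0 x))

;; With the default *read-default-float-format* (single-float), unmarked
;; literals are single-floats and the d0 exponent marker gives doubles.
(defun literal-types ()
  (list (typep 1.0 'single-float)
        (typep 1.0d0 'double-float)))

;; Alternative to editing every literal: bind the reader default so that
;; unmarked literals are read as double-floats.
(defun read-as-double (text)
  (let ((*read-default-float-format* 'double-float))
    (read-from-string text)))
```

Either approach (d0 markers on the literals, or binding *read-default-float-format* around the read) keeps literals and declarations in agreement; which one suits the benchmark code is a judgment call.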
> My guess is that much of the verbosity for the sbcl version could be
> struck out of the allegro cl version without any loss in speed, and that
> careful attention to other potential optimizations/declarations could
> squeeze out better performance.
Right. Is Allegro CL free?
> It could be that this single/double issue relates to your AMD64 timings.
> Most of it may be converting single to double and back.
Yes. I've no idea what CL says about coercions. The other languages may well be much more lax in this respect.
It is interesting that single vs double precision has such bizarre and (for me) unexpected performance implications...
<use...@jdh30.plus.com> wrote:
>>> My system is an unladen 900MHz Athlon T-bird with 768Mb RAM running Debian
>>> testing with SBCL 0.8.16 and CMUCL "19b-release-20050628-3 + minimal
>>> debian patches".
That's a rather ancient version of SBCL, you might want to upgrade. For example:
This is using about 2x the memory of what you'd expect for each instance, and doing an extra memory indirection for each slot access. Proper support for storing the floats "raw" in the struct was added in 0.9.2 by David Lichteblau.
Another possible pitfall on older SBCLs (<0.8.21) is that they don't honor the compiler policy for code entered on the repl, but compile it with low speed, high debug/safety. If you've been pasting the code into the repl instead of LOADing it, performance would indeed be horrible.
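A minimal sketch of stating the policy explicitly and checking that a definition really is compiled. SUM-SQUARES is a hypothetical stand-in for benchmark code. Whether an old SBCL REPL honours the declaration is exactly the problem described above, so this does not replace loading a compiled file; it only shows the standard tools involved.

```lisp
;; Sketch only: SUM-SQUARES is a hypothetical stand-in for benchmark code.
(defun sum-squares (n)
  ;; State the policy inside the function so it travels with the code.
  (declare (optimize (speed 3) (safety 1) (debug 0))
           (type fixnum n))
  (let ((acc 0d0))
    (declare (type double-float acc))
    (dotimes (i n acc)
      (let ((x (float i 1d0)))
        (incf acc (* x x))))))

;; COMPILE is standard Common Lisp. On implementations that interpret
;; definitions typed at the top level, it replaces the interpreted
;; definition with a compiled one; otherwise it is effectively a no-op.
(compile 'sum-squares)

;; COMPILED-FUNCTION-P reports whether the current definition is compiled.
(compiled-function-p #'sum-squares)
```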
--
Juho Snellman
"Premature profiling is the root of all evil."
--
* Harald Hanche-Olsen <URL:http://www.math.ntnu.no/~hanche/>
- Debating gives most of us much more psychological satisfaction
  than thinking does: but it deprives us of whatever chance there is
  of getting closer to the truth. -- C.P. Snow
Juho Snellman wrote:
> <use...@jdh30.plus.com> wrote:
>>>> My system is an unladen 900MHz Athlon T-bird with 768Mb RAM running
>>>> Debian testing with SBCL 0.8.16 and CMUCL "19b-release-20050628-3 +
>>>> minimal debian patches".
> That's a rather ancient version of SBCL, you might want to upgrade.
Debian unstable has 0.9.3. It's upgrading now... :-)
> This is using about 2x the memory of what you'd expect for each
> instance, and doing an extra memory indirection for each slot access.
> Proper support for storing the floats "raw" in the struct was added in
> 0.9.2 by David Lichteblau.
I see.
> Another possible pitfall on older SBCLs (<0.8.21) is that they don't
> honor the compiler policy for code entered on the repl, but compile it
> with low speed, high debug/safety. If you've been pasting the code
> into the repl instead of LOADing it, performance would indeed be
> horrible.
Aha! Yes indeed. I just tried (load (compile-file "...")) and it runs in only 20 seconds, compared to ~250 seconds from the top level and 2.5 seconds for C++.
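For anyone repeating the comparison, the compile-then-load workflow can be sketched as follows. The path and DEMO-SQUARE are hypothetical placeholders (the actual file name is elided above), and the example writes its own tiny source file only so that it is self-contained.

```lisp
;; Sketch only: the path and DEMO-SQUARE are hypothetical placeholders.
(defparameter *demo-source* "/tmp/compile-file-demo.lisp")

;; Write a tiny source file so the example is self-contained.
(with-open-file (out *demo-source* :direction :output :if-exists :supersede)
  (format out "(defun demo-square (x) (* x x))~%"))

;; COMPILE-FILE returns the pathname of the compiled output as its first
;; value; LOAD then loads that compiled file rather than the source.
(load (compile-file *demo-source*))

;; TIME wraps a form and reports timing information; the report format
;; differs between implementations.
(time (funcall 'demo-square 7))
```

Timing a call made this way measures compiled code under the policy in the file, which is the fair basis for comparison with the C++ figure.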
Thanks for the help. I'll repost when I get results with the new compiler.