Jon Harrop wrote:
> tbp wrote:
> My post is more about clarity and less about performance. Your proposed
Oh? Silly me, I thought it was a benchmark.
> alterations make the C++ implementation significantly longer and more
> obfuscated and your littering of the source code with "inline" actually
> slows the program down on my Athlon t-bird.
I've put explicit inlines because, e.g. for the Vec operators, you've
defined them outside of the class scope, which defeats the common
heuristic saying that a member function declared & defined within the
class body should be inlined (and that's a much stronger hint than using
an 'inline' directive).
Anyway, that's decoration compared to other issues in your source.
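To illustrate the idiom, here is a minimal sketch assuming a hypothetical three-component Vec (the names are illustrative, not taken from the benchmark source). Member functions defined inside the class body are implicitly inline, so no keyword is needed. Whether a given compiler actually inlines them remains its own decision.

```cpp
// Hypothetical Vec for illustration; not the benchmark's actual type.
struct Vec {
    double x, y, z;

    Vec(double x_, double y_, double z_) : x(x_), y(y_), z(z_) {}

    // Defined in the class body: implicitly inline, no 'inline' keyword needed.
    Vec operator+(const Vec& o) const { return Vec(x + o.x, y + o.y, z + o.z); }
    Vec operator-(const Vec& o) const { return Vec(x - o.x, y - o.y, z - o.z); }
    Vec operator*(double s) const { return Vec(x * s, y * s, z * s); }

    // Dot product, also defined in-class.
    double dot(const Vec& o) const { return x * o.x + y * o.y + z * o.z; }
};
```

The definitions are visible at every call site that includes the class, which is what gives the compiler the opportunity to inline them.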
> Moreover, your choices of initial optimisation are decidedly suboptimal.
> I'd have gone for:
Hmm? I tinker with a handful of lines, cut the runtime in half and
that's suboptimal?
Interesting.
> 1. Terminating the intersection of shadow rays when the first
>    intersection is found (instead of finding the closest intersection).
> 2. Use an implicit scene, to avoid storing it explicitly.
> 3. Use single-precision storage (or no storage at all).
> Both of these optimisations will give much bigger performance
> improvements when averaged over different platforms and architectures.
Your approach & space partition (as presented) will never give runtime
performance worth the time it takes to implement efficiently.
Such a sphere flake should render in the blink of an eye on modern
platforms, see the SPD for reference.
Or this:
http://www.uni-koblenz.de/~cg/publikationen/cp_raytrace.pdf
I've merely fixed the most glaring flaws in your implementation.
Period.
> As it happens, I had already implemented all of your optimisations and
> all of these optimisations to both implementations before I made my
> original post. None of them add anything new. Sometimes C++ is faster,
> sometimes OCaml is faster. OCaml is always much more succinct. You can
> keep doing this until you are blue in the face but I don't think you'll
> learn much of interest.
I've never contested that OCaml is more succinct, expressive or elegant.
I'm just saying that, given the horrible C++ implementation you've
presented, you have absolutely no authority or qualification to say
anything about the relative merit of C++ & OCaml performance-wise.
> Both ocamlopt and g++ were allowed to inline code.
It's all about exposing opportunities to do so.
> As OCaml is for symbolic use, it performs no inlining by default (unlike
> g++) so it is common to ask OCaml to do some inlining on numerical code.
> Thus, specifying "-inline 100" actually makes for a fairer comparison.
No. You've used none of the C++ idioms that actually help a C++
compiler to guess what to inline or not.
Know your compiler (or in this case, your language).
> Both implementations use pass by value as this is clearer, shorter and
> (as a consequence) more common in real code.
Absolute nonsense.
I can't believe you really think that, but if that's the case, then you
should get back to your homework.
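For reference, the two calling conventions being argued about look like this. It is only a sketch with a hypothetical Vec and hypothetical function names, not code from either implementation. Whether the by-value form actually costs anything depends on the compiler and on whether the call gets inlined.

```cpp
// Hypothetical three-double vector, for illustration only.
struct Vec {
    double x, y, z;
};

// Pass by value: each argument is a copy of the caller's Vec
// (the copies may be optimised away if the call is inlined).
double dot_by_value(Vec a, Vec b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Pass by const reference: no copy is requested, and the callee
// promises not to modify the caller's objects.
double dot_by_ref(const Vec& a, const Vec& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

Both functions compute the same result; the difference is only in what the signature asks the compiler to do with the arguments.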
> If you want a fair comparison then you should also make equivalent
> changes to the OCaml implementation.
You're the one trying to sell books about OCaml...
> You are trying to compare optimised C++ against unoptimised OCaml, which
> would be unfair. More worryingly, you seem to have skipped the part of
> the experiment where you actually measure something.
I've provided patches. See for yourself.
> When restricted to 80 columns, your optimisations add several lines.
> Indeed, they push the C++ program over 100 LOC limit imposed by the
> creators of the shootout.
You want to fit in those artificial constraints? Shorten names or
define some macros.
What was the fuss about already?
> In fact, I'd already implemented your optimisations (and many more
> effective optimisations) and had to throw them out because of this
> limit. Note that the OCaml has 30 more lines with which to optimise. My
> optimised <100 LOC OCaml program is much faster than my optimised <100
> LOC C++ on all platforms.
Sure. I mean you've shown such a level of C++ wizardry that we're
supposed to take your word for it?
> > You'll notice i haven't even fixed the gratuitous use of virtual
> > functions,
> What would you recommend instead?
Not using virtual functions in the hotpath.
Primo, that's uncalled for as your "hierarchy" doesn't mix types at a
given level: each object's type is known up front.
Secundo, virtual functions are expensive because they generally prevent
inlining and are implemented as indirect branches, like:
401c83: callq *0x8(%rax)
It doesn't make sense to use an expensive feature in the hotpath if you
can avoid it.
And you definitely can.
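One way to do that, sketched under the assumption that every object at a given level is a sphere: store concrete Sphere objects and call a non-virtual intersect directly, so the call target is known at compile time. All names here (Vec, Ray, Sphere, nearest_hit) are hypothetical and the ray direction is assumed to be normalised.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Vec { double x, y, z; };

static double dot(const Vec& a, const Vec& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

static Vec sub(const Vec& a, const Vec& b) {
    Vec r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

// dir is assumed to have unit length.
struct Ray { Vec orig, dir; };

// Concrete type with no virtual functions: calls to intersect() are
// direct calls that the compiler is free to inline.
struct Sphere {
    Vec center;
    double radius;

    // Distance along the ray to the nearest hit, or infinity on a miss.
    double intersect(const Ray& ray) const {
        const double inf = std::numeric_limits<double>::infinity();
        const Vec v = sub(center, ray.orig);
        const double b = dot(v, ray.dir);
        const double disc = b * b - dot(v, v) + radius * radius;
        if (disc < 0.0) return inf;          // ray misses the sphere
        const double d = std::sqrt(disc);
        const double t2 = b + d;
        if (t2 < 0.0) return inf;            // sphere is behind the ray
        const double t1 = b - d;
        return t1 > 0.0 ? t1 : t2;
    }
};

// Nearest hit over a homogeneous array: no indirect branch per object.
double nearest_hit(const std::vector<Sphere>& spheres, const Ray& ray) {
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < spheres.size(); ++i) {
        const double t = spheres[i].intersect(ray);
        if (t < best) best = t;
    }
    return best;
}
```

This only covers the flat, single-type case; a bounding hierarchy would need its own node type, but the same principle applies as long as the type at each level is fixed.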
> I chose an inheritance hierarchy because this is the closest C++
> equivalent to a variant type which I see in typical C++ programs.
> Another optimisation that I made was to replace inheritance with a single
> Scene struct which represented a group of child scenes or a sphere when
> it had no children. I threw this out as well because it is not a fair
> equivalent to OCaml's variant type. For example, you could not specify a
> colour for each sphere without also specifying colours for all bounding
> volumes.
That's totally orthogonal.
Please do yourself a favor and read a good book about C++.
> > and other details
> Can you elaborate on these other "details"?
Speaking strictly of your implementation details (and not the algorithm
or anything), you've also failed the const correctness test, for example.
But I suppose you're going to say that it's cruft or obfuscation.
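For readers unfamiliar with the term, const correctness roughly means this. The sketch below uses a hypothetical Sphere class and a hypothetical surface_area helper, neither of which comes from the program under discussion.

```cpp
struct Vec { double x, y, z; };

class Sphere {
public:
    Sphere(const Vec& c, double r) : center_(c), radius_(r) {}

    // Read-only accessors are marked const, so they can be called
    // through a const Sphere& or on a const Sphere.
    const Vec& center() const { return center_; }
    double radius() const { return radius_; }

    // The one mutating member is deliberately not const.
    void scale(double f) { radius_ *= f; }

private:
    Vec center_;
    double radius_;
};

// Taking a const reference documents that the sphere is not modified;
// only const member functions can be called on s here.
double surface_area(const Sphere& s) {
    const double pi = 3.14159265358979323846;
    return 4.0 * pi * s.radius() * s.radius();
}
```

Calling s.scale() inside surface_area would be rejected at compile time, which is the point: the signature tells both the reader and the compiler what may change.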
> > that force standard compliant c++
> > compilers to pessimize a lot when facing such... hmm.. source.
> Manually implementing pass constant by reference is both obfuscating and
> error-prone whilst being theoretically unnecessary. The fact that the
> OCaml compiler does this optimisation for you when the GNU C++ compiler
> does not might interest some people.
I think you've just discovered that OCaml & C++ are two different
languages that require the programmer to express things in totally
different ways.
And before you propose ways to improve C++ you should learn the
language first.
> Everyone will be free to contribute to the shootout version of the ray
> tracer. Perhaps you would like to contribute a less "fishy" C++
> implementation?
I don't have books to sell.
If/when I write a raytracer, I choose a path that at least has some
chance of performing decently.
See Wald & Havran.