Message from discussion Mini ray tracer
tbp  
Newsgroups: comp.graphics.rendering.raytracing
From: "tbp" <tbp...@gmail.com>
Date: 10 May 2005 03:36:54 -0700
Local: Tues 10 May 2005 11:36
Subject: Re: Mini ray tracer
Jon Harrop wrote:
> tbp wrote:
> My post is more about clarity and less about performance. Your
> proposed

Oh? Silly me, i thought it was a benchmark.

> alterations make the C++ implementation significantly longer and more
> obfuscated and your littering of the source code with "inline" actually
> slows the program down on my Athlon t-bird.

I've put explicit inlines because, e.g. for the Vec operators, you've
defined them outside of the class scope, which defeats the common
heuristic that a member function declared & defined within the class
body should be inlined (and that's a much stronger hint than the
'inline' keyword).
Anyway, that's decoration compared to the other issues in your source.
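
To illustrate the point about where the operators are defined, here is a minimal sketch. The Vec type and its operators below are illustrative stand-ins, not the code from the posted ray tracer. A member function defined inside the class body is implicitly inline; a free function defined outside needs the keyword spelled out. How much weight a given compiler puts on either form is up to its heuristics.

```cpp
// Sketch only: this Vec is a hypothetical stand-in for the one under discussion.
struct Vec {
    double x, y, z;
    Vec(double x_, double y_, double z_) : x(x_), y(y_), z(z_) {}

    // Defined inside the class body: implicitly inline, no keyword needed.
    Vec operator+(const Vec& b) const { return Vec(x + b.x, y + b.y, z + b.z); }
};

// Defined outside the class: 'inline' has to be written explicitly.
inline Vec operator-(const Vec& a, const Vec& b) {
    return Vec(a.x - b.x, a.y - b.y, a.z - b.z);
}

// Same for any other free helper that lives next to the class.
inline double dot(const Vec& a, const Vec& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

Either form only exposes the opportunity; the compiler still makes the final call.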

> Moreover, your choices of initial optimisation are decidedly suboptimal.
> I'd have gone for:

Hmm? I tinker with a handful of lines, cut the runtime in half and
that's suboptimal?
Interesting.

> 1. Terminating the intersection of shadow rays when the first intersection
> is found (instead of finding the closest intersection).

> 2. Use an implicit scene, to avoid storing it explicitly.

> 3. Use single-precision storage (or no storage at all).

> Both of these optimisations will give much bigger performance improvements
> when averaged over different platforms and architectures.

Your approach & space partition (as presented) will never give runtime
performance worth the time it takes to implement efficiently.
Such a sphere flake should render in the blink of an eye on modern
platforms, see the SPD (Standard Procedural Databases) for reference.
Or this: http://www.uni-koblenz.de/~cg/publikationen/cp_raytrace.pdf
I've merely fixed the most glaring flaws in your implementation.
Period.
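
For reference, the first optimisation quoted above (stop a shadow ray at its first hit instead of searching for the closest one) could look roughly like this. It is a sketch over a flat list of spheres, with no space partition; all the names (Vec3, Ray, Sphere, ray_sphere, closest_hit, occluded) are made up for the example and the ray direction is assumed to be normalised.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical minimal types, not taken from the posted source.
struct Vec3 { double x, y, z; };
struct Ray { Vec3 orig, dir; };          // dir is assumed to be unit length
struct Sphere { Vec3 center; double radius; };

const double kInf = std::numeric_limits<double>::infinity();

inline Vec3 sub3(const Vec3& a, const Vec3& b) {
    Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}
inline double dot3(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Distance along the ray to the nearest hit in front of the origin,
// or infinity if the ray misses the sphere.
inline double ray_sphere(const Ray& r, const Sphere& s) {
    const Vec3 v = sub3(s.center, r.orig);
    const double b = dot3(v, r.dir);
    const double disc = b * b - dot3(v, v) + s.radius * s.radius;
    if (disc < 0.0) return kInf;
    const double d = std::sqrt(disc);
    if (b + d < 0.0) return kInf;        // sphere entirely behind the origin
    return (b - d > 0.0) ? b - d : b + d;
}

// Primary rays need the closest hit, so every sphere has to be tested.
inline double closest_hit(const Ray& r, const std::vector<Sphere>& scene) {
    double best = kInf;
    for (std::size_t i = 0; i < scene.size(); ++i) {
        const double t = ray_sphere(r, scene[i]);
        if (t < best) best = t;
    }
    return best;
}

// Shadow rays only need to know whether anything is in the way,
// so the loop can return at the first hit.
inline bool occluded(const Ray& r, const std::vector<Sphere>& scene) {
    for (std::size_t i = 0; i < scene.size(); ++i)
        if (ray_sphere(r, scene[i]) < kInf) return true;
    return false;
}
```

A real shadow test would also bound the hit distance by the distance to the light; that is left out here to keep the sketch short.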

> As it happens, I had already implemented all of your optimisations and all
> of these optimisations to both implementations before I made my original
> post. None of them add anything new. Sometimes C++ is faster, sometimes
> OCaml is faster. OCaml is always much more succinct. You can keep doing
> this until you are blue in the face but I don't think you'll learn much of
> interest.

I've never contested that OCaml is more succinct, expressive or elegant.
I'm just saying that, given the horrible C++ implementation you've
presented, you have absolutely no authority or qualification to say
anything about the relative merit of C++ & OCaml performance-wise.

> Both ocamlopt and g++ were allowed to inline code.

It's all about exposing opportunities to do so.

> As OCaml is for symbolic use, it performs no inlining by default (unlike
> g++) so it is common to ask OCaml to do some inlining on numerical code.
> Thus, specifying "-inline 100" actually makes for a fairer comparison.

No. You've used none of the C++ idioms that actually help a C++
compiler guess what to inline or not.
Know your compiler (or in this case, your language).

> Both implementations use pass by value as this is clearer, shorter and (as a
> consequence) more common in real code.

Absolute nonsense.
I can't believe you really think that, but if that's the case, then you
should get back to your homework.
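
For readers who haven't met the idiom being argued about, here is a small sketch of the two calling conventions side by side. The types and function names are invented for the example. Whether the by-value version actually costs anything depends on the compiler and on whether the call gets inlined, so take it as an illustration of the idiom, not a measurement.

```cpp
// Hypothetical aggregate, standing in for a vector or ray type.
struct Vec { double x, y, z; };

// Pass by value: each argument is a copy of the caller's object
// (unless the compiler inlines the call and elides the copies).
inline double dot_by_value(Vec a, Vec b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Pass by const reference: the usual C++ idiom for read-only arguments
// of class type. No copy is requested, and the callee cannot modify them.
inline double dot_by_ref(const Vec& a, const Vec& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

Both give the same result; the difference is only in what the source asks the compiler to do.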

> If you want a fair comparison then you should also make equivalent changes
> to the OCaml implementation.

You're the one trying to sell books about OCaml...

> You are trying to compare optimised C++ against unoptimised OCaml, which
> would be unfair. More worryingly, you seem to have skipped the part of the
> experiment where you actually measure something.

I've provided patches. See for yourself.

> When restricted to 80 columns, your optimisations add several lines. Indeed,
> they push the C++ program over 100 LOC limit imposed by the creators of the
> shootout.

You want to fit in those artificial constraints? Shorten names or
define some macros.
What was the fuss about already?

> In fact, I'd already implemented your optimisations (and many more effective
> optimisations) and had to throw them out because of this limit. Note that
> the OCaml has 30 more lines with which to optimise. My optimised <100 LOC
> OCaml program is much faster than my optimised <100 LOC C++ on all
> platforms.

Sure. I mean you've shown such a level of C++ wizardry that we're
supposed to take your word for it?

> > You'll notice i haven't even fixed the gratuitous use of virtual
> > functions,

> What would you recommend instead?

Not using virtual functions in the hot path.
Primo, that's uncalled for as your "hierarchy" doesn't mix types at a
given level, each object's type is known upfront.
Secundo, virtual functions are expensive because they prevent inlining
and are implemented as indirect branches, like:
  401c83:       callq  *0x8(%rax)
It doesn't make sense to use an expensive feature in the hot path if you
can avoid it.
And you definitely can.
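
A rough sketch of the difference, with made-up names (Shape, VirtualSphere, Sphere, sphere_hit and the two count_hits functions are all hypothetical). In the first variant every test goes through the vtable; in the second the concrete type is known at the call site, so the call is direct and the compiler is free to inline it.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec { double x, y, z; };
struct Ray { Vec orig, dir; };           // dir is assumed to be unit length

// Shared hit test so both variants do exactly the same arithmetic.
inline bool sphere_hit(const Vec& c, double radius, const Ray& r) {
    const Vec v = { c.x - r.orig.x, c.y - r.orig.y, c.z - r.orig.z };
    const double b = v.x * r.dir.x + v.y * r.dir.y + v.z * r.dir.z;
    const double disc = b * b - (v.x * v.x + v.y * v.y + v.z * v.z) + radius * radius;
    return disc >= 0.0 && b + std::sqrt(disc) >= 0.0;
}

// Variant A: dynamic dispatch, one indirect call per object tested.
struct Shape {
    virtual ~Shape() {}
    virtual bool hit(const Ray& r) const = 0;
};
struct VirtualSphere : Shape {
    Vec center; double radius;
    VirtualSphere(const Vec& c, double rad) : center(c), radius(rad) {}
    virtual bool hit(const Ray& r) const { return sphere_hit(center, radius, r); }
};

// Variant B: the type is known upfront, so hit() is a plain member function.
struct Sphere {
    Vec center; double radius;
    bool hit(const Ray& r) const { return sphere_hit(center, radius, r); }
};

inline std::size_t count_hits_virtual(const std::vector<const Shape*>& shapes, const Ray& r) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < shapes.size(); ++i)
        if (shapes[i]->hit(r)) ++n;      // indirect call through the vtable
    return n;
}

inline std::size_t count_hits_static(const std::vector<Sphere>& spheres, const Ray& r) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < spheres.size(); ++i)
        if (spheres[i].hit(r)) ++n;      // direct call, candidate for inlining
    return n;
}
```

Storing the spheres by value in one contiguous vector is an extra assumption of this sketch; it is one way, not the only way, of keeping the type known at the call site.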

> I chose an inheritance hierarchy because this is the closest C++ equivalent
> to a variant type which I see in typical C++ programs.

> Another optimisation that I made was to replace inheritance with a single
> Scene struct which represented a group of child scenes or a sphere when it
> had no children. I threw this out as well because it is not a fair
> equivalent to OCaml's variant type. For example, you could not specify a
> colour for each sphere without also specifying colours for all bounding
> volumes.

That's totally orthogonal.
Please do yourself a favor and read a good book about C++.

> > and other details
> Can you elaborate on these other "details"?

Speaking strictly of your implementation details (and not algorithm or
anything), you've also failed the const correctness test for example.
But i suppose you're going to say that it's cruft or obfuscation.
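
As a reminder of what const correctness means in practice, a small sketch (the Vec type and the function names are invented for the example): read-only member functions are marked const, read-only parameters are taken by const reference, and only the functions that really mutate are left non-const.

```cpp
// Hypothetical type used only to illustrate the idiom.
struct Vec {
    double x, y, z;

    // Does not modify the object: marked const, so it can be called
    // on const objects and through const references.
    double length2() const { return x * x + y * y + z * z; }

    // Modifies the object: deliberately not const.
    Vec& scale(double s) { x *= s; y *= s; z *= s; return *this; }
};

// Read-only parameters: const references tell both the reader and the
// compiler that the arguments are not changed by the call.
inline double distance2(const Vec& a, const Vec& b) {
    const Vec d = { a.x - b.x, a.y - b.y, a.z - b.z };
    return d.length2();
}
```

Calling scale() on a const Vec is then a compile-time error, which is the point of the exercise.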

> > that force standard compliant c++
> > compilers to pessimize a lot when facing such... hmm.. source.

> Manually implementing pass constant by reference is both obfuscating and
> error-prone whilst being theoretically unnecessary. The fact that the OCaml
> compiler does this optimisation for you when the GNU C++ compiler does not
> might interest some people.

I think you've just discovered that OCaml & C++ are two different
languages that require the programmer to express things in totally
different ways.
And before you propose ways to improve C++ you should learn the
language first.

> Everyone will be free to contribute to the shootout version of the ray
> tracer. Perhaps you would like to contribute a less "fishy" C++
> implementation?

I don't have books to sell.
If/when i write a raytracer, i choose a path that at least has some
chance of performing decently.
See Wald & Havran.
