Google Mail Calendar Documents Reader Web more »
Recently Visited Groups | Help | Sign in
Google Groups Home
Mini ray tracer
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  Messages 1 - 25 of 48 - Collapse all  -  Translate all to Translated (View all originals)   Newer >
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Follow-up To:
Add Cc | Add Follow-up to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers that you hear
 
Jon Harrop  
View profile   Translate to Translated (View Original)
 More options 8 May 2005, 02:27
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Sun, 08 May 2005 02:27:06 +0100
Local: Sun 8 May 2005 02:27
Subject: Mini ray tracer

I recently tried my hand at writing a ray tracer (something I haven't done
for 15 years!) and, in particular, I to write the shortest possible ray
tracer which was still comprehensible and interesting. The initial result
was a 222-line OCaml program which incrementally renders a sphere-flake via
OpenGL:

  http://www.ffconsultancy.com/free/ray_tracer/index.html

I then cut this program down to a 66-line OCaml program and ported it into a
97-line C++ program. These programs are compared on this page:

  http://www.ffconsultancy.com/free/ray_tracer/comparison.html

The cut-down versions don't do reflections, color or such a pretty
sphere-flake and output a greyscale PGM but they do still use hierarchical
spherical bounding volumes to accelerate rendering enough that they can
render scenes containing millions of spheres.

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
asamard...@matf.bg.ac.yu  
View profile   Translate to Translated (View Original)
 More options 8 May 2005, 12:48
Newsgroups: comp.graphics.rendering.raytracing
From: asamard...@matf.bg.ac.yu
Date: 8 May 2005 04:48:11 -0700
Local: Sun 8 May 2005 12:48
Subject: Re: Mini ray tracer
Great work.  What is default floating point type used by OCaml
compiler?  I got running time of about 36s for scene level 3 on
Pentium3 machine and running time of about 25s when all references to
"double" type in C++ code changed to "float" (and float is usually
"good enough" for ray tracing)...

Regards,
Alex


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tbp...@gmail.com  
View profile   Translate to Translated (View Original)
 More options 9 May 2005, 18:51
Newsgroups: comp.graphics.rendering.raytracing
From: tbp...@gmail.com
Date: 9 May 2005 10:51:28 -0700
Local: Mon 9 May 2005 18:51
Subject: Re: Mini ray tracer
Jon Harrop wrote:
> I then cut this program down to a 66-line OCaml program and ported it
into a
> 97-line C++ program. These programs are compared on this page:

>   http://www.ffconsultancy.com/free/ray_tracer/comparison.html

Is that comparison supposed to be fair in any way whatsoever?

Somewhere on an opteron 146 running debian64:
# g++ -g -O3 ray.cpp -o ray
# time ./ray >tt

real    0m13.959s
user    0m13.946s
sys     0m0.009s

# objdump -x ray
[2mn later]
# diff ray_orig.cpp ray.cpp
12,16c12,18
< Vec operator+(Vec a, Vec b) { return Vec(a.x + b.x, a.y + b.y, a.z +
b.z); }
< Vec operator-(Vec a, Vec b) { return Vec(a.x - b.x, a.y - b.y, a.z -
b.z); }
< Vec operator*(double a, Vec b) { return Vec(a * b.x, a * b.y, a *
b.z); }
< double dot(Vec a, Vec b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
< Vec unitise(Vec a) { return (1 / sqrt(dot(a, a))) * a; }
---

> inline Vec operator+(const Vec &a, const Vec &b) { return Vec(a.x +

b.x, a.y + b.y, a.z + b.z); }
> inline Vec operator-(const Vec &a, const Vec &b) { return Vec(a.x -

b.x, a.y - b.y, a.z - b.z); }
> inline Vec operator*(const double a, const Vec &b) { return Vec(a *

b.x, a * b.y, a * b.z); }
> inline double dot(const Vec &a, const Vec &b) { return a.x*b.x +

a.y*b.y + a.z*b.z; }
> inline Vec unitise(const Vec &a) { return (1 / sqrt(dot(a, a))) * a;
}

# g++ -g -O3 ray.cpp -o ray
# time ./ray >tt

real    0m10.493s
user    0m10.483s
sys     0m0.005s

If your point was to show how to write unbelievably inneficient code in
C++, you've succeeded.
Can't wait for a C++ vs Java comparison.


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Matt Pharr  
View profile   Translate to Translated (View Original)
 More options 9 May 2005, 19:37
Newsgroups: comp.graphics.rendering.raytracing
From: Matt Pharr <m...@pharr.org>
Date: Mon, 09 May 2005 18:37:59 GMT
Local: Mon 9 May 2005 19:37
Subject: Re: Mini ray tracer

tbp...@gmail.com writes:
> Jon Harrop wrote:
>> I then cut this program down to a 66-line OCaml program and ported it
> into a
>> 97-line C++ program. These programs are compared on this page:

>>   http://www.ffconsultancy.com/free/ray_tracer/comparison.html
> Is that comparison supposed to be fair in any way whatsoever?

> [demonstration that "inline" vector math makes C++ programs faster
> elided]

> If your point was to show how to write unbelievably inneficient code in
> C++, you've succeeded.

I think the point was to have some fun and to make a little demo that shows
how a functional language like OCaml can give rise to nice short expressive
programs while still delivering competitive performance.  And for those
goals, the OP succeeded!

Note also that he mentions on his web page that he's doing this for "the
computer language shootout benchmarks", http://shootout.alioth.debian.org/.
That page clearly points out all the caveats around benchmarks like these
(to the point that they can be effectively meaningless in  the real world.)

Specifically:

>> about The Language Shootout Benchmarks

>>Our goals are to learn about programming languages, compare their
>>performance in various (possibly meaningless) ways and, most importantly,
>>have some fun!

I've at least had some fun looking at his implementation!

-matt
--
Matt Pharr    m...@pharr.org    <URL:http://graphics.stanford.edu/~mmp>
=======================================================================
In a cruel and evil world, being cynical can allow you to get some
entertainment out of it. --Daniel Waters


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tbp  
View profile   Translate to Translated (View Original)
 More options 9 May 2005, 23:51
Newsgroups: comp.graphics.rendering.raytracing
From: "tbp" <tbp...@gmail.com>
Date: 9 May 2005 15:51:11 -0700
Local: Mon 9 May 2005 23:51
Subject: Re: Mini ray tracer
Matt Pharr wrote:
> > [demonstration that "inline" vector math makes C++ programs faster
> > elided]

You've missed the point.
a) he's telling the OCaml compiler to inline everything it can while
not giving its c++ counterpart opportunities to do so; fairness?
b) more importantly he's passing large structures by _value_ and i've
turned them into _references_.

In fact if you do the same for other hotpath functions as in:
19c21
<   virtual void intersect(double &, Vec &, Ray) = 0;
---

>   virtual void intersect(double &, Vec &, const Ray &) = 0;

26c28
<   double ray_sphere(Ray ray) { // Intersection of a ray with a sphere
---
>   double ray_sphere(const Ray &ray) { // Intersection of a ray with a

sphere
35c37
<   void intersect(double &lambda, Vec &normal, Ray ray) {
---
>   void intersect(double &lambda, Vec &normal, const Ray &ray) {

50c52
<   void intersect(double &lambda, Vec &normal, Ray ray) {
---

>   void intersect(double &lambda, Vec &normal, const Ray &ray) {

then the c++ version becomes 50% faster than originally (and in fact,
unconditionally faster than the OCaml version), without adding a single
line.

You'll notice i haven't even fixed the gratitious use of virtual
functions, and other details that force standard compliant c++
compilers to pessimize a lot when facing such... hmm.. source.

I'm sure OCaml is a wonderful & expressive language, but crafting such
a fishy c++ equivalent to make it shine (because you have an agenda) is
a despicable practice.

> > If your point was to show how to write unbelievably inneficient
code in
> > C++, you've succeeded.

> I think the point was to have some fun and to make a little demo that
shows
> how a functional language like OCaml can give rise to nice short
expressive
> programs while still delivering competitive performance.  And for
those
> goals, the OP succeeded!

> Note also that he mentions on his web page that he's doing this for
"the
> computer language shootout benchmarks",

http://shootout.alioth.debian.org/.
> That page clearly points out all the caveats around benchmarks like
these
> (to the point that they can be effectively meaningless in  the real

world.)
I have nothing against such benchmarks when they are not clearly
rigged.

But i guess the main point of the whole operation wasn't fair
benchmarking, education, or fun but selling books: "This program has
demonstrated some of the ways that OCaml improves upon C++. For a
thorough introduction to OCaml, read our book "Objective CAML for
Scientists".

Or put another way, that's a sad PR stunt.

Have a nice day.


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile   Translate to Translated (View Original)
 More options 10 May 2005, 04:33
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Tue, 10 May 2005 04:33:46 +0100
Local: Tues 10 May 2005 04:33
Subject: Re: Mini ray tracer

tbp wrote:
> Matt Pharr wrote:
>> > [demonstration that "inline" vector math makes C++ programs faster
>> > elided]
> You've missed the point.

My post is more about clarity and less about performance. Your proposed
alterations make the C++ implementation significantly longer and more
obfuscated and your littering of the source code with "inline" actually
slows the program down on my Athlon t-bird.

Moreover, your choices of initial optimisation are decidedly suboptimal. I'd
have gone for:

1. Terminating the intersection of shadow rays when the first intersection
is found (instead of finding the closest intersection).

2. Use an implicit scene, to avoid storing it explicitly.

3. Use single-precision storage (or no storage at all).

Both of these optimisations will give much bigger performance improvements
when averaged over different platforms and architectures.

As it happens, I had already implemented all of your optimisations and all
of these optimisations to both implementations before I made my original
post. None of them add anything new. Sometimes C++ is faster, sometimes
OCaml is faster. OCaml is always much more succinct. You can keep doing
this until you are blue in the face but I don't think you'll learn much of
interest.

> a) he's telling the OCaml compiler to inline everything it can while
> not giving its c++ counterpart opportunities to do so; fairness?

Both ocamlopt and g++ were allowed to inline code.

As OCaml is for symbolic use, it performs no inlining by default (unlike
g++) so it is common to ask OCaml to do some inlining on numerical code.
Thus, specifying "-inline 100" actually makes for a fairer comparison.

> b) more importantly he's passing large structures by _value_ and i've
> turned them into _references_.

Both implementations use pass by value as this is clearer, shorter and (as a
consequence) more common in real code.

> In fact if you do the same for other hotpath functions as in:
> ...
> then the c++ version becomes 50% faster than originally

If you want a fair comparison then you should also make equivalent changes
to the OCaml implementation.

> (and in fact, unconditionally faster than the OCaml version),

You are trying to compare optimised C++ against unoptimised OCaml, which
would be unfair. More worryingly, you seem to have skipped the part of the
experiment where you actually measure something.

> without adding a single line.

When restricted to 80 columns, your optimisations add several lines. Indeed,
they push the C++ program over 100 LOC limit imposed by the creators of the
shootout.

In fact, I'd already implemented your optimisations (and many more effective
optimisations) and had to throw them out because of this limit. Note that
the OCaml has 30 more lines with which to optimise. My optimised <100 LOC
OCaml program is much faster than my optimised <100 LOC C++ on all
platforms.

> You'll notice i haven't even fixed the gratitious use of virtual
> functions,

What would you recommend instead?

I chose an inheritance hierarchy because this is the closest C++ equivalent
to a variant type which I see in typical C++ programs.

Another optimisation that I made was to replace inheritance with a single
Scene struct which represented a group of child scenes or a sphere when it
had no children. I threw this out as well because it is not a fair
equivalent to OCaml's variant type. For example, you could not specify a
colour for each sphere without also specifying colours for all bounding
volumes.

> and other details

Can you elaborate on these other "details"?

> that force standard compliant c++  
> compilers to pessimize a lot when facing such... hmm.. source.

Manually implementing pass constant by reference is both obfuscating and
error-prone whilst being theoretically unnecessary. The fact that the OCaml
compiler does this optimisation for you when the GNU C++ compiler does not
might interest some people.

> I'm sure OCaml is a wonderful & expressive language, but crafting such
> a fishy c++ equivalent to make it shine (because you have an agenda) is
> a despicable practice.

Everyone will be free to contribute to the shootout version of the ray
tracer. Perhaps you would like to contribute a less "fishy" C++
implementation?

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile   Translate to Translated (View Original)
 More options 10 May 2005, 04:52
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Tue, 10 May 2005 04:52:38 +0100
Local: Tues 10 May 2005 04:52
Subject: Re: Mini ray tracer

asamard...@matf.bg.ac.yu wrote:
> Great work.  What is default floating point type used by OCaml
> compiler?

Double precision.

> I got running time of about 36s for scene level 3 on
> Pentium3 machine and running time of about 25s when all references to
> "double" type in C++ code changed to "float" (and float is usually
> "good enough" for ray tracing)...

I agree that single precision is usually good enough for ray tracing.
However, when I tried the same optimisation I found that it made the
program much faster on 32-bit x86 and much slower on 64-bit AMD64. I assume
this is a float alignment issue.

Also, it generates (numerically) different results, which makes it trickier
to validate against the OCaml.

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Warp  
View profile   Translate to Translated (View Original)
 More options 10 May 2005, 09:20
Newsgroups: comp.graphics.rendering.raytracing
From: Warp <w...@cs.tut.fi>
Date: Tue, 10 May 2005 08:20:17 +0000 (UTC)
Local: Tues 10 May 2005 09:20
Subject: Re: Mini ray tracer

Jon Harrop <use...@jdh30.plus.com> wrote:
> My post is more about clarity and less about performance. Your proposed
> alterations make the C++ implementation significantly longer and more
> obfuscated and your littering of the source code with "inline" actually
> slows the program down on my Athlon t-bird.

  Then your compiler is stupid. Get a better one.

  The keyword 'inline' is not a command for the compiler that says it must
inline the function or else it will be deleted immediately. And the other
way around, the lack of 'inline' keyword by no means stops the compiler
from inlining the function if it can and it estimates it would be useful.

  In *theory* 'inline' is a hint to the compiler. In practice it has nothing
to do with inlining. Most of the modern compilers will simply ignore that
keyword when they are estimating whether a function should be inlined or
not. In this context 'inline' is a no-op. It's like the 'register'
keyword which is nowadays obsolete and a no-op.

  What 'inline' *does* have some effect on is the linker. If a function
with an identical declaration is implemented in more than one place
then in normal circumstances the linker will give an error (because it
can't know which one of them it should use). However, if the functions
were declared 'inline', then the compiler will tell the linker that
all of those functions are actually one and the same, and then the linker
will just use one of them.

  And by the way, if you are afraid of "litter" in C++ code, then you
should switch to another programming language.
  That "litter", as you call it, is not useless junk as you seem to imply.
It has a reason and a significance.

> > b) more importantly he's passing large structures by _value_ and i've
> > turned them into _references_.
> Both implementations use pass by value as this is clearer, shorter and (as a
> consequence) more common in real code.

  I hope you are not passing megabyte-sized vectors by value...

  If your opinion of "clearer" and "shorter" is that the & symbol should
not be used, then you might want to either check your priorities or
change your programming language (because C++ is not for you).

> If you want a fair comparison then you should also make equivalent changes
> to the OCaml implementation.

  You want to compare which programming language runs inefficient code
faster?

> > You'll notice i haven't even fixed the gratitious use of virtual
> > functions,
> What would you recommend instead?
> I chose an inheritance hierarchy because this is the closest C++ equivalent
> to a variant type which I see in typical C++ programs.

  Inheritance has no overhead in C++. Dynamic binding has.
  While virtual function calls are pretty fast they are still slightly
slower than regular function calls. If you want maximum speed you should
not make time-critical functions virtual unless you are really forced to.

> Another optimisation that I made was to replace inheritance with a single
> Scene struct which represented a group of child scenes or a sphere when it
> had no children.

  Inheritance causes no overhead. That kind of optimization is useless.

--
                                                          - Warp


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tbp  
View profile   Translate to Translated (View Original)
 More options 10 May 2005, 11:36
Newsgroups: comp.graphics.rendering.raytracing
From: "tbp" <tbp...@gmail.com>
Date: 10 May 2005 03:36:54 -0700
Local: Tues 10 May 2005 11:36
Subject: Re: Mini ray tracer
Jon Harrop wrote:
> tbp wrote:
> My post is more about clarity and less about performance. Your

proposed
Oh? Silly me, i thought it was a benchmark.

> alterations make the C++ implementation significantly longer and more
> obfuscated and your littering of the source code with "inline"
actually
> slows the program down on my Athlon t-bird.

I've put explicit inlines because, ie for Vec operators, you've put
them outside of the class scope, hence screwing the common heuristic
saying that a member function declared & defined within the class body
should be inlined (and that's a much stronger hint than using an
'inline' directive).
Anyway, that's decoration compared to other issues in your source.

> Moreover, your choices of initial optimisation are decidedly
suboptimal. I'd
> have gone for:

Hmm? I tinker with a handful of lines, cut the runtime in half and
that's suboptimal?
Interesting.

> 1. Terminating the intersection of shadow rays when the first
intersection
> is found (instead of finding the closest intersection).

> 2. Use an implicit scene, to avoid storing it explicitly.

> 3. Use single-precision storage (or no storage at all).

> Both of these optimisations will give much bigger performance
improvements
> when averaged over different platforms and architectures.

Your aproach & space partition (as presented) will never give runtime
performance worth the time it takes to implement efficiently.
Such a sphere flake should render in a blink of the eye on modern
platforms, see the SPD for reference.
Or this: http://www.uni-koblenz.de/~cg/publikationen/cp_raytrace.pdf
I've merely fixed the most glaring flaws in your implementation.
Period.

> As it happens, I had already implemented all of your optimisations
and all
> of these optimisations to both implementations before I made my
original
> post. None of them add anything new. Sometimes C++ is faster,
sometimes
> OCaml is faster. OCaml is always much more succinct. You can keep
doing
> this until you are blue in the face but I don't think you'll learn
much of
> interest.

I've never contested that OCaml is more succint, expressive or elegant.
I'm just saying that, given the horrible c++ implementation you've
presented, you have absolutely no authority or qualification to say
anything about the relative merit of c++ & OCaml performance wise.

> Both ocamlopt and g++ were allowed to inline code.

It's all about exposing oportunities to do so.

> As OCaml is for symbolic use, it performs no inlining by default
(unlike
> g++) so it is common to ask OCaml to do some inlining on numerical
code.
> Thus, specifying "-inline 100" actually makes for a fairer

comparison.
No. You've used none of the c++ idiom that actually helps a c++
compiler to guess what to inline or not.
Know your compiler (or in that case, your language).

> Both implementations use pass by value as this is clearer, shorter
and (as a
> consequence) more common in real code.

Absolute non sense.
I can't beleive you really think that, but if that's the case, then you
should get back to your homework.

> If you want a fair comparison then you should also make equivalent
changes
> to the OCaml implementation.

You're the one trying to sell books about OCaml...

> You are trying to compare optimised C++ against unoptimised OCaml,
which
> would be unfair. More worryingly, you seem to have skipped the part
of the
> experiment where you actually measure something.

I've provided patches. See for yourself.

> When restricted to 80 columns, your optimisations add several lines.
Indeed,
> they push the C++ program over 100 LOC limit imposed by the creators
of the
> shootout.

You want to fit in those artificial constraints? Shorten names or
define some macros.
What was the fuss about already?

> In fact, I'd already implemented your optimisations (and many more
effective
> optimisations) and had to throw them out because of this limit. Note
that
> the OCaml has 30 more lines with which to optimise. My optimised <100
LOC
> OCaml program is much faster than my optimised <100 LOC C++ on all
> platforms.

Sure. I mean you've shown such a level of C++ wizardry that we're
supposed to take your word for it?

> > You'll notice i haven't even fixed the gratitious use of virtual
> > functions,

> What would you recommend instead?

Not using virtual functions in the hotpath.
Primo, that's uncalled for as your "hierarchy" doesn't mix types at a
given level, each object's type is known upfront.
Secundo, virtual functions are expensive because they forbid inlining
and are implemented as indirect branches , like:
  401c83:       callq  *0x8(%rax)
It doesn't make sense to use an expensive feature in the hotpath if you
can avoid it.
And you definitely can.

> I chose an inheritance hierarchy because this is the closest C++
equivalent
> to a variant type which I see in typical C++ programs.

> Another optimisation that I made was to replace inheritance with a
single
> Scene struct which represented a group of child scenes or a sphere
when it
> had no children. I threw this out as well because it is not a fair
> equivalent to OCaml's variant type. For example, you could not
specify a
> colour for each sphere without also specifying colours for all
bounding
> volumes.

That's totally orthogonal.
Please do yourself a favor and read a good book about C++.

> > and other details
> Can you elaborate on these other "details"?

Strictly speaking of your implementation details (and not algorithm or
anything), you've also failed the const correctness test for example.
But i suppose you're going to say say that it's cruft or obfuscation.

> > that force standard compliant c++
> > compilers to pessimize a lot when facing such... hmm.. source.

> Manually implementing pass constant by reference is both obfuscating
and
> error-prone whilst being theoretically unnecessary. The fact that the
OCaml
> compiler does this optimisation for you when the GNU C++ compiler
does not
> might interest some people.

I think you've just discovered that OCaml & C++ are two different
language that require the programmer to express things in a totally
different way.
And before you propose ways to improve C++ you should learn the
language first.

> Everyone will be free to contribute to the shootout version of the
ray
> tracer. Perhaps you would like to contribute a less "fishy" C++
> implementation?

I don't have books to sell.
If/When i write a raytracer, i choose a path that at least has some
chances to perform decently.
See Wald & Havran.

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile   Translate to Translated (View Original)
 More options 10 May 2005, 16:17
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Tue, 10 May 2005 16:17:23 +0100
Local: Tues 10 May 2005 16:17
Subject: Re: Mini ray tracer

Warp wrote:
> Jon Harrop <use...@jdh30.plus.com> wrote:
>> My post is more about clarity and less about performance. Your proposed
>> alterations make the C++ implementation significantly longer and more
>> obfuscated and your littering of the source code with "inline" actually
>> slows the program down on my Athlon t-bird.

>   Then your compiler is stupid. Get a better one.
> ...

I agree that the compiler is stupid but I'm comparing g++ as this is a
common compiler. A better C++ compiler is likely to produce faster code, of
course.

>   And by the way, if you are afraid of "litter" in C++ code, then you
> should switch to another programming language.

Absolutely. :-)

>   That "litter", as you call it, is not useless junk as you seem to imply.
> It has a reason and a significance.

It also has unexpected platform dependent results, as "tbp" has discovered.

>> > b) more importantly he's passing large structures by _value_ and i've
>> > turned them into _references_.

>> Both implementations use pass by value as this is clearer, shorter and
>> (as a consequence) more common in real code.

>   I hope you are not passing megabyte-sized vectors by value...

No, they are small, constant-sized vectors (3D), so pass by value does not
affect the asymptotic algorithmic complexity.

>   If your opinion of "clearer" and "shorter" is that the & symbol should
> not be used, then you might want to either check your priorities or
> change your programming language (because C++ is not for you).

Unfortunately, you cannot just use "&" because it will take a reference to a
temporary, giving an error like:

ray.cpp: In member function `virtual void Sphere::intersect(double&, Vec&,
Ray)
   ':
ray.cpp:39: error: no match for 'operator+' in 'ray.Ray::orig +
   operator*(double, Vec)(((&ray) + 24))'
ray.cpp:12: error: candidates are: Vec operator+(Vec&, Vec&)
/usr/include/c++/3.3/bits/stl_bvector.h:261: error:
   std::_Bit_const_iterator std::operator+(int, const
   std::_Bit_const_iterator&)
/usr/include/c++/3.3/bits/stl_bvector.h:202: error:
   std::_Bit_iterator std::operator+(int, const std::_Bit_iterator&)
ray.cpp: In function `double ray_trace(double, Vec, Ray, Scene*)':
ray.cpp:62: error: no match for 'operator+' in 'ray.Ray::orig +
   operator*(double, Vec)(((&ray) + 24))'
ray.cpp:12: error: candidates are: Vec operator+(Vec&, Vec&)
/usr/include/c++/3.3/bits/stl_bvector.h:261: error:
   std::_Bit_const_iterator std::operator+(int, const
   std::_Bit_const_iterator&)
/usr/include/c++/3.3/bits/stl_bvector.h:202: error:
   std::_Bit_iterator std::operator+(int, const std::_Bit_iterator&)

You must either use "const ... &" instead, or make sure that all values are
created explicitly as local variables in the caller (rather than being
inline subexpressions in the function call).

It is precisely this obfuscation (and the fact that these are tiny
structures, and that this is not necessarily an optimisation, and that it
could be done by the compiler) that made me choose pass by value.

>> If you want a fair comparison then you should also make equivalent
>> changes to the OCaml implementation.

>   You want to compare which programming language runs inefficient code
> faster?

If you completely optimise both implementations then you'll end up with the
same assembler. There is only value in comparing simpler implementations in
both languages. So yes, they will not be completely optimised but
subjectively "like" real code.

>> I chose an inheritance hierarchy because this is the closest C++
>> equivalent to a variant type which I see in typical C++ programs.

>   Inheritance has no overhead in C++. Dynamic binding has.
>   While virtual function calls are pretty fast they are still slightly
> slower than regular function calls. If you want maximum speed you should
> not make time-critical functions virtual unless you are really forced to.

I believe I am "really forced to" as virtual lookup is used instead of
switching on the tag of the variant type/union. In C++, virtual functions
are more elegant (and more common in real code) than C-style tagged unions.

>> Another optimisation that I made was to replace inheritance with a single
>> Scene struct which represented a group of child scenes or a sphere when
>> it had no children.

>   Inheritance causes no overhead. That kind of optimization is useless.

Getting rid of inheritance gets rid of the virtual functions. However, it
necessarily incurs an equivalent test somewhere else.

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile   Translate to Translated (View Original)
 More options 10 May 2005, 16:28
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Tue, 10 May 2005 16:28:30 +0100
Local: Tues 10 May 2005 16:28
Subject: Re: Mini ray tracer

tbp wrote:
>> When restricted to 80 columns, your optimisations add several lines.
> Indeed,
>> they push the C++ program over 100 LOC limit imposed by the creators
> of the
>> shootout.
> You want to fit in those artificial constraints?

Yes, it actually says that at the top of the web page.

> Shorten names or define some macros. What was the fuss about already?

If you recall, the fuss was about clarity. So shortening names and defining
macros doesn't really seem appropriate.

>> In fact, I'd already implemented your optimisations (and many more
> effective
>> optimisations) and had to throw them out because of this limit. Note
> that
>> the OCaml has 30 more lines with which to optimise. My optimised <100
> LOC
>> OCaml program is much faster than my optimised <100 LOC C++ on all
>> platforms.
> Sure. I mean you've shown such a level of C++ wizardry that we're
> supposed to take your word for it?

I'd be very happy to include a better C++ version, if you have one.

>> > You'll notice i haven't even fixed the gratitious use of virtual
>> > functions,

>> What would you recommend instead?
> Not using virtual functions in the hotpath.
> Primo, that's uncalled for as your "hierarchy" doesn't mix types at a
> given level, each object's type is known upfront.

So you're talking about specialising the scene tree so that it can only
represent this scene, rather than scenes in general?

> Secundo, virtual functions are expensive because they forbid inlining
> and are implemented as indirect branches , like:
>   401c83:       callq  *0x8(%rax)
> It doesn't make sense to use an expensive feature in the hotpath if you
> can avoid it.
> And you definitely can.

Actually, this is another optimisation that I had already done. It makes no
significant difference to the performance because this isn't actually on
the "hotpath", as you call it. Most of the time is spent elsewhere.

>> > and other details
>> Can you elaborate on these other "details"?
> Strictly speaking of your implementation details (and not algorithm or
> anything), you've also failed the const correctness test for example.
> But i suppose you're going to say say that it's cruft or obfuscation.

Yes, that was also done deliberately to squeeze the C++ into 100 LOC.

>> Everyone will be free to contribute to the shootout version of the
> ray
>> tracer. Perhaps you would like to contribute a less "fishy" C++
>> implementation?
> I don't have books to sell.
> If/When i write a raytracer, i choose a path that at least has some
> chances to perform decently.
> See Wald & Havran.

I didn't think so. :-)

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile   Translate to Translated (View Original)
 More options 10 May 2005, 16:42
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Tue, 10 May 2005 16:42:28 +0100
Local: Tues 10 May 2005 16:42
Subject: Re: Mini ray tracer

tbp wrote:
> Such a sphere flake should render in a blink of the eye on modern
> platforms, see the SPD for reference.
> Or this: http://www.uni-koblenz.de/~cg/publikationen/cp_raytrace.pdf

Although I'm sure that a production ray tracer will be a lot faster than my
tutorial ray tracer, the paper you cite describes a high-level program
which is not much faster than mine. The faster alternative is actually
written in assembler (using SIMD) and also fails to reach "interactive
frame rates" on such scenes.

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Warp  
View profile   Translate to Translated (View Original)
 More options 10 May 2005, 18:03
Newsgroups: comp.graphics.rendering.raytracing
From: Warp <w...@cs.tut.fi>
Date: Tue, 10 May 2005 17:03:21 +0000 (UTC)
Local: Tues 10 May 2005 18:03
Subject: Re: Mini ray tracer

Jon Harrop <use...@jdh30.plus.com> wrote:
> I agree that the compiler is stupid but I'm comparing g++ as this is a
> common compiler. A better C++ compiler is likely to produce faster code, of
> course.

  Are you using the optimization flags for your platform?

> Unfortunately, you cannot just use "&" because it will take a reference to a
> temporary, giving an error like:

  const &

  You very seldom need non-const references.

> >   Inheritance has no overhead in C++. Dynamic binding has.
> >   While virtual function calls are pretty fast they are still slightly
> > slower than regular function calls. If you want maximum speed you should
> > not make time-critical functions virtual unless you are really forced to.
> I believe I am "really forced to" as virtual lookup is used instead of
> switching on the tag of the variant type/union. In C++, virtual functions
> are more elegant (and more common in real code) than C-style tagged unions.

  I was not suggesting a "tagged union" as an alternative. That would
actually be slower than a dynamically bound class.
  I was suggesting as an alternative a different design where you do not
need dynamic binding.

> >   Inheritance causes no overhead. That kind of optimization is useless.
> Getting rid of inheritance gets rid of the virtual functions.

  Well, yes, but you are using the wrong means. You don't need to drop
inheritance simply to stop using dynamic binding (for the latter you
simply need to drop the 'virtual' keywords; inheritance has nothing
to do with that).

--
                                                          - Warp


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tbp  
View profile   Translate to Translated (View Original)
 More options 10 May 2005, 18:35
Newsgroups: comp.graphics.rendering.raytracing
From: "tbp" <tbp...@gmail.com>
Date: 10 May 2005 10:35:56 -0700
Local: Tues 10 May 2005 18:35
Subject: Re: Mini ray tracer
Jon Harrop wrote:
> tbp wrote:
> > Such a sphere flake should render in a blink of the eye on modern
> > platforms, see the SPD for reference.
> > Or this:

http://www.uni-koblenz.de/~cg/publikationen/cp_raytrace.pdf

> Although I'm sure that a production ray tracer will be a lot faster
than my
> tutorial ray tracer, the paper you cite describes a high-level
program
> which is not much faster than mine. The faster alternative is

actually
*cough*

> written in assembler (using SIMD) and also fails to reach
"interactive
> frame rates" on such scenes.

Not assembler, intrinsics.
And the whole thing is not about SIMD but ray coherence & BVH which are
notions that have nothing to do with a specific platform.
It just happens to fit into the SIMD model quite well.

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tbp  
View profile   Translate to Translated (View Original)
(1 user)  More options 10 May 2005, 18:44
Newsgroups: comp.graphics.rendering.raytracing
From: "tbp" <tbp...@gmail.com>
Date: 10 May 2005 10:44:21 -0700
Local: Tues 10 May 2005 18:44
Subject: Re: Mini ray tracer
Jon Harrop wrote:
> > You want to fit in those artificial constraints?
> Yes, it actually says that at the top of the web page.
> > Shorten names or define some macros. What was the fuss about
already?
> If you recall, the fuss was about clarity. So shortening names and
defining
> macros doesn't really seem appropriate.

If i sum it up, nothing's appropriate but what you did which
incidentally happens to be the worst way to implement it in C++.

> I'd be very happy to include a better C++ version, if you have one.

Better in what metric? I'm cautious because it seems we live in two
distinct reality.

> So you're talking about specialising the scene tree so that it can
only
> represent this scene, rather than scenes in general?

Scenes in general?
Please correct me if i'm wrong but it seems that your program is about
a sphere BVH enclosing... spheres.
Is that your idea of generality?
So, please focus on the topic at hand and you'll notice that using
dynamic binding is superfluous.

> > Secundo, virtual functions are expensive because they forbid
inlining
> > and are implemented as indirect branches , like:
> >   401c83:       callq  *0x8(%rax)
> > It doesn't make sense to use an expensive feature in the hotpath if
you
> > can avoid it.
> > And you definitely can.

> Actually, this is another optimisation that I had already done. It
makes no
> significant difference to the performance because this isn't actually
on
> the "hotpath", as you call it. Most of the time is spent elsewhere.

I see:
struct Scene { // Abstract base class representing a scene
  virtual void intersect(double &, Vec &, Ray) = 0;

So you're saying intersections aren't part of the hotpath?
Well you maybe right, after all it's only called a few million times.

> >> Can you elaborate on these other "details"?
> > Strictly speaking of your implementation details (and not algorithm
or
> > anything), you've also failed the const correctness test for
example.
> > But i suppose you're going to say say that it's cruft or
obfuscation.

> Yes, that was also done deliberately to squeeze the C++ into 100 LOC.

Bingo!

And i suppose that the fact that this emasculated C++ version performed
slower than its OCaml counterpart was merely a side effect?

What's next? You're going to try to sell me a bridge?
Oops. A book first, then a bridge.

> I didn't think so. :-)

Oh! That was a test and i failed it?
Damn.

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile   Translate to Translated (View Original)
 More options 10 May 2005, 20:39
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Tue, 10 May 2005 20:39:44 +0100
Local: Tues 10 May 2005 20:39
Subject: Re: Mini ray tracer

Warp wrote:
>   Are you using the optimization flags for your platform?

Yes, I've played with various optimisation settings with both compilers on
both platforms. I didn't see anything particularly interesting when
tweaking them. The performance varies by about 20% but the ratio of the
performances stays the same (between OCaml and C++).

>   You very seldom need non-const references.

In point of fact, the program uses non-const references to return multiple
results from functions. IMHO, this is often done in C++ (and pass by
non-const pointer in C) because it is too tedious to define a new type for
the return values of each function in C/C++.

As there are currently only two values being returned by the intersect
routine, I could have used STL pair<> instead. I might investigate that (I
think it will be simpler but slower). I have seen code which uses
pair<pair<...>, ...> to return three values but I don't much like it.

>> I believe I am "really forced to" as virtual lookup is used instead of
>> switching on the tag of the variant type/union. In C++, virtual functions
>> are more elegant (and more common in real code) than C-style tagged
>> unions.

>   I was not suggesting a "tagged union" as an alternative. That would
> actually be slower than a dynamically bound class.
>   I was suggesting as an alternative a different design where you do not
> need dynamic binding.

I have actually already done this "optimisation" because I also thought it
would speed things up. It turns out that the virtual lookup is taking an
insignificant amount of time. I ditched the resulting code because it is
less general and less clear whilst being no faster. It is significantly
shorter though.

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile   Translate to Translated (View Original)
(1 user)  More options 10 May 2005, 20:54
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Tue, 10 May 2005 20:54:08 +0100
Local: Tues 10 May 2005 20:54
Subject: Re: Mini ray tracer

tbp wrote:
> Jon Harrop wrote:
>> If you recall, the fuss was about clarity. So shortening names and
>> defining
>> macros doesn't really seem appropriate.
> If i sum it up, nothing's appropriate but what you did which
> incidentally happens to be the worst way to implement it in C++.

Perhaps there is a better way but I haven't seen it.

>> I'd be very happy to include a better C++ version, if you have one.
> Better in what metric? I'm cautious because it seems we live in two
> distinct reality.

Better => Faster whilst remaining clear and satisfying the requirements of
the shootout.

>> So you're talking about specialising the scene tree so that it can
> only
>> represent this scene, rather than scenes in general?
> Scenes in general?
> Please correct me if i'm wrong but it seems that your program is about
> a sphere BVH enclosing... spheres.
> Is that your idea of generality?

In both cases, the types of objects and bounding volumes are extensible.

To add a new kind of object to the C++, you create a new class derived from
Scene which implements its own "intersect" member function.

To add a new kind of object to the OCaml, you supplement the variant type
"scene" with another constructor and supplement the pattern match in the
"intersect" function with a case to handle your new object.

Similarly for new types of bounding volume.

Perhaps it would be interesting to add a new kind of object to both
implementations, to see how this is done in each language?

>> Actually, this is another optimisation that I had already done. It
> makes no
>> significant difference to the performance because this isn't actually
> on
>> the "hotpath", as you call it. Most of the time is spent elsewhere.
> I see:
> struct Scene { // Abstract base class representing a scene
>   virtual void intersect(double &, Vec &, Ray) = 0;

> So you're saying intersections aren't part of the hotpath?
> Well you maybe right, after all it's only called a few million times.

Yes, exactly. It isn't really that surprising when you consider that
"ray_sphere" is called ~10x more than "intersect" and it contains several
branches, lots of floating point maths and function calls.

>> Yes, that was also done deliberately to squeeze the C++ into 100 LOC.
> Bingo!

> And i suppose that the fact that this emasculated C++ version performed
> slower than its OCaml counterpart was merely a side effect?

The C++ outperforms the OCaml on AMD64 (and IA64). I'd say that the
performance is comparable in all cases.

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile   Translate to Translated (View Original)
 More options 10 May 2005, 21:04
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Tue, 10 May 2005 21:04:53 +0100
Local: Tues 10 May 2005 21:04
Subject: Re: Mini ray tracer

tbp wrote:
> Not assembler, intrinsics.
> And the whole thing is not about SIMD but ray coherence & BVH which are
> notions that have nothing to do with a specific platform.
> It just happens to fit into the SIMD model quite well.

Ok. I don't know anything about intrinsics but they don't seem to be
relevant to my comparison of a tutorial ray tracer in C++ and OCaml.

As I've said, if you want to optimise my ray tracer (or any other program)
then I would recommend algorithmic optimisations over low-level ones like
SIMD, cache coherency etc. There are plenty more algorithmic optimisations
to be done (although I've already done the main one - hierarchical bounding
volumes).

In this case, it looks as though you'd want to write specialised C code for
certain platforms. This could be called from C++ or OCaml and it would
speed both programs up by roughly the same amount. This might be a good
idea though, as OCaml is clearly preferable for the higher-level, not-
performance-critical bulk of the code. So you could get rid of C++
altogether. :-)

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Warp  
View profile   Translate to Translated (View Original)
 More options 11 May 2005, 00:52
Newsgroups: comp.graphics.rendering.raytracing
From: Warp <w...@cs.tut.fi>
Date: Tue, 10 May 2005 23:52:20 +0000 (UTC)
Local: Wed 11 May 2005 00:52
Subject: Re: Mini ray tracer

Jon Harrop <use...@jdh30.plus.com> wrote:
> >   You very seldom need non-const references.
> In point of fact, the program uses non-const references to return multiple
> results from functions. IMHO, this is often done in C++ (and pass by
> non-const pointer in C) because it is too tedious to define a new type for
> the return values of each function in C/C++.

  Often? It is usually rare that you have to return several values
from a function so that usually all of them are needed.
  The problem of making one function which returns many values versus
making several functions, each one returning one value, is that when
you only need that one value returning many is useless overhead and
forces you to make clumsy code (you have to create dummy variables to
hold those return variables you don't need and then immediately discard
them).

  When there's a logical reason to return many values at once, returning
a class or struct as the return value of the function may in fact be a
good idea. If you call the function like this:

ReturnValues values = theFunction(whatever);

where ReturnValues is a class or struct, C++ compilers (or at least gcc;
probably also most of the other compilers) will actually make that very
efficiently (they will perform the so-called return value optimization).

  The return value optimization means that, in a code like the above,
the function being called actually builds its return value directly
into 'values' and will not create any temporaries nor will it call any
copy constructors. Perhaps a bit surprisingly this works even if the
function in question is not inline and its implementation is in a
completely separate compilation unit (ie. object file) which is just
linked later to the binary (and in fact it works even if the function
in question is in a dynamically loaded library).

  It might even be that the return value optimization technique produces
slightly faster code than the function returning its values by writing
to references given to it as parameters.

> As there are currently only two values being returned by the intersect
> routine, I could have used STL pair<> instead. I might investigate that (I
> think it will be simpler but slower). I have seen code which uses
> pair<pair<...>, ...> to return three values but I don't much like it.

  How about just creating your own struct or class?

> I have actually already done this "optimisation" because I also thought it
> would speed things up. It turns out that the virtual lookup is taking an
> insignificant amount of time.

  Yes, virtual function calls are very fast in C++. However, what is fastest
depends on what you are comparing it to.
  If you are comparing it to an equivalent functionality implemented with
a big switch-case block, then you should definitely use the virtual
functions instead.

  However, if you can design the program so that the function to be
called can be resolved at compilation time (instead of at runtime as
happens with virtual functions or switch-case blocks) then you might
get a slight speedup, specially if the function is short and called
millions of times.

--
                                                          - Warp


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile   Translate to Translated (View Original)
 More options 11 May 2005, 01:12
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Wed, 11 May 2005 01:12:06 +0100
Local: Wed 11 May 2005 01:12
Subject: Re: Mini ray tracer

Warp wrote:
> Jon Harrop <use...@jdh30.plus.com> wrote:
>> In point of fact, the program uses non-const references to return
>> multiple results from functions. IMHO, this is often done in C++ (and
>> pass by non-const pointer in C) because it is too tedious to define a new
>> type for the return values of each function in C/C++.

>   Often? It is usually rare that you have to return several values
> from a function so that usually all of them are needed.

It may seem rare when you use a programming language that makes it
difficult, but lots of useful functions return multiple values.

>   The problem of making one function which returns many values versus
> making several functions, each one returning one value, is that when
> you only need that one value returning many is useless overhead and
> forces you to make clumsy code (you have to create dummy variables to
> hold those return variables you don't need and then immediately discard
> them).

In the case of the "intersect" function, you would not split it into two
functions to compute the parameter and normal vector of the intersection,
for example.

>   When there's a logical reason to return many values at once, returning
> a class or struct as the return value of the function may in fact be a
> good idea. If you call the function like this:

> ReturnValues values = theFunction(whatever);

> where ReturnValues is a class or struct, C++ compilers (or at least gcc;
> probably also most of the other compilers) will actually make that very
> efficiently (they will perform the so-called return value optimization).

Creating structs for the return values of every function that returns
multiple values may be a good idea, but the verbosity and clumsiness of
doing this in C++ makes it quite undesirable.

>   The return value optimization means that...

If it is performed, yes.

>> As there are currently only two values being returned by the intersect
>> routine, I could have used STL pair<> instead. I might investigate that
>> (I think it will be simpler but slower). I have seen code which uses
>> pair<pair<...>, ...> to return three values but I don't much like it.

>   How about just creating your own struct or class?

That would generate quite a bit more code and would probably break the 100
LOC limit.

>   However, if you can design the program so that the function to be
> called can be resolved at compilation time (instead of at runtime as
> happens with virtual functions or switch-case blocks) then you might
> get a slight speedup, specially if the function is short and called
> millions of times.

I do not believe that can be done in this case (without breaking generality
and the equivalence with the OCaml implementation).

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Warp  
View profile   Translate to Translated (View Original)
 More options 11 May 2005, 10:49
Newsgroups: comp.graphics.rendering.raytracing
From: Warp <w...@cs.tut.fi>
Date: Wed, 11 May 2005 09:49:23 +0000 (UTC)
Local: Wed 11 May 2005 10:49
Subject: Re: Mini ray tracer

Jon Harrop <use...@jdh30.plus.com> wrote:
> Creating structs for the return values of every function that returns
> multiple values may be a good idea, but the verbosity and clumsiness of
> doing this in C++ makes it quite undesirable.

  You mean modular?

  Have you ever heard of abstraction?

--
                                                          - Warp


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tbp  
View profile   Translate to Translated (View Original)
(1 user)  More options 11 May 2005, 11:59
Newsgroups: comp.graphics.rendering.raytracing
From: "tbp" <tbp...@gmail.com>
Date: 11 May 2005 03:59:18 -0700
Local: Wed 11 May 2005 11:59
Subject: Re: Mini ray tracer
Jon Harrop wrote:
> tbp wrote:
> > If i sum it up, nothing's appropriate but what you did which
> > incidentally happens to be the worst way to implement it in C++.

> Perhaps there is a better way but I haven't seen it.

> >> I'd be very happy to include a better C++ version, if you have
one.
> > Better in what metric? I'm cautious because it seems we live in two
> > distinct reality.

> Better => Faster whilst remaining clear and satisfying the
requirements of
> the shootout.

I'll leave non objective criterions aside (how do you measure clarity?)
and, as you insist on discarding my previous patches & measures, i've
taken the time this morning to make this up:
http://ompf.org/ray/miniray.cpp
# grep -e virtual -e inline miniray.cpp
# wc -l miniray.cpp
97 miniray.cpp
# g++ -O2 -ffast-math miniray.cpp -o miniray
# time ./miniray >pix.ppm

real    0m5.862s
user    0m5.800s
sys     0m0.006s
# cmp pix.orig.ppm pix.ppm; echo $?
0
# composite -compose difference pix.orig.ppm pix.ppm diff.ppm
# display diff.ppm
[contemplate a black picture]

So, on my box, the original version took ~13s then ~10s after some
patching, ~8s after some more patching and now we're at ~6s with a
const correct non virtual rule compliant version.

Before you start nitpicking please consider that:
a) this version complies with the shootout rules AFAIK.
b) it outputs the exact same picture.
c) i had to raise the epsilon a bit to avoid acnea, it has to do with
the way i compute normals but i'm not going to fix that.
d) i removed the virtualness in the most straightforward way and it's
certainly not aesthetically pleasing
e) i'm not going to spend ages on this, hence c) & d).
f) it's algorithmically equivalent to your version (ie primary and
shadow rays aren't special cased etc...)

> In both cases, the types of objects and bounding volumes are

extensible.
That means nothing. Everything is extensible.
And in this *particular case*, using dynamic binding is, again,
superfluous.
I've resolved that in a really crude way, but that's just one
possibility.

> > So you're saying intersections aren't part of the hotpath?
> > Well you maybe right, after all it's only called a few million
times.

> Yes, exactly. It isn't really that surprising when you consider that
> "ray_sphere" is called ~10x more than "intersect" and it contains
several
> branches, lots of floating point maths and function calls.

I was sarcastic; "ray_sphere" calls that interection routine which
cannot be inlined because it's virtual etc...
Also, the fact is that this kind of sphere intersection can easily be
made branchless with something like:
if ((d < 0.) | (t2 < 0.)) return infinity;
else return t1 > 0. ? t1 : t2;
But you want to branch away from the square root as it is what's
wasting most cycles.

> >> Yes, that was also done deliberately to squeeze the C++ into 100
LOC.
> > Bingo!

> > And i suppose that the fact that this emasculated C++ version
performed
> > slower than its OCaml counterpart was merely a side effect?

> The C++ outperforms the OCaml on AMD64 (and IA64). I'd say that the
> performance is comparable in all cases.

Again, your benchmark as presented it totally unfair.

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tbp  
View profile   Translate to Translated (View Original)
 More options 11 May 2005, 12:08
Newsgroups: comp.graphics.rendering.raytracing
From: "tbp" <tbp...@gmail.com>
Date: 11 May 2005 04:08:22 -0700
Local: Wed 11 May 2005 12:08
Subject: Re: Mini ray tracer
Jon Harrop wrote:
> Ok. I don't know anything about intrinsics but they don't seem to be
> relevant to my comparison of a tutorial ray tracer in C++ and OCaml.

True.

> As I've said, if you want to optimise my ray tracer (or any other
program)
> then I would recommend algorithmic optimisations over low-level ones

like
Huh. Really?

> SIMD, cache coherency etc. There are plenty more algorithmic
optimisations
> to be done (although I've already done the main one - hierarchical
bounding
> volumes).

So what? The fact is that implementation quality matters just as much.
A program has to run on a given machine at some point.

> In this case, it looks as though you'd want to write specialised C
code for
> certain platforms. This could be called from C++ or OCaml and it
would
> speed both programs up by roughly the same amount. This might be a
good
> idea though, as OCaml is clearly preferable for the higher-level,
not-
> performance-critical bulk of the code. So you could get rid of C++
> altogether. :-)

If you want to see it that way, that's fine.
But i wouldn't say that spending most cycles in c++ called from "higher
level" amounts to getting rid of it. Au contraire.

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile   Translate to Translated (View Original)
 More options 11 May 2005, 12:58
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Wed, 11 May 2005 12:58:33 +0100
Local: Wed 11 May 2005 12:58
Subject: Re: Mini ray tracer

Warp wrote:
> Jon Harrop <use...@jdh30.plus.com> wrote:
>> Creating structs for the return values of every function that returns
>> multiple values may be a good idea, but the verbosity and clumsiness of
>> doing this in C++ makes it quite undesirable.

>   You mean modular?

No.

>   Have you ever heard of abstraction?

Yes.

For example, the following OCaml function returns the given number as well
as its square and cube:

let f n = n, n*n, n*n*n;;

The C++ equivalent of this using a struct is:

struct res { int n, n2, n3; };

res f(int n) {
  res ans;
  ans.n = n;
  ans.n2 = n*n;
  ans.n3 = n*n*n;
  return ans;

}

The C++ equivalent of this using a STL pair is:

pair<int, pair<int, int> > f(int n) {
  return pair<int, pair<int, int> >(n, n*n, n*n*n);

}

Do you see what I mean by verbose and clumsy?

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile   Translate to Translated (View Original)
 More options 11 May 2005, 14:07
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Wed, 11 May 2005 14:07:46 +0100
Local: Wed 11 May 2005 14:07
Subject: Re: Mini ray tracer

tbp wrote:
>> SIMD, cache coherency etc. There are plenty more algorithmic
> optimisations
>> to be done (although I've already done the main one - hierarchical
> bounding
>> volumes).
> So what? The fact is that implementation quality matters just as much.

No, it doesn't. If you can reduce your algorithm from O(n) to O(log n) then
you should do so before you rewrite it in assembler.

> A program has to run on a given machine at some point.

True. I'm not saying that it is not worthwhile to optimise a ray tracer with
SIMD, just that it would be premature to do so on my ray tracer right now.

I would be very interested to see a highly optimised version of my ray
tracer. I think a lot of people could learn from a set of web pages which
show how to optimise the programs. If you're willing to help and I can find
the time then I'd like to do this sometime.

>> In this case, it looks as though you'd want to write specialised C
> code for
>> certain platforms. This could be called from C++ or OCaml and it
> would
>> speed both programs up by roughly the same amount. This might be a
> good
>> idea though, as OCaml is clearly preferable for the higher-level,
> not-
>> performance-critical bulk of the code. So you could get rid of C++
>> altogether. :-)
> If you want to see it that way, that's fine.
> But i wouldn't say that spending most cycles in c++ called from "higher
> level" amounts to getting rid of it. Au contraire.

I wouldn't call this kind of stuff C++:

  a = _mm_load_ps(inp_sse1);
  b = _mm_load_ps(inp_sse2);
  c = _mm_add_ps(a, b);
  _mm_store_ps(out_sse, c);

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Messages 1 - 25 of 48   Newer >
« Back to Discussions « Newer topic     Older topic »

Create a group - Google Groups - Google Home - Terms of Service - Privacy Policy
©2009 Google