Google Groups Home
Help | Sign in
Mini ray tracer
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  Messages 26 - 48 of 48 - Collapse all < Older 
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
Jon Harrop  
View profile
(1 user)  More options 11 May 2005, 14:37
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Wed, 11 May 2005 14:37:19 +0100
Local: Wed 11 May 2005 14:37
Subject: Re: Mini ray tracer

tbp wrote:
> I'll leave non objective criterions aside (how do you measure clarity?)
> and, as you insist on discarding my previous patches & measures, i've
> taken the time this morning to make this up:

I appreciate it, thank you.

On my 1.2GHz Athlon t-bird I get:

$ g++-3.4 -Wall -O3 -ffast-math -funroll-all-loops ray.cpp -o ray
$ time ./ray >image.ppm

real    0m55.256s
user    0m54.860s
sys     0m0.080s
$ g++-3.4 -Wall -O3 -ffast-math -funroll-all-loops miniray.cpp -o miniray
miniray.cpp:18: warning: division by zero in `1.0e+0 / 0.'
$ time ./miniray >image.ppm

real    0m28.587s
user    0m28.400s
sys     0m0.040s

So your optimised version is 1.9x faster.

My optimised version is basically the same speed as yours (although quite
different):

$ g++-3.4 -Wall -O3 -march=athlon-tbird -ffast-math -funroll-all-loops
ray4.cpp -o ray4
$ time ./ray4 >image.ppm

real    0m29.960s
user    0m27.150s
sys     0m0.040s

> Before you start nitpicking please consider that:
> a) this version complies with the shootout rules AFAIK.

Yes.

> b) it outputs the exact same picture.

Yes.

> c) i had to raise the epsilon a bit to avoid acnea, it has to do with
> the way i compute normals but i'm not going to fix that.

Your epsilon is actually smaller than my delta (which was sqrt epsilon).

> d) i removed the virtualness in the most straightforward way and it's
> certainly not aesthetically pleasing

I'll study this in more detail.

> e) i'm not going to spend ages on this, hence c) & d).

What better solution would you recommend for (d)?

My optimised version uses:

struct Tree {
  vector<Tree> child;
  Vec center;
  double radius;
  Tree(Vec c, double r) : child(), center(c), radius(r) {}

};

which I don't much like.

> f) it's algorithmically equivalent to your version (ie primary and
> shadow rays aren't special cased etc...)

My optimised version has a specialised intersect function for shadow rays
(but still fits in <100 LOC and is just a clear, IMHO).

>> In both cases, the types of objects and bounding volumes are
> extensible.
> That means nothing. Everything is extensible.
> And in this *particular case*, using dynamic binding is, again,
> superfluous.
> I've resolved that in a really crude way, but that's just one
> possibility.

Can you describe how a new kind of object/bound would be implemented?

>> Yes, exactly. It isn't really that surprising when you consider that
>> "ray_sphere" is called ~10x more than "intersect" and it contains
> several
>> branches, lots of floating point maths and function calls.
> I was sarcastic; "ray_sphere" calls that interection routine which
> cannot be inlined because it's virtual etc...

In my original implementation, "intersect" calls "ray_sphere" and not the
other way around. Do you mean "ray_sphere" could not have been inlined?

> Also, the fact is that this kind of sphere intersection can easily be
> made branchless with something like:
> if ((d < 0.) | (t2 < 0.)) return infinity;
> else return t1 > 0. ? t1 : t2;
> But you want to branch away from the square root as it is what's
> wasting most cycles.

I hadn't optimised "ray_sphere".

>> The C++ outperforms the OCaml on AMD64 (and IA64). I'd say that the
>> performance is comparable in all cases.
> Again, your benchmark as presented it totally unfair.

In what way is it unfair?

Here's the C++ I had:

#include <vector>
#include <iostream>
#include <limits>
#include <cmath>
using namespace std;
numeric_limits<double> dbl;
double delta = sqrt(dbl.epsilon()), infinity = dbl.infinity(), pi = M_PI;
struct Vec { // 3D vector
  double x, y, z;
  Vec(double x2, double y2, double z2) : x(x2), y(y2), z(z2) {}

};

inline Vec operator+(const Vec &a, const Vec &b)
{ return Vec(a.x + b.x, a.y + b.y, a.z + b.z); }
inline Vec operator-(const Vec &a, const Vec &b)
{ return Vec(a.x - b.x, a.y - b.y, a.z - b.z); }
inline Vec operator*(double a, const Vec &b)
{ return Vec(a * b.x, a * b.y, a * b.z); }
inline double dot(const Vec &a, const Vec &b) { return a.x*b.x + a.y*b.y +
a.z*b.z; }
inline Vec unitise(const Vec &a) { return (1 / sqrt(dot(a, a))) * a; }
struct Ray { Vec orig, dir; Ray(Vec o, Vec d) : orig(o), dir(d) {} };
struct Tree {
  vector<Tree> child;
  Vec center;
  double radius;
  Tree(Vec c, double r) : child(), center(c), radius(r) {}
};

inline double ray_sphere(const Ray &ray, const Tree &tree) {
  Vec v = tree.center - ray.orig;
  double b = dot(v, ray.dir), disc = b*b - dot(v, v) +
tree.radius*tree.radius;
  if (disc < 0) return infinity;
  double d = sqrt(disc), t2 = b + d;
  if (t2 < 0) return infinity;
  double t1 = b - d;
  return (t1 > 0 ? t1 : t2);
}

void intersect(double &lambda, Vec &normal, const Ray &ray, const Tree
&tree) {
  double l = ray_sphere(ray, tree);
  if (l >= lambda) return;
  if (tree.child.size() == 0) {
    lambda = l;
    normal = unitise(ray.orig + l * ray.dir - tree.center);
  } else
    for (vector<Tree>::const_iterator it=tree.child.begin();
  it!=tree.child.end(); ++it)
      intersect(lambda, normal, ray, *it);
}

bool intersect(const Ray &ray, const Tree &tree) {
  if (ray_sphere(ray, tree) == infinity) return false;
  if (tree.child.size() == 0) return true; else {
    for (vector<Tree>::const_iterator it=tree.child.begin();
  it != tree.child.end(); ++it)
      if (intersect(ray, *it)) return true;
  }
  return false;
}

double ray_trace(const double weight, const Vec &light, const Ray &ray,
   const Tree &tree) {
  double lambda = infinity;
  Vec normal(0, 0, 0);
  intersect(lambda, normal, ray, tree);
  if (lambda == infinity) return 0;
  Vec o = ray.orig + lambda * ray.dir + delta * normal;
  double g = -dot(normal, light);
  if (g <= 0) return 0.;
  return (intersect(Ray(o, Vec(0, 0, 0) - light), tree) ? 0 : g);
}

Tree create(int level, double r, double x, double y, double z) {
  if (level == 1) return Tree(Vec(x, y, z), r);
  Tree group = Tree(Vec(x, y, z), 3*r);
  group.child.push_back(Tree(Vec(x, y, z), r));
  double rn = 3*r/sqrt(12.);
  for (int dz=-1; dz<=1; dz+=2)
    for (int dx=-1; dx<=1; dx+=2)
      group.child.push_back(create(level-1, r/2, x-dx*rn, y+rn, z-dz*rn));
  return group;
}

int main(int argc, char *argv[]) {
  int level = (argc==2 ? atoi(argv[1]) : 6), w = 512, h = 512, ss = 4;
  Tree tree = create(level, 1, 0, -1, 0);
  cout << "P2\n" << w << " " << h << "\n256\n";
  for (int y=h-1; y>=0; --y) {
    for (int x=0; x<w; ++x) {
      double g=0;
      for (int dx=0; dx<ss; ++dx)
 for (int dy=0; dy<ss; ++dy) {
   Vec d(x+double(dx)/ss-w/2, y+double(dy)/ss-h/2, max(w, h));
   g += ray_trace(1, unitise(Vec(-1, -3, 2)), Ray(Vec(0, 0, -4),
        unitise(d)), tree);
 }
      cout << min(255, int(256. * g / (ss*ss))) << " ";
    }
    cout << endl;
  }
  return 0;

}

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tbp  
View profile
(1 user)  More options 11 May 2005, 15:51
Newsgroups: comp.graphics.rendering.raytracing
From: "tbp" <tbp...@gmail.com>
Date: 11 May 2005 07:51:24 -0700
Local: Wed 11 May 2005 15:51
Subject: Re: Mini ray tracer
Jon Harrop wrote:
> I appreciate it, thank you.

You're welcome.

> On my 1.2GHz Athlon t-bird I get:
> $ g++-3.4 -Wall -O3 -ffast-math -funroll-all-loops miniray.cpp -o
miniray
> miniray.cpp:18: warning: division by zero in `1.0e+0 / 0.'
> $ time ./miniray >image.ppm

> real    0m28.587s
> user    0m28.400s
> sys     0m0.040s

> So your optimised version is 1.9x faster.

I've only quickly checked with g++-3.4, g++-4.x does a much better job
(at least on my opteron on x86-64, haven't checked on my t-bird).

And i wouldn't consider my version as optimized by any stretch, but
just one way to decently write a c++ implementation of that thing.

> My optimised version is basically the same speed as yours (although
quite
> different):

Hmm, need to check what it gives with a shadow ray shortcut, if i can
cram it up in there.

> > c) i had to raise the epsilon a bit to avoid acnea, it has to do
with
> > the way i compute normals but i'm not going to fix that.

> Your epsilon is actually smaller than my delta (which was sqrt

epsilon).
Ah! Missed that bit. Anyway that shouldn't make a difference as both
epsilons are too small to cull any part of the hierarchy away.

> > d) i removed the virtualness in the most straightforward way and
it's
> > certainly not aesthetically pleasing

> I'll study this in more detail.

> > e) i'm not going to spend ages on this, hence c) & d).

> What better solution would you recommend for (d)?

If that's better in the more efficient sense, none.
If that's better in the tigher memory foot print sense, you could
shrink it a bit by removing the use of a bounding sphere for 'final'
nodes (those sitting at the lowest lvl) and paying for that at runtime
with a branch or something.
Other than that it's just a matter of taste, and the virtual solution
is the most concise.

> My optimised version uses:

> struct Tree {
>   vector<Tree> child;
>   Vec center;
>   double radius;
>   Tree(Vec c, double r) : child(), center(c), radius(r) {}
> };

> which I don't much like.

It's a bit tighter than my solution, but both are variations on the
same theme.
Ideally i would use an explicit memory layout so things could be
properly aligned etc... And it would certainly end up being much
uglier.

> > f) it's algorithmically equivalent to your version (ie primary and
> > shadow rays aren't special cased etc...)

> My optimised version has a specialised intersect function for shadow
rays
> (but still fits in <100 LOC and is just a clear, IMHO).

I'll try to bolt something like that on top of my version.

> Can you describe how a new kind of object/bound would be implemented?

A BVH is fine for that kind of scenes, but it's all in the construction
of the hierarchy (or put another way, in its quality).
Now i seriously doubt i could cram a SAH compiler within 100 LOC :)

> In my original implementation, "intersect" calls "ray_sphere" and not
the
> other way around. Do you mean "ray_sphere" could not have been

inlined?
Sphere::intersect calls Sphere::ray_sphere, and that maybe inlined but
Sphere::intersect itself then cannot be inlined because it's virtual
and not statically resolved (and on top of that instead of a static
call, you pay for a indirect one).
Compare that to what happens in my version where the compiler has the
choice to inline to its heart content; that is placing (or not) a call
if that's cheaper at some point.
Like i've said earlier exposing things to the compiler is the name of
the game.

> > Also, the fact is that this kind of sphere intersection can easily
be
> > made branchless with something like:
> > if ((d < 0.) | (t2 < 0.)) return infinity;
> > else return t1 > 0. ? t1 : t2;
> > But you want to branch away from the square root as it is what's
> > wasting most cycles.

> I hadn't optimised "ray_sphere".

I was just saying that, in that function cycles are spent computing the
square root, not branching; so it's cheaper to avoid with a branch than
to unconditionnaly compute it (and then use some conditionnal moves).

> >> The C++ outperforms the OCaml on AMD64 (and IA64). I'd say that
the
> >> performance is comparable in all cases.
> > Again, your benchmark as presented it totally unfair.

> In what way is it unfair?

Because you obstructed the c++ compiler in many many ways, the most
prominent being passing structures by value instead of by reference (as
show in previous patches).
That's why i've also said the comparison was rigged as you didn't put
as much effort on the c++ side as on the OCaml; it's not surprising my
version is ~2x faster than the original.

Now if you present an amended version that at least fixes those issues,
i'll have no objections to whatever conclusion you can come up to.


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Warp  
View profile
 More options 11 May 2005, 15:53
Newsgroups: comp.graphics.rendering.raytracing
From: Warp <w...@cs.tut.fi>
Date: Wed, 11 May 2005 14:53:27 +0000 (UTC)
Local: Wed 11 May 2005 15:53
Subject: Re: Mini ray tracer

Jon Harrop <use...@jdh30.plus.com> wrote:
> >> Creating structs for the return values of every function that returns
> >> multiple values may be a good idea, but the verbosity and clumsiness of
> >> doing this in C++ makes it quite undesirable.

> >   You mean modular?
> No.

  Then IMO you are wrong.

> Do you see what I mean by verbose and clumsy?

  No.

  Granted, C++ does not offer some features other programming languages do,
but you gave a misleading example. You gave the impression that if you
want to return three values from a function you have to always write
the entire struct code plus all the separate assignments.
  That is like saying that each time you want to use a string you have
to write a string class explicitly.

  C++ offers tools to make that code of yours simpler and shorter.
The way you gave in your example is not the only nor by any means
the best way.

  Besides, it usually comes down to design. It is rare that a function
just returns some values whose type and meaning are random. Usually
those returned values have a close relation to each other and form
a logical group of values.
  For example, if you have a function which returns an intersection point
and a normal vector, does it make sense, from design point of view, to
return these values using generic nameless structures with no higher
logic in them?
  Or does it make more sense to create a logical data container called
according to its usage, for example "Intersection", which contains the
two (or more) things which are closely related to each other?

  Which do you consider more logical, a function which just returns
a pair of values (using a generic container), or a function which returns
a value of type "Intersection"?

--
                                                          - Warp


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tbp  
View profile
 More options 11 May 2005, 16:13
Newsgroups: comp.graphics.rendering.raytracing
From: "tbp" <tbp...@gmail.com>
Date: 11 May 2005 08:13:02 -0700
Local: Wed 11 May 2005 16:13
Subject: Re: Mini ray tracer
Jon Harrop wrote:
> > So what? The fact is that implementation quality matters just as
much.

> No, it doesn't. If you can reduce your algorithm from O(n) to O(log
n) then
> you should do so before you rewrite it in assembler.

As we're obviously both right, and in fine it's a matter of tradeoffs,
it would pointless to argue further more.

> > A program has to run on a given machine at some point.

> True. I'm not saying that it is not worthwhile to optimise a ray
tracer with
> SIMD, just that it would be premature to do so on my ray tracer right

now.
I beg to differ. You can't simply plug in some SIMD scheme on top of
that as an afterthought, because it's tightly coupled with the way you
represent things.
That's one of the common pitfall of OO abstraction.
Wald discuss that point a bit in his thesis and i couldn't agree more.
But i know it's a not a very popular opinion atm.

> I would be very interested to see a highly optimised version of my
ray
> tracer. I think a lot of people could learn from a set of web pages
which
> show how to optimise the programs. If you're willing to help and I
can find
> the time then I'd like to do this sometime.

There's certainly lots of angles to attack that problem
(algorithmically or not), but i doubt any of them could fit within 100
LOC.
Beside C++ idiomatic issues, a better memory layout could certainly
help.
That's, perhaps, the lowest hanging fruit (that or 'cheating' with
lower numerical precision, ie for square roots).

Anyway as i've already invested some time in a c++ version, i'm open to
suggestions :)

> I wouldn't call this kind of stuff C++:

>   a = _mm_load_ps(inp_sse1);
>   b = _mm_load_ps(inp_sse2);
>   c = _mm_add_ps(a, b);
>   _mm_store_ps(out_sse, c);

But that's legit C++.
C++ is C on steroids while C itself is just a gloried assembler.
No matter how much OO or syntactic sugar you pour onto it.
I kinda apreciate to be able to code to-the-metal.
To each, its own.

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile
 More options 11 May 2005, 16:51
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Wed, 11 May 2005 16:51:25 +0100
Local: Wed 11 May 2005 16:51
Subject: Re: Mini ray tracer

Warp wrote:
>   Besides, it usually comes down to design. It is rare that a function
> just returns some values whose type and meaning are random.

No, that is my point. It is common, not rare. Look at the prevelance of
tuple return types in ML programs and you'll see what I mean. You won't see
it in C++ code because C++ makes it too tedious for people to bother coding
it.

Many of the functions in the OCaml standard library return 2-tuples. None of
them need to be explicitly defined.

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile
 More options 11 May 2005, 16:54
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Wed, 11 May 2005 16:54:39 +0100
Local: Wed 11 May 2005 16:54
Subject: Re: Mini ray tracer

tbp wrote:
> You can't simply plug in some SIMD scheme on top of
> that as an afterthought, because it's tightly coupled with the way you
> represent things.
> That's one of the common pitfall of OO abstraction.
> Wald discuss that point a bit in his thesis and i couldn't agree more.
> But i know it's a not a very popular opinion atm.

I agree, actually. I know nothing of cache coherency in the context of ray
tracers though.

> There's certainly lots of angles to attack that problem
> (algorithmically or not), but i doubt any of them could fit within 100
> LOC.

Yes, it wouldn't make for a very good introductory tutorial either. Although
I'm sure a lot of people (me included) would be interested.

> Beside C++ idiomatic issues, a better memory layout could certainly
> help.
> That's, perhaps, the lowest hanging fruit (that or 'cheating' with
> lower numerical precision, ie for square roots).

It would be interesting to compare equivalently optimised C++ and OCaml
programs based upon float arrays. Handling alignment will be tricky in
OCaml.

>> I wouldn't call this kind of stuff C++:

>>   a = _mm_load_ps(inp_sse1);
>>   b = _mm_load_ps(inp_sse2);
>>   c = _mm_add_ps(a, b);
>>   _mm_store_ps(out_sse, c);
> But that's legit C++.

Is that in the C++ standard or is it a common extension?

> C++ is C on steroids while C itself is just a gloried assembler.
> No matter how much OO or syntactic sugar you pour onto it.
> I kinda apreciate to be able to code to-the-metal.
> To each, its own.

True.

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile
(1 user)  More options 11 May 2005, 17:04
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Wed, 11 May 2005 17:04:17 +0100
Local: Wed 11 May 2005 17:04
Subject: Re: Mini ray tracer

tbp wrote:
> I've only quickly checked with g++-3.4, g++-4.x does a much better job
> (at least on my opteron on x86-64, haven't checked on my t-bird).

I'll check it out when the Debian packages arrive. :-)

> And i wouldn't consider my version as optimized by any stretch, but
> just one way to decently write a c++ implementation of that thing.

"more optimised" then. :-)

>> My optimised version is basically the same speed as yours (although
> quite
>> different):
> Hmm, need to check what it gives with a shadow ray shortcut, if i can
> cram it up in there.

I don't think it actually made mine much faster so it probably isn't worth
it.

> It's a bit tighter than my solution, but both are variations on the
> same theme.
> Ideally i would use an explicit memory layout so things could be
> properly aligned etc... And it would certainly end up being much
> uglier.

The OCaml might be significantly easier to understand if suitable
higher-order functions can be factored out which make the memory layout
"look" like a tree.

>> Can you describe how a new kind of object/bound would be implemented?
> A BVH is fine for that kind of scenes, but it's all in the construction
> of the hierarchy (or put another way, in its quality).
> Now i seriously doubt i could cram a SAH compiler within 100 LOC :)

What is an SAH compiler?

>> In my original implementation, "intersect" calls "ray_sphere" and not
> the
>> other way around. Do you mean "ray_sphere" could not have been
> inlined?
> Sphere::intersect calls Sphere::ray_sphere, and that maybe inlined but

That's the important one though.

> Sphere::intersect itself then cannot be inlined because it's virtual
> and not statically resolved (and on top of that instead of a static
> call, you pay for a indirect one).
> Compare that to what happens in my version where the compiler has the
> choice to inline to its heart content; that is placing (or not) a call
> if that's cheaper at some point.
> Like i've said earlier exposing things to the compiler is the name of
> the game.

Yes. Although this is only useful to other people if it has a description of
what information is being provided to the compiler. It would be nice to
write this up.

>> In what way is it unfair?
> Because you obstructed the c++ compiler in many many ways,

I think the OCaml compiler was equivalently obstructed.

> the most
> prominent being passing structures by value instead of by reference (as
> show in previous patches).

I think the compiler could (should) have spotted that optimisation without
having to be told.

> That's why i've also said the comparison was rigged as you didn't put
> as much effort on the c++ side as on the OCaml; it's not surprising my
> version is ~2x faster than the original.

Well, I have a more optimised OCaml version which is just as fast on x86 and
20 lines shorter.

> Now if you present an amended version that at least fixes those issues,
> i'll have no objections to whatever conclusion you can come up to.

I'll try to write another page describing both optimised versions. I also
have a C version now, which is slower than both the C++ and the OCaml. :-)

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tbp  
View profile
(1 user)  More options 11 May 2005, 17:49
Newsgroups: comp.graphics.rendering.raytracing
From: "tbp" <tbp...@gmail.com>
Date: 11 May 2005 09:49:36 -0700
Local: Wed 11 May 2005 17:49
Subject: Re: Mini ray tracer
Jon Harrop wrote:
> tbp wrote:
> > I've only quickly checked with g++-3.4, g++-4.x does a much better
job
> > (at least on my opteron on x86-64, haven't checked on my t-bird).

> I'll check it out when the Debian packages arrive. :-)

There's a 4.0.0 package, at least on sid 64. Otherwise building a
snapshot is straightforward, yet a bit off topic :)

> > Hmm, need to check what it gives with a shadow ray shortcut, if i
can
> > cram it up in there.

> I don't think it actually made mine much faster so it probably isn't
worth
> it.

A quick hack at that (within the rules) shaved a second of runtime
here.
# diff -u miniray.cpp miniray_shadow.cpp
--- miniray.cpp 2005-05-11 14:09:08.000000000 +0200
+++ miniray_shadow.cpp  2005-05-11 20:11:00.000000000 +0200
@@ -47,12 +47,13 @@
   if (t >= hit.t) return;
   else { hit.t = t; hit.n = leaf.get_normal(ray.o + ray.d*t); }
  }
-  void intersect(const ray_t &ray, hit_t &hit) const {
+  template<bool shadow> void intersect(const ray_t &ray, hit_t &hit)
const {
   if (!objs.empty()) {
    if (bound.intersect(ray) >= hit.t) return;
    intersect_leaf(ray,hit);
+   if (shadow & (hit.t != infinity)) return;  // early exit.
    for (objs_t::const_iterator it=objs.begin(); it!=objs.end(); ++it)
-    (*it)->intersect(ray, hit);
+    (*it)->intersect<shadow>(ray, hit);
   }
   else intersect_leaf(ray,hit);
  }
@@ -68,11 +69,11 @@
  else return new group_t(sphere_t(v, r),sphere_t(v, r));
 }
 static double ray_trace(const group_t * const scene, const ray_t &ray)
{
- hit_t hit; scene->intersect(ray, hit);
+ hit_t hit; scene->intersect<false>(ray, hit);
  const double diffuse = (hit.t == infinity) ? 0. : -hit.n.dot(light);
  if (diffuse <= 0.) return 0.;
  const ray_t sray(ray.o + (ray.d*hit.t) + (hit.n*epsilon), -light);
- hit_t shit; scene->intersect(sray,shit);
+ hit_t shit; scene->intersect<true>(sray,shit);
  return (shit.t == infinity) ? diffuse : 0.;
 }
 int main(int argc, char *argv[]) {

# time ./miniray >pix.ppm

real    0m4.856s
user    0m4.812s
sys     0m0.004s

> The OCaml might be significantly easier to understand if suitable
> higher-order functions can be factored out which make the memory
layout
> "look" like a tree.

I'm absolutely not qualified to comment on OCaml unfortunately :)

> >> Can you describe how a new kind of object/bound would be
implemented?
> > A BVH is fine for that kind of scenes, but it's all in the
construction
> > of the hierarchy (or put another way, in its quality).
> > Now i seriously doubt i could cram a SAH compiler within 100 LOC :)

> What is an SAH compiler?

Surface Area Heuristic, a way to evaluate costs when building a
hierarchy.
Does wonders on BVH, kd-tree... you name it. But it's tricky to get it
right and even more to make it not behave quadratically.
Havran's thesis is a must read, Wald builds upon it (but there's a
metric ton of litterature about it).
http://www.cgg.cvut.cz/~havran/phdthesis.html
http://www.mpi-sb.mpg.de/~wald/PhD/

> > Like i've said earlier exposing things to the compiler is the name
of
> > the game.

> Yes. Although this is only useful to other people if it has a
description of
> what information is being provided to the compiler. It would be nice
to
> write this up.

I'm pedagogically challenged :)
On top of peremptoric statements like "know the language" and "know
your compiler", it boils down to things like const correctness,
tediously constraining things and so on.
Or put another way, compilers are mental midgets and C++ is a language
made for compilers and not humans.

Ok, that wasn't really helpful.
But i'm sure someone already has put that in a nicer way somewhere, no?
:)

> >> In what way is it unfair?
> > Because you obstructed the c++ compiler in many many ways,

> I think the OCaml compiler was equivalently obstructed.

I can't say. But for sure, your C++ "style" was not typical.

> > the most
> > prominent being passing structures by value instead of by reference
(as
> > show in previous patches).

> I think the compiler could (should) have spotted that optimisation
without
> having to be told.

That's wishful thinking. And it really really doesn't work that way.
And don't blame g++ for that, as it's actually quite good at that game.
But that's another topic.

> > That's why i've also said the comparison was rigged as you didn't
put
> > as much effort on the c++ side as on the OCaml; it's not surprising
my
> > version is ~2x faster than the original.

> Well, I have a more optimised OCaml version which is just as fast on
x86 and
> 20 lines shorter.

Bring it on ;)

> > Now if you present an amended version that at least fixes those
issues,
> > i'll have no objections to whatever conclusion you can come up to.

> I'll try to write another page describing both optimised versions. I
also
> have a C version now, which is slower than both the C++ and the

OCaml. :-)
Hah.

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.