Google Mail Calendar Documents Reader Web more »
Recently Visited Groups | Help | Sign in
Google Groups Home
Mini ray tracer
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  Messages 26 - 48 of 48 - Collapse all  -  Translate all to Translated (View all originals) < Older 
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Follow-up To:
Add Cc | Add Follow-up to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers that you hear
 
Jon Harrop  
View profile   Translate to Translated (View Original)
(1 user)  More options 11 May 2005, 14:37
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Wed, 11 May 2005 14:37:19 +0100
Local: Wed 11 May 2005 14:37
Subject: Re: Mini ray tracer

tbp wrote:
> I'll leave non objective criterions aside (how do you measure clarity?)
> and, as you insist on discarding my previous patches & measures, i've
> taken the time this morning to make this up:

I appreciate it, thank you.

On my 1.2GHz Athlon t-bird I get:

$ g++-3.4 -Wall -O3 -ffast-math -funroll-all-loops ray.cpp -o ray
$ time ./ray >image.ppm

real    0m55.256s
user    0m54.860s
sys     0m0.080s
$ g++-3.4 -Wall -O3 -ffast-math -funroll-all-loops miniray.cpp -o miniray
miniray.cpp:18: warning: division by zero in `1.0e+0 / 0.'
$ time ./miniray >image.ppm

real    0m28.587s
user    0m28.400s
sys     0m0.040s

So your optimised version is 1.9x faster.

My optimised version is basically the same speed as yours (although quite
different):

$ g++-3.4 -Wall -O3 -march=athlon-tbird -ffast-math -funroll-all-loops
ray4.cpp -o ray4
$ time ./ray4 >image.ppm

real    0m29.960s
user    0m27.150s
sys     0m0.040s

> Before you start nitpicking please consider that:
> a) this version complies with the shootout rules AFAIK.

Yes.

> b) it outputs the exact same picture.

Yes.

> c) i had to raise the epsilon a bit to avoid acnea, it has to do with
> the way i compute normals but i'm not going to fix that.

Your epsilon is actually smaller than my delta (which was sqrt epsilon).

> d) i removed the virtualness in the most straightforward way and it's
> certainly not aesthetically pleasing

I'll study this in more detail.

> e) i'm not going to spend ages on this, hence c) & d).

What better solution would you recommend for (d)?

My optimised version uses:

struct Tree {
  vector<Tree> child;
  Vec center;
  double radius;
  Tree(Vec c, double r) : child(), center(c), radius(r) {}

};

which I don't much like.

> f) it's algorithmically equivalent to your version (ie primary and
> shadow rays aren't special cased etc...)

My optimised version has a specialised intersect function for shadow rays
(but still fits in <100 LOC and is just a clear, IMHO).

>> In both cases, the types of objects and bounding volumes are
> extensible.
> That means nothing. Everything is extensible.
> And in this *particular case*, using dynamic binding is, again,
> superfluous.
> I've resolved that in a really crude way, but that's just one
> possibility.

Can you describe how a new kind of object/bound would be implemented?

>> Yes, exactly. It isn't really that surprising when you consider that
>> "ray_sphere" is called ~10x more than "intersect" and it contains
> several
>> branches, lots of floating point maths and function calls.
> I was sarcastic; "ray_sphere" calls that interection routine which
> cannot be inlined because it's virtual etc...

In my original implementation, "intersect" calls "ray_sphere" and not the
other way around. Do you mean "ray_sphere" could not have been inlined?

> Also, the fact is that this kind of sphere intersection can easily be
> made branchless with something like:
> if ((d < 0.) | (t2 < 0.)) return infinity;
> else return t1 > 0. ? t1 : t2;
> But you want to branch away from the square root as it is what's
> wasting most cycles.

I hadn't optimised "ray_sphere".

>> The C++ outperforms the OCaml on AMD64 (and IA64). I'd say that the
>> performance is comparable in all cases.
> Again, your benchmark as presented it totally unfair.

In what way is it unfair?

Here's the C++ I had:

#include <vector>
#include <iostream>
#include <limits>
#include <cmath>
using namespace std;
numeric_limits<double> dbl;
double delta = sqrt(dbl.epsilon()), infinity = dbl.infinity(), pi = M_PI;
struct Vec { // 3D vector
  double x, y, z;
  Vec(double x2, double y2, double z2) : x(x2), y(y2), z(z2) {}

};

inline Vec operator+(const Vec &a, const Vec &b)
{ return Vec(a.x + b.x, a.y + b.y, a.z + b.z); }
inline Vec operator-(const Vec &a, const Vec &b)
{ return Vec(a.x - b.x, a.y - b.y, a.z - b.z); }
inline Vec operator*(double a, const Vec &b)
{ return Vec(a * b.x, a * b.y, a * b.z); }
inline double dot(const Vec &a, const Vec &b) { return a.x*b.x + a.y*b.y +
a.z*b.z; }
inline Vec unitise(const Vec &a) { return (1 / sqrt(dot(a, a))) * a; }
struct Ray { Vec orig, dir; Ray(Vec o, Vec d) : orig(o), dir(d) {} };
struct Tree {
  vector<Tree> child;
  Vec center;
  double radius;
  Tree(Vec c, double r) : child(), center(c), radius(r) {}
};

inline double ray_sphere(const Ray &ray, const Tree &tree) {
  Vec v = tree.center - ray.orig;
  double b = dot(v, ray.dir), disc = b*b - dot(v, v) +
tree.radius*tree.radius;
  if (disc < 0) return infinity;
  double d = sqrt(disc), t2 = b + d;
  if (t2 < 0) return infinity;
  double t1 = b - d;
  return (t1 > 0 ? t1 : t2);
}

void intersect(double &lambda, Vec &normal, const Ray &ray, const Tree
&tree) {
  double l = ray_sphere(ray, tree);
  if (l >= lambda) return;
  if (tree.child.size() == 0) {
    lambda = l;
    normal = unitise(ray.orig + l * ray.dir - tree.center);
  } else
    for (vector<Tree>::const_iterator it=tree.child.begin();
  it!=tree.child.end(); ++it)
      intersect(lambda, normal, ray, *it);
}

bool intersect(const Ray &ray, const Tree &tree) {
  if (ray_sphere(ray, tree) == infinity) return false;
  if (tree.child.size() == 0) return true; else {
    for (vector<Tree>::const_iterator it=tree.child.begin();
  it != tree.child.end(); ++it)
      if (intersect(ray, *it)) return true;
  }
  return false;
}

double ray_trace(const double weight, const Vec &light, const Ray &ray,
   const Tree &tree) {
  double lambda = infinity;
  Vec normal(0, 0, 0);
  intersect(lambda, normal, ray, tree);
  if (lambda == infinity) return 0;
  Vec o = ray.orig + lambda * ray.dir + delta * normal;
  double g = -dot(normal, light);
  if (g <= 0) return 0.;
  return (intersect(Ray(o, Vec(0, 0, 0) - light), tree) ? 0 : g);
}

Tree create(int level, double r, double x, double y, double z) {
  if (level == 1) return Tree(Vec(x, y, z), r);
  Tree group = Tree(Vec(x, y, z), 3*r);
  group.child.push_back(Tree(Vec(x, y, z), r));
  double rn = 3*r/sqrt(12.);
  for (int dz=-1; dz<=1; dz+=2)
    for (int dx=-1; dx<=1; dx+=2)
      group.child.push_back(create(level-1, r/2, x-dx*rn, y+rn, z-dz*rn));
  return group;
}

int main(int argc, char *argv[]) {
  int level = (argc==2 ? atoi(argv[1]) : 6), w = 512, h = 512, ss = 4;
  Tree tree = create(level, 1, 0, -1, 0);
  cout << "P2\n" << w << " " << h << "\n256\n";
  for (int y=h-1; y>=0; --y) {
    for (int x=0; x<w; ++x) {
      double g=0;
      for (int dx=0; dx<ss; ++dx)
 for (int dy=0; dy<ss; ++dy) {
   Vec d(x+double(dx)/ss-w/2, y+double(dy)/ss-h/2, max(w, h));
   g += ray_trace(1, unitise(Vec(-1, -3, 2)), Ray(Vec(0, 0, -4),
        unitise(d)), tree);
 }
      cout << min(255, int(256. * g / (ss*ss))) << " ";
    }
    cout << endl;
  }
  return 0;

}

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tbp  
View profile   Translate to Translated (View Original)
(1 user)  More options 11 May 2005, 15:51
Newsgroups: comp.graphics.rendering.raytracing
From: "tbp" <tbp...@gmail.com>
Date: 11 May 2005 07:51:24 -0700
Local: Wed 11 May 2005 15:51
Subject: Re: Mini ray tracer
Jon Harrop wrote:
> I appreciate it, thank you.

You're welcome.

> On my 1.2GHz Athlon t-bird I get:
> $ g++-3.4 -Wall -O3 -ffast-math -funroll-all-loops miniray.cpp -o
miniray
> miniray.cpp:18: warning: division by zero in `1.0e+0 / 0.'
> $ time ./miniray >image.ppm

> real    0m28.587s
> user    0m28.400s
> sys     0m0.040s

> So your optimised version is 1.9x faster.

I've only quickly checked with g++-3.4, g++-4.x does a much better job
(at least on my opteron on x86-64, haven't checked on my t-bird).

And i wouldn't consider my version as optimized by any stretch, but
just one way to decently write a c++ implementation of that thing.

> My optimised version is basically the same speed as yours (although
quite
> different):

Hmm, need to check what it gives with a shadow ray shortcut, if i can
cram it up in there.

> > c) i had to raise the epsilon a bit to avoid acnea, it has to do
with
> > the way i compute normals but i'm not going to fix that.

> Your epsilon is actually smaller than my delta (which was sqrt

epsilon).
Ah! Missed that bit. Anyway that shouldn't make a difference as both
epsilons are too small to cull any part of the hierarchy away.

> > d) i removed the virtualness in the most straightforward way and
it's
> > certainly not aesthetically pleasing

> I'll study this in more detail.

> > e) i'm not going to spend ages on this, hence c) & d).

> What better solution would you recommend for (d)?

If that's better in the more efficient sense, none.
If that's better in the tigher memory foot print sense, you could
shrink it a bit by removing the use of a bounding sphere for 'final'
nodes (those sitting at the lowest lvl) and paying for that at runtime
with a branch or something.
Other than that it's just a matter of taste, and the virtual solution
is the most concise.

> My optimised version uses:

> struct Tree {
>   vector<Tree> child;
>   Vec center;
>   double radius;
>   Tree(Vec c, double r) : child(), center(c), radius(r) {}
> };

> which I don't much like.

It's a bit tighter than my solution, but both are variations on the
same theme.
Ideally i would use an explicit memory layout so things could be
properly aligned etc... And it would certainly end up being much
uglier.

> > f) it's algorithmically equivalent to your version (ie primary and
> > shadow rays aren't special cased etc...)

> My optimised version has a specialised intersect function for shadow
rays
> (but still fits in <100 LOC and is just a clear, IMHO).

I'll try to bolt something like that on top of my version.

> Can you describe how a new kind of object/bound would be implemented?

A BVH is fine for that kind of scenes, but it's all in the construction
of the hierarchy (or put another way, in its quality).
Now i seriously doubt i could cram a SAH compiler within 100 LOC :)

> In my original implementation, "intersect" calls "ray_sphere" and not
the
> other way around. Do you mean "ray_sphere" could not have been

inlined?
Sphere::intersect calls Sphere::ray_sphere, and that maybe inlined but
Sphere::intersect itself then cannot be inlined because it's virtual
and not statically resolved (and on top of that instead of a static
call, you pay for a indirect one).
Compare that to what happens in my version where the compiler has the
choice to inline to its heart content; that is placing (or not) a call
if that's cheaper at some point.
Like i've said earlier exposing things to the compiler is the name of
the game.

> > Also, the fact is that this kind of sphere intersection can easily
be
> > made branchless with something like:
> > if ((d < 0.) | (t2 < 0.)) return infinity;
> > else return t1 > 0. ? t1 : t2;
> > But you want to branch away from the square root as it is what's
> > wasting most cycles.

> I hadn't optimised "ray_sphere".

I was just saying that, in that function cycles are spent computing the
square root, not branching; so it's cheaper to avoid with a branch than
to unconditionnaly compute it (and then use some conditionnal moves).

> >> The C++ outperforms the OCaml on AMD64 (and IA64). I'd say that
the
> >> performance is comparable in all cases.
> > Again, your benchmark as presented it totally unfair.

> In what way is it unfair?

Because you obstructed the c++ compiler in many many ways, the most
prominent being passing structures by value instead of by reference (as
show in previous patches).
That's why i've also said the comparison was rigged as you didn't put
as much effort on the c++ side as on the OCaml; it's not surprising my
version is ~2x faster than the original.

Now if you present an amended version that at least fixes those issues,
i'll have no objections to whatever conclusion you can come up to.


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Warp  
View profile   Translate to Translated (View Original)
 More options 11 May 2005, 15:53
Newsgroups: comp.graphics.rendering.raytracing
From: Warp <w...@cs.tut.fi>
Date: Wed, 11 May 2005 14:53:27 +0000 (UTC)
Local: Wed 11 May 2005 15:53
Subject: Re: Mini ray tracer

Jon Harrop <use...@jdh30.plus.com> wrote:
> >> Creating structs for the return values of every function that returns
> >> multiple values may be a good idea, but the verbosity and clumsiness of
> >> doing this in C++ makes it quite undesirable.

> >   You mean modular?
> No.

  Then IMO you are wrong.

> Do you see what I mean by verbose and clumsy?

  No.

  Granted, C++ does not offer some features other programming languages do,
but you gave a misleading example. You gave the impression that if you
want to return three values from a function you have to always write
the entire struct code plus all the separate assignments.
  That is like saying that each time you want to use a string you have
to write a string class explicitly.

  C++ offers tools to make that code of yours simpler and shorter.
The way you gave in your example is not the only nor by any means
the best way.

  Besides, it usually comes down to design. It is rare that a function
just returns some values whose type and meaning are random. Usually
those returned values have a close relation to each other and form
a logical group of values.
  For example, if you have a function which returns an intersection point
and a normal vector, does it make sense, from design point of view, to
return these values using generic nameless structures with no higher
logic in them?
  Or does it make more sense to create a logical data container called
according to its usage, for example "Intersection", which contains the
two (or more) things which are closely related to each other?

  Which do you consider more logical, a function which just returns
a pair of values (using a generic container), or a function which returns
a value of type "Intersection"?

--
                                                          - Warp


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tbp  
View profile   Translate to Translated (View Original)
 More options 11 May 2005, 16:13
Newsgroups: comp.graphics.rendering.raytracing
From: "tbp" <tbp...@gmail.com>
Date: 11 May 2005 08:13:02 -0700
Local: Wed 11 May 2005 16:13
Subject: Re: Mini ray tracer
Jon Harrop wrote:
> > So what? The fact is that implementation quality matters just as
much.

> No, it doesn't. If you can reduce your algorithm from O(n) to O(log
n) then
> you should do so before you rewrite it in assembler.

As we're obviously both right, and in fine it's a matter of tradeoffs,
it would pointless to argue further more.

> > A program has to run on a given machine at some point.

> True. I'm not saying that it is not worthwhile to optimise a ray
tracer with
> SIMD, just that it would be premature to do so on my ray tracer right

now.
I beg to differ. You can't simply plug in some SIMD scheme on top of
that as an afterthought, because it's tightly coupled with the way you
represent things.
That's one of the common pitfall of OO abstraction.
Wald discuss that point a bit in his thesis and i couldn't agree more.
But i know it's a not a very popular opinion atm.

> I would be very interested to see a highly optimised version of my
ray
> tracer. I think a lot of people could learn from a set of web pages
which
> show how to optimise the programs. If you're willing to help and I
can find
> the time then I'd like to do this sometime.

There's certainly lots of angles to attack that problem
(algorithmically or not), but i doubt any of them could fit within 100
LOC.
Beside C++ idiomatic issues, a better memory layout could certainly
help.
That's, perhaps, the lowest hanging fruit (that or 'cheating' with
lower numerical precision, ie for square roots).

Anyway as i've already invested some time in a c++ version, i'm open to
suggestions :)

> I wouldn't call this kind of stuff C++:

>   a = _mm_load_ps(inp_sse1);
>   b = _mm_load_ps(inp_sse2);
>   c = _mm_add_ps(a, b);
>   _mm_store_ps(out_sse, c);

But that's legit C++.
C++ is C on steroids while C itself is just a gloried assembler.
No matter how much OO or syntactic sugar you pour onto it.
I kinda apreciate to be able to code to-the-metal.
To each, its own.

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile   Translate to Translated (View Original)
 More options 11 May 2005, 16:51
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Wed, 11 May 2005 16:51:25 +0100
Local: Wed 11 May 2005 16:51
Subject: Re: Mini ray tracer

Warp wrote:
>   Besides, it usually comes down to design. It is rare that a function
> just returns some values whose type and meaning are random.

No, that is my point. It is common, not rare. Look at the prevelance of
tuple return types in ML programs and you'll see what I mean. You won't see
it in C++ code because C++ makes it too tedious for people to bother coding
it.

Many of the functions in the OCaml standard library return 2-tuples. None of
them need to be explicitly defined.

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile   Translate to Translated (View Original)
 More options 11 May 2005, 16:54
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Wed, 11 May 2005 16:54:39 +0100
Local: Wed 11 May 2005 16:54
Subject: Re: Mini ray tracer

tbp wrote:
> You can't simply plug in some SIMD scheme on top of
> that as an afterthought, because it's tightly coupled with the way you
> represent things.
> That's one of the common pitfall of OO abstraction.
> Wald discuss that point a bit in his thesis and i couldn't agree more.
> But i know it's a not a very popular opinion atm.

I agree, actually. I know nothing of cache coherency in the context of ray
tracers though.

> There's certainly lots of angles to attack that problem
> (algorithmically or not), but i doubt any of them could fit within 100
> LOC.

Yes, it wouldn't make for a very good introductory tutorial either. Although
I'm sure a lot of people (me included) would be interested.

> Beside C++ idiomatic issues, a better memory layout could certainly
> help.
> That's, perhaps, the lowest hanging fruit (that or 'cheating' with
> lower numerical precision, ie for square roots).

It would be interesting to compare equivalently optimised C++ and OCaml
programs based upon float arrays. Handling alignment will be tricky in
OCaml.

>> I wouldn't call this kind of stuff C++:

>>   a = _mm_load_ps(inp_sse1);
>>   b = _mm_load_ps(inp_sse2);
>>   c = _mm_add_ps(a, b);
>>   _mm_store_ps(out_sse, c);
> But that's legit C++.

Is that in the C++ standard or is it a common extension?

> C++ is C on steroids while C itself is just a gloried assembler.
> No matter how much OO or syntactic sugar you pour onto it.
> I kinda apreciate to be able to code to-the-metal.
> To each, its own.

True.

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile   Translate to Translated (View Original)
(1 user)  More options 11 May 2005, 17:04
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Wed, 11 May 2005 17:04:17 +0100
Local: Wed 11 May 2005 17:04
Subject: Re: Mini ray tracer

tbp wrote:
> I've only quickly checked with g++-3.4, g++-4.x does a much better job
> (at least on my opteron on x86-64, haven't checked on my t-bird).

I'll check it out when the Debian packages arrive. :-)

> And i wouldn't consider my version as optimized by any stretch, but
> just one way to decently write a c++ implementation of that thing.

"more optimised" then. :-)

>> My optimised version is basically the same speed as yours (although
> quite
>> different):
> Hmm, need to check what it gives with a shadow ray shortcut, if i can
> cram it up in there.

I don't think it actually made mine much faster so it probably isn't worth
it.

> It's a bit tighter than my solution, but both are variations on the
> same theme.
> Ideally i would use an explicit memory layout so things could be
> properly aligned etc... And it would certainly end up being much
> uglier.

The OCaml might be significantly easier to understand if suitable
higher-order functions can be factored out which make the memory layout
"look" like a tree.

>> Can you describe how a new kind of object/bound would be implemented?
> A BVH is fine for that kind of scenes, but it's all in the construction
> of the hierarchy (or put another way, in its quality).
> Now i seriously doubt i could cram a SAH compiler within 100 LOC :)

What is an SAH compiler?

>> In my original implementation, "intersect" calls "ray_sphere" and not
> the
>> other way around. Do you mean "ray_sphere" could not have been
> inlined?
> Sphere::intersect calls Sphere::ray_sphere, and that maybe inlined but

That's the important one though.

> Sphere::intersect itself then cannot be inlined because it's virtual
> and not statically resolved (and on top of that instead of a static
> call, you pay for a indirect one).
> Compare that to what happens in my version where the compiler has the
> choice to inline to its heart content; that is placing (or not) a call
> if that's cheaper at some point.
> Like i've said earlier exposing things to the compiler is the name of
> the game.

Yes. Although this is only useful to other people if it has a description of
what information is being provided to the compiler. It would be nice to
write this up.

>> In what way is it unfair?
> Because you obstructed the c++ compiler in many many ways,

I think the OCaml compiler was equivalently obstructed.

> the most
> prominent being passing structures by value instead of by reference (as
> show in previous patches).

I think the compiler could (should) have spotted that optimisation without
having to be told.

> That's why i've also said the comparison was rigged as you didn't put
> as much effort on the c++ side as on the OCaml; it's not surprising my
> version is ~2x faster than the original.

Well, I have a more optimised OCaml version which is just as fast on x86 and
20 lines shorter.

> Now if you present an amended version that at least fixes those issues,
> i'll have no objections to whatever conclusion you can come up to.

I'll try to write another page describing both optimised versions. I also
have a C version now, which is slower than both the C++ and the OCaml. :-)

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tbp  
View profile   Translate to Translated (View Original)
(1 user)  More options 11 May 2005, 17:49
Newsgroups: comp.graphics.rendering.raytracing
From: "tbp" <tbp...@gmail.com>
Date: 11 May 2005 09:49:36 -0700
Local: Wed 11 May 2005 17:49
Subject: Re: Mini ray tracer
Jon Harrop wrote:
> tbp wrote:
> > I've only quickly checked with g++-3.4, g++-4.x does a much better
job
> > (at least on my opteron on x86-64, haven't checked on my t-bird).

> I'll check it out when the Debian packages arrive. :-)

There's a 4.0.0 package, at least on sid 64. Otherwise building a
snapshot is straightforward, yet a bit off topic :)

> > Hmm, need to check what it gives with a shadow ray shortcut, if i
can
> > cram it up in there.

> I don't think it actually made mine much faster so it probably isn't
worth
> it.

A quick hack at that (within the rules) shaved a second of runtime
here.
# diff -u miniray.cpp miniray_shadow.cpp
--- miniray.cpp 2005-05-11 14:09:08.000000000 +0200
+++ miniray_shadow.cpp  2005-05-11 20:11:00.000000000 +0200
@@ -47,12 +47,13 @@
   if (t >= hit.t) return;
   else { hit.t = t; hit.n = leaf.get_normal(ray.o + ray.d*t); }
  }
-  void intersect(const ray_t &ray, hit_t &hit) const {
+  template<bool shadow> void intersect(const ray_t &ray, hit_t &hit)
const {
   if (!objs.empty()) {
    if (bound.intersect(ray) >= hit.t) return;
    intersect_leaf(ray,hit);
+   if (shadow & (hit.t != infinity)) return;  // early exit.
    for (objs_t::const_iterator it=objs.begin(); it!=objs.end(); ++it)
-    (*it)->intersect(ray, hit);
+    (*it)->intersect<shadow>(ray, hit);
   }
   else intersect_leaf(ray,hit);
  }
@@ -68,11 +69,11 @@
  else return new group_t(sphere_t(v, r),sphere_t(v, r));
 }
 static double ray_trace(const group_t * const scene, const ray_t &ray)
{
- hit_t hit; scene->intersect(ray, hit);
+ hit_t hit; scene->intersect<false>(ray, hit);
  const double diffuse = (hit.t == infinity) ? 0. : -hit.n.dot(light);
  if (diffuse <= 0.) return 0.;
  const ray_t sray(ray.o + (ray.d*hit.t) + (hit.n*epsilon), -light);
- hit_t shit; scene->intersect(sray,shit);
+ hit_t shit; scene->intersect<true>(sray,shit);
  return (shit.t == infinity) ? diffuse : 0.;
 }
 int main(int argc, char *argv[]) {

# time ./miniray >pix.ppm

real    0m4.856s
user    0m4.812s
sys     0m0.004s

> The OCaml might be significantly easier to understand if suitable
> higher-order functions can be factored out which make the memory
layout
> "look" like a tree.

I'm absolutely not qualified to comment on OCaml unfortunately :)

> >> Can you describe how a new kind of object/bound would be
implemented?
> > A BVH is fine for that kind of scenes, but it's all in the
construction
> > of the hierarchy (or put another way, in its quality).
> > Now i seriously doubt i could cram a SAH compiler within 100 LOC :)

> What is an SAH compiler?

Surface Area Heuristic, a way to evaluate costs when building a
hierarchy.
Does wonders on BVH, kd-tree... you name it. But it's tricky to get it
right and even more to make it not behave quadratically.
Havran's thesis is a must read, Wald builds upon it (but there's a
metric ton of litterature about it).
http://www.cgg.cvut.cz/~havran/phdthesis.html
http://www.mpi-sb.mpg.de/~wald/PhD/

> > Like i've said earlier exposing things to the compiler is the name
of
> > the game.

> Yes. Although this is only useful to other people if it has a
description of
> what information is being provided to the compiler. It would be nice
to
> write this up.

I'm pedagogically challenged :)
On top of peremptoric statements like "know the language" and "know
your compiler", it boils down to things like const correctness,
tediously constraining things and so on.
Or put another way, compilers are mental midgets and C++ is a language
made for compilers and not humans.

Ok, that wasn't really helpful.
But i'm sure someone already has put that in a nicer way somewhere, no?
:)

> >> In what way is it unfair?
> > Because you obstructed the c++ compiler in many many ways,

> I think the OCaml compiler was equivalently obstructed.

I can't say. But for sure, your C++ "style" was not typical.

> > the most
> > prominent being passing structures by value instead of by reference
(as
> > show in previous patches).

> I think the compiler could (should) have spotted that optimisation
without
> having to be told.

That's wishful thinking. And it really really doesn't work that way.
And don't blame g++ for that, as it's actually quite good at that game.
But that's another topic.

> > That's why i've also said the comparison was rigged as you didn't
put
> > as much effort on the c++ side as on the OCaml; it's not surprising
my
> > version is ~2x faster than the original.

> Well, I have a more optimised OCaml version which is just as fast on
x86 and
> 20 lines shorter.

Bring it on ;)

> > Now if you present an amended version that at least fixes those
issues,
> > i'll have no objections to whatever conclusion you can come up to.

> I'll try to write another page describing both optimised versions. I
also
> have a C version now, which is slower than both the C++ and the

OCaml. :-)
Hah.

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tbp  
View profile   Translate to Translated (View Original)
 More options 11 May 2005, 18:01
Newsgroups: comp.graphics.rendering.raytracing
From: "tbp" <tbp...@gmail.com>
Date: 11 May 2005 10:01:29 -0700
Local: Wed 11 May 2005 18:01
Subject: Re: Mini ray tracer
Jon Harrop wrote:
> I agree, actually. I know nothing of cache coherency in the context
of ray
> tracers though.

First, i'd like to point out that those SIMD rt not only take care
about mem/cache coherency, but also _rays_ coherency (dunno if that was
clear).

Then cache coherency isn't an issue in your app, at least at the
default level.
Though, better alignement would certainly help wrt doubles on x86.

> > There's certainly lots of angles to attack that problem
> > (algorithmically or not), but i doubt any of them could fit within
100
> > LOC.

> Yes, it wouldn't make for a very good introductory tutorial either.
Although
> I'm sure a lot of people (me included) would be interested.

Jacco Bikker has produced some interesting articles on flipcode about
that.
But his articles are now lagging way behind his own implementation.
He's a lazy bastard :)

I've implemented, both a BVH & kdtree based coherent raytracer, but
even the BVH one (which is simpler) would require lots of background
description before addressing the meaty stuff.
That's quite infortunate (and the litterature on the topic is
absolutely horrible to read).

> > Beside C++ idiomatic issues, a better memory layout could certainly
> > help.
> > That's, perhaps, the lowest hanging fruit (that or 'cheating' with
> > lower numerical precision, ie for square roots).

> It would be interesting to compare equivalently optimised C++ and
OCaml
> programs based upon float arrays. Handling alignment will be tricky
in
> OCaml.

Float arrays? I fail to grok the context. Could you elaborate a bit?

> >> I wouldn't call this kind of stuff C++:

> >>   a = _mm_load_ps(inp_sse1);
> >>   b = _mm_load_ps(inp_sse2);
> >>   c = _mm_add_ps(a, b);
> >>   _mm_store_ps(out_sse, c);
> > But that's legit C++.

> Is that in the C++ standard or is it a common extension?

Common x86 extension yes (seen in gcc, msvc, icc etc).
An altivec equivalent is common on PPC platforms, i think.
Admitedly, it's in phase with the "C as a glorified assembler" i
alluded to earlier.

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Warp  
View profile   Translate to Translated (View Original)
 More options 11 May 2005, 19:56
Newsgroups: comp.graphics.rendering.raytracing
From: Warp <w...@cs.tut.fi>
Date: Wed, 11 May 2005 18:56:45 +0000 (UTC)
Local: Wed 11 May 2005 19:56
Subject: Re: Mini ray tracer

Jon Harrop <use...@jdh30.plus.com> wrote:
> No, that is my point. It is common, not rare.

  So effectively crappy design is common, not rare. However, we all knew
that already, of course.

--
                                                          - Warp


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tbp  
View profile   Translate to Translated (View Original)
 More options 12 May 2005, 16:14
Newsgroups: comp.graphics.rendering.raytracing
From: "tbp" <tbp...@gmail.com>
Date: 12 May 2005 08:14:13 -0700
Local: Thurs 12 May 2005 16:14
Subject: Re: Mini ray tracer

tbp wrote:
> A quick hack at that (within the rules) shaved a second of runtime
> here.
[snip]
> # time ./miniray >pix.ppm

> real    0m4.856s
> user    0m4.812s
> sys     0m0.004s

I'm replying to myself (don't you love that), because while tinkering
with it, again - that's quite addictive -, i've noticed 2 things:
a) the original deallocation routine, Group::del(), that i've
shamelessly re-used is bogus and only delete the first level.
b) more importantly, the BVH can be tighten a bit: at a given level, if
you place a sphere of radius r centered at v(x,y,z), then instead of
bounding the sub hierarchy with sphere(v, 3*r) you can bound it with
sphere(v+(0,r,0), 2.4330128*r).

Then,
# time ./miniray >pix.ppm
real    0m3.734s
user    0m3.721s
sys     0m0.003s

Voilà.


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile   Translate to Translated (View Original)
 More options 13 May 2005, 20:42
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Fri, 13 May 2005 20:42:02 +0100
Local: Fri 13 May 2005 20:42
Subject: Re: Mini ray tracer

tbp wrote:
> tbp wrote:
>> A quick hack at that (within the rules) shaved a second of runtime
>> here.
> [snip]
>> # time ./miniray >pix.ppm

>> real    0m4.856s
>> user    0m4.812s
>> sys     0m0.004s

> I'm replying to myself (don't you love that), because while tinkering
> with it, again - that's quite addictive -, i've noticed 2 things:
> a) the original deallocation routine, Group::del(), that i've
> shamelessly re-used is bogus and only delete the first level.

Oops, yes. How thoughtless of you to copy my mistake. :-)

> b) more importantly, the BVH can be tighten a bit: at a given level, if
> you place a sphere of radius r centered at v(x,y,z), then instead of
> bounding the sub hierarchy with sphere(v, 3*r) you can bound it with
> sphere(v+(0,r,0), 2.4330128*r).

Yes, I hadn't optimised that either (just copied the idea from the real
sphere-flake, for which the easy approach is more excusable). :-)

> Then,
> # time ./miniray >pix.ppm
> real    0m3.734s
> user    0m3.721s
> sys     0m0.003s

> Voilà.

I'm surprised it makes such a big difference. I shall check it out in more
detail. I really have spent too much time playing with this now. :-)

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tbp  
View profile   Translate to Translated (View Original)
 More options 14 May 2005, 11:45
Newsgroups: comp.graphics.rendering.raytracing
From: "tbp" <tbp...@gmail.com>
Date: 14 May 2005 03:45:48 -0700
Local: Sat 14 May 2005 11:45
Subject: Re: Mini ray tracer
Jon Harrop wrote:
> Oops, yes. How thoughtless of you to copy my mistake. :-)

Let's say it was a feature.

> I'm surprised it makes such a big difference. I shall check it out in
more
> detail. I really have spent too much time playing with this now. :-)

All the magic is in the hierarchy.
Now as i couldn't see how i could pack in the code for a better bvh and
this tree is complete i've addressed other points this morning:
. i turn the tree into an array with skip pointers - that can be done
in 1 pass, the tree is complete - while i build in place.
. then i can iterate over it when tracing.
[uglified 100 LOC source annexed]
[also here http://ompf.org/ray/bvh100.cpp]

So i was at

> > # time ./miniray >pix.ppm
> > real    0m3.734s
> > user    0m3.721s
> > sys     0m0.003s

and now
# g++ -O2 -ffast-math -o bvh bvh100.cpp
# time ./bvh >pix.ppm
real    0m2.985s
user    0m2.966s
sys     0m0.005s

but
# cmp pix.orig.ppm pix.ppm
pix.orig.ppm pix.ppm differ: byte 1040, line 4
even if a diff between the two results in a black picture for the eye.

It's not as fast as it could, but the source is stuffed enough as it is
(cough).

// sphere bvh raytracer v0.3, (c) thierry berger-perrin 2005.
// keywords: sphere, bvh, iterative, array form, skip pointers.
#include <cmath>
#include <iostream>
#define EXPECT(a)    __builtin_expect((a),true) // (a)
struct vec_t {
  double x,y,z; vec_t() {}
  vec_t(double a,double b,double c): x(a),y(b),z(c) {}
  vec_t operator+(const vec_t &v) const { return
vec_t(x+v.x,y+v.y,z+v.z); }
  vec_t operator-(const vec_t &v) const { return
vec_t(x-v.x,y-v.y,z-v.z); }
  vec_t operator-() const { return vec_t(-x,-y,-z); }
  vec_t operator*(const double d) const { return vec_t(x*d,y*d,z*d); }
  double dot(const vec_t &v) const { return x*v.x + y*v.y + z*v.z; }
  double magsqr() const { return dot(*this); }
  vec_t norm() const { return *this*(1./sqrt(magsqr())); } };
static const vec_t light(vec_t(-1, -3, 2).norm());
static const double infinity = 1./0,  epsilon = 1e-12;
struct ray_t {
  vec_t o, d; ray_t(const vec_t &v) : o(v) {}
  ray_t(const vec_t &v, const vec_t &w) : o(v),d(w) {} };
struct hit_t { vec_t n; double t; hit_t() : n(vec_t(0,0,0)),
t(infinity) {} };
struct sphere_t { vec_t  o; double  r; sphere_t() {}
  sphere_t(const vec_t &v, double d) : o(v),r(d) {} // 1/r, constant
  vec_t get_normal(const vec_t &v) const { return (v-o)*(1./r); }
  double intersect(const ray_t &ray) const {
    const vec_t v(o-ray.o);
    const double b = ray.d.dot(v), disc = b*b - v.magsqr() + r*r;
    if (disc < 0.) return infinity; // branch away from the square root
    const double d = sqrt(disc), t2 = b + d, t1 = b - d;
    if (t2 < 0.) return infinity; else return (t1 > 0. ? t1 : t2); //
cond mv.

} };

struct node_t; static node_t *pool = 0, *end = 0;
struct node_t { // used in array form + skip for navigation.
  sphere_t bound, leaf; long diff; // gah, 2*32 + some. diff should be
moved.
  node_t() {}
  node_t(const sphere_t &b,const sphere_t &l, const int jump)
  : bound(b), leaf(l), diff(jump) {}
  template<bool shadow> static void intersect(const ray_t &ray, hit_t
&hit) {
    const node_t * __restrict__ p = pool; //const node_t * p = pool;
    while (p < end) {
      if (EXPECT(p->bound.intersect(ray) >= hit.t)) // missed bound
        p = (p->diff > 0) ? p+p->diff : end;        // skip
      else {
        const double t = p->leaf.intersect(ray);
        if (t < hit.t) {  // if hit, update, then break for shadows.
          hit.t = t; if (shadow) break;
          hit.n = p->leaf.get_normal(ray.o + ray.d*t);
        }
        ++p;  // next
} } } };

static double ray_trace(const node_t * const scene, const ray_t &ray) {
  hit_t hit; scene->intersect<false>(ray, hit); // trace primary
  const double diffuse = (hit.t == infinity) ? 0. : -hit.n.dot(light);
  if (diffuse <= 0.) return 0.;
  const ray_t sray(ray.o + (ray.d*hit.t) + (hit.n*epsilon), -light);
  hit_t shit; scene->intersect<true>(sray,shit); // trace shadow
  return (shit.t == infinity) ? diffuse : 0.; }
static node_t *create(node_t *node, const int level, int dist,
                        const double r, const vec_t &v)
{ if (level > 1) { // directly write the tree as an array via placement
new.
    node = 1+new (node) node_t(
      sphere_t(v+vec_t(0,r,0),2.4330128*r),sphere_t(v, r), dist+1);
    dist = std::max((dist-4)/4, 0); // *(distance>1?4:0) + 4;
    const double rn = (3/sqrt(12.))*r;
    for (int dz=-1; dz<=1; dz+=2) { for (int dx=-1; dx<=1; dx+=2) { //
go down
        node = create(node, level-1,
dist,0.5*r,v+vec_t(-dx*rn,+rn,-dz*rn));
    } } return node;
  } else return 1+new (node) node_t(sphere_t(v, r),sphere_t(v, r),1);
// final
}

template <int ss> static void trace_grid_aa(const int width, const int
height) {
  ray_t ray(vec_t(0, 0, -4)); enum { ss_sqr = ss*ss };
  const double w = width, h = height, scale = 256. / double(ss_sqr);
  vec_t grid[ss*ss]; // pre compute a ss² regular grid perturbation
  { const double rcp_ss = 1./double(ss), halfw = w/2, halfh = h/2;
    int idx = 0;
    for (int dx=0; dx<ss; ++dx) for (int dy=0; dy<ss; ++dy) {
      grid[idx] = vec_t(double(dx)*rcp_ss-halfw,
double(dy)*rcp_ss-halfh, 0);
    ++idx;
  } } // ok, now let's get back to work
  vec_t v(0,w-1, std::max(w,h));  // scan line
  for (int i=height-1; i>=0; --i) { for (int j=width-1; j>=0; --j) {
      double g=0; for (int idx = 0; idx < ss_sqr; ++idx) {        //
apply grid
        ray.d = (v+grid[idx]).norm(); g += ray_trace(pool, ray); }//
trace
      std::cout << int(scale * g) << " "; v.x += 1;  // next pixel
    } v.x = 0; v.y -= 1; }  // next line
  std::cout << std::endl;
}

int main(int argc, char *argv[]) {
  enum { w = 512, h = 512, ss = 4, alignment = 64 };
  const int level = (argc==2 ? atoi(argv[1]) : 6);
  int count = 4, dec = level; while (--dec > 1) count = (count*4) + 4;
  pool = new node_t[count]; end = pool + count; // raw
  create(pool, level, count, 1., vec_t(0, -1, 0)); // cooked
  std::cout << "P2\n" << w << " " << h << "\n256\n";
  trace_grid_aa<ss>(w,h); return 0;  // served


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile   Translate to Translated (View Original)
 More options 16 May 2005, 04:21
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Mon, 16 May 2005 04:21:07 +0100
Local: Mon 16 May 2005 04:21
Subject: Re: Mini ray tracer

> All the magic is in the hierarchy.
> Now as i couldn't see how i could pack in the code for a better bvh and
> this tree is complete i've addressed other points this morning:

What data structures and algorithms would you recommend? Perhaps they could
be implemented in the OCaml version...

> . i turn the tree into an array with skip pointers - that can be done
> in 1 pass, the tree is complete - while i build in place.
> . then i can iterate over it when tracing.

> > # time ./miniray >pix.ppm
> > real    0m3.734s
> > user    0m3.721s
> > sys     0m0.003s  
> and now
> # g++ -O2 -ffast-math -o bvh bvh100.cpp
> # time ./bvh >pix.ppm
> real    0m2.985s
> user    0m2.966s
> sys     0m0.005s

On 1.2GHz Athlon t-bird, the unoptimised C++ on my webpage took:

$ g++-3.4 -Wall -O2 -ffast-math ray.cpp -o ray
$ time ./ray >image.pgm

real    1m51.071s
user    1m31.510s
sys     0m0.180s

and the unoptimised OCaml took:

$ ocamlopt -inline 100 -ffast-math ray11.ml -o ray11
$ time ./ray10 >image.pgm

real    0m53.916s
user    0m40.770s
sys     0m0.120s

Your latest version takes:

$ g++-3.4 -Wall -O2 -ffast-math bvh100.cpp -o bvh100
bvh100.cpp:17: warning: division by zero in `1.0e+0 / 0'
$ time ./bvh100 >image.pgm

real    0m18.492s
user    0m16.030s
sys     0m0.070s

> but
> # cmp pix.orig.ppm pix.ppm
> pix.orig.ppm pix.ppm differ: byte 1040, line 4
> even if a diff between the two results in a black picture for the eye.

A couple of points here:

1. I think the -ffast-math can cause gcc to violate IEEE-complaint FP
arithmetic, in which case it could theoretically alter the output and could
be banned as a consequence.

2. If we're happy to produce similar-looking results then switching to
single-precision takes over 40% off the time (on my T-bird, probably not on
AMD64):

$ g++-3.4 -Wall -O2 -ffast-math bvh100_2.cpp -o bvh100_2
bvh100_2.cpp:17: warning: division by zero in `1.0e+0 / 0'
$ time ./bvh100_2 >image2.pgm

real    0m11.282s
user    0m9.380s
sys     0m0.050s

For the shootout, the results will probably have to be identical. I'm more
than happy to generate correct-looking results (as this is the point of a
ray tracer!) for my web page though.

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tbp  
View profile   Translate to Translated (View Original)
 More options 16 May 2005, 10:45
Newsgroups: comp.graphics.rendering.raytracing
From: "tbp" <tbp...@gmail.com>
Date: 16 May 2005 02:45:19 -0700
Local: Mon 16 May 2005 10:45
Subject: Re: Mini ray tracer
Jon Harrop wrote:
> > All the magic is in the hierarchy.

> What data structures and algorithms would you recommend? Perhaps they
could
> be implemented in the OCaml version...

The present one has some nice properties if only for its 'compactness'.
Forgetting about implementation complexity for a moment, there's 2
angles:
a) the nature of the hierarchy; could use a kd tree (tiny nodes, cheap
intersection, efficient culling) or a box bvh (maybe tighter); both can
be cheaply traversed iteratively.
b) regardless of the hierarchy type, use a cost driven scheme to build
it; ultimately the full SAH thing.

That would give us a better hierarchy, but then it will certainly take
more time to build it than to render... (ie my best SAH kdtree compiler
takes ~30s to compile a 250k triangle soup).

I think my iterative version could be ammended for better memory
alignment (at least on x86) just by splitting the node_t struct into 2
spheres on a side (2*32bytes) and skip pointers on the other. Would
lend to some uglyfication. Haven't checked what it would give with
floats.

On top of all that, you can also think of shooting more than 1 ray at
once and things like that ;)

> On 1.2GHz Athlon t-bird, the unoptimised C++ on my webpage took:
> $ g++-3.4 -Wall -O2 -ffast-math bvh100.cpp -o bvh100
> bvh100.cpp:17: warning: division by zero in `1.0e+0 / 0'
> $ time ./bvh100 >image.pgm

> real    0m18.492s
> user    0m16.030s
> sys     0m0.070s

Something's wrong. That's >6x slower than on my box, an opteron 146
(2ghz), which is supposedly only, say, 2x faster than yours.

But i'm using a gcc snapshot in 64bit mode:
a) there's more registers.
b) more importantly, SSE math is used (not an option with doubles on
your t-bird).

> A couple of points here:

> 1. I think the -ffast-math can cause gcc to violate IEEE-complaint FP
> arithmetic, in which case it could theoretically alter the output and
could
> be banned as a consequence.

There's much more "violation" when using the x86 fpu (because of the
internal 80bit representation etc) than SSE.

> 2. If we're happy to produce similar-looking results then switching
to
> single-precision takes over 40% off the time (on my T-bird, probably
not on
> AMD64):

It would help with the memory bandwidth. Other than that, only the sqrt
would be a bit faster.
Try changing the fpu precision on your t-bird.

> For the shootout, the results will probably have to be identical. I'm

more
Yes.

> than happy to generate correct-looking results (as this is the point
of a
> ray tracer!) for my web page though.

Yesterday i've done some experimentation to see if i could generate a
nicer scene (with that bvh in array yada yada).
Got:
http://ompf.org/ray/boo8.png
http://ompf.org/ray/ohh12.png
http://ompf.org/ray/blossom12.png

and finally, http://ompf.org/ray/holy_cabbage_of_antioch2.png
That final picture took 6.8s with a 2x2 RGSS.
I haven't tried to compact the source to 100 LOC yet, but i think it's
possible  :)


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Warp  
View profile   Translate to Translated (View Original)
 More options 16 May 2005, 11:15
Newsgroups: comp.graphics.rendering.raytracing
From: Warp <w...@cs.tut.fi>
Date: Mon, 16 May 2005 10:15:24 +0000 (UTC)
Local: Mon 16 May 2005 11:15
Subject: Re: Mini ray tracer

Jon Harrop <use...@jdh30.plus.com> wrote:
> 1. I think the -ffast-math can cause gcc to violate IEEE-complaint FP
> arithmetic, in which case it could theoretically alter the output and could
> be banned as a consequence.

  I have my doubts that it will cause any difference in the final image
(given that you round numbers correctly). Those optimizations, if they
cause any computation to give a different result, probably cause the
difference in the 15th decimal or so.

  I don't know what kind of optimizations -ffast-math does, but one
that I could imagine is this kind:

  double d = 0.1234;
  double a = x/d, b = y/d, c = z/d;

  That could be probably optimized internally to:

  double d = 1/0.1234;
  double a = x*d, b = y*d, c = z*d;

  Due to how floating point works a, b and c might get just a slightly
different value (the difference would probably be in the least significant
decimal or the two least, which with doubles is something like the 15th
decimal or so).

--
                                                          - Warp


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Francois Labreque  
View profile   Translate to Translated (View Original)
 More options 19 May 2005, 00:14
Newsgroups: comp.graphics.rendering.raytracing
From: Francois Labreque <flabre...@videotron.ca>
Date: Wed, 18 May 2005 19:14:21 -0400
Local: Thurs 19 May 2005 00:14
Subject: Re: Mini ray tracer

Jon Harrop wrote:

> and the unoptimised OCaml took:

> $ ocamlopt -inline 100 -ffast-math ray11.ml -o ray11
> $ time ./ray10 >image.pgm

> real    0m53.916s
> user    0m40.770s
> sys     0m0.120s

Is there a reason why you're compiling "ray11" and then running "ray10"
for the test?

--
Francois Labreque | The surest sign of the existence of extra-
     flabreque     | terrestrial intelligence is that they never
         @         | bothered to come down here and visit us!
   videotron.ca    |                             - Calvin


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile   Translate to Translated (View Original)
 More options 21 May 2005, 21:32
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Sat, 21 May 2005 21:32:53 +0100
Local: Sat 21 May 2005 21:32
Subject: Re: Mini ray tracer

Francois Labreque wrote:
> Jon Harrop wrote:

>> and the unoptimised OCaml took:

>> $ ocamlopt -inline 100 -ffast-math ray11.ml -o ray11
>> $ time ./ray10 >image.pgm

>> real    0m53.916s
>> user    0m40.770s
>> sys     0m0.120s

> Is there a reason why you're compiling "ray11" and then running "ray10"
> for the test?

Oops. :-)

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile   Translate to Translated (View Original)
 More options 25 May 2005, 08:46
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Wed, 25 May 2005 08:46:21 +0100
Local: Wed 25 May 2005 08:46
Subject: Re: Mini ray tracer

tbp wrote:
> On top of all that, you can also think of shooting more than 1 ray at
> once and things like that ;)

I might try this if I can find the time.

>> On 1.2GHz Athlon t-bird, the unoptimised C++ on my webpage took:
>> $ g++-3.4 -Wall -O2 -ffast-math bvh100.cpp -o bvh100
>> bvh100.cpp:17: warning: division by zero in `1.0e+0 / 0'
>> $ time ./bvh100 >image.pgm

>> real    0m18.492s
>> user    0m16.030s
>> sys     0m0.070s
> Something's wrong. That's >6x slower than on my box, an opteron 146
> (2ghz), which is supposedly only, say, 2x faster than yours.

I think gcc is suffering from poor FP code generation. Interestingly, it is
usually ocamlopt that suffers from this and not gcc. I've no idea why but
this ray tracer is the only program I've ever written which produces these
kind of results (faster OCaml on x86).

Someone has kindly ported it to Fortran on the shootout now. For the same
functionality, the Fortran is much faster than both the C++ and the OCaml
(and the SML compiled with Mlton, which is actually significantly faster
than C++).

> But i'm using a gcc snapshot in 64bit mode:
> a) there's more registers.
> b) more importantly, SSE math is used (not an option with doubles on
> your t-bird).

My AMD64 is also 6x faster (it is supposed to be 3x faster, according to
other benchmarks).

>> A couple of points here:

>> 1. I think the -ffast-math can cause gcc to violate IEEE-complaint FP
>> arithmetic, in which case it could theoretically alter the output and
> could
>> be banned as a consequence.
> There's much more "violation" when using the x86 fpu (because of the
> internal 80bit representation etc) than SSE.

True.

>> 2. If we're happy to produce similar-looking results then switching
> to
>> single-precision takes over 40% off the time (on my T-bird, probably
> not on
>> AMD64):
> It would help with the memory bandwidth. Other than that, only the sqrt
> would be a bit faster.
> Try changing the fpu precision on your t-bird.

It doubles the performance. However, it almost halves the performance on
AMD64.

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tbp  
View profile   Translate to Translated (View Original)
 More options 25 May 2005, 16:30
Newsgroups: comp.graphics.rendering.raytracing
From: "tbp" <tbp...@gmail.com>
Date: 25 May 2005 08:30:01 -0700
Local: Wed 25 May 2005 16:30
Subject: Re: Mini ray tracer
Jon Harrop wrote:
> I think gcc is suffering from poor FP code generation. Interestingly, it is
> usually ocamlopt that suffers from this and not gcc. I've no idea why but
> this ray tracer is the only program I've ever written which produces these
> kind of results (faster OCaml on x86).

I haven't checked the 32bit side.
I guess it's hitting some weakspot in gcc; that might be interesting to
report.

It's interesting to note that on x86-64 ICC 8.1 & 9.0beta can't beat
gcc (or more precisely a 4.1 snapshot).

> Someone has kindly ported it to Fortran on the shootout now. For the same
> functionality, the Fortran is much faster than both the C++ and the OCaml
> (and the SML compiled with Mlton, which is actually significantly faster
> than C++).

I guess you really mean faster on x86 (as opposed to x86-64), and then
i would put the blame on the 32bit fpu codegen thing too.
I mean C/C++ being as low level as it is, it should be faster given a
proper implementation; uglier, hackish, verbose etc but not slower :)

> > Try changing the fpu precision on your t-bird.

> It doubles the performance. However, it almost halves the performance on
> AMD64.

Huh?
On x86-64, SSE should be used and that op. set is impervious to FPU
precision modes.
If you're talking about swapping doubles for floats, i've checked that
and i get a marginal speed improvement (after fixing stuff & memory
layout).
So, can you elaborate a bit?

Last week i've cleaned things a bit and put stub here,
http://ompf.org/ray/sphereflake/, for a 100 LOC C++ version that
generates a sphere flake, or something looking closer to the real
thing.
I've settled for a 2x2 rotated grid to get decent runtime while
preserving image quality.

Regards,
tbp.


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Darren New  
View profile   Translate to Translated (View Original)
 More options 25 May 2005, 18:03
Newsgroups: comp.graphics.rendering.raytracing
From: Darren New <d...@san.rr.com>
Date: Wed, 25 May 2005 17:03:59 GMT
Local: Wed 25 May 2005 18:03
Subject: Re: Mini ray tracer

tbp wrote:
>>Someone has kindly ported it to Fortran on the shootout now.
> I mean C/C++ being as low level as it is, it should be faster given a
> proper implementation; uglier, hackish, verbose etc but not slower :)

FORTRAN has various language rules (such as disallowing aliasing,
allowing better optimizations) that C and C++ don't have. (FORTRAN
originally was also very restricted in the forms that array indicies
could take, ensuring the compiler could make use of index registers, for
example.)  So, no, FORTRAN is often faster than C for the same program.

--
   Darren New / San Diego, CA, USA (PST)
     The samba was clearly inspired
     by the margarita.


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tbp  
View profile   Translate to Translated (View Original)
 More options 25 May 2005, 20:58
Newsgroups: comp.graphics.rendering.raytracing
From: "tbp" <tbp...@gmail.com>
Date: 25 May 2005 12:58:55 -0700
Local: Wed 25 May 2005 20:58
Subject: Re: Mini ray tracer
Darren New wrote:
> FORTRAN has various language rules (such as disallowing aliasing,
> allowing better optimizations) that C and C++ don't have. (FORTRAN
> originally was also very restricted in the forms that array indicies
> could take, ensuring the compiler could make use of index registers, for
> example.)  So, no, FORTRAN is often faster than C for the same program.

But now C/C++ comes equiped with 'restrict' and generally a whole range
of tightening aliasing rules as compiler options (with said compilers
being smarter & smarter about aliasing); no silver bullet but it shows
some progress on that side.

Anyway, if you said C/C++ was ugly, borked, byzantine or unproductive i
would have agreed. But when it comes to runtime performance, i simply
can't buy it: you can't get much lower than that without writing
straight asm (with the corrolary that you may have to express your
problem in compiler friendly way)... it all boils down to the
implementation's quality.

PS: i really didn't meant to start yet another sterile religious debate
about languages :)


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jon Harrop  
View profile   Translate to Translated (View Original)
 More options 30 May 2005, 17:52
Newsgroups: comp.graphics.rendering.raytracing
From: Jon Harrop <use...@jdh30.plus.com>
Date: Mon, 30 May 2005 17:52:50 +0100
Local: Mon 30 May 2005 17:52
Subject: Re: Mini ray tracer

Darren New wrote:
> tbp wrote:
>>>Someone has kindly ported it to Fortran on the shootout now.

>> I mean C/C++ being as low level as it is, it should be faster given a
>> proper implementation; uglier, hackish, verbose etc but not slower :)

> FORTRAN has various language rules (such as disallowing aliasing,
> allowing better optimizations) that C and C++ don't have. (FORTRAN
> originally was also very restricted in the forms that array indicies
> could take, ensuring the compiler could make use of index registers, for
> example.)  So, no, FORTRAN is often faster than C for the same program.

Not only is that untrue, it hasn't been true for many years. The main reason
that non-trivial Fortran programs are typically much slower than their
modern counterparts is that the verbosity and difficulty of implementing
more data structures and algorithms in Fortran makes non-trivial approaches
intractable.

Consider to Fortran port of my ray tracer. Not only is the resulting program
3x as long as the OCaml (and by far the longest implementation), not as
general (the scene tree is prohibitively difficult to implement in Fortran
without sacrificing its current genericity) but the resulting program is
not significantly faster than implementations written in modern languages
(slightly faster than OCaml, slightly slower than Mlton-compiled SML).

As TBP has said, this discussion doesn't belong here. If you want to know
more about this then read my book on OCaml for scientists:

  http://www.ffconsultancy.com/products/ocaml_for_scientists/

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message, you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages < Older 
« Back to Discussions « Newer topic     Older topic »

Create a group - Google Groups - Google Home - Terms of Service - Privacy Policy
©2009 Google