I have since ported the program to several other languages, including Java. Currently, the Mlton-compiled SML, OCaml, C++ and Fortran are the fastest, and Java trails a long way behind. I am concerned that this is because I am a much better OCaml/C++ than Java programmer so I'm asking for advice here.
These are my main questions:
1. What major optimisations are missing from my program (e.g. in C++, I pass vectors by reference and try to inline vector operations).
2. Where is the floating-point machine epsilon in the Java libraries?
3. How do you do infix operators in Java?
4. Am I supposed to have a static main function which instantiates a class and invokes a member function of it in order to start the program?
Here's my Java port:
// The Great Computer Language Shootout // http://shootout.alioth.debian.org/ // Contributed by Jon Harrop, 2005 // Compile: javac ray.java
public final class ray { // FIXME: Where is the floating point machine epsilon available in the // Java standard library? double delta = 1.49012e-08, infinity = Float.POSITIVE_INFINITY, pi = Math.PI;
// A 3D vector class Vec { public double x, y, z; public Vec(double x2, double y2, double z2) { x = x2; y = y2; z = z2; } }
Vec zero = new Vec(0, 0, 0);
// Vector arithmetic (FIXME: Use infix operators if possible) Vec add(Vec a, Vec b) { return new Vec(a.x + b.x, a.y + b.y, a.z + b.z); } Vec sub(Vec a, Vec b) { return new Vec(a.x - b.x, a.y - b.y, a.z - b.z); } Vec scale(double s, Vec a) { return new Vec(s*a.x, s*a.y, s*a.z); } double dot(Vec a, Vec b) { return a.x*b.x + a.y*b.y + a.z*b.z; } Vec unitise(Vec a) { return scale(1 / Math.sqrt(dot(a, a)), a); }
// A semi-infinite ray class Ray { public Vec orig, dir; public Ray(Vec o, Vec d) { orig = o; dir = d; } }
// A parametric intersection point and the normal vector at the point of // intersection class Intersect { public double lambda; public Vec normal; public Intersect(double l, Vec n) { lambda = l; normal = n; } }
// Abstract base class representing a node in the scene tree abstract class Scene { abstract public Intersect intersect(Intersect i, Ray ray); }
// Derived class representing a leaf node in the scene tree class Sphere extends Scene { public Vec center; public double radius;
public Sphere(Vec c, double r) { center = c; radius = r; }
// Find the first intersection of the given ray with this sphere public double ray_sphere(Ray ray) { Vec v = sub(center, ray.orig); double b = dot(v, ray.dir), disc = b*b - dot(v, v) + radius * radius; if (disc < 0) return infinity; double d = Math.sqrt(disc), t2 = b + d; if (t2 < 0) return infinity; double t1 = b - d; return (t1 > 0 ? t1 : t2); }
// Accumulate the first intersection of the given ray with this sphere public Intersect intersect(Intersect i, Ray ray) { double l = ray_sphere(ray); if (l >= i.lambda) return i; Vec n = add(ray.orig, sub(scale(l, ray.dir), center)); return new Intersect(l, unitise(n)); } }
// Derived class representing a non-leaf node in the scene tree class Group extends Scene { public Sphere bound; public LinkedList objs;
public Group(Sphere b) { bound = b; // We must initialise to an empty linked list or the null object // exception will be thrown at first use. objs = new LinkedList(); }
// Accumulate the first intersection of the given ray with this group // This function is used both for primary and shadow rays. public Intersect intersect(Intersect i, Ray ray) { double l = bound.ray_sphere(ray); if (l >= i.lambda) return i; // Loop over the list of child nodes, accumulating the result. ListIterator it = objs.listIterator(0); while (it.hasNext()) { Scene scene = (Scene)it.next(); i = scene.intersect(i, ray); } return i; } }
// Trace a single ray into the scene double ray_trace(Vec light, Ray ray, Scene scene) { Intersect i = scene.intersect(new Intersect(infinity, new Vec(0, 0, 0)), ray); if (i.lambda == infinity) return 0; Vec o = add(ray.orig, add(scale(i.lambda, ray.dir), scale(delta, i.normal))); double g = -dot(i.normal, light); // If we are on the shadowed side of a sphere then don't bother casting a // shadow ray as we know it will intersect the same sphere. if (g <= 0) return 0.; Ray sray = new Ray(o, sub(new Vec(0, 0, 0), light)); Intersect si = scene.intersect(new Intersect(infinity, i.normal), sray); return (si.lambda == infinity ? g : 0); }
// Recursively build the scene tree Scene create(int level, double r, Vec c) { Sphere sphere = new Sphere(c, r); if (level == 1) return sphere; Group group = new Group(new Sphere(c, 3*r)); group.objs.addFirst(sphere); double rn = 3*r/Math.sqrt(12); for (int dz=-1; dz<=1; dz+=2) for (int dx=-1; dx<=1; dx+=2) { Vec c2 = new Vec(c.x-dx*rn, c.y+rn, c.z-dz*rn); group.objs.addFirst(create(level-1, r/2, c2)); } return group; }
// Build a scene and trace many rays into it, outputting a PGM image void run(int n, int level, int ss) { // Scene tree Scene scene = create(level, 1, new Vec(0, -1, 0));
System.out.print("P5\n"+n+" "+n+"\n255\n"); for (int y=n-1; y>=0; --y) for (int x=0; x<n; ++x) { double g=0; for (int dx=0; dx<ss; ++dx) for (int dy=0; dy<ss; ++dy) { // We use "dx*1." instead of "double(dx)" to save space Vec d = new Vec(x+dx*1./ss-n/2., y+dy*1./ss-n/2., n); Ray ray = new Ray(new Vec(0, 0, -4), unitise(d)); g += ray_trace(unitise(new Vec(-1, -3, 2)), ray, scene); } System.out.print((char)(.5+255*g/(ss*ss))); } }
public static void main(String[] args) { (new ray()).run(Integer.parseInt(args[0]), 6, 4); }
> I have since ported the program to several other languages, including Java. > Currently, the Mlton-compiled SML, OCaml, C++ and Fortran are the fastest, > and Java trails a long way behind. I am concerned that this is because I am > a much better OCaml/C++ than Java programmer so I'm asking for advice here.
The first run of this code is done in interprete-mode. If you run it a few times, things get faster. Here is my benchmark (with n=32)
Jon Harrop wrote: > I've written a mini ray tracer for the computer language shootout. The > original version was written in OCaml which I ported to C++:
You've managed to squeeze down the C++ to the point where it's almost as short as the OCaml source ! I wouldn't have thought that was possible. I admit that I don't like the resulting code, but I do feel that you deserve some sort of award nevertheless ;-)
> 1. What major optimisations are missing from my program (e.g. in C++, I > pass vectors by reference and try to inline vector operations).
I think you have a bug -- the value that you parse out of the arguments to the program is then used as the parameter 'n' to run() (which I think is used to set the size of the generated image); whereas your C++ and OCaml codes' equivalents are used as the 'level' parameter.
With that corrected, I'm finding that your Java code runs faster than your C++ code -- by a factor of nearly two.
I'm comparing the Java running on the server JVM from Sun's JDK 1.5 (on WInXP) compared against with the C++ code compiled using all the optimisations I can think of with either M$ VC 6 or MS VS 2003 (which generate significantly different code sometimes). In my limited experience, the GNU C++ compiler does not usually produce code that is significantly faster or slower than that pair.
The server JVM (java -server <classname>) uses significantly heavier optimisation than the client JVM (java -client <classname>), although the client JVM is the default on most desktop systems.
Some numbers, measured on this 1.5 GHz laptop (with some sort of Intel chip inside...), for values of your 'level' parameter over the range 1 to 10 (i.e. from 1 to about 440K spheres)
The times are in seconds, and were generated from a single run of the program looping internally over the levels, hence the times do not include program startup.
Again, this was generated from a single run, looping internally over the values of 'level'. That might be significant since it means that the Hotspot optimiser will have had more chance to work on the problem than if the java program had been invoked separately 10 times.
Your Java code is non-idomatic in several ways, and does not really resemble how a "normal" OO Java programmer would structure this problem. Since you mention that you don't consider yourself to be a Java programmer, I won't go into details. I did, though, want to mention that I tried re-jigging the code into more a typical OO pattern, and to make more idiomatic use of Java's features (such as static final fields), the point being that it didn't make a significant difference to the performance.
> 2. Where is the floating-point machine epsilon in the Java libraries?
Are you looking for java.lang.Double.MIN_VALUE ? Or for something else ?
> 3. How do you do infix operators in Java?
You don't.
> 4. Am I supposed to have a static main function which instantiates a class > and invokes a member function of it in order to start the program?
You /have/ to start with a static main function (if you are using the standard java loading program -- which you typically will be). If you want to write reasonably OO programs, then you'll want to get some objects created and talking to each other as soon as possible, rather than writting a load of static "functions". So, yes you are.
Jon Harrop <use...@jdh30.plus.com> writes: > 1. What major optimisations are missing from my program (e.g. in C++, I pass > vectors by reference and try to inline vector operations).
I'll leave that for now (meaning: I haven't read the program yet :)
Generally, there are several tricks that allows the compiler and runtime to optimize, like defining variables and field as "final" if they don't change.
I'm sure Googleing for "java optimizing" will give about a gazillion links. I know that Peter Sestoft has a file on Java performance on <URL:http://www.dina.kvl.dk/~sestoft/javaprecisely/> which I remember as being good.
> 2. Where is the floating-point machine epsilon in the Java libraries?
I'm not sure what epsilon is to you, but you might want Double.MIN_VALUE, the smalles positive value representable as a double.
> 3. How do you do infix operators in Java?
You don't. The designers of Java decided early on that they wanted the *syntax* to always mean the same thing, making it easier to read other people's programs (in particular, they didn't want #define, and I guess overriding infix operators were ruled out for the same reason).
> 4. Am I supposed to have a static main function which instantiates a class > and invokes a member function of it in order to start the program?
The static "main" function *does* start the program. You want to call the "run" function from it, and since "run" is not static, you need to create an object before calling it. However, since "run" doesn't use that object, the "run" function could have been made static as well. Likewise the other methods on the class "ray" (classes are traditionally capitalized).
Anyway, I might have time to look at it some more later :)
> Here's my Java port:
[snip] A link to an online file would take less room here :)
Lasse Reichstein Nielsen wrote: > Jon Harrop <use...@jdh30.plus.com> writes: ... >> 2. Where is the floating-point machine epsilon in the >> Java libraries?
> I'm not sure what epsilon is to you, but you might want > Double.MIN_VALUE, the smalles positive value > representable as a double.
That is one definition of machine epsilon.
The other one I've seen is the gap between 1.0 and the smallest number greater than 1.0. For Java doubles, it is Math.ulp(1.0). In general, the gap between a Java double d and the smallest double greater than d is Math.ulp(d).
Jon Harrop <use...@jdh30.plus.com> wrote in message <news:429e5795$0$7560$ed2619ec@ptn-nntp-reader03.plus.net>... > I've written a mini ray tracer for the computer language shootout. The > original version was written in OCaml which I ported to C++:
> I have since ported the program to several other languages, including Java. > Currently, the Mlton-compiled SML, OCaml, C++ and Fortran are the fastest, > and Java trails a long way behind. I am concerned that this is because I am > a much better OCaml/C++ than Java programmer so I'm asking for advice here.
> These are my main questions:
> 1. What major optimisations are missing from my program (e.g. in C++, I pass > vectors by reference and try to inline vector operations).
Without studying your code thoroughly I would suggest you reuse Vector objects (and other objects) in your operations instead of creating a new ones. Even with the latest Hotspot JVM, creating new objects are still way slower than reusing them. For example:
Vec add(Vec a, Vec b) { return new Vec(a.x + b.x, a.y + b.y, a.z + b.z); }
becomes:
Vec add(Vec a, Vec b) { a.x += b.x; a.y += b.y; a.z += b.z; return a;
}
Only create new objects when you need to. This optimization will require some changes in the code structure and makes it slightly harder to read.
Another optimization is to replace the LinkedList with an ArrayList and use .size() and .get() when iterating the list. Iterators are slow.
Also remember to use the -server option when running the program, it can improve performance significantly.
One other thing to try is to use float instead of double (if the accuracy isn't needed). The latest Hotspot uses SSE, and this change may have some effect on SSE optimizations.
Remember that the Hotspot JVM requires a quite long warmup before compiling code, so if the program completes in a short time, you will never reach the performance of C++.
With these optimizations I think the performance will be similar to the C++ version.
> 3. How do you do infix operators in Java?
Do you mean operator overloading? Not supported in Java.
> 4. Am I supposed to have a static main function which instantiates a class > and invokes a member function of it in order to start the program?
In article <429e5795$0$7560$ed261...@ptn-nntp-reader03.plus.net>, Jon Harrop <use...@jdh30.plus.com> wrote:
> I have since ported the program to several other languages, including Java. > Currently, the Mlton-compiled SML, OCaml, C++ and Fortran are the fastest, > and Java trails a long way behind. I am concerned that this is because I am > a much better OCaml/C++ than Java programmer so I'm asking for advice here.
I strongly, strongly suggest finding a profiler. Such a beast can tell you where your CPU cycles are going, which will help you optimize them. If you are on Windows, the NetBeans folks have the JFluid profiler built in, on some builds. On the Mac, the Shark profiler is included with the developer tools. I have had great success with JProfiler, and the JetBrains folks seem to really like YourKit. (The first two are free, the second are for pay.)
As always, try to take a high level look at the profiler's results. For example, if it tells you that one arithmetic operation is the big cost, try to decide whether you can not do the computation, or change the algorithm to do that task fewer times before trying to get a ++ or -- to execute faster.
In Java, the above vague advice usually translates to looking at when and how you are creating objects, and making sure you do not have excessive synchronization overhead. Also, a few Sun classes are quite lame, and worth rewriting. (If, and only if, they are taking appreciable time...)
Then, once you know your algorithm is appropriate, tune the lines of code that matter.
Jon Harrop <use...@jdh30.plus.com> writes: > 1. What major optimisations are missing from my program (e.g. in C++, I pass > vectors by reference and try to inline vector operations).
You create a *lot* of vector objects. That means that you should make creation as fast as possible. That includes having as few fields as possible on the object.
All your nested classes are non-static. That means that they all have a reference to the enclosing *object*. That reference is a field, and it's not used, so it wastes space and therefore time.
Make all your nested classes static. I'm certain that will give you a boost.
Fiddling with the code, I see that you only ever use scaling in conjunction with addition, i.e., a+s*b. I changed addition to taking an extra scaling argument, avoiding one intermediate vector value.
> Here's my Java port:
Here's a modification of it, that is (IMHO) a little prettier and more Java-like. For a real Java program, I wouldn't have all those classes as nested classes at all, they would be in separate files.
Generally, I made everything static that could be made static, and everything final, that could be made final. The latter probably doesn't affect performance much in the long run, but it should allow the JVM to do some optimizations sooner.
I tried not to change the actual algorithm at all.