I have since ported the program to several other languages, including Java. Currently, the Mlton-compiled SML, OCaml, C++ and Fortran are the fastest, and Java trails a long way behind. I am concerned that this is because I am a much better OCaml/C++ than Java programmer so I'm asking for advice here.
These are my main questions:
1. What major optimisations are missing from my program (e.g. in C++, I pass vectors by reference and try to inline vector operations).
2. Where is the floating-point machine epsilon in the Java libraries?
3. How do you do infix operators in Java?
4. Am I supposed to have a static main function which instantiates a class and invokes a member function of it in order to start the program?
Here's my Java port:
// The Great Computer Language Shootout // http://shootout.alioth.debian.org/ // Contributed by Jon Harrop, 2005 // Compile: javac ray.java
public final class ray { // FIXME: Where is the floating point machine epsilon available in the // Java standard library? double delta = 1.49012e-08, infinity = Float.POSITIVE_INFINITY, pi = Math.PI;
// A 3D vector class Vec { public double x, y, z; public Vec(double x2, double y2, double z2) { x = x2; y = y2; z = z2; } }
Vec zero = new Vec(0, 0, 0);
// Vector arithmetic (FIXME: Use infix operators if possible) Vec add(Vec a, Vec b) { return new Vec(a.x + b.x, a.y + b.y, a.z + b.z); } Vec sub(Vec a, Vec b) { return new Vec(a.x - b.x, a.y - b.y, a.z - b.z); } Vec scale(double s, Vec a) { return new Vec(s*a.x, s*a.y, s*a.z); } double dot(Vec a, Vec b) { return a.x*b.x + a.y*b.y + a.z*b.z; } Vec unitise(Vec a) { return scale(1 / Math.sqrt(dot(a, a)), a); }
// A semi-infinite ray class Ray { public Vec orig, dir; public Ray(Vec o, Vec d) { orig = o; dir = d; } }
// A parametric intersection point and the normal vector at the point of // intersection class Intersect { public double lambda; public Vec normal; public Intersect(double l, Vec n) { lambda = l; normal = n; } }
// Abstract base class representing a node in the scene tree abstract class Scene { abstract public Intersect intersect(Intersect i, Ray ray); }
// Derived class representing a leaf node in the scene tree class Sphere extends Scene { public Vec center; public double radius;
public Sphere(Vec c, double r) { center = c; radius = r; }
// Find the first intersection of the given ray with this sphere public double ray_sphere(Ray ray) { Vec v = sub(center, ray.orig); double b = dot(v, ray.dir), disc = b*b - dot(v, v) + radius * radius; if (disc < 0) return infinity; double d = Math.sqrt(disc), t2 = b + d; if (t2 < 0) return infinity; double t1 = b - d; return (t1 > 0 ? t1 : t2); }
// Accumulate the first intersection of the given ray with this sphere public Intersect intersect(Intersect i, Ray ray) { double l = ray_sphere(ray); if (l >= i.lambda) return i; Vec n = add(ray.orig, sub(scale(l, ray.dir), center)); return new Intersect(l, unitise(n)); } }
// Derived class representing a non-leaf node in the scene tree class Group extends Scene { public Sphere bound; public LinkedList objs;
public Group(Sphere b) { bound = b; // We must initialise to an empty linked list or the null object // exception will be thrown at first use. objs = new LinkedList(); }
// Accumulate the first intersection of the given ray with this group // This function is used both for primary and shadow rays. public Intersect intersect(Intersect i, Ray ray) { double l = bound.ray_sphere(ray); if (l >= i.lambda) return i; // Loop over the list of child nodes, accumulating the result. ListIterator it = objs.listIterator(0); while (it.hasNext()) { Scene scene = (Scene)it.next(); i = scene.intersect(i, ray); } return i; } }
// Trace a single ray into the scene double ray_trace(Vec light, Ray ray, Scene scene) { Intersect i = scene.intersect(new Intersect(infinity, new Vec(0, 0, 0)), ray); if (i.lambda == infinity) return 0; Vec o = add(ray.orig, add(scale(i.lambda, ray.dir), scale(delta, i.normal))); double g = -dot(i.normal, light); // If we are on the shadowed side of a sphere then don't bother casting a // shadow ray as we know it will intersect the same sphere. if (g <= 0) return 0.; Ray sray = new Ray(o, sub(new Vec(0, 0, 0), light)); Intersect si = scene.intersect(new Intersect(infinity, i.normal), sray); return (si.lambda == infinity ? g : 0); }
// Recursively build the scene tree Scene create(int level, double r, Vec c) { Sphere sphere = new Sphere(c, r); if (level == 1) return sphere; Group group = new Group(new Sphere(c, 3*r)); group.objs.addFirst(sphere); double rn = 3*r/Math.sqrt(12); for (int dz=-1; dz<=1; dz+=2) for (int dx=-1; dx<=1; dx+=2) { Vec c2 = new Vec(c.x-dx*rn, c.y+rn, c.z-dz*rn); group.objs.addFirst(create(level-1, r/2, c2)); } return group; }
// Build a scene and trace many rays into it, outputting a PGM image void run(int n, int level, int ss) { // Scene tree Scene scene = create(level, 1, new Vec(0, -1, 0));
System.out.print("P5\n"+n+" "+n+"\n255\n"); for (int y=n-1; y>=0; --y) for (int x=0; x<n; ++x) { double g=0; for (int dx=0; dx<ss; ++dx) for (int dy=0; dy<ss; ++dy) { // We use "dx*1." instead of "double(dx)" to save space Vec d = new Vec(x+dx*1./ss-n/2., y+dy*1./ss-n/2., n); Ray ray = new Ray(new Vec(0, 0, -4), unitise(d)); g += ray_trace(unitise(new Vec(-1, -3, 2)), ray, scene); } System.out.print((char)(.5+255*g/(ss*ss))); } }
public static void main(String[] args) { (new ray()).run(Integer.parseInt(args[0]), 6, 4); }
> I have since ported the program to several other languages, including Java. > Currently, the Mlton-compiled SML, OCaml, C++ and Fortran are the fastest, > and Java trails a long way behind. I am concerned that this is because I am > a much better OCaml/C++ than Java programmer so I'm asking for advice here.
The first run of this code is done in interprete-mode. If you run it a few times, things get faster. Here is my benchmark (with n=32)
Jon Harrop wrote: > I've written a mini ray tracer for the computer language shootout. The > original version was written in OCaml which I ported to C++:
You've managed to squeeze down the C++ to the point where it's almost as short as the OCaml source ! I wouldn't have thought that was possible. I admit that I don't like the resulting code, but I do feel that you deserve some sort of award nevertheless ;-)
> 1. What major optimisations are missing from my program (e.g. in C++, I > pass vectors by reference and try to inline vector operations).
I think you have a bug -- the value that you parse out of the arguments to the program is then used as the parameter 'n' to run() (which I think is used to set the size of the generated image); whereas your C++ and OCaml codes' equivalents are used as the 'level' parameter.
With that corrected, I'm finding that your Java code runs faster than your C++ code -- by a factor of nearly two.
I'm comparing the Java running on the server JVM from Sun's JDK 1.5 (on WInXP) compared against with the C++ code compiled using all the optimisations I can think of with either M$ VC 6 or MS VS 2003 (which generate significantly different code sometimes). In my limited experience, the GNU C++ compiler does not usually produce code that is significantly faster or slower than that pair.
The server JVM (java -server <classname>) uses significantly heavier optimisation than the client JVM (java -client <classname>), although the client JVM is the default on most desktop systems.
Some numbers, measured on this 1.5 GHz laptop (with some sort of Intel chip inside...), for values of your 'level' parameter over the range 1 to 10 (i.e. from 1 to about 440K spheres)
The times are in seconds, and were generated from a single run of the program looping internally over the levels, hence the times do not include program startup.
Again, this was generated from a single run, looping internally over the values of 'level'. That might be significant since it means that the Hotspot optimiser will have had more chance to work on the problem than if the java program had been invoked separately 10 times.
Your Java code is non-idomatic in several ways, and does not really resemble how a "normal" OO Java programmer would structure this problem. Since you mention that you don't consider yourself to be a Java programmer, I won't go into details. I did, though, want to mention that I tried re-jigging the code into more a typical OO pattern, and to make more idiomatic use of Java's features (such as static final fields), the point being that it didn't make a significant difference to the performance.
> 2. Where is the floating-point machine epsilon in the Java libraries?
Are you looking for java.lang.Double.MIN_VALUE ? Or for something else ?
> 3. How do you do infix operators in Java?
You don't.
> 4. Am I supposed to have a static main function which instantiates a class > and invokes a member function of it in order to start the program?
You /have/ to start with a static main function (if you are using the standard java loading program -- which you typically will be). If you want to write reasonably OO programs, then you'll want to get some objects created and talking to each other as soon as possible, rather than writting a load of static "functions". So, yes you are.
Jon Harrop <use...@jdh30.plus.com> writes: > 1. What major optimisations are missing from my program (e.g. in C++, I pass > vectors by reference and try to inline vector operations).
I'll leave that for now (meaning: I haven't read the program yet :)
Generally, there are several tricks that allows the compiler and runtime to optimize, like defining variables and field as "final" if they don't change.
I'm sure Googleing for "java optimizing" will give about a gazillion links. I know that Peter Sestoft has a file on Java performance on <URL:http://www.dina.kvl.dk/~sestoft/javaprecisely/> which I remember as being good.
> 2. Where is the floating-point machine epsilon in the Java libraries?
I'm not sure what epsilon is to you, but you might want Double.MIN_VALUE, the smalles positive value representable as a double.
> 3. How do you do infix operators in Java?
You don't. The designers of Java decided early on that they wanted the *syntax* to always mean the same thing, making it easier to read other people's programs (in particular, they didn't want #define, and I guess overriding infix operators were ruled out for the same reason).
> 4. Am I supposed to have a static main function which instantiates a class > and invokes a member function of it in order to start the program?
The static "main" function *does* start the program. You want to call the "run" function from it, and since "run" is not static, you need to create an object before calling it. However, since "run" doesn't use that object, the "run" function could have been made static as well. Likewise the other methods on the class "ray" (classes are traditionally capitalized).
Anyway, I might have time to look at it some more later :)
> Here's my Java port:
[snip] A link to an online file would take less room here :)
Lasse Reichstein Nielsen wrote: > Jon Harrop <use...@jdh30.plus.com> writes: ... >> 2. Where is the floating-point machine epsilon in the >> Java libraries?
> I'm not sure what epsilon is to you, but you might want > Double.MIN_VALUE, the smalles positive value > representable as a double.
That is one definition of machine epsilon.
The other one I've seen is the gap between 1.0 and the smallest number greater than 1.0. For Java doubles, it is Math.ulp(1.0). In general, the gap between a Java double d and the smallest double greater than d is Math.ulp(d).
Jon Harrop <use...@jdh30.plus.com> wrote in message <news:429e5795$0$7560$ed2619ec@ptn-nntp-reader03.plus.net>... > I've written a mini ray tracer for the computer language shootout. The > original version was written in OCaml which I ported to C++:
> I have since ported the program to several other languages, including Java. > Currently, the Mlton-compiled SML, OCaml, C++ and Fortran are the fastest, > and Java trails a long way behind. I am concerned that this is because I am > a much better OCaml/C++ than Java programmer so I'm asking for advice here.
> These are my main questions:
> 1. What major optimisations are missing from my program (e.g. in C++, I pass > vectors by reference and try to inline vector operations).
Without studying your code thoroughly I would suggest you reuse Vector objects (and other objects) in your operations instead of creating a new ones. Even with the latest Hotspot JVM, creating new objects are still way slower than reusing them. For example:
Vec add(Vec a, Vec b) { return new Vec(a.x + b.x, a.y + b.y, a.z + b.z); }
becomes:
Vec add(Vec a, Vec b) { a.x += b.x; a.y += b.y; a.z += b.z; return a;
}
Only create new objects when you need to. This optimization will require some changes in the code structure and makes it slightly harder to read.
Another optimization is to replace the LinkedList with an ArrayList and use .size() and .get() when iterating the list. Iterators are slow.
Also remember to use the -server option when running the program, it can improve performance significantly.
One other thing to try is to use float instead of double (if the accuracy isn't needed). The latest Hotspot uses SSE, and this change may have some effect on SSE optimizations.
Remember that the Hotspot JVM requires a quite long warmup before compiling code, so if the program completes in a short time, you will never reach the performance of C++.
With these optimizations I think the performance will be similar to the C++ version.
> 3. How do you do infix operators in Java?
Do you mean operator overloading? Not supported in Java.
> 4. Am I supposed to have a static main function which instantiates a class > and invokes a member function of it in order to start the program?
In article <429e5795$0$7560$ed261...@ptn-nntp-reader03.plus.net>, Jon Harrop <use...@jdh30.plus.com> wrote:
> I have since ported the program to several other languages, including Java. > Currently, the Mlton-compiled SML, OCaml, C++ and Fortran are the fastest, > and Java trails a long way behind. I am concerned that this is because I am > a much better OCaml/C++ than Java programmer so I'm asking for advice here.
I strongly, strongly suggest finding a profiler. Such a beast can tell you where your CPU cycles are going, which will help you optimize them. If you are on Windows, the NetBeans folks have the JFluid profiler built in, on some builds. On the Mac, the Shark profiler is included with the developer tools. I have had great success with JProfiler, and the JetBrains folks seem to really like YourKit. (The first two are free, the second are for pay.)
As always, try to take a high level look at the profiler's results. For example, if it tells you that one arithmetic operation is the big cost, try to decide whether you can not do the computation, or change the algorithm to do that task fewer times before trying to get a ++ or -- to execute faster.
In Java, the above vague advice usually translates to looking at when and how you are creating objects, and making sure you do not have excessive synchronization overhead. Also, a few Sun classes are quite lame, and worth rewriting. (If, and only if, they are taking appreciable time...)
Then, once you know your algorithm is appropriate, tune the lines of code that matter.
Jon Harrop <use...@jdh30.plus.com> writes: > 1. What major optimisations are missing from my program (e.g. in C++, I pass > vectors by reference and try to inline vector operations).
You create a *lot* of vector objects. That means that you should make creation as fast as possible. That includes having as few fields as possible on the object.
All your nested classes are non-static. That means that they all have a reference to the enclosing *object*. That reference is a field, and it's not used, so it wastes space and therefore time.
Make all your nested classes static. I'm certain that will give you a boost.
Fiddling with the code, I see that you only ever use scaling in conjunction with addition, i.e., a+s*b. I changed addition to taking an extra scaling argument, avoiding one intermediate vector value.
> Here's my Java port:
Here's a modification of it, that is (IMHO) a little prettier and more Java-like. For a real Java program, I wouldn't have all those classes as nested classes at all, they would be in separate files.
Generally, I made everything static that could be made static, and everything final, that could be made final. The latter probably doesn't affect performance much in the long run, but it should allow the JVM to do some optimizations sooner.
I tried not to change the actual algorithm at all.
Patricia Shanahan wrote: > The other one I've seen is the gap between 1.0 and the > smallest number greater than 1.0. For Java doubles, it is > Math.ulp(1.0). In general, the gap between a Java double d > and the smallest double greater than d is Math.ulp(d).
This is the epsilon that I'm after. However, neither gcj (3.4) nor Sun J2SE (1.4) seem to implement it?!
Google turns up only 5 pages, one is a discussion of floating point under Java (which you seem to have posted to!), another implies that ulp is not implemented in libgcj and the rest are useless.
Chris Uppal wrote: > You've managed to squeeze down the C++ to the point where it's almost as > short > as the OCaml source ! I wouldn't have thought that was possible. I admit > that I don't like the resulting code, but I do feel that you deserve some > sort of award nevertheless ;-)
Thanks. :-)
To be honest, it isn't that much different to my usual C++ coding style. I used to space things out a lot more when using templates, but for vector arithmetic I find one-liners better. The OCaml code is certainly not that much different from usual style.
> I think you have a bug -- the value that you parse out of the arguments to > the program is then used as the parameter 'n' to run() (which I think is > used to set the size of the generated image); whereas your C++ and OCaml > codes' equivalents are used as the 'level' parameter.
I should explain that there are a few variants of these programs. The first one is a 222-line OCaml program which is substantially fancier (incremental OpenGL rendering):
which takes about 8s to run for n=256 and level=6 (your measurement was 50s on a 50%-faster computer). I have better optimised code which only takes 6s.
I'd also like to do language comparisons on my AMD64. Is J2SE available for AMD64?
>> 2. Where is the floating-point machine epsilon in the Java libraries?
> Are you looking for java.lang.Double.MIN_VALUE ? Or for something else ?
I'm looking for the smallest positive double which when added to one does not give one. Other people have said Math.ulp(1.0) but this fails on both GCJ and Sun J2SE. I'd be amazed if the floating point epsilon were not available in such a popular language...
Incidentally, my "delta" is the sqrt of epsilon (~4e-8).
> > Another optimization is to replace the LinkedList with an ArrayList > > and use .size() and .get() when iterating the list. Iterators are > > slow.
> You've worried me; my company has a java App > that uses Collections HEAVILY, and > (for convenience) we use Iterators exclusively.
I wouldn't worry to much unless you have really performance critical code (like a tight inner loop in a game or benchmark test). The iterators probably takes a negligible amount of your total CPU time.
> Would you care to quantify the speed difference?
I wrote a little benchmark (see code below) to compare iteration times using array access, ArrayList.get() and Iterator. Here's my results on an Athlon 2GHz using Java 1.5.0 -server:
Array: 1990 ms ArrayList.get(): 2021 ms ArrayList.iterator(): 2979 ms
So, it seems ArrayList.get() is about as fast as normal array access, and using an Iterator is about 50% slower.
>>The other one I've seen is the gap between 1.0 and the >>smallest number greater than 1.0. For Java doubles, it is >>Math.ulp(1.0). In general, the gap between a Java double d >>and the smallest double greater than d is Math.ulp(d).
> This is the epsilon that I'm after. However, neither gcj (3.4) nor Sun J2SE > (1.4) seem to implement it?!
Sorry, it's a "Since 1.5", so you would need a really up to date JDK.
Meanwhile, here is a way to calculate it directly, depending only on very well established features:
public class Epsilon { public static void main(String[] args) { System.out.println(Math.ulp(1.0)); System.out.println(Math.sqrt(Math.ulp(1.0))); double epsilon = Double.longBitsToDouble( Double.doubleToLongBits(1.0)+1)-1; System.out.println(epsilon); }
Jon Harrop wrote: > I'm looking for the smallest positive double which when > added to one does not give one. Other people have said > Math.ulp(1.0) but this fails on both GCJ and Sun J2SE. > I'd be amazed if the floating point epsilon were not > available in such a popular language...
> Incidentally, my "delta" is the sqrt of epsilon (~4e-8).
That seems rather large. If sqrt(epsilon) is ~4e-8 then epsilon is ~1.6e-15.
Most systems, including all Java implementations, use IEEE 754 64 bit binary floating point for double. IEEE 64 bit has a 52 bit stored mantissa, so the least significant bit has value 2^-52 times the value of the unstored most signficant bit. If the most significant bit has value 1, then the least significant bit has value 2^-52, which matches Math.ulp(1.0), about 2.2E-16.
[If that paragraph does not make sense to you, I can rewrite it with more background. I have no idea how much you know about floating point formats.]
Your epsilon seems to be about 7 times as big as it should be???
googmeis...@gmail.com wrote: >> I'm looking for the smallest positive double which when >> added to one does not give one.
> It's a property of the IEEE 754 binary floating point > standard. The correct answer is 2^(-53) + 2^(-105).
Jon agreed, elsewhere in this thread, with my suggestion of the gap between 1.0 and the smallest number greater than 1.0. That is 2^-52, the value of Math.ulp(1.0).
Googmeister's answer is slightly over half the gap between 1.0 and the smallest double greater than 1.0, so that adding it to 1.0 will round up. It is correct for the definition given above.
I've taken another look at it, and I now agree with the definition above, and Googmeister's answer, but the really important thing is how is this going to be used? What assumptions does the program make about the meaning of epsilon?
Jon Harrop wrote: > To be honest, it isn't that much different to my usual C++ coding style. I > used to space things out a lot more when using templates, but for vector > arithmetic I find one-liners better. The OCaml code is certainly not that > much different from usual style.
I can understand that if you are essentially thinking in terms of the equations of the underling geometry, then the sheer sprawl of the usual C-family layout(s) would feel unnatural. I can't help adding, though, that it makes it virually unreadable for anyone else -- or at least /I/ find it very difficult to read.
Which, btw, is one of the main reasons why I doubt if I'll follow up:
> I think you must be using the totally unoptimised C++ as Java is 6x slower > on my computer. The shootout has slightly better code:
There are articles both from Sun and IBM, stating that is NOT favorable to pool or cache objects. The newer garbage collectors have a generational mechanism that causes new objects to get reclaimed quicker.
I've had a fairly complex heavy image processor with many cached image blocks. Processing would take around 30 minutes and had a lot of pooling, where I check out buffered images and return them to the pool. When I took out the pooling and created new ones, the performance didn't get reduced one bit.
On a side note, the "language shootout" web page measures Java WRONG. They don't do their tests properly. Often Java runs actually faster than C, especially for number crunching.
I've gotten in many arguments in the past where people were not believing me, but I won on every occasion.
> There are articles both from Sun and IBM, stating that is NOT favorable > to pool or cache objects. The newer garbage collectors have a > generational mechanism that causes new objects to get reclaimed > quicker.
That's not universally true. For instance, if object creation/destruction is expensive. Database connections are an example.
-- Lee Fesperman, FFE Software, Inc. (http://www.firstsql.com) ============================================================== * The Ultimate DBMS is here! * FirstSQL/J Object/Relational DBMS (http://www.firstsql.com)
Lee Fesperman <first...@ix.netcom.com> wrote: > That's not universally true. For instance, if object creation > /destruction is expensive. Database connections are an example.
Right. Specifically, it's most true under the following conditions:
1. Memory management effort dominates the object's lifecycle management. 2. The object is small enough to be allocated in the nursery. 3. The object is short-lived enough not to be promoted from the nursery.
So, for example, keeping one 128K buffer may also improve performance over re-allocating it every time you need it. The real point Sun ought to be making is not that pooling is bad, but that software developers are, in general, not very good at predicting what is good or bad for performance. Because of that, writing an object pool without having some clear and objective profiling cases in place to measure the resulting change in performance is not a good idea.
Basically, keep code simple, then optimize with liberal use of objective data and not intuitive ideas about performance.
(That said, I don't think most people would include database connections in their general idea of instance pooling. Connection pooling is a different matter altogether, and in that case it probably is worth designing for pre-emptively just on the basis that it's widely recognized as good practice and provided for in the JDBC API.)
HomerCritic wrote: > On a side note, the "language shootout" web page measures Java WRONG. > They don't do their tests properly. > Often Java runs actually faster than C, especially for number > crunching.
Can you elaborate on this? All of my tests indicate that Java is many times slower than most other modern languages, even stereotypically slow languages like SML and OCaml.
Jon Harrop <use...@jdh30.plus.com> wrote: > Can you elaborate on this? All of my tests indicate that Java is many times > slower than most other modern languages, even stereotypically slow > languages like SML and OCaml.
You're making statements that are far too broad to be useful. Java is far slower doing what? If you're measuring only startup time (before 'main' even starts) then you'll find Java is literally tens of thousands of times slower than many other languages. That's why Java is rarely or never used for interactive applications that run for extremely short periods of time, such as typical command line utilities.
Most computational tasks tend to run within about 10% to 20% of optimal time for the hardware platform, and compare about evenly with modern compiled languages. There are specific tasks for which significant general performance differences may be measured (floating point calculations being one, but I can't recall whether Java is typically faster or slower here, and it may depend on the platform), but this wouldn't be described as "many times slower".
I strongly suspect that the issue is startup time. If you're measuring it with your ray tracer, are you measuring externally so as to include the startup time, or internally once the VM has started? There's no "right" way to do this; it depends on whether your target audience will care about startup time, or only how the appo performs once it's running. However, if you're including startup time, you ought to at least add a note that this is the reason for the results. The great majority of applications run for orders of magnitude longer than the average benchmark, and just saying that Java is slower is very misleading.