I have since ported the program to C (icc and gcc), SML (SML/NJ and MLton) and Java (J2SE and GCJ), and Simon Geard kindly ported it to Fortran 90.
I'd like to see it ported to other languages as well, particularly Haskell, and I'm keen to hear any constructive criticisms of my existing implementations.
It is interesting to note that the SML is 50% longer than the OCaml. When compiled with MLton, it is also significantly faster than the OCaml. Here are my timings (1.2GHz Athlon Thunderbird) for 128x128 resolution, 4x4 oversampling and 6 levels of spheres:
Compiler   Time     Command
Mlton       1.316s  mlton ray.sml
IFC         1.361s  ifort -O3 -u -static-libcxa -o raytracer raytracer.f90
C++         1.605s  g++-3.4 -O3 -funroll-all-loops -ffast-math ray.cpp -o ray
ocamlopt    1.932s  ocamlopt -inline 100 ray.ml -o ray
SML-NJ      2.245s  sml ray.sml
G95         3.351s  g95 -O3 -ffast-math ray.f90 -o ray
C           5.971s  gcc-3.4 -lm -std=c99 -O3 -ffast-math ray.c -o ray
Java        6.492s  javac ray.java
GCJ        20.316s  gcj-3.4 --main=ray -Wall -O3 -ffast-math ray.java -o ray
ocamlc     41.047s  ocamlc ray.ml -o ray
Note: the OCaml and C++ implementations timed here are not those on the site: they are more optimised and longer (while keeping within 100 LOC).
The compile times are also interesting:
Compiler   Compile time
ocamlc     0.099s
IFC        0.333s
SML-NJ     0.625s
C          0.655s
ocamlopt   0.714s
GCJ        0.723s
G95        0.941s
Java       1.362s
Mlton      8.267s
C++        8.676s
fun real n = Real.fromInt n

(* Mlton does not provide a "for loop" construct. *)
fun for (s, e, f) = if s=e then () else (f (real s); for (s+1, e, f))
val delta = 0.00000001 (* FIXME: Where is mach eps in the std libs? *)
val infinity = 1.0 / 0.0 (* FIXME: Where is infinity? *)
val pi = 4.0 * Real.Math.atan 1.0 (* FIXME: Where is pi? *)
(* 3D vector
   SML typically uses tuples rather than records (as in the OCaml implementation). *)
type vec = real * real * real
(* SML allows operator precedences to be declared when defining infix operators. *)
infix 2 *| fun s *| (x, y, z) = (s*x+0.0, s*y, s*z)
infix 1 +| fun (x1, y1, z1) +| (x2, y2, z2) =
  (x1+x2+0.0, y1+y2+0.0, z1+z2+0.0)
infix 1 -| fun (x1, y1, z1) -| (x2, y2, z2) =
  (x1-x2+0.0, y1-y2+0.0, z1-z2+0.0)
fun dot (x1, y1, z1) (x2, y2, z2) = x1*x2 + y1*y2 + z1*z2+0.0
fun unitise r = (1.0 / Real.Math.sqrt (dot r r)) *| r
(* Node in the scene tree *)
datatype scene = Sphere of vec * real | Group of vec * real * scene list
(* Find the first intersection of the given ray with this sphere *)
fun ray_sphere (orig, dir) center radius =
  let val v = center -| orig
      val b = dot v dir
      val disc = b * b - dot v v + radius * radius in
    if disc < 0.0 then infinity else
      let val disc = Real.Math.sqrt disc in
        let val t2 = b + disc in
          if t2 < 0.0 then infinity else
            (fn t1 => if t1 > 0.0 then t1 else t2) (b - disc)
        end
      end
  end
(* Accumulate the first intersection of the given ray with the given scene *)
(* This function is used both for primary and shadow rays. *)
fun intersect (orig, dir) scene =
  let fun of_scene (scene, (l, n)) =
        case scene of
          Sphere (center, radius) =>
            let val l' = ray_sphere (orig, dir) center radius in
              if l' >= l then (l, n) else
                (l', unitise (orig +| l' *| dir -| center))
            end
        | Group (center, radius, scenes) =>
            let val l' = ray_sphere (orig, dir) center radius in
              if l' >= l then (l, n) else foldl of_scene (l, n) scenes
            end in
    of_scene (scene, (infinity, (0.0, 0.0, 0.0)))
  end
(* Trace a single ray into the scene *)
fun ray_trace light (orig, dir) scene =
  let val (lambda, n) = intersect (orig, dir) scene in
    if lambda >= infinity then 0.0 else
      let val g = 0.0 - dot n light in
        (* If we are on the shadowed side of a sphere then don't bother casting
           a shadow ray as we know it will intersect the same sphere. *)
        if g <= 0.0 then 0.0 else
          let val orig = orig +| lambda *| dir +| delta *| n
              val dir = (0.0, 0.0, 0.0) -| light in
            let val (l, _) = intersect (orig, dir) scene in
              if l >= infinity then g else 0.0
            end
          end
      end
  end
(* Recursively build the scene tree *)
fun create level r (x, y, z) =
  let val obj = Sphere ((x, y, z), r) in
    if level = 1 then obj else
      let val r' = 3.0 * r / Real.Math.sqrt 12.0 in
        let fun aux x' z' =
              create (level-1) (0.5 * r) (x-x', y+r', z+z') in
          let val objs = [aux (~r') (~r'), aux r' (~r'),
                          aux (~r') r', aux r' r', obj] in
            Group ((x, y, z), 3.0 * r, objs)
          end
        end
      end
  end
(* Build a scene and trace many rays into it, outputting a PGM image *)
val () =
  let val level = 6 (* Number of levels of spheres *)
      val ss = 4 (* Oversampling *)
      (* Resolution *)
      val n = case CommandLine.arguments () of
                [s] => (case Int.fromString s of SOME n => n | _ => 256)
              | _ => 256
      val scene = create level 1.0 (0.0, ~1.0, 0.0) (* Scene tree *) in
    (fn s => print ("P5\n"^s^" "^s^"\n255\n")) (Int.toString n);
    for (0, n, fn y =>
      for (0, n, fn x =>
        let val g = ref 0.0 in
          for (0, ss, fn dx =>
            for (0, ss, fn dy =>
              let val n = real n
                  val x = x + dx / real ss - n / 2.0
                  val y = n - 1.0 - y + dy / real ss - n / 2.0 in
                let val eye = unitise (~1.0, ~3.0, 2.0)
                    val ray = ((0.0, 0.0, ~4.0), unitise (x, y, n)) in
                  g := !g + ray_trace eye ray scene
                end
              end));
          let val g = 0.5 + 255.0 * !g / real (ss*ss) in
            print (String.str(Char.chr(Real.trunc g)))
          end
        end))
  end
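The ray_sphere function above (and its OCaml and C++ counterparts later in the thread) is the closed-form root of a quadratic. Writing the ray as o + t d with a unit direction d (every caller passes a unitised direction), and v = c - o for a sphere with centre c and radius r, the derivation is:

```latex
\begin{aligned}
\lVert o + t\,d - c \rVert^2 &= r^2 \\
t^2 - 2t\,(v \cdot d) + v \cdot v - r^2 &= 0 \qquad (\lVert d \rVert = 1,\ v = c - o) \\
t &= b \pm \sqrt{b^2 - v \cdot v + r^2}, \qquad b = v \cdot d
\end{aligned}
```

So disc in the code is the discriminant b^2 - v.v + r^2: when it is negative the ray misses. Otherwise t2 = b + sqrt(disc) is the far root and t1 = b - sqrt(disc) the near one; a negative t2 means the whole sphere lies behind the ray origin, and when t1 <= 0 the origin is inside the sphere, so the far root t2 is returned.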
>> C++ 1.605s g++-3.4 -O3 -funroll-all-loops -ffast-math ray.cpp -o
>> ray
>> C 5.971s gcc-3.4 -lm -std=c99 -O3 -ffast-math ray.c -o ray
> What explains such big difference here?
C does a lot better on AMD64, but I believe the difference is due to the efficiency of passing vectors by inlined const reference in C++, compared with the naive approach used by the C code.
Also, the C++ code includes some optimisations that are not in the C code, because the C code already exceeded the shootout's 100 LOC limit: specifically, specialised intersection functions for shadow rays. This shaves off another 30% or so.
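The idea behind a specialised shadow-ray function is that a primary ray needs the nearest hit, so every candidate must be tested, whereas a shadow ray only needs to know whether anything is hit, so it can stop at the first one. A minimal OCaml sketch of that distinction, under stated assumptions: it uses a flat list of spheres (hypothetical (center, radius) pairs with float-triple vectors) instead of the hierarchical scene, and the names nearest and occluded are illustrative, not from the posted code.

```ocaml
(* Minimal sketch: vectors as float triples, spheres as (center, radius). *)
let sub (x1, y1, z1) (x2, y2, z2) = (x1 -. x2, y1 -. y2, z1 -. z2)
let dot (x1, y1, z1) (x2, y2, z2) = x1 *. x2 +. y1 *. y2 +. z1 *. z2

(* Parametric distance to the first intersection with one sphere, or
   infinity if there is none. Assumes [dir] is a unit vector. *)
let ray_sphere orig dir (center, radius) =
  let v = sub center orig in
  let b = dot v dir in
  let disc = b *. b -. dot v v +. radius *. radius in
  if disc < 0. then infinity
  else
    let d = sqrt disc in
    let t2 = b +. d in
    if t2 < 0. then infinity
    else
      let t1 = b -. d in
      if t1 > 0. then t1 else t2

(* Primary ray: every sphere must be tested to find the nearest hit. *)
let nearest orig dir spheres =
  List.fold_left (fun best s -> min best (ray_sphere orig dir s)) infinity spheres

(* Shadow ray: any hit means occlusion, so List.exists stops at the
   first sphere that is hit. *)
let occluded orig dir spheres =
  List.exists (fun s -> ray_sphere orig dir s < infinity) spheres
```

The posted OCaml and C++ versions get the same early exit on the sphere hierarchy through short-circuit || and an early return true, respectively.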
> Where is the code actually used for benchmarking, BTW?
Here's the OCaml:
(* The Great Computer Language Shootout
   http://shootout.alioth.debian.org/

   Contributed by Jon Harrop, 2005

   Compile: ocamlopt -inline 100 ray.ml -o ray *)
(* This implementation differs from the original in several ways:
Uses an implicit scene, generated as a ray is traced rather than being precalculated and stored explicitly in a tree.
Specialized shadow-ray intersection functions. *)
let delta = sqrt epsilon_float and pi = 4. *. atan 1.
(* 3D vector and associated functions *)
type vec = {x:float; y:float; z:float}
let ( *| ) s r = {x = s *. r.x; y = s *. r.y; z = s *. r.z}
let ( +| ) a b = {x = a.x +. b.x; y = a.y +. b.y; z = a.z +. b.z}
let ( -| ) a b = {x = a.x -. b.x; y = a.y -. b.y; z = a.z -. b.z}
let dot a b = a.x *. b.x +. a.y *. b.y +. a.z *. b.z
let unitise r = (1. /. sqrt (dot r r)) *| r
(* A semi-infinite ray starting at "orig" and with direction "dir". *)
type ray = { orig: vec; dir: vec }
(* Calculate the parametric intersection of the given ray with the given
   sphere. *)
let ray_sphere orig dir center radius =
  let v = center -| orig in
  let b = dot v dir in
  let disc = b *. b -. dot v v +. radius *. radius in
  if disc < 0. then infinity else
    let disc = sqrt disc in
    (fun t2 ->
       if t2 < 0. then infinity else
         ((fun t1 -> if t1 > 0. then t1 else t2) (b -. disc)))
      (b +. disc)
(* Calculate whether or not the given ray intersects the given sphere. *)
let ray_sphere' orig dir center radius =
  let v = center -| orig in
  let b = dot v dir in
  let disc = b *. b -. dot v v +. radius *. radius in
  if disc < 0. then false else b +. sqrt disc >= 0.
(* Ratio of the radii of one level of spheres to the next. *)
let s = 6. /. sqrt 12.
(* Find the first intersection point of the given ray with the scene. *)
let intersect level orig dir =
  let rec of_scene center radius lambda normal level =
    if level = 1 then
      let lambda' = ray_sphere orig dir center radius in
      if lambda' >= lambda then lambda, normal else
        lambda', unitise (orig +| lambda' *| dir -| center)
    else if ray_sphere orig dir center (3. *. radius) >= lambda then
      lambda, normal
    else
      let accu = of_scene center radius lambda normal 1 in
      let r = 0.5 *. radius and l = level - 1 in
      let r' = s *. r in
      let aux dx dz (lambda, normal) =
        of_scene (center +| {x=dx; y=r'; z=dz}) r lambda normal l in
      let mr' = -.r' in
      aux r' mr' (aux r' r' (aux mr' r' (aux mr' mr' accu))) in
  of_scene {x=0.; y= -1.; z=0.} 1. infinity {x=0.; y=0.; z=0.} level
(* Find if the given ray intersects the scene. *)
let intersect' level orig dir =
  let rec of_scene center radius level =
    if level = 1 then ray_sphere' orig dir center radius else
      (* Exploit short-circuit evaluation of boolean comparisons to
         terminate this function early. *)
      ray_sphere' orig dir center (3. *. radius) &&
      (of_scene center radius 1 ||
       let r = 0.5 *. radius and l = level - 1 in
       let r' = s *. r in
       of_scene (center +| {x= -.r'; y=r'; z= -.r'}) r l ||
       of_scene (center +| {x= r'; y=r'; z= -.r'}) r l ||
       of_scene (center +| {x= -.r'; y=r'; z= r'}) r l ||
       of_scene (center +| {x= r'; y=r'; z= r'}) r l) in
  of_scene {x=0.; y= -1.; z=0.} 1. level
(* Trace a single ray by casting it into the scene and, if it intersects
   anything, casting a second ray toward the light to determine occlusion. *)
let rec ray_trace l light orig dir =
  let lambda, n = intersect l orig dir in
  if lambda = infinity then 0. else
    let g = -. dot n light in
    (* If we are on the shadowed side of a sphere then don't bother casting a
       shadow ray as we know it will intersect the same sphere. *)
    if g <= 0. then 0. else
      let p = orig +| lambda *| dir +| delta *| n in
      if intersect' l p ({x=0.; y=0.; z=0.} -| light) then 0. else g
(* Ray trace the scene at the given resolution. *)
let () =
  (* Resolution *)
  let n = match Sys.argv with [| _; l |] -> int_of_string l | _ -> 256 in
  (* Light direction *)
  let light = unitise {x= -1.; y= -3.; z=2.} in
  (* Number of levels of spheres, and oversampling. *)
  let level = 6 and ss = 4 in

  Printf.printf "P5\n%d %d\n255\n" n n;
  for y = n - 1 downto 0 do
    for x = 0 to n - 1 do
      (* Average each pixel over ss*ss separate rays. *)
      let g = ref 0. in
      for dx = 0 to ss - 1 do
        for dy = 0 to ss - 1 do
          (* Calculate the origin and direction of this ray. *)
          let orig = {x=0.; y=0.; z= -4.} in
          let dir = unitise {x = float (x - n / 2) +. float dx /. float ss;
                             y = float (y - n / 2) +. float dy /. float ss;
                             z = float n} in
          g := !g +. ray_trace level light orig dir
        done
      done;
      let g = int_of_float (0.5 +. 255. *. !g /. float (ss*ss)) in
      Printf.printf "%c" (char_of_int g)
    done
  done
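The oversampling in the main loop (average ss*ss rays per pixel, then round to an 8-bit grey level) can be factored into small helpers. This is a hedged sketch rather than part of the posted program: supersample and grey are hypothetical names, and the argument f stands in for whatever traces one ray given the sub-pixel offsets dx and dy in [0, 1).

```ocaml
(* Hypothetical helper (not in the posted program): average [f] over an
   ss x ss grid of sub-pixel offsets, as the main loop does inline. *)
let supersample ss f =
  let total = ref 0. in
  for dx = 0 to ss - 1 do
    for dy = 0 to ss - 1 do
      total := !total +. f (float dx /. float ss) (float dy /. float ss)
    done
  done;
  !total /. float (ss * ss)

(* Map an average intensity in [0, 1] to an 8-bit PGM grey level,
   rounding to nearest by adding 0.5 before truncation, as the original does. *)
let grey g = int_of_float (0.5 +. 255. *. g)

(* Usage with a stand-in shading function: a horizontal ramp. *)
let () =
  let g = supersample 4 (fun dx _dy -> dx) in
  Printf.printf "average = %g, grey level = %d\n" g (grey g)
```

Keeping the accumulation in a local ref mirrors the original loop; a fold over a list of offsets would also work but would allocate.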
Here's the C++:
// The Great Computer Language Shootout
// http://shootout.alioth.debian.org/
//
// Contributed by Jon Harrop, 2005
// Compile: g++ -Wall -O3 -ffast-math ray.cpp -o ray
// Semi-infinite ray
struct Ray {
  Vec orig, dir;
  Ray(Vec o, Vec d) : orig(o), dir(d) {}
};
// Scene tree
// In this implementation, a node in the scene tree is represented by a single
// struct which is either a group of scene trees with a spherical bound or,
// implicitly, a single sphere if the group has no children.
// This is not equivalent to the variant type used to represent a node in the
// original OCaml implementation because this representation cannot associate
// data with only leaf nodes (such as color, reflectivity etc.) but requires
// significantly less C++ code and gives room to implement a specialized
// shadow-ray intersection algorithm.
struct Scene {
  vector<Scene> child; // Child nodes in the scene tree
  Vec center;          // Center of the sphere or spherical bound
  double radius;       // Radius of the sphere or spherical bound
// Find the first intersection of the given ray with this sphere
double ray_sphere(const Ray &ray, const Scene &s) {
  Vec v = s.center - ray.orig;
  double b = dot(v, ray.dir), disc = b*b - dot(v, v) + s.radius*s.radius;
  if (disc < 0) return infinity;
  double d = sqrt(disc), t2 = b + d;
  if (t2 < 0) return infinity;
  double t1 = b - d;
  return (t1 > 0 ? t1 : t2);
}
// Accumulate the first intersection of the given ray with the given scene
// The accumulated parameter (lambda) and normal vector (normal) are passed by
// reference to avoid having to define a struct to represent the real return
// type of this function.
void intersect(double &lambda, Vec &normal, const Ray &ray, const Scene &s) {
  double l = ray_sphere(ray, s);
  // If there is no intersection with this node or if the intersection point is
  // farther than the current intersection then return as no closer
  // intersection is to be found here.
  if (l >= lambda) return;
  if (s.child.size() == 0) {
    // Intersect with a single sphere
    lambda = l;
    normal = unitise(ray.orig + l * ray.dir - s.center);
  } else
    // Intersect with a group
    for (std::vector<Scene>::const_iterator it=s.child.begin(); it!=s.child.end(); ++it)
      intersect(lambda, normal, ray, *it);
}
// Find any intersection of the given ray with the given scene
// This function is significantly faster than the above function because it can
// terminate as soon as any intersection is found.
// This function is distinguished from the above function by its arguments
// (function overloading).
bool intersect(const Ray &ray, const Scene &s) {
  if (ray_sphere(ray, s) == infinity) return false;
  if (s.child.size() == 0) return true;
  else
    for (std::vector<Scene>::const_iterator it=s.child.begin(); it != s.child.end(); ++it)
      if (intersect(ray, *it)) return true;
  return false;
}
// Trace a single ray into the scene
double ray_trace(const double weight, const Vec light, const Ray ray, const Scene &s) {
  // As the accumulator is passed to the "intersect" function by reference,
  // they cannot be given inline so they are declared as local variables here.
  double lambda = infinity;
  Vec normal(0, 0, 0);
  intersect(lambda, normal, ray, s);
  if (lambda == infinity) return 0;
  Vec o = ray.orig + lambda * ray.dir + delta * normal;
  double g = -dot(normal, light);
  // If we are on the shadowed side of a sphere then don't bother casting a
  // shadow ray
...
> It is interesting to note that the SML is 50% longer than the OCaml.
It may be 50% longer than the original 66-line OCaml program, but it seems to be exactly the same size as the optimized OCaml program you posted.
> Here is the SML (for Mlton):
Just a couple of SML notes:
> fun real n = Real.fromInt n
A function "real" (of identical semantics) is available in the top-level environment of the Basis Library, so there is no need to define your own version.
> val delta = 0.00000001 (* FIXME: Where is mach eps in the std libs? *)
The Standard ML Basis Library does have Real.minNormalPos, which corresponds to OCaml's min_float. You can compute OCaml's epsilon_float with:

  val epsilon = Real.nextAfter (1.0, 2.0) - 1.0
> val infinity = 1.0 / 0.0 (* FIXME: Where is infinity? *)
val infinity = Real.posInf
> val pi = 4.0 * Real.Math.atan 1.0 (* FIXME: Where is pi? *)
val pi = Real.Math.pi
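For comparison, OCaml exposes these values directly, and the nextAfter trick has an OCaml analogue in Float.succ. A small sketch, assuming OCaml 4.08 or later (for Float.succ); the binding names are illustrative. It also shows that the OCaml shootout program's delta = sqrt epsilon_float is exactly 2^-26, about 1.49e-8, close to the 0.00000001 hard-coded in the SML.

```ocaml
(* OCaml counterparts of the SML Basis values discussed above.
   Assumes OCaml 4.08 or later for Float.succ. *)

(* Machine epsilon as the gap between 1.0 and the next float up,
   mirroring Real.nextAfter (1.0, 2.0) - 1.0 in SML. *)
let epsilon = Float.succ 1.0 -. 1.0

(* The OCaml shootout program offsets shadow-ray origins by this amount. *)
let delta = sqrt epsilon_float

(* Infinity and pi are predefined as Stdlib.infinity and Float.pi,
   the analogues of Real.posInf and Real.Math.pi. *)
let pi = Float.pi

let () =
  Printf.printf "epsilon = %g, delta = %g, pi = %g, infinity = %g\n"
    epsilon delta pi infinity
```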
> (* 3D vector
> SML typically uses tuples rather than records (as in the OCaml implementation). *)
> type vec = real * real * real
I don't know if it is typical, but using records instead would have no impact on the performance of the code compiled under MLton (and presumably under other SML compilers).
> (* SML allows operator precedences to be declared when defining infix operators. *)
> infix 2 *| fun s *| (x, y, z) = (s*x+0.0, s*y, s*z)
> infix 1 +| fun (x1, y1, z1) +| (x2, y2, z2) =
> (x1+x2+0.0, y1+y2+0.0, z1+z2+0.0)
> infix 1 -| fun (x1, y1, z1) -| (x2, y2, z2) =
> (x1-x2+0.0, y1-y2+0.0, z1-z2+0.0)
> fun dot (x1, y1, z1) (x2, y2, z2) = x1*x2 + y1*y2 + z1*z2+0.0
Rather than using "+0.0" to force the overloaded operators to resolve to Real.real, you could use a simple type annotation on the functions:
fun s *| (x, y, z) : vec = (s*x, s*y, s*z)
MLton doesn't actually perform constant folding on floating point operations, so you are paying for some extra operations in the final executable.
Matthew Fluet <mfl...@acm.org> writes:
>> infix 2 *| fun s *| (x, y, z) = (s*x+0.0, s*y, s*z)
> fun s *| (x, y, z) : vec = (s*x, s*y, s*z)
> MLton doesn't actually perform constant folding on floating point
> operations, so you are paying for some extra operations in the final
> executable.
Actually x+0.0 is not the same as x when x is -0.0 (according to IEEE floating point semantics - I don't know how SML treats it).
>>> infix 2 *| fun s *| (x, y, z) = (s*x+0.0, s*y, s*z)
>> fun s *| (x, y, z) : vec = (s*x, s*y, s*z)
>> MLton doesn't actually perform constant folding on floating point
>> operations, so you are paying for some extra operations in the final
>> executable.
> Actually x+0.0 is not the same as x when x is -0.0 (according to IEEE
> floating point semantics - I don't know how SML treats it).
Right. And that's why it's a bad idea to use +0.0 just for the sake of indicating a type constraint.
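The IEEE point is easy to check. Under the default round-to-nearest mode, (-0.0) + (+0.0) is +0.0, so x + 0.0 turns a negative zero into a positive one, and the difference shows up as soon as something divides by it. A small OCaml sketch, relying on OCaml's floats following IEEE 754 here; it says nothing about how any particular SML implementation behaves.

```ocaml
(* IEEE 754 signed zeros: -0.0 and +0.0 compare equal,
   but 1/x tells them apart. *)
let neg_zero = Float.neg 0.0   (* -0.0 *)
let shifted = neg_zero +. 0.0  (* round-to-nearest gives +0.0 *)

let () =
  Printf.printf "equal under (=): %b\n" (neg_zero = shifted);
  Printf.printf "1 / -0.0 is -infinity: %b\n" (1. /. neg_zero = neg_infinity);
  Printf.printf "1 / (-0.0 + 0.0) is +infinity: %b\n" (1. /. shifted = infinity)
```

This is why +0.0 is not a free type hint: a compiler cannot fold x + 0.0 away without changing results for negative zero.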
> I'm keen to hear any constructive criticisms of my existing
> implementations.
[...]
> let val r' = 3.0 * r / Real.Math.sqrt 12.0 in
> let fun aux x' z' =
> create (level-1) (0.5 * r) (x-x', y+r', z+z') in
> let val objs = [aux (~r') (~r'), aux r' (~r'),
> aux (~r') r', aux r' r', obj] in
> Group ((x, y, z), 3.0 * r, objs)
> end end end end
[...]
This won't have an effect on performance, but the above style of let-nesting (used throughout the code), which is probably a result of a conservative translation from OCaml, is not necessary in SML. In SML, you can flatten the nested let-expressions. Specifically, you can flatten an expression of the form
let val ... in let val ... in ... let val in ... end ... end end