I'm trying to put together a list of various methods to do code generate for switch statements.
This is what I have so far: * Successive if/else branches * Jump tables * branch tables with linear and binary scans * Clustering tables (e.g., if you have cases 0-100 and 300-400).
Are there any other methods worth mentioning?
-- Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth [I think I've seen hashing into branch tables. -John]
Joshua Cranmer <Pidgeo...@verizon.invalid> wrote: > I'm trying to put together a list of various methods to do code generate > for switch statements. > This is what I have so far: > * Successive if/else branches > * Jump tables > * branch tables with linear and binary scans > * Clustering tables (e.g., if you have cases 0-100 and 300-400).
The two I have seen done on S/370 are an indexed branch to a table of branch instructions, and indexed load of an address to a register, and then BR to that address. I presume those are two of the above. (IBM programs commonly have return codes that are multiples of four, presumably to make the indexing easier.)
> Are there any other methods worth mentioning?
VAX has the CASE instruction, most likely a microcode implementation of one of the above. When VAX was new, there were stories that many of the special instructions (such as POLY and INDEX) were slower than doing the same operation using separate instructions.
I believe that JVM also has a special instruction for this operation, as the above likely don't work there.
> [I think I've seen hashing into branch tables. -John]
Sounds better than branching into hash tables.
-- glen [On machines with condition codes, I've seen binary searches expanded into code with compare and branch instructions; after the comparison you can do a three way branch on less, equal, or greater. -John]
On 2009-11-03, Joshua Cranmer <Pidgeo...@verizon.invalid> wrote:
> I'm trying to put together a list of various methods to do code generate > for switch statements.
> This is what I have so far: > * Successive if/else branches > * Jump tables > * branch tables with linear and binary scans > * Clustering tables (e.g., if you have cases 0-100 and 300-400).
> Are there any other methods worth mentioning?
if/else arranged in a binary search tree, rather than linear succession. Analogous to your branch table with binary scan, but using code rather than a table to reduce the comparions for N labels to log N.
E.g. cases are:
10 50 90 1000 1200 3000 4000 7500 12345
if (x < 3000) { if (x < 90) { if (x == 10) { ... } else if (x == 50) { ... } } else { if (x == 90) { ... } else if (x == 1000) { ... } else if (x == 1200) { ... } } } else { /* similarly for 3000-12345 range */ }
This can combine with clustering. Binary search to classify an input value into cluster ranges, then check the range and use the table for that range, or default out.
"glen herrmannsfeldt" <g...@ugcs.caltech.edu> wrote in message > Joshua Cranmer <Pidgeo...@verizon.invalid> wrote: >> I'm trying to put together a list of various methods to do code generate >> for switch statements.
<snip>
> I believe that JVM also has a special instruction for this > operation, as the above likely don't work there.
I think both the JVM and MSIL/CIL have special instructions for jump tables...
>> [I think I've seen hashing into branch tables. -John]
> Sounds better than branching into hash tables.
> -- glen > [On machines with condition codes, I've seen binary searches expanded > into code with compare and branch instructions; after the comparison > you can do a three way branch on less, equal, or greater. -John]
yeah...
in my compiler (on x86 and x64) I ended up using condition codes to implement a trinary jump of this sort.
I had intended this for implementing slightly faster switches (errm... because my compiler does not have jump tables... lame, I know, just never got around to it...).
from what I remember, I do a binary lookup to handle switches.
then again, I am still not using my compiler as a "general purpose" C compiler, and so there are still a few deficiencies I have never really gotten around to fixing.
so many other things to work on, such as the x86 interpreter (now works ok, and is ~170x slower than native, but still lacks a good deal in terms of API bindings, me left thinking I may need some sort of IDL-like technology for this, ...), ...
maybe just a whole lot of busywork, I don't know...
Joshua Cranmer <Pidgeo...@verizon.invalid> writes: > I'm trying to put together a list of various methods to do code generate > for switch statements....
> [On machines with condition codes, I've seen binary searches expanded > into code with compare and branch instructions; after the comparison > you can do a three way branch on less, equal, or greater. -John]
I once saw in a Fortran book the suggestion that one use arithmetic IF in place of computed GOTO with three labels.
IF(N-2) 1,2,3
instead of
GOTO (1,2,3),N
For modern processors depending on branch prediction logic, either a branch table or address table will likely foil any prediction system. Complicated conditional branching is likely faster in that case.
Joshua Cranmer <Pidgeo...@verizon.invalid> writes: >I'm trying to put together a list of various methods to do code generate >for switch statements. ... >Are there any other methods worth mentioning?
Hash-table-based jump tables.
IIRC the following paper discusses these methods:
@Article{journals/spe/KannanP94, title = "Short Communication: Correction to 'Producing Good Code for the case Statement'", author = "Sampath Kannan and Todd A. Proebsting", journal = "Softw, Pract. Exper", year = "1994", number = "2", volume = "24", bibdate = "2003-11-25", bibsource = "DBLP, http://dblp.uni-trier.de/db/journals/spe/spe24.html#KannanP94", pages = "233",
}
and this refers to:
@Article{bernstein:85a, author = "R. L. Bernstein", title = "Producing good code for the case statement", journal = "Software -- Practice \& Experience", volume = "15", number = "10", pages = "1021--1024", month = oct, year = "1985",
}
These days, with the varying amounts of branch prediction hardware, the best way probably depends on the hardware (which branch predictors are implemented in the hardware?) and on the run-time characteristics of the program (how effective are the branch predictors on this particular usage pattern).
On 11/04/2009 12:27 AM, glen herrmannsfeldt wrote:
> I believe that JVM also has a special instruction for this > operation, as the above likely don't work there.
Actually two special bytecodes, lookupswitch and tableswitch. These are the linear scan and the jump table methods, respectively. The upcoming string switch appears to use a hash routine.
> [On machines with condition codes, I've seen binary searches expanded > into code with compare and branch instructions; after the comparison > you can do a three way branch on less, equal, or greater. -John]
It seems both gcc and llvm emit code like this; I've added it to my list already.
-- Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth
Joshua Cranmer wrote: > I'm trying to put together a list of various methods to do code generate > for switch statements.
> This is what I have so far: > * Successive if/else branches
I have seen / have implemented if/else as either buried as part of the switch body or as a separate if/else code block that jumps into the switch body. The determining factor is the distance a conditional branch can jump.
> On 11/04/2009 12:27 AM, glen herrmannsfeldt wrote: >> I believe that JVM also has a special instruction for this >> operation, as the above likely don't work there.
> Actually two special bytecodes, lookupswitch and tableswitch. These are > the linear scan and the jump table methods, respectively. The upcoming > string switch appears to use a hash routine.
Hey, that's what I use.. invented independently of course.
'switch' for an indexed jump table, for case values in the range of 500 or 1000 (between minimum and maximum values). 'cswitch' for other case values, which just does a linear search.
Since these implement 'Switch' for an interpreted language, with the cswitch bytecode implemented in tight assembler, opting for a linear search doesn't have as much impact as it would do in a compiled language, especially if the most common cases appear first.
However, with a big enough table size then I guess a proper lookup is called for.
>>> I'm trying to put together a list of various methods to do code generate >>> for switch statements. ...
> Has anyone seen tries used for the string-based switch statements?
What dou you expect?
Visual Basic had such a feature, implemented as a series of string compares and jumps.
The typical solution is a lookup in a (sorted, hashed...) string list, followed by an "ordinary" switch with the index of the found string. The target address could be attached to every string in that list, so that a second switch is not required.
A string tree might give the best lookup performance.
DoDi
[If there's a lot of strings, a patricia trie might be faster than hashing or binary search since it doesn't require repeated scans of the string during the match process. -John]
nathan.mccau...@gmail.com wrote: > On Nov 5, 12:12 pm, Walter Banks <wal...@bytecraft.com> wrote: >> Joshua Cranmer wrote: >> > I'm trying to put together a list of various methods to do code >> > generate for switch statements. ...
> Has anyone seen tries used for the string-based switch statements?
No... Not tries exactly. Since compilation happens on static data, it's easy to build a simple balanced binary tree. The extra complication of tries when you're not going to be doing insertions and deletions isn't worth it.
I think the winner for string switches is a string representation with a precomputed hash code, used in combination with a hashtable of jump addresses.
Because the elements are known at compilation time, you can know in advance the number of collisions in the longest bucket when you're allocating the table, which means you can just allocate a table of NxM entities where N is the number of buckets (a power of 2) and M is the longest length needed (rounded up to a power of 2).
So when you get the string's hash code, you do a bitmask to get the index into the hash table, and then check to see if you have a matching hash code(success) or a zero hashcode (failure) there - if not, increment the index by one until you do (a degenerate case of "rehashing" that you can use here because you don't have to deal with the possibility of making space for arbitrary insertions) and then you have the jump address.
> A string tree might give the best lookup performance. > [If there's a lot of strings, a patricia trie might be faster than > hashing or binary search since it doesn't require repeated scans of > the string during the match process. -John]
Thanks for the "patricia trie" term :-) Wikipedia redirects it to "Radix tree".
Working out the best set of algorithms to use for mapping switch statements to machine code obviously requires information on switch statement usage in real code.
The latest issue of CVu, the magazine of he ACCU www.accu.org, has an article by yours truely comparing the usage of switch statements and nested if-else-if sequences in C code.
You need to be an ACCU member to access the article (yearly membership #25 or #35 if you want both magazines). I will make a pdf freely available in a month or so.
While I'm talking about stuff I have written, switch statement code generation gets a mention in this: shape-of-code.coding-guidelines.com/2009/09/register-vs-stack-based-vms/
>>>> I'm trying to put together a list of various methods to do code generate >>>> for switch statements. ...
>> Has anyone seen tries used for the string-based switch statements? ... >[If there's a lot of strings, a patricia trie might be faster than >hashing or binary search since it doesn't require repeated scans of >the string during the match process. -John]
It's very doubtful that it's faster than hashing. In hashing you traverse the tree once for computing the hash function and once for the final match. All other match attempts (and there are typically <1 of those) will typically mismatch at the first character, so no complete scan is necessary. You typically have to fetch one cache line per match attempt (i.e., typically 1-2; if you use external chaining, add another cache line fetch for the initial table lookup).
In contrast, in the trie method you have to chase a pointer for each substring you look at; if there are a lot of string, it's probably at every character for the first few characters. And for each pointer you chase, you will have to fetch a cache line.
Ok, if there is little competition for the D-cache, the first few levels of the trie might end up in the D-cache. OTOH, if a few words are looked up often, all the lines necessary for their lookup will be in the D-cache with a hash table, too.
Overall, I guess you can find cases where the patricia trie is faster (an application that does little but looking up lots of different, relatively short strings) but in most cases I think that hashing is faster, assuming good implementations of each data structure.
For our switch statement, we have to consider another issue: the data structures can be hard-coded; then at each node of the trie there will be a conditional or an indirect branch; for the hash table we have an indirect jump after computing the hash function, and then one or two conditional branches afterward. Depending on how repetetive and predictable the strings coming in are, the branch predictors may predict very well or very badly (i.e., 50% mispredicts for conditional branches, 100% for indirect branches).
Anyway, the more branches there are, the worse the effect of bad predictability will be (hitting the trie harder). It may be better not to hard-code the data structure to avoid some of the mispredictions; in the end, though, you probably will perform one indirect jump if you don't hard-code the data structure; I think that hard-coding the hash table will not be worse than not hard-coding it, but hard-coding the trie can be worse (depending on circumstances).
We've done a fair amount of experimentation on hashing verus trie building in our latest hardware design. As about 4 characters, hashes start to win. And, in our case, we can fit either in 1st level cache, that we can lock against spills, so cache misses aren't a penalty, which tend to make hashing an even better choice. Now, actual hardware and instruction set choices move the line a little, but for long enough strings hashing always wins.
Think of hashing as a way of rearranging the values, so that you can do the computed goto jump and make 1 n-way decision. The right hash minimizes collisions withou introducing too many empty slots.
BTW,long ago, when working at Pr1me computer, the switch statement optimization we did essentially reinvented a subset of B-trees without knowing what they were. We could mix tests and jump tables at each level as we descended what was essentially a trie of the numeric values. We didn't have the vocabulary to describe it so nicely at the time.
Hope this helps, -Chris
*************************************************************************** *** Chris Clark email: christopher.f.cl...@compiler-resources.com Compiler Resources, Inc. Web Site: http://world.std.com/~compres 23 Bailey Rd voice: (508) 435-5016 Berlin, MA 01503 USA twitter: @intel_chris