Another daft and possibly flameworthy question... :)
Has anyone ever attempted writing C++ RTTI support in an OS, if not in a kernel? I imagine it involves the creation of an "object header" of sorts in memory, in the implementation of the "new" operator's malloc routine. Sort of like the object headers in the runtimes for both Java and CLR.
Does anyone have any resources to read up on how to achieve this, or ideas on how to implement typeid & dynamic_cast? Thanks!
> [snip] > Has anyone ever attempted writing C++ RTTI support in an OS, if not in > a kernel?
No, I use C. I'll listen to your idea though. But, I'm not familiar with RTTI. Since I'm not, could you explain why or what you're wanting to do in your OS?
I mean, from what I've been able to gather about RTTI, it's a C++ feature that apparently allows you to perform a validity check on a type cast or type conversion at runtime instead of at compile time. What advantage does doing this at runtime have? What I'm trying to ask is, once your code is compiled and passes the compile time checks for a specific cast or conversion, why would it also need to do a runtime check? I'm assuming this has something to do with C++'s programming paradigm... Overloading?
> No, I use C. I'll listen to your idea though. But, I'm not familiar with > RTTI. Since I'm not, could you explain why or what you're wanting to do in > your OS?
> I mean, from what I've been able to gather about RTTI, it's a C++ feature > that apparently allows you to perform a validity check on a type cast or > type conversion at runtime instead of at compile time. What advantage does > doing this at runtime have? What I'm trying to ask is, once your code is > compiled and passes the compile time checks for a specific cast or > conversion, why would it also need to do a runtime check? I'm assuming this > has something to do with C++'s programming paradigm... Overloading?
Checking is a wrong word, identification is a correct one. The type identification is necessary to be able to perform dynamic dispatch. Since C is does not have polymorphic objects there is no dispatch and no need in identification.
As for OS support there are problems with it:
1. Reuse of the type identifiers (tags), after the type dies (leaving its scope). The tags should be dense or hashable in order to have dispatch efficient.
2. Global scope of the tags. In a distributed OS you will need to have unique tag for whatever possible type.
3. Persistence (stored polymorphic objects must keep their unique type tags)
4. Language impedance. Some languages like C++ define the way type tags must behave, which is inconsistent, limiting, and inefficient for more advanced OO languages. It might be difficult to find a common denominator.
I guess that tags (and dispatching tables) would have much in common with the virtual memory management.
>> No, I use C. I'll listen to your idea though. But, I'm not familiar >> with >> RTTI. Since I'm not, could you explain why or what you're wanting to do >> in >> your OS?
>> I mean, from what I've been able to gather about RTTI, it's a C++ feature >> that apparently allows you to perform a validity check on a type cast or >> type conversion at runtime instead of at compile time. What advantage >> does >> doing this at runtime have? What I'm trying to ask is, once your code is >> compiled and passes the compile time checks for a specific cast or >> conversion, why would it also need to do a runtime check? I'm assuming >> this >> has something to do with C++'s programming paradigm... Overloading?
(responding to Rod): RTTI can be used as a type-checking mechanism similar to dynamic types, but differing in that primarily, as opposed to using predicates, you use a "dynamic_cast" and see if the result is NULL...
> Checking is a wrong word, identification is a correct one. The type > identification is necessary to be able to perform dynamic dispatch. Since > C > is does not have polymorphic objects there is no dispatch and no need in > identification.
some of us do dynamic typing in C however, although typically by different means...
> As for OS support there are problems with it:
> 1. Reuse of the type identifiers (tags), after the type dies (leaving its > scope). The tags should be dense or hashable in order to have dispatch > efficient.
> 2. Global scope of the tags. In a distributed OS you will need to have > unique tag for whatever possible type.
> 3. Persistence (stored polymorphic objects must keep their unique type > tags)
these can be accomplished by using strings for types. though a "simple and intuitive" approach exists (comming up with a name for each possible use and naming each object with it), another possible approach (which works better if one is the compiler dev), is to use the source-code's type as the basis for the type ID.
the remaining issue is that, at the C level, there may be many "anonymous" types (such as structs or function-pointers which are declared implicitly as part of another declaration).
originally, I had used a "global gensym" to generate unique names for each such object, but once adding persistent metadata this became an issue since then the metadata cache becomes quickly polluted with endless redefinitions of the same type...
one idea I had found was to make use of the lexical structure for calculating a hash, such that 2 types formed via the same sequence of (internal) tokens are considered equivalent (theoretically, structural equivalence would be better, but it is much easier to calculate a hash and look up a type via its hash than it is to go and look for a pre-existing structurally-equivalent type).
this system is then used in combination with explicit type naming...
> 4. Language impedance. Some languages like C++ define the way type tags > must behave, which is inconsistent, limiting, and inefficient for more > advanced OO languages. It might be difficult to find a common denominator.
if possible, one option is to make use of a universal/generic "signature" system, where most types can be identified in terms of a signature string...
the issue here is that such a setup is a little more complicated than it may seem, as it needs to be capable of expressing not just structural types (structs, classes, ...), but also primitive types (int, short, long, ...), should not alias with languages (AKA: Java/C# char and C char are different, and both likely need to be expressed as different types, as well as expressing the very different object systems/semantics between C# and C++, ...); as well, it should also be capable of representing abstract virtual types (such as 'cons', 'fixnum', 'number', ...) typically found in dynamic languages (these should not be represented as the implementation's base-types, such as 'void *' or 'int'); ...
I have partly been addressing these issues, though mostly through mutation and adaptation (typically changes are kept small, as any notable change in my case requires modifying lots of disparate code...).
in my case, each signature is a string, so often this allows a very simple generic type-check mechanism: types are compatible if the signature strings are equivalent (theoretically, compatibility may exist between lexically different strings, but this is much more difficult and expensive to check, and in most cases, exact-match semantics are sufficient...).
I also like signature strings being strings as this means that sig-strings are far more easily human-decodable (unlike a binary signature coding, such as used in .NET). this does not mean that I think said strings should be "human readable" (as in, verbose to a point of being easily recognizable), rather, I just figure they should be composed of ASCII characters.
as-is, I will note that my system is sort of a descendant and hybridization of both the IA64 name-mangling scheme, and the JVM's signature strings (syntax closer to JVM, type notation similar to IA64's name mangling). FWIW, I will note that the IA64 name mangling scheme is de-facto in GCC and many other C++ compilers (except maybe MSVC and friends...). so, fammiliarity with name-mangling should give an idea of the basic notation (for C and C++ it is more-or-less letter-equivalent).
however, for other reasons my actual mangled names are somewhat different, where personally I feel my scheme to be an improvement over the IA64 scheme (both a generally cleaner syntax, as well as able to retain more information, such as the return type, ...). however, the combined name-and-signature is itself mangled (using an adaptation of the JNI scheme), and so the resultant mangled names look a little different.
in my case, the idea is to bridge the systems at link time (allowing calls to/from existing C++ compilers).
> I guess that tags (and dispatching tables) would have much in common with > the virtual memory management.
yeah...
type strings are associated with every object allocated from my garbage collector (among other things). allocating raw memory is typed as well, at least as far as identifying that the allocation is more-or-less unstructured memory (or unknown-structured memory).
of course, in this case, my GC uses a much older system than my signature system, so typically heap objects are named with generic names. I have considered at times using sig-strings when allocating objects as well, just this has not caught on as of yet in my case (technically, it would likely require internal framework code to be doing the allocations, and generic user code is unlikely to be able to get the signatures exactly right or keep everything consistsnt between their code and the matadata database, ...).
so, for user-allocated objects, generic/semantic type names are probably better (used in combination with sig-strings).
the main transition point would likely be with the introdution of the 'new' keyword, as then it is the compiler allocating the object, and so can far more likely generate correct sig-strings for the allocs.
I have no real idea how existing C++ compilers typically do RTTI though, since this does not seem to be exactly well documented...
> I also like signature strings being strings as this means that sig-strings > are far more easily human-decodable (unlike a binary signature coding, > such as used in .NET). this does not mean that I think said strings should > be "human readable" (as in, verbose to a point of being easily > recognizable), rather, I just figure they should be composed of ASCII > characters.
> as-is, I will note that my system is sort of a descendant and > hybridization of both the IA64 name-mangling scheme, and the JVM's > signature strings (syntax closer to JVM, type notation similar to IA64's > name mangling). FWIW, I will note that the IA64 name mangling scheme is > de-facto in GCC and many other C++ compilers (except maybe MSVC and > friends...). so, fammiliarity with name-mangling should give an idea of > the basic notation (for C and C++ it is more-or-less letter-equivalent).
> however, for other reasons my actual mangled names are somewhat different, > where personally I feel my scheme to be an improvement over the IA64 > scheme (both a generally cleaner syntax, as well as able to retain more > information, such as the return type, ...). > however, the combined name-and-signature is itself mangled (using an > adaptation of the JNI scheme), and so the resultant mangled names look a > little different.
> in my case, the idea is to bridge the systems at link time (allowing calls > to/from existing C++ compilers).
I went and checked, and I will now de-emphasize the similarity of my scheme and the IA64 scheme...
it seems I had failed to take note of something, which is that my scheme was originally a subset of the IA64 scheme, and has since mutated away much further than I had originally taken into account...
much past the level of common base types (char, int, long, ...) and pointers-to-types, the similarity ends...
so, in both schemes "void **" is "PPv" and "unsigned char *" is "Ph", but much past this level, and things quickly diverge (handling of complex types, scoping, ... is very different, and at this point my scheme becomes far more similar to the JVM's scheme).
"struct Foo_s *" -> "PXFoo_s;"
'X' is used for structs/unions/unmanaged classes/...
'U' ended up being used for user types, rather than user type-modifiers (different from IA64, so 'U' in my case serves the role of 'u' in IA64, and oddly I had considered 'F' for the use 'U' apparently originally had, ...).
as well as endless other differences...
for: "namespace foo { ... __gc class Bar { ... }; ... }" "Bar" -> "Lfoo/Bar;"
so, yeah, it is sort of like the love-child of the JVM's signature-system with aspects of the IA64 scheme, and some amount of custom engineering as well...
"int[,]" -> "C2i" ("Cf" is "float _Complex", ... but 'C' with a number was overloaded for square arrays). "int[][]" -> "QQi" (vs "[[i" as in the JVM, as I had wanted to keep '[' available as a possible later brace, rather than as a type modifier...).
as noted, my JNI-scheme adaptation has several additional character-shorthands, as well as the ability to encode 2-digit (8 bit) char escapes (vs forcing a 16-bit escape for all chars): '_9xx' vs '_0xxxx' ...
so, yeah, in all it is a little more "custom" than "conventional"...
On Fri, 19 Jun 2009 10:18:52 -0700, cr88192 wrote: > "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de> wrote in message > news:pvpvu6vo1ahz$.81meb7ad8cfn.dlg@40tude.net... >> Checking is a wrong word, identification is a correct one. The type >> identification is necessary to be able to perform dynamic dispatch. Since >> C is does not have polymorphic objects there is no dispatch and no need in >> identification.
> some of us do dynamic typing in C however, although typically by different > means...
Sure, but dynamic typing is a larger different issue. We should differentiate identification and full type descriptions. You might wish to have the first in OS, keeping the second outside.
>> As for OS support there are problems with it:
>> 1. Reuse of the type identifiers (tags), after the type dies (leaving its >> scope). The tags should be dense or hashable in order to have dispatch >> efficient.
>> 2. Global scope of the tags. In a distributed OS you will need to have >> unique tag for whatever possible type.
>> 3. Persistence (stored polymorphic objects must keep their unique type >> tags)
> these can be accomplished by using strings for types.
Yes, but that would make it pretty useless. An implementation should deliver performance of a dispatching call at the level of a plain call x 1.5-2 x number of dispatching arguments.
> the remaining issue is that, at the C level, there may be many "anonymous" > types (such as structs or function-pointers which are declared implicitly as > part of another declaration). > originally, I had used a "global gensym" to generate unique names for each > such object, but once adding persistent metadata this became an issue since > then the metadata cache becomes quickly polluted with endless redefinitions > of the same type...
This problem comes with structured equivalence of types. Which makes identification quite difficult, because you should describe the type in its tag. With a nominal equivalence the type tag could be a number.
>>> Checking is a wrong word, identification is a correct one. The type >>> identification is necessary to be able to perform dynamic dispatch. >>> Since >>> C is does not have polymorphic objects there is no dispatch and no need >>> in >>> identification.
>> some of us do dynamic typing in C however, although typically by >> different >> means...
> Sure, but dynamic typing is a larger different issue. We should > differentiate identification and full type descriptions. You might wish to > have the first in OS, keeping the second outside.
yes, ok. I had almost thought maybe the point was to support a full VM-like typesystem in the kernel or similar...
>>> 1. Reuse of the type identifiers (tags), after the type dies (leaving >>> its >>> scope). The tags should be dense or hashable in order to have dispatch >>> efficient.
>>> 2. Global scope of the tags. In a distributed OS you will need to have >>> unique tag for whatever possible type.
>>> 3. Persistence (stored polymorphic objects must keep their unique type >>> tags)
>> these can be accomplished by using strings for types.
> Yes, but that would make it pretty useless. An implementation should > deliver performance of a dispatching call at the level of a plain call x > 1.5-2 x number of dispatching arguments.
FWIW, if well-designed, a string-based signature can be of similar performance to a binary signature...
similarly, via interning strings, ... a number of other optimizations can be done (for example, typechecking via pointer-equality, ...).
but, yeah, for dispatch, one can do as the JVM does and require 1:1 string equivalence in these cases...
granted, usually one things strings, and they also think "slow", but in practice this need not be the case, as carefully designed strings and processing code can perform similarly to binary representations (typically by basing both the representation and processing code around the use of switches or jump-tables...).
similarly, though it is possible to dynamically dispatch using strings, in many cases I optimize this by using the strings to drive the creation of specialized machine code thunks, such that the machine code can perform the requested dispatch with no additional runtime overhead (internally, I use this for object method dispatching and similar...).
many other tasks can be done effectively via hash tables...
>> the remaining issue is that, at the C level, there may be many >> "anonymous" >> types (such as structs or function-pointers which are declared implicitly >> as >> part of another declaration).
>> originally, I had used a "global gensym" to generate unique names for >> each >> such object, but once adding persistent metadata this became an issue >> since >> then the metadata cache becomes quickly polluted with endless >> redefinitions >> of the same type...
> This problem comes with structured equivalence of types. Which makes > identification quite difficult, because you should describe the type in > its > tag. With a nominal equivalence the type tag could be a number.
yeah...
that was the point of my later fix of using lexical hashing: as a general rule, structurally equivalent types will generate the same hash.
"struct { int x; int y; }"
may appear in 2 different contexts, and can then be hashed using the tokens, in this case: "{", "int", "x", ";", "int", "y", ";", '}'
by hashing and checking this, we can know, at least, that the types are structurally equivalent.
however, there are cases where structurally "compatible" types may end up with different hashes, which FWIW would require more expensive checks to be done (such as searching for a matching type).
but, for now, I am lazy, and make the simplifying assumption that structural-equivalence = lexical-equivalence...
a slight modification (making it potentially safer), is using "internal" tokens, AKA, the structure is parsed and types/... are fully expanded, and then we hash this "conceptual" version of the object.
this way, 2 lexically equivalent objects which appear in different contexts and potentially have different types (but which overlap in terms of name) would generate different hashes.
something much closer to "proper" structural equivalence could be checked by then flattening the entire structure into a signature, and hashing/checking this signature.
for example, a loose structural equivalence could flatten the above struct to "{ii}", or if names are significant: "{/x;i/y;i}" or similar...
On Fri, 19 Jun 2009 12:26:44 -0700, cr88192 wrote: > "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de> wrote in message > news:rjm4lkpzo505$.jh2kpma5noap.dlg@40tude.net... >> On Fri, 19 Jun 2009 10:18:52 -0700, cr88192 wrote:
>>>> As for OS support there are problems with it:
>>>> 1. Reuse of the type identifiers (tags), after the type dies (leaving >>>> its >>>> scope). The tags should be dense or hashable in order to have dispatch >>>> efficient.
>>>> 2. Global scope of the tags. In a distributed OS you will need to have >>>> unique tag for whatever possible type.
>>>> 3. Persistence (stored polymorphic objects must keep their unique type >>>> tags)
>>> these can be accomplished by using strings for types.
>> Yes, but that would make it pretty useless. An implementation should >> deliver performance of a dispatching call at the level of a plain call x >> 1.5-2 x number of dispatching arguments.
> FWIW, if well-designed, a string-based signature can be of similar > performance to a binary signature...
[...]
> many other tasks can be done effectively via hash tables...
To start with you have to calculate the hash value from a string, which alone is too slow as compared to simple indexing.
Further type tags must be ordered according to the <: relation. An implementation of <: is required for tests like if the type S is in the class rooted in the type T.
> but, for now, I am lazy, and make the simplifying assumption that > structural-equivalence = lexical-equivalence...
Which is a wrong assumption. Two types are equivalent when they have same operations. Once you define an operation Foo for the type int in some context the result will be another type different from original int. As known in OO, you have to carry all operations with types in order to be able to compare (and deal) them. Just for this reason structural equivalence is hardly compatible with OO.
>>>>> As for OS support there are problems with it:
>>>>> 1. Reuse of the type identifiers (tags), after the type dies (leaving >>>>> its >>>>> scope). The tags should be dense or hashable in order to have dispatch >>>>> efficient.
>>>>> 2. Global scope of the tags. In a distributed OS you will need to have >>>>> unique tag for whatever possible type.
>>>>> 3. Persistence (stored polymorphic objects must keep their unique type >>>>> tags)
>>>> these can be accomplished by using strings for types.
>>> Yes, but that would make it pretty useless. An implementation should >>> deliver performance of a dispatching call at the level of a plain call x >>> 1.5-2 x number of dispatching arguments.
>> FWIW, if well-designed, a string-based signature can be of similar >> performance to a binary signature...
> [...]
>> many other tasks can be done effectively via hash tables...
> To start with you have to calculate the hash value from a string, which > alone is too slow as compared to simple indexing.
typically, hashes and indices are trivial to calculate. if you mean for representing the type (in an object), then yes, an index IS used here, but an index only tells where something is, not what it is...
for other tasks, one can precompute the hash, or make use of the literal pointer value...
so, as I see it, using indices/handles does not preclude using strings as the cannonical representation, no more than does using strings preclude using indices or handles for cases where these make more sense...
the point here is ASCII signature vs binary signature (AKA: a string of bytes where each byte works as a modifier), and the indernal use of indices or handles is not particularly relevant.
a loop and swtich will not really care whether or not its input is based on ASCII characters or literal byte values...
> Further type tags must be ordered according to the <: relation. An > implementation of <: is required for tests like if the type S is in the > class rooted in the type T.
this is a special case...
to pull this one off requires having a prebuilt structural representation of the class heirarchy (at which point, one fetches the relevant classes and performs the operation), or if one does not, that the types are strings or not does not make much difference, the operation will still be slow (for example, if performing the operation is performed via a large number of database operations...).
more so, this operation can be sped up notably via the use of hashing, and so in the general cases ceases to matter much.
is T descended from S? combine T and S hashes, and check the hash table for either a conformation or rejection, and if not found, perform the formal check and add it to the hash.
anyways, many operations (for example, in my object system), are pulled off in double-digit clock-cycles, which means I probably can't be doing things THAT badly...
>> but, for now, I am lazy, and make the simplifying assumption that >> structural-equivalence = lexical-equivalence...
> Which is a wrong assumption. Two types are equivalent when they have same > operations. Once you define an operation Foo for the type int in some > context the result will be another type different from original int. As > known in OO, you have to carry all operations with types in order to be > able to compare (and deal) them. Just for this reason structural > equivalence is hardly compatible with OO.
I fail to see the point of this claim...
the point in this case is to merge multiple redefinitions of the same type (such as a struct, ...), and lexical equivalence and/or structural equivalence is a good way to do this...
the semantic type heirarchy is, as I see it, an entirely different system.
actually, I tend to represent "types" and "structures" as essentially 2 parallel systems, where "structures" represents the layout or representation of things, and tends to rely on structural equivalence.
"types" is its own systems, where a type may use a structure, but it is not necessarily the case that a type is a structure (or vice-versa...).
types are still, however, typically merged along the lines of definitional equivalence, which would include things like their names, overloaded operators, ...
anyway, because something is a "simplifying assumption" does not mean it is "universally true", only that "it works in enough cases that, for now, I will assume it to be true" (in effect, there are cases which may break such an assumption, but they are ignored).
it would be like, for example, if one makes an assembler and makes a simplifying assumption: there are 3 sections: '.text', '.data', and '.bss'.
there may well be other sections in existence, but for the sake of the task at hand, they are not relevant.
now, in my case, I tend to assume that lexical-equivalence=structural-equivalence. the reason: doing a proper check for structural equivalence is expensive; hashing a sequence of tokens and seeing if I have a match, is VERY much cheaper (even if it "may" risk false negatives, or a naive approach may risk false positives in certain contrieved cases...).
so, one need not actually care in all cases what is actually the case, just whatever can be done most cheaply and efficiently...
On Sat, 20 Jun 2009 10:28:15 -0700, cr88192 wrote: > "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de> wrote in message > news:1fn4evn388rjq.mpfhmxywl1p5.dlg@40tude.net... >> On Fri, 19 Jun 2009 12:26:44 -0700, cr88192 wrote:
>>>>>> As for OS support there are problems with it:
>>>>>> 1. Reuse of the type identifiers (tags), after the type dies (leaving >>>>>> its >>>>>> scope). The tags should be dense or hashable in order to have dispatch >>>>>> efficient.
>>>>>> 2. Global scope of the tags. In a distributed OS you will need to have >>>>>> unique tag for whatever possible type.
>>>>>> 3. Persistence (stored polymorphic objects must keep their unique type >>>>>> tags)
>>>>> these can be accomplished by using strings for types.
>>>> Yes, but that would make it pretty useless. An implementation should >>>> deliver performance of a dispatching call at the level of a plain call x >>>> 1.5-2 x number of dispatching arguments.
>>> FWIW, if well-designed, a string-based signature can be of similar >>> performance to a binary signature...
>> [...]
>>> many other tasks can be done effectively via hash tables...
>> To start with you have to calculate the hash value from a string, which >> alone is too slow as compared to simple indexing.
> typically, hashes and indices are trivial to calculate.
O(n), where n is the length of the string as compared to O(1) of indexing.
> the point here is ASCII signature vs binary signature (AKA: a string of > bytes where each byte works as a modifier), and the indernal use of indices > or handles is not particularly relevant.
What I meant that the difference is n vs. 1.
>> Further type tags must be ordered according to the <: relation. An >> implementation of <: is required for tests like if the type S is in the >> class rooted in the type T.
> this is a special case...
You need in order to implement dynamic cast.
> is T descended from S?
This is the definition of T <: S. In order to evaluate it efficiently, you need ordered sets, rather than unordered, represented by a hash. The point is merely that simple hash won't really help here.
>>> but, for now, I am lazy, and make the simplifying assumption that >>> structural-equivalence = lexical-equivalence...
>> Which is a wrong assumption. Two types are equivalent when they have same >> operations. Once you define an operation Foo for the type int in some >> context the result will be another type different from original int. As >> known in OO, you have to carry all operations with types in order to be >> able to compare (and deal) them. Just for this reason structural >> equivalence is hardly compatible with OO.
> I fail to see the point of this claim...
The point is that structural equivalence does not capture the notion of type used in OO.
> the point in this case is to merge multiple redefinitions of the same type > (such as a struct, ...), and lexical equivalence and/or structural > equivalence is a good way to do this...
If you have identification, you just compare the identifiers. Otherwise, structural equivalence is in general case a halting problem. So the bottom line is, do not bother with structural equivalence, it does not work anyway.
> the semantic type heirarchy is, as I see it, an entirely different system.
Why? You need a tree distance defined on type tags, otherwise they would be useless.
> now, in my case, I tend to assume that > lexical-equivalence=structural-equivalence.
Huh, but that is what named equivalence is. It compares names and ignores any semantics!
> Another daft and possibly flameworthy question... :)
> Has anyone ever attempted writing C++ RTTI support in an OS, if not in > a kernel? I imagine it involves the creation of an "object header" of > sorts in memory, in the implementation of the "new" operator's malloc > routine. Sort of like the object headers in the runtimes for both Java > and CLR.
> Does anyone have any resources to read up on how to achieve this, or > ideas on how to implement typeid & dynamic_cast? Thanks!
GCC code, if you are interested and can read it :-
>>>> "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de> wrote in message >>>> news:rjm4lkpzo505$.jh2kpma5noap.dlg@40tude.net... >>>>> On Fri, 19 Jun 2009 10:18:52 -0700, cr88192 wrote: >>> To start with you have to calculate the hash value from a string, which >>> alone is too slow as compared to simple indexing.
>> typically, hashes and indices are trivial to calculate.
>O(n), where n is the length of the string as compared to O(1) of indexing.
I'm not sure what you mean by indexing, but for any meaning that occurs to me, this is not right.
If we have N distinct strings it takes log(N) decisions to distinguish between them. Moreover the average string length, n, must be >= log2(N) bits.
Furthermore O(log n) is the cost of determining that a string s is possibly in our set of strings. To verify that s actually is in the set of strings we must check the the entire string. I.e., verification is O(n).
In short, the cost of looking of looking up a string in a set of strings is O(n). The cost can be concealed by a languages syntax but it is there.
>>>>>>> As for OS support there are problems with it:
>>>>>>> 1. Reuse of the type identifiers (tags), after the type dies >>>>>>> (leaving >>>>>>> its >>>>>>> scope). The tags should be dense or hashable in order to have >>>>>>> dispatch >>>>>>> efficient.
>>>>>>> 2. Global scope of the tags. In a distributed OS you will need to >>>>>>> have >>>>>>> unique tag for whatever possible type.
>>>>>>> 3. Persistence (stored polymorphic objects must keep their unique >>>>>>> type >>>>>>> tags)
>>>>>> these can be accomplished by using strings for types.
>>>>> Yes, but that would make it pretty useless. An implementation should >>>>> deliver performance of a dispatching call at the level of a plain call >>>>> x >>>>> 1.5-2 x number of dispatching arguments.
>>>> FWIW, if well-designed, a string-based signature can be of similar >>>> performance to a binary signature...
>>> [...]
>>>> many other tasks can be done effectively via hash tables...
>>> To start with you have to calculate the hash value from a string, which >>> alone is too slow as compared to simple indexing.
>> typically, hashes and indices are trivial to calculate.
> O(n), where n is the length of the string as compared to O(1) of indexing.
theoretically, yes. in practice, this usually does not matter (as small loops are typically really fast and most strings are short).
in cases where performance is important, one need not do lookups with strings. in other cases, this does not matter.
>> the point here is ASCII signature vs binary signature (AKA: a string of >> bytes where each byte works as a modifier), and the indernal use of >> indices >> or handles is not particularly relevant.
> What I meant that the difference is n vs. 1.
however, if the string is maybe only a few chars long (typical of the sig strings in my case), this 'n' does not matter a whole lot.
I get a much bigger hit, FWIW, from having profiler options turned on...
>>> Further type tags must be ordered according to the <: relation. An >>> implementation of <: is required for tests like if the type S is in the >>> class rooted in the type T.
>> this is a special case...
> You need in order to implement dynamic cast.
ok. yes, but I was noting the "entirety" of the typesystem, not simply dynamic cast.
dynamic cast can use special mechanisms (as in, directly interfacing with the object system), and thus bypassing any direct usage of these strings. the core mechanisms for dynamic cast were actually implementing in my case for implementing a JVM, and it makes use of direct pointers, precomputed hashes, and a hashtable.
I think I mentally before wondered if the hash table wondered much in the face of an operation which simply walks the inheritence tree, but it could be a lot more expensive in the MI case or similar, so the hash is probably reasonable.
or, at least, this is how it is done for "managed objects". I have not yet decided on the RTTI scheme for unmanaged objects.
possibility A: implement the approach from the IA64 C++ ABI. possibility B: hide a reference to the "class info" in the same spot as would be used for RTTI, but this would break compatibility with gcc's RTTI... possibility C: do internal magic and combine A and B, where I have plain RTTI, and a hidden reference into the object system embedded in the RTTI info magic...
>> is T descended from S?
> This is the definition of T <: S. In order to evaluate it efficiently, you > need ordered sets, rather than unordered, represented by a hash. The point > is merely that simple hash won't really help here.
a hash does not help "much", but it does help.
otherwise, it is walking the inheritence tree, which is also cheap. either way, it is cheap...
from this stance, one could just as easily argue "you can't implement interface method dispatch with a hash table", but oh wait, I do, and it works fairly well...
many operations are in the double-digits of clock cycles, so things are probably not THAT bad...
>>>> but, for now, I am lazy, and make the simplifying assumption that >>>> structural-equivalence = lexical-equivalence...
>>> Which is a wrong assumption. Two types are equivalent when they have >>> same >>> operations. Once you define an operation Foo for the type int in some >>> context the result will be another type different from original int. As >>> known in OO, you have to carry all operations with types in order to be >>> able to compare (and deal) them. Just for this reason structural >>> equivalence is hardly compatible with OO.
>> I fail to see the point of this claim...
> The point is that structural equivalence does not capture the notion of > type used in OO.
this is a different point. as noted, the behavior of "named types" is very different from "unnamed types".
in my case, it is mostly a matter of relevance to unnamed types, than of named types...
consider: struct { int x; int y; } foo; ... struct { int x; int y; } bar;
(stuff like this actually happens in practice).
really, should each struct be assumed to be a completely unique type (and thus, through subsequent runs, fill up the metadata database with endless versions of essentially the same struct), or can we just safely merge them, thus avoiding a "database getting gradually filled with garbage" level of threat?...
FWIW, 2 different/unrelated classes generally CAN'T be merged via structural equivalence, it just does not work this way (firstly, classes are typically named, secondly, they typically inherit things, ...).
as for anonymous classes, well, that remains to be seen...
>> the point in this case is to merge multiple redefinitions of the same >> type >> (such as a struct, ...), and lexical equivalence and/or structural >> equivalence is a good way to do this...
> If you have identification, you just compare the identifiers. Otherwise, > structural equivalence is in general case a halting problem. So the bottom > line is, do not bother with structural equivalence, it does not work > anyway.
if I have identifiers, then I use them...
the issue here is what to do about cases where the original programmer did not bother to name the type, and the compiler is left figuring out what to do about it.
about the halting problem, I don't see why this is the case. structural equivalence is cheap enough to check, and hashes can be easily enough computed.
granted, in the "general case" (as in, not doing hash-based merging), I would have to go through a potentially long and slow look through the database, and probably deal with an O(n^2) overhead trying to merge things (especially bad since the DB is a key/value system vaguely similar to the Windows System Registry, and so is not so efficient for generalized queries...).
simple solution: I don't do this...
if the hash approach doesn't work, then the structures do not merge. it is not that difficult really...
>> the semantic type heirarchy is, as I see it, an entirely different >> system.
> Why? You need a tree distance defined on type tags, otherwise they would > be > useless.
in my compiler framework, they are implemented as 2 essentially parallel structures. more or less, this is a side effect of how C is designed, and C++ (more or less) follows suit (C++ blurs the line a little, but the split more or less remains).
the result is that there is an internal split between "types" and "structure".
within the metadata DB, both types and structures are defined, typically, in terms of QNames (or, alternatively, hashes, if they are used, along with GUIDs, but I have not had much reason thus far for using/allowing GUIDs as identifiers, and have instead typically been using Base64-coded 96-bit UIDs instead...).
there is also fairly heavy use of signatures, which in this case may both reference the QName (as well as indicating the way in which this qname is being used), as well as identify primitive types and other things.
hmm: I am now left tempted by the idea of a B64-GUID, or essentially a GUID using a Base64 representation as opposed to the "{XXXXXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXX}" notation...
the reason: 22 or 23 characters (vs like 38 chars).
my 96 bit UIDs, however, nicely take up 16 chars (raw representation...).
A..Z a..z 0..9 _ $
>> now, in my case, I tend to assume that >> lexical-equivalence=structural-equivalence.
> Huh, but that is what named equivalence is. It compares names and ignores > any semantics!
>>>> To start with you have to calculate the hash value from a string, which >>>> alone is too slow as compared to simple indexing.
>>> typically, hashes and indices are trivial to calculate.
>>O(n), where n is the length of the string as compared to O(1) of indexing.
> I'm not sure what you mean by indexing,
I meant evaluation of the target of a dispatching call. When the dispatching table is represented by a dense array (the single dispatch case) indexed by the type tag. Ideal performance of dispatch should be close to that.
> but for any meaning that > occurs to me, this is not right.
> If we have N distinct strings it takes log(N) decisions to > distinguish between them. Moreover the average string length, n, > must be >= log2(N) bits.
You consider here a dispatching table represented by a binary tree. cr88192 did consider hash tables. My response was that evaluation of a hash value is still O(n).
> Furthermore O(log n) is the cost of determining that a string s is > possibly in our set of strings. To verify that s actually is in > the set of strings we must check the the entire string. I.e., > verification is O(n).
> In short, the cost of looking of looking up a string in a set of > strings is O(n). The cost can be concealed by a languages syntax > but it is there.
On Sun, 21 Jun 2009 10:39:30 -0700, cr88192 wrote: > "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de> wrote in message > news:fykmjqnzt20z.o7lx4giogyeq.dlg@40tude.net... >> On Sat, 20 Jun 2009 10:28:15 -0700, cr88192 wrote: >>> typically, hashes and indices are trivial to calculate.
>> O(n), where n is the length of the string as compared to O(1) of indexing.
> theoretically, yes. > in practice, this usually does not matter (as small loops are typically > really fast and most strings are short).
Not in the case when you have to pack all existing types into the table. The idea was to handle it global by the OS.
> in cases where performance is important, one need not do lookups with > strings.
The language does not know if performance is important when the programmer writes f(x). It has to decide how to implement the call. If the expected performance of an OS service is poor, it will not be used at all.
>>>> Further type tags must be ordered according to the <: relation. An >>>> implementation of <: is required for tests like if the type S is in the >>>> class rooted in the type T.
>>> this is a special case...
>> You need in order to implement dynamic cast.
> ok. yes, but I was noting the "entirety" of the typesystem, not simply > dynamic cast.
> dynamic cast can use special mechanisms (as in, directly interfacing with > the object system), and thus bypassing any direct usage of these strings.
So it is that "bypassing" which actually does all the job. (:-)) So you need something that does it (without strings and hash tables). That thing could be made a part of OS. But I doubt it would be easy to do.
> I think I mentally before wondered if the hash table wondered much in the > face of an operation which simply walks the inheritence tree, but it could > be a lot more expensive in the MI case or similar, so the hash is probably > reasonable.
I don't believe in hash tables. Additionally to already discussed:
you also need a very efficient merge and split of the dispatching tables.
This is an issue when types are declared in a scope (you extend the dispatching table by adding new type tags = new rows and columns there). Then, when the type leaves the scope and dies you have to collapse the tables. It would be extremely expensive operation with hashes. And as an OS service that is always the case, since applications come and go. So we have to forget that nice global scope static types.
> in my case, it is mostly a matter of relevance to unnamed types, than of > named types...
> consider: > struct { int x; int y; } foo; > ... > struct { int x; int y; } bar;
> (stuff like this actually happens in practice).
> really, should each struct be assumed to be a completely unique type (and > thus, through subsequent runs, fill up the metadata database with endless > versions of essentially the same struct), or can we just safely merge them, > thus avoiding a "database getting gradually filled with garbage" level of > threat?...
Because when f is defined on foo that does not make it defined on bar.
> the issue here is what to do about cases where the original programmer did > not bother to name the type, and the compiler is left figuring out what to > do about it.
You introduce a unique ID, which is the type tag.
> about the halting problem, I don't see why this is the case.
Because two types are equivalent if and only if all operations defined on them are equivalent. If somebody defines f on foo, that alone would make it distinct from bar. Note that members like x and y are actually operations too, which get and set the corresponding components. So foo and bar are equivalent because both have "get x" and "set x" of type integer.
>>>> typically, hashes and indices are trivial to calculate.
>>> O(n), where n is the length of the string as compared to O(1) of >>> indexing.
>> theoretically, yes. >> in practice, this usually does not matter (as small loops are typically >> really fast and most strings are short).
> Not in the case when you have to pack all existing types into the table. > The idea was to handle it global by the OS.
not so much of a problem... one can just allocate a hash table with maybe 1M entries, and hope that it does not run out (or, if it does, endure the cost of re-hashing into a bigger table...).
>> in cases where performance is important, one need not do lookups with >> strings.
> The language does not know if performance is important when the programmer > writes f(x). It has to decide how to implement the call. If the expected > performance of an OS service is poor, it will not be used at all.
hmm... many people tolerate Vista and Windows 7...
I have noticed that many typical Win32 API calls are FAR slower than even code which is done dynamically and via ASCII-based accessors (such as doing an interface method call and dispatching the method based on a textual name and signature, rather than via method handles...).
the slowness of the Win32 API is almost notable in its own right...
anyways, one can generally optimize according to commonality. machinery likely to be core functionality of a framework (such as accessing fields or calling methods) can be highly optimized.
this does not mean that every other piece of machinery need follow the exact same design...
>>>>> Further type tags must be ordered according to the <: relation. An >>>>> implementation of <: is required for tests like if the type S is in >>>>> the >>>>> class rooted in the type T.
>>>> this is a special case...
>>> You need in order to implement dynamic cast.
>> ok. yes, but I was noting the "entirety" of the typesystem, not simply >> dynamic cast.
>> dynamic cast can use special mechanisms (as in, directly interfacing with >> the object system), and thus bypassing any direct usage of these strings.
> So it is that "bypassing" which actually does all the job. (:-)) So you > need something that does it (without strings and hash tables). That thing > could be made a part of OS. But I doubt it would be easy to do.
yeah...
strings are typically my "cannonical" representation, but not every operation needs to use them...
it is just, which would one rather have for everything: strings; specialized structures and bit-twiddling.
I prefer to use strings, as then I can represent things fairly uniformly (it doesn't matter which exact subsystem or similar I am interacting with...).
some subsystems, however, may do things different internally...
>> I think I mentally before wondered if the hash table wondered much in the >> face of an operation which simply walks the inheritence tree, but it >> could >> be a lot more expensive in the MI case or similar, so the hash is >> probably >> reasonable.
> I don't believe in hash tables. Additionally to already discussed:
> you also need a very efficient merge and split of the dispatching tables.
> This is an issue when types are declared in a scope (you extend the > dispatching table by adding new type tags = new rows and columns there). > Then, when the type leaves the scope and dies you have to collapse the > tables. It would be extremely expensive operation with hashes. And as an > OS > service that is always the case, since applications come and go. So we > have > to forget that nice global scope static types.
once added we could "pretend" that types don't die...
that or, in my object system, the approach is basically to periodically "rehash" in a way which lesser-used entries are discarded (rehashing is not cheap, but it is also rare...).
as such, types will die when the table is rehashed, and they fall out of visibility.
as for the inheritence-checking hash, this has is small, and is more of a cache. if/when a type is actually destroyed, or another critical change is made, the hash can be simply cleared and then rebuilt (I don't use this as much for bigger hashes, as it can be a far more severe hit, overall, than rehashing...).
>> in my case, it is mostly a matter of relevance to unnamed types, than of >> named types...
>> consider: >> struct { int x; int y; } foo; >> ... >> struct { int x; int y; } bar;
>> (stuff like this actually happens in practice).
>> really, should each struct be assumed to be a completely unique type (and >> thus, through subsequent runs, fill up the metadata database with endless >> versions of essentially the same struct), or can we just safely merge >> them, >> thus avoiding a "database getting gradually filled with garbage" level of >> threat?...
> Because when f is defined on foo that does not make it defined on bar.
no, as then the structure has changed, and they will no longer be merged...
basically, adding a property will essentially change the structure, and thus the hash, and so it will no longer try to merge with the older type, rather it will try to merge with whatever type has the new hash (should such a type exist).
>> the issue here is what to do about cases where the original programmer >> did >> not bother to name the type, and the compiler is left figuring out what >> to >> do about it.
> You introduce a unique ID, which is the type tag.
the problem here is with multiple runs and a large persistent metadata database...
prior to the DB, this worked plenty well. after adding the DB, each new run essentially creates garbage, in the form of new UIDs and redefinitions of essentially the same types...
to deal with this would require some means of effectively "cleaning" the DB of old entries, and preventing it from gradually being weighed down by huge amounts of garbege...
>> about the halting problem, I don't see why this is the case.
> Because two types are equivalent if and only if all operations defined on > them are equivalent. If somebody defines f on foo, that alone would make > it > distinct from bar. Note that members like x and y are actually operations > too, which get and set the corresponding components. So foo and bar are > equivalent because both have "get x" and "set x" of type integer.
this is only if said operations are defined externally...
if they are methods, they are part of the structure, and no such merging would happen in the first place.
if they are defined via operator overloading, then they exist as part of the types' heirarchy, and not as part of the structures' heirarchy (a given structural type may be shared between several ADTs, and these ADTs will continue being unique, even if the structure is shared...).
if operator overloading were made to merge the overloaded operators into the structure, then effectively they would join with said structure, thus changing its hash and preventing merging.
so, again, there does not seem to be any real problem IMO...
On Sun, 21 Jun 2009 15:46:07 -0700, cr88192 wrote: > "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de> wrote in message > news:107uhrk3t7q4x$.pu1rjhwu0wpp$.dlg@40tude.net... >> On Sun, 21 Jun 2009 10:39:30 -0700, cr88192 wrote: >>> theoretically, yes. >>> in practice, this usually does not matter (as small loops are typically >>> really fast and most strings are short).
>> Not in the case when you have to pack all existing types into the table. >> The idea was to handle it global by the OS.
> not so much of a problem... > one can just allocate a hash table with maybe 1M entries, and hope that it > does not run out (or, if it does, endure the cost of re-hashing into a > bigger table...).
...in a flight control system during a flight across Atlantic...
>>> in cases where performance is important, one need not do lookups with >>> strings.
>> The language does not know if performance is important when the programmer >> writes f(x). It has to decide how to implement the call. If the expected >> performance of an OS service is poor, it will not be used at all.
> hmm... many people tolerate Vista and Windows 7...
There is already Windows. You cannot do it better (sorry worse (:-)). It is perfectly bad.
> anyways, one can generally optimize according to commonality.
I don't believe it. If an optimum exists and can be found, why don't you implement the optimum from the start?
>>> dynamic cast can use special mechanisms (as in, directly interfacing with >>> the object system), and thus bypassing any direct usage of these strings.
>> So it is that "bypassing" which actually does all the job. (:-)) So you >> need something that does it (without strings and hash tables). That thing >> could be made a part of OS. But I doubt it would be easy to do.
> yeah...
> strings are typically my "cannonical" representation, but not every > operation needs to use them...
>>> I think I mentally before wondered if the hash table wondered much in the >>> face of an operation which simply walks the inheritence tree, but it could >>> be a lot more expensive in the MI case or similar, so the hash is probably >>> reasonable.
>> I don't believe in hash tables. Additionally to already discussed:
>> you also need a very efficient merge and split of the dispatching tables.
>> This is an issue when types are declared in a scope (you extend the >> dispatching table by adding new type tags = new rows and columns there). >> Then, when the type leaves the scope and dies you have to collapse the >> tables. It would be extremely expensive operation with hashes. And as an OS >> service that is always the case, since applications come and go. So we have >> to forget that nice global scope static types.
> once added we could "pretend" that types don't die...
Oh yes, that is the Windows approach. If something goes wrong - reboot the damn beast...
> as for the inheritence-checking hash, this has is small, and is more of a > cache. if/when a type is actually destroyed, or another critical change is > made, the hash can be simply cleared and then rebuilt (I don't use this as > much for bigger hashes, as it can be a far more severe hit, overall, than > rehashing...).
That makes it totally useless for a language that creates type in a tight loop. For example, in Ada one often writes:
procedure Foo (X : <some-array-type>) is subtype Index is Integer range X'Range; -- The subtype which is always in the bounds of X. begin ... -- The compiler can omit any bound checks related to -- use Index with X
This is innocent hint to the compiler with turn catastrophic with what you proposed when Foo is called in a loop.
>> Because when f is defined on foo that does not make it defined on bar.
> no, as then the structure has changed, and they will no longer be merged...
In that case you will need to clone types re-hash each time an operation get declared. It does not worth the efforts, you save a paar KBytes at the cost of a massive distributed overhead.
>> You introduce a unique ID, which is the type tag.
> the problem here is with multiple runs and a large persistent metadata > database...
This is what I was trying to convey.
> prior to the DB, this worked plenty well. > after adding the DB, each new run essentially creates garbage, in the form > of new UIDs and redefinitions of essentially the same types...
> to deal with this would require some means of effectively "cleaning" the DB > of old entries, and preventing it from gradually being weighed down by huge > amounts of garbege...
Collecting the garbage implies that the type tag is a system resource, which makes it even more expensive to use...
>>> about the halting problem, I don't see why this is the case.
>> Because two types are equivalent if and only if all operations defined on >> them are equivalent. If somebody defines f on foo, that alone would make it >> distinct from bar. Note that members like x and y are actually operations >> too, which get and set the corresponding components. So foo and bar are >> equivalent because both have "get x" and "set x" of type integer.
> this is only if said operations are defined externally...
You mean free functions?
> if they are methods, they are part of the structure, and no such merging > would happen in the first place.
The relation between a type and its operations (methods) is n to m. Neither is a part of another.
Anyway, the point is, structural equivalence requires looking into the operations in order to determine equivalence. This is halting.
> if they are defined via operator overloading, then they exist as part of the > types' heirarchy, and not as part of the structures' heirarchy (a given > structural type may be shared between several ADTs, and these ADTs will > continue being unique, even if the structure is shared...).
class X { virtual void Foo (); }; class Y { virtual void Foo (); };
>>>>> To start with you have to calculate the hash value from a string, which >>>>> alone is too slow as compared to simple indexing.
>>>> typically, hashes and indices are trivial to calculate.
>>>O(n), where n is the length of the string as compared to O(1) of indexing.
>> I'm not sure what you mean by indexing,
>I meant evaluation of the target of a dispatching call. When the >dispatching table is represented by a dense array (the single dispatch >case) indexed by the type tag. Ideal performance of dispatch should be >close to that.
Thank you for the context. Yes, given that there exists a precomputed type tag, it will be more efficient time wise to do a table lookup using the precomputed tag than it is compute a new type signature.
>> but for any meaning that >> occurs to me, this is not right.
>> If we have N distinct strings it takes log(N) decisions to >> distinguish between them. Moreover the average string length, n, >> must be >= log2(N) bits.
>You consider here a dispatching table represented by a binary tree. cr88192 >did consider hash tables. My response was that evaluation of a hash value >is still O(n).
No, I am not considering a dispatching table represented by a binary tree. I was considering that which I considered in the text that I wrote.
>> Furthermore O(log n) is the cost of determining that a string s is >> possibly in our set of strings. To verify that s actually is in >> the set of strings we must check the the entire string. I.e., >> verification is O(n).
>> In short, the cost of looking of looking up a string in a set of >> strings is O(n). The cost can be concealed by a languages syntax >> but it is there.
>Yes!
Just as a side note, one should remember that the use of the big O() function in discussions of programming is almost always a convenient simplification. We aren't actually interested in the asymptotic value of cost functions as n goes to infinity - n is always bounded for some perhaps unknown value of n. Moreover usage seldom reflects all costs; rather it reflects those costs that are considered relevant in context.
This side note is not a disagreement with what you wrote; it is merely a general observation.
>>>> theoretically, yes. >>>> in practice, this usually does not matter (as small loops are typically >>>> really fast and most strings are short).
>>> Not in the case when you have to pack all existing types into the table. >>> The idea was to handle it global by the OS.
>> not so much of a problem... >> one can just allocate a hash table with maybe 1M entries, and hope that >> it >> does not run out (or, if it does, endure the cost of re-hashing into a >> bigger table...).
> ...in a flight control system during a flight across Atlantic...
you would have to have a MASSIVE hash to have any real effect here... a delay of maybe a few us is not likely to have any real effect here.
of course, you would probably not want to use a garbage collector though (since these may easily have delays of several hundred MS or more).
>>>> in cases where performance is important, one need not do lookups with >>>> strings.
>>> The language does not know if performance is important when the >>> programmer >>> writes f(x). It has to decide how to implement the call. If the expected >>> performance of an OS service is poor, it will not be used at all.
>> hmm... many people tolerate Vista and Windows 7...
> There is already Windows. You cannot do it better (sorry worse (:-)). It > is > perfectly bad.
yep, maybe...
>> anyways, one can generally optimize according to commonality.
> I don't believe it. If an optimum exists and can be found, why don't you > implement the optimum from the start?
it is too much effort to optimize everything. as a result, I optimize whatever is going slow, AKA: whatever starts showing up in the profiler...
>>>> dynamic cast can use special mechanisms (as in, directly interfacing >>>> with >>>> the object system), and thus bypassing any direct usage of these >>>> strings.
>>> So it is that "bypassing" which actually does all the job. (:-)) So you >>> need something that does it (without strings and hash tables). That >>> thing >>> could be made a part of OS. But I doubt it would be easy to do.
>> yeah...
>> strings are typically my "cannonical" representation, but not every >> operation needs to use them...
> ... and operations on type tags are among those!
well, it works...
granted, this GC/typesystem is not the fastest possible option, but in general it is "plenty fast enough". partly this is because I don't do a whole lot of fine-grain typechecking though.
it is not a very good GC setup for dynamically typed interpreters, for various reasons (actually, its garbage collection behavior is a much worse offender), but works plenty well for statically typed code...
as for type signatures, most of these result from compiling said statically typed code...
>>>> I think I mentally before wondered if the hash table wondered much in >>>> the >>>> face of an operation which simply walks the inheritence tree, but it >>>> could >>>> be a lot more expensive in the MI case or similar, so the hash is >>>> probably >>>> reasonable.
>>> I don't believe in hash tables. Additionally to already discussed:
>>> you also need a very efficient merge and split of the dispatching >>> tables.
>>> This is an issue when types are declared in a scope (you extend the >>> dispatching table by adding new type tags = new rows and columns there). >>> Then, when the type leaves the scope and dies you have to collapse the >>> tables. It would be extremely expensive operation with hashes. And as an >>> OS >>> service that is always the case, since applications come and go. So we >>> have >>> to forget that nice global scope static types.
>> once added we could "pretend" that types don't die...
> Oh yes, that is the Windows approach. If something goes wrong - reboot the > damn beast...
yep.
although, if done well, a rehash or GC operation can be done to clear out full hashes...
>> as for the inheritence-checking hash, this has is small, and is more of a >> cache. if/when a type is actually destroyed, or another critical change >> is >> made, the hash can be simply cleared and then rebuilt (I don't use this >> as >> much for bigger hashes, as it can be a far more severe hit, overall, than >> rehashing...).
> That makes it totally useless for a language that creates type in a tight > loop. For example, in Ada one often writes:
> procedure Foo (X : <some-array-type>) is > subtype Index is Integer range X'Range; > -- The subtype which is always in the bounds of X. > begin > ... -- The compiler can omit any bound checks related to > -- use Index with X
> This is innocent hint to the compiler with turn catastrophic with what you > proposed when Foo is called in a loop.
that is only if "ranges" are implemented as "proper" types.
anyways, I don't really expect things like this to happen in much practice.
anyways, the above table in question is for the C/I-OO system, and so, it would only really be an issue if one is creating a constant stream of new classes, and at this point even then the impact is likely to be modest (it falls back to doing linear inheritence checks).
creating a constant stream of classes is likely to create far worse problems than just poor hashing behavior though.
>>> Because when f is defined on foo that does not make it defined on bar.
>> no, as then the structure has changed, and they will no longer be >> merged...
> In that case you will need to clone types re-hash each time an operation > get declared. It does not worth the efforts, you save a paar KBytes at the > cost of a massive distributed overhead.
not really.
types are not (themselves) mutable... similarly, they don't force rebuilding the hash tables.
only type deletion would cause this, and in general, type deletion does not happen explicitly (more often, types fall out of visibility and gradually disappear).
>>> You introduce a unique ID, which is the type tag.
>> the problem here is with multiple runs and a large persistent metadata >> database...
> This is what I was trying to convey.
note that simply creating a new name, will make the problem worse. this is why I tend to use hash values, as a hash value will tend to be deterministic, and thus has much better "merge" properties against said persistent database.
>> prior to the DB, this worked plenty well. >> after adding the DB, each new run essentially creates garbage, in the >> form >> of new UIDs and redefinitions of essentially the same types...
>> to deal with this would require some means of effectively "cleaning" the >> DB >> of old entries, and preventing it from gradually being weighed down by >> huge >> amounts of garbege...
> Collecting the garbage implies that the type tag is a system resource, > which makes it even more expensive to use...
database size is the system resource...
bigger database means more wasted time and memory.
as a result, the generation of DB garbage should be minimized.
>>>> about the halting problem, I don't see why this is the case.
>>> Because two types are equivalent if and only if all operations defined >>> on >>> them are equivalent. If somebody defines f on foo, that alone would make >>> it >>> distinct from bar. Note that members like x and y are actually >>> operations >>> too, which get and set the corresponding components. So foo and bar are >>> equivalent because both have "get x" and "set x" of type integer.
>> this is only if said operations are defined externally...
> You mean free functions?
free functions are not operations... I don't need to care what functions are defined, since a function is not a part of the type.
a function is a function...
I had thought you were talking about operator overloading, ...
>> if they are methods, they are part of the structure, and no such merging >> would happen in the first place.
> The relation between a type and its operations (methods) is n to m. > Neither > is a part of another.
> Anyway, the point is, structural equivalence requires looking into the > operations in order to determine equivalence. This is halting.
no, I don't do it this way.
a class, for example, is a list of slots and methods. the only methods which can exist for a given class are those defined for this class, and they are physically located "within" the class.
classes with different method declarations are thus unique.
methods are internally associated with classes based on QName, meaning that, as is, as far as definitions go, only named classes can have methods at present (although, this is at the metadata/IL level).
at the language level, a class can have internally-defined methods, which could apply to unnamed classes (an unnamed class, for technical reasons, can't have external methods).
in the compiler, the class would be hashed (in this case, likely including said methods), and so if the classes are not identical, they will not merge.
if 2 unnammed classes both have identical definitions and identical methods, they may as well be merged...
however, the point is moot, as I don't think this is likely to happen much in practice.
more so, this could only happen in C++ (in C# and Java, these languages will not allow this particular situation to happen).
>> if they are defined via operator overloading, then they exist as part of >> the >> types' heirarchy, and not as part of the structures' heirarchy (a given >> structural type may be shared between several ADTs, and these ADTs will
>>>>>> To start with you have to calculate the hash value from a string, >>>>>> which >>>>>> alone is too slow as compared to simple indexing.
>>>>> typically, hashes and indices are trivial to calculate.
>>>>O(n), where n is the length of the string as compared to O(1) of >>>>indexing.
>>> I'm not sure what you mean by indexing,
>>I meant evaluation of the target of a dispatching call. When the >>dispatching table is represented by a dense array (the single dispatch >>case) indexed by the type tag. Ideal performance of dispatch should be >>close to that.
> Thank you for the context. Yes, given that there exists a > precomputed type tag, it will be more efficient time wise to do a > table lookup using the precomputed tag than it is compute a new > type signature.
yep, or we can statically lookup the method and then hold a method handle...
granted, this is best suited to static typing, not dynamic typing...
however, my present emphasis is on primarily optimizing the statically typed case, and dealing with dynamic languages primarily by looking for "invariants" and compiling them to statically typed code...
FWIW, this can also help locate "questionable" constructions in dynamic languages, which are often more likely to be programmer error than the intended behavior.
>>> but for any meaning that >>> occurs to me, this is not right.
>>> If we have N distinct strings it takes log(N) decisions to >>> distinguish between them. Moreover the average string length, n, >>> must be >= log2(N) bits.
>>You consider here a dispatching table represented by a binary tree. >>cr88192 >>did consider hash tables. My response was that evaluation of a hash value >>is still O(n).
> No, I am not considering a dispatching table represented by a > binary tree. I was considering that which I considered in the text > that I wrote.
he seems to have an issue with losing the context and then arguing about things in a context different than where they were stated.
>>> Furthermore O(log n) is the cost of determining that a string s is >>> possibly in our set of strings. To verify that s actually is in >>> the set of strings we must check the the entire string. I.e., >>> verification is O(n).
>>> In short, the cost of looking of looking up a string in a set of >>> strings is O(n). The cost can be concealed by a languages syntax >>> but it is there.
>>Yes!
> Just as a side note, one should remember that the use of the big > O() function in discussions of programming is almost always a > convenient simplification. We aren't actually interested in the > asymptotic value of cost functions as n goes to infinity - n is > always bounded for some perhaps unknown value of n. Moreover usage > seldom reflects all costs; rather it reflects those costs that are > considered relevant in context.
> This side note is not a disagreement with what you wrote; it is > merely a general observation.
yeah...
in particular, what is important is not some "theoretical" complexity, rather the practical performance.
O(n) where n is both small and bounded, and where O(1) is absurdly cheap, is IMO not worth worrying about.
O(log2 n) where O(1) is very expensive could very well be a worse situation than O(n) where O(1) is cheap, except for some sufficiently large n (and if n is known to be below this threshold, it makes little sense to use an algo which will be slower).
FFS, in some cases I use O(n^2) and O(n^3) algos to good effect in cases where I know that n is sufficiently small that it will not matter to the overall performance.
so, the big concern as I see it, is the actual performance (as in, testable, measurable, and verifiable with the profiler), not some theoretical complexity.
and, FWIW, hashing typically improves performance FAR more than it hurts it.
On Mon, 22 Jun 2009 10:26:18 -0700, cr88192 wrote: > "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de> wrote in message > news:xssk85vmx8ts.1a6biujdmgcr5$.dlg@40tude.net... >> I don't believe it. If an optimum exists and can be found, why don't you >> implement the optimum from the start?
> it is too much effort to optimize everything. > as a result, I optimize whatever is going slow, AKA: whatever starts showing > up in the profiler...
Things put into the OS must be close to optimal.
>>> strings are typically my "cannonical" representation, but not every >>> operation needs to use them...
>> ... and operations on type tags are among those!
> well, it works...
I have no doubts about it. The question was if it makes sense to put that into the OS. To answer this, making it just working is not enough. It must work optimal for most of possible uses.
>>> as for the inheritence-checking hash, this has is small, and is more of a >>> cache. if/when a type is actually destroyed, or another critical change >>> is >>> made, the hash can be simply cleared and then rebuilt (I don't use this >>> as >>> much for bigger hashes, as it can be a far more severe hit, overall, than >>> rehashing...).
>> That makes it totally useless for a language that creates type in a tight >> loop. For example, in Ada one often writes:
>> procedure Foo (X : <some-array-type>) is >> subtype Index is Integer range X'Range; >> -- The subtype which is always in the bounds of X. >> begin >> ... -- The compiler can omit any bound checks related to >> -- use Index with X
>> This is innocent hint to the compiler with turn catastrophic with what you >> proposed when Foo is called in a loop.
> that is only if "ranges" are implemented as "proper" types.
Yes. Are you saying that tags of "improper" types need not to be handled by the OS? (:-))
> creating a constant stream of classes is likely to create far worse problems > than just poor hashing behavior though.
Yes, if the implementation is poor.
Now considering a distributed networking system of the next generation, you will inevitable face this problem. You cannot put all types into one node. It means that types will be massively distributed between communicating nodes (and applications). So you will create and kill types all the time in the same way, we presently create and kill values. This is IMO a characteristic of "new generation" OS - working with types *routinely* = at virtually zero cost.
>>>> Because when f is defined on foo that does not make it defined on bar.
>>> no, as then the structure has changed, and they will no longer be >>> merged...
>> In that case you will need to clone types re-hash each time an operation >> get declared. It does not worth the efforts, you save a paar KBytes at the >> cost of a massive distributed overhead.
> not really.
> types are not (themselves) mutable...
Yes.
> similarly, they don't force rebuilding the hash tables.
No. Dispatching table is not a property of a type. It is of a polymorphic operation of a class (set of types). The set of types is mutable. You can create new types derived form another. That does not mutate the base. It mutates the dispatching table.
> only type deletion would cause this, and in general, type deletion does not > happen explicitly (more often, types fall out of visibility and gradually > disappear).
I see no difference. Once the type vanishes, the dispatching table should be cleaned in order to reuse the type tag and memory.
>>> if they are methods, they are part of the structure, and no such merging >>> would happen in the first place.
>> The relation between a type and its operations (methods) is n to m. >> Neither is a part of another.
>> Anyway, the point is, structural equivalence requires looking into the >> operations in order to determine equivalence. This is halting.
> no, I don't do it this way.
As I said, what you call structural is in rather nominal based on signatures.
> a class, for example, is a list of slots and methods. > the only methods which can exist for a given class are those defined for > this class, and they are physically located "within" the class.
This model is wrong because it does not support multi-methods. The relation in not 1 to n, as a matter of fact.
>>> if they are defined via operator overloading, then they exist as part of the >>> types' heirarchy, and not as part of the structures' heirarchy (a given >>> structural type may be shared between several ADTs, and these ADTs will >>> continue being unique, even if the structure is shared...).
>> class X { virtual void Foo (); }; >> class Y { virtual void Foo (); };
>> are these two equivalent?
> no, these two classes are named and have different names.
> mostly it is an issue for things like: > struct { int x; } foo; > ... > struct { int x; } bar;
OK, I see the first declaration of struct { int } introduces an anonymous type. The second declaration refers to this declaration.
I don't think it makes sense to support type inference in OS (I would rather consider equivalence of foo and bar in your example as type inference). This could be dealt with at the language level.
> similarly, my framework also typically merges strings as well... > feel like complaining about this next?...
Yes, because there exist different attitudes to the issue. In other languages anonymously declared types might be treated as not equivalent. Compare to Ada:
Foo : array (1..2) of Integer; -- Anonymous array type is declared here Bar : array (1..2) of Integer;
Foo := Bar; -- This is illegal, two anonymous types are not equivalent
>>> I don't believe it. If an optimum exists and can be found, why don't you >>> implement the optimum from the start?
>> it is too much effort to optimize everything. >> as a result, I optimize whatever is going slow, AKA: whatever starts >> showing >> up in the profiler...
> Things put into the OS must be close to optimal.
maybe, but some things are more important than others.
granted, for an OS is might be a little harder to run it though a profiler. in my past OS, one idea I once discovered to make it easier to profile and debug, was basically to rig up code so that the kernel could be built in "hosted" mode, AKA, running on top of an existing OS. this then allowed me to profile and debug things more easily, without as often booting the thing and trying to figure out just where the hell the crash was.
it might be a little more difficult if the OS involves loadable binaries, as one would have to devise a loader and process manager also able to cooperate with the host OS (such as Windows or Linux).
this is made a lot easier if ones binaries and libraries are load-time relocatable.
I forget exactly, but I think I had used fixed-address binaries (in the PE/COFF format), which had been set to relocate to an odd address (vague memory of 0x00C00000 or 0x0C000000 or similar).
either way, this way could allow a "dummy stub" to get the binary loaded into the process, and get the libraries loaded. note that this would mean using a different syscall mechanism between hosted and non-hosted operation (such as using one lib for core syscalls in hosted mode, and another for non-hosted).
from what I remember though, my kernel did not use fully separate process spaces, but at the time I was lazy and typically relocated binaries on load so they went in the same address space as the kernel (I think I had binaries with relocations left in or similar).
this would simplify loading, but is not so good from a design perspective...
(if I were to do OS-dev now, I would probably likely implement a partial special purpose CPU emulator and custom profiler and debugger, but this is more because now I would likely be targetting a long-mode kernel, which would likely be a PITA to run hosted...).
or such...
>>>> strings are typically my "cannonical" representation, but not every >>>> operation needs to use them...
>>> ... and operations on type tags are among those!
>> well, it works...
> I have no doubts about it. The question was if it makes sense to put that > into the OS. To answer this, making it just working is not enough. It must > work optimal for most of possible uses.
I think a lot of this (in its early form) was used in an OS.
basically, my modern codebase is more or less a distant descendant of my OS project, where long ago I gave up on OS dev (concluding that it was not likely to achieve useful results), and so all of the non-hosted parts (drivers, ...) eventually fell off, and other parts atrophied and disappeared...
it is worth noting that later, a few minor bits which had fell off, were re-incorporated into my compiler framework, such as chunks of the PE/COFF loader, ...
>>>> as for the inheritence-checking hash, this has is small, and is more of >>>> a >>>> cache. if/when a type is actually destroyed, or another critical change >>>> is >>>> made, the hash can be simply cleared and then rebuilt (I don't use this >>>> as >>>> much for bigger hashes, as it can be a far more severe hit, overall, >>>> than >>>> rehashing...).
>>> That makes it totally useless for a language that creates type in a >>> tight >>> loop. For example, in Ada one often writes:
>>> procedure Foo (X : <some-array-type>) is >>> subtype Index is Integer range X'Range; >>> -- The subtype which is always in the bounds of X. >>> begin >>> ... -- The compiler can omit any bound checks related to >>> -- use Index with X
>>> This is innocent hint to the compiler with turn catastrophic with what >>> you >>> proposed when Foo is called in a loop.
>> that is only if "ranges" are implemented as "proper" types.
> Yes. Are you saying that tags of "improper" types need not to be handled > by > the OS? (:-))
integer ranges to me seem more like a compiler hack, so the underlying type would likely just be a plain integer...
>> creating a constant stream of classes is likely to create far worse >> problems >> than just poor hashing behavior though.
> Yes, if the implementation is poor.
creating lots of classes will end up using up lots of memory in the C/I system, and at present, there are no mechanisms by which classes can be "garbage". as such, creating a constant stream of "new" classes would essentially use up all the memory...
the main advantage is that, in good old C-family languages, this does not really happen (except maybe in Java, but this creates "anonymous classes" which I think I implemented as GC friendly).
> Now considering a distributed networking system of the next generation, > you > will inevitable face this problem. You cannot put all types into one node. > It means that types will be massively distributed between communicating > nodes (and applications). So you will create and kill types all the time > in > the same way, we presently create and kill values. This is IMO a > characteristic of "new generation" OS - working with types *routinely* = > at > virtually zero cost.
potentially...
however, the question is whether they will be enough "unique" types in the entirety of the codebase running on such an OS to make this too much of a problem.
actually, a way to handle this is probably just making a separate "per-assembly" types DB, then loading this DB for whatever process is being loaded (and for whatever libraries/assemblies are loaded).
AKA: the .NET strategy...
as is, I maintain a single global types DB, but mostly this is because this was less effort than bothering to implement a separate DB per assembly.
I had a few times considered the possibility of splitting up the DB, though probably this would be either through "mounts" or "hives". however, I have not had reason to do so thus far.
>>>>> Because when f is defined on foo that does not make it defined on bar.
>>>> no, as then the structure has changed, and they will no longer be >>>> merged...
>>> In that case you will need to clone types re-hash each time an operation >>> get declared. It does not worth the efforts, you save a paar KBytes at >>> the >>> cost of a massive distributed overhead.
>> not really.
>> types are not (themselves) mutable...
> Yes.
>> similarly, they don't force rebuilding the hash tables.
> No. Dispatching table is not a property of a type. It is of a polymorphic > operation of a class (set of types). The set of types is mutable. You can > create new types derived form another. That does not mutate the base. It > mutates the dispatching table.
yes.
but, this is not (exactly) how my hashes are used.
I use hash tables for interface method calls, but not for virtual method calls (except for currently in the MI case).
the reason this is is, because I can make use of a method handle, and directly index the vtable slot...
for interfaces, the traditional mechanism gets rather ugly, and is not likely to be much faster than a hash table. so, my solution was to basically produce a joint hash value (class and method), and look up the method in a big hash table (and add the requested method if not found).
granted, the hash table I use for this is not exactly small (I think it has initially 2^18 entries or similar). (note that I also precompute the hash values, and so combining them and performing a lookup is fairly fast).
for MI, a similar mechanism is used, but I have since discovered that this is "ill advised" (namely, in that it can't correctly implement C++'s MI semantics).
the issue is mostly that MI is implemented by creating a new class which "merges" all the superclasses, and uses some internal amount of hackery to redirect superclass fields and methods into the new merged class (or, essentially, MI is implemented as hacked SI).
>> only type deletion would cause this, and in general, type deletion does >> not >> happen explicitly (more often, types fall out of visibility and gradually >> disappear).
> I see no difference. Once the type vanishes, the dispatching table should > be cleaned in order to reuse the type tag and memory.
well, it will happen, but gradually... once a type falls out of use, it will be cleared out of the hashes eventually. the main thing would be to allow named classes to be garbage collected, which presently is not done (since all the classes are linked together in the class manager).
note that the class-manager holds resident classes, but classes may exist in the DB and be non-resident (AKA: just a collection of DB entries). trying to instance the class may cause it to be loaded from the DB.
a remote possibility could be an "unload" feature for resident classes where there are no instances (or resident subclasses). however, this would be best done sparingly, as reloading would be expensive.
>> Yes. Are you saying that tags of "improper" types need not to be handled >> by the OS? (:-))
> integer ranges to me seem more like a compiler hack, so the underlying type > would likely just be a plain integer...
There is no such thing like "underlying type" or "plain integer". If any that is an implementation detail.
It is not only wrong to think in that, it is just wrong. Consider modular integer of the range 1..3. Again, type is a set of values and operations defined on them, period. Nothing beyond that.
>>> creating a constant stream of classes is likely to create far worse problems >>> than just poor hashing behavior though.
>> Yes, if the implementation is poor.
> creating lots of classes will end up using up lots of memory in the C/I > system, and at present, there are no mechanisms by which classes can be > "garbage". as such, creating a constant stream of "new" classes would > essentially use up all the memory...
Yes, if the implementation is poor... (:-))
>>>>> if they are methods, they are part of the structure, and no such merging >>>>> would happen in the first place.
>>>> The relation between a type and its operations (methods) is n to m. >>>> Neither is a part of another.
>>>> Anyway, the point is, structural equivalence requires looking into the >>>> operations in order to determine equivalence. This is halting.
>>> no, I don't do it this way.
>> As I said, what you call structural is in rather nominal based on >> signatures.
> I am not sure what is meant here...
It is meant that you don't compare types for being semantically equivalent. You compare their signatures constructed out of the memory layout. This is type of equivalence is nominal. Semantic equivalence of T1 and T2 means that
1. for each value of T1 there exist one and only one value of T2 2. the same is true for their operations 3 for each pair of such operations any combination the arguments equivalent in the sense defined by 1, the both yield the results equivalent in this sense.
Equivalently any provable predicate P defined in terms of T1 remains true after substituting T2 for T1.
>>> a class, for example, is a list of slots and methods. >>> the only methods which can exist for a given class are those defined for >>> this class, and they are physically located "within" the class.
>> This model is wrong because it does not support multi-methods. The relation >> in not 1 to n, as a matter of fact.
> method overloading works fine, as each method has a different signature...
Methods cannot be overloaded, per definition of. Unless you mean something else.
>>> similarly, my framework also typically merges strings as well... >>> feel like complaining about this next?...
>> Yes, because there exist different attitudes to the issue. In other >> languages anonymously declared types might be treated as not equivalent. >> Compare to Ada:
>> Foo : array (1..2) of Integer; -- Anonymous array type is declared here >> Bar : array (1..2) of Integer;
>> Foo := Bar; -- This is illegal, two anonymous types are not equivalent
> this is a compiler issue, not a core typesystem or runtime issue...
> the compiler can complain about it if it wants, but the runtime need not do > so...
Exactly the reverse. The OS does not care if any two types are equivalent according to the language standard. The compiler takes care to infer equivalence if it will.
This is different in different languages and incomputable in general case. Why would OS bother about the mess? It is the language semantics to determine what is equivalent, not the OS's business.
>>> Yes. Are you saying that tags of "improper" types need not to be handled >>> by the OS? (:-))
>> integer ranges to me seem more like a compiler hack, so the underlying >> type >> would likely just be a plain integer...
> There is no such thing like "underlying type" or "plain integer". If any > that is an implementation detail.
plain integer: in my case is given the letter 'i' and is defined as 32 bits. otherwise, it is 'x' and 64 bits, or 'n' and 128 bits.
> It is not only wrong to think in that, it is just wrong. Consider modular > integer of the range 1..3. Again, type is a set of values and operations > defined on them, period. Nothing beyond that.
then, the integer "can" be 8 bits, but more likely is 32...
>>>> creating a constant stream of classes is likely to create far worse >>>> problems >>>> than just poor hashing behavior though.
>>> Yes, if the implementation is poor.
>> creating lots of classes will end up using up lots of memory in the C/I >> system, and at present, there are no mechanisms by which classes can be >> "garbage". as such, creating a constant stream of "new" classes would >> essentially use up all the memory...
> Yes, if the implementation is poor... (:-))
well, you see if you can figure out how to allocate things without using any memory...
>>>>>> if they are methods, they are part of the structure, and no such >>>>>> merging >>>>>> would happen in the first place.
>>>>> The relation between a type and its operations (methods) is n to m. >>>>> Neither is a part of another.
>>>>> Anyway, the point is, structural equivalence requires looking into the >>>>> operations in order to determine equivalence. This is halting.
>>>> no, I don't do it this way.
>>> As I said, what you call structural is in rather nominal based on >>> signatures.
>> I am not sure what is meant here...
> It is meant that you don't compare types for being semantically > equivalent. > You compare their signatures constructed out of the memory layout. This is > type of equivalence is nominal. Semantic equivalence of T1 and T2 means > that
> 1. for each value of T1 there exist one and only one value of T2 > 2. the same is true for their operations > 3 for each pair of such operations any combination the arguments > equivalent in the sense defined by 1, the both yield the results > equivalent > in this sense.
> Equivalently any provable predicate P defined in terms of T1 remains true > after substituting T2 for T1.
oh, ok...
this can be done with named types, but generally is not the case for unnamed types.
this is also more a matter of the "core typesystem", but is not as much the case for "auxilary types".
>>>> a class, for example, is a list of slots and methods. >>>> the only methods which can exist for a given class are those defined >>>> for >>>> this class, and they are physically located "within" the class.
>>> This model is wrong because it does not support multi-methods. The >>> relation >>> in not 1 to n, as a matter of fact.
>> method overloading works fine, as each method has a different >> signature...
> Methods cannot be overloaded, per definition of. Unless you mean something > else.
internally, each will have a modified name, so, they might be instead regarded as: foo(i)v foo(ff)v foo(ddd)v
when compiling, the compiler figures out which types the call-site is using, and picks the appropriate method as the reciever.
actually, internally the name and signature are kept separate, but for some uses they are joined and spit apart when needed. this is similar to the JVM strategy.
>>>> similarly, my framework also typically merges strings as well... >>>> feel like complaining about this next?...
>>> Yes, because there exist different attitudes to the issue. In other >>> languages anonymously declared types might be treated as not equivalent. >>> Compare to Ada:
>>> Foo : array (1..2) of Integer; -- Anonymous array type is declared >>> here >>> Bar : array (1..2) of Integer;
>>> Foo := Bar; -- This is illegal, two anonymous types are not equivalent
>> this is a compiler issue, not a core typesystem or runtime issue...
>> the compiler can complain about it if it wants, but the runtime need not >> do >> so...
> Exactly the reverse. The OS does not care if any two types are equivalent > according to the language standard. The compiler takes care to infer > equivalence if it will.
> This is different in different languages and incomputable in general case. > Why would OS bother about the mess? It is the language semantics to > determine what is equivalent, not the OS's business.
FWIW, the OS need not particularly care about "types" in the first place...
the OS can instead provide abstract facilities (such as, facilities for signature handling, if even this, more likely the OS will just provide mmap, ...), leaving all other issues up to the combination of compiler and runtime libraries...
the runtime only need care about general issues, not specific issues (such as "exact" type behavior). things like type-checking, ... which are really compiler issues.
as I said before, the types equivalence/merging semantics are done in the compiler, not in the runtime. the runtime is only told things by the compiler ("hey, here is this type, ...").
but, anyways, my main concern here is C-family languages (C, C++, Java, C#, ...), and I am going by what is "standard" practice on these fronts. any other languages will just have to live with it...
On Wed, 24 Jun 2009 10:23:54 -0700, cr88192 wrote: > "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de> wrote in message > news:1nwbhpkrjmg29.ba7yhipe8atn.dlg@40tude.net... >> On Tue, 23 Jun 2009 12:01:20 -0700, cr88192 wrote:
>>>> Yes. Are you saying that tags of "improper" types need not to be handled >>>> by the OS? (:-))
>>> integer ranges to me seem more like a compiler hack, so the underlying >>> type >>> would likely just be a plain integer...
>> There is no such thing like "underlying type" or "plain integer". If any >> that is an implementation detail.
> plain integer: > in my case is given the letter 'i' and is defined as 32 bits. > otherwise, it is 'x' and 64 bits, or 'n' and 128 bits.
It is not an integer. Plain integer if any is such implements the set of integer numbers. Yours does not.
>> It is not only wrong to think in that, it is just wrong. Consider modular >> integer of the range 1..3. Again, type is a set of values and operations >> defined on them, period. Nothing beyond that.
> then, the integer "can" be 8 bits, but more likely is 32...
It cannot be, since 2+2 = 1 (mod 4)
This illustrates the principle. The type is a set of values and operations. The operations of the ring modulus 4 are sufficiently different from ones of the ring 256 (if "8-bits" means modulus 2**8).
>> Yes, if the implementation is poor... (:-))
> well, you see if you can figure out how to allocate things without using any > memory...
>>>>> a class, for example, is a list of slots and methods. >>>>> the only methods which can exist for a given class are those defined for >>>>> this class, and they are physically located "within" the class.
>>>> This model is wrong because it does not support multi-methods. The >>>> relation in not 1 to n, as a matter of fact.
>>> method overloading works fine, as each method has a different >>> signature...
>> Methods cannot be overloaded, per definition of. Unless you mean something >> else.