[...] : But wait! How did we get a pointer with value 0x1233:0x0024 in the first : place?
: Underflowing or overflowing an array subscript? Bzzz. Undefined behaviour. : Pointer arithmetic outside an object's bounds? Bzzz. Undefined behaviour. : (int*)(0x12330024) ? Implementation defined.
: Point is, a compiler targeting 8086 can get away with not normalising a : pointer, so long as it does not do any normalising anywhere else. : (Documenting the effect of comparing casted pointers notwithstanding.) [...]
I'm a bit confused. Do you mean that you can implement an ANSI C compiler for the 8086 without normalizing pointers before comparing them? If so, I agree 100%.
On the other hand, if you mean that normalizing is atypical, then note that your solution won't work on >64KB objects. If memory serves (good chance it won't), they're accessed with "huge" pointers, and can cross segments.
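To make the normalisation question concrete, here is a minimal sketch in C that models an 8086 real-mode segment:offset pair. The struct far_ptr type and the far_* function names are made up for illustration (they are not any compiler's API), and the sketch assumes the usual real-mode rule that the linear address is segment * 16 + offset, wrapped to 20 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an 8086 far pointer (not a real compiler type). */
struct far_ptr {
    uint16_t segment;
    uint16_t offset;
};

/* Linear address under the assumed real-mode rule, wrapped to 20 bits. */
uint32_t far_linear(struct far_ptr p)
{
    return ((((uint32_t)p.segment) << 4) + p.offset) & 0xFFFFFu;
}

/* Normalised form: the offset is reduced to the range 0..15. */
struct far_ptr far_normalize(struct far_ptr p)
{
    uint32_t linear = far_linear(p);
    struct far_ptr n;

    n.segment = (uint16_t)(linear >> 4);
    n.offset = (uint16_t)(linear & 0xFu);
    assert(far_linear(n) == linear); /* same location, different bits */
    return n;
}

/* Comparison that does not depend on either operand being normalised. */
int far_equal(struct far_ptr a, struct far_ptr b)
{
    return far_linear(a) == far_linear(b);
}
```

In this model 0x1233:0x0024 and 0x1234:0x0014 both map to linear address 0x12354, so far_equal() reports them equal although their bit patterns differ. A compiler can either keep every pointer normalised or compare through something like far_linear(); mixing the two is what would break comparisons.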
In article <3778913d.376...@news.euronet.nl> usur...@euronet.nl "Paul Mesken" writes:
...
>I've seen interesting incompatibilities posted which I didn't think of >myself (how a pointer is stored, the floats (I never do floats) and >particular bit representation schemes for negative numbers) but mine >is the only one that results in an error :-)
Good point. But that also makes it easy to spot and fix.
-- ----------------------------------------- Lawrence Kirby | f...@genesis.demon.co.uk Wilts, England | 70734....@compuserve.com -----------------------------------------
In article <T6Vd3.450$6n6.8084@client>, "Dann Corbit" <dcor...@solutionsiq.com> wrote:
> Quite right, a horrible gaffe (comparing doubles for equality), and > indicates that the programmer is probably incompetent for numerical > work.
Be assured, that this thread is not about comparing floating point numbers by memcmp. It is about hashing/comparing structures and (not excluding floating point numbers striving for a general solution) about all the problems coming from padding bits and bytes and potential holes in the data representations of structure components.
Alex_K...@scitex.com wrote: > > My naive implementation for this was > > memset(p,'\0',size);
> > Might it be possible that this does not reach all bits > > in memory?
> No, memset looks at an object as an array of unsigned chars; > since there are no holes or paddings in unsigned char, memset reaches > all bits.
If this is true, then all my problems with StructCmp() can be solved eventually. But I remember vaguely C discussions about implementations on top of LISP or on pure floating point machines, that might behave differently. Could you quote the standard?
> ... > The problem is - no data type except unsigned chars is guaranteed > not to have holes in its binary representation. So, for example, > int a,b; > a = 1; > b = 1; > -then memcmp(&a,&b,sizeof(int)) is not guaranteed to compare equal.
that would clear any available holes in the int data representation.
> if memcmp returns 0, structures *are* equal; the problem is when it > returns non-zero - they still may be equal, except padding bits.
If we can reach any padding bits and bytes, then we can also clear them.
> > D. Similar to C, the problem of getting at all bits that > > carry component information e.g. for calculating a hash > > value. Will be solved when C is solved.
> > E. May the system touch/change unused padding bytes within > > a structure at will?
> Nothing forbids it.
I know. But I would trust Occam's razor on this point.
> > Could we assume, that after > > char PiStr[]="3.14159"; > > ... > > char *p=PiStr; > > ... /* maybe different module */ > > char *q=PiStr; > > the pointers p and q have the same bit representation?
> No.
First, why not?
Second: In hashing structures it would not make much sense to use "deep" structures for string content. If such pointers would point to a fixed set of constant strings, they were better replaced by enums.
> The worst possible problem is padding bytes in basic types; > bitfields also don't fit your approach. You may zero all bits > of your structure/object prior to assignment - but that doesn't buy > you much, since in case of holes in type representation you may (and > probably shall) receive padding from the rvalue.
Yes, bitfields would have to be isolated into substructures and equipped with separate assignment functions.
> In article <7laobh$im...@nnrp1.deja.com>, > Alex_K...@scitex.com wrote: > > > My naive implementation for this was > > > memset(p,'\0',size);
> > > Might it be possible that this does not reach all bits > > > in memory?
> > No, memset looks at an object as an array of unsigned chars; > > since there are no holes or paddings in unsigned char, memset reaches > > all bits.
> If this is true, then all my problems with StructCmp() > can be solved eventually. But I remember vaguely C > discussions about implementations on top of LISP or > on pure floating point machines, that might behave > differently. Could you quote the standard?
At the moment I have at hand only C9X draft, but in this aspect it doesn't differ from C89. So, 6.2.6.1 [#3] Values stored in object of type unsigned char shall be represented using a pure binary notation (C89 states the same) and, as a consequence of this statement and a definition for pure binary notation, uchar consists of CHAR_BIT bits and represents values from 0 to 2^CHAR_BIT-1. No place for padding or holes.
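As a small illustration of the "array of unsigned char" view, the following sketch in C inspects every byte of an object, padding included. The struct sample type and the all_bytes_zero() helper are hypothetical names chosen for this example; whether struct sample really contains padding depends on the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct; most ABIs put padding between tag and value. */
struct sample {
    char tag;
    long value;
};

/* Returns 1 if every byte of the object representation is zero,
   including any padding bytes, and 0 otherwise. */
int all_bytes_zero(const void *object, size_t size)
{
    const unsigned char *bytes = object;
    size_t i;

    assert(object != NULL);
    for (i = 0; i < size; i++) {
        if (bytes[i] != 0)
            return 0;
    }
    return 1;
}
```

After memset(&s, 0, sizeof s) on a struct sample s, all_bytes_zero(&s, sizeof s) returns 1, because both functions walk the same sizeof s bytes. This only shows that the bytes can be reached; it says nothing about whether later assignments leave the padding alone.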
> > ... > > The problem is - no data type except unsigned chars is guaranteed > > not to have holes in its binary representation. So, for example, > > int a,b; > > a = 1; > > b = 1; > > -then memcmp(&a,&b,sizeof(int)) is not guaranteed to compare equal.
> that would clear any available holes in the int > data representation.
> > if memcmp returns 0, structures *are* equal; the problem is when it > > returns non-zero - they still may be equal, except padding bits.
> If we can reach any padding bits and bytes, then we can also > clear them.
Well, it is possible - you have to initialise a template for an appropriate data type with all_zero padding bits and all_one significant ones. But I'd be damned if I know how to do this portably.
> > > D. Similar to C, the problem of getting at all bits that > > > carry component information e.g. for calculating a hash > > > value. Will be solved when C is solved.
> > > E. May the system touch/change unused padding bytes within > > > a structure at will?
> > Nothing forbids it.
> I know. But I would trust Occam's razor on this point.
An extremely optimistic perception of fellow human beings :-) What if some compiler writer doesn't follow Occam's principle?
> > > Could we assume, that after > > > char PiStr[]="3.14159"; > > > ... > > > char *p=PiStr; > > > ... /* maybe different module */ > > > char *q=PiStr; > > > the pointers p and q have the same bit representation?
> > No.
> First, why not?
a) Possible padding bits; b) Segment:offset architectures.
> Second: In hashing structures it would not make much sense > to use "deep" structures for string content. If such pointers > would point to a fixed set of constant strings, they were > better replaced by enums.
> > The worst possible problem is padding bytes in basic types; > > bitfields also don't fit your approach. You may zero all bits > > of your structure/object prior to assignment - but that doesn't buy > > you much, since in case of holes in type representation you may (and > > probably shall) receive padding from the rvalue.
> Yes, bitfields would have to be isolated into substructures > and equipped with separate assignment functions.
-- Regards, Alex Krol Disclaimer: I'm not speaking for Scitex Corporation Ltd
> In article <T6Vd3.450$6n6.8084@client>, > "Dann Corbit" <dcor...@solutionsiq.com> wrote: > > Quite right, a horrible gaffe (comparing doubles for equality), and > > indicates that the programmer is probably incompetent for numerical > work.
> Be assured, that this thread is not about comparing floating point > numbers by memcmp. It is about hashing/comparing structures and > (not excluding floating point numbers striving for a general solution) > about all the problems coming from padding bits and bytes and > potential holes in the data representations of structure components.
I was addressing a different issue. Some people think that you can compare floats for equality the same way that you would compare integers. This is wrong. Neither was I talking about comparing floating point numbers with memcmp(). If you see something like this:
    float a;
    float b;
    double c;
    double d;
    /* intervening stuff... */
    if (a == b) foo(); else bar();
    if (c == d) bar(); else foo();
It is a sure indication that the programmer does not understand floating point. I have seen it *even* in Numerical Analysis textbooks, which only goes to show that ignorance is widespread.
In article <9Q5e3.16341$4e1.144...@iad-read.news.verio.net>, hu...@mnsinc.com (Szu-Wen Huang) wrote:
> Helmut Leitner (leit...@hls.via.at) wrote: > [...] > : At the moment (with your help and the other's contributions) > : I see five different problems. > [...]
> Wouldn't the solution to these five problems outweigh the cost > of just writing a function to compare the two structures? If > I was maintaining this program, I'd probably find the mechanism > surprising.
I see "compare" as the smaller brother of "hash". Both have to access all data bits and avoid all padding bits to work.
You can write individual functions for the "compare", but you still have all the same problems regarding padding bits if you want to calculate a hash function.
> > Be assured, that this thread is not about comparing floating point > > numbers by memcmp. It is about hashing/comparing structures and > > (not excluding floating point numbers striving for a general solution) > > about all the problems coming from padding bits and bytes and > > potential holes in the data representations of structure components. > I was addressing a different issue. Some people think that you can compare > floats for equality the same way that you would compare integers. This is > wrong. Neither was I talking about comparing floating point numbers with > memcmp(). If you see something like this:
> if (a == b) foo(); else bar(); > if (c == d) bar(); else foo();
> It is a sure indication that the programmer does not understand floating > point. I have seen it *even* in Numerical Analysis textbooks,
I can hardly believe this.
> which only > goes to show that ignorance is widespread.
> In other words, the part (a!=b) is dead wrong to begin with. Do you catch > my drift?
Ok, I see.
But I don't see what this has to do with double/float. "a=b" or "a!=b" is wrong anyway. "a=c" seem even "wronger", if this were possible.
The only answer I see is to implement functions like
    int DoubleEqu(double x,double y);
    int FloatEqu(float x,float y);
    int DoubleEquMantBitcount(double x,double y,int count);
    int FloatEquMantBitcount(float x,float y,int count);
I never understood why there is no proposed implementation for such functions in the FAQ, no proposal for the implicit use of such functions in the C standard (for == and !=).
Alex_K...@scitex.com wrote: > > If this is true, then all my problems with StructCmp() > > can be solved eventually. But I remember vaguely C > > discussions about implementations on top of LISP or > > on pure floating point machines, that might behave > > differently. Could you quote the standard?
> At the moment I have at hand only C9X draft, but in this aspect it > doesn't differ from C89. So, > 6.2.6.1 [#3] Values stored in object of type unsigned char shall be > represented using a pure binary notation > (C89 states the same) > and, as a consequence of this statement and a definition for pure > binary notation, uchar consists of CHAR_BIT bits and represents values > from 0 to 2^CHAR_BIT-1. No place for padding or holes.
I found this (also in the C9X draft) but was not sure, whether this guarantees access to any padding bits and bytes of any other data type.
> > > ... > > > The problem is - no data type except unsigned chars is > > > guaranteed not to have holes in its binary representation. So, for example, > > > int a,b; > > > a = 1; > > > b = 1; > > > -then memcmp(&a,&b,sizeof(int)) is not guaranteed to compare equal.
> > that would clear any available holes in the int > > data representation.
> > > if memcmp returns 0, structures *are* equal; the problem is when > > > it returns non-zero - they still may be equal, except padding bits.
> > If we can reach any padding bits and bytes, then we can also > > clear them.
> Well, it is possible - you have to initialise a template for an > appropriate data type with all_zero padding bits and all_one significant > ones. But I'd be damned if I know how to do this portably.
For all integer data types it should be easy. E.g. something like
unsigned char IntDataBits[sizeof(int)];
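    void IntDataBitsInit(void)
    {
      int bits=sizeof(int)*CHAR_BIT;
      int val_ref=314;
      int val;   /* scratch copy that gets one bit set per pass */
      int i;
      MemClear(IntDataBits,sizeof(IntDataBits));
      for(i=0; i<bits; i++) {
        val=val_ref;
        PtrSetBit((unsigned char *)&val,i);
        /* a bit already set in val_ref cannot change val here, so it is
           not detected; flipping the bit instead would catch it */
        if(val!=val_ref) {
          PtrSetBit(IntDataBits,i);
        }
      }
    }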
For floating point types it should be possible, but one has to avoid pitfalls like producing NaNs that might compare equal ...
At the moment I have no clue about pointer types (as I said, they are not really useful in the context at hand)
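The bit-probing idea for integer types can be sketched as a self-contained C fragment. This is only one possible reading of it: it uses unsigned int rather than int, flips each bit instead of setting it (so bits already set in the reference value are found too), and goes through memcpy. All names are invented for this example. It also assumes the implementation has no trap representations for unsigned int; on a machine where flipping a padding bit produces a trap representation, reading the modified value would be undefined.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* One mask bit per bit of the object representation of unsigned int:
   1 = flipping that bit changes the value, 0 = it does not (padding). */
unsigned char uint_value_mask[sizeof(unsigned int)];

/* Number of value bits in unsigned int, derived from UINT_MAX. */
int uint_width(void)
{
    unsigned int max = UINT_MAX;
    int width = 0;

    while (max != 0) {
        width++;
        max >>= 1;
    }
    return width;
}

/* Number of bits currently set in uint_value_mask. */
int uint_value_mask_bits(void)
{
    size_t byte;
    int bit, count = 0;

    for (byte = 0; byte < sizeof uint_value_mask; byte++)
        for (bit = 0; bit < CHAR_BIT; bit++)
            if (uint_value_mask[byte] & (1u << bit))
                count++;
    return count;
}

/* Probe every bit of the representation by flipping it in a copy. */
void uint_value_mask_init(void)
{
    const unsigned int ref = 314u;
    size_t byte;
    int bit;

    memset(uint_value_mask, 0, sizeof uint_value_mask);
    for (byte = 0; byte < sizeof(unsigned int); byte++) {
        for (bit = 0; bit < CHAR_BIT; bit++) {
            unsigned char raw[sizeof(unsigned int)];
            unsigned int val;

            memcpy(raw, &ref, sizeof ref);
            raw[byte] ^= (unsigned char)(1u << bit);
            memcpy(&val, raw, sizeof val); /* assumes no trap representation */
            if (val != ref)
                uint_value_mask[byte] |= (unsigned char)(1u << bit);
        }
    }
    /* Every value bit must have been detected, and nothing else. */
    assert(uint_value_mask_bits() == uint_width());
}
```

On common platforms unsigned int has no padding bits, so every mask byte ends up with all CHAR_BIT bits set; the probe only becomes interesting on an implementation that actually has holes.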
> I never understood why there is no proposed > implementation for such functions in the FAQ, > no proposal for the implicit use of such functions > in the C standard (for == and !=).
Probably for the same reason that many "obvious" functions are missing from C - such functions a) are trivial to write and b) would either not be sufficiently general in usefulness to merit inclusion in the library, or would be too general to be useful!
For example:
if(DoubleEqu(a,b))
is not very helpful. How equal is equal? Within +/- 0.000001 of each other? 0.00000000001? Or perhaps +/- 0.0001%? If so, how would you calculate the percentage? a * 100 / b? ((a - b) * 100) / b? Or what if the user wanted to compare to 6 significant figures? Or 7?
Tricky to design one easy-to-use function that copes elegantly with all these possibilities and yet handles all comparisons of all possible float (or double) values correctly.
It's simpler to just let the user-programmer write his own functions to do this. And in fact that's precisely what I've had to do on several occasions in the past, for various different clients.
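The "how equal is equal?" question shows up directly in code. Below is a minimal sketch in C of two of the conventions mentioned above: an absolute tolerance and a relative one. The function names are hypothetical, and scaling the relative tolerance by the larger magnitude is just one choice among several. NaNs and infinities are not handled specially; any comparison involving a NaN comes out as "not equal".

```c
#include <assert.h>

/* Absolute value without needing the math library. */
static double abs_double(double x)
{
    return x < 0 ? -x : x;
}

/* Equal if the values differ by at most `tolerance` (absolute). */
int double_equal_abs(double x, double y, double tolerance)
{
    assert(tolerance >= 0.0);
    return abs_double(x - y) <= tolerance;
}

/* Equal if the values differ by at most `rel_tolerance` times the
   larger of the two magnitudes (relative). */
int double_equal_rel(double x, double y, double rel_tolerance)
{
    double ax = abs_double(x);
    double ay = abs_double(y);
    double scale = ax > ay ? ax : ay;

    assert(rel_tolerance >= 0.0);
    return abs_double(x - y) <= rel_tolerance * scale;
}
```

The two disagree as soon as the magnitudes grow: 1000000.0 and 1000001.0 are unequal under an absolute tolerance of 1e-6 but equal under a relative tolerance of 1e-5. That difference is why a single DoubleEqu(a,b) with no tolerance argument cannot suit every caller.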
In article <y7re3.130$PW6.2602@client> "Dann Corbit" <dcor...@solutionsiq.com> writes:
> float a; > float b; > > double c; > double d; > > if (a == b) foo(); else bar(); > if (c == d) bar(); else foo(); > > It is a sure indication that the programmer does not understand floating > point.
Tsk, tsk. A bit harsh Dann. I think I *do* understand floating point ;-). There are places where exact equality is just what is required. -- dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131 home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
> > I never understood why there is no proposed > > implementation for such functions in the FAQ, > > no proposal for the implicit use of such functions > > in the C standard (for == and !=).
> Probably for the same reason that many "obvious" functions are missing from > C - such functions a) are trivial to write and b) would either not be > sufficiently general in usefulness to merit inclusion in the library, or > would be too general to be useful!
> For example:
> if(DoubleEqu(a,b))
> is not very helpful. How equal is equal? Within +/- 0.000001 of each other? > 0.00000000001? Or perhaps +/- 0.0001%? If so, how would you calculate the > percentage? a * 100 / b? ((a - b) * 100) / b?
Perhaps taking the larger (fabs) one as the measure.
> Or what if the user wanted to > compare to 6 significant figures? Or 7?
Then he will have to take DoubleEquSigDigits(x,y,6); DoubleEquSigDigits(x,y,7); and if he wants to compare to two decimal digits DoubleEquDecDigits(x,y,2);
And don't tell me all this is trivial.
> Tricky to design one easy-to-use function that copes elegantly with all > these possibilities
Who says that it must be one function.
> and yet handles all comparisons of all possible float > (or double) values correctly.
> It's simpler to just let the user-programmer write his own functions to do > this.
Sure, it's a problem no one cares about. Which seems to be not very productive from an overall point of view.
> And in fact that's precisely what I've had to do on several occasions > in the past, for various different clients.
What I'm talking about is solving an old problem (x==y). If it can be done in one function, fine. If it can't be done in less than a 20 function API, fine too. The complete API itself would document the depth and all the variations of the problem.
> On Thu, 01 Jul 1999 08:55:05 +0200, Helmut Leitner <leit...@hls.via.at> wrote: > >I found this (also in the C9X draft) but was not sure, > >whether this guarantees access to any padding bits and bytes > >of any other data type.
> It doesn't matter if you can zero all the holes. There is no guarantee that the > bits in the holes are going to remain constant.
I know. But there is also no good reason for them to change.
> This is an artificial > discussion in a sense, as I can't imagine a machine that really does this,
Yes, It is.
> but > imagine that the holes are filled with bits from the clock. Each time you > examine them, the bits in the holes can be different.
> Likewise, you can fill the holes in a struct, but there is no guarantee that the > values in the padding will remain constant.
Let's assume that there were a good reason for a compiler to change padding bits or hole bits quietly (and outside of a component assignment process).
    typedef struct test {
      type1 x1; /* there is padding */
      ......... /* there are holes */
    } TEST;
Now we "repack" it:
    typedef struct test_rp {
      union {
        TEST t;
        unsigned char ucbuf[sizeof(TEST)];
      } u;
    } TEST_RP;
Now he can't do it anymore, because he would risk quietly changing value bits of ucbuf. But only during assignments can he know how the union is used...
So it doesn't make sense to think about a compiler using padding bits and bytes or holes.
IMO it is a purely theoretical question, as long as there is no single implementer standing up and telling us that he uses paddings/holes and tells us about its usefulness.
<leit...@hls.via.at> wrote: > Let's assume that there were a good reason for a compiler to > change padding bits or hole bits quietly (and outside of a > component assignment process).
> typedef struct test { > type1 x1; /* there is padding */ > ......... /* there are holes */ > } TEST;
> Now he can't do it anymore, because he would risk quietly > changing value bits of ucbuf. But only during assignments > can he know how the union is used...
> So it doesn't make sense to think about a compiler using > padding bits and bytes or holes.
> IMO it is a purely theoretical question, as long as there is no > single implementer standing up and telling us that he uses > paddings/holes and tells us about its usefulness.
Let's say you have a processor with a "load/store multiple register" instruction, like PowerPC or 68000. And you have a struct
    struct test { long a; long b; long c; short d; long e; } x, y;
An assignment such as x = y can be done on a 68000 with one "Load 4 registers" and one "Store four registers" by a good optimising compiler. That will overwrite padding between d and e. If you don't want to destroy the padding, you have to use two more instructions.
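Where the padding in such a struct sits can be checked with offsetof. The sketch below reuses the struct test layout from the post; the padding_after_d() helper is a made-up name, and the number it returns depends on the platform's sizes and alignment rules.

```c
#include <assert.h>
#include <stddef.h>

struct test {
    long a;
    long b;
    long c;
    short d;
    long e;
};

/* Number of padding bytes between the end of d and the start of e. */
size_t padding_after_d(void)
{
    size_t end_of_d = offsetof(struct test, d) + sizeof(short);
    size_t start_of_e = offsetof(struct test, e);

    assert(start_of_e >= end_of_d); /* members are laid out in order */
    return start_of_e - end_of_d;
}
```

A whole-struct copy, whether done by memcpy or by a block-move instruction, carries these bytes along from the source, so whatever was in the destination's padding before the assignment is gone afterwards.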
In article <377cdc53.17664...@news.pacificnet.net> Ke...@Quitt.net "Kevin D. Quitt" writes:
>On Thu, 01 Jul 1999 08:55:05 +0200, Helmut Leitner <leit...@hls.via.at> wrote: >>I found this (also in the C9X draft) but was not sure, >>whether this guarantees access to any padding bits and bytes >>of any other data type.
>It doesn't matter if you can zero all the holes. There is no guarantee that the >bits in the holes are going to remain constant.
I'm not entirely convinced about that. Consider that you can access any object as an array of unsigned char. Each unsigned char is an object in its own right and, except through some mechanism involving volatile or undefined behaviour, the value of an object cannot change except through a side-effect visible in the abstract machine.
> This is an artificial >discussion in a sense, as I can't imagine a machine that really does this, but >imagine that the holes are filled with bits from the clock. Each time you >examine them, the bits in the holes can be different.
That wouldn't be a valid implementation.
>Likewise, you can fill the holes in a struct, but there is no guarantee that the >values in the padding will remain constant.
In fact the only things I can think of that might legitimately change padding bytes are structure assignment and possibly library functions that modify a structure (e.g. mktime() ). When you modify a structure member you are modifying an object through an lvalue of the member type. The parent structure is of little concern once the member object has been located. Modification of an object can't modify memory that isn't part of that object. So modifying a structure member can't modify bytes that are not part of that structure member.
A trickier question is whether modifying a bit-field can modify padding bits in the allocation unit used for the bit-field. I suspect not although I could imagine a compiler optimiser doing this to produce more efficient code (e.g. by leaving out some masking operations).
-- ----------------------------------------- Lawrence Kirby | f...@genesis.demon.co.uk Wilts, England | 70734....@compuserve.com -----------------------------------------
In article <377bd98d.16954...@news.pacificnet.net> Ke...@Quitt.net "Kevin D. Quitt" writes:
>On Tue, 29 Jun 99 20:51:17 , Bill Godfrey <bill-godf...@usa.net> wrote: >>Imagine an arbitary array of 10 ints, at 0x1234:0x0004. When moving a pointer >>about this array, only the offset needs changing.
>This may be the case, but there is no guarantee that only the offset will be >changed. Pointers may be normalized or denormalized by library routines or >generated code; nothing prevents this.
>>But wait! How did we get a pointer with value 0x1233:0x0024 in the first >>place?
>It doesn't matter - you can't guarantee one won't be created.
You can if it messes up a legitimate comparison operation in another part of the program. An implementation must be consistent in either maintaining a normalised form for pointers or generating code for comparisons that doesn't depend on a normalised form (although localised optimisations may apply).
-- ----------------------------------------- Lawrence Kirby | f...@genesis.demon.co.uk Wilts, England | 70734....@compuserve.com -----------------------------------------
In article <7ldc42$go...@nnrp1.deja.com> leit...@hls.via.at "Helmut Leitner" writes:
>In article <7laobh$im...@nnrp1.deja.com>, > Alex_K...@scitex.com wrote: >> > My naive implementation for this was >> > memset(p,'\0',size);
>> > Might it be possible that this does not reach all bits >> > in memory?
>> No, memset looks at an object as an array of unsigned chars; >> since there are no holes or paddings in unsigned char, memset reaches >> all bits.
>If this is true, then all my problems with StructCmp() >can be solved eventually.
I doubt it, at least not in a way that is ultimately simpler than comparing the structure members individually.
Consider another problem - a structure that contains arrays of char holding strings. Any data in the array after the null will have no effect on the string comparison but it will affect a memcmp() comparison unless you ensure that all trailing bytes are set to zero or another consistent value. This implies either knowledge of the structure dataformat in the comparison routine or overhead in anything that manipulates the array contents. The latter option can easily be mishandled producing difficult to track down bugs.
>But I remember vaguely C >discussions about implementations on top of LISP or >on pure floating point machines, that might behave >differently. Could you quote the standard?
The current standard is a bit vague in this respect. However functions like memcpy() are defined to copy arrays of unsigned characters. For them to work properly (i.e. be able to copy objects of any type) all data within any type must be represented when viewed as an array of unsigned char. C9X will make such guarantees explicit.
In principle a C implementation could be built on top of something else such as the examples you give. These underlying "architectures" could use extra bits and datastructures that are invisible to the C program. However since they are invisible to the C program they are irrelevant. In particular such things cannot contribute to any object's value.
>> ... >> The problem is - no data type except unsigned chars is guaranteed >> not to have holes in its binary representation. So, for example, >> int a,b; >> a = 1; >> b = 1; >> -then memcmp(&a,&b,sizeof(int)) is not guaranteed to compare equal.
>that would clear any available holes in the int >data representation.
What you're saying is that you can force a canonical representation of any value that an object can store. Possibly, but such code would be highly platform-specific and it would still probably need to be done per structure type. A better approach would be to write a hashing function for each structure type. This would be far more maintainable and portable.
-- ----------------------------------------- Lawrence Kirby | f...@genesis.demon.co.uk Wilts, England | 70734....@compuserve.com -----------------------------------------
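As a sketch of the per-structure approach suggested in the post above, here is a hand-written hash and comparison for one hypothetical struct. The struct record type and the function names are invented for illustration, and FNV-1a (32-bit) is used only as a convenient, well-known byte hash. The code still assumes that int and double have no padding bits and that equal values share one representation; the one common exception, -0.0 versus +0.0, is folded explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record type used only for this example. */
struct record {
    char name[16]; /* null-terminated string, trailing bytes unspecified */
    int id;
    double weight;
};

/* One FNV-1a (32-bit) step over a run of bytes. */
static uint32_t fnv1a_update(uint32_t hash, const void *data, size_t size)
{
    const unsigned char *bytes = data;
    size_t i;

    for (i = 0; i < size; i++) {
        hash ^= bytes[i];
        hash = (uint32_t)(hash * 16777619u);
    }
    return hash;
}

/* Hash only the bytes that carry information, member by member. */
uint32_t record_hash(const struct record *r)
{
    uint32_t hash = 2166136261u; /* FNV-1a 32-bit offset basis */
    double weight;

    assert(r != NULL);
    weight = r->weight;
    if (weight == 0.0)
        weight = 0.0; /* fold -0.0 into +0.0 so equal values hash alike */
    hash = fnv1a_update(hash, r->name, strlen(r->name));
    hash = fnv1a_update(hash, &r->id, sizeof r->id);
    hash = fnv1a_update(hash, &weight, sizeof weight);
    return hash;
}

/* Member-wise comparison; padding and bytes after the null are ignored. */
int record_equal(const struct record *a, const struct record *b)
{
    return strcmp(a->name, b->name) == 0
        && a->id == b->id
        && a->weight == b->weight;
}
```

Two records filled with different garbage and then assigned the same member values compare equal and hash alike here, while memcmp() over the whole structs reports a difference because of the bytes after the string's null and any padding.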
> In article <7ldc42$go...@nnrp1.deja.com> > leit...@hls.via.at "Helmut Leitner" writes:
> >In article <7laobh$im...@nnrp1.deja.com>, > > Alex_K...@scitex.com wrote: > >> > My naive implementation for this was > >> > memset(p,'\0',size);
> >> > Might it be possible that this does not reach all bits > >> > in memory?
> >> No, memset looks at an object as an array of unsigned chars; > >> since there are no holes or paddings in unsigned char, memset reaches > >> all bits.
> >If this is true, then all my problems with StructCmp() > >can be solved eventually.
> I doubt it, at least not in a way that is ultimately simpler than > comparing the structure members individually.
Well, one has to try. At first it seemed impossible, now it seems possible (at least for flat structures).
> Consider another problem - a structure that contains arrays of char > holding strings. Any data in the array after the null will have no > effect on the string comparison but it will affect a memcmp() > comparison unless you ensure that all trailing bytes are set to > zero or another consistent value. This implies either knowledge of > the structure dataformat in the comparison routine or overhead in > anything that manipulates the array contents.
Little overhead during the assignment (put '\0' after the end of the string to fill the complete array). StrCpySizeFillZero(d,p,sizeof(array)) No overhead for the calling interface, because the size of the array should already be included to protect the other components of the structure.
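A function along the lines of StrCpySizeFillZero could look like the sketch below. The name str_copy_fill_zero and the truncation behaviour are assumptions made for this example, not the poster's actual code. (The standard strncpy() also pads with null characters, but it does not terminate the string when the source fills the whole buffer.)

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into the size-byte array dst, truncating if needed, and set
   every remaining byte (terminator included) to zero, so that equal
   strings always give byte-identical arrays. */
char *str_copy_fill_zero(char *dst, const char *src, size_t size)
{
    size_t length;

    assert(dst != NULL && src != NULL && size > 0);
    length = strlen(src);
    if (length >= size)
        length = size - 1; /* leave room for the terminating null */
    memcpy(dst, src, length);
    memset(dst + length, 0, size - length);
    return dst;
}
```

If every write to the array goes through such a function, memcmp() on two arrays holding the same string returns 0 regardless of what the arrays contained before. The risk raised in the quoted text is the one write somewhere that bypasses it, for example a plain strcpy().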
> The latter option can > easily be mishandled producing difficult to track down bugs.
I don't understand this argument.
> >But I remember vaguely C > >discussions about implementations on top of LISP or > >on pure floating point machines, that might behave > >differently. Could you quote the standard?
> The current standard is a bit vague in this respect. However functions > like memcpy() are defined to copy arrays of unsigned characters. For them > to work properly (i.e. be able to copy objects of any type) all data > within any type must be represented when viewed as an array of unsigned > char. C9X will make such guarantees explicit.
> In principle a C implementation could be built on top of something > else such as the examples you give. These underlying "architectures" > could use extra bits and datastructures that are invisible to the C > program. However since they are invisible to the C program they are > irrelevant. In particular such things cannot contribute to any object's > value.
> >> ... > >> The problem is - no data type except unsigned chars is guaranteed > >> not to have holes in its binary representation. So, for example, > >> int a,b; > >> a = 1; > >> b = 1; > >> -then memcmp(&a,&b,sizeof(int)) is not guaranteed to compare equal.
> >that would clear any available holes in the int > >data representation.
> What you're saying is that you can force a canonical representation > of any value that an object can store. Possibly, but such code would be > highly platform-specific
I don't see this.
> and it would still probably need to be done > per structure type.
Why do you think so?
> A better approach would be to write a hashing function > for each structure type.
If it can't be made portable and independent you're right.
> This would be far more maintainable and > portable.
If it can be made portable and structure independent, then IMO you're wrong.