Now that I've hooked up preliminary SmartWord filtering into the main
Filter utility, I think I'm about ready to make a first public
release. There are still a few issues to iron out and these are listed
below...
Filter is dropping words...
This seems to happen when the word in question spans two "lines". (I
define a line here as 255 characters.) At the moment I read one line
at a time from the input file and split it into words, which I then
check against the database. If a word is found, I write it to the
output line and continue to the next word. If it is not found, I add
it to the unknowns list, which is written to a file when the program
finishes with the document. What I think I should be doing instead is
this: if a word isn't in the dictionary, check whether we are at the
end of the line. If we are, store the word and move on to the
beginning of the next line. If the first word of the new line isn't in
the dictionary either, append it to the stored word and see whether
the combined word gives a match. I've tried variations of this a
couple of times now, with different but undesired results.
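As a rough illustration of the carry-over idea described above, here is
a minimal Python sketch. It is not the Filter utility's actual code: the
names filter_chunks, split_chunks and LINE_LEN are hypothetical, the
dictionary is a plain in-memory set rather than the real database, and
words are assumed to be separated by whitespace.

```python
from typing import Iterable, List, Set, Tuple

LINE_LEN = 255  # assumed "line" size, taken from the description above


def split_chunks(text: str, size: int = LINE_LEN) -> List[str]:
    """Cut text into fixed-size "lines", ignoring word boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def filter_chunks(chunks: Iterable[str],
                  dictionary: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (known, unknown) words, rejoining words split across chunks."""
    known: List[str] = []
    unknown: List[str] = []
    stored = ""  # unmatched word held over from the end of the previous chunk

    for chunk in chunks:
        words = chunk.split()
        if not words and stored:
            # Whitespace-only chunk: the held-over word cannot continue.
            unknown.append(stored)
            stored = ""
        for i, word in enumerate(words):
            if i == 0 and stored:
                # Start of a new chunk with a word held over: try the join.
                joined = stored + word
                if word not in dictionary and joined in dictionary:
                    known.append(joined)
                    stored = ""
                    continue
                # No match, so the held-over word really is unknown.
                unknown.append(stored)
                stored = ""
            if word in dictionary:
                known.append(word)
            elif i == len(words) - 1:
                # Unmatched word at the end of the chunk: hold it over.
                stored = word
            else:
                unknown.append(word)

    if stored:
        unknown.append(stored)
    return known, unknown


if __name__ == "__main__":
    words = {"the", "quick", "brown", "fox"}
    # A chunk size of 7 forces "quick" and "brown" to straddle boundaries.
    print(filter_chunks(split_chunks("the quick brown fox", 7), words))
```

One limitation of this approach, at least as sketched here: if the
fragment at the end of a chunk happens to be a valid word in its own
right ("in" from "input", say), it is accepted as it stands and the
remainder is logged as unknown. Checking whether the chunk actually ends
in whitespace before holding a word over might be one way around that.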
AddWords single word logging...
I still haven't got anywhere with this long-standing bug. A full
description of this one is in the !ReadMe file distributed with the
archive.
You can download what I have so far from http://www.garethlock.com/acorn/stdumper/stdump.zip