php-text-statistics
Re: Readability of html

Joel Nation <joel...@cyberone.com.au>

Okay, I've checked in my first changes. They cover all the HTML tags we
use at my work that should have a full stop in front of them. There
may be a couple of others, but this should cover the vast majority of
HTML use. I didn't use preg_replace; I'm more comfortable staying out of
the world of regexps! I don't have PHP4, but I'll hopefully check in a
PHP4 version tomorrow. Since str_ireplace is PHP 5 only, I'll have to use
strtolower and then just use str_replace. I don't have the PHPUnit
framework at home, so I haven't checked in a test, but I have a test I've
been running that I'll check in once I have PHPUnit up and running.
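For anyone who wants to try the idea outside PHP, here is a minimal Python sketch of the approach described above: plain string replacement rather than a regex, with the text lower-cased first as in the planned PHP4 version. The tag list `SENTENCE_ENDING_TAGS` and the helper names are hypothetical, not the contents of the checked-in change, and the regex-based `strip_tags` here is only a rough stand-in for PHP's `strip_tags()`.

```python
import re

# Hypothetical list of closing tags that usually end a "sentence" on a web
# page even though authors rarely type a full stop there. Adjust to taste.
SENTENCE_ENDING_TAGS = [
    "</h1>", "</h2>", "</h3>", "</h4>", "</h5>", "</h6>",
    "</li>", "</p>", "</td>", "</th>", "</dt>", "</dd>", "</title>",
]


def add_full_stops(html: str) -> str:
    """Insert a full stop before each sentence-ending closing tag.

    Uses plain str.replace (no regex for this step). Lower-casing first makes
    the match case-insensitive, like a strtolower + str_replace approach;
    case does not matter for readability scoring.
    """
    text = html.lower()
    for tag in SENTENCE_ENDING_TAGS:
        text = text.replace(tag, "." + tag)
    return text


def strip_tags(html: str) -> str:
    """Rough stand-in for PHP's strip_tags(): drop tags, collapse whitespace."""
    no_tags = re.sub(r"<[^>]*>", " ", html)
    return " ".join(no_tags.split())
```

For example, `strip_tags(add_full_stops("<h1>Hello</h1><p>World</p>"))` gives `"hello. world."`. A heading that already ends in a full stop gets a doubled one, and nested markup such as `<li><p>...</p></li>` gets two stops, so the sentence counter that runs afterwards should ignore empty sentences.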

I noticed you've added the dale_chall list to the wiki. Are you
planning to add the dale_chall function in? I've already written a
quick implementation for work and I can check that in also if you
want.
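For reference, the commonly published New Dale-Chall formula can be sketched as below. This is not the implementation written for work mentioned above; the function name and signature are hypothetical, and `familiar_words` stands in for the roughly 3,000-word Dale-Chall list.

```python
def dale_chall_score(words, sentence_count, familiar_words):
    """Return the New Dale-Chall readability score.

    words          -- list of word tokens from the text
    sentence_count -- number of sentences in the text
    familiar_words -- set of lower-case "easy" words (hypothetical stand-in
                      for the real Dale-Chall list)
    """
    if not words or sentence_count <= 0:
        raise ValueError("need at least one word and one sentence")
    difficult = sum(1 for w in words if w.lower() not in familiar_words)
    pdw = 100.0 * difficult / len(words)  # percentage of difficult words
    asl = len(words) / sentence_count     # average sentence length
    score = 0.1579 * pdw + 0.0496 * asl
    if pdw > 5.0:                         # adjustment for harder texts
        score += 3.6365
    return score
```

Like the other scores, the average-sentence-length term depends on the sentence count, so the full-stop fix for HTML tags matters here too.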

-Joel

On Sep 26, 5:17 am, "David Child" <d...@addedbytes.com> wrote:

> Hi Joel,

> Good points all. I've added you as a member of the project at http://code.google.com/p/php-text-statistics/ - you should be able to
> commit code now. Looking forward to seeing your additions!

> Dave

> On Mon, Sep 22, 2008 at 10:21 AM, Joel Nation <joel...@cyberone.com.au> wrote:

> > The problem with these readability scores is that they don't take into
> > consideration the way HTML works. For instance, you very rarely put a
> > full stop in a heading tag (e.g. <h1>Hello.</h1>), but leaving it out
> > affects most of the scores, because the heading's words are then added
> > to the next sentence and make it longer than it actually is. And with
> > lots of headings you're actually making the page more readable. Ditto
> > for lists, as (at least at my work) we don't generally put a full stop
> > after a list item. Running the scores on one of our pages
> > (http://www.accc.gov.au/content/index.phtml/itemId/815360), I initially
> > get a Flesch-Kincaid Grade Level of 24.9 if I run it over the entire
> > page and 18.2 if I strip the HTML tags out. But I get a much better
> > score of 8.3 if I add full stops after the correct tags before
> > stripping the tags out. Of course, running it over just the content I
> > start with a reading level of 11, but I still end up with 8.3 once I
> > add the full stops. I would suggest that the code be modified to take
> > this into consideration. It's only a few extra lines of code, and I'm
> > happy to check in my changes to a branch if possible.

> --
> AddedBytes.com - Web Marketing and Development
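The effect described in the quoted message follows from the Flesch-Kincaid Grade Level formula, 0.39 x (words / sentences) + 11.8 x (syllables / words) - 15.59: when headings and list items run into the next sentence, the words-per-sentence term grows. The sketch below uses made-up counts (not figures from the ACCC page) to show how much the sentence count alone can move the grade; the function name is hypothetical.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical page: 100 words, 150 syllables.
# Headings merged into body sentences, so only 4 sentences are detected:
merged = flesch_kincaid_grade(100, 4, 150)     # about 11.86
# Full stops added after 6 headings/list items, so 10 sentences:
separated = flesch_kincaid_grade(100, 10, 150)  # about 6.01
```

Only the sentence count differs between the two calls, yet the grade drops by almost six levels.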