Message from discussion Readability of html

MIME-Version: 1.0
Received: by 10.150.54.6 with SMTP id c6mr189161yba.26.1223373658245; Tue, 07 
	Oct 2008 03:00:58 -0700 (PDT)
Date: Tue, 7 Oct 2008 03:00:58 -0700 (PDT)
In-Reply-To: <e8c787520809251117v45cd65eet39788f6ea804ba7a@mail.gmail.com>
X-IP: 59.167.54.10
References: <8c4e15ff-20dd-4683-8ed5-f1dcf5fc81a0@k36g2000pri.googlegroups.com> 
	<e8c787520809251117v45cd65eet39788f6ea804ba7a@mail.gmail.com>
User-Agent: G2/1.0
X-HTTP-UserAgent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; en) 
	AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.22,gzip(gfe),gzip(gfe)
Message-ID: <592b0d74-213b-4628-980c-71668b2dc98c@a18g2000pra.googlegroups.com>
Subject: Re: Readability of html
From: Joel Nation <joel...@cyberone.com.au>
To: php-text-statistics <php-text-statistics@googlegroups.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Okay, I've checked in my first changes. They cover all the HTML tags we
use at my work that should have a full stop in front of them. There
may be a couple of others, but this should cover the vast majority of
HTML use. I didn't use preg_replace; I'm more comfortable outside the
world of regexps! I don't have PHP4, but I'll hopefully check in a PHP4
version tomorrow. For that I'll have to use strtolower and then just
str_replace. I don't have the PHPUnit framework at home, so I haven't
checked in a test yet, but I have one that I've been running and I'll
check it in once I have PHPUnit up and running.
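
To illustrate the idea, here is a minimal sketch in Python rather than the
PHP that was checked in. It mirrors the strtolower-plus-str_replace approach:
lowercase the markup, then put a full stop in front of each closing tag from
a list. The tag list, the function names and the sample page are assumptions
made for the example, not the code in the repository.

```python
import re

# Hypothetical tag list; the tags actually handled in the checked-in
# code are not listed in this thread.
SENTENCE_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6", "li", "p")


def add_full_stops(html):
    """Lowercase the markup and put a full stop before each closing tag."""
    text = html.lower()  # same role as strtolower before str_replace
    for tag in SENTENCE_TAGS:
        closing = "</%s>" % tag
        text = text.replace(closing, "." + closing)
    return text


def strip_tags(html):
    """Crude stand-in for PHP's strip_tags: replace each tag with a space."""
    return re.sub(r"<[^>]+>", " ", html)


def count_sentences(text):
    """Count non-empty chunks between runs of sentence punctuation."""
    return sum(1 for part in re.split(r"[.!?]+", text) if part.strip())


# Made-up sample page: a heading, two list items and one real sentence.
page = (
    "<H1>Contact us</H1>"
    "<ul><li>By phone</li><li>By email</li></ul>"
    "<p>We reply within two days.</p>"
)

sentences_plain = count_sentences(strip_tags(page))
sentences_fixed = count_sentences(strip_tags(add_full_stops(page)))
```

Without the extra full stops the whole sample counts as one long sentence;
with them, the heading and each list item count separately. The doubled stop
after the paragraph is harmless here because runs of punctuation are treated
as a single sentence break.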

I noticed you've added the dale_chall list to the wiki. Are you
planning to add the dale_chall function in? I've already written a
quick implementation for work and I can check that in also if you
want.
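
For reference, a rough sketch of what such a function computes, in Python
and not the implementation mentioned above. It uses the commonly published
Dale-Chall formula: 0.1579 times the percentage of difficult words, plus
0.0496 times the average sentence length, plus 3.6365 when more than 5% of
the words are difficult. The tiny word set is a stand-in for the real
familiar-word list, and the function name and word-splitting rules are
assumptions.

```python
import re

# Stand-in for the Dale-Chall familiar-word list (the real list is far
# longer); used here only so the example is self-contained.
FAMILIAR_WORDS = {"the", "cat", "sat", "on", "mat", "dog", "ran"}


def dale_chall_score(text, familiar_words):
    """Return the Dale-Chall readability score for a piece of plain text."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    difficult = sum(1 for word in words if word not in familiar_words)
    percent_difficult = 100.0 * difficult / len(words)
    average_sentence_length = len(words) / len(sentences)
    score = 0.1579 * percent_difficult + 0.0496 * average_sentence_length
    if percent_difficult > 5:
        score += 3.6365  # adjustment applied above 5% difficult words
    return score


easy_score = dale_chall_score("The cat sat on the mat. The dog ran.", FAMILIAR_WORDS)
hard_score = dale_chall_score("The cat sat on the mat. The dog absconded.", FAMILIAR_WORDS)
```

Like the other scores, this one depends on the sentence count, so the same
missing-full-stop problem in HTML applies to it.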

-Joel

On Sep 26, 5:17 am, "David Child" <d...@addedbytes.com> wrote:
> Hi Joel,
>
> Good points all. I've added you as a member to the project at
> http://code.google.com/p/php-text-statistics/ - you should be able to
> commit code now. Looking forward to seeing your additions!
>
> Dave
>
>
>
> On Mon, Sep 22, 2008 at 10:21 AM, Joel Nation <joel...@cyberone.com.au> wrote:
>
> > The problem with these readability scores is that they don't take into
> > consideration the way HTML works. For instance, you very rarely put a
> > full stop in a heading tag (e.g. <h1>Hello.</h1>), but this affects
> > most of the scores, as the heading text is then added to the next
> > sentence and makes it longer than it actually is. And with lots of
> > headings you are actually making the page more readable. Ditto for
> > lists, as (at least at my work) we don't generally put a full stop
> > after a list item. Running the scores on one of our pages
> > (http://www.accc.gov.au/content/index.phtml/itemId/815360), I
> > initially get a Flesch-Kincaid Grade Level of 24.9 if I run it over
> > the entire page, and 18.2 if I strip the HTML tags out. But I get a
> > much better score of 8.3 if I add full stops after the correct tags
> > before stripping the tags out. Of course, running it over just the
> > content I start with a reading level of 11, but I still end up with
> > 8.3 once I add the full stops. I would suggest that the code be
> > modified to take this into consideration. It's only a few lines of
> > extra code, and I'm happy to check in my changes to a branch if
> > possible.
>
> --
