Readability of html
Message-ID: <e8c787520809251117v45cd65eet39788f6ea804ba7a@mail.gmail.com>
Date: Thu, 25 Sep 2008 19:17:32 +0100
From: "David Child" <d...@addedbytes.com>
To: php-text-statistics@googlegroups.com
Subject: Re: Readability of html
Hi Joel,
Good points all. I've added you as a member to the project at
http://code.google.com/p/php-text-statistics/ - you should be able to
commit code now. Looking forward to seeing your additions!
Dave
On Mon, Sep 22, 2008 at 10:21 AM, Joel Nation <joel...@cyberone.com.au> wrote:
>
> The problem with these readability scores is that they don't take into
> account the way HTML works. For instance, you very rarely put a full
> stop in a heading tag (e.g. <h1>Hello.</h1>), but this affects most of
> the scores, because the heading text is treated as part of the next
> sentence, making it longer than it actually is. Yet with lots of
> headings you are actually making the page more readable. Ditto for
> lists, as (at least at my work) we don't generally put a full stop
> after a list item. Running the scores on one of our pages
> (http://www.accc.gov.au/content/index.phtml/itemId/815360), I initially
> get a Flesch-Kincaid Grade Level of 24.9 if I run it over the entire
> page and 18.2 if I strip the HTML tags out. But I get a much better
> score of 8.3 if I add full stops after the correct tags before
> stripping the tags out. Of course, running it over just the content I
> start with a reading level of 11, but I still end up with 8.3 once I
> add the full stops. I would suggest that the code be modified to take
> this into consideration. It's only a few lines of extra code, and I'm
> happy to check in my changes to a branch if possible.
> >
>
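For anyone following the thread, here is a rough sketch of the preprocessing Joel describes: add a full stop where a heading or list item closes, then strip the tags. It is not the project's code (it is written in Python purely for illustration), and the set of tags treated as sentence boundaries, the function names, and the sample markup are all assumptions, since the message only says "the correct tags".

```python
import re
from html import unescape

# Closing tags treated as sentence boundaries. This list is an assumption;
# the original message only refers to "the correct tags".
BLOCK_CLOSERS = re.compile(r"</(?:h[1-6]|li|p|dt|dd|td|th)\s*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")
MARK = "\x00"  # placeholder that should not occur in real page text


def strip_tags(html: str) -> str:
    """Naive approach: drop every tag and collapse whitespace."""
    text = unescape(ANY_TAG.sub(" ", html))
    return re.sub(r"\s+", " ", text).strip()


def strip_tags_with_full_stops(html: str) -> str:
    """Mark block-level closers, strip tags, then turn marks into full stops."""
    marked = BLOCK_CLOSERS.sub(MARK, html)
    text = unescape(ANY_TAG.sub(" ", marked))
    # Collapse runs of marks (e.g. from </p></li>) into a single mark.
    text = re.sub(r"(?:\s*\x00)+", MARK, text)
    # Text that already ends in . ! or ? needs no extra full stop.
    text = re.sub(r"([.!?])\x00", r"\1 ", text)
    text = text.replace(MARK, ". ")
    return re.sub(r"\s+", " ", text).strip()


def words_per_sentence(text: str) -> float:
    """Average sentence length, counting sentences by end punctuation."""
    words = len(text.split())
    sentences = max(1, len(re.findall(r"[.!?]+(?=\s|$)", text)))
    return words / sentences


# Hypothetical sample markup, not taken from the page mentioned above.
sample = (
    "<h1>Contact us</h1><p>Call the office today.</p>"
    "<ul><li>Phone</li><li><a href='#'>Email</a></li></ul>"
)
naive = strip_tags(sample)
fixed = strip_tags_with_full_stops(sample)
```

On the sample, the naive text has a single terminated sentence and averages 8 words per sentence, while the version with added full stops averages 2. Flesch-Kincaid Grade Level includes a term proportional to average words per sentence, so a drop of the kind reported above is plausible, though real pages will behave differently from this toy input.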
--
AddedBytes.com - Web Marketing and Development