Readability of html
From: Joel Nation <joel...@cyberone.com.au>
Date: Mon, 22 Sep 2008 02:21:49 -0700 (PDT)
Local: Mon 22 Sep 2008 10:21
Subject: Readability of html
The problem with these readability scores is that they don't take into
consideration the way HTML works. For instance, you very rarely put a
full stop in a heading tag (e.g. <h1>Hello.</h1>), but this will affect
most of the scores, as those words will now be added to the next
sentence and make it longer than it actually is. And with lots of
headings you are actually making the page more readable. Ditto for
lists, as (at least at my work) we don't generally put a full stop
after a list item. Running the scores on one of our pages
(http://www.accc.gov.au/content/index.phtml/itemId/815360) I initially
get a Flesch Kincaid Grade Level of 24.9 if I run it over the entire
page, and 18.2 if I strip the HTML tags out. But I get a much better
score of 8.3 if I add full stops after the correct tags before
stripping the tags out. Of course, running it over just the content I
start with a reading level of 11, but I still end up with 8.3 if I add
the full stops. I would suggest that the code should be modified to
take this into consideration. It's only a few lines of extra code and
I'm happy to check in my changes to a branch if possible.
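
As a rough illustration of the idea above, here is a minimal Python sketch. It is not the project's PHP code, which is not shown in this thread. The tag list in `BLOCK_TAGS` and the function names are assumptions made for the example. It replaces selected closing tags with a full stop before stripping the markup, then compares the average sentence length with and without that step.

```python
import re
from html import unescape

# Assumption: closing tags that normally end a sentence-like unit.
BLOCK_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6", "p", "li", "dt", "dd", "td", "th")

_CLOSE = re.compile(r"</(?:%s)\s*>" % "|".join(BLOCK_TAGS), re.IGNORECASE)
_TAG = re.compile(r"<[^>]+>")


def html_to_text(html, add_stops=True):
    """Strip tags, optionally turning block-level closing tags into full stops."""
    if add_stops:
        html = _CLOSE.sub(". ", html)
    text = unescape(_TAG.sub(" ", html))
    text = re.sub(r"\s+", " ", text).strip()
    # Remove the gap left when an inline tag sat just before the new full stop.
    text = re.sub(r"\s+([.!?])", r"\1", text)
    # Collapse doubled terminators where the source already ended in punctuation.
    # Side effect: a literal "..." also collapses to ".".
    text = re.sub(r"([.!?])(?:\s*\.)+", r"\1", text)
    return text


def words_per_sentence(text):
    """Average sentence length, the quantity most readability scores depend on."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    return len(words) / len(sentences)


if __name__ == "__main__":
    page = ("<h1>Opening hours</h1><ul><li>Monday to Friday</li>"
            "<li>Closed on weekends</li></ul><p>Call us first.</p>")
    print(words_per_sentence(html_to_text(page, add_stops=False)))
    print(words_per_sentence(html_to_text(page, add_stops=True)))
```

Without the extra full stops the heading and list items run into one long "sentence"; with them each becomes its own short sentence, which is why the grade level drops.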

From: "David Child" <d...@addedbytes.com>
Date: Thu, 25 Sep 2008 19:17:32 +0100
Local: Thurs 25 Sep 2008 19:17
Subject: Re: Readability of html
Hi Joel,

Good points all. I've added you as a member to the project at
http://code.google.com/p/php-text-statistics/ - you should be able to
commit code now. Looking forward to seeing your additions!

Dave

--
AddedBytes.com - Web Marketing and Development

From: Joel Nation <joel...@cyberone.com.au>
Date: Tue, 7 Oct 2008 03:00:58 -0700 (PDT)
Local: Tues 7 Oct 2008 11:00
Subject: Re: Readability of html
Okay, I checked in my first changes. This covers all the HTML tags we
use at my work that should have a full stop in front of them. There
may be a couple of others, but this should cover the vast majority of
HTML use. I didn't use preg_replace; I'm more comfortable outside the
world of regexps! I don't have PHP4, but I'll hopefully check in a PHP4
version tomorrow. I'll have to use strtolower and then just use
str_replace. I don't have the PHPUnit framework at home so I haven't
checked in a test, but I have a test that I've been running that I'll
check in once I have the PHPUnit framework up and running.
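
The lowercase-then-replace approach mentioned for the PHP4 version can be sketched without regular expressions. This is a Python analogue, not the committed PHP, and `CLOSING_TAGS` is a hypothetical list, since the thread does not say which tags the real change covers. Lowercasing the whole string first lets a plain case-sensitive replace catch `</H1>` and `</h1>` alike. The original capitalisation is lost, which should not matter when only words and sentences are being counted.

```python
# Hypothetical tag list; the tags handled by the real commit are not listed in the thread.
CLOSING_TAGS = [
    "</h1>", "</h2>", "</h3>", "</h4>", "</h5>", "</h6>",
    "</p>", "</li>", "</dd>", "</td>",
]


def add_full_stops_plain(html):
    """Put a full stop in front of each listed closing tag using plain string replacement."""
    lowered = html.lower()  # analogue of PHP's strtolower()
    for tag in CLOSING_TAGS:
        lowered = lowered.replace(tag, "." + tag)  # analogue of str_replace()
    return lowered


if __name__ == "__main__":
    print(add_full_stops_plain("<H1>Hello</H1><ul><LI>One</li></ul>"))
```

The tags are left in place here, so a separate tag-stripping step would still be needed before scoring.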

I noticed you've added the dale_chall list to the wiki. Are you
planning to add the dale_chall function in? I've already written a
quick implementation for work and I can check that in also if you
want.
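
For reference, the New Dale-Chall score is usually given as 0.1579 x (percentage of difficult words) + 0.0496 x (average sentence length), plus 3.6365 when more than 5% of the words are difficult. The Python sketch below is not Joel's implementation, which is not shown in the thread. The tokenising rules are simplifying assumptions, and `common_words` stands in for the real word list.

```python
import re


def dale_chall_score(text, common_words):
    """Raw New Dale-Chall score for text, given a set of lowercase common words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        return 0.0
    # A word counts as "difficult" when it is not in the common word list.
    difficult = sum(1 for w in words if w not in common_words)
    pct_difficult = 100.0 * difficult / len(words)
    avg_sentence_len = len(words) / len(sentences)
    score = 0.1579 * pct_difficult + 0.0496 * avg_sentence_len
    if pct_difficult > 5.0:
        score += 3.6365  # adjustment applied above 5% difficult words
    return score


if __name__ == "__main__":
    tiny_list = {"the", "dog", "ran"}  # placeholder, not the real Dale-Chall list
    print(dale_chall_score("The dog ran. The dog ran to the veterinarian.", tiny_list))
```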

-Joel

From: "David Child" <d...@addedbytes.com>
Date: Tue, 7 Oct 2008 11:27:20 +0100
Local: Tues 7 Oct 2008 11:27
Subject: Re: Readability of html
Hi Joel,

Great work. Will run tests against PHPUnit when at home later, but all
looks fine.

I've been working, sporadically, on a few of the other various
readability scores, including Spache and Dale-Chall. Would be great to
see what you've come up with for Dale-Chall so far.

Some of the readability scores are decidedly ropey, I've come to
realise. Certainly none seem to make use of the power of computers in
any meaningful way. Perhaps it's time to come up with a better
readability score?

Dave


From: Joel Nation <joel...@cyberone.com.au>
Date: Wed, 8 Oct 2008 23:53:37 -0700 (PDT)
Local: Thurs 9 Oct 2008 07:53
Subject: Re: Readability of html
I agree, they do look a little dodgy (especially when it's really
random numbers), but they have all been tested and their relative
effectiveness has been ranked (Dale-Chall being the best one I know
of). I think we can use computers in another way: to suggest how to
improve the text. Since most of them rely on sentence length, the
easiest thing to do is provide a way for the code to highlight your
longest sentences. I'm doing this at work at the moment, but it's
actually not terribly useful, as it's hard to determine what effect
shortening the sentence will have. A smarter way would be for the
system to analyse your longest sentences (say the top 10) and then
determine how much of an effect it would have on the reading level if
you halved the top few (since on average you're probably going to split
a sentence in half). As it goes down the list, the effect on the reading
level would reduce and you could stop when it wasn't reducing it by
more than a certain factor (say half a grade point).
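
The stopping rule described above can be sketched in Python as follows. It assumes the Flesch-Kincaid Grade Level formula, 0.39 x (words / sentences) + 11.8 x (syllables / words) - 15.59, and the function names and defaults are made up for the example. Splitting a sentence in two leaves the word and syllable counts alone and adds one sentence, so only the first term changes and no syllable counter is needed. One consequence is that, under this formula, each split lowers the grade by the same amount whichever sentence is split; the longest sentences are simply the most natural candidates.

```python
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def suggest_splits(text, top_n=10, min_gain=0.5):
    """Return (sentence, estimated grade drop) pairs for sentences worth splitting.

    Walks the longest sentences in order and stops once one more split would
    lower the Flesch-Kincaid grade by less than min_gain.
    """
    sentences = split_sentences(text)
    if not sentences:
        return []
    total_words = sum(len(s.split()) for s in sentences)
    count = len(sentences)
    longest = sorted(sentences, key=lambda s: len(s.split()), reverse=True)[:top_n]

    suggestions = []
    for sentence in longest:
        if len(sentence.split()) < 2:
            break  # a one-word sentence cannot be split
        # Change in the 0.39 * (words / sentences) term after adding one sentence.
        gain = 0.39 * (total_words / count - total_words / (count + 1))
        if gain < min_gain:
            break
        suggestions.append((sentence, round(gain, 2)))
        count += 1  # assume the writer accepts the split
    return suggestions


if __name__ == "__main__":
    sample = " ".join(["alpha"] * 30) + ". " + " ".join(["beta"] * 10) + "."
    for sentence, gain in suggest_splits(sample):
        print(gain, sentence[:40])
```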

Another way we could improve it is to use the Dale-Chall common word
list to highlight the 'complex' words (words not in the common list)
and then suggest alternatives that are in the Dale-Chall list. The
problem here is that you have to
 - somehow work out synonyms (maybe a mash-up with an online
thesaurus)
 - determine which words it makes sense to make common (as some words
are unavoidable: proper nouns, domain terms, etc.)
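
The two points above could be sketched in Python like this. Everything here is hypothetical: `common_words` stands in for the Dale-Chall list, `allowed_terms` is a hand-maintained set of unavoidable words, and `synonyms` is a plain dictionary standing in for whatever thesaurus lookup is eventually used.

```python
import re


def suggest_simpler_words(text, common_words, allowed_terms=frozenset(), synonyms=None):
    """Map each 'complex' word in text to the synonyms found in the common list.

    A word is treated as complex when it is in neither common_words nor
    allowed_terms. Words with no common synonym map to an empty list, so
    they can still be highlighted.
    """
    synonyms = synonyms or {}
    suggestions = {}
    for word in re.findall(r"[A-Za-z']+", text.lower()):
        if word in common_words or word in allowed_terms or word in suggestions:
            continue
        # Keep only the alternatives that are themselves common words.
        suggestions[word] = [alt for alt in synonyms.get(word, []) if alt in common_words]
    return suggestions


if __name__ == "__main__":
    common = {"the", "cat", "sat", "on", "a", "mat", "great"}  # placeholder list
    print(suggest_simpler_words(
        "The cat sat on a magnificent mat. The magnificent ACCC cat.",
        common,
        allowed_terms={"accc"},
        synonyms={"magnificent": ["grand", "great"]},
    ))
```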

From: "David Child" <d...@addedbytes.com>
Date: Sat, 11 Oct 2008 14:00:07 +0100
Local: Sat 11 Oct 2008 14:00
Subject: Re: Readability of html
I've run the unit tests and your changes work fine on the test text -
great stuff, Joel.

It would be useful to have some test HTML to run unit tests against.
I'll start putting some together. I'm also trying to sort out the
Dale-Chall and Spache unit tests. I've added the word lists to the
repository already so others can have a play with them if they want.

Dave


From: "David Child" <d...@addedbytes.com>
Date: Sat, 11 Oct 2008 14:04:38 +0100
Local: Sat 11 Oct 2008 14:04
Subject: Re: Readability of html
And I completely forgot to actually reply to the bulk of your message,
Joel. Sorry about that - was distracted by bacon :)

I think that's a great idea - identifying places for improvements, and
highlighting difficult words, would make a really useful tool. Making
blanket suggestions is a good start, and a synonym mashup would be
very cool.

Also, is it my imagination or do none of the readability scores take
account of commas and semi-colons in text? Surely their addition can
make text far more easily readable?

Dave

