This data comes from a small random selection of the 4.5 million
http:// URLs listed in the Open
Directory Project, collected on . (There is an
obvious strong bias towards some major sites, to English-language and European
sites, to CNN.com, etc.)
Pages were downloaded with
curl, following redirections, and
those that returned HTTP code
Each page was passed through the HTML5 tokenisation algorithm, recording details
about start tags and their attributes and some other features. Non-PCDATA
elements (<script>, etc) were handled
properly, but none of the rest of the tree-construction algorithm was performed.
All data was treated as ISO-8859-1.
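As a rough illustration of the counting step described above, here is a minimal Python sketch. It is not the survey's own code: it swaps in the standard library's html.parser in place of the HTML5 tokenisation algorithm, so its handling of malformed markup will differ from the spec. The names StartTagSurvey and survey_page are hypothetical. Like the survey, it decodes every page as ISO-8859-1 and treats the contents of <script> as raw text rather than markup.

```python
from collections import Counter
from html.parser import HTMLParser


class StartTagSurvey(HTMLParser):
    """Count start tags and (tag, attribute) pairs seen in one page.

    Hypothetical helper: html.parser stands in for the HTML5 tokeniser.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tags = Counter()    # tag name -> number of start tags
        self.attrs = Counter()   # (tag name, attribute name) -> occurrences

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        self.tags[tag] += 1
        for name, _value in attrs:
            self.attrs[(tag, name)] += 1


def survey_page(raw_bytes):
    """Decode a page as ISO-8859-1 and tally its start tags and attributes."""
    # ISO-8859-1 maps every byte to a character, so decoding never fails.
    text = raw_bytes.decode("iso-8859-1")
    parser = StartTagSurvey()
    parser.feed(text)
    parser.close()
    return parser.tags, parser.attrs


if __name__ == "__main__":
    page = b'<p class="a">hi<script>var s = "<b>";</script><img src=x alt=y><p>'
    tags, attrs = survey_page(page)
    for tag, count in sorted(tags.items()):
        print(tag, count)
```

Because the contents of <script> are passed through as text, the "<b>" inside the string literal is not counted as a start tag. Per-page counters like these could then be summed across the whole sample.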
It may be interesting to compare against Rene Saarsoo's survey of pages from the same source a year ago, and Google's older survey from an unidentified set of pages.