BAILANDO Project: WebTango -> Tools -> Profiles of Highly-Rated Web Interfaces

	WebTango Automating Web Site Evaluation
Profiles of Highly-Rated Web Interfaces

Several statistical models have been developed based on analysis of 5,346 pages from 639 sites from the 2000 Webby Awards. The models classify English pages and sites into one of three classes using Webby judges' ratings: good (top 33% of sites); average (middle 34% of sites); and poor (bottom 33% of sites). The page-level and site-level models employed by the Analysis Tool are summarized below. More information can be found in our publications.

Page-Level Models

Overall Page Quality: a decision tree model for classifying a page into the good, average, and poor classes without considering the functional type of a page or the content category (see below). The model also reports the decision tree rule that generated the prediction.

Closest Good Page Cluster: a K-means clustering model for mapping a page into one of the three clusters of good pages: small-page, large-page, and formatted-page. The model reports the distance between the measure values on a page and the measure values at the cluster centroid; this distance reflects the total standard deviation units of difference across all measures. The model reports the top 10 measures that are consistent with the closest cluster. The model also reports the top 10 measures that are inconsistent with the closest cluster and acceptable metric values. In both cases, measures are ordered by their importance in distinguishing pages in the three clusters.

Page Type Quality: discriminant classification models for classifying a page into the good, average, and poor classes when considering the functional type of a page: home, link, content, form, and other. The model reports the top 10 measures that are consistent with the page type. The model also reports the top 10 measures that are inconsistent with the page type and acceptable metric values. In both cases, measures are ordered by their importance in distinguishing pages in the good, average, and poor classes. A separate decision tree model predicts the functional type of a page based on page-level measures.

Content Category Quality: discriminant classification models for classifying a page into the good, average, and poor classes when considering the content category of the site: community, education, finance, health, living, and services. Each model reports the top 10 measures that are consistent with the content category. Each model also reports the top 10 measures that are inconsistent with the content category and acceptable metric values. In both cases, measures are ordered by their importance in distinguishing pages in the good, average, and poor classes. Users specify content categories for sites when using the Analysis Tool.

Site-Level Models

PLEASE NOTE: To ensure consistency with site-level models, the Site Crawler Tool should be used to download pages from sites. The crawler should be configured to crawl 3 levels on each site and to download 15 level-one pages and 3 level-two pages from each level-one page.

Site-level models do not take page-level quality into consideration. Thus, it is possible for a site to be classified as good even though all of the pages in the site are classified as poor and vice versa. To remedy this situation, the median predictions for pages in the site are reported. Please consider these median page-level predictions when determining the overall quality of a site.

Overall Site Quality: a decision tree model for classifying a site into the good, average, and poor classes without considering the content category (see below). The model also reports the decision tree rule that generated the prediction. Predictions generated by the overall page quality model (see above) are considered separately. The median prediction across pages in a site is returned.

Content Category Quality: decision tree models for classifying a site into the good, average, and poor classes when considering the content category of the site: community, education, finance, health, living, and services. Each model reports the decision tree rule that generated the prediction. Predictions generated by the content category page-level models (see above) are considered separately. The median prediction across pages in a site is returned for each content category.
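
The median page-level prediction described above can be illustrated with a minimal Python sketch. It is not the Analysis Tool's implementation: the ordinal encoding (poor < average < good), the function name median_prediction, and the choice to resolve an even number of pages to the lower of the two middle classes are all assumptions made for illustration.

```python
from statistics import median_low

# Assumed ordinal encoding of the three classes: poor < average < good.
CLASS_ORDER = ["poor", "average", "good"]


def median_prediction(page_predictions):
    """Return the median class label across a site's page-level predictions.

    Hypothetical helper: the source does not say how ties are broken,
    so this sketch uses median_low, which always returns an observed
    value (the lower of the two middle values for an even count).
    """
    if not page_predictions:
        raise ValueError("at least one page-level prediction is required")
    ranks = [CLASS_ORDER.index(label) for label in page_predictions]
    return CLASS_ORDER[median_low(ranks)]


# Made-up predictions for five pages of one site.
site_pages = ["poor", "poor", "average", "good", "poor"]
print(median_prediction(site_pages))
```

Comparing this median with the site-level prediction helps flag the case mentioned above, where a site is classified as good although most of its pages are classified as poor.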

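The centroid distance reported by the Closest Good Page Cluster model can be sketched in the same way. The sketch below assumes that "total standard deviation units of difference" means the sum, over all measures, of the absolute difference between the page value and the centroid value divided by that measure's standard deviation. The measure names (word_count, link_count), the centroid values, and the standard deviations are invented for illustration and do not come from the WebTango models.

```python
def centroid_distance(page, centroid, std_dev):
    """Sum of per-measure differences, each expressed in standard deviation units."""
    return sum(abs(page[m] - centroid[m]) / std_dev[m] for m in centroid)


def closest_cluster(page, clusters, std_dev):
    """Return (cluster name, distance) for the centroid nearest to the page."""
    distances = {
        name: centroid_distance(page, centroid, std_dev)
        for name, centroid in clusters.items()
    }
    best = min(distances, key=distances.get)
    return best, distances[best]


# Hypothetical standard deviations and centroids for two made-up measures.
std_dev = {"word_count": 100.0, "link_count": 10.0}
clusters = {
    "small-page": {"word_count": 200.0, "link_count": 10.0},
    "large-page": {"word_count": 800.0, "link_count": 40.0},
    "formatted-page": {"word_count": 400.0, "link_count": 60.0},
}
page = {"word_count": 300.0, "link_count": 20.0}

print(closest_cluster(page, clusters, std_dev))  # ('small-page', 2.0)
```

Dividing by the standard deviation puts measures with different scales on a common footing, so a difference of 100 words and a difference of 10 links each count as one unit in this example.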