Classifying web pages with visual features

Open Access
Authors
Publication date 2010
Host editors
  • J. Filipe
  • J. Cordeiro
Book title WEBIST 2010
Book subtitle proceedings of the 6th International Conference on Web Information Systems and Technologies : Valencia, Spain, April 7-10, 2010
ISBN
  • 9789896740252
Event 6th International Conference on Web Information Systems and Technologies (WEBIST 2010), Valencia, Spain
Volume | Issue number 1
Pages (from-to) 245-252
Publisher Setúbal: INSTICC Press
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
To classify and process web pages automatically, current systems use the textual content of those pages, including both the displayed content and the underlying HTML code. However, a very important feature of a web page is its visual appearance. In this paper, we show that generic visual features can be used to classify web pages for several different types of tasks. The features used are simple color and edge histograms, together with Gabor and texture features, extracted using an off-the-shelf visual feature extraction method. In three experiments, we classify web pages based on their aesthetic value, their recency, and the type of website. The results show that these simple, global visual features already produce good classification results. We also introduce an online tool that uses the trained classifiers to assess new web pages.
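As a rough illustration of what global color and edge histograms might look like, the sketch below computes both for a screenshot given as a nested list of RGB tuples. It is not the off-the-shelf extraction method used in the paper: the function names, the bin counts, and the simple finite-difference gradient are assumptions chosen for brevity, and Gabor and texture features are left out.

```python
import math

def color_histogram(pixels, bins=4):
    """Normalized per-channel histogram; pixels is a 2D list of (r, g, b) ints in 0..255."""
    hist = [0.0] * (3 * bins)
    count = 0
    for row in pixels:
        for rgb in row:
            for channel, value in enumerate(rgb):
                idx = min(value * bins // 256, bins - 1)
                hist[channel * bins + idx] += 1
            count += 1
    return [h / count for h in hist] if count else hist

def edge_histogram(pixels, bins=4):
    """Histogram of gradient orientations in [0, pi), weighted by gradient magnitude."""
    gray = [[sum(rgb) / 3.0 for rgb in row] for row in pixels]
    hist = [0.0] * bins
    height, width = len(gray), len(gray[0])
    for y in range(height - 1):
        for x in range(width - 1):
            # Forward differences stand in for a real edge detector (assumption).
            gx = gray[y][x + 1] - gray[y][x]
            gy = gray[y + 1][x] - gray[y][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi
            idx = min(int(angle / math.pi * bins), bins - 1)
            hist[idx] += magnitude
    total = sum(hist)
    return [v / total for v in hist] if total else hist

def global_features(pixels):
    """Concatenate both histograms into one fixed-length feature vector."""
    return color_histogram(pixels) + edge_histogram(pixels)
```

A vector of this kind could then be passed to any standard classifier trained on labeled pages, for example one labeled by aesthetic value, recency, or website type.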
Document type Conference contribution
Language English
Published at https://doi.org/10.5220/0002804102450252
Published at http://www.few.vu.nl/~vbr240/publications/Webist10Names.pdf