A Quantitative Comparison of Semantic Web Page Segmentation Approaches

Robert Kreuzer, J. Hage, Ad Feelders

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

7 Citations (Scopus)

Abstract

We compare three known semantic web page segmentation algorithms, each serving as an example of a particular approach to the problem, and one self-developed algorithm, WebTerrain, that combines two of the approaches. We compare the performance of the four algorithms for a large benchmark of modern websites we have constructed, examining each algorithm for a total of eight configurations. We found that all algorithms performed better on random pages on average than on popular pages, and results are better when running the algorithms on the HTML obtained from the DOM rather than on the plain HTML. Overall there is much room for improvement as we find the best average F-score to be 0.49, indicating that for modern websites currently available algorithms are not yet of practical use.
Original language: English
Title of host publication: Engineering the Web in the Big Data Era. ICWE 2015
Publisher: Springer
Pages: 374-391
Number of pages: 18
ISBN (Electronic): 9783319198903
ISBN (Print): 9783319198897
Publication status: Published - 10 Jun 2015

Publication series

Name: Lecture Notes in Computer Science
Volume: 9114
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349
