Towards Automated Technologies in the Referencing Quality of Wikidata

Seyed Amir Hosseini Beghaeiraveri*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Wikidata is a general-purpose knowledge graph whose content is crowd-sourced through an open wiki as well as by bot accounts. The Wikidata data model allows a reference to be attached to every individual statement. Wikidata currently contains more than 1 billion statements, about 70% of which have references. Because Wikidata has grown so rapidly, the quality of its references is not well covered in the literature. To address this gap, we propose using automated tools to verify and improve the quality of Wikidata references. To verify reference quality, we develop a comprehensive referencing assessment framework based on Data Quality dimensions and criteria, and we implement the framework as reusable automated scripts. To improve reference quality, we use Relation Extraction methods to build a reference-suggesting framework for Wikidata. During the research, we developed a subsetting approach to create a comparison platform and to handle the large size of Wikidata. We also investigated reference statistics in 6 Wikidata topical subsets. The results of this investigation indicate the need for a wider assessment framework, which we aim to address in this dissertation.
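To make the coverage figure concrete (the share of statements carrying at least one reference), here is a minimal sketch, not the authors' scripts, that counts referenced statements in entity records shaped like Wikidata's JSON serialisation: each entity's `claims` maps property IDs to lists of statements, and a referenced statement holds a non-empty `references` list. The function names `iter_dump_entities` and `reference_coverage` and the toy sample are hypothetical; applying this to a topical subset assumes the subset is stored in the same one-entity-per-line JSON array layout as the full Wikidata JSON dump.

```python
import io
import json
from typing import Iterable, Iterator


def iter_dump_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield entity dicts from a Wikidata-style JSON dump read line by line.

    Assumption: one entity per line inside a single JSON array, i.e. the
    first line is "[", the last is "]", and entity lines end with a comma.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # skip the array brackets and blank lines
        yield json.loads(line)


def reference_coverage(entities: Iterable[dict]) -> tuple[int, int]:
    """Return (total statements, statements with at least one reference)."""
    total = referenced = 0
    for entity in entities:
        # "claims" maps a property ID (e.g. "P31") to a list of statements.
        for statements in entity.get("claims", {}).values():
            for statement in statements:
                total += 1
                # Referenced statements carry a non-empty "references" list.
                if statement.get("references"):
                    referenced += 1
    return total, referenced


# Toy, hand-made sample (hypothetical; not real Wikidata content).
SAMPLE_DUMP = """[
{"id": "Q1", "claims": {"P31": [{"references": [{"snaks": {}}]}, {}]}},
{"id": "Q2", "claims": {"P279": [{"references": [{"snaks": {}}]}]}}
]
"""

total, referenced = reference_coverage(iter_dump_entities(io.StringIO(SAMPLE_DUMP)))
print(f"{referenced}/{total} statements referenced ({referenced / total:.0%})")
```

On the toy sample this prints `2/3 statements referenced (67%)`. Streaming the dump line by line, rather than loading it as one JSON document, keeps memory use flat, which matters at the scale of a full Wikidata dump; a real assessment framework would go further and check reference quality, not just presence.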

Original language: English
Title of host publication: WWW '22: Companion Proceedings of the Web Conference 2022
Publisher: Association for Computing Machinery
Pages: 324-328
Number of pages: 5
ISBN (Electronic): 9781450391306
DOIs
Publication status: Published - 16 Aug 2022
Event: 31st ACM Web Conference 2022 - Virtual, Online, France
Duration: 25 Apr 2022 → …

Conference

Conference: 31st ACM Web Conference 2022
Abbreviated title: WWW 2022
Country/Territory: France
City: Virtual, Online
Period: 25/04/22 → …

Keywords

  • data quality
  • reference quality
  • relation extraction and linking
  • semantic web
  • subsetting
  • topical subset
  • Wikidata

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
