Abstract
Wikidata is the only general-purpose open knowledge graph with the capability of specifying references for every single statement. Currently, about 68% of Wikidata statements have at least one reference, but the quality of these references is rarely covered in data quality studies. There is also a lack of a comprehensive framework for evaluating references. In this paper, we investigate the statistics of Wikidata references in six topical subsets of Wikidata. We compare these statistics over two Wikidata dumps: one from 2016 and one from 2021.
Original language | English |
---|---|
Article number | 3 |
Journal | CEUR Workshop Proceedings |
Volume | 2982 |
Publication status | Published - 14 Oct 2021 |
Event | 2nd Wikidata Workshop 2021 - Duration: 24 Oct 2021 → 24 Oct 2021 https://wikidataworkshop.github.io/2021/ |
Keywords
- Data quality
- Gene Wiki
- Reference quality
- Topical subset
- WikiProject
- Wikidata
ASJC Scopus subject areas
- General Computer Science