Extending defoe for the Efficient Analysis of Historical Texts at Scale

Rosa Filgueira, Claire Grover, Vasilios Karaiskos, Beatrice Alex, Sarah Van Eyndhoven, Lisa Gotthard, Melissa Terras

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents the new facilities provided in defoe, a parallel toolbox for querying a wealth of digitised newspapers and books at scale. defoe has been extended to work with further Natural Language Processing (NLP) tools such as the Edinburgh Geoparser, to store the preprocessed text in several storage facilities, and to support different types of queries and analyses. We have also extended the collection of XML schemas supported by defoe, increasing the versatility of the tool for the analysis of digital historical textual data at scale. Finally, we have conducted several studies in which we worked with humanities and social science researchers who posed complex and interesting questions to large-scale digital collections. Results show that defoe allows researchers to conduct their studies and obtain results faster, while all the large-scale text mining complexity is automatically handled by defoe.
Original language: English
Title of host publication: 17th IEEE International Conference on eScience 2021
Publisher: IEEE
Pages: 1-9
Number of pages: 9
Publication status: Accepted/In press - 10 Jun 2021
Event: 17th IEEE eScience 2021 - online
Duration: 20 Sep 2021 → 23 Sep 2021
https://escience2021.org/

Conference

Conference: 17th IEEE eScience 2021
Period: 20/09/21 → 23/09/21
Internet address

Keywords

  • text mining
  • distributed queries
  • High-Performance Computing
  • XML schemas
  • digital tools
  • digitised primary historical sources
  • humanities research

ASJC Scopus subject areas

  • Information Systems
  • Arts and Humanities (miscellaneous)
