500,000 newspaper pages from the absolute monarchy recreated as digital text data

Read about how researchers have created machine learning models which can recognize both layout and text on newspaper pages with high precision and segment the recognized text.

Mercury surrounded by a banner with the Latin motto "MUNDIVE LOCIOR AURACO"

Photo: ENO – Enevældens Nyheder Online: https://hislab.quarto.pub/eno/

The project Enevældens Nyheder Online (ENO) (News from the Absolute Monarchy Online) aims to recreate Denmark-Norway's newspaper corpus from the period of absolute monarchy as digital text data.

We are interested in the material for themes such as the labor market, crime and consumption. Royal Danish Library's enormous newspaper collection is an underutilised resource for social and cultural history research.

- Johan Heinsen, Aalborg University

More about the project

Project name: Enevældens Nyheder Online – ENO (News from the Absolute Monarchy Online)
Researchers
  • Johan Heinsen, professor, Department of Politics and Society, Aalborg University
  • Camilla Bøgeskov, PhD student, Department of Politics and Society, Aalborg University
Related material
Service from Royal Danish Library: We used the material as it is made available through LOAR (https://loar.kb.dk/collections/3933596a-95ca-4927-b55c-3ba948ea6603) and Mediestream's API.
Royal Danish Library: c. 500,000 newspaper pages in image form with associated metadata about date, edition and page numbering. The images mostly come from the digitization of the newspaper collection's microfilm, but we have also used new photographs of individual series that were not part of the original newspaper digitization.
Contact at Royal Danish Library: Ask the library


The researcher explains further

Drawing on the image files provided and on expert advice from Royal Danish Library, we have created machine learning models that recognize both layout and text on newspaper pages with high precision and segment the recognized text. Among other things, we have used the new version of the text to train a historical language model, DA-BERT_Old_News, which makes it possible to calculate semantic relationships between the more than five million newspaper texts.
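
To give a rough idea of what calculating semantic relationships between texts can look like, here is a minimal sketch. It assumes that each newspaper text has already been turned into a numeric vector (an embedding) by a language model such as DA-BERT_Old_News, and it compares those vectors with cosine similarity. The project description does not say which similarity measure is used, so cosine similarity is an assumption here. The function names, the text labels and the three-dimensional vectors are hypothetical and made up for illustration; real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings):
    """Rank every other text by similarity to the query text, best match first."""
    query = embeddings[query_id]
    scores = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings: invented numbers standing in for model output.
toy_embeddings = {
    "runaway_servant_notice": [0.9, 0.1, 0.0],
    "theft_report": [0.8, 0.3, 0.1],
    "coffee_advertisement": [0.0, 0.2, 0.9],
}

ranking = most_similar("runaway_servant_notice", toy_embeddings)
for text_id, score in ranking:
    print(f"{text_id}: {score:.3f}")
```

In this toy data the theft report ranks well above the coffee advertisement, because its vector points in nearly the same direction as the query vector. With several million texts, a pairwise comparison like this would normally be replaced by an approximate nearest-neighbour index.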