500,000 newspaper pages from the absolute monarchy recreated as digital text data

Read about how researchers have created machine learning models that can recognise both layout and text on newspaper pages with high precision and segment the recognised text.

Mercury surrounded by a banner with the Latin motto "MUNDIVE LOCIOR AURACO" — Photo: ENO – Enevældens Nyheder Online: https://hislab.quarto.pub/eno/

The project Enevældens Nyheder Online (ENO) (News from the Absolute Monarchy Online) aims to recreate Denmark-Norway's newspaper corpus from the period of absolute monarchy as digital text data.

Our interest in the material centres on themes such as the labour market, crime and consumption. Royal Danish Library's enormous newspaper collection is an underutilised resource for social and cultural history research.

- Johan Heinsen, Aalborg University

More about the project

Project name	Enevældens Nyheder Online (ENO) (News from the Absolute Monarchy Online)
Researchers	Johan Heinsen, professor, Department of Politics and Society, Aalborg University; Camilla Bøgeskov, PhD student, Department of Politics and Society, Aalborg University
Related material	Johan Heinsen and Anders Dyrborg Birkemose, “Wanted: Identity, Coercion and Mobility, 1750-1850”, TEMP 14:27, 2023: 24-53; the project's data platform; the project's language model
Service from Royal Danish Library	We used the material as made available through LOAR (https://loar.kb.dk/collections/3933596a-95ca-4927-b55c-3ba948ea6603) and Mediestream's API.
Material from Royal Danish Library	c. 500,000 newspaper pages as images with associated metadata about date, edition and page numbering. The images mostly come from the digitisation of the newspaper collection's microfilm, but we have also used new photographs of individual series that were not part of the original newspaper digitisation.
Contact at Royal Danish Library	Ask the library

The researcher explains further

Based on the image files provided and expert advice from Royal Danish Library, we have created machine learning models that can recognise both layout and text on newspaper pages with high precision, as well as segment the recognised text. Among other things, we have used the resulting text to train a historical language model, DA-BERT_Old_News, which makes it possible to calculate semantic relationships between the more than five million newspaper texts.
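
To give a rough idea of what calculating semantic relationships between texts can look like, here is a minimal sketch in Python. It assumes each newspaper text has already been turned into a numeric vector (an embedding) by a language model, and it compares vectors with cosine similarity, which is a common choice but not necessarily the method the project uses. The text identifiers, the three-dimensional toy vectors and the function names are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, top_k=2):
    """Rank all other texts by their similarity to the text `query_id`."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; real model output would have hundreds of
# dimensions and come from a trained language model, not be written by hand.
embeddings = {
    "runaway_notice_1": [0.9, 0.1, 0.0],
    "runaway_notice_2": [0.8, 0.2, 0.1],
    "auction_advert": [0.1, 0.9, 0.2],
}

for text_id, score in most_similar("runaway_notice_1", embeddings):
    print(f"{text_id}: {score:.3f}")
```

With real embeddings, the same ranking idea could be used to find texts that resemble a given notice or advertisement, although a corpus of several million texts would normally call for a vectorised or indexed search rather than a plain loop.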
