Scanning the oldest printed books

Danish Latin-language literature printed between 1482 and 1600 is now being OCR scanned in a collaboration between the Carlsberg Foundation, Aarhus University and the Royal Danish Library.

A series of old printed books

Photo: Karsten Bundgaard

A large eScience project in the DeiC National Cultural Heritage Cluster at the Royal Danish Library will OCR scan an extensive corpus of Latin texts that are part of the Danish cultural heritage. The project is led by Professor Marianne Pade, Aarhus University.

The Latin texts are being OCR scanned. OCR stands for "Optical Character Recognition", and it makes full-text search of the Latin texts possible. Until now, it has not been possible to OCR scan early printed books with the existing equipment, but this has changed with OCR4all, a method recently developed by a group of researchers from the University of Würzburg specifically for OCR scanning early printed books.

The project will apply the new method to a large body of Danish material that is central to our knowledge of Danish culture in the late Middle Ages and the Renaissance.

All Latin texts will be scanned - or just about

The plan is to scan all Latin texts printed in Danish areas or written by Danes between 1482 and 1600, with the exception of editions of classical Latin authors. The texts are registered in Lauritz Nielsen, Dansk Bibliografi I-II (Copenhagen 1919, 1931-33, 2nd enlarged edition 1996), and texts from the period 1536-1600 are also registered in the Database of Nordic Neo-Latin Literature with all necessary metadata. The Danish areas also include Norway, Iceland, Skåneland and Schleswig. Most of the corpus is already available as regular image files.