Find our open datasets of raw text from monographs and newspapers, as well as audio recordings, in our Library Open Access Repository (LOAR).
Det Kgl. Bibliotek continually takes new initiatives to support data science.
In LOAR, our repository for open research data, we have included the following material:
- datasets based on books printed up to 1881 (due to the 140-year copyright rule)
- datasets with Freedom of Press Writings
- a large collection of OCR (optical character recognition) text based on digitised newspapers from 1660 to 1877
- the Ruben collection, which contains Denmark's first sound recordings (1889-1895)
The datasets can be used for natural language processing and for text and data mining in research and teaching.
Contact us via email@example.com if you have questions about the metadata and uses of the datasets.