Denmark's Letters
The digitisation of Denmark's Letters makes it possible to search across letter publications. For example, you can find letters from the same sender across all publications.
The material behind the dataset
The collection Denmark's Letters contains digitised versions of a large number of printed letter publications from the library's collections, published over many years by many different publishers. The senders and recipients are often leading, influential figures in Danish political and cultural history. Currently, over 13,000 letters dating from the 16th century to 1937 are available. The collection as a whole contains over 70,000 letters, which are released on an ongoing basis as the copyright on the individual letter publications expires.
The digitisations make it possible to search across letter publications. For example, you can find letters from the same sender even if they were not printed in the same publication. You can also run a full-text search to investigate whether a sender uses certain words or phrases. Finally, Denmark's Letters makes the library's many letter publications easier to use, wherever in the world you are located.
Search and view the digitised letters.
About the dataset
The dataset consists of:
- A folder of XML files containing the more than 13,000 letters described above.
- A .txt file containing a Danish stopword list with stopwords from the 18th, 19th and 20th centuries.
- A .csv file containing the letter texts and associated bibliographic metadata. The letter text appears in two columns: raw_text holds the text exactly as extracted from the XML files, while text_st holds a cleaned version in which tabs, newlines and hyphens have been removed and spaces have been inserted before common punctuation marks such as periods, commas and exclamation marks.
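The cleaning that turns raw_text into text_st can be approximated in a few regular-expression steps. This is a minimal sketch, not the library's actual procedure: it assumes that tabs and newlines become single spaces, that words hyphenated across a line break are rejoined, that all remaining hyphens are deleted, and that the set of punctuation marks is `.,!?;:`. The function name `normalize_letter_text` is hypothetical.

```python
import re

# Assumed punctuation set; the dataset description only names periods,
# commas and exclamation marks explicitly.
PUNCT_BEFORE = r"([.,!?;:])"


def normalize_letter_text(raw_text: str) -> str:
    """Approximate the raw_text -> text_st cleaning (hypothetical helper)."""
    # Rejoin words hyphenated across a line break, e.g. "Bre-\nvet" -> "Brevet".
    text = re.sub(r"-[ \t]*\r?\n[ \t]*", "", raw_text)
    # Turn remaining tabs and newlines into spaces.
    text = re.sub(r"[\t\r\n]+", " ", text)
    # Strip any other hyphens, as the description says (assumption: deleted outright).
    text = text.replace("-", "")
    # Put exactly one space before each punctuation mark.
    text = re.sub(r"\s*" + PUNCT_BEFORE, r" \1", text)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()
```

For example, `normalize_letter_text("Kjære Ven,\nTak for Bre-\nvet!\tHilsen")` gives `"Kjære Ven , Tak for Brevet ! Hilsen"`. Separating punctuation this way means a plain whitespace split yields clean word tokens, which is convenient for word counts and stopword filtering.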
An attempt has been made to harmonise the metadata, and the harmonised values have been added as new columns. As part of the harmonisation, names and places have been standardised as far as possible: if a letter states that it was sent from Christiania, Kristiania or Oslo, the standardised column will give Oslo in each case. The most important harmonised fields are sender (sender_st), recipient (recipiant_st) and year (year_st).
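Conceptually, standardising place names is a lookup from variant spellings to one canonical form. The description does not say how the library performed the harmonisation, so the following is only a minimal sketch of the idea; the table `PLACE_VARIANTS` and the function `standardise_place` are hypothetical, and the table contains only the Oslo example given above.

```python
from typing import Dict, Optional

# Hypothetical lookup table for illustration only: variant -> standardised name.
# Keys are case-folded so that lookups ignore capitalisation.
PLACE_VARIANTS: Dict[str, str] = {
    "christiania": "Oslo",
    "kristiania": "Oslo",
    "oslo": "Oslo",
}


def standardise_place(place: Optional[str]) -> Optional[str]:
    """Map a place name as written in a letter to its standardised form.

    Names not in the table are returned unchanged (apart from whitespace
    cleanup), so unknown places are never silently dropped.
    """
    if place is None:
        return None
    cleaned = " ".join(place.split())  # normalise internal and outer whitespace
    return PLACE_VARIANTS.get(cleaned.casefold(), cleaned)
```

Falling back to the original spelling, rather than returning nothing, mirrors the "as far as possible" caveat: places that cannot be matched keep their original form.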
The dataset can be used, for example, for statistics and for plotting networks of senders and recipients. It can also be used to select a cohort of specific historical figures, such as certain women or members of certain professions (soldiers, for example), and to examine the textual content of their letters. Studies of semantic fields and emotions are also possible. Finally, the geographical entities mentioned in the letters can be examined through spatial analyses.
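As a starting point for network and text analyses, the CSV can be read with Python's standard library alone. This minimal sketch assumes a comma-separated file with a header row containing the column names given above (sender_st, recipiant_st, text_st) and a stopword file with one word per line; check the delimiter and encoding of the actual files. All function names are hypothetical, and the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, Set, TextIO

Row = Dict[str, str]


def read_letters(handle: TextIO) -> List[Row]:
    """Read the letter CSV into a list of dicts keyed by column name."""
    # Assumes comma delimiter and a header row.
    return list(csv.DictReader(handle))


def load_stopwords(handle: TextIO) -> Set[str]:
    """Load the stopword list, assuming one word per line."""
    return {line.strip().casefold() for line in handle if line.strip()}


def correspondence_edges(rows: Iterable[Row]) -> Counter:
    """Count letters per (sender_st, recipiant_st) pair, i.e. a weighted edge list."""
    edges: Counter = Counter()
    for row in rows:
        sender = (row.get("sender_st") or "").strip()
        recipient = (row.get("recipiant_st") or "").strip()
        if sender and recipient:  # skip letters lacking a harmonised name
            edges[(sender, recipient)] += 1
    return edges


def word_frequencies(rows: Iterable[Row], sender: str, stopwords: Set[str]) -> Counter:
    """Count non-stopword words in the text_st column of one sender's letters."""
    counts: Counter = Counter()
    for row in rows:
        if (row.get("sender_st") or "").strip() != sender:
            continue
        # text_st has spaces before punctuation, so split() separates it from words;
        # isalpha() then discards the punctuation tokens.
        for token in (row.get("text_st") or "").split():
            word = token.casefold()
            if word.isalpha() and word not in stopwords:
                counts[word] += 1
    return counts


if __name__ == "__main__":
    # Tiny invented sample in the assumed column layout (not real dataset rows).
    sample = io.StringIO(
        "sender_st,recipiant_st,year_st,text_st\n"
        'Sender A,Recipient B,1850,"Kjære Ven , tak for Brevet ."\n'
        'Sender A,Recipient B,1851,"Tak for Brevet og for Bogen ."\n'
        'Sender A,Recipient C,1852,"Brevet kom i Gaar ."\n'
    )
    rows = read_letters(sample)
    stop = load_stopwords(io.StringIO("og\nfor\ni\n"))
    print(correspondence_edges(rows).most_common())
    print(word_frequencies(rows, "Sender A", stop).most_common(3))
```

The edge counter can be handed to a graph or plotting library of your choice, with the counts as edge weights. Restricting `word_frequencies` to a list of senders instead of a single one would give the cohort-based analysis described above.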
The dataset is free of copyright.
The creation of the dataset
The dataset was created from the same data that underlies the access solution. The metadata originates from the original printed letter publications. The digitisation and the metadata work were carried out by the Royal Danish Library. In connection with the digitisation, some of the metadata from the letter publications was harmonised to simplify searching, and in some cases this harmonisation has produced new metadata. The editorial principles of the individual letter publications may differ, as the works were published over a long period and by many different publishers.
The character recognition accuracy for the printed letters is around 99%, which means that the digital text will contain some errors. Accuracy is slightly lower for certain foreign languages and for texts printed in fraktur (Gothic type).
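As a rough illustration of what this means in practice, assume that the 99% figure is a character-level accuracy $a$ and that errors are spread evenly (neither is stated above). The expected number of misrecognised characters in a letter of $N$ characters is then approximately the following; the 2,000-character letter length is a hypothetical example:

```latex
E[\text{errors}] \approx (1 - a)\,N = (1 - 0.99) \times 2000 = 20
```

For fraktur or foreign-language texts, where $a$ is lower, the estimate rises accordingly. This is worth keeping in mind when counting rare words or searching for exact phrases.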