Digitization of the collections at Ømålsordbogen – the Dictionary of Danish Insular Dialects: challenges and opportunities

Research output: Contribution to journal › Conference article › Research › peer-review

Abstract

Ømålsordbogen (the Dictionary of Danish Insular Dialects, henceforth DID) is a historical dictionary giving thorough descriptions of the dialects, i.e. the spoken vernacular of peasants and fishermen, on the Danish isles Seeland, Funen and surrounding islands. It covers the period from 1750 to 1950, the core period being 1850 to 1920. Publishing began in 1992 and the latest volume (11, kurv-lindorm) appeared in 2013, but the project was initiated in 1909 and data collection dates back to the 1920s and 1930s. The project is currently undergoing an extensive process of digitization: old, outdated editing tools have been replaced with modern ones (database, XML, Unicode), and the old, printed volumes have been extracted to XML as well and are now searchable as a single XML file. Furthermore, the underlying physical data collections are being digitized.
In the following we give a brief account of the latter digitization process, involving the physical collections, and we discuss a number of questions and dilemmas that this process gives rise to. The collections underlying the DID project comprise a variety of sub-collections characterized by great heterogeneity in form as well as content. The information on the paper slips is usually densified, often idiosyncratic, and normally complicated to decode, even for other specialists. The digitization process naturally points towards web publication of the collections, either alone or in combination with the edited data, but it also gives rise to a number of questions. Because the current digitization process is very basic, only very few metadata items (one to three) can be added during scanning. We therefore point to the obvious fact that web publication of the collections presupposes the addition of further, carefully selected metadata that take different user needs and qualifications into account. We also discuss the relationship between edited and non-edited data from a publication perspective. Some of the paper slips are very difficult to decipher because of the handwriting or idiosyncratic densification, and we point out that web publication in a raw, i.e. non-edited or non-annotated, form might be more misleading than helpful for a number of users.
Original language: English
Journal: CEUR Workshop Proceedings
Volume: 2084
Pages (from-to): 341-348
Number of pages: 8
ISSN: 1613-0073
Publication status: Published - 3 Apr 2018
Event: Digital Humanities in the Nordic Countries 3rd Conference - HELDIG – the Helsinki Centre for Digital Humanities at the University of Helsinki, the Faculty of Arts, Helsinki, Finland
Duration: 7 Mar 2018 to 9 Mar 2018
Conference number: 3
https://www.helsinki.fi/en/helsinki-centre-for-digital-humanities/dhn-2018

Conference

Conference: Digital Humanities in the Nordic Countries 3rd Conference
Number: 3
Location: HELDIG – the Helsinki Centre for Digital Humanities at the University of Helsinki, the Faculty of Arts
Country/Territory: Finland
City: Helsinki
Period: 07/03/2018 to 09/03/2018

Keywords

  • Faculty of Humanities
  • digital humanities
  • cultural heritage
  • dialect dictionaries
  • digitization
  • metadata
