Integrating TEI/XML Text with Semantic Lexicographic Data

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

Traditional excerption-based historical dictionaries often provide a very detailed semantic analysis of a high proportion of the words in the corpora they cover. The Dictionary of Old Norse Prose, for example, will have analyzed and defined around 7% of all words in an 11-million-word corpus. Linking the semantic analysis of excerpted citations to new digital texts of the works in the corpus offers the potential to give much more detailed context for the citations in the dictionary and, at the same time, contextual semantic information (definitions) for a high proportion of specific words in the corpus. The task is nontrivial, as it involves linking separately formed datasets consisting of tens of thousands of tokens. This paper describes a process by which a very high proportion of citations in the dictionary are linked to individual words in new digital editions, using sorting and lexical information. The result is that users of the dictionary can view the citations in their full textual context, and read
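
The abstract does not spell out the linking procedure. As a rough illustration only, the Python sketch below shows one way sorting and lexical information could be combined: citations and edition tokens are sorted by work and position, and each citation is linked to the nearest unused token within a position window whose normalized word form matches. The classes `Citation` and `Token`, the functions `normalize` and `link`, the shared numeric position scale, the window size, the sigla "Nj" and "Eg", and the diacritic-stripping normalization are all hypothetical assumptions, not the method described in the paper.

```python
from bisect import bisect_left
from dataclasses import dataclass
import unicodedata


@dataclass(frozen=True)
class Citation:
    cit_id: str      # hypothetical identifier of a dictionary citation
    work: str        # hypothetical siglum shared by dictionary and edition
    position: float  # citation's location (e.g. page.line) mapped to a number (assumed)
    form: str        # word form as printed in the citation


@dataclass(frozen=True)
class Token:
    xml_id: str      # @xml:id of a <w> element in the TEI edition (assumed present)
    work: str
    position: float  # same numeric scale as Citation.position (assumed)
    form: str


def normalize(s: str) -> str:
    """Casefold and drop combining diacritics (e.g. á -> a); letters such as ð, þ, æ are kept."""
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def link(citations: list[Citation], tokens: list[Token], window: float = 1.0) -> dict[str, str]:
    """Map each citation to the closest unused token with the same normalized form."""
    # Sort tokens once per work so candidates can be located by binary search.
    by_work: dict[str, list[Token]] = {}
    for tok in sorted(tokens, key=lambda t: (t.work, t.position)):
        by_work.setdefault(tok.work, []).append(tok)
    positions = {w: [t.position for t in toks] for w, toks in by_work.items()}

    links: dict[str, str] = {}
    used: set[str] = set()
    for cit in sorted(citations, key=lambda c: (c.work, c.position)):
        toks = by_work.get(cit.work, [])
        start = bisect_left(positions.get(cit.work, []), cit.position - window)
        target = normalize(cit.form)
        best = None
        for tok in toks[start:]:
            if tok.position > cit.position + window:
                break  # tokens are sorted, so nothing further can fall in the window
            if tok.xml_id in used or normalize(tok.form) != target:
                continue
            if best is None or abs(tok.position - cit.position) < abs(best.position - cit.position):
                best = tok
        if best is not None:
            links[cit.cit_id] = best.xml_id
            used.add(best.xml_id)  # each word in the edition receives at most one citation
    return links


tokens = [
    Token("w1", "Nj", 10.0, "Gunnarr"),
    Token("w2", "Nj", 10.2, "reið"),
    Token("w3", "Nj", 11.0, "heim"),
    Token("w4", "Eg", 3.0, "reið"),
]
citations = [
    Citation("c1", "Nj", 10.0, "reið"),
    Citation("c2", "Eg", 2.5, "Reið"),
    Citation("c3", "Nj", 50.0, "heim"),
]
links = link(citations, tokens, window=1.0)
print(links)  # {'c2': 'w4', 'c1': 'w2'}
```

Here citation "c3" stays unlinked because no matching form lies within the window. Real data would likely need richer lexical matching (for instance, comparing against inflected forms of the headword), since spellings in the dictionary and in an edition may differ.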

Original language: English
Title of host publication: DHN 2020 Digital Humanities in the Nordic Countries 2020: Post-Proceedings of the 5th Conference Digital Humanities in the Nordic Countries (DHN 2020)
Number of pages: 10
Volume: 2865
Place of publication: Riga
Publisher: CEUR Workshop Proceedings
Publication date: 14 May 2021
Pages: 16-25
Publication status: Published - 14 May 2021
