The New Lexicon Poeticum

Publication: Conference contribution, Conference abstract for conference, Research, peer-reviewed

Abstract

The New Lexicon Poeticum (lexiconpoeticum.org) is a project to produce a new lexicographic resource covering Old Norse poetry (initially the category known as skaldic poetry). It is based on the corpus produced by the Skaldic Project (SkP; supported project no. 60 of the Union Académique Internationale, with funding provided by the UK Arts & Humanities Research Council, the Australian Research Council, the Joint Committee of the Nordic Research Councils for Humanities, the National Endowment for the Humanities, the Deutsche Forschungsgemeinschaft and other bodies). SkP is nearing completion, with over 80% of the corpus entered into its digital resource.

SkP was inspired by major problems with previous research, in particular Finnur Jónsson’s edition (1915-18) of the corpus of skaldic poetry (Skj) and the dictionary based on it (Lexicon Poeticum, 2nd ed. 1931). While Skj is a monumental work which has provided the foundation for almost a century of skaldic poetry studies, Finnur Jónsson intervened heavily in the text, with frequent and silent emendation. His lexicon, based on his own corpus, is therefore founded on a body of material that does not accurately reflect the manuscript evidence. It includes a large number of words that exist only through editorial conjecture, and omits large numbers of words that are evidenced in the manuscript tradition, particularly as manuscript variants are largely ignored. This situation has left a significant methodological gap between the material evidence of the poetic lexicon and the resources available to analyse it.

SkP provides the foundation for the current project because it will have re-edited the entire corpus based on current philological and textual editing methodologies. The edition is in the form of a digital resource (skaldic.abdn.ac.uk) from which the printed volumes are exported. It links together the normalised, occasionally emended edition with variant readings, manuscripts, secondary literature, prose contexts and previous editions.
It includes unnormalised transcriptions of the main manuscripts of the corpus and significant numbers of variant manuscripts. The new resource will be linked directly to these resources, enabling the lexicon to be understood in its complex contexts.

The Dictionary of Old Norse Prose (Ordbog over det norrøne prosasprog, ONP), founded in 1939, is the major dictionary of Old Norse. The poetic corpus was specifically excluded from ONP because of the lack of a reliable edition of this material, a lack that is now being addressed by SkP. ONP has a sophisticated database with a web interface that links the lexicon to the citation index and textual corpus. It uses reliable diplomatic editions and manuscript spellings, but is reliant on those editions rather than the manuscripts themselves.

The skaldic project’s corpus is in a relational database structure with all words entered as separate items, with a normalised syntax and translation linked to each word, along with linked manuscript information including variants. It differs from lemmatised XML texts in that the lemmata (headwords) are linked to the (future) dictionary entry. The nature of the corpus is such that there are a very large number of headwords: with 100,000 words lemmatised, over 13,000 headwords have been linked to the corpus. Lemmatising produces an automatic concordance with a full set of contextual translations. Owing to the structure of the corpus database, each headword can be linked to its manuscript witnesses and to the nominal periphrases (kennings) in which it occurs.

There are a number of questions that arise from the project as it has been conceived:

1. How to create interfaces for linking hundreds of thousands of words to tens of thousands of headwords. Additionally, variants add another 20% to the corpus, but need their status and relationship to the manuscript preserved. All this information must be in a form that can be checked and updated.
Some forms of analysis were performed by the original project (diction, i.e. kenningar and heiti; translations; free-text variants); others were not (lexical variants, lemmatising, compounds).

2. How to maintain alignment with both the original database and other lexicographic projects, particularly ONP, so that a word’s use and history can be researched across corpora.

3. As a more general question, how to create a meaningful and useful lexical resource when the original and underlying corpus is so rich in itself, with translation, notes and commentary linked to each word, and how to publish it in the current metrics-driven research environment.

User interfaces

The original skaldic project uses a web interface to enter, edit and manage the data of the project. Relational databases differ from XML in that there is no inherent connection between the data structure and its digital storage (serialisation). This has the advantage that the data can easily be exported in a number of ways, but direct editing of the data is not easy to perform. Early on I developed a web application for viewing the edition, browsing the contextual information and editing the data, with customised forms for entering the textual data and a generic interface for dealing with other information. This allowed editors to produce editions where a putative natural prose order is linked to each text (allowing for easier interpretation and potential morphosyntactic analysis), as well as a translation, with each word linked and reordered. Each stanza has a full set of linked manuscript references, as well as variants linked to both the words and manuscripts.

The process of lemmatising has been performed on the original corpus, again facilitated by the user interface. A web form lists all the words in a stanza or block of text. The user can select the lemma if it has the same form as the text, or look up the lemma by entering a search term.
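The relational structure and the lemma lookup described above can be illustrated with a minimal sketch. It is not the project's actual schema or code: the table names (`lemma`, `word`), their columns, the sample rows and the helpers `suggest_lemmas`, `link_word` and `concordance` are all hypothetical. They are chosen only to show how storing each word as its own row linked to a headword can yield a concordance through a simple query.

```python
import sqlite3

# Hypothetical, simplified schema. NOT the Skaldic Project's real tables.
SCHEMA = """
CREATE TABLE lemma (
    id       INTEGER PRIMARY KEY,
    headword TEXT NOT NULL
);
CREATE TABLE word (
    id          INTEGER PRIMARY KEY,
    stanza      TEXT NOT NULL,     -- illustrative stanza label
    position    INTEGER NOT NULL,  -- order of the word in the text
    form        TEXT NOT NULL,     -- form as it appears in the edited text
    translation TEXT,              -- contextual translation of this word
    lemma_id    INTEGER REFERENCES lemma(id)  -- NULL until lemmatised
);
"""


def build_demo_db():
    """Create an in-memory database with a few invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO lemma (id, headword) VALUES (?, ?)",
        [(1, "konungr"), (2, "kona"), (3, "sverð")],
    )
    conn.executemany(
        "INSERT INTO word (stanza, position, form, translation, lemma_id) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            ("demo 1", 1, "konungr", "king", None),
            ("demo 1", 2, "sverði", "with a sword", None),
            ("demo 2", 1, "konungs", "of the king", None),
        ],
    )
    return conn


def suggest_lemmas(conn, form, search=None):
    """Offer headwords identical to the text form; otherwise fall back to
    a prefix search on a term entered by the user."""
    exact = conn.execute(
        "SELECT id, headword FROM lemma WHERE headword = ?", (form,)
    ).fetchall()
    if exact or search is None:
        return exact
    return conn.execute(
        "SELECT id, headword FROM lemma WHERE headword LIKE ? ORDER BY headword",
        (search + "%",),
    ).fetchall()


def link_word(conn, word_id, lemma_id):
    """Record the user's confirmed choice of headword for one word."""
    conn.execute("UPDATE word SET lemma_id = ? WHERE id = ?", (lemma_id, word_id))


def concordance(conn, lemma_id):
    """List every linked occurrence of a headword with its contextual gloss."""
    return conn.execute(
        "SELECT stanza, position, form, translation FROM word "
        "WHERE lemma_id = ? ORDER BY stanza, position",
        (lemma_id,),
    ).fetchall()
```

In this sketch the concordance is never stored separately: once `link_word` has recorded a choice, `concordance` simply reads back the linked rows, which is one way a lemmatised relational corpus can produce its word list as a by-product.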
Variations in form and spelling are saved and used to prompt the user when they next occur, although all choices must be confirmed manually. The word list was originally taken from ONP (with permission) and has been supplemented as new headwords are identified.

Figure 1: Detail of form for assisted lemmatisation

The new lexicon will include all variant manuscript readings, something that previous lexica poetica have not documented systematically. As the original variants were entered as free text, rather than as words within the data structure for words in the database, the new project needs to add these to the corpus. To aid this process I have created a web form which uses the variant apparatus in the corpus database to prompt the user to add lexical variants and link them to headwords. This is a complex process, with no direct correspondence between the words linked in the main text and those in the variants, but the interface attempts to analyse the information in the database to facilitate the process.

Figure 2: Detail of form for adding and lemmatising lexical variants

Relationship to other dictionaries

The original word list for the lexicon was copied from ONP almost a decade ago. Unfortunately the original unique identifiers for this list were not saved, and both the original ONP wordlist and the new lexicon’s wordlist have continued to evolve. The connection between headwords in the two lexica is therefore not reliable, but we are making efforts to recover and check this information so that a single interface can be built to both resources.

There are still some questions regarding the nature and function of the new lexicon. The process of lemmatising a corpus with translations linked to each word already produces a concordance of all words, with a gloss that effectively gives the editor's interpretation of each word. Further information about each word can often be found in the notes linked to the word.
What, then, does a dictionary entry for the word add to the information already available? Additionally, the prose dictionary ONP will have covered the more common words in the lexicon more comprehensively. Should the new Lexicon Poeticum (LP) simply supplement that lexicon, or should it be a full description of the skaldic lexicon in its own right? These questions derive from broader issues about the nature of traditional scholarship as digital humanities (DH) methods become increasingly sophisticated.

Using and visualising the data

The linking of the rich corpus to dictionary headwords in itself provides an enormous amount of information for each word. The current interface shows all instances of each word with contextual translation and linked notes where relevant, plus compounds. Words occurring within kennings (nominal periphrases) are also explained in this context. Additionally, using the linked manuscript information, all manuscripts representing the word in both the base text and variants can be listed.

Analysis can be performed on this information to see, for example, the way parts of speech are distributed within each stanza and half-stanza of poetry. We plan to perform more nuanced analyses of the metrics by using the grammatical information linked by this process to identify line types (e.g. the Sievers/Kuhn system). Additional dating information for both the manuscripts and the poetry (albeit unreliable at this stage) allows us to trace the history of the word in its poetic and material sources. Likewise, adding geographical data based on the poem’s place of composition and/or recitation allows us to perform diatopic analyses of the words and language of the corpus.

Bibliography

Tarrin Wills, ‘The thirteenth-century runic revival in Denmark’, NOWELE 67 (2016), 114-129.
Tarrin Wills, ‘Social Media as a Research Method’, Communication, Research & Practice [special issue ‘Digital Media Research Methods: How to research and the implications of new media data’], 2:1 (2016), 7-19. doi:10.1080/22041451.2016.1155312
Tarrin Wills, ‘Semantic modelling of the Pre-Christian Religions of the North’, Digital Medievalist 9 (2014) <http://www.digitalmedievalist.org/journal/9/wills/>.
Tarrin Wills, ‘Relational Data Modelling of Textual Corpora: The Skaldic Project and its Extensions’, Literary and Linguistic Computing [Digital Scholarship in the Humanities] (2013) doi:10.1093/llc/fqt045.
Odd Einar Haugen, Matthew Driscoll, Karl Gunnar Johansson, Rune Kyrkjebø, Tarrin Wills, The Menota Handbook: Guidelines for the electronic encoding of medieval Nordic primary sources (Bergen: Medieval Nordic Text Archive (Menota), 2008).

Original language: English
Publication date: 2017
Status: Published - 2017
Event: Digital Humanities in the Nordic Countries, 2nd Conference - Gothenburg, Sweden
Duration: 14 Mar 2017 to 16 Mar 2017

Conference

Conference: Digital Humanities in the Nordic Countries, 2nd Conference
Country/Territory: Sweden
City: Gothenburg
Period: 14/03/2017 to 16/03/2017
