Abstract
Analyzing direct speech in historical literary texts provides insights into character dynamics, narrative style, and discourse patterns. In late 19th-century Danish and Norwegian fiction, direct speech reflects characters’ social and geographical backgrounds. However, inconsistent typographic conventions in Scandinavian literature complicate computational methods for distinguishing direct speech from other narrative elements. To address this, we introduce an annotated dataset from the MeMo corpus, capturing speech markers and tags in Danish and Norwegian novels. We evaluate pre-trained language models for classifying direct speech, with results showing that a Danish Foundation Model (DFM), trained on extensive Danish data, achieves the highest performance. Finally, we conduct a classifier-assisted quantitative corpus analysis and find a downward trend in the prevalence of speech over time.
Original language | Danish |
---|---|
Title of host publication | Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025) : Proceedings of the Conference : March 3-4, 2025 |
Editors | Sara Stymne, Richard Johansson |
Publisher | University of Tartu Library |
Publication date | 3 Mar 2025 |
Pages | 1-7 |
DOIs | |
Publication status | Published - 3 Mar 2025 |
Event | NoDaLiDa/Baltic-HLT 2025 - Tallinn, Estonia; Duration: 3 Mar 2025 → 4 Mar 2025 |
Conference
Conference | NoDaLiDa/Baltic-HLT 2025 |
---|---|
Country/Territory | Estonia |
City | Tallinn |
Period | 03/03/2025 → 04/03/2025 |