Building Sense Representations in Danish by Combining Word Embeddings with Lexical Resources

Ida Rørmann Olsen, Asad Sayeed, Bolette Sandford Pedersen

Research output: Chapter in Book/Report/Conference proceeding, Article in proceedings, Research, peer-review


Abstract

Our aim is to identify suitable sense representations for NLP in Danish. We investigate sense inventories that correlate with human interpretations of word meaning and ambiguity, as typically described in dictionaries and wordnets, and that are well reflected distributionally, as expressed in word embeddings. To this end, we study a number of highly ambiguous Danish nouns and examine the effectiveness of sense representations constructed by combining vectors from a distributional model with the information from a wordnet. We establish representations based on centroids obtained from wordnet synsets and example sentences, as well as representations established via a clustering approach; these representations are tested in a word sense disambiguation task. We conclude that the more information is extracted from the wordnet entries (example sentence, definition, semantic relations), the more successful the sense representation vector is.
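
The centroid-based idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the three-dimensional vectors, the word lists standing in for wordnet entries, and the function names `centroid` and `disambiguate` are all made up for illustration, and the Danish noun "skade" (injury / magpie) is used only as a toy example. Each sense vector is the mean of the embeddings of the words attached to that sense, and a context is assigned to the sense whose centroid is closest by cosine similarity.

```python
import numpy as np

# Toy embeddings (hypothetical values, not taken from any real model).
EMB = {
    "ulykke":   np.array([1.0, 0.1, 0.0]),
    "sår":      np.array([0.9, 0.0, 0.1]),
    "fugl":     np.array([0.0, 1.0, 0.1]),
    "rede":     np.array([0.1, 0.9, 0.0]),
    "hospital": np.array([0.8, 0.0, 0.2]),
    "flyve":    np.array([0.0, 0.8, 0.2]),
}

# Words standing in for the content of two wordnet entries (assumed lists).
SENSE_WORDS = {
    "injury": ["ulykke", "sår"],
    "bird":   ["fugl", "rede"],
}


def centroid(words, emb):
    """Mean vector of the words found in the embedding table."""
    vectors = [emb[w] for w in words if w in emb]
    if not vectors:
        raise ValueError("none of the words has an embedding")
    return np.mean(vectors, axis=0)


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def disambiguate(context_words, sense_words, emb):
    """Return the sense whose centroid is most similar to the context centroid."""
    context_vec = centroid(context_words, emb)
    scores = {
        sense: cosine(context_vec, centroid(words, emb))
        for sense, words in sense_words.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(disambiguate(["hospital", "ulykke"], SENSE_WORDS, EMB))
    print(disambiguate(["flyve", "fugl"], SENSE_WORDS, EMB))
```

In this sketch, adding more words per sense (for instance from a definition or from related synsets) simply changes the list that is averaged, which is one simple way to vary how much wordnet information enters each sense vector.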

Original language: English
Title of host publication: Globalex Workshop on Linked Lexicography: LREC 2020 Workshop Language Resources and Evaluation Conference
Number of pages: 7
Place of Publication: Marseille, France
Publisher: European Language Resources Association
Publication date: 2020
Pages: 45-52
ISBN (Electronic): 979-10-95546-46-7
Publication status: Published - 2020
