Abstract
We introduce a word embedding method that generates a set of real-valued word vectors from a distributional semantic space. The semantic space is built from a set of context units (words) selected by an entropy-based feature selection approach, according to the certainty of their contextual environments. We show that the most predictive context of a target word is its preceding word. We also introduce an adaptive transformation function that reshapes the data distribution to make it suitable for dimensionality reduction techniques. The final low-dimensional word vectors are formed from the singular vectors of a matrix of transformed data. We show that the resulting word vectors perform on a par with word vectors generated by popular word embedding methods.
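To make the pipeline in the abstract concrete, the sketch below walks through its stages on a toy corpus: counting preceding-word contexts, entropy-based context selection, a transformation of the counts, and SVD-based dimensionality reduction. The median entropy cutoff, the `log1p` transform standing in for the paper's adaptive transformation function, and the toy corpus are all illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

corpus = ("the cat sat on the mat the dog sat on the rug "
          "a cat chased a dog on the mat").split()

vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Co-occurrence counts: rows are target words, columns are preceding-word
# contexts (the abstract reports the preceding word as most predictive).
C = np.zeros((V, V))
for prev, tgt in zip(corpus, corpus[1:]):
    C[idx[tgt], idx[prev]] += 1.0

# Entropy of each context column over targets; low entropy = high certainty.
col_sums = np.maximum(C.sum(axis=0, keepdims=True), 1e-12)
p = C / col_sums
safe_p = np.where(p > 0, p, 1.0)          # log2(1) = 0, so zero cells drop out
entropy = -(p * np.log2(safe_p)).sum(axis=0)

# Keep the more certain (lower-entropy) half of the contexts; the median
# cutoff is an arbitrary illustrative choice, not the paper's criterion.
C = C[:, entropy <= np.median(entropy)]

# log1p is only a stand-in for the paper's adaptive transformation function.
X = np.log1p(C)

# Low-dimensional word vectors from truncated singular vectors of X.
U, S, _ = np.linalg.svd(X, full_matrices=False)
d = min(5, len(S))
word_vectors = U[:, :d] * S[:d]
print(word_vectors.shape)  # (vocabulary size, d)
```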
| Original language | English |
|---|---|
| Journal | Journal of Experimental and Theoretical Artificial Intelligence |
| Volume | 32 |
| Issue number | 4 |
| Pages (from-to) | 557-579 |
| Number of pages | 23 |
| ISSN | 0952-813X |
| DOI | |
| Status | Published - 3 Jul 2020 |
| Published externally | Yes |
Bibliographical note
Publisher Copyright: © 2019 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.