TY - GEN
T1 - Contextually propagated term weights for document representation
AU - Hansen, Casper
AU - Hansen, Christian
AU - Alstrup, Stephen
AU - Simonsen, Jakob Grue
AU - Lioma, Christina
PY - 2019/7/18
Y1 - 2019/7/18
AB - Word embeddings predict a word from its neighbours by learning small, dense embedding vectors. In practice, this prediction corresponds to a semantic score (or term weight) assigned to the predicted word. We present a novel model that, given a target word, redistributes part of that word's weight (computed with word embeddings) across words that occur in contexts similar to those of the target word. Our model thus aims to simulate how semantic meaning is shared by words occurring in similar contexts, and incorporates this shared meaning into bag-of-words document representations. Experimental evaluation in an unsupervised setting against 8 state-of-the-art baselines shows that our model yields the best micro and macro F1 scores across datasets of increasing difficulty.
KW - Contextual semantics
KW - Document representation
KW - Word embeddings
U2 - 10.1145/3331184.3331307
DO - 10.1145/3331184.3331307
M3 - Article in proceedings
T3 - SIGIR 2019 - Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 897
EP - 900
BT - SIGIR 2019 - Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery
T2 - 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019
Y2 - 21 July 2019 through 25 July 2019
ER -