Preliminary study into query translation for patent retrieval

Charles Jochim, Christina Lioma, Hinrich Schütze, Steffen Koch, Thomas Ertl

Publication: Contribution to book/anthology/report › Contribution to book/anthology › Research › peer-reviewed

12 Citations (Scopus)

Abstract

Patent retrieval is a branch of Information Retrieval (IR) aiming to support patent professionals in retrieving patents that satisfy their information needs. Often, patent granting bodies require patents to be partially translated into one or more major foreign languages, so that language boundaries do not hinder their accessibility. This multilinguality of patent collections offers opportunities for improving patent retrieval. In this work we exploit these opportunities by applying query translation to patent retrieval. We expand monolingual patent queries with their translations, using both a domain-specific patent dictionary that we extract from the patent collection, and a general domain-free dictionary. Experimental evaluation on a standard CLEF-IP dataset shows that using either translation dictionary yields similar results: query translation can help patent retrieval, but not always, and without great improvement compared to standard statistical monolingual query expansion (Rocchio). The improvement is greater when the source language is English, as opposed to French or German, a finding partly due to the effect of the complex French and German morphology upon translation accuracy, but also partly due to the prevalence of English in the collection. A thorough per-query analysis reveals that cases where standard query expansion fails (e.g., zero recall) can benefit from query translation.
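
The dictionary-based expansion mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `expand_query`, the `max_translations` cutoff, and the toy dictionary entries are all hypothetical, and the abstract does not say how translations are selected or weighted.

```python
def expand_query(query_terms, dictionary, max_translations=2):
    """Return the original terms followed by dictionary translations.

    Assumption: each source term maps to a ranked list of translations,
    and at most `max_translations` of them are appended per term.
    """
    expanded = list(query_terms)
    for term in query_terms:
        translations = dictionary.get(term.lower(), [])
        expanded.extend(translations[:max_translations])
    return expanded


# Hypothetical toy entries (English -> German, French); not from the paper.
toy_dictionary = {
    "valve": ["ventil", "soupape"],
    "engine": ["motor", "moteur"],
}

query = ["engine", "valve", "assembly"]
expanded = expand_query(query, toy_dictionary)
print(expanded)
```

Terms without a dictionary entry (here "assembly") are kept untranslated, so the expanded query always contains the original monolingual query.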
Original language: English
Title: Proceedings of the 3rd international workshop on Patent information retrieval
Number of pages: 10
Publisher: Association for Computing Machinery
Publication date: 2010
Pages: 57-66
ISBN (Electronic): 978-1-4503-0384-2
DOI
Status: Published - 2010
Published externally: Yes
Event: 3rd International Workshop on Patent Information Retrieval - Toronto, Canada
Duration: 26 Oct 2010 → 26 Oct 2010
Conference number: 3

Conference

Conference: 3rd International Workshop on Patent Information Retrieval
Number: 3
Country/Territory: Canada
City: Toronto
Period: 26/10/2010 → 26/10/2010
