KMI @ NTCIR CrossLink competition

Petr Knoth, Friday 09 September 2011
The KMI team, consisting of Petr Knoth, Lukas Zilka and Zdenek Zdrahal, took first place in the NTCIR CrossLink competition in the manual assessment category for A2F P@5 (precision at 5 for anchor-to-file links). The team also consistently placed in the top three in the other categories. Twelve international teams took part in the evaluation.

NTCIR is a major series of evaluation workshops (similar to TREC) designed to enhance research in Information Access (IA) technologies, including information retrieval, question answering, text summarization and information extraction. NTCIR-9 will take place, as usual, in Tokyo, Japan, this December.

The CrossLink task (Cross-Lingual Link Discovery, CLLD) is concerned with automatically finding potential links between documents in different languages. It differs from traditional cross-lingual information retrieval (CLIR): CLIR can be viewed as creating a virtual link between a given cross-lingual query and the retrieved documents, whereas CLLD actively recommends a set of meaningful anchors in the source document and uses them, together with contextual information from the surrounding text, as queries to establish links with documents in other languages.

Wikipedia is an online multilingual encyclopaedia containing a very large number of articles in most written languages, and it includes extensive hypertext links between documents in the same language for easy reading and referencing. However, pages in different languages are rarely linked, apart from the cross-language links between pages about the same subject. This can pose serious difficulties for users seeking information or knowledge from sources in other languages. Cross-lingual link discovery therefore aims to break down the language barrier in knowledge sharing. With CLLD, users can discover documents in languages they are familiar with, or in languages that offer a richer set of documents than their language of choice.
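The headline result is reported as P@5, i.e. precision at rank 5: the fraction of the top five recommended link targets that assessors judged relevant. As a minimal sketch, assuming a ranked list of candidate target documents for one topic and a set of assessor-approved targets (the function name and document ids below are hypothetical, and this is not the official NTCIR evaluation tool), the measure can be computed like this:

```python
def precision_at_k(ranked, relevant, k=5):
    """Return the fraction of the top-k ranked items that appear in `relevant`.

    Follows the usual convention of dividing by k even when fewer than k
    items were returned, so short result lists are penalised.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    relevant = set(relevant)
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


# Hypothetical example: ranked target-document ids suggested for one topic,
# and the subset that manual assessors judged to be relevant links.
ranked = ["doc_a", "doc_b", "doc_c", "doc_d", "doc_e", "doc_f"]
relevant = {"doc_b", "doc_d", "doc_f"}

print(precision_at_k(ranked, relevant))  # 0.4 (doc_b and doc_d are in the top 5)
```

Here two of the top five candidates are relevant, so P@5 is 0.4; an evaluation campaign would typically average such per-topic scores over all test topics.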