KMi Publications

Tech Reports

Tech Report kmi-06-09 Abstract


LRD: Latent Relation Discovery for Vector Space Expansion and Information Retrieval
Techreport ID: kmi-06-09
Date: 2006
Author(s): Alexandre Gonçalves, Jianhan Zhu, Dawei Song, Victoria Uren, Roberto Pacheco

In this paper, we propose a text mining method called LRD (latent relation discovery), which extends the traditional vector space model of document representation to improve information retrieval (IR) and document clustering. LRD extracts terms and entities, such as person, organization, or project names, and discovers relationships between them from their co-occurrence in textual corpora. Given a target entity, LRD effectively and efficiently discovers other entities closely related to it. A measure of relation strength between entities is defined to quantify this relatedness. LRD uses relation strength to enhance the vector space model, and applies the enhanced model to query-based IR and to document clustering in order to discover complex relationships among terms and entities. Our experiments on a standard dataset for query-based IR show that LRD performed significantly better than the traditional vector space model and five other standard statistical methods for vector expansion.
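The general idea of co-occurrence-based vector expansion can be illustrated with a minimal Python sketch. The abstract does not define the relation strength measure, so this sketch substitutes a simple Jaccard-style score over documents as an assumption. The function names `cooccurrence_strength` and `expand_vector`, the `weight` parameter, and the toy documents are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_strength(docs):
    """Map each ordered pair (a, b) to a Jaccard-style co-occurrence score.

    Assumption: strength = (#docs containing both) / (#docs containing either).
    This is a stand-in, not the relation strength measure defined in the paper.
    """
    doc_freq = Counter()   # number of documents containing each term
    pair_freq = Counter()  # number of documents containing both terms of a pair
    for doc in docs:
        terms = set(doc)
        doc_freq.update(terms)
        pair_freq.update(combinations(sorted(terms), 2))

    strength = {}
    for (a, b), both in pair_freq.items():
        score = both / (doc_freq[a] + doc_freq[b] - both)
        strength[(a, b)] = score
        strength[(b, a)] = score  # the score is symmetric
    return strength


def expand_vector(vector, strength, weight=0.5):
    """Add terms related to those in `vector`, scaled by relation strength.

    `weight` (hypothetical) damps the contribution of expanded terms so that
    the original terms keep the largest weights.
    """
    expanded = dict(vector)
    for term, value in vector.items():
        for (a, b), score in strength.items():
            if a == term and b not in vector:
                expanded[b] = expanded.get(b, 0.0) + weight * value * score
    return expanded


# Toy corpus: each document is a list of already-extracted terms and entities.
docs = [
    ["lrd", "retrieval", "kmi"],
    ["lrd", "retrieval"],
    ["kmi", "clustering"],
]
strength = cooccurrence_strength(docs)
query = expand_vector({"lrd": 1.0}, strength)
print(query)
```

In this toy corpus, "retrieval" co-occurs with "lrd" in every document that mentions either term, so it enters the expanded query with the largest added weight, while "clustering" never co-occurs with "lrd" and is not added.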

Publication(s):

Alexandre Gonçalves, Jianhan Zhu, Dawei Song, Victoria Uren, Roberto Pacheco. LRD: Latent Relation Discovery for Vector Space Expansion and Information Retrieval. In Proc. of the Seventh International Conference on Web-Age Information Management (WAIM 2006), June, Hong Kong, China.
 
 



Multimedia and Information Systems
Our research is centred on the theme of Multimedia Information Retrieval, i.e., Video Search Engines, Image Databases, Spoken Document Retrieval, Music Retrieval, Query Languages and Query Mediation.

We focus on content-based information retrieval over a wide range of data, spanning from unstructured text and unlabelled images, through spoken documents and music, to videos. This encompasses the modelling of human perception of relevance and similarity, learning from user actions, and the up-to-date presentation of information. Currently we are building an integrated multimedia information retrieval system, MIR, to be used as a research prototype. We aim for a system that understands the user's information need and successfully links it to the appropriate information sources, be it a report or a TV news clip. This work is guided by the vision that an automated knowledge extraction system ultimately empowers people to make efficient use of information sources without the burden of filing data into specialised databases.
