Full Seminar Details
João Magalhães
Imperial College London, and KMi, The Open University

This event took place on Wednesday 27 June 2007 at 11:30
To solve the problem of indexing collections with diverse text documents, image documents, or documents with both text and images, one needs to develop a model that supports heterogeneous types of documents.
In this paper, we show how information theory supplies the tools needed to develop a single model for text, image, and combined text/image retrieval. In our approach, we estimate a maximum entropy model for each possible query keyword, based exclusively on pre-processed continuous features. The single continuous feature space shared by text and visual data is constructed using a minimum description length criterion to find the optimal feature-space representation (optimal from an information-theoretic point of view). We evaluate our approach in three experiments: text-only retrieval, image-only retrieval, and combined text and image retrieval.
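To make the per-keyword idea concrete, here is a minimal sketch, not the speaker's implementation. It relies on the standard fact that a binary maximum entropy model is equivalent to logistic regression, and it assumes each document (text, image, or both) has already been mapped to a fixed-length vector of continuous features. The minimum description length step that builds that feature space is not shown. All names (`train_keyword_model`, `keyword_probability`, `rank_documents`), the toy data, and the training settings are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_keyword_model(features, labels, epochs=500, lr=0.5, l2=0.01):
    """Fit a binary maximum entropy (logistic regression) model for one keyword.

    features: list of equal-length lists of floats (continuous features).
    labels:   list of 0/1 flags, 1 if the document is annotated with the keyword.
    Plain batch gradient descent with L2 regularisation (an assumed choice).
    """
    dim = len(features[0])
    n = len(features)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        for j in range(dim):
            weights[j] -= lr * (grad_w[j] / n + l2 * weights[j])
        bias -= lr * grad_b / n
    return weights, bias


def keyword_probability(model, x):
    """Probability that a document with feature vector x carries the keyword."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def rank_documents(model, docs):
    """Return document ids sorted by decreasing keyword probability."""
    return sorted(docs, key=lambda d: keyword_probability(model, docs[d]),
                  reverse=True)


# Toy, made-up data: two continuous features per document.
train_x = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
train_y = [1, 1, 0, 0]  # 1 = annotated with the hypothetical keyword "beach"
beach_model = train_keyword_model(train_x, train_y)

collection = {"doc_a": [0.15, 0.8], "doc_b": [0.85, 0.15]}
ranking = rank_documents(beach_model, collection)
```

Under these assumptions, one such model would be trained per query keyword, and a keyword query would be answered by sorting the collection by the corresponding model's output, regardless of whether a given feature vector came from text, an image, or both.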