Research

Finished Projects

 

Peter Howarth (2003-2007): Multimedia Indexing

This thesis investigates three of the core components of content-based image retrieval: visual features, similarity functions and indexing methods. In the content-based paradigm images are searched in a purely visual domain, where they are represented by high-dimensional features. Exhaustively searching this feature space can take a prohibitively long time. I argue that by the use of judicious approximations we can search large collections interactively, while keeping a good level of retrieval performance. I first carry out a thorough investigation of texture features. This results in novel feature modifications which give significant improvement in retrieval performance. The adaptations are based on considering the representation of texture within whole, real images.

Secondly, I present the first application of fractional dissimilarity functions to image retrieval. These functions emphasise points close to the query in each dimension, while reducing the noise from others. Through a thorough evaluation we show that these functions give a consistent improvement in retrieval performance across a wide range of collections and features. Following this theme, I generalise a class of local similarity functions that exhibit similar behaviour. These functions decompose features into dimensions and index these independently.
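Fractional dissimilarity functions are commonly taken to be the Minkowski form d_p(q, x) = (sum_i |q_i - x_i|^p)^(1/p) with 0 < p < 1. The abstract does not give the exact form, so the minimal sketch below assumes this one; the function names are hypothetical. For p < 1 the function is no longer a metric (the triangle inequality can fail), which is why it is called a dissimilarity rather than a distance.

```python
def fractional_dissimilarity(query, candidate, p=0.5):
    """Minkowski-style dissimilarity (sum |q_i - c_i|**p) ** (1/p); fractional when 0 < p < 1."""
    if p <= 0:
        raise ValueError("p must be positive")
    if len(query) != len(candidate):
        raise ValueError("vectors must have the same length")
    total = sum(abs(q - c) ** p for q, c in zip(query, candidate))
    return total ** (1.0 / p)


def rank(query, collection, p=0.5):
    """Return collection indices ordered from most to least similar to the query."""
    return sorted(range(len(collection)),
                  key=lambda i: fractional_dissimilarity(query, collection[i], p))
```

With query [0, 0, 0, 0], a candidate [0, 0, 0, 0.4] that matches exactly in three dimensions ranks ahead of [0.1, 0.1, 0.1, 0.1] when p = 0.5, whereas Euclidean distance (p = 2) reverses the order: small per-dimension differences are penalised relatively more, and one large deviation relatively less.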

By selecting only the points local to the query in each dimension, we can ignore a large proportion of the data while maintaining retrieval performance. Experiments with five different image collections show that the proportion of each dimension that maximises retrieval performance decreases from 10% to 0.1% as the collection size increases from ten thousand to one million images, with a corresponding sharp drop in computation.
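The per-dimension selection can be illustrated with a small sketch. It is hypothetical and not the thesis's system: each dimension is indexed independently as a sorted list, the `frac` share of images whose values lie closest to the query value is taken in each dimension, and images are ranked by how many dimensions selected them. The vote-counting score and all names here are assumptions.

```python
import bisect
from collections import Counter


def build_index(collection):
    """Index each feature dimension independently as sorted values plus matching image ids."""
    index = []
    for d in range(len(collection[0])):
        pairs = sorted((vec[d], i) for i, vec in enumerate(collection))
        index.append(([v for v, _ in pairs], [i for _, i in pairs]))
    return index


def nearest_in_dimension(values, ids, x, m):
    """Ids of the m entries whose value in this dimension is closest to x (two-pointer walk)."""
    hi = bisect.bisect_left(values, x)
    lo = hi - 1
    chosen = []
    while len(chosen) < m and (lo >= 0 or hi < len(values)):
        # Take whichever neighbour of x is closer; prefer the lower one on ties.
        if hi >= len(values) or (lo >= 0 and x - values[lo] <= values[hi] - x):
            chosen.append(ids[lo])
            lo -= 1
        else:
            chosen.append(ids[hi])
            hi += 1
    return chosen


def local_search(index, query, frac=0.1, k=10):
    """Rank images by the number of dimensions in which they are local to the query."""
    n = len(index[0][0])
    m = max(1, int(frac * n))  # points kept per dimension
    votes = Counter()
    for (values, ids), x in zip(index, query):
        votes.update(nearest_in_dimension(values, ids, x, m))
    return [i for i, _ in votes.most_common(k)]
```

Only m = frac * n entries are touched per dimension, so shrinking `frac` as the collection grows reduces the work per query, in the spirit of the trend reported above.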

The culmination of this work brings together the separate strands of research to implement an image search system. We demonstrate that, using normal hardware, this system was able to search four million images in just over one second, thus satisfying the goal of effective real-time searching of large image collections.

Alexei Yavlinsky (2003-2007): Automated Image Annotation using Invariant Image Statistics

Searching digital information archives on the Internet and elsewhere has become a significant part of our daily lives. Amongst the rapidly growing body of information there are a vast number of digital images. The task of automated image retrieval is complicated by the fact that many images do not have adequate textual descriptions.

Retrieval of images through analysis of their visual content is therefore an exciting and worthwhile research challenge. In this thesis we argue that models of simple image features, such as global colour and texture, can be used to predict instances of different objects and scenes within photographic images. On this basis we propose the use of nonparametric density estimation to model these features and thus endow unlabelled images with probabilities of containing particular objects and scenes. This process, termed "automated image annotation", enables us to set up a scalable image indexing framework that allows users to retrieve unlabelled images from large collections using simple keyword queries. We first investigate which image features yield good annotation performance on a number of different test image collections, paying particular attention to modelling these features effectively.
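A minimal sketch of annotation by nonparametric density estimation, under stated assumptions: an isotropic Gaussian kernel with a hypothetical bandwidth `h`, keyword likelihoods P(x | w) estimated from the feature vectors of training images labelled with w, and Bayes' rule with priors taken from training frequencies. The thesis's actual kernels, features and bandwidth choices are not reproduced here, and the function names are our own.

```python
import math


def gaussian_kde(x, samples, h):
    """Density at x: average of isotropic Gaussian kernels of bandwidth h centred on samples."""
    d = len(x)
    norm = (2 * math.pi * h * h) ** (-d / 2)
    total = 0.0
    for s in samples:
        sq = sum((a - b) ** 2 for a, b in zip(x, s))
        total += math.exp(-sq / (2 * h * h))
    return norm * total / len(samples)


def annotate(x, training, h=0.1):
    """Posterior P(word | x) for each keyword via Bayes' rule with KDE likelihoods.

    training maps keyword -> list of feature vectors of images labelled with it.
    """
    n_total = sum(len(vecs) for vecs in training.values())
    joint = {w: gaussian_kde(x, vecs, h) * len(vecs) / n_total
             for w, vecs in training.items()}
    z = sum(joint.values())
    if z == 0:  # x lies far from every training sample (kernels underflow)
        return {w: 0.0 for w in joint}
    return {w: p / z for w, p in joint.items()}
```

Ranking unlabelled images by their posterior for a keyword then supports the keyword queries described above.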

Our experiments show that our approach rivals top benchmark results. Notably, we demonstrate that, in addition to enabling retrieval of unlabelled images, our image annotation method can improve the accuracy of text-based Internet image search. We then investigate whether our chosen image features model the presence of objects and scenes in a general and consistent manner, by rigorously comparing the features' characteristic values for similar semantic image categories in different image collections. The results are positive, indicating that our annotation method is sufficiently general.

Finally, we show that automatically assigned image annotations can be re-used to improve the accuracy of the initial image annotation index at a small computational cost. This is a useful property for maintaining indexes of very large image collections.

Keywords: automatic image annotation, automated image annotation, learning image captions, statistics of natural images

Ed Schofield (2002-2007): Pattern recognition on large sample spaces by constrained entropy maximisation

This thesis investigates the iterative application of Monte Carlo methods to the problem of parameter estimation for models of maximum entropy, minimum divergence, and maximum likelihood among the class of exponential-family densities. It describes a suite of tools for applying such models to large domains in which exact computation is not practically possible.

The first result is a derivation of estimators for the Lagrange dual of the entropy and its gradient using importance sampling from a measure on the same probability space or its image under the transformation induced by the canonical sufficient statistic. This yields two benefits. One is the flexibility to choose an auxiliary distribution for sampling that reduces the standard error of the estimates for a given sample size. The other is the opportunity to re-weight a fixed sample iteratively, which can cut the computational burden of each iteration.
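For orientation, the estimators can be written in a standard exponential-family formulation. The notation is our own assumption, since the abstract does not fix it: features f(x) in R^m, target expectations b, base measure q (uniform for maximum entropy, a baseline model for minimum divergence), and an auxiliary sampling density p_aux. The gradient estimator shown is the self-normalised ratio form; the thesis may use variants.

```latex
\begin{align}
p_\theta(x) &= \frac{q(x)\,\exp(\theta^\top f(x))}{Z(\theta)}, &
Z(\theta) &= \sum_{x} q(x)\,\exp(\theta^\top f(x)), \\
L(\theta) &= \log Z(\theta) - \theta^\top b, &
\nabla L(\theta) &= \mathbb{E}_{p_\theta}[f(X)] - b, \\
w_j(\theta) &= \frac{q(x_j)}{p_{\mathrm{aux}}(x_j)}\,\exp(\theta^\top f(x_j)), &
x_1, \dots, x_n &\sim p_{\mathrm{aux}}, \\
\hat{L}(\theta) &= \log\Big(\frac{1}{n}\sum_{j=1}^{n} w_j(\theta)\Big) - \theta^\top b, &
\widehat{\nabla L}(\theta) &= \frac{\sum_{j=1}^{n} w_j(\theta)\, f(x_j)}{\sum_{j=1}^{n} w_j(\theta)} - b.
\end{align}
```

Because the sample x_1, ..., x_n stays fixed, only the weights w_j(θ) change from one iteration to the next, which is the re-weighting benefit described above.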

The second result is the derivation of matrix-vector expressions for these estimators. Importance-sampling estimates of the entropy dual and its gradient can be computed efficiently from a fixed sample; the computation is dominated by two matrix-vector products involving the same matrix of sample statistics.
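The two matrix-vector products can be made concrete with a pure-Python sketch. It assumes the usual exponential-family setup with importance weights w_j = q(x_j)/p_aux(x_j) * exp(θ·f(x_j)) and a self-normalised gradient estimate; the names `dual_and_gradient` and `log_ratio` are hypothetical, and a log-sum-exp shift is added for numerical stability.

```python
import math


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def dual_and_gradient(theta, F, log_ratio, b):
    """Importance-sampling estimates of the entropy dual and its gradient.

    F: m x n matrix of sample statistics, F[i][j] = f_i(x_j)
    log_ratio: log(q(x_j) / p_aux(x_j)) for each point x_j of the fixed sample
    b: target expectations of the m features
    """
    n = len(log_ratio)
    # Product 1: s = F^T theta, i.e. s_j = theta . f(x_j)
    s = matvec(transpose(F), theta)
    logw = [sj + lr for sj, lr in zip(s, log_ratio)]
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]  # weights rescaled by exp(-top) to avoid overflow
    total = sum(w)
    log_z = top + math.log(total / n)  # log of (1/n) * sum of unscaled weights
    dual = log_z - sum(t * bi for t, bi in zip(theta, b))
    # Product 2: F w gives weighted feature sums; normalising yields estimated E[f]
    expectations = [g / total for g in matvec(F, w)]
    grad = [e - bi for e, bi in zip(expectations, b)]
    return dual, grad
```

Only `theta` changes between iterations, so F and `log_ratio` can be computed once from the fixed sample and reused.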

The third result is an experimental study of the application of these estimators to the problem of estimating whole-sentence language models. The use of importance sampling in conjunction with sample-path optimization is feasible whenever the auxiliary distribution does not too severely under-represent any linguistic features under constraint. Parameter estimation is rapid, requiring a few minutes with a 2006-vintage computer to fit models under hundreds of thousands of constraints. The procedure is most effective when used to minimize divergence (relative entropy) from existing baseline models, such as n-grams estimated by traditional means, rather than to maximize entropy under constraints on the probabilities of rare n-grams.

Daniel Heesch (2001-2005): The NNk technique for image searching and browsing

The project is concerned with developing intelligent methods for retrieving visual information from diverse image collections, including wildlife photos, paintings and technical sketches. Retrieval is based on image content, i.e. automatically extracted image features, rather than on any text that may accompany an image. One of the principal challenges in content-based image retrieval is that the meaning of an image is much richer and more ambiguous than information organized in textual form. Not only is a picture worth a thousand words, but the set of words is also likely to vary between users. Initial approaches to this problem of polysemy include the use of relevance feedback as a means of teaching the system the best set of features to use. Although the performance benefits of relevance feedback are well documented, the traditional technique suffers from fundamental limitations, one of which is that performance converges quickly, within the first few iterations. Intelligently organized browsing structures that exploit information gathered as users interact with the collection seem to hold much more promise.

We have proposed a novel structure that has already proven very useful for automatic image retrieval. In it, images are represented as vertices in a large, connected, directed graph. An arc leads from one image to another if the latter is the nearest neighbour of the former for at least one combination (weighting) of the image features. This construction principle exposes the semantic richness of images to the user: even if the features employed do not capture every aspect of an image, the capabilities of the system are made explicit. Furthermore, the structure is highly interactive because it is entirely precomputed, which allows instantaneous navigation. Study of the digraph has revealed that it shares the small-world properties first investigated by Watts and Strogatz in 1998: the average distance between nodes is remarkably small (around 3-4 links for collections of 30,000 images), the average vertex degree is very low (20-30 outward arcs per node), and the clustering coefficient is much higher than for an Erdős-Rényi random graph. Current research involves developing techniques that exploit user information gathered along the browsing trail, applying the network to the traditional search-by-example scenario, and evaluating its usefulness for automatic image annotation.
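The arc construction can be sketched under several assumptions: per-feature distances are precomputed and normalised, a "combination of features" is taken to be a convex weighting of the per-feature distances, and the weight simplex is sampled on a regular grid whose resolution `steps` is hypothetical. Counting how many weightings select each neighbour is our own addition, shown as one possible way to weight arcs; all names are illustrative.

```python
from itertools import product


def simplex_grid(k, steps):
    """All k-component non-negative weight vectors summing to 1, in increments of 1/steps."""
    for combo in product(range(steps + 1), repeat=k):
        if sum(combo) == steps:
            yield tuple(c / steps for c in combo)


def nnk_network(distances, steps=4):
    """Directed arcs q -> x where x is q's nearest neighbour for at least one weighting.

    distances[f][q][x] is the normalised distance between images q and x under feature f.
    Returns {image: {neighbour: number of sampled weightings that select it}}.
    """
    k = len(distances)
    n = len(distances[0])
    weights = list(simplex_grid(k, steps))
    graph = {q: {} for q in range(n)}
    for q in range(n):
        for w in weights:
            # Nearest neighbour of q under the weighted sum of feature distances
            best = min((x for x in range(n) if x != q),
                       key=lambda x: sum(w[f] * distances[f][q][x] for f in range(k)))
            graph[q][best] = graph[q].get(best, 0) + 1
    return graph
```

In a toy three-image example where image 0 is closest to image 1 under one feature and to image 2 under the other, image 0 receives arcs to both, which is how the network exposes several readings of the same image.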

A prototype system is linked from the demo page, and the ECIR 2004 paper details the principal ideas behind the technique.

Marcus Pickering (2000-2004): Video Retrieval and Summarisation

This PhD project in the area of Multimedia Information Retrieval focused in particular on video retrieval.

The work is supported by AT&T Laboratories, Cambridge.

Shyamala Doraisamy (2000-2004): Polyphonic Music Retrieval: The N-gram approach

This PhD project involves the study of content-based retrieval techniques for Music Information Retrieval systems, with a focus on polyphonic music data. Multimedia indexing and retrieval systems, together with standard principles of IR, are among the technologies applied to this medium in developing and evaluating the system. One clear challenge is accounting for human musical perception.