Research

Finished Grants

Concept Learning and Structure Formation for Document Navigation.

ARC (Australian Research Council) Discovery 2003-2005, AU$211K.

The Freedom to Forget: Multimedia Knowledge Management

The research carried out during this fellowship centred on Multimedia Information Retrieval, i.e., video search engines, sketch databases, image databases, spoken document retrieval, music retrieval, query languages and query mediation. The main focus was to explore ways of content-based search, e.g., search by image example (rather than by words matching the associated meta-data or library cards) or finding music pieces by humming. A related challenge is the extent to which automated annotation and classification of multimedia objects can be made possible.

This project has made a number of contributions to the area of multimedia information retrieval, ranging from i) the development and evaluation of simple image features such as texture, colour and shape, based on psychological, signal-processing and statistical methods; ii) a novel polyphonic symbolic music representation that allows the use of ordinary text search technology such as Google's to index and search music repositories by humming; iii) the introduction of novel automated structuring principles such as lateral similarity and search-result clustering that allow the user to browse (sub)collections intuitively; to iv) novel video summarisation schemes that are suitable, e.g., for news search engines.
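The idea behind item ii), turning music into "words" that an ordinary text index can handle, can be illustrated with a small sketch. This is not the project's representation, which is polyphonic and is not specified on this page. It is a simplified monophonic stand-in that assumes melodies are given as lists of MIDI pitch numbers and encodes overlapping n-grams of pitch intervals as letters. All function names and the two example melodies are hypothetical.

```python
from collections import defaultdict


def interval_symbol(semitones):
    """Map a pitch interval in semitones to one letter (clamped to +/-25)."""
    if semitones == 0:
        return "Z"  # repeated note
    step = min(abs(semitones), 25)
    base = "A" if semitones > 0 else "a"  # upper case = up, lower case = down
    return chr(ord(base) + step - 1)


def melody_to_words(midi_pitches, n=3):
    """Encode a melody as overlapping n-grams of interval letters."""
    intervals = [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]
    symbols = "".join(interval_symbol(i) for i in intervals)
    return [symbols[i:i + n] for i in range(len(symbols) - n + 1)]


def build_index(collection, n=3):
    """Inverted index: word -> set of piece names containing that word."""
    index = defaultdict(set)
    for name, pitches in collection.items():
        for word in melody_to_words(pitches, n):
            index[word].add(name)
    return index


def search(index, query_pitches, n=3):
    """Rank pieces by the number of distinct query words they contain."""
    scores = defaultdict(int)
    for word in set(melody_to_words(query_pitches, n)):
        for name in index.get(word, ()):
            scores[name] += 1
    return sorted(scores, key=lambda name: (-scores[name], name))


# Two toy melodies (assumed data, for illustration only).
collection = {
    "ode": [64, 64, 65, 67, 67, 65, 64, 62, 60, 60, 62, 64, 64, 62, 62],
    "twinkle": [60, 60, 67, 67, 69, 69, 67, 65, 65, 64, 64, 62, 62, 60],
}
index = build_index(collection)

# A "hummed" query: the opening of the first melody, sung five semitones higher.
hummed = [69, 69, 70, 72, 72, 70, 69]
ranking = search(index, hummed)
```

Because only intervals are encoded, a query hummed in a different key produces the same words as the original melody, which is what lets a plain inverted index do the matching. A real query-by-humming system would also need pitch tracking and some tolerance for rhythm and pitch errors, none of which is covered here.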

The overriding aim of this research has been to create an easy, intuitive and user-friendly content-based multimedia search engine. To that end a number of research prototypes of music, image and video search engines were successfully developed and integrated into a multimedia search platform. This platform has undergone extensive metric-based evaluation in international collaborative evaluation conferences (such as TRECVID and ImageCLEF) where it has consistently proven to be amongst the top systems worldwide.

The research we have carried out during this fellowship has resulted in a well-designed and robust general framework for multimedia searches which lends itself to deployment in specific application areas. Ultimately, those results are bound to improve searching, browsing, discovery and access in areas such as arts and media through imaginative navigation modes; crime prevention through automated analysis of CCTV footage; intellectual property through detection of trademark duplication or copyright infringement; journalism through content-based image searches and resource discovery; medical diagnosis through finding similar images from a database; and, in general, web repositories, cultural heritage collections and multimedia digital libraries.

This research was sponsored through the award of an EPSRC Advanced Research Fellowship from Oct 1999 to Sept 2004.

Low-cost, efficient, parallel algorithms for musical electronic learning aids

The EPSRC project GR/L 18273 Low-cost, efficient parallel algorithms for musical electronic learning aids was proposed to research, develop, implement and evaluate monophonic and polyphonic music recognition algorithms for use in computerised interactive musical learning systems. Specifically the aim was to develop real-time algorithms for note recognition in monophonic (task 1) and polyphonic (task 3) music. Further and essentially independent tasks were the development of a real-time tune recognition algorithm (task 2) and of an interactive electronic music tutor for a monophonic instrument (task 4).

We are pleased to report that considerable progress has been made, albeit along lines slightly different from the ones originally outlined in the proposal. Task 1 was completed early in the project and it was shown that this algorithm coped well with monophonic signals. However, the methods suggested in the proposal to extend this method to handle polyphony proved impractical. We were thus forced to return to more fundamental studies of pitch detection algorithms. Substantial theoretical and experimental investigations were carried out into existing algorithms, and novel algorithms were developed and implemented that are capable of detecting notes in polyphonic music and which, we believe, represent significant advances over the current state of the art in many aspects. Thus task 3, which was the most difficult fundamental part of the project, was successfully completed.

A two-step approach was adopted which divides the task of note recognition into two subtasks: (A) short-time spectral estimation of the musical signal, resulting in a time-frequency spectrum, and (B) note extraction based on the resulting spectra. Novel approaches have been developed for both the spectral analysis and the pattern recognition part of the note identification problem. For the former, the main novelty lies in the use of auto-regressive as opposed to conventional Fourier spectral estimators; for the latter, in a combination of data classification methods and a topological approach to note identification which emphasises connectivity patterns in both time and pitch.
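To make step (A) more concrete, here is a minimal sketch of auto-regressive spectral estimation for a single short frame. It is not the project's algorithm, whose details are not given on this page. It uses the textbook Yule-Walker method solved by the Levinson-Durbin recursion, in plain Python, and the function names, model order, frame length and test tone are all assumptions chosen for illustration.

```python
import cmath
import math


def autocorrelation(signal, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of one frame."""
    n = len(signal)
    return [
        sum(signal[t] * signal[t + lag] for t in range(n - lag)) / n
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for the AR model
    x[t] + a[1]*x[t-1] + ... + a[order]*x[t-order] = e[t].
    Returns (a, err) with a[0] == 1 and err the prediction error power."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for k in range(1, order + 1):
        acc = sum(a[j] * r[k - j] for j in range(k))
        reflection = -acc / err
        prev = a[:]
        for j in range(1, k + 1):
            a[j] = prev[j] + reflection * prev[k - j]
        err *= 1.0 - reflection * reflection
    return a, err


def ar_spectrum(a, err, freqs_hz, sample_rate):
    """Evaluate the AR power spectrum err / |A(e^{iw})|^2 at given frequencies."""
    spectrum = []
    for f in freqs_hz:
        w = 2.0 * math.pi * f / sample_rate
        denom = sum(c * cmath.exp(-1j * w * j) for j, c in enumerate(a))
        spectrum.append(err / abs(denom) ** 2)
    return spectrum


# One 0.1 s frame of a 1000 Hz test tone sampled at 8 kHz (assumed values).
sample_rate = 8000
frame = [math.sin(2.0 * math.pi * 1000.0 * t / sample_rate) for t in range(800)]

r = autocorrelation(frame, 2)
a, err = levinson_durbin(r, 2)
freqs = [10.0 * i for i in range(1, 400)]  # 10 Hz grid up to 3990 Hz
spectrum = ar_spectrum(a, err, freqs, sample_rate)
peak_hz = freqs[spectrum.index(max(spectrum))]
```

Unlike a Fourier spectrum, the frequency grid here is not tied to the frame length, so sharp spectral lines can be located from short frames. Sliding the frame along the recording and stacking the spectra would give a time-frequency representation of the kind step (B) operates on. Polyphonic signals need a higher model order (roughly two coefficients per sinusoidal component), and choosing that order well is outside the scope of this sketch.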

The resulting algorithms were coded in Mathematica and successfully tested with digitised recordings of both mono- and polyphonic piano music with up to 3 tones occurring simultaneously. At the time of writing one paper has been published [1], a second one is in preparation which will contain the major part of our results [2], and more technical issues are contained in an as yet unpublished report [3].

References:

[1] T von Schroeter (1998): Frequency Warping with Arbitrary Allpass Maps. IEEE Signal Processing Letters, 6, pp 116-118.
[2] T von Schroeter and J Darlington (in preparation): Connectivity in auto-regressive spectra of polyphonic piano music - a topological approach to automated transcription.
[3] T von Schroeter: Auto-regressive spectral line analysis of piano tones, Technical report.