This event took place on Wednesday 13 June 2012 at 11:30
Our ambitious long-term goal is to understand multimodal interaction between humans, and we use a sports game, tennis, as a starting point. In tennis, the goals of the interaction are clearly defined and the interaction is subject to clear rules. As such, the game can be effectively analysed in terms of sequences of “events”. Our work focuses on the retrieval of these sequences from audio and visual information, and moves beyond low-level information classification or clustering of features to inferring the low-level structure of the game, a task which we believe could also be accomplished by an intelligent human with no previous exposure to tennis. The process of segmenting the stream of events in the game is somewhat akin to a child learning to segment a stream of speech into a sequence of words: the child notices that some phonetic sequences tend to recur, and that there are patterns of co-occurrence across different sequences. In this spirit, we will use a variable-length multigram model (VLMM) to search for regularly occurring patterns of match events that are detected and inferred using multimodal information and that constitute the basic “units” in a tennis match.
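As a rough illustration of the segmentation idea, and not the model used in this work, the sketch below counts recurring subsequences of up to a fixed length in a stream of event labels and then picks the most likely split of the stream into such "units" by dynamic programming. The event labels, the function names `count_ngrams` and `best_segmentation`, the maximum unit length of 3, and the use of raw relative frequencies in place of iteratively re-estimated multigram probabilities are all assumptions made for the example.

```python
import math
from collections import Counter


def count_ngrams(events, max_len):
    """Count every contiguous subsequence of length 1..max_len."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(events) - n + 1):
            counts[tuple(events[i:i + n])] += 1
    return counts


def best_segmentation(events, counts, max_len):
    """Split events into units so that the sum of unit log-probabilities is maximal.

    Unit probabilities are plain relative frequencies (an assumption of this
    sketch); a full multigram model would re-estimate them iteratively.
    """
    total = sum(counts.values())
    # best[i] = (score of the best split of events[:i], start of its last unit)
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(events)
    for end in range(1, len(events) + 1):
        for n in range(1, min(max_len, end) + 1):
            unit = tuple(events[end - n:end])
            if counts[unit] == 0:
                continue  # unseen unit: not a candidate
            score = best[end - n][0] + math.log(counts[unit] / total)
            if score > best[end][0]:
                best[end] = (score, end - n)
    # Walk the back-pointers from the end of the stream to recover the units.
    units = []
    end = len(events)
    while end > 0:
        start = best[end][1]
        units.append(tuple(events[start:end]))
        end = start
    return units[::-1]


# Hypothetical event labels; real events would come from audio-visual detectors.
events = ["serve", "bounce", "hit", "bounce", "hit", "bounce",
          "serve", "bounce", "hit", "bounce", "hit", "bounce"]
counts = count_ngrams(events, max_len=3)
units = best_segmentation(events, counts, max_len=3)
print(units)
```

Because longer units save on the number of probabilities multiplied together, frequently repeated subsequences tend to be kept whole, which is the intuition behind treating recurring event patterns as the "words" of a match.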