9th Annual CLUK Research Colloquium







	Webcast archive will be available for at least 1 year after the colloquium

Invited Speakers

Professor Anne DeRoeck [Presentation date: 8th March]
The Open University
Title: Dataset Profiles: investigating the role of data in experimental NLP

Abstract: It has long been known that the performance of Information Retrieval and Natural Language Processing techniques on a particular task is very sensitive to the characteristics of the data on which they are used. Though widely accepted, this fact has never been taken to its logical conclusion: in evaluation, for instance, experimental results are reported without reference to the impact of the underlying datasets or collections. This raises serious methodological and practical issues around replicability. These could be addressed if we had reliable ways of profiling datasets, using measures that highlight relevant differences between collections. A first step would be to investigate what such measures might look like for a given range of tasks or techniques.

In this talk, I will show that even standard textual datasets such as the TIPSTER collection differ in ways that challenge widely accepted assumptions about the general applicability of techniques, and that similar differences in data profile show up between texts in the same genre but in different languages. In exploring what might be suitable profiling measures, I will set out some desirable properties that such measures should have. I will then introduce our work on modelling term burstiness, and explore what term distribution and variations in the burstiness of a term's occurrences can tell us about genres and datasets.

Professor Jon Oberlander [Presentation date: 9th March]
University of Edinburgh
Title: The computational linguistics of affect: a personal view