Webcast archive will be available for at least 1 year after the colloquium.
Invited Speakers
Professor Anne DeRoeck [Presentation date: 8th March]
The Open University
Title: Dataset Profiles: investigating the role of data in experimental NLP
Abstract: It has been known for a long time that the performance of Information
Retrieval and Natural Language Processing techniques in the context of a
particular task is very sensitive to the characteristics of the data on
which they are used. Though widely accepted, this fact has never been
taken to its logical conclusion. In evaluation, for instance,
experimental results are reported without reference to the impact of the
underlying datasets or collections. This raises serious
methodological and practical issues around replicability. These could
be addressed if we had reliable ways of profiling datasets, using
measures that highlight relevant differences between collections. A
first step would be to investigate what such measures might look like
for a given range of tasks or techniques.
In this talk, I will show that even standard textual datasets such as
the TIPSTER collection differ in ways that challenge widely accepted
assumptions about the general applicability of techniques, and that
similar differences in data profile will show up between texts in the
same genre but in different languages. In exploring what might be
suitable profiling measures, I will set out some desirable properties
that such measures should have. I will then introduce our work on
modelling term burstiness, and explore what term distribution and
variations in burstiness patterns in the occurrence of a term can tell
us about genres and datasets.
Professor Jon Oberlander [Presentation date: 9th March]
University of Edinburgh
Title: The computational linguistics of affect: a personal view