Workshops and tutorials are free to everyone registered for ECIR 2010. Anyone not attending ECIR 2010 who wants to participate in an event below needs to register for the "Workshop/Tutorial Day only" option on the registration page.

Tutorials 28th March 2010

Designing Effective Search User Experiences Half day - Tutorial Full

13:30 - 17:00, CMR 11

Overview

This half-day tutorial provides a practical introduction to Human Centred Design for information search, access and discovery. We present the fundamental concepts and models of human information-seeking behaviour and show how to apply interaction design principles to the design of search user experiences. A key element of the tutorial is the opportunity to apply these skills in a practical group exercise.

Benefits

Participants will learn:

the fundamental concepts and principles of human information-seeking behaviour
how to differentiate between various types of search behaviour: known-item, exploratory, etc.
models of the information-seeking process, and how to apply interaction design principles based on those models
an understanding of the key variables of user type, goal and mode of interaction, and how to apply these variables when designing for varying user contexts
the role of design patterns, and how to apply Endeca UI design patterns and those of other pattern libraries in designing search experiences

Audience

Web designers, information architects, user experience architects, and IR professionals and researchers interested in designing effective user experiences for search and information access.

Note: This tutorial will run on Sunday afternoon only and numbers are strictly limited. To take part in the practical exercises, participants should bring an Internet-connected laptop.

Presenters

Tony Russell-Rose is User Experience Manager at Endeca Technologies, an enterprise software company specialising in innovative solutions for information search and discovery. Before joining Endeca, Tony was founder and director of UXLabs, a user experience consultancy specialising in technology innovation and applied R&D. Prior to this he was R&D group manager at Canon Research Centre Europe and technical lead at Reuters, specialising in advanced user interfaces for information access and search. He holds a PhD in HCI and a first degree in engineering, majoring in human factors. Tony is also Honorary Visiting Fellow at the Centre for Interactive Systems Research, City University London.

Mark Burrell is Worldwide Lead for User Experience at Endeca Technologies. He has over 25 years of professional experience, including 15 years focused on the evaluation, design, and adoption of interactive technology solutions (with special emphasis on applications that aim to support learning and discovery). Prior to joining Endeca, Mark built and led user experience teams at several leading product and service companies, serving as Sr. UX Manager for Microsoft’s Unified Communications product division and as Global UX Lead for Sapient. Mark holds a PhD in clinical psychology with concentrations in cognitive psychology and epistemology/philosophy of science.

Crowdsourcing for Relevance Evaluation Half day

14:00 - 17:30, CMR 15

General Description

Length: 3 hours (half-day)
Intended Audience: Introductory to intermediate
Background required: familiarity with information retrieval

Relevance evaluation is an essential part of the development and maintenance of information retrieval systems. There are a number of limitations with current approaches for relevance evaluation. Many Web search engines reportedly use large editorial staffs to judge the relevance of web pages for queries in an evaluation set. This is expensive and has obvious scalability issues. Academic researchers, without access to such editors, often rely instead on small groups of student volunteers. Because of the students’ limited time and availability, test sets are often smaller than desired, making it harder to detect statistically significant differences in performance by the experimental systems being tested.
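The effect of test-set size on significance can be illustrated with a small sketch. It uses an exact two-sided sign test on paired per-query scores, which is one of several tests that could be applied here; the tutorial description does not name a particular test, and the function name and score values below are hypothetical.

```python
from math import comb


def sign_test_p_value(scores_a, scores_b):
    """Exact two-sided sign test on paired per-query scores (ties are dropped)."""
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    losses = sum(a < b for a, b in zip(scores_a, scores_b))
    n = wins + losses
    if n == 0:
        return 1.0  # no query distinguishes the two systems
    k = min(wins, losses)
    # Probability of a split at least this lopsided if each system
    # were equally likely to win any given query.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# System A beats system B on every query in both (made-up) test sets.
small_p = sign_test_p_value([0.5, 0.6, 0.7], [0.4, 0.5, 0.6])
large_p = sign_test_p_value([0.5] * 10, [0.4] * 10)
print(small_p, large_p)
```

With three queries the p-value cannot drop below 0.25 even when one system wins every time, whereas ten queries with the same outcome give a p-value below 0.01.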

Behavioral data is much cheaper than the editorial method but has limitations as well. It requires access to a large stream of data, something not always available to a researcher testing an experimental system, and there are certain tasks for which it does not make sense.

Recently, crowdsourcing has emerged as a feasible alternative for relevance evaluation because it combines the flexibility of the editorial approach with a larger scale. Crowdsourcing is a term used to describe tasks that are outsourced to a large group of people instead of being performed by an employee or contractor. Crowdsourcing is an open call to solve a problem or carry out a task and usually involves a monetary payment in exchange for the service.

Amazon Mechanical Turk (AMT) is an example of a crowdsourcing platform that has gained a lot of attention as a tool for conducting different kinds of relevance evaluations. The AMT service is easy to use and has useful features for setting up experiments and collecting results. However, it is important to pay attention to the design of the experiment and its execution to gather useful results.

This tutorial is aimed at those who are interested in using crowdsourcing as another technique for performing different kinds of evaluations and user studies. The goal is to discuss the value of the crowdsourcing paradigm.

Tutorial Objectives

At the end of the tutorial, participants will understand:

When to use crowdsourcing for an experiment
How to use AMT via the user interface and API
How to set up relevance experiments
How to apply design guidelines to maximize results
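As an illustration of what collecting results from a relevance experiment can involve, the sketch below merges redundant worker judgments into one label per query-document pair by majority vote. This is an assumption made for illustration: the tutorial description does not say which aggregation method is covered, the code does not call the AMT API, and the tuple layout and label names are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(judgments):
    """Aggregate (query_id, doc_id, worker_id, label) tuples by majority vote.

    Returns a dict mapping (query_id, doc_id) to the winning label,
    or to None when the top two labels are tied.
    """
    labels_by_pair = defaultdict(list)
    for query_id, doc_id, _worker_id, label in judgments:
        labels_by_pair[(query_id, doc_id)].append(label)

    consensus = {}
    for pair, labels in labels_by_pair.items():
        (top_label, top_count), *rest = Counter(labels).most_common()
        tied = bool(rest) and rest[0][1] == top_count
        consensus[pair] = None if tied else top_label
    return consensus


# Hypothetical judgments: three workers on one pair, two on another.
judgments = [
    ("q1", "d1", "w1", "relevant"),
    ("q1", "d1", "w2", "relevant"),
    ("q1", "d1", "w3", "not_relevant"),
    ("q1", "d2", "w1", "relevant"),
    ("q1", "d2", "w2", "not_relevant"),
]
print(majority_vote(judgments))
```

Leaving ties unresolved rather than picking a label arbitrarily makes it easy to see which pairs need an additional judgment, which is one reason to assign an odd number of workers per pair.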

Presenter

Omar Alonso (Microsoft Bing.com) is part of the Bing team at Microsoft. He has been working on crowdsourcing for the last two years, both in industry and as a researcher, applying the technique to a diverse set of experiments. Previously he was at A9.com (an Amazon.com company) and Oracle Corp. This past summer he was a visiting researcher at the Max Planck Institute for Informatics. He is the workshop co-chair for SIGIR 2010. Omar holds a PhD in computer science from the University of California at Davis.

Distributed Information Retrieval Half day

09:00 - 12:30, CMR 11

Distributed Information Retrieval (DIR) integrates multiple searchable collections into one retrieval system. One of the advantages is that the DIR system can access Deep Web resources through their search interfaces without crawling them. In addition, it does not need to maintain a complete index of all the federated collections, while retrieval results remain consistent and up to date.

The tutorial will give the background and motivation for DIR research. The main DIR architectures will be discussed, namely the broker-based architecture, DIR over peer-to-peer networks and the Open Archives Initiative. The main emphasis will be on the broker-based architecture and its main phases:

Resource Discovery: resources available for federation (having a search engine, supporting some communication protocol etc.) need to be discovered.
Resource Description: information about each resource should be acquired by using the resource’s search engine.
Resource Selection: when a query is submitted the DIR system selects appropriate resources to send the query to.
Results Fusion: search results obtained from each of the selected search engines are fused into a single ranked list to be returned to the end user.
Results Presentation: finally, the results should be presented to the end user in a comprehensive and understandable way.
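A toy sketch of two of the phases above, Resource Selection and Results Fusion, may help make the broker's role concrete. It is illustrative only and is not taken from the tutorial: the selection score (summed query-term counts in each resource description), the min-max score normalisation used for fusion, and all names and data are assumptions.

```python
def select_resources(query_terms, descriptions, k=2):
    """Pick the k resources whose descriptions best match the query.

    descriptions maps a resource name to a dict of term -> count,
    standing in for whatever the Resource Description phase collected.
    """
    def score(resource):
        term_counts = descriptions[resource]
        return sum(term_counts.get(term, 0) for term in query_terms)

    return sorted(descriptions, key=score, reverse=True)[:k]


def fuse_results(results_by_resource):
    """Min-max normalise each resource's scores, then merge into one ranking."""
    fused = []
    for resource, results in results_by_resource.items():
        scores = [score for _doc, score in results]
        if not scores:
            continue
        low, span = min(scores), max(scores) - min(scores)
        for doc, score in results:
            # A resource whose scores are all equal gets 1.0 for every result.
            normalised = (score - low) / span if span else 1.0
            fused.append((doc, resource, normalised))
    return sorted(fused, key=lambda item: item[2], reverse=True)


# Hypothetical resource descriptions and per-resource result lists.
descriptions = {
    "news": {"election": 5, "football": 1},
    "sports": {"football": 9},
    "patents": {"claim": 4},
}
selected = select_resources(["football"], descriptions, k=2)
results = {
    "sports": [("s1", 12.0), ("s2", 3.0)],
    "news": [("n1", 8.0), ("n2", 6.0), ("n3", 4.0)],
}
print(selected)
print([doc for doc, _resource, _score in fuse_results(results)])
```

Normalising per resource is needed because raw scores from different search engines are not on a comparable scale; here the sports engine's scores would otherwise dominate the merged list.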

In conclusion, current research directions in DIR will be discussed and several applications of DIR techniques in other research areas will be presented.

Presenters

Ilya Markov, Faculty of Informatics, University of Lugano, Switzerland
Fabio Crestani, Faculty of Informatics, University of Lugano, Switzerland

Machine Learning for IR: Recent Successes and New Opportunities Half day

09:00 - 12:30, CMR 15

This tutorial focuses on the interplay between information retrieval (IR) and machine learning. This intersection of research areas has seen tremendous growth and progress in recent years, much of it fueled by incorporating machine learning techniques into the core of information retrieval technologies, including Web search engines, e-mail and news filtering systems, music and movie recommendations, online advertising systems, and many others. As the complexity, scale, and user expectations for search technologies increase, it is becoming more and more important for each field to keep pace with and inform the other.

With that goal in mind, this tutorial will include the following topics. (Minor changes in this program are possible.)

Overview of Machine Learning

Basic ML concepts and their applications in IR
How to choose the right ML tools for IR applications

Recent advances at IR-ML crossroads

Text classification
Learning from user behavior
Learning to rank
Collaborative filtering

Emerging opportunities for Learning in IR

Online advertising
Robust retrieval and optimization models
Social media

This tutorial will be of interest to anyone with basic information retrieval knowledge who is interested in focusing on machine learning techniques as part of their IR research; or who may want to deploy machine learning technology as a component of an IR system; or who is simply curious about how widely-used IR techniques like Web search can benefit from learning algorithms now and in the future.
Web link: http://research.microsoft.com/ecir-2010-mlir-tutorial

Presenters

Paul Bennett (Presenter) is a researcher in the Context, Learning & User Experience for Search (CLUES) group at Microsoft Research where he works on using machine learning technology to improve information access and retrieval. His recent research has focused on pairwise preferences, human computation, text classification, sensitivity, calibration, and combination techniques. Paul obtained his Ph.D. in Computer Science from Carnegie Mellon University in 2006.
Web: http://research.microsoft.com/~pauben

Kevyn Collins-Thompson (Presenter) is a researcher in the Context, Learning & User Experience for Search (CLUES) group at Microsoft Research. His primary research interests involve the application of machine learning and optimization methods for more reliable and effective information retrieval algorithms. Kevyn obtained his Ph.D. in Computer Science from Carnegie Mellon University in 2008.
Web: http://research.microsoft.com/~kevynct

Misha Bilenko (Co-author) is a researcher in the Text Mining, Search & Navigation (TMSN) group at Microsoft Research. His research interests include machine learning, information retrieval and data mining tasks that arise in the context of large textual and behavioral datasets. His recent work has focused on learning algorithms that utilize user behavior data to improve web search and advertising. Misha completed his Ph.D. at the University of Texas at Austin in 2006.
Web: http://research.microsoft.com/~mbilenko

Workshops 28th March 2010

2nd International Workshop on Contextual Information Access, Seeking and Retrieval Evaluation (CIRSE) Half day

09:00 - 13:00, CMR 1

The CIRSE workshop aims to bring together IR researchers working on or interested in the evaluation of approaches to contextual information access, seeking and retrieval. New research is needed to understand how to overcome the challenge of user-oriented evaluation and to design novel evaluation methodologies and criteria for contextual information retrieval.

Presenters

Bich-Liên Doan, Supélec, France
Joemon Jose, University of Glasgow, UK
Massimo Melucci, University of Padua, Italy
Lynda Tamine-Lechani, IRIT, France

1st International Workshop on Advances in Patent Information Retrieval (AsPIRe'10) Half day

09:15 - 12:45, KMi Podium, Berrill Building

Patent Information Retrieval is a cross-cutting research area, spanning domains such as multilingual information retrieval, image processing and retrieval, language processing, and text categorization, clustering and mining. The main goal of the workshop is to bring together scientists from these areas to foster collaboration across disciplines and to spark discussion of open topics related to Information Retrieval and Machine Translation in the Intellectual Property domain, in order to advance the state of the art of patent search tools.

A subset of 400,000 documents of the MAREC dataset is available for download. These documents can be accessed after registering with the MATRIXWARE.NET community (registration is free).

Participants are encouraged to apply the techniques they develop to this dataset, where possible. This will allow the results of the presented techniques applied to the same dataset to be compared.

Presenters

Dr Helmut Berger, Matrixware Information Services, Vienna
Veronika Zenz, Matrixware Information Services, Vienna
Allan Hanbury, Information Retrieval Facility, Vienna

Large-Scale Hierarchical Classification Full day

09:00 - 17:00, CMR 6

Hierarchies are becoming ever more popular for the organization of documents, particularly on the Web (e.g. Web directories). Along with their widespread use comes the need for automated classification of new documents to the categories in the hierarchy. Research on large-scale classification so far has focused on large numbers of documents and/or large numbers of features, with a limited number of categories. However, this is not the case in hierarchical category systems, such as DMOZ.

To approach this problem, either existing large-scale classifiers can be extended or new methods need to be developed. The goal of this workshop is to discuss and assess some of these strategies. In particular, some of the issues that we expect to cover in the workshop are:

Learning to classify against many categories.
Data sparseness in the presence of large datasets.
Use of the statistical dependence of hierarchically organized classes.
The role of shrinkage methods in large hierarchies.
Ensemble methods for hierarchical classification.
Extending existing large-scale classifiers to hierarchies.
Challenging hierarchical classification tasks and datasets.

Important Dates

Paper submission - January 18
Acceptance notification - February 15
Final paper - March 1

For more information, please visit: http://lshtc.iit.demokritos.gr/workshop/

Presenters

Eric Gaussier, LIG, University Grenoble
Georgios Paliouras, NCSR "Demokritos", Athens, Greece
Aris Kosmopoulos, NCSR "Demokritos", Athens, Greece
Sujeevan Aseervatham, LIG, Univ. Grenoble, Yakaz, Paris, France

Workshop on Multilinguality in Information Access Evaluation (MLIA-CULT 2010): Bringing Content, Users, Languages and Tasks into the Loop Half day - Cancelled

Services and users of multilingual IR systems continue to evolve, with many new factors and trends influencing the field. For example, we are moving to a situation in which there is no longer a single dominant language in which most online information is captured and, with the advance of broadband access and the evolution of both wired and wireless connectivity, users are not just information consumers but also producers. Text now comes in many shapes: user generated, with a low publishing threshold, heavily contextualised, often code-switching and multilingual, and under little or no editorial control. This changes the scene for many of the processing frameworks we have previously been able to assume. Blogs, discussion forums, comments left on news sites, IM, SMS and Twitter are among the many new formats for textual interaction that carry valuable and timely information, which needs specific tools for processing.

The goal of the workshop is to discuss and start understanding and improving the multilingual user experience. How can we move evaluation of multilingual information access (MLIA) systems beyond system benchmarking in the Cranfield/TREC-style tradition to assessing system effectiveness within today’s operational task contexts? Among other things, the workshop will explore the idea of “multilingual living laboratories” in which to conduct user studies at scale, where infrastructure and instruments for capturing user activity are created. The workshop will be organized around four key dimensions: (multilingual) content, (multilingual) users, (multilingual) tasks and (multilingual) evaluation methodology.

Presenters

Nicola Ferro, University of Padova, Italy
Maarten de Rijke, University of Amsterdam, The Netherlands

Information Access for Personal Media Archives (IAPMA 2010) Half day

13:30 - 17:30, KMi Podium, Berrill Building

Towards e-Memories: challenges of capturing, summarising, presenting, understanding, using, and retrieving relevant information from heterogeneous data contained in personal media archives.

It is now possible to archive much of our life experiences in digital form using a variety of sources, e.g. blogs written, tweets made, social network status updates, photographs taken, videos seen, music heard, physiological monitoring, locations visited and environmentally sensed data of those places, details of people met, etc. Information can be captured from a myriad of personal information devices including desktop computers, PDAs, digital cameras, video and audio recorders, and various sensors, including GPS, Bluetooth, and biometric devices.

In this workshop we seek to bring together researchers from diverse disciplines to exchange ideas on how we can advance towards the goal of effective capture, retrieval and exploration of e-memories. In addition to directly exploring relevant issues in information retrieval, we believe it is important for computing scientists to be aware of advances made in sensing technology by hardware engineers and materials scientists, and for cognitive psychologists to be aware of the advances made by information scientists which may be of benefit to those with memory conditions.

Presenters

Aiden R. Doherty, DCU, Ireland
Gareth J.F. Jones, DCU, Centre for Digital Video Processing, Ireland
Alan F. Smeaton, DCU, Ireland