Tech Report

Unsupervised data linking using a genetic algorithm

As commonly accepted identifiers for data instances in semantic datasets (such as ISBN codes or DOI identifiers) are often not available, links between overlapping datasets on the Web are generally discovered using fuzzy similarity measures. Configuring such measures, i.e. deciding which similarity function to apply to which data properties with which parameters, is often a non-trivial task that depends on the domain, the ontological schemas, and the formatting conventions used in the data. Existing solutions rely either on the user's knowledge of the data and the domain or on machine learning to discover these parameters from training data. In this report, we present a novel approach to data linking that relies on the unsupervised discovery of the required similarity parameters. Instead of using labeled training data, the method takes into account several desired properties which the distribution of output similarity values should satisfy.

The method incorporates these properties into a fitness criterion used by a genetic algorithm to establish similarity parameters that maximise the quality of the resulting linkset with respect to those properties. Experiments on benchmarks and real-world datasets show that such an unsupervised method can reach the same level of performance as manually engineered methods, and show how the parameters of the genetic algorithm and the fitness criterion affect the results for different datasets.
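To make the general idea concrete, the following is a minimal sketch of a genetic algorithm searching over similarity parameters without labeled data. It is not the report's implementation: the toy records, the genome layout (one weight per property plus a threshold), the use of `difflib.SequenceMatcher` as the similarity function, and in particular the fitness function (which simply rewards source records that receive exactly one link and penalises ambiguous ones) are all assumptions chosen for illustration. The report's actual fitness criterion is not described in this abstract.

```python
import random
from difflib import SequenceMatcher

# Hypothetical toy records, for illustration only.
SOURCE = [
    {"name": "Knowledge Media Institute", "city": "Milton Keynes"},
    {"name": "Open University", "city": "Milton Keynes"},
    {"name": "University of Oxford", "city": "Oxford"},
]
TARGET = [
    {"name": "Knowledge Media Inst.", "city": "Milton Keynes"},
    {"name": "The Open University", "city": "Milton Keynes"},
    {"name": "Oxford University", "city": "Oxford"},
    {"name": "University of Cambridge", "city": "Cambridge"},
]
PROPERTIES = ["name", "city"]


def string_sim(a, b):
    """Fuzzy string similarity in [0, 1] (an assumed choice of measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity(genome, x, y):
    """Weighted average of per-property similarities.

    Assumed genome layout: one weight per property, then a threshold.
    """
    weights = genome[:-1]
    total = sum(weights)
    if total == 0:
        return 0.0
    score = sum(w * string_sim(x[p], y[p]) for w, p in zip(weights, PROPERTIES))
    return score / total


def links(genome):
    """All (source index, target index) pairs at or above the threshold."""
    threshold = genome[-1]
    return [(i, j) for i, x in enumerate(SOURCE) for j, y in enumerate(TARGET)
            if similarity(genome, x, y) >= threshold]


def fitness(genome):
    """Illustrative unsupervised score (an assumption, not the report's
    criterion): reward source records with exactly one link and penalise
    every extra link on an ambiguous record."""
    found = links(genome)
    per_source = [sum(1 for i, _ in found if i == s) for s in range(len(SOURCE))]
    unique = sum(1 for n in per_source if n == 1)
    ambiguous = sum(n - 1 for n in per_source if n > 1)
    return (unique - ambiguous) / len(SOURCE)


def random_genome(rng):
    return [rng.random() for _ in PROPERTIES] + [rng.uniform(0.3, 1.0)]


def mutate(genome, rng, rate=0.3):
    """Perturb some genes with Gaussian noise, clamped to [0, 1]."""
    return [min(1.0, max(0.0, g + rng.gauss(0, 0.1))) if rng.random() < rate else g
            for g in genome]


def crossover(a, b, rng):
    """Uniform crossover: each gene is taken from either parent."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def evolve(generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 4]  # keep the best quarter unchanged
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("best genome:", best)
    print("fitness:", fitness(best))
    print("links:", links(best))
```

The point of the sketch is only that the search is driven by a property of the output linkset rather than by labeled examples; a realistic criterion would combine several such properties, as the abstract indicates.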

ID: kmi-11-02

Date: 2011

Author(s): Andriy Nikolov, Mathieu d'Aquin, Enrico Motta


