Full Seminar Details
University of Genoa
This event took place on Wednesday 02 May 2018 at 11:30
Over the past years, we have seen a proliferation of Linked Data sources; however, these are often hard to exploit unless they are augmented with additional links between them. Only by adding such links can we contribute to the evolution of the Web into a deeply integrated global data space, which is close to the mission of projects like DBpedia and SciGraph. Although such links can be created manually for small datasets, large ones require more automated solutions, for example ones that formally define the relations between sources or that extract the latent information in the content using information extraction techniques. However, the inherent complexity of generating crosslinks makes this problem hard to solve fully automatically, so manual contribution is still required in several steps of the approach. In our work, we faced similar challenges in the context of a project that aims to enrich SciGraph data with links to DBpedia. The goal was to increase the discoverability of the data and improve identity resolution in existing sources. The methodology we will discuss introduces two approaches to interlinking the SciGraph and DBpedia datasets: (i) Link Discovery for the structured data and (ii) Named Entity Recognition (NER) for unstructured text. By the end of the project, more than 50,000 distinct links had been produced and represented using existing ontologies together with additional metadata. In our presentation, we will describe the advantages of each approach and discuss potential applications and next steps for this research.