Web Data Mining
Knowledge Media Institute, The Open University, Milton Keynes, UK
3-year fully-funded PhD (Oct. 2012-Sept. 2015)
Stipend: £40,770 (£13,590/year)
The Web is currently being flooded with data, from information reported in documents for human consumption to, more and more, open data available in structured and reusable forms (APIs, linked data, etc.). While much effort has been dedicated to the integration of data, using conceptual models such as ontologies, we now face an unprecedented need for support in "interpreting" data gathered from a large number of distributed, heterogeneous and uncontrolled sources on the Web.
In this PhD studentship, which is interdisciplinary in nature, the aim is to investigate the combination of ontology-based, top-down approaches to data interpretation with techniques originating from data mining, which make sense of data through the bottom-up emergence of meaningful information patterns. Contributions are therefore expected in the areas of the Semantic Web, linked data, ontology engineering, data mining, data analytics and machine learning.
With the rapid growth of the open data and linked data movements, more and more data are being made directly accessible from a wide range of domains and organisations, including government agencies, public institutions and private organisations. Knowledge discovery from databases aims to make knowledge emerge from hidden patterns in large amounts of data. It generally relies on data mining techniques, which exploit regularities in the structure and content distributions of the data to identify potentially relevant hidden models governing these structures and distributions. While data mining itself is an automatic process, its effectiveness depends on the appropriate preparation of the data and interpretation of the results, tasks that become more difficult when dealing with heterogeneous, distributed data from external sources.
On the other hand, the aim of knowledge engineering is to make knowledge that is implicitly held by the experts and practitioners of a domain available and accessible to automatic processes. Ontologies are logical models of the concepts, entities and relationships in a domain, and are nowadays most commonly used as the "data schemas" for the Web of linked data. However, constructing an ontology for a specific domain (or dataset) is traditionally done manually, requires close cooperation between domain experts and knowledge engineers, and takes a significant amount of time.
Naturally, integrating knowledge engineering and knowledge discovery to create potentially meaningful and exploitable "knowledge patterns" has been considered in the past. There are, however, crucial challenges emerging from applying such approaches to open Web data. In particular, the scale of the data, their incompleteness and their heterogeneity are characteristics that are difficult to tackle within a knowledge discovery process, but that the use of ontologies could help address. Conversely, acquiring the basic structure of information and capturing the implicit relationships in a domain are challenges typically faced by knowledge engineering, which the appropriate use of data mining could help address.
The goal of this PhD is therefore to investigate the interplay of knowledge engineering and knowledge discovery, through mining Web data from multiple sources, extracting knowledge patterns and applying them in interpreting further data, while working with heterogeneous and noisy data.
While a "knowledge cycle" like the one described above could apply to many domains, we are particularly interested in investigating its use to support data-driven research activities. Initial experiments were carried out with researchers in the area of reading history, where the ability to connect core research data (obtained from archives and libraries) with Web data has been shown to create the potential for new research methodologies. One concrete expected outcome of the PhD is therefore a tool to explore research data in relation to external Web datasets and to extract new candidate knowledge patterns from these data, which can then lead to emerging research questions in the field and to their further investigation on the basis of the already collected data and models. Such a tool, relying on data mining and knowledge engineering techniques, is in particular expected to make the use of large, open Web data more accessible to researchers in "non-technical" fields. Through this PhD, there will be opportunities to collaborate with researchers in the Arts and Humanities in particular, and to work with data from a wider variety of research domains, such as economics and biology.
Expected Contributions and Impact
This PhD will be carried out within our ongoing research programme on "next generation semantic web applications", relying on established research and infrastructure components (e.g. the Watson Semantic Web search engine) to exploit and use data and knowledge from the Semantic Web to build intelligent systems. The work considered here is, however, expected to contribute further to emerging research at the Knowledge Media Institute on applying data mining approaches to make sense of linked and semantic data (see, e.g., d'Aquin and Motta's K-CAP 2011 paper on Formal Concept Analysis in the references below).
For further information on this PhD project please contact:
W. Frawley, G. Piatetsky-Shapiro and C. Matheus. Knowledge Discovery in Databases: An Overview. AI Magazine, Vol. 13, pp. 57-70, 1992.
A. Gómez-Pérez, M. Fernández-López and O. Corcho. Ontological Engineering. Springer, 2003. ISBN 1-85233-551-3.
S. Staab and R. Studer (editors). Handbook on Ontologies. Springer, 2003. ISBN 978-3540408345.
P. Gottgtroy. An Ontology Driven Knowledge Discovery Framework for Dynamic Domains: Methodology, Tools and a Biomedical Case. PhD Thesis, School of Computing and Mathematical Sciences, Auckland University of Technology, 2010.
F. Valle, M. d'Aquin, T. Di Noia and E. Motta. LOTED: Exploiting Linked Data in Analyzing European Procurement Notices. Knowledge Injection and Extraction from Linked Data, KIELD at EKAW 2010, 2010.
M. d'Aquin, E. Motta, M. Sabou, S. Angeletou, L. Gridinoc, V. Lopez and D. Guidi. Towards a New Generation of Semantic Web Applications. IEEE Intelligent Systems, 23, 3, pp. 20-28, 2008.
M. d'Aquin and E. Motta. Watson, more than a Semantic Web search engine. Semantic Web Journal, 2, IOS Press, 2011.
M. d'Aquin and E. Motta. Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis. The Sixth International Conference on Knowledge Capture, K-CAP 2011, 2011.
Applications
The relevant application form can be found at http://www.open.ac.uk/research/research-degrees/overview.php.
It is essential that you include both a proposal and a CV with your application. These are central to our judging of applications, both at shortlisting time and afterwards.
Applications can be sent to the KMi Recruitment Coordinator at the Knowledge Media Institute, Open University, Milton Keynes, MK7 6AA, UK, Tel. +44 (0)1908 654774, Fax +44 (0)1908 653169.