Showing all 7 Tech Reports linked to Jianhan Zhu
The Open University at TREC 2006 Enterprise Track Expert Search Task
The Multimedia and Information Systems group at the Knowledge Media Institute of the Open University participated in the Expert Search task of the Enterprise Track in TREC 2006. We propose three innovations in a two-stage language model, which consists of a document relevance model and a co-occurrence model, in order to improve the performance of expert search. The three innovations are based on characteristics of documents. First, document authority in terms...
ID: kmi-07-02
Date: 2007
Author(s): Jianhan Zhu, Dawei Song, Stefan Rüger, Marc Eisenstadt, Enrico Motta
LRD: Latent Relation Discovery for Vector Space Expansion and Information Retrieval
In this paper, we propose a text mining method called LRD (latent relation discovery), which extends the traditional vector space model of document representation in order to improve information retrieval (IR) on documents and document clustering. Our LRD method extracts terms and entities, such as person, organization, or project names, and discovers relationships between them by taking into account their co-occurrence in textual corpora. Given a target entity, LRD discovers other entities...
ID: kmi-06-09
Date: 2006
Author(s): Alexandre Gonçalves, Jianhan Zhu, Dawei Song, Victoria Uren, Roberto Pacheco
Exploiting Semantic Association To Answer Vague Queries
Although today's web search engines are very powerful, they still fail to provide intuitively relevant results for many types of queries, especially ones that are vaguely formed in the user's own mind. We argue that associations between terms in a search query can reveal the underlying information needs in the user's mind and should be taken into account in search. We propose a multi-faceted approach to detect and exploit such associations. The CORDER method measures the association strength...
ID: KMI-06-01
Date: 2006
Author(s): Jianhan Zhu, Marc Eisenstadt, Dawei Song, Chris Denham
Extracting Domain Ontologies with CORDER
The CORDER web mining engine developed at the Knowledge Media Institute computes a lexical co-occurrence network out of websites - a binary relation R. A natural extension of CORDER would be that of learning an ontology. However, our work shows that co-occurrence proves insufficient to discover concepts and conceptual taxonomies (i.e. very simple ontologies) out of this network. To tackle this problem, two unsupervised learning methods were studied, based on the one hand on set similarity (and thus...
BuddyFinder-CORDER: Leveraging Social Networks for Matchmaking by Opportunistic Discovery
Online social networking tools are extremely popular, but can miss potential discoveries latent in the social 'fabric'. Matchmaking services can do naive profile matching with old database technology, and modern ontological markup, though powerful, can be onerous at data-input time. In this paper, we present a system called BuddyFinder-CORDER which can automatically produce a ranked list of buddies to match a user's search requirements specified in a term-based query, even in the absence of...
ID: kmi-05-13
Date: 2005
Author(s): Jianhan Zhu, Marc Eisenstadt, Alexandre Goncalves, Chris Denham
Adaptive Named Entity Recognition for Social Network Analysis and Domain Ontology Maintenance
We present a system which unearths relationships between named entities from information in Web pages. We use an adaptive named entity recognition system, ESpotter, which recognizes entities of various types with high precision and recall from various domains on the Web, to generate entity data such as people's names. Given an entity, we apply a link analysis algorithm to the entity data for finding other entities which are closely related to it. We present our results to people whose names had...
ESpotter: Adaptive Named Entity Recognition for Web Browsing
Web users are facing information overload problems, i.e., it is hard for them to find desired information on the web. Hence the growing interest in named entity recognition (NER) for discovering relevant information on users' behalf. We present a browser plug-in called ESpotter which adapts lexicons and patterns to a domain hierarchy consisting of domains on the web and user preferences for accurate and efficient NER. Mappings are created from domain-independent types to domain...