ESpotter

 

ESpotter is supported by the Dot.Kom project.

Team:

Dr. Jianhan Zhu
Dr. Victoria Uren
Prof. Enrico Motta


Related Projects:

Magpie



Talks:
KMi Internal Talk (June 14th, 2005):
ESpotter: A Domain and User Adaptation Approach for Named Entity Recognition on the Web

Abstract: Named entity recognition (NER) systems are commonly designed with a "one-size-fits-all" philosophy: lexicons and patterns, whether manually crafted or learned from a training set of documents, are applied to any other document without taking into account its background or the user's needs. However, when NER is applied to Web pages, the diversity of those pages and of user needs means that one size frequently does not fit all. In this talk, I present a system called ESpotter, which improves NER on the Web by adapting lexicons and patterns to domains on the Web and to user preferences. My results show that ESpotter provides more accurate and efficient NER on Web pages from various domains than current NER systems. ESpotter is implemented as a browser plug-in that helps address the information overload problem on the Web by discovering relevant information on the user's behalf. Further work on integrating ESpotter with Magpie, an ontology-based semantic browsing tool, and with the KMi semantic Web site is also explored.
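To make the adaptation idea more concrete, here is a minimal Python sketch. It is not ESpotter's actual algorithm: it assumes that each lexicon entry carries a per-domain precision estimate and keeps only the entries that are reliable for the current domain and belong to entity types the user has asked for. The class LexiconEntry, the function adapt_lexicon, the 0.5 threshold and the "open.ac.uk" domain key are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    text: str          # surface string, e.g. "Open University"
    entity_type: str   # e.g. "organization"
    # Hypothetical per-domain precision estimates, e.g. {"open.ac.uk": 0.95}.
    precision: dict = field(default_factory=dict)


def adapt_lexicon(entries, domain, preferred_types, threshold=0.5):
    """Return the entries worth applying to pages from `domain`.

    Illustrative assumption, not ESpotter's method: an entry is kept only if
    the user asked for its entity type and its precision estimate for this
    domain reaches `threshold` (entries with no estimate are dropped).
    """
    return [
        e for e in entries
        if e.entity_type in preferred_types
        and e.precision.get(domain, 0.0) >= threshold
    ]


entries = [
    LexiconEntry("Open University", "organization", {"open.ac.uk": 0.95}),
    LexiconEntry("network", "research-area", {"open.ac.uk": 0.3}),
    LexiconEntry("Enrico Motta", "person", {"open.ac.uk": 0.9}),
]
adapted = adapt_lexicon(entries, "open.ac.uk", {"organization", "research-area"})
print([e.text for e in adapted])  # ['Open University']
```

Here "network" is dropped because its assumed precision in this domain is low, and "Enrico Motta" is dropped because the user did not ask for people. A single global lexicon, by contrast, would apply all three entries to every page.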

Keywords: Named entity recognition, information extraction, hierarchies.

Talk slides



Papers:

Jianhan Zhu, Victoria Uren, and Enrico Motta. ESpotter: Adaptive Named Entity Recognition for Web Browsing. To appear in Proc. of Workshop on IT Tools for Knowledge Management Systems at WM2005 Conference, Kaiserslautern, Germany, April 11-13, 2005.



Demos:

Jianhan Zhu, Victoria Uren, and Enrico Motta. ESpotter: A Prototype System for Adaptive Named Entity Recognition Supporting Web Browsing. The Fifteenth ACM Conference on Hypertext and Hypermedia (Hypertext'04), Santa Cruz, USA, August 9-13, 2004.

Jianhan Zhu, Victoria Uren, and Enrico Motta. ESpotter: A Prototype System for Adaptive Named Entity Recognition Supporting Web Browsing. The Fourteenth International Conference on Knowledge Engineering and Knowledge Management (EKAW'2004), Whittlebury Hall, Northamptonshire, UK, October 5-8, 2004.



Download ESpotter as a .NET Windows Application:

You can extract entities of various types from documents with a single click, e.g., "Open University" as an organization and "Enrico Motta" as a person. You can select one or more documents in plain-text or HTML format and save the recognized entities in an XML file for further processing.

The tool is based on the .NET Framework and can be downloaded. To install it, run the ESpotter.msi file (you may first need to install .NET Framework 1.0). The installation creates a desktop shortcut to the ESpotter executable. The example XML output below shows entities of various types and their word offsets in a document.

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<ESpotter-Processed-Documents corpusSize="284">
 <Document id="0">
  <has-directory>D:\test.xml</has-directory>
  <has-url>D:\test.xml</has-url>
  <has-document-size>284</has-document-size>
  <mentions-location>
   <instance content=" Australia " pos="108" />
  </mentions-location>
  <mentions-organization>
   <instance content=" Monash University " pos="132" />
  </mentions-organization>
  <mentions-person>
   <instance content="Larry Stillman" pos="130" />
  </mentions-person>
  <mentions-research-area>
   <instance content="network" pos="238" alias="TechnologiesCommunity Informatics Research Network" />
  </mentions-research-area>
  <pn>
   <instance content="ICT" pos="22" />
  </pn>
 </Document>
</ESpotter-Processed-Documents>
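For further processing, output in this format can be read with Python's standard xml.etree.ElementTree module. The sketch below assumes the element layout of the example above and that the XML declaration is the very first thing in the file. Treating <pn> as the "other proper names" category is an assumption, and entities_by_type is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def entities_by_type(xml_data):
    """Group recognized entities by type as (content, pos) pairs.

    Assumes the element layout of the ESpotter example output.
    """
    if isinstance(xml_data, str):
        xml_data = xml_data.encode("utf-8")  # match the utf-8 declaration
    root = ET.fromstring(xml_data)
    grouped = defaultdict(list)
    for doc in root.iter("Document"):
        for child in doc:
            if child.tag.startswith("mentions-"):
                entity_type = child.tag[len("mentions-"):]
            elif child.tag == "pn":
                entity_type = "proper-name"  # assumed: "other proper names"
            else:
                continue  # metadata such as <has-url> or <has-document-size>
            for inst in child.iter("instance"):
                # Values such as " Australia " carry padding spaces.
                grouped[entity_type].append(
                    (inst.get("content").strip(), int(inst.get("pos"))))
    return dict(grouped)
```

Applied to the example above, this would yield entries such as ("Monash University", 132) under "organization" and ("ICT", 22) under "proper-name". The function accepts either a string or the raw bytes of a saved output file.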


ESpotter uses an MS Access database file, ESpotterResources.mdb, to store lexicon and pattern information. Currently ESpotter recognizes people, organizations, locations, research areas, email addresses, telephone numbers, postal codes, and other proper names. You can customize the lexicon and patterns in ESpotterResources.mdb to recognize any type of entity you are interested in by adding new lexicon entries and patterns. Lexicon entries and patterns are grouped into different tables. When you add new lexicon entries or patterns, you can create a new table and register it in the TableSchema table. New entity types need to be registered in the TypeSchema table. Precision-based domain adaptation is not used in this version of ESpotter, so the precision information in the database file can be ignored.
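The following Python sketch illustrates how registered lexicon and pattern tables could drive recognition. It does not read ESpotterResources.mdb: in-memory dictionaries stand in for the TypeSchema, TableSchema and lexicon/pattern tables, and every table name, entry and regular expression is a hypothetical example. Lexicon rows are matched as whole words, pattern rows as regular expressions, and offsets are counted in whitespace-separated words.

```python
import re

# In-memory stand-ins for tables in ESpotterResources.mdb.
# All table names and rows here are hypothetical examples.
TYPE_SCHEMA = {"person", "organization", "email"}  # registered entity types
TABLE_SCHEMA = {  # table name -> (kind, entity type)
    "PersonLexicon": ("lexicon", "person"),
    "OrgLexicon": ("lexicon", "organization"),
    "EmailPatterns": ("pattern", "email"),
}
TABLES = {
    "PersonLexicon": ["Enrico Motta"],
    "OrgLexicon": ["Open University"],
    "EmailPatterns": [r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"],
}


def register_table(name, kind, entity_type, rows):
    """Add a lexicon or pattern table, analogous to a TableSchema entry."""
    if entity_type not in TYPE_SCHEMA:
        raise ValueError(f"register {entity_type!r} in the type schema first")
    TABLE_SCHEMA[name] = (kind, entity_type)
    TABLES[name] = list(rows)


def word_offset(text, char_pos):
    """Number of whitespace-separated words before char_pos."""
    return len(text[:char_pos].split())


def recognize(text):
    """Return (entity type, matched text, word offset) triples, by offset."""
    found = []
    for name, (kind, entity_type) in TABLE_SCHEMA.items():
        for row in TABLES[name]:
            # Lexicon rows match as whole words; pattern rows are regexes.
            regex = rf"\b{re.escape(row)}\b" if kind == "lexicon" else row
            for m in re.finditer(regex, text):
                found.append((entity_type, m.group(0), word_offset(text, m.start())))
    return sorted(found, key=lambda f: f[2])


print(recognize("Enrico Motta works at the Open University, email someone@example.org"))
# [('person', 'Enrico Motta', 0), ('organization', 'Open University', 5), ('email', 'someone@example.org', 8)]
```

Requiring a type to be registered before any table that uses it mirrors the order described above, where new entity types go into TypeSchema and new tables into TableSchema.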

For developers interested in ESpotter, the installation includes a DLL file, ESpotterClass.dll, for easy inclusion in .NET language engineering applications. An example is given in the Class1.cs file. More information on using ESpotter for development is coming soon.


 

KMi

Open University