---- README ----

This README accompanies the pagelocations.xml and placefrequency.xml files, both released under the Apache 2.0 license. These files were produced by Simon Overell during his PhD at Imperial College London, supervised by Stefan Rueger at the Knowledge Media Institute of the Open University.

For more information on how these files were produced, please read the accompanying paper "Geographic Co-occurrence as a tool for GIR" or contact Simon Overell: simon.overell01@imperial.ac.uk

Both files are XML dumps of SQL tables.

Pagelocations maps Wikipedia pages to unique locations in the Getty Thesaurus of Geographic Names (TGN). The pagelocations.xml table contains three fields: ploc_page, ploc_title and ploc_locations. The ploc_page field is a unique integer, for internal use, mapping to a Wikipedia page. The ploc_title field is a text string containing the Wikipedia article title. The ploc_location field contains an integer corresponding to a unique id in the Getty TGN.
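
As a rough illustration of reading this table, the Python sketch below builds a title-to-location lookup. It assumes a mysqldump-style layout (<row> elements holding <field name="..."> children); the actual element names in pagelocations.xml may differ and should be checked first. The sample rows and TGN ids are made up, and the helper names iter_rows and title_to_location are hypothetical. The name of the location field is passed as a parameter so it can be matched to the spelling used in the dump.

```python
import xml.etree.ElementTree as ET

# Made-up sample in an ASSUMED mysqldump-style layout; the ids are not real TGN ids.
SAMPLE = """<table_data name="pagelocations">
  <row>
    <field name="ploc_page">1</field>
    <field name="ploc_title">London</field>
    <field name="ploc_location">1000001</field>
  </row>
  <row>
    <field name="ploc_page">2</field>
    <field name="ploc_title">Paris</field>
    <field name="ploc_location">1000002</field>
  </row>
</table_data>"""


def iter_rows(xml_text):
    """Yield each <row> as a dict mapping field name to its text."""
    root = ET.fromstring(xml_text)
    for row in root.iter("row"):
        yield {f.get("name"): (f.text or "") for f in row.iter("field")}


def title_to_location(xml_text, location_field="ploc_location"):
    """Map each Wikipedia article title to its TGN id (as an int)."""
    return {r["ploc_title"]: int(r[location_field]) for r in iter_rows(xml_text)}


lookup = title_to_location(SAMPLE)
print(lookup)
```

For the full file, ET.iterparse would avoid loading the whole dump into memory at once.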

Placefrequency maps placenames (text strings) to unique locations in the Getty Thesaurus of Geographic Names, together with a frequency: the number of times a particular placename refers to a specific location in the 100,000 randomly selected Wikipedia articles we crawled. The pf_text field is a text string containing the placename referring to this location (extracted from Wikipedia anchor text). The pf_location field contains an integer corresponding to a unique id in the Getty TGN. The pf_freq field is an integer holding the number of times this placename refers to this location.
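
One possible use of these frequencies, sketched below in Python, is to pick the most frequently referenced location for an ambiguous placename. This is only an illustration and not necessarily the method used in the accompanying paper. The rows are made-up (pf_text, pf_location, pf_freq) tuples with invented ids, and build_index and most_likely_location are hypothetical helper names.

```python
from collections import defaultdict

# Made-up (pf_text, pf_location, pf_freq) rows; the ids are not real TGN ids.
ROWS = [
    ("London", 1000001, 950),
    ("London", 1000003, 40),
    ("Paris", 1000002, 700),
]


def build_index(rows):
    """Group rows by placename: {pf_text: {pf_location: pf_freq}}."""
    index = defaultdict(dict)
    for text, location, freq in rows:
        index[text][location] = index[text].get(location, 0) + freq
    return index


def most_likely_location(index, placename):
    """Return (location, share of references), or None if the name is unknown."""
    candidates = index.get(placename)
    if not candidates:
        return None
    location = max(candidates, key=candidates.get)
    return location, candidates[location] / sum(candidates.values())


index = build_index(ROWS)
best = most_likely_location(index, "London")
print(best)
```

The returned share gives a crude indication of how ambiguous a placename is: a value near 1 means almost all references point to a single location.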


-   Copyright 2007 Simon Overell

-   Licensed under the Apache License, Version 2.0 (the "License");
-   you may not use this file except in compliance with the License.
-   You may obtain a copy of the License at

-       http://www.apache.org/licenses/LICENSE-2.0

-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License.