Full Seminar Details

Professor Beth Plale

Director, Data to Insight Center; Managing Director, Pervasive Technology Institute; Professor of Informatics and Computing at Indiana University

Big Data Opportunities and Challenges for IR, Text Mining, and NLP
This event took place on Thursday 12 December 2013 at 10:30

The HathiTrust Research Center (HTRC) is a collaborative effort of Indiana University and the University of Illinois at Urbana-Champaign, along with the HathiTrust, to provide a new mode of access to the content of research libraries. Specifically, HTRC enables computational exploration of the digitized volumes that make up the HathiTrust digital library.
Phase I of the HTRC initiative, launched in 2011, was dedicated to constructing the underlying software and services. Phase II, which began in spring 2013, focuses on engaging the research community to support and showcase computational research on the public domain corpus, alongside ongoing technical development.

In this talk, I will discuss two recent developments at HTRC:

Community Contributed Analytics in Secure Capsule. Through funding from the Alfred P. Sloan Foundation, HTRC is developing secure software through which researchers can submit their own analytics algorithms to run against the full corpus of 11 million volumes, including both public domain and copyrighted content. Researchers with smaller-scale needs obtain a dedicated virtual machine (VM) that is pre-configured but can be customized with the researcher's own software. The VM runs on HTRC compute resources. While running, the VM has restricted network access to protect the data. HTRC is also expanding access to statistical information about the entire 11-million-volume corpus, working with community members to identify particularly useful information such as page-level token counts.
Metadata Enhancement. Going beyond MARC, the HTRC team is adding metadata fields to better serve the diverse needs of the community. Our indexing service has separated the full-text index from the metadata index, making it easier to add metadata fields without interfering with the OCR content. So far, “gender” and “token count” fields have been added, with plans to investigate and implement additional attributes.
See more details at http://www.hathitrust.org/htrc

