Extracting and Querying Relations in Scientific Papers

  • Ulrich Schäfer
  • Hans Uszkoreit
  • Christian Federmann
  • Torsten Marek
  • Yajing Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5243)

Abstract

High-precision linguistic and semantic analysis of scientific texts is an emerging research area. We describe methods and an application for extracting interesting factual relations from scientific texts in computational linguistics and language technology. We use a hybrid NLP architecture with shallow preprocessing for increased robustness and domain-specific, ontology-based named entity recognition, followed by a deep HPSG parser running the English Resource Grammar (ERG). The extracted relations, represented in the Minimal Recursion Semantics (MRS) format, are simplified and generalized using WordNet. The resulting ‘quriples’ are stored in a database, from which they can be retrieved by relation-based search. The query interface is embedded in a web browser-based application we call the Scientist’s Workbench. It supports researchers in editing scientific papers and searching them online.
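
The storage and retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the fields of a quriple or the database schema, so the columns (`subject`, `predicate`, `object`, `rest`, `source`), the function names, and the sample relations are all hypothetical. A small hand-written synonym table stands in for WordNet-based generalization, and predicates are assumed to arrive already lemmatized from the parsing stage.

```python
import sqlite3

# Hypothetical stand-in for WordNet-based generalization: each predicate
# lemma maps to one canonical representative of its synonym group.
SYNONYMS = {
    "outperform": "outperform",
    "surpass": "outperform",
    "exceed": "outperform",
    "use": "use",
    "employ": "use",
    "utilize": "use",
}


def generalize(predicate):
    """Return the canonical form of a predicate lemma (lowercased if unknown)."""
    lemma = predicate.lower()
    return SYNONYMS.get(lemma, lemma)


def create_store():
    """Create an in-memory database with an assumed relation table layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE quriples ("
        "subject TEXT, predicate TEXT, object TEXT, rest TEXT, source TEXT)"
    )
    return conn


def add_relation(conn, subject, predicate, obj, rest="", source=""):
    """Store one extracted relation, generalizing its predicate first."""
    conn.execute(
        "INSERT INTO quriples VALUES (?, ?, ?, ?, ?)",
        (subject, generalize(predicate), obj, rest, source),
    )


def search(conn, subject=None, predicate=None, obj=None):
    """Relation-based search: every argument that is given must match."""
    clauses, params = [], []
    if subject is not None:
        clauses.append("subject LIKE ?")
        params.append(f"%{subject}%")
    if predicate is not None:
        # Generalize the query predicate too, so synonyms retrieve each other.
        clauses.append("predicate = ?")
        params.append(generalize(predicate))
    if obj is not None:
        clauses.append("object LIKE ?")
        params.append(f"%{obj}%")
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = ("SELECT subject, predicate, object, source FROM quriples"
           + where + " ORDER BY rowid")
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    store = create_store()
    # Made-up sample relations, for illustration only.
    add_relation(store, "our parser", "surpass", "the baseline", source="paper-A")
    add_relation(store, "the system", "employ", "a statistical tagger", source="paper-B")
    print(search(store, predicate="use"))
    print(search(store, predicate="exceed", obj="baseline"))
```

Generalizing the predicate both when a relation is stored and when a query is issued is one simple way to let a search for "use" also retrieve sentences phrased with "employ"; a real system would draw such groupings from WordNet rather than from a fixed table.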

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Ulrich Schäfer (1)
  • Hans Uszkoreit (1)
  • Christian Federmann (1)
  • Torsten Marek (1)
  • Yajing Zhang (1)

  1. German Research Center for Artificial Intelligence (DFKI), Language Technology Lab, Saarbrücken, Germany
