Authoring Technical Documents for Effective Retrieval

  • Jonathan Butters
  • Fabio Ciravegna
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6317)


In this paper we outline the design considerations behind, and the application of, a methodology for authoring technical documents so that they can be retrieved more effectively. Our approach is aimed at large organizations, where variations in terminology at personal, national and international scales often impede the retrieval of relevant knowledge. We first present the difficulties of performing entity extraction in technical domains and the role that terminology variation plays in the information extraction task, and then outline and evaluate a methodology that allows for effective retrieval.







Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Jonathan Butters (1)
  • Fabio Ciravegna (1)
  1. Department of Computer Science, University of Sheffield, Sheffield, UK
