Automatic Annotation of Bibliographical References for Descriptive Language Materials

  • Conference paper
Multilingual and Multimodal Information Access Evaluation (CLEF 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6941)

Abstract

This paper considers the problem of annotating bibliographical references with labels (classes), given training data of references that are already annotated with labels. The problem is an instance of document categorization in which the documents are short and written in a wide variety of languages. The skewed distributions of title words and labels call for special care when choosing a Machine Learning approach. The paper describes how to induce Disjunctive Normal Form formulae (DNFs), which have several advantages over Decision Trees. The approach is evaluated on a large real-world collection of bibliographical references.
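
To make the representation concrete, a DNF over title words can be read as a disjunction of conjunctions: a reference receives a label if every word of at least one conjunction occurs in its title. The following is a minimal sketch of applying such a formula, not the induction procedure described in the paper. The function names, the label, the formula, and the example titles are all hypothetical and chosen only for illustration.

```python
def tokenize(title):
    """Lowercase a title and split it on whitespace into a set of word tokens."""
    return set(title.lower().split())


def dnf_matches(dnf, title):
    """Return True if at least one conjunction is fully contained in the title.

    `dnf` is a list of conjunctions; each conjunction is a set of words
    that must all occur in the title for that conjunction to hold.
    """
    words = tokenize(title)
    return any(conjunction <= words for conjunction in dnf)


# Hypothetical formula for a hypothetical label "grammar":
#   grammar OR grammaire OR (sketch AND language)
GRAMMAR_DNF = [{"grammar"}, {"grammaire"}, {"sketch", "language"}]

# Invented titles, used only to exercise the formula.
for title in [
    "A Grammar of Examplish",
    "A sketch of the Examplish language",
    "Dictionary of the Examplish language",
]:
    print(title, "->", dnf_matches(GRAMMAR_DNF, title))
```

Because each conjunction is a plain set of words, a formula of this kind can mix words from several languages in one disjunction, which suits short multilingual titles.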

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hammarström, H. (2011). Automatic Annotation of Bibliographical References for Descriptive Language Materials. In: Forner, P., Gonzalo, J., Kekäläinen, J., Lalmas, M., de Rijke, M. (eds) Multilingual and Multimodal Information Access Evaluation. CLEF 2011. Lecture Notes in Computer Science, vol 6941. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23708-9_8

  • DOI: https://doi.org/10.1007/978-3-642-23708-9_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23707-2

  • Online ISBN: 978-3-642-23708-9

  • eBook Packages: Computer Science, Computer Science (R0)
