Abstract

Although the literature contains reports of very high accuracy figures for the recognition of named entities in text, there are still some named entity phenomena that remain problematic for existing text processing systems. One of these is the ambiguity of conjunctions in candidate named entity strings, an all-too-prevalent problem in corporate and legal documents. In this paper, we distinguish four uses of the conjunction in these strings, and explore the use of a supervised machine learning approach to conjunction disambiguation trained on a very limited set of ‘name internal’ features, which avoids the need for expensive lexical or semantic resources. The approach classifies 84% of examples correctly under k-fold evaluation on a data set of 600 instances. Further improvements are likely to require the use of wider domain knowledge and ‘name external’ features.
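
To make the notion of ‘name internal’ features concrete, the following is a minimal Python sketch. It is not the feature set used in the paper, which the abstract does not list. The function name `name_internal_features`, the `CONJUNCTIONS` set, and every feature key are hypothetical, chosen only to illustrate features that are computed from the candidate string alone, without a lexicon or surrounding context.

```python
# Hypothetical illustration: these features are assumptions, not the paper's.
CONJUNCTIONS = {"and", "&"}


def name_internal_features(candidate: str) -> dict:
    """Describe a candidate name string using only the string itself."""
    tokens = candidate.split()
    # Index of the first conjunction token, or -1 if there is none.
    idx = next(
        (i for i, tok in enumerate(tokens) if tok.lower() in CONJUNCTIONS), -1
    )
    if idx == -1:
        raise ValueError("candidate contains no conjunction")
    left, right = tokens[:idx], tokens[idx + 1:]
    return {
        "conjunction": tokens[idx].lower(),
        "left_len": len(left),    # number of tokens before the conjunction
        "right_len": len(right),  # number of tokens after the conjunction
        "left_all_capitalised": all(tok[0].isupper() for tok in left),
        "right_all_capitalised": all(tok[0].isupper() for tok in right),
    }
```

A feature dictionary of this kind could then be passed to any supervised classifier. The choice of classifier and of the number of folds used in evaluation is left open here, since the abstract does not specify either.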

Keywords

Entity Recognition; Sequential Minimal Optimization; Conjunction Type; Name Internal; Internal Conjunction
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Robert Dale (1)
  • Paweł Mazur (1, 2)
  1. Centre for Language Technology, Macquarie University, NSW 2109, Sydney, Australia
  2. Institute of Applied Informatics, Wrocław University of Technology, Wyb. Wyspiańskiego 27, 50-370 Wrocław, Poland
