Automatic Expansion of Chinese Abbreviations by Web Mining

  • Conference paper
Artificial Intelligence and Computational Intelligence (AICI 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5855)

Abstract

Abbreviations are common in everyday Chinese. For applications such as information retrieval, it is not enough to recognize an abbreviation; we also need to know what it stands for. To cope with the steady emergence of new abbreviations, this paper proposes a novel method that expands an abbreviation to its full name, using the Web as the main information source. Snippets containing the full name of an abbreviation are retrieved from a search engine using learned "help words". The snippets are then examined with linguistic heuristics to generate a list of candidates, and the best candidate is selected by a kNN-based ranking mechanism. Experiments show that the method achieves satisfactory results.
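
The abstract does not spell out the heuristics or the ranking features, so the following is only a minimal sketch of how the last two stages (candidate generation and kNN-based ranking) might fit together. Everything in it is an assumption made for illustration: the in-order character heuristic, the two-dimensional feature vectors, the labeled training examples, and the function names `is_subsequence`, `candidates_from_snippet`, and `knn_score` are hypothetical and are not taken from the paper.

```python
from math import dist


def is_subsequence(abbr, text):
    """Return True if the characters of abbr appear in text in the same order."""
    it = iter(text)
    # `ch in it` consumes the iterator, so order is enforced.
    return all(ch in it for ch in abbr)


def candidates_from_snippet(abbr, snippet, max_len=12):
    """Assumed heuristic: a candidate starts with the abbreviation's first
    character, is longer than the abbreviation, and contains all of its
    characters in order."""
    found = set()
    for i, ch in enumerate(snippet):
        if ch != abbr[0]:
            continue
        stop = min(len(snippet), i + max_len)
        for j in range(i + len(abbr) + 1, stop + 1):
            span = snippet[i:j]
            if is_subsequence(abbr, span):
                found.add(span)
    return found


def knn_score(features, training, k=3):
    """Score a candidate as the fraction of its k nearest labeled
    neighbours (Euclidean distance) that are correct expansions."""
    nearest = sorted(training, key=lambda ex: dist(features, ex[0]))[:k]
    return sum(label for _, label in nearest) / len(nearest)


# Hypothetical usage: the snippet and the training data are made up.
snippet = "他毕业于北京大学中文系"
cands = candidates_from_snippet("北大", snippet)

# Each training example is (feature vector, label); label 1 = correct expansion.
training = [((0.5, 0.8), 1), ((0.5, 0.7), 1), ((0.2, 0.1), 0), ((0.9, 0.1), 0)]
score = knn_score((0.5, 0.75), training, k=3)
```

In a fuller pipeline each candidate would get its own feature vector and the candidates would be sorted by score; which features to use is left open here because the abstract does not specify them.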

This paper is supported in part by Chinese 863 project No. 2009AA01Z334 and the Shanghai Municipal Education Commission Foundation for Excellent Young University Teachers.



Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, H., Chen, Y., Liu, L. (2009). Automatic Expansion of Chinese Abbreviations by Web Mining. In: Deng, H., Wang, L., Wang, F.L., Lei, J. (eds) Artificial Intelligence and Computational Intelligence. AICI 2009. Lecture Notes in Computer Science, vol 5855. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05253-8_45

  • DOI: https://doi.org/10.1007/978-3-642-05253-8_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05252-1

  • Online ISBN: 978-3-642-05253-8

  • eBook Packages: Computer Science, Computer Science (R0)
