Synonymous Paraphrasing Using WordNet and Internet

  • Igor A. Bolshakov
  • Alexander Gelbukh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3136)

Abstract

We propose a method of synonymous paraphrasing of a text based on WordNet synonymy data and Internet statistics of stable word combinations (collocations). Given a text, we look for words or expressions in it for which WordNet provides synonyms, and substitute them with such synonyms only if the latter form valid collocations with the surrounding words according to statistics gathered from the Internet. We present two important applications of such synonymous paraphrasing: (1) style checking and correction: automatic evaluation and computer-aided improvement of writing style with regard to various aspects (increasing vs. decreasing synonymous variation, conformistic vs. individualistic selection of synonyms, etc.) and (2) steganography: hiding additional information in the text by special selection of synonyms. A basic interactive algorithm of style improvement is outlined, and an example of its application to editing an English newswire text fragment is traced. Algorithms of style evaluation and information hiding are also proposed.
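The substitution rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synonym table, the pair counts, the stop-word list, and the threshold below are all hypothetical stand-ins for WordNet data and Internet statistics, and only immediately adjacent words are checked.

```python
# Toy stand-ins (assumptions): a real system would query WordNet for
# synonyms and a web search engine for co-occurrence statistics.
SYNONYMS = {
    "big": ["large", "great"],
    "mistake": ["error"],
}

# Hypothetical hit counts for adjacent word pairs (left word, right word).
PAIR_COUNTS = {
    ("big", "mistake"): 900,
    ("large", "mistake"): 3,
    ("great", "mistake"): 400,
    ("big", "error"): 20,
}

STOPWORDS = {"a", "an", "the", "of"}  # function words are not checked
THRESHOLD = 100  # assumed minimum count for a "valid" collocation


def fits_context(words, i, candidate):
    """Check that candidate forms a frequent pair with each content-word neighbour."""
    for j in (i - 1, i + 1):
        if 0 <= j < len(words) and words[j] not in STOPWORDS:
            pair = (words[j], candidate) if j < i else (candidate, words[j])
            if PAIR_COUNTS.get(pair, 0) < THRESHOLD:
                return False
    return True


def paraphrase_options(words):
    """Map each word position to the synonyms that survive the collocation filter."""
    options = {}
    for i, word in enumerate(words):
        accepted = [s for s in SYNONYMS.get(word, []) if fits_context(words, i, s)]
        if accepted:
            options[i] = accepted
    return options


if __name__ == "__main__":
    print(paraphrase_options(["a", "big", "mistake"]))
```

With these toy counts, "large" is rejected before "mistake" because the pair is too rare, while "great" is accepted. Each position is evaluated against the original context only; a fuller version would re-check neighbours after every substitution.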

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Igor A. Bolshakov (1)
  • Alexander Gelbukh (1, 2)

  1. Center for Computing Research, National Polytechnic Institute, Mexico
  2. Department of Computer Science and Engineering, Chung-Ang University, Seoul, Korea
