Abstract
We propose a method of synonymous paraphrasing of a text based on WordNet synonymy data and Internet statistics of stable word combinations (collocations). Given a text, we look for words or expressions in it for which WordNet provides synonyms, and substitute them with such synonyms only if the latter form valid collocations with the surrounding words according to statistics gathered from the Internet. We present two important applications of such synonymous paraphrasing: (1) style checking and correction, that is, automatic evaluation and computer-aided improvement of writing style with regard to various aspects (increasing vs. decreasing synonymous variation, conformist vs. individualistic selection of synonyms, etc.); and (2) steganography, that is, hiding additional information in a text through a special selection of synonyms. A basic interactive algorithm for style improvement is outlined, and its application to editing an English newswire text fragment is traced. Algorithms for style evaluation and information hiding are also proposed.
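The substitution criterion and the synonym-based information hiding described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. `SYNSETS` is a hypothetical toy table standing in for WordNet synsets. `BIGRAM_COUNTS` is a hypothetical table standing in for Internet counts of adjacent word pairs, and `MIN_COUNT` is an assumed threshold. The sketch handles only single words (not multiword expressions) and reduces "surrounding words" to the immediate left and right neighbours. Each usable position hides one bit by choosing between the first two alphabetically ordered valid synonyms.

```python
# Hypothetical toy data (not from the paper): SYNSETS stands in for WordNet
# synsets, BIGRAM_COUNTS for Internet hit counts of adjacent word pairs.
SYNSETS = [{"big", "large"}, {"strong", "powerful"}, {"quick", "fast"}]
BIGRAM_COUNTS = {
    ("a", "big"): 900, ("a", "large"): 800,
    ("big", "house"): 500, ("large", "house"): 400,
    ("drink", "strong"): 80, ("drink", "powerful"): 60,
    ("strong", "tea"): 600, ("powerful", "tea"): 2,
    ("a", "quick"): 700, ("a", "fast"): 650,
    ("quick", "car"): 90, ("fast", "car"): 850,
}
MIN_COUNT = 50  # assumed threshold for a "valid" collocation


def synonym_group(word):
    """Alphabetically sorted synonym group of word ([word] if it has none)."""
    for group in SYNSETS:
        if word in group:
            return sorted(group)
    return [word]


def collocates(left, right):
    """True if the adjacent pair is frequent enough to count as a collocation."""
    return BIGRAM_COUNTS.get((left, right), 0) >= MIN_COUNT


def options_at(tokens, i):
    """Synonyms usable at position i: each must collocate with both neighbours.

    Simplification of this sketch: a position qualifies only if its neighbours
    have no synonyms, so neighbours are never changed and the option list is
    the same whichever synonym currently occupies position i.
    """
    group = synonym_group(tokens[i])
    if len(group) < 2:
        return []
    left = tokens[i - 1] if i > 0 else None
    right = tokens[i + 1] if i + 1 < len(tokens) else None
    if any(n is not None and len(synonym_group(n)) > 1 for n in (left, right)):
        return []
    return [w for w in group
            if (left is None or collocates(left, w))
            and (right is None or collocates(w, right))]


def paraphrases(tokens):
    """Map each position to the valid alternatives to its current word."""
    result = {}
    for i, word in enumerate(tokens):
        alternatives = [w for w in options_at(tokens, i) if w != word]
        if alternatives:
            result[i] = alternatives
    return result


def hide(tokens, bits):
    """Hide one bit in each position that has at least two valid options."""
    pending = list(bits)
    if any(b not in (0, 1) for b in pending):
        raise ValueError("bits must be 0 or 1")
    out = list(tokens)
    for i in range(len(out)):
        opts = options_at(out, i)
        if pending and len(opts) >= 2:
            out[i] = opts[pending.pop(0)]
    if pending:
        raise ValueError("text has too few substitution slots for the message")
    return out


def reveal(tokens, n_bits):
    """Recover the first n_bits hidden bits from the positions hide() used."""
    bits = []
    for i in range(len(tokens)):
        opts = options_at(tokens, i)
        if len(bits) < n_bits and len(opts) >= 2:
            bits.append(opts.index(tokens[i]))
    return bits


text = "we drink strong tea in a big house and drive a fast car".split()
print(paraphrases(text))  # {6: ['large'], 11: ['quick']}
stego = hide(text, [1, 0])
print(" ".join(stego))    # we drink strong tea in a large house and drive a fast car
print(reveal(stego, 2))   # [1, 0]
```

With this toy data, "powerful" is rejected at position 2 because "powerful tea" falls below the threshold, so "strong" stays unchanged. "big" and "fast" each admit two valid options and therefore carry one bit each. The neighbour restriction in `options_at` is a design choice of this sketch. It guarantees that a receiver looking at the modified text finds exactly the same substitution slots and option lists as the sender.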
This work was done with partial support from the Mexican Government (CONACyT, SNI, CGPI-IPN) and the Korean Government (KIPA research professorship). The second author is currently on sabbatical leave at Chung-Ang University.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bolshakov, I.A., Gelbukh, A. (2004). Synonymous Paraphrasing Using WordNet and Internet. In: Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2004. Lecture Notes in Computer Science, vol 3136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27779-8_27
DOI: https://doi.org/10.1007/978-3-540-27779-8_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22564-5
Online ISBN: 978-3-540-27779-8
eBook Packages: Springer Book Archive