Machine Learning Method for Paraphrase Identification

  • Oleksandr Marchenko
  • Anatoly Anisimov
  • Andrii Nykonenko
  • Tetiana Rossada
  • Egor Melnikov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10333)


A new, effective algorithm and system for paraphrase identification have been developed using a machine learning approach. The system is built as a multilayer classifier: lower-level sub-classifiers, each following its own strategy, decide whether a pair of sentences is a paraphrase, and an upper-level super-classifier makes the final decision. Experiments show that the system's paraphrase detection accuracy is comparable to that of the best known analogous systems, while it is superior to all of them in implementation.
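The two-level architecture described above can be illustrated with a minimal sketch. It is not the authors' system: the three sub-classifiers (unigram overlap, bigram overlap, length ratio), their thresholds, and the majority-vote super-classifier are all assumptions chosen for illustration. The keywords mention SVM, so a trained model presumably takes the place of the simple voting rule used here.

```python
from typing import Callable, List, Sequence

PUNCT = ".,;:!?\"'()"


def tokens(sentence: str) -> List[str]:
    """Lowercase whitespace tokenization with surrounding punctuation stripped."""
    stripped = (word.strip(PUNCT).lower() for word in sentence.split())
    return [word for word in stripped if word]


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Lower level: each hypothetical sub-classifier applies its own strategy
# and returns a vote (1 = paraphrase, 0 = not a paraphrase).
def unigram_vote(s1: str, s2: str, threshold: float = 0.5) -> int:
    return int(jaccard(set(tokens(s1)), set(tokens(s2))) >= threshold)


def bigram_vote(s1: str, s2: str, threshold: float = 0.3) -> int:
    t1, t2 = tokens(s1), tokens(s2)
    return int(jaccard(set(zip(t1, t1[1:])), set(zip(t2, t2[1:]))) >= threshold)


def length_vote(s1: str, s2: str, threshold: float = 0.7) -> int:
    n1, n2 = len(tokens(s1)), len(tokens(s2))
    if max(n1, n2) == 0:
        return 1
    return int(min(n1, n2) / max(n1, n2) >= threshold)


SUB_CLASSIFIERS: Sequence[Callable[[str, str], int]] = (
    unigram_vote,
    bigram_vote,
    length_vote,
)


# Upper level: the super-classifier combines the sub-classifier votes.
# A strict majority vote stands in for a trained model (assumption).
def is_paraphrase(s1: str, s2: str) -> bool:
    votes = [classify(s1, s2) for classify in SUB_CLASSIFIERS]
    return sum(votes) * 2 > len(votes)
```

Calling `is_paraphrase("The cat sat on the mat.", "The cat sat on the mat today.")` runs all three sub-classifiers and lets the majority decide. Replacing the voting rule with a classifier trained on the sub-classifier outputs would keep the same two-level shape.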


Keywords: Machine learning, SVM, Paraphrase identification



The authors are grateful to the PHASE ONE: KARMA LTD. company, and especially to the Unplag team, for supporting this research and for considerable assistance in developing, testing, and implementing the paraphrase identification method.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Oleksandr Marchenko (1, corresponding author)
  • Anatoly Anisimov (1)
  • Andrii Nykonenko (2)
  • Tetiana Rossada (1)
  • Egor Melnikov (3)

  1. Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
  2. International Research and Training Center for IT and Systems, Kyiv, Ukraine
