Detection of Semantically Equivalent Question Pairs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12615)

Abstract

Knowledge sharing platforms like Quora host millions of questions. With so many questions, duplicates are inevitable, and they become more common as the number of questions grows. These redundant queries reduce efficiency and store repetitive data on the server. Because duplicate questions have the same answers, users must write the same content for each of them, which wastes time. Detecting duplicates would help community question answering websites deduplicate their databases. In this paper, we augment the Siamese-LSTM in two ways to achieve better results than previous work. First, we augment the basic Siamese-LSTM with a dense layer (Model-1) to observe the improvement. Second, we augment the Siamese-LSTM with a machine learning classifier (Model-2). In both scenarios, results improve when we include hand-engineered features. The proposed model (Model-1) achieves the highest accuracy, 89.11%.
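The abstract does not say which hand-engineered features are used, so the following is only a minimal sketch of what such pair-level features could look like. The three features (shared token count, Jaccard overlap, length difference) and the names `tokenize` and `pair_features` are illustrative assumptions, not the authors' feature set.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def pair_features(q1, q2):
    """Return illustrative (hypothetical) features for a question pair."""
    t1, t2 = tokenize(q1), tokenize(q2)
    s1, s2 = set(t1), set(t2)
    shared = s1 & s2
    union = s1 | s2
    return {
        # Number of distinct tokens that appear in both questions.
        "shared_tokens": len(shared),
        # Jaccard overlap of the two token sets (0.0 if both are empty).
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Absolute difference in token counts.
        "length_diff": abs(len(t1) - len(t2)),
    }


features = pair_features(
    "How do I learn Python?",
    "How can I learn Python quickly?",
)
print(features)
```

Features of this kind could be concatenated with the output of the neural encoder before the dense layer or the classifier; how the paper combines them is not stated in the abstract.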

R. Kumari and R. Mishra contributed equally.

Notes

  1. Quora Question Pairs: https://www.kaggle.com/c/quora-question-pairs.

  2. Ask-Ubuntu Question Dataset: https://github.com/taolei87/askubuntu.

  3. GloVe: Global Vectors for Word Representation.

  4. GloVe Repository: https://github.com/stanfordnlp/GloVe.

  5. spaCy: https://spacy.io/.

  6. “Siamese-LSTM” is a neural-network framework consisting of two identical sub-networks (LSTMs) joined at their outputs.

  7. XGBoost: eXtreme Gradient Boosting.
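The Siamese arrangement described in the notes above (two identical sub-networks joined at their outputs) can be sketched in a few lines. This is a structural illustration only: the shared encoder here is a mean of toy word vectors standing in for an LSTM, the vectors in `TOY_VECTORS` are made-up values rather than GloVe embeddings, and the exp(-L1) join is one common choice for Siamese sentence models that this page does not confirm the paper uses.

```python
import math

# Toy 2-dimensional word vectors (hypothetical values, not GloVe).
TOY_VECTORS = {
    "install": [1.0, 0.0],
    "setup": [0.9, 0.1],
    "ubuntu": [0.0, 1.0],
    "remove": [-1.0, 0.0],
}


def encode(tokens, vectors=TOY_VECTORS):
    """Shared encoder: mean of known word vectors (stand-in for an LSTM)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0, 0.0]
    return [sum(v[i] for v in known) / len(known) for i in range(2)]


def siamese_similarity(tokens_a, tokens_b):
    """Encode both inputs with the SAME encoder, then join the outputs."""
    a, b = encode(tokens_a), encode(tokens_b)
    manhattan = sum(abs(x - y) for x, y in zip(a, b))
    # Maps distance to (0, 1]; 1.0 means the two encodings are identical.
    return math.exp(-manhattan)


same = siamese_similarity(["install", "ubuntu"], ["setup", "ubuntu"])
different = siamese_similarity(["install", "ubuntu"], ["remove", "ubuntu"])
print(same, different)
```

The key property is weight sharing: because both questions pass through one encoder, swapping their order does not change the score.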

Author information

Corresponding author

Correspondence to Rohit Mishra.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Kumari, R., Mishra, R., Malviya, S., Tiwary, U.S. (2021). Detection of Semantically Equivalent Question Pairs. In: Singh, M., Kang, DK., Lee, JH., Tiwary, U.S., Singh, D., Chung, WY. (eds) Intelligent Human Computer Interaction. IHCI 2020. Lecture Notes in Computer Science, vol 12615. Springer, Cham. https://doi.org/10.1007/978-3-030-68449-5_2

  • DOI: https://doi.org/10.1007/978-3-030-68449-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68448-8

  • Online ISBN: 978-3-030-68449-5

  • eBook Packages: Computer Science, Computer Science (R0)
