Detection of Semantically Equivalent Question Pairs

Kumari, Reetu; Mishra, Rohit; Malviya, Shrikant; Tiwary, Uma Shanker

doi:10.1007/978-3-030-68449-5_2

Conference paper
First Online: 06 February 2021

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12615)

Abstract

Knowledge-sharing platforms such as Quora host millions of questions, and at that scale many of them are duplicates. The number of duplicates grows as more questions are asked. These redundant questions reduce efficiency and store repetitive data on the server. Because duplicate questions have the same answers, users end up writing the same content for each of them, which wastes time. Detecting duplicates would help community question answering websites deduplicate their databases. In this paper, we augment the Siamese-LSTM in two ways to achieve better results than previous work. First, we augment the basic Siamese-LSTM with a dense layer (Model-1). Second, we augment the Siamese-LSTM with a machine learning classifier (Model-2). In both cases, results improve when hand-engineered features are included. The proposed Model-1 achieves the highest accuracy, 89.11%.
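As a rough illustration of what hand-engineered features for a question pair might look like, the sketch below computes a few simple lexical features and appends them to a vector standing in for the Siamese-LSTM output. The abstract does not list the features actually used, so the feature set, the function names `tokenize`, `pair_features`, and `augment`, and the example vector are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase a question and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def pair_features(q1, q2):
    """Return a few illustrative (hypothetical) lexical features for a question pair."""
    t1, t2 = tokenize(q1), tokenize(q2)
    s1, s2 = set(t1), set(t2)
    common = s1 & s2
    union = s1 | s2
    return {
        # Absolute difference in token counts.
        "len_diff": abs(len(t1) - len(t2)),
        # Number of distinct words shared by both questions.
        "common_words": len(common),
        # Jaccard overlap of the two vocabularies (0.0 if both are empty).
        "jaccard": len(common) / len(union) if union else 0.0,
        # 1.0 if both questions start with the same word, else 0.0.
        "same_first_word": float(bool(t1) and bool(t2) and t1[0] == t2[0]),
    }


def augment(siamese_vector, q1, q2):
    """Concatenate a Siamese output vector with the hand-built features.

    Features are appended in sorted key order so the layout is stable.
    """
    feats = pair_features(q1, q2)
    return list(siamese_vector) + [float(feats[k]) for k in sorted(feats)]


if __name__ == "__main__":
    a = "How do I learn Python?"
    b = "How can I learn Python quickly?"
    print(pair_features(a, b))
    # [0.5] is a placeholder for whatever the Siamese network would output.
    print(augment([0.5], a, b))
```

The combined vector could then be passed to a dense layer or to a separate classifier, which is the general shape of the two variants the abstract describes.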

R. Kumari and R. Mishra contributed equally.

Notes

1. Quora Question Pairs: https://www.kaggle.com/c/quora-question-pairs.
2. Ask-Ubuntu Question Dataset: https://github.com/taolei87/askubuntu.
3. GloVe: Global Vectors for Word Representation.
4. GloVe Repository: https://github.com/stanfordnlp/GloVe.
5. spaCy: https://spacy.io/.
6. “Siamese-LSTM” is a neural-network framework that consists of two identical sub-networks (LSTMs) joined at their outputs.
7. XGBoost: eXtreme Gradient Boosting.
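To make the Siamese-LSTM definition in the notes concrete, here is a minimal pure-Python sketch: one small LSTM encoder with a single set of weights encodes both inputs, and the two final hidden states are joined by a similarity function. It is not the paper's implementation. The class name `TinyLSTM`, the random untrained weights, the toy input vectors, and the choice of exp(-L1 distance) as the joining function are all assumptions. In practice the inputs would be word embeddings and the weights would be learned.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """A minimal single-layer LSTM encoder with random, untrained weights."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        rows = 4 * hidden_dim          # input, forget, candidate, output gates
        cols = input_dim + hidden_dim  # each gate sees [x_t, h_{t-1}]
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
        self.b = [0.0] * rows

    def encode(self, sequence):
        """Run the LSTM over a list of input vectors; return the final hidden state."""
        n = self.hidden_dim
        h = [0.0] * n
        c = [0.0] * n
        for x in sequence:
            xh = list(x) + h
            z = [sum(w * v for w, v in zip(row, xh)) + b
                 for row, b in zip(self.W, self.b)]
            i = [sigmoid(v) for v in z[0:n]]            # input gate
            f = [sigmoid(v) for v in z[n:2 * n]]        # forget gate
            g = [math.tanh(v) for v in z[2 * n:3 * n]]  # candidate cell state
            o = [sigmoid(v) for v in z[3 * n:4 * n]]    # output gate
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h


def siamese_similarity(encoder, seq_a, seq_b):
    """Encode both inputs with the SAME encoder and join the outputs.

    The joining function here is exp(-L1 distance), an assumed choice that
    maps identical encodings to 1.0 and distant ones toward 0.0.
    """
    h_a = encoder.encode(seq_a)
    h_b = encoder.encode(seq_b)
    return math.exp(-sum(abs(a - b) for a, b in zip(h_a, h_b)))


if __name__ == "__main__":
    enc = TinyLSTM(input_dim=3, hidden_dim=4, seed=1)
    q1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # toy stand-ins for word embeddings
    q2 = [[0.0, 0.0, 1.0]]
    print(siamese_similarity(enc, q1, q1))
    print(siamese_similarity(enc, q1, q2))
```

Because both branches share one `encoder` object, the two sub-networks are identical by construction, which is the property the definition emphasises.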

Author information

Authors and Affiliations

Department of Information Technology, Indian Institute of Information Technology, Allahabad, Prayagraj, 211012, India
Reetu Kumari, Rohit Mishra, Shrikant Malviya & Uma Shanker Tiwary

Corresponding author

Correspondence to Rohit Mishra.

Editor information

Editors and Affiliations

Woosong University, Daejeon, Korea (Republic of)
Madhusudan Singh
Dongseo University, Busan, Korea (Republic of)
Dae-Ki Kang
Keimyung University, Daegu, Korea (Republic of)
Jong-Ha Lee
Indian Institute of Information Technology, Allahabad, India
Uma Shanker Tiwary
Hankuk University of Foreign Studies, Yongin, Korea (Republic of)
Dhananjay Singh
Pukyong National University, Busan, Korea (Republic of)
Wan-Young Chung

About this paper

Cite this paper

Kumari, R., Mishra, R., Malviya, S., Tiwary, U.S. (2021). Detection of Semantically Equivalent Question Pairs. In: Singh, M., Kang, DK., Lee, JH., Tiwary, U.S., Singh, D., Chung, WY. (eds) Intelligent Human Computer Interaction. IHCI 2020. Lecture Notes in Computer Science, vol 12615. Springer, Cham. https://doi.org/10.1007/978-3-030-68449-5_2

DOI: https://doi.org/10.1007/978-3-030-68449-5_2
Published: 06 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68448-8
Online ISBN: 978-3-030-68449-5
eBook Packages: Computer Science, Computer Science (R0)