Exploring Jaccard Similarity and Cosine Similarity for Developing an Assamese Question-Answering System

  • Conference paper
Proceedings of World Conference on Artificial Intelligence: Advances and Applications (WWCA 1997)

Abstract

This paper presents a study on building an Assamese question-answering system (AQAS) using two fundamental methods: cosine similarity and Jaccard similarity. Cosine similarity measures the similarity between two vectors, whereas Jaccard similarity measures the similarity between two sets. To improve efficiency, the cosine similarity method is paired with non-negative matrix factorization (NMF), which substantially reduces both space and time complexity. After thorough testing, our approach achieves an accuracy of 93.8% with cosine similarity and 87.38% with Jaccard similarity. These results demonstrate the effectiveness of our method in providing precise answers in Assamese. This work offers useful insights into Assamese question-answering systems and their practical applications.
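To make the two measures concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes whitespace tokenization and raw term counts as the vector representation; the abstract does not specify the preprocessing or weighting used, and the NMF step is omitted. The function names `jaccard_similarity`, `cosine_similarity`, and `best_match` are hypothetical, introduced only for illustration.

```python
import math
from collections import Counter


def jaccard_similarity(a_tokens, b_tokens):
    """Jaccard similarity of two token collections: |A n B| / |A u B|."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 0.0  # convention chosen here for two empty inputs
    return len(a & b) / len(a | b)


def cosine_similarity(a_tokens, b_tokens):
    """Cosine similarity of the raw term-count vectors of two token lists."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(question, candidates, similarity):
    """Return the candidate string most similar to the question.

    Assumption: whitespace tokenization; a real system would likely use
    language-specific tokenization and stop-word removal.
    """
    q = question.split()
    return max(candidates, key=lambda c: similarity(q, c.split()))
```

In a retrieval-style question-answering setting, either function could be passed to `best_match` to rank stored questions or candidate sentences against a user query. Jaccard similarity ignores how often a term occurs, while cosine similarity over count vectors takes term frequency into account.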

Author information

Corresponding author

Correspondence to Nomi Baruah.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Baruah, N., Gupta, S., Ghosh, S., Afrid, S.N., Kakoty, C., Phukan, R. (2023). Exploring Jaccard Similarity and Cosine Similarity for Developing an Assamese Question-Answering System. In: Tripathi, A.K., Anand, D., Nagar, A.K. (eds) Proceedings of World Conference on Artificial Intelligence: Advances and Applications. WWCA 1997. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-99-5881-8_8
