Improving Arabic Microblog Retrieval with Distributed Representations

  • Shahad Alshalan
  • Raghad Alshalan (corresponding author)
  • Hend Al-Khalifa
  • Reem Suwaileh
  • Tamer Elsayed
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12004)

Abstract

Query expansion (QE) using pseudo-relevance feedback (PRF) is one of the approaches that have been shown to be effective for improving microblog retrieval. In this paper, we investigate the performance of three embedding-based methods on Arabic microblog retrieval: embedding-based QE, embedding-based PRF, and PRF combined with embedding-based reranking. Our experimental results over three variants of the EveTAR test collection showed a consistent improvement of the reranking method over the traditional PRF baseline on both MAP and P@10. The improvement is statistically significant in some cases. While embedding-based QE fails to improve over traditional PRF, embedding-based PRF outperforms the baseline in several cases, with a statistically significant improvement in MAP on two variants of the test collection.
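The abstract names embedding-based QE without describing how expansion terms are chosen. The sketch below shows one common variant, assuming pre-trained word vectors are available as a plain dictionary: the query is represented by the centroid of its term vectors, and the k vocabulary terms with the highest cosine similarity to that centroid are appended. The embedding model, vocabulary, expansion size k and term weighting used in the paper are not given here, so `expand_query`, `cosine`, `centroid` and the toy vectors are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical toy vectors standing in for real (e.g. Arabic tweet) embeddings.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "earthquake": [1.0, 0.1, 0.0],
    "quake": [0.9, 0.2, 0.0],
    "tremor": [0.8, 0.0, 0.1],
    "football": [0.0, 1.0, 0.0],
    "match": [0.1, 0.9, 0.0],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def centroid(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def expand_query(query: List[str], embeddings: Dict[str, Vector], k: int = 2) -> List[str]:
    """Append the k vocabulary terms closest (by cosine) to the query centroid.

    Out-of-vocabulary query terms are ignored when building the centroid,
    and terms already in the query are never added as expansions.
    No weighting against the original query terms is applied here.
    """
    known = [embeddings[t] for t in query if t in embeddings]
    if not known:
        # Nothing to expand from: return the query unchanged.
        return list(query)
    center = centroid(known)
    candidates = [
        (cosine(center, vec), term)
        for term, vec in embeddings.items()
        if term not in query
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return list(query) + [term for _, term in candidates[:k]]


if __name__ == "__main__":
    print(expand_query(["earthquake"], TOY_EMBEDDINGS, k=2))
```

With these toy vectors, expanding `["earthquake"]` with k=2 yields `["earthquake", "quake", "tremor"]`. A real system would typically also weight expansion terms below the original query terms when forming the final retrieval query.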

Keywords

Twitter · Arabic · Query expansion · Word embeddings

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Shahad Alshalan (1)
  • Raghad Alshalan (1), corresponding author
  • Hend Al-Khalifa (2)
  • Reem Suwaileh (3)
  • Tamer Elsayed (3)

  1. Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
  2. King Saud University, Riyadh, Saudi Arabia
  3. Qatar University, Doha, Qatar