Mining of Relevant and Informative Posts from Text Forums

  • Kseniya Buraya
  • Vladislav Grozin
  • Vladislav Trofimov
  • Pavel Vinogradov
  • Natalia Gusarova
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 947)


In the modern world, the ability to obtain information quickly and conveniently is a competitive advantage for everyone. Web forums occupy a significant place among sources of information: they are a good place to gain professionally significant knowledge on a variety of topics. However, it is not always easy to identify the parts of a forum that contain useful information matching the user's needs. In this paper we consider the problem of automatic forum text summarization and describe methods that can help to solve it. We study the difference between relevance-oriented and usefulness-oriented query types. We describe our dataset, which contains over 4000 labeled posts from web forums covering various subject domains. The posts were labeled by experts, who rated them on a scale from 0 to 5 for the selected query types. The results of our study can provide a basis for creating information retrieval applications that reduce the time users spend searching and improve the quality of search results.


Keywords: Text forums; Information retrieval; Relevant information



This work was financially supported by the Government of the Russian Federation (Grant 08-08).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Kseniya Buraya (1), corresponding author
  • Vladislav Grozin (1)
  • Vladislav Trofimov (1)
  • Pavel Vinogradov (1)
  • Natalia Gusarova (1)

  1. ITMO University, St. Petersburg, Russia
