International Conference on Knowledge Engineering and the Semantic Web

Knowledge Engineering and Semantic Web, pp 63-71

Feature Selection for Language Independent Text Forum Summarization

  • Vladislav A. Grozin
  • Natalia F. Gusarova
  • Natalia V. Dobrenko
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 518)

Abstract

The need for multilingual information retrieval is rising steadily. Specialized text-based forums on the Web are a valuable source of relevant information. However, extraction of informative messages is often hindered by the large number of non-informative posts (so-called off-topic posts) and by the informal language commonly used on forums.

The paper addresses the automatic identification of posts that are potentially useful for sharing professional experience within text forums, irrespective of the forum’s language. For our experiments we selected subsets of several text forums in different languages. Manual annotation was performed by native-speaking experts. Textual, thread-based, and social graph features were extracted. To select a satisfactory set of language-independent forum features, we used gradient boosting models, the relative influence metric for model analysis, and the NDCG metric for measuring the quality of the selection method.
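As an illustration of the NDCG metric mentioned above, the following is a minimal Python sketch and not the authors' implementation. It assumes each post carries a numeric usefulness grade assigned by an expert, that posts are listed in the order produced by a ranking model, and that the linear-gain form of DCG is used; the abstract does not state which variant the paper adopts. The function names `dcg` and `ndcg` and the sample grades are hypothetical.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """Discounted cumulative gain of relevance grades given in ranked order.

    Uses the linear-gain variant: grade / log2(position + 1),
    with positions counted from 1.
    """
    top = relevances[:k] if k is not None else relevances
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(top))


def ndcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    # If no post has a positive grade, the ranking carries no signal.
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical expert grades for five posts, in the order a model ranked them.
ranked_grades = [3, 0, 2, 1, 0]
score = ndcg(ranked_grades)
```

A score of 1.0 means the model placed the posts in exactly the order of their expert grades; lower values indicate that useful posts were ranked below less useful ones.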

We have formed a satisfactory set of forum features that indicate a post’s utility, do not require sophisticated linguistic analysis, and are suitable for practical use.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Vladislav A. Grozin (1)
  • Natalia F. Gusarova (1)
  • Natalia V. Dobrenko (1)
  1. National Research University of Information Technologies, Mechanics and Optics, Saint-Petersburg, Russia
