International Conference on Knowledge Engineering and the Semantic Web

Knowledge Engineering and Semantic Web pp 63-71

Feature Selection for Language Independent Text Forum Summarization

  • Vladislav A. Grozin
  • Natalia F. Gusarova
  • Natalia V. Dobrenko
Conference paper

DOI: 10.1007/978-3-319-24543-0_5

Part of the Communications in Computer and Information Science book series (CCIS, volume 518)
Cite this paper as:
Grozin V.A., Gusarova N.F., Dobrenko N.V. (2015) Feature Selection for Language Independent Text Forum Summarization. In: Klinov P., Mouromtsev D. (eds) Knowledge Engineering and Semantic Web. Communications in Computer and Information Science, vol 518. Springer, Cham

Abstract

The need for multilingual information retrieval is rising steadily. Specialized text-based forums on the Web are a valuable source of relevant information. However, extraction of informative messages is often hindered by the large number of non-informative posts (so-called off-topic posts) and by the informal language commonly used on forums.

The paper deals with the task of automatically identifying posts that are potentially useful for sharing professional experience within text forums, irrespective of the forum's language. For our experiments we selected subsets from various text forums in different languages. Manual annotation was performed by native-speaking experts. Textual, thread-based, and social graph features were extracted. To select satisfactory language-independent forum features, we used gradient boosting models, the relative influence metric for model analysis, and the NDCG metric for measuring the quality of the selection method.

We have formed a satisfactory set of forum features that indicate a post's utility, do not require sophisticated linguistic analysis, and are suitable for practical use.
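The abstract names NDCG as the quality metric but does not state which variant or cutoff was used. The following is a minimal sketch of one common formulation (linear gain, log2 discount), included only to illustrate how a ranking of posts by predicted usefulness could be scored against expert labels. The function names `dcg` and `ndcg`, the parameter `k`, and the sample values are illustrative assumptions, not taken from the paper.

```python
import math


def dcg(relevances):
    # Discounted cumulative gain: the label at 1-based rank r is divided by log2(r + 1).
    # Assumption: linear gain; the paper may use a different gain function.
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(scores, labels, k=None):
    # Order posts by predicted score, highest first, and read off their expert labels.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order][:k]
    # The ideal ranking places the most useful posts first.
    ideal = sorted(labels, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    # If no post is useful, the ratio is undefined; return 0.0 by convention here.
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical model scores and expert usefulness labels for three posts.
    print(ndcg([0.9, 0.1, 0.5], [2, 0, 1]))
    print(ndcg([0.1, 0.9, 0.5], [2, 0, 1]))
```

A ranking that matches the expert ordering scores 1.0, and any ranking that places useful posts lower scores less, which is what makes the metric usable for comparing feature subsets.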


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Vladislav A. Grozin (1)
  • Natalia F. Gusarova (1)
  • Natalia V. Dobrenko (1)
  1. National Research University of Information Technologies, Mechanics and Optics, Saint-Petersburg, Russia
