Automatic Term Extraction for Sentiment Classification of Dynamically Updated Text Collections into Three Classes

  • Yuliya Rubtsova
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 468)


This paper presents an automatic term extraction approach for building a vocabulary that is constantly updated. The prepared dictionary is then used for sentiment classification into three classes (positive, neutral, negative). The results of sentiment classification are described, and the accuracy of methods based on various weighting schemes is compared. The paper also analyzes the computational complexity of generating representations for N dynamic documents depending on the weighting scheme used.
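
The abstract does not say which weighting schemes are compared, so the following is only an illustrative sketch of why the scheme matters for a collection that keeps growing. It assumes TF-IDF as one representative scheme and contrasts it with a hypothetical alternative that weights terms against statistics frozen in advance. All function names and the toy documents are assumptions made for this example, not the paper's method.

```python
import math
from collections import Counter

def term_frequency(tokens):
    """Relative frequency of each term within one document."""
    counts = Counter(tokens)
    return {term: n / len(tokens) for term, n in counts.items()}

def inverse_document_frequency(collection):
    """log(N / df) for every term seen in the collection."""
    n_docs = len(collection)
    df = Counter(term for doc in collection for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}

def tfidf(collection):
    """TF-IDF vectors; the IDF part depends on the whole collection."""
    idf = inverse_document_frequency(collection)
    return [
        {t: f * idf[t] for t, f in term_frequency(doc).items()}
        for doc in collection
    ]

def fixed_reference_weights(doc, reference_idf, default=0.0):
    """Weight one document against statistics frozen in advance (assumed scheme)."""
    return {
        t: f * reference_idf.get(t, default)
        for t, f in term_frequency(doc).items()
    }

# Toy collection (hypothetical data).
docs = [["good", "film"], ["bad", "film"]]
new_doc = ["good", "good", "plot"]

# With TF-IDF, adding a document changes the vectors of the old documents,
# so all N representations have to be regenerated.
before = tfidf(docs)
after = tfidf(docs + [new_doc])

# With frozen statistics, only the new document needs to be processed.
frozen = inverse_document_frequency(docs)
new_vec = fixed_reference_weights(new_doc, frozen)
```

In this sketch the first scheme touches every document whenever the collection changes, while the second touches only the incoming document. Terms unseen in the frozen statistics simply receive the default weight, which is one of several possible design choices.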


Keywords: corpus linguistics, sentiment analysis, information extraction, text classification and categorization, social networks data analysis





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Yuliya Rubtsova (1)
  1. The A.P. Ershov Institute of Informatics Systems (IIS), Siberian Branch of the Russian Academy of Sciences, Russia
