
Construction of unsupervised sentiment classifier on idioms resources

Journal of Central South University

Abstract

Sentiment analysis, the computational study of how opinions, attitudes, emotions, and perspectives are expressed in language, has become an important task in natural language processing and is valuable for both research and practical applications. Constructing sentiment classifiers normally requires large amounts of labeled in-domain training data. To address this difficulty, a novel unsupervised framework was proposed that uses Chinese idiom resources to build a general sentiment classifier. To improve domain adaptation, the general classifier then serves as the starting point of a self-training procedure that yields a domain-specific self-training sentiment classifier. Several experiments on a publicly available dataset of Chinese online reviews were carried out to validate the framework, and they show that it is effective and achieves encouraging results. Specifically, the general classifier outperforms two baselines (a naïve 50% baseline and a cross-domain classifier), and the bootstrapping self-training classifier approaches the upper-bound domain-specific classifier, with a lowest accuracy of 81.5%; its performance is more stable, and the framework requires no labeled training data.
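The abstract does not specify the idiom scoring rule or the learner used inside the self-training loop, so the following is only a minimal sketch of the general idea under stated assumptions: the idiom polarity lexicon `IDIOM_POLARITY` is a hypothetical four-entry toy, reviews are assumed to be already word-segmented, the general classifier simply sums idiom polarities, and a from-scratch multinomial Naive Bayes stands in for whatever domain classifier the authors actually trained. All names (`general_score`, `self_train`, `rounds`, `per_round`) are illustrative, not the paper's.

```python
import math
from collections import Counter

# HYPOTHETICAL toy idiom polarity lexicon (+1 positive, -1 negative);
# the paper's real idiom resource is far larger and is not reproduced here.
IDIOM_POLARITY = {
    "物美价廉": 1,   # "good quality at a low price"
    "名副其实": 1,   # "lives up to its name"
    "大失所望": -1,  # "deeply disappointed"
    "粗制滥造": -1,  # "shoddily made"
}


def general_score(tokens, lexicon=IDIOM_POLARITY):
    """General (domain-independent) classifier: sum idiom polarities in a segmented review."""
    return sum(lexicon.get(t, 0) for t in tokens)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing; an ASSUMED stand-in domain classifier."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for d in docs for t in d}
        self.prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(t for d in class_docs for t in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def predict_with_margin(self, tokens):
        """Return (best label, log-posterior gap to the runner-up); unseen tokens are ignored."""
        v = len(self.vocab)
        scores = {
            c: self.prior[c] + sum(
                math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
                for t in tokens if t in self.vocab)
            for c in self.classes
        }
        ranked = sorted(scores, key=scores.get, reverse=True)
        margin = scores[ranked[0]] - scores[ranked[1]] if len(ranked) > 1 else math.inf
        return ranked[0], margin


def self_train(docs, rounds=5, per_round=2):
    """Bootstrap a domain classifier from idiom-labeled seeds, using no labeled training data."""
    labeled, labels, pool = [], [], []
    for d in docs:
        s = general_score(d)
        if s != 0:  # the general classifier commits to a polarity for this review
            labeled.append(d)
            labels.append(1 if s > 0 else -1)
        else:       # left unlabeled for later self-training rounds
            pool.append(d)
    if len(set(labels)) < 2:
        raise ValueError("seed set must contain both polarities")

    clf = NaiveBayes().fit(labeled, labels)
    for _ in range(rounds):
        if not pool:
            break
        # Rank unlabeled reviews by the current classifier's confidence margin.
        preds = [(clf.predict_with_margin(d), d) for d in pool]
        preds.sort(key=lambda p: p[0][1], reverse=True)
        # Pseudo-label the most confident reviews, then retrain on the enlarged set.
        for (label, _margin), d in preds[:per_round]:
            labeled.append(d)
            labels.append(label)
        pool = [d for _, d in preds[per_round:]]
        clf = NaiveBayes().fit(labeled, labels)
    return clf


if __name__ == "__main__":
    reviews = [  # toy, pre-segmented reviews
        ["手机", "物美价廉", "推荐"], ["酒店", "名副其实", "干净"],
        ["做工", "粗制滥造", "退货"], ["服务", "大失所望", "差"],
        ["房间", "干净", "推荐"], ["质量", "差", "退货"],
    ]
    clf = self_train(reviews)
    print(clf.predict_with_margin(["干净", "推荐"])[0])  # prints 1
```

In this sketch only reviews that the idiom lexicon scores as nonzero seed the loop, and each round adds the unlabeled reviews with the largest log-posterior margin. This is one common way to limit error propagation in self-training; the selection criterion used in the paper may differ.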



Author information


Correspondence to Song-xian Xie  (谢松县).

Additional information

Foundation item: Projects(61170156, 60933005) supported by the National Natural Science Foundation of China


About this article


Cite this article

Xie, Sx., Wang, T. Construction of unsupervised sentiment classifier on idioms resources. J. Cent. South Univ. 21, 1376–1384 (2014). https://doi.org/10.1007/s11771-014-2075-4


  • DOI: https://doi.org/10.1007/s11771-014-2075-4
