Abstract
Sentiment analysis, the computational study of how opinions, attitudes, emotions, and perspectives are expressed in language, is an important task in natural language processing and is valuable for both research and practical applications. This work focuses on a central difficulty in building sentiment classifiers: they normally require a large amount of labeled in-domain training data. A novel unsupervised framework is proposed that uses Chinese idiom resources to develop a general sentiment classifier. The domain adaptation of the general classifier is then improved by using it as the base of a self-training procedure, which yields a domain-specific self-training sentiment classifier. To validate the framework, several experiments were carried out on a publicly available dataset of Chinese online reviews. The results show that the framework is effective: the general classifier outperforms two baselines (a naïve 50% baseline and a cross-domain classifier), and the bootstrapped self-training classifier approaches the upper-bound domain-specific classifier, with a lowest accuracy of 81.5%, while performing more stably and requiring no labeled training data.
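The self-training idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the learner or the confidence rule, so a small naive-Bayes-style scorer with equal class priors is swapped in here, and the `SEED` lexicon, the function names, and the `rounds` and `threshold` parameters are all hypothetical. English tokens stand in for segmented Chinese text.

```python
import math
from collections import Counter

# Hypothetical seed lexicon standing in for an idiom-derived sentiment resource.
SEED = {"excellent": 1, "wonderful": 1, "terrible": -1, "awful": -1}

def general_score(tokens):
    """Domain-independent score: sum of seed polarities (0 means undecided)."""
    return sum(SEED.get(t, 0) for t in tokens)

def train_nb(labeled):
    """Collect per-class word counts from (tokens, label) pairs."""
    counts = {1: Counter(), -1: Counter()}
    for tokens, label in labeled:
        counts[label].update(tokens)
    vocab = set(counts[1]) | set(counts[-1])
    return counts, vocab

def nb_score(model, tokens):
    """Add-one-smoothed log-likelihood ratio; positive favors the positive class."""
    counts, vocab = model
    total = {c: sum(counts[c].values()) + len(vocab) for c in counts}
    score = 0.0
    for t in tokens:
        if t in vocab:  # words never seen in training are ignored
            score += math.log((counts[1][t] + 1) / total[1])
            score -= math.log((counts[-1][t] + 1) / total[-1])
    return score

def self_train(unlabeled, rounds=3, threshold=0.5):
    """Bootstrap a domain classifier from unlabeled documents only."""
    # Round 0: pseudo-label the documents on which the general classifier decides.
    labeled = [(d, 1 if general_score(d) > 0 else -1)
               for d in unlabeled if general_score(d) != 0]
    model = train_nb(labeled)
    for _ in range(rounds):
        # Relabel with the current domain model, keeping confident predictions.
        labeled = []
        for d in unlabeled:
            s = nb_score(model, d)
            if abs(s) >= threshold:
                labeled.append((d, 1 if s > 0 else -1))
        model = train_nb(labeled)
    return model
```

In this sketch the general classifier only supplies the initial pseudo-labels. Domain words that co-occur with seed words in confidently labeled documents then acquire polarity of their own in later rounds, which is the sense in which self-training adapts a general classifier to a domain without any labeled data.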
Additional information
Foundation item: Projects(61170156, 60933005) supported by the National Natural Science Foundation of China
About this article
Cite this article
Xie, Sx., Wang, T. Construction of unsupervised sentiment classifier on idioms resources. J. Cent. South Univ. 21, 1376–1384 (2014). https://doi.org/10.1007/s11771-014-2075-4
DOI: https://doi.org/10.1007/s11771-014-2075-4