Abstract
In this paper, we propose a novel approach to constructing a Vietnamese SentiWordNet (VSWN), a lexical resource that supports sentiment analysis in Vietnamese. A SentiWordNet is typically generated from WordNet, with each synset assigned numerical scores that indicate its opinion polarity. However, a Vietnamese WordNet is not yet available. Therefore, we propose a method that constructs a VSWN from a Vietnamese electronic dictionary rather than from WordNet. The main drawback of constructing a VSWN from a dictionary is that it easily suffers from the sparsity problem, since dictionary glosses are generally short. To address this problem, we adopt a string kernel function that measures string similarity based on both contiguous and non-contiguous common subsequences. Our experimental results show, first, that the string kernel outperforms a baseline model that uses the standard bag-of-words kernel. Second, the Vietnamese SentiWordNet is competitive with the English SentiWordNet, which was constructed from WordNet. These results show that our methodology is effective and efficient for constructing a SentiWordNet from an electronic dictionary.
Keywords
- SentiWordNet
- Vietnamese SentiWordNet
- Opinion Mining
- String Kernel
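To make the idea of a string kernel over contiguous and non-contiguous subsequences concrete, the following is a minimal sketch of a standard gap-weighted subsequence kernel. It is an illustration only: the abstract does not state which formulation, subsequence length, or decay factor the authors use, so the function names (`subsequence_kernel`, `normalized_kernel`), the parameters `n` and `lam`, and the choice to compare glosses as character strings or token lists are all assumptions made for this example.

```python
def subsequence_kernel(s, t, n, lam):
    """Gap-weighted subsequence kernel of order n (illustrative sketch).

    s, t : sequences (strings or lists of tokens) to compare
    n    : length of the subsequences counted (assumed parameter)
    lam  : decay factor in (0, 1]; each occurrence of a subsequence
           contributes lam ** (length of the span it covers)
    """
    ls, lt = len(s), len(t)
    if n < 1 or min(ls, lt) < n:
        return 0.0

    # kp[i][a][b] holds the auxiliary value K'_i(s[:a], t[:b]).
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0  # K'_0 is 1 for every pair of prefixes

    for i in range(1, n):
        for a in range(i, ls + 1):
            kpp = 0.0  # running value of K''_i(s[:a], t[:b])
            for b in range(i, lt + 1):
                if s[a - 1] == t[b - 1]:
                    kpp = lam * (kpp + lam * kp[i - 1][a - 1][b - 1])
                else:
                    kpp = lam * kpp
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp

    # Sum over every pair of positions where the final symbols match.
    total = 0.0
    for a in range(n, ls + 1):
        for b in range(n, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[n - 1][a - 1][b - 1]
    return total


def normalized_kernel(s, t, n, lam):
    """Kernel value scaled by the self-similarities, so that identical
    sequences score 1.0 regardless of their length."""
    kss = subsequence_kernel(s, s, n, lam)
    ktt = subsequence_kernel(t, t, n, lam)
    if kss == 0.0 or ktt == 0.0:
        return 0.0
    return subsequence_kernel(s, t, n, lam) / (kss * ktt) ** 0.5


if __name__ == "__main__":
    # "car" and "cat" share only the length-2 subsequence "ca".
    print(subsequence_kernel("car", "cat", n=2, lam=0.5))
    # The same functions accept token lists, e.g. tokenized glosses.
    print(normalized_kernel(["very", "good", "film"],
                            ["very", "bad", "film"], n=2, lam=0.5))
```

Because non-contiguous matches still contribute (with a penalty that grows with the gap), two short glosses that share few exact n-grams can still receive a non-zero similarity, which is the property the abstract relies on to counter sparsity. A bag-of-words kernel, by contrast, only counts shared tokens and ignores their order.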
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Vu, XS., Song, HJ., Park, SB. (2014). Building a Vietnamese SentiWordNet Using Vietnamese Electronic Dictionary and String Kernel. In: Kim, Y.S., Kang, B.H., Richards, D. (eds) Knowledge Management and Acquisition for Smart Systems and Services. PKAW 2014. Lecture Notes in Computer Science, vol 8863. Springer, Cham. https://doi.org/10.1007/978-3-319-13332-4_18
DOI: https://doi.org/10.1007/978-3-319-13332-4_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13331-7
Online ISBN: 978-3-319-13332-4
eBook Packages: Computer Science, Computer Science (R0)