Abstract
This paper proposes a deep context representation based on word vectors for deceptive review identification. The basic idea is that deceptive reviews are written without real experience and truthful reviews with it, so the contexts in which the two groups of writers use words should differ. Unlike previous work, which learns word vectors from the whole text collection, we produce two numerical vectors for each word by embedding its contexts in deceptive and truthful reviews separately. Specifically, we propose a representation method called DCWord (Deep Context representation by Word vectors), which represents each review by the averages of the word vectors derived from the deceptive and the truthful contexts, respectively, for subsequent classification. We then investigate three classifiers, namely support vector machine (SVM), simple logistic regression (LR) and back-propagation neural network (BPNN), for identifying deceptive reviews. Experimental results on the Spam dataset show that, with the DCWord representation, SVM and LR achieve comparable performance and both outperform BPNN in deceptive review identification. The outcome of this study has potential implications for online business intelligence in identifying deceptive reviews.
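The representation step described above can be illustrated with a minimal sketch. It assumes that two word-to-vector tables have already been learned separately, one from deceptive reviews and one from truthful reviews, and that the two per-review averages are concatenated into a single feature vector. The abstract does not state how the two averages are combined, so the concatenation, the function names, and the toy two-dimensional vectors below are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def average_vector(tokens: Sequence[str],
                   embeddings: Dict[str, Vector],
                   dim: int) -> Vector:
    """Average the vectors of the tokens found in `embeddings`.

    Tokens without a vector are skipped; if no token is covered,
    a zero vector of length `dim` is returned.
    """
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vec = embeddings.get(token)
        if vec is None:
            continue
        for i, value in enumerate(vec):
            total[i] += value
        count += 1
    if count == 0:
        return total
    return [t / count for t in total]


def dcword_features(tokens: Sequence[str],
                    deceptive_emb: Dict[str, Vector],
                    truthful_emb: Dict[str, Vector],
                    dim: int) -> Vector:
    """Represent one review by two averaged word vectors.

    Assumption: the deceptive-context average and the truthful-context
    average are concatenated (length 2 * dim).
    """
    return (average_vector(tokens, deceptive_emb, dim)
            + average_vector(tokens, truthful_emb, dim))


# Toy, made-up embeddings standing in for vectors trained separately
# on deceptive and on truthful reviews.
deceptive_emb = {"great": [1.0, 0.0], "hotel": [0.0, 1.0]}
truthful_emb = {"great": [0.5, 0.5], "hotel": [0.5, -0.5], "lobby": [2.0, 2.0]}

review = ["great", "hotel", "lobby"]
features = dcword_features(review, deceptive_emb, truthful_emb, dim=2)
print(features)
```

The resulting fixed-length vector could then be passed to any standard classifier such as an SVM or logistic regression. Skipping words that are missing from one of the tables is a design choice of this sketch, not something the abstract specifies.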
Notes
1. USPTO stop words, online: https://www.uspto.gov/patft//help/stopword.htm.
2. QTag for English part-of-speech, online: http://www.english.bham.ac.uk/staff/omason/software/qtag.html.
3. Porter stemming algorithm, online: http://tartarus.org/martin/PorterStemmer/.
Acknowledgment
This research was supported in part by the National Natural Science Foundation of China under Grant Nos. 71101138, 61379046, 91218301, 91318302 and 61432001; the Beijing Natural Science Fund under Grant No. 4122087; and the Fundamental Research Funds for the Central Universities (buctrc201504).
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, W., Jiang, Y., Yoshida, T. (2016). Deep Context Identification of Deceptive Reviews Using Word Vectors. In: Chen, J., Nakamori, Y., Yue, W., Tang, X. (eds) Knowledge and Systems Sciences. KSS 2016. Communications in Computer and Information Science, vol 660. Springer, Singapore. https://doi.org/10.1007/978-981-10-2857-1_19
DOI: https://doi.org/10.1007/978-981-10-2857-1_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2856-4
Online ISBN: 978-981-10-2857-1
eBook Packages: Computer Science, Computer Science (R0)