Deep Context Identification of Deceptive Reviews Using Word Vectors

Zhang, Wen; Jiang, Yipan; Yoshida, Taketoshi

doi:10.1007/978-981-10-2857-1_19

Part of the book series: Communications in Computer and Information Science (CCIS, volume 660)

Included in the following conference series:

International Symposium on Knowledge and Systems Sciences

Abstract

This paper proposes a deep context representation based on word vectors for deceptive review identification. The basic idea is that, because deceptive reviews are written without real experience and truthful reviews are written with it, the words in the two kinds of review should appear in different contexts. Unlike previous work, which learns word vectors from the whole text collection, we produce two numerical vectors for each word by embedding its contexts in deceptive and truthful reviews separately. Specifically, we propose a representation method called DCWord (Deep Context representation by Word vectors), which represents each review by the average word vectors derived from deceptive and truthful contexts, respectively, for subsequent classification. We then investigate three classifiers, namely support vector machine (SVM), simple logistic regression (LR) and back propagation neural network (BPNN), to identify deceptive reviews. Experimental results on the Spam dataset demonstrate that, with the DCWord representation, SVM and LR produce comparable performance and both outperform BPNN in deceptive review identification. The outcome of this study has potential implications for online business intelligence in identifying deceptive reviews.
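The representation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the two embedding tables (one learned from deceptive reviews, one from truthful reviews) already exist as plain dictionaries, and it assumes the two averaged vectors are concatenated into one feature vector, which the abstract does not specify. The names `average_vector` and `dcword_features` and the toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]
Embeddings = Dict[str, Vector]


def average_vector(tokens: Sequence[str], emb: Embeddings, dim: int) -> Vector:
    """Average the vectors of the tokens present in `emb`.

    Tokens missing from the table are skipped; if none are found,
    a zero vector of length `dim` is returned.
    """
    found = [emb[t] for t in tokens if t in emb]
    if not found:
        return [0.0] * dim
    return [sum(v[i] for v in found) / len(found) for i in range(dim)]


def dcword_features(
    tokens: Sequence[str],
    deceptive_emb: Embeddings,
    truthful_emb: Embeddings,
    dim: int,
) -> Vector:
    """Build a review representation from two context-specific tables.

    Assumption: the deceptive-context average and the truthful-context
    average are concatenated (list `+`) into a vector of length 2 * dim.
    """
    deceptive_avg = average_vector(tokens, deceptive_emb, dim)
    truthful_avg = average_vector(tokens, truthful_emb, dim)
    return deceptive_avg + truthful_avg


# Toy 2-dimensional tables standing in for separately trained word vectors.
deceptive_emb = {"hotel": [1.0, 0.0], "amazing": [0.0, 2.0]}
truthful_emb = {"hotel": [0.5, 0.5], "bathroom": [1.0, 3.0]}

review = ["hotel", "amazing", "bathroom"]
features = dcword_features(review, deceptive_emb, truthful_emb, dim=2)
print(features)  # [0.5, 1.0, 0.75, 1.75]
```

A feature vector of this kind could then be passed to any standard classifier, such as the SVM, LR or BPNN models named in the abstract.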

Notes

1. USPTO stop words, online: https://www.uspto.gov/patft//help/stopword.htm.
2. QTag for English part-of-speech, online: http://www.english.bham.ac.uk/staff/omason/software/qtag.html.
3. Porter stemming algorithm, online: http://tartarus.org/martin/PorterStemmer/.

Acknowledgment

This research was supported in part by the National Natural Science Foundation of China under Grant Nos. 71101138, 61379046, 91218301, 91318302 and 61432001; the Beijing Natural Science Fund under Grant No. 4122087; and the Fundamental Research Funds for the Central Universities (buctrc201504).

Author information

Authors and Affiliations

Research Center on Big Data Sciences, Beijing University of Chemical Technology, Beijing, 100029, People’s Republic of China
Wen Zhang & Yipan Jiang
School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa, 923-1292, Japan
Taketoshi Yoshida

Corresponding author

Correspondence to Wen Zhang.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Jian Chen
School of Knowledge Science, JAIST, Nomi, Ishikawa, Japan
Yoshiteru Nakamori
Dept. Info Sci &, Fac. Sci & Engg, Konan University, Higashinada-ku, Kobe, Japan
Wuyi Yue
Chinese Academy of Sciences, Beijing, China
Xijin Tang

About this paper

Cite this paper

Zhang, W., Jiang, Y., Yoshida, T. (2016). Deep Context Identification of Deceptive Reviews Using Word Vectors. In: Chen, J., Nakamori, Y., Yue, W., Tang, X. (eds) Knowledge and Systems Sciences. KSS 2016. Communications in Computer and Information Science, vol 660. Springer, Singapore. https://doi.org/10.1007/978-981-10-2857-1_19

DOI: https://doi.org/10.1007/978-981-10-2857-1_19
Published: 13 October 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2856-4
Online ISBN: 978-981-10-2857-1
eBook Packages: Computer Science, Computer Science (R0)
