Deep Context Identification of Deceptive Reviews Using Word Vectors

  • Conference paper
Knowledge and Systems Sciences (KSS 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 660)

Abstract

This paper proposes a deep context representation based on word vectors for deceptive review identification. The basic idea is that deceptive reviews are written by people without real experience and truthful reviews by people with it, so the two groups should use words in different contexts. Unlike previous work, which learns word vectors from the whole text collection, we produce two numerical vectors for each word by embedding its contexts in deceptive and truthful reviews separately. Specifically, we propose a representation method called DCWord (Deep Context representation by Word vectors), which represents each review by the average word vectors derived from the deceptive and truthful contexts, respectively, for subsequent classification. We then investigate three classifiers, namely support vector machine (SVM), simple logistic regression (LR), and back propagation neural network (BPNN), for identifying deceptive reviews. Experimental results on the Spam dataset show that, with the DCWord representation, SVM and LR achieve comparable performance and both outperform BPNN in deceptive review identification. The outcome of this study has potential implications for online business intelligence in identifying deceptive reviews.
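The averaging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`average_vector`, `dcword_features`), the toy two-dimensional embeddings, the handling of out-of-vocabulary words, and the choice to concatenate the two averages into one feature vector are all assumptions made for illustration. In practice the two embedding tables would be learned separately from the deceptive and the truthful reviews.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def average_vector(tokens: Sequence[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of the tokens present in `embeddings`.

    Out-of-vocabulary tokens are skipped (an assumption); if no token is
    found, a zero vector of length `dim` is returned.
    """
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vec = embeddings.get(token)
        if vec is None:
            continue
        for i, value in enumerate(vec):
            total[i] += value
        count += 1
    if count == 0:
        return total
    return [value / count for value in total]


def dcword_features(
    tokens: Sequence[str],
    deceptive_emb: Dict[str, Vector],
    truthful_emb: Dict[str, Vector],
    dim: int,
) -> Vector:
    """Hypothetical DCWord-style features: the average vector from the
    deceptive-context embeddings followed by the average vector from the
    truthful-context embeddings (concatenation is an assumption)."""
    return average_vector(tokens, deceptive_emb, dim) + average_vector(tokens, truthful_emb, dim)


# Toy embeddings with made-up values; real ones would be trained separately
# on the deceptive and the truthful review collections.
deceptive_emb = {"hotel": [1.0, 0.0], "great": [0.0, 1.0]}
truthful_emb = {"hotel": [0.5, 0.5], "great": [1.0, 1.0]}

review = ["great", "hotel", "lobby"]  # "lobby" is out of vocabulary here
features = dcword_features(review, deceptive_emb, truthful_emb, dim=2)
print(features)
```

The resulting fixed-length vector could then be passed to any standard classifier, such as the SVM, LR, or BPNN models named in the abstract.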

Notes

  1. USPTO stop words, online: https://www.uspto.gov/patft//help/stopword.htm.

  2. QTag for English part-of-speech, online: http://www.english.bham.ac.uk/staff/omason/software/qtag.html.

  3. Porter stemming algorithm, online: http://tartarus.org/martin/PorterStemmer/.

Acknowledgment

This research was supported in part by the National Natural Science Foundation of China under Grant Nos. 71101138, 61379046, 91218301, 91318302 and 61432001; the Beijing Natural Science Fund under Grant No. 4122087; and the Fundamental Research Funds for the Central Universities (buctrc201504).

Corresponding author

Correspondence to Wen Zhang.

Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhang, W., Jiang, Y., Yoshida, T. (2016). Deep Context Identification of Deceptive Reviews Using Word Vectors. In: Chen, J., Nakamori, Y., Yue, W., Tang, X. (eds) Knowledge and Systems Sciences. KSS 2016. Communications in Computer and Information Science, vol 660. Springer, Singapore. https://doi.org/10.1007/978-981-10-2857-1_19

  • DOI: https://doi.org/10.1007/978-981-10-2857-1_19

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2856-4

  • Online ISBN: 978-981-10-2857-1

  • eBook Packages: Computer Science, Computer Science (R0)
