Computing

pp 1–15

Incorporating feature representation into BiLSTM for deceptive review detection

  • Wentao Liu
  • Weipeng Jing
  • Yang Li

Abstract

Consumers are increasingly influenced by product reviews when purchasing goods or services, yet deceptive reviews often mislead them. Manually identifying deceptive reviews among massive numbers of reviews is inefficient and inaccurate, so automatic detection has become a research trend. Most existing methods are less effective because they lack a deep understanding of reviews. We propose a neural network method that uses bidirectional long short-term memory (BiLSTM) and feature combination to learn the representation of deceptive reviews. We conduct extensive experiments that demonstrate the effectiveness of the proposed method. Specifically, in the mixed-domain detection experiment, comparisons with other neural network-based methods show that our model is effective: BiLSTM gives more than a 3% improvement in F1 score over the most advanced neural network method. Since feature selection plays an important role in this task, we combine features to further improve performance and obtain an F1 value of 87.6%, which outperforms the state-of-the-art method. Moreover, in the cross-domain detection experiment, our method achieves an F1 value of 82.4%, about 6% higher than the state-of-the-art method on the restaurant domain, and it is also robust on the doctor domain.
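The results above are reported as F1 values. As a reminder of how that metric combines precision and recall, here is a minimal sketch in Python. The function name `f1_score`, the label convention (1 = deceptive, 0 = truthful), and the toy labels are assumptions made for illustration; they are not the authors' evaluation code or data.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels (1 = deceptive, 0 = truthful); not data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = f1_score(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high.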

Keywords

Deceptive review detection · Bidirectional long short-term memory neural network · Feature combination · Representation learning

Mathematics Subject Classification

68T50 

Notes

Acknowledgements

The work described in this paper is supported by the National Natural Science Foundation of China (61806049, 31770768), the Natural Science Foundation of Heilongjiang Province of China (F2017001), the Heilongjiang Province Applied Technology Research and Development Program Major Project (GA18B301), the China State Forestry Administration Forestry Industry Public Welfare Project (201504307) and the China Postdoctoral Science Foundation (2017M611407).

Copyright information

© Springer-Verlag GmbH Austria, part of Springer Nature 2019

Authors and Affiliations

  1. School of Information and Computer Engineering, Northeast Forestry University, Harbin, China
