Cross-domain deception detection using support vector networks

Hernández-Castañeda, Ángel; Calvo, Hiram; Gelbukh, Alexander; Flores, Jorge J. García

doi:10.1007/s00500-016-2409-2

Focus
Published: 02 November 2016

Soft Computing, volume 21, pages 585–595 (2017)

Ángel Hernández-Castañeda¹,
Hiram Calvo ORCID: orcid.org/0000-0003-2836-2102¹,
Alexander Gelbukh¹ &
Jorge J. García Flores²

Abstract

We assess the effectiveness of support vector networks (SVN) for detecting deception in text and investigate to what degree an SVN can serve as a domain-independent deception detector. We experimented with different feature sets for training the SVN: a continuous semantic space model represented by latent Dirichlet allocation (LDA) topics, a word-space model, and dictionary-based features. This allows us to compare the performance of semantic information with that of behavioral information. We tested several combinations of these features on different datasets designed to identify deception: the DeRev dataset (a corpus of deceptive and truthful opinions about books obtained from Amazon), OpSpam (a corpus of fake and truthful opinions about hotels), and three corpora on controversial topics (abortion, death penalty, and a best friend) in which the subjects were asked to write an opinion contrary to what they really believed. We evaluated three settings: a one-domain setting, training and testing our models separately on each dataset (with fivefold cross-validation); a mixed-domain setting, merging all datasets into one large corpus (again with fivefold cross-validation); and a cross-domain setting, using one dataset for testing and the concatenation of all other datasets for training. We obtained an average accuracy of 86% in the one-domain setting, 75% in the mixed-domain setting, and 52 to 64% in the cross-domain setting.
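The three evaluation settings in the abstract can be sketched as data splits. This is a minimal illustration and not the authors' code: the function names, the toy corpora, and the unstratified seeded shuffle are all assumptions, since the abstract does not say how the folds were drawn or which SVN implementation was used.

```python
import random
from typing import Dict, Iterator, List, Tuple

Example = Tuple[str, int]                     # (text, label); 1 = deceptive, 0 = truthful
Split = Tuple[List[Example], List[Example]]   # (train, test)


def k_fold(examples: List[Example], k: int = 5, seed: int = 0) -> Iterator[Split]:
    """Yield k (train, test) splits after one seeded shuffle (no stratification)."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k disjoint folds covering all examples
    for i in range(k):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, folds[i]


def one_domain(corpora: Dict[str, List[Example]], k: int = 5) -> Dict[str, List[Split]]:
    """One-domain setting: k-fold cross-validation inside each dataset separately."""
    return {name: list(k_fold(data, k)) for name, data in corpora.items()}


def mixed_domain(corpora: Dict[str, List[Example]], k: int = 5) -> List[Split]:
    """Mixed-domain setting: merge every dataset, then k-fold over the merged corpus."""
    merged = [ex for data in corpora.values() for ex in data]
    return list(k_fold(merged, k))


def cross_domain(corpora: Dict[str, List[Example]]) -> Dict[str, Split]:
    """Cross-domain setting: test on one dataset, train on all the others combined."""
    splits = {}
    for held_out, test in corpora.items():
        train = [ex for name, data in corpora.items() if name != held_out for ex in data]
        splits[held_out] = (train, list(test))
    return splits


# Hypothetical placeholder data: 10 alternating-label examples per dataset name.
toy_corpora = {
    name: [(f"{name} text {i}", i % 2) for i in range(10)]
    for name in ["DeRev", "OpSpam", "Abortion", "DeathPenalty", "BestFriend"]
}

for held_out, (train, test) in cross_domain(toy_corpora).items():
    print(f"test on {held_out}: {len(train)} training / {len(test)} test examples")
```

Each split would then feed a classifier trained on features extracted from the training portion only. In the cross-domain setting no example from the held-out dataset reaches training, which is consistent with the lower accuracy range reported for that setting.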

Acknowledgments

We thank Instituto Politécnico Nacional (SIP, COFAA and BEIFI) and SNI. This work was partially funded by CONACyT (Language Technologies Thematic Network Projects 260178, 271622) and SIP Project Number 20162058.

Author information

Authors and Affiliations

Instituto Politécnico Nacional, Center for Computing Research CIC-IPN, Av. J.D. Bátiz e/ M.O. de Mendizábal, 07738, Mexico City, Mexico
Ángel Hernández-Castañeda, Hiram Calvo & Alexander Gelbukh
Laboratoire d’Informatique de Paris Nord, CNRS (UMR 7030), Université Paris 13, Sorbonne Paris Cité, 93430, Villetaneuse, France
Jorge J. García Flores

Corresponding author

Correspondence to Hiram Calvo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Communicated by H. Ponce.

About this article

Cite this article

Hernández-Castañeda, Á., Calvo, H., Gelbukh, A. et al. Cross-domain deception detection using support vector networks. Soft Comput 21, 585–595 (2017). https://doi.org/10.1007/s00500-016-2409-2

Issue Date: February 2017
DOI: https://doi.org/10.1007/s00500-016-2409-2
