Deception detection in Twitter

  • Jalal S. Alowibdi
  • Ugo A. Buy
  • Philip S. Yu
  • Sohaib Ghani
  • Mohamed Mokbel
Original Article


Online Social Networks (OSNs) play a significant role in the daily lives of hundreds of millions of people. However, many user profiles in OSNs contain deceptive information. Existing studies have shown that lying in OSNs is widespread, often to protect a user’s privacy. In this paper, we propose a novel approach for detecting deceptive profiles in OSNs. Specifically, we define a set of analysis methods for detecting deceptive information about user gender and location in Twitter. First, we collect a large dataset of Twitter profiles and tweets. Next, we define methods for guessing gender from Twitter profile colors and names. We then apply Bayesian classification and K-means clustering algorithms to Twitter profile characteristics (e.g., profile layout colors, first names, user names, and spatiotemporal information) and geolocations to analyze user behavior. We establish the overall accuracy of each indicator through extensive experimentation with our crawled dataset. Based on the outcomes of our approach, we are able to detect profiles that are deceptive about gender and location with reasonable accuracy.
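To make the name-based gender indicator concrete, the following is a minimal sketch of Bayesian classification over first names, with a profile flagged when the stated gender contradicts a confident prediction. It is not the authors' implementation: the abstract does not specify the features or the decision rule, so the training names, the `features` function (last one and two letters), the `looks_deceptive` helper, and the 0.8 threshold are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy training data, invented for illustration (not from the paper's dataset).
TRAIN = [
    ("maria", "F"), ("anna", "F"), ("sophia", "F"), ("emma", "F"),
    ("olivia", "F"), ("julia", "F"),
    ("john", "M"), ("david", "M"), ("mark", "M"), ("peter", "M"),
    ("robert", "M"), ("kevin", "M"),
]


def features(name):
    """Hypothetical feature set: the last letter and last two letters."""
    name = name.lower()
    return ["last1=" + name[-1:], "last2=" + name[-2:]]


def train(samples):
    """Count class frequencies and per-class feature frequencies."""
    class_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for name, label in samples:
        class_counts[label] += 1
        for f in features(name):
            feat_counts[label][f] += 1
            vocab.add(f)
    return class_counts, feat_counts, vocab


def posterior(name, model):
    """Return P(label | name) under naive Bayes with add-one smoothing."""
    class_counts, feat_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)
        denom = sum(feat_counts[label].values()) + len(vocab)
        for f in features(name):
            score += math.log((feat_counts[label][f] + 1) / denom)
        log_scores[label] = score
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}


def looks_deceptive(first_name, stated_gender, model, threshold=0.8):
    """Flag a profile whose stated gender contradicts a confident prediction.

    The threshold is an assumed value, not one reported by the paper.
    """
    probs = posterior(first_name, model)
    predicted = max(probs, key=probs.get)
    return predicted != stated_gender and probs[predicted] >= threshold


MODEL = train(TRAIN)
print(looks_deceptive("laura", "M", MODEL))
print(looks_deceptive("laura", "F", MODEL))
```

A mismatch from a single indicator such as this is only weak evidence. The abstract describes combining several profile characteristics (colors, names, spatiotemporal information), so a sketch like this would be one input among several rather than a verdict on its own.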


Deception detection, Gender classification, Profile indicators, Profile characteristics, Profile classification, Location classification, Twitter



Copyright information

© Springer-Verlag Wien 2015

Authors and Affiliations

  • Jalal S. Alowibdi (1, corresponding author)
  • Ugo A. Buy (2)
  • Philip S. Yu (2)
  • Sohaib Ghani (3)
  • Mohamed Mokbel (3)
  1. Faculty of Computing and Information Technology, University of Jeddah, Jeddah, Saudi Arabia
  2. Department of Computer Science, University of Illinois at Chicago, Chicago, USA
  3. KACST GIS Technology Innovation Center, Umm Al-Qura University, Makkah, Saudi Arabia
