Multimedia Tools and Applications, Volume 76, Issue 3, pp 3213–3233

Authorship verification applied to detection of compromised accounts on online social networks

A continuous approach
  • Sylvio Barbon Jr
  • Rodrigo Augusto Igawa
  • Bruno Bogaz Zarpelão


Compromising legitimate accounts has been the most widely used strategy for spreading malicious content on Online Social Networks (OSNs). To address this problem, we propose a pure text mining approach that checks whether an account has been compromised based on the content of its posts. In the first step, the proposed approach extracts the writing style from the user account. The second step applies the k-Nearest Neighbors (k-NN) algorithm to evaluate the post content and identify the user. Finally, the third step, Baseline Updating, continuously updates the user baseline to accommodate current trends and seasonality in the user's posts. Experiments were carried out using a Twitter dataset composed of tweets from 1000 users. All three steps were individually evaluated, and the results show that the developed method is stable and can detect compromised accounts. An important observation is the contribution of Baseline Updating, which leads to an accuracy improvement of more than 60 %. The developed method achieved an average accuracy above 93 %.
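The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which style features, distance measure, value of k, or update policy were used. The four features in `style_features`, the Euclidean distance, `k=3`, and the sliding window realised with `deque(maxlen=...)` are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from collections import deque


def style_features(text):
    """Step 1 (assumed features): map a post to a small writing-style vector."""
    n = max(len(text), 1)
    words = text.split()
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    return (
        len(text) / 140.0,                   # post length relative to 140 chars
        avg_word_len / 10.0,                 # scaled average word length
        sum(c.isupper() for c in text) / n,  # share of uppercase characters
        sum(not c.isalnum() and not c.isspace() for c in text) / n,  # symbols
    )


def is_owner(baseline, others, post, k=3):
    """Step 2: k-NN vote over the owner's baseline and other users' posts."""
    x = style_features(post)
    labelled = [(math.dist(x, v), True) for v in baseline]
    labelled += [(math.dist(x, v), False) for v in others]
    nearest = sorted(labelled, key=lambda pair: pair[0])[:k]
    owner_votes = sum(1 for _, from_owner in nearest if from_owner)
    return owner_votes * 2 > len(nearest)    # strict majority for the owner


def verify_and_update(baseline, others, post, k=3):
    """Step 3 (assumed policy): accepted posts are added to the baseline.

    `baseline` is a deque with `maxlen`, so the oldest post is dropped
    automatically and the baseline follows the user's recent style.
    """
    accepted = is_owner(baseline, others, post, k)
    if accepted:
        baseline.append(style_features(post))
    return accepted


# Toy data, invented for this example only.
owner_posts = [
    "just had coffee with friends",
    "reading a good book tonight",
    "long walk in the park today",
]
other_posts = [
    "WIN A FREE PHONE NOW!!! CLICK HERE!!!",
    "BUY FOLLOWERS CHEAP!!! VISIT US!!!",
    "FREE GIFT CARDS!!! HURRY UP!!!",
]
baseline = deque((style_features(p) for p in owner_posts), maxlen=3)
others = [style_features(p) for p in other_posts]

print(verify_and_update(baseline, others, "watching a movie at home"))
print(verify_and_update(baseline, others, "CLICK THIS LINK NOW!!! FREE MONEY!!!"))
```

In this sketch a rejected post leaves the baseline untouched, so a suspicious post cannot shift the stored writing style; whether the original method behaves this way is not stated in the abstract.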


Compromised accounts; Authorship verification; Online social networks


Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Sylvio Barbon Jr
    • 1
  • Rodrigo Augusto Igawa
    • 1
  • Bruno Bogaz Zarpelão
    • 1
  1. Londrina State University, Londrina, Brazil
