Authorship Analysis of Social Media Contents Using Tone and Personality Features

  • Athira Usha
  • Sabu M. Thampi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10656)

Abstract

Online social networks provide countless services that ease human interaction. However, the veil of anonymity has become a refuge for the majority of cyber criminals who engage in unethical online activities. The wide availability of Wi-Fi hotspots and smartphones has made tracking the individuals behind these activities a daunting task. One way to limit their impact is to identify the authors of text content posted on online social media, since text is the only readily available imprint of an individual. Here we propose a novel authorship analysis technique for Twitter data that uses tone-based, personality-based, and stylistic features. The attribution scheme trains on author data using a Convolutional Neural Network (CNN) pretrained on personality data, and combines the features obtained from this model with features obtained from a second CNN architecture, proposed by us, for tone analysis. These features are then combined with hand-crafted features capturing the stylistic aspects of the author, and a Support Vector Machine (SVM) is trained on the combined feature set. To the best of our knowledge, this is the first work to employ tone-based and personality-based features for authorship attribution. The approach paves the way for a foolproof authorship analysis mechanism that can be used to curb security issues such as hacked accounts, because the features chosen for our attribution method are difficult both to imitate and to control consciously.
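
The feature combination described above can be illustrated with a minimal sketch. It only shows the fusion step: two placeholder vectors stand in for the outputs of the personality and tone CNNs, and a handful of hand-crafted stylistic measurements are appended to them. The function names, the particular stylistic features, and the vector sizes are all assumptions made for illustration; the abstract does not specify them, and the SVM that would be trained on the fused vectors is not shown.

```python
import string


def stylistic_features(text):
    """Return a few hand-crafted style measurements for one tweet.

    The specific features are illustrative assumptions; the abstract
    does not list the ones actually used.
    """
    words = text.split()
    n_chars = len(text)
    n_words = len(words)
    avg_word_len = sum(len(w) for w in words) / n_words if n_words else 0.0
    n_punct = sum(1 for ch in text if ch in string.punctuation)
    n_hashtags = sum(1 for w in words if w.startswith("#"))
    n_mentions = sum(1 for w in words if w.startswith("@"))
    return [float(n_chars), float(n_words), avg_word_len,
            float(n_punct), float(n_hashtags), float(n_mentions)]


def fuse_features(personality_vec, tone_vec, style_vec):
    """Concatenate the three feature groups into one vector for a classifier."""
    return list(personality_vec) + list(tone_vec) + list(style_vec)


# Placeholder vectors standing in for the outputs of the two CNNs
# (hypothetical values and dimensions).
personality_vec = [0.1, 0.7, 0.3, 0.5, 0.2]
tone_vec = [0.9, 0.05, 0.05]
tweet = "Loving the #NLP talk by @someone today!"

fused = fuse_features(personality_vec, tone_vec, stylistic_features(tweet))
```

In a full pipeline, one fused vector per tweet, labelled with its author, would form the training set for the SVM.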

Keywords

Authorship analysis · Personality · Stylistics · Convolutional neural network · Tone analysis · Personality identification

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Indian Institute of Information Technology and Management Kerala, Trivandrum, India
  2. CSE, Faculty of Engineering and Technology, University of Kerala, Trivandrum, India
