Authorship Analysis of Social Media Contents Using Tone and Personality Features

  • Conference paper
Security, Privacy, and Anonymity in Computation, Communication, and Storage (SpaCCS 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10656)

Abstract

Online social networks have contributed to countless services that ease human interaction. However, the veil of anonymity has also become a refuge for the majority of cybercriminals who engage in unethical online activities. The availability of Wi-Fi hotspots and smartphones has made tracking down the individuals behind these activities a daunting task. To curtail the worst impact of these activities, one can attempt to identify the authors of text content on online social media, which is the only readily available imprint of an individual. Here we propose a novel authorship analysis technique, applied to Twitter data, that uses tone-based, personality-based and stylistic features. Our authorship attribution scheme trains on author data using a Convolutional Neural Network (CNN) pretrained on personality data, and combines the features obtained from this model with those obtained from a second CNN architecture, which we propose for tone analysis. These features are then combined with hand-crafted features describing the author's stylistic aspects, and a Support Vector Machine (SVM) is trained on the combined feature set. To the best of our knowledge, this is the first work to employ tone-based and personality-based features for authorship attribution. The new approach paves the way for a foolproof authorship analysis mechanism that can be employed to counter security issues such as hacked accounts, because the features chosen for our attribution method are difficult both to imitate and to control consciously.
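The abstract describes a late-fusion pipeline: tone-CNN features, personality-CNN features and hand-crafted stylistic features are concatenated and an SVM is trained on the result. The following is a minimal pure-Python sketch of that fusion-and-classification step under loudly stated assumptions: the CNN outputs are replaced by made-up two-dimensional vectors, the five features in the hypothetical `stylistic_features` helper are illustrative choices rather than the paper's feature set, and the SVM is a linear one-vs-rest model trained with Pegasos sub-gradient descent, not necessarily the solver, kernel or multi-class scheme the authors used.

```python
import random
import re


def stylistic_features(tweet):
    """Hypothetical hand-crafted stylistic features for one tweet (assumed, not the paper's)."""
    words = tweet.split()
    n_chars = max(len(tweet), 1)
    n_words = max(len(words), 1)
    return [
        len(tweet) / 140.0,                            # length relative to a 140-char tweet
        sum(len(w) for w in words) / n_words / 10.0,   # scaled mean word length
        sum(c.isupper() for c in tweet) / n_chars,     # uppercase ratio
        sum(c in "!?.," for c in tweet) / n_chars,     # punctuation ratio
        len(re.findall(r"[@#]\w+", tweet)) / n_words,  # mentions/hashtags per word
    ]


def fuse(tone_vec, personality_vec, tweet):
    """Concatenate tone-CNN, personality-CNN and stylistic features into one vector."""
    return list(tone_vec) + list(personality_vec) + stylistic_features(tweet)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_binary_svm(X, y, lam=0.1, epochs=200, seed=0):
    """Linear SVM (labels +1/-1) trained with Pegasos sub-gradient steps.

    A constant 1.0 is appended to every sample so the bias is learned
    (and regularised) as an ordinary weight.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                       # Pegasos step size
            violated = y[i] * dot(w, data[i]) < 1.0     # hinge loss is active
            w = [(1.0 - eta * lam) * wj for wj in w]    # shrink: L2 regulariser
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def train_one_vs_rest(X, authors, **kwargs):
    """One binary SVM per author: that author's tweets vs. everyone else's."""
    return {
        a: train_binary_svm(X, [1 if b == a else -1 for b in authors], **kwargs)
        for a in sorted(set(authors))
    }


def predict(models, x):
    """Return the author whose SVM gives the highest decision value."""
    xb = list(x) + [1.0]
    return max(models, key=lambda a: dot(models[a], xb))


# Stand-in CNN outputs: in the paper these would come from the tone and
# personality CNNs; here they are made-up vectors for illustration only.
samples = [
    ("alice", [0.9, 0.1], [0.8, 0.2], "Loving this sunny day! #happy"),
    ("alice", [0.8, 0.2], [0.9, 0.1], "Great game tonight, so proud!"),
    ("bob", [0.1, 0.9], [0.2, 0.8], "worst service ever. never again."),
    ("bob", [0.2, 0.8], [0.1, 0.9], "traffic is terrible again today."),
]
X = [fuse(tone, pers, text) for _, tone, pers, text in samples]
authors = [a for a, _, _, _ in samples]
models = train_one_vs_rest(X, authors)
print([predict(models, x) for x in X])
```

Concatenating the three feature groups before a single linear classifier keeps each group's contribution visible in the learned weights; in practice one would also standardise each feature group and evaluate on held-out tweets rather than the training set.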

Author information

Correspondence to Athira Usha.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Usha, A., Thampi, S.M. (2017). Authorship Analysis of Social Media Contents Using Tone and Personality Features. In: Wang, G., Atiquzzaman, M., Yan, Z., Choo, KK. (eds) Security, Privacy, and Anonymity in Computation, Communication, and Storage. SpaCCS 2017. Lecture Notes in Computer Science, vol 10656. Springer, Cham. https://doi.org/10.1007/978-3-319-72389-1_18

  • DOI: https://doi.org/10.1007/978-3-319-72389-1_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-72388-4

  • Online ISBN: 978-3-319-72389-1

  • eBook Packages: Computer Science, Computer Science (R0)
