Authorship Analysis of Social Media Contents Using Tone and Personality Features

Usha, Athira; Thampi, Sabu M.

doi:10.1007/978-3-319-72389-1_18


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10656)

Included in the following conference series:

International Conference on Security, Privacy and Anonymity in Computation, Communication and Storage

Abstract

Online social networks have contributed countless services that ease human interaction. But the veil of anonymity has become a refuge for the majority of cybercriminals, who indulge in unethical online activities. The availability of Wi-Fi hotspots and smartphones has made tracking the individuals behind these activities a daunting task. To curtail their worst impact, one can identify the authors of text content in online social media, since text is the only readily available imprint of an individual. Here we propose a novel authorship analysis technique applied to Twitter data using tone-based, personality-based and stylistic features. Our authorship attribution scheme trains on author data using a Convolutional Neural Network (CNN) pretrained on personality data and combines the features obtained from this model with the features obtained from a second CNN architecture for tone analysis, also proposed by us. These features are then combined with hand-crafted features pertaining to the stylistic aspects of the author, and an SVM is trained on this feature combination. To the best of our knowledge, this is the first work employing tone-based and personality-based features for attributing authorship. The new approach paves the way for a foolproof authorship analysis mechanism that can be employed to curb security issues such as hacked accounts, because the features chosen for our attribution method are difficult to imitate and difficult to control consciously.
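The feature-fusion step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not list the hand-crafted features, so the seven stylistic features below are illustrative choices, and the names `stylistic_features`, `fuse_features` and `build_feature_matrix` are hypothetical. The personality and tone CNNs are represented only as caller-supplied functions that map a tweet to a feature vector.

```python
import re
from typing import Callable, List, Sequence

# Hypothetical type: any feature extractor maps one tweet to a numeric vector.
# In the paper's setting two of these would wrap the personality and tone CNNs.
Extractor = Callable[[str], List[float]]

PUNCTUATION = "!?.,;:"


def stylistic_features(tweet: str) -> List[float]:
    """Illustrative hand-crafted style features (an assumption, not the paper's list)."""
    words = tweet.split()
    n_chars = len(tweet)
    n_words = len(words)
    return [
        float(n_chars),                                                      # tweet length in characters
        float(n_words),                                                      # tweet length in words
        sum(len(w) for w in words) / n_words if n_words else 0.0,            # mean word length
        sum(c.isupper() for c in tweet) / n_chars if n_chars else 0.0,       # uppercase ratio
        sum(c in PUNCTUATION for c in tweet) / n_chars if n_chars else 0.0,  # punctuation ratio
        float(len(re.findall(r"#\w+", tweet))),                              # hashtag count
        float(len(re.findall(r"@\w+", tweet))),                              # mention count
    ]


def fuse_features(tweet: str, extractors: Sequence[Extractor]) -> List[float]:
    """Concatenate the vectors produced by each extractor, in order."""
    fused: List[float] = []
    for extract in extractors:
        fused.extend(extract(tweet))
    return fused


def build_feature_matrix(
    tweets: Sequence[str], extractors: Sequence[Extractor]
) -> List[List[float]]:
    """One fused row per tweet; the rows are the input to the final classifier."""
    return [fuse_features(t, extractors) for t in tweets]
```

Given such a matrix and one author label per row, the final stage would train an SVM on the rows, for example with an off-the-shelf implementation such as scikit-learn's `SVC`. That choice of library is an assumption; the abstract does not name one.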

Author information

Authors and Affiliations

Indian Institute of Information Technology and Management Kerala, Trivandrum, India
Athira Usha & Sabu M. Thampi
CSE, Faculty of Engineering and Technology, University of Kerala, Trivandrum, India
Athira Usha

Corresponding author

Correspondence to Athira Usha.

Editor information

Editors and Affiliations

Guangzhou University, Guangzhou, China
Guojun Wang
Edith Kinney Gaylord Presidential Professor, University of Oklahoma, Norman, Oklahoma, USA
Mohammed Atiquzzaman
Aalto University, Espoo, Finland
Zheng Yan
University of Texas at San Antonio, San Antonio, Texas, USA
Kim-Kwang Raymond Choo

About this paper

Cite this paper

Usha, A., Thampi, S.M. (2017). Authorship Analysis of Social Media Contents Using Tone and Personality Features. In: Wang, G., Atiquzzaman, M., Yan, Z., Choo, KK. (eds) Security, Privacy, and Anonymity in Computation, Communication, and Storage. SpaCCS 2017. Lecture Notes in Computer Science, vol 10656. Springer, Cham. https://doi.org/10.1007/978-3-319-72389-1_18

DOI: https://doi.org/10.1007/978-3-319-72389-1_18
Published: 07 December 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72388-4
Online ISBN: 978-3-319-72389-1
eBook Packages: Computer Science, Computer Science (R0)
