Abstract
Online social networks have contributed countless services that ease human interaction. But the veil of anonymity has become a refuge for many cyber criminals who indulge in unethical cyber activities. The availability of Wi-Fi hotspots and smartphones has made tracking the individuals behind these activities a daunting task. To curtail the worst impact of these activities, one can identify the authors of text content in online social media, the only readily available imprint of an individual. Here we propose a novel authorship analysis technique applied to Twitter data using tone-based, personality-based and stylistic features. We propose an authorship attribution scheme that trains on author data using a Convolutional Neural Network (CNN) pretrained on personality data and combines the features obtained from this model with the features obtained from another CNN architecture, proposed by us, for tone analysis. These features are then combined with hand-crafted features pertaining to the stylistic aspects of the author, and an SVM is trained on this feature combination. To the best of our knowledge, this is the first work employing tone-based and personality-based features for attributing authorship. The new approach paves the way for a foolproof authorship analysis mechanism that can be employed to curb security issues such as hacked accounts. This is because the features chosen for our attribution method are difficult both to imitate and to control consciously.
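The feature combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the stylistic features or the dimensions of the CNN outputs, so the seven features in `stylistic_features`, the helper `combine_features`, and the placeholder vectors below are all assumptions made for illustration. The two CNNs and the SVM training are not shown; the combined vector is only what such a classifier might receive as input.

```python
import string
from typing import List, Sequence


def stylistic_features(tweet: str) -> List[float]:
    """Hypothetical hand-crafted stylistic features for one tweet.

    The abstract does not specify the features used; these seven
    are assumptions chosen only to illustrate the idea.
    """
    words = tweet.split()
    n_chars = len(tweet)
    n_words = len(words)
    return [
        float(n_chars),                                            # tweet length in characters
        float(n_words),                                            # tweet length in words
        sum(len(w) for w in words) / n_words if n_words else 0.0,  # mean token length
        sum(c in string.punctuation for c in tweet) / n_chars if n_chars else 0.0,  # punctuation ratio
        sum(c.isupper() for c in tweet) / n_chars if n_chars else 0.0,              # uppercase ratio
        float(sum(w.startswith("#") for w in words)),              # hashtag count
        float(sum(w.startswith("@") for w in words)),              # mention count
    ]


def combine_features(
    personality_vec: Sequence[float],
    tone_vec: Sequence[float],
    tweet: str,
) -> List[float]:
    """Concatenate personality, tone and stylistic features into one vector."""
    return list(personality_vec) + list(tone_vec) + stylistic_features(tweet)


# Placeholder vectors standing in for the outputs of the two CNNs
# (values and dimensions are invented for this example).
personality_vec = [0.12, 0.80, 0.33, 0.05, 0.61]
tone_vec = [0.70, 0.10, 0.20]

combined = combine_features(
    personality_vec, tone_vec, "Loving the #weather today, @sam!"
)
print(len(combined))
```

In a full pipeline, one such combined vector per tweet, labelled with its author, would form the training set for the SVM mentioned in the abstract.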
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Usha, A., Thampi, S.M. (2017). Authorship Analysis of Social Media Contents Using Tone and Personality Features. In: Wang, G., Atiquzzaman, M., Yan, Z., Choo, KK. (eds) Security, Privacy, and Anonymity in Computation, Communication, and Storage. SpaCCS 2017. Lecture Notes in Computer Science(), vol 10656. Springer, Cham. https://doi.org/10.1007/978-3-319-72389-1_18
DOI: https://doi.org/10.1007/978-3-319-72389-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72388-4
Online ISBN: 978-3-319-72389-1
eBook Packages: Computer Science, Computer Science (R0)