Cluster Computing, Volume 22, Supplement 5, pp 12101–12110

An improved framework for authorship identification in online messages

  • L. Srinivasan
  • C. Nalini


Authorship identification estimates the likelihood that a given piece of writing was produced by a particular author by examining that author's other writings. With the rapid proliferation of internet technologies and applications, the misuse of online messages for inappropriate or illegal purposes has become a major concern in society. Because online messages are widely distributed and anonymous, tracing the identity of their authors is a critical issue. This work develops a framework for identifying the authorship of online messages in order to address this tracing problem. In this framework, four types of writing-style features (lexical, syntactic, structural, and n-gram features) are extracted, and inductive learning algorithms are used to build feature-based classification models that identify the authors of online messages. The C4.5, fuzzy, and AdaBoost classifiers are used for the authorship-identification task. An experimental study evaluates the framework and the effect of these classification techniques on online messages.
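The feature-extraction step described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `extract_style_features`, the short `FUNCTION_WORDS` list, the greeting pattern, and the individual features chosen for each of the four groups are all assumptions, since the abstract does not list the actual feature set.

```python
import re
from collections import Counter

# Hypothetical function-word list; the paper's real feature set is not given here.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "a", "is", "that")


def extract_style_features(message: str, n: int = 2) -> dict:
    """Map one message to a flat dict of illustrative writing-style features."""
    words = re.findall(r"[A-Za-z']+", message.lower())
    lines = message.splitlines()
    features = {}

    # Lexical: character- and word-level statistics.
    features["lex_char_count"] = len(message)
    features["lex_word_count"] = len(words)
    features["lex_avg_word_len"] = (
        sum(len(w) for w in words) / len(words) if words else 0.0
    )
    features["lex_vocab_richness"] = len(set(words)) / len(words) if words else 0.0

    # Syntactic: function-word and punctuation counts.
    word_counts = Counter(words)
    for fw in FUNCTION_WORDS:
        features[f"syn_fw_{fw}"] = word_counts[fw]
    for mark in ",.!?;:":
        features[f"syn_punct_{mark}"] = message.count(mark)

    # Structural: layout of the message (assumed examples).
    features["str_line_count"] = len(lines)
    features["str_blank_lines"] = sum(1 for ln in lines if not ln.strip())
    features["str_has_greeting"] = int(
        bool(re.match(r"\s*(hi|hello|dear)\b", message, re.I))
    )

    # Character n-grams: raw count of every n-character window.
    text = message.lower()
    ngrams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    for gram, count in ngrams.items():
        features[f"ngram_{gram}"] = count

    return features
```

Feature dictionaries of this kind would then be converted to fixed-length vectors and passed to a classifier. The abstract names C4.5, fuzzy, and AdaBoost classifiers for that stage; they are not shown here.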


Keywords: Authorship identification; Online messages; Lexical features; Syntactic features; Structural features; N-gram features; C4.5; AdaBoost and fuzzy classifier



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. Dhirajlal Gandhi College of Technology, Salem, India
  2. Kongu Engineering College, Erode, India
