Deep Dive into Authorship Verification of Email Messages with Convolutional Neural Network

  • Marina Litvak
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 898)


Authorship verification is the task of determining whether a specific individual wrote a given text, which reduces naturally to a binary classification problem. This paper addresses authorship verification of short email messages. Hereafter, “message” refers to the content transmitted by email. The proposed method implements the binary classification with a sequence-to-sequence (seq2seq) model and trains a convolutional neural network (CNN) on positive examples (written by the “target” user) and negative examples (written by “someone else”). Unlike previously published works, which represent text by numerous stylometric features, the proposed method requires neither advanced text preprocessing nor explicit feature extraction. All messages are submitted to the CNN “as is,” after padding to the maximal length and replacing all words with their ID numbers. The CNN learns the most appropriate features through backpropagation and then performs classification. Experiments on the Enron dataset, implemented with the TensorFlow framework, show that the CNN classifier verifies message authorship very accurately.
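The input preparation described above (replacing words with ID numbers and padding to the maximal length) can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the whitespace tokenization, the `<PAD>` symbol with ID 0, the function names, and the toy messages and labels are all assumptions introduced here.

```python
from typing import Dict, List

PAD_TOKEN = "<PAD>"  # hypothetical padding symbol; assumed to take ID 0


def build_vocabulary(messages: List[str]) -> Dict[str, int]:
    """Assign an integer ID to every distinct whitespace-separated token."""
    vocab = {PAD_TOKEN: 0}
    for message in messages:
        for token in message.split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode_messages(messages: List[str], vocab: Dict[str, int]) -> List[List[int]]:
    """Replace tokens with their IDs and right-pad each message to the longest one."""
    tokenized = [message.split() for message in messages]
    max_len = max(len(tokens) for tokens in tokenized)
    encoded = []
    for tokens in tokenized:
        ids = [vocab[token] for token in tokens]
        ids += [vocab[PAD_TOKEN]] * (max_len - len(ids))
        encoded.append(ids)
    return encoded


# Toy data (invented for illustration): label 1 = written by the "target" user,
# label 0 = written by "someone else".
messages = ["please review the report", "thanks", "see the report"]
labels = [1, 0, 1]

vocab = build_vocabulary(messages)
encoded = encode_messages(messages, vocab)
```

The resulting fixed-length ID sequences, paired with the binary labels, are the kind of input a CNN text classifier would consume; the network architecture itself is not specified in the abstract and is left out of this sketch.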


Keywords: Authorship verification, Binary classification, Convolutional neural network



The author is grateful to Vlad Vavilin and Mark Mishaev for implementing and running the experiments using the TensorFlow framework.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Shamoon College of Engineering, Beer Sheva, Israel