Author Attribution of Email Messages Using Parse-Tree Features

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9166)

Abstract

Most existing research on authorship attribution uses various types of lexical, syntactic, and structural features for classification. Some of these features are not meaningful for short texts such as email messages. In this paper we demonstrate a very effective use of one syntactic feature of an author’s writing, the characteristics of the text’s parse trees, for authorship analysis of email messages. We define author templates consisting of the frequencies of context-free grammar (CFG) productions occurring in an author’s training set of email messages. We then extract the same frequencies from a new email message and match them against the authors’ templates to identify the best match. We evaluate our approach on the Enron email dataset and show that CFG production frequencies work very well and are robust in attributing the authorship of email messages.
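
The template-matching idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes parse trees are already available as nested tuples of the form (label, child, ...), with words as plain strings; the parser itself is out of scope here. The function names, the choice to skip lexical productions, and the use of a smoothed Jeffreys (symmetric Kullback-Leibler) divergence as the matching score are all assumptions made for illustration, since the abstract does not say which measure is used.

```python
from collections import Counter
import math


def productions(tree):
    """Yield CFG productions (lhs, rhs) from a nested-tuple parse tree.

    Assumed input format: (label, child, child, ...), where a child is
    either another tuple or a word (str). Productions that rewrite a tag
    directly to words are skipped so only syntactic structure is counted
    (an illustrative choice, not taken from the paper).
    """
    label, *children = tree
    if all(isinstance(c, str) for c in children):
        return
    yield (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


def template(trees):
    """Relative frequency of each production over a collection of trees."""
    counts = Counter(p for t in trees for p in productions(t))
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()} if total else {}


def divergence(p, q, eps=1e-6):
    """Smoothed Jeffreys divergence between two frequency templates.

    eps is a hypothetical smoothing constant that keeps the logarithm
    finite when a production is missing from one of the templates.
    """
    keys = set(p) | set(q)
    return sum(
        ((p.get(k, 0.0) + eps) - (q.get(k, 0.0) + eps))
        * math.log((p.get(k, 0.0) + eps) / (q.get(k, 0.0) + eps))
        for k in keys
    )


def attribute(message_trees, author_templates):
    """Return the author whose template is closest to the message."""
    msg = template(message_trees)
    return min(author_templates, key=lambda a: divergence(msg, author_templates[a]))


# Toy data: two hypothetical authors with one training sentence each.
alice = [("S", ("NP", ("PRP", "I")), ("VP", ("VBP", "agree")))]
bob = [("S", ("VP", ("VB", "Send"), ("NP", ("DT", "the"), ("NN", "report"))))]
templates = {"alice": template(alice), "bob": template(bob)}

new_message = [("S", ("NP", ("PRP", "We")), ("VP", ("VBP", "agree")))]
best = attribute(new_message, templates)
```

In this toy case the new message uses the same productions as the first author's training sentence, so that template has the smaller divergence. Real email data would need a parser, many messages per author, and a held-out evaluation.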

Author information

Corresponding author

Correspondence to Raj Bhatnagar.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Patchala, J., Bhatnagar, R., Gopalakrishnan, S. (2015). Author Attribution of Email Messages Using Parse-Tree Features. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2015. Lecture Notes in Computer Science, vol 9166. Springer, Cham. https://doi.org/10.1007/978-3-319-21024-7_21

  • DOI: https://doi.org/10.1007/978-3-319-21024-7_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21023-0

  • Online ISBN: 978-3-319-21024-7

  • eBook Packages: Computer Science, Computer Science (R0)
