Writeprint Mining For Authorship Attribution

Machine Learning for Authorship Attribution and Cyber Forensics

Abstract

This chapter presents a novel approach to creating Writeprints from frequent patterns and addresses two authorship problems: authorship attribution in the usual sense, which disregards stylistic variation, and authorship attribution that accounts for stylistic variation. Stylistic variation is the occasional change in an individual's writing features depending on the type of recipient and the topic of a message. The methods proposed in this chapter and in the following chapters apply to different types of online messages; the experiments in this chapter use an e-mail corpus to demonstrate their efficacy.

Some content in this chapter is based on the concepts discussed in [64]: F. Iqbal, R. Hadjidj, B. C. M. Fung, and M. Debbabi, “A novel approach of mining write-prints for authorship attribution in e-mail forensics,” Digit. Investig., vol. 5, pp. S42–S51, 2008.




Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Iqbal, F., Debbabi, M., Fung, B.C.M. (2020). Writeprint Mining For Authorship Attribution. In: Machine Learning for Authorship Attribution and Cyber Forensics. International Series on Computer Entertainment and Media Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-61675-5_5
