Writeprint Mining For Authorship Attribution

Part of the International Series on Computer Entertainment and Media Technology book series (ISCEMT)

Abstract

This chapter presents a novel approach to frequent-pattern-based Writeprint creation and addresses two authorship problems: authorship attribution in the usual way (disregarding stylistic variation) and authorship attribution that focuses on stylistic variation. Stylistic variation is the occasional change in an individual's writing features depending on the type of recipient and the topic of a message. The authorship methods proposed in this chapter and in the following chapters apply to different types of online messages; for experimentation, however, this chapter uses an e-mail corpus to demonstrate their efficacy.

Some content in this chapter is developed from the concepts discussed in [64]: F. Iqbal, R. Hadjidj, B. C. M. Fung, and M. Debbabi, “A novel approach of mining write-prints for authorship attribution in e-mail forensics,” Digit. Investig., vol. 5, pp. S42–S51, 2008.
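
As a rough illustration of what frequent-pattern-based Writeprint creation could look like, the following minimal Python sketch treats each message as a set of discretized style features, mines the feature combinations that recur in an author's messages, and keeps the combinations that are frequent for that author but for no other. The feature names, the toy messages, the support threshold, the uniqueness filter, and the scoring rule are all assumptions made for this example; the preview does not describe the chapter's actual algorithm. The brute-force enumeration stands in for a real frequent-pattern miner and is practical only for very small feature sets.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(messages, min_support):
    """Return every feature set contained in at least min_support of the messages."""
    counts = Counter()
    for items in messages:
        # Brute-force enumeration of all non-empty subsets (toy-sized data only).
        for size in range(1, len(items) + 1):
            for combo in combinations(sorted(items), size):
                counts[frozenset(combo)] += 1
    total = len(messages)
    return {p for p, c in counts.items() if c / total >= min_support}


# Hypothetical training data: one set of discretized style features per message.
messages_by_author = {
    "author_a": [
        {"short_sentences", "many_commas"},
        {"short_sentences", "many_commas", "few_digits"},
        {"short_sentences", "few_digits"},
    ],
    "author_b": [
        {"long_sentences", "few_digits"},
        {"long_sentences", "many_commas"},
        {"long_sentences", "few_digits"},
    ],
}

patterns = {
    author: frequent_patterns(msgs, min_support=0.6)
    for author, msgs in messages_by_author.items()
}


def writeprint(author):
    """Patterns frequent for this author and not frequent for any other author."""
    others = set()
    for name, pats in patterns.items():
        if name != author:
            others |= pats
    return patterns[author] - others


def attribute(message):
    """Pick the author whose writeprint has the most patterns inside the message."""
    def score(author):
        return sum(1 for p in writeprint(author) if p <= message)
    return max(patterns, key=score)


print(attribute({"short_sentences", "many_commas"}))
```

With this toy data, the single feature `few_digits` is frequent for both authors and therefore drops out of both writeprints, while combinations such as `short_sentences` with `few_digits` remain distinctive for `author_a`.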

References

  1. R. Zheng, J. Li, H. Chen, Z. Huang, A framework for authorship identification of online messages: writing-style features and classification techniques. J. Am. Soc. Inf. Sci. Technol. 57(3), 378–393 (2006)

  2. O. De Vel, A. Anderson, M. Corney, G. Mohay, Mining e-mail content for author identification forensics. ACM SIGMOD Rec. 30(4), 55–64 (2001)

  3. F. Iqbal, R. Hadjidj, B.C.M. Fung, M. Debbabi, A novel approach of mining write-prints for authorship attribution in e-mail forensics. Digit. Investig. 5, S42–S51 (2008)

  4. F. Iqbal, H. Binsalleeh, B.C.M. Fung, M. Debbabi, A unified data mining solution for authorship analysis in anonymous textual communications. Inf. Sci. (NY) 231, 98–112 (2013)

  5. J.F. Burrows, Word-patterns and story-shapes: the statistical analysis of narrative style. Liter. Linguist. Comput. 2(2), 61–70 (1987)

  6. G.U. Yule, On sentence-length as a statistical characteristic of style in prose: with application to two cases of disputed authorship. Biometrika 30(3/4), 363–390 (1939)

  7. G.-F. Teng, M.-S. Lai, J.-B. Ma, Y. Li, E-mail authorship mining based on SVM for computer forensic, in Proceedings of 2004 International Conference on Machine Learning and Cybernetics, vol. 2 (2004), pp. 1204–1207

  8. O. De Vel, Mining e-mail authorship, in Proc. Workshop on Text Mining, ACM International Conference on Knowledge Discovery and Data Mining (KDD’2000) (2000)

  9. A. Abbasi, H. Chen, Writeprints: a stylometric approach to identity-level identification and similarity detection in cyberspace. ACM Trans. Inf. Syst. 26(2), 7 (2008)

  10. F.J. Tweedie, R.H. Baayen, How variable may a constant be? Measures of lexical richness in perspective. Comput. Hum. 32(5), 323–352 (1998)

  11. R. Agrawal, T. Imieliński, A. Swami, Mining association rules between sets of items in large databases. ACM SIGMOD Rec. 22(2), 207–216 (1993)

  12. J. Han, J. Pei, Mining frequent patterns by pattern-growth: methodology and implications. ACM SIGKDD Explor. Newsl. 2(2), 14–20 (2000)

  13. M.J. Zaki, Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. 12(3), 372–390 (2000)

  14. B.C.M. Fung, K. Wang, M. Ester, Hierarchical document clustering using frequent itemsets, in Proceedings of the 2003 SIAM International Conference on Data Mining (2003), pp. 59–70

  15. J.D. Holt, S.M. Chung, Efficient mining of association rules in text databases, in Proceedings of the Eighth International Conference on Information and Knowledge Management (1999), pp. 234–242

  16. H. Li, D. Shen, B. Zhang, Z. Chen, Q. Yang, Adding semantics to email clustering, in Sixth International Conference on Data Mining, 2006. ICDM’06 (2006), pp. 938–942

  17. H. Baayen, H. Van Halteren, F. Tweedie, Outside the cave of shadows: using syntactic annotation to enhance authorship attribution. Liter. Linguist. Comput. 11(3), 121–132 (1996)

  18. S.J. Stolfo, G. Creamer, S. Hershkop, A temporal based forensic analysis of electronic communication, in Proceedings of the 2006 International Conference on Digital Government Research (2006), pp. 23–24

Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Iqbal, F., Debbabi, M., Fung, B.C.M. (2020). Writeprint Mining For Authorship Attribution. In: Machine Learning for Authorship Attribution and Cyber Forensics. International Series on Computer Entertainment and Media Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-61675-5_5
