Abstract
This chapter presents a novel approach to frequent-pattern-based Writeprint creation and addresses two authorship problems: authorship attribution in the usual sense, which disregards stylistic variation, and authorship attribution that focuses on stylistic variation. Stylistic variation is the occasional change in an individual's writing features depending on the type of recipient and the topic of a message. The authorship methods proposed in this chapter and in the following chapters are applicable to different types of online messages; for the purposes of experimentation, however, an e-mail corpus is used in this chapter to demonstrate their efficacy.
Some content in this chapter is developed from the concepts discussed in [64]: F. Iqbal, R. Hadjidj, B. C. M. Fung, and M. Debbabi, “A novel approach of mining write-prints for authorship attribution in e-mail forensics,” Digit. Investig., vol. 5, pp. S42–S51, 2008.
References
R. Zheng, J. Li, H. Chen, Z. Huang, A framework for authorship identification of online messages: writing-style features and classification techniques. J. Am. Soc. Inf. Sci. Technol. 57(3), 378–393 (2006)
O. De Vel, A. Anderson, M. Corney, G. Mohay, Mining e-mail content for author identification forensics. ACM SIGMOD Rec. 30(4), 55–64 (2001)
F. Iqbal, R. Hadjidj, B.C.M. Fung, M. Debbabi, A novel approach of mining write-prints for authorship attribution in e-mail forensics. Digit. Investig. 5, S42–S51 (2008)
F. Iqbal, H. Binsalleeh, B.C.M. Fung, M. Debbabi, A unified data mining solution for authorship analysis in anonymous textual communications. Inf. Sci. (NY) 231, 98–112 (2013)
J.F. Burrows, Word-patterns and story-shapes: the statistical analysis of narrative style. Liter. Linguist. Comput. 2(2), 61–70 (1987)
G.U. Yule, On sentence-length as a statistical characteristic of style in prose: with application to two cases of disputed authorship. Biometrika 30(3/4), 363–390 (1939)
G.-F. Teng, M.-S. Lai, J.-B. Ma, Y. Li, E-mail authorship mining based on SVM for computer forensic, in Proceedings of 2004 International Conference on Machine Learning and Cybernetics, vol. 2 (2004), pp. 1204–1207
O. De Vel, Mining e-mail authorship, in Proc. Workshop on Text Mining, ACM International Conference on Knowledge Discovery and Data Mining (KDD’2000) (2000)
A. Abbasi, H. Chen, Writeprints: a stylometric approach to identity-level identification and similarity detection in cyberspace. ACM Trans. Inf. Syst. 26(2), 7 (2008)
F.J. Tweedie, R.H. Baayen, How variable may a constant be? Measures of lexical richness in perspective. Comput. Hum. 32(5), 323–352 (1998)
R. Agrawal, T. Imieliński, A. Swami, Mining association rules between sets of items in large databases. ACM SIGMOD Rec. 22(2), 207–216 (1993)
J. Han, J. Pei, Mining frequent patterns by pattern-growth: methodology and implications. ACM SIGKDD Explor. Newsl. 2(2), 14–20 (2000)
M.J. Zaki, Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. 12(3), 372–390 (2000)
B.C.M. Fung, K. Wang, M. Ester, Hierarchical document clustering using frequent itemsets, in Proceedings of the 2003 SIAM International Conference on Data Mining (2003), pp. 59–70
J.D. Holt, S.M. Chung, Efficient mining of association rules in text databases, in Proceedings of the Eighth International Conference on Information and Knowledge Management (1999), pp. 234–242
H. Li, D. Shen, B. Zhang, Z. Chen, Q. Yang, Adding semantics to email clustering, in Sixth International Conference on Data Mining, 2006. ICDM’06 (2006), pp. 938–942
H. Baayen, H. Van Halteren, F. Tweedie, Outside the cave of shadows: using syntactic annotation to enhance authorship attribution. Liter. Linguist. Comput. 11(3), 121–132 (1996)
S.J. Stolfo, G. Creamer, S. Hershkop, A temporal based forensic analysis of electronic communication, in Proceedings of the 2006 International Conference on Digital Government Research (2006), pp. 23–24
Copyright information
© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Iqbal, F., Debbabi, M., Fung, B.C.M. (2020). Writeprint Mining For Authorship Attribution. In: Machine Learning for Authorship Attribution and Cyber Forensics. International Series on Computer Entertainment and Media Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-61675-5_5
DOI: https://doi.org/10.1007/978-3-030-61675-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61674-8
Online ISBN: 978-3-030-61675-5
eBook Packages: Mathematics and Statistics (R0)