Abstract
The previous chapters discussed the different aspects of the authorship analysis problem. This chapter proposes a framework for extracting criminal information from the textual content of suspicious online messages. Archives of online messages, including chat logs, e-mails, web forums, and blogs, often contain an enormous amount of forensically relevant information about potential suspects and their illegitimate activities. Such information is usually found in either the header or the body of an online document. The IP addresses, hostnames, and sender and recipient addresses contained in an e-mail header, the user IDs used in chats, and the screen names used in web-based communication help reveal information at the user or application level. For instance, information extracted from a suspicious e-mail corpus helps us learn who the senders and recipients are, how often they communicate, and how many types of communities (cliques) exist in a dataset. Such information also gives us insight into inter- and intra-community patterns of communication. A clique or community is a group of users who are linked to one another by online communication. Header content, or user-level information, is easy to extract and straightforward to use for the purposes of investigation.
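As a rough illustration of the header-level analysis described above, here is a minimal sketch that uses Python's standard `email` package. It pulls sender and recipient addresses from raw messages, counts how often each sender writes to each recipient, and groups users into communities. The function names `extract_links` and `communities` are hypothetical. Reading only the From, To, and Cc headers and lower-casing addresses are simplifying assumptions. Grouping users by connected components of the communication graph is a stand-in chosen for illustration. It is not necessarily the clique/community detection method the chapter itself uses, and a strict graph-theoretic clique (every pair of members directly linked) would need a different algorithm.

```python
from collections import Counter, defaultdict
from email import message_from_string
from email.utils import getaddresses


def extract_links(raw_messages):
    """Count directed sender -> recipient links in raw RFC 5322 message strings.

    Only the From, To, and Cc headers are read; the message body is ignored.
    Addresses are lower-cased so trivially different spellings merge
    (a simplifying assumption, not part of the original text).
    """
    links = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = [addr.lower()
                   for _, addr in getaddresses(msg.get_all("From", []))
                   if addr]
        recipient_fields = msg.get_all("To", []) + msg.get_all("Cc", [])
        recipients = [addr.lower()
                      for _, addr in getaddresses(recipient_fields)
                      if addr]
        for sender in senders:
            for recipient in recipients:
                if sender != recipient:  # ignore self-addressed copies
                    links[(sender, recipient)] += 1
    return links


def communities(links):
    """Group users into connected components of the undirected link graph.

    A simple stand-in for community detection: two users end up in the
    same group if any chain of messages connects them.
    """
    graph = defaultdict(set)
    for sender, recipient in links:
        graph[sender].add(recipient)
        graph[recipient].add(sender)

    seen, groups = set(), []
    for start in sorted(graph):  # sorted for deterministic output
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        groups.append(sorted(component))
    return groups


if __name__ == "__main__":
    corpus = [
        "From: alice@example.com\nTo: bob@example.com\nSubject: hi\n\nbody\n",
        "From: bob@example.com\nTo: alice@example.com, carol@example.com\n\nbody\n",
        "From: dave@example.com\nTo: erin@example.com\n\nbody\n",
    ]
    links = extract_links(corpus)
    for (sender, recipient), count in sorted(links.items()):
        print(f"{sender} -> {recipient}: {count}")
    print(communities(links))
```

The three toy messages contain only made-up addresses. For that input, `communities` returns two groups: alice, bob, and carol in one, and dave and erin in the other. The `Counter` of directed links answers "how often do they communicate", and the component grouping gives a first approximation of "how many communities are there". A real investigation would also draw on other header fields, such as the Received lines that carry IP addresses and hostnames.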
Copyright information
© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Iqbal, F., Debbabi, M., Fung, B.C.M. (2020). Criminal Information Mining. In: Machine Learning for Authorship Attribution and Cyber Forensics. International Series on Computer Entertainment and Media Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-61675-5_10
DOI: https://doi.org/10.1007/978-3-030-61675-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61674-8
Online ISBN: 978-3-030-61675-5
eBook Packages: Mathematics and Statistics (R0)