Authorship Analysis in Cybercrime Investigation

Zheng, Rong; Qin, Yi; Huang, Zan; Chen, Hsinchun

doi:10.1007/3-540-44853-5_5

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2665))

Included in the following conference series:

International Conference on Intelligence and Security Informatics

Abstract

Criminals have been using the Internet to distribute a wide range of illegal materials globally in an anonymous manner, making criminal identity tracing difficult in the cybercrime investigation process. In this study we propose to adopt the authorship analysis framework to automatically trace identities of cyber criminals through messages they post on the Internet. Under this framework, three types of message features, including style markers, structural features, and content-specific features, are extracted and inductive learning algorithms are used to build feature-based models to identify authorship of illegal messages. To evaluate the effectiveness of this framework, we conducted an experimental study on data sets of English and Chinese email and online newsgroup messages. We experimented with all three types of message features and three inductive learning algorithms. The results indicate that the proposed approach can discover real identities of authors of both English and Chinese Internet messages with relatively high accuracies.
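The feature-based approach described in the abstract can be illustrated with a minimal sketch. The abstract does not list the individual features or name the three learning algorithms, so everything concrete here is an assumption made for illustration: the specific style markers, the greeting check used as a structural feature, the hypothetical `CONTENT_KEYWORDS` list, and the one-nearest-neighbour rule standing in for an inductive learner.

```python
import re
from collections import Counter

# Hypothetical keyword list; the abstract does not enumerate content-specific features.
CONTENT_KEYWORDS = ("sale", "free", "software", "price")


def extract_features(message: str) -> list[float]:
    """Map a message to a vector of style, structural, and content features."""
    words = re.findall(r"[A-Za-z']+", message.lower())
    n_words = max(len(words), 1)  # avoid division by zero on empty messages
    sentences = [s for s in re.split(r"[.!?]+", message) if s.strip()]
    paragraphs = [p for p in re.split(r"\n\s*\n", message) if p.strip()]
    counts = Counter(words)

    # Style markers (assumed examples): word length, vocabulary richness, sentence length.
    style = [
        sum(len(w) for w in words) / n_words,
        len(counts) / n_words,
        len(words) / max(len(sentences), 1),
    ]
    # Structural features (assumed examples): paragraph count, opening greeting.
    structure = [
        float(len(paragraphs)),
        float(bool(re.match(r"\s*(hi|hello|dear)\b", message, re.IGNORECASE))),
    ]
    # Content-specific features: relative frequency of each assumed keyword.
    content = [counts[k] / n_words for k in CONTENT_KEYWORDS]
    return style + structure + content


def predict_author(train: list[tuple[str, str]], message: str) -> str:
    """Return the author of the training message closest in feature space.

    A one-nearest-neighbour rule is used only as a stand-in for the
    inductive learning algorithms, which the abstract does not name.
    """
    target = extract_features(message)

    def squared_distance(item: tuple[str, str]) -> float:
        feats = extract_features(item[1])
        return sum((a - b) ** 2 for a, b in zip(feats, target))

    return min(train, key=squared_distance)[0]
```

In practice the features would be normalised and a trained classifier would replace the nearest-neighbour rule; the sketch only shows how the three feature groups combine into one vector per message.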

Author information

Authors and Affiliations

Artificial Intelligence Lab, Department of Management Information Systems, The University of Arizona, Tucson, Arizona, 85721, USA
Rong Zheng, Yi Qin, Zan Huang & Hsinchun Chen

Editor information

Editors and Affiliations

Department of Management Information Systems, University of Arizona, Tucson, AZ, 85721, USA
Hsinchun Chen, Daniel D. Zeng & Therani Madhusudan
Tucson Police Department, 270 S. Stone Ave., Tucson, AZ, 85701, USA
Richard Miranda & Jenny Schroeder
School of Public Administration and Policy, University of Arizona, Tucson, AZ, 85721, USA
Chris Demchak

Cite this paper

Zheng, R., Qin, Y., Huang, Z., Chen, H. (2003). Authorship Analysis in Cybercrime Investigation. In: Chen, H., Miranda, R., Zeng, D.D., Demchak, C., Schroeder, J., Madhusudan, T. (eds) Intelligence and Security Informatics. ISI 2003. Lecture Notes in Computer Science, vol 2665. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44853-5_5

DOI: https://doi.org/10.1007/3-540-44853-5_5
Published: 27 May 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40189-6
Online ISBN: 978-3-540-44853-2
eBook Packages: Springer Book Archive
