Authorship Analysis in Cybercrime Investigation

  • Conference paper
Intelligence and Security Informatics (ISI 2003)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2665))

Abstract

Criminals use the Internet to distribute a wide range of illegal materials globally and anonymously, which makes tracing their identities difficult during cybercrime investigation. In this study we propose adopting an authorship analysis framework to automatically trace the identities of cyber criminals through the messages they post on the Internet. Under this framework, three types of message features (style markers, structural features, and content-specific features) are extracted, and inductive learning algorithms are used to build feature-based models that identify the authorship of illegal messages. To evaluate the effectiveness of this framework, we conducted an experimental study on data sets of English and Chinese email and online newsgroup messages, testing all three types of message features with three inductive learning algorithms. The results indicate that the proposed approach can discover the real identities of the authors of both English and Chinese Internet messages with relatively high accuracy.
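To make the framework concrete, the following is a minimal sketch of the pipeline, not the authors' implementation. The abstract does not list the individual features or name the three learning algorithms, so everything specific here is an assumption for illustration: the word lists `FUNCTION_WORDS` and `CONTENT_KEYWORDS`, the greeting and blank-line features, and the helper names `extract_features`, `train_centroids`, and `predict_author` are all hypothetical. A simple nearest-centroid classifier stands in for the inductive learners used in the study.

```python
import re
from collections import Counter

# Hypothetical vocabularies; the abstract does not specify the real feature sets.
FUNCTION_WORDS = ("the", "of", "and", "to", "in")
CONTENT_KEYWORDS = ("sale", "price", "software")


def extract_features(message):
    """Map one message to a numeric vector covering the three feature types."""
    words = re.findall(r"[a-z']+", message.lower())
    n_words = max(len(words), 1)  # avoid division by zero on empty messages
    counts = Counter(words)
    lines = message.splitlines()

    # Style markers: average word length and function-word rates.
    style = [sum(len(w) for w in words) / n_words]
    style += [counts[w] / n_words for w in FUNCTION_WORDS]

    # Structural features: opening greeting and share of blank lines.
    first = lines[0].strip().lower() if lines else ""
    structure = [
        1.0 if re.match(r"(hi|hello|dear)\b", first) else 0.0,
        sum(1 for ln in lines if not ln.strip()) / max(len(lines), 1),
    ]

    # Content-specific features: rates of topic keywords.
    content = [counts[w] / n_words for w in CONTENT_KEYWORDS]
    return style + structure + content


def train_centroids(samples):
    """samples: iterable of (author, message). Returns author -> mean vector."""
    grouped = {}
    for author, message in samples:
        grouped.setdefault(author, []).append(extract_features(message))
    return {
        author: [sum(col) / len(vecs) for col in zip(*vecs)]
        for author, vecs in grouped.items()
    }


def predict_author(centroids, message):
    """Return the author whose centroid is closest in squared Euclidean distance."""
    x = extract_features(message)

    def dist(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))

    return min(centroids, key=lambda author: dist(centroids[author]))
```

In use, one would call `train_centroids` on messages of known authorship and then `predict_author` on a disputed message. A real study would likely scale the features and use a stronger learner; the sketch only shows how the three feature types feed a single feature-based model.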

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zheng, R., Qin, Y., Huang, Z., Chen, H. (2003). Authorship Analysis in Cybercrime Investigation. In: Chen, H., Miranda, R., Zeng, D.D., Demchak, C., Schroeder, J., Madhusudan, T. (eds) Intelligence and Security Informatics. ISI 2003. Lecture Notes in Computer Science, vol 2665. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44853-5_5

  • DOI: https://doi.org/10.1007/3-540-44853-5_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40189-6

  • Online ISBN: 978-3-540-44853-2

  • eBook Packages: Springer Book Archive
