Supporting Law Enforcement in Digital Communities through Natural Language Analysis

  • Danny Hughes
  • Paul Rayson
  • James Walkerdine
  • Kevin Lee
  • Phil Greenwood
  • Awais Rashid
  • Corinne May-Chahal
  • Margaret Brennan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5158)


Recent years have seen an explosion in the number and scale of digital communities (e.g., peer-to-peer file sharing systems, chat applications and social networking sites). Unfortunately, digital communities are host to significant criminal activity, including copyright infringement, identity theft and child sexual abuse. Combating this growing level of crime is difficult because of the ever-increasing scale of today’s digital communities. This paper presents an approach to providing automated support for the detection of activities related to child sexual abuse in digital communities. Specifically, we analyze the characteristics of child sexual abuse media distribution in P2P file sharing networks and carry out an exploratory study to show that corpus-based natural language analysis may be used to automate the detection of this activity. We then give an overview of how this approach can be extended to the policing of chat and social networking communities.
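As an illustration of what corpus-based analysis can look like in practice, the following is a minimal sketch of frequency profiling: terms that are unusually frequent in a study corpus, relative to a reference corpus, are ranked by a log-likelihood score. The abstract does not name the statistic used in the paper, so the choice of log-likelihood is an assumption, and the function names `log_likelihood` and `key_terms` and the toy tokens are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood score for one term (assumed statistic, not from the source).

    a: term frequency in the study corpus
    b: term frequency in the reference corpus
    c: total tokens in the study corpus
    d: total tokens in the reference corpus
    """
    # Expected frequencies if the term were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def key_terms(study_tokens, reference_tokens, top_n=5):
    """Return up to top_n terms overused in the study corpus, highest score first."""
    study = Counter(study_tokens)
    ref = Counter(reference_tokens)
    c = sum(study.values())
    d = sum(ref.values())
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        # Keep only terms that are relatively more frequent in the study corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    # Sort by descending score, then alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


if __name__ == "__main__":
    # Toy, benign tokens standing in for observed text and a general reference corpus.
    study = ["remix"] * 5 + ["the"] * 5
    reference = ["the"] * 10
    for word, score in key_terms(study, reference):
        print(word, round(score, 2))
```

In a sketch like this, the reference corpus would be a large sample of general language, and the highest-scoring terms are candidates for manual review rather than automatic conclusions.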


Keywords: Social Networks, P2P, Network Monitoring, Natural Language Analysis, Child Protection





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Danny Hughes (1)
  • Paul Rayson (1)
  • James Walkerdine (1)
  • Kevin Lee (2)
  • Phil Greenwood (1)
  • Awais Rashid (1)
  • Corinne May-Chahal (3)
  • Margaret Brennan (4)

  1. Computing, InfoLab 21, South Drive, Lancaster University, Lancaster, UK
  2. Isis Forensics, Lancaster, UK
  3. Department of Applied Social Science, Lancaster University, Lancaster, UK
  4. Child Exploitation and Online Protection Centre, London, UK
