Measures to Detect Word Substitution in Intercepted Communication

  • SzeWang Fong
  • David B. Skillicorn
  • D. Roussinov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3975)


Those who want to conceal the content of their communications can do so by replacing words that might attract attention with other words or locutions that seem more ordinary. We address the problem of discovering such substitutions when the original and substitute words have the same natural frequency. We construct a number of measures, all of which search for local discontinuities in properties such as string and bag-of-words frequency. Individually, each of these measures is a weak detector. However, we show that combining them produces a detector that is reasonably effective.
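The abstract does not define the measures precisely, so the following is only a hypothetical sketch of the general idea, not the authors' method. It assumes one kind of string-frequency measure, an estimate of P(word | k preceding words) taken from n-gram counts, turns each window size k into a thresholded weak detector, and combines the detectors by simple voting. Bag-of-words measures are not covered. The counts, thresholds, example sentences, and the names `context_prob`, `make_detector`, and `flag` are all invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy n-gram counts standing in for a large reference corpus.
# All numbers are invented for illustration. "package" and "horizon" are given
# the same unigram count to mimic a substitution with equal natural frequency.
COUNTS: Dict[Tuple[str, ...], int] = {
    ("package",): 400, ("horizon",): 400,
    ("the",): 20000,
    ("the", "package"): 120, ("the", "horizon"): 90,
    ("deliver", "the"): 200, ("deliver", "the", "package"): 60,
    ("will", "deliver", "the"): 80,
}

Detector = Callable[[List[str], int], bool]


def count(ngram: List[str]) -> int:
    """Look up an n-gram count, treating unseen n-grams as zero."""
    return COUNTS.get(tuple(ngram), 0)


def context_prob(tokens: List[str], i: int, k: int) -> Optional[float]:
    """Estimate P(tokens[i] | k preceding tokens) as count(context + word) / count(context)."""
    if i < k:
        return None  # not enough left context; this measure abstains
    context = tokens[i - k:i]
    denom = count(context)
    if denom == 0:
        return None  # context itself unseen; this measure abstains
    return count(tokens[i - k:i + 1]) / denom


def make_detector(k: int, threshold: float) -> Detector:
    """Build a weak detector that fires when the k-word context makes tokens[i] unusually unlikely."""
    def detect(tokens: List[str], i: int) -> bool:
        p = context_prob(tokens, i, k)
        return p is not None and p < threshold
    return detect


def flag(tokens: List[str], i: int, detectors: List[Detector], min_votes: int) -> bool:
    """Combine weak detectors by voting: flag tokens[i] if at least min_votes of them fire."""
    votes = sum(1 for detect in detectors if detect(tokens, i))
    return votes >= min_votes


# Hypothetical thresholds for window sizes k = 1, 2, 3.
detectors = [make_detector(1, 0.005), make_detector(2, 0.05), make_detector(3, 0.05)]
plain = "we will deliver the package".split()
subst = "we will deliver the horizon".split()
print(flag(plain, 4, detectors, min_votes=2))  # False: only the k=3 detector fires
print(flag(subst, 4, detectors, min_votes=2))  # True: all three detectors fire
```

In this toy setup the k=3 detector also fires on the unaltered sentence, because the four-word phrase is absent from the counts. That is the kind of false positive that makes a single measure weak; requiring agreement between several measures suppresses it here, which loosely mirrors the abstract's point that combining weak detectors helps.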


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • SzeWang Fong (1)
  • David B. Skillicorn (1)
  • D. Roussinov (2)

  1. School of Computing, Queen’s University
  2. W.P. Carey School of Business, Arizona State University
