Measures to Detect Word Substitution in Intercepted Communication
Those who want to conceal the content of their communications can do so by replacing words that might attract attention with other words or locutions that seem more ordinary. We address the problem of discovering such substitutions when the original and substitute words have the same natural frequency. We construct a number of measures, all of which search for local discontinuities in properties such as string and bag-of-words frequency. Each of these measures individually is a weak detector. However, we show that combining them produces a detector that is reasonably effective.
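The idea of combining weak measures can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy corpus `CORPUS`, the three detector functions, their thresholds, and the simple vote-counting rule in `combined_detector`. The paper's own measures and combination method may differ substantially; a realistic system would draw frequencies from a large reference corpus rather than a handful of sentences.

```python
from collections import Counter

# Hypothetical toy reference corpus; stands in for large-scale frequency data.
CORPUS = [
    "the bomb is in position",
    "the bomb exploded near the market",
    "the alcohol is in the cupboard",
    "we expect the attack to happen tonight",
    "we expect the meeting to happen tonight",
]
CORPUS_TOKENS = [s.split() for s in CORPUS]

# Count adjacent word pairs (string frequency of two-word strings).
BIGRAMS = Counter(
    (toks[j], toks[j + 1])
    for toks in CORPUS_TOKENS
    for j in range(len(toks) - 1)
)


def left_bigram_rare(tokens, i, threshold=1):
    """Weak detector: (previous word, word) is seen fewer than `threshold` times."""
    if i == 0:
        return False
    return BIGRAMS[(tokens[i - 1], tokens[i])] < threshold


def right_bigram_rare(tokens, i, threshold=1):
    """Weak detector: (word, next word) is seen fewer than `threshold` times."""
    if i == len(tokens) - 1:
        return False
    return BIGRAMS[(tokens[i], tokens[i + 1])] < threshold


def bag_support_low(tokens, i, threshold=1, min_shared=2):
    """Weak detector: few corpus sentences contain the word together with
    at least `min_shared` of the other words in the sentence (order ignored)."""
    others = set(tokens) - {tokens[i]}
    support = sum(
        1
        for sent in CORPUS_TOKENS
        if tokens[i] in sent and len(others & set(sent)) >= min_shared
    )
    return support < threshold


DETECTORS = [left_bigram_rare, right_bigram_rare, bag_support_low]


def combined_detector(tokens, i, min_votes=2):
    """Flag position i when at least `min_votes` weak detectors fire."""
    votes = sum(1 for detect in DETECTORS if detect(tokens, i))
    return votes >= min_votes


original = "the bomb exploded near the market".split()
substituted = "the alcohol exploded near the market".split()
print(combined_detector(original, 1))     # False: no detector fires
print(combined_detector(substituted, 1))  # True: two of three detectors fire
```

In this toy setting, the left-context bigram alone does not expose the substitution, since "the alcohol" is an ordinary string; only the right-context and bag-of-words checks register a discontinuity. This is the sense in which each measure is weak on its own while a combination can do better.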
Keywords: False Positive Rate, English Word, High False Positive Rate, Class Word, Rare Word