Generalized Hamming Distance
Rent the article at a discountRent now
* Final gross prices may vary according to local VAT.Get Access
Many problems in information retrieval and related fields depend on a reliable measure of the distance or similarity between objects that, most frequently, are represented as vectors. This paper considers vectors of bits. Such data structures implement entities as diverse as bitmaps that indicate the occurrences of terms and bitstrings indicating the presence of edges in images. For such applications, a popular distance measure is the Hamming distance. The value of the Hamming distance for information retrieval applications is limited by the fact that it counts only exact matches, whereas in information retrieval, corresponding bits that are close by can still be considered to be almost identical. We define a “Generalized Hamming distance” that extends the Hamming concept to give partial credit for near misses, and suggest a dynamic programming algorithm that permits it to be computed efficiently. We envision many uses for such a measure. In this paper we define and prove some basic properties of the “Generalized Hamming distance”, and illustrate its use in the area of object recognition. We evaluate our implementation in a series of experiments, using autonomous robots to test the measure's effectiveness in relating similar bitstrings.
- Adriaans P and Zantinge D (1996) Data Mining. Addison Wesley Longman Ltd, Harlow, England.
- Boykov Y and Huttenlocher D (1999) A new Bayesian framework for object recognition. In: proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society.
- Bookstein A and Klein ST (1990) Information retrieval tools for literary analysis. In: Tjoa AM, Ed., Database and Expert Systems Applications. Springer-Verlag, Vienna, pp. 1–7.
- Bookstein A and Klein ST (1991) Compression of correlated bit-vectors. Information Systems, 16:387–400.
- Bookstein A, Klein ST and Raita T (1998) Clumping properties of content-bearing words. Journal of the American Society for Information Science, 49:102–114.
- Bookstein A, Klein ST and Raita T (2001) Fuzzy hamming distance: A new dissimilarity measure. In: Amir, Amihood and Landau, G Eds., Proceedings: 12th Annual Symposium on Combinatorial Pattern Matching (CPM2001). Jerusalem, Israel, 1-4 July, 2001.
- Chang P and Krumm J (1999) Object recognition with color cooccurrence histograms. In: proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society.
- Cormen TH, Leiserson CE and Rivest RL (1990) Introduction to Algorithms. MIT Press, Cambridge, MA.
- Crochemore M and Rytter W (1994) Text Algorithms. Oxford University Press, New York.
- Doyle L (1961) Semantic road maps for literature searchers. Journal of the ACM, 8(4):553–578.
- Hamming RW (1980) Coding and Information Theory. Prentice-Hall, Englewood Cliffs, NJ.
- Hearst MA (1994) Multi-paragraph segmentation of expository text. In: proc. ACL Conf. Las Cruces.
- Hearst MA and Plaunt C (1993) Subtopic structuring for full-length document access. In: proc. 16-th ACM-SIGIR Conf. Pittsburgh, pp. 59–68.
- Jansen BJ, Spink A, Bateman J and Saracevic T (1998) Real life information retrieval: A study of user queries on the web. SIGIR Forum, 32(1):5–18.
- Knuth DE (1973) The Art of Computer Programming, Vol I, Fundamental Algorithms, Addison-Wesley, Reading, MA.
- Kulyukin V and Burke A (in press) Mining free text for structure. In: wang J, Ed., Data Mining: Opportunities and Challenges, Idea Group Publishing, Pensylvania.
- Kulyukin V and Morley N (2002) Integrated object recognition in the three-tiered robot architecture, In: Proc. of the International Conference on Artificial Intelligence, Las Vegas.
- Kulyukin V and Zoko A (2000) Hamming distance for object recognition by mobile robots. In: Proceedings of the Research Symposium of the DePaul University School of Computer Science, Telecommunications and Information Systems (CTIRS-2000). DePaul University.
- Murphy R (2000) AI Robotics. MIT Press, New York.
- Parker, JR (1993) Practical Computer Vision Using C. John Wiley and Sons, New York.
- Roman S (1992) Coding and Information Theory. Springer-Verlag, New York.
- Sankoff D and Kruskal JB (1983) Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, Reading, MA.
- Swain MJ and Ballard DH (1991) Color Indexing. International Journal of Computer Vision, 7:11–32.
- Tam AM and Leung CHC (2001) Structured natural-language descriptions for semantic content retrieval of visual materials. Journal of the American Society for Information Science and Technology, 52(11):930–937.
- Witten IH, Moffat A and Bell TC (1999) Managing Gigbytes. Morgan Kaufmann Publishers, San Fransisco.
- Young SS, Scott PD and Nasrabadi N (1994) Multi-layer hopfield neural network. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recogntion. IEEE Computer Society.
- Generalized Hamming Distance
Volume 5, Issue 4 , pp 353-375
- Cover Date
- Print ISSN
- Online ISSN
- Kluwer Academic Publishers
- Additional Links
- information retrieval
- hamming distance
- computer vision
- image retrieval
- object recognition
- robot vision
- Industry Sectors
- Author Affiliations
- 1. Center for Information and Language Studies, University of Chicago, Chicago, IL, 60637
- 2. Computer Science Department, Utah State University, Logan, Utah, 84322-4205
- 3. Computer Science Department, University of Turku, 20520, Turku, Finland