Computing the Cyclic Edit Distance for Pattern Classification by Ranking Edit Paths
Abstract
The cyclic edit distance between two strings A and B of lengths m and n is the minimum, over all cyclic shifts of B, of the edit distance between A and that shift. It can be applied, for instance, in classification tasks where strings represent the contours of objects. Bunke and Bühler proposed an algorithm that approximates the cyclic edit distance in O(mn) time. In this paper we show how to apply a technique for ranking the K shortest paths to the edit graph underlying the Bunke and Bühler algorithm to obtain the exact solution. This technique, combined with pruning rules, leads to an efficient and exact procedure for nearest-neighbour classification based on cyclic edit distances. Experimental results show that the proposed method can classify handwritten digits using the exact cyclic edit distance with only a small increase in computing time with respect to the original Bunke and Bühler algorithm.
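To make the definition concrete, the following is a minimal brute-force sketch of the cyclic edit distance. It is not the method proposed in the paper and not the Bunke and Bühler algorithm: it simply evaluates the ordinary edit distance against every cyclic shift of B, which costs O(mn²) time. Unit costs for insertion, deletion and substitution are an assumption made here for illustration, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance using the standard dynamic programme.

    Assumption: insertions, deletions and substitutions all cost 1.
    Only two rows of the table are kept in memory.
    """
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cyclic_edit_distance(a: str, b: str) -> int:
    """Minimum edit distance between a and any cyclic shift of b (brute force)."""
    if not b:
        return len(a)  # the empty string has a single (trivial) shift
    return min(edit_distance(a, b[k:] + b[:k]) for k in range(len(b)))
```

For example, `cyclic_edit_distance("abcd", "cdab")` is 0 because "cdab" is a rotation of "abcd", whereas `edit_distance("abcd", "cdab")` is 4. A baseline like this is mainly useful for checking faster exact or approximate procedures on small inputs.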
Keywords
Cyclic strings; cyclic edit distance; string matching; Bunke and Bühler algorithm; handwritten text recognition; OCR; K shortest paths