A New Editing Scheme Based on a Fast Two-String Median Computation Applied to OCR

  • José Ignacio Abreu Salas
  • Juan Ramón Rico-Juan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6218)

Abstract

This paper presents a new fast algorithm for computing an approximation to the median of two character strings that represent 2D shapes, and applies it in a new dataset editing scheme that reduces classification error. The median string is obtained by applying some of the edit operations from the minimum-cost edit sequence to one of the original strings. The new editing scheme relaxes the deletion criterion of the Wilson Editing Procedure: in practice, not every instance misclassified by its nearest neighbors is pruned. Instead, an artificial instance is added to the dataset in the expectation that the misclassified instance will be classified correctly in the future. The artificial instance is the median of the misclassified sample and its same-class nearest neighbor. Experiments on two widely used datasets of handwritten characters show that this preprocessing scheme reduces the classification error in about 78% of trials.
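The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes unit-cost Levenshtein operations, applies the first half of one minimum-cost edit script to obtain a string lying between the two inputs, and uses a leave-one-out 1-NN test to decide which instances count as misclassified. The names `edit_script`, `approx_median`, and `relaxed_editing` are hypothetical, and the abstract does not state which operations are selected or when an instance is still pruned, so those details are assumptions here.

```python
def edit_script(a, b):
    """Return one minimum-cost list of unit-cost edit operations turning a into b."""
    n, m = len(a), len(b)
    # d[i][j] holds the Levenshtein distance between a[:i] and b[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Backtrace from the bottom-right corner, keeping only real edits.
    # Operations come out ordered from the end of a towards its start.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            if a[i - 1] != b[j - 1]:
                ops.append(("sub", i - 1, b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", i - 1, None))
            i -= 1
        else:
            ops.append(("ins", i, b[j - 1]))
            j -= 1
    return ops


def edit_distance(a, b):
    """Unit-cost edit distance, taken as the length of a minimum-cost script."""
    return len(edit_script(a, b))


def approx_median(a, b):
    """Apply the first half of the edit script to a (assumed selection rule)."""
    ops = edit_script(a, b)
    s = list(a)
    # Right-to-left order keeps the positions of the remaining operations valid.
    for kind, pos, ch in ops[: len(ops) // 2]:
        if kind == "sub":
            s[pos] = ch
        elif kind == "del":
            del s[pos]
        else:
            s.insert(pos, ch)
    return "".join(s)


def relaxed_editing(data):
    """data is a list of (string, label) pairs.

    Instead of deleting an instance misclassified by its nearest neighbour,
    append the median of that instance and its same-class nearest neighbour.
    """
    added = []
    for idx, (s, label) in enumerate(data):
        others = [p for k, p in enumerate(data) if k != idx]
        if not others:
            continue
        nearest = min(others, key=lambda p: edit_distance(s, p[0]))
        if nearest[1] == label:
            continue  # correctly classified, nothing to do
        same = [p for p in others if p[1] == label]
        if same:
            friend = min(same, key=lambda p: edit_distance(s, p[0]))
            added.append((approx_median(s, friend[0]), label))
    return data + added
```

With unit costs, the string returned by `approx_median(a, b)` satisfies `edit_distance(a, m) + edit_distance(m, b) == edit_distance(a, b)`, so it minimises the sum of distances to the two inputs. Weighted costs or a different choice of operations, as the paper may use, would change which intermediate string is produced.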

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • José Ignacio Abreu Salas (1)
  • Juan Ramón Rico-Juan (2)
  1. Universidad de Matanzas, Cuba
  2. Dpto. Lenguajes y Sistemas Informáticos, Universidad de Alicante, Spain
