Signal, Image and Video Processing, Volume 8, Issue 1, pp 191–195

The challenge of general machine vision

  • Theo Pavlidis
Original Paper


I argue that, in spite of progress in several machine vision applications, the general machine vision problem will not be solved any time soon. There are three reasons: (1) the complexity of human vision, in which bottom-up and top-down processes are tightly interwoven and for which we have no good models; (2) the fact that perceptual similarity is not the same as mathematical similarity; (3) the illusion of progress created by relying on “proofs by example” that are not always valid. I discuss several examples of applications that succeeded because they did not face any of these three obstacles.


Machine vision, Human vision, Perceptual similarity, Mathematical similarity, Image retrieval



While writing this paper, I received significant help from several people who commented on drafts of the paper and/or provided pointers to literature and research activities that I had missed. They are (in alphabetical order): Henry Baird (Lehigh Univ.), Alex Berg (Stony Brook Univ.), Kevin Bowyer (Notre Dame), Enis Cetin (Bilkent Univ., Ankara), Jelena Kovacevic (Carnegie Mellon Univ.), Mike McCann (Carnegie Mellon Univ.), Imad Malhas (IrisGuard), Paul Pavlidis (Univ. of British Columbia), Fabio Roli (Univ. of Cagliari, Italy), Dimitri Samaras (Stony Brook Univ.), Ray Smith (Google), John Tsotsos (York Univ., Canada) and Greg Zelinsky (Stony Brook Univ.). Obviously, these people need not share all of my views on machine vision.

Supplementary material

Supplementary material 1 (docx 812 KB)



Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  1. Computer Science Department, Stony Brook University, Stony Brook, USA
