
The challenge of general machine vision

  • Original Paper
  • Published in: Signal, Image and Video Processing


Abstract

I argue that, despite progress in several machine vision applications, the general machine vision problem is not going to be solved any time soon, for three reasons: (1) the complexity of human vision: bottom-up and top-down processes are tightly interwoven, and we have no good models for dealing with that interplay; (2) perceptual similarity is not the same as mathematical similarity; (3) the illusion of progress created by relying on “proofs by example” that are not always valid. I also discuss several examples of applications that succeeded because they did not face any of these three obstacles.





Acknowledgments

While writing this paper, I received significant help from several people who commented on drafts of the paper and/or provided pointers to literature and research activities that I had missed. They are (in alphabetical order): Henry Baird (Lehigh Univ.), Alex Berg (Stony Brook Univ.), Kevin Bowyer (Notre Dame), Enis Cetin (Bilkent Univ., Ankara), Jelena Kovacevic (Carnegie Mellon Univ.), Mike McCann (Carnegie Mellon Univ.), Imad Malhas (IrisGuard), Paul Pavlidis (Univ. of British Columbia), Fabio Roli (Univ. of Cagliari, Italy), Dimitri Samaras (Stony Brook Univ.), Ray Smith (Google), John Tsotsos (York Univ., Canada) and Greg Zelinsky (Stony Brook Univ.). Obviously, these people need not share all of my views on machine vision.

Author information

Corresponding author

Correspondence to Theo Pavlidis.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (docx 812 KB)


About this article

Cite this article

Pavlidis, T. The challenge of general machine vision. SIViP 8, 191–195 (2014).
