Towards Transparent Systems: Semantic Characterization of Failure Modes

  • Aayush Bansal
  • Ali Farhadi
  • Devi Parikh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8694)


Today’s computer vision systems are not perfect. They fail frequently. Even worse, they fail abruptly and seemingly inexplicably. We argue that making our systems more transparent via an explicit, human-understandable characterization of their failure modes is desirable. We propose characterizing the failure modes of a vision system using semantic attributes. For example, a face recognition system may say “If the test image is blurry, or the face is not frontal, or the person to be recognized is a young white woman with heavy makeup, I am likely to fail.” This information can be used at training time by researchers to design better features, build better models, or collect more focused training data. It can also be used at test time by a downstream machine or human user to know when to ignore the output of the system, in turn making it more reliable. To generate such a “specification sheet”, we discriminatively cluster incorrectly classified images in the semantic attribute space using L1-regularized weighted logistic regression. We show that our specification sheets can predict oncoming failures for face and animal species recognition better than several strong baselines. We also show that lay people can easily follow our specification sheets.
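As a rough illustration of the building block named in the abstract, the sketch below fits an L1-regularized weighted logistic regression that predicts failure from attribute values and then lists the attributes that receive nonzero weight. It is a minimal sketch under stated assumptions, not the authors' implementation: the attribute names, the toy data, the +1/-1 attribute encoding, the class-balancing sample weights, the proximal gradient solver, and every function name are hypothetical, and the discriminative clustering step is omitted entirely.

```python
import itertools
import math

# Hypothetical attribute names; the toy system is assumed to fail whenever
# the image is blurry or the face is not frontal.
ATTRIBUTES = ["blurry", "non_frontal", "smiling", "outdoor"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_weighted_logreg(X, y, sample_weight, l1=0.05, lr=0.5, n_iter=2000):
    """Minimize the weighted mean logistic loss plus l1 * ||w||_1 by
    proximal gradient descent. The bias term is not penalized."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    total = sum(sample_weight)
    for _ in range(n_iter):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi, si in zip(X, y, sample_weight):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = si * (p - yi) / total
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Gradient step on the smooth loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * l1) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def failure_probability(x, w, b):
    """Predicted probability that the system fails on attribute vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data: every combination of four attributes, encoded +1 (present) or -1.
X = [list(combo) for combo in itertools.product([-1.0, 1.0], repeat=4)]
y = [1 if (x[0] > 0 or x[1] > 0) else 0 for x in X]  # 1 = misclassified

# Assumed weighting scheme: balance failures against successes.
n_fail = sum(y)
n_ok = len(y) - n_fail
sample_weight = [len(y) / (2.0 * n_fail) if yi else len(y) / (2.0 * n_ok) for yi in y]

weights, bias = fit_l1_weighted_logreg(X, y, sample_weight)

# Attributes with nonzero weight form a crude one-line "specification sheet".
risky = [name for name, wj in zip(ATTRIBUTES, weights) if wj > 0]
print("Likely to fail if the image is: " + " or ".join(risky))
```

The L1 penalty matters for readability: it drives the weights of uninformative attributes to exactly zero, so the resulting description mentions only a few attributes that a lay reader can check against an image.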


Keywords: Vision System, Failure Mode, Discriminative Function, Semantic Attribute, Failure Prediction




Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Aayush Bansal, Carnegie Mellon University, Pittsburgh, USA
  • Ali Farhadi, University of Washington, Seattle, USA
  • Devi Parikh, Virginia Tech, Blacksburg, USA
