Mining Mid-level Features for Image Classification

International Journal of Computer Vision, Volume 108, Issue 3, pp 186–203

Abstract



Mid-level or semi-local features learned using class-level information are potentially more distinctive than traditional low-level local features constructed in a purely bottom-up fashion, while preserving some of the latter's robustness to occlusions and image clutter. In this paper we propose a new and effective scheme for extracting mid-level features for image classification, based on relevant pattern mining. In particular, we mine relevant patterns of local compositions of densely sampled low-level features. We refer to the resulting patterns as Frequent Local Histograms, or FLHs. During this process, we pay special attention to preserving all the local histogram information and to selecting the most relevant reduced set of FLH patterns for classification. The careful choice of visual primitives, together with an extension that exploits both local and global spatial information, allows us to build powerful bag-of-FLH-based image representations. We show that these bag-of-FLHs are more discriminative than traditional bag-of-words representations and yield state-of-the-art results on various image classification benchmarks, including Pascal VOC.


Keywords: Frequent itemset mining · Image classification · Discriminative patterns · Mid-level features
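The core idea, mining frequent itemsets over "transactions" built from local compositions of low-level features, can be illustrated with a minimal Python sketch. This is only a toy version of the approach: the transaction construction, the minimum-support threshold, and all function names here are illustrative assumptions, and the paper's actual FLHs retain full local histogram information and apply relevance-based pattern selection rather than counting mere item presence.

```python
from itertools import combinations
from collections import Counter

def mine_frequent_itemsets(transactions, min_support, max_size=3):
    """Enumerate itemsets that occur in at least `min_support` transactions
    (a naive Apriori-style level-wise enumeration, for illustration only)."""
    frequent = {}
    for size in range(1, max_size + 1):
        counts = Counter()
        for t in transactions:
            for combo in combinations(sorted(t), size):
                counts[combo] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        if not level:
            break  # Apriori property: no larger itemset can be frequent
        frequent.update(level)
    return frequent

def bag_of_patterns(image_transactions, patterns):
    """Histogram counting, for each mined pattern, how many of an image's
    local neighborhoods (transactions) contain it."""
    return [sum(1 for t in image_transactions if set(p) <= set(t))
            for p in patterns]

# Toy example: each transaction is the set of dominant visual-word ids
# observed in one local neighborhood of an image.
transactions = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 2, 3}, {1, 3}]
freq = mine_frequent_itemsets(transactions, min_support=3)
patterns = sorted(freq)                    # deterministic pattern order
rep = bag_of_patterns(transactions, rep_patterns := patterns)
```

With these toy transactions, all three pairs {1,2}, {1,3}, {2,3} reach the support threshold while the triple {1,2,3} (support 2) does not, and `rep` is the image-level pattern histogram used in place of a plain bag-of-words.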



Acknowledgments

The authors acknowledge the support of the iMinds Impact project Beeldcanon, the FP7 ERC Starting Grant 240530 COGNIMUND and the PASCAL 2 Network of Excellence.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

Basura Fernando (1), Elisa Fromont (2), Tinne Tuytelaars (1)

  1. KU Leuven, ESAT-PSI, iMinds, Heverlee, Belgium
  2. Laboratoire Hubert-Curien, UMR CNRS 5516, Université de Lyon, Université de St-Étienne, Saint-Étienne, France
