
Mining Mid-level Features for Image Classification



Mid-level or semi-local features learnt using class-level information are potentially more distinctive than the traditional low-level local features constructed in a purely bottom-up fashion. At the same time they preserve some of the robustness properties with respect to occlusions and image clutter. In this paper we propose a new and effective scheme for extracting mid-level features for image classification, based on relevant pattern mining. In particular, we mine relevant patterns of local compositions of densely sampled low-level features. We refer to the new set of obtained patterns as Frequent Local Histograms or FLHs. During this process, we pay special attention to keeping all the local histogram information and to selecting the most relevant reduced set of FLH patterns for classification. The careful choice of the visual primitives and an extension to exploit both local and global spatial information allow us to build powerful bag-of-FLH-based image representations. We show that these bag-of-FLHs are more discriminative than traditional bag-of-words and yield state-of-the-art results on various image classification benchmarks, including Pascal VOC.
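The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions that the abstract does not state: the image is taken to be a small grid of already-quantised visual-word indices, each item is encoded as a (visual word, count) pair so that the local histogram information is kept, and the mining step is a brute-force enumeration rather than an efficient itemset miner. The relevance-based selection of patterns is omitted, and the function names `local_histograms`, `frequent_patterns` and `bag_of_flh` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_histograms(word_map, radius=1):
    """Build one transaction per grid position.

    Each transaction is the local histogram of visual words in the
    (2*radius+1) x (2*radius+1) neighbourhood, stored as a frozenset of
    (visual_word, count) items (an assumed encoding).
    """
    rows, cols = len(word_map), len(word_map[0])
    transactions = []
    for r in range(rows):
        for c in range(cols):
            hist = Counter()
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        hist[word_map[rr][cc]] += 1
            transactions.append(frozenset(hist.items()))
    return transactions


def frequent_patterns(transactions, min_support, max_size=2):
    """Brute-force frequent itemset mining (illustrative only).

    Counts every itemset of up to max_size items and keeps those that
    occur in at least min_support transactions.
    """
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {p: n for p, n in counts.items() if n >= min_support}


def bag_of_flh(transactions, patterns):
    """Describe an image by how often each mined pattern occurs in it."""
    return [sum(1 for t in transactions if set(p) <= t) for p in patterns]


if __name__ == "__main__":
    # Toy 2x2 "image" of visual-word indices (made-up data).
    toy = [[0, 0], [0, 1]]
    tx = local_histograms(toy, radius=1)
    pats = frequent_patterns(tx, min_support=4)
    print(bag_of_flh(tx, sorted(pats)))
```

In practice the brute-force enumeration would be replaced by a dedicated frequent itemset miner, since the number of candidate itemsets grows quickly with the neighbourhood size and vocabulary.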






The authors acknowledge the support of the iMinds Impact project Beeldcanon, the FP7 ERC Starting Grant 240530 COGNIMUND, and the PASCAL 2 Network of Excellence.

Author information

Correspondence to Basura Fernando.

Additional information

Communicated by M. Hebert.


About this article

Cite this article

Fernando, B., Fromont, E. & Tuytelaars, T. Mining Mid-level Features for Image Classification. Int J Comput Vis 108, 186–203 (2014). https://doi.org/10.1007/s11263-014-0700-1



Keywords

  • Frequent itemset mining
  • Image classification
  • Discriminative patterns
  • Mid-level features