
International Journal of Computer Vision, Volume 126, Issue 8, pp 822–854

Robust Detection and Affine Rectification of Planar Homogeneous Texture for Scene Understanding

  • Shahzor Ahmad
  • Loong-Fah Cheong
Article

Abstract

Man-made environments tend to be abundant with planar homogeneous texture, which manifests as regularly repeating scene elements along a plane. In this work, we propose to exploit such structure to facilitate high-level scene understanding. By robustly fitting a texture projection model to optimal dominant frequency estimates in image patches, we arrive at a projective-invariant method to localize such generic, semantically meaningful regions in multi-planar scenes. The recovered projective parameters also allow an affine-ambiguous rectification in real-world images marred by outliers, room clutter, and photometric severities. Comprehensive qualitative and quantitative evaluations show that our method outperforms existing representative work for both rectification and detection. The potential of homogeneous texture for two scene understanding tasks is then explored. First, in environments where vanishing points cannot be reliably detected, or the Manhattan assumption is not satisfied, homogeneous texture detected by the proposed approach is shown to provide alternative cues to obtain a scene geometric layout. Second, low-level feature descriptors extracted upon affine rectification of detected texture are found to be not only class-discriminative but also complementary to features without rectification, improving recognition performance on the 67-category MIT benchmark of indoor scenes. One of our configurations involving deep ConvNet features outperforms most current state-of-the-art work on this dataset, achieving a classification accuracy of 76.90%. The approach is additionally validated on a set of 31 categories (mostly outdoor man-made environments exhibiting regular, repeating structure), a subset of the large-scale Places2 scene dataset.

Keywords

  • Homogeneous texture
  • Planar rectification
  • Invariant texture detection
  • Scene geometric layout
  • Scene classification
  • Deep features


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. College of Electrical and Mechanical Engineering, National University of Sciences and Technology, Islamabad, Pakistan
  2. Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore
