
Detection of engineering vehicles in high-resolution monitoring images

Frontiers of Information Technology & Electronic Engineering

Abstract

This paper presents a novel formulation for detecting objects composed of articulated rigid bodies, particularly engineering vehicles, in high-resolution monitoring images. Such images contain a very large number of pixels, most of which belong to the background, so our method works in two phases. In the first, coarse detection phase, we build a descriptor based on histograms of oriented gradients (HOG) augmented with color frequency information, and use a linear support vector machine (SVM) to rapidly find image patches that may contain object parts. This phase is deliberately tuned for a low false negative rate at the cost of a high false positive rate, so that few true object parts are discarded early. In the second phase, a refinement classification determines which candidate patches actually contain objects. Here we enlarge each image patch, using models of the object parts, so that it covers the complete object, and then apply an accelerated and improved salient mask to improve the performance of the dense scale-invariant feature transform (SIFT) descriptor. The detection process returns the absolute positions of the detected objects in the original images. We have applied our method to three datasets to demonstrate its effectiveness.
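The coarse phase described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the descriptor is a simplified HOG (magnitude-weighted, unsigned orientation histograms per cell, without block normalization) concatenated with per-channel color histograms as a stand-in for the color frequency information; the linear SVM weights `w` and `b` are assumed to come from an already trained model; and a fixed enlargement factor replaces the part-model-based enlargement, which the abstract does not detail. All function names are hypothetical, and the refinement classifier (dense SIFT with a salient mask) is not shown.

```python
import numpy as np

def patch_descriptor(patch, n_orient=9, cell=8, color_bins=8):
    """Hypothetical simplified HOG + color-frequency descriptor for an RGB patch."""
    gray = patch.astype(np.float64).mean(axis=2)
    gy, gx = np.gradient(gray)  # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation in [0, 180)
    bins = np.minimum((ang / (180.0 / n_orient)).astype(int), n_orient - 1)
    h_cells, w_cells = gray.shape[0] // cell, gray.shape[1] // cell
    hog = np.zeros((h_cells, w_cells, n_orient))
    for i in range(h_cells):
        for j in range(w_cells):
            sl = (slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell))
            # Magnitude-weighted orientation histogram of one cell.
            hog[i, j] = np.bincount(bins[sl].ravel(), weights=mag[sl].ravel(),
                                    minlength=n_orient)
    hog = hog.ravel()
    hog /= np.linalg.norm(hog) + 1e-8  # global L2 normalization (assumption)
    # Color frequency: per-channel histograms, normalized to sum to 1 overall.
    color = np.concatenate([
        np.histogram(patch[..., c], bins=color_bins, range=(0, 256))[0]
        for c in range(patch.shape[2])
    ]).astype(np.float64)
    color /= color.sum() + 1e-8
    return np.concatenate([hog, color])

def high_recall_threshold(pos_scores, max_fnr=0.02):
    """Choose an SVM score threshold so that at most `max_fnr` of the validation
    positives fall below it: few false negatives, many false positives accepted."""
    s = np.sort(np.asarray(pos_scores, dtype=np.float64))
    k = int(np.floor(max_fnr * len(s)))  # number of positives we allow to miss
    return s[k]

def coarse_detect(image, w, b, thr, win=32, stride=16):
    """Slide a window over the full image, score each patch with the linear SVM
    decision function w.x + b, and keep every window scoring >= thr.
    Boxes are (x, y, width, height, score) in absolute image coordinates."""
    H, W = image.shape[:2]
    boxes = []
    for y in range(0, H - win + 1, stride):
        for x in range(0, W - win + 1, stride):
            feat = patch_descriptor(image[y:y + win, x:x + win])
            score = float(np.dot(w, feat) + b)
            if score >= thr:
                boxes.append((x, y, win, win, score))
    return boxes

def enlarge_box(box, scale, img_h, img_w):
    """Grow a candidate part box about its center (fixed `scale` is an assumption)
    so the refinement stage sees the whole object; clip to the image bounds."""
    x, y, bw, bh = box[:4]
    cx, cy = x + bw / 2.0, y + bh / 2.0
    nw, nh = bw * scale, bh * scale
    x0 = max(0, int(round(cx - nw / 2.0)))
    y0 = max(0, int(round(cy - nh / 2.0)))
    x1 = min(img_w, int(round(cx + nw / 2.0)))
    y1 = min(img_h, int(round(cy + nh / 2.0)))
    return (x0, y0, x1 - x0, y1 - y0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(128, 160, 3), dtype=np.uint8)
    dim = patch_descriptor(img[:32, :32]).size
    w, b = rng.normal(size=dim), 0.0  # hypothetical weights from a trained linear SVM
    thr = high_recall_threshold(rng.normal(size=100), max_fnr=0.05)
    cands = coarse_detect(img, w, b, thr)
    regions = [enlarge_box(c, 2.0, *img.shape[:2]) for c in cands]
    print(len(cands), "candidate patches")
```

Because the boxes are recorded in the coordinates of the full image rather than of the window, the positions handed to the refinement stage, and ultimately reported, are already absolute. Lowering `max_fnr` trades more candidate patches (and refinement cost) for fewer missed object parts.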



Author information

Corresponding author

Correspondence to Yin Zhang.

Additional information

Project supported by the China Knowledge Centre for Engineering Sciences and Technology (No. CKCEST-2014-1-2), the Zhejiang Provincial Natural Science Foundation of China (No. LY14F020027), and the National Natural Science Foundation of China (No. 61272304).

ORCID: Xun LIU, http://orcid.org/0000-0002-3045-2943

About this article

Cite this article

Liu, X., Zhang, Y., Zhang, Sy. et al. Detection of engineering vehicles in high-resolution monitoring images. Frontiers Inf Technol Electronic Eng 16, 346–357 (2015). https://doi.org/10.1631/FITEE.1500026

  • DOI: https://doi.org/10.1631/FITEE.1500026
