Abstract
This paper presents a novel formulation for detecting articulated rigid-body objects, particularly engineering vehicles, in high-resolution monitoring images. Such images contain a very large number of pixels, most of which belong to the background. Our method first extracts candidate object patches from the monitoring images using a coarse detection process. In this phase, we build a descriptor that is based on histograms of oriented gradients and contains color frequency information. A linear support vector machine then rapidly detects many image patches that may contain object parts, with a low false negative rate at the cost of a high false positive rate. In the second phase, a refinement classification determines which patches actually contain objects. In this stage, we use models of the object parts to enlarge the image patches so that they include the complete object. An accelerated and improved salient mask is then used to improve the performance of the dense scale-invariant feature transform descriptor. The detection process returns the absolute positions of the detected objects in the original images. We apply our method to three datasets to demonstrate its effectiveness.
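The two-phase, coarse-to-fine flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, window size, stride, margin, and thresholds are all hypothetical, and the two scoring functions are placeholders for the coarse classifier (a gradient-histogram descriptor with a linear support vector machine) and the refinement classifier (a masked dense descriptor). Only the control flow is shown: a permissive first pass over all patches, patch enlargement, a stricter second pass, and results reported in original-image coordinates.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]    # (x, y, width, height) in original-image pixels
Scorer = Callable[[Image], float]  # placeholder for a trained classifier


def sliding_windows(width: int, height: int, size: int, stride: int) -> List[Box]:
    """Enumerate square patches that fit entirely inside the image."""
    return [(x, y, size, size)
            for y in range(0, height - size + 1, stride)
            for x in range(0, width - size + 1, stride)]


def crop(image: Image, box: Box) -> Image:
    """Cut the pixels covered by `box` out of the image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def expand(box: Box, margin: int, width: int, height: int) -> Box:
    """Grow a patch by `margin` pixels on each side, clipped to the image."""
    x, y, w, h = box
    x0, y0 = max(0, x - margin), max(0, y - margin)
    x1, y1 = min(width, x + w + margin), min(height, y + h + margin)
    return (x0, y0, x1 - x0, y1 - y0)


def detect(image: Image, coarse_score: Scorer, fine_score: Scorer,
           size: int = 4, stride: int = 4, margin: int = 2,
           coarse_thresh: float = 0.1, fine_thresh: float = 0.2) -> List[Box]:
    """Two-phase detection; all default values are illustrative assumptions."""
    height, width = len(image), len(image[0])

    # Phase 1: cheap scorer with a permissive threshold, so few true patches
    # are missed even though many false candidates pass.
    candidates = [box for box in sliding_windows(width, height, size, stride)
                  if coarse_score(crop(image, box)) > coarse_thresh]

    # Phase 2: enlarge each candidate so it can cover the whole object, then
    # apply the stricter scorer. Boxes stay in original-image coordinates.
    detections: List[Box] = []
    for box in candidates:
        enlarged = expand(box, margin, width, height)
        if fine_score(crop(image, enlarged)) > fine_thresh and enlarged not in detections:
            detections.append(enlarged)
    return detections


def mean_intensity(patch: Image) -> float:
    """Toy stand-in scorer: average pixel value of the patch."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)
```

Calling `detect(image, mean_intensity, mean_intensity)` on a small synthetic image exercises both phases. In practice the two scorers would be different trained classifiers, with the second being more expensive and run only on the candidates that survive the first.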
Additional information
Project supported by the China Knowledge Centre for Engineering Sciences and Technology (No. CKCEST-2014-1-2), the Zhejiang Provincial Natural Science Foundation of China (No. LY14F020027), and the National Natural Science Foundation of China (No. 61272304)
ORCID: Xun LIU, http://orcid.org/0000-0002-3045-2943
About this article
Cite this article
Liu, X., Zhang, Y., Zhang, Sy. et al. Detection of engineering vehicles in high-resolution monitoring images. Frontiers Inf Technol Electronic Eng 16, 346–357 (2015). https://doi.org/10.1631/FITEE.1500026
DOI: https://doi.org/10.1631/FITEE.1500026