Growing Regression Tree Forests by Classification for Continuous Object Pose Estimation

International Journal of Computer Vision

Abstract

We propose a novel node splitting method for regression trees and incorporate it into the random regression forest framework. Unlike traditional binary splitting, where the splitting rule is selected from a predefined set of binary splitting rules by trial and error, the proposed node splitting method first finds clusters in the training data that at least locally minimize the empirical loss, without considering the input space. Splitting rules that preserve the found clusters as much as possible are then determined by casting the problem as a classification problem. Consequently, our new node splitting method enjoys more freedom in choosing the splitting rules, resulting in more efficient tree structures. In addition to the algorithm for the ordinary Euclidean target space, we present a variant that naturally handles a circular target space through the proper use of circular statistics. To deal with challenging, ambiguous image-based pose estimation problems, we also present a voting-based ensemble method using the mean shift algorithm. Furthermore, to address the data imbalance present in some of the datasets, we propose a bootstrap sampling method using a sample weighting technique. We apply the proposed random regression forest algorithm to head pose estimation, car direction estimation and pedestrian orientation estimation tasks, and demonstrate its competitive performance.
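The two ideas at the core of the abstract can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: it clusters scalar targets with plain 2-means (ignoring the inputs), then fits a nearest-centroid classifier in the input space to reproduce the cluster labels and uses it as the splitting rule. The nearest-centroid classifier is a stand-in chosen for brevity; the abstract does not say which classifier is used. All function names (`two_means_1d`, `fit_centroids`, `split_node`, `circular_mean`) are hypothetical. A circular mean is included to show why angular targets need circular statistics.

```python
import math


def two_means_1d(targets, iters=50):
    """Cluster scalar targets into two groups with k-means (k=2).

    Only the target values are used; the input features are ignored.
    """
    c0, c1 = min(targets), max(targets)  # deterministic initialisation
    labels = [0] * len(targets)
    for _ in range(iters):
        labels = [0 if abs(t - c0) <= abs(t - c1) else 1 for t in targets]
        g0 = [t for t, l in zip(targets, labels) if l == 0]
        g1 = [t for t, l in zip(targets, labels) if l == 1]
        if not g0 or not g1:
            break  # degenerate case: all targets fell into one cluster
        n0, n1 = sum(g0) / len(g0), sum(g1) / len(g1)
        if (n0, n1) == (c0, c1):
            break  # converged
        c0, c1 = n0, n1
    return labels


def fit_centroids(features, labels):
    """Fit a nearest-centroid classifier (assumed stand-in classifier)."""
    dim = len(features[0])
    sums, counts = {}, {}
    for x, l in zip(features, labels):
        acc = sums.setdefault(l, [0.0] * dim)
        counts[l] = counts.get(l, 0) + 1
        for j, v in enumerate(x):
            acc[j] += v
    return {l: [s / counts[l] for s in sums[l]] for l in sums}


def predict(centroids, x):
    """Return the label of the centroid closest to x."""
    return min(
        centroids,
        key=lambda l: sum((a - b) ** 2 for a, b in zip(x, centroids[l])),
    )


def split_node(features, targets):
    """Split a node: cluster the targets, then classify in input space."""
    labels = two_means_1d(targets)
    centroids = fit_centroids(features, labels)
    children = {l: [] for l in centroids}
    for i, x in enumerate(features):
        children[predict(centroids, x)].append(i)
    return centroids, children


def circular_mean(angles_deg):
    """Mean direction of angles in degrees, respecting wrap-around."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0
```

Calling `split_node(features, targets)` returns the fitted centroids and, per child, the indices of the samples routed there. The split follows the classifier's decision boundary rather than a threshold drawn from a predefined set of rules. For angular targets, `circular_mean([350.0, 10.0])` lies at the 0°/360° boundary, whereas the arithmetic mean would give 180°.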

Notes

  1. In the earlier version (Hara and Chellappa 2014) and in Pelleg and Moore (2000), the variance is incorrectly estimated because q is missing from the denominator. The results of the AKRF on Pointing’04 datasets have been updated; however, the difference is insignificant. The results on the EPFL Multi-view Car Dataset are unaffected because \(q=1\).

  2. The results of AKRF have been updated from Hara and Chellappa (2014) by fixing Eq. 11; however, the difference is insignificant.
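As a hedged illustration of the correction described in Note 1 (not a reproduction of the paper's Eq. 11, which is not shown on this page): for an isotropic Gaussian in a \(q\)-dimensional target space, squared distances accumulate over all \(q\) coordinates, so a per-coordinate variance estimate divides by \(q\). The symbols \(N\) (number of samples), \(K\) (number of clusters), \(t_i\) (target of sample \(i\)) and \(\mu_{(i)}\) (centroid of the cluster containing sample \(i\)) are assumptions introduced here for illustration.

```latex
\hat{\sigma}^2 = \frac{1}{q\,(N-K)} \sum_{i=1}^{N} \left\lVert t_i - \mu_{(i)} \right\rVert^2
```

With \(q=1\) the extra factor disappears, which is consistent with the statement that results on one-dimensional targets are unaffected.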

References

  • Andriluka, M., Roth, S., & Schiele, B. (2010). Monocular 3D pose estimation and tracking by detection. In CVPR 2010: IEEE conference on computer vision and pattern recognition.

  • Bailly, K., Milgram, M., & Phothisane, P. (2009). Head pose estimation by a stepwise nonlinear regression. In International conference on computer analysis of images and patterns.

  • Baltieri, D., Vezzani, R., & Cucchiara, R. (2012). People orientation recognition by mixtures of wrapped distributions on random trees. In European conference on computer vision. Heidelberg: Springer.

  • Berzal, F., Cubero, J. C., Marín, N., & Sánchez, D. (2004). Building multi-way decision trees with numerical attributes. Information Sciences, 165(1–2), 73–90.

  • Bissacco, A., Yang, M. H., & Soatto, S. (2007). Fast human pose estimation using appearance and motion via multi-dimensional boosting regression. In 2007 IEEE conference on computer vision and pattern recognition.

  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.

  • Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and regression trees. London: Chapman and Hall/CRC.

  • Cao, X., Wei, Y., Wen, F., & Sun, J. (2012). Face alignment by explicit shape regression. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Chang, C. C., & Lin, C. J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3), 27.

  • Chang-Chien, S. J., Hung, W. L., & Yang, M. S. (2012). On mean shift-based clustering for circular data. Soft Computing, 16(6), 1043–1060.

  • Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357.

  • Chen, C., Liaw, A., & Breiman, L. (2004). Using random forest to learn imbalanced data. UC Berkeley: Technical report, Department of Statistics.

  • Chen, C., Heili, A., & Odobez, J. M. (2011). Combined estimation of location and body pose in surveillance video. In International conference on advanced video and signal based surveillance (AVSS).

  • Cheng, Y. (1995). Mean shift, mode seeking, and clustering. PAMI, 17(8), 790–799.

  • Chou, P. A. (1991). Optimal partitioning for classification and regression trees. PAMI, 13(4), 340–354.

  • Comaniciu, D., & Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. PAMI, 24(5), 603–619.

  • Criminisi, A., & Shotton, J. (2013). Decision forests for computer vision and medical image analysis. New York: Springer.

  • Criminisi, A., Shotton, J., Robertson, D., & Konukoglu, E. (2010). Regression forests for efficient anatomy detection and localization in CT studies. In Medical computer vision. Recognition techniques and applications in medical imaging (Vol. 6533, pp. 106–117).

  • Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05).

  • Dantone, M., Gall, J., Fanelli, G., & Gool, L. V. (2012). Real-time facial feature detection using conditional regression forests. In 2012 IEEE conference on computer vision and pattern recognition (CVPR).

  • Dobra, A., & Gehrke, J. (2002). SECRET: A scalable linear regression tree algorithm. In Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining.

  • Dollár, P., Welinder, P., & Perona, P. (2010). Cascaded pose regression. In 2010 IEEE conference on computer vision and pattern recognition (CVPR).

  • Domingos, P. (1999). MetaCost: A general method for making classifiers cost-sensitive. In Proceedings of the 5th ACM SIGKDD international conference on Knowledge discovery and data mining.

  • Drucker, H., Burges, C. J. C., Kaufman, L., Smola, A., & Vapnik, V. (1996). Support vector regression machines. In Advances in neural information processing systems (NIPS).

  • Drummond, C., & Holte, R. C. (2003). C4.5, class imbalance, and cost sensitivity: Why under-sampling beats over-sampling. In ICML workshop on learning from imbalanced datasets II.

  • Duin, R. P. W. (1976). On the choice of smoothing parameters for Parzen estimators of probability density functions. IEEE Transactions on Computers, C–25(11), 1175–1179.

  • Enzweiler, M., & Gavrila, D. M. (2010). Integrated pedestrian classification and orientation estimation. In CVPR 2010: IEEE conference on computer vision and pattern recognition.

  • Fan, R. E., Chang, K. W., Hsieh, C. J., Wang, X. R., & Lin, C. J. (2008). LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9, 1871–1874.

  • Fanelli, G., Gall, J., & Gool, L. V. (2011). Real time head pose estimation with random regression forests. In 2011 IEEE conference on computer vision and pattern recognition (CVPR).

  • Fayyad, U. M., & Irani, K. B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the international joint conference on uncertainty in AI.

  • Fenzi, M., & Ostermann, J. (2014). Embedding geometry in generative models for pose estimation of object categories. In British machine vision conference.

  • Fenzi, M., Leal-Taixé, L., Rosenhahn, B., & Ostermann, J. (2013). Class generative models based on feature regression for pose estimation of object categories. In Proceedings of the IEEE conference on computer vision and pattern recognition.

  • Fenzi, M., Leal-Taixé, L., Ostermann, J., & Tuytelaars, T. (2015). Continuous pose estimation with a spatial ensemble of Fisher regressors. In Proceedings of the IEEE international conference on computer vision (ICCV).

  • Fisher, N. I. (1996). Statistical analysis of circular data. Cambridge: Cambridge University Press.

  • Fukunaga, K., & Hostetler, L. D. (1975). The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Transactions on Information Theory, 21(1), 32–40.

  • Gaile, G. L., & Burt, J. E. (1980). Directional statistics (concepts and techniques in modern geography). Norwich: Geo Abstracts Ltd.

  • Gall, J., & Lempitsky, V. (2009). Class-specific Hough forests for object detection. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Gandhi, T., & Trivedi, M. M. (2008). Image based estimation of pedestrian orientation for improving path prediction. In Intelligent vehicles symposium.

  • Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3–42.

  • Girshick, R., Shotton, J., Kohli, P., Criminisi, A., & Fitzgibbon, A. (2011). Efficient regression of general-activity human poses from depth images. In 2011 IEEE international conference on computer vision (ICCV).

  • Goto, K., Kidono, K., Kimura, Y., & Naito, T. (2011). Pedestrian detection and direction estimation by cascade detector with multi-classifiers utilizing feature interaction descriptor. In IEEE intelligent vehicles symposium (IV).

  • Gourier, N., Hall, D., & Crowley, J. L. (2004). Estimating face orientation from robust detection of salient facial structures. In ICPR international workshop on visual observation of deictic gestures.

  • Habbema, J. D. F., & Hermans, J. (1977). Selection of variables in discriminant analysis by F-statistic and error rate. Technometrics, 19(4), 487–493.

  • Haj, M. A., Gonzalez, J., & Davis, L. S. (2012). On partial least squares in head pose estimation: How to simultaneously deal with misalignment. In 2012 IEEE conference on computer vision and pattern recognition (CVPR).

  • Hara, K., & Chellappa, R. (2013). Computationally efficient regression on a dependency graph for human pose estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition.

  • Hara, K., & Chellappa, R. (2014). Growing regression forests by classification: Applications to object pose estimation. In The European conference on computer vision (ECCV).

  • He, K., Sigal, L., & Sclaroff, S. (2014). Parameterizing object detectors in the continuous pose space. In The European conference on computer vision (ECCV).

  • Herdtweck, C., & Curio, C. (2013). Monocular car viewpoint estimation with circular regression forests. In Intelligent vehicles symposium (IVS).

  • Ho, H. T., & Chellappa, R. (2012). Automatic head pose estimation using randomly projected dense SIFT descriptors. In 2012 19th IEEE international conference on image processing.

  • Huang, C., Ding, X., & Fang, C. (2010). Head pose estimation based on random forests for multiclass classification. In 2010 20th International conference on pattern recognition (ICPR).

  • Kafai, M., Miao, Y., & Okada, K. (2010). Directional mean shift and its application for topology classification of local 3D structures. In CVPR workshop.

  • Kashyap, R. L. (1977). A Bayesian comparison of different classes of dynamic models using empirical data. IEEE Transactions on Automatic Control, 22(5), 715–727.

  • Kobayashi, T., & Otsu, N. (2010). Von Mises-Fisher mean shift for clustering on a hypersphere. In 2010 20th International conference on pattern recognition (ICPR).

  • Kubat, M., Holte, R., & Matwin, S. (1997). Learning when negative examples abound. In Proceedings of ECML-97, 10th European conference on machine learning.

  • Loh, W. Y., & Vanichsetakul, N. (1988). Tree-structured classification via generalized discriminant analysis. Journal of the American Statistical Association, 83(403), 715–725.

  • Mardia, K. V., & Jupp, P. (2000). Directional statistics (2nd ed.). New York: Wiley.

  • Nakajima, C., Pontil, M., Heisele, B., & Poggio, T. (2003). Full-body person recognition system. Pattern Recognition, 36(9), 1997–2006.

  • Orozco, J., Gong, S., & Xiang, T. (2009). Head pose classification in crowded scenes. In Proceedings of the British machine vision conference (BMVC 2009).

  • Ozuysal, M., Lepetit, V., & Fua, P. (2009). Pose estimation for category specific multiview object localization. In 2009 IEEE conference on computer vision and pattern recognition (CVPR).

  • Pazzani, M., Merz, C., Murphy, P., Ali, K., Hume, T., & Brunk, C. (1994). Reducing misclassification costs. In Proceedings of the 11th international conference on machine learning.

  • Pelleg, D., & Moore, A. (2000). X-means: Extending K-means with efficient estimation of the number of clusters. In Proceedings of the 17th international conference on machine learning.

  • Redondo-Cabrera, C., Lopez-Sastre, R., & Tuytelaars, T. (2014). All together now: Simultaneous object detection and continuous pose estimation using a Hough forest with probabilistic locally enhanced voting. In 25th British machine vision conference (BMVC).

  • Rosipal, R., & Trejo, L. J. (2001). Kernel partial least squares regression in reproducing kernel Hilbert space. JMLR, 2, 97–123.

  • Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.

  • Shimizu, H., & Poggio, T. (2004). Direction estimation of pedestrian from multiple still images. In Intelligent vehicles symposium (IVS).

  • Sun, M., Kohli, P., & Shotton, J. (2012). Conditional regression forests for human pose estimation. In 2012 IEEE conference on computer vision and pattern recognition (CVPR).

  • Tao, J., & Klette, R. (2013). Integrated pedestrian and direction classification using a random decision forest. In ICCV Workshop.

  • Torgo, L., & Gama, J. (1996). Regression by classification. In Brazilian symposium on artificial intelligence.

  • Torgo, L., Ribeiro, R. P., Pfahringer, B., & Branco, P. (2013). SMOTE for regression. In Portuguese conference on artificial intelligence.

  • Torki, M., & Elgammal, A. (2011). Regression from local features for viewpoint and pose estimation. In 2011 International conference on computer vision.

  • Vapnik, V. (1998). Statistical learning theory. New York: Wiley.

  • Weiss, S. M., & Indurkhya, N. (1995). Rule-based machine learning methods for functional prediction. Journal of Artificial Intelligence Research, 3, 383–403.

  • Wu, K. L., & Yang, M. S. (2007). Mean shift-based clustering. Pattern Recognition, 40(11), 3035–3052.

  • Yan, Y., Ricci, E., Subramanian, R., Lanz, O., & Sebe, N. (2013). No matter where you are: Flexible graph-guided multi-task learning for multi-view head pose classification under target motion. In Proceedings of the IEEE international conference on computer vision.

  • Yang, L., Liu, J., & Tang, X. (2014). Object detection and viewpoint estimation with auto-masking neural network. In European conference on computer vision.

  • Zhang, H., El-Gaaly, T., Elgammal, A., & Jiang, Z. (2013). Joint object and pose recognition using homeomorphic manifold analysis. In Association for the advancement of artificial intelligence (AAAI).

  • Zhao, G., Takafumi, M., Shoji, K., & Kenji, M. (2012). Video based estimation of pedestrian walking direction for pedestrian protection system. Journal of Electronics (China), 29(1–2), 72–81.

  • Zhen, X., Wang, Z., Yu, M., & Li, S. (2015). Supervised descriptor learning for multi-output regression. In Proceedings of the IEEE conference on computer vision and pattern recognition.

Acknowledgments

This research was supported by a MURI Grant from the US Office of Naval Research under N00014-10-1-0934.

Author information

Corresponding author

Correspondence to Kota Hara.

Additional information

Communicated by Hiroshi Ishikawa, Takeshi Masuda, Yasuyo Kita and Katsushi Ikeuchi.

About this article

Cite this article

Hara, K., Chellappa, R. Growing Regression Tree Forests by Classification for Continuous Object Pose Estimation. Int J Comput Vis 122, 292–312 (2017). https://doi.org/10.1007/s11263-016-0942-1

  • DOI: https://doi.org/10.1007/s11263-016-0942-1
