
International Journal of Computer Vision, Volume 119, Issue 2, pp 159–178

Hierarchical Adaptive Structural SVM for Domain Adaptation

  • Jiaolong Xu
  • Sebastian Ramos
  • David Vázquez
  • Antonio M. López

Abstract

A key topic in classification is the accuracy loss produced when the data distribution in the training (source) domain differs from that in the testing (target) domain. This is recognized as a highly relevant problem for many computer vision tasks such as image classification, object detection, and object category recognition. In this paper, we present a novel domain adaptation method that leverages multiple target domains (or sub-domains) in a hierarchical adaptation tree. The core idea is to exploit the commonalities and differences of the jointly considered target domains. Given the relevance of structural SVM (SSVM) classifiers, we apply our idea to the adaptive SSVM (A-SSVM; Xu et al., IEEE Trans Pattern Anal Mach Intell 36(12):2367–2380, 2014a), which only requires the target-domain samples together with the existing source-domain classifier to perform the desired adaptation. Altogether, we term our proposal hierarchical A-SSVM (HA-SSVM). As a proof of concept, we use HA-SSVM for pedestrian detection, object category recognition, and face recognition. For the former, we apply HA-SSVM to the deformable part-based model (DPM; Felzenszwalb et al., IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645, 2010), while for the latter two, HA-SSVM is applied to multi-category classifiers. We show that HA-SSVM is effective in increasing the detection/recognition accuracy with respect to adaptation strategies that ignore the structure of the target data. Since the sub-domains of the target data are not always known a priori, we also show how HA-SSVM can incorporate sub-domain discovery for object category recognition.
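For context, here is a minimal sketch of the mechanism the abstract refers to, written in the spirit of the A-SSVM formulation of Xu et al. (2014a); the exact HA-SSVM objective, regularization, and optimization details are those given in the paper itself, so the equations below are illustrative rather than definitive, and the symbols $\Phi$ (joint feature map), $\Delta$ (task loss), and $C$ are standard SSVM notation rather than quantities defined on this page. A-SSVM keeps the source-domain weight vector $\mathbf{w}^{S}$ fixed and learns only a perturbation $\Delta\mathbf{w}$ from the target-domain samples $\{(x_i, y_i)\}_{i=1}^{N}$, using a structured hinge loss:

$$
\min_{\Delta\mathbf{w}} \; \frac{1}{2}\lVert \Delta\mathbf{w} \rVert^{2}
\;+\; C \sum_{i=1}^{N} \max_{\hat{y}} \Big[ \Delta(y_i,\hat{y})
\;+\; \big\langle \mathbf{w}^{S} + \Delta\mathbf{w},\; \Phi(x_i,\hat{y}) - \Phi(x_i,y_i) \big\rangle \Big],
$$

so that the adapted classifier is $\mathbf{w} = \mathbf{w}^{S} + \Delta\mathbf{w}$. In the hierarchical variant, each node $n$ of the adaptation tree carries its own perturbation $\Delta\mathbf{w}_{n}$, and the classifier for a leaf sub-domain $t$ composes the perturbations along the path from the root to $t$:

$$
\mathbf{w}_{t} \;=\; \mathbf{w}^{S} \;+\; \sum_{n \,\in\, \mathrm{path}(t)} \Delta\mathbf{w}_{n},
$$

so sub-domains that share ancestors share part of the adaptation, while leaf-specific perturbations account for their differences.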

Keywords

Target Domain · Domain Adaptation · Source Domain · Pedestrian Detection · Domain Discovery


Acknowledgments

This work is supported by the Spanish MEC Project TRA2014-57088-C2-1-R, the Spanish DGT Project SPIP2014-01352, the Generalitat de Catalunya Project 2014-SGR-1506, Jiaolong Xu’s Chinese Scholarship Council (CSC) Grant No. 2011611023, and Sebastian Ramos’ FPI Grant BES-2012-058280. Finally, we also thank the NVIDIA Corporation for its generous support in the form of several GPU hardware units.

References

  1. Aytar, Y., & Zisserman, A. (2011). Tabula rasa: Model transfer for object category detection. In Proceedings of international conference on computer vision, Singapore.
  2. Behley, J., Steinhage, V., & Cremers, A. B. (2013). Laser-based segment classification using a mixture of bag-of-words. In IEEE international conference on intelligent robots and systems, New York.
  3. Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., & Vaughan, J. (2009). A theory of learning from different domains. Machine Learning, 79(1), 151–175.
  4. Bergamo, A., & Torresani, L. (2010). Exploring weakly-labeled web images to improve object classification: A domain adaptation approach. In Advances in neural information processing systems, Vancouver.
  5. Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE conference on computer vision and pattern recognition, San Diego.
  6. Daumé III, H. (2007). Frustratingly easy domain adaptation. In Meeting of the association for computational linguistics, Prague.
  7. Daumé III, H. (2009). Bayesian multitask learning with latent hierarchies. In UAI, Montreal.
  8. Deng, J., Krause, J., Berg, A., & Li, F.-F. (2012). Hedging your bets: Optimizing accuracy-specificity trade-offs in large scale visual recognition. In IEEE conference on computer vision and pattern recognition, Washington.
  9. Dollár, P., Wojek, C., Schiele, B., & Perona, P. (2012). Pedestrian detection: An evaluation of the state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4), 743–761.
  10. Duan, L., Tsang, I. W., Xu, D., & Chua, T.-S. (2009). Domain adaptation from multiple sources via auxiliary classifiers. In International conference on machine learning, Montreal.
  11. Duan, L., Xu, D., & Tsang, I. W. (2012). Learning with augmented features for heterogeneous domain adaptation. In International conference on machine learning, Edinburgh.
  12. Ess, A., Leibe, B., & Gool, L. V. (2007). Depth and appearance for mobile scene analysis. In International conference on computer vision, Rio de Janeiro.
  13. Felzenszwalb, P., Girshick, R., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
  14. Finkel, J., & Manning, C. D. (2009). Hierarchical Bayesian domain adaptation. In NAACL, Colorado.
  15. Geiger, A., Wojek, C., & Urtasun, R. (2011). Joint 3D estimation of objects and scene layout. In Advances in neural information processing systems, Granada.
  16. Geiger, A., Lenz, P., & Urtasun, R. (2012). Are we ready for autonomous driving? The KITTI vision benchmark suite. In IEEE conference on computer vision and pattern recognition, Washington.
  17. Georghiades, A., Belhumeur, P., & Kriegman, D. (2001). From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6), 643–660.
  18. Girshick, R. (2012). From rigid templates to grammars: Object detection with structured models. Ph.D. thesis, The University of Chicago, Chicago.
  19. Girshick, R., Felzenszwalb, P., & McAllester, D. (2012). Discriminatively trained deformable part models, release 5. http://www.people.cs.uchicago.edu/rbg/latent-release5/.
  20. Gong, B., Shi, Y., Sha, F., & Grauman, K. (2012). Geodesic flow kernel for unsupervised domain adaptation. In IEEE conference on computer vision and pattern recognition, Providence.
  21. Gong, B., Grauman, K., & Sha, F. (2013a). Connecting the dots with landmarks: Discriminatively learning domain-invariant features for unsupervised domain adaptation. In International conference on machine learning, Atlanta.
  22. Gong, B., Grauman, K., & Sha, F. (2013b). Reshaping visual datasets for domain adaptation. In Advances in neural information processing systems, Lake Tahoe.
  23. Gong, B., Grauman, K., & Sha, F. (2014). Learning kernels for unsupervised domain adaptation with applications to visual object recognition. International Journal of Computer Vision, 109(1–2), 3–27.
  24. Gopalan, R., Li, R., & Chellappa, R. (2011). Domain adaptation for object recognition: An unsupervised approach. In International conference on computer vision, Barcelona.
  25. Gourier, N., Hall, D., & Crowley, J. L. (2004). Estimating face orientation from robust detection of salient facial features. In International conference on pattern recognition, New York.
  26. Griffin, G., Holub, A., & Perona, P. (2007). Caltech-256 object category dataset. Technical report, California Institute of Technology.
  27. Hoffman, J., Kulis, B., Darrell, T., & Saenko, K. (2012). Discovering latent domains for multisource domain adaptation. In European conference on computer vision, Florence.
  28. Hoffman, J., Rodner, E., Donahue, J., Saenko, K., & Darrell, T. (2013). Efficient learning of domain invariant image representations. In International conference on learning representations, Arizona.
  29. Hoffman, J., Rodner, E., Donahue, J., Kulis, B., & Saenko, K. (2014). Asymmetric and category invariant feature transformations for domain adaptation. International Journal of Computer Vision, 109(1–2), 28–41.
  30. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., & Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. arXiv:1408.5093.
  31. Jiang, J. (2008). A literature survey on domain adaptation of statistical classifiers. Technical report, School of Information Systems, Singapore Management University.
  32. Kan, M., Wu, J., Shan, S., & Chen, X. (2014). Domain adaptation for face recognition: Targetize source domain bridged by common subspace. International Journal of Computer Vision, 109(1–2), 94–109.
  33. Kulis, B., Saenko, K., & Darrell, T. (2011). What you saw is not what you get: Domain adaptation using asymmetric kernel transforms. In IEEE conference on computer vision and pattern recognition, Washington.
  34. Lee, K., Ho, J., & Kriegman, D. (2005). Acquiring linear subspaces for face recognition under variable lighting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5), 684–698.
  35. Lu, B., Chellappa, R., & Nasrabadi, N. M. (2015). Incremental dictionary learning for unsupervised domain adaptation. In British machine vision conference, Swansea.
  36. Mansour, Y., Mohri, M., & Rostamizadeh, A. (2008). Domain adaptation with multiple sources. In Advances in neural information processing systems, Vancouver.
  37. Mirrashed, F., & Rastegar, M. (2013). Domain adaptive classification. In International conference on computer vision, Sydney.
  38. Mosek. (2013). Optimization toolkit. http://www.mosek.com.
  39. Nguyen, H., Ho, H. T., Patel, V., & Chellappa, R. (2015). DASH-N: Joint hierarchical domain adaptation and feature learning. IEEE Transactions on Image Processing, 24(12), 5479–5491.
  40. Ni, J., Qiu, Q., & Chellappa, R. (2013). Subspace interpolation via dictionary learning for unsupervised domain adaptation. In IEEE conference on computer vision and pattern recognition, Oregon.
  41. Pan, S., & Yang, Q. (2009). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.
  42. Park, D., Ramanan, D., & Fowlkes, C. (2010). Multiresolution models for object detection. In European conference on computer vision, Crete.
  43. Pepikj, B., Stark, M., Gehler, P., & Schiele, B. (2015). Multi-view and 3D deformable part models. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  44. Premebida, C., Carreira, J., Batista, J., & Nunes, U. (2014). Pedestrian detection combining RGB and dense LIDAR data. In IEEE international conference on intelligent robots and systems, Chicago.
  45. Saenko, K., Kulis, B., Fritz, M., & Darrell, T. (2010). Adapting visual category models to new domains. In European conference on computer vision, Hersonissos, Heraklion, Crete.
  46. Tang, K., Ramanathan, V., Fei-Fei, L., & Koller, D. (2012). Shifting weights: Adapting object detectors from image to video. In Advances in neural information processing systems, Lake Tahoe.
  47. Teh, Y., Daumé III, H., & Roy, D. (2007). Bayesian agglomerative clustering with coalescents. In Advances in neural information processing systems, Vancouver.
  48. Vázquez, D., López, A., & Ponsa, D. (2012). Unsupervised domain adaptation of virtual and real worlds for pedestrian detection. In International conference on pattern recognition, Tsukuba.
  49. Vázquez, D., López, A., Marín, J., Ponsa, D., & Gerónimo, D. (2014). Virtual and real world adaptation for pedestrian detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(4), 797–809.
  50. Xu, J., Ramos, S., Vázquez, D., & López, A. (2014a). Domain adaptation of deformable part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(12), 2367–2380.
  51. Xu, J., Vázquez, D., López, A., Marín, J., & Ponsa, D. (2014b). Learning a part-based pedestrian detector in a virtual world. IEEE Transactions on Intelligent Transportation Systems, 15(5), 2121–2131.
  52. Xu, H., Zheng, J., & Chellappa, R. (2015). Bridging the domain shift by domain adaptive dictionary learning. In British machine vision conference, Swansea.
  53. Yang, J., Yan, R., & Hauptmann, A. (2007). Cross-domain video concept detection using adaptive SVMs. In ACM multimedia, Augsburg.
  54. Yebes, J., Bergasa, L., & García, M. (2015). Visual object recognition with 3D-aware features in KITTI urban scenes. Sensors, 15(4), 9228–9250.
  55. Zhu, L., Chen, Y., Yuille, A., & Freeman, W. (2010). Latent hierarchical structural learning for object detection. In IEEE conference on computer vision and pattern recognition, San Francisco.

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Jiaolong Xu (1, 2)
  • Sebastian Ramos (1, 2)
  • David Vázquez (1)
  • Antonio M. López (1, 2)

  1. Computer Vision Center, Barcelona, Spain
  2. Department of Computer Science, Universitat Autònoma de Barcelona, Barcelona, Spain
