Recognition in Terra Incognita

  • Sara Beery
  • Grant Van Horn
  • Pietro Perona
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11220)


It is desirable for detection and classification algorithms to generalize to unfamiliar environments, but suitable benchmarks for quantitatively studying this phenomenon are not yet available. We present a dataset designed to measure recognition generalization to novel environments. The images in our dataset are harvested from twenty camera traps deployed to monitor animal populations. Camera traps are fixed at one location, hence the background changes little across images; capture is triggered automatically, hence there is no human bias. The challenge is learning recognition in a handful of locations, and generalizing animal detection and classification to new locations where no training data is available. In our experiments, state-of-the-art algorithms show excellent performance when tested at the same location where they were trained. However, we find that generalization to new locations is poor, especially for classification systems. (The dataset is available at
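The evaluation protocol the abstract describes, training on a handful of camera-trap locations and testing on locations never seen during training, can be sketched as a location-level split rather than a random image-level split. The sketch below is illustrative only; the field names (`location`, `label`) are assumptions, not the dataset's actual schema.

```python
import random

def split_by_location(images, num_train_locations, seed=0):
    """Split a camera-trap dataset by location, not by image.

    Camera traps are fixed in place, so a random image-level split would
    let a model memorize each location's background. Holding out entire
    locations instead measures generalization to novel environments.
    """
    locations = sorted({im["location"] for im in images})
    rng = random.Random(seed)
    rng.shuffle(locations)
    train_locs = set(locations[:num_train_locations])
    # Training set: all images from the chosen training locations.
    train = [im for im in images if im["location"] in train_locs]
    # Test set: images from entirely new locations (novel backgrounds).
    novel_test = [im for im in images if im["location"] not in train_locs]
    return train, novel_test
```

A same-location test set (held-out images from the training locations) can be carved out of `train` afterwards to compare familiar-location versus novel-location accuracy, which is the gap the paper quantifies.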


Keywords: Recognition · Transfer learning · Domain adaptation · Context · Dataset · Benchmark



We would like to thank the USGS and NPS for providing data. This work was supported by NSF GRFP Grant No. 1745301; the views expressed are those of the authors and do not necessarily reflect the views of the NSF. Compute time was provided by an AWS Research Grant.

Supplementary material

Supplementary material 1: 474218_1_En_28_MOESM1_ESM.pdf (PDF, 210 KB)



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Caltech, Pasadena, USA
