The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13668)

Included in the following conference series: European Conference on Computer Vision (ECCV)

Abstract

We present the Caltech Fish Counting Dataset (CFC), a large-scale dataset for detecting, tracking, and counting fish in sonar videos. We identify sonar videos as a rich source of data for advancing low signal-to-noise computer vision applications and tackling domain generalization in multiple-object tracking (MOT) and counting. In comparison to existing MOT and counting datasets, which are largely restricted to videos of people and vehicles in cities, CFC is sourced from a natural-world domain where targets are not easily resolvable and appearance features cannot be easily leveraged for target re-identification. With over half a million annotations in over 1,500 videos sourced from seven different sonar cameras, CFC allows researchers to train MOT and counting algorithms and evaluate generalization performance at unseen test locations. We perform extensive baseline experiments and identify key challenges and opportunities for advancing the state of the art in generalization in MOT and counting.
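To make the benchmark's evaluation setup concrete, below is a minimal sketch of one plausible scoring pipeline: derive a per-video fish count from MOT-style tracker output, then measure counting error separately at each held-out test location. The CSV column layout, the count-by-unique-track-ID rule, and the normalized MAE metric are illustrative assumptions, not the paper's exact protocol (which may, for example, count fish by direction of travel).

```python
# Illustrative sketch of location-held-out counting evaluation. The CSV
# layout, the count-by-unique-track-ID rule, and the nMAE metric are
# assumptions for illustration, not the benchmark's exact protocol.
from collections import defaultdict
from pathlib import Path
import csv


def video_count(track_file: Path) -> int:
    """Count fish in one video as the number of distinct track IDs in a
    MOT-style CSV (assumed columns: frame, track_id, x, y, w, h, ...)."""
    with open(track_file, newline="") as f:
        return len({int(row[1]) for row in csv.reader(f) if len(row) > 1})


def nmae(pred: dict, gt: dict) -> float:
    """Normalized mean absolute counting error over a set of videos:
    sum_v |pred_v - gt_v| / sum_v gt_v."""
    abs_err = sum(abs(pred[v] - gt[v]) for v in gt)
    return abs_err / max(sum(gt.values()), 1)


def per_location_nmae(pred: dict, gt: dict, location_of: dict) -> dict:
    """Report nMAE per held-out camera location, keyed by location name."""
    videos_by_loc = defaultdict(list)
    for v in gt:
        videos_by_loc[location_of[v]].append(v)
    return {
        loc: nmae({v: pred[v] for v in vids}, {v: gt[v] for v in vids})
        for loc, vids in videos_by_loc.items()
    }
```

Reporting the error per location, rather than pooled over all test videos, keeps a failure to generalize at one camera site from being averaged away by strong performance at another.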


Acknowledgment

We are grateful to AWS for a gift to Trout Unlimited (TU) that supported data annotation, computation, and storage costs, and to the Resnick Sustainability Institute at Caltech for funding to SB and PP. An NSF Fellowship supported SB. JK, SD, and EY volunteered their time. GVH was supported by the Macaulay Library at Cornell University. For collecting the dataset, and for feedback, encouragement, and moral support, we are grateful to: George Pess and Oleksandr Stefankiv (Northwest Fisheries Science Center); James Miller, Carl Pfisterer, Dawn Wilburn, Brandon Key, Suzanne Maxwell, Gregory Buck, April Faulkner, and Jordan Head (Alaska Department of Fish and Game); Dave Kajtaniak and Michael Sparkman (California Department of Fish and Wildlife); Dean Finnerty (TU’s Wild Steelhead Project); and Keith Denton, Mike McHenry, and the Lower Elwha Klallam Tribe.

Author information

Corresponding author

Correspondence to Justin Kay.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 5023 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kay, J. et al. (2022). The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13668. Springer, Cham. https://doi.org/10.1007/978-3-031-20074-8_17

  • DOI: https://doi.org/10.1007/978-3-031-20074-8_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20073-1

  • Online ISBN: 978-3-031-20074-8

  • eBook Packages: Computer Science, Computer Science (R0)
