Abstract
The ability to capture detailed interactions among individuals in a social group is foundational to the study of animal behavior and neuroscience. Recent advances in deep learning and computer vision are driving rapid progress in methods that can record the actions and interactions of multiple individuals simultaneously. Many social species, such as birds, however, live deeply embedded in a three-dimensional world. This world introduces additional perceptual challenges, such as occlusions, orientation-dependent appearance, large variation in apparent size, and poor sensor coverage for 3D reconstruction, that do not arise in applications studying animals that move and interact only on 2D planes. Here we introduce a system for studying the behavioral dynamics of a group of songbirds as they move throughout a 3D aviary. We study the complexities that arise when tracking a group of closely interacting animals in three dimensions and introduce a novel dataset for evaluating multi-view trackers. Finally, we analyze captured ethogram data and demonstrate that social context affects the distribution of sequential interactions between birds in the aviary.
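As a rough illustration of what "the distribution of sequential interactions" can mean in practice, the sketch below tabulates consecutive behavior pairs in a time-ordered ethogram and compares two social contexts with total variation distance. The event format (actor, recipient, behavior), the behavior labels, and the use of total variation distance are all assumptions made for this example; they are not the authors' pipeline or statistical analysis.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical ethogram record: (actor_id, recipient_id, behavior_label),
# listed in temporal order. IDs and labels are illustrative only.
Event = Tuple[str, str, str]
Pair = Tuple[str, str]


def transition_distribution(events: List[Event]) -> Dict[Pair, float]:
    """Empirical joint distribution of consecutive behavior pairs
    (behavior at step t, behavior at step t+1)."""
    pairs = Counter((a[2], b[2]) for a, b in zip(events, events[1:]))
    total = sum(pairs.values())
    if total == 0:
        return {}  # fewer than two events: no transitions to count
    return {pair: n / total for pair, n in pairs.items()}


def total_variation(p: Dict[Pair, float], q: Dict[Pair, float]) -> float:
    """Total variation distance between two discrete distributions
    (0 = identical, 1 = disjoint support)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Two hypothetical sessions recorded under different social contexts.
ctx_a = [("m1", "f1", "approach"), ("m1", "f1", "song"),
         ("m1", "f1", "approach"), ("m1", "f1", "song")]
ctx_b = [("m1", "m2", "approach"), ("m1", "m2", "song"),
         ("m1", "m2", "approach"), ("m2", "m1", "chase")]

p_a = transition_distribution(ctx_a)
p_b = transition_distribution(ctx_b)
print(p_a)
print(p_b)
print(total_variation(p_a, p_b))
```

Using the joint distribution of consecutive pairs (rather than per-behavior conditional probabilities) keeps each context's table summing to one, so a single distance between contexts is well defined. A real analysis would also need a significance test, for example a permutation test over sessions, which this sketch omits.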
Data and code availability
Data and code will be made publicly available via Google Drive and GitHub.
Acknowledgements
We are grateful to Henry Korpi, Ana Alonso, Greg Forkin, and Marcelina Martynek for helpful discussions and for their many contributions to the dataset annotations.
Funding
We gratefully acknowledge support through the following grants: National Science Foundation IOS-1557499, National Science Foundation MRI 1626008, National Science Foundation NCS-FO 2124355.
Author information
Contributions
MS and KD conceived of the study. AP, BP, and MS constructed the aviary and collected the data. MB, SX, YW, and KD designed the tracking approaches and dataset. MB, SX, and YW developed the tracking and re-ID pipelines. MB and AP prepared the dataset. MB, SX, and YW performed the experiments and created the figures. MB and SX wrote the first draft. MB, SX, YW, MS and KD edited the paper for submission.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Ethical approval
The aviary and cowbird data collection were approved by the University of Pennsylvania Institutional Animal Care and Use Committee.
Additional information
Communicated by Helge Rhodin.
About this article
Cite this article
Xiao, S., Wang, Y., Perkes, A. et al. Multi-view Tracking, Re-ID, and Social Network Analysis of a Flock of Visually Similar Birds in an Outdoor Aviary. Int J Comput Vis 131, 1532–1549 (2023). https://doi.org/10.1007/s11263-023-01768-z
DOI: https://doi.org/10.1007/s11263-023-01768-z