1 Introduction

From intercepting moving targets to manoeuvring through clutter, birds use vision to coordinate their flight manoeuvres with an agility and flexibility beyond the reach of current autonomous systems. Nevertheless, the links between their vision, guidance and control are complex and poorly understood. Much more is known about the role of vision in insect flight (Taylor et al., 2008), presumably because the size and sentience of birds complicate the experimental characterisation of their visuomotor control (Altshuler & Srinivasan, 2018). Much of the previous work on avian visually guided flight followed insect studies (Baird et al., 2021; Tammero & Dickinson, 2002a, b; Altshuler & Srinivasan, 2018) and investigated the animal’s behaviour in abstract visual environments (Bhagavatula et al., 2011; Schiffner & Srinivasan, 2015; Dakin et al., 2016; Ros & Biewener, 2016), such as corridors with vertically or horizontally striped walls. This proved useful as a first step in exploring how birds use visual self-motion cues, and in isolating their effects on flight control. For example, budgerigars flying through narrow corridors regulate flight speed in response to optic flow from sliding gratings projected onto the walls (Schiffner & Srinivasan, 2015). However, these approaches oversimplify the rich visual input available to birds in their natural habitat, more so than for flying insects, since birds’ visual acuity and neural organisation are more complex (Altshuler & Srinivasan, 2018). As a result, the conclusions that can be drawn from these studies about the birds’ strategies in the wild are limited.

In this paper, we present a method for reconstructing the visual scene a bird experiences while flying through a structured environment, as a first step towards understanding how birds use visual information to guide and control their flight. Specifically, we combine high-speed motion capture data with a three-dimensional (3D) reconstruction of the laboratory environment to generate synthetic visual inputs that characterise the information likely available to the bird in flight.

We aim to support the analysis of large quantities of data across multiple individuals, using environments that may vary experimentally across trials. The synthetic inputs we can generate with our method characterise the bird’s visual experience of its own self-motion in detail over its full visual field, opening up several new avenues of research in bio-inspired computer vision and behavioural modelling.

We demonstrate the possibilities of using our method to answer mechanistic questions in behaviour with pilot data from three sample flights: one pursuit flight, in which a hawk intercepts a moving target pulled across the ground, and two obstacle avoidance flights, in which a hawk flies between two perches around a set of obstacles. In the following sections, we provide an overview of active vision in bird flight, review previous approaches to study it experimentally, and summarise the contributions of our method.

1.1 Active Vision in Birds

In animals with well-developed visual systems, vision is largely active: through a variety of head, eye and body movements, animals can manipulate the position and orientation of their viewpoint relative to their environment (Land & Nilsson, 2012). Understanding how birds interrogate their visual environment during flight may be key to unravelling the cognitive processes coordinating their impressively fast manoeuvres.

Fig. 1

Related work. (a) Reconstruction of the environment around a nest of homing ants, and reconstructed views (A, B, C, D) at different instances of their recorded paths (Ardin et al., 2016). (b) Reconstructed view from each eye of a mouse hunting a cricket, using laser-scanned data and texture from high-resolution images (Holmgren et al., 2021). Panels (a) and (b) are reproduced from the cited works (Ardin et al., 2016; Holmgren et al., 2021) without modification under the terms of the Creative Commons Attribution License.

Birds mainly use head movements controlled by their neck motor system to look around the environment. This is because their eyes have a limited range of motion within their orbit, and the largest eye movements driven by the oculomotor system are small compared to those made by the head (Yorzinski et al., 2015; Mitkus et al., 2018; Potier et al., 2020). A small body of work has assessed how the frequency and amplitude of head movements in birds are affected by the visual environment experienced in flight. For example, in pigeons, these have been shown to vary with the structure of the landscape they are flying through (Kano et al., 2018), the structure of the clutter they are negotiating (Ros & Biewener, 2016), and the presence of another individual when flying in pairs (Taylor et al., 2019). Birds’ head movements have also been found to modulate their visual input: in turning flight, birds display a characteristic saccade-and-fixate strategy reminiscent of primate eye movements (Eckmeier et al., 2008; Ozawa, 2010; Kress et al., 2015; Ros & Biewener, 2017). This strategy supports the use of optic flow in flight control by eliminating the rotational component of the vector field during fixation, leaving only the translational component that contains the depth information (Eckmeier et al., 2008).
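To make this point concrete, consider the standard decomposition of the optic flow experienced on a spherical imaging surface (our notation, added here for illustration). For a viewing direction \(\mathbf {d}\) (a unit vector) towards a surface at distance r, an observer translating with velocity \(\mathbf {v}\) and rotating with angular velocity \(\varvec{\omega }\) experiences the image motion

$$\begin{aligned} \dot{\mathbf {d}} = -\,\varvec{\omega } \times \mathbf {d} - \frac{1}{r}\left( \mathbf {v} - (\mathbf {v} \cdot \mathbf {d})\,\mathbf {d}\right) . \end{aligned}$$

Only the second, translational term depends on the distance r; the rotational term carries no information about scene structure, so suppressing rotation during fixation leaves a flow field from which relative depth can be recovered.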

Although a bird’s head pose is the primary determinant of its gaze direction, eye tracking provides the most direct measure of gaze. Compared to scleral search coils (Rivers et al., 2014) or implanted magnets (Payne & Raymond, 2017), eye-tracking cameras offer the least invasive method to track eye movements. In birds, these have so far been restricted to terrestrial use cases, such as identifying where birds look when assessing mates (Yorzinski et al., 2013), watching predators (Yorzinski & Platt, 2014; Yorzinski, 2021), or inspecting the environment (Yorzinski et al., 2015). Their lack of use in flight is due to weight limitations, and the challenge of keeping the camera steady without occluding the frontal field of view.

Most bird studies therefore take head orientation as a proxy for gaze direction, which is often sufficient to identify the features to which a bird is attending. For example, work on lovebirds flying to a perch (Kress et al., 2015) investigated the alignment of the bird’s head with the edges of the perch and flight arena, whilst work on pigeons negotiating a forest of vertical poles (Lin et al., 2014) investigated the alignment of the bird’s head with the gaps between the obstacles. However, both studies analysed the problem in two dimensions, focusing only on changes in head azimuth, and reduced these extended features of the environment (selected a priori) to single points in the visual field. A complete understanding of the problem requires a full \(360^\circ \) reconstruction of the bird’s view in flight, which is what motivates the present study.

1.2 Related Work

Previous approaches to reconstructing what animals see of their environment have relied either on animal-borne cameras, or on reconstructing images synthetically. We review these approaches in the following sections, focusing on bird flight applications.

1.2.1 Head-Mounted Cameras

Head-mounted video cameras can sample the view of a bird as a result of its self-motion through the environment, and have been used to analyse aerial attack behaviours in hawks and falcons (Kane et al., 2015; Kane & Zamani, 2014; Ochs et al., 2016). This approach allows us to investigate a bird’s behaviour in its natural habitat, but is subject to the extreme limitations of pixel count, dynamic range and field of view of any camera small enough to mount on the head. Payload is conventionally limited to \(\le 5\%\) of a bird’s body mass on welfare grounds (Fair et al., 2010), but much more stringent limits may be required to ensure natural behaviour if the load is carried on the head (Kane & Zamani, 2014). The 20 g cameras that have been used previously (Kane et al., 2015; Kane & Zamani, 2014) are twice the weight of many small birds, and are therefore only suitable for very large species such as raptors. Even so, it is currently not possible for a small camera to cover a bird’s full field of view at an appropriate optical or sampling resolution. For example, the vertical field of view (\(31^\circ \)) of the head-mounted camera used to study goshawks and falcons (Kane et al., 2015; Kane & Zamani, 2014) would not cover the vertical extent of the binocular overlap (\(100^\circ \)) of the birds in this work, namely Harris’ hawks (Potier et al., 2016). Furthermore, the possibilities for analysing head-mounted video data are also impacted by the cameras’ low frame rates (30 Hz in Kane et al., 2015 and Kane & Zamani, 2014), and the motion blur associated with low shutter speeds and rolling shutters (Kane & Zamani, 2014). Finally, although head-mounted cameras can be held reasonably fixed relative to a raptor’s head using a tightly fitted hood (Kane et al., 2015), fitting a hood may not be possible for untrained or smaller birds. Generally, head-mounted cameras will be prone to some degree of wobble unless surgically attached to the head (Lev-Ari & Gutfreund, 2018; Hazan et al., 2015), which is an undesirable intervention. Head-mounted video cameras therefore have less utility for studying visually guided flight in birds than might first be imagined.

1.2.2 Synthetic Reconstruction

An animal’s visual input can be recreated synthetically using bio-inspired hardware, for example with custom-designed cameras (Stuerzl et al., 2010) or event-based cameras (Zhu et al., 2021; Gallego et al., 2022). It can also be done via software, using rendering methods (Holmgren et al., 2021). Rendering is particularly attractive because it offers complete control of the detail presented over the visual field (Holmgren et al., 2021; Neumann, 2002), and it is well suited to scientific inference because of the possibility of defining alternative views (Eckmeier et al., 2013; Ravi et al., 2022; Miñano & Taylor, 2021; Bian et al., 2021). For example, in a series of works studying the effectiveness of movement-based signalling in lizards, renderings were used to investigate the effect of different lighting and wind conditions (Bian et al., 2018, 2019, 2021).

State-of-the-art research has shown that it is possible to render novel views from only a set of camera images and poses (Tancik et al., 2022a; Mildenhall et al., 2020). However, while user-friendly approaches are emerging that bring these cutting-edge techniques to non-experts (Tancik et al., 2022b), most current rendering applications still require an explicit 3D model of the environment. Realistically modelling a natural-looking 3D environment involves several challenges. Standard modelling approaches such as simultaneous localization and mapping (SLAM) suffer from accumulating noise and drift when covering large areas (Schonberger & Frahm, 2016), and automating the post-processing of the resulting meshes may not be straightforward (Risse et al., 2018; Stürzl et al., 2015). Nevertheless, the quality of the dense maps that can now be captured with consumer-level handheld devices has improved greatly in the past few years, achieving results comparable to more expensive laser-scanning methods even in complex forest environments (Tatsumi et al., 2022; Gollob et al., 2021).

Most synthetic reconstructions of an animal’s visual scene to date have focused on insects. Insects are generally simpler to model than vertebrates, due to their lower sampling resolution, and the fact that their eyes are rigidly fixed to their heads. A few studies have investigated ant navigation using fully synthetic models of the natural environment (Ardin et al., 2015, 2016; see Fig. 1a), and panoramic images of the ants’ habitat (Zeil et al., 2014). The role of optic flow in bee flight has been analysed using a basic geometric reconstruction of the laboratory environment (Ravi et al., 2019, 2022), whereas the homing flight of bees and wasps has been studied using detailed 3D models of their natural environment (Stürzl et al., 2015; Stuerzl et al., 2016; Schulte et al., 2019). The latter used models obtained using laser scanners, structure-from-motion (SfM), and photographic reconstruction techniques.

Although vertebrates generally have more complex visual systems than insects, the same general approaches have been extended to study their visually guided behaviours. One recent study analysed prey pursuit in mice by tracking the animal’s head and eye movements, and combining them with a high-resolution 3D laser scan of the lab environment (Holmgren et al., 2021; see Fig. 1b). An earlier work in zebra finches reconstructed a simplified view of a bird in a single turning flight (Eckmeier et al., 2013), using a basic geometric model of the flight arena and the bill’s orientation as a proxy for gaze direction. In both of these studies, the environments mapped were \(< 1 \hbox {m}^{3}\), but there is currently growing interest in reconstructing an animal’s experience of its environment at much larger scales, relevant to ecology and conservation (Tuia et al., 2022). This interest has led to demonstrations of animal-borne 3D mapping sensors (McClune, 2018), and a mobile-camera method for embedding an animal’s track in an aerial view of its environment (Haalck et al., 2020).

Fully synthetic renderings also lend themselves to being used in a virtual reality (VR) environment. This approach has been applied to tethered and freely moving animals, mostly insects (Kern et al., 2005; Taylor et al., 2008; Windsor & Taylor, 2017), but also more recently to small vertebrates. Examples include restrained birds (Eckmeier et al., 2013) and freely moving mice and zebrafish (Stowers et al., 2017; Naik et al., 2020). Such environments are currently limited to volumes of approximately 1 m\(^3\), so have yet to find use for larger animals making larger-scale movements.

Fig. 2

Summary of method for reconstructing the visual information contained within the visual field of a bird in flight. The head movements of birds executing flight manoeuvres are recorded in a large motion capture lab (left panel). The lab environment is modelled in Blender, using geometric primitives and a dense 3D map to model objects with a more complex geometry (centre panel). The measured head pose is then used to define a virtual camera that is representative of the bird’s visual field (right panel). With this approach, we can generate detailed information describing the visual scene that the bird experiences in flight, including: (a) RGB renderings; (b) semantic maps; (c) depth maps; and/or (d) optic flow. In the virtual model of the lab (centre panel), the pulleys and target are displayed at twice their actual size for clarity. The spheres representing the bird’s visual field in the right panel show the animal’s view as it flies through the lab. The retinal margins of the left eye (blue line) and right eye (red line) are shown for reference, as well as the blind sector above the bird’s head (black fill). Note that because the spheres are represented using an orthographic projection, not all of the bird’s visual field is visible

1.3 Contribution

We describe a method to render the visual scene experienced by a bird in flight, combining high-speed motion capture with 3D modelling of the laboratory environment. The data rendered from the bird’s perspective include a rich set of outputs: RGB, semantic, depth and optic flow maps over the complete visual field of the bird (see Fig. 2). Although analogous data have been generated for insects, these were produced at much lower spatial resolution (Ravi et al., 2022, 2019; Schulte et al., 2019; Stuerzl et al., 2016; Stürzl et al., 2015). Additionally, none of these previous works used a single method to produce the full set of outputs we consider here. In birds, a similar RGB reconstruction has previously been generated for a single flight of a zebra finch (Eckmeier et al., 2013). However, this is the first time, to our knowledge, that such detailed data have been produced for large birds in flight, capturing their full visual fields and the full six degrees of freedom of their heads’ motion.

Compared to previous approaches to characterise the visual input of a bird in flight, our method has significant advantages:

  • It considers the complete visual field of the bird. As a result, subsequent analyses are not limited by the available field of view of a head-mounted camera (Ochs et al., 2016; Kane et al., 2015; Kane & Zamani, 2014), or to local features of the scene falling in the direction of the bird’s gaze (Eckmeier et al., 2008; Yorzinski et al., 2013; Yorzinski & Platt, 2014; Yorzinski et al., 2015; Kress et al., 2015; Yorzinski, 2021).

  • It is not limited by other technical specifications of a camera that is practical to attach to a bird’s head for use in flight.

  • It is minimally invasive, which is preferred both on welfare grounds, and to preserve the animal’s natural behaviour as much as possible. The total weight carried by the bird is 3 g, much lower than the 20 g of a typical head-mounted camera (Kane et al., 2015; Kane & Zamani, 2014).

  • It allows us to consider different camera models and gaze strategies for the purposes of hypothesis testing.

  • The method is designed to support the collection of large amounts of data across different individuals, in environments that may vary experimentally across trials.

  • It is able to take advantage of a 3D modelling approach that can be adapted to the required level of detail and realism.

  • It combines several computer vision techniques (high-speed motion capture, 3D mapping, rendering and coordinate system registration) in a novel way, enabling their application to the investigation of animal behaviour.

We demonstrate how our method can provide a unique insight into the hawks’ visually guided behaviour with simple behavioural analyses of three sample flights. However, the method would also be useful in more sophisticated and novel approaches to animal behaviour, for example to develop data-driven models of animal visuomotor control (Zhang et al., 2018; Merel et al., 2020), to provide realistic stimuli relevant for neural recording experiments in VR setups (Eckmeier et al., 2013), or as a first step towards fully synthetic models of an animal’s behaviour (Neumann, 2002). To support further work in these directions, we will provide the code for the rendering pipeline shortly after publication at https://github.com/sfmig/hawk-eyes.

Fig. 3

Key features of the method. Panel I: we carried out motion capture experiments with Harris’ hawks, in which we tracked their head movements while executing pursuit (a) and obstacle avoidance manoeuvres (b). We used additional markers to locate the main elements of the scene. The pursuit flight takes 2.5 s and the obstacle flights around 2 s each. Note that the two obstacle avoidance flights correspond to the two legs of the same trial (magenta arrows and text). Panel II: we used the motion capture data to estimate the transform from a headpack coordinate system (c, shown schematically in red) to a coordinate system representative of the bird’s visual field (c, shown schematically in green). We used data available in the literature to estimate the monocular, binocular and blind regions on the bird’s visual coordinate system (d). Panel III: we propose to model the lab environment with a hybrid approach, which uses a combination of geometric primitives for the simple geometries in the scene, and dense 3D meshes for the more complex ones. To facilitate the integration of the captured dense 3D maps in the motion capture coordinate system, we transform them at the point of acquisition using an ArUco fiducial marker (f). The transforms between coordinate systems are shown (magenta text), where \(T_{A}^{B}\) represents the transform from A to B. We demonstrate this hybrid approach for the pursuit flight (g), modelling the curtain with a dense 3D map. For the obstacle avoidance flights, we used geometric primitives only (e). Panel IV: our method allows us to define alternative gaze strategies for the purpose of hypothesis testing. We demonstrate this by defining two scenarios for each of the flights. In the first scenario (h), the virtual camera (yellow) tracks the visual coordinate system, which we expect to be representative of the pose of the bird’s visual field. In the second one (i), the virtual camera tracks the trajectory coordinate system, which represents a horizon-level camera whose optical axis is tangent to the bird’s head trajectory (black line). This is shown for the pursuit flight in the figure, along with the target’s trajectory (orange line). The virtual camera is represented schematically as a pyramid, but note a \(360^\circ \) virtual camera was used for all renderings

2 Methods

In this section, we describe the key details of the motion capture experiments and of their synthetic reconstruction in a computational environment.

2.1 Motion Capture Experiments

We recorded Harris’ hawks (Parabuteo unicinctus) flying in a large (\(20\times 6\times 3.3\) m) motion capture lab, using 22 Vicon Vantage V16 motion capture cameras sampling at 200 Hz (Vicon Motion Systems Ltd., Oxford, UK). Here we present results for \(n=3\) sample flights from two different birds, executing pursuit and obstacle avoidance manoeuvres. These flights are part of a larger dataset of more than 100 trials across five individuals. We use this small subset of flights to describe and illustrate the method; a complete description of the full set of experiments will be provided elsewhere.

2.1.1 Bird Flights

For the pursuit flight, the bird (Toothless) chased a cylindrical artificial target with food reward (length: 0.15 m; diameter: 0.025 m) that was dragged in an unpredictable direction around a series of pulleys at an average speed of 5.6 m s\(^{-1}\). To further challenge the bird’s manoeuvring, we hung a black curtain across the room from floor to ceiling, leaving a gap of approximately one wingspan (\(1.0-1.1\) m) to either side, through which the bird and target passed (Fig. 3a).

For the obstacle avoidance flights, the bird (Drogon) flew between two perches set 9 m apart, and around a set of four cylindrical styrofoam pillars (height: 2 m; diameter: 0.3 m) placed 1.5 m in front of one of the perches (Fig. 3b). Note that the two flights correspond to the two legs of the same trial (see magenta arrows in Fig. 3b). Results from the two obstacle avoidance flights have also been presented in the preprint by Miñano and Taylor (2021), using a slightly different analysis approach. The walls of the laboratory environment were hung with camouflage netting, and other reconstruction cases that we trialled included placing small trees within this environment. Further details on the experimental setup and the birds can be found in Appendix A.

2.1.2 Motion Capture Data

We tracked the bird’s head using a custom ‘headpack’ comprising a rigid arrangement of four or five 6.4 mm diameter spherical retroreflective markers that we fixed to a Velcro patch glued to the bird’s head (see Appendix 1). We tracked the target using three 6.4 mm diameter markers, and attached further 6.4 or 14 mm diameter markers to the main static elements of the scene. We used Nexus v2.8.0 software (Vicon Motion Systems Ltd., Oxford, UK) to extract the 3D positions of all the unlabelled retroreflective markers. For the pursuit flight, we labelled the headpack markers manually within Nexus, using its semi-automatic labelling functionality. For the obstacle avoidance flights, we labelled the headpack markers automatically using custom scripts written in MATLAB R2020b (The Mathworks Inc., Natick, MA). In both cases we used custom MATLAB scripts to label stationary obstacle markers, to compute and interpolate the pose of the headpack and target, and to handle missing marker data. Further detail on these post-processing steps is presented in Appendix B.
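To make the pose computation step concrete, the sketch below (ours, written in Python for brevity rather than the MATLAB scripts used in the study; array names are illustrative) recovers the rigid transform of the headpack in a single frame from its labelled marker positions, using the standard SVD-based (Kabsch) least-squares fit to the marker layout measured in a reference frame.

```python
import numpy as np

def rigid_transform(ref_markers, obs_markers):
    """Least-squares rigid transform (Kabsch) mapping the reference marker
    layout onto the observed positions of the same labelled markers.
    Both inputs are (n_markers, 3) arrays in corresponding order."""
    ref_c, obs_c = ref_markers.mean(axis=0), obs_markers.mean(axis=0)
    H = (ref_markers - ref_c).T @ (obs_markers - obs_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T                                           # rotation, reference -> observed
    t = obs_c - R @ ref_c                                        # translation
    return R, t
```

Applying this to every motion capture frame yields the headpack pose trajectory, and frames with too few reconstructed markers can then be filled by interpolating the resulting poses.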

2.1.3 Calibration of the Bird’s Visual Coordinate System

The headpack was arbitrarily placed on the bird’s head before the experiments. As a result, a coordinate system defined relative to the headpack markers is not necessarily aligned with the principal axes of the bird’s visual field (Fig. 3c). To estimate the bird’s visual coordinate system, we make use of three assumptions (Miñano & Taylor, 2021). First, we assume that the bird’s gaze movements across the environment are largely executed via head movements, and that the eyes’ movement relative to the head is small (Kano et al., 2018; Brighton et al., 2017; Ros & Biewener, 2017; Kress et al., 2015; Kane & Zamani, 2014; Eckmeier et al., 2008). Second, we assume that the bird’s gaze direction is known from first principles during calibration, and we identify it with the forward direction of the head. In the pursuit case, we assume that the bird looks at food presented to it by the falconer; in the obstacle avoidance case, we assume that the bird looks at the perch centre upon landing (Potier et al., 2016; Kress et al., 2015). Third, we assume that the bird holds its eyes level during the calibration. This eye-levelling behaviour has been reported repeatedly in the bird flight literature (Brighton & Taylor, 2019; Ros & Biewener, 2017; Warrick et al., 2002), and is also confirmed by our reference videos. Further detail on the visual coordinate system calibration is included in Appendix C.
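The sketch below shows one way these assumptions translate into a calibration (a minimal illustration, not the exact implementation; the x-right/y-forward/z-up axis convention and the variable names are assumptions of ours): the forward axis points from the head to the known gaze target, the lateral axis is constrained to be horizontal (eyes level), and the resulting frame is expressed relative to the headpack so that it can be re-applied in every flight frame.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def headpack_to_visual(head_origin, gaze_point, R_headpack):
    """Estimate the rotation from the headpack frame to the bird's visual
    frame at a calibration frame. head_origin and gaze_point are in world
    (motion capture) coordinates; R_headpack is the headpack-to-world rotation
    at the same frame. Assumes the bird looks at gaze_point with its eyes level.
    Axis convention (assumed): x right, y forward, z up."""
    z_world = np.array([0.0, 0.0, 1.0])
    y_vis = unit(gaze_point - head_origin)      # forward axis: towards the gaze target
    x_vis = unit(np.cross(y_vis, z_world))      # lateral axis: horizontal, to the bird's right
    z_vis = np.cross(x_vis, y_vis)              # dorsal axis: completes a right-handed frame
    R_visual = np.column_stack([x_vis, y_vis, z_vis])   # visual axes in world coordinates
    # Fixed offset between headpack and visual frames, re-applied in every frame:
    # R_visual(t) = R_headpack(t) @ R_offset.
    return R_headpack.T @ R_visual
```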

We additionally determined the monocular, binocular and blind areas of the visual field of a Harris’ hawk on a sphere centred at the origin of the estimated visual coordinate system (Fig. 3d). To do this, we digitised and interpolated the data available in the literature (Potier et al., 2016, Figures 5C and 6), and assumed that the gaze direction of our visual coordinate system corresponded to the direction of maximum binocular overlap. The same method, further described in Appendix C.4, could be used for other animal species, given the data that are typically published to describe an animal’s visual field.

2.1.4 Ethics Statement

This work has received approval from the Animal Welfare and Ethical Review Board of the Department of Zoology, University of Oxford, in accordance with University policy on the use of protected animals for scientific research, permit no. APA/1/5/ZOO/NASPA, and is considered not to pose any significant risk of causing pain, suffering, damage or lasting harm to the animals. No adverse effects were noted during the trials.

2.2 Computational Model in Blender

We defined a computational model of the motion capture experiments in Blender (Blender Online Community, 2021), a 3D modelling software package with a rendering engine. This involved: (i) defining a virtual camera, representative of the bird’s perspective in flight, and (ii) defining a 3D model of the lab geometry during the experiments. The code to generate the model of the lab environment and define the corresponding virtual camera in Blender will be made available at https://github.com/sfmig/hawk-eyes.

2.2.1 Virtual Camera

We modelled the scene viewed by the bird using a 360\(^\circ \) virtual camera whose translation and rotation per frame matched those of the estimated visual coordinate system (Fig. 3h). Since any vergence movements of the eyes are unknown, we modelled the bird’s binocular visual system as a monocular camera. We selected a resolution of 5 pixels per degree latitude and longitude. This results in all pixels within the bird’s visible region (monocular plus binocular) having a length and width \(\sim 10\times \) the minimum resolution angle of the bird at its fovea (Potier et al., 2016). Note that we consider a uniform resolution across the camera’s full visual field, but this is not the case for the hawks: their visual acuity varies across their retinas and is highest at the foveae (Mitkus et al., 2018; Potier et al., 2016). For each flight, we rendered the view from this virtual camera using the Cycles rendering engine, and produced RGB, depth, semantic and optic flow data per pixel, for each motion capture frame (sampled at 200 Hz).
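At 5 pixels per degree, the equirectangular renders are \(1800 \times 900\) pixels. The following sketch (ours, under the same assumed x-right/y-forward/z-up camera convention as above) shows how a 3D point in the lab maps to a pixel of this virtual camera; the same mapping relates the target and obstacle positions to the rendered images.

```python
import numpy as np

PX_PER_DEG = 5                                         # sampling resolution of the renders
WIDTH, HEIGHT = 360 * PX_PER_DEG, 180 * PX_PER_DEG     # 1800 x 900 equirectangular image

def project_equirectangular(point_world, cam_origin, R_cam):
    """Map a 3D point in lab coordinates to (row, col) pixel coordinates of the
    360-degree virtual camera. The columns of R_cam are the camera axes
    (x right, y forward, z up) expressed in lab coordinates (assumed convention)."""
    d = R_cam.T @ (point_world - cam_origin)           # viewing direction in camera frame
    d /= np.linalg.norm(d)
    lon = np.degrees(np.arctan2(d[0], d[1]))           # longitude: 0 deg straight ahead, +ve to the right
    lat = np.degrees(np.arcsin(np.clip(d[2], -1, 1)))  # latitude: +ve above the horizontal plane
    col = (lon + 180.0) * PX_PER_DEG                   # 0 .. WIDTH
    row = (90.0 - lat) * PX_PER_DEG                    # 0 .. HEIGHT (top row at +90 deg latitude)
    return row, col
```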

To test the effect of the bird’s head movements, we defined an alternative gaze strategy. This is represented by a horizon-levelled virtual camera whose optical axis is always tangent to the bird’s head trajectory (see Fig. 3i). Specifically, the virtual camera follows a trajectory coordinate system, whose y-axis is defined parallel to the bird’s head velocity vector, whose x-axis is parallel to the floor plane, and whose origin is that of the visual coordinate system (see Appendix C.5). For each flight, we rendered the view from this virtual camera as well.

2.2.2 Hybrid 3D Model of the Lab

We model the lab environment using a hybrid approach, which uses a combination of geometric primitives for the simple geometries in the scene, and dense 3D meshes for the most complicated ones (Fig. 3, panel III). The dense 3D meshes are captured with a mobile device, and expressed at acquisition time in the same coordinate system as the motion capture trajectories. This way we minimise the modelling and postprocessing effort, while producing realistic representations of the environment.

We demonstrate the use of a hybrid model of the lab for the pursuit flight, modelling the curtain using a dense 3D mesh, and the rest of the objects in the scene as geometric primitives (Fig. 3g). In the obstacle avoidance flights, we used geometric primitives only (Fig. 3e).

2.2.3 Dense 3D Map

To capture a dense 3D map of the curtain in the pursuit flight, we used the open-source SemanticPaint framework (Golodetz et al., 2015, 2018), which is built on top of InfiniTAM v3 (Prisacariu et al., 2017). We used the ASUS ZenFone AR smartphone as a mobile mapping sensor, and to perform visual-inertial odometry (ZenFone ZS571KL, ASUS, Taipei, Taiwan). To facilitate the integration of the dense map in the virtual model of the lab, we registered it to the motion capture coordinate system, using ArUco fiducial markers (Romero-Ramirez et al., 2018; Garrido-Jurado et al., 2014). The voxel size was set to 10 mm and the truncation distance to 40 mm (\(4\times \) the voxel size).

Figure 3f summarises the coordinate transformations applied to a captured dense 3D map to express it in the motion capture coordinate system. The 3D mesh is initially expressed in the SLAM world coordinate system, which is defined by default as the first camera pose. To compute the required transform from the SLAM world coordinate system to the motion capture coordinate system, we used an ArUco calibration plate. This consisted of an ArUco fiducial marker of size \(28.8 \times 28.8\) cm fixed to an acrylic plastic sheet with three retroreflective markers (10 mm diameter) on three of its corners. When brought into camera view, the coordinates of the ArUco marker’s corners are computed in the SLAM world coordinate system. Since we also placed retroreflective markers on these corners, their coordinates in the motion capture coordinate system are also known. By defining an auxiliary coordinate system with these three points, we can compute the transform from the SLAM world coordinate system to the motion capture one.
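The registration step can be summarised as follows (a sketch under our own naming; the function names are illustrative, and the three plate corners must be supplied in corresponding order in both systems): an auxiliary frame is built from the three corners in each coordinate system, and the two poses are composed to give the SLAM-to-motion-capture transform, which is then applied to the mesh vertices.

```python
import numpy as np

def frame_from_three_points(p0, p1, p2):
    """4x4 pose of an auxiliary frame defined by three non-collinear points:
    origin at p0, x-axis towards p1, z-axis normal to the plate."""
    x = (p1 - p0) / np.linalg.norm(p1 - p0)
    z = np.cross(x, p2 - p0)
    z /= np.linalg.norm(z)
    y = np.cross(z, x)
    T = np.eye(4)
    T[:3, :3] = np.column_stack([x, y, z])
    T[:3, 3] = p0
    return T

def slam_to_mocap(corners_slam, corners_mocap):
    """Transform mapping SLAM-world coordinates to motion capture coordinates,
    via the auxiliary frame defined by the ArUco plate corners."""
    T_aux_to_slam = frame_from_three_points(*corners_slam)
    T_aux_to_mocap = frame_from_three_points(*corners_mocap)
    return T_aux_to_mocap @ np.linalg.inv(T_aux_to_slam)
```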

We used the open-source software MeshLab (Cignoni et al., 2008) to crop the mesh, remove duplicate vertices, and remove isolated pieces. We found that the floor plane of the mesh deviated slightly from the motion capture system’s xy-plane (\(2.4^\circ \), see Appendix D.3), likely due to drift. We used MATLAB’s Point Cloud Processing functions to fit a plane to the floor of the mesh (mean error \(=0.038\) m), and to align it with the motion capture system’s xy-plane. The transformed mesh deviated on average by 0.093 m from the reference markers placed on the curtain’s edges, as they were registered during the pursuit trial, and by 0.089 m from their position recorded just before capturing the mesh. Further details on the postprocessing of the mesh and the deviation metrics are included in Appendix D.3.
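As an illustration of this levelling correction (a NumPy sketch, not the MATLAB Point Cloud Processing functions used in the study; array names are hypothetical), the floor plane can be fitted via a singular value decomposition of the centred floor points, and the mesh rotated so that the fitted normal coincides with the motion capture z-axis.

```python
import numpy as np

def fit_plane_normal(points):
    """Unit normal of the least-squares plane through a set of 3D points."""
    centred = points - points.mean(axis=0)
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    n = Vt[-1]                                  # direction of smallest variance
    return n if n[2] >= 0 else -n               # make the normal point upwards

def rotation_aligning(a, b):
    """Rotation matrix taking unit vector a onto unit vector b (Rodrigues' formula)."""
    v = np.cross(a, b)
    s, c = np.linalg.norm(v), np.dot(a, b)
    if s < 1e-12:
        return np.eye(3)
    K = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + K + K @ K * ((1 - c) / s ** 2)

# Levelling the mesh (floor_points and mesh_vertices are illustrative arrays):
# R = rotation_aligning(fit_plane_normal(floor_points), np.array([0.0, 0.0, 1.0]))
# mesh_vertices = mesh_vertices @ R.T
```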

2.2.4 Geometric Primitives

We modelled the floor, ceiling and walls of the motion capture lab as planes. The floor plane was computed during the calibration of the motion capture system, and we determined the walls and ceiling planes from the motion capture cameras’ positions and orientations (see Appendix B.4).

In the pursuit flight, we modelled the pulleys as cones and the boxes covering the target’s initial position as cuboids (Fig. 3g). We modelled the target as a cylinder of 15 cm length and 2.54 cm diameter. In the obstacle avoidance flights, we modelled the obstacles as vertical cylinders, and the perches as horizontal cylinders, thereby reducing each A-frame perch to the top rung on which the bird landed (Fig. 3e). The position, orientation and size of all these geometric elements were determined from the retroreflective markers attached to the corresponding objects, and from measurements of the dimensions of the real objects. We placed reference markers on the curtain’s edges and on the wall netting at the curtain gap, but only used them to measure deviation from our modelled geometry (see Appendices A.3 and D.3). Further details on the definition of the geometric primitives are included in Appendix D.

Textures can also easily be added to make the synthetic scene as photo-realistic as required. As an example, we include an RGB rendering of the pursuit flight in which the walls of the lab model are textured, replicating the camouflage netting that was hung in the lab to prevent the birds from perching (see Online Resources 1 and 2).

Fig. 4

Snapshots of the rendered pursuit flight and semantic heatmaps of the target. The snapshots of the RGB rendering (a–e) represent the bird’s view as it flies through the lab in the pursuit flight. The caudal blind area is shown (black fill). The full flight takes 2.5 s. The heatmaps (f and g) show the value of h (Eq. 1), representing the frequency with which the target appears at each pixel in the visual field throughout a flight, normalised by the relative area of the solid angle that each pixel subtends. Results are shown for a virtual camera following the visual coordinate system (f), and the trajectory coordinate system (g). The camera axis (red star) corresponds to the estimated direction of the bird’s gaze \(\vec {v}_{gaze}\) in the visual coordinate system, and to the direction of the head’s velocity vector in the trajectory coordinate system. The retinal margins for the left eye (blue line) and right eye (red line) are shown for reference. Note that because the visual field spheres are represented using an orthographic projection, not all of the bird’s visual field is shown (see Fig. 26)

3 Results

For the three sample flights (one showing a pursuit manoeuvre and two recording obstacle avoidance manoeuvres), we rendered the view from: (i) a virtual camera aligned with the visual coordinate system, representing the bird’s visual field inclusive of all head movements; and (ii) a virtual camera following the trajectory coordinate system, representing the bird’s visual field exclusive of the animal’s rotational head movements. The rendered outputs per frame (RGB, depth, semantic and optic flow data) are included as supplementary videos (Online Resources 1–15, see Table 1). Further details on how these videos were produced are included in the Supplementary information and in Appendix 1.

We used the RGB and semantic data per frame to inspect the birds’ gaze strategy in the pursuit and obstacle avoidance manoeuvres, as a demonstration of how this approach can offer new insights into the bird’s behaviour. We followed a similar approach to the preliminary analysis presented by Miñano and Taylor (2021).

Fig. 5

Trajectory of the target in the bird’s visual field. The edge contour of the target is represented for each frame of the pursuit flight in a cropped equirectangular projection of the area around the estimated gaze direction \(\vec {v}_{gaze}\) (red star), for the frames before (a) and after (b) turning around the curtain. The colormap indicates normalised time through the flight. The extension of the target in longitude and latitude over time is shown in (c) and (d) respectively, with the maximum (purple), minimum (green), and mean (blue) values represented per frame. For those frames in which the bird’s head transform was interpolated, these values are shown in black. The blue vertical line in (c) and (d) indicates the frame that defines the data split before (a) and after (b) turning the curtain. Reference lines are shown in (c) and (d) at \(\pm 10^\circ \) (red dashed lines) and at \(4^\circ \) latitude in (d) (yellow dashed line). The normalised time is 0 at the takeoff frame, identified when the bird dips its head just before the takeoff jump; note that it takes some frames for the target to become visible, but the linear motor pulling it was already triggered at this point. The normalised time is 1 at interception, when the bird and the target reach a local minimum of distance in the terminal phase of interception. The data shown cover 2.145 s

Fig. 6

Trajectory of the target in the trajectory coordinate system. The edge contour of the target is represented for each frame of the pursuit flight in a cropped equirectangular projection of the area around the camera axis (red star) in the trajectory coordinate system. The virtual camera’s axis corresponds to the direction of the head’s velocity vector. The extent of the retinal margins of the bird’s right eye (red) and left eye (blue) is shown relative to the virtual camera’s axis for reference, although they are not expected to be positioned correctly or consistently in this coordinate system. The data is split following the same criteria as in Fig. 5, separating the frames before (a) and after (b) turning around the curtain. The colormap indicates normalised time through the trial. Note that in the first part of the flight the target shows prominent pitch oscillations likely reflecting the reaction to the wingbeat. In the second part of the flight, the target is not constrained to the equivalent of the binocular area, showing that the velocity vector of the head trajectory is not aligned with the target

3.1 Pursuit Flight

The RGB renderings in the visual coordinate system show that the target remains within the bird’s area of binocular overlap for almost the entire duration of the pursuit (see Fig. 4a–e, and Online Resources 1 and 2). In contrast, when the camera tracks the trajectory coordinate system, the target is not held steady or centred, and the RGB renderings display pronounced pitch oscillations. This comparison confirms that the bird uses its rotational head movements to stabilise its gaze, and to keep the target reasonably well centred within its visual field.

To analyse this behaviour quantitatively, we use the semantic data. Figure 4f shows the frequency with which the target appears at each point in the visual field during the flight. For each pixel, the figure displays the metric:

$$\begin{aligned} h_i = \frac{n_i}{N A_i}, \end{aligned}$$
(1)

where \(n_i\) denotes the number of frames over which the ith pixel saw the target, N denotes the total number of frames analysed, and \(A_i\) denotes the solid angle subtended by the ith pixel, normalised by the maximum solid angle that any pixel subtends. Note that different pixels subtend different solid angles, due to the semantic output being an equirectangular projection of a sphere. We only consider pixels within the visible areas of the bird’s visual field, and exclude from the analysis any frames in which the head transform was interpolated, and any frames after interception. The results show that the target is held within \(\pm 10^\circ \) longitude and from \(-10^\circ \) to \(4^\circ \) latitude in the visual coordinate system for most of the flight (Fig. 4f). This is in sharp contrast with the results in the trajectory coordinate system, in which the target is not confined to this central area at all (Fig. 4g). However, a limitation of these results is that they are affected by the apparent size of the target, and refer to data aggregated across the whole of the flight. How does the target’s position in the visual field vary along the flight?
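Before addressing that question, we note for reference how the metric of Eq. 1 can be evaluated from the rendered semantic masks. The sketch below is an illustration with hypothetical array names, omitting the masking of non-visible pixels, interpolated frames and post-interception frames described above; because the renders are equirectangular, the solid angle of a pixel scales with the cosine of its latitude, which yields the normalisation factor \(A_i\).

```python
import numpy as np

def target_heatmap(target_masks):
    """h_i = n_i / (N * A_i) from a boolean stack of semantic masks with shape
    (n_frames, height, width), rendered as equirectangular images."""
    n_frames, height, _ = target_masks.shape
    n = target_masks.sum(axis=0)                          # n_i: frames in which pixel i saw the target
    # Pixel-centre latitudes from +90 to -90 degrees, top row first.
    lat = np.linspace(90, -90, height, endpoint=False) - 90.0 / height
    # Relative solid angle per pixel row, normalised by the largest (equatorial) pixel.
    A = np.cos(np.radians(lat)) / np.cos(np.radians(lat)).max()
    return n / (n_frames * A[:, None])                    # broadcast A over columns
```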

Figure 5 plots the evolution of the target’s contour in the visual coordinate system. The visual field is cropped close to the binocular area and shown in equirectangular projection. The sections of flight before and after the curtain are plotted separately for clarity (Fig. 5a and b respectively). In the first section of the flight, the target begins drifting across the visual field, but then seems to be stabilised at approximately \(10^\circ \) longitude (Fig. 5a). Target tracking seems to be lost temporarily as the bird turns around the curtain (green contours in Fig. 5a), but is quickly recovered with the target now stabilised at \(-10^\circ \) longitude (Fig. 5b). Towards the end of the flight, the target gradually becomes centred in the visual field, looming until interception. The same evolution can be seen by inspecting the longitudinal position of the boundaries and midpoint of the target through time (Fig. 5c). The target’s boundaries also remain between \(-10^\circ \) and \(4^\circ \) latitude for most of the flight (Fig. 5d).

For comparison, we computed the equivalent path of the target’s contour as seen from the trajectory coordinate system (Fig. 6). In the first part of the flight, the target shows considerably more oscillations in the vertical direction, likely due to the wingbeat motion (Fig. 6a). Just before turning around the curtain, the target appears to be aligned longitudinally with the bird’s head velocity vector. In the second part of the flight (Fig. 6b), the target is clearly not aligned with the head’s velocity vector, drifting out of the central area of the trajectory coordinate system. The comparison of Figs. 5b and 6b reflects how the estimated gaze direction, which approximates the forward direction of the bird’s head, diverges from the bird’s velocity vector in the final phase of interception.

Fig. 7

Trajectory of the obstacles in the visual field sphere. The edge contour of the set of obstacles is represented in the virtual camera’s visual field for each frame (coloured semi-transparent) of both obstacle avoidance flights, from the point of takeoff to the point at which the perch is fully visible (i.e. until the second blue dashed vertical line in Fig. 8). The colormap indicates normalised time through each flight. For the frames in which the camera’s transform was interpolated (i.e. because not enough markers were reconstructed), the contour of the obstacles is shown in black. The retinal margins of Harris’ hawks are shown for the left eye (blue line) and right eye (red line), and the virtual camera’s axis is shown for reference (red star). Note that the virtual camera’s axis corresponds to the estimated gaze direction \(\vec {v}_{gaze}\) in the visual coordinate system (a), and to the direction of the head’s velocity vector in the trajectory coordinate system (b). In the visual coordinate system, the leftmost edge of the set of obstacles stays largely aligned with the estimated sagittal plane (i.e., the symmetry plane of the head) in both flights. In contrast, in the trajectory coordinate system, the obstacles are not stabilised in the vertical direction, and they are not aligned with the head velocity vector either. The visible area in the bird’s visual field extends beyond what is shown in this orthographic projection (Color figure online)

3.2 Obstacle Avoidance Flights

In both obstacle avoidance flights, the RGB renderings in the visual coordinate system show the obstacles centred in the bird’s visual field (see Online Resources 8 and 9 for the flight corresponding to leg 1 of the trial, and Online Resources 12 and 13 for the flight corresponding to leg 2). This is not the case for the RGB renderings in the trajectory coordinate system, in which the obstacles are not centred and oscillations are clearly visible (see Online Resources 10 and 11 for leg 1 of the trial, and Online Resources 14 and 15 for leg 2). This again confirms that the bird actively stabilises its visual field against the pitch oscillations associated with its wingbeat, and also directs its gaze so as to keep the obstacles broadly centred. In this case, however, close inspection of the semantic data reveals a more subtle interpretation of how the bird is directing its gaze.

We display the evolution of the obstacles’ contour in Fig. 7. We use an orthographic projection, rather than the equirectangular one we used for the pursuit flight, to reduce the distortion, since the obstacles occupy a much larger portion of the field of view than the target. The obstacles’ contour is represented from the takeoff frame until the frame at which the landing perch is visible without occlusion. The data show that the nearside edge of the obstacles as seen by the bird remains aligned longitudinally with the centre of the visual coordinate system in both flights (Fig. 7, top row). This alignment does not appear for the data rendered in the trajectory coordinate system (Fig. 7, bottom row), which reinforces the role of the bird’s head movements in fixating the obstacles’ nearside edge.

We can inspect the bird’s attention on the obstacles and the landing perch by combining the semantic data from these two elements. Figure 8 plots how the longitudinal positions of the midpoint and edges of the obstacles and the landing perch evolve through time. For the obstacle avoidance flight comprising the first leg of the trial, the nearside edge of the obstacles appears to remain approximately aligned with the centre of the visual field until the point at which the landing perch first becomes fully visible (Fig. 8a). Beyond this point, the midpoint of the landing perch becomes the object most closely aligned with the centre of the visual field. For the obstacle avoidance flight comprising the second leg of the trial, the nearside edge of the obstacles is again aligned closely with the bird’s estimated gaze direction until the point at which the landing perch becomes visible. Then, the bird appears to make a head saccade such that its new gaze direction aligns with the nearside edge of the landing perch. This remains the case for approximately the next 0.8 s (Fig. 8b), after which the bird seems to make another head saccade, to realign its head forward direction with the midpoint of the perch. Both head saccades can be seen in the corresponding RGB rendered videos (Online Resources 12 and 13).

Fig. 8

Longitudinal extension of the set of obstacles and landing perch for the obstacle avoidance flights. The evolution through time of the longitudinal extension of the obstacles and landing perch is represented for the two obstacle avoidance flights, corresponding to leg 1 (a) and leg 2 (b) of the trial. The inside and outside edges of the obstacles and landing perch are labelled relative to the bird’s turn, together with the midpoint of the angle subtended by the obstacles and landing perch. Black markers denote frames in which the bird’s head transform was interpolated. The range of frames when the landing perch is partially occluded by the obstacles is marked between two vertical dashed lines. Note that because objects curve when they are close to the poles of the spherical virtual camera, it may be that the landing perch is fully visible but that its longitudinal extension overlaps with that of the obstacles (e.g. at around 225 frames from takeoff; see Online Resources 8 and 9). The first and last frames used to estimate the bird’s gaze direction \(\vec {v}_{gaze}\) are marked with vertical red dashed lines. In leg 1 of the flight (a), the obstacles appear within the central part of the visible field after 350 frames, but in reality they would have been occluded by the bird’s body (see Online Resource 8). In leg 2 (b), we can visually identify two potential head saccades: one at 87 frames after takeoff seems to align the estimated gaze direction with the left edge of the landing perch, as seen by the bird; the other at 242 frames after takeoff seems to align the estimated gaze direction with the centre of the perch (see Online Resources 12 and 13). The sampling rate is 200 Hz

4 Discussion

We have presented a method to generate synthetic data that characterise the visual experience of a bird in flight. To our knowledge, this is the first time that such a detailed description of the complete visual field of a large bird in flight has been generated. A similar approach was used for a single zebra finch by Eckmeier et al. (2013), albeit at a much smaller scale and without addressing its applicability to other species or individuals.

We have used three sample flights to illustrate the method and carry out behavioural analyses. Although simple, these analyses already show the potential of using our method to investigate the role of vision in bird flight. Comparable studies would be exceedingly challenging if video data from head-mounted cameras were used, given the attendant limitations on payload, resolution, field of view, and motion blur. They would also be much more limited if they relied on point estimates of the bird’s gaze direction in relation to prominent visual features in the environment, rather than considering the animal’s full visual field. Additionally, our method allows us to inspect counterfactual scenarios, which we have demonstrated by comparing the rendered views from the visual coordinate system and the trajectory coordinate system. Again, this would be difficult or impossible to do using any of the other reviewed approaches.

4.1 Key Features of the Rendering Method

We have described how to model the lab environment using a hybrid approach, which combines basic geometric primitives defined using the motion capture data with dense 3D maps of features with more complex geometry. In this way, we avoid the accumulated drift and noise typical of large 3D maps, make the most of the accurate motion capture data, and reduce the modelling effort for the most intricate shapes. To facilitate the integration of the dense meshes within the basic geometric model of the lab, we transform them to the motion capture coordinate system at the point of acquisition. We have demonstrated the applicability of this hybrid approach in the pursuit flight, for the simple example of reconstructing a curtain that was hung to act as an obstacle to the bird. However, the method would be especially relevant for modelling natural-looking environments with more complicated features, such as small trees. Figure 9 illustrates this idea and shows the 3D mesh of a set of trees that we placed around the lab and captured using SemanticPaint, in this case using a Kinect v1 as the mapping sensor.

4.2 Behavioural Analysis of the Pursuit Flight

During the pursuit flight, we found that the target is held within the area of the bird’s binocular overlap for most of the flight. Modelling a counterfactual gaze strategy, in which the virtual camera’s principal axis is aligned with the bird’s head velocity vector, corroborates that the bird actively directs its gaze to keep the target in this region of the visual field. Further inspecting the evolution of the target’s position in the visual field through the flight, we find that the bird fixates the target at \(\pm 10^\circ \) longitude from the estimated gaze direction, and only centres it towards the terminal interception phase.

In common with most other raptors, Harris’ hawks have two areas of acute vision per retina: one projecting frontally and the other laterally (Mitkus et al., 2018; Potier et al., 2016; Inzunza et al., 1991). Where these four foveal regions (two in each retina) project on the visual field of Harris’ hawks has not been determined experimentally. For most diurnal raptors the frontal-facing foveae are estimated to project between \(9^\circ \) and \(16^\circ \) longitude from the forward direction of the head, and the lateral-facing fovea somewhere above \(30^\circ \) longitude (Wallman & Pettigrew, 1985; Frost et al., 1990; Kane & Zamani, 2014; Tucker, 2000). It is unclear whether the frontal-facing foveae of Harris’ hawks usually project to a single point in their visual fields. For example, in Anna’s hummingbirds it has been shown that the area temporalis, a high resolution area in their visual fields which faces frontally, does not project to a single point, even when their eyes are fully converged (Tyrrell et al., 2018). Binocular convergence of the frontal-facing foveae does seem possible in raptors, but was rarely observed during head-restrained experiments with a little eagle. Its primary gaze position was with its frontal-facing foveae at around \(13^\circ \) longitude from the head sagittal plane (Wallman & Pettigrew, 1985).

Whilst it would be premature to draw any firm conclusions from data for a single flight of a single bird, we hypothesise that these locations at \(\pm 10^\circ \) longitude at which the target seems to be fixated in the pursuit flight may correspond to the projections of the frontal-facing foveae of the bird’s left and right eyes. This being so, our results could indicate that the bird tracks the target with one or other of its frontal foveae throughout the flight, before centering it in the visual field prior to interception.

4.3 Behavioural Analysis of the Obstacle Avoidance Flights

In both obstacle avoidance flights we found that the nearside edge of the obstacles was aligned with the longitudinal centre of the visual field for substantial portions of the flights in which the obstacles were visible. Again this was not observed in the counterfactual gaze strategy that we considered, in which the virtual camera’s principal axis was aligned with the bird’s head velocity. This is in accordance with similar findings in lovebirds (Kress et al., 2015), bees (Ravi et al., 2022) and humans (Raudies et al., 2012; Rothkopf & Ballard, 2009), all of which seem to fixate on the edges of objects that can be perceived as obstacles, and on the centre of objects perceived as goals. Moreover, in the flight corresponding to the second leg of the trial, the bird seemed to align its visual field first with the edge of the landing perch and then with its centre before landing (see Online Resources 12 and 13, around frames 2029 and 2184 as numbered in the video). This is also in line with previous reports in lovebirds (Kress et al., 2015) and may reflect a strategy based on aiming at intermediate goals. It is important to note that the alignment with the perch centre is inevitable as we approach the set of frames that we used to calibrate the bird’s gaze direction (Fig. 8). However, the coincidence of the edge of the obstacles with the centre of the visual coordinate system provides strong internal support for the reliability of the visual coordinate system calibration.

An alternative explanation for the bird’s observed gaze behaviour around the obstacles is that the bird aligns its gaze with the direction in which it expects the landing perch to appear. This could be the case, for example, in the flight executed in leg 1 of the trial, in which the edge of the landing perch is very close to the nearside edge of the obstacles (Fig. 8). Would the bird fixate on the edge of the obstacle if the perch were partially visible from the start? This could be tested directly by varying the position of the obstacles relative to the landing perch. The results obtained using the bird’s gaze strategy in this scenario could be compared to the results obtained assuming alternative gaze strategies, such as continuous fixation on the obstacles’ edge, or continuous fixation on the landing perch’s edge or centre. In any case, as with the pursuit flight, these hypotheses about the animal’s behaviour can only be confirmed or rejected by analysing the complete set of flights across different individuals.

4.4 Limitations and Future Work

We aimed to develop a reconstruction method that would enable the collection of large amounts of data from many individuals. Some key steps that we have taken towards this goal include our non-invasive tracking of head pose using marker-based motion capture, our programmatic definition of geometric primitives based on motion capture markers placed on objects in the environment, and our automated integration of the dense 3D maps into the motion capture coordinate system. The key bottleneck in our current pipeline is the need to calibrate the bird’s visual coordinate system against the coordinate system of the headpack. Currently, we achieve this using two different methods, one for each manoeuvre, but both sharing common assumptions.

An alternative approach that would likely improve the accuracy of the estimated visual coordinate system would be to integrate calibrated stereo cameras with the motion capture system. Using video-based motion tracking tools (Nath et al., 2019; Pereira et al., 2022) on data collected during a calibration trial, the 3D coordinates of easily identifiable features on the bird (such as its bill tip or its eyes) could be determined in the headpack coordinate system. In this way, the bird's head pose could be estimated directly relative to the headpack coordinate system; a similar approach was demonstrated in recent work by Naik et al. (2020). Although this method still assumes that the bird's eyes are fixed relative to the head, it would provide an improved estimate of the true forward direction of the bird's head and of the midpoint between the bird's eyes (the ideal origin of the visual field sphere; see Martin, 2007). It would also allow us to estimate the bird's stereo baseline, and thus define the animal's view as a binocular system in Blender. More importantly, it would provide a common calibration method independent of the recorded manoeuvre and, if properly automated, it would allow us to scale up the analyses to much larger datasets. This will be a key focus of our future work. Other improvements that would further streamline the method include automating the correction of the dense 3D maps (potentially using the objects' motion capture markers as fiducials) and improving the ArUco plate for better floor alignment.
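
A sketch of the proposed stereo-based calibration is given below: tracked head features (for example, the bill tip and both eyes) are triangulated from the calibrated stereo pair and expressed in the headpack coordinate system. The variable names and the source of the headpack pose are assumptions; only the OpenCV triangulation call is standard.

```python
# Hedged sketch: triangulate tracked head features from a calibrated stereo
# pair and express them in the headpack coordinate system.
import cv2
import numpy as np

def features_in_headpack(P1, P2, uv1, uv2, R_headpack, t_headpack):
    """P1, P2       : 3x4 projection matrices of the stereo cameras, expressed
                      in motion capture (world) coordinates.
       uv1, uv2     : 2xN pixel coordinates of the features in each view.
       R_headpack,
       t_headpack   : headpack pose from motion capture (world <- headpack)."""
    X_h = cv2.triangulatePoints(P1, P2, uv1, uv2)    # 4xN homogeneous points
    X = (X_h[:3] / X_h[3]).T                         # Nx3 points in world frame
    return (X - t_headpack) @ R_headpack             # same points, headpack frame
```

The midpoint of the two eye positions would then give the origin of the visual field sphere, the eye-to-eye distance the stereo baseline for a binocular camera rig in Blender, and the bill tip a reference for the forward direction.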

Tracking the eye movements of birds in flight would also be of interest for defining a more accurate visual coordinate system (Holmgren et al., 2021). It would allow us to explore changes in the bird's visual field configuration in flight (Tyrrell et al., 2018), to inspect precisely how birds make use of the high-resolution areas of their visual field (Potier et al., 2016), and to investigate whether hawks can track targets simultaneously and independently with each eye, as has been shown for grackles (Yorzinski, 2021). However, eye-tracking in flight currently appears very challenging; in birds it has only been used in relatively large species performing terrestrial tasks (Yorzinski et al., 2013; Yorzinski & Platt, 2014; Yorzinski et al., 2015; Yorzinski, 2019).

Alternative definitions of the virtual camera, or adaptations of its synthetic outputs, could also provide a better approximation of the bird's visual system. For example, the resolution of the camera could be defined non-uniformly across the visual field to represent more closely the bird's higher visual acuity at the foveae. This would reduce rendering time and may also provide insight into what information the bird requires at high resolution to solve the task (Matthis et al., 2018). If the RGB renderings were to be used as stimuli for birds in VR experiments (Eckmeier et al., 2013), it may be relevant to adapt them to the animals' spectral sensitivity (Tedore & Johnsen, 2017; Lind et al., 2013). An event-based virtual camera could also be relevant for analysing the bird's behaviour (Mueggler et al., 2017; Rebecq et al., 2018). These bio-inspired cameras output a spike ('event') when a pixel-level brightness change is detected, which makes them fast and well suited to low-latency control in flying robots (Gallego et al., 2022). Analysing the bird's gaze strategy using the event stream of its visual experience may facilitate potential applications to autonomous drones (Rodriguez-Gomez et al., 2022; Zhu et al., 2021).
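
To make the event-based idea concrete, the sketch below converts a sequence of rendered grey-level frames into an approximate event stream using the standard per-pixel log-intensity threshold model; dedicated simulators such as ESIM (Rebecq et al., 2018) interpolate between frames and model sensor noise more carefully. The frame format and threshold value are illustrative assumptions.

```python
# Illustrative sketch: approximate event stream from rendered frames using the
# standard log-intensity change threshold model of an event camera.
import numpy as np

def frames_to_events(frames, timestamps, threshold=0.2, eps=1e-3):
    """frames     : (N, H, W) grey-level renderings in [0, 1].
       timestamps : (N,) frame times in seconds.
       Returns a list of (t, x, y, polarity) events."""
    log_ref = np.log(frames[0] + eps)                # per-pixel reference level
    events = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        log_i = np.log(frame + eps)
        diff = log_i - log_ref
        for pol, mask in ((+1, diff >= threshold), (-1, diff <= -threshold)):
            ys, xs = np.nonzero(mask)
            events.extend((t, int(x), int(y), pol) for x, y in zip(xs, ys))
            log_ref[mask] = log_i[mask]              # reset reference where fired
    return events
```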

Fig. 9 3D model of a forest environment in the lab. We used a mix of 20 small laurel and bay trees, each under 2 m in height, to recreate a forest environment in the motion capture lab. The 3D model was captured using SemanticPaint with a Kinect v1 as the mapping sensor. Note that in this case the world coordinate system of the captured map is not aligned with that of the motion capture coordinate system

The method described here is primarily designed around a motion capture system, but a relevant development going forward would be to translate it to the field. Bio-loggers combining GPS and IMU units could be used to track the bird's head position and orientation (Kano et al., 2018), although differential GPS or a similar technology would be required to obtain sufficient positional accuracy (Keshavarzi et al., 2021; Sachs, 2016). To reconstruct the natural environment, laser scanners, structure-from-motion or photogrammetry techniques may be used (Tuia et al., 2022; Stürzl et al., 2015; Stuerzl et al., 2016; Schulte et al., 2019). Consumer-level handheld mapping devices like the one used here could also be useful, as they have been shown to reconstruct forest environments reasonably accurately (Tatsumi et al., 2022; Gollob et al., 2021). On the other hand, there are still interesting research questions to address in the lab. For example, we could examine whether the gaze strategy of the bird is affected by the familiarity or novelty of the elements in the scene, and consider the role of top-down and bottom-up attention mechanisms in its control of gaze. Similar questions have been explored in stationary owls, albeit in a lab environment not representative of their habitat (Lev-Ari & Gutfreund, 2018; Hazan et al., 2015). A set of lab experiments with a 'simulated forest' like the one shown in Fig. 9 could provide a closer approximation to the natural habitat, and would be very useful for comparison with experiments in the field. However, minimising the risk of occluded motion capture markers in the simulated forest would likely require reducing the volume of interest and carefully considering the camera arrangement, potentially with many cameras placed directly overhead.

Table 1 Description of supplementary videos

These suggestions show that there is still much to learn about how animals interact with their environment, and innovative methods such as the one we describe open up a wide range of possibilities for behavioural analysis. Human active vision has already inspired robotic applications (Seara & Schmidt, 2004; Seara et al., 2002, 2001), as active observers have been shown to solve basic vision problems more efficiently than passive ones (Aloimonos et al., 1988). Similarly, understanding the role of active vision in bird flight may reveal efficient processing strategies that could be translated to autonomous systems. Additionally, large datasets collected with this or similar methods could support data-driven models of behaviour and offer new insights into the bird's gaze strategy in flight; similar approaches already exist that use human motion capture data to generate active-sensing behaviours in synthetic humanoids (Merel et al., 2020). In conclusion, we see many exciting opportunities for future collaboration between the animal behaviour and computer vision communities.

Supplementary information. We provide the rendered outputs as supplementary video material. For the pursuit flight, we include videos of the RGB, semantic, depth and optic flow synthetic data generated. For the obstacle avoidance flights, we include RGB videos. All videos are reproduced at 20 Hz (1/10 of real speed), except the optic flow video, which is reproduced at 5 Hz (1/40 of real speed). The frame numbering shown in the videos follows the motion capture system's numbering. A description of each video file is presented in Table 1.

The RGB outputs are presented using two projections: equirectangular, in which the geometry appears distorted but the complete field of view of the bird is shown; and orthographic, in which the distortion is reduced but the most peripheral regions of the bird's visual field are not included. The remaining rendered outputs are represented only in orthographic projection. Note that in the orthographic case, the point of view is as if looking frontally at the bird's visual field sphere (see Fig. 26). In all videos, a red contour around the figure indicates that the head transform for that frame was interpolated. The retinal margins and the areas of the bird's visual field (blind, monocular and binocular) are overlaid on the rendered output. Note also that the virtual camera is defined with uniform resolution over its visual field, which is not the case for the birds (see Sect. 2.2.1).
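
For readers who wish to reproduce the orthographic views, the sketch below resamples an equirectangular rendering into an orthographic image of the front of the visual field sphere; the axis conventions and nearest-neighbour sampling are assumptions made for brevity and do not necessarily match our rendering setup.

```python
# Illustrative sketch: orthographic view of the front of the visual field
# sphere, resampled from an equirectangular rendering.
import numpy as np

def orthographic_from_equirect(equirect, out_size=512):
    """equirect : (H, W, 3) image with longitude spanning [-180, 180] deg
       across the width and latitude spanning [90, -90] deg down the height."""
    H, W = equirect.shape[:2]
    xs = np.linspace(-1.0, 1.0, out_size)
    x, y = np.meshgrid(xs, -xs)                      # top of the image is y = +1
    r2 = x ** 2 + y ** 2
    z = np.sqrt(np.clip(1.0 - r2, 0.0, None))        # front hemisphere only
    lat = np.degrees(np.arcsin(np.clip(y, -1.0, 1.0)))
    lon = np.degrees(np.arctan2(x, z))
    rows = np.clip(((90.0 - lat) / 180.0 * (H - 1)).astype(int), 0, H - 1)
    cols = np.clip(((lon + 180.0) / 360.0 * (W - 1)).astype(int), 0, W - 1)
    out = equirect[rows, cols].copy()
    out[r2 > 1.0] = 0                                # outside the sphere: black
    return out
```

As in the videos, only the front hemisphere of the visual field sphere is visible in this projection, which is why the most peripheral regions are excluded.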

In the optic flow video, the colormap represents, for each frame, the instantaneous angular speed per pixel in degrees per second. The colour scale is logarithmic and capped at \(10^\circ \) s\(^{-1}\) at the lower bound and \(1000^\circ \) s\(^{-1}\) at the upper bound. The vector field results from transforming the output data from pixel space onto the surface of the unit sphere. Further details on the computation of the videos are included in Appendix E.
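
The conversion from pixel-space flow to angular speed can be sketched as follows under a small-angle approximation on the equirectangular grid; Appendix E describes the actual computation, so this is only an indicative reimplementation with assumed array shapes.

```python
# Hedged sketch: per-pixel angular speed (deg/s) from equirectangular optic
# flow, plus the logarithmic colour scaling used for display.
import numpy as np

def angular_speed_deg_per_s(flow_u, flow_v, fps):
    """flow_u, flow_v : (H, W) pixel displacements per frame (x and y).
       fps            : rendering frame rate in Hz."""
    H, W = flow_u.shape
    lat = np.linspace(90.0, -90.0, H).reshape(-1, 1)        # +90 deg at the top
    dlon = flow_u * (360.0 / W) * np.cos(np.radians(lat))   # shrinks towards poles
    dlat = flow_v * (180.0 / H)
    return np.hypot(dlon, dlat) * fps

def log_colour_value(speed_deg_per_s, lo=10.0, hi=1000.0):
    """Map angular speed to [0, 1] on a log scale capped at lo and hi."""
    s = np.clip(speed_deg_per_s, lo, hi)
    return (np.log10(s) - np.log10(lo)) / (np.log10(hi) - np.log10(lo))
```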

Table 2 Birds used in the experiments