Construction of a calibrated set of silhouette views
A calibrated set of racket silhouette views was constructed in a common reference frame. A racket (Prince Warrior 100L ESP) was mounted upright at the centre of a Perspex board (400 × 300 × 4 mm); two cameras (Phantom Miro M110, Vision Research) were positioned to view the rig (Fig. 1a) and a two-dimensional planar calibration was performed for each camera with a checkerboard based on Zhang’s algorithm, as per similar work [7, 18, 29,30,31]. Twenty-one stereo calibrations were performed with the master camera fixed and the slave camera in a different position each time. A common reference frame between all slave camera positions and the racket was obtained by digitising control points on the Perspex board and racket (orthogonal calibration object) in the image plane of the master camera (Fig. 1b). A silhouette view of the racket was extracted digitally using MATLAB’s image processing toolbox from an image obtained from the slave camera in each position to form the calibrated set.
The racket was painted matt black and white sheets formed a contrasting backdrop to aid digital silhouette extraction (Fig. 1b). The Perspex board formed the lower calibration plane (global X–Y plane) and control points 1–4 consisted of machined grooves filled with black paint to form the corners of a rectangle measuring ~360 × ~290 mm. The origin of the global coordinate system was set at control point 1, creating a left-handed reference frame (Fig. 1a). The racket formed the upper calibration plane and control points 5–7 were white marks painted on two black rods (diameter of 4 mm) fixed perpendicular to one another in the frame. A laser scanner (Metris ModelMaker D100) accurate to 0.050 mm was used to obtain the relative position of all control points, while also confirming that the calibration planes were orthogonal and the face of the racket was parallel to the global Y–Z plane, within practical limits (<0.5°).
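The orthogonality check described above can be sketched as follows. This is a minimal Python/NumPy illustration (the study used a laser scan and MATLAB, not this code), and the control point coordinates below are hypothetical values chosen only to show the computation:

```python
import numpy as np

def plane_normal(p0, p1, p2):
    """Unit normal of the plane through three non-collinear points."""
    n = np.cross(np.asarray(p1, float) - p0, np.asarray(p2, float) - p0)
    return n / np.linalg.norm(n)

def angle_between_planes_deg(n1, n2):
    """Angle between two planes from their unit normals, in degrees."""
    c = abs(float(np.dot(n1, n2)))  # abs: the sign of a normal is arbitrary
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))

# Hypothetical control points (mm): board corners on the global X-Y plane
# and marks defining the (nominally vertical) racket plane.
board = [(0.0, 0.0, 0.0), (360.0, 0.0, 0.0), (360.0, 290.0, 0.0)]
racket = [(180.0, 145.0, 0.0), (180.0, 145.0, 300.0), (180.0, 345.0, 0.0)]

n_board = plane_normal(*board)
n_racket = plane_normal(*racket)
deviation = abs(90.0 - angle_between_planes_deg(n_board, n_racket))
# Accept the rig if the deviation from orthogonal is below 0.5 degrees.
```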
The master camera was positioned ~2 m from the racket, with the optical axis forming an angle of ~10° with the global X–Z plane (Fig. 1c). The slave camera was positioned at angles from ~−60° to ~60° in ~20° increments at heights of ~1.15, ~1.55, and ~1.85 m to form three tiers (Fig. 1d). A suitable configuration for the calibrated set was found using simulations with computer generated silhouette views in Blender (v2.70), with modifications to incorporate the practicalities of positioning cameras and undertaking a calibration, as detailed in Elliott. The cameras were set at their maximum resolution of 1280 × 800 pixels, with an F-stop (aperture) of F22 and a shutter speed of 1/8th s. The checkerboard (8 × 8 squares each measuring 50 × 50 mm) was set in 50 orientations for each stereo calibration to maximise image coverage [28, 35].
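The slave-camera placement grid (seven azimuths on three height tiers, giving the 21 stereo calibrations) can be enumerated as below. This is an illustrative Python/NumPy sketch under assumed geometry (azimuth measured in the horizontal plane about the racket, ~2 m radius); it is not the authors' simulation code:

```python
import numpy as np

radius = 2.0                          # ~2 m from the racket (metres)
azimuths = np.arange(-60, 61, 20)     # -60 to 60 deg in 20 deg steps
tiers = [1.15, 1.55, 1.85]            # camera heights (metres)

# One candidate slave-camera position per (tier, azimuth) combination.
positions = [
    (radius * np.cos(np.radians(a)),  # toward the racket
     radius * np.sin(np.radians(a)),  # lateral offset
     h)                               # tier height
    for h in tiers
    for a in azimuths
]
# 7 azimuths x 3 tiers = 21 slave-camera positions, matching the
# twenty-one stereo calibrations described above.
```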
Checkerboard images were passed to Bouguet’s calibration Toolbox in MATLAB to obtain the intrinsic [focal length (fx, fy), principal point (cx, cy), and lens distortion] and extrinsic (rotation and translation in a common reference frame) camera parameters. Based on the findings of Elliott, the intrinsic parameters were estimated using a 4th order radial distortion model without the tangential component, and they were not recomputed when estimating the extrinsic parameters. The control points on the orthogonal calibration object were manually digitised ten times each (with short breaks to limit any learning effect), with the mean values passed to the Toolbox along with the corresponding world coordinates from the laser scan. The Toolbox used the control point coordinates from the manual digitisation and the laser scan, along with the intrinsic parameters, to compute the relative pose of the master camera with respect to the origin (control point 1), in a common reference frame with the racket. Averaged across all control points, the mean standard deviation from manual digitisation was 0.1 pixels, which equates to a relative calibrated camera pose error of less than 1 mm in translation and below 0.1° in rotation.
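The distortion model referred to above (4th order radial, no tangential term) can be written out explicitly. The following Python/NumPy sketch applies the standard form of this model to normalised image coordinates; the coefficient values are hypothetical and for illustration only:

```python
import numpy as np

def distort_radial(x, y, k1, k2):
    """4th order radial distortion without the tangential component:
    x_d = x * (1 + k1*r^2 + k2*r^4), applied to normalised coordinates."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor

# Hypothetical coefficients; a real calibration estimates k1 and k2.
xd, yd = distort_radial(0.1, 0.2, k1=-0.2, k2=0.05)
# Both coordinates scale by the same radial factor, so xd/x == yd/y.
```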
Using the extrinsic parameters from the stereo calibrations, rigid body transformations were applied to express the slave camera poses in a common reference frame with the master camera and racket. The calibration with the orthogonal object was performed once, as it remained stationary along with the master camera, reducing uncertainty from manual digitisation in comparison to digitising the control points in an image from each slave camera position. Using an orthogonal object improved camera pose accuracy, in comparison to simply using the Perspex board or racket as a planar calibration object, as detailed by Elliott. MATLAB’s image processing toolbox was used to perform thresholding to digitally extract racket silhouettes from the slave camera’s images and to segment polygonal silhouette boundaries [37,38,39,40,41]. The extracted boundary was plotted on the original image and its quality assessed visually. The threshold value was manually adjusted to ensure the extracted boundary provided an accurate representation of the racket in the original image.
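The chaining of rigid body transformations can be sketched as follows, assuming the common convention x_cam = R·x_world + t: the world-to-master pose from the orthogonal-object calibration is composed with the master-to-slave extrinsics from a stereo calibration to give the slave pose in the world frame. The numerical values below are illustrative, not calibration results:

```python
import numpy as np

def compose(R_ab, t_ab, R_bc, t_bc):
    """Compose x_b = R_ab x_a + t_ab with x_c = R_bc x_b + t_bc
    into the single transform x_c = R_ac x_a + t_ac."""
    R_ac = R_bc @ R_ab
    t_ac = R_bc @ t_ab + t_bc
    return R_ac, t_ac

# World -> master (from digitised control points) and master -> slave
# (from one stereo calibration); both are illustrative values.
R_wm, t_wm = np.eye(3), np.array([0.0, 0.0, 2.0])
th = np.radians(20.0)  # slave rotated 20 deg about the vertical axis
R_ms = np.array([[ np.cos(th), 0.0, np.sin(th)],
                 [ 0.0,        1.0, 0.0       ],
                 [-np.sin(th), 0.0, np.cos(th)]])
t_ms = np.array([0.5, 0.0, 0.1])

R_ws, t_ws = compose(R_wm, t_wm, R_ms, t_ms)

# Sanity check: mapping a world point directly equals the two-step map.
p = np.array([1.0, 2.0, 3.0])
direct = R_ws @ p + t_ws
two_step = R_ms @ (R_wm @ p + t_wm) + t_ms
```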
Estimating racket position from candidate relative camera poses
Each silhouette was removed from the calibrated set and its camera pose was estimated using an initial candidate relative pose. Estimates were then compared with a criterion, which was the pose of the camera (obtained from calibration) removed from the calibrated set. Tests with computer generated camera poses confirmed that reducing the number of silhouette views in the calibrated set by one did not influence the results.
Since an unloaded (not at impact) tennis racket frame has a fairly regular shape, the probability of the Levenberg–Marquardt optimisation routine converging to a local minimum is increased, resulting in a camera pose estimate on the wrong side of the object (antipodal view). The method works best if the initial candidate pose provided to the algorithm falls on the correct side of the racket, close to the true camera position. Each candidate pose was, therefore, created using a spherical coordinate system centred at the midpoint of the racket, with the radius corresponding to the known distance of the camera (obtained from calibration) taken from the calibrated set. The curved surface corresponding to the search region for candidate poses extended up to 30° either side (azimuthal angle) and 30° above and below (polar angle) the known camera pose, decreasing the likelihood of the antipodal view being found, as illustrated in Fig. 2.
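Sampling the search region for candidate poses can be sketched as below: positions on a sphere about the racket midpoint, within ±30° of the calibrated azimuthal and polar angles. This is an illustrative Python/NumPy version; the 10° sampling step, the racket midpoint, and the reference angles are assumptions for the example, not values from the paper:

```python
import numpy as np

def candidate_positions(centre, radius, az0, pol0, span=30.0, step=10.0):
    """Sample candidate camera positions on a sphere centred at the
    racket midpoint, within +/- `span` degrees of the known azimuthal
    (az0) and polar (pol0) angles; all angles in degrees."""
    offsets = np.arange(-span, span + step, step)
    pts = []
    for daz in offsets:
        for dpol in offsets:
            az, pol = np.radians(az0 + daz), np.radians(pol0 + dpol)
            pts.append(centre + radius * np.array([
                np.sin(pol) * np.cos(az),
                np.sin(pol) * np.sin(az),
                np.cos(pol),
            ]))
    return np.array(pts)

centre = np.array([0.0, 0.0, 1.0])  # racket midpoint (illustrative)
cands = candidate_positions(centre, radius=2.0, az0=0.0, pol0=90.0)
# Every candidate lies at the calibrated camera distance from the racket.
radii = np.linalg.norm(cands - centre, axis=1)
```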
A maximum of 100 candidate relative poses were used, and searches were terminated when the root-mean-squared (RMS) value of the ETE vector reached a threshold of 0.5 pixels. A threshold was required because the RMS ETE would not converge to zero, owing to inherent inconsistency in the set resulting from small errors associated with calibration and silhouette extraction. If the threshold was not reached for any of the candidate relative poses, the solution corresponding to the lowest RMS ETE was used. A threshold of 1 pixel did not always allow the optimisation to fully converge, and reducing the value below 0.5 pixels did not affect the solution.
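The termination logic above can be sketched as a short search loop. This is an illustrative Python version with a toy error function standing in for the optimiser's RMS ETE residual; it is not the authors' implementation:

```python
import numpy as np

def search_poses(candidates, rms_ete, threshold=0.5):
    """Evaluate candidate poses in order; stop at the first whose RMS
    error falls below `threshold` pixels, otherwise return the candidate
    with the lowest error seen across the whole search."""
    best_i, best_err = None, np.inf
    for i, cand in enumerate(candidates):
        err = rms_ete(cand)
        if err < best_err:
            best_i, best_err = i, err
        if err <= threshold:
            break  # threshold reached: terminate the search early
    return best_i, best_err

# Toy RMS errors (pixels) for four candidate poses.
errors = [2.1, 1.3, 0.4, 0.9]
idx, err = search_poses(range(4), lambda i: errors[i])
# The search stops at index 2 (0.4 px <= 0.5 px threshold);
# index 3 is never evaluated.
```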
Each camera pose estimate obtained with the view fitting techniques was used to reconstruct 106 3D coordinates on the racket face plane, using a camera-plane model [42, 43]. The reconstructed coordinates were compared with corresponding points on the racket frame surface obtained from the laser scan (criterion). As the criterion coordinates extracted from the laser scan did not lie on the racket face plane, stereo triangulation was used to reconstruct them. Pixel projections of the coordinates were obtained using the calibration parameters, allowing triangulation using the master camera (criterion) and each slave camera pose estimate from the view fitting techniques. The ability of the view fitting method to accurately reconstruct these coordinates was taken as the measure of how well racket position could be estimated.
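Stereo triangulation from two camera poses can be sketched with the standard midpoint method: find the 3D point closest to the two viewing rays. This Python/NumPy example uses illustrative camera centres and a known target, not the study's calibration data or its specific triangulation routine:

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint triangulation: the 3D point closest to two camera rays
    with centres c1, c2 and unit direction vectors d1, d2."""
    # Solve for ray parameters s, t minimising |(c1 + s d1) - (c2 + t d2)|.
    A = np.array([[d1 @ d1, -(d1 @ d2)],
                  [d1 @ d2, -(d2 @ d2)]])
    b = np.array([(c2 - c1) @ d1, (c2 - c1) @ d2])
    s, t = np.linalg.solve(A, b)
    return 0.5 * ((c1 + s * d1) + (c2 + t * d2))

# Two illustrative cameras viewing a point at (0, 0, 5).
target = np.array([0.0, 0.0, 5.0])
c1, c2 = np.array([-1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
d1 = (target - c1) / np.linalg.norm(target - c1)
d2 = (target - c2) / np.linalg.norm(target - c2)
p = triangulate_midpoint(c1, d1, c2, d2)
# With exactly intersecting rays, the midpoint recovers the target.
```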
Proof of concept for application to tennis
The methods described thus far were designed to develop a calibrated set configuration and validate a single view fitting method for estimating 3D racket position in a laboratory. Application of the method to play conditions requires development beyond the scope of this paper. For proof of concept without on-court testing, the calibrated set was simulated using computer generated camera poses and silhouette views [21, 41] of a full size racket model created in Blender (v2.70). Based on the findings of Elliott, the simulated calibrated set was modified with the camera poses orientated randomly (not upright) about the optical axis. The random camera pose orientations were generated between −90° and 90° (camera poses were upright at 0°) using an inbuilt MATLAB function. The calibrated set was used to estimate the 3D position of the racket model during a simplified simulated serve movement, using the camera pose in Fig. 3. The pose was similar to those used by Choppin et al.; the camera was outside the court (which was full size) and would not be intrusive during play. The racket was located at the centre mark, with its face aligned with the global Y–Z plane. For simplicity, the racket model butt was set at the global origin, as obtaining the relative position between the camera and racket was of interest. The court in Fig. 3 is for illustrative purposes.
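The random camera roll described above can be sketched as below. This is an illustrative Python/NumPy stand-in for the MATLAB random draw, assuming uniform sampling and the usual convention that roll is a rotation about the camera's optical (z) axis:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

def random_roll_deg(n):
    """Uniform camera roll angles in [-90, 90) degrees about the optical
    axis (0 deg = upright), one per simulated camera pose."""
    return rng.uniform(-90.0, 90.0, size=n)

def roll_matrix(deg):
    """Rotation about the camera's optical (z) axis by `deg` degrees."""
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

rolls = random_roll_deg(21)  # one roll per simulated camera pose
# At 0 deg the roll matrix is the identity, i.e. an upright camera.
```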
To simulate motion during a simplified serve, the racket model was rotated about an axis 10.16 cm (4 inches) from the butt, aligned with the global Y axis. This is the location of the axis of rotation used to define the swing weight of a racket. The racket was rotated about the Y axis between −40° and 30° in 2° increments, with 0° corresponding to upright. Silhouette images of the racket model were rendered every 2°; to obtain sufficient silhouette images at this angular spacing for typical racket head speeds during a serve [18, 45,46,47,48], a high-speed camera would need to operate at 200 frames per second (fps). The algorithm was instructed to perform two optimisations: the first worked backwards from 0° to −40°, and the second worked forwards from 0° to 30°. An orientation of 0° was used as the starting point, because this position was found to provide a more accurate pose initialisation. Thus, for the first optimisation, with the racket orientated at 0°, the candidate relative pose was obtained using the method described in Sect. 2.2. This scenario requires the operator to provide the algorithm with an initial approximate distance between the camera and the racket; e.g., 14 m should be sufficient for baseline shots (Fig. 3). Subsequent optimisations were then initialised using the camera pose estimate from the previous solution. The 3D racket positions were obtained by using the camera pose estimates to reconstruct the 130 coordinates on the face plane in the Y, Z, and resultant dimensions for each angle. Reconstruction results were validated against known 3D coordinates obtained from the racket model mesh.
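The simulated serve rotation can be sketched as a rotation of racket points about the offset swing-weight axis. This Python/NumPy illustration assumes the butt at the global origin with the upright racket's long axis along global Z (consistent with the face lying in the Y–Z plane); the racket-tip coordinate is a hypothetical value for the example:

```python
import numpy as np

PIVOT = np.array([0.0, 0.0, 0.1016])  # 10.16 cm (4 in) from the butt

def rotate_about_pivot_y(points, angle_deg):
    """Rotate points about an axis parallel to the global Y axis passing
    through PIVOT (the swing-weight axis); 0 deg corresponds to upright."""
    a = np.radians(angle_deg)
    R = np.array([[ np.cos(a), 0.0, np.sin(a)],
                  [ 0.0,       1.0, 0.0      ],
                  [-np.sin(a), 0.0, np.cos(a)]])
    return (np.asarray(points, float) - PIVOT) @ R.T + PIVOT

angles = np.arange(-40, 32, 2)        # -40 to 30 deg in 2 deg increments
tip = np.array([0.0, 0.0, 0.685])     # hypothetical racket-tip point (m)
sweep = [rotate_about_pivot_y([tip], a)[0] for a in angles]
# One rendered silhouette per 2 deg step; the tip stays at a fixed
# distance from the pivot throughout the sweep.
```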