Detection of Aerial Balls Using a Kinect Sensor
Detection of objects in the air is a difficult problem, given the dynamics and speed of a flying object. The problem becomes even harder in an uncontrolled environment, where the predominance of a given color is not guaranteed, and/or when the vision system is located on a moving platform. As an example, most Middle Size League teams in the RoboCup competition detect objects in the environment using an omnidirectional camera that can only detect the ball when it is on the ground, losing any precise information about the ball position when it is in the air. In this paper we present a first approach towards the detection of a flying ball using a Kinect camera as sensor. The approach uses only 3D data and does not consider, at this time, any additional intensity information. The objective at this stage is to evaluate how useful 3D information is in the Middle Size League context. A simple algorithm to detect a flying ball and estimate its trajectory was implemented, and preliminary results are presented.
Keywords: Ground Plane · Ball Position · Kinect Sensor · Occupancy Grid · Mask Size
1 Introduction
To our knowledge, most RoboCup Middle Size League (MSL) teams have vision systems with limited ability to detect the ball when it is in flight. Most teams use an omnidirectional camera on top of the robots that only detects the ball position correctly when the ball is on the ground, since they rely on projective geometry and a single camera.
Given that most robots shoot the ball through the air, the ability to detect the ball in flight is very relevant. Obvious solutions using more than a single camera (either two additional cameras providing stereo vision, or the omnidirectional camera combined with additional cameras) can be considered, as in [1, 2, 3]. However, they present some limitations: the additional cameras may point outside the field, where background or color information cannot be used to simplify ball segmentation (a flying ball may appear against any possible background: tribunes, chairs, spectators, etc.), or they may have a limited field of view (most omnidirectional cameras point downwards, covering a maximum height of around 60 cm).
In a previous work, we developed algorithms based on color and shape detection using a single perspective camera, and the above-mentioned problems were observed. The work presented in this paper uses a different vision approach, based on a depth sensor instead of an intensity sensor. It presents several similarities with the work of Khandelwal et al., which uses Kinect sensors as a low-cost ground-truth detection system. As 3D sensor, a Kinect was chosen given its low price (making it usable in an aggressive environment such as RoboCup), its ability to directly provide 3D depth information, and its refresh rate of 30 fps, similar to the RGB camera used in the omnidirectional vision system of the robots.
2 3D Data
The Kinect is a motion sensing input device developed by Microsoft and launched in November 2010. The sensor includes an RGB camera, an infrared laser projector, a monochrome CMOS sensor, and other components less relevant for our application. The field of view is 57 degrees horizontally and around 43 degrees vertically. Acquisition of the 3D data from the Kinect is done using a C++ wrapper for libfreenect that transforms depth images into an OpenCV matrix. In an initial stage of the application, ROS was used to access the 3D point cloud directly. However, the time needed for processing and transmitting the 3D data was large, and direct access to the 2D depth data proved faster. Besides, the CAMBADA team code structure is not ROS based, which would ultimately become an issue. The transformation from raw data to metric coordinates is done using a formula found in an online manual.
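As an illustration, a widely circulated first-order approximation for this raw-to-metric conversion can be sketched as follows (a Python sketch; the actual implementation is in C++, and the constants below come from online calibration notes, so they should be re-calibrated for a specific sensor):

```python
def raw_depth_to_metres(raw: int) -> float:
    """Convert an 11-bit Kinect raw disparity value to depth in metres.

    First-order approximation from online Kinect calibration notes;
    the constants are illustrative and sensor-dependent.
    """
    if raw >= 2047:  # 2047 marks "no reading" on the 11-bit sensor
        return float("nan")
    return 1.0 / (raw * -0.0030711016 + 3.3309495161)
```

Values below roughly 2047 map to the usable Kinect range; for example, a raw value of 1000 corresponds to a depth of roughly 3.85 m, close to the practical limit discussed in Sect. 3.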
3 Ball Detection Algorithm
Despite the limitations inherent to the discretization of space, which are more visible at larger distances, the use of 3D information still seems to be a good option.
3.1 Flying Objects
A first approach to ball detection using the Kinect cloud of points as a source would be to use geometry, for example fitting half spheres to the data and trying to find areas of interest. Preliminary trials were performed with this approach but, given the discretization effect, this fitting only appeared to provide reliable data when close to the sensor (within the typical Kinect working range, less than 4 m). For longer distances, the spherical shape of the ball becomes difficult to detect.
The approach used in this paper is to detect flying objects within the Kinect field of view. To achieve this, given the properties of a flying ball in the MSL environment, we decided to voxelize the space and work in an occupancy voxel space rather than considering the whole cloud of points.
The values used for the grid and the mask obviously depend on the size of the ball to detect. However, they have to be defined taking two main aspects into consideration: (1) the grid cells must be large enough that a real flying ball, when voxelized, never falls entirely between two grid planes; this issue becomes more critical at larger distances; (2) the mask must be large enough to accommodate a volume larger than the ball, since some blurring is inevitable at the high speeds a ball can reach.
With this mask approach, we expect to rule out false positives from any other object on the field of play, since all other artifacts during a game can only be a robot or a human. Since all of them have a clear “connection” with the ground, the mask will not allow a valid detection.
Also, any object that could effectively be identified by the mask as a ball, at this point, could be outside the field of play. Since this is a complementary vision system for our robots and since they know their position inside the field of play, further integration steps will be responsible for handling these possible false positives.
In our preliminary tests, in a field with limited range and a wall at 7 m, we empirically set the following values for the grid and mask sizes: the cell size of the occupancy grid is 0.27 m, and the mask size (corresponding to the outside of the mask) is 5 voxels. With our test scenario, the resulting mask is therefore around 1.35 m wide, meaning that a flying object must be at least 0.27 m away from any other structure to be detected. These values were chosen to avoid wrong detections related to objects close to the wall at 7 m, where the discretization effect of the Kinect is significant. In Fig. 4 we present the ball correctly detected in three consecutive frames corresponding to a kick away from the sensor.
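The voxel-mask test described above can be sketched as follows (a minimal Python sketch; the actual implementation is in C++, the function and variable names are ours, and the 0.27 m grid and size-5 mask are the values from our test scenario):

```python
from collections import Counter

GRID = 0.27  # voxel edge length in metres (value from our test scenario)
MASK = 5     # mask side in voxels -> 5 * 0.27 m ~= 1.35 m

def voxelize(points):
    """Map metric 3D points to occupancy counts on a regular voxel grid."""
    grid = Counter()
    for x, y, z in points:
        grid[(int(x // GRID), int(y // GRID), int(z // GRID))] += 1
    return grid

def flying_candidates(grid, min_hits=1):
    """Occupied voxels whose surrounding mask shell is empty.

    A flying ball occupies the centre of the mask while the outer shell
    contains no points; anything attached to the ground (robots, humans)
    keeps occupied neighbours inside the shell and is rejected.
    """
    r = MASK // 2
    out = []
    for (vx, vy, vz), hits in grid.items():
        if hits < min_hits:
            continue
        shell_empty = all(
            grid[(vx + dx, vy + dy, vz + dz)] == 0
            for dx in range(-r, r + 1)
            for dy in range(-r, r + 1)
            for dz in range(-r, r + 1)
            if max(abs(dx), abs(dy), abs(dz)) == r  # outer shell only
        )
        if shell_empty:
            out.append((vx, vy, vz))
    return out
```

An isolated cluster of points (the flying ball) passes the test, while a vertical structure rising from the ground keeps occupied voxels inside the shell of every one of its voxels and is discarded.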
3.2 Ground Objects
The algorithm presented in the previous section is only suitable for detecting flying balls, but it can easily be adapted to detect balls on the ground by ignoring the bottom part of the 3D mask. To apply this idea, it is necessary to know where the ground is and to apply a different 3D mask to voxels lying just above it.
To optimize the process, and since the height of the sensor usually does not change during acquisition, the rigid-body transform between the original coordinate frame and the ground is computed only once, on the first image (since the RANSAC plane fitting is quite heavy); every following point cloud is then transformed to align the ground plane with the \(XY\) coordinate plane. The final algorithm is exactly the same, but uses the mask for flying objects when \(y > 1\) and the ground mask when \(y = 1\).
Figure 6 shows the results of ground ball detection in 4 consecutive images corresponding to a kick away from the user with the ball on the field.
4 Trajectory Estimation
Besides detecting the ball in the environment, it is important for the robot to estimate the ball trajectory in order to predict the best action to take. In robotic soccer, the algorithm for trajectory estimation presented in this section is useful so that the goalkeeper can move to a position in order to prevent a goal.
For flying objects, and considering that air resistance is negligible, the trajectory can be approximated by a simple ballistic trajectory. To perform this estimation, we keep track of the last ball positions. Currently, the 10 previous positions are kept, since this is enough for most of the flying movements detected.
The trajectory estimation algorithm computes the 2D vector (x, y) between the current ball position and the initial point of the trajectory (the first detected ball that supports the current trajectory). It then uses the norm of this vector to find the point on the trajectory (according to the currently estimated ballistic curve) at the same distance. If the Euclidean distance between the observed ball position and this point on the trajectory is below a threshold (empirically set to 0.25 m in the current experiments), the position is considered to support the current trajectory.
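This support test can be sketched as follows (a Python sketch; the `ballistic` interface and the names are our assumptions, and only the 0.25 m threshold comes from the experiments):

```python
def supports_trajectory(candidate, origin, ballistic, thresh=0.25):
    """Check whether a new detection supports the current trajectory.

    `ballistic(s)` is assumed to return the expected 3D point at
    horizontal distance `s` from `origin`, the first detection of the
    trajectory. The 0.25 m threshold is the empirical value used in
    the current experiments.
    """
    dx = candidate[0] - origin[0]
    dy = candidate[1] - origin[1]
    s = (dx * dx + dy * dy) ** 0.5  # norm of the 2D (x, y) vector
    expected = ballistic(s)
    err = sum((candidate[i] - expected[i]) ** 2 for i in range(3)) ** 0.5
    return err < thresh
```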
Given a number of points supporting the trajectory, Singular Value Decomposition (SVD) is used to compute the parabolic equation that best fits all the supporting coordinates. The algorithm used is the one implemented in the Eigen library.
With the trajectory estimated, the agent can use the current projection of the ball position on the ground and even the predicted touchdown point of the ball.
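As a sketch of this estimation step (the actual implementation uses Eigen's SVD in C++; here the same least-squares problem is solved with 3x3 normal equations and Cramer's rule, which is equivalent for this small, well-conditioned fit, and all names are illustrative):

```python
def fit_parabola(samples):
    """Least-squares fit of h = a*s^2 + b*s + c to (s, h) samples."""
    S = [sum(s**k for s, _ in samples) for k in range(5)]      # sums of s^0..s^4
    T = [sum(h * s**k for s, h in samples) for k in range(3)]  # sums of h*s^k
    A = [[S[4], S[3], S[2]],
         [S[3], S[2], S[1]],
         [S[2], S[1], S[0]]]
    rhs = [T[2], T[1], T[0]]

    def det3(m):
        return (m[0][0] * (m[1][1]*m[2][2] - m[1][2]*m[2][1])
              - m[0][1] * (m[1][0]*m[2][2] - m[1][2]*m[2][0])
              + m[0][2] * (m[1][0]*m[2][1] - m[1][1]*m[2][0]))

    d = det3(A)
    coeffs = []
    for col in range(3):           # Cramer's rule, one column at a time
        Ai = [row[:] for row in A]
        for r in range(3):
            Ai[r][col] = rhs[r]
        coeffs.append(det3(Ai) / d)
    return tuple(coeffs)           # (a, b, c)

def touchdown_distance(a, b, c):
    """Largest positive root of a*s^2 + b*s + c = 0: the predicted
    touchdown distance along the trajectory."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return None
    roots = [(-b + disc**0.5) / (2 * a), (-b - disc**0.5) / (2 * a)]
    positive = [r for r in roots if r > 0]
    return max(positive) if positive else None
```

With the fitted coefficients, the projection of the ball on the ground at any distance and the predicted touchdown point follow directly from the parabola.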
5 Kinect Position Calibration on the Robot
6 Experimental Results
In this configuration, the robot evaluated the ball position in several flying-ball tests and provided feedback to the base station on the estimated intersection point (projecting the 3D position onto the ground and computing the closest point to the line). Visual observation on the base station matched expectations, as the robot moved in the correct direction.
Ball position evaluation with the proposed vision system. The ball was fixed by a thin wire at a height of 1 m in front of the robot (\(x=0\)) at several distances (\(y\)).
7 Conclusion and Future Work
In this paper we present preliminary work toward the detection of 3D flying balls using only the depth information provided by a Kinect. The final objective is to use the algorithms presented here to extend the current vision of our MSL robotic agent to cope with 3D object positions. An algorithm based on an occupancy grid was developed. Preliminary results are encouraging since the algorithm can process the Kinect data in real time (processing at 30 fps) and the visual inspection of the detection is quite convincing.
Regarding future work, the first step is the full integration of the Kinect on the platform, to allow real experiments during a game in the presence of objects other than the ball. Ball validation using the RGB image is also under development, in order to reject any false positives that may occur.
This work was developed at the Institute of Electronics and Informatics Engineering of the University of Aveiro and was partially supported by FEDER through the Operational Program Competitiveness Factors - COMPETE and by National Funds through FCT - Foundation for Science and Technology, in the context of project FCOMP-01-0124-FEDER-022682 (FCT reference PEst-C/EEI/UI0127/2011).
- 2. Voigtländer, A., Lange, S., Lauer, M., Riedmiller, M.: Real-time 3D ball recognition using perspective and catadioptric cameras (2007)
- 3. Scaramuzza, D., Pagnottelli, S., Valigi, P.: Ball detection and predictive ball following based on a stereoscopic vision system. In: IEEE International Conference on Robotics and Automation, pp. 1561–1566 (2005)
- 7. Burrus, N.: Kinect calibration, consulted in 2013/2014
- 8. Rusu, R.B., Cousins, S.: 3D is here: point cloud library (PCL). In: IEEE International Conference on Robotics and Automation (ICRA), Shanghai, 9–13 May 2011
- 9. Guennebaud, G., Jacob, B., et al.: Eigen v3 (2010). http://eigen.tuxfamily.org