Audiovisual Integration for Racquet Sports Video Retrieval

  • Yaqin Zhao
  • Xianzhong Zhou
  • Guizhong Tang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4093)


This paper presents a new audiovisual integration scheme for structure indexing and highlight generation in racquet sports video. Instead of using low-level features, the method is built on a combination of visual and audio features. Using prior knowledge about this kind of video content and its editing rules, visual features based on dominant color and a motion attention model are applied to classify shots into two classes: global view shots and non-global view shots. The classification algorithm does not depend on a predefined court color and is robust to lighting conditions. Afterwards, important auditory features within the shots, including ball hits and applause, are detected to identify interesting events with strong semantic meaning, such as missed serves, aces, rallies and replays in tennis video. Finally, a model is built to rank rally events by excitement. The results show that the scheme can effectively identify typical scenes for highlight retrieval.
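
The dominant color step of the shot classification can be illustrated with a minimal sketch. It is not the authors' algorithm: the hue histogram, the bin count, the 0.5 threshold and the function names `dominant_hue_ratio` and `is_global_view` are all assumptions made for illustration, and the motion attention model is left out. The sketch only shows the idea that the dominant color is estimated from the frame itself rather than from a predefined court color.

```python
from collections import Counter


def dominant_hue_ratio(hues, bins=36):
    """Return (dominant_bin, fraction of pixels falling in that bin).

    `hues` is a flat list of per-pixel hue angles in degrees. The dominant
    color is taken from the frame's own histogram, so no court color has
    to be specified in advance (illustrative assumption).
    """
    if not hues:
        raise ValueError("frame has no pixels")
    width = 360 / bins
    # Clamp to the last bin to guard against floating-point edge cases.
    counts = Counter(min(bins - 1, int((h % 360) // width)) for h in hues)
    dominant_bin, count = counts.most_common(1)[0]
    return dominant_bin, count / len(hues)


def is_global_view(hues, threshold=0.5, bins=36):
    """Label a frame as global view when one hue bin covers most pixels.

    The threshold is a hypothetical value, not one reported in the paper.
    """
    _, ratio = dominant_hue_ratio(hues, bins)
    return ratio >= threshold


# A frame dominated by one court-like hue versus a frame with mixed hues.
court_frame = [120] * 80 + [30] * 20
closeup_frame = [10, 50, 90, 130, 170, 210, 250, 290, 330, 350]
print(is_global_view(court_frame), is_global_view(closeup_frame))
```

In this sketch a frame that is mostly one hue is labelled a global view regardless of which hue it is, which is one simple way to stay independent of court color.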


Keywords: Video Clip, Global View, Video Shot, Audio Feature, Dominant Color
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Yaqin Zhao: College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing, China
  • Xianzhong Zhou: School of Management and Engineering, Nanjing University, Nanjing, China
  • Guizhong Tang: School of Automation, Nanjing University of Technology, Nanjing, China
