A Foreground Segmentation Scheme
Speech, caption text and frame images are all key cues a person uses to understand video content. Based on this observation, we propose a scheme that integrates continuous speech recognition, video caption text recognition and object recognition. The video is first segmented into shots by shot detection. Caption text recognition and speech recognition are then carried out, and their results are treated as two paragraphs of text, from which only the nouns are kept. The nouns are represented as a graph whose vertices stand for the words and whose edges denote the semantic relation between two neighboring words. In the last step, we apply a dense subgraph finding method to mine the semantic meaning of the video. Experiments show that our video semantic mining method is efficient.
Keywords: Video Semantic Mining, Auto Speech Recognition, Information Fusion, Object Recognition
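The graph construction and dense subgraph steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the semantic relation between words is scored or which dense subgraph method is used, so both choices here are assumptions: edge weight is simply the number of times two nouns appear next to each other, and the dense subgraph is found with the standard greedy peeling heuristic for the densest-subgraph problem. The function names and the toy noun lists are hypothetical.

```python
from collections import defaultdict


def build_word_graph(noun_sequences):
    """Link neighboring nouns. ASSUMPTION: edge weight = adjacency count,
    a stand-in for the unspecified semantic relation score."""
    graph = defaultdict(lambda: defaultdict(int))
    for nouns in noun_sequences:
        for a, b in zip(nouns, nouns[1:]):
            if a != b:  # skip self-loops from repeated words
                graph[a][b] += 1
                graph[b][a] += 1
    return graph


def densest_subgraph(graph):
    """Greedy peeling (ASSUMPTION: not necessarily the authors' method).
    Repeatedly remove the vertex with the lowest weighted degree and keep
    the vertex set with the highest density (total edge weight / vertices)."""
    adj = {u: dict(nbrs) for u, nbrs in graph.items()}
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    total_weight = sum(degree.values()) / 2
    best_density, best_nodes = 0.0, set(adj)
    while adj:
        density = total_weight / len(adj)
        if density > best_density:
            best_density, best_nodes = density, set(adj)
        # Ties are broken alphabetically so the result is deterministic.
        u = min(adj, key=lambda v: (degree[v], v))
        for v, w in adj[u].items():
            del adj[v][u]
            degree[v] -= w
            total_weight -= w
        del adj[u]
        del degree[u]
    return best_nodes, best_density


# Hypothetical nouns kept from the caption text and the speech transcript.
caption_nouns = ["president", "summit", "economy", "trade"]
speech_nouns = ["summit", "economy", "trade", "summit", "weather"]

graph = build_word_graph([caption_nouns, speech_nouns])
topic_words, density = densest_subgraph(graph)
print(sorted(topic_words), round(density, 3))
```

On this toy input the peeling discards "president" and "weather", which each touch the rest of the graph only once, and keeps "summit", "economy" and "trade" as the dense core. A real system would replace the adjacency count with a proper semantic similarity measure.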