A New Passage Ranking Algorithm for Video Question Answering

  • Yu-Chieh Wu
  • Yue-Shi Lee
  • Jie-Chi Yang
  • Show-Jane Yen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4319)


Developing a question answering (Q/A) system involves integrating abundant linguistic resources, such as syntactic parsers and named entity recognizers, which not only impose time costs but are also unavailable in other languages. Ranking-based approaches offer both efficiency and multilingual portability, but most of them are biased toward high-frequency words. In this paper, we propose a new passage ranking algorithm that extends text Q/A toward video Q/A by searching lexical information in videos. The method takes both N-gram matches and word density into account and finds the optimal match sequence using dynamic programming. It is also efficient enough to handle real-time online video question answering. We evaluated our method with 150 actual user questions on a 45 GB video collection. For comparison, we adopted four well-known, multilingually portable ranking approaches. Experimental results show that our method outperforms the second-best approach by a relative 25.64% in MRR score.
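The abstract names three ingredients: N-gram matching, word density, and a match sequence found by dynamic programming, evaluated with MRR. The following is a minimal sketch of how such ingredients could fit together, not the scoring function defined in the paper. The function names (`match_sequence`, `passage_score`, `mean_reciprocal_rank`) and the way the terms are combined are assumptions made for illustration. The match sequence here is a longest common subsequence, used as a stand-in for the paper's optimal match sequence.

```python
from typing import List, Optional, Sequence, Tuple


def match_sequence(query: Sequence[str], passage: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (query_index, passage_index) pairs of one longest common
    subsequence, computed with dynamic programming."""
    n, m = len(query), len(passage)
    # dp[i][j] = length of the LCS of query[i:] and passage[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if query[i] == passage[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one optimal alignment.
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < n and j < m:
        if query[i] == passage[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def passage_score(query: Sequence[str], passage: Sequence[str]) -> float:
    """Hypothetical score: (matched words + adjacent matched pairs) * density.
    This combination is an assumption, not the formula from the paper."""
    pairs = match_sequence(query, passage)
    if not pairs:
        return 0.0
    # Adjacent pairs that are contiguous in both texts act as an n-gram bonus.
    adjacent = sum(
        1
        for (q0, p0), (q1, p1) in zip(pairs, pairs[1:])
        if q1 == q0 + 1 and p1 == p0 + 1
    )
    # Density: matched words divided by the passage span they cover.
    span = pairs[-1][1] - pairs[0][1] + 1
    density = len(pairs) / span
    return (len(pairs) + adjacent) * density


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """MRR over questions; a rank of None means no correct answer was returned."""
    if not ranks:
        return 0.0
    return sum(1.0 / r for r in ranks if r) / len(ranks)


if __name__ == "__main__":
    question = ["who", "won", "the", "game"]
    dense = ["the", "team", "won", "the", "game", "today"]
    sparse = ["won", "a", "prize", "in", "the", "big", "game"]
    print(passage_score(question, dense), passage_score(question, sparse))
    print(mean_reciprocal_rank([1, 2, None, 4]))
```

In this sketch, a passage whose matched words are contiguous and tightly packed scores higher than one that contains the same words scattered across a longer span. That is one way to avoid rewarding passages only for containing frequent words.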


Keywords (machine-generated, not supplied by the authors): Chinese Character; Text Component; Match Sequence; Optical Character Recognition; Question Answering





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Yu-Chieh Wu (1)
  • Yue-Shi Lee (3)
  • Jie-Chi Yang (2)
  • Show-Jane Yen (3)

  1. Department of Computer Science and Information Engineering, National Central University
  2. Graduate Institute of Network Learning Technology, National Central University, Jhongli City, Taoyuan County, Taiwan, R.O.C.
  3. Department of Computer Science and Information Engineering, Ming Chuan University, Taoyuan, Taiwan, R.O.C.
