World Wide Web, Volume 14, Issue 1, pp 53–73

Clustering Web video search results based on integration of multiple features

  • Alex Hindle
  • Jie Shao
  • Dan Lin
  • Jiaheng Lu
  • Rui Zhang


The usage of Web video search engines has been growing at an explosive rate. Because query terms are ambiguous and results are often duplicated, a good clustering of video search results is essential to enhance user experience and improve retrieval performance. Existing systems that cluster videos consider only the video content itself. This paper presents the first system that clusters Web video search results by fusing evidence from a variety of information sources besides the video content, such as title, tags and description. We propose a novel framework that can integrate multiple features and enables us to adopt existing clustering algorithms. We discuss our careful design of the different components of the system and a number of implementation decisions made to achieve high effectiveness and efficiency. A thorough user study shows that, with an innovative interface showing the clustering output, our system delivers a much better presentation of search results and hence significantly increases the usability of video search engines.


Keywords: Web video, YouTube, search results clustering, user interface





Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Alex Hindle (1)
  • Jie Shao (1)
  • Dan Lin (2)
  • Jiaheng Lu (3)
  • Rui Zhang (1)
  1. Department of Computer Science and Software Engineering, The University of Melbourne, Melbourne, Australia
  2. Department of Computer Science, Missouri University of Science and Technology, Rolla, USA
  3. School of Information and DEKE, MOE, Renmin University of China, Beijing, People’s Republic of China
