Videolization: knowledge graph based automated video generation from web content

  • Murat Kalender
  • M. Tolga Eren
  • Zonghuan Wu
  • Ozgun Cirakman
  • Sezer Kutluk
  • Gunay Gultekin
  • Emin Erkan Korkmaz


Web content can now also be accessed through a new generation of Internet-connected TVs. However, these products have failed to change users’ behavior when consuming online content: users still prefer personal computers for accessing Web content. Indeed, most online content is still designed to be accessed from personal computers or mobile devices. To overcome the usability problem of Web content consumption on TVs, this paper presents a knowledge-graph-based video generation system that automatically converts textual Web content into videos using semantic Web and computer graphics technologies. As a use case, Wikipedia articles are automatically converted into videos. The effectiveness of the proposed system is validated empirically via opinion surveys. Fifty percent of survey participants indicated that they found the generated videos enjoyable, and 42% indicated that they would like to use the system to consume Web content on their TVs.


Keywords: Semantic web, Computer graphics, Text-to-video, Entity linking, Knowledge graph, DBpedia



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Murat Kalender (1, 2)
  • M. Tolga Eren (2)
  • Zonghuan Wu (3)
  • Ozgun Cirakman (2)
  • Sezer Kutluk (2)
  • Gunay Gultekin (2)
  • Emin Erkan Korkmaz (1)

  1. Department of Computer Engineering, Faculty of Engineering, Yeditepe University, Istanbul, Turkey
  2. Huawei Technologies, Huawei Turkey R&D Center, Istanbul, Turkey
  3. Huawei Technologies, Software Lab, Santa Clara, USA
