Multimedia Tools and Applications, Volume 74, Issue 23, pp 10923–10963

SAPTE: A multimedia information system to support the discourse analysis and information retrieval of television programs

  • Moisés H. R. Pereira
  • Celso L. de Souza
  • Flávio L. C. Pádua
  • Giani D. Silva
  • Guilherme T. de Assis
  • Adriano C. M. Pereira


This paper presents SAPTE, a novel multimedia information system that supports the discourse analysis and information retrieval of television programs from their video recordings. Unlike most common systems, SAPTE uses both content-independent and content-dependent metadata, which are determined by applying discourse analysis techniques together with image and audio analysis methods. The system was developed in partnership with the free-to-air Brazilian TV channel Rede Minas to provide TV researchers with computational tools that assist their studies of this media universe. It is based on the Matterhorn framework for managing video libraries and combines: (1) discourse analysis techniques for describing and indexing the videos, which consider aspects such as the definition of the subject of analysis, the nature of the speaker, and the corpus of data resulting from the discourse; (2) Julius, a state-of-the-art decoder for large-vocabulary continuous speech recognition; (3) image and frequency-domain techniques that compute visual signatures for the video recordings, containing color, shape, and texture information; and (4) hashing and k-d tree methods for data indexing. The experimental results validate the capabilities of SAPTE and indicate that it is a promising computational tool for TV researchers.
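As an illustration of item (4), the following is a minimal sketch of k-d tree indexing over visual signatures. It is not SAPTE's implementation: the three-component signature (one color, one shape, and one texture value), the program labels, and all function names are hypothetical and chosen only to show how a nearest-signature query might work.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class Node:
    point: Vector                  # signature stored at this node
    label: str                     # hypothetical video identifier
    axis: int                      # dimension used to split at this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_kdtree(items: List[Tuple[Vector, str]], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree by splitting on the median, cycling through the axes."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, label = items[mid]
    return Node(point, label, axis,
                build_kdtree(items[:mid], depth + 1),
                build_kdtree(items[mid + 1:], depth + 1))


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two signatures."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[Node], query: Vector,
            best: Optional[Tuple[float, str]] = None) -> Optional[Tuple[float, str]]:
    """Return (squared distance, label) of the stored signature closest to query."""
    if node is None:
        return best
    d = sq_dist(node.point, query)
    if best is None or d < best[0]:
        best = (d, node.label)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < best[0]:
        best = nearest(far, query, best)
    return best


# Hypothetical signatures: (color, shape, texture) values scaled to [0, 1].
signatures = [
    ((0.10, 0.80, 0.30), "news_01"),
    ((0.90, 0.20, 0.50), "talkshow_02"),
    ((0.40, 0.40, 0.90), "debate_03"),
    ((0.15, 0.75, 0.35), "news_04"),
]
tree = build_kdtree(signatures)
query = (0.12, 0.78, 0.33)
match = nearest(tree, query)
print(match[1])
```

A query signature is compared against only part of the tree, because any subtree lying farther away than the current best match is skipped. This pruning is the usual reason to prefer a k-d tree over a linear scan for low-dimensional signatures.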


Content-based video retrieval; Video indexing; Television; Discourse analysis



The authors gratefully acknowledge the financial support of FAPEMIG-Brazil under Procs. APQ-01180-10 and APQ-02269-11; CEFET-MG under Procs. PROPESQ-088/12 and PROPESQ-076/09; CAPES-Brazil and CNPq-Brazil.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Moisés H. R. Pereira (1)
  • Celso L. de Souza (2)
  • Flávio L. C. Pádua (1)
  • Giani D. Silva (3)
  • Guilherme T. de Assis (4)
  • Adriano C. M. Pereira (5)

  1. Department of Computing, CEFET-MG, Belo Horizonte, Brazil
  2. Department of Computing, IFSudeste-MG, São João del-Rei, Brazil
  3. Department of Languages, CEFET-MG, Belo Horizonte, Brazil
  4. Department of Computing, UFOP, Belo Horizonte, Brazil
  5. Department of Computer Science, UFMG, Belo Horizonte, Brazil
