A video indexing and retrieval computational prototype based on transcribed speech

Spolaôr, Newton; Lee, Huei Diana; Takaki, Weber Shoity Resende; Ensina, Leandro Augusto; Parmezan, Antonio Rafael Sabino; Oliva, Jefferson Tales; Coy, Claudio Saddy Rodrigues; Wu, Feng Chung

doi:10.1007/s11042-021-11401-1

Published: 30 August 2021

Volume 80, pages 33971–34017, (2021)

Multimedia Tools and Applications

Newton Spolaôr (ORCID: orcid.org/0000-0003-0748-3693)¹,
Huei Diana Lee¹,
Weber Shoity Resende Takaki¹,
Leandro Augusto Ensina¹,
Antonio Rafael Sabino Parmezan¹,²,
Jefferson Tales Oliva¹,³,
Claudio Saddy Rodrigues Coy⁴ &
Feng Chung Wu¹,⁴

Abstract

Voice interaction with computer systems is attractive in medicine and other areas because it is user-friendly and flexible. Video indexing and retrieval have benefited from this resource, yet few initiatives use speech recognition to support both tasks. This work develops and evaluates a prototype system that indexes and retrieves videos from speech transcription. The user narrates each video’s content, and the system captures the utterance, transcribes it into text, and timestamps it. Simple text processing techniques are applied to the transcript before indexing. Afterward, the user can query by speech or text to find relevant, previously indexed videos. We evaluated the prototype experimentally on sets of 50 and 10 public videos. One collaborator manually narrated the 50 videos, while four others narrated a subset of 13 videos. An automatic narration scheme was also applied to this subset and to the set of 10 videos. The evaluation showed promising results for Brazilian Portuguese speech recognition and retrieval performance: the average word error rate was as low as 0.03, and the mean average precision reached 1.00. Besides performing well, the tool is flexible, since few changes are required to support other languages.
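
The two metrics reported in the abstract can be illustrated with a short sketch. It uses the standard textbook definitions of word error rate (word-level edit distance divided by the number of reference words) and mean average precision. The function names, the whitespace tokenization, and the lowercasing step are assumptions made for illustration; the abstract does not specify how the authors normalized transcripts or computed the scores.

```python
from typing import Iterable, Sequence, Set, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumption: lowercase and split on whitespace; real pipelines may
    # also strip punctuation or normalize numbers.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def average_precision(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Mean of precision@k taken at each rank k where a relevant video appears."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, video_id in enumerate(ranked_ids, start=1):
        if video_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average of per-query average precision over (ranking, relevant set) pairs."""
    runs = list(runs)
    if not runs:
        raise ValueError("at least one query is required")
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical transcript pair: the recognizer dropped one word.
    print(word_error_rate("a reforma da previdência será o tema",
                          "a reforma da previdência será tema"))
    # Hypothetical rankings for two queries over made-up video identifiers.
    print(mean_average_precision([(["v1", "v2"], {"v1"}),
                                  (["v2", "v1"], {"v1"})]))
```

Under these definitions, a word error rate of 0.03 corresponds to about three word-level errors per hundred reference words, and a mean average precision of 1.00 means that, for every query, all relevant videos were ranked ahead of all non-relevant ones.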

Data availability

The videos we used are publicly available, in Portuguese, at the links reported in Sect. 3.3. More information on the data or the prototype’s configuration files is available from the authors upon request.

Code availability

Currently, the code is not available.

Funding

We thank the Araucária Foundation for the Support of the Scientific and Technological Development of Paraná for a Research and Technological Productivity Scholarship awarded to H. D. Lee (grant 028/2019). We also thank PGEEC/UNIOESTE for a postdoctoral scholarship for N. Spolaôr, the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES), Finance Code 001, for an MSc scholarship for L. A. Ensina, and the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for grant number 142050/2019-9 to A. R. S. Parmezan. These agencies had no further involvement in this paper.

Author information

Authors and Affiliations

1. Laboratory of Bioinformatics, Western Paraná State University (UNIOESTE), Presidente Tancredo Neves Avenue 6731, Foz do Iguaçu, 85867-900, PR, Brazil
Newton Spolaôr, Huei Diana Lee, Weber Shoity Resende Takaki, Leandro Augusto Ensina, Antonio Rafael Sabino Parmezan, Jefferson Tales Oliva & Feng Chung Wu
2. Laboratory of Computational Intelligence, Institute of Mathematics and Computer Science, University of São Paulo (USP), São Paulo, SP, Brazil
Antonio Rafael Sabino Parmezan
3. Federal University of Technology (UTFPR), Pato Branco, PR, Brazil
Jefferson Tales Oliva
4. Service of Coloproctology, Faculty of Medical Sciences, University of Campinas (UNICAMP), Campinas, SP, Brazil
Claudio Saddy Rodrigues Coy & Feng Chung Wu

Corresponding author

Correspondence to Newton Spolaôr.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Calibration text in Brazilian Portuguese

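Tudo indica que a Reforma da Previdência será o tema de destaque. A proposta deve começar a ser discutida no plenário na quinta-feira, mas a expectativa de votação é só pra semana que vem. Está na pauta do plenário ainda uma proposta que parcela dívidas dos produtores rurais com a previdência, que substitui uma medida provisória que perdeu a validade. O texto base foi aprovado.

English translation: Everything indicates that the pension reform will be the main topic. The proposal should begin to be debated on the floor on Thursday, but a vote is only expected next week. Also on the floor agenda is a proposal to pay rural producers' social security debts in installments, which replaces a provisional measure that has expired. The base text was approved.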

1.2 Queries used for video retrieval

Table 12 First part of the queries applied in the databases built from news videos
