
Real-time visual content description system based on MPEG-7 descriptors

Published in: Multimedia Tools and Applications

Abstract

This paper presents a real-time Visual Content Description System (VCDS) based on MPEG-7 descriptors. In our approach the system’s structure is divided into two parts, the first of which is the extraction of the descriptors using the VCDS. The second part uses the descriptors’ values in a particular search algorithm. We propose here original solutions for both parts. The proposed system architecture could be used for real-time video indexing and retrieval, content summarization, content delivery, surveillance, personalized services, etc. The descriptor extractor IP core, which is part of the VCDS, implements four MPEG-7 visual descriptors and was designed for ASIC implementation in CMOS 0.35 μm, which is a novel solution for a real-time content description problem. The proposed hardware architecture splits the computational burden into several threads, so that calculations are made simultaneously in order to improve the system’s speed. These methods make the hardware implementation of the most computationally demanding modules of the system more time- and power-efficient. Four different variations of the basic hardware architecture are discussed. New search algorithms based on the VCDS responses are also proposed. Experimental results demonstrate the effectiveness of the hardware architectures, and the new approach to similarity-based searching methods.




Author information


Correspondence to Rafał Kapela.

Appendix: Object matching techniques


In this section we describe the three object matching techniques used in our experiments. The inputs to the matching process are always two color images, I′ and I″. The i-th object extracted from image I′ is denoted \(O^{\prime}_{i}\) (and likewise \(O^{\prime\prime}_{j}\) for image I″). Object properties are indicated as follows:

  • .Vol — the object’s area (volume);

  • .SCD — the object’s Scalable Color Descriptor (SCD);

  • .EHD — the object’s Edge Histogram Descriptor (EHD);

  • .x, .y — the coordinates of the object’s center of mass.
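The property notation above can be mirrored in a minimal sketch. The class and helper names (`DetectedObject`, `l1`) are our illustration, not part of the original system; the fields correspond one-to-one to the properties listed:

```python
from dataclasses import dataclass

@dataclass
class DetectedObject:
    """Hypothetical container mirroring the .Vol/.SCD/.EHD/.x/.y notation."""
    Vol: float            # .Vol  - object's area (volume)
    SCD: list             # .SCD  - Scalable Color Descriptor vector
    EHD: list             # .EHD  - Edge Histogram Descriptor vector
    x: float = 0.0        # .x, .y - center-of-mass coordinates
    y: float = 0.0

def l1(a, b):
    """L1 (city-block) distance between two descriptor vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))
```

All three matching techniques below reduce to L1 comparisons between such descriptor vectors.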

1.1 Combined matching technique

The combined matching technique relies on a two-step image similarity measurement. The steps are called griddles because of their specific task: each one filters similar images based on a different measurement technique. The first griddle passes those images which have a sufficient number of similar features.

Assume that n and m are the numbers of objects in images I′ and I″ respectively. Then we have:

$$ \left\{\exists i,j \ : \ D\left(O^{\prime}_{i},O_{j}^{\prime\prime}\right)\leq t_1,t_2 \right\}\Rightarrow \left\{s=s+1\right\} \label{eq:com-match-1-griddle} $$
(8)

where

$$ D\left(a,b\right) \leq t_1, t_2 \Leftrightarrow \left\{\left[ L_1\left(a.SCD,b.SCD\right)\leq t_1 \right]\wedge\right. \left.\left[L_1\left(a.EHD,b.EHD\right)\leq t_2 \right] \right\}\\ \label{eq:com-match-dist} $$
(9)

and i = 1,…,n, j = 1,…,m; s is the number of similar features found in both images.

The second griddle is based on the calculation of a combined distance which takes into account the SCD, the EHD and the vertical placement of the objects in the images:

$$ \left\{\left[s\geq 0.6n\right] \vee \left[s\geq0.9m\right]\right\}\Rightarrow \left\{\forall i,j \left[\mathrm{min}\left(L_1\left(O^{'}_{i}.SCD,O^{''}_{j} .SCD\right)\right)\right] \ : \ \mathrm{d}\left(O^{'}_{i},O^{''}_{j}\right) \right\} $$

where

$$ \mathrm{d}\left(a,b\right) = \left\{1.5L_1\left(a.SCD,b.SCD\right)\right. + \left. 4L_1\left(a.EHD,b.EHD\right) + 8|a.y-b.y|\right\}\\ \label{eq:com-match-dist-final} $$
(10)

is the distance that allows us to rank the images passed through the first griddle.
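The two griddles above can be sketched as follows. This is a minimal illustration, not the system's implementation: the object class and function names are ours, and how the pairwise distance d of Eq. (10) is aggregated into a single image-level score (here, a sum over the best SCD match for each object) is our assumption:

```python
from dataclasses import dataclass

@dataclass
class Obj:
    SCD: list   # Scalable Color Descriptor vector
    EHD: list   # Edge Histogram Descriptor vector
    y: float    # vertical coordinate of the mass center

def l1(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))

def combined_match(objs1, objs2, t1, t2):
    """Two-griddle combined matching, a sketch of Eqs. (8)-(10).

    Returns an image-level distance, or None when the first griddle
    rejects the image pair.
    """
    n, m = len(objs1), len(objs2)
    # First griddle (Eqs. 8-9): count object pairs whose SCD and EHD
    # L1 distances both fall under the thresholds t1, t2.
    s = sum(1 for a in objs1 for b in objs2
            if l1(a.SCD, b.SCD) <= t1 and l1(a.EHD, b.EHD) <= t2)
    if not (s >= 0.6 * n or s >= 0.9 * m):
        return None
    # Second griddle: for each object of the first image take the
    # SCD-closest object of the second, then score it with Eq. (10).
    total = 0.0
    for a in objs1:
        b = min(objs2, key=lambda o: l1(a.SCD, o.SCD))
        total += (1.5 * l1(a.SCD, b.SCD)
                  + 4 * l1(a.EHD, b.EHD)
                  + 8 * abs(a.y - b.y))
    return total
```

Identical images yield a distance of 0, and a pair with no sufficiently similar objects is rejected by the first griddle.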

1.2 Similar objects matching technique

This method is quite similar to the previous one. We have modified the similarity measure used in the first step: when two objects are matched as similar, the corresponding bits in an additional vector are set in order to exclude those objects from the remainder of the similarity measurement process.

Assume that n and m are the numbers of objects in images I′ and I″ respectively. We initialize a vector \(M\in\Im^{1\times n}, \ \left\{ \forall i \ : \ M\left(i\right)=0 \right\}\).

The first griddle works as follows:

$$ \left\{\left[\exists i,j \ : \ D\left(O^{'}_{i},O_{j}^{''}\right)\leq t_1,t_2\right] \wedge \left[M\left(i\right)=0\right] \right\}\Rightarrow \left\{s=s+1, M\left(i\right)=1\right\} $$

The distance D is computed in the same manner as in the previous matching method (9). Note that in this technique the placement of the objects is not taken into account. For further distance calculations we assume

$$ \left\{\left[s\geq 0.6n\right] \vee \left[s\geq 0.9m\right]\right\}\Rightarrow \left\{\forall i,j \left[\mathrm{min}\left(L_1\left(O^{'}_{i}.SCD,O^{''}_{j} .SCD\right)\right)\right] \ : \ \mathrm{d}\left(O^{'}_{i},O^{''}_{j}\right) \right\} $$

where

$$ \mathrm{d}\left(a,b\right) = \left\{1.5L_1\left(a.SCD,b.SCD\right) +\right. \left. 4L_1\left(a.EHD,b.EHD\right)\right\} \label{eq:sim-match-dist-final} $$
(11)

is the distance that allows us to rank the images passed through the first griddle.
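The masked first griddle above can be sketched as follows; the names are ours, and we break out of the inner loop after the first match so each object of the first image contributes to s at most once, which is how we read the role of the exclusion vector M:

```python
from dataclasses import dataclass

@dataclass
class Obj:
    SCD: list   # Scalable Color Descriptor vector
    EHD: list   # Edge Histogram Descriptor vector

def l1(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))

def masked_griddle(objs1, objs2, t1, t2):
    """First griddle with exclusion vector M (a sketch of Sec. 1.2).

    Once an object of the first image is matched, its bit in M is set
    so it is excluded from further similarity measurement.
    """
    n = len(objs1)
    M = [0] * n                 # M(i) = 0: object i still available
    s = 0
    for i, a in enumerate(objs1):
        if M[i] != 0:
            continue
        for b in objs2:
            if l1(a.SCD, b.SCD) <= t1 and l1(a.EHD, b.EHD) <= t2:
                s += 1
                M[i] = 1        # exclude object i from now on
                break
    return s, M
```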

1.3 The biggest object matching technique

The biggest object matching technique differs considerably from the other techniques presented. It does not involve complex similarity matching methods: the idea is to find the two biggest objects in the reference image and to calculate the distance between these objects and the most similar objects in the second image. We must assume that n, m ≥ 2. Then we have:

$$\begin{array}{r} \left\{\exists i,j \left[\mathrm{max}\left(O_{i}^{\prime}.Vol\right) \wedge \mathrm{min}\left(L_1\left(O_{i}^{\prime}.SCD,O_{j}^{\prime\prime}.SCD\right)\right)\right] \ : \ \mathrm{d}\left(O_{i}^{\prime},O_{j}^{\prime\prime}\right)\right\} \wedge\\ \left\{\exists l,k \left[\mathrm{max}\left(O_{l}^{\prime}.Vol\right) \wedge \mathrm{min}\left(L_1\left(O_{l}^{\prime}.SCD,O_{k}^{\prime\prime}.SCD\right)\right)\right] \ : \ \mathrm{d}\left(O_{l}^{\prime},O_{k}^{\prime\prime}\right)\right\} \wedge\\ \left(l\neq i\right)\wedge \left(k\neq j\right)\Rightarrow 0.5\,\mathrm{d}\left(O^{\prime}_{i},O^{\prime\prime}_{j}\right)+0.5\,\mathrm{d}\left(O^{\prime}_{l},O^{\prime\prime}_{k}\right) \end{array} $$

where the distance d is computed as follows:

$$ \mathrm{d}\left(a,b\right) = L_1(a.SCD,b.SCD) \label{eq:big-match-dist-final} $$
(12)
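The technique can be sketched as below. The names are our illustration; in particular, greedily matching the two largest reference objects in volume order, and excluding an already-taken counterpart to enforce k ≠ j, are our reading of the constraints above:

```python
from dataclasses import dataclass

@dataclass
class Obj:
    Vol: float  # object's area (volume)
    SCD: list   # Scalable Color Descriptor vector

def l1(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))

def biggest_object_match(objs1, objs2):
    """Biggest-object matching, a sketch of Sec. 1.3 (requires n, m >= 2).

    Take the two largest objects of the reference image, pair each with
    its SCD-closest (not yet taken) counterpart in the second image, and
    average the two L1 SCD distances d of Eq. (12).
    """
    assert len(objs1) >= 2 and len(objs2) >= 2
    big_two = sorted(objs1, key=lambda o: o.Vol, reverse=True)[:2]
    taken = []       # ids of counterparts already matched (k != j)
    dist = 0.0
    for a in big_two:
        candidates = [o for o in objs2 if id(o) not in taken]
        b = min(candidates, key=lambda o: l1(a.SCD, o.SCD))
        taken.append(id(b))
        dist += 0.5 * l1(a.SCD, b.SCD)
    return dist
```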


Cite this article

Kapela, R., Śniatała, P. & Rybarczyk, A. Real-time visual content description system based on MPEG-7 descriptors. Multimed Tools Appl 53, 119–150 (2011). https://doi.org/10.1007/s11042-010-0493-3
