Querying Objects Modeled by Arbitrary Probability Distributions

  • Christian Böhm
  • Peter Kunath
  • Alexey Pryakhin
  • Matthias Schubert
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4605)


In many modern applications such as biometric identification systems, sensor networks, medical imaging, geology, and multimedia databases, the data objects are not described exactly. Recent solutions therefore propose modeling data objects by probability density functions (pdfs). Since the pdf describing an uncertain object is often not explicitly known, approximation techniques such as Gaussian mixture models (GMMs) need to be employed. In this paper, we introduce a method for efficiently indexing and querying GMMs that allows fast object retrieval for arbitrarily shaped pdfs. We consider probabilistic ranking queries, which are very important for probabilistic similarity search. Our method stores the components and weighting functions of each GMM in an index structure. During query processing, the mixture models are dynamically reconstructed whenever necessary. In an extensive experimental evaluation, we demonstrate that GMMs yield a compact and descriptive representation of video clips. Additionally, we show that our new query algorithm outperforms competitive approaches when answering the given probabilistic queries on a database of GMMs comprising about 100,000 single Gaussians.
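The idea of storing individual Gaussian components and reassembling each object's mixture at query time can be illustrated with a minimal sketch. It is not the index structure or query algorithm of the paper: it scans a flat list instead of an index, and it assumes diagonal covariances, an exact query point, and uniform priors over objects. The names `gaussian_pdf`, `rank_objects`, `components`, and the `clip_*` identifiers are hypothetical.

```python
import math
from collections import defaultdict


def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at point x (assumed model)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * math.log(2.0 * math.pi * vi) - (xi - mi) ** 2 / (2.0 * vi)
    return math.exp(log_p)


# Flat component store standing in for the index:
# (object id, mixture weight, mean vector, per-dimension variance).
# The weights of each object's components sum to 1.
components = [
    ("clip_a", 0.6, (0.0, 0.0), (1.0, 1.0)),
    ("clip_a", 0.4, (3.0, 3.0), (0.5, 0.5)),
    ("clip_b", 1.0, (5.0, 5.0), (1.0, 1.0)),
    ("clip_c", 0.5, (0.5, 0.0), (2.0, 2.0)),
    ("clip_c", 0.5, (8.0, 8.0), (1.0, 1.0)),
]


def rank_objects(query, components, k):
    """Return the k objects with the highest probability of matching the query.

    Each object's mixture density is rebuilt by summing its weighted
    components; densities are then normalized over all objects
    (Bayes' rule with uniform priors, an assumption of this sketch).
    """
    density = defaultdict(float)
    for obj_id, weight, mean, var in components:
        density[obj_id] += weight * gaussian_pdf(query, mean, var)
    total = sum(density.values())
    if total == 0.0:
        return []
    ranked = sorted(density.items(), key=lambda item: item[1], reverse=True)
    return [(obj_id, d / total) for obj_id, d in ranked[:k]]


if __name__ == "__main__":
    for obj_id, prob in rank_objects((0.1, 0.1), components, k=3):
        print(obj_id, prob)
```

A linear scan like this touches every component for every query; an index over the components is what would let a query algorithm skip mixtures that cannot reach the top ranks.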


Keywords: Sensor Network, Video Clip, Gaussian Mixture Model, Query Processing, Index Structure





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Christian Böhm (1)
  • Peter Kunath (1)
  • Alexey Pryakhin (1)
  • Matthias Schubert (1)

  1. Institute for Computer Science, Ludwig-Maximilians-Universität München
