Personal and Ubiquitous Computing, Volume 17, Issue 5, pp 951–963

Top-k entities query processing on uncertainly fused multi-sensory data

  • Dexi Liu
  • Changxuan Wan
  • Naixue Xiong
  • Jong Hyuk Park
  • Seungmin Rho
Original Article


Sensor fusion combines sensory data from disparate sources so that the resulting information is in some sense better than would be possible if the sources were used individually. Such data are inherently uncertain because sensors are not perfectly precise, so an uncertain database is a natural way to store them. Finding the top-k entities according to one or more attributes is a powerful technique when an uncertain database contains a large quantity of data. However, compared with top-k queries in traditional databases, queries over uncertain databases are more complicated because the number of possible worlds is exponential. We propose a method for processing entity-based global top-k aggregate queries in uncertain databases, which return the top-k entities that have the highest aggregate values. Our method has two levels: entity state generation and G-topk-E query processing. In the former, entity states, which satisfy the properties of x-tuples, are generated one after another according to their aggregate values; in the latter, dynamic programming-based global top-k entity query processing is employed to return the answers. Comprehensive experiments on different data sets demonstrate the effectiveness of the proposed solutions.
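To make the possible-worlds difficulty and the dynamic programming idea concrete, the following is a minimal sketch, not the authors' entity-based aggregate algorithm. It assumes a simplified model in which each tuple exists independently with a given probability (no x-tuple exclusivity, no aggregation, distinct scores) and ranks tuples by their probability of appearing in the top-k of a possible world. All names (`topk_probabilities`, `topk_probabilities_bruteforce`, `global_topk`, `readings`) are hypothetical and chosen only for illustration.

```python
from itertools import product


def topk_probabilities(tuples, k):
    """Dynamic programming over tuples sorted by score (descending).

    tuples: list of (name, score, prob); tuples are ASSUMED independent
    and scores are ASSUMED distinct. Returns {name: P(name in top-k)}.
    """
    ranked = sorted(tuples, key=lambda t: t[1], reverse=True)
    # q[j] = probability that exactly j of the tuples processed so far exist,
    # tracked only for j = 0 .. k-1.
    q = [1.0] + [0.0] * (k - 1)
    result = {}
    for name, _score, p in ranked:
        # The tuple is in the top-k iff it exists and fewer than k
        # higher-scored tuples exist.
        result[name] = p * sum(q)
        # Fold this tuple into q, from high j to low j so q[j - 1] is still old.
        for j in range(k - 1, 0, -1):
            q[j] = q[j] * (1 - p) + q[j - 1] * p
        q[0] *= 1 - p
    return result


def topk_probabilities_bruteforce(tuples, k):
    """Same quantity by enumerating all 2^n possible worlds (exponential)."""
    result = {name: 0.0 for name, _, _ in tuples}
    for world in product([False, True], repeat=len(tuples)):
        world_prob = 1.0
        present = []
        for exists, (name, score, p) in zip(world, tuples):
            world_prob *= p if exists else 1 - p
            if exists:
                present.append((score, name))
        # Credit the k highest-scored tuples that exist in this world.
        for _score, name in sorted(present, reverse=True)[:k]:
            result[name] += world_prob
    return result


def global_topk(tuples, k):
    """Return the k names with the highest top-k probability."""
    probs = topk_probabilities(tuples, k)
    return sorted(probs, key=probs.get, reverse=True)[:k]


# Hypothetical fused sensor readings: (name, score, existence probability).
readings = [("s1", 90, 0.3), ("s2", 80, 0.9), ("s3", 70, 0.8), ("s4", 60, 0.5)]
```

With these illustrative values, `global_topk(readings, 2)` returns `['s2', 's3']`: the highest-scored reading `s1` is left out because it exists with probability only 0.3. The brute-force function visits every possible world and is useful only as a cross-check on small inputs, whereas the dynamic programming version does O(nk) work after sorting.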


G-topk-E-Agg query, e-Tuple, Dynamic programming algorithm, Uncertain database, Multi-sensory data



This work is supported by the Natural Science Foundation of China (No. 60803105) and the Science & Technology Project of the Department of Education of Jiangxi Province (No. GJJ08508). The authors are grateful to the anonymous reviewers of the 4th International Symposium on Security and Multimodality in Pervasive Environments (SMPE2010), who made constructive comments.



Copyright information

© Springer-Verlag London Limited 2012

Authors and Affiliations

  • Dexi Liu (1)
  • Changxuan Wan (1)
  • Naixue Xiong (2)
  • Jong Hyuk Park (3)
  • Seungmin Rho (4)
  1. Jiangxi Key Laboratory of Data and Knowledge Engineering, School of Information Technology, Jiangxi University of Finance and Economics, Nanchang, China
  2. Department of Computer Science, Georgia State University, Atlanta, USA
  3. Department of Computer Science and Engineering, Seoul National University of Science and Technology, Seoul, Korea
  4. School of Electrical Engineering, Korea University, Seoul, Korea
