Abstract
Answering top-k queries on probabilistic data has attracted much interest in applications such as market analysis, personalised services, and decision making. For probabilistic data, the most common problem in answering top-k queries is choosing the semantics of the results with respect to both their scores and their top-k probabilities. In this paper, we propose a novel top-k best probability query, which returns results that have not only the best top-k scores but also the best top-k probabilities. We also introduce an efficient algorithm for top-k best probability queries that does not require a user-defined threshold. We then analyse the top-k best probability answer and show that it satisfies the semantic ranking properties of queries on uncertain data [3,18]. Experimental studies on real data verify the effectiveness of top-k best probability queries and the efficiency of our algorithm.
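To make the notion of a top-k probability concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a simple tuple-independent model with distinct scores, in which each tuple exists with its own probability; under that assumption a tuple is in the top-k exactly when it exists and fewer than k higher-scored tuples exist. The function name `top_k_probabilities` and the `(score, probability)` input layout are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def top_k_probabilities(
    tuples: List[Tuple[float, float]], k: int
) -> List[Tuple[float, float, float]]:
    """Illustrative sketch (assumed model, not the paper's algorithm).

    tuples: (score, existence probability) pairs, assumed independent
            and assumed to have distinct scores.
    Returns (score, existence probability, top-k probability) triples
    in descending score order.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    ranked = sorted(tuples, key=lambda t: t[0], reverse=True)

    # dist[j] = probability that exactly j of the higher-scored tuples
    # processed so far exist, tracked only for j < k.
    dist = [1.0] + [0.0] * (k - 1)
    result = []

    for score, p in ranked:
        # The tuple is in the top-k if it exists and fewer than k
        # higher-scored tuples exist.
        result.append((score, p, p * sum(dist)))

        # Fold this tuple into the distribution for the tuples below it.
        updated = [0.0] * k
        for j in range(k):
            updated[j] = dist[j] * (1.0 - p)
            if j > 0:
                updated[j] += dist[j - 1] * p
        dist = updated

    return result
```

Calling `top_k_probabilities([(10, 0.5), (8, 0.5), (6, 1.0)], k=2)` yields one triple per tuple. A query semantics such as the one proposed here would then have to decide how to trade off the score column against the top-k probability column when choosing the answer set.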
Keywords
- Top-k Query
- Query Processing
- Uncertain data
References
1. Aggarwal, C.C., Yu, P.S.: A survey of uncertain data algorithms and applications. IEEE TKDE 21, 609–623 (2009)
2. Atallah, M.J., Qi, Y.: Computing all skyline probabilities for uncertain data. In: PODS, pp. 279–287 (2009)
3. Li, F., Cormode, G., Yi, K.: Semantics of ranking queries for probabilistic data and expected ranks. In: ICDE, March 29-April 2, pp. 305–316 (2009)
4. Ge, T., Zdonik, S., Madden, S.: Top-k queries on uncertain data: on score distribution and typical answers. In: SIGMOD, pp. 375–388 (2009)
5. Getoor, L.: Learning Probabilistic Relational Models. In: Choueiry, B.Y., Walsh, T. (eds.) SARA 2000. LNCS (LNAI), vol. 1864, pp. 322–323. Springer, Heidelberg (2000)
6. Hua, M., Pei, J., Zhang, W., Lin, X.: Ranking queries on uncertain data: a probabilistic threshold approach. In: SIGMOD, pp. 673–686 (2008)
7. Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. 40, 1–58 (2008)
8. Jan, C., Parke, G., Jarek, G., Dongming, L.: Skyline with presorting theory & optimizations. IIPWM 31, 595–604 (2005)
9. Jin, C., Yi, K., Chen, L., Yu, J.X., Lin, X.: Sliding-window top-k queries on uncertain streams. In: VLDB, pp. 301–312 (2008)
10. Lange, K.: Numerical analysis for statisticians. Springer, Heidelberg (1999)
11. Li, J., Saha, B., Deshpande, A.: A unified approach to ranking in probabilistic databases. In: VLDB, pp. 502–513 (2009)
12. Pang-Ning, T., Michael, S., Vipin, K.: Introduction to data mining. Library of Congress (2006)
13. Papadias, D., Tao, Y., Fu, G., Seeger, B.: An optimal and progressive algorithm for skyline queries. In: SIGMOD, pp. 467–478 (2003)
14. Pei, J., Jiang, B., Lin, X., Yuan, Y.: Probabilistic skylines on uncertain data. In: VLDB, pp. 15–26 (2007)
15. Sarma, A.D., Benjelloun, O., Halevy, A., Widom, J.: Working models for uncertain data. In: ICDE (2006)
16. Soliman, M.A., Ilyas, I.F., Chang, K.C.-C.: Top-k query processing in uncertain databases. In: ICDE, pp. 896–905 (2007)
17. Soliman, M.A., Ilyas, I.F., Chang, K.C.-C.: Probabilistic top-k & ranking-aggregate queries. ACM Trans. Database Syst. 33, 13:1–13:54 (2008)
18. Xi, Z., Jan, C.: Semantics and evaluation of top-k queries in probabilistic databases. Distributed Parallel Databases 26(1), 67–126 (2009)
19. Yan, D., Ng, W.: Robust Ranking of Uncertain Data. In: Yu, J.X., Kim, M.H., Unland, R. (eds.) DASFAA 2011, Part I. LNCS, vol. 6587, pp. 254–268. Springer, Heidelberg (2011)
20. Yi, K., Li, F., Kollios, G., Srivastava, D.: Efficient processing of top-k queries in uncertain databases with x-relations. TKDE 20, 1669–1682 (2008)
21. Zhang, S., Zhang, C.: A probabilistic data model and its semantics. Journal of Research & Practice in Information Technology 35, 237–256 (2003)
22. Zhang, W., Lin, X., Pei, J., Zhang, Y.: Managing uncertain data: probabilistic approaches. In: Web-Age Information Management (2008)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Le, T.M.N., Cao, J. (2012). Top-k Best Probability Queries on Probabilistic Data. In: Lee, Sg., Peng, Z., Zhou, X., Moon, YS., Unland, R., Yoo, J. (eds) Database Systems for Advanced Applications. DASFAA 2012. Lecture Notes in Computer Science, vol 7239. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29035-0_1
DOI: https://doi.org/10.1007/978-3-642-29035-0_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29034-3
Online ISBN: 978-3-642-29035-0
eBook Packages: Computer Science (R0)
