Abstract
Top-k dominating (TKD) queries, which return the k best items according to a comprehensive “goodness” criterion based on dominance, have recently attracted considerable attention because of their important role in many data mining applications, including multi-criteria decision making. In the Big Data era, data storage and processing are increasingly distributed, and in many real applications the data are incomplete. Existing research focuses either on centralized datasets or on complete data in distributed environments, and does not address incomplete data in distributed environments. In this work, we present the first study of processing TKD queries on incomplete data in distributed environments. Through detailed analysis, we show that although the dominance relation on incomplete data objects is non-transitive in general, transitivity does hold for some incomplete data objects with different bitmaps. Exploiting this property, we then propose TKDI-MR, a novel MapReduce-based algorithm for processing TKD queries on incomplete data in distributed environments. Extensive experiments on both real-world and large-scale synthetic datasets demonstrate that our approach achieves good efficiency and stability.
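To make the dominance notions in the abstract concrete, here is a minimal Python sketch. It assumes the convention commonly used for incomplete data since Khalefa et al.'s work on skyline queries over incomplete data: a missing value is marked `None`, an object's bitmap records which dimensions are observed, two objects are compared only on the dimensions observed in both, and smaller values are better. An object's score is the number of objects it dominates, and a TKD query returns the k objects with the highest scores. The helper names (`bitmap`, `dominates`, `score`, `top_k_dominating`) and the toy data are hypothetical. This brute-force O(n²) scan is not TKDI-MR: it reproduces neither the paper's transitivity condition nor its MapReduce partitioning.

```python
from typing import List, Optional, Sequence

# An incomplete object: None marks a missing attribute (assumed convention).
Obj = Sequence[Optional[float]]


def bitmap(o: Obj) -> str:
    """Bit i is '1' if dimension i is observed and '0' if it is missing."""
    return "".join("0" if v is None else "1" for v in o)


def dominates(p: Obj, q: Obj) -> bool:
    """p dominates q if, on the dimensions observed in both objects, p is
    no worse everywhere and strictly better somewhere (smaller is better)."""
    common = [i for i in range(len(p)) if p[i] is not None and q[i] is not None]
    if not common:
        return False  # nothing to compare on
    no_worse = all(p[i] <= q[i] for i in common)
    strictly_better = any(p[i] < q[i] for i in common)
    return no_worse and strictly_better


def score(p: Obj, data: List[Obj]) -> int:
    """Number of objects in data dominated by p (p never dominates itself)."""
    return sum(1 for q in data if dominates(p, q))


def top_k_dominating(data: List[Obj], k: int) -> List[Obj]:
    """Brute-force TKD: rank by score, descending. Python's sort is stable,
    so ties keep their input order."""
    return sorted(data, key=lambda o: score(o, data), reverse=True)[:k]


# Toy objects (hypothetical data): each pair of a, b, c shares one observed dimension.
a = (1, None, 10)   # bitmap 101
b = (3, 2, None)    # bitmap 110
c = (None, 5, 1)    # bitmap 011
d = (4, 6, 12)      # bitmap 111 (complete)
data = [a, b, c, d]

# a dominates b (dim 0), b dominates c (dim 1), and c dominates a (dim 2):
# a dominance cycle, so transitivity fails and a does not dominate c.
print(dominates(a, b), dominates(b, c), dominates(c, a), dominates(a, c))
# → True True True False
print([score(o, data) for o in data])
# → [2, 2, 2, 0]
print(top_k_dominating(data, 2))
# → [(1, None, 10), (3, 2, None)]
```

The cycle shows why a TKD algorithm on incomplete data cannot, in general, prune an object merely because something dominating it was itself dominated, which is the motivation for identifying the cases where transitivity does hold.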
This work is supported by the Jiangsu Natural Science Foundation (No. 202010006) and the Project of Shanghai Information Development Special Fund (No. XX-XXFZ-05-16-0139).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Ding, X., Yan, C., Zhao, Y., Yang, Z. (2018). Efficient Processing of Top-K Dominating Queries on Incomplete Data Using MapReduce. In: Sun, X., Pan, Z., Bertino, E. (eds) Cloud Computing and Security. ICCCS 2018. Lecture Notes in Computer Science, vol 11063. Springer, Cham. https://doi.org/10.1007/978-3-030-00006-6_44
DOI: https://doi.org/10.1007/978-3-030-00006-6_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00005-9
Online ISBN: 978-3-030-00006-6
eBook Packages: Computer Science, Computer Science (R0)