Abstract
Keyword-based query interfaces have recently become a de facto standard for information retrieval. A user issues a keyword and receives the objects that are closely related to it. How to select the necessary objects is one of the most important problems in the database literature. The top-k query is a popular method for selecting important objects from a large set of candidates: a user specifies a scoring function and a value k, and the top-k query returns the k best objects according to that scoring function. However, different users may have different scoring functions, so a given set of top-k objects is valuable only to users who share the same scoring function. In this paper, we propose a k-objects selection function that selects a varied set of k objects preferable to all users, even when their scoring functions differ. To select the k objects, we apply the idea of skyline queries. We also consider efficient computation using the MapReduce framework.
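To make the contrast between top-k and skyline queries concrete, here is a minimal Python sketch under stated assumptions: every attribute is "larger is better", each user's scoring function is linear, and the MapReduce job is simulated in-process with a hypothetical round-robin partitioning (local skylines in the map phase, one merging reducer). All function names are illustrative, and `select_k_by_dominance` is a hypothetical placeholder rule, not the k-objects selection function proposed in the paper.

```python
from typing import List, Sequence, Tuple

# Assumption: every attribute is "larger is better" (e.g. rating, cheapness).
Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: at least as good in every attribute, strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def top_k(points: Sequence[Point], k: int, weights: Sequence[float]) -> List[Point]:
    """Top-k query under one user's (assumed linear) scoring function."""
    def score(p: Point) -> float:
        return sum(w * x for w, x in zip(weights, p))
    return sorted(points, key=score, reverse=True)[:k]


def skyline(points: Sequence[Point]) -> List[Point]:
    """Block-nested-loop skyline: keep a window of points not dominated so far."""
    window: List[Point] = []
    for p in points:
        if any(dominates(q, p) for q in window):
            continue  # p is dominated: no strictly monotone scoring function ranks it first
        window = [q for q in window if not dominates(p, q)]  # evict points p dominates
        window.append(p)
    return window


def mapreduce_skyline(points: Sequence[Point], num_partitions: int) -> List[Point]:
    """Simulated two-phase MapReduce skyline (hypothetical round-robin partitioning)."""
    # Map phase: each mapper computes the local skyline of its own partition.
    partitions = [list(points[i::num_partitions]) for i in range(num_partitions)]
    local_skylines = [skyline(part) for part in partitions]
    # Reduce phase: one reducer merges the candidates. A globally non-dominated
    # point is never dominated inside its partition, so it always reaches here.
    candidates = [p for local in local_skylines for p in local]
    return skyline(candidates)


def select_k_by_dominance(points: Sequence[Point], k: int) -> List[Point]:
    """HYPOTHETICAL rule: the k skyline points that dominate the most objects.
    Illustration only; not the paper's k-objects selection function."""
    def coverage(p: Point) -> int:
        return sum(dominates(p, q) for q in points)
    return sorted(skyline(points), key=coverage, reverse=True)[:k]


if __name__ == "__main__":
    # Toy objects: (rating, cheapness), larger is better in both attributes.
    hotels = [(9, 2), (7, 7), (3, 9), (6, 5), (2, 3), (8, 4)]
    print(top_k(hotels, 2, (1, 0)))         # rating-only user: [(9, 2), (8, 4)]
    print(top_k(hotels, 2, (0, 1)))         # cheapness-only user: [(3, 9), (7, 7)]
    print(skyline(hotels))                  # [(9, 2), (7, 7), (3, 9), (8, 4)]
    print(mapreduce_skyline(hotels, 2))     # [(9, 2), (3, 9), (7, 7), (8, 4)]
    print(select_k_by_dominance(hotels, 2)) # [(7, 7), (3, 9)]
```

The two users receive disjoint top-2 answers, which illustrates why a single top-k result suits only users who share one scoring function. Only the top-1 object of a strictly monotone scoring function is guaranteed to lie in the skyline; in this toy data both top-2 answers happen to lie in it as well.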
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Siddique, M.A., Morimoto, Y. (2014). Efficient Selection of Various k-Objects for a Keyword Query Based on MapReduce Skyline Algorithm. In: Madaan, A., Kikuchi, S., Bhalla, S. (eds) Databases in Networked Information Systems. DNIS 2014. Lecture Notes in Computer Science, vol 8381. Springer, Cham. https://doi.org/10.1007/978-3-319-05693-7_3
DOI: https://doi.org/10.1007/978-3-319-05693-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05692-0
Online ISBN: 978-3-319-05693-7
eBook Packages: Computer Science, Computer Science (R0)