
Taking the Big Picture: representative skylines based on significance and diversity

  • Regular Paper
  • The VLDB Journal

Abstract

The skyline is a popular operator to extract records from a database when a record scoring function is not available. However, the result of a skyline query can be very large. The problem addressed in this paper is the automatic selection of a small number \(k\) of representative skyline records. Existing approaches have only focused on partial aspects of this problem. Some try to identify sets of diverse records giving an overall approximation of the skyline. These techniques, however, are sensitive to the scaling of attributes or to the insertion of non-skyline records into the database. Others exploit some knowledge of the record scoring function to identify the most significant record, but not sets of records representative of the whole skyline. In this paper, we introduce a novel approach that takes both the significance of all the records and their diversity into account, adapting to available knowledge of the scoring function, but also working under complete ignorance. We show the intractability of the problem and present approximate algorithms. We experimentally show that our approach is efficient and scalable, and that it improves on existing work in terms of the significance and diversity of the results.


Notes

  1. \(\le \) and \(<\) can also be used if lower values are preferred for some attributes; for simplicity, and without loss of generality, we assume throughout the paper that higher values are preferred.

  2. E.g., http://en.wikipedia.org/wiki/Diversity_index.

  3. A bound \(b\) means that the solution found by the algorithm is guaranteed to be at least \(1/b\) of the maximum achievable value.

  4. With this notation, we indicate the min between \(\overline{\delta }(\)Nas, Nov\()\) and \(\overline{\delta }(\)Nas, How\()\).

  5. http://www.databasebasketball.com.

  6. http://www.bbr.dk/fatibbr.

  7. Notice that our integration variable is \(t\), i.e., the thresholds of which we do not know the exact values.

  8. 0 indicates the null function, returning 0 for all inputs; any significance function can be used, however, because with \(\lambda =1\) its contribution to the sum is 0.

References

  1. Balke, W.T., Zheng, J., Güntzer, U.: Approaching the efficient frontier: cooperative database retrieval using high-dimensional skylines. In: Zhou, L., Ooi, B.C., Meng, X. (eds.) Database Systems for Advanced Applications, Lecture Notes in Computer Science, vol. 3453, pp. 410–421. Springer, Berlin (2005). doi:10.1007/b107189. http://www.springerlink.com/content/l2c92arjwdva2lvt/

  2. Bartolini, I., Zhang, Z., Papadias, D.: Collaborative filtering with personalized skylines. IEEE Trans. Knowl. Data Eng. 23(2), 190–203 (2011). doi:10.1109/TKDE.2010.86. http://www.computer.org/csdl/trans/tk/2011/02/ttk2011020190-abs.html


  3. Beecks, C., Assent, I., Seidl, T.: Content-based multimedia retrieval in the presence of unknown user preferences. In: Lee, K.T., Tsai, W.H., Liao, H.Y.M., Chen, T., Hsieh, J.W., Tseng, C.C. (eds.) MMM (1), Lecture Notes in Computer Science, vol. 6523, pp. 140–150. Springer, Berlin (2011)

  4. Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: ICDE, pp. 421–430. IEEE Computer Society (2001)

  5. Chan, C.Y., Eng, P.K., Tan, K.L.: Stratified computation of skylines with partially-ordered domains. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data—SIGMOD ’05, p. 203. ACM Press, New York, NY, USA (2005). doi:10.1145/1066157.1066181. http://dl.acm.org/citation.cfm?id=1066157.1066181

  6. Chan, C.Y., Jagadish, H.V., Tan, K.L., Tung, A.K.H., Zhang, Z.: Finding k-dominant skylines in high dimensional space. In: Proceedings of the SIGMOD, pp. 503–514 (2006)

  7. Chan, C.Y., Jagadish, H.V., Tan, K.L., Tung, A.K.H., Zhang, Z.: On high dimensional skylines. In: Proceedings of the EDBT, pp. 478–495 (2006)

  8. Chandra, B., Halldórsson, M.M.: Approximation algorithms for dispersion problems. J. Algorithms 38(2), 438–465 (2001)


  9. Chaudhuri, S., Dalvi, N.N., Kaushik, R.: Robust cardinality and cost estimation for skyline operator. In: Liu, L., Reuter, A., Whang, K.Y., Zhang, J. (eds.) ICDE, p. 64. IEEE Computer Society (2006)

  10. Chomicki, J., Godfrey, P., Gryz, J., Liang, D.: Skyline with presorting. In: ICDE (2003)

  11. Godfrey, P.: Skyline cardinality for relational processing. In: Seipel, D., Torres, J.M.T. (eds.) FoIKS, Lecture Notes in Computer Science, vol. 2942, pp. 78–97. Springer, Berlin (2004)

  12. Gollapudi, S., Sharma, A.: An axiomatic framework for result diversification. IEEE Data Eng. Bull. 32(4), 7–14 (2009)


  13. Goncalves, M., Vidal, M.E.: Top-k skyline: a unified approach. In: Meersman, R., Tari, Z., Herrero, P. (eds.) On the Move to Meaningful Internet Systems 2005: OTM 2005 Workshops, Lecture Notes in Computer Science, vol. 3762, pp. 790–799. Springer, Berlin (2005). doi:10.1007/11575863. http://www.springerlink.com/content/1d8q8933rvnm8280/

  14. Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. 40(4), 1–58 (2008)


  15. Jin, W., Han, J., Ester, M.: Mining thick skylines over large databases. In: Boulicaut, J.F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) PKDD, Lecture Notes in Computer Science, vol. 3202, pp. 255–266. Springer, Berlin (2004)

  16. Koltun, V., Papadimitriou, C.H.: Approximately dominating representatives. Theor. Comput. Sci. 371(3), 148–154 (2007). doi:10.1016/j.tcs.2006.11.003. http://dl.acm.org/citation.cfm?id=1225304.1225532


  17. Kontaki, M., Papadopoulos, A.N., Manolopoulos, Y.: Continuous Top-k dominating queries in subspaces. In: 2008 Panhellenic Conference on Informatics, pp. 31–35. IEEE (2008). doi:10.1109/PCI.2008.45. http://dl.acm.org/citation.cfm?id=1439269.1439313

  18. Kossmann, D., Ramsak, F., Rost, S.: Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of the VLDB, pp. 275–286 (2002)

  19. Lahaie, S.M., Parkes, D.C.: Applying learning algorithms to preference elicitation. In: Proceedings of the 5th ACM Conference on Electronic Commerce—EC ’04, p. 180. ACM Press, New York, NY, USA (2004). doi:10.1145/988772.988800. http://dl.acm.org/citation.cfm?id=988772.988800

  20. Lee, J., You, G.W., Hwang, S.W.: Telescope: zooming to interesting skylines. In: DASFAA, pp. 539–550 (2007). http://dl.acm.org/citation.cfm?id=1783823.1783883

  21. Lee, J., You, G.W., Hwang, S.W.: Personalized top-k skyline queries in high-dimensional space. Inf. Syst. 34(1), 45–61 (2009). doi:10.1016/j.is.2008.04.004


  22. Lee, J., You, G.W., Hwang, S.W., Selke, J., Balke, W.T.: Optimal preference elicitation for skyline queries over categorical domains. In: Bhowmick, S.S., Küng, J., Wagner, R. (eds.) Database and Expert Systems Applications, Lecture Notes in Computer Science, vol. 5181, pp. 610–624. Springer, Berlin (2008). doi:10.1007/978-3-540-85654-2. http://dl.acm.org/citation.cfm?id=1430456.1430523

  23. Lee, K.C.K., Lee, W.C., Zheng, B., Li, H., Tian, Y.: Z-SKY: an efficient skyline query processing framework based on Z-order. VLDB J. 19(3), 333–362 (2010). doi:10.1007/s00778-009-0166-x. http://www.springerlink.com/content/b6q547wq075114px/


  24. Chen, L., Pu, P.: Survey of Preference Elicitation Methods (2004). http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.58.5265

  25. Li, H., Tan, Q., Lee, W.C.: Efficient progressive processing of skyline queries in peer-to-peer systems. In: Infoscale (2006)

  26. Lian, X., Chen, L.: Top-k dominating queries in uncertain databases. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology (EDBT ’09), p. 660. ACM Press, New York, NY, USA (2009). doi:10.1145/1516360.1516437. http://dl.acm.org/citation.cfm?id=1516360.1516437

  27. Lin, X., Yuan, Y., Zhang, Q., Zhang, Y.: Selecting stars: the k most representative skyline operator. In: ICDE, pp. 86–95. IEEE (2007)

  28. Lo, E., Yip, K.Y., Lin, K.I., Cheung, D.W.: Progressive skylining over Web-accessible databases. Data Knowl. Eng. 57(2), 122–147 (2006). doi:10.1016/j.datak.2005.04.003. http://dl.acm.org/citation.cfm?id=1141066.1141068


  29. Lofi, C., Güntzer, U., Balke, W.T.: Efficient computation of trade-off skylines. In: Proceedings of the 13th International Conference on Extending Database Technology—EDBT ’10, p. 597. ACM Press, New York, NY, USA (2010). doi:10.1145/1739041.1739112. http://dl.acm.org/citation.cfm?id=1739041.1739112

  30. Lu, H., Jensen, C.S., Zhang, Z.: Flexible and efficient resolution of skyline query size constraints. IEEE Trans. Knowl. Data Eng. 23(7), 991–1005 (2011). doi:10.1109/TKDE.2010.47. http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5432175


  31. Magnani, M., Assent, I.: Anytime skyline query processing for interactive systems. In: DBRank (2012)

  32. Magnani, M., Assent, I.: From stars to galaxies: skyline queries on aggregate data. In: Proceedings of the 16th International Conference on Extending Database Technology (EDBT) (2013)

  33. Magnani, M., Assent, I., Hornbæk, K., Jacobsen, M.R., Larsen, K.F.: Skyview: a user evaluation of the skyline operator. In: CIKM Conference (2013)

  34. Friedman, M., Savage, L.J.: The utility analysis of choices involving risk. J. Polit. Econ. 56(4), 279–304 (1948)


  35. Minack, E., Demartini, G., Nejdl, W.: Current approaches to search result diversification. In: 1st International Workshop on Living Web: Making Web Diversity a True Asset (ISWC Conference) (2009)

  36. Mindolin, D., Chomicki, J.: Discovering relative importance of skyline attributes. Proc. VLDB Endow. 2(1), 610–621 (2009). http://dl.acm.org/citation.cfm?id=1687627.1687697

  37. Mindolin, D., Chomicki, J.: Preference elicitation in prioritized skyline queries. VLDB J. 20(2), 157–182 (2011). doi:10.1007/s00778-011-0227-9. http://www.springerlink.com/content/0123259721n421l5/


  38. Nanongkai, D., Sarma, A.D., Lall, A., Lipton, R.J., Xu, J.: Regret-minimizing representative databases. Proc. VLDB Endow. 3, 1114–1124 (2010). http://dl.acm.org/citation.cfm?id=1920841.1920980

  39. Papadias, D., Tao, Y., Fu, G., Seeger, B.: An optimal and progressive algorithm for skyline queries. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp. 467–478. ACM New York, NY, USA (2003)

  40. Papadias, D., Tao, Y., Fu, G., Seeger, B.: Progressive skyline computation in database systems. ACM TODS 30(1), 41–82 (2005)


  41. Raghavan, V., Rundensteiner, E.A.: Progressive result generation for multi-criteria decision support queries. In: ICDE (2010)

  42. Raghavan, V., Rundensteiner, E.A., Srivastava, S.: Skyline and mapping aware join query evaluation. Inf. Syst. 36(6), 917–936 (2011). doi:10.1016/j.is.2011.03.002


  43. Sarma, A.D., Lall, A., Nanongkai, D., Lipton, R.J., Xu, J.J.: Representative skylines using threshold-based preference distributions. In: Abiteboul, S., Böhm, K., Koch, C., Tan, K.L. (eds.) ICDE, pp. 387–398. IEEE Computer Society (2011)

  44. Siddique, M.A., Morimoto, Y.: Extended k-dominant skyline in high dimensional space. In: 2010 International Conference on Information Science and Applications, pp. 1–8. IEEE (2010). doi:10.1109/ICISA.2010.5480364. http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5480364

  45. Skoutas, D., Sacharidis, D., Simitsis, A., Kantere, V., Sellis, T.: Top-k dominant web services under multi-criteria matching. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology (EDBT ’09), p. 898. ACM Press, New York, NY, USA (2009). doi:10.1145/1516360.1516463. http://dl.acm.org/citation.cfm?id=1516360.1516463

  46. Srivastava, V., Bullo, F.: Hybrid combinatorial optimization: sample problems and algorithms. In: CDC-ECE, pp. 7212–7217. IEEE (2011)

  47. Su, L., Zou, P., Jia, Y.: Adaptive mining the approximate skyline over data stream. Int. Conf. Comput. Sci. 3, 742–745 (2007)


  48. Tan, K.L., Eng, P.K., Ooi, B.C.: Efficient progressive skyline computation. In: VLDB, pp. 301–310 (2001)

  49. Tao, Y., Ding, L., Lin, X., Pei, J.: Distance-based representative skyline. In: Proceedings of the 2009 IEEE International Conference on Data Engineering, pp. 892–903. IEEE Computer Society, Washington, DC, USA (2009). doi:10.1109/ICDE.2009.84. http://dl.acm.org/citation.cfm?id=1546683.1547325

  50. Vassilvitskii, S., Yannakakis, M.: Efficiently computing succinct trade-off curves. Autom. Lang. Program. 3142, 1201–1213 (2004). doi:10.1007/b99859. http://link.springer.com/chapter/10.1007/978-3-540-27836-8_99

  51. Vlachou, A., Vazirgiannis, M.: Ranking the sky: discovering the importance of skyline points through subspace dominance relationships. Data Knowl. Eng. 69, 943–964 (2010). doi:10.1016/j.datak.2010.03.008

  52. Xia, T., Zhang, D., Tao, Y.: On skylining with flexible dominance relation. In: 2008 IEEE 24th International Conference on Data Engineering, pp. 1397–1399. IEEE (2008). doi:10.1109/ICDE.2008.4497568. http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4497568

  53. Yiu, M.L., Mamoulis, N.: Multi-dimensional top-k dominating queries. VLDB J. 18(3), 695–718 (2009)


  54. Zhang, Y., Zhang, W., Lin, X., Jiang, B., Pei, J.: Ranking uncertain sky: the probabilistic top-k skyline operator. Inf. Syst. 36(5), 898–915 (2011). doi:10.1016/j.is.2011.03.008

  55. Zhang, Z., Lu, H., Ooi, B.C., Tung, A.K.H.: Understanding the meaning of a shifted sky: a general framework on extending skyline query. VLDB J. 19(2), 181–201 (2010)


  56. Zhao, F., Das, G., Tan, K.L., Tung, A.K.H.: Call to order: a hierarchical browsing approach to eliciting users’ preference. In: Proceedings of the 2010 International Conference on Management of Data, pp. 27–38. ACM (2010)


Acknowledgments

We would like to thank Kenneth S. Bøgh for implementing one of the tested methods. We would also like to thank the anonymous reviewers of a previous version of this document for useful comments. This work has been supported in part by the Danish Council for Strategic Research, Grant 10-092316, by EU FET-Open project DATASIM (FP7-ICT 270833) and by the Italian Ministry of Education, Universities and Research FIRB project RBFR107725. Part of this work was done while Matteo Magnani was at Aarhus University, Denmark, at the KDDLab, ISTI, CNR and at the University of Bologna, Italy.

Author information


Corresponding author

Correspondence to Matteo Magnani.

Appendix: Proofs


Proof (of Proposition 1)

To prove stability, we simply note that the definition of div does not use non-skyline records. To prove scale invariance, we note that in the case of complete ignorance \(E(\sigma (p)) = \frac{1}{2}\), and \(\delta (p, q)\) is not affected by attribute re-scaling because the actual values are not used in its definition. Therefore, the value of the objective function does not change after re-scaling. \(\square \)
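The scale-invariance argument can be illustrated with a small sketch. It assumes a diversity function built from the per-dimension counts that appear in the proof of Proposition 2 (the fraction of skyline records whose value lies between those of the two records), averaged over the dimensions. The averaging, the treatment of the endpoints, the function names and the sample data are all assumptions made for illustration and are not taken from the paper.

```python
def delta(sky, r, s):
    """Order-based diversity sketch (assumed form): per dimension, the
    fraction of skyline records lying between r and s, averaged over
    the dimensions. Only comparisons are used, never differences."""
    if r == s:
        return 0.0
    total = 0.0
    for i in range(len(r)):
        lo, hi = min(r[i], s[i]), max(r[i], s[i])
        between = sum(1 for o in sky if lo <= o[i] <= hi)
        total += between / (len(sky) - 1)
    return total / len(r)


def rescale(record, factors):
    """Multiply every attribute by a positive, attribute-specific factor."""
    return tuple(v * f for v, f in zip(record, factors))


# Hypothetical two-dimensional skyline (higher values preferred).
sky = [(1, 9), (3, 7), (4, 4), (8, 2), (9, 1)]
factors = (1000, 3)
scaled = [rescale(p, factors) for p in sky]

# The diversity of every pair is the same before and after re-scaling.
unchanged = all(
    delta(sky, r, s) == delta(scaled, rescale(r, factors), rescale(s, factors))
    for r in sky
    for s in sky
)
print(unchanged)
```

Because a positive re-scaling preserves the order of the values on each attribute, the counts, and hence the diversity values, stay the same. A distance based on attribute differences would not have this property.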

Proof (of Proposition 2)

We show that \(\delta (r,t) + \delta (t,s) \ge \delta (r,s)\). This holds when \(t=r\) because \(\delta (r,r) = 0\), so \(\delta (r,t) + \delta (t,s)\) reduces to \(\delta (r,r) + \delta (r,s) = \delta (r,s)\). Similarly, it holds when \(t=s\). When \(t \ne r,s\) we verify that for each dimension \(i\) we have:

$$\begin{aligned}&\frac{|\{ o \in {\mathrm {sky}}(\mathrm {R}) \ | \ (r.i \ge o.i \ge s.i) \vee (s.i \ge o.i \ge r.i)\}|}{|{\mathrm {sky}}(\mathrm {R})|-1} \\&\quad \le \frac{|\{ o \in {\mathrm {sky}}(\mathrm {R}) \ | \ (r.i \ge o.i \ge t.i) \vee (t.i \ge o.i \ge r.i)\}|}{|{\mathrm {sky}}(\mathrm {R})|-1} \\&\qquad + \frac{|\{ o \in {\mathrm {sky}}(\mathrm {R}) \ | \ (t.i \ge o.i \ge s.i) \vee (s.i \ge o.i \ge t.i)\}|}{|{\mathrm {sky}}(\mathrm {R})|-1} \end{aligned}$$

This is done by noting that if a record \(o\) belongs to the left-hand side expression, it also belongs to at least one of the two sets in the right-hand side expression. \(\square \)
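The membership claim can be spelled out as a short case analysis. Take a record \(o\) counted on the left-hand side and assume, without loss of generality, that \(r.i \ge o.i \ge s.i\) (the symmetric case is identical). Then, for any \(t\):

$$\begin{aligned} t.i \le o.i&\ \Rightarrow \ r.i \ge o.i \ge t.i, \\ t.i > o.i&\ \Rightarrow \ t.i \ge o.i \ge s.i. \end{aligned}$$

In the first case \(o\) is counted in the first set on the right-hand side, and in the second case it is counted in the second set. The left-hand count is therefore at most the sum of the two right-hand counts, and dividing by the common denominator \(|{\mathrm {sky}}(\mathrm {R})|-1\) gives the inequality.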

Proof (of Proposition 3)

In general, the expected value of a function \(g\) where the probability density of \(x\) is \(f(x)\) is given by the integral \(\int g(x) f(x) \mathrm {d}x\) computed between \(-\infty \) and \(\infty \). In the specific case of sigmoid functions, having a probability density function for the thresholds (let us call it \(f\)), we can precisely write the expected value \(E(\sigma (r))\) of the significance of a record \(r\) as follows (in the following equations we omit the normalization factor for simplicity; see Note 7):

$$\begin{aligned} \frac{\sum _{i \in [1,d]} \lim \limits _{a.i\rightarrow -\infty , b.i\rightarrow \infty } \int _{a.i}^{b.i} \frac{1}{1+e^{-r.i+t.i}} f(t.i) \,\mathrm {d}t.i}{d} \end{aligned}$$

Starting from this, we can compute the expected value according to our actual knowledge. If we do not know anything about \(f\), we may use a uniform probability distribution:

$$\begin{aligned} \frac{\sum _{i \in [1,d]} \lim \limits _{a.i\rightarrow -\infty , b.i\rightarrow \infty } \int _{a.i}^{b.i} \frac{1}{1+e^{-r.i+t.i}} \frac{1}{b.i-a.i} \,\mathrm {d}t.i}{d} \end{aligned}$$

and if we are able to further restrict the possible thresholds inside \(d\) intervals \([a.i,b.i], i \in [1,d]\), we can express the expected significance as:

$$\begin{aligned} \frac{\sum _{i \in [1,d]} \int _{a.i}^{b.i} \frac{1}{1+e^{-r.i+t.i}} \frac{1}{b.i-a.i} \,\mathrm {d}t.i}{d} \end{aligned}$$

This expression can be treated analytically: the terms \(\frac{1}{b.i-a.i}\) do not depend on the integration variable \(t.i\), and the logistic function \(\frac{1}{1+e^{-x}}\) has indefinite integral \(\ln (1+e^x)\). Therefore, we can move \(\frac{1}{b.i-a.i}\) out of the integral and solve it by substituting \(x = r.i - t.i\) (so that \(\mathrm {d}x = -\mathrm {d}t.i\)). We obtain:

$$\begin{aligned} \frac{\sum _{i \in [1,d]} \frac{1}{b.i-a.i} (-\ln (1+e^{(r.i-b.i)}) + \ln (1+e^{(r.i-a.i)}))}{d} \end{aligned}$$

\(\square \)
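As a sanity check of the closed form, the following sketch evaluates it for thresholds uniformly distributed in \([a.i,b.i]\) and compares it with a numerical approximation of the integral. The function names, the midpoint rule and the sample values are assumptions made for illustration. As in the equations above, the normalization factor is omitted.

```python
import math


def softplus(x):
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def expected_significance(r, a, b):
    """Closed form: mean over the dimensions of
    (ln(1 + e^(r.i - a.i)) - ln(1 + e^(r.i - b.i))) / (b.i - a.i)."""
    terms = [
        (softplus(ri - ai) - softplus(ri - bi)) / (bi - ai)
        for ri, ai, bi in zip(r, a, b)
    ]
    return sum(terms) / len(r)


def expected_significance_numeric(r, a, b, steps=20000):
    """Midpoint-rule approximation of the integral form, with the
    threshold t.i uniformly distributed in [a.i, b.i]."""
    total = 0.0
    for ri, ai, bi in zip(r, a, b):
        h = (bi - ai) / steps
        integral = h * sum(
            1.0 / (1.0 + math.exp(-(ri - (ai + (j + 0.5) * h))))
            for j in range(steps)
        )
        total += integral / (bi - ai)
    return total / len(r)


# Hypothetical record and threshold intervals.
r, a, b = (2.0, -1.0), (0.0, -3.0), (4.0, 1.0)
closed = expected_significance(r, a, b)
numeric = expected_significance_numeric(r, a, b)
print(abs(closed - numeric) < 1e-6)
```

When the interval is centred on the record's value, as in both dimensions of this example, the closed form evaluates to \(\frac{1}{2}\), since \(\ln (1+e^{x}) - \ln (1+e^{-x}) = x\).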

Proof (of Theorem 1)

We want to prove that the problem in Definition 2, which we call here Rep-Skyline(R, k, \(\lambda , \delta , \sigma \)), is NP-hard. For convenience, we repeat the definition of Rep-Skyline: Let \(R\) be a relation, \(k\) an integer constant, \(\lambda \) a real number in \([0,1]\), \(\delta \) a function assessing the diversity of two records and \(\sigma \) a function assessing the significance of a record. We denote the skyline of \(\mathrm {R}\) as \({\mathrm {sky}}(\mathrm {R})\) and its subsets of size \(k\) as \(\mathcal {P}_k({\mathrm {sky}}(\mathrm {R}))\). We want to find:

$$\begin{aligned} {\mathop {\hbox {arg max}}\limits _{S \in \mathcal {P}_k({\mathrm {sky}}(\mathrm {R}))}} \ \lambda \sum _{r \in S} \min _{s \in S \setminus \{r\}} \delta (r, s) + (1-\lambda ) \sum _{r \in S} E(\sigma (r)) \end{aligned}$$
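To make the objective concrete, the following sketch evaluates it for a candidate set and finds the optimum by exhaustive enumeration. It is only an illustration on a toy input: the enumeration visits every subset of size \(k\), which is not the approximate approach presented in the paper. The Euclidean distance used as \(\delta \), the constant expected significance and all names are assumptions.

```python
import math
from itertools import combinations


def objective(S, delta, exp_sig, lam):
    """lam * (sum over r in S of the diversity to its nearest other member)
    + (1 - lam) * (sum over r in S of the expected significance).
    Requires at least two records in S."""
    diversity = sum(
        min(delta(r, s) for j, s in enumerate(S) if j != i)
        for i, r in enumerate(S)
    )
    significance = sum(exp_sig(r) for r in S)
    return lam * diversity + (1 - lam) * significance


def rep_skyline_brute_force(sky, k, delta, exp_sig, lam):
    """Exhaustive arg max over all subsets of size k (toy inputs only)."""
    return max(
        combinations(sky, k),
        key=lambda S: objective(S, delta, exp_sig, lam),
    )


# Hypothetical skyline; Euclidean distance stands in for delta.
sky = [(0, 10), (1, 9), (5, 5), (9, 1), (10, 0)]
best = rep_skyline_brute_force(
    sky, 3, delta=math.dist, exp_sig=lambda r: 0.5, lam=1.0
)
print(sorted(best))
```

With \(\lambda = 1\) only the diversity term counts, so the chosen records are the two extremes and the middle point of this toy skyline.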

To prove NP-hardness, we provide a polynomial time reduction of the problem known as Remote-Pseudoforest(R, k, \(\overline{\delta }\)) to Rep-Skyline. The Remote-Pseudoforest problem is the following: Let R be a set of \(d\)-dimensional points, \(\overline{\delta }\) be a distance function and \(\mathcal {P}_k(R)\) indicate the set of subsets of R of size \(k\). We want to find:

$$\begin{aligned} {\mathop {\hbox {arg max}}\limits _{S \in \mathcal {P}_k(R)}} \ \sum _{r \in S} \min _{s \in S \setminus \{r\}} \overline{\delta }(r, s) \end{aligned}$$

The Remote-Pseudoforest problem has been proven to be NP-complete [8].

To solve Remote-Pseudoforest(R, k, \(\overline{\delta }\)), we first map \(R\) into a new set \(R'\) where for every \(d\)-dimensional record \(x = \langle x_1, \ldots , x_d\rangle \) in \(R\) we have a \(2d\)-dimensional record \(x' = \langle x_1, \ldots , x_d, -x_1, \ldots , -x_d\rangle \) in \(R'\). We also define a new distance function \(\overline{\delta }': R' \times R' \rightarrow [0,1]\) such that \(\overline{\delta }'(x', y') = \overline{\delta }(x, y)\), i.e., \(\overline{\delta }'\) is computed on the first \(d\) dimensions of \(x'\) and \(y'\). This transformation guarantees that all points in \(R'\) are in the \(2d\)-dimensional skyline: if \(x'\) were at least as good as \(y'\) on every dimension, we would have both \(x_i \ge y_i\) and \(-x_i \ge -y_i\) for all \(i\), hence \(x = y\), so no point can dominate a different one. It also preserves the distance between all pairs of points \(x, y\) before and after the transformation (\(\overline{\delta }(x,y) = \overline{\delta }'(x',y')\)).
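The effect of this transformation can be checked on a small example. The sketch below assumes the usual dominance relation with higher values preferred (as in Note 1). The helper names and the sample points are hypothetical.

```python
def dominates(p, q):
    """p dominates q: at least as good on every attribute and strictly
    better on at least one (higher values preferred)."""
    return all(a >= b for a, b in zip(p, q)) and any(
        a > b for a, b in zip(p, q)
    )


def skyline(records):
    """Records that are not dominated by any other record."""
    return [p for p in records if not any(dominates(q, p) for q in records)]


def mirror(x):
    """Map <x_1, ..., x_d> to <x_1, ..., x_d, -x_1, ..., -x_d>."""
    return tuple(x) + tuple(-v for v in x)


def lift(delta, d):
    """Distance on mirrored records, computed on the first d dimensions."""
    return lambda xp, yp: delta(xp[:d], yp[:d])


# Hypothetical input: (1, 1) is dominated by (2, 2) in the original space.
R = [(1, 1), (2, 2), (3, 1), (0, 5)]
R_prime = [mirror(x) for x in R]

print(len(skyline(R)), len(skyline(R_prime)))
```

In the original space one of the four points is dominated. After mirroring, all four are in the skyline, so every subset of \(R'\) of size \(k\) is a feasible candidate.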

Now we can solve Remote-Pseudoforest(\(R, k, \overline{\delta }\)) by solving Rep-Skyline\((R', k, 1, \overline{\delta }', 0)\) (see Note 8) and mapping the result back to the \(d\)-dimensional space. In fact, if we substitute the arguments \(R'\), \(k\), \(1\), \(\overline{\delta }'\) and \(0\) into the definition of Rep-Skyline, we obtain:

$$\begin{aligned} {\mathop {\hbox {arg max}}\limits _{S \in \mathcal {P}_k({\mathrm {sky}}(R'))}} \ \sum _{r \in S} \min _{s \in S \setminus \{r\}} \overline{\delta }'(r, s) \end{aligned}$$

where \({\mathrm {sky}}(R') = R'\).

As a result, since we solve Remote-Pseudoforest by calling Rep-Skyline as a sub-procedure and only apply a transformation of the input that is linear in the number of records and dimensions, we can state that Remote-Pseudoforest is not more difficult (\(\le _p\)) than Rep-Skyline. Because Remote-Pseudoforest is NP-complete, we can conclude that Rep-Skyline is an NP-hard problem. \(\square \)


Cite this article

Magnani, M., Assent, I. & Mortensen, M.L. Taking the Big Picture: representative skylines based on significance and diversity. The VLDB Journal 23, 795–815 (2014). https://doi.org/10.1007/s00778-014-0352-3


  • DOI: https://doi.org/10.1007/s00778-014-0352-3
