Skip to main content
Log in

Quantifying the competitiveness of a dataset in relation to general preferences

  • Regular Paper
  • Published:
The VLDB Journal Aims and scope Submit manuscript

Abstract

Typically, a specific market (e.g., of hotels, restaurants, laptops, etc.) is represented as a multi-attribute dataset of the available products. The topic of identifying and shortlisting the products of most interest to a user has been well-explored. In contrast, in this work we focus on the dataset, and aim to assess its competitiveness with regard to different possible preferences. We define measures of competitiveness, and represent them in the form of a heat-map in the domain of preferences. Our work finds application in market analysis and in business development. These applications are further enhanced when the competitiveness heat-map is used in tandem with information on user preferences (which can be readily derived by existing methods). Interestingly, our study also finds side-applications with strong practical relevance in the area of multi-objective querying. We propose a suite of algorithms to efficiently produce the heat-map, and conduct case studies and an empirical evaluation to demonstrate the practicality of our work.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Algorithm 1
Fig. 4
Algorithm 2
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15

Similar content being viewed by others

Notes

  1. For legibility, only a 2% random sample of the weight vectors is illustrated.

  2. Without loss of generality, we assume that no pair of products have identical attributes in all dimensions.

  3. r\(k\)-skyband computation by [47] follows the general branch-and-bound paradigm using the index on \({\mathcal {P}}\); n here refers to the number of products popped from its search heap and considered for inclusion into RKS(c).

  4. If \({\varvec{p}}_i\) r-dominates \({\varvec{p}}_j\) for N, the same holds for every partition of N too.

  5. The \(\#\textrm{DR}\) heat-map is very similar to \(|\textrm{RKS}|\), as we will elaborate shortly.

  6. The Pearson correlation coefficient for n value pairs \((x_i,y_i)\) is defined as \(r_{xy}= \frac{ \sum _{i=1}^{n}(x_i-{\bar{x}})(y_i-{\bar{y}}) }{\sqrt{\sum _{i=1}^{n}(x_i-{\bar{x}})^2}\sqrt{\sum _{i=1}^{n}(y_i-{\bar{y}})^2}}\), where \({\bar{x}}\) is the mean of \(x_i\) values, and analogously for \({\bar{y}}\). It takes values between \(-1\) (perfect anti-correlation) and 1 (perfect correlation).

References

  1. Achtert, E., Böhm, C., Kröger, P., Kunath, P., Pryakhin, A., Renz, M.: Efficient reverse k-nearest neighbor search in arbitrary metric spaces. In SIGMOD Conference, pp. 515–526 (2006)

  2. Agarwal, P.K., Arge, L., Erickson, J., Franciosa, P.G., Vitter, J.S.: Efficient searching with linear constraints. J. Comput. Syst. Sci. 61(2), 194–216 (2000)

    Article  MathSciNet  Google Scholar 

  3. Agarwal, P.K., Kumar, N., Sintos, S., Suri, S.: Efficient algorithms for k-regret minimizing sets. In: SEA, volume 75 of LIPIcs, pp. 7:1–7:23 (2017)

  4. Asudeh, A., Nazi, A., Zhang, N., Das., G.: Efficient computation of regret-ratio minimizing set: A compact maxima representative. In: SIGMOD Conference, pp. 821–834 (2017)

  5. Asudeh, A., Nazi, A., Zhang, N., Das, G., Jagadish, H.V.: RRR: rank-regret representative. In: SIGMOD Conference, pp. 263–280 (2019)

  6. Basketball positions. https://en.wikipedia.org/wiki/Basketball_positions

  7. Beckmann, N., Kriegel, H.-P., Schneider, R., Seeger, B.: The R*-tree: An efficient and robust access method for points and rectangles. In: SIGMOD Conference, pp. 322–331 (1990)

  8. Berg, M.d., Cheong, O., Kreveld, M.v., Overmars, M.: Computational geometry: algorithms and applications. Springer-Verlag TELOS, 2008

  9. Bernecker, T., Emrich, T., Kriegel, H., Renz, M., Zankl, S., Züfle, A.: Efficient probabilistic reverse nearest neighbor query processing on uncertain data. PVLDB 4(10), 669–680 (2011)

    Google Scholar 

  10. Bhattacharyya, A.: On a measure of divergence between two statistical populations defined by their probability distributions. Bull. Calcutta Math. Soc. 35, 99–109 (1943)

    MathSciNet  Google Scholar 

  11. Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: ICDE, pp. 421–430 (2001)

  12. Chang, Y.-C., Bergman, L., Castelli, V., Li, C.-S., Lo, M.-L., Smith, J.R.: The onion technique: Indexing for linear optimization queries. In: SIGMOD Conference, pp. 391–402 (2000)

  13. Chester, S., Thomo, A., Venkatesh, S., Whitesides, S.: Computing k-regret minimizing sets. PVLDB 7(5), 389–400 (2014)

    Google Scholar 

  14. Church, J., Ware, R.: Industrial Organization: A Strategic Approach. Irwin McGraw Hill, New York (2000)

    Google Scholar 

  15. Ciaccia, P., Martinenghi, D.: Reconciling skyline and ranking queries. PVLDB 10(11), 1454–1465 (2017)

    Google Scholar 

  16. Ciaccia, P., Martinenghi, D.: FA + TA \(<\) fsa: Flexible score aggregation. In: CIKM, pp 57–66 (2018)

  17. Das, G., Gunopulos, D., Koudas, N., Tsirogiannis, D.: Answering top-k queries using views. In: VLDB, pp. 451–462 (2006)

  18. Duch, J., Waitzman, J.S., Amaral, L.A.N.: Quantifying the performance of individual players in a team activity. PLOS ONE 5(6), 1–7 (2010)

    Article  Google Scholar 

  19. Dyer, J.S., Sarin, R.K.: Measurable multi attribute value functions. Oper. Res. 27(4), 810–822 (1979)

    Article  MathSciNet  Google Scholar 

  20. Edelsbrunner, H., Seidel, R., Sharir, M.: On the zone theorem for hyperplane arrangements. SIAM J. Comput. 22(2), 418–429 (1993)

    Article  MathSciNet  Google Scholar 

  21. Emrich, T., Kriegel, H., Kröger, P., Renz, M., Züfle.: Boosting spatial pruning: on optimal pruning of mbrs. In: SIGMOD Conference, pp. 9–50 (2010)

  22. Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4), 614–656 (2003)

    Article  MathSciNet  Google Scholar 

  23. Feller, W.: An Introduction to Probability Theory and Its Applications. Wiley, USA (1968)

    Google Scholar 

  24. Fürnkranz, J., Hüllermeier, E. (eds.): Preference Learning. Springer, Cham (2010)

    Google Scholar 

  25. Ge, S., U, L.H., Mamoulis, N., Cheung, D.W.: Efficient all top-\(k\) computation - A unified solution for all top-\(k\), reverse top-\(k\) and top-\(m\) influential queries. IEEE Trans. Knowl. Data Eng., 25(5):1015–1027 (2013)

  26. Gerardi, K., Shapiro, A.: Does competition reduce price dispersion? new evidence from the airline industry. J. Polit. Econ. 117(1), 1–37 (2009)

    Article  Google Scholar 

  27. Godfrey, P., Shipley, R., Gryz, J.: Algorithms and analyses for maximal vector computation. VLDB J. 16(1), 5–28 (2007)

    Article  Google Scholar 

  28. Hausman, J., Leonard, G., Zona, J.D.: Competitive analysis with differentiated products. Anna. Econ. Stat. 34, 159–180 (1994)

    Google Scholar 

  29. Hotel dataset. https://github.com/RyanBalfanz/hotelsbase

  30. House dataset. https://www.ipums.org

  31. Hose, K., Vlachou, A.: A survey of skyline processing in highly distributed environments. VLDB J. 21(3), 359–384 (2012)

    Article  Google Scholar 

  32. Hristidis, V., Koudas, N., Papakonstantinou, Y.: PREFER: A system for the efficient execution of multi-parametric ranked queries. In: SIGMOD Conference, pp. 259–270 (2001)

  33. Hristidis, V., Papakonstantinou, Y.: Algorithms and applications for answering ranked queries using ranked views. VLDB J. 13(1), 49–70 (2004)

    Article  Google Scholar 

  34. Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. ACM Comp. Surv. 40(4), 11–58 (2008)

    Article  Google Scholar 

  35. Kalyvas, C., Tzouramanis, T.: A survey of skyline query processing. CoRR, abs/1704.01788 (2017)

  36. Keeney, R.L., Raiffa, H.: Decisions with Multiple Objectives: Preferences and Value Trade-Offs. Wiley, USA (1976)

    Google Scholar 

  37. Kolahdouzan, M.R., Shahabi, C.: Voronoi-based K nearest neighbor search for spatial network databases. In: VLDB, pp. 840–851 (2004)

  38. Korn, F., Muthukrishnan, S.: Influence sets based on reverse nearest neighbor queries. In: SIGMOD Conference, pp. 201–212 (2000)

  39. Kriegel, H., Renz, M., Schubert, M.: Route skyline queries: A multi-preference path planning approach. In: ICDE, pp. 261–272 (2010)

  40. Li, C., Ooi, B.C., Tung, A.K.H., Wang, S.: DADA: a data cube for dominant relationship analysis. In: SIGMOD Conference, pp. 659–670 (2006)

  41. Lin, J.: Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 37(1), 145–151 (1991)

    Article  MathSciNet  Google Scholar 

  42. Lin, X., Yuan, Y., Zhang, Q., Zhang, Y.: Selecting stars: The k most representative skyline operator. In: ICDE, pp. 86–95 (2007)

  43. Liu, J., Xiong, L., Zhang, Q., Pei, J., Luo, J.: Eclipse: Generalizing kNN and skyline. In: ICDE, pp. 972–983. IEEE (2021)

  44. Miah, M., Das, G., Hristidis, V., Mannila, H.: Standing out in a crowd: Selecting attributes for maximum visibility. In: ICDE, pp. 356–365 (2008)

  45. Mouratidis, K., Li, K., Tang, B.: Marrying top-k with skyline queries: Relaxing the preference input while producing output of controllable size. In: SIGMOD Conference, pp. 1317–1330 (2021)

  46. Mouratidis, K., Lin, Y., Yiu, M.L.: Preference queries in large multi-cost transportation networks. In: ICDE, pp. 533–544 (2010)

  47. Mouratidis, K., Tang, B.: Exact processing of uncertain top-k queries in multi-criteria settings. PVLDB 11(8), 866–879 (2018)

    Google Scholar 

  48. Mouratidis, K., Zhang, J., Pang, H.: Maximum rank query. PVLDB 8(12), 1554–1565 (2015)

    Google Scholar 

  49. Mulmuley, K.: On levels in arrangements and voronoi diagrams. Discr. Comput. Geometry 6, 307–338 (1991)

    Article  MathSciNet  Google Scholar 

  50. Nanongkai, D., Sarma, A.D., Lall, A., Lipton, R.J., Xu, J.J.: Regret-minimizing representative databases. PVLDB 3(1), 1114–1124 (2010)

    Google Scholar 

  51. National Institute of Standards and Technology. NIST/SEMATECH e-handbook of statistical methods; 7.1.6. what are outliers in the data, 2013

  52. NBA dataset. https://www.basketball-reference.com

  53. Papadias, D., Tao, Y., Fu, G., Seeger, B.: Progressive skyline computation in database systems. ACM Trans. Database Syst. 30(1), 41–82 (2005)

    Article  Google Scholar 

  54. Pearson, K.: Note on Regression and Inheritance in the Case of Two Parents. Proc. R. Soc. Lond. Ser. I(58), 240–242 (1895)

    Google Scholar 

  55. Pearson, R.K.: Univariate outlier detection, chapter 3, pp. 69–91 (2005)

  56. Peng, P., Wong, R.C.: k-hit query: Top-k query with probabilistic utility function. In: SIGMOD Conference, pp. 577–592 (2015)

  57. Qian, L., Gao, J., Jagadish, H.V.: Learning user preferences by adaptive pairwise comparison. PVLDB 8(11), 1322–1333 (2015)

    Google Scholar 

  58. Rahul, S., Tao, Y.: Efficient top-k indexing via general reductions. In: PODS, pp. 277–288. ACM (2016)

  59. Sarma, A.D., Lall, A., Nanongkai, D., Xu, J.J.: Randomized multi-pass streaming skyline algorithms. PVLDB 2(1), 85–96 (2009)

    Google Scholar 

  60. Sharifzadeh, M., Shahabi, C.: The spatial skyline queries. In: VLDB, pp. 751–762 (2006)

  61. Sharifzadeh, M., Shahabi, C.: Vor-tree: R-trees with voronoi diagrams for efficient processing of spatial nearest neighbor queries. PVLDB 3(1), 1231–1242 (2010)

    Google Scholar 

  62. Sheng, C., Tao, Y.: Worst-case i/o-efficient skyline algorithms. ACM Trans. Database Syst. 37(4), 1–22 (2012)

    Article  Google Scholar 

  63. Silberschatz, A., Korth, H.F., Sudarshan, S.: Database System Concepts, 7th edn. McGraw-Hill Book Company, USA (2020)

    Google Scholar 

  64. Soliman, M.A., Ilyas, I.F., Martinenghi, D., Tagliasacchi, M.: Ranking with uncertain scoring functions: semantics and sensitivity measures. In: SIGMOD Conference, pp. 805–816 (2011)

  65. Stanoi, I., Agrawal, D., Abbadi, A.E.: Reverse nearest neighbor queries for dynamic databases. In: SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 44–53 (2000)

  66. Sun, Y., Zhang, R., Xue, A.Y., Qi, J., Du, X.: Reverse nearest neighbor heat maps: A tool for influence exploration. In: ICDE, pp. 966–977 (2016)

  67. Tang, B., Mouratidis, K., Han, M.: On m-impact regions and standing top-k influence problems. In: SIGMOD Conference, pp. 1784–1796 (2021)

  68. Tang, B., Mouratidis, K., Yiu, M.L.: Determining the impact regions of competing options in preference space. In: SIGMOD Conference, pp. 805–820 (2017)

  69. Tang, B., Mouratidis, K., Yiu, M.L., Chen, Z.: Creating top ranking options in the continuous option and preference space. PVLDB 12(10), 1181–1194 (2019)

    Google Scholar 

  70. Tao, Y., Ding, L., Lin, X., Pei, J.: Distance-based representative skyline. In: ICDE, pp. 892–903 (2009)

  71. Tao, Y., Hristidis, V., Papadias, D., Papakonstantinou, Y.: Branch-and-bound processing of ranked queries. Inf. Syst. 32(3), 424–445 (2007)

    Article  Google Scholar 

  72. Tao, Y., Papadias, D., Lian, X., Xiao, X.: Multidimensional reverse k NN search. VLDB J. 16(3), 293–316 (2007)

    Article  Google Scholar 

  73. Tsaparas, P., Palpanas, T., Kotidis, Y., Koudas, N., Srivastava, D.: Ranked join indices. In: ICDE, pp. 277–288 (2003)

  74. TripAdvisor dataset. https://www.cs.virginia.edu/~hw5x/dataset.html

  75. Vlachou, A., Doulkeridis, C., Kotidis, Y., Nørvåg, K.: Reverse top-k queries. In: ICDE, pp. 365–376 (2010)

  76. Vlachou, A., Doulkeridis, C., Kotidis, Y., Nørvåg, K.: Monochromatic and bichromatic reverse top-k queries. IEEE Trans. Knowl. Data Eng. 23(8), 1215–1229 (2011)

    Article  Google Scholar 

  77. Vlachou, A., Doulkeridis, C., Nørvåg, K., Kotidis, Y.: Identifying the most influential data objects with reverse top-k queries. PVLDB 3(1), 364–372 (2010)

    Google Scholar 

  78. Wan, Q., Wong, R.C., Ilyas, I.F., Özsu, M.T., Peng, Y.: Creating competitive products. PVLDB 2(1), 898–909 (2009)

    Google Scholar 

  79. Wang, H., Lu, Y., Zhai, C.: Latent aspect rating analysis on review text data: a rating regression approach. In: KDD, pp. 783–792 (2010)

  80. Wong, R.C., Özsu, M.T., Yu, P.S., Fu, A.W., Liu, L.: Efficient method for maximizing bichromatic reverse nearest neighbor. PVLDB 2(1), 1126–1137 (2009)

    Google Scholar 

  81. Wu, Y., Gao, J., Agarwal, P.K., Yang, J.: Finding diverse, high-value representatives on a surface of answers. PVLDB 10(7), 793–804 (2017)

    Google Scholar 

  82. Xia, T., Zhang, D., Tao, Y.: On skylining with flexible dominance relation. In: ICDE, pp. 1397–1399 (2008)

  83. Xie, M., Wong, R.C., Lall, A.: An experimental survey of regret minimization query and variants: bridging the best worlds between top-k query and skyline query. VLDB J. 29(1), 147–175 (2020)

    Article  Google Scholar 

  84. Xie, M., Wong, R.C., Li, J., Long, C., Lall, A.: Efficient k-regret query algorithm with restriction-free bound for any dimensionality. In: SIGMOD Conference, pp. 959–974 (2018)

  85. Xin, D., Chen, C., Han, J.: Towards robust indexing for ranked queries. In: VLDB, pp. 235–246, (2006)

  86. Yang, G., Cai, Y.: Querying improvement strategies. In: EDBT, pp. 294–305 (2017)

  87. Yang, J., Zhang, Y., Zhang, W., Lin, X.: Influence based cost optimization on user preference. In: ICDE, pp. 709–720 (2016)

  88. Yang, S., Cheema, M.A., Lin, X., Zhang, Y.: SLICE: reviving regions-based pruning for reverse k nearest neighbors queries. In: ICDE, pp. 760–771 (2014)

  89. Yiu, M.L., Mamoulis, N.: Multi-dimensional top-k dominating queries. VLDB J. 18(3), 695–718 (2009)

    Article  Google Scholar 

  90. Yu, A., Agarwal, P.K., Yang, J.: Top-k preferences in high dimensions. IEEE Trans. Knowl. Data Eng. 28(2), 311–325 (2016)

    Article  Google Scholar 

  91. Zhang, P., Cheng, R., Mamoulis, N., Renz, M., Züfle, A., Tang, Y., Emrich, T.: Voronoi-based nearest neighbor search for multi-dimensional uncertain databases. In: ICDE, pp. 158–169, (2013)

  92. Zhang, Z., Jin, C., Kang, Q.: Reverse k-ranks query. PVLDB 7(10), 785–796 (2014)

Download references

Acknowledgements

This work was supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (Award No. MOE-T2EP20121-0002). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of the Ministry of Education, Singapore. This work was partially supported by Shenzhen Fundamental Research Program (Grant No. 20220815112848002).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Kyriakos Mouratidis.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Mouratidis, K., Li, K. & Tang, B. Quantifying the competitiveness of a dataset in relation to general preferences. The VLDB Journal 33, 231–250 (2024). https://doi.org/10.1007/s00778-023-00804-1

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00778-023-00804-1

Keywords

Navigation