Quantifying the competitiveness of a dataset in relation to general preferences

Mouratidis, Kyriakos; Li, Keming; Tang, Bo

doi:10.1007/s00778-023-00804-1

Quantifying the competitiveness of a dataset in relation to general preferences

Regular Paper
Published: 08 August 2023

Volume 33, pages 231–250, (2024)
Cite this article

The VLDB Journal Aims and scope Submit manuscript

148 Accesses
Explore all metrics

Abstract

Typically, a specific market (e.g., of hotels, restaurants, laptops, etc.) is represented as a multi-attribute dataset of the available products. The topic of identifying and shortlisting the products of most interest to a user has been well-explored. In contrast, in this work we focus on the dataset, and aim to assess its competitiveness with regard to different possible preferences. We define measures of competitiveness, and represent them in the form of a heat-map in the domain of preferences. Our work finds application in market analysis and in business development. These applications are further enhanced when the competitiveness heat-map is used in tandem with information on user preferences (which can be readily derived by existing methods). Interestingly, our study also finds side-applications with strong practical relevance in the area of multi-objective querying. We propose a suite of algorithms to efficiently produce the heat-map, and conduct case studies and an empirical evaluation to demonstrate the practicality of our work.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Cost optimization based on influence and user preference

Article 01 January 2019

What Matters for Shoppers: Investigating Key Attributes for Online Product Comparison

A Customer-Oriented Assortment Selection in the Big Data Environment

Notes

For legibility, only a 2% random sample of the weight vectors is illustrated.
Without loss of generality, we assume that no pair of products have identical attributes in all dimensions.
r\(k\)-skyband computation by [47] follows the general branch-and-bound paradigm using the index on \({\mathcal {P}}\); n here refers to the number of products popped from its search heap and considered for inclusion into RKS(c).
If \({\varvec{p}}_i\) r-dominates \({\varvec{p}}_j\) for N, the same holds for every partition of N too.
The \(\#\textrm{DR}\) heat-map is very similar to \(|\textrm{RKS}|\), as we will elaborate shortly.
The Pearson correlation coefficient for n value pairs \((x_i,y_i)\) is defined as \(r_{xy}= \frac{ \sum _{i=1}^{n}(x_i-{\bar{x}})(y_i-{\bar{y}}) }{\sqrt{\sum _{i=1}^{n}(x_i-{\bar{x}})^2}\sqrt{\sum _{i=1}^{n}(y_i-{\bar{y}})^2}}\), where \({\bar{x}}\) is the mean of \(x_i\) values, and analogously for \({\bar{y}}\). It takes values between \(-1\) (perfect anti-correlation) and 1 (perfect correlation).

References

Achtert, E., Böhm, C., Kröger, P., Kunath, P., Pryakhin, A., Renz, M.: Efficient reverse k-nearest neighbor search in arbitrary metric spaces. In SIGMOD Conference, pp. 515–526 (2006)
Agarwal, P.K., Arge, L., Erickson, J., Franciosa, P.G., Vitter, J.S.: Efficient searching with linear constraints. J. Comput. Syst. Sci. 61(2), 194–216 (2000)
Article MathSciNet Google Scholar
Agarwal, P.K., Kumar, N., Sintos, S., Suri, S.: Efficient algorithms for k-regret minimizing sets. In: SEA, volume 75 of LIPIcs, pp. 7:1–7:23 (2017)
Asudeh, A., Nazi, A., Zhang, N., Das., G.: Efficient computation of regret-ratio minimizing set: A compact maxima representative. In: SIGMOD Conference, pp. 821–834 (2017)
Asudeh, A., Nazi, A., Zhang, N., Das, G., Jagadish, H.V.: RRR: rank-regret representative. In: SIGMOD Conference, pp. 263–280 (2019)
Basketball positions. https://en.wikipedia.org/wiki/Basketball_positions
Beckmann, N., Kriegel, H.-P., Schneider, R., Seeger, B.: The R*-tree: An efficient and robust access method for points and rectangles. In: SIGMOD Conference, pp. 322–331 (1990)
Berg, M.d., Cheong, O., Kreveld, M.v., Overmars, M.: Computational geometry: algorithms and applications. Springer-Verlag TELOS, 2008
Bernecker, T., Emrich, T., Kriegel, H., Renz, M., Zankl, S., Züfle, A.: Efficient probabilistic reverse nearest neighbor query processing on uncertain data. PVLDB 4(10), 669–680 (2011)
Google Scholar
Bhattacharyya, A.: On a measure of divergence between two statistical populations defined by their probability distributions. Bull. Calcutta Math. Soc. 35, 99–109 (1943)
MathSciNet Google Scholar
Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: ICDE, pp. 421–430 (2001)
Chang, Y.-C., Bergman, L., Castelli, V., Li, C.-S., Lo, M.-L., Smith, J.R.: The onion technique: Indexing for linear optimization queries. In: SIGMOD Conference, pp. 391–402 (2000)
Chester, S., Thomo, A., Venkatesh, S., Whitesides, S.: Computing k-regret minimizing sets. PVLDB 7(5), 389–400 (2014)
Google Scholar
Church, J., Ware, R.: Industrial Organization: A Strategic Approach. Irwin McGraw Hill, New York (2000)
Google Scholar
Ciaccia, P., Martinenghi, D.: Reconciling skyline and ranking queries. PVLDB 10(11), 1454–1465 (2017)
Google Scholar
Ciaccia, P., Martinenghi, D.: FA + TA \(<\) fsa: Flexible score aggregation. In: CIKM, pp 57–66 (2018)
Das, G., Gunopulos, D., Koudas, N., Tsirogiannis, D.: Answering top-k queries using views. In: VLDB, pp. 451–462 (2006)
Duch, J., Waitzman, J.S., Amaral, L.A.N.: Quantifying the performance of individual players in a team activity. PLOS ONE 5(6), 1–7 (2010)
Article Google Scholar
Dyer, J.S., Sarin, R.K.: Measurable multi attribute value functions. Oper. Res. 27(4), 810–822 (1979)
Article MathSciNet Google Scholar
Edelsbrunner, H., Seidel, R., Sharir, M.: On the zone theorem for hyperplane arrangements. SIAM J. Comput. 22(2), 418–429 (1993)
Article MathSciNet Google Scholar
Emrich, T., Kriegel, H., Kröger, P., Renz, M., Züfle.: Boosting spatial pruning: on optimal pruning of mbrs. In: SIGMOD Conference, pp. 9–50 (2010)
Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4), 614–656 (2003)
Article MathSciNet Google Scholar
Feller, W.: An Introduction to Probability Theory and Its Applications. Wiley, USA (1968)
Google Scholar
Fürnkranz, J., Hüllermeier, E. (eds.): Preference Learning. Springer, Cham (2010)
Google Scholar
Ge, S., U, L.H., Mamoulis, N., Cheung, D.W.: Efficient all top-\(k\) computation - A unified solution for all top-\(k\), reverse top-\(k\) and top-\(m\) influential queries. IEEE Trans. Knowl. Data Eng., 25(5):1015–1027 (2013)
Gerardi, K., Shapiro, A.: Does competition reduce price dispersion? new evidence from the airline industry. J. Polit. Econ. 117(1), 1–37 (2009)
Article Google Scholar
Godfrey, P., Shipley, R., Gryz, J.: Algorithms and analyses for maximal vector computation. VLDB J. 16(1), 5–28 (2007)
Article Google Scholar
Hausman, J., Leonard, G., Zona, J.D.: Competitive analysis with differentiated products. Anna. Econ. Stat. 34, 159–180 (1994)
Google Scholar
Hotel dataset. https://github.com/RyanBalfanz/hotelsbase
House dataset. https://www.ipums.org
Hose, K., Vlachou, A.: A survey of skyline processing in highly distributed environments. VLDB J. 21(3), 359–384 (2012)
Article Google Scholar
Hristidis, V., Koudas, N., Papakonstantinou, Y.: PREFER: A system for the efficient execution of multi-parametric ranked queries. In: SIGMOD Conference, pp. 259–270 (2001)
Hristidis, V., Papakonstantinou, Y.: Algorithms and applications for answering ranked queries using ranked views. VLDB J. 13(1), 49–70 (2004)
Article Google Scholar
Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. ACM Comp. Surv. 40(4), 11–58 (2008)
Article Google Scholar
Kalyvas, C., Tzouramanis, T.: A survey of skyline query processing. CoRR, abs/1704.01788 (2017)
Keeney, R.L., Raiffa, H.: Decisions with Multiple Objectives: Preferences and Value Trade-Offs. Wiley, USA (1976)
Google Scholar
Kolahdouzan, M.R., Shahabi, C.: Voronoi-based K nearest neighbor search for spatial network databases. In: VLDB, pp. 840–851 (2004)
Korn, F., Muthukrishnan, S.: Influence sets based on reverse nearest neighbor queries. In: SIGMOD Conference, pp. 201–212 (2000)
Kriegel, H., Renz, M., Schubert, M.: Route skyline queries: A multi-preference path planning approach. In: ICDE, pp. 261–272 (2010)
Li, C., Ooi, B.C., Tung, A.K.H., Wang, S.: DADA: a data cube for dominant relationship analysis. In: SIGMOD Conference, pp. 659–670 (2006)
Lin, J.: Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 37(1), 145–151 (1991)
Article MathSciNet Google Scholar
Lin, X., Yuan, Y., Zhang, Q., Zhang, Y.: Selecting stars: The k most representative skyline operator. In: ICDE, pp. 86–95 (2007)
Liu, J., Xiong, L., Zhang, Q., Pei, J., Luo, J.: Eclipse: Generalizing kNN and skyline. In: ICDE, pp. 972–983. IEEE (2021)
Miah, M., Das, G., Hristidis, V., Mannila, H.: Standing out in a crowd: Selecting attributes for maximum visibility. In: ICDE, pp. 356–365 (2008)
Mouratidis, K., Li, K., Tang, B.: Marrying top-k with skyline queries: Relaxing the preference input while producing output of controllable size. In: SIGMOD Conference, pp. 1317–1330 (2021)
Mouratidis, K., Lin, Y., Yiu, M.L.: Preference queries in large multi-cost transportation networks. In: ICDE, pp. 533–544 (2010)
Mouratidis, K., Tang, B.: Exact processing of uncertain top-k queries in multi-criteria settings. PVLDB 11(8), 866–879 (2018)
Google Scholar
Mouratidis, K., Zhang, J., Pang, H.: Maximum rank query. PVLDB 8(12), 1554–1565 (2015)
Google Scholar
Mulmuley, K.: On levels in arrangements and voronoi diagrams. Discr. Comput. Geometry 6, 307–338 (1991)
Article MathSciNet Google Scholar
Nanongkai, D., Sarma, A.D., Lall, A., Lipton, R.J., Xu, J.J.: Regret-minimizing representative databases. PVLDB 3(1), 1114–1124 (2010)
Google Scholar
National Institute of Standards and Technology. NIST/SEMATECH e-handbook of statistical methods; 7.1.6. what are outliers in the data, 2013
NBA dataset. https://www.basketball-reference.com
Papadias, D., Tao, Y., Fu, G., Seeger, B.: Progressive skyline computation in database systems. ACM Trans. Database Syst. 30(1), 41–82 (2005)
Article Google Scholar
Pearson, K.: Note on Regression and Inheritance in the Case of Two Parents. Proc. R. Soc. Lond. Ser. I(58), 240–242 (1895)
Google Scholar
Pearson, R.K.: Univariate outlier detection, chapter 3, pp. 69–91 (2005)
Peng, P., Wong, R.C.: k-hit query: Top-k query with probabilistic utility function. In: SIGMOD Conference, pp. 577–592 (2015)
Qian, L., Gao, J., Jagadish, H.V.: Learning user preferences by adaptive pairwise comparison. PVLDB 8(11), 1322–1333 (2015)
Google Scholar
Rahul, S., Tao, Y.: Efficient top-k indexing via general reductions. In: PODS, pp. 277–288. ACM (2016)
Sarma, A.D., Lall, A., Nanongkai, D., Xu, J.J.: Randomized multi-pass streaming skyline algorithms. PVLDB 2(1), 85–96 (2009)
Google Scholar
Sharifzadeh, M., Shahabi, C.: The spatial skyline queries. In: VLDB, pp. 751–762 (2006)
Sharifzadeh, M., Shahabi, C.: Vor-tree: R-trees with voronoi diagrams for efficient processing of spatial nearest neighbor queries. PVLDB 3(1), 1231–1242 (2010)
Google Scholar
Sheng, C., Tao, Y.: Worst-case i/o-efficient skyline algorithms. ACM Trans. Database Syst. 37(4), 1–22 (2012)
Article Google Scholar
Silberschatz, A., Korth, H.F., Sudarshan, S.: Database System Concepts, 7th edn. McGraw-Hill Book Company, USA (2020)
Google Scholar
Soliman, M.A., Ilyas, I.F., Martinenghi, D., Tagliasacchi, M.: Ranking with uncertain scoring functions: semantics and sensitivity measures. In: SIGMOD Conference, pp. 805–816 (2011)
Stanoi, I., Agrawal, D., Abbadi, A.E.: Reverse nearest neighbor queries for dynamic databases. In: SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 44–53 (2000)
Sun, Y., Zhang, R., Xue, A.Y., Qi, J., Du, X.: Reverse nearest neighbor heat maps: A tool for influence exploration. In: ICDE, pp. 966–977 (2016)
Tang, B., Mouratidis, K., Han, M.: On m-impact regions and standing top-k influence problems. In: SIGMOD Conference, pp. 1784–1796 (2021)
Tang, B., Mouratidis, K., Yiu, M.L.: Determining the impact regions of competing options in preference space. In: SIGMOD Conference, pp. 805–820 (2017)
Tang, B., Mouratidis, K., Yiu, M.L., Chen, Z.: Creating top ranking options in the continuous option and preference space. PVLDB 12(10), 1181–1194 (2019)
Google Scholar
Tao, Y., Ding, L., Lin, X., Pei, J.: Distance-based representative skyline. In: ICDE, pp. 892–903 (2009)
Tao, Y., Hristidis, V., Papadias, D., Papakonstantinou, Y.: Branch-and-bound processing of ranked queries. Inf. Syst. 32(3), 424–445 (2007)
Article Google Scholar
Tao, Y., Papadias, D., Lian, X., Xiao, X.: Multidimensional reverse k NN search. VLDB J. 16(3), 293–316 (2007)
Article Google Scholar
Tsaparas, P., Palpanas, T., Kotidis, Y., Koudas, N., Srivastava, D.: Ranked join indices. In: ICDE, pp. 277–288 (2003)
TripAdvisor dataset. https://www.cs.virginia.edu/~hw5x/dataset.html
Vlachou, A., Doulkeridis, C., Kotidis, Y., Nørvåg, K.: Reverse top-k queries. In: ICDE, pp. 365–376 (2010)
Vlachou, A., Doulkeridis, C., Kotidis, Y., Nørvåg, K.: Monochromatic and bichromatic reverse top-k queries. IEEE Trans. Knowl. Data Eng. 23(8), 1215–1229 (2011)
Article Google Scholar
Vlachou, A., Doulkeridis, C., Nørvåg, K., Kotidis, Y.: Identifying the most influential data objects with reverse top-k queries. PVLDB 3(1), 364–372 (2010)
Google Scholar
Wan, Q., Wong, R.C., Ilyas, I.F., Özsu, M.T., Peng, Y.: Creating competitive products. PVLDB 2(1), 898–909 (2009)
Google Scholar
Wang, H., Lu, Y., Zhai, C.: Latent aspect rating analysis on review text data: a rating regression approach. In: KDD, pp. 783–792 (2010)
Wong, R.C., Özsu, M.T., Yu, P.S., Fu, A.W., Liu, L.: Efficient method for maximizing bichromatic reverse nearest neighbor. PVLDB 2(1), 1126–1137 (2009)
Google Scholar
Wu, Y., Gao, J., Agarwal, P.K., Yang, J.: Finding diverse, high-value representatives on a surface of answers. PVLDB 10(7), 793–804 (2017)
Google Scholar
Xia, T., Zhang, D., Tao, Y.: On skylining with flexible dominance relation. In: ICDE, pp. 1397–1399 (2008)
Xie, M., Wong, R.C., Lall, A.: An experimental survey of regret minimization query and variants: bridging the best worlds between top-k query and skyline query. VLDB J. 29(1), 147–175 (2020)
Article Google Scholar
Xie, M., Wong, R.C., Li, J., Long, C., Lall, A.: Efficient k-regret query algorithm with restriction-free bound for any dimensionality. In: SIGMOD Conference, pp. 959–974 (2018)
Xin, D., Chen, C., Han, J.: Towards robust indexing for ranked queries. In: VLDB, pp. 235–246, (2006)
Yang, G., Cai, Y.: Querying improvement strategies. In: EDBT, pp. 294–305 (2017)
Yang, J., Zhang, Y., Zhang, W., Lin, X.: Influence based cost optimization on user preference. In: ICDE, pp. 709–720 (2016)
Yang, S., Cheema, M.A., Lin, X., Zhang, Y.: SLICE: reviving regions-based pruning for reverse k nearest neighbors queries. In: ICDE, pp. 760–771 (2014)
Yiu, M.L., Mamoulis, N.: Multi-dimensional top-k dominating queries. VLDB J. 18(3), 695–718 (2009)
Article Google Scholar
Yu, A., Agarwal, P.K., Yang, J.: Top-k preferences in high dimensions. IEEE Trans. Knowl. Data Eng. 28(2), 311–325 (2016)
Article Google Scholar
Zhang, P., Cheng, R., Mamoulis, N., Renz, M., Züfle, A., Tang, Y., Emrich, T.: Voronoi-based nearest neighbor search for multi-dimensional uncertain databases. In: ICDE, pp. 158–169, (2013)
Zhang, Z., Jin, C., Kang, Q.: Reverse k-ranks query. PVLDB 7(10), 785–796 (2014)

Download references

Acknowledgements

This work was supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (Award No. MOE-T2EP20121-0002). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of the Ministry of Education, Singapore. This work was partially supported by Shenzhen Fundamental Research Program (Grant No. 20220815112848002).

Author information

Authors and Affiliations

School of Computing and Information Systems, Singapore Management University, Singapore, Singapore
Kyriakos Mouratidis
Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China
Keming Li
Research Institute of Trustworthy Autonomous Systems, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China
Bo Tang

Authors

Kyriakos Mouratidis
View author publications
You can also search for this author in PubMed Google Scholar
Keming Li
View author publications
You can also search for this author in PubMed Google Scholar
Bo Tang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Kyriakos Mouratidis.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Mouratidis, K., Li, K. & Tang, B. Quantifying the competitiveness of a dataset in relation to general preferences. The VLDB Journal 33, 231–250 (2024). https://doi.org/10.1007/s00778-023-00804-1

Download citation

Received: 26 July 2022
Revised: 08 May 2023
Accepted: 19 June 2023
Published: 08 August 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s00778-023-00804-1

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Quantifying the competitiveness of a dataset in relation to general preferences

Abstract

Access this article

Similar content being viewed by others

Cost optimization based on influence and user preference

What Matters for Shoppers: Investigating Key Attributes for Online Product Comparison

A Customer-Oriented Assortment Selection in the Big Data Environment

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Quantifying the competitiveness of a dataset in relation to general preferences

Abstract

Access this article

Similar content being viewed by others

Cost optimization based on influence and user preference

What Matters for Shoppers: Investigating Key Attributes for Online Product Comparison

A Customer-Oriented Assortment Selection in the Big Data Environment

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation