k-nearest neighbor (k-NN) queries are well known and widely used in many applications. However, the original definition of k-NN queries does not account for the diversity of the answer set with respect to the user’s interests. For instance, travelers may be looking for tourist sites that are close to where they are, but that would also lead them to see different parts of the city. Likewise, if one is looking for restaurants close by, it may be more interesting to learn about restaurants of different categories or ethnicities which are nonetheless relatively close. The novel aspect of this type of query is that there are two competing criteria to be optimized: closeness and diversity. We propose two approaches that leverage the notion of linear skyline queries in order to find the k diverse nearest neighbors within a radius r from a given query point, or (k, r)-DNNs for short. Our proposed approaches return a relatively small set containing all optimal solutions for any linear combination of the weights a user could give to the two competing criteria, and we consider three different notions of diversity: spatial, categorical and angular. Our experiments, which vary a number of parameters and use synthetic and real datasets in Euclidean space and road networks, respectively, show that our approaches are several orders of magnitude faster than a straightforward approach.
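The idea of returning every solution that is optimal for some linear weighting of two criteria can be illustrated with a minimal sketch. This is not the authors' algorithm. It only shows what a linear skyline looks like for a handful of candidates, assuming each candidate has already been scored by two costs where lower is better (for example, a distance cost and a "lack of diversity" cost). The function names `cross`, `linear_skyline` and `best_for_weight`, the scoring convention, and the sample values are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (closeness_cost, diversity_cost), both minimized (assumed convention)


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def linear_skyline(points: List[Point]) -> List[Point]:
    """Return the candidates that minimize w*x + (1-w)*y for some weight w in [0, 1].

    These are the vertices of the lower-left convex hull of the points.
    Candidates that only tie with a hull vertex are dropped for simplicity.
    """
    skyline: List[Point] = []
    for p in sorted(set(points)):
        # A point that does not improve the second cost is dominated by the
        # last kept point, which already has a smaller or equal first cost.
        if skyline and p[1] >= skyline[-1][1]:
            continue
        # Remove points that fall on or above the segment to the new point:
        # no linear weighting can make them strictly optimal.
        while len(skyline) >= 2 and cross(skyline[-2], skyline[-1], p) <= 0:
            skyline.pop()
        skyline.append(p)
    return skyline


def best_for_weight(points: List[Point], w: float) -> float:
    """Smallest weighted cost w*x + (1-w)*y over the given points."""
    return min(w * x + (1 - w) * y for x, y in points)


# Hypothetical scored candidates, for illustration only.
candidates = [(1, 10), (2, 6), (3, 5), (4, 7), (5, 2), (6, 4), (9, 1)]
result = linear_skyline(candidates)
```

With these sample values, `result` is `[(1, 10), (2, 6), (5, 2), (9, 1)]`. The candidate `(3, 5)` is not dominated by any other candidate, so it would appear in a conventional skyline, yet no weight makes it the best choice, so the linear skyline leaves it out. This is why a linear skyline tends to be a smaller set while still covering every linear preference.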
Since it finds all possible sets of size k regardless of the diversity considered, BF’s processing time does not vary with the type of diversity; hence we omit the results for those.
This research has been partially supported by NSERC, Canada and CNPq’s Science Without Borders program, Brazil.
Costa, C.F., Nascimento, M.A. & Schubert, M. Diverse nearest neighbors queries using linear skylines. Geoinformatica 22, 815–844 (2018). https://doi.org/10.1007/s10707-018-0332-7
- Diverse nearest neighbors
- k-nearest neighbors
- Linear skyline
- Skyline queries