Diverse nearest neighbors queries using linear skylines

Abstract

k-nearest neighbor (k-NN) queries are well known and widely used in many applications. However, the original definition of k-NN queries does not consider the diversity of the answer set with respect to the user’s interests. For instance, travelers may be looking for tourist sites that are close to their current location but that would also lead them to see different parts of the city. Likewise, someone looking for nearby restaurants may prefer to learn about restaurants of different categories or cuisines that are nonetheless relatively close. The novel aspect of this type of query is that two competing criteria must be optimized: closeness and diversity. We propose two approaches that leverage the notion of linear skyline queries to find the k diverse nearest neighbors within a radius r from a given query point, or (k, r)-DNNs for short. Our approaches return a relatively small set containing all solutions that are optimal for any linear combination of the weights a user could give to the two competing criteria. We consider three different notions of diversity: spatial, categorical and angular. Our experiments vary a number of parameters and use synthetic and real datasets, in Euclidean space and road networks, respectively. They show that our approaches are several orders of magnitude faster than a straightforward approach.
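To illustrate the idea of a linear skyline over two competing criteria, the following is a minimal sketch and not the authors' algorithm. It assumes each candidate answer has already been scored as a pair (closeness cost, diversity cost), with both values to be minimized (for example, diversity cost could be the negated diversity score). The function name `linear_skyline` and the sample scores are hypothetical. The sketch first keeps the conventional skyline (non-dominated pairs) and then keeps only the pairs on its lower-left convex hull, which are exactly the pairs that minimize some weighted sum of the two costs.

```python
from typing import List, Tuple

# Hypothetical representation: (closeness_cost, diversity_cost), both minimized.
Score = Tuple[float, float]


def cross(o: Score, a: Score, b: Score) -> float:
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def linear_skyline(scores: List[Score]) -> List[Score]:
    """Return the scores that are optimal for some linear weighting."""
    # Sort by closeness cost, breaking ties by diversity cost.
    pts = sorted(set(scores))

    # Conventional skyline: keep a pair only if it strictly improves
    # the best diversity cost seen so far.
    skyline: List[Score] = []
    best = float("inf")
    for p in pts:
        if p[1] < best:
            skyline.append(p)
            best = p[1]

    # Lower convex hull of the skyline (monotone chain): drop pairs that
    # lie on or above the segment joining their neighbors.
    hull: List[Score] = []
    for p in skyline:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


if __name__ == "__main__":
    candidates = [(1, 10), (2, 5), (3, 4), (3, 8), (4, 1), (5, 0)]
    print(linear_skyline(candidates))
```

In this sample, (3, 8) is dominated by (3, 4), and (3, 4) is on the conventional skyline but is never the best choice under any weighted sum, so neither appears in the result. This is why a linear skyline is typically smaller than a conventional skyline.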


Notes

  1. Since it finds all possible sets of size k regardless of the diversity considered, BF’s processing time does not vary with the type of diversity; hence we omit the results for those.


Author information

Corresponding author

Correspondence to Camila F. Costa.

Additional information

This research has been partially supported by NSERC, Canada and CNPq’s Science Without Borders program, Brazil.

About this article

Cite this article

Costa, C.F., Nascimento, M.A. & Schubert, M. Diverse nearest neighbors queries using linear skylines. Geoinformatica 22, 815–844 (2018). https://doi.org/10.1007/s10707-018-0332-7

Keywords

  • Diverse nearest neighbors
  • k-nearest neighbors
  • Linear skyline
  • Skyline queries