Multiple k nearest neighbor search

World Wide Web

Abstract

The problem of kNN (k Nearest Neighbor) queries has received considerable attention in the database and information retrieval communities. Given a dataset D and a kNN query q, a kNN algorithm finds the k data points in D that are closest to q. The applications of kNN queries are broad and extend beyond spatio-temporal databases. For example, kNN queries are used in multimedia databases, data mining, scientific databases and video retrieval. Previous studies of kNN query processing did not consider the case in which the server receives multiple kNN queries at one time. Their algorithms process each query independently, so the server repeatedly re-accesses the database to fetch data that it has already retrieved. This wastes I/O and degrades the performance of the whole system. In this paper, we focus on this problem and propose an algorithm named COrrelated kNN query Evaluation (COKE). The main idea of COKE is an “information sharing” strategy whereby the server reuses the results of previously executed queries to process subsequent queries efficiently. We conduct a comprehensive set of experiments to analyze the performance of COKE and compare it with the Best-First Search (BFS) algorithm. Empirical studies indicate that COKE outperforms BFS, achieving lower I/O cost and shorter running time.
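To make the kNN definition and the general notion of reusing earlier results concrete, here is a minimal Python sketch. It is not the COKE algorithm, whose details the abstract does not give. It assumes in-memory points under Euclidean distance, and the function names `knn` and `knn_with_reuse` are hypothetical. The reuse step relies only on the triangle inequality: a cached answer for an earlier query yields an upper bound on the k-th neighbor distance of a new query, so points beyond that bound can be discarded.

```python
import heapq
import math


def knn(points, q, k):
    """Exact kNN by linear scan: the k points of `points` closest to q."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))


def knn_with_reuse(points, q, k, prev_q, prev_result):
    """Answer query q with help from the cached answer of an earlier query.

    Illustrative assumption, not the paper's method: `prev_result` holds at
    least k points returned for `prev_q`. For every cached point p,
        dist(q, p) <= dist(q, prev_q) + dist(prev_q, p),
    so at least k points lie within `bound` of q, and no true neighbor of q
    can be farther away than `bound`.
    """
    if len(prev_result) < k:
        # The cached answer is too small to give a valid bound; fall back.
        return knn(points, q, k)
    radius = max(math.dist(prev_q, p) for p in prev_result)
    bound = math.dist(q, prev_q) + radius
    # Only points inside the bound can belong to the answer.
    candidates = [p for p in points if math.dist(p, q) <= bound]
    return heapq.nsmallest(k, candidates, key=lambda p: math.dist(p, q))


# Usage: two nearby queries over a small grid of points.
points = [(float(x), float(y)) for x in range(10) for y in range(10)]
q1, q2, k = (2.2, 3.1), (2.9, 3.4), 4
first = knn(points, q1, k)
second = knn_with_reuse(points, q2, k, q1, first)
```

In this sketch the filter still touches every point. With a disk-based spatial index, the same bound could instead be used to skip index nodes whose minimum distance to q exceeds it, which is where savings in I/O would come from.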

Corresponding author

Correspondence to Yu-Chi Chung.

Cite this article

Chung, YC., Su, IF., Lee, C. et al. Multiple k nearest neighbor search. World Wide Web 20, 371–398 (2017). https://doi.org/10.1007/s11280-016-0392-2

  • DOI: https://doi.org/10.1007/s11280-016-0392-2
