Multiple k nearest neighbor search

Chung, Yu-Chi; Su, I-Fang; Lee, Chiang; Liu, Pei-Chi

doi:10.1007/s11280-016-0392-2

Published: 13 May 2016

World Wide Web, Volume 20, pages 371–398 (2017)

Abstract

The problem of kNN (k Nearest Neighbor) queries has received considerable attention in the database and information retrieval communities. Given a dataset D and a kNN query q, a kNN algorithm finds the k data points in D closest to q. kNN queries have broad applications, not only in spatio-temporal databases but also in many other areas, such as multimedia databases, data mining, scientific databases and video retrieval. Past studies of kNN query processing did not consider the case in which the server receives multiple kNN queries at the same time; their algorithms process each query independently. As a result, the server repeatedly re-accesses the database to obtain data it has already retrieved, which wastes I/O and degrades the performance of the whole system. In this paper, we focus on this problem and propose an algorithm named COrrelated kNN query Evaluation (COKE). The main idea of COKE is an “information sharing” strategy whereby the server reuses the results of previously executed queries to process subsequent queries efficiently. We conduct a comprehensive set of experiments to analyze the performance of COKE and compare it with the Best-First Search (BFS) algorithm. Empirical studies indicate that COKE outperforms BFS, achieving lower I/O costs and shorter running times.
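The abstract does not spell out how COKE shares information between queries, so the Python sketch that follows is only a hypothetical illustration of the general idea of reusing earlier results, not the authors' algorithm. It assumes 2-D points held in a plain in-memory list, counts full scans of that list as a stand-in for database I/O, and introduces its own names (`knn_scan`, `knn_from_cache`, `knn_batch`) and an over-fetch parameter `fetch`, none of which come from the paper. On a miss, the sketch retrieves the `fetch` nearest points of the query and caches them; a later query is answered from that cache only when the triangle inequality proves that no unseen point could enter its result.

```python
# Hypothetical illustration of reusing earlier kNN answers; NOT the COKE algorithm from the paper.
import heapq
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Neighbour = Tuple[float, Point]                # (distance to the query, data point)
CacheEntry = Tuple[Point, float, List[Point]]  # (earlier query p, radius r_p, points fetched for p)


def knn_scan(data: Sequence[Point], q: Point, k: int) -> List[Neighbour]:
    """Independent evaluation: measure q against every point and keep the k closest."""
    return heapq.nsmallest(k, ((math.dist(x, q), x) for x in data))


def knn_from_cache(cache: List[CacheEntry], q: Point, k: int) -> Optional[List[Neighbour]]:
    """Answer q exactly from one earlier result, or return None if that is not provably safe.

    Each entry stores the m nearest points fetched for an earlier query p and the
    m-th distance r_p, so every data point strictly closer than r_p to p is cached.
    If the k-th nearest cached point lies at distance rho from q and
    rho + d(p, q) < r_p, the triangle inequality puts every data point within
    rho of q inside that fully known region, so the cached answer is exact.
    """
    for p, r_p, pts in cache:
        cand = heapq.nsmallest(k, ((math.dist(x, q), x) for x in pts))
        if len(cand) == k and cand[-1][0] + math.dist(p, q) < r_p:
            return cand
    return None


def knn_batch(data: Sequence[Point], queries: Sequence[Point], k: int,
              fetch: int) -> Tuple[List[List[Neighbour]], int]:
    """Evaluate a batch of kNN queries over non-empty data, scanning only on cache misses.

    `fetch` (>= k) is a hypothetical over-fetch size: on a miss we retrieve the
    `fetch` nearest points so that later, nearby queries can be answered from them.
    Returns the per-query answers and the number of full scans performed.
    """
    if fetch < k:
        raise ValueError("fetch must be at least k")
    cache: List[CacheEntry] = []
    results: List[List[Neighbour]] = []
    scans = 0
    for q in queries:
        ans = knn_from_cache(cache, q, k)
        if ans is None:
            fetched = knn_scan(data, q, fetch)  # one full pass over the data
            scans += 1
            cache.append((q, fetched[-1][0], [x for _, x in fetched]))
            ans = fetched[:k]  # nsmallest output is sorted, so this equals the k-NN answer
        results.append(ans)
    return results, scans


if __name__ == "__main__":
    grid = [(x, y) for x in range(20) for y in range(20)]
    queries = [(10, 10), (10.1, 10), (10, 10.2), (3, 3)]
    answers, scans = knn_batch(grid, queries, k=3, fetch=12)
    print(scans)  # 2: the two queries near (10, 10) reuse the first query's cached points
```

On the 20×20 grid, the queries at (10.1, 10) and (10, 10.2) are answered exactly from the points cached for (10, 10), so four queries cost two full scans instead of four. A larger `fetch` raises the chance of a cache hit at the price of a bigger initial retrieval; a disk-based version would presumably replace the full scan with an index traversal such as best-first search over a spatial index.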


Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, Chang Jung Christian University, Taiwan, Republic of China
Yu-Chi Chung
Department of Computer Science and Information Engineering, National Cheng-Kung University, Taiwan, Republic of China
I-Fang Su
Department of Computer Science and Information Engineering, National Cheng-Kung University, Taiwan, Republic of China
Chiang Lee
Cloud Computing Laboratory, Chunghwa Telecom Laboratories, Taipei, Taiwan
Pei-Chi Liu

Corresponding author

Correspondence to Yu-Chi Chung.

About this article

Cite this article

Chung, YC., Su, IF., Lee, C. et al. Multiple k nearest neighbor search. World Wide Web 20, 371–398 (2017). https://doi.org/10.1007/s11280-016-0392-2

Received: 15 March 2015
Revised: 29 April 2016
Accepted: 02 May 2016
Published: 13 May 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s11280-016-0392-2

Similar content being viewed by others

Scalable decision fusion algorithm for enabling decentralized computation in distributed, big data clustering problems

MongoDB Vs PostgreSQL: A comparative study on performance aspects

RefinerHash: a new hashing-based re-ranking technique for image retrieval
