The VLDB Journal, Volume 25, Issue 3, pp 317–338

Exact and approximate flexible aggregate similarity search

  • Feifei Li
  • Ke Yi
  • Yufei Tao
  • Bin Yao
  • Yang Li
  • Dong Xie
  • Min Wang
Regular Paper

Abstract

Aggregate similarity search, also known as aggregate nearest-neighbor (Ann) query, finds many useful applications in spatial and multimedia databases. Given a group Q of M query objects, it retrieves from a database the objects most similar to Q, where the similarity is an aggregation (e.g., \(\mathrm{sum}\), \(\max\)) of the distances between each retrieved object p and all the objects in Q. In this paper, we propose an added flexibility to the query definition, where the similarity is an aggregation over the distances between p and any subset of \(\phi M\) objects in Q for some support \(0 < \phi \le 1\). We call this new definition flexible aggregate similarity search and accordingly refer to a query as a flexible aggregate nearest-neighbor (Fann) query. We present algorithms for answering Fann queries exactly and approximately. Our approximation algorithms are especially appealing: they are simple, highly efficient, and work well in both low and high dimensions. They also return near-optimal answers with guaranteed constant-factor approximations in any dimension. Extensive experiments on large real and synthetic datasets from 2 to 74 dimensions demonstrate their superior efficiency and high quality.
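
To make the query definition concrete, the following is a minimal brute-force sketch of a Fann query in Python. It is only an illustration of the definition, not one of the exact or approximate algorithms the paper proposes. The function name `fann_brute_force`, the Euclidean distance, and rounding \(\phi M\) up to an integer subset size are assumptions made for this example. It relies on the observation that, for a fixed candidate p and for either \(\mathrm{sum}\) or \(\max\), the best subset of a given size consists of the query objects nearest to p.

```python
import math
from typing import Callable, Iterable, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance (an assumed metric; any distance function works)."""
    return math.dist(a, b)


def fann_brute_force(
    P: Sequence[Point],
    Q: Sequence[Point],
    phi: float,
    agg: Callable[[Iterable[float]], float] = sum,
    dist: Callable[[Point, Point], float] = euclidean,
) -> Tuple[Optional[Point], float]:
    """Scan every database object p in P and return the one minimizing the
    aggregate distance to its best subset of about phi * |Q| query objects.

    For agg = sum or agg = max, the best subset for a fixed p is simply the
    k query objects nearest to p, so sorting the distances is enough.
    """
    if not 0 < phi <= 1:
        raise ValueError("support phi must satisfy 0 < phi <= 1")
    # Assumption: round phi * M up when it is not an integer.
    k = math.ceil(phi * len(Q))
    best_p, best_cost = None, math.inf
    for p in P:
        nearest = sorted(dist(p, q) for q in Q)[:k]
        cost = agg(nearest)
        if cost < best_cost:
            best_p, best_cost = p, cost
    return best_p, best_cost


if __name__ == "__main__":
    P = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
    Q = [(0.0, 1.0), (1.0, 0.0), (100.0, 100.0), (100.0, 101.0)]
    # With phi = 0.5 the two far-away query objects can be ignored.
    print(fann_brute_force(P, Q, 0.5))
    # With phi = 1.0 this reduces to an ordinary aggregate NN query.
    print(fann_brute_force(P, Q, 1.0))
```

In this toy input, lowering the support from 1 to 0.5 changes the answer from (5.0, 5.0) to (0.0, 0.0), because the two distant query objects no longer have to be covered. The scan sorts M distances for each of the N database objects, which is the kind of cost that dedicated exact and approximate algorithms aim to avoid.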

Keywords

  • Aggregate nearest neighbor query
  • Approximate similarity search
  • Aggregate similarity search

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Feifei Li (1)
  • Ke Yi (2)
  • Yufei Tao (3)
  • Bin Yao (4)
  • Yang Li (4)
  • Dong Xie (4)
  • Min Wang (5)

  1. University of Utah, Salt Lake City, USA
  2. Hong Kong University of Science and Technology, Hong Kong, China
  3. Chinese University of Hong Kong, Hong Kong, China
  4. Shanghai Key Laboratory of Scalable Computing and Systems, Shanghai Jiao Tong University, Shanghai, China
  5. Visa Research, Visa Inc., Foster City, USA
