Finding Data Broadness Via Generalized Nearest Neighbors

  • Jayendra Venkateswaran
  • Tamer Kahveci
  • Orhan Camoglu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3896)

Abstract

A data object is broad if it is one of the k-Nearest Neighbors (k-NN) of many data objects. We introduce a new database primitive called Generalized Nearest Neighbor (GNN) to express data broadness. We also develop three strategies to answer GNN queries efficiently for large datasets of multidimensional objects. The R*-tree-based search algorithm generates candidate pages and ranks them based on their distances. Our first algorithm, Fetch All (FA), fetches as many candidate pages as possible. Our second algorithm, Fetch One (FO), fetches one candidate page at a time. Our third algorithm, Fetch Dynamic (FD), dynamically decides on the number of pages that need to be fetched. We also propose three optimizations, Column Filter, Row Filter and Adaptive Filter, to eliminate pages from each dataset. Column Filter prunes the pages that are guaranteed to be non-broad. Row Filter prunes the pages whose removal does not change the broadness of any data point. Adaptive Filter prunes the search space dynamically along each dimension to eliminate unpromising objects. Our experiments show that FA is the fastest when the buffer size is large and FO is the fastest when the buffer size is small. FD is always either the fastest or very close to the faster of FA and FO. FD is also significantly faster than the existing methods adapted to the GNN problem.
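To make the notion of broadness concrete, the following is a minimal brute-force sketch in Python. It is not the FA, FO, or FD algorithm from the paper and uses no index or page scheduling. It assumes a single in-memory dataset, Euclidean distance, and a hypothetical threshold `t` for what counts as "many"; the function names are likewise illustrative only.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def knn_indices(points, i, k):
    """Indices of the k nearest neighbors of points[i], excluding i itself.

    Ties in distance are broken by the smaller index (an assumption).
    """
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (dist(points[i], points[j]), j))
    return others[:k]


def broadness(points, k):
    """counts[j] = number of objects that have points[j] among their k-NN."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in knn_indices(points, i, k):
            counts[j] += 1
    return counts


def broad_objects(points, k, t):
    """Indices of objects that appear in at least t k-NN sets (t is hypothetical)."""
    return [j for j, c in enumerate(broadness(points, k)) if c >= t]


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (3, 0), (10, 0)]
    print(broadness(data, k=1))
    print(broad_objects(data, k=1, t=2))
```

This sketch sorts all other objects for every object, so its cost grows quadratically with the dataset size. That cost is the motivation for index-based strategies and page-level pruning on large datasets.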

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jayendra Venkateswaran (1)
  • Tamer Kahveci (1)
  • Orhan Camoglu (2)
  1. CISE Department, University of Florida, Gainesville
  2. University of California, Santa Barbara
