Performance analysis of a dual-tree algorithm for computing spatial distance histograms

  • Regular Paper
The VLDB Journal

Abstract

Many scientific and engineering fields produce large volumes of spatiotemporal data. The storage, retrieval, and analysis of such data pose great challenges for database system design. Analysis of scientific spatiotemporal data often involves computing functions of all point-to-point interactions. One such analytic, the Spatial Distance Histogram (SDH), is of vital importance to scientific discovery. Recently, algorithms for efficient SDH processing in large-scale scientific databases have been proposed. These algorithms adopt a recursive tree-traversal strategy that processes point-to-point distances in the visited tree nodes in batches, and thus require less time than the brute-force approach, in which all pairwise distances must be computed. Despite the promising experimental results, the complexity of such algorithms has not been thoroughly studied. In this paper, we present an analysis of such algorithms based on a geometric modeling approach. The main technique is to transform the analysis of point counts into a problem of quantifying the area of regions where pairwise distances can be processed in batches by the algorithm. From the analysis, we conclude that the number of pairwise distances left to be processed decreases exponentially as more levels of the tree are visited. This leads to a proof of a time complexity lower than the quadratic time needed by a brute-force algorithm and lays the foundation for a constant-time approximate algorithm. Our model is also general in that it works for a wide range of point spatial distributions, histogram types, and space-partitioning options in building the tree.
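To make the contrast in the abstract concrete, the following minimal Python sketch shows the brute-force baseline and one plausible form of the batch test a tree-based method could apply to a pair of cells. It is an illustration under stated assumptions, not the authors' implementation: the function names, the equal-width buckets, and the axis-aligned cell representation are all hypothetical choices made here. The batching rule assumed is that if the minimum and maximum possible distances between two cells fall into the same bucket, every cross-cell pair can be counted at once without computing individual distances.

```python
import math
from itertools import combinations


def brute_force_sdh(points, bucket_width, num_buckets):
    """Baseline: bin every pairwise distance into equal-width buckets (quadratic time)."""
    hist = [0] * num_buckets
    for p, q in combinations(points, 2):
        d = math.dist(p, q)
        # Clamp so that the largest distances land in the last bucket.
        idx = min(int(d // bucket_width), num_buckets - 1)
        hist[idx] += 1
    return hist


def cell_distance_range(cell_a, cell_b):
    """Min and max distance between two axis-aligned cells given as (lower, upper) corners."""
    (a_lo, a_hi), (b_lo, b_hi) = cell_a, cell_b
    min_sq = 0.0
    max_sq = 0.0
    for al, ah, bl, bh in zip(a_lo, a_hi, b_lo, b_hi):
        gap = max(bl - ah, al - bh, 0.0)   # separation along this axis, 0 if overlapping
        span = max(bh - al, ah - bl)       # farthest extent along this axis
        min_sq += gap * gap
        max_sq += span * span
    return math.sqrt(min_sq), math.sqrt(max_sq)


def resolved_bucket(cell_a, cell_b, bucket_width, num_buckets):
    """Assumed batching rule: return the bucket index if every cross-cell
    distance must fall into one bucket, otherwise None."""
    d_min, d_max = cell_distance_range(cell_a, cell_b)
    lo = min(int(d_min // bucket_width), num_buckets - 1)
    hi = min(int(d_max // bucket_width), num_buckets - 1)
    return lo if lo == hi else None


points = [(0, 0), (1, 0), (0, 1), (3, 4)]
hist = brute_force_sdh(points, bucket_width=2.0, num_buckets=3)

cell_a = ((0, 0), (1, 1))
cell_b = ((5, 0), (6, 1))
wide = resolved_bucket(cell_a, cell_b, bucket_width=10.0, num_buckets=4)
narrow = resolved_bucket(cell_a, cell_b, bucket_width=5.0, num_buckets=4)
```

With wide buckets the two cells resolve to a single bucket, so a traversal could add the product of their point counts to that bucket in one step. With narrower buckets the pair does not resolve, and under this assumed scheme the traversal would descend to the cells' children and repeat the test.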

Author information

Corresponding author

Correspondence to Yi-Cheng Tu.

Additional information

Work was done when Chen was a visiting professor at the University of South Florida.

About this article

Cite this article

Chen, S., Tu, YC. & Xia, Y. Performance analysis of a dual-tree algorithm for computing spatial distance histograms. The VLDB Journal 20, 471–494 (2011). https://doi.org/10.1007/s00778-010-0205-7

  • DOI: https://doi.org/10.1007/s00778-010-0205-7
