Self-tuning management of update-intensive multidimensional data in clusters of workstations

  • Regular Paper
  • The VLDB Journal

Abstract

Contemporary applications continuously modify large volumes of multidimensional data that must be accessed efficiently and, more importantly, updated in a timely manner. Single-server storage approaches cannot cope with such volumes, and the high frequency of data modification renders classical indexing methods inefficient. To address these two problems, we introduce a distributed storage manager for multidimensional data based on a Cluster-of-Workstations. The manager relies on a set of mechanisms that, through selective on-line data reorganization, collectively maintain a balanced load across the cluster. With the help of a highly efficient self-tuning mechanism, based on a new data structure called stat-index, and a query aggregation and clustering algorithm, our storage manager attains short query response times even in the presence of massive modifications and highly skewed access patterns. Furthermore, we provide a data migration cost model used to determine the best data redistribution strategy. Through extensive experimentation with our prototype, we establish that our storage manager can sustain significant update rates with minimal overhead.
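To make the idea of cost-aware data redistribution concrete, the following is a minimal illustrative sketch, not the algorithm or cost model of the paper. It assumes a hypothetical `Region` unit of data with an observed load and a size, a toy cost model that counts only network transfer time, and a greedy rule that moves one region from the busiest workstation to the idlest one. All names and parameters (`Region`, `migration_cost`, `pick_migration`, `mb_per_sec`) are assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical unit of data that can be moved between workstations."""
    name: str
    load: float     # observed request rate for the region (illustrative)
    size_mb: float  # data that must be shipped if the region migrates


def migration_cost(region: Region, mb_per_sec: float = 100.0) -> float:
    """Toy cost model (an assumption): seconds of network transfer only."""
    return region.size_mb / mb_per_sec


def pick_migration(nodes: dict[str, list[Region]]):
    """Choose one region to move from the busiest node to the idlest node.

    Returns (region, source, target), or None if no move narrows the gap.
    """
    totals = {n: sum(r.load for r in regions) for n, regions in nodes.items()}
    source = max(totals, key=totals.get)
    target = min(totals, key=totals.get)
    gap = totals[source] - totals[target]

    # Moving load L turns the gap into |gap - 2L|, so only 0 < L < gap helps.
    candidates = [r for r in nodes[source] if 0 < r.load < gap]
    if not candidates:
        return None

    def benefit_per_cost(r: Region) -> float:
        reduction = gap - abs(gap - 2 * r.load)
        # Guard against zero-sized regions to avoid division by zero.
        return reduction / max(migration_cost(r), 1e-9)

    best = max(candidates, key=benefit_per_cost)
    return best, source, target


cluster = {
    "w1": [Region("a", load=50, size_mb=200), Region("b", load=30, size_mb=50)],
    "w2": [Region("c", load=10, size_mb=100)],
}
print(pick_migration(cluster))
```

In this example the sketch selects region `b`: moving it shrinks the load gap between `w1` and `w2` from 70 to 10 at a quarter of the transfer time of region `a`. A real system would also have to account for factors that this toy model ignores, such as index maintenance during the move and queries arriving while data is in flight.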

Author information

Corresponding author

Correspondence to Alex Delis.

About this article

Cite this article

Kriakov, V., Kollios, G. & Delis, A. Self-tuning management of update-intensive multidimensional data in clusters of workstations. The VLDB Journal 18, 739–764 (2009). https://doi.org/10.1007/s00778-008-0121-2

  • DOI: https://doi.org/10.1007/s00778-008-0121-2
