The VLDB Journal, Volume 18, Issue 3, pp 739–764

Self-tuning management of update-intensive multidimensional data in clusters of workstations

Regular Paper

Abstract

Contemporary applications continuously modify large volumes of multidimensional data that must be accessed efficiently and, more importantly, must be updated in a timely manner. Single-server storage approaches are insufficient for managing such volumes of data, while the high frequency of data modification renders classical indexing methods inefficient. To address these two problems, we introduce a distributed storage manager for multidimensional data based on a cluster of workstations. The manager addresses these challenges through a set of mechanisms that, by means of selective on-line data reorganization, collectively maintain a balanced load across the cluster. With the help of an efficient self-tuning mechanism, based on a new data structure called stat-index, and a query aggregation and clustering algorithm, our storage manager attains short query response times even in the presence of massive modifications and highly skewed access patterns. Furthermore, we provide a data migration cost model used to determine the best data redistribution strategy. Through extensive experimentation with our prototype, we establish that our storage manager can sustain significant update rates with minimal overhead.

Keywords

Multi-dimensional data, Cluster of workstations, Self-tuning storage

Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  1. Polytechnic Institute of New York University, Brooklyn, USA
  2. Boston University, Boston, USA
  3. University of Athens, Athens, Greece
