Principles of Distributed Data Management in 2020?

  • Patrick Valduriez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6860)


With the advent of high-speed networks, fast commodity hardware, and the web, distributed data sources have become ubiquitous. The third edition of the Özsu-Valduriez textbook Principles of Distributed Database Systems [10] reflects the evolution of distributed data management and distributed database systems. In this new edition, the fundamental principles of distributed data management could still be presented along the three dimensions of earlier editions: distribution, heterogeneity and autonomy of the data sources. In retrospect, the focus on fundamental principles and generic techniques has been useful not only for understanding and teaching the material, but also for enabling an infinite number of variations. The primary application of these generic techniques has obviously been in distributed and parallel DBMS versions. Today, to support the requirements of important data-intensive applications (e.g. social networks, web data analytics, scientific applications), new distributed data management techniques and systems (e.g. MapReduce, Hadoop, SciDB, PNUTS, Pig Latin) are emerging and receiving much attention from the research community. Although they do well in terms of consistency/flexibility/performance trade-offs for specific applications, they seem to be ad hoc and might hurt data interoperability. The key questions I discuss are: What are the fundamental principles behind the emerging solutions? Is there a generic architectural model that explains those principles? Do we need new foundations to look at data distribution?


Cloud Computing; Cloud Provider; Data Integration System; Distributed Database System; Multidatabase System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Abadi, D.J.: Data Management in the Cloud: Limitations and Opportunities. IEEE Data Engineering Bulletin 32(1), 3–12 (2009)
  2. Abiteboul, S., Bienvenu, M., Galland, A., Antoine, E.: A Rule-based Language for Web Data Management. In: ACM Symposium on Principles of Database Systems, PODS (2011)
  3. Ailamaki, A., Kantere, V., Dash, D.: Managing scientific data. CACM 53(6), 68–78 (2010)
  4. Chang, F., et al.: Bigtable: A Distributed Storage System for Structured Data. ACM TOCS 26(2) (2008)
  5. Cooper, B.F., et al.: PNUTS: Yahoo!’s Hosted Data Serving Platform. Proceedings of the VLDB Endowment (PVLDB) 1(2), 1277–1288 (2008)
  6. Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters. In: Symp. on Operating System Design and Implementation (OSDI), pp. 137–150 (2004)
  7. Gray, J., Liu, D.T., Nieto-Santisteban, M.A., Szalay, A.S., DeWitt, D.J., Heber, G.: Scientific Data Management in the Coming Decade. Technical Report MSR-TR-2005-10, Microsoft Research (2005)
  8. Olston, C., Reed, B., Srivastava, U., Kumar, R., Tomkins, A.: Pig latin: a not-so-foreign language for data processing. In: ACM SIGMOD Int. Conf. on Management of Data (SIGMOD), pp. 1099–1110 (2008)
  9. Ogasawara, E., Dias, J., Oliveira, D., Porto, F., Valduriez, P., Mattoso, M.: An Algebraic Approach for Data-Centric Scientific Workflows. In: Proceedings of the VLDB Endowment, PVLDB (to appear, 2011)
  10. Özsu, T., Valduriez, P.: Principles of Distributed Database Systems, 3rd edn. Springer, Heidelberg (2011)
  11. Özsu, T., Valduriez, P., Abiteboul, S., Kemme, B., Jiménez-Peris, R., Chin Ooi, B.: Distributed data management in 2020? In: IEEE Int. Conf. On Data Engineering (ICDE), vol. 1360 (2011)
  12. Pacitti, E., Valduriez, P., Mattoso, M.: Grid Data Management: open problems and new issues. Journal of Grid Computing 5(3), 273–281 (2007)
  13. Tomasic, A., Raschid, L., Valduriez, P.: Scaling access to heterogeneous data sources with DISCO. IEEE Trans. on Knowledge and Data Engineering 10(5), 808–823 (1998)
  14. Valduriez, P.: Parallel Database Systems: open problems and new issues. Int. Journal on Distributed and Parallel Databases 1(2), 137–165 (1993)
  15. Valduriez, P., Pacitti, E.: Data Management in Large-Scale P2P Systems. In: Daydé, M., Dongarra, J., Hernández, V., Palma, J.M.L.M. (eds.) VECPAR 2004. LNCS, vol. 3402, pp. 104–118. Springer, Heidelberg (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Patrick Valduriez
    • INRIA and LIRMM, Montpellier, France
