Architectures for Distributed and Parallel Data Management (Architekturen für verteiltes und paralleles Datenmanagement)

Rahm, Erhard; Saake, Gunter; Sattler, Kai-Uwe

doi:10.1007/978-3-642-45242-0_3

Part of the book series: eXamen.press (EXAMEN)

Abstract

A broad spectrum of architectures exists for distributed and parallel data management, reflecting differing requirements such as scalability, availability, and node autonomy. The only precondition, and not a very restrictive one, is that data management is carried out cooperatively on multiple processors and computing nodes, which is already the case for a single server with several processors or cores. In this chapter we distinguish the main architecture classes together with their properties and variants; the concepts for realizing them are explored in depth in the subsequent chapters.

Before turning to the individual architectures, we first discuss which kinds of parallel processing need to be distinguished and, where possible, supported. We then present the classification criteria by which we categorize and characterize the various architectures. Next, we discuss in detail the three fundamental architectures for parallel DBS: Shared Everything, Shared Disk, and Shared Nothing. After that, we cover architectures with functionally specialized processors (including client/server DBS and options for hardware support) as well as alternatives for supporting heterogeneous databases. We also introduce the Shared Nothing platform Hadoop, which is widely used for analyzing large volumes of data (Big Data). Finally, we compare the architectures with respect to the requirements from the introductory chapter of this book.

Notes

1. If ad hoc queries are regarded as special transactions, then for them inter-query parallelism is equivalent to inter-transaction parallelism, and intra-transaction parallelism is equivalent to intra-query parallelism.
2. Data parallelism is also called horizontal parallelism; pipeline parallelism is also called vertical parallelism or dataflow parallelism.
3. With Shared Nothing, it must be ensured that the data of the partition affected by a node failure remains accessible, e.g. through replicated storage or, in the case of local distribution, through surviving nodes taking over the affected external storage.
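
The distinction drawn in note 2 can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function names, the row layout (dictionaries with hypothetical `id` and `v` fields), and the use of threads and queues are all assumptions chosen for illustration. In CPython the threads mainly show the structure of the two forms of parallelism rather than a real CPU speedup.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # end-of-stream marker passed between pipeline stages
_SKIP = object()  # returned by a stage function to drop a row


def data_parallel_select(partitions, predicate):
    """Data (horizontal) parallelism: the same selection operator runs
    concurrently on disjoint partitions; partial results are concatenated."""
    def select_partition(partition):
        return [row for row in partition if predicate(row)]

    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partial_results = pool.map(select_partition, partitions)
        return [row for part in partial_results for row in part]


def _stage(fn, inbox, outbox):
    """Run one operator as a pipeline stage: consume rows, emit results."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate end-of-stream downstream
            return
        out = fn(item)
        if out is not _SKIP:
            outbox.put(out)


def pipeline_select_project(rows, predicate, column):
    """Pipeline (vertical) parallelism: selection and projection are
    different operators that run concurrently, connected by queues."""
    q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    select = threading.Thread(
        target=_stage,
        args=(lambda r: r if predicate(r) else _SKIP, q_in, q_mid),
    )
    project = threading.Thread(
        target=_stage, args=(lambda r: r[column], q_mid, q_out)
    )
    select.start()
    project.start()

    for row in rows:  # the scan feeds the first stage
        q_in.put(row)
    q_in.put(_DONE)

    result = []
    while True:
        item = q_out.get()
        if item is _DONE:
            break
        result.append(item)
    select.join()
    project.join()
    return result


if __name__ == "__main__":
    partitions = [
        [{"id": 1, "v": 10}, {"id": 2, "v": 25}],
        [{"id": 3, "v": 30}, {"id": 4, "v": 5}],
    ]
    is_large = lambda row: row["v"] >= 20
    print(data_parallel_select(partitions, is_large))
    all_rows = [row for part in partitions for row in part]
    print(pipeline_select_project(all_rows, is_large, "id"))
```

In the first function the degree of parallelism grows with the number of partitions, whereas in the second it is bounded by the number of operators in the pipeline; a real system would typically combine both.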

Author information

Authors and Affiliations

Institut für Informatik, Universität Leipzig, Leipzig, Germany
Erhard Rahm
Fakultät für Informatik, Otto-von-Guericke Universität, Magdeburg, Saxony-Anhalt, Germany
Gunter Saake
Fakultät für Informatik und Automatisierung, Technische Universität Ilmenau, Ilmenau, Germany
Kai-Uwe Sattler

Corresponding author

Correspondence to Gunter Saake.

About this chapter

Cite this chapter

Rahm, E., Saake, G., Sattler, KU. (2015). Architekturen für verteiltes und paralleles Datenmanagement. In: Verteiltes und Paralleles Datenmanagement. eXamen.press. Springer Vieweg, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45242-0_3

DOI: https://doi.org/10.1007/978-3-642-45242-0_3
Published: 23 June 2015
Publisher Name: Springer Vieweg, Berlin, Heidelberg
Print ISBN: 978-3-642-45241-3
Online ISBN: 978-3-642-45242-0
eBook Packages: Computer Science and Engineering (German Language)
