Towards Dynamic Data Placement for Polystore Ingestion

  • Jiang Du
  • John Meehan
  • Nesime Tatbul
  • Stan Zdonik
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 337)

Abstract

Integrating low-latency data streaming into data warehouse architectures has become an important enhancement to support modern data warehousing applications. In these architectures, heterogeneous workloads that mix data ingestion and analytical queries must be executed with strict performance guarantees. Furthermore, the data warehouse may consist of multiple different types of storage engines (a.k.a. polystores or multi-stores). A paramount problem is data placement: different workload scenarios call for different data placement designs. Moreover, workload conditions change frequently. In this paper, we provide evidence that a dynamic, workload-driven approach is needed for data placement in polystores with low-latency data ingestion support. We study the problem based on the characteristics of the TPC-DI benchmark in the context of an abbreviated polystore that consists of S-Store and Postgres.
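The argument that no single static placement suits a changing workload can be illustrated with a toy cost model. The sketch below is a hypothetical illustration, not the authors' method: it assumes that data lives entirely in one engine per time window, that each engine has a fixed per-operation cost for ingestion and for analytical queries (the `COSTS` numbers are invented, not measurements from the paper), that moving data between engines incurs a flat `migration_cost`, and that each window's operation mix is known when the window starts (a real system would have to predict it). The names `Window`, `window_cost`, `best_engine`, `static_cost`, and `dynamic_cost` are hypothetical helpers introduced only for this example.

```python
from dataclasses import dataclass

# Hypothetical per-operation costs in arbitrary units (NOT measured in the paper).
# Assumption: S-Store is cheap for ingestion, Postgres is cheap for analytics.
COSTS = {
    "S-Store": {"ingest": 1.0, "query": 5.0},
    "Postgres": {"ingest": 4.0, "query": 1.0},
}


@dataclass
class Window:
    """Operation counts observed in one (hypothetical) workload window."""
    ingests: int
    queries: int


def window_cost(engine: str, w: Window) -> float:
    """Estimated cost of serving window w with all data placed in `engine`."""
    c = COSTS[engine]
    return c["ingest"] * w.ingests + c["query"] * w.queries


def best_engine(w: Window) -> str:
    """Engine with the lowest estimated cost for a single window."""
    return min(COSTS, key=lambda e: window_cost(e, w))


def static_cost(engine: str, windows: list[Window]) -> float:
    """Total cost if data stays in one engine for the whole workload."""
    return sum(window_cost(engine, w) for w in windows)


def dynamic_cost(windows: list[Window], migration_cost: float = 0.0) -> float:
    """Total cost if placement is re-chosen greedily per window,
    paying `migration_cost` whenever the chosen engine changes."""
    total = 0.0
    current = None
    for w in windows:
        choice = best_engine(w)
        if current is not None and choice != current:
            total += migration_cost
        current = choice
        total += window_cost(choice, w)
    return total


# An ingestion-heavy phase followed by a query-heavy phase.
windows = [Window(100, 10), Window(100, 10), Window(10, 100), Window(10, 100)]

if __name__ == "__main__":
    for name in COSTS:
        print(name, static_cost(name, windows))
    print("dynamic", dynamic_cost(windows, migration_cost=200.0))
```

With these invented numbers, keeping everything in S-Store costs 1320.0 and keeping everything in Postgres costs 1100.0, while switching engines once between the two phases costs 780.0 even after paying the migration charge. The greedy per-window rule is a simplification: with a large enough migration cost or a workload that oscillates quickly, it can do worse than a static placement, which is one reason a practical dynamic approach needs to weigh migration cost against predicted workload shifts.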

Notes

Acknowledgments

We thank Renee J. Miller and Boris Glavic for reviewing the work. We also thank the anonymous reviewers and the BIRTE 2017 workshop attendees for their helpful suggestions. This research is funded in part by a Bell Canada Fellowship, NSERC, the Intel Science and Technology Center for Big Data, and the NSF under grant NSF IIS-1111423.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Jiang Du (1)
  • John Meehan (2)
  • Nesime Tatbul (3)
  • Stan Zdonik (2)
  1. University of Toronto, Toronto, Canada
  2. Brown University, Providence, USA
  3. Intel Labs and MIT, Cambridge, USA

Personalised recommendations