Midas: Towards an Interactive Data Catalog

  • Patrick Holl
  • Kevin Gossling
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11721)

Abstract

This paper presents ongoing work on the Midas polystore system. The system combines data cataloging features with ad-hoc query capabilities and is specifically tailored to support agile data science teams that have to handle large datasets in a heterogeneous data landscape. Midas consists of a distributed SQL-based query engine and a web application for managing and virtualizing datasets. It differs from prior systems in its ability to provide attribute-level lineage using graph-based virtualization, sophisticated metadata management, and query offloading on virtualized datasets.

Keywords

Polystore, Data catalog, Metadata management

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Technical University of Munich, Garching b. Muenchen, Germany
