D’MART: A Tool for Building and Populating Data Warehouse Model from Existing Reports and Tables

  • Sumit Negi
  • Manish A. Bhide
  • Vishal S. Batra
  • Mukesh K. Mohania
  • Sunil Bajpai
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7418)

Abstract

As companies grow (organically or inorganically), Data Administration (i.e., Stage 5 of Nolan’s IT growth model) becomes the next logical step in their IT evolution. Designing a data warehouse model, especially in the presence of legacy systems, is a challenging task. Considerable time and effort go into understanding the existing data requirements and performing dimensional and fact modeling. The problem is exacerbated when enterprises outsource their IT needs to external vendors, because no individual then has a complete, in-depth view of the existing data setup. In such settings, a tool that helps build a data warehouse model from existing data models, with minimal impact on the business, can be of immense value. In this paper we present the D’MART tool, which addresses this problem. D’MART analyzes the existing data model of the enterprise and proposes alternatives for building the new data warehouse model. It models the problem of identifying the Fact/Dimension attributes of a warehouse model as a graph cut on a Dependency Analysis Graph (DAG), which is built from the existing data models and the BI report generation (SQL) scripts. D’MART also uses the DAG to generate ETL scripts that populate the newly proposed data warehouse from data in the existing schemas. D’MART was developed and validated as part of an engagement with Indian Railways, which operates one of the largest rail networks in the world.
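The abstract does not spell out how the graph cut is formulated, so the following is only a generic sketch of what a minimum cut on a small dependency graph looks like. It uses the Edmonds-Karp max-flow algorithm, chosen here for illustration. The node names, the edge weights, and the choice of `base` and `report` as the two terminals are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict, deque


def min_cut(capacity, source, sink):
    """Return (cut_value, source_side) for a minimum source-sink cut.

    Uses Edmonds-Karp (BFS augmenting paths). `capacity` maps each node
    to a dict of {neighbour: non-negative edge weight}.
    """
    # residual[u][v] is the remaining capacity on edge u -> v.
    residual = defaultdict(lambda: defaultdict(int))
    for u, neighbours in capacity.items():
        for v, c in neighbours.items():
            residual[u][v] += c
            residual[v][u] += 0  # make sure the reverse edge exists

    def reachable_from_source():
        # BFS over edges with spare capacity; returns a parent map.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            # No augmenting path left: the reachable nodes form the
            # source side of a minimum cut.
            return flow, set(parent)
        # Find the bottleneck along the path, then push flow along it.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical dependency graph: weights stand in for how strongly one
# attribute depends on another (an assumption made for this sketch).
capacity = {
    "base": {"fare": 3, "station": 2},
    "fare": {"revenue": 2},
    "station": {"revenue": 1, "report": 1},
    "revenue": {"report": 4},
}

cut_value, source_side = min_cut(capacity, "base", "report")
print(cut_value, sorted(source_side))
```

The cut splits the nodes into two groups while minimizing the total weight of the edges that cross between them. How such groups would map onto Fact and Dimension attributes depends on the paper's own formulation, which the abstract does not give.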

Keywords

Dimension Table; Fact Table; Base Table; Star Schema; Data Mart
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Sumit Negi (1)
  • Manish A. Bhide (2)
  • Vishal S. Batra (1)
  • Mukesh K. Mohania (1)
  • Sunil Bajpai (3)

  1. IBM Research, New Delhi, India
  2. IBM Software Group, Hyderabad, India
  3. Center for Railway Information System, Indian Railways, Delhi, India
