An Extensible Metadata Framework for Data Quality Assessment of Composite Structures

  • Conference paper
In: Data Warehousing and Knowledge Discovery (DaWaK 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4654)

Abstract

Data quality is a critical issue in both operational databases and data warehouse systems. Data quality assessment is a strong requirement for the ETL subsystem, since bad data may destroy the credibility of a data warehouse. Over the last two decades, research and development in the data quality field have produced techniques for data profiling and cleaning, which focus on detecting and correcting bad values in data. Little effort has been devoted to data quality as it relates to the well-formedness of coarse-grained data structures that result from assembling linked data records. This paper proposes a metadata model that supports the structural validation of linked data records from a data quality point of view. The metamodel is built on top of the CWM standard and supports the specification of data structure quality rules at a high level of abstraction, as well as by means of very specific, fine-grained business rules.
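
To make the idea of a structural quality rule over linked records concrete, here is a minimal sketch. It is not the paper's metamodel and uses neither CWM nor OCL. The names `Record`, `StructureRule`, `min_children`, and `assess`, and the order/order-line data, are all hypothetical and chosen only for illustration. It pairs one high-level rule (a parent must have at least one linked child) with one fine-grained business rule (line amounts must add up to the order total).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Record:
    """A single data record; `children` holds the records linked to it."""
    kind: str
    values: dict
    children: list["Record"] = field(default_factory=list)


@dataclass
class StructureRule:
    """Hypothetical structural quality rule applied to one kind of parent record."""
    name: str
    parent_kind: str
    check: Callable[[Record], bool]


def min_children(child_kind: str, minimum: int) -> Callable[[Record], bool]:
    """High-level rule: the parent must link to at least `minimum` children of a kind."""
    def check(parent: Record) -> bool:
        count = sum(1 for child in parent.children if child.kind == child_kind)
        return count >= minimum
    return check


def assess(records: list[Record], rules: list[StructureRule]) -> list[tuple[str, dict]]:
    """Return (rule name, parent values) for every rule a composite structure violates."""
    violations = []
    for record in records:
        for rule in rules:
            if record.kind == rule.parent_kind and not rule.check(record):
                violations.append((rule.name, record.values))
    return violations


rules = [
    StructureRule("order_has_lines", "order", min_children("order_line", 1)),
    # Fine-grained business rule: line amounts must add up to the order total.
    StructureRule(
        "lines_match_total",
        "order",
        lambda order: sum(c.values["amount"] for c in order.children) == order.values["total"],
    ),
]

# Illustrative data only: order 1 is well-formed, order 2 has no lines.
orders = [
    Record("order", {"id": 1, "total": 30}, [
        Record("order_line", {"amount": 10}),
        Record("order_line", {"amount": 20}),
    ]),
    Record("order", {"id": 2, "total": 15}, []),
]

violations = assess(orders, rules)
```

Here the individual values in order 2 are all plausible on their own; only the assembled structure is malformed, which is the kind of defect that value-level profiling and cleaning would not flag.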

Editor information

Il Yeal Song, Johann Eder, Tho Manh Nguyen

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Farinha, J., Trigueiros, M.J. (2007). An Extensible Metadata Framework for Data Quality Assessment of Composite Structures. In: Song, I.Y., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2007. Lecture Notes in Computer Science, vol 4654. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74553-2_4

  • DOI: https://doi.org/10.1007/978-3-540-74553-2_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74552-5

  • Online ISBN: 978-3-540-74553-2

  • eBook Packages: Computer Science, Computer Science (R0)
