DSD: The Data Source Description Vocabulary

  • Conference paper
Database and Expert Systems Applications - DEXA 2023 Workshops (DEXA 2023)

Abstract

Training machine learning models, especially in producing enterprises with numerous information systems having different data structures, requires efficient data access. Hence, standardized descriptions of data sources and their data structures are a fundamental requirement. We therefore introduce version 4.0 of the Data Source Description Vocabulary (DSD), which represents a data source in a standardized form using an ontology. We present several real-world applications where the DSD vocabulary has been applied in recent years to demonstrate its relevance. An evaluation against the FAIR principles highlights the scientific quality and potential for reuse of the DSD vocabulary.
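To make the idea of describing a data source with an ontology concrete, the following is a minimal sketch, not the authors' implementation. It emits RDF triples for one relational source using only the Python standard library. The namespace IRI comes from the paper's first note. The trailing `#`, the term names (`Datasource`, `Concept`, `Attribute`, `hasConcept`, `hasAttribute`), and the `example.org` base IRI are illustrative assumptions and should be checked against the published vocabulary before use.

```python
# Namespace IRI from the paper's note 1; the trailing "#" is an assumption.
DSD = "https://w3id.org/dsd#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/crm/"  # hypothetical base IRI for the described source


def describe_source(name, tables):
    """Return (subject, predicate, object) IRI triples for one data source.

    `tables` maps a table name to a list of its column names.
    All DSD term names used here are assumed, not taken from the specification.
    """
    source = EX + name
    triples = [(source, RDF_TYPE, DSD + "Datasource")]
    for table, columns in tables.items():
        concept = f"{source}/{table}"
        triples.append((concept, RDF_TYPE, DSD + "Concept"))
        triples.append((source, DSD + "hasConcept", concept))
        for column in columns:
            attribute = f"{concept}/{column}"
            triples.append((attribute, RDF_TYPE, DSD + "Attribute"))
            triples.append((concept, DSD + "hasAttribute", attribute))
    return triples


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = describe_source("sales", {"customer": ["id", "name"]})
print(to_ntriples(triples))
```

Because every source, table, and column becomes an IRI-identified resource, descriptions of differently structured systems can be queried uniformly, which is the kind of standardized access the abstract motivates.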

Notes

  1. Available online: IRI: https://w3id.org/dsd; DOI: https://doi.org/10.5281/zenodo.7773861.

  2. https://www.eclipse.org/modeling/emf/.

  3. https://www.w3.org/TR/owl2-overview/.

  4. http://www.w3.org/ns/dcat#.

  5. http://rdfs.org/ns/void#.

  6. http://www.w3.org/ns/csvw#.

  7. http://purl.org/linked-data/cube#.

  8. https://www.w3.org/TR/REC-xml/#dt-doctype.

  9. https://www.w3.org/TR/xmlschema-0/.

  10. See the “connectors” Java package in https://github.com/lisehr/dq-meerkat.

  11. https://w3id.org/foops/.

  12. The FAIR principles and the corresponding descriptions in the leftmost column of Table 1 are directly taken from the GO-FAIR website (https://www.go-fair.org/fair-principles/).

  13. http://www.w3.org/ns/dqv#.

Acknowledgements

This research has been partially funded by BMK, BMAW, and the State of Upper Austria within the SCCH competence center INTEGRATE (FFG grant no. 892418), which is part of the FFG COMET Competence Centers for Excellent Technologies Programme, and by the “ICT of the Future” project QuanTD (no. 898626).

Author information

Corresponding authors

Correspondence to Lisa Ehrlinger or Johannes Schrott.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ehrlinger, L., Schrott, J., Wöß, W. (2023). DSD: The Data Source Description Vocabulary. In: Kotsis, G., et al. Database and Expert Systems Applications - DEXA 2023 Workshops. DEXA 2023. Communications in Computer and Information Science, vol 1872. Springer, Cham. https://doi.org/10.1007/978-3-031-39689-2_1

  • DOI: https://doi.org/10.1007/978-3-031-39689-2_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-39688-5

  • Online ISBN: 978-3-031-39689-2

  • eBook Packages: Computer Science (R0)
