Abstract
Model-Based Systems Engineering (MBSE) tools need the physical properties of the objects they model, but these properties are often provided only in data sheets, which are not truly machine-readable. Previously, we proposed a product data hub for exchanging spacecraft product information between manufacturers and various MBSE tools. However, problems with the heterogeneous structure and semantics of this information, such as differences in data formats and vocabularies, persist. Maintaining product descriptions in ontologies can mitigate this heterogeneity, because ontologies provide semantic descriptions and support different vocabularies for a single concept. To extract information automatically and semantically from documents that contain tables, lists, and text, we developed an ontology-based information extraction tool. We present how to use the Data Sheets Annotation Tool (DSAT) to extract information from data sheets, either manually or automatically, and to populate a database with the extracted data. In particular, we emphasize the use of DSAT as a user interface for improving ontologies, which, in turn, are used to improve information extraction from the data sheets. Although DSAT was initially created to support collaborative systems engineering, it is not limited to the domain of spacecraft design. It can also be applied to other domains in which information needs to be extracted from a multitude of heterogeneous sources.
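The idea that an ontology lets several vocabularies resolve to a single concept can be illustrated with a minimal Python sketch. It is not DSAT's implementation. The `ONTOLOGY` dictionary, its concept names and label variants, the `LINE_PATTERN` regular expression, and the helpers `find_concept` and `extract` are all hypothetical and assume data sheet lines of the simple form "label: value unit".

```python
import re

# Hypothetical mini-ontology (an assumption, not taken from the paper):
# each concept lists the label variants different manufacturers might use.
ONTOLOGY = {
    "mass": {"mass", "weight"},
    "supply_voltage": {"supply voltage", "input voltage", "operating voltage"},
}

# Assumed line layout: "<label>: <number> <unit>", e.g. "Weight: 1.2 kg".
LINE_PATTERN = re.compile(
    r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>\S+)\s*$"
)


def find_concept(label):
    """Return the concept whose label set contains `label`, or None."""
    normalized = label.strip().lower()
    for concept, labels in ONTOLOGY.items():
        if normalized in labels:
            return concept
    return None


def extract(lines):
    """Map recognized data sheet lines to {concept: (value, unit)}."""
    record = {}
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match is None:
            continue  # line does not follow the assumed layout
        concept = find_concept(match.group("label"))
        if concept is not None:
            record[concept] = (float(match.group("value")), match.group("unit"))
    return record


# Two made-up data sheets that use different vocabularies.
sheet_a = ["Weight: 1.2 kg", "Input Voltage: 28 V", "Colour: black"]
sheet_b = ["Mass: 0.85 kg", "Supply voltage: 5 V"]
print(extract(sheet_a))
print(extract(sheet_b))
```

Both made-up sheets yield records keyed by the same two concepts, although one says "Weight" and the other "Mass". In this sketch, adding a label variant to `ONTOLOGY` is the only change needed to recognize a new manufacturer's wording, which loosely mirrors how improving the ontology can improve extraction.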
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Opasjumruskit, K., Schindler, S., Peters, D. (2021). Automatic Data Sheet Information Extraction for Supporting Model-Based Systems Engineering. In: Luo, Y. (ed.) Cooperative Design, Visualization, and Engineering. CDVE 2021. Lecture Notes in Computer Science, vol 12983. Springer, Cham. https://doi.org/10.1007/978-3-030-88207-5_10
DOI: https://doi.org/10.1007/978-3-030-88207-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88206-8
Online ISBN: 978-3-030-88207-5
eBook Packages: Computer Science, Computer Science (R0)