The goals of Simple-ML are realized through a domain model (Fig. 1), semantic dataset profiles and the SDAW. We conduct the modeling in RDF, reusing existing vocabularies (e.g. dcat) where possible. The terms specific to Simple-ML are defined in the Simple-ML vocabulary, denoted using the sml prefix.
Domain Model: In Simple-ML, the domain model describes relevant concepts, their properties and relations in the specific application domain. The class sml:DomainModel represents the model of an application domain. The domain-specific concepts are modeled as instances of the class sml:DomainClass.
Dataset Profiles: A dataset profile is a formal representation of dataset characteristics, referred to as dataset profile features. Such features can belong to the general, qualitative, provenance, statistical, licensing and dynamics categories. In Simple-ML, the goal of the dataset profiles is to define the dataset characteristics required to facilitate SDAWs, including the information required for data materialization.
Dataset profile: A dataset profile is modeled as an instance of dcat:Dataset. General dataset profile features, as well as provenance and licensing features, are described using the DCMI Vocabulary (dcterms). Statistical dataset profile features (e.g. the number of instances) can be provided at the dataset and the attribute levels.
Dataset attributes: The attributes of the dcat:Dataset are modeled as instances of sml:Attribute. An attribute is described through its statistical characteristics at the instance level (e.g. the mean value sml:meanValue), along with the access information to the underlying data source (e.g. the column name in a relational database) to facilitate data access and materialization.
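As an illustration only, a dataset profile with one attribute could be written down as a handful of triples. The sketch below uses plain Python tuples with prefixed names rather than an RDF library. The resources under the ex: prefix and the properties sml:hasAttribute, sml:numberOfInstances and sml:columnName are hypothetical names chosen for this example; only dcat:Dataset, sml:Attribute and sml:meanValue are taken from the text.

```python
from statistics import mean

# Toy values of a single column; in practice they come from the data source.
speed_values = [10.0, 15.0, 20.0]

# Triples as (subject, predicate, object) tuples with prefixed names.
# ex:* resources, sml:hasAttribute, sml:numberOfInstances and sml:columnName
# are hypothetical and only serve this illustration.
triples = [
    ("ex:trips", "rdf:type", "dcat:Dataset"),
    ("ex:trips", "dcterms:title", '"Trips"'),
    ("ex:trips", "sml:numberOfInstances", str(len(speed_values))),
    ("ex:trips", "sml:hasAttribute", "ex:trips_speed"),
    ("ex:trips_speed", "rdf:type", "sml:Attribute"),
    ("ex:trips_speed", "sml:columnName", '"speed"'),
    ("ex:trips_speed", "sml:meanValue", str(mean(speed_values))),
]


def to_turtle_like(statements):
    """Render the triples one per line in a simplified Turtle-like syntax."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in statements)


profile_text = to_turtle_like(triples)
```

The statistical feature (the mean) sits on the attribute, while the number of instances sits on the dataset, mirroring the two levels described above.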
Dataset access: Simple-ML supports access to datasets through dedicated attributes that represent physical storage location and data format (e.g. sml:fileLocation and csvw:separator). Currently, relational databases (sml:Database) and text files (sml:TextFile) are supported.
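To make the role of the access information concrete, the following sketch materializes a text file from a description that carries sml:fileLocation and csvw:separator. It is an assumption about how such a description might be consumed, not the Simple-ML implementation; the dictionary layout, the function read_text_file and the sample file are hypothetical.

```python
import csv
import os
import tempfile


def read_text_file(access):
    """Read a delimited text file using the access description.

    `access` is a hypothetical dict keyed by the property names from the text.
    """
    with open(access["sml:fileLocation"], newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=access["csvw:separator"])
        header = next(reader)  # the first line holds the column names
        return [dict(zip(header, row)) for row in reader]


# Create a small sample file so that the sketch is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8", newline=""
) as tmp:
    tmp.write("id;speed\n1;10.0\n2;20.0\n")

access = {
    "rdf:type": "sml:TextFile",
    "sml:fileLocation": tmp.name,
    "csvw:separator": ";",
}

rows = read_text_file(access)
os.unlink(tmp.name)  # remove the sample file again
```

A relational source (sml:Database) would need connection details instead of a file location; those are not covered here.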
Mapping between the Dataset Profile and the Domain Model: Dataset attributes are mapped to the concepts in the domain model (sml:DomainClass) through the sml:Mapping class, as illustrated in Fig. 1. This mapping adds domain-specific semantic description to the dataset attributes and facilitates their use in the SDAWs. The class sml:Mapping provides two properties: sml:mapsToProperty to map a dataset attribute to a property in the domain model, and sml:mapsToDomain to specify the rdfs:domain of this property, which is an instance of sml:DomainClass.
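The mapping can be pictured as a small lookup over triples: given a dataset attribute, find its sml:Mapping instances and read off the mapped property and its domain class. The sketch below assumes a hypothetical property ex:forAttribute that connects a mapping to the attribute it describes, since the text does not name that link; all ex: resources are likewise invented for illustration.

```python
# Triples as (subject, predicate, object) tuples with prefixed names.
# ex:forAttribute and all ex:* resources are hypothetical.
TRIPLES = {
    ("ex:Trip", "rdf:type", "sml:DomainClass"),
    ("ex:averageSpeed", "rdfs:domain", "ex:Trip"),
    ("ex:speedMapping", "rdf:type", "sml:Mapping"),
    ("ex:speedMapping", "ex:forAttribute", "ex:trips_speed"),
    ("ex:speedMapping", "sml:mapsToProperty", "ex:averageSpeed"),
    ("ex:speedMapping", "sml:mapsToDomain", "ex:Trip"),
}


def objects(subject, predicate):
    """Return all objects of the given subject and predicate, sorted."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def semantic_description(attribute):
    """Return (property, domain class) pairs the attribute is mapped to."""
    mappings = sorted(
        s for s, p, o in TRIPLES if p == "ex:forAttribute" and o == attribute
    )
    pairs = []
    for mapping in mappings:
        for prop in objects(mapping, "sml:mapsToProperty"):
            for domain in objects(mapping, "sml:mapsToDomain"):
                pairs.append((prop, domain))
    return pairs


description = semantic_description("ex:trips_speed")
```

An attribute without a mapping yields an empty list, which a workflow could treat as "no domain-specific semantics available".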
Data Catalog: Dataset profiles are organized in a domain-specific data catalog. The extensible Simple-ML data catalog is modeled as an instance of dcat:Catalog. The data catalog schema, including the representations of dataset profiles and the mapping to the domain model, is illustrated in Fig. 2.