1 Introduction

It is estimated that up to 30% of the total health care budget in Ireland is spent managing the large amounts of data generated, i.e. collecting, querying and storing data. Reports indicate that much of the healthcare data in Ireland, as in the US, exists in silos, is spread across fragmented processes and is accessible only to disparate stakeholders [11, 21]. Data that is digitized is rarely standardised from the perspective of data interoperability, meaning it does not adhere to any particular standardised terminology, schema or syntax. This affects data quality and data value in the longer term [14].

The Web of Data is an initiative to make data open and interconnected, stored and shared across the World Wide Web (footnote 1) using a well-established architecture, the semantic web stack [23]. Central to this is the concept of Linked Data (LD), a way of structuring and sharing data on the web based on the Resource Description Framework (RDF). By defining data models using semantic web technologies, it becomes possible to make data schemas available using standard web access mechanisms, e.g. HTTP. Once a data schema is described using an ontology (or as an RDF vocabulary) and published, it resides on the web, and any data described using LD can be associated with this ontology (or vocabulary) so that the semantics of the data are open and freely available to a global audience. Combined with SPARQL, an RDF query language, this is a powerful tool for sharing and re-using data and has the potential to provide improved support for interoperability in health and social care in Ireland. Already within the health care domain, organisations such as HL7 (footnote 2) are exploring the use of resource-based data models, as is evident in the Fast Healthcare Interoperability Resources (FHIR) [1, 5]. FHIR provides an architecture which aligns with the semantic web and can provide a sound basis for supporting data interoperability in the health domain.
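To make the triple model concrete, the following minimal sketch stores a few RDF-style statements as Python tuples and answers a single triple pattern, which is the basic building block of a SPARQL query. The `ex:` and `voc:` prefixes, the sample values and the `match` helper are assumptions made for illustration only; a real deployment would use an RDF store and a SPARQL endpoint rather than an in-memory list.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical data: two people described with an illustrative vocabulary.
TRIPLES = [
    ("ex:person1", "rdf:type", "voc:Person"),
    ("ex:person1", "voc:gender", "female"),
    ("ex:person1", "voc:birthDate", "1980-05-17"),
    ("ex:person2", "rdf:type", "voc:Person"),
    ("ex:person2", "voc:gender", "male"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching a pattern; None is a wildcard, like a SPARQL variable."""
    for triple in triples:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


# Roughly equivalent to: SELECT ?s WHERE { ?s rdf:type voc:Person }
persons = [s for s, _, _ in match(TRIPLES, p="rdf:type", o="voc:Person")]
```

Because every statement has the same three-part shape, data from different sources can be merged simply by pooling their triples, which is the property that makes this model attractive for interoperability.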

In this article we present an analysis of health care data in the Irish context [12], with a particular focus on demographic data, as this is well established within the health domain and already has several standard approaches to supporting its exchange, e.g. the ISO 13606 demographics package [6], OpenEHR [4] and FHIR (captured under Person and Patient) [1]. This analysis forms part of a methodology for supporting the uplift of data into the Web of Data. The article is structured as follows. First we describe our methodology for supporting semantic uplift; next we give a brief overview of some of the relevant health care standards for capturing demographic data; then we analyse health care data in the Irish context with a focus on demographic data. Finally, we make some recommendations for managing health care data that can be applied within the Irish context for sharing demographic data in accordance with the planned national eHealth Strategy. This is also one of two articles, the second providing an overview of where this work fits into the wider requirements for interoperability in health care in Ireland and internationally [15].

2 Methodology for Semantic Uplift

Semantic uplift is the conversion of structured or semi-structured data into Linked Data based upon semantic web technologies. Our process for supporting semantic uplift within the healthcare domain is based on a standard methodology for ontology development (footnote 3), which consists of defining the scope, reusing existing ontologies and vocabularies, enumerating terms, defining classes, properties and constraints, and finally creating instances (Fig. 1). Ontology development (chevrons 3–6 in Fig. 1) is only required where analysis determines that no existing vocabulary satisfies the data exchange requirements defined within the scoping stage, or to support the interlinking process where multiple ontologies have been found. This article specifically addresses the analysis of existing standards and ontologies (Sect. 3) and existing data schemas (Sect. 4).

Fig. 1. Overview of methodology for developing ontology
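The final step of the methodology, the creation of instances, can be sketched as follows: a flat record (as might come from a spreadsheet row) is turned into N-Triples statements. The `http://example.org/` namespaces, the field names and the `uplift` function are hypothetical and stand in for whichever published vocabulary the earlier steps select.

```python
from typing import Dict, List

BASE = "http://example.org/"          # hypothetical instance namespace
VOCAB = "http://example.org/vocab#"   # hypothetical vocabulary namespace


def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def uplift(record: Dict[str, str], id_field: str = "id") -> List[str]:
    """Convert one structured record into N-Triples lines, skipping empty fields."""
    subject = f"<{BASE}person/{record[id_field]}>"
    lines = []
    for field_name, value in record.items():
        if field_name == id_field or value == "":
            continue
        predicate = f"<{VOCAB}{field_name}>"
        lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


# Hypothetical source row; the empty "county" value produces no triple.
row = {"id": "42", "gender": "female", "birthDate": "1980-05-17", "county": ""}
ntriples = uplift(row)
```

In this sketch every value becomes a plain string literal; a fuller uplift would also assign datatypes (for example a date type for `birthDate`) and link to existing resources rather than repeating literals.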

3 Analysis of Demographic Data in Electronic Health Care Standards

Electronic Health Record (EHR) systems in Europe and internationally hold digitized records of patient and population health information that can be shared between health care settings [19]. Here we examine how several key international standards model demographic data.

3.1 ISO 13606

In the European Union, CEN’s Technical Committee TC 251 is a decision-making body for standardization that develops standards in the field of Health Information and Communications Technology (ICT). Its goal is interoperability between independent EHR systems, and towards this goal CEN TC 251 has produced a large number of health care standards for the harmonisation of systems and data [9]. Of particular relevance is ISO 13606-1 [2], which specifies a generic information model for communicating part or all of the EHR of a single identified subject of care between EHR systems, or between EHR systems and a centralized EHR data repository. ISO/EN 13606 is based on a dual model: a Reference Model (RM) for information, and an Archetype Object Model (AOM) for defining knowledge, i.e. the concepts of the clinical domain, by means of Archetypes. Archetypes are patterns that represent the specific characteristics of the clinical data. They are intended to be created and changed by domain experts, giving those experts control over how EHRs are built so that the records reflect their knowledge.

Several classes are available to represent how EHR content will be delivered to a recipient. These are: EHR_EXTRACT, a top-level container for a transaction between an EHR Provider and an EHR Recipient; FOLDER, a means of compartmentalizing within an EHR; COMPOSITION, information committed to an EHR by some agent; SECTION, a section within a composition; ENTRY, a result from a clinical action, e.g. an observation or test result; CLUSTER, a means to organise multiple entries, e.g. in a time series; and ELEMENT, containing a single data value. Built upon these classes, the ISO 13606 demographics package was developed to address three scenarios related to demographic data: 1) minimum identification to permit demographic matching between two systems; 2) a rich enough descriptor set to populate a recipient’s demographic system with enough to identify and contact persons or organisations; and 3) allowing the whole package to be optional if the exchange occurs inside a shared demographics realm [3]. It enables the modelling of values such as a person’s name, postal address, gender, birth date, birth order and deceased time.
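The containment structure described above can be sketched with nested data classes. This is a simplified illustration rather than the normative ISO 13606 model: FOLDER and CLUSTER are omitted for brevity, and the attribute names (`committer`, `heading`, `subject_of_care`) and sample values are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Element:
    """ELEMENT: holds a single data value."""
    name: str
    value: str


@dataclass
class Entry:
    """ENTRY: the result of a clinical action, e.g. an observation."""
    name: str
    elements: List[Element] = field(default_factory=list)


@dataclass
class Section:
    """SECTION: a section within a composition."""
    heading: str
    entries: List[Entry] = field(default_factory=list)


@dataclass
class Composition:
    """COMPOSITION: information committed to an EHR by some agent."""
    committer: str
    sections: List[Section] = field(default_factory=list)


@dataclass
class EhrExtract:
    """EHR_EXTRACT: top-level container for one transaction."""
    subject_of_care: str
    compositions: List[Composition] = field(default_factory=list)


def element_values(extract: EhrExtract) -> List[Tuple[str, str]]:
    """Flatten the containment tree into (element name, value) pairs."""
    return [
        (element.name, element.value)
        for composition in extract.compositions
        for section in composition.sections
        for entry in section.entries
        for element in entry.elements
    ]


# Hypothetical sample data.
extract = EhrExtract(
    subject_of_care="patient-001",
    compositions=[
        Composition(
            committer="clinician-17",
            sections=[
                Section(
                    heading="Vital signs",
                    entries=[Entry("Body weight", [Element("weight_kg", "72")])],
                )
            ],
        )
    ],
)
```

Calling `element_values(extract)` walks the tree from the extract down to the individual values, which is the path a receiving system would follow when unpacking a transaction.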

The main shortcoming of 13606-1 relates to its licensing, which is not open and therefore inhibits re-use. Its definition of interfaces is also minimal and would benefit from expansion [3]. A good candidate for this is SPARQL. More recent work has looked specifically at converting ISO 13606-1 into an OWL ontology, called OntoCR [16], although work on OntoCR appears to have ceased as of 2016 and the ontology is not publicly available.

3.2 OpenEHR

OpenEHR (footnote 4) is another standard for electronic health records which, unlike 13606-1, is open source, maintained by the openEHR foundation and HL7 (see Sect. 3.3). ISO 13606-1 forms the backbone of the openEHR reference model, which can be viewed as a super-set of the 13606 RM, and the archetype model in ISO “13606 Part 2: Archetypes” is similar to that published by openEHR. OpenEHR provides specifications for the management, storage and retrieval of an Electronic Patient Record (EPR), and not just the communication of EHR data [22]. Like ISO 13606, OpenEHR has two distinct levels of models: a high-level standardised RM with generic concepts, and a lower-level, more specific (clinical) model based upon archetypes, the archetype model (AM). Using this two-level approach, openEHR accounts for changes in clinical concepts by only requiring modifications to the archetypes, without having to change the reference model [18]. The types of archetypes are: demographic, composition, section, entry (which includes observation, instruction, action, evaluation and admin_entry), cluster and element.
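The two-level approach can be illustrated with a minimal sketch in which the reference-model level is a generic name-to-value record and the archetype level is a separate set of constraints applied to it. The constraint format, the `PERSON_ARCHETYPE` contents and the `validate` function are hypothetical; real openEHR archetypes are expressed in the Archetype Definition Language and are considerably richer.

```python
from typing import Dict, List

# Reference-model level: a generic, stable structure (element name -> value).
RmRecord = Dict[str, str]

# Archetype level: constraints for one concept (hypothetical format, not ADL).
PERSON_ARCHETYPE = {
    "name": {"required": True},
    "gender": {"required": False, "allowed": {"female", "male", "other", "unknown"}},
    "birth_date": {"required": False},
}


def validate(record: RmRecord, archetype: dict) -> List[str]:
    """Return a list of constraint violations; an empty list means the record conforms."""
    problems = []
    for name, rule in archetype.items():
        if name not in record:
            if rule["required"]:
                problems.append(f"missing required element: {name}")
            continue
        allowed = rule.get("allowed")
        if allowed is not None and record[name] not in allowed:
            problems.append(f"value not allowed for {name}: {record[name]}")
    for name in record:
        if name not in archetype:
            problems.append(f"element not defined in archetype: {name}")
    return problems


ok = validate({"name": "A. Example", "gender": "female"}, PERSON_ARCHETYPE)
bad = validate({"gender": "x", "eye_colour": "blue"}, PERSON_ARCHETYPE)
```

The point of the design is visible in the code: adding or changing a clinical concept means editing only the archetype dictionary, while the generic record type and the `validate` logic stay untouched.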

Demographic data (footnote 5) is influenced by clinical adaptations including the HL7v3 Reference Information Model (RIM) (Sect. 3.3). At the topmost level is the concept of “Party”. This is a superclass of Actor and Role. Actor in turn is a superclass of Agent, Group, Organisation and Person. A class “Party_Relationship” provides for the definition of relationships between parties. In addition to these classes, there also exist “Party_Identity”, Contact and Address.
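The class hierarchy just described can be sketched directly as Python classes. Only the class names come from the text; the attributes (`name`, `kind`, `source`, `target`), the sample parties and the `is_patient_of` helper are assumptions introduced to show how a relationship between two parties might express that a person is a patient of an organisation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Party:
    """Topmost concept in the demographic model."""
    name: str


class Actor(Party):
    pass


class Role(Party):
    pass


class Agent(Actor):
    pass


class Group(Actor):
    pass


class Organisation(Actor):
    pass


class Person(Actor):
    pass


@dataclass
class PartyRelationship:
    """Party_Relationship: a named, directed link between two parties."""
    kind: str
    source: Party
    target: Party


def is_patient_of(
    person: Person,
    organisation: Organisation,
    relationships: Iterable[PartyRelationship],
) -> bool:
    """True if a 'patient' relationship links this person to this organisation."""
    return any(
        r.kind == "patient" and r.source is person and r.target is organisation
        for r in relationships
    )


# Hypothetical parties and a single relationship between them.
person = Person("A. Example")
clinic = Organisation("Example Clinic")
relationships = [PartyRelationship("patient", person, clinic)]
```

Modelling Patient as a relationship rather than as a subclass of Person lets the same person be a patient of several organisations, or a practitioner in one and a patient in another, without duplicating the person record.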

To ensure interoperability, consistent use of archetypes between different openEHR systems is required. To this end, the openEHR community has provided the Clinical Knowledge Manager (CKM), an international repository (footnote 6) where clinicians can freely develop, manage, publish and use archetypes, and several countries have established Electronic Patient Record strategies involving openEHR (such as the UK, Norway and Australia) [22]. A criticism of openEHR is the complexity of its data models: the CKM, for example, has over 500 Archetypes available, designed to cover all possible data elements, and understanding these requires an investment of time and effort. The modelling of new archetypes is also a complex task in which benefits must be carefully weighed against costs [7]. To support greater re-use and continuous improvement of archetypes, work has been conducted on converting openEHR to OWL, and OWL versions of the standards are available. These include an OWL description of the openEHR demographic model (footnote 7). According to the author of this ontology, whom we contacted, it is no longer being actively developed and has not been applied to a specific use case.

3.3 HL7

HL7 is a family of standards developed by HL7 International, which is concerned with interoperability standards for health informatics. Within the Irish context, the Health Service Executive (HSE) Healthlink is a key resource for GP messaging standards using message files specified in HL7 [13]. HL7 consists of several different standards as well as a framework to develop these standards, the HL7 Development Framework (HDF). This is a framework of modelling and administrative processes, policies, and deliverables used to produce specifications that the healthcare information management community uses to overcome challenges and barriers to interoperability among computerized healthcare-related information systems. Standards developed within HL7 include the Clinical Document Architecture (CDA), based on the RIM, for the generation of EHR documents; HL7 v2, to support hospital workflows; and HL7 v3, which aims to support all healthcare workflows and, unlike HL7 v2, is based upon object-oriented principles.

A promising new standard is the HL7 Fast Healthcare Interoperability Resources (FHIR), developed by the HL7 FHIR working group [17]. While openEHR’s focus is on the data model and complete data models, FHIR is more concerned with information exchange and the description of the APIs, providing the option to extend information models as required. FHIR is a web-based standard and uses the REpresentational State Transfer (REST) architectural style. RESTful web services typically communicate over HTTP, and thus provide interoperability between computer systems on the Internet. Combined with a Linked Data module utilizing Semantic Web technologies, i.e. RDF and OWL, the semantic expression capability of FHIR can be expanded, facilitating inference and data linkage across datasets. Resources can be combined using Profiles to identify packages of data that address clinical and administrative needs. Profiles constrain what a particular application needs to communicate based on Resources and Extensions (self-defined data elements that are not part of the core set), i.e. only the data required for a specific purpose is sent. Examples of Profiles include referral of a patient, populating registries, adverse event reporting, ordering a medication, and providing data to a clinical decision support algorithm such as a risk assessment calculation. FHIR offers a promising method for supporting interoperability based upon RDF and ontologies [20]. The FHIR ontology (footnote 8) has well-defined and detailed descriptions related to demographics, such as the concept of a Person, with data properties including name, address, telecom, photo, birthDate and gender. It therefore provides a strong basis to support data interoperability within the Irish health services. In the next section we explore the representation of demographic data within the Irish health domain.
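As a rough illustration of the resource-oriented style, the sketch below builds a FHIR Patient resource as JSON, composes the URL of a RESTful read interaction, and checks a profile-like rule about which elements must be present. The element names (`name`, `gender`, `birthDate`, `telecom`, `address`) are standard Patient elements; the server address, the sample values, the `REFERRAL_REQUIRED` list and both helper functions are assumptions. Real FHIR profiles are published as StructureDefinition resources rather than as simple lists.

```python
import json
from typing import Dict, List

FHIR_BASE = "https://fhir.example.org/baseR4"  # hypothetical server address

# A minimal Patient resource with made-up demographic values.
patient = {
    "resourceType": "Patient",
    "id": "example-1",
    "name": [{"family": "Murphy", "given": ["Aoife"]}],
    "gender": "female",
    "birthDate": "1980-05-17",
    "telecom": [{"system": "phone", "value": "+353 1 000 0000"}],
    "address": [{"city": "Dublin", "country": "IE"}],
}


def read_url(base: str, resource_type: str, resource_id: str) -> str:
    """URL for a RESTful read: GET [base]/[type]/[id]."""
    return f"{base.rstrip('/')}/{resource_type}/{resource_id}"


def missing_elements(resource: Dict[str, object], required: List[str]) -> List[str]:
    """Names from `required` that are absent from the resource."""
    return [name for name in required if name not in resource]


# Hypothetical profile-like rule: what a referral must carry.
REFERRAL_REQUIRED = ["name", "gender", "birthDate"]

payload = json.dumps(patient, indent=2)
url = read_url(FHIR_BASE, "Patient", "example-1")
```

A client would send `payload` in the body of a create or update request and fetch the stored resource with an HTTP GET on `url`; no network call is made in this sketch.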

4 Analysis of Healthcare Collections in Ireland

In this section we analyse healthcare collections in Ireland with a focus on demographic data, as it is the best represented data category: 28 of the 75 collections explicitly mention that they cover demographic data, the largest number for any specific category. The consistent representation of demographic data is therefore essential to support interoperability between health services.

The analysis was done based upon the Health Information and Quality Authority (HIQA) Catalogue of national health and social care data collections [12]. HIQA is an independent authority established to drive high-quality and safe care for people using health and social care services in Ireland. HIQA’s role is to develop standards, inspect and review these services and support informed decisions on how services are delivered. Towards this goal, HIQA has published the “Catalogue of national health and social care data collections” (version 3). The aim of this third version of the Catalogue is to enable all stakeholders (including the general public, patients and service users, clinicians, researchers, and healthcare providers) to readily access information about health and social care data collections in Ireland. The catalogue consists of a comprehensive list of national health and social care data collections. These are national repositories of routinely collected health and social care data in the Republic of Ireland.

The catalogue lists 75 of these collections, and for each collection provides data in terms of title, managing organisation, description/summary, data providers, available data dictionaries, data content (i.e. a breakdown of the type of data collected) etc. In order to structure the analysis, HIQA’s National standard demographic dataset and guidance for use in health and social care settings in Ireland was used [10]. This provides guidelines on a set of concepts and properties for describing demographic data, such as those related to name, date of birth, contact details and address (see Fig. 2).

Fig. 2. HIQA Overview of Demographic Data

4.1 Methodology for Analysis of Health Catalogue

The methodology consisted of three main phases. In the first phase, the data dictionaries given in the data dictionary field were analysed. In the second phase, the different named concepts were extracted from the data content field. The third phase consisted of a harmonisation process to identify a set of classifications for the different concepts identified.

4.2 Results of Analysis

The analysis began with the data dictionaries associated with the collections. Of the 75 collections, 39 had “no”, “not available”, or “not available online” in this field. Of the remaining 36, 15 provided links (such as www.noca.ie) that offered no obvious way to access the data dictionary or required a password, 3 had broken links, and 5 mentioned resources that could not be located (e.g. “Under revision as part of HRB LINK project”), so these were discounted. The remaining 13 data dictionaries were mostly PDF documents, such as the Ambulatory Care Report (ACR), Cardiac First Responder (CFR), Patient Care Report (PCR) and Patient Treatment Register (PTR) standards, as well as EUROCAT (footnote 9) and heartwatch. The Irish Mental Health Care provides an Excel file.
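The counts reported above can be checked with a few lines of arithmetic. The figures are taken directly from the text; the variable and category names are ours.

```python
total_collections = 75
no_dictionary = 39  # "no", "not available" or "not available online"

# Collections that list some form of data dictionary.
listed = total_collections - no_dictionary

# Listed dictionaries that had to be discounted, by reason.
discounted = {
    "inaccessible or password protected": 15,
    "broken link": 3,
    "resource could not be located": 5,
}

# Dictionaries that could actually be analysed.
usable = listed - sum(discounted.values())
```

This gives 36 listed dictionaries and 13 usable ones, so roughly one collection in six has a data dictionary that an outside analyst can actually read.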

Secondly, the “data content” field of all 75 collections was analysed. A typical example of this type of data (without a corresponding data dictionary) is “Name, address, date of birth, gender, District Electoral Division (DED), HSE area, Local Health Office (LHO) area, task force area, date commenced on methadone, type of methadone treatment, prescribing doctor, dispensing clinic, date and reason for discontinuation of methadone, client photograph and client signature.”, although the range of data concepts covered is highly varied, reflecting the nature of the health services. From this analysis a matrix of collections against listed data concepts (such as name, HSE area and so on) was created, and a tick was given for a data concept if it was present in a collection. Due to the wide range of data concepts, a process of harmonisation took place to identify classes for data concepts, either taking the name of the data concept directly, e.g. “Name”, or deriving an appropriate class for a set of data concepts, e.g. “Patient” and “Person”, based on our analysis of openEHR and FHIR, and also the HIQA schema.
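The matrix and harmonisation steps can be sketched as follows. The collection names, their term lists and the `HARMONISED` mapping are hypothetical stand-ins (the terms echo the catalogue example quoted above); they are not the actual data or mapping used in the analysis.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Concept = Tuple[str, str]  # (class, property)

# Hypothetical "data content" fields for three collections.
COLLECTIONS: Dict[str, List[str]] = {
    "collection_a": ["Name", "address", "date of birth", "gender"],
    "collection_b": ["DOB", "Gender", "HSE area"],
    "collection_c": ["gender", "client photograph"],
}

# Harmonisation: raw term (lower-cased) -> (class, property). Assumed mapping.
HARMONISED: Dict[str, Concept] = {
    "name": ("Name", "name"),
    "address": ("Address", "address"),
    "date of birth": ("Person", "dob"),
    "dob": ("Person", "dob"),
    "gender": ("Person", "gender"),
    "client photograph": ("Person", "photo"),
}


def build_matrix(collections: Dict[str, List[str]]) -> Dict[str, Set[Concept]]:
    """For each collection, the set of harmonised concepts it mentions (the 'ticks')."""
    matrix = {}
    for collection, terms in collections.items():
        matrix[collection] = {
            HARMONISED[term.lower()] for term in terms if term.lower() in HARMONISED
        }
    return matrix


matrix = build_matrix(COLLECTIONS)

# Number of collections mentioning each concept, as plotted in a bar chart.
counts = Counter(concept for concepts in matrix.values() for concept in concepts)
```

Terms without a harmonised class, such as "HSE area" here, are simply left out of the matrix; in practice these are the terms that prompt the definition of further classes.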

Fig. 3. Number of “Person” related data properties identified across analysed health care catalogues.

Figure 3 gives the count for each concept explicitly referenced in the data content field, with gender and date of birth (dob) being the most represented. It should be noted that the data collections are expected to include more data related to the Person class than is counted here, as the catalogue entries often refer to “demographic data, for example...” and then list only one or two examples of the type of demographic data (this explains the high counts for gender and dob). Figure 4 shows the number of properties for both Name and Contact Details, both of which are classes directly related to Person. There are three Object Properties (associations) that relate Person to these classes.

Fig. 4. Number of “Name” and “Contact Details” related data properties identified across analysed health care catalogues

Through this process over 50 potential classes have been identified within the Irish health domain, including (not an exhaustive list): person, name, contact details, patient, patient infant, patient pregnant, disabled person, address, location, medical/clinical information/assessment, treatment, therapy, prescription, observations, test, results, diagnosis, event, incident, injury, death, paediatric mortality, service, procedure, operation, product, device, vehicle, vaccination/immunisations, disease, infection, staff, practitioner, admission, child admission unit, legal status, approved centre, etc. Each class has two or more data concepts taken from the analysis.

4.3 Demographic Data Alignments with Standards

Table 1 gives a very high-level overview of some of the data concepts and properties identified in the above standards with respect to demographics. As can be seen, the 13606 demographics package includes concepts such as Person, Postal Address, name, gender and birth time. It does not explicitly have a concept Patient, although it does have an “entity role relationship”, so the potential exists to model a patient relationship between person entities in a similar fashion to openEHR’s demographic information model, in which a Patient can be defined using the “Party_Relationship” held between a Person and an Organisation. The OpenEHR Person concept is more detailed than that of 13606-1, allowing a person to have a contact and also a “Party_Identity”, which can be broken down into elements for first name, last name, etc. The OpenEHR CKM provides more detailed archetypes for defining demographics (e.g. the DEMOGRAPHIC-CLUSTER archetypes), and these cover the concepts found in our analysis of the Irish health domain. Nonetheless, the concepts are scattered across the different archetypes. The FHIR ontology, on the other hand, covers all the required concepts, and each can be found explicitly defined in the ontology. Given the focus of FHIR on information exchange and its adherence to semantic web principles, we therefore believe FHIR is the most suitable approach for managing Irish healthcare data going forward.
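An alignment of this kind can be recorded and checked mechanically. In the sketch below, the target property names are the FHIR Person data properties listed in Sect. 3.3; the `ALIGNMENT` dictionary, including the assumption that "HSE area" has no direct counterpart, is hypothetical and does not reproduce Table 1.

```python
from typing import Dict, List, Optional, Set, Tuple

# FHIR Person data properties named in Sect. 3.3.
FHIR_PERSON_PROPERTIES: Set[str] = {
    "name", "address", "telecom", "photo", "birthDate", "gender",
}

# Hypothetical alignment: catalogue concept -> FHIR property (None = no match).
ALIGNMENT: Dict[str, Optional[str]] = {
    "name": "name",
    "date of birth": "birthDate",
    "gender": "gender",
    "address": "address",
    "contact details": "telecom",
    "client photograph": "photo",
    "HSE area": None,  # assumed to lack a direct counterpart
}


def split_alignment(
    alignment: Dict[str, Optional[str]], target_properties: Set[str]
) -> Tuple[Dict[str, str], List[str]]:
    """Separate concepts with a valid target property from those without one."""
    mapped = {c: p for c, p in alignment.items() if p in target_properties}
    unmapped = sorted(c for c, p in alignment.items() if p not in target_properties)
    return mapped, unmapped


mapped, unmapped = split_alignment(ALIGNMENT, FHIR_PERSON_PROPERTIES)
```

Concepts that end up in `unmapped` are the candidates for locally defined extensions, while everything in `mapped` can be exchanged using the core vocabulary as it stands.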

Table 1. Occurrence of demographic related concepts in standards

5 Conclusion

In this article we presented a review of standards relevant to the definition of demographic data in the health care domain in Europe and internationally, and identified the need for harmonisation to ensure data interoperability. Semantic web technologies such as RDF, OWL and SPARQL have been identified as prime candidates for supporting greater data interoperability, as demonstrated by the research efforts to convert existing resources and the move towards these technologies in HL7 FHIR.

An analysis was conducted over the Irish health catalogue provided by the Irish Health Information and Quality Authority (HIQA). This gives an overview of 75 collections (data sets maintained by different health services in Ireland) and provides information on each in terms of the types of data being collected. From this analysis, typical data concepts, i.e. those related to demographic data on patients (e.g. age, gender), have been identified, and these can each be directly mapped to the Patient and Person concepts modelled within the FHIR ontology. We therefore believe that FHIR is currently explicit enough to support interoperability of demographic data within the Irish health context. By extending vocabularies such as FHIR, additional data properties required within the Irish context can be provided while maintaining interoperability with the wider international community. This is an important step towards greater data interoperability for health services in Ireland.

This work is being conducted within the greater context of eHealth Ireland, with the coordination and collaboration of various stakeholders such as the eHealth ecosystem [8], which includes patients, providers, software vendors, legislators, and health information technology (IT) professionals. This is important to foster ownership, ensuring that health care data is viewed not solely as a commodity for profit but as a means to improve health care in Ireland. The next steps for this work are to examine a wider range of data concepts, beyond demographic data, and determine whether FHIR is suitable for managing these data resources, particularly within the context of the FAIRVASC project and the management of data related to the rare disease vasculitis.