1 Introduction

Digitization or digital transformation, financial technology (FinTech), and insurance technology (InsuranceTech) are rapidly transforming the financial and insurance services industry [1, 2]. Although it is evident that the entire financial sector is in a moment of new opportunities and visible, tangible growth, it is also apparent that this transformation is driven by FinTech and InsuranceTech enterprises, which are heavily disrupting traditional business models; the volume of relevant investments is proof: over $23 billion of venture capital and growth equity was allocated to FinTech innovations during 2011–2014, with $12.2 billion deployed in 2014 alone [3]. Moreover, a recent McKinsey & Co study revealed that the number of FinTech start-ups exceeded 2,000 in 2016, up from approximately 800 in 2015 [4]. Furthermore, most global banks and investment firms have already planned to increase their FinTech/InsuranceTech investments, expecting an average return of 20% on those investments. Beyond FinTech/InsuranceTech, financial institutions and insurance organizations are also heavily investing in their digital transformation as a means of improving the efficiency of their business processes and optimizing their decision making.

Traditionally, the financial and insurance services sectors, and particularly the banking sector, have been quite resistant to technology disruption. This is no longer the case amid the current trend of digitizing society and its services and applications. The waves of the digital economy and unified markets demand that new paradigms be designed, implemented, and deployed. The vast majority of services and applications developed for the finance and insurance sectors are data-intensive. This transformation holds for applications in different areas such as retail banking, corporate banking, payments, investment banking, capital markets, insurance services, financial services security, and more. These applications leverage very large datasets from legacy banking systems (e.g., customer accounts, customer transactions, investment portfolio data), which they combine with other data sources such as financial market data, regulatory datasets, social media data, real-time retail transactions, and more. Disruptive innovation in finance and insurance is already possible today. For example, with the advent of Internet-of-Things (IoT) devices and applications (e.g., Fitbits, smartphones, smart home devices), several FinTech/InsuranceTech applications can take advantage of contextual data associated with finance and insurance services to offer a better quality of service at a more competitive cost (e.g., personalized healthcare insurance based on medical devices and improved car insurance based on connected car sensors). Furthermore, alternative data sources (e.g., social media and online news) provide opportunities for new, more automated, personalized, and accurate services. Moreover, recent advances in data storage and processing technologies (including advances in Artificial Intelligence (AI) and blockchain technologies) provide new opportunities for exploiting the above-listed massive datasets, and they are stimulating more investments in digital finance/insurance services [5].

Financial and insurance organizations can take advantage of Big Data, IoT, and AI technologies to improve the accuracy and cost-effectiveness of their services and the overall value they provide to their corporate and retail customers. Nevertheless, despite early data space deployments, there are still many challenges that have to be overcome before leveraging the full potential of Big Data/IoT/AI in the finance and insurance sectors, which could also act as a catalyst for attracting more investments and for significantly improving the competitiveness of enterprises in these sectors [6].

This book chapter analyzes the basis of data space design and best practices for data interoperability, introducing concepts and illustrating how to enable information interoperability through a methodological approach that formalizes and represents financial data using semantic technologies and information models (knowledge engineering) [7]. This chapter also focuses on the role that semantic technologies such as Linked Data and information interoperability play in supporting the financial and insurance industries in their process of digital transformation.

The organization of this chapter is as follows: Section 2 presents challenges in the data space domain in terms of interoperability in the financial and insurance sectors, where information exchange occurs to support the creation and delivery of Big Data, IoT, and AI-enabled services. Section 3 introduces best practices for data exchange, organized into several parallel development streams; these streams facilitate information interoperability and act as a baseline supporting the information interoperability approach. Section 4 introduces the INFINITECH Way, a design, implementation, and deployment methodology to support FinTech Data Spaces. Section 5 presents the current state of the art and motivations for using semantic technologies for convergence and interoperability. Section 6 discusses scalability and security considerations, including the management of Linked Data and its benefits when used in the financial and insurance sectors. Section 7 presents the summary, and finally, the relevant references used in this chapter are listed.

2 Challenges in Data Space Design

Many of the challenges present in current Data Spaces [8] and information management systems arise from data sharing and exchange, both of which are data interoperability problems. The persistent challenges blocking progress in data space design and deployment are as follows.

2.1 Data Fragmentation and Interoperability Barriers

Nowadays, most of the data collected and possessed by financial organizations reside in a wide array of “siloed” (i.e., fragmented) systems and databases, including operational systems and OLTP (online transaction processing) databases, OLAP (online analytical processing) databases and data warehouses, data lakes (e.g., Hadoop-based systems) with raw data (including alternative data like social media), and others. In this fragmented landscape, heavy analytical queries are usually performed over OLAP systems, which leads financial organizations to transfer data from OLTP, data lakes, and other systems to OLAP systems based on intrusive and expensive extract-transform-load (ETL) processes.

In several cases, ETL processes consume 75–80% of the budget allocated to data analytics, while still falling short of providing seamless interoperability across different data systems using up-to-date data. Beyond the lack of integrated OLTP and OLAP processes, financial/insurance organizations have no unified way of accessing and querying vast amounts of structured, unstructured, and semi-structured data (i.e., as part of SQL and NoSQL databases), which increases the effort and cost associated with the development of Big Data analytics and AI systems. Moreover, beyond data fragmentation, there is a lack of semantic interoperability across diverse datasets that refer to the same data entities with similar (yet different) semantics. This is a setback to sharing datasets across various stakeholders and to enabling more connected applications that span multiple systems across the financial supply chain.
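To make the fragmentation problem concrete, the following minimal sketch shows how records from two siloed sources (an OLTP export and a data-lake document) can be mapped onto one canonical entity before analytics; the source names, field names, and schema are illustrative assumptions, not part of any specific INFINITECH component.

```python
from dataclasses import dataclass

# Canonical transaction entity shared across silos (illustrative schema).
@dataclass
class CanonicalTransaction:
    customer_id: str
    amount_eur: float
    channel: str

def from_oltp_row(row: dict) -> CanonicalTransaction:
    # Hypothetical OLTP core-banking export: amounts already in euros.
    return CanonicalTransaction(customer_id=row["cust_no"],
                                amount_eur=float(row["amt"]),
                                channel="branch")

def from_datalake_doc(doc: dict) -> CanonicalTransaction:
    # Hypothetical data-lake document from a mobile-payments feed: amounts in cents, nested ids.
    return CanonicalTransaction(customer_id=doc["payer"]["id"],
                                amount_eur=doc["amount_cents"] / 100.0,
                                channel="mobile")

if __name__ == "__main__":
    unified = [
        from_oltp_row({"cust_no": "CUST-001", "amt": "120.50"}),
        from_datalake_doc({"payer": {"id": "CUST-001"}, "amount_cents": 4999}),
    ]
    # A single analytical view over previously siloed records.
    print(sum(t.amount_eur for t in unified))
```

In practice, such mappings are maintained per source and are exactly what a shared semantic model aims to replace with machine-readable alignments.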

2.2 Limitations for Cost-Effective Real-Time Analytics

Most of the existing applications operate over offline collections of large datasets based on ETL operations and fail to fully exploit the potential of real-time analytics, which is a prerequisite for a transition from reactive decisions (e.g., what to do following the detection of fraud) to proactive and predictive ones (e.g., how to avoid an anticipated fraud incident). Also, state-of-the-art near-real-time applications tend to be expensive, as they have to persist large amounts of data in memory. Moreover, existing engines for real-time analytics (e.g., state-of-the-art streaming engines with stateless parallelization) have limitations when it comes to executing complex data mining tasks such as AI (deep learning-based) algorithms.
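As an illustration of the kind of lightweight, window-based processing that real-time analytics builds on, the sketch below scores each incoming payment against a rolling window of recent amounts. It is a simplified stand-in for a streaming engine, not a production component; the class name and scoring logic are assumptions for illustration.

```python
from collections import deque
from statistics import mean, pstdev

class SlidingWindowScorer:
    """Scores each incoming transaction amount against a rolling window of recent amounts."""

    def __init__(self, window_size: int = 100):
        self.window = deque(maxlen=window_size)

    def score(self, amount: float) -> float:
        # z-score of the new amount relative to the recent window; higher = more anomalous.
        if len(self.window) < 2:
            self.window.append(amount)
            return 0.0
        mu, sigma = mean(self.window), pstdev(self.window)
        self.window.append(amount)
        return 0.0 if sigma == 0 else abs(amount - mu) / sigma

if __name__ == "__main__":
    scorer = SlidingWindowScorer(window_size=50)
    for amount in [20.0, 25.0, 19.5, 22.0, 980.0]:  # the last payment stands out
        print(amount, round(scorer.score(amount), 2))
```

A real deployment would distribute such state across a streaming engine, which is precisely where the stateless-parallelization limitation mentioned above becomes relevant.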

2.3 Regulatory Barriers

Big Data and IoT deployments must respect a complex and volatile regulatory environment. In particular, they must adhere to a range of complex regulations (e.g., PSD2 (Second Payment Services Directive), MiFID II/MiFIR (Markets in Financial Instruments Directive/Regulation), and 4MLD (fourth EU Money Laundering Directive) for finance/insurance) while at the same time complying with general regulations such as the GDPR (General Data Protection Regulation) and the ePrivacy Directive. To this end, several RegTech initiatives aim at establishing regulatory sandboxes (e.g., [9,10,11]), i.e., specialized environments that facilitate Big Data/IoT experimentation by ensuring that data access and processing are in line with applicable laws and regulations. Nevertheless, the development of regulatory sandboxes is in its infancy and only loosely connected to leading-edge Big Data/IoT/AI technologies.

2.4 Data Availability Barriers

To innovate with IoT and Big Data, financial and insurance organizations (including FinTech/InsuranceTech innovators) need access to experimental yet realistic datasets (e.g., customer account and payment datasets) that would allow them to test, validate, and benchmark data analytics algorithms. Unfortunately, such data are hardly available, as their creation requires complex anonymization processes or tedious processes to realistically simulate/synthesize them. Hence, innovators have no easy access to data for experimentation and testing of novel ideas [12]. Also, due to the fragmentation of Europe’s FinTech/InsuranceTech ecosystems, there are no easy ways to share such resources across financial/insurance organizations and innovators.
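A hedged sketch of the two common workarounds, pseudonymization of real identifiers and generation of synthetic records, is given below; the key handling, field names, and distribution parameters are illustrative assumptions rather than a prescribed INFINITECH procedure.

```python
import hashlib
import hmac
import random

SECRET_KEY = b"replace-with-a-vaulted-secret"  # illustrative; real keys belong in a vault/KMS

def pseudonymize(account_id: str) -> str:
    # Keyed hashing keeps the mapping stable across datasets without exposing the raw id.
    return hmac.new(SECRET_KEY, account_id.encode(), hashlib.sha256).hexdigest()[:16]

def synthetic_transactions(n: int, seed: int = 42) -> list:
    # Draws amounts from a log-normal distribution as a rough stand-in for real payment amounts.
    rng = random.Random(seed)
    return [
        {"account": pseudonymize(f"IBAN-{rng.randint(1, 50):04d}"),
         "amount_eur": round(rng.lognormvariate(3.0, 1.0), 2)}
        for _ in range(n)
    ]

if __name__ == "__main__":
    for row in synthetic_transactions(3):
        print(row)
```

Pseudonymization alone is rarely sufficient for regulatory purposes (re-identification risks remain), which is why realistic synthetic data and controlled sandboxes are emphasized throughout this chapter.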

2.5 Lack of Blueprint Architectures for Big Data Applications

Given the existing limitations (e.g., data silos and lack of interoperability), financial organizations are creating ad hoc solutions for the problems at hand. They leverage one or more instances of popular data infrastructures such as data warehouses, data lakes, elastic data stores, and machine learning toolkits in various deployment configurations. However, they have no easy way to create, deploy, and operate such infrastructures by adhering to proven patterns and blueprints that would lower their integration, deployment, and operation efforts and costs.

2.6 No Validated Business Models

Big Data and IoT deployments in finance/insurance have, in several cases, demonstrated their merits on the accuracy, performance, and quality of the resulting services (e.g., increased automation in business processes, improved risk assessment, faster transaction completion for end-users, better user experience). However, there is still a lack of concrete and validated business models that could drive monetization and tangible business benefits for these service improvements. Such business models could foster the rapid development and adoption of Big Data and IoT innovations, including emerging innovations that leverage real-time analytics and AI [13].

3 Best Practices for Data Space Design and Implementation

To address these challenges and leverage the full potential of Big Data (including AI) and IoT in finance/insurance, there is a need for developments in several parallel streams.

3.1 Technical/Technological Developments

At the technical/technological forefront, there is a need for Big Data architectures and toolkits tailored to the needs of data-intensive applications in the finance/insurance sector. These shall include several novel building blocks, including (1) infrastructures for handling arbitrarily large datasets from multiple fragmented sources in a unified and interoperable way; (2) semantic interoperability solutions for the financial/insurance supply chain; (3) novel techniques for real-time analytics and real-time AI; (4) advanced data analytics algorithms (including AI); (5) technologies and techniques for security and regulatory compliance, such as data encryption and anonymization technologies; (6) blueprint architectures for combining the above-listed building blocks into coherent and cost-effective solutions; and (7) open APIs that enable innovators to produce and validate innovative solutions.
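As one concrete instance of item (5), the sketch below shows field-level encryption of a sensitive attribute using the Python cryptography package; the choice of library and the key-handling comments are assumptions for illustration, not a mandated component of such a toolkit.

```python
# Requires: pip install cryptography
from cryptography.fernet import Fernet

def encrypt_field(value: str, key: bytes) -> bytes:
    # Symmetric, authenticated encryption of a single sensitive field (e.g., an IBAN).
    return Fernet(key).encrypt(value.encode())

def decrypt_field(token: bytes, key: bytes) -> str:
    return Fernet(key).decrypt(token).decode()

if __name__ == "__main__":
    key = Fernet.generate_key()  # in practice, issued and rotated by a KMS, not generated inline
    token = encrypt_field("DE89370400440532013000", key)
    print(token)
    print(decrypt_field(token, key))
```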

3.2 Development of Experimentation Infrastructures (Testbeds)

The development of Big Data, IoT, and AI-based innovations requires significant testing and validation efforts, such as testing for regulatory compliance and optimizing machine learning and deep learning data models. Therefore, there is a need for widely available experimentation infrastructures at the national and EU levels, which shall provide access to resources for application development and experimentation, such as datasets, regulatory sandboxes, libraries of ML (machine learning)/DL (deep learning) algorithms, Open (banking/finance) APIs, and more. Furthermore, such experimentation infrastructures should be available in appropriate testbeds, based on deploying the above-listed technical building blocks in various configurations. The latter should support experimentation and testing of all types of Big Data/AI/IoT applications in the finance and insurance sectors, such as KYC (Know Your Customer) and KYB (Know Your Business), credit risk scoring, asset management recommendations, usage-based insurance applications, personalized portfolio management, automated payment applications, and many more.

3.3 Validation of Novel Business Models

To showcase and evaluate the tangible value of the above-listed technologies and testbeds, there is also a need for validating them in the scope of real-life business cases involving realistic business processes and applications for retail and corporate finance/insurance. The validation shall focus on novel business models, which essentially disrupt existing operations of financial organizations and deliver exceptional business benefits in terms of automation, personalization, cost-effectiveness, and intelligence.

4 The INFINITECH Way to Design/Support FinTech Data Spaces

INFINITECH is the largest joint effort of Europe’s leaders in the IT and finance/insurance sectors toward providing the technological capabilities, the experimentation facilities (testbeds and sandboxes), and the business models needed to enable European financial organizations, insurance enterprises, and FinTech/InsuranceTech innovators to fully leverage the benefits of Big Data, IoT, and AI technologies. The latter benefits include a shift toward autonomous (i.e., automated and intelligent) processes that are dynamically adaptable and personalized to end-user needs while complying with the sector’s regulatory environment. Furthermore, INFINITECH brings together all the relevant stakeholders: NGOs and their members, financial institutions and insurance companies, research centers, large industry, and SMEs.

4.1 Technological Building Blocks for Big Data, IoT, and AI

INFINITECH looks at the finance and insurance sectors and provides multiple assets, including infrastructures, components, and toolkits for seamless data access and querying across multiple fragmented data sources, technologies for cost-effective real-time analytics, advanced analytics algorithms (including AI), technologies for Data Governance and regulatory compliance, technologies for trustful and secure data sharing over blockchain infrastructures, as well as handling of semantic data interoperability across stakeholders of the financial/insurance supply chain. Furthermore, INFINITECH follows a reference architecture (RA) approach for Big Data, IoT, and AI applications in the financial sector [14,15,16,17], whose aim is to serve as a blueprint for integrating, deploying, and operating Big Data and IoT infrastructures, including infrastructures that will leverage the above-listed building blocks. In addition, the reference architecture provides the means for integrating and deploying applications that take advantage of leading-edge technologies, including predictive analytics, different instances of AI (e.g., DL, chatbots), and blockchains.

4.2 Tailored Experimentation Infrastructures

INFINITECH provides the necessary mechanisms for creating tailored experimentation environments (i.e., testbeds and sandboxes) for different applications (e.g., sandboxes for fraud detection, credit risk assessment, personalized financial assistance) using flexible configurations of the testbed resources. Testbeds and sandboxes are used for different Big Data, IoT, and AI applications in the financial and insurance sectors, enabling innovators to access and share resources for testing, innovation, and experimentation, including existing datasets. The INFINITECH testbeds and sandboxes use the Open API standard, which is crucial for experimentation and innovation, as it facilitates the adoption and extension of the designed, deployed, and tested solutions. ML/DL algorithms and regulatory compliance tools play a relevant role in the tailored experimentation testbeds and sandboxes.
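The interaction pattern with such an Open API-based sandbox could resemble the following sketch; the host, endpoint path, authentication scheme, and response fields are hypothetical placeholders, since the actual INFINITECH API definitions are not reproduced here.

```python
# Requires: pip install requests
import requests

SANDBOX_URL = "https://sandbox.example.org/api/v1"  # placeholder host, not a real endpoint

def list_datasets(token: str) -> list:
    # Hypothetical call that lists datasets an innovator may use for experimentation.
    resp = requests.get(
        f"{SANDBOX_URL}/datasets",
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    for ds in list_datasets(token="demo-token"):
        print(ds.get("name"), ds.get("description"))
```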

INFINITECH applies this concept by deploying testbeds and sandboxes across Europe, thus demonstrating that it is possible to support FinTech/InsuranceTech partners through experimentation testbeds. INFINITECH includes seven testbeds established at individual banks and one EU-wide testbed. The testbeds are made available to innovator communities via the established innovation management structures of the project’s partners and through a (virtualized) digital innovation hub (VDIH) set up by the project as part of its exploitation strategy.

4.3 Large-Scale Innovative Pilots in Finance and Insurance

The use of a large ecosystem like INFINITECH for testing and validation will leverage both the technological developments of the project (including the INFINITECH reference architecture) and the testbeds/sandboxes, so that the novel and validated use cases can later be deployed and implemented as part of commercial solutions. The pilots target real-life environments based on realistic datasets, i.e., either anonymized or synthetic datasets with realistic statistical properties. The pilots will span a wide array of areas covering the most prominent processes of the financial and insurance sectors, including KYC and customer-centric analytics, fraud detection and financial crime, credit risk assessment, risk assessment for capital management, personalized portfolio management, risk assessment in investment banking, personalized usage-based insurance, insurance product recommendations, and more. The pilots will demonstrate the added value of the project’s technologies and testbeds while at the same time showcasing the project’s disruptive impact on Europe’s financial and insurance sectors.

4.4 Business Model Development and Validation

In the scope of the innovative pilots and use cases in finance and insurance, a novel and replicable business model (or a set of them) needs to be associated with each of the listed pilots/use cases [18, 19]. Using a real exploitation model is a practical way to resolve one of the significant issues that arise when developing new technologies through experimentation. These business models will pave the way for disrupting the financial sector based on advanced Big Data, IoT, and AI infrastructures and applications, thus demonstrating the tangible impact of the project on financial institutions, insurance organizations, and FinTech/InsuranceTech enterprises.

5 Technology Capabilities for Convergence and Interoperability

Financial technology (FinTech) and insurance technology (InsuranceTech) are developing rapidly; over the last few years they have created new business models and transformed the financial and insurance services industry. Technological convergence supporting data sharing and exchange between services and applications is a barrier that the financial and insurance sectors have long confronted, and with the globalization of economies and markets this need has become more relevant today than ever before. Semantic technologies have played a crucial role as an enabler of applications and services in other domains, although much less so in the financial domain; as already mentioned, it is only recently that progress in terms of implementation has become evident in the financial and insurance sectors. However, technological development and interoperability have not entirely progressed in parallel, mainly due to many complex non-interoperability issues in which social, economic, and political dimensions come into play.

5.1 Semantic Interoperability and Analytics

INFINITECH provides a shared semantics solution for the interoperability of diverse finance/insurance datasets. To this end, the project relies on existing ontologies for financial information modeling and representation (such as FIBO, FIGI, and LKIF) [20, 21], which are appropriately extended as required by the INFINITECH-RA and the project’s pilots. Moreover, INFINITECH offers a solution for parallel and high-performance analytics over semantic streams, based on the customization of existing semantic linked stream analytics solutions (such as NUIG’s Super Stream Collider (SSC) [22,23,24,25]). The INFINITECH semantic interoperability infrastructure is available in all cases/pilots where semantic reasoning is required for extra intelligence.
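The following minimal sketch, assuming the rdflib Python package, illustrates the general pattern of annotating financial data with an ontology namespace and querying it with SPARQL; the prefixes, class, and property names are simplified placeholders rather than the actual FIBO/FIGI/LKIF vocabularies.

```python
# Requires: pip install rdflib
from rdflib import Graph

# Toy Turtle data with a FIBO-style namespace; terms are illustrative placeholders only.
TTL = """
@prefix ex:   <https://example.org/fin#> .
@prefix fibo: <https://example.org/fibo-like#> .

ex:acct123 a fibo:Account ;
    ex:holder  ex:customer42 ;
    ex:balance "1520.75" .
"""

graph = Graph()
graph.parse(data=TTL, format="turtle")

# SPARQL query over the semantically annotated data.
QUERY = """
PREFIX ex:   <https://example.org/fin#>
PREFIX fibo: <https://example.org/fibo-like#>
SELECT ?account ?balance WHERE {
    ?account a fibo:Account ;
             ex:balance ?balance .
}
"""

for account, balance in graph.query(QUERY):
    print(account, balance)
```

Once heterogeneous sources are lifted into such a shared graph model, the same query can span datasets coming from different institutions, which is the essence of the shared-semantics approach.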

5.2 INFINITECH Building Blocks for Big Data, IoT, and AI

There is always a high demand for integrated systems and technological components that can almost transparently connect and transfer data in the finance and insurance sectors. The integrated environment, including infrastructures, components, and toolkits, shall be designed to support seamless data access and querying across multiple fragmented data sources, technologies for cost-effective real-time analytics, advanced analytics algorithms (including AI), technologies for Data Governance and regulatory compliance, technologies for trustful and secure data sharing over blockchain infrastructures, as well as handling of semantic data interoperability across stakeholders of the financial/insurance supply chain. INFINITECH emerges as an answer to those technological demands and provides a reference architecture (RA), as shown in Fig. 1. The INFINITECH reference architecture brings together technologies for Big Data, IoT, and AI applications in the financial sector, which will serve as a blueprint for integrating, deploying, and operating Big Data and IoT infrastructures.

Fig. 1 INFINITECH reference architecture – high-level overview of Big Data/IoT platforms and technological building blocks

INFINITECH provides the means for integrating and deploying applications that take advantage of leading-edge technologies, including predictive analytics, different instances of AI (e.g., DL, chatbots), and blockchains, among other technologies. Figure 1 depicts the innovation-driven functional architecture of the INFINITECH ecosystem. It provides an overall, holistic FinTech view. Its design and implementation rely on the intelligence plane, which combines Big Data, IoT, and AI analytics applications. In the data analytics plane of the INFINITECH reference architecture, the exchange of information facilitates knowledge-driven support and the composition of services and operations by enabling interoperable management information.

In terms of design, the INFINITECH approach applies the design principles introduced in this chapter and implements the different scenarios and testbeds as described. As INFINITECH moves toward converged IP and cloud-based communication networks, it addresses a number of significant technical issues by using more standard information exchange, promoting information interoperability, and allowing the testbeds and sandboxes to be managed effectively; most importantly, the new open opportunities it offers for user-oriented, knowledge-based services can have a fundamental impact on future financial and insurance services.

6 Scalability and Security Considerations for FinTech and InsuranceTech

There are basic characteristics that must be taken into consideration when building new approaches for the FinTech and InsuranceTech industries. Sandboxes and testbeds are relatively modern ways to build applications: they are specialized in providing environments close to real deployments, so that implementations can be tested under realistic digital infrastructure conditions. The Big Data/IoT technologies and sandboxes/testbeds must be coupled with novel business models that will enable a whole new range of applications emphasizing automation, intelligence, personalization, security, stakeholder collaboration, and regulatory compliance.

INFINITECH provides 360° coverage of all the issues that hinder financial institutions’ and FinTech enterprises’ efforts to use and fully leverage IoT and Big Data technologies, including AI. This section pragmatically summarizes, based on INFINITECH experience, how these efforts address scalability.

At the same time, INFINITECH enables the development, deployment, and business validation of a whole new range of applications characterized by the SHARP (Smart, Holistic, Autonomy, Personalized, and Regulatory Compliance) properties. The following short paragraphs describe how SHARP can be implemented and, as an implementation reference, briefly how each property has been addressed in INFINITECH.

  • Smart: Services shall take advantage of predictive analytics and AI on Big Data datasets to anticipate changes in financial/insurance contexts and automatically adapt to them. INFINITECH has designed a set of end-to-end, business-to-customer and business-to-business applications based on analytics and ML solutions.

  • Holistic: Architectures shall empower a holistic approach to data-driven services, which shall support all different financial applications across all phases of their lifecycle, including applications spanning multiple stakeholders and systems in the financial supply chain. INFINITECH implemented a series of reference architectures in the form of functional components.

  • Autonomy: The deployed infrastructures shall take advantage of Big Data, IoT, and AI to significantly reduce error-prone manual processes and decision making through increased automation. The INFINITECH reference architecture presented here paves the way for the fully autonomous processes that will disrupt the entire finance and insurance sectors in the future. The architecture itself supports autonomy, as it contains all the software components necessary to run and operate services.

  • Personalization: Processes that analyze data in a timely manner can profile customers and subsequently offer them individualized, dynamic services that adapt to their needs. INFINITECH involves KYC/KYB processes that capture individuals’ characteristics and build profiles based on the available data.

  • Regulatory Compliant: Based on the use of specific or general-purpose technologies, financial and insurance enterprises can ensure regulatory compliance by design. INFINITECH will take advantage of data processing to achieve faster, cost-effective, and reliable compliance with regulations.

To ensure scalability and security, a permissioned blockchain for data sharing and data trading is required. There are already several use cases in the finance/insurance sector [26] that involve sharing of data across different organizations (e.g., sharing of customer data for customer protection or faster KYC, sharing of businesses’ data for improved credit risk assessment, sharing of customer insurance data for faster claims management, and more); these are ideal scenarios for emerging solutions like DLT (distributed ledger technologies, the baseline for blockchain technologies).

INFINITECH uses a permissioned DLT infrastructure, which provides privacy control, auditability, secure data sharing, and faster operations. Some of the latter characteristics are inherent features of permissioned DLTs (i.e., Hyperledger Fabric, developed by IBM and the Linux Foundation) and can be directly configured in INFINITECH testbeds, sandboxes, and pilots. The core of the DLT infrastructure, i.e., the Fabric, will be enhanced in two directions: (1) integration of tokenization features and relevant cryptography as a means of enabling asset trading (e.g., personal data trading) through the platform and (2) for selected pilots, enhancement of the blockchain infrastructure with multi-party computation (MPC) and linear secret sharing (LSS) algorithms from OPAL (Open Algorithm Project) to enable querying of encrypted data as a means of offering higher data privacy guarantees.
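To illustrate the auditability idea behind such a permissioned ledger (the concept only, not Hyperledger Fabric itself), the toy sketch below appends hash-chained data-sharing events and rejects writes from non-permissioned organizations; all names and structures are illustrative assumptions.

```python
import hashlib
import json
import time

class SharedDataLedger:
    """Toy append-only, hash-chained log of data-sharing events (illustrative only).

    A real deployment would rely on a permissioned DLT such as Hyperledger Fabric
    rather than an in-process structure like this one.
    """

    def __init__(self, permitted_orgs):
        self.permitted_orgs = set(permitted_orgs)  # simple stand-in for membership services
        self.entries = []

    def record_share(self, org: str, dataset_id: str, consumer: str) -> dict:
        if org not in self.permitted_orgs:
            raise PermissionError(f"{org} is not a permissioned participant")
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        payload = {"org": org, "dataset": dataset_id, "consumer": consumer,
                   "ts": time.time(), "prev": prev_hash}
        payload["hash"] = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()).hexdigest()
        self.entries.append(payload)
        return payload

if __name__ == "__main__":
    ledger = SharedDataLedger(permitted_orgs={"bank_a", "insurer_b"})
    entry = ledger.record_share("bank_a", "kyc-profile-42", consumer="insurer_b")
    print(entry["hash"])
```

Because each entry embeds the hash of the previous one, any tampering with an earlier record invalidates the chain, which is the basic property that gives permissioned DLTs their auditability.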

7 Conclusions

This chapter has addressed best practices for data space design and implementation identified from a state-of-the-art analysis. These practices are tested and validated in the context of INFINITECH, an H2020 European large-scale ecosystem. The best practices leverage the full potential of Big Data, IoT, and AI applications in finance/insurance and identify the need for developments in several other areas to support the scaling-up of applications.

The interoperable data model, which follows the formalization of vocabularies [7, 27] and uses the FIBO, FIGI, and LKIF approaches, was also discussed; it is the most adaptable interoperability model for the financial and insurance sectors. Although the details are beyond the scope of this chapter, the use of these references and methods as formal vocabularies to build a FinTech and InsuranceTech lingua franca is already an innovative approach to data exchange and data sharing, and it is introduced as part of the results of the INFINITECH ecosystem approach, the INFINITECH Way.

The INFINITECH reference architecture has been introduced, which provides the means for integrating and deploying applications that take advantage of leading-edge technologies, including predictive analytics, different instances of AI (e.g., DL, chatbots), blockchains, and more. In addition, INFINITECH provides technology for semantic interoperability based on shared semantics, along with a permissioned blockchain solution for data sharing across finance/insurance institutions.

This book chapter analyzed the basis of data space design and discussed best practices for data interoperability, introducing concepts and illustrating the INFINITECH Way to enable information interoperability through a reference architecture that follows the methodological approach and the formalization and representation of financial data using semantic technologies and information models (knowledge engineering). In addition, the INFINITECH Way discussed here presents best practices and explains how data interoperability challenges can be overcome using a graph data modeling approach in the FinTech and InsuranceTech domains.