In order to implement the research and innovation strategy, and to align technical issues with aspects of cooperation and coordination, five major types of mechanisms were identified:
- Innovation Spaces (i-Spaces): Cross-organisational and cross-sectorial environments that allow challenges to be addressed in an interdisciplinary way and serve as a hub for other research and innovation activities
- Lighthouse projects: To raise awareness of the opportunities offered by big data and the value of data-driven applications for different sectors, acting as incubators for data-driven ecosystems
- Technical projects: To tackle specific big data issues, addressing targeted aspects of the technical priorities
- Data platforms: To support the sharing and trading of industrial and personal data (free flow of data) as a key enabler of the data economy
- Cooperation and coordination projects: To foster international cooperation for efficient information exchange and coordination of activities within the ecosystem
4.1 European Innovation Spaces (i-Spaces)
Extensive consultation with many stakeholders from areas related to big data value (BDV) confirmed that, in addition to technology and applications, several key issues required consideration. First, infrastructural, economic, social and legal issues have to be addressed. Second, the private and public sectors need to be made aware of the benefits that BDV can provide, thereby motivating them to be innovative and to adopt BDV solutions.
To address all of these aspects, European cross-organisational and cross-sectorial environments, which rely and build upon existing national and European initiatives, play a central role in a European big data ecosystem. These so-called European Innovation Spaces (or i-Spaces for short) are the main elements to ensure that research on BDV technologies and novel BDV applications can be quickly tested, piloted and thus exploited in a context with the maximum involvement of all the stakeholders of BDV ecosystems. As such, i-Spaces enable stakeholders to develop new businesses facilitated by advanced BDV technologies, applications and business models. They contribute to the building of communities, providing a catalyst for community engagement and acting as incubators and accelerators of data-driven innovation.
In this sense, i-Spaces are hubs for uniting technical and non-technical activities, for instance, by bringing technology and application development together and by fostering skills, competence and best practices. To this end, i-Spaces offer both state-of-the-art and emerging technologies and tools from industry, as well as open-source software initiatives; they also provide access to data assets. In this way, i-Spaces foster community building and an interdisciplinary approach to solving BDV challenges along the core dimensions of technology, applications, legal, social and business issues, data assets, and skills.
The creation of i-Spaces is driven by the needs of large and small companies alike to ensure that they can easily access the economic opportunities offered by BDV and develop working prototypes to test the viability of actual business deployments. This does not necessarily require moving data assets across borders; rather, data analytic tools and computation activities are brought to the data. In this way, valuable data assets are made available in environments that simultaneously support the legitimate ownership, privacy and security policies of corporate data owners and their customers, while facilitating ease of experimentation for researchers, entrepreneurs and small and large IT providers.
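To make the "bring the computation to the data" principle concrete, the following minimal sketch uses purely hypothetical names (DataHolder, submit_analysis) that are not part of any real i-Space API: the data owner executes contributed analysis code locally and releases only aggregate results, so raw records never leave its environment.

```python
# Minimal sketch of the "bring computation to the data" pattern.
# Class and function names are illustrative, not part of any real i-Space API.
from statistics import mean
from typing import Callable, Iterable


class DataHolder:
    """Represents a data owner that keeps raw records in-house."""

    def __init__(self, records: Iterable[dict]):
        self._records = list(records)  # raw data never leaves this object

    def submit_analysis(self, analysis: Callable[[list], dict]) -> dict:
        """Run contributed analysis code locally and return only aggregates."""
        result = analysis(self._records)
        # A real platform would also enforce output checks (e.g. minimum group sizes).
        return result


# An external researcher ships a function instead of requesting a data export.
def average_energy_use(records: list) -> dict:
    return {"avg_kwh": mean(r["kwh"] for r in records), "n": len(records)}


holder = DataHolder([{"kwh": 12.4}, {"kwh": 9.8}, {"kwh": 15.1}])
print(holder.submit_analysis(average_energy_use))  # {'avg_kwh': 12.43..., 'n': 3}
```

A production environment would wrap the same basic flow with authentication, auditing and policy enforcement, but the division of roles remains as sketched.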
Concerning the discovery of value creation, i-Spaces support various models: at one end, corporate entities with valuable data assets can specify business-relevant data challenges for researchers or software developers to tackle; at the other end, entrepreneurs and companies with business ideas to be evaluated can solicit the addition and integration of desired data assets from corporate or public sources. i-Spaces also contribute to filling the skills gap Europe is facing by providing (controlled) access to real use cases and data assets for education and skills improvement initiatives.
i-Spaces themselves are data-driven, both at the planning and the reporting stage. At the planning stage, they prioritise the inclusion of data assets that, in conjunction with existing assets, present the greatest promise for European economic development (while taking full account of the international competitive landscape); at the reporting stage, they provide methodologically sound quantitative evidence on important issues such as increases in performance for core technologies or reductions in costs for business processes. These reports have been an important basis for fostering learning and continuous improvement in the next cycle of technology and applications.
The particular value addition of i-Spaces in the European context is that they federate, complement and leverage activities of similar national incubators and environments, existing PPPs, and other national or European initiatives. With the aim of not duplicating existing efforts, complementary activities considered for inclusion have to stand the test of expected economic development: new data assets and technologies are considered for inclusion to the extent that they can be expected to open new economic opportunities when added to and interfaced with the assets maintained by regional or national data incubators or existing PPPs.
Over recent years, the successive inclusion of data assets into i-Spaces, in turn, has driven and prioritised the agenda for addressing data integration or data processing technologies. One example is the existence of data assets with homogeneous qualities (e.g. geospatial factors, time series, graphs and imagery), which called for optimising the performance of existing core technology (e.g. querying, indexing, feature extraction, predictive analytics and visualisation). This required methodologically sound benchmarking practices to be carried out in appropriate facilities. Similarly, business applications exploiting BDV technologies have been evaluated for usability and fitness for purpose, thereby leading to the continuous improvement of these applications.
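As an illustration of what such benchmarking can look like in its simplest form, the sketch below (synthetic data and illustrative parameters only) times a range query over a time series and reports the median of several repeated runs rather than a single measurement, one ingredient of a methodologically sound comparison.

```python
# Illustrative micro-benchmark of a time-series range query (synthetic data).
import random
import statistics
import time

random.seed(42)
series = [(t, random.gauss(0.0, 1.0)) for t in range(1_000_000)]  # (timestamp, value)


def range_query(data, t_start, t_end):
    """Naive scan; a real system would use an index."""
    return [v for t, v in data if t_start <= t < t_end]


runs = []
for _ in range(5):  # repeat to reduce noise; report the median
    start = time.perf_counter()
    result = range_query(series, 250_000, 750_000)
    runs.append(time.perf_counter() - start)

print(f"median latency: {statistics.median(runs) * 1000:.1f} ms "
      f"({len(result)} rows returned)")
```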
Due to the richness of data that i-Spaces offer, as well as the access they afford to a large variety of integrated software tools and expert community interactions, the data environments provide the perfect setting for the effective training of data scientists and domain practitioners. They encourage a broader group of interested parties to engage in data activities. These activities are designed to complement the educational offerings of established European institutions.
4.2 Lighthouse Projects
Lighthouse projects are projects with a high degree of innovation that run large-scale data-driven demonstrations, whose main objectives are to create high-level impact and to promote visibility and awareness, leading to faster uptake of big data value applications and solutions.
They form the major mechanism to demonstrate big data value ecosystems and sustainable data marketplaces, and thus promote increased competitiveness of established sectors as well as the creation of new sectors in Europe. Furthermore, they propose replicable solutions by using existing technologies or very near-to-market technologies that show evidence of data value and could be integrated in an innovative way.
Lighthouse projects lead to explicit business growth and job creation, measured against clear qualitative and quantitative indicators and success factors defined by each project beforehand.
Increased competitiveness is not only a result of the application of advanced technologies; it also stems from a combination of changes that go beyond the technological level, including political and legal decisions, among others. Thus, Lighthouse projects were expected to involve a combination of decisions centred on data, including the use of advanced big data-related technologies, but also other dimensions. Their main purpose has been to make results visible to a widespread and high-level audience in order to accelerate change, thereby making the impact of big data explicit in a specific sector and in a particular economic or societal ecosystem.
Lighthouse projects are defined through a set of well-specified goals that materialise through large-scale demonstrations deploying existing and near-to-market technologies. Projects may include a limited set of research activities if that is needed to achieve their goals, but it is expected that the major focus will be on data integration and solution deployment.
Lighthouse projects are different from proofs of concept (which are more related to technology or process) or pilots (which are usually an intermediate step on the way to full production): they need to pave the way for a faster market roll-out of technologies (big data with Cloud and HPC or the IoT), they need to be conducted on a large scale, and they need to use their successes to rapidly transform the way an organisation thinks or the way processes are run.
The sectors or environments included were not pre-determined but had to be in line with the goal mentioned above of creating high-level impact.
The first call for Lighthouse projects made by the BDV PPP resulted in two actions in the domains of bioeconomy (including agriculture, fisheries and forestry) and transport and logistics. The second call resulted in two actions for health and smart manufacturing.
Lighthouse projects operate primarily in a single domain, where a meaningful (as evidenced by total market share) group of EU industries from the same sector can jointly provide a safe environment in which they make available a proportion of their data (or data streams) and demonstrate, on a large scale, the impact of big data technologies. Lighthouse projects also used data sources other than those of the specific sector addressed, thereby contributing to breaking silos. In all cases, projects enabled access to appropriately large, complex and realistic datasets.
Projects needed to show sustainable impact beyond the specific large-scale demonstrators running through the project duration. Whenever possible, this was addressed by projects through solutions that could be replicated by other companies in the sector or by other application domains.
All Lighthouse projects were requested to involve all relevant stakeholders to reach their goals. This in turn led to the development of complete data ecosystems in the addressed domain or sector. Where appropriate, Lighthouse projects relied on the infrastructure and ecosystems facilitated by one or more i-Spaces.
Some of the indicators used to assess the impact of Lighthouse projects included the number and size of datasets processed (integrated), the number of data sources made available for use and analysis by third parties, and the number of services provided for integrating data across sectors. Market indicators are obviously of utmost importance.
Key elements for the implementation of Lighthouse projects include at least the following areas.
The Use of Existing or Close-to-Market Technologies
Lighthouse projects have not been expected to develop entirely new solutions; instead, they have been requested to make use of existing or close-to-market technologies and services by adding and/or adapting current relevant technologies, as well as accelerating the roll-out of big data value solutions using the Cloud and the IoT or HPC. Solutions should provide answers to real needs and requirements, showing explicit knowledge of the demand side. Even though projects were asked to concentrate on solving concrete problems, which might entail specific deployment challenges, the replicability of concepts was always a high priority to ensure impact beyond the particular deployments of the project. Lighthouse projects have been requested to address frameworks and tools from a holistic perspective, considering, for example, not only analytics but also the complete data value chain (data generation, storage and analysis).
Interoperability and Openness
All projects took advantage of both closed and open data; during the project, they could determine whether open source or proprietary solutions were the most suitable to address their challenges. However, it was always requested that projects promote the interoperability of solutions to avoid locking in customers.
The involvement of smaller actors (e.g. through opportunities for start-ups and entrepreneurs) who can compete in the same ecosystem in a fair way was always a must. For instance, open Application Programming Interfaces (APIs) were identified as an important way forward (e.g. third-party innovation through data sharing). In addition, projects have been requested to focus on re-usability and ways to reduce possible barriers or gaps resulting from big data methods impacting end-users (breaking the 'big data for data analysts only' paradigm).
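As a hedged illustration of such an open API, the sketch below exposes an aggregated, shareable dataset over plain HTTP using only the Python standard library; the endpoint path and payload are hypothetical and serve only to show the pattern.

```python
# Minimal sketch of an open, read-only data API (hypothetical endpoint and payload).
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Aggregated, shareable data only; raw proprietary records stay behind the API.
OPEN_DATA = {"/v1/traffic/daily-counts": [{"day": "2020-01-01", "vehicles": 18432}]}


class OpenDataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        payload = OPEN_DATA.get(self.path)
        if payload is None:
            self.send_response(404)
            self.end_headers()
            return
        body = json.dumps(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("localhost", 8080), OpenDataHandler).serve_forever()
```

Running the script and requesting http://localhost:8080/v1/traffic/daily-counts returns the aggregated JSON, which third parties such as start-ups can build on without bilateral agreements.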
Performance
All projects have been requested to contribute to common data collection systems and to have a measurement methodology in place. Performance monitoring was carried out over at least two-thirds of the duration of the project.
The Setting Up of Ecosystems
Lighthouse projects have transformational power; they were never intended as narrow experiments with limited impact. All projects demonstrated that they could improve the competitiveness of the selected industrial sector in a relevant way (sometimes by changing associated processes). To achieve this, the active involvement of different stakeholders is mandatory. For that reason, the supporting role of the ecosystem that enables such changes is an important factor to keep in mind: all Lighthouse projects were connected to communities of stakeholders from the design phase onwards. Ecosystems evolved, extended or connected with existing networks of stakeholders and hubs whenever possible.
As is well known, the European industry is characterised by a considerable number of small and medium-sized enterprises. Therefore, the adequate consideration of SME integration in the projects was always a central requirement to create a healthy environment.
Even though all projects were requested to focus primarily on one particular sector, the use of data from different sources and industrial fields was always encouraged, with priority given to avoiding the 'silo' effect.
Long-Term Commitment and Sustainability
The budgets assigned to the projects were envisioned as seeds for more widely implemented plans. All funded activities were integrated into more ambitious strategies that allowed for the involvement of additional stakeholders and further funding (preferably private, but possibly also a combination of public and private).
After the launch of the four initial Lighthouse projects, the lessons learned about the Lighthouse concept were consolidated. As a result, a more advanced concept was proposed, including more concrete requirements for the upcoming large-scale pilots and, in some cases, further specifying aspects that had already been worked out. The following list served as guidance, without any claim to completeness:
- It is important to reuse technologies and frameworks by combining and adapting relevant existing technologies (big data with the Cloud, HPC or IoT) that are already in the market or close to it (i.e. those with a high technology readiness level) to avoid the development of new platforms where a reasonable basis already exists (e.g. as part of the Open Source community). In addition, projects are especially encouraged to build on the technologies created by the ongoing projects of the Big Data PPP that fit their requirements (e.g. in the area of privacy-preserving technologies).
- Particular attention should be paid to interoperability. This applies to all layers of the solution, including data (here, some of the results of the projects funded under the Big Data PPP with a focus on data integration could be particularly useful), and to relevant efforts within the HPC, Cloud and IoT communities.
- It is expected that projects will combine the use of open and closed data. While it is understandable that some closed data will remain as such, we also expect these projects to contribute to the increasing availability of datasets that could be used by other stakeholders, such as SMEs and start-ups. This could happen under different regimes (not necessarily for free). Projects should declare how they will contribute to this objective by quantifying and qualifying datasets (when possible) and by including potential contributions to the ongoing data incubators/accelerators and Innovation Spaces.
- Lighthouse projects have to contribute to the horizontal activities of the Big Data PPP as a way of helping in the assessment of the PPP implementation and increasing its potential impact. Some of the targeted activities include contributing to standardisation activities, the measurement of KPIs, coordination with the PPP branding, and active participation in training and educational activities proposed by the PPP.
4.3 Technical Projects
Technical projects focus on addressing one issue or a few specific aspects identified as part of the BDV technical priorities. In this way, technical projects provide the technology foundation for Lighthouse projects and i-Spaces. Technical projects may be implemented as Research and Innovation Actions (RIA) or Innovation Actions (IA), depending on the amount of research work required to address the respective technical priorities.
To identify the most important technical priorities to be addressed within these projects, the stakeholders within the data ecosystem were engaged through a structured methodology to produce a set of consolidated cross-sectorial technical research requirements. The result of this process was the identification of five key technical research priorities (data management, data processing architectures, deep analytics, data protection and pseudonymisation, advanced visualisation and user experience) together with 28 sub-level challenges to delivering big data value (Zillner et al. 2017). Based on this analysis, the overall strategic technical goal can be summarised as follows:
Deliver big data technology empowered by deep analytics for data-at-rest and data-in-motion, while providing data protection guarantees and optimised user experience, through sound engineering principles and tools for data-intensive systems.
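As a minimal illustration of combining analytics over data-at-rest and data-in-motion, the sketch below (illustrative data and an assumed three-sigma threshold rule) fits simple statistics on a historical batch and then scores arriving events incrementally with the same statistics.

```python
# Illustrative sketch: batch analytics on data-at-rest, then incremental
# scoring of data-in-motion using the same statistics.
from statistics import mean, stdev

# Data-at-rest: historical measurements (batch).
history = [20.1, 19.8, 20.4, 20.0, 19.9, 20.2, 20.3]
mu, sigma = mean(history), stdev(history)


def score_event(value: float, k: float = 3.0) -> bool:
    """Flag a streamed value as anomalous if it lies k standard deviations from the batch mean."""
    return abs(value - mu) > k * sigma


# Data-in-motion: events arriving one by one (here simulated by a list).
stream = [20.2, 19.7, 27.5, 20.1]
for event in stream:
    print(event, "anomaly" if score_event(event) else "ok")
```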
Further details on the technical priorities and how they were defined are provided in Chap. “Technical Research Priorities for Big Data”. The Big Data Value Reference Model, which structures the technical priorities identified during the requirements analysis, is detailed in Chap. “A Reference Model for Big Data Technologies”.
4.4 Platforms for Data Sharing
Platform approaches have proved successful in many areas of technology (Gawer and Cusumano 2014), from supporting transactions among buyers and sellers in marketplaces (e.g. Amazon), to innovation platforms which provide a foundation on top of which to develop complementary products or services (e.g. Windows), to integrated platforms which are a combined transaction and innovation platform (e.g. Android and the Play Store).
The idea of large-scale “data” platforms has been touted as a possible next step to support data ecosystems (Curry and Sheth 2018). An ecosystem data platform would have to support continuous, coordinated data flows, seamlessly moving data among intelligent systems. The design of infrastructure to support data sharing and reuse is still an active area of research (Curry and Ojo 2020).
Data sharing and trading are seen as important ecosystem enablers in the data economy, although closed and personal data present particular challenges for the free flow of data. The following two conceptual solutions – Industrial Data Platforms (IDP) and Personal Data Platforms (PDP) – introduce new approaches to addressing this particular need to regulate closed proprietary and personal data.
4.4.1 Industrial Data Platforms (IDP)
IDPs have increasingly been touted as potential catalysts for advancing the European Data Economy and as a solution for emerging data markets, focusing on the need to offer secure and trusted data sharing to interested parties, primarily from the private sector (industrial implementations). The IDP conceptual solution is oriented towards proprietary (or closed) data, and its realisation should guarantee a trusted, secure environment within which participants can safely, and within a clear legal framework, monetise and exchange their data assets. A functional realisation of a continent-wide IDP promises to significantly reduce the existing barriers to a free flow of data within an advanced European Data Economy. The establishment of a trusted data-sharing environment will have a substantial impact on the data economy by incentivising the marketing and sharing of proprietary data assets (currently widely considered by the private sector as out of bounds) through guarantees of fair and safe financial compensation, set out in clear legal terms and obligations for both data owners and users. The 'opening up' of previously guarded private data can thus increase its value by several orders of magnitude, boosting the data economy and enabling cross-sectorial applications that were previously unattainable or only possible following one-off bilateral agreements between parties over specific data assets.
The IDP conceptual solution complements the drive to establish BDVA i-Spaces by offering existing infrastructure and functional technical solutions that can better regulate data sharing within the innovation spaces. This includes better support for the secure sharing of proprietary or ‘closed’ data within the trusted i-Space environment. Moreover, i-Spaces offer a perfect testbed for validating existing implementations of conceptual solutions such as the IDP.
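A core building block of such trusted exchange is a machine-readable usage contract that is checked before any data leaves the owner's premises. The sketch below is a deliberately minimal, hypothetical illustration of that idea (all contract fields and organisation names are invented); real frameworks such as the IDS reference architecture define far richer usage-control vocabularies.

```python
# Minimal sketch of contract-governed data exchange between two parties.
# Contract fields and organisation names are hypothetical.
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UsageContract:
    provider: str
    consumer: str
    dataset: str
    allowed_purposes: frozenset
    expires: date


def release_data(contract: UsageContract, consumer: str, purpose: str, today: date):
    """Release the dataset only if the request satisfies the agreed contract."""
    if consumer != contract.consumer:
        raise PermissionError("consumer not covered by the contract")
    if purpose not in contract.allowed_purposes:
        raise PermissionError(f"purpose '{purpose}' not permitted")
    if today > contract.expires:
        raise PermissionError("contract has expired")
    return f"access token for dataset '{contract.dataset}'"  # stand-in for the actual transfer


contract = UsageContract(
    provider="MachineBuilder GmbH",
    consumer="Analytics SME Ltd",
    dataset="vibration-sensor-2020",
    allowed_purposes=frozenset({"predictive-maintenance"}),
    expires=date(2021, 12, 31),
)
print(release_data(contract, "Analytics SME Ltd", "predictive-maintenance", date(2021, 6, 1)))
```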
The identified possibilities for action can be categorised into two branches:
- Standardisation: Addressing the lack of a standard platform (technical solution), which limits stakeholders' participation in the European Digital Single Market, and ensuring the availability of clear governance models (reference models, guidelines and best practices) regulating the secure and trusted exchange of proprietary data.
- Implementation: Establishing, developing or aligning existing IDP implementations to provide a functional European-wide infrastructure within which industrial participants can safely, and within a clear legal framework, monetise and exchange data assets.
Standardisation activities outlined by the Strategic Research and Innovation Agenda (SRIA) (Zillner et al. 2017) and in Chap. "Recognition of Formal and Non-formal Training in Data Science" have taken into account the need to accommodate activities related to the evolving IDP solutions. The opportunity to drive forward emerging standards also covers the harmonisation of reference architectures and governance models put forward by the community. Notable advanced contributions in this direction include the highly relevant white paper and the reference architecture provided by the Industrial Data Space (IDS) Association. The Layered Databus, introduced by the Industrial Internet Consortium, is another emerging standard advocating the need for data-centric information-sharing technology that enables data market players to exchange data within a virtual and global data space.
The implementation of IDPs needs to be approached on a European level, and existing and planned EU-wide, national and regional platform development activities could contribute to these efforts. The industries behind existing IDP implementations, including the IDS reference architecture and other examples such as the MindSphere Open Industrial Cloud Platform, can be approached to move towards a functional European Industrial Data Platform. The technical priorities outlined by the SRIA (Zillner et al. 2017), particularly the Data Management priority, need to address data management across a data ecosystem comprising both open and closed data. The broadening of the scope of data management is also reflected in the latest BDVA reference model, which alludes to the establishment of digital platforms in which marketplaces regulate the exchange of proprietary data.
4.4.2 Personal Data Platforms (PDP)
So far, consumers have trusted companies, including Google, Amazon, Facebook, Apple and Microsoft, to aggregate and use their personal data in return for free services. While EU legislation, through directives such as the Data Protection Directive (1995) and the ePrivacy Directive (2002), has ensured that personal data can only be processed lawfully and for legitimate use, the limited user control offered by such companies and their lack of transparency have undermined consumers' trust. In particular, consumers experience everyday leakage of their data, which is traded by large aggregators in marketing networks, with value returned to consumers only in the form of often unwanted digital advertisements. This has recently led to a growth in the number of consumers adopting adblockers to protect their digital life, while at the same time they are becoming more conscious of and suspicious about their personal data trail.
In order to address this growing distrust, the concept of Personal Data Platforms (PDP) has emerged as a possible solution that could allow data subjects and data owners to remain in control of their data and its subsequent use. PDPs leverage 'the concept of user-controlled cloud-based technologies for storage and use of personal data ("personal data spaces")'. However, so far consumers have only been able to store and control access to a limited set of personal data, mainly by connecting their social media profiles to a variety of emerging Personal Information Management Systems (PIMS). More successful (but fewer) uses of PDPs have involved large organisations agreeing to let their customers accumulate data in their own self-controlled spaces. The expectation here is a reduction in the organisations' liability for securing such data, together with the opportunity to access these data and combine them with other data that individuals import and accumulate from other aggregators. However, a degree of friction and the lack of a successful business model are still hindering the potential of the PDP approach.
A new driver behind such a self-managed personal data economy has recently started to appear. As a result of consumers' growing distrust, measures such as the General Data Protection Regulation (GDPR), in force since May 2018, have emerged. The GDPR constitutes the single pan-European law on data protection and, among other provisions and backed by the risk of incurring high fines, it forces all companies dealing with European consumers to (1) increase transparency, (2) provide users with granular control over data access and sharing, and (3) guarantee consumers a set of fundamental individual digital rights (including the rights to rectification, erasure, data portability and restriction of processing). In particular, because it represents a threat to the multi-billion euro advertising business, we expect the individual's right to data portability, as enshrined in the GDPR, to drive large data aggregators to explore new business models for personal data access. As a result, this will create new opportunities for PDPs to emerge. The rise of PDPs and the creation of more decentralised personal datasets will also open up new opportunities for SMEs, which might benefit from and investigate new secondary uses of such data by gaining access to them from user-controlled personal data stores – a privilege so far available only to large data aggregators. However, further debate is required to reach an understanding on the best business models (for demand and supply) to develop a marketplace for personal data donors, and on what mechanisms are required to demonstrate transparency and distribute rewards to personal data donors. Furthermore, the challenges organisations face in accessing expensive data storage, and the difficulties in sharing data with commercial and international partners due to data platforms that are considered unsafe, need to be taken into account. Last but not least, questions around data portability and interoperability also have to be addressed.
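The granular, consent-based control over personal data that PDPs and the GDPR envisage can be reduced to a simple pattern: every read of a personal data category is checked against a purpose-specific consent that the individual has granted and may revoke. The sketch below is an illustrative in-memory example with invented names, not a description of any existing PIMS product.

```python
# Illustrative in-memory personal data store with purpose-based consent.
class PersonalDataStore:
    def __init__(self):
        self._data = {}      # category -> value, controlled by the individual
        self._consents = {}  # (requester, category) -> set of permitted purposes

    def put(self, category: str, value):
        self._data[category] = value

    def grant(self, requester: str, category: str, purpose: str):
        self._consents.setdefault((requester, category), set()).add(purpose)

    def revoke(self, requester: str, category: str):
        self._consents.pop((requester, category), None)  # GDPR-style withdrawal of consent

    def read(self, requester: str, category: str, purpose: str):
        if purpose not in self._consents.get((requester, category), set()):
            raise PermissionError(f"{requester} has no consent for '{category}' ({purpose})")
        return self._data[category]


store = PersonalDataStore()
store.put("energy-usage", [12.4, 9.8, 15.1])
store.grant("green-tariff-sme", "energy-usage", "tariff-recommendation")
print(store.read("green-tariff-sme", "energy-usage", "tariff-recommendation"))
store.revoke("green-tariff-sme", "energy-usage")
# A further read would now raise PermissionError, reflecting withdrawn consent.
```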
4.5 Cooperation and Coordination Projects
Cooperation and coordination projects aimed to work on detailed activities that ensured coordination and coherence in the PPP implementation and provided support to its activities. The portfolio of support activities comprised support actions that addressed complementary, non-technical issues alongside the European Innovation Spaces, Lighthouse projects, data platforms, and research and innovation activities. In addition to the governance of the data ecosystem, cooperation and coordination activities focused on the following areas.
Skills Development
The educational support for data strategists and data engineers needs to meet industry requirements, and the next generation of data professionals needs this wider view to deliver the data-driven organisation of the future. Skill development requirements need to be identified and addressed in collaboration with higher education institutes, education providers and industry to support the establishment of:
- New educational programmes based on interdisciplinary curricula with a clear focus on high-impact application domains
- Professional courses to educate and re-skill/up-skill the current workforce with the specialised skillsets needed to be data-intensive engineers, data scientists and data-intensive business experts
- Foundational modules in data science, statistical techniques and data management within related disciplines such as law and the humanities
- A network between scientists (academia) and industry that leverages Innovation Spaces to foster the exchange of ideas and challenges
- Datasets and infrastructure resources provided by industry that enhance the industrial relevance of courses.
Business Models and Ecosystems
The big data value ecosystem will comprise many new stakeholders and will require a valid and sustainable business model. Dedicated activities for investigating and evaluating business models will be connected to the innovation spaces where suppliers and users will meet. These activities include:
- Delivering means for the systematic analysis of data-driven business opportunities
- Establishing a mapping of technology providers and their value contribution
- Identifying mechanisms by which data value is determined and value is established
- Providing a platform for data entrepreneurs and financial actors, including venture capitalists, to develop an appropriate understanding of the value chain
- Describing and validating business models that can be successful and sustainable in the future data-driven economy
Policy and Regulation
The stakeholders of the data ecosystem need to contribute to the policy and regulatory debate about the non-technical aspects of future big data value creation as part of the data-driven economy. Dedicated activities addressed the aspects of data governance and usage, data protection and privacy, security, liability, cybercrime, and Intellectual Property Rights (IPR). These activities enabled exchange between stakeholders from industry, end-users, citizens and society, developing input to ongoing policy debates where appropriate. Of equal importance was the identification of concrete legal problems for actors in the value chain, particularly SMEs with limited legal resources. The established body of knowledge on legal issues was of high value for the wider community.
Social Perceptions and Societal Implications
Societal challenges cover a wide range of topics, including trust, privacy, ethics, transparency, inclusion, efficacy, manageability and acceptability in big data innovations. There needs to be a common understanding in the technical community, leading to an operational and validated method that applies to the development of data-driven innovations. At the same time, it is critical to develop a better understanding of the inclusion and collective awareness aspects of big data innovations, enabling a clear profile of the social benefits provided by big data value technology. By addressing these topics, the PPP ensured that citizens' views and perceptions were taken into account so that technology and applications were developed with a chance of being widely accepted.