Towards a Research Agenda for Personal Data Spaces: Synthesis of a Community Driven Process



Introduction
While data underpin almost every action or process in our society, it proves difficult to achieve a state of 'data liquidity' in which data can be reused where and when needed. Most data continue to stagnate in silos, controlled by data holders and inaccessible to their subjects or to parties who could use them. In addition, people (e.g. social media users) have few meaningful options to control their personal data and information flows [1] and have barely any agency over how their data are 'packaged' and 'sold'.
To mitigate these issues of autonomy and power asymmetries, decentralised storage of data has been put forward as an important response [2] and start-up or research companies are bringing to market applications supporting decentralised storage of personal information, e.g. Cozy Cloud, Meeco, OpenPDS and Solid. These applications, often called Personal Data Pods, are also being investigated by larger companies and (governmental) institutions. Examples are public broadcasters such as the BBC [3] or the VRT [4], which are experimenting with personal data store technologies that empower their viewers and listeners with their own data, and the Flemish government (Flanders is the northern part of Belgium), which is founding a 'data utility company' to become one of the parties that will provide each of Flanders' millions of citizens with their own Personal Data Space or Personal Data Pod [5].
For Flanders, the focus on Personal Data Spaces (PDS) or Personal Data Pods as a key policy aspect and as a driver for innovation in Flanders was formalized in September 2021 during the annual policy declaration of the Flemish Government [6], and real momentum has grown in Flanders to become a region that takes the storage and processing of digital personal data seriously. In this light, PDS are considered a valuable alternative for giving individuals granular control over the data that is captured about them and over how this data is shared and used, but also a means for organisations to more easily develop data-based services and to stimulate a data economy.
Solid is a W3C specification [7] that supports such PDS that are under the control of an individual or organisation. This enables individuals to reuse their data among different applications while also providing them with a sense of control over how their data is used. The introduction of PDS puts individuals at the heart of the management of personal data and gives them an important role in the current data ecosystem. Due to the possibility of actively engaging with their data, people can obtain a consolidated view of their personal data. For example, using Solid-based applications [8], individuals can conveniently switch between data storage providers and application providers. They can give third parties, such as companies, permission to access certain data for a specific purpose and for a limited time (e.g. processing a loan application or delivering a personalized ad). Implementing PDS in society should create an ecosystem where individuals control the sharing of their data between interoperable data sources and endpoints [9], with companies, institutions and governments accessing individuals' data with permission and for a specific purpose. In this article we focus on PDS enabled by Solid.
The goal of this paper is threefold. First, the architectural design of Solid is described. Secondly, the foundation and operations of the Flemish 'Solid Community' are discussed. The scope of the community, launched on April 20th, 2021, and consisting of a variety of private and public stakeholders, includes both technical and non-technical facets, with a focus on architecture and user experience, use cases, business models, legal aspects and information security. The third goal of the paper is to describe, structure and reflect on the hurdles and challenges to making Solid-enabled PDS a reality in Flanders that emerged during the first year of the Flemish 'Solid Community'.

Solid Architecture
Solid is a W3C specification [7] that provides individuals with one or more online storage spaces, similar to commercial services such as Dropbox. These are referred to as Personal Online Data Stores (pod or pods) and may be offered by public or private parties and differ in their pricing, security features and accessibility [10]. Research has shown that local connected devices, such as smartphones, might also be used as a Solid pod [11]. As the Solid specification prescribes that data should be stored in a standardised and interoperable format (i.e. Linked Data) [12], users are able to switch between pod providers with relative ease, thus achieving a form of decentralization. In addition, individuals may grant organisations and applications granular access to read or write certain data to a Solid pod (e.g. a recruitment agency requesting access to diploma data). This access grant can be withdrawn at any time, to the extent legally possible, allowing individuals to retain a sense of control. As data is stored in a standardised format, applications are able to use data that was previously written by another application [13]. This means that individuals may be able to more easily switch between applications, as the switching costs are lowered. This also contributes to the decentral nature of Solid.
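The grant-and-withdraw behaviour described above can be illustrated with a minimal Python sketch. It does not reproduce the access control mechanisms that Solid itself specifies; the `AccessGrant` class, its fields and the example URLs are hypothetical names chosen purely for illustration, and the time limit is an assumption borrowed from the loan-application example in the introduction.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class AccessGrant:
    """Hypothetical record of one permission an individual gives to a party."""
    agent: str            # who receives access (illustrative identifier)
    resource: str         # which pod resource the grant covers
    modes: frozenset      # e.g. {"read"} or {"read", "write"}
    purpose: str          # why access was requested
    expires_at: datetime  # grants are time-limited in this sketch
    revoked: bool = False

    def allows(self, agent: str, resource: str, mode: str, now: datetime) -> bool:
        """Return True only if the grant matches, is unexpired and not withdrawn."""
        return (
            not self.revoked
            and now < self.expires_at
            and agent == self.agent
            and resource == self.resource
            and mode in self.modes
        )


now = datetime(2022, 1, 1, tzinfo=timezone.utc)
grant = AccessGrant(
    agent="https://agency.example/profile#me",
    resource="https://alice.pod.example/education/diploma",
    modes=frozenset({"read"}),
    purpose="job application",
    expires_at=now + timedelta(days=30),
)

can_read = grant.allows(grant.agent, grant.resource, "read", now)
can_write = grant.allows(grant.agent, grant.resource, "write", now)

grant.revoked = True  # the individual withdraws the grant
can_read_after_revoke = grant.allows(grant.agent, grant.resource, "read", now)
```

In this sketch the recruitment agency can read the diploma resource but not write to it, and loses read access as soon as the grant is withdrawn or has expired. Data that was already copied elsewhere is outside the reach of such a check.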

Flemish Solid Community
The Flemish 'Solid Community' was initiated by the Flemish government to promote cooperation between academia, governments, citizens and the industry (public and private companies) around the development of Solid based PDS [14]. The main objective was to stimulate the acceptance and usage of Solid by end users and service providers, to give individuals more control over their data and to increase data sharing within Flanders and Belgium. To achieve these goals in a responsible and durable way the project focuses on sharing knowledge, creating ecosystems, developing partnerships, executing projects and incentivizing the use of Solid [15].
The community originally operated through monthly plenary sessions where the various stakeholders were represented and given the opportunity to contribute ideas, ask questions and establish collaborations. These sessions initially explored the possibilities of Solid PDS within the domains of healthcare, mobility and culture, with an extensive reflection on how these applications can meet societal needs and challenges. These explorations included both conceptual considerations on what the main focal points should be but also demonstrations of prototypes, as contributed by members, to further the discussion. The topics discussed included user experience, identity and consent management, business models, interoperability and ecosystem architecture, legal and ethical issues, information security and ecosystem governance. This was complemented by various presentations of PDS pilot projects from both private parties and the Flemish government, of which the topics are listed below.
• Mobility PDS: mobility budget management, sharing mobility data to improve traffic management, personalised ride sharing applications, more appropriate mobility services for people with a disability, demand-driven mobility, simplified driving license check for car rentals
• Media & culture PDS: cross-service media curation, personalised media guide, exposure to new cultural content
• Health PDS: involving people in their own health (food, diabetes, exercise, BMI), informed decision making in the context of a pandemic
• Finance PDS: simplified social security application, simplified handling of fines
• Administration PDS: simplifying address changes, giving control over sharing of personal data when applying for a job, more personalised job recommendations
These projects were considered from both a technical and a non-technical viewpoint. In addition, there were reflections on key legal aspects such as the proposed European Data Governance Act [16]. The main challenges that were identified during this process are discussed in Sect. 4.
Based on these insights a governance model was developed to accommodate and optimise cooperation between stakeholders and to allow for the fruition of the community. This entailed operating through four working groups of which two focused on the technological and social dimensions of challenges related to PDS, one on translating these insights to concrete use cases and one on disseminating knowledge to external parties. The frequency of these sessions varied between bi-weekly and bi-monthly. The general principles of the Flemish Solid Community were bundled in a charter and entailed a focus on control over data, centrality of user requirements, stimulating partnerships between stakeholders, providing social added value, providing room for innovation and experimentation, knowledge sharing, transparency and stimulating intercommunity collaboration.

Methodology
In order to reflect on the prominent challenges that were put forward within the community, the notes of the past working group sessions were analysed. These sessions were facilitated and led by the Flemish government, Ghent University and the authors. Information about their contents and the attendees can be found in Table 1. Written notes were created by the authors during these sessions as recording was not possible due to privacy constraints. To compensate for the potential loss of detail and context associated with this method of data collection, presentation slides of the speakers were included in the analysis if available. The analysis was based on the principles of Grounded Theory [17] and divided into three stages. Firstly, the initial concepts were explored by selecting and coding fragments that relate to certain hurdles or challenges with Solid PDS. While describing these fragments, care was taken to stay close to the original wording of the attendees ('open coding'). However, it should be noted that due to the unavailability of a transcription, certain nuances might already have been lost. Secondly, the fragments were compared with the aim of reducing the number of codes and finding overarching categories ('axial' and 'selective' coding). Thirdly, this process was repeated to create an overall typology of the identified categories.
This coding effort structured the identified challenges in four domains: social, technical, legal and network (ecosystem) level challenges. As some concepts were discussed multiple times and from different perspectives, they may be located within multiple domains. This analytical approach was chosen for its ability to identify what issues attendees engage with and how this translates to a general picture.
While the large number of topics discussed within the Flemish Solid Community allowed for a broad analysis that covered a range of domains, this also limited the depth of the analysis. Later publications may focus on reports from a specific working group within the Flemish Solid Community, such as the working group for social dimensions, to allow for a more profound reflection on how these challenges are conceptualised. In addition, the near exclusive focus on Solid PDS in the analysed reports might limit the applicability of the results to other PDS technologies. Further research may consider related developments and technologies to broaden the perspective on Personal Data Spaces. It should also be noted that the analysis depicts the challenges related to PDS as perceived by members of the Flemish Solid Community. The validity of these issues might vary, as might their comprehensiveness. The latter relates to the organic way in which topics for discussion were selected, i.e. community members were able to choose or emphasise certain subjects, meaning that certain challenges might remain out of scope of this analysis.

Challenges Related to Solid PDS
As indicated, the identified challenges could be situated within four domains: social, technical, legal and network (ecosystem) challenges.

Social Challenges
Social challenges are defined as challenges related to limitations in human understanding or to broader societal dynamics of inequality. This might encompass concerns about how users of PDS can be provided with control over their data, what new business models in PDS ecosystems will mean for privacy and integrity and how one can communicate about data management and privacy self-management.

Meaningful Control.
A first challenge relates to how individuals can be provided with control over their data. One proposed contributing factor to this is an intelligible way to provide consent for sharing information. However, it was noted that challenges lie not only with implementing this in a user-friendly way but also with developing interfaces that do this in a meaningful way. The latter refers to considering problems such as privacy fatigue, time constraints and cognitive biases that degrade the extent to which an individual can provide consent in a meaningful way. This issue has been referred to as the 'consent dilemma' [18].
Consent intermediaries were considered as a viable way to counteract this. These refer to services that bring together and simplify consent management while still allowing for some degree of individual control [19]. However, it was argued that such forms of delegated consent and the role of consent intermediaries also require a communication framework to clearly explain their purpose to individuals. Other suggestions to improve intelligibility included embedding consent flows in the content itself or standardizing the flow over multiple applications to support recognizability. Related to this is the concept of providing a holistic overview of all consent decisions made by an individual. While this might contribute to providing more control and transparency, it is unclear how this information might be presented in an understandable way. The importance of a holistic overview of an individual's data was also discussed. Solid PDS allow data to be stored in various locations and among different organisations. Through so-called 'data browser' or 'data dashboard' applications, individuals can gain control over their data in a centralized fashion. However, uncertainty remains as to how, and to what extent, this concept of decentralized data shown in a central way can be conveyed to individuals. It was also questioned to what extent individuals desire control over where data should be stored and retrieved.
These challenges are augmented when derived data is considered, e.g. algorithmically processed personal data, and its impact on the aforementioned measures of control. Not only should it be examined how individuals can determine what algorithms are allowed to process their data, but also how informed decisions can be made based on these data. This might not only entail behavioural choices, such as estimating the risk of meeting a friend in a COVID-19 context, but also insight into how personal data is valued by various stakeholders and how these valuations differ when data is traded individually or in an aggregated manner.

Privacy, Integrity and Inclusion.
A second concern focuses on what new business models in PDS ecosystems will mean for privacy and integrity. It was argued that increased control over larger amounts of data might induce exploitative dynamics where individuals are required to supply more data than today to make use of services. Additionally, this would carry the risk of exclusion for those that are not willing to share the required information. It also remains unclear how the willingness to share data from a personal data store differs by context and by data type. It was suggested that an overarching ethical framework be put in place to manage these issues and to explore further ethical barriers that may limit the adoption of Solid and have undesired societal effects.
Another aspect in consideration was how PDS could work for a diverse group of people, including those with limited access to technology or those in a vulnerable situation. Further research is needed to define what easy access and accessibility might mean for Solid PDS, and how people can be adequately protected against exclusion.

Intelligible Communication and Tangibility. A third challenge targets understandable communication about Solid PDS to the general public. This refers to ways in which Solid can be made tangible to people, for instance through education and storytelling. More broadly it entails questions about how to communicate about data management and privacy self-management. The latter refers to measures that allow people to take control of their own data [18]. A focal point is the decentral nature of Solid. Not only should it be investigated how individuals experience this decentral aspect and how the decentral architecture of Solid should be communicated; it was also questioned how such communication can be uniform when various parties are involved in a decentral ecosystem.
Another set of questions that arose related to the ways in which PDS can be effectively marketed. This encompasses ways to show the added value of PDS and to promote trust between individuals and service providers. These require the identification of relevant use cases and adoption requirements that contribute to user trust. It was argued that user involvement is an important component in the development of a communication strategy.

Network Challenges
Network or ecosystem challenges concern the changing roles in Solid PDS ecosystems and include the identification of business models for adopting Solid PDS for the storage and exchange of data, hurdles with regard to interoperability and standardization, as well as challenges on how responsibilities should be shared and on how governing frameworks can avoid multiple interoperable competing technologies.
Business Models. A primary concern was the identification of business models that make it interesting for both commercial and non-commercial organisations to adopt Solid PDS for the storage and exchange of data (e.g. customer data). It should be investigated how current business models can be adapted to the context of a decentralized market and how these models compare against, and compete with, data silo models. In addition, it was questioned how a tipping point could be reached, i.e. overcoming a chicken-and-egg problem where neither individuals nor service providers are willing to adopt PDS because the other party is not yet present. To this end, it was argued that a framework be developed that both maps current funding methods for PDS projects and aims to involve and support adopting organisations in the early innovation phase, which carries a high risk of failure. This should include a mapping of the various potential stakeholders in a PDS ecosystem, such as pod or identity providers.
It was noted that there is a fair amount of uncertainty as to the competitive dynamics that would exist when such business practices are adopted. This includes how to cope with current dominant market players and what future value exchange will look like, but also more specific inquiries such as what a competitive market for consent intermediaries might look like. Another important aspect is what the separation of data and applications will mean for organisations. It was suggested that the improved access to data might improve the innovative potential of SMEs or contribute to a separation of power in value chains. However, concerns were expressed about the possible establishment of new data silos due to extensive data monetization that limits data access for smaller organisations.

Interoperability and Standardization.
To ensure that the aforementioned data exchanges are able to take place there is a need for standards that enable interoperability. While various local and international standards exist that allow for data exchange, such as the Flemish OSLO initiative for open standards and Linked Data [20], it is unclear to what extent these can be reused for or applied to Solid PDS ecosystems. There was an extensive focus on pod interoperation and pod browser interoperability. The former refers to the standardization of data storage locations and the latter to how data browsing interfaces for individuals can be standardised. It was noted that interoperability with legacy systems and interfaces, as well as other standards, is important to accommodate the adoption of PDS.
In addition, there might be a need to adapt existing standard development frameworks so as to ensure broad support among stakeholders in a decentralized environment. It was suggested that further research is required to investigate what actors might or should become responsible for ensuring such interoperability, but also how cases of non-compliance should be handled, and that interoperability might be studied from a technical, legal, organisational and semantic perspective [21].
Governance Models. It was argued that to combat such challenges, a governance framework that focuses on creating trust among actors is essential. However, it is as yet unclear what factors might fully contribute to this aspect and how this might depend on the mode of cooperation or sector. Governing rules for compliance with Solid standards and the development of new standards were suggested as an important element to further trust within an ecosystem. A governing framework for PDS might include other factors that optimise vertical and horizontal cooperation, such as a mapping of how individuals can cooperate to market their data collectively in so-called data collaboratives [22] and the role of PPPP-models (public-private-people partnership) in a cross-sector data sharing context. In addition, further research might focus on how responsibilities should be shared between various actors and how governing frameworks can avoid multiple interoperable competing technologies to share and control data.

Legal Challenges
Legal challenges point to regional, (inter)national and European legislation issues that might arise when implementing Solid-based applications that allow people to conveniently switch between data storage providers and application providers. These include concerns about data control and portability (e.g. the impact of the upcoming European Data Governance Act), about assuring legal compliance of PDS and about how consent can be delegated, for instance to a consent intermediary.
Data Control and Portability. An important concern was related to understanding what the European Data Governance Act [16] might mean for Solid, and especially for the position that Solid pod providers can take and the functions that they can perform. In addition, there were concerns about where the limits of data portability lie in this context and how the difference between a data holder and a data subject can be communicated clearly to individuals. In addition, there were questions regarding to what extent the right to manage one's data might translate to the duty to manage one's data, meaning that it should be investigated to what extent individuals will be forced to manage their data themselves and whether this is desirable. In relation to this, further research might explore how individuals can be supported in managing their data consciously.
Compliance. Another primary concern was assuring legal compliance of PDS and how this can be guaranteed when various providers are active. This includes assurances for individuals that pod providers are not to have access to the content of a pod, and the ability to verify that data was not used for undesired purposes.
Another important aspect is how fragmented data can be consolidated through Solid technology while remaining in compliance with both the European General Data Protection Regulation and local, regional, federal and international regulation. To this end, questions were raised about legitimate interest, its scope, and what this means for organisations and individuals. In addition, it was posited that research on these regulations should focus on the legal barriers that limit data sharing between governments and private organisations. Such data transactions might include derived data that are algorithmically generated, which pose further questions about their potential use within legal limits. Further concerns were raised about the role of data protection officers in a decentralised context and the alignment of European and local regulatory frameworks and how this impacts stakeholders that operate in a PDS ecosystem.
Lastly it was noted that frameworks should be mapped or developed that not only allow for the ethical, transparent and safe use of data in a lawful manner, but that also exceed the current legal requirements. To this end it was suggested that research efforts should focus on what conformity and ethical labels are required to represent and protect these requirements and on how these might differ across domains (e.g. health and mobility).

Consent.
There also lies a challenge with informing individuals in an intelligible way that, while Solid's focus lies with providing individuals control through consent mechanisms, the right to process one's data might be granted by another legal basis [23]. Associated with this is how individuals can be made aware of what data they are legally required to provide access to and what data they can control more freely through consent mechanisms. Other points of uncertainty included how organisations should manage the withdrawal of consent by individuals when data has already been duplicated, and to what extent and in what contexts these consent withdrawals are a possibility. Uncertainty also remained about how consent can be delegated, for instance to a consent intermediary.
Software Licensing. A last legal challenge concerns the ways in which software contributions or components for Solid PDS can be licensed or shared while protecting the interests of both the organisations that make these components available and the broader ecosystem.

Technical Challenges
Technical challenges refer to the management of identities and achieving information security, scalability and maturity in both the development and deployment of Solid PDS.

Identity and Pod Management.
A first series of concerns is related to identities and how these can be managed when multiple identity or pod providers are involved, and especially how individuals can maintain overview and control in this context. In addition, it is unclear what kind of additional flexible identities, such as a tourist or a short stay identity, might be required and how this differs per domain or use case. This question also relates to how these identities might be extended to internet connected devices such as sensors, wearables or cars and what their requirements are. When regarding identities, there were questions about how Solid's identity mechanisms can be used in tandem with decentralised identity technology.
In addition, it was suggested that research efforts should focus on how pod management or pod browser applications can be made interoperable (e.g. allowing national data to be controlled with local or foreign pod management tools). In this context it was argued that standardization efforts should also extend to delegating the granting of consent to other parties.
Information Security. In terms of information security, it was noted that it would be beneficial to map the current malicious applications or malicious ways of using data and to reflect on how these issues will be handled in the context of Solid PDS. Such research might also include what security standards a pod must adhere to, how this differs per use case and what generic solutions can be used for this purpose. Specific interest was shown in the potential risk that allowing individuals to store verifiable credentials carries.
In addition, it was argued that there should be a focus on techniques that allow conclusions to be drawn from fragmented data sources without copying whole datasets (e.g. multi-party computation) and on other techniques that allow for data minimisation. Attention might be required for handling or preventing data duplication. Furthermore, it was posited that the ways in which end-to-end and other emerging encryption technologies can be linked to the Solid realm should be investigated.
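To make the idea of drawing a conclusion without copying whole datasets more concrete, the following Python sketch shows additive secret sharing, one simple building block from the multi-party computation family. It is a toy illustration under stated assumptions (honest participants, a single process simulating all parties, a public modulus larger than any possible total) and is not a technique proposed by the community or part of Solid; the names `make_shares`, `secure_sum` and `MODULUS` are hypothetical.

```python
import secrets

MODULUS = 2**61 - 1  # public modulus, assumed larger than any possible total


def make_shares(value: int, n_parties: int) -> list[int]:
    """Split value into n random-looking shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values: list[int]) -> int:
    """Simulate parties jointly computing a total without revealing their inputs."""
    n = len(private_values)
    # Each party splits its own value and sends one share to every party.
    all_shares = [make_shares(v, n) for v in private_values]
    # Party j only sees column j and publishes the sum of what it received.
    partial_sums = [sum(row[j] for row in all_shares) % MODULUS for j in range(n)]
    # Combining the published partial sums reveals only the total.
    return sum(partial_sums) % MODULUS


total = secure_sum([12, 7, 30])  # three pods each hold one private number
```

Each individual share is uniformly random on its own, so no party learns another party's value; only the aggregate becomes known. A real deployment would additionally need secure channels between parties and protection against participants who deviate from the protocol.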

Scalability and Maturity.
A last topic under discussion concerns the ways in which Solid can scale and mature. Firstly, this refers to an architecture that can scale to a vast number of pods or resources, for instance through caching or aggregator solutions. Secondly, it considers providing tools for developers and organisations that allow for the implementation of Solid PDS and applications with limited available time. This includes identifying common components that are required by various parties and aligning the architecture with current reference models (e.g. International Data Spaces and European Common Data Spaces). Attention must be paid to how participating organisations might manage the complexity of this architecture, which might include various decisions about whether, where and how to host pods and applications. Lastly, it was noted that more research is required that shows how PDS technologies like Solid perform and why they might be preferred as a solution.

Discussion
The analysis shows that the challenges associated with the development of Solid PDS by the Flemish Solid Community are complex and multidisciplinary in nature. Through their conceptualisation within four domains this paper aims to contribute to the development of an interdisciplinary research agenda for the diffusion of personal data space technologies that are socially robust, ethically justified and both technically and legally supported. However, as there is only a limited reflection on their scientific relevance, further research may focus on how these challenges can be theoretically framed. Such a reflection might contribute to the development of domain specific research agendas that are based on the current needs as experienced by governments, academia and the industry. In addition, further research may focus on whether the current results are applicable in different contexts. This might entail the comparison of Solid PDS challenges with related PDS technologies and data sharing paradigms, such as open data. From a legal perspective, the influence of differing legal frameworks may be considered, while from a network-level perspective, the future relevance of these challenges may be assessed by framing them within emerging data sharing models [24].
It should be noted that although citizens were able to participate within the 'Solid Community', their actual involvement was very limited. As a major goal of PDS technologies entails improving individuals' agency over how their data is used, their involvement during the shaping of a scientific agenda is essential. Further research should augment the current results with the perspective of citizens.

Conclusion
Taking these identified challenges into account, the Flemish 'Solid Community' is well aware of the needs and problems that Solid implementations are currently confronted with. Functioning as a reflexive and interactive platform, the Flemish 'Solid Community' will, as its next steps, formulate answers to these social, network, legal and technical questions confronting practical citizen-centred PDS initiatives in Flanders. The 'Solid Community' partners are therefore in an excellent position to pro-actively help to shape the framework conditions for the further diffusion of socially robust, ethically justified, and legally supported PDS initiatives in Flanders.
The 'Solid Community' takes an interdisciplinary approach as it bundles knowledge and expertise from different disciplines and domains. Interdisciplinarity is then at the core of this community and reflected in the composition of the community. Each partner has strong experience in certain aspects pertaining to Solid ecosystems and brings this expertise together. By approaching the topic in these diverse ways, a more fundamental grip on PDS as a concept is gained. Finally, the 'Solid Community' supports an Open Science-approach as it enables early and open sharing of research and as it involves all relevant knowledge actors including governments, companies, civil society and end users. It acknowledges that research and innovation processes are embedded within societal and political discourses, cultural practices and institutional structures.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.