1 Introduction

An increasing number of government agencies release open data to spur economic growth through the development of new digital services or increased organizational productivity. McKinsey [1] estimates that open data may provide 900 billion US dollars in additional annual economic growth in Europe compared to an economy without open data. To unlock this potential, more and more data sources, ranging from transport to educational data, are available in open data portals throughout Europe. However, the question remains how to further exploit the economic potential of open data. Currently, innovation based on open data sets seems to lag behind the high expectations of policy makers. Hence, previous research took stock of barriers to the publication and use of open data. This paper contributes by taking a capabilities perspective to study how successful open data re-users create both economic and social value out of the available data sources. What can we learn from these frontrunners? The central research question is therefore: “What capabilities are most important to create value out of open data?”

The gap between open data provision and usage is reflected in IS research. To date, most attention has been paid to barriers on the side of open data providers, which are predominantly government agencies [25]. Recently, scholars have started to examine the impediments that open data re-users experience [5]. However, most of these studies focus on the technical barriers related to opening, finding and using data that hinder usage [5, 6]. Several scholars call for more research on the non-technical barriers that open data re-users experience, such as those related to business cases, management support and organizational culture [7, 8]. Only a few studies examine the capabilities that open data re-users need to create economic value from data [9, 10]. Jetzek et al. [9] describe capabilities as “the collective ability of individuals and organizations to use and re-use open data” and focus on access to open data and data literacy. Complementing Jetzek et al. [9], we aim to study open data innovation capabilities related to data re-use at the organizational level.

Our paper contributes to open data research in two ways. First, we synthesize information systems (IS) and innovation management (IM) literature on capabilities for data-driven innovation to develop a conceptual framework. To complement current open data research, this framework includes IT, organizational and people-related capabilities. Second, we present the results of a study of 12 Dutch frontrunners in innovation based on open data to test the conceptual framework. The study includes a questionnaire and semi-structured interviews with the CIOs and/or technical managers of the 12 organizations.

The practical implications of this study are twofold. First, we formulate the capabilities required for organizations that want to start to innovate with open data, e.g. by developing new services or improving their current practices. Firms may use the insights to develop an open data innovation plan that includes technical and non-technical measures. Second, policy makers may use the insights of this paper to develop policy instruments that stimulate data-driven innovation. For example, our research may fuel policy debate about the digital skills that organizations need.

2 An Open Data Innovation Capabilities Framework

A literature review identified IS and IM journal articles and conference papers that study capabilities needed for data-driven innovation. Keywords for our search included combinations of the words e-Skills, capabilities, innovation, “data-driven”, “digital skills”, “data re-users” and barriers. We found 19 relevant studies that examined capabilities for data-driven innovation. The studies are mentioned in the tables below.

We clustered the results of our literature review around three main types of capabilities: IT-related capabilities, organizational capabilities and skills. Each capability type was identified in the literature as influencing the data innovation capacity of an organization. Within each type we distinguish a number of categories and subcategories, presented in the sections below.

In contrast to previous research [10], the selected IS and IM studies point to the importance of non-technical capabilities for innovating with data. This reflects the general notion in innovation management that three quarters of innovation success is related to non-technical factors [11]. Furthermore, many capabilities, for example entrepreneurship and agility, are associated with small or medium-sized enterprises rather than with large enterprises.

2.1 IT Capabilities

With respect to IT capabilities we found three categories that influence the data innovation potential of an organization: (1) Infrastructure and enabling technologies that include all hardware and software that is needed to collect, store, analyze and visualize open data, (2) An IT strategy that includes planning, data management and governance, and measures to secure data, and (3) Interoperability that includes capabilities on working with standards and ability to integrate IS. Table 1 provides an overview of the IT capabilities, how they are defined in literature and a list of studies that mention them.

Table 1. Overview of IT capabilities identified

2.2 Organizational Capabilities

With respect to organizational capabilities we found four categories that influence the data innovation potential of an organization: (1) Strategic capabilities that include top management support for data-driven innovation and the ability to change decisions and policies based on open data, (2) Tactical capabilities that include the mixing of disciplines across organizational silos, allocation of resources, and the ability to adapt organizational processes based on data, (3) Operational capabilities that include entrepreneurship and R&D, and (4) Cultural capabilities that include a culture focused on innovation and agility. Table 2 provides an overview of the organizational capabilities, how they are defined in literature and a list of relevant IS and IM studies.

Table 2. Overview of organizational capabilities identified

2.3 Skills

With respect to skills we found two categories that influence the data innovation potential of an organization: (1) Hard skills, and (2) Soft skills. Hard skills can be defined as the more technical skills, such as programming or data analytics, and soft skills as the more non-technical skills, such as interdisciplinary cooperation and entrepreneurship. Table 3 provides an overview of the capabilities, how they are defined in literature, and a list of relevant IS studies.

Table 3. Overview of skills identified

3 Methods

To answer our research question, we studied 12 leading open data re-users in the Netherlands. We decided to sample cases from the Netherlands because the research team had access to Dutch cases and because the Netherlands has a lively open data community, with governments opening up more and more datasets. However, although the Dutch central government wishes the Netherlands to be an open data frontrunner in Europe, policy evaluations indicate that this still requires significant effort. The open data frontrunners in this study were selected based on their presence as a showcase in national and European open data research (e.g. the former ePractice portal and the European Commission’s Joinup community) or because they have won national and European app awards. Furthermore, we selected cases that vary in size and sector to avoid selection bias. The selection process resulted in a longlist of 17 Dutch frontrunners, of which 12 finally participated in our research. Frontrunner only participated in the survey. The response rate of 71 % can be qualified as very high, which indicates that respondents were highly involved with the topic and therefore more inclined to participate. Our research targeted the frontrunners’ technical managers, innovation managers, CIOs or CTOs. Usually, only one respondent per case was contacted.

We used a two-step approach to measure the capabilities in our conceptual framework. First, we operationalized the capabilities in the framework in a questionnaire that included seven questions on the organization, its application of open data and the capabilities it finds most important. We tested and improved the questions on comprehensiveness. The questions in the survey asked respondents to rate the importance of each capability to the success of creating value out of open data on a five-point Likert scale (ranging from very unimportant to very important). In addition to the closed questions, we allowed respondents to add their own capabilities and explain why they are important. Second, we contacted each respondent for a semi-structured interview within one week after the questionnaire was completed. In this interview, we asked respondents to explain their answers: why are certain capabilities important or not important at all in their view? Interviews were conducted by phone and varied in length from 15 to 45 minutes. The questionnaire (in Dutch) and interview protocol are available on request.
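The scoring step described above can be illustrated with a minimal Python sketch. It assumes that the five Likert answers are coded 1 to 5, that the three middle labels are named as shown (the text only names the two end points), and that unanswered items are skipped. The function name, the label names and the example answers are hypothetical and are not data from this study.

```python
from statistics import mean

# Five-point Likert scale. Only the end points are named in the text;
# the middle labels and the 1-5 coding are assumptions for illustration.
LIKERT = {
    "very unimportant": 1,
    "unimportant": 2,
    "neutral": 3,
    "important": 4,
    "very important": 5,
}


def mean_importance(responses):
    """Return the mean score per capability, skipping unanswered items (None)."""
    result = {}
    for capability, answers in responses.items():
        scores = [LIKERT[a] for a in answers if a is not None]
        result[capability] = round(mean(scores), 2) if scores else None
    return result


# Hypothetical answers from four respondents, not data from the study.
example = {
    "data governance": ["very important", "important", "very important", None],
    "resource allocation": ["neutral", "important", "unimportant", "neutral"],
}

scores = mean_importance(example)
print(scores)
```

Skipping unanswered items means that the average for each capability is taken over the respondents who actually rated it, which is one plausible way to handle item non-response in a small sample.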

Table 4 below provides an overview of the frontrunners that participated in our research.

Table 4. Description of cases

An analysis of the cases shows that two thirds of our respondents have more than 3 years’ experience with open data. About half of our cases are organizations with fewer than 10 employees. Only 2 cases are large organizations.

Most organizations use open data to develop new products or services (91.7 %) or to improve existing products (58.3 %). Half of the organizations aim to develop societal value with open data. Examples of societal value created by the organizations are insight into the quality of schools or into the plans of the local government for a neighborhood. Our respondents focus less on the optimization of internal processes and decision support with open data. This might be related to the relatively small average size of the organizations (mainly SMEs).
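The response rate and the usage shares reported for the sample can be reproduced with simple arithmetic. The Python sketch below assumes that the shares of 91.7 % and 58.3 % correspond to 11 and 7 of the 12 respondents; these counts are inferred from the percentages and are not stated in the text. The helper name pct is hypothetical.

```python
def pct(count, total, digits=1):
    """Return count as a percentage of total, rounded to the given digits."""
    return round(100 * count / total, digits)


# Response rate: 12 participants out of a longlist of 17 (both from the text).
response_rate = pct(12, 17, 0)

# Inferred counts (assumption): 11 and 7 of the 12 respondents.
new_products_share = pct(11, 12)
improve_products_share = pct(7, 12)

print(response_rate)           # 71.0
print(new_products_share)      # 91.7
print(improve_products_share)  # 58.3
```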

4 Results

In this section we describe how respondents rated and explained the importance of IT, organizational and people-related capabilities for the use of open data. The focus of this paper is on the underlying reasons that organizations have for finding certain aspects important in relation to data innovation rather than on obtaining representative scores. This is reflected in the modest number of respondents. Therefore, the figure does not present representative numbers but a reflection of the average of our respondent group (Fig. 1).

Fig. 1. Overview of how capabilities were rated on a scale from 1 to 5

4.1 Information Technology Capabilities

The factors that were found to be most important are data governance (4.40, on a scale from 1 to 5), followed by data management (4.25) and interoperability (4.22). The factors that received the lowest scores were Hardware and Information Systems (3.42) and having an IS strategy and planning in place (3.42).

Looking at a higher level, we can conclude that IT capabilities related to the handling of open data are perceived as most important. First, data governance, which describes the processes and policies that ensure that important data assets are formally managed, is essential for data re-users as it ensures that data can be trusted and that data owners are accountable for risks, for example low data quality. Specifically, the ability to judge data quality was mentioned by most respondents as being essential to evaluate whether data can actually be used. Knowing who the data owner is, how the maintenance of the data is organized and who is responsible for the quality of the data are success factors when innovating with open data. Another factor related to data governance is defining rules on how to deal with new technological developments related to the data, e.g. cloud storage. Second, respondents rated data management, which comprises all the disciplines related to managing data as a valuable resource, as a very important IT capability when innovating with open data. It is important to specify how data are stored, manipulated and processed, in particular when open data are combined with proprietary data for new services. Third, innovation often means that data re-users need to be able to access different datasets and link them in new ways. Hence, interoperability, semantic standards and system integration are important capabilities, or as one of the respondents indicated: “without standards there is no integration”. The ability to handle different open data standards (both open source and proprietary) is important to ensure that data can actually be reused, e.g. that data are readable, up to date and relevant. The availability of standards makes the reuse of open data faster and more efficient. Data re-users need to be familiar with the latest data standards of the data providers, and their information systems need to be compatible with them. These standards are domain specific and require domain knowledge.

Data analytics technologies, such as software and tools, were found to be important, but as they are widely (commercially) available, this factor is not considered a challenge anymore. The same holds for hardware and software, which are seen as a prerequisite for data innovation but are also considered a given and thus no longer a challenge to invest in. Of all IT capabilities, having the right hardware and software is deemed the least important. However, just having the right hardware and software does not guarantee that they are used to their full potential. Hardware and software become an important capability when privacy, security and ethical questions about the use of open data start to arise. With respect to data security, it is important that the infrastructure used ensures that the data is stored securely and that data access is in line with access policies. Last, IT strategy and planning seem to be less relevant to our respondents, a result that might be biased by the relatively high number of small and innovative firms in our sample.

4.2 Organizational Capabilities

The organizational capabilities that were found to be most important by our respondents are entrepreneurial orientation (4.55, on a scale from 1 to 5) and an innovative organizational culture (4.55), followed by the organizational capability to develop new ideas (4.45). The organizational capability that was perceived as by far the least important for data-driven innovation is the organization’s capability to efficiently allocate resources (3.36).

We divided the organizational capabilities into strategic, tactical, operational and cultural capabilities. Based on the questionnaire and interviews we conclude that the operational and cultural organizational capabilities are considered to be most important when innovating with open data.

Two thirds of our respondents indicated that an entrepreneurial orientation is very important and that, similar to other types of innovation, it comes down to entrepreneurship. This result may be influenced by the fact that the majority of our respondents are smaller, innovative and often more entrepreneurial organizations. To create value out of open data, organizations need teams that are capable of making links between different subjects, developing a network inside and outside the organization, recognizing opportunities that open data sources offer and showing willingness to exploit these ideas. While this capability will come back later as an important skill, it also requires an organizational culture which supports innovation and entrepreneurial thinking, e.g. by keeping short lines between different management levels and working in small teams. Consequently, respondents indicated that organizations, in order to innovate, need to provide an environment where experimenting with data is encouraged, as this is a creative process that may take a while before it creates value. People in organizations with such a mindset think in terms of possibilities and not in terms of barriers. They are not afraid of potential hurdles and challenges but just get started. These organizations are often data driven, have a long-term vision and have a culture where innovative teams can play around with data and failure is acceptable. It is best to start an open data innovation project with a small team that has a strong mission and believes in open data innovation. In time this team can take small steps and expand with more and more people who share the same mindset. Moreover, an innovative culture implies that the costs of the innovation, e.g. data applications, are positioned as costs for the business, not as costs for the IT department. Thus, making the business responsible for data innovation is essential for its success. As the context and challenges around the data might constantly change, it is essential to have a learning, agile organization that can quickly adapt to changes in the provision of data.

Multidisciplinary teams are crucial in the open data innovation process, as organizations need both people with technical knowledge and people who understand the domain context of the data and the needs of potential customers. Teams need to understand the bigger picture of the data and be able to talk across boundaries of systems and organizations. Multidisciplinarity helps teams to understand technically complex concepts and translate them into something that is visually appealing.

Our respondents indicated that central leadership in itself is less important for innovation, but that having a mandate and budget from top management is very relevant for the success of an open data innovation. Although innovation initiatives and ideas are most likely to develop at the business level, it is very important to have a data-driven vision at board level to move from an experimental phase to a next phase in which innovations are further deployed and implemented. Our interviewees further mentioned that organizations should use successful examples as inspiration to stimulate the re-use of open data in their own organization. Central coordination of resources and activities, as well as asset orchestration, was found to be the least important in supporting the innovation. Most of the small innovative and entrepreneurial organizations in our sample did not recognize this capability as being important. However, coordinated actions might be valuable for organizations to match the supply and demand of data and stimulate the dialog between data owners and re-users. The management of an organization might want to play a catalyst role in this.

We find that the larger organizations in our sample find it important that the overall organization is ready and willing to support processes and decision-making based on open data. If an organization really wants to innovate with data it should let go of its classical decision making process and use the results of its data innovation to form its own decisions and policies.

Relationship management is a capability that was not initially considered based on our literature analysis and questionnaire, but based on the interviews the ability to engage with external stakeholders (e.g. end-users) is found to be important to be able to innovate. Often, the availability of open data sets is a first step, and close collaboration between the data provider and user is necessary to take the next steps and create value. As one of the respondents said: “Do not think in organizational silos but look across organizational boundaries when innovating with data.”

4.3 Skills

The skills that received the highest scores are data analytics skills (4.67 on a scale from 1 to 5) and entrepreneurship (4.64), followed by general e-literacy (4.45) and the capability of people to cooperate between disciplines (4.45). The skill that received the lowest score, and was found to be less but still important when innovating with open data, is data visualization and reporting (3.83). This shows that a successful organization or team consists of people who have data analytics skills in combination with entrepreneurship skills and who are able to combine knowledge from several disciplines.

Regarding technical or hard skills, our respondents foremost stressed that it is important to “employ people who can detect patterns in data, who are creative and have high quantitative skills”. Even though respondents valued a team with a diverse set of skills, most team members should have a basic understanding of data (e.g. experience with working with large databases) and good analytical skills to actually make technical sense out of the data. One respondent noted that “it is more important to have mathematicians than having only programmers. The latter you can simply hire”.

General IT skills, such as programming, are still important when you innovate with open data: “if you do not have your IT Skills in order, then you are not part of the game anymore”. However, IT skills are merely a qualifier, whereas a deep understanding of and creativity with data give a competitive advantage. Consequently, open data re-users need employees that have R&D skills to experiment (or ‘play’, as respondents described this skill) and test how different data sources can provide value or insights about the problems they want to address. More specifically, employees need to think in large distributed systems and make complex data combinations (from both open and closed data sources).

Regarding non-technical or soft skills, having employees who are team players is key. Respondents valued employees who are open to the ideas of others and who are willing to learn. These quick learners need to be good networkers with a mindset to create new ecosystems, who are able to set up new collaborations with new parties and to defend the business decisions they have taken based on data. Additionally, it is important to have people with the capacity to think in terms of customer needs. Business models based on open data are often still a challenge. For example, one of the respondents sees applications based on open data as a showcase to get to know new customers and sell additional data services. Hence, it is important to start early in the process with the identification of customer needs and business potential.

5 Discussion

Comparing the three types of capabilities, we see that all were found to be important, but the emphasis differs. On average, skills are valued the highest (4.27), closely followed by organizational capabilities (4.19). IT capabilities are rated the lowest by our respondents, with a score of 3.86. This may indicate that investing in technology and other IT-related capabilities that allow you to process and analyze data may not be enough when you want to create value from the re-use of open data. It seems even more important to focus on skills and organizational capabilities. For example, a lack of employees with the right skills is the biggest barrier to open data innovation.
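The comparison of the three capability types can be made explicit with a short Python sketch. It only uses the three average scores reported above; the function and variable names are hypothetical. Given the small sample, the gaps it prints are descriptive and should not be read as statistically tested differences.

```python
# Average scores per capability type as reported in the text (scale 1 to 5).
reported_means = {
    "skills": 4.27,
    "organizational capabilities": 4.19,
    "IT capabilities": 3.86,
}


def rank_by_mean(means):
    """Return (name, score) pairs sorted from highest to lowest score."""
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_mean(reported_means)
top_name, top_score = ranking[0]

for name, score in ranking:
    # Gap to the highest-rated capability type, shown with two decimals.
    gap = top_score - score
    print(f"{name}: {score:.2f} (gap to {top_name}: {gap:.2f})")
```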

Although we address the three capabilities in isolation, there are many interdependencies between the capabilities. Future quantitative studies with a higher number of respondents may conduct factor analysis to cluster capabilities into a more coherent set of open data innovation capabilities. In addition, some capabilities are present on multiple levels. For example, entrepreneurial orientation (organizational) and an entrepreneurial mindset (people) are very similar and probably overlap. Respondents stress the importance of an emergent innovation strategy: you need to keep your goal in mind, but how to get there and what IT resources are needed may change rapidly. Organizations need to know where they are going; however, this does not need to be planned completely in an IT strategy. This finding may contradict IS strategy literature that emphasizes the importance of a well-planned IT strategy, which may be more important to large organizations. SMEs employ more emergent and flexible strategies. Furthermore, we find that frontrunners focus strongly on open innovation: networks are important in the innovation process to make optimal use of partners’ competences and to be able to complement each other in the open data innovation process.

While open data is in some respects different from internal, proprietary organizational data, some experts expect that the barriers related to open data are largely the same. However, open data may be less sensitive to privacy risks than proprietary data. The value of open data may be the highest when open data sets are combined with proprietary datasets. Such a combination can bring new insights which would not have been possible without open data.

The small number of organizations studied and the focus on the Netherlands are limitations of our research, which do not allow us to generalize the results to a larger number of organizations in different national contexts. The surveys and interviews, however, provide interesting insights for future research on the re-use of open data. Future research may expand the number of cases in different countries over a longer period of time. A longitudinal study is required to study how dynamic capabilities may change over time, for example along the maturity of the data innovator.

6 Conclusion

This paper aims to study which capabilities are required to innovate successfully with open data. We present an open data innovation capabilities framework based on a literature review on capabilities for data-driven innovation. Our analysis of 12 Dutch open data frontrunners reveals that, for innovation with data to happen, it is not enough for government agencies to make more and more open data available and for data re-users to focus on IT capabilities.

Theoretically, we contribute by offering a capabilities perspective on open data innovation that complements earlier studies on the barriers to data availability and re-use [25]. As data-driven innovation seems to require entrepreneurship, open data research may benefit from theories in entrepreneurship literature, such as dynamic capabilities or causation/effectuation, to better explain the value creation process [27].

Practically, digital skills and digital entrepreneurship policies are as important as persuading governments to open up data. Policy makers may need to set up stimulation programs that are aimed at educating more employees and entrepreneurs with the right set of skills. Organizations aiming to create more value out of open data may apply the following lessons from frontrunners: (1) Set up a multi-disciplinary team with motivated and creative employees, give this team the mandate to experiment with data, let them formulate an emergent data strategy and get top management support to create a stable innovation environment. (2) Take an entrepreneurial approach: Recognize and exploit opportunities and strive for an open and experimental culture. When successful, the start-up team may motivate and engage more teams within the organization. (3) Think outside organizational boundaries and silos. Set up new inter-organizational networks with complementing skills and experience. The social interaction between employees from a variety of organizations and domains may inspire innovation. Furthermore, it is essential that data users and suppliers keep discussing their data needs and ensure that open data is more reliable and usable for external stakeholders, decreasing the risks for data innovators regarding data quality and availability.