Fields such as Big Data, Data Analytics and Data Science have drawn considerable attention from industry. As industry's demand for data skills keeps growing, a central challenge in boosting the data-driven economy in Europe is bridging the gap between these industrial needs and the availability of skilled data scientists.
The popularity of data-oriented fields has led to a plethora of university degrees and online courses that offer a wide range of skill sets to aspiring data scientists. As a result, the data skills needed by industry can be acquired through formal learning (e.g. undergraduate or graduate university degrees) or non-formal learning (e.g. e-learning or professional training).
Nevertheless, an abundance of resources does not by itself create a direct link between industry and future data scientists. Bridging the gap therefore raises the following challenges:
- Given constant technological and societal change, industrial needs may shift quickly; it is therefore vital to identify current industrial needs and trends and to adjust educational offerings accordingly.
- Given the plethora of available formal and non-formal programmes, a platform and living repository is needed to give prospective data scientists, and professionals who want to enhance their skills, more targeted and filtered access to these resources.
- A programme is needed to recognise the skills that data scientists acquire through both formal and non-formal education.
- A framework is needed to align current industrial needs with the Data Science curricula and skills offered by formal and non-formal institutions.
This chapter explores how Europe could build a strong and vibrant big data economy by tackling the challenges above, building on the benefits that educational institutions and existing skills recognition initiatives already offer. Specifically, it presents two steps towards this goal: the creation of the Big Data Value Education Hub (EduHub) and the Big Data Value (BDV) Data Science Badges and Labels.
The EduHub is a platform that provides access to Data Science and Data Engineering programmes offered by European universities, as well as to on-site and online professional training programmes. Its aim is to facilitate knowledge exchange on educational programmes and to help meet current industrial needs.
The BDV Data Science Badges and Labels are programmes that recognise skills acquired through formal and non-formal education, respectively. In their initial stage, the badge types and system requirements were defined by leveraging existing work from the European Data Science Academy (EDSA) and EDISON projects, two European Union (EU) projects related to Data Science skills. The programmes were later enhanced by gathering feedback from academia and industry and by proposing methodologies that bring together interested stakeholders from both sectors to design, deploy and evaluate the badges and labels and to provide feedback on them.
The chapter also offers a practical view of how the platform and the skills recognition programme can work both individually and together to bridge industry and academia. This is illustrated through a pilot of the BDV Data Science Analytics Badge, currently issued by two universities, and by showing how the badges and the educational programmes that issue them can be accessed in the EduHub.
1.1 The Data Skills Challenge
To leverage the potential of BDV, a key challenge for Europe is to ensure the availability of highly and appropriately skilled people who have an excellent grasp of the best practices and technologies for delivering BDV within applications and solutions (Zillner et al. 2017). In addition to meeting the technical, innovation and business challenges laid out in this chapter, Europe needs to systematically educate people so that they have the right skills to leverage BDV technologies and thereby put best practices into effect. Education and training will play a pivotal role in creating and capitalising on BDV technologies and solutions.
There is a need to jointly define the profiles required to cover the full data value chain. One main focus should be on needs linked to company size: start-ups, SMEs and large companies each have their own Data Science requirements. We distinguish between three profiles, covering (1) the hardware- and software-infrastructure-related part, (2) the analytical part and (3) business expertise.
The educational support for data strategists and data engineers is, however, far too limited to meet industry's requirements, mainly because of the broad spectrum of skills and technologies involved. By transforming the current knowledge-driven approach into an experience-driven one, we can meet industry's need for individuals capable of shaping the data-driven enterprise. Furthermore, current curricula are highly siloed, which leads to communication problems and suboptimal solutions and implementations. The next generation of data professionals needs a broader, cross-disciplinary view in order to deliver the data-driven organisation of the future:
- Data-intensive engineers: Successful data-intensive engineers master data storage and management. They are experts in distributed computing and computing centres, and hence mostly work at the level of advanced system administrators. They have the know-how to operate large clusters of (virtual) machines, configure and optimise load balancing and organise Hadoop clusters, and they are familiar with technologies such as the Hadoop Distributed File System (HDFS) and Resilient Distributed Datasets (RDDs).
- Data scientists: Successful data scientists require solid knowledge of statistical foundations and advanced data analysis methods, combined with a thorough understanding of scalable data management and its technical and implementation aspects. They are the specialists who can deliver novel algorithms and approaches for the BDV stack in general, such as advanced learning algorithms and predictive analytics mechanisms. As data-intensive analysts, they need to be able to communicate with data-intensive engineers while being relieved of system administration tasks, and they need to understand how to translate problems into appropriate algorithms, which may require slight modification. Data scientists benchmark, select and optimise these algorithms to reach a business objective, and they must be able to evaluate the results obtained following sound scientific procedures. A data scientist curriculum would ideally provide enough insight into the Data Engineering discipline to steer the selection of algorithms not only from a business perspective but also from an operational and technical one. For this, Europe needs new educational programmes in Data Science and, ideally, a network between scientists (academia) and industry that fosters the exchange of ideas and challenges.
- Data-intensive business experts: These are the specialists who develop and exploit techniques, processes, tools and methods to build applications that turn data into value. In addition to technical expertise, data-intensive business experts need to understand the domain and the business of the organisations they serve. They therefore bring in domain knowledge and work at the intersection of technology, application domains and business; in this sense, they form the link between technology experts and business analysts. Data-intensive business experts will help turn the development of big data applications from an "art" into a disciplined engineering approach. This will enable the structured, planned development and delivery of customer-specific big data solutions, starting from a clear understanding of the domain and of the customer's and users' needs and requirements.
To meet the skills challenge, it is critical that industry works with higher education institutes and other education providers to identify skill requirements, which can be addressed by establishing:
- New educational programmes based on interdisciplinary curricula with a clear focus on high-impact application domains.
- Professional courses to educate and re-skill/up-skill the current workforce with the specialised skill sets needed to become data-intensive engineers, data scientists and data-intensive business experts. These courses will stimulate lifelong learning in the data domain and the adoption of new data-related skills.
- Foundational modules in Data Science, Statistical Techniques and Data Management within related disciplines such as law and the humanities.
- A network between scientists (academia) and industry that leverages innovation spaces to foster the exchange of ideas and challenges.
- Datasets and infrastructure resources, provided by industry, that enhance the industrial relevance of courses.
1.2 Formal and Non-formal Learning
To provide stronger educational support for tackling the skills challenges defined above, both formal and non-formal learning can be considered, since both contribute to the lifelong learning of data scientists, that is, their continual training throughout their careers. While formal systems often focus on initial training, a lifelong learning system must combine a variety of formal and non-formal learning. This is necessary to meet the individual's need for continuous and varied renewal of knowledge and industry's need for a constantly changing array of knowledge and competences.
Here, we consider non-formal education to include any organised training activity outside formal education (i.e. undergraduate or graduate university degrees). It covers both e-learning and traditional professional training. Such courses vary widely in duration and include training provided by employers, traditional educational institutions and other third parties.
In Data Science, non-formal education therefore plays a crucial role, complementing formal training by allowing practitioners to up-skill and re-skill as Data Science requirements evolve.