Abstract
The SchoolWork Alternation (SWA) programme was developed (under a European Commission call) to bridge the gaps and establish a welltuned partnership between education and the job market. This work details the development of the AI4Citizen pilot, an AI software suite designed to support the SWA programme. The AI4Citizen pilot, developed within the H2020 AI4EU project, offers AI tools to automate and enhance the current SWA process. At the same time, the AI4Citizen pilot offers novel tools to support the complex problem of allocating student teams to internship programs, promoting collaborative learning and teamwork skills acquisition. Notably, the AI4Citizen pilot corresponds to a pipeline of AI tools, integrating existing and novel technologies. Our exhaustive empirical analysis confirms that the AI4Citizen pilot can alleviate the difficulties of current processes in the SWA, and therefore it is ready for realworld deployment.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
Citizens daily engage with the public sector in a variety of domains and with different needs. Finding the correct contact and associated procedure, understanding and following the instructions and procedure in the proper manner and making the right decision when several alternatives exist takes much work. Furthermore, conveniently guiding citizens is challenging for public servants, who typically lack decisionsupport tools to assist them. As a result, the time devoted by public servants to public services makes them costly. Nevertheless, more is needed because the limited availability of public servants to guide citizens hinders the quality of service.
A similar case stands for public services offered in the realm of education, which is the target domain of this paper. In particular, here we focus on Alternanza scuola lavoro or SchoolWork Alternation (SWA) programme leveraging the knowhow of Fondazione Bruno Kessler (FBK) who is a partner in the organisation of SWA in Trentino. The European Commission calls for reinforcing the partnership between educational institutions and the job market [1]. The aim is to support the development of skills for the employability of new generations, pointed out as a collective responsibility within educational and training contexts. Following European recommendations, Italy started a training programme for students of every high school, for whom it is compulsory to spend a significant number of hours within workplaces during the three last years of their high school course^{Footnote 1}. This scheme is called the SchoolWork Alternation programme and is described by the Italian Law 107 / 2015: it involves circa 1.5 million pupils ranging from 15 to 19 years old. As argued in [2], the SWA programme creates transformative learning spaces by spurring teachers to investigate how to approach formal learning to realworld needs and providing students with experiences that help them develop both their technical and soft skills (Fig. 1).
Nowadays, implementing the SWA programme requires much manual intervention from public servants in schools. On the one hand, students are typically interviewed to guide them in their choice of internships on offer. Thus, because of limited resources, providing personalised guidance to students is suboptimal and very timeconsuming. On the other hand, schools are challenged with the intricate task of matching internship offers with students’ competencies and skills so that they can ultimately assign students to internships while considering students’ preferences. Therefore, more decisionsupport tools need to be devised to aid school personnel in guiding students and producing allocations that satisfy both students and companies offering internships.
Besides that, the current SWA practice individually allocates each student to some internship. This practice hinders the full potential of the programme for two reasons. First, note that teambased, cooperative learning has been shown to tremendously succeed as a learning method in education (as evidenced by e.g. [3,4,5]). Such a finding has recently spurred research on Artificial Intelligence (AI) algorithms [5,6,7] and AIbased systems to compose teams in education scenarios (e.g. Teammaker within CATME [8], or Eduteams [9]), as well as on empirical studies (e.g. [7, 10]). Furthermore, since one of the aims of the SWA programme is to promote the development of students’ soft skills [2], working in teams would foster the learning of the soft skills that are considered crucial for collaboration and teamwork [11].
In this context, in this paper, we illustrate the development of the AI4Citizen pilot, a software prototype designed as a solution to the shortcomings of the SWA programme. The pilot, developed in the context of the H2020 AI4EU project [12], provides AI tools to automate the SWA programme’s current process while introducing new tools to facilitate teambased learning and the acquisition of teamwork skills. Notably, the AI4Citizen pilot involves pipelining a range of AI technologies, including (1) NLP algorithms to extract competencies and skills from students’ curricula and companies’ internship offers and to match them; (2) a chatbot to assist students in selecting internships; and (3) a novel algorithm to group students into teams and allocate them to internships. Specifically, the AI4Citizen pilot tries to make headway in supporting the SWA program with the following contributions:

An NLPbased tool to match students with internships. The main challenge is ensuring that both the employers and the job applicants speak the same language when describing the required skills and competencies. The descriptions are often in free text format, making it difficult to automatically match the job requirements with the applicant’s qualifications and experiences. To tackle this issue, we have developed a tool to bridge this gap by utilising ESCO [13], a multilingual classification system of European Skills, Competences, Qualifications, and Occupations. This system encompasses a taxonomy of 13,485 competencies and 13,485 jobs linked with relationships and is designed to map the free text descriptions of the job requirements and candidate experiences to a standardised language.

A chatbot to assist students. In order to develop the AI4Citizen chatbot, we used SAP Conversational AI, an endtoend collaborative platform for creating chatbots. Our chatbot understands 28 different types of students’ intentions related to gathering information about the process and expressing and reviewing preferences. The NLP engine recognises the intents and triggers suitable skills to interact with the end users. These skills interact with a dedicated component interacting with the database minimising the need to exchange personal information outside the FBK perimeter.

An empirical evaluation of the chatbot usability. We conducted a user study to evaluate our study. Students from three different classes, 55 students in total, participated in the study. During the evaluation, each student was supposed to single out three internships (wishes) and understand the internship context. In the first class, \(10\%\) of the students did not manage to complete the evaluation due to a default which thereafter was fixed. The students (90%) who completed the evaluation needed less than half an hour (30 minutes) to do so, and they reported being reasonably satisfied with the experience. Experience satisfaction was improved due to fixes and vocabulary enrichment based on former interactions. However, the main criticism reported was regarding the overguidance offered by the chatbot. Instead, the users would prefer a free interaction. These findings should be taken into consideration in the future. At the same time, further experiments should be conducted to help us understand the impact on task completion and the time to complete the global mission.

An empirical evaluation of the team allocation algorithm. We conducted a twofold evaluation to confirm the usability of our proposed algorithm. First, we pitched Edu2Com against a stateoftheart linear programming solver, CPLEX [14]. The results of solving synthetically generated instances of the matching problem showed that Edu2Com outperforms CPLEX in solving time. Second, we tasked Edu2Com to solve large, realworld instances (involving real students’ profiles and internship descriptions) of the problem. Our proposed algorithm can handle the problem and find a solution, while an optimal solver such as CPLEX cannot generate the necessary encoding within a reasonable time. Thus Edu2Com solves large problem instances that CPLEX cannot handle.

An expertdriven validation of the team allocation algorithm. Finally, we tasked educational experts with experience in allocating students to internships to assess allocations produced by Edu2Com against allocations manually produced by experienced teachers. The results indicate that our algorithm is the one of choice to solve the problem.
The rest of the paper is organised as follows. Section 2 presents and analyses our case study, Section 3 outlines the overall architecture of the AI4Citizen pilot, Sections 4, 5 and 6 describe the core AI components of AI4Citizen, Section 9 studies the potential business value of AI4Citizen, and Section 7 empirically evaluates the core components of our systems. Finally, Section 10 draws conclusions and sets paths to future research.
2 The Alternanza Scuola Lavoro case study
An essential aspect of innovation is exploiting information technology to ease people’s everyday life. In this work, we followed SAP’s best practices and adopted a Design Thinkingled development process (see Fig. 2) to discern users’ needs and collect users’ feedback.
Scope. As described in the introduction, the AI4Citizen pilot focuses on the SWA scheme. The scheme involves all students from high schools and technical institutes during the last years of their curriculum. The goal of the scheme is to provide practical experience to students, to help them to consolidate the knowledge acquired at school, to test their attitudes on the field, to enrich their training and guide their study path, and, ultimately, to help them to choose their future career path.
Research. In the research phase, we focused on understanding how our users interact with the system and their individual needs. To better understand the specifics of this case study, we run a series of interviews with schools’ and companies’ stakeholders leveraging the central role of FBK in the organisation of SWA in Trentino. We eventually derive the following four personae [15]:
 Ludovica:

she is an 18yearold student; while attending the fourth class of the classic high school, she is looking for an internship where she can experience team working. She has found a job offer, which seems suitable. However, she does not exactly understand which skills the company requests—Fig. 3 shows a detailed view of Ludovica’s persona.
 Arnoldo:

he is a teacher 61 years old; he manages the SWA office at his Institute, sharing this duty with a colleague; they help more than 300 students every year first to find out a suitable internship; then, they follow the initial administrative work, resolve problems during the internship, and at the end, collect students’ final report and tutors’ evaluation form.
 Carolina:

she is a 48yearold assistant in the internships office at a large research centre; she takes care of the relationships with local high schools, interacts with tutors like Arnoldo when they have to select a team of interns, and chases researchers to stimulate them to prepare new internship offers. Carolina also supports researchers in preparing the final evaluation reports.
 Rosanna:

she is a 37 years old busy researcher; every summer, she looks for a team of two to four students able to help her with pet projects, also aiming to transmit her passion for scientific research work.
Synthesize. Following the interviews with teachers, researchers, and parents, we focused on Ludovica’s and Arnoldo’s experiences to synthesize the main scenario identifying their main pain points. The following paragraph, the scenario asis, describes the typical interaction between our student Ludovica and the reference teacher Arnoldo nowadays.
Arnoldo is the reference teacher for the SWA for a set of classes. During the past weeks, he has already updated a list of all the internship offers received, classified, and published it on the school’s website. He has also sent several emails to all the students and their parents during the previous month concerning new internship proposals. First, Ludovica checks her mailing box to collect all these emails and then scrolls the internship list on the school website. She also asks a friend of her family, who owns a hightech company, if he can host her for an internship. He answered positively to Ludovica. After this work, she prepares a list where she writes the internship proposals that seem interesting to her. In this list are two internships: an internship extracted from one of the emails sent by the reference teacher and the internship in the company of a friend of Ludovica’s family. At this point, Ludovica would like to discuss the two different options with Arnoldo, the teacher who manages SWA at her school. Ludovica should make an appointment with him. They set up a meeting during which Ludovica would like to interact with Arnoldo to understand the internship goals and objectives better and determine if the family friend internship can be performed. The discussion should focus on understanding how the different internships match Ludovica’s competencies, objectives, weaknesses, and strengths. Unfortunately, Arnoldo has to manage thousands of questions from students and can dedicate only a fraction of the needed time to tackle the different questions, which eludes the family’s friend’s internship question. Ludovica feels totally lost in the process and fills out a form with three preferences. A few weeks later, while Arnoldo is trying to make an educated choice after receiving the preferences of all students, composing the teams for the different internships looks to him like solving a complex jigsaw but without knowing the image to compose! Still, Arnoldo remembers the discussion with Ludovica, and he finds out that an apparently good global solution enables her to get assigned to her second choice. Apparently, the young Ludovica is frustrated, especially when Arnoldo cannot precisely explain how he figures out these internships’ matchings fighting with his spreadsheet.
Following this return on experience from the current process, and in order to enhance the student and referent experience, we decided to focus on two specific problems, which are particularly timeconsuming and not supported by the current version of the Vivoscuola portal^{Footnote 2} operated by the Autonomous Province of Trento:

1.
How to best inform and guide Ludovica in her internship quest with a teenagerready conversational interface?

2.
How to support Arnoldo in building teams of complementary interns while optimizing the acquisition of new skills and their personal wishes?
Ideate. The following descriptions, the scenario tobe, explain how AI will modify Ludovica’s quest for the perfect internship and support Arnoldo to dedicate more quality time.
An announcement was made on the school billboard, informing students that it was time to start thinking about their SWA options. Students were informed that they could use social media channels like Facebook and Twitter to search for and enrol in internships. These channels were securely connected to the SWAFBK system, allowing it to access information about each student, including their competencies, previous internship experiences, and past year statistics and testimonials. Ludovica decided to connect to the chatbot using her SWAFBK credentials. The general chatbot asked her questions to determine her interests and preferences, using both the information stored in the SWAFBK system and additional information provided by Ludovica. The chatbot provided Ludovica with answers to common questions. At the same time, it redirected some questions about a specific internship offer to Arnoldo, who receives a small portion of questions so he is able to answer in due time. The chatbot proposed several internship programs aligned with Ludovica’s interests and preferences and passed this information on to the team formation algorithm. After a couple of hours, Arnoldo received the team assignment provided by the algorithm. Although Ludovica had expressed a strong desire to be assigned to her first choice, Arnoldo realized that this constraint decreased the overall assignment score and the fit of five teams, including Ludovica’s. As a result, he followed the algorithm’s recommendation and assigned Ludovica to the internship that was her second choice, explaining the rationale behind the decision to her.
3 A blueprint of the AI4Citizen architecture
Before proceeding with the implementation details of each component, we detail in this section the overall architecture of the AI4Citizen pilot. The architecture of our pilot is funded on pipelining the different intelligence technologies: the competence and skill extraction tool, the chatbot assistant and the team formation algorithm. At the same time, these technologies are to be integrated with FBK’s existing IT system that manages students’ profiles (information regarding their studies and past activities) along with internship program offers. As a result, we deliver a prototype implementation of our endtoend AI4Citizens pilot. In a nutshell, Fig. 4 illustrates the pilot’s architecture, which consists of the entities below:

A module for interfacing with and integrating from the external — preexisting — an IT system that contains and manages the information regarding students’ profiles, their activities, and experiences, etc.

A module for interfacing with and integrating from the external IT system that contains and manages job/ internship offers information.

Data management services to facilitate the “normalization” of the data in terms of vocabularies and taxonomies for the purpose of the offer/candidate matching. More specifically, the two key services here are (i) skill matching to associate an entity with the skills and competences in the ESCO ontology [13]; and (ii) a multidimensional classification of internships to guide the selection process. The latter service characterizes offers in different practical ways (e.g. activity domain, geographical distribution, context (e.g., private vs public hosting entities), etc.).

A team formation service to match teams of students to internships taking into account competencies, preferences, and availability in a holistic, crossorganizational manner.

An internship browser that brings this information together, exposing different APIs for searching, matching, and selecting offers, as well as for storing preferences and matching teams to available offers.

A chatbotbased Natural Language Processing (NLP) and User Interaction (UI) service and its associated chatbot logic component that make the AI4Citizen pilot available to the main stakeholders, in particular to students and teachers. While the role of the UI is to make AI4Citizen directly available on the web or a cell phone, the chatbot logic drives the selection process, answers frequently asked questions and helps collect students’ preferences.
The architecture of the AI4Citizen pilot considers two execution environments. On the one hand, the SWA FBK environment runs the SWA system prior to AI4Citizen. It is in charge of storing and handling the schools’ and companies’ data. On the other hand, the AI4EU environment hosts the AI4EU services required to enable an AIbased SWA system, namely competence and skill extraction, team formation, the chatbot, and the internship browser. The two environments, SWA FBK and AI4EU, provide strong data encapsulation, limiting the interactions with the new components to a set of restrictive APIs to avoid the usage of any Personal Identifiable Information and to respect the term of services between FBK and the Trento province.
Our implementation follows the principles of CloudNative Applications. It uses a containerbased component model, where each separate microservice is deployed in a Docker container on top of the Cloud infrastructure. The Chatbot NLP and UI component run on the SAP cloud and interact with the SAP logic thanks to a REST API over a secure channel.
The following sections propose a detailed description of the main components (Table 1).
4 Competence and skill extraction
Every day thousands of people look for a job and share their curriculum vitae using different channels, like social networks, leasing companies, etc. In the meantime, a similar number of companies look for workers and use the same channels to share the requirements for their open positions. In the context of the scenario presented in this paper, a similar situation arises between the students looking for internships and the companies searching for candidates with appropriate skills and competencies.
The fundamental problem here is to make the two parties speak the same language when it comes to characterising the skills and competencies required by the companies and those owned by the candidates obtained through the activities and experiences they have carried out during their careers and/or studies. Indeed, frequently the descriptions of experiences and school curriculum are expressed in natural language, making it challenging to match requests and offers in an automated manner.
To address this issue, we have defined and developed a tool that bridges the gap between the required and provided competencies when expressed as a free text description.
To reach this goal, we use two existing resources, (i) the ESCO Ontology [13] and (ii) the FastText software [16]. ESCO is the multilingual classification of European Skills, Competences, Qualifications and Occupations. It contains a taxonomy of 13,485 competencies and 13,485 jobs connected with relations. It has been developed as a part of the Europe 2020 strategy. It identifies and categorises skills, competencies, qualifications, and occupations relevant to the EU labour market and education and training. It systematically shows the relationships between the different concepts described in 27 European languages. It has been recently adopted in various countries and regions to describe the professional context in the corresponding territories unambiguously. We use the ESCO ontology as a basis for the description of the job and internship offers as well as for characterising candidates’ skills and experiences.
The latter resource, FastText, is a stateoftheart technology developed by Facebook, which computes the semantic similarity between two text spans. It is released with an opensource license and is available in over 100 languages.
FastText is an evolution of word2vec [17], a word representation model where each word is an object in a vector space, and its position is optimised for the task of predicting the surrounding context.
The main difference between the two models is that the representation \(v_w\) of a word w is not only the representation of its symbol, but it is augmented with the sum of the representations of its subword units:
where \(\mathcal {S}\) is the set containing some character ngrams contained in w, \(u_w\) is the vector representation of the whole word w and \(u_s\) are the representations of the subwords.
Combining the two resources, we have developed the socalled Competence and Skill Extraction (CSE) tool, an AI component that provides the following functionalities:

using FastText, we compute the semantic similarity between plain text (from a job description or CV experience) and the competencies appearing in the ESCO ontology;

we then rank the identified competencies using this similarity value; and

according to the extracted ranking, we suggest the most appropriate candidate competencies for the description.
As a positive side effect of using native multilanguage resources, the tool is automatically able to compare texts written in different languages. For example, the text “application of rules and scientific methods to solve problems” can be semantically connected to the competencies “develop strategy to solve problems”, “plan activities to accomplish specific goals”, and “problem solving”. A similar result can be obtained by replacing the English text with the Italian one, “applicare regole e metodi scientifici per risolvere problemi”. Consequently, the tool may be used in international markets, making it possible to relate different IT systems and their data across EU boundaries and facilitate internships abroad in our context.
It is important to remark that the tool allows for a wide range of use cases in the context of the proposed scenario and broader settings. In particular, the tool allows to:

annotate existing job descriptions with the skill/ competence information for further elaboration;

guide the description and annotation of the information within the IT systems managing the corresponding data (such a tool may assist the operators in appropriate and concise data entry activity);

support the data analysis tools when reasoning across all the data, also considering historical perspective, legacy sources and data, etc.

bridge the gap between the systems using different skill/competence taxonomies in different contexts, such as different territories and organisations.
The implementation of the skill matching tool is done in Python and is available on GitHub under the Apache 2.0 license [18].
5 Chatbot
To enable a smooth interaction between students and the system so that students can learn about the SWA program, get to know the offered internship programs, and ultimately express their preferences, we early took the decision to leverage the concepts of the conversational user interface. Conversational UX is becoming a de facto standard to address processes of medium complexity that can be customised to a specific user by gathering contextual information (e.g., student profile, topics, grade, past internships, past conversation). Leveraging this information, we enable students to ask questions using natural language about the process itself in a generic (e.g., What is an SWA program?), contextual (e.g., Who is my referent professor?) or individualised (e.g., What are my recommended internships?) way, which will eventually trigger questions to propose a pertinent answer. The choice of a conversational UX also enables to leverage the inclination of Generation Z better to use instant messaging platforms.
Our choice to use SAP Conversational AI [19] to implement the AI4Citizen chatbot was due to different factors:

1.
It does not provide selflearning capabilities since we wanted to avoid the danger of bot manipulation (e.g., the Tay experiment [20]);

2.
Its NLP engine provides multilanguage support, including Italian, Spanish and Catalan on top of English;

3.
It mixed nocode and scripting capabilities with an API, enabling interaction in a bidirectional way with other software components;

4.
Its platform provides enterprisegrade level including GDPR support, monitoring and debugging capabilities.
To better understand a chatbot’s development process, we need to introduce some terminology used in the domain. The main aim of a bot is to understand the expression that a user might say. For this purpose, the NLP engine of the bot will analyse the expression and extract features such as date, city or person names  the entities  and enrich them with information suitable for programming (e.g., the GPS location associated with a city) and eventually the sentiment analysis of the expression. Twentyeight types of entities, the gold entities, are predefined, and the programmer can add custom entities to adapt the capability of the chatbot to a given domain. Once this first stage is performed, the conversational engine attempts to match the expression — thanks to a concept of expression distance — with one of the expressions that are captured at the design stage to express an intent: intents are the core of the chatbot and are responsible for understanding user requests. Intent recognition is leveraging our platform’s machinelearning capabilities based on the corpus of expressions defined in the intent. To answer the intent, a skill is triggered that will eventually ask the user to enrich its request with details represented by entities (e.g., the time of departure for booking a train to a city) to enable to propose a pertinent answer and continue the conversation. Action enables skill to build the answer directly on the platform with message or interact with external components with webhook or trigger complementary skills.
The SWA chatbot leverages all these capabilities to interact with the student on one side and with the information stored about their curriculum or previous session. We can divide the skills developed into four groups:

14 skills related to answering frequently asked questions

10 customised skills that capture preferences and provide individual advice to the student

2 skills that implement an authentication pattern to interact with the customised skills

2 skills developed for facilitating the testing sessions
These skills are associated with circa 30 intents capturing more than 200 expressions. We also enriched the gold entities with eight custom entities serving three different purposes:

to enable domainspecific interactions, this encompasses competencies (e.g., related to the ECSO ontology), companies’ names and cities contained in the internship database. While we initially did not foresee adding cities as entities, we realised that the related gold entities lack precision for small cities. Fortunately, these entities can be loaded thanks to the chatbot API automatically upon database modification;

to capture the format of specific identifiers;

to enable the students to refer to the chatbot artefacts (e.g., to ask questions about the chatbot itself).
In order to propose individual advice to students, the system should access or store information related to each student.
This access should be done securely, and due to privacy concerns, we had to minimise the information stored at the chatbot level. For that purpose, our chatbot relies on a set of webhook, which uses a stateless proxy, called chatbot logic, residing on the FBK realm. The communication is done via a secure channel where we transit the information captured during the conversation with the student. The chatbot logic interacts with the internship browser component to store or retrieve the information in order to build the answer based on (i) the academic student profile (e.g., schools, grade, curriculum), (ii) the preferences already captured in the current session or previously, and (iii) the information captured by the current skill. Aside from the privacy advantages of such an approach, it also enables us to benefit from a fullfledged programming language to elaborate the answers relying on the provided message library (e.g., text, card, button). Figure 5 shows an example of a simple interaction during the early stage of the conversation.
To sum up, Fig. 6 shows the different type of skills and their related actions that enables Ludovica to engage with the chatbot in order to better understand the rationales around the SWA activity, to ask questions about the process and administrative details, to express preferences, and to browse internship offers.
6 Allocating teams of students to internships
This section focuses on the problem of forming teams of students and matching them with internships (i.e., allocating student teams to internships), which is equivalent to the team formation service shown in Fig. 4. Due to the complexity of this problem, it is difficult and timeconsuming for an expert to make optimal allocations. Previously, we used existing technologies to develop the competence and skill extraction service and the chatbot, as explained in Sections 4 and 5. However, creating our team formation service requires new research contributions. Section 6.1 explains why this is necessary, followed by a formal definition of the allocation problem in Section 6.2. Then, in Section 6.3, we provide the encoding for the allocation problem to be optimally solved as a linear program (LP) with some stateoftheart existing solver. Lastly, we introduce a novel heuristic algorithm for computing team allocations in Section 6.4. This algorithm allows us to tackle realworld problems of significant size, which cannot be solved with sophisticated optimisation libraries like CPLEX, as demonstrated in Section 7.3.
6.1 Motivation
Many realworld problems require teams of people or machines to work together on tasks, such as robot teams in search and rescue missions [21] or teams of drones for surveillance [22]. As a result, there has been much research on various methods for assigning teams to tasks in the field of artificial intelligence.
Existing literature has explored how to form one team to work on one task [23,24,25], form one team to work on many tasks [26], and form multiple teams to solve a single task [5, 10, 27, 28]. Some research has also been done on forming multiple teams to match multiple tasks, allowing overlaps either on agents participating in many teams [21] or on many teams performing the same task [29]. However, there needs to be more attention given to the problem of how to allocate teams to tasks with no overlaps, i.e., teams that share no common members and tasks that are uniquely tackled by one team. This problem above is the one our team formation service aims to address when allocating teams of students to internships. Despite limited existing research on this topic [30, 31], we cannot rely on prior approaches to solve our problem due to several shortcomings. Specifically, one of the approaches uses brute force and branchandbound techniques that limit the number of agents and tasks that can be analyzed [31], while the other approach handles only a small number of tasks [30].
Additionally, existing models assume that a team must possess the exact competencies a task requires. Such an assumption is only sometimes practical in realworld scenarios. As [28] points out, existing literature endures this assumption and distinguishes between two competence models: boolean models where an agent acquires or lacks some competence [23,24,25, 30], and graded models where an agent acquires some competence up to some degree [5, 27, 28, 32]. However, in the real world, it might be the case that acquiring some similar competence is sufficient for handling a specific required competence. Consider, for example, the educational world, where a student can be adequate for some internship program even if the student does not possess all the required competencies precisely as requested as long as the student’s competencies are similar enough.
Therefore, this work aims to develop a new formalization and algorithm that can practically address the many teams to many tasks allocation problem. In what follows, we use the terms agents and tasks instead of students and internship programs to present the problem more broadly. Later (in Section 7.3), we will apply our general problem to our case study of matching teams of students with internship programs. This approach will help us consider the competencies possessed by team members, including similar competencies that may still qualify a student for an internship, even if they do not possess the exact competency required by the task. Ultimately, our goal is to develop a novel algorithm that can effectively allocate teams of agents to tasks in realworld scenarios.
6.2 Defining the team allocation problem
This section aims to cast the problem as an optimisation one formally. First, we introduce the basic concepts of the team allocation problem, then we move to the concept of competence models and put forward the model used in this work. Finally, we formally define the optimisation problem.
6.2.1 Basic Concepts
According to the Oxford Learnner’s Dictionary^{Footnote 3}, competence refers to the ability to perform well, the authority or power in handling a particular situation, or a necessary skill for performing a particular job or task. Let us denote with \(\mathcal {C}\) a fixed set of competencies. A task is described through its requirements on (i) competencies and (ii) team size. Firstly, for some agent(s) to successfully carry out a task, they must adequately deal with the required competencies. Secondly, the required team size signifies the necessary and sufficient number of agents working together to complete the task successfully—i.e., a team with fewer members than the required team size will not be able to complete the task, while a larger team overconsumes (agent) resources. For example, an internship program in a computer tech company might require four competencies (machine learning principles, coding in Python, web development, and fluency in English) and a team of size three. Therefore, the company offers an internship that needs three students together who possess the four required competencies. Moreover, even though every required competence is necessary for successfully completing a task, not all competencies are equally important; as such, we consider the competencies’ relative importance as part of the task description. For example, for an internship in a computer tech company, coding in Python might be more critical than fluency in English. Formally, taask \(\tau \) is given by \(\langle C_\tau , w_\tau ,\textrm{s}_\tau \rangle \), where \(C_\tau \subseteq \mathcal {C}\) denotes the required competencies, \(w_\tau :C_\tau \rightarrow (0,1]\) is a relative importance weight function, and \(\textrm{s}_\tau =1,2,\cdots \) denotes the required team size.^{Footnote 4} We denote with \(T=\{\tau _1,\cdots ,\tau _m\}\) a set of m tasks (with \(T = m\)). An agent is described through the competencies they possess. A team K corresponds to a subset of agents, \(K \subseteq A\), who jointly work on some task(s). A sizecompliant team K for some task \(\tau \in T\) is a team such that \(K = \textrm{s}_\tau \), i.e., a team with as many members as required by \(\tau \). We denote with \(\mathcal {K}_\tau = \{K \subseteq A : K = \textrm{s}_\tau \}\) the set of the sizecompliant teams for \(\tau \) given a set of agents A.
6.2.2 Competence coverage and affinity
In order to assign a team of agents to a task, it is necessary to ensure that the team possesses the competencies required to solve the task. To achieve this, the competencies, as determined by \(C_\tau \), must be assessed before allocating a team to a task. For a team of agents K to be considered suitable for a task \(\tau \), the team must possess the necessary competencies required by \(\tau \). That means that for each required competence by the task, there must be at least one agent in the team who possesses that competence.
The current literature on teamtask allocation considers competencies as either Boolean or graded features, with a team’s matching quality being determined by a function, usually expressed as a utility function. However, these existing models are pretty restrictive, so a new approach is needed. In this study, we propose a method for assessing the matching quality of a team with a task based on the semantic similarity of the competencies required by the task and the competencies possessed by the team. To be more precise, we introduce the concepts of competence coverage and competence affinity, which provide an intuitive way to determine the matching quality of a team for a task. Competence coverage is determined by the degree of similarity between the required competencies of the task and the team’s collective competencies. Considering the semantic similarity of the competencies allows us to deal with the current competence models’ limitations.
Within the scope of some specific domain (e.g., the educational domain), we can structure the competencies related to the domain to capture semantic relations among them. More and more entities (countries, organisations, institutes, etc.) work towards registering and structuring competencies in ontologies—e.g., ESCO [13], O*Net [33], SFIA [34], ISFOL [35]. For example, two essentially different competencies, such as coding in C++ (competence c) and coding in Java (competence \(c'\)), share essential principles, e.g., both are compiled languages and objectoriented languages.
Therefore, we assume that the competencies in \(\mathcal {C}\) are structured in a competence ontology based on the semantic relations among the competencies. Then, for any two competencies \(c, c' \in \mathcal {C}\), there is a degree of similarity, denoted as \(\textrm{sim}({c,c'}) \in [0,1]\), reflected in the competence ontology. Moreover, we assume that the competencies can be represented in a (treelike) graph structure, where nodes denote competencies and directed edges indicate intimate semantic relations between the nodes (such that Parent nodes are broader concepts of their successor nodes). In Section 7.3, we utilise ESCO ontology, a wellestablished ontology with such properties. Given a competence ontology of \(\mathcal {C}\) and two competencies \(c,c' \in \mathcal {C}\), to compute the semantic similarity between c and \(c'\) we use a variant of the similarity metric presented in [36]. Unlike [36], the variant we propose here guarantees that any competence node is maximally similar to itself (i.e., \(\textrm{sim}({c,c})=1)\). Thus, the similarity is given by
where l stands for the shortest path connecting competencies c and \(c'\) and h stands for the depth of the deepest competence subsuming c and \(c'\). Parameters \(\kappa ,\lambda \) control the impact of the distance l and depth of common ancestor h on the similarity metric, respectively. Similarly to [37], for any two \(c,c'\in \mathcal {C}\) it holds that \(\textrm{sim}({c,c'}) \in [0,1]\).
We say that some agent \(a\in A\) with acquired competencies \(C_a\subseteq \mathcal {C}\) covers competence \(c \in \mathcal {C}\) with a’s most similar competence \(c'\), and specifically with a degree reflecting the semantic similarity between c and \(c'\), i.e., \(\textrm{cvg}(c, a) = \max _{c' \in C_a} \textrm{sim}({c,c'})\). Moreover, we say that an agent \(a\in A\) covers a task \(\tau \in T\) with required competencies \(C_\tau \in \mathcal {C}\) with a degree depending on how well a can cover each of the required competencies. Specifically, we use the product over a’s competence coverage across all required competencies:
In a team context, it is unlikely that each agent will need to cover all the required competencies. Instead, team members collaborate and complement each other by sharing responsibilities [38]—notably, [39] highlights that recent studies define team through assigning responsibilities to agents. Thus, we must solve a competence assignment function. Let \(\tau \in T\) be a task and \(K\subseteq A\) be a subset of agents that forms a team; each agent, \(a \in K\), is assigned with some of the required competencies in \(C_\tau \). That is, each agent in the team is responsible for covering only some of the competencies required by the task. Therefore, a competence assignment function (CAF), denoted as \(\eta _{\tau \rightarrow K} : K \rightarrow 2^{C_\tau }\), assigns a subset of required competencies to each team member \(a\in K\). As noted in [27, 28], a CAF should satisfy the following property: at least one agent, \(a \in K\), must cover any competence \(c \in C_\tau \), i.e., \(\bigcup _{a \in K} \eta _{\tau \rightarrow K}(a) = C_\tau \). Additionally, let the reversed function (rCAF), denoted as \(\theta _{\tau \rightarrow K}: C_tau \rightarrow 2^{K}\), show which agents are responsible for each required competence. Let \(\Theta _{\tau \rightarrow K}\) be the set of all CAFs of some team K for task \(\tau \). Notably, there are many different CAFs, though not all CAFs are equally good, i.e., some CAFs are better than others. [27] discusses inclusive competence assignments, thus, here we build on inclusive CAFs, and we propose the fair competence assignment functions (FCAF). In more detail, an FCAF satisfies two properties: each agent is responsible for (i) at least one competence (as in inclusive competence assignments), and (ii) at most \(\left\lceil \frac{C_\tau }{K}\right\rceil \). These properties ensure that all agents actively participate in the teamwork and that a few agents are assigned excessively many responsibilities while others are assigned just the minimum.
Given a task \(\tau \), a team K and an FCAF \(\eta _{\tau \rightarrow K}\), we assess the matching quality of K working on \(\tau \) as how well each team member can cover their responsibilities according to the FCAF at hand. Thus, we use the competence affinity metric, which (i) awards high coverage of competencies and (ii) does not penalise low coverage of competencies with low relative importance. First, we define the competence affinity from a single agent’s perspective:
Definition 6.1
(Agents’ Competence Affinity). Given an agent \(a \in A\), a task \(\tau \in T\), and a competence assignment function \(\eta _{\tau \rightarrow K}\), the competence affinity of a to \(\tau \) is:
Then, the team’s competence affinity depends on the competence affinity of each team member. Specifically, it is defined as the alaNash product over the individual competence affinity of every agent in K. The product promotes equal contributions across the agents in the team (i.e., all agents are equally contributing to the task) and therefore favours teams with balanced contributions. Formally, the competence affinity from a team’s perspective is defined as:
Definition 6.2
(Team’s Competence Affinity). Given a team of agents \(K\subseteq A\), a task \(\tau \in T\), and a fair competence assignment \(\eta _{\tau \rightarrow K}\), the competence affinity of K to \(\tau \) is:
Notice that a team’s competence affinity changes as responsibilities are assigned differently, i.e., the very same team exhibits different matching qualities when different FCAFs are in place. The FCAF that results in the best competence affinity of a team for a task amounts to solving an optimisation problem:
such that \(\bigcup _{a\in K} \eta _{\tau \rightarrow K}^* (a) = C_\tau \) and \(1\le  \eta _{\tau \rightarrow K}^*(a)\le \left\lceil \frac{C_\tau }{K}\right\rceil \) for every \(a \in K\). In practice, though, we can efficiently solve the above optimisation problem optimally since team sizes and the number of required competencies for a task (and therefore the search space \(\Theta _{\tau \rightarrow K})\) are relatively small—usually team size ranges in [2, 5] and a task requires \(\le 10\) competencies.
Then, for task \(\tau \), considering all the sizecompliant tams \(\mathcal {K}_\tau \), the best team shall maximise the competence affinity; i.e., \(K^* = \underset{K \in \mathcal {K}_\tau }{\mathrm {arg\;max}} \; \textrm{aff}(K,\tau ,\eta _{\tau \rightarrow K}^*)\). Note that the search space for this optimisation problem is \(\left( {\begin{array}{c}n\\ \textrm{s}_\tau \end{array}}\right) \) large (and increases as n and \(\textrm{s}_\tau \) increase), where n is the number of agents in A and \(\textrm{s}_\tau \) is the team size of task \(\tau \).
6.2.3 The optimisation problem
Now, given a set of tasks T (with \(T>1\)), finding a team for each task, i.e., computing a many teams to many tasks allocation, is another optimisation problem. In more detail, the optimal allocation of agent teams to tasks is the one that maximises the competence affinity of every team, respecting (i) that each agent cannot be part of more than two teams; (ii) each team can work on at most one task; and (iii) at most one team can work on each task. These three properties we just mentioned describe a Feasible Team Allocation Function, which we formally define as:
Definition 6.3
(Feasible Team Allocation Function (FTAF)). Given a set of tasks T, and a set of agents A, a feasible team allocation function g is a function \(g:T\rightarrow 2^A\) such that: (1) every task \(\tau \in T\) is allocated its requested number of agents so that \(g(\tau ) = \textrm{s}_\tau \); and (2) an agent can only be assigned to one team: for every pair of tasks \(\tau ,\tau ' \in T\), such that \(\tau \ne \tau '\), it holds that \(g(\tau )\cap g(\tau ') = \emptyset \).
Given tasks T and agents A, G denotes the family of all feasible team allocation functions. Then, the overall affinity of an FTAF g equals the product over the competence affinity of every team for their assigned task according to g. As we mentioned earlier, the product favours balance, i.e., allocations where every team formed is more or less equally good for their task. The optimum FTAF \(g^*\) is the one that maximises the overall team affinity:
Definition 6.4
(NonOverlapping Many Teams to Many Tasks (NOMTMT) Allocation Problem). Given a set of tasks T, and a set of agents A, the NonOverlapping Many Teams to Many Tasks Allocation Problem is to find the team allocation function \(g^* \in G\) that maximises the overall team affinity of the team allocation:
Here we highlight that for computing the overall team affinity for some FCAF \(g \in G\), we need to compute the optimal FCAF for each team for their assigned task according to g. Thus solving the NOMTM allocation problem requires solving T optimisation problems for each \(g \in G\)..^{Footnote 5} Next, we show that the NOMTMT allocation problem is \(\mathcal{N}\mathcal{P}\)hard.
Theorem 6.5
The NonOverlapping Many Teams to Many Tasks (NOMTMT) allocation problem is \(\mathcal{N}\mathcal{P}\)hard.
We omit the proof due to space limitations. However, in Appendix B we show that the problem is \(\mathcal{N}\mathcal{P}\)hardby using a reduction from binary combinatorial auction winner determination problem for singleunit auctions, which is known to be \(\mathcal{N}\mathcal{P}\)hard [40].
6.3 Solving the team allocation problem: a linear programming approach
We can optimally solve the NOMTMT allocation problem with the means of a linear program (LP); thus, here, we provide the corresponding LP encoding. Let us start, though, with a nonlinear encoding. We use a decision variable \(x_\tau ^K\) for each task \(\tau \in T\) and each sizecompliant team \(K \in \mathcal {K}_\tau \). \(x_\tau ^K\) equals 1 when team K is allocated to work on task \(\tau \) in the optimal solution, and 0 otherwise. Then the nonlinear program is:
subject to:
Then, for the optimal allocation function \(g^*\) it holds that \(g^*(\tau ) = K\) if and only if \(x_\tau ^K = 1\). Constraints (6b) and (6a) ensures that the resulted \(g^*\) is an FTAF (Def 6.3). As we said, the above is not a linear program due to the nonlinear objective function (Eq. (6)). However, we can do a linear transformation by considering the logarithm. Thus, we can obtain the optimal allocation function by solving the equivalent linear program below:
subject to: Eqs. (6a), (6b), and (6c). To obtain the equivalent LP, we first shift the function’s domain—we do so in order to avoid computing \(\log (0)\)—and then we use the logarithm to convert the double product into a double sum and power factors into products. (i) we use the \(\log (\cdot )\) to convert the double product to double sum, and the powered factor into a product; and (ii) we change the function’s domain to avoid \(\log (0)\). We can solve this LP with the aid of an offtheshelf solver (e.g. CPLEX [14], Gurobi [41]), GLPK [42], or SCIP [43]). Given sufficient time, an LP solver will return an optimal solution to the NOMTMT allocation problem.
Notice that to solve the LP above, we need to precompute the competence affinity between each task and every possible sizecompliant team, meaning that we need to precompute the optimal CAF (via solving an optimisation problem) for each such \(\langle \)team,task\(\rangle \) pair. This is bound to lead to large linear programs as the number of agents and tasks grows.
6.4 Solving the team allocation problem: a heuristic approach
In this Section, we put forward Edu2Com, a novel twostage heuristic algorithm, to overcome large LPs. In the first stage, we compute an initial feasible solution for the allocation problem, i.e., an initial FTAF (Definition 6.3), while in the second stage, we iteratively improve the initial solution via different swapping strategies.
6.4.1 Building an initial team allocation
First, the algorithm builds an initial FTAF. For each task, the algorithm forms a promising team, i.e., a team that seems capable of tackling the task based on the task’s required competencies and the competencies offered by each team member. The teams are formed sequentially, starting with the ‘hardest’ tasks. We characterise a task as hard when only a few agents can cover its required competencies. With this heuristic, we prioritise the tasks so that ‘lighter’ tasks (i.e., tasks whose required competencies can be easily covered by many agents) do not bind agents whose acquired competencies can be utilised in hard tasks.
Computing tasks’ allocation hardness. The allocation hardness (or simply ‘hardness’) of a task assesses the difficulty of finding agents who can adequately cover the task’s required competencies. Intuitively, when, for some competence c, more agents adequately cover c (i.e., with high coverage on c), it is easier to find an agent for some task requiring c, and therefore the task is less hard. Inspired by the notion of moment of inertia [44], we measure the difficulty to cover a competence, and therefore the task’s hardness requiring that competence, as the effort (distance from (1,0)) that agents should make to reach the ideal competence coverage of c, which is 1. We remind the reader that the coverage of a competence c ranges from 0 to 1. Thus, the ideal competence coverage for a competence occurs if every agent can fully cover the competence (i.e. competence coverage equals 1 for all agents). Let \(\mathcal {I}\) be a partition of the competence coverage domain [0, 1] in ten distinct intervals, and mid(J) be the midpoint of the interval \(J \in \mathcal {I}\). Now, given a competence \(c\in \mathcal {C}\) and a set of agents A, we compute the moment of inertia of c as \(I(c) = \sum _{J \in \mathcal {I}} n_J^c \cdot \big (1  mid(J)\big )^2\), where \(n_J^c\) is the number of agents in A whose coverage of competence c lies within interval J, and hence represents the mass of c in the interval.
Then the hardness of a task \(\tau \in T\) depends on (i) the moment of inertia of each required competence \(c \in C_\tau \), and (ii) each competence’s relative importance \(w_\tau (c)\). As such, for some task \(\tau \) we compute \(\tau \)’s hardness as:
where \(\omega \) is a normalising factor on the importance weights of all competencies in \(C_\tau \).
Building an initial team allocation. When we compute each task’s hardness, we build the initial FTAF. To do so, our algorithm, Edu2Com, sequentially for all tasks picks a team out of the available agents, starting from the hardest one. In more detail, let us denote with \(A_\tau \subseteq A\) the available agents (i.e., the agents who have not yet been allocated to a team) while picking a team for task \(\tau \); and with \(\bar{C}_\tau \) task’s \(\tau \) required competences sorted according to their relative importance. The ith most important competence is denoted as \(\bar{C}_\tau ^i\). The first agent to be allocated to \(\tau \)’s team is the agent in \(A_\tau \) that can cover best competence \(\bar{C}_\tau ^1\)—formally, the first agent to be picked is computed as \(\sigma _1 = \underset{a \in A_\tau }{\mathrm {arg\;max}} \{\textrm{cvg}(\bar{C}_\tau ^1,a)\}\). After picking the first agent \(\sigma _1\), the remaining available agents are \(A_\tau  \{\sigma _1\}\). Then, the ith agent to be picked for the team of task \(\tau \) is computed as \(\sigma _i = \underset{a \in A_\tau  \Sigma _{i1}}{\mathrm {arg\;max}} \ \{\textrm{cvg}(\bar{C}_\tau ^j,a)\}\) where \(\Sigma _{i1} =\bigcup _{k=1}^{i1}\{\sigma (\bar{C}^k_\tau )\}\) denotes the agents already picked for the team, and \(j = \big (i1\pmod {C_\tau }\big )+ 1\) indicates which competence agent \(\sigma _i\) shall cover best. Thus, the team formed to work on \(\tau \) is \(K = \bigcup _{i=1}^{\textrm{s}_\tau } \sigma _i\), while the agents in K are no longer considered available, i.e., \(A_{\tau '} = A_\tau K\) where \(\tau '\) is the hardest task after \(\tau \).
6.4.2 Improving team allocation
After finding the initial promising FTAF, in the second stage, Edu2Com attempts to improve it iteratively. Specifically, we introduce several heuristics to be applied in each iteration, exploiting swaps of agents among teams. We distinguish two kinds of iterations based on the heuristics applied in each kind of iteration:

1.
Single pairing. For two randomly selected tasks, Edu2Com applies the following two swappingbased heuristics:

(a)
Exploiting swap. Considering solely the agents currently allocated in the selected tasks, find the optimal team allocation.

(b)
Exploring swap. If there are available agents (i.e. agents assigned to no team), try to swap a randomly selected assigned agent (to any of the two tasks) with a randomly selected unassigned agent. Try a maximum of k times, and keep only the swaps that improve the competence affinity.
Each heuristic is applied (if possible) once within a single paring iteration.

(a)

2.
Exhaustive pairing. For every pair of tasks, swap every possible pair of agents within them. If competence affinity improves, keep the change and stop the exhaustive pairing iteration.
The iterative process lasts until (i) the global maximum competence affinity is reached, (ii) no solution improvement occurs for a number of iterations, or (iii) the algorithm is stopped by the user. In all cases, the most recently found solution is returned.
6.5 Handling students’ preferences
Notice that although the description of our case study in Section 2 considers students’ preferences, the definition of our task allocation problem in Section 6.2 focuses on competence affinity. However, our problem definition can be readily extended to handle preferences. Next, we show how to achieve that by turning the singleobjective optimisation problem in Equation 5 into a multiobjective optimisation problem that accommodates both competencies and students’ satisfaction. Thus, we propose to follow a similar approach to our own work in [28], where we composed teams considering both competencies and personalities.
Formally, we consider that the preferences for each student a in A can be represented through a linear order (or a ranking) \(\succeq _a\) over the tasks in T. We will refer to this order as the preference profile of a. We will note as \(\succeq _A\) the set of preference profiles of agents in A. Then, given a task \(\tau \in T\), we can compute the satisfaction degree of student a as:
where \(pos(\succeq _a,\tau )\) stands for the position of \(\tau \) in the preference profile \(\succeq _a\), being 1 the position of the most preferred task, and T the position of the least preferred task. Again, since we want to guarantee a balanced preference satisfaction within a team, we compute the satisfaction degree of a team as follows:
Intuitively within a team K, we reach maximum satisfaction when the assigned task \(\tau \) is the first choice of all members in K, i.e., \(\tau \) is the most highly ranked task of all for every \(a \in K\):
On the other hand, if \(\tau \) is least proffered even for a single student, the satisfaction of the team is dropped, and of course, the minimum satisfaction is reached when the assigned task \(\tau \) is the least ranked task for every \(a \in K\):
Now we are ready to embed the preference satisfaction of a team in our problem by adding a further objective in the objective function of Equation 6, which then becomes:
We use weighting parameters \(w_a\) and \(w_s\) to regulate the influence of the team’s competence affinity and satisfaction, respectively, in the multiobjective optimisation function. Hence we can obtain a family of objective functions including two edge points (i) considering the competence affinity (\(w_a = 1\) and \(w_s = 0\)) solely, and (ii) considering the team’s satisfaction (\(w_a = 0\) and \(w_s = 1\)) solely. It also includes any function that (iii) proportionally considers the two parties (\(w_a+w_s=1\) with \(w_a>0\) and \(w_s>0\)).
7 Empirical analysis
Our empirical analysis aims to evaluate each of the core AI services that compose the AI4Citizen pilot. Unfortunately, we had to fragment our evaluation and conduct separate empirical analyses on each component due to the pandemic (COVID19) since the SchoolWork Alternation programme was cancelled. Thus, Section 7.1 empirically explores using FastText that yields better performance. Thereafter, in Section 7.2, we involve highschool students in the evaluation of the usability of our chatbot. Finally, in Section 7.3, we analyse the time required by our team formation service to provide allocations over realworld data. Furthermore, we conduct a validation of the algorithm’s recommended allocations with the aid of experts. Thanks to the results of our empirical analysis, we conclude that the AI4Citizen pilot is ready for deployment in a realworld setting.
7.1 Evaluation of competence and skill extraction
This section aims to evaluate the extraction capabilities of the CSE tool introduced in Section 4. Recall that the CSE tool results from combining the ESCO Ontology [13] and FastText [16], allowing us to relate the texts taken from a CV with job offers by associating them to the competencies in the ESCO Ontology.
To evaluate the CSE tool, we performed two different experiments. In both experiments, we used competencies, CVs, and job offers described in Italian. For the first experiment, we adopted the FastText similarity algorithm outofthebox: (i) first, the description of the competence in natural language is converted into a vector; (ii) similarly, vectors are also computed from all the competences present in ESCO, written in the same language of the required competence; (iii) the cosine between the vector in (i) and each of the vectors in (ii) is calculated, and the results are ranked. The higher the cosine, the more similar the texts behind.
Let d be the text we want to compare with the competencies in ESCO. First, all the documents (job descriptions, CVs, ESCO descriptions of competencies) are preprocessed using Tint [45], a tool for analysing texts written in Italian. With Tint, we perform some preprocessing steps such as tokenisation, sentence splitting, and lemmatization^{Footnote 6}, so that words such as andavano, vado are all referred to the main form andare.
After that, vectors for each word are obtained using FastText. A document can therefore be represented as an ordered list of vectors:
where \(v_i\) is the vector representation of the word \(w_i\), being \(n_d\) the length of the document in term of words.
Following [46], all the vectors can be merged into a single one by computing the average of them, document by document. Each document d is therefore transformed in a single vector \(v_d\) as follows:
Finally, we use as a similarity function the cosine between two vectors. Let d and e be two documents, and \(v_d\) and \(v_e\) be their corresponding vectors. The cosine similarity between them is calculated as follows:
where \(\alpha \) is the angle between \(v_d\) and \(v_e\). The smaller the angle (and therefore, the larger the cosine), the more similar are the two documents.
For the second experiment, we added a preprocessing phase where only the most relevant words are taken into account, using TFIDF (term frequencyinverse document frequency) [47], using an approach similar to the one described in [48].
TFIDF is a statistical measure that reflects the importance of a word in a set of documents. The intuition is that a term occurring in many documents is not a good discriminator, and hence it should be given less weight than words that occur in fewer documents. The TFIDF function measures the importance of a term in a particular document: it grows proportionally to the number of times the terms are used in the document, but it decreases proportionally to the frequency of the term in the whole collection.
The function is the product of the two components, TF and IDF. Let t be the term and d the document:
where \(N=\left D\right \) is the number of documents in the set D.
Once we have computed the TDIDF value for each term and document, our goal is to use that function to remove from the set of words both the terms that are too rare and those that—conversely—are too common in the ESCO corpus. We then set two parameters, \(f_\textrm{min}\) and \(w_\textrm{min}\), respectively, the minimum frequency of a word in the ESCO ontology and the minimum TFIDF weight. We test different settings with a matrix of possible values and find out that the best values wrt. our evaluation is \(f_\textrm{min}=5\) and \(w_\textrm{min}=2.0\). We then filter out terms that appear less than 5 times in the whole set of documents and words whose TFIDF value is less than 2.0.
After these preprocessing steps, documents are processed as already described above.
To evaluate the approaches, we ask teachers from high schools in Italy to annotate some sentences written in Italian taken from the “Alternanza ScuolaLavoro” project, plus a set of custom sentences written from scratch by the teachers themselves. The first ten competencies from ESCO were shown for each sentence, and teachers were asked to annotate whether they were relevant to the input sentence or not. A total of 20 teachers took part in the experiment, annotating 368 predefined sentences and 65 manually written sentences. Table 2 shows the number of sentences that were annotated as correct and incorrect in the first two ESCO competencies of the ranking.
Results show that both algorithms are effective on predefined sentences, and in particular, the preprocessing using TFIDF increases the accuracy compared to the plain use of FastText. By analysing the sentences the teachers wrote, we can see a massive difference in the ratio between correct and incorrect in the two sets. This is due to the freedom left to the teachers, which led them to enter texts that are too long or too short concerning the average descriptions of competencies in ESCO, yielding more challenging examples to the algorithm.
Generally, the system performs well, and our experiments allowed us to understand which algorithm performs better. Most of the inaccuracies are due to the imbalance of the ESCO ontology: some branches are very detailed, while others are often poor or missing at all. Both the approaches (with and without TFIDF) are easy to extend to any language, as long as both FastText and ESCO are available in that language.
Since the preprocessing phase using TFIDF outperforms Fastext alone, we use that algorithm in the AI4Citizen pilot. Notice that our algorithm is publicly available on Github [18].
7.2 Evaluation of the chatbot usability
We evaluated the usability of the chatbot involving three high school classes of two different technical institutes for a total of 55 students:

20 students from a fourth class of a technical institute (referred to as Class1)

17 students from a fifth class of the same institute (referred to as Class2)

18 students of a fourth class of another institute (referred to as Class3)
We only involved students of technical institutes because these types of schools require their students to do long periods of SWA (schoolwork alternation) activities.
7.2.1 Procedure
The students of Class1 and Class2 were in their classroom assisted by their professor, whereas the researchers who organised the evaluation were remotely connected to the classroom due to the constraints imposed by the pandemic.
The students of Class3 were remotely connected to a virtual meeting room. Two professors chaired the evaluation activity from the school classroom connected to the virtual meeting room.
The evaluation activity consisted of three phases:

1.
Each student logged into the chatbot using their private institute’s credentials and worked on a given mission.

2.
After completing the mission, they filled in a questionnaire composed of the Chatbot Usability Questionnaire (CUQ) plus four open questions (see Appendix A for detail about the questionnaire).

3.
Finally, the researchers give a short presentation of the AI4EU project to inform students about the context of the evaluation test.
The mission given to the students consisted of (a) understanding if they had to do an internship this school year and (b) choosing three internships of interest. Class3 also had a third mission consisting of (c) looking at how to reach the companies of the formerly chosen internships and opening their website.
We enhanced the chatbot between the different assessment phases within this reallife experimentation. Since the students interacted freely with the chatbot, we enriched the intent recognition with the students’ expressions using the platform’s monitoring feature. We also identified that a few students were too restrictive in expressing their preferences, which ultimately did not enable them to complete their mission (e.g., the selection of 3 choices), causing frustrations. We modified our chatbot to guide them better to alleviate these restrictions and select other internships. We were also enabled to connect to the company website and to look at their precise locations taking into account informal remarks from the first class. The improvement of the results (see Table 3) showed how these relatively simple enhancements drastically changed the student experience and enabled a more significant proportion to complete their mission (see Table 4).
7.2.2 Results
The scores of the CUQ questionnaire filled in by the students are reported in Table 3. When asked if it was easy to solve the given mission, \(65\%\) of the participants (36 students) said yes (see Table 4 for details). When asked what they liked most about the chatbot, 21 students (\(38\%\) of the participants) answered that they liked to receive information on internship offers; 16 students (\(30\%\)) answered they liked the interaction with the chatbot. When asked about suggestions for improving the chatbot, we got many interesting answers that helped improve it. Seventeen students recommended expanding the vocabulary of allowed inputs. They would have preferred more tolerance in the type of questions they could enter.
7.2.3 Discussion of the results
From the data obtained with the CUQ questionnaire, it emerges that usability problems still need to be solved, even if the score has improved from the first to the third experimentation (see Table 4).
Despite the fact that the chatbot still has usability issues, students liked being able to use this modality to get information about their internship.
Interestingly, our results showed that students still believe that interacting with a chatbot is the same as interacting with a person. They would like to use a completely free vocabulary, not constrained at all. This issue has to be considered in future versions of the chatbot.
7.3 Evaluating the team allocation algorithm
In this section, we evaluate the capability of our team allocation algorithm to cope with realworld problems. Moreover, we evaluate the quality of the team allocations that it computes with the aid of experts. More precisely, Section 7.3.2 details the analysis of the solution quality, runtime performance and anytime capabilities of Edu2Com when pitched against CPLEX V12.10.0, a stateoftheart linear programming solver, over synthetic data. Section 7.3.3 reports on the solution quality and time performance of Edu2Com when solving several problem instances with realworld data. Finally, Section 7.3.4 details the validation of the quality of allocations by experts.
7.3.1 Empirical settings
The implementation of Edu2Com, along with all the necessary supporting code, was made in Python3.7. All the experiments ran on a PC with Intel Core i7 CPU, 8 cores, and 8Gb RAM. In our experiments below, we set our algorithm’s parameters to: compute similarity with \(\kappa =0.35, \lambda =0.75\); perform one exhaustivepairing every 50 singlepairings; stop the algorithm after two blocks of singlepairings and exhaustive pairings have elapsed with no improvements.
7.3.2 Quality, runtime and anytime analysis
Synthetic Data Generation. To compare our heuristic algorithm against the optimal solver, we created three sets of problem instances with varying sizes (small, medium, and large), all with solutions that CPLEX could obtain within acceptable time limits. The problem instances were created with the process described below. Initially, we determined the number of tasks by selecting from the set \(\{10,15,20\}\). Then, for each task \(\tau \), we randomly selected: the team size \(\textrm{s}_{\tau }\sim \mathcal {U}(2,3)\), the number of required competencies \(C_\tau  \sim \mathcal {U}(2,5)\), the set of \(C_\tau \) competencies from the ESCO ontology, and an importance weight for each competence \(c \in C_\tau \) as \(w_{\tau }(c)\sim \mathcal {N}\big (\mu =\mathcal {U}(0,1),\sigma {\sim }\mathcal {U}(0.01,0.1) \big )\).^{Footnote 7} After that, for each task \(\tau \), we created \(\textrm{s}_\tau \) agents so that each agent acquires competencies that either a required competence in \(C_\tau \) or a child node (in ESCO) of a required competence. In total, we evaluated solving 60 problem instances of varying sizes. Specifically, we considered three data families containing 20 instances each, where each problem instance involves (1) 10 tasks and on average \(\sim 24.5\) agents in the lowsize family, (2) 15 tasks and on average \(\sim 37.7\) agents in the middlesize family, and (3) 20 tasks and on average \(\sim 50.55\) agents in the largesize family.
Quality analysis. Figure 7 illustrates the quality of the solutions build with our algorithm, Edu2Com, across time. As quality, we consider the ratio of the overall competence affinity of a solution against the competence affinity of the optimal solution computed by CPLEX. In the figure, we see the average of over 20 problem instances per data family (lowsize, middlesize, largesize). As we can see, Edu2Com always reaches the optimal solution. That is, our algorithm exhibits quality 1 in all instances across every data family.
Runtime analysis. The greatest advantage of our heuristic algorithm is that it is much faster than CPLEX. Table 5 shows the time we can save with respect to CPLEX to reach optimality. Overall, using our heuristic can save from \({\sim }52\%\) to \({\sim }65\%\) time with respect to CPLEX. Specifically, for problem instances with 10 tasks (small scenario), we save \(\sim 56\%\) time wrt. CPLEX; for problem instances with 15 tasks (medium scenario), we save \(\sim 52\%\) time; and, for problem instances with 20 tasks (large scenario), we save \(\sim 65\%\) time wrt. CPLEX. We should note that the primary timeconsuming task for CPLEX is building the LP encoding the problem.
Anytime analysis. Our algorithm reaches highquality solutions (with quality above 0.8) in \(\sim 6.4\%\) to \(\sim 12.6\%\) of the time required by CPLEX to yield a solution. Let \(t_{opt}\) be the time in seconds that CPLEX needs to yield a solution. Edu2Com finds the first solution: (1) after \(2.25\cdot 10^{5}\cdot t_{opt}\) (=0.091) seconds with a quality \(32\%\) of the quality of the optimal in the smallsize scenario, (2) after \(3.31\cdot 10^{3}\cdot t_{opt}\) (=1.451) seconds and \(30\%\) in the mediumsize scenario, and (3) after \(1.35\cdot 10^{3}\cdot t_{opt}\) (=2.564) seconds and \(28\%\) in the largesize scenario. In Table 6, we provide the time needed by Edu2Com to reach a solution of a certain quality and the portion of time compared to \(t_{opt}\). Notably, our algorithm reaches a solution of quality above \(80\%\) in (1) \(0.064\cdot t_{opt}\) seconds in the smallsize scenario, (2) \(0.149\cdot t_{opt}\) in the mediumsize scenario, and (3) \(0.126\cdot t_{opt}\) seconds in the largesize scenario.
7.3.3 Solving realworld problems
In this empirical analysis section, we used realworld data. In more detail, we use 100 profiles of students and 50 internship programs, with the competencies described in the ESCO ontology [13]. There are 118 distinct competencies in the students’ profiles, each acquiring \(\sim 11.98\) competencies, while there are 34 distinct competencies in the internship programs’ description, each requiring 4 competencies (minimum 2, maximum 15).
This analysis aims to highlight the scalability of the problem as tasks’ required team sizes increase and assess Edu2Com on handling the problem. Notably, the actual data regarding the tasks (internship programs) did not specify the team size. Thus we synthetically populated the problem instances with specific team sizes. [28] points out that teams within classrooms shall have at most five members. As such, we considered problem instances where all tasks require the same team size, ranging in [2, 5]. Moreover, we considered problem instances with varying required team sizes. Specifically, the problem instances contained tasks requiring team sizes ranging in [2,3], [2,4], [2,5], [3,4], [3,5], and [4,5]. In each problem instance, the number of tasks requiring a specific team size is uniformly distributed across the team sizes in the corresponding range. As we discuss in Section 7.3.5, we cannot optimally solve instances with 100 agents, 50 tasks, and the aforementioned team sizes.
Analysis. In Fig. 8, we show our findings with respect to the time needed by our algorithm to converge to a solution. Each bar in Fig. 8 illustrates the average time (in minutes: seconds) over 20 problem instances per required team size. Here we highlight that Edu2Com converge to a solution in less than 50 minutes, especially in large settings that contain tasks requiring teams of size 5. Our experiments showed that Edu2Com needs less time to solve instances containing tasks requiring smaller team sizes—such a result is expected since the search space is much smaller in small teamsize instances. Specifically, we observe that solving a problem instance with team size 2 requires less time than a problem instance with team size 3, which in its turn requires less time than a problem instance with team size 4, which in its turn requires less time than a problem instance with team size 5. Moreover, notice the time needed for finding a solution in problem instances with tasks requiring team sizes in a range [a, b]. In such problem instances, the time needed to find a solution falls between the time Edu2Com needs to solve (a) problem instances requiring teams only of size either a or b, and (b) instances requiring teams in team sizes in ranges \([a,b1]\) and \([a+1,b]\). In the hardest scenario where all tasks’ required team size is size 5, Edu2Com yields a solution in (approximately) less than 50 minutes. Given that this process does not need to be performed in realtime with highly demanding time constraints (i.e., we do not need to come up with a solution within a few seconds), it is acceptable. Notably, the educational authorities spend much more working time matching students to internships since the current practice is to do it manually; while solving the problem optimally with the means of an LP is infeasible since we cannot even generate the LP in time. Hence, our findings confirm our algorithm’s feasibility in large problem instances and show that Edu2Com can handle the team allocation problem in this education scenario.
7.3.4 Validation by experts
In the last part of our empirical analysis, we focus on validating our algorithm by educational authorities with experience in allocating students to internship programs. To do so, we pitched Edu2Com against teachers with such experience—we refer to them as experts. In more detail, we consider a synthetic problem instance—similar to the actualworld problem instances we employed in Section 7.3.3—with 100 agents (students) and 50 tasks (internships) requiring teams of sizes 1, 2 and 3. The problem instance used here is the largest regarding the number of tasks we can generate with the 100 student profiles at hand. Notably, solving this problem optimally (e.g., with CPLEX) would require more than 1.8 million decision variables.
Then, we followed the process below. Given the synthetically generated problem instance: (1) an expert matches each internship with a student team by hand, (2) Edu2Com computes an allocation of students to internships, and (3) we randomly allocate a student team to each internship. Henceforth, \(g_\text {expert}\), \(g_\text {edu2com}\), and \(g_\text {random}\) stand for the allocations built with each method, respectively. Next, we task eight experts with experience in the allocating process to assess and compare the three allocations, ignoring by which method each allocated was built—we refer to these experts as evaluators. Notably, regarding the time needed to reach a solution, Edu2Com required less than 1 hour and 45 minutes to build an allocation. In contrast, the expert needed approximately the time of a working week, including studying, analysing and matching each internship with a student team.
Evaluation Process. We asked each evaluator to study and assess the three allocations. Specifically, the evaluators should mark the team allocated to each internship as follows: 1 for high quality, 2 for medium quality, and 3 for low quality. Moreover, the evaluators were allowed to mark with the same value teams assigned to the same internship that were produced by different methods in case they considered the teams were of the same quality
Handling missing data During the final analysis, we encountered a challenge of missing data where the expert could not form a team for every internship. Specifically, out of the 50 internships, the expert failed to provide a team for 13 of them, resulting in 23 students out of 100 being without internships. As a result, the evaluators had to work with incomplete data and provide incomplete evaluations. The absence of expert allocation for some tasks prevented the evaluators from rating all three allocation methods (\(g_\text {expert}\), \(g_\text {edu2com}\), and \(g_\text {random}\)). An auxiliary mark of 4 was used to signify the missing allocation, indicating absence, which was considered worse than a lowquality mark (mark 3). As a result, all evaluators marked any missing allocation with a four. Additionally, the evaluators missed marking the teams of some internships, resulting in lowquality marks (mark 3) for missing evaluations.
Analysis. Our analysis aims to determine the best method for allocating student teams to internships, founded on the evaluators’ assessments. For that, we consider the evaluation of the teams assigned to each internship as a tournament, while a tournament involves three rounds of competing pairs: (1) Edu2Com vs Expert; (2) Edu2Com vs Random; and (3) Expert vs Random.
The evaluators’ marks serve as the means for determining the winning allocation method for each round and, therefore, for each tournament. The allocation method that receives the higher aggregated mark in a round is declared the winner of that round and earns one point for the corresponding internship assignment. If in a round two internship assignments are of the same quality, i.e. there is a tie, both of their allocation methods receive half a point. By tallying the points earned from each round of the tournament, we utilise the Copeland\(_{\alpha }\) voting rule [49] (with \(\alpha = 0.5\)) to determine the overall winner. This voting rule is shown in [50] to be “resistant to all the standard types of (constructive) electoral control.” In a nutshell, the allocation method that accumulates the highest number of points throughout the three rounds is declared the tournament winner. Again, if there is a tie between two allocation methods, each one earns half a point. For instance, suppose that for a given tournament, 8 evaluators considered the heuristic algorithm (Edu2Com) the best assignment, 5 evaluators preferred the expert’s allocation to the random one, and 2 evaluators equally favoured both the expert and the random allocation. This would result in the heuristic algorithm earning 8 points, the expert’s allocation receiving 6 points, and the random allocation getting 2 points. Therefore, the tournament winner, in this case, would be the heuristic algorithm.
In each tournament, there can be three outcomes: (i) a single winner, (ii) a tie with two winners, or (ii) no winner. Our analysis shows that \(58\%\) out of 50 tournaments declared a single winner, \(34\%\) declared two winners in a tie, and \(8\%\) declared no winner. Figure 9 illustrates the tournament results that declared a single winner. As we can see, \(55.17\%\) of these tournaments were announced as the winning allocation method Edu2com, the heuristic algorithm, \(34.48\%\) of them were announced as the winning allocation method the human exert, while \(10.35\%\) of the tournaments were won by the random allocation method. As such, our heuristic algorithm, Edu2Com, was the preferred allocation method to allocate student teams to internships.
Now, in Fig. 10, we report the tournaments announcing a tie with two winners. As expected, the majority of the tournaments (\(52.92\%\)) declared a tie between the methods of our heuristic algorithm and the human expert. Overall, Edu2Com was announced as a winner in a tie in \(88.23\%\) of the tournaments. To summarise, our analysis indicates that expert evaluators deem our proposed heuristic algorithm as the method of choice to assign teams of students to internships.
7.3.5 Reasonable time limits for response and limitations of optimal solving
In Sections 7.3.27.3.4, we systematically evaluated our proposed heuristic algorithm, Edu2Com. First, we pitched Edu2Com against CPLEX, and then we challenged it to solve realworld problem instances. Notably, the problem we address here does not require a realtime solution within tight time limits. As we stated in Section 7.3.4, a human expert may require a whole week in order to arrange a realworld schoolwork alternation programme, namely to solve a realworld problem instance with around 100 students and 50 internships. As such, we believe that 24 hours is a reasonable computing time for solving the problem. As mentioned in Section 7.3.3, CPLEX cannot cope with realworld instances. The purpose of this section is to empirically characterise the instances that we can optimally solve, and hence the limits of CPLEX as requested by the reviewer. Figure 11 shows the configurations of problem instances (in terms of the number of agents, number of tasks, and team sizes) that CPLEX can optimally solve. Green squares show problem instance configurations that CPLEX managed to solve, whereas red squares show problem instance configurations that CPLEX could not solve (because it runs out of memory). The figure shows that CPLEX cannot handle realworld problem instances (with 100 agents and 50 internships) unless we limit team size to 2. This is very limiting because it forces us to team up agents in pairs, which is not realistic because, as pointed out in [28], educational scenarios require teams of sizes up to 5. As we increase the team size from 2 to 5, the range of configuration of problem instances that CPLEX can optimally solve dramatically decreases. We empirically observe that CPLEX did not manage to solve problem instances that lead to an LP containing more than around \(8\cdot 10^5\) decision variables. Figure 11 characterises the frontier of configurations of problem instances that CPLEX can optimally solve.^{Footnote 8} We highlight that the solving time depends on (1) the number of agents, (2) the number of tasks, and (3) the required team sizes (by all tasks), with the number of agents and team size being the most influential.
8 Privacy and responsible AI
From the inception of the AI4EU project, the AI4Citizen pilot had a special duty to address privacy and responsible AI thoughtfully. These topics are critical for accepting AIbased services run by governments and government agencies and reducing the fear of dystopian scenarios. The SWA case study itself is particularly demanding since the system needs to interact with minors, which are considered vulnerable by the GDPR [51]. Supporting the decision of the recommendation for internships is also critical and, albeit different, closely related to the highrisk scenarios identified in the recent proposed European Artificial Intelligence Act [52] shown in Fig. 12 concerning vocational training and employment.
During the design phase, considering the GDPR and in an attempt to comply with the existing contract between FBK and the region of Trento, we designed our solution to minimise the personal information exchanged between the system components. For example, the match between students and internships is done by a component hosted within the FBK network. Similarly, the matching algorithm only manipulates a set of preferences without access to students’ names or other PII. The information exchanged by the components is clearly described by a set of public APIs avoiding direct access to the underlying database. Finally, communication is done over secure communication channels with client authentication.
For the usage of AI, we analysed the intent of the automation and the impact on the two types of users  students and professors  of the solution. For students, our solution gathers the students’ preferences by recommending a set of proposals based on the tagging done by the school and automated mapping to the job ontology, the fit between the individuals’ offered skills and the internship required skills (provided as information). At the same time, nothing precludes a student from choosing an internship that exhibits a lousy fit. Since we lack feedback information (see Section 9), we, unfortunately, cannot currently provide further recommendations based on former students’ experiences. The professor ultimately does the practical assignment of students to internships. Our system—through the Edu2Com component— provides a recommendation to rely on based on the preferences expressed by the different students and their acquired skills, mimicking the professor’s behaviour but exploring the different combinations exhaustively. Still, to support the professor in taking an informed decision, we provide metrics that capture the overall solution fit, the group fit and the individual fit (see Section 6 for details). In the last version of the system, we also introduced the possibility for the professor to challenge certain choices by asking targeted questions.
To crosscheck the soundness of this approach, we evaluated our solution  each component and the global solution  against the preliminary questionnaire provided by the AI4EU ethical experts mainly inspired by the ALTAI guidelines [53]. We did not detect violations of these principles, and all components were below the highrisk level threshold. This collaboration contributed to the evolution of the assessment questionnaire later published in [54].
9 Perspectives for an actualworld deployment
In order to better understand the potential business value of the AI4Citizen scenario, we decided to use the DataDriven Innovation Framework (DDI) [55] to help us develop a consistent strategy and explore all the dimensions of our use case. The DDI Canvas was developed as a guide in exploring all relevant dimensions on the supply and demand side of datadriven innovation in a systematic manner. Using such a methodology in a research project is unusual but was vital to understand better (1) the data required to realise a full fledge version of our proof of concept, (2) to identify the network of actors required to achieve this vision and (3) to discover some avenues to make such software sustainable. As seen in Fig. 13, we identified the primary proposal value for three personas and the data sources to implement our scenario best.
Concerning data, the main gap was the lack of students’ feedback on former internships to enhance the advice provided by the system. Similarly, collecting more data was needed in order to improve the process itself and be able to capitalise on past students’ interviews. We strongly recommend that the institutes systematically collect such information and eventually share this data (upon anonymisation) across institutes; this will be a unique way to propose better support to students.
Concerning the ecosystem, while the institutes and the Ministry of Education are the main actors, we believe that the students may benefit from opening up to the business social network, such as LinkedIn, Twitter, and Glassdoor, reflecting the current practices in the business world in alignment with the rationales behind the SWA. While this can only apply to 18 years old students, this will enable them to start building a professional presence and business network, sharing insights on their experience and eventually getting acknowledgements from their mentors or people from companies. With the proper connectors, this information can also be a way to generate information that can also be retrofitted to the system to enrich a reputation system.
Besides education, we have found further applications of Edu2com, our novel algorithm, to group students into teams. Thus, we have adapted Edu2Com to support team formation in open innovation and team formation for social impact tasks. On the one hand, we have employed Edu2Com to form teams to tackle innovation challenges (posed as tasks) in the realm of the H2020 Crowd4SDG project [56] (supporting the events of GEAR Cycle 2 and GEAR Cycle 3). On the other hand, in collaboration with YOMAAfrica [57], we have employed Edu2Com to form teams of young volunteers to carry out social impact tasks. More precisely, we have used Edu2Com to form teams that undertake community cleanup missions^{Footnote 9}, matching each team with a location to collectively clean.
Finally, the cost to deploy, maintain and enrich such systems may benefit from a modern approach that leverages both the institutions and other actors. While one can argue that it is against educative system ethics, a tooconservative approach and the impact on resources to develop intelligent software may pave the way for private companies that will propose orientation services that will be accessible only to privileged students.
10 Conclusions
This paper focuses on providing AIbased decision support tools for the SchoolWork Alternation programme. Our technology, the AI4Citizen pilot, is meant to facilitate the programme’s implementation, which is called to support the development of skills for the employability of new generations. Our technology is meant to serve two primary purposes. First, we expect a dramatic reduction in the time required to allocate students to internships. This is because, nowadays, implementing the SWA programme requires much manual intervention from public servants in schools. Second, our technology has been conceived to foster innovation in the programme. This is because our AIbased system allows teams to be allocated to internships, enabling teambased learning and the acquisition of teamwork skills. Since one of the aims of the SWA programme is to spur the development of students’ soft skills [2], working in teams would foster the learning of the soft skills that are considered as crucial for collaboration and teamwork [11].
From an applied AI perspective, the AI4Citizen pilot results from pipelining a variety of AI technologies: (1) NLP algorithms for extracting competencies and skills offered by students in their curricula and requested by companies in their internship offers, and for, later on, matching them; (2) a chatbot to assist students in getting acquainted with the internships offered and select the ones they like best; (3) a novel algorithm to group students into teams to allocate them to the internships on offer. Interestingly, the development of each one of the AI technologies in the AI4Citizen pilot results from a different approach. First, the development of our NLP algorithms is founded on stateoftheart NLP techniques. Second, the development of our chatbot employs commercial chatbot technology. Third, our algorithm to match teams to internships called for novel research contributions. Finally, we conducted an exhaustive empirical analysis of each one of the AI components in the AI4Citizen pilot. Our results indicate that the AI4Citizen pilot is ready for deployment in a realworld setting.
We envision several paths to future research. First, since our results indicate that our pilot is ready for deployment, FBK will investigate plans to gradually integrate its components into their current SWA application, starting with the Competencies and Skills extraction component.
Second, we would like to enrich our experiments along two directions that were planned at the outset of the AI4EU project but were not possible because of COVID19. On the one hand, along the lines of [10], we would like to compare the performance at work of the teams composed by our algorithm versus teams composed by experts to quantify the learning benefits of our approach. On the other hand, it would be interesting to explore the employment of a coaching chatbot that will engage students proactively in an eventdriven way (e.g., an internship of interest became available, a deadline is approaching, an achievement improved the student’s profile) also tackling the following phases of SWA (e.g., internships, feedback collection).
Third, we would like to explore and consider objectives beyond competencies and preference satisfaction when matching student teams to internship programs. In particular, we may consider objectives that the literature in Psychology and Social Sciences has identified to benefit teamwork. Along this direction, an interesting, additional objective to consider is the diversity of personalities within teams. As shown in [5, 58], teams with diverse personalities are more effective and perform better (as a team). Another important objective positively affecting teamwork is social cohesion [59, 60]. Social cohesion refers to the cohesion of relations between team members. High social cohesion within a team influences the team’s performance positively. Similarly, we could consider maximising the consensus among team members or minimising conflicts and disagreements [61]. Furthermore, many educational institutions and organisations try to form diverse teams concerning nationality or cultural backgrounds to promote cultural exchange.
Finally, a very interesting line of research is that of building contrastive explanations [62] that help explain the team formation algorithm’s decisions to students and those in charge of running the programme.
Data availability
The data used in the experiment Section 7.3.2 is available from the corresponding author upon request. The students’ data we use in the rest of the experiments cannot be disclosed due to GDPR issues.
Change history
04 November 2023
A Correction to this paper has been published: https://doi.org/10.1007/s1048902305099x
Notes
The precise number of hours depends on the schools’ typologies.
For more information visit https://www.vivoscuola.it/
Note: we use subscript \(\tau \) to refer to the required competencies, relative importance, and team size of task \(\tau \in T\). Similarly we will use subscript a to distinguish the elements describing an agent \(a \in A\).
Note that for some specific FCAF \(g\in G\), computing the T optimal FCAFs corresponds to solving T independent optimisation problems.
There are common tasks in Natural Language Processing: (i) the tokenisation splits the text into words; (ii) the sentence splitting phase tries to understand, using a mix of heuristics and machine learning, where sentence boundaries are; (iii) the lemmatisation provide the base lemma for the words, a task which difficulty depends on the language, and it is particularly hard for Italian.
Note that any weight sampled below 0 is considered as 0, and any weight above 1 is considered as 1.
In the Appendix C we include the frontier of configurations of problem instances along with the number of decision variables involved in the corresponding LPs and required solving time.
See the public call published by Yoma Africa on Twitter at http://bitly.ws/DUPp.
References
Commission, E. A strategy for smart, sustainable and inclusive growth; Technical Report; European Commission, January. https://eurlex.europa.eu/legalcontent/EN/TXT/PDF/?uri=CELEX:52010DC2020 &from=EN
Tino, C.; Grion, V. Schoolwork alternation: Teachers’ voices about the development and the assessment process of soft skills, In Trends in vocational education and training research. Proceedings of the European Conference on Educational Research (ECER), Vocational Education and Training Network (VETNET), 2018; pp 339–347
Wilde DJ (2009) Teamology: The Construction and Organization of Effective Teams. SpringerVerlag, London
Wilde, D. PostJungian Personality Theory for Individuals and Teams; SYDROSE LP, 2013
Andrejczuk, E.; Bistaffa, F.; Blum, C.; RodríguezAguilar, J.A.; Sierra, C. Synergistic team composition: A computational approach to foster diversity in teams, KnowledgeBased Systems 2019, 182 (104799)
Layton, R.A.; Loughry, M.L.; Ohland, M.W.; Ricco, G.D. Design and Validation of a WebBased System for Assigning Members to Teams Using InstructorSpecified Criteria., Advances in Engineering Education 2010, 2 (1), n1
Alberola JM, Del Val E, SanchezAnguix V, Palomares A, Teruel MD (2016) An artificial intelligence tool for heterogeneous team formation in the classroom. KnowledgeBased Systems 101:1–14
Purdue Univerity. CATME Smarter Teamwork. https://www.catme.org
IIIACSIC. Build teams with a diversity of genres, personalities and intelligences for more efficient collaboration. https://eduteams.iiia.csic.es/
Andrejczuk E, RodríguezAguilar JA, Sierra C, Roig C, ParejoRomero Y (2018) Don’t leave anyone behind: Achieving team performance through diversity. In. IEEE Frontiers in Education Conference (FIE). IEEE 2018:1–9
Gibert A, Tozer WC, Westoby M (2017) Teamwork, Soft Skills, and Research Training. Trends in Ecology & Evolution 32(2):81–84
Cordis EU research results. A European AI On Demand Platform and Ecosystem (AI4EU). Grant agreement ID: 825619 https://cordis.europa.eu/project/id/825619
ESCO. European Skills, Competences, qualifications and Occupations, https://ec.europa.eu/esco/portal, 2010
IBM. IBM Ilog CPLEX Optimization Studio 12.10, https://www.ibm.com/usen/marketplace/ibmilogcplex
Cooper, A. The inmates are running the asylum. In SoftwareErgonomie’99; Springer, 1999; pp 17–17
Bojanowski, P.; Grave, E.; Joulin, A.; Mikolov, T. Enriching Word Vectors with Subword Information, arXiv preprint arXiv:1607.046062016
Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space, arXiv preprint arXiv:1301.37812013
FBK. Competence and Skill Extraction component. Github repository https://github.com/dhfbk/documentclassification
SAP. SAP Conversational AI  Low Code Chatbot Building Platform. https://cai.tools.sap
Wolf MJ, Miller K, Grodzinsky FS (2017) Why We Should Have Seen That Coming: Comments on Microsoft’s Tay Experiment, and Wider Implications. SIGCAS Comput. Soc. 47(3):54–64
Capezzuto, L.; Tarapore, D.; Ramchurn, S.D. Anytime and Efficient Coalition Formation with Spatial and Temporal Constraints, 2020
Ponda, S.S.; Johnson, L.B.; Geramifard, A.; How, J.P. Cooperative Mission Planning for MultiUAV Teams, Valavanis, K.P., Vachtsevanos, G.J., Eds.; Springer Netherlands: Dordrecht, 2015; pp 1447–1490
Anagnostopoulos, A.; Becchetti, L.; Castillo, C.; Gionis, A.; Leonardi, S. Power in Unity: Forming Teams in LargeScale Community Systems, In , 01, 2010; pp 599–608
Lappas, T.; Liu, K.; Terzi, E. Finding a Team of Experts in Social Networks, In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA; KDD ’09; Association for Computing Machinery, 2009; p 467476
Anagnostopoulos, A.; Becchetti, L.; Castillo, C.; Gionis, A.; Leonardi, S. Online Team Formation in Social Networks, In Proceedings of the 21st International Conference on World Wide Web, New York, NY, USA; WWW ’12; Association for Computing Machinery, 2012; p 839848
Crawford, C.; Rahaman, Z.; Sen, S. Evaluating the Efficiency of Robust Team Formation Algorithms, In Autonomous Agents and Multiagent Systems, Cham; Osman, N., Sierra, C., Eds.; Springer International Publishing, 2016; pp 14–29
Andrejczuk, E. Artificial intelligence methods to support people management in organisations. Ph.D. Thesis, Artificial Intelligence Research Institute, CSIC, May, 2018
Andrejczuk E, Berger R, RodríguezAguilar JA, Sierra C, MarínPuchades V (2018) The composition and formation of effective teams: computer science meets organizational psychology. Knowledge Eng. Review 33:e17
Bachrach, Y.; Meir, R.; Jung, K.; Kohli, P. Coalitional Structure Generation in Skill Games, In , 01, 2010; Vol. 2
Czatnecki, E.; Dutta, A. Hedonic Coalition Formation for Task Allocation with Heterogeneous Robots, In 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), 2019; pp 1024–1029
Präntare F, Heintz F (2020) An anytime algorithm for optimal simultaneous coalition structure generation and assignment. Autonomous Agents and MultiAgent Systems 34(1):29
Chalkiadakis G, Boutilier C (2012) Sequentially optimal repeated coalition formation under uncertainty. Autonomous Agents and MultiAgent Systems 24(3):441–484
O*NET. National Center for O*NET Development, www.onetonline.org/, 1990
SFIA. Skills Framework for the Information Age, https://ec.europa.eu/esco/portal, 2020
ISFOL. Professioni, Occupazione, Fabbisogni, https://fabbisogni.isfol.it, 2017
Li Y, Bandar ZA, Mclean D (2003) An approach for measuring semantic similarity between words using multiple information sources. IEEE Transactions on Knowledge and Data Engineering 15(4):871–882
Osman, N.; Sierra, C.; Mcneill, F.; Pane, J.; Debenham, J. Trust and Matching Algorithms for Selecting Suitable Agents, ACM Trans. Intell. Syst. Technol. 2014, 5 (1), 16:1–16:39
Cohen SG, Bailey DE (1997) What makes teams work: Group effectiveness research from the shop floor to the executive suite. Journal of management 23(3):239–290
Kurtan, C.; Yolum, P.; Dastani, M. An Ideal Team Is More than a Team of Ideal Agents, In ECAI, 2020
Sandholm, T.; Suri, S.; Gilpin, A.; Levine, D. Winner Determination in Combinatorial Auction Generalizations, In Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems: Part 1, New York, NY, USA; AAMAS ’02; Association for Computing Machinery, 2002; p 6976
GUROBI. Gurobi Optimizer 8.0, https://www.gurobi.com/
GLPK. GLPK: GNU Linear Programming Kit), https://www.gnu.org/software/glpk/
Gamrath, G.; Anderson, D.; Bestuzheva, K.; Chen, W.K.; Eifler, L.; Gasse, M.; Gemander, P.; Gleixner, A.; Gottwald, L.; Halbig, K.; et al. The SCIP Optimization Suite 7.0; Technical Report; Optimization Online, March, 2020
Morrison RW, De Jong KA (2002) Measurement of Population Diversity. In: Collet P, Fonlupt C, Hao JK, Lutton E, Schoenauer M (eds) Artificial Evolution, Berlin, Heidelberg. Springer, Berlin Heidelberg, pp 31–41
Palmero Aprosio, A.; Moretti, G. Tint 2.0: an Allinclusive Suite for NLP in Italian 2018, 10, 12
Joulin, A.; Grave, E.; Bojanowski, P.; Mikolov, T. Bag of Tricks for Efficient Text Classification, arXiv preprint arXiv:1607.017592016
Jones KS (1972) A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation 28:11–21
Zhou T, Wang Y, Zheng X (2020) Chinese text classification method using FastText and term frequencyinverse document frequency optimization 1693:012121
Conitzer, V.; Sandholm, T. Complexity of Manipulating Elections with Few Candidates, In Eighteenth National Conference on Artificial Intelligence, USA; American Association for Artificial Intelligence, 2002; p 314319
Faliszewski, P.; Hemaspaandra, E.; Hemaspaandra, L.A.; Rothe, J. Llull and Copeland Voting Broadly Resist Bribery and Control, In ; AAAI’07; AAAI Press, 2007; p 724730
Council of European Union. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation), 2016. https://eurlex.europa.eu/legalcontent/EN/TXT/?uri=CELEX%3A32016R0679 &qid=1639589889181
Proposal for a Regulation laying down harmonised rules on artificial intelligence (Artificial Intelligence Act), May, 2021. https://digitalstrategy.ec.europa.eu/en/library/proposalregulationlayingdownharmonisedrulesartificialintelligenceartificialintelligence
“HighLevel Expert Group on Artificial Intelligence”. Assessment List for Trustworthy Artificial Intelligence, July, 2020. https://digitalstrategy.ec.europa.eu/en/library/assessmentlisttrustworthyartificialintelligencealtaiselfassessment
Dignum, V.; Nieves, J.C.; Theodorou, A.; Tubella, A.A. An abbreviated assessment list to support the Responsible Development and Use of AI, April, 2021
Zillner, S. Innovation in Times of Big Data and AI: Introducing the DataDriven Innovation (DDI) Framework, Curry, E., Metzger, A., Zillner, S., Pazzaglia, J.C., García Robles, A., Eds.; Springer International Publishing: Cham, 2021; pp 289–310
Crowd4SDG. Crowdsourcing for Sustainable Development Goals: a Data Repository and a Toolbox for CitizenScience Communities. A Horizon 2020 Research and Innovation Action, https://crowd4sdg.eu/. Last Accessed: 20230501
YOMA. YOMA: an online marketplace for youth which provides opportunities to develop skills, engage with a community and access employment., https://www.yoma.africa/. Last Accessed: 20230501
Wilde, D.J.; et al. Teamology: the construction and organization of effective teams; Springer, 2009; Vol. 3
Moreno JL, Jennings HH (1938) Statistics of Social Configurations. Sociometry 1(3/4):342–374
BallesterosPerez P, GonzálezCruz MC, FernándezDiego M (2012) Human resource allocation management in multiple projects using sociometric techniques. International Journal of Project Management 30(8):901–913
Fiol CM (1994) Consensus, diversity, and learning in organizations. Organization science 5(3):403–420
Miller T (2019) Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence 267:1–38
Holmes, S.; Moorhead, A.; Bond, R.; Zheng, H.; Coates, V.; McTear, M. Usability testing of a healthcare chatbot: Can we use conventional methods to assess conversational user interfaces?, In Proceedings of the 31st European Conference on Cognitive Ergonomics, 2019; pp 207–214
Brooke, J. Sus: a “quick and dirty’usability, Usability evaluation in industry 1996, 189
Georgara, A.; RodríguezAguilar, J.A.; Sierra, C. Towards a CompetenceBased Approach to Allocate Teams to Tasks, In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, Richland, SC; AAMAS ’21; International Foundation for Autonomous Agents and Multiagent Systems, 2021; p 1504–1506
Funding
Open Access funding provided thanks to the CRUECSIC agreement with Springer Nature. The research leading to these results received funding from project AI4EU (H2020825619). Athina Georgara has received funding from Generalitat de Catalunya under Grant 2019DI17; and has received research support from company Enzyme Advising Group. Furthermore, partial financial support was received from projects: CISUSTAIN (PID2019104156GBI00); Crowd4SDG (H2020872944); TAILOR (H2020952215); VAE (TED2021131295BC31), funded by MCIN/AEI /10.13039/501100011033 and by the “European Union NextGenerationEU/PRTR; VALAWAI (Horizon Europe #101070930); and 2021 SGR 00754.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised: The author name of Prof. Aprosio was incorrectly misspelled. The correct name is “Aprosio”. However, we captured the “Approsio”.
Appendices
Appendices
Appendix A. Our questionnaire
In this appendix, we describe the questionnaire we used for evaluating the usability of our chatbot. This questionnaire consisted of the Chatbot Usability Questionnaire (CUQ) plus four open questions. CUQ is a questionnaire specifically designed for measuring the usability of chatbots by an interdisciplinary team from the Ulster University [63]. Chatbots are conversationdriven systems. The most popular usability metrics, such as the System Usability Scale (SUS) [64], may not be suitable for evaluating their usability. However, the CUQ is designed to be comparable to SUS. They are both based on the rating scale proposed in Table 7.
The sixteen questions in the CUQ are shown in Fig. 14.
The four open questions we entered in our questionnaire are the following:
 Q17.:

Was it easy to solve the task?
 Q18.:

What did you like about the chatbot?
 Q19.:

What did you not like about the chatbot?
 Q20.:

Do you have any suggestions for improving the chatbot?
Appendix B. Theorems and proofs
Theorem 6.5
The NonOverlapping Many Teams to Many Tasks (NOMTMT) allocation problem is \(\mathcal{N}\mathcal{P}\)hard.
Proof
We can decide whether a given solution is feasible in polynomial time (\(\mathcal {O}(\nu )\) where \(\nu = \sum _{\tau \in T} \textrm{s}_\tau \)). We now show that the problem is \(\mathcal{N}\mathcal{P}\)hardby using a reduction from the binary combinatorial auction winner determination problem (BCAWDP) for singleunit auctions, which is shown to be \(\mathcal{N}\mathcal{P}\)hard [40]. An instance of BCAWDP for singleunit auctions is given by a set of items \(Items = \{1,2,\cdots , n\}\), and a set of bids \(Bids = \{\textbf{B}_1,\textbf{B}_2,\cdots ,\textbf{B}_x\}\). A single bid \(\textbf{B}_j = (S_j,p_j)\) consists of a subset of items \(S_j \subseteq Items\), and a price \(p_j\ge 0\) that the buyer is willing to pay in order to get the items in \(S_j\). Given an instance of BCAWDP, we construct an instance of the NOMTMT allocation problem as follows: “For each item in Items we create a dummy agent. For each set of bids over \(\kappa \) items, we define the set \(\mathcal {B}_\kappa = \{\textbf{B}_j \in Bids : S_j=\kappa \}\). From \(\mathcal {B}_\kappa \), we create \(\mathcal {B}_\kappa \) tasks of size \(\kappa \). Moreover, given a bid \(\textbf{B}_j \in \mathcal {B}_\kappa \), its items \(S_j\) make a candidate team for all tasks of size \(\kappa \) with matching quality \(p_j\). In case there are more than one (e.g., \(\xi >1\)) bids over the very same items offering different prices, since \(\xi \le \mathcal {B}_\kappa \), we allow each distinct price be the matching quality of the set for exactly one task, while only the highest price can be the matching quality of the team for more than one tasks. Any set of \(\kappa \) items that do not appear in the bids, are assumed to have a zero matching quality.” Note that the above corresponds to an NOMTMT instance where the number of available agents are less than the number of required agents by all the tasks at hand, as described in [65]. The NOMTMT allocation problem has a feasible solution if and only if BCAWDP has a solution, which is shown to be \(\mathcal{N}\mathcal{P}\)hard. \(\square \)
Typically, the winner determination problem for combinatorial auctions can be cast and solved as a linear program.
Appendix C. Frontier of problem instance configurations that CPLEX can solve
The required time is average over five problem instances per combination \(\langle \)number of tasks, number of agents, team size\(\rangle \) (Table 8).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Georgara, A., Kazhamiakin, R., Mich, O. et al. The AI4Citizen pilot: Pipelining AIbased technologies to support schoolwork alternation programmes. Appl Intell 53, 24157–24186 (2023). https://doi.org/10.1007/s10489023047583
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489023047583