1 Introduction

Soccer, or football as it is commonly known, is an almost universally popular team sport [1]. Player selection is a crucial aspect of winning a game; a poor choice could cost the team the championship, or even millions of dollars, if the players underperform [1]. Determining the starting lineup is therefore an essential part of preparing for a football match. In order to win a game, coaches must consider various factors and select the best players for the designated lineup [2, 3]. The formation of a team is a critical decision for coaches in team sports, particularly in football. It involves determining the positional setup of the team at different moments of the match, and it is one of the most important tactical choices coaches make [4]. In a multiplayer game like football, the player selection and team composition process aims to choose the best player for each position and role [5]. During a football match, a team's performance can be evaluated based on the interaction between various technical, tactical, mental, and physical factors [6,7,8]. It is essential to consider numerous factors when selecting players and forming teams [9]. To excel in team sports, one must be able to select and organize the most productive players [10]. Consequently, player selection and team formation strategies are increasingly important in contemporary sports [11]. Researchers have suggested that the skills required for different positions in football should be determined based on the specific duties of each player on the field [12, 13].

A soccer squad typically has more than 20 players (excluding three goalkeepers), and the coach selects 10 of them to start each game [1]. Coaches need a high level of expertise and knowledge of their players in order to select the optimal group of players for each match and to position each player appropriately. This step is an important part of the coach's plan and a subject of football tactical analysis.

Currently, machine learning is utilized to solve numerous issues. In recent years, sports-related topics have captured the attention of researchers in this field. The majority of these studies have focused on predicting the results of matches. In the field of tactical analysis, football coaches and analysts are still confronted with a number of unresolved issues, and intelligent software is still required to address some of these problems. Intelligent systems provide new opportunities for professional sports [14].

From a temporal point of view, Vroonen et al. predicted players' potential by comparing their attributes to those of similar historical players [15].

Zhao et al. considered the player assessment as a multi-objective optimization problem and examined teams and football players by considering three attributes, one of which was their potential [16].

Multi-Criteria Decision-Making (MCDM) methods help researchers make decisions with multiple criteria in various fields [17, 18]. The AHP (Analytic Hierarchy Process) is used to rank player attributes for different positions, while TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) is used to rank alternative solutions. These methods have clear mathematical principles and have been widely adopted. Building on them, a wide range of variants has been developed, one of which is TOPSIS-IPA [19].

These earlier solutions are limited because they cannot handle complex interactions and are computationally expensive. Recent works have used neural architectures to solve this problem by mapping skills to experts [20]. Uzochukwu and Enyindah found that neural network models perform promisingly as the basis of a player selection model [21]. Additionally, neural network models were shown to be capable of distinguishing forward players [22].

The player selection and team formation problem has also been investigated using fuzzy logic [1]. That study examines these issues and introduces weights for the effectiveness of each player characteristic at three positions: forward, midfielder, and defender. Another study identifies the best players for the forward, midfielder, and defender positions in a specified lineup by combining the weights obtained by fuzzy logic with machine learning techniques [9]. It does not differentiate between positions located in the same area; for instance, it presents a group of defensive line players without distinguishing between right and left defenders. Therefore, the coach remains in charge of positioning the players on the field. That article demonstrates the importance of the weights obtained by fuzzy logic in addressing this issue.

Another point of view is using GPS data to address the issue of player positioning on the field and to determine the optimal position [23]. This approach produces favorable results for some positions but is ineffective for others.

To solve the team formation problem, Karakaya et al. applied machine learning methods to data obtained from the Internet of Things [24].

Yu et al. used a network structure to model the communication between players to provide the appropriate team formation [25].

García-Aliaga et al. obtained interesting results on determining players' positions from their technical-tactical characteristics using dimensionality reduction methods and rule extraction algorithms [26].

A study based on graph theory uses the Hungarian algorithm [27], which solves the maximum-weight bipartite matching problem in polynomial time, to determine the optimal lineup for a baseball team [28]. In graph theory, maximum bipartite matching applies to problems in which a group of applicants (such as job candidates) is matched with a number of available positions (such as job openings) in such a way that each position is filled by one candidate and the total profit from the matching is maximized. Baseball experts developed a scoring system that is used to evaluate each player's skill at each position.

The football problem can be solved by converting the problem space into a bipartite graph, as described. As far as we know, this method has never been utilized in football analysis.

In the Materials and methods section, we explain how we obtained the data and what the research steps are. In the Dataset subsection, the dataset source and its preparation steps are described. In the Methodology subsection, the research ideas implemented in this article are explained in three consecutive steps. In the Results section, the results related to the steps stated in the Methodology subsection are presented and reviewed in three subsections. In the Discussion section, the introduced model for scoring the players is compared with some classic machine learning models (linear regression, support vector regression, and random forest), and the ability of ITFPS to select the starting lineup is examined. In the Conclusion section, we summarize the study, state the limitations of the research, and point out future research ideas.

2 Materials and methods

2.1 Dataset

This study utilized the sofifa dataset, which is accessible and published annually [29]. This dataset includes the majority of adult football players in the world, along with their scores in various soccer-related skills (such as shooting, passing, and ball control), information on the player's contract, etc. FIFA experts (scouts) collected the features in the dataset. They carefully monitor the performance of the players throughout the season and assign a score between 1 and 100 to each player's distinct features. For instance, the score assigned to a player's dribbling ability is based on the number of dribbles he/she has executed during the relevant season. We chose this dataset because it is renowned for accurately capturing classic football player features, and other datasets frequently lack the same depth and scope. Its large number of players from around the world improves the generalizability and practical applicability of the models trained on it. The dataset has a total of 110 columns. Since the purpose of this research is to evaluate players solely on the basis of their physical and technical abilities and characteristics, we only considered variables related to the player's technical abilities and physical attributes. 44 of the 110 variables in this dataset are associated with technical skills, player age, player height, and player weight. The other variables were omitted. It should be noted that only male players are considered in this study.

One can access and download the archive of this dataset from the 2014–2015 season to the 2021–2022 season [30]. The dataset associated with the 2016–2017 season and later has been used in this study. Since 2016, several advantageous variables that were not included in the dataset in previous years have also been added. Each player in this dataset is assigned a club position for each season that corresponds to the position in which he/she appeared most frequently during that season. There are 29 positions like this: LWB, LB, LCB, CB, RCB, RB, RWB, LM, LCM, CM, RCM, RM, LDM, CDM, RDM, LAM, CAM, RAM, LW, RW, LS, ST, RS, LF, CF, RF, GK, RES, SUB (Left Wing Back, Left Back, Left Center Back, Center Back, Right Center Back, Right Back, Right Wing Back, Left Midfielder, Left Center Midfielder, Central Midfielder, Right Center Midfielder, Right Midfielder, Left Defensive Midfielder, Central Defensive Midfielder, Right Defensive Midfielder, Left Attacking Midfielder, Central Attacking Midfielder, Right Attacking Midfielder, Left Winger, Right Winger, Left Striker, Striker, Right Striker, Left Forward, Central Forward, Right Forward, Goalkeeper, Reserve, Substitute). It should be noted that the positions of a few players have not been documented. This study did not include goalkeepers, substitutes, reserves, and players whose positions were not recorded.

Five positions (LF, CF, RF, LAM, and RAM) each had fewer than 100 players worldwide in the dataset combined from six consecutive seasons. This is a result of the change in football position terminology that has taken place in recent years. After omitting these positions from this study, we reached a total of 21 standard positions. Since a player may appear in the dataset over multiple years, his/her most recent season is used as a benchmark. Consequently, the final dataset includes data from 18,034 players and 48 attributes. Table 1 shows the names and brief descriptions of the variables used in this study.

Table 1 Descriptions of variables extracted from the FIFA dataset

As shown in Table 1, the dataset contains a variable titled 'overall', which is calculated by EA Sports and assigns each player a score out of 100 indicating how well-suited the player is to play at his/her club position. Additionally, in separate columns of the original dataset, each player has been assigned a score out of 100 indicating his/her suitability for each of the other positions. In the Results section, we will contrast these columns with the ITFPS results.

2.2 Methodology

Using deep neural network models, we determine the suitability of each player for each position. Using the Hungarian method, the optimal team formation is then determined. On its own, this method is rigid, because a football club usually has more than 20 players but only 10 of them are used in the optimal formation. The third step addresses this issue by designing a System of Distinct Representatives (SDR) based on the outcome of the Hungarian method. SDR is a concept in set theory that deals with selecting representatives from a collection of sets. Given a collection of sets, the SDR problem seeks a representative for each set such that no two representatives are identical. Hall's theorem, also referred to as the marriage theorem, provides a necessary and sufficient condition for the existence of an SDR [31]: an SDR exists for a collection of sets if and only if Hall's condition holds, which states that for each sub-collection of the sets, the number of elements in their union is greater than or equal to the number of sets in the sub-collection. The SDR and Hall's theorem have applications in numerous fields, such as combinatorics, graph theory, and matching problems, and provide a theoretical basis for solving problems involving the selection of distinct representatives from sets. Using the results of the Hungarian method, we present a group of suitable candidates for each position; an SDR of these groups can be considered for determining the coach's team formation. Compared to the maximum matching approach, this approach has the advantages of being less rigid, more realistic, and making full use of the team's capabilities. Typically, Hall's condition must be checked to demonstrate that an SDR can be obtained for a set of groups [31]. Since the output of the Hungarian algorithm is itself an SDR of these groups, Hall's condition does not need to be examined, and an SDR is always available.
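As a small illustration of Hall's condition, the following sketch checks by brute force whether a family of candidate sets admits an SDR. The function name `satisfies_hall` and the player and position labels are hypothetical, and the exhaustive check over sub-collections is exponential, so it is only meant for toy inputs.

```python
from itertools import combinations


def satisfies_hall(candidate_sets):
    """Return True if every sub-collection of k sets has a union of at least k elements."""
    names = list(candidate_sets)
    for k in range(1, len(names) + 1):
        for group in combinations(names, k):
            union = set().union(*(candidate_sets[name] for name in group))
            if len(union) < k:
                return False
    return True


# Hypothetical candidate sets: position -> players who could fill it.
feasible = {"LB": {"p1", "p2"}, "CB": {"p2", "p3"}, "RB": {"p1", "p3"}}
# Three positions share only two players, so no SDR can exist.
infeasible = {"LB": {"p1", "p2"}, "CB": {"p1", "p2"}, "RB": {"p1", "p2"}}

print(satisfies_hall(feasible))    # True
print(satisfies_hall(infeasible))  # False
```

In the second family, the union of the three sets contains only two players, which violates the condition for the sub-collection made of all three positions.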

Several metrics are established to evaluate the performance of ITFPS. We compare and evaluate the results of ITFPS with the decisions made by the coaches of Manchester City, Liverpool, Newcastle United, and Leeds United in the English Premier League during the 2021–2022 season. We think there are significant similarities between ITFPS and the choices made by well-known and respected coaches.

ITFPS is made up of three main steps, which are each explained in more detail below.

2.2.1 First step: Allocating scores for each position to a player

In this phase, we assign each player a score between 1 and 100 based on his/her suitability for each football position, using the player's characteristics. In the dataset, each player's overall score is calculated using a linear relationship between his/her skill and characteristic scores [32]. Clearly, the weights in this linear relationship depend on the position. To our knowledge, these weights have never been explicitly reported by position. It is important to note that factors other than the physical and technical ability of the players, such as their global fame, are considered in calculating these scores [32]. We omitted these variables because we wanted to be able to calculate the scores for each player across all positions based only on technical and physical attributes. The objective of this step is to generate, for each player, a vector of 21 scores, one per position. There must be a significant difference between a player's scores at the position he/she plays and at the positions he/she never plays.

For each position, 80% of the players are used as training data and the remaining 20% as testing data. The numbers of players used in training and testing are detailed in Table 2.

Table 2 Number of players in training and testing by position

As shown in Table 2, the amount of data varies significantly at various positions. Certain positions are only utilized in lineups that are not chosen frequently.

We employ deep neural network modeling to estimate the scores. In the Discussion section, linear regression, support vector machine, and random forest models are discussed, along with the reasons for not using them.

Three of the player characteristics (body_type, work_rate, and preferred_foot) are categorical variables that are entered into the model using one-hot encoding. They generate 9, 8, and 1 binary indicator variables, respectively. The other variables are ordinal or numeric.
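The encoding step can be sketched as follows. This is a minimal illustration and not the preprocessing code used in the study: the helper `one_hot`, the `drop_first` option, and the category labels are assumptions, and the text does not state how the reference level of each variable was handled.

```python
def one_hot(value, categories, drop_first=False):
    """Encode `value` as 0/1 indicators, one per category (optionally dropping the first)."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    kept = categories[1:] if drop_first else categories
    return [1 if value == category else 0 for category in kept]


# Hypothetical category levels; the real dataset may use different labels.
FOOT_LEVELS = ["Left", "Right"]
WORK_RATE_LEVELS = ["Low/Low", "Medium/Medium", "High/High"]

# Two levels with the first one dropped give a single indicator.
print(one_hot("Right", FOOT_LEVELS, drop_first=True))  # [1]
print(one_hot("Left", FOOT_LEVELS, drop_first=True))   # [0]
print(one_hot("Medium/Medium", WORK_RATE_LEVELS))      # [0, 1, 0]
```

A two-level variable such as preferred_foot needs only one indicator, which is consistent with the single binary variable reported above.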

In this study, the deep neural networks had the following general structure: an input layer with 59 neurons, three hidden layers with 30, 20, and 10 neurons respectively, and an output layer with a single neuron that returns an estimate of the player's suitability score. The networks were trained using the Adam optimizer and the Mean Squared Error (MSE) loss function. 20% of the training data for each position was allocated to validation. By default, we trained each network for 150 epochs, but for some positions we needed more epochs or even more layers. As previously mentioned, this is due to the variation in the number of players at different positions, because the number of samples has a significant effect on the accuracy of neural network models. The changes in model details across the positions are shown in Table 3. We made these hyperparameter choices based on empirical testing and experimentation, taking into account the characteristics of the dataset and the objective of obtaining accurate position suitability scores. To optimize the performance of the models, the number of neurons in the hidden layers and the number of training epochs were fine-tuned.
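To make the default architecture concrete, the sketch below builds a fully connected 59-30-20-10-1 network in plain Python, runs one forward pass, and counts the trainable parameters. The layer sizes come from the text; the ReLU hidden activation, the linear output, the random initialization, and all function names are assumptions made for illustration, and no training is performed.

```python
import random

LAYER_SIZES = [59, 30, 20, 10, 1]  # input, three hidden layers, output (from the text)


def init_layers(sizes, seed=0):
    """Create random (weights, biases) pairs for each pair of consecutive layers."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, features):
    """Forward pass: ReLU on hidden layers (an assumption), linear output."""
    activations = features
    for index, (weights, biases) in enumerate(layers):
        outputs = [
            sum(w * a for w, a in zip(row, activations)) + b
            for row, b in zip(weights, biases)
        ]
        is_hidden = index < len(layers) - 1
        activations = [max(0.0, o) for o in outputs] if is_hidden else outputs
    return activations[0]


def count_parameters(sizes):
    """Number of trainable weights and biases in a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


layers = init_layers(LAYER_SIZES)
score = forward(layers, [0.5] * 59)   # one untrained prediction for a dummy player
print(count_parameters(LAYER_SIZES))  # 2641
```

With only 2,641 parameters per position model, the network is small, which fits the observation that positions with few players still need extra epochs or layers to reach a low error.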

Table 3 Number of epochs and hidden-layer details for the different position models

2.2.2 Second step: allocating the best formation of players for a team based on a specific lineup

Using the preceding step, we can determine the player scores for each of the 21 positions on a team. This step aims to determine the optimal formation for a team based on the starting lineup selected by the coach and the player-by-player score vector obtained from the previous step.

To achieve this, the problem is reformulated within the field of graph theory. Assume the coach of the team decides to use a specific lineup for a match. In our method, the problem is formulated by constructing a complete bipartite graph. This graph contains two sets of vertices, set X representing players and set Y representing coach-designated positions. Each vertex (player) in set X is connected to each vertex (position) in set Y by a weighted edge, with edge weights determined by the neural network model's scores for the player's suitability for each position. In set Y, there are ten positions, whereas the number of players in set X is variable.

Our objective is to find a maximum-weight matching in this bipartite graph, in which one player is assigned to each position so as to maximize the sum of the edge weights. This ensures that each position is filled by an appropriate player and that no player is used twice.

We use the Hungarian algorithm, a method for solving the assignment problem, to find the optimal matching in this weighted bipartite graph in polynomial time. The assignment problem involves assigning tasks to workers while minimizing the total cost or, equivalently, maximizing the total profit; in bipartite graphs, the algorithm is frequently described in terms of finding a maximum-weight matching.

The algorithm starts with a cost matrix representing task-worker relationships. It adjusts the matrix values to create zeros, iteratively marks rows and columns, and determines the minimum number of lines needed to cover all zeros in the matrix. This process continues until a complete assignment among the zero entries is possible, which yields a minimal-cost assignment. The algorithm guarantees an optimal solution and is widely used in optimization.

Since the number of players is not less than 10, the Hungarian algorithm can identify a maximum-weight matching that covers every position vertex.
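The second step can be sketched in plain Python as follows. This is an illustrative implementation of the potential-based O(n²m) variant of the Hungarian algorithm rather than the code used in the study; the function name `max_weight_assignment` and the toy scores are assumptions. Scores are negated so that a minimum-cost assignment corresponds to a maximum-score one. Library routines such as SciPy's `linear_sum_assignment` solve the same problem.

```python
def max_weight_assignment(score):
    """Assign one distinct column (player) to every row (position), maximizing total score.

    `score[i][j]` is the suitability of player j for position i; rows <= columns is required.
    Returns a list `player_of` where player_of[i] is the column chosen for row i.
    """
    n, m = len(score), len(score[0])
    assert n <= m, "need at least as many players as positions"
    INF = float("inf")
    # Negate scores so that the minimum-cost solution is the maximum-score one.
    cost = [[-value for value in row] for row in score]
    u = [0.0] * (n + 1)    # row potentials
    v = [0.0] * (m + 1)    # column potentials
    match = [0] * (m + 1)  # match[j] = row matched to column j (0 = free), 1-indexed
    way = [0] * (m + 1)    # previous column on the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        min_slack = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    slack = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if slack < min_slack[j]:
                        min_slack[j], way[j] = slack, j0
                    if min_slack[j] < delta:
                        delta, j1 = min_slack[j], j
            for j in range(m + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    min_slack[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        while j0:  # flip the augmenting path back to the virtual column 0
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    player_of = [None] * n
    for j in range(1, m + 1):
        if match[j]:
            player_of[match[j] - 1] = j - 1
    return player_of


# Hypothetical scores: rows are positions, columns are players.
POSITIONS = ["LB", "CM", "ST"]
PLAYERS = ["A", "B", "C", "D"]
SCORE = [
    [80, 75, 60, 79],  # LB
    [70, 78, 72, 77],  # CM
    [60, 65, 85, 70],  # ST
]
assignment = max_weight_assignment(SCORE)
print({POSITIONS[i]: PLAYERS[j] for i, j in enumerate(assignment)})
# {'LB': 'A', 'CM': 'B', 'ST': 'C'}
```

In this toy squad, player D is left out of the optimal formation even though D scores within one point of the selected player at two positions, which is the rigidity that the third step addresses.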

2.2.3 Third step: designing a System of Distinct Representatives (SDR) for a football team

The sole purpose of maximum weighted matching in the previous section was to identify the most effective formation(s). Consequently, only a portion of the team's players are used, leaving a number of players on the bench. However, because many players are capable of playing in multiple positions, coaches frequently alter their formation from match to match rather than relying on a set group of players in fixed positions. We therefore incorporate this into our methodology, so that ITFPS not only assigns players to the positions of a lineup with high performance but also uses the full potential of the squad. To this end, we introduce, for each position, a group of players with high performance at that position. In other words, rather than suggesting a single candidate for a position, we present a group of players as suitable candidates.

Using these groups, the coach can then select and assign players to the positions of his/her desired lineup. The coach can use this method to prevent player deterioration and injury. Also, if a player gets hurt during a game or training, the coach can choose a good replacement from the list of candidates.

To achieve this objective, we employ the SDR concept. First, the Hungarian algorithm is applied, and a player is selected for each position. To construct the set of candidates for a position, we consider every player whose score for that position is greater than or equal to the score of the player selected by the Hungarian algorithm minus θ, where θ is the level of leniency. Consequently, the candidate set for each position is defined as follows:

$${Candidates\;Set}_{position}=\left\{player \;\middle|\; {Score}_{position}(player) \ge {Score}_{position}(selected\;player)- \theta \right\}$$
(1)

Based on Eq. 1, the list of candidates for each position comprises all players whose score for that position is either higher than that of the player selected by the Hungarian algorithm or no more than θ units lower. A larger θ makes it easier for a player to be included in the candidate set of a position.
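Eq. 1 translates directly into a set comprehension. In the sketch below, the function name `candidate_sets`, the nested-dictionary layout, and the scores are hypothetical and only serve to show the effect of θ.

```python
def candidate_sets(score, selected, theta):
    """Eq. 1: players within `theta` points of the selected player's score, per position.

    `score[position][player]` is a suitability score; `selected[position]` is the player
    chosen for that position by the matching step.
    """
    result = {}
    for position, chosen in selected.items():
        threshold = score[position][chosen] - theta
        result[position] = {
            player for player, value in score[position].items() if value >= threshold
        }
    return result


# Hypothetical scores for two positions and four players.
SCORE = {
    "LB": {"A": 80, "B": 75, "C": 60, "D": 79},
    "ST": {"A": 60, "B": 65, "C": 85, "D": 83},
}
SELECTED = {"LB": "A", "ST": "C"}

print(candidate_sets(SCORE, SELECTED, theta=1))
print(candidate_sets(SCORE, SELECTED, theta=3))
```

With θ = 1, player D joins A as a candidate for LB while C remains the only candidate for ST; with θ = 3, D also enters the ST set. The selected player always belongs to his/her own candidate set.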

Given that the output of the Hungarian algorithm is an SDR for these candidate sets, it is unnecessary to test Hall's condition for the existence of an SDR. The coach can now select a different SDR from these sets based on the available players and on the opponent's positions and strategies for a specific match. The coach may also need a player for a specific position who, in addition to being qualified for that position, possesses a special quality (for example, high ability on free kicks). The coach can then evaluate the members of the candidate set and choose the player with the highest score for this feature.
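One way to obtain an alternative SDR, for example when a player is unavailable, is an augmenting-path search (Kuhn's algorithm). The sketch below is an assumption about how such a selection could be automated and is not a procedure described in the study; `find_sdr`, the `unavailable` argument, and the candidate sets are hypothetical.

```python
def find_sdr(candidate_sets, unavailable=frozenset()):
    """Return {position: player} with distinct players, or None if no SDR exists.

    Uses augmenting paths (Kuhn's algorithm); this is one standard way to build an SDR,
    not necessarily the procedure used in the paper.
    """
    holder = {}  # player -> position currently represented by that player

    def try_assign(position, visited):
        for player in sorted(candidate_sets[position] - set(unavailable)):
            if player in visited:
                continue
            visited.add(player)
            # Take a free player, or re-route the position that currently holds this one.
            if player not in holder or try_assign(holder[player], visited):
                holder[player] = position
                return True
        return False

    for position in candidate_sets:
        if not try_assign(position, set()):
            return None
    return {position: player for player, position in holder.items()}


SETS = {"LB": {"A", "D"}, "CM": {"B", "D"}, "ST": {"C", "D"}}
print(find_sdr(SETS))                          # {'LB': 'A', 'CM': 'B', 'ST': 'C'}
print(find_sdr(SETS, unavailable={"A"}))       # LB falls back to D
print(find_sdr(SETS, unavailable={"A", "B"}))  # None: D cannot cover both LB and CM
```

When too many players are unavailable, the remaining sets no longer satisfy Hall's condition and the search reports that no SDR exists.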

In Fig. 1, the entire algorithmic process of ITFPS is displayed as a flowchart to provide a better understanding of why each step of the algorithm is necessary and how it relates to the other steps.

Fig. 1 Flowchart of the necessary steps in ITFPS. Step One) Neural network models are used to assess the suitability of each player for different positions based on their individual features. The model-generated score measures each player's aptitude for a particular position. Step Two) Using the scores obtained from the models, a bipartite graph is constructed, with one set of nodes representing the players and the other set representing the positions. The edges between the two sets are weighted by each player's suitability for each position. By obtaining the maximum-weight matching in this graph, the best team formation is obtained, taking into account all players on the team and the suitability of each player for different positions. Step Three) Once the best team formation has been determined, a set of suitable players for each position is offered to the coach. The coach should then choose one player from the presented set for each position, taking into account his/her strategies for the match.

3 Results

In this section, the strategies employed by the coaches of Manchester City, Liverpool, Newcastle United, and Leeds United are compared to the results of ITFPS for each of the three steps outlined in the previous section, and relevant metrics are introduced.

3.1 Results of the first step

MSE is the loss function used to evaluate the performance of the overall score prediction models. Table 4 reports the error of the models on the test data.

Table 4 Loss (MSE) on the test data of the models that predict the overall score for each position

The CM position has the highest error at 4.54, while the LCB position has the lowest error at 0.84. The models reached a low error very quickly; for example, after approximately 40 epochs, the RCM position model achieved a very low error, as depicted in Fig. 2. Due to the linear relationship between variables and scores, the models can learn rapidly.
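For reference, the MSE over the n test players of a position can be written as follows, where y_i denotes the reference overall score of player i and ŷ_i the model's prediction (symbols introduced here for illustration only):

$$MSE=\frac{1}{n}\sum_{i=1}^{n}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}$$

Taking square roots puts these errors back on the 1 to 100 score scale: the highest error corresponds to a root-mean-square error of about √4.54 ≈ 2.13 points, and the lowest to about √0.84 ≈ 0.92 points.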

Fig. 2 RCM position model loss function by epoch

3.2 Results of the second step

In the following two subsections, we compare coaches and algorithms from two perspectives:

  1. Team formation
  2. Player selection

"Team formation" refers to the method used to assign football players to positions. The term "player selection" refers to the selection of 10 players from the squad for a match.

3.2.1 Team formation

We considered four English Premier League teams (Manchester City, Liverpool, Newcastle United, and Leeds United) in the 2021–2022 season and applied ITFPS to every match each team played during the season in order to compare the results of the proposed method with the arrangements of the coaches. Manchester City and Liverpool were led by Pep Guardiola and Jürgen Klopp, respectively. During this season, both Newcastle United and Leeds United changed their coaches. Stephen Roger Bruce, Graeme Jones, and Eddie Howe coached 8, 3, and 27 games for Newcastle United, respectively. Marcelo Alberto Bielsa Caldera and Jesse Marsch coached 26 and 12 games for Leeds United, respectively.

We treated the first and second coaches of Newcastle United as a single coach due to their limited number of matches in charge. The selection of these four Premier League teams—representing the top, middle, and bottom tiers of the final ranking table—was deliberate in order to achieve a comprehensive and diverse evaluation of our newly developed algorithm. These teams were tracked for the duration of the Premier League season, a total of 152 matches (38 for each team).

The following comparison was made between the ITFPS results and the decisions made by the coaches of the four teams.

For each match, the ITFPS arrangement was derived by applying ITFPS to the lineup and the players chosen by the coach. We then compared this proposed formation with the one the coach used during that match.

The set of 10 players participating in each match is considered one vertex set of a bipartite graph, while the set of positions within a line-up is considered the second vertex set. The optimal arrangement of players is then determined using the second step of the methodology. In other words, the coach selects not only the starting lineup but also the players of the match. In this step, we will evaluate if the ITFPS outputs are similar to the coaches' arrangements. To achieve this, we compare the optimal arrangement generated by ITFPS to the arrangement created by the coach.

To determine how closely the arrangement of a coach resembles the arrangement of ITFPS, we define the Similarity metric in Eq. 2. The Similarity metric is intended to measure the degree of similarity between the model's output and the coach's decisions regarding player positioning. Summing Eq. 2 over the ten positions of a match gives the number of players assigned to the same position by both the algorithm and the coach.

$${Similarity}_{position}\left(Coach,\,Algorithm\right)=\left\{\begin{array}{c}1;\;{coach}_{position}={ITFPS}_{position}\\0;{\;coach}_{position}\neq{ITFPS}_{position}\end{array}\right.$$
(2)
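As a minimal sketch of how the per-match score follows from Eq. 2, the function below sums the indicator over positions. The dictionary layout (position to player) and the three-position excerpt are hypothetical.

```python
def similarity(coach, itfps):
    """Sum Eq. 2 over positions: count positions given to the same player by both."""
    return sum(1 for position, player in coach.items() if itfps.get(position) == player)


# Hypothetical arrangements of the same players in a three-position excerpt.
COACH = {"RB": "A", "RCB": "B", "ST": "C"}
ITFPS = {"RB": "B", "RCB": "A", "ST": "C"}

print(similarity(COACH, ITFPS))  # 1
```

Only the ST assignment agrees here; the two defenders are swapped between neighboring positions and therefore contribute nothing to the score.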

Figure 3 depicts the outcomes of the Similarity metric. The average Similarity score varies considerably between the coaches. Whether the averages are equal was examined using the analysis of variance (ANOVA) test, a statistical test used to compare the mean of a feature among more than two independent groups (p-value ≈ 2e-16). This indicates that the similarity between the algorithm's outputs and the coaches' decisions varied across the four teams. Figure 3 demonstrates that the arrangements of Pep Guardiola and Jürgen Klopp match the ITFPS results better, with a median score of 6. The median Similarity score of the other coaches is between 3 and 4.
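The quantity behind a one-way ANOVA is the F statistic, the ratio of between-group to within-group variance. The sketch below computes it in plain Python on made-up scores that are not the study's data; the function name `one_way_anova_f` is hypothetical. Converting F to a p-value requires the F distribution, which statistical libraries provide (for example `scipy.stats.f_oneway`).

```python
def one_way_anova_f(groups):
    """Return the F statistic and its degrees of freedom for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up per-match Similarity scores for three coaches (not the paper's data).
SCORES = [[6, 7, 5, 6], [6, 5, 7, 6], [3, 4, 3, 2]]
f_stat, df_b, df_w = one_way_anova_f(SCORES)
print(round(f_stat, 2), df_b, df_w)  # 18.0 2 9
```

A large F value, as in this toy example, means that the group means differ by much more than the spread within each group would explain.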

Fig. 3 Box plot of the outcomes of the Similarity scoring for four teams

Figure 3 illustrates how the best and most renowned coaches tend to organize their players in accordance with ITFPS. T-tests, which test the equality of the mean of a feature in two independent groups, were used for pairwise comparisons. A t-test comparing the average Similarity of the first and second coaches of Leeds United indicates no statistically significant difference between the two coaches (p-value = 0.36). A similar test for the first and second coaches of Newcastle United also indicates no significant difference (p-value = 0.64). Likewise, a t-test between the coaches of Manchester City and Liverpool shows no significant difference (p-value = 0.89). According to the introduced Similarity scoring, then, the coaches of these two teams exhibit comparable behavior. In general, the ITFPS arrangements are more similar to those of the famous and successful coaches, which suggests that these coaches have greater knowledge and awareness of their players' abilities. The sizes of the boxes in Fig. 3 indicate the spread of each coach's choices; larger boxes therefore represent a greater variety of coaching options.

It is essential to note, however, that this metric is strict and does not account for minor positional adjustments players may make during a game. For example, if a coach assigns two players to the positions of right defender and right central defender, and the model assigns the same two players to these two positions in the opposite order, we consider these choices to be very similar due to the proximity of the positions and the interchangeability of the players. However, the Similarity metric does not capture this similarity.

3.2.2 Player selection

For a more detailed comparison, we compare, for each lineup, the optimal team formation produced by ITFPS, in which all players of the squad are considered for assignment to the positions of the given lineup, with the players assigned to those positions by the coach. This allows us to compare the player selection made by ITFPS to that of the coaches. In Eq. 3, set A represents the group of players included in the ITFPS arrangement, while set B represents the group of players selected by the coach for the match. The size of the intersection of these two sets indicates how similarly ITFPS and the coach select players. The Intersection metric disregards player positions and instead emphasizes the similarity in player selection between the model and the coach: by comparing the set of players selected by the model for the lineup to the set of players used by the coach in the game, we determine the number of players shared by both sets.

$$Intersection(A, B)=|A\cap B|$$
(3)
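Eq. 3 is a plain set operation, as the short sketch below shows; the player labels are hypothetical and shortened to four players per side for readability.

```python
def intersection(itfps_players, coach_players):
    """Eq. 3: number of players chosen by both ITFPS and the coach."""
    return len(set(itfps_players) & set(coach_players))


A = {"p1", "p2", "p3", "p4"}  # hypothetical ITFPS selection
B = {"p2", "p3", "p4", "p5"}  # hypothetical coach selection
print(intersection(A, B))  # 3
```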

Figure 4 depicts the box plot of the results of this metric. An ANOVA test was performed to determine whether the mean of this metric was the same for all coaches; the results indicated that it was not (p-value ≈ 2e-16). According to the chart, the coaches of Newcastle United rank lower than the other coaches. The coaches of Manchester City and Liverpool and the second coach of Leeds United are in a better position than the other coaches, with a median score of 7.

Fig. 4 Box plot of the outcomes of the aforementioned intersection similarity scoring for four teams

As mentioned previously, ITFPS uses only the 10 players of its optimal formation for this comparison, so the Intersection metric is rather rigid. In addition, coaches may choose not to field all of their elite players in less important matches. Nevertheless, this metric addresses the limitation of the Similarity metric and offers insight into player selection strategies on a broader scale.

3.3 Results of the third step

3.3.1 Player selection

To compare the performance of the coaches with ITFPS in terms of player selection, we developed the Accordance metric, which evaluates the congruence between the coaches' and the model's player arrangements. This metric depends on a set of candidates for each position: for each position, we assign 1 point of accordance if the player selected by the coach for that position is a member of the candidate set for that position, and 0 points otherwise. Summed over positions, it indicates the number of players the coach selected from the candidate sets of their respective positions. The metric thus measures how efficiently the coach uses the available sets for each position. In other words, a coach obtains the maximum score by choosing an SDR from the sets that match the positions. The metric is defined in Eq. 4.

$${Accordance}_{position}\left(Coach, Algorithm\right)=\left\{\begin{array}{c}1;\;player\in{Candidates\;set}_{position}\\0;\;player\not\in{Candidates\;set}_{position}\end{array}\right.$$
(4)
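The per-position score of Eq. 4 and its sum over a lineup can be sketched as follows. The data layout (dictionaries keyed by position), the function name, and the three-position example are assumptions made for illustration; the candidate sets themselves would come from the third step of ITFPS.

```python
def accordance(coach_lineup, candidate_sets):
    """Sum of per-position Accordance scores (Eq. 4).

    coach_lineup:   dict mapping position -> player fielded by the coach
    candidate_sets: dict mapping position -> set of candidate players
    """
    return sum(
        1 if player in candidate_sets.get(position, set()) else 0
        for position, player in coach_lineup.items()
    )


# Hypothetical three-position example.
coach_lineup = {"LB": "P1", "CDM": "P2", "ST": "P3"}
candidate_sets = {"LB": {"P1", "P4"}, "CDM": {"P5"}, "ST": {"P3", "P6"}}

print(accordance(coach_lineup, candidate_sets))  # 2 (LB and ST match)
```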

The value of θ is set to either 1 or 3 in this study. Larger values of θ were also investigated, but the results were insignificant. The results of the Accordance metric are displayed in Table 5, which reports the minimum, first quartile, median, third quartile, maximum, and mean values of the metric for each θ value in separate columns. The maximum value of each column is colored blue for θ = 1 and green for θ = 3.

Table 5 Minimum, first quartile, median, third quartile, maximum and standard deviation values for the Accordance metric between the coach and ITFPS for the values of 1 and 3 for θ (1 and 2 subscripts mean first and second coach respectively)

The assumption that the mean of the Accordance metric is equal across coaches was examined with an ANOVA test. For θ = 1, the p-value is 2e-16, and for θ = 3, the p-value is 1.15e-07. Therefore, the assumption of equality of means is rejected at a type I error level of 0.05.

Notably, the value of this metric is affected by the θ parameter, which determines the level of strictness when creating sets. In our study, we reported results for θ values of 1 and 3, as larger values of θ produced candidate player sets with less distinction.

The aim of ITFPS is to select the optimal lineup for a team by identifying an SDR from a finite family of sets that represent the available players. Table 5 shows that when θ is equal to 1, Manchester City and Liverpool have the higher Accordance values. When θ is equal to 3, Leeds United (second coach) and Liverpool have the highest and second-highest Accordance values, respectively. These results suggest that these three teams are in a favorable position in terms of selecting the best possible lineup for their games. By identifying an SDR for their respective sets, these teams have been able to select a set of players that can help them achieve their goals.

3.3.2 Team formation and player selection

To make the previous metric richer and more accurate, we introduce the Ratio metric (Eq. 6), which simultaneously considers both the player selection and the player arrangement tasks. It divides the Accordance metric by the number of players shared by the coach's picks and the union of all candidate sets for the selected positions. To this end, we first define the knowledge of the coach in Eq. 5:

$${Knowledge}_{coach}=|M\cap \bigcup_{position}{Candidates\;Set}_{position}|$$
(5)

where M is the set of players selected by the coach for the match. We call it the knowledge of the coach because it indicates how well the coach knows his or her players' abilities. The Ratio metric for a match is then defined in Eq. 6:

$$Ratio= \frac{\sum_{position}{Accordance}_{position}(Coach,\,Algorithm)}{{Knowledge}_{coach}}$$
(6)
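Eqs. 5 and 6 can be sketched together as below. The function names and the example data are hypothetical, and returning None when the denominator is zero is an assumption of this sketch, since the text does not say how that case is handled.

```python
def knowledge(coach_lineup, candidate_sets):
    """Eq. 5: coach's players that appear in at least one candidate set."""
    all_candidates = set().union(*candidate_sets.values())
    return len(set(coach_lineup.values()) & all_candidates)


def ratio(coach_lineup, candidate_sets):
    """Eq. 6: total Accordance divided by Knowledge (None if Knowledge is 0)."""
    total_accordance = sum(
        player in candidate_sets.get(position, set())
        for position, player in coach_lineup.items()
    )
    k = knowledge(coach_lineup, candidate_sets)
    return total_accordance / k if k else None


# Hypothetical example: all three players are candidates somewhere,
# but only the LB is fielded in a position whose candidate set contains him.
coach_lineup = {"LB": "P1", "CDM": "P2", "ST": "P3"}
candidate_sets = {"LB": {"P1", "P4"}, "CDM": {"P3", "P5"}, "ST": {"P2", "P6"}}

print(knowledge(coach_lineup, candidate_sets))  # 3
print(ratio(coach_lineup, candidate_sets))      # 1/3
```

In this example the coach picked suitable players (Knowledge is 3) but placed only one of them in a matching position, so the Ratio is low even though the selection itself agrees with the candidate sets.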

The results of this metric are shown in Table 6.

Table 6 Minimum, first quartile, median, third quartile, maximum and standard deviation values for the Ratio metric between the coach and ITFPS for the values of 1 and 3 for θ (1 and 2 subscripts mean first and second coach respectively)

A higher Ratio metric indicates that the coach correctly positioned the selected players in their respective positions, and a larger denominator indicates that the coach effectively selected the appropriate players for the match. This metric provides a comprehensive assessment of the coach's alignment of player positions and selections.

The maximum value of each column is colored blue for θ = 1 and green for θ = 3. When θ = 1, Manchester City and Liverpool performed better, according to Table 6. When θ = 3, Liverpool and Manchester City are the top two teams in this metric. Liverpool performed exceptionally well with both θ values: when θ equals 3, its median value for this metric reaches 1. This indicates that in at least half of the games played during the specified season (21 games), this team correctly positioned all of its selected players that belong to at least one of the candidate sets.

The assumption that the mean of the Ratio metric is equal across coaches was examined with an ANOVA test. For θ = 1, the p-value is 8.81e-11, and for θ = 3, the p-value is 1.07e-12. Therefore, the assumption of equality of means is rejected at a type I error level of 0.05.

Another notable aspect of Table 6 is the superior performance of the second coach of Newcastle United compared to both the first coach of Newcastle United and the coaches of Leeds United. This superiority is evident in the ‘Med’ column beneath both θ values. We conjecture that this is related to the fact that Newcastle United finished the season with a higher standing than Leeds United. These outcomes are fully consistent with those of the second step. We consider the results of the third step to be more reasonable and applicable because this metric takes player selection and team formation into account simultaneously.

Here we take an example of a real match and calculate the metrics for it as a case study:

In the April 30, 2022 match between Liverpool and Newcastle, the Liverpool FC coach opted for a 4-3-3 formation with the following player selections:

LB: Andrew Robertson, LCB: Virgil van Dijk, RCB: Joël Andre Job Matip, RB: Joe Gomez, LCM: James Philip Milner, CDM: Jordan Henderson, RCM: Naby Keita, LW: Sadio Mané, ST: Diogo José Teixeira da Silva, RW: Luis Fernando Díaz Marulanda.

Comparatively, the output of our ITFPS model for this match closely resembled the coach's decision, with the exception of a minor difference in player assignment. Specifically, ITFPS recommended Henderson for the LCM position and Milner for the CDM position.

So, in this match, the Similarity metric produced a score of 8, indicating that the two sets of decisions are highly congruent.

In addition, we determined the optimal 4-3-3 formation suggested by ITFPS for Liverpool, as follows:

LB: Trent Alexander-Arnold, LCB: Virgil van Dijk, RCB: Joël Andre Job Matip, RB: Andrew Robertson, LCM: Thiago Alcântara do Nascimento, CDM: Fábio Henrique Tavares, RCM: Jordan Henderson, LW: Roberto Firmino Barbosa de Oliveira, ST: Sadio Mané, RW: Mohamed Salah Ghaly. This resulted in an Intersection metric score of 5 after accounting for all available players. The Ratio metric for this match, with the θ parameter set to 3, was 0.71 (numerator: 5, denominator: 7).
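The case-study figures can be reproduced from the lineups listed above with a few lines of Python. This is only a sketch: player names are shortened from the text, the Similarity metric is assumed here to count the positions in which both arrangements field the same player (which matches the reported score of 8), and the Ratio is recomputed from the reported numerator and denominator because the candidate sets for this match are not listed.

```python
# Coach's 4-3-3 selection, with names shortened from the text.
coach = {
    "LB": "Robertson", "LCB": "van Dijk", "RCB": "Matip", "RB": "Gomez",
    "LCM": "Milner", "CDM": "Henderson", "RCM": "Keita",
    "LW": "Mané", "ST": "da Silva", "RW": "Díaz",
}

# ITFPS arrangement of the coach's own ten players: Henderson and Milner swap.
itfps_same_players = dict(coach, LCM="Henderson", CDM="Milner")

# ITFPS optimal 4-3-3 drawn from all available players.
itfps_optimal = {
    "LB": "Alexander-Arnold", "LCB": "van Dijk", "RCB": "Matip",
    "RB": "Robertson", "LCM": "Thiago", "CDM": "Fábio Tavares",
    "RCM": "Henderson", "LW": "Firmino", "ST": "Mané", "RW": "Salah",
}

# Similarity: positions where coach and model agree on the player.
similarity = sum(coach[pos] == itfps_same_players[pos] for pos in coach)

# Intersection (Eq. 3): players common to both selections, positions ignored.
intersection = len(set(coach.values()) & set(itfps_optimal.values()))

# Ratio (Eq. 6) from the reported numerator and denominator for θ = 3.
ratio = 5 / 7

print(similarity, intersection, round(ratio, 2))  # 8 5 0.71
```

The five shared players are Robertson, van Dijk, Matip, Henderson and Mané; Robertson and Henderson count toward the Intersection even though ITFPS places them in different positions from the coach.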

4 Discussion

4.1 Comparison of ITFPS with other potential alternatives

The FIFA dataset is a public dataset that rates players based on their classic features. These scores were calculated using linear relationships. In this paper, we present neural network models that can predict player scores based solely on their physical and technical features. There is no doubt that a linear regression model can produce reasonably accurate results; however, this accuracy will not be perfect due to the omission of certain factors that influence these scores (such as global fame). In addition, we hypothesized that, beyond the accuracy of the models on the test data, a model should assign low scores to players whose original position is far from the position it was trained on. So, we expect that a model trained on, say, the CB position will give high scores to CB players and low scores to forward players.

Our research indicates that the deep neural network model is more capable of achieving this objective. We used the mean squared error (MSE) as the metric for comparing the deep neural network with linear regression, support vector regression, and random forest models. The deep neural network model for the LCB position has an MSE of 0.84, while linear regression has an MSE of 0.63 on the test set. When these models are given the ST position test set as input, the linear regression model assigns this group of players an average score of 48.52, while the deep neural network model assigns an average score of 47.4. The random forest and support vector regression models trained for the LCB position assign striker players average scores of 59.98 and 67.36, respectively, which means they perform worse than the linear regression model. The Kolmogorov–Smirnov test, which checks whether a random variable has the same probability distribution in two independent groups, indicates that the cumulative distribution functions of the scores from these models do not lie above that of ITFPS (p-values are about 1e-16); that is, ITFPS is more likely to assign low scores to striker players. The assumption that the average scores obtained from these models equal that of ITFPS was examined with a t-test and rejected (p-values are about 1e-16).

Figure 5 compares the scores that FIFA, the neural network model, and the linear regression model (the latter two trained on the LCB position) assign to ST players. The Kolmogorov–Smirnov test indicates that the cumulative scoring distribution derived from the linear regression model is not above the cumulative scoring distribution derived from the deep neural network model (p-value = 0.04). In other words, the neural network model is more likely to assign ST position players lower scores.

Fig. 5 FIFA, neural network, and linear regression models trained on LCB position score ST players

Furthermore, the null hypothesis that the mean scores obtained from the linear regression model and the neural network model are equal is rejected (p-value = 0.01) based on the results of the t-test employed to compare the average scores derived from the two models. We obtained comparable results for additional positions.
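To make the direction of the one-sided Kolmogorov–Smirnov comparison concrete, the sketch below computes the one-sided statistic from two empirical cumulative distribution functions. The score lists are hypothetical, the function names are illustrative, and no p-value is computed here; in practice a statistics library (for example SciPy's `ks_2samp` with its `alternative` argument) would be used for the test itself.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of a sample as a function of x."""
    data = sorted(sample)
    n = len(data)
    return lambda x: bisect_right(data, x) / n


def one_sided_ks_statistic(a, b):
    """D+ = max over x of ECDF_a(x) - ECDF_b(x).

    A large value means sample `a` concentrates on lower scores than `b`.
    """
    cdf_a, cdf_b = ecdf(a), ecdf(b)
    return max(cdf_a(x) - cdf_b(x) for x in set(a) | set(b))


# Hypothetical scores given to ST players by two models trained on LCB.
dnn_scores = [44, 46, 47, 48, 50]
lr_scores = [46, 48, 49, 50, 52]

d_dnn_below = one_sided_ks_statistic(dnn_scores, lr_scores)
d_lr_below = one_sided_ks_statistic(lr_scores, dnn_scores)
print(round(d_dnn_below, 2), round(d_lr_below, 2))  # 0.4 0.0
```

In this toy data the neural network's empirical CDF lies on or above that of the linear regression everywhere, so the statistic is positive in one direction and zero in the other, which is the pattern described in the text as the neural network being more likely to assign lower scores.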

Consequently, we can observe that the error of the regression model is slightly less than that of the deep neural network model, but the latter is better able to distinguish between different positions.

Moreover, these scores form the weights of the edges of the graph, and the arrangement of a team is determined from them. Therefore, the more precisely we can distinguish between appropriate and inappropriate positions for a player, the more precise the team's arrangement will be. For this reason, the deep neural network model was used in this study to estimate these scores.

It should be noted that not every player in the world is included in this dataset. The described neural network model can be used to obtain the scores of players who are not included in the current dataset. The team coach only needs to assign scores to the player's features as model inputs, to determine the player's scores for various positions using ITFPS.

4.2 Examining the suitability of different lineups for a team

We determine the suitability of each player for each position using deep neural network models. Using the Hungarian method, the optimal assignment of players to the positions of a lineup is then determined. We now consider the eight most common lineups (4-2-3-1, 4-3-3, 3-4-3, 4-4-2, 4-5-1, 4-3-2-1, 4-2-2-2, and 3-5-2) and present the best and worst performance of player assignment (the sum of the weights of the edges of the maximum matching of the corresponding bipartite graph) for each formation and team in Table 7.

Table 7 Highest and lowest scores for teams, as well as their respective lineups according to their squads
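The assignment step that produces these scores can be illustrated on a toy squad. The sketch below is not the Hungarian method: it swaps in an exhaustive search over permutations, which finds the same maximum total weight but is only practical for very small inputs. The scores, player labels, and function name are hypothetical.

```python
from itertools import permutations


def best_assignment(scores, positions):
    """Return (total, assignment) maximising the summed position scores.

    scores: dict mapping player -> dict mapping position -> suitability score
    Exhaustive search; only practical for very small squads.
    """
    players = list(scores)
    best_total, best_map = float("-inf"), None
    for chosen in permutations(players, len(positions)):
        total = sum(scores[p][pos] for p, pos in zip(chosen, positions))
        if total > best_total:
            best_total, best_map = total, dict(zip(positions, chosen))
    return best_total, best_map


# Hypothetical suitability scores (not taken from the FIFA dataset).
scores = {
    "P1": {"CB": 80, "CM": 60, "ST": 40},
    "P2": {"CB": 55, "CM": 78, "ST": 65},
    "P3": {"CB": 45, "CM": 70, "ST": 82},
    "P4": {"CB": 75, "CM": 72, "ST": 50},
}
positions = ["CB", "CM", "ST"]

total, lineup = best_assignment(scores, positions)
print(total, lineup)  # 240 {'CB': 'P1', 'CM': 'P2', 'ST': 'P3'}
```

For a full squad and ten outfield positions, a polynomial-time assignment solver is needed instead of enumeration; SciPy's `linear_sum_assignment` with `maximize=True` is one such option. Repeating the computation for each lineup and taking the largest and smallest totals gives values of the kind reported in Table 7.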

The weights of the bipartite graph are greater for Manchester City and Liverpool than for the other two clubs because, on average, the players of these two clubs have higher skill levels than those of the other two teams. In terms of player selection and team formation, it became apparent from the arguments in the results section that ITFPS makes decisions very similar to those of the coaches of Manchester City and Liverpool. Therefore, it is not surprising that these clubs rank first and second in the league.

As shown in Table 7, the difference between the maximum score and minimum score for different lineups for the clubs is negligible, indicating that the positioning of the players on the field is more important than lineup selection.

4.3 Other topics

Newcastle United was in 19th place at the end of the 11th week of the league, but after a change of coach, with Eddie Howe in charge from the 12th week to the end of the season, they finished in 11th place. This improvement is most likely related to the high Ratio metric of this coach. By employing superior strategies, this coach was able to rescue the team from the bottom of the ranking table.

Based on the coach's sensitivity to different positions, we can discuss how, in real-world implementations of the third step of the ITFPS, θ can be selected for each position as needed. In other words, the candidate sets for the positions that are more important to the coach should be stricter (lower θ values), and multiple θ values should be used for different positions rather than a single one.

The coach can also pay attention to the temporal aspects of the problem. When using ITFPS as an intelligent assistant, the coach can change the inputs of the model and change the score of the players' attributes based on the changes he/she observes in the characteristics of his/her players.

5 Conclusion

Machine learning, graph theory and statistics techniques have been used in a variety of research fields. In this article, we demonstrated how these techniques can be applied to football and addressed some issues.

We designed deep learning models to obtain accurate scores for players in each position. We presented a procedure that can identify the best players for each position and serve as coach assistant software. Not only does ITFPS improve team performance by providing ideal formations, but it also allows the coach to change the formation of the team in order to utilize the team's full potential while avoiding player fatigue.

Contextual factors such as the opponent, game importance, and injuries affect real coaching decisions, which motivated our use of the system of distinct representatives. For each position, we provide a set of candidate players from which the coach can choose according to his/her own strategic needs and these influential factors. The coach may also consult the candidate sets when making substitutions during the match. In future studies, we will incorporate these factors into the modeling so that the model can give strategic suggestions to the coach based on the conditions of the match.

According to experiments conducted on four English Premier League teams, the outputs of ITFPS and the decision-making of well-known and top coaches are strikingly similar. As a result, by using this software, all football coaches can have a smart assistant on par with the best football coaches.

One of the limitations of this study is that we only used classical player features. Coaches choose players based on more than just these features, for example on advanced and contextual features. So, in the future, we will incorporate these features to improve the results of ITFPS. Another limitation of our study is the static nature of the players' features in the dataset. Temporal aspects can also be included in the modeling in future studies, for example as time series or longitudinal data. However, the results of ITFPS demonstrate its effectiveness, and we recommend it as a useful tool for coaches in this version as well.