1 Introduction

Expert systems have been widely implemented and examined by researchers. They mimic the decision-making abilities of a human expert, and they are designed to solve complex problems by reasoning. Expert system applications include, among others, medicine [29, 53, 60], diagnosis and control of power systems [26, 27], evaluation of journal grades [61], information systems investment evaluation [19], transport management [31, 51], industry [14, 38] and sport [12, 15, 32, 33, 39].

Nowadays, various computer tools and methods play an important role in sports science. Athletes and coaches are looking for new solutions that can support their work. One form of such support is the application of machine learning methods, which can be used to predict performance results [13, 15, 43], identify sporting talent [35, 39, 48, 49] or support the training process [30, 40, 41, 45, 50, 52].

For example, in [13], the authors used artificial neural networks to predict competitive performance in swimming. The neural models were cross-validated, and the results showed very high modelling precision. The paper [43] describes the use of linear and nonlinear multivariable models to predict the results of 400 m hurdles races. All the models were constructed using the training data of 21 athletes from the Polish National Team, and the best predictions were obtained with LASSO regression. Gu et al. [15] proposed an expert system to predict National Hockey League (NHL) game outcomes, with a prediction accuracy of \(77.5\%\). Another paper [17] reviews data mining techniques used for prediction in various sports disciplines.

Roczniok et al. [48] proposed using Kohonen neural networks in the recruitment process for competitive swimming. Experiments were conducted on a group of 140 young swimmers aged about 10. Another approach to identifying sporting talent was proposed by Rogulj et al. [49], who developed two methodological approaches to recognizing an athlete’s morphological compatibility with various sports. In [35], Maszczyk et al. determined the usefulness of neural models in optimizing recruitment processes. Statistical analyses were carried out on the measured results of javelin throwers using a full take-off. For the investigated group, the perceptron network with the 4–3–2–1 structure achieved the best predictive results.

In [50], Ryguła et al. proposed using an artificial neural network (ANN) to model swimming performance in the 200 m individual medley and 400 m front crawl events. ANNs have also been used to analyze tactics in team sports [41] and to classify kick techniques [30]; the aim of the latter study was to determine whether two different kick techniques can be distinguished from a kick impact force profile. The paper [52] presents the application of a neural network to model swimming performance. The authors created highly realistic models for predicting swimming performance, based on previously selected criteria related to the dependent variable. Experiments were conducted on 138 national-level swimmers (65 males and 73 females).

Despite the existing methods for prediction and training support, there is a lack of tools that coaches can use during the training process. Papić et al. [39] developed a fuzzy expert system for scouting and evaluating young sporting talent. A similar system is presented in [33], where the authors perform talent identification in soccer using a web-oriented expert system.

This literature review shows a need for tools that support sports training. The main contribution of this paper is therefore a web-oriented expert system, named iHurdling, that predicts results and generates training loads in the 110 and 400 m hurdles. The system can support a coach in planning training programmes for hurdles races. It uses linear regression models (OLS, ridge, LASSO, elastic net) and nonlinear models (RBF, fuzzy model, OLS with fuzzy correction). Its main advantages are an easy-to-use interface and cross-platform compatibility: it can be run from a computer or a mobile device.

2 Training data

The training data contain training plans carried out by hurdlers of the Polish National Team. Each record contains the parameters of an athlete and the training programme carried out by this athlete during one annual training cycle. The models for result prediction (PR) and for generating training loads (GT) were built using 21 variables (Table 1). For the PR models, the input variables \(x_1{-}x_5\) represent the parameters of the athlete, the input variables \(x_6{-}x_{20}\) represent the training loads, and the output variable y represents the predicted result. For the GT models, \(x_1{-}x_6\) represent the parameters of the athlete and \(y_1{-}y_{15}\) represent the training loads.

The training programmes were recorded according to the classification proposed in [22]. This classification consists of two areas of influence: energy (exercise) and information (related to the formation of technique). The analyzed training loads include speed, endurance and strength exercises, as well as exercises that develop hurdle clearance technique. A similar classification of exercises can be found in other papers devoted to sprinters and hurdlers [2, 37]. The value of each load is the sum of all loads of the same type performed during the annual training cycle.

The results in the hurdles races were recorded before and after the cycle. Both runs were carried out under simulated starting conditions of the 110 and 400 m hurdles races. In this study, the current result over the training distance was taken as the indicator of performance level; as concluded in [25], this result is strongly correlated with performance parameters and with other motor skills tests used in hurdles races. For the 110 m hurdles, the training data contain 40 records collected from 18 highly trained athletes aged 18–28 (mean result in the 110 m hurdles: 14.02 s). For the 400 m hurdles, 48 records from 21 athletes aged 19–27 were used; these hurdlers were also highly trained (mean result in the 400 m hurdles: 51.26 s).

Table 1 Description of variables used to construct the PR and GT modules for 110 and 400 m hurdles

3 Mathematical models

In this paper, we use regression methods to build multi-input, single-output (MISO) and multi-input, multi-output (MIMO) models. The MISO models are used to predict the result, while the MIMO models are used to generate training loads. In the simplified description that follows, we assume a single output, since a MIMO model is represented as a set of MISO models, one per output.
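Representing a MIMO model as a set of MISO models can be sketched in a few lines. This is a minimal illustration, assuming NumPy and plain least squares without an intercept (as in (1) below) as the per-output learner; the function names are hypothetical, and any of the models described below could be passed as the `fit`/`predict` pair instead. The data are random and only shaped like the GT module (40 records, 6 inputs, 15 outputs).

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares weights of a single-output (MISO) linear model y = Xw."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def predict_ols(w, X):
    return X @ w


def fit_mimo_as_miso(X, Y, fit=fit_ols):
    """Fit one independent MISO model per output column of Y (n x q)."""
    return [fit(X, Y[:, k]) for k in range(Y.shape[1])]


def predict_mimo(models, X, predict=predict_ols):
    """Stack the per-output predictions back into an (n x q) matrix."""
    return np.column_stack([predict(model, X) for model in models])


# Synthetic data shaped like the GT module (not the paper's data)
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
Y = X @ rng.normal(size=(6, 15))
models = fit_mimo_as_miso(X, Y)
print(len(models), predict_mimo(models, X).shape)  # 15 (40, 15)
```

Because the outputs are fitted independently, each of the 15 training loads can, in principle, use a different model type and tuning parameters.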

In our expert system, we use:

  • linear models in the form of ordinary least squares (OLS) [7], ridge regression (RIDGE) [18], least absolute shrinkage and selection operator (LASSO) [54] and elastic net (ENET) [62],

  • nonlinear models in the form of radial basis function network (RBF) [6] and fuzzy rule-based system (FRBS) [47].

3.1 Linear models

Consider a MISO model with p inputs (predictors) forming the vector \(\mathbf {x}=[x_1,x_2,\ldots ,x_p]\) and one output (response) y. The goal is to build the regression function

$$\begin{aligned} y = f(\mathbf {x}) = \sum _{j=1}^{p}x_j w_j \end{aligned}$$
(1)

based on a data set containing n observations in the form of pairs \((\mathbf {x}_i,y_i)\), where \(\mathbf {x}_i=[x_{i1},x_{i2},\ldots ,x_{ip}]\), \(i=1,\dots ,n\). The element \(x_{ij}\) denotes the jth predictor in the ith observation, and \(y_i\) is the response in the ith observation.

The linear regression problem can be written as a matrix equation of the form

$$\begin{aligned} \mathbf {y}={\mathbf {X}}{\mathbf {w}} \end{aligned}$$
(2)

where

$$\begin{aligned} \mathbf {X} = \begin{bmatrix} x_{11}&\quad x_{12}&\quad \ldots&\quad x_{1p}\\ x_{21}&\quad x_{22}&\quad \ldots&\quad x_{2p}\\ \vdots&\quad \vdots&\quad \ddots&\quad \vdots \\ x_{n1}&\quad x_{n2}&\quad \ldots&\quad x_{np}\\ \end{bmatrix} \end{aligned}$$
(3)

and \(\mathbf {w}=[w_1,w_2,\ldots ,w_p]^T\), \(\mathbf {y}=[y_1,y_2,\ldots ,y_n]^T\). Denoting by \(J(\mathbf {w},\cdot )\) a cost function, the problem of finding a linear model involves minimizing the function \(J(\mathbf {w},\cdot )\), that is

$$\begin{aligned} \hat{\mathbf {w}} = \mathop {\hbox {arg min}}\limits _\mathbf {w} J(\mathbf {w},\cdot ) \end{aligned}$$
(4)

where \(\hat{\mathbf {w}}\) is the vector of the optimal parameter values. For the linear models, the cost functions take the form

$$\begin{aligned} J_{\text{OLS}}(\mathbf{w}) = \left\| \mathbf{y} - \mathbf{Xw} \right\|_{2}^{2} \end{aligned}$$
(5)
$$\begin{aligned} J_{\text{RIDGE}}(\mathbf{w},\lambda ) = \left\| \mathbf{y} - \mathbf{Xw} \right\|_{2}^{2} + \lambda \left\| \mathbf{w} \right\|_{2}^{2} \end{aligned}$$
(6)
$$\begin{aligned} J_{\text{LASSO}}(\mathbf{w},\lambda ) = \left\| \mathbf{y} - \mathbf{Xw} \right\|_{2}^{2} + \lambda \left\| \mathbf{w} \right\|_{1} \end{aligned}$$
(7)
$$\begin{aligned} J_{\text{ENET}}(\mathbf{w},\lambda _{1},\lambda _{2}) = \left\| \mathbf{y} - \mathbf{Xw} \right\|_{2}^{2} + \lambda _{1} \left\| \mathbf{w} \right\|_{1} + \lambda _{2} \left\| \mathbf{w} \right\|_{2}^{2} \end{aligned}$$
(8)

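where \(\lambda\), \(\lambda _1\) and \(\lambda _2\) are non-negative regularization parameters. The norms \({||\cdot ||}_2\) and \({||\cdot ||}_1\) denote the Euclidean and the Manhattan norms, respectively. The RIDGE, LASSO and ENET regressions are regularized, which means that they can be used when the problem is ill-conditioned, for example when the predictors are strongly correlated or the number of observations is small relative to the number of predictors. Moreover, the \(L_1\) penalty in (7) and (8) can shrink some weights exactly to zero, so LASSO and ENET also perform variable selection. A detailed description of the linear models can be found, for example, in [58].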
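The cost functions (5)–(8) can be minimized with a few lines of NumPy. This is a sketch under stated assumptions, not the R routines used in Sect. 4: there is no intercept, as in (1) (in practice the data would be centred or a constant column added); RIDGE uses its closed-form solution; LASSO and ENET use cyclic coordinate descent, a standard solver chosen here for brevity (the cited R packages may use different algorithms). All function names are hypothetical.

```python
import numpy as np


def ols(X, y):
    """Minimizes (5)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def ridge(X, y, lam):
    """Minimizes (6) in closed form: w = (X^T X + lam I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)


def enet(X, y, lam1, lam2, n_iter=1000, tol=1e-10):
    """Minimizes (8) by cyclic coordinate descent.

    Zeroing the subgradient of (8) with respect to w_j gives
    w_j = S(x_j^T r_j, lam1 / 2) / (x_j^T x_j + lam2),
    where S is soft-thresholding and r_j is the residual without predictor j.
    """
    p = X.shape[1]
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y.astype(float).copy()              # residual y - Xw for w = 0
    for _ in range(n_iter):
        w_old = w.copy()
        for j in range(p):
            r += X[:, j] * w[j]              # remove predictor j from the fit
            w[j] = soft_threshold(X[:, j] @ r, lam1 / 2) / (col_sq[j] + lam2)
            r -= X[:, j] * w[j]              # add it back with the new weight
        if np.max(np.abs(w - w_old)) < tol:
            break
    return w


def lasso(X, y, lam):
    """Minimizes (7), i.e. (8) with lam2 = 0."""
    return enet(X, y, lam, 0.0)


rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))                 # synthetic data
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + 0.1 * rng.normal(size=40)
for name, w in [("OLS", ols(X, y)), ("RIDGE", ridge(X, y, 10.0)),
                ("LASSO", lasso(X, y, 20.0)), ("ENET", enet(X, y, 20.0, 10.0))]:
    print(name, np.round(w, 3))
```

Setting \(\lambda=0\) recovers OLS in every case, while increasing \(\lambda\) shrinks the weights; with the \(L_1\) term, weights of weak predictors can become exactly zero.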

3.2 Choosing the best model

All models were tested using cross-validation, a method of evaluating the generalization ability of a model, i.e. its predictive performance on new data not used in modelling. In cross-validation, the data are divided into two subsets: a training set and a testing (validation) set. In this study, due to the small amount of data (\(n=40\) for 110 m and \(n=48\) for 400 m), leave-one-out cross-validation (LOOCV) was used [3]. The idea of this method is to create n learning subsets from the data set, each obtained by removing a single pair, which becomes the test pair. A model is then constructed on each subset and evaluated by computing its error on the held-out test pair. The predictive ability of a model is expressed by the root mean square error of cross-validation (\(\text {RMSE}_{\text {CV}}\)), calculated as

$$\begin{aligned} {\text {RMSE}}_{\text {CV}} =\sqrt{\frac{1}{n}\sum _{i=1}^n\left( y_i-{\hat{y}}_{-i}\right) ^2} \end{aligned}$$
(9)

where \({\hat{y}}_{-i}\) is the output of a model obtained after removing the pair \((\mathbf {x}_i,y_i)\) from the data set.
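The LOOCV procedure and formula (9) translate directly into a loop that refits the model n times. A minimal sketch, assuming NumPy and a model supplied as a pair of hypothetical `fit`/`predict` functions; OLS is used in the example, and the data are synthetic with n = 40, as for the 110 m hurdles.

```python
import numpy as np


def rmse_cv(X, y, fit, predict):
    """Leave-one-out RMSE_CV, eq. (9)."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i               # learning subset without pair (x_i, y_i)
        model = fit(X[keep], y[keep])
        y_hat_minus_i = predict(model, X[i:i + 1])[0]
        errors[i] = y[i] - y_hat_minus_i       # error on the held-out test pair
    return np.sqrt(np.mean(errors ** 2))


def ols_fit(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]


def ols_predict(w, X):
    return X @ w


rng = np.random.default_rng(2)
X = rng.normal(size=(40, 5))                   # synthetic data
y = X @ np.array([1.0, -0.5, 0.0, 2.0, 0.3]) + 0.1 * rng.normal(size=40)
print(round(rmse_cv(X, y, ols_fit, ols_predict), 3))
```

The same function can score any of the models in this section, which is how models with different structures can be compared on a common scale.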

3.3 Nonlinear models

3.3.1 RBF models

An RBF network is a feed-forward network that typically consists of three layers: an input layer, a hidden layer and an output layer.

The input layer consists of nodes that receive the input signal \(\mathbf {x}\), with one node for each predictor variable. The hidden layer consists of nodes with radially symmetric activation functions. Each hidden node measures the distance between the input vector \(\mathbf {x}\) and the centre \(\mathbf {c}_k\) of its radial function:

$$\begin{aligned} \varphi _{k} ({\mathbf{x}}) = \varphi _{k} \left( {\left\| {{\mathbf{x}} - {\mathbf{c}}_{k} } \right\|} \right) \end{aligned}$$
(10)

The norm \({||\cdot ||}\) is usually the Euclidean distance, and \(\varphi (\mathbf {x})\) is typically the Gaussian function. The output layer consists of a single node that receives the outputs of the hidden-layer nodes and calculates the output of the network as a linear combination of these nonlinear functions:

$$\begin{aligned} y = \sum _{k=1}^{m}\varphi _k(\mathbf {x})w_k \end{aligned}$$
(11)

where m is the number of nodes in the hidden layer, \(\varphi _k(\mathbf {x})\) is a basis function and \(w_k\) is the weight connecting the kth hidden neuron to the output node.

Training an RBF network involves determining the number of hidden neurons, the parameters of the radial functions in the hidden layer and the weights in the output layer.
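The three training tasks can be illustrated with a minimal Gaussian RBF network in NumPy. The sketch makes simple assumptions that are not taken from the paper or from `RSNNS::rbf`: the m centres are a random subset of the training inputs, all basis functions share one width set by the common heuristic \(d_{\max}/\sqrt{2m}\), and the output weights of (11) are obtained by linear least squares. The function names are hypothetical.

```python
import numpy as np


def gaussian_design(X, centres, width):
    """Matrix of phi_k(x_i) = exp(-||x_i - c_k||^2 / (2 width^2)), eq. (10)."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))


def fit_rbf(X, y, m, seed=0):
    rng = np.random.default_rng(seed)
    # 1) m hidden neurons; centres are m distinct training inputs
    centres = X[rng.choice(len(X), size=m, replace=False)]
    # 2) shared width from the spread of the centres (heuristic d_max / sqrt(2m))
    d_max = np.max(np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=2))
    width = d_max / np.sqrt(2 * m) if d_max > 0 else 1.0
    # 3) output weights w_k of eq. (11) by linear least squares
    Phi = gaussian_design(X, centres, width)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return centres, width, w


def predict_rbf(model, X):
    centres, width, w = model
    return gaussian_design(X, centres, width) @ w   # y = sum_k phi_k(x) w_k


x = np.linspace(0.0, 1.0, 8)[:, None]               # toy 1-D data
y = np.sin(2 * np.pi * x[:, 0])
for m in (2, 4, 8):
    model = fit_rbf(x, y, m)
    print(m, np.sqrt(np.mean((predict_rbf(model, x) - y) ** 2)))
```

With m equal to the number of training points, the network interpolates the data, which is why the number of hidden neurons has to be chosen by cross-validation rather than by the training error.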

3.3.2 Fuzzy models

In this paper, we propose two approaches to using the FRBS [47] in regression problems. In the first approach, the fuzzy model is built in the same way as the RBF model, that is, it is learned from the original data (this model is called FUZZY). In the second approach, the FRBS is used for the nonlinear correction of the OLS model (this model is called F-OLS). The idea is to modify the output of a linear model by adding a nonlinear correction term so that the predictive error is reduced (Fig. 1). First, we build the OLS model and store its cross-validation errors; next, we build a fuzzy model that “learns” these errors.

Fig. 1 The idea of calculating the output of the F-OLS model. The variable \(y=f_{\mathrm {OLS}}(\mathbf {x})\) is the output of the ordinary least squares estimator, and \(d=f_c(\mathbf {x})\) is the output of the fuzzy nonlinear corrector

The design procedure for building F-OLS models is listed below.

  1. Step 1.

    Cross-validation of the OLS model \(y=f_{\mathrm {OLS}}(\mathbf {x})\) for the data \((\mathbf {x}_i,y_i)\). In the ith step of cross-validation, the error has the form

    $$\begin{aligned} e_i = y_i-y_{-i} \end{aligned}$$
    (12)

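    where \(y_{-i}\) is the output, for the input \(\mathbf {x}_i\), of the OLS model \(f_{\mathrm {OLS}}\) fitted after removing the pair \((\mathbf {x}_i,y_i)\) from the data set.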

  2. Step 2.

    Constructing the fuzzy (nonlinear) corrector

    $$\begin{aligned} d=f_c(\mathbf {x}) \end{aligned}$$
    (13)

    for the data \((\mathbf {x}_i,e_i)\). This corrector predicts the errors obtained in Step 1. The best fuzzy model can be chosen by cross-validation over different numbers of fuzzy sets.

  3. Step 3.

    Cross-validation of the OLS model with the corrected error in the form

    $$\begin{aligned} e_i^{ new }=y_i-(y_{-i}+d_i) \end{aligned}$$
    (14)

    where \(y_{-i}\) is defined as in Step 1 and \(d_i=f_c(\mathbf {x}_i)\).

  4. Step 4.

    The predicted output of the F-OLS model is determined as

    $$\begin{aligned} y_{\mathrm {F}-\mathrm{OLS}}=f_{\mathrm {OLS}}(\mathbf {x})+f_c(\mathbf {x}) \end{aligned}$$
    (15)

    where \(f_c(\mathbf {x})\) is the function of the fuzzy corrector chosen in Step 3 (Fig. 1).
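The four steps above can be sketched end to end in NumPy. This is an illustration under explicit assumptions, not the authors' implementation: a Gaussian-kernel (Nadaraya–Watson) smoother stands in for the fuzzy corrector \(f_c\); its bandwidth h plays the role that the number of fuzzy sets plays in Step 2 and is chosen by leave-one-out cross-validation on \((\mathbf{x}_i,e_i)\); all function names are hypothetical.

```python
import numpy as np


def ols_fit(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]


def loo_ols_outputs(X, y):
    """Step 1: y_{-i}, the output at x_i of the OLS model fitted without pair i."""
    n = len(y)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        out[i] = X[i] @ ols_fit(X[keep], y[keep])
    return out


def kernel_weights(A, B, h):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * h ** 2))


def make_corrector(X, e, h):
    """Stand-in nonlinear corrector d = f_c(x): Gaussian-weighted average of the errors."""
    def f_c(Xq):
        W = kernel_weights(Xq, X, h)
        return (W @ e) / np.maximum(W.sum(axis=1), 1e-300)
    return f_c


def fit_f_ols(X, y, bandwidths):
    y_loo = loo_ols_outputs(X, y)
    e = y - y_loo                                  # eq. (12)
    # Step 2: choose the corrector by leave-one-out CV on (x_i, e_i)
    best_h, best_score = None, np.inf
    for h in bandwidths:
        W = kernel_weights(X, X, h)
        np.fill_diagonal(W, 0.0)                   # predict e_i without using e_i
        d_loo = (W @ e) / np.maximum(W.sum(axis=1), 1e-300)
        score = np.sqrt(np.mean((e - d_loo) ** 2))
        if score < best_score:
            best_h, best_score = h, score
    f_c = make_corrector(X, e, best_h)
    e_new = y - (y_loo + f_c(X))                   # Step 3, eq. (14)
    w = ols_fit(X, y)

    def f_ols_corrected(Xq):                       # Step 4, eq. (15)
        return Xq @ w + f_c(Xq)

    return f_ols_corrected, np.sqrt(np.mean(e_new ** 2)), best_h


rng = np.random.default_rng(3)
Xn = rng.normal(size=(30, 3))                      # synthetic, mildly nonlinear data
yn = Xn @ np.array([1.0, -2.0, 0.5]) + np.sin(3 * Xn[:, 0])
model_n, rmse_new, h = fit_f_ols(Xn, yn, [0.3, 0.6, 1.2])
print(h, round(rmse_new, 3))
```

On exactly linear data, the errors (12) vanish, the corrector outputs (almost) zero, and the corrected model reduces to OLS. Replacing `make_corrector` with an FRBS learner would bring the sketch closer to the F-OLS procedure described in the text.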

4 Expert system modules

The expert system consists of two modules: prediction of result (PR) and generation of training loads (GT), each for the 110 and 400 m distances. The regression models for both modules were calculated in the R language [46]. The functions and arguments used to generate the models are shown in Table 2 and described below.

The function lm was used to calculate the OLS regression, and the ridge regressions were calculated using the function lm.ridge from the “MASS” package [55] (with \(\lambda > 0\) in (6)). The LASSO and elastic net regressions were obtained with the function enet from the “elasticnet” package [63]. This function has two parameters \((\lambda ,s)\), where \(\lambda \ge 0\) denotes \(\lambda _2\) in formula (8) and \(s\in [0,1]\) is a fraction of the \(L_1\) norm. The pair \((\lambda ,s)\) is used instead of the pair \((\lambda _1,\lambda _2)\) in formula (8) because the elastic net regression can be treated as LASSO regression on an augmented data set [62]. Taking \(\lambda =0\), we obtain LASSO regression, with the single parameter s, on the original data. The ENET models were selected by a grid search over the parameters \(\lambda\) and s.
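The augmented-data argument can be checked numerically. The sketch below (NumPy; function names are hypothetical) builds the augmented pair \(([\mathbf{X};\sqrt{\lambda_2}\mathbf{I}],[\mathbf{y};\mathbf{0}])\). For this pair, the ridge term of (8) becomes part of the squared error, so (8) on the original data equals (7) on the augmented data for any \(\mathbf{w}\). The original elastic net derivation additionally rescales the augmented data by \((1+\lambda_2)^{-1/2}\); that rescaling, and the exact definition of s used by `enet`, are omitted here.

```python
import numpy as np


def augment(X, y, lam2):
    """Append sqrt(lam2) * I to X and p zeros to y."""
    p = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(lam2) * np.eye(p)])
    y_aug = np.concatenate([y, np.zeros(p)])
    return X_aug, y_aug


def enet_cost(X, y, w, lam1, lam2):
    """Cost function (8)."""
    r = y - X @ w
    return r @ r + lam1 * np.abs(w).sum() + lam2 * (w @ w)


def lasso_cost(X, y, w, lam1):
    """Cost function (7)."""
    r = y - X @ w
    return r @ r + lam1 * np.abs(w).sum()


rng = np.random.default_rng(4)
X, y, w = rng.normal(size=(40, 5)), rng.normal(size=40), rng.normal(size=5)
X_aug, y_aug = augment(X, y, lam2=0.3)
print(np.isclose(enet_cost(X, y, w, 0.7, 0.3), lasso_cost(X_aug, y_aug, w, 0.7)))  # True
```

Because the two cost functions agree for every \(\mathbf{w}\), their minimizers coincide, so any LASSO solver can fit the elastic net once \(\lambda_2\) is fixed.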

This study uses artificial neural networks in the form of radial basis function (RBF) networks. The training data were scaled before RBF training, and the predictions were unscaled. All the analyzed networks have one hidden layer. The networks were implemented with the function RSNNS::rbf [6]. The optimal neural model was determined by searching over the number of hidden neurons in the range from 2 to 10.

The fuzzy models were calculated using the function frbs.learn from the “frbs” package [47]. The learning method was the Wang–Mendel (W–M) algorithm [56], which generates fuzzy rules from input–output data pairs: the input space is divided into fuzzy subspaces, and fuzzy rules are extracted for each subspace. The W–M method is a one-pass procedure and does not need time-consuming training. The fuzzy model uses Gaussian membership functions, the “minimum” t-norm, the “weighted average method” for defuzzification and the “minimum” implication. The number of fuzzy sets l was determined by calculating cross-validation errors, as described in Sect. 3.3.2.
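The rule generation and inference described above can be sketched in NumPy. This is a simplified illustration, not the `frbs.learn` implementation. Each variable (inputs and output) gets l evenly spaced Gaussian fuzzy sets. Each data pair produces one rule from the sets with the highest membership. Conflicting rules with the same antecedent are resolved by keeping the one with the highest degree (the product of memberships). Prediction uses the minimum t-norm with weighted-average defuzzification over the consequent centres. The function names and the set-width choice are assumptions.

```python
import numpy as np


def gauss(x, c, sigma):
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)


def wm_learn(X, y, l):
    """Simplified Wang-Mendel rule generation with l Gaussian fuzzy sets per variable (l >= 2)."""
    data = np.column_stack([X, y])
    lo, hi = data.min(axis=0), data.max(axis=0)
    centres = np.linspace(lo, hi, l)                  # (l, p + 1): set centres per variable
    sigmas = (hi - lo) / (2 * (l - 1)) + 1e-12        # overlap of neighbouring sets
    rules = {}
    for row in data:
        mu = gauss(row[None, :], centres, sigmas)     # membership of each value in each set
        labels = mu.argmax(axis=0)                    # best-matching set per variable
        degree = mu.max(axis=0).prod()                # rule degree (product of memberships)
        antecedent, consequent = tuple(labels[:-1]), labels[-1]
        # conflicting rules share an antecedent: keep the one with the highest degree
        if antecedent not in rules or degree > rules[antecedent][1]:
            rules[antecedent] = (consequent, degree)
    return centres, sigmas, rules


def wm_predict(model, X):
    """Minimum t-norm for firing strengths, weighted-average defuzzification."""
    centres, sigmas, rules = model
    p = X.shape[1]
    out = np.empty(len(X))
    for i, x in enumerate(X):
        num = den = 0.0
        for antecedent, (consequent, _) in rules.items():
            strength = min(gauss(x[j], centres[antecedent[j], j], sigmas[j]) for j in range(p))
            num += strength * centres[consequent, p]  # centre of the output fuzzy set
            den += strength
        out[i] = num / den if den > 0 else np.nan
    return out


x = np.linspace(0.0, 1.0, 21)[:, None]                # toy 1-D data
y = x[:, 0] ** 2
model = wm_learn(x, y, l=7)
print(len(model[2]), np.round(wm_predict(model, x[::5]), 3))
```

The one-pass character of the method is visible here: rules are read off the data in a single loop, and l is the only tuning parameter, which is why it can be selected by cross-validation.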

Table 2 R functions for models training

4.1 Models for result prediction

The cross-validation errors \(\text {RMSE}_{\text {CV}}\) and the parameters of the models for the PR module are presented in Table 3. The parameters were chosen on the basis of the plots shown in Figs. 2 and 3. For ridge regression, the regularization parameter \(\lambda \in [0,40]\) was considered with a step of 0.1 for both distances. For LASSO regression, the parameter \(s \in [0,1]\) was considered with a step of 0.01. For ENET regression, the following grids were used: \(\lambda \in [0.1,0.25]\) with a step of 0.008 and \(s\in [0.4,0.8]\) with a step of 0.021 for the 110 m distance, and \(\lambda \in [0,0.06]\) with a step of 0.0032 and \(s\in [0.3,0.5]\) with a step of 0.01 for the 400 m distance. The RBF model was analyzed for the number of neurons in the hidden layer \(m \in \{2, 3, \ldots , 10\}\), and the fuzzy models for the number of fuzzy sets \(l \in \{2, 3, \ldots , 13\}\). Based on this analysis, the best models (those with the smallest cross-validation error) were selected (Table 3). For both the 110 and 400 m distances, the lowest error was obtained by the F-OLS regression. The best F-OLS models have eight fuzzy sets for 110 m and nine for 400 m. The largest error was obtained by the OLS regression for 110 m and by the FUZZY model for 400 m.
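For ridge regression, the grid search over \(\lambda \in [0,40]\) with a step of 0.1 does not require n refits per grid point. With \(\mathbf{H}=\mathbf{X}(\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^T\), the leave-one-out error equals \((y_i-(\mathbf{Hy})_i)/(1-h_{ii})\) exactly. A minimal NumPy sketch of this shortcut, assuming no intercept and unscaled predictors (unlike `lm.ridge`, which centres and scales the data); the function names and the synthetic data are hypothetical.

```python
import numpy as np


def ridge_loocv_rmse(X, y, lam):
    """RMSE_CV (9) of ridge regression (6) via the exact leave-one-out shortcut.

    With H = X (X^T X + lam I)^{-1} X^T, the leave-one-out error is
    (y_i - (H y)_i) / (1 - H_ii), so no refitting is needed.
    """
    p = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    loo_errors = (y - H @ y) / (1.0 - np.diag(H))
    return np.sqrt(np.mean(loo_errors ** 2))


def select_ridge_lambda(X, y, grid):
    """Grid point with the smallest RMSE_CV."""
    errors = np.array([ridge_loocv_rmse(X, y, lam) for lam in grid])
    k = int(np.argmin(errors))
    return grid[k], errors[k]


rng = np.random.default_rng(5)
X = rng.normal(size=(40, 8))                       # synthetic data, n = 40
y = X @ rng.normal(size=8) + rng.normal(size=40)
grid = np.linspace(0.0, 40.0, 401)                 # lambda in [0, 40], step 0.1
best_lam, best_err = select_ridge_lambda(X, y, grid)
print(best_lam, best_err)
```

Plotting the error against the grid gives curves of the kind used to pick the parameters in this section. The shortcut applies to linear smoothers such as OLS and ridge, but not to LASSO, ENET or the nonlinear models, which still require explicit refitting.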

Table 3 Errors and parameters for the PR module for 110 and 400 m hurdles
Fig. 2 Cross-validation errors for linear models (RIDGE, LASSO, ENET) in result prediction

Fig. 3 Cross-validation errors for nonlinear models (RBF, FUZZY, \(f_c\)) in result prediction

4.2 Models for generation of training loads

For the GT module, each output of the model (\(y_1{-}y_{15}\)) was analyzed and cross-validated in the same way as in the PR module. The errors \(\text {RMSE}_{\text {CV}}\) for the GT module are presented in Table 4, and the parameters of the models in Table 5. For example, for the output \(y_{14}\) the FUZZY model has the largest errors (200.1 for 110 m and 132.9 for 400 m), and the F-OLS model has the smallest errors (33.18 for 110 m and 105.1 for 400 m). Table 4 shows that the F-OLS model has the smallest \(\text {RMSE}_{\text {CV}}\) for all outputs.

Table 4 Errors for the GT module for 110 and 400 m hurdles
Table 5 Parameters for the GT module for 110 and 400 m hurdles. The meaning of the parameters is as follows: \(\lambda\) and s are tuning parameters for regularized models, m is the number of hidden neurons in the RBF network, l is the number of fuzzy sets in fuzzy models
Fig. 4 Screenshot of the iHurdling application with PR panel

5 Graphical user interface

The graphical user interface was implemented in the R language using the shiny [11], shinyjs [4], shinythemes [10], shinydashboard [9] and rmarkdown [1] libraries. The interface is a web-oriented application and therefore requires only a web browser and an Internet connection. The current version of the system is available at https://hurdles.shinyapps.io/ihurdling. The application, shown in Fig. 4, consists of three panels labelled “Result prediction”, “Generation of training loads” and “Athletes’ database”.

On the left side of the window is a sidebar menu with links to each panel. The radio button in this sidebar is used to select the PR or GT module. The user can also choose one of the developed regression models and generate reports. The footer contains information about the application and the authors.

Fig. 5 Screenshot of the box for entering endurance training loads

5.1 Panel for result prediction

The “Result prediction” panel is used for entering data and for result prediction (Fig. 4). The input variables are grouped into five boxes: “Athlete’s parameters”, “Training loads—endurance”, “Training loads—technique and rhythm”, “Training loads—strength” and “Training loads—speed”. The value of each input can be modified using appropriately scaled sliders. For example, the box “Training loads—endurance” presented in Fig. 5 has four sliders for changing endurance training loads. Each slider has a range determined on the basis of the minimum and maximum values in the database (Table 1) and depends on the distance of the hurdles race. For instance, the slider “Pace runs” for the 110 m hurdles has a range from 25,000 to 101,000 m with the step equal to one metre.

In the last box, labelled “Results”, two textOutput fields display the current and predicted results. The result is predicted automatically after the position of any slider changes, and it depends on the method selected with the radio button in the sidebar menu. In this way, the user can modify the training loads and observe the resulting changes in the expected result. Generating a report from the result prediction creates a .pdf file containing the values of all sliders and the predicted result.

Fig. 6 Screenshot of the panel for generation of training loads for one training programme

Fig. 7 Screenshot of the panel for generation of training loads over the athlete’s entire career

5.2 Panel for generation of training loads

Another system panel is used to generate training loads for both hurdles distances (Figs. 6, 7). This module consists of two boxes: “Athletes’ parameters” and “Generated training—annual cycle”. The first box is used to enter the athlete’s data, i.e. age, body height, body weight and current result. This box also includes an option to choose the training generation mode: generating a single training programme or generating the training loads for a longer period of the athlete’s career. Selecting the first option displays a slider with the expected result under the slider with the current result. If the “career” option is selected, these sliders are not available. The “career” option generates six consecutive training programmes, each improving the result by 0.25 s per year (from 15.00 to 13.50 s) for the 110 m hurdles and by 1 s per year (from 53.00 to 48.00 s) for the 400 m hurdles.

The contents of the box “Generated training—annual cycle” change dynamically depending on the mode. The “one training” option generates a list of training loads with suggested values (Fig. 6), together with a graph showing the values of the training loads expressed as a percentage of the maximum value of the given output. Selecting the “career” option generates a table containing six annual training plans and 15 graphs showing the loads over the athlete’s entire career (Fig. 7). In this option, the starting result is always constant: 14.75 s for 110 m and 53 s for 400 m. The results are presented as a table, in which each row represents an annual training programme, and as graphs, in which the x-axis is the expected result and the y-axis is the value of the training load. The career graphs show how individual loads change over a 6-year career, so the coach can see which loads need to be increased, which decreased and which should stay at the same level.

Generating a report from the “Generated training” panel creates a .pdf file containing the values from the “Athlete’s parameters” box and a table with one or six annual training cycles, depending on the selected generation mode.

5.3 Panel for athletes’ database

The third system panel is used to create and modify the database containing athletes’ details (Fig. 8). This panel consists of two boxes: “Athlete’s database” and “Edit”. The first box displays the records of the database loaded from a file. The system supports .csv files with “;” as the field separator and “.” as the decimal point. The database file contains the following columns: “Name”, “Surname”, “Age”, “Body Mass” and “Body Height”. All athletes are displayed in a table, and an athlete is selected by marking the appropriate row; the name of the selected athlete is also displayed in the sidebar. The data of the selected athlete can be edited in the “Edit” box, and the changes are saved with the “Save” button. The “Delete” button removes the athlete from the database. An athlete is deselected by selecting the same row again. If no athlete is selected, a new athlete can be entered in the “Edit” box and added with the “Save as new” button. After each operation on the database, the user should save the database using the “Save database” button on the first panel. The second button on this panel (“Clear database”) removes the database from the application memory. Reports cannot be generated in the “Athletes’ database” panel.
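The database file format described above (semicolon-separated fields, “.” as the decimal point, five named columns) can be read and written with Python’s standard `csv` module. A minimal sketch; the helper names and the assumption that “Age” is stored as a whole number are illustrative, and the sample record is invented.

```python
import csv
import io

COLUMNS = ["Name", "Surname", "Age", "Body Mass", "Body Height"]


def read_athletes(text: str) -> list[dict]:
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [
        {
            "Name": row["Name"],
            "Surname": row["Surname"],
            "Age": int(row["Age"]),                   # assumed to be whole years
            "Body Mass": float(row["Body Mass"]),     # "." decimal point parses directly
            "Body Height": float(row["Body Height"]),
        }
        for row in reader
    ]


def write_athletes(athletes: list[dict]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter=";", lineterminator="\n")
    writer.writeheader()
    writer.writerows(athletes)
    return buf.getvalue()


sample = "Name;Surname;Age;Body Mass;Body Height\nJan;Kowalski;22;78.5;186\n"
athletes = read_athletes(sample)
print(athletes[0]["Body Mass"])  # 78.5
```

Using “;” as the field separator leaves the comma free, so values that use a comma would not break the columns, while the “.” decimal point is what Python’s `float` expects.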

Fig. 8 Screenshot of the panel for athletes’ database

6 Discussion

In this paper, mathematical models were presented for generating training loads and predicting the results expected from athletes training for the 110 and 400 m hurdles races. For each of the considered tasks and distances, the best model according to LOOCV was the F-OLS model proposed by the authors. The application of fuzzy models in sport was also presented by Mezyk and Unold [36], whose goal was to find rules expressing swimmers’ feelings on the day after in-water training. Their method had better predictive ability than traditional classification methods, with an effectiveness of 68.66%. Papić et al. [39] also presented a fuzzy expert system, based on the knowledge of sports experts as well as on data obtained from motor tests. The model suggested the most suitable sport and was designed to search for prospective sports talents. Evaluation of the system showed high reliability and high correlation with top experts in the field.

The literature also shows that artificial neural networks are frequently used mathematical models in sports [34, 35, 40, 44, 48, 50, 52, 58]. Numerous studies have shown that ANNs have good ability to predict sports results [13, 59]. ANNs thus enable a coach to model athletes’ future performance level and support the process of sports selection [34, 40, 48, 52]. For example, Silva et al. [52] presented highly realistic models of swimming performance prediction based on a multilayer perceptron. To establish the profile of a young swimmer, nonlinear combinations between the preponderant variables for each gender and swimming performance in the 200 m medley and 400 m front crawl events were developed. Artificial neural networks are also widely used in planning training loads [44, 50]. In [50], Ryguła presents a new approach to determining training loads in a group of 16- and 17-year-old girls practising the 100 m sprint.

Sports training is a matter of making decisions about quality (type of exercise) and quantity (volume). This is a classical principle of sports training, emphasized in all textbooks on the theory of sport [8, 42]. The selection of training means and their distribution over subsequent stages of sports training is the main element of optimizing hurdlers’ training over both distances, i.e. 110 and 400 m [50]. The selection of exercises (training means) in hurdling is supported by research on motor preparation (strength, speed, endurance) and on the technical structure of the event (kinematic analyses) [21]. Observation of the training programmes of the best athletes [24], supported by analysis of the correlation between hurdle run results and test results covering the physiological [16, 64] and biochemical [28] basis, allows groups of the most valuable basic exercises to be selected. Performance tests carried out during ergometric effort are of great importance in assessing the specifics of hurdlers’ effort [5]. It should be emphasized that individualizing hurdlers’ training programmes also requires an individual approach to the type of capacity, which oscillates between aerobic and anaerobic. Sprint distances in hurdling are considered typical running efforts of an anaerobic nature. In the 110 m hurdles, anaerobic non-lactic changes predominate, with a final emphasis on anaerobic lactic changes. The 400 m hurdles requires primarily an effort of an anaerobic non-lactic nature [57]. Data on the specificity of the effort over 400 m indicate that the proportions of aerobic and anaerobic effort can vary significantly, depending on the study sample (the performance level of the runners), the method and the training period. In the review by Arcelli et al. [2], these proportions ranged from 28 to 70% (aerobic) and from 30 to 72% (anaerobic). The authors suggested that the higher the performance level, the higher the share of the anaerobic component.

Determining the type of runner with respect to aerobic and anaerobic processes would certainly provide additional information for developing individual training. However, this approach has a logistical disadvantage, as monitoring of physiological reactions in hurdling is limited to the months when athletes take part in competitions. Winter conditions are not conducive to specific running tests, and the choice of substitute distances may negatively affect the individual abilities of the hurdler.

Given the extensive scope of hurdlers’ exercises, the coach’s basic problem is the choice of exercises, their volume and their proportions during particular training periods. Researchers have drawn attention to this specificity of the sport [8, 37]. Apart from a representative collection of training means, body physique and age, often identified with the sports performance level, were also used. The impact of these elements on the organization of training has already been emphasized on several occasions [20, 23]. It seems appropriate to determine the initial value (the record from the given year) and the estimated scale of progress (plans for the next season), because this makes it possible to control training loads depending on the athlete’s age, current performance level and main objective. Each athlete has different predispositions, including for specific training tasks. Selecting exercises is necessary, because it is impossible to perform the same volume of all exercises at the same time. Such a procedure would also be pointless, since the “rhythmic” type of hurdler prefers running exercises with hurdles, whereas the “speedy” type prefers shorter interval distances [20]. The database covers 20 years of training of Polish hurdlers who were members of the national team. These hurdlers represented various types (somatic, efficient and technical); therefore, the proposed computer system has a significant scope of generalization (approximation) and is partly representative.

To summarize the discussion, the presented approach has severe limitations when its results are used in practice: the generated training programmes do not consider the individual physiological and psychological parameters of an athlete. However, they might be used as suggestions, which the coach can adjust to a particular athlete.

7 Conclusions

In this paper, a web-oriented expert system for predicting results and generating training loads in the 110 and 400 m hurdles races was presented. The system uses linear regression models (OLS, ridge, LASSO, elastic net) and nonlinear regression models (RBF, fuzzy model, OLS with fuzzy correction). The lowest errors were obtained by the proposed F-OLS model, although building this model is more complicated than building the other models.

The application was implemented in the R programming language with the Shiny framework. Its advantage is that it can be run on multiple platforms, such as personal computers and mobile devices. The easy-to-use interface allows the athlete’s parameters and the training loads to be changed. In this way, the coach can predict the expected result and select individual training components for a given athlete.

Further work will focus on migrating the developed expert system to a mobile application.