1 Introduction

The sports industry plays a significant role in a country’s economic development, especially in improving the development of the nation’s sports and providing employment opportunities. Watching elite sports games is an important part of people’s lives, and the score influences fans’ behaviors. The results of competitions are affected by several factors. Recently, the analysis of sports performance has been a hot topic among researchers and coaches, and top players may be regarded as idols by the public. Therefore, the connections between the results of sports competitions and various scenarios are an interesting topic of study [6, 24, 32, 39, 40, 51].

In many sports, especially in table tennis, the body height of athletes has less of an influence on sporting performance [20, 27]. This indoor sport is suitable for all, and is particularly popular in Asia. In the 2020 National Intercollegiate Athletic Games, there were 165 teams and 832 players in the table tennis competition. In particular, the International Table Tennis Federation (ITTF) had 219 national members in 2022, more than any other international sporting federation [46]. According to the rankings from the ITTF in September 2022, the Taiwanese player Yun-Ju Lin ranked seventh in the world in Men’s Singles; Chien-An Chen and Chih-Yuan Chuang ranked sixth in the world in Men’s Doubles, and I-Ching Cheng and Yun-Ju Lin ranked first in the world in Mixed Doubles [26]. To become an elite table tennis player in a fast-paced and fiercely competitive environment, continual practice is not enough: knowing one’s strengths and weaknesses makes it possible to plan strategy beforehand and increase the winning rate, both of which are essential tasks in elite sports.

It is fascinating to discover the hidden relationships in actual data from competition records. Data mining is the process of discovering interesting patterns and knowledge from large amounts of data [17]. Data mining techniques may assist coaches and athletes in analyzing a large amount of sports performance data and support decision-making more effectively. Many studies have analyzed the results and scenarios in sports events using data mining [4, 16]. These studies cover sports such as baseball [29, 52], basketball [33, 49], American football [13, 21], and soccer [11, 12, 47].

To collect and analyze data, we established “The Intellectual Tactical System in Competitive Table Tennis”, which applies image analysis to competition videos. The system is presented in Fig. 1(a) and (b).

Fig. 1 (a) and (b) The intellectual tactical system in competitive table tennis

Currently, the three main methods for the analysis of table tennis are descriptive statistical analysis, computer-aided analysis, and model analysis [68]. Most studies in the past have used descriptive statistical analysis. The advantage of the traditional three-phase evaluation method [56] was that it provided clear scoring and usage rates in three phases of the game, and analyzed the result with norms. However, its limitation was that it was far too simple to analyze and interpret the complex technical and tactical characteristics of games. This approach was mainly results-based analysis, such as analysis of winning or losing points.

In this study, the 3 S theory presented by Wu [59] was adopted as the main conceptual framework [25, 57, 58]. The 3 S theory describes 135 possible combinations of speed, spin, and spot, and it can be used to set up more objective strategies in table tennis games [9]. According to Zhang [68], model analysis becomes more valuable as the use of big data grows. However, its development is still at an initial stage in table tennis. The purpose of this study was to analyze the tactical performance of top men’s singles players in table tennis games and identify the important features of game scenarios. Therefore, this study proposed a new method, which combines neural networks and the Patient Rule Induction Method (PRIM) [14], to overcome the limitations of traditional methods of tactical analysis in table tennis.

2 Related works

2.1 Match analyses of table tennis and 3 S theory

In the last 30 years, a common and traditional method for table tennis analysis has been “the three-phase evaluation method” proposed by Wu et al. [56] in China. The method analyses the performance in the following three phases of a game: “attack after service” on the first and third shots, “attack after receive” on the second and fourth shots, and “rally” after the fourth shot [68]. It uses the scoring rate and the usage rate in each phase of each game to analyze the strength of table tennis players’ techniques. The scoring rate (SR) and the usage rate (UR) are defined as follows [15].

$${SR}_{i}=\frac{{\alpha }_{i}}{{\alpha }_{i}+{\beta }_{i}}\times 100\%$$
(1)
$${UR}_{i}=\frac{{\alpha }_{i}+{\beta }_{i}}{{\alpha }_{g}+{\beta }_{g}}\times 100\%$$
(2)

In the above two formulas, i is the index of the phase, \({UR}_{i}\) is the usage rate in the i-th phase, \({SR}_{i}\) is the scoring rate in the i-th phase, \({\alpha }_{i}\) is the number of points won in the i-th phase, \({\beta }_{i}\) is the number of points lost in the i-th phase, \({\alpha }_{g}\) is the number of points won in a game, and \({\beta }_{g}\) is the number of points lost in a game.
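As a minimal sketch of Eqs. (1) and (2), the two rates can be computed per phase from counts of points won and lost. The function names and the point counts below are illustrative assumptions, not data from this study.

```python
def scoring_rate(won, lost):
    """SR_i = alpha_i / (alpha_i + beta_i) * 100, following Eq. (1)."""
    total = won + lost
    return 100.0 * won / total if total else 0.0


def usage_rate(won, lost, game_won, game_lost):
    """UR_i = (alpha_i + beta_i) / (alpha_g + beta_g) * 100, following Eq. (2)."""
    game_total = game_won + game_lost
    return 100.0 * (won + lost) / game_total if game_total else 0.0


# Hypothetical (won, lost) counts per phase for one game; invented for illustration.
phases = {
    "attack after service": (5, 2),
    "attack after receive": (3, 4),
    "rally": (3, 3),
}
game_won = sum(w for w, _ in phases.values())
game_lost = sum(l for _, l in phases.values())

rates = {
    name: (scoring_rate(w, l), usage_rate(w, l, game_won, game_lost))
    for name, (w, l) in phases.items()
}
for name, (sr, ur) in rates.items():
    print(f"{name}: SR = {sr:.1f}%, UR = {ur:.1f}%")
```

Because every point of a game falls into exactly one phase, the usage rates of the three phases sum to 100%.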

Later, Yang and Zhang [63] extended this method to develop the four-phase evaluation method for table tennis. Xiao et al. [60] also proposed the “double three-phase evaluation method”. There have been several articles using the three-phase evaluation method [7, 8, 10, 23, 61, 66, 67] and four-phase evaluation method [62] to analyze the skills of table tennis players.

The 3 S theory (speed, spin, spot) was first applied in the ITTF classification system for athletes with impairments by Wu [59]. The theory was used to analyze functional tests and technical skills of physically and intellectually disabled players. The 3 S theory generally includes the elements speed, spin, and spot, which constitute each shot. The table tennis “3S theory + techniques” architecture is shown in Fig. 2.

  1. Speed includes three kinds of speed: fast, medium, and slow.

  2. Spin includes five items: topspin, backspin, no spin, top sidespin, and back sidespin.

  3. Spot includes nine spots in the nine-rectangle grid of a table: left-short, center-short, right-short, left-center-middle, center-middle, right-center-middle, left-long, center-long, and right-long.

Fig. 2 Table tennis “3S theory + techniques” architecture

The 3 S theory not only focuses on three essential factors (speed, spin, spot) but also supports the design of strategies in table tennis competitions, offering a more objective and systematic way to analyze the features of every shot (3 × 5 × 9 = 135 types). As a result, the 3 S theory has been used widely around the world because of its innovative features.
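The count of 135 shot types follows directly from crossing the three factor lists given above. The short sketch below enumerates them; the variable names are assumptions chosen for illustration.

```python
from itertools import product

SPEEDS = ["fast", "medium", "slow"]
SPINS = ["topspin", "backspin", "no spin", "top sidespin", "back sidespin"]
SPOTS = [
    "left-short", "center-short", "right-short",
    "left-center-middle", "center-middle", "right-center-middle",
    "left-long", "center-long", "right-long",
]

# Every shot type is one (speed, spin, spot) triple.
shot_types = list(product(SPEEDS, SPINS, SPOTS))
print(len(shot_types))  # 3 * 5 * 9 = 135
```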

In recent years, several articles have used the 3 S theory to collect data and analyze the tactical characteristics of table tennis players. Tsai et al. [50] used the 3 S theory to record the results of every point for the purpose of performance analysis of elite Taiwan table tennis doubles players. Chien et al. [9] analyzed the tactical characteristics of two talented young table tennis players, Tomokazu Harimoto (Japan) and Yun-Ju Lin (Taiwan), using the 3 S theory. Sun et al. [45] analyzed the technical and tactical characteristics of the first five strokes of the men’s singles match of Cheng-Ting Liao. Using the 3 S theory as the research tool to collect and analyze competition data has recently become a popular method. However, the 3 S theory has a limitation: the ball’s path and technique must be recorded by experienced people. Empirical studies in table tennis are summarized in Table 1.

Table 1 Empirical studies in table tennis

2.2 Artificial neural network

Neural networks are massively parallel distributed processors comprising simple processing units [28]. They provide useful features and functions, including nonlinearity, learning capability, flexibility, fault tolerance, and massively parallel processing. Though there are many types of neural networks, the multi-layer feedforward type is the most popular [41]. This approach has been used successfully in problems such as classification, identification, forecasting, and diagnosis.

The structure of a neural network includes an input layer, a hidden layer, and an output layer. First, the input layer scales the inputs into a reasonable range of values. Second, the hidden layer connects each of its units to the input layer. Last, the output layer produces the output of the network. The architecture of the neural network used in sensitivity analysis is shown in Fig. 3. The sigmoid was used as the activation function [19], and the number of nodes in the hidden layer was 5. Neural networks have been applied widely in different fields, for example, bankruptcy forecasting and medical diagnosis of diabetes and breast cancer [1, 48]. Other applications in sports include American football [35] and javelin [34].

Sensitivity analysis is used to evaluate the impact of a specific input variable on a network output. The value of that variable is modified while the other input variables are fixed at a certain value. Meanwhile, changes in the network output are monitored. The network used for sensitivity analysis is trained with the backpropagation algorithm. Backpropagation learns by iteratively processing a set of training tuples and comparing the network’s prediction for each tuple with the actual known target value. For each training tuple, the weights are modified to minimize the mean-squared error between the network’s prediction and the actual target value. These modifications are made in the “backwards” direction, from the output layer through each hidden layer down to the first hidden layer [17].
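The training procedure described above can be sketched with NumPy for a network with one hidden layer of five sigmoid nodes. This is a minimal illustration, not the network used in this study: the toy data, learning rate, number of epochs, and weight initialization are all assumptions, and the weights are updated once per pass over the whole batch rather than once per training tuple.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train(X, y, hidden=5, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer sigmoid network by gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass: input -> hidden -> output.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: error signal at the output layer, then the hidden layer.
        # The constant factor 2 of the MSE gradient is absorbed into lr.
        d_out = err * out * (1.0 - out)
        d_hid = (d_out @ W2.T) * h * (1.0 - h)
        W2 -= lr * h.T @ d_out / len(X)
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * X.T @ d_hid / len(X)
        b1 -= lr * d_hid.mean(axis=0)
    return (W1, b1, W2, b2), losses


# Toy binary target (hypothetical): 1 when the two inputs sum to more than 1.
rng = np.random.default_rng(1)
X = rng.random((200, 2))
y = (X.sum(axis=1) > 1.0).astype(float).reshape(-1, 1)

params, losses = train(X, y)
print(f"MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```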

Fig. 3 The architecture of neural networks in sensitivity analysis

2.3 Association rules and PRIM

Association rules are an important data mining technique. Their main purpose is to investigate the relevance between features in a multi‑attribute database [17]. Agrawal et al. [2] first presented an effective algorithm for finding association rules between item sets in large data sets, subject to a minimum confidence. Association rule mining is also known as basket analysis, and it tries to express the rules between two or more features [31]. Association rules differ from classification rules in two ways: they can predict any attribute, not just the class, and they can predict more than one attribute’s value at a time [54].

There are two steps in association rule mining. First, item sets with support higher than the minimum support are identified; second, rules that meet the minimum support and confidence are produced from these large (frequent) item sets [3]. The form of an association rule is “if antecedent, then consequent”. Commonly, association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold. These thresholds can be set by users or domain experts [17].
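The two steps can be illustrated with a small brute-force sketch. It is not an optimized algorithm and not the procedure used in this study; the function names, the toy transactions, and the thresholds are assumptions made for illustration only.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: brute-force search for item sets with support >= min_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(cset <= t for t in transactions) / n
            if support >= min_support:
                frequent[cset] = support
                found = True
        if not found:
            break  # no larger item set can be frequent
    return frequent


def generate_rules(frequent, min_confidence):
    """Step 2: emit 'if antecedent, then consequent' rules above min_confidence."""
    rules = []
    for itemset, support in frequent.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                a = frozenset(antecedent)
                confidence = support / frequent[a]  # subsets of a frequent set are frequent
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, support, confidence))
    return rules


# Hypothetical point records; invented for illustration.
transactions = [
    frozenset({"score=lead", "serve=short", "win"}),
    frozenset({"score=lead", "serve=long", "win"}),
    frozenset({"score=lead", "serve=short", "win"}),
    frozenset({"score=behind", "serve=short", "lose"}),
    frozenset({"score=behind", "serve=long", "lose"}),
    frozenset({"score=lead", "serve=long", "lose"}),
]

mined = generate_rules(frequent_itemsets(transactions, 0.3), 0.7)
for a, c, s, conf in mined:
    print(f"IF {sorted(a)} THEN {sorted(c)} (support={s:.2f}, confidence={conf:.2f})")
```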

The Apriori algorithm has limitations, and researchers often find it hard to obtain the results they expect from it. In contrast, PRIM is an analytic algorithm that reveals more results of interest to researchers. PRIM is a data mining technique introduced by Friedman and Fisher [14]. According to Wolfgang and Wang [55], the purpose of PRIM is to find sub-regions of the input space in which the target variable has a high (or low) mean value. The resulting rules have the following form: if condition 1 and … and condition K, then the estimated mean outcome value [38]. PRIM has been applied in multiple studies; for example, Nannings et al. [37] applied PRIM for selecting high-risk subgroups in very elderly ICU patients. Sadiq et al. [42] used a PRIM-based bump-hunting method to identify the spaces of higher modes and masses to indicate the peak anomalies in the Center for Medicare Services 2014 dataset.

2.4 Support, confidence, and lift

Support, confidence, and lift are the common statistical indexes for association rules [36, 64]. Support and confidence are two measures of rule interestingness, which reflect the usefulness and certainty, respectively, of discovered rules. Association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold. These thresholds can be set by users or domain experts [17].

Assuming item sets A and B occur in the transactions of a data set C, support is defined as follows.

$$\text{Support}=\text{P}(\text{A}\cap \text{B})=\frac{\text{number of transactions containing both A and B}}{\text{total number of transactions}}$$
(3)

The confidence of the association rule A ⇒ B is a measure of the accuracy of the rule, as determined by the percentage of transactions in C containing A that also contain B. In other words, the definition of confidence is as follows.

$$\text{Confidence}=\text{P}\left(\text{B}|\text{A}\right)=\frac{\text{P}\left(\text{A}\cap \text{B}\right)}{\text{P}\left(\text{A}\right)}$$
(4)

The lift value indicates the importance of a rule. The lift value of an association rule is the ratio of the confidence of the rule to the expected confidence of the rule [43]. Not every association rule is useful; therefore, lift is used to quantify the usefulness of a rule. The definition of lift is as follows [5, 44].

$$\text{Lift}=\frac{\text{P}(\text{A}\cap \text{B})}{\text{P}\left(\text{A}\right)\text{P}\left(\text{B}\right)}=\frac{\text{rule confidence}}{\text{prior proportion of the consequent}}$$
(5)

When A and B are positively correlated, lift > 1. A negative correlation between A and B implies that lift < 1. A lift value further from 1 implies a stronger association between A and B [53]. Aside from support, confidence, and lift, Support×Confidence (S×C) has been used in one study as an additional index for selecting rules [31].
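As a worked example of Eqs. (3) to (5) and the S×C index, the four measures can be computed from simple counts. The function name and the counts below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def rule_metrics(n_total, n_a, n_b, n_ab):
    """Support, confidence, lift, and SxC for the rule A => B from raw counts.

    n_total: all transactions; n_a: those containing A;
    n_b: those containing B; n_ab: those containing both A and B.
    """
    support = n_ab / n_total                 # P(A and B), Eq. (3)
    confidence = n_ab / n_a                  # P(B | A), Eq. (4)
    lift = confidence / (n_b / n_total)      # confidence / P(B), Eq. (5)
    return {
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "sxc": support * confidence,
    }


# Hypothetical counts: 1000 points, A holds in 200, B in 500, both in 150.
metrics = rule_metrics(n_total=1000, n_a=200, n_b=500, n_ab=150)
print(metrics)
```

With these counts the rule has support 0.15 and confidence 0.75; since B occurs in half of all transactions, the lift is 1.5, which indicates a positive association.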

3 Methodology

In this study, sensitivity analysis was carried out using a neural network algorithm. The analysis ranked the attributes by importance and identified those that affect players’ scores. Next, the PRIM algorithm was used to extract association rules that affect players’ scores. Last, the association rules were applied to analyze the key factors and scenarios that impacted players’ scores. If a player can recognize his or her own style and that of the opponent, the key factors of winning and losing, and the context of the competition beforehand, the player can usually set up strategies in advance and may increase his or her chances of winning. This study therefore aims to analyze the factors that affect the winning rate of the player Yun-Ju Lin in singles events. In addition, the study compares different scenarios and investigates the influences on and differences between competitors.

Wu proposed the 3 S theory in the ITTF classification system for testing a player’s functions and analysis of technical skills. The three key winning factors in the 3 S theory are speed, spin, and spot, and the theory provides an objective and systematic way to investigate the factors in a player’s hitting process. In addition, the 3 S theory has been applied to plenty of factor analyses regarding performance and strategies in table tennis [37, 38].

In this study, the open competitions of the top Taiwanese singles player Yun-Ju Lin are used as research samples. Twenty-two international games that Lin played from 2015 to 2021 were the data source for the study. Data collection was based on “The Intellectual Tactical System in Competitive Table Tennis”, and image analysis of competition videos was used to collect and analyze data. The records captured the ball’s path, with factors including forehand and backhand (2 types), technique (12 types), speed (3 types), spin (5 types), and spot (9 types). Data were collected from 22 games with 14 opponents, for a total of 1109 instances. Among the instances, there were 602 winning points and 507 losing points.

The research structure and method were organized into a four-phase framework (Fig. 4). The first phase is data preprocessing, including data collection and processing. The second phase is sensitivity analysis. The third phase is the extraction of association rules for winning and losing. Last, the fourth phase is scenario analysis. Each phase is described as follows.

Fig. 4 Research framework and four phases

3.1 Phase I: data preprocessing

First, the attributes are defined, including the different scenarios, serving techniques and methods, serving spot, and spin. Furthermore, the receiving, serving, and winning points of the opponent are also included. Attribute data were collected from videos of the player’s matches and entered into a recording form for later coding and analysis.

3.2 Phase II: sensitivity analysis

Sensitivity analysis can measure the importance of each attribute, which is the level to which the attribute affects the competition result (win or lose). Players’ serve characteristics and different event scenarios were used as attributes. The attributes went through sensitivity analysis using neural networks to measure the importance of each attribute. The steps of analysis [30] are as follows:

Step (1) Obtain a new observed value \(X_{mean}\) by averaging each attribute.

Step (2) Feed \(X_{mean}\) into the neural network to produce the output \(Output_{mean}\).

Step (3) Vary the value of each attribute in turn, from its smallest to its largest value, while the other attributes stay at their means; the resulting network output is compared with \(Output_{mean}\).
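The three steps can be sketched as follows. This is an illustrative reading of the procedure rather than the study's implementation: the `predict` function stands in for the trained network, its weights are invented, and scoring each attribute by the largest absolute deviation from the mean output and then normalizing the scores to sum to 1 are assumptions.

```python
import numpy as np


def sensitivity(predict, X, steps=11):
    """Importance of each attribute from one-at-a-time perturbation around the mean."""
    x_mean = X.mean(axis=0)                     # Step 1: mean observation
    out_mean = predict(x_mean[None, :])[0]      # Step 2: output at the mean
    spread = np.zeros(X.shape[1])
    for j in range(X.shape[1]):                 # Step 3: sweep attribute j only
        grid = np.linspace(X[:, j].min(), X[:, j].max(), steps)
        probes = np.tile(x_mean, (steps, 1))
        probes[:, j] = grid
        spread[j] = np.max(np.abs(predict(probes) - out_mean))
    total = spread.sum()
    return spread / total if total > 0 else spread


# Hypothetical stand-in for a trained network: a logistic model with fixed weights.
w = np.array([3.0, 0.5, 0.0])


def predict(Z):
    return 1.0 / (1.0 + np.exp(-(Z @ w)))


rng = np.random.default_rng(0)
X = rng.random((100, 3))
importance = sensitivity(predict, X)
print(np.round(importance, 3))
```

In this toy setting the first attribute dominates because it carries the largest weight, while the third attribute, which the stand-in model ignores, receives an importance of zero.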

3.3 Phase III: extract association rules from records

In Phase III, PRIM was applied to extract the association rules that affect players’ scores. The PRIM algorithm proceeds as follows [18].

Step (1) Start with all of the training data, and a maximal box containing all of the data.

Step (2) Consider shrinking the box by compressing one face, so as to peel off the proportion α of observations having either the highest values of a predictor \(X_j\), or the lowest. Choose the peeling that produces the highest response mean in the remaining box. (Typically α = 0.05 or 0.10.)

Step (3) Repeat step 2 until a minimal number of observations remain in the box.

Step (4) Expand the box along any face, as long as the resulting box mean increases.

Step (5) Steps 1–4 give a sequence of boxes, with different numbers of observations in each box. Use cross-validation to choose a member of the sequence. Call the box \(B_1\); it is identified by the indices of the observations it contains.

Step (6) Remove the data in box \(B_1\) from the dataset and repeat steps 2–5 to obtain a second box, and continue to obtain as many boxes as desired.
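The peeling part of the procedure (steps 1 to 3) can be sketched as below. This is a simplified illustration under stated assumptions, not the implementation used in this study: it handles numeric predictors only, whereas the attributes analyzed here are largely categorical, and it omits the pasting step, cross-validation, and the removal of covered data (steps 4 to 6). The function name, the toy data, and the stopping fraction `min_support` are hypothetical.

```python
import numpy as np


def prim_peel(X, y, alpha=0.1, min_support=0.1):
    """Top-down peeling; returns a list of (lower, upper, box mean, box support)."""
    n, d = X.shape
    lower = X.min(axis=0).astype(float)
    upper = X.max(axis=0).astype(float)
    inside = np.ones(n, dtype=bool)              # Step 1: the box holds all data
    trajectory = [(lower.copy(), upper.copy(), float(y.mean()), 1.0)]
    while True:
        best = None
        for j in range(d):                       # Step 2: try peeling every face
            xj = X[inside, j]
            for side, cut in (("low", np.quantile(xj, alpha)),
                              ("high", np.quantile(xj, 1.0 - alpha))):
                keep = inside & (X[:, j] >= cut if side == "low" else X[:, j] <= cut)
                if keep.sum() == inside.sum() or keep.sum() < min_support * n:
                    continue                     # nothing peeled, or box too small
                mean = y[keep].mean()
                if best is None or mean > best[0]:
                    best = (mean, j, side, cut, keep)
        if best is None:                         # Step 3: stop at minimal support
            break
        mean, j, side, cut, inside = best
        if side == "low":
            lower[j] = cut
        else:
            upper[j] = cut
        trajectory.append((lower.copy(), upper.copy(), float(mean), float(inside.mean())))
    return trajectory


# Toy data (hypothetical): the response is 1 only in the corner x0 > 0.6, x1 > 0.6.
rng = np.random.default_rng(0)
X = rng.random((500, 2))
y = ((X[:, 0] > 0.6) & (X[:, 1] > 0.6)).astype(float)

trajectory = prim_peel(X, y)
lo, hi, box_mean, box_support = trajectory[-1]
print(f"final box: lower={np.round(lo, 2)}, mean={box_mean:.2f}, support={box_support:.2f}")
```

Each entry of the returned trajectory corresponds to one box in the sequence mentioned in step 5, so a box could later be chosen by trading off its mean against its support.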

PRIM can reveal more of the results that researchers are interested in. At the same time, support, confidence, lift, and Support×Confidence (S×C) were used to evaluate the extracted rules and ensure that the association rules are of adequate quality.

3.4 Phase IV: scenario analysis

The steps for building scenarios and analyzing scenarios in this stage are as follows.

Step (1) Build and analyze scenarios from the association rules of Phase III.

Step (2) Adaptation to different scenarios: simulate the style of the opponent according to the scenario rules and train to adapt to that style.

Step (3) Development of technique and tactical skill: choose a scenario that occurs frequently and analyze it. The scenario helps the player to develop technique and tactical skill by understanding his or her own strengths and weaknesses and knowing the style of the opponent.

4 Results

Attributes fall into three categories: scenarios, serve techniques and styles, and serve spin. Scenarios comprise the phase of the game (opening, middle, and end game), the game status, and the livescore. In competition, a match is typically best of five or seven games. We divide the game status into three types: leading, tie, and behind. Within a game, the player who first scores 11 points wins. We likewise divide the livescore into three types: leading, tie, and behind. Serve techniques comprise tradition, hook, squat, and corkspin; serve spins include topspin, backspin, no spin, top sidespin, and back sidespin; serve spots comprise forehand short ball, middle short ball, backhand short ball, forehand half-long ball, middle half-long ball, backhand half-long ball, forehand long ball, middle long ball, and backhand long ball. The 14 opponents include Gaoyuan Lin, Long Ma, Jike Zhang, Xin Xu, and Zhendong Fan from China; Tomokazu Harimoto and Koki Niwa from Japan; and other international players, namely Hugo Calderano, Ruwen Filus, Vladimir Samsonov, Dimitrij Ovtcharov, and Chia Sheng Lee.

Table 2 Attributes importance of sensitivity analysis

Sensitivity analysis was performed using a backpropagation neural network. The change in the network output was measured for each attribute. In the sensitivity analysis, the larger the importance value of an attribute, the greater its effect. The top six attributes were livescore, serve techniques, game status, opponent, serve spot, and serve speed; their combined weight was 0.88. The attribute importances are shown in Table 2, and the top six attributes were then analyzed by PRIM to discover the association rules.

Zhang et al. [65] suggested that the minimum thresholds of support and confidence be set to 5.00% and 60.00%, respectively. The study first set the minimum support at 0.05 and selected 11 winning rules and 8 losing rules. Later, the study investigated the quality of the association rules. In this process, besides the commonly used measures of support, confidence, and lift, Support×Confidence (S×C) was added in this study. Table 3 lists the rules ranked by their confidence values.

The minimum confidence of the winning rules is 0.68 and their lift values are all above 1, so all 11 winning rules are retained in this study. Among the losing rules, the lift values are all above 1, but L7 and L8 are deleted because their confidence is lower than 0.60, leaving six losing rules.
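The selection logic described above (minimum support 0.05, minimum confidence 0.60, lift above 1, then ranking by confidence) can be expressed as a small filter. The `Rule` class, the rule names, and all numeric values below are invented for illustration and do not reproduce the rules in Table 3.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    support: float
    confidence: float
    lift: float


def select(rules, min_support=0.05, min_confidence=0.60, min_lift=1.0):
    """Keep rules that pass all three thresholds, ranked by confidence."""
    kept = [
        r for r in rules
        if r.support >= min_support
        and r.confidence >= min_confidence
        and r.lift > min_lift
    ]
    return sorted(kept, key=lambda r: r.confidence, reverse=True)


# Hypothetical candidate rules; the values are not taken from the study.
candidates = [
    Rule("R1", support=0.08, confidence=0.72, lift=1.30),
    Rule("R2", support=0.06, confidence=0.55, lift=1.10),  # fails confidence
    Rule("R3", support=0.03, confidence=0.80, lift=1.45),  # fails support
    Rule("R4", support=0.10, confidence=0.65, lift=1.20),
]

selected = select(candidates)
print([r.name for r in selected])
```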

Table 3 Rules ranked by confidence

Table 4 lists the association rules that affect whether the player wins or loses a point. W1 to W11 are the winning rules, and L1 to L6 are the losing rules. Rule W1 says: IF livescore = lead and opponent = Jike Zhang THEN Yun-Ju Lin wins the point. Rule L1 says: IF livescore = behind and Serve_spot = forehand_half-long_ball and Serve_tech = tradition THEN Yun-Ju Lin loses the point.

The competition scenarios are derived from the association rules in Table 4. Each scenario is built by combining association rules. Rules W2, 4, 5, 7, 8, 9, 10, and 11 formed Scenario 1, and rules W1, 2, 3, 4, 5, 6, 8, and 10 formed Scenario 2. The rest may be deduced by analogy. As shown in Table 5, we obtained 12 scenarios.

Table 4 Association rules of win and lose
Table 5 Competition scenarios

5 Conclusion

Based on the analysis in the study, the player’s winning rate is higher when he is ahead in both game status and score. Conversely, when his score is behind, his losing rate is higher. According to the findings, Yun-Ju Lin would need to adapt to the pressure during matches, and obtaining a leading score is particularly important. When the serve is slow and the serving spot is a backhand long ball, his winning rate is higher, which is an advantage for Lin. In addition, when the serving spot is a forehand half-long ball, Lin has a higher losing rate, which suggests that he should favor the serving styles with a higher winning rate and thereby control the tempo of the match. As for opponents, when facing Jike Zhang and Chia Sheng Lee, Lin has higher winning rates, but he has a much higher losing rate when facing Patrick Franziska. Therefore, Lin needs to work out a specific strategy and refine it to adapt to Franziska’s style.

In recent years, there have been extensive studies on performance outcomes in table tennis. However, the main problem with previous studies was that the content of data collection was not exhaustive. The traditional “three-phase evaluation method” has been widely used in table tennis over the past three decades. For the phases “attack after service”, “attack after receive”, and “rally”, the usage rate and scoring rate of each phase were compared [8, 23, 60]. Chien et al. [9] used the 3 S theory to analyze the tactical characteristics of two young table tennis players. The speed, spin, serve placement point, and technical usage rate of the serve were compared. As the above description shows, previous research only compared the usage rate and the scoring rate of each phase, and therefore could not determine which features were more important or how different features were associated.

We demonstrated that neural networks were useful for sensitivity analysis to select the attributes that impact a player’s wins and losses. Then, the attributes were passed through the PRIM algorithm to examine the association rules of wins and losses in greater detail. Later, the selected rules were analyzed to identify the key factors and scenarios. The main goal of the study was to evaluate and understand the strengths and weaknesses of a player. As a result, the findings may provide useful suggestions for Yun-Ju Lin and his coach on training and building proper strategies in competition. We expect the scientific analysis and approach presented in this study to offer helpful recommendations for other players in training and competition.

This method makes up for the shortcomings of traditional tactical analysis in table tennis and provides effective predictions for different scenarios, game statuses, and opponents. Therefore, it can discover the strengths and weaknesses of the players and the characteristics of the game, and draw up various combinations of winning strategies. However, this study has some limitations. It focused only on the analysis of Yun-Ju Lin and the scenarios with different players in the games. Additionally, the game condition and experience of the player were not considered. Therefore, this study is expected to offer a useful reference for the future training and strategy of Yun-Ju Lin. Furthermore, the study has obtained association rules by analyzing different scenarios. In follow-up research, we can integrate and analyze the various performances of serving and receiving in greater depth to provide more useful suggestions for players.