Stackelberg Game-Based Radio Resource Management Algorithm in an Urban Rail Transit Communication System

Train-to-wayside (T2W) and train-to-train (T2T) communication modes may coexist in future train-centric communication-based train control (CBTC) systems. The feasibility of T2T communication in urban rail transit is analyzed first. Referring to the device-to-device (D2D) communication scenario in the general cellular network, this paper establishes a radio resource optimization model for the coexistence of train-to-train communication and train-to-wayside communication. With the aim of more efficient scheduling of radio time-frequency resources in the dedicated frequency band, we propose a Stackelberg game-based radio resource management algorithm based on the consideration of different service priorities of trains. The analysis and simulation results show that the proposed algorithm can effectively guarantee the performance of the system and improve the reliability of the CBTC system.

How to make full use of radio resources to meet the quality of service (QoS) requirements of different services and improve the overall performance of wireless communication systems is a problem that must be solved.
Wang et al. [5] introduced the current status of C-V2X technology and the progress of communication architecture standards. Emara et al. [6] evaluated the end-to-end delay of C-V2X communication based on multi-access edge computing, with road vehicles as the research scenario. Chen et al. [7] proposed a C-V2X framework based on network slicing and spatial multiplexing, and allocated resources on demand for different types of users. Knoll et al. [8] put forward a C-V2X resource deployment architecture based on mobile network escort. None of these studies was aimed at urban rail transit application scenarios. At present, C-V2X vehicle-to-vehicle communication is based on D2D technology.
The research on radio resource management of D2D technology is mainly concentrated on the public network, including D2D communication mode selection and D2D communication interference coordination. In terms of mode selection, Azam et al. [9] proposed a communication mode selection mechanism to maximize the system throughput, using a linear approximation algorithm to solve the mixed integer optimization problem to obtain the suboptimal solution. Tang et al. [10] established a joint optimization problem of sub-channel allocation and transmit power in different D2D communication modes, and proposed a distributed two-stage method to separate mode selection and resource allocation, with low computational complexity. Hossain et al. [11] introduced a mode selection scheme based on game theory. The optimization goal is to reduce the transmit power of D2D users and improve energy efficiency. In terms of interference coordination, Oduola et al. [12] considered D2D users multiplexing cellular users' downlink resources for communication, established a power control optimization problem, and proposed centralized and distributed D2D power allocation mechanisms to ensure the QoS indicators of D2D users. Lee et al. [13] designed a distributed power allocation mechanism based on random geometric Poisson distribution, with the goal of maximizing D2D user throughput, but this mechanism does not take into account the interference of cellular users and the system performance of the cellular network. A D2D communication cellular network spectrum allocation algorithm based on game theory was proposed by Song et al. [14]. This algorithm can effectively avoid the interference between cellular and D2D users and make full use of resources.
To sum up, most of the existing research on radio resource management of D2D technology does not consider the complexity of the scene and the difference of user requirements.
Unlike Wang's work [15], which pursues fairness between user equipments (UEs), we propose a radio resource management algorithm based on the Stackelberg game model for the specific application scenarios of urban rail transit, taking into account the priority of different train transmission services. The train-to-train communication responsible for the transmission of CBTC services is taken as the leader, and the train-to-wayside communication responsible for the transmission of Passenger Information System (PIS), CCTV, and other services is taken as the follower. The rest of this paper is organized as follows. The feasibility of implementing train-to-train communication is analyzed in Sect. 2. Section 3 proposes a radio resource management algorithm based on the Stackelberg game model, which effectively improves the reliability of the train CBTC service under the coexistence of train-to-wayside and train-to-train communication modes. Simulation results and discussions are given in Sect. 4. Finally, we conclude the study in Sect. 5.

Feasibility Analysis of Train-to-Train Communication
In this section the feasibility of the train-to-train communication mode is analyzed from two aspects: the wireless signal coverage of the train as the transmitter and the train interval in the actual operation scene of the urban rail transit train.

Wireless Signal Coverage
In the CBTC system, the train antenna is located at the front or rear of the train and interacts with ground equipment to realize two-way communication between the train and the ground. The LTE-M system equipment technical specification [16] states that the requirements for train-to-wayside communication are met when the received signal-to-interference-plus-noise ratio (SINR) is greater than or equal to -2 dB. Using this specification as a benchmark, we simulate how the received SINR changes with the train-to-train communication distance, based on the actual parameters of the LTE-M wireless communication system and taking into account large-scale and small-scale fading models, as shown in Fig. 1.
As can be seen from Fig. 1, the quality of the wireless channel at the receiver decreases continuously as the train-to-train communication distance increases. Taking the receiver threshold of -2 dB in reference [17], the coverage range of the onboard antenna at its transmission power is about 700 m.
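The coverage estimate above can be illustrated with a minimal sketch. It assumes a simple log-distance path loss model without small-scale fading; the transmit power, reference loss, path loss exponent, and noise-plus-interference level are hypothetical placeholders, not the LTE-M simulation parameters behind Fig. 1, so the resulting distance is not expected to reproduce the 700 m figure.

```python
import math

def sinr_db(distance_m, tx_power_dbm=23.0, ref_loss_db=40.0,
            path_loss_exponent=3.5, noise_plus_interference_dbm=-95.0):
    """Received SINR in dB under a log-distance path loss model.

    All default values are hypothetical placeholders.
    """
    # Path loss grows with 10 * n * log10(d); distances below 1 m are clamped.
    path_loss_db = ref_loss_db + 10.0 * path_loss_exponent * math.log10(max(distance_m, 1.0))
    return tx_power_dbm - path_loss_db - noise_plus_interference_dbm

def coverage_distance_m(threshold_db=-2.0, tx_power_dbm=23.0, ref_loss_db=40.0,
                        path_loss_exponent=3.5, noise_plus_interference_dbm=-95.0):
    """Distance at which the SINR falls to the threshold (closed form of the model above)."""
    margin_db = tx_power_dbm - ref_loss_db - noise_plus_interference_dbm - threshold_db
    return 10.0 ** (margin_db / (10.0 * path_loss_exponent))

if __name__ == "__main__":
    d = coverage_distance_m()
    print(f"coverage distance under the assumed model: {d:.0f} m")
```

Because the SINR decreases monotonically with distance in this model, the coverage range is simply the distance at which the curve crosses the -2 dB threshold.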

Train Tracking Interval
In the rail transit system, with the development of precise train positioning technology and mobile communication technology, moving block control has gradually replaced fixed block control and quasi-moving block control. The target braking point of the tracking train is set at the rear of the train ahead so that train operation is controlled dynamically. Based on the train tracking distance in the absolute distance braking mode, this section takes the speed of the train ahead as 0, that is, the tail of the train ahead is the end point of the movement authority, and demonstrates the feasibility of train-to-train communication between adjacent trains.
The train operation scene in the absolute distance braking mode is shown in Fig. 2. $L_b$ is the service braking distance corresponding to the current running speed of the tracking train; $L_s$ is the safety protection distance between the tracking train and the train ahead; $L_{\min}$ is the minimum tracking distance of two trains in this mode.
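The relation between these distances can be sketched as follows. This is only an illustration under two assumptions that are not stated in the text: the minimum tracking distance is taken as $L_{\min} = L_b + L_s$, and the service braking distance is computed with a constant deceleration, $L_b = v^2 / (2a)$. The function names and sample values are hypothetical.

```python
def service_braking_distance_m(speed_kmh, deceleration_mps2):
    """L_b for a constant deceleration (assumed model): v^2 / (2a)."""
    v = speed_kmh / 3.6  # km/h -> m/s
    return v * v / (2.0 * deceleration_mps2)

def min_tracking_distance_m(speed_kmh, deceleration_mps2, safety_distance_m):
    """L_min = L_b + L_s (assumed relation between the distances in Fig. 2)."""
    return service_braking_distance_m(speed_kmh, deceleration_mps2) + safety_distance_m

def t2t_feasible(l_min_m, coverage_m):
    """Train-to-train communication is feasible if adjacent trains stay within coverage."""
    return l_min_m < coverage_m

if __name__ == "__main__":
    # Hypothetical deceleration and safety distance, for illustration only.
    l_min = min_tracking_distance_m(speed_kmh=80.0, deceleration_mps2=1.0,
                                    safety_distance_m=50.0)
    print(f"L_min = {l_min:.1f} m, feasible: {t2t_feasible(l_min, 700.0)}")
```

The feasibility check itself only compares the minimum tracking distance with the coverage range, which is the comparison made in this section.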
According to the service braking and emergency braking characteristic curves of trains given in [18], the relation between the deceleration at different speeds and the minimum tracking distance is shown in Table 1. The minimum tracking distance increases gradually with train speed.
It can be seen from Table 1 that at the commonly used train running speed of 80 km/h, the minimum tracking distance is 478 m, which is less than the coverage of train-to-train communication. Therefore, train-to-train communication is feasible in the urban rail transit environment.
The optimization goal of radio resource management is given by taking into account the characteristics of the train-to-train communication service in urban rail transit. The coexistence scenario of train-to-train communication and train-to-wayside communication is described. Based on this, the rationality of applying a Stackelberg game to the radio resource management problem is demonstrated.

Optimization Objectives and Transmission Mode
Compared with the CBTC service, the passenger information system (PIS) and image monitoring system (IMS) services are less critical in urban rail transit systems, and train-to-train communication should mainly carry train-control-related services, namely CBTC services. Therefore, the optimization index of train-to-train radio resource management should be based on the QoS requirements of CBTC services, that is, higher requirements for reliability but lower requirements for data rate. The introduction of train-to-train communication makes the urban rail transit wireless communication system more complicated, which places higher demands on radio resource management. The optimization goal of radio resource management in this paper is to improve the quality of the services carried by train-to-train communication based on service priority.
D2D communication is divided into centralized control and distributed control. Compared with distributed control, centralized control not only retains the advantages of D2D communication but also facilitates the management and control of resources. Therefore, in this scenario, the train-to-train communication mode adopts centralized control. When train-to-train communication and train-to-wayside communication coexist and the same time-frequency resources are multiplexed, the two communication links interfere with each other. Resource multiplexing can be divided into two cases: multiplexing the uplink resources or the downlink resources of train-to-wayside communication [19]. Due to the limitation of onboard terminal power, the LTE system is an uplink-limited system, so this paper takes the LTE uplink as the research object. In Fig. 3, the D2D link (Sidelink) between train 2 and train 3 multiplexes the uplink Uu channel resources of train 1. Train 2 is the transmitter of the D2D link, and train 3 is the receiver.
The interference in the system has two parts: the Uu link at the base station is interfered by the Sidelink transmission of train 2, and the Sidelink is interfered by the uplink transmission of train 1 on the Uu link of train-to-wayside communication. The next step is to study the multiplexing of the D2D Sidelink and establish a resource scheduling model with Fig. 3 as the application scenario.

Optimization of Problem Transformation
Unlike the public network, which pursues fairness in resource allocation, urban rail transit has a small number of users, and different train services have different priorities. Train-to-train communication carries the CBTC service with the highest priority, while train-to-wayside communication carries PIS and CCTV services with lower reliability requirements. Therefore, in the Stackelberg game model, the train-to-train communication link is the leader, so as to encourage the establishment of the D2D link and improve the reliability of the CBTC service. Since the D2D link and the train-to-wayside communication link multiplex wireless resources, the train-to-wayside communication causes interference to the train-to-train communication link, which may affect the transmission rate of the CBTC service. Therefore, the train-to-wayside communication is defined as the follower. On the basis of ensuring that the CBTC service is not affected, the throughput of video services such as PIS is increased as much as possible to maximize the link revenue.
According to the transmission mode described above, train-to-train communication multiplexes the uplink channel of train-to-wayside communication. In this section, the concept of the physical resource block (PRB) is used for resource scheduling during analysis. In line with the actual scenario, and in order to avoid complex channel interference caused by channel multiplexing, it is assumed that the D2D link multiplexes the PRBs of a single train-to-wayside communication link. Each PRB is occupied by at most one train-to-wayside communication link and at most one D2D link at a time. It is also assumed that the link is not affected by anything other than the interference caused by channel multiplexing and Gaussian white noise.
Based on the above analysis, it can be concluded that the wireless signal quality at the receiver of the D2D link $d$ can be expressed as Eq. 1. We first define a binary variable $x_{d,n}$. If the D2D link $d$ occupies PRB $n$, then the value of $x_{d,n}$ is 1; otherwise, the value of $x_{d,n}$ is 0 [15].
In Eq. 1, $p_d^n$ represents the transmit power on PRB $n$ at the transmitter of the D2D link, $h_d^n$ is the channel gain of the D2D link, $p_c^n$ represents the transmit power of the base station of the train-to-wayside link on PRB $n$, $f_{c,d}^n$ represents the interference of the train-to-wayside communication base station to the train at the receiver of the D2D link, and $N_0$ represents Gaussian white noise power. Similarly, the wireless signal quality of the base station corresponding to the train-to-wayside communication link at PRB $n$ can be expressed as Eq. 2.
where $h_c^n$ represents the channel gain of the link between the train-to-wayside communication train and the base station, and $f_{d,c}^n$ represents the interference of the train at the transmitter of the D2D link to the base station. According to the previous analysis, when the quality of the D2D link wireless channel is fixed, the impact of train-to-train communication on the reliability of the train's CBTC service depends on the transmission power of the D2D link transmitter. However, the transmitter of the D2D link interferes with the traditional train-to-wayside communication link, thereby affecting the throughput of the train-to-wayside communication user service. Therefore, in this paper, the problem of resource allocation under the coexistence of train-to-train communication and train-to-wayside communication is transformed into the study of the throughput of both parties.
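The two signal-quality expressions can be sketched numerically. Since Eqs. 1 and 2 are not reproduced in this text, the sketch assumes the usual SINR form implied by the definitions above: desired received power divided by co-channel interference plus noise, with power gains $|h|^2$ and $|f|^2$ passed in directly. The function names and the use of the Shannon formula over one 180 kHz LTE PRB are illustrative assumptions.

```python
import math

def d2d_sinr(x_dn, p_d, g_d, p_c, g_cd, noise):
    """Linear SINR at the D2D receiver on one PRB (assumed form of Eq. 1).

    x_dn: 1 if the D2D link occupies the PRB, else 0
    g_d:  power gain |h_d|^2 of the D2D link
    g_cd: power gain |f_{c,d}|^2 of the interfering path into the D2D receiver
    """
    return x_dn * p_d * g_d / (p_c * g_cd + noise)

def cu_sinr(x_dn, p_d, g_dc, p_c, g_c, noise):
    """Linear SINR at the base station on the same PRB (assumed form of Eq. 2).

    g_c:  power gain |h_c|^2 of the train-to-wayside link
    g_dc: power gain |f_{d,c}|^2 from the D2D transmitter to the base station
    """
    return p_c * g_c / (x_dn * p_d * g_dc + noise)

def shannon_rate_bps(sinr, bandwidth_hz=180e3):
    """Shannon rate over one PRB (180 kHz in LTE)."""
    return bandwidth_hz * math.log2(1.0 + sinr)

if __name__ == "__main__":
    # Normalized, hypothetical values for illustration only.
    shared = cu_sinr(1, p_d=1.0, g_dc=0.5, p_c=1.0, g_c=1.0, noise=0.1)
    alone = cu_sinr(0, p_d=1.0, g_dc=0.5, p_c=1.0, g_c=1.0, noise=0.1)
    print(shannon_rate_bps(shared), shannon_rate_bps(alone))
```

Setting $x_{d,n} = 0$ recovers the interference-free train-to-wayside link, which makes the throughput cost of sharing a PRB directly visible.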

Radio Resource Management Algorithm Based on Stackelberg Game
Because the D2D link multiplexes the radio resources of the train-to-wayside communication link, it improves the spectrum utilization rate and CBTC service reliability but inevitably causes interference to the train-to-wayside communication link. In order to reduce the influence of interference on the two communication modes, the Stackelberg game model is introduced to coordinate resources [20]. Train-to-train communication carries the highest-priority CBTC service and takes priority in resource occupation. Therefore, this paper defines the D2D link as the leader and the cellular user (CU) link as the follower. The leader owns channel resources. If the followers want to share channel resources, the leader formulates a strategy to charge appropriate fees; followers consider whether sharing resources is beneficial given the cost, and determine the transmission power to obtain revenue.

Utility Function of Leader D2D Link
For the D2D link, the train in the CU link causes interference to the train $d'$ at the receiver of the D2D link during uplink transmission. The train runs at high speed in the tunnel, and the channel quality of train-to-train communication is related to speed: the higher the speed, the larger the Doppler spread and the more drastic the channel changes. In this model, the leader's utility function is determined by service priority, train speed, CU link interference, and the base station. It is defined as the throughput plus the CBTC service priority weight and the gain from the CU link, minus the loss caused by CU link interference to the receiver, as shown in Eq. 3.
where $g_{dd'} = |h_d|^2$ and $g_{u,d'} = |f^n_{u,d'}|^2$ represent the channel gain of the D2D link and the channel gain between the CU train and the train $d'$ at the receiver of the D2D link, respectively. $p_u$ is the transmit power of the CU link train; $p_d$ is the transmit power of train $d$ at the transmitter of the D2D link; $\alpha$ is the charge price ($\alpha > 0$), which represents the fee charged to the D2D link because the D2D link reuses the uplink channel of the CU link for communication. $\beta$ represents the priority of the CBTC service, $v$ represents the train speed, and $\gamma$ represents the priority of the video service.
The optimization of the follower utility function can be summarized as adjusting $p_d$ so as to maximize $u_c(\alpha, p_u)$, as shown in formula 4. In addition, since the train transmission power $p_d$ is limited by practical factors such as power limit and antenna gain, there is the constraint $p_{\min} \le p_d \le p_{\max}$, shown in formula 5.

Utility Function of the Follower CU Link
Since the D2D link multiplexes the uplink channel resources of the CU link, the train $d$ at the transmitter of the D2D link causes interference to the base station $c$. Similar to the leader's utility function, the follower's utility function is defined as Eq. 6, where $g_c \triangleq |h_c|^2$ and $g_{d,c} \triangleq |f_{d,c}|^2$ respectively represent the channel gain of the link between the CU train and the base station, and the channel gain between the train at the transmitter of the D2D link and the base station. The optimization of the leader's utility function can be summarized as adjusting $\alpha$ to maximize $u_d(\alpha, p_d)$, as shown in formula 7.

$$\max_{\alpha} \; u_d(\alpha, p_d) \qquad (7)$$

Optimize the Utility Function of the Leader D2D Link
In the Stackelberg game model, the follower chooses the optimal strategy based on the price $\alpha$ set by the leader. At this time, the leader benefit function $u_d(\alpha, p_d)$ is a differentiable function of the single variable $p_d$. Taking the derivative of $u_d(\alpha, p_d)$ with respect to $p_d$ and setting it equal to 0 gives the only stationary point $p_d^*$ (formula 9). The second derivative of $u_d(\alpha, p_d)$ is always less than 0, so this stationary point is a maximum.
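The stationary-point argument can be illustrated with a minimal sketch. Because the utility function and its derivative are not reproduced in this text, the sketch assumes a simplified utility for the D2D link, throughput minus a fee proportional to the interference it causes, $u(p_d) = \log_2\!\left(1 + \frac{p_d g_{dd'}}{p_u g_{u,d'} + N_0}\right) - \alpha g_{d,c} p_d$, which omits the priority and speed weights. Under that assumption the stationary point is $p_d^* = \frac{1}{\ln 2 \cdot \alpha g_{d,c}} - \frac{p_u g_{u,d'} + N_0}{g_{dd'}}$, clipped to $[p_{\min}, p_{\max}]$. All function names are hypothetical.

```python
import math

def d2d_utility(p_d, alpha, g_dd, g_ud, g_dc, p_u, noise):
    """Assumed simplified utility: D2D throughput minus the fee paid for interference."""
    sinr = p_d * g_dd / (p_u * g_ud + noise)
    return math.log2(1.0 + sinr) - alpha * g_dc * p_d

def best_response_power(alpha, g_dd, g_ud, g_dc, p_u, noise, p_min, p_max):
    """Power that maximizes the assumed utility for a given price alpha.

    The utility is concave in p_d, so the maximizer over [p_min, p_max] is the
    stationary point clipped to the interval.
    """
    stationary = 1.0 / (math.log(2.0) * alpha * g_dc) - (p_u * g_ud + noise) / g_dd
    return min(max(stationary, p_min), p_max)

if __name__ == "__main__":
    # Normalized, hypothetical channel values for illustration only.
    params = dict(g_dd=1.0, g_ud=0.1, g_dc=0.5, p_u=1.0, noise=0.1)
    for alpha in (0.01, 1.0, 100.0):
        p = best_response_power(alpha, p_min=0.1, p_max=5.0, **params)
        print(f"alpha={alpha}: p_d={p:.3f}")
```

A very low price drives the response to $p_{\max}$, a very high price to $p_{\min}$, and intermediate prices give an interior stationary point, which corresponds to the three cases discussed for the power strategy.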

Optimizing Utility Function of the Follower CU Link
From formula 9, whether $p_d^*$ belongs to the interval $[p_{\min}, p_{\max}]$ depends on the price $\alpha$ determined by the leader. When $\alpha$ is too large, $p_d^* \le p_{\min}$, and the follower's optimal strategy is $p_d = p_{\min}$. It can be obtained from the leader's utility function (4) that if $p_d$ is too small, the income will decrease. When $\alpha$ is too small, $p_d^* \ge p_{\max}$, and the optimal strategy for the follower is $p_d = p_{\max}$, which will increase the interference of the leader and reduce its profit. Therefore, when the leader makes a decision, it hopes that $p_d^* \in [p_{\min}, p_{\max}]$. According to formula (10), $p_d^*$ and $\alpha$ are negatively correlated, so $\alpha$ must lie between a lower bound $\alpha_{\min}$ and an upper bound $\alpha_{\max}$.
Substituting the leader's best strategy (formula 9) into the follower's utility function (6) and observing the structure of the result, let

$$A = \frac{1}{\ln 2}, \quad B = p_c g_c, \quad C = \frac{(p_c g_{u,d'} + N_0)\, g_{d,c}}{g_{dd'}}, \quad D = \frac{1}{\beta v} + \gamma p_u g_c,$$

so that the follower's utility function can be rewritten in terms of $A$, $B$, $C$, $D$ and $\alpha$ for subsequent analysis. Taking the derivative with respect to the price $\alpha$, setting it equal to 0, and analyzing the coefficient of the quadratic term gives a quadratic equation in the single variable $\alpha$, whose solution is denoted $\alpha^*$. According to the signs of $N_0 - C$ and $B - C + N_0$, the best strategy of the follower is obtained at $\alpha = \alpha^*$, $\alpha = \alpha_{\min}$ or $\alpha = \alpha_{\max}$, respectively. From this, it can be concluded that the follower's optimal strategy is $\alpha \in \{\alpha^*, \alpha_{\min}, \alpha_{\max}\}$.
So far, the analysis of radio resource management for the leaders (train-to-wayside communication) and the followers (train-to-train communication) based on the Stackelberg game model is completed under the coexistence of train-to-wayside communication and train-to-train communication modes in urban rail transit.
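The backward-induction structure of this analysis can be sketched in code. The sketch does not use the paper's closed-form solution; it substitutes a plain grid search over the price and relies on simplified, assumed utilities without the priority and speed weights: the power-setting link maximizes throughput minus the fee $\alpha g_{d,c} p_d$, and the price-setting link maximizes its own throughput plus that fee. All function names and the sample values are hypothetical.

```python
import math

LN2 = math.log(2.0)

def best_response_power(alpha, g_dd, g_ud, g_dc, p_u, noise, p_min, p_max):
    """Clipped stationary point of the assumed power-setter utility."""
    stationary = 1.0 / (LN2 * alpha * g_dc) - (p_u * g_ud + noise) / g_dd
    return min(max(stationary, p_min), p_max)

def price_bounds(g_dd, g_ud, g_dc, p_u, noise, p_min, p_max):
    """Prices at which the stationary point equals p_max (lower) and p_min (upper)."""
    offset = (p_u * g_ud + noise) / g_dd
    alpha_min = 1.0 / (LN2 * g_dc * (p_max + offset))
    alpha_max = 1.0 / (LN2 * g_dc * (p_min + offset))
    return alpha_min, alpha_max

def price_setter_utility(alpha, p_d, g_c, g_dc, p_u, noise):
    """Assumed utility: own throughput plus the fee collected for interference."""
    throughput = math.log2(1.0 + p_u * g_c / (p_d * g_dc + noise))
    return throughput + alpha * g_dc * p_d

def stackelberg_price(g_dd, g_ud, g_dc, g_c, p_u, noise, p_min, p_max, steps=2000):
    """Grid search over [alpha_min, alpha_max], anticipating the power response."""
    a_lo, a_hi = price_bounds(g_dd, g_ud, g_dc, p_u, noise, p_min, p_max)
    best = None
    for i in range(steps + 1):
        alpha = a_lo + (a_hi - a_lo) * i / steps
        p_d = best_response_power(alpha, g_dd, g_ud, g_dc, p_u, noise, p_min, p_max)
        u = price_setter_utility(alpha, p_d, g_c, g_dc, p_u, noise)
        if best is None or u > best[2]:
            best = (alpha, p_d, u)
    return best  # (price, responding power, price-setter utility)

if __name__ == "__main__":
    # Normalized, hypothetical channel values for illustration only.
    print(stackelberg_price(g_dd=1.0, g_ud=0.1, g_dc=0.5, g_c=1.0,
                            p_u=1.0, noise=0.1, p_min=0.1, p_max=5.0))
```

Restricting the search to $[\alpha_{\min}, \alpha_{\max}]$ mirrors the observation that the chosen price should keep the responding power inside $[p_{\min}, p_{\max}]$; the grid search may return either an interior price or one of the two bounds.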

Simulation Model and Parameters
The scenario including train-to-train communication is shown in Fig. 3, in which trains meet the minimum tracking interval requirements, and CU links are established with the base station. In addition, the train group that meets certain conditions will establish a D2D communication link according to the train-to-train communication process assisted by the base station. The simulation parameters including train-to-train communication are shown in Table 2.

Change of Train-to-Train Communication Distance
(1) Influence on the Strategies of Both Parties in the Game
Figure 4 describes the impact of train-to-train communication distance on the CU link strategy. As the train-to-train communication distance increases, the best price strategy for the CU link decreases continuously. The reason is that, with the transmission power of the CU link train and the wireless channel gain unchanged, the revenue that the CU link obtains from the D2D link decreases as the channel quality of the D2D link decreases. Therefore, in order to improve the reliability of the CBTC service and encourage D2D link communication, the price charged by the CU link also decreases continuously.
Figure 5 describes the influence of train-to-train communication distance on the D2D link power strategy. As a follower, the D2D link provides the best transmit power strategy based on the price charged by the CU link. It can be seen from Fig. 5 that the optimal transmission strategy of the D2D link increases with the train-to-train communication distance and can be divided into three segments. When the train-to-train communication distance is too short, the price charged by the CU link is very high because of excessive interference to the CU link. In order not to lose more on the D2D link, the best transmission strategy for the transmitter is $p_{\min}$. When the train-to-train communication distance is moderate, and the price charged by the CU link makes the stationary point of the D2D link utility function fall within $[p_{\min}, p_{\max}]$, the optimal transmission strategy of the transmitter is $p_d^*$, and the CU link and the D2D link reach the game equilibrium. When the distance between trains is too large, the CU link charges a low price because of the large loss on the D2D link, and the best transmission strategy of the transmitter, in order to improve its throughput, is $p_{\max}$.
(2) Changes in throughput
In order to verify that train-to-train communication can improve the reliability of the CBTC service, this paper simulates the changes in the throughput of CBTC and the throughput of the base station side of the CU link after the introduction of train-to-train communication, and compares them with traditional train-to-wayside communication, as shown in Fig. 6.
It can be seen from Fig. 6 that after the introduction of train-to-train communication, the throughput of the train CBTC service is significantly improved, from 0.1 Mbps in traditional train-to-wayside communication to about 0.8 Mbps. This is because the D2D link improves the wireless signal quality of the train at the receiver, thereby increasing the throughput. In addition, the throughput of the train CBTC service with train-to-train communication first declines, then rises, and then declines again as the train-to-train communication distance increases. The initial decline occurs because, although the optimal power strategy of the D2D link continues to rise, the increase cannot offset the path loss caused by the increase in the train-to-train communication distance. When the train-to-train communication distance increases to 0.4 km, the charge from the CU link to the D2D link is 0. Figure 5 shows that the optimal power strategy of the train increases sharply at this point, reaching $p_{\max}$, so the throughput of the CBTC service begins to rise. As the train-to-train communication distance continues to increase, the path loss continues to increase while the optimal train power strategy is limited by $p_{\max}$, so the throughput of the CBTC service begins to decline again. As for the throughput of the base station, because the D2D link multiplexes the uplink channel of the CU link, the base station is inevitably interfered, so its throughput on this uplink channel is lower than in the traditional train-to-wayside communication mode. However, the CU link only allows the D2D link to multiplex one sub-channel, and the number of uplink channels available to the train is large. Taking a 10 MHz bandwidth as an example, the number of physical layer radio resource blocks is 50. Therefore, on the whole, the throughput of the base station is basically unchanged.
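The closing argument about the base station throughput can be made concrete with a small calculation. The sketch assumes that every PRB carries the same throughput when it is not shared, and that the throughput of the shared PRB drops to a given fraction of that value; both the equal-throughput assumption and the sample fraction are illustrative. The PRB counts per bandwidth are the standard LTE values.

```python
# Number of PRBs per LTE channel bandwidth in MHz (standard LTE configuration).
LTE_PRBS = {1.4: 6, 3.0: 15, 5.0: 25, 10.0: 50, 15.0: 75, 20.0: 100}

def aggregate_throughput_ratio(bandwidth_mhz, shared_prbs=1, shared_prb_ratio=0.5):
    """Aggregate uplink throughput with sharing, relative to the case without sharing.

    shared_prb_ratio is the (hypothetical) fraction of its throughput that a
    shared PRB retains under D2D interference; all PRBs are assumed equal otherwise.
    """
    total = LTE_PRBS[bandwidth_mhz]
    unaffected = total - shared_prbs
    return (unaffected + shared_prbs * shared_prb_ratio) / total

if __name__ == "__main__":
    ratio = aggregate_throughput_ratio(10.0, shared_prbs=1, shared_prb_ratio=0.5)
    print(f"aggregate throughput retained: {ratio:.2%}")
```

Even if the shared PRB lost all of its throughput, the aggregate loss over 50 PRBs would be bounded by 1/50 under these assumptions, which is consistent with the base station throughput being basically unchanged.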

Change of Train-to-Wayside Communication Distance
(1) Influence on the Strategies of Both Parties in the Game
As the train-to-wayside communication distance increases, the uplink signal received at the base station side decreases continuously. When the quality of the wireless channel of the D2D link and the signal of the transmitter remain unchanged, the interference from the D2D link received at the base station side remains unchanged, so the CU link revenue decreases. The CU link hopes to reduce the interference of the D2D link by charging a higher price, which is reflected in the optimal price strategy increasing with the train-to-wayside communication distance, as shown in Fig. 7.
For the train at the receiver of the D2D link, although the interference from the uplink signal of the CU link train is decreasing, the CU link is the leader and its charging price is continuously increasing, so the D2D link gradually reduces the transmitting power of its transmitter in order to pay as little as possible, as shown in Fig. 8.
Fig. 8 The optimal transmitted power strategy of D2D-link (vertical axis: optimal power strategy, dBm)
(2) Changes in throughput
This section simulates the influence of the train-to-wayside communication distance on throughput before and after the introduction of train-to-train communication, and the results are shown in Fig. 9. As can be seen from Fig. 9, after the introduction of train-to-train communication, the throughput of the train CBTC service at the receiver of the D2D link is significantly improved, and it increases with the train-to-wayside communication distance. This is because, although the power of the transmitter of the D2D link continues to decrease, the interference of the CU link train to the train at the receiver of the D2D link decreases faster. On the other hand, the throughput of the base station side decreases with the train-to-wayside communication distance, because the path loss increases with the distance when the transmission power of the CU link train remains unchanged. At the same time, after the introduction of the train-to-train communication mode, the base station suffers interference on the shared channel, and its throughput is lower than that of the base station under the traditional train-to-wayside communication mode.

Conclusion
The radio resource management of urban rail transit communication systems under the coexistence of train-to-train communication and train-to-wayside communication modes is studied in this paper, and a resource scheduling scheme based on the Stackelberg game model is proposed and verified. The simulation and analysis results show that the proposed algorithm can improve the reliability of the train CBTC service with little impact on the train-to-wayside communication, and provide a reference for the radio resource management of next-generation train control systems.
Fig. 9 The comparison of throughput before and after introducing train-to-train communication (vertical axis: throughput, Mbps; curves: base station throughput and train CBTC throughput, with and without D2D)
Urban Rail Transit (2021) 7(2):128-138