1 Introduction

Indian Railways (IR) operates several categories of trains that vary widely in technology, speed, braking characteristics, etc.; this is termed mixed traffic. The present static block signalling (SBS) [1] used in IR has several limitations with respect to line capacity utilization and safe train operations, which can be improved through a communication-based train control (CBTC) system [2] with moving block signalling (MBS) [3]. It is well known that the train control network is time critical (i.e. it requires real-time response) and mission critical (i.e. it requires reliability). Any operational or equipment fault may have serious consequences; therefore, safety deserves particular attention [4]. Countries where CBTC is in use have no such wide variation in train categories, which simplifies the operational issues in implementing CBTC. Hence, to study mixed train traffic operation under MBS with the support of a multi-agent-based CBTC system, it is necessary to propose a better train control system for IR that overcomes the limitations of the present system. This involves developing the architectural design of a multi-agent-based CBTC system, followed by modelling, validation and verification through a formal approach to check the efficacy and correctness of the designed system.

The complex, distributed, dynamic and highly interactive properties of the CBTC system make the multi-agent-based computing technique well suited to its development [5]. A multi-agent-based computing system has several advantages over a traditional computing system: its distribution property naturally decomposes the system into multiple agents, and its interactive property allows these agents to interact with each other to achieve a desired global goal. The technique is particularly suitable for the design and analysis of systems that are divided into several geographically distributed sub-systems in a dynamic environment, where these sub-systems need to interact with each other flexibly [6]. The operating scenarios of a railway transportation system are similar to this scenario, which makes multi-agent technology a natural fit [7–12].

Traditionally, the development of multi-agent systems involves logical structure design and its implementation. Methods used for the evaluation of behavioural properties of multi-agent systems are based on actual implementation and formal modelling. The former is based on agent implementation tools [13] such as Java agent development framework [14], Zeus agent building tool-kit [15], Jack intelligent agents framework [16], etc., whereas the latter is based on Vienna development method [17], Z notation [18], Calculus of communicating systems programming language [19], Language of temporal ordering specification [20], Temporal logic [21], Communicating sequential processes formal language [22], Petri net [23], Coloured petri net (CPN) [24], etc. Formal modelling has several advantages over actual implementation to check the correctness and behaviour of the design through development of various test cases. CPN is a graphical and executable formal modelling technique well suited for model building and behavioural analysis of distributed and concurrent multi-agent systems [25].

Our earlier work [26] presented the structural design of a multi-agent CBTC system for IR, named the Indian railways management system (IRMS). It gave the design details of the sub-goal moving authority given for the block section, based on the methodology for engineering system of software agents (MESSAGE) [27]. Verification and validation of that design through a formal modelling approach are taken up in this work. The rest of the paper is organized as follows. Section 2 covers the related work and Sect. 3 presents an overview of the proposed system. Modelling of the system is presented in Sect. 4, and validation and verification in Sect. 5. Finally, Sect. 6 concludes the paper.

2 Review on CPN modelling

Various research issues related to railway systems have been dealt with through mathematical modelling, simulations, multi-agents, soft computing techniques, etc. The rolling stock characteristics, infrastructure and operational rules of IR [28] make the system distinct from railway setups used elsewhere in the world. Taking a cue from related research, this project required a detailed study of the existing IR train control system and proposes a multi-agent CBTC system for IR with minimal modifications to the present setup. To the best of our knowledge, our work is the first to explore the issues specific to IR that might be important to consider before moving IR to CBTC on MBS-based infrastructure. In the following, we discuss the use of CPN for the modelling of a few distributed and concurrent systems.

2.1 CPN modelling of multi-agent systems

An agent conversation protocol [29] has been implemented in CPN due to its simplicity and graphical representation along with great support for concurrency. Further, an idea about the implementation of CPN models in a real multi-agent framework is given. It used CPN to investigate the working of the proposed agent conversation protocol before its actual implementation on a real multi-agent framework.

An FIPA (foundation for intelligent physical agents) [30] compliant agent platform called concurrent agent platform architecture is developed in Petri net to provide facility of inter-platform communications in multi-agent nets (Mulan) architecture. It used Petri net to maintain the high degree of concurrency of the multi-agent system.

An urban traffic information system that solves a path searching problem is designed as a case study in [31] to illustrate the use of CPN for designing multi-agent systems. The authors described the multi-agent system as a specialization of distributed object oriented systems and demonstrated the efficacy of CPN for modelling such systems.

A hierarchical CPN-based multi-agent system is presented in [32]. Each agent is modelled as a separate net that is connected with other agents’ nets and forms a hierarchical net. The behaviour of the multi-agent system is analysed through the dynamic properties such as reachability, deadlock detection and avoidance, fairness etc.

In our earlier work [33], a multi-agent-based CBTC framework with MBS using the MESSAGE methodology was presented for IR, and a simplified model of sub-goal and moving authority given using CPN was built.

2.2 CPN modelling of railway domain problems

The advantages of CPN-based formal modelling to describe a complex system with critical requirements are presented in [34]. It considered railways as a case and followed a modular approach to explain different aspects of modelling a complex system through CPN.

The timed CPN is used to model and analyse both operating schedules and the infrastructure of a railway station in [35]. The paper explains the use of timed CPN to model and analyse the dynamic behaviour of large and complex systems. It also provided a new analysis technique that constructs a reduced reachability graph.

The modelling of interlocking tables using CPN is presented in [36]. The developed CPN models comprise signalling layout that represents the physical arrangement of signals according to SBS system and interlocking control that represents actions performed for interlocking according to interlocking tables. These basic models can be reused to model more complex and large interlocking systems. The verification of modelled interlocking tables in various scenarios is presented in [37].

A study of a high speed train positioning system on a railway line equipped with European rail traffic management system level-2 is presented in [38]. The paper proposed a CPN model of train movement, including the interaction between the train and the eurobalises installed on the track, that describes the various causes of eurobalise degradation. The authors focused on faults related to the balise and the balise transmission module antenna, while other types of faults that may arise during train operation are not discussed.

A vehicle-on-board automatic train protection sub-system of CBTC system is implemented in CPN [39]. The work emphasized more on how to refine the basic CPN-based model for further research in the area of vehicle-on-board automatic train protection. The paper exemplifies the restricted speed estimation for a running train with respect to the obstacle on the track ahead.

To the best of our understanding, none of the above works provided system level structural design details and their behavioural analysis. It becomes very important to supplement the behaviour study of the structural design part of the designed system. With such an objective, we split our work into two components: the structural design followed by formal modelling of the designed system for behavioural analysis. This report is an extension of our earlier work [26] on structural design and focuses only on the behavioural analysis of the IRMS safety critical system through CPN modelling.

3 Architectural overview of IRMS

In this section, the major components of IRMS and their interaction are reproduced from our previous work [26] to help the reader in getting an overview of the system.

The software agent-based CBTC rail track infrastructure is divided into areas or regions, each under the control of a zone controller (ZC) and each with its own radio trans-receiving system with reliable and continuous radio link. Figure 1 shows the high level architecture of IRMS which describes the important functional components and their corresponding agents. It consists of four principal components: ZC, station controller (SC), trackside device control system (TSDCS) and on-board device control system (OBDCS).

Fig. 1 Abstract level diagram of the IRMS [25]

OBDCS comprises a vehicle-on-board controller and several items of train borne equipment, such as a global system for mobile communications-railway trans-receiver antenna, RFID reader, speedometer, accelerator, braking unit, driver display screen, train integrity monitor, etc. The train borne equipment collects the relevant information, such as train number, speed, location identifier, direction, etc., and periodically sends it to the respective ZC. The location identifier is the location reference read by the train borne RFID reader from track side RFID tags.

Upon receipt of this information, the respective ZC computes a safe moving authority on the basis of the track status ahead and the train characteristics, and communicates it to the respective train. Further, ZC instructs SC to create the route by interlocking for the arrival or departure of trains in the station section. Train handover between two adjacent ZCs occurs when a train passes the overlapped ZC boundaries.

Each station section has a single SC responsible for interlocking (establishing or releasing a route) for the arrival or departure of trains. A route is created by fixing all switches present on the path in the required position. SC receives a route from ZC, i.e. a sequence of switches and their positions, and instructs TSDCS, one switch at a time, to fix each switch in the required position. On receipt of a positive acknowledgement from TSDCS for all required switches, SC gives a route establishment acknowledgement to ZC. If SC receives a negative acknowledgement for any required switch, it releases all switches and gives a negative route establishment acknowledgement to ZC.

Switches falling under a particular station section are controlled by the respective TSDCS. It is responsible to fix each individual switch in a required position. It receives a switch identifier and its position information from the SC. If the switch is fixed in the desired position, it transmits a positive acknowledgement to SC; otherwise negative acknowledgement. TSDCS also continuously monitors the health and status of all its switches, and reports to respective SC.
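The route establishment exchange between SC and TSDCS described above can be illustrated with a minimal Python sketch. It is not the CPN model itself: the function name `establish_route`, the callback names and the switch identifiers are hypothetical, and the sketch assumes that "releasing all switches" means releasing those already fixed for the route.

```python
from typing import Callable, List, Tuple

# (switch identifier, required position): a hypothetical encoding of a route
Switch = Tuple[str, str]


def establish_route(route: List[Switch],
                    fix_switch: Callable[[str, str], bool],
                    release_switch: Callable[[str], None]) -> bool:
    """SC-side sketch: ask TSDCS to fix each switch in turn, roll back on a NACK.

    fix_switch stands in for the TSDCS request and returns True for a positive
    acknowledgement, False for a negative one.
    """
    fixed: List[str] = []
    for switch_id, position in route:
        if fix_switch(switch_id, position):
            fixed.append(switch_id)
        else:
            # Negative acknowledgement: release the switches fixed so far
            for done in reversed(fixed):
                release_switch(done)
            return False  # negative route establishment acknowledgement to ZC
    return True  # positive route establishment acknowledgement to ZC


# Usage with a stand-in TSDCS in which switch "SW3" cannot be fixed (assumed)
released: List[str] = []
route = [("SW1", "normal"), ("SW2", "reverse"), ("SW3", "normal")]
route_ok = establish_route(route, lambda sid, pos: sid != "SW3", released.append)
```

In this run `route_ok` is `False` and the two switches fixed before the failure are released in reverse order.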

To check the correctness and behaviour of the design, various operational scenarios in the block section (such as RFID fault, train equipment fault, train partition fault, track fault and communication fault) are considered. The system has several agents and sub-agents, which are distributed geographically and work cooperatively to accomplish the sub-goals. The sub-goal moving authority given is the key component for ensuring safety during the various faults that occur at runtime. This sub-goal is accomplished by the moving authority provider agent (MAPA) with support from other agents. Figures 2 and 3 describe the various sub-agents of MAPA and their interaction with other agents, respectively, to provide the moving authority. The circles labelled C1 and C2 in subfigures (a) and (b) of Fig. 3 connect the two subfigures.

Fig. 2 MAPA’s sub-agents [25]

Fig. 3 Workflow of sub-goal moving authority given [25]

4 CPN modelling

This section presents the modelling aspects of IRMS, followed by a detailed discussion of MAPA and its sub-agents.

4.1 Modelling of IRMS

The CPN model is hierarchically structured into 36 subnets or pages. Each agent functionality is modelled on a separate page with page name same as the agent’s. Page IRMS (see Fig. 4) is the top page and provides an abstract view of a block section and the associated track-cum-communication infrastructure of the system. Table 1 describes the model and simulation parameters, and Table 2 characterizes the mixed traffic. Page IRMS has the following four types of substitute transitions. The place connecting two substitute transitions represents a communication link between them.

Fig. 4 Top-page IRMS

Table 1 Model and simulation parameters
Table 2 Train characteristics of mixed traffic
  • Initialize Trains: This transition models the mixed traffic scenario. Mixed traffic is a set of trains with different physical characteristics, such as train type, length, maximum speed and braking capacity, and with varying running schedules. Specific tokens represent trains and their values represent the trains’ static and dynamic characteristics.

  • Cell: Each block section is divided into a number of cells. This type of transition implements the functions necessary for the OBDCS installed on trains and for the track infrastructure.

  • Access Point: This substitute transition models the functionalities of access points, which facilitate the communication between OBDCS and ZC.

  • Zone Controller: ZC encompasses several functionalities (see Fig. 1). However, the scope of the paper limits the discussion to two important functionalities within ZC: first, the MAPA with a variety of fault scenarios and resolution mechanisms; and second, the functionalities of ZCSA that are essential for achieving the sub-goal moving authority given.

The mixed traffic generated by page Initialize Trains runs within the pages Cell1 to Cell10 according to the assigned schedule. Each page of type Cell models a segment of track equipped with RFID tags, together with the functionality of the train’s OBDCA. Train location information, along with other necessary information, is transmitted from the respective page of type Cell to the Zone Controller page via the corresponding page of type Access Point.

The page Zone Controller comprises two substitute transitions ZCSA and MAPA. Page ZCSA models the functionality of agent ZCSA responsible for communication between pages Access Point and MAPA. Page ZCSA maintains a list of trains running under its control and models the handover process of trains across ZCs. The operations performed by page MAPA are described in the next sub-section. Page ZCSA receives a moving authority for the train from page MAPA and forwards it to the corresponding page of type Cell, via a connected page of type Access Point.

The page of type Cell forwards the received moving authority to the corresponding train’s OBDCA. The train’s OBDCA updates its information and follows the moving authority. When it sends the train’s location and other essential information to the Zone Controller page, OBDCA sets a timeout for receiving the corresponding moving authority. If the response is not received within the timeout, a communication failure is assumed and the train’s OBDCA activates the emergency braking mechanism. The train then decelerates according to its emergency braking characteristics. Page MAPA takes the last saved location information as the train’s current location. Moving authority for trailing trains is computed from this information so that they can adjust their speed. Normal operations resume once the communication link is regained.
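The timeout rule applied by OBDCA can be sketched in a few lines of Python. This is only an illustration of the decision described above; the function name, the action labels and the numeric timeout are assumptions and do not come from the model.

```python
from typing import Optional


def obdca_action(sent_at: float,
                 now: float,
                 timeout: float,
                 authority_received_at: Optional[float]) -> str:
    """Decide what the on-board agent does after reporting its location.

    sent_at: time the location report was sent to the ZC
    authority_received_at: arrival time of the moving authority, or None
    """
    if authority_received_at is not None and authority_received_at - sent_at <= timeout:
        return "follow_moving_authority"
    if now - sent_at > timeout:
        # No timely response: a communication failure is assumed
        return "emergency_brake"
    return "wait"


# Usage with an assumed timeout of 5 s
in_time = obdca_action(sent_at=100.0, now=102.0, timeout=5.0, authority_received_at=101.5)
pending = obdca_action(sent_at=100.0, now=103.0, timeout=5.0, authority_received_at=None)
expired = obdca_action(sent_at=100.0, now=106.0, timeout=5.0, authority_received_at=None)
```

The three calls cover a timely response, a response still pending inside the timeout window, and an expired timeout that triggers emergency braking.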

4.2 MAPA model

Functional details and the modelling aspects of agent MAPA and its sub-agents are described. The model implements file handling procedures to store and process information related to the train and track. The page MAPA contains the following eleven substitute transitions, each modelling the functionality of its sub-agents.

  • Page ITPIA implements the initialization of trains’ position, length and direction at ZC.

  • Page RFMA models the mechanism to handle RFID faults resulting from either tag or reader malfunctioning. Flag value 1 indicates fault when two successive RFID locations are not recorded. The flag value is further used by page ERSCA for taking decision on applying emergency brake.

  • A train’s consistency information (handled by page TTFMA) defines the overall health status of its equipment, and the flag value is set to 1 when the status is healthy. The flag status is used by page ERSCA to decide whether to apply the emergency brake.

  • Train partition is a fault scenario in which the coaches of a running train decouple, resulting in multiple parts of the same train. Page TPFMA handles these faults by taking the current location and train information as input. The train partition fault flag is set to 1 when a partition occurs. The page records the last RFID tag number corresponding to the parted train’s rear end, which is used to estimate the trailing train’s moving authority.

  • Page ERSCA uses the flag values pertaining to the above faults for applying the emergency brake if required for the affected train according to its braking characteristics.

  • Track fault is a fault scenario where a train may be allowed to run at some restricted speed or not allowed at all. Page TFMA operates on the file containing track health information and their corresponding allowable speed of the trains. This information is used by page ORSCA for safe moving authority calculation. The track health file is continuously updated from the track fault detection devices.

  • Collision-free train running depends much on maintaining a safe distance from its successor trains. This necessitates recording of successor trains’ rear end location for the calculation of safe moving authority for the trailing train. Page STMA implements this functionality.

  • The moving authority for a train is affected by successor trains and by track faults ahead, which are together referred to as obstacles. For some kinds of track faults, the train is allowed to proceed but at a restricted speed. Page ORSCA takes the successor trains’ (including parted trains’) information and the fault data of the track ahead, calculates the allowable speed corresponding to each obstacle and takes the minimum of all. This value is used by page SBDCA for the final allowable speed calculation.

  • Turnouts on rail track have speed restrictions and the running trains must follow these restrictions. Page TRSCA implements and ensures the prescribed speed restrictions. It takes the specifications of the turnout ahead (like its location and allowed speed) and train information to calculate the allowable speed.

  • Safe braking distance is an important factor for collision-free train operation. To ensure this, the system must alert the running train sufficiently ahead in time so that the train can be stopped according to its braking characteristics. Page SBDCA takes the obstacle restricted speed from page ORSCA, emergency brake restricted speed from page ERSCA and turnout restricted speed from page TRSCA as input to estimate the final allowable speed. The safe braking distance is computed based on the final allowable speed and train’s braking characteristics.

  • When a fault is detected, a train’s normal running is affected either by completely stopping the train or allowing running at a restricted speed until normalcy is restored. Page TRMA models detection of fault and waits for a random period. This random period models the fault recovery time. Once restored, the system initiates the process of computing the moving authority.
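The way pages ORSCA and SBDCA combine speed restrictions can be illustrated with a short Python sketch. The minimum over obstacles follows the description of ORSCA above. Taking the minimum of the three restricted speeds in SBDCA, and the constant-deceleration formula d = v²/(2b) for the braking distance, are simplifying assumptions made here for illustration; the function names and numeric values are hypothetical.

```python
from typing import Iterable


def obstacle_restricted_speed(obstacle_speeds: Iterable[float], line_speed: float) -> float:
    """ORSCA-style step: minimum allowable speed over all obstacles ahead (m/s)."""
    return min([line_speed, *obstacle_speeds])


def final_allowable_speed(obstacle_v: float, emergency_v: float, turnout_v: float) -> float:
    """SBDCA-style step, assuming the most restrictive input wins (m/s)."""
    return min(obstacle_v, emergency_v, turnout_v)


def safe_braking_distance(speed: float, deceleration: float, margin: float = 0.0) -> float:
    """Distance (m) to stop from `speed`, assuming constant deceleration (m/s^2)."""
    if deceleration <= 0:
        raise ValueError("deceleration must be positive")
    return speed ** 2 / (2 * deceleration) + margin


# Usage with assumed values: a 10.0 m/s track restriction and a slower train ahead
v_obstacle = obstacle_restricted_speed([10.0, 25.0], line_speed=31.53)
v_final = final_allowable_speed(v_obstacle, emergency_v=31.53, turnout_v=15.0)
distance = safe_braking_distance(v_final, deceleration=0.5)
```

With these inputs the track restriction dominates, so the final allowable speed is 10.0 m/s and the stopping distance at an assumed deceleration of 0.5 m/s² is 100 m.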

5 Validation and verification

Validation allows checking whether or not the developed model meets the expected behaviour and verification allows identifying errors and anomalous properties in the developed model. CPN provides a simulation facility for the validation and a state space graph for the verification of the developed models.

5.1 Validation

A simulation study under mixed traffic condition is done to understand the behavioural aspects of the model to achieve sub-goal moving authority given. The prime objective is to provide a moving authority for the trains to ensure collision-free movement and to study the proactive behaviour of IRMS to fault scenarios. The CPN model has been tested for its validity through simulation report analysis. The simulation report is used to extract relevant data to create graphical representations of simulation results.

Aggregated time series data pertaining to various faults are not maintained by any department in IR. Faults are mostly rectified by the local technical field staff, and in practice there is no mechanism for recording such data for future analysis. Thus, obtaining fault related data and fitting it to a model was not possible.

To evaluate the working of the model in an actual traffic scenario, we conducted several experiments for a group of trains with different characteristics representing the mixed traffic. The system deals with two kinds of parameters associated with the track. The first kind is operator defined, static and deterministic, such as the maximum allowable speed on the track and the turnout restricted speed. The second kind arises in real time and is dynamic and random, e.g. speed restrictions due to track faults, train equipment faults or any kind of obstacle on the track. Both kinds of parameters affect the normal running of trains. Because the static parameters are deterministic, the system has prior knowledge of them and accounts for such restrictions while estimating the moving authority for the trains. The dynamic parameters, on the other hand, become known only at runtime, which makes estimating the moving authority challenging. In our work, both scenarios have been considered to test the overall responsiveness of the proposed system. The results obtained are presented in two parts: first, the relationship between speed and location, representing the safe moving authority provided to the trains for their safe running when a fault is identified at runtime; and second, the relationship between time and location, representing the collision-free running of the trains in the same scenario. For the purpose of describing the behavioural analysis, we considered only one instance from several simulation experiments for each fault type.

5.1.1 RFID fault

Failure of an RFID tag or RFID reader results in failure to record the location information, and in such a scenario the train(s) must be stopped for safety reasons. A tag fault is detected by page RFMA at RFID tag 2,501 for train 1120 (Fig. 5) running at a speed of 31.53 m/s. Page ERSCA estimates the speed according to the emergency braking characteristics of the affected train. The affected train and the following trains adjust their speeds according to the moving authority received from MAPA (shown by the decreasing speed curves). The increasing speed curves (starting at tag 2,531) indicate that the train has encountered healthy RFID tags, and it eventually regains its normal speed at tag 2,591. The speed curves of all trains gradually drop to zero, in the sequence of their dispatch, prior to the end of the block section at RFID tag 5,000. The non-intersecting time curves in Fig. 6 represent the collision-free running of trains during the handling of RFID faults. The time curves of trains between locations 2,501 and 2,530 show a slight jump, indicating the slow speed of trains in the fault section. Finally, the upward movement of the time curves at the end of the block section (see Fig. 6) indicates that the trains have stopped.
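The flag rule used by page RFMA (flag value 1 when two successive RFID locations are not recorded) can be sketched as follows. This is an illustrative reading of that rule, not the CPN inscription; the function name and the representation of expected and recorded tags are assumptions.

```python
from typing import Sequence, Set


def rfid_fault_flag(expected_tags: Sequence[int], recorded_tags: Set[int]) -> int:
    """Return 1 if two successive expected tags were not recorded, else 0."""
    missed_in_a_row = 0
    for tag in expected_tags:
        if tag in recorded_tags:
            missed_in_a_row = 0
        else:
            missed_in_a_row += 1
            if missed_in_a_row >= 2:
                return 1
    return 0


# Usage: tags 2,501 and 2,502 are both missed, so the fault flag is raised
expected = list(range(2499, 2504))
flag_fault = rfid_fault_flag(expected, {2499, 2500, 2503})
# A single missed tag does not raise the flag under this rule
flag_single = rfid_fault_flag(expected, {2499, 2500, 2502, 2503})
```

Requiring two successive misses before raising the flag keeps a single unread tag from stopping the train.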

Fig. 5 Speed versus location (RFID tag fault)

Fig. 6 Time versus location (RFID tag fault)

5.1.2 Train equipment fault

Page OBDCA periodically sends a consistency report to ZC indicating the health status of the train borne equipment. Pages TTFMA and ERSCA of MAPA process the consistency report and issue an emergency brake command if a fault is detected. An equipment fault in train 1118, detected at location 2,750 (Fig. 7), causes page MAPA to command the application of the emergency brakes. The train decelerates from its present speed of 29.5 m/s and finally stops at tag 2,820. Trailing trains decelerate according to the moving authority received so as to maintain a safe braking distance from leading trains. The vertical segment of the time curve of train 1118 at location 2,820 (Fig. 8) represents the fault rectification time (528.45 s). The time curves of the trailing trains show that all of them are stopped because of the fault on the leading train. The increasing speed curves (starting at tag 2,820) indicate that the fault has been rectified, and eventually all trains regain their normal speed.
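As a rough consistency check on these figures, the deceleration implied by the reported stop can be estimated. The sketch below assumes constant deceleration and a uniform tag spacing of 10 m, inferred from the 50 km track carrying 5,000 RFID tags mentioned in Sect. 5.2; neither assumption is stated for the model, so the result is indicative only.

```python
# Assumed uniform spacing: 50 km of track with 5,000 tags gives 10 m per tag
TAG_SPACING_M = 50_000 / 5_000


def implied_deceleration(v_start: float, v_end: float,
                         start_tag: int, end_tag: int,
                         tag_spacing_m: float = TAG_SPACING_M) -> float:
    """Constant deceleration (m/s^2) that takes v_start to v_end over the tag range."""
    distance = (end_tag - start_tag) * tag_spacing_m
    if distance <= 0:
        raise ValueError("end_tag must lie beyond start_tag")
    return (v_start ** 2 - v_end ** 2) / (2 * distance)


# Train 1118: 29.5 m/s at location 2,750, stopped at tag 2,820
decel = implied_deceleration(29.5, 0.0, 2750, 2820)
```

Under these assumptions the train covers about 700 m while stopping, which corresponds to a deceleration of roughly 0.62 m/s².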

Fig. 7 Speed versus location (Train equipment fault)

Fig. 8 Time versus location (Train equipment fault)

5.1.3 Train partition fault

When a partition fault is detected, the trailing trains are not allowed to enter the area between the location where the train partition was detected and the train’s final halt point until the parted coaches are coupled. The response of the system to partition faults is shown in Fig. 9. Pages TPFMA and ERSCA of MAPA detect the partition of train 1118 at location 3,001 and instruct it to apply the emergency brake. Upon receiving the moving authority, the train starts decelerating from its present speed of 29.5 m/s and finally stops at location 3,071. The parted rear-end coaches may be anywhere between RFID tags 3,001 and 3,071. Trailing trains receive a moving authority to stop prior to location 3,001. The vertical segment of the time curve of train 1118 at location 3,071 (Fig. 10) represents the fault rectification time (654.76 s). The results show that the system is able to move trains safely even in train partition fault situations.

Fig. 9 Speed versus location (Train partition fault)

Fig. 10 Time versus location (Train partition fault)

5.1.4 Track fault (restricted speed 10.0 m/s)

Trains are allowed to run at a restricted speed when a portion of the track is not healthy enough to support trains running at normal speed. Pages TFMA and ORSCA of MAPA deal with track faults and control the speeds of trains according to the applicable speed restrictions. Figure 11 depicts a restricted speed region between locations 3,501 and 3,700 where the trains are required to run at a reduced speed of 10.0 m/s. The leading train 1120, running at 31.53 m/s, starts decelerating at location 3,374 and reaches the restricted speed at location 3,501. After the restricted speed region, trains accelerate to their normal speed while maintaining a safe braking distance between them. The time curves of the trains (Fig. 12) between locations 3,501 and 3,700 show an upward trend, indicating that the trains run at the restricted speed and take more time to cross the restricted region of the track section.

Fig. 11 Speed versus location (Track fault with 10.0 m/s restricted speed)

Fig. 12 Time versus location (Track fault with 10.0 m/s restricted speed)

5.1.5 Communication fault

Communication faults in safety critical systems can be attributed to message omissions, bit flips and late message arrivals. Scenarios arising from these faults affect the safe movement of trains. Bit flip faults can be handled with any message consistency check method and are not considered here. If the train does not receive a moving authority from ZC within a timeout, its OBDCA stops the train by applying the emergency brake. Figure 13 depicts a communication failure between all trains and ZC lasting 550.0 s (starting at simulation time 750.0 s). All trains decelerate as a result of emergency braking, as shown by the decreasing speed curves, and their corresponding stop locations indicate that the trains maintain a safe distance. Trains accelerate when communication is regained. The vertical segments of the time curves (Fig. 14) indicate that the trains have stopped due to the communication failure.

Fig. 13 Speed versus location (Communication fault)

Fig. 14 Time versus location (Communication fault)

5.2 Verification

The state space report obtained from the model is analysed to check the correctness of the model and to see whether the model satisfies the dynamic properties. Complex models with a large number of continuous variables generate a large state space graph and report that are difficult to analyse. To arrive at a state space graph and report that are convenient to analyse, we reduced the number of trains from four to two and the track length from 50 km (5,000 RFID tags) to 1 km (100 RFID tags). Despite this size reduction, the resulting state space graph was too large for visual analysis, whereas the state space report was of a more manageable size. Therefore, the standard state space report for the reduced model (Table 3) is considered for verification.

Table 3 State space report

The partial state space graph was generated with 41,782 nodes and 161,171 arcs in 300 s of simulation run. The number of strongly connected components (SCCs) in the graph equals the number of state space nodes, implying that the model does not have any SCC with more than one node, and hence no infinite occurrence sequences. In other words, the execution of the model terminates. The integer and multi-set bounds represent the maximum and minimum number of tokens, and their values, that each place may contain. Owing to the large number of places and tokens involved in the model, they are not discussed individually. The initial marking of the model is not a home marking, which means that the initial marking is not reachable from all reachable markings. The model has 2,628 dead markings, indicating that there are 2,628 different states in which the model can stop. The model contains 6 dead transitions, meaning that these transitions were not enabled during the simulation run. The absence of live transitions indicates that the model terminates. The fairness properties show how often the individual transitions occurred. As shown in the report, the model does not have any infinite occurrence sequences. The state space report gives a first rough idea of whether the model works as expected and presents useful information about its behaviour. A faulty model reflects errors in its state space report.
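The link between SCCs, dead markings and termination can be illustrated on a toy reachability graph. The Python sketch below is not the state space tool's implementation: the marking names and graph are hypothetical, and Tarjan's algorithm is used here simply as one standard way to compute SCCs. Self-loops are checked separately because a node with an arc to itself forms a single-node SCC yet still allows an infinite occurrence sequence.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # marking -> successor markings


def strongly_connected_components(graph: Graph) -> List[List[str]]:
    """Tarjan's algorithm (recursive, adequate for small illustrative graphs)."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    components: List[List[str]] = []

    def visit(node: str) -> None:
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for succ in graph[node]:
            if succ not in index:
                visit(succ)
                low[node] = min(low[node], low[succ])
            elif succ in on_stack:
                low[node] = min(low[node], index[succ])
        if low[node] == index[node]:
            component: List[str] = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return components


def dead_markings(graph: Graph) -> List[str]:
    """Markings with no enabled transition, i.e. nodes without outgoing arcs."""
    return [node for node, succs in graph.items() if not succs]


# Toy acyclic state space (hypothetical markings M0..M3)
toy: Graph = {"M0": ["M1", "M2"], "M1": ["M3"], "M2": ["M3"], "M3": []}
sccs = strongly_connected_components(toy)
has_self_loop = any(node in succs for node, succs in toy.items())
terminates = len(sccs) == len(toy) and not has_self_loop
dead = dead_markings(toy)
```

For this toy graph every SCC is a single node and there are no self-loops, so every occurrence sequence is finite and ends in the single dead marking.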

6 Conclusion

A CPN-based formal model of the sub-goal moving authority given for the block section is presented. The state space report verifies the correctness of the model in terms of the reachability, boundedness, liveness and home properties. Various fault scenarios have been modelled to analyse the behaviour of the overall model. The outcome of the behavioural analysis allowed us to justify the design correctness of the proposed CBTC system for IR. Testing our models against actual fault data collected over a period can help correlate the proposed models’ behaviour with the present simulation results. Further, such actual data can be statistically modelled to understand the pattern of fault occurrence, which may help IR to take preventive measures.