1 Introduction

Several market reports predict an increase in demand for public transport (PT) services during the coming years (Market Research Future 2021; Markets and Markets 2019). Some of the main reasons given are the growing population (and thereby growing demand) and the environmental effects of car travel, which pressure people to review their travel behaviour. These reasons will presumably remain, even though Covid 19 appears to have stopped the trend, at least temporarily. Moreover, road congestion is becoming an increasingly severe problem in some cities, which calls for alternative modes of travel (Moya-Gómez and García-Palomares 2017). Consequently, many governing bodies at both local and national level are working to move travellers from private cars to PT. PT is thereby required to become a more attractive travel alternative, which, together with growing demand, poses great challenges for the PT sector. At the same time, the complexity of transport systems and traffic behaviour increases as the number of alternative travel options expands and the transport systems evolve. Emerging technologies such as Artificial Intelligence (AI) open up new opportunities to handle these and other types of challenges (UITP 2020). These opportunities may also bring financial benefits. For instance, it has been estimated that AI will increase profitability within transportation and storage by 44% and within public services by 27% by the year 2035 (Purdy et al. 2017).

The term Artificial Intelligence was coined in 1955 and has been defined as “the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable.” (McCarthy 1998; McCarthy et al. 2006). However, over the years, many different definitions of this complex term have been suggested. Russell and Norvig (2010) identified four main categories of AI definitions: Thinking humanly, Acting humanly, Thinking rationally and Acting rationally. Since the research community is still far from a commonly agreed definition, we have chosen, in the literature review presented in this paper, to rely on how the authors of the selected publications use the term AI.

In recent years, the amount of research on how to apply AI to PT has increased rapidly. The aim of these studies is often to extend the general understanding of the PT system and its travellers, or to make PT more efficient, leading to, e.g., cost savings and a reduced environmental impact. The intended users of the AI technology differ depending on these aims, from governmental PT authorities, via planners and operators, to individual travellers. We argue that PT has a number of characteristics that make it suitable for applying AI, e.g., a vast amount of data is continuously being generated but is not used to its full potential, and the PT systems are affected by distributed decision making involving both travellers and several PT actors. Thereby, AI may make a difference in many situations by supporting both individuals and organizational actors. Moreover, the Internet of Things (IoT) is also an enabler in this area (Davidsson et al. 2016), which could spark many changes in the PT domain when combined with AI.

This paper aims to review research on applications of AI that can be used to improve PT, i.e., to make PT better in some way. This improvement can benefit, for instance, the travellers, the actors or the environment. However, studies in which PT is only used as a means to achieve other types of goals, such as when buses are used as probes for detecting road potholes (Sharma and Sharma 2019), are not included in the review.

Previous attempts have been made to review this topic, at least partially. For instance, Abduljabbar et al. (2019) reviewed previous research using AI within transport in general; however, little attention was paid to PT modes other than buses. Similarly, Koushik et al. (2020) conducted a review on activity-based travel behaviour studies that employ machine learning (ML) techniques, but with little focus on PT. Liyanage et al. (2019) made an environmental scan and analysis of the technological, social, and economic impacts surrounding flexible on-demand mobility, including some of the AI-based tools that have been applied within PT. Li et al. (2018b) conducted a literature review of the practice of using smart card data for estimating traveller destination, where some approaches are based on ML. Furthermore, Welch and Widita (2019) reviewed literature on sources of big data and big data applications, of which some were AI-based, applied to public transportation problems. Similarly, Ge et al. (2021) reviewed the current state of the art of public transport data sources, and summarized and analysed the potential and challenges of the main data sources. They also presented an information management framework to enhance the use of the data sources. Finally, several literature review studies focus on the benefits of using agent technology within PT (Bazzan and Klügl 2014; Chen and Cheng 2010). Although these reviews provide some useful insights into how AI can be utilized within PT, to the best of our knowledge, there is no previous review focusing on exploring the potential of using AI to improve PT. This gap in the literature, in combination with a significantly growing interest in AI within PT, suggests that the proposed review would represent both a timely and an important contribution.

The goal of this review is to contribute to the general question: What is the potential of AI to improve PT? To answer this question, we have identified the following more concrete research questions:

  1. What problems have been addressed and what are the intended benefits?

  2. To what extent have these benefits been realized?

  3. How has AI been used, and what are the requirements on data availability?

  4. What are the main challenges faced when using AI to improve PT?

The first two questions concern the intended benefits, e.g., cost savings or service quality improvement, and application areas, e.g., a particular transport mode and task. To better understand the potential of AI for PT, it is important to comprehend how AI actually contributes in the different studies and which requirements need to be met to achieve the benefits; in particular, what data is needed. These aspects are covered by the third question. Finally, to understand the potential, we also have to understand the challenges.

The result is expected to provide increased knowledge to different types of PT actors and public authorities about the potential of AI to improve PT both in general and in specific areas, including planning and operation improvement; for instance, in terms of efficiency, reliability and safety. Furthermore, the review provides scholars with an overview of the state of the art and an indication of current knowledge gaps. Notably, the review does not intend to determine which particular AI method is best suited to a particular application area in the PT domain, since that would only be possible if multiple methods were applied in very similar contexts. Moreover, the success of using, for instance, ML often depends more on aspects other than the chosen ML algorithm, such as parameter tuning, data preparation and data availability (see, e.g., Lavesson and Davidsson (2006)). Instead, this review intends to identify application areas with successful implementations and real usage in business, if such exist. Our approach to reaching the aim of this study is not to identify every single article in the area, but rather to base our analysis on a representative sample of the literature in the domain. The next section explains the methodology used for the literature review. In Sect. 3, the results are presented and analysed. Section 4 discusses the results and concludes the study.

2 Methodology

To investigate the potential of AI to improve PT, a literature review was carried out. The aim of this review was to identify a relevant and representative sample of research, and by characterizing the research performed, synthesize an indication of the potential. An indication of potential may be that a problem was addressed by using AI; and a stronger indication is, of course, if validated benefits are reported. Notably, we view potential from a broad perspective, including, for instance, what type of benefits can be achieved, and what type of AI-technologies and data sources appear to be useful.

The literature review was conducted in a systematic manner, following most of the guidelines provided by Kitchenham and Charters (2007). The main difference between these guidelines and our review is that another approach was used to refine the selection criteria (see step 4 below), and that we did not perform any study quality assessment. A study quality assessment primarily aims at providing more refined selection criteria, and it estimates the quality differences between the studies. As mentioned above, this review aims at identifying a representative sample of the current research in this area, irrespective of quality. However, the reviewed studies were classified according to benefit validation (e.g., conceptual solution or experiments based on real data) and whether the proposed solutions have been implemented in business, which gives an indication of the quality (Kitchenham and Charters 2007). Given that our research questions are to some extent on a general level, our review has resulted in a systematic mapping study. However, some research questions are on a more specific level and thus, the study also has elements of a systematic literature review (Kitchenham and Charters 2007). In short, the literature review included the following sequential steps:

  1. The research questions were specified (see Sect. 1).

  2. A preliminary review protocol was developed and agreed by all researchers in the research group.

  3. Potentially relevant research studies were identified based on the search strategy (see Sect. 2.1).

  4. The review protocol was revised based on a study of 20 of the research papers identified in step 3 (see Sect. 2.1).

  5. The primary research studies were identified based on the selection criteria (see Sect. 2.2).

  6. The data were extracted (see Appendices A and B).

  7. The data were analysed and synthesized (see Sect. 3).

The review protocol included the research questions, search strategy, selection criteria, selection procedure, data extraction strategy, and synthesis strategy. The data extraction strategy involved a classification framework that defined a number of categories to which the identified research studies were mapped.

The literature review was performed by a research group of four researchers. In step 4, 20 randomly selected studies were divided between the group members, amounting to 5 studies per researcher to read and to analyse. The results from this work were then discussed by the entire research group. Based on these discussions, the selection criteria and classification framework in the review protocol were revised. Section 2.1 presents the results of the agreed upon final review protocol. However, the classification framework is presented in Sect. 3, as it was further revised during the subsequent classification work with the rest of the studies, i.e., it was extended whenever a study could not be properly classified by the framework.

2.1 Review protocol

As mentioned above, the review protocol included the research questions, search strategy, selection criteria, selection procedure, data extraction strategy, and synthesis strategy. The search strategy was defined as follows:

  • Search databases: Scopus, IEEE Xplore, and Web of Science.

  • Publication year: no limit.

  • Search phrase: (“AI” or “Artificial Intelligence” or “Machine Learning”) and (“Public transport” or “Mass transit” or “Public Transit”).

  • Search fields: title and abstract, i.e., the search phrase may match the title, the abstract, or partly the title, and partly the abstract.

The main reason for selecting Scopus, IEEE Xplore, and Web of Science is that these are three of the most important databases for computer science research, and that they cover a large scope of public transport research (Bar-Ilan et al. 2007; Hoonlor et al. 2013). Moreover, IEEE Xplore and Scopus are identified as important within computer science (software engineering) by the review guidelines developed by Kitchenham and Charters (2007). The search phrase focuses on AI, ML and PT (in different nuances). ML, which traditionally is viewed as a subarea of AI, was added to the search phrase due to the recent huge practical success of ML, which has significantly contributed to the increased interest in AI (Holzinger et al. 2018). As with AI, the review relies on the perceptions of the ML concept held by the authors of the selected papers. Obviously, additional search terms (such as deep learning or software agents) could generate a larger number of studies; however, as stated above, the aim was to identify a representative sample. Moreover, by using this strategy, we rely on the publication authors’ views on what is included in the concepts of AI and ML.
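The Boolean structure of the search phrase can be illustrated with a small sketch. It is not the query syntax of Scopus, IEEE Xplore, or Web of Science; the function names are hypothetical, and case-insensitive whole-phrase matching is an assumption about how such a search might behave. It only shows how a record can match when the two term groups are found partly in the title and partly in the abstract.

```python
import re

# Term groups copied from the search phrase in the review protocol.
AI_TERMS = ["AI", "Artificial Intelligence", "Machine Learning"]
PT_TERMS = ["Public transport", "Mass transit", "Public Transit"]


def contains_any(text: str, terms: list[str]) -> bool:
    """Return True if any term occurs in text as a whole phrase (case-insensitive)."""
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE)
        for term in terms
    )


def matches_search_phrase(title: str, abstract: str) -> bool:
    """Apply (AI terms) AND (PT terms) over the title and abstract combined."""
    combined = f"{title}\n{abstract}"
    return contains_any(combined, AI_TERMS) and contains_any(combined, PT_TERMS)
```

For example, a record titled "Machine learning for bus arrival times" whose abstract mentions "public transport" would match, whereas a record that mentions only "mass transit" and no AI term would not.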

The selection criteria were defined as follows in the review protocol:

  (1) Inclusion criteria

    (a) Written in English.

    (b) Journal articles, conference papers, and book chapters.

    (c) Describe the use of AI to improve or to understand PT.

  (2) Exclusion criteria

    (a) Earlier or shorter versions of a paper if there is a refined or extended version.

    (b) Review studies.

Criterion 1c implies that all selected studies must be related to PT, including, e.g., bus, railway and bike sharing systems. Furthermore, studies in which only a part of the route involves PT are included in the review, i.e., at least some part of a route must involve PT, whereas other parts may involve other means of transport (e.g., private bicycle or car). All selected studies must describe the use of AI to improve or to understand PT. This means that studies addressing, for instance, strictly legal issues related to implementing AI in PT have not been included. Furthermore, studies describing general approaches to improve vehicles or road infrastructure are not included. Criterion 2b implies that we only focus on primary studies, not secondary studies, i.e., review studies.

In the selection procedure, the research studies were randomly distributed and studied by the research group members. Any ambiguity raised by a group member during this work was discussed by the entire research group. The selection procedure included the following steps:

  1. Search the three databases according to the search strategy.

  2. Remove all duplicates (identical papers appearing in more than one database).

  3. Read all paper abstracts and include/exclude based on the selection criteria.

  4. Read all full papers and include/exclude based on the selection criteria.

As mentioned above, the data extraction strategy involved a classification framework. The classification framework consists of a number of categories that were developed based on the research questions above, the initial review of 20 papers (step 4 above), and some literature reviews in related areas (Davidsson et al. 2005). Each category contains a number of classes, which were identified iteratively during the review, based on the content of the studies. After the selection procedure, all data relevant for the classification framework were extracted from the selected studies, i.e., for each study information was mapped to the framework. As before, all ambiguities were discussed by the entire research group. When all studies had been mapped to the classification framework, the framework categories were divided between the research group members and further studied to detect any inconsistencies. Thereafter, the information in each of the categories was analysed as part of the synthesis strategy. Moreover, a cross-analysis between the most relevant categories was performed.

2.2 Paper selection

The database searches cover all papers up to and including the year 2020. The initial searches resulted in 187 records in Scopus, 39 records in IEEE Xplore, and 77 records in Web of Science. Of these, 91 records were duplicates and were therefore removed. In the subsequent abstract scanning process, another 71 records were removed (24 based on criterion 1b, 46 based on criterion 1c and 1 based on criterion 2b). Thus, after the abstract scanning, 141 studies remained. The full paper scanning process resulted in the removal of 30 studies (6 based on criterion 1a, 18 based on criterion 1c, 2 based on criterion 2a, and 4 based on criterion 2b). As a result, 111 studies were included in the subsequent framework classification and analysis.
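The selection counts can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures reported in this section; the variable names are illustrative, and the two intermediate totals (all hits, and records left after duplicate removal) are derived here rather than stated in the text.

```python
# Record counts reported in Sect. 2.2.
initial_hits = {"Scopus": 187, "IEEE Xplore": 39, "Web of Science": 77}
duplicates_removed = 91

# Removals per selection criterion at each screening stage.
abstract_removals = {"1b": 24, "1c": 46, "2b": 1}
full_text_removals = {"1a": 6, "1c": 18, "2a": 2, "2b": 4}

total_hits = sum(initial_hits.values())
after_deduplication = total_hits - duplicates_removed
after_abstract_scan = after_deduplication - sum(abstract_removals.values())
included_studies = after_abstract_scan - sum(full_text_removals.values())

# prints: 303 212 141 111
print(total_hits, after_deduplication, after_abstract_scan, included_studies)
```

The last two values agree with the 141 studies remaining after abstract scanning and the 111 studies finally included.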

3 Results

During the review, we found that AI is typically used for automating or supporting decision making, including planning. In principle, AI can be employed in all potential steps, from structuring data to suggesting decisions or even affecting the real world through automation. Therefore, the aspects of how AI improves PT have been incorporated into the classification framework. Table 1 shows the resulting classification framework based on all reviewed papers, including categories and corresponding classes. In the subsequent subsections, the main classification results are quantified and analysed. The complete set of results is presented in Appendix A. The framework allows for double classification when suitable.

Table 1 Classification framework

3.1 Benefits

We have classified the benefits aimed for, or achieved, in the studies through explicit claims of benefits connected to the following: Cost savings, Service quality improvement, Environmental, General understanding, Safety/security. Naturally, the statements in the individual studies concerning how the results can be used to improve PT are often on a more detailed level. Moreover, sometimes the same AI technology could be used for several purposes, e.g., improving service and reducing emissions, as suggested by Mackett (1994), Prashanth et al. (2016) and Shatnawi et al. (2020). These studies have been double classified. However, such multiple benefits probably apply to more studies, although they are not explicitly stated. Hence, the classification of a study into a benefit category mainly reflects its stated motivational goals rather than all of its potential areas of benefit.
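Double classification has a practical consequence for the counts reported per class: a study placed in two classes is counted once in each, so the class counts can sum to more than the number of studies. The sketch below illustrates this with hypothetical study identifiers and labels; it is not data from the review, and the function name is an assumption.

```python
from collections import Counter

BENEFIT_CLASSES = {
    "Cost savings",
    "Service quality improvement",
    "Environmental",
    "General understanding",
    "Safety/security",
}

# Hypothetical study labels, for illustration only; not data from the review.
study_benefits = {
    "study_a": {"Service quality improvement", "Environmental"},  # double classified
    "study_b": {"Cost savings"},
    "study_c": {"General understanding"},
}


def count_per_class(assignments: dict[str, set[str]]) -> Counter:
    """Count studies per class; a double-classified study counts once in each class."""
    counts = Counter()
    for study, classes in assignments.items():
        unknown = classes - BENEFIT_CLASSES
        if unknown:
            raise ValueError(f"{study}: unknown classes {sorted(unknown)}")
        counts.update(classes)
    return counts


counts = count_per_class(study_benefits)
```

Here three studies yield four class memberships, which is why per-class totals in a framework that allows double classification should not be read as a partition of the studies.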

Cost savings include attempts to reduce PT costs by, for instance, increasing resource utilization through improved scheduling and reduced resources while maintaining service, using, e.g., trip or mode optimization (Manivannan et al. 2020; Tekin et al. 2018). Service quality improvements include not only the optimization of line scheduling but also examples of detecting problems in PT for quick alleviation, such as bunching (Degeler et al. 2020). If the motive is connected to the environment, including ecological sustainability and energy savings, it is classified as Environmental. Within this class, studies that use AI for mode choice analysis with the purpose of reduced car driving are included (cf. Lazar et al. 2019). If AI is applied for mode choice analysis, but with the purpose of making things better in PT in several ways, it falls within the class of General understanding. The class General understanding contains PT system forecasting without a particular focus of the usage of the forecast, e.g., forecasting traveller mode choices within e-mobility (Ferrara et al. 2019) or using mobility data for mode choice predictions (Liang et al. 2019). General understanding also includes additional aspects not covered in the other classes, such as PT funding (Ubbels and Nijkamp 2002) and assessing equity (Mayaud et al. 2019). Figure 1 shows the number of studies addressing different benefits.

Fig. 1 Number of studies addressing different benefits

Our findings show that AI has mainly been used for improving service quality and general understanding. Most studies related to the General understanding class concern mode choice analysis with the purpose of making things better in PT in several ways. Given that AI in general often is viewed as a tool for making systems more efficient, one might expect that the Cost savings class, which involves resource improvement aspects, would have included more studies than we have found in our review. Notably, no study had a clear primary goal to increase revenue.

3.2 Transport modes

A broad range of transport modes is addressed in the studies. Moreover, while some studies focus on a single transport mode, e.g., conventional buses, trains or bike sharing systems, others address the combination of several transport modes, e.g., the combination of private car and railway for the same journey. We have identified five main transport mode classes: Bus, Railway, Any PT, PT and other modes, and Bike sharing systems. The classes Bus, Railway and Bike sharing systems refer to studies that focus on a single transport mode (Aditi et al. 2020; Singla et al. 2015; Tang et al. 2020), whereas studies addressing both bus and railway are double classified to the Bus and Railway classes. The class Any PT includes both studies that specifically state that their results can be applied to any type of PT mode, and studies that do not specify which type of PT they are addressing but whose results can be applied to any type of PT mode (Roulland et al. 2014; Van Egmond et al. 2003). The class PT and other modes includes studies that focus on both PT modes and other transport modes (e.g., private car or bicycle) (Chapleau et al. 2019; Tu et al. 2016). Figure 2 presents the number of studies in each transport mode class. As can be seen, most of the studies address the application of AI to Bus, followed by Railway, Any PT, and PT and other modes. A few studies also experiment with applying AI to bike sharing systems.

Fig. 2 Number of studies addressing different transport modes

3.3 Time horizons

In terms of supported time horizons, we have identified three classes: Long term, Medium term and Short term. However, since it is often difficult to clearly assess the time horizon of the influence based on the information present in the studies, this classification should be viewed as indicative. Long-term support refers to studies that apply AI to support long-term decisions, or whose results can be used annually for the distant future, e.g., using AI for selecting the appropriate form of PT systems for cities (Liang et al. 2019; Mackett 1994; Victoriano et al. 2020). In total, 32 studies address supported decisions concerning the long-term horizon, covering a broad range of applications. Medium-term support refers to studies that address decisions affecting the practices of PT for the coming weeks and up to a year. A total of 34 studies have explored this type of supported decision, where the support was provided through, for example, using AI to optimize the PT routes and the number of operational vehicles (Tekin et al. 2018). Decisions related to the short-term horizon influence the practices in real-time contexts, e.g., travel-time prediction via forecasts of the status of the buses (position of the bus, estimating the delay or detecting critical situations) in order to allow the traffic controller to handle situations quickly (Wei et al. 2017), as well as automation. As presented in Fig. 3, most studies focus on short-term decision support, whereas roughly equal numbers focus on long-term and medium-term decisions.

Fig. 3 Number of studies addressing different time horizons

3.4 Benefit validation and implemented in business

The maturity level of the AI solutions suggested in the studies is reflected by two dimensions: (1) to what extent the benefits of applying AI in PT are validated, and (2) to what extent the benefits/applications are implemented in business.

The level of validation of the application is classified into five levels: Conceptual/theoretical, Experiments based on real data, Experiments based on artificial data, Implemented in small scale, and Implemented in large scale. Conceptual/theoretical includes studies whose results are on a conceptual/theoretical level and thus not realized in practice. They mainly focus on exploring solutions based on artificial data and modelling (Cao et al. 2011; Dimanche et al. 2017; Sosnowska and Skibski 2018), and the results serve as contributions to potential further development. Experiments based on artificial data and Experiments based on real data include studies that report applied experiments. Those studies often first propose a conceptual/theoretical method that applies AI for solving certain PT-related problems, and then test the method on either artificial data or real data, such as registered PT vehicle positions, actual arrival/departure times, or the number of boarding/alighting passengers at different stations (Bahuleyan and Vanajakshi 2017; Berbey Alvarez et al. 2015). Very few cases are implemented in large or small scale. The main difference between implementation in large scale and in small scale relates to both the development status of the application and the scale of the implementation. The applications implemented in small scale are at an earlier development stage and have been implemented in one or two empirical cases through research or pilot projects. They are tested in the real world to improve and validate the application (Sykes et al. 2019), as well as to demonstrate the need for improving/developing the applications (Mackett 1994). The applications implemented in large scale are at a later stage of development and implemented in multiple cases in a large scale, such as an expert system for station management which was tested in the railway system of Hong Kong. Its feasibility was proved by the implementation, which supported the validation in the next step (Chang 1996). Figure 4 shows the number of studies in each validation category.

Fig. 4 Number of studies addressing different means of benefit validation

For the Implemented in business category, three classes were identified: Broadly used, Single example and Nothing reported. The class Broadly used refers to applications that have been broadly used in business for supporting certain decisions (Scemama 1995), while Single example includes applications that have only been used in business in a single case (Bocchetti et al. 2009; Mackett 1994). The rest of the studies either specifically state that the applications have not been used in business or they do not report anything about whether the applications were used in business or not (Bembalkar and Game 2019; Manivannan et al. 2020; Ubbels and Nijkamp 2002). These studies have been classified as Nothing reported. As shown in Fig. 5, a very limited number of services/applications have been reported as used in business. However, this may not indicate that the rest of the services are not used in practice, since this issue remains unreported in most studies.

Fig. 5 Number of studies used in business to different extents

3.5 Mechanism for achieving benefit

To better understand the way AI was applied in the different studies, we used a category concerning the mechanisms AI provided for achieving the benefits. Figure 6 presents the number of studies addressing these mechanisms. We have classified the studies into the following classes: Current status estimate, Prediction, Planning/scheduling/resource allocation. Here, the rather common case of using pattern recognition is included in the Current status estimate class. One example is image recognition of traffic signs (Sykes et al. 2019), but other data sources, such as smart card data, can also be used in the current status estimate (Zhang and Cheng 2018). Predictions concern estimates of future states, typically including an ML model, for instance, predicting the travel time based on vehicle positions (Bahuleyan and Vanajakshi 2017) or travel behaviour (Chapleau et al. 2019). The literature review only included five studies focusing on automation; all the others focused on decision support. Two of these five studies were classified as Planning/scheduling/resource allocation: one concerns controlling traffic lights (Cao et al. 2011) and the other automatic control of the backup of surveillance videos (Cui et al. 2020). One study was classified as Prediction, being a case of information messaging (Genser et al. 2020), and two were classified as Current status estimate: one concerns Covid 19 detection (Liu and Huang 2020) and the other smart fare collection (Mastalerz et al. 2020). The use of AI appears rather evenly distributed between the classes, although prediction is the most commonly used mechanism, whereas the more advanced tasks of planning/scheduling/resource allocation are the least common. Note, however, that in many studies classified as Current status estimate and Prediction, the ambition is to support a more advanced task, e.g., planning, but this is then done with other methods, typically by a domain expert/planner.

Fig. 6 Number of studies addressing different mechanisms for achieving the benefits

3.6 AI technology

The most commonly used AI technology is ML, which is applied in 77% of the studies. ML is used for generalizing from large amounts of data. The result is typically a classifier that can be used to predict or estimate some unknown value, e.g., related to travel behaviour, traffic flows, land use, predictive maintenance, number of passengers in a vehicle, or estimated arrival time. Many different ML algorithms have been applied, such as Neural Networks, Support Vector Machines, K-Nearest Neighbours, Decision Trees, Random Forests, Naïve Bayes, and Rough Set Analysis. In many of the studies, different ML algorithms are tested and compared in order to identify the one best suited for the particular aim. However, there is no clear consensus on which algorithms are most appropriate for the different tasks.

Another technology often used in the reviewed studies is Heuristic Search, such as the A* algorithm, Genetic Algorithms, and Ant Colonies. These algorithms have been used for different optimization tasks, such as scheduling, routing, finding shortest paths in PT networks, and vehicle allocation, as well as optimizing the PT system as a whole.
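As an illustration of one of the tasks named above, finding shortest paths in a PT network, the following is a minimal sketch of the A* algorithm on a toy network. The stops, coordinates, link times, and maximum speed are hypothetical and not taken from any reviewed study. The heuristic is the straight-line distance divided by an assumed maximum speed, which never overestimates the remaining travel time on this network.

```python
import heapq
import math

# Hypothetical toy network: stop coordinates in km and directed links in minutes.
COORDS = {"A": (0, 0), "B": (3, 0), "C": (3, 4), "D": (6, 4)}
LINKS = {
    "A": {"B": 6, "C": 12},
    "B": {"C": 7, "D": 15},
    "C": {"D": 5},
    "D": {},
}
MAX_SPEED_KM_PER_MIN = 1.0  # assumed upper bound on vehicle speed


def heuristic(stop, goal):
    """Optimistic remaining time: straight-line distance at the maximum speed."""
    (x1, y1), (x2, y2) = COORDS[stop], COORDS[goal]
    return math.hypot(x2 - x1, y2 - y1) / MAX_SPEED_KM_PER_MIN


def a_star(start, goal):
    """Return (total_minutes, path) for the fastest route, or (math.inf, []) if none."""
    frontier = [(heuristic(start, goal), 0.0, start, [start])]
    best_cost = {start: 0.0}
    while frontier:
        _, cost, stop, path = heapq.heappop(frontier)
        if stop == goal:
            return cost, path
        if cost > best_cost.get(stop, math.inf):
            continue  # stale queue entry; a cheaper route to this stop was found
        for neighbour, minutes in LINKS[stop].items():
            new_cost = cost + minutes
            if new_cost < best_cost.get(neighbour, math.inf):
                best_cost[neighbour] = new_cost
                priority = new_cost + heuristic(neighbour, goal)
                heapq.heappush(
                    frontier, (priority, new_cost, neighbour, path + [neighbour])
                )
    return math.inf, []
```

On this toy network, `a_star("A", "D")` selects the route via C (12 + 5 minutes) over the routes via B. A real PT network would also need to account for timetables and transfer times, which this sketch ignores.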

AI technologies for Reasoning are typically based on some kind of formal logic, e.g., fuzzy logic or expressed as if-then rules in an Expert System. They are used to automatically draw conclusions based on known facts, e.g., which congestion control action to choose based on the current congestion pattern.
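The if-then style of reasoning described above can be sketched as an ordered rule list. The attributes, thresholds, and actions below are illustrative assumptions and are not taken from any of the reviewed expert systems; a fuzzy-logic system would replace the crisp conditions with degrees of membership.

```python
from dataclasses import dataclass


@dataclass
class CongestionPattern:
    """Known facts about the current situation (hypothetical attributes)."""
    queue_length_m: float
    bus_delay_min: float
    incident_reported: bool


# Ordered if-then rules as (condition, action) pairs. All thresholds and
# actions are illustrative assumptions, not taken from any reviewed system.
RULES = [
    (lambda p: p.incident_reported, "reroute buses around the incident"),
    (
        lambda p: p.queue_length_m > 200 and p.bus_delay_min > 5,
        "extend green phase for bus lane",
    ),
    (lambda p: p.bus_delay_min > 5, "hold connecting services"),
]


def choose_action(pattern: CongestionPattern) -> str:
    """Fire the first rule whose condition holds; fall back to no intervention."""
    for condition, action in RULES:
        if condition(pattern):
            return action
    return "no action"
```

Because the rules are evaluated in order, more specific conditions are placed before more general ones; this first-match strategy is one simple conflict-resolution choice among several.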

Multi-Agent Systems (MAS) is an AI technology in which several intelligent entities are typically collaborating to manage a complex task, such as traffic control. However, we found only one application of MAS (Cao et al. 2011); one reason for this could be that the term AI is not always used when describing MAS applications.

Over recent years, there has been a clear trend to apply ML to solve PT problems, whereas Reasoning was mainly used in the early work on applying AI in PT. Figure 7 shows the number of studies addressing each type of AI technology.

Fig. 7 Number of studies addressing different types of AI technologies

3.7 Data needs

Based on the data used in the different studies, 17 subclasses were identified. To get a clearer picture of the relations between the data and the different parts/parties involved in a PT system, these 17 subclasses were then grouped into 4 main classes. This resulted in the following main classes and subclasses:

  • Data connected to travellers:

    • Number/ID of boarding/alighting passengers at different stations (e.g., passenger flow between stations).

    • Station congestion/demand levels (e.g., passenger flows at a station).

    • Traveller characteristics/behaviour/opinions (e.g., personal trip characteristics, purpose and duration of trips, or personal costs).

    • People’s positions/acceleration/etc. (passenger movement data)

    • Surveillance on in-vehicle passengers (e.g., passenger position and movement inside vehicle).

    • Fare transactions and journey searches (e.g., fare records generated through different ticketing media used by the traveller, or renting and return records of bike sharing system).

  • Data connected to PT system:

    • PT vehicle positions/acceleration/etc. and actual arrival/departure times (vehicle movement data).

    • Surveillance on PT system and in-vehicle conditions (e.g., video and audio surveillance data from PT system, or CO2 concentration in bus).

    • Vehicle capacity.

    • Timetables and structure of PT system (e.g., interchanges and travel times, or number of stations and vehicles along a route).

    • PT organization/funding/marketing/services/maintenance (e.g., records of frequency and size of maintenance parts orders, or PT marketing and partnerships).

  • Data connected to outdoor environment:

    • Local weather conditions (e.g., weather conditions when making a trip).

    • Local built and natural environment as well as city regulations (e.g., natural environment around households, geographical positions of different bike sharing stations, or city politics and regulations).

  • Data connected to roads and private cars:

    • Private car positions.

    • Conditions/traffic volumes/speeds on roads (e.g., road condition of an intersection, road congestion, or freeway speeds and volumes).

    • Signal light state (signal light state of an intersection).

    • Road traffic incidents.

Depending on the timeframe and the data collection method, the data were also classified into historical data or real-time data, as well as sensor data, questionnaire data, or documental data. Studies that express a need for collecting and using data in real time (or near real time) are considered as using real-time data, whereas studies that use data collected in the past are considered as using historical data. Real-time data include passenger movements, surveillance data on the PT system or in-vehicle passengers, bus positions or arrival times etc. (Elizalde-Ramírez et al. 2019; Bocchetti et al. 2009; Belapurkar et al. 2018; Prashanth et al. 2016; Borodinov and Myasnikov 2020a), whereas historical data include historical passenger flow between stations, area maps, household characteristics, etc. (Berbey et al. 2012; Hu et al. 2016; Hagenauer and Helbich 2017). Naturally, a study may use both historical and real-time data (Agafonov and Yumaganov 2019). Sensor data and questionnaire data represent data that have been collected using different types of sensors (e.g., position data, temperature data, video data) or questionnaires (e.g., concerning user behaviour, conditions and opinions). Documental data originate from different public officials, administrative officers, or other office-workers. This class includes, for instance, characteristics and locations of different Park-and-Ride stations, and timetable and structure information from the PT system (Ferrara et al. 2019; Mayaud et al. 2019). Data collected from social media are also included in this class (Kulkarni et al. 2018).

The results of the classification show that the most commonly used data belong to the following three subclasses: Number/ID of boarding/alighting passengers at different stations; PT vehicle positions/acceleration/etc. and actual arrival/departure times; and Timetables and structure of PT system (see Appendix B). This can be interpreted as the data from these subclasses being the most useful for AI applications within PT. However, this result may also reflect the accessibility of the data, i.e., this type of data is probably more easily accessed than, for instance, data related to people’s positions. Therefore, data from these subclasses are more commonly used in the studies.

Figure 8 illustrates the main results of the data needs classification. As can be seen, a relatively large share of the data used are connected to the travellers. This means that AI is not only applied in applications focused solely on the PT system; many applications also relate to the travellers and their behaviour, characteristics, needs, etc. Furthermore, many studies use data that have been collected at an earlier point in time, i.e., historical data. Even though some studies use both historical and real-time data, one conclusion that can be drawn is that AI is mostly used to provide support for decisions that do not depend on real-time data. However, another conclusion is that by opening up to more real-time data, many more AI applications may be enabled. Figure 8 also shows that multiple studies depend on sensor data, whether or not the data are used in real time, i.e., sensors are strongly needed to enable AI applications within PT.

Fig. 8 Number of studies requiring (a) data from the different main classes, (b) different means of data collection, and (c) different data timeframes

Note that not all studies that focus on real-time support require real-time data. For instance, some of the studies use data that are unspecified by the authors (Dimanche et al. 2017; Molina 2005; Yu et al. 2018), and some use historical data for real-time support, e.g., calculating the optimal route choice for the traveller based on information about timetables and the structure of the PT system (Song et al. 2015), or predicting the PT delay based on historical delay and weather (Leung et al. 2020).

3.8 Applications in public transport

The application of AI technology in PT covers many different problem areas. We have identified four main classes corresponding to different application areas. Figure 9 presents the number of studies in each main class. The most studied area concerns different ways of travel service improvement, including the perspectives of both the operators and the travellers. The estimation of arrival time and support for route choices are the most common applications of this kind, but there are also examples including an improved understanding of the travellers’ preferences and state, as well as the integration with other services such as bike sharing. The following applications related to travel service improvement have been found in the reviewed studies:

  • Estimating travel/arrival time (Agafonov and Yumaganov 2019; Bahuleyan and Vanajakshi 2017; Biyani 2019; Grzenda et al. 2020; Heghedus 2017; Heghedus et al. 2019; Kyaw et al. 2019; Reddy et al. 2016; Olczyk et al. 2017; Leung et al. 2020; Pandurangi et al. 2020; Tran et al. 2020; Yang et al. 2020a; Yuan et al. 2020).

  • Supporting route choice (Elizalde-Ramírez et al. 2019; Nachtigall 1995; Prashanth et al. 2016; Song et al. 2015; Manivannan et al. 2020; Amrani et al. 2020).

  • Balancing and availability of rental bikes (Lin et al. 2018; Singla et al. 2015; Wang and Kim 2018; Yang et al. 2020b; Bei et al. 2020).

  • Capturing travellers’ opinions (Kulkarni et al. 2018; Lock and Pettit 2020; Othman et al. 2019; Raflesia et al. 2018; Rahimi et al. 2020).

  • Traveller recommender systems (Borodinov and Myasnikov 2019, 2020a, b).

  • Improving communication with travellers (Yu et al. 2018; Kuberkar and Singhal 2020; Sykes et al. 2019; Velosa and Florez 2020).

  • Monitoring traveller’s state (Belapurkar et al. 2018; Liu and Huang 2020).

The second-most studied application area concerns operations support, including real-time support for different monitoring, diagnosis and planning tasks. These applications concern both the traffic and vehicles, as well as the travellers. For instance, some applications support traffic monitoring or vehicle tracking whereas others estimate the number of travellers in a vehicle. The following applications have been found within this area, in the reviewed studies:

  • Supporting dispatching (Dimanche et al. 2017; Moreira-Matias et al. 2016; Wang et al. 2019; Degeler et al. 2020).

  • Supporting diagnosis, prediction and planning (Blandin et al. 2019; Molina 2005).

  • Supporting emergency management (Chang 1996).

  • Supporting maintenance (Adamson et al. 2005; Killeen et al. 2019; Hermann et al. 2020).

  • Vehicle tracking (Barbosa et al. 2017; Wilkowski et al. 2020).

  • Monitoring passenger flows (Haq et al. 2020; Paletta et al. 2005).

  • Traffic monitoring (Scemama 1995; Wei et al. 2017).

  • Traffic management (Cao et al. 2011; Minea et al. 2019; Genser et al. 2020; Ayman et al. 2020).

  • Security surveillance (Bocchetti et al. 2009; Rohit 2020).

  • Estimating/predicting the number of travellers in a vehicle/station (Li et al. 2018a; Pasini et al. 2019; Liu et al. 2020; Skhosana et al. 2020).

  • Estimating real-time on-board bus ride comfort (Nguyen et al. 2021).

  • Fraud detection (Claiborne and Gupta 2018).

  • Optimizing when to back up surveillance video files (Cui et al. 2020).

  • Fare collection (Mastalerz et al. 2020).

The third application area concerns understanding/predicting travel behaviour, such as travel patterns, mode choice, and how the built environment affects travel behaviour. The following applications have been found within this area, in the reviewed studies:

  • Understanding travel patterns (Berbey et al. 2012; Ghaemi et al. 2015; Jung and Sohn 2017; Kedia et al. 2017; Sun and Yang 2018; Tang et al. 2020; Xue et al. 2014; Yu et al. 2015; Zhang et al. 2019; Shalit et al. 2020).

  • Understanding/predicting the travellers’ mode choice (Chapleau et al. 2019; Ferrara et al. 2019; Hagenauer and Helbich 2017; Lazar et al. 2019; Liang et al. 2019; Niklas et al. 2020; Tu et al. 2016; Victoriano et al. 2020; Zhou et al. 2019).

  • Understanding the effects of the built environment on travel behaviour (Deng and Yan 2019; Hu et al. 2016).

  • Predicting passenger dwelling time and flow (Berbey Alvarez et al. 2015).

  • Predicting travellers’ social demographics (Zhang and Chen 2018).

A slightly less studied application area concerns support for long-term PT system analysis and planning. This area includes, for instance, support for determining the appropriate PT system (e.g., for PT system planning in a city), optimizing or simulating timetable scheduling, and evaluating a PT system, from different aspects. The following applications have been found within the area, in the reviewed studies:

  • Supporting PT system planning (Degeler et al. 2020; Leprêtre et al. 2019; Mackett 1994, 1996; Roulland et al. 2014; Shatnawi et al. 2020; Ullón et al. 2020; Van Egmond et al. 2003).

  • Timetable scheduling support (Bembalkar and Game 2019; Othman and Tan 2018; Tan et al. 2011; Tekin et al. 2018; Xie et al. 2004).

  • Analysing PT systems (Mayaud et al. 2019; Sosnowska and Skibski 2018).

  • Generating synthetic data (Golubev et al. 2016).

  • Predicting ticket prices (Aditi et al. 2020; Branda et al. 2020).

Fig. 9 Number of studies addressing different application areas of AI technology in PT

3.9 Challenges of using AI

In most of the reviewed studies, the challenges encountered when applying the AI methods were not discussed at all. The challenges that actually have been identified mainly concern data availability and quality. For many AI methods, in particular ML, large datasets, often from different sources, are needed to get good results, and they may require substantial efforts to collect and be difficult to store (Mastalerz et al. 2020; Nguyen et al. 2021; Reddy et al. 2016; Roulland et al. 2014; Tran et al. 2020). Moreover, some key types of data can be difficult to obtain at all, and sometimes the quality of the data is too low to be useful (Kulkarni et al. 2018; Mastalerz et al. 2020; Nguyen et al. 2021; Rahimi et al. 2020; Shalit et al. 2020; Tran et al. 2020; Yuan et al. 2020). Also, there is often a need to pre-process the data before they can be used, which, together with the processing of the data, may require a substantial amount of resources, in particular if time is a limiting factor (Jung and Sohn 2017; Sykes et al. 2019; Tran et al. 2020). Much data related to PT concern individual travellers in one way or another, which poses the challenge of how to avoid compromising the personal privacy of the travellers (Ferrara et al. 2019). Finally, applying AI methods often requires significant AI knowledge and skills, e.g., it could be difficult to select the most suitable AI method for the problem at hand (Ghaemi et al. 2015; Wang and Kim 2018).

3.10 Cross analysis and trends

The reviewed studies cover a broad variety of AI applications for PT, particularly in terms of benefit areas, transport modes, data needs, and the type of applied AI technologies. After having classified the studies according to the characteristics of each category above, we carried out a cross analysis to synthesize the knowledge across all the categories. This section presents the main cross analysis results. Moreover, a histogram of the publication years of the selected studies is presented.

Regarding which applications use which AI technologies, all four application areas identified in Sect. 3.8 were represented among the applications using Machine Learning. Heuristic Search and Reasoning were also used for most of the identified application areas, apart from Understanding/predicting travel behaviour and Travel service improvement. Moreover, by analysing the classification from the time horizon and addressed benefits perspectives, a number of relations could be extracted. Studies concerning the Long-term time horizon mainly contribute to General understanding and Environmental benefits, whereas benefits of improving Safety/security are not addressed in this time horizon at all. The benefits of long-term planning can be summarized into the following five areas: (1) transport planning and urban planning in relation to the built environment and land use planning; (2) future PT route planning based on the understanding of predicted mode choices related to individual characteristics, e.g., household size, demographic information, trip attributes; (3) optimization of PT fares and funding mechanisms; (4) developing policy indications for reforming the regulatory framework of the organizations; (5) supporting the improvement of the maintenance system, for instance, by improving the re-ordering strategies for a multiplicity of different supply parts in relation to their usage and consumption rates, so that a better forecast of item consumption can be reached. The studies concerning the medium-term time horizon primarily intend to achieve Service quality improvement, General understanding and Cost saving related to: (1) improving the general understanding of the PT system and travellers; (2) optimizing service attributes such as scheduling and route planning; (3) evaluation of the services; and (4) predicting electronic fare fraud by detecting indications of fraud in fare transaction records. The studied real-time-related solutions support all types of benefits, to different extents. The supported decisions mainly focus on improving the service quality for the passengers in the form of reducing travel time and transfer time, which is attained, for example, by providing a method for identifying a transit station for passengers to switch from private to public transport along the traveller’s trajectory, or by improving the efficiency of bicycle distribution in a bike sharing system. Real-time support is also explored by developing methods for improving the PT information system, to enable passengers to access the information more easily and to plan their trips based on more dynamic and accurate information. As for the different levels of maturity, it is worth noting that the implemented services all provide real-time or long-term support, whereas none of the medium-term-related services has been implemented.

As for data needs, most of the services aiming for Service quality improvement and Safety/security require real-time data, as opposed to services aiming for General understanding, Environmental benefits or Cost savings, which most often do not require real-time data. These results indicate that many of the former services provide real-time decision support, whereas most of the latter do not. Furthermore, we conclude that access to historical data is a requirement for most of the AI-based services within PT. In particular, all applications, except one, in this study aiming for Cost savings and General understanding require historical data. Figure 10 shows the relationships between the main classes of data needs and the AI technologies used in the studies. As can be seen, studies using Machine Learning and Heuristic Search need data from all classes, where Data connected to PT system is most extensively used. Studies focussing on Reasoning need Data connected to travellers most extensively, whereas Multi-Agent System studies need Data connected to outdoor environment and Data connected to roads and private cars to an equal extent. However, the number of studies in these latter two AI classes is too small to draw any conclusions.

Fig. 10 Relations between Data needs and AI technologies

Finally, to illustrate the change over time concerning research interest in this area, a histogram of the publication years of the studies included in this literature review is presented in Fig. 11. Surprisingly, only two studies were published in the years 2006 to 2010. As can be seen, all application areas have gained increased research interest over the past 10 years. Travel service improvement has increased the most and is currently also receiving the most research attention, followed by Understanding/predicting travel behaviour. These results indicate a clear focus on the traveller, and less focus on operational and systems support.

Fig. 11 Number of studies for different periods of years

4 Discussion and conclusions

This study reviewed more than 87 scientific publications which describe a broad variety of applications of AI in PT, particularly in terms of what benefits they aim for, which transport modes they concern, which data sources are used, and what type of AI technologies are applied. The method used to select which articles to include has some limitations. For instance, some relevant articles are not indexed in any of the three databases but meet the other criteria (e.g., Kumar et al. (2014); Palacio (2018); Shakeel et al. (2019)). Others do not use “Public transport” or “Mass transit” or “Public Transit” in the title or abstract (e.g., Toqué et al. (2017)). Similarly, some articles do not use “AI” or “Artificial Intelligence” or “Machine Learning” in the title or abstract (e.g., Berlingerio et al. (2013)). However, we believe that our selection of articles is representative with respect to the work in the field of AI for public transport.

The review shows that the interest in using AI in PT appears to be growing, given the rapidly increasing number of studies during the last couple of years. This trend strengthens the hypothesis that AI may have great potential to improve PT. Further, the reviewed studies propose several types of AI solutions for different application areas, tasks and decision makers (including travellers). This wide scope of AI usage in the domain can also be seen as an indication of great potential. Finally, a large portion of the studies use real data, which points to the possibility of actually obtaining data for AI applications. However, very few of the research studies provide solid evidence of improvements of PT, since they are mainly experimental and report no results from being implemented in business (at least not at the date of publication). This might indicate that AI has been applied in real life only to a small extent; however, a more likely explanation may be that the extent of AI implementation is not reflected in scientific publications, since most companies have no interest in writing scientific publications. Nevertheless, given the variety of application areas and the growing interest, we draw the conclusion that there is great potential for using AI to improve PT.

The review also shows that the studies almost exclusively deal with AI for decision support, and not automation. In the near future, however, autonomous PT vehicles will most probably represent a major application area within PT and automation. The reason why these studies have not been captured by our search phrase is probably that they are focussed on transport vehicles in general and not dedicated to PT. Nevertheless, our results indicate that AI in the near future has a significant role to play in supporting human decision makers (and not replacing them). Furthermore, the studies are rather evenly distributed between support for real-time decision making and support for planning (medium term and long term). The purpose is mostly to increase the quality of PT services or to increase the knowledge about traveller behaviour, and, to a much lesser extent, to achieve direct cost savings. Actually, no study had the primary goal to increase revenue, but this might have been a secondary goal and/or a potential effect. AI is applied for several modes of transport, where train and bus are the dominant modes, but other transport modes are also considered in some studies, as well as the entire PT system. We found that there are three main mechanisms that the AI solutions contribute to, where the most frequent is prediction, followed by estimating the current state and resource allocation, including planning and scheduling. The data used in the AI applications are largely sensor data concerning the traveller and/or the PT system. It is noteworthy that historical data of several types are also used to a great extent and that many applications do not require real-time data at all. Finally, only a few studies discussed challenges that were encountered when applying the AI methods. The challenges that actually were identified mainly concern data availability and data quality, which we think is a good pointer to future research needs; in particular, which data are the most important, how to make the data available and how to improve data quality.

To further analyse our findings, we found only one similar study to which our results can be compared: an international study from the industry perspective made by UITP (UITP 2020). Similar to this review, the UITP study identifies challenges connected to the availability of large datasets and data quality. Additionally, the UITP study identifies challenges connected to general knowledge and capacity for deploying AI, as well as establishing commitment from top management to drive the change towards utilizing AI. These challenges were not documented by the publications included in our review.

Although studies have identified challenges connected to data needs and data pre-processing, surprisingly few explicit indications were given concerning the need for labelling data. Potentially, this characterizes the domain, where a lot of data exists. However, the need for labelling data, or for supporting supervised learning in general, may increase when new applications of AI for PT are being developed and deployed. This may engage more domain experts in developing AI to support decision making that is more complex than what is handled without AI support today.