1 Introduction

Every year, millions of Muslims travel to Makkah, Saudi Arabia, to fulfill their religious duties during Hajj, the great Islamic pilgrimage. It takes place in the last month of the Islamic lunar calendar, from the 8th to the 13th day of Dhu al-Hijjah, a period of only 6 days. According to the General Authority for Statistics of the Kingdom of Saudi Arabia (GaStat), there were a total of 2.5 million Hajj attendees in 2019 (GaStat 2019), which makes it one of the largest mass gatherings in the world (Kassens-Noor et al. 2015; Müller 2015). Additionally, each year up to two million unregistered pilgrims from within Saudi Arabia attend the Hajj (Haase et al. 2016). The Hajj is one of the five pillars of Islam and should be performed by every devout Muslim at least once in their lifetime. The individual rituals of the Hajj follow a fixed chronological order. If a ritual is not performed correctly or at the right time, the pilgrimage is considered invalid, i.e., the pilgrims have not fulfilled their religious obligation.

Fig. 1

Major crowd movements during Hajj. Movement 3 is most critical for metro operations

Since the locations where the individual rituals take place are far apart (the total length of the route is about 18 km), it is important that pilgrims are transported quickly and safely between them (Figs. 1 and 2). During Hajj there are five major crowd movements (Fig. 1). On the first day, pilgrims visit the Holy Mosque in Makkah to perform the circumambulation of the Ka’aba before they move to the Mina valley, where they are accommodated in a permanent tent city. The next movement takes place on days one and two, when pilgrims visit Mount Arafat. Pilgrims who want to spend the first night at Mount Arafat are accommodated in a temporary tent city in the Arafat area. For the pilgrimage to be valid, pilgrims must spend at least the afternoon of day two praying at Mount Arafat until sunset. The most critical movement takes place after sunset, when all pilgrims have to travel from the Arafat area to Muzdalifah within just a few hours, which poses significant challenges for the metro. Pilgrims spend the night on the plains of Muzdalifah and collect pebbles for the next day’s rituals. Early the next morning, they move back to Mina to perform the first part of the stoning-of-the-devil ritual at the Jamarat Bridge (Fig. 4a). On day four they revisit the Ka’aba and afterwards return to Mina, where they complete the stoning-of-the-devil ritual on days four and five. The pilgrimage concludes with a final visit to the Holy Mosque in Makkah on either day five or day six. For more details on the individual rituals, we refer the interested reader to Koch (2019b) and the references therein.

In the past, pilgrims had to travel the route on foot or by bus, causing congestion at bottlenecks every year. In past decades, there have also been several accidents causing numerous deaths (Footnote 1). To cope with the growing number of pilgrims and to ensure safe and rapid transport, a metro line was introduced in 2010.

Fig. 2

Map of the Makkah Metro line with stops and the geographic location of Makkah (Source: Google Maps)

The metro can transport 72,000 passengers per hour in each direction, which makes it one of the highest-capacity metros in the world. There are three stopping areas, Arafat, Muzdalifah, and Mina, with three stops each (Fig. 2), designed to ensure that pilgrims can arrive and depart safely (Figs. 3 and 4a). The metro was built specifically to transport pilgrims during movements 2–4 outlined in Fig. 1 and can carry up to 500,000 pilgrims during each of these movements (Koch 2019b). When crowds are heavy and waiting times arise, the capacity of the trains is the only bottleneck: the station entrances and exits are generously dimensioned, and platform management distributes passengers evenly along the platforms so that the trains are always filled without delay.
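As a rough plausibility check on these figures, the Python sketch below computes the shortest possible time needed to carry 500,000 pilgrims at the quoted capacity of 72,000 passengers per hour. It assumes uninterrupted operation at full capacity in a single direction, with no boarding, headway, or turnaround losses, so the result is an idealized lower bound rather than an operational estimate.

```python
# Idealized lower bound on the duration of one crowd movement.
# Assumption (not from the source): trains run continuously at full design
# capacity in one direction, with no boarding, headway, or turnaround losses.

CAPACITY_PER_HOUR = 72_000       # passengers per hour per direction
PILGRIMS_PER_MOVEMENT = 500_000  # maximum pilgrims per movement (Koch 2019b)


def min_clearance_hours(pilgrims: int, capacity_per_hour: int) -> float:
    """Return the minimum time in hours needed to carry all pilgrims."""
    if capacity_per_hour <= 0:
        raise ValueError("capacity must be positive")
    return pilgrims / capacity_per_hour


hours = min_clearance_hours(PILGRIMS_PER_MOVEMENT, CAPACITY_PER_HOUR)
print(f"Minimum clearance time: {hours:.1f} h")  # prints "Minimum clearance time: 6.9 h"
```

Even under these idealized assumptions, a full movement takes close to seven hours, so some waiting at the stations is unavoidable whenever large numbers of pilgrims want to travel at the same time.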

At Makkah Metro, user satisfaction is an important consideration. If user satisfaction falls into a certain negative range, the operator of the metro will be replaced. Therefore, surveys are conducted on a regular basis to assess it.

Fig. 3

Stop of the Makkah Metro. Glass doors ensure safe boarding and disembarking from the metro trains. (Source: meccametro.com)

Regarding user satisfaction, the literature largely agrees on the disconfirmation-of-expectations paradigm conceptualized by Oliver (1980). It defines user satisfaction as the result of a cognitive comparison between users’ expectations and their perception of the quality of a product or service. If customers’ expectations are exceeded, they are satisfied, as they experience a positive disconfirmation of their expectations. If the expectations are not met, customers are dissatisfied due to a negative disconfirmation. Zero disconfirmation applies if customer expectations are met exactly, in which case the customer is likely to be satisfied (Swamidass 2000).
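The disconfirmation logic amounts to comparing two numbers. The Python sketch below assumes, purely for illustration, that perceived and expected quality are measured on the same numeric scale; the tolerance `tol` is a hypothetical parameter that defines when expectations count as met exactly.

```python
def disconfirmation(perceived: float, expected: float, tol: float = 0.0) -> str:
    """Classify the disconfirmation of expectations (after Oliver 1980).

    Assumes `perceived` and `expected` quality share one (hypothetical)
    numeric scale; `tol` is an illustrative band around zero within which
    expectations count as met exactly.
    """
    gap = perceived - expected
    if gap > tol:
        return "positive"  # expectations exceeded -> satisfied
    if gap < -tol:
        return "negative"  # expectations not met -> dissatisfied
    return "zero"          # expectations met -> likely satisfied


print(disconfirmation(perceived=8, expected=6))  # prints "positive"
print(disconfirmation(perceived=5, expected=6))  # prints "negative"
```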

Fig. 4

Pedestrian flows from and towards metro stations

For the Makkah Metro, we assume that waiting time is an important determinant of satisfaction, for several reasons. First, to complete the Hajj successfully, pilgrims need to perform the rituals in a timely manner. A failure due to delays would thus be unacceptable for any pilgrim, not least because participating in the Hajj requires a substantial financial outlay. Prices differ significantly across the pilgrim organizations of different countries and also depend on the pilgrims’ desired level of comfort (e.g. accommodation in tents or hotels, first-class flights, etc.), but usually lie between $2500 and $15,000 per person for pilgrims from outside Saudi Arabia. However, pilgrims from wealthy places such as Dubai may pay up to $68,000 for so-called “VIP” packages (Ladki and Mazeh 2017). In 2021, prices for pilgrims from within Saudi Arabia were advertised at SR12,000–SR16,000, which is about $3200–$4300 (Saudi Gazette 2021).
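The dollar amounts follow from the Saudi riyal’s fixed peg of 3.75 SAR per US dollar. The short Python sketch below reproduces the conversion; the figures quoted above correspond to these values rounded to the nearest hundred dollars.

```python
SAR_PER_USD = 3.75  # official peg of the Saudi riyal to the US dollar


def sar_to_usd(amount_sar: float) -> float:
    """Convert Saudi riyals to US dollars at the pegged rate."""
    return amount_sar / SAR_PER_USD


low, high = sar_to_usd(12_000), sar_to_usd(16_000)
print(f"${low:,.0f} to ${high:,.0f}")   # prints "$3,200 to $4,267"
print(round(low, -2), round(high, -2))  # prints "3200.0 4300.0"
```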

Furthermore, long waiting times outside a metro station can be very unpleasant, especially in the heat of the Arabian Peninsula: between 1985 and 2013, Makkah recorded average daily mean temperatures of 31\(\,^\circ\)C (88\(\,^\circ\)F) and maximum temperatures of up to 51\(\,^\circ\)C (124\(\,^\circ\)F) (Abdou 2014). Besides the heat, the sheer mass of people can also be uncomfortable, as there is little space in such a dense crowd (Fig. 4).

We assume that factors such as heat, crowding, reliability, accessibility, and comfort contribute significantly to metro users’ satisfaction. Unfortunately, our data do not allow us to separate these effects. Nevertheless, it is reasonable to assume that the negative effects of crowding, for example, increase with waiting time. The perceived waiting time should therefore be a good proxy for these factors.

Apart from these exacerbating factors, the literature on the value of travel time savings shows that in-vehicle travel time is often perceived as less burdensome than out-of-vehicle walking and waiting time. Quarmby (1967), for example, found that one minute of walking to a bus stop or waiting for the bus is valued like two to three minutes of in-vehicle travel time. Abrantes and Wardman (2011) reported similar findings: in their study, the disutility of waiting for one minute amounts to twice the disutility of one minute of in-vehicle travel time.
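These weighting factors can be applied as a simple generalized travel time, in which each minute of walking or waiting counts as several minutes of in-vehicle time. In the Python sketch below, the default weight of 2 for both waiting and walking is an illustrative assumption taken from the lower end of Quarmby’s (1967) range, and the trip in the example is hypothetical.

```python
def generalized_time(
    in_vehicle_min: float,
    wait_min: float,
    walk_min: float,
    wait_weight: float = 2.0,  # assumed: lower end of the 2-3 range (Quarmby 1967)
    walk_weight: float = 2.0,  # assumed: same weight applied to walking
) -> float:
    """Express a trip in equivalent minutes of in-vehicle travel time."""
    return in_vehicle_min + wait_weight * wait_min + walk_weight * walk_min


# Hypothetical metro trip: 10 min on board, 20 min waiting, 10 min walking.
print(generalized_time(10, 20, 10))                   # prints 70.0
print(generalized_time(10, 20, 10, wait_weight=3.0))  # prints 90.0
```

In this hypothetical trip, waiting makes up half of the clock time but more than half of the weighted burden.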

To the best of our knowledge, satisfaction with the operations at mega events and mass gatherings has not, to date, been thoroughly analyzed, nor is it well understood (Kassens-Noor et al. 2015). Therefore, our contribution is twofold: (i) we are adding to the growing literature on customer satisfaction in the service industry and (ii) we are providing initial insights into visitor satisfaction with operations at mega events. In particular, we provide evidence for cultural/regional and gender differences in pilgrims’ satisfaction due to waiting times (as a function of crowding) during Hajj.

Fig. 5

Pilgrim Tent City 2016, taken from Haase et al. (2019)

We expect that reducing pilgrims’ waiting time will increase their satisfaction with the service. However, some pilgrim groups might react more sensitively to waiting time than others. In addition to shortening the waiting time for all pilgrims by optimizing the planning process, another approach would be to give priority to user groups that are particularly sensitive to long waiting times. Such positive discrimination could take the form of the choice of location for their campsite or of influencing the flow of people at the stops (Koch 2019a). This would reduce their waiting time, so that overall user satisfaction could be improved through operational measures. Pilgrims are already organized in establishments (pilgrim organizations covering a larger geographical region) and accommodated in camps according to their place of origin (Fig. 5). These establishments are further subdivided into service offices, each responsible for up to 5000 pilgrims. The service offices submit their pilgrim groups’ preferred time slots for conducting the rituals. An algorithm then schedules departure times at metro stations according to the pilgrims’ preferences while ensuring that critical crowding levels are not exceeded at certain locations (Haase et al. 2016, 2019).

Fig. 6

Utilization of the Jamarat Bridge’s 4th level in terms of preferences, schedules, and video counting data during Hajj 2016 (Illustration and data taken from Haase et al. (2019)). The horizontal axis refers to 30 min intervals from the 10th to 13th day of Dhu al-Hijjah. Peak periods are displayed by the red shaded areas

Fig. 6 shows the utilization of the 4th level of the Jamarat Bridge. The 4th level is reserved exclusively for metro users, so its utilization is directly related to that of the metro. The figure shows that transportation demand varies significantly throughout the day. Most pilgrims prefer to perform the rituals at the same times as, according to tradition, the Prophet Muhammad did. Peak periods are the periods in which most pilgrims are expected to conduct the stoning rituals. To avoid crowd disasters, none of the registered pilgrims are scheduled to perform the rituals during peak periods (solid red line). However, the actual pilgrim counts suggest low schedule compliance, especially during peak periods. Knowing the waiting time sensitivities of different user groups thus has the potential to improve overall user satisfaction by adjusting the scheduling procedures accordingly. For this reason, we aim to identify user groups that are more sensitive to waiting time than others by applying an ordered logit model to user satisfaction data.

The remainder of this paper is organized as follows. First, we summarize some of the existing literature on customer satisfaction and waiting times. Based on prior findings, we derive hypotheses regarding the general satisfaction and waiting time sensitivities of different groups of Makkah Metro users. We then briefly explain the ordinal logit model before introducing our unique data from 2012. After presenting and discussing our results, we conclude.

2 Waiting time and user satisfaction

2.1 Literature review

Kotler and Keller (2006), similarly to Oliver (1980), define customer satisfaction as “a person’s feelings of pleasure or disappointment resulting from comparing a product’s perceived performance (or outcome) in relation to his or her expectations”. According to this definition, satisfaction is a function of both the perceived and the expected quality of a product or service. The larger the gap between the two, the greater the users’ satisfaction or dissatisfaction, depending on whether their expectations are exceeded or not met.

Parasuraman et al. (1988) operationalized service quality as a measure of the difference between customer expectations and the perceived level of service. The authors view perceived service quality as a global attitude towards the service provided by a company, whereas user satisfaction relates to specific transactions. Thus, a customer may be satisfied with a service while considering the service company to be of low quality. For example, the German railway operator “Deutsche Bahn” is considered unreliable by many passengers, but they might still be satisfied with a specific journey if everything went smoothly. Service quality nevertheless matters, for example, for whether customers recommend a service company to their friends and family. Models such as SERVQUAL (Parasuraman et al. 1988) or the “Customer Service Quality Index” (Hensher 2015) exist to measure a firm’s service quality. However, we emphasize that our survey data center on waiting times and customer satisfaction and are not suitable as input to a typical service quality model.

Waiting time is not exclusively an issue for the public transport sector. The study by Davis and Maggard (1990) revealed that waiting time at a fast-food restaurant negatively influenced customer satisfaction with the service. Interestingly, the pre-order waiting time was perceived as worse than the order-processing time. This is, to some extent, in line with the above-mentioned finding that out-of-vehicle travel time is perceived to be worse than in-vehicle travel time. One could argue that customers face a reduced level of uncertainty about the service successfully satisfying their needs once they have entered the vehicle or ordered their food, respectively. Several other studies across many service industries support the notion that waiting time has a negative impact on customer satisfaction (Davis and Vollmann 1990; De Vries et al. 2018; Dube-Rioux et al. 1989; Hensley and Sulek 2007; Lee and Lambert 2006; Li 2010; Pruyn and Smidts 1998; Tom and Lucey 1995).

The effects of waiting time on user satisfaction in transport services have been investigated, for example, by Taylor (1994), who provides empirical evidence that flight delays, as well as their magnitude, i.e., the length of the additional waiting time at the airport terminal, negatively affect passengers’ evaluation of the service. More recently, Feng et al. (2016) found that the satisfaction of bus users in a Chinese province decreases exponentially with their perceived waiting time. Allen et al. (2018) studied how different user characteristics and service attributes of public transport impact overall satisfaction as well as satisfaction with ten different service dimensions such as frequency, safety, and convenience. They found that perceived waiting time negatively impacts all satisfaction constructs. dell’Olio et al. (2010) investigated customers’ perception of the quality of the public transport system in Santander, a medium-sized Spanish city. They applied ordered probit models to analyze which factors contribute most to customers’ opinions of the service quality. Their study revealed that service reliability and waiting time had the greatest influence on the respondents’ perceptions of the service.

To our knowledge, literature dealing with the way in which passengers’ socio-economic characteristics interact with waiting time in transportation is scarce. dell’Olio et al. (2011) studied the desired service quality of public transport users and potential users in the city of Santander. Their study thus tackles the customers’ expectations and not their perceptions of the quality of public transport services. They designed a survey to collect stated preference data from both bus users and potential bus users. In the surveys, the authors sought information about the following attributes: waiting time, vehicle occupancy, cleanliness, journey time, comfort during the journey and driver kindness. In addition, they collected data regarding the respondents’ age, gender, income level, and the frequency of use. The interactions of the users’ characteristics and the service attributes were analyzed with several multinomial logit model specifications. It transpired that waiting time, comfort, and cleanliness are the most important factors for all respondents, irrespective of their characteristics. Potential users valued vehicle occupancy, waiting time and journey time most, but they did not care much about cleanliness and comfort. With regard to waiting time interactions, the authors observed only a few effects: (1) women were slightly less sensitive to waiting time than men, but the coefficient was not significant; (2) sensitivity increases with household income, i.e., higher income leads to higher valuations of time; (3) frequent users are less sensitive than sporadic ones. Their main justification for (3) is that high-frequency users have more knowledge about the time schedules, so they can optimize their waiting times.

2.2 Hypotheses regarding user satisfaction for the Makkah Metro

Hypothesis 1:

The impact of waiting time on user satisfaction is independent of the users’ age. dell’Olio et al. (2011) found that the contribution of waiting time to the utility function does not vary with the users’ age. We assume this will also be true for pilgrims using the Makkah Metro.

Hypothesis 2:

Women are more sensitive to waiting time than men. This hypothesis cannot be derived directly from the literature. In fact, it contradicts the findings of dell’Olio et al. (2011), who found men to be slightly more sensitive to waiting time than women. However, the case of the Makkah Metro is somewhat different, as people often have to wait in large and dense crowds. Women are around four times more likely than men to be diagnosed with agoraphobia (Bekker 1996). Furthermore, previous research suggests that women react more sensitively than men to crowded situations in public transport (Soza-Parra et al. 2019; Tirachini et al. 2017). According to Tirachini et al. (2017), people are willing to accept longer travel times in exchange for less crowded conditions during their journey, and women feel less comfortable than men as vehicle occupancy increases. Although these studies are concerned with crowding inside vehicles, a similar rationale may apply to waiting in or outside a station. Moreover, Fan et al. (2016) investigated the relationship between perceived and actual waiting times at transit stations. While they did not conclude that women generally perceive waiting times to be longer than men do (although a weak relationship was found), there was evidence that this is the case when passengers feel that they are in an unsafe environment. Ultimately, waiting time sensitivities per se may not differ between men and women, but differences in the data may be induced by differences in the discomfort associated with time spent waiting in a dense crowd.

Hypothesis 3:

Both the general satisfaction level and the sensitivity towards waiting time differ depending on the pilgrims’ countries of origin. We expect that the pilgrims’ expectations of the service level are partially driven by their previous experiences with public transport services in their home countries. Since user satisfaction is a function of the expected and perceived service quality, these different expectations should result in different satisfaction levels. Furthermore, cultural differences may lead to different sensitivities towards waiting time since, according to Hall (1989), culture affects whether a person perceives a period of time as “short” or “long”. This theory is also supported by Rose et al. (2003), who found that download times affect user satisfaction differently depending on the respondents’ home country. They also reported significant differences in perceived waiting times between cultures. The authors performed an experiment with people from four countries: Egypt, Peru, the USA, and Finland. The countries were classified as “monochronic” (USA, Finland) or “polychronic” (Egypt, Peru) cultures. On average, the perceived download times in the polychronic cultures were 25% longer than in the monochronic cultures. Surprisingly, although they perceived the download times to be longer, the subjects from Egypt and Peru had a more positive attitude towards the delay than those from the USA and Finland. Findings that support Hypothesis 3 could inform operational measures to improve the satisfaction of pilgrims from certain countries of origin, as pilgrims are accommodated according to their countries of origin and the location of the campsites can affect waiting times at the metro stations (Haase et al. 2016; Koch 2019a).

Hypothesis 4:

We assume that the pilgrims’ prior Hajj experience, i.e., the number of times they have participated before, is negatively correlated with their sensitivity towards waiting time. Pilgrims who have completed the Hajj before should have a more accurate idea of what to expect in terms of waiting times and overall service levels, which might soften their judgement of long waiting times during the current Hajj. Moreover, pilgrims who completed the Hajj before the metro was introduced may be used to far more congestion, so their opinion of the service might be positively affected by the overall improvements achieved through the metro. This would also be in line with initial findings from 2010, where Hajj experience and pilgrims’ satisfaction with metro operations were positively correlated (Kaysi et al. 2013). Kaysi et al. (2013) analyzed pilgrims’ satisfaction with the Makkah Metro in 2010, when the system was operating for the very first year (at only 30% of its total capacity). Based on descriptive statistics, they suggested that age and Hajj experience impact pilgrims’ satisfaction with the metro service.

3 Ordinal logit model

The central question of the survey is “How satisfied are you with the work of the Makkah Metro service?”. The answers to that question are of central importance if the operator is to avoid being replaced. The survey allowed for the following answers:

1. I am satisfied with the Makkah Metro service
2. My opinion of the Makkah Metro service is indifferent
3. I am dissatisfied with the Makkah Metro service

The key characteristic here is that the possible answers are ordered: the response “I am satisfied” is closer to “I am indifferent” than to “I am dissatisfied”. A standard (multinomial) logit model could be estimated in which each answer option is an alternative. However, its assumption of independent errors across alternatives is inconsistent with ordered alternatives, because ordered answers are more similar to one another the closer they are in the sequence (e.g. option 1 is more similar to option 2 than to option 3).

We assume that participants have an opinion on the work of the Makkah Metro service. This opinion is not observable by the analyst and is modeled as a random utility U: the greater U, the greater the participant’s approval of the Makkah Metro service. No matter how good or poor the participant considers the service to be, the question allows only three possible answers, and respondents choose an answer according to their individual level of U. If U is greater than a threshold \(\tau _2\), they choose “satisfied”; if U is less than \(\tau _2\) but greater than another threshold \(\tau _1\), they choose “indifferent”; and if U is less than \(\tau _1\), they choose “dissatisfied”.
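The threshold rule just described can be written as a small function. In the Python sketch below, the threshold values in the example are hypothetical; ties at exactly \(\tau _1\) or \(\tau _2\) occur with probability zero for a continuous U and are assigned to the lower category here by convention.

```python
def response_from_utility(u: float, tau1: float, tau2: float) -> str:
    """Map a latent utility U to one of the three ordered survey answers."""
    if not tau1 < tau2:
        raise ValueError("thresholds must satisfy tau1 < tau2")
    if u > tau2:
        return "satisfied"
    if u > tau1:
        return "indifferent"
    return "dissatisfied"


# Hypothetical thresholds tau1 = 0.0 and tau2 = 1.0
for u in (-0.5, 0.4, 1.7):
    print(u, response_from_utility(u, tau1=0.0, tau2=1.0))
```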

U is composed of two parts: observed and unobserved factors.

$$\begin{aligned} U=\beta 'x + \epsilon \end{aligned}$$
(1)

The observed factors x have factor loadings \(\beta\), which need to be estimated. The unobserved factors \(\epsilon\) are random and summarize all information that is not included in the specified model. Once the distribution of \(\epsilon\) is specified, the probability of each answer can be calculated in closed form. For simplicity, \(\epsilon\) is assumed to follow a standard logistic distribution, so that its cumulative distribution function is

$$\begin{aligned} F(\epsilon )=\frac{e^\epsilon }{1+e^\epsilon }. \end{aligned}$$
(2)

Thus, the participant’s choice probabilities are calculated as follows:

$$\begin{aligned} P({\text {satisfied}})&= P(U>\tau _2) \nonumber \\&= P(\beta 'x + \epsilon> \tau _2) \nonumber \\&= P(\epsilon > \tau _2 - \beta 'x) \nonumber \\&= 1 - \frac{e^{\tau _2-\beta 'x}}{1+e^{\tau _2-\beta 'x}} \end{aligned}$$
(3)
$$\begin{aligned} P(\text {indifferent})&= P (\tau _1< U< \tau _2) \nonumber \\&= P (\tau _1< \beta 'x + \epsilon< \tau _2) \nonumber \\&= P(\tau _1 - \beta 'x< \epsilon< \tau _2 - \beta 'x) \nonumber \\&= P(\epsilon< \tau _2 - \beta 'x) - P( \epsilon < \tau _1 - \beta 'x) \nonumber \\&= \frac{e^{\tau _2-\beta 'x}}{1+e^{\tau _2-\beta 'x}} - \frac{e^{\tau _1-\beta 'x}}{1+e^{\tau _1-\beta 'x}} \end{aligned}$$
(4)
$$\begin{aligned} P(\text {dissatisfied})&= P(U<\tau _1) \nonumber \\&= P(\beta 'x + \epsilon< \tau _1) \nonumber \\&= P(\epsilon < \tau _1 - \beta 'x) \nonumber \\&= \frac{e^{\tau _1-\beta 'x}}{1+e^{\tau _1-\beta 'x}} \end{aligned}$$
(5)
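Equations (2)–(5) translate directly into code. The Python sketch below computes the three choice probabilities for a given deterministic utility \(\beta 'x\) and given thresholds; the numbers in the example are hypothetical and do not correspond to any estimate in this paper. The logistic CDF is written in a numerically stable form that avoids overflow for large arguments.

```python
import math


def logistic_cdf(z: float) -> float:
    """F(z) = e^z / (1 + e^z) from Eq. (2), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def ordered_logit_probs(xb: float, tau1: float, tau2: float) -> dict:
    """Choice probabilities from Eqs. (3)-(5) for deterministic utility xb = beta'x."""
    p_dissatisfied = logistic_cdf(tau1 - xb)                           # Eq. (5)
    p_indifferent = logistic_cdf(tau2 - xb) - logistic_cdf(tau1 - xb)  # Eq. (4)
    p_satisfied = 1.0 - logistic_cdf(tau2 - xb)                        # Eq. (3)
    return {
        "dissatisfied": p_dissatisfied,
        "indifferent": p_indifferent,
        "satisfied": p_satisfied,
    }


# Hypothetical values: beta'x = 0.5, tau1 = -1.0, tau2 = 1.0
probs = ordered_logit_probs(xb=0.5, tau1=-1.0, tau2=1.0)
print({k: round(v, 3) for k, v in probs.items()})
print(sum(probs.values()))  # the three probabilities sum to 1 (up to rounding)
```

Increasing \(\beta 'x\) with fixed thresholds moves probability mass from “dissatisfied” towards “satisfied”.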

The factor loadings \(\beta\) and the thresholds \(\tau\) are then estimated by maximum likelihood. The model is called ordered logit because it combines logistically distributed errors with ordered alternatives (Train 2009, pp. 159–162).
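To make the estimation step concrete, the Python sketch below evaluates the ordered logit log-likelihood for responses coded 1 = dissatisfied, 2 = indifferent, 3 = satisfied, as in the data section. It is illustrative only: in this study the models were estimated with the Apollo package in R, and in practice a function like this would be handed to a numerical optimizer. Because the thresholds absorb any constant, x should not contain an intercept; the tiny data set is hypothetical.

```python
import math
from typing import Sequence


def logistic_cdf(z: float) -> float:
    """Standard logistic CDF, Eq. (2); returns 0.0 and 1.0 at -inf and +inf."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(
    beta: Sequence[float],
    tau1: float,
    tau2: float,
    X: Sequence[Sequence[float]],
    y: Sequence[int],
) -> float:
    """Ordered logit log-likelihood; y uses 1 = dissatisfied, 2 = indifferent, 3 = satisfied.

    X must not contain a constant column, since the thresholds play that role.
    """
    if not tau1 < tau2:
        return -math.inf  # thresholds must be ordered
    lower = {1: -math.inf, 2: tau1, 3: tau2}
    upper = {1: tau1, 2: tau2, 3: math.inf}
    ll = 0.0
    for x, yi in zip(X, y):
        xb = sum(b * xj for b, xj in zip(beta, x))
        # P(y = yi) = F(upper - beta'x) - F(lower - beta'x), cf. Eqs. (3)-(5)
        p = logistic_cdf(upper[yi] - xb) - logistic_cdf(lower[yi] - xb)
        ll += math.log(p)
    return ll


# Tiny hypothetical data set with one regressor (e.g. a long-wait dummy)
X = [[0.0], [0.0], [1.0], [1.0]]
y = [3, 2, 2, 1]
print(log_likelihood([-1.0], tau1=-1.0, tau2=1.0, X=X, y=y))
```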

4 Data

In the 2012 “TÜV SÜD Rail GmbH” Makkah Metro user survey, a total of 10,463 pilgrims were interviewed. In 2012, there were 537,000 metro users in total. The survey was conducted between 24th and 28th of October 2012, i.e., between the 8th and 12th day of Dhu al-Hijjah. Pilgrims were interviewed either on the trains, at the metro stations in Mina and Muzdalifah, or at their campsites. The survey team was instructed to follow a random sampling scheme to obtain a representative sample.

Table 1 Descriptive statistics of all variables and classification of the categorical variables

In addition to their general satisfaction with the transport service, which is the dependent variable in our model, the pilgrims were also asked about their age, gender, home country, Hajj experience, and maximum waiting time at the stations. Except for age and Hajj experience, all exogenous variables are treated as categorical variables and are dummy-coded. An overview of the categories, as well as descriptive statistics of all considered variables, is provided in Table 1. The “Fixed” column indicates which category we fixed as the reference category for each variable. Note that for any categorical variable with L levels, only \(L-1\) levels enter the utility function. The “Satisfaction” variable reflects the answers to the question about the respondents’ general satisfaction with the metro service. We coded the answers “I am dissatisfied” as \(\text {satisfaction}=1\), “I am indifferent” as \(\text {satisfaction}=2\), and “I am satisfied” as \(\text {satisfaction}=3\). The “Hajj” variable is an integer corresponding to the number of times the respondent has participated in the Hajj, including the present one (i.e., 1 = first participation, 2 = second participation, and so forth). In addition, Table 1 provides the corresponding population figures for the categorical variables (where available), obtained from the 2012 Hajj statistics (GaStat 2012). The sample distribution by age and gender is shown in Fig. 7. Around 90% of the respondents were 50 years old or younger.
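The coding conventions described above can be sketched as follows. The Python example assumes, for illustration only, the home-country groups mentioned in the results (Saudi Arabia, Gulf States, South Asia, Other Countries) with “Other Countries” as the reference level; Table 1 documents the categories and reference levels actually used.

```python
# Satisfaction coding as described in the text.
SATISFACTION_CODE = {"dissatisfied": 1, "indifferent": 2, "satisfied": 3}

# Hypothetical level list and reference category, for illustration only.
HOME_LEVELS = ["Saudi Arabia", "Gulf States", "South Asia", "Other Countries"]
HOME_REFERENCE = "Other Countries"


def dummy_code(value: str, levels: list, reference: str) -> dict:
    """Return L-1 binary indicators; the reference level is all zeros."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {lvl: int(value == lvl) for lvl in levels if lvl != reference}


print(dummy_code("South Asia", HOME_LEVELS, HOME_REFERENCE))
# prints {'Saudi Arabia': 0, 'Gulf States': 0, 'South Asia': 1}
print(dummy_code("Other Countries", HOME_LEVELS, HOME_REFERENCE))
# prints {'Saudi Arabia': 0, 'Gulf States': 0, 'South Asia': 0}
```

Dropping the reference level avoids perfect collinearity with the thresholds, and each remaining coefficient is interpreted relative to the reference group.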

Waiting times were not measured objectively; instead, respondents were asked how long they had to wait. Since this is not an exact measurement, the stated values should be interpreted with caution: an actual waiting time of 20 min may be perceived as 15 min by one person and as 25 min by another. We assume that this error approximately cancels out across the sample, but interpreting the waiting time in terms of the stated minutes is still of limited use. Therefore, the stated waiting times were grouped into categories that should rather be interpreted as “short”, “rather short”, “medium”, “rather long”, and “long”. Of course, the perceived waiting times may be biased due to endogeneity. More precisely, a customer who is dissatisfied for whatever reason might overstate the waiting time, so that the stated waiting time would partly reflect dissatisfaction rather than only explain it.
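The grouping of stated waiting times can be sketched as a simple binning function. The Python example below uses the interval bounds that appear in the results (0–15, 16–30, 31–45, 46–60, and more than 60 min) and pairs them in order with the verbal labels above; that pairing, and assigning fractional minutes by upper bound, are illustrative assumptions.

```python
# (upper bound in minutes, interval label, verbal interpretation)
# Interval bounds follow the intervals used in the results; pairing them with
# the verbal labels in order is an illustrative assumption.
WAIT_BINS = [
    (15, "0-15 min", "short"),
    (30, "16-30 min", "rather short"),
    (45, "31-45 min", "medium"),
    (60, "46-60 min", "rather long"),
]


def wait_category(minutes: float) -> tuple:
    """Map a stated waiting time in minutes to its interval and verbal label."""
    if minutes < 0:
        raise ValueError("waiting time cannot be negative")
    for upper, interval, label in WAIT_BINS:
        if minutes <= upper:
            return interval, label
    return ">60 min", "long"


for stated in (10, 20, 45, 75):
    print(stated, wait_category(stated))
```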

Fig. 7

Age distribution of the survey respondents

5 Results

5.1 Hypothesis testing

To test our hypotheses, we estimated an ordered logit model including all variables as well as all relevant variable interactions (Model 3). Beforehand, two initial models were run: (a) one considering only the waiting time intervals (Model 1, baseline) and (b) one additionally controlling for the users’ characteristics (Model 2). For validation purposes, we also estimated Model 3 assuming that the unobserved part of utility \(\epsilon\) is standard normally distributed (Model 4, ordered probit). To estimate the models, we used the Apollo package in R (Hess and Palma 2019a, b). Table 2 displays the estimation results of the four models, including all estimated attribute coefficients as well as the interactions relevant to our hypotheses. The \(\tau\) values represent the thresholds of user satisfaction. As expected, waiting time negatively affects the pilgrims’ utility in all models. However, the disutility of waiting does not consistently rise with the wait duration, as we observe that respondents in the third interval are more dissatisfied than those in the second interval for Models 1 and 2. Taking the interactions into account, this inconsistency applies to most of the waiting time coefficients in Model 3, presented in Fig. 8. The estimation results of Models 3 and 4 are fairly similar, apart from the fact that the respective coefficients are scaled differently.
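The different scaling of the logit and probit coefficients has a simple source: the standard logistic distribution has variance \(\pi ^2/3\), whereas the standard normal distribution has variance 1. The Python sketch below uses this variance-matching argument to put a logit coefficient on an approximate probit scale. It is only a rough conversion, since the two distributions differ in shape, and the coefficient in the example is hypothetical.

```python
import math
from statistics import NormalDist

# Standard deviation of the standard logistic distribution: sqrt(pi^2 / 3)
LOGIT_SD = math.pi / math.sqrt(3)  # approx. 1.814


def logit_to_probit_scale(beta_logit: float) -> float:
    """Approximate probit-scale counterpart of a logit coefficient."""
    return beta_logit / LOGIT_SD


def logistic_cdf(z: float) -> float:
    """Standard logistic CDF (adequate for the moderate arguments used here)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical logit coefficient
print(round(logit_to_probit_scale(-1.2), 3))

# After rescaling, the two CDFs are close but not identical
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(z, round(NormalDist().cdf(z), 3), round(logistic_cdf(LOGIT_SD * z), 3))
```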

Table 2 Estimation results of Models 1–4, \(N=\text {10,463}\) observations

The probability distributions of the satisfaction ratings for Model 3 as a function of \(\beta 'x\) are shown in Fig. 9. We also plotted the level of \(\beta 'x\) of the “average respondent” (Footnote 2) for the different waiting time intervals. Note that in the waiting time interval “16–30 min”, the probability of a “satisfied” rating is already only half as high as for an average respondent who had to wait 0–15 min, while the probability of a “dissatisfied” rating triples (except for pilgrims from other countries). Still, the average respondent is most likely to give a rating of either “indifferent” or “satisfied” in all waiting time intervals.

To test Hypothesis 1, we included the interaction of age and waiting time in Model 3 to determine how the effect of waiting on user satisfaction changes with age. User satisfaction decreases with increasing age in all waiting periods. However, the effect of age appears to be very small: the coefficient is significant only for pilgrims who had to wait 16–30 min, and even there it is very small in magnitude. Fig. 8 makes clear that the effect of age is negligible for all waiting time intervals. Thus, our results largely confirm the hypothesis that the contribution of waiting time to the utility function is independent of the passengers’ age. Additionally, we observe that the general positive effect of age in Model 3 counteracts its interactions with waiting time. This would explain the increase of \(\beta _\text {Age}\) in both magnitude and significance from Model 2 to Model 3.

Fig. 8

The contribution of waiting time to the utility function of Model 3 depending on Home Country, Gender, Age, and Hajj experience (all others constant); example calculation for the interaction of Home Country and the second waiting time interval: \(\beta _\text {wait}=(\beta _\text {wait1630} + \beta _\text {wait1630,Home Country} \times \text {Home Country}) \times \text {wait1630}\)

Fig. 9

Cumulative distribution functions of the response variable. The deterministic utility values and the respective choice probabilities of male pilgrims for the different countries of origin and waiting time intervals are represented by the vertical dashed lines. The continuous variables are set to their sample means, i.e., \(\text {Age}=34.07\) and \(\text {Hajj}=2.33\)

Hypothesis 2 states that women respond more sensitively to longer waiting times than men. To test this hypothesis, we examine the interactions of gender and the waiting time attributes, with “male” as the reference category. For instance, the coefficient \(\beta _{\text {female,wait1630}}\) reflects the additional effect on satisfaction of being a female rather than a male user when having to wait 16–30 min. A negative coefficient thus indicates that women are more sensitive to waiting time than men, and a positive coefficient indicates the opposite. The estimation results show that women are significantly more sensitive to waiting time than men in all waiting time intervals. Note also that the general negative effect of the “Female” variable in Model 2 is almost completely transferred to the interactions of waiting time and gender in Model 3, where the general effect is close to zero and far from significant. Hypothesis 2 is therefore clearly confirmed.
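The interaction logic can be sketched in code. Following the example formula in the caption of Fig. 8, the waiting time contribution for a respondent in interval \(k\) is \((\beta _{\text {wait},k} + \sum _j \beta _{\text {wait},k,j}\, z_j) \times \text {wait}_k\), where \(z_j\) are the interacted attributes (e.g., Female = 1 for women and 0 for the male reference). The sketch assumes that “0–15 min” is the reference interval; the labels `wait015`, `wait3145` and `wait4660`, the dictionary layout, and all coefficient values are hypothetical.

```python
from typing import Mapping

# Waiting-time dummies; wait015 (0-15 min) is assumed to be the reference category.
INTERVALS = ("wait015", "wait1630", "wait3145", "wait4660", "wait61")


def waiting_contribution(
    interval: str,
    covariates: Mapping[str, float],
    beta_wait: Mapping[str, float],
    beta_inter: Mapping[tuple[str, str], float],
) -> float:
    """Contribution of waiting time to beta'x for one respondent.

    For every non-reference interval k:
        (beta_wait[k] + sum_j beta_inter[(k, j)] * z_j) * wait_k
    where wait_k is 1 if the respondent waited in interval k, else 0.
    Missing interaction terms are treated as zero.
    """
    if interval not in INTERVALS:
        raise ValueError(f"unknown waiting time interval: {interval}")
    total = 0.0
    for k in INTERVALS[1:]:
        wait_k = 1.0 if k == interval else 0.0
        slope = beta_wait[k] + sum(
            beta_inter.get((k, name), 0.0) * z for name, z in covariates.items()
        )
        total += slope * wait_k
    return total


# Hypothetical coefficients: a negative female interaction means women are more sensitive.
beta_wait = {"wait1630": -0.5, "wait3145": -1.0, "wait4660": -1.5, "wait61": -2.0}
beta_inter = {("wait1630", "Female"): -0.3}

print(waiting_contribution("wait1630", {"Female": 1.0}, beta_wait, beta_inter))  # woman
print(waiting_contribution("wait1630", {"Female": 0.0}, beta_wait, beta_inter))  # man
```

With male as the reference, the female interaction simply shifts the slope of the waiting time dummy, so a woman who waited 16–30 min receives a more negative contribution than a man in the same interval.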

Some service offices already send their female and male pilgrims to the ritual sites in separate groups (Haase et al. 2019). One operational implication would thus be to improve women’s satisfaction by sending them to the metro when waiting times are short. However, most pilgrim groups are mixed. Changing this would require coordination between the organizers and the service offices and could also reduce satisfaction, as most pilgrims may not want to be separated.

In all four waiting time intervals studied, pilgrims from South Asia are the most sensitive to waiting compared with pilgrims from other countries. Sensitivities also vary across the different countries in all waiting time intervals; only the “46–60 min” interval shows no difference in sensitivity between pilgrims from Saudi Arabia and those from “Other Countries”. We therefore consider Hypothesis 3, i.e., that pilgrims from different regions have different waiting time sensitivities, to be confirmed. However, only the interaction coefficients for pilgrims from South Asia are significant, while those for Saudi Arabia and the Gulf States are not, except for the interaction of Gulf States and the interval “>60 min”. Surprisingly, the pilgrims from the Gulf States behave inconsistently: their dissatisfaction with waiting peaks in the “31–45 min” interval and decreases consistently thereafter. They are the only pilgrims who were more dissatisfied with waiting 46–60 min than with waiting more than 60 min.

Since pilgrims are guided to the metro separately in groups from the same geographic region, these findings suggest that user satisfaction could be increased by taking regional differences in waiting time sensitivity into account. Pilgrims from South Asia, for example, could be given priority regarding their arrival at the metro, i.e., they could be guided to the metro when waiting times are likely to be short.

In general, very long waiting times at the stations point to problems with the scheduling process or with how the schedule is communicated to the pilgrims. One explanation might be that the schedule is not sufficiently communicated or enforced.

Hypothesis 4 states that pilgrims with more Hajj experience are less sensitive to longer waiting times. Our results, however, show the opposite, so the fourth hypothesis cannot be supported: user satisfaction decreases with Hajj experience in all waiting time intervals. The negative effect of Hajj experience on user satisfaction observed in Model 2 becomes weaker in Model 3, in which the Hajj coefficient is barely significant. Since the metro was only introduced in 2010, it is unlikely that pilgrims in 2012 who had visited Makkah multiple times were already familiar with the metro. It is therefore difficult to conclude that the metro service level had decreased in comparison with previous years.

5.2 Scenario analysis

In this section, we manipulate the waiting time distribution in our sample according to predefined scenarios (policies) and use the results from Model 3 to predict the market shares of the satisfaction levels for each scenario. To obtain prediction intervals for the market shares, we use not only the point estimates from Model 3 but also 1000 draws from the coefficients’ asymptotic distribution.

We simply obtain the predicted market shares for the alternatives by sample enumeration (Train 2009, p. 31), i.e., averaging the predicted choice probabilities (3)–(5) over all observations:

$$\begin{aligned} S_i = \frac{1}{N} \sum _{n=1}^{N}P_{ni} \hspace{1.5cm}i \in \{\text {satisfied, indifferent, dissatisfied}\}, \end{aligned}$$
(6)

where \(S_i\) is the market share of alternative i and \(P_{ni}\) is the probability of individual n choosing alternative i. The waiting time distributions of five different scenarios and the prediction results are shown in Table 3. Scenario 0 refers to the original waiting time distribution of our data. A visualization of the results is presented in Fig. 10.
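Equation (6) and the simulated prediction intervals mentioned at the beginning of this section can be combined in a short sketch. It assumes an ordered logit form for the choice probabilities (3)–(5), a parameter vector laid out as \((\beta _1,\dots ,\beta _K,\mu _1,\mu _2)\) with an estimated covariance matrix, and a 95% interval taken from the 2.5% and 97.5% quantiles of the simulated shares. The function names, the parameter layout, and the choice to skip draws with unordered thresholds are assumptions, not necessarily the procedure behind Table 3.

```python
import numpy as np


def logistic(z):
    """Logistic CDF, written via tanh to avoid overflow for large |z|."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def market_shares(X, beta, mu1, mu2):
    """Eq. (6): average the ordered-logit probabilities over all N observations.

    Returns the shares in the order (satisfied, indifferent, dissatisfied).
    """
    xb = X @ beta
    f1 = logistic(mu1 - xb)  # P(dissatisfied)
    f2 = logistic(mu2 - xb)  # P(dissatisfied or indifferent)
    probs = np.column_stack([1.0 - f2, f2 - f1, f1])
    return probs.mean(axis=0)


def prediction_intervals(X, theta_hat, cov_hat, n_draws=1000, level=0.95, seed=0):
    """Point shares plus simulated intervals from theta ~ N(theta_hat, cov_hat).

    theta is assumed to be laid out as (beta_1, ..., beta_K, mu1, mu2).
    """
    k = X.shape[1]
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(theta_hat, cov_hat, size=n_draws)
    simulated = []
    for theta in draws:
        beta, mu1, mu2 = theta[:k], theta[k], theta[k + 1]
        if mu1 >= mu2:  # skip draws that violate the threshold ordering
            continue
        simulated.append(market_shares(X, beta, mu1, mu2))
    if not simulated:
        raise ValueError("no valid draws; check the covariance matrix")
    simulated = np.array(simulated)
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(simulated, alpha, axis=0)
    upper = np.quantile(simulated, 1.0 - alpha, axis=0)
    point = market_shares(X, theta_hat[:k], theta_hat[k], theta_hat[k + 1])
    return point, lower, upper
```

Because each simulated share vector is itself a sample enumeration, the intervals reflect only parameter uncertainty in the coefficients, with the observed covariates held fixed.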

Even for the relatively realistic scenarios 1 and 2, we observe a noticeable decline in the share of dissatisfied pilgrims, the majority of whom shift to a satisfied rating. In scenario 4, the additional satisfied pilgrims even result from reductions in both the indifferent and the dissatisfied shares. In all scenarios, the results are mainly driven by reducing and redistributing the pilgrims in the wait61 (>60 min) category. The scenarios are chosen arbitrarily, and the predictions only serve to illustrate how an overall improvement in waiting times at the stations could affect the pilgrims’ satisfaction with the transit service. Nevertheless, the outlined scenarios can serve as guidelines for decision makers to assess the return on service improvements (reduction of waiting times through adjusted scheduling) in terms of satisfaction levels.
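The redistribution of pilgrims out of the wait61 category can be illustrated as follows. This hypothetical sketch codes the five waiting time intervals as integers 0–4 (4 = “>60 min”), moves a chosen fraction of the respondents in interval 4 to another interval at random, keeps all other attributes fixed, and recomputes the market shares by sample enumeration as in Eq. (6), assuming an ordered logit form. The coefficients, thresholds, and the 50% shift are invented for illustration and do not correspond to any scenario in Table 3.

```python
import numpy as np


def logistic(z):
    """Logistic CDF via tanh (numerically stable)."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def enumerate_shares(xb, mu1, mu2):
    """Eq. (6) for an ordered logit: (satisfied, indifferent, dissatisfied) shares."""
    f1 = logistic(mu1 - xb)
    f2 = logistic(mu2 - xb)
    return np.array([np.mean(1.0 - f2), np.mean(f2 - f1), np.mean(f1)])


def shift_waits(interval, frm, to, fraction, rng):
    """Move a random `fraction` of respondents from interval code `frm` to `to`."""
    interval = interval.copy()
    candidates = np.flatnonzero(interval == frm)
    n_move = int(round(fraction * candidates.size))
    moved = rng.choice(candidates, size=n_move, replace=False)
    interval[moved] = to
    return interval


def scenario_shares(xb_other, interval, beta_wait, mu1, mu2):
    """Shares when beta'x = (non-waiting part) + coefficient of the wait interval."""
    return enumerate_shares(xb_other + beta_wait[interval], mu1, mu2)


# Hypothetical sample: non-waiting part of beta'x and interval codes 0-4.
rng = np.random.default_rng(42)
xb_other = rng.normal(size=1000)
interval = rng.integers(0, 5, size=1000)
beta_wait = np.array([0.0, -0.4, -0.8, -1.2, -1.6])  # 0-15 min is the reference

base = scenario_shares(xb_other, interval, beta_wait, -1.0, 1.0)
new_interval = shift_waits(interval, frm=4, to=0, fraction=0.5, rng=rng)
scenario = scenario_shares(xb_other, new_interval, beta_wait, -1.0, 1.0)
print("change in shares (sat, ind, dis):", scenario - base)
```

Only the moved respondents change their predicted probabilities, so the difference between the two share vectors isolates the effect of the redistributed waits.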

Table 3 Market share predictions and 95% prediction intervals for different waiting time distributions
Fig. 10

Visualization of the results presented in Table 3

Excessively long waiting times of more than 45 min could be avoided if all metro users adhered to the schedules provided to the service offices. The service offices provide guides to their pilgrim groups. As mentioned earlier, these service offices belong to establishments, which are essentially travel agencies licensed by Saudi Arabia to guide pilgrims during the Hajj. Metro ticket holders are identified by wristbands equipped with an RFID chip, and pilgrims may only pass the station gates if they wear a valid wristband. The chips also make it possible to count the number of pilgrims entering or exiting the stations. Adding information about which service office a ticket holder belongs to would help identify service offices that do not follow the specified schedules. Service offices that do not adequately comply with the schedules could then be penalized economically, for example, by having their license revoked.
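If gate reads were linked to service offices, a compliance check could be as simple as comparing each entry time with the office’s assigned departure slot. The sketch below is purely hypothetical: the record format (office ID, entry time), the slot representation, and the 20% tolerance are assumptions for illustration and do not describe the actual metro ticketing system.

```python
from collections import defaultdict
from datetime import datetime


def noncompliance_rates(entries, schedule):
    """Share of each service office's gate entries outside its assigned slot.

    entries:  iterable of (office_id, entry_time) pairs from RFID gate reads
    schedule: dict office_id -> (slot_start, slot_end)
    Both formats are hypothetical.
    """
    total = defaultdict(int)
    outside = defaultdict(int)
    for office, entry_time in entries:
        slot_start, slot_end = schedule[office]
        total[office] += 1
        if not slot_start <= entry_time <= slot_end:
            outside[office] += 1
    return {office: outside[office] / total[office] for office in total}


def flag_offices(rates, tolerance=0.2):
    """Offices whose share of off-schedule entries exceeds the tolerance."""
    return sorted(office for office, rate in rates.items() if rate > tolerance)


# Hypothetical example: both offices are assigned the slot 19:00-20:00.
slot = (datetime(2012, 10, 25, 19, 0), datetime(2012, 10, 25, 20, 0))
schedule = {"A": slot, "B": slot}
entries = [
    ("A", datetime(2012, 10, 25, 19, 30)),
    ("A", datetime(2012, 10, 25, 21, 15)),  # more than an hour after the slot opened
    ("B", datetime(2012, 10, 25, 19, 10)),
    ("B", datetime(2012, 10, 25, 19, 50)),
]
rates = noncompliance_rates(entries, schedule)
print(rates, flag_offices(rates))
```

Using a share rather than a count keeps large and small service offices comparable; the tolerance would be a policy choice for the organizers.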

In addition to manipulating the waiting times in the whole sample, we changed the waiting time distribution of pilgrims from South Asia and the Gulf States only. The aim is to show the impact of the different waiting time sensitivities on the satisfaction levels of these two segments. The sample distributions of waiting times as well as the general satisfaction of pilgrims from South Asia and the Gulf States differ considerably (Table 5, Scenario 0). To obtain comparable results, we proceed as follows. We change the waiting time distributions of both segments independently but in exactly the same way, as stated in Table 4. We then predict the market shares for South Asia and the Gulf States separately and compare them with the market shares under the original waiting time distributions in the sample.
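The segment comparison can be sketched by applying the same change in interval proportions to two segments that differ only in their waiting time coefficients (for instance, after adding the South Asia or Gulf States interactions). To keep the example short, it makes a simplifying assumption not stated in the text: each respondent’s interval is reassigned independently of the other covariates, so the segment’s share of rating \(i\) becomes \(\sum _k p_k \frac{1}{N}\sum _n P_{ni}(k)\), again under an assumed ordered logit form. All coefficients and the change vector are hypothetical stand-ins for Table 4.

```python
import numpy as np


def logistic(z):
    """Logistic CDF via tanh (numerically stable)."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def shares_for_distribution(xb_other, beta_wait_seg, p, mu1, mu2):
    """Segment shares (satisfied, indifferent, dissatisfied) for interval distribution p.

    Simplifying assumption: waiting time is reassigned independently of the other
    covariates, so S_i = sum_k p_k * mean_n P_ni(interval k).
    """
    p = np.asarray(p, dtype=float)
    if np.any(p < 0) or abs(p.sum() - 1.0) > 1e-9:
        raise ValueError("p must be a probability distribution over the intervals")
    shares = np.zeros(3)
    for k, p_k in enumerate(p):
        xb = xb_other + beta_wait_seg[k]
        f1 = logistic(mu1 - xb)
        f2 = logistic(mu2 - xb)
        shares += p_k * np.array([np.mean(1.0 - f2), np.mean(f2 - f1), np.mean(f1)])
    return shares


def segment_change(xb_other, beta_wait_seg, p_old, delta_p, mu1, mu2):
    """Change in shares (Sc. 5 minus Sc. 0) when delta_p is added to p_old."""
    p_new = np.asarray(p_old, dtype=float) + np.asarray(delta_p, dtype=float)
    new = shares_for_distribution(xb_other, beta_wait_seg, p_new, mu1, mu2)
    old = shares_for_distribution(xb_other, beta_wait_seg, p_old, mu1, mu2)
    return new - old


# Hypothetical segments: identical covariates, different waiting time sensitivities.
rng = np.random.default_rng(7)
xb_other = rng.normal(size=500)
p_old = [0.2, 0.2, 0.2, 0.2, 0.2]
delta_p = [0.1, 0.0, 0.0, 0.0, -0.1]  # shift 10 points from >60 min to 0-15 min
sensitive = np.array([0.0, -0.6, -1.2, -1.8, -2.4])
less_sensitive = np.array([0.0, -0.2, -0.4, -0.6, -0.8])
print(segment_change(xb_other, sensitive, p_old, delta_p, -1.0, 1.0))
print(segment_change(xb_other, less_sensitive, p_old, delta_p, -1.0, 1.0))
```

Holding the change vector fixed across segments means that any difference in the resulting share changes comes from the segment-specific waiting time coefficients alone.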

The results presented in Table 5 and Fig. 11 clearly show a difference in the segments’ sensitivities towards waiting time. The satisfaction of South Asian pilgrims reacts much more sensitively to an improvement in waiting times than that of pilgrims from the Gulf States, as already suggested by Fig. 8. Again, scenario 0 refers to the original waiting time distributions within the segments of South Asia and the Gulf States, respectively, and scenario 5 refers to the waiting time distributions after the changes from Table 4 have been applied.

Table 4 Changes of the waiting time distribution applied to pilgrims from South Asia and Gulf States independently
Table 5 Market share predictions for the segments South Asia and Gulf States corresponding to the original waiting time distribution in the sample (Sc. 0) and to the changes stated in Table 4 (Sc. 5)
Fig. 11

Visualization of the results presented in Table 5

6 Conclusion

Our results can be used to develop ways to increase user satisfaction. They support Hypotheses 1–3, i.e.,

  1. waiting time sensitivities do not vary with the pilgrims’ age,

  2. female pilgrims are more sensitive to long waits than men, and

  3. waiting time sensitivities differ according to the pilgrims’ home countries.

On the other hand, we cannot conclude that pilgrims with more Hajj experience are less sensitive to waiting time than those with less experience; in fact, the opposite seems to be the case. Hypothesis 4, however, is not suited to deriving operational implications in any case; rather, it provides information on how the service has improved or deteriorated in comparison with previous years. The dissatisfaction of more experienced pilgrims with waiting time, as well as their general dissatisfaction, may also stem from factors such as rising prices or larger numbers of participants compared with previous years, or could be related to an “everything was better in the old days” attitude.

Based on our findings, user satisfaction could be improved if pilgrims were led to the metro separately by country of origin. Pilgrims from South Asia, who are more sensitive to long waiting times than other pilgrims, could then be treated preferentially, i.e., led to the metro when waiting times are expected to be short. Directing pilgrims to the metro separately by country of origin is comparatively easy to achieve, since they are already accommodated in groups according to their home countries (Fig. 5; Footnote 3). More generally, our results support the hypothesis that culture has an impact on satisfaction with waits, as suggested in previous literature (Chung et al. 2015, 2016; Hall 1989; Pàmies et al. 2016; Rose et al. 2003).

Another main finding is that women are clearly more sensitive to waiting time than men. This contradicts previous studies, which found no direct relationship. As discussed in the literature review, there might not be a direct effect in our case either. Instead, the observed difference in waiting time sensitivities could be due to factors such as crowding discomfort, since there is already evidence that women are more sensitive to crowding. Nevertheless, crowding levels and waiting times at the stations are closely related; therefore, guiding female pilgrims to the metro when waiting times are likely to be short has the potential to improve overall user satisfaction. Some pilgrims might not want to perform the rituals separately from the rest of their travel group, but if they choose to do so, female pilgrims could be given priority for shorter waiting times.

Separating the pilgrims by age, on the other hand, seems unnecessary, as the effect of age on waiting time sensitivities is negligible. It should be mentioned, however, that other factors, such as comfort, an adequate supply of drinking water during waits, and station cleanliness, may affect user satisfaction. These factors are not taken into account in this study.

The scenario analysis suggests that a reasonable reduction of 8 %-points in the proportion of pilgrims waiting longer than 60 min could already lead to a small but noticeable improvement in overall user satisfaction: the share of dissatisfied pilgrims would be reduced by 2–4 %-points, while the shares of indifferent and satisfied pilgrims would each increase by 1–3 %-points. We believe that waiting times of more than 60 min are due to inadequate communication and/or enforcement of the departure schedule for pilgrim groups. In addition to better communication of the schedule, communicating the expected waiting times at the stations to the pilgrims, especially during peak periods, might increase their schedule compliance.