Open Access
Article

Transportation, Volume 40, Issue 2, pp 413-430

Representing and estimating interactions between activities in a need-based model of activity generation

Authors

  • Linda Nijland
    • Urban and Regional Research Centre Utrecht, Faculty of Geosciences, Utrecht University
  • Theo Arentze
    • Urban Planning Group, Faculty of the Built Environment, Eindhoven University of Technology
  • Harry Timmermans
    • Urban Planning Group, Faculty of the Built Environment, Eindhoven University of Technology

DOI: 10.1007/s11116-012-9423-8

Abstract

Although several activity-based models have made the transition to practice in recent years, dynamic activity generation and, especially, the mechanisms underlying activity generation are not well incorporated in current activity-based models. For instance, current models assume that activities are independent, but to the extent that different activities fulfill the same underlying needs and act as partial substitutes, their interactions/dependencies should be taken into account. For example, recreational, leisure, and social activities tend to be partly substitutable since they satisfy a common need of relaxation, and when undertaken together with others, social needs will be satisfied as well. This paper describes the parameter estimation of a need-based activity generation model, which includes the representation of possible interaction effects between activities. A survey was carried out to collect activity data for a typical week and a specific day among a sample of individuals. The diary data contain detailed information on activity history and future planning. Estimation of the model involves a range of shopping, social, leisure, and sports activities as dependent variables, and socioeconomic, day preference, and interaction variables as explanatory variables. The results show that several person, household, and dwelling attributes influence activity-episode timing decisions in a longitudinal time frame and, thus, the frequency and day choice of conducting the social, leisure, and sports activities. Furthermore, interactions were found in the sense that several activities influence the need for other activities and some activities affect the utility of conducting another activity on the same day.

Keywords

Activity-based modeling; Dynamic activity generation; Travel-demand modeling; Needs; Bayesian estimation

Introduction

Considerable progress has been made in development and application of activity-based models over the last decade. Examples of fully operational models are CEMDAP (Bhat et al. 2004), Albatross (Arentze and Timmermans 2004), Famos (Pendyala et al. 2005), and TASHA (Roorda et al. 2008). Currently, the models are making the transition to practice where they find application as instruments for planning support and policy evaluation. However, there is still ample room for improvement. High on the research agenda are the generation of activities based on the needs they satisfy or induce, interactions between activities, scheduling at the household level, and activity scheduling for a multi-day period.

In the existing activity-based models, mechanisms underlying activity generation are still poorly understood and not well represented, as argued by Roorda et al. (2008) and Habib and Miller (2008). Chapin (1974) was the first to argue that daily activities of individuals are driven by basic needs and that this concept should be the basis of activity-based approaches. About three decades later, this notion was further emphasized by Miller (2004) and Axhausen (2006). Maslow’s hierarchy of needs was suggested by Miller (2004) as a framework for modeling short- and long-term household-based decision making, but to the best of our knowledge this was not implemented in the models of his group. Meister et al. (2005) and Märki et al. (2011) also implemented needs, to some extent, in their models of activity scheduling.

The influence of history and interactions between activities on activity patterns has thus far been largely ignored in activity generation models. Basically, existing activity-based models assume a set of activities, which are then related to travel decisions. Generally, activities have been assumed implicitly or explicitly to be independent (an exception is e.g., Bradley et al. (2010)). However, different activities may (partially) satisfy the same underlying needs (Arentze and Timmermans 2009). That is, activities may be (partial) substitutes in satisfying underlying needs. Especially social, recreational and leisure activities tend to be partly substitutable because they satisfy a common need of relaxation (Nijland et al. 2010). For example, both cycling and walking (as leisure activities, not as a way of travelling) will satisfy similar needs (e.g., physical exercise, fresh air). Those activities, when conducted together with others, will also contain an element of meeting other people and therefore will partly satisfy some general social needs as well. Technically, this reasoning implies that interactions between activities should be taken into account in dynamic activity-generating models.

From the point of view of transportation, substitution effects have been investigated between in-home and out-of-home activities (Yamamoto and Kitamura 1999; Meloni et al. 2004; Akar et al. 2011), as out-of-home activities generate trips (e.g., preparing a meal at home vs. going out for dinner). Also substitution effects in the context of the use of ICT and telecommuting were examined in the past (Salomon 1998; Mokhtarian et al. 2006). Lu and Pas (1999) included rules for interactions among activity types into their model by dividing the activities into four types, namely subsistence, maintenance, recreation, and other activities and making a distinction between in-home and out-of-home activities. Although the authors focused on the relationships between in-home and out-of-home activities, they discovered that the duration of out-of-home subsistence (work) decreases as the duration of out-of-home maintenance, recreation or other activities increases.

Empirical studies based on multi-week activity diary data support the hypothesis that activity generation should be considered in a dynamic framework. Bayarma et al. (2007) made use of six-week travel diary data in order to analyze multi-day travel behavior. They observed that the daily travel patterns of individuals are heterogeneous. Schönfelder (2006) used the same longitudinal dataset to measure the repetitiousness of travel behavior. Bhat et al. (2005) used a multivariate hazard model to analyze the length between successive participations of shopping, social, recreation, and personal business activities. They found different weekly rhythms for participating in those activities, except that the rhythm for shopping activities is less distinctive. Furthermore, the results showed that interactions between activities as well as strong day-of-the-week effects on inter-episode durations within these activities exist.

Arentze and Timmermans (2009) developed a theoretical framework based on the assumption that activities are driven by a limited and universal set of subjective needs at person and household level. Within this framework the needs grow autonomously over time according to a logistic curve with parameters depending on the nature of the need and characteristics of the individual and the household. The model predicts the timing and duration of activities in a longitudinal time frame taking into account time budget constraints, possible interactions between activities, and both household-level and person-level needs. The face validity of the suggested framework and modeling approach is supported by the results of numerical simulations, demonstrating the possibility of incorporating positive and negative substitution effects between activities and complex dynamic interactions between activities in general. In a follow-up study, based on this framework, the authors developed a RUM model and explored the extent to which the model can be estimated using existing one-day datasets (Arentze et al. 2011). Until now, however, their approach lacks a full empirical validation based on data specifically collected for that purpose.

The aim of the present paper is to test the suggested approach empirically and to estimate parameters of the supposed relationships using data specifically collected for that purpose. The paper describes the modeling approach and estimation results based on a survey, designed to model and predict the timing of activities with respect to underlying needs. The model focuses on social, leisure, and sports activities (as those activities are most likely to be substitutable), the school/work hours in a typical week and the activity history and agenda of a specific sampled day. Factors included in the survey and the model consist of socioeconomic and demographic variables, activity history (e.g., time elapsed since last performance), and available time for discretionary activities (i.e., the amount of time spent on work or education on a day). The survey was held among a sample of approximately 300 individuals through a web-based questionnaire.

First, we briefly summarize the RUM specification of the need-based concepts and model. This is followed by a description of the survey and the sample. The “Results” section reports the parameter estimates. The paper closes with a discussion of the main findings of the study and remaining problems for future research.

Need-based model

Basic model

In this section we will briefly outline a model for predicting the timing of activities in a multi-day time frame that is proposed in Arentze et al. (2011). The model is based on concepts from a more theoretical need-based model of activity generation, which we cited above, and has parameters that should be identifiable based on observed temporal patterns of activities. The model predicts a multi-day activity pattern agenda for a given person for a period of arbitrary length. Rather than solving some resource allocation optimization problem, the model assumes that individuals make activity-selection decisions on a daily basis. Although the need-based model is able to take into account interactions between activities and between persons (in a household context), the RUM model used for first estimation considered a more limited situation where an individual is faced with a decision to conduct an activity i on a current day d given that the last time the activity was conducted was on day s < d (this means that the time elapsed equals d−s days). The utility of conducting an activity of type i on a given day d is defined as:
$$ U_{nid} (s) = V_{1ni,d - s} + V_{2,nid} + \varepsilon_{1nis} + \varepsilon_{2nid} $$
(1)
where n is an index of individual, d is the current day, s is the day activity i was conducted the last time before d, V 1ni,d−s is the utility of satisfying the need for activity i built-up between s and d, V 2,nid is a (positive or negative) preference for conducting activity i on day d and ε 1nis and ε 2nid are error terms related to need build-up (ε 1) and day (ε 2).

The utility components can be interpreted as follows. The first term (V 1) represents the amount of the need that has been built up across the elapsed time and that will be satisfied if the activity is implemented. The second term (V 2) represents a base utility dependent on preferences for day d. Note that events that are not driven by needs, but rather take place on a certain fixed day, can be modeled as activities with zero need growth (V 1 = 0) and a relatively high utility for the day (V 2 ≫ 0) when the event is to take place.

Implied by the first term is that a need for an activity grows over elapsed time since day s. There are several functional forms conceivable for a need’s growth curve. The original model assumed a logistic growth function, but also suggests that under normal conditions need growth only moves in the area around the inflection point where the curve is approximately linear. To reduce the number of parameters, the RUM model, therefore, assumed a simple linear function here:
$$ V_{1nit} = \beta_{ni} t $$
(2)
where β ni is a growth rate and t is the length of the need growth period between s and d (t = d−s).
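As a minimal sketch of Eqs. (1) and (2), the utility of an activity can be computed from the elapsed time and a day preference. The function names and all numerical values below are hypothetical and chosen only for illustration; they are not estimates from the paper.

```python
def need_utility(beta, s, d):
    """Eq. (2): need built up linearly over t = d - s days."""
    if d <= s:
        raise ValueError("day d must come after the last activity day s")
    return beta * (d - s)


def activity_utility(beta, s, d, day_preference, eps_need=0.0, eps_day=0.0):
    """Eq. (1): need term plus day preference plus the two error terms."""
    return need_utility(beta, s, d) + day_preference + eps_need + eps_day


# Hypothetical values: need grows by 0.5 per day, last conducted on day 3.
print(activity_utility(beta=0.5, s=3, d=7, day_preference=1.0))   # 0.5 * 4 + 1.0 = 3.0

# A fixed-day event: zero need growth and a high day preference (illustrative).
print(activity_utility(beta=0.0, s=3, d=7, day_preference=10.0))  # 10.0
```

The second call mirrors the remark that events tied to a fixed day can be treated as activities with zero need growth and a high day utility.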

The above equations define the (history-dependent) utility of an activity. A decision heuristic that takes into account limited time-budgets states that an activity i should be conducted on day d if d is the earliest moment when the utility of the activity per unit time exceeds a threshold. The utility-of-time threshold imposes a constraint on activity generation and represents an individual’s scarcity of time. The smaller a time budget for activities, the larger the threshold needs to be. When the threshold is well adjusted, the rule leads to full use of available time (i.e., the budgets are exhausted). As the authors argue, the heuristic, even though it is simple, will lead, as a tendency, to patterns where the utility of activities across a multi-day period cannot be improved by a revision of activity timing decisions when thresholds are well-adjusted to existing time budgets.

As a first step in estimating the model, the existing RUM model leaves activity duration out of consideration. This means that the threshold is defined on the level of utility of the activity rather than utility per unit time. The decision rule then becomes: conduct the activity at the earliest moment when the following condition holds:
$$ U_{nid} (s) > u_{nd}^{o} $$
(3)
where \( u_{nd}^{o} \) represents a threshold for implementing activities on day d, given existing time demands on that day. Note that defined in this way, the need-growth parameter β for some activity will capture the time needed to overcome the threshold taking into account a (average) duration of that activity. For example, keeping everything else equal, the need-growth speed will be smaller, i.e., it takes longer to overcome the threshold, if the activity has a longer duration. The threshold is estimated based on observations of activity-participation decisions as a function of indicator variables of available time-budget (e.g., the work hours on the day considered).
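The decision rule in Eq. (3) can be sketched as a search for the first day on which utility exceeds the day's threshold. This is a deterministic illustration with both error terms set to zero; the function name, the weekly threshold pattern, and the parameter values are assumptions made for the example.

```python
def earliest_day(beta, s, day_pref, threshold, horizon=60):
    """Return the first day d > s with beta * (d - s) + day_pref(d) > threshold(d).

    day_pref and threshold are callables mapping a day index to a number.
    Error terms are omitted, so this is a deterministic illustration only.
    """
    for d in range(s + 1, s + horizon + 1):
        utility = beta * (d - s) + day_pref(d)
        if utility > threshold(d):
            return d
    return None  # threshold not exceeded within the search horizon


def weekly_threshold(d):
    # Hypothetical week: days 5 and 6 (mod 7) have more free time, so a lower threshold.
    return 2.0 if d % 7 in (5, 6) else 4.0


def no_preference(d):
    return 0.0


print(earliest_day(0.5, 0, no_preference, weekly_threshold))  # 5
print(earliest_day(0.5, 0, no_preference, lambda d: 4.0))      # 9
```

With a constant threshold the activity recurs after nine days; the lower threshold on free days pulls it forward to day 5, which illustrates how time budgets shift timing.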
Following a general approach in mixed-logit modeling, the choice model is derived from the assumption that ε 2 is normally distributed (ε 2 ~ N (0, σ)) and simulated, whereas the first error term, ε 1, is Gumbel distributed. Given this assumption, an ordered-logit framework of the following form can be derived from decision rule (3) (Arentze et al. 2011):
$$ P_{ni} (d|s) \, = \frac{{\exp [Z_{nid} (s)]}}{{1 + \exp [Z_{nid} (s)]}} - \frac{{\exp [\max_{k = s + 1}^{d - 1} [Z_{nik} (s)]]}}{{1 + \exp [\max_{k = s + 1}^{d - 1} [Z_{nik} (s)]]}} $$
(4)
where
$$ Z_{nid} (s) \equiv V_{1ni,d - s} + V_{2nid} + \varepsilon_{2nid} - u_{nd}^{o} $$
(5)
The first term on the right-hand-side of Eq. (4) defines the probability that the need for the activity has exceeded the threshold at day d and the second term represents the probability that the threshold has not been exceeded before this day. Thus, the equation defines the probability that day d is the earliest moment that the threshold is overcome. Note that the conditional probabilities sum up to one across days after s:
$$ \sum\nolimits_{d > s} {P_{ni} (d|s) = 1} $$
(6)

Although, in this equation, d goes to infinity, the cumulated probability will quickly approach one in the type of activities that are relevant here. Thus, P defines a choice probability distribution across days after s. In other words, the model predicts for a given activity and individual the probability of an interval time (t = d−s), thereby taking into account possible day-varying conditions related to day preferences and time budgets, in addition to need build-up rates. Note that the model determines whether or not an activity of a given type is conducted on a certain day; it leaves out of consideration whether this involves a single or multiple episodes of the activity on the same day.
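A short numerical sketch of Eqs. (4) to (6) follows. It takes a sequence of Z values for days s + 1, s + 2, … and returns the probability that each day is the first on which the threshold is exceeded. Clipping negative differences to zero when Z drops below an earlier maximum is an assumption of this sketch, as are the parameter values.

```python
import math


def logistic(z):
    """Numerically stable exp(z) / (1 + exp(z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def day_probabilities(z_values):
    """Eq. (4): z_values[k] is Z for day s + 1 + k; returns P(d|s) for each day."""
    probs = []
    running_max = -math.inf  # max of Z over earlier days (empty on day s + 1)
    for z in z_values:
        # Assumption: a day whose Z lies below the earlier maximum gets probability 0.
        probs.append(max(0.0, logistic(z) - logistic(running_max)))
        running_max = max(running_max, z)
    return probs


# Hypothetical setting: need grows 0.5 per day, constant threshold 3, no day effects.
z = [0.5 * t - 3.0 for t in range(1, 61)]
p = day_probabilities(z)
print(round(sum(p), 6))                          # close to 1, as in Eq. (6)
print(max(range(len(p)), key=p.__getitem__) + 1)  # most likely interval in days
```

Because the terms telescope, the probabilities sum to the logistic of the largest Z in the horizon, which approaches one as the need keeps growing.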

The model represents dynamics of activity-generation decisions that follow from the fact that needs take time to re-build, and preferences and time budgets for conducting the activity may differ from day to day. A preference or size of the time-budget for a certain day of the week generates secondary effects on probabilities for other days. Secondary effects emerge because a need for the activity needs time to rebuild after the activity has been conducted. A static model which does not incorporate need build-up time, is not able to represent secondary effects of day-preferences and time-budgets, and, hence, would make wrong inferences about intrinsic day preferences.

Interactions between activities

In this section we propose a way to specify the above framework such that it can take into account interactions between activities. Interactions between activities can run through the needs component, V 1, and through the day component, V 2. As for needs, interactions occur if one activity increases or decreases the need of another activity. An example is a shopping activity that partially satisfies a need for a social activity and partially satisfies a need for being out in the open air, etc. Interactions on the level of the day-utility component concern possible benefits of combining two activities on a same day. An example is that combining a social and shopping activity on a same tour saves travel time. In this paper, we propose the following way to incorporate these notions:
$$ V_{1nid} (s) = \sum\limits_{t = s + 1}^{d} {\left( {\beta_{ni} + \sum\nolimits_{j} {\delta_{ij} I_{ntj} } } \right)} - \sum\nolimits_{j} {\delta_{ji} } $$
(7)
$$ V_{2nid} = \alpha_{nid} + \sum\nolimits_{j} {\phi_{ij} I_{ndj} } $$
(8)
where n is an index of individual, β i is the size of daily increase of a need for activity i, as before, and δ ij is an increase in need of activity i caused by activity j, I tj  = 1 if activity j is conducted on day t and I tj  = 0, otherwise, α id is a preference for conducting activity i on day d and \( \phi_{ij} \) is the increase in utility of activity i when conducted on the same day as activity j. The last term on the right-hand side of Eq. (7) represents the notion that a need increase caused by an activity i for some other activity j reduces the utility of i with the same amount. Thus, the first term on the right-hand side of Eq. (7) represents the total need on a day d for the activity, depending on the history. Given the assumption that the existing need of an activity is fully satisfied when the activity is conducted, the total need for the activity is equal to a utility. The last term represents the total increase of needs for other activities caused by the activity. The need increase for other activities must be discounted, as it is a disutility.

The parameters β i , α id , u 0, δ ij , and \( \phi_{ij} \) are to be estimated from data. Parameters δ ij and \( \phi_{ij} \) are new and represent the two supposed forms of interaction. We impose no restrictions on the ranges for these two parameters. As for the need-based interaction terms, a value of δ ij  > 0 would represent a negative substitution effect (activity j increases the need for i, e.g., a sports activity will increase the need to rest) and δ ij  < 0 a positive substitution effect (activity j decreases the need for i, e.g., touring by bike will decrease the need for walking as a leisure activity). Furthermore, the substitution effects may be asymmetric in the sense that δ ij  ≠ δ ji . Although we do expect that the two parameters have the same sign, we do not restrict the search range for the parameter in a log-likelihood estimation. Similarly, for the day-based interaction terms, \( \phi_{ij} \) can take on a positive as well as a negative value and need not be symmetric for any pair ij. A negative value indicates that a negative preference exists to combine the activities on the same day and a positive value indicates that there is a positive preference for doing so. Note that the difference in effects of the δ and φ parameters can be identified in that δ represents a longer lasting effect across days and φ represents a short-term effect within a day on utilities of activities.
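The two interaction channels of Eqs. (7) and (8) can be illustrated with a small sketch. The dictionary-based data layout, the activity labels, and every parameter value are hypothetical; only the arithmetic follows the equations.

```python
def need_component(beta_i, delta_on_i, delta_from_i, history, s, d):
    """Eq. (7) for activity i.

    delta_on_i[j]   : delta_ij, change in the need for i caused by activity j
    delta_from_i[j] : delta_ji, change in the need for j caused by activity i
    history[t]      : set of activities conducted on day t (I_ntj = 1)
    """
    total = 0.0
    for t in range(s + 1, d + 1):
        total += beta_i + sum(delta_on_i.get(j, 0.0) for j in history.get(t, ()))
    # Needs that i creates for other activities are discounted as a disutility.
    return total - sum(delta_from_i.values())


def day_component(alpha_id, phi_i, same_day):
    """Eq. (8): day preference plus same-day combination effects phi_ij."""
    return alpha_id + sum(phi_i.get(j, 0.0) for j in same_day)


# Hypothetical case: i = walking; cycling on day 2 lowers the need for walking,
# and walking itself raises the need for rest.
v1 = need_component(
    beta_i=0.5,
    delta_on_i={"cycling": -0.3},
    delta_from_i={"rest": 0.2},
    history={2: {"cycling"}},
    s=0,
    d=4,
)
v2 = day_component(alpha_id=1.0, phi_i={"shopping": 0.4}, same_day={"shopping"})
print(round(v1, 6), round(v2, 6))  # 1.5 1.4
```

Here the need term is 4 × 0.5 − 0.3 − 0.2 = 1.5: the δ effect persists in the accumulated need across days, whereas the φ effect only applies on the day the two activities coincide.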

We use the following decomposition of parameters:
$$ \beta_{ni} = \beta_{i}^{0} + \sum\nolimits_{k} {\beta_{ik} X_{1nik} } $$
(9)
$$ u_{nd}^{\text{o}} = \mu_{d}^{0} + \sum\nolimits_{m} {\mu_{m} X_{2dm} } $$
(10)
where X 1, X 2 are sets of explanatory variables of activity needs (Eq. 9) and time budgets (Eq. 10), β 0 and μ 0 are base parameters, and β and μ are effect parameters to be estimated. For the α, δ, and φ parameters, on the other hand, we do not estimate effect parameters for reasons of parsimony (considering the degrees of freedom of the model). Finally, we use a mixed logit framework to estimate the scale σ i of the day-based error term ε 2i (ε 2i  ~ N (0, σ i )) for each activity i. Thus, the model takes into account that variance in utility caused by unobserved daily circumstances can differ between activities.
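The decompositions in Eqs. (9) and (10) are linear in the explanatory variables, as the following sketch shows. The variable names and coefficient values are invented for illustration and are not the paper's estimates.

```python
def growth_rate(beta0, beta_effects, x1):
    """Eq. (9): beta_ni = beta_i^0 + sum_k beta_ik * X_1nik."""
    return beta0 + sum(coef * x1.get(name, 0.0) for name, coef in beta_effects.items())


def day_threshold(mu0, mu_effects, x2):
    """Eq. (10): u_nd^o = mu_d^0 + sum_m mu_m * X_2dm."""
    return mu0 + sum(coef * x2.get(name, 0.0) for name, coef in mu_effects.items())


# Hypothetical coefficients: dummy-coded person attributes for the need growth rate,
# work hours (continuous) and car availability (dummy) for the threshold.
beta = growth_rate(0.4, {"female": 0.1, "age_60_plus": -0.05}, {"female": 1.0})
u0 = day_threshold(2.0, {"work_hours": 0.25, "car": -0.5}, {"work_hours": 8.0, "car": 1.0})
print(round(beta, 6), round(u0, 6))  # 0.5 3.5
```

In this toy setting a full working day raises the threshold, so more accumulated need is required before the activity is scheduled on that day.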

Model estimation

As expressed in Eq. (6), Eq. (4) defines a probability distribution across days d after s. Whether or not this form can be used to determine likelihoods of observations depends on the nature of the observations. In the survey conducted to estimate the model (see below), individuals recorded their activity agenda for a given day (d) and, in addition, for an exhaustive list of activities, the day the activity was last performed (s). In the case of such observations, we know that the activity has not been conducted in the time between s and d. According to the model, the probability that the activity has not been conducted in the period from s + 1 to d−1 is defined as:
$$ Q_{ni} (d|s) \, = 1 - \frac{{\exp [\max_{k = s + 1}^{d - 1} [Z_{nik} (s)]]}}{{1 + \exp [\max_{k = s + 1}^{d - 1} [Z_{nik} (s)]]}} $$
(11)

Therefore, the probability of observing i in the agenda for day d knowing that the activity has not been conducted until that day is given by:

$$ L_{ni} (1|d,s) = P_{ni} (d|s)/Q_{ni} (d|s) $$
(12)
\( L_{ni} (1|d,s) \) is the likelihood of observing activity i given observation day d and recalled last day s. This likelihood has the following property:
$$ L_{ni} (1|d,s) + L_{ni} (0|d,s) = 1 $$
(13)
where \( L_{ni} (0|d,s) \) is the likelihood of not observing activity i given observation day d and recalled last day s. The likelihood for a sample of observations can be defined as a function of the model’s parameters as follows:
$$ L(Y|\theta ) = \prod\nolimits_{n} {\prod\nolimits_{i} {L(y_{ni} |\theta )} } $$
(14)
where Y is a sample of individuals, θ is the set of parameters included in the model, and y ni is a binary variable indicating whether activity i is observed for individual n.
L(y ni |θ) is computed as a simulated likelihood so that, for each activity, the scale σ i of the day-based error term ε 2i can be estimated simultaneously with the other parameters θ, as follows:
$$ L(y_{ni} |\theta ,\sigma_{i} ) \approx (1/K)\sum\nolimits_{k}^{K} {L(y_{ni} |\theta ,\sigma_{i} ,{\rm E}_{2nik} )} $$
(15)
where \( L(y_{ni} |\theta ,\sigma_{i} ,{\rm E}_{2nik} ) \) is a likelihood as defined by Eq. (12), \( {\rm E}_{2nik} \) is a vector of drawn error terms across days in the observed interval (\( {\rm E}_{2nik} = ( \ldots ,\varepsilon_{2nikd} , \ldots ) \)), K is a pre-defined number of draws of this vector and \( L(y_{ni} |\theta ,\sigma_{i} ) \) is the simulated likelihood of the observation for individual n regarding activity i conditional on given settings of the parameters.
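The conditional likelihood of Eqs. (11) to (13) and its simulated version in Eq. (15) can be sketched as follows. The function names, the clipping of negative probabilities to zero, the guard for a zero denominator, and the example values are assumptions of this sketch rather than details given in the text.

```python
import math
import random


def logistic(z):
    """Numerically stable exp(z) / (1 + exp(z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def conditional_likelihood(z_values, observed):
    """Eqs. (11)-(13). z_values holds Z for days s + 1 .. d; the last entry is day d."""
    f_prev = logistic(max(z_values[:-1], default=-math.inf))
    p = max(0.0, logistic(z_values[-1]) - f_prev)  # Eq. (4), clipped at zero (assumption)
    q = 1.0 - f_prev                               # Eq. (11)
    l1 = p / q if q > 0.0 else 0.0                 # Eq. (12), guarded against q = 0
    return l1 if observed else 1.0 - l1            # Eq. (13)


def simulated_likelihood(v_values, sigma, observed, n_draws=100, seed=0):
    """Eq. (15): average the likelihood over n_draws vectors of day errors eps_2.

    v_values holds the deterministic part of Z (V1 + V2 - u0) for days s + 1 .. d.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        z = [v + rng.gauss(0.0, sigma) for v in v_values]
        total += conditional_likelihood(z, observed)
    return total / n_draws


# Hypothetical interval of three days with Z rising from -2 to 0.
v = [-2.0, -1.0, 0.0]
l_yes = simulated_likelihood(v, sigma=0.5, observed=True)
l_no = simulated_likelihood(v, sigma=0.5, observed=False)
print(round(l_yes + l_no, 6))  # 1.0, the property in Eq. (13)
```

Using the same seed for both calls reuses the same error draws, so the two simulated likelihoods sum to one up to rounding.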

The likelihood function (or log-likelihood function) appears to be non-smooth in the area of the optimum values of β parameters in particular. Furthermore, due to the dependency relationship between activity probabilities across days, i.e., the secondary effects, convergence of search processes for optimal parameter values in standard log-likelihood methods is very slow. To circumvent these problems, we used a Bayesian method of estimating parameters. Bayesian methods are known to be more robust, as they do not use a function maximization process (Rossi et al. 2005).

The Bayesian method we used for the present estimation task is based on the following equation:
$$ K(\theta_{i} |\bar{\theta }_{i - }^{n} ,\bar{\theta }_{i + }^{n - 1} ,Y_{n} ) = \frac{{L(y_{n} |\bar{\theta }_{i - }^{n} ,\theta_{i} ,\bar{\theta }_{i + }^{n - 1} )K(\theta_{i} |\bar{\theta }_{i - }^{n} ,\bar{\theta }_{i + }^{n - 1} ,Y_{n - 1} )}}{{\sum\nolimits_{\theta } {L(y_{n} |\bar{\theta }_{i - }^{n} ,\theta ,\bar{\theta }_{i + }^{n - 1} )K(\theta |\bar{\theta }_{i - }^{n} ,\bar{\theta }_{i + }^{n - 1} ,Y_{n - 1} )} }} $$
(16)
where θ i is the i th parameter of the model, K(θ i ) is either a posterior (LHS) or prior (RHS) probability distribution across values of parameter θ i , y n is the n th observation in the sample, Y n is the set of observations up to n (Y n  = y 1, …, y n ), \( \bar{\theta }_{i - } \) is a vector of expected values for parameters θ 1, θ 2, …, θ i−1, and \( \bar{\theta }_{i + } \) is a vector of expected values for parameters θ i+1, θ i+2, …, θ m (m = number of parameters of the model). Equation (16) describes an incremental Bayesian learning process. Initially, a uniform distribution across some predefined wide-enough range is assumed for each parameter of the model, reflecting the assumption that no prior knowledge about parameter values exists. Observations are processed one at a time in sequence y 1, y 2, …. For each observation the posterior distribution is determined one parameter at a time in sequence θ 1, θ 2, …, θ m using Eq. (16), whereby all other parameters are set to their current expected values (denoted as \( \bar{\theta } \)). The priors in each next case are set to the posteriors obtained from the last case. After all cases have been processed, the posterior distributions represent final estimates. Note that in this method each observation is used only once to update beliefs about the parameters.
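The incremental updating scheme of Eq. (16) can be sketched on a discrete grid of parameter values. The grid representation, the function names, and the toy Bernoulli likelihood used to exercise the code are assumptions; the paper's implementation (in C) and its likelihood are not reproduced here.

```python
def update_parameter(grid, prior, likelihood_at):
    """Eq. (16) for one parameter: posterior is proportional to likelihood times prior."""
    weights = [likelihood_at(v) * p for v, p in zip(grid, prior)]
    norm = sum(weights)
    if norm == 0.0:
        return list(prior)  # assumption: keep the prior if the likelihood vanishes everywhere
    return [w / norm for w in weights]


def expected_value(grid, dist):
    return sum(v * p for v, p in zip(grid, dist))


def incremental_bayes(observations, grids, likelihood):
    """Process each observation once, updating one parameter at a time.

    likelihood(y, params) returns L(y | params) for a list of parameter values.
    """
    dists = [[1.0 / len(g)] * len(g) for g in grids]  # uniform priors
    means = [expected_value(g, p) for g, p in zip(grids, dists)]
    for y in observations:
        for i, grid in enumerate(grids):
            def lik(value, i=i):
                # Parameters before i already hold updated means, later ones the old means.
                return likelihood(y, means[:i] + [value] + means[i + 1:])
            dists[i] = update_parameter(grid, dists[i], lik)
            means[i] = expected_value(grid, dists[i])
    return dists, means


# Toy check with one parameter: probability of a binary outcome (hypothetical data).
grid = [0.05 * k for k in range(1, 20)]
data = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]


def bernoulli_likelihood(y, params):
    return params[0] if y else 1.0 - params[0]


dists, means = incremental_bayes(data, [grid], bernoulli_likelihood)
print(round(means[0], 2))
```

With eight successes in ten observations the posterior mean settles near 0.75. With several parameters, the result can depend on the order of observations and of parameters, because each update conditions on the current expected values of the others.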

Design of the survey

Data had to be collected in order to estimate the parameters of the above model. The questionnaire developed for this purpose was administered through the internet to reduce respondent burden and shorten the data entry time. In total, 37 social, sports, leisure and service-related activities were included in the survey. The activities chosen for this questionnaire were based on the activities used in earlier activity diary surveys (e.g., Amadeus (Timmermans et al. 2002)). The questionnaire consisted of six different parts. For estimating the parameters we focus on four of them, namely:
  • Socio-economic and demographic variables: as person, household, and dwelling attributes, questions concerning e.g., gender, age, household composition, income, dwelling type, education level, number of children, age youngest child, living area, car availability, and driver’s license were added.

  • The activity pattern of the day before: subjects were asked to indicate which activities they conducted the day before they filled out the questionnaire including some characteristics of those activities (e.g., duration, travel time, planning time horizon, and accompanying persons)

  • History: The last time the activities were conducted; respondents had two ways to indicate this. First, they could specify the date, which could be selected with the help of a calendar. Second, they could indicate how many days, weeks or months ago they last performed the activity. A third option was n/a (not applicable) which could be marked if it was longer than 6 months ago or if they never do the activity. The history information was requested for the exhaustive list of 37 activities (not just the activities conducted on the day before).

  • Time budgets: the standard week pattern in terms of school and work hours of the respondent. This data was obtained from a part of the questionnaire where respondents had to indicate, for every day of the week, which of the given activities they normally (phrased as ‘almost always’) conduct on that day. For each selected activity the subjects had to specify the usual duration and travel time. Eighteen activities were included in this part, like work, education, bring/collect child(ren), grocery shopping and some sports, leisure, and social activities. In the current analysis we use the time spent on work or education on the days of the week as an explanatory variable.
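The history item yields either a calendar date or a count of days, weeks, or months. A sketch of how such answers could be converted to an elapsed time t = d − s in days is given below. The function name, the 30-day month, and the choice of reference day are assumptions; the text does not specify how the conversion was actually done.

```python
from datetime import date

# Assumption: a month is counted as 30 days; the source does not state the conversion.
DAYS_PER_UNIT = {"days": 1, "weeks": 7, "months": 30}


def elapsed_days(reference_day, last_date=None, amount=None, unit=None):
    """Return the elapsed time in days, or None for a 'n/a' answer.

    reference_day : the day d for which the activity agenda was recorded
    last_date     : calendar date of the last performance, if given
    amount, unit  : alternative answer such as (2, "weeks")
    """
    if last_date is not None:
        return (reference_day - last_date).days
    if amount is not None and unit is not None:
        return amount * DAYS_PER_UNIT[unit]
    return None  # 'n/a': more than 6 months ago or never


d = date(2009, 7, 1)  # hypothetical reference day
print(elapsed_days(d, last_date=date(2009, 6, 24)))  # 7
print(elapsed_days(d, amount=2, unit="weeks"))       # 14
print(elapsed_days(d))                               # None
```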

Sample

Subjects were selected from a sample of neighborhoods in the city of Eindhoven and seven surrounding towns. About 4,000 invitation cards were distributed to households in the chosen neighborhoods in June and July 2009. Additionally, we approached approximately 400 individuals, who in an earlier survey (Sun et al. 2009) had indicated their willingness to participate again in an internet survey, by e-mail. As an incentive, 20 vouchers of 50 Euros were allocated to respondents through a lottery. Altogether, 438 individuals started and 290 of them completed the web-based questionnaire.

Table 1 compares the composition of the sample with the national population of the Netherlands with regard to some relevant socio-economic variables. The sample is reasonably representative except that above-average educated groups are overrepresented. This bias is typical for surveys in general (Bricka and Zmud 2003). Households consisting of two persons (married or living together) are a little overrepresented and the elderly (65+ years) and young persons (<25 years) are somewhat underrepresented.
Table 1

Composition of the sample

| | Sample (%) | Dutch population (%) |
|---|---|---|
| Gender | | |
|  Male | 47 | 49.5 |
|  Female | 53 | 50.5 |
| Age (years) | | |
|  15–24 | 7 | 15 |
|  25–44 | 48 | 37 |
|  45–64 | 34 | 33 |
|  ≥65 | 10 | 16 |
| Household composition | | |
|  Single, no children | 23 | 35 |
|  Single, children | 3 | 6 |
|  Double, no children | 38 | 29 |
|  Double, children | 33 | 29 |
|  Multiple persons | 1 | 1 |
| Education | | |
|  Below average | 14 | 35 |
|  Average | 25 | 41 |
|  Above average | 61 | 24 |


The activity data used for the analyses in the current paper consist of the cases where the respondent indicated the date of (or the time passed since) the last performance of the activity. The variable ‘time passed since last performance’ gives the number of days between the last performance and the day before the respondent filled out the questionnaire. The activity could either be conducted or not be conducted on the latter day. Both of these options were included in the model estimation. Altogether about 4,200 cases could be used for the analyses. By taking some of the most frequently conducted activities together, five activity groups were created, namely: daily shopping, non-daily/fun shopping, social visits, leisure, and sports. Note that the activity groups are only used at the level of the parameter estimation; in the model the activities are used individually (i.e., the time elapsed since last performance of the activity is calculated for the activities separately, not for the activity group in general). Table 2 shows which activities were put together. In total those activities contain 2,620 cases that can be used for the estimation of the parameters of the need-based model.
Table 2

Activity groups and their activities included in the estimations

| Activity group | Activities included |
|---|---|
| Daily shopping | Daily shopping |
| Non-daily/Fun shopping | Non-daily shopping |
| | Fun shopping |
| Social visits | Visiting relatives/friends |
| | Receiving visitors |
| | Visiting (e.g., birthday) party |
| Leisure | Going out for dinner |
| | Visiting a theatre |
| | Attending a concert |
| | Visiting a café, bar, or discotheque |
| | Going to the cinema |
| | Visiting a museum |
| | A day out (visit a city, recreation park) |
| Sports | Sports outdoors, club/association context |
| | Sports outdoors, flexible |
| | Sports indoors, club/association context |
| | Sports indoors, flexible |


Results

The selection and categorization of explanatory variables at the individual and household levels were based on the number of cases available for each (dummy) variable. This number must not be too low if the estimates are to be reliable; a threshold of 400 cases was used. The need-based model and the Bayesian estimation method described above (Eq. (16)) were both implemented in C.

Equations (7)–(10) were used for the estimation of the parameters. As explanatory variables of activity needs (X₁), we included the person, household, and dwelling attributes shown in Table 3. As noted, parameters indicating possible interaction effects between activities consist of δ estimates, which show whether the need for an activity is influenced by another activity, and φ parameters, which represent whether the utility of an activity is affected by undertaking another activity on the same day. In the current analysis, work hours (as a continuous variable) and car availability (dummy coded) were used as explanatory variables (X₂) for the threshold value, as those variables are likely to affect time budgets on a day. In the current formulation of the model, temporal constraints such as limited opening hours are not represented separately from other, individual-related constraints. All constraints are represented by a single threshold function. It is possible to extend the model and represent the latter constraints as an all-or-nothing availability variable for days. We leave this for future research. The threshold parameters are estimated across all activities, as the time budget is day related rather than activity related (Eq. (10)). The β (need growth), α (day preferences), and day-error-scale parameters are estimated for each activity group separately (Eqs. (7)–(9)). The number of draws was set to K = 100.
Table 3

Explanatory variables considered for the model (base levels in italics)

| Variable | Code | Description/range |
|---|---|---|
| Gender | Male | Male |
| | Female | Female |
| Age group | Age30– | <30 years old |
| | Age3040 | 30–39 years old |
| | Age4050 | 40–49 years old |
| | Age5060 | 50–59 years old |
| | Age60+ | 60 years and older |
| Household composition | Hh_singl_no | Single, no children |
| | Hh_sd_child | Single or Double, with child(ren) |
| | Hh_rst | Double, no children, living in at (grand)parents/relatives, student accommodation, group accommodation |
| Dwelling type | DwAp | Flat, apartment |
| | DwGarden | House |
| Income household | Inc < av | Below average |
| | Inc ≈ av | Average |
| | Inc > av | Above average |
| Education level | Edu_low | Low |
| | Edu1av | |
| | Edu_high | High |
| Age youngest child | Ageychild06 | 0–5 years old |
| | Ageychild6+ | 6 years and older |
| Living area | City | City |
| | Village | Village, countryside |
| Day of the week | Mon | Monday |
| | Tue | Tuesday |
| | Wed | Wednesday |
| | Thu | Thursday |
| | Fri | Friday |
| | Sat | Saturday |
| | Sun | Sunday |
| Car availability | CarA | Yes, always |
| | CarO | Yes, to be agreed with others |
| | CarN | No |
| Hours spent work a day | Tswork | Continuous |

The results of the parameter estimation are shown in Table 4. In terms of need build-up rate, the β₀ parameter represents the intercept, i.e., the value when all other β variables are zero. Person, household, and dwelling attributes influence the value of β. We find that individuals living in a house with garden have a longer need-rebuild time for going to the supermarket or another store for daily shopping than subjects living in a flat/apartment. This means that if available time (given work hours), car availability, specific day preferences, and interactions between activities are the same, this group would conduct daily shopping less often. On the other hand, persons with a below-average or above-average education level display shorter need-rebuild times for grocery shopping than respondents with a moderate education level. Furthermore, individuals who live in a city have shorter need build-up times for daily shopping. These results seem behaviorally intuitive, as there are more grocery stores in cities, which means that persons live closer to a store (apartment buildings especially are often located near shopping areas) and, hence, may have developed higher-frequency solutions for re-stocking. The results for non-daily and fun shopping indicate that being single (Hh_singl_no) and/or being between 40 and 60 years old increases the need-rebuild time to go shopping. Conversely, subjects whose youngest child is between 0 and 6 years old have shorter interval times for shopping, which seems reasonable since parents need to buy a lot of goods/products (e.g., clothes) for children in this age group. In the case of social visits, higher educated respondents have longer need build-up times for visiting relatives/friends. On the contrary, keeping everything else equal, elderly people (50+) show a higher need-recovery rate for social visits than younger persons. This might be because younger age groups in particular already satisfy part of their need for social contact by going to work, school, or college. The activity group Leisure shows negative effects on β when the household income is lower than average, the subject lives in a city, or the age lies below 30 or between 40 and 60 years. The age group 60+, higher income households, and people living in a house with garden, on the other hand, have a higher level of expressed needs. An explanation might be that they have more time or money available to participate in leisure activities. Finally, the results for sports show that respondents between 40 and 50 years old, individuals living in a house with garden, and higher educated subjects have a longer need-rebuild time for sports activities.
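To make the reading of the β estimates concrete, the following Python sketch adds up the intercept and the dummy effects for two respondent profiles. It assumes that β is additive in the dummy variables, which is an assumption here because Eqs. (7)–(9) are not reproduced in this section. The dictionary holds the intercept and the significant grocery-shopping estimates copied from Table 4; the function and variable names are hypothetical.

```python
# Intercept and significant grocery-shopping β estimates, copied from Table 4.
BETA_GROCERY = {
    "intercept": 0.632,
    "DwGarden": -0.093,
    "Edu_low": 0.180,
    "Edu_high": 0.159,
    "City": 0.148,
}


def need_growth(coefs, active_dummies):
    """Sum the intercept and the coefficients of the dummies equal to 1.

    Assumes an additive specification of β (an assumption, see lead-in).
    """
    return coefs["intercept"] + sum(coefs[name] for name in active_dummies)


# Highly educated city resident living in a flat (DwAp is the omitted level).
print(round(need_growth(BETA_GROCERY, ["Edu_high", "City"]), 3))  # 0.939
# Moderately educated village resident living in a house with garden.
print(round(need_growth(BETA_GROCERY, ["DwGarden"]), 3))          # 0.539
```

Under this reading, the first profile has the larger β and hence the faster need build-up, which corresponds to the shorter need-rebuild time for daily shopping described in the text.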
Table 4

Estimation results

| Variable | Grocery shopping: Estimate | Grocery shopping: t value | Non-daily shopping: Estimate | Non-daily shopping: t value | Social visits: Estimate | Social visits: t value | Leisure: Estimate | Leisure: t value | Sports: Estimate | Sports: t value |
|---|---|---|---|---|---|---|---|---|---|---|
| β₀ | 0.632 | 22.734 | 0.138 | 6.316 | 0.283 | 4.597 | 0.027 | 0.730 | 0.061 | 0.843 |
| β Male | 0.015 | 0.148 | −0.013 | −0.264 | −0.052 | −1.108 | −0.138 | −1.476 | −0.048 | −0.395 |
| β Age30– | 0.049 | 0.286 | 0.127 | 1.792 | −0.025 | −0.655 | −0.112 | −2.130 | −0.089 | −0.731 |
| β Age4050 | 0.025 | 0.242 | −0.074 | −3.512 | −0.019 | −0.166 | −0.156 | −2.196 | −0.155 | −2.044 |
| β Age5060 | 0.020 | 0.156 | −0.079 | −6.021 | 0.128 | 2.160 | −0.133 | −3.585 | 0.015 | 0.147 |
| β Age60+ | 0.079 | 1.106 | −0.005 | −0.049 | 0.220 | 6.840 | 0.058 | 5.075 | 0.100 | 1.214 |
| β Hh_sd_child | 0.108 | 0.705 | −0.048 | −0.727 | −0.019 | −0.384 | −0.013 | −0.245 | 0.022 | 0.467 |
| β Hh_singl_no | 0.070 | 1.157 | −0.032 | −4.820 | 0.037 | 0.335 | −0.053 | −0.505 | −0.115 | −1.409 |
| β DwGarden | −0.093 | −2.085 | 0.053 | 0.911 | −0.046 | −1.005 | 0.155 | 4.343 | −0.214 | −3.675 |
| β Inc < av | 0.114 | 1.919 | −0.038 | −0.440 | 0.172 | 1.721 | −0.102 | −4.346 | −0.102 | −0.975 |
| β Inc > av | 0.066 | 0.571 | −0.088 | −1.429 | −0.049 | −0.575 | 0.186 | 4.146 | −0.036 | −0.972 |
| β Edu_low | 0.180 | 3.826 | −0.128 | −1.789 | −0.151 | −1.223 | 0.007 | 0.233 | 0.068 | 0.958 |
| β Edu_high | 0.159 | 2.367 | 0.069 | 0.995 | −0.179 | −6.163 | −0.078 | −1.816 | −0.240 | −5.771 |
| β Ageychild06 | −0.004 | −0.053 | 0.197 | 2.670 | −0.120 | −0.935 | −0.112 | −1.318 | 0.127 | 0.833 |
| β City | 0.148 | 3.556 | 0.017 | 0.171 | −0.072 | −1.524 | −0.210 | −14.998 | 0.168 | 1.028 |
| α Mon | 0.236 | 0.812 | 0.090 | 0.279 | −0.143 | −0.811 | −0.125 | −0.631 | −0.009 | −0.026 |
| α Tue | 0.144 | 1.068 | −0.007 | −0.024 | 0.354 | 1.397 | −0.372 | −2.054 | 0.403 | 3.217 |
| α Thu | −0.089 | −0.287 | −0.389 | −0.927 | −0.647 | −4.943 | −0.140 | −0.883 | −0.300 | −1.088 |
| α Fri | 0.311 | 0.730 | 0.083 | 0.389 | −0.022 | −0.111 | 0.167 | 0.468 | −0.108 | −0.920 |
| α Sat | 0.496 | 3.396 | 0.201 | 0.833 | 0.178 | 2.317 | 0.208 | 1.124 | 0.089 | 0.360 |
| α Sun | −0.378 | −2.836 | −0.048 | −0.171 | 0.406 | 5.486 | −0.459 | −4.829 | 0.087 | 0.422 |
| DaySTDEV | 1.227 | 2.663 | 3.274 | 6.852 | 3.278 | 15.539 | 2.959 | 9.820 | 3.627 | 17.210 |
| δ | | | | | | | | | | |
| δ Groc Shop | 0.006 | 0.011 | 0.446 | 1.865 | −0.018 | −0.072 | 0.185 | 0.880 | −0.371 | −0.815 |
| δ N-D Shop | 0.077 | 0.302 | 0.236 | 0.805 | 0.245 | 2.503 | 0.080 | 0.313 | −0.292 | −0.992 |
| δ Social Visits | 0.095 | 1.396 | 0.539 | 4.898 | 0.222 | 1.163 | 0.316 | 1.984 | −0.257 | −11.820 |
| δ Leisure | 0.012 | 0.198 | 0.106 | 0.930 | 0.063 | 1.456 | 0.210 | 1.256 | 0.423 | 5.567 |
| δ Sports | −0.101 | −0.513 | 0.066 | 0.249 | 0.026 | 0.104 | 0.266 | 2.661 | −0.160 | −1.664 |
| φ | | | | | | | | | | |
| φ Groc Shop | −0.018 | −0.033 | −0.209 | −3.086 | −0.136 | −0.617 | 0.190 | 0.673 | 0.340 | 0.745 |
| φ N-D Shop | −0.030 | −0.123 | −0.301 | −0.812 | 0.390 | 1.710 | 0.502 | 5.761 | −0.478 | −1.289 |
| φ Social Visits | −0.424 | −12.307 | 0.079 | 0.218 | 0.338 | 0.807 | −0.095 | −0.299 | 0.323 | 0.854 |
| φ Leisure | −0.447 | −1.091 | −0.051 | −0.133 | 0.346 | 0.798 | −0.496 | −2.166 | −0.084 | −0.205 |
| φ Sports | 0.548 | 1.811 | −0.035 | −0.094 | −0.203 | −0.386 | −0.424 | −1.263 | −0.130 | −0.341 |
| All activities | | | | | | | | | | |
| Thr 0 | 1.592 | 9.706 | | | | | | | | |
| Thr Tswork | 0.201 | 9.320 | | | | | | | | |
| Thr CarA | −0.019 | −0.198 | | | | | | | | |
| Thr CarO | 0.027 | 0.303 | | | | | | | | |

Significant estimates in bold

No. of obs. = 2620; LL = −711.166; LLO = −1,604.770; Rho square = 0.557; Rho sq (adj.) = 0.455

Some of the effects that one might expect did not occur. The general notion that the elderly visit grocery stores more often is counterbalanced within the need-based model by the threshold, which is lower for persons who have more time available. Furthermore, the gender variable did not show a significant effect. The common idea is that men go shopping less frequently than women. In the model this is accounted for by the fact that women still work fewer hours a week than men, at least in the Dutch context. Another interesting result is that the income variable only showed effects in the case of leisure activities. Irrespective of the available time, higher income households undertake leisure activities more often. This suggests that the costs of leisure activities are important when choosing between activities.

If we look at day preferences, we see that individuals tend to have an intrinsic preference for doing grocery shopping on Saturdays, social visits on Saturdays and Sundays, and sports on Tuesdays. On the other hand, the negative estimate for daily shopping on Sundays indicates either that individuals prefer not to shop on that day or that stores are closed (in the Eindhoven region, at the time of the data collection, supermarkets were closed on Sundays, except that some of them could be open about once a month on a fixed date). Furthermore, individuals display decreased preferences for social visits on Thursdays and for leisure activities on Tuesdays and Sundays. The day-error-scale (DaySTDEV) values show that in the case of grocery shopping the random circumstances on the day (e.g., weather conditions) are less influential than in the case of the other activity groups. In other words, grocery shopping will be done (almost) regardless of the circumstances of the day. This seems rational, as daily shopping is done indoors, frequently by car, and often cannot be postponed because otherwise there is nothing to eat at home.

The δ parameters and φ parameters indicate in two different ways the possible interactions between activities. The δ estimates represent whether activities within the row activity group affect the need for an activity from the column activity group. The φ parameters, on the other hand, show whether the utility of the column activity group is influenced by conducting an activity of the row activity group on the same day. The results of the δ estimations show some significant parameters. A social-visit activity increases the need for non-daily shopping (or people who often undertake social visits also frequently conduct non-daily shopping activities). Conversely, non-daily shopping raises the need for a social visit as well. Furthermore, social visits and sports activities increase the need for leisure, and leisure increases the need for sports (or people who often engage in social/sports activities also tend to do leisure activities more often). The only significant parameter with a negative sign is social visits in the case of sports activities: social visits decrease the need for sports (or people who often undertake social visits tend to undertake sports activities less frequently than others). This might be caused by the fact that sports activities done together with others also satisfy the need for social contact. The results of the φ estimates show several interaction effects between activities: the utility of grocery shopping decreases when a social visit is conducted on the same day, the utility of non-daily shopping decreases when grocery shopping is done on the same day, and the utility of a leisure activity diminishes when another leisure activity is performed on the same day. In other words, interaction effects among leisure activities exist: if a leisure activity is performed, the probability of conducting another leisure activity on the same day decreases. The same holds for shopping activities. On the contrary, the utility of Leisure rises when a non-daily shopping activity is conducted on the same day.
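The row/column reading of the δ block can be checked mechanically. The Python sketch below copies the δ estimates and t values from Table 4 and lists the pairs whose absolute t value reaches 1.96. That cut-off is an assumption (the usual two-sided 5% level); the text does not state the criterion behind the bold entries. The function and variable names are hypothetical.

```python
GROUPS = ["Groc Shop", "N-D Shop", "Social Visits", "Leisure", "Sports"]

# (estimate, t value) pairs for δ, copied from Table 4.
# Row = influencing activity group, column order = GROUPS (influenced group).
DELTA = {
    "Groc Shop": [(0.006, 0.011), (0.446, 1.865), (-0.018, -0.072),
                  (0.185, 0.880), (-0.371, -0.815)],
    "N-D Shop": [(0.077, 0.302), (0.236, 0.805), (0.245, 2.503),
                 (0.080, 0.313), (-0.292, -0.992)],
    "Social Visits": [(0.095, 1.396), (0.539, 4.898), (0.222, 1.163),
                      (0.316, 1.984), (-0.257, -11.820)],
    "Leisure": [(0.012, 0.198), (0.106, 0.930), (0.063, 1.456),
                (0.210, 1.256), (0.423, 5.567)],
    "Sports": [(-0.101, -0.513), (0.066, 0.249), (0.026, 0.104),
               (0.266, 2.661), (-0.160, -1.664)],
}


def significant(table, t_crit=1.96):
    """Return (row group, column group, estimate) where |t| >= t_crit."""
    hits = []
    for row, cells in table.items():
        for col, (estimate, t_value) in zip(GROUPS, cells):
            if abs(t_value) >= t_crit:
                hits.append((row, col, estimate))
    return hits


for row, col, estimate in significant(DELTA):
    direction = "raises" if estimate > 0 else "lowers"
    print(f"{row} {direction} the need for {col} ({estimate:+.3f})")
```

With this cut-off the sketch returns six pairs, five positive and one negative (Social Visits on Sports), which matches the effects discussed in the paragraph above.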

Some variables can also have an impact on the threshold value. For this study we only included the number of work hours by day of the week and car availability as explanatory variables. The results show that the amount of time spent on paid work on a day increases the threshold value and, hence, decreases the probability of conducting the activity on that day, which is what one would expect. In this study, car availability does not have a significant impact on the threshold value.
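As a rough illustration of the size of the work-hours effect, the Python sketch below evaluates the threshold for two hypothetical days using the Thr estimates from Table 4. It assumes that the threshold is a linear function of its explanatory variables and that CarN is the omitted base level; both are assumptions, since Eq. (10) is not reproduced in this section.

```python
# Threshold estimates copied from Table 4 (estimated across all activities).
THR = {"intercept": 1.592, "Tswork": 0.201, "CarA": -0.019, "CarO": 0.027}


def threshold(work_hours, car="CarN"):
    """Linear threshold value; CarN is treated as the base level (assumption)."""
    car_effect = THR.get(car, 0.0)  # no coefficient for the base level
    return THR["intercept"] + THR["Tswork"] * work_hours + car_effect


print(round(threshold(0), 3))          # 1.592, non-working day, no car
print(round(threshold(8, "CarA"), 3))  # 3.181, 8 work hours, car always available
```

Under these assumptions an 8-hour working day roughly doubles the threshold, whereas the car-availability terms shift it only marginally, in line with their insignificant t values.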

The Rho-square of the estimation was calculated by using the log-likelihood of the estimated model and the log-likelihood of a null model. A complete null model, where all parameters are set to zero, is not a good reference for goodness-of-fit because the need-growth and threshold values cannot be equal to zero. In order to find an appropriate reference we used ‘mean’ values of the intercepts of β and a value close to the threshold intercept parameter to calculate the log-likelihood of a null model. For all intercept β parameters we chose 0.5 and for the threshold intercept a value of 2. The Rho-square calculated on that basis is 0.557. However, the adjusted Rho-square is noticeably lower with a value of 0.455, which reflects the relatively large number of parameters of the model compared to the number of observations but still indicates that the goodness-of-fit is satisfactory.
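The two fit measures can be reproduced from the log-likelihoods reported under Table 4. The Python sketch below uses the usual likelihood-ratio index and its parameter-adjusted variant; the paper does not spell out the adjustment formula, so this form is an assumption. The parameter count of 164 is likewise an assumption obtained by counting the rows of Table 4 (32 parameters for each of the five activity groups plus 4 threshold parameters).

```python
def rho_square(ll_model, ll_null, n_params=0):
    """Likelihood-ratio index: 1 - (LL - K) / LL0, with K = 0 when unadjusted."""
    return 1.0 - (ll_model - n_params) / ll_null


LL_MODEL = -711.166   # log-likelihood of the estimated model (Table 4)
LL_NULL = -1604.770   # log-likelihood of the reference null model (Table 4)
N_PARAMS = 5 * 32 + 4  # counted from Table 4 (assumption)

print(round(rho_square(LL_MODEL, LL_NULL), 3))            # 0.557
print(round(rho_square(LL_MODEL, LL_NULL, N_PARAMS), 3))  # 0.455
```

Both values agree with the reported figures, which suggests that the gap between the two measures is driven by the 164 estimated parameters relative to the 2,620 observations.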

Discussion and conclusions

This paper described a first attempt at estimating a model of activity generation that is based on notions of dynamic needs, with the aim of revealing (positive or negative) substitution relationships between activities. The data were collected specifically for this purpose. The survey included, for a list of 37 activities, the time elapsed since last conducting the activity and whether the activity was conducted the day before.

The purpose of the present study was (1) to show that it is possible to specify a model and collect data which can be used to estimate the parameters of a dynamic need-based activity generation model and (2) to identify interactions between activities so as to find out to what extent activities are substitutable in the framework of the need-based model. Although the size of the sample is somewhat limited for the number of variables included in the model, we demonstrated that the developed methodology is feasible. The results of the parameter estimations indicate that several socioeconomic and dwelling variables have an impact on episode interval timing and day choice decisions for the shopping, social, leisure, and sports activities considered in the present study. Day preferences and interaction parameters show significant effects as well. The fact that interaction effects are significant is especially relevant because it suggests that activities cannot be assumed independent when generating the dynamics of activity participation.

New data should be collected all year round, to capture seasonal influences, and in larger amounts. An interesting avenue is to combine inter-episode activity data, as collected in this study, with one-day data from a national travel survey, such as the Dutch travel survey (called the MON). The Bayesian estimation method used in the present study supports pre-specification of a priori distributions of parameters that could be set based on other data sources such as the MON. In that approach, data collected specifically for the model would be used for fine-tuning rather than for estimating parameters from scratch. There are also meaningful ways of extending the model, e.g., by incorporating the effects of travel time and cost on activity participation choice. Furthermore, an interesting problem for future research is to extend the model to account for a possible influence of future plans of activities/events on activity timing decisions that have a short-term planning horizon.

Open Access

This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Copyright information

© The Author(s) 2012