1 Introduction

The trip pattern of individuals is based on demographic characteristics and environmental factors, e.g., accessibility (Srinivasan & Ferreira, 2002). These data are mainly collected for a small percentage of an area’s population, which enables transportation modelers to identify similar patterns and estimate the trip demand of the wider population. There is an unambiguous relationship between demographic characteristics, activity participation, and travel behavior (Cheng et al., 2019). Trip pattern choice is a function of the need to participate in activities dispersed across the urban environment and of individual and household characteristics, subject to a set of alternatives and limitations. Accessibility also plays a crucial role in such pattern recognition, since the generation of work and other trips is sensitive to accessibility (Cordera et al., 2017; Currans et al., 2020; Næss et al., 2018; Pitombo et al., 2011; Stead, 2001).

Random utility models (RUMs) are the primary tool for travel demand prediction. While these models are interpretable, they are less expressive than deep learning models. This matters because the formation of travel behavior patterns is inherently complex and can involve intra-household interactions affected by the built environment, which RUMs may capture inefficiently (Yang et al., 2019).

Whether machine learning models or random utility models are chosen, there is a trade-off between accuracy and interpretability (Ali et al., 2021; Derrible & Pereira, n.d.; García-García et al., 2022). The higher complexity of machine learning models allows them to identify non-linear relationships between the inputs and the designated outputs to a greater degree than random utility models, which yields more accurate predictions. Despite this advantage in prediction, machine learning models deprive the modeler of the interpretability and statistical tools that random utility models provide. This means that the mechanism operating in a choice procedure, and the extent to which each element contributes to the final decision, is unknown when using a machine learning model. Moreover, despite their simplicity, random utility models often prove computationally expensive, or even infeasible, when fed with a large volume of data (Wang et al., 2021).

There is substantial evidence that people with similar socioeconomic backgrounds show comparable travel behaviors (Carlsson-Kanyama & Linden, 1999; Li et al., 2018). Men, for instance, tend to engage in more business and work-related activities, whereas women engage in more leisure activities such as visiting family or seeing friends (Collins & Tisdell, 2002). Women prefer to commute over shorter distances, at off-peak hours, or by using flexible modes of transport (Ng & Acker, 2018). Income is another element that influences travel behavior; income level may change travel habits such as trip distance (Jain & Tiwari, 2019). Because a causal relationship between travel patterns and mobility across demographic groups has been shown, travel patterns can be inferred and calculated using socioeconomic data.

A number of researchers have sought to explore essential variables in the formation of household travel behaviors. Bhat et al. (2013) proposed a household activity production model in Southern California to find how all individuals in a household make their decisions about activity participation. The daily trips of an individual can also be classified into distinct patterns. In that regard, Hedau & Sanghai (2014) classified daily trips into five patterns to develop an activity-trip choice model using multiple variables. Molla et al. (2017) introduced a probabilistic activity-based travel generation model, which could infer the actual number of trip generations. They assumed that small organizations in an urban area could create activity-based models based on traditional trip surveys. In all of these studies, socioeconomic characteristics were considered significant in generating activity-based trips.

With such a relationship in place, some studies have investigated the derivation of socioeconomic characteristics from travel patterns. Zhu et al. (2017) predicted people’s sociodemographic variables, such as work status, age, gender, and income, from GPS data by training SVM and logistic regression models. Among the most important differentiating characteristics they utilized for categorization were variables linked to the spatiotemporal variability of tours. Spatiotemporal data obtained from public transit smart cards may also be utilized to study trip patterns (Yang et al., 2018). Li et al. (2019) used large-scale data across three age groups to lessen the usage of survey data in the design of human-centered public transportation. The study concentrated on predicting age groups based on travel to various “points of interest” retrieved from trip destinations. Among the ML approaches trained and compared, the neural network (NN) produced the best results. Zhang & Chen (2018) estimated vehicle ownership, age, gender, and income using attributes extracted from smart card data. After testing multiple supervised ML algorithms, they concluded that the NN produced the best results. While that study manually extracted characteristics from the data before feeding them to ML models, Zhang, Cheng, & Sari Aslam (2019) used a convolutional neural network (CNN) to undertake the same investigation without the need for feature extraction. CNNs are extensively utilized in cutting-edge image processing models, but they may also be trained to recognize hidden patterns in non-image data. Similarly, after training and comparing many ML models, Zhang & Cheng (2019) predicted job status from London’s public transportation smart card data and found that the CNN performed best in their scenario.

Whether traffic analysis zone (TAZ) parameters are incorporated into a model or not can influence trip patterns and, consequently, travel demand forecasts. Some researchers highlight this fact. For example, an urban accessibility relative index (UARI) was developed to integrate the collected multi-mode transportation big data (related to the taxis, buses, and subways) to quantify, visualize and understand the spatiotemporal patterns of accessibility in urban areas (Jiang et al., 2021). Neglecting an accessibility characteristic could lead to incorrect interpretations of travel demand forecasting for non-mandatory trips when modes other than private cars are used, or mandatory trips are made by private cars (Cordera et al., 2017). In addition, population density can affect trip generation (Zhang, Clifton, et al., 2019).

Thanks to the abundance of mobility data and their capacity for pattern recognition, machine learning algorithms have overcome many practical limitations. They are widely used in the forecasting and analysis of travel behavior in activity-travel patterns, including supervised (e.g., neural networks and support vector machines) and unsupervised (e.g., K-means clustering) learning (Koushik et al., 2020). These algorithms can find complex travel behavior patterns through the relationship between spatiotemporal and socioeconomic characteristics and estimate trip patterns. Despite this benefit, such techniques have rarely been employed in activity-based modeling and trip forecasting. For example, a support vector machine (SVM) was employed to recognize and forecast daily activity sequences (Allahviranloo & Recker, 2013). Furthermore, using a hybrid logit-SVM model, Yang et al. (2016) considered the role of the head of household in forecasting the number of household trips.

Despite their strength in pattern recognition, machine learning algorithms are primarily non-interpretable, meaning that the process through which the final output is produced is unknown to the modeler. For this reason, these models are often referred to as “black boxes.” Nevertheless, a combination of unsupervised algorithms (e.g., clustering algorithms) and supervised algorithms (e.g., decision trees) can sometimes enable interpretability. Pitombo et al. (2011) used this modeling approach to analyze the pattern-travel relationship involving activity, land use, and socioeconomic characteristics. In the same vein, Hafezi et al. (2019) proposed a model to identify activity patterns by combining K-means clustering and the classification and regression trees (CART) algorithm. After recognizing homogeneous patterns, they developed the CART model to allow a more in-depth analysis.

This study proposes a novel deep learning method to predict future travel demand based on how the population distribution across designated demographic traits (e.g., age, gender, income) changes over time. The proposed deep learning model is expected to predict future travel demand more accurately than conventional random utility models. A DNN model is developed to predict trip patterns in two categories: mandatory and non-mandatory. For this purpose, the socioeconomic characteristics of people residing in the Washington metropolitan area, coupled with the characteristics of their TAZs, were used.

2 Data

The socioeconomic characteristics extracted from the Metropolitan Washington Council of Governments Transportation Planning Board (MWCOGTPB) 2007–2008 survey data were used to train a trip-pattern-predicting model. This data set contains the travel behavior and demographics of 11,000 households in the Washington metropolitan area, including Northern Virginia and some parts of Maryland. The Transportation Planning Board (TPB) periodically conducts the survey to evaluate the transportation system’s effectiveness with respect to the transportation demand of the households. Between February 2007 and March 2008, the participants completed a detailed one-day travel diary. Although each participant’s data covers only 24 hours, the diaries were collected on different days of the week, which provides a complete picture of travel behavior in the area. Moreover, the mandatory and non-mandatory patterns are expected to recur throughout the week, especially given the inflexible nature of mandatory trips. Therefore, the results can be confidently generalized to the area’s population.

Although more recent data would have been preferable, it was essential to incorporate the characteristics of each transportation analysis zone to improve the results and reduce bias, and such data was only available from 2007. As the infrastructural condition of the area has developed since then, we needed to match the collection timelines of the two datasets. Nonetheless, the TPB data is one of the most comprehensive survey datasets available, and we focused our attention mainly on the algorithm, which can be trained on data from any year.

A set of distinct variables typically available in travel surveys and census data was selected for inclusion in the model. As with any data, this data had to be preprocessed before the machine learning model was trained. Once prepared, the data consisted of TAZ characteristics and the socioeconomic characteristics of individuals, which are shown in Table 1 and Table 2. The features used to train the model were of three types: continuous, categorical, and binary. The categorical data were organized as dummy variables, while the continuous variables were scaled to the range of 0–1. This allowed faster model convergence (lower computational cost), resulting from a lower variance of each feature.
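
The two preprocessing steps described above can be illustrated with a minimal sketch. It assumes plain min-max scaling and one dummy column per category; the function names and the toy records are hypothetical, and the paper does not state which library or exact encoding was used.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode a categorical column as dummy (0/1) columns, one per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical toy records: age (continuous) and employment status (categorical).
ages = [10, 35, 60]
status = ["student", "employed", "retired"]

scaled_ages = min_max_scale(ages)        # [0.0, 0.5, 1.0]
categories, dummies = one_hot(status)    # columns: employed, retired, student
```

Binary variables need no transformation under this scheme, since they already lie in the 0–1 range.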

Table 1 Location and TAZ-related features
Table 2 Socioeconomic features

There are a total of 3722 TAZs in the Washington metropolitan area, which differ in urban infrastructure depending on whether they are in- or out-of-city zones. This study included TAZ features such as public transportation access, population density, and employment density. The spatial distribution of the participants at the scale of TAZ is shown in Fig. 1.

Fig. 1 The spatial distribution of the participants at the scale of TAZ

Table 3 presents the mandatory and non-mandatory trip patterns and their share of the data. Mandatory trips were defined based on the following assumption: mandatory trips are inflexible, meaning they must take place at a specific time and last for a predefined amount of time. They are not made on a voluntary basis, nor is there a choice about when they occur. Some trips, such as grocery shopping, are necessary for the household, but they are not counted as mandatory under this assumption.

Table 3 Mandatory and non-mandatory trips

Mandatory trip patterns were divided into seven groups based on the number of work, educational, and school trips, and one group was also considered for those with no mandatory trips. On the other hand, the remainder of the trip purposes were labeled as non-mandatory (e.g., shopping trips, visiting relatives, and recreational trips). These were classified into five groups based on the number of their occurrences. Since the number of individuals with no non-mandatory trips was tiny (0.2%), they were excluded from the modeling. The mandatory-trip model predicts a combination of trip purposes as categorical classes; for example, “1 work and 1 education” is a single class. School trips refer to trips made for the purpose of formal education—that is, school, university, and college—while educational trips refer to trips made for other educational activities, such as learning music or language.

3 Methodology

A deep neural network algorithm was used to train the developed model. While a shallow neural network typically consists of a small number of hidden layers, deep neural networks are created by stacking many hidden layers on top of each other. This makes the model bigger and more complex. Although training deeper models means longer training time (higher computational cost), it should be noted that the deeper a neural network, the more likely it is to identify and pick up more complex and non-linear patterns from the dataset.

A node in an NN is a processing unit with a set of weights and a sum function. A weight w is a numerical value representing the relative strength of a connection that transfers data from one layer to another, while the sum function y calculates the weighted sum of all input variables in a processing unit. The signal appearing at the output of neuron j is calculated as:

$${y}_j=\sum_{i=1}^m{x}_i{w}_{ji}+{b}_j$$
(1)

where m is the number of variables introduced to neuron j, x_i is the i-th input variable of neuron j, y_j is the output of neuron j, w_ji is the weight of the connection from input i to neuron j, and b_j is the bias term. An activation function is typically required to introduce non-linearity into the neural network. It defines a non-linear relationship between the input and output of a node and of the network. The present study adopted the softmax activation function in the output layer:

$$g{\left(y\right)}_i=\frac{\exp \left({y}_i\right)}{\sum_{j=1}^K\exp \left({y}_j\right)}$$
(2)

where y is the input vector, y_i is its i-th element, and K is the number of classes in multiclass classification.
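
As a minimal sketch of Eqs. (1) and (2), the snippet below computes the weighted sum for three output neurons and converts the results to class probabilities with softmax. The inputs, weights, and biases are hypothetical values chosen for illustration, not parameters of the trained model.

```python
import math


def neuron_output(x, w, b):
    """Eq. (1): weighted sum of the inputs plus the bias term."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b


def softmax(y):
    """Eq. (2): map K raw outputs to K class probabilities."""
    shift = max(y)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in y]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical values: two inputs feeding three output neurons.
x = [0.5, 1.0]
weights = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]]
biases = [0.0, 0.1, -0.2]

y = [neuron_output(x, w, b) for w, b in zip(weights, biases)]
probs = softmax(y)
predicted_class = probs.index(max(probs))  # index of the most probable class
```

Subtracting the maximum before exponentiating does not change the result of Eq. (2), because the common factor cancels between numerator and denominator.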

We scaled each variable to the range of 0–1. Scaling speeds up gradient descent, the step-wise optimization algorithm of a neural network, which works hand in hand with back-propagation. Starting from randomly assigned weights, the algorithm takes small or large steps (depending on the learning rate) to minimize a cost function. At each step, it calculates an error term, uses back-propagation to compute the derivatives of the cost function with respect to the weights, and adjusts the weights accordingly. Constraining each variable to a limited range prevents any single feature from producing disproportionately large derivatives at each update, which speeds up convergence and, consequently, reduces the total training time. Scaling also improves the performance and stability of the optimization process (Bishop, 1995).
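
The update loop described above can be sketched on a deliberately simple case. The example below is not the DNN of this study: it assumes a single linear unit fitted with a mean squared error cost, and the data, learning rate, and number of steps are hypothetical. It only illustrates the cycle of computing errors, taking derivatives of the cost with respect to the weights, and stepping against the gradient.

```python
import random


def gradient_descent(xs, ys, learning_rate=0.5, steps=500, seed=0):
    """Fit y ~ w * x + b by minimising the mean squared error."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random starting weights
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Step against the gradient; the learning rate sets the step size.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical inputs already on a 0-1 scale, generated from y = 2x + 1.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2 * x + 1 for x in xs]
w, b = gradient_descent(xs, ys)  # converges towards w = 2, b = 1
```

In a multi-layer network the same loop applies, with back-propagation supplying the derivatives for every layer.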

The sample was collected from a random subset of the population. Sample bias and data imbalance are persistent challenges in such cases, because dominant groups tend to overshadow less frequent observations while the machine learning algorithm is trained. DNNs can perform decently on uniformly distributed datasets, but their performance on datasets with an unbalanced distribution cannot be ensured (Wang et al., 2016). One way of dealing with the imbalance problem is to augment under-represented categories; however, this distorts the distribution of the randomly sampled data, so the data will no longer represent the actual population. Instead of synthesizing samples, which would have undermined the validity of our analysis, we used class weighting during the training phase of the deep learning models, penalizing the error through the cost function according to each class’s share of samples in the data.
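
A minimal sketch of class weighting follows. The exact weighting scheme is not specified in the text, so the snippet assumes a common inverse-frequency heuristic, in which each class weight equals the number of samples divided by the number of classes times the class count; the labels and predicted probabilities are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Mean of per-sample log losses, each scaled by its class weight."""
    losses = [-weights[y] * math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


# Hypothetical imbalanced labels: class 0 is four times as common as class 1.
labels = [0, 0, 0, 0, 1]
weights = inverse_frequency_weights(labels)  # {0: 0.625, 1: 2.5}

# Hypothetical predicted probabilities for classes 0 and 1.
probs = [[0.9, 0.1]] * 4 + [[0.6, 0.4]]
loss = weighted_cross_entropy(probs, labels, weights)
```

Under this assumption the single minority sample carries as much total weight in the cost as the four majority samples combined, without adding any synthetic records to the data.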

The predictions of mandatory and non-mandatory trip patterns from socioeconomic characteristics were both formulated as classification problems. The Python programming language was used to implement the preprocessing and modeling in this paper.

3.1 The evaluation criteria

3.1.1 Accuracy, precision, recall, F1-score

The classification could be evaluated through the true positives (TP) as the number of correctly included classes, true negatives (TN) as the number of correctly excluded classes, false positives (FP) as the number of wrongly included classes, and false negatives (FN) as the number of wrongly excluded classes. These four criteria form a confusion matrix for the classification (Sokolova & Lapalme, 2009). In this respect, accuracy, precision, recall, and F1-score can be calculated as:

$$Total\ observations=\left( TP+ FP+ TN+ FN\right)$$
(3)
$$Accuracy=\frac{TP+ TN}{\left( Total\ observations\right)}$$
(4)
$$Precision=\frac{TP}{\left( TP+ FP\right)}$$
(5)
$$Recall=\frac{TP}{\left( TP+ FN\right)}$$
(6)
$$F1- score=2\times \frac{Precision\times Recall}{\left( Precision+ Recall\right)}$$
(7)

The accuracy of an ML model indicates how often it was correct overall, while precision measures how well a model predicts a specific category. Precision is an excellent metric when the cost of FPs is high. When a considerable cost is associated with FNs, recall is the appropriate measure for choosing the best model. The F1-score is helpful when seeking a balance between precision and recall. It may also be appropriate when there is an uneven class distribution. Considering the aforementioned criteria collectively was the best way to choose the final model, because it made every aspect of the model’s performance clear.
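
Eqs. (3) to (7) can be applied to a multiclass confusion matrix by treating one class at a time as the positive class. The sketch below assumes this one-vs-rest reading; the confusion matrix is hypothetical and does not reproduce the results of this study.

```python
def per_class_metrics(matrix, cls):
    """One-vs-rest scores for class `cls`, following Eqs. (3)-(7).

    matrix[i][j] counts samples whose true class is i and predicted class is j.
    """
    total = sum(sum(row) for row in matrix)
    tp = matrix[cls][cls]
    fn = sum(matrix[cls]) - tp                   # true cls, predicted otherwise
    fp = sum(row[cls] for row in matrix) - tp    # predicted cls, true otherwise
    tn = total - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical 3-class confusion matrix (rows: true, columns: predicted).
confusion = [
    [50, 5, 5],
    [10, 20, 0],
    [0, 5, 5],
]
scores = per_class_metrics(confusion, 0)

# Overall accuracy: correctly classified samples over all samples.
total = sum(sum(row) for row in confusion)
overall_accuracy = sum(confusion[i][i] for i in range(len(confusion))) / total
```

For class 0 in this toy matrix, TP = 50, FN = 10, FP = 10, and TN = 30, so precision and recall are both 50/60, while the overall accuracy is 75/100.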

3.1.2 Kappa coefficient

Cohen’s kappa coefficient helps evaluate multiclass classification problems with non-normal distributions. This criterion measures the agreement between classified data (Landis & Koch, 1977). Because our data was imbalanced, the kappa coefficient was calculated along with the other evaluation indices to measure the model’s performance effectively. This coefficient is expressed by Eqs. (8)–(10) as follows:

$${p}_0=\frac{TP+ TN}{Total\ observations}$$
(8)
$${p}_e=\left(\frac{TP+ FP}{Total\ observations}\right)\times \left(\frac{TP+ FN}{Total\ observations}\right)+\left(\frac{FN+ TN}{Total\ observations}\right)\times \left(\frac{FP+ TN}{Total\ observations}\right)$$
(9)
$$\kappa =\left({p}_0-{p}_e\right)/\left(1-{p}_e\right)$$
(10)

where p_0 denotes the observed relative agreement between the two sets of labels, while p_e is the probability of agreement by chance. Boundaries must be defined for the calculated coefficients in order to perform the evaluation. Although different performance levels have been suggested for interpreting the kappa coefficient, the scoring system proposed by Landis and Koch was adopted (Table 4). This system grades the kappa coefficient at six evaluation levels; although kappa can be negative when agreement is worse than chance, it typically falls between 0 and 1. A larger kappa coefficient represents higher efficiency and effectiveness of a model.
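
The calculation in Eqs. (8) to (10) can be sketched as follows. Eq. (9) is written for two classes; the function below assumes the usual generalization to K classes, in which the chance agreement is the sum over classes of the product of the row and column shares, and which reduces to Eq. (9) when K = 2. The counts are hypothetical.

```python
def cohen_kappa(matrix):
    """Cohen's kappa from a confusion matrix (rows: true, columns: predicted)."""
    total = sum(sum(row) for row in matrix)
    k = len(matrix)
    # Observed agreement, Eq. (8): share of samples on the diagonal.
    p0 = sum(matrix[i][i] for i in range(k)) / total
    # Chance agreement, Eq. (9): product of row and column shares per class.
    pe = sum(
        (sum(matrix[i]) / total) * (sum(row[i] for row in matrix) / total)
        for i in range(k)
    )
    # Eq. (10); undefined when pe equals 1 (all samples in a single class).
    return (p0 - pe) / (1 - pe)


# Hypothetical binary case: TP = 40, FN = 10, FP = 20, TN = 30.
binary = [[40, 10], [20, 30]]
kappa = cohen_kappa(binary)
```

Here p_0 = 0.7 and p_e = 0.6 × 0.5 + 0.4 × 0.5 = 0.5, so kappa = 0.2 / 0.5 = 0.4.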

Table 4 Agreement Kappa statistic measures for categorical data (Landis & Koch, 1977)

4 Results

The mandatory and non-mandatory trip patterns in the Washington metropolitan area were estimated in this study. The socioeconomic characteristics were extracted from the MWCOGTPB 2007–2008 data, and TAZ characteristics were included. Then, a DNN algorithm was formulated to predict trip patterns.

Figure 2 displays the mandatory trip pattern estimates as a confusion matrix. The vertical axis represents the real values (correct labels), and the horizontal axis represents the estimates. The color of each square represents the probability of correct estimations. Table 5 reports each class’s accuracy, precision, recall, and F1-score.

Fig. 2 The confusion matrix of the mandatory trips generation

Table 5 The evaluation metrics of mandatory trips pattern generation

A total of seven classes were predicted for mandatory trips, including “no mandatory trip.” The estimation accuracy for mandatory trips was 70.87%, which indicates promising performance. As the high recall and precision scores show, the model predicted individuals with no mandatory trips most accurately. Most of the mandatory trip groups with “1 work trip” and “1 school trip” were predicted inaccurately, while the individuals with “1 work trip” were estimated with high accuracy.

Individuals with work trips were rarely confused with individuals who had educational trips. This distinction could be attributed to their socioeconomic characteristics, e.g., age and occupational-educational position. In other words, individuals aged 0 to 18 are mostly students, so they are not expected to generate work trips. In many cases, individuals with two work trips were wrongly predicted as those with one work trip, but the model could differentiate the work from educational trips. In addition, the class of three work trips was mainly confused with other groups of work trips, suggesting that the model was relatively inefficient in recognizing the number of work trips.

Educational trips were the second group of mandatory trips. The pattern of one educational trip was predicted with reasonable accuracy; however, this pattern was confused with the “no mandatory trip” pattern and was rarely predicted as a work trip pattern. The patterns of “one work trip and one educational trip” and “one educational trip and one school trip” (patterns combining multiple purposes) had low estimation accuracies. The former was mostly confused with work trips, while the latter was recognized as school trips. Finally, the “one school trip” pattern had high accuracy.

Apart from the outputs of mandatory trip patterns, Table 6 and Fig. 3 present the estimation results of non-mandatory trips. The model yielded an overall prediction accuracy of 50.02% for non-mandatory trips. The patterns of “one non-mandatory trip” and “four or more non-mandatory trips” had the highest estimation accuracy, followed by the “two non-mandatory trips” pattern. In contrast, the “three non-mandatory trips” pattern had the lowest accuracy. This performance seems acceptable since non-mandatory trips involve a wide range of trip purposes, from buying gas to visiting relatives and recreation.

Table 6 The evaluation metrics of non-mandatory trips pattern generation
Fig. 3 The confusion matrix of non-mandatory trip pattern generation

As presented in Table 7, the kappa coefficient was 0.5853 for mandatory trips, which indicates moderate agreement on the Landis and Koch scale, while non-mandatory trips had a kappa coefficient of 0.3014, which indicates fair agreement, alongside an accuracy of 50.02%. This implies acceptable performance for both mandatory and non-mandatory trips.

Table 7 Cohen’s kappa score for mandatory and non-mandatory trips

5 Discussion

This study presents a novel deep learning framework for forecasting future travel demand. A DNN model was used to discover a meaningful relationship between socioeconomic characteristics and accessibility measures on one side, and trip patterns on the other. The proposed model is expected to outperform traditional random utility models in predicting future travel demand. Its predictions can be applied to census data to generalize and synthesize the trip behavior of the area’s entire population. In other words, the predicted patterns can be aggregated on a specific geospatial scale (for example, TAZ) to estimate trip production and attractions as the population distribution over demographic attributes (e.g., age, gender, income) and urban accessibility change over time. This will help in outlining planning and policy measures.
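
The aggregation step mentioned above can be sketched as a simple grouping of per-person predictions by zone. The snippet assumes that each predicted pattern class has already been mapped to a number of trips; the TAZ identifiers, the trip counts, and the function name are hypothetical.

```python
from collections import defaultdict


def aggregate_trips_by_taz(records):
    """Sum predicted trips per zone from (taz_id, predicted_trips) pairs."""
    totals = defaultdict(int)
    for taz_id, predicted_trips in records:
        totals[taz_id] += predicted_trips
    return dict(totals)


# Hypothetical (TAZ id, predicted number of trips) pairs for four individuals.
records = [(101, 1), (101, 2), (205, 0), (205, 3)]
trips_per_taz = aggregate_trips_by_taz(records)  # {101: 3, 205: 3}
```

Re-running the same aggregation on a synthetic population with updated demographics would give zone-level totals for a future year, under the assumption that the learned relationship remains stable.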

Given the accuracy gap between the prediction results for mandatory and non-mandatory patterns, it is apparent that modeling the two categories separately helped the deep learning model map socioeconomics to trip patterns in a more distinguishable manner, owing to the nature of each trip category and its relation to socioeconomics. The relation between socioeconomics and spatial features was more recognizable for mandatory trip patterns, which could be attributed to the role of each individual in the household. Socioeconomic characteristics such as age, income, and gender define this role and hence affect the creation of mandatory trips, with each category depending directly and distinctly on the assigned role. Additionally, mandatory trips are inflexible, meaning they are not conducted voluntarily. Therefore, the question of what pattern an individual has as a routine part of their transportation diary is easier to answer, because there is less flexibility and thus more certainty about the occurrence of these trips.

The reverse reasoning applies to non-mandatory trips. This category of trips can be made under less strict circumstances. They can assume an arbitrary form and are not necessarily based on a predefined or recurring schedule. This means that, even though socioeconomics plays a key role in the formation of trip patterns, it might not be as influential in the creation of non-mandatory trips. Whether and when non-mandatory trips occur is harder to relate to socioeconomics, so there is much less certainty regarding this category. This explains the lower prediction accuracy for non-mandatory trips.

Based on the literature, adding land use data to the input of similar machine learning models often improves the results, because land use and socioeconomic characteristics are the main impetus for creating trips. Unfortunately, we could not access the land use information of the area, so we trained the models on related features extractable from the data at hand (travel survey and accessibility measures). Thus, future work could use a more comprehensive set of inputs to improve the results.

With the rapid growth of big data technologies, especially GPS (Global Positioning System) data, the inference of socioeconomic information also seems a promising direction for future work. Socioeconomic information is one of the main inputs of travel demand models, and relating it to mobility data, which are continually and passively collected via sensors in our cellphones and cars, can reduce the cost of survey data collection as well as help transportation modelers and policy designers draw more meaningful conclusions on how mobility is linked to socioeconomic characteristics. Future studies could use state-of-the-art deep learning models to find such linkage.

6 Conclusion

The present study aimed to forecast trip patterns based on socioeconomic and TAZ characteristics. Once the mandatory and non-mandatory trip patterns and socioeconomic characteristics were extracted from the MWCOGTPB 2007–2008 survey data, a DNN was trained to classify these patterns. Mandatory trips included work, education, school, or a combination of such trips, while non-mandatory trips involved the remaining trips of the individuals. The model had an estimation accuracy of 70.87% for mandatory trips (seven groups of trips) and 50.02% for non-mandatory trips (four trip groups). The estimates of mandatory and non-mandatory trips were observed to be significantly different. Mandatory trip patterns with a single trip type were forecast more accurately than combined mandatory trips, regardless of the number of trips. In addition to accuracy, Cohen’s kappa coefficient was calculated to validate the model’s predictive performance. The results of this study showed that a deep learning algorithm could effectively recognize the correlation between socioeconomic features and trip pattern formation. The prediction results of this model can then be aggregated on a larger geospatial scale to estimate trip production and attractions as population distribution and urban accessibility change over time. This provides transportation modelers with a more accurate tool in the process of travel demand forecasting.