Introduction

GPS data for freight vehicles is increasingly available, due to the deployment of telematics to companies with sizeable fleets and fleet management firms. A widely known example of such data is the American Transportation Research Institute truck GPS dataset (Short 2014). According to some criteria, such as its large volume and by-product nature, GPS traces of freight vehicles can be considered as big data. Dedicated data collection efforts are also becoming more sophisticated, such as the integration of GPS-enabled devices and GPS-enabled digital freight surveys (Alho et al. 2018). As a result, more vehicle trajectory and stop-level data is available to further study freight vehicle movements.

However, the methods to process and analyse such data for use in freight transportation modelling have not been fully explored. In particular, there is a research gap regarding the conversion of freight vehicle GPS traces into tour-level data. The importance of this process is evident from the fact that tours are one of the adopted units for vehicle flow analysis and, furthermore, that freight vehicle tours and tour-chains are an integral element of state-of-the-art agent-based urban freight simulations (Hunt and Stefan 2007; de Bok and Tavasszy 2018). Well-structured and information-rich records of truck tours have the potential to enhance the replication of freight vehicle tour-chains in a simulation environment for policy analysis. The definition of methods for identifying tours directly contributes to tour-chain modelling (Jing et al. 2019), tour-based simulation case studies (Alho et al. 2019; Gopalakrishnan et al. 2019), and the identification of commodity flows and load factors (Alho et al. 2018).

Generally speaking, freight vehicle tours are more challenging to predict than passenger tours. For passengers, home and work are pivotal points around which tours and sub-tours occur. In contrast, a single freight vehicle might visit multiple overnight parking locations (Alho et al. 2018), which results in tour-chains having different start/end points at a daily level. Several other challenges are detailed by You et al. (2016), such as limited data availability and increased trip chaining behaviour (compared with passenger tours). It must be acknowledged that, ideally, data on the “ground truth” regarding stop-to-tour membership, tour-type and tour-chain would be collected. To the best of our knowledge, there is no consensus on the definition of a “freight vehicle tour”. In other words, the criteria that define the start and end of a tour are not well established, which further justifies the research in this paper. This research sets out to explore the output differences arising from the various assumptions in the stop-to-tour assignment process as well as in the tour-type and tour-chain identification processes. A descriptive analysis follows to illustrate such implications, with an application focused on the prediction of day-to-day pattern homogeneity and the differences across sub-populations. Follow-ups to the analysis in this paper include the exploration of concepts such as tour typology by, for example, the characteristics of the operator, commodity handled, vehicle type and transportation service, and tour topology (e.g., spatial tour characteristics such as spatial coverage and displacement) by sub-population.

The rest of this paper is organised as follows. The second section provides a literature review, covering the definitions of a set of terms that are still not standardised in this knowledge domain; the third section describes the data requirements for the analyses performed in this paper as well as the selected sample; the fourth section details the selected tour formation algorithms and the experimental setting; the fifth section presents the results of the experiments to compare the algorithms, as well as the prediction of day-to-day pattern homogeneity and the differences across sub-populations; the sixth section concludes this paper, summarising the insights obtained for data processing and modelling practice.

Literature Review

We define some terminology for the purposes of this research. Vehicle trip ends, also known as vehicle stops, can be grouped into those relevant and those nonrelevant to the analysis at hand. For example, stops for short breaks might be considered differently from those to deliver goods. Zhou et al. (2014) justify this classification method. The sequence of trips taken between two relevant stops is defined as a trip chain (Holguín-Veras and Patil 2005); the trip chains in Fig. 1 are ({a}, {b, c}, {d}, {e}, {f}). A tour consists of one or more trip chains. For illustrative purposes, in Fig. 1, assume a return to the relevant stop numbered “1” marks the end of a tour. Then, there are two tours in Fig. 1: the first composed of trip chains ({a}, {b, c}) and the second of trip chains ({d}, {e}, {f}). A tour-chain is defined as the set of tours that occurs within a day. We borrow this term from Ruan et al. (2012). In Fig. 1, the tour-chain is composed of two tours.
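The terminology above can be made concrete with a short sketch. It is only an illustration under stated assumptions: Fig. 1 is not reproduced here, so the stop ids other than "1", the non-relevant stop "x", and the function names are hypothetical and chosen so that the result matches the trip chains and tours listed in the text.

```python
from typing import List, Tuple

# (trip label, destination stop id, is the destination a relevant stop?)
Trip = Tuple[str, str, bool]


def build_trip_chains(trips: List[Trip]) -> List[Tuple[List[str], str]]:
    """Group consecutive trips into trip chains that end at a relevant stop."""
    chains, current = [], []
    for label, destination, relevant in trips:
        current.append(label)
        if relevant:
            chains.append((current, destination))
            current = []
    if current:  # trailing trips that never reached a relevant stop
        chains.append((current, trips[-1][1]))
    return chains


def build_tours(chains, anchor_stop: str) -> List[List[List[str]]]:
    """Close a tour each time a trip chain ends at the anchor stop."""
    tours, current = [], []
    for chain_trips, destination in chains:
        current.append(chain_trips)
        if destination == anchor_stop:
            tours.append(current)
            current = []
    if current:  # an unfinished tour at the end of the day
        tours.append(current)
    return tours


# Hypothetical day: stop ids "2", "3", "4" and the non-relevant stop "x" are assumed.
day_trips = [
    ("a", "2", True), ("b", "x", False), ("c", "1", True),
    ("d", "3", True), ("e", "4", True), ("f", "1", True),
]
trip_chains = build_trip_chains(day_trips)
tour_chain = build_tours(trip_chains, anchor_stop="1")  # all tours of the day
```

Here `tour_chain` holds two tours, the first made of the trip chains {a} and {b, c} and the second of {d}, {e} and {f}.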

Fig. 1 Example sequence of stops and trips

With regard to data processing, the critical first steps rely on the methods for identifying stops from trips and their purposes (Du and Aultman-Hall 2007; Greaves and Figliozzi 2008; Schuessler and Axhausen 2009; Sharman and Roorda 2011; Joubert and Axhausen 2011; Yang et al. 2014). Data might be collected to inform the activities and destinations (Alho et al. 2018). Even when such data is not available (i.e. only GPS traces are available), activities and destinations can be inferred. Examples are given by Sharman et al. (2012), who identify depots from the attributes of stops, and Sharman and Roorda (2013), who propose a process to assign parcel-level information about the destinations.

Following the identification of stops, there are two seminal quantitative studies that address stop-to-tour assignment. Liedtke and Schepperle (2004) briefly describe a fuzzy logic-based pattern recognition method to classify 1.7 million trip records into five tour-types, covering both urban and interurban trips. You and Ritchie (2018) propose a method to post-process GPS data for identifying freight vehicle tours and apply it to GPS traces of freight vehicles traveling from/to port facilities. Beziat et al. (2015) present tour-type identification by qualitatively defining 13 tour profiles based on interviews and a review of academic literature.

Several studies focus on the relationships between tour-type and tour-chain and driver, vehicle, shipment, and operator characteristics. Zhou et al. (2014) assume tour-types as per the total of deliveries performed in a day and show that commercial vehicle tour-types tend to be associated with commodity type, land use type, loading/unloading cargo weight and travel speed. Ruan et al. (2012) identify five major daily tour-chain-types based on the vehicle base location(s), tours per day and stops per tour. Using urban commercial vehicle survey data for tour-chaining choice model estimation, they analyse the relationship between tour-chain-types, cost and shipment characteristics. Khan and Machemehl (2017) also determine tour-chain-types as a function of a base location, tours per day and stops per tour. They estimate a model for determining tour-chain-types and the number of trips, based on a wide list of factors, using a multiple discrete–continuous extreme value model. The aforementioned studies do not reveal the details of the processes that lead to defining the tour-chain-types. Sharman and Roorda (2011) analyse day-to-day variations in terms of the overlap between the stop locations. While their data indicates that few destinations were visited on a daily basis, the research does not cover the regularity of tour-type or tour-chain-type in freight vehicle operations; such analysis could provide insights on vehicle operational homogeneity across vehicle type or industry type over a certain period, informing whether single-day sampling would be sufficient for obtaining data about the routine of freight vehicle operations. As for the usage of the processed data, You et al. (2016) present a modelling framework of freight flows with spatial–temporal constraints that relies on tour-level data for calibration.
Subsequently, You and Ritchie (2018) use tour-level data to explore tour-level behaviour of clean drayage trucks, revealing distinct travel patterns across days despite tour-types having repetitive patterns.

This review demonstrates the wealth of research that both contributes to and leverages tour-level analysis. In light of some gaps, we argue positively for a comparative study of methods applicable to stop-to-tour assignment, as well as of tour-type and tour-chain identification. In the present research, we aim to reveal (1) insights into the interpretation and inference of base locations; (2) the outcome of different assumptions in the algorithms to identify freight vehicle tours from stop chains; (3) the outcome of different assumptions in the algorithms to identify tour-type and tour-chain-types; and (4) day-to-day pattern homogeneity with regard to tour-type and tour-chain-type for a sample of tracked vehicles.

Data

The data used in this research consists of GPS traces and stop-level data obtained from a driver survey. The data collection process and the data collection platform, Future Mobility Sensing, are extensively described in Alho et al. (2018) and You et al. (2018), respectively. Stop-level data include stop purpose, location, duration, and cargo volume handled. The dataset includes records of 2151 driver-days with 497 unique drivers/vehicles. The sample is summarized by vehicle body type and by industry type served in Tables 1 and 2, respectively. It should be noted that the sample is not representative of the population of freight vehicles in Singapore and was not collected with such intention. The data is only used to showcase an application of the methods described below. Although the above-mentioned data is not big data in itself, the algorithms could be applied to big data, and the insights we intend to provide aim to inform a purposeful application. Stop-level data are often unavailable for GPS traces (Holguín-Veras and Patil 2005; Eluru et al. 2018), as driver surveys for stop-level data are costly. However, stop-level data are inferable (Sharman et al. 2012). Furthermore, for data collected by fleet monitoring systems, stop-purpose inference algorithms can leverage small surveys and/or Point of Interest data. Moreover, if GPS traces are collected by vehicle operators, these data could be matched with the activity and destination information based on their shipment records.

Table 1 Samples by vehicle body type
Table 2 Sample by vehicle operations industry

Methods

Stop Identification

The identification of stops is a two-step process using a custom-developed method (Zhao et al. 2015), which includes DBSCAN (Ester et al. 1996), a clustering algorithm. First, a stop detection algorithm is applied; we then aggregate raw stop records over the vehicle tracking period. Specifically, at the vehicle level, stop records at nearby coordinates (within 500 m) are considered the same stop. As mentioned earlier, we assume as relevant stops those for deliveries and/or pickups as well as those at a base. Any other stops are considered nonrelevant.
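The aggregation step can be illustrated with a minimal sketch. It is not the method of Zhao et al. (2015) and not a full DBSCAN: it simply merges, for one vehicle, stop records that are linked by great-circle distances of at most 500 m, so chains of nearby records end up in the same stop. The function names and the coordinates are hypothetical.

```python
import math
from typing import List, Tuple


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    radius = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def merge_stop_records(points: List[Tuple[float, float]], radius_m: float = 500.0) -> List[int]:
    """Return one stop label per record; records linked within radius_m share a label."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_m(*points[i], *points[j]) <= radius_m:
                parent[find(j)] = find(i)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


# Hypothetical raw stop records (latitude, longitude) of one vehicle.
records = [(1.3000, 103.8000), (1.3005, 103.8000), (1.3500, 103.8500)]
stop_labels = merge_stop_records(records)
```

The pairwise loop is quadratic in the number of records per vehicle, which is acceptable for a sketch; a spatial index or a library clustering routine would be preferable for large samples.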

Base Identification

The definition of “base” can vary across freight agents (such as shippers or carriers). This understanding seems to be shared by Ruan et al. (2012), who hypothesize multiple functions for the base, such as a “distribution center, a warehouse, a business location (e.g., retail store, construction site), or fleet operator’s home office/garage”. Furthermore, for a given driver/vehicle, there could exist multiple bases which differ not only in purpose but also in location. For example, a driver/vehicle might have a “parking base”, i.e. an overnight parking location, and a “pickup base”, i.e. the facility to which the vehicle returns multiple times during the day to pick up goods. Ruan et al. (2012) also propose tour-chain structures that consider multiple bases. Selecting either of these bases as the pivotal point of tours potentially leads to a different set of tours despite the same stops being visited in the same order. We aim to contribute to this identification process by exploring other algorithms that consider the purpose of stops and/or vehicle payload, as explained further below.

Base identification was previously addressed by Sharman and Roorda (2011) in the context of having no data apart from raw GPS traces. The authors evaluated the existence of bases (named ‘depots’) by considering several variables. Selection criteria were related to the percentage of stops performed at a given location within the study area and the average duration of the longest stop at such location on sampled days. Although the method is applicable to our case, our process differs as we attempt to leverage survey data first, providing an illustration of an alternative process.

In the driver survey we leverage, drivers had the option to declare a frequent stop (location) as a base, subject to their perception of what a base is. In our method, we first attempt to use declared bases over those identified using other methods, provided they are confirmed as “true” according to the criterion of “daily visits”. If there is a need for base identification, we first aim to use the locations where drivers change shifts daily and, only subsequently, locations where pickups occur daily. The main justification for the latter is that one of the stop-to-tour assignment methods uses pickup locations, and we wish to keep the applications distinct. The algorithm is described by the following high-level pseudocode, and further detailed in “Appendix 1”:

figure a
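As a rough illustration of the priority order described above (declared bases confirmed by daily visits, then locations with daily shift changes, then locations with daily pickups), a minimal sketch follows. It is not the pseudocode of the original figure or of Appendix 1; the input layout, the purpose labels "shift_change" and "pickup", and the function name are assumptions made for the example.

```python
from typing import List, Optional, Set, Tuple

# One day of stops for a driver: (location id, stop purpose)
DayStops = List[Tuple[str, str]]


def identify_bases(days: List[DayStops], declared: Set[str]) -> Tuple[Set[str], Optional[str]]:
    """Return the base location(s) of one driver and the rule that produced them."""

    def visited_daily(purpose: Optional[str] = None) -> Set[str]:
        # Locations visited on every observed day (optionally for a given purpose).
        per_day = [
            {loc for loc, p in day if purpose is None or p == purpose}
            for day in days
        ]
        return set.intersection(*per_day) if per_day else set()

    confirmed = set(declared) & visited_daily()
    if confirmed:
        return confirmed, "declared"
    for purpose in ("shift_change", "pickup"):  # assumed purpose labels
        found = visited_daily(purpose)
        if found:
            return found, purpose
    return set(), None  # no base identified; the driver would be excluded


# Hypothetical two-day record: the declared base "office" is never visited.
observed_days = [
    [("depot", "shift_change"), ("shop", "delivery")],
    [("depot", "shift_change"), ("mall", "delivery")],
]
bases, rule = identify_bases(observed_days, declared={"office"})
```

In this example the declared base is rejected because it is not visited daily, and the location with a daily shift change is used instead.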

Stop-to-Tour Assignment

We explore three algorithms for tour-type identification purposes, focusing on the regularity of activities, the type of activities performed, and the vehicle capacity usage. This is not an exhaustive list; other variables could be explored, but this is out of the scope of this research, as their applicability is less clear. All algorithms iterate over the stop sequences, inspecting the characteristics of each stop and assigning a sequential tour-identification number to it. These are:

  • Base-driven algorithm: A return to the identified base marks the end of a tour. This algorithm is most aligned with prior research applications such as You and Ritchie (2018) and Gopalakrishnan et al. (2019).

  • Purpose-driven algorithm: A pickup stop that follows a delivery marks the start of a new tour. This algorithm aligns to the case, where a vehicle returns, or heads to, one or more “operational” base(s) for picking up goods throughout the day. It has been applied by Jing et al. (2019) and Alho et al. (2019).

  • Capacity-driven algorithm: A pickup stop by an empty vehicle (i.e. the capacity usage is equal to zero) marks the start of a new tour. We expect some level of alignment between the outputs from the Capacity-driven and Purpose-driven algorithms, unless vehicles do pickups with some of the prior load still in the vehicle.

In all three algorithms, a stop that follows a prior stop with a duration of over 240 min is considered the start of a new tour. This threshold was defined similarly to past research (You and Ritchie 2018) and was observed in our data as a clear demarcation between stop durations during operation periods (e.g., those for rest, pickup, and delivery) and those during non-operation periods (e.g., overnight and over the weekend). The high-level pseudocode for the algorithms is given below and further detailed in “Appendix 2”.

The Base-driven algorithm pseudocode is:

figure b

The Purpose-driven algorithm pseudocode is:

figure c

The Capacity-driven algorithm pseudocode is:

figure d
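The three rules and the shared 240 min threshold can be sketched as follows. This is an illustration under stated assumptions rather than the pseudocode of the original figures or of Appendix 2: the `Stop` fields, the purpose labels, and the example sequence are hypothetical; the base stop is assumed to open the new tour; and "a pickup that follows a delivery" is read as a pickup after at least one delivery in the current tour.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stop:
    purpose: str            # assumed labels: "pickup", "delivery", "base"
    load_on_arrival: float  # capacity usage when arriving at the stop
    duration_min: float     # stop duration in minutes


def base_driven(stop: Stop, delivered: bool) -> bool:
    return stop.purpose == "base"


def purpose_driven(stop: Stop, delivered: bool) -> bool:
    return stop.purpose == "pickup" and delivered


def capacity_driven(stop: Stop, delivered: bool) -> bool:
    return stop.purpose == "pickup" and stop.load_on_arrival == 0


def assign_tours(stops: List[Stop], rule: Callable[[Stop, bool], bool],
                 max_gap_min: float = 240) -> List[int]:
    """Assign a sequential tour id to each stop of one vehicle."""
    ids, tour_id, delivered = [], 1, False
    for i, stop in enumerate(stops):
        if i > 0:
            long_break = stops[i - 1].duration_min > max_gap_min
            if long_break or rule(stop, delivered):
                tour_id += 1
                delivered = False  # no delivery yet in the new tour
        if stop.purpose == "delivery":
            delivered = True
        ids.append(tour_id)
    return ids


# Hypothetical stop sequence; stop 7 is a pickup with part of the load still on board.
sequence = [
    Stop("pickup", 0, 20), Stop("delivery", 10, 10), Stop("delivery", 5, 10),
    Stop("base", 0, 30), Stop("pickup", 0, 20), Stop("delivery", 10, 10),
    Stop("pickup", 4, 15), Stop("delivery", 9, 10),
]
base_ids = assign_tours(sequence, base_driven)
purpose_ids = assign_tours(sequence, purpose_driven)
capacity_ids = assign_tours(sequence, capacity_driven)
```

In this sequence the Purpose-driven rule opens a third tour at the pickup made with load still on board, whereas the Capacity-driven rule does not, which is the kind of divergence discussed in the text.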

Table 3 illustrates the application of the algorithms to a hypothetical case. It can be seen that, for the Base-driven algorithm, the tour-identification number (id) switches from 1 to 2 upon visiting the base. In this case, the increment is aligned with the non-sequential pickup in the Purpose-driven algorithm, whereas the Capacity-driven algorithm only triggers a tour id change when the vehicle capacity usage reaches zero (stop 8).

Table 3 Illustration of hypothetical application of stop-to-tour assignment algorithms

We compare the outputs of the three stop-to-tour algorithms in terms of the mean and standard deviation (SD) of the following indicators: (1) tours per day, (2) stops per tour, (3) tour duration (minutes), and (4) tour distance (kilometres). Our intention is not to reveal the best algorithm but rather to expose the implications of selecting them. To the best of our knowledge, there is no consensus on how to evaluate the effectiveness of the algorithms in revealing the “true” tours, since the concept is a human construct. In fact, as mentioned earlier, different interpretations of tours are used for different applications.
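As an illustration of how the first two indicators could be derived from stop-level tour ids, a minimal sketch follows. The input layout and the function name are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator is used.

```python
from collections import Counter
from statistics import mean, stdev
from typing import Dict, List, Tuple


def tour_indicators(tour_ids_by_day: Dict[str, List[int]]) -> Dict[str, Tuple[float, float]]:
    """Mean and sample SD of tours per day and of stops per tour."""
    tours_per_day, stops_per_tour = [], []
    for ids in tour_ids_by_day.values():
        counts = Counter(ids)  # tour id -> number of stops in that tour
        tours_per_day.append(len(counts))
        stops_per_tour.extend(counts.values())
    return {
        "tours_per_day": (mean(tours_per_day), stdev(tours_per_day)),
        "stops_per_tour": (mean(stops_per_tour), stdev(stops_per_tour)),
    }


# Hypothetical output of a stop-to-tour assignment for two driver-days.
assigned = {"day_1": [1, 1, 1, 2, 2], "day_2": [1, 1, 1, 1]}
indicators = tour_indicators(assigned)
```

Tour duration and tour distance would follow the same pattern, aggregating stop timestamps and trip lengths by tour id.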

Tour-Type and Tour-Chain Identification

In past research, tour-types and tour-chains have been defined considering base location, tours per day and stops per tour (Ruan et al. 2012; Khan and Machemehl 2017). This is considered a valid method but potentially relies on a subset of the relevant variables. Therefore, we also explore stop purposes and their recurrence, as well as the regularity of stop locations. Regarding stop-purpose recurrence, the distinction is made in terms of the number of pickups relative to the number of deliveries. It is based on the hypothesis that certain combinations of stop-purpose recurrence are strongly correlated with operational characteristics of freight movements. Regarding the regularity of stop locations, we make a distinction between stop locations that are fixed and those that are unfixed. Fixed stop locations are those visited more than once in a day, while unfixed ones are visited only once in a day.

Table 4 Tour-type identification criteria
Table 5 Tour-chain identification criteria

The definitions of tour-types and tour-chains to be explored are shown in Tables 4 and 5, respectively. The identification process primarily categorizes tours and tour-chains into the following four groups:
  • Direct tours that consist of one pickup and one delivery, associated with full truck load (FTL) shipments. Past research indicates that this tour-type is associated with longer distances travelled and larger dwell times (Ruan et al. 2012).

  • Unloading tours that consist of one pickup and more than one delivery, associated with less than truckload shipping (LTL) (e.g., parcel deliveries).

  • Loading tours that consist of more than one pickup and one delivery, associated with operations such as those of waste collection.

  • Mixed tours that consist of multiple pickups and deliveries, and can be associated with delivery tours that also collect returned shipments.
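The four groups listed above can be expressed as a simple classification on the counts of pickups and deliveries in a tour. The sketch below covers only that count-based distinction; the additional criteria of Table 4 (such as fixed versus unfixed stop locations) are not reproduced, and the function name and the fallback label for tours outside the four groups are assumptions.

```python
def classify_tour(n_pickups: int, n_deliveries: int) -> str:
    """Assign a tour to one of the four groups from its stop-purpose counts."""
    if n_pickups == 1 and n_deliveries == 1:
        return "Direct"
    if n_pickups == 1 and n_deliveries > 1:
        return "Unloading"
    if n_pickups > 1 and n_deliveries == 1:
        return "Loading"
    if n_pickups > 1 and n_deliveries > 1:
        return "Mixed"
    return "Non-identifiable"  # e.g. a tour without any pickup or delivery


# Hypothetical tours given as (pickups, deliveries).
examples = [(1, 1), (1, 5), (4, 1), (3, 6), (0, 2)]
groups = [classify_tour(p, d) for p, d in examples]
```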

To identify the tour-chain, two algorithms are considered:

  • Tour-type-based identification (TT): The tour-chain is identified based on the types of tours performed in a day. If a tour-type accounts for at least 60% of all tours within a day, the tour-chain is labelled with that tour-type. The 60% threshold is set on the assumption that when two tours of different types are performed in a day, there is no predominant type.

  • Tour-chain-based identification (TC): This alternative algorithm characterizes the tour-chain at the day level. Instead of using the predominant tour-type, the algorithm reads the stop-to-tour assignments, averages the stops per tour by purpose and then identifies the tour-chain for the day.

Note that tour-chain groups are also defined, consistent with the ratios between #Pickups/tour and #Deliveries/tour, for achieving direct comparisons between algorithms and further use in day-to-day pattern homogeneity analysis.
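The difference between the TT and TC identification can be sketched as follows. The code is an illustration, not the authors' implementation: the criteria of Tables 4 and 5 are not reproduced, so tours are grouped only by pickup and delivery counts, and the TC variant assumes that an average of exactly one stop per tour counts as "one" and an average above one counts as "more than one". All names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Tour = Tuple[int, int]  # (number of pickups, number of deliveries)


def group(pickups: float, deliveries: float) -> str:
    """Count-based group; also applied to per-tour averages in the TC variant."""
    if pickups == 1 and deliveries == 1:
        return "Direct"
    if pickups == 1 and deliveries > 1:
        return "Unloading"
    if pickups > 1 and deliveries == 1:
        return "Loading"
    if pickups > 1 and deliveries > 1:
        return "Mixed"
    return "Non-identifiable"


def tour_chain_tt(tours: List[Tour], threshold: float = 0.6) -> str:
    """TT: label the day with the tour-type reaching the share threshold."""
    if not tours:
        return "Non-identifiable"
    label, count = Counter(group(p, d) for p, d in tours).most_common(1)[0]
    return label if count / len(tours) >= threshold else "Non-identifiable"


def tour_chain_tc(tours: List[Tour]) -> str:
    """TC: average pickups and deliveries per tour, then identify the group."""
    if not tours:
        return "Non-identifiable"
    avg_pickups = sum(p for p, _ in tours) / len(tours)
    avg_deliveries = sum(d for _, d in tours) / len(tours)
    return group(avg_pickups, avg_deliveries)


# Hypothetical day with one Direct and one Unloading tour.
day = [(1, 1), (1, 3)]
tt_label = tour_chain_tt(day)
tc_label = tour_chain_tc(day)
```

With two tours of different types, TT finds no predominant type while TC still assigns a group, which mirrors the reasoning given for the 60% threshold.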

Day-to-Day Pattern Homogeneity Analysis

The day-to-day pattern homogeneity analysis is used as a partial demonstration of how differences in the assumptions can lead to differences in outputs. Moreover, it provides insights on whether there is some level of homogeneity on the patterns performed by the vehicles.

We propose to use an entropy concept (Eq. 1) to quantify day-to-day pattern homogeneity. In past research, the entropy concept was applied to calculate the diversity of commercial establishment functions (Alho and de Abreu e Silva 2014). In this research, we apply it to measure the diversity of tour patterns. P_j is the proportion of tours (or tour-chain groups) of type j. J is defined in this case as the number of tour-types or tour-chain groups, depending on the application. This indicator is normalized and, therefore, ranges between one (an equal share across tour-types or tour-chain groups) and zero (the presence of only one tour-type or tour-chain group):

$$\text{Entropy} = \sum_{j} \frac{\left| P_{j} \times \ln\left( P_{j} \right) \right|}{\ln\left( J \right)} \qquad (1)$$

The entropy is measured for the outputs of both tour-chain-type identification algorithms (TT and TC). It should be noted that the method is applied differently in the two cases. For TT, we group tours by tour-type across the observed period, since the generalization at the daily level (i.e. the definition of the tour-chain) lowers the resolution of the inputs. For TC, we simply use the tour-chain-type group. Thus, TT is applied from the perspective of all tours over the period (i.e., one or more tour-types per day), while TC inputs are at the daily level (i.e., one tour-chain-type group per day).
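Eq. 1 can be computed directly from a list of tour-type (or tour-chain group) labels. The following is a minimal sketch; the function name and the example labels are hypothetical, and the number of categories J is passed explicitly because the text leaves it dependent on the application.

```python
import math
from collections import Counter
from typing import Iterable


def normalized_entropy(labels: Iterable[str], n_categories: int) -> float:
    """Eq. 1: sum over observed types of |P_j * ln(P_j)|, divided by ln(J)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0 or n_categories < 2:
        return 0.0  # entropy is not defined; treated as fully homogeneous
    raw = sum(abs((c / total) * math.log(c / total)) for c in counts.values())
    return raw / math.log(n_categories)


# Hypothetical labels for one vehicle over the observed period, with J = 4.
homogeneous = normalized_entropy(["Direct"] * 5, n_categories=4)
diverse = normalized_entropy(["Direct", "Unloading", "Loading", "Mixed"], n_categories=4)
```

Types that never occur contribute nothing to the sum, which matches the convention that P ln(P) tends to zero as P tends to zero.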

Software and Hardware

To process the data, we use scripts written in the Python programming language. The selected hardware was an Intel Core i7-7700 CPU @ 3.60 GHz processor with 32 GB RAM. In the current experimental setting, the algorithms run under a batch processing model, with run times as listed in Table 6. The algorithms are compatible with the latest developments in the FMS platform (You et al. 2018), which has been developed to process and display GPS data collected from several types of loggers (stand-alone, built-in smartphones, and tablets). The scripts can be coupled with the platform under a streaming model.

Table 6 Tasks and run time

Results

Base Identification

As mentioned earlier, we start with a comparison between bases declared by the respondent and those detected from the GPS traces by the detailed algorithm. This process allows us to clarify whether there is some common understanding of “base locations” among drivers and whether revealed data is valuable when compared with inferred data.

In total, 500 drivers declared 1072 frequent stops, of which 502 were marked as bases, an average of one base per respondent. Of the declared bases, 66% were locations where drivers start/end the work shift, followed by locations where cargo is picked up (16%). On the other hand, 63% of the non-base frequent places were associated with locations where cargo and/or trailers are picked up. This reveals that some level of common understanding of the concept of “base” exists: it is where the work shift starts/ends, and not necessarily where regular pickups are performed. Despite this, and contrary to expectations, frequent stops were found to be visited sparingly. During the period of the survey (5 days), 35% of declared base frequent places and 6% of declared non-base frequent places were visited, which points to a fundamental flaw in the process of either recalling or reporting information. This result indicates that declared bases (or frequent places) are not suitable as reference points to identify tours.

Following these conclusions, we set out to use revealed locations that were often visited as bases. Out of 8718 detected stops (i.e. clusters of raw stop records), 1186 were visited every day. Out of these stops with non-mutually exclusive activities, 88% are associated with a start/end work shift activity, 42% with a pickup and 40% with a delivery. Subsequently, a single base was identified for 93% of drivers, and two or more bases were identified for 2% of drivers. The records for which no bases were identified (5% of drivers) were excluded from the following steps of the analysis. For those drivers with base(s) identified, 34% were from declared bases, 57% were from revealed locations where drivers change shift frequently, and 9% were from revealed locations where drivers perform pickups frequently.

Stop-to-Tour Assignment

Table 7 shows the results across the different stop-to-tour assignment algorithms. Relatively similar results can be observed for the Purpose-driven and Capacity-driven algorithms, whereas the Base-driven algorithm leads to different outcomes. As expected, the latter leads to a smaller average number of tours, with more stops per tour, since the bases are, in many cases, the locations where the drivers start/end the work shift (Fig. 2). The alignment between the Purpose-driven and Capacity-driven algorithms is mainly due to the fact that most vehicles return/head to the next pickup location with an empty load (i.e., perform full truck load operations). The choice of algorithm has an influence on the tour-level indicators, and the Base-driven algorithm is prone to reveal longer tours, potentially including loading/unloading operations within the tour-chain.

Table 7 Comparison of stop-to-tour assignment algorithm results
Fig. 2 Frequency of stops per tour for each algorithm

Tour-Type and Tour-Chain Identifications

Tables 8 and 9 show the results for tour-type and tour-chain identifications, respectively, using TT and TC. In Table 8, the Tours column shows the share of each tour-type identified, whereas the Tour-chain column illustrates the outcome of the TT process using the 60% threshold.

Table 8 Tour-chain identification results using TT
Table 9 Tour-chain identification results using TC

The results of the TT application provide interesting insights. The Base-driven algorithm, associated predominantly with change-shift locations, leads to a much higher share of Mixed tours, i.e. tours containing multiple pickups and deliveries, than the Purpose-driven and Capacity-driven algorithms. The tours identified by the Purpose-driven algorithm are more compatible with the tour-type alternatives, since it relies on the stop activity purpose. The Capacity-driven algorithm was expected to produce similar results to the Purpose-driven algorithm, as prior data analysis revealed that vehicles load fully at the pickup stops, and this seems to hold in many cases. For these algorithms, it can be noticed that there is a larger share of Non-identifiable tour-chains than of Non-identifiable tours. This is due to situations where there is no predominant tour-type, particularly associated with the large share of days shown as having two tours in Fig. 2.

Regarding the TC algorithm, the results follow a similar pattern, particularly in the alignment between the results for the Purpose-driven and Capacity-driven algorithms. As expected, the share of Mixed and Non-identifiable tours reveals that the Base-driven algorithm does not allow a full understanding of the patterns of pickups and deliveries performed by the vehicles.

There are interesting algorithm-to-algorithm comparisons that can be drawn; they are demonstrated here for an application of the Purpose-driven method. Regardless of whether the TT or TC algorithm is applied, most tour-chains belong to the same group (Direct, Unloading, Loading, or Mixed). Specifically, matches are 94% for the Direct group, 86% for Unloading, 86% for Loading, and 89% for Mixed. Differences are found in Non-identifiable tours, with 37% matched across algorithms.

An advantage of the TC algorithm is the ability to reveal that direct tour-chains have different natures, even for relatively homogeneous samples like the one used. For example, considering tour-chains in the direct group, and using the Purpose-driven algorithm, 39% of the cases are “Unfixed pickup, unfixed delivery”. This reveals a non-negligible share of tours that are not what direct tours are intuitively associated with; it is often thought that, in many cases, a fixed distribution center is used for full truck load shipments to several destinations. Another interesting finding is that the TC algorithm produces results that reveal: (a) less predominance of direct tours and (b) a smaller share of Non-identifiable tour-chains. This was expected, since averaging out stops per tour “smooths” tour heterogeneity at the daily level. The application of TC allows for a decrease of approximately 63% in the tour-chains that TT could not identify. Lacking “ground truth”, no process can ultimately be deemed correct or wrong. However, if higher output resolution is deemed desirable, the TC algorithm seems to be more suitable.

Next, we detail the results from the perspective of industry served and vehicle body type, selecting the Purpose-driven algorithm due to its better fit to the TC/TT methods as well as its high replicability potential compared with the Capacity-driven algorithm. We aggregated the outputs of the TC algorithm using the previously defined groups to allow for a comparison with the TT outputs. Similar proportions of tour-chains, at the group level, can be observed for the sampled vehicle-associated industry types (Fig. 3), with most industries operating on direct tours. The retail industry stands out (despite the small sample size), as over 70% of its tour-chains are Loading and Unloading tours. Regarding the tour-types associated with the sampled vehicle types (Fig. 4), direct tour-chains were also predominantly observed in most cases, other than Refrigerated Vehicles and Vans. This is not surprising, since about half of the vans and one-third of the refrigerated vehicles serve the retail industry.

Fig. 3 Tour-chain-type groups and industry served by vehicle (top: TT; bottom: TC)

Fig. 4 Tour-chain-type groups and vehicle body type (top: TT; bottom: TC)

Day-to-Day Pattern Homogeneity Analysis

In the application of the tour identification with the Purpose-driven method, we compare the day-to-day pattern homogeneity of the outputs from algorithms TT and TC, disaggregated by associated industry type (Table 10) and vehicle body type (Table 11). Industry and vehicle body types that show high (or low) entropy values are consistent between the two algorithms, but entropy values tend to be higher for the application of TT than for that of TC.

Table 10 Entropy of tours quantified across industries served by vehicles
Table 11 Entropy of tours quantified across vehicle body types

Looking at the tour-type homogeneity, in terms of industries served, construction is an example of having more homogeneous tour-types. With regard to vehicle body types, vehicles predominantly associated with construction also demonstrate this behaviour, e.g., Tipper/Dump Trucks.

Conclusions

A research gap has been identified regarding the methods of processing freight vehicles’ GPS data into tour and tour-chain data. The main steps to which this research aims to contribute are stop-to-tour assignment as well as tour-type and tour-chain-type identification. In this paper, we explored several algorithms for such objectives, i.e., assigning stops to tours and identifying tour-types and tour-chain-types, and compared their outputs, highlighting differences. A major finding about the identification of bases, a critical input to one of the stop-to-tour assignment algorithms (Base-driven), was that declared data regarding bases might not be as accurate as inferred data. This finding holds despite a common understanding that the base is mostly associated with the place where drivers change shift, which is not necessarily a pickup location. This also supports arguments for not fully relying on survey data, a situation that is likely to occur in applications to big data. Our analysis also revealed that most vehicle operations were associated with a base visited daily, in line with the most common assumptions in the literature regarding tour start/end points. However, these findings also affect the application of the subsequent tour-type and tour-chain-type identification algorithms. The Base-driven and Purpose-driven algorithms relied on different types of pivotal points to trigger the start of a new tour, resulting in considerably different outputs (tour counts and number of stops per tour). Such a difference resulted in a low compatibility of the tours identified using the Base-driven algorithm with the selected tour-type and tour-chain-type identification methods. Despite this, we highlight that this conclusion could be related to the nature of the data used in the application, and further applications are recommended. Had an alignment between identified bases and pickup locations been achieved, the results would be expected to differ.
Throughout this paper, several differences in outputs arising from a combination of the methods selected and the data at hand have been exposed. Ultimately, our findings indicate that researchers should exercise due diligence in selecting algorithms and provide clear descriptions of the selected pre-processing steps, so that readers can better understand the potential implications of the assumptions.