Understanding the urban life pattern of young people from delivery data

Young people are the backbone of urban development and an important pillar of social stability. The growth of the young floating population in China has given rise to urban land expansion. Understanding the urban life patterns of young people supports rational and effective land expansion. In this article, we introduce food delivery data into the exploration of the behavioral patterns of urban youth in Hangzhou, Zhejiang Province, China. The dynamic time warping (DTW) distance-based k-medoids method is applied to explore the main activity areas and activity patterns of the urban youth population. The results indicate that many young people in Hangzhou work in Internet companies, and most work hotspot areas are observed in high-tech parks. The results also provide evidence of overtime work. Combined with housing price data in Hangzhou, the analysis finds that young people consider both housing prices and the educational environment when choosing where to live. The analysis combined with road network data reflects the planning characteristics of the city and reveals differences between the actual city functions and the planning map. The proposed methods can provide useful guidance and suggestions for city planning.


Introduction
Many young people have moved from rural areas to urban areas in the past two decades (Luo et al., 2018). The younger generation of rural migrant workers has become the bulk of the so-called floating population in China (Tang et al., 2020). The increase in the floating population has given rise to urban land expansion, but expansion without guidelines is undesirable. Rational and effective land expansion is an important part of urban management (Luo et al., 2018). Thus, it is an important task for urban planners to understand the main activity areas and activity patterns of urban youth. Jurji et al. (2018) investigated the participation patterns of youth in physical and sporting activity in Kuala Lumpur so that the government or local authorities can take action for the physical and mental health of youths. Louzado et al. (2021) identified the factors related to the quality of life of young workers in Northeast Brazil. Matt et al. (2020) explored the measures for young talent attractiveness in urban factories.
With the widespread use of smartphones, GPS, and cellular networks, massive location-based data have been recorded by service providers, including taxi trace data (Pan et al., 2013), public bicycle data (Bao et al., 2017; Kaltenbrunner et al., 2010; Vogel et al., 2011) and phone call data (Pei et al., 2014; Toole et al., 2012). These data have become an important source for research on urban spatial dynamics owing to their advantages in coverage, spatiotemporal resolution and cost, helping us to recognize the relationship between human activity patterns and urban function. Joewono et al. (2017) explored the action space of young workers in Bandung City, Indonesia using travel chain data, providing basic information to anticipate future city development. In addition, some scholars use social media data to explore the behavioral patterns of urban youth. Tsou et al. (2013) used social media and web search engine data to explore the information spreading patterns of youth. Zhi et al. (2016) identified clusters with significant spatiotemporal characteristics in cities by analyzing about 15 million social media check-in records. However, many geospatial data that reflect the characteristics of human activities have their limitations. Taxi data depend largely on the road network, and taxis are usually used for long-distance trips, accounting for only a small portion of overall crowd flow (Liu et al., 2012). Public bicycle data only reflect information near the supply stations, and phone call data are restricted by the locations of base transceiver stations. Moreover, these data contain information on the activities of people of all ages and do not focus on specific age groups. Social media data have been used for urban youth studies, but social media users are not necessarily a representative sample (Martí et al., 2019).
In recent years, food delivery has become popular among the young generation. In China, the food delivery industry reached a turnover of 603.5 billion yuan in 2020. Moreover, of the 300 million food delivery users in China, about 75% are under 35 years old (Iimedia Report, 2018), which shows that most food delivery users are young people. Yan et al. (2020) classified delivery data time series using a Euclidean distance based k-means method to study young people's work and residential areas.
In this article, we used dynamic time warping (DTW) distance based k-medoids clustering algorithm (Chen et al., 2017) to discover the information contained in food delivery data that reflects the behavioral patterns of urban youth. Chen et al. (2017) suggested that the DTW distance based method outperforms the Euclidean distance based method in delineating urban functional areas. We explored the main activity areas and activity patterns of the urban youth population in order to understand the behavior patterns of urban youth, which can support urban planning. The locations of residents and firms affect commuting traffic. Restaurants or entertainment venues targeting younger users want to locate in areas where people congregate. It is important to understand the preferences of young people in choosing where to live and work so that urban planning can be adjusted to help attract young talent to the city.
The remainder of this paper is organized as follows. Section 2 introduces the study area and data source. Section 3 discusses the steps of using the DTW distance based k-medoids method to extract urban functional areas. Section 4 analyzes the hotspots of working and living places for young people, and further examines the function distribution properties at the traffic analysis zone (TAZ) scale. The article concludes with a summary as well as directions for further research in Section 5.

Study area
Urban areas are places with high population densities and attract a large number of young workers (Peri, 2002). In order to explore the job-residence characteristics of young people in cities, we selected six districts in the main city area of Hangzhou in Zhejiang Province, a populous city on the east coast of China, as our study area (Fig. 1). Hangzhou is an ancient capital south of the Yangtze River, famous for its beautiful scenery and cultural heritage. Since the beginning of the twenty-first century, the rise of the digital economy has driven a new round of urban development in Hangzhou. Relying on Alibaba and other leading Internet enterprises, Hangzhou has been at the forefront of digital development in China. The administrative boundaries of Hangzhou were downloaded from AutoNavi's open map API (https://developer.amap.com/api/webservice/guide/api/district).

Grids and road network
To generalize the order data, we use grids and the road network as two basic research units.
The 0.001-degree (approximately 100 m) grids provide a high spatial resolution for further analysis, but grids themselves have limited geographic meaning. Thus, road networks (Fig. 2) were introduced into our study. The road networks were constructed from tertiary roads, which can be accessed on OpenStreetMap (https://www.openstreetmap.org/).

Food delivery data
Food delivery orders can express young people's job-residence patterns, because the destinations of orders change depending on whether they are at home or at work. Our study used 6,104,862 food delivery orders placed between July 28 and September 26, 2017 (Fig. 3), which were provided by Dianwoda (https://www.dianwoda.com/), an instant logistics platform.
The order data contain departure and arrival coordinates, which form an OD pair for every order (Table 1). The latitude and longitude provide spatial information, and the arrival datetime makes it possible to compose time series of delivery orders.

POIs
A POI (point of interest) is a specific point location that someone may find useful or interesting. As POIs have specific social functions, the spatial distribution of POI categories can, to some extent, explain the function of regions (Gao et al., 2017).
The POI data was provided by NavInfo, a company with core businesses in HD map and high accuracy positioning.

House price data
The young generation in China is under pressure from high housing prices (Ma, 2018). In order to explore the connection between where young people live and the house prices there, we use house price data from the website

Time series construction
In order to extract the work and live areas of urban youth, we focus on the delivery destination points. The study area is divided into equal grids with a spatial size of 0.001 degrees (approximately 100 m). The destination points have longitude, latitude and arrival time, so they can be aggregated at the grid level. The hourly count of destination points in a grid is used to form a time series.
To reduce the computational burden, the time series is averaged for weekdays, Saturdays and Sundays. This grouping is chosen based on the dynamic time warping (DTW) (Sakoe & Chiba, 1978) distance between the normalized timelines of different days of the week (Table 2). A larger distance indicates a greater difference between two time series. The patterns of Monday, Tuesday, Wednesday, Thursday and Friday are much more similar to each other than to those of Saturday and Sunday. The large distance between Saturday and Sunday indicates that they have separate patterns. Thus, the averaged weekday, Saturday and Sunday profiles form a 72-h series.
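The aggregation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `grid_id` and `hourly_counts`, the grid origin at (120.0, 30.0) and the three sample orders are all assumptions made for the example.

```python
import math
from collections import defaultdict
from datetime import datetime

CELL_DEG = 0.001  # grid size stated in the text (approximately 100 m)


def grid_id(lon, lat, origin_lon, origin_lat, cell=CELL_DEG):
    """Return the (column, row) index of the grid cell containing a point."""
    col = math.floor((lon - origin_lon) / cell)
    row = math.floor((lat - origin_lat) / cell)
    return col, row


def hourly_counts(orders, origin_lon, origin_lat):
    """Count destination points per grid cell and per (weekday, hour) slot.

    `orders` is an iterable of (longitude, latitude, arrival_datetime).
    weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for lon, lat, arrival in orders:
        cell = grid_id(lon, lat, origin_lon, origin_lat)
        counts[cell][(arrival.weekday(), arrival.hour)] += 1
    return counts


# Hypothetical destination points (not taken from the real dataset).
orders = [
    (120.1534, 30.2746, datetime(2017, 7, 31, 12, 5)),   # Monday, lunch
    (120.1537, 30.2741, datetime(2017, 7, 31, 12, 40)),  # same cell and hour
    (120.2105, 30.2102, datetime(2017, 8, 5, 19, 15)),   # Saturday, dinner
]
counts = hourly_counts(orders, origin_lon=120.0, origin_lat=30.0)
print(dict(counts[(153, 274)]))
```

Points that fall exactly on a cell boundary can land on either side because of floating-point rounding; a production pipeline would more likely use projected coordinates or integer arithmetic.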
As the order data consist of scattered individual points, we first match the destination points to grids and then merge the data by timestamp on an hourly basis. As a result, we get a 24-dimensional vector $V_i$ for each unit:
$$V_i = (d_1, d_2, \ldots, d_{24})$$
where $d_t$ is the hourly density of arrival orders. In this way, a 72-dimensional vector $V'_i$ can be constructed from three 24-dimensional vectors:
$$V'_i = (V_w, V_{sa}, V_{su})$$
where $V_w$ is the 24-dimensional vector of a typical workday, $V_{sa}$ is the 24-dimensional vector of Saturday and $V_{su}$ is the 24-dimensional vector of Sunday. Last, we normalize the vectors by min-max normalization. The normalized vectors $V''_i$ reflect each unit's weekly trend of food delivery orders and are then used in the DTW distance based k-medoids method.
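To make the vector construction concrete, the following sketch averages daily 24-hour profiles, concatenates the weekday, Saturday and Sunday averages into a 72-value vector, and applies min-max normalization. The helper names and the toy profiles are assumptions for illustration, and returning zeros for a constant series is a choice of this sketch rather than something stated in the text.

```python
def average_profile(daily_profiles):
    """Average a list of 24-value daily profiles hour by hour."""
    n = len(daily_profiles)
    return [sum(day[h] for day in daily_profiles) / n for h in range(24)]


def min_max(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant series: no spread to normalize (sketch choice)
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def weekly_vector(weekday_days, saturdays, sundays):
    """Build the normalized 72-value vector (weekday, Saturday, Sunday)."""
    raw = (average_profile(weekday_days)
           + average_profile(saturdays)
           + average_profile(sundays))
    return min_max(raw)


def profile(**hour_counts):
    """Toy helper: a 24-hour profile that is zero except at the given hours."""
    day = [0] * 24
    for key, count in hour_counts.items():
        day[int(key[1:])] = count  # keys look like h12, h18
    return day


# Hypothetical counts for one grid cell.
weekdays = [profile(h12=10, h18=4), profile(h12=6, h18=4)]
saturdays = [profile(h12=2, h18=2)]
sundays = [profile(h12=1)]

v = weekly_vector(weekdays, saturdays, sundays)
print(len(v), v[12], v[18], v[36], v[60])
```

In this toy cell the weekday lunch peak dominates the normalized vector, which is the kind of shape the text associates with work areas.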

DTW distance based k-medoids method
Dynamic time warping is used to measure the similarity of time series data. Let $A = \langle a_1, a_2, \ldots, a_i, \ldots, a_I \rangle$ and $B = \langle b_1, b_2, \ldots, b_j, \ldots, b_J \rangle$ be two time series. An $I$-by-$J$ matrix is constructed in which the element $d_{ij}$ indicates the distance between the two points $a_i$ and $b_j$.
A set of neighboring matrix elements is chosen to form a warping path $W$, which indicates the mapping from pattern $A$ to $B$. It is defined as $W = w_1, w_2, \ldots, w_k, \ldots, w_K$, where $\max(I, J) \le K < I + J - 1$ and $w_k = (i_k, j_k)$.
The warping path is subject to three restrictions: the monotonic condition, the continuity condition and the boundary condition. The monotonic condition requires that $i_{k-1} \le i_k$ and $j_{k-1} \le j_k$. The continuity condition requires that $w_k$ and $w_{k-1}$ lie in adjacent cells (diagonally adjacent cells are included). The boundary condition requires $w_1 = (1, 1)$ and $w_K = (I, J)$, so that the warping path starts and finishes in diagonally opposite corner cells of the matrix. Among all the paths that satisfy these conditions, the one that minimizes the warping cost is chosen:
$$DTW(A, B) = \min_{W} \sum_{k=1}^{K} d_{i_k j_k}$$
The warping cost is the DTW distance between two time series. This optimal cost can be calculated using dynamic programming to find the minimum of the cumulative distances:
$$D(i, j) = d_{ij} + \min\{D(i-1, j-1),\ D(i-1, j),\ D(i, j-1)\}$$
The overall similarity is given by the value of $D(I, J)$. In order to find the similarity, DTW "warps" the time axis of one (or both) series for a better alignment (Keogh & Pazzani, 2001). However, this warping cannot be unconstrained in our study since we need to focus on the behavioral pattern in specific time periods such as lunch and dinner. Thus, we strengthen the boundary restriction by aligning the two time series every six hours.
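The dynamic programming recursion can be sketched in a few lines. Two assumptions are made here and are not confirmed by the text: the point distance is taken as the absolute difference between values, and "aligning every six hours" is interpreted as running DTW independently on consecutive six-hour segments and summing the costs. The function names are illustrative.

```python
def dtw(a, b):
    """DTW distance via the cumulative-distance recursion.

    Assumes the point distance d_ij = |a_i - b_j|.
    """
    n_a, n_b = len(a), len(b)
    inf = float("inf")
    # D has one padding row and column so the boundary case needs no branch.
    D = [[inf] * (n_b + 1) for _ in range(n_a + 1)]
    D[0][0] = 0.0
    for i in range(1, n_a + 1):
        for j in range(1, n_b + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n_a][n_b]


def segmented_dtw(a, b, segment=6):
    """Sum of DTW costs over aligned segments (one reading of the
    six-hour alignment; warping cannot cross a segment boundary)."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return sum(dtw(a[s:s + segment], b[s:s + segment])
               for s in range(0, len(a), segment))


a = [0, 0, 1, 2]
b = [0, 1, 2, 2]
print(dtw(a, b))               # unconstrained warping absorbs the shift
print(segmented_dtw(a, b, 2))  # segment boundaries limit the warping
```

With unconstrained warping the one-step shift between `a` and `b` costs nothing, whereas the segmented variant penalizes it. This illustrates why a tighter alignment keeps a lunch peak from being matched to a peak in a different part of the day.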
K-medoids (Kaufman & Rousseeuw, 2009) is a well-known clustering method. It is closely related to the k-means (MacQueen, 1967) algorithm. It is more robust than k-means since its center is an actual object that minimizes the dissimilarities among data objects, instead of the mean position of the objects in the cluster (Arora and Varshney, 2016). In the DTW distance based k-medoids method, the similarity of objects is measured by the DTW distance. Objects are more similar to each other when the DTW distance between them is smaller.
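Because k-medoids only needs pairwise dissimilarities, it can run directly on a precomputed distance matrix such as one filled with DTW distances. The sketch below uses the simple alternating scheme (assign objects to the nearest medoid, then re-pick each cluster's medoid). This is an assumption for illustration and not necessarily the exact procedure used in the study. Initial medoids default to the first k objects to keep the example deterministic.

```python
def assign(dist, medoids):
    """Label each object with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, init=None, max_iter=100):
    """Alternating k-medoids on a precomputed square distance matrix."""
    medoids = list(init) if init is not None else list(range(k))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties
                updated.append(medoids[c])
                continue
            # The medoid is the member with the smallest total distance
            # to the other members of its cluster.
            updated.append(min(members,
                               key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:  # converged
            break
        medoids = updated
    return medoids, assign(dist, medoids)


# Toy one-dimensional objects; absolute difference stands in for DTW distance.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in points] for p in points]
medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```

Each returned medoid is an index into the original objects, so the representative time series of a cluster can be inspected directly.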
The reliability of different cluster numbers $k$ is evaluated based on the silhouette score (Rousseeuw, 1987). For an object $i$, $a(i)$ is the average distance between $i$ and the other objects in its cluster $C_i$. For every other cluster $C \neq C_i$, $d(i, C)$ is the average distance between $i$ and the objects in $C$, and $b(i) = \min_{C \neq C_i} d(i, C)$ is the distance to the nearest adjacent cluster. The silhouette value $s(i)$ is a criterion to evaluate the extent to which $i$ is more suitable for $C_i$ than for adjacent clusters:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$
A large value of $s(i)$ indicates that $i$ is well assigned. If $s(i)$ is close to 0, then it is uncertain whether $i$ belongs to $C_i$ or to an adjacent cluster. A negative value of $s(i)$ indicates that $i$ has probably been assigned to the wrong cluster. The average $s(i)$ of all objects is used to measure the overall quality of the clustering results.
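The silhouette score can be computed from the same kind of precomputed distance matrix. The following is a minimal sketch under two conventions that the text does not state: an object alone in its cluster gets a score of 0, and at least two clusters must be present. The function name and toy data are illustrative.

```python
def silhouette(dist, labels):
    """Mean silhouette value over all objects, from a distance matrix."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:  # convention: singleton clusters score 0
            scores.append(0.0)
            continue
        # a(i): mean distance to the other members of the same cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(sum(dist[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [0, 1, 10, 11]
dist = [[abs(p - q) for q in points] for p in points]
good = silhouette(dist, [0, 0, 1, 1])  # compact, well-separated clusters
bad = silhouette(dist, [0, 1, 0, 1])   # clusters that mix near and far points
print(round(good, 4), round(bad, 4))
```

Repeating this for several candidate values of k and keeping the k with the highest mean score mirrors the selection procedure described in the text.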

Cluster results validation
POIs can be used to characterize the function of an urban region. A POI dataset assists in discovering the functional attributes of different areas. This study uses POI data as auxiliary information to help identify the functional attributes of the clustering results. The enrichment factor (Verburg et al., 2004) is used to measure the relative abundance of a certain type of POI. The factor is defined by the occurrence of a POI type in a location relative to the occurrence of this POI type in the whole study area:
$$F_{i,k} = \frac{n_{k,i} / n_i}{N_k / N}$$
where $F_{i,k}$ characterizes the enrichment of POI type $k$ in location $i$, $n_{k,i}$ is the number of POIs of the $k$th category in location $i$, $n_i$ is the total number of POIs in location $i$, $N_k$ is the total number of POIs of the $k$th category, and $N$ is the total number of POIs in the study area. The enrichment factor is larger (or smaller) than 1 when the share of POI type $k$ in the location is larger (or smaller) than its average share over the whole study area, and equals 1 when the two shares are the same.
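As a worked illustration of the enrichment factor, the sketch below computes it for every POI category from two count dictionaries. The category counts are invented for the example. The ratio is evaluated as $(n_{k,i} \cdot N) / (n_i \cdot N_k)$, which is algebraically the same as the definition and avoids intermediate rounding.

```python
def enrichment_factors(local_counts, area_counts):
    """Enrichment factor of every POI category for one location.

    local_counts: {category: number of POIs of that category in the location}
    area_counts:  {category: number of POIs of that category in the study area}
    """
    n_i = sum(local_counts.values())  # all POIs in the location
    N = sum(area_counts.values())     # all POIs in the study area
    return {
        category: (local_counts.get(category, 0) * N) / (n_i * N_k)
        for category, N_k in area_counts.items()
    }


# Hypothetical counts, not taken from the NavInfo dataset.
area = {"enterprise": 100, "restaurant": 400, "residential community": 500}
location = {"enterprise": 8, "restaurant": 12}

factors = enrichment_factors(location, area)
print(factors)
```

Here enterprises make up 40% of the POIs in the location but only 10% in the study area, so their factor is 4, while the absent category gets a factor of 0.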
The final result is validated by comparing with the urban planning map released by Hangzhou government (Fig. 4). The land use types mainly include residential, industrial, commercial, educational and green. Commercial and industrial sites are generally used to build office buildings and industrial parks, representing places where people work. Educational land is generally used to build schools, which are mixed-function areas that include places where teachers work and dormitories where students live. Residential land, on the other hand, includes neighborhoods and residential buildings, which are places where people live. Our clustering method mainly distinguishes between workplace and residential places. Work areas include industrial land, commercial land and educational land. Residential areas include residential land and also educational land.
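One way to compute an overlap rate under the land-use mapping described above is sketched here. The mapping follows the text (educational land is compatible with both functions), but the decision to skip land-use types outside the mapping, such as green land, is an assumption of this sketch, as are the function name and the sample grids.

```python
# Planned land-use types accepted for each detected function. Educational
# land counts for both, because schools combine workplaces and dormitories.
COMPATIBLE = {
    "work": {"industrial", "commercial", "educational"},
    "live": {"residential", "educational"},
}


def overlap_rate(grids):
    """Share of evaluated grids whose detected function matches the plan.

    grids: iterable of (detected_function, planned_land_use) pairs.
    Grids whose planned type is not in the mapping are skipped (assumption).
    """
    mapped_types = set().union(*COMPATIBLE.values())
    evaluated = [(f, p) for f, p in grids if p in mapped_types]
    if not evaluated:
        return 0.0
    hits = sum(1 for f, p in evaluated if p in COMPATIBLE[f])
    return hits / len(evaluated)


# Hypothetical grids: (cluster-based function, planning map land use).
sample = [
    ("work", "industrial"),
    ("live", "residential"),
    ("work", "residential"),  # mismatch
    ("live", "educational"),
    ("work", "green"),        # not evaluated in this sketch
]
print(overlap_rate(sample))
```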
In order to better detect the urban structure, the cluster results are aggregated at the traffic analysis zone (TAZ) scale. The Shannon index is used to characterize the function diversity of TAZs:
$$H = -\sum_{i} p_i \ln p_i$$
where $p_i$ is the proportion of function $i$.
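A short sketch of the Shannon index for a single TAZ follows. It assumes the natural logarithm and takes the number of grids of each function inside the zone as input; both are choices made for the example, and the counts are made up.

```python
import math


def shannon_index(function_counts):
    """Shannon diversity of the functions found inside one zone.

    function_counts: number of grids per function, e.g. [work, live].
    Functions with zero grids contribute nothing to the sum.
    """
    total = sum(function_counts)
    if total == 0:
        return 0.0
    proportions = [c / total for c in function_counts if c > 0]
    return sum(-p * math.log(p) for p in proportions)


mixed = shannon_index([5, 5])    # evenly mixed work and live grids
single = shannon_index([10, 0])  # only one function present
skewed = shannon_index([9, 1])
print(mixed, single, skewed)
```

With two functions the index ranges from 0 (a single function) to ln 2 (an even mix), so zones can be ranked by how mixed their functions are.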

Cluster results analysis
Hotspot and cold-spot analysis in this study is based on the Getis-Ord Gi* statistic with a fixed distance band, computed in ArcGIS (Getis & Ord, 1992). The Hot Spot Analysis tool calculates the Getis-Ord Gi* statistic for each element in the dataset. The resulting z-scores and p-values show where elements with high or low values cluster in space. The tool works by looking at each element within the context of its neighboring elements. To be a statistically significant hotspot, an element should have a high value and be surrounded by other elements that also have high values. The Getis-Ord local statistic is given as:
$$G_i^* = \frac{\sum_{j=1}^{n} w_{i,j} x_j - \bar{X} \sum_{j=1}^{n} w_{i,j}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{i,j}^2 - \left(\sum_{j=1}^{n} w_{i,j}\right)^2}{n - 1}}}$$
where $x_j$ is the attribute value for feature $j$, $w_{i,j}$ is the spatial weight between features $i$ and $j$, $n$ is the total number of features and:
$$\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n}, \qquad S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{X}^2}$$
The $G_i^*$ statistic is a z-score, so no further calculations are required.
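For readers without ArcGIS, the statistic can be reproduced on a small scale. The sketch below assumes binary fixed-distance-band weights (1 if two features are within the band, including the feature itself, else 0) and planar coordinates. It is an illustration of the formula, not a re-implementation of the ArcGIS tool, and the sample values are invented.

```python
import math


def getis_ord_gi_star(values, coords, band):
    """Gi* z-score for every feature using binary fixed-band weights."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for i in range(n):
        # w_ij = 1 when feature j lies within the band of feature i
        # (the feature itself is included, as Gi* requires).
        w = [1.0 if math.dist(coords[i], coords[j]) <= band else 0.0
             for j in range(n)]
        sum_w = sum(w)
        sum_w2 = sum(x * x for x in w)
        numerator = sum(wj * xj for wj, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        z_scores.append(numerator / denominator if denominator > 0 else 0.0)
    return z_scores


# Hypothetical order counts for eight grid cells along a line.
values = [10, 9, 8, 1, 1, 1, 1, 1]
coords = [(float(i), 0.0) for i in range(len(values))]
z = getis_ord_gi_star(values, coords, band=1.0)
print([round(score, 2) for score in z])
```

Cells with high counts that are surrounded by other high counts receive large positive z-scores (hot spots), while the run of low counts yields negative scores.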

Cluster results
The appropriate cluster number for the DTW distance based k-medoids method is determined using the silhouette score. Figure 5 shows that the highest value is reached when the cluster number k equals 2. Therefore, the optimal cluster number is identified as two. Figure 6 presents the mean order value distribution over time for the two clusters, showing different temporal patterns on weekdays, Saturday and Sunday. Figure 7 shows the spatial distribution of the clustered grids. By comparing the temporal patterns of delivery orders inside the areas, we can identify their functional attributes, such as residential or work. Work areas usually have more orders at lunch time than at dinner time, and more orders on weekdays than on weekends. Residential areas tend to show the opposite pattern. Based on this hypothesis, functional regions can be recognized.
Cluster 1 shows a typical live area temporal pattern. The order values on the rest days, Saturday and Sunday, exceed those on work days. Delivery orders on weekdays differ little between lunch time and dinner time, while lunch orders on weekends are much higher than dinner orders. Table 3 shows that the enrichment factors of residential community, resident service and restaurant are high in Cluster 1 areas. This corroborates that Cluster 1 corresponds to residential areas.
Cluster 2 shows a work-related characteristic. Delivery orders reach a peak at lunch time on weekdays, which illustrates that delivery orders in this area are concentrated in working hours. The order quantity, from largest to smallest, is weekdays, Saturday and Sunday, exactly opposite to that of Cluster 1. The enrichment factors of the enterprise, car sales and service, and research institution POI categories are high in Cluster 2. These points of interest are all places where young people work. Table 4 shows the overlap rate between the cluster results and the official urban planning map. According to the temporal patterns of the two clusters, Cluster 1 is defined as live area and Cluster 2 is defined as work area. The classification results were validated against the urban planning map, and the accuracy rate is 72.67%.
After extracting the work area, which is represented by the grids in Cluster 2, using the DTW distance based k-medoids method, we use hotspot analysis to detect areas where the young generation gathers for work. The hot spots are extracted based on the delivery order count during the lunch and dinner peaks on weekdays. Figure 8 indicates five hot spots of work area. Area A is located in the heart of the technology industry center of Hangzhou Economic and Technological Development Area. Hangzhou Hi-tech Enterprise Incubate Park, Zhongzi Science Park and many other high-tech industries are located in this area. It is close to Xiasha district, where many colleges and universities are located. Relying on university science and education resources to build high-tech industry is a typical and well-established approach.
Area B includes places such as Zhejiang National University Science Park and Hangzhou East Innovation Center. It is located near Jiubao, which is a large transportation hub that brings together the interwoven road network system of Subway Line 1, Passenger Transportation Center, Bus Rapid Transit, Desheng Expressway, Jiubao Bridge and Yanjiang Avenue.
Area C is situated in Hangzhou Hi-Tech Industry Development Zone in Binjiang, an area with a high concentration of Internet companies. The completion of the Binjiang Internet Center is, on the one hand, the result of land expansion and, on the other hand, a response to the need to shift the center of gravity of Hangzhou's development southward, i.e., from the West Lake era to the Qiantang River era, from development around the lake to development across the river. Area D is located in the central city of Hangzhou. Qianjiang New City, which hosts Hangzhou's future urban CBD, is the showcase of Hangzhou's future international business and has many high-end office buildings.
Area E contains West Lake Science and Technology Park, Hangzhou Internet Innovation and Entrepreneurship Park and Hangzhou North Software Park.
Hangzhou is well known for its relatively developed Internet industry. The result illustrates that high-tech Internet companies in Hangzhou provide a large number of jobs for young people. The regional clustering effect of Hangzhou's Internet industry is obvious, and the formation of Internet industry clusters in turn contributes to the agglomeration of talent.
The current state of overtime work for young people is a topic of wide social concern (Liu et al., 2019). The temporal pattern of Cluster 2 reveals that delivery volumes on weekdays and Saturday are higher than on Sunday. Thus, places where people work overtime are determined using the order data after 8 pm on weekdays and during the lunch and dinner peaks on Saturday.
The result shows that the hot spots where people tend to work overtime coincide closely with those where young people gather to work (Fig. 9). Overtime work has long been a tradition in the tech industry around the world (Xiaotian, 2019). Since a large number of young people work in Internet-related companies in Hangzhou, the health damage caused by demanding management and the rights and interests of workers should be taken seriously (Xiaotian, 2019).

Live areas
The number of delivery orders during peak dining hours on weekends is used to extract residential hotspots (Fig. 10). The main difference between the residential hotspots and the work hotspots is that while many young people choose to work in the city center, few live there. Figure 11 indicates that one reason why young people choose not to live in the central city might be that house prices there are too high to afford. They choose to live in places where housing prices are relatively lower: 70.37% of the living hotspots belong to areas where house prices are lower than 48,484 yuan per square meter. The only exception is Wenjiao district, which is a residential hotspot but is located in the area with the highest prices.
Wenjiao district in the west of the city is home to a number of famous elementary schools and middle schools in Hangzhou. It is the top educational gathering place in Hangzhou. High-quality educational resources have driven up the price of housing in this area. The strong cultural environment, mature facilities and location advantages also make this area popular among young people. This phenomenon shows that in the context of Hangzhou's multi-center development, the city center, where educational resources are concentrated, still has a strong attraction for young people.

Traffic analysis zone (TAZ) level analysis
Function detection based on grids is capable of analyzing urban functional areas at a fine-grained level. As the basic unit of urban function, urban management and urban cognition, traffic analysis zone (TAZ) is one of the important elements in urban planning and constitutes the basic unit of residents' life and urban environment. TAZ-level analysis may contain more socio-economic information.
The Shannon index is used to evaluate the function diversity in different TAZs (Fig. 12a). The hot spots and cold spots are shown in Fig. 12b. Area A contains regions with a high degree of mixed functions, mostly clustered around the downtown area. The central city undertakes various functions such as living, administrative offices, commerce and finance, tourism services, science and technology education, culture and entertainment, and high-tech industries. Area B is in Xiasha district, which is a comprehensive new city with Hangzhou Economic and Technological Development Zone and High Education Park as the backbone. The new city has an obvious planned structure, with the education and research area in the north, the industrial area in the south and west, and the residential and living area in the central and eastern riverside areas.
Figure 13 shows the difference between incoming orders and outgoing orders in TAZs during different time periods. This helps us understand the dynamic "source-link" structures of the information landscape (Gao, 2015). The visualization indicates that the numbers of incoming and outgoing orders are roughly balanced, which suggests that most of the TAZs in Hangzhou are able to reach a self-sufficient state under the road network planning. The number of areas where the incoming order count exceeds the outgoing order count is greater than the number of areas where the outgoing order count is larger. This phenomenon indicates that outgoing orders are concentrated in some areas, which supply orders to the surrounding areas.
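The incoming-minus-outgoing comparison can be illustrated with a small sketch that assumes each order has already been matched to an origin TAZ and a destination TAZ. The zone labels and the order list are hypothetical, and orders that start and end in the same zone are counted on both sides.

```python
from collections import Counter


def net_flow(od_pairs):
    """Incoming minus outgoing order count for every TAZ.

    od_pairs: iterable of (origin_taz, destination_taz), one per order.
    """
    pairs = list(od_pairs)
    incoming = Counter(dest for _, dest in pairs)
    outgoing = Counter(orig for orig, _ in pairs)
    zones = set(incoming) | set(outgoing)
    return {zone: incoming[zone] - outgoing[zone] for zone in zones}


# Hypothetical orders: zone A hosts restaurants that serve its neighbors.
orders = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "A")]
flow = net_flow(orders)
print(sorted(flow.items()))
```

In this toy case one zone is a net supplier and three zones are net receivers, the same asymmetry the text reports between areas with concentrated outgoing orders and their surroundings.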

Discussion and conclusion
Our study has the following practical and empirical implications. First, the time pattern of food delivery data can be used to detect work areas and live areas of urban youth. Monitoring and analysis using food delivery data facilitates the rapid detection of change and can provide urban planners with up-to-date information on the work and residential preferences of young people by targeting specific age groups. Second, urban function detection using the grid has a higher spatial resolution and can better reflect the planning characteristics of the city when combined with the functional analysis of the TAZ as the basic unit. Finally, the methodology used in this study can be more widely applied to explore the functional attributes and hotspots of urban areas using spatial data consisting of time series.
This study proves that the time curve formed by the food delivery data can reflect the functional characteristics of the region's employment and residence. The DTW distance based k-medoids method is able to cluster the regions with similar function. The temporal pattern in the work area is characterized by higher orders for lunch than for dinner, and higher orders on weekdays than on weekends. The temporal pattern in the residential area is characterized by a smaller difference between lunch orders and dinner orders, and higher orders on weekends than on weekdays.
After the division of work and residence areas, the number of delivery orders during peak meal times can be used for hot-spot analysis to explore the hotspot areas where young people work and live. The results show that Internet companies in Hangzhou provide a large number of employment positions for young people, and most work hotspot areas are observed in high-tech parks. The aggregation of take-out orders at
The functional analysis combined with the road network data can better reflect the planning characteristics of the city and reveal whether there are differences between the actual city functions and the planning map. TAZ-level analysis identifies the hot spots and cold spots of function diversity. The findings indicate a high degree of functional mix in the old city, while the new city has a more complete plan and a low degree of functional mix. By comparing the data on food deliveries to and from each TAZ, it is found that most of the TAZs are able to achieve self-sufficiency.
The recognition accuracy of the regional function in the study results still needs to be improved. When the function complexity of the study unit is high, the proportion of each function cannot be extracted. The study mainly focuses on the hot spots where young people work and live. It is also possible to dig further into their work-life. In future studies, we plan to incorporate multiple sources of human activity information to gain further insight into the working and residential lives of young people, such as commuting routes, etc. Mixed land use remains a major challenge in research, and future research could focus more explicitly on the problem of misclassification in the identification of mixed functional areas.

Availability of data and materials
Food delivery data and points of interest (POI) data presented in this study are available on request from the corresponding author. The data are not publicly available because they are commercial data available for purchase. Publicly available datasets were also analyzed in this study: the road network data can be found at https://www.openstreetmap.org/, and the housing price data can be found at https://hz.58.com/.