Deriving intercity human flow pattern and mechanism based on cell phone location data: case study of Guangdong Province, China

The spatial pattern and mechanism of human flow are of great significance for urban planning, economic development, and transportation planning. In this study, we used cell phone location data to represent the human flow network in Guangdong Province, China, using the 21 cities in Guangdong as “nodes” and the human flow intensity among them as “edges”. We then explored the macro and micro features of the human flow network using the degree distribution, alter-based centrality, and alter-based power. Finally, we proposed a human flow estimation model that integrates individual urban characteristics, intercity links, and intercity differences to further analyze the factors affecting human flow. We found that the human flow network in this region is significantly scale-free, with Guangzhou, Shenzhen, Foshan, and Dongguan being the most important cities. We also found that the newly proposed model can explain the human flow in the study area, with an $R^2$ of 0.914. The results show that employment in the tertiary sector, intercity internet attention, intercity differences in the number of tertiary workers, differences in population size, and distance have significant impacts on human flow. This study may provide insights into human activity mechanisms that can contribute to urban planning and management.


Introduction
Spatio-temporal patterns of human flow can, to a certain extent, reflect the spatial structure, functional structure, economic vitality and development prospects of an area (Holland, 1996). These patterns may also affect the spread of infectious diseases, transportation, and pollution exposure (Dong et al., 2021; Tokey, 2021; Yu et al., 2020; Zhou et al., 2020). Exploring human flow patterns and their causes has, therefore, been an important research topic in human geography (Kou et al., 2021; Wang et al., 2021; Yang et al., 2019), and has received increasing attention in recent years with the development and application of various spatiotemporal big data.
Human flow estimation is the basis of human flow pattern analysis. In the early stage, due to the lack of data that could directly represent the flow of people, scholars mainly used theoretical models to estimate human flow. The "push-pull" theory, proposed by Ravenstein (1885) and refined by Heberle (1938), holds that human migration is affected by the repulsive force of the origin and the attractive force of the destination. Based on the "push-pull" theory and the law of universal gravitation, Taylor P. J. proposed the gravity model (Taylor et al., 2001). In this model, the human flow intensity between two places is determined by their residential populations and the physical distance between them. The gravity model explains human mobility intuitively with easily accessible data. It has, therefore, been widely adopted and improved by scholars in quantitative research on human flow (Belyi et al., 2017; Wang, Dong, et al., 2019). However, without data that directly represent the flow of people, it is usually difficult to calibrate the unknown parameters in traditional models or to verify their accuracy. Meanwhile, population size might not be able to accurately explain human flow (Liu et al., 2014).
With the increased accessibility of spatiotemporal location big data, such as traffic data and user location data, scholars have started using these data to estimate intercity human flow strength (Ahas et al., 2015; Jin et al., 2016; Lai & Pan, 2019). Among them, cell phone location data show the greatest potential for application. As a common device in modern society, the cell phone has diverse user groups and a high penetration rate. The location data of cell phone users can represent a close-to-real population distribution in some cases (Alexander et al., 2015; Shi et al., 2015; Zhang & Thill, 2019). More and more studies have, therefore, begun to use cell phone location data to estimate human flow and identify its patterns (Cao et al., 2021; Tokey, 2021; Zhang & Ng, 2021).
Network analysis is a commonly used method for pattern analysis, which regards human flows as "edges" and cities as "nodes". There are mainly two types of patterns for human flow networks, namely micro and macro patterns (Zhang & Thill, 2019). The macro patterns refer to the overall network structure of a region, which can be represented by the ranking and distribution of nodes, while the micro patterns refer to the importance of cities, represented by the centrality of the nodes. Centrality measures are important indicators that reflect the significance of a city in a given network, and a variety of measures have been proposed. Betweenness centrality considers the number of times a node appears on the shortest paths in the network (Freeman, 1978). Degree centrality and weighted degree centrality consider the number and weight of the edges associated with a node (Brands, 2001). Closeness centrality considers the distances between one node and the other nodes (Neal, 2011). Alter-based centrality and power consider both direct and indirect interactions and distinguish the centrality of nodes from their power (Neal, 2012; Ziyu et al., 2017). Zhao et al. further improved the indicators of alter-based centrality and alter-based power to make them more suitable for China (Zhao et al., 2017).
Existing studies have not only described characteristics of human flow, but also analyzed factors that affect it. The most considered factors include individual characteristics of the origin and destination cities, such as population size, GDP, and the geographic distance between the two cities (Wang, Dong, et al., 2019). Many scholars believe that cities with larger populations, larger GDP, and closer distance between them are more likely to generate massive human flow (Cao et al., 2018). Some scholars found that similarities in the intrinsic characteristics of the origin and destination cities, such as geomorphic condition, cultural background, administrative hierarchy, may also affect the human flow strength (CNNIC, 2015;Guo et al., 2018). There might be other factors influencing the human flow, for example, information flow, traffic flow, differences in the population and GDP, etc. Few studies, however, have explored influence of these factors on human flow.
In summary, current human flow pattern analysis requires accurate data to construct the network and select the appropriate indicators for the study area. When analyzing the factors of human flow, only the inherent characteristics and geographical distance of individual cities are considered, while the differences formed in the dynamic development of cities and the various links between cities are usually ignored. In this study, we selected Guangdong Province, China as our study area and used cell phone location data and complex network analysis to analyze the macro and micro structure of human flow. After that, we analyzed the main factors influencing the human flow strength in this area. This paper can provide basic information and decision-making support for this region.

Study area
In this study, we selected Guangdong Province, located in southern China, as our study area (Fig. 1). The study area has a total population of 115.21 million and an area of 179,725 km². Guangdong Province governs 21 municipal cities, of which Shenzhen and Guangzhou are the most populous. The cities can be divided into four regions, namely the Pearl River Delta (PRD) region, Western Guangdong, Eastern Guangdong, and Northern Guangdong. Adjacent to the Hong Kong and Macao Special Administrative Regions (SARs), the PRD is the most developed region in the province, with a per capita GDP of 130,182 Yuan. It contains two first-tier cities, Guangzhou and Shenzhen. Western Guangdong is the second most economically developed region in the province, with a per capita GDP of 43,922 Yuan. Northern and Eastern Guangdong are less developed, with per capita GDP of 33,039 Yuan to 35,844 Yuan. Fig. 1 shows the location of the study area and of its 21 cities.

Dataset
Five types of data are included in this study, namely cell phone location data, the Baidu index, intercity traffic carrying capacity data, intercity road distance, and statistical data from the yearbook. Specifically, the cell phone location data are the basis for human flow pattern analysis. The five datasets are used as variables in the human flow estimation model. The datasets are described as follows.
(1) Cell phone location data. The cell phone has become the most commonly used mobile device in China. The penetration rate had reached 112.2 cell phones per 100 people by the end of 2018 (MIIT, 2019). For the study area, we calculated the cell phone penetration rates of all 21 cities in Guangdong Province, using data from the 2019 Guangdong Provincial Statistical Yearbook. The penetration rate was calculated by dividing the number of cell phone subscribers in a city by its total population. The results show that the penetration rate of cell phones ranges from 0.8 to 2.4 in Guangdong, with an average value of 1.5 (Fig. 2). It should be noted that the subscribers include non-cellular devices that use mobile SIM cards. Although we do not have the specific number, most of the subscribers are personal cell phone users. The cell phone location data can, therefore, represent the human flow in each city well.
Each record in this data represents the number of cell phone users who appeared in both city i and city j during 30 days before day t. It can represent the human flow between cities over the 30-day period. The dataset was collected from June 13, 2018, to September 10, 2018, with a total of 61 days.
(2) Baidu Index. The Baidu index characterizes the internet attention between cities. The data were obtained from the Baidu Index platform, a data analysis platform based on Baidu user search behavior that currently provides search indices for a large number of keywords from 2011 onwards. By setting the keyword to the target city, one can find the attention that users in each city pay to the target city. For example, we can get the internet attention of Shenzhen to Guangzhou by setting the keyword to "Guangzhou" and the query area to "Shenzhen". The observation period was set to be the same as that of the cell phone location data.
(3) Traffic carrying capacity data. Traffic carrying capacity refers to the number of passengers that can be carried between cities by trains and buses. In this study, we used the Selenium Python package to crawl the train and bus schedules for one single day from the booking page on Ctrip.com, and then estimated the traffic carrying capacity between cities, assuming 50 passengers per bus, 600 passengers per high-speed train, and 1200 passengers per normal-speed train (Wang, Dong, et al., 2019).
(4) Inter-city road distance data. Inter-city road distances were obtained through Baidu Maps. We used the route search function of Baidu Maps, setting the two cities as the starting and ending points, and queried the distance.
(5) Statistical yearbook data. The statistical yearbook contains socio-economic data. The data obtained from the statistical yearbook include GDP, average salary, residential population, total employed population, and employed populations in the primary, secondary and tertiary sectors. The data in this paper were obtained from the 2018 Guangdong Statistical Yearbook.
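The capacity estimate in item (3) is simple arithmetic and can be sketched as follows. This is only an illustration under the seat counts stated above; the function name, the input format, and the schedule counts are hypothetical and do not come from the crawled data.

```python
# Seat counts per vehicle type, as assumed in the text.
SEATS = {"bus": 50, "high_speed_train": 600, "normal_train": 1200}


def carrying_capacity(daily_services):
    """Estimate the daily passenger capacity between two cities.

    daily_services maps a vehicle type to the number of scheduled
    services found for one day (hypothetical input format).
    """
    return sum(SEATS[kind] * count for kind, count in daily_services.items())


# Hypothetical schedule counts for one city pair on one day.
example = {"bus": 40, "high_speed_train": 25, "normal_train": 3}
capacity = carrying_capacity(example)
print(capacity)
```

Because the capacity is a weighted count of scheduled services, it measures what the transport system could carry rather than how many people actually travelled.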

Method
Our study includes three main steps. First, we built a human flow network based on cell phone location data; second, we selected reasonable indicators of centrality and power for the human flow network and then analyzed its macro and micro characteristics; finally, we constructed an improved human flow estimation model and used it to analyze the factors affecting human flow.

Constructing human flow network
The key to constructing a human flow network is to measure the weight of the edges reasonably. In this study, we used cell phone location data as the weight of the edges and constructed the intercity human flow network of the study area.
The 21 cities in the study area were defined as the nodes in the network. An edge is set for every pair of city nodes i and j ($1 \le i, j \le 21$), and the edge weight $W_{ij}$ is defined as the human flow volume between cities i and j during the observation period, indexed by t = 1, 2, …, 61. The urban human flow network in the study area can then be constructed from the nodes and edges defined above.
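To make the edge definition concrete, the sketch below counts, for every pair of cities, the users who appeared in both. It is a hedged illustration only: the dataset used in this study is already aggregated by city pair, so the per-user input format, the user ids, and the function name are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def edge_weights(user_cities):
    """Count, for every unordered city pair, the users seen in both cities.

    user_cities maps a (hypothetical) user id to the set of cities in
    which that user appeared during one 30-day window.
    """
    weights = Counter()
    for cities in user_cities.values():
        # Sorting gives each unordered pair one canonical key.
        for pair in combinations(sorted(cities), 2):
            weights[pair] += 1
    return weights


# Toy input: three users and the cities they visited.
visits = {
    "u1": {"Guangzhou", "Shenzhen"},
    "u2": {"Guangzhou", "Shenzhen", "Dongguan"},
    "u3": {"Foshan"},
}
w = edge_weights(visits)
```

A user who stays in a single city contributes to no edge, so the resulting weights describe intercity movement only.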

Analyzing the macro characteristics of human flow
Based on the human flow network described above, we analyzed the macro characteristics of human flow from three aspects. First, the spatial distribution of human flow intensity between cities was analyzed. Then, the cities were ranked according to their weighted degree centrality (see Introduction), calculated as in Eq. (1), which represents the importance of nodes well. Finally, a linear regression was performed between the logarithms of the rank and of the weighted degree centrality. If there is a significant linear correlation, the weighted degree centrality of the nodes follows a power-law distribution and the network is scale-free (Barabási & Albert, 1999).
where n represents the number of edges.
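The ranking and the scale-free check can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: it assumes that the weighted degree centrality of city i is the sum of the weights of all its edges, $C_{Di} = \sum_j W_{ij}$, and the function names and toy values are hypothetical.

```python
import math


def weighted_degree(weights, nodes):
    """Sum the weights of all edges incident to each node."""
    degree = {node: 0.0 for node in nodes}
    for (a, b), w in weights.items():
        degree[a] += w
        degree[b] += w
    return degree


def loglog_rank_fit(values):
    """Least-squares fit of ln(value) on ln(rank); returns (slope, R^2).

    All values must be positive so that the logarithm is defined.
    """
    ranked = sorted(values, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


# Toy centralities that follow an exact power law: value = 1000 / rank.
toy = [1000 / rank for rank in range(1, 22)]
slope, r2 = loglog_rank_fit(toy)

# Toy weighted degrees for a three-city network.
deg = weighted_degree({("A", "B"): 3, ("A", "C"): 2}, ["A", "B", "C"])
```

A slope well below zero together with an $R^2$ close to 1 is the pattern read here as evidence of a power-law rank distribution.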

Analyzing the micro characteristics of human flow
In this part, we further analyzed the micro characteristics of the human flow network, including the status and role of cities in the network. These characteristics of nodes are usually described by centrality and power in network analysis. Among the different indicators of centrality and power, the improved version proposed by Zhao et al. (2017) considers both direct and indirect connections between nodes and distinguishes centrality from power, reducing the overestimation of satellite cities caused by the influence of the core cities they are connected to. They proposed the dependency parameter $d_{ij}$, which uses the ratio of the edge weight $W_{ij}$ between cities i and j to the degree centrality $C_{Dj}$ of city j to characterize the dependence of city j on city i (Zhao et al., 2015). Their method is more suitable for portraying the importance and position of different cities in a region. In this study, we adopted this idea and constructed the alter-based centrality and power. The indexes can be expressed as Eqs. (2), (3) and (4).
where $AC_i$ is the alter-based centrality of city i, $AP_i$ is the alter-based power of city i, $W_{ij}$ is the weight of the edge between city i and city j, i.e., the human flow volume between the two cities measured from cell phone location data, $C_{Dj}$ is the weighted degree centrality of city j, i.e., the total weight of the edges connecting city j with other cities, and $d_{ij}$ is the dependency parameter, which describes the dependency between cities.
A higher alter-based centrality indicates that a city has a stronger ability to collect and disperse resources and is more likely to become a hub in the network, whereas a higher alter-based power indicates a stronger ability to control resources and greater potential to attract resources.
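As a rough illustration of how centrality and power can diverge, the sketch below uses Neal's original alter-based definitions, in which centrality weights each tie by the partner's degree and power divides each tie by it. This is an assumption made for the example: the improved indicators of Zhao et al. used in this study may differ, for instance in normalization, and the toy matrix is hypothetical.

```python
def alter_based_measures(W):
    """Neal-style alter-based centrality and power.

    W is a symmetric weight matrix (list of lists) with a zero diagonal.
    Returns (degree, dependency, centrality, power), where
      degree[j]        = sum of row j (weighted degree centrality)
      dependency[i][j] = W[i][j] / degree[j]
      centrality[i]    = sum over j of W[i][j] * degree[j]
      power[i]         = sum over j of W[i][j] / degree[j]
    """
    n = len(W)
    degree = [sum(row) for row in W]
    dependency = [
        [W[i][j] / degree[j] if degree[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    centrality = [sum(W[i][j] * degree[j] for j in range(n)) for i in range(n)]
    power = [sum(dependency[i]) for i in range(n)]
    return degree, dependency, centrality, power


# Toy network: city 0 is a hub; cities 1 and 2 connect only to it.
W = [
    [0, 4, 1],
    [4, 0, 0],
    [1, 0, 0],
]
degree, dependency, centrality, power = alter_based_measures(W)
```

In this toy network, city 1 has the highest centrality because its only partner is well connected, while the hub has the highest power because both of its partners depend on it entirely.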

Building an improved human flow estimation model and analyzing the key influencing factors of human flow
A human flow estimation model can explain human flow and estimate it when real human flow data are lacking. Therefore, based on the classical gravity model, we proposed an improved human flow estimation model and used it to analyze the factors affecting human flow.
Model construction requires appropriate dependent and independent variables. The independent variables include individual characteristics, links, and differences. The dependent variable is the human flow obtained from cell phone location data. The reasons for selecting these variables are detailed below, and the variables are listed in Table 1.
(1) Individual characteristics of cities. The GDP and population size of a city are often mentioned in the analysis of human flow factors (Shen & Liu, 2016;Windzio, 2018;Zhang et al., 2020). We believe that the number of workers in different sectors is also a key factor affecting human flow. We, therefore, chose GDP (GDP) and average wage (Salary) as the measurement of economic development level, resident population (pop), and total employed population (workpop) as the indicators of population size, and employed population in primary, secondary and tertiary sectors (workpop1, workpop2, workpop3) as the measurement of each sector's scale. This data was obtained from the statistical yearbook of Guangdong Province in 2018.
(2) Links between cities. We considered two primary links between cities, i.e., the transportation accessibility and internet attention between cities.
(a) Inter-city transportation accessibility index. The higher the traffic accessibility, the more convenient it is to travel between cities, and the more likely it is that strong intercity human flow will arise. In this paper, we adopted geographic distance $dist_{ij}$ and traffic carrying capacity $traffic_{ij}$ to characterize inter-city traffic accessibility, where traffic carrying capacity refers to the number of passengers that can be carried between cities by high-speed and normal-speed trains and long-distance buses.
(b) Internet attention between cities. In the information age, using the internet to search for information has become a major activity of internet users (Wang & Loo, 2019). Internet attention has, therefore, become an important inter-city link. The more web searches a city receives, the more likely it is to become a potential destination. We characterized $bdindex_{ij}$, the internet attention of city i to city j, as the sum of the Baidu index of searches from city i for the target city j over the observation period.
(3) Differences between cities. In physics, a gradient refers to the rate of change of a variable quantity with respect to distance. It can also describe the uneven distribution of geographical elements. The generalized gradient theory suggests that gradients are embodied in natural, economic, social, human resource, ecological, and institutional differences. A gradient in any sense is both a gradient nudging party and a gradient receiving party. It exists throughout human economic activities (Li, 2004). At present, little attention has been paid to the impact of the gradient effect on human mobility. This paper aims to analyze the influence of the difference in GDP ($d_{gdp}$), the difference in average wage ($d_{salary}$), the difference in permanent resident population ($d_{pop}$), the difference in total employed population ($d_{workpop}$), and the differences in employed population in the individual sectors ($d_{workpop1}$, $d_{workpop2}$, $d_{workpop3}$) on the human flow between cities.
Adding more explanatory variables can make the description of human flow more complete. To estimate the unknown parameters, the logarithm is taken on both sides of the model. We then performed stepwise linear regression (Hair et al., 1998) on the data from June 13 to August 11, 2018, and selected the model with the maximum $R^2$ value as the optimal model, as shown in Eq. (5).
Since population contributes a lot to human flow estimation and may be closely related to the employed population and internet attention, we added population as a confounding variable to control for the population effect (Eq. (5)). The parameters are described in Table 2.
To validate the model, we used the data from August 12, 2018 to September 10, 2018 as test data, and used $R^2$ and RMSE to evaluate the prediction accuracy. The formula of RMSE is shown in Eq. (6). Based on the constructed model and the coefficients determined by regression analysis, we further analyzed the correlations between the variables and human flow, as well as their influences.
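The log-transform, fit, and validation steps can be sketched as below. This is a simplified illustration under stated assumptions, not the model of Eq. (5): it fits all supplied variables by ordinary least squares instead of stepwise selection, uses two hypothetical explanatory variables, computes RMSE in its standard form $\sqrt{\tfrac{1}{n}\sum (y - \hat{y})^2}$, and treats $R^2$ as the coefficient of determination.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_log_linear(X, y):
    """OLS of ln(y) on ln(X) with an intercept, via the normal equations."""
    rows = [[1.0] + [math.log(v) for v in row] for row in X]
    target = [math.log(v) for v in y]
    k = len(rows[0])
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * t for r, t in zip(rows, target)) for a in range(k)]
    return solve(XtX, Xty)


def predict(coef, X):
    """Map the fitted log-linear model back to the original scale."""
    return [
        math.exp(coef[0] + sum(c * math.log(v) for c, v in zip(coef[1:], row)))
        for row in X
    ]


def rmse(observed, predicted):
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical data: flow = 2 * x1^0.8 * x2^-0.5 (x2 plays the role of distance).
X = [[10, 50], [20, 100], [40, 60], [80, 200], [15, 30], [60, 90]]
y = [2 * a ** 0.8 * d ** -0.5 for a, d in X]
coef = fit_log_linear(X, y)
fitted = predict(coef, X)
```

In practice the fit would use the training period and the two metrics would be computed on the held-out test period, as described above.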

Macro characteristics of human flow network
Using the human flow network construction method described in the Method section, the human flow network map of the study area can be obtained (Fig. 2). The local human flow patterns in the study area can be divided into four levels according to the linkage strength from high to low (in Fig. 2, the strength is listed from smallest to largest from top to bottom). The first level consists of the strong human flows between Guangzhou, Dongguan and Shenzhen, which are located in the center of the PRD. The second level mainly consists of the sub-strength links among Guangzhou, Dongguan and Foshan. The third level consists of the moderate links, which include two types, namely internal PRD links and links between the PRD and peripheral cities. The fourth level consists of the weak connections among the peripheral cities.
The weighted degree centrality is calculated according to Eq. (1). The results shown in Fig. 3 and Fig. 4(a) can represent the human flow passing through the node.
The human flow of each city node shows great spatial heterogeneity (Fig. 4(a)). Shenzhen has much more human flow than the other cities; Guangzhou and Dongguan follow at roughly the same level, and Foshan is slightly below these three. Together, these four cities have much more human flow than the others and account for 78.4% of the human flow in the network, which indicates that a few cities dominate the human flow in the network.
We further analyzed the distribution of the weighted degree centrality of the nodes in the human flow network. A linear regression between the logarithm of the weighted degree centrality and the logarithm of its rank, shown in Fig. 4(b), reveals a significant linear correlation, with $R^2$ reaching 0.8376. This indicates that the weighted degree centrality of the cities conforms to a power-law distribution, and the regional urban human flow network is a scale-free network with great heterogeneity (see Analyzing the macro characteristics of human flow).

Micro characteristics of human flow network
Alter-based centrality measures the ability of cities to collect and disperse human flow in the network. The alter-based centrality of each city is calculated according to Eq. (2). The results are shown in Fig. 5(a) and Table 3, where the natural breakpoint method was used to show its gradation. The first grade includes Dongguan (1.722), Guangzhou (1.891), and Shenzhen (2.521), with alter-based centrality higher than 0.895. Located in the central area of the PRD, they serve as the political, economic and cultural …
The alter-based power reflects a node's ability to dominate the human flow in the network. The alter-based power of each city node is calculated according to Eq. (3), and the results are shown in Fig. 5(b) and Table 3. Alter-based power also shows a hierarchical structure. The first level includes Guangzhou, Dongguan and Shenzhen, whose alter-based power exceeds 1.021. It is worth mentioning that the alter-based power of Shenzhen reaches 3.957, which is 1.47 times higher than that of Guangzhou and places Shenzhen in an absolute leading position in the city network. This may be explained by the fact that Shenzhen is the most economically developed and high-tech city in the region, and also an important window connecting the region with the rest of the country and abroad. The second level includes Foshan, with alter-based power of 1.021. The third level includes the other cities, with alter-based power of 0.521 or less.
It is worth mentioning that Zhanjiang, which is located in the westernmost part of Guangdong and has the fewest bordering cities, ranks fifth in terms of both alter-based centrality (0.25) and alter-based power (0.521). It shows strong centrality and power compared with the other cities in Western Guangdong, such as Maoming, Yangjiang, and Yunfu. This may be because Zhanjiang is closely connected to the core of the PRD (Fig. 3), where human mobility is active, which gives it a strong centrality relative to the other peripheral cities. Zhanjiang not only has a certain ability to collect and disperse resources, but also has the potential to drive other cities in Western Guangdong to accelerate their development.

Evaluation of the improved human flow estimation model
Following the method described in the Method section, we constructed the improved log-linear model of human flow. The model passed the F-test, with a goodness of fit of 0.914 and a p-value of less than 0.01 for each variable, indicating that the improved model is more suitable for describing the human flow in the study area. The regression parameters in Table 4 show that the independent variables all pass the significance test and that there is no collinearity. This indicates that the added variables can largely explain the amount of intercity human flow and that the selection of independent variables is reasonable.
Furthermore, the proposed model was validated using the test data set. The results show a significant linear correlation between the predictions and the reference data (Fig. 6), with an $R^2$ value of 0.9066. The high prediction accuracy further supports the validity of the constructed model.

Analysis on key influencing factors of human flow
Using the proposed model, we further analyzed the influences of different factors on human flow.
The coefficients and standardized coefficients of all independent variables are listed in Table 4. Standardized coefficients eliminate the effect of the different units of the independent variables and can therefore characterize the contribution of each independent variable. The larger the absolute value of the standardized coefficient, the greater the variable's contribution to the dependent variable.
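For reference, a standardized coefficient is commonly obtained by rescaling the raw coefficient by the ratio of the standard deviations of the explanatory and dependent variables. The sketch below shows that rescaling on hypothetical data; whether Table 4 was produced in exactly this way is an assumption.

```python
from statistics import stdev


def standardized_coefficient(coef, x_values, y_values):
    """Rescale a raw regression coefficient by sd(x) / sd(y)."""
    return coef * stdev(x_values) / stdev(y_values)


# Hypothetical data with y = 3 * x exactly, so the standardized slope is 1.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 6.0, 9.0, 12.0]
beta = standardized_coefficient(3.0, x, y)
```

Because the result is unit-free, coefficients of variables measured in different units, such as kilometres and persons, become directly comparable.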
Among the indicators of individual characteristics, two passed the significance test, namely the total employed population and the employed population in the tertiary sector of the two cities. Specifically, the employed population in the tertiary sector of the two cities contributes the most in this model. The larger the employed population in the tertiary sector, the stronger the human flow. It can therefore be concluded that peripheral cities should develop the tertiary industry and create more tertiary employment opportunities, thereby strengthening the human flow with other cities.
Among the indicators of links between cities, distance passed the significance test. The intercity human flow is negatively associated with the intercity distance. The shorter the distance, the stronger the human flow. That indicates that physical distance still has great impact on human flow. Peripheral cities in this region should, therefore, improve the transportation system to form a close connection with the core cities, hence reducing the development hindrance caused by the distance effect.
In addition to distance, the indicator of internet attention between two cities also passed the significance test. Internet attention shows a positive correlation with human flow strength. The higher the intercity internet attention, the stronger the human flow. This indicates that the dissemination of network information may affect users' intercity interaction behavior in the information era. Users' internet attention to a city may, to a certain extent, reflect their willingness to travel to that city. Meanwhile, intercity internet attention, as the only variable that changes over time in the improved model, plays an important role in estimating the amount of monthly-scale human flow. Peripheral cities could provide diversified information in cyberspace to improve their visibility online, which may draw users' attention to them, inspire travel intentions, and attract human flow in turn.
As for the indicators of differences between cities, three influencing factors, namely the differences in residential population, tertiary industry employment, and GDP between two cities, are found to be significant. Specifically, the difference in residential population shows a negative correlation with the human flow, which means that cities with similar population sizes are strongly connected, and weakly connected otherwise. The difference in the number of people employed in the tertiary sector shows a positive correlation with the human flow. This means that cities with greater differences in tertiary sector employment are relatively more connected, while cities with similar tertiary sector scales have weak human flows between them. Finally, the difference between the two cities' GDP shows a positive correlation with the human flow, but it has little impact on the human flow between two cities. The government should encourage companies to settle in the peripheral cities, providing more work opportunities in the tertiary sector. Meanwhile, initiating attractive talent introduction programs and household registration policies may also help to improve the human flow.

Discussion
Currently, spatial interaction models are widely used in the study of human mobility. Among them, the classical gravity model is still used today and explains the human flow well in various studies. Although the gravity model has been confirmed in many scenarios, it considers relatively few factors, and whether it is applicable to the human flow characteristics of the current study area remains to be explored. The gravity model contains an unknown damping parameter, which needs to be obtained by tuning. In this paper, the tuning parameter range is set to [0, 0.01, 2], and the final parameter is determined to be 0.472. In addition, we compared the human flow calculated by the gravity model with the human flow volume obtained from cell phone location data and obtained $R^2$ = 0.77, which means there are significant differences between the human flow estimated by the gravity model and the observed human flow. Compared with the classical gravity model, the improved model works better ($R^2$ = 0.914). Using only three factors is therefore not sufficient to explain the crowd interaction.
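The tuning of the damping parameter can be sketched as a simple grid search. The example rests on assumptions that the text does not state explicitly: the gravity model is taken to have the form $F_{ij} = k P_i P_j / d_{ij}^{\beta}$, the range is read as start 0, step 0.01, end 2, the scale constant k is fitted by least squares for each candidate $\beta$, and the data are synthetic.

```python
def gravity_r2(beta, pairs):
    """R^2 of a gravity model with distance exponent beta.

    pairs holds (pop_i, pop_j, distance, observed_flow) tuples. The scale
    constant k is fitted by least squares through the origin.
    """
    base = [pi * pj / d ** beta for pi, pj, d, _ in pairs]
    flows = [f for *_, f in pairs]
    k = sum(b * f for b, f in zip(base, flows)) / sum(b * b for b in base)
    mean = sum(flows) / len(flows)
    ss_res = sum((f - k * b) ** 2 for b, f in zip(base, flows))
    ss_tot = sum((f - mean) ** 2 for f in flows)
    return 1 - ss_res / ss_tot


def tune_beta(pairs, start=0.0, stop=2.0, step=0.01):
    """Grid-search the distance exponent that maximizes R^2."""
    n_steps = round((stop - start) / step)
    grid = [start + i * step for i in range(n_steps + 1)]
    return max(grid, key=lambda beta: gravity_r2(beta, pairs))


# Synthetic city pairs generated with beta = 0.5 and k = 0.001.
cities = [(100, 80, 30), (100, 20, 120), (80, 20, 90), (50, 40, 200), (60, 10, 45)]
pairs = [(pi, pj, d, 0.001 * pi * pj / d ** 0.5) for pi, pj, d in cities]
best = tune_beta(pairs)
```

On real data the maximum $R^2$ along the grid would stay below 1, and the gap to the improved model indicates how much the additional variables explain.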
In the analysis of the factors influencing the human flow, the development of the tertiary industry plays an important role (see Analysis on key influencing factors of human flow). We speculate that the intrinsic reason is as follows: the tertiary industry is mostly labor-intensive. Unlike the primary and secondary industries, which are limited by the nature of the work and have relatively fixed workplaces, the tertiary industry has high employment elasticity and absorbs more labor, which forms a strong human flow. In addition, the tertiary industry includes transportation and communication services. When a city has more people employed in the tertiary industry, its transportation and communication services may be more developed, which can promote cross-city economic development and inter-city population movement.
Based on the mechanisms explained above, we offer the following suggestions (see Analysis on key influencing factors of human flow). Cities with a developed tertiary industry should create tertiary industry jobs that are needed by both core and peripheral cities, and increase tertiary industry employment while expanding the urban population, so that the connection between peripheral cities and core cities can be enhanced. In addition, the government should develop transportation infrastructure to reduce the cost of inter-city travel, so that the interaction of people between cities increases and the geographical limitation weakens. At the same time, relevant departments should pay attention to the interaction of cities in the information network, improve the image of cities, and increase their internet attention, so as to strengthen the human flow. The government should exert its macro-control ability to promote close collaboration and linked development in order to achieve regional integration.
There are still some limitations in this study. The model constructed in this paper performs well at the monthly scale, but its estimation performance at finer time scales has not been adequately explored because of data limitations. In addition, the parameters estimated in the model are based on data obtained only in our study area, and whether they are applicable to other regions needs further verification.
In future research, the relationship between internet attention and human flow can be further explored to build a more accurate human mobility model.

Conclusions
In this paper, we estimated the amount of human flow among the 21 cities of Guangdong Province based on mobile phone positioning data, performed an almost unbiased and accurate analysis of the spatial structure of the human mobility network, and analyzed factors that may influence the formation of this spatial structure. In addition to commonly used indicators, we further discussed the factors of internet attention and intercity level. The main conclusions are as follows. First, cell phone location data show that the intercity human flow network in Guangdong Province is significantly scale-free, which means that a few nodes control most of the human flow. Cities in the PRD account for most of the human flow, and the human flow network is, therefore, much denser within the PRD than in the peripheral regions. Second, the improved alter-based centrality and power can highlight important nodes and edges of the network. Zhanjiang is the only peripheral city that has a close relationship with the PRD, and the alter-based centrality and power distinguish it from the other peripheral cities. Third, the improved human mobility model performs better than the classical model. The model accuracy can be significantly improved by considering individual characteristics, links, and differences between cities. It is worth mentioning that daily intercity internet attention enables the model to estimate the monthly human flow in the region with high accuracy. Finally, the combination of total employment, tertiary industry employment, distance, internet attention, inter-city population difference, tertiary industry employment difference, and GDP difference can explain 91% of inter-city human flow. The number of people employed in the tertiary sector and internet attention are two important driving factors that have not been noticed in previous studies.
Abbreviations
SAR: Special Administrative Region; GDP: gross domestic product; PRD: Pearl River Delta