1 Introduction

The development of high-precision location tracking technologies like global positioning systems (GPS) and radio frequency identification (RFID) tags, together with the reduction of data transmission and data storage costs, has generated a massive amount of data with a high spatiotemporal resolution (Chen et al., 2019; Jeong et al., 2019). Efforts have been made to mine these data for travel behavior extraction and movement pattern identification, such as trajectory comparison and classification, cluster detection, and point of interest recommendation (Jeong et al., 2019; Zhang et al., 2011). From these big geodata, it is possible to extract both individual and aggregated mobility information. Individual mobility information describes how individuals travel in space and time (Yuan & Raubal, 2016; Liu et al., 2009). This information can be used to capture the trajectories, activity range, and dynamic patterns of human mobility at different spatial and temporal scales, while also delving into the interconnectedness of individuals and their surroundings. On the other hand, aggregated human mobility information describes the mobility of the population of an area over time (Yuan & Raubal, 2016; Liu et al., 2009). Aggregated human mobility information can be used to understand the dynamic and complex urban system and the activities in different parts of a city over time. Identifying both individual and aggregated human movement patterns is important for efficiently detecting the overall human movement patterns in space and time.

Moreover, geospatial big data with space and time tags provide information about what happens in an area and when an event occurs (Batty, 2013). The importance of geospatial big data in studying human mobility patterns has been widely recognized (Kwan, 2016; Liu et al., 2015a, 2015b). Identifying human movement patterns in an urban area is important for location-based recommendations (Hao et al., 2010), urban and transport planning (Abbasi et al., 2015; Rashidi et al., 2017), understanding the spread of infectious disease (Eubank et al., 2004), and performance and resource management in mobile communication (Song et al., 2010). Attempts have been made to quantify human travel patterns by using the gravity model (Erlander & Stewart, 1990; Zipf, 1946), the exponential law (Liang et al., 2013), the radiation model (Simini et al., 2012), the random walk model, and Lévy flights (Brockmann et al., 2006).

There is a knowledge gap because very few studies have attempted to identify the effect of scale size on extracting individual travel information or aggregated urban-level travel patterns from movement data. Different sizes of scale may change the interpretation of aggregated urban mobilities. In this study, we attempt to fill the existing knowledge gap by identifying the effect of different scale sizes on geospatial big data when they are used to identify human movement patterns in an urban area. Also, most previous studies on the impact of scales focused on the measurement scale, which is the smallest distinguishable portion of a study (aka the resolution of studies) (Tobler, 1988). More specifically, to analyze and visually interpret the results from location-enabled data, the study area is partitioned into discrete, non-overlapping spatial units. These spatial units are the measurement scale of a study (Mennis, 2019). Different sizes of those spatial units cause different results for the same data and the same study area. However, in urban studies, it is also important to consider how the scale of the study area may cause different interpretations when aggregating human activities, such as the number of visitors. This type of scale is referred to as the observational scale, indicating the spatial extent or the actual coverage of the study (Lam et al., 2004). For example, a study may reveal different urban patterns if it only considers the central business area versus considering the entire city. However, the impact of the observational scale is not studied sufficiently when extracting urban mobility from big geodata (Chen et al., 2019; Lam et al., 2004). When talking about spatial scales in this study, we are referring to the observational scale. Our motivation is to provide a quantitative study of how observational scale impacts the extraction of urban patterns based on texture analysis.

To identify the effect of scale on aggregated human movement patterns of an urban area, we apply ‘texture analysis’ in this research. Texture analysis is a widely used technique in the field of remote sensing and image processing (Armi & Fekri-Ershad, 2019; Hall-Beyer, 2017b; Pathak & Barooah, 2013). Texture analysis quantitatively describes the relationships of digital number (DN) values of neighboring pixels of an image, which is used for per-pixel classification (Hall-Beyer, 2017a). In recent studies, the term ‘social sensing’ is used to describe human capabilities to act as sensors to capture spatial interactions and place semantics (Liu et al., 2015a, 2015b). It is often compared to remote sensing. Remote sensing imagery captures physical information about a place through sensors but cannot effectively capture information about human activity space and movement patterns. To address this gap, ‘social sensing’ can be used where humans act as active sensors to capture socioeconomic behavior and movement dynamics (Liu et al., 2015a, 2015b). Due to the similarity between social sensing and remote sensing, we plan to use an image-based method, called texture analysis algorithm, on social sensing datasets (i.e., location-enabled vehicle trajectory data) to identify human movement patterns from an urban environment.

Although traditionally texture analysis is a non-spatial method, we use it to extract spatial patterns for various reasons. First, after aggregating activity counts, each unit in the study area is similar to a pixel in an image, and the movement patterns of urban activities can be represented as spatial textures, capturing variations in density, intensity, and distribution of activities across different locations in the urban area. Second, texture analysis allows researchers to explore spatial contextual information in the urban environment. By considering not only the movements themselves but also the spatial relationships between different areas, we can uncover patterns that might be missed using traditional statistical methods. Third, this approach enables the identification of asymmetric urban mobility patterns, which will be explained in detail in Sect. 4.2. This offers insights into areas of intense or sparse urban engagement that can inform urban planning and resource allocation. Fourth, researchers can also use the results from spatial texture analysis to facilitate the detection of anomalies and shifts in urban activity patterns. Sudden changes in textures can signify emerging trends, such as shifts in commercial centers or the impact of new developments. Hence, it can be used to complement traditional spatial analysis methods by providing an extra layer of information on the spatial arrangement of human activities. However, it is worth noting that texture analysis results may not be easy to interpret by themselves, so it is often necessary to identify the landmarks within various zones of the study area to comprehend the human movement patterns captured by the texture analysis. Our research question for this study is to examine how observational scales affect the aggregated travel patterns extracted from geospatial big data in an urban area.

2 Literature review

2.1 Modeling urban mobility

Human movement patterns have been studied since ancient times, but the systematic study of human movement patterns started after the technological advancement that began with the industrial revolution (Batty, 2018; Yuan & Raubal, 2016). With the availability of automobiles, humans started to move farther and faster, and movement patterns also changed as people were no longer restricted by how far they could walk (Batty, 2018). The British Army provided one of the initial depictions of human movement patterns using a survey of all traffic movements in Dublin in the 1930s (Atack, 2013; Batty, 2018). For many years, surveys were the primary source of studying human movement patterns (Batty, 2018; Chen et al., 2016). However, in recent years, because of the rapid development of GPS and sensing technologies, communication and data storage capacities, and data mining techniques, the amount of extractable information on transportation and mobility patterns has increased significantly (Marjani et al., 2017; Miller & Shaw, 2015). This large amount of data opened the opportunity to study human movement patterns more effectively and efficiently (Anda et al., 2017; Tranos & Mack, 2019).

Human movement studies aim to predict the mobility patterns of people (and devices), which can provide valuable input to location-based recommendations (Hao et al., 2010), urban and transport planning, infectious disease prevention (Eubank et al., 2004; Yang et al., 2023), resource management in mobile communication (Song et al., 2010) and social network formation (Cho et al., 2011). Therefore, it is crucial to use statistical/mathematical models and measures to improve our knowledge about individuals’ travel behavior and overall movement patterns (Brockmann et al., 2006; Yuan et al., 2017).

In practice, human movements were first recorded at the individual level (Batty, 2018). Then studies showed that aggregated patterns produced by these individual movements (e.g., individual trajectories) could also be used to discover the overall patterns of an urban area (Batty, 2018). Different datasets have been used to identify the movement pattern of an urban area in previous studies. GPS-enabled taxi trajectory data have some advantages over other spatiotemporal data sources (e.g., travel survey data, mobile phone network data, Bluetooth data, and social media data) (Mao et al., 2016; Yuan et al., 2017). Broader areas can be covered with a lower cost because of the development of GPS technologies. Moreover, GPS-enabled taxi data can be collected at night, on weekends, and even holidays because taxi services are usually open 24/7 on all days of the year (Mao et al., 2016). This provides significant temporal coverage which is rare for other survey sources. The taxi trajectory dataset has a finer spatial and temporal resolution compared with other big data sources like the call detailed records data from mobile network data or social media check-in data. Other than that, taxi data are often collected at shorter time intervals (e.g., seconds) than other big data sources (Mao et al., 2016). All trips and their pickup and drop-off locations can be easily and precisely identified from the GPS data, and these trips often cover the entire urban area (Mao et al., 2016). For the aforementioned reasons, we have chosen to use taxi trajectory data to study urban mobility patterns.

Previous researchers used taxi trajectory data to combine the location of origins and destinations to identify information such as trip purpose, trip length, trip duration, departure and arrival times and travel modes, hotspots, travel behavior, land use class identification, and traffic prediction (Castro et al., 2013; X. Liu et al., 2015a, 2015b; Zhang et al., 2017), but very few studies have attempted to identify the overall travel patterns of an urban area and how different scales of aggregation can change the travel patterns. In this research, we attempt to fill this gap by identifying urban-level mobility patterns from taxi trajectory data and also by showing the changes in those mobility patterns for the change of scale size.

2.2 Effect of spatial scale on human movement pattern

It is important to understand different scale types to identify the effect of scale on spatial data. In literature, there are four common uses of scale in geography (Lam & Quattrochi, 1992; Lam et al., 2004; Zhang et al., 2014): cartographic, observational, measurement, and operational. First, the cartographic scale is used as a reference to display features on a map that are smaller than the real world to show more area. In cartography, the scale can be defined as the relationship between the size of features on a map and their real size on the ground (Lam et al., 2004). Second, the observational or geographic scale usually refers to the spatial extent or the areal coverage of the study (Lam et al., 2004). A large-scale study covers a larger study area, and a small-scale study covers a smaller study area under this category. Third, the measurement scale, which is also known as resolution, refers to the smallest distinguishable portion of a study (Tobler, 1988). For example, pixels are the smallest distinguishable part of remote sensing imagery and for vector data, it is the minimum mapping unit (e.g., polygon) (Wu & Li, 2006). The input data and results of an analysis vary depending on the size (finer or coarser) and shape (circular, square, rectangular, hexagonal, etc.) of the areal unit of analysis. The effect of spatial scale over geospatial data is well known as the Modifiable Areal Unit Problem (MAUP) (Openshaw, 1984). Finally, the operational scale usually refers to the spatial operational extent of certain processes in the environment. It is applicable to both natural and socio-political processes. This is the scale at which the maximum variability of a pattern can be captured. Finding the operational scale is crucial for a study as it helps to determine the size of the spatial extent (observational scale) and the resolution (measurement scale), which will identify the variation or characteristics of the patterns. It is important to identify the underlying processes of a study (Lam et al., 2004).

Previous studies show that when analyses are done using extensive data, the results are more scale-sensitive than using intensive data (Chen et al., 2019). For intensive data (e.g., temperature or elevation), the values do not change with the size of the analysis unit. Usually, the mean value does not change when the analysis unit changes for intensive data, but the value of variance drops when the unit of analysis gets larger (Chen et al., 2019; Ruddell & Wentz, 2009). On the other hand, for extensive data (e.g., social media check-ins), the mean value changes with the change of analysis unit, and the variance value may increase or decrease with a larger analysis unit. Therefore, extensive data are more sensitive to scale than intensive data (Chen et al., 2019).
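The contrast between extensive and intensive data can be illustrated with a minimal Python sketch. It is an illustration only: the two 4 × 4 surfaces, their values, and the helper names `aggregate` and `flatten` are hypothetical and are not taken from the cited studies. Cells are merged into 2 × 2 blocks, with counts summed (extensive) and temperatures averaged (intensive).

```python
from statistics import mean, pvariance


def aggregate(grid, factor, reducer):
    """Merge factor x factor blocks of a square grid with `reducer` (sum or mean)."""
    n = len(grid)
    return [
        [
            reducer([grid[r + dr][c + dc] for dr in range(factor) for dc in range(factor)])
            for c in range(0, n, factor)
        ]
        for r in range(0, n, factor)
    ]


def flatten(grid):
    """Return all cell values of a grid as one flat list."""
    return [value for row in grid for value in row]


# Hypothetical 4 x 4 surfaces (illustrative values only).
checkins = [[1, 3, 0, 2], [5, 7, 1, 1], [0, 2, 4, 6], [2, 0, 8, 10]]  # extensive: counts
temperature = [[20.0, 22.0, 21.0, 23.0], [24.0, 26.0, 21.0, 23.0],
               [20.0, 22.0, 25.0, 27.0], [22.0, 20.0, 29.0, 31.0]]  # intensive

coarse_checkins = aggregate(checkins, 2, sum)          # counts add up when cells merge
coarse_temperature = aggregate(temperature, 2, mean)   # intensive values are averaged

# Extensive data: the mean per unit changes with the unit size.
print(mean(flatten(checkins)), mean(flatten(coarse_checkins)))
# Intensive data: the mean is preserved, while the variance drops.
print(mean(flatten(temperature)), mean(flatten(coarse_temperature)))
print(pvariance(flatten(temperature)), pvariance(flatten(coarse_temperature)))
```

In this toy case the mean count per unit rises from 3.25 to 13 after aggregation, whereas the mean temperature stays at 23.5 and only its variance shrinks.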

Previous studies also show that a single spatial scale fails to demonstrate the spatial patterns of an area (Chen et al., 2019; Zhang et al., 2018). Researchers sometimes adopt multi-scale models to address this issue (Manley, 2006; Stone, 1972). However, a multi-scale approach is computationally expensive when it is used to find spatial patterns (Chen et al., 2019), so researchers often choose one spatial scale to identify spatial patterns (Liu et al., 2012; Pei et al., 2014). Regular grids of different sizes are mostly used for urban studies (Chen et al., 2019). In previous studies, researchers have used regular grids of 200 m (Pei et al., 2014), 250 m (X. Liu et al., 2015a, 2015b), 500 m (Liu et al., 2016), and 1,000 m (Liu et al., 2012) for mobility analysis. However, there is insufficient research on optimizing the spatial scale for studies in urban areas (Chen et al., 2019). In remote sensing, this issue is studied to find the proper spatial resolutions for image processing (Atkinson & Aplin, 2004; Chen et al., 2019), and different statistical measures have been applied to solve this problem (Atkinson & Curran, 1997; Chen et al., 2019; Woodcock et al., 1988). In landscape ecology, researchers try to identify heterogeneous spatial patterns (Levin, 1992; Tortini et al., 2015), and scale plays an important role in this process. The term ‘observational scale’ is used in landscape ecology to identify the spatial dimension of different patterns within a landscape (Dabiri & Blaschke, 2019).

We can see from previous studies that there is no appropriate scale size on which researchers agree, and the resolution of scale may vary depending on data and research questions. Very few studies have attempted to identify the effects of different scale sizes on urban-level human mobility patterns. Our research here attempts to quantify the effect of different scale sizes on mobility pattern studies in an urban area. We examine how human mobility patterns change when we change the observational scale. We divide the study area into grids and then use different window sizes over those grids to identify changes in human movement patterns. We do not change the grid size (measurement scale) in this study, but instead, we change the window size (observational scale), which is also closely related to the operational scale of human activity patterns in an urban area. By changing the observational scale, we attempt to have a better understanding of the operational scale of human movement patterns of an urban area. In the rest of the paper when we mention change of scale size, it refers to the change of observation scale or change of the window size.

2.3 Texture analysis and its application

Texture analysis is a well-established technique in the field of image processing and analysis (Hall-Beyer, 2017b; Pathak & Barooah, 2013). In the literature, texture is identified by either a tangible feel or a visual inspection (Manjunath & Ma, 1996). While analyzing an image, texture can indicate the spatial variation of brightness values of the pixels (Armi & Fekri-Ershad, 2019). Analysis of texture is essential in modern computer vision applications such as object detection, pattern recognition, and medical image analysis (Armi & Fekri-Ershad, 2019). In previous literature, we see four different kinds of texture analysis methods: statistical, structural, model-based, and transform-based methods (Armi & Fekri-Ershad, 2019). There is a lack of studies that apply texture analysis to fields other than image analysis. Because texture analysis identifies the spatial variation of surface structures and patterns (Pathak & Barooah, 2013), it has the potential to contribute to human mobility studies.

Grey Level Co-occurrence Matrix (GLCM) is a statistical method for texture analysis. In remote sensing, a GLCM tabulates how often different combinations of pixel brightness values (grey levels) occur in an image (Hall-Beyer, 2017b). The co-occurrence matrix is built by counting pairs of values in neighboring cells, so the results depend highly on the chosen neighboring cell. Various statistical measures are then applied to the co-occurrence matrix. When similar values are close to each other, the texture values are low. Some of the commonly used indicators in texture analysis include Homogeneity, Contrast, Dissimilarity, GLCM Mean, GLCM Variance, Entropy, Angular Second Moment, and GLCM Correlation. Both first-order and second-order statistics are used for texture analysis. When statistical measures are applied to the raw image data (DN values), they are called first-order measures. When statistical measures are applied to data that are derived from the raw values, they are called second-order measures. For example, the co-occurrence matrix provides information about the relationship of the neighboring pixel values, which is why measures computed from it are called second-order statistical measures. GLCM texture analysis is mostly used in image analysis. Very few studies have used either first-order texture measures or second-order GLCM texture measures to study geospatial processes (e.g., movement patterns and socioeconomic phenomena over space).
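The construction of a co-occurrence matrix can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function name `glcm`, the toy 4 × 4 image, the choice of the cell immediately to the east as the neighbor, and the symmetric counting are all assumptions made for the example. Real count data would first need to be quantized into a limited number of grey levels.

```python
def glcm(image, dr=0, dc=1, levels=None, symmetric=True):
    """Normalised grey-level co-occurrence matrix for the neighbour offset (dr, dc).

    `image` is a list of rows holding integer grey levels 0..levels-1.
    """
    if levels is None:
        levels = max(max(row) for row in image) + 1
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:  # skip pairs that leave the image
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:                      # also count the reverse direction
                    counts[j][i] += 1
    total = sum(map(sum, counts))
    return [[value / total for value in row] for row in counts]


# Toy image with four grey levels (illustrative values only).
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(toy_image)  # neighbour = the cell immediately to the east
for row in p:
    print([round(value, 3) for value in row])
```

Changing `dr` and `dc` selects a different neighboring cell, which is why the resulting indicators depend on that choice. Image-processing libraries such as scikit-image offer comparable routines for production use.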

The emergence of big geospatial data enabled us to use data-driven methods to identify human movement patterns without preconceived assumptions and theories. Different statistical (Gao, 2015; Huang et al., 2018; Toole et al., 2015) and machine learning (Luca et al., 2021; Toch et al., 2019; Wang et al., 2019) measures have been used to identify human movement patterns. But in this study, we use the ‘texture analysis’ method to identify human movement patterns. Although texture analysis is used in the field of remote sensing and image analysis, it has not been used in human movement studies. We use a remote sensing algorithm in the human movement pattern study because human movement pattern identification falls under the broad topic of ‘social sensing,’ and social sensing is comparable to remote sensing. The term ‘social sensing’ was used to demonstrate the ability of big geospatial data to understand the socio-economic environment (Liu et al., 2015a, 2015b). It can be compared to ‘remote sensing’, a well-established domain in GIScience. Remote sensing can identify the physical features of the earth’s surface, but it is not capable of capturing the socio-economic characteristics of people in the study area. In social sensing, each individual can play the role of a sensor (Liu et al., 2015a, 2015b) and capture the socio-economic information, human movement patterns, or interaction of different parts of a city, all of which are beyond the scope of traditional remote sensing. In that way, social sensing can complement remote sensing to capture the complete picture of a study area. Studying human movement patterns helps us to understand human behavior and its relationship with the built environment (Xu et al., 2016).

Texture analysis uses a set of first-order and second-order statistical measures to identify surface patterns, and this method is also well supported by existing literature. When it comes to studying human mobility data, it is possible to aggregate mobility counts into grid cells so results can be considered an image surface. Each cell of the grid shows the aggregated information of that area, which is analogous to image cells. This is especially true for studying urban-level patterns. Therefore, texture analysis is a potential method to identify human mobility patterns from big geodata. For this paper, we have used texture analysis to identify mobility patterns in an urban area.

As mentioned in Sect. 1, previous work on human movement patterns has paid limited attention to the crucial impact of observational scale on its results. In this study, we address this issue by quantifying the effects of observational scale on movement patterns. Texture analysis offers valuable insights into the density, intensity, and distribution of activities at different scales. Understanding these dynamics is vital for informed urban planning and design.

3 Methods

3.1 Study area

To identify the effect of different scale sizes on aggregated travel patterns, we use a taxi trajectory dataset from Nanjing, China. Nanjing is the capital of Jiangsu province of the People’s Republic of China and the second-largest city in the Yangtze River Delta after Shanghai (Fig. 1). It has 9.31 million permanent residents (National Bureau of Statistics of China, 2018) and an area of 6,600 square kilometers. The Nanjing administrative area contains 11 districts. Nanjing has served as the capital of various Chinese dynasties, kingdoms, and republican governments dating from the third century to 1949 (Nanjing Municipal People’s Government, 2012). It has long been a major center of culture, education, research, politics, economy, and transport networks in China.

Fig. 1 The location of Nanjing in China

The study area is divided into 1 km-by-1 km grid cells. The numbers of pick-ups and drop-offs are aggregated for each cell of the study area. This study includes only the cells that have at least ten pick-up points per day. The rectangular bounding box shows the selected study area (Fig. 2). The selected cells are situated in the middle of the city (the downtown area), which has more taxi trips than other parts of the city. The final study area consists of 35 rows and 31 columns.
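The aggregation step can be sketched as follows. This is a hedged illustration rather than the processing chain used in this study: it assumes that pick-up coordinates have already been projected to metres, that "at least ten pick-up points per day" means an average over the observation days, and the names `grid_counts`, `keep_busy_cells`, `x_min`, and `y_max` are hypothetical.

```python
from collections import Counter

CELL_SIZE_M = 1000  # 1 km grid cells, as described in the text


def grid_counts(points, x_min, y_max, cell_size=CELL_SIZE_M):
    """Count points per (row, col) cell; row 0 is the northernmost row.

    `points` holds (x, y) pairs in a projected coordinate system in metres
    (an assumption; raw longitude/latitude would need projecting first).
    """
    counts = Counter()
    for x, y in points:
        col = int((x - x_min) // cell_size)
        row = int((y_max - y) // cell_size)
        counts[(row, col)] += 1
    return counts


def keep_busy_cells(counts, n_days, min_per_day=10):
    """Keep cells whose average daily count reaches the threshold."""
    return {cell: n for cell, n in counts.items() if n / n_days >= min_per_day}


# Toy pick-up points (hypothetical coordinates in metres).
pickup_points = [(100.0, 4900.0)] * 10 + [(2500.0, 3500.0)] * 3
counts = grid_counts(pickup_points, x_min=0.0, y_max=5000.0)
busy = keep_busy_cells(counts, n_days=1)
print(dict(counts))
print(busy)
```

Points that fall outside the bounding box are not handled here and would have to be filtered before counting.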

Fig. 2 The selected study area of Nanjing City (more than 10 pick-up points per day)

3.2 Data

We use taxi trajectory data to identify the effect of scale on aggregated urban-level travel patterns. Taxi data are useful for identifying travel patterns in a city because taxis do not follow a fixed route like other public transportation modes (buses, trains, ferries, etc.). The dataset we use in this study is taxi trajectory data collected from June 4, 2010, to June 10, 2010 (5 weekdays and 2 weekend days) in Nanjing, China. The data contain approximately 135 million GPS locations from 7,194 vehicles. The dataset has the following attributes: ID, VehicleID, GPSTime, GPSLongtitude, GPSLatitude, GPSSpeed, GPSDirection, and PassengerState. The ID and VehicleID fields are unique identifiers for records and taxis, respectively. GPSTime represents the date and time for each record. GPSLongtitude and GPSLatitude represent the longitude and latitude of the taxi. GPSSpeed represents the speed of the taxi at a given time. GPSDirection is a horizontal angle measured clockwise from a north-facing baseline. PassengerState is a Boolean variable that represents whether the taxi is carrying passengers. The value “0” denotes that no passenger is in the taxi, and “1” represents that the taxi is occupied. The sampling interval is between 15 and 30 s. On average, there are 4,666 records per day per taxi. Table 1 provides a sample record of our dataset.
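One plausible way to derive pick-up and drop-off events from these attributes is to look for changes in PassengerState within each vehicle's record sequence. The sketch below is an assumption made for illustration, since the exact extraction rule is not spelled out here: a change from 0 to 1 is treated as a pick-up and a change from 1 to 0 as a drop-off. The function name `extract_events` and the toy records (including their integer time stamps) are hypothetical.

```python
def extract_events(records):
    """Split records into pick-ups (state 0 -> 1) and drop-offs (state 1 -> 0).

    `records` is an iterable of dicts keyed by the dataset's attribute names.
    Records of each vehicle are assumed to arrive in GPSTime order.
    """
    pickups, dropoffs = [], []
    last_state = {}  # most recent PassengerState seen for each VehicleID
    for record in records:
        vehicle, state = record["VehicleID"], record["PassengerState"]
        previous = last_state.get(vehicle)
        if previous == 0 and state == 1:
            pickups.append(record)
        elif previous == 1 and state == 0:
            dropoffs.append(record)
        last_state[vehicle] = state
    return pickups, dropoffs


# Toy records (hypothetical values; GPSTime is shown as a simple counter).
toy_records = [
    {"VehicleID": "A", "GPSTime": 1, "PassengerState": 0},
    {"VehicleID": "A", "GPSTime": 2, "PassengerState": 1},
    {"VehicleID": "A", "GPSTime": 3, "PassengerState": 1},
    {"VehicleID": "A", "GPSTime": 4, "PassengerState": 0},
    {"VehicleID": "B", "GPSTime": 1, "PassengerState": 1},
    {"VehicleID": "B", "GPSTime": 2, "PassengerState": 0},
]
pickups, dropoffs = extract_events(toy_records)
print(len(pickups), len(dropoffs))
```

The first record of a vehicle never produces an event because no earlier state is known, so a trip already in progress at the start of the data contributes only its drop-off.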

Table 1 A sample taxi record

3.3 Texture analysis

Texture analysis is used in image processing. The texture is a function of spatial variation of the brightness intensity of the pixels. GLCM (Gray-Level Co-occurrence Matrix) is one of the methods of texture analysis frequently used in remote sensing. The GLCM functions characterize the texture of an image by calculating how often pairs of pixels with specific values and in a specified spatial relationship occur in an image. It creates a GLCM and then extracts statistical measures from this matrix. As GLCM texture analysis identifies local variation within an image, it can be used to identify spatial patterns. We use both first-order and second-order GLCM statistical measures for examining the texture and identifying the spatial relationship among the cells of the study area. We then calculate the statistical measures for different window sizes. Each window has a starting point and size associated with it. For example, starting point 0,0 means that the window will start from row number 0 and column number 0 (the top left corner of the study area). The window size 5 means 5 rows and 5 columns from the starting point of the window. The window is moved over the whole study area (moving window). We perform this analysis from all four corners of the study area with different window sizes to capture how the GLCM texture indicators change with different window sizes. This demonstrates how different scale sizes show different movement patterns in the study area.
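The window definition can be made concrete with a short sketch. It follows one reading of the description above, namely square windows of a given size anchored at one of the four corners of the gridded study area; the function name `corner_window`, the corner labels, and the toy grid are hypothetical.

```python
CORNERS = ("top_left", "top_right", "bottom_left", "bottom_right")


def corner_window(grid, size, corner="top_left"):
    """Return the size x size block of `grid` anchored at the given corner."""
    rows, cols = len(grid), len(grid[0])
    if corner not in CORNERS:
        raise ValueError(f"unknown corner: {corner}")
    if not 0 < size <= min(rows, cols):
        raise ValueError("window does not fit inside the grid")
    r0 = 0 if corner.startswith("top") else rows - size    # first row of the window
    c0 = 0 if corner.endswith("left") else cols - size     # first column of the window
    return [row[c0:c0 + size] for row in grid[r0:r0 + size]]


# Toy 4 x 4 grid whose cell value encodes its position (row * 4 + column).
toy_grid = [[r * 4 + c for c in range(4)] for r in range(4)]
for corner in CORNERS:
    print(corner, corner_window(toy_grid, 2, corner))
```

With starting point 0,0 and window size 5, for example, `corner_window(grid, 5, "top_left")` would return rows 0 to 4 and columns 0 to 4 of the grid.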

In remote sensing, first-order texture measures are statistics calculated from the original image values and do not consider pixel relationships (Hall-Beyer, 2017a, 2017b). In this study, first-order statistical measures show the characteristics and changes in pick-up or drop-off numbers for different window sizes. We use five first-order indicators: Mean, Standard Deviation, Skew, Entropy, and Simpson’s Diversity. They are described in Table 2.

Table 2 First-order indicators

In Table 2, the metrics are calculated from all of the observations within a window, indexed from 1 to n, where n is the total number of pick-up or drop-off cases and i is an observed value of pick-ups or drop-offs. \(p_{i}\) is the probability of value i occurring in a window (number of points with value i/total number of points), and µ is the mean of the observed values.
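For readers who want to experiment, the five first-order indicators can be sketched in Python. Because the formulas of Table 2 are not reproduced in this text, the sketch assumes common textbook forms: the population standard deviation, the third standardized moment for Skew, Shannon entropy with the natural logarithm, and Simpson's Diversity as one minus the sum of squared probabilities. The function name `first_order` is hypothetical.

```python
import math
from collections import Counter


def first_order(values):
    """First-order indicators for the cell values of one window (assumed formulas)."""
    n = len(values)
    mu = sum(values) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in values) / n)  # population form
    # Third standardised moment; defined as 0 here when all values are equal.
    skew = sum((v - mu) ** 3 for v in values) / (n * sd ** 3) if sd > 0 else 0.0
    # p_i = share of cells in the window that hold the value i.
    probs = [count / n for count in Counter(values).values()]
    entropy = -sum(p * math.log(p) for p in probs)
    simpson = 1.0 - sum(p * p for p in probs)
    return {
        "mean": mu,
        "standard_deviation": sd,
        "skew": skew,
        "entropy": entropy,
        "simpson_diversity": simpson,
    }


window_values = [1, 1, 2, 4]  # toy pick-up counts of four cells
stats = first_order(window_values)
for name, value in stats.items():
    print(name, round(value, 4))
```

Under this form, Simpson's Diversity is the probability that two cells drawn at random (with replacement) from the window hold different values, so it grows as more distinct values enter the window.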

Second-order statistical measures consider the relationship between groups of two pixels in the original image (Hall-Beyer, 2017a, 2017b) in remote sensing. We use second-order measures to examine the spatial relationship of pick-up or drop-off number values in each window. The indicators are: Dissimilarity, Contrast, Homogeneity, ASM (Angular Second Moment), Energy, and Correlation (Table 3).

Table 3 Second-order GLCM indicators

In Table 3, i is the row number and j is the column number. \(P_{i,j}\) is the probability value recorded for the cell (i, j). µ is the mean and σ is the standard deviation of the values (pick-ups or drop-offs) of cells within a window.
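The second-order indicators can be sketched in the same spirit. Because the formulas of Table 3 are not reproduced in this text, the sketch assumes the forms commonly used in GLCM texture analysis, with the mean and standard deviation in the Correlation term computed as GLCM-weighted moments. The function name `glcm_indicators` and the toy matrix are hypothetical, and returning 1.0 for a window without variation is a convention chosen for this example.

```python
import math


def glcm_indicators(p):
    """Second-order indicators from a normalised GLCM `p` (assumed formulas)."""
    cells = [(i, j, p[i][j]) for i in range(len(p)) for j in range(len(p[i]))]
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)
    dissimilarity = sum(v * abs(i - j) for i, j, v in cells)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    asm = sum(v * v for _, _, v in cells)
    # GLCM-weighted means and standard deviations of the row and column levels.
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum(v * (i - mu_i) ** 2 for i, _, v in cells))
    sd_j = math.sqrt(sum(v * (j - mu_j) ** 2 for _, j, v in cells))
    if sd_i > 0 and sd_j > 0:
        correlation = sum(v * (i - mu_i) * (j - mu_j) for i, j, v in cells) / (sd_i * sd_j)
    else:
        correlation = 1.0  # convention chosen here for a window with no variation
    return {
        "dissimilarity": dissimilarity,
        "contrast": contrast,
        "homogeneity": homogeneity,
        "asm": asm,
        "energy": math.sqrt(asm),
        "correlation": correlation,
    }


# Toy symmetric co-occurrence counts for four grey levels (illustrative only).
toy_counts = [[4, 2, 1, 0], [2, 4, 0, 0], [1, 0, 6, 1], [0, 0, 1, 2]]
total = sum(map(sum, toy_counts))
toy_glcm = [[count / total for count in row] for row in toy_counts]
indicators = glcm_indicators(toy_glcm)
for name, value in indicators.items():
    print(name, round(value, 4))
```

Contrast and Dissimilarity weight entries by their distance from the diagonal, so they rise when neighboring cells differ strongly, whereas Homogeneity, ASM, and Energy rise when neighboring cells are similar.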

The first-order and second-order texture measures help identify the differences in aggregated travel patterns for different window sizes. We calculate the texture measures for window sizes from 5 × 5 to 25 × 25. Differences in the results across window sizes reflect how the extracted aggregated travel patterns change with scale size. The grid size by which the study area has been divided (1 km × 1 km in this case) may also have an impact on the final results, but this issue is beyond the scope of this study. We use a 1 km × 1 km grid size to focus on the impact of different window sizes on the metrics.
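The sweep over window sizes can be sketched as below. The surface is synthetic and purely illustrative: a hypothetical 10 × 10 hot spot of high pick-up counts on a low background, laid out on a grid with the 35 rows and 31 columns of the study area. The helper names are hypothetical, and only the Mean of windows anchored at the top left corner is tracked.

```python
ROWS, COLS = 35, 31  # grid dimensions reported for the study area


def synthetic_grid():
    """Hypothetical surface: a 10 x 10 'downtown' hot spot on a low background."""
    return [
        [100 if 10 <= r < 20 and 10 <= c < 20 else 1 for c in range(COLS)]
        for r in range(ROWS)
    ]


def top_left_mean(grid, size):
    """Mean of the size x size window anchored at row 0, column 0."""
    cells = [value for row in grid[:size] for value in row[:size]]
    return sum(cells) / len(cells)


grid = synthetic_grid()
mean_by_size = {size: top_left_mean(grid, size) for size in range(5, 26)}  # 5 x 5 to 25 x 25
peak_size = max(mean_by_size, key=mean_by_size.get)
print(peak_size, round(mean_by_size[peak_size], 2))
```

On this synthetic surface the Mean peaks at the window size that first encloses the whole hot spot (20) and declines once additional low-count cells are included, which is the kind of behavior the window-size comparison is designed to reveal.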

4 Results and Findings

The first- and second-order texture analysis indicators (Tables 2 and 3) showed different results with the change of window sizes as they captured different patterns in the study area. The same indicators also showed distinct patterns when they captured weekday vs. weekend movement patterns, as well as when measured from different corners of the study area. We displayed different patterns captured by both first- and second-order indicators and tried to explain the potential reasons for different patterns in the following sub-sections.

4.1 Weekday and Weekend patterns

First, we wanted to determine if the indicators captured different travel patterns during weekdays and weekends in the study area (Figs. 3 and 4).

  • From Fig. 3, we saw that most of the first-order indicators (except for Simpson’s Diversity) reached a peak at a smaller window size on weekdays than on weekends. This may be because some landmarks and facilities in the southern and western parts of the downtown area (situated in the middle of the study area) attracted more traffic during the weekends. A larger window size was required to capture those areas on weekends because windows were generated from the top left (northwest) corner of the study area. On the other hand, taxi pick-ups were more concentrated in the northern and middle sections of the study area (downtown area of Nanjing) during weekdays. For weekdays, a relatively smaller window size sufficed to capture most of the taxi pick-ups. The first-order indicator, Mean, reached its peak when the window size was around 21 on weekdays and around 25 on weekends, as shown in Fig. 3a. We saw in Fig. 4 that when the window size was 21 by 21, it covered some of the downtown areas of Nanjing from the top left corner. The peak Mean value at window size 21 on weekdays indicated that the overall number of taxi pick-ups decreased if the window size was larger or smaller than 21 on weekdays. When zooming into the area confined by the 21 by 21 window, we noticed that most of the universities (e.g., Nanjing University, Nanjing University of Information Science and Technology, Southeast University, Nanjing University of Aeronautics and Astronautics), commercial buildings in the Xinjiekou area (e.g., Zhejiang Xingye Bank, Bank of Communications, the Central Shopping Center), parks (e.g., Mochou Lake Park, Xuanwu Lake Park), public attractions (e.g., the Presidential Palace), and organization headquarters (e.g., the headquarters of Eastern Theater Command) were located in that area, as shown in Fig. 5. Located in the central business district (CBD) of Nanjing, Xinjiekou has many financial institutions, consulting firms, law offices, accounting firms, and corporate headquarters, including 80 business enterprises (Beaumont, 2018). Those areas attracted more people during weekdays. On the other hand, some areas attracted more people on the weekends (e.g., Yuhuatai Memorial Park of Revolutionary Martyrs, Nanjing Olympic Sports Center Stadium, and Purple Mountain scenic area), and those were situated beyond the area covered by the 21 by 21 window, as shown in Fig. 6. Covering those areas required a bigger window size. Because these areas attracted more people on the weekends, the Mean value reached its peak at a larger window size on weekends.

  • Standard Deviation and Entropy reached their peak when the window size was around 19 on weekdays and around 23 on weekends. These indicators quantified the variability of taxi pick-up values. The weekday peak at window size 19 possibly occurred because a window of that size included both high taxi pick-up numbers in the downtown area and comparatively low pick-up numbers in the suburban areas towards the northwest portion of the study area (i.e., on the opposite side of the Yangtze River from the downtown area). On weekends, a larger window size was required to capture the maximum differences in taxi pick-up values. Because the commercial centers with office buildings located in the city center may have attracted fewer people on the weekends, a larger window size was needed to capture the variability when both the city center and the suburban areas had low taxi pick-up values.

  • Simpson’s Diversity increased with the increase of the window size for both weekdays and weekends. That meant a larger window size increased the probability of taxi pick-up values being different. The change of Simpson’s Diversity with window size showed similar patterns on both weekdays and weekends. This indicated that the taxi pick-up values in the cells of the study area were quite different, and there were not many cells in the study area with the same taxi pick-up values. With the increase of the window size, more cells were included in the window with different values.

  • In Figs. 7b and d, we observed that both Contrast and Dissimilarity reached their peak values at different window sizes on weekdays and weekends: approximately 18 on weekdays and 23 on weekends. These metrics captured the local variations in taxi pick-up numbers. When the window size was around 18 on weekdays, there was high variability in taxi pick-ups. This is likely because a window size of 18 covered both areas with potentially high pick-up rates (such as downtown) and low pick-up areas (like the suburbs). Conversely, the highest variability in taxi pick-up numbers occurred on weekends with a window size of 23. This may be attributed to shifts in human movement, with more people heading south (to destinations like Yuhuatai Memorial Park and Nanjing Olympic Sports Center) and east (to the Purple Mountain Scenic Area). To capture both these high pick-up areas and the low pick-up areas in the northern and western suburbs, a larger window size was needed.

  • Figs. 7a, e, and f show that ASM, Energy, and Homogeneity metrics decreased as the window size increased, both on weekdays and weekends. These metrics were indicators of the uniformity of taxi pick-up numbers across the study area. As the window size increased, taxi pick-up values became more diverse. Interestingly, on weekdays, taxi pick-up patterns exhibited greater diversity than on weekends, even for the same window size.

  • Examining Fig. 7c, we noticed significant differences in correlation patterns for smaller window sizes (i.e., between 6 and 8) on weekdays and weekends. Correlation measures the linear dependence between neighboring cell values within a window. For window sizes between 6 and 8, the correlation was high on weekends but low on weekdays. This discrepancy can be attributed to the fact that during weekends, many cells in the top left side of the study area (suburbs) had low taxi pick-up numbers. Additionally, windows of this size covered a forested area south of China National Highway 104 (G104) in the northwestern part of the study area, which consistently recorded low taxi pick-ups. Conversely, on regular weekdays, some cells within windows of size 6 to 8 exhibited high taxi pick-ups in certain housing and business areas, while the forested areas still had low taxi pick-up numbers. Hence, the correlation values were low on weekdays for window sizes between 6 and 8.

  • The correlation values began to increase significantly when the window size fell between 11 and 14 on weekdays and between 13 and 16 on weekends. This phenomenon may have occurred because, as the window size exceeded 11, it started covering downtown areas beyond the Changjiang River (Yangtze River). Furthermore, on weekdays, taxi pick-up numbers may have been higher in the downtown area, necessitating a slightly larger window size to encompass these high pick-up areas compared to weekends.
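The first-order indicators discussed in the bullets above can be illustrated with a minimal Python sketch. It assumes the pick-up counts are stored in a 2-D NumPy array with one value per grid cell and row 0 at the northern edge, so that a window generated from the top left corner is simply `grid[:n, :n]`. The function names are hypothetical, and the formulas (Shannon entropy over the distinct values in the window, Simpson’s Diversity as one minus the sum of squared proportions) are common textbook forms that may differ in detail from the definitions used in this study.

```python
import numpy as np


def first_order_indicators(window):
    """Return first-order texture indicators for a 2-D window of pick-up counts."""
    values = np.asarray(window, dtype=float).ravel()
    mean = float(values.mean())
    std = float(values.std())
    # Skewness as the third standardized moment; set to 0 for a constant window.
    skewness = 0.0 if std == 0 else float(np.mean(((values - mean) / std) ** 3))
    # Relative frequency of each distinct pick-up value inside the window.
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    # Shannon entropy in bits (assumed form).
    entropy = float(-np.sum(p * np.log2(p)))
    # Probability that two cells drawn with replacement have different values.
    simpson = float(1.0 - np.sum(p ** 2))
    return {"mean": mean, "std": std, "skewness": skewness,
            "entropy": entropy, "simpson": simpson}


def scan_from_top_left(grid, sizes):
    """Compute the indicators for n-by-n windows anchored at the top left corner."""
    grid = np.asarray(grid)
    return {n: first_order_indicators(grid[:n, :n]) for n in sizes}


# Synthetic counts for demonstration only; these are not the Nanjing data.
rng = np.random.default_rng(0)
demo_grid = rng.poisson(lam=5.0, size=(30, 30))
for n, stats in scan_from_top_left(demo_grid, range(5, 26, 5)).items():
    print(n, round(stats["mean"], 2), round(stats["simpson"], 3))
```

Plotting each indicator against `n` for a weekday grid and a weekend grid would give curves of the kind compared in Fig. 3.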

Fig. 3 First-order indicator changes within different window sizes. (a) Mean, (b) Standard Deviation, (c) Entropy, (d) Skewness, (e) Simpson’s Diversity

Fig. 4 Area captured with a 21 by 21 window size from the top left corner (green colored grids)

Fig. 5 Some of the landmarks covered by a 21 by 21 window size

Fig. 6 Some of the landmarks not covered by a 21 by 21 window size

Fig. 7 Second-order indicator changes for different window sizes. (a) ASM, (b) Contrast, (c) Correlation, (d) Dissimilarity, (e) Energy, (f) Homogeneity
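As a rough illustration of how the second-order indicators in Fig. 7 can be obtained, the sketch below builds a gray-level co-occurrence matrix (GLCM) with NumPy and derives the six indicators from it. Several choices here are assumptions rather than details taken from this study: pick-up counts are quantized into a small number of levels, only the horizontal neighbor at distance 1 is used, the matrix is made symmetric and normalized, and Energy is taken as the square root of ASM. All function names are hypothetical.

```python
import numpy as np


def quantize(counts, levels):
    """Map non-negative counts to integer gray levels 0 .. levels-1 (assumed scheme)."""
    counts = np.asarray(counts, dtype=float)
    top = counts.max()
    if top == 0:
        return np.zeros(counts.shape, dtype=int)
    return np.minimum((counts / top * levels).astype(int), levels - 1)


def glcm(window, levels, offset=(0, 1)):
    """Symmetric, normalized co-occurrence matrix for one non-negative (row, col) offset."""
    w = np.asarray(window, dtype=int)
    dr, dc = offset
    rows, cols = w.shape
    a = w[: rows - dr, : cols - dc].ravel()  # reference cells
    b = w[dr:, dc:].ravel()                  # their neighbors at the given offset
    m = np.zeros((levels, levels), dtype=float)
    np.add.at(m, (a, b), 1.0)                # count each (reference, neighbor) pair
    m = m + m.T                              # count every pair in both directions
    return m / m.sum()


def second_order_indicators(p):
    """Derive the six second-order indicators from a normalized GLCM."""
    i, j = np.indices(p.shape)
    asm = float(np.sum(p ** 2))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum(p * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(p * (j - mu_j) ** 2))
    # A constant window has zero variance; correlation is set to 1 by convention.
    if sd_i * sd_j == 0:
        correlation = 1.0
    else:
        correlation = float(np.sum(p * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j))
    return {
        "asm": asm,
        "energy": float(np.sqrt(asm)),
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "dissimilarity": float(np.sum(p * np.abs(i - j))),
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
        "correlation": correlation,
    }


# Synthetic counts for demonstration only; these are not the Nanjing data.
rng = np.random.default_rng(1)
demo = quantize(rng.poisson(4.0, size=(21, 21)), levels=8)
print(second_order_indicators(glcm(demo, levels=8)))
```

Contrast and Dissimilarity weight cell pairs by how far apart their levels are, whereas ASM, Energy, and Homogeneity reward pairs with equal or similar levels, which is consistent with the opposite trends of these two groups reported above.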

4.2 Calculating indicators from different starting locations

We suspected that the first-order and second-order texture indicators might be biased if we measured them only from the top left corner of the study area. Therefore, we captured the changes in texture analysis indicators with window sizes from all four corners of the study area and then compared the results. For first-order texture indicators, we picked Mean, Standard Deviation, and Simpson’s Diversity. For second-order texture indicators, we picked Correlation, Dissimilarity, and Homogeneity. We picked these indicators because they measured different aspects (e.g., similarity, variety, correlation) of taxi pick-up values within a window of the study area. The other indicators represented similar criteria of the study area and showed mostly similar patterns. The results for the selected indicators from the four corners of the study area are described in the following paragraphs.

  • Figure 8a shows that the Mean value reached its peak at window size 19 when it was measured from the top left and bottom left corners of the study area, but at window size 23 when it was measured from the top right and bottom right corners. This result showed that the high pick-up value areas were not located in the middle of the study area and were instead a little closer to the west border. Figure 9 shows that the downtown area is located on the bank of the Yangtze River, which is on the west side of the study area.

  • Figure 8b shows that the Standard Deviation reached its peak at window size 16 when measured from the bottom left corner of the study area, but at window size 19 when measured from the other three corners. This might have occurred because the areas with high pick-up values (downtown) were relatively closer to the bottom left corner. Standard Deviation measures the variability of pick-up values within a window. The corner areas of the study area were mostly suburban areas with low taxi pick-up values. Naturally, from the corner of the study area that is closer to downtown, the Standard Deviation value reached its peak with a smaller window size. In this study area, that corner was the bottom left one, which corresponded to the southwestern part of the study area.

  • Figure 8c shows the Simpson’s Diversity measured from all four corners of the study area. Simpson’s Diversity demonstrated a similar trend regardless of the corner from which it was measured. This meant that the probability of pick-up values being different increased at a similar rate with the window size from every corner of the study area, and that the pick-up numbers in the cells of different windows of the study area were different from each other. As the window size increased and included more cells of the study area, the probability of taxi pick-up numbers being different also increased.

  • The Dissimilarity value reached its peak at window size 18 when measured from the top left (northwestern) corner of the study area as shown in Fig. 8d. When measured from the other three corners, the Dissimilarity value reached its peak at window size 21. Dissimilarity values identify the variations of taxi pick-up values of the cells within each window. Hence, the highest variability of the taxi pick-up numbers could be captured with a smaller window size from the top left corner of the study area. We can see from Fig. 9 that there was a forest area located south of China National Highway 104 (G104), which was in the top left corner of the study area. This area did not attract taxi pick-ups, though the surrounding areas of the forest included residential and business areas that could attract taxi pick-ups. This variability of taxi pick-up values occurred because of different land uses in the top left (northwestern) corner of the study area. This type of abrupt land use change typically occurs closer to the top left corner than other corners of the study area.

  • The Homogeneity value reached its lowest point at window size 21 when measured from the bottom right corner of the study area, as shown in Fig. 8f, but at window size 24 when measured from the other three corners. Homogeneity is the measure of uniformity in the study area. A high value denotes the similarity of pick-up values within a window. Therefore, maximum uniformity could be captured with a smaller window size when measured from the bottom right corner of the study area. The bottom right corner of the study area is a mostly residential area, but there are a few educational institutions (e.g., China Pharmaceutical University Jiangning Campus and Nanjing Institute of Technology) located in that area as well. With a small window size, the analysis captured similar taxi pick-up values in the same type of urban functional areas (e.g., residential), but as the window size increases, it may capture parts of both the residential areas and the educational institutions within the same window. Because residential areas and educational institutions have different taxi pick-up values, uniformity is highest at smaller window sizes from the bottom right corner of the study area. The overall trends of Homogeneity and Dissimilarity were almost opposite to each other, as one indicator identified uniformity and the other identified variability.

  • When correlation was measured from all four corners of the study area, it showed an interesting pattern as demonstrated in Fig. 8e. Correlation followed a non-linear pattern whereas the other indicators showed an almost linear pattern when plotted against the window size. The taxi pick-up numbers of a cell did not increase or decrease if the taxi pick-up numbers of neighboring cells within a window increased or decreased. This phenomenon was true even if the window size changed. Correlation reached its lowest value at window size 7 when it was measured from the top left corner of the study area. However, when it was measured from the other three corners, it reached its lowest value around window size 10 to 11. This may be because the window captured the Nanjing University of Information Science & Technology in the top left corner of the study area with a window size of 7. This university attracted more people than the corner cells of the grid. Windows from the other corners of the study area did not capture infrastructure that was both located closer to the corner and attracted a large number of people. Therefore, the linear dependency among the cells decreased more slowly from the other corners than from the top left corner of the study area.
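The four-corner comparison described in the bullets above can be sketched as follows. The example assumes a 2-D NumPy array of pick-up counts with row 0 at the northern edge and column 0 at the western edge. The function names and the use of the window mean as the default indicator are illustrative assumptions; any other indicator function that accepts a window could be passed instead.

```python
import numpy as np


def corner_window(grid, n, corner):
    """Return the n-by-n window anchored at one corner of the grid."""
    grid = np.asarray(grid)
    if n < 1 or n > min(grid.shape):
        raise ValueError("window size must be between 1 and the smaller grid dimension")
    slices = {
        "top_left": (slice(None, n), slice(None, n)),
        "top_right": (slice(None, n), slice(-n, None)),
        "bottom_left": (slice(-n, None), slice(None, n)),
        "bottom_right": (slice(-n, None), slice(-n, None)),
    }
    return grid[slices[corner]]


def peak_window_size(grid, sizes, corner, indicator=np.mean):
    """Return the window size at which the indicator is largest, plus all scores."""
    scores = {n: float(indicator(corner_window(grid, n, corner))) for n in sizes}
    return max(scores, key=scores.get), scores


# Synthetic example: a single hot spot placed near the bottom left corner.
demo_grid = np.zeros((6, 6))
demo_grid[4, 1] = 12
for corner in ("top_left", "top_right", "bottom_left", "bottom_right"):
    best, _ = peak_window_size(demo_grid, range(1, 7), corner)
    print(corner, best)
```

In this toy grid the mean peaks at a smaller window from the corner nearest the hot spot, which mirrors the reasoning used above for the downtown area.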

Fig. 8 First- and second-order indicator changes from four different corners of the study area on weekdays. (a) Mean, (b) Standard Deviation, (c) Simpson’s Diversity, (d) Dissimilarity, (e) Correlation, (f) Homogeneity

Fig. 9 Top left corner of the study area

Overall, as depicted in Figs. 3, 7, and 8, the first- and second-order texture analyses demonstrated their capability in discerning human movement patterns within urban areas during both weekdays and weekends. Additionally, they quantified variations in these movement patterns across different spatial scales (window sizes). It is important to note that different landmarks within a city attract varying numbers of visitors at different times. For instance, on weekdays, offices and educational institutions witnessed higher foot traffic, whereas amusement parks, museums, and shopping centers tended to draw larger crowds on weekends. Furthermore, conducting texture analysis from different origin points within the city yielded diverse results at different spatial scales, contingent upon the city's structural composition, such as the location of downtown areas, parks, commercial zones, and educational institutions. Table 4 provides insights into how these findings can be leveraged for urban planning and policy development.

Table 4 Example texture analysis results and the implications for urban planning and policy making

5 Conclusion

In this study, we use texture analysis to identify human movement patterns in an urban area. The results show that both first- and second-order texture indicators vary with the window size and the starting point of a measurement window. Both first-order and second-order texture measures reach their peak at smaller window sizes on weekdays than on weekends. The main reason for this is that more taxi trips happen close to the city center on weekdays, since more infrastructure that attracts people is located in the city center and is usually open on weekdays. On weekends, there are not as many activities in the city center. Therefore, a larger window size is required to capture the variability of taxi pick-up counts from one of the corners of the study area. Moreover, the texture analysis indicators reach a peak at smaller window sizes when they are measured from the bottom left corner of the study area than when they are measured from the other three corners. That is because the city center generates a lot of traffic and is closer to the bottom left corner. The operational/observational scale used to identify human movement patterns therefore plays a significant role in the result. Different window sizes provide different patterns for the same study area. Moreover, the result highly depends on which portion of the study area attracts more people on a specific day. The results may vary for different cities based on the locations of high-traffic areas within the city.

The results also show that texture analysis can identify the similarities, dissimilarities, and relationships among different parts of the study area. This study is repeatable for any urban area that has vehicular data with passenger pick-up and drop-off numbers and temporal information attached to the data. A multiscale study can be done (as provided in this research) to explore the changes of different indicators with different window sizes at the beginning of the study. This will help researchers select the scale of their research based on the research goal.

To sum up, this study makes significant contributions from both empirical and methodological perspectives, enhancing our understanding of urban human movement patterns under different scales. Empirically, the investigation of how changes in observation scale affect urban patterns provides valuable insights into the dynamics of urban mobility. By quantifying these effects using texture analysis, we discovered interesting relationships between urban spaces and movement behaviors with different spatial contexts. Methodologically, the use of texture analysis shows the effectiveness of an image processing approach in spatial analysis, which provides a more comprehensive and data-driven examination of urban movement patterns. This method can also be adapted to other similar datasets in the big data era, such as georeferenced mobile phone data and location-based social media data.

From a policy perspective, understanding human movement patterns within an urban area holds crucial implications for transportation planners and policy makers, impacting roadway design, transportation network efficiency, traffic congestion reduction, and travel safety enhancements. Moreover, these movement patterns play an important role in predicting and influencing future land use changes. However, the extraction of human movement patterns is highly dependent on the spatial scale used to aggregate the data, leading to significant variations in results. This study thoroughly explores these effects, providing valuable insights for planners and policy makers. Urban planners can leverage this knowledge to create more sustainable and well-informed development plans. Decision-makers can use the methods proposed in this study to determine the most suitable scale size for their specific urban area, thereby optimizing policy decisions based on human movement patterns to effectively shape a more efficient and sustainable urban landscape. For example, while planning for a new commercial area, planners usually avoid the areas that already attract a lot of traffic on weekdays. A new commercial area in a high-volume traffic zone will increase the traffic congestion on weekdays. This way, identifying the human movement patterns in an urban area helps planners and decision-makers make future plans.

There are some opportunities to improve this research in the future. One of the limitations of this research is that we do not have the most recent data. This research is reproducible and can be performed on any urban area with floating car data that has passenger and temporal information attached to it. A comparative study can also be done when more recent data become available. Such a comparative study can reveal how different infrastructure changes in a city affect the overall human movement patterns of that city. Moreover, we chose a 1 km by 1 km grid size and applied different window sizes (5 by 5 to 25 by 25) to it. The results may vary if we change the underlying grid size (measurement scale). In the future, different grid sizes can be applied to investigate how the end results may change. In addition to regularly shaped grids, irregular shapes can also be used to see how the results would change. For example, in mobility studies using data that include irregularly distributed base stations (e.g., cell phone towers or Bluetooth receivers), researchers also use these base stations to create Thiessen polygons of the study area, and future studies can investigate how these irregularly shaped polygons may be impacted by the scale effect. The movement patterns identified in this research can be verified by comparing the results with other mobility studies in Nanjing, China. Overall, this research introduces a new method to identify human movement patterns in an urban area that considers the effect of different scales on movement patterns.
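To illustrate how the measurement scale mentioned above could be varied in a follow-up study, the sketch below aggregates pick-up points into square grids of different cell sizes with NumPy. It assumes that pick-up locations have already been projected to planar coordinates in kilometers measured from the southwestern corner of a square study area. The function name, the parameter names, and the 30 km extent in the demonstration are hypothetical and are not taken from this study.

```python
import numpy as np


def pickups_to_grid(x_km, y_km, extent_km, cell_km):
    """Count pick-ups per square cell; row 0 of the result is the northern edge."""
    n_cells = int(round(extent_km / cell_km))
    edges = np.linspace(0.0, extent_km, n_cells + 1)
    # histogram2d indexes its result as [x_bin, y_bin]; transpose so rows follow y,
    # then reverse the rows so that north comes first.
    counts, _, _ = np.histogram2d(x_km, y_km, bins=[edges, edges])
    return counts.T[::-1].astype(int)


# Synthetic pick-up points for demonstration only; these are not the Nanjing data.
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 30.0, size=5000)
y = rng.uniform(0.0, 30.0, size=5000)
for cell in (0.5, 1.0, 2.0):
    grid = pickups_to_grid(x, y, extent_km=30.0, cell_km=cell)
    print(cell, grid.shape, grid.sum())
```

Each resulting grid could then be passed through the same window-based indicators to compare results across measurement scales.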