Modeling human activity dynamics: an object-class oriented space–time composite model based on social media and urban infrastructure data

Zhang, Zhe; Yin, Dandong; Virrantaus, Kirsi; Ye, Xinyue; Wang, Shaowen

doi:10.1007/s43762-021-00006-x


Original Paper
Open access
Published: 06 May 2021

Volume 1, article number 7, (2021)

Computational Urban Science


Zhe Zhang ORCID: orcid.org/0000-0001-7108-182X¹,
Dandong Yin²,
Kirsi Virrantaus³,
Xinyue Ye⁴ &
…
Shaowen Wang²


Abstract

Modeling human activity dynamics is important for many application domains. However, there are problems inherent in modeling population information, since the number of people inside a given area can change dynamically over time. Here, a cyberGIS-enabled spatiotemporal population model is developed by combining Twitter data with urban infrastructure registry data to estimate human activity dynamics. This model is an object-class oriented space–time composite model, in which real-world phenomena are modeled as spatiotemporal objects, and people can move from one object to another over time. In this research, all spatiotemporal objects are aggregated into 14 spatiotemporal object classes, and all objects in a given space at different times can be projected down to a spatial plane to generate a common spatiotemporal map. A temporal weight matrix is derived from Twitter activity curves for each spatiotemporal object class and represents population dynamics for each object class at different hours of a day. Finally, model performance is evaluated by using a comparison to registered census data. This spatiotemporal human activity dynamics model was developed in a cyberGIS computing environment, which enables computational and data intensive problem solving. The results of this research can be used to support spatial decision-making in various application areas such as disaster management where population dynamics plays an important role.


1 Introduction

Human activity plays a vital role in understanding large-scale social dynamics (Nara, Tsou, Yang, & Huang, 2018; Zhang, Demšar, Rantala, & Virrantaus, 2014; Zhang, Rangsima, & Virrantaus 2010). There are several data sources available for modelling human activity and population dynamics. For instance, mobile geolocation data has been used in assessing the movement patterns of population (Bengtsson, Lu, Thorson, Garfield, & Schreeb, 2011; González, Hidalgo, & Barabási, 2008; Pedro, 2020). However, the major limitations arise due to privacy issues since mobile data is linked with users’ private information, including bank information, social network information, and home locations, which causes difficulties in obtaining mobile data for research purposes. In order to protect users’ privacy, the mobile data such as SafeGraph data (https://www.safegraph.com/) is only available at coarse spatial scales such as county level. Furthermore, mobile data has a low resolution, since mobile phone users’ locations are estimated relative to the nearest phone tower, which can be several kilometers away from a person’s actual location. Jiang, Ferreira, & González (2012) presented an analysis of individual activities based on travel surveys conducted in the Chicago metropolitan area from a representative population sample. Compared to other data sources, travel survey data has disadvantages due to the high cost, small sample size, and low update frequency. The spatial coverage of survey data is limited, since the spatial information is collected based on locations visited by participants, which may not cover the entire study area.

Digital footprints within urban environments have become increasingly accessible to researchers due to the massive amounts of geo-tagged information shared via social media platforms such as Twitter (Li, Chaudhary, & Zhang, 2020). These new data sources provide important information about population dynamics within a city, and how the population is distributed across the urban infrastructure. In this regard, analysis of spatiotemporal patterns within Twitter data shows a distinct relationship between users’ activities and urban infrastructure types (Soliman, Yin, Soltani, Padmanabhan, & Wang, 2015; Wakamiya, Lee, & Sumiya, 2011). Tweets (or blogs) can be viewed as “personal journals” that describe people’s lives to others by telling stories nearly in “real-time”. These platforms can be efficient ways to inform others where users have been, where they are, and where they are going. Zhao and Rosson (2009) used interviews to discuss various forms of social activity enacted through Twitter. The results showed that interviewees use Twitter for a variety of social purposes, such as keeping in touch with one’s friends, sharing interesting things to one’s social networks, gathering useful information for one’s professional or other personal interests, seeking help and others’ opinions, and releasing emotional stress. Nardi, Schiano, and Gumbrecht (2004) also stated, people blog to provide a record of events in their lives for keeping track of what they have been doing. Many of these social activities are strongly related to location information. People are more likely to tweet in places such as restaurants, hotels, leisure places, and sport centers, since they are doing activities in these locations. Twitter data can be considered to represent an individual’s temporal location inside a specific type of building, therefore human activity dynamics can be modeled by using the relative number of tweets at a certain location and time. 
For example, Lin and Cromley (2015) evaluated the effectiveness of Twitter data as a single source of ancillary information, combined with other control variables, in the areal interpolation of population. They found that using geo-located tweets to enhance population disaggregation could help to map the urban population under the age of 65. Statistics show that nearly 70% of the Finnish population aged 18 to 64 participates in social networks. According to Statista’s Digital Market Outlook forecast, the number of social media users in Finland was projected to exceed 3.1 million in 2018 and to increase annually thereafter. In 2019, 25% of Finns used Twitter several times a day, 18% used it every day, and 8% used the service once a month (Statista, 2020). However, 30% of the Finnish population still does not use social media, particularly the elderly and children. Therefore, human dynamics for building types such as daycare centers and elderly homes cannot be estimated from Twitter data alone, since it is difficult to measure how well Twitter users represent the people inside these places.

Urban infrastructure registry data has been used as another data source to model human dynamics (Zhang, Rangsima, & Virrantaus, 2010). The Finnish urban infrastructure registry data records how many people have registered each building as their home address, which can be used to estimate the maximum number of residents inside a residential building. Its business information section includes details such as business type, company name, and the number of employees. This study aims to explore whether social media data combined with urban infrastructure data can be used to cost-effectively assess human activity patterns across spatiotemporal scales for various built environment types. In particular, we aim to develop an object class-oriented space-time composite model to analyze human dynamics for different types of built environment. The model’s performance is evaluated by comparing the estimated population for the Helsinki Metropolitan area at six moments of time to the registered population dataset.

The rest of the article is organized as follows. Section 2 gives an overview of the foundations and methods relevant to the development of the spatio-temporal population model. The results and their associated validation methodologies are illustrated in Section 3. Sections 4 and 5 present the discussions and conclusions.

2 Methods

The research objective is to design and implement a spatio-temporal population model to analyze human dynamics for different types of buildings. This section introduces the study area, data processing techniques, and implementation methods.

2.1 Study area and data processing

The study area for this research is the Helsinki Metropolitan area, which includes the cities of Helsinki, Espoo, Kauniainen, and Vantaa (see Fig. 1). Two data sources were used in this research. SeutuCD is a high-quality cohesive municipal registry dataset that includes information about the population, city plans, buildings, companies, and their business information (Helsinki Region Environmental Services Authority, 2020). SeutuCD building registry data contains data for building location, use, and total floor area. The population and business registry datasets for this research work are not available due to data privacy concerns.

Another data source used in this research was Twitter data, collected through the Twitter API over 3 months (March–June 2017). A total of five million Tweets (~ 20 GB) were collected using the Twitter streaming API, which posed significant computationally intensive challenges for traditional GIS computing environments. The data were collected over southern Finland (covering the Helsinki area with a bounding box: 20.566° E - 32.542° E, 59.790° N - 63.656° N). Among these five million Tweets, 868,571 of them have geolocation information. Each tweet contains exact geospatial coordinates and a timestamp treated as the user’s activity at that point. A polygonal representation of the buildings in the study area was obtained via OpenStreetMap (OpenStreetMap contributors, 2020). Each point was assigned to its nearest polygon to map each geolocated tweet into the corresponding building polygon.
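The nearest-polygon assignment described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes planar (projected) coordinates, represents each building as a simple list of vertices, and uses hypothetical names such as `assign_to_nearest_building`. A tweet that falls inside a polygon is given distance zero to it.

```python
import math

def point_in_polygon(pt, poly):
    """Ray-casting test; poly is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(poly)
    for k in range(n):
        x1, y1 = poly[k]
        x2, y2 = poly[(k + 1) % n]
        # Count edges that a horizontal ray from pt crosses.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def point_segment_distance(pt, a, b):
    """Euclidean distance from pt to the segment a-b."""
    px, py = pt
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_polygon(pt, poly):
    """Zero if pt is inside poly, else the distance to the nearest edge."""
    if point_in_polygon(pt, poly):
        return 0.0
    n = len(poly)
    return min(point_segment_distance(pt, poly[k], poly[(k + 1) % n])
               for k in range(n))

def assign_to_nearest_building(pt, buildings):
    """buildings maps a building id to its polygon (hypothetical layout)."""
    return min(buildings, key=lambda b: distance_to_polygon(pt, buildings[b]))

# Toy data: two square buildings and three geolocated tweets.
buildings = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
tweets = [(5, 5), (18, 5), (12, 5)]
assigned = [assign_to_nearest_building(t, buildings) for t in tweets]
```

For a dataset of this size, a spatial index would normally replace the brute-force loop over all polygons; the sketch only shows the assignment rule.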

2.2 An object class-oriented space–time composite model

In this research, an object class-oriented space–time composite model was developed to model population dynamics. In Geographical Information Science (GIScience), real-world phenomena are represented by using spatial data models, where each real-world object is modeled as a spatial object. A spatial object refers to an object that contains a spatial domain (Longley, Goodchild, Maguire, & Rhind, 2005). Object-oriented spatiotemporal models have often been discussed in the literature (Frihida, Danielle, & Thériault, 2002; Huang & Peng, 2008; Peuquet, 1999). On a conceptual level, object-based models are sometimes called entity-based models because they focus on modeling real-world entities, where the entity type is an abstraction of a class of entities that are similar (Kjenstad, 2006). For example, a residential building is an entity with attributes like the building address or the number of people that live inside the building. A relationship is an association between entities; for example, “lives in” describes the relationship between the entities “person” and “building”. A relationship may have attributes; for example, time duration gives the length of time that a person has lived in a building. The object-oriented spatiotemporal model represents the world as a set of discrete spatiotemporal objects consisting of location, aspatial, and temporal components orthogonal in 2D space (Peuquet, 1999).

The space–time composite model (Langran & Chrisman, 1988) treats the entire system as a common unit map consisting of spatial objects (geometric shapes) in space at different times. In the space-time composite model, a base map represents a region’s geometry and spatial topology at a starting time. Each change is then recorded on the changed portion of the map by overlaying time-stamped layers (e.g., snapshots) (Langran & Chrisman, 1988). Attribute changes are recorded at discrete times, which limits the model’s ability to capture temporality among attributes across space (Yuan, 1996, 1999). A space–time composite model can be implemented using an object-oriented approach, giving an object-oriented space-time composite model. Each time-stamped layer represents a group of spatial objects (object classes) described by entity-based models, and the layers are overlaid according to time changes. For instance, a real-world phenomenon can be represented as spatiotemporal objects, such as office buildings, hotels, and residential buildings. Population dynamics can be modeled as sets of movements from one spatiotemporal object to another over time. The object-oriented approach focuses on each individual object. It can be further developed into a class-oriented space-time composite data model in which all spatiotemporal objects are aggregated into a set of object classes (Zhao, Shaw, & Wang, 2015). The object class-oriented space-time composite model conceptually describes the changes of a spatiotemporal object class over a period of time, with attribute changes recorded at discrete times. In this case, buildings can be aggregated into object classes, and each building class contains many objects or instances.

2.3 Use of object class-oriented space–time composite model for modeling human activity dynamics

Based on the building-use attribute in the SeutuCD building register data, 14 object classes are formed: residential building, hotel, office building, industrial building, university, hospital, shop, restaurant, leisure place, sport center, railway station, school, daycare center, and eldercare center. This object class-oriented model is combined with a space–time composite model, so objects in space at different times can be projected down to a spatial plane to generate a common unit map.

Figure 2 illustrates an example of this projection for three object classes. First, the maximum number of people inside each building is estimated according to its building class type. Ideally, the maximum number of people inside each object class would be estimated by using registry data, such as SeutuCD’s population and business register data. The population registry section records how many people have registered each building as their home address, which can be used to estimate the maximum number of residents inside a residential building. The business information section records each office building’s business type, company name, and corresponding number of employees. However, the SeutuCD population and business datasets were not available for this research work, since the research was conducted outside Finland. Therefore, another population estimation method was developed.

Here, the maximum number of people inside each spatiotemporal object (a building) was roughly estimated by dividing the total floor area of the building by the average floor area used by one person. Table 1 lists the estimated average number of people for the shop object class, and the average estimated floor area that one person uses for other object classes. According to the literature, the number of appointments for all health care centers and central hospitals in the Helsinki Metropolitan area was 1,184,742 for the year 2016. The average number of appointments for each health care facility can be estimated by dividing the total number of appointments by the number of health care centers and hospitals. The estimated number of people inside a healthcare facility per day was then calculated by dividing the average number of appointments for a healthcare facility by 365 days, which gave 147 (City of Helsinki, 2020). According to a real estate data provider, Helsinki offices average 23 square meters per worker (Rapal, 2020). Based on the literature, a matrix was developed to estimate the temporal weights, or percentages, of the maximum number of people inside a spatio-temporal object (building) according to its corresponding object (or building) class type (City of Helsinki, 2017; Rapal, 2020). Detailed information on the number of customers for different types of businesses was available in SeutuCD but could not be delivered outside of Finland (or published) due to privacy issues. We developed a prototype of the model by using estimations that are based on publicly available statistics (City of Helsinki, 2017; Rapal, 2020). In the future, the matrix will be updated using the SeutuCD data to produce more accurate and realistic results.
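As a small worked illustration of the floor-area rule described above, the sketch below divides a building's total floor area by the floor area used by one person. The 23 square meters per office worker comes from the text (Rapal, 2020); the function name and the 2300 square meter example building are hypothetical.

```python
def estimate_max_people(floor_area_m2, area_per_person_m2):
    """Rough capacity estimate: total floor area / floor area per person."""
    if area_per_person_m2 <= 0:
        raise ValueError("area_per_person_m2 must be positive")
    return floor_area_m2 / area_per_person_m2

# Value quoted in the text for Helsinki offices (Rapal, 2020).
OFFICE_AREA_PER_WORKER_M2 = 23.0

# Hypothetical office building with 2300 m2 of total floor area.
office_capacity = estimate_max_people(2300.0, OFFICE_AREA_PER_WORKER_M2)
```

Under these assumptions the example office holds at most 100 people; other object classes would use their own per-person floor areas from Table 1.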

Table 1 Information on the estimated average number of people inside shops and the average estimated floor areas that one person uses for some other object classes


SeutuCD building register data is represented as points, where each point is the centroid of a building polygon and carries the class information of that building. SeutuCD data was spatially joined with OpenStreetMap building polygons to attach registration class information to the polygons. Here, we used Twitter data to model population dynamics for 11 spatial object classes (all the spatial object classes except school, daycare, and eldercare centers). In schools, daycare centers, and eldercare centers, there is a weaker correspondence between Twitter users and the number of people: students in primary schools and children in daycare may not have mobile phones or may not use them during classes, and some elderly people may not use Twitter. Human dynamics for those classes were estimated based on the facilities’ opening hours. In most cases (97.8%), the SeutuCD registration data matched the building polygons perfectly, with exactly one building register data point inside each building polygon. In these cases, all social media signals inside the building were assigned the sole class type of the registration record in that building. However, as the sources of the two datasets differ, there are occasions where multiple registration points fall into the same building polygon. This usually happens for large complex buildings where multiple (typically 2–3) classes exist in different parts of the building. This situation applies to 3801 of the 171,708 buildings (2.2%) within the study area. In such cases, the building was treated as a mixed-class building and each social media activity was attached to the closest registration record inside the building, where the closest distance was calculated using Euclidean distance. After these operations, the set of tweets belonging to each building class was obtained, and the temporal activity (the number of tweets at each hour of a day) of that building class was calculated.
Finally, the temporal weights were normalized according to Eq. (1). Formally, let W be a matrix whose rows i index the object classes and whose columns j index the hours of a day (0–23), and let x_{i,j} be the number of tweets for object class i at hour j. The normalized temporal weight of each cell is then:

$$ W_{i,j}=\frac{x_{i,j}-\min_{0\le k\le 23} x_{i,k}}{\max_{0\le k\le 23} x_{i,k}-\min_{0\le k\le 23} x_{i,k}},\qquad i=1,2,\dots,11,\quad j=0,1,\dots,23 $$

(1)
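A minimal sketch of the per-class hourly counting and the min-max normalization in Eq. (1) is given below. The hourly counts are made-up toy numbers, the function names are hypothetical, and returning zeros for a completely flat curve is an assumption that the text does not address.

```python
from collections import Counter

def hourly_counts(tweet_hours):
    """Count tweets per hour of day (0-23) from a list of hour values."""
    counts = Counter(tweet_hours)
    return [counts.get(h, 0) for h in range(24)]

def min_max_normalize(row):
    """Eq. (1): scale one object class's 24 hourly counts to [0, 1]."""
    lo, hi = min(row), max(row)
    if hi == lo:
        # Assumption: a flat curve carries no temporal signal.
        return [0.0] * len(row)
    return [(x - lo) / (hi - lo) for x in row]

# Toy hourly tweet counts for one object class (not data from the study).
office_counts = [2, 1, 1, 1, 1, 0, 3, 8, 15, 18, 19, 20,
                 18, 19, 17, 15, 12, 9, 8, 10, 6, 4, 3, 2]
office_weights = min_max_normalize(office_counts)
```

Each object class is normalized independently, so every class reaches a weight of 1 at its own busiest hour regardless of how many tweets it received in total.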

The results are plotted in Fig. 3. The x axis represents time of day (starting from 12 a.m. and ending at 11 p.m.), and the y axis represents the normalized number of tweets. Additionally, as many object classes have significantly different patterns on weekdays and weekends, temporal weights were further decomposed into the weekday and weekend. However, as this article is focused on population dynamics analysis of weekdays, results for the weekend are not presented, but can be implemented by using a similar methodology in the future. According to Fig. 3, object classes were grouped into four groups according to common characteristics, such as building use or opening hours. In each group, the number of tweets in different hours of the day was plotted for each spatial object class. The results indicated that all the Twitter activity curves within each group have a similar pattern. The number of tweets is lowest at 5 a.m. since most people are sleeping during that time, and it starts to increase rapidly after that. Therefore, for some object classes such as residential buildings and hotels, the Twitter activity pattern during a typical sleeping period (e.g. 12 a.m. to 5 a.m.) does not represent the actual population inside the building, so some adjustments need to be made for those object classes. For the office building object class, the second highest peak appears at 7 p.m., which is considered unrealistic for estimating the population since 7 p.m. is not a common office hour (Passport to Trade, 2020). The high value is caused by a mixed class building type or short period of data collection. In Finland, shops, apartments, and offices are often located in the same building. The high number of tweets could come from people who are at home or shopping in the same building, or on nearby streets. The tweets used in this study were collected over a short period of 3 months, which is another source of fluctuation and inaccuracy in the model.

Due to the above-mentioned problems, the Twitter activity curves were further processed and analyzed to provide a more realistic estimation of temporal weights. When the data sample size is large, the Twitter activity curves usually remain smooth (Soliman, Yin, Soltani, Padmanabhan, & Wang, 2015). Smooth individual activity curves were also developed from survey data by Jiang, Ferreira, and González (2012). The curves in Fig. 3 fluctuate, so they were smoothed by using a moving average function (Chou, 1975; Obe & Hsu, 2015; MathWorks, 2020):

$$ {y}_s(i)=\frac{1}{2N+1}\left(y\left(i+N\right)+y\left(i+N-1\right)+\dots +y\left(i-N\right)\right) $$

(2)

where y_s(i) is the smoothed value for the ith data point, and N is the number of neighboring data points on either side of the ith point. Here N = 2, so the span 2N + 1 is 5.
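The moving average in Eq. (2) can be sketched as below with N = 2. The text does not say how the first and last hours of the day are handled, so this sketch assumes a circular 24-hour day in which hour 23 neighbors hour 0; the function name is hypothetical.

```python
def moving_average(y, n_neighbors=2):
    """Eq. (2): mean of each point and its N neighbors on either side.

    Assumption: the series is circular (hour 23 is adjacent to hour 0).
    """
    span = 2 * n_neighbors + 1
    m = len(y)
    smoothed = []
    for i in range(m):
        window = [y[(i + k) % m] for k in range(-n_neighbors, n_neighbors + 1)]
        smoothed.append(sum(window) / span)
    return smoothed

# Toy curve: a single spike of 5 tweets at hour 12.
spike = [0.0] * 24
spike[12] = 5.0
smoothed_spike = moving_average(spike)
```

With a span of 5, the spike is spread evenly over hours 10 to 14 while the total number of tweets is preserved. A non-circular variant that shrinks the window at the ends of the series would be an equally plausible reading.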

In the next step, background noise was removed for the following object classes: shops, office buildings, leisure places, and industrial buildings. Background noise refers to the tweets that appear during the hours when most people are sleeping and facilities are closed. It was removed by subtracting the background noise from the Twitter data of the corresponding hour. According to Jiang, Ferreira, & González (2012), a “U” shaped curve represents the population dynamics of a residential building, with the lowest value appearing around 11 a.m., when most people are at work. Based on the work of Jiang, Ferreira, & González (2012), the lowest value of the Twitter activity curve was adjusted to 11 a.m. for the residential building object class and to noon for the hotel object class. The values (number of tweets) during the hours earlier than 11 a.m. (12 a.m. to 11 a.m.) were mirrored from the values of each hour after 11 a.m. (12 p.m. to 11 p.m.). The same idea was also applied to office buildings in order to lower the temporal weight at 7 p.m. so that the highest value appears at 1 p.m. The results were normalized and plotted in Fig. 4. According to Fig. 4, the shapes of the curves match the results of Jiang, Ferreira, & González (2012) for the following object classes: residential buildings, office buildings, shops, sport centers and leisure places.
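One possible reading of the mirroring step is sketched below: each hour before the pivot (11 a.m. for residential buildings) takes the value of the hour equally far after the pivot. The text does not give the exact rule, so the reflection formula, the function name, and the choice to leave an hour unchanged when its mirror image falls outside 0 to 23 are all assumptions.

```python
def mirror_before_pivot(curve, pivot=11):
    """Replace hours before `pivot` with the values mirrored around it.

    Assumption: hour h < pivot takes the value at hour 2 * pivot - h.
    Hours whose mirror image falls outside 0-23 are left unchanged.
    """
    if len(curve) != 24:
        raise ValueError("curve must contain 24 hourly values")
    adjusted = list(curve)
    for h in range(pivot):
        mirrored_hour = 2 * pivot - h
        if mirrored_hour < 24:
            adjusted[h] = curve[mirrored_hour]
    return adjusted

# Toy curve whose value equals the hour, to make the mapping visible.
toy_curve = [float(h) for h in range(24)]
residential_curve = mirror_before_pivot(toy_curve, pivot=11)
```

In this toy case midnight takes the value of 10 p.m. and 10 a.m. takes the value of noon, which produces a curve that is symmetric around 11 a.m., in the spirit of the “U” shape described above.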

Finally, a spatiotemporal database for implementing the spatio-temporal population model was developed. Figure 5 illustrates the general framework of the spatio-temporal database. The first element is called “Building”, and it contains attribute information on a building’s geospatial location, the purpose of the building (useOfBuilding), and the total floor area (floorArea) of the building. The element “Maximum People” represents the maximum number of people inside a building at any given time. Here, the value of this element was roughly estimated by dividing the total floor area of the building by the average floor area that one person uses. In the future, estimation accuracy can be improved by using SeutuCD population and business data, if available, to estimate the maximum number of people inside residential buildings, eldercare centers, and industrial and office buildings. Furthermore, statistical research on the total registered population and total number of customers can also be used to improve estimation accuracy for the following object classes: universities, hospitals, shops, restaurants, leisure places, sport centers, railway stations, and daycare centers. The dynamic temporal column “Number of People” represents the number of people inside a building in a specific hour. It was calculated by multiplying the temporal weight derived from the Twitter data analysis results (see Fig. 4) by the “Maximum People” value. In Fig. 2, this corresponds to the temporal columns T1 to T24, and the values are illustrated as sets of temporal map layers. For instance, the number of people inside all the buildings of an object class X can be calculated by using the query:

    UPDATE building SET "Number of People" = "Maximum People" * weight WHERE useOfBuilding = X

Values in the “Number of People” column are updated after this query. The element “Input Area” is used to represent the area of interest. For instance, the total number of people inside an area at a specific time can be calculated using the query:

    SELECT SUM("Number of People") FROM building, "Input Area" WHERE ST_Contains("Input Area".geom, building.geom)

In this way, a spatiotemporal query can be articulated around a combination of temporal, spatial, and aspatial components, where all the components change dynamically. Here, temporal and spatial components refer to the time and location of an event, and the aspatial component represents human activity dynamics. The database should be based on a generic data model that is independent of specific software and hardware and has an open architecture, to maximize the efficiency of information exchange between emergency responders. Moreover, this spatiotemporal human activity dynamics model was implemented in a cyberGIS computing environment to process the large number of tweets (Wang, 2013).
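The two queries above can be tried out with the following sketch, which uses Python's built-in `sqlite3` module as a stand-in for the spatial database. SQLite has no `ST_Contains`, so the containment test is done in Python on building centroids with a ray-casting function. The table layout, coordinates, and weights are hypothetical toy values, not data from the study.

```python
import sqlite3

def point_in_polygon(x, y, polygon):
    """Ray-casting containment test for a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE building ('
    'id INTEGER PRIMARY KEY, useOfBuilding TEXT, x REAL, y REAL, '
    '"Maximum People" REAL, "Number of People" REAL DEFAULT 0)'
)
conn.executemany(
    'INSERT INTO building (useOfBuilding, x, y, "Maximum People") '
    'VALUES (?, ?, ?, ?)',
    [("office", 1.0, 1.0, 100.0),
     ("office", 9.0, 9.0, 40.0),
     ("hotel", 2.0, 2.0, 60.0)],
)

# Hypothetical temporal weights for one hour of the day.
weights_at_noon = {"office": 0.75, "hotel": 0.5}
for building_class, weight in weights_at_noon.items():
    conn.execute(
        'UPDATE building SET "Number of People" = "Maximum People" * ? '
        'WHERE useOfBuilding = ?',
        (weight, building_class),
    )

# Input area as a polygon; sum the people in buildings that fall inside it.
input_area = [(0, 0), (5, 0), (5, 5), (0, 5)]
rows = conn.execute('SELECT x, y, "Number of People" FROM building').fetchall()
total_in_area = sum(n for x, y, n in rows if point_in_polygon(x, y, input_area))
```

In this toy case the office at (1, 1) and the hotel at (2, 2) lie inside the input area, while the office at (9, 9) does not. In a spatially enabled database, the containment test would instead run inside the query, as in the second statement above.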

3 Results

This section presents the results of the study and the model performance validation. The results of the population dynamics analysis for two object classes, apartment buildings and office buildings, are illustrated in Fig. 6 as an example. In Fig. 6, maps A and B illustrate the population distribution in the Helsinki Metropolitan area office buildings at noon and 8 p.m., and maps C and D illustrate the population distribution among apartment buildings at noon and 8 p.m. Map A is compared with map B, and map C is compared with map D, in order to see population movement patterns at two moments of time. Maps A and C use the natural breaks classification method. For maps B and D, the results are categorized into the same number of categories as maps A and C, with the same interval values, in order to make the comparison more apparent. In map B, most office buildings have low population densities (less than 133 people per building) outside of common working hours. In map A, the population density in office buildings is 1 or 2 categories higher than in map B, which indicates that more people are in offices at noon. According to maps C and D, the population density is low at noon for most apartment buildings since people are at work. At 8 p.m., the population density increases for most of the apartment buildings except in some areas, such as Munkkivuori, Huopalahti, Kallio, and Lauttasaari. The population density in those areas remains low and unchanged relative to map C, because the number of residents of the buildings in those areas is less than the maximum value of the low population density category.

In order to validate these assumptions, two new maps were created (Fig. 7) to illustrate the population difference of office and apartment buildings between noon and 8 p.m. For an apartment building, the difference was calculated by subtracting the number of people inside the building at noon from the corresponding value at 8 p.m. Figure 7, map D shows the buildings that have fewer than 17 residents. One can see that the areas that have a low population density difference in map C also have a similar pattern in map D. On the other hand, all the values in map C are positive, which indicates that the number of residents is higher at 8 p.m. than at noon. The same idea was also applied to the office building class, and the results are illustrated in Fig. 7, maps A and B. The office buildings that show no change in population density are the buildings that have fewer than 133 employees.

Here, two model performance evaluation methods have been developed. The first one illustrates how the human dynamics within the Helsinki Metropolitan area change across six moments of time according to the population model analysis results. In an ideal situation, the total population should not vary much, especially over a short period of time. Some changes are expected (e.g. due to commuting) but, compared to the total population size, the changes should still be small. Six moments of time were observed: weekdays at 12 a.m., 4 a.m., 8 a.m., noon, 4 p.m., and 8 p.m. The estimated total population sizes are plotted in Fig. 8. The blue bars represent the population variation during weekdays over these six moments in time. The total population registered in the Helsinki Metropolitan area was 1,172,739 in the year 2019 (Wikipedia, 2020). The error rate was calculated by subtracting the registered total population from the estimated population size. The results indicate that the variation of the results and the error rate of the population estimation are relatively small at 12 a.m., 4 a.m., 8 a.m. and noon. The variation and error rate start to increase after 4 p.m. The highest error rate for the estimated population is about 38%. The total estimated population fluctuated around a mean of 1,304,752, which is higher than the registered total population in the Helsinki Metropolitan area. This is likely because the latter number counts only the official residents of the city. However, the observed fluctuations during a 24-h cycle could be attributed to the trends of using Twitter during the course of the day, as well as the difference in usage between weekdays and the weekend.
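To make the error-rate calculation concrete, the sketch below expresses the difference between estimated and registered population as a fraction of the registered population. The text only states that the registered total is subtracted from the estimate, so the division by the registered population is an assumption, made here because the reported maximum is a percentage. The two population figures are the ones quoted above.

```python
REGISTERED_POPULATION = 1_172_739      # Helsinki Metropolitan area, 2019
MEAN_ESTIMATED_POPULATION = 1_304_752  # mean of the six model estimates

def relative_error(estimated, registered):
    """(estimated - registered) / registered; the division is an assumption."""
    return (estimated - registered) / registered

mean_error = relative_error(MEAN_ESTIMATED_POPULATION, REGISTERED_POPULATION)
```

Under this assumption, the mean estimate overshoots the registered population by roughly 11%, and the reported maximum error rate of about 38% would correspond to an estimate of about 1.62 million people.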

Since the research goal is to estimate human activity dynamics at a spatiotemporal scale, an example observed area was created over central Helsinki and modeled as a polygon in the database to represent the simulated area. The total number of people in all 14 building types inside the polygon was estimated by using the proposed model, and the results are given in Table 2. Using this method, it is possible to observe how the population inside the study area changes over time. According to Table 2, the number of people inside the Helsinki center (an example of an observed area) reaches its lowest value at 4 a.m., when the population corresponds to the number of residents in that area. The number increases significantly and reaches its highest value at 4 p.m., since people travel from other regions to the Helsinki center for work or shopping. It decreases at 8 p.m., since people who live in other regions then start to travel back to their homes.

Table 2 Number of people inside the polygon at six moments in time

4 Discussion

The major advantages of object-oriented modeling are that it permits an intuitive and direct representation of the complex real world and captures the in-depth semantics of the application domain. However, an object-oriented modeling approach does not record the relationships between different temporal objects. This drawback could explain why the results showed higher variation and error rates at 4 p.m. and 8 p.m. than at other moments in time: all temporal weights were modeled and normalized within each object class, with no linked association between classes. For instance, most of the object classes received a temporal weight higher than 0.79 at 4 p.m., which causes over-estimation of the population.
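The effect of normalizing within each class can be seen in a small sketch. The article does not spell out the normalization formula, so dividing each class's hourly activity by that class's own peak is an assumption made here for illustration; the class names and tweet counts are hypothetical.

```python
from typing import Dict


def normalize_within_class(hourly_counts: Dict[int, float]) -> Dict[int, float]:
    """Scale one class's hourly activity so that its own peak equals 1.0.

    Assumption: peak normalization; the article only says weights are
    normalized within each object class.
    """
    peak = max(hourly_counts.values())
    if peak <= 0:
        raise ValueError("activity curve must contain a positive value")
    return {hour: count / peak for hour, count in hourly_counts.items()}


# Hypothetical tweet counts per class at the six observed moments (hour of day).
activity = {
    "office": {0: 5, 4: 2, 8: 60, 12: 100, 16: 90, 20: 20},
    "restaurant": {0: 10, 4: 1, 8: 5, 12: 30, 16: 32, 20: 40},
}

weights = {cls: normalize_within_class(counts) for cls, counts in activity.items()}

# Each class is scaled independently, so both can be near their peak at 16:00
# even though nothing ties the people in one class to those in the other.
high_at_16 = [cls for cls, w in weights.items() if w[16] > 0.79]
```

Because every class is scaled against its own maximum, several classes can carry a high weight at the same hour, and nothing in the weights forces the implied totals across classes to stay constant. A normalization that couples the classes would be one way to address this, although the article does not propose a specific scheme.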

In addition, the maximum number of people inside each spatiotemporal object class is estimated by dividing the total floor area of a building by the average floor area that one person uses. For most of the spatiotemporal object classes, the assumed floor area per person is not based on the literature or statistical research. This could be improved in future research by using statistical and population registry data. For instance, SeutuCD contains data on how many people have registered a building as their home address and on the number of rooms in a building. The SeutuCD business registry dataset contains the location of each business, its type, and the corresponding number of employees and customers. In the future, these data would give a more accurate estimate of the maximum number of people inside residential, industrial, and office buildings. For the university object class, the maximum number of people inside a university can be estimated from university registry data, for example the number of students and employees. The main objective of this article is to propose the methodology rather than to produce accurate estimation results. This spatiotemporal population model can therefore be considered a knowledge-based model, in which users can update the “maximum people” and “temporal weight” elements by using more detailed statistical and literature surveys. We suggest that human activity dynamics for object classes such as daycares and schools cannot be estimated using Twitter data, since most children do not use Twitter. In this article, we combined urban infrastructure register data with information about the facilities’ opening hours to improve the estimation for those building types. In the future, a detailed survey of those facilities’ opening hours should also be carried out to improve the accuracy of the model.
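The capacity rule described above can be written as a short sketch. The division of floor area by floor area per person follows the text; multiplying that capacity by the class's temporal weight to obtain an hourly estimate is an assumption about how the “maximum people” and “temporal weight” elements combine. The function names and the example figures (12,000 m², 20 m² per person, weight 0.8) are hypothetical.

```python
def max_people(total_floor_area_m2: float, floor_area_per_person_m2: float) -> float:
    """Capacity of a building: total floor area divided by area used per person."""
    if floor_area_per_person_m2 <= 0:
        raise ValueError("floor area per person must be positive")
    return total_floor_area_m2 / floor_area_per_person_m2


def estimated_people(
    total_floor_area_m2: float,
    floor_area_per_person_m2: float,
    temporal_weight: float,
) -> float:
    """Hourly estimate, assuming it is capacity scaled by a weight in [0, 1]."""
    if not 0.0 <= temporal_weight <= 1.0:
        raise ValueError("temporal weight must lie between 0 and 1")
    return max_people(total_floor_area_m2, floor_area_per_person_m2) * temporal_weight


# Hypothetical office building: 12,000 m2 of floor area, 20 m2 per person.
capacity = max_people(12_000, 20)
afternoon = estimated_people(12_000, 20, temporal_weight=0.8)
```

Keeping the two inputs separate reflects the knowledge-based character of the model: a better floor-area figure or a registry-based head count changes only `max_people`, while a revised activity curve changes only the weight.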

In our previous work, an object-oriented spatiotemporal population model was developed using only the urban infrastructure register data, SeutuCD (Zhang, Rangsima, and Virrantaus 2010). The major drawback of that work lay in estimating the temporal weight matrix, where the temporal weights describing changes in human activity for each building class were based on arbitrary assumptions. In this research, the temporal weight matrix was derived from Twitter activity curves for most object classes, which improved the modeling accuracy compared to our previous work. For some object classes, such as residential buildings and hotels, the Twitter curves cannot be used to represent the actual human activity dynamics during the early morning. Therefore, the Twitter activity curves were modified and compared with the work of Jiang, Ferreira, & González (2012). In addition, this research uses Twitter data collected over a short period of 3 months, and such limited data causes the Twitter activity curves to fluctuate. In future research, more Twitter data should be used to make the results more accurate. Moreover, more object classes, such as transportation networks, forests and churches, should be included to make the model more realistic. Human activity dynamics on weekends and holidays, as well as differences among individual weekdays such as Monday, Tuesday, and Wednesday, should also be considered.

In Finland, every municipality is responsible for collecting its building register data. This spatiotemporal human activity dynamics model can therefore be implemented at a national scale to support decision-making in emergency management.

5 Conclusions

This article describes an innovative way to model population dynamics, which can be used to give an approximate estimate of the number of people inside a certain area at a certain time. The model is object-oriented: real-world phenomena are modeled as spatiotemporal objects, and people move from one object to another over time. This model was then developed into an object class-oriented model, in which all spatiotemporal objects were aggregated into 14 spatiotemporal object classes. The object class-oriented model was combined with a space–time composite model, in which objects in space at different times can be projected down to a spatial plane to generate a common spatiotemporal map. A temporal weight matrix was derived from Twitter activity variation curves and represents population dynamics for the object classes at different hours of the day.

As an example, population dynamics analyses for two spatiotemporal object classes (apartment buildings and office buildings) at noon and 8 p.m. were illustrated. The model was validated using registered population data. Small variations and errors were observed at 12 a.m., 4 a.m., 8 a.m. and noon. The estimated average number of residents in the Helsinki Metropolitan area is 1,304,752, which is slightly higher than the registered population of 1,125,136.

A large part of the data processing, computation, and visualization for this research has been integrated on a CyberGIS-Jupyter platform, which leverages Jupyter notebooks, Docker containers, cloud-based infrastructure provisioning, and high-performance computing to enable interactive and reproducible scientific geospatial computation based on cyberGIS (Yin, Liu, Padmanabhan, Terstriep, Rush, & Wang 2017; Zhang, Hu, Yin, Kashem, Li, Cai, Perkins & Wang 2018). Consequently, the CyberGIS-Jupyter platform enables the computation and visualization process to be easily validated, reproduced, and extended by authorized collaborators and scientists.

Availability of data and materials

The data will be available in the future.

References

Bengtsson, L., Lu, X., Thorson, A., Garfield, R., & Schreeb, J. (2011). Improved response to disasters and outbreaks by tracking population movements with mobile phone network data: A post-earthquake geospatial study in Haiti. PLoS Medicine, 8(8), e1001083.
Chou, Y.-L. (1975). Statistical analysis. United Kingdom: Holt, Rinehart and Winston.
City of Helsinki (2020). Fact about Helsinki. Helsinki, Finland. Available at: https://www.hel.fi/hel2/tietokeskus/julkaisut/pdf/17_06_08_tasku17_en_net.pdf. (cited 10.18.2020).
Frihida, A., Danielle, J. M., & Thériault, M. (2002). Spatio-temporal object-oriented data model for disaggregate travel behavior. Transactions in GIS, 6(3), 277–294.
González, M. C., Hidalgo, C. A., & Barabási, A. L. (2008). Understanding individual human mobility patterns. Nature, 453, 779–782.
Helsinki Region Environmental Services Authority. (2020). SeutuCD, Available at: https://www.hsy.fi/en/experts/regionaldata/geographicinformation/Pages/SeutuCD.aspx (cited 10.22.2020).
Huang, R., & Peng, Z. R. (2008). A spatiotemporal data model for dynamic transit networks. International Journal of Geographical Information Science, 22(5), 527–545.
Jiang, S., Ferreira, J., & González, M. C. (2012). Clustering daily patterns of human activities in the city. Data Mining and Knowledge Discovery, 25, 478–510.
Kjenstad, K. (2006). On the integration of object-based models and field-based models in GIS. International Journal of Geographical Information Science, 20(5), 491–509.
Langran, G., & Chrisman, N. R. (1988). A framework for temporal geographic information. Cartographica, 25, 1–14.
Li, D., Chaudhary, H., & Zhang, Z. (2020). Modeling spatiotemporal pattern of depressive symptoms caused by COVID-19 using social media data mining. International Journal of Environmental Research and Public Health, 17(14), 4988.
Lin, J., & Cromley, R. G. (2015). Evaluating geo-located twitter data as a control layer for areal interpolation of population. Applied Geography, 58, 41–47.
Longley, P. A., Goodchild, M. F., Maguire, D. J., & Rhind, D. W. (2005). Geographic information systems and science. New York: Wiley.
MathWorks, (2020). Filtering and smoothing data. Available at: https://www.mathworks.com/help/curvefit/smoothing-data.html (cited 10.10.2020).
Nara, A., Tsou, M. H., Yang, J. A., & Huang, C. C. (2018). The opportunities and challenges with social media and big data for research in human dynamics. In Human dynamics research in smart and connected communities (pp. 223–234). Cham: Springer.
Nardi, B. A., Schiano, D. J., & Gumbrecht, M. (2004). Blogging as social activity, or, would you let 900 million people read your diary? In Proceeding of CSCW (pp. 222–231).
Obe, R. O., & Hsu, L. S. (2015). PostGIS in action. Manning Publications.
OnTheWorldMap, 2020. Map of Finland. Available from web: http://ontheworldmap.com/finland/ (cited 12.29.2020).
OpenStreetMap contributors. (2020). Planet dump [Data file from June 4th. 2017 dumps], 2020, Available at: https://planet.openstreetmap.org (cited 10.24.2020).
Passport to Trade (2020). Work life balance. Available at: http://businessculture.org/northern-europe/finland/work-life-balance-2/. (cited 10.10.2020).
Pedro, BLAR. (2020). How do people perceive new ways of sustainable mobility?: the case of electric scooters: San Francisco vs Lisbon. PhD dissertation.
Peuquet, D. J. (1999). Time in GIS and geographical databases. Geographical Information Systems, 1, 91–103.
Rapal OY (2020) Kauppalehti uutiset, Available at: https://www.kauppalehti.fi/uutiset/toimitiloissa-huimaa-tuhlausta/SGyuSUDB ( cited 10.18.2020, in Finnish).
Soliman, A., Yin, J., Soltani, K., Padmanabhan, A., & Wang, S. (2015). Where Chicagoans tweet the most: Semantic analysis of preferential return locations of twitter users. In Proceedings of the 1st international ACM SIGSPATIAL workshop on smart cities and urban analytics (pp. 55–58). Seattle: ACM.
Statista (2020) Twitter usage purposes in Finland. Available from web: https://www.statista.com/statistics/700531/twitter-usage-purposes-in-finland/ (cited 10.24.2020)
Wakamiya, S., Lee, R., & Sumiya, K. (2011). Urban area characterization based on semantics of crowd activities in twitter. In International Conference on GeoSpatial Semantics (pp. 108–123). Berlin: Springer.
Wang, S. (2013). CyberGIS: Blueprint for integrated and scalable geospatial software ecosystems. International Journal of Geographical Information Science, 27(11), 2119–2121.
Wikipedia, Greater Helsinki, (2020). Available at: https://en.wikipedia.org/wiki/Greater_Helsinki (cited 10.26.2020).
Yin, D., Liu, Y., Padmanabhan, A., Terstriep, J., Rush, J., & Wang, S. (2017). A CyberGIS-Jupyter framework for geospatial analytics at scale. In Proceedings of the practice and experience in advanced research computing 2017 on sustainability, success and impact (pp. 1–8).
Yuan, M. (1996). Temporal GIS and spatio-temporal modelling. In Proceedings of the international conference/workshop integrating GIS and environmental modelling, USA.
Yuan, M. (1999). Use of a three-domain representation to enhance GIS support for complex spatiotemporal queries. Transactions in GIS, 3(2), 137–159.
Zhang, Z., Demšar, U., Rantala, J., & Virrantaus, K. (2014). A fuzzy multiple-attribute decision making modelling for vulnerability analysis on the basis of population information for disaster management. International Journal of Geographical Information Science, 28(9), 1922–1939.
Zhang, Z., Hu, H., Yin, D., Kashem, S., Li, R., Cai, H., Perkins, D., & Wang, S. (2018). A cyberGIS-enabled multi-criteria spatial decision support system: A case study on flood emergency management. International Journal of Digital Earth, 12(11), 1364–1381.
Zhang, Z., Rangsima, S., and Virrantaus, K. (2010). A spatio-temporal population model for alarming, situational picture and warning system. Guilbert E., Lees B., Leung Y., eds., In: Proceeding joint international conference on theory, data handling and modeling in geospatial information science, the international archives of the photogrammetry, remote sensing and spatial information sciences, 38 (2), 69–74.
Zhao, D., & Rosson, M. B. (2009). How and why people twitter: The role that micro-blogging plays in informal communication at work. In Proceedings of the ACM 2009 international conference on supporting group work (pp. 243–253). New York, U.S.A.
Zhao, Z., Shaw, S. L., & Wang, D. (2015). A space-time raster GIS data model for spatiotemporal analysis of vegetation responses to a freeze event. Transactions in GIS, 19(1), 151–168.

Acknowledgments

OpenStreetMap contributors. (2015) Planet dump [Data file from June 4th. 2017 dumps]. Retrieved from https://planet.openstreetmap.org. We would like to thank Dr. Aiman Soliman for providing helpful discussions during the project.

The authors would like to acknowledge the Helsinki Region Environmental Services Authority (HSY) of Finland for providing SeutuCD data.

Code availability

The code will be available in the future.

Funding

This paper and associated materials are based in part upon work supported by the National Science Foundation under grant numbers: 1047916 and 1443080. Any opinions, findings, and conclusions or recommendations expressed in the paper and these materials are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Author information

Authors and Affiliations

Department of Geography, Texas A&M University, 797 Lamar St, College Station, TX, 77843, USA
Zhe Zhang
Department of Geography and Geographic Information Science, University of Illinois at Urbana-Champaign, Urbana, USA
Dandong Yin & Shaowen Wang
Department of Built Environment, Aalto University, Espoo, Finland
Kirsi Virrantaus
Department of Landscape Architecture & Urban Planning, Texas A&M University, College Station, USA
Xinyue Ye

Contributions

Conceptualization, Zhe Zhang, Shaowen Wang; Methodology, Zhe Zhang; Software, Zhe Zhang, Dandong Yin; Validation, Zhe Zhang; Formal Analysis, Zhe Zhang; Investigation, Zhe Zhang and Shaowen Wang; Resources, Shaowen Wang, Kirsi Virrantaus; Data Curation, Zhe Zhang, Kirsi Virrantaus; Writing-Original Draft Preparation, Zhe Zhang, Dandong Yin; Writing-Review & Editing, Shaowen Wang, Kirsi Virrantaus; Visualization, Zhe Zhang, Dandong Yin, and Xinyue Ye; Supervision, Shaowen Wang and Zhe Zhang; Funding Acquisition, Shaowen Wang. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Shaowen Wang.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zhang, Z., Yin, D., Virrantaus, K. et al. Modeling human activity dynamics: an object-class oriented space–time composite model based on social media and urban infrastructure data. Comput.Urban Sci. 1, 7 (2021). https://doi.org/10.1007/s43762-021-00006-x

Received: 29 October 2020
Accepted: 24 February 2021
Published: 06 May 2021
DOI: https://doi.org/10.1007/s43762-021-00006-x


1 Introduction