1 Introduction

Social science data typically consist of meanings, motives, definitions, and typifications (Scott, 2000). The main types of social science data include attribute data and relational data (Scott, 1988). Attribute data describe the properties, qualities, or features that characterize individuals or groups, while relational data are the contacts, ties, and connections among individuals or groups. Network analysis is especially well suited to relational data, where the relations can be treated as linkages among agents. Borrowed from textile metaphors, the term ‘network’ entered the social sciences in the 1930s to describe the interweaving relations through which social actions, agents, and groups are organized. From the 1970s, the key formal concepts of social network analysis emerged in the social sciences and gave rise to new techniques capable of handling relational data. The geographical or spatial dimension, another critical perspective on relations among social agents, was largely ignored until the 2000s. Since then, a growing number of researchers have taken up spatial social network analysis, and the corresponding literature has expanded. Spatial social networks are typically treated as spatial transformations of social networks into maps (Tsou & Yang, 2016). In this context, the use of geospatial technologies in social network analysis has received growing attention. In particular, analyzing spatial social networks can reveal the spatial–temporal dynamics of information and link people’s online communications with real-world events (Yang et al., 2016; Ye & Liu, 2018).

As the use of spatial social network analysis has proliferated in the social sciences, its potential for explicating social phenomena has become increasingly realized, with implications for corresponding action strategies and policy solutions. For example, methodologies that integrate spatial and social network data have been applied in infectious disease and chronic disease epidemiology (Emch et al., 2012; Sun et al., 2018). In the domain of urban research in particular, these methodologies have been used in the study of urban agglomerations, such as for transportation planning and air pollution mitigation (Song et al., 2018; Zhang et al., 2020; Zhu et al., 2021). However, as researchers increasingly consider spatial social network analysis, it is necessary to portray its research trends comprehensively (Ye & Andris, 2021). A nuanced understanding of these trends can serve several purposes, such as: (1) facilitating the sharing of research achievements in the field; (2) identifying emerging research directions; and (3) encouraging the continued development of research methods (Donthu et al., 2021). Apart from a handful of studies that reflect on broader trends in this area, no general overview of spatial social network analysis exists. This paper addresses this persistent gap by conducting a review of network analysis from the perspective of spatial social science.

To portray research trends in the use of spatial social network analysis, a bibliometric analysis was conducted. Bibliometric analysis is a statistical approach to analyze relevant publications and understand the research trends in a particular domain (Garfield, 1970; Pritchard, 1969). In the current study, the purpose of this type of analysis is to identify the trends of publications and collaborations, as well as geographical and institutional distributions of scholarly outputs (Li et al., 2017). Furthermore, bibliometric network analysis, such as co-word analysis (Ding et al., 2001), co-citation analysis (He & Hui, 2002), co-authorship analysis (Glänzel & Schubert, 2004), and co-publication analysis (Schmoch & Schubert, 2008), was conducted in the current study to examine the relationships among authors, keywords, institutes, and countries.

This paper examines the research trends of publications on spatial social network analysis from 2000 to 2019. The aims of the study are to: (1) evaluate research performance by country, institute, journal, subject category, and keyword; and (2) identify state-of-the-art techniques and future research directions in spatial social network analysis.

2 Methodology

2.1 Data collection

The dataset was derived from the Science Citation Index Expanded (SCI-Expanded) and Social Sciences Citation Index (SSCI) databases of the Web of Science, covering the period from 2000 to 2019. The search query TS (Topic) = (“social network*” OR “social media*”) AND (“spatial” OR “geography”) was used to search all archived documents for relevant publications. The selected publications contain those keywords, or close variants of them (matched by the * wildcard), in their titles, abstracts, or keywords. Information on titles, abstracts, keywords, authors, institutions, and cited references was downloaded. The bibliographic search returned 2,721 publications. After records without complete authorship and publication year were deleted, 2,676 publications remained.
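
The record-cleaning step can be illustrated with a minimal sketch. It is not the procedure actually used in this study: the `Record` class, its field names, and the three sample records are hypothetical, and the Web of Science export format is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical, simplified bibliographic record."""
    title: str
    authors: List[str] = field(default_factory=list)
    year: Optional[int] = None


def has_complete_metadata(record: Record) -> bool:
    """Keep a record only if it lists at least one author and a publication year."""
    return bool(record.authors) and record.year is not None


def clean(records: List[Record]) -> List[Record]:
    """Drop records with missing authorship or publication year."""
    return [r for r in records if has_complete_metadata(r)]


# Made-up sample records, for illustration only.
sample = [
    Record("A", ["Author One"], 2005),
    Record("B", [], 2012),            # no authors: dropped
    Record("C", ["Author Two"]),      # no year: dropped
]
cleaned = clean(sample)
print(len(sample), "->", len(cleaned))
```

Applied to a full export, a filter of this kind would reduce the record count in the same way that the search result was reduced from 2,721 to 2,676 publications.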

2.2 Analysis tools

Bibliometric analysis was conducted to assess the trends of spatial social network analysis research in the scientific literature. In this study, we used the R package “Bibliometrix” and the VOSviewer (Aria & Cuccurullo, 2017; Van Eck & Waltman, 2009). The R package “Bibliometrix” provides open-source tools for bibliometrics and scientometrics. The VOSviewer is a free toolbox for building bibliometric visualizations and analyzing publication trends. Natural language processing methods are built into the VOSviewer package and can be used to generate term co-occurrence networks, network layouts, and network clusters. The VOSviewer software uses a labelled circle to denote an element, where the circle size indicates the element's relative importance and elements of the same color belong to the same cluster.

3 Results and discussion

3.1 Characteristics of publications

The 2,676 publications include 2,442 articles, nine book chapters, 82 proceedings papers, 27 editorial materials, and other document types. Annual publications increased from 15 in 2000 to 410 in 2019, demonstrating accelerating growth of spatial social network research over the past 20 years. The average annual growth rate of publications in the field was 19.75%. Figure 1 shows that the yearly growth of publications has accelerated noticeably since 2010.
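
One common way to summarize such growth is the compound annual growth rate, sketched below. The text does not state which definition of the average annual growth rate was used, so this is an assumption rather than a reproduction of the reported figure; the function name is illustrative.

```python
def compound_annual_growth_rate(first: float, last: float, n_intervals: int) -> float:
    """Constant yearly rate that turns `first` into `last` over `n_intervals` years."""
    if first <= 0 or n_intervals <= 0:
        raise ValueError("first and n_intervals must be positive")
    return (last / first) ** (1 / n_intervals) - 1


# Endpoints taken from the text: 15 publications in 2000, 410 in 2019.
rate = compound_annual_growth_rate(15, 410, 2019 - 2000)
print(f"{rate:.2%}")
```

With these endpoints the compound rate comes out at roughly 19%, close to but not identical to the reported 19.75%. The difference is consistent with a different definition having been used, for example an average of year-on-year rates, which cannot be checked without the yearly counts.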

Fig. 1 Growth of publication outputs (Horizontal axis: year; Vertical axis: number of publications)

The average number of authors per publication increased from 1.933 in 2000 to 3.829 in 2019 (Table 1), which indicates that collaboration has steadily increased. The average number of cited references was generally stable over the 20 years, remaining within the range of 47 to 67. Of note, the average number of citations per article reached its maximum of 191.412 in 2002. Three publications from 2002 had been cited more than 500 times by the end of 2019. Two of these publications were about models for analyzing social networks, specifically latent space approaches (Hoff et al., 2002) and agent-based models (Macy & Willer, 2002), while the third summarized how social network analysis can be applied in the information sciences (Otte & Rousseau, 2002).

Table 1 Scientific outputs descriptors during 2000–2019

3.2 Subject categories and major journals

Spatial social network research has covered a wide variety of themes and many different disciplines. Based on the Web of Science classification, the most popular categories, in order, are “Geography” (589 publications, 21.4% of the sample documents), “Environmental Studies” (321 publications, 11.7%), “Computer Science Information Systems” (264 publications, 9.6%), and “Economics” (217 publications, 7.9%). Because a publication can be assigned to more than one category, combinations of categories were also examined, and 456 unique combinations were identified. The top 20 combinations of subject categories are listed in Table 2. The results show that spatial social network studies are relevant to a wide range of disciplines, while the related research outcomes are mostly rooted in the “Geography”, “Multidisciplinary Sciences”, “Geography, Physical; Remote Sensing”, and “Economics; Geography” categories. The most cited publication in the “Geography” group discussed the concept and categories of embeddedness in detail (Hess, 2004); in the “Multidisciplinary Sciences” group, a study about the rule for cooperative interactions on social networks drew the most citations (Ohtsuki et al., 2006); the most highly cited publication in the “Geography, Physical; Remote Sensing” group used geotagged social media data to monitor visitor use of a national park in Finland and compared the performance with traditional visitor surveys (Heikinheimo et al., 2017); and Giuliani (2007) applied social network analysis to examine the knowledge network structure of wine clusters in Italy and Chile, which received the most citations among publications integrating economics and geography.

Table 2 Distribution of the subject category combinations: the top 20

The top 20 most active journals are summarized in Table 3. In terms of the number of publications, PLOS ONE was the most prolific journal, followed by the International Journal of Geo-Information and Sustainability. All three are open access journals. In terms of the average number of citations per article, the Journal of Economic Geography, Annals of the Association of American Geographers, and Urban Studies were the three most highly cited journals, with averages of 101.800, 40.389, and 39.923, respectively.

Table 3 The most active journals

3.3 Geographical and institutional distribution of publications

The spatial and institutional distributions of publications were analyzed in terms of authors’ affiliation information. The ten most productive countries are shown in Fig. 2, based on the number of publications, articles by country, and international collaborations. Among these 10 countries, six were located in Europe, two in North America, one in Oceania, and one in Asia.

Fig. 2 Most productive countries during 2000–2019 (TP: total publications; IP: the number of independent publications by single-country; CP: the number of internationally collaborative publications)

The most productive country was the United States with 1,015 total articles. The United Kingdom ranked second with 442 articles, followed by China with 433. Figure 2 also reveals that some countries had a higher rate of international collaborations than others. The countries with the highest rates of international collaborations were France (collaboration rate: 68.75%), Italy (collaboration rate: 61.76%), Canada (collaboration rate: 58.21%), the Netherlands (collaboration rate: 58.12%), and Germany (collaboration rate: 57.67%). Almost all were non-native English-speaking countries.
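
The collaboration rates quoted above can be read as the share of a country's publications that are internationally co-authored. The sketch below assumes the rate is CP divided by TP, which the definitions in Fig. 2 imply but the text does not state explicitly; the counts in the example are made up and do not correspond to any country in the dataset.

```python
def collaboration_rate(total_publications: int, collaborative_publications: int) -> float:
    """Share of publications that are internationally collaborative (CP / TP)."""
    if total_publications <= 0:
        raise ValueError("total_publications must be positive")
    if not 0 <= collaborative_publications <= total_publications:
        raise ValueError("collaborative_publications must lie between 0 and TP")
    return collaborative_publications / total_publications


# Hypothetical country with TP = 200 and CP = 90.
example_rate = collaboration_rate(200, 90)
print(f"{example_rate:.2%}")
```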

Co-authorship analysis was used to examine the network of the countries that produced the most research outcomes in the field, as plotted in Fig. 3. The size of a node reflects the number of co-authored publications from that country, while the thickness of an edge represents the strength of collaboration between the two countries it connects. There are two main clusters of collaboration: European countries (the red cluster), and Asian and North American countries (the green cluster). The largest numbers of co-authored papers were produced by the United States, the United Kingdom, and China. The strongest collaboration was between the United States and China, followed by the United States and England.

Fig. 3 Co-authorship network among productive countries

3.4 Institution collaboration network

The collaboration network of the 80 most productive institutions was visualized using the VOSviewer (Fig. 4). The most productive institution was Wuhan University with 50 papers, followed by the University of Oxford with 43 papers and Harvard University with 40 papers (see Table 4 for the top 15 most productive institutions). Each node in Fig. 4 indicates an institution of higher education. The distance between two institutions in the visualization roughly represents their relatedness in terms of co-authorship: the closer two institutions are positioned to each other, the stronger their relatedness. The strength of co-authorship links between institutions is also shown by the thickness of the edges. The institutions are clustered into five groups of different colors (Fig. 4). Most UK institutions fall in the green group, while the yellow-green groups consist mainly of North American institutions. Most institutions in China fall in the purple group. Institutions within the same continent are more likely to collaborate with one another than with institutions on other continents, which means that the geography of institutions matters for collaboration. Nine of the 10 most highly cited publications that involved institutional collaboration were produced by institutions from the same continent. Researchers from European institutions co-authored five of these publications (Bastug et al., 2014; Giuliani & Bell, 2005; Gordon & McCann, 2000; Otte & Rousseau, 2002; Perc & Szolnoki, 2010), while collaborations between U.S. institutions contributed four (Eagle et al., 2009; Hoff et al., 2002; Macy & Willer, 2002; Sorenson & Stuart, 2001).

Fig. 4 Institution collaboration network of the 80 most productive institutions

Table 4 Top 15 institutions based on the total number of publications

3.5 Keywords analysis

3.5.1 Keywords network analysis

The keywords of publications give a general profile of the article contents. The co-occurrence relationships among the top 70 high-frequency keywords were explored, and the co-word networks were visualized with the VOSviewer software (Fig. 5). The nodes are high-frequency keywords: the more frequently a keyword was used over the last 20 years, the larger its node. The distance between two keywords in the visualization roughly shows how closely they are related in terms of co-occurrence: the closer two keywords are positioned to each other, the stronger their relatedness. The strength of co-occurrence links between keywords is also shown by the thickness of the edges. As shown in Fig. 5, the 70 most frequently used keywords are grouped into three clusters. The red cluster is mainly about social network analysis, the blue cluster is mainly about the spatial and geographical dimensions, and the green cluster is mainly about social media.
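
The counts that underlie a co-word network of this kind can be sketched in a few lines. This is an illustration of the general idea only: it does not reproduce the VOSviewer's normalization, layout, or clustering, and the keyword lists are made up.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def cooccurrence(keyword_lists: Iterable[List[str]]) -> Tuple[Counter, Counter]:
    """Return (keyword frequencies, pairwise co-occurrence counts).

    Each inner list holds the keywords of one publication. Keywords are
    lowercased and de-duplicated per publication before counting.
    """
    frequencies: Counter = Counter()
    links: Counter = Counter()
    for keywords in keyword_lists:
        unique = sorted({k.lower() for k in keywords})
        frequencies.update(unique)                  # node size
        links.update(combinations(unique, 2))       # edge thickness
    return frequencies, links


# Made-up keyword lists for three hypothetical publications.
papers = [
    ["social media", "Twitter", "GIS"],
    ["social media", "Twitter"],
    ["social networks", "GIS"],
]
frequencies, links = cooccurrence(papers)
print(frequencies.most_common(3))
print(links.most_common(3))
```

Here the frequency of a keyword plays the role of the node size, and the number of publications in which two keywords appear together plays the role of the edge thickness.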

Fig. 5 Co-occurrence network of the top 70 high-frequency keywords

The keywords with the highest frequencies were “social networks”, “social media”, and “social network analysis”, because they matched the topics we used to collect publications. “Twitter”, “big data”, “networks”, “spatial analysis”, and “social capital” were each used more than 50 times by authors, indicating five research hotspots in the spatial social network field. The most highly cited publications for each of these five keywords were as follows. For “Twitter” research, Takhteyev et al. (2012) investigated different factors in the formation of social ties on Twitter and identified the frequency of airline flights between the two nodes as a vital predictor. For “big data” research, Crampton et al. (2013) analyzed both the potential and the shortcomings of big social media data and discussed the impacts of big data analytics on human geography. For “networks” research, Macy and Willer (2002) argued that some social network patterns need to be understood using a bottom-up dynamical model, and they introduced agent-based modeling approaches for sociological research. For “spatial analysis” research, a study of shootings in Chicago and Boston indicated that both geography and social networks can influence gang violence (Papachristos et al., 2013). For “social capital” research, Carpiano (2006) incorporated social capital theory into a framework of neighborhood social processes in order to investigate how community factors influence health and well-being. Other popular keywords in the last two decades included “geography”, “network analysis”, “China”, “mobility”, “human mobility”, “migration”, “gender”, “innovation”, “place”, “proximity”, “machine learning”, “GIS”, “Internet”, “location-based social networks”, “algorithms”, “cooperation”, “clustering”, “data mining”, “identity”, “location-based services”, “network”, and “performance”.

3.5.2 Temporal evolution of keywords

Examining the temporal evolution of these keywords provides insight into the trends of research hotspots. We divided the 20-year period into three consecutive periods (2000–2009, 2010–2014, and 2015–2019). For the 30 most frequently used keywords mentioned earlier, we list their frequencies and ranks during the corresponding periods in Columns (2), (3), and (4) of Table 6.

Table 6 Temporal evolution of the 30 most frequently used keywords

If the rank of a keyword keeps moving upward across the three consecutive periods, we consider the keyword to be on a rising trend. In contrast, a keyword is on a declining trend if its rank keeps moving downward across the three periods. We found that 12 keywords (“social media,” “Twitter,” “big data,” “geography,” “China,” “human mobility,” “machine learning,” “GIS,” “location-based social networks,” “clustering,” “data mining,” and “location-based services”) became increasingly popular in publications during the past 20 years. The keywords “social media,” “Twitter,” and “big data” refer to the data sources of spatial social network analysis. These keywords did not appear in articles in the 2000–2009 period, but they became the first, fourth, and sixth most popular keywords in the period of 2015–2019. This dramatic increase coincided with the growing popularity of social media and accessibility of social media data. Due to the growing availability of high-speed Internet access and the development of Web 2.0 technology, many social media applications, such as Twitter, Facebook, and YouTube, were created between 2000 and 2010 (Kaplan & Haenlein, 2010). Twitter provides short messaging updates for faster dissemination and application programming interfaces (APIs) for easy data access. As such, Twitter became a valuable and popular tool for researchers to collect a large volume of data quickly and at little cost (Huberman et al., 2009; Kwak et al., 2010). The spatial component of spatial social networks is reflected in the keywords “geography,” “human mobility,” “GIS,” “location-based social networks,” and “location-based services.” Social media often convey not only “what” is happening but also “where” it is happening, via both user locations and the locations of events (Crooks et al., 2013; Reynard & Shirgaokar, 2019).
After some successful demonstrations of the potential of location-based social networks in the early 2010s (Cheng et al., 2011; Cranshaw et al., 2012; Long et al., 2012), many studies started to use the spatial component of social networks to investigate human mobility patterns. As a result, those “spatial” keywords attracted more attention in the 2010–2014 and 2015–2019 periods. “Machine learning,” “clustering,” and “data mining” represent the popular methods for spatial social network analysis. Social network big data are generated and collected in very high volumes and at very high speed, which makes them nearly impossible to read manually and analyze qualitatively. Therefore, more and more researchers have started to use data mining (including clustering) and machine learning techniques to discover hidden patterns in such large datasets automatically (Blondel et al., 2015; Jiang et al., 2015; Lansley & Longley, 2016), and even in real time (Gu et al., 2016). The rising trend of “China” indicates that China has been selected as the study area for spatial social network research more frequently in recent years. In contrast, the keywords “gender”, “cooperation”, and “network” received declining attention across the three time periods. “Gender” and “cooperation” were among the top 10 keywords during 2000–2009. The declining ranks of these keywords could simply reflect a change of research interests in the field. The change of terminology from the general “network” to the more specific “social network” or “location-based social network” in later periods could be another reason for the declining trend of the keyword “network.”
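
The rising and declining trend rule used in this subsection can be expressed as a short sketch. Two points are assumptions rather than statements from the text: rank 1 denotes the most frequent keyword, and a keyword absent from a period is treated as ranking below every keyword that is present. The ranks in the examples are made up.

```python
import math
from typing import Optional, Sequence


def classify_trend(ranks: Sequence[Optional[int]]) -> str:
    """Classify a keyword from its rank in consecutive periods.

    Rank 1 is the most frequent keyword. None means the keyword did not
    appear in that period and is treated as worse than any rank (assumption).
    """
    if len(ranks) < 2:
        raise ValueError("at least two periods are required")
    values = [math.inf if r is None else r for r in ranks]
    steps = list(zip(values, values[1:]))
    if all(later < earlier for earlier, later in steps):
        return "rising"        # rank improves in every period
    if all(later > earlier for earlier, later in steps):
        return "declining"     # rank worsens in every period
    return "no consistent trend"


# Hypothetical ranks for the periods 2000-2009, 2010-2014, and 2015-2019.
print(classify_trend([None, 5, 1]))   # absent at first, then climbing
print(classify_trend([4, 9, 20]))
print(classify_trend([3, 1, 2]))
```

Under this reading, a keyword that was absent in 2000–2009 and then climbed the ranking counts as rising, which matches how “social media,” “Twitter,” and “big data” are described above.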

4 Conclusions

Existing studies on research methods have only recently begun to pay attention to spatial social network data, although relational data are a longstanding focus in social science. In the past two decades, the annual number of publications on spatial social networks increased from 15 in 2000 to 410 in 2019, with an average annual growth rate of 19.75%. Further, the annual growth of publications has accelerated greatly since 2010. The three most productive journals on spatial social network analysis were PLOS ONE, International Journal of Geo-Information, and Sustainability. In terms of the average number of citations per document, Journal of Economic Geography, Annals of the Association of American Geographers, and Urban Studies were the three most highly cited journals, with average citations per article of 101.800, 40.389, and 39.923, respectively.

The United States was the most productive country, contributing the most single-country and international collaborative articles. The United Kingdom published the second highest number of articles, followed by China. Among the 10 most productive countries, six are in Europe, two are in North America, one is in Oceania, and one is in Asia.

The collaboration network of the top 15 most productive institutes suggests that Wuhan University, the University of Oxford, and Harvard University were the most productive institutions. Among the 15 institutions, more than half are in the United States. Further, institutions from the same continent collaborate more intensively with one another than with institutions on other continents.

A keywords analysis through temporal evolution and co-occurrence network demonstrated that “Twitter,” “big data,” “networks,” “spatial analysis,” and “social capital” were the long-time keyword hotspots over the past 20 years. Some keywords, such as “social media,” “Twitter,” “big data,” “geography,” “China,” “human mobility,” “machine learning,” “GIS,” “location-based social networks,” “clustering,” “data mining,” and “location-based services,” attracted increasing attention over time.

On the basis of the temporal evolution of keywords, we can conclude that spatial social network analysis research has been enhanced by the rapid development of accessible data sources and big data techniques. This enhancement has facilitated emerging research directions, such as monitoring human dynamics, building advanced models to solve network-relevant problems, and applying the social network analysis (SNA) approach to public health.

1) Monitoring human dynamics. The geotagged user-generated information obtained from social media platforms (e.g., Twitter, Facebook, and Instagram) greatly facilitates the monitoring of human mobility and migration (Hu et al., 2021a). Facebook developed the Data for Good Movement Range Maps to inform scholars and public health practitioners about how people respond to physical distancing measures (Meta, 2022). Huang et al. (2020) created a mobility-based responsive index derived from geotagged Twitter data to monitor human mobility. The team further developed an online platform to share the mobility index with the public (Li et al., 2021). Accessible human mobility datasets have dramatically improved emergency preparedness and response. For example, Bonaccorsi et al. (2020) performed a massive analysis of near-real-time mobility data provided by Facebook and investigated how lockdown strategies affect the economy. Huang et al. (2021) analyzed multi-source human mobility data and highlighted the disparities in mobility dynamics across counties of various income levels in the US during the COVID-19 pandemic.

2) Advanced modeling in spatial social networks. With the rapid advances of Artificial Intelligence (AI), many studies use deep learning approaches, such as GraphSage (Ahmed et al., 2017), GAAN (Zhang et al., 2018), and DeepMGGE (Fu et al., 2020), to solve graph-related problems. In social networks, spatial variation is a key factor, as are temporal-evolution characteristics. Thus, recent studies have considered both spatial and temporal features when modeling social networks. For instance, Min et al. (2021) used a temporal attention mechanism to identify the dynamic features of social networks and proposed the Spatial–Temporal Graph Social Network, a graph neural network framework. The method outperformed state-of-the-art machine learning algorithms.

3) Applications in public health. Spatial social network analysis (SSNA) has been widely applied in geography (Wang et al., 2022a, b), economics (Bu et al., 2020; Reid et al., 2008), computer science (Fu et al., 2020; Min et al., 2021), information science (Hu & Zhang, 2021; Ye & Andris, 2021), and urban planning (Ye & Liu, 2018). The recent COVID-19 pandemic has accelerated the application of SSNA in public health as well, especially in information distribution (Ye et al., 2021), public sentiments and opinions (Gong & Ye, 2021; Hu et al., 2021b; Yang et al., 2022), and disease modeling (Yum, 2020; Albery et al., 2021). For example, Ye et al. (2021) incorporated information heterogeneity into non-parametric inference of the hidden interaction network to understand both infodemic and epidemic spreading in the COVID-19 pandemic. Block et al. (2020) adopted a social network approach to assess the effectiveness of social distancing strategies in the COVID-19 pandemic.