1 Introduction

In the last two decades, the car parking industry has been reshaped by the increasing number and diversity of industry sectors and professionals involved in the car parking management process. For instance, with the development of smart city concepts (Moreno et al. 2016; Kai et al. 2018), Internet-of-things (IoT) (Piccialli et al. 2021) based sensors and digital solutions (Ji et al. 2014; Sheelarani et al. 2016) are used to identify and track lot availability; mobile computing, social media and crowdsourcing are used to interact with drivers and to guide them toward their chosen parking locations; big data, data management and artificial intelligence (AI) are employed to offer extra services from either city or third-party providers; and parking systems increasingly interact with emerging self-driving cars and with connected and electric car related services (Ferreira et al. 2013), among others. These emerging trends in the car-parking industry drive a continuous updating of the technology platform and its digital/software components (Ji et al. 2014; Piccialli et al. 2021; Zheng et al. 2015), as well as of drivers’ behavior and daily practices. At the economic level, this has led to the development of various dynamic pricing models (Kotb et al. 2016), parking activity monitoring (TJB et al. 2017), free parking spot prediction (Awan et al. 2020), driver activity prediction (Birek et al. 2018), and parking behavior analysis (Spiliopoulou and Antoniou 2012; Wang et al. 2018; Zong et al. 2019). At the urban planning and policy level, and in the light of minimum and multimodal parking policies, the design of such parking systems becomes a highly challenging and complex task that involves multidisciplinary teams of architects, urban planners, computer engineers, sociologists and health/safety specialists (Eswaran et al. 2013; Merriman 2016; Willson 2015), among others; see the survey papers (Kirschner and Lanzendorf 2020; Lin et al. 2017; Polak and Axhausen 1990; Revathi and Dhulipala 2012; Siuhi and Mwakalonge 2016) for detailed argumentation. Indeed, smart parking has become a crucial component of the smart city ecosystem (Al-Turjman and Malekloo 2019) due to the interconnection between economic, social and environmental factors. This includes, for instance, demand-based pricing models, consumer demand for real-time parking information, enabled through industrial internet of things approaches (Boyes et al. 2018), and the high demand for more environmentally friendly solutions. As highlighted by Lin (2015), a smart parking concept involves three major modules: information collection, system deployment, and service dissemination. This either directly or indirectly involves disciplines from physics, sociology (English 1988), information communication technology (ICT), transport, environment and urban planning (Christiansen et al. 2017), security (Chaudhary et al. 2017; Ferreira et al. 2014), and arts, among others. These multiple services and disciplines demonstrate the broad scope of today’s car-parking industry ecosystem. Building on this literature review, Table 1 presents a conceptual model of the car-parking industry ecosystem, highlighting the various disciplines and industries implicated in it.

Table 1 Various parking sectors and industries are involved in the parking concept

In the era of the web 2.0 framework, and acknowledging the rich information conveyed by social media platforms in all aspects of our daily life, this work explores the interconnection among car parking industry actors as revealed by the analysis of the corresponding hashtag communities. In particular, this work builds on the acknowledged development of hashtag-based online communities in many industry sectors and, thereby, hypothesizes that the strength of the interconnection between car-parking industry sectors can be captured by the strength of the social ties of the corresponding hashtags. In essence, hashtags, represented by the keywords that follow the # symbol in a post, are ultimately used to generate increased participation and a “buzz” around the underlying topic (Swani et al. 2014). The success of hashtags in attracting users’ interest has led many organizations to adopt this technology as part of their marketing strategy, which provides a unique opportunity to bring together public/private organizations and citizens on the highlighted topic. This motivates the approach advocated in this paper. We restrict the scope of hashtag analysis to those inferred from the Twitter platform for ease of access and availability. For this purpose, Twitter data are collected using three leading car-parking hashtags: #parking, #parkinggarage, and #parkingspot. The data are then processed using standard natural language processing (NLP) techniques followed by social network analysis in order to extract interpretable communities that help to unfold the interaction among industry actors, citizens’ involvement and potential factors that govern users’ car-parking decisions and preferences. The research outcomes are expected to provide insights and answers to the following two research questions:

\({\mathrm{RQ}}_{1}\): How are the different industry actors shaped within hashtag communities? To answer \({\mathrm{RQ}}_{1}\), various communities are formed by their common interest, and relevant indicators are used to assess the relevancy of these communities.

\({\mathrm{RQ}}_{2}\): How do different industry actors shape the user’s car parking behavior? To answer \({\mathrm{RQ}}_{2}\), various measures, such as sentiment polarity, engagement of the trending hashtags in each community and importance of bots, are quantified and analyzed.

In essence, the main contributions of the paper are threefold. First, hashtag information, inferred from the tweet content gathered from three leading car-parking hashtags, is used to construct a social graph; community analysis of the resulting hashtag networks, using a combination of community detection algorithms (Girvan–Newman, Clique, K-plex and K-core) and a set of rational explainability principles, yields useful communities spanning multiple industry sectors. Second, the interconnection among the various industry sectors is quantified in each online community, by taking into account an initially developed conceptual model of the car-parking industry ecosystem. Third, a social bots analysis is performed for the various communities in order to identify the proportion of human and bot accounts in each community, and thereby to comprehend the type of bots involved in the parking industry and their social impact. The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 presents the data gathering process, including data collection, cleaning and processing, then describes the methodology followed in this work and introduces the techniques and algorithms used in the social network analysis. Section 4 presents the results obtained by the proposed method. Next, Sect. 5 is devoted to discussion. Finally, conclusions are drawn in Sect. 6.

2 Related works

In this section, we review the related work linked to the two research questions \({\mathrm{RQ}}_{1}\) and \({\mathrm{RQ}}_{2}\). This is detailed in the two subsections below.

2.1 Hashtags and industry sectors

The list of industry sectors seeking support from social media as part of their marketing and development strategy is almost endless. Indeed, with the increased digitalization of society, city services and media, hashtag-based analysis is seen as a way to promote the “brand” of the corresponding industry sector(s) and to interact with consumers, stakeholders, suppliers and competitors. From this perspective, social media and hashtags have revolutionized the way many industries and market segments share, communicate, modify, create, and discuss product content and services (Kietzmann et al. 2011). In 2011, nearly 83% of Fortune 500 companies were engaged in some form of social media related activity with either their customers or consumers (Naylor et al. 2012), whereas nowadays there is no company in the Fortune 500 list without a social media presence. Furthermore, since the Millennial generation, for whom social media accounts are inseparable from daily activities, effective use of social media, e.g., through hashtags, can generate positive word-of-mouth advertising and create viral effects in the market domain (Hanna et al. 2011). The preceding would suggest that hashtag-based reasoning can be universally beneficial for all corporations and industry actors, so that social media is seen as the fastest way to grow any company in the 21st century (Edosomwan et al. 2011). Other scholars have qualified this observation with other contextual aspects. This holds, for instance, where the customer population is very limited or the product faces no competition in the market, in which case long-standing digital communication tools like emails or news displays are found to be more effective (Jarvinen et al. 2012). Similarly, a recent survey of business actors has shown that, for many executives, the benefits of using social media do not outweigh the costs and other risks (Michaelsen and Collini 2022), prompting a prudent attitude towards social media consumption, especially with the spread of fake news and bots. A key characteristic of hashtags is their excellent ability to sustain the emergence of virtual communities: the amount of support that can be gained by the introduction of a single hashtag leads to new social formations that bring about profound behavioral and societal change (Rheingold 2002). This has motivated companies to seek efficient ways to build and maintain their virtual communities so as to benefit their business activities and portfolios. Communities also enable companies to have a continuous dialogue with their customers and stakeholders, which helps in identifying gaps and good practices. In terms of key industry sectors that directly utilize hashtags, the VTT report (Ahlqvist et al. 2018) pointed out the business analytics industry, advertisement, marketing, security, media, IT and artificial intelligence as the top industry sectors where hashtags are actively researched. A distinctive marketing strategy consists in capitalizing on celebrity effects and social media influencers to reach a wider audience and gain support. Indeed, companies can generate cultural capital through three main development phases: integrative use of external digital cultures, co-construction of web-mediated cultural value, and, ultimately, the attainment of online celebrity (Rindova et al. 2007).

2.2 Industry sectors and car parking behavior

Regarding \({\mathrm{RQ}}_{2}\), we note that the car parking industry has been strongly impacted by national sustainable development plans, technical advances in car connectivity, city infrastructures, transportation policy and the general state of the ecosystem in the vicinity of the user’s trip (EURO 2020). From this perspective, several industry sectors can, either directly or indirectly, have a strong influence on the user’s car parking behavior. This includes (1) Wireless Communication, which can influence the ability of the user to get real-time information; (2) Business activity in the vicinity of the parking area; (3) IT, which connects the user to the various city services they are searching for; (4) Construction, where any ongoing construction or road maintenance work can generate increased frustration; (5) Transportation, especially in the case of multi-modal transportation, where the user uses the parking to switch to another transportation mode; (6) Management and Planning, where poor management of parking sites and lack of parking supply negatively impact the driver’s behavior. This also includes financial management, where ticketing and pricing of lots can constrain the user’s parking time and location; (7) Tourism, where the management of tourist attractions requires special attention to ensure the user’s full satisfaction; (8) Meteorology, where bad weather forecasts can create increased road traffic that impacts parking supply; (9) Health, where the availability of parking supply near health centres can impact the user’s decisions. Furthermore, it should be mentioned that when using an intelligent car parking solution, several other factors and business sectors are simultaneously triggered, which directly impact the user’s behavior (Moreno et al. 2016). In the literature, multiple works have addressed the concept of car parking behavior using various modalities and methods. For instance, Feng et al. (2019) investigated parking behavior in the Province of Zhejiang in China by analyzing parking data from a shopping mall. The behavioral analysis consists in comprehending the user’s attitude towards parking according to weather conditions, location of lots and pricing effects. Ying and Sun (2020) explored parking behavior by collecting survey data in Wuhu city to assess the quality of parking management in the city and the factors influencing parking. Van der Waerden et al. (2015) also investigated parking behavior by examining the role of travel and trip information in shaping the driver’s parking decision. The study’s data were based on questionnaires in both Belgium and the Netherlands. Zhang et al. (2021) suggested a predictive model for parking behavior that takes into account historical parking data, environmental factors, and weather conditions, along with information about key events, using data from nearby business entities. Diaz Ogás et al. (2020) surveyed the smart parking systems literature, focusing on key algorithms and methods. In another survey, Barriga et al. (2019) investigated the technologies, sensors, technical architectures, and software involved in smart parking solutions. Based on our review of the relevant literature, a conceptual model of the activities that influence parking behavior is drawn in Fig. 1.

Fig. 1 Diagram of the sectors influencing the parking behavior

3 Methodology

3.1 Dataset description

The dataset employed in this study was gathered using the Twitter API (see also Oussalah et al. 2013). The Python library GetOldTweets3, which makes it possible to retrieve old tweets, was used to collect tweets containing three leading hashtags: #parking, #parkinglot, and #parkinggarage. The choice of these hashtags is motivated by their relevance to the current study and their relatively high level of activity as reported on the hashtagsforlikes web page. We collected the main tweet attributes (e.g., tweet message, mentions, unique identifier (ID), timestamp of the publication date, list of followers, retweet count, favourite count). In order to provide accurate data analysis and extract useful information from the collected data, a proper processing step is necessary. For this purpose, the tweets are first tokenized and stopwords are removed, after a careful manual inspection of the default English stopword list. Next, all unnecessary characters, such as punctuation, hypertext markup language (HTML) tags, HTML attributes, and character codes, are filtered out, with the exception of important characters like “@” and “#”, which enable us to discriminate retweets, hashtags and mentions. This processing is applied to each hashtag dataset separately. A summary of the dataset is provided in Table 2.

Table 2 Dataset summary

3.2 Overall method

Figure 2 summarizes the adopted methodology.

Fig. 2 Diagram of overall methodology

As already pointed out in the introduction, we mainly focus on hashtag community analysis to tackle research questions \({\mathrm{RQ}}_{1}\) and \({\mathrm{RQ}}_{2}\), assuming that both the surface-level content of hashtags and their associated tweets’ contents convey relevant information about the industry sectors involved in each community. For this purpose, first, a social network is constructed by exploiting the interconnection among the various hashtags in the collected dataset, together with a thresholding mechanism to ensure that useful networks are generated. Second, a novel approach is put forward for community detection from the social network perspective, which uses a combination of standard graph mining methods (K-core, K-plex, Clique, Girvan–Newman) and a set of interpretability principles that exploit general trend analysis of the outcomes through a semi-automated approach. Third, the identified communities are analyzed in terms of industry-sector interconnection, proportion of bots, hashtag surface-level sentiment polarity and level of engagement. The latter can also provide insights into the factors affecting the user’s car parking behavior and decisions.

3.3 Social network community analysis

Prior to social network community analysis, we note the importance of global trends, which include, among others, the top influencer accounts (in terms of number of tweets generated, for instance). In particular, we can scrutinize other content of an identified user (using the Twitter timeline API) to elucidate the scope of their activity at large, which also helps in detecting any spamming activity.

3.3.1 Construction of hashtag network

Given two hashtags, say, \(H_{1}\) and \(H_{2}\), then a link is established between these two hashtags if and only if

$$\begin{aligned} |\left\{ {T:H_{1}\in T,H_{2}\in T }\right\} |\ge \xi \ \end{aligned}$$
(1)

The preceding indicates the existence of a sufficient number of tweets T, at least a given threshold \(\xi\), that contain both hashtags \(H_{1}\) and \(H_{2}\). The choice of the threshold \(\xi\) is guided by the nature of the generated social graph, so as to ensure the existence of a useful giant component in the induced graph that can lead to useful (sub)communities. From this perspective, a semi-empirical choice of \(\xi =3\) was found appropriate in our tests and experiments: a smaller value of \(\xi\) would connect almost all nodes of the network, while a larger value results in a sharp decrease in the size of the giant component. Assuming that each hashtag conveys a specific subject of discussion, the resulting network is expected to reveal the wide interests of car-parking community users, the inherent links among the various discussion topics, including social, economic and personal factors, as well as the range of industry actors involved and their interaction map. Accordingly, an incremental set of distinct values of the threshold \(\xi\) was tested in terms of the size of the generated giant component and the size of the communities, which motivated the semi-empirical choice mentioned above.
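As an illustration of Eq. (1), the following minimal sketch builds the thresholded hashtag co-occurrence network from raw tweet texts. The function name `build_hashtag_graph`, the regular expression used to extract hashtags, the lowercasing step and the toy tweets are assumptions made for this example and are not taken from the authors’ implementation.

```python
from collections import Counter
from itertools import combinations
import re

# Assumed hashtag pattern: '#' followed by word characters (hyphens are not matched).
HASHTAG_RE = re.compile(r"#(\w+)")

def build_hashtag_graph(tweets, xi=3):
    """Return {(h1, h2): count} for hashtag pairs that co-occur in at least xi tweets."""
    pair_counts = Counter()
    for text in tweets:
        # A set ensures a hashtag repeated inside one tweet is counted once.
        tags = sorted({t.lower() for t in HASHTAG_RE.findall(text)})
        for h1, h2 in combinations(tags, 2):
            pair_counts[(h1, h2)] += 1
    # Keep only the links that satisfy the threshold of Eq. (1).
    return {pair: n for pair, n in pair_counts.items() if n >= xi}

# Toy data, purely illustrative.
tweets = [
    "#parking near the mall #EV",
    "new charger #parking #ev",
    "#Parking #ev #smartcity",
    "#parking #smartcity",
]
edges = build_hashtag_graph(tweets, xi=3)
print(edges)
```

Lowering `xi` in this toy example adds the weaker `parking`/`smartcity` link, which mirrors the trade-off discussed above between connectivity and the size of the giant component.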

3.3.2 Community detection

A set of useful graph metrics and community detection algorithms have been considered in our methodology. These include the Clique, K-plex, K-core and iterative Girvan–Newman algorithms (Mohanty et al. 2015). Besides, overall network metrics such as average path length, average degree centrality, clustering coefficient and diameter were used to characterize an individual graph or a subgraph such as a community. From the implementation perspective, we have used the NetworkX implementation of these algorithms.
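To make two of these notions concrete, the sketch below gives a dependency-free illustration of the k-core (the maximal subgraph in which every node has at least k neighbours) and of the k-plex condition (every node of a set S is adjacent to at least |S| − k other nodes of S, so that a 1-plex is a clique). It is not the NetworkX code used in the study; the adjacency-dictionary representation, the function names and the toy graph are assumptions for this example.

```python
def k_core(adj, k):
    """Return the node set of the k-core by repeatedly peeling low-degree nodes."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            # Degree is counted only among nodes that are still alive.
            if len(adj[node] & alive) < k:
                alive.discard(node)
                changed = True
    return alive

def is_k_plex(adj, nodes, k):
    """True if each node in `nodes` is adjacent to at least len(nodes) - k others in it."""
    nodes = set(nodes)
    return all(len(adj[v] & nodes) >= len(nodes) - k for v in nodes)

# Toy undirected hashtag graph (adjacency sets), purely illustrative.
adj = {
    "parking": {"ev", "iot", "smartcity", "garage"},
    "ev": {"parking", "iot", "smartcity"},
    "iot": {"parking", "ev"},
    "smartcity": {"parking", "ev"},
    "garage": {"parking"},
}

core2 = k_core(adj, 2)                      # drops the pendant node "garage"
triangle_is_clique = is_k_plex(adj, {"parking", "ev", "iot"}, 1)
four_is_2plex = is_k_plex(adj, core2, 2)    # "iot" and "smartcity" miss one link each
print(sorted(core2), triangle_is_clique, four_is_2plex)
```

Relaxing the clique condition to a 2-plex admits the four-node group even though `iot` and `smartcity` are not directly linked, which is why k-plexes tend to return larger and more tolerant communities than cliques.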

3.3.3 Hashtag engagement

Hashtag engagement is a measure we use to evaluate user interaction with a particular hashtag. Specifically, a tweet containing a hashtag h is said to have received an engagement if it has been either retweeted or favourited. The total engagement of a given hashtag h is then measured by the proportion of tweets that have received an engagement over the total number of tweets that contain h. More formally, borrowing the notation used in DeMasi et al. (2016), the engagement E(h) of a hashtag h is defined as:

$$\begin{aligned} E(h) = \frac{\sum \nolimits _{\tau \in T(h)} I_{[\tau \, {\text{has}} \, {\text{Retweet}} \, {\text{or}} \, {\text{Favourites}}]}}{|T(h)|} \end{aligned}$$
(2)

where T(h) is the set of tweets containing the hashtag h, and \(I_{[\cdot ]}\) is an indicator function that returns 1 if the tweet \(\tau\) has been either retweeted or favourited, and 0 otherwise.
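The engagement of Eq. (2) can be computed in a single pass over the tweets, as in the following minimal sketch. The dictionary keys `hashtags`, `retweets` and `favourites`, as well as the toy records, are hypothetical field names chosen for this example.

```python
from collections import defaultdict

def hashtag_engagement(tweets):
    """Return {hashtag: share of its tweets that were retweeted or favourited}."""
    total = defaultdict(int)     # |T(h)|
    engaged = defaultdict(int)   # number of tweets in T(h) with indicator equal to 1
    for tw in tweets:
        got_engagement = tw["retweets"] > 0 or tw["favourites"] > 0
        for h in set(tw["hashtags"]):  # count each hashtag once per tweet
            total[h] += 1
            engaged[h] += int(got_engagement)
    return {h: engaged[h] / total[h] for h in total}

# Toy records, purely illustrative.
tweets = [
    {"hashtags": ["parking", "ev"], "retweets": 2, "favourites": 0},
    {"hashtags": ["parking"], "retweets": 0, "favourites": 0},
    {"hashtags": ["parking", "ev"], "retweets": 0, "favourites": 5},
    {"hashtags": ["parking"], "retweets": 0, "favourites": 0},
]
engagement = hashtag_engagement(tweets)
print(engagement)
```

Note that E(h) counts whether a tweet was engaged with at all, not how many retweets or favourites it received, so a single viral tweet does not dominate the score.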

3.3.4 Hashtag sentiment

Another measure that enables us to evaluate the potential interaction of the user with the hashtag is the sentiment attached to the wording of the hashtag. Intuitively, a strong positive or negative sentiment in the hashtag wording may attract certain categories of users who wish to show support for, or refute, the highlighted arguments. Sentimental or mood-related hashtags have been acknowledged as one of the three widely identified classes of hashtag types, namely thematic hashtags, sentimental hashtags and hybrid sentimental-thematic hashtags (Van den Berg 2014).

The hashtag sentiment is computed by analyzing the words that form the particular hashtag using the VADER sentiment analyzer (Hutto and Gilbert 2014), after an initial preprocessing that includes (1) splitting the hashtag’s compound wording into individual words using the compound-word-splitter Python library, which breaks an unrecognized word into a set of words, and then (2) joining the different words to form one short sentence ready for sentiment analysis.
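The preprocessing step can be sketched as follows. This is a stand-in for the compound-word-splitter library, whose API is not reproduced here: it segments a hashtag with a simple dynamic program over a small, hypothetical vocabulary (`VOCAB`) and joins the result into a short sentence. The subsequent VADER scoring step is deliberately omitted.

```python
def split_hashtag(tag, vocabulary):
    """Segment a hashtag into known words; return None if no full segmentation exists."""
    word = tag.lstrip("#").lower()
    # best[i] holds a segmentation of word[:i] using the fewest words, or None.
    best = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            piece = word[start:end]
            if best[start] is not None and piece in vocabulary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(word)]

# Hypothetical toy vocabulary; a real run would rely on a full English word list.
VOCAB = {"free", "parking", "park", "lot", "electric", "vehicles",
         "smart", "city", "in", "king"}

def hashtag_to_sentence(tag, vocabulary=VOCAB):
    """Join the split words into a short sentence; fall back to the raw hashtag body."""
    words = split_hashtag(tag, vocabulary)
    return " ".join(words) if words else tag.lstrip("#").lower()

print(hashtag_to_sentence("#FreeParking"))
print(hashtag_to_sentence("#electricvehicles"))
```

The resulting string (for instance the words of `#FreeParking` separated by a space) is what would then be passed to the sentiment analyzer; hashtags that cannot be segmented are kept unchanged.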

3.3.5 Interpreting communities of hashtag graphs

The application of the various community detection algorithms (Clique, k-core, k-plex, Girvan–Newman) can generate several (sub)communities, which are not necessarily useful for the purpose of our study. In order to obtain easily interpretable communities, three rational criteria have been set up.

  1. The community size is beyond a certain threshold \(\alpha\), which excludes small communities formed by only one or two nodes, for instance (Size Criticality Principle).

  2. There is a minimum engagement of the hashtags belonging to the community (Community Engagement Principle). More formally, for a community of size m with hashtags \(h_{1},\ldots ,h_{m}\), one should ensure:

    $$\begin{aligned} \frac{1}{m} \sum _{i=1}^{m} E(h_{i}) \ge \epsilon \end{aligned}$$
    (3)

    where E is the engagement defined in Eq. (2) and \(\epsilon\) stands for the average community engagement threshold. This excludes communities whose nodes (hashtags) have recorded no or very little user engagement overall. The choice of the threshold \(\epsilon\) is context dependent.

  3. The community members convey a minimum surface-level interpretability (Community Surface-Level Interpretability Principle). This interpretability, although subjective and dependent on the user’s scrutiny, consists of performing a manual check of either the hashtag message contents or the available attributes of the users (e.g., location, user ID) associated with each hashtag, and checking whether some trend can be observed. For instance, the hashtags’ transformed wording (after word splitting) may reveal that the word constituents of the hashtags can be grouped into the same category, or that the Twitter users of the hashtags share the same location, the same IDs, etc. This process is supported by a visualization module that enables the user to easily inspect the transformed wording of the hashtags and the properties of their associated IDs, and to decide whether a generated community is preserved or discarded, in which case the focus shifts towards other (sub)communities.
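The first two principles are mechanical and can be applied automatically before the manual interpretability check. The sketch below is one possible way to do so; the function name `keep_community`, the threshold values \(\alpha =3\) and \(\epsilon =0.2\), and the toy engagement scores are assumptions for illustration only, since the thresholds are context dependent.

```python
def keep_community(hashtags, engagement, alpha=3, epsilon=0.2):
    """Apply the size criticality and community engagement principles."""
    # Size criticality: the community size must exceed alpha.
    if len(hashtags) <= alpha:
        return False
    # Community engagement, Eq. (3): mean E(h_i) must reach epsilon.
    mean_engagement = sum(engagement.get(h, 0.0) for h in hashtags) / len(hashtags)
    return mean_engagement >= epsilon

# Toy engagement scores E(h), purely illustrative.
engagement = {
    "parking": 0.5, "ev": 1.0, "charging": 0.6, "software": 0.7,
    "spam1": 0.0, "spam2": 0.0, "spam3": 0.0, "spam4": 0.1,
}
communities = [
    ["parking", "ev", "charging", "software"],   # large and engaged
    ["spam1", "spam2", "spam3", "spam4"],        # large but almost no engagement
    ["parking", "ev"],                           # too small
]
kept = [c for c in communities if keep_community(c, engagement)]
print(kept)
```

Only communities that pass this filter would be forwarded to the manual check of the third principle, which keeps the number of candidates to inspect manageable.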

3.4 Social bots analysis

In social media, a bot is an account controlled by software (Yang et al. 2020). Bots can have a positive or negative impact on social media, and some can be harmful: a bot can manipulate social media users and create spam or fraud, among others. Hence, in our work, investigating bots is crucial in order to identify the human presence in the observed social behavior and the types of social bots associated with parking. To do this, the Botometer API is used to analyze the Twitter user IDs associated with the various communities; the detailed approach is summarized in Algorithm 1 below. It should be mentioned that the algorithm is restricted to the top-25 Twitter IDs according to the number of tweets they generate, mainly because of the limited daily access to the Botometer API, while it is also reasonable to assume that the 25 most active users provide a fairly good coverage of the discussion around the hashtag topic (Fig. 3).

Fig. 3 Algorithm 1
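The selection step described in the text (restricting the analysis to the most active accounts) can be sketched as follows. This is not a transcription of Algorithm 1: the function names, the `user_id` field, the `score_fn` callable standing in for a Botometer lookup, and the 0.5 decision threshold are all hypothetical.

```python
from collections import Counter

def top_active_users(tweets, k=25):
    """Return the k user IDs that generated the most tweets, most active first."""
    counts = Counter(tw["user_id"] for tw in tweets)
    return [uid for uid, _ in counts.most_common(k)]

def bot_share(user_ids, score_fn, threshold=0.5):
    """Fraction of accounts whose bot score, as given by score_fn, exceeds threshold."""
    if not user_ids:
        return 0.0
    flagged = sum(1 for uid in user_ids if score_fn(uid) > threshold)
    return flagged / len(user_ids)

# Toy data; fake_scores replaces the external bot-scoring service in this sketch.
tweets = [{"user_id": u} for u in ["a", "a", "a", "b", "b", "c"]]
fake_scores = {"a": 0.9, "b": 0.1, "c": 0.8}

top2 = top_active_users(tweets, k=2)
share = bot_share(top2, fake_scores.get)
print(top2, share)
```

Passing the scoring function as a parameter keeps the rate-limited external call separate from the ranking logic, so the same selection can be reused with cached scores.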

3.5 Industry-sector interconnection

Given a generated community corresponding to a subgraph of the overall hashtag network graph, the evaluation of industry-sector interconnection relies on two distinct analyses:

  • A mapping from hashtag to industry sector.

  • A weighting mechanism, based solely on the number of tweets shared between two sectors, that provides an interconnection score.

The mapping is performed by scrutinizing the word constituents of hashtags (surface-level analysis) and the user IDs associated with the corresponding tweets. This is motivated by the observation that many companies use user IDs that include the company name. Using the taxonomy highlighted in Table 1, each node (hashtag) of a given community is thus assigned to one or several industry sector categories. In the weighting mechanism, the interaction between two industry sectors, say X and Y, is computed by summing up the weights of the edges that link a hashtag assigned to X with a hashtag assigned to Y. An example of this calculation is provided in the results section of this paper.
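A minimal sketch of the weighting mechanism is given below, assuming that the edge weights are shared-tweet counts and that the hashtag-to-sector mapping is already available. The function name, the sector letters attached to each hashtag and the toy weights are illustrative assumptions and do not reproduce the values of Table 1 or of the results section.

```python
from collections import defaultdict
from itertools import product

def sector_interconnection(edges, sectors):
    """Sum edge weights between every pair of distinct sectors.

    edges:   {(hashtag1, hashtag2): number of shared tweets}
    sectors: {hashtag: set of sector labels}
    """
    scores = defaultdict(int)
    for (h1, h2), weight in edges.items():
        # Collect the distinct sector pairs linked by this edge, each counted once.
        pairs = set()
        for x, y in product(sectors.get(h1, ()), sectors.get(h2, ())):
            if x != y:
                pairs.add(tuple(sorted((x, y))))
        for pair in pairs:
            scores[pair] += weight
    return dict(scores)

# Toy community, purely illustrative.
edges = {
    ("charginginfrastructure", "software"): 4,
    ("software", "environment"): 2,
    ("charginginfrastructure", "environment"): 3,
}
sectors = {
    "charginginfrastructure": {"B", "G"},
    "software": {"D"},
    "environment": {"C"},
}
scores = sector_interconnection(edges, sectors)
print(scores)
```

Because a hashtag may belong to several sectors, one edge can contribute to several sector pairs; in this toy case the edge of weight 4 contributes to both the (B, D) and (D, G) scores.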

4 Results and interpretations

4.1 Global trend analysis result

4.1.1 Top active users

One useful trend analysis concerns the nature of the active users in the collected Twitter dataset. This boils down to counting the number of tweets for each Twitter ID contained in the database, and ranking the active users accordingly. At the same time, we also performed a manual check of the Twitter content to grasp the type of activity and the main business associated with each of the top active users. The result of this analysis is presented in Table 3. The table shows the following. First, when comparing the original three leading parking hashtags used in the data collection process, \(\#Parking\) emerges as the most active one. Second, when scrutinizing the activity of the most active users in terms of number of generated tweets, one notices that parking providers, online booking companies, real estate companies and travel booking agencies account for more than 90% of the total activity. Therefore, most of the top active users are business companies and organizations. The few individuals found in this list are either photographers or artists. Each of these active users has a large number of followers. This result is reflected in the cliques and community analysis, where these influencers are strongly present. Other influencers include manufacturers, construction companies, and artisans (such as painters and photographers) who have an interest in parking facilities and art. These results overlap with some of the findings in the literature about industries and with the categorization of parking industries made in Table 1.

Table 3 Top 10 influencers in each hashtag and their number of followers and number of tweets within the dataset

4.2 Hashtag social network results

A visualization of the overall hashtag network, built according to the approach described in the methodology section, is presented in Fig. 4. In addition, we also considered a special subgraph of this overall hashtag network, restricted to those hashtags that contain the string “park”. We shall refer to this subgraph as the “Park Hashtag Network”. The network attributes of both the “overall hashtag network” and the “park hashtag network” are summarized in Table 4.

Fig. 4 Visualization of the hashtag network

Table 4 Summaries for hashtag networks

From Table 4, we notice the following. First, both graphs are well connected, since the average shortest path length is just two hops, with a high clustering coefficient and a small diameter. Second, the low average betweenness centrality in both graphs indicates a lack of central nodes that would be crucial for information flow between nodes, which is expected given the high connectivity of the graph. Third, the low average degree centrality indicates that most nodes are not directly connected (i.e., within one hop) to most other nodes. Still, the low value is not surprising when taking into account that the centrality of each node is normalized by the largest possible degree, which is very high given the number of nodes. It is no surprise that #Parking is the most central node, as its degree and betweenness centrality values are above 0.9 in both graphs. No other node shows such a high centrality: #ParkingLot has the second highest centrality values in both graphs, but these are only around 0.1.
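The effect of this normalization can be illustrated on a toy star-shaped network, assuming the usual definition in which a node’s degree is divided by n − 1, the largest possible degree in a graph of n nodes. The graph below is hypothetical and unrelated to the values reported in Table 4.

```python
def degree_centrality(adj):
    """Normalized degree centrality: degree divided by the largest possible degree n - 1."""
    n = len(adj)
    return {node: len(neighbours) / (n - 1) for node, neighbours in adj.items()}

# Hypothetical star-like network: one hub hashtag linked to 100 peripheral hashtags.
leaves = [f"tag{i}" for i in range(100)]
adj = {"parking": set(leaves)}
for leaf in leaves:
    adj[leaf] = {"parking"}

centrality = degree_centrality(adj)
average = sum(centrality.values()) / len(centrality)
print(centrality["parking"], centrality["tag0"], average)
```

In this toy case the hub reaches the maximum centrality of 1 while each peripheral hashtag scores 1/100, so the average (2/101) stays low even though every node is at most two hops from any other, which is the same qualitative pattern as the one discussed above.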

The application of the interpretability principles described in Sect. 3.3.5 (size criticality, community engagement and community surface-level interpretability requirements) to the outcomes of each of the community detection algorithms (Clique, k-plex, k-core, Girvan–Newman) has resulted in several interesting (sub)communities, which are highlighted below. It should be noted that, without such constraints, one obtains a very large number of results. For instance, more than 1172 cliques were obtained for the hashtag network and 556 for the park-hashtag network, together with hundreds of communities resulting from the Girvan–Newman, k-plex and k-core algorithms, which underlines the importance of these principles in restricting the outcome to useful communities only. We shall distinguish the results obtained for the overall hashtag network from those pertaining to the park-hashtag network.

4.3 Communities from hashtag network

4.3.1 Electric cars clique

The clique highlighted in Fig. 5 corresponds to a community of electric car owners looking for parking spaces equipped with electric charging stations. The significant number of interactions (as demonstrated by the high engagement values of the nodes) and the high number of tweets in this clique testify to users’ interest in identifying parking lots equipped with chargers for their electric vehicles. This is in line with the findings of Van Wee et al. (2012), who discussed the implications of electric cars for transportation systems and the need to adapt the infrastructure accordingly.

Fig. 5 a “Electric car” clique from the hashtag network: E refers to the engagement value, and S refers to the sentiment value. b Social graph of industry categories resulting from the electric car community. Each edge value represents the number of tweets shared between two hashtags, or industries

Regarding the hashtags that form this community, one notices a good engagement for hashtags related to new technology such as #charginginfrastructure, #software, and #prop-tech, which indicates people’s attraction to these electrical facilities in parking and the need for charging stations. On the other hand, an average engagement is noticed for hashtags like #investement and #electricvehicles; this may be due to advertisers’ use of these hashtags, which, in turn, results in less interaction from genuine users. In addition, the sentiment of all the hashtags is rather neutral (0) due to the commercial nature of the statements. In terms of industry sector involvement, the electric car industry involves installing electric car chargers and maintaining the associated technology (D, G) and constructing the associated infrastructure (B), while taking into account environmental (C) and security (E) matters. Also, the hashtags and discussions highlight the strong involvement of the ICT software industry, which generally includes online parking reservation services, web or mobile apps, online parking support, and cloud solutions used to store and distribute parking data. Figure 5b shows that the highest inter-sector interactions in this community occur between Environment (C) and Information Technology (D), and between Information Technology (D) and Construction (B), which reinforces the observation that Information Technology plays a central role in the car-charger ecosystem.

4.3.2 Corona clique

Figure 6 presents another interesting clique. It shows the interplay between parking and corona hashtags. The discussion topics were finding free parking spaces, paying parking fees during corona time, potential discounts, and finding the closest parking space and its associated cost. Interestingly, it also shows the interest in ICT solutions that provide real-time information about parking lot availability as an essential factor influencing some individuals' parking search decisions in a pandemic situation, which shows the interest of users in restricting their travel scope during pandemic time. This also agrees with related work reported in Zhang et al. (2016) and Zong and Wang (2015), which reviewed relevant factors that need to be considered in the planning and management of new parking infrastructures through intelligent sensing technology. This corona clique is composed of 6 different hashtags, where the #parking and #parkinglot hashtags display the clique's parking aspect, while the remaining 4 hashtags represent the corona pandemic. The engagement value is found to be only average, possibly reflecting the low tendency of individuals to go out and use their cars during the corona pandemic. Regarding the involved industries, one notes the prevalence of installation (G) or construction (B) of new parking amenities in the discussions (Footnote 7).

Fig. 6 “Corona” clique from the hashtag network. Each edge value represents the number of tweets involving two hashtags

4.3.3 IoT and smart parking clique

The clique in Fig. 7 presents a 2-plex that connects the parking hashtag with technology-related hashtags such as internet of things (IoT), artificial intelligence (AI) and smart cities. The discussion is mainly associated with mobile apps used as real-time parking solutions, which rely on smart parking sensors to find available parking lots in real time. Scrutinizing the Twitter users involved in this clique reveals that most users have a large number of followers and high engagement scores. This clique shares many features with the electric-car clique in terms of users' requirements and technology industry, stressing the demand for smart solutions adapted to parking needs through mobile apps and smart parking systems. Intuitively, mobile apps have the potential to improve parking quality of service by reducing the time spent cruising for parking and by providing real-time parking information and costs. Besides, data gathered from parking sensors open collaborations between various industry sectors and the municipality in order to use the data efficiently for multiple purposes, such as software engineering to accommodate various third-party apps, analysis of vehicle and driver activities, policy-making and urban planning.
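To make the 2-plex notion concrete, the following is a minimal sketch, assuming the usual definition that a set of n nodes is a k-plex when every node is adjacent to at least n − k other members of the set. The helper name `is_k_plex` and the toy hashtags are hypothetical and are not taken from the study's data.

```python
import networkx as nx


def is_k_plex(G, nodes, k):
    """Return True if `nodes` induce a k-plex in G.

    Assumed definition: each member is adjacent to at least
    len(nodes) - k other members (k = 1 is an ordinary clique).
    """
    nodes = set(nodes)
    sub = G.subgraph(nodes)
    return all(sub.degree(v) >= len(nodes) - k for v in nodes)


# Hypothetical toy hashtag graph: a 4-clique with one missing link.
G = nx.complete_graph(["#parking", "#iot", "#ai", "#smartcities"])
G.remove_edge("#iot", "#ai")

print(is_k_plex(G, G.nodes, 1))  # clique test
print(is_k_plex(G, G.nodes, 2))  # relaxed 2-plex test
```

The relaxation is what allows a group of hashtags to be treated as one cohesive community even when a few pairs never co-occur.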

Fig. 7 a IoT and smart parking clique, a 2-plex from the hashtag network. b Network of industrial categories resulting from the IoT and smart parking clique. Each edge value represents the number of tweets involving two hashtags, or industries

Regarding the nature of the hashtags associated with this clique, 8 hashtags concern IoT sensor technology in parking, with relatively high engagement scores and even a positive sentiment, which testifies to users' interest in the underlying concepts. In particular, the hashtags #innovation and #smartcities are endowed with positive sentiment and very high engagement scores. The inter-sector interaction map in Fig. 7b shows again the central role of Information Technology (D) as the core discipline ensuring effective interaction with other disciplines.

4.3.4 Social, photography and personal effect community

Figure 8 shows the second-largest k-core obtained from the hashtag network, a 16-core community. It demonstrates the centrality and the importance of the #Parking hashtag, which connects four other most significant cliques where each clique represents a category of interest related to parking. For instance, one of these cliques revolves around photography and parking.
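As an illustration of how a k-core isolates a densely connected hashtag community, here is a minimal sketch using NetworkX on a hypothetical toy graph (the hashtags and edges are invented for the example, and k = 4 is used instead of 16 to keep the graph small).

```python
import networkx as nx

# Hypothetical toy hashtag co-occurrence graph (not the paper's data).
G = nx.Graph()

# A 5-node clique: every member has degree 4 inside it, so it is a 4-core.
clique = ["#parking", "#photo", "#street", "#city", "#travel"]
G.add_edges_from(
    (u, v) for i, u in enumerate(clique) for v in clique[i + 1:]
)

# Peripheral hashtags attached by a single edge; they fall outside the 4-core.
G.add_edges_from([("#parking", "#sunset"), ("#city", "#monday")])

# k_core returns the maximal subgraph in which every node has degree >= k.
core = nx.k_core(G, k=4)
core_numbers = nx.core_number(G)

print(sorted(core.nodes()))
print(core_numbers["#sunset"])
```

A k-core is less strict than a clique, since members only need k neighbours inside the subgraph, which is why it tends to return larger and more topically diverse communities.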

Fig. 8 Social, photography and personal effect community formed from the 16-core of the hashtag network

In general, the relatively large size of this community explains the diversity of the topics covered. The top 11 hashtags (in terms of the number of tweets in which they appear) are presented in Table 5. One notices the dominance of social, photography, transportation, personal and, to a small extent, business aspects. The engagement is medium for the majority of the hashtags, and the sentiment is neutral, which testifies to a relatively moderate interaction of users. On the other hand, the #parking hashtag is the only common term that links the various sub-communities, with a dominance of the Architecture (A) and Artisans (H) fields, stressing the importance of parking location and design in enabling artistic activities. This reflects the large scale and diversity of the parking concept, which involves several aspects of urban life.

Table 5 Hashtags engagement and sentiment from the social, photography and personal effect community formed from the 16 core of hashtag network

4.4 Communities from park hashtag network

Next, the park hashtag network is analyzed. Three different communities are extracted from this network. The interpretation and analysis of these communities follow the same approach as for the previous network.

4.4.1 Parking skills clique

The clique in Fig. 9 is related to individuals criticizing other individuals for their parking skills. Hashtags such as #badparking, #parkingidiot, and #parkignfail powerfully illustrate individuals' bad behavior when they park their cars. The clique also reflects skepticism about individuals' parking behavior and about the poor parking planning and management provided by the city's parking suppliers. These findings agree to some extent with Mondschein et al. (2020), who studied parking behavior using online Yelp reviews of restaurants.

Fig. 9 “Skills” clique from the park hashtag network. Each edge value represents the number of tweets involving two hashtags

We can notice a shallow engagement of people with such hashtags through their negative discussion. The sentiment is negative for the three hashtags that describe the parking behavior. One positive sentiment is reported for the hashtag #learntopark, erroneously and due to an error made by the splitter library when splitting the hashtag, even though the hashtag concerns people's parking skills, while the rest of the hashtags are rather neutral. A conclusion to be drawn from this hashtag analysis is that shallow engagement accompanies the negative hashtags: people stay away from interacting with such negative or nasty wording that may hurt others' feelings. Moreover, the interactions and discussions in this community point out the lack of security systems in these parking structures that could alert to potential disturbance or bad behavior. Also, a parking guidance system or software could help monitor the parking places, guide people to free or available slots, and help avoid disputes over spots between drivers. Therefore, this calls for further collaboration between parking providers (G), municipalities (B), security companies (E) and ICT companies (D) to bring more comfort and guidance to users in their parking tasks.

4.4.2 Parking management clique

Figure 10 visualizes another clique associated with new technology and parking management issues, reflected by hashtags associated with parking-management, parking-news and parking-guidance, as well as the associated solution/service suppliers and industries, reflected by hashtags like smart-parking, parking-sol and parking-technology. Multiple services attempt to match users' demands with parking guidance and to increase parking management efficiency. For instance, the E-parking service enables the use of data gathered in the cloud through networking technologies from IoT sensor systems, web services or mobile apps, which support online parking reservations. We notice a high engagement score for most hashtags of this community, which testifies to users' interest. This finding is also supported by other studies in the transportation literature. For instance, Mondschein et al. (2020) reported that the poor quality of parking lots appeared to cause frustration among users and raised the importance of parking management. In terms of sentiment, #smartparking is found to bear a positive sentiment, while the others are rather neutral, which is expected given the nature of the discussions. In short, the community raises the interest in better parking management, including smart parking solutions and IoT systems. A wide range of industry sectors are found embedded in this community [sectors A, B, C, D, E and G, where D (Information Technology) is again found to play a pivotal role].

Fig. 10 a Parking management clique from the park hashtag network. b Network of industrial categories resulting from the IoT and parking management clique. Each edge value represents the number of tweets involving two hashtags, or industries

4.4.3 Large scale technology discussion community

Figure 11 shows a relatively large-scale community obtained at level 4 of the Girvan-Newman algorithm, where parking and technology aspects are the dominant trend. The community exhibits three main hubs corresponding to the hashtags #Parking, #ParkingLot and #ParkingGarage. At the next level in terms of connectivity, one encounters the hashtags #parkingmanagement, #smartparking, #parkingsolution and #parkingtechnology, which define the context of this community, mostly dominated by parking technologies and management issues. Due to the large number of nodes forming this community, and in order to highlight the emerging discussion trends, we present in Table 6 the top hashtags in terms of the number of tweets mentioning them, together with the associated engagement and sentiment scores. In particular, we observe high engagement for the technology-based solutions #parkingapp and #parkingindustry, with scores above 80%, as well as for hashtags related to ticketing and broad technology issues, with scores above 70%. Other hashtags, like #parkingsolutions, #parkwheels, and #hotelparking, also present an acceptable level of engagement, with a prevalence of advertisement messages from business sectors in the vicinity of parking areas. We also represented the contribution of the various industry sectors as a histogram (Fig. 11b), which shows the dominance of Information Technology (24%) followed by Architecture and Security (16% each).
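The idea of reading communities off a given level of the Girvan-Newman hierarchy can be sketched as follows. This is only an illustration on NetworkX's built-in karate club graph, and it assumes that "level n" means the partition obtained after n successive splits; the exact level convention used in the study may differ. The helper name `communities_at_level` is hypothetical.

```python
from itertools import islice

import networkx as nx
from networkx.algorithms.community import girvan_newman


def communities_at_level(G, level):
    """Return the partition after `level` successive splits (level >= 1).

    girvan_newman lazily yields one partition per split, each obtained by
    repeatedly removing the edge with the highest betweenness centrality.
    """
    if level < 1:
        raise ValueError("level must be >= 1")
    hierarchy = girvan_newman(G)
    return next(islice(hierarchy, level - 1, None))


# Stand-in graph; the park hashtag network itself is not reproduced here.
G = nx.karate_club_graph()
partition = communities_at_level(G, 4)

# Largest community first, as one might do before inspecting its hashtags.
for community in sorted(partition, key=len, reverse=True):
    print(len(community), sorted(community))
```

Because each split adds exactly one connected component, a connected graph yields level + 1 communities at the chosen level.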

Fig. 11 a Large scale technology discussion community, from level 4 of the Girvan-Newman algorithm applied to the park hashtag network. b Bar plot representing the industries involved in the community, based on hashtags referring to industry concepts. The values are percentages showing the occurrence of each industry in the most frequent hashtags forming the community

Table 6 Hashtags engagement and sentiment of the large scale technology discussion community from level 4 of Girvan Newman

4.5 Social bots

In the results summarized in Table 7, the analysis indicates that in six communities out of seven, the proportion of humans was higher than the proportion of bots. Only one community (the large scale technology community) exhibits a prevalence of bots. In particular, the majority of the accounts classified as spam were attributable to parking marketing and review-like robots. However, the gap between human and bot proportions varies across communities. For instance, humans have a significant presence in social-based communities such as the corona clique and the parking skills community, with a proportion of more than 70%. This ratio is not a surprise, considering that the discussions in these communities are centered on social aspects that involve genuine reviews, more emotions, and agitation from participants. In the parking management and IoT smart parking communities, however, the human proportion decreases substantially. This is caused by the tendency of the API to classify most accounts delivering marketing and advertising information as bots. Social bots are noticeably present in all communities, although in different proportions. This can also be attributed to the threshold (0.5) employed with the Botometer API, which is itself open to debate, as it can yield many false positives. Overall, the communities were more human-centered than bot-centered. These results lend further support to the interpretations of the communities in terms of parking industries, parking issues and parking topics.
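The sensitivity of the human/bot split to the cut-off can be illustrated with a short sketch. The scores below are invented for the example, the helper `bot_share` is hypothetical, and treating a score equal to the threshold as "bot" is an assumption; no Botometer API call is made.

```python
BOT_THRESHOLD = 0.5  # cut-off value reported in the text


def bot_share(scores, threshold=BOT_THRESHOLD):
    """Fraction of accounts whose bot score is at or above `threshold`.

    `scores` are assumed to lie in [0, 1]; counting a score exactly equal
    to the threshold as a bot is an assumption made for this sketch.
    """
    if not scores:
        return 0.0
    return sum(score >= threshold for score in scores) / len(scores)


# Hypothetical bot scores for the accounts of one community.
community_scores = [0.10, 0.40, 0.60, 0.90]

for threshold in (0.5, 0.7):
    share = bot_share(community_scores, threshold)
    print(f"threshold={threshold}: bots={share:.0%}, humans={1 - share:.0%}")
```

Raising the threshold lowers the bot share, which is why marketing-heavy communities with borderline scores are the most affected by this choice.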

Table 7 Summary of the communities' size, engagement, sentiment, and industry involvement

5 Discussions

5.1 Summarizing the findings

Table 7 summarizes the main properties of the various communities in a way that addresses the research questions \({\mathrm{RQ}}_{1}\) and \({\mathrm{RQ}}_{2}\), highlighting the main industry sector, community scale, average hashtag engagement, sentiment and bot percentage in each community. In terms of parking behavior, one can distinguish social, economic and technology aspects that widely contribute to users' behavior. In short, social factors are associated with an individual's lifestyle, including habits and cultural and community-based factors. Economic factors are ultimately linked to financial costs, while technology factors correspond to the ways new technology affects business and/or practices. Using the results of both the community-based analysis and the global trend analysis, one distinguishes, for instance, individual parking skills, leisure purposes and illegal parking behavior as the main categories of social factors, which are mainly identified in the Parking Skills Clique. Economic factors are found to be mainly exemplified by the potential cost induced by the lack of a parking lot (i.e., fuel, extra commuting costs, missed opportunities), the actual parking cost, discounts or free parking, and potential fines, if any. These are manifested mainly in two communities (Corona Clique and Parking Management Clique). Finally, technology factors are manifested through at least three streams. The first is associated with mobile apps that interact with the parking system, handle online payment, and check lot availability in real time. The second stream is infrastructure-based and concerns smart parking systems and the availability of electric chargers. The third stream is related to the technical knowledge, skills and potential hints associated with the use of such mobile apps and smart systems. Four communities encompass the technological factors: IoT Smart Parking Clique, Large Scale Technology Discussion Community, Parking Management Clique and Electrical Car Clique. This is summarized in Table 8. Besides, in terms of engagement, the results show high user engagement with technology-related factors, which can be explained by the inherent curiosity of users to explore the new era of smart parking infrastructure and mobile apps with real-time information. This is followed by economic factors, which can also be explained by the importance of the cost aspect, whereas the social factors are found to bear no engagement, as most of the associated tweets are rather informative.

Table 8 Parking behavior by different aspects with the communities involved

Other useful take-away messages from this analysis are the following. First, one of the most prominent characteristics found within multiple communities is the demand for new technology and IoT solutions. This aspect is highlighted by the significant engagement of the leading hashtags that describe these technology solutions, such as smart parking, tech, software, IoT, and AI, which testifies to the continuous support of individuals for such technological breakthroughs as a way to enhance quality of service and comfort. Second, the study indicates that technology hashtags are often associated with positive sentiment, which is itself associated with high engagement scores. Third, hashtags associated with social aspects often attract low engagement, which reflects the reluctance of individuals to like or share purely personal social experiences. Fourth, the identified communities are not mutually exclusive: one hashtag can participate in many communities, which is natural given the various algorithms employed in the community detection task. This can be exploited to study other aspects of users' preferences across communities, an issue which is not covered in this paper.

5.2 Comparison for social graph generation method

The semi-empirical approach to threshold selection adopted in this paper and presented in Sect. 3.3.1 can also be questioned. In order to demonstrate the technical soundness of our proposal, we have carried out a comparison with the state-of-the-art multiscale backbone approach of Serrano et al. (2009). For this purpose, we have used the disparity filter method and followed the implementation available in the GitHub repository (Footnote 8).

The network creation follows the same social network construction process applied to our network, but without the threshold: given two hashtags, a link is established between them if at least one tweet contains both hashtags. Next, in order to identify which links should be preserved, we applied the disparity filter. This resulted in a social graph called a backbone network, where only specific links are maintained for each node. Table 9 compares the attributes of the resulting backbone graph with those of our social graph.
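The two steps described above can be sketched as follows. This is not the implementation from the referenced repository; it is a minimal sketch assuming the commonly used closed form of the disparity filter, in which an edge of weight w at a node of degree k and strength s has significance (1 − w/s)^(k−1), and an edge is kept when that value falls below the chosen significance level for at least one endpoint. Treating degree-1 endpoints as non-significant is an additional assumption, and the tweets are invented.

```python
from itertools import combinations

import networkx as nx


def cooccurrence_graph(tweets):
    """Link two hashtags if at least one tweet contains both.

    The edge weight counts the tweets in which the pair co-occurs.
    """
    G = nx.Graph()
    for tags in tweets:
        for u, v in combinations(sorted(set(tags)), 2):
            weight = G.get_edge_data(u, v, default={"weight": 0})["weight"]
            G.add_edge(u, v, weight=weight + 1)
    return G


def disparity_alpha(G, node, neighbor):
    """Significance of edge (node, neighbor) seen from `node`."""
    k = G.degree(node)
    if k < 2:
        return 1.0  # assumption: a degree-1 endpoint never certifies an edge
    strength = G.degree(node, weight="weight")
    p = G[node][neighbor]["weight"] / strength
    return (1.0 - p) ** (k - 1)


def disparity_backbone(G, significance=0.5):
    """Keep edges that are significant for at least one endpoint."""
    B = nx.Graph()
    for u, v, data in G.edges(data=True):
        alpha = min(disparity_alpha(G, u, v), disparity_alpha(G, v, u))
        if alpha < significance:
            B.add_edge(u, v, **data)
    return B


# Hypothetical tweets, each reduced to its list of hashtags.
tweets = [["#parking", "#ev"]] * 8 + [["#parking", "#iot"], ["#parking", "#city"]]
G = cooccurrence_graph(tweets)
B = disparity_backbone(G, significance=0.5)
print(G.number_of_edges(), B.number_of_edges())
```

In this toy case the two weak links are discarded, which mirrors how the filter can remove small communities whose edges carry few tweets.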

Table 9 Summary comparison for hashtag network and backbone based network attributes

Looking at the results in the table, we can notice that the resulting backbone network is smaller than our network. It is characterized by a higher average betweenness centrality, while closely similar results were found for the degree centrality, average path length, and diameter. Regarding the community detection algorithms, the backbone-based graph yields fewer communities than ours for some algorithms, such as the k-core algorithm, and a higher number of cliques, but of much smaller size.
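The attributes compared in this paragraph can be computed with standard NetworkX calls, as in the sketch below. The helper `summarize` is hypothetical, and restricting the path-based measures to the largest connected component is an assumption made so that they are well defined on disconnected graphs; the original computation may have handled this differently.

```python
import networkx as nx


def summarize(G):
    """Collect the graph attributes discussed in the comparison."""
    n = G.number_of_nodes()
    betweenness = nx.betweenness_centrality(G)  # normalized by default
    degree = nx.degree_centrality(G)
    # Path-based measures require a connected graph (assumption: use the
    # largest connected component).
    largest = G.subgraph(max(nx.connected_components(G), key=len))
    return {
        "nodes": n,
        "edges": G.number_of_edges(),
        "avg_betweenness": sum(betweenness.values()) / n,
        "avg_degree_centrality": sum(degree.values()) / n,
        "avg_path_length": nx.average_shortest_path_length(largest),
        "diameter": nx.diameter(largest),
    }


# Stand-in graph: a 4-node path instead of the hashtag or backbone network.
stats = summarize(nx.path_graph(4))
for name, value in stats.items():
    print(f"{name}: {value}")
```

Running the same function on both graphs gives a like-for-like comparison of the kind reported in Table 9.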

Strictly speaking, the backbone method could present suitable results in terms of node relevance following a predefined significance value (taken to be 0.5 in our testing) and provide a less dense network that is easier to visualize. However, the significance value and the statistical method followed in filtering parts of the network have led to the deletion of several communities with meaningful discussion topics. Some of the ignored communities were considered in our work and brought useful outcomes in terms of industry interaction and users' car parking behavior. It is essential to mention that different significance values could lead to bigger or smaller networks and hence to different community results, so the choice of the significance value is also an important open question that is not answered by the initial backbone construction algorithm.

Furthermore, in our semi-empirical threshold-based method, the network permitted us to investigate a larger number of communities and provide interpretation to the relevant ones after carefully studying them. In contrast, the backbone model has given a network which resulted in the deletion of communities based on statistical relevance, which might be questionable in terms of identification of parking industries and behavior.

5.3 Botometer accuracy

As already pointed out in Sect. 4.5, the use of the Botometer API can be open to debate due to the difficulty of validating its results, in addition to the inherent limitations of its machine learning construction, which uses a Random Forest classifier. Indeed, the training of the Botometer algorithm is based on a publicly available dataset containing 57,155 accounts labeled “bot” and 30,853 labeled “human” (Yang et al. 2019), where the quality of the labeling is sometimes questionable. Besides, the largest subset of “bots” comes from a sample of spammy or promotional accounts from the early days of Twitter (2009–2010), which calls into question its ability to generalize to current bots. Therefore, the lack of a reliable ground truth dataset can impact the training performance of the algorithm. On the other hand, the lack of open access to the API also restricts our ability to retrieve detailed outcomes of the software in order to carry out further testing and evaluation. Nevertheless, the popularity of the Botometer API cannot be ignored, as demonstrated by its adoption in several academic, commercial and city pilot projects. Although the exact values returned by the Botometer API are questionable, we hypothesize that such uncertainty will likely impact all communities to almost the same order of magnitude. Therefore, when it comes to comparing trends among the various communities, the results can reasonably be accepted to some extent.

6 Conclusion

This paper presents an original contribution that explores the car parking industry and drivers' behavior from a hashtag analysis perspective. The study stresses the importance of hashtags for building online communities and inferring useful insights in terms of users' interaction, sentiment polarity and spammers. The construction of communities makes use of a set of rational criteria that enhance explainability: size-criticality, an acceptable level of engagement, and surface-level interpretability. By doing so, seven leading communities have been identified (Electrical Car Clique, Corona Clique, IoT and Smart Parking Clique, Social, Photography and Personal Effect Community, Parking Skills Clique, Parking Management Clique, Large Scale Technology Discussion Community). The results highlight the impact of multiple industries and how they reshape the car-parking ecosystem, with a special role for the ICT and technology industry, which has been found to have the most significant impact on communities, engagement, positive sentiment and bot generation. In addition, the study identified useful factors that influence drivers' parking search decisions and preferences. These are categorized into social, economic and technological factors, which are reflected at different levels in the identified communities. The insights and discussions revealed in this work can help municipalities and parking providers to advance evidence-based policy-making when planning future parking infrastructures.