1 Introduction

Production outsourcing, which aims to provide the best combination of resources, has been a critical issue in contemporary manufacturing supply chain management [1]. Large companies typically outsource non-strategic activities to suppliers and place more emphasis on core competencies [2]. For example, in the semiconductor sector, some firms do not perform any manufacturing at all, others undertake assembly and testing but outsource wafer fabrication, and yet others solely conduct testing [3, 4]. When outsourcing choices are implemented, various contractual partnerships between different firms are established in order to produce and distribute a product [3]. Both focal firms and their partners need to make thoughtful and strategic decisions about which connections to create and maintain in line with their own best interests.

As modern enterprises become globalized and digital, collaborative manufacturing partnerships evolve into increasingly complex networks [5]. This leaves manufacturers vulnerable to the negative effects that network complexity may have on their operations. For instance, Boeing struggled with the adverse effects of outsourcing and an increasingly complex supply chain network during the development and initial service phases of the 787 Dreamliner [6]. Network complexity may affect various aspects of the supply chain, such as the availability of raw materials, disruption risk, network resilience, operational performance, innovation, and more [7].

To tackle these challenges, structuring supply chains from a complex network perspective to help firms configure outsourcing partnerships has become a key task [6, 8]. Existing research has shown that real-world complex supply chain networks are neither completely chaotic nor completely ordered, but rather exhibit scale-free, modular community, and hierarchical characteristics that can coexist in one network [7, 9, 10]. The scale-free topology reflects a pronounced power-law distribution of node connectivity in the network, implying that the focal firms obtain a large degree of connectivity due to their density of partnerships [11, 12]. Modular community refers to the high local link density of a network that can be partitioned into modules, with high connectivity among nodes within a module but few connections between modules [7, 13]. Hierarchical topology arises when scale-free and community properties coexist, producing a scenario in which firms with high link density are associated with firms from other communities that have little chance of interlinking [7, 13]. From the perspective of the resource flow of production, the linkages between enterprise nodes from one community to other communities characterize the supply chain structure. Whether the flow of resources is smooth directly affects the development of individual enterprise entities and of the entire network [14].

However, the overlapping feature of enterprise communities is often ignored in current network structuring analysis. For example, the scale-free and community analysis of collaborative networks is often conducted around the focal firms and non-overlapping communities [15, 16]. The overlapping communities that exist in the network create more complex and interdependent relationships between focal firms and suppliers, as well as among suppliers themselves [17]. This requires further understanding of the role of overlapping communities in structuring the network. In addition, overlapping communities blur the hierarchical network structure, which undermines the notion of straightforward linear relationships in the supply chain and makes the topology of the network more difficult to comprehend.

Considering this network complexity, this paper aims to construct clear supply chain relationships in networks with highly overlapping community characteristics, which concerns the hierarchical upstream and downstream relationships of suppliers, assemblers, distributors, and manufacturers. Because the supply chain network contains multiple community subsystems and multiple layers of connectivity, hierarchical clustering is an effective technical approach to improve our understanding of the complex system by taking into account its “multi-layered” and community characteristics [18, 19]. Thus, this paper proposes a module gravitation-based hierarchical clustering method to facilitate the structuring of the multiple layers of the supply chain network. For community discovery and the identification of overlapping modules, this paper adopts a node-based gravitational clustering method, because it can effectively cluster datasets with arbitrary shapes [20].

The results of the study have important implications for manufacturing firms making supplier selection decisions regarding outsourcing operations. They provide managerial insights for exploiting network efficiency by considering the significant role of distributors in the supply chain network structure. The rest of this paper is organized as follows: Sect. 2 reviews the related works. Section 3 describes the proposed algorithm in detail. Section 4 compares the performance of the proposed approach with four benchmark algorithms on both synthetic and real-world networks. Management implications are also inferred from the results of real-world network experiments. Section 5 concludes the paper.

2 Related Works

2.1 Complex Supply Chain Network Structuring

As the complexity of supply chain networks continues to increase, the massive interdependencies between firms create nonlinear relationships and complicated behaviors that are difficult to manage and control with linear models [9]. Using complex network techniques to analyze and structure the network has therefore become a rational choice. The first category of current research validates supply chain performance by analyzing the topological characteristics of the network, including average path length, degree distribution, network diameter, clustering coefficients, and community structure. Chakraborty et al. [13] investigated the topological characteristics of supply chain networks from three different perspectives (node level, meso level, and flow dynamics) and obtained a coarse-grained description of communities, thereby verifying the structural attributes of an efficient supply chain system. Using the design structure matrix, Wang et al. [11] proposed a method to quantitatively measure the complexity characteristics and evolution laws of supply chain networks, such as small-world and scale-free properties, which provided a basis for structuring the risk prevention network. An analysis of the automotive manufacturing industry showed that the automotive supply chain network was typically a small-world and scale-free network [12]. These characteristics were also observed in the smartphone manufacturing industry [10] and the aviation industry [2]. In addition, it was shown that the corporate collaborative network followed a power-law degree distribution, with high-performing firms exhibiting significant power to control the network in terms of horizontal integration [21].

The second category focuses on constructing and optimizing the network structure, given that the layers of the supply chain network are known and the echelon positions of the nodes are fixed, but the connection edges between nodes are adjustable. Sun et al. [22] investigated a dumbbell-shaped fixed multilayer supply chain network structure based on node degree preference and link preference, and proposed an evolutionary algorithm to enhance the robustness of the network. Long [23] studied a five-echelon network and proposed a data-driven four-dimensional flow model for the framework of network decision making. A centralized decision-making structure model with a linear matrix inequality technique was proposed to control chaotic behavior of the network, where the configuration order of the entities was determined according to the customer demand [24].

The third category is concentrated on dynamically extracting the supply chain network maps according to the data features or workflow requirements. Rahul [25] presented a data-driven visualization methodology with network examples on innovation, geographic location, and knowledge sharing to illustrate the applicability and value of visualization techniques in revealing important clusters, patterns, trends, and outliers in the networks. A deep neural network approach was proposed by Wichmann et al. [26]. The approach used a machine learning method to identify unknown buy–sell relationships based on a text corpus built from natural language processing. Extending the process to a broader dataset would yield the map of the supply chain network. By incorporating performance evaluation process algebra into Petri net, Ding et al. [27] proposed an approach based on the workflow of the supply chain system to identify the hierarchical structure of the network. Similarly, through integrating a Petri net and triangulation clustering algorithms, Blackhursta et al. [14] presented a method that could quickly visualize the supply chain network and analyze the propagation paths of disruptive events.

While most of the above approaches structure supply chain networks by analyzing their complexity characteristics, few studies propose systematic structuring algorithms. In particular, as discussed in Sect. 1, these approaches generally ignore the existence of overlapping communities and their impact on the network structuring process.

2.2 Hierarchical Community Detection

Hierarchical structure may exist in complex network graphs with multi-level clusters of nodes, also referred to as communities or modules. For example, the supply chain system [7], socialized interest network [28] and neural network [29] all exhibit hierarchical cohesive characteristics. In recent years, a great number of algorithms have been proposed for hierarchical community detection. They can be roughly categorized into two groups: divisive methods and agglomerative methods.

A divisive method is a top-down clustering approach that starts by regarding the whole network as one community containing all nodes. It recursively splits communities until a stopping criterion is reached, so that individual nodes are arranged into singleton clusters [30]. The GN algorithm, introduced by Girvan and Newman [31], is a well-known divisive method that performs hierarchical community discovery by iteratively eliminating the edges with the greatest edge betweenness until no edge remains. To find good hierarchical partitions of the network, Gómez et al. [32] proposed a method based on the node-game shortest path betweenness measure. Instead of calculating the precise edge betweenness of all the edges, Mahsa et al. [33] proposed a trust-based algorithm which conceptually expresses the trust penetration strength between nodes to extract the communities in the network. Gregory [34] improved the GN algorithm to detect overlapping communities and proposed the cluster-overlap Newman Girvan algorithm (CONGA), which revealed the hierarchical structure of overlapping communities by repeatedly removing edges with maximum edge betweenness or splitting vertices with maximum split betweenness until no edges remained. Following that, various enhancements to CONGA were proposed, including techniques based on multi-dimensional similarity [35] and notation of the common friends [36]. The main problem of the divisive methods is that computing edge betweenness may involve many repeated calculations of shortest paths, so the time complexity is high.

In an agglomerative hierarchical community detection method, each observation is initially considered as a single-element cluster (leaf). Through repeated iterations of cluster merging, all points become members of a single cluster (root), resulting in a hierarchical tree that can be displayed as a dendrogram [37]. Approaches based on modularity optimization have become a mainstream of research. The representative algorithms include the fast-unfolding [38] and Fast-Newman [39] algorithms. These methods use modularity as the metric to evaluate each node and reassign its community based on the gain until the modularity no longer increases. The network hierarchy is unfolded as a community of communities built in the optimization process. However, the optimization of modularity may sometimes be very complex because it is an NP-complete problem. Improved algorithms based on similarity modularization, simulated annealing and genetic methods are proposed in the work by Zhen et al. [40], Lee et al. [41] and Talavera et al. [42]. The second stream of agglomerative methods is represented by the label propagation algorithm (LPA) [43], which works by propagating labels throughout the network until closely connected nodes have a common label, so that the hierarchical community structure is formed in the label propagation process. Improvements of LPA, such as LPA-CBD [44] and LPAE [45], have been reported. For hierarchical clustering of overlapping communities based on label propagation, ACLPA [46] and SLPA [47] have been reported. The label propagation algorithms alleviate the resolution limitation to a certain extent, but they have poor stability due to the randomness of the spreading process. The third stream of agglomerative methods comprises the gravitation-based methods, where node gravitation force is taken into account to guide community merging and to control the randomness [48]. Modified methods such as fuzzy granular-based [49], local density-based [20] and label propagation-based [50] gravitational community detection techniques were proposed recently. However, these methods need prior knowledge about the data, such as the number of communities and the probability distribution over clusters, which is always difficult to obtain in advance. Incorrect or incomplete prior knowledge may result in uncertainty in the hierarchical clustering process, which affects the accuracy of the community detection.

In addition to the above two groups of algorithms, other methods based on random walk [51, 52], genetic method [53], nonlinear dynamical evolution [54], three-way decision model [55], scalable spectral clustering [56] and voting simulation [57] have been proposed for hierarchical community detection.

Few of the aforementioned algorithms have been tested for the supply chain network structuring problem. Most of these studies treat all the nodes in the complex network as equal entities, and can only restore the hierarchical community structures of homogeneous networks. However, the supply chain network in most cases is heterogeneously organized, where the focal firms play cohesive roles in industrial coordination, product orientation, innovation leading, etc., while other small and medium-sized firms behave as supporting suppliers and undertake outsourcing operation around the focal enterprises. In addition, firms may play many roles in multiple supply chains at the same time and, therefore, may belong to multiple communities, which leads to intertwined communities and ambiguous boundaries among them. However, many of the existing algorithms have low performance as the degree of overlapping between communities increases.

3 The Proposed Algorithm

In this section, we describe the proposed gravitation-based hierarchical community detection algorithm for structuring supply chain network (GHSCA) in detail. In GHSCA, a central node-based gravitational clustering strategy is first used to discover the community structure of the enterprise network, and then a strategy of gravitation-based pruning of overlapping nodes is used to identify the functional modules, and finally, a module gravitation-based hierarchical clustering strategy is employed to structure the supply chain network.

3.1 Central Node Gravitation-Based Enterprise Community Detection

As mentioned above, the focal firms are the initiators of production. They act as the core actors who drive the supply chain network and play a cohesive role in outsourcing operations, coordinating partnerships and product orientation. For this, we first propose a central node gravitation-based community identification strategy to identify the central nodes and the enterprise communities centered on them in the network.

The network model is represented as an undirected graph \(G = (V,E)\), where \(V\) is a collection of enterprise nodes and \(E \subseteq V \times V\) is a set of edges which refer to the various collaboration relationships between upstream and downstream nodes. Derived from the law of universal gravitation, the gravitation of a pair of enterprise nodes \(v_{i}\) and \(v_{j}\) is defined as follows:

$$ g(v_{i} ,v_{j} ) = \frac{{D(v_{i} )D(v_{j} )}}{{d^{2} (v_{i} ,v_{j} )}} = \frac{{D(v_{i} )D(v_{j} )}}{{(1 - Jaccard(v_{i} ,v_{j} ))^{2} }} $$
(1)

where \(D(v_{i} )\) and \(D(v_{j} )\) are the degrees of \(v_{i}\) and \(v_{j}\) quantified by the total number of edges incident to them. \(d(v_{i} ,v_{j} )\) denotes the distance between \(v_{i}\) and \(v_{j}\), which is measured by the Jaccard distance \(1 - Jaccard(v_{i} ,v_{j} )\) to define their dissimilarity. The Jaccard similarity \(Jaccard(v_{i} ,v_{j} )\) is calculated as the ratio of the number of unique neighbors shared by the two nodes to the total number of neighbors they have, written in notation form as

$$ Jaccard(v_{i} ,v_{j} ) = \frac{{\left| {N(v_{i} ) \cap N(v_{j} )} \right|}}{{\left| {N(v_{i} ) \cup N(v_{j} )} \right|}} $$

where

$$N(v_{i} ) = \{ v_{j} |(v_{i} ,v_{j} ) \in E\}$$

represents the set of neighboring nodes surrounding \(v_{i}\).

Thus, the gravitation force \(F(v_{i} )\) of \(v_{i}\) can be expressed by the following:

$$ F(v_{i} ) = \mathop \sum \limits_{{v_{j} \in N(v_{i} )}} g(v_{i} ,v_{j} ) = \mathop \sum \limits_{{v_{j} \in N(v_{i} )}} \frac{{D(v_{i} )D(v_{j} )}}{{(1 - Jaccard(v_{i} ,v_{j} ))^{2} }} $$
(2)

The centrality of a node in the network can be characterized by its gravitation force.
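As an illustration of Eqs. (1) and (2), the following Python sketch computes the Jaccard similarity, the pairwise gravitation, and the gravitation force on a small toy graph. The function names (`build_adjacency`, `jaccard`, `gravitation`, `force`), the toy edge list, and the choice to return infinity when two nodes have identical neighborhoods are assumptions made for illustration; they are not taken from the original algorithm description.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Return {node: set(neighbors)} for an undirected graph without self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # assumption: self-loops are ignored
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def jaccard(adj, u, v):
    """Jaccard similarity of the neighbor sets N(u) and N(v)."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def gravitation(adj, u, v):
    """Eq. (1): D(u) * D(v) / (1 - Jaccard(u, v))^2."""
    distance = 1.0 - jaccard(adj, u, v)
    if distance == 0.0:
        # Assumption: identical neighborhoods (only possible for non-adjacent
        # nodes) are treated as infinitely attractive.
        return float("inf")
    return len(adj[u]) * len(adj[v]) / distance ** 2


def force(adj, u):
    """Eq. (2): sum of the gravitation between u and each of its neighbors."""
    return sum(gravitation(adj, u, w) for w in adj[u])


# Toy network: a triangle (a, b, c) with a pendant node d attached to c.
toy_adj = build_adjacency([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
toy_forces = {node: force(toy_adj, node) for node in toy_adj}
```

In this toy network node `c` receives the largest force, which matches the intuition that it is the best-connected node of the group.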

We identify the central node of a community as the node with a particularly large gravitation force. Algorithm 1 shows the procedure for selecting the central nodes in the network, which works as follows. First, the gravitation force \(F(v_{i} )\) of each enterprise node \(v_{i}\) in the network is calculated. Second, \(F(v_{i} )\) is compared with the gravitation force \(F(v_{j} )\) of each of \(v_{i}\)’s neighboring enterprise nodes \(v_{j}\). If the value of \(F(v_{i} )\) is greater than the \(F(v_{j} )\) values of all its neighbor nodes, node \(v_{i}\) is selected as a central enterprise node. Note that the central enterprise nodes output by Algorithm 1 are not necessarily the nodes with the largest degrees in the network. From the view of enterprise community distribution, they occupy the most central positions of the enterprise groups in the network topology.

figure a
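The selection rule described for Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, assuming the network is given as a dictionary of neighbor sets without self-loops; the names `node_forces` and `central_nodes` and the example graph are hypothetical, and ties are resolved by the strict "greater than" comparison stated in the text, so two neighbors with equal force are both rejected.

```python
def node_forces(adj):
    """Gravitation force F(v) of Eq. (2) for every node.

    adj maps each node to the set of its neighbors (undirected, no self-loops).
    """
    def g(u, v):
        # Eq. (1); for adjacent nodes without self-loops the Jaccard
        # distance is strictly positive, so the division is safe.
        similarity = len(adj[u] & adj[v]) / len(adj[u] | adj[v])
        return len(adj[u]) * len(adj[v]) / (1.0 - similarity) ** 2

    return {u: sum(g(u, v) for v in adj[u]) for u in adj}


def central_nodes(adj):
    """Nodes whose force is strictly greater than that of all their neighbors."""
    forces = node_forces(adj)
    return [
        u for u in adj
        if adj[u] and all(forces[u] > forces[v] for v in adj[u])
    ]


# Hypothetical example: a triangle around c and a star around e, joined via d.
example_adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"}, "e": {"d", "f", "g", "h"},
    "f": {"e"}, "g": {"e"}, "h": {"e"},
}
centers = central_nodes(example_adj)
```

Here `c` and `e` are selected, while the bridge node `d` is not, because both of its neighbors exert a larger force.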

Once the central nodes are located, GHSCA adopts a central node expansion-based clustering strategy to identify the enterprise communities in the network. An enterprise community \(C_{k} (k = 1,2,...,n)\) can be considered as an organization with a central node enterprise \(v_{kc}\) surrounded by a number of supporting enterprises \(v_{ks} (s = 1,2,...,m)\). GHSCA takes a central enterprise node as the initial community and traverses all the supporting nodes in the network in descending order of their gravitation force. For a supporting enterprise node \(v\) that has not been assigned to a community in the network, GHSCA checks each central node \(v_{kc}\) and determines whether \(v\) should belong to the same community. \(v\) is added to the community \(C_{k}\) of central node \(v_{kc}\) if \(v\) satisfies one of the following conditions:

  • (a) \(v\) is a direct neighbor node connected to the central node \(v_{kc}\), and \(F(v_{kc} ) - F(v) \ge 0\).

  • (b) \(v\) is an indirect neighbor of \(v_{kc}\), and the attraction rate of the central node \(v_{kc}\) to \(v\), \(ia(v_{kc} ,v)\), is greater than or equal to their Jaccard similarity coefficient \(Jaccard(v_{kc} ,v)\).

In (b), \(ia(v_{kc} ,v)\) is calculated as the ratio of the gravitation accumulated along the shortest path from \(v\) to \(v_{kc}\) to the gravitation force of \(v_{kc}\). It is defined by the following expression:

$$ ia(v_{kc} ,v) = \frac{{g(v_{kc} ,v_{i} ) + g(v_{i} ,v_{i + 1} ) + ... + g(v_{j} ,v)}}{{F(v_{kc} )}} $$
(3)

The Jaccard coefficient, which measures the similarity between \(v_{kc}\) and \(v\), can be considered as a threshold to determine whether \(v\) should be classified into the same community as the central node \(v_{kc}\).

When \(ia(v_{kc} ,v) \ge 1\), the central enterprise node \(v_{kc}\) can strongly and indirectly aggregate the supporting enterprise node \(v\), and \(v\) is attributed only to the community \(C_{k}\). When \(Jaccard(v_{kc} ,v) \le ia(v_{kc} ,v) < 1\), \(v_{kc}\) can indirectly aggregate \(v\), while \(v\) may belong to other communities as well. When \(ia(v_{kc} ,v) < Jaccard(v_{kc} ,v)\), there is no significant community affiliation between node \(v_{kc}\) and \(v\). Algorithm 2 gives the procedure for the central node gravitation force-based community detection.

figure b
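To make condition (b) and Eq. (3) concrete, the sketch below computes the attraction rate along one shortest path and applies the three-way rule on the Jaccard threshold. It is an illustration under stated assumptions: the shortest path is found by breadth-first search and, when several shortest paths exist, the one discovered first is used (the text does not specify a tie-breaking rule); the names `shortest_path`, `attraction_rate`, `affiliation` and the labels `"exclusive"`, `"shared"`, `"none"` are hypothetical.

```python
from collections import deque


def gravitation(adj, u, v):
    """Eq. (1) for two adjacent nodes of a graph without self-loops."""
    similarity = len(adj[u] & adj[v]) / len(adj[u] | adj[v])
    return len(adj[u]) * len(adj[v]) / (1.0 - similarity) ** 2


def shortest_path(adj, source, target):
    """One shortest path found by breadth-first search, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj[node]):  # sorted only for deterministic tie-breaking
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def attraction_rate(adj, center, v):
    """Eq. (3): gravitation summed along the path, divided by F(center)."""
    path = shortest_path(adj, center, v)
    if path is None:
        return 0.0
    along_path = sum(gravitation(adj, a, b) for a, b in zip(path, path[1:]))
    center_force = sum(gravitation(adj, center, w) for w in adj[center])
    return along_path / center_force


def affiliation(adj, center, v):
    """Classify an indirect neighbor v relative to a central node."""
    ia = attraction_rate(adj, center, v)
    similarity = len(adj[center] & adj[v]) / len(adj[center] | adj[v])
    if ia >= 1.0:
        return "exclusive"  # v belongs only to the community of center
    if ia >= similarity:
        return "shared"     # v joins the community but may join others too
    return "none"           # no significant affiliation
```

For example, with `adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c", "e"}, "e": {"d", "f", "g", "h"}, "f": {"e"}, "g": {"e"}, "h": {"e"}}`, calling `affiliation(adj, "c", "f")` classifies the leaf `f` relative to the central node `c` three hops away.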

3.2 Functional Modules Identification Based on Gravitational Pruning of Overlapping Nodes

The community structure based on the above definitions may lead to overlaps between communities, i.e., the same node may be assigned to two or more different communities. This is in line with the chaotic situation in a real supply chain network, where some outsourcing contractors may participate in multiple supply chain activities. It implies that different focal firms may have a large number of common supporting firms, leading to ambiguous boundaries among communities. From a supply chain perspective, the division of tasks upstream and downstream of the supply chain, as well as the synergy and close association of partners, result in considerable overlaps of community members. Therefore, when there are a large number of common neighbors shared by different communities in the network, relying only on central node expansion to identify the community structure becomes problematic. To this end, GHSCA employs a gravitational tension pruning-based strategy to divide the intertwined communities into functional modules.

The gravitational tension of the edge between two nodes \(v_{i}\) and \(v_{j}\) is calculated as the total of the gravitational forces exerted on them by their neighborhood nodes subtracting the gravitational force exerted between them. It is defined as

$$ T(e(v_{i} ,v_{j} )) = \sum\limits_{{v_{ik} \in N(v_{i} )\backslash \{ v_{j} \} }} {g(v_{i} ,v_{ik} )} + \sum\limits_{{v_{jk} \in N(v_{j} )\backslash \{ v_{i} \} }} {g(v_{j} ,v_{jk} )} - g(v_{i} ,v_{j} ) $$
(4)

where \(e(v_{i} ,v_{j} )\) denotes the edge between two directly connected nodes \(v_{i}\) and \(v_{j}\), \(v_{ik}\) is a neighbor of \(v_{i}\) other than \(v_{j}\), and \(v_{jk}\) is a neighbor of \(v_{j}\) other than \(v_{i}\). Taking Fig. 1 as an example, the tension of the edge connecting nodes \(v_{1}\) and \(v_{2}\) is the total gravitation of \(v_{11}\), \(v_{12}\) and \(v_{13}\) on \(v_{1}\) plus the total gravitation of \(v_{21}\), \(v_{22}\) and \(v_{23}\) on \(v_{2}\), minus the gravitation between \(v_{1}\) and \(v_{2}\). The tension of edges reflects the interdependence between enterprise nodes involved in the activities of different communities. When the tension between two nodes is intense, the gravitational force exerted on them by their neighboring nodes is greater than the gravitational force between them, and they may be separated. A bearing coefficient \(t\) is used to define the maximum tension that an edge can hold. If \(T(e(v_{i} ,v_{j} )) \ge t\), then edge \(e(v_{i} ,v_{j} )\) will be removed in the process of segmenting communities.
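The tension of Eq. (4) and the pruning rule can be sketched in Python on a network shaped like the example of Fig. 1, where \(v_{1}\) and \(v_{2}\) are connected and each has three further neighbors. The adjacency dictionary, the function names `tension` and `edges_to_prune`, and the value chosen for the bearing coefficient are illustrative assumptions rather than values from the paper.

```python
def gravitation(adj, u, v):
    """Eq. (1) for two adjacent nodes of a graph without self-loops."""
    similarity = len(adj[u] & adj[v]) / len(adj[u] | adj[v])
    return len(adj[u]) * len(adj[v]) / (1.0 - similarity) ** 2


def tension(adj, u, v):
    """Eq. (4): pull of the other neighbors on u and v minus g(u, v)."""
    pull_on_u = sum(gravitation(adj, u, w) for w in adj[u] if w != v)
    pull_on_v = sum(gravitation(adj, v, w) for w in adj[v] if w != u)
    return pull_on_u + pull_on_v - gravitation(adj, u, v)


def edges_to_prune(adj, candidate_edges, t):
    """Edges whose tension reaches the bearing coefficient t."""
    return [(u, v) for u, v in candidate_edges if tension(adj, u, v) >= t]


# Network laid out like the Fig. 1 example (node labels are illustrative).
fig1_adj = {
    "v1": {"v2", "v11", "v12", "v13"},
    "v2": {"v1", "v21", "v22", "v23"},
    "v11": {"v1"}, "v12": {"v1"}, "v13": {"v1"},
    "v21": {"v2"}, "v22": {"v2"}, "v23": {"v2"},
}
fig1_tension = tension(fig1_adj, "v1", "v2")
```

With this layout each leaf pulls with gravitation 4 and the two hubs attract each other with gravitation 16, so the edge between them is removed only if the bearing coefficient is set at or below the resulting tension.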

Fig. 1 Gravitational tension of edge

We use the concepts of focal module and hub module to describe the different parts of a community. A focal module in a supply chain network acts as an interdependent, semi-autonomous community of firms with highly specific core competencies (e.g., product assembly or material replenishment) that can deploy the tasks needed by other modules. A hub module serves as a centralized setup that connects multiple focal modules or other hubs and plays a pivotal role in the supply chain system. Based on the division of a community into non-overlapping and overlapping regions, the nodes are classified into either a focal module or a hub module. In a focal module, the central node has larger gravitation and more links than the other nodes within the module. The overlapping nodes in the hub module also have larger degrees than the general nodes because they are intensively connected with other nodes in separate focal modules.

After the previous community discovery algorithm, the supply chain network becomes a set of communities \(\Omega = \{ C_{1} ,C_{2} ,...,C_{k} \}\). To identify the functional modules, GHSCA considers the gravitational tension of the edges connecting the overlapping nodes and the non-overlapping nodes to determine whether the overlapping nodes should stay in a focal module or be pruned into a hub module. By continuously examining the gravitational tension of the edges of overlapping nodes, the community composition \(\Omega = \{ C_{1} ,C_{2} ,...,C_{k} \}\) is transformed into a collection of focal modules and hub modules \(FM = \{ M_{1} ,M_{2} ,...,M_{k} ,H_{1} ,H_{2} ,...,H_{h} \}\) that should have intra-module homogeneity and inter-module heterogeneity. Figure 2 shows an illustration. In Fig. 2(a), the enterprise communities \(C_{1}\) and \(C_{2}\) overlap in two nodes. If the bearing coefficient \(t\) is set large enough, as shown in Fig. 2(b1), the overlapping nodes may both be assigned to community \(C_{2}\), generating two independent focal modules, \(M_{1}\) and \(M_{2}\). In Fig. 2(b2), if we decrease the value of \(t\), then the two overlapping nodes can be extracted into an independent hub module \(H\). The region consisting of \(C_{1}\) and \(C_{2}\) can then be partitioned into two focal modules \(M_{1}\) and \(M_{2}\), and a hub module \(H\) which links \(M_{1}\) and \(M_{2}\). Algorithm 3 gives the procedure for the functional module identification operation.

Fig. 2 Example of overlapping modules detection. a Overlapping communities; b1 focal module detection result with a higher bearing coefficient value; b2 hub module detection result with a lower bearing coefficient value

figure c

3.3 Supply Chain Network Structuring Based on Modular Gravitational Hierarchical Clustering

The identification of functional modules enables the community boundaries in the supply chain network to become explicit. Next, GHSCA employs hierarchical clustering based on the gravitation of modules to form hinged layers of modules, thus structuring the supply chain network layout. The module gravitation is defined as

$$ Gra\left( {M_{i} ,M_{j} } \right) = IC(M_{i} ,M_{j} ) \cdot \frac{{mec(M_{i} ) \cdot mec(M_{j} )}}{{d^{2} (M_{i} ,M_{j} )}} $$
(5)

where \(IC(M_{i} ,M_{j} )\) is the interaction coefficient of modules \(M_{i}\) and \(M_{j}\), \(mec(M_{i} )\) denotes the max edge-connectivity of module \(M_{i}\), and \(d(M_{i} ,M_{j} )\) is the distance between both modules. They are formulated, respectively, as the following:

$$ IC(M_{i} ,M_{j} ) = 1 + \frac{{\left| {ce(M_{i} ,M_{j} )} \right|}}{{\left| {E(M_{i} ,M_{j} )} \right|}} $$
(6)

where \(\left| {ce(M_{i} ,M_{j} )} \right|\) denotes the number of edges connecting \(M_{i}\) and \(M_{j}\), and \(\left| {E(M_{i} ,M_{j} )} \right|\) represents the summed number of edges of \(M_{i}\) and \(M_{j}\). The interaction coefficient measures the structural convergence of the two modules. The higher the coefficient value, the more edges the two modules share and the greater the degree of interaction between them.

$$ mec(M_{i} ) = \left| {E(M_{i} )} \right| - (m - 1) $$
(7)

where \(\left| {E(M_{i} )} \right|\) is the number of edges in \(M_{i}\), and \(m\) is the total number of nodes in \(M_{i}\). The max edge-connectivity reveals the structural stability of \(M_{i}\).

$$ d(M_{i} ,M_{j} ) = 1 - Jaccard(M_{i} ,M_{j} ) $$
(8)

where the distance between modules is measured by the Jaccard distance, and the Jaccard similarity is calculated as \(Jaccard(M_{i} ,M_{j} ) = \frac{{\left| {M_{i} \cap M_{j} } \right|}}{{\left| {M_{i} \cup M_{j} } \right|}}\).
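Equations (5) to (8) can be combined into a single function, sketched below in Python. The sketch assumes that modules are given as sets of node labels, that the edge list contains each undirected edge once, that \(\left| {E(M_{i} ,M_{j} )} \right|\) is the sum of the two internal edge counts, and that an edge counts as connecting two modules only when it is not internal to either of them. These interpretations, the function names, and the handling of the degenerate cases (no internal edges, identical modules) are assumptions made for illustration.

```python
def internal_edges(edges, module):
    """|E(M)|: number of edges with both endpoints inside the module."""
    return sum(1 for u, v in edges if u in module and v in module)


def crossing_edges(edges, m_i, m_j):
    """|ce(M_i, M_j)|: edges joining the two modules, internal edges excluded."""
    count = 0
    for u, v in edges:
        joins = (u in m_i and v in m_j) or (u in m_j and v in m_i)
        inside = (u in m_i and v in m_i) or (u in m_j and v in m_j)
        if joins and not inside:
            count += 1
    return count


def module_gravitation(edges, m_i, m_j):
    """Eq. (5): IC * mec(M_i) * mec(M_j) / d^2, with d the Jaccard distance."""
    e_i = internal_edges(edges, m_i)
    e_j = internal_edges(edges, m_j)
    total = e_i + e_j
    # Eq. (6); assumption: IC = 1 when neither module has internal edges.
    ic = 1.0 + (crossing_edges(edges, m_i, m_j) / total if total else 0.0)
    # Eq. (7): edges beyond those of a spanning tree of the module.
    mec_i = e_i - (len(m_i) - 1)
    mec_j = e_j - (len(m_j) - 1)
    # Eq. (8): Jaccard distance between the node sets.
    distance = 1.0 - len(m_i & m_j) / len(m_i | m_j)
    if distance == 0.0:
        return float("inf")  # assumption: identical modules
    return ic * mec_i * mec_j / distance ** 2


# Hypothetical example: a 4-clique and a triangle joined by two edges.
example_edges = [
    ("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
    ("e", "f"), ("f", "g"), ("e", "g"),
    ("d", "e"), ("c", "e"),
]
example_gra = module_gravitation(example_edges, {"a", "b", "c", "d"}, {"e", "f", "g"})
```

Note that under Eq. (7) a tree-shaped module has a max edge-connectivity of zero, so its gravitation to every other module is zero in this sketch.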

From the above definitions, it can be seen that module gravitation not only reflects the tightness of the modules connected in the network, but also takes into account their similarity. Consequently, GHSCA avoids the adjustment of controlling parameters when measuring the structural characteristics of the network. The algorithm consists of the following two steps.

The first step is to build the dendrogram of modules by hierarchical clustering. The dendrogram is a diagrammatic representation of the hierarchical relationship between the modules, and its purpose is to find the most suitable way to allocate the modules to clusters. To begin with, the module gravitation \(Gra\left( {M_{i} ,M_{j} } \right)\) between every two focal modules in \(\{ M_{1} ,M_{2} ,...,M_{k} \}\) is measured. Then, the agglomerative clustering scheme is implemented in a “bottom-up” manner. Starting from the pair of modules with the highest gravitation, the two modules are merged together with their hub modules (if any) into a compound which will participate in the subsequent clustering process. The consolidation process is repeated until all the modules in the network are merged into one single cluster. A dendrogram \(T = Cla_{q} \{ Cla_{w} \{ ...,M_{i} ,...,H_{q} \} ,Cla_{e} \{ M_{j} ,...,H_{y} \} ,...,H_{z} \}\) is generated according to the order in which the modules are merged. It consists of stacked branches called clades, which are further split into smaller branches. Each clade has a left and a right sub-branch, and each sub-branch corresponds to a single module or to a clade of a clustered compound. The hub modules connecting the sub-branches are also recorded in the clade. At the bottom level of \(T\) are individual modules, which are grouped into clusters based on module gravitation. Higher levels of the dendrogram contain fewer and fewer clusters.
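The bottom-up merging loop of this first step can be sketched in Python as follows. The sketch is deliberately simplified and rests on several assumptions: hub modules are omitted, a merged compound is represented by the union of its node sets, the gravitation between compounds is supplied as a callable on two node sets, and ties are resolved by taking the first best pair. The stand-in gravitation used in the example simply counts connecting edges and is not Eq. (5); all names are hypothetical.

```python
from itertools import combinations


def build_dendrogram(modules, gravitation):
    """Agglomerative clustering of modules by highest gravitation first.

    modules: {name: iterable of node labels}
    gravitation: callable taking two frozensets of nodes and returning a number
    Returns a nested tuple such as ("C", ("A", "B")).
    """
    if not modules:
        raise ValueError("at least one module is required")
    # Each active cluster is a pair (tree, nodes).
    active = [(name, frozenset(nodes)) for name, nodes in modules.items()]
    while len(active) > 1:
        i, j = max(
            combinations(range(len(active)), 2),
            key=lambda pair: gravitation(active[pair[0]][1], active[pair[1]][1]),
        )
        (tree_i, nodes_i), (tree_j, nodes_j) = active[i], active[j]
        merged = ((tree_i, tree_j), nodes_i | nodes_j)
        active = [c for k, c in enumerate(active) if k not in (i, j)] + [merged]
    return active[0][0]


def make_crossing_gravitation(edges):
    """Stand-in gravitation: number of edges between two node sets."""
    def crossing(a, b):
        return sum(
            1 for u, v in edges
            if (u in a and v in b) or (u in b and v in a)
        )
    return crossing


example_modules = {"A": {1, 2}, "B": {3, 4}, "C": {5, 6}}
example_edges = [(2, 3), (2, 4), (1, 3), (4, 5)]
example_tree = build_dendrogram(
    example_modules, make_crossing_gravitation(example_edges)
)
```

In the example, modules `A` and `B` share the most connecting edges and are merged first; the resulting compound is then merged with `C`.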

The second step is to construct an n-tier supply chain network structure \(SC\) based on the dendrogram \(T\) generated in the first step. GHSCA examines each clade of \(T\) from the bottom up. For a particular intermediate clade \(Cla_{q}\) on \(T\), if \(Cla_{q}\) is assembled from two modules \(M_{i}\) and \(M_{j}\), a three-tier supply chain relationship \(M_{i} \to H_{ij} \to M_{j}\) is formed when there is a hub module \(H_{ij}\) on \(Cla_{q}\); otherwise, a two-tier supply chain relationship \(M_{i} \to M_{j}\) is formed. If \(Cla_{q}\) is assembled from a clade \(Cla_{q} ^{\prime}(M_{i} ^{\prime} \to H_{ij} ^{\prime} \to M_{j} ^{\prime})\) and a module \(M_{m}\), the module \(M_{j} ^{\prime}\) on \(Cla_{q} ^{\prime}\) with the higher gravitation to \(M_{m}\) becomes the upstream of \(M_{m}\), and a four-tier supply chain relationship \(M_{i} ^{\prime} \to H_{ij} ^{\prime} \to M_{j} ^{\prime} \to M_{m}\) is established. If \(Cla_{q}\) is assembled from two clades \(Cla_{q} ^{\prime}(M_{i} ^{\prime} \to H_{ij} ^{\prime} \to M_{j} ^{\prime})\) and \(Cla_{q} ^{\prime\prime}(M_{i} ^{\prime\prime} \to H_{ij} ^{\prime\prime} \to M_{j} ^{\prime\prime})\), the supply relationships formed by its left and right sub-branches are integrated as \(\left( \begin{gathered} M_{i} ^{\prime} \to H_{ij} ^{\prime} \to M_{j} ^{\prime} \hfill \\ M_{i} ^{\prime\prime} \to H_{ij} ^{\prime\prime} \to M_{j} ^{\prime\prime} \hfill \\ \end{gathered} \right) \to Cla_{q}\). The integration continues until the topmost clade is reached, where the module \(M_{f}\) or clade \(Cla_{f}\) on the very top clade of \(T\) represents the most downstream position in the supply chain.

Figure 3 illustrates the transformation from a dendrogram of functional modules to the supply chain structure. In Fig. 3(a), there are four clades \(Cla_{1} (A,H_{AB} ,B)\), \(Cla_{2} (C,H_{CD} ,D)\), \(Cla_{3} (Cla_{1} ,Cla_{2} )\) and \(Cla_{4} (Cla_{3} ,E)\). First, the branches \(Cla_{1}\) and \(Cla_{2}\) are examined, and two three-tier supply chain relationships \(sc_{1} = A \to H_{AB} \to B\) and \(sc_{2} = C \to H_{CD} \to D\) are established. The penultimate branch \(Cla_{3}\) is then examined, and the supply chains are integrated as \(\left( \begin{gathered} A \to H_{AB} \to B \hfill \\ C \to H_{CD} \to D \hfill \\ \end{gathered} \right)\), since \(Cla_{3}\) is constructed from the two clades \(Cla_{1}\) and \(Cla_{2}\). Following that, for \(Cla_{4}\), which consists of a sub-branch and a module \(E\), the module with the greatest gravitational pull on \(E\) is chosen as the upstream supplier of \(E\), and a four-tier supply chain structure \(SC = \left( \begin{gathered} A \to H_{AB} \to B \hfill \\ C \to H_{CD} \to D \hfill \\ \end{gathered} \right) \to E\) is built. At this point, GHSCA has reached the topmost clade of the dendrogram, so the four-tier supply chain structure shown in Fig. 3(b) is the final result. Algorithm 4 gives the procedure for structuring the supply chain network using module gravitation-based hierarchical clustering.
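The clade-by-clade traversal can be sketched on the Fig. 3 example as follows. This is a simplified illustration, not Algorithm 4 itself: the dictionary layout with the keys `left`, `right` and `hub` is hypothetical, the upstream/downstream order inside a clade is taken from the left-to-right order instead of being decided by module gravitation, and a module merged with a clade is simply appended downstream of every chain of that clade.

```python
def structure(clade):
    """Return the parallel supply chains (upstream to downstream) of a clade."""
    if isinstance(clade, str):  # a single module
        return [[clade]]
    left, right = clade["left"], clade["right"]
    if isinstance(left, str) and isinstance(right, str):
        # Two modules: M_i -> H_ij -> M_j if a hub exists, otherwise M_i -> M_j.
        hub = clade.get("hub")
        return [[left] + ([hub] if hub else []) + [right]]
    if isinstance(right, str):
        # A clade merged with a module: the module is placed downstream.
        return [chain + [right] for chain in structure(left)]
    if isinstance(left, str):
        return [chain + [left] for chain in structure(right)]
    # Two clades: their supply chains are kept side by side.
    return structure(left) + structure(right)


# Dendrogram of Fig. 3(a).
cla1 = {"left": "A", "right": "B", "hub": "H_AB"}
cla2 = {"left": "C", "right": "D", "hub": "H_CD"}
cla3 = {"left": cla1, "right": cla2}
cla4 = {"left": cla3, "right": "E"}

chains = structure(cla4)
for chain in chains:
    print(" -> ".join(chain))
```

Each resulting chain has four tiers and ends at \(E\), which agrees with the structure described for Fig. 3(b).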

Fig. 3 Example of supply chain structuring. (a) Dendrogram of modules; (b) supply chain structure constructed


3.4 The Time Complexity of GHSCA

For an enterprise network \(G = (V,E)\), we assume that there are \(N\) nodes in the network, \(K\) of which are central nodes, and that the average node degree is \(D\). Algorithm 1 needs to obtain the gravitational force of each node, which takes \(O(N)\) time. The iterative process of selecting the central nodes needs to determine whether a node has a greater gravitational force than all of its neighbors, so its time complexity is \(O(N*D)\). In the subsequent node expansion process, the expansion range for a central node is the entire network, so the time complexity of Algorithm 2 is \(O(K*(N - 1))\), which can be approximated as \(O(N*K)\). Suppose the average number of overlapping edges between the communities of the \(K\) central nodes is \(R\); then the time complexity of Algorithm 3 is \(O(R*K^{2} )\), since the functional module detection is performed between every two communities. Assume that \(M\) modules are detected by Algorithm 3; then the time complexity of the hierarchical clustering phase is \(O(M^{2} )\). In addition, the time complexity of supply chain detection is \(O(M - 1)\). Therefore, the time complexity of Algorithm 4 is \(O(M^{2} ) + O(M - 1) = O(M^{2} )\). The above analysis shows that the total complexity of the proposed GHSCA is \(O(N) + O(N*D) + O(N*K) + O(R*K^{2} ) + O(M^{2} )\). For real-world supply chain networks, \(K,M,R \ll N\). Therefore, the total time complexity of GHSCA is \(O(N*D)\).

4 Experimental Tests and Analysis

In this section, we validate the performance of the proposed GHSCA through experiments. The experiments are first conducted on synthetic networks, where the performance of GHSCA is compared with four benchmark hierarchical clustering algorithms that are often included in test reports in academia and industry: two divisive algorithms, GN [31] and CONGA [34], and two agglomerative algorithms, fast-unfolding [38] and LPA [43]. Then, we apply GHSCA to a real-world supply chain network and compare it with the same four benchmarks to show its applicability and effectiveness. Simulations are performed on computers with an Intel(R) Core(TM) 2.6 GHz processor, 16.00 GB of RAM, and the Windows 10 operating system. The algorithms are implemented in a Python 3.6 environment.

4.1 Evaluation Metrics

We use an internal measure, modularity \(Q\) [58], and an external measure, normalized mutual information (NMI) [59], to evaluate the performance of the different hierarchical clustering algorithms. Modularity \(Q\), proposed by Newman [58], is an unsupervised metric used when no ground-truth module division is available. It quantifies how well the network is clustered into modules relative to a null model, namely a random network that has the same degree distribution as the original network. Specifically, it takes into account the difference between the number of edges within the detected modules and the expected number of such edges in the null model. A clustering that maximizes the modularity function \(Q\) is considered to be the optimal solution. Modularity \(Q\) is calculated as follows:

$$ Q = \sum\limits_{i = 1}^{K} {\left( {\frac{{L_{i} }}{m} - \left( {\frac{{Z_{i} }}{2m}} \right)^{2} } \right)} $$
(9)

where \(K\) is the number of modules, \(m\) denotes the total number of edges in the network, \(L_{i}\) is the number of edges between the nodes within module \(i\), and \(Z_{i}\) is the sum of the degrees of the nodes in module \(i\). The value of \(Q\) lies in [-1, 1], with a positive value indicating that there are more edges within the detected modules than expected. A \(Q\) value between 0.3 and 0.7 indicates a significant module structure. The closer the value is to 1, the stronger the delineated module structure and the better the quality of the clustering results.
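As a worked illustration of modularity, the following sketch computes \(Q\) for a small hypothetical network. It assumes an undirected, unweighted network and a non-overlapping partition, and it uses the standard Newman form, in which \(L_{i}\) counts each intra-module edge once and the first term is therefore \(L_{i}/m\). The function name and the toy edge list are illustrative only.

```python
def modularity(edges, modules):
    """Newman modularity Q of a non-overlapping partition.

    edges   -- list of (u, v) pairs of an undirected, unweighted network
    modules -- list of node sets, one per module
    """
    m = len(edges)
    module_of = {node: idx for idx, nodes in enumerate(modules) for node in nodes}
    internal = [0] * len(modules)    # L_i: edges with both ends in module i
    degree_sum = [0] * len(modules)  # Z_i: sum of degrees of nodes in module i
    for u, v in edges:
        degree_sum[module_of[u]] += 1
        degree_sum[module_of[v]] += 1
        if module_of[u] == module_of[v]:
            internal[module_of[u]] += 1
    return sum(
        li / m - (zi / (2 * m)) ** 2 for li, zi in zip(internal, degree_sum)
    )


# Hypothetical network: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q_split = modularity(edges, [{0, 1, 2}, {3, 4, 5}])
q_single = modularity(edges, [{0, 1, 2, 3, 4, 5}])
```

Here \(m = 7\), \(L_{1} = L_{2} = 3\) and \(Z_{1} = Z_{2} = 7\), so splitting the network into the two triangles gives \(Q = 2(3/7 - 1/4) = 5/14 \approx 0.357\), whereas placing all nodes in one module gives \(Q = 0\).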

The external measure, NMI, is a supervised metric because it assesses the quality of the clustering on the basis of the ground truth clustering structure. It utilizes the confusion matrix for accuracy estimation. We explain NMI as follows.

Let \(W = (w_{1} ,w_{2} ,...,w_{K} )\) denote the module set obtained from one run of a clustering algorithm, \(T = (t_{1} ,t_{2} ,...,t_{J} )\) denote the ground truth module set, and \(N\) be the total number of nodes in the network. A confusion matrix \(C\) is constructed whose rows correspond to \(W\) and whose columns correspond to \(T\). The element \(C_{ij}\) is the number of nodes shared by the detected module \(i\) in \(W\) and the real module \(j\) in \(T\). The NMI value between \(W\) and \(T\) is calculated as follows:

$$ NMI(W,T) = \frac{{ - 2\sum\nolimits_{i = 1}^{K} {\sum\nolimits_{j = 1}^{J} {C_{ij} \log \left( {\frac{{C_{ij} N}}{{C_{i.} C_{.j} }}} \right)} } }}{{\sum\nolimits_{i = 1}^{K} {C_{i.} \log \left( {\frac{{C_{i.} }}{N}} \right)} + \sum\nolimits_{j = 1}^{J} {C_{.j} \log \left( {\frac{{C_{.j} }}{N}} \right)} }} $$
(10)

where \(C_{i.}\) is the sum over elements in row \(i\), and \(C_{.j}\) is the sum over elements in column \(j\) of the confusion matrix. The range of the value of NMI is 0 to 1. The larger the value of NMI is, the higher the similarity between \(W\) and \(T\). If \(NMI = 1\), it indicates that \(W\) and \(T\) are the same clustering results of the network. If \(NMI = 0\), it means that they are totally different.
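The NMI computation from the confusion matrix can be sketched as follows. The sketch assumes non-overlapping module sets that cover the same nodes, skips the terms with \(C_{ij} = 0\) (which contribute nothing), and returns 1 by convention when both module sets consist of a single module. The function name and the toy partitions are hypothetical.

```python
from math import log


def nmi(detected, truth):
    """NMI between two partitions, each given as a list of node sets."""
    n = sum(len(w) for w in detected)
    # Confusion matrix: rows are detected modules, columns are real modules.
    c = [[len(w & t) for t in truth] for w in detected]
    row = [sum(r) for r in c]
    col = [sum(r[j] for r in c) for j in range(len(truth))]
    numerator = -2.0 * sum(
        c[i][j] * log(c[i][j] * n / (row[i] * col[j]))
        for i in range(len(detected))
        for j in range(len(truth))
        if c[i][j] > 0  # terms with C_ij = 0 contribute nothing
    )
    denominator = sum(r * log(r / n) for r in row if r > 0) + sum(
        s * log(s / n) for s in col if s > 0
    )
    if denominator == 0:  # both partitions consist of a single module
        return 1.0
    return numerator / denominator


truth = [{0, 1}, {2, 3}]
same = nmi([{0, 1}, {2, 3}], truth)       # identical to the ground truth
unrelated = nmi([{0, 2}, {1, 3}], truth)  # carries no information about it
```

For the identical partition the value is 1, and for the second partition, whose modules each take one node from every real module, the value is 0.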

4.2 Experiments on Synthetic Networks

We use the Lancichinetti–Fortunato–Radicchi (LFR) benchmark [60] to generate artificial networks with known community structure configurations. The benchmark is widely used to test clustering algorithms because the generated networks have a broad degree distribution and overlapping communities. The parameters of LFR that control the complexity of the networks are as follows: \(n\), the number of nodes in the network; \(k\), the average node degree; \(maxk\), the maximum node degree; \(minc\), the minimum community size; \(maxc\), the maximum community size; \(\mu\), the mixing parameter; \(O_{n}\), the overlapping density parameter; and \(O_{m}\), the membership parameter of the overlapping nodes.

We generate four groups of LFR networks with different sizes. Their settings are shown in Table 1. We control the complexity of the networks by varying the parameter \(\mu\), which defines the fraction of inter-community edges incident to each node. We let \(\mu\) range from 0.1 to 0.8 to verify the performance of the algorithms. In addition, the overlapping feature of the networks is controlled by varying the parameter \(O_{n}\), which defines the number of overlapping nodes, and the parameter \(O_{m}\), which defines the number of modules to which each overlapping node belongs. Both \(O_{n}\) and \(O_{m}\) reflect the extent of interweaving between the communities in the networks. We vary \(O_{n}\) from 64 (128) to 512 (1024) in steps of 64 (128) for networks with 1024 (2048) nodes, and \(O_{m}\) from 2 to 8.
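The swept parameter values can be enumerated as in the following sketch. The function name `sweep` is hypothetical, and the step of 0.1 for \(\mu\) is an assumption, since only the range from 0.1 to 0.8 is stated; the values of \(O_{n}\) and \(O_{m}\) follow the ranges given above for the networks with 1024 and 2048 nodes.

```python
def sweep(n):
    """Parameter values varied for an LFR network with n nodes (1024 or 2048)."""
    step = n // 16  # 64 for n = 1024, 128 for n = 2048
    return {
        # 0.1, 0.2, ..., 0.8 (the step of 0.1 is assumed)
        "mu": [round(0.1 * i, 1) for i in range(1, 9)],
        # 64, 128, ..., 512 for n = 1024; 128, 256, ..., 1024 for n = 2048
        "O_n": list(range(step, n // 2 + 1, step)),
        # 2, 3, ..., 8
        "O_m": list(range(2, 9)),
    }


grid_1024 = sweep(1024)
grid_2048 = sweep(2048)
```

Each of \(\mu\) and \(O_{n}\) then takes eight values and \(O_{m}\) takes seven, and one parameter is varied at a time while the others follow the settings of Table 1.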

Table 1 Parameter setting of LFR networks

Before examining the performance of the proposed GHSCA, we first test the plausibility of the synthetic networks as models of supply chain networks by verifying whether the LFR networks exhibit scale-free, modular community, and hierarchical characteristics [7, 9, 10]. The test results with \(\mu\) set to 0.3 are shown in Table 2. First, to investigate whether the LFR networks have a scale-free topology, we use the least squares method and the Kolmogorov–Smirnov (K–S) test to check whether the degree distribution of the LFR networks follows a power law. From Table 2, we can see that the \(R\) squared values for all the LFR networks are greater than 0.65, and the \(p\) values are less than 0.1, indicating significant goodness of fit of the power law models. Second, we use the modularity maximization method [58] to verify the existence of community structure. As shown in Table 2, all the modularity \(Q\) values are positive, ranging between 0.512 and 0.816, indicating the presence of community structure. The positive values imply that nodes within a community are densely connected to each other but sparsely connected to nodes in other communities. Third, we adopt the coefficient \(\beta\) proposed by Ravasz et al. [61] as an indicator of hierarchical structure. It is defined through the scaling law \(C(k)\sim k^{ - \beta }\), where \(C(k)\) is the clustering coefficient of the nodes with \(k\) links, \(k\) is the node degree, and \(\beta\) is the hierarchical coefficient. As shown in Table 2, all values of the coefficient \(\beta\) are positive, suggesting that the LFR networks exhibit hierarchical structures.
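The least squares part of these checks amounts to fitting a straight line in log-log coordinates, as in the sketch below. The same fit applies to a degree distribution (the slope gives the power law exponent) and to \(C(k)\) (the negated slope gives \(\beta\)). The data are hypothetical and follow \(C(k) = k^{-0.5}\) exactly, and the K–S test is not covered by this sketch.

```python
from math import log


def loglog_fit(xs, ys):
    """Least squares fit of log(y) = a + b*log(x); returns (b, R squared)."""
    lx = [log(x) for x in xs]
    ly = [log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    syy = sum((y - mean_y) ** 2 for y in ly)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared


# Hypothetical clustering coefficients that follow C(k) = k^(-0.5) exactly.
degrees = [2, 4, 8, 16, 32]
clustering = [k ** -0.5 for k in degrees]
slope, r_squared = loglog_fit(degrees, clustering)
beta = -slope  # hierarchical coefficient in C(k) ~ k^(-beta)
```

On this noise-free input the fit recovers \(\beta = 0.5\) with an \(R\) squared of 1; on real degree or clustering data the \(R\) squared value indicates how closely the points follow a power law.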

Table 2 Topology analysis of LFR networks (\(\mu = 0.3\))

Next, we analyze the performance of the algorithms with respect to modularity \(Q\). Figure 4 plots the \(Q\) values of GHSCA and the four benchmark hierarchical clustering algorithms on the four groups of LFR networks as the mixing parameter \(\mu\) varies. From Fig. 4, it is observed that the proposed GHSCA has the best performance: it achieves the largest \(Q\) values on the LFR networks of all four sizes. The performance of all five algorithms deteriorates as \(\mu\) increases, especially for \(\mu > 0.4\), but GHSCA is more robust than the other algorithms. In addition, the \(Q\) values of all the algorithms decrease as the network size increases, but GHSCA has the flattest decline. GN and CONGA maintain low but stable performance. LPA performs relatively well only on the networks with \(\mu < 0.4\); when \(\mu > 0.4\), its performance decreases sharply. Fast-unfolding achieves good performance, and its modularity curves nearly coincide with those of the proposed GHSCA. This is because fast-unfolding is based on modularity optimization, which aims to make the modularity of the clustering results as large as possible during the iterative process. However, as mentioned previously, this greedy iterative approach can misclassify nodes even while achieving high modularity.

Fig. 4 Modularity \(Q\) of different algorithms with the variation of the mixing parameter \(\mu\)

Then, we analyze the experimental results with respect to the NMI metric. Figures 5, 6, and 7 show the NMI curves of GHSCA and the four compared algorithms on the LFR networks when varying the parameters \(\mu\), \(O_{n}\), and \(O_{m}\), respectively. Each NMI value reported on the curves is the average over 20 experiments on the same LFR network. The following results can be observed.

Fig. 5 NMI of different methods with the variation of the mixing parameter \(\mu\)

Fig. 6 NMI of different methods with the variation of overlapping density \(O_{n}\)

Fig. 7 NMI of different methods with the variation of the membership of overlapping nodes \(O_{m}\)

First, GHSCA outperforms the other four methods in terms of overall performance. From Fig. 5, we can observe that the NMI values of all five algorithms decline as \(\mu\) increases. However, the performance of GHSCA is more stable over the variation interval of \(\mu\) than that of the other algorithms. It can also be seen that, as the network size increases, the NMI values of the other algorithms decrease to different degrees, whereas GHSCA remains steadier. CONGA has the best performance at \(\mu = 0.1\), but its performance decreases dramatically as \(\mu\) increases. When \(\mu < 0.4\), the effectiveness of LPA approaches that of GHSCA. When \(\mu > 0.4\), the advantage of GHSCA over the four benchmark algorithms is larger than when \(\mu < 0.4\). This means that the proposed GHSCA can be adapted to networks of various sizes, especially large-scale networks.

Second, GHSCA is more applicable to networks with a blurred module structure. Figures 6 and 7 display the NMI values obtained by varying \(O_{n}\) and \(O_{m}\), respectively, in networks of sizes 1024 and 2048. As the values of \(O_{n}\) and \(O_{m}\) become larger, the degree of overlap between modules increases and the boundaries between modules become ambiguous. The figures show that the overall performance of GHSCA remains the best regardless of the variation of \(O_{n}\) and \(O_{m}\). The NMI curves of GHSCA are flatter than those of the other four algorithms, irrespective of the size of the network. When \(O_{n}\) and \(O_{m}\) are relatively small, the advantage of GHSCA over the other algorithms is not significant. However, when \(O_{n}\) increases to 320 (640) and \(O_{m}\) increases to 5, the gap between them widens rapidly. The NMI values of GN and CONGA decline most significantly because the divisive methods repeatedly calculate the shortest paths in the network, which increases the complexity of the algorithms and significantly reduces their accuracy. As for fast-unfolding, the greedy approach used in its clustering optimization can easily over-cluster the network when the overlap among modules becomes severe: it may add peripheral nodes to originally compact modules, resulting in wrong module merging. For LPA, the randomness of the label propagation process is amplified as the boundaries between modules become unclear, which leads to ambiguity in node attribution. By contrast, the dual gravitation (node-based and module-based) used by the proposed GHSCA imposes a normal force between nodes and a magnetic effect between modules, which acts as a controller on the randomness and over-clustering in the module detection process while ensuring that the process converges with a fine gradient.

4.3 Experiments on Real-World Network

In this subsection, we apply the proposed GHSCA to a real-world dataset on smart phone battery production to structure the supply chain network. According to the iFinD Financial Data Terminal’s panorama of the smart phone industry in China, the manufacturing upstream involves numerous subsectors, including the design, assembly and manufacturing of chips, displays, memory, PCB/FPC and batteries. For battery production, we focus on three manufacturers: Desay, Samsung (Tianjin) and Scud. Their business is to provide outsourcing services as original equipment manufacturers (OEMs) and original design manufacturers (ODMs) for smartphone brand companies, including top-selling companies such as Apple (China), Huawei, Xiaomi, and vivo. Through SkyEyeSearch, a business information platform, we obtained the partnership information of these three major battery manufacturers, as well as information about the assembly firms of the four major cell phone brand companies mentioned above. The data contains 146 firms and 281 partnerships for the years 2021–2022. Figure 8 depicts the initial topological diagram of the supply chain network, where each node represents a firm and each edge reflects a cooperative relationship between firms.

Fig. 8 Supply chain network of smart phone battery production

The topological properties of the battery supply chain network are summarized in Table 3. The average degree of the network is the average number of connections per node. An average degree of 3.767 indicates that the network has dense links between the firms. The network diameter measures the greatest distance between any two nodes in the network, where the distance between two nodes is the length of the shortest path connecting them. The diameter of 6 is an intriguing finding, since it corresponds to the six degrees of separation that characterize a small-world network. A small diameter and a large number of nodes imply that the firms in the network interact closely and that the topology is compact. The \(R^{2}\) and \(p\) values used to validate the power law distribution fit are 0.749 and < 0.01, respectively, demonstrating that the network is scale-free. This suggests that a small number of focal or hub firms have high connectivity in the network, while the large number of remaining firms have low connectivity. The modularity of 0.552 implies that the firms in the network can, to a large extent, be organized into community structures. Furthermore, as evidenced by the coefficient \(\beta = 0.32\), the subdivided communities exhibit hierarchical cooperative relations, suggesting that the communities can be structured into highly interconnected layers of groups.
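The two simplest properties, average degree and diameter, can be computed as in the following sketch. It assumes a connected, undirected, unweighted network stored as an adjacency list; the four-node path network is hypothetical and serves only to make the definitions concrete.

```python
from collections import deque


def average_degree(adj):
    """Average number of connections per node."""
    return sum(len(neighbors) for neighbors in adj.values()) / len(adj)


def eccentricity(adj, source):
    """Longest shortest-path distance from source, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj):
    """Greatest shortest-path distance between any two nodes (connected network)."""
    return max(eccentricity(adj, node) for node in adj)


# Hypothetical path network a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
avg_deg = average_degree(adj)
diam = diameter(adj)
```

For the path network the average degree is \((1 + 2 + 2 + 1)/4 = 1.5\) and the diameter is 3, the distance between the two end nodes.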

Table 3 Topology properties of battery supply chain network

The results of applying GHSCA to the battery supply chain network are as follows. First, Algorithm 1 selects four smartphone brand companies and three battery manufacturers as central nodes based on their strong gravitational force on neighboring nodes. In the enterprise community discovery phase, Algorithm 2 expands the seven central enterprise nodes into seven highly overlapped communities. Table 4 describes the network’s complexity as identified by Algorithm 2. The mixing parameter \(\mu = 0.38\) indicates that the network is rather complex. There are 30 nodes in the overlapping regions, accounting for 20.5% of the nodes in the network. The value of community membership is 7, implying that all the detected communities are intertwined.
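The overlap statistics of this kind can be derived from the detected communities as in the sketch below. The three toy communities are hypothetical, and reading the community membership value as the largest number of communities that any single node belongs to is an assumption made for illustration.

```python
from collections import Counter


def overlap_stats(communities, n_nodes):
    """Return the overlapping nodes, their share of all nodes, and the
    largest number of communities any single node belongs to."""
    membership = Counter(node for com in communities for node in com)
    overlapping = {node for node, count in membership.items() if count > 1}
    return overlapping, len(overlapping) / n_nodes, max(membership.values())


# Hypothetical communities over six nodes.
communities = [{1, 2, 3}, {3, 4, 5}, {5, 6, 3}]
overlapping, share, max_membership = overlap_stats(communities, n_nodes=6)
```

In this toy case nodes 3 and 5 are overlapping (one third of the nodes), and node 3 belongs to all three communities. Applied to the figures reported above, 30 overlapping nodes out of 146 give a share of about 20.5%.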

Table 4 Network complexity detected

Figure 9 depicts the visualization results of community identification, with different colors denoting distinct communities. From \(C_{1}\) to \(C_{7}\), the enterprise communities of Apple, Huawei, Xiaomi, vivo, Desay batteries, Samsung batteries, and Scud batteries are represented in the diagram by the colors orange, yellow, dark green, red, green, pink, and flesh, respectively. The detected enterprise communities vary in size: the communities of the three battery suppliers are larger, whereas those of the smartphone brand companies are smaller. The overlapping nodes of the communities are highlighted in blue. The average degree of the central firms is 12.86, which is much greater than that of the non-central nodes, 5.32. The average degree of the overlapping nodes is 8.43, which is much higher than that of the non-overlapping nodes, 2.61.

Fig. 9 Result of community detection

Second, to reduce the complexity of network analysis, Algorithm 3 divides the highly overlapping network community structure into focal and hub modules with explicit boundaries based on the functional categories of the firms. As illustrated in Fig. 10, when the bearing factor is set to 60, the battery supply chain network is separated into seven independent focal modules \(M_{1}\)–\(M_{7}\) and seven hub modules \(H_{1}\)–\(H_{7}\). Each focal module consists of a focal firm, such as a smartphone brand company or a battery manufacturer, together with the supporting enterprises that are drawn to it by its gravitation, whereas each hub module serves as a link between the focal modules and is made up of a number of firms from the overlapping communities. The average degree of the focal modules is 2.61, while the average degree of the hub modules is 8.43. This implies that the hub modules are critical for inter-module connections in the network.

Fig. 10 Detection result of functional modules

Third, in order to analyze the upstream and downstream interactions between firms, Algorithm 4 performs hierarchical clustering of the functional modules to construct the corresponding supply chain structure. Based on the gravitational law, the modules are merged one by one to generate the dendrogram illustrated in Fig. 11. The leaves of the tree correspond to the enterprise modules of the network. The tree reflects the module hierarchy and indicates how close the participants are: modules joined in the lower branches are close to each other, whereas modules joined only at higher levels are far apart. From Fig. 11, it is observed that \(M_{5}\) combines with \(M_{2}\) at level 1 via \(H_{2}\), \(H_{3}\) and \(H_{5}\), and then with \(M_{1}\) at level 3 via \(H_{1}\) to \(H_{5}\), forming a big cluster. Meanwhile, \(M_{6}\) unites with \(M_{4}\) via \(H_{4}\) and \(H_{5}\) at level 2, and then with \(M_{3}\) via \(H_{1}\), \(H_{2}\) and \(H_{5}\) at level 4, forming another big cluster. At level 5, these two big clusters are merged into a supercluster. Finally, all hub modules except \(H_{5}\) join it with \(M_{7}\) at the highest level. By inspecting each clade of the tree, GHSCA retrieves 31 supply chain linkages among modules, such as \(M_{6} \to H_{5} \to H_{3} \to M_{2}\). Figure 12 presents the finalized six-tier structure of the supply chain network. The upstream consists of three focal modules centered on battery manufacturers, the midstream of seven hub modules, and the downstream of four focal modules centered on smartphone brand companies.

Fig. 11 Dendrogram of functional modules

Fig. 12 Structuring result of supply chain network

To validate GHSCA’s efficacy, we compare its performance with that of the four benchmark algorithms. The results are shown in Table 5. As the table shows, GHSCA scores the highest on both measures. It achieves a high modularity \(Q\) value of 0.717, which demonstrates that it has accurately captured the highly overlapping community structure that characterizes the battery supply chain network. As in the synthetic network tests, fast-unfolding also has a relatively high \(Q\) value because it focuses on maximizing the modularity of the neighboring community for each node. In terms of the NMI measure, GHSCA has a value of 0.873, which shows that its module identification results are strongly consistent with the ground truth. It has effectively detected the discrepancies in the internal structure of the modules of firms and successfully determined their hierarchical positions in the supply chain network. All four benchmark algorithms perform poorly on NMI. This also confirms the conventional wisdom about the fast-unfolding algorithm that its overfitting disadvantage leads to unsatisfactory NMI even with a high \(Q\) value.

Table 5 Performance on battery supply chain network

4.4 Managerial Insights of Real Network Experiment Results

The results of structuring the supply chain network of smartphone battery production by GHSCA have important management implications. First, GHSCA detects a supply chain structure with six tiers. This reconfirms the notion of multi-level, holistic management emphasized in the supply chain network literature [7, 62]. Tier 1 firms mainly consist of the suppliers of smart phone batteries, including lithium-ion battery raw material suppliers, cell manufacturers, pack suppliers, and battery management system suppliers. Tier 1 firms work directly with the battery manufacturers. Tier 2 firms are the battery OEMs and ODMs that undertake battery production outsourcing contracts for mobile phone production. These manufacturers design, produce and test various types of cell phone batteries according to their customers’ needs. They also provide related services, such as battery packaging and battery management system design. Tier 3 and Tier 4 firms are the distributors and sub-distributors that supply battery products to the smart phone assembly firms and provide professional services for Tier 2 firms, such as logistics and warehousing. Tier 5 firms are electronic assembly enterprises that take OEM orders from the smart phone brand companies. Assembly companies are tasked with integrating cell phone components, including various chips, screens, memories, camera sensors, batteries, and circuit boards, into whole machines. These electronic components are usually not produced by the smart phone companies but outsourced to the corresponding OEMs for production. Tier 6 firms are the brand companies that innovate and design the smart phones and build sales networks to reach customers.

Second, the focal module detection result shows that the three battery manufacturers (Desay, Samsung (Tianjin) and Scud) and the four smart phone brand companies (Apple (China), Huawei, Xiaomi, and vivo) act as focal firms that attract small- and medium-sized lithium battery energy storage and packaging integration firms, as well as cell phone assembly firms. This reflects the modular community and scale-free characteristics of supply chain networks [7, 12]. Seven focal modules are centered on the focal firms. Within each module, the focal firm has high connectivity. The focal firms are the core elements of the supply chain network, acting as the initiators of outsourcing business and the order providers for the upstream suppliers in the corresponding modules. They are also responsible for the internal operation and management of the modules, such as promoting information sharing and coordinating the relationships among the suppliers. The scale-free structure induced by the gravitational force of the focal firms may suggest network vulnerability: if a focal battery manufacturer or smart phone brand company is disrupted, the non-focal firms will be dispersed.

Third, the seven identified hub modules are far better connected with other modules than the focal ones. However, the degree of connectivity within each hub module is low, and there are no core firms to hold the internal firms together. The connectivity of the hub modules accounts for 47% of the entire network, about 2.87 times more than that of the focal firms. This result provides further evidence for the scale-free feature of the supply chain network, although here it derives from the overlapping communities. It implies that the hub modules, rather than the focal companies, are the key players in the implementation of collaborative plans throughout the supply chain network. The scale-free feature of the network facilitates the implementation of cooperation strategies [11]. The hub modules usually consist of a number of distributors that are intermediaries between the focal firms and other suppliers. They can provide rich resources and knowledge to the network, such as flexible warehouses, procurement channels and product development services, thus providing focal firms with more diversified outsourcing services. In addition, they can save communication costs between the smart phone brand manufacturers and the battery manufacturers, thereby facilitating cooperation between them.

Fourth, the complexity of the network is mainly caused by the distribution network shared among the focal battery manufacturers. GHSCA detects a two-layered distribution network. There is an initially non-obvious “many-to-many” relationship between the focal firms on Tier 2 and the distribution modules on Tier 3. As the network extends downstream, however, there is a large number of cross-connections between the distribution modules on Tier 3 and the sub-distribution modules on Tier 4. This situation also carries over to the hub modules on Tier 4 and the focal modules on Tier 5. It implies that the dependency relations of the network rely greatly on the connectivity of the hub modules. The behavior of the hub modules ensures the cohesion and efficiency of the network. When some firms in the hub modules are disrupted, the supply and services between the battery OEMs and the smart phone assembly OEMs will be reduced, which may have serious consequences for smaller and less connected manufacturers. Maintaining a sufficient number of distributors and a sufficiently dense distribution network ensures the resilience and robustness of the supply chain network. However, decision makers should be aware that the multiple levels of the distribution network lengthen the supply chain, which may lead to poor information transfer and thus affect the accuracy and timeliness of decision making. In addition, there is no unified command system to coordinate the distributors. The shared distribution network and the excessive number of distributors may cause supplier management problems that affect supply quality.

5 Conclusion

In industrial manufacturing, the complexity of the supply chain network induced by outsourcing may result in risks of supply delays, quality defects, and other problems that can negatively affect production. Structuring the supply chain network has undoubtedly become an essential research topic in supply chain management. In this paper, we propose a dual gravitation-based hierarchical community detection algorithm, namely GHSCA, to structure networks that exhibit scale-free, highly overlapped modular community, and hierarchical characteristics. Specifically, GHSCA first utilizes a central node gravitation-based community detection strategy to obtain a coarse-grained structure of communities. Next, based on the gravitation metric, an overlapping community detection strategy is employed to identify the functional modules, forming a fine-grained structure of modules. Then, a module gravitation-based hierarchical clustering strategy is applied to construct the dendrogram, which builds the layered framework of the network. Following the architecture of the dendrogram, the firms are sequentially arranged at the appropriate upstream and downstream tiers, and the structure of the supply chain network is constructed. Experimental results demonstrate that GHSCA performs well on complex supply chain networks with overlapping communities.

The structuring result of GHSCA on the real-world supply chain network of battery production explains the management implications of the coexistence of hierarchical, community and scale-free characteristics. Within the multi-tiered network, firms are organized into focal modules, which exhibit strong internal connectivity but low external connectivity, and hub modules, which exhibit a low degree of inner connection but a high degree of outer connection. The focal modules are the driving force of the network; however, the scale-free structure they induce may suggest vulnerability if the focal firms are disrupted. The hub modules ensure the resilience and robustness of the network, but the excessive intertwining among them may cause efficiency risks.

The proposed GHSCA does not require a priori knowledge about the network. It adopts heuristic strategies to infer the global optimal structure from the topology of the supply chain network, which prevents small or micro-communities that are often absent from prior knowledge from being filtered out. It enables decision makers to effectively identify and manage the key elements of the supply chain network by discovering the functional modules as well as other supplier communities. By analyzing the architecture detected by GHSCA, managers can effectively diagnose potential structural weaknesses and vulnerabilities and increase their awareness of risk resolution. At the micro-level, it helps the parties in the network adjust their supply chain strategies in a timely manner according to the market situation. At the macro-level, it serves as a guide for supply chain integration and cooperation between industries.

There are some limitations to our work. Our proposed strategy for structuring the network is based on module gravitation, which improves the overall time efficiency. However, to realize the detection of functional modules, GHSCA employs a time-consuming node-based strategy. In addition, the analysis of the supply chain network structure created by GHSCA does not reveal the dynamics of the network, although its evolution could be tracked by collecting time-series data to form a sequence of snapshots of the network. Moreover, we present a network for smartphone battery production as a case study, but the dataset we have collected is relatively small and does not reveal the full spectrum of the battery industry. Nevertheless, this dataset is the most up-to-date and comparatively complete one available, and it can represent a general overview of the network. Therefore, the implications drawn in this paper are suggestive rather than conclusive.