A New Connectivity Index for Container Ports

We propose a new index, the Container Port Connectivity Index (CPCI), to measure the trade connectivity of ports within the network of container shipping. This index is based on both economics and network topology, and a distinctive feature is that the strength of a port is based on its position within the global structure of the shipping network and not just on local information such as the number of TEUs handled or direct links to other ports. Furthermore, it produces separate scores for inbound and outbound container movement and so supports more detailed analyses. We explore the usefulness of this index by analyzing the global network of scheduled mainline container-shipping services as it existed in September 2011. The interested reader may wish to explore the global network of container shipping in more detail at http://www2.isye.gatech.edu/~jjb/wh/apps/transportation/container-network.html.

What makes a container port attractive? From an operational point of view, one important factor is connectivity: Are there convenient services to and from other important ports? Several measures borrowed from graph theory have been applied to measure connectivity, but these are typically based only weakly on economics. We suggest a new measure of importance with which to compare container ports, which we call the Container Port Connectivity Index (CPCI). This measure is based on a richer model than used heretofore of the intensity of container trade between pairs of ports. Then we use this model to compute the importance of each port as if ranking web pages. In this computation, the importance of a port is based not just on the importance of immediate neighbors but also on the importance of neighbors of those neighbors, and so on. We use these tools to analyze a model of the global network of mainline, scheduled, container-shipping services and argue that they offer a more nuanced and accurate reflection of the relative importance of ports.

The global network of container-shipping
There are several network-based models of the world-wide movement of containers by sea. The one closest to ours is that of [5]. Both their model and ours represent each container port as a vertex. Kaluza et al. include a link (edge) directed from vertex (port) i to vertex (port) j if some container ship traveled directly from port i to port j at any time during 2007, as reported by www.sea-web.com. In our model the meaning of a link is slightly different: There is a link directed from port i to port j if there was mainline, scheduled container service traveling directly from port i to port j, as reported by commercial data source Compair Data in September 2011. In other words, our model is a snapshot of the network as it would be engaged by a shipper, while theirs includes more ephemeral phenomena, such as seasonal feeder services. This makes sense for the problem in which Kaluza et al. are interested, which is bio-invasion by species that are spread by ships. In contrast, our concern is operational: what is the nature of the network on which a particular container might move? Ducruet and Notteboom [3] also studied network models of container shipping that are constructed, like that of Kaluza et al., from the time-aggregated movements of all container ships over a year. In one of their models two ports are connected if a container ship has traveled directly from one to the other any time during the reference year. In their other model, two ports are connected if both appeared anywhere on the same scheduled container service during that year (so that each service is represented by a complete subgraph).
Kaluza et al. perform topological analyses of their time-aggregated shipping network. Ducruet and Notteboom go further in relating the topological analyses to geography and economics, but they model container flows by undirected links and so ignore the direction of container movement. Figure 1 shows the network described by our dataset.
Figure 1: The network of scheduled container services among 457 ports of the world. Each arrow indicates scheduled container service from origin to destination port (but not the actual geography of the shipping route). Darker links are of greater trade intensity according to a computation based on the Liner Shipping Connectivity Index. Ports represented by larger disks scored proportionally higher according to the new measure of port connectivity described herein.
Several large patterns are immediately evident, including the importance of the East Asian ports and the intensity of trade between East Asia and Europe, through the great transshipment ports of Southeast Asia and through the Suez Canal. Similarly, it is clear that services along the west coast of Africa or the east coast of South America are primarily local connections, from port to nearby port. This network has 457 ports and 2,479 links and is strongly connected (that is, for any two ports, each is reachable from the other by some directed path). The mean degree of ports is 10.85, and the link-diameter of the network is 11 links. (The mean degree is smaller in our network and the diameter larger than those of the time-aggregated networks, probably because the latter contain additional links such as seasonal and unplanned changes to shipping routes.) Because our dataset includes transit times, we can also report that the travel-time diameter of our network is 56 days (ignoring time in port).
The mean link-shortest path in our network is 4.02 links, with a median of 4.0. We also find the mean time-shortest path to be 18.6 days, with a median of 19 days.
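As a rough illustration of how such statistics can be computed, the following minimal sketch uses breadth-first search on a toy directed network. The three-port network and the function names are hypothetical; the sketch assumes that the mean degree counts both inbound and outbound links, so that it equals twice the number of links divided by the number of ports (for the network described here, 2 × 2,479 / 457 ≈ 10.85).

```python
from collections import deque

def hop_counts(adj, source):
    """Breadth-first search: fewest links from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def network_stats(adj):
    """Summary statistics of a directed network given as {origin: [destinations]}."""
    vertices = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    n_links = sum(len(nbrs) for nbrs in adj.values())
    hops = []
    strongly_connected = True
    for s in vertices:
        dist = hop_counts(adj, s)
        if len(dist) < len(vertices):
            strongly_connected = False  # some vertex is unreachable from s
        hops.extend(d for t, d in dist.items() if t != s)
    return {
        # Assumption: each link adds one out-degree and one in-degree.
        "mean_degree": 2 * n_links / len(vertices),
        "strongly_connected": strongly_connected,
        "link_diameter": max(hops),
        "mean_shortest_path": sum(hops) / len(hops),
    }

# Hypothetical toy network: A->B, A->C, B->C, C->A.
toy = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
stats = network_stats(toy)
print(stats)
```

The travel-time analogues would replace the hop counts with shortest transit times along each link.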

A link-weight based on economics
It is natural to enrich a network model to reflect the intensity of trade moving along each link. However, it is hard to get trade data at the level of containers and ports and so trade intensity is generally approximated by transport capacity. Kaluza et al. defined the weight of a link to be the sum of gross tonnage of all shipping traversing that link in 2007. Since we have more information regarding each link, we suggest defining the weight of a link by adapting the Liner Shipping Connectivity Index (LSCI). The LSCI was developed by the United Nations Conference on Trade and Development (UNCTAD) to compare the trade competitiveness of countries with respect to logistics and transport. UNCTAD computes the LSCI for a country as an aggregation of five statistics: number of liner services calling, number of liner companies, number of ships, combined container capacity of the ships (in TEUs), and capacity of the largest ship calling [9]. Despite the narrowness of focus and somewhat arbitrary method of aggregating component statistics, the LSCI is based on hard numbers and is felt to accurately reflect trade competitiveness. Indeed, the LSCI has been observed to be strongly correlated with the Logistics Performance Index (LPI), a comprehensive survey of perceptions that is reported annually by the World Bank [1,9].
The LSCI implicitly treats each country of concern as if it were a single location and the entire rest of the world a single trading partner. In effect the world container network is reduced to two vertices, as in Figure 2. The five statistics on which the LSCI is based describe the container capacity connecting the country to the rest of the world and so we may interpret the LSCI as a measure of the strength of the link between two vertices.

Figure 2: The Liner Shipping Connectivity Index is based implicitly on a model that aggregates all the ports of that country into one vertex and all the rest of the world into a single trading partner. The LSCI then describes the container-shipping capacity between them. (In the diagram, a vertex labeled "USA" is joined to a vertex labeled "Rest of the world" by a link labeled "LSCI index for USA".)
We follow the idea of the LSCI to compute, for each pair of ports i and j, a weight reflecting the intensity of container capacity moving from i directly to j. The computation is exactly that of the LSCI, except for ports rather than for countries, and for directed transit (that is, from port i to port j). Figure 3 shows the resultant distribution of weights for all direct links in our network, and Table 1 lists the twenty links of greatest weight. The most distinctive pattern is that all but one of these links are intra-Asian. Notably, Shanghai figures in six of these links, three times as an origin and three times as a destination. Hong Kong appears seven times and always as a destination, reflecting its role as a marshaling point for exports. This dominance of East Asian container flows is consistent with the statistics reported by Global Insight, quoted in [4], which also observes "Particularly striking is the fact that in 2010, the volume of trade in the Intra-Asia market is four times higher than the volume of trade in the transatlantic".
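The following minimal sketch suggests how an LSCI-like weight might be computed for directed links. The aggregation rule is an assumption made for illustration: each of the five statistics is divided by its maximum over all links, the five ratios are averaged, and the averages are rescaled so that the strongest link scores 100. The class `LinkStats`, the function `link_weights`, and all the numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LinkStats:
    """The five statistics for one directed link (hypothetical container)."""
    services: int        # number of liner services on the link
    companies: int       # number of liner companies
    ships: int           # number of ships deployed
    total_teu: int       # combined container capacity of those ships (TEUs)
    max_ship_teu: int    # capacity of the largest ship (TEUs)

FIELDS = ("services", "companies", "ships", "total_teu", "max_ship_teu")

def link_weights(stats):
    """Assumed aggregation: normalize by the maximum, average, rescale to 100."""
    maxima = {f: max(getattr(s, f) for s in stats.values()) for f in FIELDS}
    averages = {
        link: sum(getattr(s, f) / maxima[f] for f in FIELDS) / len(FIELDS)
        for link, s in stats.items()
    }
    top = max(averages.values())
    return {link: 100.0 * a / top for link, a in averages.items()}

# Made-up numbers for two directed links between hypothetical ports P1 and P2.
example = {
    ("P1", "P2"): LinkStats(10, 5, 40, 200_000, 14_000),
    ("P2", "P1"): LinkStats(2, 1, 4, 10_000, 7_000),
}
weights = link_weights(example)
print(weights)
```

Because the link from P1 to P2 is largest in every statistic, it receives the weight 100, and the reverse link receives a strictly smaller weight, so the two directions between the same pair of ports are distinguished.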

Clustering and communities
A community within a network is a collection of vertices with dense and strong connections among themselves but sparser and weaker connections to other vertices. [2] identified communities among countries trading several important commodities. Here we take a more granular look and identify natural trading communities among container ports, as revealed by LSCI-weighted links.
To recognize communities, we rely on an objective function termed modularity. The idea is that the modularity $Q$ of a group of communities $\{c_i\}$ is large when there is more total weight contributed by edges within the communities than might be expected by chance [8]. More formally, $Q$ is defined in terms of a weighted adjacency matrix $A$, where $A_{ij}$ has value $w_{ij}$ if there is a link of weight $w_{ij}$ from vertex (port) $i$ to vertex (port) $j$ (and 0 otherwise); $m = \sum_{ij} A_{ij}$; $\delta_{ij}$ is the Kronecker delta symbol; and $c(i)$ is the index of the community to which vertex (port) $i$ is assigned.
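For concreteness, one standard form of modularity for directed, weighted networks that is consistent with these definitions is sketched here. The symbols $s_i^{\mathrm{out}}$ and $s_j^{\mathrm{in}}$ (total weight leaving port $i$ and entering port $j$) are introduced only for illustration, and the exact normalization used in [8] may differ.

```latex
Q \;=\; \frac{1}{m} \sum_{ij}
  \left[ A_{ij} - \frac{s_i^{\mathrm{out}}\, s_j^{\mathrm{in}}}{m} \right]
  \delta_{c(i)\,c(j)},
\qquad
s_i^{\mathrm{out}} = \sum_{j} A_{ij},
\quad
s_j^{\mathrm{in}} = \sum_{i} A_{ij}.
```

In this form the term $s_i^{\mathrm{out}} s_j^{\mathrm{in}} / m$ plays the role of the weight expected by chance between $i$ and $j$, and the Kronecker delta restricts the sum to pairs of ports assigned to the same community.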
To identify communities in a network one must search over all partitions $\{c_i\}$ of the vertices to find one that maximizes modularity $Q$. We used the heuristic search method of [7], which is known to work well, under which the SC network resolved into eight communities based on links weighted by LSCI. The results, shown in Figure 4, are suggestive. For example, the computation clearly recognized important global patterns, including trans-Pacific trade (Figure 4a), as well as trans-Atlantic (Figure 4b), and intra-American trade (Figure 4e). Other observations:
• Figure 4a: This community is the most strongly-defined in the sense that it includes the ports that contribute most to the total modularity, such as the giants Shanghai, Ningbo, and Hong Kong, and these Asian ports are the anchors of this community.
It may seem surprising that this Pacific-spanning community also includes the Caribbean port of Colon, Panama (all the other Panamanian ports are, as would be expected, in the Caribbean community of Figure 4e). But this makes sense because many services from Asia to the US East Coast find it convenient to transship at Colon for subsequent disbursement throughout the Caribbean.
• Figure 4b: Rotterdam and Hamburg are the core ports of this community.
• Figure 4c: The Mideast community is based on trade through the Suez Canal. It includes East Africa above the ports of Tanzania and the Comoros and Seychelles Islands.
• Figure 4d: The East African ports below Tanzania, including the large ports of South Africa, are better connected to the West African trading community than to others. Tanjung Pelepas is the easternmost member, reflecting its role as point of distribution of manufactured goods from East Asia to Africa. The few European members are connected primarily through the ports of Tanger or Algeciras.
• Figure 4e: The Caribbean community includes two outliers inviting comment. Wilmington, Delaware, in the US, has strong ties to Central America because of its specialization in the handling of tropical fruits and fruit juices. On the west coast, San Diego is more strongly connected to Latin America than to East Asia because the Asian services prefer to call at Los Angeles or Long Beach for their larger regional market and superior hinterland storage and transportation infrastructure.
• Figure 4f: Port-of-Spain (Trinidad) is the northernmost member. Services travel from it into this community.
• Figure 4g: This community is an artifact of the isolation of New Zealand. It consists of the regional ports of Lyttelton, Napier, Port Chalmers, and Wellington, which have very few direct international connections. They are better connected amongst themselves than to the rest of the world. The international connections to New Zealand call mainly at Auckland and Tauranga, which are members of the Asia-Pac and trans-Pacific trading community.
• Figure 4h: This is another community determined by geography. These ports are locally connected but all significant connections to the outside world are mainly through a few ports near the Straits of Gibraltar, through which ships must pass to enter the Mediterranean Sea.
The ports that contribute most to the modularity score of a community are, in a sense, the anchors of those communities. Those of highest modularity are overwhelmingly Asian and especially Chinese, with the top ten being Shanghai, Ningbo, Hong Kong, Busan, Rotterdam, Yantian, Hamburg, Port Klang (Malaysia), and Qingdao. The ports that contribute most within the Trans-Atlantic community are Rotterdam, Hamburg, and Savannah; within the South Asia/Mideast community: Port Klang, Jeddah, and Dubai; within the West/South Africa: Tanjung Pelepas, Cape Town, and Durban; and within the southeastern US, Caribbean, and Pacific South American community: Callao (Peru), Manzanillo (Panama), and Balboa (Panama).
It is worth noting that Singapore is not among the ten largest contributors to modularity. It is a member of the powerful Asia-Pac and trans-Pacific community, but it does not have dense local connections as do the big China ports. Instead, it serves more as a transshipment hub, with services to and from other ports that may not be directly connected themselves. This is reflected in the clustering coefficient of Singapore, which measures how connected to each other are its immediate neighbors [11], and which is the very lowest among all container ports, followed by those of other important transshipment hubs such as Port Klang, Algeciras, Kingston, and Cartagena. These ports send and receive containers to many other ports, but their immediate neighbors do not ship much directly to each other.
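A minimal sketch of a local clustering coefficient may make the contrast concrete. It assumes the simplest variant, in which link directions and weights are ignored and the coefficient is the fraction of pairs of a port's neighbors that are themselves directly linked; the variant used with [11] may differ. The function name and the toy networks are hypothetical.

```python
from itertools import combinations

def clustering_coefficient(links, port):
    """Fraction of pairs of the port's neighbors that are directly linked.

    Assumption: directions and weights are ignored (undirected variant).
    """
    undirected = {frozenset(link) for link in links if link[0] != link[1]}
    neighbors = {v for edge in undirected if port in edge for v in edge if v != port}
    if len(neighbors) < 2:
        return 0.0
    pairs = list(combinations(sorted(neighbors), 2))
    linked = sum(1 for a, b in pairs if frozenset((a, b)) in undirected)
    return linked / len(pairs)

# A pure transshipment hub: H exchanges containers with A, B, and C,
# but A, B, and C have no direct services among themselves.
hub_links = [("H", "A"), ("A", "H"), ("H", "B"), ("B", "H"), ("H", "C"), ("C", "H")]

# A densely connected local cluster: every pair of ports is linked.
dense_links = [("X", "Y"), ("Y", "Z"), ("Z", "X")]

hub_score = clustering_coefficient(hub_links, "H")
dense_score = clustering_coefficient(dense_links, "X")
print(hub_score, dense_score)
```

In the toy hub network the coefficient of H is zero because none of its neighbors trade directly with one another, whereas every port in the dense cluster has coefficient one.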

A new index of strength for container ports
We have defined the weight of a link to be the value of its LSCI; now we use these weights to compute a new index of port connectivity. We compute the Container Port Connectivity Index (CPCI) by the "HITS" algorithm ("Hyperlink-Induced Topic Search"), which is an eigenvector-based method to rank web pages [6]. (See Appendix A for details.) The HITS algorithm computes two scores for each vertex of a network of directed edges. In the context of container shipping, we refer to these as inbound and outbound scores. Roughly speaking, a port with a high inbound score has greater power to aggregate goods; and a port with a high outbound score has greater power to distribute goods.
The CPCI will differentiate between the port of Figure 5 and one with the directions of freight flow reversed. A port will be assigned a high inbound score if container capacity flows to it from ports with high outbound score, or if it is not too far downstream from such a port. Similarly, a port will be assigned a high outbound score if container capacity flows from it to many ports with high inbound score, or if it is not too far upstream from such a port.
Figure 5: This port has direct connections to many ports, and so will receive a higher outbound score than would a port with the direction of freight flows reversed.
Praia, a container port in Cape Verde, is an example. It is unusual in the region in having a relatively high inbound score, which arises because it receives service directly from Algeciras, a regional hub with a relatively high outbound score. That service continues on to St. Vicente, which has a lower inbound score because it is further removed from Algeciras. St. Vicente, in turn, has a higher inbound score than the next few regional ports farther down-service.
The CPCI thus combines economics with network topology. Economics is reflected in the weight of the links, which are scored by an adaptation of the LSCI. And network topology is reflected in the recursive ranking of the HITS algorithm. A port scores well under the CPCI if it has strong trade connections; but it also inherits some of the importance of its neighbors, and, with diminishing effect, their neighbors, and so on.
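The recursive scoring can be sketched as a simple power iteration over weighted links. This is a minimal illustration and not necessarily the computation used for the results reported here: the function name `cpci_scores`, the Euclidean normalization, the fixed number of iterations, and the toy network are all assumptions.

```python
def cpci_scores(weights, iterations=100):
    """Weighted HITS by power iteration.

    weights maps a directed link (origin, destination) to a positive weight.
    Returns (inbound, outbound) score dictionaries, each normalized to unit
    Euclidean length (an assumed normalization).
    """
    ports = sorted({p for link in weights for p in link})
    inbound = {p: 1.0 for p in ports}
    outbound = {p: 1.0 for p in ports}
    for _ in range(iterations):
        # Inbound score: weighted sum of outbound scores of upstream ports.
        new_in = {p: 0.0 for p in ports}
        for (i, j), w in weights.items():
            new_in[j] += w * outbound[i]
        # Outbound score: weighted sum of inbound scores of downstream ports.
        new_out = {p: 0.0 for p in ports}
        for (i, j), w in weights.items():
            new_out[i] += w * new_in[j]
        norm_in = sum(v * v for v in new_in.values()) ** 0.5
        norm_out = sum(v * v for v in new_out.values()) ** 0.5
        inbound = {p: v / norm_in for p, v in new_in.items()}
        outbound = {p: v / norm_out for p, v in new_out.items()}
    return inbound, outbound

# Hypothetical toy network: port H ships directly to A, B, and C,
# and receives only from A.
toy = {("H", "A"): 1.0, ("H", "B"): 1.0, ("H", "C"): 1.0, ("A", "H"): 1.0}
inbound, outbound = cpci_scores(toy)

# The same network with every direction of flow reversed.
reversed_toy = {(j, i): w for (i, j), w in toy.items()}
rev_inbound, rev_outbound = cpci_scores(reversed_toy)
print(outbound["H"], rev_outbound["H"])
```

In the toy network, H earns a high outbound score because it distributes to several ports; when the flows are reversed, the same port instead earns a high inbound score, which is the kind of distinction that a single undirected measure cannot make.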

Ranking ports by CPCI
As measured by the CPCI the best-connected ports are not necessarily those with the most links. For example, Cartagena receives services from 20 different ports, which is more than the 18 received by Yantian. Nevertheless, Yantian ranks much higher as scored by the CPCI with regard to inbound (0.286 versus 0.0036). This is a reflection of the fact that the CPCI depends not just on the number of links but also on the weights of those links and the scores of the ports to which they connect.
Similarly, the best-connected ports are not necessarily the busiest. Table 2 shows the CPCI scores of the twenty ports that scored highest with respect to our measure of inbound connectivity (where, for comparison, we have included ranking by TEUs handled in 2010). Similarly, Table 3 gives the highest ranking ports by outbound score. The ports of East Asia dominate with respect to either measure, inbound or outbound. Even though Shanghai handled more TEUs, Hong Kong ranks higher by CPCI, presumably because it is better connected within the global container-shipping network. On the other hand our ranking appears to neglect the high-volume European ports such as Rotterdam, Antwerp, and Hamburg, as well as the busy Mideast port of Dubai, but this is because they are more isolated from other big ports. In contrast, the big East Asian ports are well-connected with the rest of the world, and with each other, which further increases their scores. Figure 6 plots scores of all 457 ports. Several stand out for the significant differences between inbound and outbound scores, and these differences illustrate how the CPCI can make structural distinctions about the position of a port in the network.
Los Angeles and Long Beach have inbound scores that are relatively high in comparison to outbound scores. This reflects the fact that these are the two main ports of entry for product manufactured in East Asia. To reduce in-transit inventory, powerful retailers in North America insist that their freight be the last loaded out of Asia and the first unloaded in North America, and so there are many direct links from big Asian ports into Los Angeles and Long Beach. Services that have traversed the Pacific Ocean to call at Los Angeles or Long Beach then typically call at Oakland before returning to the large ports of Asia. Consequently, Oakland has a high outbound score in comparison to its inbound score. This is a general pattern that may be observed along many service loops: ports that are immediately downstream from important ports tend to have higher inbound scores, while ports toward the end of the loop tend to have higher outbound scores. Table 4 shows that, among the ports of North America, the west coast ports, led by Los Angeles and Long Beach, dominate by the measure of inbound connectivity, reflecting the many services that come directly from the great manufacturing centers of East Asia. Moreover, many of the west coast ports score much higher with respect to inbound connectivity than to outbound.

North American ports
New York is the only port on the east coast to score highly with respect to inbound scores. But Table 5 shows that east coast ports such as Savannah are more competitive with respect to outbound scores. It will be interesting to see how these rankings change after the widening of the Panama Canal is completed in 2014.

Comparison of Container Port Connectivity Index with LSCI
The LSCI is defined for countries, while the CPCI is defined for ports. Nevertheless, we can directly compare the rankings by these indices of those countries with a single dominant port. We identified 64 container ports that were, within our data source, unique within their country, and then compared rankings by each of the 2011 LSCI and by each of the inbound and the outbound versions of Container Port Connectivity Index. The results appear in Table 6 and are generally consonant: those ranked among the top ten by LSCI are among the top twenty by CPCI, either inbound or outbound.
The differences in ranking between Gothenburg and Gdansk again illustrate how our suggested index captures structure of the network. Gdansk ranks relatively high in inbound strength because it receives shipments directly from Hamburg but ships only to the lesser port of Aarhus, which accounts for its relatively lower ranking in outbound strength. On the other hand, Gothenburg receives freight only from Aarhus, but it ships to the more significant port of Bremerhaven, from which it derives a higher outbound score.

Comparison with other measures of centrality
The CPCI is based on both economics and network theory and so, we believe, provides a better measure of trade-connectivity than alternative measures. One measure of the centrality of a vertex within a network is degree centrality, which in our context tells from how many other ports a port receives direct shipments (in-degree) or to how many others it sends direct shipments (out-degree). While interesting, this measure neglects economic issues such as the volume of trade along each link. It merely records the fact of trade.
Some measures of centrality incorporate distance. This can be useful because shipping cost is roughly proportional to distance. One such measure is closeness. The sum of shortest distances from a vertex to all other vertices is its farness. The closeness of a port is the reciprocal of its farness. We can compute the closeness centrality to a port and also from a port. Again, this ignores intensity of trade. When distance from every port is equally important, the five ports of greatest inbound closeness are all in Spain; and the most important ports with regard to outbound closeness are all in and around Panama.
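The definitions of farness and closeness can be sketched with a shortest-path computation. This minimal example assumes that each directed link carries a distance, that every port is reachable from every other (as in a strongly connected network), and that inbound closeness is obtained by reversing the links. The function names and the three-port network are hypothetical.

```python
import heapq

def shortest_distances(adj, source):
    """Dijkstra's algorithm over adj = {origin: {destination: distance}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, length in adj.get(u, {}).items():
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def closeness(links, port, direction="outbound"):
    """Reciprocal of farness, the sum of shortest distances from (or to) port."""
    adj = {}
    for (i, j), length in links.items():
        if direction == "inbound":
            i, j = j, i  # reverse links to measure distances *to* the port
        adj.setdefault(i, {})[j] = length
    dist = shortest_distances(adj, port)
    farness = sum(d for p, d in dist.items() if p != port)
    return 1.0 / farness if farness > 0 else 0.0

# Hypothetical loop service A -> B -> C -> A with made-up distances.
loop = {("A", "B"): 2.0, ("B", "C"): 3.0, ("C", "A"): 4.0}
out_a = closeness(loop, "A", "outbound")   # farness 2 + 5 = 7
in_a = closeness(loop, "A", "inbound")     # farness 4 + 7 = 11
print(out_a, in_a)
```

Even on this small loop the two directions differ, since A is nearer to the other ports than they are to it; neither value reflects how much trade moves along the links.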
Betweenness is the number of shortest paths within the network on which the vertex (port) lies. We find the twenty ports of greatest betweenness centrality to be quite different

Conclusions
The Container Port Connectivity Index is a descriptive index. It summarizes in two numbers something about how each port is connected to others within the larger network. Importantly, the Container Port Connectivity Index expresses not only local connectivity to immediate neighbors but also connectivity to neighbors-of-neighbors, and so on, with all links weighted by an estimate of the intensity of trade (currently, a specialization of the Liner Shipping Connectivity Index). Furthermore, the Container Port Connectivity Index allows inbound and outbound strengths to be studied independently, and this gives a more detailed look at the economic roles played by each port. Finally, the Container Port Connectivity Index supports what-if analysis in a way that survey-based indices cannot. Any index of logistics performance is an attempt to summarize a complex environment.
The LSCI may be criticized for the rather arbitrary way that data is agglomerated; and the LPI for its reliance on perception rather than measurement. The Container Port Connectivity Index has weaknesses as well. In particular, because it uses an LSCI-like computation, it inherits any criticism of that. In addition, while the CPCI scores are based on connectivity, they are not based on geography, and so do not explicitly account for travel time between ports. Nevertheless, the Container Port Connectivity Index has many useful properties. In particular, it is based on link weights that are computed just like the Liner Shipping Connectivity Index; and because the LSCI has been vetted by economists as capturing intensities of trade, our index inherits that descriptive power and exercises it at a more granular level.
We expect the Container Port Connectivity Index to be useful in some of the same ways as the LSCI. This may include explaining how the container-shipping network changes over time or using the edge weights and port scores as explanatory variables for economic phenomena. We believe these finer-grained statistics will be easier to understand and to explain because they directly reflect immediate decisions of primary actors such as shipping companies.
It should be remarked that none of the network models discussed herein captures anything about transshipment. Even though there may be direct links from port A to port B and from port B to port C, to transport a container from A to C may require transshipment. In this case ports A and C are further apart in both time and cost than they might appear in these models. Unfortunately there is insufficient such information available to piece together a useful view; but if that information were available, it could be incorporated into a model that explicitly represents the structure of scheduled liner services, along the lines of [3], but with directed links.

A The HITS algorithm
The HITS algorithm was originally developed to rank the web pages for a search engine [6]. It computes two scores for each web page, a hub score and an authority score, where a good authority page is a page with many incoming links, while a good hub page is a page with many outgoing links. The idea of the HITS algorithm is that any page that is cited by important hub pages should be considered an authority. Similarly, any page that cites important authority pages should be considered a hub.
In the context of container shipping, we interpret an authority as a port that receives shipments from many ports and so is good at aggregating shipments, and so worthy of a high CPCI score for inbound. Similarly, a hub is for us a port that sends shipments to many other ports, and so is good at distribution, which results in a high CPCI score for outbound.
More generally, the HITS algorithm can be exercised on any directed network. Let $E$ be the set of directed edges of a network and $\lambda$ a constant. Then the authority and hub scores of vertex $i$ are the solutions $x_i$ and $y_i$ to
$$x_i = \lambda \sum_{j : (j,i) \in E} y_j, \qquad y_i = \lambda \sum_{j : (i,j) \in E} x_j.$$
If $A$ is the adjacency matrix of the network, the equations for vertices $i = 1, \ldots, n$ can be written in matrix form as
$$x = \lambda A^{T} y, \qquad y = \lambda A x.$$
Each of the above systems of equations is equivalent to the problem of finding eigenpairs satisfying constraints defined by the system of equations itself, and the importance scores are the principal eigenvectors corresponding to each of the systems of equations. Such measures of centrality are known as spectral centrality measures [10].
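The equivalence to an eigenvalue problem can be made explicit by substitution. The sketch below assumes the standard HITS relations in matrix form, $x = \lambda A^{T} y$ and $y = \lambda A x$, with $A_{ij} = 1$ when there is an edge from $i$ to $j$; for the CPCI the entries of $A$ would instead hold the link weights.

```latex
x = \lambda A^{T} y = \lambda A^{T} (\lambda A x)
  \;\Longrightarrow\; A^{T} A \, x = \frac{1}{\lambda^{2}} \, x,
\qquad
y = \lambda A x = \lambda A (\lambda A^{T} y)
  \;\Longrightarrow\; A A^{T} y = \frac{1}{\lambda^{2}} \, y.
```

Under these assumptions the authority scores $x$ form an eigenvector of $A^{T} A$ and the hub scores $y$ an eigenvector of $A A^{T}$, both with eigenvalue $1/\lambda^{2}$. Choosing the principal eigenvectors gives nonnegative scores, and these are what repeated application of the two relations, with normalization at each step, converges to under the usual conditions.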