How Complex Is a Fractal? Head/tail Breaks and Fractional Hierarchy

A fractal bears a complex structure that is reflected in a scaling hierarchy, indicating that there are far more small things than large ones. This scaling hierarchy can be effectively derived using head/tail breaks (a clustering and visualization tool for data with a heavy-tailed distribution) and quantified by a head/tail breaks-induced integer, called the ht-index, indicating the number of clusters or hierarchical levels. However, this integral ht-index has been found to be less precise for many fractals at their different phases of development. This paper refines the ht-index as a fraction to measure the scaling hierarchy of a fractal more precisely within a coherent whole, and further assigns a fractional ht-index (the fht-index) to each individual data value of a data series that represents the fractal. We developed two case studies to demonstrate the advantages of the fht-index in comparison with the ht-index. We found that the fractional ht-index, or fractional hierarchy in general, can help characterize a fractal set or pattern in a much more precise manner. The index may help create intermediate map scales between two consecutive map scales.

[...] segments, respectively, and their ht-indexes are 3 and 4, because the pattern of far more short segments than long ones recurs 2 and 3 times, as shown in panels (b) and (d) of the same figure. These two ht-indexes are exactly 3 and 4: removing one of the shortest segments from either curve would no longer yield an ht-index of 3 or 4, while adding one of the shortest segments would not increase the ht-index, because the index is insensitive to such small changes. From these two indexes, we can conclude that the Koch curve shown in panel (e) of Figure 1 must have an fht-index of 3.x, where 0 < x < 1. However, the ht-index as previously defined (Jiang and Yin 2014) captures scaling hierarchy only approximately and is therefore insensitive to some small changes. This is what motivates us to develop the fht-index.
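The recurring split behind the ht-index can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the `head_limit` parameter are hypothetical, and the 40% head threshold is an assumption. The example data are the 36 exact Zipf city sizes 1, 1/2, ..., 1/36, for which the paper reports an ht-index of 3.

```python
def head_tail_breaks(values, head_limit=0.4):
    """Return the successive heads produced by head/tail breaks.

    head_limit is a hypothetical parameter (assumed 40%): a split is
    accepted only while the head holds less than this share of the
    values being split.
    """
    heads = []
    current = sorted(values, reverse=True)
    while len(current) > 1:
        mean = sum(current) / len(current)
        head = [v for v in current if v > mean]  # values above the mean
        if not head or len(head) / len(current) >= head_limit:
            break  # the head is no longer a minority, so stop splitting
        heads.append(head)
        current = head  # recurse into the head only
    return heads


def ht_index(values, head_limit=0.4):
    """ht-index = number of accepted head/tail splits + 1."""
    return len(head_tail_breaks(values, head_limit)) + 1


zipf_sizes = [1 / rank for rank in range(1, 37)]
heads = head_tail_breaks(zipf_sizes)
print([len(h) for h in heads], ht_index(zipf_sizes))  # [8, 2] 3
```

The pattern of far more small values than large ones recurs twice here (heads of 8 and then 2 values), which gives an ht-index of 3.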

III. Wholes and sub-wholes
A fundamental concept of this paper is that of wholes and sub-wholes. Assuming that the above ten numbers constitute a complete whole, the first three numbers, or the first head, would be a sub-whole. In other words, given a data series as a whole, its head and the head of the head (in a recursive fashion) would be the sub-wholes. This is only an informal understanding of wholes and sub-wholes; the formal definition and methods that follow make it precise.
It is important to realize that the curve shown in panel (e) of Figure 1 is not a whole, but part of a whole, namely the curve shown in panel (c) of the same figure. In this paper, a whole is defined as a data series of n values that ranges from the largest to the smallest and meets the following condition: ht-index(n) - ht-index(n-1) = 1. For example, the 52 segments constitute a whole because ht-index(52) - ht-index(51) = 1. This definition of a whole applies to sub-wholes as well. For example, the first 13 values of the 52 segments constitute a sub-whole because ht-index(13) - ht-index(12) = 1. According to this definition, the Koch curve in panel (e) is not a whole, whereas the seemingly incomplete Koch curves shown in panels (a) and (c) are a sub-whole and a whole, respectively. In other words, the curve in panel (e) of Figure 1 is a whole according to the strict definition of the Koch curve, but it is not a whole according to the definition based on head/tail breaks.
Given the 52 segments as a whole, ranking all its segments from the longest (of scale 1) to the shortest (of scale 1/27) creates the data series shown in panel (g) of Figure 1 (the row named "whole"), where the data and its whole are shown together with its index in the first three rows. We have already derived the sub-whole of the 13 segments in the previous paragraph, with an ht-index of 3. We further determine the other sub-wholes, or sub-data series: the first three segments {1, 1/3, 1/3} with an ht-index of 2, and the first segment {1} with an ht-index of 1. All these sub-wholes have integral ht-indexes, as shown in panel (g) of Figure 1. The positions in the data series at which these integral ht-indexes occur are called anchors, one for each sub-whole or whole. Note that the sub-wholes and the whole are nested: the first sub-whole is within the second sub-whole, the first two sub-wholes are within the third sub-whole, and all three sub-wholes are within the whole.
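The anchor condition ht-index(k) - ht-index(k-1) = 1 can be illustrated with a short Python sketch. The function name `anchor_positions` is hypothetical, and the list of prefix ht-indexes below is an illustrative assumption: it simply holds the ht-index constant between the anchor positions stated in the text (1, 3, 13 and 52) rather than computing it from the segment lengths. Treating ht-index(0) as 0, so that the first value counts as an anchor, is also an assumption.

```python
def anchor_positions(prefix_ht):
    """Return the 1-based positions k where ht-index(k) - ht-index(k-1) = 1.

    prefix_ht[k-1] is the ht-index of the first k values of the ranked
    data series. ht-index(0) is taken to be 0 (an assumption), which
    makes the first value the first anchor.
    """
    anchors = []
    previous = 0
    for k, current in enumerate(prefix_ht, start=1):
        if current - previous == 1:
            anchors.append(k)
        previous = current
    return anchors


# Illustrative prefix ht-indexes for the 52 segments (assumed step shape):
# 1 for k = 1..2, 2 for k = 3..12, 3 for k = 13..51, and 4 at k = 52.
prefix_ht = [1] * 2 + [2] * 10 + [3] * 39 + [4]
print(anchor_positions(prefix_ht))  # [1, 3, 13, 52]
```

How the prefix ht-indexes come out for a real data series depends on the ht-index implementation and its head threshold, so the step shape used here should be read as an illustration of the definition only.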

IV. Methods: fht-index for a data series and its individual data
In order to determine the fht-index of the first 21 segments, we divided the data series range between the 13th and the 52nd values (that is, the range between the third and fourth anchors) equally into 39 intervals and converted the equal intervals from a linear scale to a nonlinear scale using a power function of j, where j is the index of each interval. This provides us with the fht-index of the first 21 segments: 3.042 (or x = 0.042 in panel (g) of Figure 1).
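The interpolation can be sketched as follows. The exact power function is not legible in this text, so the sketch assumes a quadratic form, x = (j / m)^2, where j is the interval index counted from the lower anchor and m is the number of intervals. This assumed exponent reproduces the reported value of 3.042, but it should be read as an assumption rather than the authors' stated formula. The function and parameter names are hypothetical.

```python
def series_fht_index(n, lower_anchor, upper_anchor, lower_level, exponent=2.0):
    """Fractional ht-index of the first n values of a ranked series.

    lower_anchor / upper_anchor: positions of the largest sub-whole and
    of the whole; lower_level: integral ht-index at lower_anchor.
    exponent=2.0 is an assumed form of the power function.
    """
    if not lower_anchor <= n <= upper_anchor:
        raise ValueError("n must lie between the two anchors")
    j = n - lower_anchor             # interval index from the lower anchor
    m = upper_anchor - lower_anchor  # number of equal intervals
    return lower_level + (j / m) ** exponent


# First 21 segments, between the anchors at positions 13 and 52.
print(round(series_fht_index(21, 13, 52, 3), 3))  # 3.042
```

At the two anchors the function returns the integral values 3 and 4, so the fractional index stays within the range implied by the neighbouring sub-whole and whole.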
To summarize the calculation of the fht-index in general: given a data series, we first seek its whole by appending new data values up to the next hierarchical level, and its sub-wholes by shrinking the data series to previous levels recursively. A whole is obtained from a data series by appending small values at its smallest end until the ht-index increases to exactly the next level. In a similar vein, starting from the first value as the first sub-whole, further sub-wholes are obtained by adding values one by one until the ht-index increases to exactly the next level. A whole and its sub-wholes are nested. As a rule for determining the whole and sub-wholes, or the anchors, the ht-index at index k must meet the condition ht-index(k) - ht-index(k-1) = 1. Next, the range between the two largest anchors, representing the largest sub-whole and the whole, respectively, is divided into equal intervals, which are then converted to a nonlinear scale to obtain the fht-index of the data series.
Having obtained the fht-index of the data series, we assign an fht-index to each data value of the data series. There are two ways to do this. The first is to take a whole, whose ht-index is an integer, and the other is to take the data series itself, whose ht-index is a fraction. The data series to be examined is unlikely to be exactly a whole, although it could incidentally be one. As shown in panel (g) of Figure 1, the largest data value is assigned to the first anchor, so it has the highest ht-index of 4, and the smallest data value is assigned to the fourth anchor, so it has the lowest ht-index of 1. Having assigned all integral ht-indexes to these anchors, the remaining positions receive fht-indexes by interpolating the ranges between the anchors. This assignment of integral ht-indexes is the reverse of the process of determining anchors: the anchors increase from the first data value to the last, while the integral ht-indexes decrease from the first data value to the last. After assigning the integral ht-indexes, we interpolate the ranges between sub-wholes and the range between the largest sub-whole and the whole in order to obtain the fht-indexes of the other individual values. Eventually, the fht-index of the 21st segment is 1.63 (or y = 0.63 in panel (g) of Figure 1). The above procedure for a whole can be packed as a function of the fht-index:

Function Fht-index (whole)
    // This function returns an fht-index for each value in the whole
    // The data is sorted ranging from the largest to the smallest
    Anchors (whole)  // this function returns AnchorNum of the whole
    Flip AnchorNum in whole;
    // The largest AnchorNum is assigned to the lowest marked index, while
    // the smallest AnchorNum is assigned to the largest marked index
    Foreach marked index p:
        Find its next marked index p';
        range = p' - p;
        subHtFraction = Interpolation (AnchorNum, range);
        htFraction.add(subHtFraction);
    Return htFraction;
End Function
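A runnable reading of this procedure in Python is sketched below. It takes the anchor positions as input rather than deriving them, flips the integral ht-indexes so that the first anchor receives the largest one, and interpolates between consecutive anchors. The quadratic interpolation is an assumption (the original power function is not legible in this text); it reproduces the reported value of 1.63 for the 21st segment. The function name and signature are hypothetical.

```python
def fht_index_whole(anchors, exponent=2.0):
    """Return an fht-index for every position of a whole.

    anchors: 1-based anchor positions in ascending order; the first must
    be 1 and the last is the size of the whole.
    exponent=2.0 is an assumed form of the nonlinear interpolation.
    """
    if not anchors or anchors[0] != 1:
        raise ValueError("the first anchor must be position 1")
    # Flip: the first anchor gets the largest integral ht-index.
    levels = list(range(len(anchors), 0, -1))
    fht = [0.0] * anchors[-1]
    for idx in range(len(anchors) - 1):
        p, p_next = anchors[idx], anchors[idx + 1]
        level = levels[idx]
        span = p_next - p
        for i in range(p, p_next):
            # Falls from `level` at p towards `level - 1` at p_next.
            fht[i - 1] = (level - 1) + ((p_next - i) / span) ** exponent
    fht[anchors[-1] - 1] = float(levels[-1])  # last anchor: lowest level
    return fht


fht = fht_index_whole([1, 3, 13, 52])
print(fht[0], fht[2], fht[12], fht[51], round(fht[20], 2))
# 4.0 3.0 2.0 1.0 1.63
```

The anchors keep their integral ht-indexes (4, 3, 2 and 1), and every other position receives a fractional value strictly between those of its two neighbouring anchors.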
For a data series that is not incidentally a whole, it is necessary to append some smallest values in order to make it a whole. While this is simple for the Koch curves, for real-world data it is important to obtain the trend line that best fits the data series. In this regard, it is recommended to use trend line functions such as power law, logarithmic, polynomial, and exponential. As a rule, the best-fitting trend line must be chosen for a specific data series. The fht-index (e.g., 3.x) of the data series is obtained by interpolating the range between the largest sub-whole and the whole. The anchors have integral ht-indexes, but in the opposite order: the largest anchors have the smallest integral ht-index, and the smallest anchors have the largest integral ht-index. The fht-indexes of data values between anchors, or between the largest anchor and the whole, must be obtained through interpolation. To this point, we have relied on the Koch curves to illustrate the ideas of the fht-index in order to make them accessible to experts and non-experts alike.
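As one possible illustration of the trend-line step, the sketch below fits a power law to a ranked data series by ordinary least squares in log-log space and extrapolates it to the next ranks. The rank-size form size = c * rank^(-alpha), the fitting method, and the function names are all assumptions made for this example; the text only states that the best-fitting trend line should be chosen. The data are the exact Zipf sizes 1, 1/2, ..., 1/36, so the fit should recover c = 1 and alpha = 1.

```python
import math


def fit_power_law(sizes):
    """Fit size = c * rank**(-alpha) by least squares on log-log values.

    sizes must be positive and ranked from largest to smallest.
    Returns the pair (c, alpha).
    """
    xs = [math.log(rank) for rank in range(1, len(sizes) + 1)]
    ys = [math.log(s) for s in sizes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


def extrapolate(c, alpha, start_rank, count):
    """Trend-line values for `count` ranks starting at start_rank."""
    return [c * rank ** (-alpha) for rank in range(start_rank, start_rank + count)]


sizes = [1 / rank for rank in range(1, 37)]
c, alpha = fit_power_law(sizes)
appended = extrapolate(c, alpha, 37, 3)  # candidate values to append
print(round(c, 6), round(alpha, 6))
```

The extrapolated values would then be appended one by one until the ht-index reaches the next level. For real data the other trend-line families mentioned above should be compared before settling on a power law.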

V. Case studies and FHTCalculator
To further explore the fht-index, we applied it to two case studies. The first case study involves 36 city sizes that follow Zipf's law (Zipf 1949) exactly: 1, 1/2, 1/3, ..., 1/36 (panel (a) of Figure 2), with an ht-index of 3. The second case study involves 8,106 natural cities with an ht-index of 7, derived from the social media platform Brightkite in the United States (panel (c) of Figure 2; Jiang and Miao 2015). For the first case study, the smallest values to append are pre-determined by the rank sizes, while for the second case study they are determined by a power law function fitted to the city sizes. Unlike the discrete ht-indexes, the fht-indexes for individual data values, shown in panels (b) and (d), are continuous and thus capture scaling hierarchy more precisely. The fht-indexes of these two data series are 3.81 and 7.04, respectively [...]
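The first case study can be worked through end to end in a short Python sketch. It rests on two assumptions that are not confirmed by the text: a 40% head threshold in head/tail breaks and a quadratic interpolation between the two largest anchors. Under these assumptions the sketch yields 3.81, the value reported for the 36 Zipf city sizes. All function names are hypothetical.

```python
def ht_index(values, head_limit=0.4):
    """ht-index via head/tail breaks (assumed 40% head threshold)."""
    level = 1
    current = sorted(values, reverse=True)
    while len(current) > 1:
        mean = sum(current) / len(current)
        head = [v for v in current if v > mean]
        if not head or len(head) / len(current) >= head_limit:
            break
        level += 1
        current = head
    return level


def zipf(n):
    """The first n exact Zipf sizes: 1, 1/2, ..., 1/n."""
    return [1 / rank for rank in range(1, n + 1)]


n = 36
level = ht_index(zipf(n))
# Largest sub-whole: the shortest prefix whose ht-index reaches `level`.
lower = next(k for k in range(1, n + 1) if ht_index(zipf(k)) == level)
# Whole: append rank sizes until the ht-index reaches the next level.
upper = next(k for k in range(n + 1, 10 * n) if ht_index(zipf(k)) == level + 1)
# Assumed quadratic interpolation between the two anchors.
fht = level + ((n - lower) / (upper - lower)) ** 2
print(level, lower, upper, round(fht, 2))  # 3 9 39 3.81
```

Here the largest sub-whole consists of the first 9 sizes and the whole of the first 39, so the 36 sizes sit 27 of 30 intervals into the range, which gives 3 + 0.9^2 = 3.81.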

VII. Conclusion
This paper refines the ht-index to be a fraction in order to better characterize the scaling hierarchy of a fractal or of a data series with a heavy-tailed distribution. The existing integral ht-index is implicitly based on the assumption that any given data series with a heavy-tailed distribution is always a whole. This assumption does not always hold true. In many cases, a data series is likely to be part of a whole rather than a whole itself. Based on this new perception, we put a data series within a whole and seek its sub-wholes, or anchors, in order to derive its fht-index. This fht-index is always greater than the integral ht-index. We further assign an fht-index to each data value of the data series. More precisely, the anchors have integral ht-indexes, while the other data values, or non-anchors, have fht-indexes. The fht-index may help measure the degree of living structure, or visualize fractal urban structure and nonlinear dynamics more efficiently and effectively, since the structure and dynamics have first been captured by the fht-index. In the future, we will seek applications of the fht-index to better characterize geographic forms and processes, urban structure and dynamics in particular, and to move beyond understanding towards making, that is, how to better heal and design built environments.