Machine Learning and Algebraic Approaches towards Complete Matter Spectra in 4d F-theory

Motivated by engineering vector-like (Higgs) pairs in the spectrum of 4d F-theory compactifications, we combine machine learning and algebraic geometry techniques to analyze line bundle cohomologies on families of holomorphic curves. To quantify jumps of these cohomologies, we first generate 1.8 million pairs of line bundles and curves embedded in dP3, for which we compute the cohomologies. A white-box machine learning approach trained on this data provides intuition for jumps due to curve splittings, which we use to construct additional vector-like Higgs-pairs in an F-Theory toy model. We also find that, in order to explain quantitatively the full dataset, further tools from algebraic geometry, in particular Brill–Noether theory, are required. Using these ingredients, we introduce a diagrammatic way to express cohomology jumps across the parameter space of each family of matter curves, which reflects a stratification of the F-theory complex structure moduli space in terms of the vector-like spectrum. Furthermore, these insights provide an algorithmically efficient way to estimate the possible cohomology dimensions across the entire parameter space.

arXiv:2007.00009v1 [hep-th] 30 Jun 2020


Introduction
The spectrum of light chiral particles is a defining feature of any four-dimensional quantum field theory. Their precise number affects aspects such as the moduli space of vacua, or the behavior of the theory under RG flow. Moreover, they are also of paramount importance to phenomenology, in particular when it comes to models of beyond-the-Standard-Model physics. Therefore, to be able to draw formal and phenomenological lessons from string theory about 4d field theories, one needs efficient methods to compute the spectrum in compactification scenarios.
From an effective field theory perspective, the chiral excess $\chi(R)$, i.e., the difference between the numbers of chiral and anti-chiral modes of the same matter representation $R$, is a discrete parameter, whereas the individual number of light (anti-)chiral modes depends on continuous mass parameters. In string theory, this is reflected by the fact that $\chi(R)$ is typically a topologically protected quantity, whereas the (perturbative) mass parameters are captured by continuous deformations, or moduli, which for certain values can cause a pair of chiral and anti-chiral modes (a vector-like pair) to become massless.
In many string compactification scenarios, we know in principle what the relevant computations are: massless fields are zero modes of some differential operators on the internal space, and therefore counted by appropriate sheaf cohomologies. However, these computations are often so complicated that in practice they can only be carried out explicitly for toy models, or for specialized values of the deformation parameters. On the other hand, an exact understanding of how the cohomologies depend on these parameters is necessary for a complete description of the physical interpretation. The moduli dependence and the possibility of jumps in the massless spectrum were first discussed in the context of heterotic string theory in [1][2][3][4][5][6]. More recently, the complex structure moduli dependence of the cohomology dimensions has been studied in [7,8] and [9] in the context of instanton and perturbative superpotential terms, respectively.
In comparison, an analogous analysis in the context of F-theory compactifications [10] is largely missing and has only been discussed in part in [11]. The main reason is that, unlike the chiral spectrum, which is accessible via intersection theory [12][13][14][15][16][17][18][19][20][21][22][23][24][25], the vector-like spectrum in F-theory depends on a gauge background, which is encoded in mathematically rather intricate objects such as the intermediate Jacobian and Deligne cohomology [26][27][28][29]. Recent progress [30,31] has made the spectrum computationally more accessible. Namely, for a four-dimensional $\mathcal{N} = 1$ F-theory compactification on an elliptically fibered Calabi-Yau fourfold $\pi : Y_4 \to B_3$ with a given gauge background, the massless spectrum of chiral particles in representation $R$ can be counted by certain line bundle cohomologies $h^i(C_R, L_R)$, $i = 0, 1$, on complex curves $C_R \subset B_3$ in the base, the matter curves. Given a compact model with a fixed gauge background, $C_R$ and $L_R$ are specified by global data in terms of polynomials on $B_3$, whose coefficients are (parts of) the complex structure parameters of $Y_4$. In this case, one can model the line bundle as a coherent sheaf on $B_3$, whose cohomology computation can be systematized in a computer algebra system [32]. While this algorithm can be applied to a broad class of global F-theory models, the calculations for almost all phenomenologically interesting examples overburden even super-computers specifically designed for such tasks. The reason is that here, and in fact in many cohomology computations using commutative algebra or computational algebraic geometry, we need to compute Gröbner bases, whose computational complexity scales extremely poorly.
The introduction of ideas from Big Data and machine learning (ML) to string phenomenology [33][34][35][36] provides new perspectives; see [37] for an introduction and comprehensive overview. One advantage that a trained algorithm provides is that it recognizes more subtle patterns without the need of a complete, "microscopic" understanding of the task. In particular, recent studies suggest that supervised learning can be used to predict line bundle cohomologies in string compactifications [35,38,39]. One may be tempted to apply these techniques, which are mostly motivated by heterotic compactifications, directly to F-theory. However, there is a significant difference in the way the line bundle data are specified in global heterotic vs. F-theory models. In heterotic examples, the line bundles are typically given in a "canonical" way, namely as an element of the Picard group $\mathrm{Pic}(X)$ of the underlying manifold $X$. This was used, e.g., in [40,41] to derive formulae for line bundle cohomologies in terms of topological indices.
However, in the F-theory setting, there is no straightforward way to extract even the structure of the Picard group of $C_R$, given its polynomial description. Likewise, because the same data specifies $L_R$ essentially as a sum of points $p_i$ on $B_3$ that also lie on $C_R$, it is by no means obvious whether, say, $p_1 - p_2$ is trivial or not on $C_R$. What makes the situation particularly challenging is that, by varying the complex structure parameters, the structure of $\mathrm{Pic}(C_R)$ as well as the points specifying $L_R$ will change. Together with the fact that we simply do not have a large data set of non-trivial F-theory examples, it is a priori unclear whether we could train an algorithm that reliably predicts the cohomologies for realistic models with arbitrary complex parameters.
Instead, we will use machine learning techniques on less complex examples to gain some intuition for circumstances under which line bundle cohomologies jump. Physically, this is already interesting as such a jump can engineer one or possibly more massless vector-like pairs in situations where one generically expects none. Even if the trained algorithm does not perform perfectly, understanding its strategy can provide a guiding principle for the behavior of the vector-like spectrum in non-trivial examples. For this reason, we focus on white-box machine learning techniques, in particular on decision trees.
To fully understand the results of the machine learning, we further employ "formal" techniques from algebraic geometry, in the form of Brill-Noether theory. This allows us to identify "microscopically" the sources for jumps in cohomology, either from the curve $C_R$ or the line bundle $L_R$ becoming non-generic. With these insights, we provide an algorithmic way to estimate the admissible numbers of vector-like pairs over the entire parameter space of a matter curve in a global F-theory model with given gauge background. Furthermore, our analysis also reveals a convenient diagrammatic way to encode the stratification on the parameter space induced by the number of vector-like pairs. We believe that this is progress towards understanding the full complex structure dependence of the vector-like spectrum in global F-theory models.
The paper is organized as follows. In section 2 we discuss our machine learning approach. Using the exact methods implemented in [42], we generate a database [43] of cohomologies of pullback line bundles on hypersurface curves in $dP_3$. Interpreting these results with decision trees, we find that curve splittings typically lead to jumps in the vector-like spectrum. In section 3, we demonstrate that such curve splittings provide a practical way to engineer jumps in a global F-theory GUT-model. To investigate the origin of these jumps, we turn in section 4 to algebraic and analytic techniques. We find a unified perspective on jumps due to curve splittings and non-generic line bundles described by Brill-Noether theory, and introduce a diagrammatic way to illustrate the natural stratification of the complex structure parameter space in terms of the vector-like spectrum. In section 5, we present a refined analysis of jumps due to curve splittings. This rests on a procedure to count the global sections by gluing "local contributions" along intersections of curve components, which leads to two interesting results: First, we are able to formulate sufficient conditions for jumps of vector-like spectra. Second, we can propose an algorithmic $h^0$ estimate, which relies mostly on topological data, and hence provides a quick, approximate scan of the vector-like spectrum over the entire parameter space of a matter curve. In contrast to currently existing exact methods, such as [42], our implementation [44] has a much lower demand of computational resources and run times.

Introduction to Decision Trees
We are interested in tuning complex structure moduli to engineer jumps in the dimensions of sheaf cohomologies over complex curves. It is a priori not clear how to efficiently identify these subloci in complex structure moduli space. In order to state (at least) necessary conditions for jumps to occur, we address the problem using ML. Since we are interested in interpreting the results of the ML algorithm, we resort to white-box models, in particular to binary decision trees.
In more detail, we use binary decision trees as classifiers in supervised machine learning, following the notation and conventions of [37]. Supervised learning means that we have a set of inputs $x_i^\mu$ (called features) together with associated labels $y_i$, where $i = 1, \ldots, N$ counts the feature-label pairs, and $\mu = 1, \ldots, F$ counts the $F$ features of each input. This set of feature-label combinations is divided into a train set and a test set (typically around 90 percent of the pairs are assigned to the train set and 10 percent to the test set). Using the train set, an algorithm is trained to learn a map from the features to the labels. The training consists of adjusting parameters of the algorithm to optimize the map. This is typically done by minimizing the loss, which is a measure for how well the algorithm reproduces the labels. Once training ends, the algorithm is tested on the test set. This is necessary in order to see how well it performs on (hitherto unseen) data. If the test set has been chosen generically enough, performance on the test set will serve as an indicator for how well the trained algorithm will perform on new data.
After this general discussion, let us describe these steps in the context of binary decision trees. Trees are data structures that appear abundantly in computer science. They can be thought of as acyclic, directed, connected graphs with a unique root vertex (in trees, vertices are called nodes). In binary trees, each node has either zero or exactly two outgoing edges, each of which is connected to a unique node. These two subnodes are called child nodes, and the original node is called parent node. A node with no children is called a leaf node.
A decision tree expects numerical features $x^{(0)}_i$. It then introduces boolean splitting criteria of the type $x^{(0)}_i \le \kappa_i$ for some constant $\kappa_i \in \mathbb{R}$. All data that satisfy this criterion are assigned to one child node, while all data that do not satisfy the criterion are assigned to the other child node. The tree is now built recursively by splitting each child node according to some other feature, $x^{(0)}_j \le \kappa_j$, etc. This procedure segments feature space (which is in our case $\mathbb{R}^F$) along hyperplanes $x_i = \kappa_i$ with the goal to find regions such that all inputs in that region belong to the same class.
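The train/test workflow described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names `train_test_split` and `accuracy`, the fixed seed, and the toy data are hypothetical and not taken from the paper's implementation.

```python
import random

def train_test_split(pairs, test_fraction=0.1, seed=0):
    """Shuffle feature-label pairs and split them into a train and a test set."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predict, pairs):
    """Fraction of pairs whose label is reproduced by the map `predict`."""
    if not pairs:
        raise ValueError("accuracy is undefined on an empty set")
    correct = sum(1 for x, y in pairs if predict(x) == y)
    return correct / len(pairs)

# Hypothetical toy data: the label is 1 exactly when the first feature is positive.
data = [((i - 50, i % 3), int(i - 50 > 0)) for i in range(100)]
train, test = train_test_split(data)

# A map that has "learned" the rule reproduces all labels on unseen data.
learned_map = lambda x: int(x[0] > 0)
print(len(train), len(test))
print(accuracy(learned_map, test))
```

The test set is held out before any fitting takes place, so the reported accuracy measures performance on data the map has not seen.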
At each node, it is checked how many of the data carry which label. For single membership classification problems, which is what we will be using, the labels are just the different classes which the input feature vector belongs to. A typical loss function is the Gini impurity of a node, which measures how "impure" the data at that node actually is, i.e., how many features with different classes are in the region in feature space corresponding to this node. Denoting the set of features in the region of node $a$ by $N_a$, we find for $K$ classes the fraction of elements that belong to a class $y_k$, $k = 1, \ldots, K$, via

$$p_k^{(a)} = \frac{1}{|N_a|} \sum_{x_i \in N_a} \delta_{y_i, y_k}. \qquad (2.1)$$

The Gini impurity $G_a$ at node $a$ can then be written as

$$G_a = \sum_{k=1}^{K} p_k^{(a)} \left(1 - p_k^{(a)}\right) = 1 - \sum_{k=1}^{K} \left(p_k^{(a)}\right)^2.$$

In particular, if all elements of $N_a$ belong to the same class, $G_a = 0$. In such a case, the node is turned into a leaf, since no further splits are necessary. The decision tree is now trained by starting from the root node and trying to split by any of the $F$ features. For $\kappa_i$, one tries all intermediate values between consecutive values of feature $i$. The solution that leads to the lowest Gini impurity at the child nodes is accepted, and the procedure is repeated for the two child nodes and the remaining features, etc.
In cases where the map from the input to the labels is not one-to-many, one can eventually reach a perfect classification, if need be with a single element in each region. Typically, this is undesired and hence one stops splitting a node if there are fewer than some fixed number of elements in its corresponding region. Turning this around, if the minimal number at which a node is split is set to 2, and if the tree does not find a solution where all leaves have Gini impurity zero, this means that the map defined by the input-label pairs is one-to-many, i.e., even all features combined are not sufficient to distinguish between the class labels.
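A single training step of this kind can be sketched as follows. The sketch assumes the standard form of the Gini impurity, $G = 1 - \sum_k p_k^2$, and scores a candidate split by the size-weighted average impurity of the two child nodes, which is the usual CART convention and an assumption here. The function names `gini` and `best_split` and the toy data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature index, threshold, weighted child impurity) of the best split.

    Candidate thresholds are the midpoints between consecutive distinct values
    of each feature. Returns None if no feature takes two distinct values.
    """
    n = len(rows)
    best = None
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            kappa = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= kappa]
            right = [y for row, y in zip(rows, labels) if row[f] > kappa]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, kappa, score)
    return best

# Hypothetical toy data: feature 0 is integer-valued, feature 1 is uninformative.
rows = [(3, 0), (5, 1), (6, 0), (9, 1)]
labels = [0, 0, 1, 1]
print(gini(labels))
print(best_split(rows, labels))
```

Because thresholds are midpoints between observed values, an integer-valued feature naturally produces half-integer decision boundaries. A full tree is obtained by applying `best_split` recursively to the two child nodes until every node is pure or too small to split.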
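Whether a perfect classification is possible at all can be tested without training a tree, by checking whether any feature vector occurs with more than one label. The following is a minimal sketch; the function name `ambiguous_inputs` and the toy data are hypothetical.

```python
from collections import defaultdict

def ambiguous_inputs(rows, labels):
    """Return the feature vectors that occur with more than one class label."""
    seen = defaultdict(set)
    for row, y in zip(rows, labels):
        seen[tuple(row)].add(y)
    return {row: ys for row, ys in seen.items() if len(ys) > 1}

# Hypothetical toy data with a single integer feature.
rows = [(48,), (48,), (48,), (5,)]
labels = [1, 1, 0, 0]
print(ambiguous_inputs(rows, labels))
```

If the returned dictionary is non-empty, no tree built on these features can reach zero Gini impurity in every leaf, regardless of its depth.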

Divisors and line bundles on $dP_3$
While in the general F-theoretic setup, matter curves $C_R$ are a priori defined on a threefold $B_3$, in most models there is a distinguished surface $S \subset B_3$ that is wrapped by the 7-branes supporting a non-abelian gauge theory, in which the matter curve sits. A part of the complex structure moduli then parametrizes deformations of the curve inside $S$, which will in general affect the vector-like spectrum. These deformations can be described by pulling back all defining polynomials on $B_3$ onto $S$ and then considering their coefficients in terms of the homogeneous coordinates on $S$.
For our data collection, we will mimic such a "pulled back" description by focusing on curves embedded inside the del Pezzo surface $dP_3$. One advantage of this choice is that $dP_3$ has a toric description in terms of a reflexive polygon, which simplifies many computations. Another one is that it fits the setup for section 3, where we consider an F-theory toy model with non-abelian gauge degrees of freedom localized precisely on a $dP_3$ surface.
To set the notation, we denote the toric coordinates of $dP_3$ by $x_i$, $i = 1, \ldots, 6$. They are graded by homogeneous scalings with associated divisor classes, which are summarized in table (2.3), whose columns give the divisor classes of the coordinates' vanishing loci. The Stanley-Reisner ideal is given in (2.4), and the anti-canonical class is

$$\overline{K}_{dP_3} = 3H - E_1 - E_2 - E_3.$$

The independent intersection numbers are

$$H^2 = 1, \qquad E_i \cdot E_j = -\delta_{ij}, \qquad H \cdot E_i = 0.$$

In order to simplify the notation, we introduce the short-hand notation

$$(a; b, c, d) := aH + bE_1 + cE_2 + dE_3.$$

We consider curves $C = V(P)$ cut out by a polynomial

$$P = \sum_{i=1}^{d} c_i \, m_i(x_1, \ldots, x_6),$$

where $m_i(x_1, \ldots, x_6)$ are monomials of appropriate multi-degree under the grading in (2.3). Importantly, the coefficients $c_i$ parametrize the shape of the curve and thus model (parts of) the complex structure parameters of a global F-theory compactification. The (arithmetic) genus of the curve depends only on the divisor class $[C]$ of the curve (equivalently, the multi-degree of the monomials in $P$) and is given via the adjunction formula as

$$g = 1 + \frac{1}{2} \, [C] \cdot \left( [C] - \overline{K}_{dP_3} \right).$$

Next, we also need to specify a line bundle $L$ on $C$. Again, instead of focusing on the most general setup, where $L$ is directly specified by a set of points on $C$, we consider the slightly simpler cases where $L$ is a pullback of a line bundle $\mathcal{O}_{dP_3}(D)$ on $dP_3$:

$$L = \mathcal{O}_{dP_3}(D)|_C. \qquad (2.8)$$

One can then think of the points as the (weighted) intersections $\{a_i p_i\}$ between $C$ and a generic representative in the class $D$. Note that in this case, another representative of $D$, intersecting $C$ at $\{b_j p_j\}$, necessarily must give the same divisor class on $C$, i.e., $\{a_i p_i\} \sim \{b_j p_j\}$ are linearly equivalent on $C$. However, in general we cannot say anything about linear equivalences among any two of the points. Therefore, we expect, and also will find, that even for pullback line bundles there can be special divisor alignments, i.e., $p_1$ and $p_2$, say, move into special positions when we deform $C$, thus leading to jumps in the cohomology.
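The topological data entering this setup is easy to evaluate by hand or in code. The sketch below assumes the standard basis $(H, E_1, E_2, E_3)$ of divisor classes on $dP_3$ with $H^2 = 1$, $E_i \cdot E_j = -\delta_{ij}$, $H \cdot E_i = 0$, the anti-canonical class $3H - E_1 - E_2 - E_3$, and adjunction in the form $2g - 2 = C \cdot (C + K)$. It further assumes that a tuple $(a; b, c, d)$ stands for $aH + bE_1 + cE_2 + dE_3$, and that the line bundle divisor $(1, 2, -2, -1)$ of the genus three example used later in the text is read in the same basis. The function names are hypothetical.

```python
ANTI_CANONICAL = (3, -1, -1, -1)  # 3H - E1 - E2 - E3

def intersect(d1, d2):
    """Intersection number on dP3 in the basis (H, E1, E2, E3).

    Uses H.H = 1, Ei.Ej = -delta_ij and H.Ei = 0.
    """
    return d1[0] * d2[0] - sum(a * b for a, b in zip(d1[1:], d2[1:]))

def arithmetic_genus(curve_class):
    """Arithmetic genus from adjunction, 2g - 2 = C.(C + K) with K = -ANTI_CANONICAL."""
    two_g_minus_two = intersect(curve_class, curve_class) - intersect(curve_class, ANTI_CANONICAL)
    if two_g_minus_two % 2 != 0:
        raise ValueError("C.(C + K) must be even")
    return 1 + two_g_minus_two // 2

def euler_characteristic(divisor, curve_class):
    """chi = h^0 - h^1 of the pullback bundle, by Riemann-Roch: deg - g + 1."""
    degree = intersect(divisor, curve_class)
    return degree - arithmetic_genus(curve_class) + 1

print(arithmetic_genus((3, -1, -1, -1)))    # anti-canonical curve
print(arithmetic_genus((4, -1, -1, -1)))    # class of the genus three example
print(arithmetic_genus((10, -3, -3, -4)))   # class of the curve used in section 3
print(euler_characteristic((1, 2, -2, -1), (4, -1, -1, -1)))
```

These quantities are fixed by the classes $[C]$ and $D$ alone. The individual values $h^0$ and $h^1$ are only constrained by their difference and can therefore vary with the coefficients $c_i$.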

Generating the data set
We generate training data by picking 6 different curve classes $[C]$ with genus $1 \le g \le 6$. For each class we consider several line bundles $L$ on $dP_3$ and compute (using techniques from [32]) the cohomologies $h^i(C(c), L|_{C(c)})$, where we vary the curve $C(c)$ by considering all possible combinations of $c_i \in \{0, 1\}$, $i = 1, \ldots, d$, for the coefficients. This way, we calculate cohomologies of $L$ pulled back to $2^d - 1$ genus $g$ curves in the class $[C]$. While this seems to be a very limited choice, it nevertheless reveals enough structures to correlate jumps in cohomology with degenerations of the geometry. On the other hand, it also introduces some bias in the data. For example, a common way the curve degenerates is if all monomials in the defining polynomial share a common variable; this happens frequently if many $c_i$ are set to 0. However, for certain polynomials, restricting to $c_i \in \{0, 1\}$ misses possible factorizations in which the factors are not just a single variable. We will see later that we can easily generalize the interpretation based on our data with algebraic methods to these cases as well.
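The scan over coefficient choices, and the detection of the simplest degeneration where a single variable factors out, can be sketched as follows. The three-variable toy polynomial is hypothetical and is not one of the $dP_3$ polynomials from the database. Monomials are represented by exponent tuples, and the function name `common_variables` is an assumption.

```python
from itertools import product

def common_variables(monomials, coefficients):
    """Indices of variables dividing every monomial with non-zero coefficient.

    Monomials are exponent tuples; a returned index i means that x_i factors
    out of the polynomial P, so V(x_i) splits off from the curve V(P).
    """
    active = [m for m, c in zip(monomials, coefficients) if c != 0]
    n_vars = len(monomials[0])
    return [i for i in range(n_vars) if all(m[i] > 0 for m in active)]

# Hypothetical toy polynomial P = c1*x0^2 + c2*x0*x1 + c3*x1*x2.
monomials = [(2, 0, 0), (1, 1, 0), (0, 1, 1)]

# All coefficient choices in {0, 1}^d except the zero vector: 2^d - 1 polynomials.
choices = [c for c in product((0, 1), repeat=len(monomials)) if any(c)]
print(len(choices))

for c in choices:
    print(c, common_variables(monomials, c))
```

Factorizations into factors that are not single variables are invisible to this check, which is the bias discussed above.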
For this data set, we then compute/collect the following features for each choice of line bundle $L$ on each curve $C$ with coefficients $c_i$:
F1) The coefficients $c_i$ that define the curve.
F2) The genus of the curve.
F8) The genera of the split components.
F9) The intersection numbers among the split components.
Note that all of this data is numerical (the true/false features are encoded as 1/0). We aggregate the features F4-F9 into a single feature called the split type. We want to consider two curves as identical if their features F4-F9 are identical (up to relabeling the individual components). In order to check this, we would in principle have to check all permutations of all split components and see whether any of them have the same data. Since this becomes prohibitively expensive, we perform the following necessary checks:
• Are the data F4 and F5 identical for the two curves?
• Are the data F6-F8 identical as sets for the two curves? This can be checked by ordering the tuples and comparing them, which is much faster than checking actual permutations.
• Is the determinant of the intersection matrix in F9 identical for the two curves? Note that the determinant is permutation invariant.
However, at that point we do not check whether the permutations that make all sets match are actually the same.
Curves which are identical under these checks are assigned the same integer that encodes the split type. Equipped with this data, we generate four different data sets which we use to train the decision trees and compare the results. In the first, we use the coefficients $c_i$ as features and assign a label of 0 if the cohomology dimension of $H^0(C(c), L)$ has the generic (i.e., the lowest) value and a label of 1 if there is a jump. Note that at this point, we only classify the curve according to whether a jump occurs, but not according to how large the jump is. For the second data set, we use the same labels, while the features are taken to be the topological intersection numbers between the curve components and the line bundle divisors. For the third data set we use the split type as explained above. Finally, for the fourth data set, we use both the split type and the topological intersection numbers between the curve components and the line bundle divisor as features. In addition, we perform a train:test split of 90:10 for all four data sets.
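The necessary checks and the assignment of split-type integers can be sketched as below. The record layout is hypothetical: `scalars` stands in for the data F4 and F5, `per_component` for the per-component tuples F6-F8, and `intersections` for the intersection matrix F9. Ordering the integers by the number of components is an assumption about how small integers could be made to correspond to few components; it is not a description of the authors' code.

```python
def determinant(matrix):
    """Integer determinant by cofactor expansion (adequate for small matrices)."""
    n = len(matrix)
    if n == 0:
        return 1
    if n == 1:
        return matrix[0][0]
    total = 0
    for j, entry in enumerate(matrix[0]):
        if entry == 0:
            continue
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * entry * determinant(minor)
    return total

def signature(curve):
    """Summary of a curve that is invariant under relabeling its components."""
    return (
        tuple(curve["scalars"]),
        tuple(sorted(tuple(c) for c in curve["per_component"])),
        determinant(curve["intersections"]),
    )

def assign_split_types(curves):
    """Assign equal integers to equal signatures, fewer components first."""
    signatures = [signature(c) for c in curves]
    ordered = sorted(set(signatures), key=lambda s: (len(s[1]), s))
    index = {s: i for i, s in enumerate(ordered)}
    return [index[s] for s in signatures]

# Hypothetical records: a and b differ only by relabeling their two components.
a = {"scalars": (2, 1), "per_component": [(0,), (1,)], "intersections": [[-1, 2], [2, 0]]}
b = {"scalars": (2, 1), "per_component": [(1,), (0,)], "intersections": [[0, 2], [2, -1]]}
c = {"scalars": (1, 0), "per_component": [(3,)], "intersections": [[13]]}
print(assign_split_types([a, b, c]))
```

Equal signatures are necessary but not sufficient for two curves to agree up to relabeling, since the sorted tuples and the determinant are compared independently of each other.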

Decision Trees to learn cohomology jumps
Training the decision trees only takes a few seconds on a modern desktop computer. We train a separate decision tree for each line bundle and each of the four data sets. It is instructive to compare the performance for all four data sets on both the train and the test set.
The results for the accuracy of the trained trees on the test set are summarized in Figure 1. One notices that the accuracy for all data sets improves with the genus of the curve. This is due to the fact that the size of the data set grows with the genus: While the genus 1 curve we are considering has only 7 coefficients $c_i$ and hence only $2^7 - 1 = 127$ data points per line bundle, the genus 6 curve has $2^{18} - 1 = 262143$ data points.
For the blue data points, which use the coefficients $c_i$ as features, we find that the decision tree performs best. This is to be expected, since this is the finest feature set, i.e., the one with the most information, out of the four feature sets we studied. Indeed, the trees reach an accuracy of essentially 1 as soon as the training set becomes large enough (there are 3685 points in the training set for genus 3). The other three data sets perform worse, but still reach high accuracies. Using just the split type as a feature, for the larger genus cases where enough data is available, we reach accuracies around 80 to 85 percent. Using the intersection numbers, accuracies around 94 percent are obtained. Lastly, combining the split type and the intersection numbers improves on the results obtained when either is used individually, to an accuracy of around 97 percent. This means that the two features contain different types of information, which the tree can use in order to improve its prediction when given access to both.
One can learn more about the data by also analyzing the performance on the training set, as explained in Section 2.1. Indeed, we find that, when not imposing constraints on the tree, the accuracy on the train set when using the coefficients as features is always 100 percent. This is not surprising, since the coefficients uniquely identify each case and hence the tree can learn a sequence of splits that puts each data point in the correct leaf node (if necessary, this leaf might only contain this single data point). For the other data sets, we find that the performance on the train set is already below 100 percent. Hence, the features are not enough to decide whether a jump in cohomology occurs, not even in principle. Let us illustrate this by looking at the decision tree trained on the full data set for a genus three curve $D_C = (4; -1, -1, -1)$ inside $dP_3$ with line bundle $D_L = (1, 2, -2, -1)$, cf. Section B.1.5. We give the full decision tree in Figure 2. Looking at the root node, we see that for this bundle, there are 4095 different data points ("samples"). Out of these, 1791 exhibit a cohomology jump for this line bundle, while 2304 do not. The tree assigns a class label to this (non-leaf) node based on the majority, which is "no jump". However, there are almost as many data points with a jump as there are data points without, which is why the uncertainty is high. This is encoded in the light blue color: the more certainly a node predicts no jump, the darker blue it is colored. Similarly, the more certainly there is a jump, the darker orange it is.
Recall that the integers labelling the split type (based on the features F4-F9) are by construction small if the number of components the curve splits into is small. Hence, small split types correspond to irreducible curves, or curves with only few split components. We expect such curves to be close to generic (in a sense that will be made mathematically more precise in Section 4), hence the cohomologies should also take generic values. Indeed, we observe that the first split is performed according to whether or not the split type is smaller than 5.5. This first split already gives a good indicator in the sense that out of the 1710 training data points that have a split type of 5 or smaller, 85 percent actually do not have a jump in their cohomologies. This also illustrates that decision trees can be used for feature selection: important features that are good indicators for the classes tend to be used for splitting higher up in the tree, while less important features are used further down the tree (or not at all, if they do not have any predictive power for the class membership). Now, in our case, we only have a single feature, but it is a composite feature of several quantities. The fact that the first split does not occur around the median (which would be 27) but at a much smaller value indicates that the number of split components is a good criterion to distinguish jumps.
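As a worked check of how uncertain the root node is, one can evaluate its Gini impurity from the quoted counts, assuming the standard two-class form $G = 1 - \sum_k p_k^2$:

```latex
\begin{align}
  p_{\text{jump}} &= \frac{1791}{4095} \approx 0.437 , \qquad
  p_{\text{no jump}} = \frac{2304}{4095} \approx 0.563 , \\
  G_{\text{root}} &= 1 - p_{\text{jump}}^2 - p_{\text{no jump}}^2
                   = 2 \, p_{\text{jump}} \, p_{\text{no jump}} \approx 0.492 .
\end{align}
```

For two classes the impurity is at most $0.5$, which is reached for an even split, so the root node is close to maximally impure.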
While the split types are integers, the tree always chooses half-integer decision boundaries. The reason is that the tree does not know that the feature only takes integer values. Hence, splitting in the middle between the feature values that appear in the train set will allow the most slack in either direction when the tree is presented with unseen data.
By focusing on the leaf nodes, we can also see that the tree is not classifying the data perfectly, not even the training data. Indeed, many nodes have a non-zero Gini impurity, i.e., both curves with and without jumps share the same split type associated with this leaf node. Looking for example at the bottom right leaf node, we see that three curves have the same split type (with value 48). However, two of these have a jump while one does not. This means that the topological data F4-F9 used to construct the split type is not enough to decide whether or not a cohomology jump occurs.

Jumps from curve splittings
We have seen that the decision tree trained on a combination of split types and intersection numbers performs very well. Moreover, the tree trained with just the split types splits on small split types first. This suggests that there is a tight correlation between changes in the topology of the curve and jumps in the line bundle cohomology. In particular, the data set has an abundance of cases with jumps where the curve $C$ splits off one or more rigid components: For 78 (about 95%) of the 82 pairs of geometries $D_C$ and line bundles $D_L$ considered in our database, we find that we can split off a rigid component $E$, i.e., $C \to \tilde{C} \cup E$, such that the number of global sections exceeds its generic value $h^0_{\min}$,

$$h^0\left(\tilde{C} \cup E, L|_{\tilde{C} \cup E}\right) > h^0_{\min}.$$

Put differently, for almost all pairs $(D_C, D_L)$ in our database, there exists a rigid divisor such that splitting off this rigid divisor from the curve $C$ leads to a jump in the number of global sections on that curve. At the same time, for a given combination $(D_C, D_L)$, we observe a jump of $h^0_{\min}$ only for a subset of all possible splits $C \to \tilde{C} \cup E$, suggesting that $E$ and $D_L$ must have some correlation in order for the cohomology to enhance. We list the details of these splittings and jumps in appendix B.1.
It is obvious that the jumps stemming from rigid component splittings can be associated with the curve $C$ becoming non-generic. While per se not unexpected, the machine learning process reveals these features without explicitly "knowing" algebraic geometry.
It is important in this context to address the bias in the data coming from considering only values of $\{0, 1\}$ for the coefficients. Namely, within the data, we only observe jumps associated with splittings of rigid components. Naively, one might conclude that rigidity of a split component is a necessary condition. However, as we already stressed in the beginning of section 2.3, setting enough coefficients to 0 usually factors out one of the homogeneous coordinates $x_i$. The corresponding curve splitting then always involves the toric divisor $V(x_i)$, which on a $dP_3$ is rigid for any $i = 1, \ldots, 6$. Therefore, the strong correlation between a rigid component and a jump is likely due to the bias in the data.
Indeed, we will find in sections 4 and 5 with insights from algebraic geometry, that the main source for cohomology jumps in cases of curve splittings is actually insensitive to components being rigid. We will also supplement a concrete example in section 4.1.3 where we find a jump from non-rigid curve splittings. Furthermore, we will combine these arguments with the intuition about curve splittings we gained through the data to phrase a sufficient condition for a jump in cohomology to occur in terms of topological data only. We will discuss this idea in section 5.

Unpredicted jumps
The fact that the decision tree cannot predict all jumps hints towards sources for additional sections (and hence cohomology jumps) beyond curve splitting. Within the data set, we observe that on rare occasions, the curve remains smooth despite a deformation which induces a jump.
For illustration purposes, consider again the genus three curve with the line bundle discussed above. Generically, this genus 3 curve is cut out by the polynomial (2.10). In our database, we have computed the number of global sections for this line bundle for coefficient choices $c \in \{0, 1\}^{12} \setminus \{0\}$. For these 4095 curves, we find
• $h^0 = 1$: 2304 (56.3%),
• $h^0 = 2$: 1664 (40.6%),
• $h^0 = 3$: 127 (3.1%).
Our database indicates that a jump to $h^0 = 3$ occurs whenever $c_1 = c_2 = c_3 = c_{11} = c_{12} = 0$. This corresponds to a splitting (2.11). The majority of the cases with $h^0 = 2$ are where either $V(x_2)$ or $V(x_5)$ splits off, each being a rigid $\mathbb{P}^1$. This is in line with the above observation. However, we also have instances (about 9% of all curves with $h^0 = 2$) where the curve remains smooth and irreducible. Despite having $h^0 = 2$, the split type features cannot distinguish these cases from the generic setup with $h^0 = 1$, thus leading to an imperfect performance of the decision tree.
While we will come back to a detailed discussion of this phenomenon and the associated algebraic description in terms of Brill-Noether theory in section 4.2, it is evident that these cases of jumps are associated to the line bundle $L$ on $C$ becoming non-generic. Moreover, we also observe that such Brill-Noether-type jumps can sometimes produce values of $h^0$ that cannot be obtained by splitting off rigid curve components. This becomes particularly important in F-theory models, as we will discuss now.
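The counts quoted for this example can be cross-checked by enumerating the coefficient choices directly. The sketch below takes the statement that $h^0 = 3$ occurs whenever $c_1 = c_2 = c_3 = c_{11} = c_{12} = 0$ at face value and simply counts how many of the $2^{12} - 1$ non-zero coefficient vectors satisfy this condition; it does not compute any cohomology.

```python
from itertools import product

d = 12
# All coefficient vectors in {0, 1}^12 except the zero vector.
curves = [c for c in product((0, 1), repeat=d) if any(c)]

# Positions are 1-based in the text: c_1 = c_2 = c_3 = c_11 = c_12 = 0.
zero_positions = (1, 2, 3, 11, 12)
h0_three_candidates = [c for c in curves if all(c[i - 1] == 0 for i in zero_positions)]

print(len(curves))
print(len(h0_three_candidates))
print(round(100 * 1664 / len(curves), 1))  # share of curves with h^0 = 2, in percent
```

The count of candidates equals $2^7 - 1$, since seven coefficients remain free and at least one of them must be non-zero.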

Application: F-theory model building
In the previous section, we have used machine learning techniques to gain some intuition on how line bundle cohomologies jump under complex structure deformations. While we will discuss the underlying "precise" description of these various sources of jumps in the next section, we would like to show that these "rules of thumb" inferred from the white-box machine learning results can be applied directly in string phenomenology. To this end, we consider an F-theory toy model and exemplify how curve splittings help "controlling" the number of vector-like pairs. Let us first summarize the relevant features of the model, whose explicit construction is detailed in [32]. The model has an $SU(5)$ gauge symmetry localized on a $dP_3$ surface inside the compact base threefold $B_3$, which itself is a smooth hypersurface inside a toric variety. There are matter states in the representations $\mathbf{10}_1$, $\mathbf{5}_3$ and $\mathbf{5}_{-2}$, where the subscripts denote the charges under an additional $U(1)$ gauge symmetry. Each representation $R$ resides on a curve $C_R$ inside the $dP_3$ surface. One can find a globally consistent vertical $G_4$-flux configuration that induces the chiral spectrum (3.1). In the following, we will analyze in detail the vector-like spectrum in this setup.

Geometry of curves
In the global geometry, the matter curves C R are complete intersections involving the dP 3 surface and another divisor on the base B 3 . As discussed in [32], a generic choice of the complex structure parameters for the elliptic fourfold also induces a generic curve C R on dP 3 . In other words, we can parametrize them in terms of global denotes the divisor class of the curve inside dP 3 . Furthermore, the data defining the zero mode spectrum in a global F-theory model can be extracted from the G 4 -configuration and packaged into a line bundle (or, more generally, a coherent sheaf) for each curve C R [30,31]. For the case at hand, the flux inducing the chiral spectrum (3.1) induces line bundles which are pullbacks of various bundles on dP 3 to the curves [32].
Using the same notation as in the previous section⁷, the curves with their genus and their corresponding zero-mode counting bundles are: Note that the cohomologies on $C_{10_1}$ and $C_{5_{-2}}$ are fixed by the exactness of the corresponding Koszul resolutions, and hence no complex-structure-dependent jumps are possible.⁸ For the representation $5_3$, no such arguments apply, and thus we expect the number $n$ of light vector-like pairs to vary. The curve $C_{5_3} = \{a_{3,2} = 0\}$ is the vanishing locus of a polynomial with class $(10; -3, -3, -4)$, whose explicit expression in terms of the toric $dP_3$ coordinates $x_i$ is given in appendix A, cf. (A.58). With the curve having genus 24, it would be almost impossible to perform a scan by varying all the complex structure parameters ((A.58) has 44 coefficients), as we did previously for the low genus cases. However, the intuition we gained from the low genus examples will help us to "control" $n$, that is, to efficiently find suitable geometries realizing the desired vector-like spectrum.

Engineering jumps in cohomology
What we have learned from the machine learning results is that the line bundle cohomology is more likely to jump if the curve in question is reducible. Though we have already emphasized that rigidity of the components is not necessary, the abundance of toric coordinates makes it handy to factor out various different curves which in this case happen to be rigid. For the purpose of finding a concrete realization of a particular jump in the vector-like spectrum, these rigid factors turn out to be sufficient.
We thus modify the coefficients of the defining polynomial $a_{3,2}$ in (A.58) such that individual toric coordinates $x_i$ of $dP_3$ factor out. Of course, not every such factorization will lead to a jump: the rigid component must in some way receive a "non-trivial contribution", i.e., intersection, from the divisor $D_L$ defining the line bundle. The intuition we gained from the previous section is that a negative intersection of $D_L$ with $V(x_i)$ will lead to a jump. It is then natural to assume that the more rigid components split off, the higher the jumps tend to be. With this intuition, we now proceed to engineer step-wise jumps of the vector-like spectrum.
Using the linear relations (2.3) and intersection numbers (2.5), we easily verify the divisor defining the line bundle, the polynomial factors as a 3,2 = x 6 R 2 , where R 2 is an irreducible polynomial in the class (10; −3, −3, −5). And indeed, a computer-assisted computation with methods from [32] reveals that for this curve C 2 = {x 6 R 2 = 0}, we have We can factor out another factor x 6 from R 2 by setting . In this case, we find a jump by three, To achieve a jump by four, we factorize , with the following choice of complex structure: Then we find Lastly, we also easily construct a model with five vector-like pairs, by setting On this sublocus in complex structure moduli space, the matter curve factorizes as . In this case we have

Single vector-like pair from Brill-Noether theory
The above examples demonstrate how the machine learning intuition led us to a step-wise increase in the number of vector-like pairs by suitable tuning of the complex structure parameters. These jumps occur because the matter curve in question splits into several components. However, such splittings induce a jump from zero vector-like pairs to at least two (or three, or four, or five). If we are interested in models with a single vector-like pair -such as for the Higgs field in MSSM realizations -then we need to look for other effects than curve splitting. As we have seen earlier, such effects are related to the cases not predicted by the trained decision tree. Here, the jumps in cohomology are not due to the curve becoming non-generic, but rather the line bundle. In fact, Brill-Noether theory (to be discussed in the next section, see also appendix A.1) tells us that for the matter curve C 5 3 of genus 24, we expect that a scenario with a single vector-like pair -i.e., one having h i = (16, 1) -to occur on a subvariety of dimension ρ = g − h 0 · h 1 = 8 of the space Jac(C 5 3 ) which parametrizes the line bundles on C 5 3 . Note that the same formula would yield ρ = −10 for jumps by two, and hence no such jumps can occur for a generic C 5 3 . This agrees with the above instances, as each of those requires the curve to become non-generic.
Because of this, engineering the jump by 1 becomes more challenging, and in particular requires additional tools from algebraic geometry. We defer the details of the relevant computations to appendix A and simply remark here that the necessary tuning is (3.11) One can easily verify that the polynomial a 3,2 in (A.58) does not factorize in this case, and that the curve C 5 3 remains smooth. Therefore, the enhancement in cohomology in this case is indeed of Brill-Noether type.
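As a quick cross-check of the Brill-Noether estimate quoted above, the following minimal sketch evaluates $\rho = g - h^0 \cdot h^1$ for the genus-24 curve $C_{5_3}$ with $n$ vector-like pairs. The function name `rho_from_cohomology` and the constant `CHIRAL_INDEX` (read off from $h^i = (16, 1)$, i.e. $h^0 - h^1 = 15$) are illustrative choices and not part of the original computation.

```python
def rho_from_cohomology(g, h0, h1):
    """Brill-Noether number written as rho = g - h0 * h1."""
    return g - h0 * h1


GENUS = 24          # genus of the matter curve C_{5_3}
CHIRAL_INDEX = 15   # h0 - h1, inferred from h^i = (16, 1); an assumption of this sketch

# n = number of vector-like pairs, so (h0, h1) = (CHIRAL_INDEX + n, n)
for n in range(3):
    rho = rho_from_cohomology(GENUS, CHIRAL_INDEX + n, n)
    print(f"n = {n}: rho = {rho}")
# n = 0: rho = 24
# n = 1: rho = 8
# n = 2: rho = -10
```

A negative value signals that the corresponding spectrum cannot occur on a generic curve, in line with the statement that jumps by two require a non-generic $C_{5_3}$.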

Cohomology jumps throughout the moduli space
To put the intuition we gained from machine learning onto more solid grounds, we now apply tools from algebraic geometry to develop a more complete, "microscopic" understanding for the various sources of jumps we encountered in our data. As we will see, the resulting insights lead to a diagrammatic representation of a stratification of the complex structure moduli space of F-theory compactifications induced by vector-like spectra.
As we have alluded to in section 2, based on our database we can essentially distinguish two types of jumps:
1. Jumps due to a non-generic line bundle.
2. Jumps due to a non-generic curve.
This shows that our samplings are very atypical. Namely, true jump loci have lower dimensionality than the full set of parameters. Therefore, jump loci form sets of measure 0 and should never be encountered by a genuinely random sample.
It is central to our discussion that algebraic geometry can bound from below the 'size' of such jump loci. In particular, this is true for jumps due to non-generic line bundles. Such jumps have been analyzed since 1874 in the context of Brill-Noether theory⁹ [47]. Given a generic curve $C_g$ of genus $g$ and an integer $d$, Brill-Noether theory provides an integer $\rho(r, g, d)$ which measures how likely it is that a line bundle $L_d$ of degree $d$ on $C_g$ has $r + 1$ independent non-trivial global sections, i.e., has $h^0(C_g, L_d) = r + 1$.
To formulate this more precisely, first recall that the Jacobian $\mathrm{Jac}(C_g)$ of the curve $C_g$ is isomorphic to $\mathbb{C}^g/\Lambda$, where $\Lambda$ is the full-dimensional period lattice of $C_g$. By the Abel-Jacobi map, equivalence classes of line bundles of degree $d$ form a copy of the Jacobian $\mathrm{Jac}(C_g)$. Let us focus on the subset of the Jacobian formed by all equivalence classes of line bundles of degree $d$ which admit exactly $r + 1$ global sections. Then a lower bound on the dimension of this space is given by the integer
$$\rho(r, g, d) = g - (r + 1)(g - d + r) = g - n_0 \cdot n_1 .$$
In the last equality we use the intuitive notation $n_0 = r + 1$ for the number of global sections and $n_1 = h^1(C_g, L_d)$. Furthermore, we have used that by the Riemann-Roch theorem,
$$n_0 - n_1 = d - g + 1 .$$
Further details on Brill-Noether theory can be found in appendix A.1, and a more complete presentation is given in [48,49]. An important result follows from [50]: If the curve is generic, then line bundles of degree $d$ only admit numbers $r + 1$ of global sections for which $\rho(r, g, d)$ is non-negative. Put differently, there are no line bundles on generic curves with $r + 1$ global sections and $\rho(r, g, d) < 0$. Furthermore, the value of $\rho$ gives a very clear notion of the likelihood to have $r + 1$ sections in terms of a dimension on the "moduli" space of line bundles.
Let us demonstrate this for a line bundle L of degree d = 2 on a curve C g of genus g = 3.
By general theory, the number of sections of this line bundle cannot exceed its degree. Hence, it has 0, 1 or 2 sections. With this information, let us compute $\rho(r, d, g)$ for $d = 2$ and $g = 3$:
$$r = -1: \ \rho = 3, \qquad r = 0: \ \rho = 2, \qquad r = 1: \ \rho = -1 .$$
From this we learn that most line bundles $L$ of degree $d = 2$ on a genus $g = 3$ curve $C_3$ satisfy $h^0(C_3, L) = 0$. Since for these bundles $\rho$ matches the dimension of the Jacobian of $C_3$, we can say that these line bundles are associated to generic points of the Jacobian. Furthermore, we learn that there are line bundles with $h^0(C_3, L) = 1$. However, these are special in the sense that they are associated to a codimension-1 locus in the Jacobian $\mathrm{Jac}(C_3)$. Finally, $\rho = -1$ for $r = 1$ begs for an explanation. This explanation follows from work of Griffiths and Harris [50]: So in particular, on generic curves it holds $G^{r+1}_d = \emptyset$ if and only if $\rho(r, d, g) < 0$. Consequently, we conclude from eq. (A.14) that on a generic genus $g = 3$ curve, there is no line bundle $L$ of degree 2 such that $h^0(C_3, L) = 2$.
Note however, that this does not rule out the possibility that non-generic curves may host such line bundles. In the case at hand, it follows from the theorem of Clifford [50] that hyperelliptic curves H 3 of genus g = 3 admit line bundles L of degree d = 2 and h 0 (H 3 , L) = 2. Note that hyperelliptic curves of genus g > 2 are non-generic. Hence, this points us to jumps of the vector-like spectrum, which originate from non-generic deformations of the curve.
Let us give another such example, which illustrates a jump on a singular curve. To this end, let us consider a line bundle $L$ of degree $d = 5$ on a genus $g = 2$ curve. Then $\chi(L) = 4$ and $h^0(C_2, L) \in \{4, 5\}$. Let us compute $\rho(r, d, g)$ for these two values of global sections:
$$r = 3: \ \rho = 2, \qquad r = 4: \ \rho = -3 .$$
Thus, on a smooth curve of genus $g = 2$, any line bundle of degree $d = 5$ has 4 global sections. Even more, since the degree $d$ is in the stable range, we find 4 global sections for this line bundle on every smooth curve of genus $g = 2$, generic or not. Hence, 5 sections can only be realized on a singular curve. This can be achieved by choosing the curve parameters (which model the complex structure moduli of global F-theory models) such that the curve becomes reducible, and factors into various components which intersect transversely in a number of points. A way to construct global sections on such curves is then as follows: First, consider each component individually and identify which sections they support. Then, by demanding that these sections agree at the intersection points, we glue these local sections to global sections. We will return to this gluing procedure in more detail in section 5.
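The Brill-Noether numbers in the two examples above ($g = 3$, $d = 2$ and $g = 2$, $d = 5$) can be tabulated with a few lines of Python. This is a minimal sketch using the standard expression $\rho = g - (r+1)(g - d + r)$; the function names `brill_noether_number` and `rho_table` are hypothetical helpers introduced for illustration only.

```python
def brill_noether_number(r, g, d):
    """rho = g - (r + 1) * (g - d + r), i.e. g - h0 * h1 with h0 = r + 1."""
    return g - (r + 1) * (g - d + r)


def rho_table(g, d, max_h0):
    """Map each candidate h0 (with h1 = h0 - chi >= 0) to its Brill-Noether number."""
    chi = d - g + 1  # Riemann-Roch: h0 - h1 = d - g + 1
    table = {}
    for h0 in range(max(0, chi), max_h0 + 1):
        table[h0] = brill_noether_number(h0 - 1, g, d)
    return table


print(rho_table(g=3, d=2, max_h0=2))  # {0: 3, 1: 2, 2: -1}
print(rho_table(g=2, d=5, max_h0=5))  # {4: 2, 5: -3}
```

Entries with negative $\rho$ mark section counts that cannot occur on a generic smooth curve and therefore point to non-generic or singular curves.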
In this section, we will take a closer look at the interplay of jumps that occur due to non-genericity of both the line bundle and the curve. In particular, since in global F-theory models both the bundle and the curve depend on the complex structure parameters of the elliptic fibration in the same fashion (namely through the coefficients of its defining polynomials), they should be treated on the same footing, which we can summarize diagrammatically. The following analysis requires, at a technical level, a working understanding of the Koszul resolution of a pullback bundle, its associated long exact sequence in sheaf cohomology, inferring the maps in this long exact sequence from Čech cohomology, as well as a basic understanding of non-reduced curves. For the convenience of the reader, further details are provided in appendix A.

Jumps from curve splittings
We first analyze examples with jumps from curve splittings. We will see that rigidity of the components that split off plays no role in the section counting. The reason why we found in earlier sections that rigid divisors split off is our special choice of setting all coefficients in the polynomial that specifies the curve in $dP_3$ to either zero or one.

Example: one additional section
Setup Let us return to the example of a line bundle on a genus 2 curve discussed above. In more detail, the curve and line bundle are given by where the coefficients $c \in \mathbb{C}^{10}$ form the parameter space of this genus $g = 2$ setup. The line bundle $L(c) = O_{dP_3}(D_L)|_{C(c)}$ satisfies $\deg(L(c)) = 5$. Hence, on smooth curves, the theorem of Riemann-Roch tells us $h^0(C(c), L(c)) - h^1(C(c), L(c)) = \deg(L(c)) - g + 1 = 4$. Moreover, since $\deg(L(c)) = 5 > 2g - 2$, we know that for smooth curves $h^1(C(c), L(c)) = 0$. Hence, $h^0(L(c)) = 5$ is only possible on non-smooth curves.
Comparison with database In our database, we have considered choices of parameters $c \in \{-1, 0, 1\}^{10} \setminus \{0\}$. On about 96% of these 59048 curves, $L(c)$ has 4 sections. This fits with the above picture that generically we expect 4 sections. However, we also find 2186 curves for which $L(c)$ has 5 sections. Those curves satisfy $c_3 = c_6 = c_9 = 0$, which means that $C(c) = V(x_4) \cup B$, where $B$ is a genus-0 curve with $V(x_4) \cdot B = 3$. We will now argue that $L(c)$ admits 5 sections if and only if $C(c)$ decomposes in this way.
Classification of jump geometries To this end, we consider the Koszul resolution Its associated long exact sequence in sheaf cohomology takes the form The exactness of this sequence implies that where M ϕ = (c 3 , c 6 , c 9 , 0). We explain the construction of the mapping matrix M ϕ in more detail in appendix A.
Obviously, $M_\varphi$ has rank 1 iff $(c_3, c_6, c_9) \neq 0$ and its rank vanishes iff $(c_3, c_6, c_9) = 0$. This immediately leads to the following classification of curve geometries: showing that we obtain one additional vector-like pair if and only if the curve factors as $V(x_4) \cup B$. We illustrate this result in the following diagram: In this diagram, the $a$-th node represents a family $F_a$ of curves, for which we give the generic element in this family. For example, the family $F_1$ of curves at the first node is defined by the condition $(c_3, c_6, c_9) \neq 0$ and has the curve $C$ as its generic element, which is a smooth, irreducible curve of genus $g = 2$. Note that (non-generic) members of $F_1$ can also be singular curves with several components. For example, the curve $V(x_1^3 x_2^2 x_3^2 x_5)$ is defined by the condition that all $c_i$ but $c_3$ vanish. This curve is clearly singular and has several connected components. Recall that $F_1$ is the family of curves on which the line bundle in question admits four global sections. Hence, the statement is that even on such a very singular curve, the bundle in question admits exactly four sections.
This feature changes exactly on the family of curves $F_2$, which is defined by $(c_3, c_6, c_9) = 0$. Its generic element is a curve of the form $V(x_4) \cup B$, where $B$ is a smooth genus $g = 0$ curve intersecting $V(x_4)$ in 3 distinct points. We can also view $F_1 = \{c \mid (c_3, c_6, c_9) \neq 0\}$ and $F_2 = \{c \mid (c_3, c_6, c_9) = 0\}$ as subspaces of the parameter space $\mathbb{C}^{10} \ni c$. In this case it is trivial to see that $F_2 \subset \overline{F_1}$, where $\overline{F_1}$ is the closure of $F_1$ with respect to the standard topology on $\mathbb{C}^{10}$. We will come back to this property shortly.
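The database counts quoted above can be reproduced from the rank criterion alone. The following sketch enumerates all $c \in \{-1, 0, 1\}^{10} \setminus \{0\}$ and assigns $h^0 = 5 - \mathrm{rk}(M_\varphi)$ with $M_\varphi = (c_3, c_6, c_9, 0)$. The relation $h^0 = 5 - \mathrm{rk}(M_\varphi)$ is an assumption of this sketch, inferred from the stated values (4 sections at rank 1, 5 sections at rank 0); the function names are hypothetical.

```python
from itertools import product


def h0_from_rank(c):
    """h0 = 5 - rank(M_phi) for the 1x4 matrix M_phi = (c3, c6, c9, 0).

    c is a tuple (c1, ..., c10); the relation h0 = 5 - rank is an
    assumption inferred from the section counts quoted in the text.
    """
    rank = 1 if any(c[i] != 0 for i in (2, 5, 8)) else 0  # indices of c3, c6, c9
    return 5 - rank


def scan_parameter_choices():
    """Count how often each h0 occurs over {-1, 0, 1}^10 without the origin."""
    counts = {}
    for c in product((-1, 0, 1), repeat=10):
        if not any(c):
            continue  # the all-zero choice does not define a curve
        h0 = h0_from_rank(c)
        counts[h0] = counts.get(h0, 0) + 1
    return counts


counts = scan_parameter_choices()
total = sum(counts.values())
print(total, counts[4], counts[5])      # 59048 56862 2186
print(round(100 * counts[4] / total))   # 96
```

The scan only tests the rank condition; it does not recompute any cohomology, so it checks the counting logic rather than the underlying geometry.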

An h 0 -gap
Whilst factoring-off curve components typically increases the number of global sections, this effect need not necessarily generate exactly one additional section, as we have already seen above. Rather, it can force multiple additional sections to appear simultaneously. An example of this sort is (4.14) In this case, C(c) = V (P (c)) is a genus 1 curve defined by Here, we will argue, that even on singular curve, the pullback line bundle L can never have exactly one section. To see this, let us look at the long exact sequence in sheaf cohomology associated to the Koszul resolution of the setup: The exactness of this sequence implies h 0 (C, (4.17) Consequently, the statement that on the curves in class D C the pullback of D L never has exactly one section is equivalent to saying that M ϕ never has rank 2. We see this by studying the four non-trivial and independent 3 × 3-minors of M ϕ : which can have at most rank 1. More generally, we can classify the rank of M ϕ and thereby summarize the curve geometry as follows: Observe again that within the parameter space of c, we have The corresponding diagram is

Jump from non-rigid curve splitting
We now address the bias in our data, and provide a concrete example of jumps from curve splitting where none of the components are rigid. To this end, we consider $D_C = (2; -1, -1, 0)$ and $D_L = (-2; 0, 4, 0)$. This curve is thus given by For generic coefficients $c_i$, the curve $C$ is a smooth curve of genus $g = 0$ and $L$ has degree $d = 0$. Hence we conclude $h^0(C, L) = 1$.
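The genus and degree quoted for this example follow from intersection numbers on $dP_3$. The sketch below assumes that a class $(a; b_1, b_2, b_3)$ stands for $aH + b_1E_1 + b_2E_2 + b_3E_3$ with $H^2 = 1$, $E_i^2 = -1$, $H \cdot E_i = 0$, and that $K_{dP_3} = -3H + E_1 + E_2 + E_3$. These are the standard conventions for the blow-up of $\mathbb{P}^2$ in three points and are assumed here to agree with the intersection numbers (2.5), which are not reproduced in this section. All function names are illustrative.

```python
K_DP3 = (-3, 1, 1, 1)  # canonical class in the assumed basis (H; E1, E2, E3)


def intersect(D1, D2):
    """Intersection number with H^2 = 1, E_i^2 = -1, all other products zero."""
    return D1[0] * D2[0] - sum(a * b for a, b in zip(D1[1:], D2[1:]))


def genus(D_C):
    """Genus of a smooth curve in class D_C via adjunction: 2g - 2 = D_C.(D_C + K)."""
    return 1 + (intersect(D_C, D_C) + intersect(K_DP3, D_C)) // 2


def degree(D_L, D_C):
    """Degree of the pullback of O(D_L) to a curve in class D_C."""
    return intersect(D_L, D_C)


D_C, D_L = (2, -1, -1, 0), (-2, 0, 4, 0)
print(genus(D_C), degree(D_L, D_C))   # 0 0
print(genus((10, -3, -3, -4)))        # 24, the genus of C_{5_3}
```

Under these assumptions the helper reproduces $g = 0$, $d = 0$ for the present example as well as the genus 24 of the matter curve $C_{5_3}$ in class $(10; -3, -3, -4)$.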
To understand jumps at special coefficients, we employ the Koszul resolution and find (4.24) The rank drops of this matrix include both cases of rigid and non-rigid splittings. Explicitly, let us set A i = V (x i ), which are rigid components. Moreover, we also have the following possible genus g = 0 components which are non-rigid: With these, we can then summarize the rank drops as follows: rk(M ) explicit condition curve splitting The corresponding diagram is of the form

Jumps from non-generic line bundles
We now turn to jumps due to special alignments of the points that define a line bundle divisor. This genus g = 3 curve C(c) = V (P (c)) is defined by (4.29) Brill-Noether theory implies Hence, a jump on the generic curve -a Brill-Noether jump -to h 0 (C(c), L(c)) = 2 is possible.
To explicitly construct such curves, we again inspect the long exact sequence, associated to the Koszul resolution of L(c), which is given by The corresponding diagram is of the form The change of coefficients leads to a transition C 1 → C 2 of smooth, irreducible curves. Since the topology of the curve does not change for this choice of parameters, such a transition cannot be detected from the topological data which we used for our machine learning. Therefore, such transitions are the major source of error in our decision trees.
On smooth curves $C_i$, the nature of the jump $C_1 \to C_2$ can be analyzed by using Serre duality: Hence, the origin of this jump is that $K_C$ and the line bundle divisor differ, modulo linear equivalence, only by a point on $C$. Such a divisor is known as a special divisor. Loosely speaking, we may thus say that the origin of this one additional section is that the points which define the line bundle on the curve move into a special alignment.
Note that also in this case, the diagram (4.34) encodes a hierarchy F 1 ⊃ F 2 , F 2 ⊃ F 3 . This is a generic feature of the parameter space and reflects a stratification induced by the vector-like spectrum.

h 0 -stratification of the parameter space
A stratification of a topological space $X$ is a decomposition $X = \bigcup_i F_i$ into locally closed subspaces $F_i$ such that Intuitively speaking, a feature associated to a subspace $F_i$ (a so-called stratum) becomes "less likely" with increasing codimension of $F_i$, and being contained in (the closure of) a higher dimensional stratum $F_j$ implies a "specialization" of the feature when going from $F_i$ to $F_j$ with $j > i$. The second defining property has a convenient diagrammatic representation: Let the strata $F_i$ form vertices of a graph, then there is a directed edge going from $j$ to $i$ if $F_i \subset \overline{F_j}$. This is precisely the structure of the diagrams (4.12), (4.22), (4.27), and (4.34). Here, the stratified $X$ is the parameter space $\{c\}$ associated with a pair $(D_C, D_L)$, and the strata are defined by the value of $h^0(C(c), L(c))$ in the notation of the previous subsections. Hence, we call these diagrams $h^0$-stratifications, or in short, stratification diagrams.
Note that Brill-Noether theory basically provides an analogous description of the moduli space of line bundles / divisors on a smooth curve. In particular, it provides lower bounds on the dimension of the strata in terms of $\rho$. For F-theory models, where also deformations of the curve's topology become relevant, we see that the stratification by $h^0$ can be extended to the enlarged moduli space.
We observe that in this generalized setting, a stratum associated to a certain value of $h^0$ can consist of several disjoint subfamilies of different dimensions. In the example (4.34), the stratum $F_2$ associated with $h^0 = 2$ decomposes as  It is easy to see that each of these components also satisfies the axioms for strata (since they satisfy $F_2^{(x)} \cap F_2^{(y)} = \emptyset$ for $x \neq y$). Furthermore, their closure contains the common stratum $F_3 = \{c \mid c_1 = ... = c_{12} = 0\}$ of higher codimension with $h^0 = 3$, as can be seen from the arrows connecting the three subfamilies of the stratum $F_2$ to $F_3$ in (4.34).
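The edge rule of a stratification diagram can be illustrated with a small sketch. It covers only the simplest situation, which is an assumption of this example: each stratum is specified by a set of coefficients that must vanish, with all other coefficients generic, so that the closure of a stratum is the coordinate subspace on which these coefficients vanish. Containment in a closure then reduces to a proper-subset test. The function name `stratification_edges` and the data layout are hypothetical.

```python
def stratification_edges(strata):
    """Return directed edges (j, i) whenever F_i lies in the closure of F_j.

    strata maps a stratum name to the frozenset of coefficient indices that
    must vanish on it (assumption: coordinate-subspace strata only). Then
    F_i is contained in the closure of F_j iff the vanishing set of F_j is
    a proper subset of the vanishing set of F_i.
    """
    edges = []
    for j, vanishing_j in strata.items():
        for i, vanishing_i in strata.items():
            if i != j and vanishing_j < vanishing_i:
                edges.append((j, i))
    return sorted(edges)


# Genus-2 example of section 4.1.1: F1 is generic, F2 requires c3 = c6 = c9 = 0.
strata = {"F1": frozenset(), "F2": frozenset({3, 6, 9})}
print(stratification_edges(strata))  # [('F1', 'F2')]
```

Strata cut out by non-linear conditions, such as the vanishing of matrix minors, would need a genuine ideal-containment test instead of the subset comparison used here.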
In general, a stratification diagram can be roughly divided into three regions. At low values of h 0 , jumps typically occur for divisor alignment, i.e., are allowed by Brill-Noether theory on a smooth curve. To get to high h 0 , i.e., many vector-like pairs, the curve typically needs to factorize into many components. In the middle regime, we can have a mixture, meaning in particular that a jump occurs due to divisor alignment on a split component.
To illustrate such a "typical" case, consider This genus g = 5 curve is given by C(c) = V (P (c)) with The stratification of curve geometries follows from the long exact sequence Consequently h 0 (C(c), L(c)) = 7 − rk(M ϕ ) and we find Hence, provided that the line bundle divisor is chosen such that K D i − D L | D i is effective, we find an additional section on D i , due to a Brill-Noether effect. More explicitly, in the case at hand this condition states that the line bundle divisor is linearly equivalent to the trivial divisor, i.e. D L | D i ∼ ∅. This condition is satisfied on D 2 but not on D 1 . For this reason we find one additional section on A 3 ∪ D 2 .

Local to global section counting
In this section, we provide an in-depth analysis of the procedure of gluing local sections on reducible curves. As a result, we can place a lower bound on the number of global sections. We find sufficient topological conditions for a jump of h 0 to occur. This further allows us to formulate an algorithm to estimate the possible numbers of vector-like pairs on the moduli space of F-theory compactifications.

Trivial boundary conditions
Let us start by looking at a simple example. To this end, we go back to the geometry discussed in section 4.1.2, i.e. Recall that in this case, C(c) = V (P (c)) is a genus 1 curve defined by We found that for c 1 = c 3 = c 4 = c 6 = c 7 = 0 we have 3 global sections. Furthermore, we have already seen that for this choice of parameters, the curve has 4 components These components have the following properties: In the last column we give the number of sections of the restriction of the bundle O dP 3 (D L ) to these curve components. We will refer to these sections in the following as the local sections. We display this geometry in fig. 4. Our task is to glue the local sections to global sections on the curve C = E 6 ∪ E 4 ∪ E (2) 2 ∪ A . To this end, we work out the sections explicitly and then subject them to boundary conditions at the intersection points of the different curve components.
For the components A, E 4 and E 6 we already know that the only allowed local section vanishes identically. On E (2) 2 however, the situation is a bit more involved since E (2) 2 is a non-reduced curve. As a set, E 2 is the locus V (x 3 ). Using the scaling relations of dP 3 , we can then set x 2 = x 4 = x 6 = 1 and thereby identify (x 1 , x 5 ) as coordinates of E 2 . Note, however, that since E 2 is a non-reduced curve, the polynomial x 3 is a non-trivial function on this curve component. These observations allow us to conclude where P i (x 1 , x 5 ) is the space of polynomials of degree i in x 1 and x 5 . Upon homogenization with x 2 , x 4 , x 6 , we can then write .

(5.6)
From this, we learn that the only sections on V (x 2 3 ), which vanish at V (x 1 ), V (x 5 ) and V (c 2 x 1 x 2 + c 5 x 5 x 6 ), are linear combinations of the following three sections: Consequently, by extending these sections by zero outside of V (x 2 3 ), we obtain 3 global sections.

Non-trivial boundary conditions
Let us consider $D_C = (3; -1, -1, -1)$ and $D_L = (5; -4, -4, 3)$. We pick special values for the parameters such that $C = V(x_1 x_2^2 x_4^2 x_6)$. The curve thus factors into four components, as We have also listed bases for the sections on the individual curve components. By starting in $E_3$, we see that there is a unique section which extends to $E_5$ and then to $E_1$, namely the section $x_4$. However, this section fails to vanish on $V(x_1)$. Consequently, this geometry only admits the global section which is identically zero.

From trivial to non-trivial boundary conditions
We have seen an interesting geometric transition when we discussed $D_C = (5; -1, -1, -2)$ and $D_L = (1; 1, -4, 1)$ in section 4.3. Namely, the transition enforces a Brill-Noether jump on $D_2$. Whilst $D_1$ only supports the trivial section, $D_2$ supports a one-dimensional space of non-trivial sections. As a consequence, $A_3 \cup D_2$ admits one additional section as compared to $A_3 \cup D_1$. Let us investigate this finding in more detail. We depict this geometry in fig. 6 and recall the following information: To simplify our analysis, let us work with a particular class of curves $D_1$ and $D_2$, for which the transition $D_1 \to D_2$ is particularly simple: Next, we turn to the sections on $A_3 \cong \mathbb{P}^1$. We note that the homogeneous coordinates are $[x_1 : x_5]$. Hence, the line bundle sections at hand are of the form ($\lambda = x_2 x_6^{-1}$): At $x_3 = 0$, we may set $x_2 = x_4 = x_6 = 1$ by the scaling relations of $dP_3$. In terms of these inhomogeneous coordinates, we find That all said, we can discuss the global sections on $A_3 \cup D_1$ and $A_3 \cup D_2$:
• On $D_1$, the only supported section vanishes identically. Hence, we may only consider sections on $A_3$ which vanish at $A_3 \cap D_1$. It is not too hard to see that the space of these sections is generated by
• On $D_2$ however, the line bundle divisor is special. In fact, since it is a divisor of degree zero, this divisor must be the trivial divisor. Consequently, the sections on $D_2$ are the constant ones. It is not too hard to see that the sections on $A_3$ which have value 1 at the intersection points $A_3 \cap D_2$ are generated by
This explains the one additional section on $A_3 \cup D_2$ as opposed to $A_3 \cup D_1$.

Overcounting boundary conditions
As a final example, let us look at $D_C = (4; -1, -1, -1)$ and $D_L = (1; 1, -3, 0)$. Let us deform the curve $C$ such that it is given by We display this curve geometry in fig. 7. The two curve components have the following properties: Up to canonical isomorphism (induced from the connection homomorphism), we find a basis of the sections on $C_2$ as From this we can see that the third section automatically vanishes at the intersection $C_1 \cap C_2$, whilst the other two sections do not vanish there. Consequently, and in agreement with the computational results by gap, we find $h^0(C_1 \cup C_2, L) = 1$. Importantly, a naive guess cannot predict this number. In this case, we would have counted as follows: 3 sections on $C_2$ subject to vanishing conditions at the 3 intersection points $C_1 \cap C_2$ should leave us only with the trivial section. Hence, in this example, a naive counting fails. Such phenomena were originally studied more generally in [51,52]; see also [53] for a more modern exposition of the material.

Sufficient jump condition and algorithmic section estimate
As demonstrated in the previous section, gluing local sections to global sections is a non-trivial task. The exact details depend, among other things, on the relative position of the line bundle divisor and the intersection points of the curve components: the results change when some of these intersection points coincide and when the bundle divisor is special on some curve components.
In the following, we will propose a counting mechanism with the following key properties: • It relies mostly on topological data.
• It provides a lower bound on the number of global sections.
Of course, such a simplified counting procedure will fail to predict intricate geometries as discussed in [51][52][53]. Still, it has two distinct advantages. First, since it relies mostly on topological data, it is very fast. Given a curve $C$ and a line bundle $L$ on $C$, we can apply the strategy to place a lower bound on $h^0(C(c), L(c))$ for many different choices of parameters $c$ of $C$. The collection of these lower bounds can then serve as an estimate of the vector-like spectrum of $(C, L)$ over the parameter space. Note that obtaining such an estimate is infeasible with existing exact algorithms, e.g., those implemented in [42], since these algorithms require extensive computational resources and often take a long time to finish. The second advantage results from the fact that our counting procedure systematically underestimates the actual number of global sections. Therefore, it allows us to formulate sufficient conditions for a jump in the vector-like spectrum to happen.

Figure 7: Naively, we expect $3 - 3 = 0$ global sections. However, one section on $C_2$ automatically vanishes at $C_1 \cap C_2$, leading to $h^0(C_1 \cup C_2, L) = 1$.

Counting procedure
Let us consider a curve $C$ with
$$C = \bigcup_{i=1}^{N} C_i ,$$
i.e., $C$ has $N$ components $C_i$. For our counting procedure to be as simple and reliable as possible, let us avoid setups of the type discussed in section 5.1.2 and section 5.1.3. Hence, let us consider a line bundle $L$ on $C$ such that neighboring curve components do not support non-trivial sections simultaneously. Put differently, we only consider setups where for all curve components $C_i$ the following holds true: if $h^0(C_i, L|_{C_i}) > 0$, then $h^0(C_j, L|_{C_j}) = 0$ for every component $C_j \neq C_i$ which intersects $C_i$. Let us denote by $b_i$ the number of intersection points of $C_i$ with the other curve components. Generically, we then impose $b_i$ conditions on the "local" sections in $H^0(C_i, L|_{C_i})$. Consequently,
$$n_i = \max\{0, \, h^0(C_i, L|_{C_i}) - b_i\}$$
is a lower bound to the number of sections on $C_i$ which satisfy the gluing boundary conditions. The sum of these contributions over all curve components places a lower bound on $h^0(C, L)$:
$$h^0(C, L) \geq \sum_{i=1}^{N} n_i .$$
We expect that equality holds in generic situations and that only fairly tuned geometries, in the spirit of [51][52][53], will lead to a proper inequality. As a simple demonstration, let us apply this procedure to the geometry discussed in section 5.1.1: Indeed, $\sum_{i=1}^{3} n_i = 3$ in agreement with our discussion in section 4.1.2. However, if we apply this counting to $A_3 \cup D_2$, as discussed in section 5.1.3, then we find the inequality This shows that, if we are interested in the exact number rather than a lower bound, we should restrict our counting procedure to curve geometries where neighboring curve components do not support non-trivial sections simultaneously. Furthermore, the geometry studied in section 5.1.4 shows that even under this assumption, there are exceptions to this counting procedure. In this case, this can be attributed to a special alignment of the line bundle divisor and the intersection points, such that one of the sections automatically satisfies all of the boundary conditions.
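The counting procedure described above can be sketched in a few lines. The sketch reads the per-component contribution as $n_i = \max\{0, h^0(C_i, L|_{C_i}) - b_i\}$, which is an assumption about the precise form of the bound; the function name `section_lower_bound` and the input format are hypothetical.

```python
def section_lower_bound(components):
    """Lower bound on h0(C, L) for a reducible curve C.

    components is a list of pairs (h0_local, b), where h0_local is the number
    of local sections on a component and b is the number of points in which
    it meets the other components. Each component contributes
    max(0, h0_local - b); this form of n_i is an assumption of the sketch.
    """
    return sum(max(0, h0_local - b) for h0_local, b in components)


# Geometry of fig. 7: C2 carries 3 local sections and meets C1 in 3 points.
# That C1 carries no local sections is assumed here for illustration.
fig7 = [(0, 3), (3, 3)]
print(section_lower_bound(fig7))  # 0, whereas the true value is h0 = 1
```

The fig. 7 geometry shows that the estimate is only a lower bound: the naive count gives 0, while one section automatically satisfies all boundary conditions.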

Accuracy on our database
Let us now apply this counting procedure to our database [43] to obtain an estimate of how often the inequality is saturated. To this end, we need to identify the number of local sections, which can be challenging for complicated curve geometries and could call for an application of, e.g., the exact methods implemented in [42]. However, given the vast number of curve components in our database, we find it more appealing to focus on those curves for which we can identify the number of local sections more quickly. To this end, we focus on the following two types of curves:
• Smooth curves: We consider the line bundle degree $d = \deg(L|_{C_i})$ and the genus $g = g(C_i)$. Provided that $d < 0$, we know that $L|_{C_i}$ does not admit non-trivial sections. Conversely, if $d > 2g - 2$, then it follows from the Kodaira vanishing theorem that $h^0(C_i, L|_{C_i}) = d - g + 1$. If neither of these conditions is satisfied, we discard the curve for this test.
• Non-split curves: For these curves, we can simply read off the number of local sections from our database.
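The rule for smooth components can be written as a small helper. The following is a sketch under the conditions stated above; the function name is hypothetical, and returning `None` is our way of marking the intermediate range of degrees in which the component is discarded.

```python
from typing import Optional


def local_sections_smooth(d: int, g: int) -> Optional[int]:
    """Number of local sections on a smooth component of genus g.

    d is the degree of the restricted line bundle. Returns None when
    0 <= d <= 2g - 2, where the degree alone does not fix h^0.
    """
    if g < 0:
        raise ValueError("the genus must be non-negative")
    if d < 0:
        # Negative degree: no non-trivial sections.
        return 0
    if d > 2 * g - 2:
        # h^1 vanishes, so h^0 equals the Euler characteristic d - g + 1.
        return d - g + 1
    return None
```

For instance, a degree 5 bundle on a smooth genus 2 component has $5 - 2 + 1 = 4$ local sections, whereas a degree 1 bundle on the same component falls into the discarded range.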
Based on these local section counts, we have then applied the counting procedure presented in section 5.2.1. Recall that a large number of curves in our database neither consist of smooth curve components nor are non-split. Furthermore, recall that we subject the curve geometry to the condition that neighboring components do not support non-trivial sections simultaneously. Let us emphasize that the latter is an assumption made to simplify our counting procedure. Whilst we leave extensions in this direction to future work, we can still apply our (restricted) counting procedure to roughly 60% of the cases in our database. For these, we predict the correct number of global sections with an accuracy of more than 99%, i.e., our counting procedure works remarkably well. We list the detailed results in appendix B.2.1.

Sufficient conditions for jumps in cohomology
These insights of gluing local sections to form global sections imply sufficient conditions for jumps in cohomology. First, we have the following

Lemma 1. Let $S$ be a smooth surface, $L \in \mathrm{Pic}(S)$ a line bundle, and $|C|$ a linear system of curves on $S$. Consider a special member $C_1 \cup C_2$ such that the curves $C_1$, $C_2$ meet transversely in $C_1 \cdot C_2 > 0$ distinct points. Let $N_1 = h^0(C_1, L|_{C_1})$ and $N_2 = h^0(C_2, L|_{C_2})$. Then

$$h^0(C_1 \cup C_2, L|_{C_1 \cup C_2}) \geq N_1 + N_2 - C_1 \cdot C_2 \, .$$

Proof. We consider the short exact sequence

$$0 \to L|_{C_1 \cup C_2} \to L|_{C_1 \sqcup C_2} \to L|_{C_1 \cap C_2} \to 0 \, .$$

The associated long exact sequence in sheaf cohomology begins with

$$0 \to H^0(C_1 \cup C_2, L|_{C_1 \cup C_2}) \to H^0(C_1, L|_{C_1}) \oplus H^0(C_2, L|_{C_2}) \to H^0(C_1 \cap C_2, L|_{C_1 \cap C_2}) \cong \mathbb{C}^{C_1 \cdot C_2} \, ,$$

from which the claim follows.

We can use this result, together with the insights on gluing local sections to global sections, to derive the following

Corollary 1. Let $S$ be a smooth surface, $L \in \mathrm{Pic}(S)$ a line bundle, and $|C|$ a linear system of curves on $S$ with smooth general member $C$ and special member $C_1 \cup C_2$, where $C_1$, $C_2$ are smooth curves of genera $g_1$, $g_2$ meeting transversely in $C_1 \cdot C_2 > 0$ distinct points. We assume, in particular, $\deg L|_{C_1} < \min \{ 0, g_1 - 1 \}$.

Proof. Since $\deg L|_{C_1} < 0$, there are no sections on $C_1$. Hence, from lemma 1 we obtain the inequality

$$h^0(C_1 \cup C_2, L|_{C_1 \cup C_2}) \geq N_2 - C_1 \cdot C_2 \, . \tag{5.28}$$

Finally, since we assume $\deg L|_{C_1} < \min \{ 0, g_1 - 1 \}$, we conclude that the number of additional sections on $C_1 \cup C_2$ is bounded from below by the positive integer $g_1 - \deg L|_{C_1} - 1$.
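The bound $g_1 - \deg L|_{C_1} - 1$ in the proof of corollary 1 can be traced by a short computation. The following is only a sketch under assumptions that are ours: we abbreviate $d_i = \deg L|_{C_i}$ and $k = C_1 \cdot C_2$, we assume that the general member satisfies $h^0(C, L|_C) = \chi(L|_C)$, and we assume $d_2 > 2 g_2 - 2$, so that $N_2 = d_2 - g_2 + 1$.

```latex
% Assumptions (ours): d_i = \deg L|_{C_i}, k = C_1 \cdot C_2,
% h^0(C, L|_C) = \chi(L|_C) on the general member, and d_2 > 2 g_2 - 2.
\begin{align}
  g(C) &= g_1 + g_2 + k - 1 \, , \\
  h^0(C, L|_C) &= (d_1 + d_2) - g(C) + 1
                = d_1 + d_2 - g_1 - g_2 - k + 2 \, , \\
  h^0(C_1 \cup C_2, L|_{C_1 \cup C_2}) &\geq N_1 + N_2 - k
                = 0 + (d_2 - g_2 + 1) - k \, , \\
  h^0(C_1 \cup C_2, L|_{C_1 \cup C_2}) - h^0(C, L|_C) &\geq g_1 - d_1 - 1 \, .
\end{align}
```

The first line uses that the arithmetic genus is constant in the linear system, and the third line is lemma 1 with $N_1 = 0$. The condition $d_1 < g_1 - 1$ then guarantees that the right-hand side of the last line is positive.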
We expect that equality holds in generic situations and that only special setups in the spirit of [51,52] lead to a proper inequality. Still, our result is powerful enough to give a sufficient condition for a jump. Let us demonstrate this in the geometries discussed in section 3.1. Recall that we are looking at $S = dP_3$ with the curve and line bundle specified there. Corollary 1 applies to this geometry, and the jump it implies is in agreement with our discussion in section 3.1.

In many string theory constructions, it is important to engineer exactly one additional vector-like pair. This is particularly true when generating exactly one Higgs pair in MSSM constructions. It is intuitive that such a minimal change in the vector-like spectrum requires only mild changes in the geometry. As long as corollary 1 applies, a necessary condition for such a mild change is to merely split off either a $\mathbb{P}^1$ or a torus.

More generally, it is of interest to identify the allowed numbers of global sections on a given curve. Therefore, we will now describe an estimate for these values, which is based on the counting procedure presented in section 5.2.1, lemma 1 and corollary 1.

Algorithmic spectrum estimates
We can use our results to formulate an algorithmic estimate for the vector-like spectrum over the parameter space of a given setup $(D_C, D_L)$ in a global model. For the time being, our algorithm is focused on the case of a curve in $dP_3$ defined by $\{P = 0\}$ and pullback line bundles on these curves. We have implemented this algorithm in the package H0Approximator [44] as part of [42]. Our algorithm proceeds in several steps.

Let us emphasize a couple of important points of this counting procedure. First, in the second step we do not apply exact methods, such as [42], to find the exact number of local sections. Rather, we identify the generic number of sections, by which we mean $h^0(C, L) = \chi(L)$ if $\chi(L) \geq 0$ and $h^0(C, L) = 0$ otherwise. The advantage of this is that the chiral index can be obtained from topology only. Hence, the number of global sections can be estimated very quickly. Furthermore, this strategy does not violate our lower bound philosophy, since the generic number of sections is never larger than the actual number of sections. Consequently, this strategy allows us to quickly identify a lower bound to the actual number of global sections.
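The generic number of sections used here depends only on the degree and the genus. A minimal sketch, which is not the H0Approximator code and uses a hypothetical function name:

```python
def generic_h0(degree: int, genus: int) -> int:
    """Generic number of sections of a degree-`degree` line bundle on a
    smooth curve of genus `genus`: chi if chi >= 0, and 0 otherwise."""
    chi = degree - genus + 1  # Riemann-Roch: chi(L) = deg(L) - g + 1
    return max(0, chi)


# Degree 2 on genus 3: chi = 0, so the generic estimate is 0.
estimate = generic_h0(2, 3)
```

Since $h^0 = \chi + h^1 \geq \chi$ and $h^0 \geq 0$, this value can never exceed the actual number of sections. For example, the estimate for degree 2 on genus 3 is 0, while special bundles on hyperelliptic genus-3 curves reach $h^0 = 2$, as discussed in appendix A.1.2.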
Secondly, let us point out that one disadvantage of our approach of generic local sections is that we are unable to identify Brill-Noether jumps on the curve components in this way. However, since such a quick spectrum estimate over the entire parameter space of the curve is currently unfeasible or impossible to obtain with the fully accurate methods, we accept this minor drawback.
Finally, note that upon splitting off $\mathbb{P}^1$s from the curve, the curve could (accidentally) factor further. Computing these further factorizations requires a primary ideal decomposition of the corresponding principal ideal. Currently, this is the most time-consuming operation in our algorithm. We reserve optimizations for future work.
This algorithm correctly predicts all the possible values of h 0 for 67 of the 83 pairs (D C , D L ) in our database [43]. Only for one pair (D C , D L ), our prediction misses more than 2 values of the exact spectrum. Given the simplicity of our approximation, which means that we cannot detect intricate Brill-Noether jumps and effects discussed in [51,52], we consider this a very positive result. We list the details in appendix B.2.2.

Conclusion and Outlook
Motivated by a better understanding of the exact massless spectra of 4d F-theory compactifications, we have analyzed in this work families of curves C(c) in a complex surface and line bundles L(c) on these. Our focus has been on the interplay between changes in the cohomology h 0 (C(c), L) and variations of the parameters c, which play the role of complex structure moduli in the context of global F-theory models. To gain insights on how these two are related, we have used two approaches.
To begin with, we first used ideas from Big data and machine learning to gain some intuitions, based on computationally simpler examples, under what circumstances the cohomology may jump, leading to additional vector-like pairs in the F-theory interpretation. To this end we have generated, in section 2, a database [43] of cohomologies for pairs (C(c), L(c)) by varying the parameters c, where the curves are of genus 1 ≤ g ≤ 6, and the line bundles were pullback bundles from a dP 3 surface. For these less complex examples, the cohomologies can be computed using the computer implementations in [42]. We then use supervised learning on decision trees to predict jumps in the value of h 0 . Using different features for training, we find that, while not performing perfectly, topological criteria are surprisingly well-suited (reaching about 95% accuracy) for distinguishing cases with generic vs. enhanced h 0 . In particular, the algorithm learns from the data a strong correlation between jumps and curves C(c) which split into various components. This intuition can be applied, without any detailed understanding of the origin of the jumps, directly to find complex structure tunings targeted at generating additional vector-like pairs in F-theory model building. We demonstrate this in section 3 with an F-theory toy model containing a curve of genus 24, for which a scan over the relevant parameter space would be computationally infeasible. Nevertheless, we found that we can use curve splittings alone to easily engineer 2 to 5 additional vector-like pairs. This highlights the effectiveness of the machine learning approach to learn certain features from simpler examples, and without any previous knowledge. However, we also saw there that by curve splitting alone, a spectrum with just one vector-like pair is impossible to achieve.
To overcome this obstacle, we have employed well-known techniques in algebraic geometry, such as the Koszul resolution and Čech cohomology, which also help to explain our findings from the machine learning approach in more detail. We conclude that deformations of the parameters $c$ leading to a jump in cohomology can be largely classified as either the curve $C(c)$ or the line bundle $L(c)$ becoming non-generic. While the former comes from curve splittings and is thus topological, the latter is due to special alignments of the points on $C(c)$ defining $L(c)$, and is not visible just from topological criteria. The fact that the learner performed so well with the topological criteria is due to a bias in the dataset, which contains only a small number of instances with non-generic line bundles. Such jumps can never be predicted by the learner based just on split type and intersection numbers. However, as we discussed in section 4, we find in general "equally likely" jumps due to non-generic line bundles. The likelihood can be quantified by comparing the dimensions of the corresponding subspaces of the parameter space on which the jumps occur, which for non-generic line bundles is the subject of Brill-Noether theory. This is generalized in the F-theoretic setup, where complex structure deformations affect genericity of the curve and line bundle democratically. This leads to a stratification of the parameter space by the values of $h^0$. That is, the complex structure moduli space of global F-theory models decomposes into disjoint subspaces labelled by the vector-like spectrum. The relationship between the strata can be represented by a Hasse-type diagram, which we term an $h^0$-stratification diagram.
The connection between decision trees and the stratification diagrams, which are also Hasse diagrams, is rather intriguing. While they bear some resemblance with decision trees, a key difference is that, unlike in decision trees, nodes can have more than one incoming edge. It would be interesting to investigate whether other graph-based machine learning techniques, such as Graph NNs, can be used to train algorithms that can predict the presence of jumps more accurately than the decision trees. Furthermore, recall that global F-theory models typically contain more than one matter curve. The complex structures of these curves are determined by the global moduli of the elliptic fibration, and it is in general not possible to tune the complex structures of all of these curves independently. Therefore, it would be important to extend our analysis to a simultaneous h 0 -stratification of the moduli space by all the matter curves in a global F-theory model.
In section 5, we have then investigated the "microscopic" origins of jumps due to curve splittings. This yields a simple counting procedure of local sections on individual curve components, which we then glue to global contributions to $h^0$ on the whole curve. Depending on the boundary conditions imposed by the intersection patterns of the components, this can lead to a net increase of global sections on the reducible curve compared to the generic case. We have used this understanding to formulate sufficient conditions for a jump in the vector-like spectrum to occur as a result of a curve splitting. These criteria are purely topological, and combine the gluing arguments with vanishing theorems on individual components. Let us stress that this in general provides only a lower bound for $h^0$ for the split curve, because it does not take into account alignments of the intersection points of the components and divisors on the individual components. It will be interesting to investigate whether these bounds can be further improved by topological considerations.
Despite these simplifications, we found these criteria extremely useful to provide a rough estimate of the possible spectrum of h 0 on the moduli space of F-theory compactifications, and implemented the algorithm in [44]. To fully appreciate this implementation, let us mention that to the best knowledge of the authors, the exact algorithms implemented in [42,54,55] do not allow for a parametric cohomology computation. Rather, they will focus on one particular point in the complex structure moduli space and provide the exact answer at this very point. Since each of these computations requires huge amounts of computational resources and runtime, it is impractical to repeat such computations for many points in the complex structure moduli space. In contrast, the new algorithm yields an approximate, but oftentimes sufficiently accurate, estimate -even for complicated examples such as the genus 24 curve discussed in section 3 -within minutes. We leave generalizations of this counting algorithm, as well as extensions to other toric surfaces, for future work.
Another limitation of our approach is that we have only considered pullback line bundles so far. However, as already alluded to in the introduction, vector-like spectra in F-theory are oftentimes encoded in line bundles described by a formal weighted sum of points. Such a description is computationally harder for two main reasons. First, it takes much longer to compute line bundle cohomologies of non-pullback bundles with the technologies of [42]. This makes it more challenging to generate a sufficiently large database to apply ideas from big data and machine learning. The second obstacle is the parametrization of the line bundles. Namely, distinct point configurations can encode equivalent line bundles if their difference is the divisor of a meromorphic function. To have a better handle on tracking how these equivalences change with complex structure deformations, we need a better understanding of meromorphic functions on higher genus curves. The crucial tool in this direction is the Abel-Jacobi map, which also plays a similar role in hyperelliptic curve cryptography. It would be interesting to see to what extent machine learning ideas can be beneficial here.
A related issue arises for fractional bundles or root bundles. These appear frequently in explicit global F-theory constructions that engineer a three-generation Standard-Model-like particle physics sector [16,21,23-25]. The constraint to have chiral indices with $|\chi| = 3$ in these models leads to line bundles $L$ on curves $C$ which satisfy $L^{\otimes n} = \mathcal{L}|_C$, where $\mathcal{L}$ is a line bundle on the base $B_3$ of the elliptic fibration. In case $n = 2$ and $\mathcal{L} = K_{B_3}$ is the canonical bundle of the base, the bundle $L$ can be understood as the pullback of the spin bundle of $B_3$ to $C$. However, for general F-theory constructions, also 3rd and higher roots of bundles $\mathcal{L} = K_{B_3}$ appear. An understanding of which line bundles $L$ on $C$ satisfy such an equation again requires a detailed understanding of which points on the curve (in this case the intersection points of $C$ with the divisor on $B_3$ dual to $\mathcal{L}$) define equivalent divisors. We expect that this will also be intimately related to satisfying the quantization condition [56] for the gauge flux background.
Finally, it is important to point out that the complex structure parameters of the elliptic fibration are not the only parameters of the physical theory. Rather, a large part of this parameter space, which we have not touched upon, lies in the parametrization of all possible gauge backgrounds. This includes in particular backgrounds with so-called non-vertical $G_4$-flux [57,58], for which explicit construction methods in global models are largely unknown. While these typically do not contribute to the chiral index, it is not clear at the moment if they could modify the flux-induced line bundles on the matter curves. However, since non-vertical fluxes contribute prominently to a superpotential for the moduli, their presence will dynamically select points in the moduli space that can be a vacuum for the theory, and thus have a very different, but direct influence on the vector-like spectrum. We will therefore need a much better handle on these gauge backgrounds first before we can develop a full understanding of the space of 4d F-theory vacua.
A Tools: Koszul resolution, Brill-Noether theory and fat points
The purpose of this appendix is to cover some of the necessary mathematical background and to provide more details on computations carried out throughout the paper.

A.1 Brill-Noether theory
Our exposition of Brill-Noether theory is based on [48,49]. We refer the interested reader to these references for more details.

A.1.1 The Jacobian of Riemann surfaces
To each smooth Riemann surface $C_g$ one can associate a Jacobian variety $\mathrm{Jac}(C_g)$. This variety is of dimension $g$ and classifies equivalence classes of line bundle divisors of degree 0:

$$\mathrm{Jac}(C_g) = \mathrm{Div}^0(C_g) / \mathrm{Prin}(C_g) \, .$$

In this expression $\mathrm{Div}^0(C_g)$ is the group of all divisors of degree 0 and $\mathrm{Prin}(C_g)$ the group of all principal divisors on $C_g$. Line bundles on $C_g$ are isomorphic iff their divisors differ by a divisor in $\mathrm{Prin}(C_g)$. Hence, sheaf cohomologies of line bundles can only differ if the line bundles are not isomorphic, or equivalently if their divisors differ by more than elements of $\mathrm{Prin}(C_g)$. Consequently, the Jacobian of $C_g$ plays an important role for our analysis and in Brill-Noether theory. Let us therefore introduce the Jacobian in more detail.
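Historically, the Jacobian of a curve $C_g$ of genus $g$ was discovered by investigating integrals $\int_P \omega$, where $P \subset C_g$ is a (not necessarily closed) path and $\omega$ a holomorphic differential. More generally, mark a point $p_0 \in C_g$, let $(\omega_1, \dots, \omega_g)$ be a basis of the holomorphic differentials on $C_g$ and consider the map

$$p \mapsto \left( \int_{p_0}^{p} \omega_1, \dots, \int_{p_0}^{p} \omega_g \right) \in \mathbb{C}^g \, .$$

The value of this map strongly depends on the path $P \subset C_g$ which we choose to connect $p_0$ and $p$. This redundancy can be removed by taking the period lattice of $C_g$ into account. To this end, recall that there are $2g$ homologically distinct closed 1-cycles in $C_g$, i.e., $H_1(C_g, \mathbb{Z})$ is a free $\mathbb{Z}$-module of rank $2g$. We now consider the map

$$\varphi \colon H_1(C_g, \mathbb{Z}) \to \mathbb{C}^g \, , \quad \alpha \mapsto \left( \int_{\alpha} \omega_1, \dots, \int_{\alpha} \omega_g \right) \, ,$$

where $\omega_i$ denote the above basis of holomorphic differentials on $C_g$. Hence, for each of the $2g$ basis elements of $H_1(C_g, \mathbb{Z})$, we obtain an element $\varphi(\alpha) \in \mathbb{C}^g$. It turns out that these $2g$ elements span a full-dimensional lattice $\Lambda$ in $\mathbb{C}^g$, the period lattice of $C_g$. By virtue of this lattice, we obtain a well-defined map

$$\varphi \colon C_g \to \mathbb{C}^g / \Lambda \, , \quad p \mapsto \left( \int_{p_0}^{p} \omega_1, \dots, \int_{p_0}^{p} \omega_g \right) \bmod \Lambda \, .$$

This map is known as the Abel-Jacobi map. It can easily be extended to divisors on $C_g$. Namely, for a divisor $D = \sum_i n_i p_i$ we set

$$\varphi(D) = \sum_i n_i \, \varphi(p_i) \, .$$

The theorem of Abel (see [59] and references therein) states that two effective divisors $D$ and $E$ satisfy $\varphi(D) = \varphi(E)$ iff $D$ and $E$ are linearly equivalent. Consequently, we obtain an injective group homomorphism

$$\mathrm{Div}^0(C_g) / \mathrm{Prin}(C_g) \hookrightarrow \mathbb{C}^g / \Lambda$$

of divisor classes of degree 0. It turns out that this map is also surjective (see [59] for a proof). Hence, there is a natural isomorphism

$$\mathrm{Jac}(C_g) = \mathrm{Div}^0(C_g) / \mathrm{Prin}(C_g) \cong \mathbb{C}^g / \Lambda \, . \tag{A.8}$$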

A.1.2 Central results
For ease of notation let $\mathrm{Div}(C_g)_d$ denote all divisors of degree $d$. Then, let us consider the restriction of eq. (A.6) to $\mathrm{Div}(C_g)_d$, i.e., the induced map from $\mathrm{Div}(C_g)_d$ to $\mathbb{C}^g / \Lambda \cong \mathrm{Jac}(C_g)$.
Let us pick an integer $r \geq -1$ and study the subvariety $G^r_d$ of $\mathrm{Jac}(C_g)$ formed by the images of those degree-$d$ divisors whose line bundles admit $r + 1$ global sections. Then, the central result of Brill-Noether theory states [47]

$$\dim(G^r_d) \geq \rho(r, d, g) \equiv g - (r + 1) \cdot \left( (r + 1) - (d - g + 1) \right) \, . \tag{A.11}$$

By use of the Riemann-Roch theorem we can rewrite this result in the suggestive form

$$\rho(r, d, g) = g - n_0 \cdot n_1 \, ,$$

with $n_0 \equiv r + 1$ and $n_1 = r + 1 - (d - g + 1)$. We may thus use $\rho(r, d, g)$ as a measure for how likely it is that a line bundle of degree $d$ on a genus $g$ curve $C_g$ has $n_0 = r + 1$ global sections.

Let us demonstrate this for degree $d = 2$ bundles on a genus-3 curve. By general theory, the number of sections of a line bundle on a curve $C_g$ with $g \geq 1$ can never exceed its degree. Hence $n_0 \in \{0, 1, 2\}$. With this information, let us compute $\rho(r, d, g)$ for the admissible values of $r$:

$$\begin{array}{c|c|c} r & (n_0, n_1) & \rho(r, d, g) \\ \hline -1 & (0, 0) & 3 \\ 0 & (1, 1) & 2 \\ 1 & (2, 2) & -1 \end{array} \tag{A.14}$$

From this we learn that most line bundles $L$ of degree 2 on a genus-3 curve $C_3$ satisfy $h^0(C_3, L) = 0$. Since for these bundles $\rho$ matches the dimension of the Jacobian of $C_3$, we can say that these line bundles are associated to generic points of the Jacobian. Furthermore, we learn that there are such line bundles with $h^0(C_3, L) = 1$. However, these are special in the sense that they are associated to a codimension-1 locus in the Jacobian $\mathrm{Jac}(C_3)$. Finally, $\rho = -1$ for $r = 1$ calls for an explanation. This explanation follows from work of Griffiths and Harris [50]: On generic curves, $\dim(G^r_d) = \rho(r, d, g)$. So in particular, on generic curves it holds that $G^r_d = \emptyset$ if and only if $\rho(r, d, g) < 0$. Consequently, we conclude from eq. (A.14) that on a generic genus $g = 3$ curve, there is no line bundle $L$ of degree 2 such that $h^0(C_3, L) = 2$.
Note, however, that this does not rule out the possibility that non-generic curves may host such line bundles. In the case at hand, it follows from the theorem of Clifford [50] that hyperelliptic curves $H_3$ of genus $g = 3$ admit line bundles $L$ of degree 2 with $h^0(H_3, L) = 2$.
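The values of $\rho$ for degree 2 bundles on a genus-3 curve follow directly from eq. (A.11). The following sketch evaluates the formula for all admissible $r$; the function names are hypothetical, and the range of $r$ assumes $g \geq 1$ and $d \geq 0$, so that $n_0 \leq d$.

```python
from typing import List, Tuple


def brill_noether_number(r: int, d: int, g: int) -> int:
    """rho(r, d, g) = g - n0 * n1 with n0 = r + 1, n1 = n0 - (d - g + 1)."""
    n0 = r + 1
    n1 = n0 - (d - g + 1)
    return g - n0 * n1


def rho_table(d: int, g: int) -> List[Tuple[int, Tuple[int, int], int]]:
    """Rows (r, (n0, n1), rho) for r = -1, ..., d - 1, i.e. n0 = 0, ..., d."""
    rows = []
    for r in range(-1, d):
        n0 = r + 1
        n1 = n0 - (d - g + 1)
        rows.append((r, (n0, n1), brill_noether_number(r, d, g)))
    return rows


for row in rho_table(2, 3):
    print(row)
```

A negative entry in the last column signals that, on a generic curve, no line bundle with the corresponding number of sections exists.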

A.1.3 Brill-Noether jump
As we see from eq. (A.14), we can in general modify a line bundle on a generic curve such that it admits additional sections. A jump from $r = r_{\text{generic}}$ to $r_{\text{generic}} + 1$ is equivalent to saying that the Serre-dual bundle admits a section, i.e., becomes effective:

$$K_C - D \sim E \, , \qquad E \geq 0 \, ,$$

where $\sim$ represents linear equivalence of divisors and $E$ denotes an effective divisor. Obviously, this requires the line bundle divisor $D$ to move into special alignment relative to $K_C$. Such a divisor is termed a special divisor. We term a change in $h^0$, which is solely attributed to a special alignment of the line bundle divisor, a Brill-Noether jump.
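The relation between a Brill-Noether jump and the Serre-dual bundle can be made explicit with the Riemann-Roch theorem. The following short derivation is a sketch under assumptions that are ours: $D$ is a divisor of degree $d \geq g - 1$ on a smooth curve $C$ of genus $g$, so that the generic bundle of this degree has no first cohomology.

```latex
% Assumption (ours): d = \deg D \geq g - 1 on a smooth genus-g curve C.
\begin{align}
  h^0(C, \mathcal{O}_C(D)) - h^0(C, \mathcal{O}_C(K_C - D)) &= d - g + 1
    && \text{(Riemann-Roch and Serre duality)} \, , \\
  h^0(C, \mathcal{O}_C(K_C - D)) &= 0
    \;\Rightarrow\; h^0(C, \mathcal{O}_C(D)) = d - g + 1
    && \text{(generic } D \text{)} \, , \\
  h^0(C, \mathcal{O}_C(K_C - D)) &= 1
    \;\Rightarrow\; h^0(C, \mathcal{O}_C(D)) = d - g + 2
    && \text{(special } D \text{)} \, .
\end{align}
```

For $d = 2$ and $g = 3$ this reproduces the step from $h^0 = 0$ to $h^0 = 1$ discussed in appendix A.1.2.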

A.2.1 Generalities
Given a curve C and a line bundle L on C, we wish to identify which deformations of the curve lead to an increased number of global sections for L. For hypersurface curves in dP 3 , the answer follows from a study of the Koszul resolution. In this case C(c) = V (P (c)) for a polynomial P (c). The coefficients c model the complex structure moduli of a global F-theory setting.
For such a setup, the Koszul resolution is given by the short exact sequence

$$0 \to \mathcal{O}_{dP_3}(D_L - D_C) \xrightarrow{\ \alpha\ } \mathcal{O}_{dP_3}(D_L) \to \mathcal{O}_{dP_3}(D_L)|_{C(c)} \to 0 \, .$$

The map $\alpha$ is induced by the polynomial $P(c)$. Namely, for $U \subseteq dP_3$ open, $\alpha$ is given by

$$\alpha(U) \colon \mathcal{O}_{dP_3}(D_L - D_C)(U) \to \mathcal{O}_{dP_3}(D_L)(U) \, , \quad s \mapsto P(c) \cdot s \, .$$

For example, in section 4.1.1, we consider $D_C = (4; -1, -2, -1)$ and $D_L = (3; -3, -1, -2)$. In this case, the Koszul resolution simplifies, and a detailed study of Čech cohomology [60] shows that in this geometry we have $M_\varphi = (c_3, c_6, c_9, 0)$. Hence, $h^1(C(c), L(c)) = 1$ on curves with $c_3 = c_6 = c_9 = 0$ and otherwise $h^1(C(c), L(c)) = 0$. Along these lines, we classify the curve geometries according to their admitted number of global sections.
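The dependence of $h^1$ on the parameters in this example can be mimicked with a few lines of code. This is a sketch under an assumption that is ours: we read $M_\varphi$ as a $1 \times 4$ mapping matrix whose cokernel has dimension $h^1(C(c), L(c))$, which reproduces the statement that $h^1 = 1$ exactly when $c_3 = c_6 = c_9 = 0$. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def mapping_row(c3: complex, c6: complex, c9: complex) -> Tuple[complex, ...]:
    """The row (c3, c6, c9, 0) quoted for the example of section 4.1.1."""
    return (c3, c6, c9, 0)


def h1_from_mapping_row(row: Sequence[complex]) -> int:
    """Cokernel dimension of a single-row matrix (assumed 1-dim target).

    A row with a non-zero entry has rank 1 and trivial cokernel;
    the zero row has rank 0 and a 1-dimensional cokernel.
    """
    rank = 1 if any(entry != 0 for entry in row) else 0
    return 1 - rank


generic_point = h1_from_mapping_row(mapping_row(1, 2, 0))  # some c_i non-zero
special_point = h1_from_mapping_row(mapping_row(0, 0, 0))  # c3 = c6 = c9 = 0
```

The jump locus $c_3 = c_6 = c_9 = 0$ is thus the locus where the rank of the mapping matrix drops.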
Recall that Čech cohomology expresses H i (dP 3 , O dP 3 (D L − D C )) and H i (dP 3 , O dP 3 (D L )) as collections of local sections. The mappings of these local sections follow from eq. (A.17), i.e., are given by multiplication with the polynomial P (c) which defines the curve C(c). Importantly, these bases are expressed modulo equivalence relations induced from Čech coboundaries. Therefore, these computations are typically fairly tedious.
Oftentimes, cohomCalg [61-67] can help to simplify this task. Namely, it identifies bases of $H^i(dP_3, \mathcal{O}_{dP_3}(D_L - D_C))$ and $H^i(dP_3, \mathcal{O}_{dP_3}(D_L))$ in terms of rationoms (quotients of monomials in the homogeneous coordinates) and therefore simplifies the task of finding the bases in Čech cohomology. Even more, we may be tempted to simply multiply the basis elements identified by cohomCalg with the polynomial $P(c)$ and ignore all image rationoms that have not been identified as bases for $H^i(dP_3, \mathcal{O}_{dP_3}(D_L))$ by cohomCalg, under the assumption that they correspond to Čech coboundaries.
This procedure fails whenever Čech cohomology chamber factors greater than 1 appear. In this case, cohomCalg finds that one rationom R spans a vector space of dimension greater than 1 in sheaf cohomology. The interpretation of this is, that there are at least two distinct Čech cochains, i.e., collections of local sections, in which the rationom R is the only non-trivial entry. Hence, these distinct Čech cochains are both canonically isomorphic to R. However, to identify the mapping matrices of the line bundle cohomologies correctly, the information about R is insufficient. Rather, the corresponding Čech cochains need to be identified explicitly.
Given these insights, we have taken extra care to work out the mappings presented in this work with Čech cohomology. We present such a computation in great detail in the following section.
Before we come to this, let us mention that a detailed study of the Koszul resolution is not original to this work. For example, in the context of heterotic compactifications, these resolutions, including the mappings in the induced long exact sequence, have been studied extensively [68-72]. However, to the best of our knowledge, chamber factors greater than 1 do not appear in products of projective spaces. Hence, this complication does not arise in heterotic compactifications with CICYs.
In this expression, U is the affine open cover of the dP 3 surface -we will discuss this momentarily -and the maps δ i are the boundary morphisms in the Čech complex Thereby, let us specify our statement regarding the RHS of eq. (A.26). We claim that all omitted terms are in im(δ 0 ), i.e., are Čech coboundaries. To justify this statement, we proceed by investigating the following objects: 1. im (δ 0 (D L )).

A.3 The fat point
Finally, in our analysis, non-reduced curves feature prominently. Consequently, a basic understanding of such curves is required. Let us therefore briefly discuss the mother of all non-reduced varieties, the fat point. This is an example in non-compact affine space C 2 with coordinates x, y. Most of this intuition carries over to compact curves. More details can for example be found in [49,74]. Let us consider V (x) ⊆ C 2 . This is the complex (non-compact) curve with coordinate y. The difference between V (x) and V (x 2 ) is not the collection of points, which these vanishing sets contain, but rather the allowed functions on these spaces. Namely, recall that in the modern language of algebraic geometry, a scheme (or equivalently in the analytic regime -a geometric space) is a pair of a topological space and a structure sheaf. The difference between V (x) and V (x 2 ) is this very structure sheaf.
Staying within the regime of algebraic geometry, the structure sheaf of $\mathbb{C}^2$ is given by (the sheafification of) the coordinate ring $\mathbb{C}[x, y]$, the ring of all polynomials in the variables $x$ and $y$. Likewise, we can understand the structure sheaves on $V(x)$ and $V(x^2)$ from their coordinate rings:

$$\mathbb{C}[x, y] / \langle x \rangle \cong \mathbb{C}[y] \, , \qquad \mathbb{C}[x, y] / \langle x^2 \rangle \cong \mathbb{C}[y] \oplus x \cdot \mathbb{C}[y] \, .$$

Consequently, on $V(x^2)$, the polynomial $x$ provides a non-trivial function! This is the difference between $V(x)$ and $V(x^2)$. We can extend this example slightly by looking at $V(y, x^2)$. For this space we find

$$\mathbb{C}[x, y] / \langle y, x^2 \rangle \cong \mathbb{C}[x] / \langle x^2 \rangle = \{ a + b x \, | \, a, b \in \mathbb{C} \} \, .$$

Hence, on this point in the affine plane $\mathbb{C}^2$, the set of non-trivial functions is 1-dimensional and is generated by the polynomial $x$. This lends $V(y, x^2)$ its name: as a point set it is just a single point, yet this point is large enough to admit non-trivial functions. It is a fat point.
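For ideals generated by monomials, such as $\langle x \rangle$, $\langle x^2 \rangle$ and $\langle y, x^2 \rangle$, the functions that survive in the quotient ring can be listed by hand or with a few lines of code. The following sketch is an illustration with hypothetical function names; it encodes a monomial $x^i y^j$ as the pair $(i, j)$ and relies on the fact that a monomial vanishes modulo a monomial ideal exactly if one of the generators divides it.

```python
from itertools import product
from typing import List, Sequence, Tuple

Monomial = Tuple[int, int]  # (i, j) stands for x^i * y^j


def divides(gen: Monomial, mono: Monomial) -> bool:
    """True if the monomial `gen` divides the monomial `mono`."""
    return all(a <= b for a, b in zip(gen, mono))


def vanishes(mono: Monomial, generators: Sequence[Monomial]) -> bool:
    """True if `mono` lies in the monomial ideal spanned by `generators`."""
    return any(divides(gen, mono) for gen in generators)


def standard_monomials(generators: Sequence[Monomial],
                       max_degree: int = 10) -> List[Monomial]:
    """Monomials x^i y^j with i, j <= max_degree that survive in the quotient.

    For a point-like (zero-dimensional) ideal this is a complete basis of
    the coordinate ring once max_degree is large enough; for curves such
    as V(x^2) the list is merely truncated.
    """
    return [m for m in product(range(max_degree + 1), repeat=2)
            if not vanishes(m, generators)]


x_on_line = vanishes((1, 0), [(1, 0)])       # x on V(x)
x_on_double = vanishes((1, 0), [(2, 0)])     # x on V(x^2)
fat_point = standard_monomials([(0, 1), (2, 0)])  # V(y, x^2)
```

On the fat point the surviving monomials are $1$ and $x$, whereas on the reduced point $V(x, y)$ only the constant $1$ survives.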

B.1 Curve splittings and jumps
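Recall that the six toric $\mathbb{P}^1$s of $dP_3$ correspond to the exceptional divisors $E_1$, $E_2$, $E_3$ and the following three divisors: $H - E_1 - E_2$, $H - E_1 - E_3$ and $H - E_2 - E_3$, where $H$ denotes the pullback of the hyperplane class of $\mathbb{P}^2$.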

B.2 Local to global section counting applied to our database
In this section, we list results which quantify how well the counting procedure proposed in section 5.2.1 works when applied to our database. We have performed two tests: