1 Introduction

There are several ways to measure the (non)associativity of a given binary operation \(\circ \). The index of nonassociativity counts the number of triples \((a,b,c)\) such that \((a \circ b) \circ c \ne a \circ (b \circ c)\). The semigroup distance counts the minimum number of changes we need to perform in the Cayley table of \(\circ \) in order to make it associative. The associative spectrum counts the number of term functions arising from different bracketings of \(x_1 \circ \dots \circ x_n\), thereby giving information about which consequences of the associative identity are (not) satisfied by \(\circ \).

We define these notions more precisely in Section 2, and we give several examples as well as a brief overview of earlier research in this area. The examples will show that there is little relationship among these measures of associativity: it is possible that one of them is very small, while another one is very large, for the same binary operation.

We can define a binary operation on the vertices of a graph by

$$\begin{aligned} x \circ y = {\left\{ \begin{array}{ll} x, &{} \text {if there is an edge from } x \text { to } y;\\ 0, &{} \text { otherwise}; \end{array}\right. } \end{aligned}$$

where 0 is an external zero element. The main topic of this paper is the study of associativity measures of these graph algebras. Again, the more formal definition is given in Section 2, and we also recall some notions of graph theory there.

We will only consider undirected graphs, and these have been completely classified with respect to their associative spectra in [18] (we state this classification in Theorem 5.1 in Section 5). Therefore, we focus on the semigroup distance (more precisely, a “graph-algebraic” version of the semigroup distance) and on the index of nonassociativity in Sections 3 and 4. In particular, we determine the minimal and maximal values of these measures of associativity, and we characterize graphs corresponding to these extremal values.

We will conclude in Section 5 that—as opposed to the case of arbitrary binary operations—the semigroup distance and the index of nonassociativity are closely related: for connected undirected graphs, the notions of “antiassociativity” and “almost associativity” coincide for these two measures of associativity (but not for the associative spectrum).

2 Preliminaries

2.1 Measures of (non)associativity

Definition 2.1

([2]). The index of nonassociativity of a finite groupoid \(\mathbb {A}=(A;\circ )\) is the number of nonassociative triples in \(\mathbb {A}\):

$$\begin{aligned} {\text {ns}}(\mathbb {A}) = \left|\big \{ (a,b,c) \in A^3 : (a \circ b) \circ c \ne a \circ (b \circ c) \big \}\right|. \end{aligned}$$

The index of nonassociativity was defined by Climescu [2], and he proved that all values between 1 and \(n^3\) are possible for n-element groupoids if \(n \ge 3\) (see also [5]). (For 2-element groupoids the possible values are 0, 2, 4 and 8.) Clearly, \({\text {ns}}(\mathbb {A})=0\) if and only if \(\mathbb {A}\) is associative (or, more precisely, the binary operation of \(\mathbb {A}\) is associative), and we may say that \(\mathbb {A}\) is almost associative if \({\text {ns}}(\mathbb {A})=1\), while \(\mathbb {A}\) can be regarded as antiassociative if \({\text {ns}}(\mathbb {A})=n^3\). Groupoids with \({\text {ns}}(\mathbb {A})=1\) are also called Szász–Hájek-groupoids, because their structure was first studied by Szász [25] and Hájek [6], and later by Kepka and Trch in a long series of papers [10,11,12,13,14,15,16,17].
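As an illustration of Definition 2.1 (our own sketch, not code from [2] or [5]), the index of nonassociativity can be computed directly from a Cayley table. We assume that the elements are encoded as \(0, \dots , n-1\) and that `table[x][y]` holds \(x \circ y\); the function names `ns` and `all_tables` are hypothetical. Enumerating all 16 binary operations on a 2-element set reproduces the values 0, 2, 4 and 8 mentioned above.

```python
from itertools import product


def ns(table):
    """Index of nonassociativity: number of triples (a, b, c) with
    (a∘b)∘c != a∘(b∘c), where table[x][y] encodes x∘y on {0, ..., n-1}."""
    n = len(table)
    return sum(
        table[table[a][b]][c] != table[a][table[b][c]]
        for a, b, c in product(range(n), repeat=3)
    )


def all_tables(n):
    """Yield every binary operation on {0, ..., n-1} as a list of rows."""
    for entries in product(range(n), repeat=n * n):
        yield [list(entries[i * n:(i + 1) * n]) for i in range(n)]


if __name__ == "__main__":
    # Possible values of ns over all 16 two-element groupoids.
    print(sorted({ns(t) for t in all_tables(2)}))  # [0, 2, 4, 8]
```

The brute-force enumeration of all tables is only practical for very small sets (\(n^{n^2}\) operations).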

To introduce our second measure of associativity, we need to define the distance of two groupoids. Let \(\mathbb {A}_1=(A;\circ )\) and \(\mathbb {A}_2=(A;*)\) be two groupoids on the same finite set A. The distance of \(\mathbb {A}_1\) and \(\mathbb {A}_2\) is the Hamming distance of their operation tables, i.e., the number of positions where the operation tables differ:

$$\begin{aligned} {\text {dist}}(\mathbb {A}_1,\mathbb {A}_2) = |\{ (x,y) : x \circ y \ne x * y \}|. \end{aligned}$$

The set of all groupoids on A is a metric space with the distance defined above. The semigroup distance of a groupoid \(\mathbb {A}\), introduced by Kepka and Trch [9], is simply the distance of \(\mathbb {A}\) to the set of all semigroups in this metric space.

Definition 2.2

([9]). The semigroup distance of a finite groupoid \(\mathbb {A}=(A;\circ )\) is defined by

$$\begin{aligned} {\text {sdist}}(\mathbb {A}) = \min \big \{{{\text {dist}}((A;\circ ),(A;*))} : * \text { is an associative operation on } A \big \}. \end{aligned}$$

Informally, \({\text {sdist}}(\mathbb {A})\) is the least number of changes one has to perform in the operation table of \(\mathbb {A}\) to make it associative. We have \({\text {sdist}}(\mathbb {A})=0\) if and only if \(\mathbb {A}\) is associative, and we can say that \(\mathbb {A}\) is almost associative if \({\text {sdist}}(\mathbb {A})=1\). Antiassociativity for n-element groupoids could be defined by \({\text {sdist}}(\mathbb {A})={\text {maxdist}}(n)\), where

$$\begin{aligned} {\text {maxdist}}(n) = \max \big \{{{\text {sdist}}(\mathbb {A})} : \mathbb {A}\text { is an } n \text {-element groupoid}\big \}. \end{aligned}$$

However, the value of \({\text {maxdist}}(n)\) is not known. It is clear that \({\text {maxdist}}(n) \le n^2-n\): some element occurs at least n times among the \(n^2\) entries of the operation table, and changing all other entries to this most frequently occurring element yields a constant operation, which is associative. As a lower bound, we have \({\text {maxdist}}(n) \ge n^2/4\), as shown by the following example.
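The counting behind the bound \({\text {maxdist}}(n) \le n^2-n\) can be checked mechanically. In the sketch below (hypothetical helper names, same table encoding as before), `constant_repair_cost` counts the entries changed by the repair that replaces every entry with the most frequent value. An exhaustive search for \(n \le 3\) shows that this cost never exceeds \(n^2-n\) and that the value \(n^2-n\) does occur (for instance, for Latin squares, where every value appears exactly n times). This bounds only the cost of this particular repair, not \({\text {sdist}}\) itself.

```python
from collections import Counter
from itertools import product


def constant_repair_cost(table):
    """Entries changed when every entry of the Cayley table is replaced by
    its most frequent value. The resulting constant operation is associative,
    so this number is an upper bound on sdist."""
    n = len(table)
    counts = Counter(v for row in table for v in row)
    # Some value occurs at least n times among the n*n entries (pigeonhole).
    return n * n - max(counts.values())


def all_tables(n):
    """Yield every binary operation on {0, ..., n-1} as a list of rows."""
    for entries in product(range(n), repeat=n * n):
        yield [list(entries[i * n:(i + 1) * n]) for i in range(n)]


# Exhaustive check for n <= 3: the worst cost equals n^2 - n.
for n in range(1, 4):
    print(n, max(constant_repair_cost(t) for t in all_tables(n)), n * n - n)
```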

Example 2.3

[9]. Let A be an n-element set, and let \(x \circ y = f(x)\), where f is a permutation of A that has no fixed points. Then we have \({\text {ns}}(\mathbb {A}) = n^3\) (this is easy to verify) and \({\text {sdist}}(\mathbb {A}) \ge n^2/4\) (this was proved in [9]).
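The value \({\text {ns}}(\mathbb {A}) = n^3\) in Example 2.3 follows from \((a \circ b) \circ c = f(a \circ b) = f(f(a))\) and \(a \circ (b \circ c) = f(a)\): equality would force \(f(a)\) to be a fixed point of f. The sketch below checks this numerically for one concrete fixed-point-free permutation, the cyclic shift \(f(x) = x+1 \pmod n\); this choice of f and the helper names are our own.

```python
from itertools import product


def ns(table):
    """Number of triples (a, b, c) with (a∘b)∘c != a∘(b∘c)."""
    n = len(table)
    return sum(
        table[table[a][b]][c] != table[a][table[b][c]]
        for a, b, c in product(range(n), repeat=3)
    )


def shift_groupoid(n):
    """x∘y = f(x), where f(x) = x + 1 (mod n) has no fixed points for n >= 2."""
    return [[(x + 1) % n] * n for x in range(n)]


for n in range(2, 7):
    print(n, ns(shift_groupoid(n)), n ** 3)  # the last two columns agree
```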

Let us now describe a third way to measure associativity, which was proposed by Csákány [3].

Definition 2.4

[3]. For a groupoid \(\mathbb {A}= (A;\circ )\), let \(s_n(\mathbb {A})\) denote the number of term operations induced by bracketings of the “product” \(x_1 \circ \cdots \circ x_n\). The sequence \({{\,\textrm{spec}\,}}(\mathbb {A}) = (s_1(\mathbb {A}),s_2(\mathbb {A}),s_3(\mathbb {A}),\dots )\) is called the associative spectrum of \(\mathbb {A}\).

Clearly, \(s_1(\mathbb {A})=s_2(\mathbb {A})=1\) for all groupoids, and \(s_3(\mathbb {A})=1\) if \(\mathbb {A}\) is associative, while \(s_3(\mathbb {A})=2\) if \(\mathbb {A}\) is not associative. Moreover, by the generalized associative law, \(s_3(\mathbb {A})=1\) implies \(s_n(\mathbb {A})=1\) for all \(n \in \mathbb {N}\), thus the associative spectrum of a semigroup is \((1,1,1,\dots )\). The associative spectrum measures associativity by its consequences: a nonassociative groupoid \(\mathbb {A}\) may still satisfy some identities that are consequences of the associative law, and a relatively small spectrum indicates that \(\mathbb {A}\) satisfies relatively many of these identities. On the other hand, if \(\mathbb {A}\) satisfies no nontrivial “bracketing identities”, then \(s_n(\mathbb {A})\) equals the number of formally different bracketings of \(x_1 \circ \cdots \circ x_n\), which is the \((n-1)\)-st Catalan number \(C_{n-1} = \frac{1}{n} {2n-2 \atopwithdelims ()n-1}\). In the latter case \(\mathbb {A}\) can be regarded as antiassociative, and almost associativity could be defined by \({{\,\textrm{spec}\,}}(\mathbb {A})=(1,1,2,1,1,\dots )\), as this is the least nonassociative associative spectrum(!). For more background about associative spectra, we refer the reader to [3, 20].
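For small groupoids, the first few terms of \({{\,\textrm{spec}\,}}(\mathbb {A})\) can be computed by brute force: generate all bracketings of \(x_1 \circ \cdots \circ x_n\), evaluate each one on every input tuple, and count the distinct resulting functions. The following sketch does this (function names are hypothetical; the cost grows like \(C_{n-1} \cdot |A|^n\), so it is only meant for small cases). For the NOR operation on \(\{0,1\}\), which reappears in Example 2.5, it gives 1, 1, 2, 5, i.e., the Catalan numbers, while an associative operation such as XOR gives the constant sequence 1.

```python
from itertools import product
from math import comb


def bracketings(i, j):
    """All bracketings of x_i ∘ ... ∘ x_j, as nested pairs of variable indices."""
    if i == j:
        return [i]
    return [
        (left, right)
        for k in range(i, j)
        for left in bracketings(i, k)
        for right in bracketings(k + 1, j)
    ]


def evaluate(term, table, values):
    """Evaluate a bracketing on the input tuple `values`."""
    if isinstance(term, int):
        return values[term]
    left, right = term
    return table[evaluate(left, table, values)][evaluate(right, table, values)]


def s(n, table):
    """s_n(A): number of distinct term operations among all bracketings."""
    inputs = list(product(range(len(table)), repeat=n))
    return len({
        tuple(evaluate(t, table, v) for v in inputs)
        for t in bracketings(0, n - 1)
    })


def catalan(m):
    return comb(2 * m, m) // (m + 1)


nor = [[1, 0], [0, 0]]  # x∘y = 1 iff x = y = 0
xor = [[0, 1], [1, 0]]
print([s(n, nor) for n in range(1, 5)])  # [1, 1, 2, 5]
print([s(n, xor) for n in range(1, 5)])  # [1, 1, 1, 1]
```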

2.2 Comparisons

As the following example illustrates, \({\text {ns}}(\mathbb {A})\) can be arbitrarily large for a groupoid with \({\text {sdist}}(\mathbb {A})=1\), and, at the same time, \(\mathbb {A}\) can have a Catalan spectrum.

Example 2.5

Let us define a binary operation \(\circ \) on \(A = \{ 0,1,\dots ,n-1 \}\) by

$$\begin{aligned} x \circ y = {\left\{ \begin{array}{ll} 1, &{} \text {if } x=y=0;\\ 0, &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

It is clear that \({\text {sdist}}(\mathbb {A})=1\) (we get a zero semigroup by setting \(0 \circ 0 = 0\)), and it is not hard to verify that \({\text {ns}}(\mathbb {A})=2n(n-1)\) [9]. Restricting \(\circ \) to the subuniverse \(\{ 0,1 \}\), we get the Sheffer operation (NOR operation), which does not satisfy any nontrivial bracketing identities [3], hence the same is true for \(\mathbb {A}\), i.e., \(\mathbb {A}\) has a Catalan spectrum.
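The count \({\text {ns}}(\mathbb {A})=2n(n-1)\) can also be derived directly: \((a \circ b) \circ c = 1\) exactly when \(c = 0\) and \((a,b) \ne (0,0)\), and \(a \circ (b \circ c) = 1\) exactly when \(a = 0\) and \((b,c) \ne (0,0)\). Each of these sets has \(n^2-1\) elements, and they share the \(n-1\) triples \((0,b,0)\) with \(b \ne 0\), so the two sides differ on \(2(n^2-1) - 2(n-1) = 2n(n-1)\) triples. A short numerical check (our own sketch; `example_2_5` is a hypothetical helper name):

```python
from itertools import product


def ns(table):
    """Number of triples (a, b, c) with (a∘b)∘c != a∘(b∘c)."""
    n = len(table)
    return sum(
        table[table[a][b]][c] != table[a][table[b][c]]
        for a, b, c in product(range(n), repeat=3)
    )


def example_2_5(n):
    """x∘y = 1 if x = y = 0, and x∘y = 0 otherwise, on {0, ..., n-1}, n >= 2."""
    return [[1 if x == y == 0 else 0 for y in range(n)] for x in range(n)]


t = example_2_5(4)
print(ns(t))  # 24 = 2*4*3
t[0][0] = 0   # a single change gives the zero semigroup
print(ns(t))  # 0
```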

In some sense, the example above is the worst possible: if \(|A|=n\), then \({\text {sdist}}(\mathbb {A})=1\) implies \({\text {ns}}(\mathbb {A}) \le 2n(n-1)\) [9] (and, of course, no associative spectrum can exceed the Catalan numbers). Analogously to Example 2.5, a groupoid with \({\text {ns}}(\mathbb {A})=1\) can have arbitrarily large semigroup distance [1, 16]. However, here we do not know the maximal value of \({\text {sdist}}(\mathbb {A})\) among n-element groupoids with \({\text {ns}}(\mathbb {A})=1\); the authors of [1] only mention that the size of the groupoids they have constructed grows quickly.

In the next example we present groupoids that are almost associative with respect to both the index of nonassociativity and the semigroup distance but are antiassociative in the “spectral” sense.

Example 2.6

Let us consider the groupoid \(\mathbb {A}= (A;\circ )\) defined by the following operation table:

$$\begin{aligned}{}\begin{array}{c|ccc} \circ &{} 0 &{} 1 &{} 2\\ \hline 0 &{} 0 &{} 0 &{} 0\\ 1 &{} 0 &{} 1 &{} 0\\ 2 &{} 0 &{} 1 &{} 2 \end{array} \end{aligned}$$

We have \({\text {sdist}}(\mathbb {A})=1\) (we get a semigroup by setting \(1 \circ 2 = 2\)) and \({\text {ns}}(\mathbb {A})=1\) (the only nonassociative triple is (1, 2, 1)). On the other hand, \(\mathbb {A}\) has a Catalan spectrum [3]. Repeatedly extending the groupoid by a zero element, we can construct arbitrarily large groupoids with these properties.
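The claims about Example 2.6 can be checked by listing the nonassociative triples before and after the single change \(1 \circ 2 = 2\), and after adjoining new zero elements. In this sketch, `adjoin_zero` (a hypothetical helper) adds a new element z with \(z \circ x = x \circ z = z\). Since \({\text {ns}}(\mathbb {A}) > 0\) forces \({\text {sdist}}(\mathbb {A}) \ge 1\), the single change also confirms \({\text {sdist}}(\mathbb {A})=1\).

```python
def nonassociative_triples(table):
    """All triples (a, b, c) with (a∘b)∘c != a∘(b∘c)."""
    r = range(len(table))
    return [
        (a, b, c)
        for a in r for b in r for c in r
        if table[table[a][b]][c] != table[a][table[b][c]]
    ]


def adjoin_zero(table):
    """Extend a groupoid by a new element z = len(table) with z∘x = x∘z = z."""
    z = len(table)
    return [row + [z] for row in table] + [[z] * (z + 1)]


T = [[0, 0, 0],
     [0, 1, 0],
     [0, 1, 2]]
print(nonassociative_triples(T))  # [(1, 2, 1)]

T_fixed = [row[:] for row in T]
T_fixed[1][2] = 2  # the single change mentioned in the text
print(nonassociative_triples(T_fixed))  # []

T_big = adjoin_zero(adjoin_zero(T))  # a 5-element groupoid with the same property
print(nonassociative_triples(T_big))  # [(1, 2, 1)]
```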

Let us now see how small the semigroup distance of an n-element groupoid \(\mathbb {A}\) with \({\text {ns}}(\mathbb {A})=n^3\) can be.

Proposition 2.7

If \(\mathbb {A}\) is an n-element groupoid and \({\text {ns}}(\mathbb {A})=n^3\), then we have \({\text {sdist}}(\mathbb {A}) \ge n\).

Proof

Assume for contradiction that there is an n-element groupoid \(\mathbb {A}=(A;\circ )\) such that \({\text {ns}}(\mathbb {A})=n^3\) and \({\text {sdist}}(\mathbb {A}) < n\). Then there is a semigroup \(\mathbb {A}^*=(A;*)\) with \({\text {dist}}(\mathbb {A},\mathbb {A}^*) < n\). Since the two operation tables differ in fewer than n positions, while they have n rows and n columns, the pigeonhole principle yields an element \(a \in A\) such that the row of a in the operation table of \(\mathbb {A}\) is identical to the row of a in the operation table of \(\mathbb {A}^*\), and similarly, there exists \(c \in A\) such that the column of c is the same in the two operation tables:

$$\begin{aligned} \forall x \in A:a \circ x = a * x \text { and } x \circ c = x * c. \end{aligned}$$

This implies that \((a,b,c)\) is an associative triple in \(\mathbb {A}\) for any \(b \in A\):

$$\begin{aligned} (a \circ b) \circ c = (a * b) * c = a * (b * c) = a \circ (b \circ c). \end{aligned}$$

However, this contradicts the assumption \({\text {ns}}(\mathbb {A})=n^3\). \(\square \)

The estimate in the proposition above is sharp, as shown by the following example.

Example 2.8

Let us define a binary operation \(\circ \) on \(A = \{ 0,1,\dots ,n-1 \}\) by \(x \circ y = f(x)\), where the map \(f:A \rightarrow A\) is given by

$$\begin{aligned} f(x) = {\left\{ \begin{array}{ll} 1, &{} \text {if } x=0;\\ 0, &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

(Compare this with Example 2.3.) Then we have \((a \circ b) \circ c = f^2(a)\) and \(a \circ (b \circ c) = f(a)\), and \(f^2(a) \ne f(a)\) for every \(a \in A\); therefore, \({\text {ns}}(\mathbb {A})=n^3\). This implies \({\text {sdist}}(\mathbb {A}) \ge n\) by Proposition 2.7. Actually, we have \({\text {sdist}}(\mathbb {A}) = n\), as \(\mathbb {A}\) is of distance n to a zero semigroup (it suffices to change the n entries in the row of 0 to 0). Considering the associative spectrum, note that evaluating \(x_1 \circ \cdots \circ x_k\) over \(\mathbb {A}\), we get either \(f(x_1)\) or \(f^2(x_1)\), depending on the bracketing (observe that \(f^3=f\)). Thus the associative spectrum of \(\mathbb {A}\) is quite small: \({{\,\textrm{spec}\,}}(\mathbb {A})=(1,1,2,2, \dots )\).
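For very small n, the values in Example 2.8 can also be confirmed by exhaustive search. The sketch below computes \({\text {sdist}}\) by trying all \(n^{n^2}\) operations on the same set, so it is only feasible for \(n \le 3\); the helper names are hypothetical. It returns \({\text {ns}}(\mathbb {A}) = n^3\) and \({\text {sdist}}(\mathbb {A}) = n\) for \(n = 2\) and \(n = 3\), in line with Proposition 2.7 and the zero-semigroup argument above.

```python
from itertools import product


def is_associative(table):
    r = range(len(table))
    return all(table[table[a][b]][c] == table[a][table[b][c]]
               for a in r for b in r for c in r)


def ns(table):
    """Number of triples (a, b, c) with (a∘b)∘c != a∘(b∘c)."""
    n = len(table)
    return sum(table[table[a][b]][c] != table[a][table[b][c]]
               for a, b, c in product(range(n), repeat=3))


def example_2_8(n):
    """x∘y = f(x) with f(0) = 1 and f(x) = 0 for x != 0."""
    return [[1 if x == 0 else 0] * n for x in range(n)]


def sdist_brute_force(table):
    """Least Hamming distance to an associative operation on the same set,
    by exhaustive search over all n**(n*n) operations."""
    n = len(table)
    flat = [v for row in table for v in row]
    best = n * n
    for entries in product(range(n), repeat=n * n):
        d = sum(u != v for u, v in zip(flat, entries))
        # Only test associativity when the candidate would improve the bound.
        if d < best and is_associative(
            [list(entries[i * n:(i + 1) * n]) for i in range(n)]
        ):
            best = d
    return best


for n in (2, 3):
    A = example_2_8(n)
    print(n, ns(A), sdist_brute_force(A))  # "2 8 2", then "3 27 3"
```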

Summarizing our observations, we can say that the three measures of associativity behave quite differently, but there seems to be some weak connection between the index of nonassociativity and the semigroup distance. This connection is also reflected in the following inequality.

Theorem 2.9

[9]. If \(|A|=n\), then \({\text {ns}}(\mathbb {A}) \le (2n^2+2n) \cdot {\text {sdist}}(\mathbb {A})\).
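One way to see where the constant \(2n^2+2n\) comes from (our own reasoning, not necessarily the argument of [9]): if \(\mathbb {A}^*\) is a semigroup and a triple \((a,b,c)\) is nonassociative in \(\mathbb {A}\), then computing \((a \circ b) \circ c\) or \(a \circ (b \circ c)\) must use a changed entry \((x,y)\), which happens for at most n triples with \((a,b)=(x,y)\), n triples with \((b,c)=(x,y)\), \(n^2\) triples with \(a \circ b = x, c = y\), and \(n^2\) triples with \(a = x, b \circ c = y\). The sketch below spot-checks Theorem 2.9 for \(n = 3\): it lists all associative operations on a 3-element set (113 of them), computes \({\text {sdist}}\) as the least Hamming distance to one of these, and tests the inequality on random tables. Helper names and the random seed are hypothetical choices.

```python
import random
from itertools import product


def all_tables(n):
    """Yield every binary operation on {0, ..., n-1} as a list of rows."""
    for entries in product(range(n), repeat=n * n):
        yield [list(entries[i * n:(i + 1) * n]) for i in range(n)]


def is_associative(t):
    r = range(len(t))
    return all(t[t[a][b]][c] == t[a][t[b][c]] for a in r for b in r for c in r)


def ns(t):
    """Number of triples (a, b, c) with (a∘b)∘c != a∘(b∘c)."""
    r = range(len(t))
    return sum(t[t[a][b]][c] != t[a][t[b][c]] for a in r for b in r for c in r)


def sdist(table, semigroups):
    """Semigroup distance, given the list of all associative tables on the set."""
    return min(
        sum(u != v for r1, r2 in zip(table, s) for u, v in zip(r1, r2))
        for s in semigroups
    )


n = 3
semigroups = [t for t in all_tables(n) if is_associative(t)]
print(len(semigroups))  # 113

rng = random.Random(0)
samples = [[[rng.randrange(n) for _ in range(n)] for _ in range(n)] for _ in range(300)]
bound = 2 * n * n + 2 * n  # 24 for n = 3
violations = [t for t in samples if ns(t) > bound * sdist(t, semigroups)]
print(len(violations))  # 0
```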

2.3 Graphs

A directed graph is a pair \(G=(V;\rho )\), where \(V=V(G)\) is a nonempty set (vertices) and \(\rho \subseteq V \times V\) is a relation on V. We consider finite undirected graphs, i.e., we always assume that the relation \(\rho \) is symmetric. (Sometimes we drop the adjective “undirected”: by default, a graph always means an undirected graph in this paper.) If G is such a graph and \((x,y) \in \rho \) for two distinct vertices \(x,y \in V(G)\), then \(e=\{x,y\}\) is an (undirected) edge of G, which we will simply write as \(e=xy\) (of course, yx is the same edge). If \((x,x) \in \rho \), then there is a loop on the vertex x. It will be convenient to treat loops and “real” edges separately. Therefore, by a slight abuse of terminology, in the following we say that \(e=xy\) is an edge only in the case \(x \ne y\). The set of “loopy” vertices is denoted by \(L(G)\), thus we write \(x \in L(G)\) to indicate that there is a loop on the vertex x. We denote by E(G) the set of edges (loops are not included!), and we partition this set into three parts according to the number of loops at the endpoints of the edges:

$$\begin{aligned} E_0(G)&= \{ xy \in E(G) : x \notin L(G)\text { and } y \notin L(G)\};\\ E_1(G)&= \{ xy \in E(G) : \text {exactly one of } x \in L(G), y \in L(G)\text { holds}\};\\ E_2(G)&= \{ xy \in E(G) : x \in L(G)\text { and } y \in L(G)\}. \end{aligned}$$

(Thus \(e \in E_i\) means that exactly i of the two endpoints of the edge e have a loop.) We say that G is a reflexive graph if the underlying relation is reflexive, i.e., \(L(G)= V(G)\). Similarly, we say that G is irreflexive if there are no loops, i.e., if \(L(G)= \emptyset \). We use the notation \(G^\circlearrowleft \) for the graph that we obtain from G by adding loops to all vertices.

The neighborhood of x in G is the set \(N(x) = \{y \in V(G) : xy \in E(G) \}\). Since loops and edges are treated separately, we have \(x \notin N(x)\) even if there is a loop on x. The degree of a vertex x is the size of its neighborhood: \(d(x) = |N(x)|\). Note that if there is a loop on x, it is not taken into account in d(x). We say that x is an isolated vertex if \(d(x)=0\). Again, loops do not matter: a vertex with a loop that is not connected to any other vertices is considered isolated. A connected component of G is said to be nontrivial if it has at least two vertices; the trivial connected components are just the isolated vertices. For \(A \subseteq V\), we denote by \(G\vert _A\) the induced subgraph of G on the vertex set A, and \(G{\setminus } A\) stands for \(G\vert _{V {\setminus } A}\).

By a cherry in G, we mean a three-vertex induced subgraph in which exactly two edges are present (loops do not matter), i.e., an induced path on three vertices.

The set of cherries in G will be denoted by \(Ch(G)\). Thus \(xyz \in Ch(G)\) if x, y, z are three distinct vertices and exactly two of \(xy \in E(G)\), \(yz \in E(G)\) and \(xz \in E(G)\) hold. Here we use xyz as a shorthand for \(\{x,y,z\}\) (just as xy is a shorthand for \(\{x,y\}\) when we speak about (non)edges).
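Cherries are easy to enumerate by brute force over all 3-element vertex sets. In the sketch below (hypothetical names), edges are stored as 2-element frozensets, so loops are ignored automatically, as in the definition above. For the path 1-2-3-4 it finds the two cherries 123 and 234.

```python
from itertools import combinations


def cherries(vertices, edges):
    """Cherries of an undirected graph: 3-vertex sets spanning exactly two edges.

    `edges` is a set of 2-element frozensets; loops are not edges and are ignored.
    """
    return {
        frozenset(triple)
        for triple in combinations(vertices, 3)
        if sum(frozenset(pair) in edges for pair in combinations(triple, 2)) == 2
    }


def edge_set(*pairs):
    return {frozenset(p) for p in pairs}


path = edge_set((1, 2), (2, 3), (3, 4))
print(sorted(sorted(c) for c in cherries(range(1, 5), path)))  # [[1, 2, 3], [2, 3, 4]]
```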

An irreflexive graph G is bipartite if \(V(G) = A \cup B\), where A and B are disjoint sets (called the two color classes) such that every edge of G has one of its endpoints in A and the other endpoint in B. (We usually draw bipartite graphs with A above B.) For natural numbers n, m, let \(K_n\) denote the complete graph on n vertices, and let \(K_{n,m}\) denote the complete bipartite graph with n and m vertices in the two color classes. We will often need balanced complete bipartite graphs, where the sizes of the two color classes differ by at most one. Up to isomorphism there is only one such graph on n vertices, namely \(K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil }\). These graphs are irreflexive by definition, but we will also frequently encounter the reflexive complete graph \(K_n^\circlearrowleft \). In addition, we need the graph obtained from \(K_n^\circlearrowleft \) by removing one loop (of course, up to isomorphism, it does not matter which loop is removed), as well as the graphs obtained by removing one edge from \(K_n\) and from \(K_n^\circlearrowleft \), respectively (loops are retained in the second case).

We will need the following classical result of extremal graph theory, which is a special case of Turán’s theorem about \(K_r\)-free graphs (see, e.g., [4, Theorem 7.1.1]).

Theorem 2.10

If G is a triangle-free irreflexive undirected graph on n vertices, then \(|E(G)| \le \big \lfloor n/2 \big \rfloor \cdot \big \lceil n/2 \big \rceil = \big \lfloor n^2/4 \big \rfloor \). Equality holds here if and only if \(G \cong K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil }\).

2.4 Graph algebras

Shallon [24] proposed a construction of an algebra associated to a directed graph. These graph algebras have a binary and a nullary operation, but here we only consider their groupoid reducts (for simplicity, we do not introduce a new name, and we just call these reducts graph algebras).

Definition 2.11

[24]. The graph algebra of a directed graph \(G=(V;\rho )\) is the groupoid \(\mathbb {A}({G})=(V\mathbin {{\dot{\cup }}}\{0\};\circ )\), where

$$\begin{aligned} x \circ y = {\left\{ \begin{array}{ll} x, &{} \text {if } x,y \in V \text { and } (x,y) \in \rho ;\\ 0, &{} \text {if } x,y \in V \text { and } (x,y) \notin \rho ;\\ 0, &{} \text {if } x=0 \text { or } y=0. \end{array}\right. } \end{aligned}$$
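A graph algebra is determined by its Cayley table, which can be built directly from Definition 2.11. In this sketch we encode the vertices as \(1, \dots , n\) and the external zero as 0 (an encoding of our own choosing), and pass the edge relation \(\rho \) as a set of ordered pairs; `graph_algebra` is a hypothetical helper name. For the single undirected edge 12 without loops we get \((1 \circ 2) \circ 1 = 0\) but \(1 \circ (2 \circ 1) = 1\), so this graph algebra is not associative.

```python
def graph_algebra(n, rho):
    """Cayley table of A(G) for a (directed) graph on vertices 1..n.

    rho is a set of ordered pairs of vertices; the element 0 is the
    external zero, so the table has size (n+1) x (n+1).
    """
    return [
        [x if x != 0 and y != 0 and (x, y) in rho else 0 for y in range(n + 1)]
        for x in range(n + 1)
    ]


t = graph_algebra(2, {(1, 2), (2, 1)})  # one undirected edge, no loops
print(t)                 # [[0, 0, 0], [0, 0, 1], [0, 2, 0]]
print(t[t[1][2]][1])     # (1∘2)∘1 = 0
print(t[1][t[2][1]])     # 1∘(2∘1) = 1
```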

Let us mention that there are also other ways to assign a groupoid to a directed graph; for instance, in [7] the authors study associative triples in the groupoid \((V;\cdot )\), where the binary operation is defined by \(x \cdot y = x\) if \((x,y) \in \rho \) and \(x \cdot y = y\) if \((x,y) \notin \rho \).

Poomsa-ard [21] characterized directed graphs with associative graph algebras.

Theorem 2.12

[21]. The graph algebra of a directed graph is associative if and only if the edge relation is transitive and the outneighborhood of each vertex is a reflexive complete graph.

For undirected graphs, the characterization takes the following simple form.

Corollary 2.13

An undirected graph has an associative graph algebra if and only if all of its nontrivial connected components are reflexive complete graphs.
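Corollary 2.13 can be confirmed exhaustively on small vertex sets by comparing a direct associativity test of \(\mathbb {A}({G})\) with the structural condition. The sketch below does this for all \(2^{n + n(n-1)/2}\) graphs with \(n \le 4\) vertices. Graphs are encoded as a pair (loops, edges) on the vertex set \(\{1, \dots , n\}\), with 0 as the external zero; this encoding and all helper names are our own.

```python
from itertools import combinations, product


def all_graphs(n):
    """Yield every undirected graph on {1, ..., n} as (loops, edges), where
    loops is a frozenset of vertices and edges a frozenset of 2-element frozensets."""
    verts = range(1, n + 1)
    pairs = [frozenset(p) for p in combinations(verts, 2)]
    for loop_bits in product((False, True), repeat=n):
        loops = frozenset(v for v, b in zip(verts, loop_bits) if b)
        for edge_bits in product((False, True), repeat=len(pairs)):
            yield loops, frozenset(p for p, b in zip(pairs, edge_bits) if b)


def graph_algebra(n, loops, edges):
    """Cayley table of A(G) on {0, 1, ..., n}; 0 is the external zero."""
    def adjacent(x, y):
        return x in loops if x == y else frozenset((x, y)) in edges
    return [[x if x and y and adjacent(x, y) else 0 for y in range(n + 1)]
            for x in range(n + 1)]


def is_associative(t):
    r = range(len(t))
    return all(t[t[a][b]][c] == t[a][t[b][c]] for a in r for b in r for c in r)


def components(n, edges):
    """Connected components of the graph (loops play no role)."""
    adj = {v: set() for v in range(1, n + 1)}
    for e in edges:
        x, y = tuple(e)
        adj[x].add(y)
        adj[y].add(x)
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            for y in adj[stack.pop()] - comp:
                comp.add(y)
                stack.append(y)
        seen |= comp
        comps.append(comp)
    return comps


def satisfies_corollary_2_13(n, loops, edges):
    """All nontrivial components are reflexive complete graphs."""
    return all(
        comp <= loops and all(frozenset(p) in edges for p in combinations(comp, 2))
        for comp in components(n, edges)
        if len(comp) >= 2
    )


disagreements = [
    (n, L, E)
    for n in range(1, 5)
    for L, E in all_graphs(n)
    if is_associative(graph_algebra(n, L, E)) != satisfies_corollary_2_13(n, L, E)
]
print(disagreements)  # []
```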

Associative spectra of graph algebras were investigated in [18, 19]. The main tool in that study was a result of Pöschel and Wessel [22] describing satisfaction of identities in terms of graph homomorphisms. Since associative spectra of undirected graphs were completely described in [18] (we state this result in Theorem 5.1), we consider \({\text {ns}}(\mathbb {A}({G}))\) and \({\text {sdist}}(\mathbb {A}({G}))\).

More precisely, we consider a variant of the semigroup distance that seems more relevant to our setting, and, admittedly, is easier to handle. We restrict our attention to the metric space of graph algebras of undirected graphs with a fixed vertex set V, and we measure associativity by the distance to the set \({\text {AssGr}}(V)\) of associative graph algebras in this metric space.

Definition 2.14

For a finite nonempty set V, let \({\text {AssGr}}(V)\) denote the set of all undirected graphs H with \(V(H)=V\) such that \(\mathbb {A}({H})\) is associative. For a finite undirected graph G with \(V(G)=V\), we define \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\) as follows:

$$\begin{aligned} {\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))= \min \big \{{{\text {dist}}(\mathbb {A}({G}),\mathbb {A}({H}))}: H \in {\text {AssGr}}(V)\big \}. \end{aligned}$$

Clearly, \({\text {sdist}}(\mathbb {A}({G})) \le {\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\), and sometimes we have equality here (e.g., when \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))=1\)), but it may also happen that the inequality is strict (e.g., for \(G \cong K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil }\); see Section 5). Observe that deleting or adding an edge requires two changes in the operation table, while deleting or adding a loop requires only one change. Thus, if G and H are two graphs on the same finite vertex set, then

$$\begin{aligned} {\text {dist}}(\mathbb {A}({G}),\mathbb {A}({H}))= 2 \cdot |E(G) \bigtriangleup E(H)| + |L(G)\bigtriangleup L(H)|. \end{aligned}\tag{2.1}$$
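Formula (2.1) can be tested on random pairs of graphs: an edge xy contributes the two entries \(x \circ y\) and \(y \circ x\) to the operation table, while a loop on x contributes only \(x \circ x\). The sketch below uses the same hypothetical (loops, edges) encoding as before, on the vertex set \(\{1, \dots , 5\}\), with a fixed random seed.

```python
import random
from itertools import combinations


def graph_algebra(n, loops, edges):
    """Cayley table of A(G) on {0, 1, ..., n}; 0 is the external zero."""
    def adjacent(x, y):
        return x in loops if x == y else frozenset((x, y)) in edges
    return [[x if x and y and adjacent(x, y) else 0 for y in range(n + 1)]
            for x in range(n + 1)]


def dist(t1, t2):
    """Hamming distance of two Cayley tables of the same size."""
    return sum(u != v for r1, r2 in zip(t1, t2) for u, v in zip(r1, r2))


def random_graph(n, rng):
    """Random graph on {1, ..., n}: each loop and each edge present with probability 1/2."""
    verts = range(1, n + 1)
    loops = frozenset(v for v in verts if rng.random() < 0.5)
    edges = frozenset(frozenset(p) for p in combinations(verts, 2) if rng.random() < 0.5)
    return loops, edges


rng = random.Random(2024)
n = 5
failures = 0
for _ in range(1000):
    (L1, E1), (L2, E2) = random_graph(n, rng), random_graph(n, rng)
    lhs = dist(graph_algebra(n, L1, E1), graph_algebra(n, L2, E2))
    if lhs != 2 * len(E1 ^ E2) + len(L1 ^ L2):  # ^ is symmetric difference
        failures += 1
print(failures)  # 0
```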

Tables 1, 2 and 3 in Appendix C show the results of a brute force computer exploration of graph algebras of undirected graphs of sizes 3, 4 and 5 (we leave the case of 2-vertex graphs as an exercise to the interested reader). Note that the size of the graph algebra is one more than the size of the graph, due to the external zero element. We will always refer to the size of the graph; in particular, n denotes the number of vertices of the graph under consideration throughout the paper (hence the corresponding graph algebra has \(n+1\) elements).

We can conjecture from the tables that the range of \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\) for n-vertex graphs is an interval. This is indeed the case: we prove in Theorem 3.5 that the possible values are \(0,1,\dots , \big \lfloor n^2/2 \big \rfloor \), and we characterize graphs with the maximal value \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))= \big \lfloor n^2/2 \big \rfloor \) in Theorem 3.6. The behavior of the index of nonassociativity seems more complicated. It will be clear that \({\text {ns}}(\mathbb {A}({G}))\) is always even (see Proposition 4.1), but we see “gaps” in the range of \({\text {ns}}(\mathbb {A}({G}))\) even if we disregard odd numbers (these missing values are indicated by gray color in Tables 2 and 3). We find the maximal value of \({\text {ns}}(\mathbb {A}({G}))\) and the corresponding graphs in Theorem 4.6, and we describe some gaps at the top of the range in Corollary 4.8. In Theorem 4.10 we prove that the bottom of the range contains an interval of even numbers that is asymptotically as long as the whole range. This means that the “chaotic” part at the top is very small, but it remains an open problem to complete the description of the range of \({\text {ns}}(\mathbb {A}({G}))\).

3 Semigroup distance of graph algebras

Our first result is an upper estimate of \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\) in terms of the number of edges. We prove this inequality in the next lemma, then we prove that the inequality is strict in some special cases (Lemma 3.2), and in Proposition 3.3 we characterize graphs for which the estimate is sharp.

Lemma 3.1

For any finite undirected graph G, we have \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\le 2\cdot |E(G)|\).

Proof

Consider the graph H such that \(V(H)=V(G)\), \(E(H)=\emptyset \) and \(L(H)=L(G)\), i.e., H is obtained from G by deleting all edges (but keeping the loops). By Corollary 2.13, H has an associative graph algebra, since it has no nontrivial component. Clearly, \({\text {dist}}(\mathbb {A}({G}),\mathbb {A}({H}))=2\cdot |E(G)|\), thus \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\le 2\cdot |E(G)|\). \(\square \)
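For small n, \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\) can be computed exactly by listing \({\text {AssGr}}(V)\) (here by filtering all graphs whose graph algebra is associative) and taking the minimum distance. This brute-force sketch is our own, with hypothetical names, and is not the program behind the tables in Appendix C. It confirms the bound of Lemma 3.1 for all graphs on 3 vertices; for the irreflexive triangle it gives 3 (add the three loops), and for the irreflexive cherry it gives 4 (delete both edges).

```python
from itertools import combinations, product


def all_graphs(n):
    """Yield every undirected graph on {1, ..., n} as (loops, edges)."""
    verts = range(1, n + 1)
    pairs = [frozenset(p) for p in combinations(verts, 2)]
    for loop_bits in product((False, True), repeat=n):
        loops = frozenset(v for v, b in zip(verts, loop_bits) if b)
        for edge_bits in product((False, True), repeat=len(pairs)):
            yield loops, frozenset(p for p, b in zip(pairs, edge_bits) if b)


def graph_algebra(n, loops, edges):
    """Cayley table of A(G) on {0, 1, ..., n}; 0 is the external zero."""
    def adjacent(x, y):
        return x in loops if x == y else frozenset((x, y)) in edges
    return [[x if x and y and adjacent(x, y) else 0 for y in range(n + 1)]
            for x in range(n + 1)]


def is_associative(t):
    r = range(len(t))
    return all(t[t[a][b]][c] == t[a][t[b][c]] for a in r for b in r for c in r)


def dist(t1, t2):
    return sum(u != v for r1, r2 in zip(t1, t2) for u, v in zip(r1, r2))


def ass_gr(n):
    """Cayley tables of the graph algebras of all graphs in AssGr({1, ..., n})."""
    tables = (graph_algebra(n, L, E) for L, E in all_graphs(n))
    return [t for t in tables if is_associative(t)]


def sdist_gr(n, loops, edges, targets):
    """Distance of A(G) to the nearest associative graph algebra in `targets`."""
    t = graph_algebra(n, loops, edges)
    return min(dist(t, h) for h in targets)


targets = ass_gr(3)
triangle = frozenset(frozenset(p) for p in [(1, 2), (1, 3), (2, 3)])
cherry = frozenset(frozenset(p) for p in [(1, 2), (1, 3)])
print(len(targets))                                 # 15 graphs in AssGr({1, 2, 3})
print(sdist_gr(3, frozenset(), triangle, targets))  # 3
print(sdist_gr(3, frozenset(), cherry, targets))    # 4

# Lemma 3.1 for all graphs on 3 vertices.
assert all(sdist_gr(3, L, E, targets) <= 2 * len(E) for L, E in all_graphs(3))
```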

Lemma 3.2

If G is a finite undirected graph that has a loop on a non-isolated vertex, then \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))< 2\cdot |E(G)|\).

Proof

Assume that \(x \in L(G)\) and \(xy \in E(G)\). We consider the graph H such that \(V(H)=V(G)\), \(E(H)=\{xy\}\) and \(L(H)=L(G)\cup \{y\}\), i.e., H is obtained from G by deleting all edges except xy and adding a loop to y (if there was no loop there). By Corollary 2.13, H has an associative graph algebra, since its only nontrivial component is isomorphic to \(K_2^\circlearrowleft \). According to (2.1), we have

$$\begin{aligned} {\text {dist}}(\mathbb {A}({G}),\mathbb {A}({H}))&= 2 \cdot |E(G) \bigtriangleup E(H)| + |L(G)\bigtriangleup L(H)|\\&\le 2\cdot (|E(G)|-1)+1\\&=2\cdot |E(G)|-1, \end{aligned}$$

thus \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))< 2\cdot |E(G)|\), as claimed. \(\square \)

Proposition 3.3

For any finite undirected graph G, we have

$$\begin{aligned} {\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\le 2\cdot |E(G)|, \end{aligned}$$

and equality holds here if and only if G is triangle-free and loops appear only on isolated vertices.

Proof

We have already proved the inequality \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\le 2\cdot |E(G)|\) in Lemma 3.1, so we only need to describe the graphs for which we have equality. Isolated vertices (with or without loops) do not influence the value of \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\), and they do not contribute to \(|E(G)|\) either. Therefore, we can assume without loss of generality that G has no isolated vertices. Moreover, by Lemma 3.2, we may also suppose that G is irreflexive.

Assume first that G contains a triangle xyz. Let us construct the graph H such that \(V(H)=V(G)\), \(E(H)=\{xy, xz, yz\}\) and \(L(H)=\{x,y,z\}\), i.e., H is obtained from G by deleting all edges except the three edges of the triangle xyz and adding a loop to x, y and z. By Corollary 2.13, H has an associative graph algebra, since its only nontrivial component is isomorphic to \(K_3^\circlearrowleft \). According to (2.1), we have

$$\begin{aligned} {\text {dist}}(\mathbb {A}({G}),\mathbb {A}({H}))&= 2 \cdot |E(G) \bigtriangleup E(H)| + |L(G)\bigtriangleup L(H)|\\&\le 2\cdot (|E(G)|-3)+3\\&=2\cdot |E(G)|-3, \end{aligned}$$

thus \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))< 2\cdot |E(G)|\), as claimed.

It remains to prove that if G is an irreflexive triangle-free graph without isolated vertices, then \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\ge 2\cdot |E(G)|\). Let \(H \in {\text {AssGr}}(V)\), where \(V=V(G)\), and let us verify that \({\text {dist}}(\mathbb {A}({G}),\mathbb {A}({H}))\ge 2\cdot |E(G)|\). By Corollary 2.13, all nontrivial connected components of H are reflexive complete graphs. Denote the vertex sets of the nontrivial components of H by \(V_i\ (i=1,\dots ,k)\) and let \(V_0\) denote the set of isolated vertices of H. We can assume without loss of generality that H has no loops on isolated vertices: since G is irreflexive, deleting such a loop keeps H in \({\text {AssGr}}(V)\) and can only decrease the distance to \(\mathbb {A}({G})\). Of course, it may happen that H has no nontrivial connected components; in that case we have \(k=0\) and \(V_0=V\). Let \(e_i\) (\(i=1,\dots ,k\)) denote the number of edges in \(G|_{V_i}\) and let \(e_0\) denote the number of the remaining edges in G. Note that \(e_0\) counts the edges of G within \(V_0\) as well as the edges across the sets \(V_i\ (i=0,1,\dots ,k)\).

This time it will be easier to compute \({\text {dist}}(\mathbb {A}({G}),\mathbb {A}({H}))\) directly by considering the operation tables of \(\mathbb {A}({G})\) and \(\mathbb {A}({H})\), instead of using (2.1). For \(i=1,\dots ,k\), all entries of the operation table of \(\mathbb {A}({H})\) in \(V_i \times V_i\) are nonzero, whereas exactly \(2e_i\) entries of the operation table of \(\mathbb {A}({G})\) are nonzero there (recall that G is irreflexive); hence the two tables differ at \(|V_i|^2-2e_i\) places in \(V_i \times V_i\). Outside the set \(\bigcup _{i=1}^kV_i \times V_i\), all entries of the operation table of \(\mathbb {A}({H})\) are 0, so the two tables differ at \(2e_0\) places there. Therefore, we have \({\text {dist}}(\mathbb {A}({G}),\mathbb {A}({H}))= 2e_0+\sum _{i=1}^k(|V_i|^2-2e_i)\). Since \(G|_{V_i}\) (\(i=1,\dots ,k\)) is an irreflexive triangle-free graph, \(e_i\le |V_i|^2/4\) holds by Theorem 2.10. This implies \(|V_i|^2- 2e_i \ge 2e_i\) for \(i=1,\dots ,k\), hence

$$\begin{aligned} {\text {dist}}(\mathbb {A}({G}),\mathbb {A}({H}))= 2e_0+\sum _{i=1}^k(|V_i|^2-2e_i) \ge 2e_0+\sum _{i=1}^k2e_i = 2 |E(G)|, \end{aligned}$$

and this completes the proof. \(\square \)
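The characterization in Proposition 3.3 can be checked exhaustively for \(n \le 4\) with the same brute-force approach. The sketch below (hypothetical helper names, same (loops, edges) encoding as before) records every graph for which the equality \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G})) = 2|E(G)|\) and the condition "triangle-free, with loops only on isolated vertices" disagree, and finds none.

```python
from itertools import combinations, product


def all_graphs(n):
    """Yield every undirected graph on {1, ..., n} as (loops, edges)."""
    verts = range(1, n + 1)
    pairs = [frozenset(p) for p in combinations(verts, 2)]
    for loop_bits in product((False, True), repeat=n):
        loops = frozenset(v for v, b in zip(verts, loop_bits) if b)
        for edge_bits in product((False, True), repeat=len(pairs)):
            yield loops, frozenset(p for p, b in zip(pairs, edge_bits) if b)


def graph_algebra(n, loops, edges):
    """Cayley table of A(G) on {0, 1, ..., n}; 0 is the external zero."""
    def adjacent(x, y):
        return x in loops if x == y else frozenset((x, y)) in edges
    return [[x if x and y and adjacent(x, y) else 0 for y in range(n + 1)]
            for x in range(n + 1)]


def is_associative(t):
    r = range(len(t))
    return all(t[t[a][b]][c] == t[a][t[b][c]] for a in r for b in r for c in r)


def dist(t1, t2):
    return sum(u != v for r1, r2 in zip(t1, t2) for u, v in zip(r1, r2))


def ass_gr(n):
    tables = (graph_algebra(n, L, E) for L, E in all_graphs(n))
    return [t for t in tables if is_associative(t)]


def sdist_gr(n, loops, edges, targets):
    t = graph_algebra(n, loops, edges)
    return min(dist(t, h) for h in targets)


def has_triangle(n, edges):
    return any(
        all(frozenset(p) in edges for p in combinations(t, 2))
        for t in combinations(range(1, n + 1), 3)
    )


def loops_only_on_isolated(loops, edges):
    return not any(v in e for v in loops for e in edges)


mismatches = []
for n in range(1, 5):
    targets = ass_gr(n)
    for L, E in all_graphs(n):
        equality = sdist_gr(n, L, E, targets) == 2 * len(E)
        condition = not has_triangle(n, E) and loops_only_on_isolated(L, E)
        if equality != condition:
            mismatches.append((n, L, E))
print(mismatches)  # []
```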

Before stating and proving the main results of this section that describe possible values of \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\) (Theorem 3.5) and the graphs attaining the maximal value (Theorem 3.6), we need to look at what happens to \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\) if we add a loop to a vertex.

Lemma 3.4

Let G be a finite undirected graph with \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))=m\), and assume that there is no loop on the vertex \(v \in V(G)\). Let \(G^*\) be the graph obtained from G by adding a loop to v. Then \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G^*)})\in \{m, m-1\}\).

Proof

Clearly \({\text {dist}}(\mathbb {A}({G}),\mathbb {A}({G^*}))=1\), thus the triangle inequality implies that \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G^*})) \ge m-1\). In order to prove \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G^*})) \le m\), let us consider \(H \in {\text {AssGr}}(V)\) with \(V=V(G)\) and \({\text {dist}}(\mathbb {A}({G}),\mathbb {A}({H}))=m\). If \(v \in L(H)\), then \({\text {dist}}({\mathbb {A}({G^*})},{\mathbb {A}({H})})=m-1 \le m\). If \(v \notin L(H)\), then, by Corollary 2.13, v is an isolated vertex in H, and the graph \(H^*\) obtained from H by adding a loop to the vertex v also belongs to \({\text {AssGr}}(V)\); furthermore, \({\text {dist}}({\mathbb {A}({G^*})},\mathbb {A}({H^*}))=m\) in this case. This proves that in both cases there is an associative graph algebra of distance at most m from \(\mathbb {A}({G^*})\), thus \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G^*}))\le m\), as claimed. \(\square \)
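Lemma 3.4 can likewise be verified for all graphs on at most 4 vertices. The sketch below (hypothetical helper names, same encoding as before) computes \({\text {sdist}}_\textrm{gr}\) once for every graph, then adds a loop to each loopless vertex in turn and records any case in which the value changes by something other than 0 or \(-1\).

```python
from itertools import combinations, product


def all_graphs(n):
    """Yield every undirected graph on {1, ..., n} as (loops, edges)."""
    verts = range(1, n + 1)
    pairs = [frozenset(p) for p in combinations(verts, 2)]
    for loop_bits in product((False, True), repeat=n):
        loops = frozenset(v for v, b in zip(verts, loop_bits) if b)
        for edge_bits in product((False, True), repeat=len(pairs)):
            yield loops, frozenset(p for p, b in zip(pairs, edge_bits) if b)


def graph_algebra(n, loops, edges):
    """Cayley table of A(G) on {0, 1, ..., n}; 0 is the external zero."""
    def adjacent(x, y):
        return x in loops if x == y else frozenset((x, y)) in edges
    return [[x if x and y and adjacent(x, y) else 0 for y in range(n + 1)]
            for x in range(n + 1)]


def is_associative(t):
    r = range(len(t))
    return all(t[t[a][b]][c] == t[a][t[b][c]] for a in r for b in r for c in r)


def dist(t1, t2):
    return sum(u != v for r1, r2 in zip(t1, t2) for u, v in zip(r1, r2))


def ass_gr(n):
    tables = (graph_algebra(n, L, E) for L, E in all_graphs(n))
    return [t for t in tables if is_associative(t)]


def sdist_gr(n, loops, edges, targets):
    t = graph_algebra(n, loops, edges)
    return min(dist(t, h) for h in targets)


violations = []
for n in range(1, 5):
    targets = ass_gr(n)
    values = {(L, E): sdist_gr(n, L, E, targets) for L, E in all_graphs(n)}
    for (L, E), m in values.items():
        for v in range(1, n + 1):
            if v not in L and values[(L | {v}, E)] not in (m, m - 1):
                violations.append((n, L, E, v))
print(violations)  # []
```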

Let \({{\,\textrm{range}\,}}_n({\text {sdist}}_\textrm{gr})\) denote the range of \({\text {sdist}}_\textrm{gr}\) on graph algebras of n-vertex undirected graphs:

$$\begin{aligned} {{\,\textrm{range}\,}}_n({\text {sdist}}_\textrm{gr}) = \big \{{{\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))}: G \text { is an undirected graph on } n \text { vertices} \big \}. \end{aligned}$$

Theorem 3.5

For any positive integer n, we have

$$\begin{aligned} {{\,\textrm{range}\,}}_n({\text {sdist}}_\textrm{gr}) = \Big \{0,1,\dots , \big \lfloor n^2/2 \big \rfloor \Big \}. \end{aligned}$$

Proof

Let us consider bipartite graphs G with color classes A and B, where \(|A|=\big \lfloor n/2 \big \rfloor \), \(|B|=\big \lceil n/2 \big \rceil \). The number of edges of such a graph can be any number between 0 and \(\big \lfloor n/2 \big \rfloor \cdot \big \lceil n/2 \big \rceil \). Since bipartite graphs are triangle-free, we have \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))= 2\cdot |E(G)|\) by Proposition 3.3, thus \({{\,\textrm{range}\,}}_n({\text {sdist}}_\textrm{gr})\) contains all even numbers between 0 and \(2\cdot \big \lfloor n/2 \big \rfloor \cdot \big \lceil n/2 \big \rceil =\big \lfloor n^2/2 \big \rfloor \).

To obtain the odd numbers in this interval, let G be any one of the graphs considered above that has a non-isolated vertex x. (The latter assumption excludes only the empty graph.) If \(G^*\) is the graph obtained from G by adding a loop to the vertex x, then we have \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G^*}))=2\cdot |E(G)|-1\) by Lemmas 3.2 and 3.4. This proves that \({{\,\textrm{range}\,}}_n({\text {sdist}}_\textrm{gr})\) contains all odd numbers between 1 and \(\big \lfloor n^2/2 \big \rfloor -1\).

It remains to prove that \({{\,\textrm{range}\,}}_n({\text {sdist}}_\textrm{gr})\) does not contain any number greater than \(\big \lfloor n^2/2 \big \rfloor \). Let \(V=V(G)\) have n elements, let \(H_0\) be the empty graph on V (with no edges and no loops), and let \(H_1\) be the reflexive complete graph on V (thus \(H_1 \cong K_n^\circlearrowleft \)). Clearly, \(H_0, H_1 \in {\text {AssGr}}(V)\). Moreover, \({\text {dist}}(\mathbb {A}({G}),\mathbb {A}({H_0})) + {\text {dist}}(\mathbb {A}({G}),\mathbb {A}({H_1})) = n^2\): every entry of the operation table of \(\mathbb {A}({G})\) in \(V \times V\) is either 0, and then it differs from the corresponding (nonzero) entry of \(\mathbb {A}({H_1})\), or nonzero, and then it differs from the corresponding (zero) entry of \(\mathbb {A}({H_0})\), while all entries involving the external zero are 0 in all three tables. This implies that the smaller one of \({\text {dist}}(\mathbb {A}({G}),\mathbb {A}({H_0}))\) and \({\text {dist}}(\mathbb {A}({G}),\mathbb {A}({H_1}))\) is at most \(\big \lfloor n^2/2 \big \rfloor \), hence \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\le \big \lfloor n^2/2 \big \rfloor \). \(\square \)
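Theorem 3.5, together with the extremal characterization in Theorem 3.6 below, can be checked by brute force for small n. The sketch below (hypothetical helper names, same encoding as before) collects, for \(2 \le n \le 4\), the set of values of \({\text {sdist}}_\textrm{gr}\) and the labeled graphs attaining the maximum \(\big \lfloor n^2/2 \big \rfloor \). For \(n = 3\) and \(n = 4\) the maximizers found are the 3 labeled copies of \(K_{1,2}\) and of \(K_{2,2}\), respectively, all of them irreflexive.

```python
from itertools import combinations, product


def all_graphs(n):
    """Yield every undirected graph on {1, ..., n} as (loops, edges)."""
    verts = range(1, n + 1)
    pairs = [frozenset(p) for p in combinations(verts, 2)]
    for loop_bits in product((False, True), repeat=n):
        loops = frozenset(v for v, b in zip(verts, loop_bits) if b)
        for edge_bits in product((False, True), repeat=len(pairs)):
            yield loops, frozenset(p for p, b in zip(pairs, edge_bits) if b)


def graph_algebra(n, loops, edges):
    """Cayley table of A(G) on {0, 1, ..., n}; 0 is the external zero."""
    def adjacent(x, y):
        return x in loops if x == y else frozenset((x, y)) in edges
    return [[x if x and y and adjacent(x, y) else 0 for y in range(n + 1)]
            for x in range(n + 1)]


def is_associative(t):
    r = range(len(t))
    return all(t[t[a][b]][c] == t[a][t[b][c]] for a in r for b in r for c in r)


def dist(t1, t2):
    return sum(u != v for r1, r2 in zip(t1, t2) for u, v in zip(r1, r2))


def ass_gr(n):
    tables = (graph_algebra(n, L, E) for L, E in all_graphs(n))
    return [t for t in tables if is_associative(t)]


def sdist_gr(n, loops, edges, targets):
    t = graph_algebra(n, loops, edges)
    return min(dist(t, h) for h in targets)


ranges, maximizers = {}, {}
for n in range(2, 5):
    targets = ass_gr(n)
    values = {(L, E): sdist_gr(n, L, E, targets) for L, E in all_graphs(n)}
    ranges[n] = sorted(set(values.values()))
    top = max(values.values())
    maximizers[n] = [g for g, v in values.items() if v == top]

for n in ranges:
    # range should be 0, 1, ..., floor(n^2/2); maximizer counts 1, 3, 3
    print(n, ranges[n], len(maximizers[n]))
```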

Theorem 3.6

For an arbitrary finite undirected graph G on n vertices, we have \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))=\big \lfloor n^2/2 \big \rfloor \) if and only if \(G \cong K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil }\).

Proof

It is clear from Proposition 3.3 that \({\text {sdist}}_\textrm{gr}(\mathbb {A}(K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil })) = 2 \cdot \big \lfloor n/2 \big \rfloor \cdot \big \lceil n/2 \big \rceil = \big \lfloor n^2/2 \big \rfloor \). Conversely, assume that \(V=V(G)\) has n elements and \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))=\big \lfloor n^2/2 \big \rfloor \), and let us prove that \(G \cong K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil }\). We distinguish two cases according to whether G contains loops.

Case 1: There are no loops in G (i.e., G is irreflexive). Let \(H_0\) be the empty graph on V (with no edges and no loops), and let \(H_1\) be the reflexive complete graph on V (thus ). Then we have

$$\begin{aligned} {\text {dist}}(\mathbb {A}({G}),\mathbb {A}({H_0)})&= 2 \cdot |E(G)| \ge {\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))= \big \lfloor n^2/2 \big \rfloor \\ {\text {dist}}(\mathbb {A}({G}),\mathbb {A}({H_1)})&= n^2 - 2 \cdot |E(G)| \ge {\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))= \big \lfloor n^2/2 \big \rfloor . \end{aligned}$$

From these two inequalities we get \(\big \lfloor n^2/2 \big \rfloor \le 2 \cdot |E(G)| \le \big \lceil n^2/2 \big \rceil \). If n is odd, then \(\big \lceil n^2/2 \big \rceil \) is an odd number, hence \(2 \cdot |E(G)| = \big \lceil n^2/2 \big \rceil \) is impossible. So we have \(2 \cdot |E(G)| = \big \lfloor n^2/2 \big \rfloor \), and this is certainly true also if n is even. Therefore, \(2\cdot |E(G)| = {\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\), and then G is triangle-free, according to Proposition 3.3. Furthermore, G is irreflexive and \(|E(G)| = 1/2 \cdot \big \lfloor n^2/2 \big \rfloor = \big \lfloor n^2/4 \big \rfloor \), thus we can use Theorem 2.10 to conclude that \(G \cong K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil }\).

Case 2: There is at least one loop in G. By Lemma 3.4, \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\) could only increase if we removed loops, but \(\big \lfloor n^2/2 \big \rfloor \) is already the maximum of \({{\,\textrm{range}\,}}_n({\text {sdist}}_\textrm{gr})\) by Theorem 3.5. This means that \({\text {sdist}}_\textrm{gr}(\mathbb {A}({\hat{G}})) = \big \lfloor n^2/2 \big \rfloor \) holds for the graph \({\hat{G}}\) that we obtain from G by removing all loops. Now \({{\hat{G}}} \cong K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil }\) follows by Case 1, hence \(|E({\hat{G}})| = \big \lfloor n/2 \big \rfloor \cdot \big \lceil n/2 \big \rceil \). Since G contains a loop but has no isolated vertices, we can apply Lemma 3.2:

$$\begin{aligned} {\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))< 2 \cdot |E(G)| = 2 \cdot |E({\hat{G}})| = 2 \cdot \big \lfloor n/2 \big \rfloor \cdot \big \lceil n/2 \big \rceil = \big \lfloor n^2/2 \big \rfloor . \end{aligned}$$

However, this contradicts our assumption \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))= \big \lfloor n^2/2 \big \rfloor \), so Case 2 is actually not possible. \(\square \)
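Theorems 3.5 and 3.6 can be confirmed by exhaustive search for very small n. The following Python sketch is only an illustration, not the method used in the proofs. It assumes that \({\text {AssGr}}(V)\) consists of the undirected graphs on V (loops allowed) whose graph algebra is associative; graphs are stored as 0/1 adjacency matrices with loops on the diagonal, the external zero is encoded as -1, and all helper names are our own.

```python
from itertools import combinations, product
from math import comb

ZERO = -1  # encoding of the external zero element 0 (our choice)


def all_graphs(n):
    """Yield every undirected graph on {0, ..., n-1}, loops allowed, as a 0/1 adjacency matrix."""
    pairs = list(combinations(range(n), 2))
    for bits in product((0, 1), repeat=n + len(pairs)):
        adj = [[0] * n for _ in range(n)]
        for v in range(n):
            adj[v][v] = bits[v]  # the first n bits decide the loops
        for (u, v), b in zip(pairs, bits[n:]):
            adj[u][v] = adj[v][u] = b
        yield adj


def ns_brute(adj):
    """Number of triples (a, b, c) over V and 0 with (a o b) o c != a o (b o c)."""
    n = len(adj)

    def op(x, y):
        # x o y = x if there is an edge from x to y, and 0 otherwise
        if x == ZERO or y == ZERO:
            return ZERO
        return x if adj[x][y] else ZERO

    elems = list(range(n)) + [ZERO]
    return sum(op(op(a, b), c) != op(a, op(b, c)) for a, b, c in product(elems, repeat=3))


def dist(g, h):
    """Number of Cayley-table entries in which A(G) and A(H) differ (same vertex set)."""
    n = len(g)
    return sum(g[x][y] != h[x][y] for x in range(n) for y in range(n))


def sdist_gr_table(n):
    """Map every n-vertex graph (as a tuple of rows) to sdist_gr of its graph algebra."""
    graphs = list(all_graphs(n))
    assoc = [h for h in graphs if ns_brute(h) == 0]  # our reading of AssGr(V)
    return {tuple(map(tuple, g)): min(dist(g, h) for h in assoc) for g in graphs}


def complete_bipartite_sizes(adj):
    """Sorted class sizes if adj is an irreflexive K_{a,b} on all its vertices, else None."""
    n = len(adj)
    if any(adj[v][v] for v in range(n)):
        return None
    same = [adj[0][y] == 0 for y in range(n)]  # True on the colour class of vertex 0
    if all(same):
        return None  # vertex 0 has no neighbours
    if any(adj[x][y] != (same[x] != same[y]) for x, y in combinations(range(n), 2)):
        return None
    return tuple(sorted((sum(same), n - sum(same))))


for n in range(2, 5):
    table = sdist_gr_table(n)
    top = n * n // 2
    f, c = n // 2, (n + 1) // 2
    # Theorem 3.5: the range is exactly {0, 1, ..., floor(n^2/2)}
    assert sorted(set(table.values())) == list(range(top + 1))
    # Theorem 3.6: the maximum is attained exactly by the labelled copies of K_{f,c}
    maximizers = [g for g, d in table.items() if d == top]
    assert all(complete_bipartite_sizes(g) == (f, c) for g in maximizers)
    assert len(maximizers) == comb(n, f) // (2 if f == c else 1)
```

Because every product \(x \circ y\) is either x or 0, the table distance reduces to comparing adjacency matrices, which agrees with the count \(2k+\ell \) used in the proof of Theorem 5.2.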

4 Index of nonassociativity of graph algebras

As the next proposition shows, the index of nonassociativity is much easier to compute than \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\): we just need to count cherries and edges. We also see that \({\text {ns}}(\mathbb {A}({G}))\) is always an even number for undirected graphs.

Proposition 4.1

For any finite undirected graph G, we have

$$\begin{aligned} {\text {ns}}(\mathbb {A}({G}))= 4 \cdot |Ch(G)| + 4 \cdot |E_0(G)| + 2 \cdot |E_1(G)|. \end{aligned}$$

Proof

Let uvw be a cherry, say, with \(uv, uw \in E(G)\) and \(vw \notin E(G)\). It is straightforward to verify that 4 of the 6 permutations of uvw give a nonassociative triple, namely (u, v, w), (u, w, v), (v, u, w) and (w, u, v). We claim that all nonassociative triples with pairwise distinct entries do arise this way from a cherry. Indeed, if (x, y, z) is a nonassociative triple, then either \((xy)z = 0 \ne x = x(yz)\) or \((xy)z = x \ne 0 = x(yz)\). The first case holds if and only if \(xy \in E(G)\), \(yz \in E(G)\), \(xz \notin E(G)\), while the second case is equivalent to \(xy \in E(G)\), \(yz \notin E(G)\), \(xz \in E(G)\). We see that in both cases xyz is a cherry, provided that x, y, z are pairwise distinct. Thus the number of nonassociative triples consisting of three different entries is \(4 \cdot |Ch(G)|\).

Now if u and v are two different vertices and \(uv \notin E(G)\) or \(uv \in E_2(G)\), then all triples formed from u and v are associative. If \(uv \in E_0(G)\), then we get the 4 nonassociative triples (u, v, v), (u, v, u), (v, u, u) and (v, u, v). If \(uv \in E_1(G)\), say, with \(u \in L(G)\) and \(v \notin L(G)\), then we get only 2 nonassociative triples, namely (u, v, v) and (v, u, v). Thus the number of nonassociative triples consisting of two different entries is \(4 \cdot |E_0(G)| + 2 \cdot |E_1(G)|\). To finish the proof, we only need to note that triples of the form (u, u, u) are associative (as well as those that contain the zero element). \(\square \)
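Proposition 4.1 can be cross-checked against a direct count of nonassociative triples. The Python sketch below is our own illustration: it stores graphs as 0/1 adjacency matrices (loops on the diagonal), encodes the external zero as -1, and compares both sides of the formula for every undirected graph with at most four vertices.

```python
from itertools import combinations, product

ZERO = -1  # encoding of the external zero element 0 (our choice)


def all_graphs(n):
    """Yield every undirected graph on {0, ..., n-1}, loops allowed, as a 0/1 adjacency matrix."""
    pairs = list(combinations(range(n), 2))
    for bits in product((0, 1), repeat=n + len(pairs)):
        adj = [[0] * n for _ in range(n)]
        for v in range(n):
            adj[v][v] = bits[v]  # the first n bits decide the loops
        for (u, v), b in zip(pairs, bits[n:]):
            adj[u][v] = adj[v][u] = b
        yield adj


def ns_brute(adj):
    """Number of triples (a, b, c) over V and 0 with (a o b) o c != a o (b o c)."""
    n = len(adj)

    def op(x, y):
        # x o y = x if there is an edge from x to y, and 0 otherwise
        if x == ZERO or y == ZERO:
            return ZERO
        return x if adj[x][y] else ZERO

    elems = list(range(n)) + [ZERO]
    return sum(op(op(a, b), c) != op(a, op(b, c)) for a, b, c in product(elems, repeat=3))


def ns_formula(adj):
    """4|Ch(G)| + 4|E_0(G)| + 2|E_1(G)| as in Proposition 4.1."""
    n = len(adj)
    # a cherry is a 3-set of vertices spanning exactly two (non-loop) edges
    cherries = sum(
        adj[a][b] + adj[a][c] + adj[b][c] == 2 for a, b, c in combinations(range(n), 3)
    )
    e0 = e1 = 0
    for u, v in combinations(range(n), 2):
        if adj[u][v]:
            looped = adj[u][u] + adj[v][v]  # how many endpoints carry a loop
            e0 += looped == 0
            e1 += looped == 1
    return 4 * cherries + 4 * e0 + 2 * e1


for n in range(1, 5):
    for adj in all_graphs(n):
        assert ns_brute(adj) == ns_formula(adj)
```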

Proposition 4.2

For any finite undirected graph G, we have

$$\begin{aligned} {\text {ns}}(\mathbb {A}({G}))\le 2 \cdot \sum _{v \in V(G)} d(v)^2, \end{aligned}$$

and equality holds here if and only if G is triangle-free and loops appear only on isolated vertices.

Proof

Isolated vertices (with or without loops) do not influence the index of nonassociativity, nor do they contribute to vertex degrees. Therefore, we can assume without loss of generality that G has no isolated vertices. By Proposition 4.1, \({\text {ns}}(\mathbb {A}({G}))\le 4 \cdot |Ch(G)| + 4 \cdot |E(G)|\), with equality if and only if G is irreflexive. Each vertex v is the “top” vertex of at most \({d(v) \atopwithdelims ()2}\) cherries, and v is the “top” vertex of exactly \({d(v) \atopwithdelims ()2}\) cherries if and only if v is not contained in any triangle. Thus we can estimate \({\text {ns}}(\mathbb {A}({G}))\) as follows:

$$\begin{aligned} {\text {ns}}(\mathbb {A}({G}))&\le 4 \cdot |Ch(G)| + 4 \cdot |E(G)|\\&\le 4 \cdot \sum _{v \in V(G)} {d(v) \atopwithdelims ()2} + 4 \cdot |E(G)|\\&= 2 \cdot \sum _{v \in V(G)} d(v)^2 - 2 \cdot \sum _{v \in V(G)} d(v) + 4 \cdot |E(G)|\\&= 2 \cdot \sum _{v \in V(G)} d(v)^2 \end{aligned}$$

(in the last step we used the well-known handshaking theorem: the sum of the degrees is twice the number of edges). It is clear from the arguments above that this estimate is sharp if and only if G contains neither loops nor triangles.

\(\square \)
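A brute-force check of Proposition 4.2, including its equality case, might look as follows for all graphs with at most four vertices. As in the proof, d(v) counts the neighbors of v other than v itself, so loops do not contribute to degrees; the encoding (adjacency matrices, zero as -1) and the helper names are our own.

```python
from itertools import combinations, product

ZERO = -1  # encoding of the external zero element 0 (our choice)


def all_graphs(n):
    """Yield every undirected graph on {0, ..., n-1}, loops allowed, as a 0/1 adjacency matrix."""
    pairs = list(combinations(range(n), 2))
    for bits in product((0, 1), repeat=n + len(pairs)):
        adj = [[0] * n for _ in range(n)]
        for v in range(n):
            adj[v][v] = bits[v]
        for (u, v), b in zip(pairs, bits[n:]):
            adj[u][v] = adj[v][u] = b
        yield adj


def ns_brute(adj):
    """Number of triples (a, b, c) over V and 0 with (a o b) o c != a o (b o c)."""
    n = len(adj)

    def op(x, y):
        if x == ZERO or y == ZERO:
            return ZERO
        return x if adj[x][y] else ZERO

    elems = list(range(n)) + [ZERO]
    return sum(op(op(a, b), c) != op(a, op(b, c)) for a, b, c in product(elems, repeat=3))


def degree(adj, v):
    """d(v): number of neighbours of v other than v (loops are not counted)."""
    return sum(adj[v][u] for u in range(len(adj)) if u != v)


def triangle_free(adj):
    n = len(adj)
    return not any(adj[a][b] and adj[a][c] and adj[b][c] for a, b, c in combinations(range(n), 3))


def loops_only_on_isolated(adj):
    return all(degree(adj, v) == 0 for v in range(len(adj)) if adj[v][v])


for n in range(1, 5):
    for adj in all_graphs(n):
        ns = ns_brute(adj)
        bound = 2 * sum(degree(adj, v) ** 2 for v in range(n))
        assert ns <= bound
        # equality exactly for triangle-free graphs whose loops sit on isolated vertices
        assert (ns == bound) == (triangle_free(adj) and loops_only_on_isolated(adj))
```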

Example 4.3

It follows immediately from the proposition above that for the complete bipartite graph \(K_{a,b}\), we have \({\text {ns}}(\mathbb {A}({{K_{a,b}})}) = 2ab^2+2a^2b = 2(a+b)ab\).
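The formula of Example 4.3 is easy to confirm numerically for small a and b. In the sketch below, which is our own and uses a brute-force count of nonassociative triples (zero encoded as -1), the first a vertices form one color class of \(K_{a,b}\).

```python
from itertools import product

ZERO = -1  # encoding of the external zero element 0 (our choice)


def ns_brute(adj):
    """Number of triples (a, b, c) over V and 0 with (a o b) o c != a o (b o c)."""
    n = len(adj)

    def op(x, y):
        if x == ZERO or y == ZERO:
            return ZERO
        return x if adj[x][y] else ZERO

    elems = list(range(n)) + [ZERO]
    return sum(op(op(a, b), c) != op(a, op(b, c)) for a, b, c in product(elems, repeat=3))


def complete_bipartite(a, b):
    """Adjacency matrix of K_{a,b}; vertices 0..a-1 form one class, a..a+b-1 the other."""
    n = a + b
    return [[int((x < a) != (y < a)) for y in range(n)] for x in range(n)]


for a in range(1, 4):
    for b in range(1, 4):
        assert ns_brute(complete_bipartite(a, b)) == 2 * (a + b) * a * b
```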

In the following we take a route similar to Section 3: we estimate \({\text {ns}}(\mathbb {A}({G}))\) by the number of edges (Proposition 4.4), then we see what happens to \({\text {ns}}(\mathbb {A}({G}))\) when we add a loop (Lemma 4.5), and we use these to prove our main results about the range of \({\text {ns}}(\mathbb {A}({G}))\) (Theorem 4.6, Corollary 4.8 and Theorem 4.10).

Proposition 4.4

For any finite undirected graph G with n vertices, we have

$$\begin{aligned} {\text {ns}}(\mathbb {A}({G}))\le 2n \cdot |E(G)|, \end{aligned}$$

and equality holds here if and only if G is either a complete bipartite graph or G has no edges.

Proof

Example 4.3 shows that \({\text {ns}}(\mathbb {A}({G}))= 2n \cdot |E(G)|\) for complete bipartite graphs, and this is trivially true for “edgeless” graphs, too.

Now assume that G has at least one edge, and for any \(e \in E(G)\), let ch(e) denote the number of cherries that contain the edge e. Clearly, we have \(ch(e) \le (n-2)\) for each \(e \in E(G)\); furthermore, \(\sum _{e \in E(G)} ch(e) = 2 \cdot |Ch(G)|\), as each cherry contains exactly two edges. Using these observations together with Proposition 4.1, we can estimate \({\text {ns}}(\mathbb {A}({G}))\) as follows:

$$\begin{aligned} {\text {ns}}(\mathbb {A}({G}))&\le 4 \cdot |Ch(G)| + 4 \cdot |E(G)|\\&= 2 \cdot \sum _{e \in E(G)} {ch(e)} + 4 \cdot |E(G)|\\&\le 2(n-2) \cdot |E(G)| + 4 \cdot |E(G)|\\&= 2n \cdot |E(G)|. \end{aligned}$$

The inequalities above turn into equalities if and only if the endpoints of each edge are loopless, and every edge is contained in \(n-2\) cherries, i.e.,

$$\begin{aligned} \forall xy \in E(G)\ \forall v \in V(G) {\setminus } \{x,y\}:xv \in E(G) \text { or } yv \in E(G), \text { but not both}. \end{aligned}$$
(4.1)

Let us assume that G is such a graph, and suppose for contradiction that G contains a cycle C of odd length. Any chord of C (i.e., an edge connecting two non-consecutive vertices of C) cuts C into two shorter cycles, one of which has odd length. Therefore, if we choose C to be of minimal odd length \(\ell \), then C is a chordless cycle. However, (4.1) rules out chordless cycles of every length other than 4, and \(\ell \ne 4\) because \(\ell \) is odd. This proves that G does not contain any cycles of odd length, hence G is bipartite. Let A and B be the two color classes, and let \(xy \in E(G)\) with \(x \in A\) and \(y \in B\). If v is any vertex from A, then (4.1) gives that \(yv \in E(G)\), whereas if v belongs to B, then we have \(xv \in E(G)\). Since this is true for every edge xy of G, we can conclude that G is a complete bipartite graph. \(\square \)
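For very small n, Proposition 4.4, including the characterization of the equality case, can be checked exhaustively. The sketch assumes, as in the proof, that "complete bipartite graph" means an irreflexive \(K_{a,b}\) with \(a,b \ge 1\) and \(a+b=n\); the encoding and helper names are our own.

```python
from itertools import combinations, product

ZERO = -1  # encoding of the external zero element 0 (our choice)


def all_graphs(n):
    """Yield every undirected graph on {0, ..., n-1}, loops allowed, as a 0/1 adjacency matrix."""
    pairs = list(combinations(range(n), 2))
    for bits in product((0, 1), repeat=n + len(pairs)):
        adj = [[0] * n for _ in range(n)]
        for v in range(n):
            adj[v][v] = bits[v]
        for (u, v), b in zip(pairs, bits[n:]):
            adj[u][v] = adj[v][u] = b
        yield adj


def ns_brute(adj):
    """Number of triples (a, b, c) over V and 0 with (a o b) o c != a o (b o c)."""
    n = len(adj)

    def op(x, y):
        if x == ZERO or y == ZERO:
            return ZERO
        return x if adj[x][y] else ZERO

    elems = list(range(n)) + [ZERO]
    return sum(op(op(a, b), c) != op(a, op(b, c)) for a, b, c in product(elems, repeat=3))


def is_spanning_complete_bipartite(adj):
    """True iff adj is an irreflexive K_{a,b} with a, b >= 1 on all of its vertices."""
    n = len(adj)
    if any(adj[v][v] for v in range(n)):
        return False
    same = [adj[0][y] == 0 for y in range(n)]  # colour class of vertex 0
    if all(same):
        return False  # vertex 0 has no neighbours
    return all(adj[x][y] == (same[x] != same[y]) for x, y in combinations(range(n), 2))


for n in range(1, 5):
    for adj in all_graphs(n):
        edges = sum(adj[u][v] for u, v in combinations(range(n), 2))
        ns = ns_brute(adj)
        assert ns <= 2 * n * edges
        assert (ns == 2 * n * edges) == (edges == 0 or is_spanning_complete_bipartite(adj))
```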

Lemma 4.5

Let G be a finite undirected graph, let v be a loopless vertex of G, and let \(G^*\) denote the graph obtained from G by adding a loop to v. Then we have

$$\begin{aligned} {\text {ns}}(\mathbb {A}({G^*)}) = {\text {ns}}(\mathbb {A}({G}))- 2 \cdot d(v). \end{aligned}$$

Proof

Let p denote the number of loopless neighbors of v, and let q denote the number of neighbors of v that have a loop. Then we have

$$\begin{aligned} |Ch(G^*)| = |Ch(G)|, \quad |E_0(G^*)| = |E_0(G)| - p, \quad |E_1(G^*)| = |E_1(G)| +p - q. \end{aligned}$$

Now we can compute \({\text {ns}}(\mathbb {A}({G^*)})\) with the help of Proposition 4.1:

$$\begin{aligned} {\text {ns}}(\mathbb {A}({G^*)})&= 4 \cdot |Ch(G)| + 4 \cdot (|E_0(G)|-p) + 2 \cdot (|E_1(G)|+p-q)\\&= {\text {ns}}(\mathbb {A}({G}))-2(p+q)\\&= {\text {ns}}(\mathbb {A}({G}))-2d(v). \end{aligned}$$

\(\square \)
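Lemma 4.5 can be tested by adding a loop to every loopless vertex of every small graph and comparing brute-force values of \({\text {ns}}\). The sketch below is ours; as in the proof, d(v) is the number of neighbors of v different from v, and the zero element is encoded as -1.

```python
from itertools import combinations, product

ZERO = -1  # encoding of the external zero element 0 (our choice)


def all_graphs(n):
    """Yield every undirected graph on {0, ..., n-1}, loops allowed, as a 0/1 adjacency matrix."""
    pairs = list(combinations(range(n), 2))
    for bits in product((0, 1), repeat=n + len(pairs)):
        adj = [[0] * n for _ in range(n)]
        for v in range(n):
            adj[v][v] = bits[v]
        for (u, v), b in zip(pairs, bits[n:]):
            adj[u][v] = adj[v][u] = b
        yield adj


def ns_brute(adj):
    """Number of triples (a, b, c) over V and 0 with (a o b) o c != a o (b o c)."""
    n = len(adj)

    def op(x, y):
        if x == ZERO or y == ZERO:
            return ZERO
        return x if adj[x][y] else ZERO

    elems = list(range(n)) + [ZERO]
    return sum(op(op(a, b), c) != op(a, op(b, c)) for a, b, c in product(elems, repeat=3))


for n in range(1, 5):
    for adj in all_graphs(n):
        base = ns_brute(adj)
        for v in range(n):
            if adj[v][v]:
                continue  # the lemma concerns loopless vertices only
            with_loop = [row[:] for row in adj]
            with_loop[v][v] = 1
            degree = sum(adj[v][u] for u in range(n) if u != v)
            assert ns_brute(with_loop) == base - 2 * degree
```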

Theorem 4.6

For any finite irreflexive undirected graph G on n vertices, the following hold:

  (i) if \(G \cong K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil }\), then \({\text {ns}}(\mathbb {A}({G}))= 2n\big \lfloor n/2 \big \rfloor \big \lceil n/2 \big \rceil = n\big \lfloor n^2/2 \big \rfloor \);

  (ii) if \(G \ncong K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil }\), then \({\text {ns}}(\mathbb {A}({G}))\le 2n\big \lfloor n/2 \big \rfloor \big \lceil n/2 \big \rceil - 4\big \lfloor n/2 \big \rfloor + 4\).

Proof

The proof is quite long and technical, so we present it separately in Appendix A. \(\square \)

Remark 4.7

Let G be the graph obtained from \(K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil }\) by connecting two vertices in the color class of size \(\big \lceil n/2 \big \rceil \) by an edge. Then we have \({\text {ns}}(\mathbb {A}({G}))= 2n\big \lfloor n/2 \big \rfloor \big \lceil n/2 \big \rceil - 4\big \lfloor n/2 \big \rfloor + 4\), and this shows that the estimate in item (ii) of Theorem 4.6 cannot be improved.
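The value claimed in Remark 4.7 can be confirmed by a brute-force count for small n. In this sketch (our own construction and naming), the vertices \(0, \dots , \big \lfloor n/2 \big \rfloor -1\) form the smaller color class, and the extra edge joins the first two vertices of the larger class.

```python
from itertools import product

ZERO = -1  # encoding of the external zero element 0 (our choice)


def ns_brute(adj):
    """Number of triples (a, b, c) over V and 0 with (a o b) o c != a o (b o c)."""
    n = len(adj)

    def op(x, y):
        if x == ZERO or y == ZERO:
            return ZERO
        return x if adj[x][y] else ZERO

    elems = list(range(n)) + [ZERO]
    return sum(op(op(a, b), c) != op(a, op(b, c)) for a, b, c in product(elems, repeat=3))


def remark_4_7_graph(n):
    """K_{f,c} with f = floor(n/2), c = ceil(n/2), plus one edge inside the class of size c."""
    f = n // 2
    adj = [[int((x < f) != (y < f)) for y in range(n)] for x in range(n)]
    adj[f][f + 1] = adj[f + 1][f] = 1  # f and f+1 lie in the class {f, ..., n-1}
    return adj


for n in range(3, 8):
    f, c = n // 2, (n + 1) // 2
    assert ns_brute(remark_4_7_graph(n)) == 2 * n * f * c - 4 * f + 4
```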

Let \({{\,\textrm{range}\,}}_n({\text {ns}})\) denote the range of \({\text {ns}}\) on graph algebras of n-vertex undirected graphs:

$$\begin{aligned} {{\,\textrm{range}\,}}_n({\text {ns}}) = \big \{{{\text {ns}}(\mathbb {A}({G}))} : G \text { is an undirected graph on } n \text { vertices} \big \}. \end{aligned}$$

Corollary 4.8

Let \(n \ge 8\) be an integer.

  (i) If n is even, then the three largest elements of \({{\,\textrm{range}\,}}_n({\text {ns}})\) are

    $$\begin{aligned} \frac{n^3}{2} - (2n-4), \quad \frac{n^3}{2} - n, \quad \frac{n^3}{2}. \end{aligned}$$

  (ii) If n is odd, then the four largest elements of \({{\,\textrm{range}\,}}_n({\text {ns}})\) are

    $$\begin{aligned} \frac{n^3-n}{2} - (2n-6), \quad \frac{n^3-n}{2} - (n+1), \quad \frac{n^3-n}{2} - (n-1), \quad \frac{n^3-n}{2}. \end{aligned}$$

Proof

The corollary can be derived from Theorem 4.6 and Lemma 4.5 as follows. Let G be a graph on n vertices, and let \({\hat{G}}\) denote the graph that we obtain from G by deleting all loops. If \({\hat{G}} \ncong K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil }\), then

$$\begin{aligned} {\text {ns}}(\mathbb {A}({G}))= {\text {ns}}(\mathbb {A}({{\hat{G}})}) - \sum _{v \in L(G)} 2d(v) \le 2n\bigg \lfloor \frac{n}{2}\bigg \rfloor \bigg \lceil \frac{n}{2}\bigg \rceil - 4\bigg \lfloor \frac{n}{2}\bigg \rfloor + 4, \end{aligned}$$

and we can have equality here, for example, if G is the graph described in Remark 4.7. If \({\hat{G}} \cong K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil }\), then

$$\begin{aligned} {\text {ns}}(\mathbb {A}({G}))= {\text {ns}}(\mathbb {A}({K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil })}) - \sum _{v \in L(G)} 2d(v) = 2n\bigg \lfloor \frac{n}{2}\bigg \rfloor \bigg \lceil \frac{n}{2}\bigg \rceil - \sum _{v \in L(G)} 2d(v). \end{aligned}$$

This number can be larger than \(2n\big \lfloor n/2 \big \rfloor \big \lceil n/2 \big \rceil - 4\big \lfloor n/2 \big \rfloor + 4\) only if G has at most one loop, and then the subtrahend \(\sum _{v \in L(G)} 2d(v)\) is either zero, \(2\big \lfloor n/2 \big \rfloor \) or \(2\big \lceil n/2 \big \rceil \). \(\square \)
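The closed forms in Corollary 4.8 follow from the four candidate values in the proof by elementary arithmetic; the short Python check below (ours) recomputes them for \(8 \le n \le 40\). It uses that a vertex in the class of size \(\big \lceil n/2 \big \rceil \) has degree \(\big \lfloor n/2 \big \rfloor \), and vice versa.

```python
def top_candidates(n):
    """The candidate values from the proof of Corollary 4.8 (our helper name)."""
    f, c = n // 2, (n + 1) // 2
    top = 2 * n * f * c  # ns of K_{f,c}
    return {
        top,              # K_{f,c} itself
        top - 2 * f,      # one loop on a vertex of degree f (class of size c)
        top - 2 * c,      # one loop on a vertex of degree c (class of size f)
        top - 4 * f + 4,  # bound of Theorem 4.6 (ii), attained by the graph of Remark 4.7
    }


for n in range(8, 41):
    values = sorted(top_candidates(n))
    if n % 2 == 0:
        assert values == [n**3 // 2 - (2 * n - 4), n**3 // 2 - n, n**3 // 2]
    else:
        m = (n**3 - n) // 2
        assert values == [m - (2 * n - 6), m - (n + 1), m - (n - 1), m]
```

For even n the two one-loop values coincide, which is why only three values appear in item (i).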

Remark 4.9

Corollary 4.8 is valid also for \(n < 8\), in the sense that the largest elements of \({{\,\textrm{range}\,}}_n({\text {ns}})\) are

  • \(2n\big \lfloor \frac{n}{2}\big \rfloor \big \lceil \frac{n}{2}\big \rceil - 4\big \lfloor \frac{n}{2}\big \rfloor + 4\),

  • \(2n\big \lfloor \frac{n}{2}\big \rfloor \big \lceil \frac{n}{2}\big \rceil -2\big \lceil \frac{n}{2}\big \rceil \),

  • \(2n\big \lfloor \frac{n}{2}\big \rfloor \big \lceil \frac{n}{2}\big \rceil -2\big \lfloor \frac{n}{2}\big \rfloor \),

  • \(2n\big \lfloor \frac{n}{2}\big \rfloor \big \lceil \frac{n}{2}\big \rceil \),

but some of these numbers might coincide (even for odd n), and their order might be different.
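For \(n \le 5\), the whole of \({{\,\textrm{range}\,}}_n({\text {ns}})\) can be enumerated, and one can check that its largest elements are exactly the values listed above. The sketch below is our own; it evaluates \({\text {ns}}\) through the formula of Proposition 4.1 rather than by brute force, and it loops over edge sets and loop sets separately because the number of cherries does not depend on the loops.

```python
from itertools import combinations


def ns_range(n):
    """Set of all values ns(A(G)) for undirected graphs G on n vertices (loops allowed)."""
    pairs = list(combinations(range(n), 2))
    index = {p: i for i, p in enumerate(pairs)}
    triples = [(index[(a, b)], index[(a, c)], index[(b, c)]) for a, b, c in combinations(range(n), 3)]
    values = set()
    for edge_mask in range(1 << len(pairs)):
        edge = [(edge_mask >> i) & 1 for i in range(len(pairs))]
        # cherries: 3-sets of vertices spanning exactly two edges
        cherries = sum(edge[i] + edge[j] + edge[k] == 2 for i, j, k in triples)
        for loop_mask in range(1 << n):
            e0 = e1 = 0
            for i, (u, v) in enumerate(pairs):
                if edge[i]:
                    looped = ((loop_mask >> u) & 1) + ((loop_mask >> v) & 1)
                    e0 += looped == 0
                    e1 += looped == 1
            values.add(4 * cherries + 4 * e0 + 2 * e1)
    return values


def remark_candidates(n):
    """The four values listed in Remark 4.9 (possibly with repetitions removed)."""
    f, c = n // 2, (n + 1) // 2
    top = 2 * n * f * c
    return {top - 4 * f + 4, top - 2 * c, top - 2 * f, top}


for n in range(2, 6):
    rng = sorted(ns_range(n))
    cand = sorted(remark_candidates(n))
    assert rng[-len(cand):] == cand
```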

We have seen that there are some gaps close to the top of \({{\,\textrm{range}\,}}_n({\text {ns}})\); however, the bottom of \({{\,\textrm{range}\,}}_n({\text {ns}})\) contains a long sequence of consecutive even numbers (asymptotically as long as the whole range).

Theorem 4.10

Let \(r_n\) be the greatest even integer such that all even numbers up to \(r_n\) belong to \({{\,\textrm{range}\,}}_n({\text {ns}})\). Then we have

$$\begin{aligned} \lim _{n \rightarrow \infty }\frac{r_n}{n^3}=\frac{1}{2}. \end{aligned}$$

Proof

The proof is quite long and technical, so we present it separately in Appendix B. \(\square \)

It is easy to determine graphs with the least possible nonzero index of nonassociativity (we will do this in Section 5, but we invite the reader to do this on their own now), and most of these graphs are not connected. Connected graphs are much more interesting in this respect, so we conclude this section with a characterization of the “most associative” connected graphs.

Let denote the graph that is constructed by connecting a vertex of \(K_a\) and a vertex of \(K_b\) by a path of length \(\ell \). Here, by the length of a path we mean the number of edges in the path, i.e., has \(a+b+\ell -1\) vertices. As an illustration, let us draw and (vertices of the connecting paths are colored gray):

figure b

Clearly, we have ; furthermore, the graphs , and are isomorphic, as all of them are paths of length \(\ell +2\). Therefore, in the following we will always assume that \(a \ge 2\) and \(b \ge 2\) when we consider the graphs . This way the path of length 2 cannot be written in the form , but it is included in the second item of the proposition below as .

Proposition 4.11

For any finite connected irreflexive undirected graph G on \(n \ge 3\) vertices, the following hold:

  (i) if \(G \cong K_n\), then \(|Ch(G)| = 0\);

  (ii) if , then \(|Ch(G)| = n-2\);

  (iii) if for some \(r,s \ge 2\), \(\ell \ge 1\) with \(r+s+\ell -1=n\), then \(|Ch(G)| = n-2\);

  (iv) if G is not isomorphic to any of the above mentioned graphs, then \(|Ch(G)| > n-2\).
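Items (i) to (iii) are easy to confirm numerically. The sketch below is ours: we read the graph in item (ii) as \(K_n\) with one edge removed (our interpretation, consistent with the proof below), and we build the graphs of item (iii) with a hypothetical helper clique_path_clique(r, ell, s) that joins a vertex of \(K_r\) to a vertex of \(K_s\) by a path with \(\ell \) edges. Graphs are stored as sets of two-element frozensets.

```python
from itertools import combinations


def cherries(n, edges):
    """Number of 3-sets of vertices spanning exactly two edges."""
    return sum(
        sum(frozenset(p) in edges for p in combinations(t, 2)) == 2
        for t in combinations(range(n), 3)
    )


def complete(n):
    return {frozenset(p) for p in combinations(range(n), 2)}


def clique_path_clique(r, ell, s):
    """Hypothetical helper: join vertex 0 of K_r to the first vertex of K_s by a path with ell edges."""
    n = r + s + ell - 1
    first_s = r + ell - 1                               # vertices first_s, ..., n-1 form K_s
    path = [0] + list(range(r, first_s)) + [first_s]    # ell - 1 internal path vertices
    edges = {frozenset(p) for p in combinations(range(r), 2)}
    edges |= {frozenset(p) for p in combinations(range(first_s, n), 2)}
    edges |= {frozenset(p) for p in zip(path, path[1:])}
    return n, edges


for n in range(3, 8):
    assert cherries(n, complete(n)) == 0                               # item (i)
    assert cherries(n, complete(n) - {frozenset((0, 1))}) == n - 2     # item (ii), our reading

for r in range(2, 5):
    for s in range(2, 5):
        for ell in range(1, 4):
            n, edges = clique_path_clique(r, ell, s)
            assert cherries(n, edges) == n - 2                         # item (iii)
```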

Proof

The first three statements of the proposition are straightforward to verify. We prove (iv) by induction on n. The case \(n=3\) is void, and for \(n=4\) there are only two graphs (up to isomorphism) that are relevant to (iv), and both of them indeed have more than \(n-2=2\) cherries:

figure c

From now on we consider the case \(n \ge 5\). If every vertex of G has degree 1 or \(n-1\), then G is isomorphic either to the complete graph \(K_n\) or to the star \(K_{1,n-1}\). In the latter case \(|Ch(G)| = {n-1 \atopwithdelims ()2} > n-2\), since \(n \ge 5\).

Therefore, we may assume that there is a vertex x with \(2 \le d(x) \le n-2\). Setting \(A = N(x)\) and \(B = V(G) {\setminus } (A \cup \{x\})\), we have \(|A| = d(x) \ge 2\) and \(|B|=n-1-|A| \ge 1\). There are two types of cherries in G that contain the vertex x:

figure d

For type (a) we need that \(uv \notin E(G)\), thus the number of cherries of this type is the number of non-edges in the induced subgraph \(G\vert _A\); we will denote this number by \(n_A = {|A| \atopwithdelims ()2} - |E(G\vert _A)|\). Since \(xw \notin E(G)\) for all \(w \in B\) (by the very definition of the set B), the number of cherries of type (b) equals the number of edges uw with \(u \in A\) and \(w \in B\), which we will denote by \(e_{AB}\). Thus we have the following relationship between the number of cherries in G and in \(G{\setminus }\{x\}\):

$$\begin{aligned} |Ch(G)| = |Ch(G{\setminus }\{x\})| + n_A + e_{AB}. \end{aligned}$$
(4.2)

It is possible that \(n_A=0\), but the connectedness of G guarantees that \(e_{AB} \ge 1\) (recall that B is not empty, as \(d(x) \le n-2\)). We discuss four cases for \(G{\setminus }\{x\}\) corresponding to (i), (ii), (iii) and (iv).

If \(G{\setminus }\{x\} \cong K_{n-1}\), then \(|Ch(G{\setminus }\{x\})|=0\), \(n_A=0\) and \(e_{AB}=|A|\cdot |B|\). The case \(|A|=n-2\) is not possible, since then G would be isomorphic to . Therefore, \(2 \le |A| \le n-3\), which together with \(|A| + |B| = n-1\) implies \(|A| \cdot |B| \ge 2 \cdot (n-3) > n-2\). Now (4.2) gives \(|Ch(G)| = 0 + 0 + |A|\cdot |B| > n-2\), and this is what we had to prove.

If , then \(|Ch(G{\setminus }\{x\})|=n-3\), \(n_A \in \{0,1\}\) and \(e_{AB} \ge 2\). (The last inequality follows from the fact that for \(n \ge 5\) it is not possible to divide the vertices of into two parts in such a way that only one edge goes across the two parts.) By (4.2), we have \(|Ch(G)| \ge (n-3) + 0 + 2 > n-2\), as required.

If , then \(|Ch(G{\setminus }\{x\})|=n-3\), \(n_A \ge 0\) and \(e_{AB} \ge 1\), hence (4.2) yields \(|Ch(G)| \ge (n-3) + 0 + 1 = n-2\). We have equality here if and only if \(n_A=0\) and \(e_{AB}=1\). This means that A induces a complete subgraph of \(G{\setminus }\{x\}\) that can be separated from the rest of \(G{\setminus }\{x\}\) by the removal of a single edge, which is possible only if A corresponds to \(K_r\) or \(K_s\) under the isomorphism . Since G is obtained from \(G{\setminus }\{x\}\) by connecting the new vertex x to each element of A, either or , contradicting the assumption of (iv). This proves that \(|Ch(G)| > n-2\).

In the remaining cases the induction hypothesis gives that \(|Ch(G{\setminus }\{x\})|>n-3\), hence \(|Ch(G)| > (n-3) + 0 + 1 = n-2\) follows by (4.2), as \(n_A \ge 0\) and \(e_{AB} \ge 1\). \(\square \)
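The counting identity (4.2) does not depend on the degree bounds imposed on x, so it can be checked exhaustively for every vertex of every small irreflexive graph. In the sketch below (ours), A is the neighborhood of x, B is the set of remaining vertices other than x, and graphs are stored as sets of two-element frozensets.

```python
from itertools import combinations, product


def irreflexive_graphs(n):
    """Yield the edge sets of all loopless undirected graphs on {0, ..., n-1}."""
    pairs = list(combinations(range(n), 2))
    for bits in product((0, 1), repeat=len(pairs)):
        yield {frozenset(p) for p, b in zip(pairs, bits) if b}


def cherries(vertices, edges):
    """Number of 3-subsets of `vertices` spanning exactly two edges."""
    return sum(
        sum(frozenset(p) in edges for p in combinations(t, 2)) == 2
        for t in combinations(vertices, 3)
    )


def identity_4_2_holds(n, edges, x):
    """Compare |Ch(G)| with |Ch(G - x)| + n_A + e_AB as in (4.2)."""
    vertices = range(n)
    A = [v for v in vertices if frozenset((x, v)) in edges]           # N(x)
    B = [v for v in vertices if v != x and v not in A]
    n_A = sum(frozenset(p) not in edges for p in combinations(A, 2))  # non-edges inside A
    e_AB = sum(frozenset((u, w)) in edges for u in A for w in B)      # edges between A and B
    rest = [v for v in vertices if v != x]
    return cherries(vertices, edges) == cherries(rest, edges) + n_A + e_AB


for n in range(3, 6):
    for edges in irreflexive_graphs(n):
        assert all(identity_4_2_holds(n, edges, x) for x in range(n))
```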

Theorem 4.12

For any finite connected undirected graph G on \(n \ge 3\) vertices, the following hold:

  (i/a) if , then \({\text {ns}}(\mathbb {A}({G}))= 0\);

  (i/b) if , then \({\text {ns}}(\mathbb {A}({G}))= 2(n-1)\);

  (ii) if , then \({\text {ns}}(\mathbb {A}({G}))= 4(n-2)\);

  (iii) if for some \(r,s \ge 2\), \(\ell \ge 1\) with \(r+s+\ell -1=n\), then \({\text {ns}}(\mathbb {A}({G}))= 4(n-2)\);

  (iv) if G is not isomorphic to any of the above mentioned graphs, then \({\text {ns}}(\mathbb {A}({G}))> 4(n-2)\).

Proof

The first four statements follow from Propositions 4.1 and 4.11. In order to prove (iv), consider the graph \({\hat{G}}\) that is obtained from G by removing all loops.

If \({\hat{G}} \cong K_n\), then Proposition 4.1 shows that \({\text {ns}}(\mathbb {A}({G}))= 2(n-1) \cdot (n-|L(G)|)\). Here we must have \(|L(G)| \le n-2\) (otherwise we are in case (i/a) or (i/b)), hence \({\text {ns}}(\mathbb {A}({G}))\ge 2(n-1) \cdot 2 > 4(n-2)\).

If or , then G must have at least one loopless vertex (otherwise we are in case (ii) or (iii)); therefore, \({\text {ns}}(\mathbb {A}({G}))> 4 \cdot |Ch(G)|\) according to Proposition 4.1, and \(|Ch(G)| = n-2\) by Proposition 4.11. This proves that \({\text {ns}}(\mathbb {A}({G}))> 4(n-2)\).

If \({\hat{G}}\) is not isomorphic to any of the above mentioned graphs, then Propositions 4.1 and 4.11 (the latter applied to \({\hat{G}}\), which has the same cherries as G) give \({\text {ns}}(\mathbb {A}({G}))\ge 4 \cdot |Ch(G)| > 4(n-2)\). \(\square \)
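The values in items (i/a) to (iii), together with the formula \({\text {ns}}(\mathbb {A}({G})) = 2(n-1) \cdot (n-|L(G)|)\) used when \({\hat{G}} \cong K_n\), can be checked by brute force. The sketch below is ours and rests on an interpretation of the graphs in the statement: we read (i/a), (i/b), (ii) and (iii) as the reflexive complete graph, the complete graph with loops on all but one vertex, the reflexive \(K_n\) minus one edge, and the reflexive versions of the graphs in Proposition 4.11 (iii), respectively. The helper clique_path_clique is hypothetical.

```python
from itertools import combinations, product

ZERO = -1  # encoding of the external zero element 0 (our choice)


def ns_brute(adj):
    """Number of triples (a, b, c) over V and 0 with (a o b) o c != a o (b o c)."""
    n = len(adj)

    def op(x, y):
        if x == ZERO or y == ZERO:
            return ZERO
        return x if adj[x][y] else ZERO

    elems = list(range(n)) + [ZERO]
    return sum(op(op(a, b), c) != op(a, op(b, c)) for a, b, c in product(elems, repeat=3))


def from_edges(n, edges, loops):
    """0/1 adjacency matrix with the given edges (pairs) and looped vertices."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    for v in loops:
        adj[v][v] = 1
    return adj


def clique_path_clique(r, ell, s):
    """Hypothetical helper: a vertex of K_r joined to a vertex of K_s by a path with ell edges."""
    n = r + s + ell - 1
    first_s = r + ell - 1
    path = [0] + list(range(r, first_s)) + [first_s]
    edges = list(combinations(range(r), 2)) + list(combinations(range(first_s, n), 2))
    edges += list(zip(path, path[1:]))
    return n, edges


for n in range(3, 7):
    kn = list(combinations(range(n), 2))  # kn[0] is the edge (0, 1)
    assert ns_brute(from_edges(n, kn, range(n))) == 0                  # (i/a)
    assert ns_brute(from_edges(n, kn, range(1, n))) == 2 * (n - 1)     # (i/b)
    assert ns_brute(from_edges(n, kn[1:], range(n))) == 4 * (n - 2)    # (ii)
    for looped in range(n + 1):  # K_n with loops on `looped` vertices
        assert ns_brute(from_edges(n, kn, range(looped))) == 2 * (n - 1) * (n - looped)

for r, ell, s in [(2, 1, 2), (3, 1, 2), (2, 2, 3), (3, 2, 3)]:
    n, edges = clique_path_clique(r, ell, s)
    assert ns_brute(from_edges(n, edges, range(n))) == 4 * (n - 2)     # (iii)
```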

5 Conclusion

In this section we compare the extremal cases for the three measures of associativity (\({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\), \({\text {ns}}(\mathbb {A}({G}))\) and \({{\,\textrm{spec}\,}}(\mathbb {A}({G}))\)) for undirected graphs. Before doing so, let us complement the results of the previous two sections by the following description of associative spectra of graph algebras of undirected graphs (the first item in the theorem is just a repetition of Corollary 2.13).

Theorem 5.1

[18]. Let G be an undirected graph.

  (i) If every nontrivial connected component of G is a reflexive complete graph, then \(s_n(\mathbb {A}({G})) = 1\) for all \(n \in \mathbb {N}\).

  (ii) If every nontrivial connected component of G is either a reflexive complete graph or a complete bipartite graph, and the last case occurs at least once, then \(s_n(\mathbb {A}({G})) = 2^{n-2}\) for all \(n \ge 2\).

  (iii) Otherwise \(s_n(\mathbb {A}({G})) = C_{n-1}\) for all \(n \in \mathbb {N}\).

By Theorems 3.6 and 4.6, the only antiassociative graph algebra on n vertices is \(\mathbb {A}({K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil }})\), if we measure associativity by \({\text {sdist}}_\textrm{gr}\) or by \({\text {ns}}\). However, \(\mathbb {A}({K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil }})\) is not antiassociative with respect to the associative spectrum (in fact, it is almost associative!), according to the theorem above. On the other hand, most graph algebras are antiassociative in the “spectral” sense by Theorem 5.1, and the other two measures of associativity can be very small for these graphs (see below).

The least nonzero value of \({\text {sdist}}_\textrm{gr}\) is 1, and we have \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))=1\) if and only if G is obtained from a graph \(H \in {\text {AssGr}}(V(G))\) by removing one loop, but G itself does not belong to \({\text {AssGr}}(V(G))\). By Corollary 2.13, this means that one component of G is isomorphic to for some \(r \ge 2\), and all other nontrivial components are reflexive complete graphs. For such a graph, we have \({\text {ns}}(\mathbb {A}({G}))= 2(r-1)\) by Proposition 4.1. Therefore, \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))=1\) implies \({\text {ns}}(\mathbb {A}({G}))\le 2(n-1)\) for graphs G with n vertices. (This explains the \((n-1)\) dots in the second column in Tables 1, 2 and 3.) Theorem 5.1 shows that these graphs have a Catalan spectrum.

Since the index of nonassociativity of a graph algebra is always an even number, the least possible nonzero value is 2, and, by Proposition 4.1, we have \({\text {ns}}(\mathbb {A}({G}))= 2\) if and only if \(Ch(G)= E_0(G)= \emptyset \) and \(|E_1(G)|=1\). It is straightforward to verify that this happens if and only if one component of G is isomorphic to (an edge with a loop at one endpoint) and all other nontrivial components are reflexive complete graphs. If G is such a graph, then \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))=1\), and G has a Catalan spectrum. (This explains why we only have one dot in the second row in Tables 1, 2 and 3.)

We see that if \(\mathbb {A}({G})\) is almost associative with respect to the index of nonassociativity (i.e., \({\text {ns}}(\mathbb {A}({G}))=2\)), then it is almost associative also with respect to \({\text {sdist}}_\textrm{gr}\) (i.e., \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))=1\)), but the converse implication is not true. However, if we restrict our attention to connected graphs with at least four vertices, then these two notions of almost associativity actually coincide. Indeed, for connected graphs with n vertices, the least nonzero value of \({\text {ns}}(\mathbb {A}({G}))\) is \(2(n-1)\), and this value is attained only for if \(n \ge 4\) (see Theorem 4.12). On the other hand, it follows from the discussion above that if G is connected, then \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))=1\) likewise holds only if .

We can conclude that for undirected graphs, \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\) and \({\text {ns}}(\mathbb {A}({G}))\) match nicely (at least as far as antiassociativity and almost associativity are concerned), but the associative spectrum is quite unrelated to them. The following theorem, which is an analogue of Theorem 2.9, and Conjecture 5.3 below also show connections between \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\) and \({\text {ns}}(\mathbb {A}({G}))\).

Theorem 5.2

For every undirected graph G with n vertices, we have

$$\begin{aligned} {\text {ns}}(\mathbb {A}({G}))\le 2(n-1) \cdot {\text {sdist}}_\textrm{gr}(\mathbb {A}({G})). \end{aligned}$$

Proof

Let \(H \in {\text {AssGr}}(V(G))\) such that \({\text {dist}}(\mathbb {A}({G}),\mathbb {A}({H}))= {\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\). By (2.1), we have \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))= 2k + \ell \), where \(k = |E(G) \bigtriangleup E(H)|\) and \(\ell = |L(G)\bigtriangleup L(H)|\). We introduce some more notation to facilitate the proof: let \(e_i = |E_i(G)|\) and \(e'_i = |E_i(G) \cap E(H)|\) for \(i=0,1\). We claim that

$$\begin{aligned} 2e_0 + e_1 \le 2k + 2e'_0 + e'_1. \end{aligned}$$
(5.1)

Indeed, by the definition of k, we have

$$\begin{aligned} 2k \ge 2 \cdot |E(G) {\setminus } E(H)|&\ge 2 \cdot |E_0(G){\setminus } E(H)| + 2 \cdot |E_1(G){\setminus } E(H)|\\&= 2(e_0-e'_0) + 2(e_1-e'_1)\\&\ge 2(e_0-e'_0) + (e_1-e'_1), \end{aligned}$$

and this is equivalent to (5.1).

Let us consider the following sets for \(i=0,1\):

$$\begin{aligned} \Theta _i = \big \{ (xy,x) : xy \in E_i(G) \cap E(H) \text { and } x \in L(G)\bigtriangleup L(H) \big \}. \end{aligned}$$

If \(xy \in E_0(G)\cap E(H)\), then xy is an edge of H, hence, by Corollary 2.13, both x and y must have a loop in H. Since these two vertices do not have a loop in G, both \((xy,x)\) and \((xy,y)\) belong to \(\Theta _0\); consequently, \(|\Theta _0| = 2 e'_0\). Similarly, if \(xy \in E_1(G)\cap E(H)\), then exactly one of \((xy,x)\) and \((xy,y)\) belongs to \(\Theta _1\), hence \(|\Theta _1| = e'_1\). On the other hand, for each \(x \in L(G)\bigtriangleup L(H)\), there are at most \(n-1\) vertices y such that \((xy,x) \in \Theta _0 \cup \Theta _1\), thus \(|\Theta _0 \cup \Theta _1| \le (n-1) \cdot \ell \). This implies that

$$\begin{aligned} 2e'_0 + e'_1 = |\Theta _0| + |\Theta _1| = |\Theta _0 \cup \Theta _1| \le (n-1) \cdot \ell \end{aligned}$$

(note that \(\Theta _0\) and \(\Theta _1\) are disjoint). Comparing this with (5.1), we obtain that

$$\begin{aligned} 2e_0 + e_1 \le 2k + (n-1) \cdot \ell . \end{aligned}$$
(5.2)

Next we consider the following set:

$$\begin{aligned} \Xi = \big \{ (xyz,xy) : xyz \in Ch(G)\text { and } xy \in E(G) \bigtriangleup E(H) \big \}. \end{aligned}$$

Since H contains no cherries by Corollary 2.13, all cherries of G must be “destroyed”, i.e., each cherry of G appears at least once as the first component of an element of \(\Xi \), thus \(|\Xi | \ge |Ch(G)|\). On the other hand, for each \(xy \in E(G) \bigtriangleup E(H)\), there are at most \(n-2\) vertices z such that \((xyz,xy) \in \Xi \), hence \(|\Xi | \le (n-2) \cdot k\). This implies that

$$\begin{aligned} |Ch(G)| \le (n-2) \cdot k. \end{aligned}$$
(5.3)

Now we can prove the desired inequality with the help of (5.2), (5.3) and Proposition 4.1:

$$\begin{aligned} {\text {ns}}(\mathbb {A}({G}))&= 4 \cdot |Ch(G)| + 4e_0 + 2e_1\\&\le 4(n-2) \cdot k + 4k + 2(n-1) \cdot \ell \\&= 2(n-1) \cdot (2k + \ell )\\&= 2(n-1) \cdot {\text {sdist}}_\textrm{gr}(\mathbb {A}({G})). \end{aligned}$$

\(\square \)
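For \(n \le 4\), Theorem 5.2 and the sharpness example mentioned in Remark 5.4 can be confirmed by exhaustive search. This sketch is our own; it assumes that \({\text {AssGr}}(V)\) is the set of undirected graphs on V (loops allowed) whose graph algebra is associative, stores graphs as 0/1 adjacency matrices, and encodes the zero element as -1.

```python
from itertools import combinations, product

ZERO = -1  # encoding of the external zero element 0 (our choice)


def all_graphs(n):
    """Yield every undirected graph on {0, ..., n-1}, loops allowed, as a 0/1 adjacency matrix."""
    pairs = list(combinations(range(n), 2))
    for bits in product((0, 1), repeat=n + len(pairs)):
        adj = [[0] * n for _ in range(n)]
        for v in range(n):
            adj[v][v] = bits[v]
        for (u, v), b in zip(pairs, bits[n:]):
            adj[u][v] = adj[v][u] = b
        yield adj


def ns_brute(adj):
    """Number of triples (a, b, c) over V and 0 with (a o b) o c != a o (b o c)."""
    n = len(adj)

    def op(x, y):
        if x == ZERO or y == ZERO:
            return ZERO
        return x if adj[x][y] else ZERO

    elems = list(range(n)) + [ZERO]
    return sum(op(op(a, b), c) != op(a, op(b, c)) for a, b, c in product(elems, repeat=3))


def dist(g, h):
    """Cayley-table distance of two graph algebras on the same vertex set."""
    n = len(g)
    return sum(g[x][y] != h[x][y] for x in range(n) for y in range(n))


def theorem_5_2_holds(n):
    """Check ns(A(G)) <= 2(n-1) * sdist_gr(A(G)) for every undirected graph G on n vertices."""
    graphs = list(all_graphs(n))
    ns_values = [ns_brute(g) for g in graphs]
    assoc = [h for h, value in zip(graphs, ns_values) if value == 0]  # our reading of AssGr(V)
    return all(
        value <= 2 * (n - 1) * min(dist(g, h) for h in assoc)
        for g, value in zip(graphs, ns_values)
    )


def remark_5_4_example(n):
    """(ns, sdist_gr) for the reflexive K_n with the loop at vertex 0 removed."""
    g = [[1] * n for _ in range(n)]
    g[0][0] = 0
    assoc = [h for h in all_graphs(n) if ns_brute(h) == 0]
    return ns_brute(g), min(dist(g, h) for h in assoc)


for n in range(1, 5):
    assert theorem_5_2_holds(n)
for n in range(2, 5):
    assert remark_5_4_example(n) == (2 * (n - 1), 1)
```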

Conjecture 5.3

For every undirected graph G with n vertices, we have

$$\begin{aligned} {\text {ns}}(\mathbb {A}({G}))\ge 2 \cdot {\text {sdist}}_\textrm{gr}(\mathbb {A}({G})). \end{aligned}$$

Remark 5.4

As we have already mentioned, we have \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))=1\) and \({\text {ns}}(\mathbb {A}({G}))=2(n-1)\) for . This shows that the estimate in Theorem 5.2 cannot be improved. Similarly, if G is a disjoint union of r copies of , then \({\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))=r\) and \({\text {ns}}(\mathbb {A}({G}))=2r\), hence Conjecture 5.3 is also sharp (if it is true at all). The conjecture is supported by the data in Tables 1, 2 and 3, which contain no dots above the main diagonal.

Finally, let us list some topics for further research:

  1. Prove or disprove Conjecture 5.3.

  2. Complete the description of the set \({{\,\textrm{range}\,}}_n({\text {ns}})\) by determining all gaps in the range.

  3. In Section 3 we considered \({\text {sdist}}_\textrm{gr}\) instead of the “real” semigroup distance. Determine the range of \({\text {sdist}}(\mathbb {A}({G}))\) for graph algebras of n-vertex undirected graphs and characterize graphs corresponding to the extremal cases. Let us mention that \({\text {sdist}}(\mathbb {A}({G})) < {\text {sdist}}_\textrm{gr}(\mathbb {A}({G}))\) holds for \(G=K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil }\) if \(n \ge 3\). Indeed, let H be the directed graph that we obtain from \(K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil }\) by making all edges one-way, pointing to the color class of size \(\big \lfloor n/2 \big \rfloor \), and adding a loop to each vertex of this color class. By Theorem 2.12, \(\mathbb {A}({H})\) is associative, and we have \({\text {dist}}(\mathbb {A}(K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil }),\mathbb {A}(H)) = \big \lfloor n/2 \big \rfloor \cdot \big \lceil n/2 \big \rceil + \big \lfloor n/2 \big \rfloor \), which is less than \({\text {sdist}}_\textrm{gr}(\mathbb {A}(K_{\big \lfloor n/2 \big \rfloor ,\big \lceil n/2 \big \rceil })) = 2 \cdot \big \lfloor n/2 \big \rfloor \cdot \big \lceil n/2 \big \rceil \) for \(n \ge 3\).

  4. Investigate measures of (non)associativity for graph algebras of directed graphs (see [19] for a study of associative spectra of these graph algebras).