Generalizing p-Laplacian: spectral hypergraph theory and a partitioning algorithm

For hypergraph clustering, various methods have been proposed to define hypergraph p-Laplacians in the literature. This work proposes a general framework for an abstract class of hypergraph p-Laplacians from a differential-geometric view. This class includes previously proposed hypergraph p-Laplacians and also includes previously unstudied novel generalizations. For this abstract class, we extend current spectral theory by providing an extension of nodal domain theory for the eigenvectors of our hypergraph p-Laplacian. We use this nodal domain theory to provide bounds on the eigenvalues via a higher-order Cheeger inequality. Following our extension of spectral theory, we propose a novel hypergraph partitioning algorithm for our generalized p-Laplacian. Our empirical study shows that our algorithm outperforms spectral methods based on existing p-Laplacians.


Introduction
Graphs are one of the most widely used data representations for structured data, such as social networks (Newman, 2006), web pages (Brin and Page, 1998) and images (Shi and Malik, 1997). The graph Laplacian is a linear operator that characterizes the graph. A natural discrete optimization problem whose solution characterizes a balanced clustering is solved in its relaxed form by finding the second eigenvector of the graph Laplacian (Fiedler, 1973; Alon and Milman, 1985; Hagen and Kahng, 1992; Shi and Malik, 1997; von Luxburg, 2007). The Laplacian has previously been generalized to a nonlinear p-Laplacian in the context of machine learning, and improved performance as a function of p has been demonstrated in various special cases (Bühler and Hein, 2009; Bougleux et al., 2009). The p-Laplacian has been motivated from the perspective of differential geometry (Zhou and Schölkopf, 2005), as well as from a Cheeger inequality perspective (Bühler and Hein, 2009).
Hypergraphs generalize graphs and serve as a natural representation of multi-relational data. The hypergraph representation has been used to model videos (Huang et al., 2009), web browsing histories (Mobasher et al., 2000), recommender engines (Gharahighehi et al., 2021) and cell molecules (Klamt et al., 2009). However, although a hypergraph is a natural data representation, the generalization from the graph Laplacian to a hypergraph Laplacian is not straightforward, and multiple such generalizations have been proposed in the literature (Agarwal et al., 2006; Saito et al., 2018; Hein et al., 2013; Li and Milenkovic, 2018). Furthermore, the extension from Laplacian to p-Laplacian may take different forms (Li and Milenkovic, 2018). As shown in Table 1, although these Laplacians have a similar structure, some of them lack key features. The objective of this work is to construct a theoretical structure that brings these similar but disparate models into one unified framework.
In our unified framework, we define an abstract class of hypergraph p-Laplacians that incorporates a number of previously proposed hypergraph p-Laplacians as well as previously unstudied novel hypergraph p-Laplacians. This framework builds on a limited special case previously proposed in Saito et al. (2018). The overall framework is inspired by a differential-geometric analogy from the continuous to the discrete domain. Exploiting the differential-geometric connection, we provide a generalized nodal domain theorem (see Theorem 9) and a generalized Cheeger inequality (see Theorem 10 and Corollary 11). These provide a theoretical justification and bounds for using the eigenvectors of a hypergraph p-Laplacian to perform partitioning. Exploiting these theoretical results, we provide an algorithm for finding an approximation to the second eigenvector. We provide an empirical study of this algorithm which shows that our algorithm outperforms a variety of existing hypergraph p-Laplacian based methods.

Table 1 (caption) The star Laplacian is due to Zhou et al. (2006); the unnormalized clique is studied in Rodriguez (2002) and the edge-normalized clique in Saito et al. (2018). The tv was first proposed in Hein et al. (2013) and generalized to submodular hypergraphs (Li and Milenkovic, 2018). Ours incorporates all of these hypergraph Laplacians. All of these Laplacians have a similar structure, stated as $\langle \psi, \Delta_p \psi \rangle = S_p(\psi)$ (Proposition 4; for star, $p = 2$, since the p-Laplacian in star form has not been studied), which gives a foundation for hypergraph cut partitioning algorithms. However, some Laplacians lack some key characteristics.

We highlight five salient contributions of this work.
1. From a differential-geometric perspective, we define an abstract class of p-Laplacians of hypergraphs that can incorporate previously proposed p-Laplacians as well as novel unstudied p-Laplacians.
2. We provide theoretical results for our abstract class of p-Laplacians, such as the nodal domain theorem, the Cheeger inequality, and a bound on the relationship between the minimum Cheeger cut and the second p-eigenvalue of the p-Laplacian.
3. Exploiting these theoretical results, we propose a convergent hypergraph partitioning algorithm with respect to our abstract class of hypergraph p-Laplacians.
4. We demonstrate empirically that our method can improve the performance of the existing p-Laplacians.
5. Based on our theoretical and empirical observations, we provide guidance on the choice of p-Laplacian.
We remark that in the literature, there are a number of special case results (Zhou et al., 2006;Agarwal et al., 2006;Saito et al., 2018;Hein et al., 2013). These prior results derive a patchwork of nodal domain theorems, Cheeger inequalities, as well as partitioning algorithms for some particular cases of hypergraph p-Laplacians, as shown in Table 1.
The advantage of the approach here is that we define an abstract class of hypergraph p-Laplacians, and both our theory and our partitioning algorithm apply to the complete class. Finally, we provide guidance on how to select a particular value of p for hypergraph p-Laplacians. All proofs are in appendix sections.

Preliminaries
This section reviews the notions of hypergraph and Cheeger inequality.

Hypergraph notions
We begin with standard definitions and notation for hypergraphs. A hypergraph is a tuple $G = (V, E, \mathbf{w}, \boldsymbol{\mu})$, where $V$ is a vertex set, $E$ is an edge set, $\mathbf{w}$ is a vector $\{w(e)\}_{e \in E}$ where $w : E \to \mathbb{R}_+$ assigns a weight to each edge, and $\boldsymbol{\mu}$ is a vector $\{\mu(v)\}_{v \in V}$ where $\mu : V \to \mathbb{R}_+$ assigns a weight to each vertex. The edge set $E$ is a subset of all possible orderings of vertex subsets, i.e., each edge has the form $[v_{\sigma(1)}, \ldots, v_{\sigma(k)}]$ with $\{v_1, \ldots, v_k\} \subseteq V$ and $\sigma \in P_k$, where $P_k$ denotes the set of permutations on $\{1, \ldots, k\}$. A hypergraph is connected if there is a path between every pair of vertices. A hypergraph is undirected when the set of edges is symmetric: we define a relation $\sim$ between two edges by $[v_{\sigma(1)}, \ldots, v_{\sigma(k)}] \sim [v_{\sigma'(1)}, \ldots, v_{\sigma'(k)}]$ for $\sigma, \sigma' \in P_k$ ($k = 1, \ldots, n$), and we denote the set of undirected edges by $E_u = E / \sim$. In what follows, we assume that the hypergraph $G$ is connected and undirected unless noted otherwise. We define the degree of a vertex $v \in V$ as $d(v) = \sum_{e \in E : v \in e} w(e)$, while the degree of an edge $e \in E$ is defined as $|e|$. To represent the hypergraph, we define several matrices. The degree matrices $D_v$ and $D_e$ are diagonal matrices whose diagonal elements are the degrees of the vertices and of the edges, respectively. Let $W_e$ be the diagonal $|E| \times |E|$ matrix whose diagonal elements are the edge weights $w(e)$. Let $H$ be the $|V| \times |E_u|$ incidence matrix, whose element $h(v, e) = 1$ if vertex $v$ belongs to edge $e$, and 0 otherwise. For more details, see Berge (1984).
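The matrices defined above can be assembled directly from an edge list. The following is a minimal sketch, assuming NumPy and a hypothetical input format (a list of vertex-index tuples plus a list of edge weights); the function name `hypergraph_matrices` is introduced here for illustration only.

```python
import numpy as np

def hypergraph_matrices(n_vertices, edges, weights):
    """Return H, W_e, D_v, D_e for an undirected hypergraph.

    edges   : list of vertex-index tuples (hypothetical input format)
    weights : list of positive edge weights w(e)
    """
    H = np.zeros((n_vertices, len(edges)))
    for j, e in enumerate(edges):
        H[list(e), j] = 1.0          # h(v, e) = 1 if v belongs to e
    w = np.asarray(weights, dtype=float)
    W_e = np.diag(w)
    D_v = np.diag(H @ w)             # d(v) = sum of w(e) over edges containing v
    D_e = np.diag(H.sum(axis=0))     # edge degree |e|
    return H, W_e, D_v, D_e

# Toy hypergraph with 5 vertices and 3 hyperedges
edges = [(0, 1, 2), (2, 3), (1, 3, 4)]
H, W_e, D_v, D_e = hypergraph_matrices(5, edges, [1.0, 2.0, 0.5])
```

Here each undirected edge is stored once, so `H` has one column per element of $E_u$.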

Cheeger inequality
This section reviews the Cheeger inequality for the standard 2-Laplacian case. Motivated by the Cheeger inequality for the eigenvalues of the continuous Laplacian, there is existing research on the Cheeger inequality in the discrete domain (Alon and Milman, 1985; Lee et al., 2014; Tudisco and Hein, 2016). This inequality connects the eigenproblem of the Laplacian with a graph cut called the Cheeger cut (Alon and Milman, 1985), and it motivates the use of eigenvectors for partitioning in the following way. While the discrete optimization problem of finding a subset of V that minimizes the Cheeger cut is NP-hard (von Luxburg, 2007), the eigenproblem of the graph Laplacian is not NP-hard. Since the Cheeger inequality bounds the optimal Cheeger cut in terms of the eigenvalues of the graph Laplacian, it guarantees the quality of the eigenproblem solution relative to the optimum of the original cut problem. This guarantee lets us use eigenvectors, obtained from the computationally cheaper eigenproblem, in place of the costly exact solution of the discrete cut problem. In other words, the Cheeger inequality "connects" the Cheeger cut and the eigenproblem: it quantifies how well we approximate the original graph cut problem when we relax it into the real-valued eigenproblem of the Laplacian.
We make this "connection" precise as follows. Let $A \subset V$ be a set and $\bar{A}$ its complement. The Cheeger cut may be defined as $C(A) := \mathrm{cut}(A, \bar{A}) / \min(\mathrm{vol}(A), \mathrm{vol}(\bar{A}))$, where $\mathrm{cut}(A, \bar{A}) := \sum_{u \in A, v \in \bar{A}} w(u, v)$ and $\mathrm{vol}(A) := \sum_{v \in A} d(v)$. The Cheeger constant $h_2 := \min_{A \subset V} C(A)$ is the value of the optimal cut. The Cheeger inequality connects the second eigenvalue $\lambda_2$ of the (normalized) Laplacian and the Cheeger constant as $\lambda_2 / 2 \le h_2 \le \sqrt{2 \lambda_2}$ (Eq. (1)). Equation (1) shows how well we approximate the Cheeger constant by relaxing the original discrete cut problem into the real-valued eigenproblem of the graph Laplacian. The Cheeger inequality also guarantees the performance of the cut produced by algorithms that use the second eigenvector of the Laplacian. Let $(B, \bar{B})$ be the cut found by the second eigenvector $\psi_2$ of the Laplacian, i.e., the cut of the form $(\{v : \psi_2(v) \ge t\}, \{v : \psi_2(v) < t\})$ that minimizes the Cheeger cut over thresholds $t$. Chung (2007) showed that $C(B) \le \sqrt{2 \lambda_2}$. Therefore, by Eq. (1), we obtain $h_2 \le C(B) \le 2 \sqrt{h_2}$ (Eq. (2)), which gives a worst-case guarantee for the performance of spectral clustering. This motivates the use of spectral methods for graph partitioning problems.
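The threshold cut described above can be illustrated on a small graph. The following is a minimal sketch, assuming NumPy, the normalized Laplacian $I - D^{-1/2} A D^{-1/2}$, and the volume-based Cheeger cut (cut weight divided by the smaller side's total degree); the function names and the rescaling of the eigenvector by $D^{-1/2}$ before thresholding are assumptions made here, not details taken from the text.

```python
import numpy as np

def cheeger_cut(A_adj, mask):
    """cut(S, S^c) / min(vol(S), vol(S^c)) for a boolean vertex mask."""
    d = A_adj.sum(axis=1)
    cut = A_adj[mask][:, ~mask].sum()
    return cut / min(d[mask].sum(), d[~mask].sum())

def sweep_cut(A_adj):
    """Threshold the second eigenvector of the normalized Laplacian."""
    d = A_adj.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(d)) - d_isqrt[:, None] * A_adj * d_isqrt[None, :]
    lam, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    psi = vecs[:, 1] * d_isqrt          # assumption: rescale by D^{-1/2}
    best_c, best_mask = np.inf, None
    for t in np.unique(psi):
        mask = psi >= t
        if mask.all() or not mask.any():
            continue                    # both sides must be non-empty
        c = cheeger_cut(A_adj, mask)
        if c < best_c:
            best_c, best_mask = c, mask
    return lam[1], best_c, best_mask

# Two triangles {0,1,2} and {3,4,5} joined by the single edge (2, 3)
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

lam2, c_B, mask_B = sweep_cut(A)
```

On this toy graph the sweep separates the two triangles, and the resulting cut value can be compared against $\lambda_2 / 2$ and $\sqrt{2 \lambda_2}$.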

Hypergraph p-Laplacian
This section defines and discusses a hypergraph p-Laplacian and associated p-eigenpairs.

Differential operators: gradient ∇, divergence div and p-Laplacian Δ_p
In this section, we extend the differential operators proposed in Saito et al. (2018) to an abstract class of p-Laplacians. We first introduce two inner product spaces, $H(V)$ and $H(E)$, of real-valued functions over the vertex set and the (directed) edge set, respectively. We next define three operators on these spaces: the gradient $\nabla : H(V) \to H(E)$, the divergence $\mathrm{div} : H(E) \to H(V)$, and the p-Laplacian $\Delta_p : H(V) \to H(V)$. These operators are discrete analogs of the corresponding operators in continuous differential geometry. In the continuous domain, for a twice-differentiable function $f$, the p-Laplace operator is obtained by applying the divergence $\mathrm{div}^{(c)}$ to the rescaled gradient $\|\nabla^{(c)} f\|^{p-2} \nabla^{(c)} f$, where operators superscripted by (c) are the standard ones from continuous calculus. In the following, we establish a differential-geometric framework in a generalized discrete setting, analogous to the continuous one, in order to define an abstract class of p-Laplacians. The operators div and $\Delta_p$ were introduced in the graph setting (Zhou and Schölkopf, 2005; Grady, 2006) and generalized to the hypergraph setting, whereas a similar formulation of $\nabla$ was given in the graph and hypergraph settings (Zhou and Schölkopf, 2005; Saito et al., 2018). The definition that we propose below broadly generalizes all previous definitions. We define it and discuss its interpretation below.
We propose to define the hypergraph-gradient as follows. The definition below is the generalization of the definition of gradient over hypergraphs proposed in Saito et al. (2018).
In Definition 1, the operator $\nabla$ and the function $c(u, v, e, \psi)$ satisfy three conditions, Eqs. (5)-(7), for all $e \in E$ and vertices $u, v \in e$. This hypergraph-gradient can be intuitively interpreted as follows. The term $\psi(u) / \mu^{1/p}(u) - \psi(v) / \mu^{1/p}(v)$ can be interpreted as the "roughness" (normalized by $\mu$) between two vertices. The hypergraph-gradient is a sum of this term over all pairs of vertices in the edge $e$. Hence, the hypergraph-gradient can be intuitively seen as the roughness within one edge, similar to the continuous gradient $\nabla^{(c)}$.
The definition of the hypergraph-gradient depends on a "weighting" function $c(u, v, e, \psi)$. This weighting function can be seen as a coefficient of the difference between every pair of vertices. Varying $c(u, v, e, \psi)$ allows us to model different types of hypergraph expansions, including but not limited to the star (Zhou et al., 2006) or clique expansions (see Table 2 for details). In other words, the function $c$ is what makes our generalized p-Laplacian framework abstract.
We make a few remarks on the gradient in Definition 1. First, the gradient operator $\nabla$ and the function $c$ must satisfy the three conditions given in Eqs. (5), (6) and (7). Equation (5) requires the operator $\nabla$ to be either homogeneous or absolutely homogeneous. Equation (6) requires that the sum of the function $c$ over all pairs of vertices in an edge $e$ is independent of $\psi$. Equation (7) enforces that the function $c^{1/p}$ is independent of $\psi$ once we fix one edge and one vertex in the edge. In the following, when $c$ is not differentiable, we consider the subdifferential instead of the derivative. For a more detailed discussion of these conditions, see Appendix 1. Second, we normalize by the vertex weights $\boldsymbol{\mu}$. We call the vertex weights unnormalized when $\mu(v) = 1$ and normalized when $\mu(v) = d(v)$, $\forall v \in V$. We observe that existing unnormalized p-Laplacians such as Hein et al. (2013) correspond to $\mu(v) = 1$, and normalized 2-Laplacians (Zhou et al., 2006; Saito et al., 2018) to $\mu(v) = d(v)$. The following definition of a divergence operator is inspired by an analogy to the continuous setting.

Definition 2 A hypergraph divergence is an operator
Note that Definition 2 is an analog of Stokes' theorem in the continuous setting. Also, we can check that div is unique. Intuitively, the divergence counts the net flow of an edge function at each vertex, similarly to the intuition in the continuous domain.
Finally, we propose to define the p-Laplacian.

p-Dirichlet sum and p-Laplacian
This section defines the p-Dirichlet sum, which can be interpreted as an energy over the hypergraph. We also discuss relations between the p-Dirichlet sum and the p-Laplacian, and how these relations form the foundation of graph partitioning. Using the norm defined by the Hilbert space in Eq. (4), we define the p-Dirichlet sum $S_p(\psi)$ of $\psi \in H(V)$ (Eq. (8)), which measures the roughness of $\psi$ over the hypergraph. Hence, it is natural to interpret the p-Dirichlet sum as an energy over the hypergraph. Later, this energy serves as the objective function of hypergraph partitioning.
For the p-Dirichlet sum and the p-Laplacian, two relationships hold (Propositions 4 and 5): the identity $\langle \psi, \Delta_p \psi \rangle = S_p(\psi)$, and a relation between the derivative of $S_p(\psi)$ with respect to $\psi$ and $\Delta_p \psi$. These relations are important both in the continuous and discrete domains. In the continuous domain, the analog of these relations is fundamental for an important problem on the p-Laplacian, called the Dirichlet principle (Courant and Hilbert, 1962): the Dirichlet energy is minimized when the Laplace equation is satisfied. For clustering in the discrete domain, we minimize the p-Dirichlet sum. To do so, we consider a problem similar to the Laplace equation, namely the eigenproblem of the Laplacian. This is the analogy between continuous differential geometry and this discrete geometry.
In the following, we illustrate this with the standard graph 2-Laplacian. Let $L = D^{-1/2} (D - A) D^{-1/2}$ be the standard normalized graph 2-Laplacian, where $A$ is an adjacency matrix and $D$ is the diagonal degree matrix. This Laplacian is recovered in our framework by setting the hypergraph-gradient on each edge $e = [u, v]$ to a weighted difference of $\psi(u) / \sqrt{d(u)}$ and $\psi(v) / \sqrt{d(v)}$, with $\boldsymbol{\mu} = \{d(v)\}_{v \in V}$. We then obtain $\Delta_2 \psi = L \psi$, which corresponds to $L$. From Eq. (8), the energy is $S_2(\psi) = \psi^\top L \psi = \langle \psi, \Delta_2 \psi \rangle$, which corresponds to Proposition 4. Differentiating $S_2(\psi)$ with respect to $\psi$, we observe $\partial S_2(\psi) / \partial \psi = 2 L \psi = 2 \Delta_2 \psi$, which corresponds to Proposition 5. More detailed steps are explained in Appendix 2. In the graph partitioning context, Eqs. (11) and (12) serve as a foundation of the relationship between balanced cuts and the graph Laplacian. Using these properties and Courant's min-max theorem, we can connect the discrete cut problem to the eigenproblem of the Laplacian via the nodal domain theorem and the Cheeger inequality, as we will see in the next sections.
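The standard graph 2-Laplacian example can be checked numerically. The following is a minimal sketch, assuming NumPy and a small weighted graph chosen here for illustration; it compares the edge-wise energy with the quadratic form $\psi^\top L \psi$ and compares a finite-difference gradient of the energy with $2 L \psi$. The factor 0.5 in `energy` is a bookkeeping assumption that counts each undirected edge once.

```python
import numpy as np

# Small weighted graph (symmetric adjacency matrix), chosen for illustration
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 2],
              [0, 0, 2, 0]], dtype=float)
d = A.sum(axis=1)
L = np.eye(4) - A / np.sqrt(np.outer(d, d))   # D^{-1/2} (D - A) D^{-1/2}

def energy(psi):
    """Sum over undirected edges of w(u,v) (psi(u)/sqrt(d(u)) - psi(v)/sqrt(d(v)))^2."""
    g = psi / np.sqrt(d)
    diff = g[:, None] - g[None, :]
    return 0.5 * np.sum(A * diff ** 2)        # 0.5: A lists each edge twice

psi = np.array([0.3, -1.2, 0.7, 2.0])
quad = psi @ L @ psi                          # <psi, L psi>

# Central finite differences approximate the derivative of the energy
eps = 1e-6
grad_fd = np.array([(energy(psi + eps * e) - energy(psi - eps * e)) / (2 * eps)
                    for e in np.eye(4)])
grad_exact = 2 * L @ psi
```

The two energies agree, and the finite-difference gradient matches $2 L \psi$ up to rounding error.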
The properties in Propositions 4 and 5 often appear for the graph p-Laplacian (Bühler and Hein, 2009; Bougleux et al., 2009). Moreover, in the hypergraph context, these properties also hold for the existing hypergraph Laplacians even without an explicit differential-geometric setup, as seen in Table 1 (Zhou et al., 2006; Saito et al., 2018; Hein et al., 2013; Li and Milenkovic, 2018). Hence, it is natural to expect that all of the hypergraph Laplacians have a similar structure in this sense. However, as we see in Table 1, some Laplacians lack certain features; in particular, some lack the useful nodal domain theorem and Cheeger inequality (discussed in Sect. 4). Note that we "borrow" these results from continuous differential geometry. One of the benefits of our abstract Laplacian is that it provides a comprehensive analysis of all Laplacians defined from the gradient in Definition 1.

p-Eigenproblem of p-Laplacian
Next, we discuss the eigenproblem of this p-Laplacian. Since a p-Laplace operator is nonlinear, we introduce the standard generalization of an eigenpair for the p-Laplacian (see, for example, Tudisco and Hein (2016)).

Definition 6
Let $\xi_p(x) := |x|^{p-1} \mathrm{sgn}(x)$. For $p > 1$, a hypergraph p-eigenpair, which is a pair of a p-eigenvalue $\lambda \in \mathbb{R}$ and a p-eigenvector $\psi \in H(V)$ of $\Delta_p$, is defined by $(\Delta_p \psi)(v) = \lambda \, \xi_p(\psi(v))$ for all $v \in V$. For the standard Laplacian, one can show the connection between its eigenpairs and the Rayleigh quotient from matrix theory as well as from continuous analysis. To obtain a p-eigenpair, we consider the following Rayleigh quotient:

Proposition 7 Consider the Rayleigh quotient for our abstract class of p-Laplacians as
The function $R_p$ has a critical point at $\psi^*$ if and only if $\psi^*$ is a p-eigenvector of $\Delta_p$. The corresponding p-eigenvalue $\lambda^*$ is given by $\lambda^* = R_p(\psi^*)$. Moreover, the first p-eigenvalue is 0. For the standard Laplacians, the first p-eigenvector is equal to $c \mathbf{1}$ in the unnormalized case and $c D_v^{1/2} \mathbf{1}$ in the normalized case, where $c$ is a constant. For more properties of p-eigenpairs, see Appendix 5.
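For the standard normalized graph 2-Laplacian, these statements can be verified numerically. The following is a minimal sketch, assuming NumPy, $p = 2$, and the Rayleigh quotient in its familiar matrix form $\psi^\top L \psi / \|\psi\|_2^2$; the helper names `xi` and `rayleigh_2` are introduced here for illustration.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 2],
              [0, 0, 2, 0]], dtype=float)
d = A.sum(axis=1)
L = np.eye(4) - A / np.sqrt(np.outer(d, d))   # normalized graph 2-Laplacian

def xi(x, p):
    """xi_p(x) = |x|^(p-1) sgn(x), applied elementwise."""
    return np.abs(x) ** (p - 1) * np.sign(x)

def rayleigh_2(psi):
    """Rayleigh quotient for p = 2: psi^T L psi / ||psi||_2^2."""
    return (psi @ L @ psi) / (psi @ psi)

first = np.sqrt(d)                 # D^{1/2} 1, the first eigenvector
lam, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
psi2, lam2 = vecs[:, 1], lam[1]

# Eigen-equation L psi = lambda * xi_2(psi); for p = 2, xi_2 is the identity
residual = L @ psi2 - lam2 * xi(psi2, 2)
```

For $p = 2$ the map $\xi_p$ is the identity, so the p-eigenproblem reduces to the usual linear eigenproblem.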

p-Laplacians and related regularizers
This section shows that various hypergraph Laplacians and related regularizers can be seen as special cases of our framework. We discuss the clique expansion (Rodriguez, 2002; Saito et al., 2018), the star expansion (Zhou et al., 2006), total variation (Hein et al., 2013), and submodular hypergraphs (Li and Milenkovic, 2018).

Clique expansion hypergraph Laplacian (clique)
This approach constructs a graph in which every hyperedge is replaced by a clique over its vertices. The hypergraph clique 2-Laplacian normalized by the degree of each edge (clique e-n) was proposed by Saito et al. (2018), and the edge-unnormalized clique 2-Laplacian (clique e-un) was proposed by Rodriguez (2002). For both 2-Laplacians, the clique Laplacian is the standard 2-Laplacian $L$ induced by a weighted adjacency matrix $W$; that is, the hypergraph is contracted to a graph represented by $W$.

Star expansion hypergraph Laplacian (star)
This approach constructs a graph by introducing a new vertex for every edge to form a star. The resulting hypergraph 2-Laplacian is the one given by Zhou et al. (2006). We remark that this view is also a contraction of the hypergraph to a graph, represented by the adjacency matrix $H W_e D_e^{-1} H^\top$. Note that this Laplacian can be seen as the standard Laplacian if we regard the hypergraph as a graph, except for the coefficient 1/2. This difference in coefficient comes from the nature of this view, as discussed in Saito et al. (2018).
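Both contractions to a graph can be written in a few lines. The following is a minimal sketch, assuming NumPy and a toy incidence matrix chosen here; the star adjacency $H W_e D_e^{-1} H^\top$ is taken from the text, while the clique adjacency $H W_e H^\top$ with a zeroed diagonal is an assumed form of the edge-unnormalized clique expansion. The coefficient 1/2 mentioned above and the edge-normalized clique weighting are not modeled.

```python
import numpy as np

# Toy hypergraph: 5 vertices, hyperedges {0,1,2}, {2,3}, {1,3,4}
H = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
w = np.array([1.0, 2.0, 0.5])     # edge weights w(e)
d_v = H @ w                        # vertex degrees d(v)
d_e = H.sum(axis=0)                # edge degrees |e|
n = H.shape[0]

def normalized_laplacian(adj, deg):
    """I - D^{-1/2} adj D^{-1/2} for a symmetric adjacency and positive degrees."""
    return np.eye(len(deg)) - adj / np.sqrt(np.outer(deg, deg))

# Star expansion: adjacency H W_e D_e^{-1} H^T, normalized by vertex degrees
A_star = H @ np.diag(w / d_e) @ H.T
L_star = normalized_laplacian(A_star, d_v)

# Clique expansion (assumed unnormalized form): each pair in e gets weight w(e)
A_clique = H @ np.diag(w) @ H.T
np.fill_diagonal(A_clique, 0.0)
d_c = A_clique.sum(axis=1)
L_clique = normalized_laplacian(A_clique, d_c)
```

In both cases the vector of square-rooted degrees lies in the null space of the resulting Laplacian, matching the first eigenvector of the normalized case.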

Total variation and submodular p -Laplacian (tv/sub)
The total variation (tv) approach for hypergraphs has been considered in a different context than the other two (Hein et al., 2013). The TV regularizer is defined as $\sum_{e \in E} w(e) \max_{u, v \in e} |\psi(u) - \psi(v)|^p$, which is not normalized by the degree of the vertices (tv v-un). We here propose a normalized total variation (tv v-n) p-Laplacian, whose regularizer replaces $\psi(v)$ by its degree-normalized value $\psi(v) / d^{1/p}(v)$. This TV p-Laplacian is incorporated by the submodular p-Laplacian (Li and Milenkovic, 2018). The extensive study by Li and Milenkovic (2018) considers the hypergraph p-Laplacian in the context of submodular functions, which we refer to as sub. For a submodular function $F : 2^{|e|} \to [0, 1]$ associated with edge $e$, the submodular p-Laplacian is associated with an energy that takes the form of a Lovász extension. By taking $F(S_i) = 1$ for all $i$, we obtain the tv p-Laplacian.
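The total variation regularizer is simple to evaluate. The following is a minimal sketch, assuming NumPy and the form "weighted sum over hyperedges of the p-th power of the largest within-edge difference"; the function name `tv_energy` and the optional `mu` argument (dividing each value by $\mu^{1/p}(v)$ as in the normalized variant) are assumptions introduced here.

```python
import numpy as np

def tv_energy(psi, edges, weights, p=1.0, mu=None):
    """Sum over hyperedges of w(e) * (max_{u,v in e} |psi(u) - psi(v)|)^p.

    mu : optional vertex weights; if given, psi(v) is divided by mu(v)^(1/p)
         (assumed form of the vertex-normalized variant).
    """
    psi = np.asarray(psi, dtype=float)
    if mu is not None:
        psi = psi / np.asarray(mu, dtype=float) ** (1.0 / p)
    total = 0.0
    for e, w in zip(edges, weights):
        vals = psi[list(e)]
        total += w * (vals.max() - vals.min()) ** p   # largest pairwise gap in e
    return total

edges = [(0, 1, 2), (2, 3)]
weights = [1.0, 2.0]

# Indicator vectors: for p = 1 the energy equals the weight of the cut hyperedges
cut_first = tv_energy([1, 1, 0, 0], edges, weights, p=1.0)
cut_second = tv_energy([1, 1, 1, 0], edges, weights, p=1.0)
```

For an indicator vector and $p = 1$, a hyperedge contributes its weight exactly when it contains vertices on both sides of the partition.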

Connections between our p-Laplacian and existing Laplacians
These Laplacians can be seen as special cases of our abstract Laplacian, defined by Definition 3 together with the hypergraph-gradient (Definition 1) and the hypergraph-divergence (Definition 2). Table 2 summarizes the corresponding function $c(u, v, e, \psi)$ in the definition of the hypergraph-gradient. The edge-normalized and edge-unnormalized clique 2-Laplacians in Table 2 are the 2-Laplacians proposed by Saito et al. (2018) and Rodriguez (2002), respectively. The star 2-Laplacian in Table 2 is equal to the Laplacian proposed by Zhou et al. (2006). The regularizer of the unnormalized TV p-Laplacian in Table 2 corresponds to the one by Hein et al. (2013). We also note that all of these functions $c(u, v, e, \psi)$ satisfy the conditions of Definition 1 (see Appendix 1). For more discussion, see Appendix 6.

Properties of p-Eigenpair of p-Laplacian
This section discusses the properties of the p-eigenproblem of our hypergraph p-Laplacian. We aim to establish the theoretical background of spectral clustering using the p-Laplacian, namely the nodal domain theorem and the Cheeger inequality. The nodal domain theorem bounds the number of nodal domains, which can be seen as a "division" of the vertices. Using nodal domains, the Cheeger inequality shows how well the p-eigenproblem can approximate a minimal graph cut.

Nodal domain theorem of the p-Laplacian
This section extends the classical nodal domain theorem to our framework. The nodal domain theorem in the discrete domain is developed analogously to Courant's nodal domain theorem in the continuous domain (Courant and Hilbert, 1962). In the continuous case, a nodal domain of a function is a region on which its sign does not change. A nodal domain therefore marks a natural division of the domain according to the function's values. The nodal domain theorem connects the eigenvectors of the Laplacian with nodal domains by bounding the number of nodal domains of each eigenvector (Lindqvist, 2008). The same idea can be established in the discrete domain, where a nodal domain is a connected sub-hypergraph on which the sign of a p-eigenvector does not change. Such a nodal domain can be seen as a "partition" induced by the p-eigenvector. A natural question is whether we can obtain a similar bound on the number of nodal domains. We begin with the definition of a nodal domain for a hypergraph.

Definition 8 A nodal domain is a maximally connected subgraph
Next, with this definition, we discuss the nodal domain theorem for our hypergraph p-Laplacian. The nodal domain theorem for the graph Laplacian was proven in Fiedler (1975), generalized to the graph p-Laplacian by Tudisco and Hein (2016), and extended to a particular type of hypergraph p-Laplacian by Li and Milenkovic (2018). In this line of research, we extend these nodal domain theorems to our abstract class of hypergraph p-Laplacians as follows.

Theorem 9 Let $0 = \lambda_1 < \lambda_2 \le \ldots \le \lambda_{k-1} < \lambda_k = \ldots = \lambda_{k+r-1} < \lambda_{k+r} \le \ldots$ be the eigenvalues of $\Delta_p$, and let $\psi_k$ be an eigenvector associated with $\lambda_k$. Then $\psi_k$ induces at most $k + r - 1$ nodal domains.

As seen in Theorem 9, the nodal domain theorem studies the structure of the p-eigenvectors of the p-Laplacian by bounding their number of nodal domains. The number of nodal domains enters the Cheeger inequality, which is a theoretical justification for spectral methods via our p-Laplacian. We discuss this Cheeger inequality next.
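Counting nodal domains is a small graph-traversal exercise. The following is a minimal sketch for the ordinary graph case, assuming NumPy, an adjacency-matrix input, and the convention that vertices with value zero belong to no domain; the function name `count_nodal_domains` is introduced here for illustration, and the hypergraph version of the definition is not reproduced.

```python
import numpy as np

def count_nodal_domains(A_adj, psi, tol=1e-12):
    """Count maximal connected vertex sets on which psi is strictly
    positive or strictly negative (zeros are skipped by assumption)."""
    sign = np.where(psi > tol, 1, np.where(psi < -tol, -1, 0))
    n = len(psi)
    seen = np.zeros(n, dtype=bool)
    count = 0
    for s in range(n):
        if seen[s] or sign[s] == 0:
            continue
        count += 1                      # new domain found; flood-fill it
        stack = [s]
        seen[s] = True
        while stack:
            u = stack.pop()
            for v in np.nonzero(A_adj[u])[0]:
                if not seen[v] and sign[v] == sign[s]:
                    seen[v] = True
                    stack.append(v)
    return count

# Path graph on 5 vertices: 0 - 1 - 2 - 3 - 4
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

hand = count_nodal_domains(A, np.array([1.0, 1.0, -1.0, -1.0, 1.0]))

# Eigenvectors of the unnormalized Laplacian D - A, sorted by eigenvalue
lam, vecs = np.linalg.eigh(np.diag(A.sum(axis=1)) - A)
domains = [count_nodal_domains(A, vecs[:, k]) for k in range(n)]
```

On the path graph all eigenvalues are simple, so the $k$-th eigenvector (counting from one) should induce at most $k$ nodal domains.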

k-way Cheeger inequality
This section establishes the k-way Cheeger inequality for our hypergraph p-Laplacian. As we saw in Sect. 2.2, the 2-way Cheeger inequality serves as the connection between Cheeger cut and eigenproblem. Moreover, the inequality gives a performance guarantee of the relaxed graph partitioning problem. We want to establish such a connection between the Cheeger cut and p-eigenproblem of our hypergraph p-Laplacian. For this purpose, we aim to generalize this Cheeger inequality to our hypergraph p-Laplacians to achieve spectral partitioning via our p-Laplacian.
We start our discussion from a 2-way Cheeger cut. Let $A \subset V$ be a set and $\bar{A}$ its complement. The generalized Cheeger cut $C(A)$ is defined analogously to the graph case, and we call the value of the optimal cut $h_2 := \min_{A \subset V} C(A)$ the Cheeger constant. For a standard graph, this generalized Cheeger cut becomes the standard Cheeger cut discussed in Sect. 2.2. We extend this generalized 2-way Cheeger cut to a k-way Cheeger cut. Similarly to previous studies (Tudisco and Hein, 2016; Li and Milenkovic, 2018), we establish a k-way Cheeger inequality for our p-Laplacian as follows.
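Theorem 10 Let $(\lambda_k, \psi_k)$ be a p-eigenpair of $\Delta_p$, and let $m_k$ be the number of nodal domains of $\psi_k$. Then the k-way Cheeger constants are bounded from above and below in terms of $\lambda_k$, with a constant that involves $\sum_{u, v \in e} c(u, v, e, \psi)$.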

Corollary 11 Let (B, B) be the cut found by the second eigenvector of the p-Laplacian
Theorem 10 extends the graph Cheeger inequality in three respects: from graphs to hypergraphs, from 2-way to k-way cuts, and from the standard 2-Laplacian to our abstract class of p-Laplacians. Following Theorem 10, Corollary 11 bounds the cut obtained from the second p-eigenvector of our abstract class of p-Laplacians in terms of the generalized Cheeger constant. As in the classical case in Sect. 2.2, Theorem 10 shows how well the relaxation of the discrete k-way cut problem into the p-eigenproblem of $\Delta_p$ approximates the k-way Cheeger constant: it gives upper and lower bounds on the optimal cut in terms of the k-th p-eigenvalue. Moreover, Corollary 11 gives a worst-case guarantee for the 2-way cut obtained from a p-eigenvector. These bounds guarantee the performance of the cut produced by spectral methods via the p-eigenvectors of our p-Laplacian. Hence, Theorem 10 and Corollary 11 motivate the use of spectral methods via our p-Laplacian for hypergraph partitioning in place of the costly original discrete cut problem. The inequality gives the tightest bound as $p \to 1$. Since the original cut problem is NP-hard, the eigenproblem is also NP-hard in this limiting case. Moreover, for the standard graph 2-Laplacian, the inequality reduces to the classical Cheeger inequality. Also, when $k = 2$, the inequality concerns $h_2$, the 2-way Cheeger constant. Therefore, in the next section, we focus on constructing a spectral algorithm for 2-way partitioning.
Finally, we remark that the discussion on the k-way Cheeger cut generalizes the standard graph 2-way Cheeger inequality of the 2-Laplacian (Alon and Milman, 1985; Alon, 1986), the k-way Cheeger inequality of the 2-Laplacian (Lee et al., 2014), the k-way Cheeger inequality of the graph p-Laplacian (Tudisco and Hein, 2016), and the k-way Cheeger inequality of the p-Laplacian of submodular hypergraphs (Li and Milenkovic, 2018). We also note that the proofs of the nodal domain theorem (Theorem 9) and the Cheeger inequality (Theorem 10) are natural generalizations of those in previous studies such as Tudisco and Hein (2016) and Li and Milenkovic (2018). Rather than introducing new proof techniques, the focus of this work is to generalize the hypergraph p-Laplacian as far as possible while preserving these structures, in order to provide a unified framework.

Hypergraph partitioning via p-Laplacian
Sect. 4 establishes performance guarantees for solving the eigenproblem in place of the NP-hard discrete Cheeger cut problem. Building on this, this section presents our partitioning algorithm, which exploits the p-eigenpairs of our hypergraph p-Laplacian.
We first discuss a property of the p-eigenvectors of $\Delta_p$. Since the p-Laplacian is nonlinear, its p-eigenvectors are not necessarily orthogonal to each other. However, we still want a relationship between p-eigenvectors. For this purpose, instead of orthogonality, Luo et al. (2010) proposed p-orthogonality as follows.

Definition 12 Let $\Xi_p(\psi)$ be the vector whose v-th element is $\xi_p(\psi(v))$. We call $\psi \neq 0$ and $\phi \neq 0$ p-orthogonal if $\Xi_p(\psi)^\top \Xi_p(\phi) = 0$.

In order to analyze the p-orthogonality of our abstract class of p-Laplacians, we recall the Taylor expansion, which is often used for approximating functions in physics. For example, for the motion of a pendulum, if we approximate functions of the angular amplitude of the pendulum by their Taylor expansion, the equation of motion is approximated by simple harmonic motion (Courant and Hilbert, 1962). The Taylor expansion writes an infinitely differentiable function $f(x)$ as $f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n$, where $a$ is a constant and $f^{(n)}$ is the n-th derivative of $f$. If we approximate the function to first order, the remainder (the second- and higher-order terms) can be seen as the approximation error. For two functions $f$ and $g$, if the error term can be written as a sum of second- or higher-order terms, then we say that $f$ is equal to $g$ up to the second order of Taylor expansion. Using this notion and p-orthogonality, we obtain Theorem 13, which tells us that two p-eigenvectors are approximately p-orthogonal, up to the second order of Taylor expansion.
We now turn to the second p-eigenpair by considering the Rayleigh quotient. In the graph p-Laplacian case (Bühler and Hein, 2009), the clique p-Laplacian case, and the continuous case (Lindqvist, 2008), the global minimum of a variant of the Rayleigh quotient gives the second p-eigenpair. Similarly to these works, we propose to define a quotient, Eq. (19), in terms of the first p-eigenvector $\psi_1$. This quotient is supported by the following theorem.

Theorem 14 The global solution of Eq. (19) is given by
This theorem shows that we have an exact characterization of the second p-eigenpair: minimizing Eq. (19) gives the second p-eigenpair of $\Delta_p$. However, the major disadvantage is that Eq. (19) is not convex, so the global optimum is difficult to obtain; optimization algorithms applied to Eq. (19) may return a local optimum instead of the global optimum.
Therefore, we next consider a strategy to obtain a better local optimum for 2-way hypergraph partitioning. The idea is to use exact p-orthogonality as a constraint, instead of the condition "p-orthogonal up to the second order" that each pair of p-eigenvectors must obey (Theorem 13). The reasoning is as follows. Due to the non-convexity of Eq. (19), the solution obtained by an optimization algorithm can be a local optimum. Such a local optimum is not guaranteed to be p-orthogonal up to the second order to the first p-eigenvector $\psi_1$, whereas $\psi_2$ is. To obtain a better solution, we exploit Theorem 13 and seek a constraint that enforces the solution to be p-orthogonal up to the second order to $\psi_1$. However, it is difficult to work directly with this constraint. To ease this difficulty, we propose to use the exact p-orthogonality condition as a constraint. Thanks to Theorem 13, this exact constraint can be seen as an approximation of the original condition up to the second order of Taylor expansion. We borrow this approximation idea from physics, where it is common to approximate physical phenomena by a second-order Taylor expansion, as in the pendulum example above. Following this discussion, we incorporate exact p-orthogonality as a constraint and consider the resulting constrained optimization problem, Eq. (21). To solve Eq. (21), we propose to apply the natural gradient algorithm (Amari, 1998), as shown in Algorithm 1, similarly to Luo et al. (2010). With a simple gradient update based on $\partial J / \partial \psi$, where $J$ denotes the objective, the orthogonality condition does not hold after each update. Algorithm 1 instead uses a modified update direction that preserves the orthogonality condition in Eq. (21) (Luo et al., 2010). The convergence of this algorithm is also guaranteed (Luo et al., 2010).
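The p-orthogonality condition used as a constraint can be checked numerically. The following is a minimal sketch, assuming NumPy and the form of p-orthogonality in which the elementwise transforms $|x|^{p-1} \mathrm{sgn}(x)$ of the two vectors have zero inner product; the helper names `xi_p` and `p_inner` and the example vectors are introduced here for illustration, and the sketch does not implement Algorithm 1.

```python
import numpy as np

def xi_p(x, p):
    """Elementwise |x|^(p-1) sgn(x)."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.abs(x) ** (p - 1)

def p_inner(psi, phi, p):
    """Inner product of the transformed vectors; zero means p-orthogonal."""
    return float(xi_p(psi, p) @ xi_p(phi, p))

ones = np.ones(4)                         # constant vector (unnormalized first eigenvector)
phi = np.array([2.0, -1.0, -1.0, 0.0])    # sums to zero
alt = np.array([1.0, -1.0, 1.0, -1.0])    # balanced signs and magnitudes

ortho_p2 = p_inner(ones, phi, 2.0)        # p = 2: ordinary orthogonality
ortho_p15 = p_inner(ones, phi, 1.5)       # p = 1.5: no longer zero
alt_p15 = p_inner(ones, alt, 1.5)         # stays zero for any p
```

For $p = 2$ the condition reduces to ordinary orthogonality, whereas for other $p$ a vector orthogonal to the constant vector need not be p-orthogonal to it, which is why the constraint has to be enforced explicitly during optimization.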

Related work
This section compares related hypergraph 2-Laplacians, p-Laplacians, and partitioning algorithms, and is complementary to Sect. 3.4. While Sect. 3.4 defines the related p-Laplacians, this section discusses their context and explains how ours differs from existing ones. One major family of hypergraph Laplacians comes from the clique expansion (clique). The edge-unnormalized 2-Laplacian for the unweighted setting was proposed in Rodriguez (2002) (clique e-un). This 2-Laplacian and the Laplacians proposed in other studies (Zien et al., 1999; Bolla, 1993; Gibson et al., 2000) are theoretically equivalent (Agarwal et al., 2006). In this line of research, a 2-Laplacian from a differential geometry viewpoint has also been proposed. When p = 2, this Laplacian can also be explained by the clique expansion, but normalized by the degree of each edge (clique e-n). Moreover, this p-Laplacian is based on a vertex-wise energy (clique e-n-vw), while ours is based on an edge-wise energy. In Saito et al. (2018), the vertex-wise p-energy is defined using the norm of the hypergraph-gradient at each vertex v. The idea is to define the energy around a vertex as the norm of the hypergraph-gradient at that vertex and to obtain the total energy by summing these energies over all vertices. Note that for a standard graph, the p-Laplacian in Saito et al. (2018) corresponds to a series of graph studies (Zhou and Schölkopf, 2005; Bougleux et al., 2009) that also assume a vertex-wise energy. In contrast, ours corresponds to the graph p-Laplacian that assumes an edge-wise energy (Bühler and Hein, 2009; Tudisco and Hein, 2016). Hence, our framework does not incorporate the p-Laplacian proposed in Saito et al. (2018), since the p-Dirichlet sum is set up differently. We remark that when p = 2, our model incorporates clique e-n-vw by using the function c in Table 2. However, Saito et al. (2018) did not give theoretical analyses such as the nodal domain theorem or the Cheeger inequality. Moreover, Saito et al. (2018) did not give a specific partitioning algorithm that exploits characteristics of the p-Laplacian such as p-orthogonality. Hence, a general-purpose optimization method is needed for their p-eigenproblem. Such methods do not necessarily leverage the characteristics of the p-Laplacian, which could otherwise lead to better performance in terms of space, time, and accuracy.
Another line of research is based on the star expansion, shown in Sect. 3.4. Zhou et al. (2006) proposed a 2-Laplacian based on a lazy random walk view. Agarwal et al. (2006) show that this 2-Laplacian is theoretically equivalent to the Laplacians studied in Zien et al. (1999) and Li and Solé (1996), as further discussed in Ghoshdastidar and Dukkipati (2017a).
Another family of Laplacians comes from the total variation approach and the subsequent submodular approach (tv/sub). A regularization framework for p ≥ 1 was proposed in Hein et al. (2013), together with a hypergraph partitioning algorithm for p = 1, and was further explored in Chan et al. (2018). This idea was extended to submodular hypergraphs (Li and Milenkovic, 2018). A submodular hypergraph has an energy objective function that uses one form of the Lovász extension of a submodular function. Moreover, sub incorporates the inhomogeneous cut proposed by Li and Milenkovic (2017), where the weights can vary depending on how an edge is partitioned. Along with this new class of hypergraph cuts, Li and Milenkovic (2018) proposed partitioning algorithms for p = 1 and p = 2. From its definition (Eq. (15)), the submodular p-Laplacian describes a broad class of hypergraph p-Laplacians using submodular functions. We also mention that the p = 2 case for submodular cut objective functions is discussed in Yoshida (2019) using the general form of the Lovász extension. Moreover, a series of studies (Benson et al., 2020) directly defines the objective function using a submodular function instead of the Lovász extension. While submodular models seem flexible, ours is more versatile since we do not assume submodularity. The submodular p-Laplacian is a special case of ours as long as the conditions in Definition 1 are satisfied. Additionally, our algorithm can address arbitrary p, while the algorithms of Hein et al. (2013) and Li and Milenkovic (2018) focus on specific values of p (p = 1 or p = 2).
We remark that our framework can address the existing 2-Laplacians from clique and star, and the tv/sub p-Laplacian. Moreover, our partitioning algorithm works for arbitrary p > 1, while the existing algorithms focus on specific p or use a general-purpose optimization algorithm without theoretical analyses. We also note that our framework can define new p-Laplacians, including (but not limited to) the normalized TV shown in Sect. 3.4. However, incorporating the clique e-n-vw p-Laplacian is out of the scope of our work. Moreover, since our framework is based on the relationships in Propositions 4 and 5, it does not incorporate tensor-modeled Laplacians for uniform hypergraphs, where all edges connect the same number of vertices (Cooper and Dutle, 2012; Hu and Qi, 2012; Qi, 2013; Hu and Qi, 2015; Chen et al., 2017; Ghoshdastidar and Dukkipati, 2017b; Chang et al., 2020). The reason why we cannot incorporate the clique e-n-vw p-Laplacian and tensors into our model is that our model is built upon the energy of the form Eq. (8), while the energies for those two are defined differently. In particular, the aims of tensor-modeled Laplacians and of our framework differ as follows: while the tensor-modeled Laplacians are tensor operations, our framework focuses on the contraction made by the energy Eq. (8).
Lastly, we comment on p-Laplacians in the continuous domain. The continuous p-Laplacian has a longer history than its discrete counterpart. The Dirichlet energy is defined similarly to Eq. (8), and the variation of the energy gives the Laplace equation (Courant and Hilbert, 1962). The energy is minimized when the Laplace equation is satisfied. This framework has been extended to arbitrary p-norms (e.g., Binding and Rynne, 2008) and has been analyzed theoretically in many ways, including nodal domain theorems and Cheeger inequalities. We remark that in the continuous case, the second p-eigenpair can be identified similarly to Eq. (19), but no exact identification for the third or higher eigenpairs has been found yet (Lindqvist, 2008). For a more comprehensive study, we refer to Lindqvist (2008) and Struwe (2000).

Preliminary experiments
Our experiments aim to evaluate our approximation algorithm (Algorithm 1) as a function of p and the particular type of hypergraph Laplacians (star, clique, and tv/sub).
Primary Objective of the Experiments. The objective of the experiments is to see whether our algorithm (Algorithm 1) improves on the existing methods introduced in Sect. 6. Algorithm 1 has two key "levers": the choice of the parameter p and the choice of hypergraph Laplacian, i.e., the function c in the gradient (Definition 1). In the previous works discussed in Sect. 6, the algorithms for hypergraph p-Laplacians were either designed for a particular p (e.g., p = 1, 2) or applied to all p > 1 without theoretical justification. In contrast, Algorithm 1 for our abstract class of hypergraph p-Laplacians works for all p > 1 with theoretical justification. Therefore, we provide experiments for a wide range of hypergraph Laplacians for p > 1 in comparison to existing algorithms. We apply Algorithm 1 to five existing hypergraph Laplacians (clique e-n, clique e-un, star, tv v-un, and tv v-n) for p > 1 and compare these to the existing fixed-p algorithms for the particular types of Laplacians. Moreover, since clique e-n has a partitioning algorithm using a particular hypergraph p-Laplacian (clique e-n-vw by Saito et al. (2018); see Sect. 6 for the definition), we also compare to this. Hence, we compare five instantiations of ours with six previous algorithms:
• Algorithm 1 for all p > 1 applied to the five geometries: clique e-n, clique e-un, star, tv v-un, and tv v-n.
• The existing algorithms: clique e-n for p = 2 (Saito et al., 2018), clique e-un for p = 2 (Rodriguez, 2002), star for p = 2 (Zhou et al., 2006), tv v-un and tv v-n for p = 1 (Hein et al., 2013), and clique e-n-vw for p > 1 (Saito et al., 2018).
Note that a variety of submodular functions can be considered for sub, but we take tv by Hein et al. (2013) as a representative of the sub group.
Experimental Setup. We build a hypergraph using the method for categorical datasets introduced in Zhou et al. (2006). Each instance in the dataset consists of |E| categories. The vertices of the hypergraph are the instances. The edges are defined by the attribute values: each attribute value within a given category defines an edge whose vertices are the instances that share that attribute value. All edges are given weight one. Our experiment is performed on the datasets mushroom, cancer, chess, and congress from the UCI repository (Dua and Graff, 2022), and two datasets created from 20newsgroups 1 (for short "news") with two classes (1,2) and (3,4). All of these were used in the previous studies (Zhou et al., 2006; Hein et al., 2013; Saito et al., 2018). We summarize the datasets in Table 3. The average edge degree is $\sum_{e \in E} |e| / |E|$. Furthermore, $\sum_{e \in E} |e| / (|V||E|)$ is the average ratio of the number of vertices connected by each edge to the total number of vertices, which we regard as the "density" of a hypergraph. In Table 4 we compare 11 instantiations of hypergraph p-Laplacians as discussed above. For clique e-n-vw ( p ∈ [1, 3] ) we conducted experiments using the same setting as Saito et al. (2018), as their setting matches ours for the free parameter p > 1.

Table 4 The experimental results for hypergraph partitioning with our methods and existing ones. We applied Algorithm 1 for p > 1 to five geometries of hypergraph Laplacians (clique e-n, clique e-un, star, tv v-un, tv v-n). We compared these to the existing fixed-p algorithms for the five hypergraph Laplacians: clique e-n for p = 2 by Saito et al. (2018), clique e-un for p = 2 by Rodriguez (2002), star for p = 2 by Zhou et al. (2006), and tv v-un and tv v-n for p = 1 by Hein et al. (2013). Moreover, for clique e-n, we also compared with the algorithm for p > 1 (clique e-n-vw) by Saito et al. (2018). Thus, we compare five instantiations of ours with six existing ones. We compare the performance by error. Performance with ours is marked with #.

For our methods we used p ∈ {1.1, 1.2, … , 3.0}; we limited ourselves to p ≤ 3 since the Cheeger inequality (Theorem 10) is progressively looser for larger p values. For the free-parameter experiments, we set the starting condition of our algorithm to the solution of the corresponding fixed-parameter Laplacian. We used the step size = 0.01‖ ‖ 1 1 /‖Ω‖ 1 1 as done in Luo et al. (2010). For all methods, the second eigenvector was then computed, and we used the k-means objective to determine the "split point" on the eigenvector (as was also done in Zhou et al. (2006) and Saito et al. (2018)). We evaluated the performance of our algorithms via their error rate, i.e., (# of errors)/(# of data), as used in Zhou et al. (2006), Hein et al. (2013), and Saito et al. (2018).
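The hypergraph construction and the two summary statistics described in the setup can be sketched as follows. This is a minimal illustration on a toy categorical table, not the authors' code; the function names `build_hypergraph`, `average_edge_degree`, and `density` and the toy data are assumptions made for the example.

```python
from collections import defaultdict


def build_hypergraph(rows):
    """Build edges from categorical data (names here are illustrative).

    rows: list of equal-length tuples; column j holds category j.
    Every (column, value) pair that occurs defines one edge containing
    all instances (vertices) that take that value in that column.
    """
    groups = defaultdict(set)
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            groups[(j, value)].add(i)
    # Sorting the keys only makes the edge order deterministic; it assumes
    # the values within one column are mutually comparable.
    return [frozenset(groups[key]) for key in sorted(groups)]


def average_edge_degree(edges):
    """Sum of edge sizes divided by the number of edges."""
    return sum(len(e) for e in edges) / len(edges)


def density(edges, n_vertices):
    """Average fraction of all vertices that a single edge connects."""
    return sum(len(e) for e in edges) / (n_vertices * len(edges))


# Toy data: 4 instances, 2 categorical attributes.
rows = [("red", "round"), ("red", "flat"), ("blue", "round"), ("blue", "round")]
edges = build_hypergraph(rows)
print(len(edges), average_edge_degree(edges), density(edges, len(rows)))
```

All edges implicitly carry weight one here, matching the setup; a weighted variant would need an extra weight per edge.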
Overall Results The results are summarized in Table 4. First, comparing our algorithm (Algorithm 1) with the fixed-parameter algorithms (the existing methods; see the corresponding entries in Table 3) for the five geometries, we see that our methods consistently improve on the existing fixed-parameter methods. We also remark that for clique e-n, ours consistently outperforms clique e-n-vw, except on chess.
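The evaluation protocol from the setup (split the second eigenvector at the point that minimizes the two-cluster k-means objective, then report the error rate) can be sketched as below. This is an illustrative sketch rather than the authors' implementation; the exhaustive search over split points and the choice to take the better of the two possible cluster-to-class assignments are assumptions.

```python
import numpy as np


def kmeans_split(vec):
    """Label entries 0/1 by the split of the sorted values that minimizes
    the within-cluster sum of squares (the 2-means objective in 1-D)."""
    vec = np.asarray(vec, dtype=float)
    order = np.argsort(vec)
    sorted_vals = vec[order]
    best_cost, best_k = np.inf, 1
    for k in range(1, len(vec)):  # left cluster = first k sorted entries
        left, right = sorted_vals[:k], sorted_vals[k:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_cost, best_k = cost, k
    labels = np.zeros(len(vec), dtype=int)
    labels[order[best_k:]] = 1
    return labels


def error_rate(pred, truth):
    """(# of errors)/(# of data) for two classes, taking the better of the
    two cluster-to-class assignments (an assumption of this sketch)."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    mismatches = int((pred != truth).sum())
    return min(mismatches, len(truth) - mismatches) / len(truth)


# Hypothetical eigenvector entries and ground-truth classes.
vec = [-0.9, -1.1, 0.1, 1.0, 1.2]
truth = [0, 0, 0, 1, 1]
labels = kmeans_split(vec)
print(labels.tolist(), error_rate(labels, truth))
```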
Further Discussion A natural question to ask about our algorithm is "which hypergraph Laplacian and which p are suitable?". A closer look at our abstract class of p-Laplacians can answer this question: the experimental results reveal how the choice of p and the type of hypergraph Laplacian are connected to two underlying parameters of the datasets, the average edge degree and the density. Although the experiments are preliminary, there seem to be consistent trends that provide guidance on the range of p and the type of Laplacian to consider. Further, this experimental guidance is supported by the theory given earlier in this manuscript.
Our observation is that the density is related to the range of p, while the average edge degree is connected to the choice of hypergraph Laplacian. The density indicates the natural range for p. The dataset chess is significantly denser than the other datasets. Table 4 indicates that large p tends to work better for the chess dataset, whereas small p tends to improve on large p for the non-chess datasets. To understand this, we consider the trade-off between the Cheeger inequality (see Theorem 10) and the p-Dirichlet sum. The Cheeger inequality is tighter for smaller p; hence, the relaxed objective becomes closer to the discrete objective. On the other hand, if we examine the p-Dirichlet sum (see Eq. (8)), we observe that it is the p-th power of a p-norm of the hypergraph-gradient. The dimensionality of the hypergraph-gradient scales with the density. Hence, in the dense case, a relatively larger p is needed to induce the same magnitude of change in the p-Dirichlet sum, which is connected to the second p-eigenvector via the Rayleigh quotient (see Eq. (14)). Analogous phenomena connecting the choice of p to density have been observed for standard graphs, for example in online graph transduction (Herbster and Lever, 2009). Turning to the average edge degree, we observe the following preliminary indications of how to choose the Laplacian as a function of it. On the large datasets (chess and mushroom), all tv methods outperform the star and clique methods of our p-Laplacian, whereas on the other, smaller datasets all star and clique methods outperform all tv methods. We have thus provided some guidance on the choice of Laplacian and the range of p based on the density and average edge degree of the hypergraph. For more detailed experimental results and further discussion, see Appendix 12.
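To make the role of p in the p-Dirichlet sum and the Rayleigh quotient concrete, here is a small numerical sketch. It does not reproduce Eq. (8) or Eq. (14): it assumes a clique-style edge-wise energy with a hypothetical coefficient sqrt(w(e)/(|e| − 1)) on each vertex pair and an unweighted p-norm in the denominator. The function names are also illustrative.

```python
import itertools

import numpy as np


def p_dirichlet_sum(psi, edges, p, weights=None):
    """Edge-wise p-energy under an ASSUMED clique-style gradient:
    sum over edges and vertex pairs of |coeff * (psi[u] - psi[v])|**p."""
    psi = np.asarray(psi, dtype=float)
    total = 0.0
    for idx, e in enumerate(edges):
        if len(e) < 2:
            continue  # a single-vertex edge contributes no differences
        w = 1.0 if weights is None else weights[idx]
        coeff = np.sqrt(w / (len(e) - 1))  # assumed coefficient, not from the source
        for u, v in itertools.combinations(sorted(e), 2):
            total += abs(coeff * (psi[u] - psi[v])) ** p
    return total


def rayleigh_quotient(psi, edges, p):
    """p-energy divided by the (unweighted, assumed) p-norm to the p-th power."""
    psi = np.asarray(psi, dtype=float)
    return p_dirichlet_sum(psi, edges, p) / np.sum(np.abs(psi) ** p)


edges = [(0, 1, 2), (2, 3)]
psi = np.array([1.0, -1.0, 0.5, 2.0])
for p in (1.2, 2.0, 3.0):
    print(p, rayleigh_quotient(psi, edges, p))
```

Under these assumptions the quotient is invariant to rescaling of its argument and vanishes on constant vectors, which is the behavior the spectral discussion relies on.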
We further observe a behavior different from that of semi-supervised learning in (Alamgir and Luxburg, 2011; Slepcev and Thorpe, 2019), which uses the same energy Eq. (8) in the standard graph setting. These works deal with semi-supervised learning using p-Laplacians of a graph with an asymptotically large number of vertices. In that case, the problem does not degenerate into a trivial one when p is large, whereas it does when p is small. In our experimental results, however, we observed a different behavior: small p also works when the density is small, as discussed above. This might be because there is a structural difference in the use of the p-Laplacian between semi-supervised learning and unsupervised learning.

Conclusion
This work has considered hypergraph spectral clustering. We have proposed a general framework for hypergraph p-Laplacian and provided theoretical results for our p-Laplacian. We also have proposed a convergent hypergraph partitioning algorithm with respect to our abstract class of p-Laplacian exploiting theoretical results. Our experiment has shown that our algorithm outperforms the existing spectral clustering algorithms for hypergraph Laplacians. Also, we have shown practical guidance on the choice of p-Laplacian.
There are several future directions. A fruitful direction would be to explore whether our p-Laplacian converges to the continuous p-Laplace operator in the limit of infinite data, similarly to the graph case (Belkin and Niyogi, 2003) and the hypergraph case (Saito, 2022). Moreover, similarly to the previous studies (Saito et al., 2018), semi-supervised learning using S p as a regularizer would be valuable for a future study. Furthermore, while we conducted our experiments on real datasets, it would be interesting to conceive an illustrative toy dataset where some hypergraph Laplacian works better than the others, or where some p works while p = 2 does not. It would also be valuable to study multi-class clustering for arbitrary p using higher-order eigenvectors, similarly to the standard graph 2-Laplacian case (von Luxburg, 2007), as opposed to methods using recursive one-vs-rest two-class partitioning (Bühler and Hein, 2009; Hein et al., 2013). However, unlike for the 2-Laplacian matrix, where higher eigenpairs can be easily obtained, it would be difficult to obtain the third or higher p-eigenpairs of the p-Laplacian. The reason is that while we know the algebraic identification for the second p-eigenpair (Eq. (19)), no such identification is known for the higher eigenpairs in either the discrete or the continuous domain (Lindqvist, 2008).

Discussion on the conditions of definition 1
In this section, we discuss the conditions of the operator ∇ and the function c (u, v, e, ) by drawing examples. We use the examples listed in Table 2, which is mainly discussed in Sect. 3.4.
The first condition forces the operator ∇ to be homogeneous or absolutely homogeneous. For the examples in Table 2, this condition for cliques and star is obviously satisfied since the functions c for cliques and star are independent of . For the total variation, we obtain which therefore satisfies the condition Eq. (23). For sub, we compute and therefore this satisfies the condition. Next, we discuss the second condition, which is The condition Eq. (26) requires the summation of the function over all the pairs of vertices at an edge e to be independent of . For the examples in Table 2, this condition for cliques and star is satisfied since the functions c for cliques and star are independent of . For the total variation, the function c depends on . However, since the function c for total variation can be written as then the summation is Therefore, the summation of c(u, v, e, ) over u, v is independent of although c (u, v, e, ) is dependent on . To see sub, we observe that  which is independent of the order of and the particular vertices u, v.
The third condition is which means the function c (u, v, e, ) is constant once we fix e ∈ E and one vertex in the edge. From this condition, we obtain Similarly, we can prove the condition of u. This implies that the function c works as a coefficient for once we fix v and e, and c is independent of . Note that if c is not differentiable, then we simply change to subdifferentiation as For examples in Table 2, this condition is also obviously satisfied, since the functions c for cliques and star are independent of . Although the function c for total variation depends on , the function c is constant once we fix one vertex and one edge e. Moreover, this implies that the function c satisfies the condition Eq. (31).

Proof of propositions 4 and 5
For the convenience of the other proofs, we start our discussion from Eqs. (4) and (5). Equation (4) can be shown by Equation (5) can be shown by the following. By differentiating S p ( ) by at the vertex v, we obtain = ‖∇ (e)‖ p p = S p ( ).

By using Eq. (36), we consider the following equation.
As the summation in Eq. (37) runs over all vertices v ∈ V , we can reconstruct all pairs of vertices in each edge. Therefore, From this and Eq. (37), we obtain From Eq. (4) and Proposition (5), we can show that

Basic properties for the definition of the hypergraph-gradient
The following basic properties of the hypergraph-gradient easily follow from the definition.

Proposition 15
The hypergraph-gradient has the following properties.
These properties directly follow from the definition of the hypergraph-gradient.

Proof of proposition 7
By differentiating Eq. (14) by , we can obtain the condition for critical points of Eq. (14) as follows; By Definition 6, we can immediately show that is an eigenvector of Δ p . Moreover the eigenvalue can be obtained by S p ( )∕‖ ‖ p p . The last statement can be shown immediately by the definition.
By the definition of Rayleigh quotient, we immediately have the following property.
For the first p-eigenvector, we compute p-Laplacian by differentiating by , that is Then, we obtain From Eq. (43), the derivative of hypergraph-gradient is independent of . Therefore, from Eq. (46), the p-Laplacian Δ p equals to 0 if ∇ (e) = 0, ∀e ∈ E . This means that Δ p = 0 . Also, Δ p cM 1∕p = 0.
(41) (∇cM 1∕p ) (e) = , ∀e ∈ E, As the p-eigenvalue is equal or greater than 0 from Proposition 7, the first p-eigenvector is M 1∕p , associated with the first p-eigenvalue 0.
The following corollary follows.

Corollary 17
Proof Similarly to the proof of p-eigenvector, we compute As the second derivative of hypergraph-gradient is 0 from Eq. (43),

Properties of p-Eigenpair of p-Laplacian
In this section, we remark on some interesting properties of the p-eigenpairs of the p-Laplacian.
We first remark that the definition of a p-eigenpair (Definition 6) implies the existence of an infinite number of p-eigenpairs, similarly to the continuous case (Binding and Rynne, 2008).
We now turn to the multiplicity of the first p-eigenvalue.

Proposition 18 Suppose that hypergraph G is a union of k independent and connected hypergraphs
Then, k equals the multiplicity of the eigenvalue 0 of Δ p .
The following corollary follows from this proposition.

Corollary 19
The second p-eigenvalue of Δ p is greater than 0, if a hypergraph G is connected.
To analyze the critical points of Eq. (14), index theory (Struwe, 2000) is useful. We use the Krasnoselskii genus (Struwe, 2000) of a set A, defined as $\gamma(A) := 0$ if $A = \emptyset$, $\gamma(A) := \inf\{k \in \mathbb{Z}^{+} \mid \exists \text{ odd continuous } h : A \to \mathbb{R}^{k} \setminus \{0\}\}$, and $\gamma(A) := \infty$ when no such h exists for any $k \in \mathbb{Z}^{+}$. In this context, the genus generalizes the concept of the dimension of A. Since the Rayleigh quotient R p is invariant under rescaling of its argument by Corollary 16, to consider the p-eigenpairs of the p-Laplacian we can limit our interest to the unit sphere $S_p := \{\psi \mid \|\psi\|_p^p = 1\}$. From the results in the discrete case (Chang, 2016; Tudisco and Hein, 2016; Li and Milenkovic, 2018) and the continuous case (Lindqvist, 2008), we obtain the following proposition, which is a generalized Rayleigh min-max theorem.

Proposition 20 Consider the set of subsets
The sequence defined as gives a critical point of R p ( ) , whose corresponding and k constitute a p-eigenpair of Δ p .
Similarly to the Rayleigh min-max theorem, this proposition yields a sequence of p-eigenpairs. Moreover, for the standard Laplacian of a standard graph, this reduces to the Rayleigh min-max theorem. However, similarly to the continuous p-Laplacian theory (Lindqvist, 2008), we do not know whether this sequence exhausts all p-eigenpairs.

Proof of proposition 18
As we observe R p ( ) ≥ 0 from the definition, all the p-eigenvalues are non-negative. We denote by G i a vector whose length is the number of vertices of G, with entries 1 for the vertices belonging to G i and 0 elsewhere. Using this notation, we show that Δ p (cM −1∕p G i ) = 0 for all i = 1, … , k , which shows that those vectors are p-eigenvectors whose corresponding p-eigenvalues are 0. From the definition of the p-Laplacian, those are the only p-eigenvectors whose p-eigenvalues are 0. The above shows that the multiplicity of the p-eigenvalue 0 equals the number of independent components.
Next, we introduce the classical Lusternik-Schnirelmann theorem.
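Proposition 18 ties the multiplicity of the p-eigenvalue 0 to the number of connected components, so the quantity k can be checked directly on a given hypergraph. The sketch below counts components with a union-find structure; the function name `hypergraph_components` is illustrative, and the sketch assumes every vertex belongs to at least one edge (an isolated vertex would be reported as its own component).

```python
def hypergraph_components(n_vertices, edges):
    """Return the connected components of a hypergraph as sorted vertex lists.

    Two vertices are connected if a chain of overlapping edges joins them.
    """
    parent = list(range(n_vertices))

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for e in edges:
        members = list(e)
        for v in members[1:]:
            root_a, root_b = find(members[0]), find(v)
            if root_a != root_b:
                parent[root_b] = root_a

    comps = {}
    for v in range(n_vertices):
        comps.setdefault(find(v), []).append(v)
    return sorted(comps.values())


edges = [(0, 1, 2), (2, 3), (4, 5)]
components = hypergraph_components(6, edges)
print(len(components), components)
```

Each component's indicator vector is constant on every edge (an edge lies either entirely inside or entirely outside a component), which is why its gradient, and hence its energy, vanishes.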
Theorem 23 [Struwe (2000)] Suppose function g ∶ S p → ℝ is a locally Lipschitz, then yields a sequence of critical values of g.
By applying Theorem 23 to R p , we can show that the sequence in Eq. (20) yields critical values of R p , which are p-eigenvalues of the p-Laplacian.

Discussion on table 2
We discuss how Table 2 connects to the existing Laplacians, splitting the discussion into clique, star, and total variation Laplacians.
We note that all the functions in Table 2 satisfy the conditions of Definition 1, as discussed in Sect. A.

Clique Laplacians
The hypergraph-gradient for edge-normalized Laplacian is given as and edge-unnormalized Laplacian can be written as To show that the function c for clique can induce existing clique Laplacian, we only consider when p = 2 , and (v) = d (v) . The following proposition directly shows that Saito's 2-Laplacian is our 2-Laplacian for clique setting. Saito et al. (2018) as

Proposition 24 Let e be
The induced 2-Laplacian corresponds to the 2-Laplacian proposed by Saito et al. (2018). If we choose the same hypergraph-gradient but omit the denominator $\sqrt{|e| - 1}$, then the induced 2-Laplacian corresponds to Rodriguez's Laplacian. This also shows that Rodriguez's 2-Laplacian is our edge-unnormalized clique 2-Laplacian.
The 2-Laplacians L are given as where for an edge-normalized setting W is a matrix whose element is w(u, v) = ∑ uv∈e w(e) ∕ (|e| − 1) and D = D v and for an edge-unnormalized setting w(u, v) = ∑ u,v∈e w(e) and D is a diagonal matrix whose element d(v, v) = ∑ v∈e w(e). In Saito et al. (2018), they used the different energy setup for p-Laplacian, as discussed in Sect. 6. However, when p = 2 , Saito's 2-Laplacian matches our clique normalized 2-Laplacian. Actually, in the case of p = 2 , we obtain Therefore, given Saito et al. (2018) has the same structure of different geometry, we can directly apply their proof to our setting in the case of p = 2.
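The clique-expansion matrices W and D described above can be assembled as in the following sketch, which follows the stated entries w(u, v) and d(v, v) for both the edge-normalized and the edge-unnormalized settings. The function name `clique_expansion` is illustrative. Because the formula for L itself is not legible here, the final combination D − W is shown only as one common form and is an assumption.

```python
import itertools

import numpy as np


def clique_expansion(n_vertices, edges, weights=None, edge_normalized=True):
    """Return (W, D) for the clique expansion of a hypergraph.

    W[u, v] sums w(e) / (|e| - 1) over edges containing u and v
    (edge-normalized), or w(e) (edge-unnormalized).
    D is diagonal with D[v, v] equal to the sum of w(e) over edges containing v.
    """
    W = np.zeros((n_vertices, n_vertices))
    d = np.zeros(n_vertices)
    for idx, e in enumerate(edges):
        w = 1.0 if weights is None else weights[idx]
        members = sorted(e)
        for u in members:
            d[u] += w
        if len(members) < 2:
            continue  # no vertex pairs, and avoids dividing by zero
        scale = w / (len(members) - 1) if edge_normalized else w
        for u, v in itertools.permutations(members, 2):
            W[u, v] += scale
    return W, np.diag(d)


edges = [(0, 1, 2), (2, 3)]
W, D = clique_expansion(4, edges, edge_normalized=True)
W_un, D_un = clique_expansion(4, edges, edge_normalized=False)

# Assumed combination (not taken from the source): an unnormalized Laplacian.
L = D - W
print(np.round(L, 3))
```

In the edge-normalized setting the row sums of W equal the vertex degrees on the diagonal of D, so D − W annihilates constant vectors; this does not hold for the edge-unnormalized W with the same D.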

Star Laplacian
The given hypergraph-gradient for star Laplacian can be written as Here, we show that this hypergraph 2-Laplacian also can be seen from the same framework. Similarly to the clique Laplacians, the following proposition follows. We can compute the 2-Laplacian in the same manner as Saito et al. (2018), with a slight change of the denominator of the hypergraph-gradient from $\sqrt{|e| - 1}$ to $\sqrt{|e|}$. The 2-Laplacian induced from the hypergraph-gradient Eq. (57) can be computed as where W s is a matrix whose element is $w_s(u, v) = \sum_{u, v \in e} w(e) / |e|$. We can show that Eq. (58) satisfies the condition of a Laplacian, in the same manner as the proof for Proposition 9 in Saito et al. (2018).
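For the star setting, the matrix W s with entries w s (u, v) can be formed from the vertex-edge incidence matrix. The sketch below also builds the normalized Laplacian in the form known from Zhou et al. (2006), I − D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}. Presenting it in this form is an assumption about what Eq. (58) contains, and the function name `star_laplacian` is illustrative. The sketch requires every vertex to lie in at least one edge.

```python
import numpy as np


def star_laplacian(n_vertices, edges, weights=None):
    """Return (W_s, L) where W_s[u, v] sums w(e) / |e| over edges containing
    u and v, and L = I - D_v^{-1/2} W_s D_v^{-1/2}."""
    m = len(edges)
    w = np.ones(m) if weights is None else np.asarray(weights, dtype=float)
    H = np.zeros((n_vertices, m))  # incidence matrix: H[v, j] = 1 if v in edge j
    for j, e in enumerate(edges):
        H[list(e), j] = 1.0
    edge_size = H.sum(axis=0)      # |e| for each edge
    d_v = H @ w                    # vertex degrees (must all be positive)
    W_s = H @ np.diag(w / edge_size) @ H.T
    d_inv_sqrt = 1.0 / np.sqrt(d_v)
    L = np.eye(n_vertices) - d_inv_sqrt[:, None] * W_s * d_inv_sqrt[None, :]
    return W_s, L


edges = [(0, 1, 2), (2, 3)]
W_s, L = star_laplacian(4, edges)
print(np.round(L, 3))
```

With this form, the vector of square-rooted vertex degrees lies in the null space of L and L is positive semidefinite, which are the usual sanity checks for a normalized Laplacian.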

Proposition 25
which ends the proof. ◻ We move to prove the following lemma.

Lemma 28 Denote , be a p-eigenpair of p-Laplacian
From the definition of nodal domains, each edge e intersects at most two nodal domains with different signs. Therefore, | e∩A i = | e for any e ∈ E and for any nodal domain A i , and | e∩A i = sgn( i )y| e for any e ∈ E and for any nodal domain A i . We divide edges into two classes according to the number of nodal domains intersected by each edge as follows.
Note that E 1 ∪ E 2 = E . Then, since ∇ (e) = ∇ | A i (e) if A i ∩ e = � and ∇ (e) = 0 for those i such that A i ∩ e = � and simpler version of Hölder's inequality From Eqs. (69) and (70), we have The last equality follows from Lemma 3.7 in Tudisco and Hein (2016). ◻ Now we prove Theorem 9. Suppose that k has multiplicity r and associated eigenvector . As functions | A 1 , … , | A m are linear independent of the definition of a nodal domain, From Eq. (72) m ≥ k and m ≥ k + r − 1 . This concludes the proof.

Proof of theorem 10
We begin our discussion by the following lemma.

Lemma 29 Let
This concludes the proof. ◻ By combining Lemma 30 and Lemma 31, we obtain a stronger statement than Corollary 11.

Proof of theorem 13
By definition, for all v ∈ V where and are eigenvalues associated with and , respectively. Then, for all v ∈ V we obtain By summing up over all v ∈ V and taking difference of both side of Eqs. (92) and (93), we compute By applying Taylor expansion at a Δ p M 1∕p to Δ p , at a Δ p M 1∕p to Δ p at a ( ) to p ( ) , and at a ( ) to p ( ) in right hand side of Eq. (94), we obtain (88) As we have 2 ≥ inf R p ( ) and 2 ≤ inf R p ( ) , we obtain 2 = R p ( ).

Additional experimental results
We first mention that our experiments were run on a Mac mini with an Intel i7 and 32 GiB of RAM. Table 5 shows the detailed results of the hypergraph partitioning experiment. This is further evidence of a trade-off between the Cheeger inequality and the natural intuition about p from Eq. (8), discussed in Sect. 7. However, the difference in performance between the best p and the others varies; sometimes the contribution of p is small, while sometimes p makes a large difference. We also see further evidence that e-n-vw outperforms the other methods on chess. On the other datasets, all tv methods outperform the star and clique methods on the large dataset (mushroom), whereas on the smaller datasets all star and clique methods outperform all tv methods.
Author contributions SS conceived the framework, conducted theoretical analyses, and carried out the experiments. SS and MH wrote the manuscript.
Funding This research has been made possible thanks partially to Huawei, who are generously supporting Shota Saito's PhD study at University College London.

Data availability
We used publicly available data, and we cite them appropriately.
Code availability Part of our code depends on the others' packages, which are not available to publish due to copyright reasons. We will endeavor to reduce the dependency on these restricted packages by the time of publication. Also, full code is available upon request to the corresponding author.

Conflict of interest The authors have declared that no competing interests exist.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.