1 Introduction

Graphs are one of the most widely used representations for structured data, such as social networks (Newman, 2006), web pages (Brin and Page, 1998) and images (Shi and Malik, 1997). The graph Laplacian is a linear operator that characterizes the graph. A natural discrete optimization problem, whose solution characterizes a balanced clustering, is solved in its relaxed form by finding the second eigenvector of the graph Laplacian (Fiedler, 1973; Alon and Milman, 1985; Hagen and Kahng, 1992; Shi and Malik, 1997; von Luxburg, 2007). The Laplacian has previously been generalized to a nonlinear p-Laplacian in the context of machine learning. Improved performance as a function of p has been demonstrated in various special cases (Bühler and Hein, 2009; Bougleux et al., 2009). The p-Laplacian has been motivated from the perspective of differential geometry (Zhou and Schölkopf, 2005), as well as from a Cheeger inequality perspective (Bühler and Hein, 2009).

Hypergraphs generalize graphs and serve as a natural representation of multi-relational data. The hypergraph representation has been used to model videos (Huang et al., 2009), web browsing histories (Mobasher et al., 2000), recommender engines (Gharahighehi et al., 2021) and cell molecules (Klamt et al., 2009). However, although a hypergraph is a natural data representation, the generalization from the graph Laplacian to a hypergraph Laplacian is not straightforward. Thus, multiple such generalizations have been proposed in the literature (Agarwal et al., 2006; Saito et al., 2018; Hein et al., 2013; Li and Milenkovic, 2018). Furthermore, the extension from the Laplacian to the p-Laplacian may take different forms (Saito et al., 2018; Li and Milenkovic, 2018). However, as shown in Table 1, although these Laplacians have a similar structure, some of them lack key features. The objective of this work is to construct a theoretical structure that brings these similar but disparate models into one unified framework.

In our unified framework, we define an abstract class of hypergraph p-Laplacians that incorporates a number of previously proposed hypergraph p-Laplacians as well as previously unstudied novel hypergraph p-Laplacians. This framework builds on a limited special case previously proposed in Saito et al. (2018). The overall framework is inspired by a differential-geometric analogy from the continuous to the discrete domain. Exploiting the differential-geometric connection, we provide a generalized nodal domain theorem (see Theorem 9) and a generalized Cheeger inequality (see Theorem 10 and Corollary 11). These provide a theoretical justification and bounds for using the eigenvectors of a hypergraph p-Laplacian to perform partitioning. Exploiting these theoretical results, we provide an algorithm for finding an approximation to the second eigenvector. We provide an empirical study of this algorithm which shows that our algorithm outperforms a variety of existing hypergraph p-Laplacian based methods.

We highlight five salient contributions of this work.

  1. From a differential-geometric perspective, we define an abstract class of p-Laplacians of hypergraphs that can incorporate previously proposed p-Laplacians as well as novel unstudied p-Laplacians.

  2. We provide theoretical results for our abstract class of p-Laplacians, such as the nodal domain theorem, the Cheeger inequality, and a bound on the relationship between the minimum Cheeger cut and the second p-eigenvalue of the p-Laplacian.

  3. Exploiting these theoretical results, we propose a convergent hypergraph partitioning algorithm with respect to our abstract class of hypergraph p-Laplacians.

  4. We demonstrate empirically that our method can improve the performance of the existing p-Laplacians.

  5. Based on our theoretical and empirical observations, we provide guidance on the choice of p-Laplacian.

We remark that in the literature, there are a number of special-case results (Zhou et al., 2006; Agarwal et al., 2006; Saito et al., 2018; Hein et al., 2013). These prior results derive a patchwork of nodal domain theorems, Cheeger inequalities, and partitioning algorithms for particular cases of hypergraph p-Laplacians, as shown in Table 1. The advantage of the approach here is that we define an abstract class of hypergraph p-Laplacians, and both our theory and our partitioning algorithm apply to the complete class. Finally, we provide guidance on how to select a particular value of p for hypergraph p-Laplacians. All proofs are given in the appendices.

Table 1 Comparison of existing methods and ours. Star is studied in Zhou et al. (2006), unnormalized clique in Rodriguez (2002), and edge-normalized clique in Saito et al. (2018)

2 Preliminaries

This section reviews the notions of hypergraph and Cheeger inequality.

2.1 Hypergraph notions

We begin with standard definitions and notations for a hypergraph. A hypergraph is a tuple \(G\) \(=\) \((V\),E,\({\mathbf {w}}\),\(\varvec{\mu })\), where V is a vertex set, E is an edge set, \({\mathbf {w}}\) is a vector \(\{w(e)\}_{e \in E}\), where \(w\) \(:\) \(E\) \(\rightarrow\) \({\mathbb {R}}^{+}\) maps each edge to an edge weight, and \(\varvec{\mu }\) is a vector \(\{\mu (v)\}_{v \in V}\), where \(\mu\) \(:\) \(V\) \(\rightarrow\) \({\mathbb {R}}^{+}\) maps each vertex to a vertex weight. The edge set E is a subset of all possible ordered tuples of vertices, i.e., \(E\) \(\subseteq\) \(\cup _{k=1}^{\vert V \vert }\) \(\cup _{ \{v_1,\ldots ,v_k\} \subseteq V}\) \(\{ [v_{\sigma (1)},\ldots ,v_{\sigma (k)} ]\) \(\mid\) \(\sigma\) \(\in\) \({\mathcal {P}}_{k}\}\), where \({\mathcal {P}}_{k}\) denotes the set of permutations \(\sigma\) on \(\{1,\ldots ,k\}\). A hypergraph is connected if there is a path between every pair of vertices. A hypergraph is undirected when the edge set is symmetric; defining a relation \(\sim\) between two edges as \([v_{\sigma (1)},\ldots ,v_{\sigma (k)}]\sim [v_{\sigma '(1)},\ldots ,v_{\sigma '(k)}]\), where \(\sigma ,\sigma '\in {\mathcal {P}}_{k} (k=1,\ldots ,n)\), we denote the set of undirected edges by \(E_{u}\) \(=\) \(E\) \(/\) \(\sim\). In what follows, we assume that the hypergraph G is connected and undirected unless otherwise noted. We define the degree of a vertex \(v\) \(\in\) \(V\) as \(d(v)\) \(=\) \(\sum _{e \in E: v \in e} w(e)\), while the degree of an edge \(e\) \(\in\) \(E\) is defined as \(\vert e \vert\). To represent a hypergraph conveniently, we define various matrices. The degree matrices \(D_{v}\) and \(D_{e}\) are diagonal matrices whose diagonal elements are the degrees of the vertices and edges, respectively. Let \(W_e\) be a diagonal \(\vert E \vert\) \(\times\) \(\vert E \vert\) matrix whose diagonal elements are the edge weights w(e).
We denote by H the \(\vert V \vert\) \(\times\) \(\vert E_{u} \vert\) incidence matrix, whose element \(h(v,e)\) \(=\) \(1\) if vertex v belongs to the edge e, and 0 otherwise. For more details, see Berge (1984).
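As a concrete illustration of these matrices, the following minimal sketch (in Python with NumPy; the toy hypergraph and all variable names are our own, purely for illustration) builds \(H\), \(W_e\), \(D_e\) and \(D_v\) for a small hypergraph:

```python
import numpy as np

# A toy hypergraph on 4 vertices with 2 hyperedges (illustrative only).
V = [0, 1, 2, 3]
E = [(0, 1, 2), (2, 3)]                 # hyperedges as vertex tuples
w_vals = [1.0, 2.0]                     # edge weights w(e)

# Incidence matrix H: h(v, e) = 1 iff vertex v belongs to edge e.
H = np.zeros((len(V), len(E)))
for j, e in enumerate(E):
    for v in e:
        H[v, j] = 1.0

W_e = np.diag(w_vals)                   # |E| x |E| diagonal edge-weight matrix
D_e = np.diag([len(e) for e in E])      # edge degrees |e|
d = H @ np.array(w_vals)                # vertex degrees d(v) = sum_{e : v in e} w(e)
D_v = np.diag(d)
```

Note that the vertex-degree vector is exactly \(H \mathbf{w}\), since each edge containing v contributes its weight to d(v).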

2.2 Cheeger inequality

This section reviews the Cheeger inequality for the standard 2-Laplacian case. Influenced by the inequality on the eigenvalues of the continuous Laplacian known as the Cheeger inequality, there is a line of research on Cheeger inequalities in the discrete domain (Alon and Milman, 1985; Lee et al., 2014; Tudisco and Hein, 2016). This inequality shows a connection between the eigenproblem of the Laplacian and a graph cut called the Cheeger cut (Alon and Milman, 1985). This inequality motivates us to use eigenvectors for partitioning in the following way. While the discrete optimization problem of finding a subset of V that minimizes the Cheeger cut is NP-hard (von Luxburg, 2007), the eigenproblem of the graph Laplacian can be solved efficiently. Since the Cheeger inequality bounds the optimal Cheeger cut in terms of the eigenvalues of the graph Laplacian, it guarantees the quality of the solution obtained from the eigenproblem relative to the optimum of the original cut problem. This guarantee enables us to use eigenvectors obtained from the less computationally expensive eigenproblem instead of solving the costly discrete cut problem. In other words, the Cheeger inequality “connects” the Cheeger cut and the eigenproblem; the Cheeger inequality quantifies how well we approximate the original graph cut problem by relaxing it into the real-valued eigenproblem of the Laplacian.

We observe the “connection” as follows. Let \(A\) \(\subset\) \(V\) be a set and \({\overline{A}}\) be the complement of A. The Cheeger cut may be defined as

$$\begin{aligned}&C(A) := \frac{ \partial V(A,{\overline{A}})}{\min (\mathrm {vol}(A),\mathrm {vol}({\overline{A}}))}, \mathrm {\ where\ }\partial V(A,{\overline{A}}) {:}{=}\sum _{u \in A, v \in {\overline{A}} } w(\{u,v\}), \end{aligned}$$

where \(\mathrm {vol}(A)\) \(=\) \(\sum _{v\in A}d(v)\). The Cheeger constant \(h_2\) \({:}{=}\) \(\min _{A \subset V}\) \(C(A)\) is the value of the optimal cut. The Cheeger inequality shows a connection between the second eigenvalue \(\lambda _2\) of the Laplacian and the Cheeger constant as

$$\begin{aligned} h_2 \le \sqrt{2 \lambda _2} \le 2 \sqrt{h_2}. \end{aligned}$$
(1)

Equation (1) shows how well we approximate the Cheeger constant by relaxing the original discrete cut problem into the real-valued eigenproblem of the graph Laplacian. The Cheeger inequality guarantees the performance of the cut resulting from algorithms using the second eigenvector of the Laplacian as follows. Let \((B, {\overline{B}})\) be the cut found from the second eigenvector \(\psi\) of the Laplacian, of the form \((\{ v : \psi (v) \ge t\},\{ v : \psi (v) < t\})\) with the threshold t chosen to minimize the Cheeger cut. Chung (2007) showed that \(C(B)\) \(\le\) \(\sqrt{2 \lambda _2}\). Therefore, by the upper bound of Eq. (1) we observe

$$\begin{aligned} C(B)<2\sqrt{h_2}, \end{aligned}$$
(2)

which gives a guarantee for the worst case of performance of spectral clustering. This motivates us to use spectral methods for graph partitioning problems.
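The sweep-cut procedure just described can be sketched numerically. The example below (NumPy; the toy graph and all names are our own) thresholds the second eigenvector of the normalized Laplacian at every value it takes, and checks the guarantee of Eq. (2) against a brute-force Cheeger constant:

```python
import itertools
import numpy as np

# Two triangles joined by a weak bridge; the best Cheeger cut severs the bridge.
n = 6
A = np.zeros((n, n))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[u, v] = A[v, u] = 1.0
A[2, 3] = A[3, 2] = 0.1                  # bridge edge

d = A.sum(axis=1)
L = np.eye(n) - np.diag(d**-0.5) @ A @ np.diag(d**-0.5)   # normalized Laplacian

def cheeger_cut(S):
    Sb = [v for v in range(n) if v not in S]
    cut = sum(A[u, v] for u in S for v in Sb)
    return cut / min(d[list(S)].sum(), d[Sb].sum())

# Sweep cut: threshold the second eigenvector at each value it takes.
vals, vecs = np.linalg.eigh(L)
psi = vecs[:, 1] / np.sqrt(d)            # undo the D^{1/2} scaling
C_B = min(cheeger_cut([v for v in range(n) if psi[v] >= t])
          for t in sorted(set(psi))[1:])  # skip smallest value: proper subsets only

# Brute-force Cheeger constant h2 over all nontrivial vertex subsets.
h2 = min(cheeger_cut(list(S))
         for r in range(1, n)
         for S in itertools.combinations(range(n), r))

# Eq. (2): the sweep cut is within a factor 2 sqrt(h2) of optimal.
assert C_B <= 2 * np.sqrt(h2) + 1e-12
```

The complement symmetry \(C(A)=C(\overline{A})\) means the arbitrary sign of the eigenvector does not affect the sweep.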

3 Hypergraph p-Laplacian

This section defines and discusses a hypergraph p-Laplacian and associated p-eigenpairs.

3.1 Differential operators: gradient \(\nabla\), divergence \(\mathrm {div}\) and p-Laplacian \(\Delta _{p}\)

In this section, we aim to extend the differential operators proposed in Saito et al. (2018) to an abstract class of p-Laplacians. We first introduce the following two inner product spaces \({\mathcal {H}}(V)\) and \({\mathcal {H}}(E)\) of real-valued functions over the vertex set and the (directed) edge set, respectively,

$$\begin{aligned} \langle f,g \rangle _{{\mathcal {H}}(V)}&{:}{=}\sum _{v \in V} f(v)g(v) \end{aligned}$$
(3)
$$\begin{aligned} \langle r,s \rangle _{{\mathcal {H}}(E)}&{:}{=}\sum _{ e \in E} \frac{r(e)s(e)}{\vert e \vert !}. \end{aligned}$$
(4)

We next define three operators on these spaces: the gradient \(\nabla\):\({\mathcal {H}}(V)\) \(\rightarrow\) \({\mathcal {H}}(E)\), the divergence \(\mathrm {div}\):\({\mathcal {H}}(E)\) \(\rightarrow\) \({\mathcal {H}}(V)\), and the p-Laplacian \(\Delta _{p}\):\({\mathcal {H}}(V)\) \(\rightarrow\) \({\mathcal {H}}(V)\). These operators are discrete geometric analogs of the corresponding operators in continuous differential geometry. In the continuous domain, for a twice-differentiable function f, the p-Laplace operator is defined as \(\Delta _{p}^{(c)}f:=\mathrm {div}^{(c)}(\Vert \nabla ^{(c)} f \Vert ^{p-2}\nabla ^{(c)} f)\), where the operators superscripted by (c) are the standard ones from continuous calculus. In the following, we establish a differential-geometric framework in a generalized discrete setting, analogous to the continuous one, to define an abstract class of p-Laplacians. The operators \(\mathrm {div}\) and \(\Delta _{p}\) were introduced in the graph setting (Zhou and Schölkopf, 2005; Grady, 2006) and generalized to the hypergraph setting in Saito et al. (2018), whereas a similar formulation of \(\nabla\) was given in the graph and hypergraph settings (Zhou and Schölkopf, 2005; Saito et al., 2018). The definition that we propose below broadly generalizes all previous definitions. We define it and discuss its interpretation below.

We propose to define the hypergraph-gradient as follows. The definition below is the generalization of the definition of gradient over hypergraphs proposed in Saito et al. (2018).

Definition 1

Let \(\nabla\) be an operator \(\nabla : {\mathcal {H}}(V) \rightarrow {\mathcal {H}} (E)\). A hypergraph-gradient \(\nabla\) is

$$\begin{aligned} ( \nabla \psi ) (e) {:}{=}\sum _{u,v \in e} w^{\frac{1}{p}} (e) c^{\frac{1}{p}}(u,v,e, \psi )\left( \frac{\psi (u)}{\mu ^{1/p}(u)} - \frac{\psi (v)}{\mu ^{1/p}(v)} \right) , \end{aligned}$$

where the operator \(\nabla\) and the function \(c(u,v,e,\psi )\) satisfy the following three conditions for all \(e \in E\) and vertices \(u,v \in e\):

$$\begin{aligned} \vert \nabla (\alpha \psi ) \vert&= \vert \alpha \nabla \psi \vert \end{aligned}$$
(5)
$$\begin{aligned} \sum _{u,v \in e}c(u,v,e,\psi )&=c(e) \end{aligned}$$
(6)
$$\begin{aligned} \left. \frac{\partial c^{1/p}(u,v,e,\psi )}{\partial \psi } \right| _{v}=0,\forall u \in e \quad&\mathrm {and} \quad \left. \frac{\partial c^{1/p}(u,v,e,\psi )}{\partial \psi } \right| _{u}=0, \forall v \in e \end{aligned}$$
(7)

This hypergraph-gradient can be intuitively interpreted as follows. The term \(\psi (u)/\mu ^{1/p}(u) - \psi (v)/\mu ^{1/p}(v)\) can be interpreted as the “roughness” (normalized by \(\varvec{\mu }\)) between two vertices. The hypergraph-gradient sums this term over all pairs of vertices in the edge e. Hence, the hypergraph-gradient can be intuitively seen as the roughness within one edge, similarly to the continuous gradient \(\nabla ^{(c)}\).

The definition of the hypergraph-gradient depends on a “weighting” function \(c(u,v,e,\psi )\). This weighting function can be seen as a coefficient of the difference between every pair of vertices. Varying \(c(u,v,e,\psi )\) allows us to model different types of hypergraph expansions, including but not limited to the star (Zhou et al., 2006) or clique expansions (Saito et al., 2018) (see Table 2 for details), i.e., the function c is what makes our generalized p-Laplacian framework abstract.

We make a few remarks on the equations in Definition 1. First, the gradient operator \(\nabla\) and the function c must satisfy the three conditions given as Eqs. (5), (6) and (7). Equation (5) requires the operator \(\nabla\) to be either homogeneous or absolutely homogeneous. Equation (6) requires that the sum of the function over all pairs of vertices in an edge e is independent of \(\psi\). Equation (7) enforces that the function \(c^{1/p}\) is independent of \(\psi\) once we fix one edge and one vertex in the edge. In the following, when c is not differentiable, we consider the subdifferential instead of the derivative. For a more detailed discussion of these conditions, see Appendix 1. Second, we normalize \(\psi\) by the vertex weights \(\varvec{\mu }\). We call the vertex weights unnormalized when \(\mu (v)\) \(=\) \(1\) and normalized when \(\mu (v)\) \(=\) \(d(v)\) \(,\forall v \in V\). We observe that existing unnormalized p-Laplacians such as Hein et al. (2013) correspond to \(\mu (v)\) \(=\) \(1\), and normalized 2-Laplacians (Zhou et al., 2006; Saito et al., 2018) to \(\mu (v)\) \(=\) \(d(v)\).

The following definition of a divergence operator is inspired by an analogy to the continuous setting.

Definition 2

A hypergraph divergence is an operator \(\mathrm {div}: {\mathcal {H}}(E) \rightarrow {\mathcal {H}}(V)\) which satisfies \(\forall \psi \in {\mathcal {H}}(V),\forall \phi \in {\mathcal {H}}(E)\) \(\langle \phi , \nabla \psi \rangle _{{\mathcal {H}}(E)} = \langle \psi , -\mathrm {div} \phi \rangle _{{\mathcal {H}}(V)}\).

Note that Definition 2 is an analog of the continuous Stokes' theorem. Also, one can check that div is unique. Intuitively, the divergence counts the net flow defined by \(\phi\) at a vertex, similarly to the intuition in the continuous domain.

Finally, we propose to define the p-Laplacian.

Definition 3

An operator \(\Delta _{p}\):\({\mathcal {H}}(V)\) \(\rightarrow\) \({\mathcal {H}}(V)\) is a hypergraph p-Laplacian if \(\Delta _p\) \(\psi\) \({:}{=}\) \(-\mathrm {div}\) \((\vert \nabla \psi \vert ^{p-2} \nabla \psi )\).

3.2 p-Dirichlet sum and p-Laplacian

Table 2 The relationship between the function \(c(u,v,e,\psi )\) in the hypergraph-gradient and the resulting Laplacians. We denote by \(e_{[1]}\) the first vertex of an edge e. Also, \(F:2^{\vert e \vert } \rightarrow [0,1]\) is a submodular function, and we rearrange the vertices \(v_{i}\) so that \(\psi (v_{\vert e \vert }) \ge \ldots \ge \psi (v_{1})\). See Sect. 3.4 for the details and all the notations

This section defines the p-Dirichlet sum, which can be interpreted as energy over the hypergraph. Also, we discuss relations between the p-Dirichlet sum and the p-Laplacian. Lastly, we discuss how these relations are the foundation of graph partitioning.

Using the norm induced by the inner product in Eq. (4), we define the p-Dirichlet sum of \(\psi\) \(\in\) \({\mathcal {H}}(V)\) as

$$\begin{aligned} S_{p}(\psi ) {:}{=}\Vert \nabla \psi \Vert _{p}^{p} = \sum _{e \in E} \frac{\vert \nabla \psi (e)\vert ^p}{\vert e \vert !}, \end{aligned}$$
(8)

which measures the roughness of \(\psi\) over the hypergraph. Hence, it is natural to interpret the p-Dirichlet sum as an energy over the hypergraph. Later, this energy will serve as the objective function of hypergraph partitioning.

For the p-Dirichlet sum and p-Laplacian, the following relationships hold;

Proposition 4

\(S_{p}(\psi ) = \langle \psi , \Delta _{p}\psi \rangle _{{\mathcal {H}}(V)}\).

Proposition 5

\(\left. \dfrac{\partial S_{p}(\psi )}{\partial \psi } \right| _{v} = p \Delta _{p} \psi (v).\)

These relations are important both in the continuous and discrete domains. In the continuous domain, the analog of these relations is fundamental for an important result on the p-Laplacian, the Dirichlet principle (Courant and Hilbert, 1962): the Dirichlet energy is minimized when the Laplace equation is satisfied. For clustering in the discrete domain, we minimize the p-Dirichlet sum. To do so, we consider a problem similar to the Laplace equation, namely the eigenproblem of the Laplacian. This is how we see an analogy between continuous differential geometry and this discrete geometry. In the following, we illustrate this with the example of the standard graph 2-Laplacian. Let L be the standard normalized graph 2-Laplacian \(L\) \(=\) \(D^{-1/2}(D-A)D^{-1/2}\), where A is an adjacency matrix and D is a diagonal degree matrix. This Laplacian is recovered in our framework if we set the hypergraph-gradient as

$$\begin{aligned} (\nabla \psi ) (\{u,v\}) = w^{1/2}(\{u,v\})\left( \frac{\psi (u)}{\sqrt{d(u)}} - \frac{\psi (v)}{\sqrt{d(v)}} \right) . \end{aligned}$$
(9)

Note that we put \(\varvec{\mu } = \{ d(v) \}_{v \in V}\). We then obtain

$$\begin{aligned} \Delta _2 \psi (v) = \psi (v) - \sum _{u:\{u,v\} \in E} \frac{w(\{u,v\})\psi (u)}{d^{1/2}(u)d^{1/2}(v)}, \end{aligned}$$
(10)

which corresponds to \(L\psi\). From Eq. (8), the energy is defined as

$$\begin{aligned} S_2(\psi )&= \Vert \nabla \psi \Vert _2^2 = \sum _{\{u,v\} \in E} w(\{u,v\})\left( \frac{\psi (u)}{\sqrt{d(u)}} - \frac{\psi (v)}{\sqrt{d(v)}} \right) ^2 = \psi ^{\top } L \psi . \end{aligned}$$
(11)

This corresponds to Proposition 4. Differentiating \(S_2(\psi )\) by \(\psi\), we observe

$$\begin{aligned} \frac{\partial S_{2}(\psi )}{\partial \psi }&= 2 L\psi , \end{aligned}$$
(12)

which corresponds to Proposition 5. More detailed steps are explained in Appendix 2. In the graph partitioning context, Eqs. (11) and (12) serve as a foundation of the relationship between balanced cuts and the graph Laplacian. Using these properties and Courant's min-max theorem, we can connect the discrete cut problem to the eigenproblem of the Laplacian via the nodal domain theorem and the Cheeger inequality, as we will see in the next sections.
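Propositions 4 and 5 can be checked numerically for this graph 2-Laplacian special case. The sketch below (NumPy; the random weighted graph is our own) verifies Eq. (11) directly and Eq. (12) by central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = np.triu(rng.random((n, n)), 1)       # random weighted graph (illustrative)
A = A + A.T
d = A.sum(axis=1)
L = np.eye(n) - np.diag(d**-0.5) @ A @ np.diag(d**-0.5)   # normalized 2-Laplacian

psi = rng.standard_normal(n)

# Eq. (11): the 2-Dirichlet sum equals psi^T L psi (Proposition 4).
S2 = sum(A[u, v] * (psi[u] / np.sqrt(d[u]) - psi[v] / np.sqrt(d[v])) ** 2
         for u in range(n) for v in range(u + 1, n))
assert np.isclose(S2, psi @ L @ psi)

# Eq. (12): grad S2 = 2 L psi (Proposition 5), checked by central differences.
eps = 1e-6
grad = np.zeros(n)
for v in range(n):
    e_v = np.zeros(n); e_v[v] = 1.0
    grad[v] = ((psi + eps * e_v) @ L @ (psi + eps * e_v)
               - (psi - eps * e_v) @ L @ (psi - eps * e_v)) / (2 * eps)
assert np.allclose(grad, 2 * L @ psi, atol=1e-5)
```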

We often see the properties of Propositions 4 and 5 in the graph p-Laplacian literature (Bühler and Hein, 2009; Bougleux et al., 2009). Moreover, also in the hypergraph context, even without a defining differential-geometric setup, we see these properties in the existing hypergraph Laplacians listed in Table 1 (Zhou et al., 2006; Saito et al., 2018; Hein et al., 2013; Li and Milenkovic, 2018). Hence, it is natural to expect that all of the hypergraph Laplacians share a similar structure in this sense. However, as we see in Table 1, some Laplacians miss some features; in particular, some lack the useful nodal domain theorem and Cheeger inequality (discussed in Sect. 4), results that we “borrow” from continuous differential geometry. One of the benefits of our abstract Laplacian is to give comprehensive analyses for all Laplacians defined from the gradient of Definition 1.

3.3 p-Eigenproblem of p-Laplacian

Next, we discuss the eigenproblem of this p-Laplacian. Since the p-Laplace operator is nonlinear, we introduce the standard generalization of an eigenpair for the p-Laplacian (see, for example, Tudisco and Hein (2016)).

Definition 6

Let \(\xi _p(x)\) \({:}{=}\) \(\vert x \vert ^{p-1}\) \(\mathrm {sgn}\) \((x)\). For \(p\) \(>\) \(1\), a hypergraph p-eigenpair, which is a pair of a p-eigenvalue \(\lambda \in {\mathbb {R}}\) and a p-eigenvector \(\psi \in {\mathcal {H}}(V)\) of \(\Delta _p\), is defined by

$$\begin{aligned} (\Delta _{p}\psi )(v) = \lambda \xi _{p}(\psi (v)), \forall v \in V. \end{aligned}$$
(13)

For the standard Laplacian, one can show the connection between its eigenpairs and the Rayleigh quotient from matrix theory as well as from continuous analysis. To obtain a p-eigenpair, we consider the following Rayleigh quotient:

Proposition 7

Consider the Rayleigh quotient for our abstract class of p-Laplacians as

$$\begin{aligned} R_{p}(\psi ) {:}{=}\frac{S_{p}(\psi )}{\Vert \psi \Vert _{p}^{p}}, \mathrm {\ where\ } \Vert \psi \Vert _{p} {:}{=}\left( \sum _{v} \vert \psi (v) \vert ^p \right) ^{\frac{1}{p}}. \end{aligned}$$

The function \(R_{p}\) has a critical point at \(\psi ^{*}\) if and only if \(\psi ^{*}\) is a p-eigenvector of \(\Delta _{p}\). The corresponding p-eigenvalue is given as \(\lambda ^{*}\) \(=\) \(R_{p} (\psi ^{*})\). Moreover, the first p-eigenvalue is 0, and its p-eigenvector is \(M^{1/p}{\mathbf {1}}\), where M is the \(\vert V \vert\) \(\times\) \(\vert V \vert\) diagonal matrix whose diagonal elements are \(\mu (v)\).

For the standard Laplacians, the first p-eigenvector is equal to \(c{\mathbf {1}}\) in the unnormalized case and \(cD_{v}^{1/2}{\mathbf {1}}\) in the normalized case, for a constant c. For more properties of p-eigenpairs, see Appendix 5.
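For the normalized graph case (\(p = 2\), \(\mu (v) = d(v)\)), the claim about the first eigenpair in Proposition 7 is easy to verify numerically. A sketch on a toy graph of our own:

```python
import numpy as np

# Toy graph; with mu(v) = d(v) the first eigenvector should be D^{1/2} 1.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
L = np.eye(4) - np.diag(d**-0.5) @ A @ np.diag(d**-0.5)

vals, vecs = np.linalg.eigh(L)
assert np.isclose(vals[0], 0.0)                 # first eigenvalue is 0

phi = np.sqrt(d)                                # M^{1/2} 1 with M = D_v
assert np.allclose(L @ phi, np.zeros(4), atol=1e-10)
```

The check works because \(L D^{1/2}{\mathbf 1} = D^{1/2}{\mathbf 1} - D^{-1/2}A{\mathbf 1} = 0\).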

3.4 p-Laplacians and related regularizers

This section shows that various hypergraph Laplacians and related regularizers can be seen as special cases of our framework. We discuss the clique expansion (Rodriguez, 2002; Saito et al., 2018), the star expansion (Zhou et al., 2006), the total variation approach (Hein et al., 2013), and the submodular hypergraph approach (Li and Milenkovic, 2018).

3.4.1 Clique expansion hypergraph Laplacian (clique)

This approach constructs a graph where every edge of the hypergraph is replaced by a clique over its vertices. The hypergraph clique 2-Laplacian normalized by the degree of an edge (clique e-n) was proposed by Saito et al. (2018), and the edge-unnormalized clique 2-Laplacian (clique e-un) was proposed by Rodriguez (2002). For these 2-Laplacians, the clique Laplacian is \(L\) \(=\) \(I\) \(-\) \(D^{-1/2}\) \(W\) \(D^{-1/2}\), where for the edge-normalized setting W is a matrix whose elements are \(w(u,v)\) \(=\) \(\sum _{e: u,v \in e}\) \(w(e)\) \(/\) \((\vert e \vert\) \(-\) \(1)\) and \(D\) \(=\) \(D_{v}\), and for the edge-unnormalized setting \(w(u,v) = \sum _{e: u,v \in e} w(e)\) and D is a diagonal matrix whose elements are \(d(v,v) = \sum _{e: v \in e } w(e)\). Note that this can be seen as a contraction of the hypergraph to a graph, represented by W, and L is the standard 2-Laplacian induced by W.
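A minimal sketch of the edge-normalized clique expansion (NumPy; the toy hypergraph is our own). With the \(w(e)/(\vert e \vert - 1)\) normalization, the clique-graph degrees coincide with the hypergraph degrees \(d(v)\):

```python
import numpy as np

E = [(0, 1, 2), (2, 3)]
w = [1.0, 2.0]
n = 4

# Contract the hypergraph to a graph: w(u,v) = sum_{e : u,v in e} w(e)/(|e|-1).
W = np.zeros((n, n))
for we, e in zip(w, E):
    for i, u in enumerate(e):
        for v in e[i + 1:]:
            W[u, v] += we / (len(e) - 1)
            W[v, u] += we / (len(e) - 1)

d = W.sum(axis=1)
L = np.eye(n) - np.diag(d**-0.5) @ W @ np.diag(d**-0.5)  # clique e-n Laplacian

# Clique-graph degrees match the hypergraph degrees sum_{e : v in e} w(e).
assert np.allclose(d, [1.0, 1.0, 3.0, 2.0])
```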

3.4.2 Star expansion hypergraph Laplacian (star)

This approach constructs a graph by introducing a new vertex for every edge to form a star. The resulting hypergraph 2-Laplacian can be written as \(L_{2,s}\) \(=\) \(I\) \(-\) \(D_{v}^{-1/2}HW_{e}D_{e}^{-1}H^{\top }D_{v}^{-1/2}\) (Zhou et al., 2006). We remark that this view is also a contraction of the hypergraph to a graph, represented by the adjacency matrix \(HW_{e}D_{e}^{-1}H^{\top }\). Note that this Laplacian can be seen as the standard Laplacian if we regard the hypergraph as a graph, except for a coefficient 1/2. This coefficient difference comes from the nature of this view, as discussed in Saito et al. (2018).
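The star-expansion Laplacian \(L_{2,s}\) can be assembled directly from the matrices of Sect. 2.1. A NumPy sketch on a toy hypergraph of our own; as a sanity check, \(D_v^{1/2}{\mathbf 1}\) is its first eigenvector:

```python
import numpy as np

E = [(0, 1, 2), (2, 3)]
w_vals = [1.0, 2.0]
n = 4

H = np.zeros((n, len(E)))
for j, e in enumerate(E):
    for v in e:
        H[v, j] = 1.0

W_e = np.diag(w_vals)
D_e_inv = np.diag([1.0 / len(e) for e in E])
d = H @ np.array(w_vals)
Dv_is = np.diag(d**-0.5)

# L_{2,s} = I - D_v^{-1/2} H W_e D_e^{-1} H^T D_v^{-1/2}  (Zhou et al., 2006)
L_star = np.eye(n) - Dv_is @ H @ W_e @ D_e_inv @ H.T @ Dv_is

assert np.allclose(L_star @ np.sqrt(d), np.zeros(n), atol=1e-10)
```

The last assertion holds because the rows of the contracted adjacency matrix \(H W_e D_e^{-1} H^\top\) sum to the vertex degrees d(v).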

3.4.3 Total variation and submodular p-Laplacian (tv/sub)

The total variation (tv) approach for hypergraphs has been considered in a different context from the other two (Hein et al., 2013). The TV regularizer is defined as

$$\begin{aligned} S_p(\psi )=\sum _{e \in E}w(e)\max _{v,u \in e}\vert \psi (v) - \psi (u)\vert ^{p}, \end{aligned}$$
(14)

which is not normalized by the degree of a vertex (tv v-un). We here propose a normalized total variation (tv v-n) p-Laplacian, whose regularizer we define as \(S_p(\psi )\) \(=\) \(\sum _{e}\) \(w(e)\) \(\max _{v,u \in e}\) \(\vert \psi (v)/d^{1/p}(v)\) \(-\) \(\psi (u)/d^{1/p}(u)\vert ^{p}\). This TV p-Laplacian is subsumed by the submodular p-Laplacian (Li and Milenkovic, 2018). The extensive study by Li and Milenkovic (2018) considers the hypergraph p-Laplacian in the context of submodular functions, which we refer to as sub. For a submodular function \(F:2^{\vert e \vert } \rightarrow [0,1]\) associated with an edge e, the submodular p-Laplacian is associated with the following energy;

$$\begin{aligned} S_{p} (\psi ) = \sum _{e \in E} w(e) \max _{S \subset V} F(S) \left( \sum _{i=1}^{\vert e \vert - 1} F(S_{i}) \left( \frac{\psi (v_{i+1})}{\mu ^{1/p}(v_{i+1})} - \frac{\psi (v_{i})}{\mu ^{1/p}(v_{i})}\right) \right) ^{p}, \end{aligned}$$
(15)

by reordering the vertices v in e so that \(\psi (v_{\vert e \vert }) \ge \psi (v_{\vert e \vert - 1}) \ge \ldots \ge \psi (v_{1})\), where \(S_{i}:=\{v_{j}\}_{j=1}^{i}\). Note that this form is one form of the Lovász extension. By taking \(F(S_{i})=1\) for all i, we obtain the tv p-Laplacian.
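The reduction from Eq. (15) to the tv regularizer can be checked numerically: with \(F(S_i)=1\) the inner sum telescopes to \(\max _{v,u \in e} \vert \psi (v) - \psi (u) \vert\). A sketch (NumPy, unnormalized case \(\mu (v)=1\), toy data of our own):

```python
import numpy as np

E = [(0, 1, 2), (2, 3)]
w = [1.0, 2.0]
psi = np.array([0.3, -1.0, 0.5, 2.0])
p = 2

def tv_energy(psi):
    # Eq. (14): unnormalized total-variation regularizer.
    return sum(we * (psi[list(e)].max() - psi[list(e)].min()) ** p
               for we, e in zip(w, E))

def submodular_energy(psi, F=lambda S: 1.0):
    # Eq. (15) with mu(v) = 1; F defaults to the constant-1 function.
    total = 0.0
    for we, e in zip(w, E):
        vs = sorted(e, key=lambda v: psi[v])      # psi(v_1) <= ... <= psi(v_|e|)
        inner = sum(F(vs[:i]) * (psi[vs[i]] - psi[vs[i - 1]])
                    for i in range(1, len(vs)))
        total += we * inner ** p
    return total

# With F(S_i) = 1 the inner sum telescopes, recovering the tv energy.
assert np.isclose(tv_energy(psi), submodular_energy(psi))
```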

3.4.4 Connections between our p-Laplacian and existing Laplacians

These Laplacians can be seen as special cases of our abstract Laplacian, defined by Definition 3 together with the hypergraph-gradient (Definition 1) and hypergraph-divergence (Definition 2). Table 2 summarizes the corresponding function \(c(u,v,e,\psi )\) in the definition of the hypergraph-gradient. The edge-normalized and edge-unnormalized clique 2-Laplacians in Table 2 are the 2-Laplacians proposed by Saito et al. (2018) and Rodriguez (2002), respectively. The star 2-Laplacian in Table 2 is equal to the Laplacian proposed by Zhou et al. (2006). The regularizer of the unnormalized TV p-Laplacian in Table 2 corresponds to the one by Hein et al. (2013). We also note that all the functions \(c(u,v,e,\psi )\) satisfy the conditions of Definition 1 (see Appendix 1). For more discussion, see Appendix 6.

4 Properties of p-Eigenpair of p-Laplacian

This section discusses the properties of the p-eigenproblem of our hypergraph p-Laplacian. We aim to establish the theoretical background of spectral clustering using the p-Laplacian, namely the nodal domain theorem and the Cheeger inequality. The nodal domain theorem bounds the number of nodal domains, which can be seen as “divisions” of the hypergraph. Using nodal domains, the Cheeger inequality shows how well the p-eigenproblem approximates a minimal graph cut.

4.1 Nodal domain theorem of the p-Laplacian

This section aims to extend the classical nodal domain theorem to our framework. The nodal domain theorem in the discrete domain was developed analogously to Courant's nodal domain theorem in the continuous domain (Courant and Hilbert, 1962). In the continuous case, a nodal domain is defined as a region on which a function does not change sign. Therefore, a nodal domain marks a natural division of regions of real values. The nodal domain theorem shows a connection between eigenvectors of the Laplacian and nodal domains; the theorem bounds the number of nodal domains of eigenvectors of the Laplacian (Lindqvist, 2008). The same idea can be established in the discrete domain, i.e., a nodal domain is a connected sub-hypergraph on which the sign of a p-eigenvector does not change. This nodal domain can be seen as a “partition” induced by the p-eigenvector in the discrete domain. The next question is: can we obtain a similar bound on the number of these nodal domains?

We begin with the definition of a nodal domain for a hypergraph.

Definition 8

A nodal domain of \(\psi \in {\mathcal {H}}(V)\) is a maximally connected sub-hypergraph A of the hypergraph G such that A is contained in either \(\{v\mid\) \(\psi (v)\) \(>\) \(0\}\) or \(\{v\mid\) \(\psi (v)\) \(<\) \(0\}\).

Next, with this definition, we discuss the nodal domain theorem for our hypergraph p-Laplacian. The nodal domain theorem for graph Laplacian has been proven in Fiedler (1975), generalized to graph p-Laplacian by Tudisco and Hein (2016), and extended to a particular type of hypergraph p-Laplacian by Li and Milenkovic (2018). In this line of research, we extend these nodal domain theorems to our abstract class of hypergraph p-Laplacians as follows;

Theorem 9

Let \(0\) \(=\) \(\lambda _1\) \(<\) \(\lambda _2\) \(\le\) \(\ldots\) \(\le\) \(\lambda _{k-1}\) \(<\) \(\lambda _{k}\) \(=\) \(\ldots\) \(=\) \(\lambda _{k+r-1}\) \(<\) \(\lambda _{k+r}\) \(\le\) \(\ldots\) be the eigenvalues of \(\Delta _p\), and let \(\psi _k\) be an eigenvector associated with \(\lambda _{k}\). Then \(\psi _k\) induces at most \(k\) \(+\) \(r\) \(-\) \(1\) nodal domains.

As seen in Theorem 9, the nodal domain theorem studies the structure of the p-eigenvectors of the p-Laplacian; Theorem 9 bounds the number of nodal domains of a p-eigenvector. The number of nodal domains matters for the Cheeger inequality, which provides a theoretical justification for spectral methods via our p-Laplacian. We discuss this Cheeger inequality next.

4.2 k-way Cheeger inequality

This section establishes the k-way Cheeger inequality for our hypergraph p-Laplacian. As we saw in Sect. 2.2, the 2-way Cheeger inequality serves as the connection between Cheeger cut and eigenproblem. Moreover, the inequality gives a performance guarantee of the relaxed graph partitioning problem. We want to establish such a connection between the Cheeger cut and p-eigenproblem of our hypergraph p-Laplacian. For this purpose, we aim to generalize this Cheeger inequality to our hypergraph p-Laplacians to achieve spectral partitioning via our p-Laplacian.

We start our discussion from a 2-way Cheeger cut. Let \(A\) \(\subset\) \(V\) be a set and \({\overline{A}}\) be a complement of A. The generalized Cheeger cut may be defined as

$$\begin{aligned}&C(A) := \frac{ \partial V(A,{\overline{A}})}{\min (\mathrm {vol}(A),\mathrm {vol}({\overline{A}}))},\ \mathrm {\ where\ } \mathrm {vol}(A)=\sum _{v\in A}\mu (v) \nonumber \\&\partial V(A,{\overline{A}}) {:}{=}\sum _{e: u,v \in e, u\in A,v\in {\overline{A}} } w(e)c(e), \quad c(e):=\sum _{u,v \in e}c(u,v,e,\psi ). \end{aligned}$$
(16)

We call the optimal cut value \(h_2\) \({:}{=}\) \(\min _{A \subset V}\) \(C(A)\) the Cheeger constant. For a standard graph, this generalized Cheeger cut reduces to the standard Cheeger cut discussed in Sect. 2.2. We now extend this generalized 2-way Cheeger cut to a k-way Cheeger cut. Consider a disjoint partitioning of V into k sets \(\{V_i\}_{i=1,\ldots ,k}\). Then, we define the k-way Cheeger constant as

$$\begin{aligned} h_k {:}{=}\min _{\{V_i\}_{i=1,\cdots ,k}} \max _{j \in \{1,\ldots ,k \}} C(V_j). \end{aligned}$$
(17)
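For intuition, the k-way Cheeger constant of Eq. (17) can be computed by brute force on very small instances. The sketch below (NumPy; a toy 4-vertex graph of our own, with \(c(e)=1\) and \(\mu (v)=d(v)\)) also illustrates that \(h_k\) is nondecreasing in k:

```python
import itertools
import numpy as np

n = 4
A = np.zeros((n, n))
for u, v in [(0, 1), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
A[1, 2] = A[2, 1] = 0.1                  # weak link between the two pairs
d = A.sum(axis=1)

def C(S):
    Sb = [v for v in range(n) if v not in S]
    cut = sum(A[u, v] for u in S for v in Sb)
    return cut / min(d[list(S)].sum(), d[Sb].sum())

def h(k):
    # Eq. (17): minimize, over partitions into k nonempty sets, the worst C(V_j).
    best = np.inf
    for labels in itertools.product(range(k), repeat=n):
        if len(set(labels)) != k:
            continue
        parts = [[v for v in range(n) if labels[v] == j] for j in range(k)]
        best = min(best, max(C(S) for S in parts))
    return best

assert np.isclose(h(2), 0.1 / 2.1)       # best 2-partition cuts the weak link
assert h(2) <= h(3)                      # h_k is nondecreasing in k
```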

Similarly to the previous studies (Tudisco and Hein, 2016; Li and Milenkovic, 2018), we establish k-way Cheeger inequality for our p-Laplacian as follows.

Theorem 10

Let \((\lambda _k, \psi _k)\) be a p-eigenpair of \(\Delta _p\), \(m_k\) be the number of nodal domains of \(\psi _k\). Then,

$$\begin{aligned} \left( \max _{v}\frac{d(v)}{\mu (v)}\right) ^{-(p-1)} \left( \frac{h_{m_k}}{p}\right) ^{p} \le \lambda _k \le \min (k,\max _{e \in E} \vert e \vert )^{p-1} h_k. \end{aligned}$$

Corollary 11

Let \((B, {\overline{B}})\) be the cut obtained by the second eigenvector \(\psi\) of the p-Laplacian, i.e., the threshold cut \((\{ v : \psi (v) \ge t\},\{ v : \psi (v) < t\})\) minimizing the Cheeger cut. Then,

$$\begin{aligned} \left( \frac{1}{\max _{v}d(v)/\mu (v)}\right) ^{p-1} \left( \frac{C(B)}{p}\right) ^{p} < 2 h_2 \end{aligned}$$
(18)

Theorem 10 extends the graph Cheeger inequality in three respects: from graph to hypergraph, from 2-way to k-way, and from the standard 2-Laplacian to our abstract class of p-Laplacians. Following Theorem 10, Corollary 11 bounds the relationship between the cut obtained by the second p-eigenvector of our abstract class of p-Laplacians and the generalized Cheeger constant. Similarly to the classical case in Sect. 2.2, Theorem 10 shows how we approximate the k-way Cheeger constant by relaxing the discrete k-way cut problem into the p-eigenproblem of \(\Delta _p\); it gives upper and lower bounds on the optimal cut in terms of the k-th p-eigenvalue. Moreover, Corollary 11 gives a worst-case guarantee for a 2-way cut obtained by a p-eigenvector. These bounds guarantee the performance of the cut resulting from spectral methods via p-eigenvectors of our p-Laplacian. Hence, Theorem 10 and Corollary 11 motivate us to use spectral methods via our p-Laplacian for the hypergraph partitioning problem instead of the costly discrete original cut problem. The inequality gives the tightest bound when \(p\) \(\rightarrow\) \(1\); since the original cut problem is NP-hard, the eigenproblem is also NP-hard in this asymptotic case. Moreover, for the standard graph 2-Laplacian, this inequality reduces to the classical Cheeger inequality. Also, when \(k\) \(=\) \(2\), the inequality is for \(h_2\), the 2-way Cheeger cut. Therefore, in the next section, we focus on constructing a spectral algorithm for 2-way partitioning.
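The threshold cut in Corollary 11 can be found by a one-dimensional sweep over the sorted entries of the eigenvector. Below is a minimal sketch under illustrative simplifications (\(w(e)c(e)=1\) per cut edge, \(\mu (v)=d(v)\)); the vector psi is a made-up stand-in for a second p-eigenvector, not one computed by our method.

```python
# Toy hypergraph and an illustrative "eigenvector" psi (not computed here).
V = list(range(6))
E = [{0, 1, 2}, {1, 2, 3}, {3, 4, 5}, {4, 5, 0}]
psi = {0: -0.1, 1: -0.8, 2: -0.7, 3: 0.2, 4: 0.9, 5: 0.8}

def deg(v):
    return sum(1 for e in E if v in e)

def cheeger_cut(A):
    B = set(V) - A
    cut = sum(1 for e in E if e & A and e & B)  # w(e)c(e) = 1 per cut edge
    return cut / min(sum(deg(v) for v in A), sum(deg(v) for v in B))

def sweep_cut(psi):
    # Try each threshold t between consecutive sorted psi-values.
    order = sorted(V, key=lambda v: psi[v])
    best, best_A = float("inf"), None
    for i in range(1, len(order)):
        A = set(order[:i])          # { v : psi(v) < t }
        c = cheeger_cut(A)
        if c < best:
            best, best_A = c, A
    return best_A, best

A, c = sweep_cut(psi)
print(sorted(A), c)  # [0, 1, 2] with C(A) = 1/3
```

Only \(|V|-1\) candidate cuts are examined, so the sweep is cheap once the eigenvector is available; Corollary 11 bounds the quality of the cut this sweep returns.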

Finally, we remark that the discussion on the k-way Cheeger cut generalizes the standard graph 2-way Cheeger inequality of the 2-Laplacian (Alon and Milman, 1985; Alon, 1986), the k-way Cheeger inequality of the 2-Laplacian (Lee et al., 2014), the k-way Cheeger inequality of the graph p-Laplacian (Tudisco and Hein, 2016), and the k-way Cheeger inequality of the p-Laplacian of submodular hypergraphs (Li and Milenkovic, 2018). We also note that the proofs of the nodal domain theorem (Theorem 9) and the Cheeger inequality (Theorem 10) are natural generalizations of previous studies such as Tudisco and Hein (2016) and Li and Milenkovic (2018). Rather than introducing new proof techniques, the focus of this work is to generalize the hypergraph p-Laplacian as far as these structures are preserved, in order to provide a unified framework.

5 Hypergraph partitioning via p-Laplacian

Sect. 4 established performance guarantees for the eigenproblem as a relaxation of the NP-hard discrete Cheeger cut problem. Therefore, this section establishes our partitioning algorithm, exploiting p-eigenpairs of our hypergraph p-Laplacian.

We first discuss a property of p-eigenvectors of \(\Delta _p\). Since the p-Laplacian is nonlinear, its p-eigenvectors are not necessarily orthogonal to each other. However, we still want a relationship between p-eigenvectors. With this motivation, instead of orthogonality, Luo et al. (2010) proposed p-orthogonality as follows.

Definition 12

Let \(\Xi _{p} (\psi )\) be the vector whose v-th element is \(\xi _{p}(\psi (v))\). We call \(\psi \ne 0\) and \(\omega \ne 0\) p-orthogonal if \(\Xi _{p}\left( \psi \right) ^{\top }\) \(\Xi _{p}\left( \omega \right)\) \(=\) \(0\).
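For intuition, the following sketch checks p-orthogonality numerically. It assumes, purely as an illustration, \(\xi _p(x)=\mathrm {sign}(x)\vert x \vert ^{p-1}\), the map commonly used for graph p-Laplacians; the precise \(\xi _p\) of our framework is as defined earlier in the paper.

```python
import numpy as np

def xi_p(x, p):
    # Illustrative choice: xi_p(x) = sign(x) |x|^(p-1),
    # the map commonly used for graph p-Laplacians.
    return np.sign(x) * np.abs(x) ** (p - 1)

def p_orthogonal(psi, omega, p, tol=1e-10):
    # Definition 12: Xi_p(psi)^T Xi_p(omega) = 0.
    return abs(xi_p(psi, p) @ xi_p(omega, p)) < tol

p = 1.5
psi = np.array([1.0, 1.0, -1.0, -1.0])
omega = np.array([1.0, -1.0, 1.0, -1.0])
print(p_orthogonal(psi, omega, p))   # True: symmetric sign pattern
print(p_orthogonal(psi, psi, p))     # False: never self-p-orthogonal
```

For \(p=2\), \(\xi _2\) is the identity, so p-orthogonality reduces to ordinary orthogonality.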

In order to analyze the p-orthogonality of our abstract class of p-Laplacians, we recall the Taylor expansion, which is often used for approximating functions in physics. For example, in the motion of a pendulum, if we expand with respect to the angular amplitude of the pendulum, the equation of motion is approximated by simple harmonic motion (Courant and Hilbert, 1962). The Taylor expansion allows an infinitely differentiable function f(x) to be written as \(f(x)\) \(=\) \(f(a_{f,x})\) \(+\) \(\sum _{n=1}^{\infty } f^{(n)}(a_{f,x})\) \((x\) \(-\) \(a_{f,x})^{n}/n!\), where \(a_{f,x}\) is a constant and \(f^{(n)}\) is the n-th derivative of f. If we approximate the function by the first order, the remainder (the second- or higher-order terms) can be seen as the approximation error. For two functions f and g, if the error term can be written as the sum of second- or higher-order terms, i.e., \(f(x)\) \(=\) \(g(x)\) \(+\) \(o_{2}\), where \(o_2\) \(=\) \(\sum _{n_{f}+n_{g} \ge 2, n_{f},n_{g} \in {\mathbb {N}}^{+}}\) \(\beta _{f,g,n_{f},n_{g}}\) \((x\) \(-\) \(a_{f,x})^{n_{f}}(\) \(x\) \(-\) \(a_{g,x})^{n_{g}}\) and \(\beta _{f,g,n_{f},n_{g}}\) is a coefficient, then we say that f is equal to g up to the second order of the Taylor expansion. Using this notion and p-orthogonality, we obtain the following.

Theorem 13

Let \((\psi\), \(\lambda ^{\psi })\) and \((\omega\), \(\lambda ^{\omega })\) be p-eigenpairs of \(\Delta _p\). The p-eigenvectors \(\psi\) and \(\omega\) are p-orthogonal up to the second order of the Taylor expansion if \(\lambda ^{\psi }\) and \(\lambda ^{\omega }\) are not equal up to the second order of the Taylor expansion.

Theorem 13 tells us that two such p-eigenvectors are approximately p-orthogonal, up to the second order of the Taylor expansion.

We now move our discussion to the second p-eigenpair by considering the Rayleigh quotient. For the graph p-Laplacian (Bühler and Hein, 2009), the clique p-Laplacian (Saito et al., 2018), and also the continuous case (Lindqvist, 2008), the global minimum of a variant of the Rayleigh quotient gives the second p-eigenpair. Similarly to these works, we define the following quotient:

$$\begin{aligned} R^{(2)}_{p}(\psi ) {:}{=}\frac{S_{p}(\psi )}{\min _{\eta }\Vert \psi - \eta \psi _1 \Vert _{p}^{p}}, \end{aligned}$$
(19)

where \(\psi _1\) is the first p-eigenvector. This quotient is supported by the following theorem.

Theorem 14

The global solution of Eq. (19) is given by \(\psi ^{*} = \psi _2 + \eta ^{*}\psi _1\), where \(\eta ^{*}=\text{argmin}_{\eta } \Vert \psi _2 - \eta \psi _1 \Vert _{p}^{p}\), and \(\psi _2\) is the second p-eigenvector.

This theorem shows that we have an exact identification of the second p-eigenpair; minimizing Eq. (19) gives the second p-eigenpair of \(\Delta _{p}\). However, the major disadvantage is that Eq. (19) is not convex, and hence it is difficult to obtain the global optimum; optimization algorithms applied to Eq. (19) may return a local optimum instead.
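As a concrete sketch of Eq. (19), the code below evaluates \(R^{(2)}_{p}\) for the simplest instance of our framework: a standard graph with the edge-wise p-Dirichlet sum of Eq. (8). The inner minimization over \(\eta\) is convex and is solved here by ternary search; the graph, weights, and test vectors are illustrative.

```python
import numpy as np

# Edge-wise p-Dirichlet sum for a standard graph, the simplest
# instance of Eq. (8): S_p(psi) = sum_e w(e) |psi(u) - psi(v)|^p.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]

def S_p(psi, p):
    return sum(w * abs(psi[u] - psi[v]) ** p for u, v, w in edges)

def R2_p(psi, psi1, p, lo=-10.0, hi=10.0, iters=200):
    # Eq. (19): R^(2)_p(psi) = S_p(psi) / min_eta ||psi - eta*psi1||_p^p.
    # The inner problem is convex in eta, so ternary search suffices.
    f = lambda eta: np.sum(np.abs(psi - eta * psi1) ** p)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return S_p(psi, p) / f((lo + hi) / 2)

p = 1.5
psi1 = np.ones(4)                    # first p-eigenvector: constant here
psi = np.array([1.0, -1.0, 1.0, -1.0])
print(R2_p(psi, psi1, p))
```

By symmetry the inner optimum is \(\eta ^* = 0\) for this psi, so the printed value equals \(S_p(\psi )/\Vert \psi \Vert _p^p = 2^{1.5}\); in general \(\eta ^*\) shifts psi toward the span of \(\psi _1\) exactly as in Theorem 14.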

Therefore, we next consider a strategy for obtaining a better local optimum for 2-way hypergraph partitioning. The idea is to use exact p-orthogonality as a constraint, instead of the constraint “p-orthogonal up to the second order,” which each pair of p-eigenvectors must obey (Theorem 13). The reasoning is as follows. Due to the non-convexity of Eq. (19), the solution obtained by an optimization algorithm may be a local optimum, and such a local optimum is not guaranteed to be p-orthogonal up to the second order to the first p-eigenvector \(\psi _1\), while \(\psi _2\) is. To obtain a better solution, we exploit Theorem 13 and want a constraint enforcing that the solution be p-orthogonal up to the second order to \(\psi _1\). However, it is difficult to work directly with this constraint. To ease this difficulty, we propose to use an exact p-orthogonality condition as a constraint; thanks to Theorem 13, this exact constraint can be seen as an approximation of the condition up to the second order of the Taylor expansion. We borrow this approximation idea from physics, where it is common to approximate phenomena by the second order of the Taylor expansion, as in the pendulum example above. Following this discussion, we incorporate exact p-orthogonality as a constraint and consider the optimization problem

$$\begin{aligned}&\min _{\psi } J(\psi ) = R^{(2)}_{p}(\psi ) \quad \mathrm {s.t.} \quad \Xi _p (\psi )^{\top } \Xi _p (\psi _1)=0. \end{aligned}$$
(20)

Since \(R^{(2)}_{p}(\alpha \psi )\) \(=\) \(R^{(2)}_{p}(\psi )\) for \(\alpha \ne 0\), the scale of \(\psi\) in \(R^{(2)}_{p}(\psi )\) is arbitrary. Hence, we add scale constraints to Eq. (20) as

$$\begin{aligned} J'(\psi ) = R^{(2)}_{p}(\Xi ^{-1}_p(\psi )) \quad \mathrm {s.t.} \quad \psi ^{\top } \psi _1=0, \Vert \psi \Vert _{2}^{2}=1, \end{aligned}$$
(21)

which has the same global minimum as Eq. (20). To solve Eq. (21), we propose to apply the natural gradient algorithm (Amari, 1998), as shown in Algorithm 1, similarly to Luo et al. (2010). If we used the plain gradient \(\partial J'/\partial \psi\) for the update, the orthogonality condition would not be preserved at each step. Instead, Algorithm 1 uses the direction \(\frac{\partial J'}{\partial \psi } - \psi (\frac{\partial J'}{\partial \psi })^{\top } \psi\), which preserves the orthogonality condition in Eq. (21) (Luo et al., 2010). The convergence of this algorithm is also guaranteed (Luo et al., 2010).

[Algorithm 1]
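To illustrate the update rule, the following simplified sketch runs the projected update for the transparent case \(p=2\) on a path graph, where \(\Xi _2\) is the identity and, on the constraint set, the quotient reduces to \(\psi ^{\top } L \psi\). An explicit re-projection onto the constraint set is added after each step for numerical robustness; for general p and the exact natural gradient method we refer to Luo et al. (2010).

```python
import numpy as np

# Path graph on 4 vertices; p = 2 so the objective on the constraint
# set {psi : psi ⟂ psi1, ||psi|| = 1} is simply J(psi) = psi^T L psi.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(1)) - A
psi1 = np.ones(4) / 2.0              # first eigenvector, normalized

def J(psi):
    return psi @ L @ psi

rng = np.random.default_rng(0)
psi = rng.standard_normal(4)
psi -= (psi @ psi1) * psi1           # enforce the constraints once
psi /= np.linalg.norm(psi)

alpha = 0.05
for _ in range(2000):
    g = 2 * L @ psi                  # gradient of J
    d = g - psi * (g @ psi)          # projected direction of Sect. 5
    psi = psi - alpha * d
    psi -= (psi @ psi1) * psi1       # re-project onto the constraint set
    psi /= np.linalg.norm(psi)

# psi now approximates the second (Fiedler) eigenvector of L.
print(np.round(J(psi), 4))           # 0.5858 (= 2 - sqrt(2))
```

The direction d is tangent to the unit sphere at psi, so the norm constraint is preserved to first order; the minimizer on the constraint set is the second eigenpair, matching Theorem 14 for \(p=2\).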

6 Related work

This section compares related hypergraph 2-Laplacians, p-Laplacians, and partitioning algorithms. It is complementary to Sect. 3.4: while Sect. 3.4 defines the related p-Laplacians, this section discusses their context and explains the differences between ours and the existing ones.

One major family of hypergraph Laplacians arises from clique expansion (clique). An edge-unnormalized 2-Laplacian for the unweighted setting was proposed in Rodriguez (2002) (clique e-un). This 2-Laplacian and the Laplacians proposed in other studies (Zien et al., 1999; Bolla, 1993; Gibson et al., 2000) are theoretically equivalent (Agarwal et al., 2006). In this line of research, a 2-Laplacian from a differential-geometry viewpoint was proposed (Saito et al., 2018). When \(p\) \(=\) \(2\), this Laplacian can also be explained by clique expansion, but normalized by the edge degree (clique e-n). Moreover, this p-Laplacian is based on a vertex-wise energy (clique e-n-vw) (Saito et al., 2018), while ours is edge-wise. In Saito et al. (2018), the p-energy \(S_{p}^{VW}(\psi )\) is defined using the norm of the hypergraph gradient \(\nabla \psi\) at vertex v as

$$\begin{aligned} S_{p}^{VW}(\psi ){:}{=}\sum _{v \in V}\Vert \nabla \psi (v) \Vert ^{p}, \mathrm {where} \ \Vert \nabla \psi (v) \Vert {:}{=}\left( \sum _{ e \in E : e_{[1]} = v } \frac{\vert \nabla \psi (e) \vert ^{2}}{\vert e \vert !} \right) ^{\frac{1}{2}}. \end{aligned}$$
(22)

This idea defines the energy around a vertex as \(\Vert \nabla \psi (v) \Vert\) and obtains the total energy by summing those energies over all vertices. Note that if we assume the standard graph, the p-Laplacian in Saito et al. (2018) corresponds to a series of graph studies (Zhou and Schölkopf, 2005; Bougleux et al., 2009), which also assume a vertex-wise energy. On the other hand, ours corresponds to the p-Laplacian that assumes an edge-wise energy (Bühler and Hein, 2009; Tudisco and Hein, 2016). Hence, our work does not incorporate the p-Laplacian proposed in Saito et al. (2018), since the p-Dirichlet sum setting is different. Remark that when \(p=2\), our model incorporates \(\textsc {clique e-n-vw}\) by using c in Table 2. However, Saito et al. (2018) did not give theoretical analyses such as a nodal domain theorem or a Cheeger inequality, nor a specific partitioning algorithm exploiting characteristics of the p-Laplacian such as p-orthogonality. Hence, for that model one needs a general-purpose optimization method for the p-eigenproblem; such methods do not always leverage the characteristics of the p-Laplacian, which could otherwise lead to better performance in terms of space, time, and accuracy.

Another line of research follows the star expansion, shown in Sect. 3.4. Zhou et al. (2006) proposed a 2-Laplacian based on a lazy random walk view. Agarwal et al. (2006) showed that this 2-Laplacian is theoretically equivalent to the Laplacians of Zien et al. (1999) and Li and Solé (1996); this is further discussed in Ghoshdastidar and Dukkipati (2017a).

Another family of Laplacians arises from total variation and its subsequent submodular generalization (tv/sub). A regularization framework for \(p\ge 1\) was proposed in Hein et al. (2013), together with a hypergraph partitioning algorithm for \(p\) \(=\) \(1\), and further explored in Chan et al. (2018). This idea was extended to submodular hypergraphs (Li and Milenkovic, 2018), whose energy objective uses one form of the Lovász extension of a submodular function. Moreover, sub incorporates the inhomogeneous cut proposed by Li and Milenkovic (2017), where weights can vary when we partition an edge. Along with this new class of hypergraph cut, Li and Milenkovic (2018) proposed partitioning algorithms for \(p\) \(=\) \(1\) and \(p\) \(=\) \(2\). Seeing the definition (Eq. (15)), the submodular p-Laplacian describes a broad class of hypergraph p-Laplacians via submodular functions. We also mention that the \(p=2\) case for submodular cut objective functions is discussed in Yoshida (2019) using the general form of the Lovász extension. Moreover, a series of studies (Veldt et al., 2020; Benson et al., 2020) directly defines the objective function using a submodular function instead of the Lovász extension. While submodular models are flexible, ours is more versatile since we do not assume submodularity; the submodular p-Laplacian is a special case of ours as long as the conditions in Definition 1 are satisfied. Additionally, our algorithm can address arbitrary p, while the algorithms of Hein et al. (2013) and Li and Milenkovic (2018) focus on specific p (\(p\) \(=\) \(1\) or \(p\) \(=\) \(2\)).
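To illustrate the Lovász extension underlying the tv/sub models, the sketch below computes the extension of the submodular cut function of a single hyperedge; for this function the extension recovers the total-variation term \(\max _{v\in e}\psi (v)-\min _{v\in e}\psi (v)\). This is a generic illustration of the construction, not the specific inhomogeneous formulation of Li and Milenkovic (2018).

```python
import numpy as np

def lovasz_extension(F, psi):
    # Lovász extension of a set function F (F(empty) = 0) at psi:
    # sort entries in decreasing order and telescope the marginals.
    order = np.argsort(-psi)
    val, S, prev = 0.0, [], 0.0
    for i in order:
        S.append(int(i))
        cur = F(frozenset(S))
        val += psi[i] * (cur - prev)
        prev = cur
    return val

# Submodular cut function of a single hyperedge e:
# F_e(A) = 1 if A splits e (both e∩A and e\A nonempty), else 0.
e = frozenset({0, 2, 3})
def F_e(A):
    inter = len(e & A)
    return 1.0 if 0 < inter < len(e) else 0.0

psi = np.array([0.3, -1.2, 2.0, -0.5])
# The extension of F_e recovers the total-variation term
# max_{v in e} psi(v) - min_{v in e} psi(v) = 2.0 - (-0.5) = 2.5.
print(lovasz_extension(F_e, psi), psi[list(e)].max() - psi[list(e)].min())
```

Summing such terms over all edges (raised to the p-th power and weighted) yields a tv-type p-Dirichlet sum; replacing \(F_e\) by other submodular functions yields the broader sub class.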

We remark that our framework can address the existing 2-Laplacians from clique and star and the tv/sub p-Laplacians. Moreover, our partitioning algorithm works for arbitrary \(p\) \(>\) \(1\), while the existing algorithms focus on specific p or use a general-purpose optimization algorithm without theoretical analyses. We also note that our framework can define new p-Laplacians, such as (but not limited to) the normalized TV shown in Sect. 3.4. However, it is out of the scope of our work to incorporate the clique e-n-vw p-Laplacian. Moreover, since our framework is based on the relationships of Propositions 4 and 5, it does not incorporate tensor-modeled Laplacians for uniform hypergraphs, where all edges connect the same number of vertices (Cooper and Dutle, 2012; Hu and Qi, 2012; Qi, 2013; Hu and Qi, 2015; Chen et al., 2017; Ghoshdastidar and Dukkipati, 2017b; Chang et al., 2020). The reason we cannot incorporate the clique e-n-vw p-Laplacian and tensors into our model is that our model is built upon the energy formed as Eq. (8), while the energies for those two are defined differently. In particular, the aims differ as follows: tensor-modeled Laplacians are defined through tensor operations, while our framework focuses on the contraction induced by the energy Eq. (8).

Lastly, we comment on p-Laplacians in the continuous domain, where the p-Laplacian has a longer history than in the discrete domain. The Dirichlet energy is defined similarly to Eq. (8), and the variation of the energy gives the Laplace equation (Courant and Hilbert, 1962); the energy is minimized when the Laplace equation is satisfied. This framework has been extended to arbitrary p-norms, e.g., Binding and Rynne (2008), and analyzed in many ways, including a nodal domain theorem and a Cheeger inequality. We remark that in the continuous case, we can identify the second p-eigenpair similarly to Eq. (19), but no exact identification for the third or higher eigenpairs has been found yet (Lindqvist, 2008). For a more comprehensive study, we refer to Lindqvist (2008) and Struwe (2000).

7 Preliminary experiments

Table 3 Dataset summary. All datasets have two classes. The parameter \(\delta :=\) \(\sum _{e \in E}\) \(\vert e \vert\)/\(\vert E \vert\) is the average edge degree, and \(\tau :=\) \(\sum _{e \in E}\) \(\vert e \vert\) \(/\) \(\vert V \vert\) \(\vert E \vert\) is the average ratio of the number of vertices connected by each edge to the total number of vertices, which we can regard as the "density" of the hypergraph
Table 4 The experimental results for hypergraph partitioning for our methods and existing ones. We applied our Algorithm 1 for \(p>1\) to five geometries of the hypergraph Laplacians (clique e-n, clique e-un, star, tv v-un, tv v-n). We compared these to the existing fixed-p algorithms for the five hypergraph Laplacians: clique e-n for \(p=2\) by Saito et al. (2018), clique e-un for \(p=2\) by Rodriguez (2002), star for \(p=2\) by Zhou et al. (2006), and tv v-un and tv v-n for \(p=1\) by Hein et al. (2013). Moreover, for clique e-n, we also compared with the algorithm for \(p>1\) (clique e-n-vw) by Saito et al. (2018). Thus, we compare five instantiations of ours with six existing ones. We compare the performance by error rate. Performance with ours is marked with #. For the free parameter \(p>1\), we give the value of p giving the smallest error in the column p next to the dataset. The superscript * means fixed parameter. See the main text for more discussion

Our experiments aim to evaluate our approximation algorithm (Algorithm 1) as a function of p and the particular type of hypergraph Laplacians (star, clique, and tv/sub).

Primary Objective of the Experiments. The objective of the experiments is to see whether our algorithm (Algorithm 1) improves on the existing methods introduced in Sect. 6. Algorithm 1 has two key “levers”: the choice of the parameter p and the choice of hypergraph Laplacian, i.e., the function c in the gradient (Definition 1). In the previous works discussed in Sect. 6, the algorithms for hypergraph p-Laplacians were designed for a particular p (e.g., \(p=1,2\)) or applied to all \(p>1\) without theoretical justification. By contrast, Algorithm 1 for our abstract class of hypergraph p-Laplacians works for all \(p>1\) with theoretical justification. Therefore, we provide experiments for a wide range of hypergraph Laplacians for \(p>1\) in comparison to existing algorithms. We thus apply Algorithm 1 to five existing hypergraph Laplacians (clique e-n, clique e-un, star, tv v-un, and tv v-n) for \(p>1\) and compare these to the existing fixed-p algorithms for particular types of Laplacians. Moreover, since clique e-n has a partitioning algorithm using a particular hypergraph p-Laplacian (clique e-n-vw by Saito et al. (2018); see Sect. 6 for the definition), we also compare to this. Hence, we compare five instantiations of ours with six previous algorithms as follows:

  • Algorithm 1 for all \(p>1\) applied to the five geometries:

    1. clique e-n: \(p>1\)
    2. clique e-un: \(p>1\)
    3. star: \(p>1\)
    4. tv v-un: \(p>1\)
    5. tv v-n: \(p>1\)

  • Six existing algorithms for comparison:

    1. clique e-n: \(p=2\) (Saito et al., 2018)
    2. clique e-n-vw: \(p>1\) (Saito et al., 2018)
    3. clique e-un: \(p=2\) (Rodriguez, 2002)
    4. star: \(p=2\) (Zhou et al., 2006)
    5. tv v-un: \(p=1\) (Hein et al., 2013)
    6. tv v-n: \(p=1\) (Hein et al., 2013)

Note that a variety of submodular functions can be considered for sub; we take tv by Hein et al. (2013) as a representative of the sub group.

Experimental Setup. We build a hypergraph using the method for categorical datasets introduced in Zhou et al. (2006). Each instance in the dataset consists of \(\vert E \vert\) categories. The vertices of the hypergraph are the instances. The edges are defined by the attribute values: each attribute value within a given category defines an edge whose vertices are the instances sharing that attribute value. All edges are given weight one. Our experiment is performed on the datasets mushroom, cancer, chess, and congress from the UCI repository (Dua and Graff, 2022), and two datasets created from 20newsgroupsFootnote 1 (for short "news") with two classes (1,2) and (3,4). All of these were used in previous studies (Zhou et al., 2006; Hein et al., 2013; Saito et al., 2018). We summarize the datasets in Table 3. The value \(\delta\) \(=\) \(\sum _{e \in E}\) \(\vert e \vert\)/\(\vert E \vert\) is the average edge degree. Furthermore, \(\tau\) \(=\) \(\sum _{e \in E}\) \(\vert e \vert\) \(/\) \(\vert V \vert\) \(\vert E \vert\) is the average ratio of the number of vertices connected by each edge to the total number of vertices, which we can regard as the "density" of a hypergraph. In Table 4 we compare 11 instantiations of hypergraph p-Laplacians as discussed above. For clique e-n-vw (\(p\) \(\in\) \([1,3]\)) we conducted experiments using the same setting as Saito et al. (2018), since the setting matches ours. For our methods we used \(p\) \(\in\) \(\{1.1,1.2,\ldots ,3.0\}\); we limited ourselves to \(p\) \(\le\) \(3\) since the Cheeger inequality (Theorem 10) is progressively looser for larger p. For the free-parameter experiments, we set the starting condition of our algorithm to the solution of the corresponding fixed-parameter Laplacian. We used the step size \(\alpha\) \(=\) \(0.01\Vert \psi \Vert _1^1/\Vert \Omega \Vert _1^1\) as done in Luo et al. (2010).
For all methods, a second eigenvector was computed, and we used the k-means objective to determine the “split point” on the eigenvector (as was also done in (Zhou et al., 2006; Saito et al., 2018)). We evaluated the performance of our algorithms via their error rate, i.e., (# of errors)/(# of data), as used in (Zhou et al., 2006; Hein et al., 2013; Saito et al., 2018).
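The two preprocessing steps above — building the hypergraph from a categorical dataset (Zhou et al., 2006) and choosing the split point by the k-means objective — can be sketched as follows; the toy data and the vector psi are made-up illustrations, not taken from the UCI datasets.

```python
import numpy as np

# Hypergraph construction from a categorical dataset (after Zhou et al., 2006):
# vertices are instances; each (category, attribute value) pair defines one
# edge containing the instances sharing that value. All edge weights are one.
data = [["red",  "small"],
        ["red",  "large"],
        ["blue", "small"],
        ["blue", "large"],
        ["blue", "small"]]

edges = {}
for v, row in enumerate(data):
    for cat, val in enumerate(row):
        edges.setdefault((cat, val), set()).add(v)
E = list(edges.values())
print(E)  # [{0, 1}, {0, 2, 4}, {1, 3}, {2, 3, 4}]

# 2-means split point on a (hypothetical) second eigenvector: choose the
# threshold between consecutive sorted values minimizing within-cluster SSE.
def kmeans_split(psi):
    x = np.sort(psi)
    best, best_t = float("inf"), None
    for i in range(1, len(x)):
        a, b = x[:i], x[i:]
        sse = ((a - a.mean()) ** 2).sum() + ((b - b.mean()) ** 2).sum()
        if sse < best:
            best, best_t = sse, (x[i - 1] + x[i]) / 2
    return best_t

psi = np.array([-0.9, -0.7, 0.6, 0.8, 0.5])   # illustrative eigenvector values
print(kmeans_split(psi))                       # -0.1: threshold between clusters
```

Since the eigenvector is one-dimensional, the optimal 2-means split is found exactly by this \(O(n)\) sweep over sorted values rather than by iterative k-means.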

Overall Results The results are summarized in Table 4. First, comparing our algorithm (Algorithm 1) with the fixed-parameter algorithms (existing methods; see the performances marked with \(^{*}\) in Table 4) for the five geometries, we see that our methods consistently improve on the existing fixed-parameter methods. We also remark that for clique e-n, ours consistently outperforms clique e-n-vw, except on chess.

Further Discussion A natural question for our algorithm is “which hypergraph Laplacian and which p are suitable?” A further look into our abstract class of p-Laplacians can answer this question; the experimental results reveal how the choice of p and the type of hypergraph Laplacian are connected to the underlying parameters \(\delta\) (average edge degree) and \(\tau\) (density) of the datasets. Although the experiments are preliminary, there seem to be consistent trends that provide guidance on the range of p and the type of Laplacian to consider. Further, this experimental guidance is supported by the theory given earlier in this manuscript.

Our observation is that the density parameter (\(\tau\)) is related to the range of p, while the average edge degree parameter (\(\delta\)) is connected to the choice of hypergraph Laplacian. The density parameter (\(\tau\)) indicates the natural range for p. The dataset chess is significantly denser (larger \(\tau\)) than the other datasets. The table indicates that while large p tends to work better for the chess dataset, small p tends to improve on large p for the non-chess datasets. To understand this, we consider the trade-off between the Cheeger inequality (Theorem 10) and the p-Dirichlet sum. The Cheeger inequality is tighter for smaller p; hence, the relaxed objective becomes closer to the discrete objective. On the other hand, examining the p-Dirichlet sum (Eq. (8)), one observes that it is a p-norm to the p-th power of the hypergraph gradient. The dimensionality of the hypergraph gradient scales with the density (\(\tau\)). Hence, in the dense case, a relatively larger p is needed to induce the same magnitude of change in the p-Dirichlet sum, which is connected to the second p-eigenvector via the Rayleigh quotient (Eq. (14)). Analogous phenomena connecting the choice of p to density have been observed for standard graphs, such as in online graph transduction (Herbster and Lever, 2009). Turning to the average edge degree parameter (\(\delta\)), we observe preliminary indications of how to choose the Laplacian as a function of \(\delta\): on the large-\(\delta\) datasets (chess and mushroom) all tv methods outperform the star and clique methods of our p-Laplacian, whereas on the smaller-\(\delta\) datasets all star and clique methods outperform all tv methods. We have thus provided some guidance on the choice of Laplacian and the range of p based on the density \(\tau\) and the average edge degree \(\delta\). For more detailed experimental results and further discussion, see Appendix 12.

We further observe behavior different from the semi-supervised learning setting of (Alamgir and Luxburg, 2011; Slepcev and Thorpe, 2019), which uses the same energy Eq. (8) for standard graphs. These works deal with semi-supervised learning using p-Laplacians of a graph with an asymptotically large number of vertices. In that case, the problem does not degenerate into a trivial one when p is large, while it does when p is small. However, in our experiments we observed a different behavior: small p also works when \(\tau\) is small, as discussed above. This might be because there is a structural difference in the use of the p-Laplacian between semi-supervised learning and unsupervised learning.

8 Conclusion

This work has considered hypergraph spectral clustering. We have proposed a general framework for the hypergraph p-Laplacian and provided theoretical results for it. We have also proposed a convergent hypergraph partitioning algorithm for our abstract class of p-Laplacians, exploiting these theoretical results. Our experiments have shown that our algorithm outperforms the existing spectral clustering algorithms for hypergraph Laplacians, and we have given practical guidance on the choice of p-Laplacian.

There are several future directions. A fruitful direction would be to explore whether our p-Laplacian converges to the continuous p-Laplace operator in the limit of infinite data, similarly to the graph case (Belkin and Niyogi, 2003) and the hypergraph case (Saito, 2022). Moreover, similarly to previous studies (Hein et al., 2013; Saito et al., 2018), semi-supervised learning using \(S_p\) as a regularizer would be valuable future work. Furthermore, while we conducted our experiments on real datasets, it would be interesting to conceive an illustrative toy dataset where some hypergraph Laplacians work better than others, or where some p works while \(p=2\) does not. It would also be valuable to study multi-class clustering for arbitrary p using higher-order eigenvectors, similarly to the standard graph 2-Laplacian case (von Luxburg, 2007), as opposed to methods using recursive one-vs-rest two-class partitioning (Bühler and Hein, 2009; Hein et al., 2013). However, unlike the 2-Laplacian matrix, where these are easily obtained, it is difficult to obtain the third or higher p-eigenpairs of the p-Laplacian: while we know the algebraic identification of the second p-eigenpair (Eq. (19)), no such identification is known for higher eigenpairs in either the discrete or the continuous domain (Lindqvist, 2008).