9.1 Introduction

With the fast development of information technologies, social media data have increased rapidly. Social media platforms provide new ways to produce and receive content, especially user-generated content. Users can shop, watch movies, and instantly participate in the propagation, interaction, and sharing of news events on the Internet. Rich behavioral data on social media platforms are generated by great numbers of users every day, which support different downstream applications and provide insights for a better understanding of users’ intentions and profiles.

A typical social media analysis application is the so-called recommender system [1]. When listening to music, shopping, watching movies on the Internet, or looking for friends on social network services, users are likely to be drowned in an unprecedented amount of information. This is what we call “information overload.” To address this issue, recommender systems have been developed for decades. The main goal of recommender systems is to forecast how users would react to an item by understanding their preferences based on historical interaction data, user profiles, item attributes, context data, and other information. This helps predict whether a user will like an item or not. For example, in a movie recommender system, the user profile may contain user ID, age, gender, income, marital status, and more. The movie (item) attribute information may include movie ID, name, genre, director, release time, actors, and more. Interaction data contain the movies that users have watched and possibly commented on. The goal of the movie recommender system is to integrate this information to recommend movies that users might like.

Another popular social media analysis application is sentiment analysis [2]. Users on social media platforms generate a large amount of opinion data every moment, which helps decode and mine users’ attitudes toward specific topics. Researchers have therefore begun to look at sentiment analysis of users in social media contexts. In economics, stock price fluctuations can be forecast to some extent by analyzing the sentiment of social media users. In politics, social media posts can reflect public opinion. Users’ sentiments may also affect their behaviors; for example, emotionally charged people are more likely to forward and repost tweets. Therefore, sentiment analysis plays an important role in social media analysis. However, sentiment analysis is challenging due to the multi-modality and complexity of social media data. For example, a tweet may include text, images, videos, and possibly more. Furthermore, there exist complex correlations among posts along various dimensions, such as time, location, and user preferences. The interactions among users further increase the challenges of this task.

In addition to the posts on social media, physiological signals can also be used to analyze people’s emotions [3]. Compared with text, facial expressions, and other data, physiological signals are not easy to disguise and can better reflect people’s real emotions. Therefore, emotion recognition based on physiological signals plays an important role in many applications such as clinical diagnosis, and it also plays a significant role in social media analysis when such data are available. Physiological signals of different modalities contain complementary representations of human emotions. It is of great significance to discover and utilize the correlations among these representations to improve the accuracy of emotion recognition.

From the above examples, it can be readily seen that one important issue of social media analysis is how to model complex correlations among data and how to make use of the complementary information among multi-modal representations to better understand the data. Hypergraphs have been widely used in social media computing in recent years because of their usefulness in complex data modeling. In the following, we discuss three applications of hypergraph computation in social media analysis: recommender systems, sentiment analysis, and emotion recognition. For recommender systems, we discuss hypergraph-based collaborative filtering [4] and attribute inference. We then present sentiment prediction [5] and social event detection [6] using hypergraph computation for sentiment analysis. In the third part, we introduce two different hypergraph computation methods for emotion recognition using multi-modal physiological signals [7,8,9]. Part of the work introduced in this chapter has been published in [4,5,6,7,8,9].

9.2 Recommender System

In recent years, the Internet has become an integral part of people’s daily life: shopping, watching the news, listening to music, and more all happen online. However, with the explosion of information, people find it increasingly difficult to sift through the massive volumes of data on the Internet to access the information they need. For example, when a user wants to watch a movie online and visits a movie site, the user is likely to drown in thousands of movies and cannot find the one he or she has in mind. This is called the “information overload” issue.

Recommender systems emerge under such circumstances. Recommender systems are a powerful tool for reducing the problem of information overload since they may assist users to find useful information and assist service providers to boost profits. Recommender systems have been used in many online systems, from general platforms including e-commerce, social media, and content sharing to vertical services such as movie, news, and music websites.

The core of a recommender system is to understand users through their attribute information and historical interactions and then predict whether they would like a given item. It is worth noting that the user-side information, the item-side information, and the interaction data all play a vital role in this process. The user-side information, including gender, age, personality, etc., often reflects the user’s preference. For example, male users may be more likely to read military and political news, while female users may prefer fashion and entertainment news. The item-side information, such as the category, text description, image, etc., can characterize the attributes of the item. Such attribute information may suggest potential consumer groups. For instance, health supplements may be bought more often by the elderly, while electronics are more likely to be purchased by younger people. Historical interactions also reveal users’ potential preferences, following the assumption that “behaviorally similar users may have similar preferences on items.” Figure 9.1 shows an example of a recommender system based on similar patterns.

Fig. 9.1 An example of recommender system based on similar patterns

We can see from these examples that what recommender systems actually do is distinguish similar users from different perspectives based on complex, multi-modal data. Therefore, one key problem is how to model and learn the complex relationships between users and items. Recently, hypergraph computation has attracted much attention and has been applied to recommender systems to help solve this problem. The hypergraph can naturally integrate the user-side information, the item-side information, and the interaction data, thanks to its flexible hyperedges and especially hyperedge groups. Therefore, similar users/items can be connected from different perspectives. In this section, we discuss two examples of applying hypergraph computation in recommender systems, i.e., collaborative filtering and attribute inference.

9.2.1 Collaborative Filtering

In the past decades, collaborative filtering (CF), a crucial, popular recommendation technique, has been extensively used in various recommender systems. The fundamental assumption of CF is that consumers who engage in similar behaviors, for example, reading the same kind of news frequently, are likely to have similar tastes for items, such as games, movies, and commodities. A common CF-based solution goes through the following two steps: first, it uses historical interactions to identify similar users and items; and second, it makes suggestions for users based on the information acquired in the last step.

Since users and items have topological links that a network can describe, graph-based CF approaches have attracted a lot of interest in recent years. Although graph-based CF approaches have been explored for a long time and have produced respectable performance, certain restrictions remain. First, the high-order correlations in the user–item network are insufficiently modeled and utilized. For example, CF methods hope to find groups of behaviorally similar users. Such associations between users are group-level (beyond pairwise) and cannot be well captured by the graph structure, since only pairwise correlations can be modeled in a graph. Second, when users and items are represented by a graph in graph-based approaches, there is no fundamental distinction between them. An item connected to many users is a popular item; in contrast, a user connected to a variety of items is not necessarily a representative one, so users and items should not be treated symmetrically.

Under these circumstances, more adaptable and appropriate user and item modeling is required. Thanks to its adaptable hyperedges, the hypergraph structure, as opposed to the graph structure, offers a more natural approach for representing such high-order and intricate relationships. In this subsection, we present a dual-channel hypergraph collaborative filtering (DHCF) framework [4] to solve the aforementioned problems. In the following, we introduce how to model the user–item interactions and learn the high-order connectivity with dual hypergraphs.

(1) Hypergraph Modeling of High-Order Connectivity

Given a user–item network, the high-order connectivity is captured by self-defined association rules. Based on these rules, several hyperedge groups can be constructed to capture high-order rather than merely pairwise correlations, e.g., by linking users who behave similarly but have no direct connections. For example, we can connect the users who have purchased the same item with a hyperedge, as shown in Fig. 9.2. Beyond the interactions directly visible in the observed data, these rules can thus be regarded as a high-order perspective for describing the raw data. Here we introduce a way to capture the high-order connectivity with hypergraphs for users and items separately.

Fig. 9.2 An example of hypergraph modeling for user–item network

User Hypergraph Construction

We first define the k-order reachable neighbors for items. If there is a path between item a and item b that consists of a sequence of adjacent vertices and contains fewer than k users, then item a (item b) is a k-order reachable neighbor of item b (item a) in the user–item network.

We then define the k-order reachable users for items. If there is a direct edge between user a and item a, and item a is a k-order reachable neighbor of item b, then user a is a k-order reachable user of item b.

The symbol \(B_{u}^{k}(i)\) represents the set of k-order reachable users for item i. A hypergraph can be defined mathematically as a set family where each set indicates a hyperedge. As a result, a hypergraph may be built using the k-order reachable user sets of the items. By the above definitions, the corresponding hyperedge group may be constructed as follows:

$$\displaystyle \begin{aligned} \mathbb{E}_{B^k_u} = \{ B^k_u(i) \mid i \in I \}. \end{aligned} $$
(9.1)

The k-order accessible matrix of items is denoted by \({\mathbf {A}}^k_i \in \{0, 1\}^{M \times M}\), which can be written as follows:

$$\displaystyle \begin{aligned} {\mathbf{A}}^k_i = \text{Min}(1, \text{power}({\mathbf{H}}^\top \cdot \mathbf{H}, k)), \end{aligned} $$
(9.2)

where power(M, k) computes the k-th power of the matrix M. The incidence matrix of the user–item network is represented by H ∈ {0, 1}N×M, where N and M are the numbers of users and items, respectively. Then, the incidence matrix of the hyperedge group has the following form:

$$\displaystyle \begin{aligned} {\mathbf{H}}_{B^k_u} = \mathbf{H} \cdot {\mathbf{A}}^{k-1}_i. \end{aligned} $$
(9.3)

The hypergraph \(\mathbb {G}_u\) can capture the overall high-order correlations among users by fusing multiple hyperedge groups constructed via the k-order reachable rule. Therefore, H u can be written as

$$\displaystyle \begin{aligned} {\mathbf{H}}_u = f\big(\mathbb{E}_{B^{k_1}_u},\mathbb{E}_{B^{k_2}_u},\dots,\mathbb{E}_{B^{k_a}_u}\big) = \underbrace{{\mathbf{H}}_{B^{k_1}_u}||{\mathbf{H}}_{B^{k_2}_u}||\dots||{\mathbf{H}}_{B^{k_a}_u}}_a, \end{aligned} $$
(9.4)

where ⋅||⋅ denotes the concatenation operation, which serves here as the hyperedge group fusion function f(⋅).
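As a concrete illustration, the following numpy sketch (our own toy example, not code from [4]) builds the k-order hyperedge-group incidence matrices of Eqs. (9.2) and (9.3) and fuses them by concatenation as in Eq. (9.4); the function name and the random toy network are assumptions made for illustration only.

```python
import numpy as np

def user_hyperedge_group(H, k):
    """Incidence matrix H_{B_u^k} of a user hyperedge group (Eqs. 9.1-9.3).

    H: (N, M) binary user-item incidence matrix; the result has one
    hyperedge per item, linking that item's k-order reachable users.
    """
    M = H.shape[1]
    A = np.eye(M, dtype=int)                 # A_i^0 = I (covers the k = 1 case)
    item_adj = np.minimum(1, H.T @ H)        # Min(1, H^T H)
    for _ in range(k - 1):                   # A_i^{k-1} = Min(1, (H^T H)^{k-1})
        A = np.minimum(1, A @ item_adj)
    return np.minimum(1, H @ A)              # hyperedge j gathers B_u^k(item j)

# Fuse the k = 1 and k = 2 hyperedge groups by concatenation (Eq. 9.4)
rng = np.random.default_rng(0)
H = (rng.random((6, 4)) < 0.4).astype(int)   # toy network: 6 users, 4 items
H_u = np.concatenate([user_hyperedge_group(H, 1),
                      user_hyperedge_group(H, 2)], axis=1)
print(H_u.shape)                             # (6, 8): one hyperedge per item per k
```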

Item Hypergraph Construction

Here the high-order connectivity for items is defined in a similar way. The k-order accessible matrix of users \({\mathbf {A}}^k_u \in \{0, 1\}^{N \times N}\) is defined as

$$\displaystyle \begin{aligned} {\mathbf{A}}^k_u = \text{Min}(1, \text{power}(\mathbf{H}\cdot {\mathbf{H}}^\top , k)). \end{aligned} $$
(9.5)

The incidence matrix \({\mathbf {H}}_{B^k_i} \in \{0, 1\}^{M \times N}\) can be written as

$$\displaystyle \begin{aligned} {\mathbf{H}}_{B^k_i} = {\mathbf{H}}^{\top} \cdot {\mathbf{A}}^{k-1}_u. \end{aligned} $$
(9.6)

Assuming that there are b hyperedge groups, the item hypergraph incidence matrix H i is similarly formulated as follows:

$$\displaystyle \begin{aligned} {\mathbf{H}}_i = f\big(\mathbb{E}_{B^{k_1}_i},\mathbb{E}_{B^{k_2}_i},\dots,\mathbb{E}_{B^{k_b}_i}\big) = \underbrace{{\mathbf{H}}_{B^{k_1}_i}||{\mathbf{H}}_{B^{k_2}_i}||\dots||{\mathbf{H}}_{B^{k_b}_i}}_b. \end{aligned} $$
(9.7)

In this way, the high-order connectivity for both users and items is captured with a hypergraph. Figure 9.3 gives one example of the defined high-order connectivity for users [4]. Subsequently, two embedding look-up tables (\({\mathbf {E}}_u=[{\mathbf {e}}_{u_1}, \ldots , {\mathbf {e}}_{u_N}]\) and \({\mathbf {E}}_i=[{\mathbf {e}}_{i_1}, \ldots , {\mathbf {e}}_{i_M}]\)) are constructed to describe both users and items, which, together with the hypergraph structure, are prepared for later learning.

Fig. 9.3 The illustration of high-order connectivity for users

(2) High-Order Information Passing

Once the fused high-order correlations have been obtained, the neighboring messages are aggregated using the high-order information passing technique, which can be expressed as

$$\displaystyle \begin{aligned} \left \{ \begin{array}{cc} M_u=\mbox{HyConv}(E_u, H_u)\\ M_i=\mbox{HyConv}(E_i, H_i) \end{array}, \right. {} \end{aligned} $$
(9.8)

where HyConv(⋅, ⋅) can be any hypergraph convolution operation such as that specified in HGNN (HGNNConv for short). Through information passing from high-order neighbors, the complex correlations between vertices are encoded into the aggregated messages of users (M u) and items (M i), respectively. It should be noted that the high-order neighbor mentioned here does not refer only to direct interactions in the user–item network, but is an abstract notion that links similar users/items in a latent behavior–attribute space.

To provide an example of high-order information passing, we present the jump hypergraph convolution (JHyConv) here. Inspired by previous work [10], the JHyConv operator generates the learned representations by combining a vertex’s current representation with its aggregated neighborhood representation through a skip connection. The JHyConv is written as

$$\displaystyle \begin{aligned} {\mathbf{X}}^{(l+1)}=\sigma\left({\mathbf{D}}_{v}^{-1 / 2} \mathbf{H} {\mathbf{D}}_{e}^{-1} {\mathbf{H}}^{\top} {\mathbf{D}}_{v}^{-1 / 2} {\mathbf{X}}^{(l)} \varTheta^{(l)} + {\mathbf{X}}^{(l)} \right), \end{aligned} $$
(9.9)

where all symbols follow the notations introduced previously.

In contrast to the conventional HGNNConv, the jump hypergraph convolution enables the model to take into account both a vertex’s own representation and the aggregated high-order representations. The messages M u and M i are then used to jointly update E u and E i.
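A minimal numpy sketch of the JHyConv operator of Eq. (9.9) follows; the function name is ours, σ is taken as ReLU, and the hypergraph is assumed unweighted with no isolated vertices or empty hyperedges.

```python
import numpy as np

def jhyconv(X, H, Theta):
    """Jump hypergraph convolution of Eq. (9.9), numpy sketch.

    X: (n, d) vertex features; H: (n, m) incidence matrix; Theta: (d, d).
    """
    dv = H.sum(axis=1).astype(float)             # vertex degrees
    de = H.sum(axis=0).astype(float)             # hyperedge degrees
    Dv_inv_sqrt = np.diag(dv ** -0.5)
    De_inv = np.diag(1.0 / de)
    smooth = Dv_inv_sqrt @ H @ De_inv @ H.T @ Dv_inv_sqrt
    return np.maximum(smooth @ X @ Theta + X, 0.0)  # jump: add X back before sigma
```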

(3) Joint Information Updating

The goal of joint information updating is to extract discriminative information for users and items, which is formulated as

$$\displaystyle \begin{aligned} \left \{ \begin{array}{cc} E^{\prime}_u=\text{JMU}(M_u, M_i)\\ E^{\prime}_i=\text{JMU}(M_i, M_u) \end{array} \right., \end{aligned} $$
(9.10)

where any learnable feed-forward neural network may be used for JMU(⋅, ⋅). Updated embeddings for users and items are termed as \(E^{\prime }_u\) and \(E^{\prime }_i\), respectively. Here, a shared fully connected layer is applied.

(4) Overall DHCF Layer

The two stages of DHCF framework are illustrated in Figs. 9.4 and 9.5, respectively. The high-order information passing and joint information updating constitute an integrated DHCF layer, which, thanks to its powerful hypergraph structure, can directly model and encode the high-order connectivity.

Fig. 9.4 The first stage of the DHCF framework

Fig. 9.5 The second stage of the DHCF framework

With the specified HyConv and JMU, a DHCF configuration can be formulated as follows:

$$\displaystyle \begin{aligned} \left\{ \begin{array}{rc} f(\dots) =& \cdot||\cdot \\ \text{HyConv}(\cdot, \cdot) =& \text{JHyConv}(\cdot, \cdot) \\ \text{JMU}(\cdot, \cdot) =& \text{MLP}_1(\cdot) \end{array} \right., \end{aligned} $$
(9.11)

where MLP1(⋅) is a fully connected layer, Θ denotes the trainable parameters, and ⋅||⋅ is the concatenation operation.

The matrix form of the embedding propagation on hypergraph can be written as follows:

$$\displaystyle \begin{aligned} \left\{ \begin{array}{ll} & \left. \begin{array}{ll} {\mathbf{H}}_u &= \mathbf{H}||\left(\mathbf{H}({\mathbf{H}}^{\top}\mathbf{H})\right)\\[0.2cm] {\mathbf{H}}_i &= {\mathbf{H}}^{\top}||\left({\mathbf{H}}^{\top}(\mathbf{H}{\mathbf{H}}^{\top})\right) \end{array} \right\} \text{hypergraph setup} \\[0.2cm] & \left. \begin{array}{ll} {\mathbf{M}}_u^{(l)} &={\mathbf{D}}_{u_v}^{-1 / 2} {\mathbf{H}}_u {\mathbf{D}}_{u_e}^{-1} {\mathbf{H}}^{\top}_u {\mathbf{D}}_{u_v}^{-1 / 2} {\mathbf{E}}_u^{(l)} + {\mathbf{E}}_u^{(l)} \\[0.2cm] {\mathbf{M}}_i^{(l)} &={\mathbf{D}}_{i_v}^{-1 / 2} {\mathbf{H}}_i {\mathbf{D}}_{i_e}^{-1} {\mathbf{H}}^{\top}_i {\mathbf{D}}_{i_v}^{-1 / 2} {\mathbf{E}}_i^{(l)} + {\mathbf{E}}_i^{(l)} \end{array} \right\} \text{Phase 1} \\[0.2cm] & \left. \begin{array}{ll} {\mathbf{E}}_u^{(l+1)} &= \sigma({\mathbf{M}}_u^{(l)} \varTheta^{(l)} ) \\[0.2cm] {\mathbf{E}}_i^{(l+1)} &= \sigma({\mathbf{M}}_i^{(l)} \varTheta^{(l)} ) \end{array} \right\} \text{Phase 2}\\ \end{array} \right., \end{aligned} $$
(9.12)

where \({\mathbf {D}}_{u_v}\), \({\mathbf {D}}_{u_e}\) and \({\mathbf {D}}_{i_v}\), \({\mathbf {D}}_{i_e}\) are vertex degree and hyperedge degree matrices of user hypergraph H u and item hypergraph H i, respectively. \({\mathbf {E}}^{(l)}_u\) and \({\mathbf {E}}^{(l)}_i\) are the inputs for layer l, while \({\mathbf {E}}^{(l+1)}_u\) and \({\mathbf {E}}^{(l+1)}_i\) are the outputs for layer l.

With the introduced framework, the collaborative signals in the user–item network are modeled and captured, thus achieving better representation.
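The following sketch strings the pieces together into one DHCF layer in the matrix form of Eq. (9.12). It is our illustrative rendering under the configuration of Eq. (9.11), with σ taken as ReLU and small-degree clipping added for numerical safety; it is a sketch, not the reference implementation of [4].

```python
import numpy as np

def hg_smooth(Hx):
    """D_v^{-1/2} H D_e^{-1} H^T D_v^{-1/2} for an incidence matrix Hx."""
    dv = np.clip(Hx.sum(axis=1), 1e-9, None)
    de = np.clip(Hx.sum(axis=0), 1e-9, None)
    Dv = np.diag(dv ** -0.5)
    return Dv @ Hx @ np.diag(1.0 / de) @ Hx.T @ Dv

def dhcf_layer(H, E_u, E_i, Theta):
    """One DHCF layer following Eq. (9.12)."""
    # Hypergraph setup: fuse 1- and 2-order hyperedge groups by concatenation
    H_u = np.concatenate([H, H @ (H.T @ H)], axis=1)
    H_i = np.concatenate([H.T, H.T @ (H @ H.T)], axis=1)
    # Phase 1: high-order information passing with a jump connection
    M_u = hg_smooth(H_u) @ E_u + E_u
    M_i = hg_smooth(H_i) @ E_i + E_i
    # Phase 2: joint updating through a shared fully connected layer
    relu = lambda x: np.maximum(x, 0.0)
    return relu(M_u @ Theta), relu(M_i @ Theta)
```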

9.2.2 Attribute Inference

A CF-based recommender system suffers from the cold-start problem when historical behavior data of users are lacking, which makes it challenging to personalize recommendations to individual users. Making use of user and item attribute data is a potential answer to this issue. The attribute information of users usually includes gender, age, occupation, etc. The attribute information of an item can be the genre of a movie or a piece of music, or the category of an item on an e-commerce website, etc. According to the principle of CF, similar users will choose similar items, and attribute information can then be used to establish the similarity between users or items. Attribute information can thus build up associations between users and items in the absence of user historical behaviors, which alleviates the cold-start problem. In other words, attribute information can assist collaborative filtering.

However, attribute information is often insufficient, as many people are reluctant to provide true personal information. Therefore, attribute inference becomes an important task. It is mutually reinforcing with the recommendation task, as high-quality attributes can help better with collaborative filtering, while more accurate user behavior can also help infer attributes of users and items.

In this section, we discuss a multi-task learning framework that combines the attribute inference task with the recommendation task. The framework first utilizes multi-channel hypergraph CF for representation learning, then performs the two downstream tasks simultaneously, and lastly optimizes the model with the losses of the downstream tasks. The pipeline of the framework is presented in Fig. 9.6.

Fig. 9.6 The pipeline of multi-channel hypergraph neural networks for recommendation and attribute inference

(1) Multi-Channel Hypergraph Collaborative Filtering

Multi-Channel Hypergraph Construction

In order to model the high-order interactions and attributes of users and items, two hypergraphs, named the Interaction Hypergraph and the Attribute Hypergraph, are constructed and denoted as I and A for simplicity.

The structure of I is generated from the interactions between users and items. The implicit interaction matrix is represented as \(\mathbf {R}\in \mathbb {R}^{n_u \times n_v} \), where n u and n v denote the numbers of users and items, respectively. With the k-order reachable rule introduced in the previous subsection, we generate hyperedges by connecting each item’s 1-order reachable users and each user’s 1-order reachable items. The incidence matrices can be expressed as

$$\displaystyle \begin{aligned} \begin{array}{l} {\mathbf{H}}_I^u(i, j) = \left\{ \begin{array}{ll} 1 & \text{user}_i\ \text{interacted with item}_j\\ 0 & \text{otherwise. } \end{array} \right. \\ {\mathbf{H}}_I^v(i, j) = \left\{ \begin{array}{ll} 1 & \text{item}_i\ \text{interacted with user}_j\\ 0 & \text{otherwise. } \end{array} \right. \end{array} {} \end{aligned} $$
(9.13)

It is obvious that \({\mathbf {H}}_I^u = \mathbf {R}\) and \({\mathbf {H}}_I^v = {\mathbf {R}}^\top \).

The structure of A is generated from the attribute information of users and items. The user and item binary attribute matrices are denoted by \(\mathbf {X}\in \mathbb {R}^{n_u\times n_p}\) and \(\mathbf {Y} \in \mathbb {R}^{n_v\times n_q}\), where n p and n q denote the numbers of user and item attributes, respectively. Each attribute defines a hyperedge, and vertices with the same attribute are connected by the corresponding hyperedge. The incidence matrices can be formulated as

$$\displaystyle \begin{aligned} \begin{array}{l} {\mathbf{H}}_A^u(i, j) = \left\{ \begin{array}{ll} 1 & \text{user}_i\ \text{has attribute}_j\\ 0 & \text{otherwise. } \end{array} \right. \\ {\mathbf{H}}_A^v(i, j) = \left\{ \begin{array}{ll} 1 & \text{item}_i\ \text{has attribute}_j\\ 0 & \text{otherwise. } \end{array} \right. \end{array} {} \end{aligned} $$
(9.14)

Here we have \({\mathbf {H}}_A^u=\mathbf {X}\) and \({\mathbf {H}}_A^v = \mathbf {Y}\).
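A tiny worked example may help: under Eqs. (9.13) and (9.14), the four incidence matrices are simply the interaction and attribute matrices and their transposes. The toy data below are ours.

```python
import numpy as np

# Toy data: 2 users x 3 items, 2 user attributes, 2 item attributes (ours)
R = np.array([[1, 0, 1],
              [0, 1, 1]])          # implicit interaction matrix
X = np.array([[1, 0],
              [0, 1]])             # binary user attribute matrix
Y = np.array([[1, 0],
              [0, 1],
              [1, 1]])             # binary item attribute matrix

H_I_u, H_I_v = R, R.T              # Eq. (9.13): interaction channel
H_A_u, H_A_v = X, Y                # Eq. (9.14): attribute channel
```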

Multi-Channel Hypergraph Learning

When the hypergraph structures have been generated, the multi-channel hypergraph convolution is performed separately on each channel. It can be written as

$$\displaystyle \begin{aligned} {\mathbf{X}}^{(k+1)}=\sigma({\mathbf{D}}_v^{-1/2}\mathbf{HD}_e^{-1}{\mathbf{H}}^\top{\mathbf{D}}_v^{-1/2}{\mathbf{X}}^{(k)}), \end{aligned} $$
(9.15)

where X (k) denotes the vertex embeddings after k convolution layers; in our case it is instantiated as \({\mathbf {U}}_c^{(k)}\) and \({\mathbf {V}}_c^{(k)}\) for the user and item embeddings on channel c ∈{A, I}. To alleviate the over-smoothing problem, the results obtained from K-layer propagation are averaged as below:

$$\displaystyle \begin{aligned} {\mathbf{U}}_c^* = \frac{1}{K+1}\sum_{k=0}^K {\mathbf{U}}_c^{(k)}, {\mathbf{V}}_c^* = \frac{1}{K+1}\sum_{k=0}^K {\mathbf{V}}_c^{(k)}. {} \end{aligned} $$
(9.16)
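The propagation and averaging of Eqs. (9.15) and (9.16) can be sketched as follows. For simplicity, σ is taken as the identity here, which makes the channel propagation parameter-free; this is an assumption of the sketch, not a statement about the original model.

```python
import numpy as np

def channel_propagate(H, X0, K):
    """K layers of Eq. (9.15) followed by the averaging of Eq. (9.16).

    H: incidence matrix of one channel; X0: initial vertex embeddings.
    """
    dv = np.clip(H.sum(axis=1), 1e-9, None)
    de = np.clip(H.sum(axis=0), 1e-9, None)
    Dv = np.diag(dv ** -0.5)
    L = Dv @ H @ np.diag(1.0 / de) @ H.T @ Dv
    outs, X = [X0], X0
    for _ in range(K):
        X = L @ X                   # Eq. (9.15) with sigma = identity
        outs.append(X)
    return sum(outs) / (K + 1)      # averaged embedding U_c^* (or V_c^*)
```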

Moreover, to aggregate information from different channels, a channel attention mechanism is leveraged to generate the comprehensive user and item embeddings. It is defined as

$$\displaystyle \begin{aligned} \displaystyle \boldsymbol{\alpha}_u^c = & f_{a}({\mathbf{U}}_c^*) = \frac{\mathrm{exp}({\mathbf{a}}_u^\top \cdot {\mathbf{W}}_{a}^{c,u} {\mathbf{U}}_c^*)}{{\sum}_{c} \mathrm{exp}({\mathbf{a}}_u^\top \cdot {\mathbf{W}}_{a}^{c,u} {\mathbf{U}}_c^*)}, \end{aligned} $$
(9.17)
$$\displaystyle \begin{aligned} \displaystyle \boldsymbol{\alpha}_v^c = & f_{a}({\mathbf{V}}_c^*) = \frac{\mathrm{exp}({\mathbf{a}}_v^\top \cdot {\mathbf{W}}_{a}^{c,v} {\mathbf{V}}_c^*)}{{\sum}_{c} \mathrm{exp}({\mathbf{a}}_v^\top \cdot {\mathbf{W}}_{a}^{c,v} {\mathbf{V}}_c^*)}, \end{aligned} $$
(9.18)

where \({\mathbf {W}}_{a} \in \mathbb {R}^{d \times d}\) is the trainable parameter, and d denotes the embedding dimension. The comprehensive representations can be formulated as

$$\displaystyle \begin{aligned} {\mathbf{U}}^* = \sum_{c} \alpha_u^c {\mathbf{U}}_c^*, {\mathbf{V}}^* = \sum_{c} \alpha_v^c {\mathbf{V}}_c^*, \end{aligned} $$
(9.19)

where c ∈{A u, I u, A v, I v}.
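Equations (9.17)–(9.19) leave the exact shape of the attention scores implicit; the sketch below adopts one plausible reading, a per-vertex softmax over channel scores \(\mathbf{a}^\top ({\mathbf{W}}^c \mathbf{e})\). The function name and this reading are our assumptions.

```python
import numpy as np

def fuse_channels(embs, a, Ws):
    """Channel attention of Eqs. (9.17)-(9.19), one plausible reading.

    embs: list of (n, d) channel embeddings; a: (d,); Ws: list of (d, d).
    """
    scores = np.stack([(E @ W.T) @ a for E, W in zip(embs, Ws)])  # (C, n)
    scores -= scores.max(axis=0, keepdims=True)                   # stability
    alpha = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)
    return sum(al[:, None] * E for al, E in zip(alpha, embs))     # Eq. (9.19)
```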

The graph convolution is leveraged in order to further exploit the interaction data between users and items. It can be formulated as

$$\displaystyle \begin{aligned} \left(\begin{array}{ll} {\mathbf{U}}^{*(j+1)} \\ {\mathbf{V}}^{*(j+1)} \end{array}\right) ={\mathbf{D}}^{-1/2} \left(\begin{array}{ll} \mathbf{0} & \mathbf{R} \\ {\mathbf{R}}^\top & \mathbf{0} \end{array}\right) {\mathbf{D}}^{-1/2} \left(\begin{array}{ll} {\mathbf{U}}^{*(j)} \\ {\mathbf{V}}^{*(j)} \end{array}\right), {} \end{aligned} $$
(9.20)
$$\displaystyle \begin{aligned} \hat{\mathbf{U}} = \frac{1}{J+1}\sum_{j=0}^J {\mathbf{U}}^{*(j)}, \hat{\mathbf{V}} = \frac{1}{J+1}\sum_{j=0}^J {\mathbf{V}}^{*(j)}, \end{aligned} $$
(9.21)

where J is the number of graph convolution layers.
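A compact sketch of the bipartite propagation and averaging of Eqs. (9.20) and (9.21) follows; the function name and the dense-matrix formulation are ours and serve illustration only.

```python
import numpy as np

def bipartite_propagate(R, U0, V0, J):
    """Graph convolution on the user-item graph (Eqs. 9.20-9.21), sketch."""
    n_u, n_v = R.shape
    A = np.block([[np.zeros((n_u, n_u)), R],
                  [R.T, np.zeros((n_v, n_v))]])
    d = np.clip(A.sum(axis=1), 1e-9, None)
    L = A / np.sqrt(np.outer(d, d))             # D^{-1/2} A D^{-1/2}
    E = np.concatenate([U0, V0], axis=0)
    outs = [E]
    for _ in range(J):
        E = L @ E                               # Eq. (9.20)
        outs.append(E)
    E_hat = sum(outs) / (J + 1)                 # averaging of Eq. (9.21)
    return E_hat[:n_u], E_hat[n_u:]             # U_hat, V_hat
```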

(2) Recommendation Task and Attribute Inference Task

Following the representation learning through multi-channel hypergraph collaborative filtering, the two downstream tasks can be performed simultaneously.

First, based on the idea of matrix factorization, the user and item interaction can be predicted as

$$\displaystyle \begin{aligned} \hat{\mathbf{R}} = \hat{\mathbf{U}}\hat{\mathbf{V}}^\top. {} \end{aligned} $$
(9.22)

Next, we consider the nature of the relationship between attributes and vertices, and a straightforward method for attribute inference is adopted. Also inspired by matrix factorization, the attribute matrix can be regarded as the product of two low-rank matrices. It can be formulated as

$$\displaystyle \begin{aligned} \hat{\mathbf{X}} = \hat{\mathbf{U}}{\mathbf{P}}^\top, \hat{\mathbf{Y}} = \hat{\mathbf{V}}{\mathbf{Q}}^\top, {} \end{aligned} $$
(9.23)

where \(\mathbf {P} \in \mathbb {R}^{n_p \times d}\) and \(\mathbf {Q} \in \mathbb {R}^{n_q \times d}\) are the user and item attribute representations. Using matrix factorization for attribute inference is reasonable because attributes are determined jointly by the properties of the vertices and the properties of the attributes themselves; neither can be represented without the other. In conclusion, the benefit of processing the two distinct tasks concurrently with this method is that it permits information sharing while preserving a high degree of autonomy between the two training processes.

(3) Joint Optimization

A pairwise loss called Bayesian Personalized Ranking (BPR) encourages the predictions of observed interactions to outperform those of unobserved ones, and it is utilized to optimize the recommendation task. It can be written as

$$\displaystyle \begin{aligned} \mathbb{L}_r = \sum_{j\in \mathbb{I}(i), k \notin \mathbb{I}(i)} -\log \sigma (\hat{r}_{i,j} - \hat{r}_{i,k}) + \lambda \lVert \varPhi_r \rVert_2^2, {} \end{aligned} $$
(9.24)

where Φ r represents the model parameters and \(\hat {r}_{i,j} = {\mathbf {u}}_i^\top {\mathbf {v}}_j\) represents the probability that user i is interested in item j. The sigmoid function is denoted as σ(⋅).

Next, the attribute inference task can be regarded as an attribute category classification problem. The cross-entropy loss is then leveraged to optimize the attribute inference task. It can be written as

$$\displaystyle \begin{aligned} \begin{array}{ll} \mathbb{L}_i^u &= -\frac{1}{n_u}\displaystyle\sum_i \sum_{j=1}^{n_p} x_{ij}\log(\hat{x}_{ij}), \\[0.2cm] \mathbb{L}_i^v &= -\frac{1}{n_v}\displaystyle\sum_i \sum_{j=1}^{n_q} y_{ij}\log(\hat{y}_{ij}), \\[0.2cm] \mathbb{L}_i &= \mathbb{L}_i^u + \mathbb{L}_i^v, {} \end{array} \end{aligned} $$
(9.25)

where \(\hat {x}_{i,j} = {\mathbf {u}}_i^\top {\mathbf {p}}_j\) is the inference score of user i on user attribute j, and \(\hat {y}_{i,j} = {\mathbf {v}}_i^\top {\mathbf {q}}_j\) is the inference score of item i on item attribute j.

Finally, the sum of the losses from the two tasks is the overall loss. It can be written as

$$\displaystyle \begin{aligned} \mathbb{L} = \mathbb{L}_r + \gamma \cdot \mathbb{L}_i, {} \end{aligned} $$
(9.26)

where γ is the hyperparameter for balancing the two different losses.
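Putting Eqs. (9.24)–(9.26) together, a minimal numpy sketch of the joint objective for a batch of BPR triplets could look as follows. Applying a sigmoid to the attribute scores before the logarithm is our assumption for numerical validity, and the variable names are also ours.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_loss(U, V, P, Q, triplets, X_true, Y_true, lam=1e-4, gamma=0.5):
    """Joint objective of Eqs. (9.24)-(9.26) for a batch of BPR triplets.

    triplets: rows (i, j, k) - user i interacted with item j, not with item k.
    X_true, Y_true: binary user / item attribute matrices.
    """
    i, j, k = triplets.T
    r_pos = np.sum(U[i] * V[j], axis=1)            # r_hat[i, j] = u_i^T v_j
    r_neg = np.sum(U[i] * V[k], axis=1)
    reg = lam * sum(np.sum(M ** 2) for M in (U, V, P, Q))
    L_r = -np.sum(np.log(sigmoid(r_pos - r_neg))) + reg        # Eq. (9.24)

    eps = 1e-9                                     # numerical safety for log
    X_hat = sigmoid(U @ P.T)                       # scores squashed to (0, 1)
    Y_hat = sigmoid(V @ Q.T)
    L_i = (-np.sum(X_true * np.log(X_hat + eps)) / U.shape[0]  # Eq. (9.25)
           - np.sum(Y_true * np.log(Y_hat + eps)) / V.shape[0])
    return L_r + gamma * L_i                       # Eq. (9.26)
```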

Although in this section we only discuss two instances, i.e., collaborative filtering and attribute inference, the applications of hypergraph computation in recommender systems do not end there. In collaborative filtering, only the historical interaction data are utilized, and the hypergraph is constructed upon the similarity of users/items in behavior space. In attribute inference, the attribute information of users and items is further utilized to solve the cold-start problem. In this case, the hypergraph is constructed based on both behaviors and attributes. In addition to the behavior and attribute data, context data, such as time, location, weather, etc., can also be integrated, and the hypergraph can be applied to model the complex correlations among these data. Also, the user–item network can sometimes be multiplex, that is, there may exist various kinds of interactions between users and items, e.g., a user may view, click, and purchase an item. How to adopt the hypergraph to model such multiplex connections also remains to be explored.

9.3 Sentiment Analysis

The emergence of Twitter and Sina Weibo has given social media users a place to share their thoughts and emotions about particular occurrences. At the same time, this information is rapidly and widely disseminated throughout social networks. Therefore, how to analyze the information in social media becomes an important issue.

First, regarding the sentiment dimension, microblog sentiment research has numerous potential applications in event monitoring, social network analysis, and business recommendation. By analyzing the sentiment of massive data, we can obtain the emotional attitudes of netizens toward relevant events. Second, regarding the temporal dimension, real-time multimedia data may travel quickly and widely throughout the social network, having a significant impact on society. Therefore, efficient real-time event detection can help government organizations with macroeconomic regulation and help large corporations with marketing management.

Twitter data are multi-modal, including text, images, emojis, videos, etc. The high-order associations between different modalities can be well modeled by hypergraphs to extract sentiment information. In the following subsections, we provide two examples of analyzing the sentiment of microblog data along these two dimensions using hypergraphs [5, 6].

9.3.1 Sentiment Prediction

Predicting the multi-modal sentiment of tweets is not an easy task. Most sentiment analysis models focus on textual or visual channels only. However, in human emotional perception, different moods have their own characteristics, so sentiment analysis should be based on multiple perspectives. Even with multi-channel data, it is uncertain whether the emotions of different channels are related. Moreover, there are cases where some channels are missing. To address these problems, a two-layer multi-modal hypergraph learning framework [5] is introduced to perform multi-modal sentiment prediction.

This framework’s objective is to forecast the sentiment of given multi-modal microblog data (e.g., a Weibo tweet) that include text, visuals, and emoticons. The bag-of-textual-words feature \(F_i^{botw}=\{w_i^t, \ldots , w_{m_t}^t\}\) is extracted for the textual modality. The visual modality feature \(F_i^{bovw}=\{w_i^v, \ldots , w_{m_v}^v\}\) is extracted from the i-th image. Furthermore, an emoticon dictionary is defined for the emoticon modality, which forms the bag-of-emoticon-words feature \(F_i^{boew}=\{w_i^e, \ldots , w_{m_e}^e\}\). A corresponding sentiment score \(s_k^t, s_k^v, s_k^e\) is assigned to \(w_k^t, w_k^v, w_k^e\), respectively. Consequently, the tweet x i can be denoted as \(\{F^{botw}_i, F^{bovw}_i, F^{boew}_i \}\). By investigating \(F^{botw}_i, F^{bovw}_i\), and \(F^{boew}_i\) simultaneously, the sentiment of x i can be predicted.

(1) Multi-Modal Hypergraph Learning

To create the incidence matrix of the hypergraph, the correlation between each tweet and the “centroid” tweets of the various modalities is first computed. Each tweet is treated as a vertex, and hyperedges connect each vertex to its k nearest neighbors in each modality. It is important to note that each vertex can in turn be treated as a centroid. The incidence matrix can be defined as

$$\displaystyle \begin{aligned} \mathbf{H}(v_i, e_j) = \left\{ \begin{array}{ll} s(j,i) & \text{if } v_i \in e_j \\ 0 & \text{otherwise } \end{array} \right., \end{aligned} $$
(9.27)

where \(s(j,i)=\exp (-\frac {dist(i,j)^2}{\sigma \hat {d}^2})\) is the correlation between v i and e j, dist(i, j) is the Euclidean distance between v i and the centroid vertex of e j, and \(\hat {d}\) is the average pairwise distance for the corresponding modality. The parameter σ is empirically set to adjust the normalization of the tweet relevance. Each hyperedge’s weight is initialized to 1.
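A small sketch of the soft incidence construction of Eq. (9.27) for one modality is shown below; the function name is ours, and Euclidean distances on raw features are assumed.

```python
import numpy as np

def soft_incidence(F, k, sigma=1.0):
    """Probabilistic incidence matrix of Eq. (9.27) for one modality.

    F: (n, d) tweet features; each tweet j spawns a hyperedge linking j
    (the centroid) and its k nearest neighbors, with Gaussian weights.
    """
    D = np.linalg.norm(F[:, None] - F[None, :], axis=-1)   # pairwise distances
    d_hat = D[np.triu_indices_from(D, 1)].mean()           # average pairwise distance
    H = np.zeros_like(D)
    for j in range(len(F)):
        nbrs = np.argsort(D[j])[:k + 1]                    # centroid j + k neighbors
        H[nbrs, j] = np.exp(-D[j, nbrs] ** 2 / (sigma * d_hat ** 2))
    return H
```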

In multi-modal hypergraph learning (MHG) [5], guided inference is used to perform hypergraph learning. It calculates the relevance scores of tweets with different sentiments by iteratively updating the relevance score vector f and the hyperedge weights W. It accomplishes this objective by optimizing the following loss function:

$$\displaystyle \begin{aligned} \begin{array}{c} \arg \min \limits_{\mathbf{f},\mathbf{W}} \{\varOmega(\mathbf{f}) + \lambda \mathbb{R}_{emp}(\mathbf{f}) + \mu \sum \limits_{i=1}^{n_e} w_i^2\}, \\ \text{s.t. } \sum_{i=1}^{n_e} w_i =1, \end{array} \end{aligned} $$
(9.28)

where f is the learned relevance score vector, Ω(f) is a regularizer built on the normalized hypergraph Laplacian, \(\mathbb {R}_{emp}(\mathbf {f} ) = \|\mathbf{f} - \mathbf{y}\|^2\) denotes the empirical loss, and \(\sum \limits _{i=1}^{n_e}w^2_i\) is an L 2 regularizer on the hyperedge weights. Ω(f) can be formulated as

$$\displaystyle \begin{aligned} \varOmega(\mathbf{f})=\frac{1}{2} \sum \limits_{e \in \mathbb{E}} \sum \limits_{u,v\in\mathbb{V}} \frac{w(e)h(u,e)h(v,e)}{\delta(e)} \left(\frac{\mathbf{f}(u)}{\sqrt{d(u)}}-\frac{\mathbf{f}(v)}{\sqrt{d(v)}}\right)^2, \end{aligned} $$
(9.29)

where \(d(v)=\sum \limits _{e \in \mathbb {E}} w(e)h(v,e)\) denotes the vertex degree and \(\delta (e)=\sum \limits _{v\in \mathbb {V}} h(v,e)\) denotes the hyperedge degree. Let \(\varTheta ={\mathbf {D}}_v^{-1/2}\mathbf {HWD}_e^{-1}{\mathbf {H}}^\top {\mathbf {D}}_v^{-1/2}\) and Δ = I − Θ be the hypergraph Laplacian. The diagonal matrices of d(v) and δ(e) are represented as D v and D e, respectively. The normalized cost function can be expressed as

$$\displaystyle \begin{aligned} \varOmega(\mathbf{f})={\mathbf{f}}^\top\varDelta \mathbf{f}. \end{aligned} $$
(9.30)

The two parameters W and f are optimized iteratively using the following two functions:

$$\displaystyle \begin{aligned} \arg \min \limits_{\mathbf{f}} \varPhi(\mathbf{f}) = \arg \min \limits_{\mathbf{f}} \{{\mathbf{f}}^\top\varDelta \mathbf{f}+ \lambda \|\mathbf{f} - \mathbf{y}\|^2 \} , \end{aligned} $$
(9.31)
$$\displaystyle \begin{aligned} \begin{array}{c} \arg \min \limits_{\mathbf{W}} \varPhi(\mathbf{W}) = \arg \min \limits_{\mathbf{W}} \{{\mathbf{f}}^\top\varDelta \mathbf{f} + \mu \sum \limits_{i=1}^{n_e}w^2_i \}, \\ \text{s.t.}~~\sum \limits_{i=1}^{n_e} w_i =1. \end{array} \end{aligned} $$
(9.32)

As shown above, MHG models the sample–sample relations for hypergraph construction. The properties of the modalities and their relevance to one another, however, are not fully utilized.
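For the f-step of this alternating scheme, setting the gradient of Eq. (9.31) to zero gives the standard closed form \(\mathbf{f} = (\mathbf{I} + \varDelta/\lambda)^{-1}\mathbf{y}\); a one-function sketch (names ours) follows.

```python
import numpy as np

def update_relevance(Delta, y, lam):
    """Closed-form f-step of Eq. (9.31): the gradient 2*Delta@f + 2*lam*(f - y)
    vanishes at f = (I + Delta/lam)^{-1} y (Delta is symmetric PSD)."""
    n = Delta.shape[0]
    return np.linalg.solve(np.eye(n) + Delta / lam, y)
```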

(2) Dual-Layer Multi-Modal Hypergraph Learning

Dual-layer multi-modal hypergraph learning is composed of two hypergraph layers: \(\mathbb {G}_1=(\mathbb {V}_1,\mathbb {E}_1, \mathbf {W})\) for the tweet-level hypergraph and \(\mathbb {G}_2=(\mathbb {V}_2,\mathbb {E}_2, \mathbf {M})\) for the feature-level hypergraph.

To adopt multi-modal features more explicitly and to directly construct multi-modal hypergraphs for modal correlation, each hypergraph layer of dual-layer multi-modal hypergraph learning uses the relations between vertices and hyperedges to represent sample features or feature–sample relations, rather than the sample–sample relations used in MHG.

The sentiment label vector of tweets and the sentiment label vector of multi-modal sentiment words are denoted by y and t in the two hypergraph layers, respectively. Accordingly, f and g are initialized as the relevance score vectors of tweets and of multi-modal features/words, respectively. M can be regarded as the confidence ratings of the sentiment labels y, which correspond to f in the tweet-level hypergraph. The two hypergraph layers are connected, and the multi-modal relevance of features is transferred to the tweet-level hypergraph in order to help predict tweet sentiment.

The probabilistic incidence matrix of a hypergraph is written as

$$\displaystyle \begin{aligned} {\mathbf{H}}_\ast(v_i,e_j)=\left\{ \begin{array}{ll} 1 & \text{if } v_i \in e_j \\ 0 & \text{otherwise} \end{array} \right., \end{aligned} $$
(9.33)

where ∗ denotes either 1 or 2, and the same applies below.

The following loss function can be optimized to represent the learning process:

$$\displaystyle \begin{aligned} \begin{array}{c} \arg \min \limits_{\mathbf{f},\mathbf{g},\mathbf{W}, \mathbf{M}} \{\varOmega_1(\mathbf{f}) + \lambda_1 \mathbb{R}_{emp1}(\mathbf{f}) + \mu_1 \sum \limits_{i}^{n_{e1}} {\mathbf{W}}_i^2 +\varOmega_2(\mathbf{g}) + \lambda_2 \mathbb{R}_{emp2}(\mathbf{g}) + \mu_2 \sum \limits_{i}^{n_{e2}} {\mathbf{M}}_i^2\}, \\ \text{s.t. } \left\{ \begin{array}{l} \sum_{i=1}^{n_{e1}} {\mathbf{W}}_i =1 \\ \sum_{i=1}^{n_{e2}} {\mathbf{M}}_i =1 \end{array} \right., \end{array} \end{aligned} $$
(9.34)

where Ω 1(f) and Ω 2(g) are regularizers based on the normalized Laplacian on hypergraph, \(\mathbb {R}_{emp1}(\mathbf {f})=\left \|\mathbf {f}-\mathbf {y} \circ \mathbf {M} \right \|^2\) and \(\mathbb {R}_{emp2}(\mathbf {g})=\left \|\mathbf {g}-\mathbf {t} \right \|^2\) are the empirical losses, and \(\sum _{i=1}^{n_{e1}} {\mathbf {W}}_i^2\) and \(\sum _{i=1}^{n_{e2}} {\mathbf {M}}_i^2\) are the L 2 regularizers on the hyperedge weights. The regularizers Ω 1(f) and Ω 2(g) are further described as follows:

$$\displaystyle \begin{aligned} \begin{array}{c} \varOmega_1(\mathbf{f})={\mathbf{f}}^\top (\mathbf{I} - {\mathbf{D}}_{v1}^{-1/2} {\mathbf{H}}_1 \mathbf{WD}_{e1}^{-1}{\mathbf{H}}_1^\top{\mathbf{D}}_{v1}^{-1/2}) \mathbf{f}, \\ \varOmega_2(\mathbf{g})={\mathbf{g}}^\top (\mathbf{I} - {\mathbf{D}}_{v2}^{-1/2} {\mathbf{H}}_2 \mathbf{MD}_{e2}^{-1}{\mathbf{H}}_2^\top{\mathbf{D}}_{v2}^{-1/2}) \mathbf{g}. \end{array} \end{aligned} $$
(9.35)

The loss function then has the following form in terms of f, W, g, and M:

$$\displaystyle \begin{aligned} \mathbb L(\mathbf{f} , \mathbf{W}, \mathbf{g}, \mathbf{M}) = & \varOmega_1(\mathbf{f}) + \lambda_1 \mathbb{R}_{emp1}(\mathbf{f}) + \mu_1 \sum \limits_{i}^{n_{e1}} {\mathbf{W}}_i^2 \\ & +\varOmega_2(\mathbf{g}) + \lambda_2 \mathbb{R}_{emp2}(\mathbf{g}) + \mu_2 \sum \limits_{i}^{n_{e2}} {\mathbf{M}}_i^2 \\ & +\eta_1\left(\sum_{i=1}^{n_{e1}} {\mathbf{W}}_i-1\right) +\eta_2\left(\sum_{i=1}^{n_{e2}} {\mathbf{M}}_i-1\right). \end{aligned} $$
(9.36)

To summarize, we introduced a two-layer multi-modal hypergraph learning framework that models correlations among the visual, textual, and emoticon modalities while tolerating missing modalities, so as to achieve sentiment prediction for multi-modal tweets.

9.3.2 Social Event Detection

Social event detection is a crucial social media analysis problem that has received much attention in recent years, yet existing methods pay less attention to the expanding visual content of microblogs and the inter-connectedness of diverse data. Figure 9.7 presents an example of a real-time social event. Event detection on social media platforms is a difficult issue due to the distinctiveness of social media data, for the following reasons. First, because social media posts are noisy and do not individually contain enough substantial material to provide full information, it is necessary to explore sets of posts that are significantly related to one another and discuss a common issue. Second, social media posts come in a variety of multimedia formats and include information such as images, timestamps, locations, user preferences, and social connections in addition to text. Finally, social posts arrive in real time, and such large-scale, real-time data make social events difficult to detect. The hypergraph, due to its natural structural advantages, can establish high-order correlations between data of different posts, different modalities, and different times, thus enabling real-time event detection. In this subsection, we introduce a hypergraph-based method for real-time social event detection. The overall framework is shown in Fig. 9.8.

Fig. 9.7 An example of a real-time social event. (a) Conversational text. (b) Heterogeneous content. (c) Continuously growing real-time data. Parts of this figure are from [6]

Fig. 9.8 Overall framework of the real-time social event detection. This figure is from [6]

(1) Microblog Clique Generation

A microblog clique (MC), which consists of a collection of closely connected microblogs, is constructed as the basic unit rather than a single microblog in order to compensate for the lack of information. The microblogs in an MC cover the same subject within a short time period.

A hypergraph is used to describe the relationships among the heterogeneous data of different microblogs. A set of microblogs is denoted as M = {m 1, m 2, …, m n}. The hypergraph \(\mathbb {G}_H = \{\mathbb {V}, \mathbb {E}, \mathbf {W}\}\) is constructed, where a vertex v represents a microblog and a hyperedge e represents a subset of microblogs. The hyperedge weight is denoted as w(e), and its diagonal matrix is formed as W. To generate hyperedges, the similarity between two microblogs m i and m j is first determined using the following heterogeneous features.

The cosine similarity function is used for computing textual and visual similarities, and the Haversine formula is used for measuring the geographical similarity. The pairwise temporal similarity is calculated by \(s_{TI}(m_i,m_j)=1-\frac {|t_i - t_j|}{\tau }\), where t i and t j are the timestamps of m i and m j, and τ denotes a normalizing constant. The pairwise social similarity is measured as

$$\displaystyle \begin{aligned} s_S(m_i, m_j) = \left \{ \begin{array}{ll} 1, & \text{if } u_i = u_j \\ 0.5, & \text{if } u_i \text{ and } u_j \text{ are linked through the social platform} \\ 0, & \text{otherwise} \end{array} \right., \end{aligned} $$
(9.37)

where u i is the owner of m i.

Two hyperedges are created by connecting each microblog m i with its neighbors in terms of geographical position and time information. For each microblog m i, the top N nearest microblogs in terms of textual information and visual content are also chosen to form hyperedges. Finally, all microblogs of the same user are connected to generate a hyperedge. The incidence matrix, vertex degrees, and hyperedge degrees of the hypergraph are defined in the same way as above.

Next, MCs are generated by dividing microblogs into groups on the same topic through the hypergraph cut approach. Assume S and \(\bar {S}\) are the result of a two-way partition of \(\mathbb {G}_H\); the hypergraph cut can be described as

$$\displaystyle \begin{aligned} \begin{array}{c} \text{Cut}_H(S,\bar{S}):=\sum \limits_{e \in \partial S} w(e) \frac{|e\cap S||e \cap \bar{S}|}{\delta(e)}, \\ \partial S :=\{e\in \mathbb{E} \mid e \cap S \neq \emptyset, e \cap \bar{S} \neq \emptyset\}. \end{array} \end{aligned} $$
(9.38)

The definition of the two-way normalized partition is

$$\displaystyle \begin{aligned} N\text{Cut}_H(S,\bar{S}):=\text{Cut}_H(S,\bar{S})\left(\frac{1}{\text{vol}(S)}+\frac{1}{\text{vol}(\bar{S})}\right), \end{aligned} $$
(9.39)

where the volume of S is denoted by \(\text{vol}(S)=\sum \limits _{v\in S} d(v)\). The normalized cut problem can be relaxed into a real-valued optimization problem, whose solution is given by the eigenvectors corresponding to the smallest non-zero eigenvalues of the hypergraph Laplacian \(\varDelta =\mathbf {I}-{\mathbf {D}}_v^{-1/2}\mathbf {HWD}_e^{-1}{\mathbf {H}}^\top {\mathbf {D}}_v^{-1/2}\). The input microblogs M are split into two groups, and the two-way normalized partitioning is then carried out recursively in each new set until the best partitioning outcome is attained. This best partitioning result is determined by the representation capacity of the various partitions, as measured by the Bayesian Information Criterion (BIC).
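The relaxation just described can be sketched in a few lines: build the hypergraph Laplacian and split vertices by the sign of the eigenvector of the smallest non-zero eigenvalue. The sketch assumes a connected hypergraph; the function names are ours.

```python
import numpy as np

def hypergraph_laplacian(H, w):
    """Delta = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}."""
    de = np.clip(H.sum(axis=0), 1e-9, None)
    dv = np.clip(H @ w, 1e-9, None)
    Dvs = np.diag(dv ** -0.5)
    Theta = Dvs @ H @ np.diag(w) @ np.diag(1.0 / de) @ H.T @ Dvs
    return np.eye(H.shape[0]) - Theta

def two_way_partition(H, w):
    """Relaxed two-way normalized hypergraph cut: split vertices by the sign
    of the eigenvector of the smallest non-zero eigenvalue."""
    vals, vecs = np.linalg.eigh(hypergraph_laplacian(H, w))
    return vecs[:, 1] >= 0            # boolean membership of S vs. S-bar
```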

BIC is used to choose the optimal hypergraph partitioning results. For M = {m 1, m 2, …, m n}, with P = {P 1, P 2, …, P m} as a set of partitions, the BIC score is determined by

$$\displaystyle \begin{aligned} \begin{array}{l} \text{BIC} = \text{llh}(M)-\frac{N_p}{2}\log n ,\\ \text{llh}(M) = \sum \limits_i \left(-\log\left(\sqrt{2\pi}\, \hat{\theta}^{N_p}\right)-\frac{1}{2\hat{\theta}^2}\left\|d(m_i,c_{m_i})\right\|^2 + \log \frac{n_i}{n}\right),\\ \hat{\theta}^2=\frac{1}{n-m}\sum \limits_i d(m_i,c_{m_i})^2, \end{array} \end{aligned} $$
(9.40)

where N p represents the number of parameters, i.e., the dimension of the microblog features, n is the number of microblogs, and n i is the size of the partition that m i belongs to.

After the given microblogs are divided into a group of MCs, the MCs offer more sensible information than individual microblogs by examining collections of strongly correlated microblogs, which can express more meaningful and pertinent material in the subsequent event detection procedure.

(2) Detection of Social Events in Real Time

Event Detection by Using MC

For MC = {MC1, …, MCp} and the corresponding microblogs M = {m 1, …, m n}, there are two observations as follows. First, microblogs inside a single MC frequently refer to the same event (MC cues). Second, MCs with similar features tend to be associated with the same event (smoothness cues).

If a microblog is integrated into an MC, it is connected to the MC to impose MC cues. In order to enforce smoothness cues, pairwise MCs that are close to one another in the feature space are connected. Formally, a bipartite graph \(\mathbb {G}_{\mathbb {B}} = \{X, Y, B\}\) is used to express MC and M, with two vertex sets X and Y , where X := MC ∪M and Y := MC, with |X| = |MC| + |M| and |Y | = |MC| vertices, respectively. The across-affinity matrix B between X and Y  is defined as follows:

$$\displaystyle \begin{aligned} B_{ij}=\left\{ \begin{array}{ll} \eta, & \text{if } x_i \in \text{ M, } x_i\in y_j, y_j\in \text{ MC} \\ e^{-\gamma d_{ij}}, & \text{if } x_i \in \text{ MC, } y_j\in \text{ MC} \\ 0, & \text{otherwise} \end{array} \right., \end{aligned} $$
(9.41)

where d ij is the distance between two MCs, and η and γ are the two parameters that balance the inner-MC correlation and the between-MC smoothness.

The bipartite graph \(\mathbb {G}_{\mathbb {B}}\) and the required number of partitions K are used as the basis for the transfer cut method to partition MCs. First, assume \(\mathbb {G}_{\mathbb {B}\mathbb {Y}}=\{Y, {\mathbf {W}}_Y\}\) contains only the vertices of the MCs. L Y = D Y −W Y is the graph Laplacian of \(\mathbb {G}_{\mathbb {B}\mathbb {Y}}\), where \({\mathbf {D}}_Y = \text{diag}({\mathbf {B}}^\top \mathbf {1})\) and \({\mathbf {W}}_Y={\mathbf {B}}^\top {\mathbf {D}}_X^{-1} \mathbf {B}\). Assume that \(\{\lambda _i, {\mathbf {v}}_i\}_1^K\) are the K smallest eigenpairs of L Y. The corresponding eigenpairs \(\{\xi _i, {\mathbf {f}}_i\}_1^K\) of \(\mathbb {G}_{\mathbb {B}}\) can be calculated as

$$\displaystyle \begin{aligned} \begin{array}{c} 0\leq \xi_i\leq 1, \xi_i(2-\xi_i)=\lambda_i, \\ {\mathbf{u}}_i=\frac{1}{1-\xi_i}\mathbf{Qv}_i, {\mathbf{f}}_i=({\mathbf{u}}_i^\top, {\mathbf{v}}_i^\top)^\top, \end{array} \end{aligned} $$
(9.42)

where \(\mathbf {Q}={\mathbf {D}}^{-1}_{\mathbf {X}}\mathbf {B}\) is the corresponding transition probability matrix from X to Y .
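A sketch of this transfer step is given below, assuming the generalized eigenproblem \(L_Y \mathbf{v} = \lambda D_Y \mathbf{v}\) as in the transfer cut method; the function name and the clipping constants are ours.

```python
import numpy as np
from scipy.linalg import eigh

def transfer_cut(B, K):
    """Lift the K smallest eigenpairs of L_Y to the bipartite graph (Eq. 9.42).

    B: (|X|, |Y|) across-affinity matrix; returns the stacked vectors f_i.
    """
    dx = np.clip(B.sum(axis=1), 1e-9, None)
    Q = B / dx[:, None]                        # Q = D_X^{-1} B
    W_Y = B.T @ Q                              # W_Y = B^T D_X^{-1} B
    D_Y = np.diag(np.clip(B.sum(axis=0), 1e-9, None))
    lam, V = eigh(D_Y - W_Y, D_Y)              # generalized eigenpairs, ascending
    lam, V = lam[:K], V[:, :K]
    xi = 1.0 - np.sqrt(np.clip(1.0 - lam, 1e-12, None))   # xi(2 - xi) = lambda
    U = (Q @ V) / (1.0 - xi)                   # u_i = Q v_i / (1 - xi_i)
    return np.vstack([U, V])                   # f_i = (u_i^T, v_i^T)^T
```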

Second, {f 1, …, f K} are clustered by K-way spectral clustering, and the best K is selected by BIC. Assume that K 0 is the count of existing events, initialized to 0. Furthermore, suppose that the number of events in the incoming data is not larger than \(K_0 + n_{\text{new}}/t_m\), where the threshold t m is the minimum number of microblogs per event. Therefore, the bipartite graph is segmented \(n_{\text{new}}/t_m + 1\) times, and the segmentation result is selected as the event detection result using BIC. Suppose {Γ 1, …, Γ K} are the K events detected in the last run. The key MCs are found by MC selection for each Γ i, and the importance of each MC is measured. Finally, the top n sMC MCs are selected to describe each Γ i.

Detection of Incremental Social Events

The real-time detection method is defined as follows. Assume that event detection is run at time t 0, with generated MCs, i.e., MC = {MC1, …, MCp}, detected events {Γ 1, …, Γ q}, and noisy data. New data arrive continuously from moment t 0 and are processed at short time intervals Δ t. In other words, event detection can be run at every t 0 + x × Δ t, where x = 1, 2, …. In this instance, t 0 + Δ t is used as an example, and M new stands for the newly arrived microblogs. Event detection consists of two steps: MC generation and event partition.

To generate new MCs, the MCs from previous time periods, \(MC^\ast = \{MC^\ast _1, MC^\ast _2, \ldots , MC^\ast _{n_e} \}\), are used as known samples. MC ∗ and M new are used to construct the incremental microblog hypergraph \(\mathbb {G}_H^{t_0+\varDelta _t}\). However, this is challenging because an MC, which is a collection of microblogs, differs in granularity from a single microblog. Therefore, only the three most representative microblogs of each MC are chosen, based on the number of retweets and comments, so that no more than 3n e representative microblogs are selected. They are merged with M new to create the incremental microblog hypergraph \(\mathbb {G}_H^{t_0+\varDelta _t}\). New MCs are then created from these data using the hypergraph partition and are combined with MC ∗ based on the representative microblogs. In this way, \(n_{\text{MC}_{\text{new}}}\) new MCs (MCnew) are constructed and utilized for event detection.

For detection in real time, the past events Γ = {Γ 1, …, Γ K} are used as known data in the current time period. The corresponding representative MCs in Γ and the generated incremental MCnew are used to jointly construct the new bipartite graph. The difference is that, for the identified events, the distance between MCs belonging to the same event is set to 0 as follows:

$$\displaystyle \begin{aligned} d_{ij} = 0, \quad \text{if } \mathrm{MC}_i \in \varGamma_k \text{ and } \mathrm{MC}_j \in \varGamma_k, \end{aligned} $$
(9.43)

where k = 1, 2, …, K. Therefore, according to the BIC, the bipartite graph can be partitioned into existing events and new events.

There are still several challenging problems in hypergraph computation for sentiment analysis tasks that deserve further research. First, for the sentiment recognition task, the case of conflicting multi-modal information can be considered. Second, for the real-time social event detection task, further consideration can be given to the information that may be hidden in incomplete posts and user profiles. These tasks take into account the positive or negative associations among multiple entities, and the hypergraph is suitable for modeling such correlations.

9.4 Emotion Recognition

Emotion recognition has gained wide attention in neuroscience and psychology research [11], and artificial intelligence offers more reliable and accurate computational models for the identification and study of emotions. It has also been extensively applied in real life [12], especially in human–computer interaction, motor vehicle driving assistance training, emotion classification in movies, and other related areas [13].

Emotion recognition has three main goals [14]: first, to enable intelligent systems to understand, infer, and recognize human emotions; second, to make it possible for systems to produce human-like expressions of emotion in response to stimuli (e.g., conversational agents or robots); and third, to make it possible for intelligent systems to actually perceive emotions. Over the past three decades, researchers from several disciplines have pursued these three goals in different ways, with the method of recognizing emotions as the central research issue. Although it has been studied for many years, progress is still being made. In reality, there are various ways for people to convey their emotions, including language, gestures, facial expressions, and physiological signals [15]. Finding a suitable method to identify and analyze human emotions may be a long-term problem. The first three modalities are governed by human volition, and there are substantial individual variances [16]. Because of this, approaches based on these three modalities have limitations in terms of accuracy and reliability. In contrast, physiological signals cannot be readily blocked or concealed, are simultaneously governed by the body’s neurological and hormonal systems, and are often independent of human will. Therefore, physiological signals, rather than visual or auditory cues, may offer more accurate information about emotions [17]. A multitude of environmental and psychological elements, including interests and personality, can have an impact on human emotion, which is a highly subjective phenomenon.

Nonetheless, because of the following factors, recognizing emotions through physiological signals is still a work in progress:

  • Existence of the emotional gap and ambiguity in the concept of emotions [18]

  • Potential associations between modality and subject [19]

  • Specificity of the stimulus response (SR) and individual response (IR) [20]

  • Noise and incompleteness in the data [21]

  • Multifactorial influences [22]

In this case, the hypergraph structure allows the establishment of complex correlations that simultaneously take into account: (a) correlations between EEG, EOG, and EMG signals, which are signals from several modalities; (b) correlations between subjects; and (c) patterns of physiological signal changes in a single subject in response to various stimuli. Two methods for emotion prediction using hypergraph computation are presented: multi-modal vertex-weighted hypergraph learning (MVHL) [7, 8] and multi-hypergraph neural networks (MHGNN) [9].

(1) Multi-Modal Vertex-Weighted Hypergraph Learning

Hypergraphs have been used to depict the link between physiological data and personality [7]. Building on this, MVHL introduces a multi-modal vertex-weighted hypergraph learning method for personalized emotion recognition (PER) that takes vertex weights, hyperedge weights, and modality weights into account. Each vertex in this method is a compound tuple (subject, stimulus). A hypergraph structure is used to model personality correlations between subjects and physiological correlations between the corresponding stimuli. The weights of each vertex, each hyperedge, and the various hypergraphs are learned automatically. Hyperedge weights are used to create the optimal representation, while vertex weights describe the impact of different samples and modalities in the learning process. The factors calculated on the multi-modal vertex-weighted hypergraph, known as emotion relevance, are employed for emotion recognition. Because each vertex is a compound tuple incorporating data from various subjects, MVHL can recognize the emotions of multiple subjects at once.

The framework of this model is as follows. First, compound vertices (subject, stimulus) are formed from the subjects and the stimuli used to elicit their emotions. Second, multi-modal hyperedges are constructed to model personality associations among different subjects and physiological associations among the corresponding stimuli. Finally, after joint learning on the vertex-weighted multi-modal multi-task hypergraphs, PER results are obtained.

Hypergraph Construction

This model constructs the hypergraph structure from pairwise similarities between samples. The similarity between the personalities of \(u_i\) and \(u_j\) is measured by the cosine function:

$$\displaystyle \begin{aligned} s_{PER}(u_i, u_j) = \frac{\langle{\mathbf{p}}_i,{\mathbf{p}}_j\rangle}{\|{\mathbf{p}}_i\|\cdot\|{\mathbf{p}}_j\|}, \end{aligned} $$
(9.44)

where the personality vector of \(u_i\) is denoted by \({\mathbf{p}}_i\). Each vertex is selected as the centroid in turn, and a hyperedge is built to link the centroid to its K nearest neighbors in the representation space. It should be noted that personalized hyperedges are built from both intra- and inter-subject viewpoints. One hyperedge links all the vertices from the same subject. Additionally, based on personality similarities, the K subjects closest to each subject are chosen, and all of their vertices are connected by another hyperedge.
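As an illustration of this construction, the following sketch (hypothetical code, not the authors' implementation) builds both the intra-subject hyperedges and the personality-based inter-subject hyperedges from Eq. (9.44):

```python
import numpy as np

def personality_hyperedges(P, subject_of_vertex, K=3):
    """P: (N, d) matrix of personality vectors; subject_of_vertex: list
    mapping each vertex index to its subject index. Returns hyperedges
    as sets of vertex indices."""
    N = P.shape[0]
    Pn = P / np.linalg.norm(P, axis=1, keepdims=True)
    S = Pn @ Pn.T                                # pairwise cosine similarity, Eq. (9.44)
    vertices_of = [np.where(np.array(subject_of_vertex) == i)[0]
                   for i in range(N)]
    edges = [set(v) for v in vertices_of]        # intra-subject hyperedges
    for i in range(N):
        nearest = np.argsort(-S[i])[1:K + 1]     # K most similar other subjects
        e = set(vertices_of[i])
        for j in nearest:
            e |= set(vertices_of[j])
        edges.append(e)                          # inter-subject hyperedge
    return edges
```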

Assume that the constructed hypergraphs are \(\mathbb {G}_m=(\mathbb {V}_m, \mathbb {E}_m, {\mathbf {W}}_m)\), where \(\mathbb {V}_m\) and \(\mathbb {E}_m\) denote the vertex set and hyperedge set, respectively, and \({\mathbf {W}}_m\) is the diagonal hyperedge weight matrix of the m-th hypergraph (\(m = 1, 2, \ldots, M\)). The incidence matrix \({\mathbf {H}}_m\) can be computed as

$$\displaystyle \begin{aligned} {\mathbf{H}}_m(v,e)=\left\{ \begin{array}{cc} 1, & \text{if } v \in e \\ 0, & \text{if } v \notin e \end{array} \right.. \end{aligned} $$
(9.45)

The weights of the vertices are learned to evaluate their value and contribution to the learning process, which is distinct from the classic hypergraph learning method, where all vertices are simply treated equally. Let \({\mathbf {U}}_m\) be the diagonal matrix of vertex weights. The vertex degree and the hyperedge degree are defined as \(d_m(v)=\sum \limits _{e \in \mathbb {E}_m} {\mathbf {W}}_m(e){\mathbf {H}}_m(v,e)\) and \(\delta_m(e)=\sum \limits _{v \in \mathbb {V}_m} {\mathbf {U}}_m(v){\mathbf {H}}_m(v,e)\), respectively. Accordingly, the two diagonal degree matrices are defined as \({\mathbf {D}}_m^v(i,i)=d_m(v_i)\) and \({\mathbf {D}}_m^e(i,i)=\delta _m(e_i)\).
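For concreteness, here is a minimal sketch of these degree computations, assuming the hyperedge weights \({\mathbf{W}}_m\) and vertex weights \({\mathbf{U}}_m\) are stored as 1-D arrays rather than diagonal matrices:

```python
import numpy as np

def degree_matrices(H, w, u):
    """H: (|V|, |E|) incidence matrix; w: hyperedge weight vector;
    u: vertex weight vector. Returns the diagonal degree matrices."""
    d_v = H @ w            # d(v)     = sum_e W(e) H(v, e)
    d_e = H.T @ u          # delta(e) = sum_v U(v) H(v, e)
    return np.diag(d_v), np.diag(d_e)
```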

Multi-Modal Vertex-Weighted Hypergraph Learning

The goal is to simultaneously learn the correlations among the involved physiological signals and the personality relations across subjects. The framework of the multi-modal vertex-weighted hypergraph learning is presented in Fig. 9.9. Given N subjects \(u_1, \ldots, u_N\) and the involved stimuli \(s_{ij}\) \((j = 1, \ldots, n_i)\) for \(u_i\), we assume that the compound vertices and associated labels of the c-th emotion category are \(\{(u_1, s_{1j})\}^{n_1}_{j=1}, \ldots , \{(u_N, s_{Nj})\}^{n_N}_{j=1}\) and \({\mathbf {y}}_{1c}=[y^c_{11}, \ldots , y^c_{1n_1}]^\top , \ldots , {\mathbf {y}}_{Nc}=[y^c_{N1}, \ldots , y^c_{Nn_N}]^\top \), where \(c = 1, \ldots, n_e\).

Fig. 9.9

Overall framework of the multi-modal vertex-weighted hypergraph learning: compound vertices are generated from subjects and stimuli, multi-modal hyperedges are constructed, and vertex-weighted multi-modal hypergraph learning produces personalized emotion results. This figure is from [7]

The number of emotion categories is denoted as \(n_e\). The estimated values of all stimuli associated with the specified users for the c-th emotion category, also known as emotion relevance, are given by \({\mathbf {r}}_{1c}=[r_{11}^c, \ldots , r_{1n_1}^c]^\top , \ldots , {\mathbf {r}}_{Nc}=[r_{N1}^c, \ldots , r_{Nn_N}^c]^\top \). \({\mathbf{y}}_c\) and \({\mathbf{r}}_c\) are denoted by

$$\displaystyle \begin{aligned} {\mathbf{y}}_{c}=[{\mathbf{y}}_{1c}^\top, \ldots, {\mathbf{y}}_{Nc}^\top]^\top, {\mathbf{r}}_{c}=[{\mathbf{r}}_{1c}^\top, \ldots, {\mathbf{r}}_{Nc}^\top]^\top. \end{aligned} $$
(9.46)

Let \(\mathbf {Y}=[{\mathbf {y}}_1, \ldots , {\mathbf {y}}_c, \ldots , {\mathbf {y}}_{n_e}]\) and \(\mathbf {R} = [{\mathbf {r}}_1, \ldots , {\mathbf {r}}_c, \ldots , {\mathbf {r}}_{n_e}]\), and let λ and η denote the two trade-off parameters of the overall learning objective. The regularizer on the hypergraph structure is defined as follows:

$$\displaystyle \begin{aligned} \varPsi (\mathbf{R, W, U}, \boldsymbol{\alpha}) =\sum \limits_{c=1}^{n_e}{\mathbf{r}}_c^\top \sum \limits_{m=1}^M \alpha_m({\mathbf{U}}_m-\varTheta_m){\mathbf{r}}_c, \end{aligned} $$
(9.47)

where \(\varTheta _m = ({\mathbf {D}}_m^v)^{-1/2} {\mathbf {U}}_m {\mathbf {H}}_m {\mathbf {W}}_m ({\mathbf {D}}_m^e)^{-1} {\mathbf {H}}_m^\top {\mathbf {U}}_m ({\mathbf {D}}_m^v)^{-1/2}\). Then, \(\varDelta = \sum \limits ^M_{m=1} \alpha _m ({\mathbf {U}}_m-\varTheta _m)\) can be seen as the fused hypergraph Laplacian with vertex weighting; a sketch of its computation follows.
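The following is a small sketch of computing this fused vertex-weighted Laplacian, under the same 1-D weight-vector assumption as in the earlier degree sketch:

```python
import numpy as np

def fused_laplacian(Hs, ws, us, alphas):
    """Hs: list of incidence matrices per modality; ws, us: hyperedge
    and vertex weight vectors per modality; alphas: modality weights.
    Returns Delta = sum_m alpha_m (U_m - Theta_m)."""
    Delta = None
    for H, w, u, a in zip(Hs, ws, us, alphas):
        dv = H @ w                                  # vertex degrees
        de = H.T @ u                                # hyperedge degrees
        Dv_isqrt = np.diag(1.0 / np.sqrt(dv))
        De_inv = np.diag(1.0 / de)
        U, W = np.diag(u), np.diag(w)
        # Theta_m = Dv^{-1/2} U H W De^{-1} H^T U Dv^{-1/2}
        Theta = Dv_isqrt @ U @ H @ W @ De_inv @ H.T @ U @ Dv_isqrt
        term = a * (U - Theta)
        Delta = term if Delta is None else Delta + term
    return Delta
```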

(2) Multi-Hypergraph Neural Networks

Multi-hypergraph neural networks (MHGNN) use hypergraphs to build complex correlations and identify emotions from physiological signals, taking into account: (a) correlations between signals of various modalities, i.e., EEG, EOG, and EMG; (b) relationships between subjects; and (c) patterns of physiological signal changes in a single person in response to various stimuli. This model groups each subject and its stimuli into a compound tuple, which serves as a vertex in the hypergraph. A hypergraph is generated for each modality from the corresponding physiological signals, with hyperedges expressing the correlations among the signals in response to various stimuli. The vertices are then classified within the MHGNN framework according to the intricate relationships in the data, so that the classification of vertices in the various hypergraphs is equivalent to the recognition of emotions. The different hypergraph neural networks are combined using a fully connected network, which also takes the relative relevance of the multi-modal physiological signals into account when classifying emotions. The primary benefit of this framework is its ability to fuse multi-modal data and to represent the three intricate relationships above. Figure 9.10 shows the pipeline of the MHGNN framework.

Fig. 9.10

The pipeline of multi-hypergraph neural networks: each vertex is a compound (subject, stimuli) tuple, from which multi-modal information is extracted and fed into multiple hypergraph neural networks

Modeling of Multi-Hypergraph

Given features from various physiological signals, subject correlations are formulated using a multi-hypergraph structure, with each modality represented by a separate hypergraph. Each vertex of a hypergraph represents a subject to be learned, described by its corresponding stimuli, and the connections between vertices are constructed using hyperedges. The k-NN method is used to generate the hypergraphs, where k is a hyperparameter controlling the connectivity: each vertex is chosen as a centroid exactly once, and the hyperedges are complete after all vertices have acted as the centroid. We assume that \(S = \{S_1, S_2, \ldots, S_n\}\) is a training set with modality i's features \({\mathbf {X}}^{(i)} = \{{\mathbf {x}}^{(i)}_1 , {\mathbf {x}}^{(i)}_2 ,\ldots , {\mathbf {x}}^{(i)}_n\}\), where vector \({\mathbf {x}}^{(i)}_j\) is the feature of the j-th training sample \(S_j\) from modality i. Following the k-NN approach, the hyperedge \(e_p\) is centered on the vertex \(v_p\) and links \(v_p\) with its k nearest vertices, where the distance between two vertices is the Euclidean distance between the corresponding feature vectors. The correlation between vertex p and vertex q is represented by the matrix element \(h_{p,q}\). As an exponential function of the Euclidean distance, the correlation can be described as

$$\displaystyle \begin{aligned} h_{p,q}^{(i)}=\left\{ \begin{array}{ll} \exp\left(-\frac{d\left({\mathbf{x}}_p^{(i)}, {\mathbf{x}}_q^{(i)}\right)^2}{\bar{d}^2}\right), & v_q \in e_p \\ 0, & v_q \notin e_p \end{array} \right., \end{aligned} $$
(9.48)

where \(d({\mathbf {x}}^{(i)}_p , {\mathbf {x}}^{(i)}_q)\) stands for the Euclidean distance between samples p and q in the feature space and \(\bar{d}\) is a normalizing distance (e.g., the average pairwise distance between samples). The weight matrix \({\mathbf {W}}^{(i)}\) is set to the identity matrix in this model because no prior knowledge regarding the significance of hyperedges is available. As a result, the incidence matrix \({\mathbf {H}}^{(i)}\) contains all the information of the hypergraph.

An incidence matrix \({\mathbf{H}}^{(i)}\) is generated for each modality in this way, yielding m incidence matrices for m modalities, as sketched below.
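A minimal sketch of this per-modality construction, following Eq. (9.48) and using the mean pairwise distance for \(\bar{d}\) (an assumption of this sketch):

```python
import numpy as np

def build_incidence(X, k=10):
    """X: (n, d) feature matrix of one modality. Returns H of shape
    (n, n), where column p is the hyperedge centered at vertex p."""
    n = X.shape[0]
    # full pairwise Euclidean distance matrix
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d_bar = dists.mean()                           # normalizer (includes zero self-distances)
    H = np.zeros((n, n))
    for p in range(n):
        neighbors = np.argsort(dists[p])[:k + 1]   # k nearest vertices plus p itself
        H[neighbors, p] = np.exp(-dists[p, neighbors] ** 2 / d_bar ** 2)
    return H
```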

Multi-Hypergraph Convolutional Networks

The creation of subject representations and the subsequent emotion classification are crucial steps in emotion recognition. Deep neural networks have made significant progress in data representation in recent years. However, given the intricacy of data correlations, this is still a work in progress. In order to represent the data and recognize emotions, a multi-hypergraph convolutional network framework is developed that can simultaneously take into account several physiological inputs from different subjects.

In a hypergraph convolutional network, the spatial convolution is viewed from the perspective of graph spectral theory as a spectral matrix product, and the hypergraph Laplacian Δ is leveraged to convert it from the spatial domain to the spectral domain. Δ can be formulated as \(\varDelta = \mathbf {I} - {\mathbf {D}}^{-1/2}_v \mathbf {HWD}^{-1}_e {\mathbf {H}}^{\top } {\mathbf {D}}^{-1/2}_v\), where D e and D v are the matrices of hyperedge degree and vertex degree, respectively. In this case, it is possible to formulate a hypergraph convolutional layer for each modality as

$$\displaystyle \begin{aligned} {\mathbf{X}}^{(i)}_{(l+1)}=\sigma\left({{\mathbf{D}}^{(i)-1/2}_v {\mathbf{H}}^{(i)} {\mathbf{W}}^{(i)} {\mathbf{D}}^{(i)-1}_e {\mathbf{H}}^{(i)\top} {\mathbf{D}}^{(i)-1/2}_v {\mathbf{X}}^{(i)}_{(l)} \boldsymbol{\varTheta}^{(i)}_{(l)}}\right), \end{aligned} $$
(9.49)

where \(\boldsymbol {\varTheta }^{(i)}_{(l)}\) is the learnable parameter of the l-th layer in the i-th hypergraph neural network (HGNN) and σ is the activation function. During training, the parameters \(\boldsymbol{\varTheta}^{(i)}\) are updated by backpropagation through the features \({\mathbf{X}}^{(i)}\). The hypergraph structure-related term \({\mathbf {D}}^{(i)-1/2}_v {\mathbf {H}}^{(i)} {\mathbf {W}}^{(i)} {\mathbf {D}}^{(i)-1}_e {\mathbf {H}}^{(i)\top } {\mathbf {D}}^{(i)-1/2}_v\) is pre-computed and is not trainable in this procedure. For simplification, this term is denoted by \({\mathbf {A}}^{(i)}_h\), and the hypergraph convolutional layer can be rewritten as

$$\displaystyle \begin{aligned} {\mathbf{X}}^{(i)}_{(l+1)} = \sigma\left({\mathbf{A}}_h^{(i)} {\mathbf{X}}^{(i)}_{(l)} \boldsymbol{\varTheta} ^{(i)}_{(l)}\right). \end{aligned} $$
(9.50)

It is important to note that the formulations of graph convolution and hypergraph convolution are similar. The graph convolution is as follows:

$$\displaystyle \begin{aligned} {\mathbf{X}}^{(i)}_{(l+1)} = \sigma\left({\mathbf{D}}^{(i)-1/2} {\mathbf{A}}^{(i)} {\mathbf{D}}^{(i)-1/2} {\mathbf{X}}^{(i)}_{(l)} \boldsymbol{\varTheta}^{(i)}_{(l)}\right). \end{aligned} $$
(9.51)
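As a concrete illustration, the following is a minimal PyTorch sketch of the hypergraph convolutional layer in Eq. (9.50), with the structural term \({\mathbf{A}}_h\) precomputed and fixed and only Θ trainable; it is an illustrative sketch, not the reference implementation:

```python
import torch
import torch.nn as nn

class HGNNConv(nn.Module):
    """One hypergraph convolutional layer, Eq. (9.50), with sigma = ReLU."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)  # Theta

    @staticmethod
    def precompute_Ah(H, w=None):
        """A_h = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}; W defaults to I."""
        w = torch.ones(H.shape[1]) if w is None else w
        dv = H @ w                     # vertex degrees
        de = H.sum(dim=0)              # hyperedge degrees
        Dv_isqrt = torch.diag(dv.pow(-0.5))
        return Dv_isqrt @ H @ torch.diag(w) @ torch.diag(1.0 / de) \
               @ H.t() @ Dv_isqrt

    def forward(self, Ah, X):
        # Ah is pre-computed and non-trainable; only theta is learned
        return torch.relu(Ah @ self.theta(X))
```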

In traditional single hypergraph neural network models, hyperedges built from the characteristics of several modalities are concatenated. However, because of their distinct sizes and dimensions, such hyperedges can be inconsistent. Additionally, the various modalities may approach the task from different perspectives: some may be crucial, while others may be less important. Such discrepancies cannot be captured in a single hypergraph model with identical weights, and simply concatenating distinct hyperedges makes it difficult to weight them specifically. A multi-hypergraph neural network structure is therefore introduced to integrate multiple hypergraph structures and address this issue.

To calculate intermediate representations for each modality, m hypergraph neural network models are built using m hypergraphs for m modalities. The K-layer i-th hypergraph neural network may be expressed as follows:

$$\displaystyle \begin{aligned} HGNN({\mathbf{H}}^{(i)}, {\mathbf{X}}^{(i)}) = \sigma^{(i)}_K \left({\mathbf{A}}^{(i)}_h (\cdots \sigma^{(i)}_1 ({\mathbf{A}}^{(i)}_h {\mathbf{X}}^{(i)} \boldsymbol{\varTheta}^{(i)}_1) \cdots) \boldsymbol{\varTheta}^{(i)}_K\right). \end{aligned} $$
(9.52)

The final output is then generated from the m intermediate representations by a fully connected layer. Acting as a fusion layer, it dynamically combines the outcomes of the hypergraph convolutions and weights them according to their contributions. A softmax layer serves as the classifier. Modality characteristics of various sizes and dimensions are learned in the layers of the networks with their diverse hypergraph structures, and they are finally weighted automatically and merged in the fusion layer.

\({\mathbf{W}}_f\) and \({\mathbf{b}}_f\) stand for the weights and bias of the fusion layer, respectively. The model can be expressed as follows:

$$\displaystyle \begin{aligned} MHGNN({\mathbf{X}}^{(1)}, {\mathbf{X}}^{(2)}, \ldots, {\mathbf{X}}^{(m)}) = & softmax\left({\mathbf{W}}_f {\mathbf{W}}_m [HGNN({\mathbf{H}}^{(1)},{\mathbf{X}}^{(1)}), \right.\\ & HGNN({\mathbf{H}}^{(2)},{\mathbf{X}}^{(2)}), \ldots, \\ &\left.HGNN({\mathbf{H}}^{(m)},{\mathbf{X}}^{(m)})]+{\mathbf{b}}_f\right), \end{aligned} $$
(9.53)

where the matrix of modality weights is denoted by \({\mathbf {W}}_m = \mathrm{Diag}\left({\mathbf {w}}^{(1)},{\mathbf {w}}^{(2)},\ldots,{\mathbf {w}}^{(m)}\right)\).
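To make the fusion concrete, the following PyTorch sketch combines m modality-specific HGNN branches as in Eq. (9.53). It simplifies the modality weights \({\mathbf{W}}_m\) to one learnable scalar per modality, an assumption of this sketch:

```python
import torch
import torch.nn as nn

class MHGNN(nn.Module):
    """Fuse m HGNN branches (e.g., stacks of HGNNConv) into one classifier."""
    def __init__(self, branches, branch_dim, n_classes):
        super().__init__()
        self.branches = nn.ModuleList(branches)        # one HGNN per modality
        m = len(branches)
        self.modality_w = nn.Parameter(torch.ones(m))  # simplified W_m
        self.fusion = nn.Linear(m * branch_dim, n_classes)  # W_f, b_f

    def forward(self, Ahs, Xs):
        # run each modality branch and scale by its learned weight
        outs = [w * branch(Ah, X) for w, branch, Ah, X
                in zip(self.modality_w, self.branches, Ahs, Xs)]
        z = torch.cat(outs, dim=1)                     # concatenated representations
        return torch.softmax(self.fusion(z), dim=1)    # softmax classifier
```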

By examining the experimental results with the help of the hypergraph network structure, the discovered patterns were found to reflect a pair of interconnected and mutually reinforcing interdisciplinary concerns. Another intriguing observation in the experiments was the variation in each subject's physiological characteristics. Therefore, future work should: (a) collect data according to the requirements of real application scenarios; (b) pay attention to individual differences; (c) analyze correlations between the subjects of training and test samples; and (d) incorporate additional information such as action recognition results. Hypergraphs are considered a good tool for discovering the biological patterns among them.

9.5 Summary

In this chapter, to illustrate the paradigm of using hypergraph computation in social media analysis, we overview three applications, i.e., recommender systems, sentiment analysis, and emotion recognition. In recommender systems, we discuss two specific applications: collaborative filtering and attribute inference. Collaborative filtering considers only the raw user–item network, and the hypergraph is used to model the inter- and intra-domain (user or item) correlations in the behavior space. Attribute inference further takes attribute information into consideration in addition to the historical interactions. Besides, context information such as time and location can also be integrated, which is left to future exploration. In sentiment analysis, sentiment prediction and social event detection are covered. The former task mainly concerns the sentiment conveyed by each multi-modal tweet, while the latter focuses on discovering groups of posts that are closely connected and cover the same subjects. Furthermore, recognizing people's emotions through multi-modal physiological signals is also presented. There are still many social media analysis applications worth exploring with hypergraph computation. For example, heterogeneous correlations widely exist in the social media context, and how to utilize the complementary information among these heterogeneous associations with hypergraph computation is a key issue. Besides, social media data are dynamic rather than static, and newly arriving data may follow different distributions from the existing data. Under such circumstances, static hypergraph computation methods cannot be directly applied, and dynamic hypergraph computation paradigms deserve to be investigated to solve this complex issue.