1 Introduction

Many consumers are now accustomed to purchasing goods on the Internet. Because they cannot inspect the actual products, they search for product reviews before purchasing and use them as a reference for their purchasing decisions. After purchasing a product, consumers also rate it or share their comments online. These reviews help other users to better understand particular products and make it easier for them to decide what to buy. Businesses use online reviews to learn what consumers think about their products or about the companies themselves, so that they can improve the products or services they offer. However, the huge amount of review data accumulated on websites over time has resulted in information overload, and users are often unable to efficiently find the precise information they need.

Many scholars have proposed various recommender systems to solve the problem of information overload. Such systems analyze users’ interests and purchase behaviors, such as ratings, search history, and purchase history, and then use probability analysis or machine learning methods to recommend products that users may be interested in. Common recommender system techniques include collaborative filtering [1,2,3,4] and content-based filtering [5, 6]. Traditional recommendation methods analyze user preferences based on users’ or items’ ratings [7]. However, ratings alone do not provide in-depth insights into user preferences. Because of limitations such as the cold-start, sparsity, and scalability problems, the traditional methods are also unsuitable for analyzing huge amounts of data.

Besides ratings, analyzing users’ reviews and extracting various features from them to represent user preferences can improve the accuracy of predictions in recommendation methods. To analyze user preferences more accurately, methods such as Latent Dirichlet Allocation (LDA) [8, 9] and machine learning methods [10, 11] can be used to analyze the topics (i.e., aspects) contained in the content of reviews. Accordingly, some recommendation methods make recommendations based on the implicit aspects and sentiments in the reviews [11, 12]. Notably, users use emotional vocabulary in their reviews to express their personal preferences. Proios et al. [13] analyzed the aspects and polarities in users’ reviews to build profiles of users and merchants. Zhang et al. [14] used a supervised approach to classify the aspects in user reviews into three different polarities (positive, neutral, and negative) and applied them to the recommendation method. Furthermore, Jo and Oh [9] used a class neural network to analyze aspect and sentiment vocabulary and predict user preferences. The above-mentioned studies make recommendations mainly based on the implicit aspects and sentiments of the reviews and achieve high recommendation accuracy. Therefore, the analysis of the aspects and sentiments of reviews provides important factors for precisely generating user preferences in this work.

In addition to analyzing reviews, analyzing correlations in social networks can also reveal users’ implicit preferences. In social networks, users in the same circle of friends may have similar interests and preferences for products, so many people ask their friends and relatives for their opinions before buying a product. Some studies have proposed social network-based recommendation methods [15,16,17], which usually use trust, influence, and follower relations among users as the basis for recommendations. When users do not have enough ratings or purchase behaviors, using friends’ correlations to make recommendations can effectively mitigate the shortcomings of traditional recommendation methods. Analyzing the implicit and explicit relationships among users or items to represent user preferences can enhance recommendation performance.

In summary, this study considers both the review content and the correlations among users and products, and proposes an aspect-based rating prediction with a hybrid deep learning method (ARPH). This method analyzes the textual review content and the relationships among users and items to establish user preference characteristics and predict future items of interest for users. It consists of five parts: aspect detection, sentiment and semantic analysis, user preference analysis, graph attention network analysis, and rating prediction. Aspect detection analyzes and extracts the aspect of each sentence in user reviews; the various aspects of the user and the product are then integrated and transformed into vectors. In the sentiment and semantic analysis, the sentiment and semantic features implied in each sentence of the review content are extracted as aspect sentiment and semantic vectors of the user and the product, respectively [18]. The aspect vectors and the sentiment and semantic vectors are combined in the user preference analysis. A convolutional neural network (CNN) is then trained on these vectors to obtain user latent factors and item latent factors, and these two latent factor vectors are integrated with matrix factorization (MF) to obtain the predicted ratings of products. In addition, a graph attention network [19] is built to calculate the predicted ratings based on user relations and item relations, respectively. Finally, a multilayer perceptron (MLP) integrates the predicted ratings from the user preference analysis and the graph attention network analysis. The weights of these two ratings are automatically adjusted by a neural network to predict the products that users are likely to be interested in. The results of the two analyses complement each other to improve rating prediction accuracy. The experimental results show that considering the review aspects, sentiment and semantics, and the relationships among users and items in the rating prediction method can effectively enhance the prediction accuracy.

The main contributions of this study are as follows. This study proposes a novel rating prediction approach, ARPH, which combines multiple features extracted from reviews with the relationships among users and items to make predictions. The integration of multiple features can more precisely establish user and item preference features. Additionally, ARPH applies deep learning methods to train on and analyze the data more accurately, which further enhances the accuracy of rating prediction. Because the predictions derived from the user preference analysis and the graph attention network analysis can be adjusted to complement each other, ARPH can generate more flexible rating predictions. If there are few user reviews, ARPH can predict ratings based on the graph attention network analysis (i.e., using the relationships among users and items). On the other hand, if there are insufficient user and item relationships, the predicted ratings are derived from the user preference analysis (i.e., using the features derived from textual reviews).

This paper consists of five chapters. Chapter 2 provides the literature on related research topics, including deep learning models and recommender systems. Chapter 3 introduces the proposed aspect-based rating prediction with a hybrid deep learning method (ARPH). Chapter 4 presents the experimental results of the proposed method and other related methods. Chapter 5 presents the conclusion and suggests future works.

2 Related Works

2.1 Deep Learning Models

Deep learning technology enhances the efficiency and accuracy of data analysis by incorporating hidden layers into the traditional neural network. It is also applied to the tasks of natural language processing (NLP), such as sentiment analysis and opinion mining [20,21,22] to enhance the analysis accuracy.

2.1.1 BERT Model

Bidirectional encoder representations from transformers (BERT) is an NLP pre-training technology developed by Google [23]; it is the encoder of the Transformer model. BERT is composed of N sub-layers combining multi-head attention and feedforward neural networks. Through multi-head attention, BERT can analyze the context in sentences and learn text features. This mechanism enables BERT to learn more text information in processing natural language tasks [24]. Tenney et al. [25] found that the BERT model can extract both the syntactic and semantic features in the text through multiple hidden layers. For analyzing the implicit sentiment of the text, Chen et al. [11] integrated hierarchical and sentence-level corpora, as well as sentiment vocabulary dictionaries, and then used the BERT model to perform sentiment classification for sentences covering specific aspects. Hoang et al. [21] proposed aspect-based sentiment analysis (ABSA) where BERT is employed to predict the aspect of text content.

2.1.2 Graph Neural Network

The graph neural network (GNN) was developed from a neural network model. It can analyze node relationships based on the messages passed between graph nodes. GNN is mostly used in various graph-based research areas, including the research fields of biochemical molecules, NLP, knowledge graphs, recommender systems, and computer vision [26]. Convolutional graph neural networks (ConvGNNs) [27], which are GNN models, receive multiple graph patterns as input and apply convolution to extract important implicit features from each graph node to generate the classification results. GNN models have also been applied in research related to recommender systems. Berg et al. [28] used a graph convolutional encoder to predict the user’s preferences based on a link graph of products. A social recommendation method based on the GNN framework (i.e., GraphRec) [19] extracts user–product interactions, preferences from user–item graphs, and social relationships to make rating predictions.

2.2 Recommender Systems

Based on the user’s interests and behaviors, the recommender system will actively filter unnecessary information and predict the items that may be of interest to the user in the future. In this section, the recommendation methods based on collaborative filtering, text analysis, and deep learning are explained separately.

2.2.1 Collaborative Filtering

Collaborative filtering (CF) is a widely used traditional recommendation method operating across various domains, addressing the information overload problem. Based on user preferences, CF first identifies a group of users with similar preferences and then recommends items to target users based on the items that similar users like [2, 4]. There are some problems with the CF method. (1) Cold-start problem [29]: The CF method cannot make recommendations when new users or items have insufficient ratings in the system. (2) Sparsity problem [30]: When too few ratings of items are given by users, the CF cannot accurately calculate the similarity between users. (3) Scalability problem: As the number of users and items increases, the CF computation time increases as well. CF has some potential limitations and mainly makes recommendations based on users’ or items’ ratings. Therefore, a lot of new methods based on the analysis of different factors (e.g., trust and social relations [15] or textual content [31,32,33,34,35]) or deep learning methods [32, 33] are proposed to enhance the recommendation performance.

2.2.2 Recommendation Based on Text Analysis

Text analysis is utilized in recommendation methods to derive insights into users’ behaviors and preferences in seeking to enhance recommendation performance. Wang and Blei [36] used LDA to analyze the aspects of users’ preferences and then employed the CF method to make recommendations. The attentive aspect-based recommendation model [31] can address the diversity in reviews, establish relationships between users and products, and identify correlations between similar aspects to find the aspects that users emphasize the most. The sentiment vocabulary in the reviews can represent users’ preferences for products. Lu et al. [34] analyzed the emotional intensity implied by the aspect and text in users’ reviews and recommended products to users that matched their preferences. Musto et al. [37] proposed a multi-criteria recommender system by integrating opinion mining and sentiment analysis methods to make recommendations based on multiple aspects of the user. Analyzing the product aspects and sentiments in review texts has been widely proven in the literature to enhance recommendation accuracy [38].

2.2.3 Recommendation Based on Deep Learning Methods

With deep learning methods, large amounts of data can be analyzed to make fast and accurate predictions through continuous learning and feedback during the computing process. Deep learning can help improve the efficiency in learning and retrieving information, as well as the accuracy of recommender systems [39, 40]. To address the sparsity problem, Kim et al. [41, 42] used a CNN to analyze text context and extract features, and then combined it with PMF to make recommendations. DeepCoNN [43] employs two CNNs to analyze user and product reviews for extracting hidden, latent features of users and products. It provides accurate recommendations for users with limited ratings or reviews. Zhang et al. [44] employed attention-based CNN to analyze keywords in product reviews for constructing product attractiveness; recommendations were then made based on users’ preferences in different aspects. In addition, deep learning methods can be applied to data derived from social networks [45,46,47]. Deep graph neural network-based social recommendation employs GNN to analyze user and product characteristics to extract potential factors and then predict ratings with MF [48]. Therefore, this study asserts that the integration of multiple deep learning methods can significantly enhance recommendation results.

3 The ARPH Method

3.1 Research Process

The aspect-based rating prediction with a hybrid deep learning method (ARPH) was proposed in this work for analyzing a review website and making product predictions. The textual content of reviews implied information about user preference aspects, sentiments and semantics, so users’ preferences and product ratings could be understood regarding each aspect. In addition, based on user relations and item relations, the similarity between users’ preferences and the correlation between the products they have purchased could be understood. The features extracted by these two methods could improve prediction accuracy by establishing user preferences and product features more accurately.

The overview of the proposed method is shown in Fig. 1. In this study, the electronics category of Amazon datasets was collected for the experiment. In the data pre-processing module, eligible users and reviews were first selected from the original dataset. The filtered review data were then subjected to pre-processing, such as stemming and stop word removal. According to our previous research [18], sentence-level latent Dirichlet allocation (SLDA) [9] was utilized for aspect detection to analyze the aspect of each sentence in the user’s review and to build an aspect embedding vector. In the sentiment and semantic analysis module, the BERT model [49, 50] was adopted to build a vector of sentiment and semantic features for each aspect of the user and product. The generated aspect vectors and sentiment–semantic feature vectors were combined and trained by CNNs to obtain the user latent and item latent factors, respectively [18]. The MF method was then used to calculate the ratings of the product.

Fig. 1 Research architecture

In addition, in the graph attention network analysis module, the user relations and item relations were input into two graph attention networks (GATs) (Fan et al. [19]). The MLP method was then used to generate user prediction scores. Finally, the weights were used to integrate the prediction scores generated by the user preference analysis and the graph attention network analysis modules to recommend products that might be of interest to users in the future.

3.2 Aspect Detection

A review usually contains one or more implicit aspects, and each aspect consists of many keywords. To analyze each aspect of a particular comment, this study uses the same method proposed in our previous research (i.e., the SLDA method) to detect aspects from reviews [18]. SLDA uses the keywords in a sentence to infer the aspect of the sentence. Thus, a sentence has only one aspect. Finally, the vectors of words with the same aspect in the review sentences were merged to generate an aspect vector of the user. Similarly, product reviews were analyzed using the same method.

Suppose that m sentences in user u’s reviews belong to aspect x and that aspect x contains n feature words. GloVe [51] is used to transform each feature word k into a word vector \({f}_{u,x}^{k}\). Equation (1) defines the aspect vector \({A}_{u,x}\) of user u for aspect x as the sum of all the word vectors under that aspect, where k indexes the feature words.

$$A_{u,x} = \mathop \sum \limits_{k = 1}^{n} f_{u,x}^{k} .$$
(1)

Suppose that m sentences in the reviews of product i belong to aspect y, and aspect y contains n feature words. Let \({f}_{i,y}^{k}\) be a vector of the feature word k in aspect y. Equation 2 represents the aspect vector of product i, which sums up all the vectors of feature words in aspect y.

$$A_{i,y} = \mathop \sum \limits_{k = 1}^{n} f_{i,y}^{k} .$$
(2)
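
As a minimal sketch of Eqs. (1) and (2), the following Python snippet sums the word vectors of the feature words assigned to one aspect. The three-dimensional embedding table `TOY_EMBEDDINGS` and the function name `aspect_vector` are illustrative assumptions; in the method described here, the word vectors come from GloVe [51].

```python
from typing import Dict, List

# Hypothetical toy embedding table standing in for GloVe vectors (assumption).
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "battery": [1.0, 0.0, 2.0],
    "charge": [0.5, 1.0, 0.0],
    "screen": [0.0, 3.0, 1.0],
}


def aspect_vector(feature_words: List[str],
                  embeddings: Dict[str, List[float]]) -> List[float]:
    """Sum the vectors of all feature words of one aspect (Eq. 1 / Eq. 2)."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for word in feature_words:
        vec = embeddings[word]
        total = [t + v for t, v in zip(total, vec)]
    return total


# One aspect of a user with two feature words.
a_ux = aspect_vector(["battery", "charge"], TOY_EMBEDDINGS)
```

The same function applies to a product aspect; only the set of feature words changes.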

3.3 Sentiment and Semantic Analysis

The BERT model [21, 24, 52] can analyze the contextual relationships in the text by utilizing the bi-directional self-attention mechanism; it then generates the results of the semantic analysis, as shown in Fig. 2. The sentiment score is then obtained through a softmax layer. The BERT model in the NLP domain can generate accurate results. Therefore, this study used the BERT model to analyze the sentiment and semantic meaning implied in the reviews to establish users’ preferences.

Fig. 2 Analytical architecture of the BERT Model

Suppose sentence s in the review of user u belongs to aspect x and contains n words \(\{{wd}_{1},{wd}_{2},{wd}_{3},\dots ,{wd}_{n}\}\), where \({wd}_{n}\) represents the n-th word. First, a special token [CLS] is added at the beginning of sentence s, giving \(s=\{\left[CLS\right],{wd}_{1},{wd}_{2},\dots ,{wd}_{n}\}\). The self-attention layers of the BERT model encode the input sequence, and the output vector corresponding to [CLS] is used as the sentence sentiment representation. The sentiment–semantic vector of sentence s is shown in Eq. (3).

$${\mathrm{Sen}}_{u,x}=\mathrm{Attention}\left(s\right).$$
(3)

Finally, the sentiment–semantic vectors \({\mathrm{Sen}}_{u,x}\) were classified into five classes (1–5) using the softmax layer.

$$r=\mathrm{Softmax}({W}_{T}\cdot {\mathrm{Sen}}_{u,x}+{b}_{T}),$$
(4)

where r represents the sentiment score of Sentence s. If the r value is 1 or 2, it means the sentiment of the sentence is negative; 3 means neutral; and 4 and 5 mean positive. \({W}_{T}\) is the weight vector, while \({b}_{T}\) is the bias vector.
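
The classification step in Eq. (4) and the polarity mapping described above can be sketched as follows. This is an illustration under stated assumptions: the logits stand in for \({W}_{T}\cdot {\mathrm{Sen}}_{u,x}+{b}_{T}\), the class is taken as the most probable softmax output, and the helper names `softmax`, `sentiment_class`, and `polarity` are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sentiment_class(logits: List[float]) -> int:
    """Return the class (1-5) with the highest softmax probability."""
    probs = softmax(logits)
    return probs.index(max(probs)) + 1


def polarity(score: int) -> str:
    """Map a 1-5 sentiment score to the polarity used in the text."""
    if score <= 2:
        return "negative"
    if score == 3:
        return "neutral"
    return "positive"


# Hypothetical logits for one sentence (assumption, not real model output).
logits = [0.1, 0.3, 0.2, 2.5, 1.0]
r = sentiment_class(logits)
label = polarity(r)
```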

The sentiment–semantic vector of a user aspect (or a product aspect) is obtained by summing the sentiment–semantic vectors of the sentences that belong to that aspect. Suppose there are n sentences in user u’s reviews that belong to aspect x (i.e., {\({s}_{1},{s}_{2},{s}_{3},\dots ,{s}_{n}\)}). Through the BERT model, the sentiment–semantic vector \({\mathrm{Sen}}_{u,x}^{k}\) is obtained for sentence k. Then, Eq. (5) generates the sentiment–semantic vector of user u for aspect x (i.e., \({\mathrm{AS}}_{u,x}\)) by summing up the sentiment–semantic vectors of the sentences in the aspect.

$${\text{AS}}_{u,x} = \mathop \sum \limits_{k = 1}^{n} {\text{Sen}}_{u,x}^{k} .$$
(5)

Suppose that m sentences in the reviews of product i belong to the aspect y (i.e., {\({s}_{1},{s}_{2},{s}_{3},\dots \dots ,{s}_{m}\)}). The sentiment–semantic vector \({\mathrm{Sen}}_{i,y}^{l}\) of the sentence l is obtained through the BERT model. Let \({\mathrm{AS}}_{i,y}\) be the sentiment–semantic vector of product i for aspect y. The sentiment–semantic vectors of the sentences contained in product aspect y were summed up using Eq. (6).

$${\text{AS}}_{i,y} = \mathop \sum \limits_{l = 1}^{m} {\text{Sen}}_{i,y}^{l} .$$
(6)
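
The per-aspect summation in Eqs. (5) and (6) amounts to grouping sentence vectors by their detected aspect and adding them. The sketch below assumes that each sentence already carries an aspect label and a sentence vector; the two-dimensional vectors and the function name `aspect_sentiment_vectors` are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def aspect_sentiment_vectors(
        sentences: List[Tuple[str, Vector]]) -> Dict[str, Vector]:
    """Group sentence vectors by aspect label and sum them (Eqs. 5 and 6)."""
    sums: Dict[str, Vector] = {}
    for aspect, vec in sentences:
        if aspect not in sums:
            sums[aspect] = list(vec)
        else:
            sums[aspect] = [a + b for a, b in zip(sums[aspect], vec)]
    return sums


# Hypothetical (aspect, sentence vector) pairs; in the method described
# here, the vectors would be the [CLS] outputs of BERT.
sentences = [
    ("battery", [0.5, 1.0]),
    ("screen", [1.0, -1.0]),
    ("battery", [0.25, 0.5]),
]
AS = aspect_sentiment_vectors(sentences)
```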

3.4 User Preference Analysis

In this study, a CNN was applied to obtain the user latent factors and item latent factors, respectively, based on all of the aspect vectors and sentiment–semantic vectors derived from the reviews of users and products. Then, based on the user latent and item latent factors, the MF [7] generates the product ratings. The architecture of user preference analysis is shown in Fig. 3.

Fig. 3 User preference analysis

In the user feature model, the user’s aspect vector \({A}_{u,x}\) and the sentiment–semantic vector \({\mathrm{AS}}_{u,x}\) were combined as a user feature vector of user u for aspect x (i.e., \({\mathrm{UP}}_{u,x}\)) by using Eq. (7).

$${\mathrm{UP}}_{u,x}={A}_{u,x}+{\mathrm{AS}}_{u,x}.$$
(7)

Since a user’s reviews might contain several different aspects, these aspects were integrated into one feature vector \({\mathrm{UP}}_{u}=\{({A}_{u,1}+{\mathrm{AS}}_{u,1}),...,({A}_{u,m}+{\mathrm{AS}}_{u,m})\}\), which represents user u’s preferences. Then, all the user feature vectors were input to the CNN for training. Finally, all the user latent factors (i.e., V) were output.

The method of modeling product evaluation features is similar to that of user features. For product i in aspect y, the product aspect vector \({A}_{i,y}\) and the sentiment–semantic vector \({\mathrm{AS}}_{i,y}\) were integrated as a product feature \({\mathrm{PA}}_{i,y}\), as shown in Eq. (8).

$${\mathrm{PA}}_{i,y}={A}_{i,y}+{\mathrm{AS}}_{i,y}.$$
(8)

Suppose the review of product i contains m different aspects. Similar to the method of user preference generation, multiple aspects of product i were integrated into one feature vector, i.e., \({\mathrm{PA}}_{i}=\{({A}_{i,1}+{\mathrm{AS}}_{i,1}),...,({A}_{i,m}+{\mathrm{AS}}_{i,m})\}\), to represent the product feature \({\mathrm{PA}}_{i}\). The product evaluation features were then trained by the CNN model to finally output all item latent factors (i.e., \(Q\)).
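
The construction of \({\mathrm{UP}}_{u}\) and \({\mathrm{PA}}_{i}\) in Eqs. (7) and (8) can be sketched as below. The sketch assumes that "+" denotes element-wise addition and that the aspect vector and the sentiment–semantic vector share one dimension; the function names and the toy values are hypothetical, and the CNN that consumes the resulting matrix is not shown.

```python
from typing import List

Vector = List[float]


def combine(aspect_vec: Vector, sentiment_vec: Vector) -> Vector:
    """Element-wise sum of an aspect vector and its sentiment-semantic vector."""
    if len(aspect_vec) != len(sentiment_vec):
        raise ValueError("vectors must share the same dimension")
    return [a + s for a, s in zip(aspect_vec, sentiment_vec)]


def feature_matrix(aspect_vecs: List[Vector],
                   sentiment_vecs: List[Vector]) -> List[Vector]:
    """One row per aspect: [(A_1 + AS_1), ..., (A_m + AS_m)]."""
    return [combine(a, s) for a, s in zip(aspect_vecs, sentiment_vecs)]


# Toy values for a user with m = 2 aspects (assumption).
A_u = [[1.0, 2.0], [0.0, 1.0]]
AS_u = [[0.5, 0.5], [1.0, -1.0]]
UP_u = feature_matrix(A_u, AS_u)
```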

3.4.1 Rating Prediction Using MF

In this study, the features output by the two CNN models (i.e., user preference feature V and product evaluation feature Q) (from Sect. 3.4) were fed into the MF model [7] to make the rating predictions as shown in Eq. (9).

$${r}_{u,i}^{A}={Q}_{i}^{T}\cdot {V}_{u},$$
(9)

where \({r}_{u,i}^{A}\) is the predicted rating of user u for product i; \({V}_{u}\) is the column vector of user u in the user preference matrix V; and \({Q}_{i}\) is the row vector of product i in the product evaluation feature matrix Q.

To minimize the error between the predicted ratings and the actual ratings after MF, stochastic gradient descent is used to minimize the objective function shown in Eq. (10), where \({r}_{u,i}\) represents the actual rating of product i by user u, and λ is the constant that controls the regularization. During minimization, the parameters are updated using randomly selected samples. The optimal parameter values are found over multiple iterations and can be used to adjust the weights of the CNN model.

$$\mathop {\min }\limits_{{Q^{*} ,V^{*} }} \mathop \sum \limits_{u,i} \left( {r_{u,i} - Q_{i}^{T} \cdot V_{u} } \right)^{2} + \lambda \left( {\left\| {Q_{i} } \right\|^{2} + \left\| {V_{u} } \right\|^{2} } \right).$$
(10)
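
To make Eqs. (9) and (10) concrete, the following sketch runs stochastic gradient descent on a small matrix factorization problem. It is a simplification under stated assumptions: Q and V are treated as free parameters rather than CNN outputs, the regularization term is applied per observed rating, and the learning rate, λ, latent dimension, and toy ratings are all hypothetical.

```python
import random
from typing import List, Tuple

Rating = Tuple[int, int, float]  # (user index, item index, rating)


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def objective(ratings: List[Rating], Q, V, lam: float) -> float:
    """Squared error plus L2 penalty over the observed ratings (Eq. 10)."""
    total = 0.0
    for u, i, r in ratings:
        err = r - dot(Q[i], V[u])
        total += err ** 2 + lam * (dot(Q[i], Q[i]) + dot(V[u], V[u]))
    return total


def train_mf(ratings: List[Rating], n_users: int, n_items: int, k: int = 2,
             lr: float = 0.02, lam: float = 0.01, epochs: int = 300,
             seed: int = 0):
    rng = random.Random(seed)
    V = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    history = [objective(ratings, Q, V, lam)]
    samples = list(ratings)
    for _ in range(epochs):
        rng.shuffle(samples)  # visit the observed ratings in random order
        for u, i, r in samples:
            err = r - dot(Q[i], V[u])  # r_ui minus the prediction of Eq. (9)
            for f in range(k):
                q, v = Q[i][f], V[u][f]
                Q[i][f] += lr * (err * v - lam * q)
                V[u][f] += lr * (err * q - lam * v)
        history.append(objective(ratings, Q, V, lam))
    return Q, V, history


toy_ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
               (1, 2, 1.0), (2, 1, 2.0), (2, 2, 4.0)]
Q, V, history = train_mf(toy_ratings, n_users=3, n_items=3)
```

The `history` list records the objective before training and after each epoch, so a decreasing sequence indicates that the updates are reducing Eq. (10) on the toy data.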

3.5 Graph Attention Network Analysis

User relations and item relations may influence the user’s preferences. Therefore, this study proposed a GAT method to predict the products that users may prefer based on user relations and item relations, as shown in Fig. 4. In this method, the user–item embedding, user-rating embedding, and user-relation embedding were input into the GAT to generate the user latent factor. Similarly, the product ratings (i.e., item–user embedding and item-rating embedding) and item relations (item-relation embedding) were input into the GAT to generate the item latent factor. Finally, the MLP was used to integrate the two latent factors to predict the items of users’ preferences.

Fig. 4 The architecture of graph attention network

3.5.1 User Relation Analysis

The user rating information of the products can be used to determine the products purchased by users and the user rating of the products (which represents users’ preferences). This study referred to Fan et al. [19], where the user latent factor was obtained by analyzing the user ratings and user relations.

This study first transformed the rating records of product i by user u into two vectors: user–item embedding (i.e.,\({q}_{i}\)) and user-rating embedding (i.e.,\({s}_{r}\)). The former represents the vector of the product purchased by the user, while the latter is the vector of the rating given by the user. The two vectors were then concatenated and passed through an MLP (i.e., \(g\)). The resulting vector is denoted as \({x}_{u,i}\), as shown in Eq. (11).

$${x}_{u,i}=g\left(\left[{q}_{i}\oplus {s}_{r}\right]\right).$$
(11)

This study used the Pearson correlation coefficient [53, 54] to find similar users. The coefficient lies in the interval [− 1, 1]: a value of − 1 represents a perfect negative correlation, while 1 represents a perfect positive correlation. The user similarity was calculated as follows:

$${\mathrm{sim}}_{u,v}=\frac{{\sum }_{i\in \left({I}_{u}\cap {I}_{v}\right)}\left({r}_{u,i}-\overline{{r }_{u}}\right)\left({r}_{v,i}-\overline{{r }_{v}}\right)}{\sqrt{{\sum }_{i\in \left({I}_{u}\cap {I}_{v}\right)}{\left({r}_{u,i}-\overline{{r }_{u}}\right)}^{2}}\sqrt{{\sum }_{i\in \left({I}_{u}\cap {I}_{v}\right)}{\left({r}_{v,i}-\overline{{r }_{v}}\right)}^{2}}},$$
(12)

where \({\mathrm{sim}}_{u,v}\) is the rating similarity of users u and v; \({I}_{u}\) and \({I}_{v}\) are the sets of products rated by users u and v, respectively, so that \({I}_{u}\cap {I}_{v}\) is the set of products rated by both users; \({r}_{u,i}\) and \({r}_{v,i}\) are the ratings of product i by users u and v, respectively; and \(\overline{{r }_{u}}\) and \(\overline{{r }_{v}}\) are the average ratings of users u and v. In this study, the users with similarities greater than 0 were selected to build user relations. User v, who is related to user u, is represented as a user-relation embedding (i.e., \({o}_{u,v}\)).
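
A minimal sketch of Eq. (12) and the relation-building rule follows. It assumes that each user's ratings are stored as a dictionary from product identifier to rating, that the user mean is taken over all of that user's ratings, and that the similarity is 0 when there is no overlap or no variance; the function name `pearson_sim` and the toy data are hypothetical.

```python
import math
from typing import Dict


def pearson_sim(ratings_u: Dict[str, float],
                ratings_v: Dict[str, float]) -> float:
    """Pearson similarity over the products rated by both users (Eq. 12)."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v)
              for i in common)
    den_u = math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0.0 or den_v == 0.0:
        return 0.0  # assumption: undefined correlation is treated as 0
    return num / (den_u * den_v)


# Toy ratings (assumption).
users = {
    "u": {"a": 5.0, "b": 3.0, "c": 1.0},
    "v": {"a": 4.0, "b": 3.0, "c": 2.0},
    "w": {"a": 1.0, "b": 3.0, "c": 5.0},
}

# Keep only users whose similarity to u is greater than 0.
related_to_u = sorted(
    name for name in users
    if name != "u" and pearson_sim(users["u"], users[name]) > 0
)
```

The same function can be applied to products by swapping the roles of users and items, as in Eq. (21).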

The vectors \({x}_{u,i}\) and \({o}_{u,v}\) were then input into two GATs for analysis. The attention mechanism in the GATs automatically assigns different weights to the nodes in the graph (i.e., \({\mathrm{\alpha }}_{u,i}^{*}\)). These attention weights were computed to weight the products preferred by the users and were then normalized as \({\alpha }_{u,i}\), as shown in Eqs. (13) and (14).

$${\mathrm{\alpha }}_{u,i}^{*}={W}_{2}^{T}\cdot\upsigma \left({W}_{1}\cdot {x}_{u,i}+{b}_{1}\right)+{b}_{2},$$
(13)
$${\alpha }_{u,i}=\frac{\mathrm{exp}({\mathrm{\alpha }}_{u,i}^{*})}{{\sum }_{j\in {I}_{u}}\mathrm{exp}({\mathrm{\alpha }}_{u,j}^{*})},$$
(14)

where \({\mathrm{\alpha }}_{u,i}^{*}\) is the unnormalized attention weight of user u for product i; \({W}_{1}\) and \({W}_{2}\) are weight parameters; \({I}_{u}\) is the set of products rated by user u; \({\alpha }_{u,i}\) is the attention weight after normalization of \({\mathrm{\alpha }}_{u,i}^{*}\); \({x}_{u,i}\) is the vector of rating information of user u for product i; and \({b}_{1}\) and \({b}_{2}\) are the biases.

The obtained attention weights were used to weight all items i rated by user u to obtain the item-space user latent factor (i.e., \({h}_{u}^{I}\)), as shown in Eq. (15).

$${h}_{u}^{I}=\upsigma \left(\mathrm{W}\cdot \left\{\sum_{i\in {I}_{u}}{\alpha }_{u,i}\cdot {x}_{u,i}\right\}+b\right),$$
(15)

where \({x}_{u,i}\) is the rating vector of user u; \({\alpha }_{u,i}\) is the attention weight of user u to product i; σ is the activation function; W is the weight; \({I}_{u}\) is the set of products rated by user u; b is the bias.
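
Equations (13)–(15) can be sketched in plain Python as follows. The sketch assumes that σ is ReLU (the text does not fix the activation), that \({W}_{2}\) acts as a vector and \({b}_{2}\) as a scalar, and it uses identity weights and toy inputs chosen only for illustration; all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def attention_scores(xs: List[Vector], W1: Matrix, b1: Vector,
                     w2: Vector, b2: float) -> Vector:
    """Eq. (13): one unnormalized score per rated item."""
    scores = []
    for x in xs:
        hidden = relu(add(matvec(W1, x), b1))
        scores.append(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return scores


def normalize(scores: Vector) -> Vector:
    """Eq. (14): softmax over the items rated by the user."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(xs: List[Vector], alphas: Vector, W: Matrix, b: Vector) -> Vector:
    """Eq. (15): attention-weighted sum followed by a linear layer and ReLU."""
    dim = len(xs[0])
    pooled = [sum(a * x[d] for a, x in zip(alphas, xs)) for d in range(dim)]
    return relu(add(matvec(W, pooled), b))


# Toy inputs: two rated items with 2-d vectors x_{u,i} (assumption).
xs = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
alphas = normalize(attention_scores(xs, identity, zeros, [1.0, 1.0], 0.0))
h_u_item = aggregate(xs, alphas, identity, zeros)
```

With these symmetric toy inputs both items receive the same score, so the attention weights are equal and the aggregated vector is the plain average of the two inputs.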

According to user relation, users’ purchase decisions are often influenced by the preferences of similar users. Similar to the previous approach, the user-relation embedding (i.e., \({o}_{u,v})\) was input into the GAT for analysis to obtain the relation-space user latent factor (i.e., \({h}_{u}^{S}\)), as shown in Eq. (16).

$${h}_{u}^{S}=\upsigma \left(\mathrm{W}\cdot \left\{\sum_{v\in {N}_{u}}{\alpha }_{u,v}\cdot {o}_{u,v}\right\}+b\right),$$
(16)

where \({\alpha }_{u,v}\) is the attention weight of the relation between users u and v; \({o}_{u,v}\) is the relation vector between users u and v; σ is the activation function; W is the weight; \({N}_{u}\) is the set of users v related to user u; and b is the bias. Finally, the two latent factor vectors obtained above were fed into the MLP to obtain the user latent factor (i.e., \({h}_{u}\)), as shown in Eqs. (17)–(19).

$${c}_{1}=\left[{h}_{u}^{I}\oplus {h}_{u}^{S}\right],$$
(17)
$${c}_{2}= \sigma \left({W}_{2}\cdot {c}_{1}+{b}_{2}\right),$$
(18)
$${h}_{u}= \sigma \left({W}_{l}\cdot {\mathrm{c}}_{l-1}+{b}_{l}\right).$$
(19)
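
The fusion in Eqs. (17)–(19) concatenates the two latent factors and passes the result through a stack of fully connected layers. The sketch below assumes ReLU for σ and uses two layers with hand-picked toy weights; the function name `mlp_fuse` and all numeric values are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weight matrix W_l, bias b_l)


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mlp_fuse(h_item_space: Vector, h_relation_space: Vector,
             layers: List[Layer]) -> Vector:
    """Concatenate two latent factors (Eq. 17) and apply the MLP (Eqs. 18-19)."""
    c = h_item_space + h_relation_space  # list concatenation
    for W, b in layers:
        c = relu([z + bias for z, bias in zip(matvec(W, c), b)])
    return c


# Toy latent factors and weights (assumption).
h_I = [1.0, 2.0]
h_S = [3.0, -4.0]
layers = [
    ([[1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 0.0, -1.0]], [0.0, 0.0]),
    ([[0.5, 0.5]], [0.5]),
]
h_u = mlp_fuse(h_I, h_S, layers)
```

The same pattern applies wherever two latent factors are concatenated and passed through an MLP.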

3.5.2 Item Relation Analysis

In addition to analyzing the user latent factor, this study analyzed the user ratings and the product relations to obtain the item latent factor. The method proposed in this section is the same as that used in Sect. 3.5.1. The products bought by users and their respective ratings can be obtained from the product reviews. The rating records of product i by user u were first transformed into two vectors: item–user embedding (i.e., \({d}_{u}\)) and item-rating embedding (i.e., \({t}_{r}\)). The former is the vector of the user who purchased the product, while the latter is the vector of the rating given by that user. Then, the two vectors were concatenated and passed through an MLP (i.e., \(g\)). The resulting vector is denoted as \({p}_{i,u}\), as shown in Eq. (20).

$${p}_{i,u}=g\left(\left[{d}_{u}\oplus {t}_{r}\right]\right).$$
(20)

This study used the Pearson correlation coefficient [4] to build the product relation network and find the correlated products. The coefficient lies in the interval [− 1, 1]: a value of − 1 represents a perfect negative correlation, while 1 represents a perfect positive correlation. The product similarity is calculated as shown below.

$${\mathrm{sim}}_{i,j}=\frac{{\sum }_{u\in \left({U}_{i}\cap {U}_{j}\right)}\left({r}_{u,i}-\overline{{r }_{i}}\right)\left({r}_{u,j}-\overline{{r }_{j}}\right)}{\sqrt{{\sum }_{u\in \left({U}_{i}\cap {U}_{j}\right)}{\left({r}_{u,i}-\overline{{r }_{i}}\right)}^{2}}\sqrt{{\sum }_{u\in \left({U}_{i}\cap {U}_{j}\right)}{\left({r}_{u,j}-\overline{{r }_{j}}\right)}^{2}}}.$$
(21)

Here, \({\mathrm{sim}}_{i,j}\) is the rating similarity of products i and j; \({U}_{i}\) and \({U}_{j}\) are the sets of users who rated products i and j, respectively, so that u ranges over the users who rated both products; \({r}_{u,i}\) and \({r}_{u,j}\) are the ratings of products i and j by user u; and \(\overline{{r }_{i}}\) and \(\overline{{r }_{j}}\) are the average ratings of products i and j. In this study, products with similarity greater than 0 were selected to establish item relations. Product j, which is related to product i, is represented as an item-relation embedding (i.e., \({k}_{i,j}\)).

The same methods as in Sect. 3.5.1 [i.e., Eqs. (13) and (14)] were used. The vectors \({p}_{i,u}\) [defined in Eq. (20)] and \({k}_{i,j}\) were input into two GATs for analysis. Assume that the weights calculated by the GAT were \({\beta }_{i,u}^{*}\), and that they were normalized as \({\beta }_{i,u}\). The attention weights were used to weight all users u who have rated product i to obtain the user-space item latent factor (i.e., \({z}_{i}^{U}\)), as shown in Eq. (22).

$${z}_{i}^{U}=\upsigma \left(\mathrm{W}\cdot \left\{\sum_{u\in {U}_{i}}{\beta }_{i,u}\cdot {p}_{i,u}\right\}+b\right),$$
(22)

where \({p}_{i,u}\) is the rating vector of the review of product i; \({\beta }_{i,u}\) is the attention weight of the users who have rated product i; σ is the activation function; W is the weight; \({U}_{i}\) is the set of users who have rated product i; b is the bias.
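The weighted aggregation in Eq. (22) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes that the raw scores \({\beta }_{i,u}^{*}\) are normalized with a softmax and that σ is ReLU, neither of which is stated in this section, and all helper names are hypothetical.

```python
import math


def softmax(scores):
    """Normalize raw attention scores so that they sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, v) for v in x]


def aggregate(raw_scores, vectors, W, b):
    """sigma(W * sum_u beta_u * p_u + b), with sigma = ReLU (assumed)."""
    beta = softmax(raw_scores)
    dim = len(vectors[0])
    pooled = [sum(beta[n] * vectors[n][d] for n in range(len(vectors)))
              for d in range(dim)]
    return relu([h + bias for h, bias in zip(matvec(W, pooled), b)])


# Two users rated the product; equal raw scores give equal weights.
p_vectors = [[1.0, 2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(aggregate([0.0, 0.0], p_vectors, identity, [0.0, 0.0]))  # [2.0, 3.0]
```

With equal scores the result is simply the average of the two rating vectors; unequal scores shift the result toward the users the attention mechanism considers more relevant.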

Based on item relations, products that are rated similarly could be learned. The item-relation embedding (i.e., \({k}_{i,j}\)) was input into the GAT analysis to obtain the relationship-space item latent factor (i.e., \({z}_{i}^{S}\)), as shown in Eq. (23):

$${z}_{i}^{S}=\upsigma \left(\mathrm{W}\cdot \left\{\sum_{j\in {N}_{i}}{\beta }_{i,j}\cdot {k}_{i,j}\right\}+b\right),$$
(23)

where \({\beta }_{i,j}\) is the attention weight of correlation between products i and j; \({k}_{i,j}\) is the vector of correlation between products i and j; σ is the activation function; W is the weight; \({N}_{i}\) is the set of product j associated with product i; and b is the bias.

Finally, the user-space item latent factor (i.e., \({z}_{i}^{U}\)) and the relationship-space item latent factor (i.e., \({z}_{i}^{S}\)) were concatenated and fed into an MLP to obtain the item latent factor (i.e., \({z}_{i}\)), as shown in Eqs. (24)–(26).

$${g}_{1}=\left[{z}_{i}^{U}\oplus {z}_{i}^{S}\right],$$
(24)
$${g}_{2}= \sigma \left({W}_{2}\cdot {g}_{1}+{b}_{2}\right),$$
(25)
$${z}_{i}= \sigma \left({W}_{l}\cdot {g}_{l-1}+{b}_{l}\right).$$
(26)

3.5.3 Rating Prediction Based on GAT

In this section, the user latent factor (i.e., \({h}_{u}\)) (from Sect. 3.5.1) and the item latent factor (i.e., \({z}_{i}\)) (from Sect. 3.5.2) were concatenated and fed into an MLP for rating prediction, as shown in Eqs. (27)–(30).

$${e}_{1}=\left[{h}_{u}\oplus {z}_{i}\right],$$
(27)
$${e}_{2}= \sigma \left({W}_{2}\cdot {e}_{1}+{b}_{2}\right),$$
(28)
$${e}_{l}= \sigma \left({W}_{l}\cdot {e}_{l-1}+{b}_{l}\right),$$
(29)
$${r}_{u,i}^{G}= {W}^{T}\cdot {e}_{l}.$$
(30)
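The forward pass of Eqs. (27)–(30) can be sketched in a few lines of plain Python. The sketch assumes ReLU for σ and uses hand-picked weights purely for illustration; in the actual model the weights are learned, and the function names here are hypothetical.

```python
def dense(W, b, x):
    """One affine layer: W * x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bias
            for row, bias in zip(W, b)]


def relu(x):
    return [max(0.0, v) for v in x]


def predict_rating(h_u, z_i, hidden_layers, w_out):
    """Concatenate the latent factors, apply the hidden layers, project to a scalar."""
    e = list(h_u) + list(z_i)               # Eq. (27): concatenation
    for W, b in hidden_layers:              # Eqs. (28)-(29): hidden layers
        e = relu(dense(W, b, e))
    return sum(w * v for w, v in zip(w_out, e))  # Eq. (30): linear output


# Illustrative one-dimensional latent factors and a single hidden layer.
hidden = [([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0])]
print(predict_rating([1.0], [2.0], hidden, [0.5, 2.0]))  # 1.5
```

The same concatenate-then-MLP pattern appears in Eqs. (24)–(26), with the two item-side factors taking the place of \({h}_{u}\) and \({z}_{i}\).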

3.6 Rating Prediction

The predicted rating was derived from the user preferences and the relationships between users and products. Thus, this study integrated the predicted ratings obtained from the user preference analysis model and the GAT analysis model, as shown in Fig. 5. An MLP was used to automatically adjust the weights of the two predicted ratings and then generate the final predicted rating. This approach makes the rating prediction method more flexible and accurate. For instance, if a user has written few reviews, the prediction relies mainly on the user’s ratings and the relations between users. If the user has written many reviews, the prediction relies mainly on the textual content of those reviews.

Fig. 5 The combination of user preference analysis and the GAT analysis

Therefore, in this section, the predicted rating \({r}_{u,i}^{A}\) (from Sect. 3.4.1) from user preference analysis and the predicted rating \({r}_{u,i}^{G}\) (from Sect. 3.5.3) from GAT analysis were input into the MLP for learning and prediction. Finally, an integrated rating prediction result \(\widehat{{r}_{u,i}}\) was output. The rating prediction was calculated as shown in Eqs. (31)–(34).

$${a}_{1}=\left[{r}_{u,i}^{A}\oplus {r}_{u,i}^{G}\right],$$
(31)
$${a}_{2}= \sigma \left({W}_{2}\cdot {a}_{1}+{b}_{2}\right),$$
(32)
$${a}_{3}= \sigma \left({W}_{3}\cdot {a}_{2}+{b}_{3}\right),$$
(33)
$$\widehat{{r}_{u,i}}= {W}^{T}\cdot {a}_{3}.$$
(34)

4 Experiment and Evaluation

4.1 Data Collection and Evaluation Metrics

This study used the Electronics category of the Amazon datasets [18] in the experiment and retained the review data from 2013 to 2014. The dataset contains the following fields: user ID, product ID, review rating, and review text. In the user and product review data, there were at least 9 reviews for each user and each product. Finally, the filtered dataset contained 46,243 reviews and ratings, of which 3409 users had more than 9 reviews, and 3440 products had more than five reviews. For the experimental evaluation, 80% of the data were used as training data to train the neural network model, and the remaining 20% were used as testing data to validate the proposed method.
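A simple way to produce such an 80/20 partition is sketched below. The text does not say how the split was drawn, so the random shuffle, the fixed seed, and the function name are assumptions for illustration only; the record layout mirrors the four fields listed above.

```python
import random


def train_test_split(records, test_ratio=0.2, seed=42):
    """Shuffle the records reproducibly and hold out `test_ratio` of them."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: (user ID, product ID, review rating, review text).
reviews = [(f"user{n % 7}", f"item{n % 5}", 1 + n % 5, "review text")
           for n in range(100)]
train, test = train_test_split(reviews)
print(len(train), len(test))  # 80 20
```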

In the field of recommender systems and rating prediction, the mean absolute error (MAE) and the root mean square error (RMSE) [4, 55, 56] are often employed to measure the recommendation performance. The smaller the values of MAE and RMSE, the higher the accuracy of the model. The formulas are as follows:

$$\mathrm{MAE}=\frac{{\sum }_{i\in I}|{r}_{i}-\widehat{{r}_{i}}|}{|I|},$$
(35)
$$\mathrm{RMSE}=\sqrt{\frac{{\sum }_{i\in I}{\left({r}_{i}-\widehat{{r}_{i}}\right)}^{2}}{|I|}}.$$
(36)

For product i in product set I, \({r}_{i}\) is the true rating of product i, and \(\widehat{{r}_{i}}\) is the predicted rating of product i.
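Both metrics in Eqs. (35) and (36) are straightforward to compute. The following short sketch uses made-up ratings to show the arithmetic; the function names are illustrative.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, Eq. (35)."""
    return sum(abs(r - p) for r, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, Eq. (36)."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(actual, predicted))
                     / len(actual))


actual = [5, 3, 4, 1]
predicted = [4, 3, 2, 1]
print(mae(actual, predicted))   # 0.75
print(rmse(actual, predicted))  # sqrt(1.25), roughly 1.118
```

Because RMSE squares each error before averaging, the single error of 2 contributes more to RMSE than to MAE, which is why the two metrics are usually reported together.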

4.2 Explanation of Experimental Method

In this study, the proposed method, aspect-based rating prediction with a hybrid deep learning method (ARPH), was compared with other rating prediction methods. Brief descriptions of the compared methods are listed as follows.

  1. Aspect-Based Rating Prediction with a Hybrid Deep Learning Method (ARPH): The method proposed in this study analyzes the aspect features and sentiment–semantic features from user and product review text and then analyzes the user relations and product relations using graph attention networks. Finally, the extracted features are integrated to predict product ratings. (Refer to Sect. 3.6)

  2. Aspect-Based Rating Prediction with Deep Learning (ARPDL): This method analyzes user and product reviews to extract the aspect features and sentiment–semantic features. It then uses MF to make rating predictions. (Refer to Sect. 3.4)

  3. Rating Prediction Based on Graph Attention Network (RPGAT): This method analyzes the ratings and relations of users and products, respectively, through a graph attention network and uses the user relations and item relations to generate predicted ratings. (Refer to Sect. 3.5)

  4. Deep Cooperative Neural Networks (DeepCoNN): This method uses a CNN to generate the user latent factor and the item latent factor and subsequently uses MF for rating prediction [43].

  5. Graph Neural Networks for Social Recommendation (GraphRec): GraphRec [19] applies a graph attention network to analyze user and product ratings and relations and then uses an MLP to generate predicted ratings.

  6. Matrix Factorization (MF): This method factorizes a rating matrix into two low-dimensional matrices: the latent factors of users and products [7]. The inner product of the two latent factors then gives the user’s predicted ratings for the unknown products.

  7. Probabilistic Matrix Factorization (PMF): PMF [57] extends the MF method, adding probability analysis to the process of MF. It does not analyze textual comments and user relations.

  8. Social Network Matrix Factorization (SocialMF): SocialMF [15] is based on traditional MF combined with a trust matrix. This method uses MF to make predictions based on users’ ratings of the product and the trust relationship between users.

  9. Aspect-Aware Latent Factor Model (ALFM): ALFM [12] builds user preferences and product features for different aspects based on latent aspects extracted from reviews. Finally, the aspect score and the aspect importance are integrated to predict ratings.

  10. Aspect-Based Neural Recommender (ANR): The ANR method [58] uses a neural network to analyze aspects in reviews, as well as important users or products in different aspects, and then makes rating predictions.
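To make the MF baseline in the list above more concrete, the following sketch learns user and product latent factors from a handful of ratings and predicts an unseen rating as their inner product. Training by stochastic gradient descent with L2 regularization is a common choice but is an assumption here, as are the hyperparameters, the toy data, and all function names; the compared implementations may differ.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_mf(ratings, n_factors=2, lr=0.01, reg=0.02, epochs=300, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
         for u in sorted({u for u, _, _ in ratings})}
    Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
         for i in sorted({i for _, i, _ in ratings})}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(P[u], Q[i])
            for f in range(n_factors):
                p_uf, q_if = P[u][f], Q[i][f]
                # Gradient step on the regularized squared error.
                P[u][f] += lr * (err * q_if - reg * p_uf)
                Q[i][f] += lr * (err * p_uf - reg * q_if)
    return P, Q


def squared_error(ratings, P, Q):
    return sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings)


ratings = [("u1", "i1", 5), ("u1", "i2", 3), ("u2", "i1", 4),
           ("u2", "i3", 1), ("u3", "i2", 4), ("u3", "i3", 2)]
P, Q = train_mf(ratings)
# Predicted rating for a pair that does not appear in the training data.
print(round(dot(P["u1"], Q["i3"]), 2))
```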

4.3 Result Analysis and Discussion

This study conducted several experiments to evaluate the performance of the proposed method and explored the effects of different characteristics on the method. Afterward, the prediction methods of other related studies were compared with the proposed method. The following subsections analyze and discuss the experimental results in detail.

4.3.1 Impacts of Aspect Number on Rating Prediction

Since the number of aspect features implied in the review text may affect the accuracy of the prediction, this experiment was used to find the optimal number of aspects and the optimal values for the parameters of the CNN. For the proposed ARPDL method, the number of aspects was set to 1, 5, 10, 15, 20, 25, 30, and 35 in turn. The tuned CNN parameters were the filter size (F) and the stride (S). The kernel size (K) was set to 2, the activation function (\({G}_{\mathrm{conv}}\)) was tanh, the fully connected activation function (\({G}_{fc}\)) was ReLU, and the optimizer (O) was Adam, following Zheng et al. [43]. Two steps were used to find the optimal parameter values for the CNN model.

  • Step 1: Let n be the number of input aspects. Fix the default parameters and vary the stride S from 1 to 6 in increments of 1 to observe the effect of the stride on prediction accuracy.

  • Step 2: Keep the default parameters and find the optimal filter size F. The value of F was varied from 2 up to n in powers of 2 to observe the effect of the filter size on the accuracy of the prediction.
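The two steps can be sketched as a small search routine. The `evaluate` callback is a hypothetical stand-in for "train the CNN with these settings and return its error", and the defaults are placeholders. Whether Step 2 reuses the stride chosen in Step 1 is not stated in the text, so the sketch assumes that it does.

```python
def stride_candidates():
    """Step 1: strides S = 1, 2, ..., 6."""
    return list(range(1, 7))


def filter_size_candidates(n):
    """Step 2: filter sizes F = 2, 4, 8, ... not exceeding n."""
    sizes, f = [], 2
    while f <= n:
        sizes.append(f)
        f *= 2
    return sizes


def two_step_search(n, evaluate, default_f=2, default_s=1):
    """Return (F, S); `evaluate(f, s)` returns an error to be minimized."""
    best_s = min(stride_candidates(), key=lambda s: evaluate(default_f, s))
    # Assumption: Step 2 keeps the stride selected in Step 1.
    candidates = filter_size_candidates(n) or [default_f]
    best_f = min(candidates, key=lambda f: evaluate(f, best_s))
    return best_f, best_s


def toy_error(f, s):
    """Toy error surface with its minimum at F = 4, S = 2."""
    return abs(f - 4) + abs(s - 2)


print(filter_size_candidates(15))      # [2, 4, 8]
print(two_step_search(15, toy_error))  # (4, 2)
```

Searching one parameter at a time needs far fewer training runs than a full grid over both parameters, at the cost of possibly missing interactions between them.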

After the above steps, the optimal parameters for different aspect numbers are shown in Table 1. In this experiment, the best prediction result was obtained with 15 aspects (Aspects = 15). The parameters were set as follows: F = 1, K = 2, S = 2, \({G}_{\mathrm{conv}}\) = tanh, \({G}_{fc}\) = ReLU, and O = Adam.

Table 1 Parameter settings of the ARPDL method under different numbers of aspects

The trends of RMSE and MAE are shown in Table 1. When the number of aspects was set to 1, the performance in rating prediction was poor. In this setting, only the content of the review was used for the rating prediction, without distinguishing the aspects contained in the review; hence, the product could not be accurately recommended. Both RMSE and MAE tended to decrease as the number of aspects increased, and the proposed method performed best with 15 aspects. The RMSE and MAE started to increase when the number of aspects was higher than 15, which meant that too many aspects were unhelpful to the prediction in the model. Increasing the number of aspects also increased the number of feature words in the topic model. However, some feature terms may not be important in a review, and these unimportant features gradually decrease the prediction accuracy. Therefore, this study set the number of aspects to 15 in the subsequent experiments.

4.3.2 Impacts of Aspects and Sentiment–Semantic Features on Rating Prediction

Different features used in the ARPDL method may impact the prediction results. This experiment investigated the effect of using different features on the proposed method. The aspect number of all compared methods was set as 15 based on the results in Sect. 4.3.1. The three ARPDL methods with different features were (1) ARPDL(Text): predicting ratings based on the aspect features in reviews; (2) ARPDL(Senti + Sem): predicting ratings based on the sentiment–semantic features in reviews; and (3) ARPDL: predicting ratings based on the combination of aspect features and sentiment–semantic features.

As shown in Fig. 6, ARPDL(Text), which used only aspect features, had the poorest prediction results. The prediction result of ARPDL(Senti + Sem) was slightly better than that of ARPDL(Text). Notably, ARPDL, which considers both aspect and sentiment–semantic features, had the best prediction accuracy. The aspect and sentiment–semantic features can help analyze user preferences and product reviews. Therefore, the proposed ARPDL method can effectively discover more of the users’ implicit preferences from the text content of reviews, thereby improving prediction accuracy.

Fig. 6 Prediction results of ARPDL methods which consider aspect features

4.3.3 Impacts of Graph Attention Networks on Rating Prediction

In this study, the RPGAT method used a graph attention network to analyze the relationship between users and products (refer to Sect. 3.5). This experiment focused on the effect of different features on the prediction accuracy of the RPGAT method. Three variations of the RPGAT method were compared, described as follows:

  1. RPGAT (User): This method predicted ratings only based on user relations derived from the graph attention network.

  2. RPGAT (Item): This method predicted ratings only based on item relations derived from the graph attention network.

  3. RPGAT: This method predicted ratings based on the combination of user relations and item relations derived from the graph attention network.

Figure 7 shows that the prediction performance was RPGAT(Item) < RPGAT(User) < RPGAT. The prediction accuracy of the rating based on user relations was slightly better than the prediction results based on item relations. Analyzing user preferences by user-relation characteristics can help improve the prediction results. The RPGAT method, which integrated user relations with item relations, had the best prediction accuracy. The RPGAT method analyzed user preferences based on both user and item relations and identified users or items with high correlation, which gave the best prediction results.

Fig. 7 Prediction results of RPGAT with different relation features

4.3.4 Comparisons of Rating Prediction Methods Integrating Aspect, Sentiment–Semantic, and Graph Attention Network

This study investigated the impact of user preferences and product features built by three different approaches: aspect and sentiment–semantic features from reviews, user and item relations, and the integration of both review features and relation features. That is, three prediction methods: ARPDL, RPGAT, and ARPH were compared. The ARPDL method analyzed the textual reviews and integrated aspect vectors and sentiment–semantic vectors to predict ratings using CNNs. RPGAT used a graph attention network to analyze user and product relations, respectively, and predicted product ratings. The ARPH method integrated the features used in ARPDL and RPGAT methods, i.e., the aspect features, sentiment–semantic features, and user and product relations to make rating predictions. The experimental results are shown in Fig. 8.

Fig. 8 Comparisons of rating prediction results of ARPDL, RPGAT, and ARPH methods

The experimental results showed that the performance of the prediction methods was RPGAT < ARPDL < ARPH. The lowest prediction accuracy was achieved by the RPGAT method, which considered only user and item relations. The ARPH method complemented the review aspect features with sentiment–semantic features, as well as the relations between users and products, to obtain more accurate rating prediction results. Among the three methods, ARPH had the best prediction accuracy.

4.3.5 Comparison of the Related Methods

In this section, the ARPH method was compared with the other related prediction methods described in Sect. 4.2. These methods can be divided into three categories:

  1. MF-based methods: MF, PMF, and SocialMF.

  2. Aspect-based rating prediction methods: ANR and ALFM.

  3. Deep learning-based methods: DeepCoNN-CNN, GraphRec, and ARPH.

The results of these methods are compared in Figs. 9 and 10.

Fig. 9 RMSE of all rating prediction methods compared

Fig. 10 MAE of all rating prediction methods compared

In the MF-based methods, the prediction accuracy is MF < SocialMF < PMF; that is, PMF has the best prediction result of the three. PMF adds probability analysis to the traditional MF method to address the sparsity of large datasets, so effective recommendations can be made for users with few ratings. The SocialMF method makes rating predictions based on the trust relationships in social networks. Both PMF and SocialMF outperform the traditional MF method.

Among the aspect-based methods, ANR and ALFM methods both have good prediction accuracy. Both methods use the aspects extracted from the reviews to make their rating predictions, helping improve the prediction accuracy. Furthermore, the ANR and DeepCoNN methods use machine learning methods to analyze the review contents. The experiments found that ANR with the consideration of aspect features achieves better prediction performance. Therefore, whether a traditional or a machine learning method is used, adding aspect analysis to the rating prediction method can effectively improve prediction performance.

Among the deep learning-based methods, DeepCoNN and GraphRec both have good prediction accuracy, which suggests that learning features through deep learning methods helps build more accurate user preferences. The ARPH method proposed in this study has the best prediction performance; by integrating multiple features to build user latent factors and item latent factors, it effectively improves the rating prediction accuracy.

5 Conclusions and Future Work

Users on many e-commerce and social networking sites can provide product ratings and share reviews. These reviews help other users understand the product and make it easier for them to make purchasing decisions. However, the large number of reviews causes the problem of information overload for users. Many recommendation methods have been proposed to solve this problem by predicting products or merchants that users may be interested in based on user ratings. However, user ratings may be too subjective and insufficient to represent the preferences of users and merchants. Because user preferences are implied in the textual reviews, this study proposed an aspect-based rating prediction with a hybrid deep learning method (ARPH), which analyzed the reviews and relations between users and products on an e-commerce website to predict the ratings and recommend products that may interest users in the future.

There were four parts in the process of the proposed method: aspect detection, sentiment and semantic analysis, graph attention network analysis, and rating prediction. Aspect detection can identify the aspects of sentences in user reviews and generate each aspect vector for users and products. The sentiment and semantic analysis can extract the sentiment and semantic features from the sentences in user reviews; it can then generate the sentiment–semantic vectors of user and product aspects. The CNN model combined both aspect vectors and sentiment–semantic vectors to generate the user latent factors and item latent factors. According to these two latent factor vectors, the MF method was used to predict the ratings of products that users are likely to be interested in. GAT analysis then built user and product relations, respectively, based on user rating data to make rating predictions. Finally, an MLP was used to integrate the predicted ratings from the user preference analysis and the GAT analysis by automatically adjusting their weights to predict the ratings of products. The experimental results showed that the proposed ARPH method had the best recommendation performance among all the compared methods. That is, both aspects and sentiment–semantic features extracted from the review text and the characteristics of user relations and product relations helped to make accurate predictions. The ARPH method could discover more implicit user preferences and product features, find more associated users and related products, and improve the accuracy of recommendations efficiently.

In future research, this method can be improved by analyzing other implicit user preferences and product features. For instance, image analysis techniques can be employed to analyze the characteristics of product images, make rating predictions based on the image features, or find other similar products and make recommendations. In addition, deep graph convolutional networks or graph recurrent neural networks can be adopted to analyze user preference features. Applying the features of different neural networks may yield improved prediction results. Furthermore, graph neural network methods may be employed to analyze temporal factors: by analyzing network graphs that change over time, the changes in users’ preferences can be better understood to improve the accuracy of recommendations.