Graph Neural Network for Context-Aware Recommendation

Recommendation problems are naturally tackled as a link prediction task in a bipartite graph between user and item nodes, labelled with rating information on edges. To provide personal recommendations and improve the performance of the recommender system, it is necessary to integrate side information along with user-item interactions. The integration of context is a key success factor in recommendation systems because it allows catering for user preferences and opinions, especially when this pertains to the circumstances surrounding the interaction between users and items. In this paper, we propose a context-aware Graph Convolutional Matrix Completion model which captures structural information and integrates the user's opinion on items along with the surrounding context on edges and static features of user and item nodes. Our graph encoder produces user and item representations with respect to context, features and opinion. The decoder takes the aggregated embeddings to predict the user-item score considering the surrounding context. We have evaluated the performance of our model on five publicly available datasets and compared it with state-of-the-art algorithms. Throughout, we show how it can effectively integrate user opinion along with surrounding context to produce a final node representation which is aware of the favourite circumstances of the particular node.


Introduction
With the rapid development of e-commerce and social media platforms in the last few years, recommender systems have gathered notable attention [1,2]. They provide a methodology to identify users' requirements and predict their interests by mining the users' history and their interactions with items (e.g., purchase, watch, click, and read). Recommender systems can take various forms depending upon the application, e.g., playlist generators for video and music services (Netflix, YouTube), friend suggestions on Instagram and Facebook, and product suggestions on eBay and Amazon. One of the most common and general approaches for recommendation is Collaborative Filtering (CF) [3,4], which assumes that similar users have similar preferences and hence like similar items. This approach models explicit feedback (e.g., ratings) or implicit feedback (e.g., clicks, reads) to reconstruct the user's interactions.
Recently, approaches based on Graph Neural Networks (GNNs) have been demonstrated to be highly effective on various tasks defined over relational data, such as protein structures and knowledge graphs [5]. The main idea of a GNN is to produce the representation of a node by iteratively aggregating features from its neighbouring nodes, as shown in Fig. 1. Each GNN layer gathers the embeddings (messages) of neighbouring nodes and summarizes them via an aggregation function (e.g., sum), so that stacking k layers covers the k-hop neighbourhood. After aggregation, the node's current state is updated. Many of these approaches treat recommendation tasks as link prediction in bipartite graphs via matrix completion [6,7]. The bipartite graph can be represented as an adjacency matrix between user and item nodes, where the task is to predict entries inside the matrix (also known as link prediction). Recently, many researchers have contributed to the development of GNN-based collaborative filtering, modelling user-item interactions as a message passing neural network between user and item nodes [8,9]. A wide range of techniques, including CF-based approaches for recommender systems, focus solely on rating information provided by users. Despite their popularity, these approaches have limited performance in real-world applications as they neglect side information such as static node features (user and item profiles) and surrounding context information (e.g., mood, time, weather), which can improve performance by enhancing the personalization of recommender systems. The surrounding context reflects the fact that user choices change with time and are highly dependent on the context under which they interact with the item. For example, time and weather information highly impact the choice of users in restaurant recommendation, while the user's mood influences which song they are most likely to listen to. As such, it is important to develop context-aware recommender systems that can effectively accommodate the static features of users, as well as surrounding context information, while making predictions [10]. The contextual prefiltering technique [11] filters the originally available data based on current context information, and recommendation is based on the filtered data. On the other hand, the contextual postfiltering paradigm [12] takes the recommendation results from two-dimensional recommendation techniques and filters these results based on the current context. In [13], a context-based recommendation problem is mapped to a tensor completion task, which is inspired by the CF approach (matrix completion), but it suffers from high complexity. SocialMF [14] integrates a trust factor as a social context between users in the social network to enhance the performance of the matrix factorization approach. Following this line of research, several deep learning based matrix factorization approaches have been proposed for context-aware recommendation tasks [15][16][17].
The existing approaches for context-aware recommendation are incapable of capturing deep dynamic user-item-context interactions, and discount the fact that the same person can behave differently when interacting with the same item under different contexts [18]. It is therefore reasonable to expect an improvement in the quality of personalized recommendations when incorporating dynamic context information. This is the key focus and motivation underlying this work. Fig. 2 represents the user's interaction with items considering the knowledge about the surrounding context. This can also be represented as a bipartite graph between users and items, with edges labelled with context and ratings/opinion. We introduce a novel GNN based matrix completion approach with an attention mechanism that effectively integrates the following three kinds of information from the graph (Fig. 2), between user and item nodes:
• user's opinion/rating on items;
• context information on edges between users and items;
• static features of users and items.

Fig. 2 User's interaction with the item (e.g., movie) is surrounded by certain context (e.g., weather, mood, weekend) that influences the user's opinion on the item. This data takes the form of a 3D matrix between user, item and context
In particular, we leverage a context-aware graph convolutional autoencoder for matrix completion. Our graph convolutional autoencoder learns from the static features of nodes, the user-item interaction information (rating), and the context. We also introduce an attention factor for the three kinds of embeddings (static feature-based, opinion-based, and context-based) generated by the encoder. The resulting embeddings are given as input to the decoder with the objective of reconstructing a matrix with minimum loss. A preliminary version of this work has appeared as a conference article [19]. This work extends the original article by including:
• multiple aggregation functions for the user-item opinion graph inside a customized weight sharing graph convolutional network;
• the attention mechanism for integrating multiple representations for users and items, i.e., opinion, contextual and static feature representations;
• a performance evaluation of the proposed algorithm on two additional datasets for music and travel recommendation;
• an extended analysis of the algorithm to include the impact of the attention mechanism for the aggregation of multiple representations.

Related Work
The vast majority of the work in the field of context-aware recommendation frameworks has been devoted to the improvement of matrix factorization (MF) approaches. These approaches work by decomposing the user-item interaction matrix into lower dimension matrices [20,21]. Despite their good performance, these approaches are unable to capture the user/item-context correlation as they consider context as features of the user and item [22]. The Neural Factorization Machine (NFM) is a deep learning method to model high-order nonlinear feature interactions for sparse data [15]. In [23], a neural network model has been proposed that captures the impact of context on users and items. It learns the importance of context, but the simplicity of this model limits its ability to capture the real influence of the relationship between features. Recently, GNN based approaches have been introduced to tackle recommendation tasks on graph-structured representations of the problem [24]. These methods are suitable for modelling the interaction of nodes on graph structural features in a flexible and explicit way. Fi-GNN [25] utilizes a graph structure to naturally represent the characteristics of multiple feature fields, in which every node corresponds to a feature field, and these different fields can interact through edges to model the node interaction in the graph. STAR-GCN [7] stacks multiple identical GCN encoder-decoders combined with intermediate supervision to improve the final prediction performance. GCMC [6] leverages the bipartite graph between user and item nodes to learn the node representations. Both GCMC and STAR-GCN treat all neighbours of a node equally. IGMC [26] is an inductive approach for user-item matrix completion recommendation tasks, which does not consider any side information.
Previous GNN based collaborative filtering approaches [27,28] are unable to capture the collaborative filtering effect, as they discard the collaborative signals that are hidden in user-item interactions. In [8], the NGCF model successfully encodes user-item high-order connectivity by exploiting the user-item bipartite graph. GCF-YA [29] is a deep graph neural network implementation of collaborative filtering, based on information propagation and an attention mechanism to predict missing links between users and items. GraphRec [30] tackles social recommendation by aggregating the historical behaviour of individuals from the user-user and user-item bipartite graphs.
Context information on the user has been successfully used to improve recommendation performance [16,31]. Recently, we have seen work on dynamic graphs that integrate interaction times as context information [32][33][34]. DGCF [35] integrates the time interval between the previous and current interaction of user-item pairs inside their embedding to get the latest node representations for recommendation. DyRep is an inductive deep learning approach that learns from the temporally evolving interactions between user and item nodes. These approaches solely consider time information and are hence unable to integrate any other context information.
The above GNN based approaches consider the rating information as the user's opinion on the edges between the user and item nodes in a bipartite graph. Some approaches only consider user and item static features, or integrate time as a context to capture dynamically evolving environments. All these approaches ignore the surrounding context information that can improve performance. In the following, we show how it is possible to extend such approaches to consider dynamic and time-varying contextual features influencing recommendations.

Problem Definition
We have categorized data for context-aware recommendation into four categories: items, users, context, and interactions. Context can be defined as the surrounding knowledge that is associated with the user-item interaction, e.g., time, company, mood, location, etc. In this work, we have defined a 3D rating/opinion interaction matrix $A_{uvc}$ between user, item and context, of size $N_u \times N_v \times N_c$, where $N_u$ is the total number of users, $N_v$ represents the total number of items and $N_c$ is the total number of different contexts (as shown in Fig. 2). The rating scale ranges from one to five stars such that $A_{uvc} \in \{1, \ldots, 5\}^{N_u \times N_v \times N_c}$, except for the InCarMusic dataset, where the maximum rating is six. Users and items are associated with multiple static features describing the characteristics of individuals. For example, static user features are gender and age, and static product features can be colour, brand, category, etc. Let $N_{F_u}$ and $N_{F_v}$ represent the total number of features of users and items, respectively. The importance of the contextual features varies from person to person and from item to item.
Given such data, the recommendation problem is then cast as a task aiming to predict the existence of a labelled link between a user and an item considering the knowledge about the surrounding context. This work aims to introduce context information to matrix completion tasks with mechanisms for finding which context attributes are important for a target user and items. Details of the learning model are discussed in Sect. 4.
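As a minimal sketch of this data layout, the 3D rating matrix can be held as a dense array indexed by (user, item, context). The toy interactions, the sizes, and the names `ratings` and `context_mask` are illustrative assumptions and are not taken from the datasets used in this paper; a real implementation would likely use a sparse representation.

```python
import numpy as np

# Hypothetical toy data: (user, item, active context indices, rating).
interactions = [
    (0, 1, [0, 2], 5),  # user 0 rated item 1 with 5 under contexts c0 and c2
    (1, 0, [1], 2),
    (2, 1, [0], 4),
]
n_users, n_items, n_contexts = 3, 2, 3

# 3D rating tensor: entry (u, v, c) holds the rating given under context c;
# 0 marks an unobserved (user, item, context) triple.
ratings = np.zeros((n_users, n_items, n_contexts), dtype=int)
for u, v, contexts, r in interactions:
    ratings[u, v, contexts] = r

# Binary tensor: 1 where context c was active for the (u, v) interaction.
context_mask = (ratings > 0).astype(int)
```

The link prediction task then amounts to predicting the rating label of a held-out (user, item) pair given its active contexts.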

Context-Aware GNN model
In this section, we present our link prediction model for bipartite graphs between users and items with context information on edges. We extend the graph convolutional autoencoder in [6] (GCMC+feat, in the following). GCMC+feat leverages rating information using a 2D user-item opinion/rating matrix along with static node features, ignoring the context information on edges. The major contribution of our approach, dubbed context-aware graph convolutional matrix completion (cGCMC$_F$), is to utilize context features on the edges. The proposed architecture has three main blocks, shown in Fig. 3. From top to bottom: the first block represents the input data, i.e., user's opinion/rating on items, the profile of users and items, the user-item-context interaction graph with edges labeled with context and rating, and the favourite context of users and items. The second block represents the graph encoder. Inside the graph encoder, GCMC+feat operates on the 2D user-item rating matrix, while cGCMC$_F$ is our proposed extension that leverages context information on edges and maps user-item-context interaction to a 3D matrix. The graph encoder is composed of two graph convolutional neural network layers and two dense neural network layers. Each layer operates on different data to produce user and item representations with respect to rating opinion, static node features, and context information. This multiple perspective representation for each user and item is accumulated without attention weights in our algorithms cGCMC$^{old}$ and cGCMC$_F^{old}$ [19], whereas in cGCMC and cGCMC$_F$ we provide the accumulation along with the attention mechanism. Further details regarding the encoder are explained in Sect. 4.1. The decoder (discussed in Sect. 4.2) utilizes the encoded representations to predict the link in a bipartite graph.

Fig. 3 High-level architecture of the proposed context-aware graph convolutional autoencoder. User's opinion on item is modeled using local weight sharing GCN. User and item features as well as user-context and item-context are modeled using dense neural network, while user-context-item interaction is modeled with GCN with global weight sharing

Graph Encoder
Our graph encoder takes the following data as input:
1. User's Opinion on Items. The matrix $A$ represents the user's rating/opinion on items. This matrix is composed of $R$ sub-matrices $A_r \in \mathbb{R}^{N_u \times N_v}$, one for each rating level $r \in \{1, 2, \ldots, R\}$.
2. Static User's features. The matrix $U_F \in \mathbb{R}^{N_u \times N_{F_u}}$ consists of normalized static feature attributes for users.
3. Static Item's features. The matrix $V_F \in \mathbb{R}^{N_v \times N_{F_v}}$ consists of normalized static feature attributes for items.
4. Surrounding Context of User-Item Interaction ($A_{uvc}$). We have represented the user-item-context interaction using a 3D matrix $A_{uvc} \in \mathbb{R}^{N_u \times N_v \times N_c}$. This binary matrix contains information about the surrounding context under which the user has provided a specific opinion on the item. For example, if user $U_A$ has rated item $V_B$ with rating 5 under contexts $c_1, c_2, c_3 \in \{c_1, c_2, c_3, \ldots, c_{N_c}\}$, then this matrix contains an entry set to 1 for $U_A$, $V_B$ and each of $c_1$, $c_2$, $c_3$.
5. Favourite Context of Users. The matrix $U_C \in \mathbb{R}^{N_u \times N_c}$ denotes the importance of context for individual users. We use information from the matrix $A$ (Eq. 1) to give more weight ($\alpha$) to the context in which a user has given a high rating, compared to the context under which the user has given a lower rating.
6. Favourite Context of Items. The matrix $V_C \in \mathbb{R}^{N_v \times N_c}$ denotes the importance of context for individual items, analogously to $U_C$ above. The value of a context attribute of an item is high if the item is more likely to get a high rating under that specific context, thus giving more importance to the context attributes under which an item is rated highly.
Next, we explain how a graph encoder operates on the matrices defined above, to learn the representations of users and items with respect to rating, context and static features.

User-Rating-Item Representation
The user opinions represented in the adjacency matrix $A$ (Eq. 1) encode the users' liking for items in the bipartite graph. We use a local weight sharing graph convolutional layer for modelling the user's opinion. The local weight sharing mechanism allows having different convolutional weights based on the edge types. The number of weight matrices is equal to the number of available rating levels $R$. The customized message propagation for graph convolutions uses an edge type-specific parameter matrix $W_r$. After the message propagation step, we aggregate the incoming messages at each node by one of two alternative aggregation functions: sum and stack.
• stack aggregation: concatenating all edge specific matrices along their first dimension.
• sum aggregation: performing an addition of all edge-specific matrices.
Overall, this edge-specific message propagation is more effective than general global message propagation. Our model selection experiments considered summation and concatenation as alternatives, and we selected the former for its best overall performance (in validation). This spectral convolutional layer is defined in Eqs. 3 and 4, where $X_u$ and $X_v$ are the unique one-hot vectors for the user and item nodes. The term $R$ is the maximal rating a user can give to an item, $W_i^u$ and $W_i^v$ represent the $R$ trainable weight matrices, and $\sigma$ is a non-linear activation function such as ReLU. The matrices $\tilde{A}_i$ and $\tilde{A}_i^T$ are the normalized adjacency matrix $A_i$ and its transpose, respectively.
The normalization is given in Eq. 5, where the term $D$ represents a diagonal degree matrix, containing the square root of the degree on its diagonal. Similarly, $A_i^T$ is normalized to get $\tilde{A}_i^T$ (using Eq. 5).
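The local weight-sharing layer described in this section can be sketched as follows. This is an illustrative reading under stated assumptions, not the released implementation: each rating-specific adjacency block is normalized on its own by the square roots of the user and item degrees, the item inputs are one-hot (so $X_v$ is the identity), and the stack aggregation concatenates the per-rating messages along the feature axis. The function names `normalize` and `user_rating_layer` are hypothetical.

```python
import numpy as np

def normalize(adj):
    """Scale adj[u, v] by 1 / (sqrt(deg(u)) * sqrt(deg(v))); isolated nodes stay 0."""
    deg_u = adj.sum(axis=1)
    deg_v = adj.sum(axis=0)
    inv_u = np.where(deg_u > 0, 1.0 / np.sqrt(np.maximum(deg_u, 1.0)), 0.0)
    inv_v = np.where(deg_v > 0, 1.0 / np.sqrt(np.maximum(deg_v, 1.0)), 0.0)
    return inv_u[:, None] * adj * inv_v[None, :]

def user_rating_layer(rating_matrix, weights, accum="sum"):
    """User embeddings from one graph convolution with rating-specific weights.

    rating_matrix: (n_users, n_items) integers, 0 = unobserved.
    weights: list of R arrays of shape (n_items, d), one per rating level.
    """
    messages = []
    for r, w_r in enumerate(weights, start=1):
        adj_r = (rating_matrix == r).astype(float)  # sub-matrix A_r
        # With one-hot item inputs, A~_r X_v W_r reduces to A~_r W_r.
        messages.append(normalize(adj_r) @ w_r)
    if accum == "sum":
        h = np.sum(messages, axis=0)
    elif accum == "stack":
        h = np.concatenate(messages, axis=1)
    else:
        raise ValueError("accum must be 'sum' or 'stack'")
    return np.maximum(h, 0.0)  # ReLU non-linearity

rng = np.random.default_rng(0)
toy_ratings = np.array([[5, 0, 3], [0, 4, 0]])  # 2 users, 3 items
toy_weights = [rng.normal(size=(3, 4)) for _ in range(5)]
z_sum = user_rating_layer(toy_ratings, toy_weights, accum="sum")      # shape (2, 4)
z_stack = user_rating_layer(toy_ratings, toy_weights, accum="stack")  # shape (2, 20)
```

Item embeddings follow the same pattern with the transposed rating matrix and a separate set of weight matrices.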

Context Representation
The user-item-context interaction matrix $A_{uvc}$ is normalized by dividing each context attribute by the total count of context attributes recorded at the time of the user-item interaction. The normalized context attributes are further accumulated to get the matrix $A_c$, where $u$ and $v$ are the user and item indexes in the matrix, $N_c^{uv}$ represents the count of occurrences of context $c$ when user $u$ has rated item $v$, and $c_i^{uv}$ denotes the individual context value under which user $u$ has rated item $v$.
We propose to leverage graph convolutions to model user-context-item interactions in the matrix $A_c$, with the same message propagation rule as used for modelling the user's opinion (Eq. 3 and Eq. 4) but with a single global weight matrix. We denote the user and item representations with respect to context attributes as $z_u^{c_1}$ and $z_v^{c_1}$, respectively. The user's behaviour varies with the change in the surrounding context, which makes them react differently to the same item under different contexts. Similarly, an item gets a different rating when the surrounding context changes. This makes the context information naturally dynamic. For modelling this dynamic user-context and item-context relation, we performed a statistical analysis of the training data and identified an importance factor $\alpha$ for each user and item, respectively. The $\alpha$ factor gives more importance to the favourite context of users and items. We have stored the extracted user preferences in $U_C$ (Eq. 7), where $\mathcal{N}_u$ denotes the neighbours of user $u$ and $N_c^{uv}$ represents the number of context attributes in which the user provides opinion $r$. We have obtained the context importance for each item in a similar way (Eq. 7) and stored it in $V_C$. Both matrices are normalized to have values between 0 and 1. We use the simplest dense neural network layer to process this information. The weight matrices of this layer are initialized from a uniform random distribution, and node dropout is applied to the hidden layers to prevent overfitting. This layer produces the representations $z_u^{c_2}$ and $z_v^{c_2}$. To get the final user's and item's context representations, we have integrated $z_u^{c_1}$ with $z_u^{c_2}$, and $z_v^{c_1}$ with $z_v^{c_2}$, where $W$ represents trainable weight matrices and $b$ is a bias.
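One possible reading of the favourite-context statistic for users is sketched below. It is an assumption made for illustration rather than the exact formula of this paper: every active context of an interaction is credited with the importance $\alpha$ of the rating given, credits are summed over the items rated by the user, and each row is divided by its maximum so that values lie between 0 and 1. The function name `favourite_context`, the toy data and the row-wise normalization are all hypothetical; the $\alpha$ values are the ones listed among the hyper-parameters.

```python
import numpy as np

def favourite_context(ratings, context_mask, alpha):
    """Per-user context importance.

    ratings: (n_users, n_items) integers, 0 = unobserved.
    context_mask: (n_users, n_items, n_contexts) binary tensor.
    alpha: dict mapping each rating level to its importance factor.
    """
    weight = np.zeros(ratings.shape)
    for r, a in alpha.items():
        weight[ratings == r] = a
    # Credit each active context with the alpha of the rating, summed over items.
    scores = (weight[:, :, None] * context_mask).sum(axis=1)
    # Assumed normalization: divide each row by its maximum (rows of zeros stay zero).
    row_max = scores.max(axis=1, keepdims=True)
    return np.divide(scores, row_max, out=np.zeros_like(scores), where=row_max > 0)

alpha = {1: 0.2, 2: 0.3, 3: 0.5, 4: 0.7, 5: 0.8}
toy_ratings = np.array([[5, 1]])                # one user, two items
toy_context = np.array([[[1, 0], [0, 1]]])      # item 0 rated under c0, item 1 under c1
u_c = favourite_context(toy_ratings, toy_context, alpha)
```

In this toy case the context attached to the 5-star rating receives the largest weight. The item-side matrix can be obtained in the same way by swapping the user and item axes of both inputs.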

User's and Item's Profile Representation
The static features of user and item nodes are represented as $U_F$ and $V_F$, respectively. We have not given these features directly as input to the graph convolution layer, as they degrade the performance in the case of sparse user-item content features. Therefore, we use a separate dense neural network layer to get the static feature representation for user and item nodes (Eq. 12), where $W_3^f$ and $W_4^f$ represent trainable weight matrices and $b^f$ is a bias.

Accumulation with Attention
We have accumulated the user's representation from the rating/opinion (Eq. 3), features (Eq. 12) and context (Eq. 10) perspectives. Here, we introduce learnable attention weights for the three representations in cGCMC$_F$. In cGCMC$^{old}$ [19], we accumulated these embeddings without considering any learnable attention weights. The last layer of the graph encoder is a dense neural network layer and is responsible for producing the final embedding with or without attention weights. For cGCMC$_F$, the user's final representation is obtained by weighting the opinion, context and feature representations with their attention weights and concatenating them as input to this dense layer. Similarly, the item's representations from the rating/opinion, context and feature perspectives are concatenated after applying the attention weights to get the final item embedding.
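The accumulation step can be sketched as follows, assuming one scalar attention weight per representation and a ReLU dense layer on top of the concatenation. The function name `accumulate`, the toy dimensions and the use of plain scalars (rather than, for instance, softmax-normalized weights) are assumptions made for illustration; only the output size of 75 comes from the hyper-parameter settings of this paper.

```python
import numpy as np

def accumulate(z_opinion, z_context, z_feature, attention, w_out, b_out):
    """Weight the three views, concatenate them, and apply the final dense layer."""
    weighted = [
        attention[0] * z_opinion,
        attention[1] * z_context,
        attention[2] * z_feature,
    ]
    stacked = np.concatenate(weighted, axis=1)
    return np.maximum(stacked @ w_out + b_out, 0.0)  # ReLU

rng = np.random.default_rng(0)
z_o = rng.normal(size=(4, 6))   # opinion-based embeddings for 4 users
z_c = rng.normal(size=(4, 5))   # context-based embeddings
z_f = rng.normal(size=(4, 3))   # static feature embeddings
attention = np.array([0.5, 0.3, 0.2])  # would be learned during training
w_out = rng.normal(size=(6 + 5 + 3, 75))
b_out = np.zeros(75)
z_user = accumulate(z_o, z_c, z_f, attention, w_out, b_out)  # shape (4, 75)
```

Setting all three attention weights to 1 recovers plain concatenation, which corresponds to the accumulation without attention.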

Decoder
We use a bilinear decoder that takes the context-aware embeddings of user-item interactions and reconstructs the rating matrix ($\hat{A}$) between users and items. Here, we address this problem as a classification task and each rating is treated as a separate class. The decoder produces a probability distribution over all classes through a bilinear operation parameterized by $Q_r$, where $Q_r$ are $R$ trainable matrices of dimension $D \times D$, $D$ is the hidden dimension of the user's and item's embeddings obtained from the encoder, and $R$ is the number of available rating levels. In our setting, $Q_r$ is defined as a combination of $k$ weight matrices $W_s$ with learnable coefficients $\alpha_{kr}$. Here, $k$ represents the number of linear functions, which is chosen to be lower than the number of rating levels to avoid overfitting.
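A bilinear decoder of this kind can be sketched as below. The sketch assumes that each $Q_r$ is a linear combination of $k$ shared basis matrices (the weight sharing used by GCMC [6]), that class probabilities come from a softmax over the $R$ bilinear scores, and that the predicted rating is the expectation over rating levels. The names `bilinear_decoder` and `expected_rating` and all sizes are hypothetical.

```python
import numpy as np

def bilinear_decoder(z_u, z_v, basis, coeff):
    """Probability of each rating level for every (user, item) pair.

    z_u: (n_users, D), z_v: (n_items, D),
    basis: (k, D, D) shared matrices, coeff: (R, k) mixing coefficients.
    """
    q = np.einsum("rs,sde->rde", coeff, basis)            # Q_r = sum_s coeff[r, s] * basis[s]
    logits = np.einsum("ud,rde,ve->uvr", z_u, q, z_v)     # z_u^T Q_r z_v
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

def expected_rating(probs):
    """Expected rating under the predicted distribution (levels 1..R)."""
    levels = np.arange(1, probs.shape[-1] + 1)
    return probs @ levels

rng = np.random.default_rng(0)
n_users, n_items, dim, n_basis, n_levels = 3, 4, 8, 2, 5
probs = bilinear_decoder(
    rng.normal(size=(n_users, dim)),
    rng.normal(size=(n_items, dim)),
    rng.normal(size=(n_basis, dim, dim)),
    rng.normal(size=(n_levels, n_basis)),
)
predicted = expected_rating(probs)  # shape (3, 4), values between 1 and 5
```

Choosing fewer basis matrices than rating levels keeps the number of decoder parameters small, which is the overfitting argument made above.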
We have tested our model with different settings and represented them with different names: cGCMC and cGCMC$_F$. cGCMC models the effect of context along with an opinion matrix, while cGCMC$_F$ brings the context effect with opinion as well as static features. We have tested both models with and without the attention mechanism. We found that the attention mechanism improved the performance.

Rating Prediction and Model Training
We evaluate the performance of the proposed algorithm using the MAE (Eq. 18) and RMSE (Eq. 19) metrics with respect to the rating assigned by the user to their interaction with the item. The choice of these metrics over classification-based ones is driven by the nature of the ratings, which is ordinal rather than multinomial. Hence it is important to capture how closely the prediction approximates the expected rating (which is not the case for classification-based metrics). Our model is trained in an end-to-end fashion by minimizing the root mean square error between the actual ($A_{ij}$) and reconstructed ($\hat{A}_{ij}$) ratings:

$$\mathrm{MAE} = \frac{1}{n} \sum_{(i,j)} \left| A_{ij} - \hat{A}_{ij} \right| \qquad (18)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{(i,j)} \left( A_{ij} - \hat{A}_{ij} \right)^2} \qquad (19)$$

where $n$ represents the cardinality of user-item pairs.
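For reference, the two metrics can be computed as in the following sketch, which uses their standard definitions over flat lists of actual and predicted ratings. The function names and the toy values are illustrative.

```python
import math

def mae(actual, predicted):
    """Mean absolute error over paired ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error over paired ratings."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

toy_actual = [5, 3, 1]
toy_predicted = [4, 3, 3]
toy_mae = mae(toy_actual, toy_predicted)    # errors 1, 0, 2 give a mean of 1.0
toy_rmse = rmse(toy_actual, toy_predicted)  # sqrt((1 + 0 + 4) / 3)
```

RMSE penalizes the single two-star miss more heavily than MAE does, which is why both are reported.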

Datasets
To demonstrate the effectiveness of our proposed algorithms cGCMC and cGCMC$_F$, we conduct experiments on five real-world publicly available datasets for movies, music and travel. We summarize the statistics of the datasets in Table 1, where density is defined as the ratio between the number of edges and the cardinality of the (user, item) pairs.

Table 1 The statistical information defining number of users, items and context attributes along with the edge density and rating levels for each of the datasets used in our experiments

LDOS-CoMoDa is a popular movie dataset collected from a survey. This dataset contains users' opinions on movies considering the surrounding context. The context information includes location (home, friend's house, public place), time (morning, afternoon, evening, night), day-type (working day, weekend, holiday), weather (sunny, cloudy, rainy, stormy, snowy), decision (movie chosen by the users themselves or given to them), mood (positive, negative, neutral), season (summer, winter, spring, autumn), endEmo, i.e., emotional state at the end of watching the movie (sad, happy, angry, surprised, neutral, scared, disgusted), domEmo, i.e., emotional state experienced most when watching the movie (sad, happy, angry, surprised, neutral, scared, disgusted), interaction (first interaction with a movie, N-th interaction with a movie), physical (ill, healthy), and companion (alone, friends, partner, family, colleagues, parents, public). Besides this information, LDOS-CoMoDa also has profile features for users (gender, age, city, country) and movies (director, language, actor, genre).
DePaulMovie is a movie dataset collected by researchers of DePaul University, with ratings acquired by survey. Students have been asked to rate movies subject to 3 context variables: location (home, cinema), time (weekend, weekday), and companion (partner, family, alone). This dataset does not have user and item profile features.
The Travel-STS dataset contains information about places visited by tourists. The context information includes distance (nearby, far away), time available (half a day, one day, more than one day), temperature (warm, hot, burning, cool, cold, freezing), season (summer, winter, spring, autumn), crowdedness (empty, crowded, not crowded), mood (happy, active, sad, lazy), budget (high spender, budget traveler, price for quality), weather (sunny, cloudy, rainy, clear sky, thunderstorm, snowing), companion (with children, with friends/colleagues, alone, with family, with girlfriend/boyfriend), weekend (weekday, weekend), travel goal (visiting friends, religion, business, health care, education, social event, scenic/landscape, hedonistic/fun, activity/sport), means of transport (bicycle, car, public transport, no transportation means) and knowledge of surrounding (returning visitor, completely new area, citizen of the area). This dataset also contains user profile features (age, gender).
Tijuana Restaurant is a restaurant dataset gathered via a survey consisting of 8 questions asked to participants about various neighbouring restaurants. Every restaurant picked was assessed multiple times, once for every possible context setting. The context information includes combinations of time and location ($c_1$: weekday and school, $c_2$: weekday and home, $c_3$: weekday and work, $c_4$: weekend and school, $c_5$: weekend and home, and $c_6$: weekend and work).
The density values in Table 1 represent the fraction of positive links between the nodes. The Tijuana-Restaurant dataset has few nodes connected by a high number of edges, while the LDOS-CoMoDa dataset has a greater number of nodes connected by few edges (compared to the other datasets). Overall, the effect of high or low density values on the performance of our models is shown to be negligible in Sect. 6.
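The density measure used in this section is the ratio between observed edges and all possible (user, item) pairs, which can be written as a one-line helper. The function name and the counts below are hypothetical and are not values from Table 1.

```python
def edge_density(n_edges, n_users, n_items):
    """Fraction of observed (user, item) pairs among all possible pairs."""
    return n_edges / (n_users * n_items)

# Hypothetical counts, for illustration only.
toy_density = edge_density(n_edges=500, n_users=50, n_items=100)
```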

Implementation Setup
Our PyTorch implementation of the cGCMC and cGCMC$_F$ models is publicly available. We have used 60% of the data as a training set, 20% as a validation set and 20% as a test set for each dataset. The data splitting is performed five times. Each time the data is shuffled with a different random seed before dividing it into splits. The average performance of all algorithms after five runs with different random splits is presented in Sect. 6.
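The splitting protocol can be sketched with the standard library as follows. The function name `split_indices`, the choice of seeds 0 to 4 and the toy size of 100 interactions are assumptions for illustration; the released implementation may organise this differently.

```python
import random

def split_indices(n, seed, train_frac=0.6, val_frac=0.2):
    """Shuffle n interaction indices with the given seed and cut 60/20/20."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(train_frac * n)
    n_val = round(val_frac * n)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

# Five independent splits, one per random seed.
splits = [split_indices(100, seed) for seed in range(5)]
```

Reported scores would then be the mean of the metric over the five test splits.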

Computational cost
We report the computational costs (in seconds) of cGCMC and cGCMC$_F$, obtained by computing the average time required by a single training epoch and the average time required by the prediction step (i.e., on the whole test set). Results are presented in Table 2.

Hyper-parameters
We have evaluated our approach under different configurations. The best value for each hyper-parameter is shown in bold. We have searched over the embedding size of the user and item representations (Table 3). We have chosen the batch size from [40, 80, 120, 150, 200]. The last layer of the encoder is set to produce embeddings of size 75. The node dropout rate ($P_{drop}$) is tuned in [0.3, 0.4, 0.5, 0.6, 0.7]. $P_{drop}$ is the probability of randomly dropping all outgoing messages from specific nodes, in order to train under the denoising setup. The $\alpha$ importance factor is set to [0.2, 0.3, 0.5, 0.7, 0.8] for the rating levels $r \in \{1, \ldots, R\}$, respectively. These values were initially chosen at random; any set of initial values can be used, provided that the context in which the user gives a high rating receives more weight. The attention weights for the opinion, feature, and context representations are first set to random values and then learned to give appropriate weights to each of these representations before combining them. All neurons use the ReLU nonlinearity and Adam is employed as the optimization algorithm. The model is trained for 200 epochs. For baseline algorithms, all parameters are initialized as mentioned in the corresponding papers.
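Node dropout as described above can be sketched as masking whole sender nodes in the adjacency matrix, so that none of their outgoing messages reach any neighbour. The function name `node_dropout` is hypothetical, and the sketch omits any rescaling of the surviving messages, which an actual implementation may apply.

```python
import numpy as np

def node_dropout(adj, p_drop, rng):
    """Drop all outgoing messages of randomly selected sender nodes.

    adj: (n_receivers, n_senders) adjacency block; each column is one sender.
    """
    keep = rng.random(adj.shape[1]) >= p_drop  # True for senders that survive
    return adj * keep[None, :]

rng = np.random.default_rng(0)
toy_adj = np.ones((4, 6))
dropped = node_dropout(toy_adj, p_drop=0.5, rng=rng)
```

Unlike element-wise dropout, every column of the result is either kept entirely or zeroed entirely.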

Benchmarks
In the evaluation phase, we have evaluated the test set using predictive performance in terms of mean absolute error (MAE) and root mean square error (RMSE). We compare our approach with several link prediction algorithms from the literature, as follows:
• SocialMF [14] is a matrix factorization approach that exploits user-user trust information along with user opinion on the item to predict items for users.
• SVD++ [36] improves the conventional SVD approach by allowing the joint use of explicit (e.g., user's rating opinion) and implicit (e.g., purchases, visited items) information.
• PMF [37] is a matrix factorization approach for sparse datasets. It exploits only the user-item interactions to learn user and item embeddings, while forgoing the context features.
• BiasedMF [38] is an improvement to traditional matrix factorization that incorporates user, item, and global bias factors.
• GCMC [6] models the user's opinion leveraging the rating matrix between users and items for the matrix completion task.
• GCMC+feat [6] extends GCMC by integrating static features inside the user and item nodes for link prediction in a bipartite graph.
• GraphRec$_{uu\,uv}$ [30] exploits the social relation between users along with user-item interactions for link prediction in the user-item bipartite graph.

Performance Comparison
Table 4 presents a comparison between the previous version of our algorithm (denoted with 'old') and the extended version, and Table 5 presents the performance comparison of our approach with other state-of-the-art algorithms. Two of our datasets (LDOS-CoMoDa and Travel-STS) contain user and item (description) features along with the user's opinion on items and context information. For the other three datasets (DePaul, InCarMusic, Tijuana-Restaurant), we have only the user's opinion on the item and contextual information. The algorithms that integrate user and item feature information are not applicable to the latter category of datasets (indicated inside tables with the NA mark, as in "Not Applicable").
• A clear performance difference can be seen between the old and extended versions of our model on all datasets (provided in Table 4). We attribute this to the newly introduced attention factor in the last layer of the encoder.
• Basic matrix factorization approaches, PMF and BiasedMF, solely model user-item interactions as isolated instances and ignore side information, which limits their representation ability. These approaches perform worse than all other baseline algorithms on all datasets because of their inability to integrate knowledge about the surroundings.
• SVD++, SocialMF, and GraphRec$_{uu\,uv}$ perform better than basic matrix factorization approaches as they capture and integrate knowledge about an individual user in the form of social trust or by using implicit feedback. Despite integrating side information, these approaches perform worse than our method because they do not benefit from learning the surrounding context.
• When comparing our proposed algorithm with GNN based approaches (GCMC and GCMC+feat), we can identify a significant improvement in performance, motivated by the capability of providing context-aware recommendations.
Overall, our model outperforms all baseline approaches on all datasets, providing sufficient grounding to state the importance of being able to take into consideration the surrounding knowledge of the context to provide accurate recommendations.

Impact of Context Modeling
The major contribution of our approach is to organize context features on edges, together with the user-item interaction, in an effective way. We use the importance factor α to learn the favourite surrounding context features of the target user and item for context-aware link prediction. We therefore carry out an ablation study to validate the rationale and usefulness of α. As explained earlier, the importance of context varies from person to person, and different context attributes affect items differently. Fig. 4 demonstrates the positive effect of capturing this importance factor in our model. We attribute this to prioritizing the contexts that are important for users and items by giving them more weight.
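The idea of giving more weight to important contexts can be sketched as follows. This is an illustration only: it assumes that α is a softmax-normalised weight vector over the context attributes of an edge, computed from learnable scores. The helper names `softmax` and `weight_context` and the example values are hypothetical and are not taken from the model definition.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalise raw scores into positive weights that sum to one."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weight_context(context: Sequence[float], scores: Sequence[float]) -> List[float]:
    """Scale each context attribute of an edge by its assumed importance alpha."""
    if len(context) != len(scores):
        raise ValueError("one importance score is required per context attribute")
    alpha = softmax(scores)
    return [a * c for a, c in zip(alpha, context)]


# Hypothetical edge with two active context attributes, e.g. "weekend" and "rainy".
context = [1.0, 1.0]
learned = weight_context(context, scores=[2.0, 0.0])   # first attribute dominates
ablated = weight_context(context, scores=[0.0, 0.0])   # equal scores: uniform weights
```

Under these assumptions, removing α in an ablation corresponds to the uniform case, where every context attribute contributes equally regardless of the user or item.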

Impact of Attention Weights
We have three kinds of representations for each user and item (opinion, feature, and context, as described in Sect. 4.1). When accumulating these three representations, we found that concatenation performs better than summation, so we report results with concatenation only. We introduced learnable attention weights that are applied to each representation before accumulation. These weights assign a different significance to each representation (i.e., the opinion, contextual, and feature representations of users and items) in the final embedding. The user's (or item's) opinion representation contains information about the neighbouring nodes with respect to opinion information. Similarly, the contextual representation contains information about the neighbouring nodes with respect to contextual information. The final representation of the user is an accumulation of these along with a dense feature representation. We believe that each of these representations contributes to the final node representation to a different degree, which we capture with a learnable weight. It might be possible that for some users opinion-based neigh-

Conclusion
We have focused our work on emphasizing the impact of knowledge about the surrounding context on user-item interaction. To this end, we have organized context, opinion, and item features into a bipartite graph and an associated multidimensional matrix. We approached the resulting matrix completion task using a graph convolutional autoencoder. Our graph encoder captures the context information along with the opinion in user-item interactions. We also showed how the model leverages context information to capture the user's behaviour in relation to the surrounding context, giving attention to the most important contextual aspects of the user and item. Furthermore, the bilinear decoder predicts the labelled edges between the user and item. To demonstrate the effectiveness of our approach, we tested it on five public datasets, showing significant improvements over state-of-the-art baselines.
We have conducted various experiments to verify the benefit of the context representation. The application of our model is not limited to product recommender systems on smart devices, i.e., music/movie/travel/fashion recommendations. With further development for specific domains, the model can also be used for other intelligent predictions, such as personal medical reminders for elders and a smart device settings controller based on the surrounding context. In this work, the accumulative approach unifies all context information, neglecting the dynamic nature of some contextual attributes. This may result in losing the diversity of individual context attributes. In the future, we would like to explore multi-dimensional edge feature-based GNNs and multi-way interactions between users and items to capture dynamic behaviours more realistically. Furthermore, we intend to investigate the use of separate embeddings for user and item contexts and to evaluate the performance on a large-scale dataset.
In addition, we want to extend our model to deal with heterogeneous graphs, which consist of nodes of different types and different context information on different edges.
Funding Open access funding provided by Universitá di Pisa within the CRUI-CARE Agreement.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Fig. 5 Effect of accumulation with attention in terms of MAE

Table 2 The average time (sec.) taken by cGCMC and cGCMC F for each dataset

Table 3 cGCMC F encoder and decoder layers and their respective best output dimension hyperparameter values

Table 4 Test performance comparison with state-of-the-art algorithms. Best results are marked in bold letters

Table 5 Test-set performance comparison with state-of-the-art algorithms. Best results are marked in bold

Fig. 4 Effect of importance factor α on cGCMC in terms of MAE