Implicit Feedback Awareness for Session Based Recommendation in E-Commerce

Information overload is a challenge on e-commerce platforms. E-shoppers may have difficulty selecting the best product from the available options. Recommender systems (RS) can filter relevant products according to users' preferences, interests or observed behaviours while they browse products on e-commerce platforms. However, collecting users' explicit preferences for products on these platforms is difficult, since buyers prefer to rate products after they have used them rather than while they are looking for them. Therefore, to generate next-product recommendations in the e-commerce domain, it is mostly shoppers' click behaviour that is taken into consideration. Shoppers can indicate their interest in products in different ways. Spending more time on a product could imply a different level of interest than quickly skipping it, and adding a product to the basket could show more intense interest than just browsing. In this study, we investigate the effect of applying generated explicit ratings to RS by implementing a framework that maps users' implicit feedback into explicit ratings in the e-commerce domain. We conduct computational experiments on well-known RS algorithms using two datasets containing mapped explicit ratings. The results of the experimental analysis indicate that incorporating explicit ratings calculated from users' implicit feedback can help RS models perform better. The results also suggest that the performance gap between implicit and explicit ratings is larger when the factorisation machine RS model is used.


Introduction
Recommender systems (RS) play an important role in digital marketing and have been widely used in several sectors such as retail, movies, news, music, books and shopping. Effective RS methods improve the user experience [11]. Also, many businesses depend on RS as powerful personalised marketing tools [29] to achieve business goals and boost sales. Thus, several methods have been developed to recommend the most relevant items to users [8], including content-based filtering (CBF) [17], collaborative filtering (CF) [22] and hybrid RS [2].
Session-based recommendation systems (SBRS) [9] are one significant type of RS, whose main goal is to predict the next item a given user is likely to view. Most SBRS use only the current browsing history, i.e. the items the user has visited so far during the session. This is because previous browsing or purchase information is not always available and may be irrelevant to the user's present purpose (the user may look for different items in different sessions). Most classical RS methods are based on user ratings. User ratings are normally limited to the items the user has purchased or tried; for example, if a user gave a high rating to a science fiction book, this rating could be used to recommend another book to this user. However, in online SBRS, item ratings are not available, as users do not explicitly rate items while they are browsing; rating takes place after the purchased item has been used. Therefore, current SBRS [15,16,26,47] focus on item features and simple implicit ratings, which give an equal rating to all viewed items. However, in practice, viewing an item is not an accurate indication of interest, and users normally have different levels of interest in the items they have viewed so far.
This work extends the previously published conference paper [10] by experimenting on two new datasets that cover different time periods from the previously published one, in order to demonstrate the robustness of the suggested interest-level mapping strategy. Moreover, this study expands on the conference paper by incorporating a dataset analysis section as well as a comparison with current relevant publications. The idea of the paper is that user interaction and behaviour during a session (e.g. duration of item view, basket items, number of repeated item visits) may indicate the expected user-item rating. Therefore, in this paper, we answer the following research questions: can integrating user browsing behaviour into SBRS methods improve RS performance, i.e. increase the prediction accuracy of the next item the user is likely to view? And how?
To answer these questions, we propose a method to estimate a personalised item rating based on the user's behaviour in a given session. Research on the relationship between implicit and explicit feedback shows that there is a meaningful correlation between these two types of feedback [21,34]. The estimated rating represents the level of the user's interest in the item based on their behavioural and contextual data, and several works [20,32,40] have shown that using estimated ratings can help to improve RS performance. Moreover, this paper presents a test method to measure the performance of the proposed framework for different sequence lengths of user-item interactions, on different RS algorithms and on both datasets (the Fresh relevance and Yoochoose RecSys datasets). The major contributions of this study are as follows:
1. We introduce a user behaviour aware model that (a) analyses users' interest in the sessions and (b) utilises derived numerical implicit ratings in RS models for next-click prediction in SBRS.
2. The proposed model is evaluated on two real-world datasets: the Yoochoose dataset from RecSys 2015 and the Fresh relevance dataset from a personalisation company in the UK. Experimental results show that the user interest aware (UIA) SBRS achieves a good level of performance, and that the proposed UIA mechanism plays an important role.
The rest of this paper is organised as follows. The next section reviews work related to this study. The following section describes the overall proposed framework. Computational experiments and results are then presented. Finally, the last section reports concluding remarks and future directions.

Background and Related Works
The purpose of this work is to explore the relationship between one-class and estimated numerical implicit ratings as a context factor in SBRS. This is implemented using factorisation recommender (FR) and Item-Item similarity CF models. The RS models are also analysed in terms of the effect of the length of a user's past interactions by running them on two e-commerce datasets. Thus, in this section, we give a brief description of general RS models, the FR and Item-Item similarity CF models, sequence-aware RS and feedback types.

RS Types and Methods
In [23], RS are generally classified into three categories, namely CF, CBF and hybrid RS. Each of these methods has advantages and drawbacks. For example, CBF RS suffers from a lack of serendipity [49]; CF RS struggles when a new item or a new user is added [23,45], in other words it suffers from the cold-start and sparsity problems [1,17]; and hybrid RS [5] tries to alleviate the drawbacks of these models. However, hybrid RS can have disadvantages in terms of resource consumption, since these systems combine both models [23].

Factorisation Recommender (FR) Model
The FR model [7,37] tries to learn latent factors for the users, items and side features. The latent factors are used to rank the items for each user by the likelihood that the user will interact with them. In an explicit feedback dataset, latent factors are learned from the user-item interactions and the rating on each interacted item. In an implicit feedback dataset, however, latent factors are learned solely from the interactions. This method works well in CF-based RS for both implicit and explicit feedback datasets. Regardless of feedback type, if the aim is to rank candidate items at the end of model training, the items are ranked by the likelihood that the user interacts with them. When the dataset consists only of user id, item id and rating, the score for user i on item j is calculated as in Eq. 1.

SN Computer Science
$$\mathrm{score}(i, j) = \mu + w_i + w_j + a^{T} x_i + b^{T} y_j + u_i^{T} v_j \qquad (1)$$
In this equation, i is a user, j is an item, $\mu$ is the bias term, $w_i$ and $w_j$ are the weight terms for user i and item j respectively, $x_i$ and $y_j$ are the side data vectors of user i and item j respectively, a and b are the weights of their respective side features, and $u_i$ and $v_j$ are the latent factors for user i and item j respectively, which are vectors whose length equals the number of latent factors. When side data is not given to the FR model, the loss and rating calculations are similar to those of matrix factorisation (MF) [24].
To update the weight vectors in Eq. 2, the stochastic gradient descent (SGD) [3] optimisation algorithm is commonly used. SGD updates each vector in each iteration by a step scaled by the learning rate.
$$\min_{w, a, b, U, V} \; \frac{1}{|D|} \sum_{(i, j, r_{ij}) \in D} L\big(\mathrm{score}(i, j), r_{ij}\big) + \lambda_1 \left( \lVert w \rVert_2^2 + \lVert a \rVert_2^2 + \lVert b \rVert_2^2 \right) + \lambda_2 \left( \lVert U \rVert_2^2 + \lVert V \rVert_2^2 \right) \qquad (2)$$
Eq. 2 shows the optimisation objective of the FR model. In this equation, D is the dataset containing the user id, item id and side data features. w is the weight term for users and items, and a and b are the weight vectors for user side data and item side data respectively. $U = (u_1, u_2, \ldots)$ and $V = (v_1, v_2, \ldots)$ are the latent factors for users and items respectively. $\lambda_1$ and $\lambda_2$ are the regularisation parameters for the weight vectors and the user-item latent factors respectively. score(i, j) is the score calculated in each iteration, denoted $\hat{r}_{ij}$. The loss function (L) in Eq. 3 calculates the difference between the predicted score $\hat{r}_{ij}$ and the actual score $r_{ij}$.
The difference between FR and MF [24] is that, in addition to what conventional MF models learn, FR models learn latent vectors for the side data of users and items, whereas MF learns latent factors only for users and items.
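To make the score of Eq. 1 and a single SGD update concrete, the following is a minimal Python sketch rather than the implementation used in the experiments. It assumes that the score is the sum of the bias, the user and item weights, the side-data terms and the latent dot product, and that the loss is the squared error, which this section does not state explicitly. The function names (`fr_score`, `sgd_step`) and the default learning rate are hypothetical.

```python
def dot(p, q):
    """Inner product of two equal-length sequences."""
    return sum(pi * qi for pi, qi in zip(p, q))


def fr_score(mu, w_i, w_j, u_i, v_j, a=(), x_i=(), b=(), y_j=()):
    """Score of user i on item j: bias, user/item weights, side-data terms, latent term."""
    return mu + w_i + w_j + dot(a, x_i) + dot(b, y_j) + dot(u_i, v_j)


def sgd_step(mu, w_i, w_j, u_i, v_j, r_ij, lr=0.05, lam1=0.0, lam2=0.0):
    """One SGD update for a single observed rating.

    Assumptions (not from the paper): squared-error loss, no side data,
    and the global bias mu is kept fixed.
    """
    err = fr_score(mu, w_i, w_j, u_i, v_j) - r_ij
    # Gradients of err**2 plus the L2 penalties on the weights and latent factors.
    new_w_i = w_i - lr * (2 * err + 2 * lam1 * w_i)
    new_w_j = w_j - lr * (2 * err + 2 * lam1 * w_j)
    new_u = [u - lr * (2 * err * v + 2 * lam2 * u) for u, v in zip(u_i, v_j)]
    new_v = [v - lr * (2 * err * u + 2 * lam2 * v) for u, v in zip(u_i, v_j)]
    return new_w_i, new_w_j, new_u, new_v
```

With no side data passed, `fr_score` reduces to the biased matrix factorisation score, which matches the remark that FR without side data behaves like MF.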

Item-Item Similarity Collaborative Filtering (Item-Item Similarity CF) Recommendation Model
In the Item-Item similarity CF model [25,43], the similarity between items is calculated by looking at the items interacted with by users who have interacted items in common. Jaccard and Cosine metrics can be used to measure the similarity between items [41]. In Jaccard similarity [41], user ratings on items are not taken into account. The idea is to take the number of users who interacted with both item i and item j as a proportion of the users who interacted with either item.
In Cosine similarity [41], the ratings on items are considered. We used this similarity measurement for the derived implicit rating dataset. CS(i, j) is the similarity of items i and j, calculated using Eq. 5. $U_{ij}$ is the set of users who rated both item i and item j, $U_i$ is the set of users who rated item i, $U_j$ is the set of users who rated item j, $r_{ui}$ is the rating of user u on item i and $r_{uj}$ is the rating of user u on item j.
Predictions $y_{uj}$ are calculated in two ways, depending on whether ratings are specified or not. Eq. 6 is used for prediction when ratings are not present.
$y_{uj}$ is the prediction for user u on item j, $i \in I_u$ ranges over the items interacted with by user u, and sim(i, j) is the similarity between item i and item j calculated using Eq. 4.
When ratings are specified in the dataset, for example when the dataset has explicit feedback, Eq. 7 is used to calculate predictions, where $r_{ui}$ is the rating given by user u on item i and sim(i, j) is calculated using Eq. 4 or Eq. 5. The RS used on the Amazon website [25] is a well-known example of Item-Item similarity CF.
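The similarity measures and the two prediction rules can be sketched in Python as follows. This is an illustrative sketch only: it uses the textbook Jaccard and cosine definitions, a mean of similarities when no ratings are given, and a similarity-weighted average of ratings otherwise. These forms are assumptions about Eqs. 4 to 7, and all function names are hypothetical.

```python
from math import sqrt


def jaccard(users_i, users_j):
    """Jaccard similarity between the sets of users who interacted with items i and j."""
    union = users_i | users_j
    return len(users_i & users_j) / len(union) if union else 0.0


def cosine(ratings_i, ratings_j):
    """Cosine similarity; ratings_x maps user -> rating for item x."""
    common = ratings_i.keys() & ratings_j.keys()
    num = sum(ratings_i[u] * ratings_j[u] for u in common)
    den = sqrt(sum(r * r for r in ratings_i.values())) * sqrt(
        sum(r * r for r in ratings_j.values())
    )
    return num / den if den else 0.0


def predict_without_ratings(interacted_items, j, sim):
    """Mean similarity between candidate item j and the items the user interacted with."""
    if not interacted_items:
        return 0.0
    return sum(sim(i, j) for i in interacted_items) / len(interacted_items)


def predict_with_ratings(user_ratings, j, sim):
    """Similarity-weighted average of the user's ratings; user_ratings maps item -> rating."""
    den = sum(sim(i, j) for i in user_ratings)
    num = sum(sim(i, j) * r for i, r in user_ratings.items())
    return num / den if den else 0.0
```

Here `sim` is any callable taking two item ids, so the same prediction functions can be used with either similarity measure.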

Context Awareness in Session-Based Recommender Systems
The recommendation list can be influenced by the context the user is in. The works [4,19,39] investigated the effect of context factors on the performance of the recommendation model. [4] examined position and context awareness in SBRS using a deep-learning-based method; the experiments showed a 3% improvement in recall and precision after applying context awareness. Moreover, [19] investigated the role of discounts and the effects of adopting users' short-term intentions and product popularity trends on RS performance.

Feedback Types
In the RS domain, there are two types of feedback: implicit and explicit.
1. Implicit feedback: Implicit feedback consists of behaviours observed without interrupting users' usage of the system; that is, the user is not aware that feedback is being collected. This data can be used in RS by interpreting the user's interest level [31]. Implicit feedback can be purchase history, reading time, number of clicks or session duration.
2. Explicit feedback: In this type of feedback, the user explicitly indicates an interest score for a service, such as a song listened to, a movie watched or any other object the user has interacted with [20].

Explicit and Implicit Feedback Correlation
The authors of [20] investigated the correlation between implicit and explicit feedback. To compare the two feedback types, they analysed a dataset from the music-listening platform last.fm. In the last.fm dataset, liking or unliking a song is the explicit preference indicator, and the number of times a track is played by a user is the implicit preference indicator. The authors derived numerical implicit ratings from the implicit preference indicators and created another dataset to store them. They built CF RS models on the two types of feedback. Their results show that the different types of feedback complement each other, and that the models trained on different feedback types perform similarly. In [40], a method is designed to compare the performance of the two types of feedback in the job domain. The authors presented a user-user similarity CF approach using only implicit feedback. Before applying their method, they analysed which factors could be strong indicators of user interest in a job. They aimed to find which resources better represent users' interest level and how to map users' implicit feedback to the explicit level. Similarly, in [31], users' behaviours in an electronic book domain were captured. The authors converted observed user behaviours into explicit ratings. Their results indicate that user behaviour modelling significantly improved the RS model's performance.

Limitations of the Previous Approaches
Previous works on converting user behaviours to numerical ratings mainly focused on User-based CF, investigating behaviours already observed in the past for registered users only. The limitation of this approach is that User-based CF models cannot produce recommendations when users' rating history is absent [19,24]. This matters because e-commerce websites have recently become popular, and on these websites shoppers can browse items without registering and can even purchase items as guests, so no user-product rating history exists. Accordingly, to overcome this drawback of User-based CF models for anonymous users on e-commerce platforms, session-based recommender (SBRS) models have been developed, in which only click behaviours are considered for next-item recommendations [15,19]. On the other hand, users leave valuable data about their intentions and preferences while browsing items in a session, such as the time spent on an item and the number of clicks on an item. One of the limitations of current SBRS models is that these valuable behavioural indications are ignored, and the models provide next-item recommendations solely based on the user's click behaviour in the session.
Moreover, as mentioned above, context factors such as the price and category of the products browsed in the session are already used for filtering purposes in SBRS. The limitation of this approach is that restricting recommendation models to filtering on context factors can lose valuable user preference indicators, since not only item and time-based features but also user behaviours are strong signals of users' interest level in the items browsed in the sessions. For instance, the minutes a user spent exploring an item, the number of clicks on an item and the basket actions (added to cart, browsed only) of the user in the session could be considered indicators of the user's interest level in the item. In this paper, we combine all the user activities in the session and create an implicit numerical rating that estimates the user's interest level in an item in the ongoing session.

User Interest Aware Framework
As mentioned before, current SBRS are mainly based on implicit item ratings, where the items viewed in a session are rated equally in terms of user interest. The algorithm proposed in this paper is motivated by the idea that the user-item interactions in a given session can indicate the level of the user's interest in the items viewed so far.
This section presents the user interest aware (UIA) SBRS framework. In this framework, we propose a novel method to predict the user interest (rating) for an item in the given session and use the predicted rating in the RS algorithms (Item-Item CF and FR).
In the proposed UIA framework (Fig. 1), we have created a method that predicts users' interest levels in items by taking into account their implicit feedback and recommends products based on their behaviours in the ongoing sessions. The framework consists of three main phases. The first phase is data collection, data pre-processing and feature selection. The second phase is interest level prediction, which can be seen as a way of converting implicit ratings to explicit ratings. The last phase is utilising the derived ratings in SBRS models.

Phase 1: Data Collection, Preparation and Analysis
This phase consists of data collection from the company, data preparation and dataset analysis. In the data preparation step, we apply label encoding to categorical IDs, and we remove items that are viewed only once in the whole dataset as well as sessions that consist of a single viewed item, since they do not provide enough information to build connections with other items and sessions. We use two datasets in this work. The first is the Fresh relevance dataset, which covers a two-week period from a real-world e-commerce website. The second is the Yoochoose RecSys dataset, which stores click events from an e-commerce website and covers a period of one month. Table 1 shows the number of unique items, sessions and total interactions in each dataset.
In this section, we analyse the details of our datasets. As seen from Table 1, the dataset density is very low. Thus, RS models are affected by cold-start sessions, since in most sessions only a few items are browsed.

Yoochoose RecSys Dataset
Figure 2 shows the analysis of user interactions over one month. It can be seen in Fig. 2 that users tend to visit the website at weekends or on days close to the weekend, such as Fridays and Mondays. Also, users prefer to visit the website after working hours, around 19:00, as seen in Fig. 3. We examine the item frequencies in sessions and find that most sessions include only two item interactions, which is too few for RS to create correlations between items and sessions because of the sparsity and cold-start challenges. Thus, we eliminate sessions that have fewer than 10 item interactions. After the pre-processing stage, we have 35,233 items, 180,512 sessions and 3,167,484 total interactions.

Fresh Relevance Dataset
We analyse the Fresh relevance dataset in terms of the relation between basket outcome and session duration, as seen in Fig. 4, where b denotes sessions that end with browsing only, ba sessions that end with items added to the basket, and t sessions that end with a purchase.
It can be seen from Fig. 4 that if the outcome of the session is a purchase, the total duration is longer than for the other outcomes. Also, as expected, if the user's intention is solely to browse products, the session has the shortest duration among the outcome types.
Moreover, we look at the daily user interaction frequency by session outcome to find whether there is any correlation between weekdays and user interactions (Fig. 5).
Interestingly, the analysis suggests that users are keen on buying or browsing products at weekends, especially on Sundays. In contrast, in the middle of the week they are less likely to visit the website to purchase or browse items. We also analyse the hours of the day with the most interactions, and Fig. 6 shows that users are more likely to visit the website after working hours.
We look at the session-item interaction frequency for each session. Our dataset analysis shows that more than 450,000 sessions contain only one viewed item, which means that RS algorithms will suffer from the cold-start and sparsity problems, since there is not enough data to establish similarity relations between sessions and items, as in the Yoochoose dataset. To alleviate the sparsity and cold-start drawbacks for RS models, we delete sessions that have fewer than 10 item interactions.
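The pre-processing steps described in this phase (label encoding of IDs, removal of rarely viewed items and removal of short sessions) can be sketched in Python as below. This is a minimal sketch under stated assumptions: events are given as `(session_id, item_id)` pairs, items are filtered before sessions, and the function names and the `min_item_views` parameter are hypothetical. The order of the two filters is not specified in the text.

```python
from collections import Counter


def label_encode(values):
    """Map arbitrary IDs to consecutive integers in order of first appearance."""
    mapping = {}
    encoded = [mapping.setdefault(v, len(mapping)) for v in values]
    return encoded, mapping


def filter_interactions(events, min_session_len=10, min_item_views=2):
    """Drop items viewed fewer than min_item_views times, then short sessions.

    events: list of (session_id, item_id) tuples, one per interaction.
    """
    item_views = Counter(item for _, item in events)
    kept = [(s, i) for s, i in events if item_views[i] >= min_item_views]
    session_len = Counter(s for s, _ in kept)
    return [(s, i) for s, i in kept if session_len[s] >= min_session_len]
```

Calling `filter_interactions(events, min_session_len=10)` corresponds to the threshold of 10 item interactions per session used for both datasets.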

Phase 2: Interest Level Prediction
In this phase, we analyse the users' behaviours and their contributions to calculate the final users' interest level on items.
A user can be directed to a website from different sources, for example from a Google search or an advertisement link shown on another website. The first item that the user looks at can be considered the most relevant item for the user's initial intention. After visiting a product, the user gets recommendations based on the item's content or on other users' tastes. The aim of RS is to attract the user to visit items in the recommendation list. If the recommended items are interesting to the user, they will click on a suggested product and look at its details. If the user is happy with the browsed item, they can add it to the cart. Otherwise, the user will keep searching until they find an item they like, or they will leave the system. Sometimes the user is uncertain about buying the products added to the cart. In that case, the user will not proceed to purchase the items in the cart, or may give up browsing products and leave the system.

Simple Implicit Feedback
Simple implicit feedback can be considered positive feedback if a user views an item. Since we do not have ratings given explicitly by users, the simple implicit feedback indicator $U_{ui}$ takes one of two values, depending on whether the user $u \in U$ interacted with the product $i \in I$ in a session $s \in S$ or not. For the proposed framework, we use Eq. 8 for the simple implicit feedback (one-class) rating representation.
$$U_{ui} = \begin{cases} 1 & \text{if user } u \text{ interacted with item } i \text{ in session } s \\ 0 & \text{otherwise} \end{cases} \qquad (8)$$
For any interaction, regardless of the user's basket outcome, purchasing behaviour or click behaviour, an interaction with a product is considered positive feedback; otherwise the value is 0, which means the user has not seen the item yet. Also, as mentioned in [40], implicit feedback is a relative indication of whether a user likes an item or not. However, an interaction with an item can be assumed to indicate a minimum interest level [33,46].

Behaviour Mapping
Mapping implicit feedback (behaviour) to a numerical implicit rating can help to better represent user interest in the items. However, mapping implicit feedback is not trivial, since each domain has different factors to be considered [32,40]. The motivation of the UIA framework is to see whether RS performance improves when user activities are analysed in terms of the different behaviours shown on e-commerce websites and users' interest level in items is derived as a numerical implicit rating.
Other researchers [30,40] proposed methods to convert implicit feedback to numerical implicit ratings by giving weights to users' behaviours in an e-book application and in the job domain, respectively. We follow a similar approach to construct the numerical implicit ratings (interest levels), and we define the actions a user can perform on an e-commerce system together with their weights (see Table 2). If any of these actions does not appear in the dataset, its contribution is 0.
To explain the conversion process, Table 2 is described in detail. ID identifies each behaviour and is used in the mathematical notation, and Name describes the behaviour that the user showed in the system. Weight shows the contribution of a given behaviour to the numerical implicit rating conversion process. For example, if a user did not like an item they browsed, they may click on another item within a minute.

Mathematical Model to Convert Implicit Feedback to Numerical Value of Implicit Feedback (Interest Level)
We define different mathematical equations to indicate the interest level of user $u \in U$ in item $i \in I$. The aim of these equations is to interpret users' actions as a numerical value of implicit feedback, which we call the explicit rating of user behaviour or the numerical implicit rating. Once users' explicit ratings are available, they can be utilised in different RS methods to analyse explicitly modelled user-item interactions. Our final rating score is between 0 and 4, where 0 means that the user has not yet interacted with the item and 4 means that the item attracted the user's attention at the highest level.
As mentioned in [20], each domain has a different implicit feedback modelling method; even for similar domains, the interpretation of implicit feedback changes between e-commerce applications. Thus, the weights and their contributions to the final explicit rating calculation may differ for each dataset.
F1: This indicates the click count contribution to the numerical implicit rating for each item in a session. To obtain a normalised value for this indicator, we formulate its contribution in Eq. 9, where $t_c$ is the total click count in a session s and $c_i$ is the click count for item i in the session s. The result is a value between 0 and 1 that represents the item's click contribution to the implicit rating, based on the total clicks and the item's clicks in the session.
$$\mathrm{F1} = \frac{c_i}{t_c} \qquad (9)$$
F2: This indicates whether item i is added to the basket in session s (Eq. 10). Adding an item to the basket shows an interest level for the item, but this depends on the user's habits. For example, if a user adds more than one item to the basket, the interest level for each item can differ from the case where only one item is added. Therefore, in Eq. 10, $t_a$ is the total number of items added to the basket and $t_i$ is the number of times item i is added to the basket in a session s. The contribution of adding to the basket is restricted to between 0 and 1 for each item. This equation is valid only if at least one item is added to the basket in the session ($t_a > 0$).
$$\mathrm{F2} = \frac{t_i}{t_a} \qquad (10)$$
Basket outcome has three categories: b means the item is only browsed, ba means the item is added to the basket but not purchased, and t means the item is purchased. We assume that purchasing an item is a strong interest indicator, whose contribution we calculate in F4. Adding to the basket is a high-interest indicator, although weaker than purchasing. Browsing is the minimum interest indicator, but its contribution is already captured by the click count indicator F1; thus we do not give any additional interest level contribution for browsing items.
F3: This represents the duration factor. We assume that if a user spends more time on an item, the user is more interested in it than in an item on which less time is spent. Eq. 11 is used to calculate the user's interest level in an item using the duration factor. In this equation, $t_d$ is the total session duration and $i_d$ is the duration spent on item i in the session s. The resulting F3 value is in the range between 0 and 1.
$$\mathrm{F3} = \frac{i_d}{t_d} \qquad (11)$$
F4: This shows whether item i is purchased in a session s or not. It is an important indicator of the user's interest level in an item in the session. It is calculated using Eq. 12, where $t_p$ is the total number of purchased items in the session s and $i_p$ is the number of units of item i purchased in the session s. This interest level has a score between 0 and 1 for each item in a session s. This implicit factor is valid when at least one item is purchased in the session s ($t_p > 0$).
$$\mathrm{F4} = \frac{i_p}{t_p} \qquad (12)$$
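The four factors can be computed per session with a few lines of Python. The sketch below assumes that each factor is the item's share of the corresponding session total (clicks, basket additions, duration, purchases) and that a factor contributes 0 when its session total is 0. The input layout and the names `interest_factors` and `ratio` are hypothetical and chosen only for illustration.

```python
def ratio(part, total):
    """Share of a session total; 0.0 when the action never occurred in the session."""
    return part / total if total > 0 else 0.0


def interest_factors(session):
    """Compute F1-F4 for every item in one session.

    session: dict mapping item id -> dict with the per-item counts
    'clicks', 'basket_adds', 'duration' and 'purchases'.
    """
    t_c = sum(v["clicks"] for v in session.values())
    t_a = sum(v["basket_adds"] for v in session.values())
    t_d = sum(v["duration"] for v in session.values())
    t_p = sum(v["purchases"] for v in session.values())
    return {
        item: {
            "F1": ratio(v["clicks"], t_c),       # click share
            "F2": ratio(v["basket_adds"], t_a),  # basket share
            "F3": ratio(v["duration"], t_d),     # duration share
            "F4": ratio(v["purchases"], t_p),    # purchase share
        }
        for item, v in session.items()
    }
```

Each factor sums to 1 over the items of a session whenever the corresponding action occurred at least once, so every value lies between 0 and 1.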

Final Numerical Implicit Rating Calculation
We use weights in the final score calculation; these weights reflect the importance of each factor. The weights sum to 1 (Eq. 13).
$$\sum_{n=1}^{4} w_n = 1 \qquad (13)$$
Once we have numerical equivalents of the implicit feedback from the factors above, we create the final numerical rating score by aggregating the numerical equivalents according to their weights (Eq. 14). The best weight combination in this equation is learnt by cross-validation: in each cross-validation run the performance of the RS models is evaluated, and the weights that give the best performance are selected.
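As a rough illustration of the aggregation and of the weight search, the following Python sketch combines four factor values with weights that sum to 1 and enumerates candidate weight combinations on a grid. Several points are assumptions rather than details given here: the aggregation is taken to be a plain weighted sum, the `scale` parameter (for example 4.0 to reach a 0 to 4 range) is hypothetical because the mapping to the final scale is not spelled out in this section, and the ordering constraint in `candidate_weights` is one possible reading of "browsing is the weakest and purchasing the strongest indicator".

```python
from itertools import product


def final_rating(factors, weights, scale=1.0):
    """Weighted sum of the factor values (F1..F4); weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return scale * sum(w * f for w, f in zip(weights, factors))


def candidate_weights(steps=10):
    """Yield (w1, w2, w3, w4) on a grid of step 1/steps that sum to 1.

    Assumed ordering: w1 (browsing) is the smallest and w4 (purchase) the largest.
    """
    for k1, k2, k3 in product(range(steps + 1), repeat=3):
        k4 = steps - k1 - k2 - k3
        if k4 < 0:
            continue
        if k1 <= min(k2, k3) and k4 >= max(k2, k3):
            yield tuple(k / steps for k in (k1, k2, k3, k4))
```

In a cross-validation loop, each candidate from `candidate_weights()` would be used to derive the ratings, the RS models would be trained and scored, and the best-scoring candidate kept.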

Phase 3: Evaluation
In this phase, we evaluate the proposed framework with different metrics. First, we split the datasets into training and test sets. For testing, we allocate one month and one day for the Yoochoose RecSys dataset and the Fresh relevance dataset respectively. Secondly, RS models are trained on two different types of dataset: the dataset consisting of simple implicit feedback, and the new dataset with explicit feedback. In the model evaluation step, the interacted items in each session are given to the trained RS models, which return recommendations. Ground-truth items are hidden, and the recommendations from the model are compared with the ground-truth items to evaluate performance. Note that we use the derived numerical implicit rating only for previous items and not for the item currently being viewed, since in practice the rating for an item can be computed only after the user moves on to the next item.
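The hiding of ground-truth items can be sketched as a simple split of each test session into a visible prefix and a hidden suffix. This is a minimal sketch; the function name `split_session` and the choice to hide the last `n_hidden` items are assumptions made for illustration.

```python
def split_session(items, n_hidden=1):
    """Split an ordered session into (visible prefix, hidden ground truth)."""
    if not 0 < n_hidden < len(items):
        raise ValueError("n_hidden must be between 1 and len(items) - 1")
    return list(items[:-n_hidden]), list(items[-n_hidden:])
```

The visible prefix is what the trained model receives, and the hidden suffix is what its recommendations are compared against.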

Experimental Setup and Results
In this section, we explain the experimental setup details, the evaluation metrics and the evaluation methods; lastly, we discuss the results of the experiments.

Experimental Setup
In this work, we use two different RS models: the FR model and the Item-Item similarity CF model. To run the experiments, we use the GraphLab machine learning tool.

FR Model
If no side data are provided to the FR model, it acts as a standard MF model. We used two different variants of FR: the first is for one-class implicit feedback, and the second is for the derived numerical implicit rating.

Item-Item Similarity CF Model
We use the Item-Item similarity CF model to compare the result of one class implicit rating data and derived numerical implicit rating. For evaluation, Jaccard and Cosine similarity metrics are used in the Item-Item similarity CF model. In the Jaccard similarity metric, only interacted items are important regardless of ratings on items. On the other hand, Cosine similarity takes into account the user ratings on items.

Evaluation Metrics
In the literature, accuracy, precision, recall and coverage are among the metrics used in RS [14]. recall@n (Eq. 15) and precision@n (Eq. 16) have been widely used for top-n ranked-list RS [13,14,42]. Since RS can only recommend a few items at a time, users expect to see relevant items on the first page. Thus, we prefer recall@n as the evaluation metric for measuring the performance of our method on the top n recommendations. recall@n measures the share of ground-truth items that appear among the recommendations, whereas precision@n measures the share of recommended items that appear in the ground truth. We also employ the user coverage metric: coverage@n (Eq. 17) is the ratio of the number of users who received at least one correct recommendation, $U_{mi}$, to the number of all users U in the test data [28].
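The three metrics can be computed as in the following Python sketch. It follows the verbal definitions above: recall@n divides the number of hits by the size of the ground truth, precision@n divides it by the number of recommended items, and coverage@n is the share of users with at least one hit. The function names and the dictionary-based inputs are hypothetical.

```python
def recall_at_n(recommended, ground_truth, n):
    """Share of ground-truth items found among the top-n recommendations."""
    truth = set(ground_truth)
    if not truth:
        return 0.0
    return len(set(recommended[:n]) & truth) / len(truth)


def precision_at_n(recommended, ground_truth, n):
    """Share of the top-n recommendations that are ground-truth items."""
    top = recommended[:n]
    if not top:
        return 0.0
    return len(set(top) & set(ground_truth)) / len(top)


def coverage_at_n(recs_by_user, truth_by_user, n):
    """Share of test users who received at least one correct top-n recommendation."""
    users = list(truth_by_user)
    if not users:
        return 0.0
    hit_users = sum(
        1 for u in users
        if set(recs_by_user.get(u, [])[:n]) & set(truth_by_user[u])
    )
    return hit_users / len(users)
```

In a session-based setting the "users" of `coverage_at_n` would be the test sessions.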

Dataset Splitting
For dataset splitting, we apply 10-fold cross-validation to obtain reliable performance results. In each validation loop, we split the sessions into training and test sets. For the test set, we select 10% of all sessions in each fold. We do not add any session-item interaction from the test sessions to the training set; in other words, our models are blind to the test sessions. The reported results are the average values over the 10 folds.
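A session-level 10-fold split of this kind can be sketched in Python as follows. The sketch assumes that sessions are shuffled once with a fixed seed and that each fold holds out a disjoint tenth of the sessions; the function name `session_folds` and the seed are hypothetical.

```python
import random


def session_folds(session_ids, k=10, seed=0):
    """Yield (train_sessions, test_sessions) pairs for k-fold cross-validation.

    Whole sessions are assigned to folds, so no interaction of a test session
    can leak into the training set.
    """
    ids = sorted(set(session_ids))
    random.Random(seed).shuffle(ids)
    folds = [ids[f::k] for f in range(k)]
    for f in range(k):
        test = set(folds[f])
        train = [s for s in ids if s not in test]
        yield train, sorted(test)
```

Metrics would be computed on each fold's test sessions and then averaged over the k folds.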

Calculating the Weights
The values of the behaviour weights w_n, n ∈ {1, 2, 3, 4}, are determined experimentally. The main assumption is that browsing (w_1) an item is the weakest indicator of a user's interest. If an item is added to the cart (w_2), it is presumed that the user intends to buy the product and thus has a higher interest level than when just viewing it. The duration (w_3) that the user spent on the item also indicates interest if it exceeds a certain level, as explained in Sect. 3. Lastly, if an item is purchased (w_4) in the session, the user is taken to have explicitly indicated that he/she liked it and is interested in it. The purchase action therefore carries the highest interest level among the action types. After experimenting with different weight values under the above assumptions, which were inspired by [18], the best ones were identified, as shown in Table 3. The search space for the weights is restricted to between 0 and 1.
The recall@n and precision@n metrics (Eqs. 15 and 16) are defined as
$$\text{recall@}n = \frac{|\text{Recommended Items} \cap \text{Ground Truth}|}{|\text{Ground Truth}|} \tag{15}$$
$$\text{precision@}n = \frac{|\text{Recommended Items} \cap \text{Ground Truth}|}{|\text{Recommended Items}|} \tag{16}$$
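To make the role of the weights concrete, the following sketch shows one possible way of turning the four behaviours into a numerical rating. It is an illustration under stated assumptions only: the additive combination, the weight values, the dwell-time threshold and all names (`WEIGHTS`, `Interaction`, `derive_rating`) are hypothetical, and the actual mapping and tuned weights are those given in Sect. 3 and Table 3.

```python
from dataclasses import dataclass

# Hypothetical weights in [0, 1]; NOT the tuned values reported in Table 3.
WEIGHTS = {"view": 0.25, "cart": 0.5, "duration": 0.5, "purchase": 1.0}


@dataclass
class Interaction:
    """Behaviours observed for one item within one session."""
    viewed: bool = True
    added_to_cart: bool = False
    dwell_seconds: float = 0.0
    purchased: bool = False


def derive_rating(event, weights=WEIGHTS, dwell_threshold=30.0):
    """Sum the weights of the behaviours observed for an item.

    The additive form and the 30-second threshold are assumptions
    made for illustration only.
    """
    score = 0.0
    if event.viewed:
        score += weights["view"]
    if event.added_to_cart:
        score += weights["cart"]
    if event.dwell_seconds >= dwell_threshold:
        score += weights["duration"]
    if event.purchased:
        score += weights["purchase"]
    return score


view_only = derive_rating(Interaction())
carted = derive_rating(Interaction(added_to_cart=True))
bought = derive_rating(
    Interaction(added_to_cart=True, dwell_seconds=60.0, purchased=True)
)
```

Under these assumptions an item that was only viewed receives the lowest rating, and each additional behaviour raises it, which mirrors the ordering of interest levels described above.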

Experiments
We choose two recommendation models: Item-Item similarity CF [22] and FR [36]. We analyse Item-Item similarity CF with two similarity measures, Cosine and Jaccard. Cosine similarity is applied to the numerical implicit rating data, while Jaccard similarity is applied to the one-class implicit ratings. In Item-Item similarity CF, the 64 most similar items are selected as neighbours for each item, since our experiments showed that using more than 64 nearest neighbours does not make a difference in performance. For the FR model, we select stochastic gradient descent (SGD) [44] as the optimisation method. In the Graphlab tool, whether the dataset is treated as implicit or explicit feedback is controlled by defining the target attribute. If the model is trained with a target attribute, the data are treated as explicit feedback and the model is trained with the standard SGD optimisation method. Otherwise, when ratings are not available, the SGD optimiser minimises a logistic loss function that pushes the predictions for observed items towards 1 and those for unobserved samples towards 0, and the ranking is based on these predictions. Since the dimension of the latent factors is an important parameter for representing item and user latent factors in the FR model, we set it to 100: training the FR model is computationally expensive, and our experiments show that dimensions above 100 do not yield sufficient improvement in performance.
For each interacted item (item sequence), we retrieve the top@n n [5] recommendations and evaluate them with the recall@n, precision@n and coverage@n metrics. The length of the item sequence changes with the number of hidden items to predict. Our aims in this experiment are two-fold. Firstly, we investigate the performance of RS on the numerical implicit rating data derived from user behaviours and on the one-class implicit rating data. Secondly, we analyse the effect on RS performance of the sequence length, i.e. the number of interacted items used as input.
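The logistic-loss behaviour described for the implicit case can be sketched with a plain matrix factorisation update. This is a simplified illustration, not the Graphlab implementation: side features and bias terms are omitted, and the function names, learning rate and regularisation value are assumptions.

```python
import math


def predict(p, q):
    """Predicted interaction probability from session factors p and item factors q."""
    dot = sum(a * b for a, b in zip(p, q))
    return 1.0 / (1.0 + math.exp(-dot))


def sgd_logistic_step(p, q, label, lr=0.05, reg=0.01):
    """One in-place SGD update for a (session, item) pair under logistic loss.

    label is 1 for an observed interaction and 0 for a sampled unobserved one.
    Returns the loss measured before the update.
    """
    pred = predict(p, q)
    err = pred - label  # derivative of the logistic loss w.r.t. the dot product
    for f in range(len(p)):
        pf, qf = p[f], q[f]
        p[f] -= lr * (err * qf + reg * pf)
        q[f] -= lr * (err * pf + reg * qf)
    eps = 1e-12
    return -(label * math.log(pred + eps) + (1 - label) * math.log(1 - pred + eps))


# Observed pair: repeated updates push the prediction towards 1.
p_obs, q_obs = [0.1, 0.1], [0.1, 0.1]
obs_before = predict(p_obs, q_obs)
for _ in range(200):
    sgd_logistic_step(p_obs, q_obs, label=1)
obs_after = predict(p_obs, q_obs)

# Unobserved pair: repeated updates push the prediction towards 0.
p_neg, q_neg = [0.5, -0.2], [0.3, 0.4]
neg_before = predict(p_neg, q_neg)
for _ in range(200):
    sgd_logistic_step(p_neg, q_neg, label=0)
neg_after = predict(p_neg, q_neg)
```

With two latent factors the example stays readable; the experiments described above use 100.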
An overview of our experimental method for sequence-aware recommendation is illustrated in Fig. 7. As shown in Fig. 7, at the beginning of a session the number of items in the ground truth is 30, and the items in the ground truth are predicted by the model based on the interacted item(s). For a given interacted item or items in a session, the recommendation outcome is ranked by the similarity scores of the given item or items. Over time, the number of interacted items increases until the ground-truth length reduces to 1.
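The protocol can be sketched as follows: the last G items of a session are hidden as ground truth, and candidates are ranked from the remaining interacted items. The helper names and the toy similarity table are hypothetical, and summing the similarity scores over the interacted items is an assumption made for this sketch; the text above only states that the ranking is based on the similarity scores.

```python
def split_session(items, g):
    """Hide the last g items of a session as ground truth."""
    if not 0 < g < len(items):
        raise ValueError("g must leave at least one interacted item")
    return items[:-g], items[-g:]


def rank_candidates(interacted, similarity, n):
    """Rank candidate items by their summed similarity to the interacted items.

    similarity maps item -> {neighbour: score}; summing is an assumption.
    Ties are broken by item identifier to keep the ranking deterministic.
    """
    scores = {}
    for item in interacted:
        for neighbour, score in similarity.get(item, {}).items():
            scores[neighbour] = scores.get(neighbour, 0.0) + score
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


# Hypothetical session and precomputed item-item similarities.
session = ["a", "b", "c", "d", "e"]
similarity = {"a": {"d": 0.5, "e": 0.25}, "b": {"e": 0.5}}

interacted, ground_truth = split_session(session, g=2)
top_items = rank_candidates(["a", "b"], similarity, n=2)
```

Repeating the split for decreasing values of g reproduces the setting in which the interacted part grows while the ground truth shrinks to a single item.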

Result and Analysis
We report the results of the experiments in Tables 4, 5, 6 and 7. In the tables, G denotes the number of items in the ground truth. As seen in Fig. 7, the aim is to recommend these items accurately. For example, if G is 1, all previously interacted items in the session except the last one are used to obtain recommendations, and the target is to correctly predict this hidden item. Also, impl. and exp. indicate the evaluation results for the implicit (baseline) and the estimated numerical implicit ratings, respectively. As users are more likely to be interested in the top items of the recommendation list, we chose the recall@n and precision@n evaluation metrics. For model effectiveness, the experiment results are also analysed in terms of the coverage@n metric. We compare the proposed framework on the Item-Item similarity CF and FR models on each dataset. Tables 4 and 5 show the performance results for the Item-Item CF and FR RS models on the Yoochoose Rec-Sys dataset, while Tables 6 and 7 show the evaluation results for the Item-Item CF and FR RS models on the Fresh Relevance dataset. Tables 5 and 7 show that the FR model performed better on all evaluation metrics when trained on the derived numerical ratings. The results in Tables 5 and 7 also show that the more interacted items the FR model knows from the sessions, the better it performs in terms of recall. Interestingly, when the length of interacted items is decreased (length of ground truth increased), the performance difference between the FR model trained on two ... Moreover, the coverage rate of the Item-Item similarity CF model is better than that of the FR model on both datasets. Also, the models trained on the derived implicit ratings showed robust coverage rates compared with those trained on the one-class rating dataset. This result supports the view that taking users' actions in the sessions into consideration may help create better session-item correlations.
Overall, the results confirm that when the model knows more interactions from an ongoing session, it performs better in terms of the recall metric. Also, when we train the models with the derived ratings, they perform better on all metrics because users' preferences for the items in the sessions are taken into account. Lastly, the overall results show that the Item-Item similarity CF model fits the SBRS domain better than the FR model.

Discussion
The literature examines next-item recommendation in a number of ways. Item KNN [25], for example, estimates the transition probability after an item interaction by calculating item similarity from the sessions; however, simply examining the items in the sessions might ignore some aspects of the user's intention in the session. Later, the Markov chain approach [38] was introduced, which also depends on the transition probability between items and on the item sequences in the session. Recently, neural network [6], recurrent neural network [15], graph neural network [12] and deep neural network-based [48] algorithms have been applied to session-based recommendation. These models, however, seek to discover item transition relations only through item sequences. Existing approaches have achieved good results in capturing item relations in sessions, but they still have limitations: they treat all clicked items in a session as reflecting the same level of user interest, whereas each item may capture the user's interest to a different degree [18]. For example, a user may spend more time on one item when browsing than on others, or a user may visit the same item many times during the session. As a result, in addition to utilising the item sequence to calculate the transition probability between items, users' implicit feedback may also be included. To the best of our knowledge, this is the first effort that takes implicit feedback into account during session-based recommendation in the e-commerce domain by converting users' implicit feedback into numerical implicit ratings. Our findings add to the body of knowledge by demonstrating that RS performance significantly improves when users' implicit feedback is used to generate recommendations.

Conclusion and Future Works
In this study, we proposed a user-behaviour-aware framework called UIA to integrate user behaviour awareness into SBRS models. In this framework, we derived numerical implicit ratings from users' behaviours in the sessions and utilised them as the context factor in two different RS models, namely the FR and Item-Item CF models. We then compared the performance of the models trained on the derived numerical implicit rating dataset with that of the models trained on the one-class implicit rating dataset. We also analysed the effect of sequential awareness on the models' performance. We evaluated the UIA framework on two real-world datasets with three evaluation metrics to see how the proposed framework performs. We believe that our study has several important results: 1. Integrating users' behaviours alongside other context factors in the sessions helps to improve SBRS quality. 2. SBRS models perform better when the sessions have more item interactions. 3. Using derived numerical implicit ratings enhanced the FR model more than the Item-Item similarity CF model. However, the evaluation results for all metrics showed that Item-Item similarity CF models perform better on SBRS. These results support the common preference for Item-Item similarity CF models in SBRS [27,35].
The proposed UIA framework has some limitations. First, the estimated weights are affected by the dataset used in the experiments. As a result, it may be preferable to calculate these weights independently for each dataset. Second, the features considered for the weight factors must be present in the datasets. In future work, different approaches can be applied to derive implicit numerical feedback from user behaviours. Also, different user behaviours such as viewing only, adding to cart and time spent on items can be integrated into RNN-based recommendation in addition to item feature embeddings and user feature embeddings. In this work, we split the sessions at different points and used all items in the first part as interactions, as seen in the experiment section. However, instead of using all items as interacted items, one could design a different method to analyse the effect of the inputs one by one, or of different input combinations, as the interacted items from the first part of the split session.
Funding No funding was received.
Data availability Interested parties can obtain the anonymised datasets that support the findings of this study from the corresponding author upon reasonable request.

Conflict of interest
The authors declare that they have no conflict of interest.
Ethical standards This article does not contain any studies with human participants or animals performed by any of the authors.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.