Object-aware Policy Network in Deep Recommender Systems

Deep learning has been successfully applied to recommender systems. Low-dimensional dense embeddings are typically used to represent the features of users and items. To optimize the model, some approaches dynamically search the embedding size based on the popularity of different users and items. However, these approaches ignore the interaction between the user and the item, which hinders the optimization of the features in the embeddings. In this paper, we propose the Object-aware Policy Network (OPN), which introduces an object-aware method for optimizing the features in embeddings. We evaluate our model on two real-world benchmark datasets. With less than 10% additional training time in almost all experiments, the results show that our proposed model improves the accuracy of the binary classification task by a margin of 0.30 and of the multiclass classification task by a margin of 0.35 compared with the best accuracies achieved by the baselines on the different datasets.


Introduction
Deep learning techniques have been applied to many recommender systems [1][2][3] because of their strong performance in nonlinear transformation and representation learning. Deep recommender systems commonly consist of a representation layer and an inference layer. The former learns to map discrete user and item identifiers to real-valued embedding representations, and the latter takes these embeddings as input and processes them with an MLP (Multi-Layer Perceptron) or an FM (Factorization Machine) [4] to produce the prediction. Many works improve model performance by capturing high-order feature interactions [5], applying the attention mechanism [6], or combining factorization machines with MLPs [5], but several studies [1,7,8] have shown that the representation layer is also an important factor in the model's performance.
Recently, research on the representation layer has mainly focused on searching the embedding size for different users and items. ESPAN (Embedding Size Adjustment Policy Network) [9], NIS (Neural Input Search) [10], and AutoEmb (AutoML Based End-to-end Framework) [11] hold that the embedding size should change dynamically according to the popularity of a user-item pair. Applying high-dimensional embeddings to users/items with low frequency leads to overfitting due to over-parameterization [12], while applying low-dimensional embeddings to users/items with high frequency prevents the model from being trained effectively [11]. However, the models mentioned above improve only the embedding size, without optimizing the features in the embeddings, which limits the accuracy of the model. To address this problem, we present an Object-aware Policy Network (OPN) that builds on the prior work of ESPAN [9]. 'Object' refers to the interaction target of a user or an item (the item a user interacts with, and vice versa). 'Policy Network' is the model proposed in ESPAN, where the 'policy' defines how to adjust the embedding sizes of users and items. The proposed model introduces an object-aware method that optimizes the features by learning user preferences when a user interacts with different items. The design of OPN is based on the following considerations.
First, features play a central role in the success of many predictive systems. In many real-world tasks, the lack of high-quality features often limits the effectiveness of a recommender system; hence, many recommender systems involve feature interaction operations. Second, to improve FM [4], FFMs (Field-aware Factorization Machines) [13] maintain several embeddings for a single feature, one for each interacting field, which shows that dynamic, context-dependent features in embeddings can yield better performance than a single static embedding. Third, [14] argues that users show different preferences when interacting with different items, which suggests that we can optimize the features in embeddings using the interactions between the user and the item.
OPN optimizes the features in embeddings by capturing the interactions between the user and the item with the object-aware method. We have conducted extensive experiments on two public recommendation datasets, and the results indicate that our proposed model outperforms several baselines by a margin of 0.30 while introducing little extra time overhead. Our main contributions are summarized as follows: 1. We propose an object-aware method that optimizes the features in embeddings by capturing the interactions between the user and the item. 2. To find an appropriate concatenated embedding size, we carry out extensive experiments and identify the most suitable size for the model. 3. We evaluate OPN on benchmark datasets, where it shows consistent improvement over several competitive baselines. The remainder of this paper is organized as follows. We review related work in Sect. 2 and introduce the preliminary knowledge needed to understand our method in Sect. 3. We detail the proposed method in Sect. 4, describe and analyze the experiments in Sect. 5, and conclude the paper in Sect. 6.

Related Work
Deep learning has increasingly been applied to recommender systems. Many models capture the latent feature interactions between users and items using the non-linear transformation ability of deep learning. [1] proposed Wide&Deep, which jointly trains wide linear models and deep neural networks to exploit the correlations available in historical data and to explore new feature combinations. To handle the parameter explosion that models like FM [4] suffer from when modeling high-order feature interactions, [5] proposed DCN (Deep & Cross Network), which uses a novel cross network to efficiently learn bounded-degree feature interactions. [15] proposed NFM (Neural Factorization Machine), which enhances the expressiveness of FM by modeling high-order and non-linear feature interactions with a novel Bilinear Interaction pooling operation. This operation encodes more informative feature interactions and helps the subsequent deep layers learn meaningful information. Attention is another important technique in recommender systems. [16] proposed AutoInt, which automatically learns high-order interactions of input features with a multi-head self-attentive neural network that can model feature combinations of different orders. [17] proposed learning the importance of each feature interaction from data through an attention network. Deep recommender systems consist of two key components, a representation layer and an inference layer, and all the works discussed above focus on the inference layer. From earlier studies [7, 9-11, 13, 18, 19], we conclude that improvements to the representation layer fall mainly into two categories: searching the embedding size and optimizing the features in embeddings.
An appropriate embedding size can help avoid overfitting and effectively reduce the number of parameters. [20] proposed NAS (Neural Architecture Search), which uses an RNN (Recurrent Neural Network) controller trained with reinforcement learning (RL) to generate candidate architectures, each of which is trained to convergence. Inspired by NAS, [10] proposed NIS (Neural Input Search), which searches for appropriate embedding sizes over a collection of Embedding Blocks with an RL algorithm similar to ENAS (Efficient Neural Architecture Search) [21]. [11] proposed AutoEmb, an AutoML-based end-to-end framework that can automatically and dynamically leverage embeddings of various dimensions. [9] proposed ESPAN, which overcomes the drawbacks of soft selection by using hard selection.
Optimizing the features in the embeddings has been shown to be an effective way to improve model performance [18,22,23]. [13] proposed FFMs, which store several embeddings for a single feature for interactions with different fields, but the number of parameters in FFMs is on the order of the product of the number of features and the number of fields. To solve this problem, the FwFMs (Field-weighted Factorization Machines) model proposed by [7] learns a field-pair weight matrix that effectively captures the heterogeneity of field-pair interactions, greatly reducing the number of parameters. The work reported in [1] is a hybrid network structure that combines a wide model and a deep model, using feature engineering in the wide component to enhance the learning of the deep component. However, feature engineering can be expensive and requires domain knowledge. [18] proposed FGCNN (Feature Generation by Convolutional Neural Network), which generates sophisticated feature interactions automatically to avoid human intervention and obtain more useful feature interactions. [19] and [24] proposed optimizing the features in the embeddings with an attention mechanism, which learns context-aware latent features from text information. To capture users' diverse interests from historical behaviors, [25] proposed adaptively learning the representation of user interests with a local activation unit, which computes the distribution of the user's interests with attention. Since a user's interests change over time, [26] proposed capturing the interest evolving process with an interest evolving layer, which embeds an attention mechanism into the sequential structure in a novel way. [27] uses a hierarchical structure that incorporates character-level and word-level information and applies an attention mechanism at both levels to distinguish more important information from less important information.
Previous works have ignored the possibility that the features in the embeddings can be affected by the interacted object, which is important to the model [28][29][30]. In this paper, building on ESPAN [9], we propose an Object-aware Policy Network (OPN), which dynamically optimizes the features in the embeddings.

Embedding
One-hot encoding is a coding method commonly used in NLP (Natural Language Processing) and recommender systems. Although it converts categorical data into a form that machine learning models can easily use, it has some drawbacks in practice. In natural language processing and understanding, words are encoded as one-hot vectors, such as "programmer = [0, 0, 1, 0, 0, 0]" and "salesman = [0, 0, 0, 0, 1, 0]". However, this treats each word as a unique symbol and cannot reflect its characteristics: any two distinct one-hot vectors are orthogonal, so the encoding carries no notion of similarity between words.
To solve the above problems, an embedding layer is commonly applied to the raw feature input to compress it into a low-dimensional, dense real-valued vector:

$$e_i = W_{embed,i}\, x_i$$

where $e_i$ is the embedding vector, $x_i$ is the one-hot vector for an item, $W_{embed,i} \in \mathbb{R}^{n_e \times n_v}$ is the corresponding embedding matrix, and $n_e$ and $n_v$ are the embedding size and the number of items, respectively.
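The embedding layer described above is equivalent to a table lookup: multiplying the embedding matrix by a one-hot vector selects exactly one of its columns. The minimal plain-Python sketch below illustrates both views, as well as the drawback of one-hot vectors noted earlier, namely that two distinct one-hot vectors always have a dot product of zero. The sizes (n_v = 6, n_e = 3), the random weights, and the helper names `one_hot`, `matvec`, and `dot` are illustrative assumptions, not part of the paper.

```python
import random


def one_hot(index: int, size: int) -> list[float]:
    """Return a one-hot vector of length `size` with a 1 at `index`."""
    return [1.0 if j == index else 0.0 for j in range(size)]


def matvec(W: list[list[float]], x: list[float]) -> list[float]:
    """Dense matrix-vector product W x (W has one row per output dimension)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


n_v, n_e = 6, 3  # illustrative vocabulary size and embedding size
rng = random.Random(0)
# W_embed is n_e x n_v: column j holds the embedding of word/item j.
W_embed = [[rng.uniform(-1.0, 1.0) for _ in range(n_v)] for _ in range(n_e)]

programmer = one_hot(2, n_v)  # [0, 0, 1, 0, 0, 0]
salesman = one_hot(4, n_v)    # [0, 0, 0, 0, 1, 0]

e_programmer = matvec(W_embed, programmer)  # e_i = W_embed x_i
e_lookup = [row[2] for row in W_embed]      # same result: read column 2 directly
print(dot(programmer, salesman))  # 0.0: one-hot vectors carry no similarity
```

In practice, embedding layers store the matrix and index into it directly rather than performing the multiplication, since the result is identical.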
As shown in Fig. 1, the embedding layer maps words to a joint latent factor space of dimensionality f. "programmer" and "salesman" are associated with embedding vectors $p, q \in \mathbb{R}^f$, respectively, and each element of an embedding measures the extent to which the word possesses the corresponding factor.

Field-aware
Since our method is inspired by FFMs [13], this section introduces them to make the proposed method easier to understand. We first introduce FM [4] and then FFMs.
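FM is the first model that computes the interactions between features of different fields in linear time with a linear number of parameters. It models the second-order interactions between features as follows:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j$$

where n denotes the number of fields, $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^n$, and $V \in \mathbb{R}^{n \times k}$ with i-th row $v_i$. $\langle v_i, v_j \rangle$ is the dot product of two vectors, used to capture the second-order interaction between features $x_i$ and $x_j$.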
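The pairwise term of FM can be evaluated in linear time using the standard identity $\sum_{i<j}\langle v_i, v_j\rangle x_i x_j = \frac{1}{2}\sum_{f=1}^{k}\big[(\sum_i v_{i,f} x_i)^2 - \sum_i v_{i,f}^2 x_i^2\big]$, which is what makes the cost of FM linear in n. The minimal plain-Python sketch below (function names are our own, for illustration) implements the FM equation both by direct pair enumeration and with this identity, so the two can be compared.

```python
def fm_predict_naive(x, w0, w, V):
    """Second-order FM by direct O(k n^2) enumeration of feature pairs."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            y += sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
    return y


def fm_predict(x, w0, w, V):
    """Same model in O(k n) time via the sum-of-squares identity."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))            # sum_i v_if x_i
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))  # sum_i v_if^2 x_i^2
        y += 0.5 * (s * s - s_sq)
    return y


x = [1.0, 0.0, 1.0]
w0, w = 0.5, [0.1, 0.2, 0.3]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # row i is the latent vector v_i
print(fm_predict_naive(x, w0, w, V), fm_predict(x, w0, w, V))
```

Both functions compute the same quantity; only the second scales linearly with the number of features.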
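However, FM models feature interactions with static embeddings, whereas features from one field often interact differently with features from different other fields. FFMs therefore model this difference explicitly by storing multiple embedding vectors per feature:

$$\phi_{FFM}(w, x) = \sum_{j_1=1}^{n} \sum_{j_2=j_1+1}^{n} \langle w_{j_1, f_2}, w_{j_2, f_1} \rangle\, x_{j_1} x_{j_2}$$

where $f_1$ and $f_2$ are respectively the fields of $j_1$ and $j_2$, $w_{j,f}$ is the latent vector that feature j uses when interacting with field f (so w holds n × n such vectors), and n is the number of fields. As depicted in Fig. 2, FFMs store n − 1 embedding vectors for each feature. When modeling an interaction, FFMs use only the corresponding vector $w_{i,f_j}$ of feature i to interact with another feature j.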
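A direct plain-Python sketch of the FFM pairwise term follows. Here `W[j][f]` is the latent vector feature j uses when interacting with a feature of field f. For simplicity, the sketch keeps one vector for every field, including the feature's own field; with one feature per field that vector is never used, leaving the n − 1 vectors per feature mentioned above. The data layout and names are illustrative assumptions.

```python
def ffm_pairwise(x, field_of, W):
    """FFM second-order term: sum over j1 < j2 of
    <W[j1][f2], W[j2][f1]> * x[j1] * x[j2], with f1 = field_of[j1], f2 = field_of[j2]."""
    n = len(x)
    total = 0.0
    for j1 in range(n):
        for j2 in range(j1 + 1, n):
            f1, f2 = field_of[j1], field_of[j2]
            # Each feature picks the vector dedicated to the *other* feature's field.
            total += sum(a * b for a, b in zip(W[j1][f2], W[j2][f1])) * x[j1] * x[j2]
    return total


# Three features, one per field (fields 0, 1, 2), with 2-dimensional latent vectors.
field_of = [0, 1, 2]
W = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],  # feature 0: vectors for fields 0, 1, 2
    [[2.0, 0.0], [0.0, 0.0], [3.0, 0.0]],  # feature 1
    [[0.0, 4.0], [5.0, 0.0], [0.0, 0.0]],  # feature 2
]
print(ffm_pairwise([1.0, 1.0, 1.0], field_of, W))  # 2 + 4 + 15 = 21.0
```

In contrast with FM, feature 0 here uses a different vector when it meets feature 1 than when it meets feature 2, which is the source of both the extra expressiveness and the extra parameters of FFMs.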

Our Approach
In this section, we present the object-aware method, which tackles the challenges mentioned in Sect. 1 by capturing the interactions between users and items with MLP layers. We first give an overview of the framework and then detail the feature optimizing model and the recommendation model.

Overview
We aim to optimize the features in embeddings by capturing interactions between users and items. To this end, we propose the object-aware method to tackle this challenge based on ESPAN [9]. As depicted in Fig. 3, our framework consists of three core components: the policy network, feature optimizing model and the deep recommendation model.
The policy network in OPN serves as an RL agent that dynamically adjusts the embedding sizes of users and items and provides appropriate embeddings to the feature optimizing model. Next, the feature optimizing model optimizes the features in the embeddings with the object-aware method and unifies the sizes of the input embeddings. Afterward, the two unified embeddings are concatenated and fed into the deep recommendation model to compute the prediction. Finally, the recommendation model and the feature optimizing model are updated via backpropagation.

Feature Optimizing Model
Commonly, embeddings generated for users and items are fed into the recommendation model directly. However, the raw features in these embeddings cannot represent users or items appropriately. We therefore propose to optimize the features with the object-aware method, which captures the interactions between users and items.
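As shown in Fig. 4, after being fed into the feature optimizing model, the user embedding $e^{(u)}$ is concatenated with a fixed-size embedding of the item $e^{(i)}_f$, and the item embedding $e^{(i)}$ is concatenated with a fixed-size embedding of the user $e^{(u)}_f$:

$$\tilde{e}^{(u)} = \left[\, e^{(u)} ;\, e^{(i)}_f \,\right], \qquad \tilde{e}^{(i)} = \left[\, e^{(i)} ;\, e^{(u)}_f \,\right]$$

Suppose we have n candidate embedding sizes $d_1 < d_2 < \cdots < d_n$ for both users and items, and let K be the size of the fixed-size (concatenated) embedding. For simplicity, we denote both $\tilde{e}^{(u)}$ and $\tilde{e}^{(i)}$ as $e'$, which has embedding size $d'_k = d_k + K$ when the original embedding has size $d_k$. Next, to capture the interactions between users and items, we send $e'$ through the following transformation map:

$$\hat{e} = W_{n-1 \to n}\Big(\cdots\big(W_{k \to k+1}\, e'_k + b_{k \to k+1}\big)\cdots\Big) + b_{n-1 \to n}$$

where $e'_k$ is any concatenated embedding obtained from an embedding of dimension $d_k$ $(k = 1, 2, \cdots, n-1)$, $\hat{e}$ is an embedding with dimension $d_n$, and $W_{k-1 \to k}$ and $b_{k-1 \to k}$ are the learnable weight and bias parameters of each step. Furthermore, $e'_n$, whose embedding size is $d'_n$, is transformed as follows:

$$\hat{e} = W_n\, e'_n + b_n$$

where $W_n$ and $b_n$ are learnable weight and bias parameters. Thus, an embedding $e'_j$ obtained from dimension $d_j$ $(j = 1, 2, \cdots, n)$ is always transformed into an embedding with dimension $d_n$.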
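A minimal plain-Python sketch of the feature optimizing model is shown below, under assumptions we state explicitly: the class and helper names are hypothetical, the weights are randomly initialized rather than learned, and the intermediate dimensions of the transformation chain (each step maps $d'_k$ to $d'_{k+1}$, and the last step maps to $d_n$) are our own guess, since the text fixes only the input and output sizes. It illustrates the object-aware idea: the user embedding is extended with the item's fixed-size embedding (and vice versa) before being mapped to the common size $d_n$.

```python
import random


def linear(W, b, x):
    """Affine map W x + b; W is a list of rows (out x in)."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def init_linear(n_in, n_out, rng):
    """Random (untrained) weights and zero bias for an n_in -> n_out layer."""
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


class FeatureOptimizer:
    """Hypothetical sketch of OPN's object-aware feature optimizing model."""

    def __init__(self, sizes, K, seed=0):
        rng = random.Random(seed)
        self.sizes, self.K = list(sizes), K
        n = len(self.sizes)
        d_prime = [d + K for d in self.sizes]  # d'_k = d_k + K after concatenation
        # Chain steps W_{k->k+1}: d'_k -> d'_{k+1}; the last step outputs d_n
        # (these intermediate sizes are an assumption).
        self.chain = []
        for k in range(n - 1):
            n_out = self.sizes[-1] if k == n - 2 else d_prime[k + 1]
            self.chain.append(init_linear(d_prime[k], n_out, rng))
        # W_n, b_n: project the concatenated largest-size embedding (d'_n) to d_n.
        self.W_n, self.b_n = init_linear(d_prime[-1], self.sizes[-1], rng)

    def __call__(self, e, e_object_fixed):
        """e: embedding of the size chosen by the policy (one of `sizes`);
        e_object_fixed: K-dim fixed-size embedding of the interacted object."""
        if len(e_object_fixed) != self.K:
            raise ValueError("fixed-size embedding must have length K")
        k = self.sizes.index(len(e))  # index of the selected size d_k
        h = e + e_object_fixed        # list concatenation = [e ; e_f]
        if k == len(self.sizes) - 1:
            return linear(self.W_n, self.b_n, h)
        for W, b in self.chain[k:]:   # W_{k->k+1}, ..., W_{n-1->n}
            h = linear(W, b, h)
        return h


sizes, K = [2, 4, 8, 16, 64, 128], 8
user_side = FeatureOptimizer(sizes, K, seed=0)  # user embedding + item's fixed embedding
e_user = [0.1] * 16       # user embedding with the selected size d_k = 16
e_item_fixed = [0.2] * K  # fixed-size embedding of the interacted item
e_hat_user = user_side(e_user, e_item_fixed)
print(len(e_hat_user))  # 128
```

An item-side module would be built the same way and called with the item embedding and the user's fixed-size embedding; whether the two sides share weights is not specified in the text, so the sketch keeps them as separate instances.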

Deep Recommendation Model
As depicted in Fig. 5, the two embeddings are fed into the deep recommendation model. Since the model adopts two pathways to model users and items, we combine the features of the two pathways by concatenating them:

$$h_0 = \left[\, \hat{e}^{(u)} ;\, \hat{e}^{(i)} \,\right]$$

where $\hat{e}^{(u)}$ and $\hat{e}^{(i)}$ are the embeddings of users and items. This design has been widely adopted in many deep learning works [31,32]. To capture the interactions between user and item latent features, we feed $h_0$ into the inference layer, which consists of an MLP with m layers:

$$h_l = \sigma\left(W_l\, h_{l-1} + b_l\right), \qquad l = 1, 2, \cdots, m$$

where $W_l \in \mathbb{R}^{k_l \times k_{l-1}}$ and $b_l \in \mathbb{R}^{k_l}$ are the weight and bias of the l-th layer, $k_l$ is the number of neurons in the l-th layer, and σ is an activation function. The output of the last layer, $h_m$, is then passed through an activation function to compute the prediction.
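The inference layer can be sketched in plain Python as follows. Assumptions: ReLU is used as the hidden activation σ, and the last layer produces logits that go through a sigmoid for the binary task or a softmax over the five rating levels for the multiclass task; neither choice is specified in the text above. Layer widths, the random initialization, and the function names are illustrative only.

```python
import math
import random


def dense(W, b, x):
    """Affine layer W x + b with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def make_layers(widths, seed=0):
    """Random (untrained) layers for widths [k_0, k_1, ..., k_m]."""
    rng = random.Random(seed)
    return [
        ([[rng.gauss(0.0, 0.1) for _ in range(k_in)] for _ in range(k_out)], [0.0] * k_out)
        for k_in, k_out in zip(widths[:-1], widths[1:])
    ]


def predict(e_user_hat, e_item_hat, layers, task="binary"):
    h = e_user_hat + e_item_hat  # h_0 = [e^(u) ; e^(i)]
    for W, b in layers[:-1]:     # hidden layers: h_l = ReLU(W_l h_{l-1} + b_l)
        h = [max(0.0, z) for z in dense(W, b, h)]
    logits = dense(*layers[-1], h)  # last layer, followed by the output activation
    if task == "binary":
        return 1.0 / (1.0 + math.exp(-logits[0]))  # sigmoid -> P(positive)
    m = max(logits)  # numerically stable softmax over the rating levels
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [z / total for z in exps]


d_n = 128
p = predict([0.1] * d_n, [0.2] * d_n, make_layers([2 * d_n, 64, 1]))
probs = predict([0.1] * d_n, [0.2] * d_n, make_layers([2 * d_n, 64, 5]), task="multiclass")
```

Note that $k_0 = 2 d_n$ because both unified embeddings have dimension $d_n$ before concatenation.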

Suppose we have n candidate embedding sizes $d_1, d_2, \cdots, d_n$ for both users and items, and let K be the size of the concatenated embedding. Compared with ESPAN, OPN uses $2K(d_1 + d_2 + \cdots + d_n) + nK^2$ additional parameters in the whole model, so the number of parameters of OPN is not significantly larger than that of ESPAN. However, the size of the concatenated embedding is hard to select, and we have to carry out a large number of experiments over the possible sizes to address this problem.
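As a quick check that the overhead is small, the parameter formula can be evaluated for the experimental settings used later (n = 6 candidate sizes D = {2, 4, 8, 16, 64, 128} and K = 8). The helper name below is hypothetical; it evaluates only the expression stated above.

```python
def opn_extra_params(sizes, K):
    """Extra parameters of OPN relative to ESPAN: 2K(d_1 + ... + d_n) + n K^2."""
    return 2 * K * sum(sizes) + len(sizes) * K ** 2


sizes = [2, 4, 8, 16, 64, 128]       # sum = 222
print(opn_extra_params(sizes, K=8))  # 2*8*222 + 6*64 = 3552 + 384 = 3936
```

For scale, a single 128-dimensional embedding table for the 138,000 users of ml-20m already holds roughly 17.7 million parameters (138,000 × 128), so an overhead of a few thousand parameters is negligible.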

Experiments
To validate the effectiveness of our proposed method, we carry out comprehensive experiments on two real-world recommendation benchmark datasets. The experiments aim to answer the following questions: • (Q1) How does the proposed model perform compared with the traditional models and ESPAN? • (Q2) How should the size of the concatenated embeddings be chosen for the object-aware method? • (Q3) How does the object-aware method affect the training time of the model?
In this section, we will first introduce the experimental settings, then present the overall performance comparison and time consumption with discussion.

Dataset
The following two datasets are used to evaluate the proposed model: • MovieLens 20M Dataset (ml-20m) [33]: a public movie rating dataset from GroupLens. It contains 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users, and every user has rated at least 20 movies. • MovieLens Latest Datasets (ml-latest) [34]: the newest public movie rating dataset from GroupLens. It contains 27,000,000 ratings and 1,100,000 tag applications applied to 58,000 movies by 280,000 users, and every user has rated at least 1 movie.

Parameter Settings
First, we set six candidate embedding sizes D = {2, 4, 8, 16, 64, 128} for both users and items. The recommender system and the policy network are trained following the prior work of ESPAN [9]. Adam [35] optimizers with initial learning rates of 0.003 and 0.0001 are used for the two networks, respectively. The dimension of the concatenated embeddings is set to 8. The whole framework is trained on mini-batches of size 500.
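The settings above can be collected into a single configuration object; a minimal sketch follows. The field names are our own (hypothetical), the two learning rates are assigned in the order given in the text (recommender system first, policy network second), and the increasing-order check reflects our assumption that the candidate sizes are sorted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OPNConfig:
    """Hyperparameters reported for the OPN experiments (field names are hypothetical)."""
    candidate_sizes: tuple = (2, 4, 8, 16, 64, 128)  # D, shared by users and items
    recommender_lr: float = 0.003                    # Adam, recommender system
    policy_lr: float = 0.0001                        # Adam, policy network
    concat_size: int = 8                             # K, size of the concatenated embedding
    batch_size: int = 500

    def __post_init__(self):
        # Assumed ordering d_1 < d_2 < ... < d_n for the candidate sizes.
        if list(self.candidate_sizes) != sorted(set(self.candidate_sizes)):
            raise ValueError("candidate_sizes must be strictly increasing")
        if self.concat_size <= 0 or self.batch_size <= 0:
            raise ValueError("concat_size and batch_size must be positive")


config = OPNConfig()
print(config.candidate_sizes[-1], config.concat_size)  # 128 8
```

Making the dataclass frozen keeps a run's hyperparameters from being mutated after construction.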

Platform
All experiments are conducted on a Windows PC with an Intel(R) Core(TM) i5-9300H CPU @ 2.40GHz, 16 GB of memory, and a GTX 1650 GPU with 4 GB of memory.

Baseline
We compare our model with the following four baseline methods: • FIXED: the original recommendation model with a fixed embedding size; we choose this baseline to show that multi-size embeddings can improve the performance of the model. • DARTS [36]: a classic model based on multi-size embeddings. • AutoEmb [11]: a DARTS-based method that assigns weights to the candidate embedding sizes based on the frequency of a given user and item. The whole framework is end-to-end differentiable with soft selection. We use it as a baseline to show that hard selection improves prediction accuracy more than soft selection. • ESPAN [9]: the model on which OPN is based; we use it as a baseline to demonstrate the effectiveness of the object-aware method.
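The difference between soft selection (DARTS, AutoEmb) and hard selection (ESPAN, OPN) can be illustrated with a minimal sketch. Simplifying assumptions: the candidate embeddings have already been unified to a common dimension, the soft-selection weights are a softmax over given scores (in AutoEmb these scores come from a controller that uses user/item frequency), and the hard choice is given directly (in ESPAN and OPN it comes from the RL policy). Function names are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_selection(unified, scores):
    """Weighted sum of ALL candidate embeddings, weights = softmax(scores)."""
    weights = softmax(scores)
    dim = len(unified[0])
    return [sum(w * emb[j] for w, emb in zip(weights, unified)) for j in range(dim)]


def hard_selection(unified, choice):
    """Use ONLY the candidate chosen by the policy; the others contribute nothing."""
    return list(unified[choice])


# Two candidate embeddings (already unified to dimension 2).
unified = [[1.0, 0.0], [0.0, 1.0]]
print(soft_selection(unified, [0.0, 0.0]))  # [0.5, 0.5]: both candidates mixed
print(hard_selection(unified, 1))           # [0.0, 1.0]
```

Under soft selection, every non-selected size still leaks into the result, which corresponds to the redundant information from embeddings of other sizes that hard selection avoids.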

Evaluation Metric
We evaluate the baselines and our model on the following tasks: • Binary Classification Task: the five rating levels are divided into positive and negative labels. The positive label includes 4-star and 5-star ratings, and all other ratings are assigned to the negative label. The recommendation model's task is to predict the correct label, and we evaluate it by classification accuracy and mean-squared-error loss. • Multiclass Classification Task: ratings are divided into 5 levels, and the model should predict the correct rating. The model is evaluated by classification accuracy and cross-entropy loss.
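The label construction and metrics described above can be sketched in plain Python as follows. Assumptions: the binary model outputs a probability and accuracy uses a threshold of 0.5; ratings are treated as integer stars 1-5 for the multiclass task (MovieLens 20M actually records half-star ratings from 0.5 to 5.0, so some rounding rule would be needed, which the text does not specify). Function names are hypothetical.

```python
import math


def binary_label(rating):
    """4- and 5-star ratings -> positive (1); all other ratings -> negative (0)."""
    return 1 if rating >= 4 else 0


def multiclass_label(rating):
    """Star rating 1..5 -> class index 0..4 (assumes integer star ratings)."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError(f"expected an integer star rating 1-5, got {rating}")
    return int(rating) - 1


def binary_accuracy(probs, labels, threshold=0.5):
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def mse_loss(probs, labels):
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def multiclass_accuracy(prob_rows, labels):
    preds = [max(range(len(row)), key=row.__getitem__) for row in prob_rows]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def cross_entropy(prob_rows, labels, eps=1e-12):
    """Mean negative log-probability of the true class (clipped to avoid log(0))."""
    return -sum(math.log(max(row[y], eps)) for row, y in zip(prob_rows, labels)) / len(labels)


ratings = [5, 4, 3, 1]
labels = [binary_label(r) for r in ratings]           # [1, 1, 0, 0]
print(binary_accuracy([0.9, 0.4, 0.2, 0.6], labels))  # 0.5
```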

Performance Comparison (Q1)
The goal of the recommendation model is to improve prediction accuracy, so we use prediction accuracy as the evaluation indicator. We run the baseline models on the different datasets and compare their results with those of OPN. The results show that OPN improves the prediction accuracy on both the ml-20m and ml-latest datasets. Table 1 shows the overall performance of all models on the two datasets, from which we make the following observations: • All the models with dynamically changing embedding sizes outperform the FIXED baseline on all datasets and tasks. From this observation, we conclude that the performance of a recommender system can be boosted by a dynamic embedding strategy. • ESPAN and OPN outperform DARTS and AutoEmb on all datasets and tasks because hard selection avoids interference from the redundant information in the embeddings of other sizes [9]. Therefore, model performance can be boosted by applying hard selection instead of soft selection. • Our model outperforms ESPAN on all datasets and tasks.
The main difference between our proposed model and ESPAN is whether the object-aware mechanism is involved. This experimental result demonstrates the effectiveness of our proposed object-aware mechanism.

Selection of the Concatenated Embedding Size (Q2)
In the object-aware method, a fixed-dimension embedding is concatenated to the user or item embedding. To determine how to choose the size of this concatenated embedding, we carry out a large number of experiments, recording the results of the model with different dimensions of the concatenated embedding. The results show that 8-dimensional concatenated embeddings improve the prediction accuracy the most. As presented in Table 2, increasing the size of the concatenated embeddings improves the performance of the model at first, but performance begins to degrade when the dimension exceeds 8, except for the binary task on the ml-20m dataset.
Therefore, we can see that: (i) only 8-dimensional concatenated embeddings significantly outperform ESPAN on all datasets and tasks; (ii) 2-dimensional concatenated embeddings show the worst performance on most datasets and tasks, likely because such low-dimensional embeddings contain too few features to provide sufficient information for the deep learning model; (iii) an appropriate size of the concatenated embeddings is a key factor for the performance of OPN, and some models with improper embedding sizes perform even worse than ESPAN.

Efficiency of the Model (Q3)
The efficiency of a deep learning model is very important for real-world production systems. FFMs introduce a large number of parameters for the field-aware method, which increases the complexity of the model and requires more training time. Therefore, we explore how the object-aware method affects the training time by recording the training time of OPN and ESPAN.
We measured the training time of the models on MovieLens 20M and MovieLens Latest, respectively. As Fig. 6 shows, except for the multiclass task on the MovieLens Latest dataset, the training time of OPN increased by less than 10% in all experiments, indicating that OPN achieves better performance with little additional time overhead.

Conclusion
The well-known ESPAN unifies different embedding sizes by passing embeddings through a series of linear transformations, without considering the interaction between the user and the item. Inspired by FFMs, we believe that the features in embeddings should differ when a user interacts with different items. Hence, we propose an object-aware method to further improve ESPAN. However, the concatenated embedding size is hard to choose in the object-aware method, and in future work we will attempt to implement this method in a new way. We conducted comprehensive experiments on two public real-world datasets, and the results show that our model outperforms the baseline models on all datasets while introducing little extra time consumption.

Declarations
Ethics Approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed Consent
Informed consent was obtained from all individual participants included in the study.

Conflict of Interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.