User Embeddings Based on Mobile App Behavior Data

We consider a smartphone scenario in which a user uses a number of apps. The app usage data provides information about user behavior, which can be used to identify user demographics and interests and, in turn, to find similar users. In this paper, we propose a method to generate a latent space user embedding, a dense low-dimensional representation of the user, from the user's app usage data. This representation is used for low-latency user similarity computation and acts as the user feature representation in user demographics prediction models.


Introduction
User modeling and user profiling are well-studied areas of computer science with applications in a broad range of areas, including recommendation systems and expert systems. Understanding user preferences, interests and characteristics allows service providers to serve more personalized content to users, resulting in improved user experience and increased success (engagement, purchases, etc.) for the business. In the domain of smartphones, a potential approach for building user profiles is either to develop a predictive model that maps the user to a pre-defined taxonomy or to create user segments based on app installation or app usage. In this paper, we present a novel approach that maps a user to a latent space embedding which implicitly captures user behavior. We make several contributions to the field of user modeling based on app usage. First, we propose a user modeling method for creating a user embedding using autoencoders. Second, we show the effectiveness of the user embedding for different use cases such as demographics prediction. Third, we show that this user representation can be used for user segmentation using vector operations.

Related Work
User modeling and personalization have been studied extensively, especially since the advent of smartphones. Zolna et al. [1] present a method that uses LSTMs to model user behavior on a website, with applications in the e-commerce domain. The approach looks at a user's browsing pattern and maps it into a fixed-size vector. Amir et al. [2] describe a method to create user embeddings from text to capture latent user aspects. App2Vec [3] and AppDNA [4] present methods to create embeddings that capture the semantic relationships between apps. Our work presents a novel unified method that computes the user embedding directly from the apps of the user instead of deducing it from app embeddings. The work in [5] shows the use of user embeddings for the gender prediction task when the social context of the user is available; however, social context is not readily available in all cases, and the proposed approach does not require user relations. Mannan [6] shows the use of an artificial neural network for user similarity computation, but computing the similarity between all pairs of users is computationally expensive and that approach cannot perform vector-based operations; the autoencoder fills this gap.

User Embeddings Model
User embeddings can be used for finding similar users. The cardinality of the app space is very high, but the app usage of each user is very sparse, so dense user embeddings are useful in this case. User embeddings can be used as features for predictive analytics, such as the classification of users into various categories extracted in an unsupervised manner. User embeddings also support vector operations such as addition and subtraction, which can be used to perform user analogy tasks. Our method is based on the autoencoder, a neural network model that was developed for unsupervised learning and is used for feature extraction. [7] explains the use of an autoencoder for dimensionality reduction with a multi-layer neural network, which is shown to perform better than the standard PCA approach. Amiri [8] presents a method to compute the similarity between text pairs using an autoencoder.

Method Explanation
In this subsection, we explain the architecture of the user-embedding model, which is based on the autoencoder architecture. The layers of the encoder model are explained below (Fig. 1).
(1) Input layer - The input layer is 262,142-dimensional, which is equal to the vocabulary size of the app space.
(2) Hidden layer - This layer is 512-dimensional and is followed by ReLU, batch normalization and dropout, in that order.
(3) Output layer - This layer is 300-dimensional. Tanh is applied to the output layer.
The layers of the decoder model are in the reverse order, ending at the input dimension. A sigmoid is applied to the decoder's output layer. The loss is optimized using the Adam optimizer and BCELoss. The number of epochs is 6. A variable learning rate is employed per epoch: [0.1, 0.05, 0.025, 0.01, 0.005, 0.0001]. We ran the experiment on an Amazon AWS p2.xlarge instance, which has 12 GB of GPU memory; this constraint influences the autoencoder design (Table 1).
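The architecture and training setup described above can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation: the class name `UserAutoencoder`, the dropout rate, the exact placement of activations in the decoder, the batch size, and the toy vocabulary of 1,000 apps (standing in for the 262,142-dimensional input) are all assumptions made so the example runs quickly.

```python
import torch
from torch import nn


class UserAutoencoder(nn.Module):
    """Encoder: input -> 512 -> 300 (tanh); the decoder mirrors it and ends in a sigmoid."""

    def __init__(self, vocab_size, hidden_dim=512, embed_dim=300, dropout=0.5):
        super().__init__()
        # dropout=0.5 is an assumed value; the text does not state the rate.
        self.encoder = nn.Sequential(
            nn.Linear(vocab_size, hidden_dim),
            nn.ReLU(),
            nn.BatchNorm1d(hidden_dim),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, embed_dim),
            nn.Tanh(),
        )
        # Assumed mirror of the encoder; only the final sigmoid is stated in the text.
        self.decoder = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.BatchNorm1d(hidden_dim),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, vocab_size),
            nn.Sigmoid(),
        )

    def forward(self, x):
        embedding = self.encoder(x)
        return self.decoder(embedding), embedding


def train(model, batches, learning_rates):
    """Run one epoch per learning rate and return the mean loss of each epoch."""
    criterion = nn.BCELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rates[0])
    history = []
    model.train()
    for lr in learning_rates:
        # Apply the per-epoch learning rate to the existing optimizer.
        for group in optimizer.param_groups:
            group["lr"] = lr
        total = 0.0
        for batch in batches:
            optimizer.zero_grad()
            reconstruction, _ = model(batch)
            loss = criterion(reconstruction, batch)
            loss.backward()
            optimizer.step()
            total += loss.item()
        history.append(total / len(batches))
    return history


LEARNING_RATES = [0.1, 0.05, 0.025, 0.01, 0.005, 0.0001]  # one value per epoch

torch.manual_seed(0)
TOY_VOCAB = 1000  # toy stand-in for the 262,142-app vocabulary
# Random sparse binary usage vectors, 4 batches of 32 hypothetical users.
toy_batches = [(torch.rand(32, TOY_VOCAB) < 0.02).float() for _ in range(4)]

model = UserAutoencoder(TOY_VOCAB)
losses = train(model, toy_batches, LEARNING_RATES)

# After training, only the encoder is needed to produce user embeddings.
model.eval()
with torch.no_grad():
    _, user_embeddings = model(toy_batches[0])
```

Because the encoder ends in tanh, every embedding component lies in [-1, 1], and the sigmoid on the decoder output keeps the reconstruction in [0, 1], which is what BCELoss expects for binary targets.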

Experiments
We evaluate the user-embedding model using a private dataset collected from a survey of user app usage. We collected the app usage data of the users over a period of 2 months. The user set is composed of both male and female users aged 16 to 80. The app usage data consists of the app ids (for example, com.whatsapp) of the apps used by the users within this period, in CSV format. This dataset is used to evaluate the user embedding on various tasks. The app usage data is preprocessed using Spark to create the one-hot encodings of the user app usage vector.
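The encoding step can be illustrated with a small sketch. It uses plain Python rather than Spark, and it assumes that each user is represented by a binary vector with one position per app in the vocabulary; the helper names and the app ids other than com.whatsapp are hypothetical.

```python
def build_vocabulary(usage_by_user):
    """Map every distinct app id to a stable column index."""
    app_ids = sorted({app for apps in usage_by_user.values() for app in apps})
    return {app_id: index for index, app_id in enumerate(app_ids)}


def encode_user(apps, vocabulary):
    """Return a 0/1 vector with a 1 at the position of each app the user used."""
    vector = [0] * len(vocabulary)
    for app in apps:
        index = vocabulary.get(app)
        if index is not None:  # ignore apps that are not in the vocabulary
            vector[index] = 1
    return vector


# Hypothetical usage records: user id -> app ids seen in the collection window.
usage_by_user = {
    "user_a": ["com.whatsapp", "com.example.maps"],
    "user_b": ["com.example.games", "com.whatsapp"],
    "user_c": ["com.example.maps"],
}

vocabulary = build_vocabulary(usage_by_user)
vectors = {user: encode_user(apps, vocabulary) for user, apps in usage_by_user.items()}
print(vectors["user_a"])  # [0, 1, 1]
```

With a real vocabulary of several hundred thousand apps, a sparse representation of these vectors would be preferable to dense lists.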

Gender Clustering
In this sub-section, we try to find the relation between the embedding and the user gender. We sampled 1000 users randomly from the two classes, generated the user embeddings using the model trained earlier, and generated t-SNE plots.

Age Clustering
In this sub-section, we try to find the relation between the embedding and the user age. For age-based clustering, we divide the users into two groups, using different ages as the separation point, and observe that distinct clusters are formed at the various separation points. Figure 3 shows the user clusters for the different age thresholds. Equal numbers of users are sampled from the two sets, and t-SNE plots are computed for the various threshold ages. Distinct clusters are observed under the various age thresholds (Fig. 2).

Embedding for User Similarity Task
In this sub-section, we evaluate the quality of finding users with similar attributes. Figure 4a shows the quality of the user similarity task. In this experiment, we sampled a subset of users from the two age groups. The figure shows that the probability of finding a user within the same age group, using the cosine distance between the user embeddings, lies within the range of 0.6 to 0.8. The app usage behavior of people under 35 and over 35 is the least distinguishable in the experiment, as shown by the lowest value of 0.6, whereas it is most distinguishable in the [...]. Figure 4b shows the quality of the user similarity task when the user sample is divided into two sets based on the age threshold.
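One plausible reading of this evaluation is a nearest-neighbor check: for each sampled user, find the most similar other user by cosine similarity and count how often both fall in the same age group. The sketch below illustrates that reading on toy 2-dimensional vectors; the function names, the toy data, and the exact metric are assumptions and may differ from the procedure behind Figure 4.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def same_group_rate(embeddings, groups):
    """Fraction of users whose most similar other user belongs to the same group."""
    hits = 0
    for i, query in enumerate(embeddings):
        best_j = max(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: cosine_similarity(query, embeddings[j]),
        )
        hits += groups[best_j] == groups[i]
    return hits / len(embeddings)


# Toy embeddings (the real ones are 300-dimensional) with hypothetical group labels.
embeddings = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.7, 0.6]]
groups = ["under_35", "under_35", "over_35", "over_35", "over_35"]

rate = same_group_rate(embeddings, groups)
print(rate)  # 0.8: the last user's nearest neighbor is in the other group
```

Because the embeddings are dense and low-dimensional, this comparison costs one dot product per pair, which is what makes the similarity lookup cheap compared with working on the sparse app usage vectors.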

Validation Using Analogy Tasks
In this sub-section, we try to derive relationships based on multiple attributes. For example, the deduced semantic relation between an old woman user group and an old man user group is similar to the semantic relation between a young girl user group and a young boy user group. These multi-attribute semantic relations can be evaluated by performing vector operations on the embeddings. For example, to evaluate the semantic relation old_woman - old_man = young_girl - X, we sample 1000 users randomly from each user group and compute the average of their vectors. We do this for the old_woman, old_man and young_girl user groups, solve for X, and then search the vector space for the user group vector closest to X using the cosine distance. We found that the closest vector is that of the young_boy user group. We performed eight more experiments like this, and the results are shown in Table 2.

Table 2. Analysis of the classification quality for gender prediction using user embeddings and shallow network

Analogy | Actual test case
old_woman - old_man = young_girl - young_boy | 60 to 65_F - 60 to 65_M = 20 to 25_F - 25 to 30_M
old_woman - young_girl = old_man - young_boy | 70 to 75_F - 20 to 25_F = 70 to 75_M - 25 to 30_M
young_girl - young_boy = old_woman - old_man | 20 to 25_F - 20 to 25_M = 70 to 75_F - 65 to 70_M
young_girl - old_woman = young_boy - old_man | 20 to 25_F - 70 to 75_F = 20 to 25_M - 65 to 70_M
old_man - old_woman = young_man - young_girl | 60 to 65_M - 60 to 65_F = 20 to 25_M - 25 to 30_F
old_man - young_boy = old_woman - young_girl | 60 to 65_M - 20 to 25_M = 60 to 65_F - 25 to 30_F
young_boy - old_man = young_girl - old_woman | 20 to 25_M - 70 to 75_M = 20 to 25_F - 65 to 70_F
young_boy - old_woman = young_girl - old_man | 20 to 25_M - 70 to 75_F = 20 to 25_F - 60 to 65_M
young_boy - young_girl = old_man - old_woman | 20 to 25_M - 20 to 25_F = 70 to 75_M - 65 to 70_F
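The analogy procedure (average each group's vectors, solve a - b = c - X for X, then pick the group vector closest to X) can be sketched as follows. The 2-dimensional toy vectors, the two samples per group (instead of 1000), and the function names are assumptions for illustration; whether the three input groups are excluded from the search is not stated in the text, and this sketch searches all groups.

```python
import math


def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def solve_analogy(a, b, c, candidates):
    """Solve a - b = c - X for X and return the name of the candidate closest to X."""
    x = [ci - ai + bi for ai, bi, ci in zip(a, b, c)]
    return max(candidates, key=lambda name: cosine_similarity(x, candidates[name]))


# Toy user embeddings per group: axis 0 loosely encodes gender, axis 1 age.
samples = {
    "old_woman": [[1.0, 1.0], [0.8, 1.2]],
    "old_man": [[-1.0, 1.0], [-1.2, 0.8]],
    "young_girl": [[1.0, -1.0], [1.2, -0.8]],
    "young_boy": [[-1.0, -1.0], [-0.8, -1.2]],
}
group_vectors = {name: mean_vector(vectors) for name, vectors in samples.items()}

# old_woman - old_man = young_girl - X
answer = solve_analogy(
    group_vectors["old_woman"],
    group_vectors["old_man"],
    group_vectors["young_girl"],
    group_vectors,
)
print(answer)  # young_boy
```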

User Embedding for Classification Task
In this sub-section, we use the embedding as a feature for user attribute prediction and compare its performance with a shallow prediction model that relies on manual feature engineering. The shallow prediction model is based on an SVM [9] with the following parameters: gamma = 1 and C = 0.1. The input to the embedding-based model is the app-based user embedding, whereas the shallow model is built using tf-idf features of the metadata of the app ids used by the user. Table 3 shows that the embedding-based binary prediction model, which needs no hand-engineered features, achieves accuracy similar to the shallow model, so this method helps to build solutions faster. Table 4 shows the quality of the age prediction model. The recall of the classes 0 to 17 and over 55 is higher with the embedding approach because of the generalization aspect of the embedding.
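As an illustration of the SVM configuration mentioned above, the following sketch fits a classifier with gamma = 1 and C = 0.1 on toy data. The use of scikit-learn, the RBF kernel, and the synthetic 2-dimensional features (standing in for either tf-idf features or user embeddings) are assumptions; the text does not name the library or the kernel.

```python
import random

from sklearn.svm import SVC

rng = random.Random(0)


def toy_features(center, count):
    """Draw 2-D points around (center, center); a stand-in for real user features."""
    return [[rng.gauss(center, 0.3), rng.gauss(center, 0.3)] for _ in range(count)]


# Two synthetic classes, e.g. a binary demographic attribute.
train_x = toy_features(1.0, 100) + toy_features(-1.0, 100)
train_y = [1] * 100 + [0] * 100
test_x = toy_features(1.0, 50) + toy_features(-1.0, 50)
test_y = [1] * 50 + [0] * 50

# gamma and C follow the values stated in the text; the RBF kernel is assumed.
classifier = SVC(kernel="rbf", gamma=1, C=0.1)
classifier.fit(train_x, train_y)
accuracy = classifier.score(test_x, test_y)
```

The same estimator could be fit on either feature set, which makes a like-for-like comparison between hand-engineered features and embeddings straightforward.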

Conclusion and Future Work
In this paper, we have proposed a method to create a user embedding for a smartphone user based on their app usage. Our experiments highlight the usefulness of such an embedding for capturing user behavior, for binary prediction tasks and for vector operations. In the future, we plan to evaluate the use of the user embedding for the recommendation task.

Table 4. Analysis of the classification quality for age prediction using embedding vector feature and shallow network