Bi-Labeled LDA: Inferring Interest Tags for Non-famous Users in Social Network

User tags in social networks are valuable for many applications such as Web search, recommender systems, and online advertising. Extracting high-quality tags that capture user interests has therefore attracted considerable research attention in recent years. Most previous studies inferred users' interests from the text they post in social networks. However, ordinary users often publish only a small number of posts, and the text is frequently unrelated to their interests. Compared with famous users, it is therefore more challenging to find the interests of non-famous (ordinary) users. In this paper, we propose a probabilistic topic model, Bi-Labeled LDA, to automatically find interest tags for non-famous users in social networks such as Twitter. Instead of extracting tags from text posts, the model infers the tags of non-famous users from the interest topics of the famous users they follow. The proposed model simulates how following relationships between non-famous users and famous users are formed, and it exploits the interest tags of famous users both to supervise training and to capture latent relations among famous users. Furthermore, the influence of popular famous users and popular tags is taken into account, and the tags of each non-famous user are ranked with a random walk model. Experiments were conducted on real Twitter datasets. Comparison with state-of-the-art methods shows that our method is superior in terms of both ranking and quality of the tagging results.


Introduction
Online social networking platforms like Twitter have become a mainstream medium, attracting millions of people who spend time there every day. Capturing the interests and preferences of users on these platforms is very important for many applications such as recommender systems, personalized search, and online advertising, besides the social networking service itself. Tagging is one effective way to describe a user's interests. Some social networking platforms such as Twitter do not let users tag themselves. Others allow users to provide tags to describe themselves, but these tags are usually ambiguous, trivial, inadequate, or even plainly false [7]. Therefore, how to tag users automatically and accurately has become a hot research topic in recent years. Non-famous users in social network platforms are usually less active and provide less information than famous users, making it more challenging to find appropriate tags for them. Therefore, in this paper, we focus on how to find interest tags for non-famous users in social network platforms such as Twitter.
A key challenge in solving this problem is how to accurately infer the topics of interest for a user u. Most prior studies attempted to infer the topics of interest from the tweet content posted or retweeted by u in Twitter, mainly using topic models such as LDA and Labeled Latent Dirichlet Allocation (Labeled LDA) [13]. Labeled LDA is one of the most competitive models for solving this problem using tweet content information. Some other approaches use both the tweet content and the social relationship information, mining users' topics of interest from tweets and re-ranking users' interests based on the underlying social network [12,18]. However, people often post interest-unrelated tweets about their lives [4,17,19]. Therefore, the tweets users publish usually cannot reflect or cover all of their topics of interest.
To address these problems, Bhattacharya et al. [1] proposed a method that first determines the topical expertise of popular Twitter users based on their Twitter Lists features and then transitively infers the interests of the users who follow them. Through this approach, the tags extracted for popular users are of high quality. However, many popular users cannot get tags, and non-popular users usually get popular tags. Our experiments show that it usually recommends popular tags such as "celeb," "news," and "media" for non-famous users. Ding et al. [4] proposed a method to extract interest tags from Twitter user biographies, which heavily depends on the availability of users' biographies. Lappas et al. [7] proposed to use the traditional LDA model to find the famous aspects of popular Twitter users based on their published tweets and to infer tags of non-famous users based on their following relationship with the popular users. In that work, words of tweets were extracted as tags, which often have a low level of generalization. In addition, every popular user was regarded as a unique id in the LDA model, which ignored the relations between popular users and undermined the effectiveness of the method.
In this paper, we study two mining problems: how to extract interest tags for non-famous users based on the social relationships among users in a social network, without using tweet information, and how to rank the tags of each non-famous user so as to capture the importance of different tags. In particular, we extend the traditional topic model LDA to model a non-famous user's following behavior while simultaneously making use of famous users' tag information. People usually follow a famous user out of personal interest. Therefore, famous users largely share interests with the non-famous users who follow them, a phenomenon called homophily in [22]. Based on this phenomenon, famous users' interest information is incorporated into the traditional topic model LDA to serve as labels of documents and labels of words, and a probabilistic topic model called Bi-Labeled LDA is developed based on two basic intuitions. To further enhance its performance, we relax the assumption that a famous user is followed due to a single topic of interest and take high popularity issues into account. Based on the result of this topic model, a random walk model is proposed to further rank the tags of each non-famous user, utilizing the social relationships among non-famous users. Ultimately, based on these model results, we output a ranked list of tags to describe each user's interests.
The major contributions we make in this paper are as follows:
• We propose a new topic model, Bi-Labeled LDA, to model the process in which a non-famous user follows famous users and to infer tags for non-famous users effectively. Compared with existing models, it takes the relations between famous users into consideration, incorporating more supervision information into traditional LDA. Bi-Labeled LDA is further improved to address two issues: the strong assumption behind LDA and the high popularity of some topics and famous users.
• A random walk model is proposed to rank the tags inferred through Bi-Labeled LDA, adjusting the importance of tags that are unpopular among famous users.
• We conducted comprehensive experiments on a real dataset and compared the interest tags found by the proposed models with those found by state-of-the-art approaches. We find that the interest tags extracted by our methodology are superior in accuracy and generalize better.
The rest of the paper is organized as follows: In Sect. 2, related work is discussed. We state the problems and give the definitions in Sect. 3. The method to find the interest tags of famous users is introduced in Sect. 4. Then, the proposed Bi-Labeled LDA model and its extensions to infer interest tags for non-famous users are presented in Sect. 5. In Sect. 6, the experimental setup and results are described. Finally, conclusions are drawn in Sect. 7.

Related Work
Closely related existing work can be categorized into two groups: One group of work mainly utilizes users' tweet information to extract their interest topics, and the other group utilizes other kinds of information such as biography and social information to infer their interests.
Most prior studies attempted to mine user interests from the tweets posted or retweeted, mainly using topic models such as LDA. Xu et al. [20] proposed a modified author-topic model named twitter-user model to discover users' topics of interest by filtering out interest-unrelated tweets from the aggregated user tweets. For each tweet, they introduced a latent variable to indicate whether it is related to its author's interests. Zhao et al. [23] developed a new topic model named Twitter-LDA to improve the quality of topics by restricting each tweet to one topic and a common background topic. Quercia et al. [11] inferred users' topics of interest with a supervised topic model, Labeled Latent Dirichlet Allocation (Labeled LDA), and showed it to be more effective than LDA. Specifically, Labeled LDA uses the same underlying mechanisms as traditional LDA, but each topic is seeded with a label to help anchor the topic extraction process. Quercia et al. labeled each Twitter user using text classification APIs, while Ottoni et al. [10] selected the 300 most common hashtags from all the tweets as topic labels. Michelson and Macskassy [9] proposed to find a user's interests as entity categories by extracting entities from tweets and categorizing them based on Wikipedia. Some approaches used both tweets and network information, mining users' topics of interest from tweets and then re-ranking users' interests based on the underlying social network using techniques such as random walk [12,18].
All the methods mentioned above rely on tweet content. However, Twitter users often post tweets about their daily lives or converse with their friends, and such tweets are usually not related to their interests [4,17,19]. Moreover, 82.2% of Twitter users post fewer than 100 tweets per year [7]. Both facts make it difficult to infer meaningful topics from tweets. To address this problem, other studies focused on users' other features, such as biographies and network information, and incorporated extra information such as Wikipedia and human effort [1,4,7,8]. Bhattacharya et al. [1] first deduced the topical expertise of famous Twitter users based on their Twitter Lists features and then transitively inferred the interests of the users who follow them. Although their approach is very effective for deducing the topical expertise of famous users, it does not perform well for non-popular users. Our experiments show that it always recommends popular tags such as "celeb," "news," and "media" to non-famous users. Ding et al. [4] extracted interest tags from Twitter user biographies with a sequential labeling model based on automatically constructed labeled data. However, their approach heavily depends on the availability of users' biographies. In fact, only 22% of Twitter users have a biography on their profile [13], and Ding et al. revealed that only 28.8% of biographies contain meaningful interest tags. Hence, even in the ideal case, they are only able to recommend interest tags for 6.336% (22% × 28.8%) of Twitter users. Lim and Datta [8] introduced a method to find a user's interests by classifying the celebrities the user follows into categories. A celebrity is categorized by extracting keywords from the occupation or the first paragraph of the description text presented on Wikipedia and mapping the extracted keywords to categories. This method depends on information that Wikipedia presents only for real-life celebrities, and how to build the mapping from keywords to categories is not solved. Lappas et al. [7] inferred the famous aspects of popular Twitter users using two standard LDA models: one for the generative process of non-famous users' following behavior, where every popular Twitter user is regarded as a word token, and the other for the generative process of tweets. Finally, words of tweets were extracted as tags, which often have a low level of generalization. In addition, every popular user was regarded as a unique id in the LDA model, which ignored the co-occurrence information among popular users.
In this paper, we propose a model to find interest tags for non-famous users based on the underlying social network in Twitter, without using tweet text information. In addition, by mapping each famous user to a tag set and modeling the user's following behavior, it takes relations between famous users into consideration. Besides the social relationship between non-famous users and famous users, the relationship between non-famous users is also used to rank the tags inferred through the topic model, improving performance further.

Problem Definition
Given a set of users on social network platforms such as Twitter, for each user u, we have all of its followings, i.e., the users u follows, and the List information of the followings, including the name and description of each List. A user may be a person, a company, an organization, etc. Among these users, we define those who are followed by fewer than 2000 users as non-famous users, and users who are followed by more than 2000 users as famous users [6,7]. Then, we split these users into two sets: a set U of non-famous users and a set V of famous users.
Given the above information, in this paper we have the following three mining tasks:
Mining task 1 Given a set V of famous users and their followers, we want to extract interest tags for each famous user based on List information. As a result of task 1, we obtain a set K of tags, each of which represents an interest topic of famous users. Let the set $K = \{t_1, t_2, \ldots, t_{|K|}\}$. For each famous user v, we describe its interests by a set of tags, denoted by a binary vector $\Lambda^{(v)} = (l^{(v)}_1, \ldots, l^{(v)}_{|K|})$, where $l^{(v)}_k = 1$ if v has tag $t_k$ and 0 otherwise.
Mining task 2 Given a set U of non-famous users, a set V of famous users with tags, and the following relationship between these two sets of users, we want to infer a set of tags for each non-famous user to represent their interests.
Mining task 3 Given a set U of non-famous users, each associated with a set of tags, we want to rank the tags so that the higher the rank of a tag, the more likely the user is interested in the topic the tag represents.
We describe how to fulfill task 1 in Sect. 4 and the other two tasks in Sect. 5. Our major contribution focuses on task 2 and task 3. To make the procedure more understandable, we describe task 1 first. Hereafter, we use the terms topic and tag interchangeably according to context. A tag represents an interest topic of both famous users and non-famous users.

Extracting Interest Tags of Famous Users
As famous users are usually active users, with more activities and more information posted every day, there are many ways to extract interest tags for them. In this paper, we take advantage of List in Twitter to do that. In Twitter, List is introduced to help users organize their followings. A user can create a List, specify a List name and an optional description, and then add some of his followings to this List. Usually, a famous Twitter user is a member of many Lists. For instance, Barack Obama is a member of Lists such as "politics", "government", "celeb", "leader", etc.
Ghosh et al. [5] proposed a method to discover the topical expertise of famous users utilizing Twitter List names and descriptions. We adopt this method and improve it to tag famous users. We use TweetNLP to perform POS tagging on the text. Before extracting tags, we filter each List using the GNU Aspell dictionary. GNU Aspell is a free and open-source spell checker that can identify out-of-vocabulary (OOV) tokens. It supports multiple dictionaries and can remove noisy words efficiently. After that, we normalize the Lists using a normalization system, which detects and expands non-standard word tokens, including abbreviations and acronyms. Then, we merge synonyms based on WordNet. Finally, we remove stop words and perform stemming using the Porter stemming algorithm. After these preprocessing steps, we extract tags of famous users based on the method used by Bhattacharya et al. [1,5]. According to this method, to find interest tags of a famous Twitter user v, we first collect the Lists that have v as a member and then extract frequently occurring terms (unigrams and bigrams identified as nouns or adjectives) from the List names and descriptions. For each term, we count its frequency, i.e., the number of times it occurs in the List names or descriptions. In particular, if term t has a frequency of no less than 10, we identify v as an expert on topic t and regard t as a tag of user v. As a result, each famous user who is a member of at least one List is temporarily tagged with the set of extracted terms. Then, we retain those tags that occur in the tag sets of more than 1% of users. Applying this method to the experimental dataset yields a set of meaningful and qualified tags, and we keep only these tags to infer non-famous users' tags.
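The two frequency thresholds in this procedure can be sketched as follows. This is a minimal illustration, assuming that the List names and descriptions of each famous user have already been preprocessed into candidate terms. The function names `extract_user_tags` and `retain_common_tags`, the input format, and the choice to compute the 1% share over all users passed in are assumptions made for the example, not details given in the text.

```python
from collections import Counter

def extract_user_tags(list_terms, min_freq=10):
    """Return the terms that occur at least min_freq times across a user's Lists.

    list_terms: one list of preprocessed candidate terms per List that has
    the famous user as a member (hypothetical input format).
    """
    counts = Counter(term for terms in list_terms for term in terms)
    return {term for term, freq in counts.items() if freq >= min_freq}

def retain_common_tags(user_tags, min_share=0.01):
    """Keep only tags that occur in more than min_share of the users' tag sets.

    user_tags: famous user -> set of temporary tags.
    Assumption: the share is computed over all users in user_tags.
    """
    n_users = len(user_tags)
    usage = Counter(tag for tags in user_tags.values() for tag in tags)
    kept = {tag for tag, n in usage.items() if n / n_users > min_share}
    return {user: tags & kept for user, tags in user_tags.items()}
```

Both thresholds are exposed as parameters so that the values stated in the text (10 occurrences and 1% of users) remain the defaults while small toy inputs can still be tried.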
Through this method, 61.1% of the famous users in our experimental dataset have at least one tag. To improve this method and to infer tags for more famous users, we first form, for each non-famous user u, a temporary tag set by taking the union of the tag sets of the famous users u follows. For those famous users for whom we cannot infer tags, we then tag them based on the non-famous users' temporary tag sets. For a famous user v, let f(v) be the set of non-famous users who follow v, and CT(u) be the temporary tag set of non-famous user u. Then user v's tag set, denoted by tag(v), is inferred according to Eq. (1):
$tag(v) = \bigcap_{u \in f(v)} CT(u)$ (1)
That is to say, the intersection of the non-famous followers' temporary tag sets is regarded as the famous user's tag set. In this way, 93.4% of famous users finally have tags.
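The union and intersection steps can be sketched in a few lines. This is an illustrative sketch only: the function names and the dictionary-based input format are hypothetical, and famous users without followers are simply given an empty tag set here, which the text does not specify.

```python
def temporary_tag_sets(follows, famous_tags):
    """CT(u): union of the tag sets of the famous users that u follows.

    follows: non-famous user -> list of followed famous users.
    famous_tags: famous user -> set of tags (untagged users may be absent).
    """
    return {
        u: set().union(*(famous_tags.get(v, set()) for v in followed))
        for u, followed in follows.items()
    }

def backfill_famous_tags(follows, famous_tags, untagged):
    """tag(v): intersection of CT(u) over all non-famous followers u of v."""
    ct = temporary_tag_sets(follows, famous_tags)
    result = {}
    for v in untagged:
        follower_sets = [ct[u] for u, followed in follows.items() if v in followed]
        # Assumption: a famous user with no followers gets an empty tag set.
        result[v] = set.intersection(*follower_sets) if follower_sets else set()
    return result
```

For instance, if u1 follows v1 (music, news), v2 (music, sports), and the untagged v3, while u2 follows v2 and v3, then CT(u1) = {music, news, sports}, CT(u2) = {music, sports}, and v3 receives the tags music and sports.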

Inferring Tags for Non-famous Users
Based on the tags famous users have, in this section we introduce our methods to infer tags of non-famous users based on social relationship information.
Suppose each famous user has a set of tags, each of which represents an interest topic in which they are an expert or famous. Meanwhile, they are usually more famous in some topics than in others. For example, Lance Armstrong has two tags: cyclist and cancer survivor, and he is more famous as a world-class cyclist than as a cancer survivor. A non-famous user follows a famous user due to some of the topical expertise of the famous user. For example, a user follows Lance Armstrong because he is an expert or famous in cycling. Therefore, when a non-famous user u follows a famous user v, we assume that user u has different levels of interest in each topic represented by a tag of user v. However, we only observe that user u follows user v; it is not easy to determine how strongly each of user v's interest topics influences user u.
In order to figure out the reason why a non-famous user u follows a famous user v, we have the following intuitions:
• Intuition 1 If a non-famous user u follows more users who are famous in topic a than users who are famous in topic b, then u follows a famous user v who is famous in both topics a and b more because of interest in topic a than in topic b. For example, suppose user u follows ten famous users. We count the tags of these famous users and get three tags with counts: entertainment (6), business (1), and food (5). For a particular famous user v with tags entertainment and business, we think user u follows v more because of interest in topic entertainment than in topic business.
• Intuition 2 If a famous user v is followed by more non-famous users with interest in topic a than in topic b, then v is followed by a non-famous user u more due to u's interest in topic a than in topic b. For example, suppose a famous user v with tags entertainment and business is followed by ten non-famous users. Among these non-famous users, six have interest in topic entertainment, one has interest in business, and five have interest in food. Then, a non-famous user u follows v more because of his/her interest in topic entertainment than in topic business.
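The counting behind Intuition 1 can be made concrete with a small sketch. The helper names and the normalized share are assumptions introduced purely for illustration; the model itself learns these preferences through sampling rather than through this direct ratio.

```python
from collections import Counter

def tag_counts(followed_tag_sets):
    """Count how many followed famous users carry each tag."""
    return Counter(tag for tags in followed_tag_sets for tag in tags)

def relative_interest(counts, tags_of_v):
    """Share of each of v's tags among the counts restricted to v's tags.

    Hypothetical helper: a rough proxy for why u follows v, not the model.
    """
    total = sum(counts[t] for t in tags_of_v)
    return {t: counts[t] / total for t in tags_of_v}

# Ten followed famous users reproducing the counts in the example:
# entertainment (6), business (1), food (5).
followed = (
    [{"entertainment"}] * 4
    + [{"entertainment", "food"}]
    + [{"entertainment", "business"}]
    + [{"food"}] * 4
)
counts = tag_counts(followed)
share = relative_interest(counts, ["entertainment", "business"])
```

With these counts, entertainment receives a share of 6/7 and business 1/7, which matches the conclusion that u follows v mainly because of entertainment.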
Based on the observations and intuitions discussed above, we propose a modified topic model, Bi-Labeled LDA, to model the generative process of non-famous users' following behavior. In this model, we assume a non-famous user u follows a famous user v because of one topic of interest. Different from the model proposed by Lappas et al. [7], the tags of famous users are exploited to supervise the learning of the model, linking different famous users through tags. In this model, we first make use only of the following relationship between non-famous users and famous users. We exclude the social relationships between non-famous users in this step because we want to eliminate noise as far as possible. An existing study [22] has shown that a non-famous user sometimes follows another non-famous user because they are offline friends or family, or are simply following each other back, rather than because of real interest.
This model is further improved by relaxing the assumption that a famous user is followed because of one topic of interest. The following behavior may be owing to some topics or popularity of the famous user.
Based on Bi-Labeled LDA, we find a set of tags for each non-famous user with each tag representing an interest topic. But a user may be more interested in some topics than others. Therefore, we take advantage of following relationship between non-famous users to rank each user's tags in the end.
In the following, we first introduce the basic model of Bi-Labeled LDA and its extension and then describe the ranking model.

Bi-Labeled LDA
To perform mining task 2, we propose a probabilistic topic model, Bi-Labeled LDA, to model the process in which non-famous users follow famous users in social networking platforms such as Twitter. This model is an extension of the traditional topic model LDA [2], which was originally proposed to model the generative process of a document. According to LDA, each word of a document is generated in two steps: first pick a topic based on the document-specific topic distribution, and then pick a word based on the word distribution of the picked topic, under the assumption that a document is a mixture of latent topics. To model users' following behavior in social network platforms, we make the similar assumption that each user has a mixture of latent interest topics. Each tag of a famous user represents one interest topic. To get information about a topic, a user chooses to follow famous users who have the same topic of interest. Therefore, the set of famous users a user u follows reflects u's latent interests. In analogy to document generation, each non-famous user u is regarded as a document consisting of the set of famous users followed by u. Hence, each followed famous user corresponds to a word of the document. For simplicity, the famous users followed by a user u are called u's followings. Informally, to generate a document, a non-famous user u first picks a topic from his or her personal distribution over interest topics and then picks a famous user in that topic based on the topic's distribution over all the famous users. We call this process the following behavior generative process.
Formally, given the set U of all non-famous users who follow a set V of famous users, we can represent each non-famous user u by a bag of the famous users u follows, denoted by $V^{(u)} = \{v_1, v_2, \ldots, v_{N_u}\}$. Here $N_u$ is the number of famous users followed by user u. Through the method introduced in Sect. 4, we extract a set of tags, $K = \{t_1, t_2, \ldots, t_{|K|}\}$, and each famous user v's tag set is denoted by a binary vector $\Lambda^{(v)} = (l^{(v)}_1, \ldots, l^{(v)}_{|K|})$. To find tags of a non-famous user u, we first build a candidate tag set for u, which is the union of the tag sets of the famous users u follows and is denoted by a binary vector $\Lambda^{(u)} = (l^{(u)}_1, \ldots, l^{(u)}_{|K|})$.
As famous users have tags, that is to say, each famous user is labeled with a set of tags, we want to use this label information to supervise the generative process of non-famous users' following behavior and use the tags in K to express non-famous users' interests. Based on this idea, we propose the Bi-Labeled LDA model. Its graphical representation is illustrated in Fig. 1.
The symbols used in the model are summarized in Table 1.
The top half of the graphical model is the same as standard LDA: for user u, each famous user followed by u is generated by first picking a topic $Z_i$ based on $\theta^{(u)}$ and then picking a famous user $v_i$ based on $\phi^{(Z_i)}$. What is different is the lower half. We think a user u's topics come from the tags of the famous users followed by u. Therefore, a user's topic distribution is restricted by a label prior. Similarly, for each topic $t_k$, its famous user distribution is restricted by a label prior. Specifically, user u's topic distribution is restricted to be only over user u's candidate tag set (while each tag is regarded as a possible latent topic), and the famous user distribution of a topic $t_k$ is restricted to be only over those famous users who have this topic (i.e., the famous users who have tag $t_k$).
In other words, unlike traditional LDA, Bi-Labeled LDA defines a one-to-one correspondence between latent topics and tags. Every document is restricted to those topics that correspond to its candidate tag set. Meanwhile, every famous user is restricted to be generated (followed) from these topics, i.e., every topic can only have famous users associated with the same topic (tag). In this way, we incorporate supervision into traditional LDA and, meanwhile, take advantage of the relation among famous users who have the same tags. Let $\alpha = (\alpha_1, \ldots, \alpha_{|K|})^T$ and $\beta = (\beta_1, \ldots, \beta_{|V|})^T$ be the Dirichlet smoothing parameters for topics and words, respectively; let the label priors for topics and non-famous users each be a vector of length |K|; and let $\phi^{(k)}$ be the distribution of topic $t_k$ over famous users. Let $L^{(u)}$ and $M^{(k)}$ be two matrices used to constrain the topics user u could have and the topics a famous user v could belong to, respectively.
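In order to restrict $\theta^{(u)}$ to be defined only over the topics that correspond to u's candidate tag set represented by $\Lambda^{(u)}$, we define a matrix $L^{(u)}$ of size $|K| \times |K|$ with $L^{(u)}_{ij} = 1$ if $i = j$ and $l^{(u)}_i = 1$, and 0 otherwise, and set $\alpha^{(u)} = L^{(u)} \alpha$. In other words, $\alpha^{(u)}_k$ is equal to $\alpha_k$ if and only if $l^{(u)}_k$ is 1, and 0 otherwise. Clearly, the topics of user u are constrained to its candidate tag set.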
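For example, suppose |K| = 4 and a non-famous user u's candidate tag set is denoted by the vector $\Lambda^{(u)} = (1, 0, 0, 1)$. Then $L^{(u)}$ is the $4 \times 4$ diagonal matrix with diagonal entries (1, 0, 0, 1), and $\alpha^{(u)} = L^{(u)} \alpha = (\alpha_1, 0, 0, \alpha_4)^T$.
Similarly, we define a matrix $M^{(k)}$ of size $|V| \times |V|$ for each topic $t_k$. For each row $i \in \{1, \ldots, |V|\}$ and column $j \in \{1, \ldots, |V|\}$, $M^{(k)}_{ij} = 1$ if $i = j$ and famous user i has tag $t_k$, and 0 otherwise. The topic-specific prior is computed as $\beta^{(k)} = M^{(k)} \beta$, so that $\beta^{(k)}_i$ is equal to $\beta_i$ if famous user i has tag $t_k$ (i.e., famous user i can belong to topic $t_k$), and 0 otherwise. Clearly, the topics a famous user can belong to are constrained to its associated tag set.

Table 1 Symbols used in the model
Symbol                  Description
$V$, $v_i$              The set of famous users and the ith famous user followed by user u
$K$                     The set of topics (tags) famous users have
$Z_i$                   The ith topic
$N_u$                   The number of famous users followed by user u
$\Lambda^{(v)}$         The famous user v's tag set
$\Lambda^{(u)}$         A binary vector to represent u's candidate tags
$\alpha$, $\beta$       Dirichlet smoothing parameters of topics and words, respectively
Label priors            Label priors for topics and non-famous users, respectively
$\theta^{(u)}$          The non-famous user u's topic distribution
$\phi^{(k)}$            The topic $t_k$'s distribution over famous users
$\alpha^{(u)}$          The Dirichlet smoothing parameters for non-famous user u
$\beta^{(k)}$           The Dirichlet smoothing parameters for topic k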
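The restricted priors can be checked numerically with a short sketch. It assumes the diagonal form of the constraint matrices described above; the helper names `diag_matrix` and `mat_vec` and the numeric prior values are hypothetical.

```python
def diag_matrix(indicator):
    """Build a square matrix with the binary indicator vector on its diagonal."""
    n = len(indicator)
    return [[indicator[i] if i == j else 0 for j in range(n)] for i in range(n)]

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

# Candidate tag vector (1, 0, 0, 1) over |K| = 4 topics, as in the example.
candidate = [1, 0, 0, 1]
alpha = [0.1, 0.2, 0.3, 0.4]          # illustrative smoothing values (assumed)
L_u = diag_matrix(candidate)
alpha_u = mat_vec(L_u, alpha)         # keeps alpha_1 and alpha_4, zeros elsewhere

# The same construction restricts the word prior for one topic t_k:
has_tag_k = [0, 1, 1]                 # which of |V| = 3 famous users carry t_k
beta = [0.01, 0.01, 0.01]
beta_k = mat_vec(diag_matrix(has_tag_k), beta)
```

Entries that are zero in the restricted prior remove the corresponding topic (or famous user) from the support of the Dirichlet distribution, which is how the label information enters the model.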
The generative process behind model Bi-Labeled LDA is shown below.
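A minimal sketch of such a generative process for a single non-famous user is given here, assuming restricted priors in which zero entries mark disallowed topics or famous users. The function names, the use of gamma draws to sample a Dirichlet vector, and the integer indexing of topics and famous users are choices made for this illustration and are not taken from the paper.

```python
import random

def sample_restricted_dirichlet(prior, rng):
    """Sample a Dirichlet vector supported only on the nonzero entries of prior."""
    draws = [rng.gammavariate(a, 1.0) if a > 0 else 0.0 for a in prior]
    total = sum(draws)
    if total == 0:
        raise ValueError("prior must have at least one positive entry")
    return [d / total for d in draws]

def generate_followings(alpha_u, beta_by_topic, n_follow, rng):
    """Generate (topic, famous user) pairs for one non-famous user.

    alpha_u: restricted topic prior of the user (zeros outside candidate tags).
    beta_by_topic: topic index -> restricted prior over famous users; it must
    contain every topic that has a positive entry in alpha_u.
    """
    theta = sample_restricted_dirichlet(alpha_u, rng)
    phi = {k: sample_restricted_dirichlet(b, rng) for k, b in beta_by_topic.items()}
    followings = []
    for _ in range(n_follow):
        k = rng.choices(range(len(theta)), weights=theta)[0]    # pick a topic
        v = rng.choices(range(len(phi[k])), weights=phi[k])[0]  # pick a famous user
        followings.append((k, v))
    return followings
```

Because zero-weight entries are never drawn, every generated topic lies in the user's candidate tag set and every generated famous user carries the chosen tag.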

Learning and Inference
Similar to standard LDA, we learn $\theta^{(u)}$ and $\phi^{(k)}$ using collapsed Gibbs sampling [14]. The final sampling update equation for picking a topic to explain why user u follows user v is given in Eq. (5), assuming that v is the mth famous user in u's following list. Equations (6) and (7) are used to estimate $\theta^{(u)}$ and $\phi^{(k)}$.
In these equations, $c_{k,u,*}$ denotes the number of associations between a topic $t_k$ and a non-famous user u, $c^{-(u,m)}_{k,u,*}$ denotes the same count when we exclude the follow relation between non-famous user u and famous user v, and the symbol * denotes a summation over all possible subscript variables. Symbols are summarized in Table 2.
Note that in the above equations the topic prior $\alpha^{(u)}$ is document specific, and the word prior $\beta^{(k)}$ is topic specific. Bi-Labeled LDA captures the two intuitions discussed at the beginning of Sect. 5 well, as explained below:
• If a non-famous user u follows more users who are famous in aspect x than users who are famous in aspect y, then $c_{x,u,*} > c_{y,u,*} \rightarrow \theta^{(u)}_x > \theta^{(u)}_y$, i.e., u follows a famous user v who is famous in both aspects x and y more because of interest in aspect x than in aspect y.
• If a famous user v is followed by more non-famous users with interest in aspect x than in aspect y, then $c_{x,*,v} > c_{y,*,v}$, i.e., v is followed by a non-famous user u more due to u's interest in aspect x than in aspect y.
After the learning and inference step, we obtain estimates of $\theta^{(u)}$ and $\phi^{(k)}$, which indicate a non-famous user u's topic distribution and the topic $t_k$'s distribution over famous users, respectively. Since we map every topic to a tag, we then recommend the top-ranked tags to a non-famous user according to their probability values in $\theta^{(u)}$. As a result, each non-famous user is recommended a tag set, and we record each tag as a pair (tag, probability score), i.e., $(t_k, \theta^{(u)}_k)$.
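The learning step can be illustrated with a compact collapsed Gibbs sampler. This is a sketch under stated assumptions, not the authors' implementation: it uses scalar symmetric values for the smoothing parameters, a standard label-restricted LDA update written with the count notation above, and hypothetical function names. It also assumes every followed famous user has a non-empty tag set.

```python
import random
from collections import defaultdict

def gibbs_bi_labeled_lda(follows, famous_tags, alpha=0.1, beta=0.01,
                         n_iter=200, seed=0):
    """Collapsed Gibbs sampling restricted to the tags of each followed user.

    follows: non-famous user -> list of followed famous users.
    famous_tags: famous user -> non-empty set of tags (assumed complete).
    Returns theta[u][k], the estimated interest of user u in tag k.
    """
    rng = random.Random(seed)
    n_tagged = defaultdict(int)      # number of famous users carrying tag k
    for tags in famous_tags.values():
        for k in tags:
            n_tagged[k] += 1
    c_ku = defaultdict(int)          # c_{k,u,*}
    c_kv = defaultdict(int)          # c_{k,*,v}
    c_k = defaultdict(int)           # c_{k,*,*}

    def bump(k, u, v, delta):
        c_ku[k, u] += delta
        c_kv[k, v] += delta
        c_k[k] += delta

    edges = [(u, m, v) for u, vs in follows.items() for m, v in enumerate(vs)]
    z = {}
    for u, m, v in edges:            # random start within the allowed topics
        z[u, m] = rng.choice(sorted(famous_tags[v]))
        bump(z[u, m], u, v, +1)

    for _ in range(n_iter):
        for u, m, v in edges:
            bump(z[u, m], u, v, -1)            # exclude the current follow
            topics = sorted(famous_tags[v])    # only tags of v are allowed
            weights = [(c_ku[k, u] + alpha) * (c_kv[k, v] + beta)
                       / (c_k[k] + beta * n_tagged[k]) for k in topics]
            z[u, m] = rng.choices(topics, weights=weights)[0]
            bump(z[u, m], u, v, +1)

    theta = {}
    for u, vs in follows.items():
        candidates = sorted(set().union(*(famous_tags[v] for v in vs)))
        denom = len(vs) + alpha * len(candidates)
        theta[u] = {k: (c_ku[k, u] + alpha) / denom for k in candidates}
    return theta

def top_tags(theta_u, n):
    """Return the n (tag, probability score) pairs with the highest scores."""
    return sorted(theta_u.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Since a followed famous user's tags always lie inside the follower's candidate tag set, sampling over the tags of v enforces both label restrictions at once. The returned dictionary corresponds to the (tag, probability score) pairs described above.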

Extension of Bi-Labeled LDA
In Bi-Labeled LDA, we assume a non-famous user follows a famous user because of only one topic. However, it is possible that famous users are followed because of several of their topics of interest. Meanwhile, the more topics a famous user is interested in, the more likely its popularity in some topics is overestimated. Besides, the high popularity issue [16] is ignored in Bi-Labeled LDA. In our problem setting, the high popularity issue manifests in the following two aspects. First, some famous users are more popular in some topics than others. The more popular a user v is, the more likely a user follows v not because of interest but because of v's popularity [18]. For example, a user may follow Barack Obama just because he is popular, not because the user is interested in politics. Second, some topics are more popular than others. As shown in Eq. (6), more popular topics usually have a bigger $c_{k,u,*}$, which may dominate u's other, less popular interests.
Taking these points into consideration and inspired by the models proposed in [3], we further improve Bi-Labeled LDA to solve these problems. To address the first issue, we assume some famous users are followed because of one topic and others may be followed because of more than one topic. We call the corresponding following relationships the one-topic following relation and the multi-topics following relation. Then, we can deal with these situations separately. Accordingly, in the following behavior generative process, there are two separate paths through which famous users are followed. This separation is expected to filter out the cases in which a famous user is followed by a non-famous user owing to more than one topic, and ultimately to help generate topics with less bias. To do that, a new binary latent variable st (single topic) is introduced to indicate the path the famous user v comes from, where st = 0 means v comes from a "multi-topics" path and st = 1 means that v comes from a "one-topic" path. In the sampling process, we perform a "path labeling" as well as a "topic labeling" for each following behavior. The upper middle red dashed box component in Fig. 2 depicts how this idea is incorporated into the Bi-Labeled LDA model.
Taking the high popularity issue into consideration, the reason why a famous user v is followed by a non-famous user u can be extended to include the following three situations:
• User u is interested in some topics in which user v is famous.
• User v is very popular.
• One topic in which user v is famous is very popular.

Based on these points, we update a user's topic distribution by taking each topic's global popularity into account. Meanwhile, we update a topic's famous user distribution by taking each famous user's popularity into consideration. This interpretation leads us to a Pólya urn model [3] with two new components added to the model, the upper left and upper right parts in dashed boxes in Fig. 2, with the meaning of symbols shown in Table 3.

Table 2 Symbols used in learning and inference
Symbol                      Description
$Z$                         $Z = \{z_1, z_2, \ldots, z_{|U|}\}$, in which each $z_u = \{z_{u,1}, z_{u,2}, \ldots, z_{u,n_u}\}$ represents the topic assignment of user u's followings
$W$                         $W = \{w_1, w_2, \ldots, w_{|U|}\}$, in which each $w_u = \{w_{u,1}, w_{u,2}, \ldots, w_{u,n_u}\}$ represents the $n_u$ famous users whom user u follows
$z_{u,m}$                   The topic assignment of the mth famous user followed by user u
$w_{u,m}$                   The mth famous user followed by user u
$c_{k,u,v}$                 The number of associations between a topic $t_k$ and a famous user v followed by user u
$c_{k,u,*}$                 The number of associations in which u follows a famous user due to topic k. The symbol * denotes a summation over all possible subscript variables and here means all possible famous users followed by u
$c^{-(u,m)}_{k,u,*}$        The number of times user u follows a famous user due to topic k, excluding the current following behavior in which non-famous user u follows the mth famous user
For convenience, we refer to the model in Fig. 1 as Bi-Labeled LDA1 and to this new model as Bi-Labeled LDA2. The upper right dotted box shows the global popularity distribution of famous users, which consists of a multinomial distribution , a Dirichlet prior , and a concentration scalar 2 . Note that is a vector of length |V|, the number of unique famous users, and each element has a value of  Fig. 2 is as follows:

The variable st follows a Bernoulli distribution constrained by a Beta prior . When a famous user has many tags, the probability of being followed because of more than one topic becomes higher. Therefore, we impose an asymmetric prior according to the tag sets of famous users, which is mapped by a sigmoid function shown as follows: where

The two latent variables are inferred simultaneously in every Gibbs sampling iteration. The topic-labeling process is performed only when st = 1. Ultimately, the user-topic distribution ( (u) ) and topic-user distribution ( (k) ) can also be estimated based on Eqs. (11) and (12):

Ranking Tags of Non-famous Users
Table 3
Symbol: Description
st: The indicator of whether a famous user is picked to follow through a single topic
$c_{k,u,m,st}$: The number of associations between a topic $t_k$ and the mth famous user v followed by a non-famous user u when the single topic label is st

Based on the user-topic distribution (u) of user u obtained through Bi-Labeled LDA (Bi-Labeled LDA1 or Bi-Labeled LDA2), we know the probability that user u is interested in each topic. Then, we can rank these topics according to their probabilities and regard the tags representing the top topics as u's tag list. This ranking seems reasonable. However, topics in which only a small number of famous users are interested may be ranked low in a non-famous user's tag list. For example, even if a non-famous user u is quite interested in "lista," u follows only a few famous users who have the tag "lista," as this tag is rare among the famous users. This may result in a low ranking of "lista" in the tag list of u. On the other hand, if u is really interested in "lista," the number of non-famous users with this tag among the users u follows may be relatively large. For example, user u may follow many colleagues who share interests such as "lista" with u but are not famous users. Based on this observation, we propose to use the topic sensitive Random-Walk model [19] to re-rank the tags obtained from Bi-Labeled LDA based on the following relationship among non-famous users.
The following relationship among non-famous users can be represented by a graph G(U, E) (which we call the social network), as illustrated through a toy example in Fig. 3a. An edge of E between nodes in this graph represents a follow relationship between the users denoted by nodes in U. For example, in Fig. 3a, non-famous users b, c, and d follow u, and u follows non-famous users e and f. For a non-famous user u, let FN(u) be the set of all the non-famous users followed by u and FD(u) be the set of all the non-famous users who follow u. For a tag t of user u, we assume that the more users in FN(u) have this tag, the more user u is interested in the topic represented by tag t, or, put simply, the more important tag t is to user u. In other words, the tags of each user followed by u influence the importance of each tag to u. To model the influence spreading process, we use a Random Walk model, which is illustrated in Fig. 3b. As can be seen from this figure, the direction of topic influence spreading is opposite to the following relationship shown in Fig. 3a. Tags of users e and f influence the importance of user u's tags, and user u's tags further influence those of users b, c, and d.
Suppose each user u has an initial topic distribution, $I^0_u$ (equal to (u) , the output of Bi-Labeled LDA), representing the initial importance of each tag to the user. This distribution is updated iteratively through influence spreading. Let $I^x_u = (I^x_{u1}, I^x_{u2}, \ldots, I^x_{u|K|})$ denote user u's topic distribution after the xth iteration, which is updated according to the following equations:

Here the decay factor takes a value in [0, 1], and Eq. (13) means that topic $t_k$ can be spread from a followed user f to u if and only if, in the results of Bi-Labeled LDA, both u and f have interest in topic $t_k$. For topics user u is not interested in, the users u follows do not contribute to their importance. The influence of f is spread uniformly over each possible follower u. Symbols used in this model are summarized in Table 4.

Table 4
Symbol: Description
$I^x_u$: Non-famous user u's topic distribution after the xth iteration in the process of Random Walk
FN(u): The set of all the non-famous users followed by u
FD(f): The set of non-famous users who follow user f
Decay factor: The decay factor in the process of interests spreading
$p_{fuk}$: The weight of influence of topic $t_k$ spreading from user f to user u

Fig. 3 a The following relationship, b the interests' spreading, and c an example of topic sensitive Random Walk

Figure 3c illustrates the influence spreading process between u's followings and u. Suppose users u, e, and f all have two topics with probability greater than the minimum threshold. Users u and f both have tags "news" and "music," and user e has tags "news" and "bicycle." Thus, in the process of Random Walk, tags "news" and "music" can be transferred from f to u, but only tag "news" can be transferred from e to u; tag "bicycle" cannot be transferred, as u does not have tag "bicycle." The major steps of random walk model are shown below.
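The re-ranking idea described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the update rule (a convex combination of the initial score and the scores spread from followed users), the uniform weight 1/|FD(f)|, and the names `rerank_by_random_walk`, `decay`, and `threshold` are all assumptions made for illustration, chosen to match the verbal description of the decay factor, the threshold condition, and uniform spreading.

```python
from typing import Dict, List, Set


def rerank_by_random_walk(
    initial: Dict[str, List[float]],   # user -> initial topic scores (topic model output)
    followings: Dict[str, Set[str]],   # user -> FN(u), the non-famous users that u follows
    decay: float = 0.5,                # assumed decay factor in [0, 1]
    threshold: float = 0.0,            # assumed minimum score for "having" a topic
    iterations: int = 10,
) -> Dict[str, List[float]]:
    """Hypothetical topic-sensitive re-ranking; scores are not renormalized."""
    # FD(f): followers of f, obtained by inverting the following relation.
    followers: Dict[str, Set[str]] = {u: set() for u in initial}
    for u, followed in followings.items():
        for f in followed:
            followers.setdefault(f, set()).add(u)

    current = {u: list(scores) for u, scores in initial.items()}
    for _ in range(iterations):
        updated = {}
        for u, base in initial.items():
            new_scores = []
            for k, base_k in enumerate(base):
                spread = 0.0
                # A topic spreads from f to u only if both initially hold it.
                if base_k > threshold:
                    for f in followings.get(u, ()):
                        if f in initial and initial[f][k] > threshold:
                            # f's influence is split uniformly over its followers.
                            spread += current[f][k] / len(followers[f])
                new_scores.append((1.0 - decay) * base_k + decay * spread)
            updated[u] = new_scores
        current = updated
    return current


# Toy example in the spirit of Fig. 3c; topics are (news, music, bicycle).
toy = {"u": [0.5, 0.5, 0.0], "e": [0.5, 0.0, 0.5], "f": [0.5, 0.5, 0.0]}
result = rerank_by_random_walk(toy, {"u": {"e", "f"}}, decay=0.5,
                               threshold=0.1, iterations=1)
print(result["u"])
```

In this toy run, "news" receives influence from both e and f while "music" receives it only from f, so "news" ends up ranked above "music" for u, and "bicycle" receives nothing because u does not hold it initially.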

Experiments
In this section, we illustrate the efficacy of our proposed methods through an experimental evaluation on real data, comparing with existing state-of-the-art methods. We first show how to extract our experimental dataset and ground truth, and then compare the performance of different models proposed in this paper. Finally, we compare them with other existing methods and give a case study.

Dataset
We use the Twitter graph 6

To evaluate the performance of the proposed models, we need to compare the inferred interest tags with ground truth, i.e., known interests of specific Twitter users. To do that, we select as the test dataset those non-famous users who declare their interests in their bios. Ding et al. [4] found that users always use "play + NP," "NP fan," "interested in + NP," "love <topic>," or similar phrases to describe their interests in their biographies, where NP stands for a noun phrase. We use the Stanford POS Tagger to find all the users whose biographies contain such phrases, which yields 3242 users. We then randomly select 120 of them and manually tag each one according to their Twitter homepage, biography, and the Lists they created or subscribed to. Manual tagging is needed because biographies are free-form and ambiguous; for instance, users often express an interest as being someone's fan, such as "Howard Stern fan" and "Orlando Magic fan." For each selected user, we also group the interests into several aspects such as {sports(NBA, Orlando Magic, Gator), music, show(Howard Stern)}, where sports is an aspect represented by the tags NBA, Orlando Magic, and Gator. As a result, we obtain 100 users with manually labeled interest tags, which are clustered into several aspects.

Evaluated Approaches
To evaluate the performance of our proposed models, Bi-Labeled LDA1, Bi-Labeled LDA2, and Bi-Labeled-LDA-RandomWalk, we compare them with the following models; more details about these models are given in Sect. 2.
• Labeled LDA-Text This baseline uses Labeled LDA [10,13]. We select the same tag set extracted from famous users' List features as topic labels. It is used to model the generative process of a user's tweets and ultimately recommends the words of the tweets to users as their tags.
• Labeled LDA-Text-Follow This baseline is the same as Labeled LDA-Text, except that it models the generative process of both a user's tweets and followings at the same time.
• Labeled LDA-Follow This baseline is similar to Labeled LDA-Text, except that it models the generative process of a user's followings instead of the user's tweets, and the labels of the top ranked topics, instead of tweet words, are recommended to users as their final tags.
• Tag-LDA This baseline was proposed to model the generative process of the words and tags of a labeled document at the same time [15]. Due to the large amount of noise in tweets, we model the generative process of hashtags in tweets and famous users' tags at the same time. We finally recommend the hashtags and famous users' tags to users. Unlike Labeled LDA, it places no restriction on the topics a document can have.
• Tag-LDA-Follow This baseline is the same as Tag-LDA, except that we replace famous users' tags with users' followings and finally recommend hashtags in tweets to users.
For all the topic models listed above, we set as 0.5 and as 0.01. For all the topic models listed above that use tweet content, we set the number of topics to 159, the number of tags we extracted for famous users. After learning and inference, we get probability distributions (u) and (k) , which indicate a non-famous user u's topic distribution and the topic distribution over terms, respectively. We recommend a term t to user u based on the information gain measure used in [7]:

$$p(t|u) = IG(t|u) \propto p(t)\,\big[\,p(u|t) \cdot \log p(u|t) + p(\neg u|t) \log p(\neg u|t)\,\big] \tag{15}$$

Information gain measures the reduction in the entropy associated with user u, incurred by the presence or absence of term t. p(u) = 1/|U| is assumed to be the same for all users, and we compute p(t) and p(u|t) as follows:

After computing the information gain scores, we recommend the top scoring terms to users. We also tried several other mechanisms, but this one performs best.

Comparison of Bi-Labeled LDA1 with Bi-Labeled LDA2
We use two measures to evaluate the performance of each model. One is the DCG value of the top n tags extracted for a user by each setting. The other is the number of aspects of each user's interests reflected in the top n tags ( n ∈ {1, 5, 10, 15, 20} ), which we call the hit number. Note that, since it is quite difficult to know the exact number of aspects a user is interested in, we use the number of interest aspects captured instead of the percentage. In particular, since it is difficult to decide which aspect a user is more interested in, we only consider whether a tag is relevant to a user's interest or not. In addition, even though there is usually more than one tag relevant to one aspect, these tags are usually not identical but slightly different. For example, "book, reading, writer, write, kindle" are all relevant to book, but not completely the same. Given that, we calculate DCG and define the graded relevance in Eqs. 18 and 19, where tag i is the tag at rank position i, rel i is the graded relevance of tag i , and tag i ∈ k means tag i can reflect u's interest in the kth aspect:

In other words, when more than one tag in the tag list of a non-famous user corresponds to the same aspect, the graded relevance of the first one is 5 and that of the others is 3. In this way, tag sets whose top n tags cover all of the aspects get the highest score. Specifically, the more aspects a tag list captures and the more tags reflect different sides of the same aspect, the higher the score the tag set will get.
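To make the two measures concrete, the following Python sketch computes the graded relevance (5 for the first tag of an aspect, 3 for later tags of the same aspect, 0 otherwise), a DCG value, and the hit number. The discount $1/\log_2(i+1)$ is an assumption, since several DCG variants exist; the helper names and the tag-to-aspect mapping are hypothetical, and each tag is assumed to belong to at most one aspect.

```python
import math
from typing import Dict, List


def graded_relevance(tags: List[str], tag_to_aspect: Dict[str, str]) -> List[int]:
    """5 for the first tag hitting an aspect, 3 for repeats, 0 if irrelevant."""
    seen_aspects = set()
    rels = []
    for tag in tags:
        aspect = tag_to_aspect.get(tag)
        if aspect is None:
            rels.append(0)
        elif aspect in seen_aspects:
            rels.append(3)
        else:
            seen_aspects.add(aspect)
            rels.append(5)
    return rels


def dcg_at_n(rels: List[int], n: int) -> float:
    """DCG over the top n positions with an assumed log2(i + 1) discount."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels[:n], start=1))


def hit_number(tags: List[str], tag_to_aspect: Dict[str, str], n: int) -> int:
    """Number of distinct interest aspects covered by the top n tags."""
    return len({tag_to_aspect[t] for t in tags[:n] if t in tag_to_aspect})


# Hypothetical ground truth: two aspects, "book" and "sports".
aspects = {"book": "book", "reading": "book", "nba": "sports"}
ranked_tags = ["book", "news", "reading", "nba"]
rels = graded_relevance(ranked_tags, aspects)
print(rels, dcg_at_n(rels, 3), hit_number(ranked_tags, aspects, 4))
```

Because repeated tags of an aspect still earn a relevance of 3, a list that covers many aspects and several sides of each aspect scores higher than one that repeats a single aspect.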
To test the effects of parameters 1 and 2 on the performance of Bi-Labeled LDA2, we conducted a set of experiments. As 2 decreases, the popularity of famous users is taken into account less and less. When 2 becomes larger, each topic's famous user distribution becomes closer and closer to the global distribution of famous users, which reflects the overall popularity of each famous user rather than anything topic specific, and this reduces performance. We fix 1 as 0.05 and adjust 2 from 0.05 to 10. The measure DCG@10 corresponding to different combinations of 1 and 2 is shown in Fig. 4. It can be seen from the figure that as 2 grows, DCG first increases and then declines.
Lower 1 reduces the impact of popular topics and then lowers their rank ordering, but interest topics declared by users usually include some popular topics, such as "sport," "music," "movie," and "travel." We fix 2 as 1.0, and let 1 vary from 0.01 to 10.0. And the DCG is shown in Fig. 5.
Similar to the trend shown in Fig. 4, as 1 becomes bigger, the DCG first increases and then decreases.
The hit numbers for different combinations of 1 and 2 are shown in Figs. 6 and 7. In Fig. 6, 1 is fixed at 0.05 and 2 varies from 0.05 to 10, while in Fig. 7, 2 is fixed at 1.0 and 1 varies from 0.01 to 10. As can be seen from the two figures, values of 1 and 2 that are too low or too high result in poor performance. For example, when 1 = 0.01, it is so low that the popularity of topics cannot take effect, and when 1 = 10.0 or 2 = 5.0 or 10.0, they are so large that they dominate the distribution, leaving no distinction among different users and topics. Considering both measures, we finally set the coefficients ( 1 , 2 ) to (0.05, 1).

Now, we compare the performance of Bi-Labeled LDA1 and Bi-Labeled LDA2 in Figs. 8 and 9 for the two measures, respectively. Overall, Bi-Labeled LDA2 outperforms Bi-Labeled LDA1, which indicates that the extension of Bi-Labeled LDA1 to Bi-Labeled LDA2 is beneficial.

Comparison of Bi-Labeled-LDA-RandomWalk with Bi-Labeled LDA2
To evaluate whether the Random Walk model helps improve the ranking result, we compare it with Bi-Labeled LDA2. In this experiment, we set the decay factor in Eq. 13 to 0.2, 0.3, 0.5, 0.6, and 0.8. The larger the decay factor, the more influence a user receives from the users it follows. Figures 10 and 11 show the DCG@10 and hit number of Bi-Labeled-LDA-RandomWalk with different decay factors, respectively. As can be seen from these figures, the difference between different values of the decay factor is small. Taking into account both DCG and hit number, a decay factor of 0.5 or 0.6 gives relatively better performance. The reason may be that, when the decay factor is too large, users receive too much influence from the users they follow, whereas when it is too small, they receive too little. Setting the decay factor to 0.5, we compare Bi-Labeled-LDA-RandomWalk with Bi-Labeled LDA2 in Figs. 12 and 13, which indicate that re-ranking using the random walk model can improve performance.

Comparing with Existing Methods
Based on ground truth we constructed, we compare our proposed models with state-of-the-art existing models listed in Sect. 5.2. The results are shown in Figs. 14 and 15.
Among the methods based on lists, Bi-Labeled-LDA1 performs better than List-Based, which simply ranks the tags by frequency. Bi-Labeled-LDA2, which relaxes the assumption that a following behavior is due to a single topic of interest and takes the high popularity issues into account, outperforms Bi-Labeled-LDA1. Re-ranking based on the social relationship among normal users further improves on Bi-Labeled-LDA2. We find that tweet-based methods always recommend tags which either relate to daily life, recent events, or globally popular topics, or relate to only one or two topics. Even though the List-Based method can cover many topics of users' interests, it always ranks famous tags such as "news," "movie," "media," and "tech" at the top of the list.
Besides, we are in fact very tolerant of the tags recommended by tweet-based methods, which are usually not precise enough. For example, the tags they recommend for users who are interested in "politics" are usually "government" or "iranelection." The tags they recommend for users who are interested in "baseball" may be "redsox" (an American professional baseball team), or even "sox." Even though these tags are not generalized well enough for other applications, we still treat them as related to users' interests in order to avoid bias or artificial evaluation. In this sense, the tags recommended by our method generalize better and are more applicable to many applications, as further illustrated in Sect. 6.6.

Comparison of Users with Different Levels of Activeness
Finally, in order to evaluate how each model performs for users of different activeness (number of tweets), we adopt the same evaluation method presented in [7]

For each user, the annotators were asked to pick the approach with the best tag set, and they could also pick multiple winners or no winners at all. Then, we report the average DCG for each approach. We also compute the average Kappa statistic of agreement between each pair of annotators on the wins for each approach. The value was 0.85, which signifies robust agreement between annotators. The DCG of each approach is shown in Fig. 16. As can be seen, our approach consistently outperforms all others for all six groups of users. Among the others, as the number of published tweets increases, the performance of models utilizing text information and hashtags first increases and then decreases, which implies that both too few and too many tweets are harmful. This is easy to understand: a small number of tweets cannot provide enough information, and a large number of tweets may introduce a lot of noise.
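The text does not state which Kappa statistic was used. As an illustration only, the sketch below computes Cohen's kappa for one pair of annotators, assuming each annotator's decisions are encoded as binary win labels (1 if the approach was picked as a winner for a user, 0 otherwise); the function name and encoding are hypothetical.

```python
from typing import Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa: chance-corrected agreement between two label sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(a)
    labels = set(a) | set(b)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement if both annotators labeled independently.
    expected = sum((list(a).count(l) / n) * (list(b).count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical win labels from two annotators for one approach over four users.
print(cohen_kappa([1, 1, 0, 0], [1, 1, 0, 1]))
```

Averaging this value over all annotator pairs and approaches would give a single agreement figure comparable in kind to the one reported above.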

Case Study
We illustrate the effectiveness of our proposed models through some cases. Table 6 shows the declared interests of some non-famous users (as given in their bios) and the top 10 tags recommended by five different methods: List-Based, Labeled LDA-Text, Tag-LDA-Follow, Labeled LDA-Follow, and Bi-Labeled-LDA-RandomWalk. These methods are the relatively better performers within their respective groups of models.
List-Based and Labeled LDA-Text perform better than the other baselines. Tag-LDA-Follow performs better than Tag-LDA, and Labeled LDA-Follow performs well among the baselines that use the social network. The tags in bold score 5 in terms of relevance, those in italic score 3, and all the others score 0. The tag sets recommended by Bi-Labeled-LDA-RandomWalk are more precise and capture a larger fraction of the interests users declare in their bios. For instance, Leftonred is a native New Yorker who is good at fixing computers and playing ping-pong, and who loves photography, sake, beer, wine, whiskey, spirits, tea, origami, and backgammon. We infer his interests such as "beer," "nyc," "wine," "geek," and "art," while the tags extracted by List-Based, Labeled LDA-Text, Tag-LDA-Follow, and Labeled LDA-Follow only contain one or two aspects. Even though the List-Based method can cover many aspects of users' interests, it always ranks very popular tags at the top of its tag set, such as "news," "movie," "media," and "tech." Labeled LDA-Text and Tag-LDA-Follow, which are based on tweet content, usually either capture only one aspect of users' interests or recommend tags related to their daily life, recent events, or globally popular topics (e.g., "marathon" and "fb"). Moreover, tags recommended by tweet-based methods are not generalized enough. For example, "bike" and "ride" could be generalized to "cycling," and "code" could be generalized to "development." In this sense, the tags recommended by the Bi-Labeled-LDA-RandomWalk, List-Based, and Labeled LDA-Follow methods are more generalized and more applicable to applications such as personalized recommendation and advertising.

Furthermore, to see why a famous user v is followed by non-famous users, we calculate their topic distributions as shown in Eq. 20:

$$P(t|v) = \frac{P(t) \times P(v|t)}{P(v)} \tag{20}$$

where P(v|t) is taken directly from the output of Bi-Labeled LDA. Table 7 shows the bios of some famous users and the top 10 topics (tags) with high values of P(t|v).
For example, Rainn Wilson, an American actor who is famous for his Emmy Award-nominated role in the television comedy "The Office," is best known because of topics such as "humor," "tv," "star," "hollywood," and "movie." For the Library of Congress of the USA, people usually follow it because of the topics "book," "organization," "government," and "education." We can see from the table that most of the topics inferred through our proposed model for famous users are accurate and reasonable.
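As a small illustration of Eq. 20, the Python sketch below inverts a topic-user distribution with Bayes' rule. How P(t) is obtained is not specified in the text, so it is passed in as an input here; P(v) is computed by marginalizing over topics, and the function name and toy numbers are hypothetical.

```python
from typing import List


def topic_posterior_for_famous_user(
    p_t: List[float],                 # P(t): prior over topics (assumed given)
    p_v_given_t: List[List[float]],   # P(v|t): one row per topic, one column per famous user
    v: int,                           # index of the famous user
) -> List[float]:
    """P(t|v) = P(t) * P(v|t) / P(v), with P(v) = sum_t P(t) * P(v|t)."""
    joint = [p_t[k] * p_v_given_t[k][v] for k in range(len(p_t))]
    p_v = sum(joint)
    if p_v == 0.0:
        return [0.0] * len(joint)  # user v is never generated by any topic
    return [j / p_v for j in joint]


# Two topics and two famous users with made-up probabilities.
posterior = topic_posterior_for_famous_user(
    p_t=[0.5, 0.5],
    p_v_given_t=[[0.8, 0.2], [0.2, 0.8]],
    v=0,
)
print(posterior)
```

Sorting the returned values in descending order and reading off the corresponding topic labels would give a per-user topic list of the kind shown in Table 7.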

Discussion
So far, we have presented our proposed models using Twitter as an example, but Bi-Labeled LDA is not limited to Twitter. It can be applied to other social networking service platforms such as Sina Weibo and Facebook. Social network platforms such as Twitter do not ask users to tag themselves. Others, such as Sina Weibo, do give users the chance to provide tags describing themselves, but many users do not use this chance, or the tags provided are ambiguous, trivial, inadequate, or even plain false [7]. Thus, in this case, it is also necessary to infer high-quality tags for most users. Our proposed methods can be used in this kind of social network, where it is easy to obtain the social relationships between users and relatively easy to tag a small set of famous users. For example, in Sina Weibo, the platform itself provides high-quality tags for famous users (called "big V users"). We can make use of these tags and the links between famous users and non-famous users to tag other non-famous users with our proposed model. In this case, we skip the step of inferring the tags of famous users. If there are no high-quality tags for famous users, we can first obtain tags for them from their published text using existing tweet-based methods, as they are usually active users in terms of publishing behavior.

Conclusion
In this paper, we proposed a probabilistic topic model, Bi-Labeled LDA, to infer interest tags for non-famous users based on their social relationship with famous users, without using text content information. In particular, the proposed topic model simulates a non-famous user's following behavior and incorporates topic restrictions into both the user's topic distribution and the topic's word distribution, based on famous users' tag information. The basic model is further extended to relax its assumption and to consider the high popularity issues. To improve the ranking of tags, the model is finally combined with a random walk model that utilizes the relationship among non-famous users. Experiments conducted on a real Twitter dataset show that the proposed models outperform existing models and can capture more topics of user interests. In the future, we would like to study how to use the proposed model in other scenarios and to evaluate the effects of the inferred tags on applications such as personalized recommendation and online advertising.