Behavior Modeling in Social Networks
Latent Dirichlet Allocation: an effective topic model
Online social networks
Power law: a functional relationship between two quantities where a relative change in one quantity results in a proportional relative change in the other quantity: one quantity varies as a power of another
Suspicious behavior detection: detecting fake reviews, fake social accounts, spammers, fake relationships, and fraudsters
Social context: contextual factors that determine users’ behaviors in social environments such as influence, trust, and preference
The development of social media has enabled the collection of behavioral data of unprecedented size and complexity. Social platforms have realized that great scientific and marketing value is contained in their millions of billions of behavioral records. Accurate prediction and detection of user behavior are key techniques for many social media applications, such as recommender systems (RSs), personalized search, and social marketing. Behavior modeling is therefore significant in real applications and systems.
First, behavior modeling in social networks has considerable marketing value for web services and information systems. Social recommendation systems and social media marketing have become very important profit-making models. Social media closely connect users and provide them with a wide range of patterns and channels to acquire information and knowledge and even meet their shopping needs. Applications based on large-scale behavioral data, such as precise prediction of click and purchase behaviors and antifraud detection, have brought massive market value and economic returns.
Second, behavior modeling in social networks has great significance for national productivity and security. Social media enable people to create, publish, and broadcast information with extreme ease and speed. They have altered the traditional environment of news and information dissemination, which had been monopolized by the state and related institutions for centuries. Moreover, quicker and easier sharing and cooperation in social media are free from the huge costs of traditional business models. This amplifies the influence of individuals on productivity, state security, and social development.
In the field of computer science, behavior modeling is one of the most important scientific research problems.
First, user behaviors in social networks are richer and more complex than those in traditional life. Behaviors in social networks are mainly user-user interactions such as friendships, following relationships, interaction frequencies, and roles in the networks. User behaviors in social media are user-information interactions that contain rich content, including text, images, videos, URLs, and emoticons, and many environmental aspects, including device and geo-location. Behavioral data in complex social media environments are much richer than the data in traditional social networks: the number of users is at the million or billion level, while the number of user behaviors that can be represented as user-information interactions increases by millions of billions every day.
Second, behavior modeling in social networks needs support from interdisciplinary research. Different social media services based on web technology create different behavioral mechanisms. However, behavioral analysis under these mechanisms cannot be conducted with web technology alone. Knowledge from multiple research areas such as anthroponomy, anthropology, psychology, sociology, and communication studies has great significance in understanding user behaviors and uncovering the rules of social environments. The ideas behind behavioral models often come from hypotheses on user behavioral mechanisms that have been uncovered in these areas, so researchers should answer the following questions. Why do users forward, share, or reject a received message in social media? Are their behaviors consistent and complementary across different social media platforms? Do users who have monetary incentives or fraudulent intentions generate suspicious behaviors? What are the differences between their suspicious behavioral patterns and normal users’ patterns? Only by integrating interdisciplinary knowledge can we effectively analyze and model user behaviors in social media.
Behavior modeling has to face a few serious challenges brought by the large-scale behavioral data. The major challenges are as follows.
High sparseness. Traditional behavior prediction methods, including collaborative filtering techniques, suffer from the high sparseness of behavioral data. For example, the Netflix rating data are sparse, which introduces large errors in estimating correlations between users and movies and makes rating predictions inaccurate.
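The effect of sparseness on correlation estimates can be illustrated with a minimal sketch. The toy `ratings` dictionary and the helper names `sparsity` and `pearson` below are hypothetical and are not taken from any dataset or method cited in this entry; the sketch only shows how few co-rated items remain when most user-item cells are empty.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {movie: rating}); illustration only.
ratings = {
    "u1": {"m1": 5, "m2": 3},
    "u2": {"m1": 4, "m2": 2, "m3": 5},
    "u3": {"m4": 1},
    "u4": {"m2": 4, "m5": 2},
}

def sparsity(ratings):
    """Fraction of user-item cells that have no observed rating."""
    items = {m for r in ratings.values() for m in r}
    observed = sum(len(r) for r in ratings.values())
    return 1.0 - observed / (len(ratings) * len(items))

def pearson(a, b):
    """Pearson correlation over co-rated items, or None if not estimable."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return None  # fewer than two co-rated items: no estimate at all
    xa = [a[m] for m in common]
    xb = [b[m] for m in common]
    ma, mb = sum(xa) / len(xa), sum(xb) / len(xb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(xa, xb))
    va = sum((x - ma) ** 2 for x in xa)
    vb = sum((y - mb) ** 2 for y in xb)
    if va == 0 or vb == 0:
        return None  # constant ratings: correlation undefined
    return cov / sqrt(va * vb)

print(sparsity(ratings))                       # share of empty cells
print(pearson(ratings["u1"], ratings["u2"]))   # based on two items only
print(pearson(ratings["u1"], ratings["u3"]))   # no overlap
```

In this toy case the only estimable correlation rests on two co-rated movies, so it is extreme and unreliable, while most user pairs have no estimate at all.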
Heterogeneity. User behaviors that can be represented as user-item links create graphs that are heterogeneous in multiple aspects because of the complex environments of social media. First, behavioral link properties differ across social networks. In general, user behaviors form different types of graphs such as undirected graphs, directed graphs, bipartite graphs, weighted graphs, or hyper-graphs, which have been represented as symmetric matrices, asymmetric matrices, binary matrices, nonnegative matrices, or nonnegative tensors. Second, social networks include not only user nodes but also different types of nodes that represent information, content, devices, and many other elements in the networks.
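The graph-to-matrix correspondence mentioned above can be sketched in a few lines. The pairing used here (undirected to symmetric, directed to asymmetric, bipartite to binary, multi-way behavior to a tensor) is one plausible reading of the text, and all function names and toy edges are hypothetical.

```python
from collections import Counter

def undirected_adjacency(n, edges):
    """Friendship-style links: the adjacency matrix is symmetric."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    return a

def directed_adjacency(n, edges):
    """Follow-style links: entry [i][j] = 1 means i follows j."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1
    return a

def bipartite_matrix(n_users, n_items, links):
    """User-item links: a binary users-by-items matrix."""
    b = [[0] * n_items for _ in range(n_users)]
    for u, i in links:
        b[u][i] = 1
    return b

def behavior_tensor(records):
    """Multi-way behavior (user, item, context) as a sparse count tensor."""
    return Counter(records)

def is_symmetric(a):
    return all(a[i][j] == a[j][i] for i in range(len(a)) for j in range(len(a)))

friends = undirected_adjacency(3, [(0, 1), (1, 2)])
follows = directed_adjacency(3, [(0, 1), (2, 1)])
likes = bipartite_matrix(2, 3, [(0, 2), (1, 0)])
tensor = behavior_tensor([(0, 1, "mobile"), (0, 1, "mobile"), (2, 0, "web")])

print(is_symmetric(friends), is_symmetric(follows))
print(likes)
print(tensor[(0, 1, "mobile")])
```

A weighted graph would use the same layout with nonnegative weights in place of the 0/1 entries.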
Large volume and dynamics. The dynamics of user behaviors continuously feed behavioral data into the computational models of real systems. These systems often break down under the repetitive task of processing behavioral dynamics. For example, when a new user or a new message appears on Twitter, the computational cost of updating and retraining on the behavioral data of billions of users is too high to be practical. It is therefore important to study incremental processing methods and online learning methods for large-scale social media.
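As a rough illustration of what an incremental method can look like, the sketch below applies one stochastic gradient step of a latent factor model per incoming observation, touching only the vectors of the user and item involved instead of retraining on all data. The class name `OnlineMF`, its hyperparameters, and the toy events are assumptions made for illustration; this is not a method described in the entry.

```python
import random

class OnlineMF:
    """Hypothetical online matrix factorization with per-event SGD updates."""

    def __init__(self, k=4, lr=0.05, reg=0.01, seed=0):
        self.k, self.lr, self.reg = k, lr, reg
        self.rng = random.Random(seed)
        self.user = {}  # user id -> latent vector
        self.item = {}  # item id -> latent vector

    def _vec(self, table, key):
        # New users or items get a small random vector on first sight.
        if key not in table:
            table[key] = [self.rng.uniform(0.0, 0.1) for _ in range(self.k)]
        return table[key]

    def predict(self, u, i):
        pu, qi = self._vec(self.user, u), self._vec(self.item, i)
        return sum(a * b for a, b in zip(pu, qi))

    def update(self, u, i, r):
        """One SGD step on a single (user, item, value) observation."""
        pu, qi = self._vec(self.user, u), self._vec(self.item, i)
        err = r - sum(a * b for a, b in zip(pu, qi))
        for f in range(self.k):
            pf, qf = pu[f], qi[f]
            pu[f] += self.lr * (err * qf - self.reg * pf)
            qi[f] += self.lr * (err * pf - self.reg * qf)
        return err  # error measured before the step

model = OnlineMF()
model.update("bob", "msg9", 1.0)
bob_before = list(model.user["bob"])

first_err = model.update("alice", "msg1", 1.0)
for _ in range(300):
    last_err = model.update("alice", "msg1", 1.0)

print(abs(first_err), abs(last_err))
print(model.user["bob"] == bob_before)  # other users are untouched
```

The cost of each update depends only on the number of latent factors `k`, not on the number of users, which is the property that makes this style of update attractive for streaming behavioral data.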
Complex intentions. Social media have become an important tool for broadcasting information in this era: governments have recognized that information cascades spread at high speed, over a wide range, and with strong influence. Users look for useful information in social media, and at the same time, they spot suspicious behavioral phenomena of information manipulation. Social media services are seeking effective suspicious behavior detection (SBD) techniques. Common suspicious behaviors include lockstep following relationships by zombie followers and malicious web posts by fraudsters and spammers. For example, one can purchase 4000 Twitter followers for the price of a cup of coffee.
In the late 1970s, the behavioral approach to systems theory and control theory was initiated, in which behaviors are signals compatible with the system. Since the beginning of the twenty-first century, OSNs such as Facebook, Twitter, and LinkedIn have emerged and attracted billions of accounts; thus, the signals from the users behind the systems are stored in databases at a large scale. Behavior modeling in social networks aims to understand the underlying patterns of human behaviors from these large collections of data in order to improve systems and services.
Scientists realized early that user preference, which describes what content a user likes, is an important driving factor for user behaviors. Content-based analysis used word distributions to represent users and items in news RSs (Balabanović and Shoham 1997); however, it did not make use of the semantics inside the sparse word distributions. In 2003, latent Dirichlet allocation (LDA) was proposed to automatically group documents into topics according to dependencies between words and documents (Blei et al. 2003). Building on LDA, scholars proposed a series of preference-based methods (Liu et al. 2012). Personalized preference analysis discovered topic-level social dynamics in Twitter text data (Narang et al. 2013).
Context awareness was first introduced by Schilit in 1994 to refer to the idea that computers react based on their environments (Schilit et al. 1994). In social networks, users react or behave based on their environments, including the time, the geographical location, the social platform they are using, the channel through which they access the network, and their friends’ words (Yuan et al. 2013; Liu and Aberer 2013). On the one hand, researchers dig into the details of the contextual information. Local and global contexts were exploited for social recommendation (Tang et al. 2013). On the other hand, they propose complicated, effective, and efficient models to represent multifaceted behaviors, discover behavioral patterns, and predict missing behaviors (Jiang et al. 2014a).
The world of the Internet does not always tell the truth. There are four categories of suspicious behaviors: traditional spam (web, email, and short message), fake review, social spam, and social link farming (Jiang et al. 2016b). Before social networks appeared, researchers had proposed several web spam and fake review spam detection techniques (Chirita et al. 2005; Xu et al. 2012). But social networks are more complicated. In order to catch suspicious behaviors on Facebook or Twitter, we have to understand the behavioral intentions of the suspicious accounts. Therefore, in the last decade, researchers have focused more on detecting social spammers, fake followers, and fake Page Likes.
Modeling Complex Behaviors in Social Networks
Modeling Behavior Content
Modeling Social Context
Modeling Cross Platform Context
Modeling Behavior Intention
As web applications such as Hotmail, Facebook, Twitter, and Amazon have become important means of satisfying working, social, information-seeking, and shopping tasks, suspicious users (such as spammers, fraudsters, and other types of attackers) are increasingly attempting to engage in dishonest activity, such as scamming money out of Internet users and faking popularity in political campaigns. Fortunately, commercially available suspicious behavior detection techniques can eliminate a large percentage of spam, fraud, and Sybil attacks on popular platforms. Naturally, the owners of these platforms want to ensure that any behavior happening on them involves a real person interested in interacting with a specific Facebook page, following a specific Twitter account, or rating a specific Amazon product.
Link farming previously referred to a form of spamming on search engine indexes that connected all of a web page’s hyperlinks to every other page in a group. Today, it has grown to include many graph-based applications with millions of nodes and billions of edges. For example, in Twitter’s “who-follows-whom” graph, fraudsters are paid to make certain accounts seem more legitimate or famous by giving them additional followers (zombies). In Facebook’s “who-likes-what-page” graph, fraudsters create ill-gotten page likes to turn a profit from groups of users acting together, generally liking the same pages at around the same time (Beutel et al. 2013). Unlike spam content that can be caught via existing anti-spam techniques, link-farming fraudsters can easily avoid content-based detection: zombie followers do not have to post suspicious content; they just distort the graph structure (Jiang et al. 2014c). Thus, the problem of combating link farming is rather challenging.
The framework of behavior modeling supports multiple applications in social networks that fall into two categories: predicting the “good” behaviors and detecting the “bad” behaviors.
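The lockstep pattern described above, in which groups of accounts like the same pages at around the same time, can be sketched with a simple bucketing heuristic. This is a deliberately naive illustration and not the CopyCatch or CatchSync algorithms cited here; the function name `lockstep_pairs`, the one-hour `window`, the `min_pages` threshold, and the toy records are all assumptions.

```python
from collections import defaultdict
from itertools import combinations

def lockstep_pairs(likes, window=3600, min_pages=2):
    """Flag user pairs that liked several of the same pages in the same window.

    likes: iterable of (user, page, timestamp_in_seconds).
    Returns {(user_a, user_b): set_of_shared_pages}.
    """
    # Group users by (page, time bucket).
    buckets = defaultdict(set)
    for user, page, ts in likes:
        buckets[(page, ts // window)].add(user)

    # Count, per user pair, the distinct pages liked in a shared bucket.
    shared = defaultdict(set)
    for (page, _), users in buckets.items():
        for a, b in combinations(sorted(users), 2):
            shared[(a, b)].add(page)

    return {pair: pages for pair, pages in shared.items()
            if len(pages) >= min_pages}

# Hypothetical toy log: z1 and z2 act together, alice and bob do not.
toy_likes = [
    ("z1", "p1", 100), ("z2", "p1", 160),
    ("z1", "p2", 5000), ("z2", "p2", 5100),
    ("alice", "p1", 200), ("alice", "p3", 90000),
    ("bob", "p2", 400000),
]

suspicious = lockstep_pairs(toy_likes)
print(suspicious)
```

Fixed buckets miss likes that straddle a window boundary, and the pairwise loop is quadratic in bucket size, so this sketch would not scale to graphs with billions of edges.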
Behavior Prediction for Recommendation and Social Marketing
Recommender systems have become extremely common in a variety of social applications. The most popular systems are for movies, music, news, books, social tags, collaborators, restaurants, financial services, life insurance, friends, Twitter followers, and products in general.
Social media marketing connects these consumers and audiences to businesses that share the same needs, wants, and values. Precise behavior prediction techniques facilitate target advertising, product promotion, and online marketing in the social networks.
Suspicious Behavior Detection
The popularity of social network sites has made them prime targets for spammers. Social spam is unwanted spam content appearing on OSNs with user-generated content (UGC). It can be manifested in many ways, including bulk messages, profanity, insults, hate speech, malicious links, fraudulent reviews, fake friends, and personally identifiable information. Many social networks require an administrator’s time and energy to manually filter or remove spam. Automatic social spam detection is thus important.
Besides social spam, the suspicious behaviors include fake reviews, fake comments, fake blogs, fake posts, and deceptive messages. Supervised learning, pattern discovery, graph-based methods, and relational modeling have been applied to address the problems in the real world. We also observe other typical suspicious behaviors such as ill-gotten Page Likes in Facebook, ill-gotten five stars in Amazon and eBay, and zombie followers in Twitter.
Modeling Aspect-Level Behavior Content
There is a big gap between existing behavior models and behavioral semantics. Integrating the context and content information requires comprehensive and accurate representations of the behavior content, and the traditional bag-of-words (BOW) model and LDA cannot represent the rich content well. The former suffers from serious sparseness. For example, “Mrs. Clinton” and “Hillary Clinton” often refer to the same celebrity, but BOW regards them as different elements. The latter is hard to interpret, and its representations are too general. For example, the topical dimension of political words is too coarse to capture the user’s response to the presidential election. Therefore, mining the entities and events, the attributes of and aspects related to the entities and events, and the users’ sentiments will facilitate the modeling of the behavior content and of the users’ intentions underlying the UGC.
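The “Mrs. Clinton” versus “Hillary Clinton” example can be made concrete with a small sketch. The tokenizer, the hand-written `ALIASES` table, and the helper names below are hypothetical stand-ins for a real entity-linking step, which the entry does not specify.

```python
from collections import Counter
from math import sqrt

def bag_of_words(text):
    """Lowercased unigram counts; periods are stripped."""
    return Counter(text.lower().replace(".", "").split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical alias table mapping surface forms to one entity id.
ALIASES = {
    "mrs clinton": "hillary_clinton",
    "hillary clinton": "hillary_clinton",
}

def link_entities(text):
    """Replace known surface forms with a shared entity token."""
    t = text.lower().replace(".", "")
    for surface, entity in ALIASES.items():
        t = t.replace(surface, entity)
    return t

doc1 = "Mrs. Clinton spoke"
doc2 = "Hillary Clinton spoke"

raw = cosine(bag_of_words(doc1), bag_of_words(doc2))
linked = cosine(bag_of_words(link_entities(doc1)),
                bag_of_words(link_entities(doc2)))
print(raw, linked)
```

With raw word counts the two mentions only partially match, whereas after mapping both surface forms to one entity the two texts become identical at the entity level.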
Modeling Multi-contextual and Dynamic Behaviors
Modeling the Causal Effects of Behaviors
In the big data era, statistical modeling, which is a powerful tool for developing and testing theories by way of causal explanation, prediction, and description, has been adopted in a broad range of applications. For example, in online advertising, causal models were used to explain the causal effect of advertising strategies, and in information propagation, predictive models were employed for cascading outbreak detection (Cui et al. 2013). There are usually two important aspects of statistical modeling: prediction performance and model interpretability. A good model should be both predictive and interpretable. Causal inference, which refers to the process of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect (Holland 1986), is a powerful statistical modeling tool for explanatory analysis. It is usually believed that models with high explanatory power are inherently of high predictive power (Shmueli 2010).
Other Future Directions
Returning to Fig. 1, behavior modeling is designed to serve both the industrial and research communities. For industry, applied behavior modeling means (1) extending the scalability and parallelizability of the algorithms in online applications and (2) conducting online testing experiments to demonstrate the effectiveness of the algorithms. For scientific research, traditional discoveries were generated from a small series of experiments because of the high cost of human labor. Now, with the large-scale databases of social networks, how to use behavior modeling results to benefit the social and behavioral sciences becomes an important question.
- Beutel A, Xu W, Guruswami V, Palow C, Faloutsos C (2013) Copycatch: stopping group attacks by spotting lockstep behavior in social networks. In: Proceedings of the 22nd international conference on World Wide Web, Rio de Janeiro, pp 119–130, 13–17 May 2013
- Chirita PA, Diederich J, Nejdl W (2005) MailRank: using ranking for spam detection. In: Proceedings of the 14th ACM international conference on Information and knowledge management, Bremen, 31 Oct–5 Nov 2005, pp 373–380
- Cui P, Jin S, Yu L, Wang F, Zhu W, Yang S (2013) Cascading outbreak prediction in networks: a data-driven approach. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, Chicago, 11–14 Aug 2013, pp 901–909
- Jiang M, Cui P, Wang F, Yang Q, Zhu W, Yang S (2012a) Social recommendation across multiple relational domains. In: Proceedings of ACM international Conference on information and knowledge management (CIKM), Maui, 29 Oct–2 Nov 2012, pp 1422–1431
- Jiang M, Cui P, Liu R, Yang Q, Wang F, Zhu W, Yang S (2012b) Social contextual recommendation. In: Proceedings of ACM international conference on information and knowledge management (CIKM), Maui, 29 Oct–2 Nov 2012, pp 45–54
- Jiang M, Cui P, Wang F, Xu X, Zhu W, Yang S (2014a) FEMA: flexible evolutionary multi-faceted analysis for dynamic behavioral pattern discovery. In: Proceedings of ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD), New York, 24–27 Aug 2014, pp 1186–1195
- Jiang M, Cui P, Beutel A, Faloutsos C, Yang S (2014c) CatchSync: catching synchronized behavior in large directed graphs. In: The 20th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), New York, 24–27 Aug 2014, pp 941–950
- Jiang M, Cui P, Beutel A, Faloutsos C, Yang S (2014d) Inferring strange behavior from connectivity pattern in social networks. In: Proceedings of the 18th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Tainan, 13–16 May 2014, pp 126–138
- Jiang M, Cui P, Chen X, Wang F, Zhu W, Yang S (2015a) Social recommendation with cross-domain transferable knowledge. IEEE Trans Knowl Data Eng 27(11):3084–3097
- Jiang M, Beutel A, Cui P, Hooi B, Yang S, Faloutsos C (2015c) A general suspiciousness metric for dense blocks in multimodal data. In: The 15th IEEE International Conference on Data Mining (ICDM)
- Jiang M, Cui P, Yuan NJ, Xie X, Yang S (2016a) Little is much: bridging cross-platform behaviors through overlapped crowds. In: Proceedings of the thirtieth AAAI conference on artificial intelligence, Phoenix, 12–17 Feb 2016, pp 13–19
- Jiang M, Cui P, Beutel A, Faloutsos C, Yang S (2016c) Inferring lockstep behavior from connectivity pattern in large graphs. Knowl Inf Syst 48(2):399–428
- Koren Y (2008) Factorization meets the neighborhood: a multifaceted collaborative filtering model. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, Las Vegas, 24–27 Aug 2008. ACM, New York, pp 426–434. ISBN:978-1-60558-193-4
- Li B, Yang Q, Xue (2009) Can movies and books collaborate? Cross-domain collaborative filtering for sparsity reduction. Hong Kong, China, IJCAI 9:2052–2057. ISBN: 978-1-60558-512-3
- Liu X, Aberer K (2013) Soco: a social network aided context-aware recommender system. In: Proceedings of the 22nd international conference on World Wide Web, Rio de Janeiro, 13–17 May 2013, pp 781–802
- Liu NN, Zhao M, Yang Q (2009) Probabilistic latent preference analysis for collaborative filtering. Proceedings of the 18th ACM conference on Information and knowledge management, Hong Kong, 2–6 Nov 2009. ACM, New York, pp 759–766. ISBN:978-1-60558-512-3
- Narang K, Nagar S, Mehta S et al (2013) Discovery and analysis of evolving topical social discussions on unstructured microblogs. In: Proceedings of the 35th European conference on advances in information retrieval, Moscow, 24–27 Mar 2013. Springer, Berlin/Heidelberg, pp 545–556. ISBN:978-3-642-36972-8
- Sarwar B, Karypis G, Konstan J et al (2001) Item-based collaborative filtering recommendation algorithms. Proceedings of the 10th international conference on World Wide Web, Hong Kong, 1–5 May 2001. ACM, New York, pp 285–295. ISBN:1-58113-348-0
- Schilit B, Adams N, Want R (1994) Context-aware computing applications. Proceedings of the 1994 first workshop on mobile computing systems and applications, 8–9 Dec 1994, pp 85–90
- Shmueli G (2010) To explain or to predict? Stat Sci 25(3):289–310. doi:10.1214/10-STS330
- Tang J, Hu X, Gao H et al (2013) Exploiting local and global social context for recommendation. Proceedings of the twenty-third international joint conference on artificial Intelligence, Beijing, 3–9 Aug 2013. AAAI Press, pp 2712–2718. ISBN:978-1-57735-633-2
- Xu Q, Xiang EW, Yang Q, Du J, Zhong J (2012) Sms spam detection using noncontent features. IEEE Intell Syst 27(6):44–51
- Yuan Q, Cong G, Ma Z, Sun A, Thalmann NM (2013) Who, where, when and what: discover spatio-temporal topics for twitter users. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, Chicago, 11–14 Aug 2013. ACM, New York, pp 605–613. ISBN:978-1-4503-2174-7
- Zhong E, Fan W, Yang Q (2014) User behavior learning and transfer in composite social networks. ACM TKDD 8(1):6