Guest Editorial: Behavioral-Data Mining in Information Systems and the Big Data Era
An information system collects and processes data with the aim of extracting information and supporting decision-making tasks. Since the advent of the so-called Social Web, users have been encouraged to create content and upload it to the Web, so huge amounts of data are continuously generated. This data represents a great opportunity for researchers, companies, and decision makers to infer non-trivial patterns and generate new knowledge; on the other hand, such amounts of data raise many challenges. To handle these challenges and accomplish their objectives, information systems need efficient and effective ways to process this data. The algorithms that process these large amounts of data should have low computational costs, in order to keep up with the rapid evolution of the Web and guarantee efficiency; at the same time, they should be able to filter out the less useful chunks of data and process only those that lead to effective decision making.
Behavioral-data mining is the process of extracting information by analyzing the huge amounts of data that describe the behavior of the users in a system. This particular kind of mining has proven to be useful in various information systems areas (Beutel et al. 2015), such as the detection of tag clusters (Boratto et al. 2009), the creation of web personalization services (Mobasher et al. 2000), the improvement of web search ranking (Agichtein et al. 2006), and the generation of friend recommendations in social media systems (Manca et al. 2018). It is also the foundation of many computational social science studies (Lazer et al. 2009).
In this special issue, we explore a new frontier in Information Systems, which aims at producing behavioral-data mining approaches able to deal with the big data problem. The rest of this article is structured as follows: Section 2 focuses on the challenges that mining behavioral data in big data scenarios poses; in Section 3 we introduce some recent advances in this area; Section 4 contains concluding remarks.
2 Research Challenges
Mining behavioral data in information systems, when working in big data scenarios, poses several challenges. In this section, we will discuss some of the most important ones.
When monitoring users’ behavior, we mostly rely on implicit feedback provided by the users (e.g., the browsing history, or the items the users click on) (Oard and Kim 1998). While this form of feedback allows us to collect much more information about the users than explicit feedback provided through ratings or reviews, its drawback is the lack of information about what the users do not like. Indeed, missing feedback might mean either that the user does not like an item or that she has not encountered it. In domains such as large e-commerce websites or social media platforms, in which users interact only with a small subset of items, this leads to a lot of uncertainty about the preferences for the majority of the items. In the mining process, this might affect the extraction of actionable knowledge about the users.
With both implicit and explicit feedback, data is very sparse. Indeed, as previously mentioned, users implicitly interact with a small subset of items; moreover, they are usually reluctant to provide explicit ratings to evaluate the items, which is considered a tiring process (Oard and Kim 1998). As Fan et al. (2014) highlight, high dimensionality leads to noise accumulation, spurious correlations, and incidental endogeneity. Moreover, when high dimensionality is combined with high sparsity, issues such as heavy computational cost and algorithmic instability arise. Hence, processing behavioral data still represents a challenge.
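To make the notion of sparsity concrete, the following is a minimal sketch, assuming a hypothetical click log stored as a mapping from user identifiers to the set of items each user interacted with. The function name `interaction_sparsity`, the toy data, and the catalogue size are illustrative assumptions and do not come from any of the cited works.

```python
from typing import Dict, Set


def interaction_sparsity(interactions: Dict[str, Set[str]], n_items: int) -> float:
    """Return the fraction of user-item pairs with no observed interaction."""
    n_users = len(interactions)
    if n_users == 0 or n_items == 0:
        raise ValueError("need at least one user and one item")
    # Count the observed (user, item) pairs across all users.
    observed = sum(len(items) for items in interactions.values())
    return 1.0 - observed / (n_users * n_items)


# Hypothetical toy click log: user id -> set of clicked item ids.
clicks = {
    "u1": {"i1", "i2"},
    "u2": {"i2"},
    "u3": {"i5"},
    "u4": set(),  # a user with no recorded interaction
}

# Assumed catalogue of 5 items: 4 observed pairs out of 20 possible ones.
sparsity = interaction_sparsity(clicks, n_items=5)
print(f"sparsity = {sparsity:.2f}")
```

Note that an unobserved pair only records the absence of feedback; as discussed above, it does not tell us whether the user dislikes the item or simply never encountered it.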
In the growing field of Computational Social Science, solving these problems has a direct impact on how science is done, how large-scale sociological experiments have to be designed, and how given data sets of “natural experiments” (Dunning 2012) can be exploited scientifically to verify or refute hypotheses. Challenges in this context include controlling for potential confounders in observational data, for example through randomization of the collected data. This makes it possible to compare the observed outcome with the expected one and to assess the degree to which the observed outcome is indeed relevant. An example of such a study is given in Laniado et al. (2018). When one is in control of the data collection process prior to the observed events, the additional challenge of how to correctly design the data collection process has to be solved, as shown for urban mobility studies in Manca et al. (2017).
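One common way to compare an observed outcome with the outcome expected under randomization is a permutation test. The following is a minimal, generic sketch under stated assumptions: the function name `permutation_p_value`, the toy measurements, and the choice of the difference in means as the test statistic are all illustrative, and this is not the procedure used in the studies cited above.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(
    group_a: Sequence[float],
    group_b: Sequence[float],
    n_permutations: int = 2000,
    seed: int = 0,
) -> float:
    """Estimate how often a random relabeling of the pooled data yields a
    difference in means at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling breaks any link between group label and value.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy measurements for two groups of users.
within_group = [1, 2, 3, 4, 5, 6]
across_groups = [10, 11, 12, 13, 14, 15]
p_value = permutation_p_value(within_group, across_groups)
print(f"estimated p-value: {p_value:.4f}")
```

A small estimated p-value indicates that the observed difference is unlikely to arise from random relabeling alone; it does not, by itself, rule out confounders that the randomization scheme fails to capture.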
Last, but not least, users’ privacy is a very timely issue, considering the new General Data Protection Regulation (GDPR), which has been enforceable throughout Europe since May 25, 2018. Note that Facebook decided to extend the regulation worldwide, so the services that are built on top of it have to be compliant with the regulation, regardless of the country. In addition to rules on how data is collected and stored, according to Article 13, Paragraph 2 (f) of the GDPR, users are entitled to an explanation of how decisions are taken by an algorithm. Hence, collecting and storing behavioral data under the regulation might be challenging, and the mining algorithms should be able to provide explanations to users, which might not always be possible (e.g., when employing deep learning).
3 In this Special Issue
The six accepted articles in this special issue cover many of the aforementioned themes, with innovative techniques for mining user behavior in information systems in big data environments. The research contributions advance the state of the art across different scenarios, ranging from content retrieval and classification to the characterization of user satisfaction in social networks and recommender systems, and take into account aspects such as user personality, the geographic distance between users, and the content of the items.
In their article “TV-Program Retrieval and Classification: A Comparison of Approaches based on Machine Learning”, Narducci et al. (2018) analyze user behavior in order to generate personalized Electronic Program Guides (EPGs). More specifically, they focus on the retrieval of possibly interesting programs for the users, by first classifying them according to their textual description, and then retrieving those that best match a specific program type. Experiments performed on a dataset provided by Philips Research, covering 133,579 TV shows broadcast by 47 German-language channels, show that Logistic Regression is the best algorithm in both the classification and retrieval tasks.
Nguyen et al. consider the role of personality in recommender systems in the paper “User Personality and User Satisfaction with Recommender Systems” (Nguyen et al. 2018). The study involves 1800 users and analyzes whether rating-based recommender systems are able to deliver the levels of diversity, popularity, and serendipity that users prefer. Results show that these systems fail to do so. The authors also assess users’ personality traits using the Ten-Item Personality Inventory (TIPI); the results suggest that users with different personalities have different preferences for these three recommendation properties. Given these results, the authors suggest that, in the future, recommender systems should consider users’ personality traits.
Golbeck et al., in their article “Scaling Up Integrated Structural and Content-Based Network Analysis” (Golbeck et al. 2018), face the issue of identifying clusters and classifying network nodes, as the network grows bigger and manual classification is no longer possible. The authors show how topic modeling can be employed to produce easy-to-understand keywords that represent important clusters in a network. Those keywords reflect the insights achieved by human analysts doing a manual content-based analysis of the network features.
The paper “The Impact of Geographic Distance on Online Social Interactions”, by Laniado et al. (2018), aims to explain the effect that geographic distance has on online social interactions and, simultaneously, tries to understand the interplay between the social characteristics of friendship ties and their spatial properties. The findings support the idea that spatial distance constrains whom users interact with, but not the intensity of their social interactions. Furthermore, friendship ties belonging to more densely connected groups tend to arise at shorter spatial distances than social ties established between members belonging to different groups. Finally, the authors show that these findings mostly do not depend on the age of the users, although younger users seem to be slightly more constrained to shorter geographic distances.
In their article “Inducing Personalities and Values from Language Use in Social Network Communities”, Kumar et al. (2018) analyze the communities in social media networks as a composition of induced psycholinguistic and sociolinguistic variables (Personalities, Values, and Ethics) across individuals. The study was performed on six datasets annotated with the Values and Ethics of the users. The authors created models to determine the Personality and Values of individuals by analyzing their language usage and social media behavior. They then connect the characteristics of individuals within an online community and create a map of values and ethics for India.
The final paper of this special issue, titled “Personality, User Preferences and Behavior in Recommender Systems”, by Karumur et al. (2018), identified Big-5 personality types of 1840 users of the MovieLens recommender system. The aim was to examine factors of user retention and engagement, content preferences, and rating patterns, to identify recommender-system related behaviors and preferences that correlate with user personality. Results show that personality traits correlate significantly with behaviors and preferences such as newcomer retention, intensity of engagement, activity types, item categories, consumption versus contribution, and rating patterns.
Mining user behavior in information systems is a topic of central interest for gathering actionable knowledge about the users and providing services to them. Being able to do so in scenarios characterized by big data represents a new frontier in this area. The papers included in this special issue cover several topics and present some of the key directions in this vibrant and rapidly expanding area of research and development. We hope the set of selected papers provides the community with a better understanding of the current directions, and that it inspires readers with possible areas to focus on in their future research.
We thank all the authors for considering this special issue as an outlet to publish their research results in the area of behavioral-data mining. We would also like to thank the referees, who provided very useful and thoughtful feedback to the authors. Finally, we express our gratitude to the Editors-in-Chief, Professor Rao and Professor Ramesh, for their kind support, advice, and encouragement throughout the preparation of this special issue.
- Agichtein, E., Brill, E., Dumais, S.T. (2006). Improving web search ranking by incorporating user behavior information. In E.N. Efthimiadis, S.T. Dumais, D. Hawking, K. Jarvelin (Eds.), SIGIR 2006: proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval, Seattle, Washington, August 6–11, 2006 (pp. 19–26). ACM. https://doi.org/10.1145/1148170.1148177.
- Beutel, A., Akoglu, L., Faloutsos, C. (2015). Graph-based user behavior modeling: from prediction to fraud detection. In L. Cao, C. Zhang, T. Joachims, G.I. Webb, D.D. Margineantu, G. Williams (Eds.), Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, Sydney, NSW, Australia, August 10–13, 2015 (pp. 2309–2310). ACM. https://doi.org/10.1145/2783258.2789985.
- Boratto, L., Carta, S., Vargiu, E. (2009). RATC: a robust automated tag clustering technique. In T.D. Noia, & F. Buccafurri (Eds.), E-Commerce and web technologies, 10th international conference, EC-Web 2009, Linz, Austria, September 1–4, 2009. Proceedings, Lecture Notes in Computer Science (Vol. 5692, pp. 324–335). Springer. https://doi.org/10.1007/978-3-642-03964-5_30.
- Golbeck, J., Gerhard, J., O’Colman, F., O’Colman, R. (2018). Scaling up integrated structural and content-based network analysis. Information Systems Frontiers, 20(6). https://doi.org/10.1007/s10796-017-9783-x.
- Karumur, R.P., Nguyen, T.T., Konstan, J.A. (2018). Personality, user preferences and behavior in recommender systems. Information Systems Frontiers, 20(6). https://doi.org/10.1007/s10796-017-9800-0.
- Kumar, U., Reganti, A.N., Maheshwari, T., Chakroborty, T., Gambäck, B., Das, A. (2018). Inducing personalities and values from language use in social network communities. Information Systems Frontiers, 20(6). https://doi.org/10.1007/s10796-017-9793-8.
- Laniado, D., Volkovich, Y., Scellato, S., Mascolo, C., Kaltenbrunner, A. (2018). The impact of geographic distance on online social interactions. Information Systems Frontiers, 20(6). https://doi.org/10.1007/s10796-017-9784-9.
- Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A.L., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., Van Alstyne, M. (2009). Computational social science. Science, 323(5915), 721–723. https://doi.org/10.1126/science.1167742.
- Narducci, F., Musto, C., de Gemmis, M., Lops, P., Semeraro, G. (2018). TV-program retrieval and classification: a comparison of approaches based on machine learning. Information Systems Frontiers, 20(6). https://doi.org/10.1007/s10796-017-9780-0.
- Nguyen, T.T., Maxwell Harper, F., Terveen, L., Konstan, J.A. (2018). User personality and user satisfaction with recommender systems. Information Systems Frontiers, 20(6). https://doi.org/10.1007/s10796-017-9782-y.
- Oard, D., & Kim, J. (1998). Implicit feedback for recommender systems. In Proceedings of the AAAI workshop on recommender systems (pp. 81–83).