Interactive Analytics in Social Media
Interactive analytics in social media is a multistep process through which an analyst refines their understanding of users and their actions in social media. It is helpful in data science settings where analysts do not necessarily know what to look for. It is a recent research field of large practical importance, with many open challenges. Interactive analytics in social media can be formulated in different ways, including as exploration under constraints such as minimizing the analyst’s time, maximizing the diversity of returned results, optimizing coverage of the input, or minimizing the number of exploration steps.
The main benefit of interactive analytics in social media is that it virtually sits on top of most social media analytics techniques as an exploratory layer that enables the gradual understanding of underlying datasets. It is thus essential that interactive analytics allows analysts to refine their requirements as they explore data.
Interactive analytics in social media is grounded in the very process used by analysts when confronted with a space of groups to explore. The analysis process is often iterative: the analyst focuses on a few groups in the result and repeatedly refines the parameters to obtain groups closer to her interest.
This process can be burdensome and can hinder the analyst’s reasoning by breaking her train of thought. Recent data mining tools like Knime or RapidMiner propose to formalize data processing in workflows that can be quickly adapted. However, they do not provide a logical link between the results of two different executions, which limits their degree of interactivity.
Dedicated approaches propose to navigate in the space of groups, which can be either pre-built or computed on the fly according to the navigation choices of the analyst. Such approaches originate from HCI and the field of visual analytics. This field develops increasingly sophisticated approaches to visualize various levels of aggregate statistics over data or graph structures. However, there is no good solution yet for the visualization of a large space of groups. One problem is that visual analytics research and data mining research are often conducted in isolation, preventing a much-needed cross-fertilization. Recently, workshops in major data mining conferences were proposed in an effort to close the gap between standard data analytics and visual analytics [8, 9].
Among existing work, Goethals et al. [7] proposed the MIME framework, which provides a visual interface over group patterns. MIME supports an iterative and interactive process to explore and refine the discovered user groups and provides numerous metrics over the presented results. In this framework, the navigation is completely directed by the analyst, which may limit the discovery of results in exploratory settings.
A growing number of approaches present only a limited number of groups to the analyst, who can provide feedback on them; this feedback is then used to iteratively propose other groups expected to better match the analyst’s interest. Among those, Bhuiyan et al. [1] proposed a sampling-based method inspired by the Query by Example principle in information retrieval. Dzyuba et al. [4] proposed a method based on beam search directed by like/dislike annotations from the analyst. Last, Boley et al. [2] proposed another method, which exploits user feedback to rank results and uses multi-armed bandit techniques to select the groups to present.
More details on these methods can be found in the survey by van Leeuwen [10].
The main problem of interactive analytics is to provide a way to navigate in the huge search space of groups defined by other social media analytics methods, such as structure analytics. Such navigation is necessary when measures of interestingness of groups are not selective enough, and the analyst does not have the elements to design a more selective measure. Interactive exploration of selected groups allows the analyst to exploit implicit criteria based on her intuition and experience.
The analyst must not be overwhelmed with exploration options. This suggests breaking the exploration process into successive steps, each offering only a few exploration options.
The groups offered to the analyst must be of high quality and at the same time cover as much of the space of groups of interest as possible.
The train of thought of the analyst must not be broken during the interaction. Each step of the interactive group discovery process must therefore execute quickly.
A method following these principles should thus provide, in a short amount of time, a set of k groups (with k of manageable size for the analyst, i.e., k ≤ 10), each of these groups either being of interest to the analyst or having the potential to lead to a group of interest through further interactions. The analyst can select one of these groups to indicate her interest in this direction of exploration; in this case, a new set of k groups based on the selected group is computed, and the process repeats until the analyst finds a satisfying group.
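The step-by-step process described above can be sketched as a simple loop. This is only an illustrative sketch under stated assumptions, not the procedure of any cited system: a group is modeled as a set of user identifiers, and the names `explore_interactively`, `propose`, `choose`, `drop_one_member`, and `scripted_analyst` are hypothetical. The `propose` callback stands in for whatever method computes the k groups, and `choose` stands in for the analyst.

```python
from typing import Callable, FrozenSet, List, Optional

# Assumption: a group is modeled as a set of user identifiers.
Group = FrozenSet[str]


def explore_interactively(
    start: Group,
    propose: Callable[[Group, int], List[Group]],
    choose: Callable[[List[Group]], Optional[int]],
    k: int = 5,
    max_steps: int = 20,
) -> Group:
    """Show at most k groups per step and follow the analyst's pick.

    propose(current, k) returns at most k groups derived from `current`.
    choose(options) returns the index of the selected option, or None
    when the analyst is satisfied with the current group.
    """
    current = start
    for _ in range(max_steps):
        options = propose(current, k)
        if not options:
            break  # nothing left to explore from here
        picked = choose(options)
        if picked is None:
            break  # the analyst found a satisfying group
        current = options[picked]
    return current


# Toy stand-ins (hypothetical): each option removes one member.
def drop_one_member(current: Group, k: int) -> List[Group]:
    members = sorted(current)
    if len(members) <= 1:
        return []
    return [current - {m} for m in members[:k]]


# Scripted "analyst": takes the first option until options shrink below 2 users.
def scripted_analyst(options: List[Group]) -> Optional[int]:
    return None if len(options[0]) < 2 else 0


result = explore_interactively(frozenset("abcd"), drop_one_member, scripted_analyst)
print(sorted(result))
```

Keeping the proposal step behind a callback makes the loop independent of how groups are computed, so the same skeleton can host a pre-built group space or one computed on the fly.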
Explore provides choices that are as diverse as possible, each leading to a different part of the search space. It is needed in situations where the groups found so far are of no interest to the analyst, so new exploration directions must be provided.
Exploit provides refinements of the last seen group. It is needed when the analyst expresses an interest in the group found, meaning that the exploration direction is promising and that the analyst wants to “drill down” in that direction.
The most common way to compute groups while taking this trade-off into account is to view the computation of the k groups as an optimization problem, where the function to optimize can be parameterized to lean more toward exploration or more toward exploitation.
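One possible shape for such a parameterized objective is sketched below. It is an assumption made for illustration, not the function used by any cited approach: groups are sets of user identifiers, closeness to the last seen group and novelty with respect to already selected groups are both measured with Jaccard similarity, and the k groups are picked greedily. The names `select_k_groups`, `focus`, and `exploit_weight` are hypothetical.

```python
from typing import FrozenSet, List, Sequence

# Assumption: a group is modeled as a set of user identifiers.
Group = FrozenSet[str]


def jaccard(a: Group, b: Group) -> float:
    """Jaccard similarity between two groups (1.0 for two empty groups)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def select_k_groups(
    candidates: Sequence[Group],
    focus: Group,
    k: int,
    exploit_weight: float,
) -> List[Group]:
    """Greedily pick k groups, trading off exploitation and exploration.

    exploit_weight = 1.0 favors groups close to `focus` (exploit);
    exploit_weight = 0.0 favors groups unlike those already picked (explore).
    """
    selected: List[Group] = []
    remaining = list(candidates)

    def score(g: Group) -> float:
        closeness = jaccard(g, focus)
        # Novelty: distance to the most similar group already selected.
        novelty = min((1.0 - jaccard(g, s) for s in selected), default=1.0)
        return exploit_weight * closeness + (1.0 - exploit_weight) * novelty

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


focus = frozenset({"a", "b", "c"})
g1 = frozenset({"a", "b", "c", "d"})
g2 = frozenset({"a", "b"})
g3 = frozenset({"x", "y"})
g4 = frozenset({"x", "z"})
candidates = [g1, g2, g3, g4]

exploit_picks = select_k_groups(candidates, focus, k=2, exploit_weight=1.0)
explore_picks = select_k_groups(candidates, focus, k=2, exploit_weight=0.0)
```

With the weight set to 1.0 the two picks are the groups most similar to the focus group; with the weight set to 0.0 the second pick is a group that shares no user with the first. Intermediate values blend the two behaviors.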
To improve this function, some approaches also learn the analyst’s bias during interactions, in order to help select the groups with the highest perceived interest for the analyst. Such approaches are more taxing for the analyst, as they need more feedback on the groups presented: ideally, the analyst identifies which groups are promising and which are not interesting. However, when a correct model of the analyst can be learned, the number of exploration steps is expected to be reduced.
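To make the idea of learning from feedback concrete, here is a deliberately simple sketch. It is an assumption for illustration and does not reproduce the learning method of any cited work: each group is described by a set of attribute labels, the analyst gives like/dislike feedback, and an additive per-attribute weight is adjusted accordingly. The class `AnalystModel` and the attribute labels are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List

# Assumption: a group is described by a set of attribute labels.
Description = FrozenSet[str]


class AnalystModel:
    """Additive per-attribute preference weights learned from like/dislike."""

    def __init__(self, learning_rate: float = 1.0) -> None:
        self.weights: Dict[str, float] = {}
        self.learning_rate = learning_rate

    def update(self, description: Description, liked: bool) -> None:
        """Raise the weights of liked attributes, lower those of disliked ones."""
        step = self.learning_rate if liked else -self.learning_rate
        for attribute in description:
            self.weights[attribute] = self.weights.get(attribute, 0.0) + step

    def score(self, description: Description) -> float:
        """Estimated interest: sum of the weights of the group's attributes."""
        return sum(self.weights.get(attribute, 0.0) for attribute in description)

    def rank(self, descriptions: Iterable[Description]) -> List[Description]:
        """Order groups from highest to lowest estimated interest."""
        return sorted(descriptions, key=self.score, reverse=True)


model = AnalystModel()
model.update(frozenset({"age:18-25", "topic:music"}), liked=True)
model.update(frozenset({"age:40-60", "topic:finance"}), liked=False)

finance = frozenset({"topic:finance"})
mixed = frozenset({"topic:music", "age:40-60"})
music = frozenset({"topic:music"})
ranking = model.rank([finance, mixed, music])
```

After one like and one dislike, the model ranks the music group first, the mixed group second, and the finance group last. A score of this kind could be plugged into the optimization function as the estimate of perceived interest, at the cost of asking the analyst for explicit feedback at each step.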
Interactive analytics in social media is a nascent field with many applications in data science. It could be used to understand a particular population online in order to serve better content. It could also be used to find surprising patterns (e.g., patterns hidden due to the long tail data distribution).
One of the most challenging aspects of interactive data exploration is its validation. There are many different algorithms for interactive exploration, but a principled qualitative validation strategy to compare them is still missing.
User studies run through Internet crowdsourcing marketplaces are the only solution proposed so far. These studies are usually hard to construct and interpret. One direction of improvement could be to use datasets whose users are known experts. In an academic context, a dataset describing the demographics and activities of domain researchers is an example. With such datasets, experts can reason better about their choices and decisions.
Beyond validation, there are other points of improvement for interactive analysis. First, the process of learning the analyst’s interest should be improved to adapt to specific tasks, which have their own specific interestingness measures. Second, data mining methods should output results in a few seconds in order to allow truly interactive exploration. Techniques such as sampling and approximation, as well as exploiting multicore processors, are promising directions toward that goal. A last direction is to further integrate research in data analytics and visualization.
- 1. Bhuiyan M, Mukhopadhyay S, Hasan MA. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management; 2012.
- 2. Boley M, Mampaey M, Kang B, Tokmakov P, Wrobel S. One click mining: interactive local pattern discovery through implicit preference and performance learning. In: Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics; 2013.
- 3. Dong X, Xuehua S, Qiaozhu M, Jiawei H. Discovering interesting patterns through user’s interactive feedback. Knowledge discovery and data mining. New York: ACM; 2006.
- 4. Dzyuba V, van Leeuwen M, Nijssen S, De Raedt L. Active preference learning for ranking patterns. In: Proceedings of the 25th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2013); 2013.
- 5. Fekete JD. Infrastructure. In: Keim D, Kohlhammer J, Ellis G, Mansmann F, editors. Solving problems with visual analytics; chapter 6. http://www.vismaster.eu/book/.
- 6. Geng L, Hamilton HJ. Interestingness measures for data mining: a survey. ACM Comput Surv (CSUR). 2006;38(3):1–32.
- 7. Goethals B, Moens S, Vreeken J. MIME: a framework for interactive visual pattern mining. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2011.
- 8. Instant Interactive Data Mining Workshop, ECML-PKDD workshops; 2012. http://adrem.ua.ac.be/iid2012/.
- 9. Interactive Data Exploration and Analytics Workshop (IDEA), KDD Workshops; 2013. http://poloclub.gatech.edu/idea2013/.
- 10. van Leeuwen M. Interactive data exploration using pattern mining. In: Interactive Knowledge Discovery and Data Mining in Biomedical Informatics; 2014.