Abstract
While rule mining is critical for decision-making applications, rule mining systems still lack support for interactively exploring the multitude of generated rules and for understanding the relationships among rule results produced with various parameter settings. Based on a novel parameter space-driven approach, our proposed Framework for Interactive Rule Exploration [FIRE (PARAS/FIRE homepage: http://paras.cs.wpi.edu/)] addresses this usability shortcoming. FIRE features innovative visual displays and interactions to enable interactive rule exploration. We propose two linked interactive displays, namely the parameter space view (PSpace) and the rule space view (RSpace), that together enable enhanced sense-making of rule relationships. The PSpace view visualizes the distribution of rules produced for diverse parameter settings. This not only facilitates user parameter selection for rule mining but also enhances an analyst’s understanding of rule relationships in the parameter space context. The RSpace view provides a detailed display of the rules using a novel rule glyph visualization to facilitate interactive visual rule comparisons. We evaluate the usability and effectiveness of our FIRE framework with two studies. First, in a case study, a researcher explored a dataset of interest using the FIRE paradigm as well as the state-of-the-art rule visualization techniques from the ARulesViz R package. Second, our user study with 22 subjects establishes the usability and effectiveness of the proposed visual displays and interactions of FIRE using several benchmark datasets. Overall, this research encompasses significant contributions at the intersection of data mining and visual analytics.
Notes
The FIRE tool is available at [11] as a web interface for researchers to upload their own datasets, generate association rules on the datasets and visualize the rules.
This case study was performed by an avid bike user with an interest in data mining.
References
Aggarwal, C.C., Yu, P.S.: A new approach to online generation of association rules. IEEE Trans. Knowl. Data Eng. 13(4), 527–540 (2001)
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: VLDB ’94 Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487–499. Morgan Kaufmann Publishers Inc. San Francisco, CA (1994)
Borgelt, C.: Efficient implementations of Apriori, Eclat and FP-growth. http://www.borgelt.net (2017). Accessed Dec 2017
Boulicaut, J.F., Jeudy, B.: Constraint-based data mining. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, pp. 339–354. Springer, Berlin (2010)
Cao, L., Li, J., Wang, C., Yu, P.S.: Efficient selection of globally optimal rules on large imbalanced data based on rule coverage relationship analysis. In: SIAM International Conference on Data Mining, pp. 216–224 (2013)
Cao, L.: Combined mining: analyzing object and pattern relations for discovering and constructing complex yet actionable patterns. WIREs Data Min. Knowl. Discov. 3(2), 140–155 (2013)
Chaudhuri, S., Lee, H., Narasayya, V.R.: Variance aware optimization of parameterized queries. In: SIGMOD Conference, pp. 531–542 (2010)
Cleveland, R.B., Cleveland, W.S., McRae, J.E., Terpenning, I.: STL: a seasonal-trend decomposition procedure based on loess. J. Off. Stat. 6(1), 3–73 (1990)
Couturier, O., Hamrouni, T., Yahia, S.B., Nguifo, E.M.: A scalable association rule visualization towards displaying large amounts of knowledge. In: International Conference Information Visualisation, pp. 657–663 (2007)
Duan, S., Thummala, V., Babu, S.: Tuning database configuration parameters with iTuned. PVLDB 2(1), 1246–1257 (2009)
PARAS/FIRE Home Page. http://paras.cs.wpi.edu/ (2018). Accessed March 2018
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11(1), 10–18 (2009)
Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: SIGMOD ’00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data, vol. 29, pp. 1–12. ACM, New York, NY (2000)
Hahsler, M., Chelluboina, S.: ARulesViz R package. http://cran.r-project.org/web/packages/arulesViz/vignettes/arulesViz-1.jpg (2017). Accessed Dec 2017
Jeudy, B., Boulicaut, J.-F.: Using condensed representations for interactive association rule mining. In: PKDD, pp. 225–236 (2002)
Kaya, M., Reda, A.: Online mining of fuzzy multidimensional weighted association rules. Appl. Intell. 29(1), 13–34 (2008)
Kubat, M., Hafez, A., Raghavan, V.V., Lekkala, J.R., Chen, W.K.: Itemset trees for targeted association querying. IEEE Trans. Knowl. Data Eng. 15(6), 1522–1534 (2003)
Leung, C.K.-S.: Constraint-based association rule mining. In: Wang, J. (ed.) Encyclopedia of Data Warehousing and Mining, pp. 307–312. Hershey, Information Science Reference (2009)
Lin, X., Mukherji, A., Rundensteiner, E.A., Ruiz, C., Ward, M.O.: PARAS: a parameter space framework for online association mining. PVLDB 6(3), 193–204 (2013)
Lin, X., Mukherji, A., Rundensteiner, E.A., Ward, M.O.: SPIRE: supporting parameter-driven interactive rule mining and exploration. PVLDB 7(13), 1653–1656 (2014)
Liu, G., Suchitra, A., Zhang, H., Feng, M., Ng, S.-K., Wong, L.: AssocExplorer: an association rule visualization system for exploratory data analysis. In: ACM SIGKDD Demo, pp. 1536–1539 (2012)
Lucchese, C., Orlando, S., Perego, R., Silvestri, F.: WebDocs: a real-life huge transactional dataset. In: FIMI (2004)
Mukherji, A., Lin, X., Botaish, C.R., Whitehouse, J., Rundensteiner, E.A., Ward, M.O., Ruiz, C.: PARAS: interactive parameter space exploration for association rule mining. In: ACM SIGMOD, pp. 1017–1020 (2013)
Mukherji, A., Lin, X., Whitehouse, J., Botaish, C.R., Rundensteiner, E.A., Ward, M.O.: FIRE: interactive visual support for parameter space-driven rule mining. In: CIKM, pp. 2447–2452 (2013)
Qin, X., Ahsan, R., Lin, X., Rundensteiner, E.A., Ward, M.O.: iPARAS: incremental construction of parameter space for online association mining. In: BigMine, pp. 149–165 (2014)
Qin, X., Ahsan, R., Lin, X., Rundensteiner, E.A., Ward, M.O.: Interactive temporal association analytics. In: EDBT, pp. 197–208 (2016)
Qin, X., Kakar, T., Wunnava, S., Rundensteiner, E.A., Cao, L.: MARAS: signaling multi-drug adverse reactions. In: ACM SIGKDD, pp. 1615–1623 (2017)
Shao, J., Yin, J., Liu, W., Cao, L.: Actionable combined high utility itemset mining. In: AAAI, pp. 4206–4207 (2015)
Tork, H.F.: Bike sharing dataset. https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset (2017). Accessed Dec 2017
UCI Machine Learning Repository. http://www.ics.uci.edu/~mlearn/MLRepository.html (2017). Accessed 17 March 2017
Wang, S., Cao, L.: Inferring implicit rules by learning explicit and hidden item dependency. In: IEEE TSMC, PP(99), pp. 1–12 (2017)
Ward, M.O.: A taxonomy of glyph placement strategies for multidimensional data visualization. Inf. Vis. 1(3–4), 194–210 (2002)
Wong, P.-Y., Chan, T.-M., Wong, M.-H., Leung, K.-S.: Predicting approximate protein-DNA binding cores using association rule mining. In: IEEE ICDE, pp. 965–976 (2012)
Wu, T., Chen, Y., Han, J.: Association mining in large databases: a re-examination of its measures. In: PKDD, pp. 621–628 (2007)
XmdvTool Home Page. http://davis.wpi.edu/~xmdv/ (2018). Accessed March 2018
Yang, D., Rundensteiner, E.A., Ward, M.O.: A shared execution strategy for multiple pattern mining requests over streaming data. Proc. VLDB Endow. 2(1), 874–885 (2009)
Zaki, M.J., Hsiao, C.-J.: CHARM: an efficient algorithm for closed itemset mining. In: SIAM SDM (2002)
Zaki, M.J., Parthasarathy, S., Ogihara, M., Li, W.: New algorithms for fast discovery of association rules. In: KDD, pp. 283–286 (1997)
Acknowledgements
This work was supported by NSF under Grants IIS-0812027, CCF-0811510 and IIS-1117139.
Appendix
1.1 Preprocessing of the bike sharing dataset
The Bike Sharing dataset [29] was preprocessed before loading into FIRE [11] and R ARulesViz [14], as described below.
1. Data corresponding to three of the attributes was eliminated: instant (a unique identifier), dteday (the date) and yr (which contains only two values: year 1 and year 2).
2. The numbers of casual and registered users increased over time. In particular, casual users increased at a rate of 0.895 users per day, whereas registered users increased at a rate of 4.874 users per day. To cancel the effect of this overall growth, the data was rotated to negate the slope of the trend lines. Figs. 45 and 46 show the original user counts, and Figs. 47 and 48 show the adjusted user counts for the casual and registered user categories. This processing is similar in flavor to the seasonal-trend decomposition in [8].
3. Finally, the attributes were discretized as shown in Table 3.
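The trend adjustment in step 2 can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify how the trend line was fitted or about which point the data was rotated, so the sketch assumes an ordinary least-squares fit against the day index and a pivot at the middle of the series. The function names fit_slope and remove_linear_trend are hypothetical, and the toy series is synthetic (only the 0.895 slope is taken from the text).

```python
from statistics import mean


def fit_slope(counts):
    """Least-squares slope of counts against the day index 0..n-1.

    Assumes at least two observations.
    """
    days = range(len(counts))
    day_mean = mean(days)
    count_mean = mean(counts)
    numerator = sum((d - day_mean) * (c - count_mean)
                    for d, c in zip(days, counts))
    denominator = sum((d - day_mean) ** 2 for d in days)
    return numerator / denominator


def remove_linear_trend(counts):
    """Subtract the fitted linear growth so the adjusted series has zero slope.

    Pivoting around the middle day (an assumption) keeps the overall
    mean of the series unchanged.
    """
    slope = fit_slope(counts)
    day_mean = (len(counts) - 1) / 2
    return [c - slope * (d - day_mean) for d, c in enumerate(counts)]


# Synthetic daily counts: a baseline, growth of 0.895 users per day,
# and a small alternating fluctuation standing in for real variation.
toy_counts = [100 + 0.895 * d + (5 if d % 2 else -5) for d in range(200)]
adjusted_counts = remove_linear_trend(toy_counts)
```

After the adjustment, refitting a line to adjusted_counts gives a slope of approximately zero, while the day-to-day fluctuations are preserved.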
1.2 Association rules and redundancy
An adjacency lattice (Fig. 49) denotes items such as X, Y and Z. The support value of each item (say, X) or itemset (say, XY) indicates the number of records in the dataset that contain the item or itemset. For example, in a set of 100 records, X occurs in 80 records and Y in 60 records, while the itemset XY has a support of 40 records. For a rule \(R = (X \Rightarrow Y)\), the confidence is defined as \(\hbox {confidence}(R) = \frac{\hbox {support}(X \cup Y)}{\hbox {support}(X)}\).
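The confidence formula can be checked against the numbers in the example above. The following is a minimal sketch; the dictionary layout and the function name confidence are illustrative assumptions, with each item represented by a single character.

```python
def confidence(support_counts, antecedent, consequent):
    """Confidence of antecedent => consequent: support(A u B) / support(A)."""
    a = frozenset(antecedent)
    union = a | frozenset(consequent)
    return support_counts[union] / support_counts[a]


# Support counts from the example: 100 records, X in 80, Y in 60, XY in 40.
support_counts = {
    frozenset("X"): 80,
    frozenset("Y"): 60,
    frozenset("XY"): 40,
}

conf_x_to_y = confidence(support_counts, "X", "Y")  # 40 / 80
conf_y_to_x = confidence(support_counts, "Y", "X")  # 40 / 60
```

Here the rule X ⇒ Y has confidence 40/80 = 0.5, whereas the reverse rule Y ⇒ X has confidence 40/60, which shows that confidence is not symmetric.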
Aggarwal et al. [1] define rule redundancy relationships, such that redundant rules may be filtered out to present succinct results to the user. The redundant rules could always be derived on demand, if so desired. We examine how these redundancy relationships can be identified in the parameter space model. In particular, redundancy can be of two types [1], as defined below.
Definition 1
(Simple redundancy) Let \(A \Rightarrow B\) and \(C \Rightarrow D\) be two rules such that the itemsets A, B, C and D satisfy the condition \(A \cup B = C \cup D\). The rule \(C \Rightarrow D\) is simply redundant with respect to the rule \(A \Rightarrow B\), if \(C \supset A\).
Definition 2
(Strict redundancy) We consider two rules generated from itemsets \(X_{i}\) and \(X_{j}\), respectively, such that \(X_{i} \supset X_{j}\). Let \(A \Rightarrow B\) and \(C \Rightarrow D\) be rules satisfying \(A \cup B = X_{i}\), \(C \cup D = X_{j}\), and \(C \supseteq A\). Then the rule \(C \Rightarrow D\) is strictly redundant with respect to the rule \(A \Rightarrow B\).
The concept of redundancy can be illustrated using the rules generated from the lattice (Fig. 49), as listed in Table 4. Based on Definitions 1 and 2, if a rule \(\mathcal {R}_{1}\) is simply or strictly redundant with respect to another rule \(\mathcal {R}_{2}\), then \(\mathcal {R}_{2}\) is said to simple dominate or strict dominate \(\mathcal {R}_{1}\), respectively. In Table 4, the rule (\(X \Rightarrow YZ\)) simple dominates the rules (\(XY \Rightarrow Z\)) and (\(XZ \Rightarrow Y\)) (Def. 1), and it strict dominates the rules (\(X \Rightarrow Y\)) and (\(X \Rightarrow Z\)) (Def. 2). In general, a rule may be dominated by several rules and may in turn dominate several other rules.
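Definitions 1 and 2 translate directly into set comparisons. The sketch below is an illustrative check, not the implementation used in FIRE: a rule is represented as a hypothetical (antecedent, consequent) pair of strings, with one character per item, and the function names are assumptions.

```python
def _as_sets(rule):
    """Convert an (antecedent, consequent) pair of strings into frozensets."""
    antecedent, consequent = rule
    return frozenset(antecedent), frozenset(consequent)


def is_simply_redundant(rule, dominating):
    """Def. 1: same overall itemset, and a strictly larger antecedent."""
    c, d = _as_sets(rule)
    a, b = _as_sets(dominating)
    return (a | b) == (c | d) and c > a


def is_strictly_redundant(rule, dominating):
    """Def. 2: strictly smaller overall itemset, and antecedent C containing A."""
    c, d = _as_sets(rule)
    a, b = _as_sets(dominating)
    return (a | b) > (c | d) and c >= a


x_to_yz = ("X", "YZ")

# Rules that the text states are simple dominated by X => YZ.
simple_checks = [is_simply_redundant(r, x_to_yz)
                 for r in [("XY", "Z"), ("XZ", "Y")]]

# Rules that the text states are strict dominated by X => YZ.
strict_checks = [is_strictly_redundant(r, x_to_yz)
                 for r in [("X", "Y"), ("X", "Z")]]
```

Both lists evaluate to all True, matching the dominance relationships stated for Table 4. Python's `>` and `>=` on frozensets test for proper superset and superset, which correspond to \(\supset\) and \(\supseteq\) in the definitions.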
Cite this article
Mukherji, A., Lin, X., Toto, E. et al. FIRE: a two-level interactive visualization for deep exploration of association rules. Int J Data Sci Anal 7, 201–226 (2019). https://doi.org/10.1007/s41060-018-0133-y
DOI: https://doi.org/10.1007/s41060-018-0133-y