FIRE: a two-level interactive visualization for deep exploration of association rules

  • Regular Paper
  • International Journal of Data Science and Analytics

Abstract

While rule mining is critical for decision-making applications, rule mining systems still lack support for interactive exploration of the multitude of generated rules and for understanding the relationships among rule results produced with various parameter settings. Based on a novel parameter space-driven approach, our proposed Framework for Interactive Rule Exploration [FIRE (PARAS/FIRE homepage: http://paras.cs.wpi.edu/)] addresses this usability shortcoming. FIRE features innovative visual displays and interactions to enable interactive rule exploration. We propose two linked interactive displays, namely the parameter space view (PSpace) and the rule space view (RSpace), that together enable enhanced sense-making of rule relationships. The PSpace view visualizes the distribution of rules produced for diverse parameter settings. This not only facilitates user parameter selection for rule mining but also enhances an analyst’s understanding of rule relationships in the parameter space context. The RSpace view provides a detailed display of the rules using a novel rule glyph visualization to facilitate interactive visual rule comparisons. We evaluate the usability and effectiveness of our FIRE framework with two studies. First, in a case study, a researcher explored a dataset of interest using the FIRE paradigm as well as the state-of-the-art rule visualization techniques from the ARulesViz R package. Second, our user study with 22 subjects establishes the usability and effectiveness of the proposed visual displays and interactions of FIRE using several benchmark datasets. Overall, this research encompasses significant contributions at the intersection of data mining and visual analytics.

Notes

  1. The FIRE tool is available at [11] as a web interface for researchers to upload their own datasets, generate association rules on the datasets and visualize the rules.

  2. This case study was performed by an avid bike user with an interest in data mining.

References

  1. Aggarwal, C.C., Yu, P.S.: A new approach to online generation of association rules. IEEE Trans. Knowl. Data Eng. 13(4), 527–540 (2001)

  2. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: VLDB ’94 Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487–499. Morgan Kaufmann Publishers Inc. San Francisco, CA (1994)

  3. Borgelt, C.: Efficient implementations of Apriori, Eclat and FP-growth. http://www.borgelt.net (2017). Accessed Dec 2017

  4. Boulicaut, J.F., Jeudy, B.: Constraint-based data mining. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, pp. 339–354. Springer, Berlin (2010)

  5. Cao, L., Li, J., Wang, C., Yu, P.S.: Efficient selection of globally optimal rules on large imbalanced data based on rule coverage relationship analysis. In: SIAM International Conference on Data Mining, pp. 216–224 (2013)

  6. Cao, L.: Combined mining: analyzing object and pattern relations for discovering and constructing complex yet actionable patterns. WIREs Data Min. Knowl. Discov. 3(2), 140–155 (2013)

  7. Chaudhuri, S., Lee, H., Narasayya, V.R.: Variance aware optimization of parameterized queries. In: SIGMOD Conference, pp. 531–542 (2010)

  8. Cleveland, R.B., Cleveland, W.S., Mcrae, J.E., Terpenning, I.: STL: a seasonal-trend decomposition procedure based on loess. J. Off. Stat. 6(1), 3–73 (1990)

  9. Couturier, O., Hamrouni, T., Yahia, S.B., Nguifo, E.M.: A scalable association rule visualization towards displaying large amounts of knowledge. In: International Conference Information Visualisation, pp. 657–663 (2007)

  10. Duan, S., Thummala, V., Babu, S.: Tuning database configuration parameters with iTuned. PVLDB 2(1), 1246–1257 (2009)

  11. PARAS/FIRE Home Page. http://paras.cs.wpi.edu/ (2018). Accessed March 2018

  12. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11(1), 10–18 (2009)

  13. Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: SIGMOD ’00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data, vol. 29, pp. 1–12. ACM, New York, NY (2000)

  14. Hahsler, M., Chelluboina, S.: ARulesViz R package. http://cran.r-project.org/web/packages/arulesViz/vignettes/arulesViz-1.jpg (2017). Accessed Dec 2017

  15. Jeudy, B., Boulicaut, J.-F.: Using condensed representations for interactive association rule mining. In: PKDD, pp. 225–236 (2002)

  16. Kaya, M., Reda, A.: Online mining of fuzzy multidimensional weighted association rules. Appl. Intell. 29(1), 13–34 (2008)

  17. Kubat, M., Hafez, A., Raghavan, V.V., Lekkala, J.R., Chen, W.K.: Itemset trees for targeted association querying. IEEE Trans. Knowl. Data Eng. 15(6), 1522–1534 (2003)

  18. Leung, C.K.-S.: Constraint-based association rule mining. In: Wang, J. (ed.) Encyclopedia of Data Warehousing and Mining, pp. 307–312. Hershey, Information Science Reference (2009)

  19. Lin, X., Mukherji, A., Rundensteiner, E.A., Ruiz, C., Ward, M.O.: PARAS: a parameter space framework for online association mining. PVLDB 6(3), 193–204 (2013)

  20. Lin, X., Mukherji, A., Rundensteiner, E.A., Ward, M.O.: SPIRE: supporting parameter-driven interactive rule mining and exploration. PVLDB 7(13), 1653–1656 (2014)

  21. Liu, G., Suchitra, A., Zhang, H., Feng, M., Ng, S.-K., Wong, L.: AssocExplorer: an association rule visualization system for exploratory data analysis. In: ACM SIGKDD Demo, pp. 1536–1539 (2012)

  22. Lucchese, C., Orlando, S., Perego, R., Silvestri, F.: WebDocs: a real-life huge transac. dataset, FIMI (2004)

  23. Mukherji, A., Lin, X., Botaish, C.R., Whitehouse, J., Rundensteiner, E.A., Ward, M.O., Ruiz, C.: PARAS: interactive parameter space exploration for association rule mining. In: ACM SIGMOD, pp. 1017–1020 (2013)

  24. Mukherji, A., Lin, X., Whitehouse, J., Botaish, C.R., Rundensteiner, E.A., Ward, M.O.: FIRE: interactive visual support for parameter space-driven rule mining. In: CIKM, pp. 2447–2452 (2013)

  25. Qin, X., Ahsan, R., Lin, X., Rundensteiner, E.A., Ward, M.O.: iPARAS: incremental construction of parameter space for online association mining. In: BigMine, pp. 149–165 (2014)

  26. Qin, X., Ahsan, R., Lin, X., Rundensteiner, E.A., Ward, M.O.: Interactive temporal association analytics. In: EDBT, pp. 197–208 (2016)

  27. Qin, X., Kakar, T., Wunnava, S., Rundensteiner, E.A., Cao, L.: MARAS: signaling multi-drug adverse reactions. In: ACM SIGKDD, pp. 1615–1623 (2017)

  28. Shao, J., Yin, J., Liu, W., Cao, L.: Actionable combined high utility itemset mining. In: AAAI, pp. 4206–4207 (2015)

  29. Tork, H.F.: Bike sharing dataset. https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset (2017). Accessed Dec 2017

  30. UCI Machine Learning Repository. http://www.ics.uci.edu/~mlearn/MLRepository.html (2017). Accessed 17 March 2017

  31. Wang, S., Cao, L.: Inferring implicit rules by learning explicit and hidden item dependency. In: IEEE TSMC, PP(99), pp. 1–12 (2017)

  32. Ward, M.O.: A taxonomy of glyph placement strategies for multidimensional data visualization. Inf. Vis. 1(3–4), 194–210 (2002)

  33. Wong, P.-Y., Chan, T.-M., Wong, M.-H., Leung, K.-S.: Predicting approximate protein-DNA binding cores using association rule mining. In: IEEE ICDE, pp. 965–976 (2012)

  34. Wu, T., Chen, Y., Han, J.: Association mining in large databases: a re-examination of its measures. In: PKDD, pp. 621–628 (2007)

  35. XmdvTool Home Page. http://davis.wpi.edu/~xmdv/ (2018). Accessed March 2018

  36. Yang, D., Rundensteiner, E.A., Ward, M.O.: A shared execution strategy for multiple pattern mining requests over streaming data. Proc. VLDB Endow. 2(1), 874–885 (2009)

  37. Zaki, M.J., Hsiao, C.-J.: CHARM: an efficient algorithm for closed itemset mining. In: SIAM SDM (2002)

  38. Zaki, M.J., Parthasarathy, S., Ogihara, M., Li, W.: New algorithms for fast discovery of association rules. In: SIG KDD, pp. 283–286 (1997)

Acknowledgements

This work was supported by NSF under Grants IIS-0812027, CCF-0811510 and IIS-1117139.

Author information

Corresponding author

Correspondence to Abhishek Mukherji.

Additional information

Supported by National Science Foundation under Grants IIS-0812027, CCF-0811510 and IIS-1117139.

Appendix

1.1 Preprocessing of the bike sharing dataset

The Bike Sharing dataset [29] was preprocessed before loading into FIRE [11] and R ARulesViz [14], as described below.

Fig. 45 Original casual user count

Fig. 46 Original registered user count

Fig. 47 Adjusted casual user count

Fig. 48 Adjusted registered user count

Table 3 Discretized attributes: bike sharing dataset

Fig. 49 Adjacency lattice example

  1. Three attributes were eliminated: instant (unique identifier), dteday (date) and yr (contains 2 values: year 1 and year 2).

  2. The casual and registered user counts increased over time. In particular, casual users increased at a rate of 0.895 users per day, whereas registered users increased at a rate of 4.874 users per day. To cancel the effect of this overall growth, the data was rotated to negate the slope of the trend lines. Figures 45 and 46 show the original user counts, and Figs. 47 and 48 show the adjusted user counts for the casual and registered user categories. This processing is similar in flavor to the seasonal-trend decomposition in [8].

  3. Further, the attributes were discretized as shown in Table 3.
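
The growth adjustment in step 2 can be illustrated with a minimal sketch. It assumes that "rotating the data to negate the slope" can be approximated by fitting a least-squares trend line and subtracting its growth component; this is a stand-in, not necessarily the exact procedure used for Figs. 47 and 48. The function names, the synthetic series, its length of 731 days and its noise level are all hypothetical; only the slope of 0.895 users per day comes from the text.

```python
import random

def fit_line(ys):
    """Least-squares slope and intercept of ys against day index 0..n-1."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def remove_linear_trend(ys):
    """Subtract the fitted growth, pivoting around the middle of the period
    so that the overall mean level of the series is preserved."""
    slope, _ = fit_line(ys)
    x_mean = (len(ys) - 1) / 2
    adjusted = [y - slope * (x - x_mean) for x, y in enumerate(ys)]
    return adjusted, slope

# Synthetic stand-in for the casual-user counts (assumed, not the real data):
# a base level, growth of 0.895 users per day, and random noise.
random.seed(0)
casual = [300 + 0.895 * day + random.gauss(0, 50) for day in range(731)]

adjusted, fitted_slope = remove_linear_trend(casual)
residual_slope, _ = fit_line(adjusted)  # close to zero after the adjustment
```

Pivoting around the midpoint keeps the adjusted counts on the same scale as the originals, which matters if the discretization bins of step 3 are defined on raw user counts.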

1.2 Association rules and redundancy

An adjacency lattice (Fig. 49) denotes items such as X, Y and Z. The support value of each item (say, X) or itemset (say, XY) indicates the total number of instances of the item or itemset in the dataset. For example, in a set of 100 records, X occurs in 80 records and Y in 60 records, and itemset XY has a support of 40 records. For a rule \(R = (X \longrightarrow Y)\), its confidence can be represented as \(\hbox {confidence}(R) = \frac{\hbox {support}(X \cup Y)}{\hbox {support}(X)}\).
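
As a worked illustration of these definitions, the following sketch builds a toy dataset of 100 records matching the counts in the example above and computes support and confidence. The record layout (40 records containing both X and Y, 40 containing only X, 20 containing only Y) is an assumption chosen to reproduce those counts; the function names are hypothetical.

```python
def support(transactions, itemset):
    """Number of records that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for record in transactions if itemset <= record)

def confidence(transactions, antecedent, consequent):
    """confidence(A -> B) = support(A ∪ B) / support(A)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Toy dataset (assumed layout): X occurs in 80 records, Y in 60, XY in 40.
transactions = (
    [frozenset("XY")] * 40
    + [frozenset("X")] * 40
    + [frozenset("Y")] * 20
)

support_x = support(transactions, "X")       # 80
support_y = support(transactions, "Y")       # 60
support_xy = support(transactions, "XY")     # 40
conf_x_y = confidence(transactions, "X", "Y")  # 40 / 80 = 0.5
```

Note that confidence is not symmetric: for the same data, the rule \(Y \longrightarrow X\) has confidence 40/60, because the denominator is the support of the antecedent.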

Table 4 Redundancy in generated association rules

Aggarwal and Yu [1] define rule redundancy relationships, such that redundant rules may be filtered out to present succinct results to the user. The redundant rules can always be derived on demand, if so desired. We examine how these redundancy relationships can be identified in the parameter space model. In particular, redundancy can be of two types [1], as defined below.

Definition 1

Simple redundancy. Let \(A \Rightarrow B\) and \(C \Rightarrow D\) be two rules such that the itemsets A, B, C and D satisfy the condition \(A \cup B = C \cup D\). The rule \(C \Rightarrow D\) is simply redundant with respect to the rule \(A \Rightarrow B\), if \(C \supset A\).

Definition 2

Strict redundancy. We consider two rules generated from itemsets \(X_{i}\) and \(X_{j}\), respectively, such that \(X_{i} \supset X_{j}\). Let \(A \Rightarrow B\) and \(C \Rightarrow D\) be rules satisfying \(A \cup B = X_{i}\), \(C \cup D = X_{j}\), and \(C \supseteq A\). Then the rule \(C \Rightarrow D\) is strictly redundant with respect to the rule \(A \Rightarrow B\).

The concept of redundancy can be illustrated using the rules generated from the lattice (Fig. 49), as listed in Table 4. Based on Definitions 1 and 2, if a rule \(\mathcal {R}_{1}\) is simply or strictly redundant with respect to another rule \(\mathcal {R}_{2}\), then \(\mathcal {R}_{2}\) is said to simple dominate or strict dominate \(\mathcal {R}_{1}\), respectively. In Table 4, the rule (\(X \Rightarrow YZ\)) simple dominates the rules (\(XY \Rightarrow Z\)) and (\(XZ \Rightarrow Y\)) (Def. 1), and it strict dominates the rules (\(X \Rightarrow Y\)) and (\(X \Rightarrow Z\)) (Def. 2). In general, a rule may be dominated by several rules and may in turn dominate several other rules.
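
Definitions 1 and 2 translate directly into set comparisons. The sketch below is one possible encoding, assuming a rule is represented as a pair of frozensets (antecedent, consequent); the helper and function names are hypothetical and not part of FIRE. It checks the dominance examples given above for the rule \(X \Rightarrow YZ\).

```python
def make_rule(antecedent, consequent):
    """Represent a rule as (antecedent, consequent), each a frozenset of items."""
    return frozenset(antecedent), frozenset(consequent)

def simple_redundant(candidate, reference):
    """Definition 1: candidate (C => D) is simply redundant w.r.t. reference
    (A => B) if A ∪ B = C ∪ D and C is a proper superset of A."""
    c, d = candidate
    a, b = reference
    return (a | b) == (c | d) and c > a

def strict_redundant(candidate, reference):
    """Definition 2: candidate (C => D) is strictly redundant w.r.t. reference
    (A => B) if A ∪ B is a proper superset of C ∪ D and C ⊇ A."""
    c, d = candidate
    a, b = reference
    return (a | b) > (c | d) and c >= a

dominating = make_rule("X", "YZ")  # X => YZ

# Simple dominance (Def. 1): XY => Z and XZ => Y
simple_checks = [
    simple_redundant(make_rule("XY", "Z"), dominating),
    simple_redundant(make_rule("XZ", "Y"), dominating),
]

# Strict dominance (Def. 2): X => Y and X => Z
strict_checks = [
    strict_redundant(make_rule("X", "Y"), dominating),
    strict_redundant(make_rule("X", "Z"), dominating),
]
```

Both tests use proper-superset comparisons in one direction only, so neither relation is symmetric: \(X \Rightarrow YZ\) is not redundant with respect to any of the four rules it dominates.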

About this article

Cite this article

Mukherji, A., Lin, X., Toto, E. et al. FIRE: a two-level interactive visualization for deep exploration of association rules. Int J Data Sci Anal 7, 201–226 (2019). https://doi.org/10.1007/s41060-018-0133-y

  • DOI: https://doi.org/10.1007/s41060-018-0133-y
