On a confidence gain measure for association rule discovery and scoring

  • Regular Paper
The VLDB Journal

Abstract

This article presents a new interestingness measure for association rules called confidence gain (CG). The focus is on extracting human associations rather than associations between market products. The two differ in two main ways. The first is the strong asymmetry of human associations (e.g., the association “shampoo” → “hair” is much stronger than “hair” → “shampoo”), whereas in market products asymmetry is less intuitive and less evident. The second is the background knowledge that humans employ when presented with a stimulus (an input phrase).

CG compares the local confidence of a given term with its average confidence throughout a given database. CG is found to outperform several association measures because it captures the asymmetric notion of an association (as in the confidence measure) while adding a comparison to an expected confidence (as in the lift measure). The use of average confidence introduces the notion of “background knowledge” into the CG measure.
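As a rough illustration of the contrast drawn above, the following sketch computes the standard confidence and lift measures on a made-up toy corpus, together with a CG-like score. The abstract does not state the exact CG formula, so the ratio used here (local confidence divided by the consequent's mean confidence over all other terms) is an assumption for illustration only, as are the data and every function name.

```python
# Toy "documents", each reduced to the set of terms it contains (made-up data).
docs = [
    {"shampoo", "hair"},
    {"shampoo", "hair", "soap"},
    {"hair", "brush"},
    {"hair", "cut"},
    {"soap", "water"},
    {"water", "hair"},
]

def support(terms):
    """Fraction of documents that contain every term in `terms`."""
    terms = set(terms)
    return sum(terms <= d for d in docs) / len(docs)

def confidence(a, b):
    """Standard confidence: conf(a -> b) = P(b | a). Asymmetric in a and b."""
    return support({a, b}) / support({a})

def lift(a, b):
    """Standard lift: conf(a -> b) / P(b). Symmetric in a and b."""
    return confidence(a, b) / support({b})

def average_confidence(b):
    """ASSUMPTION: baseline = mean of conf(x -> b) over every other term x."""
    vocab = set().union(*docs) - {b}
    return sum(confidence(x, b) for x in vocab) / len(vocab)

def confidence_gain(a, b):
    """ASSUMPTION: CG-like score = local confidence / average confidence of b.
    This is a hypothetical form, not necessarily the definition in the article."""
    return confidence(a, b) / average_confidence(b)

for a, b in [("shampoo", "hair"), ("hair", "shampoo")]:
    print(a, "->", b,
          "conf:", round(confidence(a, b), 3),
          "lift:", round(lift(a, b), 3),
          "CG-like:", round(confidence_gain(a, b), 3))
```

Confidence differs between the two rule directions while lift does not, which is the asymmetry the abstract refers to. The CG-like values on this tiny corpus are not meant to reproduce the article's results; they only show how an average-confidence baseline can be combined with a directional measure.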

Various experiments have shown that CG and local confidence gain (a low-complexity version of CG) successfully generate association rules when compared to human free associations. The experiments include a large-scale “free association Turing test” in which human free associations were compared to associations generated by CG and other association measures. Rules discovered by CG were found to be significantly better than those discovered by other measures.

CG can be used for many purposes, such as personalization, sense disambiguation, query expansion, and improving classification performance of small item sets within large databases.

Although CG was found to be useful for Internet data retrieval, the results can easily be applied to any type of database.



Author information

Corresponding author

Correspondence to Raz Tamir.

Additional information

Edited by J. Srivastava

About this article

Cite this article

Tamir, R., Singer, Y. On a confidence gain measure for association rule discovery and scoring. The VLDB Journal 15, 40–52 (2006). https://doi.org/10.1007/s00778-004-0148-y

  • DOI: https://doi.org/10.1007/s00778-004-0148-y
