Abstract
This article presents a new interestingness measure for association rules called confidence gain (CG). The focus is on extracting human associations rather than associations between market products. The two kinds of association differ in two main ways. The first is the strong asymmetry of human associations (e.g., the association “shampoo” → “hair” is much stronger than “hair” → “shampoo”), whereas in market products asymmetry is less intuitive and less evident. The second is the background knowledge humans employ when presented with a stimulus (input phrase).
CG compares the local confidence of a given term with its average confidence throughout a given database. CG is found to outperform several association measures because it captures the asymmetric notion of an association (as in the confidence measure) and adds a comparison to an expected confidence (as in the lift measure). The use of average confidence introduces the notion of “background knowledge” into the CG measure.
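The contrast between confidence, lift, and a confidence-to-average-confidence ratio can be illustrated with a small sketch. The abstract does not give the exact CG formula, so the `gain` score below is only an assumed reading: the confidence of a rule divided by the mean confidence of its consequent over all other terms. The function name `rule_scores`, the `gain` key, and the toy `docs` data are all hypothetical; only confidence and lift follow their standard definitions.

```python
from collections import defaultdict
from itertools import permutations


def rule_scores(transactions):
    """Score every ordered pair (a -> b) of co-occurring items."""
    n = len(transactions)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for t in transactions:
        items = set(t)
        for a in items:
            item_count[a] += 1
        for a, b in permutations(items, 2):
            pair_count[(a, b)] += 1

    # Standard confidence: conf(a -> b) = P(b | a).
    conf = {(a, b): c / item_count[a] for (a, b), c in pair_count.items()}

    # ASSUMPTION (not taken from the article): the "average confidence" of b
    # is the mean of conf(x -> b) over every other item x in the database,
    # counting 0 for items that never co-occur with b.
    avg_conf = {}
    for b in item_count:
        others = [x for x in item_count if x != b]
        total = sum(conf.get((x, b), 0.0) for x in others)
        avg_conf[b] = total / len(others) if others else 0.0

    scores = {}
    for (a, b), c in conf.items():
        scores[(a, b)] = {
            "confidence": c,
            # Standard lift: conf(a -> b) / P(b); symmetric in a and b.
            "lift": c / (item_count[b] / n),
            # Hypothetical CG-like ratio: local vs. average confidence.
            "gain": c / avg_conf[b],
        }
    return scores


# Toy "documents", each reduced to a set of terms (made-up data).
docs = [
    {"shampoo", "hair"},
    {"shampoo", "hair", "soap"},
    {"hair", "cut"},
    {"hair", "brush"},
    {"soap", "water"},
]
scores = rule_scores(docs)
for rule in [("shampoo", "hair"), ("hair", "shampoo")]:
    print(rule, scores[rule])
```

On this toy data, lift assigns the same value to both directions of the rule, while confidence and the assumed ratio depend on the direction. How closely the ratio tracks the published CG measure depends on the definition given in the full article.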
Various experiments have shown that CG and local confidence gain (a low-complexity version of CG) successfully generate association rules when compared to human free associations. The experiments include a large-scale “free association Turing test” in which human free associations were compared to associations generated by CG and by other association measures. Rules discovered by CG were found to be significantly better than those discovered by other measures.
CG can be used for many purposes, such as personalization, sense disambiguation, query expansion, and improving classification performance of small item sets within large databases.
Although CG was found to be useful for Internet data retrieval, the results can readily be applied to any type of database.
Additional information
Edited by J. Srivastava
Cite this article
Tamir, R., Singer, Y. On a confidence gain measure for association rule discovery and scoring. The VLDB Journal 15, 40–52 (2006). https://doi.org/10.1007/s00778-004-0148-y
DOI: https://doi.org/10.1007/s00778-004-0148-y