On a confidence gain measure for association rule discovery and scoring

Tamir, Raz; Singer, Yehuda

doi:10.1007/s00778-004-0148-y

Regular Paper
Published: 05 September 2005

Volume 15, pages 40–52, (2006)
The VLDB Journal

Raz Tamir¹ &
Yehuda Singer²

Abstract

This article presents a new interestingness measure for association rules called confidence gain (CG). The focus is on extracting human associations rather than associations between market products. The two kinds of association differ in two main ways. First, human associations are strongly asymmetric (e.g., the association “shampoo” → “hair” is much stronger than “hair” → “shampoo”), whereas in market products asymmetry is less intuitive and less evident. Second, humans employ background knowledge when presented with a stimulus (input phrase).
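The asymmetry point can be illustrated with the standard confidence and lift measures. The sketch below is not taken from the article: the toy term sets in `DOCS` and the helper names are hypothetical, chosen only to show that confidence depends on the direction of a rule while lift does not.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy collection: each entry is the set of terms in one document.
# These are illustrative values, not data from the article.
DOCS: List[FrozenSet[str]] = [
    frozenset({"shampoo", "hair"}),
    frozenset({"shampoo", "hair", "bottle"}),
    frozenset({"hair", "brush"}),
    frozenset({"hair", "salon"}),
    frozenset({"hair", "color"}),
    frozenset({"bottle", "water"}),
]


def support(items: Iterable[str], docs: List[FrozenSet[str]] = DOCS) -> float:
    """Fraction of documents that contain every term in `items`."""
    wanted = frozenset(items)
    return sum(1 for d in docs if wanted <= d) / len(docs)


def confidence(a: str, b: str, docs: List[FrozenSet[str]] = DOCS) -> float:
    """Standard confidence of the rule a -> b: supp({a, b}) / supp({a})."""
    return support({a, b}, docs) / support({a}, docs)


def lift(a: str, b: str, docs: List[FrozenSet[str]] = DOCS) -> float:
    """Standard lift of a -> b: supp({a, b}) / (supp({a}) * supp({b}))."""
    return support({a, b}, docs) / (support({a}, docs) * support({b}, docs))


if __name__ == "__main__":
    for a, b in [("shampoo", "hair"), ("hair", "shampoo")]:
        print(f"{a} -> {b}: confidence={confidence(a, b):.3f}, lift={lift(a, b):.3f}")
```

In this toy collection every document that mentions "shampoo" also mentions "hair", but most documents that mention "hair" do not mention "shampoo", so the two confidence values differ. Lift is symmetric by construction, so it cannot express that difference.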

CG compares the local confidence of a given term with its average confidence throughout a given database. CG is found to outperform several association measures because it captures the asymmetric notion of an association (as in the confidence measure) and adds a comparison to an expected confidence (as in the lift measure). The use of average confidence introduces the notion of “background knowledge” into the CG measure.
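The abstract does not give the CG formula, so the following is only a rough sketch of the general idea of comparing a local confidence with an average confidence. Two things here are assumptions rather than the authors' definition: the baseline is taken as the unweighted mean of conf(x → b) over every other term x in the collection, and the comparison is taken as a ratio. The toy data and all function names are hypothetical.

```python
from typing import FrozenSet, List

# Hypothetical toy collection of term sets (not data from the article).
DOCS: List[FrozenSet[str]] = [
    frozenset({"shampoo", "hair"}),
    frozenset({"shampoo", "hair", "bottle"}),
    frozenset({"hair", "brush"}),
    frozenset({"hair", "salon"}),
    frozenset({"hair", "color"}),
    frozenset({"bottle", "water"}),
]


def confidence(a: str, b: str, docs: List[FrozenSet[str]]) -> float:
    """Standard confidence of a -> b: share of documents containing a that also contain b."""
    with_a = [d for d in docs if a in d]
    if not with_a:
        return 0.0
    return sum(1 for d in with_a if b in d) / len(with_a)


def average_confidence(b: str, docs: List[FrozenSet[str]]) -> float:
    """ASSUMED baseline: unweighted mean of conf(x -> b) over all other terms x."""
    others = sorted({t for d in docs for t in d} - {b})
    if not others:
        return 0.0
    return sum(confidence(x, b, docs) for x in others) / len(others)


def confidence_gain_sketch(a: str, b: str, docs: List[FrozenSet[str]]) -> float:
    """ASSUMED comparison: local confidence divided by the average-confidence baseline."""
    baseline = average_confidence(b, docs)
    return confidence(a, b, docs) / baseline if baseline > 0 else 0.0


if __name__ == "__main__":
    for a, b in [("shampoo", "hair"), ("hair", "shampoo")]:
        print(a, "->", b, round(confidence_gain_sketch(a, b, DOCS), 3))
```

Like confidence, this score depends on the direction of the rule, and like lift it is measured against an expected value. The numbers it produces on the toy data say nothing about how the published measure behaves, since the averaging scheme and the ratio form are guesses.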

Various experiments have shown that CG and local confidence gain (a low-complexity version of CG) generate association rules that compare well with human free associations. The experiments include a large-scale “free association Turing test” in which human free associations were compared to associations generated by CG and other association measures. Rules discovered by CG were found to be significantly better than those discovered by other measures.

CG can be used for many purposes, such as personalization, sense disambiguation, query expansion, and improving classification performance of small item sets within large databases.

Although CG was found to be useful for Internet data retrieval, the results can easily be applied to any type of database.

Author information

Authors and Affiliations

School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel
Raz Tamir
Computer Studies Program, Extension of Derby University in Israel, Tel-Aviv, Israel
Yehuda Singer

Corresponding author

Correspondence to Raz Tamir.

Additional information

Edited by J. Srivastava

About this article

Cite this article

Tamir, R., Singer, Y. On a confidence gain measure for association rule discovery and scoring. The VLDB Journal 15, 40–52 (2006). https://doi.org/10.1007/s00778-004-0148-y

Received: 06 February 2004
Accepted: 06 December 2004
Published: 05 September 2005
Issue Date: January 2006
DOI: https://doi.org/10.1007/s00778-004-0148-y
