A Generalized Model for Predictive Data Mining

Hansen, James V.; McDonald, James B.

doi:10.1023/A:1016050803099


Published: July 2002

Information Systems Frontiers, Volume 4, pages 179–186 (2002)

James V. Hansen¹ &
James B. McDonald²


Abstract

This paper describes EGB2, a flexible model for predictive data mining that optimizes over a parameter space to fit a family of models to data by maximum likelihood. It also shows how EGB2 can integrate asymmetric costs of Type I and Type II errors, thereby minimizing expected misclassification costs.
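The abstract does not say how the asymmetric costs enter the model. One common way to minimize expected misclassification cost, sketched here as an assumption rather than as the authors' procedure, is to move the probability cutoff away from 0.5. The function names and the convention that a Type I error is a false positive are illustrative choices, not taken from the paper.

```python
def cost_threshold(cost_type1: float, cost_type2: float) -> float:
    """Cutoff on P(y = 1) that minimizes expected misclassification cost.

    Assumes a Type I error is a false positive and a Type II error is a
    false negative. Predicting 1 costs (1 - p) * cost_type1 in expectation,
    predicting 0 costs p * cost_type2, so predicting 1 is cheaper when
    p > cost_type1 / (cost_type1 + cost_type2).
    """
    if cost_type1 <= 0 or cost_type2 <= 0:
        raise ValueError("costs must be positive")
    return cost_type1 / (cost_type1 + cost_type2)


def classify(probs, cost_type1: float, cost_type2: float):
    """Label each predicted probability using the cost-based cutoff."""
    cutoff = cost_threshold(cost_type1, cost_type2)
    return [1 if p > cutoff else 0 for p in probs]


def expected_cost(p: float, decision: int, cost_type1: float, cost_type2: float) -> float:
    """Expected cost of a single decision given P(y = 1) = p."""
    return (1.0 - p) * cost_type1 if decision == 1 else p * cost_type2
```

With a Type II error nine times as costly as a Type I error, the cutoff falls to 0.1, so a case with predicted probability 0.2 is flagged even though it is more likely negative than positive.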

Importantly, it has been shown that standard methods of computing maximum-likelihood estimators are generally inconsistent when applied to sample data whose label proportions differ from those of the universe from which the sample is drawn. We show how a choice estimator based on weighting each observation's contribution to the log-likelihood function can contribute to estimator consistency, and how this feature can be implemented in EGB2.
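The weighting idea can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not state: each observation's weight is the ratio of its label's population share to its sample share, the population share is known, and a plain logit link stands in for the EGB2 family. All function and parameter names are hypothetical.

```python
import math


def class_weights(pop_share_pos: float, sample_share_pos: float) -> dict:
    """Weight per label: population share divided by sample share."""
    return {
        1: pop_share_pos / sample_share_pos,
        0: (1.0 - pop_share_pos) / (1.0 - sample_share_pos),
    }


def weighted_log_likelihood(beta, X, y, weights) -> float:
    """Weighted log-likelihood of a binary logit model (stand-in link).

    beta    : list of coefficients
    X       : list of feature rows, each the same length as beta
    y       : list of 0/1 labels
    weights : dict mapping label -> weight, e.g. from class_weights()
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(b * v for b, v in zip(beta, x_i))
        # Numerically stable log of the logistic function.
        if z >= 0:
            log_p1 = -math.log1p(math.exp(-z))
        else:
            log_p1 = z - math.log1p(math.exp(z))
        log_p0 = log_p1 - z  # log(1 - sigmoid(z))
        total += weights[y_i] * (log_p1 if y_i == 1 else log_p0)
    return total
```

For example, if positives make up 2% of the population but 50% of a balanced training sample, `class_weights(0.02, 0.5)` down-weights each positive to about 0.04 and up-weights each negative to about 1.96. Maximizing the weighted function with any general-purpose optimizer then yields the weighted estimator.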



Author information

Authors and Affiliations

Marriott School of Management, Brigham Young University, Provo, Utah, 84602, USA
James V. Hansen
Department of Economics, Brigham Young University, Provo, Utah, 84602, USA
James B. McDonald


Corresponding author

Correspondence to James V. Hansen.


About this article

Cite this article

Hansen, J.V., McDonald, J.B. A Generalized Model for Predictive Data Mining. Information Systems Frontiers 4, 179–186 (2002). https://doi.org/10.1023/A:1016050803099


