Cost-sensitive learning with conditional Markov networks

Data Mining and Knowledge Discovery

Abstract

Interest in classification and link prediction in structured domains has grown in recent years. Methods such as conditional random fields and relational Markov networks provide flexible mechanisms for modeling the correlations induced by link structure. In many structured domains, the risk or cost function associated with different misclassifications also has interesting structure. There is a rich tradition of cost-sensitive learning applied to unstructured (IID) data. Here we propose a general framework that both captures correlations in the link structure and handles structured cost functions. We present two new cost-sensitive structured classifiers based on maximum entropy principles. The first determines the cost-sensitive classification by minimizing the expected cost of misclassification. The second determines the cost-sensitive classification directly, without going through a probability estimation step. We contrast these approaches with two alternatives: a standard 0/1-loss structured classifier that estimates class-conditional probabilities and then minimizes the expected cost of misclassification, and a cost-sensitive IID classifier that does not utilize the correlations present in the link structure. We demonstrate the utility of our cost-sensitive structured classifiers with experiments on both synthetic and real-world data.
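As a rough illustration of what "minimizing the expected cost of misclassification" means, the sketch below applies the standard decision rule to a single, unstructured (IID) example. It is not the structured classifier described in the abstract: the function name, the probabilities, and the cost matrix are all hypothetical, and the class probabilities are assumed to be given rather than estimated from a Markov network.

```python
from typing import Sequence


def min_expected_cost_label(
    probs: Sequence[float],
    cost: Sequence[Sequence[float]],
) -> int:
    """Pick the predicted label with the lowest expected cost.

    probs[y]   : assumed probability that the true label is y
    cost[y][k] : assumed cost of predicting k when the true label is y
    """
    n = len(probs)
    # Expected cost of predicting k, averaged over the possible true labels.
    expected = [
        sum(probs[y] * cost[y][k] for y in range(n))
        for k in range(n)
    ]
    # Index of the smallest expected cost (ties go to the lowest index).
    return min(range(n), key=expected.__getitem__)


# Hypothetical two-class example: missing class 1 is five times as costly
# as a false alarm, so class 1 is predicted even though class 0 is more likely.
probs = [0.7, 0.3]
cost = [
    [0.0, 1.0],  # true label 0: correct prediction costs 0, predicting 1 costs 1
    [5.0, 0.0],  # true label 1: predicting 0 costs 5, correct prediction costs 0
]
label = min_expected_cost_label(probs, cost)
```

With a symmetric 0/1 cost matrix the same rule reduces to picking the most probable label, which is why a classifier trained for 0/1 loss and a cost-sensitive one can disagree on the same example.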

Author information

Correspondence to Prithviraj Sen.

Additional information

Responsible editor: Bianca Zadrozny.

About this article

Cite this article

Sen, P., Getoor, L. Cost-sensitive learning with conditional Markov networks. Data Min Knowl Disc 17, 136–163 (2008). https://doi.org/10.1007/s10618-008-0090-5

  • DOI: https://doi.org/10.1007/s10618-008-0090-5
