Abstract
There has been a recent, growing interest in classification and link prediction in structured domains. Methods such as conditional random fields and relational Markov networks support flexible mechanisms for modeling correlations due to the link structure. In addition, many structured domains have an interesting structure in the risk or cost function associated with different misclassifications. There is a rich tradition of cost-sensitive learning applied to unstructured (IID) data. Here we propose a general framework that can both capture correlations in the link structure and handle structured cost functions. We present two new cost-sensitive structured classifiers based on maximum entropy principles. The first determines the cost-sensitive classification by minimizing the expected cost of misclassification. The second directly determines the cost-sensitive classification without going through a probability estimation step. We contrast these approaches with two alternatives: a standard 0/1-loss structured classifier that estimates class-conditional probabilities, followed by minimization of the expected cost of misclassification; and a cost-sensitive IID classifier that does not utilize the correlations present in the link structure. We demonstrate the utility of our cost-sensitive structured classifiers with experiments on both synthetic and real-world data.
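The decision rule behind "minimizing the expected cost of misclassification" can be illustrated for a single instance whose class probabilities have already been estimated. The sketch below is not the structured classifier proposed in the article; it only shows the generic rule. The function names, the cost-matrix layout (`cost[true][pred]`), and the numbers are assumptions made for illustration.

```python
from typing import List, Sequence


def expected_costs(probs: Sequence[float],
                   cost: Sequence[Sequence[float]]) -> List[float]:
    """Expected cost of predicting each label.

    probs[t]   : estimated probability that the true label is t
    cost[t][p] : cost of predicting p when the true label is t
                 (this layout is an assumption of the sketch)
    """
    n_labels = len(cost[0])
    return [
        sum(p_true * cost[t][pred] for t, p_true in enumerate(probs))
        for pred in range(n_labels)
    ]


def min_expected_cost_label(probs: Sequence[float],
                            cost: Sequence[Sequence[float]]) -> int:
    """Return the label whose expected misclassification cost is lowest."""
    costs = expected_costs(probs, cost)
    return min(range(len(costs)), key=costs.__getitem__)


# Hypothetical two-class example: label 0 is more probable, but mistaking
# a true label 1 for label 0 is five times as costly as the reverse error.
probs = [0.7, 0.3]
asymmetric_cost = [[0.0, 1.0],
                   [5.0, 0.0]]
zero_one_cost = [[0.0, 1.0],
                 [1.0, 0.0]]

print(min_expected_cost_label(probs, asymmetric_cost))  # cost-sensitive choice
print(min_expected_cost_label(probs, zero_one_cost))    # 0/1-loss choice
```

With the asymmetric costs the rule selects label 1 even though label 0 is more probable; under 0/1 loss it reduces to picking the most probable label. In a structured setting the probabilities would come from inference over the linked instances rather than from an independent per-instance estimate.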
Additional information
Responsible editor: Bianca Zadrozny.
Cite this article
Sen, P., Getoor, L. Cost-sensitive learning with conditional Markov networks. Data Min Knowl Disc 17, 136–163 (2008). https://doi.org/10.1007/s10618-008-0090-5
DOI: https://doi.org/10.1007/s10618-008-0090-5