Techniques for dealing with missing values in classification

Conference paper in Advances in Intelligent Data Analysis: Reasoning about Data (IDA 1997)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1280)

Abstract

A brief overview of the history of the development of decision tree induction algorithms is followed by a review of techniques for dealing with missing attribute values in the operation of these methods. The technique of dynamic path generation is described in the context of tree-based classification methods. The waste of data that can result from casewise deletion of cases with missing values in statistical algorithms is discussed, and alternatives are proposed.
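
The abstract names dynamic path generation but does not spell it out. Assuming it works as its name suggests, the tree is never built in full: for each case to be classified, only the path that case follows is grown on demand, and only attributes whose values are known for that case are eligible for splitting. A minimal Python sketch of that idea follows. The split criterion (information gain computed over training cases with a known value for the attribute), the stopping rule (`min_cases`), the toy data, and the names `entropy`, `info_gain`, and `dynamic_path` are all assumptions made for illustration, not the authors' procedure.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())


def info_gain(cases, labels, attr):
    """Information gain of `attr`, computed only over training cases whose
    value for `attr` is known (an assumption of this sketch)."""
    known = [(c[attr], y) for c, y in zip(cases, labels) if c.get(attr) is not None]
    if not known:
        return 0.0
    groups = {}
    for value, y in known:
        groups.setdefault(value, []).append(y)
    ys = [y for _, y in known]
    remainder = sum(len(g) / len(ys) * entropy(g) for g in groups.values())
    return entropy(ys) - remainder


def dynamic_path(cases, labels, query, min_cases=2):
    """Grow only the path that `query` follows, splitting solely on attributes
    whose value is known for `query`. Returns the path and the class
    distribution of the training cases that reach its end."""
    path = []
    candidates = {a for a, v in query.items() if v is not None}
    while candidates and len(set(labels)) > 1 and len(labels) >= min_cases:
        # sorted() makes tie-breaking between equal gains deterministic.
        best = max(sorted(candidates), key=lambda a: info_gain(cases, labels, a))
        if info_gain(cases, labels, best) <= 0:
            break
        candidates.discard(best)
        # Follow the query's branch; training cases missing `best` drop out here.
        reach = [(c, y) for c, y in zip(cases, labels) if c.get(best) == query[best]]
        if not reach:
            break
        cases = [c for c, _ in reach]
        labels = [y for _, y in reach]
        path.append((best, query[best]))
    counts = Counter(labels)
    return path, {cls: k / len(labels) for cls, k in counts.items()}


# Hypothetical training data; None marks a missing value.
train = [
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 1},
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 0},
    {"a": 1, "b": 1, "c": 1},
    {"a": 0, "b": 0, "c": 1},
    {"a": 0, "b": None, "c": 0},
]
y = ["yes", "yes", "no", "no", "yes", "no", "no"]

# "a" is missing for this case, so the path may only split on "b" and "c".
print(dynamic_path(train, y, {"a": None, "b": 1, "c": 1}))
```

Here the call prints `([('c', 1), ('b', 1)], {'yes': 1.0})`. Because the case's missing attribute `a` is never a split candidate, no imputation or fractional weighting of the case is needed at classification time; the cost is that a path has to be grown separately for every case classified.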

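To see how casewise (listwise) deletion can waste data, consider a small hypothetical table in which each case lacks just one attribute value. Discarding every incomplete case leaves almost nothing, even though 80% of the individual values were observed. The data and the function names `casewise_deletion` and `available_counts` are illustrative assumptions, not taken from the paper.

```python
# Hypothetical toy data: 5 cases x 4 binary attributes; None marks a missing value.
data = [
    [1, 0, None, 1],
    [0, 1, 1, 1],
    [1, None, 0, 0],
    [None, 1, 1, 0],
    [0, 0, 1, None],
]


def casewise_deletion(rows):
    """Keep only the cases with no missing value in any attribute."""
    return [r for r in rows if all(v is not None for v in r)]


def available_counts(rows):
    """Number of observed (non-missing) values in each attribute column."""
    return [sum(r[j] is not None for r in rows) for j in range(len(rows[0]))]


complete = casewise_deletion(data)
observed = sum(available_counts(data))   # values actually recorded
kept = sum(len(r) for r in complete)     # values still usable after deletion

print(f"{len(complete)} of {len(data)} cases survive casewise deletion")
print(f"{kept} of {observed} observed values are retained")
print("observed values per attribute:", available_counts(data))
```

This prints that 1 of 5 cases survives, that only 4 of 16 observed values are retained, and that each attribute still has 4 observed values. An available-case approach, which uses every observed value of an attribute whenever that attribute is needed, would keep all 16; it is one example of the kind of alternative the abstract alludes to, not necessarily the one the paper proposes.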

Editor information

Xiaohui Liu, Paul Cohen, Michael Berthold

Copyright information

© 1997 Springer-Verlag

About this paper

Cite this paper

Liu, W.Z., White, A.P., Thompson, S.G., Bramer, M.A. (1997). Techniques for dealing with missing values in classification. In: Liu, X., Cohen, P., Berthold, M. (eds) Advances in Intelligent Data Analysis: Reasoning about Data. IDA 1997. Lecture Notes in Computer Science, vol 1280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0052868

  • DOI: https://doi.org/10.1007/BFb0052868

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63346-4

  • Online ISBN: 978-3-540-69520-2

  • eBook Packages: Springer Book Archive
