Better Decision Tree Induction for Limited Data Sets of Liver Disease

  • Conference paper
Computer Applications for Bio-technology, Multimedia, and Ubiquitous City (BSBT 2012, MulGraB 2012, IUrC 2012)

Abstract

Decision trees can be very useful data mining tools for human experts diagnosing disease, because the knowledge structure is represented in tree form. However, a satisfactory decision tree may not be obtained if the data set does not contain enough consistent instances. Two relatively small liver disorder data sets, one from America and one from India, have recently become available. To generate more accurate and useful decision trees for the disease, this paper suggests appropriate sampling of the data instances that belong to the class with the higher error rate. Experiments with the two public domain data sets and a representative decision tree algorithm, C4.5, show very successful results.
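The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure, whose details are not given here. It assumes that predictions from an initial tree are already available, that "sampling" means random resampling with replacement, and that the class with the highest per-class error rate is the one to enlarge. The function names, the `factor` parameter, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def per_class_error(y_true, y_pred):
    """Return {class: fraction of its instances that were misclassified}."""
    totals = Counter(y_true)
    wrong = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return {c: wrong[c] / totals[c] for c in totals}


def oversample_worst_class(rows, y_true, y_pred, factor=2.0, seed=0):
    """Append resampled copies of the class with the highest error rate.

    `factor` is a hypothetical knob (not from the paper): the chosen class
    ends up with roughly `factor` times its original number of instances.
    """
    errors = per_class_error(y_true, y_pred)
    worst = max(errors, key=errors.get)
    pool = [(r, y) for r, y in zip(rows, y_true) if y == worst]
    n_extra = int(round((factor - 1.0) * len(pool)))
    rng = random.Random(seed)
    # Sample with replacement from the instances of the worst class.
    extra = [rng.choice(pool) for _ in range(n_extra)]
    new_rows = list(rows) + [r for r, _ in extra]
    new_labels = list(y_true) + [y for _, y in extra]
    return new_rows, new_labels, worst


# Toy data, made up for illustration only (not from either liver data set).
rows = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y_true = ["healthy", "healthy", "healthy", "healthy", "disorder", "disorder"]
# Hypothetical predictions from a first decision tree.
y_pred = ["healthy", "healthy", "healthy", "disorder", "healthy", "disorder"]

new_rows, new_labels, worst = oversample_worst_class(rows, y_true, y_pred)
print(worst, Counter(new_labels))
```

The enlarged data set would then be passed to a tree learner such as C4.5 for a second round of induction. Whether duplication or some other sampling scheme is appropriate depends on the data and is left open in this sketch.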


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sug, H. (2012). Better Decision Tree Induction for Limited Data Sets of Liver Disease. In: Kim, Th., Kang, JJ., Grosky, W.I., Arslan, T., Pissinou, N. (eds) Computer Applications for Bio-technology, Multimedia, and Ubiquitous City. BSBT 2012, MulGraB 2012, IUrC 2012. Communications in Computer and Information Science, vol 353. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35521-9_12

  • DOI: https://doi.org/10.1007/978-3-642-35521-9_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35520-2

  • Online ISBN: 978-3-642-35521-9

  • eBook Packages: Computer Science, Computer Science (R0)
