
Improving Classification Accuracy of Large Test Sets Using the Ordered Classification Algorithm

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 2527)

Abstract

We present a new algorithm, called Ordered Classification, for classification problems in which only a few labeled examples are available but a large test set must be classified. In many real-world classification problems it is expensive, and sometimes infeasible, to acquire a large training set, so traditional supervised learning algorithms often perform poorly. In our algorithm, classification is performed by a discriminative approach similar to that of Query By Committee in the active learning setting. We applied the method to the real-world astronomical task of automated prediction of stellar atmospheric parameters, as well as to several benchmark learning problems, and observed a considerable improvement in classification accuracy over conventional algorithms.
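The committee-style classification the abstract describes can be sketched as follows. This is a hedged illustration, not the authors' implementation: the 1-nearest-neighbour base learner, the bootstrap construction of committee members, and the one-point-per-round ordering are assumptions made here for concreteness; the paper itself does not expose these details in the abstract.

```python
import random
from collections import Counter

def nn_predict(train, x):
    # One committee member's prediction: 1-nearest-neighbour over its
    # bootstrap sample (a stand-in base learner, not the paper's choice).
    nearest = min(train,
                  key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)))
    return nearest[1]

def ordered_classify(labeled, pool, n_members=5, seed=0):
    # Each round: draw bootstrap committees from the current labeled set,
    # let them vote on every remaining test point, then label the point
    # with the strongest committee agreement and promote it into the
    # labeled set, so later, harder points see more training data.
    rng = random.Random(seed)
    labeled, pool = list(labeled), list(pool)
    results = []
    while pool:
        committees = [[rng.choice(labeled) for _ in labeled]
                      for _ in range(n_members)]
        scored = []
        for x in pool:
            votes = Counter(nn_predict(c, x) for c in committees)
            label, count = votes.most_common(1)[0]
            scored.append((count / n_members, label, x))
        _, label, x = max(scored, key=lambda t: t[0])  # highest agreement first
        results.append((x, label))
        labeled.append((x, label))
        pool.remove(x)
    return results

# Tiny demo on two well-separated 1-D classes
labeled = [((0.0,), "a"), ((1.0,), "a"), ((9.0,), "b"), ((10.0,), "b")]
predictions = dict(ordered_classify(labeled, [(0.5,), (9.5,)]))
```

The key design point mirrored here is the ordering itself: because confidently classified points are folded back into the training set before harder points are attempted, the effective training set grows as classification proceeds, which is how the method compensates for the small initial labeled set.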

Keywords

  • Spectral Index
  • Target Function
  • Unlabeled Data
  • Query Point
  • Weighted Linear Regression





Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Solorio, T., Fuentes, O. (2002). Improving Classification Accuracy of Large Test Sets Using the Ordered Classification Algorithm. In: Garijo, F.J., Riquelme, J.C., Toro, M. (eds) Advances in Artificial Intelligence — IBERAMIA 2002. Lecture Notes in Computer Science, vol 2527. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36131-6_8

  • DOI: https://doi.org/10.1007/3-540-36131-6_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00131-7

  • Online ISBN: 978-3-540-36131-2

  • eBook Packages: Springer Book Archive
