Data Mining and Knowledge Discovery, Volume 6, Issue 2, pp 115–130

On Issues of Instance Selection

  • Huan Liu
  • Hiroshi Motoda
Article

Keywords

Artificial Intelligence · Data Structure · Information Theory · Instance Selection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Huan Liu, Department of Computer Science and Engineering, Arizona State University, Tempe, USA
  • Hiroshi Motoda, Institute of Scientific and Industrial Research, Osaka University, Osaka, Japan
