
Analogy-Based Reasoning in Classifier Construction

  • Conference paper
Transactions on Rough Sets IV

Part of the book series: Lecture Notes in Computer Science (TRS, volume 3700)

Abstract

Analogy-based reasoning methods in machine learning make it possible to reason about properties of objects on the basis of similarities between objects. A specific similarity-based method is the k nearest neighbors (k-nn) classification algorithm. In the k-nn algorithm, a decision about a new object x is inferred on the basis of a fixed number k of the objects most similar to x in a given set of examples. The primary contribution of the dissertation is the introduction of two new classification models based on the k-nn algorithm.
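As a reference point, the plain k-nn rule described above can be sketched in a few lines of Python. This is an illustrative sketch, not the dissertation's implementation; the Euclidean default distance and the function name are assumptions for the example:

```python
import math
from collections import Counter

def knn_classify(x, examples, k=3, dist=None):
    """Classify x by a majority vote among its k most similar examples.
    `examples` is a list of (feature_vector, label) pairs."""
    if dist is None:
        # Euclidean distance is an illustrative default; the choice of
        # similarity measure is itself a central issue of the work.
        dist = lambda a, b: math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    # Keep the k examples most similar to x and let them vote.
    neighbours = sorted(examples, key=lambda ex: dist(x, ex[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

For instance, with three neighbours of class 'a' closer than any of class 'b', the vote returns 'a'.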

The first model is a hybrid of the k-nn algorithm and rule induction. The combination uses minimal consistent rules defined by local reducts of a set of examples. To make the combination possible, the model of minimal consistent rules is generalized to a metric-dependent form. Bazan proposed an effective polynomial-time algorithm implementing the classification model based on minimal consistent rules. We modify this algorithm so that adding it to the k-nn algorithm increases the computation time only insignificantly. On some of the tested classification problems the combined model was significantly more accurate than the classical k-nn classification algorithm.
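The flavour of such a combination can be conveyed by a toy sketch in which each of the k nearest neighbours votes only if the "local rule" spanned by the test object and that neighbour (interval conditions over numeric attributes) is consistent with the training set. This is a simplified approximation for illustration, not the algorithm evaluated in the dissertation:

```python
from collections import Counter

def local_rule_consistent(x, y, y_label, examples):
    """The 'local rule' spanned by test object x and training example y
    requires every attribute value to lie between x's and y's values.
    The rule is consistent if no example of another class satisfies it."""
    lo = [min(a, b) for a, b in zip(x, y)]
    hi = [max(a, b) for a, b in zip(x, y)]
    for z, z_label in examples:
        if z_label != y_label and all(l <= v <= h
                                      for v, l, h in zip(z, lo, hi)):
            return False
    return True

def rule_knn_classify(x, examples, k=3):
    """k-nn vote restricted to neighbours whose local rule is consistent."""
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    neighbours = sorted(examples, key=lambda ex: dist(x, ex[0]))[:k]
    votes = Counter(label for y, label in neighbours
                    if local_rule_consistent(x, y, label, examples))
    # Fall back to the nearest neighbour if no rule-supported vote remains.
    return votes.most_common(1)[0][0] if votes else neighbours[0][1]
```

The consistency test filters out far-away neighbours whose rules are contradicted by closer examples of another class, which is the intuition behind combining rules with the neighbourhood vote.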

For many real-life problems it is impossible to induce relevant global mathematical models from the available sets of examples. The second model proposed in the dissertation handles such sets with locally induced metrics. The method adapts the notion of similarity to the properties of a given test object, which makes it possible to select the correct decision in specific fragments of the space of objects. On the hardest tested problems, the method with local metrics significantly improved the classification accuracy over methods with global models.
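The two-stage idea (select a neighbourhood with a global metric, then re-rank it with a metric induced locally around the test object) can be sketched as follows. The inverse-spread attribute weighting used here is purely illustrative and is not the induction scheme of the dissertation:

```python
from collections import Counter

def local_metric_classify(x, examples, k=3, m=20):
    """Two-stage classification: pick the m nearest examples under a
    global (unweighted) metric, induce attribute weights from that
    neighbourhood only, then vote among the k nearest under the
    locally weighted metric."""
    global_d = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    pool = sorted(examples, key=lambda ex: global_d(x, ex[0]))[:m]
    # Illustrative local induction: weight each attribute by the inverse
    # of its spread in the pool, so locally stable attributes dominate.
    spread = [max(y[i] for y, _ in pool) - min(y[i] for y, _ in pool) or 1.0
              for i in range(len(x))]
    w = [1.0 / s for s in spread]
    local_d = lambda a, b: sum(wi * (u - v) ** 2
                               for wi, u, v in zip(w, a, b))
    neighbours = sorted(pool, key=lambda ex: local_d(x, ex[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]
```

Because the weights are recomputed for every test object, the notion of similarity adapts to the fragment of the space the object falls into, which is the point of the local approach.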

Two issues are crucial for the quality and efficiency of k-nn based methods: the similarity measure and the time needed to find the most similar objects in a given set of examples. The dissertation studies both issues in detail and proposes significant improvements to the similarity measures and to the search methods found in the literature.
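On the efficiency side, one standard device behind metric indexing structures such as those cited in the references is triangle-inequality pruning against precomputed pivot distances. A minimal sketch (pivot choice and data are illustrative assumptions, not the dissertation's indexing method):

```python
import math

def build_pivot_index(examples, pivot):
    """Precompute every example's distance to one fixed pivot object."""
    d = lambda a, b: math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    return [(y, label, d(y, pivot)) for y, label in examples], pivot, d

def nearest(x, index):
    """1-nn search pruned with the triangle inequality:
    |d(x, pivot) - d(y, pivot)| is a lower bound on d(x, y),
    so such candidates can be skipped without computing d(x, y)."""
    entries, pivot, d = index
    dxp = d(x, pivot)
    best, best_d = None, float("inf")
    for y, label, dyp in entries:
        if abs(dxp - dyp) >= best_d:
            continue  # provably no closer than the current best
        dy = d(x, y)
        if dy < best_d:
            best, best_d = (y, label), dy
    return best
```

The pruning rule is valid in any metric space, which is why the same idea underlies tree-based indexes over arbitrary similarity measures.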



References

  1. Aggarwal, C.C., Hinneburg, A., Keim, D.A.: On the surprising behavior of distance metrics in high dimensional space. In: Proceedings of the Eighth International Conference on Database Theory, London, UK, pp. 420–434 (2001)

  2. Aha, D.W.: Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms. International Journal of Man-Machine Studies 36, 267–287 (1992)

  3. Aha, D.W.: The omnipresence of case-based reasoning in science and applications. Knowledge-Based Systems 11(5-6), 261–273 (1998)

  4. Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. Machine Learning 6, 37–66 (1991)

  5. Ajdukiewicz, K.: Logika Pragmatyczna. PWN, Warszawa (1974)

  6. Bazan, J.G.: Discovery of decision rules by matching new objects against data tables. In: Polkowski, L., Skowron, A. (eds.) RSCTC 1998. LNCS (LNAI), vol. 1424, pp. 521–528. Springer, Heidelberg (1998)

  7. Bazan, J.G., Szczuka, M.: RSES and RSESlib - a collection of tools for rough set computations. In: Ziarko, W.P., Yao, Y. (eds.) RSCTC 2000. LNCS (LNAI), vol. 2005, pp. 106–113. Springer, Heidelberg (2001)

  8. Bazan, J.G., Szczuka, M., Wojna, A.G., Wojnarski, M.: On the evolution of Rough Set Exploration System. In: Tsumoto, S., Słowiński, R., Komorowski, J., Grzymała-Busse, J.W. (eds.) RSCTC 2004. LNCS (LNAI), vol. 3066, pp. 592–601. Springer, Heidelberg (2004)

  9. Beckmann, N., Kriegel, H.P., Schneider, R., Seeger, B.: The R*-tree: an efficient and robust access method for points and rectangles. In: Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, Atlantic City, NJ, pp. 322–331 (1990)

  10. Bellman, R.E.: Dynamic Programming. Princeton University Press, Princeton (1957)

  11. Bentley, J.L.: Multidimensional binary search trees used for associative searching. Communications of the ACM 18(9), 509–517 (1975)

  12. Berchtold, S., Keim, D., Kriegel, H.P.: The X-tree: an index structure for high dimensional data. In: Proceedings of the Twenty Second International Conference on Very Large Databases, pp. 28–39 (1996)

  13. Beyer, K.S., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is “nearest neighbor” meaningful? In: Proceedings of the Seventh International Conference on Database Theory, Jerusalem, Israel, pp. 217–235 (1999)

  14. Biberman, Y.: A context similarity measure. In: Proceedings of the Ninth European Conference on Machine Learning, Catania, Italy, pp. 49–63 (1994)

  15. Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1996)

  16. Blake, C.L., Merz, C.J.: UCI repository of machine learning databases. Department of Information and Computer Science, University of California, Irvine (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html

  17. Breiman, L.: Statistical modeling - the two cultures. Statistical Science 16(3), 199–231 (2001)

  18. Brin, S.: Near neighbor search in large metric spaces. In: Proceedings of the Twenty First International Conference on Very Large Databases, pp. 574–584 (1995)

  19. Chavez, E., Navarro, G., Baeza-Yates, R., Marroquin, J.L.: Searching in metric spaces. Technical Report TR/DCC-99-3, Department of Computer Science, University of Chile (1999)

  20. Ciaccia, P., Patella, M., Zezula, P.: M-tree: an efficient access method for similarity search in metric spaces. In: Proceedings of the Twenty Third International Conference on Very Large Databases, pp. 426–435 (1997)

  21. Clark, P., Niblett, T.: The CN2 induction algorithm. Machine Learning 3, 261–284 (1989)

  22. Cost, S., Salzberg, S.: A weighted nearest neighbor algorithm for learning with symbolic features. Machine Learning 10, 57–78 (1993)

  23. Cover, T.M., Hart, P.E.: Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13, 21–27 (1967)

  24. Domeniconi, C., Gunopulos, D.: Efficient local flexible nearest neighbor classification. In: Proceedings of the Second SIAM International Conference on Data Mining (2002)

  25. Domingos, P.: Unifying instance-based and rule-based induction. Machine Learning 24(2), 141–168 (1996)

  26. Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, New York (1973)

  27. Dudani, S.: The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man and Cybernetics 6, 325–327 (1976)

  28. Fikes, R.E., Nilsson, N.J.: STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence 2(3-4), 189–208 (1971)

  29. Finkel, R., Bentley, J.: Quad trees: a data structure for retrieval on composite keys. Acta Informatica 4(1), 1–9 (1974)

  30. Fisher, R.A.: Applications of “Student’s” distribution. Metron 5, 3–17 (1925)

  31. Fix, E., Hodges, J.L.: Discriminatory analysis, non-parametric discrimination: Consistency properties. Technical Report 4, USAF School of Aviation Medicine, Randolph Air Field (1951)

  32. Friedman, J.: Flexible metric nearest neighbor classification. Technical Report 113, Department of Statistics, Stanford University, CA (1994)

  33. Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning. Springer, New York (2001)

  34. Friedman, J.H., Kohavi, R., Yun, Y.: Lazy decision trees. In: Proceedings of the Thirteenth National Conference on Artificial Intelligence, Cambridge, pp. 717–724 (1996)

  35. Fukunaga, K., Narendra, P.M.: A branch and bound algorithm for computing k-nearest neighbors. IEEE Transactions on Computers 24(7), 750–753 (1975)

  36. Gaede, V., Gunther, O.: Multidimensional access methods. ACM Computing Surveys 30(2), 170–231 (1998)

  37. Golding, A.R., Rosenbloom, P.S.: Improving accuracy by combining rule-based and case-based reasoning. Artificial Intelligence 87(1-2), 215–254 (1996)

  38. Góra, G., Wojna, A.G.: Local attribute value grouping for lazy rule induction. In: Alpigini, J.J., Peters, J.F., Skowron, A., Zhong, N. (eds.) RSCTC 2002. LNCS (LNAI), vol. 2475, pp. 405–412. Springer, Heidelberg (2002)

  39. Góra, G., Wojna, A.G.: RIONA: a classifier combining rule induction and k-nn method with automated selection of optimal neighbourhood. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) ECML 2002. LNCS (LNAI), vol. 2430, pp. 111–123. Springer, Heidelberg (2002)

  40. Góra, G., Wojna, A.G.: RIONA: a new classification system combining rule induction and instance-based learning. Fundamenta Informaticae 51(4), 369–390 (2002)

  41. Gosset, W.S. (Student): The probable error of a mean. Biometrika 6, 1–25 (1908)

  42. Grzymala-Busse, J.W.: LERS - a system for learning from examples based on rough sets. In: Slowinski, R. (ed.) Intelligent Decision Support, Handbook of Applications and Advances of the Rough Sets Theory, pp. 3–18. Kluwer Academic Publishers, Dordrecht (1992)

  43. Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, Boston, MA, pp. 47–57 (1984)

  44. Hastie, T., Tibshirani, R.: Discriminant adaptive nearest neighbor classification. IEEE Transactions on Pattern Analysis and Machine Intelligence 18(6), 607–616 (1996)

  45. Jensen, F.V.: An Introduction to Bayesian Networks. Springer, New York (1996)

  46. Kalantari, I., McDonald, G.: A data structure and an algorithm for the nearest point problem. IEEE Transactions on Software Engineering 9(5), 631–634 (1983)

  47. Katayama, N., Satoh, S.: The SR-tree: an index structure for high dimensional nearest neighbor queries. In: Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, Tucson, Arizona, pp. 369–380 (1997)

  48. Kira, K., Rendell, L.A.: A practical approach to feature selection. In: Proceedings of the Ninth International Conference on Machine Learning, Aberdeen, Scotland, pp. 249–256. Morgan Kaufmann, San Francisco (1992)

  49. Kleinberg, J., Papadimitriou, C., Raghavan, P.: Segmentation problems. Journal of the ACM 51(2), 263–280 (2004)

  50. Klösgen, W., Żytkow, J.M. (eds.): Handbook of Data Mining and Knowledge Discovery. Oxford University Press, Inc., New York (2002)

  51. Kononenko, I.: Estimating attributes: Analysis and extensions of RELIEF. In: Bergadano, F., De Raedt, L. (eds.) ECML 1994. LNCS, vol. 784, pp. 171–182. Springer, Heidelberg (1994)

  52. Leake, D.B. (ed.): Case-Based Reasoning: Experiences, Lessons and Future Directions. AAAI Press/MIT Press (1996)

  53. Li, J., Dong, G., Ramamohanarao, K., Wong, L.: DeEPs: a new instance-based discovery and classification system. Machine Learning (2003) (to appear)

  54. Li, J., Ramamohanarao, K., Dong, G.: Combining the strength of pattern frequency and distance for classification. In: Proceedings of the Fifth Pacific-Asia Conference on Knowledge Discovery and Data Mining, Hong Kong, pp. 455–466 (2001)

  55. Lin, K.I., Jagadish, H.V., Faloustos, C.: The TV-tree: an index structure for high dimensional data. VLDB Journal 3(4), 517–542 (1994)

  56. Lowe, D.: Similarity metric learning for a variable kernel classifier. Neural Computation 7, 72–85 (1995)

  57. Luce, D.R., Raiffa, H.: Games and Decisions. Wiley, New York (1957)

  58. Macleod, J.E.S., Luk, A., Titterington, D.M.: A re-examination of the distance-weighted k-nearest-neighbor classification rule. IEEE Transactions on Systems, Man and Cybernetics 17(4), 689–696 (1987)

  59. Michalski, R.S.: A theory and methodology of inductive learning. Artificial Intelligence 20, 111–161 (1983)

  60. Michalski, R.S., Mozetic, I., Hong, J., Lavrac, H.: The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. In: Proceedings of the Fifth National Conference on Artificial Intelligence, pp. 1041–1045 (1986)

  61. Mitchell, T.M.: Machine Learning. McGraw-Hill, Portland (1997)

  62. Nievergelt, J., Hinterberger, H., Sevcik, K.: The grid file: an adaptable symmetric multikey file structure. ACM Transactions on Database Systems 9(1), 38–71 (1984)

  63. Pawlak, Z.: Rough Sets - Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht (1991)

  64. Polkowski, L., Skowron, A.: Synthesis of decision systems from data tables. In: Lin, T.Y., Cercone, N. (eds.) Rough Sets and Data Mining: Analysis of Imprecise Data, pp. 259–299. Kluwer Academic Publishers, Dordrecht (1997)

  65. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993)

  66. Robinson, J.: The K-D-B-tree: a search structure for large multi-dimensional dynamic indexes. In: Proceedings of the 1981 ACM SIGMOD International Conference on Management of Data, New York, pp. 10–18 (1981)

  67. Rosenblueth, A., Wiener, N., Bigelow, J.: Behavior, purpose, and teleology. Philosophy of Science 10, 18–24 (1943)

  68. Russell, S.J.: Use of Knowledge in Analogy and Induction. Morgan Kaufmann, San Francisco (1989)

  69. Salzberg, S.: A nearest hyperrectangle learning method. Machine Learning 2, 229–246 (1991)

  70. Savaresi, S.M., Boley, D.L.: On the performance of bisecting K-means and PDDP. In: Proceedings of the First SIAM International Conference on Data Mining, Chicago, USA, pp. 1–14 (2001)

  71. Sellis, T., Roussopoulos, N., Faloustos, C.: The R+-tree: a dynamic index for multi-dimensional objects. In: Proceedings of the Thirteenth International Conference on Very Large Databases, pp. 574–584 (1987)

  72. Shepard, R.N.: Toward a universal law of generalization for psychological science. Science 237, 1317–1323 (1987)

  73. Skowron, A., et al.: Rough set exploration system. Institute of Mathematics, Warsaw University, Poland, http://logic.mimuw.edu.pl/~rses

  74. Skowron, A., Rauszer, C.: The discernibility matrices and functions in information systems. In: Slowinski, R. (ed.) Intelligent Decision Support, Handbook of Applications and Advances of the Rough Sets Theory, pp. 331–362. Kluwer Academic Publishers, Dordrecht (1992)

  75. Skowron, A., Stepaniuk, J.: Information granules and rough-neural computing. In: Rough-Neural Computing: Techniques for Computing with Words. Cognitive Technologies, pp. 43–84. Springer-Verlag, Heidelberg (2003)

  76. Skowron, A., Wojna, A.G.: K nearest neighbors classification with local induction of the simple value difference metric. In: Tsumoto, S., Słowiński, R., Komorowski, J., Grzymała-Busse, J.W. (eds.) RSCTC 2004. LNCS (LNAI), vol. 3066, pp. 229–234. Springer, Heidelberg (2004)

  77. Stanfill, C., Waltz, D.: Toward memory-based reasoning. Communications of the ACM 29(12), 1213–1228 (1986)

  78. Uhlmann, J.: Satisfying general proximity/similarity queries with metric trees. Information Processing Letters 40(4), 175–179 (1991)

  79. Vapnik, V.: Statistical Learning Theory. Wiley, Chichester (1998)

  80. Veloso, M.: Planning and Learning by Analogical Reasoning. Springer, Heidelberg (1994)

  81. von Neumann, J., Morgenstern, O.: Theory of Games and Economic Behavior. Princeton University Press, Princeton (1944)

  82. Ward Jr., J.: Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association 58, 236–244 (1963)

  83. Weber, R., Schek, H.J., Blott, S.: A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: Proceedings of the Twenty Fourth International Conference on Very Large Databases, pp. 194–205 (1998)

  84. Wettschereck, D.: A Study of Distance-Based Machine Learning Algorithms. PhD thesis, Oregon State University (1994)

  85. Wettschereck, D., Aha, D.W., Mohri, T.: A review and empirical evaluation of feature weighting methods for a class of lazy learning algorithms. Artificial Intelligence Review 11, 273–314 (1997)

  86. White, D.A., Jain, R.: Similarity indexing with the SS-tree. In: Proceedings of the Twelfth International Conference on Data Engineering, New Orleans, USA, pp. 516–523 (1996)

  87. Wiener, N.: Cybernetics. Wiley, New York (1948)

  88. Wilson, D.R., Martinez, T.R.: Improved heterogeneous distance functions. Journal of Artificial Intelligence Research 6, 1–34 (1997)

  89. Wilson, D.R., Martinez, T.R.: An integrated instance-based learning algorithm. Computational Intelligence 16(1), 1–28 (2000)

  90. Wojna, A.G.: Adaptacyjne definiowanie funkcji boolowskich z przykladow (Adaptive definition of Boolean functions from examples). Master's thesis, Warsaw University (2000)

  91. Wojna, A.G.: Center-based indexing for nearest neighbors search. In: Proceedings of the Third IEEE International Conference on Data Mining, Melbourne, Florida, USA, pp. 681–684. IEEE Computer Society Press, Los Alamitos (2003)

  92. Wojna, A.G.: Center-based indexing in vector and metric spaces. Fundamenta Informaticae 56(3), 285–310 (2003)

  93. Wolpert, D.: Constructing a generalizer superior to NETtalk via mathematical theory of generalization. Neural Networks 3, 445–452 (1989)

  94. Wróblewski, J.: Covering with reducts - a fast algorithm for rule generation. In: Polkowski, L., Skowron, A. (eds.) RSCTC 1998. LNCS (LNAI), vol. 1424, pp. 402–407. Springer, Heidelberg (1998)

  95. Yianilos, P.N.: Data structures and algorithms for nearest neighbor search in general metric spaces. In: Proceedings of the Fourth Annual ACM/SIGACT-SIAM Symposium on Discrete Algorithms, Austin, Texas, pp. 311–321 (1993)

  96. Zavrel, J.: An empirical re-examination of weighted voting for k-nn. In: Proceedings of the Seventh Belgian-Dutch Conference on Machine Learning, Tilburg, The Netherlands, pp. 139–148 (1997)




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wojna, A. (2005). Analogy-Based Reasoning in Classifier Construction. In: Peters, J.F., Skowron, A. (eds) Transactions on Rough Sets IV. Lecture Notes in Computer Science, vol 3700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11574798_11


  • DOI: https://doi.org/10.1007/11574798_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29830-4

  • Online ISBN: 978-3-540-32016-6

  • eBook Packages: Computer Science (R0)
