Abstract
Analogy-based reasoning methods in machine learning make it possible to reason about the properties of objects on the basis of similarities between objects. A well-known similarity-based method is the k nearest neighbors (k-nn) classification algorithm: a decision about a new object x is inferred from a fixed number k of the objects most similar to x in a given set of examples. The primary contribution of the dissertation is the introduction of two new classification models based on the k-nn algorithm.
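In its basic form the k-nn rule takes only a few lines of code. The sketch below is illustrative rather than the dissertation's exact setting: it assumes numeric feature vectors, Euclidean distance, and simple majority voting.

```python
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest examples.
    `train` is a list of (feature_vector, label) pairs."""
    def dist(a, b):
        # Euclidean distance on numeric feature vectors (an assumption
        # of this sketch; any metric could be plugged in here).
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    # Sort the examples by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda ex: dist(ex[0], query))[:k]
    # Majority vote over the labels of those k neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

For example, with four training points in two clusters, `knn_classify([((0, 0), 'a'), ((0, 1), 'a'), ((5, 5), 'b'), ((6, 5), 'b')], (1, 0), k=3)` returns `'a'`.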
The first model is a hybrid combination of the k-nn algorithm with rule induction. The combination uses minimal consistent rules defined by local reducts of a set of examples. To make the combination possible, the model of minimal consistent rules is generalized to a metric-dependent form. An effective polynomial-time algorithm implementing classification with minimal consistent rules was proposed by Bazan. We modify this algorithm so that adding it to the k-nn algorithm increases the computation time only insignificantly. For some of the tested classification problems the combined model was significantly more accurate than the classical k-nn classification algorithm.
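The general idea of letting lazily constructed minimal consistent rules filter the k-nn vote can be conveyed by a toy sketch. The code below is a hedged illustration of that idea over nominal attributes with Hamming distance, not the dissertation's or Bazan's actual algorithm: a neighbor votes only when the minimal rule matching both the test object and that neighbor is consistent with all training examples.

```python
from collections import Counter

def rule_consistent(train, query, neighbor_x, neighbor_y):
    """Build the minimal rule whose conditions are the attributes on which
    the neighbor agrees with the query, and check that no training example
    matching those conditions carries a different decision."""
    conds = [i for i, (q, n) in enumerate(zip(query, neighbor_x)) if q == n]
    for x, y in train:
        if y != neighbor_y and all(x[i] == query[i] for i in conds):
            return False  # a matching counterexample breaks consistency
    return True

def rule_filtered_knn(train, query, k=3):
    """k-nn over nominal attributes in which only neighbors supported by a
    consistent minimal rule are allowed to vote."""
    hamming = lambda a, b: sum(x != y for x, y in zip(a, b))
    neighbors = sorted(train, key=lambda ex: hamming(ex[0], query))[:k]
    votes = Counter(y for x, y in neighbors
                    if rule_consistent(train, query, x, y))
    if not votes:
        # Fall back to plain k-nn voting when no neighbor has a consistent rule.
        votes = Counter(y for _, y in neighbors)
    return votes.most_common(1)[0][0]
```

Because the rules are built only for the neighbors of a given test object, the extra cost per query stays polynomial in the number of examples, which is the property the abstract refers to.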
For many real-life problems it is impossible to induce relevant global mathematical models from the available sets of examples. The second model proposed in the dissertation deals with such sets by means of locally induced metrics. The method adapts the notion of similarity to the properties of a given test object, making it possible to select the correct decision in specific fragments of the space of objects. On the hardest tested problems, the method with local metrics significantly improved the classification accuracy of methods based on global models.
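The idea of local metric induction can be conveyed by a two-step scheme: a global metric first selects a neighborhood of the test object, and a metric induced from that neighborhood then re-ranks the candidates. In the sketch below the per-attribute weights (the gap between local class means) are a deliberately simple stand-in heuristic, not the dissertation's construction.

```python
from collections import Counter

def local_metric_knn(train, query, k=3, pool=20):
    """k-nn with a locally induced weighted metric (illustrative sketch)."""
    def dist(a, b, w):
        return sum(wi * (x - y) ** 2 for x, y, wi in zip(a, b, w)) ** 0.5
    n_attr = len(query)
    unit = [1.0] * n_attr
    # Step 1: a global (unweighted) metric selects a larger neighborhood.
    candidates = sorted(train, key=lambda ex: dist(ex[0], query, unit))[:pool]
    # Step 2: induce local weights from that neighborhood; here an attribute
    # is weighted by the gap between its per-class mean values, so attributes
    # that separate the classes well near the query dominate the metric.
    labels = sorted(set(y for _, y in candidates))
    weights = []
    for i in range(n_attr):
        means = [sum(x[i] for x, y in candidates if y == c)
                 / sum(1 for _, y in candidates if y == c)
                 for c in labels]
        weights.append(max(means) - min(means) + 1e-9)
    # Step 3: re-rank the candidates with the locally weighted metric.
    neighbors = sorted(candidates, key=lambda ex: dist(ex[0], query, weights))[:k]
    return Counter(y for _, y in neighbors).most_common(1)[0][0]
```

The point of the second pass is that the weights are recomputed for every test object, so the notion of similarity adapts to the fragment of the space the object falls into.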
Two issues decisive for the quality and efficiency of k-nn-based methods are the similarity measure and the time spent searching for the most similar objects in a given set of examples. Both issues are studied in detail in the dissertation, and significant improvements are proposed to the similarity measures and the search methods found in the literature.
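As an example of a similarity measure designed for nominal attributes, the Value Difference Metric of Stanfill and Waltz compares the estimated conditional class distributions of attribute values; a minimal sketch:

```python
from collections import Counter, defaultdict

def vdm_tables(train, n_attr):
    """Estimate P(class | attribute = value) from the training examples."""
    counts = [defaultdict(Counter) for _ in range(n_attr)]
    for x, y in train:
        for i, v in enumerate(x):
            counts[i][v][y] += 1
    classes = sorted(set(y for _, y in train))
    tables = [{v: {c: cnt[c] / sum(cnt.values()) for c in classes}
               for v, cnt in counts[i].items()}
              for i in range(n_attr)]
    return tables, classes

def vdm_distance(tables, classes, a, b):
    """Distance between two objects: for every attribute, the L1 distance
    between the conditional class distributions of their values."""
    return sum(sum(abs(tables[i][a[i]][c] - tables[i][b[i]][c])
                   for c in classes)
               for i in range(len(a)))
```

Under this measure two attribute values are close when they predict the decision classes similarly, regardless of their surface form, which is why VDM-style metrics are a natural fit for the nominal attributes studied in the dissertation.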
References
Aggarwal, C.C., Hinneburg, A., Keim, D.A.: On the surprising behavior of distance metrics in high dimensional space. In: Proceedings of the Eighth International Conference on Database Theory, London, UK, pp. 420–434 (2001)
Aha, D.W.: Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms. International Journal of Man-Machine Studies 36, 267–287 (1992)
Aha, D.W.: The omnipresence of case-based reasoning in science and applications. Knowledge-Based Systems 11(5-6), 261–273 (1998)
Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. Machine Learning 6, 37–66 (1991)
Ajdukiewicz, K.: Logika Pragmatyczna. PWN, Warszawa (1974)
Bazan, J.G.: Discovery of decision rules by matching new objects against data tables. In: Polkowski, L., Skowron, A. (eds.) RSCTC 1998. LNCS (LNAI), vol. 1424, pp. 521–528. Springer, Heidelberg (1998)
Bazan, J.G., Szczuka, M.: RSES and RSESlib - a collection of tools for rough set computations. In: Ziarko, W.P., Yao, Y. (eds.) RSCTC 2000. LNCS (LNAI), vol. 2005, pp. 106–113. Springer, Heidelberg (2001)
Bazan, J.G., Szczuka, M., Wojna, A.G., Wojnarski, M.: On the evolution of Rough Set Exploration System. In: Tsumoto, S., Słowiński, R., Komorowski, J., Grzymała-Busse, J.W. (eds.) RSCTC 2004. LNCS (LNAI), vol. 3066, pp. 592–601. Springer, Heidelberg (2004)
Beckmann, N., Kriegel, H.P., Schneider, R., Seeger, B.: The R ⋆ -tree: an efficient and robust access method for points and rectangles. In: Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, Atlantic City, NJ, pp. 322–331 (1990)
Bellman, R.E.: Dynamic Programming. Princeton University Press, Princeton (1957)
Bentley, J.L.: Multidimensional binary search trees used for associative searching. Communications of the ACM 18(9), 509–517 (1975)
Berchtold, S., Keim, D., Kriegel, H.P.: The X-tree: an index structure for high dimensional data. In: Proceedings of the Twenty Second International Conference on Very Large Databases, pp. 28–39 (1996)
Beyer, K.S., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is “nearest neighbor” meaningful? In: Proceedings of the Seventh International Conference on Database Theory, Jerusalem, Israel, pp. 217–235 (1999)
Biberman, Y.: A context similarity measure. In: Proceedings of the Ninth European Conference on Machine Learning, Catania, Italy, pp. 49–63 (1994)
Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1996)
Blake, C.L., Merz, C.J.: UCI repository of machine learning databases. Department of Information and Computer Science. University of California, Irvine (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
Breiman, L.: Statistical modeling - the two cultures. Statistical Science 16(3), 199–231 (2001)
Brin, S.: Near neighbor search in large metric spaces. In: Proceedings of the Twenty First International Conference on Very Large Databases, pp. 574–584 (1995)
Chavez, E., Navarro, G., Baeza-Yates, R., Marroquin, J.L.: Searching in metric spaces. Technical Report TR/DCC-99-3, Department of Computer Science. University of Chile (1999)
Ciaccia, P., Patella, M., Zezula, P.: M-tree: an efficient access method for similarity search in metric spaces. In: Proceedings of the Twenty Third International Conference on Very Large Databases, pp. 426–435 (1997)
Clark, P., Niblett, T.: The CN2 induction algorithm. Machine Learning 3, 261–284 (1989)
Cost, S., Salzberg, S.: A weighted nearest neighbor algorithm for learning with symbolic features. Machine Learning 10, 57–78 (1993)
Cover, T.M., Hart, P.E.: Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13, 21–27 (1967)
Domeniconi, C., Gunopulos, D.: Efficient local flexible nearest neighbor classification. In: Proceedings of the Second SIAM International Conference on Data Mining (2002)
Domingos, P.: Unifying instance-based and rule-based induction. Machine Learning 24(2), 141–168 (1996)
Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, New York (1973)
Dudani, S.: The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man and Cybernetics 6, 325–327 (1976)
Fikes, R.E., Nilsson, N.J.: STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence 2(3-4), 189–208 (1971)
Finkel, R., Bentley, J.: Quad-trees: a data structure for retrieval on composite keys. Acta Informatica 4(1), 1–9 (1974)
Fisher, R.A.: Applications of “Student’s” distribution. Metron 5, 3–17 (1925)
Fix, E., Hodges, J.L.: Discriminatory analysis, non-parametric discrimination: Consistency properties. Technical Report 4, USAF School of Aviation and Medicine, Randolph Air Field (1951)
Friedman, J.: Flexible metric nearest neighbor classification. Technical Report 113. Department of Statistics, Stanford University, CA (1994)
Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning. Springer, New York (2001)
Friedman, J.H., Kohavi, R., Yun, Y.: Lazy decision trees. In: Proceedings of the Thirteenth National Conference on Artificial Intelligence, Cambridge, pp. 717–724 (1996)
Fukunaga, K., Narendra, P.M.: A branch and bound algorithm for computing k-nearest neighbors. IEEE Transactions on Computers 24(7), 750–753 (1975)
Gaede, V., Gunther, O.: Multidimensional access methods. ACM Computing Surveys 30(2), 170–231 (1998)
Golding, A.R., Rosenbloom, P.S.: Improving accuracy by combining rule-based and case-based reasoning. Artificial Intelligence 87(1-2), 215–254 (1996)
Góra, G., Wojna, A.G.: Local attribute value grouping for lazy rule induction. In: Alpigini, J.J., Peters, J.F., Skowron, A., Zhong, N. (eds.) RSCTC 2002. LNCS (LNAI), vol. 2475, pp. 405–412. Springer, Heidelberg (2002)
Góra, G., Wojna, A.G.: RIONA: a classifier combining rule induction and k-nn method with automated selection of optimal neighbourhood. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) ECML 2002. LNCS (LNAI), vol. 2430, pp. 111–123. Springer, Heidelberg (2002)
Góra, G., Wojna, A.G.: RIONA: a new classification system combining rule induction and instance-based learning. Fundamenta Informaticae 51(4), 369–390 (2002)
Gosset, W.S. (Student): The probable error of a mean. Biometrika 6, 1–25 (1908)
Grzymala-Busse, J.W.: LERS - a system for learning from examples based on rough sets. In: Slowinski, R. (ed.) Intelligent Decision Support, Handbook of Applications and Advances of the Rough Sets Theory, pp. 3–18. Kluwer Academic Publishers, Dordrecht (1992)
Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, Boston, MA, pp. 47–57 (1984)
Hastie, T., Tibshirani, R.: Discriminant adaptive nearest neighbor classification. IEEE Transactions on Pattern Analysis and Machine Intelligence 18(6), 607–616 (1996)
Jensen, F.V.: An Introduction to Bayesian Networks. Springer, New York (1996)
Kalantari, I., McDonald, G.: A data structure and an algorithm for the nearest point problem. IEEE Transactions on Software Engineering 9(5), 631–634 (1983)
Katayama, N., Satoh, S.: The SR-tree: an index structure for high dimensional nearest neighbor queries. In: Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, Tucson, Arizona, pp. 369–380 (1997)
Kira, K., Rendell, L.A.: A practical approach to feature selection. In: Proceedings of the Ninth International Conference on Machine Learning, Aberdeen, Scotland, pp. 249–256. Morgan Kaufmann, San Francisco (1992)
Kleinberg, J., Papadimitriou, C., Raghavan, P.: Segmentation problems. Journal of the ACM 51(2), 263–280 (2004)
Klösgen, W., Żytkow, J.M. (eds.): Handbook of Data Mining and Knowledge Discovery. Oxford University Press, Inc., New York (2002)
Kononenko, I.: Estimating attributes: Analysis and extensions of RELIEF. In: Bergadano, F., De Raedt, L. (eds.) ECML 1994. LNCS, vol. 784, pp. 171–182. Springer, Heidelberg (1994)
Leake, D.B. (ed.): Case-Based Reasoning: Experiences, Lessons and Future Directions. AAAI Press/MIT Press (1996)
Li, J., Dong, G., Ramamohanarao, K., Wong, L.: DeEPs: a new instance-based discovery and classification system. Machine Learning (2003) (to appear)
Li, J., Ramamohanarao, K., Dong, G.: Combining the strength of pattern frequency and distance for classification. In: Proceedings of the Fifth Pacific-Asia Conference on Knowledge Discovery and Data Mining, Hong Kong, pp. 455–466 (2001)
Lin, K.I., Jagadish, H.V., Faloutsos, C.: The TV-tree: an index structure for high dimensional data. VLDB Journal 3(4), 517–542 (1994)
Lowe, D.: Similarity metric learning for a variable kernel classifier. Neural Computation 7, 72–85 (1995)
Luce, R.D., Raiffa, H.: Games and Decisions. Wiley, New York (1957)
Macleod, J.E.S., Luk, A., Titterington, D.M.: A re-examination of the distance-weighted k-nearest-neighbor classification rule. IEEE Transactions on Systems, Man and Cybernetics 17(4), 689–696 (1987)
Michalski, R.S.: A theory and methodology of inductive learning. Artificial Intelligence 20, 111–161 (1983)
Michalski, R.S., Mozetic, I., Hong, J., Lavrac, N.: The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. In: Proceedings of the Fifth National Conference on Artificial Intelligence, pp. 1041–1045 (1986)
Mitchell, T.M.: Machine Learning. McGraw-Hill, Portland (1997)
Nievergelt, J., Hinterberger, H., Sevcik, K.: The grid file: an adaptable symmetric multikey file structure. ACM Transactions on Database Systems 9(1), 38–71 (1984)
Pawlak, Z.: Rough Sets - Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht (1991)
Polkowski, L., Skowron, A.: Synthesis of decision systems from data tables. In: Lin, T.Y., Cercone, N. (eds.) Rough Sets and Data Mining: Analysis of Imprecise Data, pp. 259–299. Kluwer Academic Publishers, Dordrecht (1997)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993)
Robinson, J.: The K-D-B-tree: a search structure for large multi-dimensional dynamic indexes. In: Proceedings of the 1981 ACM SIGMOD International Conference on Management of Data, New York, pp. 10–18 (1981)
Rosenblueth, A., Wiener, N., Bigelow, J.: Behavior, purpose, and teleology. Philosophy of Science 10, 18–24 (1943)
Russell, S.J.: Use of Knowledge in Analogy and Induction. Morgan Kaufmann, San Francisco (1989)
Salzberg, S.: A nearest hyperrectangle learning method. Machine Learning 2, 229–246 (1991)
Savaresi, S.M., Boley, D.L.: On the performance of bisecting K-means and PDDP. In: Proceedings of the First SIAM International Conference on Data Mining, Chicago, USA, pp. 1–14 (2001)
Sellis, T., Roussopoulos, N., Faloutsos, C.: The R+-tree: a dynamic index for multi-dimensional objects. In: Proceedings of the Thirteenth International Conference on Very Large Databases, pp. 574–584 (1987)
Shepard, R.N.: Toward a universal law of generalization for psychological science. Science 237, 1317–1323 (1987)
Skowron, A., et al.: Rough set exploration system. Institute of Mathematics, Warsaw University, Poland, http://logic.mimuw.edu.pl/~rses
Skowron, A., Rauszer, C.: The discernibility matrices and functions in information systems. In: Slowinski, R. (ed.) Intelligent Decision Support, Handbook of Applications and Advances of the Rough Sets Theory, pp. 331–362. Kluwer Academic Publishers, Dordrecht (1992)
Skowron, A., Stepaniuk, J.: Information granules and rough-neural computing. In: Rough-Neural Computing: Techniques for Computing with Words. Cognitive Technologies, pp. 43–84. Springer-Verlag, Heidelberg (2003)
Skowron, A., Wojna, A.G.: K nearest neighbors classification with local induction of the simple value difference metric. In: Tsumoto, S., Słowiński, R., Komorowski, J., Grzymała-Busse, J.W. (eds.) RSCTC 2004. LNCS (LNAI), vol. 3066, pp. 229–234. Springer, Heidelberg (2004)
Stanfill, C., Waltz, D.: Toward memory-based reasoning. Communications of the ACM 29(12), 1213–1228 (1986)
Uhlmann, J.: Satisfying general proximity/similarity queries with metric trees. Information Processing Letters 40(4), 175–179 (1991)
Vapnik, V.: Statistical Learning Theory. Wiley, Chichester (1998)
Veloso, M.: Planning and Learning by Analogical Reasoning. Springer, Heidelberg (1994)
von Neumann, J., Morgenstern, O.: Theory of Games and Economic Behavior. Princeton University Press, Princeton (1944)
Ward Jr, J.: Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association 58, 236–244 (1963)
Weber, R., Schek, H.J., Blott, S.: A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: Proceedings of the Twenty Fourth International Conference on Very Large Databases, pp. 194–205 (1998)
Wettschereck, D.: A Study of Distance-Based Machine Learning Algorithms. PhD thesis, Oregon State University (1994)
Wettschereck, D., Aha, D.W., Mohri, T.: A review and empirical evaluation of feature weighting methods for a class of lazy learning algorithms. Artificial Intelligence Review 11, 273–314 (1997)
White, D.A., Jain, R.: Similarity indexing with the SS-tree. In: Proceedings of the Twelfth International Conference on Data Engineering, New Orleans, USA, pp. 516–523 (1996)
Wiener, N.: Cybernetics. Wiley, New York (1948)
Wilson, D.R., Martinez, T.R.: Improved heterogeneous distance functions. Journal of Artificial Intelligence Research 6, 1–34 (1997)
Wilson, D.R., Martinez, T.R.: An integrated instance-based learning algorithm. Computational Intelligence 16(1), 1–28 (2000)
Wojna, A.G.: Adaptacyjne definiowanie funkcji boolowskich z przykladow. Master’s thesis, Warsaw University (2000)
Wojna, A.G.: Center-based indexing for nearest neighbors search. In: Proceedings of the Third IEEE International Conference on Data Mining, Melbourne, Florida, USA, pp. 681–684. IEEE Computer Society Press, Los Alamitos (2003)
Wojna, A.G.: Center-based indexing in vector and metric spaces. Fundamenta Informaticae 56(3), 285–310 (2003)
Wolpert, D.: Constructing a generalizer superior to NETtalk via a mathematical theory of generalization. Neural Networks 3, 445–452 (1989)
Wróblewski, J.: Covering with reducts - a fast algorithm for rule generation. In: Polkowski, L., Skowron, A. (eds.) RSCTC 1998. LNCS (LNAI), vol. 1424, pp. 402–407. Springer, Heidelberg (1998)
Yianilos, P.N.: Data structures and algorithms for nearest neighbor search in general metric spaces. In: Proceedings of the Fourth Annual ACM/SIGACT-SIAM Symposium on Discrete Algorithms, Austin, Texas, pp. 311–321 (1993)
Zavrel, J.: An empirical re-examination of weighted voting for k-nn. In: Proceedings of the Seventh Belgian-Dutch Conference on Machine Learning, Tilburg, The Netherlands, pp. 139–148 (1997)
© 2005 Springer-Verlag Berlin Heidelberg
Wojna, A. (2005). Analogy-Based Reasoning in Classifier Construction. In: Peters, J.F., Skowron, A. (eds) Transactions on Rough Sets IV. Lecture Notes in Computer Science, vol 3700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11574798_11