Abstract
Feature extraction based on evolutionary search offers new possibilities for improving classification accuracy and reducing measurement complexity in many data mining and machine learning applications. We present a family of genetic algorithms for feature synthesis through clustering of discrete attribute values. The approach uses a new compact graph-based encoding for cluster representation, in which the size of the GA search space is reduced exponentially with respect to the number of items being partitioned, compared with the original scheme of Park and Song. We apply the developed algorithms to DNA fingerprinting in population genetics and to text categorization, and study their effectiveness in both domains.
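The abstract does not spell out the encoding, so what follows is only a minimal Python sketch under an explicit assumption: a generic graph-based genotype in which gene i stores the index of one item that attribute value i is linked to, and the clusters are the connected components of the resulting graph. This is not necessarily the paper's compact encoding or Park and Song's. It shows how such a genotype can be decoded into a partition of discrete attribute values, and how that partition synthesizes a new, coarser feature. The function names `decode_partition` and `recode_attribute` are hypothetical, and every gene is assumed to lie in the range 0 to n-1.

```python
from typing import Hashable, List, Sequence


def decode_partition(genotype: Sequence[int]) -> List[int]:
    """Decode a link-based genotype into cluster labels.

    Hypothetical encoding (assumed for illustration): gene i holds the index
    of an item that item i is linked to; clusters are the connected
    components of the undirected graph formed by these links.
    """
    n = len(genotype)
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the components joined by each link i -> genotype[i].
    for i, j in enumerate(genotype):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    labels = []
    for i in range(n):
        root = find(i)
        if root not in ids:
            ids[root] = len(ids)
        labels.append(ids[root])
    return labels


def recode_attribute(column: Sequence[Hashable],
                     values: Sequence[Hashable],
                     genotype: Sequence[int]) -> List[int]:
    """Synthesize a coarser feature: map each value to its cluster id."""
    labels = decode_partition(genotype)
    cluster_of = {v: labels[k] for k, v in enumerate(values)}
    return [cluster_of[x] for x in column]


if __name__ == "__main__":
    values = ["A", "C", "G", "T"]      # distinct values of one attribute
    genotype = [2, 1, 0, 3]            # links A-G; C and T link to themselves
    print(decode_partition(genotype))  # [0, 1, 0, 2]
    print(recode_attribute(["A", "T", "G", "C", "A"], values, genotype))
    # [0, 2, 0, 1, 0]
```

In this toy example the merged feature has three distinct values instead of four. In a GA, each genotype's recoded feature would then be scored by some fitness measure (for instance, the accuracy of a classifier trained on it), which this sketch deliberately leaves out.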
References
Devijver P.A. and Kittler J.: Pattern Recognition: A Statistical Approach. Prentice-Hall International, (1982)
Jain A.K., Duin R.P. and Mao J.: Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, (2000) 4–37
Freitas A.A.: Data Mining and Knowledge Discovery with Evolutionary Algorithms. Springer-Verlag, (2002)
Kohavi R. and John G.: Wrappers for Feature Subset Selection. Artificial Intelligence Journal 97(1–2), (1997) 273–324
Siedlecki W. and Sklansky J.: On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence 2, (1988) 197–220
Vafaie H. and De Jong K.: Robust feature selection algorithms. In Proc. of the 5th IEEE International Conference on Tools with Artificial Intelligence, Boston, MA, (1993) 356–363
Whitley D., Beveridge R., Guerra C. and Graves C.: Messy Genetic Algorithms for Subset Feature Selection. In: Baeck T. (ed.): Proc. International Conference on Genetic Algorithms. Morgan Kaufmann, (1997)
Yang J. and Honavar V.: Feature Subset Selection Using a Genetic Algorithm. In: Liu H. and Motoda H. (eds.): Feature Extraction, Construction and Selection: A Data Mining Perspective. Kluwer, New York, (1998)
Punch W.F., Goodman E.D., Pei M., Chia-Shun L., Hovland P. and Enbody R.: Further Research on Feature Selection and Classification Using Genetic Algorithms. In Proc. 5th International Conference on Genetic Algorithms, Urbana-Champaign IL, (1993) 557–562
Raymer M., Punch W., Goodman E., Sanschagrin P. and Kuhn L.: Simultaneous Feature Extraction and Selection Using a Masking Genetic Algorithm. In Proc. of the 7th International Conference on Genetic Algorithms (ICGA), San Francisco, CA, (1997) 561–567
Vafaie H. and De Jong K.: Feature Space Transformation Using Genetic Algorithms. IEEE Intelligent Systems 13(2), (1998) 57–65
Lin C. and Wu J.: Automatic facial feature extraction by genetic algorithms. IEEE Transactions on Image Processing 8(6), (1999) 834–845
Raymer M.L., Punch W.F., Goodman E.D., Kuhn L.A. and Jain A.K.: Dimensionality Reduction Using Genetic Algorithms. IEEE Transactions on Evolutionary Computation 4(2), (2000) 164–171
Brumby S.P., Theiler J., Perkins S.J., Harvey N.R., Szymanski J.J., Bloch J.J. and Mitchell M.: Investigation of Feature Extraction by a Genetic Algorithm. Proc. SPIE 3812, (1999) 24–31
Larsen O., Freitas A.A. and Nievola J.C.: Constructing X-of-N attributes with a genetic algorithm. In Proc. 4th Int. Conf. on Recent Advances in Soft Computing, (2002) 326–331
Pudil P. and Novovičová J.: Feature Subset Selection Using a Genetic Algorithm. In: Liu H. and Motoda H. (eds.): Feature Extraction, Construction and Selection: A Data Mining Perspective. Kluwer, (1998)
Martin-Bautista M. and Vila M.-A.: A survey of genetic feature selection in mining issues. In Proceedings of the Congress on Evolutionary Computation (CEC 99), (1999) 13–23
Falkenauer E.: Genetic Algorithms and Grouping Problems. John Wiley & Sons, (1998)
Park Y-J. and Song M-S.: A genetic algorithm for clustering problems. In Proc. 3rd Annual Conf. on Genetic Programming, (1998) 568–575
Trunk G.V.: A problem of dimensionality: a simple example. IEEE Transactions on Pattern Analysis and Machine Intelligence 1, (1979) 306–307
Minker J., Wilson G.A. and Zimmerman B.H.: An evaluation of query expansion by the addition of clustered terms for a document retrieval system. Information Storage and Retrieval 8(6), (1972) 329–348
Sparck Jones K. and Jackson D.M.: The use of automatically-obtained keyword classifications for information retrieval. Information Processing and Management 5, (1970) 175–201
Merzbacher M. and Chu W.W.: Pattern-based clustering for database attribute values. In Proc. of AAAI Workshop on Knowledge Discovery in Databases, Washington, D.C., (1993)
Tishby N., Pereira F.C. and Bialek W.: The information bottleneck method. In Proc. of the 37th Annual Allerton Conference on Communication, Control and Computing, (1999) 368–377
Slonim N. and Tishby N.: Agglomerative Information Bottleneck. In Advances in Neural Information Processing Systems (NIPS-12), MIT Press, (1999) 617–623
Friedman N., Mosenzon O., Slonim N. and Tishby N.: Multivariate Information Bottleneck. In Proc. of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI), (2001)
Slonim N., Friedman N. and Tishby N.: Agglomerative Multivariate Information Bottleneck. In Advances in Neural Information Processing Systems (NIPS-14), (2001)
O’Connell J.R. and Weeks D.E.: The VITESSE algorithm for rapid exact multilocus linkage analysis via genotype set-recoding and fuzzy inheritance. Nature Genetics 11, (1995) 402–408
Friedman N., Geiger D. and Lotner N.: Likelihood Computation with Value Abstraction. In Proc. of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI), (2000)
Chartrand G. and Oellermann O.R.: Applied and Algorithmic Graph Theory. McGraw-Hill, New York, (1993)
Bollobás B.: Random Graphs. Academic Press, London, (1985)
Waser P.M. and Strobeck C.: Genetic signatures of interpopulation dispersal. Trends Ecol Evol 13, (1998) 43–44
Guinand B., Topchy A., Page K.S., Burnham-Curtis M.K., Punch W.F. and Scribner K.T.: Comparisons of likelihood and machine learning methods of individual classification. Journal of Heredity 93(4), (2002) 260–269
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Topchy, A., Punch, W. (2003). Dimensionality Reduction via Genetic Value Clustering. In: Cantú-Paz, E., et al. Genetic and Evolutionary Computation — GECCO 2003. GECCO 2003. Lecture Notes in Computer Science, vol 2724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45110-2_16
DOI: https://doi.org/10.1007/3-540-45110-2_16
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40603-7
Online ISBN: 978-3-540-45110-5
eBook Packages: Springer Book Archive