
Data Reduction for Pattern Recognition and Data Analysis

Chapter in: Computational Intelligence: A Compendium

Part of the book series: Studies in Computational Intelligence (SCI, volume 115)

Pattern recognition underlies many human activities of great practical significance, such as data-based bankruptcy prediction, speech/image recognition, machine fault detection, and cancer diagnosis. Clearly, it would be immensely useful to build machines that perform pattern recognition tasks reliably and efficiently. The most general and natural pattern recognition frameworks rely mainly on statistical characterizations of patterns, under the assumption that the patterns are generated by a probabilistic system. Research on neural pattern recognition has also been widely conducted over the past few decades; in contrast to statistical methods, neural frameworks require no a priori assumptions about the data. Although different pattern recognition systems employ different working mechanisms, their basic procedure is essentially the same. A typical pattern recognition procedure consists of three sequential parts: a sensing model for collecting and preprocessing raw data from real sites, a data processing model (which includes feature extraction/selection and pattern selection), and a recognition/classification model [13, 58]. In building a pattern recognition system, the following basic issues must be addressed:

  • How to process the raw data for a pattern recognition task? This issue concerns the sensing and preprocessing stage of pattern recognition;

  • How to determine appropriate data for a given pattern recognition model? This is a central concern of the data processing stage: deleting noisy or redundant data (both features and patterns) generally improves recognition performance, as the sketch following this list illustrates;

  • How to design an appropriate classifier for a given data set? This topic has been widely discussed in the pattern recognition community, and various learning algorithms and models have been proposed to make recognition as accurate as possible while keeping the classifier as simple as possible.
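The data processing stage is where data reduction takes place. Below is a minimal sketch of the two reduction steps just named, assuming a data matrix X (samples by features) and non-negative integer class labels y in a NumPy array; the helper names (mutual_information, rank_features, condense) are our own illustration, not the chapter's code. The feature ranking follows the general mutual-information idea of Battiti [3]; the pattern selection implements Hart's condensed nearest neighbour rule [25].

```python
import numpy as np

def mutual_information(f, y, bins=10):
    """Estimate I(F; Y) for one continuous feature f against discrete
    labels y, using a simple histogram discretization of f."""
    f_disc = np.digitize(f, np.histogram_bin_edges(f, bins=bins)[1:-1])
    joint = np.zeros((bins, y.max() + 1))
    for fi, yi in zip(f_disc, y):
        joint[fi, yi] += 1.0
    joint /= joint.sum()
    pf = joint.sum(axis=1, keepdims=True)   # marginal over feature bins
    py = joint.sum(axis=0, keepdims=True)   # marginal over class labels
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pf @ py)[nz])).sum())

def rank_features(X, y, k):
    """Feature selection: keep the k features that individually carry
    the most information about the class labels."""
    scores = np.array([mutual_information(X[:, j], y)
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

def condense(X, y):
    """Pattern selection (Hart's CNN): keep only the patterns a 1-NN
    rule needs to classify every training sample correctly."""
    keep = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            dists = np.linalg.norm(X[keep] - X[i], axis=1)
            if y[keep[int(np.argmin(dists))]] != y[i]:
                keep.append(i)   # misclassified: add it to the subset
                changed = True
    return np.array(keep)
```

For example, `cols = rank_features(X, y, k=20)` followed by `rows = condense(X[:, cols], y)` yields a reduced training set `X[rows][:, cols]`, `y[rows]` on which a downstream classifier can then be trained.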


References

  1. Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. National Academy of Sciences, 96(12): 6745-6750.

  2. Astrahan MM (1970) Speech analysis by clustering, or the hyperphoneme method. Stanford AI Project Memo, Stanford University, CA.

  3. Battiti R (1994) Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Networks, 5: 537-550.

  4. Bins J, Draper B (2001) Feature selection from huge feature sets. In: Proc. Intl. Conf. Computer Vision, July, Vancouver, Canada: 159-165.

  5. Bishop CM (1995) Neural Networks for Pattern Recognition. Oxford University Press, New York, NY.

  6. Blum AL, Langley P (1993) Selecting concise training sets from clean data. IEEE Trans. Neural Networks, 4(2): 305-318.

  7. Blum AL, Langley P (1997) Selection of relevant features and examples in machine learning. Artificial Intelligence, 97(1-2): 245-271.

  8. Bonnlander B (1996) Nonparametric selection of input variables for connectionist learning. PhD Thesis, Department of Computer Science, University of Colorado at Boulder, CU-CS-812-96.

  9. Caruana RA, Freitag D (1994) Greedy attribute selection. In: Cohen WW, Hirsh H (eds) Proc. 11th Intl. Conf. Machine Learning, New Brunswick, NJ, July. Morgan Kaufmann, San Francisco, CA: 28-36.

  10. Catlett J (1991) Megainduction: machine learning on very large databases. PhD Thesis, Department of Computer Science, University of Sydney, Australia.

  11. Chow TWS, Huang D (2005) Estimating optimal feature subsets using efficient estimation of high-dimensional mutual information. IEEE Trans. Neural Networks, 16(1): 213-224.

  12. Devijver PA, Kittler J (1982) Pattern Recognition: a Statistical Approach. Prentice Hall, Englewood Cliffs, NJ.

  13. Duda RO, Hart PE, Stork DG (2001) Pattern Classification. Wiley, New York, NY.

  14. Fraser AM, Swinney HL (1986) Independent coordinates for strange attractors from mutual information. Physical Review A, 33(2): 1134-1140.

  15. Freund Y, Seung H, Shamir E, Tishby N (1997) Selective sampling using the query by committee algorithm. Machine Learning, 28: 133-168.

  16. Friedman JH (1997) Data mining and statistics: what's the connection? In: Scott DW (ed) Proc. 29th Symp. Interface Between Computer Science and Statistics, Houston, TX, May (available online at http://www.stat.stanford.edu/~jhf/ftp/dm-stats.ps - last accessed March 2007).

  17. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286: 531-537.

  18. Gray RM (1984) Vector quantization. IEEE ASSP Magazine, 1(2): 4-29.

  19. Gui J, Li H (2005) Penalized Cox regression analysis in the high-dimensional and low sample size settings, with application to microarray gene expression. Bioinformatics, 21(13): 3001-3008.

  20. Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Machine Learning, 46: 389-422.

  21. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J. Machine Learning Research, 3: 1157-1183.

  22. Hall MA (1999) Correlation-based feature selection for machine learning. PhD Thesis, Department of Computer Science, University of Waikato, New Zealand.

  23. Hall MA, Holmes G (2000) Benchmarking attribute selection techniques for data mining. Working Paper 00/10, Department of Computer Science, University of Waikato, New Zealand (available online at http://citeseer.ist.psu.edu/382752.html - last accessed March 2007).

  24. Han JW, Kamber M (2001) Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco, CA.

  25. Hart PE (1968) The condensed nearest neighbour rule. IEEE Trans. Information Theory, 14: 515-516.

  26. Huang D, Chow TWS (2005) Efficiently searching the important input variables using Bayesian discriminant. IEEE Trans. Circuits and Systems - Part I, 52(4): 785-793.

  27. Huang D, Chow TWS (2006) Enhancing density-based data reduction using entropy. Neural Computation, 18: 470-495.

  28. Jain AK, Zongker D (1997) Feature selection: evaluation, application, and small sample performance. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(2): 153-158.

  29. John GH, Kohavi R, Pfleger K (1994) Irrelevant features and the subset selection problem. In: Cohen WW, Hirsh H (eds) Proc. 11th Intl. Conf. Machine Learning, New Brunswick, NJ, July. Morgan Kaufmann, San Francisco, CA: 121-129.

  30. John GH, Langley P (1996) Static versus dynamic sampling for data mining. In: Simoudis E, Han J, Fayyad UM (eds) Proc. 2nd Intl. Conf. Knowledge Discovery and Data Mining, Portland, OR, August. AAAI Press, Menlo Park, CA: 367-370.

  31. Kohavi R, John GH (1998) The wrapper approach. In: Liu H, Motoda H (eds) Feature Extraction, Construction and Selection. Kluwer Academic Publishers, New York, NY: 33-50.

  32. Kohonen T (2001) Self-Organizing Maps. Springer-Verlag, London, UK.

  33. Kudo M, Sklansky J (1997) A comparative evaluation of medium and large-scale feature selectors for pattern classifiers. In: Pudil P, Novovicova J, Grim J (eds) Proc. 1st Intl. Workshop Statistical Techniques in Pattern Recognition, Prague, Czech Republic, June: 91-96.

  34. Kudo M, Sklansky J (2000) Comparison of algorithms that select features for pattern classifiers. Pattern Recognition, 33: 25-41.

  35. Kwak N, Choi C-H (2002) Input feature selection for classification problems. IEEE Trans. Neural Networks, 13: 143-159.

  36. Kwak N, Choi C-H (2002) Input feature selection by mutual information based on Parzen window. IEEE Trans. Pattern Analysis and Machine Intelligence, 24(12): 1667-1671.

  37. Last M, Kandel A, Maimon O, Eberbach E (2000) Anytime algorithm for feature selection. In: Ziarko W, Yao Y (eds) Rough Sets and Current Trends in Computing (Proc. 2nd Intl. Conf. RSCTC), October, Banff, Canada. Springer-Verlag, London, UK: 16-19.

  38. Law M, Figueiredo M, Jain A (2002) Feature saliency in unsupervised learning. Technical Report, Department of Computer Science, Michigan State University (available at http://www.cse.msu.edu/~lawhiu/papers/TR02.ps.gz - last accessed March 2007).

  39. Lazzerini B, Marcelloni F (2001) Feature selection based on similarity. Electronics Letters, 38(3): 121-122.

  40. Lewis DD, Catlett J (1994) Heterogeneous uncertainty sampling for supervised learning. In: Cohen WW, Hirsh H (eds) Proc. 11th Intl. Conf. Machine Learning, New Brunswick, NJ, July. Morgan Kaufmann, San Francisco, CA: 148-156.

  41. Liu H, Motoda H (1998) Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic Publishers, London, UK.

  42. Liu H, Motoda H, Dash M (1998) A monotonic measure for optimal feature selection. In: Nedellec C, Rouveirol C (eds) Proc. European Conf. Machine Learning, Chemnitz, Germany, April. Springer-Verlag, London, UK: 101-106.

  43. Liu H, Motoda H, Yu L (2002) Feature selection with selective sampling. In: Sammut C, Hoffmann A (eds) Proc. 19th Intl. Conf. Machine Learning, Sydney, Australia, July. Morgan Kaufmann, San Francisco, CA: 395-402.

  44. MacKay D (1992) A practical Bayesian framework for backpropagation networks. Neural Computation, 4: 448-472.

  45. Mitra P, Murthy CA, Pal SK (2002) Density-based multi-scale data condensation. IEEE Trans. Pattern Analysis and Machine Intelligence, 24(6): 734-747.

  46. Mitra P, Murthy CA, Pal SK (2002) Unsupervised feature selection using feature similarity. IEEE Trans. Pattern Analysis and Machine Intelligence, 24(3): 301-312.

  47. Molina LC, Belanche L, Nebot A (2002) Feature selection algorithms: a survey and experimental evaluation. Technical Report, Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya.

  48. Moon Y, Rajagopalan B, Lall U (1995) Estimation of mutual information using kernel density estimators. Physical Review E, 52: 2318-2321.

  49. Moore J, Han E, Boley D, Gini M, Gross R, Hastings K, Karypis G, Kumar V, Mobasher B (1997) Web page categorization and feature selection using association rule and principal component clustering. Proc. 7th Intl. Workshop Information Technologies and Systems, Atlanta, GA, December (available online at http://citeseer.ist.psu.edu/15436.html - last accessed March 2007).

  50. Narendra PM, Fukunaga K (1977) A branch and bound algorithm for feature subset selection. IEEE Trans. Computers, C-26(9): 917-922.

  51. Pal SK, De RK, Basak J (2000) Unsupervised feature evaluation: a neuro-fuzzy approach. IEEE Trans. Neural Networks, 11(2): 366-376.

  52. Plutowski M, White H (1993) Selecting concise training sets from clean data. IEEE Trans. Neural Networks, 4(2): 305-318.

  53. Provost F, Kolluri V (1999) A survey of methods for scaling up inductive algorithms. Data Mining and Knowledge Discovery, 3(2): 131-169.

  54. Pudil P, Novovicova J, Kittler J (1994) Floating search methods in feature selection. Pattern Recognition Letters, 15: 1119-1125.

  55. Roy N, McCallum A (2001) Toward optimal active learning through sampling estimation of error reduction. In: Brodley CE, Danyluk AP (eds) Proc. 18th Intl. Conf. Machine Learning, Williamstown, MA, June. Morgan Kaufmann, San Francisco, CA: 441-448.

  56. Setiono R, Liu H (1997) Neural network feature selector. IEEE Trans. Neural Networks, 8(3): 654-661.

  57. Siedlecki W, Sklansky J (1989) A note on genetic algorithms for large-scale feature selection. Pattern Recognition Letters, 10: 335-347.

  58. Theodoridis S, Koutroumbas K (1998) Pattern Recognition. Academic Press, London, UK.

  59. Tong S, Koller D (2000) Support vector machine active learning with applications to text classification. In: Langley P (ed) Proc. 17th Intl. Conf. Machine Learning, Stanford, CA, June. Morgan Kaufmann, San Francisco, CA: 999-1006.

  60. Wang H, Bell D, Murtagh F (1999) Axiomatic approach to feature subset selection based on relevance. IEEE Trans. Pattern Analysis and Machine Intelligence, 21(3): 271-277.

  61. Wang W, Jones P, Partridge D (2001) A comparative study of feature-salience ranking techniques. Neural Computation, 13: 1603-1623.

  62. Weston J, Mukherjee S, Chapelle O, Pontil M, Poggio T, Vapnik V (2001) Feature selection for SVMs. In: Solla SA, Leen TK, Muller K-R (eds) Advances in Neural Information Processing Systems 13. MIT Press, Cambridge, MA: 668-674.

  63. Wilson DR, Martinez TR (2000) Reduction techniques for instance-based learning algorithms. Machine Learning, 38: 257-286.

  64. Wolf L, Shashua A (2003) Feature selection for unsupervised and supervised inference: the emergence of sparsity in a weight-based approach. Technical Report 2003-58, June, Hebrew University, Israel.

  65. Xing EP, Jordan MI, Karp RM (2001) Feature selection for high-dimensional genomic microarray data. In: Brodley CE, Danyluk AP (eds) Proc. 18th Intl. Conf. Machine Learning, Williamstown, MA, June. Morgan Kaufmann, San Francisco, CA.

  66. Xu L, Yan P, Chang T (1988) Best first strategy for feature selection. Proc. 9th Intl. Conf. Pattern Recognition, Rome, Italy, November. IEEE Computer Society Press, Piscataway, NJ: 706-708.

  67. Yang J, Honavar VG (1998) Feature subset selection using a genetic algorithm. IEEE Intelligent Systems, 13(2): 44-49.

  68. Yang ZP, Zwolinski M (2001) Mutual information theory for adaptive mixture models. IEEE Trans. Pattern Analysis and Machine Intelligence, 23(4): 396-403.


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Chow, T.W.S., Huang, D. (2008). Data Reduction for Pattern Recognition and Data Analysis. In: Fulcher, J., Jain, L.C. (eds) Computational Intelligence: A Compendium. Studies in Computational Intelligence, vol 115. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78293-3_2


  • DOI: https://doi.org/10.1007/978-3-540-78293-3_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78292-6

  • Online ISBN: 978-3-540-78293-3
