A Genetic-Based Feature Construction Method for Data Summarisation

  • Rayner Alfred
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5139)

Abstract

The importance of input representation has long been recognised in machine learning. This paper discusses the application of genetic-based feature construction methods to generate input data for the data summarisation method called Dynamic Aggregation of Relational Attributes (DARA). Here, feature construction methods are applied to improve the descriptive accuracy of the DARA algorithm. The DARA algorithm is designed to summarise data stored in non-target tables by clustering the records into groups, where multiple records in a non-target table correspond to a single record in the target table. This paper addresses the question of whether the descriptive accuracy of the DARA algorithm benefits from the feature construction process. This involves constructing a relevant set of features for the DARA algorithm using a genetic-based algorithm. This work also evaluates several scoring measures used as fitness functions to find the best set of constructed features.
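The approach the abstract describes, searching for a good set of constructed features with a genetic algorithm whose fitness function measures clustering quality, can be illustrated with a minimal sketch. The code below is a hypothetical toy, not the paper's DARA implementation: the records, labels, binary attribute masks, and the simple between/within-distance fitness (a stand-in for the clustering-quality measures, such as Davies–Bouldin, that the paper evaluates) are all assumptions made for illustration.

```python
import random

random.seed(0)

# Toy records standing in for rows of a non-target table (hypothetical data).
RECORDS = [
    (1.0, 0.1, 5.0), (1.1, 0.0, 5.2), (0.9, 0.2, 4.8),   # group A
    (4.0, 3.9, 1.0), (4.2, 4.1, 0.8), (3.8, 4.0, 1.2),   # group B
]
LABELS = [0, 0, 0, 1, 1, 1]  # known grouping, used only to score separation

def project(record, mask):
    """Construct a feature vector by keeping the attributes the mask selects."""
    return [v for v, keep in zip(record, mask) if keep]

def fitness(mask):
    """Ratio of between-group to within-group distance (higher is better).

    A simple proxy for the cluster-separation measures used as fitness
    functions in the paper; all-zero masks score 0."""
    if not any(mask):
        return 0.0
    groups = {}
    for rec, lab in zip(RECORDS, LABELS):
        groups.setdefault(lab, []).append(project(rec, mask))
    def centroid(pts):
        return [sum(c) / len(pts) for c in zip(*pts)]
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    cents = {lab: centroid(pts) for lab, pts in groups.items()}
    within = sum(dist(p, cents[lab]) for lab, pts in groups.items() for p in pts)
    between = dist(cents[0], cents[1])
    return between / (within + 1e-9)

def evolve(n_attrs=3, pop_size=8, generations=20):
    """Evolve binary masks over attributes with selection, crossover, mutation."""
    pop = [[random.randint(0, 1) for _ in range(n_attrs)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_attrs)  # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.2:           # bit-flip mutation
                i = random.randrange(n_attrs)
                child[i] ^= 1
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
```

In the paper's setting the individuals would encode constructed features over relational attributes rather than simple selection masks, and the fitness would be one of the evaluated clustering-quality measures, but the evolutionary loop has this same shape.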

Keywords

Feature Construction · Data Summarisation · Genetic Algorithm · Clustering

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Rayner Alfred
    1. School of Engineering and Information Technology, Universiti Malaysia Sabah, Kota Kinabalu, Malaysia