Metaheuristics: Computer Decision-Making, pp 347-367
Developing Classification Techniques from Biological Databases Using Simulated Annealing
Abstract
This paper describes new approaches to the classification and identification of biological data. We expect the work to extend to other domains, such as the medical domain or fault diagnosis problems. Organisms are often classified according to the values of tests that measure some characteristic of the organism. When selecting a suitable test set, it is important to choose one of minimum cost. Equally, when classification models are constructed for the later identification of unnamed individuals, it is important to produce models that are optimal in terms of identification performance and cost. In this paper, we first describe the problem of selecting an economic test set for classification. We develop a criterion for the differentiation of organisms which may encompass fuzzy differentiability. We then describe the problem of using batches of tests sequentially to identify unknown organisms, and we explore the problem of constructing the best sequence of batches of tests in terms of cost and identification performance. We discuss how metaheuristic algorithms may be used to solve these problems. Finally, we present an application of these methods to the problem of yeast classification and identification.
Keywords
Classification, Identification, Minimum test set (MTS), Heuristic techniques
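The minimum-cost test set problem named in the abstract can be illustrated with a small simulated annealing sketch. This is not the authors' implementation. It assumes binary test outcomes, crisp (not fuzzy) differentiation, a single-test flip neighbourhood, geometric cooling, and a penalty weight chosen so that one undifferentiated pair of organisms costs more than buying every test. All names (`anneal_test_set`, `undifferentiated_pairs`, the toy `table` and `costs`) are hypothetical.

```python
import itertools
import math
import random


def undifferentiated_pairs(table, selected):
    """Count organism pairs with identical results on every selected test."""
    count = 0
    for a, b in itertools.combinations(table, 2):
        if all(a[t] == b[t] for t in selected):
            count += 1
    return count


def objective(table, costs, selected, penalty):
    """Total test cost plus a penalty for each pair left undifferentiated."""
    cost = sum(costs[t] for t in selected)
    return cost + penalty * undifferentiated_pairs(table, selected)


def anneal_test_set(table, costs, t_start=5.0, t_end=0.01, alpha=0.95,
                    moves_per_temp=50, seed=0):
    """Search for a cheap set of tests that separates all organisms."""
    rng = random.Random(seed)
    n_tests = len(costs)
    # Assumed weighting: one unresolved pair outweighs the cost of all tests.
    penalty = sum(costs) + 1.0

    # Start from the full test set, which is feasible if any subset is.
    current = set(range(n_tests))
    current_val = objective(table, costs, current, penalty)
    best, best_val = set(current), current_val

    temp = t_start
    while temp > t_end:
        for _ in range(moves_per_temp):
            # Neighbour: add or drop one randomly chosen test.
            candidate = current ^ {rng.randrange(n_tests)}
            cand_val = objective(table, costs, candidate, penalty)
            delta = cand_val - current_val
            # Metropolis rule: always accept improvements, sometimes accept worse.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, current_val = candidate, cand_val
                if current_val < best_val:
                    best, best_val = set(current), current_val
        temp *= alpha  # geometric cooling
    return sorted(best), best_val


# Toy data (made up): rows are organisms, columns are binary test results.
table = [
    (0, 0, 1, 0),
    (0, 1, 1, 1),
    (1, 0, 0, 0),
    (1, 1, 0, 1),
]
costs = [1.0, 2.0, 3.0, 1.5]

selected, value = anneal_test_set(table, costs)
print("selected tests:", selected, "objective:", value)
```

Because the search starts from the full test set and the penalty exceeds the total cost of all tests, the best solution recorded is never an infeasible one when the full set separates every pair. A fuzzy differentiation criterion or batch sequencing, as the abstract describes, would need a different objective.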