Algorithms for Identification Key Generation and Optimization with Application to Yeast Identification

  • Alan P. Reynolds
  • Jo L. Dicks
  • Ian N. Roberts
  • Jan-Jap Wesselink
  • Beatriz de la Iglesia
  • Vincent Robert
  • Teun Boekhout
  • Victor J. Rayward-Smith
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2611)

Abstract

Algorithms for the automated creation of low-cost identification keys are described, and theoretical and empirical justifications for them are provided. The algorithms are shown to handle differing test costs, prior probabilities for each potential diagnosis and tests that produce uncertain results. The approach is then extended to cover situations where more than one measure of cost is of importance, by allowing tests to be performed in batches. Experiments are performed on a real-world case study involving the identification of yeasts.
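The abstract does not spell out how a test is chosen at each step of a key. The keywords point to a greedy, Shannon-entropy-based rule weighted by test cost, so the sketch below assumes one such rule: at each node, apply the test with the largest expected entropy reduction per unit cost, using the prior probabilities of the species still in contention. This is an illustration under that assumption, not the authors' algorithm. All names (`build_key`, `gain_per_cost`, `expected_cost`) and the toy species and tests are hypothetical, and the sketch assumes every test gives a single definite result for each species.

```python
import math
from typing import Dict, List, Tuple, Union

# Hypothetical types for illustration: species -> prior probability,
# and test name -> (cost, outcome observed for each species).
Priors = Dict[str, float]
Tests = Dict[str, Tuple[float, Dict[str, str]]]
Key = Union[str, List[str], dict]


def entropy(species: List[str], priors: Priors) -> float:
    """Shannon entropy (bits) of the priors renormalised over the candidate species."""
    total = sum(priors[s] for s in species)
    h = 0.0
    for s in species:
        p = priors[s] / total
        if p > 0:
            h -= p * math.log2(p)
    return h


def partition(species: List[str], outcomes: Dict[str, str]) -> Dict[str, List[str]]:
    """Group the candidate species by the result a test would give for each."""
    groups: Dict[str, List[str]] = {}
    for s in species:
        groups.setdefault(outcomes[s], []).append(s)
    return groups


def gain_per_cost(species: List[str], priors: Priors, cost: float, outcomes: Dict[str, str]) -> float:
    """Expected entropy reduction from applying a test, divided by the test's cost."""
    total = sum(priors[s] for s in species)
    expected_h = sum(
        sum(priors[s] for s in group) / total * entropy(group, priors)
        for group in partition(species, outcomes).values()
    )
    return (entropy(species, priors) - expected_h) / cost


def build_key(species: List[str], priors: Priors, tests: Tests) -> Key:
    """Greedily build a key as nested {"test": ..., "branches": {...}} dicts.

    A leaf is a single species name, or a sorted list of species that the
    remaining tests cannot separate.
    """
    if len(species) == 1:
        return species[0]
    # Only tests that actually split the current candidates are worth applying.
    useful = [t for t, (_, out) in tests.items() if len(partition(species, out)) > 1]
    if not useful:
        return sorted(species)
    best = max(useful, key=lambda t: gain_per_cost(species, priors, *tests[t]))
    remaining = {t: v for t, v in tests.items() if t != best}
    branches = {
        outcome: build_key(group, priors, remaining)
        for outcome, group in partition(species, tests[best][1]).items()
    }
    return {"test": best, "branches": branches}


def expected_cost(key: Key, priors: Priors, tests: Tests, spent: float = 0.0) -> float:
    """Prior-weighted total test cost paid to reach each leaf of the key."""
    if not isinstance(key, dict):
        leaves = key if isinstance(key, list) else [key]
        return sum(priors[s] for s in leaves) * spent
    step = tests[key["test"]][0]
    return sum(expected_cost(sub, priors, tests, spent + step) for sub in key["branches"].values())


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    priors = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
    tests = {
        "glucose": (1.0, {"A": "+", "B": "+", "C": "-", "D": "-"}),
        "nitrate": (1.0, {"A": "+", "B": "-", "C": "+", "D": "-"}),
        "sequencing": (5.0, {"A": "a", "B": "b", "C": "c", "D": "d"}),
    }
    key = build_key(sorted(priors), priors, tests)
    print(key)
    print(expected_cost(key, priors, tests))
```

On this toy data the sketch asks `nitrate` first and `glucose` on both branches, giving an expected cost of 2.0, whereas the single expensive `sequencing` test would resolve every species at once at a cost of 5.0. Dividing the entropy gain by cost is what steers the greedy choice away from that test. Tests with uncertain results and batched tests, both covered in the abstract, would need a different partition and cost model and are not attempted here.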

Keywords

Greedy Algorithm, Shannon Entropy, Material Cost, Test Cost, Weighted Cost

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Alan P. Reynolds (1)
  • Jo L. Dicks (2)
  • Ian N. Roberts (3)
  • Jan-Jap Wesselink (1)
  • Beatriz de la Iglesia (2)
  • Vincent Robert (3)
  • Teun Boekhout (4)
  • Victor J. Rayward-Smith (1)

  1. School of Information Systems, University of East Anglia, Norwich, UK
  2. Computational Biology Group, John Innes Centre, Norwich, UK
  3. Institute of Food Research, National Collection of Yeast Cultures, Norwich, UK
  4. Centraalbureau voor Schimmelcultures, Utrecht, The Netherlands