The Greedy Prepend Algorithm for Decision List Induction

  • Deniz Yuret
  • Michael de la Maza
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4263)


We describe a new decision list induction algorithm called the Greedy Prepend Algorithm (GPA). GPA improves on other decision list algorithms by introducing a new objective function for rule selection and a set of novel search algorithms that allow application to large-scale, real-world problems. GPA achieves state-of-the-art classification accuracy on the protein secondary structure prediction problem in bioinformatics and the English part-of-speech tagging problem in computational linguistics. For both domains, GPA produces a rule set that human experts find easy to interpret, a marked advantage in decision support environments. In addition, we compare GPA to other decision list induction algorithms as well as support vector machines, C4.5, naive Bayes, and a nearest neighbor method on a number of standard data sets from the UCI machine learning repository.
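The abstract does not spell out the objective function or the search algorithms, so the following is only a minimal sketch of the general greedy-prepend idea, not the authors' implementation. It assumes that the list starts with a default rule predicting the majority class, that the objective is the gain in correctly classified training instances when a rule is placed at the front of the list, and that candidates come from a small fixed pool rather than a search. The names `predict`, `greedy_prepend`, `suffix`, and `cap`, as well as the toy data, are hypothetical.

```python
from collections import Counter


def predict(decision_list, x):
    """Return the label of the first rule whose conditions all match x."""
    for conditions, label in decision_list:
        if all(x.get(attr) == val for attr, val in conditions.items()):
            return label
    raise ValueError("decision list has no default rule")


def greedy_prepend(data, candidate_rules):
    """Build a decision list by repeatedly prepending the best candidate rule.

    data: list of (feature_dict, label) pairs.
    candidate_rules: list of (conditions_dict, label) pairs (assumed fixed pool).
    """
    # Default rule: empty conditions match everything, predict the majority class.
    majority = Counter(label for _, label in data).most_common(1)[0][0]
    decision_list = [({}, majority)]

    def correct(dl):
        return sum(predict(dl, x) == y for x, y in data)

    score = correct(decision_list)
    while True:
        best_rule, best_gain = None, 0
        for rule in candidate_rules:
            # Assumed objective: gain in training instances classified correctly.
            gain = correct([rule] + decision_list) - score
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no candidate improves the list: stop
            return decision_list
        decision_list.insert(0, best_rule)  # prepend, so newer rules fire first
        score += best_gain


# Hypothetical toy data, loosely styled after part-of-speech tagging.
data = (
    [({"suffix": "ing", "cap": False}, "VERB")] * 3
    + [({"suffix": "ing", "cap": True}, "NOUN")]
    + [({"suffix": "s", "cap": False}, "NOUN")] * 3
    + [({"suffix": "s", "cap": True}, "NOUN")]
    + [({"suffix": "ed", "cap": False}, "VERB")]
)
candidates = [
    ({"suffix": "ing"}, "VERB"),
    ({"suffix": "ed"}, "VERB"),
    ({"cap": True}, "NOUN"),
    ({"cap": False}, "VERB"),
]
model = greedy_prepend(data, candidates)
```

Because each new rule goes to the front, later rules act as exceptions to earlier, more general ones, and the default rule always stays last. In this toy run the general rule for `ing` is added first and the capitalization rule is later prepended to override it for the one capitalized noun.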


Keywords: Support Vector Machine; Secondary Structure Prediction; Large Scale Problem; Default Rule; Unknown Word





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Deniz Yuret (1)
  • Michael de la Maza (2)
  1. Koç University, Istanbul, Turkey
  2. Park Hudson Finance, Cambridge, USA
