Abstract
Automatic cue phrase selection is a crucial step in designing a dialogue act recognition model with machine learning techniques. Current approaches rely on a specific class of feature selection methods called ranking approaches. Although these are computationally efficient in high-dimensional domains, they are not optimal with respect to relevance and redundancy. In this paper we propose a genetic-based approach for cue phrase selection: essentially, a variable-length genetic algorithm developed to cope with the high dimensionality of the domain. We evaluate the proposed approach against several ranking approaches. Additionally, we assess its performance when selecting cue phrases enriched with phrase type and phrase position. The results provide experimental evidence that the genetic-based approach can handle the drawbacks of the ranking approaches and can exploit cue type and cue position information to improve the selection. Furthermore, we validate the genetic-based approach for machine learning applications by using the selected sets of cue phrases to build a dynamic Bayesian network model for dialogue act recognition. The results show its usefulness for such applications.
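To make the idea of a variable-length genetic algorithm for phrase selection more concrete, the following is a minimal sketch, not the authors' method. The abstract does not specify the encoding, the genetic operators, or the fitness function, so everything here is an assumption for illustration: a chromosome is a list of distinct phrase indices, the per-phrase `relevance` scores and the `size_penalty` term are a toy stand-in for a real relevance/redundancy criterion, and all function names are hypothetical.

```python
import random

def fitness(chrom, relevance, size_penalty):
    # Toy objective (an assumption): reward relevant phrases, penalise long chromosomes.
    return sum(relevance[p] for p in chrom) - size_penalty * len(chrom)

def crossover(a, b, rng):
    # Cut each parent at its own point, so the child's length can differ from both.
    cut_a = rng.randint(0, len(a))
    cut_b = rng.randint(0, len(b))
    child = list(dict.fromkeys(a[:cut_a] + b[cut_b:]))  # drop duplicates, keep order
    return child or [rng.choice(a + b)]  # never return an empty chromosome

def mutate(chrom, n_phrases, rng):
    # Apply one randomly chosen edit: add, delete, or replace a phrase index.
    chrom = list(chrom)
    unused = [p for p in range(n_phrases) if p not in chrom]
    op = rng.choice(["add", "delete", "replace"])
    if op == "add" and unused:
        chrom.append(rng.choice(unused))
    elif op == "delete" and len(chrom) > 1:
        chrom.pop(rng.randrange(len(chrom)))
    elif op == "replace" and unused:
        chrom[rng.randrange(len(chrom))] = rng.choice(unused)
    return chrom

def select_cue_phrases(relevance, pop_size=30, generations=50,
                       size_penalty=0.2, seed=0):
    rng = random.Random(seed)
    n = len(relevance)

    def score(chrom):
        return fitness(chrom, relevance, size_penalty)

    # Initial population: random subsets of random (non-zero) length.
    pop = [rng.sample(range(n), rng.randint(1, n)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; the best survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), n, rng))
        pop = parents + children
    return sorted(max(pop, key=score))

if __name__ == "__main__":
    # Hypothetical relevance scores for six candidate cue phrases.
    print(select_cue_phrases([0.9, 0.05, 0.8, 0.1, 0.7, 0.0]))
```

Because chromosomes list only the selected phrases, their length tracks the size of the candidate subset rather than the size of the full phrase vocabulary, which is one plausible reason a variable-length encoding suits a high-dimensional domain. A realistic fitness function would score a subset as a whole, for example by classifier accuracy, rather than by summing independent per-phrase scores.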
Cite this article
Yahya, A.A., Ramli, A.R. Genetic-based approach for cue phrase selection in dialogue act recognition. Evol. Intel. 1, 253–269 (2009). https://doi.org/10.1007/s12065-008-0016-6
DOI: https://doi.org/10.1007/s12065-008-0016-6