Abstract
Designing a new knowledge discovery application is a very tedious task. Its success is determined to a great extent by an adequate example representation. Transforming the given data into that representation is a matter of feature generation and selection, and the search for an appropriate approach is difficult. In particular, when time data are involved, there is a large variety of ways to handle them. Reports on successful cases can provide case designers with a guideline for the design of new, similar cases. In this paper we present a complete knowledge discovery process applied to insurance data. We use the TF/IDF representation from information retrieval for compiling time-related features of the data set. Experimental results show that these new features lead to superior results in terms of accuracy, precision and recall. A heuristic is given which calculates how much the feature space is enlarged or shrunk by the transformation to TF/IDF.
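The abstract does not spell out how TF/IDF is applied to time-related data, so the following is only a minimal sketch under stated assumptions: the "term frequency" of an attribute is taken to be the number of times that attribute changed within one contract's history, and the "inverse document frequency" is the logarithm of the number of contracts divided by the number of contracts in which the attribute changed at least once. The function name `tfidf_change_features`, the input layout, and the example contracts are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_change_features(histories):
    """Build TF/IDF-style features from per-contract change histories.

    histories: dict mapping a contract id to a list of attribute names,
               one list entry per recorded change of that attribute.
    Assumed weighting (not confirmed by the abstract):
        tf(a, c) = number of changes of attribute a in contract c
        idf(a)   = log(N / number of contracts in which a changed)
    """
    n_contracts = len(histories)
    # Term frequency: how often each attribute changed in each contract.
    tf = {cid: Counter(changes) for cid, changes in histories.items()}

    # Document frequency: in how many contracts each attribute changed.
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())

    attributes = sorted(df)
    idf = {a: math.log(n_contracts / df[a]) for a in attributes}

    # One fixed-length vector per contract, ordered like `attributes`.
    vectors = {
        cid: [tf[cid][a] * idf[a] for a in attributes]
        for cid in histories
    }
    return attributes, vectors


# Hypothetical toy data: three contracts with their recorded changes.
histories = {
    "c1": ["premium", "premium", "address"],
    "c2": ["address"],
    "c3": [],
}
attributes, vectors = tfidf_change_features(histories)
for cid, vec in vectors.items():
    print(cid, dict(zip(attributes, vec)))
```

Under these assumptions, a variable-length history of changes is mapped to one fixed-length vector per contract, and an attribute that changes in almost every contract receives a weight close to zero.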
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Morik, K., Köpcke, H. (2004). Analysing Customer Churn in Insurance Data – A Case Study. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Knowledge Discovery in Databases: PKDD 2004. Lecture Notes in Computer Science, vol 3202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30116-5_31
DOI: https://doi.org/10.1007/978-3-540-30116-5_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23108-0
Online ISBN: 978-3-540-30116-5
eBook Packages: Springer Book Archive