Knowledge Discovery in Spatial Planning Data: A Concept for Cluster Understanding

  • Martin Behnisch
  • Alfred Ultsch
Part of the Geotechnologies and the Environment book series (GEOTECH, volume 13)


The objective of this paper is to present a methodology for discovering comprehensible, valid, potentially innovative, and useful patterns, i.e., new knowledge, in multidimensional spatial data. Techniques from statistics, machine learning, and data mining are applied in consecutive logical steps, so that results can be visualized and validated at each stage. The approach does not, however, end once a valid clustering has been found; instead, it then asks: “What do the clusters mean?” Symbolic machine learning methods are employed to explain the clusters with rules that use an understandable subset of the high-dimensional data variables. These rules, combined with canonical representatives of each cluster and with the spatial distribution of the clusters, lead to hypotheses about emergent data structures, that is, potential new knowledge. The approach is demonstrated on an exemplary data set of German urban districts featuring seven dimensions of land use.
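The abstract describes a chain of steps: cluster the data, validate the clustering, then explain each cluster with readable rules and a canonical representative. The sketch below illustrates that chain in plain Python under stated assumptions; it is not the authors' implementation. Plain k-means stands in for the clustering step, the silhouette coefficient stands in for the validation step, and a single-variable threshold rule (a decision stump) stands in for a full symbolic rule learner such as CART, C5.0, or RIPPER. All function names and the toy variable names (`settlement_share`, `forest_share`) are hypothetical.

```python
"""Cluster -> validate -> explain: a toy sketch (stdlib only, hypothetical names)."""
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def _sqdist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: List[Point], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's k-means (a stand-in for the clustering step); one label per row."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, k)]
    labels: Optional[List[int]] = None
    for _ in range(iters):
        # Assignment step: each row goes to its nearest center.
        new = [min(range(k), key=lambda c: _sqdist(p, centers[c])) for p in data]
        if new == labels:
            break  # converged: assignments no longer change
        labels = new
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(data: List[Point], labels: List[int]) -> float:
    """Mean silhouette width (needs >= 2 clusters); near 1 = compact, separated."""
    scores = []
    for i, p in enumerate(data):
        dists = {}  # cluster label -> distances from row i to that cluster's rows
        for j, q in enumerate(data):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.get(labels[i], [])
        if not own:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for lab, d in dists.items() if lab != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def explain_cluster(data: List[Point], labels: List[int], cluster: int,
                    names: List[str]) -> Optional[Tuple[float, str]]:
    """Best one-variable rule 'name <= t' or 'name > t' for one cluster.

    A decision stump, i.e. a deliberately simplified stand-in for a symbolic
    rule learner. Returns (accuracy, rule), or None if no variable varies.
    """
    target = [lab == cluster for lab in labels]
    best = None
    for j, name in enumerate(names):
        values = sorted({p[j] for p in data})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate split halfway between observed values
            for op in ("<=", ">"):
                pred = [p[j] <= t if op == "<=" else p[j] > t for p in data]
                acc = sum(x == y for x, y in zip(pred, target)) / len(data)
                if best is None or acc > best[0]:
                    best = (acc, f"{name} {op} {t:g}")
    return best


def canonical_representative(data: List[Point], labels: List[int], cluster: int) -> int:
    """Index of the member row closest to the cluster centroid."""
    members = [i for i, lab in enumerate(labels) if lab == cluster]
    centroid = [sum(data[i][j] for i in members) / len(members)
                for j in range(len(data[0]))]
    return min(members, key=lambda i: _sqdist(data[i], centroid))


if __name__ == "__main__":
    # Hypothetical toy rows: (settlement_share, forest_share) per district.
    names = ["settlement_share", "forest_share"]
    data = [(0.80, 0.10), (0.75, 0.12), (0.82, 0.08),
            (0.10, 0.60), (0.12, 0.55), (0.08, 0.65)]
    labels = kmeans(data, k=2)
    print("labels:", labels, "silhouette:", round(silhouette(data, labels), 3))
    for c in sorted(set(labels)):
        acc, rule = explain_cluster(data, labels, c, names)
        rep = canonical_representative(data, labels, c)
        print(f"cluster {c}: IF {rule} (accuracy {acc:.2f}); representative row {rep}")
```

The stump keeps the explanation step readable at the cost of expressiveness: it returns only the single best split per cluster, whereas a genuine rule learner would combine several conditions and would be validated on held-out rows rather than scored on the data it was fitted to.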


Keywords: Knowledge discovery, Data mining, Cluster, Spatial planning



The authors acknowledge the yearly data provided by the Federal Agency for Cartography and Geodesy, which were crucial for the development of the land use monitoring. The Federal Institute for Research on Building, Urban Affairs and Spatial Development (BBSR) makes several spatial typologies available. The authors thank their colleagues at the Leibniz Institute of Ecological Urban and Regional Development (IOER) for computing the indicators and for the fruitful cooperation. Finally, the authors thank the reviewers and editors for their constructive and helpful comments, which improved the quality of this chapter.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Leibniz Institute of Ecological Urban and Regional Development, Dresden, Germany
  2. Department of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany