Abstract
The objective of this paper is to present a methodology for discovering comprehensible, valid, potentially innovative, and useful patterns, i.e., new knowledge, in multidimensional spatial data. Techniques from statistics, machine learning, and data mining are applied in consecutive logical steps, which allows results to be visualized and validation procedures to be applied at each stage. The approach does not end with a clustering, however: once a valid clustering has been obtained, the question is posed: “What do the clusters mean?” Symbolic machine learning methods are employed to produce an explanation of the clusters in terms of rules that use an understandable subset of the high-dimensional data variables. This, combined with canonical representatives of each cluster and consideration of the spatial distribution of the clusters, leads to hypotheses on emergent data structures, that is, potential new knowledge. The approach is demonstrated on an exemplary data set of German urban districts featuring seven dimensions of land use.
Acknowledgements
The authors acknowledge the yearly data provided by the Federal Agency for Cartography and Geodesy, which was crucial for the development of the land use monitoring. The Federal Institute for Research on Building, Urban Affairs and Spatial Development (BBSR) makes several spatial typologies available. The authors would like to thank their colleagues at the Leibniz Institute of Ecological Urban and Regional Development (IOER) for the indicator computation and the fruitful cooperation. Further, we sincerely thank the reviewers and editors for their constructive and helpful comments, which improved the quality of this chapter.
Appendices
Appendix 1
1.1 Nonlinear Transformations of the Variables in the UD Data
Variable | Transformation |
---|---|
OpenSpaceMeshSize | log |
BuildingArea | log |
SettlementDensity | log |
SealedSurface | sqrt |
LandConsumption | sqrt |
ProtectedAreas | sqrt |
HemerobyIndex | identity (no transformation) |
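The transformations listed above can be applied per variable as in the following minimal sketch. It is an illustration only, not the authors' implementation: the names `TRANSFORMS` and `transform_record` are hypothetical, the natural logarithm is assumed because the base is not stated, and inputs are assumed to be strictly positive for the log variables and non-negative for the sqrt variables.

```python
import math

# Variable name -> transformation, as listed in Appendix 1.
# Assumption: "log" means the natural logarithm (the base is not stated).
TRANSFORMS = {
    "OpenSpaceMeshSize": math.log,
    "BuildingArea": math.log,
    "SettlementDensity": math.log,
    "SealedSurface": math.sqrt,
    "LandConsumption": math.sqrt,
    "ProtectedAreas": math.sqrt,
    "HemerobyIndex": lambda x: x,  # identity (no transformation)
}


def transform_record(record):
    """Apply the per-variable transformation to one district's raw values.

    `record` maps variable names to raw numeric values; variables that are
    not listed in TRANSFORMS raise a KeyError instead of passing through.
    """
    return {name: TRANSFORMS[name](value) for name, value in record.items()}


# Made-up values, chosen only to make the arithmetic easy to follow.
example = transform_record(
    {"BuildingArea": 1.0, "SealedSurface": 9.0, "HemerobyIndex": 4.2}
)
print(example)
```

Keeping the mapping in a single dictionary makes it easy to check the code against the table and to swap a transformation for one variable without touching the rest.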
Appendix 2
2.1 Cluster Representatives
U-matrix cluster | Urban district |
---|---|
UC1 | Leverkusen |
UC2 | Heilbronn |
UC3 | Zweibrücken |
UC4 | Freiburg im Breisgau |
UC5 | Bremen |
UC6 | Aschaffenburg |
UC7 | Landau in der Pfalz |
UC8 | Berlin |
UC9 | Suhl |
Appendix 3
3.1 Rules Explaining the U-Matrix Clustering
UD data belongs to Cluster UC1, if
\(\mathit{log}(\mathit{SealedSurfaces}) \geq 48.3179\) and
\(\mathit{log}(\mathit{OpenSpaceMeshSize}) < 72.6255\) and
\(\mathit{log}(\mathit{BuildingArea}) < 65.4549\)
UD data belongs to Cluster UC2, if either
\(\mathit{log}(\mathit{SealedSurfaces}) < 48.3179\) and
\(\mathit{log}(\mathit{BuildingArea}) < 75.0076\) and
\(\mathit{sqrt}(\mathit{ProtectedAreas}) < 60.8555\)
or
\(\mathit{log}(\mathit{SealedSurfaces}) \geq 48.3179\) and
\(\mathit{log}(\mathit{BuildingArea}) \geq 65.4549\) and
\(\mathit{log}(\mathit{OpenSpaceMeshSize}) < 72.6255\)
UD data belongs to Cluster UC3, if
\(\mathit{log}(\mathit{BuildingArea}) \geq 75.0076\) and
\(\mathit{log}(\mathit{SealedSurfaces}) < 48.3179\)
UD data belongs to Cluster UC4, if
\(\mathit{sqrt}(\mathit{ProtectedAreas}) \geq 60.8555\) and
\(\mathit{log}(\mathit{BuildingArea}) < 75.0076\) and
\(\mathit{log}(\mathit{SealedSurfaces}) < 48.3179\)
UD data belongs to Cluster UC5, if
\(\mathit{log}(\mathit{OpenSpaceMeshSize}) \geq 72.6255\) and
\(\mathit{log}(\mathit{SealedSurfaces}) \geq 48.3179\)
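Taken together, the rules above form a small decision tree, which can be sketched in code. The following is a minimal illustration, not the authors' implementation: the function name `classify_ud` and its parameter names are hypothetical, the inputs are assumed to be already transformed and scaled exactly as the quantities in the rules, and a value lying exactly on a threshold is assumed to fall on the ≥ side.

```python
# Thresholds copied from the rules in Appendix 3.
SEALED_T = 48.3179
MESH_T = 72.6255
BUILDING_LOW_T = 65.4549
BUILDING_HIGH_T = 75.0076
PROTECTED_T = 60.8555


def classify_ud(log_sealed, log_mesh, log_building, sqrt_protected):
    """Return the U-matrix cluster label (UC1..UC5) for one urban district.

    All arguments are the transformed values named in the rules
    (assumption: same scaling as the published thresholds).
    """
    if log_sealed >= SEALED_T:
        if log_mesh >= MESH_T:
            return "UC5"
        # From here on: log_mesh < MESH_T
        if log_building < BUILDING_LOW_T:
            return "UC1"
        return "UC2"  # second alternative of the UC2 rule
    # From here on: log_sealed < SEALED_T
    if log_building >= BUILDING_HIGH_T:
        return "UC3"
    if sqrt_protected >= PROTECTED_T:
        return "UC4"
    return "UC2"  # first alternative of the UC2 rule


# Made-up input values, used only to exercise each branch.
print(classify_ud(50.0, 70.0, 60.0, 0.0))   # high sealing, small mesh, small building area
print(classify_ud(40.0, 0.0, 70.0, 70.0))   # low sealing, large share of protected areas
```

Written this way, every combination of input values receives exactly one of the five labels, which makes it straightforward to check the rule set for gaps or overlaps.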
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Behnisch, M., Ultsch, A. (2015). Knowledge Discovery in Spatial Planning Data: A Concept for Cluster Understanding. In: Helbich, M., Jokar Arsanjani, J., Leitner, M. (eds) Computational Approaches for Urban Environments. Geotechnologies and the Environment, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-319-11469-9_3
DOI: https://doi.org/10.1007/978-3-319-11469-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11468-2
Online ISBN: 978-3-319-11469-9
eBook Packages: Earth and Environmental Science, Earth and Environmental Science (R0)