
Knowledge Discovery in Spatial Planning Data: A Concept for Cluster Understanding

Chapter in: Computational Approaches for Urban Environments

Part of the book series: Geotechnologies and the Environment (GEOTECH, volume 13)


Abstract

The objective of this chapter is to present a methodology for discovering comprehensible, valid, potentially innovative, and useful patterns, i.e., new knowledge, in multidimensional spatial data. Techniques from statistics, machine learning, and data mining are applied in consecutive logical steps, so that the results can be visualized and validated at each stage. The approach does not end once a valid clustering has been obtained; instead, it goes on to ask: “What do the clusters mean?” Symbolic machine learning methods are employed to explain the clusters with rules that use an understandable subset of the high-dimensional data variables. These rules, combined with canonical representatives of each cluster and with the spatial distribution of the clusters, lead to hypotheses about emergent data structures, that is, potential new knowledge. The approach is demonstrated on an exemplary data set of German urban districts featuring seven dimensions of land use.
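One way to check how well a set of rules explains a clustering, in the spirit of the stage-wise validation described above, is to compare the cluster each object was assigned to with the cluster the rules predict for it. The sketch below computes per-cluster precision and recall of the rule-predicted labels. The metric choice and the function name `rule_fidelity` are assumptions made for illustration; this is not the chapter's stated validation procedure.

```python
from collections import Counter


def rule_fidelity(
    cluster_labels: list[str], rule_labels: list[str]
) -> dict[str, tuple[float, float]]:
    """Hypothetical helper: per-cluster (precision, recall) of rule labels.

    cluster_labels[i]: cluster assigned to object i by the clustering step
    rule_labels[i]:    cluster predicted for object i by the explanatory rules
    """
    if len(cluster_labels) != len(rule_labels):
        raise ValueError("label lists must have equal length")
    # Objects on which clustering and rules agree, counted per cluster.
    agree = Counter(c for c, r in zip(cluster_labels, rule_labels) if c == r)
    n_cluster = Counter(cluster_labels)
    n_rule = Counter(rule_labels)
    result = {}
    for label in n_cluster:
        tp = agree[label]
        # Precision: share of rule predictions for this label that are correct.
        precision = tp / n_rule[label] if n_rule[label] else 0.0
        # Recall: share of the cluster's members that the rules recover.
        recall = tp / n_cluster[label]
        result[label] = (precision, recall)
    return result
```

High precision and recall for a cluster indicate that its rule is a faithful, compact description of that cluster; low values flag clusters whose rules need refinement.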



Acknowledgements

The authors acknowledge the yearly data provided by the Federal Agency for Cartography and Geodesy, which was crucial for the development of the land-use monitoring. They also thank the Federal Institute for Research on Building, Urban Affairs and Spatial Development (BBSR) for making several spatial typologies available, and their colleagues at the Leibniz Institute of Ecological Urban and Regional Development (IOER) for computing the indicators and for the fruitful cooperation. Finally, we thank the reviewers and editors for their constructive and helpful comments, which improved the quality of this chapter.

Author information

Correspondence to Martin Behnisch.


Appendices

Appendix 1

1.1 Nonlinear Transformations of the Variables in the UD Data

  • OpenSpaceMeshSize: log
  • BuildingArea: log
  • SettlementDensity: log
  • SealedSurface: sqrt
  • LandConsumption: sqrt
  • ProtectedAreas: sqrt
  • HemerobyIndex: identity (no transformation)
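The per-variable transformations can be applied to each district's raw values before clustering. The following is a minimal Python sketch, assuming that "log" means the natural logarithm (the base is not stated) and that a district's raw values are held in a dict keyed by the variable names above; `transform_record` is a hypothetical helper, not part of the chapter.

```python
import math

# Per-variable transformation, copied from the list above.
# Assumption: "log" is the natural logarithm; the chapter does not state the base.
TRANSFORMS = {
    "OpenSpaceMeshSize": "log",
    "BuildingArea": "log",
    "SettlementDensity": "log",
    "SealedSurface": "sqrt",
    "LandConsumption": "sqrt",
    "ProtectedAreas": "sqrt",
    "HemerobyIndex": "identity",  # no transformation
}

FUNCTIONS = {
    "log": math.log,
    "sqrt": math.sqrt,
    "identity": lambda x: x,
}


def transform_record(record: dict[str, float]) -> dict[str, float]:
    """Hypothetical helper: transform one district's raw variable values.

    Raises KeyError for an unknown variable name, and ValueError (from the
    math module) for log of a non-positive value or sqrt of a negative value.
    """
    return {name: FUNCTIONS[TRANSFORMS[name]](value) for name, value in record.items()}


# Illustrative input values only; not taken from the UD data.
example = transform_record({"BuildingArea": 100.0, "ProtectedAreas": 25.0, "HemerobyIndex": 3.2})
```

Keeping the mapping in one table makes it easy to check against the list above and to change a single variable's transformation without touching the rest.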

Appendix 2

2.1 Cluster Representatives

U-matrix cluster    Urban district
UC1                 Leverkusen
UC2                 Heilbronn
UC3                 Zweibrücken
UC4                 Freiburg im Breisgau
UC5                 Bremen
UC6                 Aschaffenburg
UC7                 Landau in der Pfalz
UC8                 Berlin
UC9                 Suhl
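The table names one representative district per cluster, but this appendix does not say how each representative was selected. One common choice, assumed here purely for illustration, is the member whose (transformed) feature vector lies closest in Euclidean distance to the cluster mean. The function and argument names below are hypothetical.

```python
import math
from collections import defaultdict


def cluster_representatives(
    points: dict[str, list[float]], labels: dict[str, str]
) -> dict[str, str]:
    """Hypothetical helper: pick the member closest to each cluster's mean.

    points: district name -> feature vector (already transformed)
    labels: district name -> cluster label, e.g. "UC1"
    Assumption: Euclidean distance to the arithmetic mean is the criterion.
    """
    members = defaultdict(list)
    for name, label in labels.items():
        members[label].append(name)

    reps = {}
    for label, names in members.items():
        dim = len(points[names[0]])
        # Component-wise mean of all members of this cluster.
        centroid = [sum(points[n][i] for n in names) / len(names) for i in range(dim)]
        # min() keeps the first of several equally close members.
        reps[label] = min(names, key=lambda n: math.dist(points[n], centroid))
    return reps
```

Unlike the centroid itself, a representative chosen this way is always a real district, which makes the cluster easier to interpret on a map.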

Appendix 3

3.1 Rules Explaining the U-Matrix Clustering

UD data belongs to cluster UC1 if
\(\log(\mathit{SealedSurfaces}) \geq 48.3179\) and
\(\log(\mathit{OpenSpaceMeshSize}) \leq 72.6255\) and
\(\log(\mathit{BuildingArea}) < 65.4549\)

UD data belongs to cluster UC2 if either
\(\log(\mathit{SealedSurfaces}) < 48.3179\) and
\(\log(\mathit{BuildingArea}) < 75.0076\) and
\(\sqrt{\mathit{ProtectedAreas}} < 60.8555\),
or
\(\log(\mathit{SealedSurfaces}) \geq 48.3179\) and
\(\log(\mathit{BuildingArea}) \geq 65.4549\) and
\(\log(\mathit{OpenSpaceMeshSize}) < 72.6255\)

UD data belongs to cluster UC3 if
\(\log(\mathit{BuildingArea}) \geq 75.0076\) and
\(\log(\mathit{SealedSurfaces}) < 48.3179\)

UD data belongs to cluster UC4 if
\(\sqrt{\mathit{ProtectedAreas}} \geq 60.8555\) and
\(\log(\mathit{BuildingArea}) < 75.0076\) and
\(\log(\mathit{SealedSurfaces}) < 48.3179\)

UD data belongs to cluster UC5 if
\(\log(\mathit{OpenSpaceMeshSize}) \geq 72.6255\) and
\(\log(\mathit{SealedSurfaces}) \geq 48.3179\)
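The rules can be read as a small classifier. The sketch below encodes them literally in Python under two assumptions: the UC2 rule is read as a disjunction of two conjunctions (the first three conditions, or the last three), and the inputs are transformed values on the same scale as the printed thresholds. Because the UC1 rule uses ≤ and the UC5 rule uses ≥ on the same OpenSpaceMeshSize threshold, both can hold at exact equality, so the function returns every matching cluster instead of choosing one. Rules are given only for UC1 to UC5, so no other label can be returned. The function name is hypothetical.

```python
# Thresholds as printed in the rules above.
SEALED = 48.3179       # log(SealedSurfaces)
MESH = 72.6255         # log(OpenSpaceMeshSize)
BUILDING_LO = 65.4549  # log(BuildingArea), rules UC1 and UC2
BUILDING_HI = 75.0076  # log(BuildingArea), rules UC2, UC3 and UC4
PROTECTED = 60.8555    # sqrt(ProtectedAreas)


def matching_clusters(sealed: float, mesh: float, building: float, protected: float) -> list[str]:
    """Hypothetical helper: return every cluster among UC1..UC5 whose rule holds.

    sealed = log(SealedSurfaces), mesh = log(OpenSpaceMeshSize),
    building = log(BuildingArea), protected = sqrt(ProtectedAreas).
    """
    rules = {
        "UC1": sealed >= SEALED and mesh <= MESH and building < BUILDING_LO,
        # Assumption: UC2 is the disjunction of two conjunctions.
        "UC2": (
            (sealed < SEALED and building < BUILDING_HI and protected < PROTECTED)
            or (sealed >= SEALED and building >= BUILDING_LO and mesh < MESH)
        ),
        "UC3": building >= BUILDING_HI and sealed < SEALED,
        "UC4": protected >= PROTECTED and building < BUILDING_HI and sealed < SEALED,
        "UC5": mesh >= MESH and sealed >= SEALED,
    }
    return [name for name, holds in rules.items() if holds]
```

Under this reading, every input satisfies at least one rule: SealedSurfaces splits the space first, and within each half the remaining conditions partition it. The only overlap occurs when the OpenSpaceMeshSize value equals 72.6255 exactly, where UC1 and UC5 can both apply.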


Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Behnisch, M., Ultsch, A. (2015). Knowledge Discovery in Spatial Planning Data: A Concept for Cluster Understanding. In: Helbich, M., Jokar Arsanjani, J., Leitner, M. (eds) Computational Approaches for Urban Environments. Geotechnologies and the Environment, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-319-11469-9_3
