Abstract
The naming of natural features such as hills, lakes, springs, and meadows provides a wealth of linguistic information; the study of names and naming systems is called onomastics. We consider a data set containing all names and locations of about 58,000 lakes in Finland. Using computational techniques, we address two major onomastic themes. First, we examine whether occurrences of names show local dependencies or repulsion. For this, we derive a simple form of spatial association rules. The results partially validate and partially contradict results obtained by traditional onomastic techniques. Second, we consider the existence of spatial regions that are relatively homogeneous with respect to the distributions of place names. Using mixture modeling, we conduct a global analysis of the data set. The resulting clusters of regions are spatially connected and correspond quite well with the results obtained by other techniques; there are, however, interesting differences from previous hypotheses.
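To make the first theme concrete, the following is a minimal sketch of one way a local dependency between two names could be measured. It is not the rule formulation derived in the paper, which the abstract does not spell out. The function name `cooccurrence_ratio`, the planar coordinates, the radius, and the toy names are all assumptions made for illustration. The idea is to compare how often a lake named B appears among the neighbours of lakes named A with how often B appears overall: a ratio well above 1 hints at attraction, a ratio well below 1 at repulsion.

```python
import math


def cooccurrence_ratio(lakes, name_a, name_b, radius):
    """Observed share of name_b among neighbours of name_a lakes,
    divided by the overall share of name_b in the data set.

    lakes: list of (name, x, y) tuples in planar coordinates (assumed).
    Returns None when the ratio is undefined.
    """
    total_b = sum(1 for name, _, _ in lakes if name == name_b)
    if not lakes or total_b == 0:
        return None
    base_rate = total_b / len(lakes)

    neighbours = 0    # lakes within `radius` of some name_a lake
    b_neighbours = 0  # those neighbours that carry name_b
    for i, (name_i, xi, yi) in enumerate(lakes):
        if name_i != name_a:
            continue
        for j, (name_j, xj, yj) in enumerate(lakes):
            if i == j:
                continue  # a lake is not its own neighbour
            if math.hypot(xi - xj, yi - yj) <= radius:
                neighbours += 1
                if name_j == name_b:
                    b_neighbours += 1

    if neighbours == 0:
        return None
    return (b_neighbours / neighbours) / base_rate


# Toy data with invented names and coordinates, not taken from the paper.
toy_lakes = [
    ("Alpha", 0, 0), ("Beta", 1, 0),
    ("Alpha", 10, 0), ("Beta", 11, 0),
    ("Gamma", 50, 50), ("Gamma", 60, 60),
    ("Gamma", 70, 70), ("Gamma", 80, 80),
]
print(cooccurrence_ratio(toy_lakes, "Alpha", "Beta", radius=2))  # 4.0
```

The quadratic scan is adequate for a toy example; for tens of thousands of lakes a spatial index would be the natural replacement, and any real analysis would also need a significance test rather than a raw ratio.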
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Leino, A., Mannila, H., Pitkänen, R.L. (2003). Rule Discovery and Probabilistic Modeling for Onomastic Data. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds) Knowledge Discovery in Databases: PKDD 2003. PKDD 2003. Lecture Notes in Computer Science, vol 2838. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39804-2_27
DOI: https://doi.org/10.1007/978-3-540-39804-2_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20085-7
Online ISBN: 978-3-540-39804-2
eBook Packages: Springer Book Archive