Rule Discovery and Probabilistic Modeling for Onomastic Data

Leino, Antti; Mannila, Heikki; Pitkänen, Ritva Liisa

doi:10.1007/978-3-540-39804-2_27

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2838)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

The naming of natural features, such as hills, lakes, springs, meadows etc., provides a wealth of linguistic information; the study of the names and naming systems is called onomastics. We consider a data set containing all names and locations of about 58,000 lakes in Finland. Using computational techniques, we address two major onomastic themes. First, we address the existence of local dependencies or repulsion between occurrences of names. For this, we derive a simple form of spatial association rules. The results partially validate and partially contradict results obtained by traditional onomastic techniques. Second, we consider the existence of relatively homogeneous spatial regions with respect to the distributions of place names. Using mixture modeling, we conduct a global analysis of the data set. The clusterings of regions are spatially connected, and correspond quite well with the results obtained by other techniques; there are, however, interesting differences with previous hypotheses.
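As a rough illustration of the first theme, the sketch below scores a rule of the form "a lake named A has a lake named B within a given radius". The abstract does not state the exact rule definition used in the paper, so the function names, the use of confidence and lift as scores, the straight-line distance, and the toy coordinates are all assumptions made for this example. A lift well above 1 would hint at local dependency between the two names, and a lift well below 1 at repulsion.

```python
from math import hypot

def has_neighbor(i, lakes, name, radius):
    """True if some lake other than lakes[i] is called `name` and lies within `radius`."""
    _, x, y = lakes[i]
    return any(
        j != i and n == name and hypot(xj - x, yj - y) <= radius
        for j, (n, xj, yj) in enumerate(lakes)
    )

def spatial_rule(lakes, name_a, name_b, radius):
    """Confidence and lift of the hypothetical rule 'A has a B within radius'.

    confidence: share of A-named lakes with a B-named lake nearby.
    baseline:   share of all lakes with a B-named lake nearby.
    lift:       confidence / baseline (NaN when the baseline is zero).
    """
    a_idx = [i for i, (n, _, _) in enumerate(lakes) if n == name_a]
    if not a_idx:
        raise ValueError(f"no lake named {name_a!r} in the data")
    confidence = sum(has_neighbor(i, lakes, name_b, radius) for i in a_idx) / len(a_idx)
    baseline = sum(
        has_neighbor(i, lakes, name_b, radius) for i in range(len(lakes))
    ) / len(lakes)
    lift = confidence / baseline if baseline > 0 else float("nan")
    return confidence, lift

# Toy data: (name, x, y) with invented coordinates, not taken from the paper.
lakes = [
    ("Mustalampi", 0.0, 0.0),
    ("Valkealampi", 1.0, 0.0),
    ("Mustalampi", 10.0, 10.0),
    ("Valkealampi", 10.0, 11.0),
    ("Ahvenlampi", 50.0, 50.0),
    ("Ahvenlampi", 60.0, 60.0),
]

confidence, lift = spatial_rule(lakes, "Mustalampi", "Valkealampi", radius=2.0)
print(f"confidence={confidence:.2f}, lift={lift:.2f}")
```

The pairwise scan is quadratic in the number of lakes, which is fine for a toy example; for a data set of about 58,000 lakes a spatial index would probably be needed.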

Author information

Authors and Affiliations

Helsinki Institute for Information Technology, Basic Research Unit, Department of Computer Science, University of Helsinki, P.O. Box 26, FIN-00014, Finland
Antti Leino & Heikki Mannila
Research Institute for the Languages of Finland, Sörnäisten rantatie 25, FIN-00500, Helsinki, Finland
Antti Leino & Ritva Liisa Pitkänen
Laboratory of Computer and Information Science, Helsinki University of Technology, P.O. Box 5400, FIN-02015, HUT, Finland
Heikki Mannila
Department of Finnish, University of Helsinki, P.O. Box 3, FIN-00014, Finland
Ritva Liisa Pitkänen

Editor information

Editors and Affiliations

University of Nova Gorica, Nova Gorica, Slovenia
Nada Lavrač
Rudjer Bošković Institute, Bijenička 54, 10000, Zagreb, Croatia
Dragan Gamberger
Jozef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Ljupčo Todorovski
Leiden Institute of Advanced Computer Science, Leiden University,
Hendrik Blockeel

About this paper

Cite this paper

Leino, A., Mannila, H., Pitkänen, R.L. (2003). Rule Discovery and Probabilistic Modeling for Onomastic Data. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds) Knowledge Discovery in Databases: PKDD 2003. PKDD 2003. Lecture Notes in Computer Science, vol 2838. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39804-2_27

DOI: https://doi.org/10.1007/978-3-540-39804-2_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20085-7
Online ISBN: 978-3-540-39804-2
eBook Packages: Springer Book Archive
