Abstract
Let $X_1, X_2, \ldots, X_n$ be iid random variables with a discrete distribution $\{p_i\}_{i=1}^{m}$. We discuss the coincidence probability $R_n$, i.e., the probability that at least two of the $X_i$ take the same value. If $m = 365$ and $p_i \equiv 1/365$, this is the famous birthday problem. We also give two kinds of approximation to this probability. Finally, we give two applications. The first is the estimation of the coincidence probability of surnames in Japan; for this purpose, we fit a generalized zeta distribution to frequency data of surnames in Japan. The second is the true birthday problem: we evaluate the birthday probability in Japan using the actual (non-uniform) distribution of birthdays in Japan.
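As an illustration of the quantity $R_n$ defined above, here is a minimal sketch of an exact computation for an arbitrary discrete distribution. It relies on the standard identity $R_n = 1 - n!\, e_n(p_1, \ldots, p_m)$, where $e_n$ is the $n$-th elementary symmetric polynomial. The function name `coincidence_probability` and the dynamic-programming layout are assumptions made for this example; this is not the approximation method developed in the paper.

```python
def coincidence_probability(p, n):
    """Exact R_n = 1 - n! * e_n(p) for category probabilities p (summing to 1).

    Illustrative helper, not taken from the paper.
    """
    if n > len(p):
        return 1.0  # more draws than categories: a coincidence is certain
    # d[k] holds k! * e_k(p_1, ..., p_j) after the first j categories are processed.
    # Updating e_k <- e_k + p_j * e_{k-1} becomes d[k] <- d[k] + k * p_j * d[k-1].
    d = [1.0] + [0.0] * n
    for pj in p:
        for k in range(n, 0, -1):  # descend so d[k-1] is still the old value
            d[k] += k * pj * d[k - 1]
    return 1.0 - d[n]  # d[n] is the probability that all n draws are distinct


# Classical birthday problem: m = 365, uniform probabilities, n = 23.
uniform = [1.0 / 365] * 365
print(coincidence_probability(uniform, 23))  # approximately 0.507
```

Keeping the factorial inside the recursion avoids multiplying a very large $n!$ by a very small $e_n$, and every update adds only non-negative terms. The cost is $O(mn)$, which can become slow when $m$ is very large.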
Additional information
This research was supported in part by a Grant-in-Aid for Scientific Research of the Ministry of Education, Science and Culture under contract numbers 01540141 and 02640057.
About this article
Cite this article
Mase, S. Approximations to the birthday problem with unequal occurrence probabilities and their application to the surname problem in Japan. Ann Inst Stat Math 44, 479–499 (1992). https://doi.org/10.1007/BF00050700
DOI: https://doi.org/10.1007/BF00050700