Identifiability and Estimation of Probabilities from Multiple Databases with Incomplete Data and Sampling Selection

  • Jinzhu Jia
  • Zhi Geng
  • Mingfeng Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4109)


In many applications, the data are spread across multiple databases, and each database may contain only some of the variables (attributes) of interest: some variables are observed while others are missing. Moreover, the records in a database may have been collected conditionally on certain design variables (sampling selection). In this paper, we discuss problems of data mining from such multiple databases. We propose an approach for determining whether a joint distribution is identifiable from multiple databases. For an identifiable joint distribution, we further present an expectation-maximization (EM) algorithm for computing the maximum likelihood estimates (MLEs) of the joint distribution.
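To make the estimation step concrete, the following is a minimal sketch of EM in the simplest identifiable setting, not the algorithm of the paper itself. It assumes two discrete variables X and Y, one database that records both, one that records only X, and one that records only Y. It also assumes the unobserved variables are missing at random and that no database was sampled conditionally on a design variable. The function name `em_joint` and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np

def em_joint(full_counts, x_only_counts, y_only_counts, n_iter=500, tol=1e-10):
    """EM estimate of the joint pmf p[x, y] from three count tables.

    full_counts[x, y] : counts from the database observing both X and Y
    x_only_counts[x]  : counts from the database observing only X
    y_only_counts[y]  : counts from the database observing only Y
    (Illustrative assumption: data are missing at random, no sampling selection.)
    """
    full = np.asarray(full_counts, dtype=float)
    x_only = np.asarray(x_only_counts, dtype=float)
    y_only = np.asarray(y_only_counts, dtype=float)
    n_total = full.sum() + x_only.sum() + y_only.sum()

    # Start from the uniform distribution so every cell is strictly positive.
    p = np.full(full.shape, 1.0 / full.size)
    for _ in range(n_iter):
        # E-step: spread each partially observed record over the missing
        # variable according to the current conditional distributions.
        p_y_given_x = p / p.sum(axis=1, keepdims=True)
        p_x_given_y = p / p.sum(axis=0, keepdims=True)
        expected = (full
                    + x_only[:, None] * p_y_given_x
                    + y_only[None, :] * p_x_given_y)
        # M-step: the multinomial MLE is the normalized expected count table.
        p_new = expected / n_total
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    return p

# Hypothetical counts, for illustration only.
p_hat = em_joint([[30, 10], [5, 25]], [20, 20], [15, 25])
print(p_hat)
```

The joint distribution is identifiable here because one database observes X and Y together. If only the two marginal databases were available, the association between X and Y could not be recovered without further assumptions, which is the kind of situation an identifiability check is meant to detect.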



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jinzhu Jia (1)
  • Zhi Geng (1)
  • Mingfeng Wang (1)
  1. School of Mathematical Sciences, LMAM, Peking University, Beijing, China
