Abstract
The presence of missing or incomplete data is commonplace in large real-world databases. In this paper, we study the problem of missing values that occur in the measure dimension of a data cube. We propose a two-part mixture model, which combines a logistic model and a loglinear model, to predict and impute the missing values. The logistic model is used to predict the missing positions, while the loglinear model is used to compute the estimates. Experimental results on real and synthetic datasets are presented.
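The abstract splits imputation into a classification step and an estimation step. The following is a minimal sketch of that two-part idea on a two-dimensional slice of a cube, not the authors' implementation. It rests on assumptions the abstract does not state: the logistic part is taken to decide whether a missing cell holds a zero or a nonzero measure, using only row and column main effects fitted by gradient ascent, and the loglinear part is taken to be a main-effects (quasi-independence) model fitted by iterative proportional fitting on the observed nonzero cells. The function names `fit_logistic`, `fit_loglinear`, `impute`, and the 0.5 threshold are hypothetical.

```python
import math

# A cube slice is a dict {(i, j): value} holding the OBSERVED cells only;
# any (i, j) absent from the dict is a missing measure value (assumption).

def fit_logistic(cells, n_rows, n_cols, iters=2000, lr=0.1, l2=0.01):
    """Hypothetical main-effects logistic model, fitted by gradient ascent:
    logit P(cell is nonzero) = a + r[i] + c[j]."""
    a, r, c = 0.0, [0.0] * n_rows, [0.0] * n_cols
    n = len(cells)
    for _ in range(iters):
        ga, gr, gc = 0.0, [0.0] * n_rows, [0.0] * n_cols
        for (i, j), v in cells.items():
            p = 1.0 / (1.0 + math.exp(-(a + r[i] + c[j])))
            err = (1.0 if v > 0 else 0.0) - p  # log-likelihood gradient term
            ga += err
            gr[i] += err
            gc[j] += err
        # A small L2 penalty on the row/column effects keeps them finite.
        a += lr * ga / n
        r = [ri + lr * (g / n - l2 * ri) for ri, g in zip(r, gr)]
        c = [cj + lr * (g / n - l2 * cj) for cj, g in zip(c, gc)]
    return lambda i, j: 1.0 / (1.0 + math.exp(-(a + r[i] + c[j])))


def fit_loglinear(cells, n_rows, n_cols, iters=100):
    """Main-effects (quasi-independence) loglinear model
    log m[i][j] = u + u_row[i] + u_col[j], fitted by iterative proportional
    fitting on the observed NONZERO cells; fitted means are row[i] * col[j]."""
    pos = [(i, j) for (i, j), v in cells.items() if v > 0]
    row_tot, col_tot = [0.0] * n_rows, [0.0] * n_cols
    for i, j in pos:
        row_tot[i] += cells[(i, j)]
        col_tot[j] += cells[(i, j)]
    row, col = [1.0] * n_rows, [1.0] * n_cols
    for _ in range(iters):
        # Rescale rows so fitted row sums over observed cells match the data ...
        for i in range(n_rows):
            denom = sum(col[j] for ii, j in pos if ii == i)
            if denom > 0:
                row[i] = row_tot[i] / denom
        # ... then rescale columns the same way.
        for j in range(n_cols):
            denom = sum(row[i] for i, jj in pos if jj == j)
            if denom > 0:
                col[j] = col_tot[j] / denom
    return lambda i, j: row[i] * col[j]


def impute(cells, n_rows, n_cols, threshold=0.5):
    """Two-part imputation: the logistic part decides zero vs. nonzero,
    the loglinear part supplies the value for cells predicted nonzero."""
    p_nonzero = fit_logistic(cells, n_rows, n_cols)
    estimate = fit_loglinear(cells, n_rows, n_cols)
    filled = dict(cells)
    for i in range(n_rows):
        for j in range(n_cols):
            if (i, j) not in cells:
                filled[(i, j)] = estimate(i, j) if p_nonzero(i, j) >= threshold else 0.0
    return filled
```

For example, if the observed cells follow row effects 1, 2, 3 and column effects 10, 20, 30 exactly and only cell (2, 2) is missing, the logistic part predicts a nonzero value and the loglinear part fills in 90. Iterative proportional fitting is used here because it yields the maximum-likelihood fit of the quasi-independence model on an incomplete table; a row or column with no observed nonzero cell keeps its initial factor of 1 and so carries no information.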
Keywords
- Logistic Model
- Range Query
- Synthetic Dataset
- Loglinear Model
- Data Cube
The data cube is a widely used data model for On-Line Analytical Processing (OLAP). A data cube is a multidimensional data abstraction in which aggregated measures are kept for combinations of dimension values.
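To make the definition concrete, here is a minimal sketch of building a small data cube in memory. It assumes a fact table of `(dimension values..., measure)` tuples, SUM as the aggregate, and a hypothetical `ALL` marker for a dimension that has been aggregated away, in the spirit of the CUBE operator. Each fact row contributes to 2^n cells, one per subset of rolled-up dimensions.

```python
from collections import defaultdict
from itertools import product

ALL = "ALL"  # hypothetical marker for a rolled-up dimension; assumes no real value equals "ALL"


def build_cube(rows):
    """Return {(d1, ..., dn): SUM(measure)} for every combination of
    dimension values, with ALL standing for 'aggregated over this dimension'."""
    cube = defaultdict(float)
    for *dims, measure in rows:
        # Each dimension either keeps its value or is rolled up to ALL.
        for rolled in product((False, True), repeat=len(dims)):
            key = tuple(ALL if r else d for d, r in zip(dims, rolled))
            cube[key] += measure
    return dict(cube)
```

With the rows `("east", "2001", 10.0)`, `("east", "2002", 5.0)` and `("west", "2001", 7.0)`, the cube holds 8 cells; for instance `("east", ALL)` aggregates to 15.0 and `(ALL, ALL)` to 22.0. Materializing every cell this way is exponential in the number of dimensions, which is one reason real OLAP systems compute or store only part of the cube.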
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, X., Barbará, D. (2002). Modeling and Imputation of Large Incomplete Multidimensional Datasets. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2002. Lecture Notes in Computer Science, vol 2454. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46145-0_28
DOI: https://doi.org/10.1007/3-540-46145-0_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44123-6
Online ISBN: 978-3-540-46145-6
eBook Packages: Springer Book Archive
