Mining for Empty Rectangles in Large Data Sets
Many data mining approaches focus on the discovery of similar (and frequent) data values in large data sets. We present an alternative but complementary approach in which we search for empty regions in the data. We consider the problem of finding all maximal empty rectangles in large, two-dimensional data sets, and we introduce a novel, scalable algorithm for finding them. The algorithm requires only a single scan over a sorted data set and a small, bounded amount of memory. We also describe an algorithm to find all maximal empty hyper-rectangles in a multi-dimensional space. We consider the complexity of this search problem and present new bounds on the number of maximal empty hyper-rectangles. We briefly review experimental results obtained by applying our algorithm to a synthetic data set.
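To make the notion of a maximal empty rectangle concrete, the following is a minimal brute-force sketch. It is not the scalable single-scan algorithm described in the abstract. It assumes the two-dimensional data set is given as a 0/1 matrix in which a 1 marks a data point, and that a rectangle is "maximal" when it contains no 1 and cannot be grown by one row or column in any direction while staying empty. The function name `maximal_empty_rectangles` and the `(top, left, bottom, right)` encoding are hypothetical choices made for this illustration.

```python
from itertools import product


def maximal_empty_rectangles(grid):
    """Return all maximal all-zero rectangles as (top, left, bottom, right).

    Coordinates are inclusive row/column indices. This is an exhaustive
    check intended only for small inputs.
    """
    rows, cols = len(grid), len(grid[0])

    def empty(t, l, b, r):
        # A rectangle is empty if it lies inside the grid and holds no 1.
        if t < 0 or l < 0 or b >= rows or r >= cols:
            return False
        return all(grid[i][j] == 0
                   for i in range(t, b + 1)
                   for j in range(l, r + 1))

    result = []
    for t, l in product(range(rows), range(cols)):
        for b, r in product(range(t, rows), range(l, cols)):
            if not empty(t, l, b, r):
                continue
            # Maximal: growing by one row or column in any of the four
            # directions leaves the grid or covers a 1.
            grown = [(t - 1, l, b, r), (t, l - 1, b, r),
                     (t, l, b + 1, r), (t, l, b, r + 1)]
            if not any(empty(*g) for g in grown):
                result.append((t, l, b, r))
    return result


# A 3x3 grid with a single data point in the centre.
example = [[0, 0, 0],
           [0, 1, 0],
           [0, 0, 0]]
print(sorted(maximal_empty_rectangles(example)))
# → [(0, 0, 0, 2), (0, 0, 2, 0), (0, 2, 2, 2), (2, 0, 2, 2)]
```

In the example, the four maximal empty rectangles are the top row, the left column, the right column, and the bottom row. The sketch tests every candidate rectangle, so its cost grows very quickly with the grid size, which is why it is suitable only for small inputs.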
Keywords: Association Rule, Scalable Algorithm, High Step, Empty Region, Empty Rectangle