Abstract
In recent years, data mining has emerged as an interesting area at the boundary between algorithms, probabilistic modeling, statistics, and databases. Data mining research can be divided into global approaches, which try to model the data as a whole, and local methods, which try to find useful patterns occurring in the data. We briefly discuss some simple local and global techniques, review two attempts at combining the approaches, and list open problems with an algorithmic flavor.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mannila, H. (2002). Local and Global Methods in Data Mining: Basic Techniques and Open Problems. In: Widmayer, P., Eidenbenz, S., Triguero, F., Morales, R., Conejo, R., Hennessy, M. (eds) Automata, Languages and Programming. ICALP 2002. Lecture Notes in Computer Science, vol 2380. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45465-9_6
DOI: https://doi.org/10.1007/3-540-45465-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43864-9
Online ISBN: 978-3-540-45465-6
eBook Packages: Springer Book Archive