Local and Global Methods in Data Mining: Basic Techniques and Open Problems

Mannila, Heikki

doi:10.1007/3-540-45465-9_6

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2380))

Included in the following conference series:

International Colloquium on Automata, Languages, and Programming

Abstract

Data mining has in recent years emerged as an interesting area at the boundary between algorithms, probabilistic modeling, statistics, and databases. Data mining research can be divided into global approaches, which try to model the whole data, and local methods, which try to find useful patterns occurring in the data. We briefly discuss some simple local and global techniques, review two attempts at combining the approaches, and list open problems with an algorithmic flavor.

Author information

Authors and Affiliations

Department of Computer Science, HIIT Basic Research Unit, University of Helsinki, PO Box 26, FIN-00014, University of Helsinki, Finland
Heikki Mannila
Laboratory of Computer and Information Science, Helsinki University of Technology, PO Box 5400, FIN-02015, HUT, Finland
Heikki Mannila

Editor information

Editors and Affiliations

Institute of Theoretical Computer Science, ETH Zentrum, ETH Zürich, 8092, Zürich, Switzerland
Peter Widmayer & Stephan Eidenbenz
Department of Languages and Sciences of the Computation E.T.S. de Ingeniería Informática, University of Málaga, Campus de Teatinos, 29071, Málaga, Spain
Francisco Triguero, Rafael Morales & Ricardo Conejo
School of Cognitive and Computing Sciences, University of Sussex, Falmer, Brighton, BN1 9QN, UK
Matthew Hennessy

About this paper

Cite this paper

Mannila, H. (2002). Local and Global Methods in Data Mining: Basic Techniques and Open Problems. In: Widmayer, P., Eidenbenz, S., Triguero, F., Morales, R., Conejo, R., Hennessy, M. (eds) Automata, Languages and Programming. ICALP 2002. Lecture Notes in Computer Science, vol 2380. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45465-9_6

DOI: https://doi.org/10.1007/3-540-45465-9_6
Published: 25 June 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43864-9
Online ISBN: 978-3-540-45465-6
eBook Packages: Springer Book Archive
