Local and Global Methods in Data Mining: Basic Techniques and Open Problems

  • Conference paper
Automata, Languages and Programming (ICALP 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2380)

Abstract

Data mining has emerged in recent years as an interesting area at the boundary between algorithms, probabilistic modeling, statistics, and databases. Data mining research can be divided into global approaches, which try to model the data as a whole, and local methods, which try to find useful patterns occurring in the data. We briefly discuss some simple local and global techniques, review two attempts at combining the approaches, and list open problems with an algorithmic flavor.
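
As an illustration only, and not the paper's own method, the following minimal Python sketch contrasts the two views on a toy binary transaction data set. The local method is a levelwise (Apriori-style) search for frequent itemsets; the global method is the simplest possible model of the whole data, an item-independence model that estimates the frequency of any itemset as the product of single-item frequencies. The data, the names `TRANSACTIONS`, `support`, `frequent_itemsets`, and `independence_estimate`, and the threshold of 0.5 are all hypothetical choices made for this example.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper): each transaction is a set of items.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
    {"d"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Local method: levelwise (Apriori-style) search for frequent itemsets.

    Support is anti-monotone, so a (k+1)-itemset is only counted if all of
    its k-subsets are already known to be frequent.
    """
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if support({i}, transactions) >= min_support]
    result = {}
    k = 1
    while level:
        for s in level:
            result[s] = support(s, transactions)
        # Join step: unite pairs of frequent k-sets whose union has size k+1.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: discard candidates having an infrequent k-subset.
        frequent_k = set(level)
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent_k for sub in combinations(c, k))
        }
        # Count the surviving candidates against the data.
        level = [c for c in candidates if support(c, transactions) >= min_support]
        k += 1
    return result


def independence_estimate(itemset, transactions):
    """Global method: an independence model over all items, predicting the
    frequency of an itemset as the product of its single-item frequencies."""
    p = 1.0
    for i in itemset:
        p *= support({i}, transactions)
    return p


if __name__ == "__main__":
    found = frequent_itemsets(TRANSACTIONS, min_support=0.5)
    # Print each frequent itemset with its observed and model-predicted frequency.
    for s in sorted(found, key=lambda s: (len(s), sorted(s))):
        print(sorted(s), round(found[s], 3), round(independence_estimate(s, TRANSACTIONS), 3))
```

On this toy data the pairs {a, b}, {a, c}, and {b, c} each occur in half of the transactions, whereas the independence model predicts (4/6)² ≈ 0.444 for each; comparing local pattern counts with what a global model predicts is one simple way to relate the two kinds of method.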

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mannila, H. (2002). Local and Global Methods in Data Mining: Basic Techniques and Open Problems. In: Widmayer, P., Eidenbenz, S., Triguero, F., Morales, R., Conejo, R., Hennessy, M. (eds) Automata, Languages and Programming. ICALP 2002. Lecture Notes in Computer Science, vol 2380. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45465-9_6

  • DOI: https://doi.org/10.1007/3-540-45465-9_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43864-9

  • Online ISBN: 978-3-540-45465-6

  • eBook Packages: Springer Book Archive
