Mining in the phrasal frontier

  • Helena Ahonen
  • Oskari Heinonen
  • Mika Klemettinen
  • A. Inkeri Verkamo
Poster Session 6
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1263)


Data mining methods have been applied to a wide variety of domains. Surprisingly, only a few examples of data mining in text are available, even though the large number of existing document collections suggests that text mining would be highly useful. Traditionally, texts have been analysed using methods from information retrieval and natural language processing. In this paper, we present our first experiments in applying general data mining methods to the discovery of phrases and co-occurring terms. We also describe the text mining process we developed. Our results show that data mining methods, with appropriate preprocessing, can be used in text processing, and that by shifting the focus, the same process can yield results for various purposes.
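To make the idea of discovering co-occurring terms concrete, the following is a minimal sketch, not the method used in the paper. It assumes a naive preprocessing step (lowercasing and punctuation stripping) and simply counts term pairs that appear together in at least a given number of documents. The function name `frequent_cooccurring_terms`, the `min_support` parameter, and the sample documents are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_cooccurring_terms(documents, min_support=2, size=2):
    """Return term sets of the given size that co-occur in at least
    min_support documents (hypothetical helper, illustration only)."""
    counts = Counter()
    for text in documents:
        # Naive preprocessing (an assumption): lowercase, split on
        # whitespace, and strip surrounding punctuation.
        terms = {w.strip(".,;:!?()\"'").lower() for w in text.split()}
        terms.discard("")
        # Each document contributes at most once to every term set it contains.
        for combo in combinations(sorted(terms), size):
            counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Made-up sample documents, not data from the paper.
docs = [
    "Data mining finds patterns in data.",
    "Text mining applies data mining to documents.",
    "Information retrieval ranks documents.",
]
result = frequent_cooccurring_terms(docs, min_support=2)
print(result)
```

Counting every candidate pair directly is only practical for small vocabularies. Level-wise pruning of infrequent candidates, as in association rule discovery, is the usual way to scale this kind of search.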



Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Helena Ahonen (1)
  • Oskari Heinonen (1)
  • Mika Klemettinen (1)
  • A. Inkeri Verkamo (1)
  1. Department of Computer Science, University of Helsinki, Finland
