Selecting Candidate Labels for Hierarchical Document Clusters Using Association Rules

  • Fabiano Fernandes dos Santos
  • Veronica Oliveira de Carvalho
  • Solange Oliveira Rezende
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6438)

Abstract

One way to organize knowledge and make it easier to search and retrieve is to build a structural representation divided into hierarchically related topics. Once this structure is built, each of the resulting clusters needs a label. In many cases the labels have to be built using only the terms that occur in the documents of the collection. This paper presents the SeCLAR (Selecting Candidate Labels using Association Rules) method, which uses association rules to select good label candidates for hierarchical document clusters. These candidates are then processed by a classical labeling method to generate the final labels. The key idea of the proposed method is to treat each parent-child relationship between nodes of the hierarchy as an antecedent-consequent relationship in an association rule. The experimental results show that the proposed method can improve the precision and recall of the labels obtained by classical methods.
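The abstract does not spell out how the rules are built, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes that every document of a parent node is a transaction and scores a rule "term → child" for each term: support is the fraction of parent documents that contain the term and belong to the child, and confidence is the fraction of parent documents containing the term that belong to the child. Terms passing the (hypothetical) `min_support` and `min_confidence` thresholds become label candidates. The second step stands in for the unspecified "classical method" with plain document frequency inside the child cluster. The names `select_candidates`, `label_child` and all parameters are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# A document is represented by the set of (e.g. stemmed) terms it contains.
Doc = Set[str]
Rule = Tuple[str, float, float]  # (antecedent term, support, confidence)


def select_candidates(
    parent_docs: Dict[str, Doc],
    child_ids: Set[str],
    min_support: float = 0.1,
    min_confidence: float = 0.7,
) -> List[Rule]:
    """Hypothetical candidate selection for one parent-child pair.

    Assumption: every parent document is a transaction and the child's
    documents are a subset of the parent's. For each term t the rule
    "t -> child" is scored; support = P(t and child), confidence = P(child | t).
    """
    n = len(parent_docs)
    with_term: Counter = Counter()  # parent documents containing the term
    in_child: Counter = Counter()   # ... that also belong to the child
    for doc_id, terms in parent_docs.items():
        for t in terms:
            with_term[t] += 1
            if doc_id in child_ids:
                in_child[t] += 1

    rules: List[Rule] = []
    for t, count in with_term.items():
        support = in_child[t] / n
        confidence = in_child[t] / count
        if support >= min_support and confidence >= min_confidence:
            rules.append((t, support, confidence))
    # Strongest rules first; ties broken alphabetically for determinism.
    return sorted(rules, key=lambda r: (-r[2], -r[1], r[0]))


def label_child(
    parent_docs: Dict[str, Doc],
    child_ids: Set[str],
    k: int = 3,
    min_support: float = 0.1,
    min_confidence: float = 0.7,
) -> List[str]:
    """Rank the candidates with a stand-in 'classical' method:
    document frequency inside the child cluster (most frequent first)."""
    candidates = {t for t, _, _ in select_candidates(
        parent_docs, child_ids, min_support, min_confidence)}
    df = Counter(t for d in child_ids for t in parent_docs[d] if t in candidates)
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [t for t, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy parent node with two children: sports = {d1, d2}, politics = {d3, d4}.
    docs = {
        "d1": {"goal", "match", "team"},
        "d2": {"goal", "team", "coach"},
        "d3": {"vote", "party", "team"},
        "d4": {"vote", "election", "party"},
    }
    print(label_child(docs, {"d1", "d2"}, k=2))  # → ['goal', 'coach']
```

In the toy data, `team` occurs under both children, so its rule toward the sports child has confidence 2/3 and is filtered out, while `goal`, `coach` and `match` survive as candidates. Keeping rule mining separate from ranking mirrors the two-stage description in the abstract: any labeling method could replace the frequency ranking in `label_child`.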

Keywords

label, hierarchical clustering, association rules, text mining

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Fabiano Fernandes dos Santos (1)
  • Veronica Oliveira de Carvalho (2)
  • Solange Oliveira Rezende (1)

  1. Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo (USP), Brazil
  2. Instituto de Geociências e Ciências Exatas, UNESP, Univ Estadual Paulista, Brazil