Mining Rough Association from Text Documents for Web Information Gathering

  • Yuefeng Li
  • Ning Zhong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4400)


It is a big challenge to guarantee the quality of association rules in some application areas (e.g., Web information gathering) because of duplications and ambiguities in data values (e.g., terms). Rough set based decision tables could be efficient tools for addressing this challenge. This paper first illustrates the relationship between decision tables and association mining. It proves that a decision rule is a kind of closed pattern. It also presents an alternative concept, rough association rules, to improve the quality of discovered knowledge in this area. The premise of a rough association rule consists of a set of terms (items) and a weight distribution over those terms (items). The distinct advantage of rough association rules is that they contain more specific information than normal association rules. This paper also reports experiments that compare the proposed method with association rule mining and decision tables; the experimental results verify that the proposed approach is promising.
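
To make the idea of a premise with a weight distribution more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's definitions: documents are assumed to be term-frequency dictionaries, documents that share exactly the same term set are assumed to be merged into one premise, and the weights are assumed to be the summed frequencies normalized to 1. The function name `rough_premises` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def rough_premises(docs):
    """Group documents by term set and attach a normalized weight distribution.

    Assumption (not taken from the paper): documents with an identical
    term set are merged, and their term frequencies are summed and then
    normalized so that the weights of each premise add up to 1.
    """
    merged = defaultdict(Counter)
    for term_freqs in docs:
        key = frozenset(term_freqs)      # the term set identifies the premise
        merged[key].update(term_freqs)   # Counter.update adds the frequencies

    premises = {}
    for terms, counts in merged.items():
        total = sum(counts.values())
        premises[terms] = {t: counts[t] / total for t in sorted(terms)}
    return premises


# Hypothetical toy documents: term -> frequency.
docs = [
    {"rough": 2, "set": 1},
    {"rough": 1, "set": 2},
    {"web": 3, "mining": 1},
]

premises = rough_premises(docs)
for terms, weights in premises.items():
    print(sorted(terms), weights)
```

In this toy data the first two documents share the term set {rough, set} and collapse into a single premise, which is how duplicated term sets stop producing duplicated rules. A plain association rule would keep only the term set; the weight distribution is the extra, more specific information.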


Keywords: Association mining; Web information gathering; Rough sets





Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Yuefeng Li (1)
  • Ning Zhong (2)
  1. School of Software Engineering and Data Communications, Queensland University of Technology, Brisbane QLD 4001, Australia
  2. Department of Information Engineering, Maebashi Institute of Technology, Maebashi 371-0816, Japan
