Mining Rough Association from Text Documents

  • Yuefeng Li
  • Ning Zhong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4259)


It is a big challenge to guarantee the quality of association rules in some application areas (e.g., Web information gathering) because data values (e.g., terms) contain duplications and ambiguities. This paper presents a novel concept of rough association rules to improve the quality of discovered knowledge in these application areas. The premise of a rough association rule consists of a set of terms (items) and a weight distribution over those terms (items). The distinct advantage of rough association rules is that they contain more specific information than normal association rules. It is also feasible to update rough association rules dynamically to produce effective results. The experimental results verify that the proposed approach is promising.
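To make the idea of a premise that pairs a term set with a weight distribution concrete, the following is a minimal sketch. It is not the authors' definition: the function name `make_premise`, the sample terms, and the choice to derive weights by normalising raw term frequencies are all assumptions made for illustration only.

```python
def make_premise(term_counts):
    """Build a hypothetical rough-rule premise from raw term counts.

    Returns (terms, weights): the term set, plus a weight distribution
    over those terms that sums to 1. Normalising frequencies is an
    assumption of this sketch, not a detail taken from the paper.
    """
    total = sum(term_counts.values())
    if total <= 0:
        raise ValueError("term_counts must contain at least one positive count")
    weights = {term: count / total for term, count in term_counts.items()}
    return frozenset(weights), weights


# Hypothetical term counts observed in one group of documents.
counts = {"rough": 2, "association": 1, "rule": 1}
terms, weights = make_premise(counts)

# An ordinary association rule premise would keep only `terms`;
# the weighted form also records how strongly each term contributes.
print(sorted(terms))
print(weights)
```

In this sketch the extra information carried by the weighted premise is exactly the `weights` mapping, which is what the abstract means by such rules being more specific than rules whose premise is a bare term set.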


Keywords: Decision Rule, Association Rule, Rule Mining, Decision Table, Breakeven Point
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Yuefeng Li (1)
  • Ning Zhong (2)
  1. School of Software Engineering and Data Communications, Queensland University of Technology, Brisbane, Australia
  2. Department of Systems and Information Engineering, Maebashi Institute of Technology, Japan
