Knowledge and Information Systems, Volume 61, Issue 3, pp 1695–1713

Categorizing relational facts from the web with fuzzy rough sets

  • Aditya Bharadwaj
  • Sheela Ramanna
Regular Paper

Abstract

Significant advances have been made in automatically constructing knowledge bases of relational facts derived from web corpora. These relational facts are linguistic in nature and are represented as ordered pairs of nouns, such as (Winnipeg, Canada), belonging to a category, such as City_Country. A major problem is that these facts are abundant but mostly unlabeled. Semi-supervised learning approaches have therefore been successful in building knowledge bases: a small number of labeled examples serve as seed (training) instances, and a large number of unlabeled instances are learned iteratively. In this paper, we propose a novel fuzzy rough set-based semi-supervised learning algorithm (FRL) for categorizing relational facts derived from a given corpus. The proposed FRL algorithm is compared with a tolerance rough set-based learner (TPL) and the coupled pattern learner (CPL). The same ontology, derived from a subset of the corpus of the Never-Ending Language Learner (NELL) system, was used in all experiments. The experiments show that the proposed FRL outperforms both TPL and CPL in terms of precision. The paper also addresses the concept drift problem by using mutual exclusion constraints. The contributions of this paper are: (i) the introduction of a formal fuzzy rough model for relations, (ii) a semi-supervised learning algorithm, (iii) an experimental comparison with two other machine learning algorithms, TPL and CPL, and (iv) a novel application of fuzzy rough sets.
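To make the fuzzy rough idea concrete, the following is a minimal sketch of the general fuzzy rough lower and upper approximations applied to noun pairs. It is not the FRL algorithm from the paper. The toy contextual patterns, the Jaccard similarity, the choice of the minimum t-norm and the Kleene-Dienes implicator, and all function names are assumptions made purely for illustration.

```python
from typing import Dict, FrozenSet, Tuple

Pair = Tuple[str, str]

# ASSUMPTION: toy data. Each noun pair maps to the textual patterns it
# co-occurs with. These patterns are invented for illustration only.
PATTERNS: Dict[Pair, FrozenSet[str]] = {
    ("Winnipeg", "Canada"): frozenset({"X is a city in Y", "X , Y"}),
    ("Lyon", "France"): frozenset({"X is a city in Y", "X , Y"}),
    ("Toyota", "Japan"): frozenset({"X is headquartered in Y", "X , Y"}),
    ("Honda", "Civic"): frozenset({"X makes the Y"}),
}

# Fuzzy membership in the category City_Country: one labeled seed pair,
# every other pair is unlabeled and starts at 0.
SEED_MEMBERSHIP: Dict[Pair, float] = {
    pair: 1.0 if pair == ("Winnipeg", "Canada") else 0.0 for pair in PATTERNS
}


def jaccard(a: Pair, b: Pair) -> float:
    """ASSUMPTION: fuzzy similarity R(a, b) as Jaccard overlap of patterns."""
    pa, pb = PATTERNS[a], PATTERNS[b]
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0


def kleene_dienes(a: float, b: float) -> float:
    """Kleene-Dienes implicator: I(a, b) = max(1 - a, b)."""
    return max(1.0 - a, b)


def approximations(x: Pair, membership: Dict[Pair, float]) -> Tuple[float, float]:
    """Return (lower, upper) fuzzy rough approximation degrees of x.

    lower(x) = inf_y I(R(x, y), A(y))
    upper(x) = sup_y T(R(x, y), A(y)), with T = min
    """
    lower = min(kleene_dienes(jaccard(x, y), membership[y]) for y in membership)
    upper = max(min(jaccard(x, y), membership[y]) for y in membership)
    return lower, upper


if __name__ == "__main__":
    # Rank candidate pairs by how strongly they resemble the seed.
    for pair in PATTERNS:
        low, up = approximations(pair, SEED_MEMBERSHIP)
        print(pair, round(low, 3), round(up, 3))
```

In this sketch, a pair that shares all of its patterns with the seed receives a high upper approximation degree, while a pair with no shared patterns receives zero. A bootstrapping learner could promote the highest-ranked candidates in each iteration, but how the paper's FRL algorithm scores and promotes candidates is not specified in the abstract.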

Keywords

Text categorization, Relational facts, Semi-supervised learning, Fuzzy rough sets, Web mining

Notes

Acknowledgements

Special thanks to Cenker Sengoz for sharing the dataset and for discussions regarding TPL. We are very grateful to Prof. Estevam R. Hruschka Jr. for the NELL dataset and Prof. Andrzej Skowron for helpful suggestions.

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. Department of Applied Computer Science, University of Winnipeg, Winnipeg, Canada