Text Mining Scientific Papers: A Survey on FCA-Based Information Retrieval Research

  • Jonas Poelmans
  • Dmitry I. Ignatov
  • Stijn Viaene
  • Guido Dedene
  • Sergei O. Kuznetsov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7377)

Abstract

Formal Concept Analysis (FCA) is an unsupervised clustering technique, and many scientific papers are devoted to applying FCA in Information Retrieval (IR) research. We collected 103 papers published between 2003 and 2009 that mention FCA and information retrieval in the abstract, title or keywords. Using a prototype of our FCA-based toolset CORDIET, we converted the PDF files containing the papers to plain text, indexed them with Lucene using a thesaurus containing terms related to FCA research, and then created the concept lattice shown in this paper. We visualized, analyzed and explored the literature with concept lattices and discovered multiple interesting research streams in IR, of which we give an extensive overview. The core contributions of this paper are the innovative application of FCA to the text mining of scientific papers and the survey of FCA-based IR research.

Keywords

Information Retrieval; Concept Lattice; Query Enlargement; Information Retrieval System; Formal Context
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jonas Poelmans (1, 4)
  • Dmitry I. Ignatov (4)
  • Stijn Viaene (1, 2)
  • Guido Dedene (1, 3)
  • Sergei O. Kuznetsov (4)
  1. Faculty of Business and Economics, K.U. Leuven, Leuven, Belgium
  2. Vlerick Leuven Gent Management School, Leuven, Belgium
  3. Universiteit van Amsterdam Business School, Amsterdam, The Netherlands
  4. National Research University Higher School of Economics (HSE), Moscow, Russia
