On the Need to Bootstrap Ontology Learning with Extraction Grammar Learning

  • Georgios Paliouras
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3596)


The main claim of this paper is that machine learning can help integrate the construction of ontologies and extraction grammars, bringing us closer to the Semantic Web vision. The proposed approach is a bootstrapping process that combines ontology and grammar learning to semi-automate knowledge acquisition. After a survey of the most relevant work towards this goal, the paper presents recent research of the Software and Knowledge Engineering Laboratory (SKEL) of NCSR “Demokritos” in the areas of Web information integration, information extraction, grammar induction and ontology enrichment. It concludes with a number of open issues that must be addressed to realize the advocated bootstrapping process.


Keywords: Information Extraction, Domain Ontology, Formal Concept Analysis, Conceptual Graph, Extraction Task
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Georgios Paliouras (1)
  1. Institute of Informatics and Telecommunications, NCSR “Demokritos”, Greece
