Semantic Views of Homogeneous Unstructured Data

  • Weronika T. Adrian
  • Nicola Leone
  • Marco Manna
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9209)


Homogeneous unstructured data (HUD) are collections of unstructured documents that share common properties, such as a similar layout, a common file format, or a common domain of values. Building on such properties, it would be desirable to process HUD automatically and to access their main information through a semantic layer (typically an ontology) called a semantic view. Hence, we propose an ontology-based approach for extracting semantically rich information from HUD by integrating and extending recent technologies and results from the fields of classical information extraction, table recognition, ontologies, text annotation, and logic programming. Moreover, we design and implement a system, named KnowRex, that has been successfully applied to curricula vitae in the Europass style to offer a semantic view of them and, for example, to select those that exhibit required skills.
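
To make the idea of selecting curricula by required skills more concrete, the following minimal Python sketch shows what a query over a semantic view might look like. It is not KnowRex code and does not reflect the system's actual interface: the skill taxonomy `BROADER`, the class `CVView`, the functions `ancestors`, `exhibits` and `select`, and the sample candidates are all hypothetical, and the extraction step that would populate the view from real documents is assumed to have already happened.

```python
from dataclasses import dataclass, field

# Hypothetical mini-ontology: each skill maps to its broader skill (or None).
BROADER = {
    "Python": "Programming",
    "Java": "Programming",
    "Programming": None,
    "English": "Language",
    "Language": None,
}


@dataclass
class CVView:
    """Illustrative semantic view of one CV: facts assumed to be already extracted."""
    candidate: str
    skills: set = field(default_factory=set)


def ancestors(skill):
    """Return the skill together with all of its broader skills."""
    found = set()
    while skill is not None:
        found.add(skill)
        skill = BROADER.get(skill)
    return found


def exhibits(cv, required):
    """True if every required skill is covered by some skill listed in the CV."""
    covered = set()
    for s in cv.skills:
        covered |= ancestors(s)
    return set(required) <= covered


def select(cvs, required):
    """Names of the candidates whose CV exhibits all required skills."""
    return [cv.candidate for cv in cvs if exhibits(cv, required)]


# Made-up sample data, for illustration only.
cvs = [
    CVView("Alice", {"Python", "English"}),
    CVView("Bob", {"Java"}),
]
selected = select(cvs, {"Programming", "Language"})
```

The point of the sketch is that the query is phrased against ontology concepts ("Programming", "Language") rather than against the literal strings found in the documents, so a CV listing "Python" still satisfies a request for "Programming".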


Unstructured data · Ontologies · Semantic information extraction · Table recognition · Semantic views



The work has been supported by Regione Calabria, programme POR Calabria FESR 2007–2013, within project “KnowRex: Un sistema per il riconoscimento e l’estrazione di conoscenza”.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Weronika T. Adrian (1, 2)
  • Nicola Leone (1)
  • Marco Manna (1), corresponding author
  1. Department of Mathematics and Computer Science, University of Calabria, Cosenza, Italy
  2. AGH University of Science and Technology, Krakow, Poland
