Interactive Text Analysis and Information Extraction

  • Tasos Giannakopoulos
  • Yannis Foufoulas
  • Harry Dimitropoulos
  • Natalia Manola
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 988)


Much work in the text mining field concerns extracting useful information from the full text of publications. Such information includes links to projects, acknowledgements of communities, and citations of software or datasets. Each category of entity has its own characteristics and requires a different approach, so it is not possible to build a single generic platform that text-mines publications for all such information. In most cases, a domain expert must supervise the mining procedure, agree on the mining rules with the developer, and validate the results. This iterative procedure requires extensive communication between experts and developers and is therefore very time-consuming. In this paper, we present an interactive mining platform that lets experts define the mining procedure, set and update the rules, and validate the results, while the text mining code itself is generated automatically. This significantly reduces the communication needed between developers and experts, and it allows experts to experiment on their own through a user-friendly graphical interface.
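The abstract says experts set the rules while the mining code is produced automatically. As a rough illustration only, the sketch below assumes a hypothetical rule format (a regex for candidate matches, positive and negative context terms, and a context window measured in words); `MiningRule` and `compile_rule` are names invented for this example and are not the platform's actual interface. It shows how a declarative rule might be compiled into an extraction function without hand-written mining code.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MiningRule:
    """Hypothetical expert-defined rule (an assumption, not the paper's format)."""
    pattern: str                                        # regex a candidate token must fully match
    positive: List[str] = field(default_factory=list)   # at least one must occur in the context
    negative: List[str] = field(default_factory=list)   # any occurrence vetoes the match
    window: int = 10                                     # context size in words on each side


def compile_rule(rule: MiningRule) -> Callable[[str], List[str]]:
    """Turn a declarative rule into an extraction function (hypothetical helper)."""
    regex = re.compile(rule.pattern)
    pos = [w.lower() for w in rule.positive]
    neg = [w.lower() for w in rule.negative]

    def extract(text: str) -> List[str]:
        tokens = text.split()
        hits = []
        for i, tok in enumerate(tokens):
            # Strip surrounding punctuation before testing the candidate.
            m = regex.fullmatch(tok.strip(".,;:()"))
            if not m:
                continue
            start, end = max(0, i - rule.window), i + rule.window + 1
            context = " ".join(tokens[start:end]).lower()
            # Terms are checked as substrings of the lowercased context.
            if pos and not any(w in context for w in pos):
                continue
            if any(w in context for w in neg):
                continue
            hits.append(m.group(0))
        return hits

    return extract


rule = MiningRule(pattern=r"\d{6}", positive=["grant", "h2020"], negative=["phone"])
extract = compile_rule(rule)
text = ("This work is funded by the European Commission under H2020 projects "
        "OpenAIRE-Connect (grant number: 731011).")
print(extract(text))  # ['731011']
```

Keeping rules as data rather than code is what would let an expert edit and rerun them without a developer in the loop; a real system would likely need richer rule types and a step for validating the extracted results.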



This work is funded by the European Commission under H2020 projects OpenAIRE-Connect (grant number: 731011) and OpenAIRE-Advance (grant number: 777541).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Tasos Giannakopoulos (1)
  • Yannis Foufoulas (1)
  • Harry Dimitropoulos (1)
  • Natalia Manola (1)
  1. University of Athens, Greece and Athena Research Center, Athens, Greece
