Holistic graph-based document representation and management for open science

Abstract

While most previous research has focused only on the textual content of documents, advanced support for document management in digital libraries for open science requires handling all aspects of a document: structure, content, and context. These aspects are distinct but interrelated, so they cannot be handled separately, yet they have traditionally been ignored in digital libraries. We propose a unifying graph-based model for document representation and handling. It is based on an ontology that integrates all of these perspectives and drives the document description, with the aim of making document management more effective. We also show how even simple algorithms can profitably use the proposed approach to return relevant and personalized results in different document management tasks.
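
As a rough illustration of the idea, and not the authors' ontology or the GraphBRAIN implementation, the following Python sketch stores structure, content, and context items as nodes of one labeled graph and ranks related documents by the number of neighbors they share. The class `DocGraph`, the node kinds, the relation names, and the sample data are all hypothetical and chosen only for this example.

```python
from collections import defaultdict


class DocGraph:
    """Tiny labeled graph: nodes carry a kind, edges carry a relation name."""

    def __init__(self):
        self.kind = {}                  # node id -> node kind
        self.edges = defaultdict(set)   # node id -> {(relation, neighbor id)}

    def add_node(self, node_id, kind):
        self.kind[node_id] = kind

    def add_edge(self, source, relation, target):
        # Store both directions so traversal can start from either end.
        self.edges[source].add((relation, target))
        self.edges[target].add((relation, source))

    def neighbors(self, node_id):
        return {target for _, target in self.edges[node_id]}

    def related_documents(self, doc_id):
        """Rank other documents by how many neighbors they share with doc_id."""
        mine = self.neighbors(doc_id)
        scores = {}
        for other, kind in self.kind.items():
            if kind == "Document" and other != doc_id:
                shared = len(mine & self.neighbors(other))
                if shared:
                    scores[other] = shared
        # Highest score first; ties broken alphabetically for stable output.
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data mixing the three perspectives in one graph.
g = DocGraph()
for doc in ("paper_a", "paper_b", "paper_c"):
    g.add_node(doc, "Document")
g.add_node("graph databases", "Concept")    # content
g.add_node("open science", "Concept")       # content
g.add_node("J. Doe", "Person")              # context
g.add_node("two-column layout", "Layout")   # structure

g.add_edge("paper_a", "about", "graph databases")
g.add_edge("paper_a", "about", "open science")
g.add_edge("paper_a", "written_by", "J. Doe")
g.add_edge("paper_a", "has_layout", "two-column layout")
g.add_edge("paper_b", "about", "open science")
g.add_edge("paper_b", "written_by", "J. Doe")
g.add_edge("paper_c", "has_layout", "two-column layout")

ranking = g.related_documents("paper_a")
print(ranking)
```

Because all three perspectives live in the same graph, a single traversal can combine them: here `paper_b` is related to `paper_a` through content and context, while `paper_c` is related only through structure. A real system would weight relation types differently; counting shared neighbors is just the simplest possible choice.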

Notes

  1. https://www.fosteropenscience.eu/taxonomy/term/110

  2. https://www.openaire.eu/.

  3. https://www.oclc.org/research/activities/frbr.html (recently proposed as an OWL2 ontology; see http://www.sparontologies.net/ontologies/frbr)

  4. https://www.openarchives.org/ore/

  5. https://dl.acm.org/ccs.

  6. www.fosteropenscience.eu/taxonomy/term/110.

  7. A demo of the system is available at http://193.204.187.73:8088/GraphBRAIN/.

Acknowledgements

We are grateful to Artificial Brain S.r.l., which contributed free of charge to the development and engineering of the systems used in this work (DoMInUS, ConNeKTion, and, notably, GraphBRAIN) and allowed us to freely use proprietary parts of the system in our research.

Author information

Corresponding author

Correspondence to Stefano Ferilli.

About this article

Cite this article

Ferilli, S., Redavid, D. & Di Pierro, D. Holistic graph-based document representation and management for open science. Int J Digit Libr (2022). https://doi.org/10.1007/s00799-022-00328-z

Keywords

  • Document representation
  • Knowledge graphs
  • Document management
  • Open science