MANENT: An Infrastructure for Integrating, Structuring and Searching Digital Libraries

  • Chapter
Learning Structure and Schemas from Documents

Part of the book series: Studies in Computational Intelligence (SCI, volume 375)

Abstract

Digital Libraries represent the commitment of research communities to preserve authoritative and well-structured sources of knowledge, and to share archival organisations, methods and resources through systems that rely on standard metadata formats. This chapter describes some natural language processing techniques used to automatically extract structural information from documents stored in Digital Libraries, based on the exposed metadata. The most prominent results achieved in this area are surveyed and discussed. As an example of an infrastructure for integrating, structuring and searching Digital Libraries based on natural language processing and semantic web techniques, we discuss the MANENT system. MANENT is a working prototype offering services for Digital Library content management and for record classification and retrieval. It is hosted on a server at the Computer Science Department of Genova University and will become publicly available starting from 2011. So far, 475,000 records have been downloaded and stored from 138 repositories around the world that expose OAI-PMH services, and their automatic classification is under way.
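To make the harvesting step mentioned in the abstract concrete, the following is a minimal sketch of how records exposed through OAI-PMH in the Dublin Core (`oai_dc`) format could be requested and parsed. It is not MANENT's implementation, which the abstract does not describe. The endpoint `http://example.org/oai`, the sample response and the function names `list_records_url` and `parse_records` are assumptions made for illustration; only the `ListRecords` verb, the `metadataPrefix` argument and the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and the oai_dc metadata format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for a repository endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Extract identifier, titles and subjects from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS
            ),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
            "subjects": [e.text for e in rec.iterfind(".//dc:subject", NS)],
        })
    return records


# Hypothetical response from an assumed repository; not real harvested data.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample record</dc:title>
          <dc:subject>history</dc:subject>
          <dc:subject>archives</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    for record in parse_records(SAMPLE):
        print(record)
```

The sketch parses a single response offline. A real harvester would also need to fetch each page over HTTP and follow the `resumptionToken` element that repositories return when a result list is split across several responses.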

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Locoro, A., Grignani, D., Mascardi, V. (2011). MANENT: An Infrastructure for Integrating, Structuring and Searching Digital Libraries. In: Biba, M., Xhafa, F. (eds) Learning Structure and Schemas from Documents. Studies in Computational Intelligence, vol 375. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22913-8_15

  • DOI: https://doi.org/10.1007/978-3-642-22913-8_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22912-1

  • Online ISBN: 978-3-642-22913-8

  • eBook Packages: Engineering, Engineering (R0)
