Unifying the Concept of Collection in Digital Libraries

  • Carlo Meghini
  • Nicolas Spyratos
Part of the Studies in Computational Intelligence book series (SCI, volume 265)

Abstract

The notion of collection plays a key role in Digital Libraries, where several kinds of collections are typically found. We claim that all these kinds can be unified into a single abstraction mechanism, endowed with an extension and an intension, similarly to predicates in logic. The extension of a collection is the set of documents that are members of the collection at a given point in time. The intension is a description of the meaning of the collection, that is, the property that the members of the collection share and that distinguishes the collection from other collections. The problem then arises of how to derive the intension automatically from a given extension, a problem that must be solved, e.g., when creating a collection from a set of documents. It turns out that our notion of collection is very close to the notion of formal concept in Formal Concept Analysis, which provides a well-founded framework to formalize the problem and useful tools to solve it. We exploit this framework to study the problem of automatically deriving a collection intension from a given extension. We then show how intensions can be exploited for carrying out basic tasks on collections, establishing a connection between Digital Library management and data integration.
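
To make the extension/intension pairing concrete, the following is a minimal sketch using the two standard derivation operators of Formal Concept Analysis. It is an illustration under stated assumptions, not the chapter's own algorithm: the toy context `CONTEXT`, the attribute names, and the function names `intent`, `extent` and `derive_collection` are all hypothetical, and an intension is assumed here to be a plain conjunction of attributes.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical toy formal context: document id -> attributes describing it.
CONTEXT: Dict[str, Set[str]] = {
    "d1": {"fca", "lattice", "retrieval"},
    "d2": {"fca", "lattice"},
    "d3": {"fca", "retrieval"},
    "d4": {"integration"},
}


def intent(docs: Iterable[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Attributes shared by every document in `docs` (candidate intension)."""
    docs = list(docs)
    if not docs:
        # By convention, the empty set of documents shares all attributes.
        return frozenset().union(*context.values())
    return frozenset(set.intersection(*(context[d] for d in docs)))


def extent(attrs: Iterable[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Documents that possess every attribute in `attrs` (extension)."""
    wanted = set(attrs)
    return frozenset(d for d, has in context.items() if wanted <= has)


def derive_collection(
    docs: Iterable[str], context: Dict[str, Set[str]]
) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Return (extension, intension) of the smallest formal concept
    whose extension contains the given documents."""
    intension = intent(docs, context)
    return extent(intension, context), intension


if __name__ == "__main__":
    # {d1, d2} is exactly described by the conjunction of its shared attributes.
    print(derive_collection({"d1", "d2"}, CONTEXT))
    # {d2, d3} is not: the documents share only "fca", which d1 also has.
    print(derive_collection({"d2", "d3"}, CONTEXT))
```

In this sketch, a set of documents whose derived extension is larger than the set itself has no exact description as a conjunction of attributes in the given context, which is one way of seeing why deriving an intension from an arbitrary extension is a non-trivial problem.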

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Carlo Meghini (1)
  • Nicolas Spyratos (2)
  1. Consiglio Nazionale delle Ricerche, Istituto della Scienza e delle Tecnologie della Informazione, Pisa, Italy
  2. Laboratoire de Recherche en Informatique, Université Paris-Sud, Orsay Cedex, France
