Exploring Large Digital Libraries by Multimodal Criteria

  • Conference paper
Research and Advanced Technology for Digital Libraries (TPDL 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9819)

Abstract

Digital library (DL) support for different information seeking strategies (ISS) has not evolved as fast as the amount of stock DLs offer or the quality of its presentation. However, several studies argue for supporting explorative ISS in conjunction with the directed query-response paradigm. Hence, this paper presents a primarily explorative research system prototype for metadata harvesting that allows researchers multimodal access to DL stock during the research idea development phase, i.e., while the information need (IN) is still vague. To address evolving INs, the prototype also allows ISS transitions, e.g., to OPACs, when accuracy is needed.

As its second contribution, the paper presents a curated data set for digital humanities researchers that is automatically enriched with metadata derived by different algorithms, including content-based image features. This automatic enrichment of the originally bibliographic metadata is needed to support the exploration of large metadata stock, as traditional metadata does not always address vague INs.

The presented proof of concept shows that use-case-specific metadata facilitates interaction with large metadata corpora.
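The enrichment idea described in the abstract, adding algorithmically derived features to bibliographic records without altering the originals, can be pictured with a minimal sketch. It is not the paper's implementation: the field name `x_intensity_histogram`, the helper functions, the sample record, and the choice of a coarse grayscale histogram as the content-based image feature are all assumptions made for illustration.

```python
from copy import deepcopy


def intensity_histogram(pixels, bins=4, max_value=255):
    """Return a normalized histogram of grayscale values in [0, max_value]."""
    counts = [0] * bins
    for value in pixels:
        # Map the value to a bin; the top value falls into the last bin.
        index = min(value * bins // (max_value + 1), bins - 1)
        counts[index] += 1
    total = len(pixels)
    return [count / total for count in counts] if total else counts


def enrich_record(record, pixels):
    """Return a copy of `record` with a derived image feature added."""
    enriched = deepcopy(record)  # leave the bibliographic original untouched
    # Hypothetical field name; not taken from the paper or from Dublin Core.
    enriched["x_intensity_histogram"] = intensity_histogram(pixels)
    return enriched


# Hypothetical record with Dublin Core-style keys and a tiny "image".
record = {"dc:title": "Example title", "dc:creator": "Example creator"}
pixels = [0, 10, 70, 130, 200, 255, 255, 255]
enriched = enrich_record(record, pixels)
```

Returning a copy rather than mutating the record keeps the original bibliographic metadata intact, so derived features can be recomputed or discarded independently.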

Notes

  1. http://www.europeana.eu/; http://dp.la/.

  2. Open Archives Initiative Protocol for Metadata Harvesting; https://www.openarchives.org/pmh/.

  3. Online Public Access Catalog.

  4. http://www.numpy.org/; http://www.scipy.org/; http://pandas.pydata.org/.

  5. https://gephi.org/; https://d3js.org/.

  6. http://dublincore.org/.

  7. Although one might ask whether humans do not err as well.

  8. For the sake of readability, we decided to publish the full source as a documented Jupyter (http://jupyter.org/) notebook supplementing this paper, which keeps the discussion of algorithmic parameters to a minimum.

  9. For instance, the RAK main manual, the German counterpart to the Anglo-American Cataloguing Rules, is a 627-page document.

  10. Because of the unavailability of ground truths similar to our corpus, the limited amount of data in the prototype, and the non-destructive extension of the metadata records, we decided against a full automation of the evaluation. However, the resulting clusters are stable enough to be checked against common authority files.

  11. The normalization, which is optional in principle, was carried out primarily to offer researchers a homogeneous image data set.

  12. http://opencv.org/.

  13. http://sbb.berlin/sbbrowse.

Author information

Corresponding author

Correspondence to David Zellhöfer.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Zellhöfer, D. (2016). Exploring Large Digital Libraries by Multimodal Criteria. In: Fuhr, N., Kovács, L., Risse, T., Nejdl, W. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2016. Lecture Notes in Computer Science, vol 9819. Springer, Cham. https://doi.org/10.1007/978-3-319-43997-6_24

  • DOI: https://doi.org/10.1007/978-3-319-43997-6_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43996-9

  • Online ISBN: 978-3-319-43997-6

  • eBook Packages: Computer Science, Computer Science (R0)
