CoBITs: a distributed indexing approach to collaborative content-based multimedia retrieval across digital archives

Multimedia Tools and Applications

Abstract

A growing amount of valuable content, especially cultural heritage material, is being digitized and stored in digital archives, and digitization and archiving can require considerable effort. A digital archiving system must address several issues. First, each individual archive usually needs its own computation and storage resources to maintain its service. Second, archived content would be more useful if it could easily be used in services such as search across multiple archives. Current approaches usually adopt metadata harvesting, which builds a centralized index from separate digital libraries, and they usually suffer from metadata inconsistency. In this paper, we propose a distributed indexing approach to collaborative content-based multimedia retrieval across digital archives. To reduce the load on each archive, we dynamically distribute the tasks of crawling, indexing, and query processing according to response time. A distributed crawler-based approach can simplify the design of the indexing and query processing steps by keeping the data to be indexed local to the machine that crawls it. It can facilitate efficient archiving and indexing by automatically following the link structure of content published on the Web. It also enables simpler implementation and easier support for cross-archive applications such as search and copy detection. Experimental results show the potential of the proposed approach for load balancing with appropriate task distribution.
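The abstract does not say how tasks are distributed beyond "depending on the response time", so the following is only an illustrative sketch of that general idea, not the authors' algorithm. It assumes each archive node keeps a smoothed estimate of its response time and that a new task goes to the node with the lowest estimate. The names `Node`, `record`, and `assign`, the smoothing factor, the starting value of 100 ms, and the example URL are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical archive node that can take crawl, index, or query tasks."""
    name: str
    avg_response_ms: float = 100.0  # smoothed response time; starting value is arbitrary
    assigned: list = field(default_factory=list)

    def record(self, observed_ms: float, alpha: float = 0.5) -> None:
        # Exponentially weighted moving average of observed response times
        # (an assumed smoothing rule, not taken from the paper).
        self.avg_response_ms = alpha * observed_ms + (1 - alpha) * self.avg_response_ms


def assign(task: str, nodes: list) -> Node:
    """Give the task to the node with the lowest smoothed response time."""
    target = min(nodes, key=lambda n: n.avg_response_ms)
    target.assigned.append(task)
    return target


nodes = [Node("archive-a"), Node("archive-b"), Node("archive-c")]
nodes[0].record(400.0)  # archive-a answered slowly
nodes[1].record(50.0)   # archive-b answered quickly
chosen = assign("crawl:http://example.org/", nodes)
print(chosen.name)
```

In this sketch a slow node receives fewer new tasks until its measured response time recovers, which is one simple way to read "load balancing with appropriate task distribution".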

Notes

  1. http://www.ndap.org.tw/, the homepage of the first phase of the digital archives project in Taiwan. The second-phase projects are available at http://www.teldap.tw/

  2. http://catalog.digitalarchives.tw/

  3. http://www.archive.org/

  4. http://www.netpreserve.org/

  5. http://crawler.archive.org/

  6. http://lucene.apache.org/nutch/

  7. http://www.petitcolas.net/fabien/watermarking/stirmark/

Acknowledgment

We thank the National Science Council, Taiwan, for its support under grant number NSC101-2219-E-027-005.

Author information

Corresponding author

Correspondence to Jenq-Haur Wang.

About this article

Cite this article

Wang, JH., Chang, HC. CoBITs: a distributed indexing approach to collaborative content-based multimedia retrieval across digital archives. Multimed Tools Appl 74, 2639–2658 (2015). https://doi.org/10.1007/s11042-013-1461-5

  • DOI: https://doi.org/10.1007/s11042-013-1461-5
