Abstract
Modern scientific repositories are growing rapidly in size. Scientists are increasingly interested in viewing the latest data as part of query results. Current scientific middleware cache systems, however, assume repositories are static. Thus, they cannot answer scientific queries with the latest data. The queries, instead, are routed to the repository until data at the cache is refreshed. In data-intensive scientific disciplines, such as astronomy, indiscriminate query routing or data refreshing often results in runaway network costs. This severely affects the performance and scalability of the repositories and makes poor use of the cache system. We present Delta, a dynamic data middleware cache system for rapidly-growing scientific repositories. Delta’s key component is a decision framework that adaptively decouples data objects: it keeps some data objects at the cache when they are heavily queried, and keeps others at the repository when they are heavily updated. Our algorithm profiles the incoming workload to search for an optimal data decoupling that reduces network costs. It leverages formal concepts from the network flow problem and is robust to evolving scientific workloads. We evaluate the efficacy of Delta, through a prototype implementation, by running query traces collected from a real astronomy survey.
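To make the decoupling idea concrete, the following is a minimal sketch of one way such a decision could be posed as a network flow problem. It is an illustration under stated assumptions, not the algorithm used in Delta: the abstract only says that the framework draws on network flow concepts. The sketch assumes that each query has a cost for being routed to the repository, each update has a cost for being shipped to the cache, and every (query, update) pair that touches the same data object must be resolved by paying one of the two. Under that assumption, the cheapest choice is a minimum-weight vertex cover of a bipartite graph, which a minimum cut computes. The function name `min_cost_decoupling`, its parameters, and the cost values are all hypothetical.

```python
from collections import deque


def min_cost_decoupling(query_cost, update_cost, conflicts):
    """Hypothetical sketch: choose which queries to route to the repository
    and which updates to ship to the cache so that every conflicting
    (query, update) pair is resolved at minimum total network cost.

    Returns (total_cost, routed_queries, shipped_updates).
    """
    inf = float("inf")
    source, sink = "s", "t"
    cap = {}  # residual capacities, cap[u][v]

    def add_edge(u, v, c):
        cap.setdefault(u, {})[v] = cap.get(u, {}).get(v, 0) + c
        cap.setdefault(v, {}).setdefault(u, 0)  # reverse edge

    # Cutting source->query means "route this query to the repository".
    for q, c in query_cost.items():
        add_edge(source, ("q", q), c)
    # Cutting update->sink means "ship this update to the cache".
    for u, c in update_cost.items():
        add_edge(("u", u), sink, c)
    # Conflict edges can never be cut, so one endpoint must be paid for.
    for q, u in conflicts:
        add_edge(("q", q), ("u", u), inf)

    def reachable_from_source():
        parent = {source: None}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt, c in cap.get(node, {}).items():
                if c > 0 and nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        return parent

    # Edmonds-Karp: repeatedly augment along shortest residual paths.
    total = 0
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            break
        path, node = [], sink
        while parent[node] is not None:
            path.append((parent[node], node))
            node = parent[node]
        push = min(cap[a][b] for a, b in path)
        for a, b in path:
            cap[a][b] -= push
            cap[b][a] += push
        total += push

    # Read the cut off the final residual graph.
    routed = sorted(q for q in query_cost if ("q", q) not in parent)
    shipped = sorted(u for u in update_cost if ("u", u) in parent)
    return total, routed, shipped


# Illustrative costs in arbitrary units (for example, megabytes transferred).
result = min_cost_decoupling(
    query_cost={"q1": 10, "q2": 3},
    update_cost={"u1": 4, "u2": 8},
    conflicts=[("q1", "u1"), ("q2", "u1"), ("q2", "u2")],
)
print(result)
```

In this toy workload the sketch ships the cheap update `u1` to the cache and routes the cheap query `q2` to the repository, for a total cost of 7, rather than refreshing everything (12) or routing every query (13).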
Copyright information
© 2010 IFIP International Federation for Information Processing
About this paper
Cite this paper
Malik, T., Wang, X., Little, P., Chaudhary, A., Thakar, A. (2010). A Dynamic Data Middleware Cache for Rapidly-Growing Scientific Repositories. In: Gupta, I., Mascolo, C. (eds) Middleware 2010. Lecture Notes in Computer Science, vol 6452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16955-7_4
DOI: https://doi.org/10.1007/978-3-642-16955-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16954-0
Online ISBN: 978-3-642-16955-7
eBook Packages: Computer Science (R0)