Executable schema mappings for statistical data processing
Abstract
Data processing is the core of any statistical information system. Statisticians want to specify transformations and manipulations of data at a high level, in terms of the entities of statistical models. We present a proposal in which a high-level language, EXL, is used for the declarative specification of statistical programs, and programs are translated into executable form for various target systems. The language is based on the theory of schema mappings, in particular those defined by a specific class of tuple-generating dependencies (tgds), which we use to optimize user programs and to facilitate translation to several target systems. The characteristics of this class guarantee good tractability properties and applicability in Big Data settings. A concrete implementation, EXLEngine, has been developed and is currently used at the Bank of Italy.
Keywords
Schema mappings, Statistical data, Scalable data processing, ETL