Abstract
Data wrangling, the tedious process of cleaning, transforming and combining data so that it is ready for modelling, visualisation or aggregation, usually demands enormous effort. Data transformation and formatting is a common wrangling task that humans perform in two steps: (1) they recognise the specific domain of the data (dates, phones, addresses, etc.) and (2) they apply conversions that are specific to that domain. However, the mechanisms for manipulating one domain can be unique and differ greatly from those of other domains. In this paper we present BK-ADAPT, a system that uses inductive programming (IP) with dynamic background knowledge (BK) generated by a machine learning meta-model, which selects the domain and/or the primitives from several descriptive features of the data wrangling problem. To show the performance of our method, we have created a web-based tool in which users provide a set of inputs and one or more examples of outputs, and the tool automatically transforms the remaining inputs.
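The two-step idea described in the abstract (pick a domain, then search only over that domain's primitives for a program that fits the user's examples) can be illustrated with a toy sketch. This is not the BK-ADAPT implementation: the rule-based `detect_domain` is a hypothetical stand-in for the learned meta-model, the `BK` table and its primitive names are invented for illustration, and the brute-force enumeration in `induce` is a simplification of an inductive programming engine.

```python
import re
from itertools import product


def detect_domain(values):
    """Rule-based stand-in (assumption) for the learned meta-model:
    picks a domain from simple surface features of the inputs."""
    if all("@" in v for v in values):
        return "email"
    if all(re.fullmatch(r"\d{1,2}[-/.]\d{1,2}[-/.]\d{2,4}", v) for v in values):
        return "date"
    return "generic"


# Hypothetical background knowledge: one small set of primitives per domain.
BK = {
    "date": {
        "day": lambda s: re.split(r"[-/.]", s)[0],
        "month": lambda s: re.split(r"[-/.]", s)[1],
        "year": lambda s: re.split(r"[-/.]", s)[2],
        "last2": lambda s: s[-2:],
    },
    "email": {
        "user": lambda s: s.split("@")[0],
        "host": lambda s: s.split("@")[1],
        "upper": lambda s: s.upper(),
    },
    "generic": {
        "upper": lambda s: s.upper(),
        "lower": lambda s: s.lower(),
    },
}


def run(program, primitives, value):
    """Apply a sequence of primitive names left to right; None if a step fails."""
    try:
        for name in program:
            value = primitives[name](value)
        return value
    except (IndexError, ValueError):
        return None


def induce(examples, primitives, max_len=2):
    """Return the shortest primitive sequence consistent with all examples."""
    for length in range(1, max_len + 1):
        for program in product(primitives, repeat=length):
            if all(run(program, primitives, i) == o for i, o in examples):
                return program
    return None


def transform(inputs, examples):
    """Select the domain's primitives, induce a program, apply it to every input."""
    domain = detect_domain(inputs)
    primitives = BK[domain]
    program = induce(examples, primitives)
    if program is None:
        return domain, None, []
    return domain, program, [run(program, primitives, v) for v in inputs]


if __name__ == "__main__":
    dates = ["25/03/2019", "01-12-2018", "07.06.2020"]
    print(transform(dates, [("25/03/2019", "03")]))
    emails = ["ana@upv.es", "bob@example.org"]
    print(transform(emails, [("ana@upv.es", "UPV.ES")]))
```

With a single output example per column, the sketch finds the one-step program `("month",)` for the dates and the two-step program `("host", "upper")` for the email addresses. Restricting the search to the selected domain's primitives keeps the candidate space small, which is the motivation for making the background knowledge dynamic.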
This research was supported by the EU (FEDER) and the Spanish MINECO RTI2018-094403-B-C32, and the Generalitat Valenciana PROMETEO/2019/098. L. Contreras-Ochando was also supported by the Spanish MECD (FPU15/03219). J. Hernández-Orallo is also funded by FLI (RFP2-152). F. Martínez-Plumed was also supported by INCIBE, the European Commission (JRC) HUMAINT project (CT-EX2018D335821-101), and UPV (PAID-06-18).
Notes
1. Wrangle language: https://docs.trifacta.com/display/SS/Wrangle+Language.
2. The complete description of the approach can be found in [1].
3. The code is available at: https://github.com/liconoc/DataWrangling-DSI.
4. A demo can be seen at: https://www.youtube.com/watch?v=wxFhXYyonOw.
References
1. Contreras-Ochando, L., Ferri, C., Hernández-Orallo, J., Martínez-Plumed, F., Ramírez-Quintana, M.J., Katayama, S.: Automated data transformation with inductive programming and dynamic background knowledge. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2019 (2019, to appear)
2. Gulwani, S.: Automating string processing in spreadsheets using input-output examples. In: Proceedings of 38th Principles of Programming Languages, pp. 317–330 (2011)
3. Gulwani, S., Hernandez-Orallo, J., Kitzelmann, E., Muggleton, S.H., Schmid, U., Zorn, B.: Inductive programming meets the real world. Commun. ACM 58(11), 90–99 (2015)
4. Kandel, S., Paepcke, A., Hellerstein, J., Heer, J.: Wrangler: interactive visual specification of data transformation scripts. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 3363–3372. ACM (2011)
5. Katayama, S.: An analytical inductive functional programming system that avoids unintended programs. In: Proceedings of PEPM, pp. 43–52. ACM (2012)
6. Shu, C., Zhang, H.: Neural programming by example. In: AAAI, pp. 1539–1545 (2017)
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Contreras-Ochando, L., Ferri, C., Hernández-Orallo, J., Martínez-Plumed, F., Ramírez-Quintana, M.J., Katayama, S. (2020). BK-ADAPT: Dynamic Background Knowledge for Automating Data Transformation. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, vol 11908. Springer, Cham. https://doi.org/10.1007/978-3-030-46133-1_45
DOI: https://doi.org/10.1007/978-3-030-46133-1_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46132-4
Online ISBN: 978-3-030-46133-1
eBook Packages: Computer Science, Computer Science (R0)