Skip to main content

BK-ADAPT: Dynamic Background Knowledge for Automating Data Transformation

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2019)

Abstract

An enormous effort is usually devoted to data wrangling, the tedious process of cleaning, transforming and combining data, such that it is ready for modelling, visualisation or aggregation. Data transformation and formatting is one common task in data wrangling, which is performed by humans in two steps: (1) they recognise the specific domain of data (dates, phones, addresses, etc.) and (2) they apply conversions that are specific to that domain. However, the mechanisms to manipulate one specific domain can be unique and highly different from other domains. In this paper we present BK-ADAPT, a system that uses inductive programming (IP) with a dynamic background knowledge (BK) generated by a machine learning meta-model that selects the domain and/or the primitives from several descriptive features of the data wrangling problem. To show the performance of our method, we have created a web-based tool that allows users to provide a set of inputs and one or more examples of outputs, in such a way that the rest of examples are automatically transformed by the tool.

This research was supported by the EU (FEDER) and the Spanish MINECO RTI2018-094403-B-C32, and the Generalitat Valenciana PROMETEO/2019/098. L. Contreras-Ochando was also supported by the Spanish MECD (FPU15/03219). J. Hernández-Orallo is also funded by FLI (RFP2-152). F. Martínez-Plumed was also supported by INCIBE, the European Commission (JRC) HUMAINT project (CT-EX2018D335821-101), and UPV (PAID-06-18).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Wrangle language: https://docs.trifacta.com/display/SS/Wrangle+Language.

  2. 2.

    The complete description of the approach can be found at: [1].

  3. 3.

    The code is available at: https://github.com/liconoc/DataWrangling-DSI.

  4. 4.

    A demo can be seen on: https://www.youtube.com/watch?v=wxFhXYyonOw.

References

  1. Contreras-Ochando, L., Ferri, C., Hernández-Orallo, J., Martínez-Plumed, F., Ramírez-Quintana, M.J., Katayama, S.: Automated data transformation with inductive programming and dynamic background knowledge. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2019 (2019, to appear)

    Google Scholar 

  2. Gulwani, S.: Automating string processing in spreadsheets using input-output examples. In: Proceedings of 38th Principles of Programming Languages, pp. 317–330 (2011)

    Google Scholar 

  3. Gulwani, S., Hernandez-Orallo, J., Kitzelmann, E., Muggleton, S.H., Schmid, U., Zorn, B.: Inductive programming meets the real world. Commun. ACM 58(11), 90–99 (2015)

    Article  Google Scholar 

  4. Kandel, S., Paepcke, A., Hellerstein, J., Heer, J.: Wrangler: interactive visual specification of data transformation scripts. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 3363–3372. ACM (2011)

    Google Scholar 

  5. Katayama, S.: An analytical inductive functional programming system that avoids unintended programs. In: Proceedings of PEPM, pp. 43–52. ACM (2012)

    Google Scholar 

  6. Shu, C., Zhang, H.: Neural programming by example. In: AAAI, pp. 1539–1545 (2017)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Lidia Contreras-Ochando .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Contreras-Ochando, L., Ferri, C., Hernández-Orallo, J., Martínez-Plumed, F., Ramírez-Quintana, M.J., Katayama, S. (2020). BK-ADAPT: Dynamic Background Knowledge for Automating Data Transformation. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science(), vol 11908. Springer, Cham. https://doi.org/10.1007/978-3-030-46133-1_45

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-46133-1_45

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46132-4

  • Online ISBN: 978-3-030-46133-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics