PRESISTANT: Data Pre-processing Assistant

  • Besim Bilalli
  • Alberto Abelló
  • Tomàs Aluja-Banet
  • Rana Faisal Munir
  • Robert Wrembel
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 317)

Abstract

A given classification algorithm may perform differently on datasets with different characteristics; for example, it might perform better on a dataset with continuous attributes than on one with categorical attributes, or the other way around. Typically, datasets need to be pre-processed in order to improve the results. Given all the possible pre-processing operators, the number of alternatives is staggeringly large, and inexperienced users become overwhelmed. Trial and error is not feasible in the presence of large amounts of data. We developed a method and tool, PRESISTANT, to answer the need for user assistance during data pre-processing. Leveraging ideas from meta-learning, PRESISTANT assists the user by recommending pre-processing operators that ultimately improve classification performance. The user selects a classification algorithm from the ones considered, and PRESISTANT then proposes candidate transformations to improve the result of the analysis. In the demonstration, participants will experience first-hand how PRESISTANT easily and effectively ranks the pre-processing operators.
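
The recommendation idea in the abstract can be illustrated with a rough sketch. This is not PRESISTANT's implementation: the two meta-features, the toy operators `discretize` and `impute`, the nearest-neighbour meta-learner, and the numbers in `META_EXAMPLES` are all assumptions invented for illustration. The sketch applies each candidate operator, describes the transformed dataset by its meta-features, predicts the impact on classification accuracy from previously seen cases, and ranks the operators by that prediction.

```python
from collections import Counter
from math import dist

# Hypothetical meta-dataset (invented numbers, for illustration only).
# Each entry: (meta-features of a transformed dataset, change in accuracy).
# Meta-features: (fraction of continuous values, fraction of missing values).
META_EXAMPLES = [
    ((1.0, 0.0), +0.04),
    ((0.0, 0.0), -0.02),
    ((0.5, 0.3), -0.05),
    ((0.5, 0.0), +0.01),
]


def meta_features(rows):
    """Compute two toy meta-features for a list of equal-length rows."""
    cells = [value for row in rows for value in row]
    present = [value for value in cells if value is not None]
    continuous = sum(isinstance(value, float) for value in present)
    frac_continuous = continuous / len(present) if present else 0.0
    frac_missing = 1 - len(present) / len(cells)
    return (frac_continuous, frac_missing)


def discretize(rows):
    """Toy operator: replace each float with a coarse bin label."""
    return [
        [f"bin{int(v)}" if isinstance(v, float) else v for v in row]
        for row in rows
    ]


def impute(rows):
    """Toy operator: fill missing values with the column's most frequent value."""
    fills = []
    for column in zip(*rows):
        present = [v for v in column if v is not None]
        fills.append(Counter(present).most_common(1)[0][0] if present else None)
    return [
        [fills[i] if v is None else v for i, v in enumerate(row)]
        for row in rows
    ]


def predict_impact(features):
    """1-nearest-neighbour stand-in for a trained meta-model."""
    nearest = min(META_EXAMPLES, key=lambda example: dist(example[0], features))
    return nearest[1]


def rank_operators(rows, operators):
    """Rank operators by the predicted accuracy change after applying them."""
    scored = [
        (name, predict_impact(meta_features(operator(rows))))
        for name, operator in operators.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


OPERATORS = {"discretize": discretize, "impute": impute}

if __name__ == "__main__":
    sample = [[1.5, "a"], [2.5, None], [3.5, "b"], [0.5, "a"]]
    for name, predicted in rank_operators(sample, OPERATORS):
        print(f"{name}: predicted accuracy change {predicted:+.2f}")
```

In a realistic setting the meta-dataset would be built from many datasets and the chosen classification algorithm, and the nearest-neighbour lookup would be replaced by a properly trained meta-model; the sketch only shows the shape of the ranking step.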

Keywords

Data pre-processing, Meta-learning, Data mining

Notes

Acknowledgments

This research has been funded by the European Commission through the Erasmus Mundus Joint Doctorate “Information Technologies for Business Intelligence - Doctoral College” (IT4BI-DC).

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Besim Bilalli (1, 2; corresponding author)
  • Alberto Abelló (1)
  • Tomàs Aluja-Banet (1)
  • Rana Faisal Munir (1)
  • Robert Wrembel (2)

  1. Universitat Politècnica de Catalunya, Barcelona, Spain
  2. Poznan University of Technology, Poznan, Poland
