Particulate Matter Matters—The Data Science Challenge @ BTW 2019

Abstract

The Data Science Challenge took place for the second time, as part of the 18th symposium “Database Systems for Business, Technology and Web” (BTW) of the Gesellschaft für Informatik (GI). The Challenge was organized by the University of Rostock and sponsored by IBM and SAP. This year, the Challenge focused on the integration, analysis, and visualization of data on particulate matter pollution. After a preselection round, the accepted participants had one month to adapt their approaches to a concrete problem, the actual challenge. The final presentations took place at BTW 2019 in front of the prize jury and the attending audience. In this article, we give a brief overview of the schedule and organization of the Data Science Challenge. In addition, the participants present the problem to be solved and their solutions.

Acknowledgements

The organizers of the Data Science Challenge would like to take this opportunity to thank the participants and jury members for their contributions, especially Ute Schuerfeld and Stefan Goers for their valuable support throughout the whole process. In addition, we would like to thank IBM and SAP for sponsoring the Challenge.

The TU Dresden would like to thank Elke Sähn from the Fraunhofer-Institut für Verkehrs- und Infrastruktursysteme IVI for her substantial input as its domain expert.

The TU Berlin would like to acknowledge the German Federal Ministry of Transport and Digital Infrastructure (BMVI), which funded the project DAYSTREAM under grant number 19F2031D in the context of the research initiative mFUND. Some of the tools and techniques used in this research are based on or inspired by that project. The work is also supported by the BZML under grant number 01IS18037A, BBDC 2 under grant number 01IS18025A, ECDF, and the HEIBRiDS graduate school.

Author information

Corresponding author

Correspondence to Holger J. Meyer.

About this article

Cite this article

Meyer, H.J., Grunert, H., Waizenegger, T. et al. Particulate Matter Matters—The Data Science Challenge @ BTW 2019. Datenbank Spektrum 19, 165–182 (2019). https://doi.org/10.1007/s13222-019-00322-x

Keywords

  • BTW 2019
  • Data Science Challenge
  • Big Data Analytics
  • Particulate matter
  • Driving bans