Abstract
This work analyses flight arrival delays using data mining and four supervised machine learning algorithms: random forest, Support Vector Machine (SVM), Gradient Boosting Classifier (GBC) and k-nearest neighbour (kNN), and compares their performance to identify the best classifier. The training data were collected from the Bureau of Transportation Statistics (BTS), United States Department of Transportation. The data cover all flights operated by American Airlines in 2015 and 2016 that connect the five busiest airports in the United States, located in Atlanta, Los Angeles, Chicago, Dallas/Fort Worth and New York. Each algorithm was used to build a predictive model for individual scheduled flights, and the models were compared on how well they predict whether a given flight will arrive more than 15 min late. The gradient boosting classifier performed best, with a reported predictive performance of 79.7% on the scheduled American Airlines flights, ahead of kNN, SVM and random forest. A GBC-based predictive model could therefore help reduce the large losses that commercial airlines incur from arrival delays of their scheduled flights.
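The comparison described in the abstract can be illustrated with a minimal sketch: each flight receives a binary label (arrived more than 15 min late or not), and competing classifiers are ranked by how often their predictions match that label. This assumes accuracy as the metric, which the abstract does not state explicitly. The function names, the toy delays and the toy predictions are all hypothetical, made up for illustration; they are not BTS data and do not reproduce the reported results.

```python
from typing import Dict, List, Tuple

DELAY_THRESHOLD_MIN = 15  # threshold stated in the abstract


def is_delayed(arrival_delay_min: float) -> int:
    """Return 1 if the flight arrived more than 15 minutes late, else 0."""
    return int(arrival_delay_min > DELAY_THRESHOLD_MIN)


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of flights whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("label lists must be non-empty")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def best_classifier(
    y_true: List[int], predictions: Dict[str, List[int]]
) -> Tuple[str, Dict[str, float]]:
    """Score every classifier's predictions and return the top scorer."""
    scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy arrival delays in minutes (hypothetical values, not BTS data).
toy_delays = [-5, 3, 15, 16, 42, 0, 90, 12]
y_true = [is_delayed(d) for d in toy_delays]

# Hypothetical predictions standing in for the four trained models.
toy_predictions = {
    "random_forest": [0, 0, 0, 1, 0, 0, 1, 1],
    "svm": [0, 0, 0, 0, 0, 0, 1, 0],
    "gbc": [0, 0, 1, 1, 1, 0, 1, 0],
    "knn": [0, 1, 0, 1, 0, 0, 1, 1],
}

best, scores = best_classifier(y_true, toy_predictions)
print(best, scores)
```

In practice the prediction lists would come from models trained on held-out flight records; only the labelling rule (delay greater than 15 min) is taken from the abstract.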
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chakrabarty, N., Kundu, T., Dandapat, S., Sarkar, A., Kole, D.K. (2019). Flight Arrival Delay Prediction Using Gradient Boosting Classifier. In: Abraham, A., Dutta, P., Mandal, J., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 813. Springer, Singapore. https://doi.org/10.1007/978-981-13-1498-8_57
DOI: https://doi.org/10.1007/978-981-13-1498-8_57
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1497-1
Online ISBN: 978-981-13-1498-8
eBook Packages: Intelligent Technologies and Robotics (R0)