Abstract
Students typically lack practical tools to help them choose the universities to which they should apply. This work proposes a comprehensive analytics framework as a decision support tool that assists students in the admission process. As an essential element of the framework, a prediction procedure is developed to accurately estimate the student's chance of admission to each university using various machine learning methods; it is concluded that random forest combined with kernel principal component analysis outperforms the other prediction models. In addition, an online survey is built to elicit the student's utility regarding each university. A mathematical programming model is then proposed to determine the best universities to apply to among the candidates, considering practical limitations, the most important of which is the student's budget. The model is also extended to consider multiple objectives in making decisions. Finally, a case study demonstrates the practicality of the developed decision support tool.
References
Abbas AE (2010) Constructing multiattribute utility functions for decision analysis. In: Risk and optimization in an uncertain world. INFORMS, pp 62–98
Achabal DD, McIntyre SH, Smith SA, Kalyanam K (2000) A decision support system for vendor managed inventory. J Retail 76(4):430–454
Acharya MS, Armaan A, Antony AS (2019) A comparison of regression models for prediction of graduate admissions. In: 2019 International conference on computational intelligence in data science (ICCIDS). IEEE, pp 1–5
Adekitan AI, Noma-Osaghae E (2019) Data mining approach to predicting the performance of first year student in a university using the admission requirements. Educ Inf Technol 24(2):1527–1543
Albawi S, Mohammed TA, Al-Zawi S (2017) Understanding of a convolutional neural network. In: 2017 International conference on engineering and technology. IEEE, pp 1–6
Asif R, Merceron A, Ali SA, Haider NG (2017) Analyzing undergraduate students’ performance using educational data mining. Comput Educ 113:177–194
Audet C, Hare W (2017) Biobjective optimization. In: Derivative-free and blackbox optimization. Springer, New York, pp 247–262
Baucells M, Sarin RK (2003) Group decisions with multiple criteria. Manage Sci 49(8):1105–1118
Belloni A, Lovett MJ, Boulding W, Staelin R (2012) Optimal admission and scholarship decisions: choosing customized marketing offers to attract a desirable mix of customers. Mark Sci 31(4):621–636
Board S (2009) Preferences and utility. UCLA, Los Angeles
Chui KT, Fung DCL, Lytras MD, Lam TM (2020) Predicting at-risk university students in a virtual learning environment via a machine learning algorithm. Comput Hum Behav 107:105584
Ding L (2019) Theoretical perspectives of quantitative physics education research. Phys Rev Phys Educ Res 15(2):020101
Dumitrescu E, Hue S, Hurlin C, Tokpavi S (2022) Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects. Eur J Oper Res 297(3):1178–1192
Egorow O, Siegert I, Wendemuth A (2018) Improving emotion recognition performance by random-forest-based feature selection. In: International conference on speech and computer. Springer, Berlin, pp 134–144
Esteban A, Zafra A, Romero C (2020) Helping university students to choose elective courses by using a hybrid multi-criteria recommendation system with genetic optimization. Knowl Based Syst 194:105385
Ghai B (2015) Analysis & prediction of american graduate admissions process. Stony Brook University, Department of Computer Science
Gharroudi O, Elghazel H, Aussem A (2014) A comparison of multi-label feature selection methods using the random forest paradigm. In: Canadian conference on artificial intelligence. Springer, New York, pp 95–106
Ghodsypour SH, O’Brien C (1998) A decision support system for supplier selection using an integrated analytic hierarchy process and linear programming. Int J Prod Econ 56:199–212
Gray CC, Perkins D (2019) Utilizing early engagement and machine learning to predict student outcomes. Comput Educ 131:22–32
Gupta N, Sawhney A, Roth D (2016) Will I get in? Modeling the graduate admission process for American universities. In: 2016 IEEE 16th international conference on data mining workshops (ICDMW). IEEE, pp 631–638
Helal S, Li J, Liu L, Ebrahimie E, Dawson S, Murray DJ, Long Q (2018) Predicting academic performance by considering student heterogeneity. Knowl-Based Syst 161:134–146
Hoffait A-S, Schyns M (2017) Early detection of university students with potential difficulties. Decis Support Syst 101:1–11
Hussain M, Zhu W, Zhang W, Abidi SMR, Ali S (2019) Using machine learning to predict student difficulties from learning session data. Artif Intell Rev 52(1):381–407
Injadat M, Moubayed A, Nassif AB, Shami A (2020) Systematic ensemble model selection approach for educational data mining. Knowl Based Syst 200:105992
Jansen SJ (2011) The multi-attribute utility method. In: Jansen SJT et al (eds) The measurement and analysis of housing preference and choice. Springer, New York, pp 101–125
Kaur P, Gosain A (2020) Robust hybrid data-level sampling approach to handle imbalanced data during classification. Soft Comput 24(20):15715–15732
Kim D, Kim N, Cho J, Shin H (2019) Optimizing the multistage university admission decision process. INFORMS J Appl Anal 49(6):422–429
Kotsiantis SB (2012) Use of machine learning techniques for educational proposes: a decision support system for forecasting students’ grades. Artif Intell Rev 37(4):331–344
Kutner MH, Nachtsheim CJ, Neter J, Li W (2005) Applied linear statistical models, 5th edn. McGraw-Hill/Irwin, New York
Li S, Harner EJ, Adjeroh DA (2011) Random KNN feature selection-a fast and stable alternative to Random Forests. BMC Bioinformatics 12(1):1–11
Lykourentzou I, Giannoukos I, Nikolopoulos V, Mpardis G, Loumos V (2009) Dropout prediction in e-learning courses through the combination of machine learning techniques. Comput Educ 53(3):950–965
Maldonado S, Armelini G, Guevara CA (2017) Assessing university enrollment and admission efforts via hierarchical classification and feature selection. Intelligent Data Analysis 21(4):945–962
Maltz EN, Murphy KE, Hand ML (2007) Decision support for university enrollment management: Implementation and experience. Decis Support Syst 44(1):106–123
Mansmann S, Scholl MH (2007) Decision support system for managing educational capacity utilization. IEEE Trans Educ 50(2):143–150
Mengash HA (2020) Using data mining techniques to predict student performance to support decision making in university admission systems. IEEE Access 8:55462–55470
Moore JS (1998) An expert system approach to graduate school admission decisions and academic performance prediction. Omega 26(5):659–670
Moxnes E (2004) Estimating customer utility of energy efficiency standards for refrigerators. J Econ Psychol 25(6):707–724
Ngai EW, Wat F (2005) Fuzzy decision support system for risk analysis in e-commerce development. Decis Support Syst 40(2):235–255
Nissen J, Donatello R, Van Dusen B (2019) Missing data and bias in physics education research: a case for using multiple imputation. Phys Rev Phys Educ Res 15(2):020106
Partridge M, Calvo RA (1998) Fast dimensionality reduction and simple PCA. Intell Data Anal 2(3):203–214
Picard RR, Cook RD (1984) Cross-validation of regression models. J Am Stat Assoc 79(387):575–583
Probst P, Wright MN, Boulesteix AL (2019) Hyperparameters and tuning strategies for random forest. Data Min Knowl Discov 9(3):e1301
Ragab AHM, Mashat AFS, Khedra AM (2012) HRSPCA: hybrid recommender system for predicting college admission. In: 2012 12th International conference on intelligent systems design and applications (ISDA). IEEE, pp 107–113
Raschka S (2018) Model evaluation, model selection, and algorithm selection in machine learning. https://arxiv.org/abs/1811.12808
Rodriguez JD, Perez A, Lozano JA (2009) Sensitivity analysis of k-fold cross validation in prediction error estimation. IEEE Trans Pattern Anal Mach Intell 32(3):569–575
Rutkowski L, Jaworski M, Pietruczuk L, Duda P (2014) The CART decision tree for mining data streams. Inf Sci 266:1–15
Speiser JL, Miller ME, Tooze J, Ip E (2019) A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst Appl 134:93–101
Springuel RP, Wittmann MC, Thompson JR (2019) Reconsidering the encoding of data in physics education research. Phys Rev Phys Educ Res 15(2):020103
Stone M (1978) Cross-validation: a review. Statistics 9(1):127–139
Van Dusen B, Nissen J (2019) Modernizing use of regression models in physics education research: a review of hierarchical linear modeling. Phys Rev Phys Educ Res 15(2):020108
Walczak S, Sincich T (1999) A comparative analysis of regression and neural networks for university admissions. Inf Sci 119(1–2):1–20
Waters A, Miikkulainen R (2014) Grade: machine learning support for graduate admissions. AI Mag 35(1):64–64
Wu H, Lin A, Xing X, Song D, Li Y (2021) Identifying core driving factors of urban land use change from global land cover products and POI data using the random forest method. Int J Appl Earth Observ Geoinform 103:102475
Young NT, Caballero MD (2019) Using machine learning to understand physics graduate school admissions. https://arxiv.org/abs/1907.01570
Appendix A: Details on preprocessing
As noted in Sect. 4.1, the dataset used in this framework is clean and contains no missing values; but what if this were not the case? When a dataset does contain missing data, there are several ways to handle the problem, including deletion methods that eliminate the affected records, replacing each missing value with the mean of its column, predicting the missing values, and others. To assess the impact of missing data on the prediction models, a new dataset is created from the original one by intentionally removing 10% of the values. The first two methods above, i.e., replacing each missing value with its column mean and deleting records with missing data, are then used to handle the missing values. The R-square values of the prediction models on the new dataset are reported in Table 11.
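The two strategies above can be sketched as follows. The toy data and column names here are illustrative stand-ins, not the paper's actual admissions dataset:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy stand-in for the admissions dataset (hypothetical columns).
df = pd.DataFrame({
    "gre_score": rng.integers(290, 340, size=100).astype(float),
    "cgpa": rng.uniform(6.5, 10.0, size=100),
})

# Intentionally blank out roughly 10% of the values, as in the appendix.
mask = rng.random(df.shape) < 0.10
df_missing = df.mask(mask)

# Strategy 1: mean imputation -- replace each missing value
# with the mean of its column.
df_imputed = df_missing.fillna(df_missing.mean())

# Strategy 2: deletion -- drop every row containing a missing value.
df_deleted = df_missing.dropna()
```

Either variant yields a complete dataset on which the prediction models can be retrained and compared.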
Comparing Table 11 with the original Table IV shows that missing data in this dataset does not significantly change the results. Therefore, Random Forest, which retains the best R-square value even in the presence of missing data, remains the selected prediction method in this work. Readers interested in techniques for handling missing data are referred to, e.g., Springuel et al. (2019) and Nissen et al. (2019). After scaling the data, the calculated mean of the chance-of-admission column is 0.61, as seen in Table III, which indicates that the data is nearly balanced and no imbalance handling is needed. But what if the data were imbalanced as well? In that case, there are multiple ways to handle imbalanced data, for instance resampling the training set via under-sampling or over-sampling; refer to Kaur and Gosain (2020) for more detail.
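Random over- and under-sampling can be sketched with plain pandas as follows; the binary labels here are a hypothetical imbalanced example, since the paper's own data is nearly balanced:

```python
import pandas as pd

# Hypothetical imbalanced binary classification data (90 vs 10).
df = pd.DataFrame({"label": [1] * 90 + [0] * 10, "x": range(100)})

counts = df["label"].value_counts()
majority, minority = counts.idxmax(), counts.idxmin()

# Over-sampling: duplicate minority rows (sampling with replacement)
# until both classes have the majority count.
minority_up = df[df["label"] == minority].sample(
    n=counts.max(), replace=True, random_state=0)
oversampled = pd.concat([df[df["label"] == majority], minority_up])

# Under-sampling: randomly discard majority rows until both
# classes have the minority count.
majority_down = df[df["label"] == majority].sample(
    n=counts.min(), random_state=0)
undersampled = pd.concat([majority_down, df[df["label"] == minority]])
```

Over-sampling keeps all the information but risks overfitting duplicated rows, while under-sampling discards data; hybrid schemes such as those surveyed by Kaur and Gosain (2020) trade off between the two.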
Kiaghadi, M., Hoseinpour, P. University admission process: a prescriptive analytics approach. Artif Intell Rev 56, 233–256 (2023). https://doi.org/10.1007/s10462-022-10171-y