Abstract
This study gives an overview of the basic-needs deprivations that Indian households face and assesses how well supervised learning models classify household poverty in India. The data come from the fourth round of the nationally representative National Family and Health Survey (NFHS), conducted in 2015–2016. Household poverty is measured using the Unsatisfied Basic Needs (UBN) approach. The dimensions of poverty considered are overcrowding, electricity, water sources, toilet facilities, school attendance, and subsistence capacity. The study also compares five well-known supervised learning algorithms for classifying household poverty: Logistic Regression, Decision Tree, Random Forest, Neural Network, and Naïve Bayes. The results show that house overcrowding (48.56%) is the most dominant dimension of poverty in Indian households, followed by lack of toilet facilities (42.3%) and housing (41.7%). Every dimension still shows a disparity between urban and rural areas, with deprivation more prominent in rural areas. All five algorithms performed well; Random Forest had the highest accuracy (81.01%) and Naïve Bayes the lowest (78.27%). Because overcrowding was the most prominent dimension of poverty, investment in appropriate housing that includes toilets and other basic needs should be prioritized. Among the supervised algorithms, Random Forest may be recommended for assessing the poverty status of Indian households.
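As a rough illustration of the UBN idea summarized above, the following Python sketch counts deprivations per household, derives a poor/non-poor label, and computes the share of households deprived in each dimension along with a classifier's accuracy. It is only a sketch under stated assumptions: the field names, the `threshold=1` cut-off (poor if deprived in at least one dimension), and the toy households are hypothetical and are not taken from the study, which does not state its cut-off in the abstract.

```python
# The six dimensions named in the abstract. The key names are illustrative only.
DIMENSIONS = (
    "overcrowding",
    "electricity",
    "water_source",
    "toilet_facility",
    "school_attendance",
    "subsistence_capacity",
)


def ubn_count(household):
    """Count the dimensions in which a household is deprived (True = deprived)."""
    return sum(bool(household.get(d, False)) for d in DIMENSIONS)


def is_poor(household, threshold=1):
    """Label a household poor if it is deprived in at least `threshold` dimensions.

    Assumption: threshold=1 is a common UBN convention, not the paper's stated rule.
    """
    return ubn_count(household) >= threshold


def deprivation_rates(households):
    """Percentage of households deprived in each dimension."""
    n = len(households)
    return {
        d: 100.0 * sum(bool(h.get(d, False)) for h in households) / n
        for d in DIMENSIONS
    }


def accuracy(actual, predicted):
    """Share of labels that a classifier predicted correctly."""
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Toy households, invented purely for illustration (not NFHS data).
households = [
    {"overcrowding": True, "toilet_facility": True},
    {"electricity": True},
    {},
    {"overcrowding": True},
]

labels = [is_poor(h) for h in households]
rates = deprivation_rates(households)

# Hypothetical predictions from some classifier, compared against the UBN labels.
predicted = [True, False, False, True]
score = accuracy(labels, predicted)
```

In a comparison like the one the abstract describes, `labels` would play the role of the target variable and `accuracy` would be computed on held-out households for each of the five algorithms.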
Data availability
The study uses secondary data that are freely available in the public domain through http://iipsinidia.org. The necessary ethical approval was obtained by the organizations involved in the data collection process.
Acknowledgements
Not applicable.
Funding
The authors did not receive any funding to carry out this research.
Author information
Contributions
Conceptualization, SA and MD; methodology, SA; software, SA; data analysis, SA; supervision, MD; writing-original draft preparation, SA; writing-review and editing, MD. Both authors have read and agreed to the published version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Ansari, S., Dhar, M. Poverty classification based on unsatisfied basic needs index: a comparison of supervised learning algorithms. SN Soc Sci 2, 69 (2022). https://doi.org/10.1007/s43545-022-00375-y
DOI: https://doi.org/10.1007/s43545-022-00375-y