Poverty classification based on unsatisfied basic needs index: a comparison of supervised learning algorithms

  • Original Paper
  • SN Social Sciences

Abstract

This study provides an overview of the basic-needs dimensions in which Indian households are deprived and assesses the applicability of supervised learning models for classifying household poverty in India. The data come from the fourth round of the nationally representative National Family Health Survey (NFHS), conducted in 2015–2016. Household poverty is measured with the Unsatisfied Basic Needs (UBN) approach. The dimensions of poverty considered are overcrowding, electricity, water sources, toilet facilities, school attendance, and subsistence capacity. The study also compares five well-known supervised learning algorithms (Logistic Regression, Decision Tree, Random Forest, Neural Network, and Naïve Bayes) for classifying household poverty. Our results show that house overcrowding (48.56%) is the most dominant dimension of poverty in Indian households, followed by lack of toilet facilities (42.3%) and housing (41.7%). Every dimension shows a disparity between urban and rural areas, with deprivation more prominent in rural areas. All of the supervised algorithms performed well, with Random Forest showing the highest accuracy (81.01%) and Naïve Bayes the lowest (78.27%). Because overcrowding was the most prominent dimension of poverty, investment in appropriate housing that includes toilets and other basic needs should be prioritized. Among the supervised algorithms, Random Forest may be recommended for assessing the poverty status of Indian households.
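The workflow summarized above can be illustrated with a minimal Python sketch. It is not the authors' code and does not use NFHS data: the household covariates, the simulated deprivation flags, the cut-off of one unmet need (the abstract does not state the study's threshold), and the use of scikit-learn are all assumptions made for illustration, so the printed accuracies bear no relation to the figures reported in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# The six UBN dimensions named in the abstract (column names are hypothetical).
DIMENSIONS = ["overcrowding", "electricity", "water_source",
              "toilet_facility", "school_attendance", "subsistence_capacity"]


def ubn_label(deprivations, threshold=1):
    """Return 1 (poor) when a household is deprived in at least `threshold`
    dimensions. The default of 1 is an assumed cut-off, not the paper's."""
    return (np.asarray(deprivations).sum(axis=1) >= threshold).astype(int)


def simulate_households(n=2000, seed=0):
    """Generate synthetic covariates and 0/1 deprivation flags (illustrative only)."""
    rng = np.random.default_rng(seed)
    rural = rng.integers(0, 2, n)         # 1 = rural household
    hh_size = rng.integers(1, 10, n)      # number of household members
    head_educ = rng.integers(0, 16, n)    # years of schooling of household head
    X = np.column_stack([rural, hh_size, head_educ]).astype(float)
    # Assumed relationship: rural, larger, less educated households are more deprived.
    logit = -3.0 + 0.8 * rural + 0.15 * hh_size - 0.1 * head_educ
    p = 1.0 / (1.0 + np.exp(-logit))
    deprivations = (rng.random((n, len(DIMENSIONS))) < p[:, None]).astype(int)
    return X, deprivations


def compare_classifiers(X, y, seed=0):
    """Fit the five classifier families on a train split; return test accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    scaler = StandardScaler().fit(X_tr)   # scale using training data only
    X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "Neural Network": MLPClassifier(hidden_layer_sizes=(8,), max_iter=500,
                                        random_state=seed),
        "Naive Bayes": GaussianNB(),
    }
    return {name: accuracy_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
            for name, model in models.items()}


X, deprivations = simulate_households()
y = ubn_label(deprivations)
results = compare_classifiers(X, y)
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

The poverty label is derived from the deprivation flags while the classifiers see only the household covariates; this split is a design choice of the sketch, made so that the task is not trivially solvable from the label's own inputs.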


Data availability

The study uses secondary data that are freely available in the public domain through http://iipsinidia.org. The necessary ethical approval was obtained by the respective organizations involved in the data collection process.


Acknowledgements

Not applicable.

Funding

The authors did not receive any funding to carry out this research.

Author information

Contributions

Conceptualization, SA and MD; methodology, SA; software, SA; data analysis, SA; supervision, MD; writing (original draft preparation), SA; writing (review and editing), MD. Both authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Salmaan Ansari.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 20 KB)

About this article

Cite this article

Ansari, S., Dhar, M. Poverty classification based on unsatisfied basic needs index: a comparison of supervised learning algorithms. SN Soc Sci 2, 69 (2022). https://doi.org/10.1007/s43545-022-00375-y

  • DOI: https://doi.org/10.1007/s43545-022-00375-y
