Mutual Information Scoring: Increasing Interpretability in Categorical Clustering Tasks with Applications to Child Welfare Data

  • Conference paper
Social, Cultural, and Behavioral Modeling (SBP-BRiMS 2022)

Abstract

Youth in the American foster care system are significantly more likely than their peers to face a number of negative life outcomes, from homelessness to incarceration. Administrative data on these youth have the potential to provide insights that can help identify ways to improve their path towards a better life. However, such data also suffer from a variety of biases, from missing data to reflections of systemic inequality. The present work proposes a novel, prescriptive approach to using these data to provide insights about both data biases and the systems and youth they track. Specifically, we develop a novel categorical clustering and cluster summarization methodology that allows us to gain insights into subtle biases in existing data on foster youth, and to provide insight into where further (often qualitative) research is needed to identify potential ways of assisting youth.



Author information

Corresponding author

Correspondence to Pranav Sankhe.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sankhe, P., Hall, S.F., Sage, M., Rodriguez, M.Y., Chandola, V., Joseph, K. (2022). Mutual Information Scoring: Increasing Interpretability in Categorical Clustering Tasks with Applications to Child Welfare Data. In: Thomson, R., Dancy, C., Pyke, A. (eds) Social, Cultural, and Behavioral Modeling. SBP-BRiMS 2022. Lecture Notes in Computer Science, vol 13558. Springer, Cham. https://doi.org/10.1007/978-3-031-17114-7_16

  • DOI: https://doi.org/10.1007/978-3-031-17114-7_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17113-0

  • Online ISBN: 978-3-031-17114-7

  • eBook Packages: Computer Science, Computer Science (R0)
