Mutual Information Scoring: Increasing Interpretability in Categorical Clustering Tasks with Applications to Child Welfare Data

Sankhe, Pranav; Hall, Seventy F.; Sage, Melanie; Rodriguez, Maria Y.; Chandola, Varun; Joseph, Kenneth

doi:10.1007/978-3-031-17114-7_16


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13558)

Included in the following conference series:

International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation


Abstract

Youth in the American foster care system are significantly more likely than their peers to face a number of negative life outcomes, from homelessness to incarceration. Administrative data on these youth have the potential to provide insights that can help identify ways to improve their path towards a better life. However, such data also suffer from a variety of biases, from missing data to reflections of systemic inequality. The present work proposes a novel, prescriptive approach to using these data to provide insights about both data biases and the systems and youth they track. Specifically, we develop a novel categorical clustering and cluster summarization methodology that allows us to gain insights into subtle biases in existing data on foster youth, and to provide insight into where further (often qualitative) research is needed to identify potential ways of assisting youth.



Author information

Authors and Affiliations

University at Buffalo, Buffalo, NY, USA
Pranav Sankhe, Seventy F. Hall, Melanie Sage, Maria Y. Rodriguez, Varun Chandola & Kenneth Joseph


Corresponding author

Correspondence to Pranav Sankhe.

Editor information

Editors and Affiliations

Army Cyber Institute, West Point, NY, USA
Robert Thomson
Pennsylvania State University, Pennsylvania, PA, USA
Christopher Dancy
United States Military Academy, West Point, NY, USA
Aryn Pyke


About this paper

Cite this paper

Sankhe, P., Hall, S.F., Sage, M., Rodriguez, M.Y., Chandola, V., Joseph, K. (2022). Mutual Information Scoring: Increasing Interpretability in Categorical Clustering Tasks with Applications to Child Welfare Data. In: Thomson, R., Dancy, C., Pyke, A. (eds) Social, Cultural, and Behavioral Modeling. SBP-BRiMS 2022. Lecture Notes in Computer Science, vol 13558. Springer, Cham. https://doi.org/10.1007/978-3-031-17114-7_16


DOI: https://doi.org/10.1007/978-3-031-17114-7_16
Published: 18 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17113-0
Online ISBN: 978-3-031-17114-7
eBook Packages: Computer Science, Computer Science (R0)
