Abstract
This paper summarizes improvements to a previously developed Fuzzy Bayes approach for assigning coding categories to injury narratives randomly extracted from the records of a large U.S. insurer. Improvements to the model included adding sequenced words as predictors and removing common subsets prior to calculation of word strengths. These two changes increased prediction strengths for word sequences found frequently in the training dataset and produced more intuitive predictions. Accuracy also improved for several categories that had proved difficult to code in the past. This study also examined the effectiveness of a two-tiered approach, in which narratives were first categorized at the broad level (such as [falls]) and then classified at a more refined level (such as [falls from heights]). With the two-tiered approach, overall sensitivity was 79% at the broad category level and 66% at the refined category level.
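The classification scheme summarized above can be illustrated with a minimal sketch. It assumes that the fuzzy Bayes rule scores each category by the largest single strength P(category | word) among a narrative's words and word sequences (rather than multiplying strengths, as naive Bayes would), that "sequenced words" means adjacent word pairs, and that sensitivity is the fraction of narratives whose predicted code matches the manual code. The subset-removal step is not modeled because the abstract does not specify it. All function names and the toy data are hypothetical, not the authors' implementation.

```python
from collections import Counter, defaultdict

def features(narrative, max_n=2):
    """Single words plus adjacent word sequences of up to max_n words (assumed scheme)."""
    words = narrative.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]

def train(narratives, labels, max_n=2):
    """Estimate word strengths P(category | feature) from co-occurrence counts."""
    counts = defaultdict(Counter)
    for text, cat in zip(narratives, labels):
        for f in set(features(text, max_n)):  # count each feature once per narrative
            counts[f][cat] += 1
    return {f: {c: k / sum(cc.values()) for c, k in cc.items()}
            for f, cc in counts.items()}

def predict(strengths, narrative, max_n=2):
    """Fuzzy Bayes (assumed form): a category's score is the MAX strength
    over the narrative's features, not a product of strengths."""
    best = {}
    for f in features(narrative, max_n):
        for c, p in strengths.get(f, {}).items():
            best[c] = max(best.get(c, 0.0), p)
    if not best:
        return None, 0.0  # no known words: leave the narrative uncoded
    cat = max(best, key=best.get)
    return cat, best[cat]

def train_two_tier(narratives, broad, fine, max_n=2):
    """Tier 1 learns broad codes; tier 2 learns refined codes within each broad code."""
    top = train(narratives, broad, max_n)
    sub = {}
    for b in set(broad):
        idx = [i for i, x in enumerate(broad) if x == b]
        sub[b] = train([narratives[i] for i in idx], [fine[i] for i in idx], max_n)
    return top, sub

def predict_two_tier(model, narrative, max_n=2):
    """Predict the broad code first, then the refined code using that code's sub-model."""
    top, sub = model
    b, _ = predict(top, narrative, max_n)
    if b is None:
        return None, None
    f, _ = predict(sub[b], narrative, max_n)
    return b, f

def sensitivity(predicted, actual):
    """Fraction of narratives whose predicted code matches the manual code."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical toy data, for illustration only
narratives = ["fell from ladder", "slipped on wet floor", "fell from roof",
              "cut hand on knife", "slipped on ice"]
broad = ["falls", "falls", "falls", "cut", "falls"]
fine = ["falls from heights", "same-level falls", "falls from heights",
        "cut", "same-level falls"]
model = train_two_tier(narratives, broad, fine)
print(predict_two_tier(model, "worker fell from scaffold"))
# ('falls', 'falls from heights')
```

Taking the maximum lets one strong predictor, such as the sequence "fell from", decide the code on its own. This is one plausible reading of why adding word sequences could raise prediction strengths for sequences that occur frequently in training data.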
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Marucci, H.R., Lehto, M.R., Corns, H.L. (2007). Computer Classification of Injury Narratives Using a Fuzzy Bayes Approach: Improving the Model. In: Smith, M.J., Salvendy, G. (eds) Human Interface and the Management of Information. Methods, Techniques and Tools in Information Design. Human Interface 2007. Lecture Notes in Computer Science, vol 4557. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73345-4_57
DOI: https://doi.org/10.1007/978-3-540-73345-4_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73344-7
Online ISBN: 978-3-540-73345-4
eBook Packages: Computer Science, Computer Science (R0)