Journal of Medical Systems, Volume 36, Issue 2, pp 475–481

Information Extraction Approaches to Unconventional Data Sources for “Injury Surveillance System”: the Case of Newspapers Clippings

  • Paola Berchialla
  • Cecilia Scarinzi
  • Silvia Snidero
  • Yousif Rahim
  • Dario Gregori
Original Paper


Injury surveillance systems based on traditional hospital records or clinical data have the advantage of being a well-established, highly reliable source of information for active surveillance of specific injuries, such as choking in children. However, they suffer from delays in making data available for analysis, owing to inefficiencies in data collection procedures. Integrating clinical registries with unconventional data sources such as newspaper articles can therefore make the system more useful for early alerting. Such sources are difficult to use because the information is available only as free natural-language documents rather than the structured databases required by traditional data mining techniques. Information Extraction (IE) addresses the problem of transforming a corpus of textual documents into a more structured database. In this paper, on a corpus of Italian newspaper articles related to choking in children due to ingestion or inhalation of a foreign body, we compared the performance of three IE algorithms: (a) a classical rule-based system, which requires manual annotation of the rules; (b) a rule-based system that allows the rules to be built automatically; and (c) a machine learning method based on Support Vector Machines. Although some useful indications can be extracted from the newspaper clippings, this approach is at present far from being routinely implemented for injury surveillance purposes.
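
To make the idea of turning free text into a structured record concrete, the following is a minimal sketch of a hand-written, rule-based extractor in Python. It is not the system evaluated in the paper: the `InjuryRecord` schema, the three regular-expression rules, and the English example sentence are all hypothetical and chosen only for illustration (the study itself worked on Italian newspaper articles).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class InjuryRecord:
    """One structured row per clipping (hypothetical schema, for illustration)."""
    age_years: Optional[int]
    foreign_body: Optional[str]
    outcome: Optional[str]


# Hand-written rules, one per field. These patterns are illustrative
# assumptions, not rules taken from the paper.
AGE_RULE = re.compile(r"(\d{1,2})[- ]year[- ]old", re.IGNORECASE)
OBJECT_RULE = re.compile(
    r"(?:choked on|swallowed|inhaled) (?:a |an |the )?(\w+)", re.IGNORECASE
)
OUTCOME_RULE = re.compile(r"\b(died|hospitali[sz]ed|recovered)\b", re.IGNORECASE)


def extract(text: str) -> InjuryRecord:
    """Apply each rule to the text; a field stays None when its rule does not fire."""
    age = AGE_RULE.search(text)
    obj = OBJECT_RULE.search(text)
    outcome = OUTCOME_RULE.search(text)
    return InjuryRecord(
        age_years=int(age.group(1)) if age else None,
        foreign_body=obj.group(1).lower() if obj else None,
        outcome=outcome.group(1).lower() if outcome else None,
    )


if __name__ == "__main__":
    # Invented example sentence, not a real clipping.
    print(extract("A 3-year-old boy choked on a grape and was hospitalized."))
```

Rules of this kind are transparent but must be written and maintained by hand, which is the cost that automatic rule induction and machine learning approaches aim to reduce.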


Keywords: Injury surveillance systems; Text analysis; Injury prevention; Public health; Data mining



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Paola Berchialla (1)
  • Cecilia Scarinzi (2)
  • Silvia Snidero (3)
  • Yousif Rahim (4)
  • Dario Gregori (1, 5), corresponding author

  1. Department of Public Health and Microbiology, University of Torino, Torino, Italy
  2. Department of Statistics and Applied Mathematics D. de Castro, University of Torino, Torino, Italy
  3. S&A S.r.l., Cuneo, Italy
  4. International Society for Violence and Injury Prevention, Stockholm, Norway
  5. Department of Environmental Medicine and Public Health, Padova, Italy
