Abstract
Injury surveillance systems based on traditional hospital records or clinical data have the advantage of being a well-established, highly reliable source of information for active surveillance of specific injuries, such as choking in children. However, they suffer from delays in making data available for analysis, owing to inefficiencies in data collection procedures. In this respect, integrating clinical registries with unconventional data sources such as newspaper articles can make the system more useful for early alerting. Using such sources is difficult because the information is available only as free natural-language documents rather than the structured databases required by traditional data mining techniques. Information Extraction (IE) addresses the problem of transforming a corpus of textual documents into a more structured database. In this paper, on a corpus of Italian newspaper articles related to choking in children due to ingestion/inhalation of a foreign body, we compared the performance of three IE algorithms: (a) a classical rule-based system, which requires manual annotation of the rules; (b) a rule-based system that allows rules to be built automatically; and (c) a machine learning method based on Support Vector Machines. Although some useful indications can be extracted from the newspaper clippings, this approach is at present far from being routinely implemented for injury surveillance purposes.
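To make the idea of Information Extraction concrete, the following is a minimal sketch of a hand-written rule-based extractor that turns one free-text sentence into a structured record. It is an illustration only, not one of the systems compared in the paper: the rules, the field names (`age_years`, `foreign_body`), the function `extract_record`, and the English example sentence are all assumptions made for this sketch, whereas the study itself worked on Italian newspaper articles.

```python
import re
from typing import Optional, TypedDict


class InjuryRecord(TypedDict):
    """Structured row produced from one free-text sentence (hypothetical schema)."""
    age_years: Optional[int]
    foreign_body: Optional[str]


# Hypothetical hand-written rules for illustration; not the rules used in the paper.
# Matches phrases such as "3-year-old" or "3 year old" and captures the number.
AGE_RULE = re.compile(r"(\d{1,2})[- ]year[- ]old", re.IGNORECASE)
# Matches a trigger verb, an article, and captures the single word that follows.
OBJECT_RULE = re.compile(
    r"(?:swallowed|inhaled|choked on)\s+(?:a|an|the)\s+([a-z]+)",
    re.IGNORECASE,
)


def extract_record(text: str) -> InjuryRecord:
    """Apply each rule once and fill the corresponding field, or None if no match."""
    age = AGE_RULE.search(text)
    obj = OBJECT_RULE.search(text)
    return {
        "age_years": int(age.group(1)) if age else None,
        "foreign_body": obj.group(1).lower() if obj else None,
    }


if __name__ == "__main__":
    print(extract_record("A 3-year-old boy choked on a coin at home."))
```

Rules of this kind are easy to read but must be written and maintained by hand, which is the cost that automatic rule induction and machine learning methods such as Support Vector Machines aim to reduce.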
About this article
Cite this article
Berchialla, P., Scarinzi, C., Snidero, S. et al. Information Extraction Approaches to Unconventional Data Sources for “Injury Surveillance System”: the Case of Newspapers Clippings. J Med Syst 36, 475–481 (2012). https://doi.org/10.1007/s10916-010-9492-1
DOI: https://doi.org/10.1007/s10916-010-9492-1