Information Systems Frontiers, Volume 20, Issue 2, pp 195–207

A comparative analysis of semi-supervised learning: The case of article selection for medical systematic reviews

  • Jun Liu
  • Prem Timsina
  • Omar El-Gayar


While systematic reviews are an essential element of modern evidence-based medical practice, creating them is resource intensive. To mitigate this problem, there have been some attempts to leverage supervised machine learning to automate the article triage procedure. This approach has proven helpful for updating existing systematic reviews. However, it holds very little promise for creating new reviews because labeled training data is rarely available when a review is first created. In this research we assess and compare the applicability of semi-supervised learning to overcome this labeling bottleneck and support the creation of systematic reviews. The results indicate that semi-supervised learning can significantly reduce human effort and is a viable technique for automating medical systematic review creation with a small training dataset.


Keywords: Medical systematic reviews; Semi-supervised learning; Active learning; Self-training; Text mining; Text analytics



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Dakota State University, Madison, USA
