Automatic Curation of Clinical Trials Data in LinkedCT

  • Oktie Hassanzadeh
  • Renée J. Miller
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9367)


The Linked Clinical Trials (LinkedCT) project started in 2008 with the goal of providing a Linked Data source of clinical trials. The data is derived from the XML published on ClinicalTrials.gov, an international registry of clinical studies. Since the initial release, the LinkedCT project has gone through major changes to improve both the quality of the data and its freshness. The result is a high-quality Linked Data source of clinical studies that is updated daily, currently containing over 195,000 trials, 4.6 million entities, and 42 million triples. In this paper, we present a detailed description of the system along with a brief outline of the technical challenges involved in curating the raw XML data into high-quality Linked Data. We also present usage statistics and a number of interesting use cases developed by external parties. We share the lessons learned in the design and implementation of the current system, along with an outline of our future plans for the project, which include making the system open-source and making the data free for commercial use.


Keywords: Clinical trials, Linked data, Data curation





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Computer Science, University of Toronto, Toronto, Canada
  2. IBM T.J. Watson Research Center, New York, USA
