Abstract
The Linked Clinical Trials (LinkedCT) project started in 2008 with the goal of providing a Linked Data source of clinical trials. The data are drawn from the XML published on ClinicalTrials.gov, an international registry of clinical studies. Since the initial release, the LinkedCT project has gone through major changes to improve both the quality of the data and its freshness. The result is a high-quality Linked Data source of clinical studies that is updated daily and currently contains over 195,000 trials, 4.6 million entities, and 42 million triples. In this paper, we present a detailed description of the system along with a brief outline of the technical challenges involved in curating the raw XML data into high-quality Linked Data. We also present usage statistics and a number of use cases developed by external parties. We share the lessons learned in the design and implementation of the current system, and outline our future plans for the project, which include making the system open source and making the data free for commercial use.
The data source is publicly available at http://linkedct.org. Data dumps are available at http://purl.org/net/linkedct/datadump. Scheduled maintenance downtimes are announced on our Twitter feed, https://twitter.com/linkedct. Resource URIs have been validated as proper Linked Data by http://validator.linkeddata.org/ (“All tests passed”). The data source is part of the LOD cloud and is registered on http://datahub.io.
R.J. Miller—Partially supported by NSERC BIN.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Hassanzadeh, O., Miller, R.J. (2015). Automatic Curation of Clinical Trials Data in LinkedCT. In: Arenas, M., et al. The Semantic Web - ISWC 2015. ISWC 2015. Lecture Notes in Computer Science, vol 9367. Springer, Cham. https://doi.org/10.1007/978-3-319-25010-6_16
DOI: https://doi.org/10.1007/978-3-319-25010-6_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25009-0
Online ISBN: 978-3-319-25010-6
eBook Packages: Computer Science, Computer Science (R0)