Abstract
We present a software architecture that federates data from multiple heterogeneous health informatics data sources owned by multiple organizations. The architecture builds upon state-of-the-art open-source Java and XML frameworks. It consists of (a) a federated query engine, which manages federated queries and aggregates result sets via a patient identification service; and (b) data source facades, which translate the physical data models into a common model on the fly and stream large result sets. System modules are connected via reusable Apache Camel integration routes and deployed to an OSGi enterprise service bus. We present an application of our architecture that lets users construct queries in the i2b2 web front-end and federates patient data from the University of Utah Enterprise Data Warehouse and the Utah Population Database. Our system can be easily adopted, extended, and integrated with existing SOA healthcare and HL7 frameworks such as i2b2 and caGrid.
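As a rough illustration of the two components named above, the following Java sketch fans a query out to several data source facades and merges the results by a federated patient identifier. It is a minimal sketch under stated assumptions, not the FURTHeR implementation: the names `FederatedQuerySketch`, `CommonPatient`, `DataSourceFacade`, `PatientIdService`, `inMemory`, and `federate` are hypothetical, the linkage table is a toy, and Apache Camel and OSGi are left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; all type and method names here are hypothetical. */
final class FederatedQuerySketch {

    /** A patient record already translated into the common model. */
    static final class CommonPatient {
        final String source;   // data source the record came from
        final String localId;  // identifier local to that source
        final int birthYear;

        CommonPatient(String source, String localId, int birthYear) {
            this.source = source;
            this.localId = localId;
            this.birthYear = birthYear;
        }
    }

    /** Data source facade: answers a query in terms of the common model. */
    interface DataSourceFacade {
        List<CommonPatient> bornInOrAfter(int year);
    }

    /** Stand-in for the patient identification service. */
    interface PatientIdService {
        String federatedId(String source, String localId);
    }

    /** Wraps in-memory rows as a facade; a real facade would translate a physical model. */
    static DataSourceFacade inMemory(List<CommonPatient> rows) {
        return year -> {
            List<CommonPatient> hits = new ArrayList<>();
            for (CommonPatient p : rows) {
                if (p.birthYear >= year) {
                    hits.add(p);
                }
            }
            return hits;
        };
    }

    /** Sends the query to every facade and keeps the first record seen per federated ID. */
    static Map<String, CommonPatient> federate(List<DataSourceFacade> sources,
                                               PatientIdService ids, int year) {
        Map<String, CommonPatient> merged = new LinkedHashMap<>();
        for (DataSourceFacade ds : sources) {
            for (CommonPatient p : ds.bornInOrAfter(year)) {
                merged.putIfAbsent(ids.federatedId(p.source, p.localId), p);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        DataSourceFacade sourceA = inMemory(Arrays.asList(
                new CommonPatient("A", "a-1", 1970),
                new CommonPatient("A", "a-2", 1950)));
        DataSourceFacade sourceB = inMemory(Arrays.asList(
                new CommonPatient("B", "b-9", 1970),
                new CommonPatient("B", "b-7", 1985)));

        // Toy linkage table (assumption): a-1 and b-9 refer to the same person.
        PatientIdService ids = (source, localId) ->
                (localId.equals("a-1") || localId.equals("b-9")) ? "P1" : source + ":" + localId;

        Map<String, CommonPatient> result =
                federate(Arrays.asList(sourceA, sourceB), ids, 1960);
        System.out.println(result.keySet()); // prints [P1, B:b-7]
    }
}
```

Keeping the first record per federated identifier is only the simplest possible duplicate policy; it was chosen here to keep the sketch short.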
Abbreviations
- AOP: Aspect-Oriented Programming
- CCTS: Center for Clinical and Translational Science
- DQC: Data source QueryContext
- DS: Data Source
- DTS: Distributed Terminology Server
- ESB: Enterprise Service Bus
- ID: Identifier
- FQC: Federated QueryContext
- FQE: Federated Query Engine
- FURTHeR: Federated Utah Research & Translational Health e-Repository
- i2b2: Informatics for Integrating Biology and the Bedside
- JMS: Java Message Service
- MDR: Metadata Repository
- OSGi: Open Services Gateway Initiative
- QC: QueryContext
- SOA: Service-Oriented Architecture
- Spring DM: Spring Dynamic Modules
- UUEDW: University of Utah Enterprise Data Warehouse
- UPDB: Utah Population Database
- UPDBL: Utah Population Database Light
- VR: Virtual Repository
References
Apache-Foundation., Camel Book in One Page, available online at http://activemq.apache.org/camel/book-in-one-page.html, 2008.
Atlassian., Bug, Issue and Project Tracking for Software Development - JIRA, available online at http://www.atlassian.com/software/jira/, 2011.
Balani, N., Apache CXF Web Service Development: Develop and deploy SOAP and RESTful web services. Packt, New York, 2009.
Bauer, C., and King, G., Java Persistence with Hibernate. Manning Publications; Revised edition (November 24, 2006) ISBN-10: 1932394885, ISBN-13: 978-1932394887. http://www.amazon.com/Java-Persistence-Hibernate-Christian-Bauer/dp/1932394885.
Collins-Sussman, B., Fitzpatrick, B. W., and Pilato, C. M., Version Control with Subversion. O’Reilly Media, 2nd edition 2004. ISBN-10: 0596510330. ISBN-13: 978-0596510336. http://www.amazon.com/Version-Control-Subversion-Michael-Pilato/dp/0596510330.
Birn., The Beginner’s User Guide to inQ and SRB, available online at http://nbirn.net/tools/srb/inQ_user_guide.shtm, 2007.
Bradshaw, R. L., Matney, S., Livne, O. E., Bray, B. E., Mitchell, J. A., and Narus, S. P., Architecture of a Federated Query Engine for Heterogeneous Resources. In Proceedings of the AMIA Annual Symposium, San Francisco, CA, November 14–18 2009. Published online at http://www.ncbi.nlm.nih.gov/pmc/issues/184543/, 2009.
Bug, W., Astahkov, V., Boline, J., Fennema-Notestine, C., Grethe, J. S., Gupta, A., Kennedy, D. N., Rubin, D. L., Sanders, B., Turner, J. A., and Martone, M. E., Data federation in the Biomedical Informatics Research Network: tools for semantic annotation and query of distributed multiscale brain data. In Proceedings of the AMIA Annual Symposium, Washington, DC, November 8–12 2008. Published online at http://www.ncbi.nlm.nih.gov/pmc/issues/177327/, 1220, 2008.
Further., FURTHeR XML Schemas, available online at https://dev-app.further.utah.edu/portal/doc/schemaDocs.jsf, 2010.
Hall, R. S., Pauls, K., McCulloch, S., and Savage, D., OSGi in Action. Manning Publications; 1 edition (April 28, 2011). ISBN-10: 1933988916. ISBN-13: 978-1933988917. http://www.amazon.com/OSGi-Action-Creating-Modular-Applications/dp/1933988916.
He, S., Hurdle, J. F., Botkin, J. R., and Narus, S. P., Integrating a Federated Healthcare Data Query Platform With Electronic IRB Information Systems. Proceedings of the AMIA Annual Symposium, 2010, doi: N/A.
Katz, M., Practical RichFaces. Apress; 1 edition (December 16, 2008). ISBN-10: 1430210559. ISBN-13: 978-1430210559. http://www.amazon.com/Practical-RichFaces-Max-Katz/dp/1430210559.
Kawaguchi, K., Meet Hudson, available online at http://wiki.hudson-ci.org/display/HUDSON/Meet+Hudson, 2010.
Keator, D. B., Wei, D., Gadde, S., Bockholt, J., Grethe, J. S., Marcus, D., Aucoin, N., and Ozyurt, I. B., Derived Data Storage and Exchange Workflow for Large-Scale Neuroimaging Analyses on the BIRN Grid. Front Neuroinformatics 3, 2009, doi:10.3389/neuro.11.030.2009.
Komatsoulis, G. A., Warzel, D. B., Hartel, F. W., Shanbhag, K., Chilukuri, R., Fragoso, G., Coronado, S., Reeves, D. M., Hadfield, J. B., Ludet, C., and Covitz, P. A., caCORE version 3: Implementation of a model driven, service-oriented architecture for semantic interoperability. J. Biomed. Inform. 41:106–123, 2008. doi:10.1016/j.jbi.2007.03.009.
Livne, O. E., Schultz, N. D., and Narus, S. P., Federated querying architecture for clinical & translational health IT. In Proceedings of the 1st ACM International Health Informatics Symposium (IHI '10), Tiffany Veinot (Ed.). ACM, New York, NY, USA, 250–256. doi:10.1145/1882992.1883028.
Matney, S. A., Bradshaw, R. L., Livne, O. E., Bray, B. E., Frey, L., Mitchell, J. A., and Narus, S. P., Developing a Semantic Framework for Clinical and Translational Research. Paper presented at the AMIA Summit on Translational Bioinformatics, San Francisco, CA, 2011.
Metsker, S. J., The Design Patterns Java Workbook. Addison-Wesley Professional (April 4, 2002). ISBN-10: 0201743973. ISBN-13: 978-0201743975. http://www.amazon.com/Design-Patterns-Java-TM-Workbook/dp/0201743973.
Murphy, S. N., Mendis, M. E., Berkowitz, D. A., Kohane, I., and Chueh, H. C., Integration of clinical and genetic data in the i2b2 architecture. AMIA Annu. Symp. Proc. 2006: 1040, 2006. PMCID: PMC1839291. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1839291/.
National Cancer Institute., caDSR Wiki Home Page, available online at https://wiki.nci.nih.gov/display/caDSR/caDSR+Wiki+Home+Page;jsessionid=7ABB839EF722AC25491AF6C73476AE41, 2011.
O’Brien, T., Casey, J., Fox, B., Zyl, J. V., Moser, M., Redmond, E., and Shatzer, L., Maven: The Complete Reference. Sonatype, Inc., Mountain View, CA, available online at http://www.sonatype.com/books/mvnref-book/reference/public-book.html, 2009.
Oracle Corporation., JSR-000222 Java(TM) Architecture for XML Binding (JAXB), available online at http://jcp.org/aboutJava/communityprocess/mrel/jsr222/index.html, 2011.
Oster, S., Langella, S., Hastings, S., Ervin, D., Madduri, R., Phillips, J., Kurc, T., Siebenlist, F., Covitz, P., Shanbhag, K., Foster, I., and Saltz, J., caGrid 1.0: An enterprise Grid infrastructure for biomedical research. J. Am. Med. Inform. Assoc. 15:138–149, 2008. doi:10.1197/jamia.M2522.
Progress Software., Open Source OSGi ESB - FUSE ESB 4 (ServiceMix 4), available online at http://fusesource.com/products/enterprise-servicemix4/, 2009.
Roth, M. T., Ozcan, F., and Haas, L. M., Cost Models DO Matter: Providing Cost Information for Diverse Data Sources in a Federated System. In Proceedings of the 25th International Conference on Very Large Data Bases. Morgan Kaufmann Publishers Inc., 671494, 599–610, 1999.
Roth, M. T., and Schwarz, P. M., Don’t Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources. In Proceedings of the 23rd International Conference on Very Large Data Bases. Morgan Kaufmann Publishers Inc., 670992, 266–275, 1997.
Schneier, B., Applied Cryptography: Protocols, Algorithms, and Source Code in C. Wiley, 2nd Edition, 1996. ISBN-10: 0471117099. ISBN-13: 978-0471117094. http://www.amazon.com/Applied-Cryptography-Protocols-Algorithms-Source/dp/0471117099.
Slaymaker, M., Power, D., Russell, D., Wilson, G., and Simpson, A., Accessing and aggregating legacy data sources for healthcare research, delivery and training. In Proceedings of the ACM Symposium on Applied Computing, Fortaleza, Ceara, Brazil. ACM, New York, NY, 1363994, 1317–1324, 2008.
Springsource., The Spring Framework - Reference Documentation, available online at http://static.springsource.org/spring/docs/2.5.x/reference/testing.html, 2007.
Springsource., Spring Dynamic Modules Reference Guide, available online at http://static.springsource.org/osgi/docs/1.2.1/reference/html/, 2009.
Tidwell, D., XSLT. O’Reilly Media, 2001. Print ISBN: 978-0-596-00053-0. ISBN 10: 0-596-00053-7. http://oreilly.com/catalog/9780596000530.
Walls, C., and Breidenbach, R., Spring in Action. Manning Publications; 2nd edition (August 23, 2007). ISBN-10: 1933988134. ISBN-13: 978-1933988139. http://www.amazon.com/Spring-Action-Craig-Walls/dp/1933988134.
Weber, G. M., Murphy, S. N., McMurry, A. J., MacFadden, D., Nigrin, D. J., Churchill, S., and Kohane, I. S., The Shared Health Research Information Network (SHRINE): A prototype federated query tool for clinical data repositories. J. Am. Med. Inform. Assoc. 16:624–630, 2009. doi:10.1197/jamia.M3191.
Acknowledgments
The authors would like to acknowledge Susan Matney for her work on the FURTHeR terminology server, and Richard Bradshaw for his work on the FURTHeR metadata repository. The present manuscript is an extension of our ACM IHI 2010 paper entitled “Federated Querying Architecture for Clinical & Translational Health IT” [16]; see Appendix for more details.
Additional information
This investigation was supported by Public Health Services research grant UL1-RR025764 from the National Center for Research Resources and by the National Library of Medicine grant 1RC2LM010798 from the National Institutes of Health. The authors wish to thank the Pedigree and Population Resource and Research Informatics Groups at the Huntsman Cancer Institute and the University of Utah Enterprise Warehouse Team.
Appendix: New material included in this paper
The present manuscript is an extension of our ACM IHI 2010 paper entitled “Federated Querying Architecture for Clinical & Translational Health IT” [16]. More than 30% of its content is new. The main additions are:
- Section “Existing Federation Solutions”: a discussion and comparison of our system with the Garlic engine.
- Section “Aggregation via an In-memory Database”: a more detailed description of how result sets from multiple data sources are aggregated and how duplicate records are resolved.
- Section “Query Context”: a discussion of supported query expressions and specific health care applications.
- Section “Data Source Flow”: the DS flow was updated to support result set paging, which is critical to processing large result sets.
- Section “Cancelling a Query”: we added a new command type called CANCEL for cancelling a query and modified the DS flow to support in-vivo cancellation requests.
- Section “Implementation”: an entirely new section describing the application of our system to querying two live data sources, the University of Utah Data Warehouse and the Utah Population Database, and our integration with the i2b2 web front-end.
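The paging and cancellation additions listed above can be pictured with a small Java sketch: a data source flow reads a result set one page at a time and checks a cancellation flag between pages. This is only a hedged illustration under our own assumptions. The names `PagedQuerySketch`, `PageListener`, `fetchAll`, and `cancel` are hypothetical, the result set is a plain in-memory list, and the real DS flow runs over integration routes that are not modelled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative sketch only; class and method names are hypothetical. */
final class PagedQuerySketch {

    /** Callback invoked after each page has been read. */
    interface PageListener {
        void onPage(List<Integer> page);
    }

    /** Flag set by a CANCEL-style command and checked between pages. */
    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    /** Requests cancellation; takes effect before the next page is read. */
    void cancel() {
        cancelled.set(true);
    }

    /** Reads rows page by page and returns everything collected before any cancellation. */
    List<Integer> fetchAll(List<Integer> rows, int pageSize, PageListener listener) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<Integer> collected = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += pageSize) {
            if (cancelled.get()) {
                break; // stop without reading further pages
            }
            int to = Math.min(from + pageSize, rows.size());
            List<Integer> page = rows.subList(from, to);
            collected.addAll(page);
            listener.onPage(page);
        }
        return collected;
    }

    public static void main(String[] args) {
        PagedQuerySketch query = new PagedQuerySketch();
        List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        int[] pagesSeen = {0};

        // Cancel once the second page has arrived.
        List<Integer> got = query.fetchAll(rows, 3, page -> {
            if (++pagesSeen[0] == 2) {
                query.cancel();
            }
        });
        System.out.println(got); // prints [1, 2, 3, 4, 5, 6]
    }
}
```

Checking the flag only at page boundaries keeps each page atomic, so a cancelled query never returns a partially read page.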
Cite this article
Livne, O.E., Schultz, N.D. & Narus, S.P. Federated Querying Architecture with Clinical & Translational Health IT Application. J Med Syst 35, 1211–1224 (2011). https://doi.org/10.1007/s10916-011-9720-3
DOI: https://doi.org/10.1007/s10916-011-9720-3