GEM: The GAAIN Entity Mapper

  • Naveen Ashish
  • Peehoo Dewan
  • Jose-Luis Ambite
  • Arthur W. Toga
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9162)


We present a software system that substantially simplifies the sharing of medical data. The system, called GEM (the GAAIN Entity Mapper), harmonizes medical data, where harmonization is the process of unifying information across multiple disparate datasets so that the data can be shared and aggregated. Specifically, GEM automates the task of finding corresponding elements across independently created medical datasets that contain related data. We present our overall approach, the detailed technical architecture, and experimental evaluations that demonstrate the effectiveness of the approach.
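To make the element-matching task concrete, the following is a minimal sketch of one generic way to score candidate correspondences between two data dictionaries: bag-of-words cosine similarity over element descriptions. It is not GEM's actual method, which this abstract does not specify. The element names, the descriptions, and the helper functions `tokenize`, `cosine`, and `match_elements` are all hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_elements(source, target):
    """For each source element, pick the most similar target element.

    `source` and `target` map element names to free-text descriptions,
    as they might appear in a data dictionary (an assumption of this sketch).
    Returns {source_name: (best_target_name, similarity_score)}.
    """
    target_vecs = {name: Counter(tokenize(desc)) for name, desc in target.items()}
    matches = {}
    for name, desc in source.items():
        vec = Counter(tokenize(desc))
        best = max(target_vecs, key=lambda t: cosine(vec, target_vecs[t]))
        matches[name] = (best, cosine(vec, target_vecs[best]))
    return matches


# Hypothetical data dictionaries, invented for illustration only.
dataset_a = {
    "PT_GENDER": "gender of the participant",
    "AGE_BASE": "age of the participant at baseline visit",
}
dataset_b = {
    "SEX": "participant gender",
    "BIRTHYR": "year of birth",
    "VISIT_AGE": "age at baseline visit",
}

matches = match_elements(dataset_a, dataset_b)
for name, (best, score) in matches.items():
    print(f"{name} -> {best} ({score:.2f})")
```

A plain lexical score like this only finds matches when descriptions share words; it would miss correspondences expressed with different vocabulary, which is one reason a practical mapper needs richer evidence than this sketch uses.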


Keywords: Data Element · Topic Modeling · Mapping Accuracy · Data Dictionary · Schema Pair
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Naveen Ashish (1)
  • Peehoo Dewan (1)
  • Jose-Luis Ambite (2)
  • Arthur W. Toga (1)

  1. Laboratory of NeuroImaging, Keck School of Medicine of USC, USC Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, USA
  2. Information Sciences Institute, University of Southern California, Los Angeles, USA
