A MapReduce Based Distributed Framework for Similarity Search in Healthcare Big Data Environment

  • Hiren K.D. Sarma
  • Yogesh K. Dwivedi
  • Nripendra P. Rana
  • Emma L. Slade
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9373)


Similarity search in a big data environment is a challenging task. Patient similarity (PaSi) search is an important problem for healthcare networks and their data. The results of a PaSi search can inform conclusions and decisions that improve healthcare systems, and can also help in choosing treatment paths for new patients. In this paper, we propose a MapReduce-based framework as a solution to the PaSi problem for a healthcare network envisioned to connect the healthcare centers of India. We assume that such a network will be implemented in the future over the Government of India cloud, known as GI Cloud or ‘MeghRaj’. The paper also discusses the implementation challenges associated with the proposed framework and describes its query handling approach for solving the PaSi problem. Finally, the paper outlines the future scope of the work.
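The abstract does not spell out the framework's internals, so the following is only a minimal sketch of how a patient similarity query could be phrased in MapReduce style, not the authors' method. The record layout, the use of cosine similarity, the per-center grouping key, and all names (`RECORDS`, `map_phase`, `reduce_phase`, `similar_patients`) are assumptions made for illustration. The shuffle step is simulated in-process rather than run on a real cluster.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy records: (center_id, patient_id, feature vector).
RECORDS = [
    ("center_a", "p1", [1.0, 0.0, 1.0]),
    ("center_a", "p2", [0.0, 1.0, 0.0]),
    ("center_b", "p3", [1.0, 0.0, 0.9]),
    ("center_b", "p4", [0.5, 0.5, 0.5]),
]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def map_phase(record, query):
    """Map: score one patient record against the query, keyed by center."""
    center_id, patient_id, features = record
    yield center_id, (patient_id, cosine(query, features))


def reduce_phase(center_id, scored, k):
    """Reduce: keep the k highest-scoring patients for one center."""
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


def similar_patients(records, query, k=2):
    """Run map, a simulated shuffle, reduce, then merge the local top-k lists."""
    groups = defaultdict(list)  # shuffle: group mapper output by key
    for record in records:
        for key, value in map_phase(record, query):
            groups[key].append(value)
    local_top = []
    for key, values in groups.items():
        local_top.extend(reduce_phase(key, values, k))
    return sorted(local_top, key=lambda item: item[1], reverse=True)[:k]
```

Calling `similar_patients(RECORDS, [1.0, 0.0, 1.0], k=2)` ranks the toy patients against a query vector. Reducing per center first keeps each local result small before the final merge, which is one plausible way to limit data movement between centers.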


Keywords: Big data; MapReduce; Similarity search; Patient similarity (PaSi); Cloud; Framework



Copyright information

© IFIP International Federation for Information Processing 2015

Authors and Affiliations

  • Hiren K.D. Sarma (1)
  • Yogesh K. Dwivedi (2)
  • Nripendra P. Rana (2)
  • Emma L. Slade (2)

  1. Department of Information Technology, Sikkim Manipal Institute of Technology, Rangpo, India
  2. School of Management, Swansea University, Swansea, UK
