Analysis of Aadhaar Card Dataset Using Big Data Analytics

  • R. Jayashree
Conference paper
Part of the Lecture Notes on Data Engineering and Communications Technologies book series (LNDECT, volume 35)


Aadhaar is a 12-digit unique identification number that is linked to essential details about an individual, including the demographic and biometric information of every resident of India. Aadhaar data is big data that needs to be stored and managed securely. Several processing techniques and privacy measures have been introduced to handle such large volumes of confidential data. However, individual details that could be used by different sectors are not linked to or updated with Aadhaar data. This survey paper discusses several algorithms, tools, and techniques from big data analytics for updating an individual's essential details alongside the existing Aadhaar database, for use by crime departments, health care centers, and professionals. Such linking is useful to hospitals for retrieving blood donor details, and to crime investigators and professionals for retrieving details about residents along with their Aadhaar details.


Keywords: MySQL, Hadoop, Sqoop, Hive



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Computer Science and Engineering, Panimalar Engineering College, Chennai, India
