A decade of big data literature: analysis of trends in light of bibliometrics

  • Iftikhar Ahmad
  • Gulzar Ahmed
  • Syed Adeel Ali Shah
  • Ejaz Ahmed


Bibliometrics is a quantitative tool for analyzing the literature published in a scientific field. Using Scopus as the data source, we analyze scholarly works on big data published from 2008 to 2017. The objective is to identify the most cited articles in this period, along with citation trends, authorship trends, and research trends in the area. The analysis shows that over 50% of publications receive no citations, and that the average number of citations per publication is 3.17. Single authorship of research publications has also declined over time. The analysis reveals the pioneering role of the USA in advancing big data research, a lead that China has lately taken over, and the large-scale use of big data analytics in various domains of science.
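
The two headline figures in the abstract, the share of uncited publications and the average number of citations per publication, are simple summary statistics over per-publication citation counts. The following is a minimal sketch of how such figures might be computed. The function name `citation_summary` and the sample counts are hypothetical illustrations, not the authors' method or their Scopus data.

```python
from statistics import mean


def citation_summary(citation_counts):
    """Return (share of uncited publications, mean citations per publication)."""
    if not citation_counts:
        raise ValueError("citation_counts must not be empty")
    # A publication is "uncited" when its citation count is exactly zero.
    uncited = sum(1 for count in citation_counts if count == 0)
    return uncited / len(citation_counts), mean(citation_counts)


# Hypothetical citation counts, one per publication (not from the study).
sample = [0, 0, 0, 12, 1, 0, 5, 0, 2, 0]
uncited_share, avg_citations = citation_summary(sample)
print(f"uncited: {uncited_share:.0%}, mean citations: {avg_citations:.2f}")
```

Because citation distributions tend to be heavily skewed, a few highly cited papers can pull the mean well above the typical publication, which is why the uncited share is worth reporting alongside it.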


Big data, Bibliometric analysis, Citation analysis


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Information Technology, University of Engineering and Technology, Peshawar, Pakistan
  2. Institute of Business and Management Sciences, The University of Agriculture, Peshawar, Pakistan
  3. Centre for Mobile Cloud Computing Research, University of Malaya, Kuala Lumpur, Malaysia
