Abstract
Today, an enormous amount of data is being continuously generated in all walks of life by all kinds of devices and systems every day. A significant portion of such data is being captured, stored, aggregated and analyzed in a systematic way without losing its “4V” (i.e., volume, velocity, variety, and veracity) characteristics. We review major drivers of big data today as well the recent trends and established platforms that offer valuable perspectives on the information stored in large and heterogeneous data sets. Then, we present a classification of some of the most important challenges when handling big data. Based on this classification, we recommend solutions that could address the identified challenges, and in addition we highlight cross-disciplinary research directions that need further investigation in the future.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
References
Jacobs A (2009) The pathologies of big data. Commun ACM 52(8):36–44
Madden S (2012) From databases to big data. IEEE Internet Comput 16(3):4–6
Wu X, Zhu X, Wu GQ, Ding W (2014) Data mining with big data. IEEE Trans Knowl Data Eng 26(1):97–107
Gantz J, Reinsel D (2011) Extracting value from chaos. IDC iView, pp 1–12
Banaee H, Ahmed MU, Loutfi A (2013) Data mining for wearable sensors in health monitoring systems: a review of recent trends and challenges. Sensors 13(12):17472–17500
Hashem IAT, Yaqoob I, Anuar NB, Mokhtar S, Gani A, Khan SU (2015) The rise of ‘big data’ on cloud computing: review and open research issues. Inf Syst 47:98–115
Kwon O, Lee N, Shin B (2014) Data quality management, data usage experience and acquisition intention of big data analytics. Int J Inf Manag 34(3):387–394
Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Byers AH (2016) Big data: the next frontier for innovation, competition, and productivity. McKinsey Global Institute, 2011. http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation. Accessed 12 January 2016
Pretz K (2016) Better health care through data: how health analytics could contain costs and improve care. The IEEE Institute, New York. http://theinstitute.ieee.org/technology-focus/technology-topic/better-health-care-through-data. Accessed 12 January 2016
Chen H, Compton S, Hsiao O (2013) DiabeticLink: a health big data system for patient empowerment and personalized healthcare, vol 8040. In: Smart health. Springer, Berlin, pp 71–83
O’Driscoll A, Daugelaite J, Sleator RD (2013) Big data. Hadoop and cloud computing in genomics. J Biomed Inf 46(5):774–781
Big Data Insight Group. http://www.thebigdatainsightgroup.com/site/article/nypd-make-big-apple-safer-big-data. Accessed 12 January 2016
Rozenfeld M (2016) The future of crime prevention. IEEE Institute, New York. http://theinstitute.ieee.org/technology-focus/technology-topic/the-future-of-crime-prevention. Accessed 12 January 2016
NASA Jet Propulsion Laboratory, Managing the deluge of ’Big Data’ from space. http://solarsystem.nasa.gov/news/display.cfm?News_ID=45192. Accessed 12 January 2016
Kambatla K, Kollias G, Kumar V, Grama A (2014) Trends in big data analytics. J Parallel Distrib Comput 74(7):2561–2573 ISSN 0743–7315
Atzeni P, Bugiotti F, Rossi L (2014) Uniform access to NoSQL systems. Inf Syst 43:117–133 ISSN 0306–4379
Chen M, Mao S, Liu Y (2014) Big data: a survey. Mobile Netw Appl 19(2):171–209
Owen S, Anil R, Dunning T, Friedman E (2011) Mahout in action. Manning Publications Co, USA ISBN: 9781935182689
Prakashbhai PA, Pandey HM (2014) Inference patterns from Big Data using aggregation, filtering and tagging—a survey. In: 5th international conference The next generation information technology summit (confluence), September 2014, pp 66–71
Hu H, Wen Y, Chua TS, Li X (2014) Toward scalable systems for big data analytics: a technology tutorial. IEEE Access 2:652–687
Che D, Safran M, Peng Z (2013) From big data to big data mining: challenges, issues, and opportunities. In: Lecture notes in computer science, vol 7827, pp 1–15
Tan W, Blake MB, Saleh I, Dustdar S (2013) Social-network-sourced big data analytics. IEEE Internet Comput 7(5):62–69
Lin J, Kolcz A (2012) Large-scale machine learning at twitter. In: Proceedings of the 2012 ACM SIGMOD international conference on management of data (SIGMOD ’12). ACM, New York, pp 793–804
Liu J, Liu F, Ansari N (2014) Monitoring and analyzing big traffic data of a large-scale cellular network with Hadoop. IEEE Netw 28(4):32–39
Marchal S, Francois J, State R, Engel T (2014) Phishstorm: detecting phishing with streaming analytics. IEEE Trans Netw Serv Manag 11(4):458–471
Ma C, Zhang HH, Wang X (2014) Machine learning for Big Data analytics in plants. Trends Plant Sci 19(12):798–808
Chandola V, Sukumar SR, Schryver JC (2013) Knowledge discovery from massive healthcare claims data. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’13). ACM, New York, pp 1312–1320
3M Meeting Network. http://www.3rd-force.org/meetingnetwork/files/meetingguide_pres.pdf. Accessed 12 January 2016
Reda K, Febretti A, Knoll A, Aurisano J, Leigh J, Johnson AE, Papka ME, Hereld M (2013) Visualizing large, heterogeneous data in hybrid-reality environments. IEEE Comput Graph Appl 33(4):38–48
Philip Chen CL, Zhang CY (2014) Data-intensive applications, challenges, techniques and technologies: a survey on Big Data. Inf Sci 275:314–347
Jagadish HV, Gehrke J, Labrinidis A, Papakonstantinou Y, Patel JM, Ramakrishnan R, Shahabi C (2014) Big data and its technical challenges. Commun ACM 57(7):86–94
Labrinidis A, Jagadish HV (2012) Challenges and opportunities with big data. Proc VLDB Endow 5(12):2032–2033
Buneman P, Khanna S, Tan W (2000) Data provenance: some basic issues. In: Proceedings of foundations of software technology and theoretical computer science (FST TCS 2000). LNCS, vol 1974, pp 87–93
Price S, Flach PA (2013) A Higher-order data flow model for heterogeneous Big Data. In: 2013 IEEE international conference on big data, October 2013, pp 569–574
Xindong W, Xingquan Z, Gong-Qing W, Wei D (2014) Data mining with big data. IEEE Trans Knowl Data Eng 26(1):97–107
Davis K, Patterson D (2012) Ethics of big data, O’Reilly. ISBN 978-1-4493-1179-7
Mann S (2012) Through the glass. Light IEEE Technol Soc Mag 31(3):10–14
Michael K, Miller KW (2013) Big data: new opportunities and new challenges. IEEE Comput 46(6):22–24
Kupwade PH, Seshadri R (2014) Big data security and privacy issues in healthcare. In: 2014 IEEE international congress on big data, pp 762–765
Volkovs M, Fei C, Szlichta J, Miller RJ (2014) Continuous data cleaning. In: 2014 IEEE 30th international conference on data engineering (ICDE), pp 244–255
Wang J, Song Z, Li Q, Yu J, Chen F (2014) Semantic-based intelligent data clean framework for big data. In: 2014 international conference on security, pattern analysis, and cybernetics (SPAC), pp 448–453
Stonebraker M, Bruckner D, Ilyas I, Beskales G, Cherniack M, Zdonik S, Pagan A, Xu S (2013) Data curation at scale: the data tamer system. In: Proceedings of biennial ACM conference on innovative data systems research (CIDR’13), Alisomar
Bansal SK (2014) Towards a semantic extract-transform-load (ETL) framework for big data integration. In: 2014 IEEE international congress on big data (BigData Congress), pp 522–529
Kadadi A, Agrawal R, Nyamful C, Atiq R (2014) Challenges of data integration and interoperability in big data. In: 2014 IEEE international conference on big data (Big Data), pp 38–40
Dong XL, Srivastava D (2013) Big data integration. In: 2013 IEEE 29th international conference on data engineering (ICDE), pp 1245–1248
Sowe SK, Zettsu K (2013) The architecture and design of a community-based cloud platform for curating big data. In: 2013 international conference on cyber-enabled distributed computing and knowledge discovery (CyberC), pp 171–178
O’Leary DE (2014) Embedding AI and crowdsourcing in the big data lake. IEEE Intell Syst 29(5):70–73
Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. ACM Commun 51(1):107–113
Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, Chandra T, Fikes A, Gruber RE (2008) Bigtable: a distributed storage system for structured data. ACM Trans Comput Syst 26(2):1–26
Kumar KA, Quamar A, Deshpande A, Khuller S (2014) SWORD: workload-aware data placement and replica selection for cloud data management systems. VLDB J 23(6):845–870
Wang Z, Zhu W, Chen X, Sun L, Liu J, Chen M, Cui P, Yang S (2013) Propagation-based social-aware multimedia content distribution. ACM Trans Multimed Comput Commun Appl (TOMM) 9(1):52:1–52:20
Wang Z, Zhu W, Chen M, Sun L, Yang S (2015) CPCDN: content delivery powered by context and user intelligence. IEEE Trans Multimed 17(1):92–103
Menglan H, Jun L, Yang W, Veeravalli B (2014) Practical resource provisioning and caching with dynamic resilience for cloud-based content distribution networks. IEEE Trans Parall Distrib Syst 25(8):2169–2179
Suto K, Nishiyama H, Kato N, Nakachi T, Fujii T, Takahara A (2014) Toward integrating overlay and physical networks for robust parallel processing architecture. IEEE Netw 28(4):40–45
Jiayi L, Rosenberg C, Simon G, Texier G (2014) Optimal delivery of rate-adaptive streams in underprovisioned networks. IEEE J Select Areas Commun 32(4):706–718
Fiore S, D’Anca A, Elia D, Palazzo C, Foster I, Williams D, Aloisio G (2014) Ophidia: a full software stack for scientific data analytics. In: 2014 international conference on high performance computing & simulation (HPCS), pp 343–350
Bhandarkar SM, Arabnia HR, Smith JW (1995) A reconfigurable architecture for image processing and computer vision. Int J Pattern Recognit Artif Intell (IJPRAI) 9(2):201–229. (Special issue on VLSI Algorithms and Architectures for Computer Vision. Image Processing, Pattern Recognition and AI)
Heinze T, Pappalardo V, Jerzak Z, Fetzer C (2014) Auto-scaling techniques for elastic data stream processing. In: 2014 IEEE 30th international conference on data engineering workshops (ICDEW), pp 296–302
Hsiang HW, Tse CY, Chien MW (2014) Multiple two-phase data processing with mapreduce. In: 2014 IEEE 7th international conference on cloud computing (CLOUD), pp 352–359
Arif Wani M, Arabnia HR (2003) Parallel edge-region-based segmentation algorithm targeted at reconfigurable multi-ring network. J Supercomput 25(1):43–63
Mokhtari R, Stumm M (2014) BigKernel—high performance CPU-GPU communication pipelining for big data-style applications. In: 2014 IEEE 28th international parallel and distributed processing symposium, pp 819–828
Chatterjee A, Radhakrishnan S, Sekharan CN (2014) Connecting the dots: triangle completion and related problems on large data sets using GPUs. In: 2014 IEEE international conference on big data (Big Data), pp 1–8
Shahar Y (1997) A framework for knowledge-based temporal abstraction. Elsevier Artif Intell 90(1–2):79–133
Tajer A, Veeravalli VV, Poor HV (2014) Outlying sequence detection in large data sets: a data-driven approach. IEEE Signal Process Mag 31(5):44–56
Bhandarkar SM, Arabnia HR (1995) The REFINE multiprocessor: theoretical properties and algorithms. Elsevier Parall Comput 21(11):1783–1806
Bhandarkar SM, Arabnia HR (1995) The Hough transform on a reconfigurable multi-ring network. J Parall Distrib Comput 24(1):107–114
Arabnia HR, Bhandarkar SM (1996) Parallel stereocorrelation on a reconfigurable multi-ring network. J Supercomput 10(3):243–270
Vafopoulos M, Meimaris M, Anagnostopoulos I, Papantoniou A, Xidias I, Alexiou G, Vafeiadis G, Klonaras M, Loumos V (2015) Public spending as LOD: the case of Greece. Seman Web Interoperabil Usabil Applicabil Seman Web 6(2):155–164
Ekbia H, Mattioli M, Kouper I, Arave G, Ghazinejad A, Bowman T, Suri VR, Tsou A, Weingart S, Sugimoto CR (2014) Big data, bigger dilemmas: a critical review. J Assoc Inf Sci Technol. Wiley, New York
Smith M, Szongott C, Henne B, von Voigt G (2012) Big data privacy issues in public social media. In: 6th IEEE international conference on digital ecosystems technologies (DEST), pp 1–6
Zhang X, Dou W, Pei J, Nepal S, Yang C, Liu C, Chen J (2015) Proximity-aware local-recoding anonymization with mapreduce for scalable big data privacy preservation in cloud. IEEE Trans Comput 64(8):2293–2307
Acknowledgments
We express our gratitude to Emna Mezghani for her contributions to this work. The authors thank the anonymous reviewers for their valuable comments and suggestions.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Anagnostopoulos, I., Zeadally, S. & Exposito, E. Handling big data: research challenges and future directions. J Supercomput 72, 1494–1516 (2016). https://doi.org/10.1007/s11227-016-1677-z
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-016-1677-z