Emergent models, frameworks, and hardware technologies for Big data analytics

  • Sven Groppe


Today’s state-of-the-art Big data analytics engines handle massive amounts of data, but they will reach their limits, as the flood of Big data is predicted to keep growing at an increasing rate. Hence, we need to think about the next development phase and the future features of Big data analytics engines. In this paper, we discuss possible future enhancements in the area of Big data analytics, with a focus on emergent models, frameworks, and hardware technologies. We point out a selection of new challenges and open research questions.


Big data, Computer architectures, FPGA, GPU, Cloud Computing, Fog Computing, Dew Computing, Semantic Web



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Institute of Information Systems, University of Lübeck, Lübeck, Germany
