Big Data Tools and Platforms

Abstract

The fast-evolving Big Data Tools and Platforms space has given rise to a range of technologies for handling different Big Data use cases. However, because so many tools and platforms are involved, Big Data practitioners often find it difficult to understand and select the right tools for a given business problem. This chapter introduces the various Big Data Tools and Platforms, with the aim of giving practitioners enough breadth and depth to support the Big Data initiatives in their organizations. We start with the common technical concepts and patterns typically used by the core Big Data Tools and Platforms. We then examine in detail the individual characteristics of the different categories of Big Data Tools and Platforms, and cover the applicability of these categories to various enterprise-level Big Data use cases. Finally, we discuss ongoing work in this space, covering the newer patterns, tools, and platforms to watch when implementing Big Data use cases.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. IBM Analytics, San Francisco, USA