Abstract
In this study, we provide an overview of the state-of-the-art technologies for programming, computing, and storage in the massive data analytics landscape. We shed light on the different types of analytics that can be performed on massive data. To that end, we first provide a detailed taxonomy of analytic types, with examples of each. Next, we highlight technology trends in massive data analytics that are available to corporations, government agencies, and researchers. In addition, we enumerate several opportunities for turning massive data into knowledge. We describe and position two distinct case studies of massive data analytics that are being investigated in our research group: recommendation systems in e-commerce applications, and link discovery to predict unknown associations between medical concepts. Finally, we discuss the lessons we have learnt and the open challenges faced by researchers and businesses in the field of massive data analytics.
Copyright information
© 2016 Springer India
About this chapter
Cite this chapter
Pusala, M.K., Amini Salehi, M., Katukuri, J.R., Xie, Y., Raghavan, V. (2016). Massive Data Analysis: Tasks, Tools, Applications, and Challenges. In: Pyne, S., Rao, B., Rao, S. (eds) Big Data Analytics. Springer, New Delhi. https://doi.org/10.1007/978-81-322-3628-3_2
DOI: https://doi.org/10.1007/978-81-322-3628-3_2
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-3626-9
Online ISBN: 978-81-322-3628-3
eBook Packages: Mathematics and Statistics (R0)