Mobile Networks and Applications, Volume 19, Issue 2, pp 171–209

Big Data: A Survey

Article

Abstract

In this paper, we review the background and state of the art of big data. We first introduce the general background of big data and review related technologies, such as cloud computing, Internet of Things, data centers, and Hadoop. We then focus on the four phases of the value chain of big data, i.e., data generation, data acquisition, data storage, and data analysis. For each phase, we introduce the general background, discuss the technical challenges, and review the latest advances. We finally examine several representative applications of big data, including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid. These discussions aim to give readers a comprehensive overview and a big picture of this exciting area. The survey concludes with a discussion of open problems and future directions.

Keywords

Big data, Cloud computing, Internet of things, Data center, Hadoop, Smart grid, Big data analysis

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
  2. Department of Electrical & Computer Engineering, Auburn University, Auburn, USA
  3. TNLIST, School of Software, Tsinghua University, Beijing, China
