Abstract
We have introduced several key technologies related to big data, namely cloud computing, IoT, data centers, and Hadoop. Next, we focus on the value chain of big data, which can be broadly divided into four phases: data generation, data acquisition, data storage, and data analysis. If we treat data as a raw material, data generation and data acquisition constitute the exploitation process, data storage is the storage process, and data analysis is the production process that uses the raw material to create new value.
Copyright information
© 2014 The Author(s)
About this chapter
Cite this chapter
Chen, M., Mao, S., Zhang, Y., Leung, V.C.M. (2014). Big Data Generation and Acquisition. In: Big Data. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-06245-7_3
DOI: https://doi.org/10.1007/978-3-319-06245-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06244-0
Online ISBN: 978-3-319-06245-7
eBook Packages: Computer Science (R0)