Big Data Generation and Acquisition

Chapter in: Big Data

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

We have introduced several key technologies related to big data, namely cloud computing, IoT, data centers, and Hadoop. Next, we focus on the value chain of big data, which can be broadly divided into four phases: data generation, data acquisition, data storage, and data analysis. If we regard data as a raw material, data generation and data acquisition form the exploitation process, data storage is the storage process, and data analysis is the production process that uses the raw material to create new value.

Copyright information

© 2014 The Author(s)

About this chapter

Cite this chapter

Chen, M., Mao, S., Zhang, Y., Leung, V.C.M. (2014). Big Data Generation and Acquisition. In: Big Data. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-06245-7_3

  • DOI: https://doi.org/10.1007/978-3-319-06245-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06244-0

  • Online ISBN: 978-3-319-06245-7

  • eBook Packages: Computer Science, Computer Science (R0)
