Science of Information
The main objective of this chapter is to provide an overview of the modern field of data science and of some current progress in the field. The overview focuses on two paradigms: (1) the big data paradigm, which describes the problem space for big data analytics, and (2) the machine learning paradigm, which describes the solution space for big data analytics. The chapter also gives a preliminary description of the important elements of data science: the data, the knowledge (also called responses), and the operations. The terms knowledge and responses are used interchangeably in the rest of the book. Preliminary information on the data format, the data types, and classification is also presented. Finally, the chapter emphasizes the importance of collaboration among experts from multiple disciplines and points to some current institutions that support collaborative activities and offer useful resources.
Thanks to the Department of Statistics, University of California, Berkeley; the Center for Science of Information, Purdue University; the Statistical and Applied Mathematical Sciences Institute; and the Institute for Mathematics and its Applications, University of Minnesota, for their support, which contributed to the development of this book.