Abstract
The term Smart Data refers to the challenge of transforming raw data into quality data that can be exploited to obtain valuable insights. Big Data is characterized by five properties: volume, velocity, variety, veracity, and value. The idea of Smart Data is to separate the physical properties of the data (volume, velocity, and variety) from its value and veracity. This transformation is the key to moving from Big Data to Smart Data. Without value and veracity, Big Data becomes an accumulation of raw data from which knowledge cannot readily be extracted. Smart Data discovery is therefore tasked with extracting useful information from data in the form of a subset (big or not) that possesses enough quality for a successful data mining process. The impact of Smart Data discovery in industry and academia is twofold: higher-quality data mining and reduced data storage costs. In this chapter, we first give an overview of the state of Smart Data. Next, we discuss how to move from Big Data to Smart Data. We finish with an introduction to Smart Data and its relation to the Internet of Things.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Luengo, J., García-Gil, D., Ramírez-Gallego, S., García, S., Herrera, F. (2020). Smart Data. In: Big Data Preprocessing. Springer, Cham. https://doi.org/10.1007/978-3-030-39105-8_3
DOI: https://doi.org/10.1007/978-3-030-39105-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39104-1
Online ISBN: 978-3-030-39105-8
eBook Packages: Computer Science (R0)