Smart Data

Big Data Preprocessing

Abstract

The term Smart Data refers to the challenge of transforming raw data into quality data that can be appropriately exploited to obtain valuable insights. Big Data is characterized by volume, velocity, variety, veracity, and value. The idea of Smart Data is to separate the physical properties of the data (volume, velocity, and variety) from its value and veracity. This transformation is the key to moving from Big to Smart Data. Without value and veracity, Big Data becomes an accumulation of raw data from which knowledge cannot be extracted. Therefore, Smart Data discovery is tasked with extracting useful information from data, in the form of a subset (big or not) that possesses enough quality for a successful data mining process. The impact of Smart Data discovery in industry and academia is two-fold: higher quality data mining and reduced data storage costs. In this chapter we give an insight into the state of Smart Data. Next, we provide a discussion on how to move from Big to Smart Data. We finish with an introduction to Smart Data and its relation to the Internet of Things.




Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Luengo, J., García-Gil, D., Ramírez-Gallego, S., García, S., Herrera, F. (2020). Smart Data. In: Big Data Preprocessing. Springer, Cham. https://doi.org/10.1007/978-3-030-39105-8_3

  • DOI: https://doi.org/10.1007/978-3-030-39105-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-39104-1

  • Online ISBN: 978-3-030-39105-8

  • eBook Packages: Computer Science, Computer Science (R0)
