A Perceptually Important Points Approach Based on Imputation Clustering with Weighted Distance Techniques for Big Data Reduction in Internet of Things Cloud

Neural Processing Letters

Abstract

IoT sensing devices tend to generate large volumes of data samples consisting of relevant and irrelevant sensed data records. Irrelevant data points are regarded as redundant data, which are eliminated from the data samples so that only relevant records remain for onward processing. Several studies have made significant efforts to detect and eliminate redundant sensing data points with the support of dimensionality reduction techniques. These techniques mainly remove redundancy by comparing the similarity between the features of a given sensed dataset, without considering the data records or points themselves. The Perceptually Important Point (PIP) technique does consider the sensed data records when eliminating redundancy, but it has proved ineffective because, owing to missing data, it eliminates relevant sensed data along with the redundant ones. Therefore, this research proposes K-means imputation clustering combined with a Cosine and Manhattan Weighted Distance Measure technique to recover missing data and thereby improve the performance of the PIP technique. Simulations are conducted on five benchmark datasets for the elimination of redundant sensed data records. Experimental results show that the proposed model outperforms the existing PIP technique, with up to 99.967% and 99.614% accuracy and an execution time of 1234 s, before and after the elimination of redundant sensed data records on the given datasets.
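The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact weighting scheme, so the convex mix `w * cosine + (1 - w) * manhattan`, the function names, the use of `None` for missing values, and the assumption that cluster centroids are already available (for example from a K-means run on complete records) are all illustrative assumptions. The PIP routine uses the common vertical-distance variant, which may differ from the variant used in the paper.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity (1.0 if either vector has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def manhattan_distance(a, b):
    """Return the sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def weighted_distance(a, b, w=0.5):
    """Assumed combination (not from the paper): convex mix of both measures."""
    return w * cosine_distance(a, b) + (1.0 - w) * manhattan_distance(a, b)


def impute_from_centroids(record, centroids, w=0.5):
    """Fill None entries of `record` with values from the nearest centroid.

    Nearness is measured only on the dimensions observed in `record`.
    """
    observed = [i for i, v in enumerate(record) if v is not None]
    if not observed:
        raise ValueError("record has no observed values")
    seen = [record[i] for i in observed]
    nearest = min(
        centroids,
        key=lambda c: weighted_distance(seen, [c[i] for i in observed], w),
    )
    return [nearest[i] if v is None else v for i, v in enumerate(record)]


def pip_indices(series, k):
    """Return sorted indices of k perceptually important points.

    Starts from the two end points and repeatedly adds the point with the
    largest vertical distance to the line joining its neighbouring PIPs.
    """
    n = len(series)
    if n < 2 or k < 2:
        raise ValueError("need at least two points and k >= 2")
    k = min(k, n)
    selected = [0, n - 1]
    while len(selected) < k:
        best_i, best_d = None, -1.0
        for left, right in zip(selected, selected[1:]):
            for i in range(left + 1, right):
                slope = (series[right] - series[left]) / (right - left)
                on_line = series[left] + slope * (i - left)
                d = abs(series[i] - on_line)
                if d > best_d:
                    best_i, best_d = i, d
        selected.append(best_i)
        selected.sort()
    return selected


# Hypothetical usage: impute a record first, then reduce a series with PIP.
centroids = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
filled = impute_from_centroids([1.1, None, 2.9], centroids)
kept = pip_indices([0.0, 1.0, 5.0, 1.0, 0.0], 3)
```

In this sketch, imputation runs before PIP selection, which mirrors the order suggested by the abstract: missing values are recovered first so that the PIP step is less likely to discard relevant records.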


Acknowledgements

The authors appreciate the support of the Research Management Centre (RMC), Universiti Teknologi Malaysia, through research grant Q.J130000.2451.07G48. The grant was also contributed to collaboratively by the Research Management Centre of Universiti Tun Hussein Onn Malaysia Malaka (PJP/2018/FTMK-CACT/CRG/S01649). We would like to express our sincere thanks to all research scholars from the three universities who devoted their time and knowledge to the completion of this research project.

Author information

Corresponding author

Correspondence to Efetobor Abel Edje.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Edje, E.A., Shaffie, A.L.M. & Howe, C.W. A Perceptually Important Points Approach Based on Imputation Clustering with Weighted Distance Techniques for Big Data Reduction in Internet of Things Cloud. Neural Process Lett 55, 709–734 (2023). https://doi.org/10.1007/s11063-022-10905-7

  • DOI: https://doi.org/10.1007/s11063-022-10905-7
