Study on Replica Strategy Based on Access Pattern Mining in Smart City Cloud Storage System

Article
  • 4 Downloads

Abstract

The replica strategy in traditional distributed file system, which creates a copy mainly from the perspective of internal resources while changes in external demand are ignored. However, this strategy is not suitable for deployment in a service-based, resource-rich internal storage “smart city” in cloud storage center. This paper proposes a replica strategy, which combines data security (the minimum amount of copies) together with service needs (best copy volume). The strategy predicts file popularity based on access pattern mining algorithms. What’s more, the number of copies of the cloud adjusts itself dynamically according to the popularity of file and system resources. Mining algorithm is based on the analysis of the characteristics of spatio-temporal data in smart cities. The algorithm first maps the historical user access request to the spatio-temporal attribute domain. Then according to the geographical area grid and association rules, the correlation analysis and evolution rule identification of access requests are carried out in the domain of spatio-temporal attributes. Finally dig out the user access mode and predict the user’s access request, calculate the file popularity according to the request. The simulation results show that the popularity of the file calculated by the access pattern mining algorithm in this paper is simple and efficient, and the prediction accuracy of the popularity can reach 84%. The dynamic replica mechanism based on popularity has a significant advantage in coping with sudden large-scale concurrent accesses. Meanwhile, compared with the conventional dynamic replicas based on access frequency, the proposed strategy consumes less storage resources.

Keywords

Access pattern mining Saptio-temporal data Dynamical replica Prediction Smart city 

Notes

Acknowledgements

This work was supported by the Natural Science Fund of Hubei Province (2018, research on small file merging strategy for massive spatio-temporal data in smart city), The Doctoral Scientific Fund Project of Huanggang Normal University (Grant No. 2013031103). The humanities and social science research project of the Ministry of Education, special project of science and technology personnel research project (No: 13JDGC020); Hubei Provincial Higher Education Research Project (No: 2012376).

References

  1. 1.
    Li, J., Chen, S., & WU, C.-z. (2006). Model of data replication strategy based on security in grid. Computer Applications (Chinese), 26(10), 2282–2284.Google Scholar
  2. 2.
    Hou, M.-S., Wang, X.-B., & Lu, X. (2006). A novel dynamic replication management mechanism. Computer Science, 33(9), 50–51.Google Scholar
  3. 3.
    Ranganathan, K., & Foster, I. (2003). Identifying dynamie replieation strategies for a high performanee data grid. In Proceeding of the Seeond International workshop on Grid Computing (pp. 75–86), Denver, November 2003.Google Scholar
  4. 4.
    Tiantian, L., Li Chao, H., & Qingcheng, Z. G. (2011). Multiple-replicas management in the cloud environment. Journal of Computer Research and Development, 48(Supply), 254–260.Google Scholar
  5. 5.
    Allcock, B., Bester, J., Bresnahan, J., et al. (2001). Secure, efficient data transport and replica management for high performance data-intensive computing. In Proceedings of 18th IEEE symposium on the mass storage systems and technologies.Google Scholar
  6. 6.
    Wang, X., Yang, S., & Wang, S. (2010). An application based adaptive replica consistency for cloud storage. In Proceedings of the 9th international conference on grid and cloud computing (pp. 13–17), Piscataway, NJ, IEEE.Google Scholar
  7. 7.
    Carman, M., Zini, F., Serafini, L.,et al. (2002). Towards an economy-based optimisation of file access and replication on a data grid. In Proceedings of 2nd IEEE/ACM international symposium on cluster computing and the grid (CCGrid’2002) (pp. 340–345), Berlin.Google Scholar
  8. 8.
    Allcock, B., Bester, J., & Bresnahan, J., et al. Secure, efficient data transport and eplica management for high performance data-intensive computing. In Proceedings of eighteenth IEEE symposium on the mass storage systems and technologies.Google Scholar
  9. 9.
    Xiong, R., Luo, J., Song, A., & Jin, J. (2001). QoS preference-aware replica selection strategy in cloud computing. Journal on Communications (Chinese), 32(7), 93–102.Google Scholar
  10. 10.
    Wang, X., Yang, S., & Wang, S. (2010). An application-based adaptive replica consistency for cloud storage. In Proceedings of the 9th international conference on grid and cloud computing (pp. 13–17), Piscataway, NJ. IEEE.Google Scholar
  11. 11.
    Pallis, G., Vakali, A., & Pokorny, J. (2008). A clustering-based prefetching scheme on a Web cache environment. Computers & Electrical Engineering, 34(4), 309–323.CrossRefMATHGoogle Scholar
  12. 12.
    Wan, M., Jönsson, A., Wang, C., et al. (2011). Web user clustering and Web prefetching using Random Indexing with weight functions. Knowledge and Information Systems, 33(1), 89–115.CrossRefGoogle Scholar
  13. 13.
    Cadez, I., Heckerman, D., Meek, C., et al. (2003). Model-based clustering and visualization of navigation patterns on a web site. Data Mining and Knowledge Discovery, 7(4), 399–424.MathSciNetCrossRefGoogle Scholar
  14. 14.
    Perkowitz, M., & Etzioni, O. (1998). Adaptive web sites: Automatically synthesizing web pages. In Proceeding of AAAI-98 (American Association for Artificial Intelligence), (pp. 727–732).Google Scholar
  15. 15.
    Perkowitz, M., & Etzioni, O. (2000). Adaptive Web sites. Communications of the ACM, 43(10), 152–158.CrossRefMATHGoogle Scholar
  16. 16.
    Mobasher, B., Dai, H., Luo, T., et al. (2001). Effective personalization based on association rule discovery from web usage data. In Proceedings of international workshop on web information & data management.Google Scholar
  17. 17.
    Matthews, S. G., Gongora, M. A., & Hopgood A. A., et al. (2012). Temporal fuzzy association rule mining with 2-tuple linguistic representation. In IEEE international conference on fuzzy systems (pp. 1–8).Google Scholar
  18. 18.
    Matthews, S. G., Gongora, M. A., Hopgood, A. A., et al. (2013). Web usage mining with evolutionary extraction of temporal fuzzy association rules. Knowledge-Based Systems, 54(4), 66–72.CrossRefGoogle Scholar
  19. 19.
    Khosravi, M., & Tarokh, M. J. (2010). Dynamic mining of users interest navigation patterns using naive Bayesian method. In: IEEE international conference on intelligent computer communication and processing (pp. 119–122).Google Scholar
  20. 20.
    Jalali, M., Mustapha, N., Mamat, A., et al. (2008). Web user navigation pattern mining approach based on graph partitioning algorithm. Journal of Theoretical & Applied Information Technology, 33(11), 49–56.Google Scholar
  21. 21.
    Shahabi, C., & Banaei-Kashani, F. (2001). A framework for efficient and anonymous web usage mining based on client-side tracking. Lecture Notes in Computer Science, 2356, 113–144.CrossRefMATHGoogle Scholar
  22. 22.
    Mobasher, B. (2007). Data mining for web personalization. In P. Brusilovsky, A. Kobsa, & W. Nejdl (Eds.), The adaptive web (pp. 90–135). Berlin: Springer.CrossRefGoogle Scholar
  23. 23.
    Joshi, A., & Krishnapuram, R. (2000). On mining web access logs. In ACM SIGMOD workshop on research issues in data mining & knowledge discovery (pp. 63–69).Google Scholar
  24. 24.
    Shrivastava, M. V., & Gupta, M. N. (2013). Performance improvement of web usage mining by using learning based k-mean clustering. International Journal of Computer Science and Its Applications, 31(4), 2250–3765.Google Scholar
  25. 25.
    Wang, T. Z. (2012). The development of web log mining based on improve-K-means clustering analysis. In D. Jin & S. Lin (Eds.), Advances in computer science and information engineering (pp. 613–618). Berlin: Springer.CrossRefGoogle Scholar
  26. 26.
    Calheiros, R. N., Ranjan, R., et al. (2011). CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Software: Practice and experience, 41(1), 23–50.Google Scholar
  27. 27.
    Che, H., Wang, Z., & Tung, Y. (2001). Analysis and design of hierarchical web caching systems. In Proceedings of the 20th annual joint conference of the IEEE computer and communications societies (INFOCOM 2001) (pp. 1416–1424). Anchorage: IEEE Computer Society.Google Scholar
  28. 28.
    Tang, X., & Chanson, S. T. (2003). Coordinated management of cascaded caches for efficient content distribution. In Proceedings of the 19th international conference on data engineering (ICDE 2003) (pp. 37–48). Bangalore: IEEE Computer Society.Google Scholar
  29. 29.
    Tang, X., & Chanson, S. T. (2002). Coordinated en-route web caching. IEEE Transactions on Computers, 51(6), 595–607.CrossRefGoogle Scholar
  30. 30.
    Liu, X., Zhihua, H., & Pan, S. (2016). Control strategy for the number of replica in smart city cloud stroage system. Geomatics and Information Science of Wuhan University (Chinese), 41(9), 1205–1210.Google Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. 1.School of TransportationHuanggang Normal UniversityHuanggangChina
  2. 2.School of Communication and Information TechnologyChongqing University of Posts and TelecommunicationsChongqingChina

Personalised recommendations