Grey Fault Detection Method Based on Context Knowledge Graph in Container Cloud Storage

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1042)


Research on resource scheduling in container cloud storage clusters mainly considers how to schedule resources according to load changes and how to migrate according to resource conditions. These activities cause frequent changes in the context and in the application's operating environment. As a result, faults are difficult to locate, especially grey faults, which affect the operation of the applications in the containers. To ensure the normal operation of applications, a grey fault detection method is proposed. It studies how context changes alter the application's attention features and builds a knowledge graph that relates context changes to grey faults. The method introduces a temporal and spatial snapshot group architecture to handle the large number of situational temporal queries caused by the overly large structure of the knowledge graph. The method is validated on a container cluster project and on the Google open source dataset; it effectively detects grey fault scenarios, and the accuracy rate is improved by more than 90%.
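The abstract does not describe how the snapshot group architecture is implemented. As a rough illustration of the general idea only, the sketch below stores a time-varying set of graph edges as periodic full snapshots plus short lists of changes, so that a temporal query replays at most one group of changes instead of the whole history. The class name `SnapshotGroupGraph`, the `group_size` parameter, the edge triples, and the example node names are all assumptions made for this sketch and do not come from the paper.

```python
from bisect import bisect_right


class SnapshotGroupGraph:
    """Temporal edge set kept as periodic full snapshots plus per-group deltas.

    Hypothetical illustration only; not the paper's data structure.
    """

    def __init__(self, group_size=2):
        self.group_size = group_size  # assumed: max deltas per snapshot group
        self.snap_times = []          # time of the first event in each group
        self.snapshots = []           # edge set just before that first event
        self.deltas = []              # per group: list of (time, op, edge)
        self.current = set()          # latest state of the graph
        self.last_time = None

    def apply(self, time, op, edge):
        """Record an 'add' or 'remove' of a (subject, relation, object) edge."""
        if op not in ("add", "remove"):
            raise ValueError(f"unknown op: {op!r}")
        if self.last_time is not None and time < self.last_time:
            raise ValueError("events must arrive in non-decreasing time order")
        # Open a new group (with a full snapshot) once the last one is full.
        if not self.deltas or len(self.deltas[-1]) >= self.group_size:
            self.snap_times.append(time)
            self.snapshots.append(frozenset(self.current))
            self.deltas.append([])
        self.deltas[-1].append((time, op, edge))
        self._apply_to(self.current, op, edge)
        self.last_time = time

    @staticmethod
    def _apply_to(edges, op, edge):
        if op == "add":
            edges.add(edge)
        else:
            edges.discard(edge)

    def edges_at(self, time):
        """Return the edge set as of `time` (inclusive)."""
        # Latest group whose base snapshot was taken at or before `time`.
        i = bisect_right(self.snap_times, time) - 1
        if i < 0:
            return set()
        edges = set(self.snapshots[i])
        # Replay only this group's deltas, stopping once past `time`.
        for t, op, edge in self.deltas[i]:
            if t > time:
                break
            self._apply_to(edges, op, edge)
        return edges


# Usage: a container migrates between nodes, changing its context over time.
graph = SnapshotGroupGraph(group_size=2)
on_node_1 = ("pod-a", "runs_on", "node-1")
on_node_2 = ("pod-a", "runs_on", "node-2")
mounts_vol = ("pod-a", "mounts", "vol-7")

graph.apply(1, "add", on_node_1)
graph.apply(2, "add", mounts_vol)
graph.apply(5, "remove", on_node_1)  # migration away from node-1
graph.apply(5, "add", on_node_2)
graph.apply(9, "remove", mounts_vol)

assert graph.edges_at(4) == {on_node_1, mounts_vol}
assert graph.edges_at(5) == {on_node_2, mounts_vol}
```

A smaller `group_size` shortens the replay per query at the cost of storing more full snapshots; the right balance would depend on how often the context changes.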


Keywords: Fault detection · Context · Grey failure · Cloud storage · Knowledge graph



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. School of Computer and Electronic Information, Guangxi University, Nanning, China
  2. Guangxi Key Laboratory of Multimedia Communications and Network Technology, Nanning, China
