Assured Cloud-Based Data Analysis with ClusterBFT

  • Julian James Stephen
  • Patrick Eugster
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8275)

Abstract

The shift to cloud technologies is a paradigm change that offers considerable financial and administrative gains. However, governmental and business institutions wanting to tap into these gains are concerned about security. The cloud presents new vulnerabilities and is dominated by new kinds of applications, both of which call for new security solutions.

Intuitively, Byzantine fault tolerant (BFT) replication has many benefits for enforcing integrity and availability in clouds. Existing BFT systems, however, are not suited to typical “data-flow processing” cloud applications, which analyze large amounts of data in a parallelizable manner. First, existing BFT solutions focus on replicating single monolithic servers, whereas data-flow applications consist of several different stages, each of which may give rise to multiple components at runtime to exploit cheap hardware parallelism. Second, BFT replication hinges on comparing the redundant outputs generated, which in the case of data-flow processing can represent huge amounts of data. In fact, current limits of data processing directly depend on the amount of data that can be processed per time unit.

In this paper we present ClusterBFT, a system that secures computations run in the cloud by leveraging BFT replication coupled with fault isolation. In short, ClusterBFT combines variable-degree clustering, approximated and offline output comparison, smart deployment, and separation of duty to achieve a parameterized tradeoff between fault tolerance and overhead in practice. We demonstrate the low overhead achieved with ClusterBFT when securing data-flow computations expressed in Apache Pig and Hadoop. Our evaluation shows that our solution allows assured computation with less than 10 percent latency overhead.
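
To make the cost of output comparison concrete, the following is a minimal sketch of comparing redundant outputs by digest instead of by full content. It is an illustration only and not ClusterBFT's actual mechanism: the function names `digest` and `agree`, the use of SHA-256, and the assumption that every correct replica emits its records in the same deterministic order are all hypothetical choices made for this example. The one general fact it relies on is that, with at most f faulty replicas, any value reported by at least f + 1 replicas was produced by at least one correct replica.

```python
import hashlib
from collections import Counter


def digest(records):
    """Hash a sequence of output records into one fixed-size digest.

    Assumption for this sketch: records are strings and every correct
    replica emits them in the same deterministic order.
    """
    h = hashlib.sha256()
    for record in records:
        data = record.encode("utf-8")
        # Length-prefix each record so ["ab", "c"] and ["a", "bc"] differ.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def agree(replica_outputs, f):
    """Return the digest reported by at least f + 1 replicas, else None.

    f is the assumed maximum number of faulty replicas.
    """
    if not replica_outputs:
        return None
    counts = Counter(digest(out) for out in replica_outputs)
    value, votes = counts.most_common(1)[0]
    return value if votes >= f + 1 else None
```

With three replicas and f = 1, two matching outputs are enough to accept a result, and only the short digests need to be exchanged and compared, not the full outputs.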

Keywords

Cloud Byzantine fault replication integrity data analysis 

Copyright information

© IFIP International Federation for Information Processing 2013

Authors and Affiliations

  • Julian James Stephen (1)
  • Patrick Eugster (1)
  1. Purdue University, USA
