Ubiq: A Scalable and Fault-Tolerant Log Processing Infrastructure

  • Venkatesh Basker
  • Manish Bhatia
  • Vinny Ganeshan
  • Ashish Gupta
  • Shan He
  • Scott Holzer
  • Haifeng Jiang
  • Monica Chawathe Lenart
  • Navin Melville
  • Tianhao Qiu
  • Namit Sikka
  • Manpreet Singh (email author)
  • Alexander Smolyanov
  • Yuri Vasilevski
  • Shivakumar Venkataraman
  • Divyakant Agrawal
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 337)

Abstract

Most of today’s Internet applications generate vast amounts of data (typically, in the form of event logs) that needs to be processed and analyzed for detailed reporting, enhancing user experience and increasing monetization. In this paper, we describe the architecture of Ubiq, a geographically distributed framework for processing continuously growing log files in real time with high scalability, high availability and low latency. The Ubiq framework fully tolerates infrastructure degradation and data center-level outages without any manual intervention. It also guarantees exactly-once semantics for application pipelines to process logs as a collection of multiple events. Ubiq has been in production for Google’s advertising system for many years and has served as a critical log processing framework for several dozen pipelines. Our production deployment demonstrates linear scalability with machine resources, extremely high availability even with underlying infrastructure failures, and an end-to-end latency of under a minute.

Keywords

Stream processing, Continuous streams, Log processing, Distributed systems, Multi-homing, Fault tolerance, Distributed consensus protocol, Geo-replication

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Venkatesh Basker (1)
  • Manish Bhatia (1)
  • Vinny Ganeshan (1)
  • Ashish Gupta (1)
  • Shan He (1)
  • Scott Holzer (1)
  • Haifeng Jiang (1)
  • Monica Chawathe Lenart (1)
  • Navin Melville (1)
  • Tianhao Qiu (1)
  • Namit Sikka (1)
  • Manpreet Singh (1, email author)
  • Alexander Smolyanov (1)
  • Yuri Vasilevski (1)
  • Shivakumar Venkataraman (1)
  • Divyakant Agrawal (1)

  1. Google Inc., Mountain View, USA
