Summary. Google’s Ads Data Infrastructure systems run the multi-billion-dollar ads business at Google. High availability and strong consistency are critical for these systems. While most distributed systems handle machine-level failures well, handling datacenter-level failures is less common. In our experience, handling datacenter-level failures is critical for running true high availability systems. Most of our systems (e.g., Photon, F1, Mesa) now support multi-homing as a fundamental design property. Multi-homed systems run live in multiple datacenters all the time, adaptively moving load between datacenters, with the ability to handle outages of any scale completely transparently.

This paper focuses primarily on stream processing systems. It describes our general approaches for building high availability multi-homed systems, discusses common challenges and solutions, and shares what we have learned in building and running these large-scale systems for over ten years.

Ashish Gupta and Jeff Shute / Google Inc. {agupta,jshute}@google.com