Downes.ca ~ Stephen's Web ~ Operating a Large, Distributed System in a Reliable Way: Practices I Learned

Stephen Downes

Knowledge, Learning, Community

This is a long and detailed look at the challenges of running a distributed system, told from the perspective of an insider at Uber. As the author notes, "the practices might be an overkill for smaller or less mission-critical systems." But there's no harm in knowing about them, especially given the outside chance that what you're building might suddenly become the next Uber. And there are some pretty good practices here: failover drills, blameless post-mortems, and black-box testing systems.



Stephen Downes, Casselman, Canada
stephen@downes.ca

Copyright 2024
Last Updated: Nov 23, 2024 5:30 p.m.

Creative Commons License.
