I was recently invited to contribute to the Amazon Builders' Library. One article I'd been wanting to publish is about how bizarre distributed systems are and what, in my experience, has been the biggest challenge in building them.
Developing distributed utility computing services, such as reliable long-distance telephone networks or Amazon Web Services (AWS) services, is hard. Distributed computing is also weirder and less intuitive than other forms of computing because of two interrelated problems: independent failures and nondeterminism. Together they cause the most impactful issues in distributed systems. In addition to the typical computing failures most engineers are used to, failures in distributed systems can occur in many other ways. What’s worse, it’s not always possible to know whether something failed. This article reviews the concepts that make distributed computing so, well, weird.