Session abstract:
Causality is an essential component of how we make sense of the physical world, and of our relations to other humans. If I put a cup on the table, and look back at it, I expect it to be there. I also expect to get a reply to my postcards, after I send them, and not before.
These days hardly any service can claim not to have some form of distributed algorithm at its core. In a distributed scenario, if we are not careful, it is very easy to break the causal sense of things. In a key-value store my writes can be directed to a replica, and my subsequent reads served from an outdated one --- my cup might not be there when I look back. Message dissemination middleware might not always provide the ordering I expect --- I might receive some replies, before their leading questions.
Luckily, most of these problems were already there 30 years ago, although in a much smaller scale, and lots of techniques have been developed to keep track of causality and make sense of the complex interactions in modern systems. However developers often look at techniques such as as replica synchronization with version vectors, or causal broadcasting algorithms, as black boxes; or as complex sets of rules that have to be followed and not questioned.
This talk will focus on bringing back the intuition on causality, and show that keeping in mind some simple concepts, allows to understand how version vectors and vector clocks work, and were they differ, and how to use more sophisticated mechanisms to handle millions of concurrent clients in modern distributed data stores.