Session abstract:
There has been so much noise surrounding advances in analytical systems lately that Online Transaction Processing (OLTP) problems may seem a bit overshadowed. Big data, streaming analytics, machine learning, and even deep learning have all changed the way businesses are run and problems are solved. Meanwhile, problems that require low latency, high write throughput and some consistency guarantees haven’t gone away.
Two of the biggest “new” ideas for operational workloads are:
- Command Query Responsibility Segregation suggests splitting operations into parallel streams of queries and modifications.
- Systems like Samza and Kafka Streams propose flipping the persistence mechanism around, using logs to store events and maintaining materializations on top of the logs. This idea is best introduced here.
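To make the two ideas concrete, here is a minimal in-memory sketch, not how Samza or Kafka Streams actually implement them. The write side only appends events to a log, and the read side answers queries from a materialized view that it rebuilds by replaying the log. The names `Event`, `EventLog`, and `BalanceView` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event: a signed change to an account balance."""
    account: str
    amount: int  # positive = deposit, negative = withdrawal


class EventLog:
    """Command side: an append-only log is the system of record."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the appended event

    def read_from(self, offset):
        return self._events[offset:]


class BalanceView:
    """Query side: a materialization derived entirely from the log."""

    def __init__(self, log):
        self._log = log
        self._offset = 0      # next log offset this view has not yet applied
        self._balances = {}

    def catch_up(self):
        # Apply every event appended since the last catch-up.
        for event in self._log.read_from(self._offset):
            current = self._balances.get(event.account, 0)
            self._balances[event.account] = current + event.amount
            self._offset += 1

    def balance(self, account):
        return self._balances.get(account, 0)


log = EventLog()
view = BalanceView(log)

log.append(Event("alice", 100))
log.append(Event("alice", -30))
log.append(Event("bob", 50))

stale = view.balance("alice")   # the view has not read the log yet
view.catch_up()
fresh = view.balance("alice")   # the view now reflects all three events
```

In this sketch the view is asynchronous by construction: a query can return a stale answer until `catch_up` runs, and a brand-new view can be rebuilt at any time by replaying the log from offset 0.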
While these approaches blur the line between stream processing and traditional operational databases, the fuzziness is coming to even the stodgiest systems. We now speak of “events” more than “transactions”, even if the transaction hasn’t gone away. Systems are becoming more parallel and increasingly asynchronous.
This talk will do something a little weird: compare streaming systems like Kafka, Flink, Dataflow, and Kinesis with traditional databases like Postgres and Oracle, or newfangled databases like Cloud Spanner, HBase, MySQL Galera Cluster, or VoltDB. It’s going to be fun.
We’ll take example operational problems and show how they would be solved in a database-first environment. Then we’ll flip them around and show how they might be expressed in a streaming-first environment (with or without a stateful database involved). What are the pros and cons? How will I know which approach is best?
This talk will contain clear explanations. Sometimes there are also clear answers, and this talk will help you identify those. Other times, the decision will not be obvious, and you’ll have to know what questions to start asking. Good thing you came to this talk!