Modern streaming data pipelines ingest and process events from multiple sources to generate meaningful insights in real time. These events typically need to be enriched, aggregated, or restructured into a semi-structured form in a data warehouse before being consumed and displayed in a final product.
However, what happens if there are late events, out-of-order events, or a networking outage? How can we ensure that the streaming data pipeline recovers and still produces correct results? This is where Apache Flink comes in. Apache Flink has robust event-time semantics built into its processing engine, which enable it to handle late data, process a backlog, or re-process data while consistently producing correct results. Additionally, Apache Flink offers a unified batch and streaming API, so the same job can easily be switched from streaming to batch mode if a backlog needs to be processed. Moreover, it runs on a distributed cluster, so scaling out is straightforward!
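The event-time idea mentioned above can be illustrated with a simplified, non-Flink sketch: a watermark trails the maximum timestamp seen by a bounded out-of-orderness margin, windows fire only once the watermark passes their end, and out-of-order events that arrive before the watermark closes their window are still counted correctly. All names and parameters here (window size, out-of-orderness bound, the `process` helper) are invented for illustration and are not Flink API calls.

```python
from dataclasses import dataclass
from collections import defaultdict

# Illustrative constants, not Flink configuration.
WINDOW_SIZE = 10       # tumbling-window length in event-time units
MAX_OUT_OF_ORDER = 5   # bounded out-of-orderness used to derive the watermark
ALLOWED_LATENESS = 0   # extra grace period after the watermark passes a window

@dataclass
class Event:
    key: str
    timestamp: int  # event time, not arrival time
    value: int

def process(events):
    """Sum values per (key, tumbling window), honoring event time."""
    windows = defaultdict(int)   # (key, window_start) -> running sum
    emitted = {}                 # finalized window results
    watermark = float("-inf")
    for e in events:
        # Watermark trails the max timestamp seen, by the out-of-orderness bound.
        watermark = max(watermark, e.timestamp - MAX_OUT_OF_ORDER)
        window_start = (e.timestamp // WINDOW_SIZE) * WINDOW_SIZE
        window_end = window_start + WINDOW_SIZE
        if window_end + ALLOWED_LATENESS <= watermark:
            continue  # too late: this event's window was already finalized
        windows[(e.key, window_start)] += e.value
        # Fire every window whose end (plus lateness) the watermark has passed.
        for (key, start) in list(windows):
            if start + WINDOW_SIZE + ALLOWED_LATENESS <= watermark:
                emitted[(key, start)] = windows.pop((key, start))
    # End of bounded input: flush whatever windows remain open.
    for (key, start), total in windows.items():
        emitted[(key, start)] = total
    return emitted
```

For example, an event with timestamp 3 arriving after an event with timestamp 12 is still assigned to the [0, 10) window, because the watermark (12 − 5 = 7) has not yet passed that window's end.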
In this talk, we will build a modern, robust, and scalable data pipeline that powers a real-time dashboard using Apache Flink. We will demonstrate how the pipeline remains flexible and correct in the face of late events, out-of-order events, and network outages. Come along to discover how Apache Flink can simplify your data pipelines and make them more robust!
Hong Liang Teoh
Hong is a Software Development Engineer working on the Managed Service for Apache Flink at AWS. The team provides a fully managed platform for Apache Flink jobs. In his free time, he enjoys contributing to Apache Flink and is an Apache Flink Committer.