Hong is a Software Development Engineer working on the Managed Service for Apache Flink at AWS. The team provides a fully managed platform for Apache Flink jobs. In his free time, he enjoys contributing to Apache Flink and is an Apache Flink Committer.
Modern streaming data pipelines ingest and process events from multiple sources to generate meaningful insights in real time. These events typically need to be enriched, aggregated, or restructured into a semi-structured form in a data warehouse before being consumed and displayed in a final product.
However, what happens if there are late events, out-of-order events, or a network outage? How can we ensure that the streaming data pipeline recovers and still produces correct results? This is where Apache Flink comes in. Apache Flink has robust event-time semantics built into the processing engine, which enable it to handle late data, process a backlog, or re-process data while ensuring consistently correct results. Additionally, Apache Flink offers a unified batch and streaming API, so the same job can easily be switched from streaming to batch if a backlog needs to be processed. Moreover, it runs on a distributed cluster, so scaling is easy!
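To give a flavour of what event-time processing looks like in practice, here is a minimal sketch using Flink's Java DataStream API. It assigns bounded-out-of-orderness watermarks and aggregates events in event-time windows with allowed lateness, so late or out-of-order events still land in the correct window. The SensorReading event type, its field names, and the specific lateness bounds are assumptions made for the example, not details taken from the talk.

import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class EventTimeExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical in-memory source of (sensorId, value, eventTimestampMillis) events.
        // In a real pipeline this would be a Kafka, Kinesis, or file source.
        DataStream<SensorReading> readings = env.fromElements(
                new SensorReading("sensor-1", 10.0, 1_000L),
                new SensorReading("sensor-1", 12.0, 61_000L));

        readings
            // Bounded-out-of-orderness watermarks: tolerate events up to 5 seconds out of order.
            .assignTimestampsAndWatermarks(
                WatermarkStrategy
                    .<SensorReading>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withTimestampAssigner((event, ts) -> event.timestampMillis))
            .keyBy(event -> event.sensorId)
            // One-minute tumbling windows driven by event time, not arrival time.
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))
            // Keep windows around for an extra minute so late events still update results.
            .allowedLateness(Time.minutes(1))
            .sum("value")
            .print();

        env.execute("Event-time windowed aggregation");
    }

    /** Simple POJO event; the field names are illustrative. */
    public static class SensorReading {
        public String sensorId;
        public double value;
        public long timestampMillis;

        public SensorReading() {}

        public SensorReading(String sensorId, double value, long timestampMillis) {
            this.sensorId = sensorId;
            this.value = value;
            this.timestampMillis = timestampMillis;
        }
    }
}

Because the windows are defined on event time, the same job can be pointed at a historical backlog and re-run in batch mode, and it will produce the same windowed results as the streaming run.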
In this talk, we will build a modern, robust, and scalable data pipeline that powers a real-time dashboard using Apache Flink. We will demonstrate how the data pipeline remains flexible and correct even in the face of late events, out-of-order events, or network outages. Come along to discover how Apache Flink can simplify your data pipelines and make them more robust!