Speaker

Frank Munz
Databricks

Frank Munz works at the intersection of product, developer experience, and technical marketing at Databricks, translating complex engineering into compelling narratives, technical insights into product direction, and storytelling into demos that resonate.

Frank has presented at top-tier conferences on every continent except Antarctica (due to its inhospitable climate). His speaking engagements include Devoxx, KubeCon, JavaOne, re:Invent, and the Data + AI Summit. He's built hands-on workshops reaching tens of thousands of practitioners annually and is renowned for demos that showcase innovative and interactive applications of technology.

Frank holds a Ph.D. in Computer Science (summa cum laude) from TU Munich, where he worked on supercomputing for brain research, developing systems that enable better diagnoses for children with epilepsy facing potential surgery. He brings over 25 years of expertise in data, AI, and cloud computing to every session.

At AWS, he built the technical evangelism program for Germany, Austria, and Switzerland, tripling the team's reach. Earlier in his career, he worked as a data scientist with a research group that won a Nobel Prize.

The author of three textbooks used in US universities and 17 scientific articles, Frank was recognized as Cloud Technologist of the Year by Oracle. Expect a session where you'll leave ready to build.

Spark Declarative Pipelines in Action: Live Avionics Streaming from 40,000 Aircraft Overhead
Conference (INTERMEDIATE level)

Apache Spark 4.1.0, released just weeks ago in January 2026, introduces a powerful new capability: Spark Declarative Pipelines (SDP). This declarative framework brings battle-tested ETL patterns to the open-source ecosystem, simplifying the development and maintenance of data pipelines by shifting from an imperative to a declarative mindset. Instead of telling Spark how to transform your data in seemingly endless lines of DataFrame code, you define what datasets should exist in your pipeline, and Spark handles the rest.
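
To give a flavor of the shift, here is a minimal sketch, assuming the pyspark.pipelines decorator API described in the SDP announcement; the dataset and column names are hypothetical:

```python
# Minimal declarative pipeline sketch, assuming the pyspark.pipelines
# API from the Spark 4.1 SDP announcement. Dataset and column names
# are hypothetical; `spark` is assumed to be provided to pipeline
# definition files by the pipeline runner.
from pyspark import pipelines as dp
from pyspark.sql import functions as F

@dp.table
def raw_events():
    # Declare a dataset by returning the DataFrame that defines it.
    return spark.read.json("/data/events/")

@dp.materialized_view
def events_per_day():
    # Reference upstream datasets by name; Spark infers the dependency
    # graph and the execution order, so no orchestration code is needed.
    return (
        spark.read.table("raw_events")
        .groupBy(F.to_date("timestamp").alias("day"))
        .count()
    )
```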

In this hands-on session, I'll provide a critical review of SDP's design decisions and demonstrate how declarative pipelines replace traditional imperative Spark code. Then we'll build a streaming pipeline that processes real avionics telemetry from tens of thousands of aircraft flying overhead right now, from tiny Cessnas to massive Airbus A380s.

The backbone of this demo is an open-source PySpark data source that I developed in collaboration with founders from OpenSky, based at Oxford and ETH Zurich. The plugin is custom-built yet uses Spark's standard data source API, enabling direct access to live ADS-B flight telemetry, and it showcases SDP's power: streaming tables for ingestion and materialized views for transformations, all defined declaratively. No orchestration glue code. Define what your pipeline should accomplish, and let Spark handle the execution.
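
As a hedged sketch of how that pattern might look, assuming "opensky" as the plugin's format name and icao24 (the standard ADS-B transponder ID) as a column; the actual package and schema may differ:

```python
# Hedged sketch: a streaming table over a custom PySpark data source.
# The "opensky" format name and OpenSkyDataSource class are assumptions
# for illustration; the real plugin may use different names.
from pyspark import pipelines as dp

# A source built with PySpark's Python Data Source API must be
# registered once per session before it can be referenced by name:
# spark.dataSource.register(OpenSkyDataSource)

@dp.table
def adsb_raw():
    # Returning a streaming DataFrame declares a streaming table, so
    # live ADS-B telemetry is ingested incrementally.
    return spark.readStream.format("opensky").load()

@dp.materialized_view
def aircraft_tracked():
    # Downstream materialized view: one row per aircraft, keyed by the
    # ADS-B transponder ID (icao24).
    return spark.read.table("adsb_raw").dropDuplicates(["icao24"])
```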

I'll live-code a challenging streaming data pipeline on real-world data, using nothing more than Visual Studio Code, PySpark, and Apache Spark: no proprietary tools, no vendor lock-in. Alternatively, attendees can replicate the demo on a forever-free Lakehouse account. Either way, you can follow along at no cost.

You'll leave with practical SDP knowledge and best practices for modernizing your pipeline.

WARNING: After this session, you may catch the bug of tracking live aircraft data and find yourself getting deeply proficient with Apache Spark. At least coding the data pipeline won't be your bottleneck!
