Talk

Optimizing Speed and Scale of User-Facing Analytics Using Apache Kafka and Apache Pinot

Conference
Data & AI

Apache Kafka is the de facto standard for real-time event streaming, but what do you do if you want to perform user-facing, ad-hoc, real-time analytics too? That's where Apache Pinot comes in.


Apache Pinot is a real-time distributed OLAP datastore used to deliver scalable real-time analytics with low latency. It can ingest data from batch data sources (S3, HDFS, Azure Data Lake, Google Cloud Storage) as well as streaming sources such as Kafka. Pinot is used extensively at LinkedIn and Uber to power many analytical applications, such as Who Viewed My Profile, Ad Analytics, Talent Analytics, and Uber Eats, serving 100k+ queries per second while ingesting 1 million+ events per second.


Apache Kafka is a highly performant, distributed, fault-tolerant, real-time publish-subscribe messaging platform that powers big data solutions at Airbnb, LinkedIn, MailChimp, Netflix, the New York Times, Oracle, PayPal, Pinterest, Spotify, Twitter, Uber, Wikimedia Foundation, and countless other businesses.


Come hear Mark Needham, Senior Engineer at StarTree, and Karin Wolok, Head of Developer Community at StarTree, give an introduction to both systems and a view of how they work together.



Mark Needham

StarTree

Mark is currently working on real-time user-facing analytics with Apache Pinot at StarTree. He previously worked on graph analytics at Neo4j, where he co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.

Karin Wolok

StarTree

Karin currently leads developer community programming in the Developer Relations team at StarTree, a start-up founded by the original creators of Apache Pinot. Karin began her career in entertainment marketing, working with names such as Eminem and Live Nation. She also launched a successful professional women's network in two major cities in the U.S., organized events for her local Data Science meetup, and helped lead an ongoing hackathon to put machine learning in the hands of cancer biologists. Her journey working in data eventually led her to a position as Program Manager for Community Development at Neo4j, the leading graph database in the world. Most recently, she was brought on to StarTree to improve the adoption and success of the overall developer community.