Speaker Details

Gayathri Rajan

Expedia Group

I am an Engineering Leader at Expedia Group, leading teams that build high-volume streaming platform capabilities. I am a Domain-Driven Design evangelist, having practiced it on many projects and given many talks on the subject. I am passionate about building responsible, motivated and thriving teams of engineers, clean architecture and design, and highly scalable, robust distributed systems.


Aside from my day-to-day work, I lead the Apprentices Early Talent initiative at Expedia Group and am a proud promoter of WomenInTech initiatives inside and outside the organisation.



Assuring Data Quality at Scale

More and more companies are becoming Data and AI driven. Data is the lifeblood of many of the systems we build and of the business decisions made today. Customer experience and journey, which in turn drive the P&L for businesses, rely on data that is captured and fed into our systems. It is imperative that this data is of the highest quality and stays that way. The quality of data has a direct impact on the quality, accuracy and relevance of ML model output. It also has a proportional impact on the cost of running data engineering pipelines, be it stream or batch data processing.

Following the Data Mesh pattern of building platform capabilities that power decentralised data products, I want to lay out an approach to implementing Data Quality at scale and the key steps in providing confidence and trust in the data being produced and consumed by the data product teams. In this talk, I will cover:

  • Data Quality challenges in modern-day data-driven enterprises from both Stream and Batch perspectives
  • Dimensions & metrics of Data Quality
  • Key components of, and an approach to building, a Data Quality platform at scale that provides near-real-time visibility of DQ issues
  • Fitting this capability into the data ecosystem, including triggering remediation actions such as stopping a data pipeline

Big Data
Data Streaming
Quality