Talks

Learn how to run production-ready LLM inference on Kubernetes with vLLM for fast, efficient, and cost-effective AI at scale across hybrid cloud environments. We’ll follow the evolution of production inference from vLLM fundamentals to distributed inference patterns with llm-d, and finally to operating Models as a Service (MaaS) as a scalable platform capability across on-premises and cloud infrastructure.
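As a small taste of the vLLM fundamentals the session starts from, here is a minimal sketch of querying a running vLLM server through its OpenAI-compatible API. The endpoint URL and model name are placeholder assumptions, not details from the talk; they assume a server started with `vllm serve <model>` or exposed behind a Kubernetes Service.

```python
# Minimal sketch: calling a vLLM server via its OpenAI-compatible API.
# The base_url and model name below are placeholders; substitute the
# address of your own vLLM deployment and the model it is serving.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",                      # vLLM accepts any key by default
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # must match the served model
    messages=[{"role": "user", "content": "Summarize vLLM in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```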
Roberto Carratala
Red Hat
Roberto is a Principal AI Architect in the AI Business Unit, specializing in Container Orchestration Platforms (OpenShift & Kubernetes), AI/ML, DevSecOps, and CI/CD. He has over 10 years of experience in system administration, cloud infrastructure, and AI/ML, and holds two MSc degrees, in Telco Engineering and in AI/ML.