Talks

Learn how to run production-ready LLM inference on Kubernetes with vLLM for fast, efficient, and cost-effective AI at scale across hybrid cloud environments. We’ll follow the evolution of production inference from vLLM fundamentals to distributed inference patterns with llm-d, and finally to operating Models as a Service (MaaS) as a scalable platform capability across on-premises and cloud infrastructure.
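As a small taste of the vLLM fundamentals the session starts from, here is a minimal sketch of querying a running vLLM server through its OpenAI-compatible API. The endpoint URL and model name are placeholder assumptions, not details from the talk; they assume a server started with `vllm serve <model>` or exposed behind a Kubernetes Service.

```python
# Minimal sketch: calling a vLLM server via its OpenAI-compatible API.
# The base_url and model name below are placeholders; substitute the
# address of your own vLLM deployment and the model it is serving.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",                      # vLLM accepts any key by default
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # must match the served model
    messages=[{"role": "user", "content": "Summarize vLLM in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```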
Roberto Carratala
Red Hat
Roberto is a Principal AI Architect in the AI Business Unit, specializing in Container Orchestration Platforms (OpenShift & Kubernetes), AI/ML, DevSecOps, and CI/CD. He has over 10 years of experience in system administration, cloud infrastructure, and AI/ML, and holds two MSc degrees, in Telco Engineering and in AI/ML.