Fast, Cheap, and Accurate: Optimizing LLM Inference with vLLM and Quantization. Conference session, intermediate level. Legare Kerrison, Red Hat.
Scaling AI on Hybrid Cloud for Production LLM Inference at Scale. Conference session, intermediate level. Roberto Carratala, Red Hat.