Fast, Cheap, and Accurate: Optimizing LLM Inference with vLLM and Quantization
Conference talk (Intermediate level)
Speaker: Legare Kerrison, Red Hat