Semantic search promises a revolution: contextual relevance and natural language understanding with just a few lines of code. On a notebook or a POC, it’s magical. But what happens when your index exceeds a billion vectors?
The magic quickly gives way to the brutality of engineering: exploding latency, uncontrolled infrastructure costs, and RAM challenges.
In this talk, we leave marketing buzz at the door and dive into the guts of Elasticsearch and OpenSearch at very large scale. We will cover how to:
- Architect your clusters to handle a billion embeddings without failing.
- Optimize the critical trade-off between search quality (recall) and performance (latency).
- Reduce costs using quantization strategies and intelligent chunking.
If you need to move from a “Hello World” semantic search to massive production, this session is your survival guide.
