How the langcache-customer-data-eval repo helps measure cache hit rate, precision, and threshold tradeoffs on your own query data.
A specialized embedding model for semantic caching that improves intent matching while reducing latency, memory use, and deployment cost.
A simpler yes/no classification setup improves determinism, scales better on larger batches, and preserves evaluation quality.
How to combine embedding similarity with BM25, plus failure modes involving negation and identifiers.
A Redis-backed multi-agent system that learns from successful executions, failures, and user feedback to improve structured-data QA over time.