RetrievalAttention
Repository: RetrievalAttention
Author: microsoft
[VLDB 26, NeurIPS 25] Scalable long-context LLM decoding that leverages attention sparsity by treating the KV cache as a vector storage system.