Text Embeddings Inference
High-throughput, Rust-based inference server for sentence-transformers–class embedding models, with GPU and CPU backends.
Why it is included
Listed prominently in TAAFT’s #llm repository list as Hugging Face’s Apache-2.0-licensed embedding inference service.
Best for
Production RAG stacks that need a fast embedding microservice running alongside a vector database.
Strengths
- Purpose-built for embeddings
- Rust performance
- HF model compatibility
Limitations
- Scope limited to embedding models; it is not a full generative serving stack
Good alternatives
vLLM embed endpoints · Custom FastAPI + transformers
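As a concrete sense of how an embedding microservice like this is consumed, here is a minimal stdlib-only client sketch against TEI's HTTP `/embed` route, which accepts a JSON body of the form `{"inputs": ...}`. The host, port, and helper names are assumptions for illustration; adjust them to your deployment.

```python
# Minimal client sketch for a TEI-style /embed endpoint.
# Assumptions: server reachable at base_url (default port 8080 is hypothetical
# for your setup); route and payload shape follow TEI's documented /embed API.
import json
import urllib.request


def build_embed_request(texts, base_url="http://127.0.0.1:8080"):
    """Construct the POST request for the /embed route without sending it."""
    payload = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, base_url="http://127.0.0.1:8080"):
    """Send texts to a running TEI server; returns a list of float vectors."""
    req = build_embed_request(texts, base_url)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

In a RAG pipeline, `embed(chunks)` would typically be called at both ingestion time (to index documents into the vector DB) and query time (to embed the user question before similarity search).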
Related tools
AI & Machine Learning
Hugging Face Transformers
State-of-the-art pretrained models for PyTorch, TensorFlow, and JAX.
Qdrant
Vector search engine with filtering, REST/gRPC APIs, and cloud or self-hosted deployment for embeddings at scale.
LlamaIndex
Data framework for LLM applications: ingestion, indexing, retrieval, and agents over documents and APIs.
Chroma
Open-source embedding database focused on developer ergonomics for LLM apps: local dev, server mode, and simple APIs.
MNN
Alibaba’s lightweight inference engine for mobile and edge devices, used for on-device LLMs and classic CV models with aggressive optimization.
rtp-llm
Alibaba’s high-performance LLM inference engine (CUDA-focused) for production serving of diverse decoder architectures.
