OpenCatalog, curated by FLOSSK
AI & Machine Learning

Text Embeddings Inference

A high-throughput, Rust-based server for sentence-transformers-style embedding models, with GPU and CPU backends.

Why it is included

Listed prominently in TAAFT’s #llm repository list as Hugging Face’s Apache-2.0-licensed embedding inference service.

Best for

Production RAG stacks that need a fast embedding microservice running alongside a vector database.
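In that kind of stack, the embedding service is typically called over HTTP: a batch of texts goes to the server's `/embed` route and the returned vectors are indexed or compared. The sketch below assumes a TEI-style server at `localhost:8080` and a response shaped as one JSON float vector per input (the URL, port, and exact payload shape are assumptions about a local deployment, not guaranteed by this entry); the request/response helpers are kept separate from the network call so they can be reused and tested on their own.

```python
import json
import math
import urllib.request

# Assumed address of a locally running TEI-style embedding server.
TEI_URL = "http://localhost:8080/embed"

def build_embed_request(texts):
    """Build a TEI-style /embed request body: {"inputs": [...]}."""
    return json.dumps({"inputs": texts}).encode("utf-8")

def parse_embed_response(body):
    """Decode a TEI-style /embed response: a JSON array of float
    vectors, one vector per input text."""
    return json.loads(body)

def embed(texts):
    """POST a batch of texts to the embedding server and return
    their vectors (requires the server above to be running)."""
    req = urllib.request.Request(
        TEI_URL,
        data=build_embed_request(texts),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_embed_response(resp.read())

def cosine(u, v):
    """Cosine similarity, the usual ranking metric over embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

A retrieval step then reduces to embedding the query, scoring it with `cosine` against stored document vectors, and taking the top-scoring documents; in practice the vector database performs that search, with the embedding service supplying the vectors.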

Strengths

  • Purpose-built for embeddings
  • Rust performance
  • HF model compatibility

Limitations

  • Scope is limited to embedding models; it does not serve full generative LLMs

Good alternatives

vLLM embed endpoints · Custom FastAPI + transformers
