OpenCatalog, curated by FLOSSK
AI & Machine Learning

rtp-llm

Alibaba’s high-performance LLM inference engine (CUDA-focused) for production serving of diverse decoder architectures.

Why it is included

Appears in TAAFT’s #llm repository listings as Alibaba’s open serving-oriented stack.

Best for

GPU inference teams evaluating alternatives to vLLM/Triton for datacenter LLM APIs.
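When comparing engines for a datacenter LLM API, teams commonly benchmark candidates behind the same OpenAI-style chat-completions interface (vLLM ships one; check rtp-llm's docs for its equivalent frontend). A minimal sketch of building such a request body, with the model name and parameters as hypothetical placeholders:

```python
import json

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat-completions request body.

    Model name and defaults are placeholders for illustration; the same
    payload shape lets you benchmark different serving engines behind
    identical client code.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        # Greedy decoding keeps outputs deterministic, so latency and
        # throughput comparisons across engines are apples-to-apples.
        "temperature": 0.0,
    }

payload = build_chat_request("qwen2-7b-instruct", "Hello")
print(json.dumps(payload, indent=2))
```

Keeping the client fixed and swapping only the backend endpoint isolates the engine's serving behavior from application code.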

Strengths

  • Serving-oriented
  • Active Alibaba maintenance

Limitations

  • Primarily targets NVIDIA CUDA GPUs; operational patterns and documentation are less widely established than vLLM's

Good alternatives

vLLM · TensorRT-LLM · SGLang
