NVIDIA Triton Inference Server
Multi-framework inference server supporting TensorRT, ONNX Runtime, PyTorch, and Python backends, with dynamic batching, model ensembles, and concurrent GPU sharing.
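Dynamic batching is configured per model in Triton's `config.pbtxt`; a minimal sketch is below (the model name, backend, and batch sizes are illustrative, not prescribed):

```protobuf
# config.pbtxt for a hypothetical ONNX model; all values are illustrative.
name: "resnet50"
platform: "onnxruntime_onnx"
max_batch_size: 8
dynamic_batching {
  # Server coalesces individual requests toward these batch sizes.
  preferred_batch_size: [ 4, 8 ]
  # Upper bound on how long a request may wait to be batched.
  max_queue_delay_microseconds: 100
}
```

Tuning `max_queue_delay_microseconds` trades a small amount of per-request latency for higher GPU utilization under concurrent load.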
Why it is included
A widely used open-source serving layer in NVIDIA-centric production ML and LLM hosting stacks.
Best for
GPU datacenters needing one serving plane for heterogeneous model formats.
Strengths
- Multi-backend support (TensorRT, ONNX Runtime, PyTorch, Python, and others)
- Dynamic and sequence batching
- Kubernetes integrations (Helm charts, Prometheus metrics for autoscaling)
Limitations
- Best supported on NVIDIA GPUs; other accelerators require extra integration work
Good alternatives
vLLM · TorchServe · BentoML
Related tools
AI & Machine Learning
TensorRT-LLM
NVIDIA TensorRT–based library for optimized LLM inference on GPUs with multi-GPU and speculative decoding features.
vLLM
High-throughput LLM serving with PagedAttention, continuous batching, and OpenAI-compatible APIs for GPU clusters.
ONNX Runtime
Cross-platform inference accelerator for ONNX models: CPU, GPU, and mobile execution providers with graph optimizations.
SGLang
Structured generation language for fast serving: RadixAttention, constrained decoding, and multi-turn batching for frontier-class workloads.
rtp-llm
Alibaba’s high-performance LLM inference engine (CUDA-focused) for production serving of diverse decoder architectures.
TensorFlow Serving
Flexible, high-performance serving system for TensorFlow models with versioning, batching, and gRPC/REST endpoints.
