BentoML
Unified model serving and deployment toolkit: package models as APIs, ship to Kubernetes, and manage runtimes.
Why it is included
A popular open-source bridge between training artifacts and production services, going beyond ad-hoc FastAPI wrappers.
Best for
Platform teams standardizing how Python models become versioned HTTP/gRPC services.
Strengths
- Model packaging
- Multi-framework support
- Kubernetes deployment via Yatai
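The model-packaging strength is driven by a `bentofile.yaml` build file at the project root; a minimal sketch, where the service module, labels, and dependency list are all hypothetical:

```yaml
service: "service:TextClassifier"  # module_name:service_class (illustrative)
labels:
  owner: ml-platform
include:
  - "*.py"
python:
  packages:
    - scikit-learn
```

`bentoml build` reads this file and produces a versioned Bento archive, which `bentoml containerize` can turn into a container image for Kubernetes.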
Limitations
- Overlaps conceptually with a bespoke mesh of serving tools; pick one platform story and stick with it
Good alternatives
Seldon Core · KServe · Cortex (archived)
Related tools
AI & Machine Learning
NVIDIA Triton Inference Server
Multi-framework inference server for TensorRT, ONNX, PyTorch, Python backends—dynamic batching, ensembles, and GPU sharing.
MLflow
Open platform for the ML lifecycle: experiment tracking, model registry, packaging, evaluation, and production monitoring.
Ray
Distributed compute framework for Python: scale data loading, training, hyperparameter search, and online serving (Ray Serve).
vLLM
High-throughput LLM serving with PagedAttention, continuous batching, and OpenAI-compatible APIs for GPU clusters.
SGLang
Structured generation language for fast serving: RadixAttention, constrained decoding, and multi-turn batching for frontier-class workloads.
LiteLLM
Unified OpenAI-compatible proxy and SDK for 100+ model providers (local, cloud, Bedrock, Azure) with budgets, fallbacks, and logging.
