Kubeflow
Kubernetes-native toolkit for ML: notebooks, training jobs, pipelines, hyperparameter tuning, and model serving, delivered as components you compose on-cluster.
Why it is included
A widely used open-source reference pattern for running the full ML lifecycle on Kubernetes at organization scale.
Best for
Platform teams standardizing training and batch inference on shared Kubernetes clusters.
Strengths
- K8s-native
- Modular subprojects
- Large user base
Limitations
- Operational complexity; needs strong platform ownership
Good alternatives
SageMaker (commercial) · Vertex AI (commercial) · Ray on Kubernetes
Related tools
AI & Machine Learning
Ray
Distributed compute framework for Python: scale data loading, training, hyperparameter search, and online serving (Ray Serve).
MLflow
Open platform for the ML lifecycle: experiment tracking, model registry, packaging, evaluation, and production monitoring.
DVC
Data version control for ML: version datasets and models with Git, cloud storage, and reproducible pipelines.
NVIDIA Triton Inference Server
Multi-framework inference server with TensorRT, ONNX, PyTorch, and Python backends; supports dynamic batching, model ensembles, and GPU sharing.
Haystack
Framework from deepset for production-ready search and RAG: pipelines, document stores, and evaluation for QA systems.
BentoML
Unified model serving and deployment toolkit: package models as APIs, ship to Kubernetes, and manage runtimes.
