DVC
Data version control for ML: version datasets and models with Git, cloud storage, and reproducible pipelines.
Why it is included
Fills the gap between Git and terabyte-scale artifacts, which makes it essential for reproducible ML beyond notebook-only workflows.
Best for
Teams sharing data and model lineage across researchers and CI without copying giant blobs into Git.
Strengths
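- Git-native mental model: small pointer files are committed to Git while the data itself stays outside the repository
- Remote storage on backends such as S3, GCS, Azure, or SSH
- Reproducible pipelines declared in dvc.yaml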
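To make the Git-native idea concrete, here is a minimal conceptual sketch in Python of content-addressed tracking: a large file is copied into a cache keyed by its hash, and only a small pointer file would be committed to Git. This is not DVC's implementation or API; the function names `track` and `restore`, the `.pointer` suffix, and the cache layout are all hypothetical and chosen only for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a small pointer file."""
    digest = hashlib.md5(data_file.read_bytes()).hexdigest()
    # Hypothetical layout: <cache>/<first two hex chars>/<remaining chars>
    cached = cache_dir / digest[:2] / digest[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():
        shutil.copy2(data_file, cached)
    # The pointer is tiny, so it is cheap to commit to Git (illustrative format).
    pointer = data_file.with_name(data_file.name + ".pointer")
    pointer.write_text(
        f"md5: {digest}\nsize: {data_file.stat().st_size}\npath: {data_file.name}\n"
    )
    return pointer


def restore(pointer: Path, cache_dir: Path) -> Path:
    """Recreate the data file next to its pointer from the cache."""
    fields = dict(line.split(": ", 1) for line in pointer.read_text().splitlines())
    digest = fields["md5"]
    target = pointer.with_name(fields["path"])
    shutil.copy2(cache_dir / digest[:2] / digest[2:], target)
    return target
```

In this sketch, checking out an older Git commit would bring back an older pointer file, and `restore` would then fetch the matching data version from the cache. A real setup would also sync the cache with remote storage, which is where the access-control discipline noted under Limitations comes in.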
Limitations
- Requires discipline on remote cache layout and access control
Good alternatives
LakeFS · Git LFS alone · MLflow artifacts
Related tools
AI & Machine Learning
MLflow
Open platform for the ML lifecycle: experiment tracking, model registry, packaging, evaluation, and production monitoring.
AI & Machine Learning
PyTorch
Deep learning framework with strong research-to-production paths.
AI & Machine Learning
Kubeflow
Kubernetes-native toolkit for ML: notebooks, training jobs, pipelines, tuning, and serving components you compose on-cluster.
AI & Machine Learning
Haystack
Deepset framework for production-ready search and RAG: pipelines, document stores, and evaluation for QA systems.
AI & Machine Learning
BentoML
Unified model serving and deployment toolkit: package models as APIs, ship to Kubernetes, and manage runtimes.
AI & Machine Learning
Ollama
Local LLM runner and model library with simple CLI and API for workstation inference.
