LiteLLM
Unified OpenAI-compatible proxy and SDK for 100+ model providers (local, cloud, Bedrock, Azure) with budgets, fallbacks, and logging.
Why it is included
A standard glue layer for apps that must swap between Ollama, vLLM, and hosted APIs without rewriting clients.
Best for
Product teams abstracting multi-provider LLM routing in one gateway.
Strengths
- Provider breadth
- Drop-in OpenAI API
- Observability hooks
Limitations
- Operational security for keys and logs is your responsibility
Good alternatives
Custom FastAPI · LangChain adapters
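
The multi-provider routing and fallbacks described above are configured in the proxy's YAML file. A minimal sketch is shown below; the model names, aliases, and environment variables are illustrative, not a definitive setup:

```yaml
# Minimal LiteLLM proxy config sketch (illustrative names and env vars).
model_list:
  - model_name: gpt-4o              # alias that clients request
    litellm_params:
      model: azure/gpt-4o           # route to an Azure deployment
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
  - model_name: local-llama
    litellm_params:
      model: ollama/llama3          # route to a local Ollama server

litellm_settings:
  fallbacks:
    - {"gpt-4o": ["local-llama"]}   # try the local model if the primary fails
```

Started with `litellm --config config.yaml`, the proxy exposes an OpenAI-compatible endpoint, so existing OpenAI SDK clients only need their base URL changed to point at it.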
Related tools
AI & Machine Learning
LangChain
Framework for building LLM applications with chains, tools, and agents.
Ollama
Local LLM runner and model library with simple CLI and API for workstation inference.
vLLM
High-throughput LLM serving with PagedAttention, continuous batching, and OpenAI-compatible APIs for GPU clusters.
llama.cpp
Plain C/C++ inference for LLaMA-class models with broad community backends.
SGLang
Structured generation language for fast serving: RadixAttention, constrained decoding, and multi-turn batching for frontier-class workloads.
MLX LM
Apple MLX-based LLM inference and training on Apple silicon: efficient Metal-backed transformers and examples for local chat models.
