MLC LLM
Universal deployment stack that compiles models to Vulkan, Metal, CUDA, and WebGPU via TVM/Unity, so the same model can run on phones, in browsers, and on servers.
Why it is included
A uniquely open route to edge and in-browser (WebGPU) LLM inference, beyond the desktop-CUDA defaults.
Best for
Teams shipping LLMs to mobile, WebGPU, or heterogeneous devices.
Strengths
- Multi-backend compilation
- WebGPU path
- MLC ecosystem
Limitations
- Compile pipeline learning curve
Good alternatives
llama.cpp · ONNX Runtime
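For a concrete feel of the server-side deployment path: MLC LLM can serve a compiled model behind an OpenAI-compatible REST API (started with `mlc_llm serve <model>`). The sketch below only builds the JSON request body that a `/v1/chat/completions` endpoint expects; it assumes a running local server, and the model id shown is an illustrative placeholder, not a guaranteed artifact name.

```python
import json

def build_chat_request(model: str, prompt: str, stream: bool = False) -> str:
    """Build the JSON body for an OpenAI-compatible /v1/chat/completions
    call, as exposed by `mlc_llm serve` (model id is a placeholder)."""
    payload = {
        "model": model,  # e.g. a compiled artifact like "Llama-3-8B-Instruct-q4f16_1-MLC"
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # set True for incremental token streaming
    }
    return json.dumps(payload)

# POST this body to http://127.0.0.1:8000/v1/chat/completions (default host/port
# may differ; check your `mlc_llm serve` output).
body = build_chat_request("Llama-3-8B-Instruct-q4f16_1-MLC", "Hello!")
print(body)
```

Because the API mirrors OpenAI's schema, existing OpenAI client libraries can usually be pointed at the local MLC server by overriding the base URL.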
Related tools
AI & Machine Learning
llama.cpp
Plain C/C++ inference for LLaMA-class models with broad community backends.
PyTorch
Deep learning framework with strong research-to-production paths.
MNN
Alibaba’s lightweight inference engine for mobile and edge—used for on-device LLMs and classic CV models with aggressive optimization.
Google Gemma
Google’s smaller open-weights Gemma line (Gemma 2/3, etc.) under Gemma license terms, plus `gemma.cpp` for lightweight CPU inference.
Microsoft Phi
Small language model family (Phi-3/4 lineage) emphasizing strong quality per parameter; weights on Hugging Face under Microsoft licenses per release.
TinyLlama
1.1B-parameter Llama-architecture model trained on ~3T tokens—Apache-2.0 weights for fast experiments and teaching.
