OpenCatalog, curated by FLOSSK

Browse & filter

Filter by platform, license, maturity, maintenance cadence, and editorial tags like privacy-focused or self-hosted. Search matches names, summaries, tags, and use cases.


Local LLM runner and model library with simple CLI and API for workstation inference.

llm · local · inference

High-throughput LLM serving with PagedAttention, continuous batching, and OpenAI-compatible APIs for GPU clusters.

llm · inference · serving · gpu · api
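The entry above mentions OpenAI-compatible APIs. A minimal sketch of the request shape such servers accept; the field names and the `/v1/chat/completions` path follow the OpenAI chat-completions convention the entry refers to, while the model name and host below are placeholders:

```python
import json

def chat_completion_request(model, user_message, max_tokens=64):
    """Build the JSON body for a POST to /v1/chat/completions
    on any OpenAI-compatible server (e.g. one running on localhost)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }

body = chat_completion_request("my-model", "Hello!")
payload = json.dumps(body)  # send with any HTTP client to http://localhost:8000/v1/chat/completions
```

Because the wire format is shared, the same payload works against any of the OpenAI-compatible servers and proxies in this catalog by changing only the base URL.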

Structured generation language for fast serving: RadixAttention, constrained decoding, and multi-turn batching for frontier-class workloads.

llm · inference · serving · gpu · structured-output

Unified OpenAI-compatible proxy and SDK for 100+ model providers (local, cloud, Bedrock, Azure) with budgets, fallbacks, and logging.

llm · api · proxy · multi-provider · gateway

Apple MLX-based LLM inference and training on Apple silicon: efficient Metal-backed transformers and examples for local chat models.

llm · apple-silicon · inference · metal · local

Single-file distributable LLM weights + llama.cpp runtime: run large models from one executable with broad OS and CPU/GPU support.

llm · local · inference · portable

Universal deployment stack compiling models to Vulkan, Metal, CUDA, and WebGPU via TVM/Unity for phones, browsers, and servers.

llm · edge · webgpu · mobile · compilation

Memory-efficient CUDA inference kernels for quantized Llama-class models—popular in consumer GPU chat UIs.

llm · inference · cuda · quantization · local
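The quantization that kernels like the one above exploit can be illustrated with a symmetric int8 round trip; this is a toy pure-Python sketch of the general idea, not the library's actual kernel or storage format:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: scale by the largest |w|
    so values map into [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [x * scale for x in q]

w = [0.5, -1.27, 0.03]
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)  # approximates w within one quantization step
```

Storing one byte per weight instead of two (fp16) or four (fp32) is what lets Llama-class models fit in consumer GPU VRAM.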

NVIDIA TensorRT–based library for optimized LLM inference on GPUs with multi-GPU and speculative decoding features.

llm · inference · nvidia · tensorrt · gpu

YAML-configured fine-tuning for LLMs: LoRA, QLoRA, FSDP, and many architectures on top of Hugging Face trainers.

llm · fine-tuning · lora · training · huggingface

Optimized fine-tuning library claiming 2× faster LoRA/QLoRA with less VRAM via custom kernels and Hugging Face compatibility.

llm · fine-tuning · lora · training · optimization
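Both fine-tuning entries above are built around LoRA, whose core trick is to freeze the pretrained weight W and learn only a low-rank update scaled by alpha/r. A toy pure-Python sketch of the effective weight (real trainers do this per attention/MLP projection on GPU tensors; the matrices here are illustrative):

```python
def matmul(A, B):
    """Naive matrix multiply for small lists-of-lists (A @ B)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_effective_weight(W, A, B, alpha, r):
    """LoRA: the frozen d x k weight W is augmented by a low-rank update
    (alpha / r) * B @ A, where B is d x r and A is r x k.
    Only A and B (r * (d + k) parameters) are trained."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(wrow, drow)]
            for wrow, drow in zip(W, delta)]

# 2x2 frozen weight with a rank-1 adapter:
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]           # 2 x 1
A = [[0.5, 0.5]]             # 1 x 2
W_eff = lora_effective_weight(W, A, B, alpha=1.0, r=1)
```

Because only A and B carry gradients, optimizer state shrinks from O(d*k) to O(r*(d+k)) per layer, which is where the VRAM savings both entries advertise come from.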

Meta’s Llama family of open **weights** (subject to the Llama license) with reference code, tooling, and downloads via Hugging Face and the meta-llama org.

llm · open-weights · meta · foundation-model

Mistral’s open-weight checkpoints (e.g. the 7B era and Mixtral MoE) and Apache-2.0-licensed **code**, alongside proprietary flagship lines; verify the license of each checkpoint.

llm · open-weights · moe · europe · foundation-model

Alibaba’s Qwen family (dense and MoE) with strong multilingual and coding variants; weights and code on Hugging Face under stated licenses per release.

llm · open-weights · coding · multilingual · foundation-model

DeepSeek open-weight models (e.g. V3/R1 lineage) with MIT or custom terms per release—high capability coding and reasoning checkpoints.

llm · open-weights · reasoning · coding · foundation-model

Google’s smaller Gemma line of open **weights** (Gemma 2/3, etc.) under Gemma license terms, plus `gemma.cpp` for lightweight CPU inference.

llm · open-weights · google · edge · foundation-model

Small language model family (Phi-3/4 lineage) emphasizing strong quality per parameter; weights on Hugging Face under Microsoft licenses per release.

llm · slm · microsoft · onnx · edge

Technology Innovation Institute’s Falcon open weights (7B–180B era), Apache-2.0-licensed for many releases; a landmark UAE-led open model line.

llm · open-weights · apache-2 · foundation-model
Honorable mention

RNN-meets-transformer linear-attention LM architecture running with O(n) memory—unique open line for long-context and embedded inference.

llm · architecture · linear-attention · open-weights
Honorable mention

01.AI Yi open-weight bilingual models (EN/ZH focus) with Apache-2.0 or Yi license per checkpoint on Hugging Face.

llm · open-weights · multilingual · chinese · english

1.1B-parameter Llama-architecture model trained on ~3T tokens—Apache-2.0 weights for fast experiments and teaching.

llm · slm · apache-2 · education · edge

Allen AI fully open LLM **pipeline**: weights, training code, data mixes, and evaluation—research transparency flagship.

llm · open-science · training · research · transparent

BigScience 176B multilingual causal LM—landmark collaborative open training effort on Jean Zay (weights under BigScience Responsible AI License).

llm · multilingual · open-weights · research · history

EleutherAI framework and 20B-class models for training large autoregressive LMs with 3D parallelism—Apache-2.0 training stack.

llm · training · distributed · research · eleutherai

Hugging Face TB small LM family (135M–1.7B) with Apache-2.0 weights aimed at on-device and edge quality per size.

llm · slm · edge · apache-2 · huggingface

OpenAI’s open-weight GPT-OSS checkpoints (e.g. 20B, 120B) hosted on Hugging Face for local inference and fine-tuning.

llm · huggingface · open-weights · openai · text-generation

Historic decoder-only LM family (124M–1.5B) under `openai-community` on the Hub—still a default tutorial and pipeline test target.

llm · huggingface · gpt-2 · education · text-generation

Meta’s Open Pretrained Transformer suite (125M–175B) released with reproducible logbooks—canonical Hub org `facebook` / `facebook/opt-*`.

llm · huggingface · meta · research · text-generation

Early open chat models fine-tuned from Llama-class bases by LMSYS—widely mirrored on the Hub (e.g. Vicuna-7B v1.5).

llm · huggingface · chat · instruction-tuning · lmsys

Z.ai GLM-5–generation checkpoints (e.g. FP8 builds) distributed on the Hub for text generation and agent-style use cases.

llm · huggingface · glm · text-generation · z.ai

EleutherAI’s public scaling suite: matched GPT-NeoX–architecture models from 70M–12B with public datasets for interpretability research.

llm · huggingface · research · eleutherai · interpretability

Apple’s OpenELM family—openly released efficient language models with layer-wise scaling and Hub-hosted instruct variants.

llm · huggingface · apple · efficient · text-generation

NVIDIA Nemotron 3 open model checkpoints (dense and MoE) on Hugging Face for reasoning, coding, and agentic workloads at scale.

llm · huggingface · nvidia · moe · text-generation

BigScience instruction-tuned BLOOM derivatives (e.g. BLOOMZ-560M–176B) for multilingual zero-shot instruction following on the Hub.

llm · huggingface · multilingual · instruction · bigscience

Data framework for LLM applications: ingestion, indexing, retrieval, and agents over documents and APIs.

rag · llm · agents · retrieval
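The retrieval step in a RAG framework like the one above can be sketched in a few lines: embed the query and the documents, rank by similarity, keep the top k. This toy version uses bag-of-words counts and cosine similarity as the "embedding"; real frameworks use learned dense embeddings and a vector store, but the pipeline shape is the same:

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words term-frequency vector.
    RAG stacks substitute a learned dense embedding model here."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query, docs, k=1):
    """Rank documents by similarity to the query and keep the top k;
    the retrieved text is then stuffed into the LLM prompt."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = ["llama runs locally", "vector search for retrieval",
        "fine-tuning with adapters"]
top = retrieve("local llama inference", docs, k=1)
```

Swapping `embed` for a real embedding model and `docs` for a vector database index turns this sketch into the ingestion/retrieval loop the entry describes.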

Open-source embedding database focused on developer ergonomics for LLM apps: local dev, server mode, and simple APIs.

vector-database · embeddings · rag · llm

Parameter-efficient fine-tuning methods (LoRA, adapters, prompt tuning) integrated with Transformers models.

fine-tuning · lora · transformers · llm

Transformer Reinforcement Learning: train LLMs with RLHF, DPO, ORPO, and related preference optimization recipes.

rlhf · dpo · alignment · llm
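DPO, named in the entry above, replaces an explicit RLHF reward model with a logistic loss on the margin between how much the policy and a frozen reference model prefer the chosen response over the rejected one. A per-pair sketch of that loss in pure Python (the log-probabilities here are illustrative placeholders; in training they come from summing token log-probs of each response):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization per-pair loss:
    -log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r))).
    The loss shrinks as the policy prefers the chosen response
    more strongly than the reference model does."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy already prefers the chosen response relative to the reference:
low = dpo_loss(-5.0, -9.0, -6.0, -8.0)   # positive margin -> lower loss
high = dpo_loss(-9.0, -5.0, -8.0, -6.0)  # negative margin -> higher loss
```

Averaging this loss over a preference dataset and backpropagating through the policy log-probs is the whole training recipe, which is why DPO needs no reward model or on-policy sampling.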

Hugging Face library for large shared datasets: memory mapping, streaming, Arrow-backed columns, and Hub integration.

data · nlp · llm · huggingface

Alibaba’s lightweight inference engine for mobile and edge—used for on-device LLMs and classic CV models with aggressive optimization.

inference · edge · mobile · llm · taaft-repositories

Alibaba’s high-performance LLM inference engine (CUDA-focused) for production serving of diverse decoder architectures.

llm · inference · serving · gpu · taaft-repositories

NVIDIA research-oriented toolkit for LLM KV-cache compression to stretch context within fixed VRAM budgets.

llm · kv-cache · compression · inference · taaft-repositories
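Why KV-cache compression matters: a decoder transformer caches one key and one value tensor per layer for every generated token, so cache size grows linearly with context length. A sketch of the standard size formula (the 7B-class configuration below is illustrative, not tied to any specific model in this catalog):

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Per-sequence KV-cache size for a decoder transformer:
    2 tensors (K and V) per layer, each seq_len x num_kv_heads x head_dim,
    at dtype_bytes per element (2 for fp16/bf16)."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * dtype_bytes

# Illustrative 7B-class config: 32 layers, 32 KV heads, head_dim 128, fp16
per_32k = kv_cache_bytes(32, 32, 128, 32_768)  # bytes for a 32k-token context
gib = per_32k / 2**30
```

At these settings a single 32k-token sequence costs 16 GiB of cache alone, so halving the cache (via compression, quantization, or fewer KV heads) roughly doubles the context that fits in a fixed VRAM budget.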

Open-source Svelte/TypeScript app that powers HuggingChat—multi-model chat, tools, and self-hostable UI patterns.

chat · ui · llm · self-hosted · taaft-repositories

Rust LSP server that plugs LLM-backed completions into editors—designed to pair with local or API models.

lsp · ide · llm · developer-tools · taaft-repositories

Google library to extract structured fields from unstructured text with LLMs, source grounding, and visualization helpers.

llm · extraction · structured-output · taaft-repositories

ByteDance open agent harness for long-horizon research, coding, and creation with tools, memory, and subagents.

agents · orchestration · llm · taaft-repositories

DeepSeek Janus series: unified multimodal understanding and generation models with MIT-licensed research code.

multimodal · vision · llm · deepseek · taaft-repositories

LLM red-teaming framework for jailbreak and prompt-injection testing.

llm · red-team · security