Open source AI · the 2026 stack

Open source AI — the full stack, with honest positions on every layer.

Open-source AI in 2026 is not a checkbox. It is a stack — open weights, an inference runtime that runs locally, a fine-tuning toolchain, an agent framework, an embedding model, and a vector store — each layer with several real options and several marketing-only ones. This page maps every layer, names the projects that actually matter, and gives the position CyberdyneLabs takes on each.

Companion pages: /ai (the field map), /run-ai-locally (the practical guide), /ai-faq (40 dated answers), /glossary (vocabulary).

1 · What "open" actually means in 2026.

The word "open" is doing a lot of work in this field, and not all of it honestly. There is a real spectrum from "fully reproducible" to "you can use the API for free."

Tier	Code	Weights	Training data	Examples
Truly open	Public, permissive	Public, permissive	Public, recipe included	OLMo, Pythia, K2
Open weights	Public	Public, permissive	Recipe partial, data not redistributable	Llama 3, Mistral, Qwen, DeepSeek, Gemma, Phi
Open access	Some public	Public, restrictive license	Closed	Llama 2 (older), some Gemma
Free API	Closed	Closed	Closed	GPT-4o-mini, Claude-Instant trial, etc.

For most practical purposes (running locally, fine-tuning, redistribution) the second tier, open weights, is what you want. Truly open projects (OLMo, Pythia) trade some quality for full reproducibility and are valuable for research; the bottom two tiers are mostly positioning.
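
The tier table can be read as a small decision rule. The sketch below is one hypothetical encoding of it: the function name `classify_tier` and the three input labels ("open", "restricted", "closed") are illustrative assumptions, not a standard taxonomy.

```python
def classify_tier(code: str, weights: str, data: str) -> str:
    """Map the openness of code, weights and training data to a tier name.

    Each argument is one of three hypothetical labels:
      "open"       - public under a permissive license
      "restricted" - public, but restrictive or only partially released
      "closed"     - not released
    """
    if weights == "closed":
        # No weights to download: at best a free hosted API.
        return "Free API"
    if weights == "restricted":
        # Weights exist but the license limits what you can do with them.
        return "Open access"
    # From here on the weights are public and permissive.
    if code == "open" and data == "open":
        return "Truly open"
    return "Open weights"


print(classify_tier("open", "open", "restricted"))  # Open weights
```

The weights column decides most cases; code and data only separate the top two tiers.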

2 · Open-weights model families.

Six open-weights families dominate 2026; the table adds the truly open OLMo / Pythia line as a seventh row for comparison. Each row below is the family at large; specific size/checkpoint licensing varies.

Family	Sizes (B)	License	Position
Qwen 2.5 (Alibaba)	0.5 / 1.5 / 3 / 7 / 14 / 32 / 72	Apache 2.0 (most)	Strongest 7B-class assistant per parameter. Multilingual. Permissive. Donor for our /surgery work.
Llama 3 / 3.1 / 3.2 / 3.3 (Meta)	1 / 3 / 8 / 70 / 405	Custom (Llama Community License)	Heavy reasoning, English-leaning, big benchmark presence. The 1B and 3B sizes come from the 3.2 release. Read the actual license: it has clauses on naming and acceptable use.
Mistral / Mixtral (Mistral AI)	7 (Mistral) / 8×7 / 8×22 (Mixtral)	Apache 2.0 (community), commercial paywall	Strong code, fast inference. Mixtral was the first widely-deployed MoE in the open-weights world.
DeepSeek V3 / V4-Flash	671 / 284 (active 37 / 13)	Custom	Pushed the capability/cost ratio dramatically. V4-Flash (284 B / 13 B active, 159 GB weights) was closed by CyberdyneLabs as Surgery Case 01: end-to-end on a single 8 GB RTX 3060 Ti via our own native C++/CUDA engine and PLANCK_PACK expert streaming. Decode: 1.86 tok/s warm → 0.16 tok/s full 43-layer text, disk-I/O-bound. See /r/V4_FLASH_TECH_BRIEF.
Gemma 2 / 3 (Google)	2 / 9 / 27 (Gemma 2); 1 / 4 / 12 / 27 (Gemma 3)	Gemma license	Strong instruction-following per parameter. The license is restrictive; read it before commercial use.
Phi-4 (Microsoft)	14	MIT	Strong reasoning per parameter. MIT-licensed weights. Excellent for code and math.
OLMo / Pythia (AI2 / EleutherAI)	1 – 32	Apache 2.0	Truly open — code, weights, and data recipe all public. Lower benchmark scores; high research value.

For 2026 production, the practical choice usually narrows to Qwen 2.5 7B/72B or Llama 3.3 70B for general-purpose work, with one of the DeepSeek MoE models if you need very large parameter counts on small hardware.
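
Whether a model fits "small hardware" is mostly arithmetic on parameter count and bits per weight. The sketch below is a rough estimate for the weights alone; it ignores KV cache, activations and runtime overhead, and real quantization formats spend slightly more than their nominal bit width because of per-block scales. The function names are illustrative.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def implied_bits_per_weight(params_billion: float, size_gb: float) -> float:
    """Average bits per weight implied by a published file size."""
    return size_gb * 8 / params_billion


print(weight_gb(7, 4))    # 3.5   -> a 7B model at 4 bits
print(weight_gb(72, 4))   # 36.0  -> a 72B model at 4 bits
print(weight_gb(70, 16))  # 140.0 -> a 70B model at 16 bits

# The V4-Flash row quotes 284 B parameters in 159 GB of weights.
print(round(implied_bits_per_weight(284, 159), 2))  # 4.48
```

By this estimate a 4-bit 7B model fits an 8 GB card with room to spare, while a 159 GB checkpoint cannot live in 8 GB of VRAM at all, which is why the DeepSeek row depends on streaming experts from disk.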

3 · Inference runtimes.

Runtime	Strength	License	Position
llama.cpp	Footprint, multi-backend	MIT	Default open-source local runtime. CPU/CUDA/Metal/ROCm/Vulkan. GGUF format. Pair with Ollama or LM Studio for friction-free use.
vLLM	Throughput, serving	Apache 2.0	Production GPU inference at scale. PagedAttention + continuous batching. The default for self-hosted serving.
Ollama	Low friction, UX	MIT	Wraps llama.cpp with a Docker-style CLI. `ollama run qwen2.5:7b` and you are talking to an LLM.
MLC-LLM	Compile-everywhere	Apache 2.0	TVM-based compiler. Targets WebGPU, Metal, Vulkan, ROCm. Best for browser and exotic-backend deployment.
TGI	HF integration	Apache 2.0	Hugging Face's serving stack. Tight integration with Hub. Good for HF-native shops.
ExLlamaV2	EXL2 quant speed	MIT	Very fast on NVIDIA, EXL2 quant scheme has good quality/size frontier. Niche but excellent.
Frankenstellm (us)	Multi-organ system, single C++/CUDA binary (gigachad_native)	MIT	Single C++/CUDA binary. No PyTorch, no cuBLAS. 83.58 tok/s Q4 7B on RTX 3060 Ti. Hologram cache (860× repeat speedup), Black-Dog router, expert streaming for V4-Flash.

The decision tree: local dev / laptop → Ollama or llama.cpp; self-hosted serving for many users → vLLM or TGI; browser / mobile → MLC-LLM; NVIDIA-only with EXL2 quants → ExLlamaV2; writing your own runtime as a research target → start by reading llama.cpp source, then ours.
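
The first four branches of the decision tree fit in a lookup table. The sketch below is only a restatement of the paragraph above; the use-case keys and the function name `pick_runtime` are made up for illustration. The fifth branch (writing your own runtime) is a reading list, not a runtime choice, so it is left out.

```python
def pick_runtime(use_case: str) -> list[str]:
    """Return the runtimes suggested for a use case, in the order given above."""
    table = {
        "local-dev": ["Ollama", "llama.cpp"],       # laptop / local development
        "multi-user-serving": ["vLLM", "TGI"],      # self-hosted serving for many users
        "browser-mobile": ["MLC-LLM"],              # WebGPU and exotic backends
        "nvidia-exl2": ["ExLlamaV2"],               # NVIDIA-only, EXL2 quants
    }
    try:
        return table[use_case]
    except KeyError:
        known = ", ".join(sorted(table))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {known}") from None


print(pick_runtime("local-dev"))  # ['Ollama', 'llama.cpp']
```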

4 · Fine-tuning toolchains.

Tool	What it is	License	Position
TRL	HF Transformer Reinforcement Learning library	Apache 2.0	The reference implementation for SFT, DPO, PPO, GRPO. Pair with the PEFT and Transformers libraries.
axolotl	YAML-driven training	Apache 2.0	Configure-by-YAML wrapper around TRL/DeepSpeed. Most-used community fine-tuning pipeline.
Unsloth	Fast, memory-efficient	Apache 2.0	Custom Triton kernels. ~2× faster QLoRA training than vanilla. Excellent for single-GPU work.
QLoRA (paper + bitsandbytes)	4-bit base + LoRA delta	MIT (bitsandbytes)	The technique that made 65B-class models fine-tunable on a single 48 GB GPU, and 33B-class models on 24 GB. Underpins almost everything below the multi-GPU tier.
DeepSpeed	ZeRO sharding	Apache 2.0	For multi-GPU full fine-tuning. Pairs with Megatron-LM for very large training runs.
Surgery doctrine (us)	4-axis acceptance gating	MIT (code) / CC-BY-SA (docs)	Wraps QLoRA with anchor / strict-schema / target-bench / no-leak gates. Reverts kept in public ledger. See /surgery and the BD-series reports in /r/.
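
The gating idea in the last row can be sketched as an all-or-nothing check with an append-only log. This is not the CyberdyneLabs implementation: what each gate measures is not specified on this page, so the gates are treated as opaque pass/fail flags, and the names `GATES`, `Ledger` and `review_delta` are hypothetical.

```python
from dataclasses import dataclass, field

# The four gate names come from the table row; their internals are assumed opaque.
GATES = ("anchor", "strict_schema", "target_bench", "no_leak")


@dataclass
class Ledger:
    """Append-only record of every decision, including rejections (reverts)."""
    entries: list = field(default_factory=list)

    def record(self, delta_id: str, accepted: bool, failed: tuple) -> None:
        self.entries.append({"delta": delta_id, "accepted": accepted, "failed": failed})


def review_delta(delta_id: str, results: dict, ledger: Ledger) -> bool:
    """Accept a fine-tuning delta only if all four gates passed.

    `results` maps gate name -> bool. A missing gate counts as a failure.
    """
    failed = tuple(gate for gate in GATES if not results.get(gate, False))
    accepted = not failed
    ledger.record(delta_id, accepted, failed)
    return accepted


ledger = Ledger()
review_delta("delta-001", {gate: True for gate in GATES}, ledger)
review_delta("delta-002", {"anchor": True, "strict_schema": True,
                           "target_bench": False, "no_leak": True}, ledger)
print(ledger.entries[1]["failed"])  # ('target_bench',)
```

Recording rejections alongside acceptances is what makes the ledger useful as a public record: a reader can see which gate stopped a change, not just which changes shipped.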

5 · Agent frameworks.

Framework	Style	License	Position
LangChain	Composable Python pipeline	MIT	Largest community, biggest surface area, most legacy code. Useful as a starting point; many production teams replace it with a thinner stack as scale grows.
LangGraph	Stateful multi-agent	MIT	Explicit state-machine flavour. Good fit for production agents with clear control flow.
AutoGen	Conversational multi-agent	CC-BY 4.0 / MIT	Microsoft Research origin. Multiple agents converse to solve tasks. Cleaner abstractions than LangChain's early generations.
CrewAI	Role-based teams	MIT	Higher-level than AutoGen. "Crew" of role-typed agents. Lightweight and fast to prototype with.
OpenAI Agents SDK	OpenAI-native	MIT (SDK)	Released 2025. Tight integration with OpenAI tools. Open-source SDK against a closed runtime — caveat emptor.
Frankenstellm (us)	Multi-organ runtime	MIT	Native C++/CUDA cognitive runtime. One Q4 7B top-brain + 8 surgered specialist organs (5 GREEN after BD9), Black-Dog conductance router, line-addressable memory spine.
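
The "explicit state-machine" style mentioned for LangGraph can be shown without any framework. The sketch below is plain Python, not the LangGraph API: each node is a function that updates a shared state dict and returns the name of the next node. The node names, the `run` helper and the fake tool call are all assumptions for illustration.

```python
from typing import Callable, Dict

State = Dict[str, object]
Node = Callable[[State], str]  # a node mutates the state and names the next node


def plan(state: State) -> str:
    state["plan"] = f"look up: {state['question']}"
    return "act"


def act(state: State) -> str:
    # Stand-in for a tool call or model call.
    state["observation"] = str(state["plan"]).upper()
    return "answer"


def answer(state: State) -> str:
    state["answer"] = f"done ({state['observation']})"
    return "END"


NODES: Dict[str, Node] = {"plan": plan, "act": act, "answer": answer}


def run(state: State, start: str = "plan", max_steps: int = 10) -> State:
    """Walk the graph until a node returns "END" or the step budget runs out."""
    node, steps = start, 0
    while node != "END":
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted")
        node = NODES[node](state)
        steps += 1
    return state


result = run({"question": "license of Qwen 2.5"})
print(result["answer"])  # done (LOOK UP: LICENSE OF QWEN 2.5)
```

The step budget is the important part for production use: control flow is visible in one table, and a looping agent fails loudly instead of running forever.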

6 · Embeddings + vector databases.

Layer	Tool	License	Position
Embedding model	sentence-transformers	Apache 2.0	The default Python interface for embeddings. Wraps any HF model into `.encode()`.
Embedding model	BGE / GTE / E5 / Nomic	MIT / Apache 2.0 (varies by model)	Top-of-MTEB-leaderboard families. BGE-M3 is multilingual; GTE-Qwen was the strongest open model as of mid-2025.
Vector DB (in-process)	FAISS	MIT	Library, not a service. Best for embedded use and reproducible research. Limited operationally.
Vector DB (server)	Qdrant	Apache 2.0	Rust, strong query language, payload filtering. Production-ready single-node and cluster.
Vector DB (server)	Milvus	Apache 2.0	LF AI & Data Foundation graduate. Heavyweight but mature. Best for very large indexes and multi-tenant deployments.
Vector DB (light)	Chroma	Apache 2.0	Python-friendly, easy to embed. Best for prototypes and small deployments.
Vector DB (Postgres)	pgvector	PostgreSQL	A Postgres extension that adds a vector column type and similarity indexes to your existing database. Best when you want one database, not two.
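
Every store in this table answers the same basic query: given a query vector, return the nearest stored vectors. The sketch below is the brute-force version of that query using cosine similarity in pure Python. It is a conceptual baseline, not how any of the listed tools is implemented; the toy vectors and the helper names are made up, and real embeddings come from a model rather than being typed by hand.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """index maps document id -> vector. Returns the k best (id, score) pairs."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
hits = top_k([1.0, 0.0, 0.0], index, k=2)
print([doc_id for doc_id, _ in hits])  # ['doc-a', 'doc-b']
```

This scan is linear in the number of stored vectors. The tools above earn their place by adding approximate indexes, filtering, persistence and operations on top of the same query.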

7 · Image / video / audio.

Modality	Project	License	Position
Image gen	Stable Diffusion XL / 3.5	SAI license	SDXL is the production baseline; SD 3.5 is sharper but ships under a more restrictive license. Pair with ComfyUI or A1111 for UX.
Image gen	FLUX.1 (Black Forest Labs)	FLUX license (dev / pro)	Higher quality than SDXL on prompts that need fidelity. Read the license carefully: dev is non-commercial.
Image gen	HiDream	MIT (some)	Strong open alternative for 2025+. Worth comparing against FLUX dev.
Image edit	InstructPix2Pix, Anydoor	MIT / Apache	Edit-by-prompt. Reasonable for simple changes; fragile on complex scenes.
Video gen	HunyuanVideo / Mochi / WAN 2.1	Various	2025 wave of open video. Quality has caught up to commercial Sora-class on short clips. 30+ GB VRAM still typical.
ASR	Whisper / Distil-Whisper / Faster-Whisper	MIT	OpenAI's gift. Distil-Whisper and Faster-Whisper each report speedups of roughly 4× or more over the reference implementation on the same hardware. Multilingual.
TTS	OpenVoice / XTTS / Piper / Kokoro	MIT / Apache; XTTS weights under the non-commercial Coqui Public Model License	Voice cloning at usable quality, on consumer hardware. Kokoro is the smallest, runs on CPU.

8 · Training data.

The frontier models train on multi-trillion-token corpora that are not redistributable. The open-data world is recovering — slowly — through projects that publish the recipe and curated subsets:

Dataset	Scale	License	Position
FineWeb / FineWeb-Edu	15T tokens	ODC-BY	Hugging Face's curated Common Crawl. Educational subset is the modern training-data baseline.
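RedPajama-V2	30T tokens	Various source	Together's web-only corpus built from Common Crawl, shipped with quality signals and deduplication metadata; V1 was the reproduction of the LLaMA training mix. Heavy but well-documented.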
The Pile / Pile-Uncopyrighted	825 GB	Various	EleutherAI's classic. Pile-Uncopyrighted is the legally cleaner subset.
Dolma	3T tokens	ODC-BY	Allen AI's pre-training corpus for OLMo. Recipe and tooling fully open.
SlimPajama / RefinedWeb	627B / 600B	Various	High-quality deduplicated subsets. Practical for medium-scale pretraining.
Common Crawl	Billions of pages, hundreds of TiB per crawl	Common Crawl terms of use	The raw input. Most of the above are processed subsets of CC.
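
Deduplication is the step that turns raw crawl into corpora like the ones above. The sketch below shows near-duplicate removal by Jaccard similarity over word shingles. It is a simplified illustration with made-up function names and an arbitrary 0.8 threshold; pipelines at corpus scale typically approximate this with MinHash and locality-sensitive hashing, because the pairwise loop here is quadratic in the number of documents.

```python
def shingles(text: str, n: int = 3) -> set:
    """Set of lowercase word n-grams; short texts become a single shingle."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dedup(docs: list, threshold: float = 0.8) -> list:
    """Keep a document only if it is not too similar to any document already kept."""
    kept, kept_shingles = [], []
    for doc in docs:
        current = shingles(doc)
        if all(jaccard(current, other) < threshold for other in kept_shingles):
            kept.append(doc)
            kept_shingles.append(current)
    return kept


docs = [
    "the quick brown fox jumps over the lazy dog",
    "The quick brown fox jumps over the lazy dog",   # same text, different case
    "open weights are not the same as open data",
]
print(len(dedup(docs)))  # 2
```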

9 · Where CyberdyneLabs sits in this stack.

Six programs, all open. We sit in the open-weights tier: we ship deltas, not donor weights, because the donor (Qwen 2.5) is already openly available under Apache 2.0. The full position:

Layer	Our contribution	License	Where
Runtime	Frankenstellm — single C++/CUDA binary (gigachad_native), Physarium-7B Q4 at 83.58 tok/s on 3060 Ti	MIT	/frankenstellm
Fine-tuning	Surgery doctrine — QLoRA + 4-axis gate + revert ledger	MIT (code) / CC-BY-SA (docs)	/surgery
Multi-organ orchestration	Frankenstellm — 7B brain + 8 specialist organs, Black-Dog router	MIT	/frankenstellm
Cognitive engine	ADAM — 1.2M concepts, Cl(3,0) dynamics, line-addressable spine	MIT	/adam
Embodied AI	MACHINA — N-d cognitive substrate world simulator	MIT	/machina
Layer-1 substrate	PhysarumChain — bio-routed L1, Cl(4,1) addresses, AMM	MIT	/physarum
4D agent ecosystem	Hypercolony — 1024 agents on hypercube, Ibn Khaldun cycle	MIT	/hypercolony
Doctrine	Truth ledger, ARIZ kernel, Black-Dog learning loop	CC-BY-SA 4.0	/downloads
Reports	66 dated research reports, individually addressable	CC-BY-SA 4.0	/r/

We do not run a paid API gateway, a metered weight-rental service, or a closed managed offering. The entire stack ships from /downloads as a single doctrine pack (24 docs) and reports archive (66 reports) plus the runtime binary. Patient capital, deliberate engineering, an aversion to premature deployment.