Open source AI · the 2026 stack

Open source AI — the full stack, with honest positions on every layer.

Open-source AI in 2026 is not a checkbox. It is a stack — open weights, an inference runtime that runs locally, a fine-tuning toolchain, an agent framework, an embedding model, and a vector store — each layer with several real options and several marketing-only ones. This page maps every layer, names the projects that actually matter, and gives the position CyberdyneLabs takes on each.

Companion pages: /ai (the field map), /run-ai-locally (the practical guide), /ai-faq (40 dated answers), /glossary (vocabulary).

1 · What "open" actually means in 2026.

The word "open" is doing a lot of work in this field, and not all of it honestly. There is a real spectrum from "fully reproducible" to "you can use the API for free."

| Tier | Code | Weights | Training data | Examples |
| --- | --- | --- | --- | --- |
| Truly open | Public, permissive | Public, permissive | Public, recipe included | OLMo, Pythia, K2 |
| Open weights | Public | Public, permissive | Recipe partial, data not redistributable | Llama 3, Mistral, Qwen, DeepSeek, Gemma, Phi |
| Open access | Some public | Public, restrictive license | Closed | Llama 2 (older), some Gemma |
| Free API | Closed | Closed | Closed | GPT-4o-mini, Claude-Instant trial, etc. |

For most practical purposes — running locally, fine-tuning, redistribution — the second tier (open weights) is what you want. Truly-open projects (OLMo, Pythia) trade some quality for full reproducibility and are valuable for research; the rest is positioning.
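
The tier table can be read as a small decision procedure. The sketch below is one possible encoding, not an official taxonomy: the `Release` fields and the `openness_tier` function are hypothetical names introduced here for illustration, and real licenses need a human read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """Hypothetical description of a model release (field names are illustrative)."""
    code_public: bool
    weights_public: bool
    weights_permissive: bool  # license allows redistribution and commercial use
    data_public: bool         # training data and recipe are both published


def openness_tier(r: Release) -> str:
    """Map a release onto the four tiers described in this section."""
    if not r.weights_public:
        return "Free API"        # nothing to download, only an endpoint
    if not r.weights_permissive:
        return "Open access"     # weights exist but the license is restrictive
    if r.code_public and r.data_public:
        return "Truly open"      # code, weights, and data recipe all public
    return "Open weights"


print(openness_tier(Release(True, True, True, False)))
```

A release with public code and permissive weights but no redistributable data lands in the second tier, which is the one most local and fine-tuning work relies on.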

2 · Open-weights model families.

Six open-weights families dominate 2026; the final row adds the truly open OLMo and Pythia projects for comparison. Each row below is the family at large; specific size/checkpoint licensing varies.

| Family | Sizes (B) | License | Position |
| --- | --- | --- | --- |
| Qwen 2.5 (Alibaba) | 0.5 / 1.5 / 3 / 7 / 14 / 32 / 72 | Apache 2.0 (most) | Strongest 7B-class assistant per parameter. Multilingual. Permissive. Donor for our /surgery work. |
| Llama 3 / 3.1 / 3.2 / 3.3 (Meta) | 1 / 3 / 8 / 70 / 405 | Custom permissive | Heavy reasoning, English-leaning, big benchmark presence. Read the actual license: there are clauses on naming and acceptable use. |
| Mistral / Mixtral (Mistral AI) | 7 (Mistral) / 8×7 / 8×22 (Mixtral) | Apache 2.0 (community), commercial paywall | Strong code, fast inference. Mixtral was the first widely deployed MoE in the open-weights world. |
| DeepSeek V3 / V4-Flash | 671 / 284 (active 37 / 13) | Custom | Pushed the capability/cost ratio dramatically. V4-Flash (284 B / 13 B active, 159 GB weights) was closed by CyberdyneLabs as Surgery Case 01: end-to-end on a single 8 GB RTX 3060 Ti via our own native C++/CUDA engine and PLANCK_PACK expert streaming. Decode: 1.86 tok/s warm → 0.16 tok/s full 43-layer text, disk-I/O-bound. See /r/V4_FLASH_TECH_BRIEF. |
| Gemma 2 / 3 (Google) | 2 / 9 / 27 | Gemma license | Strong instruction-following per parameter. License is restrictive; read it before commercial use. |
| Phi-4 (Microsoft) | 14 | MIT | Strong reasoning per parameter. MIT-licensed weights. Excellent for code and math. |
| OLMo / Pythia (AI2 / EleutherAI) | 1 – 32 | Apache 2.0 | Truly open: code, weights, and data recipe all public. Lower benchmark scores; high research value. |

For 2026 production, the practical choice usually narrows to Qwen 2.5 7B/72B or Llama 3.3 70B for general-purpose work, with one of the DeepSeek MoE models if you need very large parameter counts on small hardware.
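
The "very large parameter counts on small hardware" point comes down to simple arithmetic: an MoE model must store every parameter, but only the active ones are touched per token. The sketch below works through the V4-Flash figures quoted above (284 B parameters, 159 GB of weights) and assumes 1 GB = 10^9 bytes; the function names are illustrative and not part of any runtime.

```python
def bits_per_weight(total_params: float, weight_bytes: float) -> float:
    """Average storage cost per parameter implied by a checkpoint size."""
    return weight_bytes * 8 / total_params


def weight_footprint_gb(total_params: float, bits: float) -> float:
    """Checkpoint size in GB (10^9 bytes) at a given average bits per weight."""
    return total_params * bits / 8 / 1e9


# Figures from the DeepSeek row: 284 B parameters stored in 159 GB.
v4_bits = bits_per_weight(284e9, 159e9)
print(f"V4-Flash average: {v4_bits:.2f} bits/weight")

# The same formula for dense models at an assumed ~4.5 bits/weight quantization.
for name, params in [("7B", 7e9), ("72B", 72e9)]:
    print(f"{name}: {weight_footprint_gb(params, 4.5):.1f} GB")
```

At roughly 4.5 bits per weight a 7B model fits comfortably in 8 GB of VRAM, while a 284 B checkpoint does not fit at any practical quantization, which is why the weights have to be streamed from disk.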

3 · Inference runtimes.

| Runtime | Strength | License | Position |
| --- | --- | --- | --- |
| llama.cpp | Footprint, multi-backend | MIT | Default open-source local runtime. CPU/CUDA/Metal/ROCm/Vulkan. GGUF format. Pair with Ollama or LM Studio for friction-free use. |
| vLLM | Throughput, serving | Apache 2.0 | Production GPU inference at scale. PagedAttention + continuous batching. The default for self-hosted serving. |
| Ollama | Low friction, UX | MIT | Wraps llama.cpp with a Docker-style CLI. Run `ollama run qwen2.5:7b` and you are talking to an LLM. |
| MLC-LLM | Compile-everywhere | Apache 2.0 | TVM-based compiler. Targets WebGPU, Metal, Vulkan, ROCm. Best for browser and exotic-backend deployment. |
| TGI | HF integration | Apache 2.0 | Hugging Face's serving stack. Tight integration with the Hub. Good for HF-native shops. |
| ExLlamaV2 | EXL2 quant speed | MIT | Very fast on NVIDIA; the EXL2 quant scheme sits on a good quality/size frontier. Niche but excellent. |
| Frankenstellm (us) | Multi-organ system, single C++/CUDA binary (gigachad_native) | MIT | No PyTorch, no cuBLAS. 83.58 tok/s Q4 7B on RTX 3060 Ti. Hologram cache (860× repeat speedup), Black-Dog router, expert streaming for V4-Flash. |

The decision tree: local dev / laptop → Ollama or llama.cpp; self-hosted serving for many users → vLLM or TGI; browser / mobile → MLC-LLM; NVIDIA-only with EXL2 quants → ExLlamaV2; writing your own runtime as a research target → start by reading llama.cpp source, then ours.
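
The decision tree above can be written down as a lookup table. This is only a sketch of the mapping in the paragraph; the target keys and the `pick_runtime` name are hypothetical labels chosen for this example.

```python
# Deployment target -> runtimes suggested by the decision tree.
RUNTIME_BY_TARGET: dict[str, list[str]] = {
    "local-dev": ["Ollama", "llama.cpp"],
    "multi-user-serving": ["vLLM", "TGI"],
    "browser-or-mobile": ["MLC-LLM"],
    "nvidia-exl2": ["ExLlamaV2"],
    "research-runtime": ["llama.cpp source", "Frankenstellm source"],
}


def pick_runtime(target: str) -> list[str]:
    """Return the suggested runtimes for a target, or fail loudly on an unknown one."""
    try:
        return RUNTIME_BY_TARGET[target]
    except KeyError:
        known = ", ".join(sorted(RUNTIME_BY_TARGET))
        raise ValueError(f"unknown target {target!r}; expected one of: {known}") from None


print(pick_runtime("local-dev"))
```

Keeping the mapping as data rather than nested conditionals makes it easy to revise when a runtime's position changes.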

4 · Fine-tuning toolchains.

| Tool | What it is | License | Position |
| --- | --- | --- | --- |
| TRL | Hugging Face Transformer Reinforcement Learning (RLHF) library | Apache 2.0 | The reference implementation for SFT, DPO, PPO, GRPO. Pair with the PEFT and Transformers libraries. |
| axolotl | YAML-driven training | Apache 2.0 | Configure-by-YAML wrapper around TRL/DeepSpeed. Most-used community fine-tuning pipeline. |
| Unsloth | Fast, memory-efficient | Apache 2.0 | Custom Triton kernels. ~2× faster QLoRA training than vanilla. Excellent for single-GPU work. |
| QLoRA (paper + bitsandbytes) | 4-bit base + LoRA delta | MIT (bitsandbytes) | The technique that made 65B-class models fine-tunable on a single 48 GB GPU (and 33B-class on 24 GB). Underpins almost everything below the multi-GPU tier. |
| DeepSpeed | ZeRO sharding | Apache 2.0 | For multi-GPU full fine-tuning. Pairs with Megatron-LM for very large training runs. |
| Surgery doctrine (us) | 4-axis acceptance gating | MIT (code) / CC-BY-SA (docs) | Wraps QLoRA with anchor / strict-schema / target-bench / no-leak gates. Reverts kept in a public ledger. See /surgery and the BD-series reports in /r/. |
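
The "4-bit base + LoRA delta" idea in the QLoRA row is easiest to see as arithmetic. The sketch below assumes a plain 4 bits per base weight (ignoring quantization constants, activations, and optimizer state) and a single 4096 × 4096 projection as the adapted matrix; the function names and the chosen rank are illustrative only.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def base_weight_gb(n_params: float, bits: int = 4) -> float:
    """Memory for the frozen, quantized base weights, in GB (10^9 bytes)."""
    return n_params * bits / 8 / 1e9


full_matrix = 4096 * 4096                      # parameters in the frozen projection
adapter = lora_params(4096, 4096, rank=16)     # parameters actually trained
print(f"adapter / full matrix = {adapter / full_matrix:.4%}")

# Frozen 65B base at 4 bits: weights only, before activations and adapter state.
print(f"65B base weights: {base_weight_gb(65e9):.1f} GB")
```

The adapter is under one percent of the matrix it modifies, so the gradient and optimizer memory shrink accordingly, while the frozen base is held at a quarter of its 16-bit size.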

5 · Agent frameworks.

| Framework | Style | License | Position |
| --- | --- | --- | --- |
| LangChain | Composable Python pipeline | MIT | Largest community, biggest surface area, most legacy code. Useful as a starting point; many production teams replace it with a thinner stack as scale grows. |
| LangGraph | Stateful multi-agent | MIT | Explicit state-machine flavour. Good fit for production agents with clear control flow. |
| AutoGen | Conversational multi-agent | CC-BY 4.0 (docs) / MIT (code) | Microsoft Research origin. Multiple agents converse to solve tasks. Cleaner abstractions than LangChain's early generations. |
| CrewAI | Role-based teams | MIT | Higher-level than AutoGen. A "crew" of role-typed agents. Lightweight and fast to prototype with. |
| OpenAI Agents SDK | OpenAI-native | MIT (SDK) | Released 2025. Tight integration with OpenAI tools. Open-source SDK against a closed runtime; caveat emptor. |
| Frankenstellm (us) | Multi-organ runtime | MIT | Native C++/CUDA cognitive runtime. One Q4 7B top-brain + 8 surgered specialist organs (5 GREEN after BD9), Black-Dog conductance router, line-addressable memory spine. |
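
The "explicit state-machine flavour" mentioned for stateful agents can be illustrated without any framework. The sketch below is plain Python and does not use the LangGraph API; the node names, the `run` helper, and the stand-in "tool call" are all hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, str]
Node = Callable[[State], str]  # a node updates the state and names the next node


def plan(state: State) -> str:
    state["plan"] = f"look up: {state['question']}"
    return "act"


def act(state: State) -> str:
    # Stand-in for a tool or model call; a real agent would do I/O here.
    state["observation"] = state["plan"].upper()
    return "answer"


def answer(state: State) -> str:
    state["answer"] = f"Result -> {state['observation']}"
    return "END"


NODES: Dict[str, Node] = {"plan": plan, "act": act, "answer": answer}


def run(state: State, start: str = "plan", max_steps: int = 10) -> List[str]:
    """Walk the graph until a node returns END; return the visited node names."""
    node, trace = start, []
    for _ in range(max_steps):
        trace.append(node)
        node = NODES[node](state)
        if node == "END":
            return trace
    raise RuntimeError("step budget exhausted before reaching END")


state: State = {"question": "capital of France"}
print(run(state), state["answer"])
```

Because every transition is a named edge and the step budget is explicit, the control flow can be logged, tested, and bounded, which is the property production teams tend to want.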

6 · Embeddings + vector databases.

| Layer | Tool | License | Position |
| --- | --- | --- | --- |
| Embedding model | sentence-transformers | Apache 2.0 | The default Python interface for embeddings. Wraps any HF model into .encode(). |
| Embedding model | BGE / GTE / E5 / Nomic | Apache 2.0 (most) | Top-of-MTEB-leaderboard families. BGE-M3 is multilingual; GTE-Qwen is the strongest open model as of mid-2025. |
| Vector DB (in-process) | FAISS | MIT | Library, not a service. Best for embedded use and reproducible research. Limited operationally. |
| Vector DB (server) | Qdrant | Apache 2.0 | Rust, strong query language, payload filtering. Production-ready single-node and cluster. |
| Vector DB (server) | Milvus | Apache 2.0 | LF AI & Data graduate. Heavyweight but mature. Best for very large indexes and multi-tenant deployments. |
| Vector DB (light) | Chroma | Apache 2.0 | Python-friendly, easy to embed. Best for prototypes and small deployments. |
| Vector DB (Postgres) | pgvector | PostgreSQL License | A vector index inside your existing Postgres. Best when you want one database, not two. |
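
What every vector store in this section does, at its core, is rank stored embeddings by similarity to a query embedding. The sketch below shows the exact brute-force version with cosine similarity in plain Python; the toy three-dimensional vectors and the `top_k` helper are made up for illustration, and it assumes no zero-length vectors.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 2) -> List[str]:
    """Exact nearest neighbours by linear scan: score everything, keep the best k."""
    scored = sorted(((cosine(query, vec), doc_id) for doc_id, vec in index.items()), reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy embeddings; a real system would get these from an embedding model.
INDEX = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}

print(top_k([1.0, 0.0, 0.0], INDEX))
```

The linear scan costs one similarity computation per stored vector. Dedicated stores replace it with approximate nearest-neighbour indexes and add filtering, persistence, and replication on top.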

7 · Image / video / audio.

| Modality | Project | License | Position |
| --- | --- | --- | --- |
| Image gen | Stable Diffusion XL / 3.5 | Stability AI (SAI) licenses | SDXL is the production baseline; SD 3.5 is sharper but has a more restrictive license. Pair with ComfyUI or A1111 for UX. |
| Image gen | FLUX.1 (Black Forest Labs) | FLUX license (dev / pro) | Higher quality than SDXL on prompts that need fidelity. Read the license carefully: dev is non-commercial. |
| Image gen | HiDream | MIT (some) | Strong open alternative for 2025+. Worth comparing against FLUX dev. |
| Image edit | InstructPix2Pix, AnyDoor | MIT / Apache | Edit-by-prompt. Reasonable for simple changes; fragile on complex scenes. |
| Video gen | HunyuanVideo / Mochi / WAN 2.1 | Various | 2025 wave of open video. Quality has caught up to commercial Sora-class on short clips. 30+ GB VRAM still typical. |
| ASR | Whisper / Distil-Whisper / Faster-Whisper | MIT | OpenAI's gift. Distil-Whisper and Faster-Whisper are roughly 4 to 6× faster on the same hardware. Multilingual. |
| TTS | OpenVoice / XTTS / Piper / Kokoro | MIT / Apache (XTTS: Coqui Public Model License) | Voice cloning at usable quality, on consumer hardware. Kokoro is the smallest and runs on CPU. |

8 · Training data.

The frontier models train on multi-trillion-token corpora that are not redistributable. The open-data world is recovering — slowly — through projects that publish the recipe and curated subsets:

| Dataset | Scale | License | Position |
| --- | --- | --- | --- |
| FineWeb / FineWeb-Edu | 15T tokens | ODC-BY | Hugging Face's curated Common Crawl. The educational subset is the modern training-data baseline. |
| RedPajama-V2 | 30T tokens | Varies by source | Together's Common Crawl corpus with quality signals; the V1 release was the reproduction of the LLaMA training mix. Heavy but well-documented. |
| The Pile / Pile-Uncopyrighted | 825 GB | Various | EleutherAI's classic. Pile-Uncopyrighted is the legally cleaner subset. |
| Dolma | 3T tokens | ODC-BY | Allen AI's pre-training corpus for OLMo. Recipe and tooling fully open. |
| SlimPajama / RefinedWeb | 627B / 600B tokens | Various | High-quality deduplicated subsets. Practical for medium-scale pretraining. |
| Common Crawl | Petabyte-scale archive; hundreds of TiB per crawl | Common Crawl Terms of Use | The raw input. Most of the above are processed subsets of CC. |
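
Deduplication is the step that turns raw crawl text into the "deduplicated subsets" listed above. The sketch below shows only the simplest form, exact matching after whitespace and case normalization; it is not the pipeline used by any of the named datasets, which typically add fuzzy near-duplicate detection on top. The `normalize` and `dedup` names are illustrative.

```python
import hashlib
import re
from typing import Iterable, List


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivially different copies hash alike."""
    return re.sub(r"\s+", " ", text).strip().lower()


def dedup(docs: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each document, comparing normalized SHA-256 digests."""
    seen = set()
    kept = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept


corpus = ["Hello   world", "hello world", "Something else"]
print(dedup(corpus))
```

Storing digests instead of full texts keeps memory bounded by the number of unique documents, which matters long before a corpus reaches web scale.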

9 · Where CyberdyneLabs sits in this stack.

Six programs, all open. We sit in the open-weights tier: we ship deltas, not donor weights, because the donor (Qwen 2.5) is already openly available under Apache 2.0. The full position:

| Layer | Our contribution | License | Where |
| --- | --- | --- | --- |
| Runtime | Frankenstellm: single C++/CUDA binary (gigachad_native), Physarium-7B Q4 at 83.58 tok/s on 3060 Ti | MIT | /frankenstellm |
| Fine-tuning | Surgery doctrine: QLoRA + 4-axis gate + revert ledger | MIT (code) / CC-BY-SA (docs) | /surgery |
| Multi-organ orchestration | Frankenstellm: 7B brain + 8 specialist organs, Black-Dog router | MIT | /frankenstellm |
| Cognitive engine | ADAM: 1.2M concepts, Cl(3,0) dynamics, line-addressable spine | MIT | /adam |
| Embodied AI | MACHINA: N-d cognitive substrate world simulator | MIT | /machina |
| Layer-1 substrate | PhysarumChain: bio-routed L1, Cl(4,1) addresses, AMM | MIT | /physarum |
| 4D agent ecosystem | Hypercolony: 1024 agents on hypercube, Ibn Khaldun cycle | MIT | /hypercolony |
| Doctrine | Truth ledger, ARIZ kernel, Black-Dog learning loop | CC-BY-SA 4.0 | /downloads |
| Reports | 66 dated research reports, individually addressable | CC-BY-SA 4.0 | /r/ |

We do not run a paid API gateway, a metered weight-rental service, or a closed managed offering. The entire stack ships from /downloads as a single doctrine pack (24 docs) and reports archive (66 reports) plus the runtime binary. Patient capital, deliberate engineering, an aversion to premature deployment.