Open Source · Apache 2.0 · Air-Gap Ready · Federated Knowledge · Public Release: April 13, 2026

Sovereign AI Infrastructure.
Self-Hosted. Deterministic. Graph-accumulating.

MoE Sovereign is a template-based Multi-Model Orchestrator that runs entirely on your own hardware. Requests are classified, routed to specialised LLM experts, enriched by a knowledge graph and real-time web search, then synthesised by a Judge model — all without sending data to external APIs. Community knowledge bundles enable Federated Knowledge Sync where every deployment enriches the collective intelligence.

curl -sSL https://raw.githubusercontent.com/h3rb3rn/moe-sovereign/main/install.sh | bash
15 Expert Domains
51 MCP Precision Tools
9.3× Accumulation Speedup
46.7 % GAIA Score (Level 1 · n=30 · GPT-4o Mini 44.8 %)
0 Mandatory Cloud Calls
1M+ Effective Context Tokens (Tier-2 Memory)

Project Resources

Documentation, source code, and community tools for getting started with your own MoE Sovereign deployment.

portal.moe-sovereign.org ↗

User self-service: create API keys, configure Claude Code profiles, assign expert templates, view token usage.

chat.moe-sovereign.org ↗

Open WebUI with a direct connection to the MoE API. All model IDs and expert modes available immediately.

api.moe-sovereign.org ↗

OpenAI-compatible and Anthropic Messages API endpoint. Per-user token limits enforced server-side.

moe-libris.org ↗

Federated knowledge exchange: share and import knowledge graph entries across sovereign MoE instances.

moe-codex.org ↗

EU-sovereign compliance data platform: catalog, approval workflows, lineage, versioning, and drift detection for regulated deployments.

moe-admin.de ↗

German version of the project website with full project description.

moe-sovereign.org

This page — the international English version of the project.

What is MoE Sovereign?

Instead of a single massive model on an expensive GPU, many specialized models are coordinated — each running on the hardware best suited for it.

The Problem

Large Language Models (LLMs) in the class of GPT-4 or Claude either require significant investment in GPU hardware for self-hosting or create a permanent cloud dependency with the corresponding privacy risks. For businesses, research institutions, and privacy-conscious users, neither path is ideal.

The Solution: Multi-Model Orchestrator + Flexible Backend

MoE Sovereign distributes inference across a cluster of nodes. Each request is analyzed by an intelligent planner and routed to the appropriate expert models, and the results are synthesized by a merger model. For structured knowledge and research tasks, the outcome is on par with smaller cloud models, with full data control as an option and at a fraction of the running cost.

Privacy by Design is an architectural option, not a constraint: The inference backends can be Ollama instances on your own hardware, but equally well Claude API endpoints, your own enterprise AI hubs, or cloud inference services. The MoE system is the routing layer — it is decoupled from the hardware.

The API is fully OpenAI-compatible and implements the Anthropic Third-Party Inference Gateway spec, so existing tools like Open WebUI, Claude Desktop, Claude Cowork, Claude Code, or any OpenAI SDK integration work without modification.

Federated Knowledge Ecosystem

Knowledge bundles enable structured exchange of domain-specific knowledge graphs between independent deployments. Each instance remains autonomous and offline-capable; shared bundles enrich the local graph without transferring source data or proprietary information.

Privacy by Design

As an option: all data stays on your own infrastructure. Not a single API call leaves your network — if you want.

Cloud Flexibility

Equally usable as a routing layer in front of cloud services: Claude, Gemini, Azure OpenAI, or your own enterprise AI hubs.

Legacy Hardware

Tesla K80 to RTX 3060: retired enterprise hardware and affordable consumer GPUs are sufficient for distributed inference.

Open Source

Fully licensed under Apache 2.0. No vendor lock-in, no hidden costs, no proprietary stack.

Up to 75 % Lower API Costs

Not every question needs a 100-billion-parameter model. MoE Sovereign classifies requests heuristically and routes them to the cheapest model that can solve the task — without LLM overhead for the classification itself.

~75 % API cost reduction (vs. routing all requests to a cloud LLM)
0 € marginal cost per request when self-hosted on local hardware
70–85 % of requests routed to self-hosted models (trivial & moderate requests)
~1,600 tokens saved per cache hit (planner cache, Redis L2)

Intelligent Routing by Complexity

Heuristic complexity routing classifies every request without an LLM call into three tiers — loading expensive models only when truly needed.

~55 % Trivial: Self-Hosted T1 (qwen3.5:32b, phi4:14b), 0 € / request
~30 % Moderate: Self-Hosted T2 (qwen3.5:72b, mistral-large), 0 € / request
~15 % Complex: optional Cloud API (Claude, GPT-4) or Self-Hosted 120B+
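A minimal sketch of what LLM-free, rule-based tiering could look like. The keyword list, the word-count thresholds, and the function name `classify_complexity` are illustrative assumptions; the project's actual rule set is not shown on this page.

```python
import re

# Hypothetical signals, chosen for illustration only.
COMPLEX_HINTS = re.compile(
    r"\b(prove|architecture|refactor|compare|analy[sz]e|diagnos\w*)\b", re.I
)
CODE_HINT = re.compile(r"\bdef \w+\(|\bclass \w+|\bSELECT\b")


def classify_complexity(prompt: str) -> str:
    """Return 'trivial', 'moderate' or 'complex' using fixed rules only."""
    words = len(prompt.split())
    score = 0
    if words > 40:                      # longer prompts tend to need more work
        score += 1
    if words > 200:
        score += 1
    if COMPLEX_HINTS.search(prompt):    # analytical vocabulary
        score += 1
    if CODE_HINT.search(prompt):        # embedded code or SQL
        score += 1
    if prompt.count("?") > 1:           # several questions in one request
        score += 1
    if score == 0:
        return "trivial"
    return "moderate" if score == 1 else "complex"
```

Because the function contains no learned component and no randomness, the same prompt always lands in the same tier, which is the property the section describes as deterministic routing.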

Internal benchmark result (reference setup): In the AIHUB H200 benchmark (proprietary, internal evaluation framework), the M10 Council template (8 experts on legacy hardware with gpt-oss:120b + qwen-3.5:122b) scored 9/9 (100 %) — fully self-hosted, 0 cloud API calls. In the demo reference deployment, Claude Code using the moe-orchestrator-agent-orchestrated profile delegates over 80 % of sub-tasks to self-hosted experts.

[Charts: Request Distribution by Complexity; Cost Comparison vs. Pure Cloud]

Everything in One Stack

MoE Sovereign ships with all components needed for a production-ready AI infrastructure.

OpenAI & Anthropic API

Drop-in replacement for OpenAI endpoints and Anthropic Messages API. Every compatible tool works without code changes.

Claude Desktop & Cowork

Full Anthropic Third-Party Inference Gateway compatibility. Claude Desktop and Claude Cowork route all inference through your MoE Sovereign cluster — no prompt ever leaves your own infrastructure. One-command setup via scripts/setup-claude-desktop.sh.

15 Expert Domains

Specialized LLMs for law, medicine, code, mathematics, translation, security, vision and more — coordinated by a Judge model with two-tier escalation.

51 MCP Precision Tools

Deterministic calculations via AST-whitelist: math, date arithmetic, unit conversion, network tools, code review, file generation — 100% accuracy, zero hallucination.

GraphRAG & Knowledge Graph

Neo4j-based knowledge graph with 2-hop traversal, automatic ingest via Kafka, and feedback integration.

4-Layer Caching

ChromaDB semantic cache, Redis plan cache, GraphRAG cache, and performance scores reduce latency and GPU load.

Private Web Search

SearXNG meta-search engine without tracking for research queries — fully self-hosted, no external search requests.

User Management

API keys, token budgets, Claude Code profiles, and expert templates configurable per user — via admin UI or REST API.

Monitoring & Observability

Prometheus metrics, 5 pre-built Grafana dashboards, real-time pipeline logs via WebSocket in the admin UI.

Starfleet — Ambient Intelligence

LCARS-style status dashboard with a proactive watchdog alert loop, live node health (15 s polling), email escalation with per-alert cooldown, cross-session mission context with per-template system-prompt injection, and hot-reloadable thresholds — no container restart required.

Deterministic Complexity Routing

Rule-based request classification (trivial/moderate/complex) without an LLM call — deliberately non-learned, and therefore fully transparent, reproducible and free of black-box decisions. Saves up to 80% of pipeline costs for simple queries.

Self-Correction Loop

Feedback (rating 1–5) flows into expert performance scores and few-shot examples — the system learns from mistakes automatically.

Vision & Multimodal

Image, screenshot, and document analysis via Base64 input through multimodal Tier-2 expert models.

Kafka Event Streaming

Asynchronous background processing: GraphRAG ingest, request audit log, and feedback processing decoupled from the HTTP path.

Thompson Sampling (RL)

Stochastic expert scoring via Beta distribution instead of static Laplace estimates. Natural exploration of underutilized experts without cold-start problems.
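As a rough illustration of Thompson Sampling over expert scores: each expert gets a Beta(successes + 1, failures + 1) posterior, one value is drawn per expert, and the highest draw wins. The function name `pick_expert` and the shape of `stats` are assumptions made for this sketch.

```python
import random


def pick_expert(stats: dict[str, tuple[int, int]], rng: random.Random) -> str:
    """stats maps expert name -> (successes, failures); returns the sampled winner."""
    draws = {
        name: rng.betavariate(successes + 1, failures + 1)  # uniform prior Beta(1, 1)
        for name, (successes, failures) in stats.items()
    }
    return max(draws, key=draws.get)
```

An expert with no history samples from Beta(1, 1), which is uniform on [0, 1], so it is still selected from time to time. That is the exploration behaviour the paragraph above refers to.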

Correction Memory

Past corrections are stored as Neo4j nodes and automatically injected as context into expert prompts when similar queries arise.

Context Window Abstraction

Automatic budget computation per model context window. Per-template configurable history compression with GraphRAG as long-term memory.

1M-Token Context Window

Tier-2 Semantic Memory via ChromaDB: conversation turns are embedded as vectors and retrieved on demand via direct numpy cosine ranking (no HNSW approximation error). The effective context window extends well beyond any native LLM limit — model-agnostic and without additional token cost at inference time.

Agentic Re-Planning Loop

After each synthesis a gap detector checks completeness. Unresolved questions trigger a focused follow-up round automatically — no user intervention, up to 3 agentic iterations per request.
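The loop can be sketched as follows. The callables `synthesize` and `find_gaps` are hypothetical stand-ins for the pipeline's synthesis step and gap detector, and the sketch assumes the cap of 3 counts total rounds rather than follow-up rounds only.

```python
from typing import Callable

MAX_ITERATIONS = 3  # the text states up to 3 agentic iterations per request


def answer_with_replanning(
    question: str,
    synthesize: Callable[[str, list[str]], str],
    find_gaps: Callable[[str, str], list[str]],
) -> tuple[str, int]:
    """Synthesize, check for gaps, and re-plan until no gaps remain or the cap is hit."""
    open_points: list[str] = []
    answer = ""
    for iteration in range(1, MAX_ITERATIONS + 1):
        answer = synthesize(question, open_points)   # focused on open points, if any
        open_points = find_gaps(question, answer)    # completeness check
        if not open_points:
            return answer, iteration
    return answer, MAX_ITERATIONS                    # cap reached, return best effort
```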

PowerPoint Generation (MCP)

The generate_pptx MCP tool creates fully formatted presentations directly from the chat and delivers a signed download link — no export step, no manual authoring required.

Selective Template Export

In the admin UI, individual expert templates and CC profiles can be checkbox-selected for targeted export — no need to export the full set every time.

Security Hardening

SSRF protection for outbound URL requests, rate limiting at API level, and container hardening (read-only filesystem, no-new-privileges, restricted capabilities). Defense-in-depth against common attack vectors even in self-hosted deployments.
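One common way to implement SSRF protection for outbound requests is to resolve the target host and refuse any address that is not globally routable. The sketch below uses only the Python standard library; the function name `is_url_allowed` and the injectable `resolve` parameter are assumptions for illustration, not the project's code.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_url_allowed(url: str, resolve=socket.getaddrinfo) -> bool:
    """Reject URLs whose host resolves to a loopback, private or link-local address."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = resolve(parsed.hostname, parsed.port or 443)
    except OSError:
        return False                      # unresolvable hosts are refused
    for info in infos:
        address = ipaddress.ip_address(info[4][0])
        if not address.is_global:         # covers 10/8, 127/8, 169.254/16, etc.
            return False
    return bool(infos)
```

A check like this only helps if the subsequent request connects to the address that was vetted; otherwise a second DNS lookup could return a different, internal address.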

Lineage & Data Catalog

OpenLineage events flow into an embedded Marquez server; the /catalog admin page aggregates Marquez datasets, Neo4j knowledge domains, and lakeFS repositories into a single searchable, source-filterable table — Foundry-inspired cross-source browsing without leaving the admin UI.

Versioning & Branch-based Approval

Every external knowledge bundle is staged on a lakeFS branch pending/<tag>-<ts> instead of being written straight into Neo4j. Admins review pending imports on /approval and decide with one click: approve (Neo4j MERGE + lakeFS merge to main) or reject (branch delete). Explicit gate before any write hits the live graph.

NiFi ETL Fan-Out

Apache NiFi with the ListenHTTP processor receives bundle submissions and fans them out to the cluster as OpenLineage runs. The ETL layer is auditable on /enterprise — each run shows up with its inputs, outputs, and status fields in the lineage overview.

Data Health & Drift Detection

Every successful knowledge import is wrapped in a stats snapshot; compute_drift() flags entity_dedup_suppressed, zero_entities_added, entity_count_shrank and similar regressions. Events surface on the Enterprise dashboard with severity pills (ok / info / warn / crit) and persist in a Redis ring buffer (cap 500). Threshold tunable via DATA_HEALTH_DRIFT_THRESHOLD.
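To make the flag names more concrete, here is one possible reading of them as a before/after snapshot comparison. This is a guess at the semantics, not the project's `compute_drift()`: the `Snapshot` type, the `submitted` parameter, the function name `compute_drift_sketch`, and the interpretation of the threshold as a minimum added-to-submitted ratio are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    entity_count: int


def compute_drift_sketch(
    before: Snapshot, after: Snapshot, submitted: int, threshold: float = 0.5
) -> list[str]:
    """Compare two stats snapshots around an import and return regression flags."""
    flags = []
    added = after.entity_count - before.entity_count
    if added < 0:
        flags.append("entity_count_shrank")        # graph lost entities
    elif added == 0 and submitted > 0:
        flags.append("zero_entities_added")        # import had no effect
    elif submitted > 0 and added / submitted < threshold:
        flags.append("entity_dedup_suppressed")    # most entities were deduplicated away
    return flags
```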

Read-only Cypher Explorer

In-page Cypher editor at /explorer with two independent write-protection layers: a regex blacklist rejecting CREATE/DELETE/SET/MERGE/REMOVE/DROP/ALTER/GRANT/REVOKE/FOREACH plus the driver in READ_ACCESS mode. Includes preset queries and a deep-link to the standalone Neo4j Browser — ad-hoc analysis with no risk to the live graph.
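The first protection layer, the keyword blacklist, could look roughly like this. The keyword list is taken from the description above; the function name `is_read_only` and the exact pattern are assumptions.

```python
import re

WRITE_KEYWORDS = (
    "CREATE", "DELETE", "SET", "MERGE", "REMOVE",
    "DROP", "ALTER", "GRANT", "REVOKE", "FOREACH",
)
# Word boundaries keep identifiers such as "offset" from matching SET.
_WRITE_PATTERN = re.compile(r"\b(" + "|".join(WRITE_KEYWORDS) + r")\b", re.IGNORECASE)


def is_read_only(query: str) -> bool:
    """Return True when the query contains none of the blacklisted write keywords."""
    return _WRITE_PATTERN.search(query) is None
```

A pure text filter is conservative: it also rejects harmless queries that mention a keyword inside a string literal, and it cannot reason about the query's actual semantics. That limitation is a good reason to pair it with a second layer such as the read-only driver mode.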

JupyterLite Notebook in the Admin UI

Embedded JupyterLite (browser-only WebAssembly Python — no server-side kernel needed) at /notebook alongside five copy-paste snippets for the orchestrator API (export, pending-import, search, Cypher, lineage runs). Power users prototype against the live graph without installing a Python kernel anywhere. JUPYTERLITE_URL configurable for air-gapped deployments.

System Architecture

LangGraph-driven pipeline with parallel expert fan-out, 4-layer caching, and asynchronous Kafka backend.

Docker Services

Running services and their ports
Service | Image | Port | Function
LangGraph Orchestrator | Python/FastAPI | 8002 | Main service: API, pipeline, streaming
MCP Precision Tools | Python | 8003 | 51 deterministic tools (AST-whitelist)
ChromaDB | ChromaDB | 8001 | Vector database for semantic caching
Redis | Redis Stack | 6379 | Plan cache, performance scores, checkpoints
Neo4j | Neo4j 5 Community | 7474/7687 | Knowledge graph for GraphRAG
Kafka | Apache Kafka KRaft | 9092 | Event streaming, audit log, feedback loop
Prometheus | Prometheus | 9090 | Metrics (API, GPU, containers, host)
Grafana | Grafana | 3001 | 5 pre-built monitoring dashboards
SearXNG | SearXNG | 8888 | Private meta-search engine without tracking
Marquez | OpenLineage | 5000 | Lineage server: inputs/outputs of every pipeline run (optional, Enterprise Stack)
lakeFS | lakeFS | 8000 | Git-style versioning of knowledge bundles on MinIO (optional, Enterprise Stack)
Apache NiFi | NiFi | 8443 | ETL fan-out via ListenHTTP processor (optional, Enterprise Stack)

Two-Tier Model Architecture

Tier properties and escalation criteria
Tier | Parameters | VRAM (4-bit) | Usage | Escalation
T1 | ≤ 20B | 8–16 GB | Fast first opinion, most requests | When CONFIDENCE < 0.65
T2 | > 20B | 16–40 GB | Complex reasoning tasks, low confidence | Endpoint

4-Layer Caching

Layer | Cache | Backend and purpose | Retention
L1 | Semantic Cache | ChromaDB vector search; cosine distance < 0.15 → direct hit | permanent
L2 | Plan Cache | Redis: planner LLM output; saves ~1,600 tokens per hit | 30 minutes
L3 | GraphRAG Cache | Redis: Neo4j context queries; avoids redundant graph traversals | 1 hour
L4 | Performance Scores | Redis: model ratings per category; Laplace smoothing for routing | permanent
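The two cache behaviours listed for L1 to L3 (a distance threshold for semantic hits and time-based expiry for Redis entries) can be illustrated in plain Python. The constants come from the cache descriptions above; the helper names and the in-memory `TTLCache` class are stand-ins for ChromaDB and Redis, and vectors are assumed to be non-zero.

```python
import math
import time

SEMANTIC_HIT_DISTANCE = 0.15   # L1: cosine distance below this is a direct hit
PLAN_TTL_SECONDS = 30 * 60     # L2: plan cache
GRAPH_TTL_SECONDS = 60 * 60    # L3: GraphRAG cache


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def semantic_lookup(query_vec, entries):
    """entries: list of (vector, cached_answer). Return the nearest answer on a direct hit."""
    best = min(entries, key=lambda e: cosine_distance(query_vec, e[0]), default=None)
    if best is not None and cosine_distance(query_vec, best[0]) < SEMANTIC_HIT_DISTANCE:
        return best[1]
    return None


class TTLCache:
    """In-memory stand-in for a Redis key with an expiry time."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl, self._clock, self._data = ttl_seconds, clock, {}

    def put(self, key, value):
        self._data[key] = (self._clock() + self._ttl, value)

    def get(self, key):
        expires, value = self._data.get(key, (0.0, None))
        if self._clock() < expires:
            return value
        self._data.pop(key, None)   # expired or missing
        return None
```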

Three-Tier Conversation Memory — Effective 1M-Token Context Window

MoE Sovereign overcomes the native context window limits of individual models through a three-tier memory architecture. Each tier covers a different time horizon — without additional token costs at inference time.

T1 · Hot Memory (retention: current session)

The last n conversation turns verbatim in the LLM context. Zero loss, instant access, no retrieval overhead.

T2 · Warm Memory, Semantic (retention: configurable TTL)

Evicted turns are stored as nomic-embed-text vectors (768 dim.) in ChromaDB. Retrieval: direct numpy cosine ranking → topic-overlap fallback → keyword metadata filter. Guaranteed recall at 1M+ stored tokens.

T3 · Cold Memory, GraphRAG (retention: permanent)

Neo4j knowledge graph: persistently stored facts, entities, and relations. Queried automatically via GraphRAG for knowledge-intensive questions.
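A simplified sketch of the warm-memory retrieval chain: exhaustive cosine ranking over all stored turns (no approximate index), with a keyword fallback when nothing is similar enough. It collapses the three retrieval steps named above into two, uses plain Python where the description mentions numpy, and the `min_similarity` floor and the function name `recall_turns` are assumptions.

```python
import math


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def recall_turns(query_vec, query_text, turns, n_results=8, min_similarity=0.3):
    """turns: list of dicts with 'vector' and 'text'. Returns up to n_results texts."""
    # Step 1: exact ranking of every stored turn by cosine similarity.
    ranked = sorted(turns, key=lambda t: _cosine(query_vec, t["vector"]), reverse=True)
    if ranked and _cosine(query_vec, ranked[0]["vector"]) >= min_similarity:
        return [t["text"] for t in ranked[:n_results]]
    # Step 2: fallback on shared keywords when the embeddings do not match.
    keywords = {w.lower() for w in query_text.split() if len(w) > 3}
    matches = [t["text"] for t in turns if keywords & {w.lower() for w in t["text"].split()}]
    return matches[:n_results]
```

Ranking every vector is linear in the number of stored turns, which trades speed for exactness; that is the trade-off behind the "no HNSW approximation error" remark elsewhere on this page.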

Comparison: Native Context Window vs. Tier-2 Semantic Memory

Effective context depth and privacy across systems
System | Native window | Effective window | Privacy | Inference cost
GPT-4o (OpenAI) | 128,000 tokens | 128,000 tokens | ☀︎ Cloud | per token
Claude 3.5 Sonnet | 200,000 tokens | 200,000 tokens | ☀︎ Cloud | per token
Local 7B model (no SM) | 4,000–32,000 tokens | 4,000–32,000 tokens | 🔒 Local | 0
MoE Sovereign + Tier-2 SM | 4,000–32,000 (model) | 1,000,000+ tokens (infra) | 🔒 Local | 0

MRCR-lite v2 — Benchmark Results (60 runs, April 2026)

The MRCR-lite v2 benchmark injects facts ("needles") into a synthetic conversation and forces them out of the LLM context via filler turns. The only variable: ChromaDB pre-seeded (WITH) or empty (WITHOUT).

Recall by needle depth — moe-memory-aihub-hybrid, nomic-embed-text 768-dim
Depth (filler turns) | WITHOUT Semantic Memory | WITH Semantic Memory | Status
5 | 0.000 | 1.000 | ✓ Benchmark confirmed
10 | 0.000 | 1.000 | ✓ Benchmark confirmed
20 | 0.000 | 1.000 | ✓ Benchmark confirmed
50–100 | 0.000 | ~1.000 | Retrieval unit test ✓ (rank #1, dist. 0.34)

60 runs: 5 needles × 3 depths × 2 conditions × 2 reps. Overall WITH score: 1.000.

Token Overhead of the MoE Cycle

Token overhead by category (10 prompts, April 2026)
Category | Direct | MoE | Overhead factor
Knowledge | ~4,640 | ~29,450 | 6.35× (lowest)
Coding | ~1,880 | ~18,950 | 10.36×
Math | ~1,270 | ~15,400 | 12.48×
Reasoning | ~1,750 | ~16,000 | 14.76×
Instruction following | ~460 | ~18,700 | 42.66×
Overall | ~2,011 | ~19,844 | 17.32×

Fixed prompt cost of the MoE cycle: constant ~11,000 tokens per request. Recommendation: MoE pipeline for knowledge-intensive queries; native mode (moe_mode: native) for short, simple questions.

Mode Comparison: Strengths & Trade-offs

Four operating modes — overhead, strengths, weaknesses, ideal use cases
Mode | Overhead | Strengths | Weaknesses | Best for
native | | Minimal latency, zero overhead, instant response | No memory, no multi-expert, no tools | Short questions, calculations, quick lookups
moe_orchestrated | 6–43× (avg 17×) | Multi-expert synthesis, MCP tools, GraphRAG, self-correction | High token overhead; poor ROI for simple queries | Complex, cross-domain questions; research; code review
moe + Semantic Memory | 17× + ~50ms | Long-term memory across sessions; depth 5–20+ at 1.0 recall | Embedding warm-up needed; ~50ms retrieval overhead | Project assistance, support, multi-session research
moe + Cross-Session | 17× + ~50ms | Shared team knowledge; institutional memory; scope hierarchy | Explicit sharing required; privacy setup needed | Knowledge management, shared project spaces, support teams

Use Cases

💻

Software Project

Architecture decisions, bug reports and API discussions from past sessions are retrieved automatically during code reviews. "Why did we choose PostgreSQL over MongoDB?" — answered immediately.

Overhead: 10.36×
📚

Knowledge Management

Team members share research results and findings. What person A discovered last week can be retrieved by person B today via cross-session — no repeated research. Lowest overhead factor.

Overhead: 6.35×
🧍

Consulting & Support

In follow-up conversations with the same customer, the system remembers previous solutions, preferences and agreements. No need to re-explain context at the start of each new session.

Overhead: 6–15×
📋

Research & Analysis

Weeks of research accumulate. Hypotheses, sources and interim results from session 1 are still retrievable in session 20 — the system keeps thinking where a human would have stopped.

Overhead: 6.35–12×

Compatibility & Activation

Tier-2 Semantic Memory is fully OpenAI API-compatible. No client code changes are required — Open WebUI, Claude Code, and any OpenAI SDK client benefit automatically. Enable per template in the Admin UI:

{
  "enable_semantic_memory": true,
  "semantic_memory_n_results": 8,
  "semantic_memory_ttl_hours": 168,
  "enable_cross_session_memory": true,
  "cross_session_scopes": ["private", "team"]
}

15 Configurable Expert Domains

Each expert domain is fully configurable via the Admin UI — assign any LLM, set system prompts, define tier strategy and GPU node. No code changes required.

Expert categories and their capabilities (models are configurable per template)
Category | Tier Strategy | Use Case | Special Features
general | T2 | General knowledge, definitions, explanations |
math | T1+T2 | Calculations, equations, statistics | MCP tools + SymPy
technical_support | T1+T2 | IT, DevOps, Docker, networking, Linux | MCP network tools
code_reviewer | T2 | Code review, security, refactoring | OWASP-focused
creative_writer | T2 | Content creation, marketing, storytelling |
medical_consult | T1+T2 | Medical information (not professional advice) | Critic node
legal_advisor | T2 | Legal research (not professional advice) | MCP law tools
translation | T2 | Professional translation (multilingual) |
data_analyst | T1 | Statistics, data analysis, SQL | MCP stats
science | T2 | Chemistry, biology, physics |
reasoning | T1+T2 | Complex logic, strategy, analysis | Thinking node
vision | T2 | Image and screenshot analysis | Multimodal
agentic_coder | T2 | Autonomous code generation | Full-file output
web_researcher | T1 | Web research via SearXNG | Real-time search
tool_expert | T1 | MCP tool orchestration | 51 tools

All expert assignments (LLM model, GPU node, system prompt, tier) are configured via Expert Templates in the Admin UI or via the EXPERT_TEMPLATES environment variable. See the Templating Guide.

The CONFIDENCE System

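Each expert returns a confidence score alongside its response. This determines whether the result is used directly or escalated to a more capable Tier-2 model: a Tier-1 result with CONFIDENCE below 0.65 is escalated, while Tier 2 is the endpoint of the chain.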
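In code, the escalation rule from the tier table (escalate when CONFIDENCE < 0.65) reduces to a single comparison. The callables `tier1` and `tier2` and the function name `resolve` are hypothetical placeholders for the expert calls.

```python
ESCALATION_THRESHOLD = 0.65  # from the tier table: escalate when CONFIDENCE < 0.65


def resolve(prompt, tier1, tier2):
    """tier1 and tier2 are callables returning (answer, confidence)."""
    answer, confidence = tier1(prompt)
    if confidence < ESCALATION_THRESHOLD:
        answer, confidence = tier2(prompt)   # T2 is the endpoint, no further escalation
        return answer, confidence, "T2"
    return answer, confidence, "T1"
```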

Output Modes

Available output modes (model field)
Model ID | Mode | Description
moe-orchestrator | Standard | Full answers with explanations
moe-orchestrator-code | Code | Code output only, no prose
moe-orchestrator-concise | Concise | Max 120 words, no filler text
moe-orchestrator-research | Research | Deep analysis with source references
moe-orchestrator-report | Report | Structured report with sections
moe-orchestrator-agent | Agent | Tool-use optimized for agents
moe-orchestrator-agent-orchestrated | Agent MoE | Claude Code with full MoE fan-out
moe-orchestrator-plan | Plan | Task planning with step list

Deterministic Tools Without Hallucinations

LLMs hallucinate on calculations, date arithmetic, and legal statutes. 51 MCP Precision Tools with AST-whitelist security replace these with exact, verifiable results — 100% accuracy, zero variance.
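The idea behind an AST whitelist can be shown with Python's `ast` module: the expression is parsed, and only explicitly allowed node types are evaluated. This is a minimal sketch, not the project's `calculate` tool; the function name `safe_calculate` and the chosen operator set are assumptions, and exponentiation is deliberately left out because unbounded powers can exhaust CPU and memory.

```python
import ast
import operator

_BINARY = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Mod: operator.mod,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str):
    """Evaluate plain arithmetic; any node outside the whitelist raises ValueError."""
    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](visit(node.operand))
        # Names, calls, attributes, subscripts and everything else end up here.
        raise ValueError(f"disallowed expression element: {type(node).__name__}")

    return visit(ast.parse(expression, mode="eval"))
```

Since function calls and attribute access are never evaluated, an input such as `__import__('os').system('id')` is rejected before anything runs, while `safe_calculate("2 + 3 * 4")` returns 14.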

✦ Mathematics

  • calculate – Safe arithmetic evaluation
  • solve_equation – SymPy equation solver
  • prime_factorize – Prime factorization
  • gcd_lcm – Greatest common divisor / LCM
  • roman_numeral – Arabic ↔ Roman

📅 Date & Time

  • date_diff – Difference between dates
  • date_add – Add/subtract from date
  • day_of_week – Calculate day of week

📏 Units & Statistics

  • unit_convert – km, miles, kg, lb, °C, °F, ...
  • statistics_calc – Mean, median, std dev, percentiles

🔒 Cryptography & Encoding

  • hash_text – MD5, SHA-256, SHA-512
  • base64_codec – Base64 encode/decode

🌐 Networking

  • subnet_calc – CIDR analysis, netmask, broadcast

📜 Text & Patterns

  • regex_extract – Apply regular expressions
  • text_analyze – Word count, chars, sentences
  • json_query – JSONPath extraction

⚖ German Law

  • legal_search_laws – Search statutes
  • legal_get_law_overview – Law overview
  • legal_get_paragraph – Retrieve paragraphs
  • legal_fulltext_search – Full-text search (BGB, StGB, ...)

Requirements & Deployment

MoE Sovereign runs on any hardware with Docker, from a single VM to a multi-node GPU cluster. The orchestrator itself needs no GPU and no VRAM; inference is handled by external backends (e.g. self-hosted GPU nodes or cloud APIs).

1 · Solo Profile (~1.5 GiB RAM footprint)

  • Target: Single VM, Proxmox LXC, Raspberry Pi 5, Windows WSL 2
  • RAM: 8 GB minimum
  • GPU: Optional (API-only mode possible)
  • Disk: 40 GB
  • Install: deploy/lxc/setup.sh

2 · Team Profile (~6 GiB RAM footprint)

  • Target: Docker host, homelab server
  • RAM: 16 GB+ recommended
  • GPU: Not required (external inference backends)
  • Disk: 100 GB+
  • Install: docker compose up -d

3 · Enterprise Profile (RAM footprint variable, scales with workload)

  • Target: K3s, Kubernetes, OpenShift (architecturally prepared; community validation requested)
  • Features: HA, HPA, PDB, multi-tenant
  • GPU: Multi-node heterogeneous clusters
  • Storage: External data tier (Longhorn, NFS)
  • Install: helm install moe charts/moe-sovereign

One OCI Image, Three Profiles: The same container image runs across all deployment targets. Only the environment and surrounding wrapper differ — no code forks, no feature loss. VRAM-aware scheduling automatically distributes models across heterogeneous GPU nodes based on configurable per-node VRAM limits.
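To illustrate what VRAM-aware placement under per-node limits can mean, here is a best-fit-decreasing sketch: the largest models are placed first, each on the tightest node that still has room. The scheduler's real algorithm is not described on this page, so the strategy, the function name `place_models`, and the data shapes are assumptions.

```python
def place_models(models: dict[str, float], node_limits: dict[str, float]) -> dict[str, str]:
    """models: name -> GB needed; node_limits: node -> GB limit. Returns model -> node."""
    free = dict(node_limits)
    placement = {}
    # Largest models first, so they are not blocked by small ones.
    for model, need in sorted(models.items(), key=lambda kv: kv[1], reverse=True):
        candidates = [node for node, gb in free.items() if gb >= need]
        if not candidates:
            raise RuntimeError(f"no node has {need} GB free for {model}")
        node = min(candidates, key=lambda n: free[n])  # tightest node that still fits
        free[node] -= need
        placement[model] = node
    return placement
```

With a 12 GB and a 24 GB node, a 20 GB model can only go to the larger card, and an 8 GB model then goes to the smaller one, which still has room.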

curl -sSL https://raw.githubusercontent.com/h3rb3rn/moe-sovereign/main/install.sh | bash

Admin UI — System Monitoring Dashboard

The built-in monitoring dashboard provides real-time metrics at a glance: active sessions, LLM server status across all GPU nodes, token usage per model, cache hit rate, expert call distribution, and user ratings.

[Screenshot: MoE Sovereign Admin UI, System Monitoring dashboard after production traffic, showing all six system gauges, LLM server cards, and Chart.js widgets for token usage, cache performance, expert categories, and latency ratings.]

OpenAI-Compatible Quick Start

MoE Sovereign behaves like the OpenAI API and additionally supports the Anthropic Messages API. Every existing integration works without code changes.

Quick Start with cURL

bash POST /v1/chat/completions
curl -X POST https://api.moe-sovereign.org/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-API-KEY>" \
  -d '{
    "model": "moe-orchestrator",
    "messages": [
      {"role": "user", "content": "Explain the difference between TCP and UDP"}
    ],
    "stream": false
  }'

Enable Streaming

bash Server-Sent Events
curl -X POST https://api.moe-sovereign.org/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-API-KEY>" \
  -d '{
    "model": "moe-orchestrator-code",
    "messages": [{"role": "user", "content": "Write a Python Fibonacci function"}],
    "stream": true
  }'

Python with the openai Library

python OpenAI SDK drop-in
from openai import OpenAI

client = OpenAI(
    base_url="https://api.moe-sovereign.org/v1",
    api_key="<YOUR-API-KEY>"
)

response = client.chat.completions.create(
    model="moe-orchestrator-research",
    messages=[{"role": "user", "content": "Analyze the pros and cons of Kubernetes"}]
)
print(response.choices[0].message.content)

Claude Code Integration (.bashrc)

bash ~/.bashrc or ~/.zshrc
# MoE API as Anthropic backend for Claude Code
export ANTHROPIC_BASE_URL=https://api.moe-sovereign.org
export ANTHROPIC_API_KEY="<YOUR-API-KEY>"  # quoted so the shell does not treat < and > as redirections

Full API reference, authentication, budget management, and integration guides in the documentation: docs.moe-sovereign.org ↗

MoE Libris — Federated Knowledge Exchange

What if independent AI systems could learn from each other without giving up their autonomy? MoE Libris makes this possible — a federation hub inspired by the Fediverse (Mastodon, Friendica), where sovereign MoE instances voluntarily share knowledge graph entries as JSON-LD bundles. No central authority, no forced synchronization. Each node decides what to publish and what to accept.

How It Works

MoE Libris follows a hub-and-spoke architecture. Each MoE Sovereign instance runs its own Libris node, which connects to federation partners through a bilateral handshake protocol — both sides must explicitly agree before any data flows. Nodes discover each other through a public Git registry (anyone can register via pull request), keeping discovery decentralized and transparent.

The push/pull cycle works as follows: a node curates knowledge graph triples from its local Neo4j database, packages them as JSON-LD bundles, runs them through a pre-audit pipeline (syntax validation + heuristic scanning for PII and secrets), and pushes them to federation partners. On the receiving side, incoming bundles land in an admin audit queue where every entry requires explicit approval before integration into the local knowledge graph.

This solves concrete problems: data silos between isolated AI deployments, vendor lock-in from proprietary knowledge stores, and the cold-start problem for new installations. A fresh MoE Sovereign deployment can import curated knowledge from the federation and immediately benefit from the collective experience of the network — while keeping full control over what enters its own knowledge base.

Trust Model

Imported triples are never treated as first-class local knowledge. They enter at a configurable trust floor and must accumulate confirmation through local usage before their trust score rises. When an imported triple contradicts an existing local triple, the system flags the contradiction for admin review rather than silently overwriting. This prevents knowledge poisoning while still allowing the network to grow.
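The trust rules above can be sketched in a few lines. The numeric values of `TRUST_FLOOR` and `CONFIRMATION_STEP`, the `Triple` type, and the definition of a contradiction (same subject and predicate, different object) are simplifying assumptions; the text only states that the floor is configurable.

```python
from dataclasses import dataclass

TRUST_FLOOR = 0.3         # assumed value for the configurable floor
CONFIRMATION_STEP = 0.1   # assumed gain per local confirmation


@dataclass
class Triple:
    subject: str
    predicate: str
    obj: str
    trust: float = 1.0
    imported: bool = False


def import_triple(graph: list, incoming: Triple, review_queue: list) -> None:
    """Add an imported triple at the trust floor, or queue it if it contradicts local knowledge."""
    for local in graph:
        same_slot = (local.subject, local.predicate) == (incoming.subject, incoming.predicate)
        if same_slot and local.obj != incoming.obj and not local.imported:
            review_queue.append((local, incoming))   # flag for admin review, never overwrite
            return
    graph.append(Triple(incoming.subject, incoming.predicate, incoming.obj, TRUST_FLOOR, True))


def confirm(triple: Triple) -> None:
    """Local usage confirmed the triple: raise its trust, capped at 1.0."""
    triple.trust = min(1.0, triple.trust + CONFIRMATION_STEP)
```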

Technical Details

Data format: Knowledge entries are serialized as JSON-LD triples (subject-predicate-object) with provenance metadata, timestamps, and trust scores attached. The format is self-describing and interoperable with standard RDF tooling.

Pre-audit pipeline: Before export, every bundle passes through two stages: (1) syntax validation ensuring well-formed JSON-LD and valid triple structure, and (2) heuristic scanning that flags potential PII (names, emails, addresses), API keys, credentials, and sensitive relation types. Flagged entries are held for manual review.
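A two-stage check of this kind might look as follows. The sketch works on a simplified JSON layout (a top-level `triples` list with `subject`, `predicate`, `object` strings) rather than full JSON-LD, and both regular expressions are illustrative heuristics, not the project's scanner.

```python
import json
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Hypothetical pattern for strings that look like API keys or tokens.
SECRET = re.compile(r"\b(?:sk|pk|api|key|token)[-_][A-Za-z0-9_-]{16,}\b", re.I)
FIELDS = ("subject", "predicate", "object")


def pre_audit(bundle_json: str):
    """Stage 1: structure check. Stage 2: heuristic scan. Returns (accepted, held_for_review)."""
    bundle = json.loads(bundle_json)               # malformed JSON raises ValueError
    accepted, held = [], []
    for entry in bundle["triples"]:
        if not all(isinstance(entry.get(k), str) and entry[k] for k in FIELDS):
            raise ValueError(f"malformed triple: {entry!r}")
        text = " ".join(entry[k] for k in FIELDS)
        flagged = EMAIL.search(text) or SECRET.search(text)
        (held if flagged else accepted).append(entry)
    return accepted, held
```

Heuristics like these produce false positives and false negatives, which fits the description above: flagged entries are held for manual review rather than dropped automatically.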

Abuse prevention: The federation implements a graduated strike system. Nodes that repeatedly push low-quality, flagged, or rejected content accumulate strikes. Thresholds trigger rate limiting, temporary suspension, and eventually permanent exclusion from the federation — all enforced locally by each receiving node.
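A graduated strike ledger, kept locally by each receiving node, can be sketched like this. The stage names follow the paragraph above; the numeric thresholds and the class name `StrikeLedger` are invented for illustration, since the text does not give them.

```python
from collections import Counter

# Hypothetical thresholds, checked from most to least severe.
STAGES = [(10, "excluded"), (5, "suspended"), (3, "rate_limited")]


class StrikeLedger:
    def __init__(self):
        self._strikes = Counter()

    def record_rejection(self, node_id: str) -> str:
        """Count one rejected or flagged push and return the node's new status."""
        self._strikes[node_id] += 1
        return self.status(node_id)

    def status(self, node_id: str) -> str:
        count = self._strikes[node_id]
        for threshold, stage in STAGES:
            if count >= threshold:
                return stage
        return "ok"
```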

Stack: FastAPI for the federation API, PostgreSQL for federation state and audit logs, Neo4j for the global knowledge graph, Valkey for caching and rate limiting. The entire stack runs in Docker containers alongside the main MoE Sovereign deployment.

moe-libris.org ↗

MoE Codex — Sovereign Data Intelligence for Regulated Sectors

95 % of operators want a sovereign LLM gateway — that’s MoE Sovereign. The remaining 5 % in regulated sectors need documented risk classification, data lineage, approval workflows, and audit trails. That’s MoE Codex — an open-source extension layer architecturally inspired by platforms like Palantir Foundry, without claiming their commercial maturity or enterprise depth.

What MoE Codex Delivers

MoE Codex is an opt-in extension layer deployed alongside a running MoE Sovereign instance. It adds a full data management stack on top of the LLM gateway:

  • Data Catalog: Asset discovery, schema registry, tagging and classification of all data sources.
  • Approval Workflows: Multi-step authorization gates before data reaches AI pipelines. Role-based reviewer assignment and documented decisions.
  • Data Lineage (OpenLineage / Marquez): End-to-end traceability from raw source to inference output.
  • Data Versioning (lakeFS): Git-style branches and commits for datasets. Reproducible snapshots for compliance audits.
  • Drift Detection: Continuous monitoring of knowledge graph metrics and statistical data drift. Prometheus-native alerting.
  • ETL Automation (Apache NiFi): Visual data flow design without writing code.
  • Object Explorer (Cypher): Read-only graph query interface for compliance investigations by data protection officers.
  • JupyterLab Notebook: Proxied notebook environment for reproducible data analysis within the sovereign perimeter.
  • Pipeline Builder (Kestra): Declarative workflow orchestration as a lightweight ETL alternative.
  • Structured Forms (JSONForms): Schema-driven data entry for compliance forms and risk assessments.
  • Charts & Analytics: Embedded pivot analysis and visualisation of catalog and lineage data.
  • Link Analysis (Cytoscape.js): Interactive graph exploration for investigations and relationship analysis.
  • Timeline (vis-timeline): Time-based visualisation of event chains across entities and data movements.
  • Federated Search (OpenSearch): Multi-tenant full-text and vector search across catalog assets.

Regulatory Coverage

MoE Codex was designed with current EU regulations in mind:

  • EU AI Act (Reg. 2024/1689): high-risk Annex III systems require risk documentation and audit trails; MoE Codex provides both.
  • NIS2 / NIS2UmsuCG: risk management and supply-chain transparency for essential entities.
  • GDPR Art. 35 DPIA: catalog metadata and lineage records document processing activities.
  • BSI Grundschutz & C5: hosting on BSI-C5-certified EU providers (Hetzner, IONOS, STACKIT, OVHcloud).

The BVerfG's 2023 judgment, which ruled the statutory basis for Hessendata (built on Palantir Gotham) unconstitutional, created an acute need for sovereignly deployable, technically auditable data platforms across the EU. MoE Codex addresses exactly this need as an open-source approach: Apache 2.0, air-gap capable, fully auditable codebase, zero US-cloud dependency, zero vendor lock-in.

Honest positioning: MoE Codex is not a current drop-in replacement for Palantir Foundry in terms of product maturity, enterprise support, or certification depth. It is an architecturally related, transparent open-source platform — with the potential to become a credible long-term alternative in regulated scenarios where auditability and data sovereignty outweigh commercial feature breadth.

moe-codex.org ↗

Roadmap & Milestones

MoE Sovereign reached its public release on April 13, 2026. All four launch phases are complete. Development continues with community contributions and federated knowledge features.

Phase 1: Infrastructure & Deployment

Docker Compose, LXC, Podman, and Helm deployment wrappers. VRAM-aware scheduling across heterogeneous GPU clusters. Prometheus, Grafana, and Kafka observability stack.

Phase 2: Architecture & Pipeline

LangGraph pipeline with two-tier expert escalation, 51 MCP precision tools, Neo4j GraphRAG with trust-score self-healing, 4-layer cache hierarchy, complexity routing, and self-correction loop.

Phase 3: Expert Templates & Benchmarks

  • 69-model LLM suitability study, 15 expert domains, and 6 Claude Code profiles.
  • 9.3× compounding effect validated; adversarial MCP testing: 9/9 blocked.
  • AIHUB H200 benchmark: 9/9 passed (100 %) with gpt-oss-120B + qwen-3.5-122B.
  • M10-Gremium 8-expert template: 9/9 passed on legacy hardware.
  • GAIA benchmark: 14/30 = 46.7 %, surpassing GPT-4o Mini (44.8 %). 5 iterative runs (2026-04-25), best run: L1 60 %, L2 50 %, L3 40 %.
  • 8 new deterministic MCP tools, including wikidata_sparql, pubmed_search, crossref_lookup, openalex_search, web_browser, and wayback_fetch.
  • Thompson Sampling (RL flywheel), Correction Memory, and Context Window Abstraction Layer.
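The Thompson Sampling item above can be illustrated with a minimal Beta-Bernoulli sketch. How MoE Sovereign actually models rewards is not described on this page, so the ExpertArm class, the binary success signal, and the model names below are assumptions made purely for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class ExpertArm:
    """Beta posterior over one expert model's success rate (hypothetical structure)."""
    name: str
    successes: int = 0
    failures: int = 0

    def sample(self, rng: random.Random) -> float:
        # Beta(1 + successes, 1 + failures): uniform prior plus observed outcomes.
        return rng.betavariate(1 + self.successes, 1 + self.failures)


def choose_expert(arms: list[ExpertArm], rng: random.Random) -> ExpertArm:
    """Draw one sample per arm and pick the arm with the highest draw."""
    return max(arms, key=lambda arm: arm.sample(rng))


def record_outcome(arm: ExpertArm, success: bool) -> None:
    """Update the chosen arm's posterior with the observed result."""
    if success:
        arm.successes += 1
    else:
        arm.failures += 1


def simulate(true_rates: dict[str, float], rounds: int, seed: int = 0) -> list[ExpertArm]:
    """Toy loop: the true success rates are simulated, not measured."""
    rng = random.Random(seed)
    arms = [ExpertArm(name) for name in true_rates]
    for _ in range(rounds):
        arm = choose_expert(arms, rng)
        record_outcome(arm, rng.random() < true_rates[arm.name])
    return arms


if __name__ == "__main__":
    for arm in simulate({"model-a": 0.9, "model-b": 0.3}, rounds=500):
        print(arm.name, arm.successes + arm.failures)
```

Because each arm is sampled from its posterior rather than ranked by its mean, weaker models still receive occasional traffic early on, while most requests drift toward the model that keeps succeeding.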

Phase 5: Tier-2 Semantic Memory — April 2026 🧠

Effective 1M-token context window through infrastructure rather than model upgrades: evicted conversation turns are stored as nomic-embed-text vectors (768 dim.) in ChromaDB and retrieved on demand via hybrid retrieval (direct cosine ranking + keyword fallback). Template flag enable_semantic_memory: true activates Tier-2 for any expert template with no additional token cost at inference time. Validated by MRCR-lite v2 benchmark (needle recall at depths 5–100) — overall score 1.000; full benchmark results in the context window documentation.
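The hybrid retrieval described above (cosine ranking with a keyword fallback) can be sketched in a few lines. This is an illustrative stand-in, not the project's code: an in-memory list replaces ChromaDB, tiny hand-written vectors replace the 768-dimensional nomic-embed-text embeddings, and the min_score threshold that triggers the fallback is an assumption.

```python
import math
import re


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens for the keyword fallback."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def hybrid_retrieve(
    query_text: str,
    query_vec: list[float],
    turns: list[tuple[str, list[float]]],
    k: int = 3,
    min_score: float = 0.5,  # assumed threshold, not taken from the project
) -> list[str]:
    """Return up to k evicted turns: vector hits first, keyword overlap otherwise."""
    # Step 1: rank every stored turn by cosine similarity to the query vector.
    scored = [(cosine(query_vec, vec), text) for text, vec in turns]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    hits = [text for score, text in scored if score >= min_score][:k]
    if hits:
        return hits

    # Step 2: nothing is similar enough, so fall back to shared keywords.
    terms = tokenize(query_text)
    overlap = [(len(terms & tokenize(text)), text) for text, _ in turns]
    overlap = [pair for pair in overlap if pair[0] > 0]
    overlap.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in overlap][:k]
```

With two stored turns such as ("The deploy key is stored in vault", [1.0, 0.0]) and ("Lunch is at noon", [0.0, 1.0]), a query vector close to [1.0, 0.0] returns only the first turn, while a zero query vector with the text "when is lunch" falls through to the keyword path and ranks the second turn first.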

License: Apache 2.0 · Stack: Python + FastAPI + LangGraph · Minimum hardware: no VRAM – inference via external API backends

System in Action

Live screenshots from production operation — Admin UI, live monitoring, Grafana dashboards, container logs, and knowledge graph.

Admin UI (Full Overview): System status, registered expert nodes, LLM configuration, and routing profiles at a glance.
Admin UI (Live Monitoring): Real-time pipeline status, token consumption, cache hit rate, expert categories, expert call statistics, and latencies.
Grafana (GPU & Inference Nodes): VRAM utilization, GPU load, and inference throughput of all cluster nodes in real time.
Grafana (Knowledge Base Health): Ontology growth, gap queue depth, corrections, and Neo4j database statistics.
Dozzle (Container Logs): Real-time log streaming for all MoE services: orchestrator, healer, admin UI, and MCP server.
Neo4j (Knowledge Graph): 500 entity nodes with semantic relationships, curated by the LLM-powered ontology healer.

The Complete Sovereign AI Stack

MoE Sovereign is the core — a fully self-hosted LLM gateway with expert routing, GraphRAG, and MCP precision tools. Two optional extensions complete the platform: MoE Codex adds enterprise data intelligence, and MoE Libris enables federated knowledge exchange between sovereign deployments.

MoE Sovereign — LLM Core

The centre of the stack. Template-based multi-model orchestrator with 15 specialist experts, 51 deterministic MCP tools, Neo4j GraphRAG, 4-layer caching, Kafka event streaming, and a 1 million-token semantic memory layer. Runs air-gap ready on any Linux host. Zero mandatory cloud calls.

API: OpenAI-compatible + Anthropic Messages API · Port: 8002
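Since the API is described as OpenAI-compatible, a client can in principle build a standard chat-completions request with the standard library alone. The sketch below assumes the endpoint lives at /v1/chat/completions on localhost port 8002 and accepts a Bearer API key; the path prefix, the model ID, and the key are placeholders, not values confirmed by this page.

```python
import json
import urllib.request

# Assumption: local deployment on the port named above, OpenAI-style /v1 prefix.
BASE_URL = "http://localhost:8002/v1"


def build_chat_request(
    prompt: str,
    model: str,
    api_key: str,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request without sending it."""
    payload = {
        "model": model,  # hypothetical model ID; use one listed by your deployment
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


if __name__ == "__main__":
    req = build_chat_request("Summarise NIS2 in one sentence.", "example-model", "YOUR_API_KEY")
    print(send(req))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any traffic leaves the host.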

MoE Codex — Data Intelligence Extension

Optional add-on for regulated sectors. Extends the core with a full Palantir Foundry-inspired data management stack — all open source, all deployable alongside MoE Sovereign without touching its configuration.

  • Data Catalog & Lineage — Marquez OpenLineage, cross-source catalog browser
  • Data Versioning — lakeFS Git-style branches and approval gates
  • ETL Automation — Apache NiFi visual flow canvas
  • BI & Analytics — Apache Superset dashboards, Trino federated SQL
  • Investigation Tools — link analysis, timeline, dossier, geospatial layers
  • Policy Enforcement — Open Policy Agent ABAC/RBAC
  • Document Intelligence — DocLing OCR & entity extraction
  • Federated Search — OpenSearch across all catalog sources

Coverage: 92 % of Palantir Foundry/Gotham/AIP surface area · Apache 2.0

MoE Libris — Federation Hub

Optional federation layer. Independent sovereign deployments exchange curated knowledge graph bundles via a Fediverse-inspired hub-and-spoke protocol. Bilateral consent handshake, pre-audit PII pipeline, trust-scored imports, and admin approval queue — no central authority, no forced synchronisation.

Protocol: JSON-LD triples with provenance · Anti-poison: conflict detection + strike system
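To make the anti-poison idea concrete, here is a rough sketch of conflict detection combined with a strike counter. The bundle layout, the field names, the MAX_STRIKES limit, and the rule that a conflict means "same subject and predicate, different object" are all assumptions for illustration; plain dicts stand in for real JSON-LD, and the actual Libris protocol may differ.

```python
from collections import Counter

MAX_STRIKES = 3  # assumed limit before a source is rejected outright


def find_conflicts(local: list[dict], incoming: list[dict]) -> list[dict]:
    """Incoming triples whose subject and predicate exist locally with another object."""
    known = {(t["subject"], t["predicate"]): t["object"] for t in local}
    return [
        t for t in incoming
        if (t["subject"], t["predicate"]) in known
        and known[(t["subject"], t["predicate"])] != t["object"]
    ]


def review_bundle(local: list[dict], bundle: dict, strikes: Counter) -> dict:
    """Screen a bundle before it reaches the admin approval queue."""
    source = bundle["provenance"]["source"]
    if strikes[source] >= MAX_STRIKES:
        return {"status": "rejected", "accepted": [], "conflicts": []}

    conflicts = find_conflicts(local, bundle["triples"])
    if conflicts:
        strikes[source] += 1  # one strike per conflicting bundle

    accepted = [t for t in bundle["triples"] if t not in conflicts]
    return {"status": "pending_approval", "accepted": accepted, "conflicts": conflicts}


if __name__ == "__main__":
    local = [{"subject": "neo4j", "predicate": "written_in", "object": "Java"}]
    bundle = {
        "provenance": {"source": "node-b"},
        "triples": [
            {"subject": "neo4j", "predicate": "written_in", "object": "Rust"},
            {"subject": "kafka", "predicate": "written_in", "object": "Scala"},
        ],
    }
    print(review_bundle(local, bundle, Counter()))
```

Non-conflicting triples still wait for admin approval in this sketch, which mirrors the page's statement that imports are never forced into the local graph.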

How the Three Layers Work Together

Stack interaction summary

| Layer | Role | Interfaces with | Required |
|---|---|---|---|
| MoE Sovereign | LLM gateway, expert routing, GraphRAG, MCP tools | Clients via OpenAI / Anthropic API, Codex via REST, Libris via bundle import | Yes (core platform) |
| MoE Codex | Data catalog, lineage, versioning, BI, investigation, compliance | Receives OpenLineage events from Sovereign; writes approved bundles back to Neo4j | Optional (regulated deployments) |
| MoE Libris | Federated knowledge exchange between sovereign instances | Sends / receives JSON-LD bundles; imports land in Codex approval queue | Optional (multi-cluster deployments) |

Industry Use Cases

Government & Authorities

Deploy Sovereign for citizen-query routing and the legal-advisor expert. Add Codex for EU AI Act audit trails, NIS2 risk documentation, and the OPA policy layer that enforces classification markings. Use lakeFS to snapshot evidence datasets before every decision run.

Healthcare & Pharma

Sovereign handles medical consultation routing and document analysis via DocLing. Codex tracks clinical trial dataset versions in lakeFS, records full provenance in Marquez, and surfaces compliance gaps in Superset dashboards connected to Trino’s federated SQL layer.

Banking & Compliance

Route model-risk and regulatory queries through Sovereign’s expert ensemble. Codex delivers the complete audit trail required under GDPR Art. 35 and BSI C5: OpenLineage lineage from source to inference output, lakeFS dataset commits, OPA policy decisions, and Superset compliance dashboards. OpenSearch enables cross-system investigations without data movement.

GitHub: moe-sovereign ↗   MoE Codex ↗   MoE Libris ↗